Most efficient way of looking up values in an array.

Question

Johann Vormadal on 17 Mar 2023

0
Link

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/1930445-most-efficient-way-of-looking-up-values-in-an-array

Commented: Johann Vormadal on 20 Mar 2023

I have an array (a kind of lookup array that I have generated) with 8 columns and around 400 rows. In every row the first four values represent the boundaries (y1, x2, y2 and x1) of rectangular regions in a plane. The last four values are integers I want to access if a query point is within the region described by the first four values. The rectangular regions fully cover the plane upto certain limits and do not overlap, but they are not regular/structured.

If I have, say, 10million points (represented by and x vector and a y vector) and I want to find for each of those the four integers (in column 5:8) that correspond to their region, What is the fastest way of looking this up? changing the way my lookup array represents the data is not off-limits, if it helps making the process faster.

I have tried the following, which is slow, seems very inefficient and breaks when the points are exactly on a boundary between regions:

QueryPoints = [X Y];                                                                                           % My 10000000 x 2 vector for query points.
n = length(X);                                                                                             % Determine number of query points
SearchIndex = all( [Y -X -Y X]  >= permute([IndexList(:,1) -IndexList(:,2:3) IndexList(:,4)], [3 2 1]),2);
myIntegers = zeros(n,4);
for i = 1:n
    myIntegers(i,:) = IndexList(SearchIndex(i,1,:),5:8);                                        % my 10000000 x 4 vector with the set of integers corresponding with my coordinates
end

Thanks.

4 Comments
Show 2 older commentsHide 2 older comments

Bruno Luong on 17 Mar 2023

Edited: Bruno Luong on 17 Mar 2023

Open in MATLAB Online

IMO using autoexpension might be not a good idea, since the intermediarete result (before all) requires

(10e6*4*8*452)/1e9
ans = 144.6400

144 Gb, so your computer might swap to HD and it works relly slow in thid condition.

It might be just loop on your rectangles (452 iterrations) then check which 10e6 point falling inside. The amount of memory requires is 0.33 Gb and it does not need to swap.

You can even split the 4 tests in succesive tests so that the later test work on a smaller set.

Johann Vormadal on 20 Mar 2023

Yes, the memory requirement was one of the issues I was having, that I didn't mention.

This seems to be a good way around that issue.

Sign in to comment.

Sign in to answer this question.

Answer 1

John D'Errico on 17 Mar 2023

0
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/1930445-most-efficient-way-of-looking-up-values-in-an-array#answer_1195105

Edited: John D'Errico on 17 Mar 2023

Open in MATLAB Online

It hurts my head to see this ordering: (y1, x2, y2 and x1). Sigh. Why? Whatever floats your boat, I guess.

I would not be using a loop. Not even close. Not even remotely so. Nor would I be testing every rect, as your code does. Instead, I would be looking for a way to solve this using existing efficient code designed to solve that problem.

IF the rectangular regions fully tile a planar region, but do not overlap, I would first break each rect down into TWO triangles. The result will be a triangulation of that region. So I might now create a triangulation from those triangles. Don't use delaunay. (I don't know that your problem is convex anyway.) You already know the rects. And that means you already can compute the triangulation directly.

help triangulation
 triangulation     Triangulation in 2-D or 3-D
    triangulation supports topological and geometric queries for triangulations 
    in 2D and 3D space. For example, you can query triangles attached to a 
    vertex, triangle neighbor information, circumcenters, etc. 
    You can create a triangulation directly using existing triangulation data
    in matrix format. Alternatively, you can create a delaunayTriangulation, 
    which provides access to the triangulation functions.
 
    TR = triangulation(TRI, X, Y) creates a 2D triangulation from the 
    triangulation connectivity matrix TRI and the points (X, Y). TRI is an 
    m-by-3 matrix where m is the number of triangles.  Each row of TRI is a 
    triangle defined by vertex IDs - the row numbers of the points (X, Y).
    The point coordinates (X, Y) are column vectors representing the points 
    in the triangulation.
 
    TR = triangulation(TRI, X, Y, Z) creates a 3D triangulation from the 
    triangulation connectivity matrix TRI and the points (X, Y, Z). TRI is 
    an m-by-3 or m-by-4 matrix that represents m triangles or tetrahedra
    that have 3 or 4 vertices respectively.  Each row of TRI is a triangle 
    or tetrahedron defined by vertex IDs - the row numbers of the 
    points (X, Y, Z). The point coordinates (X, Y, Z) are column vectors 
    representing the points in the triangulation.
 
    TR = triangulation(TRI, P) creates a triangulation from the triangulation
    connectivity matrix TRI and the points P. TRI is an m-by-3 or m-by-4 
    matrix that represents m triangles or tetrahedra that have 3 or 4 vertices 
    respectively. Each row of TRI is a triangle or tetrahedron defined by 
    vertex IDs - the row numbers of the points P. P is a mpts-by-ndim matrix 
    where mpts is the number of points and ndim is the number of dimensions, 
    2 <= ndim <= 3.
 
 
    Example 1:
        % Load a 2D triangulation in matrix format and use the triangulation 
        % to query the free boundary edges.
        load trimesh2d
        % This loads triangulation tri and vertex coordinates x, y
        trep = triangulation(tri, x,y);
        fe = freeBoundary(trep)';
        triplot(trep);
        % Add the free edges in red
        hold on; plot(x(fe), y(fe), 'r','LineWidth',2); hold off;
        axis([-50 350 -50 350]);
        axis equal;
 
    Example 2:
        % Load a 3D tetrahedral triangulation in matrix format and use the 
        % triangulation to compute the free boundary.
        load tetmesh
        % This loads triangulation tet and vertex coordinates X
        trep = triangulation(tet, X);
        [tri, Xb] = freeBoundary(trep);
        %Plot the surface mesh
        trisurf(tri, Xb(:,1), Xb(:,2), Xb(:,3), 'FaceColor', 'cyan', 'FaceAlpha', 0.8);
 
    Example 3:
        % Direct query of a 3D Delaunay triangulation created using
        % delaunayTriangulation. Compute the free boundary as in Example 2
        Points = rand(50,3);
        dt = delaunayTriangulation(Points);
        [tri, Xb] = freeBoundary(dt);
        %Plot the surface mesh
        trisurf(tri, Xb(:,1), Xb(:,2), Xb(:,3), 'FaceColor', 'cyan','FaceAlpha', 0.8);
 
 
    triangulation methods:
         barycentricToCartesian - Converts the coordinates of a point from Barycentric to Cartesian
         cartesianToBarycentric - Converts the coordinates of a point from Cartesian to Barycentric
         circumcenter           - Circumcenter of triangle or tetrahedron
         edgeAttachments        - Triangles or tetrahedra attached to an edge
         edges                  - Triangulation edges
         faceNormal             - Triangulation face normal
         featureEdges           - Triangulation sharp edges
         freeBoundary           - Triangulation facets referenced by only one triangle or tetrahedron
         incenter               - Incenter of triangle or tetrahedron
         isConnected            - Test if a pair of vertices is connected by an edge
         neighbors              - Neighbors to a triangle or tetrahedron
         vertexAttachments      - Triangles or tetrahedra attached to a vertex
         vertexNormal           - Triangulation vertex normal
         size                   - Size of the triangulation ConnectivityList
         nearestNeighbor        - Vertex closest to a specified point
         pointLocation          - Triangle or tetrahedron containing a specified point
 
     triangulation properties:
         Points            - The coordinates of the points in the triangulation
         ConnectivityList  - The triangulation connectivity list
 
    See also delaunayTriangulation.

    Documentation for triangulation
       doc triangulation

    Other uses of triangulation

       polyshape/triangulation
methods triangulation
Methods for class triangulation:

barycentricToCartesian  edgeAttachments         featureEdges            isConnected             pointLocation           vertexAttachments       
cartesianToBarycentric  edges                   freeBoundary            nearestNeighbor         size                    vertexNormal            
circumcenter            faceNormal              incenter                neighbors               triangulation           

As you can see, triangulation has a pointLocation method. I might be using that.

help triangulation/pointLocation
  pointLocation  Triangle or tetrahedron containing specified point
  TI = pointLocation(T, QP)  returns the index of the triangle/tetrahedron
       enclosing the query point QP. The matrix QP contains the coordinates
       of the query points. QP is a mpts-by-ndim matrix where mpts is the 
       number of query points and 2 <= ndim <= 3.
       TI is a column vector of triangle or tetrahedron IDs corresponding to the 
       row numbers of the triangulation connectivity matrix T.ConnectivityList. 
       The triangle/tetrahedron enclosing the point QP(k,:) is TI(k). 
       Returns NaN for points not located in a triangle or tetrahedron of T.
  
       TI = pointLocation(T, QX,QY) and TI = pointLocation (T, QX,QY,QZ)
       allow the query points to be specified in alternative column vector 
       format when working in 2D and 3D.
  
       [TI, BC] = pointLocation(T,...) returns, in addition, the Barycentric
       coordinates BC. BC is a mpts-by-ndim matrix, each row BC(i,:) represents
       the Barycentric coordinates of QP(i,:) with respect to the enclosing TI(i).
 
     Example 1:
         % Point Location in 2D
         T = triangulation([1 2 4; 1 4 3; 2 4 5],[0 0; 2 0; 0 1; 1 1; 2 1]);
         % Find the triangle that contains the following query point
         QP = [1, 0.5];
         triplot(T,'-b*'), hold on
         plot(QP(:,1),QP(:,2),'ro')
         % The query point QP is located in a triangle with index TI = 1
         TI = pointLocation(T, QP)
 
     Example 2:
         % Point Location in 3D plus barycentric coordinate evaluation
         % for a Delaunay triangulation
         x = rand(10,1); y = rand(10,1); z = rand(10,1);
         DT = delaunayTriangulation(x,y,z);
         % Find the tetrahedra that contain the following query points
         QP = [0.25 0.25 0.25; 0.5 0.5 0.5]
         [TI, BC] = pointLocation(DT, QP)
 
     See also triangulation, triangulation.nearestNeighbor, delaunayTriangulation.

    Documentation for triangulation/pointLocation
       doc triangulation/pointLocation

Before I did too much, I might look to see if other tools, for example, tsearch might be faster.

For example:

n = 21;
[X,Y] = ndgrid(linspace(0,1,n)); % 400 rects
XYgrid = [X(:),Y(:)];
[ri,rj] = ndgrid(1:n-1);
rind = sub2ind([n,n],ri(:),rj(:));
tri = [[rind,rind+1,rind+n];[rind+n+1,rind+1,rind+n]];
triang = triangulation(tri,X(:),Y(:));
rectid = [rind;rind]; % maps each triangle into the original rect it came from
% now, generate 1e7 points in that region
xytest = rand(1e7,2);
tic
triloc = pointLocation(triang,xytest);
rectloc = rectid(triloc);
toc
Elapsed time is 7.104490 seconds.

7 seconds in the online tool, (my computer is always faster when running my own copy of MATLAB. That was only 1.5 seconds in my own copy of MATLAB.) seems reasonable to locate 1e7 points. In fact of course, I could do better knowing the actual layout of this grid. But you tell us that your rects are not so trivially arranged as I have in this example. And pointLocation should not care.

4 Comments
Show 2 older commentsHide 2 older comments

John D'Errico on 19 Mar 2023

Open in MATLAB Online

@Bruno Luong - sorry, but you are completely wrong in this. I showed how to use a triangulation, and the function pointLocation. Yes, I did suggest that tsearch MIGHT be an optional alternative. But I explaicity stated to use pointLocation as a working option. In fact, pointLocation does explicitly NOT need a convex triangulation.

Seriously Bruno, TRY IT if you don't know the answer.

PS = polyshape([0 0;2 0;2 1;1 1;1 2;0 2])

PS =

polyshape with properties: Vertices: [6×2 double] NumRegions: 1 NumHoles: 0

plot(PS)

PStri = triangulation(PS)

PStri =

triangulation with properties: Points: [6×2 double] ConnectivityList: [4×3 double]

PStri is a triangulation object, but not a convex one.

trimesh(PStri.ConnectivityList,PStri.Points(:,1),PStri.Points(:,2))

PStri can be used in by pointLocation.

pointLocation(PStri,2,2)
ans = NaN
pointLocation(PStri,.5,.5)
ans = 3

Convexity is not a requirement for pointLocation.

Even so, IF the domain is always a convex one, then tsearch MIGHT be faster. I have not compared the older code to see how it competes.

Bruno Luong on 19 Mar 2023

Edited: Bruno Luong on 19 Mar 2023

Delaunay doesn't not have to be convex If I remember one of the definntion is that that circumscrbed circle on any triangle does not contain any vertices other than the three of the triangle.

It can also derived as the geometrical "dual" of voronoi.

I believe various MATLAB tsearch uses some special property of Delaunay to make the search efficient and it breaks downs if it is not.

I remember in the pass trying to do interpolation and provide my own triangulation (for example spliiting some arbitray quadrilateral mesh coming from somewhere else) and calling closestpoint and I fail to do (or may be the process takes too much time, I can't recall the detail why I drop it).

May be I should revisit again.

Sign in to comment.

Answer 2

the cyclist on 17 Mar 2023

0
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/1930445-most-efficient-way-of-looking-up-values-in-an-array#answer_1195045

Edited: the cyclist on 17 Mar 2023

Open in MATLAB Online

One simple step that could speed up your code a lot would be to preallocate memory for your output array. Put

myIntegers = zeros(n,4);

before your for loop.

You can read about why preallocation is important here.

1 Comment
Show -1 older commentsHide -1 older comments

Johann Vormadal on 17 Mar 2023

Edited: Johann Vormadal on 17 Mar 2023

Sorry, my Code snippet was missing that line. I am currently pre-allocating. I have editted the post to reflect this.

Sign in to comment.

Most efficient way of looking up values in an array.

4 Comments
Show 2 older commentsHide 2 older comments

Accepted Answer

4 Comments
Show 2 older commentsHide 2 older comments

More Answers (1)

1 Comment
Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

Most efficient way of looking up values in an array.

4 Comments Show 2 older commentsHide 2 older comments

Accepted Answer

4 Comments Show 2 older commentsHide 2 older comments

More Answers (1)

1 Comment Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

4 Comments
Show 2 older commentsHide 2 older comments

4 Comments
Show 2 older commentsHide 2 older comments

1 Comment
Show -1 older commentsHide -1 older comments