How can the following code be optimized / vectorized?

Question

How can I optimize the following? It is painfully slow and takes around 5 minutes to run on my computer. Here I am calculating the distances between atoms in molecules while taking periodic boundary conditions into account (which means an atom on the edge of the box might actually belong to a molecule that wraps around, so to speak). I imagine this could be done far more efficiently than my Fortran-esque implementation.

% A is an array where column 1 is the atom ID, column 2 is the molecule ID, and columns 3-5 are the positions in x, y, and z
N = 200; % molecules in system
Sites = 150; % sites in each molecule
length = 120; % length of box on each side
SitesInSystem = N*Sites;
B = zeros(SitesInSystem,SitesInSystem);
% center box at origin
A(:,3) = A(:,3) - length/2;
A(:,4) = A(:,4) - length/2;
A(:,5) = A(:,5) - length/2;
for ii = 1:SitesInSystem
    for jj = 1:SitesInSystem
        
        if ii ~= jj % if not the same site
            
            dx = A(jj,3) - A(ii,3);
            dy = A(jj,4) - A(ii,4);
            dz = A(jj,5) - A(ii,5);
                
            % taking into account periodic boundaries
            if (dx >   length * 0.5)
                dx = dx - length;
            elseif (dx <= -length * 0.5) 
                dx = dx + length;
            end
            if (dy >   length * 0.5)
                dy = dy - length;
            elseif (dy <= -length * 0.5) 
                dy = dy + length;
            end
            if (dz >   length * 0.5)
                dz = dz - length;
            elseif (dz <= -length * 0.5) 
                dz = dz + length;
            end
                
            % calculate distances
            B(ii,jj) = sqrt(dx^2+dy^2+dz^2);
                
        end
    end
end
B = triu(B); % to avoid double counting
    

Perhaps this also can be combined with the following snippet, which calculates something based on the values of the matrix B:

for ii = 1:N % molecule
    for jj = 1:Sites % site
        for kk = 1:N
            for mm = 1:Sites
                if ii ~= kk % if not the same molecule
                    if (B(ii*jj,kk*mm) < 1.75) && (B(ii*jj,kk*mm) > 0)
                        % do a calculation, i.e., add edge ii-kk to a graph
                        % with N nodes
                    end
                        
                end
            end
        end
    end
end
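
The four nested loops above can be sketched in vectorized form. This is a minimal sketch under stated assumptions, not the poster's method: it takes the molecule of each site from the molecule-ID column (A(:,2) in the question) rather than from the index ii*jj, since ii*jj does not identify a site uniquely (ii=2, jj=3 and ii=3, jj=2 both give 6). The toy `molID` and `B` below are hypothetical stand-ins for the real data.

```matlab
% Hypothetical toy data standing in for A(:,2) and B from the question
N = 4; Sites = 3;                         % small system for illustration
n = N*Sites;
molID = repelem((1:N).', Sites);          % assumed to play the role of A(:,2)
rng(2);
B = triu(3*rand(n), 1);                   % stand-in upper-triangular distance matrix

cutoff = 1.75;                            % threshold used in the question
[r, c] = find(B > 0 & B < cutoff);        % site pairs closer than the cutoff
mi = molID(r);                            % molecule of the first site in each pair
mk = molID(c);                            % molecule of the second site in each pair
keep = mi ~= mk;                          % drop pairs inside the same molecule

% One row per connected molecule pair, smaller ID first
edges = unique(sort([mi(keep), mk(keep)], 2), 'rows');

% Symmetric adjacency matrix with N nodes
Adj = sparse(edges(:,1), edges(:,2), 1, N, N);
Adj = Adj + Adj.';
```

If a graph object is wanted, `graph(Adj)` should accept this symmetric matrix. Working from molecule IDs also means the site ordering inside `A` does not matter.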

4 Comments

Stephen23 on 16 Mar 2022

"Here you can read the data into MATLAB and use readtable and table2array to turn the data into a MATLAB array."

READMATRIX would be simpler and more efficient: https://www.mathworks.com/help/matlab/ref/readmatrix.html

L'O.G. on 16 Mar 2022

@Stephen Thanks. I didn't know about that function!

Answer 1

Bruno Luong on 16 Mar 2022

Edited: Bruno Luong on 16 Mar 2022

This takes only about 25 seconds on my PC:

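Ac35 = A(:,3:5);                       % x, y, z coordinates
Ai = reshape(Ac35,[],1,3);             % all sites along dim 1, coordinates along dim 3
B = zeros(SitesInSystem,SitesInSystem);
for j=1:SitesInSystem
    Aj = reshape(Ac35(j,:),1,[],3);    % site j as a 1-by-1-by-3 array
    dxyz = Ai-Aj;                      % differences to site j (implicit expansion)
    dxyz = mod(dxyz+length/2,length)-length/2;   % wrap into [-length/2, length/2)
    B(:,j) = sqrt(sum(dxyz.^2,3));     % distances from every site to site j
end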
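
To see why the mod() line can replace the if/elseif branches, here is a small check on random coordinates. It is a sketch under assumptions: `X`, `n`, and `L` are hypothetical stand-ins for A(:,3:5), SitesInSystem, and the box length. The two wraps differ only at the boundary (mod maps to [-L/2, L/2), the branches to (-L/2, L/2]), and that does not change the distance.

```matlab
rng(1);
L = 120;                                 % box length, as in the question
n = 50;                                  % hypothetical small number of sites
X = L*rand(n,3) - L/2;                   % stand-in for the centered A(:,3:5)

% Reference: branch-based wrapping, one pair at a time
Bref = zeros(n);
for ii = 1:n
    for jj = 1:n
        d = X(jj,:) - X(ii,:);
        hi = d > L/2;                    % components beyond +L/2
        d(hi) = d(hi) - L;
        lo = d <= -L/2;                  % components at or beyond -L/2
        d(lo) = d(lo) + L;
        Bref(ii,jj) = sqrt(sum(d.^2));
    end
end

% Vectorized: all pairs at once via implicit expansion and mod()
d3 = reshape(X,[],1,3) - reshape(X,1,[],3);   % n-by-n-by-3 differences
d3 = mod(d3 + L/2, L) - L/2;                  % minimum-image wrap
Bvec = sqrt(sum(d3.^2, 3));

maxDiff = max(abs(Bvec(:) - Bref(:)));   % expected to be at round-off level
```

The all-pairs form needs an n-by-n-by-3 temporary, so for the full system the column-by-column loop above is the more memory-friendly choice.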

1 Comment

L'O.G. on 16 Mar 2022

Excellent. Thank you.

Answer 2

KSSV on 16 Mar 2022

I think the built-in function pdist2 will work for you.

T = readtable('Sample.xlsx') ; 
A = table2array(T) ; 
A = A(:,3:5) ;  % pick coordinates (x,y,z) 
d = pdist2(A,A) ;
B = triu(d) ;     % you may check this against your answer, e.g. with isequal.

3 Comments

KSSV on 16 Mar 2022

I have checked your code and pdist2 on the provided 274-by-5 data. Both methods gave the same result.

L'O.G. on 16 Mar 2022

Edited: L'O.G. on 16 Mar 2022

@KSSV Right, but for the full case (not the sample file I gave) I would need to consider that some atoms might be at the edges of the box, so the periodic boundaries applied to that molecule must be taken into account. Sorry I didn't make that clearer. I mentioned it in my initial query and coded it using the if statements nested in the original set of for loops, but I guess not very clearly.
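
One way to keep pdist2 and still respect the periodic box might be a custom distance function. This is a sketch under assumptions, not something proposed in the thread: it requires the Statistics and Machine Learning Toolbox, assumes a cubic box of side `L`, and uses hypothetical random coordinates `X` in place of A(:,3:5). The handle name `pbcDist` is made up for illustration.

```matlab
% Requires the Statistics and Machine Learning Toolbox (for pdist2).
rng(3);
L = 120;                                  % box length, assumed cubic
X = L*rand(40,3) - L/2;                   % hypothetical stand-in for A(:,3:5)

% Custom distance: zi is 1-by-3, zj is m-by-3, and the result is m-by-1.
% Each coordinate difference is wrapped into [-L/2, L/2) before the norm.
pbcDist = @(zi, zj) sqrt(sum((mod(zi - zj + L/2, L) - L/2).^2, 2));

d = pdist2(X, X, pbcDist);                % periodic pairwise distances
B = triu(d, 1);                           % keep each pair once
```

A function-handle distance is called once per row, so this is likely slower than pdist2's built-in Euclidean metric.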

Answer 3

Walter Roberson on 16 Mar 2022

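The if/elseif wrapping can be vectorized with a remainder-based expression. The test below compares a rem() expression against the loop. As the last line of output shows, the two agree everywhere except the final two entries (x > Length/2), where the rem() version leaves the value unwrapped.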

format bank
Length = 5;
x = -3.5 : .5 : 3.5
x = 1×15
         -3.50         -3.00         -2.50         -2.00         -1.50         -1.00         -0.50             0          0.50          1.00          1.50          2.00          2.50          3.00          3.50
for K = 1 : length(x)
    dx = x(K);
    if (dx >   Length * 0.5)
        dx = dx - Length;
    elseif (dx <= -Length * 0.5) 
        dx = dx + Length;
    end
    newx_if(K) = dx;
end
newx_mod = rem((x - Length/2), Length) + Length/2;
newx_if
newx_if = 1×15
          1.50          2.00          2.50         -2.00         -1.50         -1.00         -0.50             0          0.50          1.00          1.50          2.00          2.50         -2.00         -1.50
newx_mod
newx_mod = 1×15
          1.50          2.00          2.50         -2.00         -1.50         -1.00         -0.50             0          0.50          1.00          1.50          2.00          2.50          3.00          3.50
newx_if - newx_mod
ans = 1×15
             0             0             0             0             0             0             0             0             0             0             0             0             0         -5.00         -5.00
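
The last two columns of that difference are -5.00, so the rem() expression does not wrap values above Length/2. As a hedged alternative (a swapped-in technique, not the expression used in this answer), a ceil()-based form appears to match the if/elseif branches on the same test vector, including the half-open interval (-Length/2, Length/2]. The name `newx_ceil` is hypothetical.

```matlab
Length = 5;
x = -3.5 : 0.5 : 3.5;

% Reference: the branch-based wrap from the question
newx_if = zeros(size(x));
for K = 1:numel(x)
    dx = x(K);
    if dx > Length/2
        dx = dx - Length;
    elseif dx <= -Length/2
        dx = dx + Length;
    end
    newx_if(K) = dx;
end

% Vectorized wrap into (-Length/2, Length/2]
newx_ceil = x - Length*ceil(x/Length - 0.5);

maxDiff = max(abs(newx_if - newx_ceil));   % zero when the two wraps agree
```

Unlike the branches, the ceil() form also handles values more than one box length away. For distances between sites that makes no difference, since those differences never exceed one box length.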

3 Comments

Walter Roberson on 16 Mar 2022

newxyz_mod = rem((A - Length/2), Length) + Length/2; % for x, y, z

looks fine, as long as Length is the same in all three directions.

You would follow it with

B = sqrt( sum(newxyz_mod.^2, 2) );

L'O.G. on 16 Mar 2022

Edited: L'O.G. on 16 Mar 2022

@Walter Roberson Got it. Thank you! And about my other question, the differences -- rem() should use the differences, correct? Not the positions themselves? That's my understanding, but I don't know how to calculate that efficiently. As I mentioned in my previous comment, something like pdist(A(:,1),@(x,y) x-y); generates a huge vector and is slow, and that's just for x! I think there will be (N*Sites)^2 / 2 possible distances which will fill an upper triangular matrix. I'm not sure how to code that efficiently here, though.
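
On the differences question: the wrap does need to act on coordinate differences, and these can be formed a block of columns at a time with implicit expansion. Below is a sketch under assumptions: `X`, `n`, `L`, and the block size `blk` are hypothetical, and only pairs under the question's 1.75 cutoff are stored, in a sparse upper-triangular matrix. That matters at full scale: with N*Sites = 30,000 sites, a dense 30,000-by-30,000 double matrix holds 9e8 values, about 7.2 GB.

```matlab
rng(0);
L  = 10;                         % box length (assumed cubic)
rc = 1.75;                       % cutoff taken from the question
n  = 400;                        % hypothetical small system
X  = L*rand(n,3) - L/2;          % stand-in for A(:,3:5)

blk = 64;                        % columns per block (a tunable assumption)
I = []; J = []; V = [];
for j0 = 1:blk:n
    cols = (j0:min(j0+blk-1, n)).';        % global column indices of this block
    D2 = zeros(n, numel(cols));            % squared distances for the block
    for k = 1:3
        d  = X(:,k) - X(cols,k).';         % differences for one coordinate
        d  = mod(d + L/2, L) - L/2;        % minimum-image wrap of the differences
        D2 = D2 + d.^2;
    end
    [r, c] = find(D2 < rc^2);              % local (row, column) of close pairs
    v = sqrt(D2(sub2ind(size(D2), r, c))); % their distances
    c = cols(c);                           % convert to global column indices
    up = r < c;                            % upper triangle only: each pair once
    I = [I; r(up)]; J = [J; c(up)]; V = [V; v(up)];
end
Bs = sparse(I, J, V, n, n);      % only distances below the cutoff are stored
```

`find(Bs)` then returns the close site pairs directly, which is what the later edge-building step needs.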
