How can the following code be optimized / vectorized?

How can you optimize the following? It is painfully slow and takes around 5 minutes to run on my computer. Here I am calcuatling the distances between atoms in molecules while taking account periodic boundary conditions (which mean an atom on the edge of the box might actually belong to a molecule that wraps around, so to speak). I imagine this could be done in a far more efficient and speedy way than my Fortran-esque implementation.
A % an array where column 1 is the atom ID, column 2 is the molecule ID, and columns 3-5 are the positions in x, y, and z
N = 200; % molecules in system
Sites = 150; % sites in each molecule
length = 120; % length of box on each side
SitesInSystem = N*Sites;
B = zeros(SitesInSystem,SitesInSystem);
% center box at origin
A(:,3) = A(:,3) - length/2;
A(:,4) = A(:,4) - length/2;
A(:,5) = A(:,5) - length/2;
for ii = 1:SitesInSystem
for jj = 1:SitesInSystem
if ii ~= jj % if not the same site
dx = A(jj,3) - A(ii,3);
dy = A(jj,4) - A(ii,4);
dz = A(jj,5) - A(ii,5);
% taking into account periodic boundaries
if (dx > length * 0.5)
dx = dx - length;
elseif (dx <= -length * 0.5)
dx = dx + length;
end
if (dy > length * 0.5)
dy = dy - length;
elseif (dy <= -length * 0.5)
dy = dy + length;
end
if (dz > length * 0.5)
dz = dz - length;
elseif (dz <= -length * 0.5)
dz = dz + length;
end
% calculate distances
B(ii,jj) = sqrt(dx^2+dy^2+dz^2);
end
end
end
B = triu(B); % to avoid double counting
Perhaps this also can be combined with the following snippet, which calculates something based on the values of the matrix B:
for ii = 1:N % molecule
for jj = 1:Sites % site
for kk = 1:N
for mm = 1:Sites
if ii ~= kk % if not the same molecule
if (B(ii*jj,kk*mm) < 1.75) && (B(ii*jj,kk*mm) > 0)
% do a calculation, i.e., add edge ii-kk to a graph
% with N nodes
end
end
end
end
end
end

4 Comments

It looks like this can be achieved with ease. Why don't you attach the data for A. What is its size? I feel it is a column major array. I am surprised with the loop indexing looking at A.
L'O.G.
L'O.G. on 16 Mar 2022
Edited: L'O.G. on 16 Mar 2022
A is quite large: 30000 (rows) x 5 (columns), since N (molecules) is 200 here and Sites (atoms) are 150. As mentioned in my attempt, column 1 is the atom ID (integer from 0 to 29999), column 2 is the molecule ID (integer from 0 to 199), and columns 3-5 are the positions in x, y, and z, respectively (integer for now, but could be a floating point number). The attached is a very pared down example (again, not the whole data, which is quite large). The example just has two molecules consisting of 137 atoms each (rather than 150 atoms per molecule as in my code above). Here you can read the data into MATLAB and use readtable and table2array to turn the data into a MATLAB array.
"Here you can read the data into MATLAB and use readtable and table2array to turn the data into a MATLAB array."
READMATRIX would be simpler and more efficient: https://www.mathworks.com/help/matlab/ref/readmatrix.html
@Stephen Thanks. I didn't know about that function!

Sign in to comment.

 Accepted Answer

It takes about 25 seconds only PC
Ac35 = A(:,3:5);
Ai = reshape(Ac35,[],1,3);
B = zeros(SitesInSystem,SitesInSystem);
for j=1:SitesInSystem
Aj = reshape(Ac35(j,:),1,[],3);
dxyz = Ai-Aj;
dxyz = mod(dxyz+length/2,length)-length/2;
B(:,j) = sqrt(sum(dxyz.^2,3));
end

More Answers (2)

I think pdist2, this inbuilt function will work for you.
T = readtable('Sample.xlsx') ;
A = table2array(T) ;
A = A(:,3:5) ; % pick coordinates (x,y,z)
d = pdist2(A,A) ;
B = triu(d) ; % you may check this with you answer. Use isequal to check.

3 Comments

L'O.G.
L'O.G. on 16 Mar 2022
Edited: L'O.G. on 16 Mar 2022
@KSSV Thanks, I really appreciate it. Two things: first, due to the calculation of the differences dx, dy, and dz that take into account periodic boundary calculations, is there an easy way of incorporating that without all of the for loops? In your proposed solution, that's not taken into account. Second, is the second set of for loops necessary to do the calculations I mention using B?
I have checked your code and pdist2 for the provided 274*5 data. Both the methods, gave same solution.
L'O.G.
L'O.G. on 16 Mar 2022
Edited: L'O.G. on 16 Mar 2022
@KSSV Right, but for the full case (not the sample file I gave) I would need to consider that some atoms might be at the edges of the box, so that periodic boundaries as applied to that molecule must be taken into account. Sorry I didn't make that clearer. I mentioned it in my initial query and coded it using the if statements nested in the orignal set of for loops, but I guess not very clearly.

Sign in to comment.

You can see that the rem() based expression I use here can be used to vectorize the calculation.
format bank
Length = 5;
x = -3.5 : .5 : 3.5
x = 1×15
-3.50 -3.00 -2.50 -2.00 -1.50 -1.00 -0.50 0 0.50 1.00 1.50 2.00 2.50 3.00 3.50
for K = 1 : length(x)
dx = x(K);
if (dx > Length * 0.5)
dx = dx - Length;
elseif (dx <= -Length * 0.5)
dx = dx + Length;
end
newx_if(K) = dx;
end
newx_mod = rem((x - Length/2), Length) + Length/2;
newx_if
newx_if = 1×15
1.50 2.00 2.50 -2.00 -1.50 -1.00 -0.50 0 0.50 1.00 1.50 2.00 2.50 -2.00 -1.50
newx_mod
newx_mod = 1×15
1.50 2.00 2.50 -2.00 -1.50 -1.00 -0.50 0 0.50 1.00 1.50 2.00 2.50 3.00 3.50
newx_if - newx_mod
ans = 1×15
0 0 0 0 0 0 0 0 0 0 0 0 0 -5.00 -5.00

3 Comments

L'O.G.
L'O.G. on 16 Mar 2022
Edited: L'O.G. on 16 Mar 2022
@Walter Roberson Thanks very much. I am trying to understand what you did. I guess I should actually take the differences dx (and dy and dz) before using rem() to take into account the boundaries. How would I do that efficiently? Generating all distances (for x, for example) by something like pdist(A(:,1),@(x,y) x-y); generates a huge vector and is slow.
newxyz_mod = rem((A - Length/2), Length) + Length/2; % for x, y, z
looks fine, as long as Length is the same in all three directions.
You would follow it with
B = sqrt( sum(newxyz_mod.^2, 2) );
L'O.G.
L'O.G. on 16 Mar 2022
Edited: L'O.G. on 16 Mar 2022
@Walter Roberson Got it. Thank you! And about my other question, the differences -- rem() should use the differences, correct? Not the positions themselves? That's my understanding, but I don't know how to calculate that efficiently. As I mentioned in my previous comment, something like pdist(A(:,1),@(x,y) x-y); generates a huge vector and is slow, and that's just for x! I think there will be (N*Sites)^2 / 2 possible distances which will fill an upper triangular matrix. I'm not sure how to code that efficiently here, though.

Sign in to comment.

Products

Release

R2021b

Asked:

on 16 Mar 2022

Commented:

on 16 Mar 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!