# Faster way to add many large matrices in MATLAB

Haining Pan on 8 May 2018
Edited: Jan on 4 Nov 2019
Say I have many (around 1000) large matrices (about 1000 by 1000) that I want to add together element-wise. The naive way is to accumulate them into a temporary variable in a loop. For example,
summ=0;
for ii=1:20
    for jj=1:20
        summ=summ+ rand(400);
    end
end
After searching the Internet for a while, I found a suggestion that it is better to use sum(). For example,
sump=zeros(400,400,400);
count=0;
for ii=1:20
    for j=1:20
        count=count+1;
        sump(:,:,count)=rand(400);
    end
end
summ=sum(sump,3);
However, when I timed the two approaches, the results were
Elapsed time is 0.780819 seconds.
Elapsed time is 1.085279 seconds.
which means the second method is actually slower.
So I am wondering whether there is a more efficient way to do the addition. Assume I am working on a computer with a lot of memory and a GTX 1080. (CUDA might help, but I don't know whether it is worth it, since transferring the data also takes time.)

Ameer Hamza on 8 May 2018
On my machine, this is even slower than the OP's method.
1) Result for OP's method
tic
sump=zeros(400,400,400);
count=0;
for ii=1:20
    for j=1:20
        count=count+1;
        sump(:,:,count)=rand(400);
    end
end
summ=sum(sump,3);
toc
Result
Elapsed time is 0.790315 seconds.
2) indexing first entry
tic
sump=zeros(400,400,400);
count=0;
for ii=1:20
    for j=1:20
        count=count+1;
        sump(count,:,:)=rand(400);
    end
end
summ=sum(sump,1);
toc
Result
Elapsed time is 1.390100 seconds.
Haining Pan on 8 May 2018
@per isakson, I got even worse performance. The results are now
Elapsed time is 0.737624 seconds.
Elapsed time is 1.620767 seconds.
per isakson on 10 May 2018
I assumed the question was about the speed of sum( .... );

Jan on 8 May 2018
Edited: Jan on 4 Nov 2019
In your example, most of the time is spent in rand(). Using ones() instead reduces the runtime from 0.71 sec to 0.25 sec on my machine.
Instead of creating the matrices explicitly, you could try to solve the problem mathematically, if the matrices really are exp(i*x+j*y). So please post the real code, not dummy code whose most expensive function is not part of the real problem at all.
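A rough way to check where the time goes is to generate the data once and then time the generation and the two summation styles separately. The sketch below is only an illustration: the sizes n and m are arbitrary stand-ins chosen to keep the run short, and tic/toc timings vary between machines and runs.

```matlab
% Separate the cost of generating the data from the cost of adding it up.
% n and m are arbitrary illustration sizes, not the values from the question.
n = 200;
m = 100;

tic;
A = rand(n, n, m);          % all m matrices at once, one per page
tGen = toc;

tic;
s1 = sum(A, 3);             % built-in sum along the third dimension
tSum = toc;

tic;
s2 = zeros(n);
for p = 1:m
    s2 = s2 + A(:, :, p);   % running sum in a loop
end
tLoop = toc;

fprintf('generate: %.4f s, sum(A,3): %.4f s, loop: %.4f s\n', tGen, tSum, tLoop);

% Both summation styles should agree up to rounding.
maxDiff = max(abs(s1(:) - s2(:)));
```

If the first number dominates, the comparison between the loop and sum() says little about the summation itself.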

Haining Pan on 8 May 2018
Thanks for your reply! I tried what you suggested, and found the following:
1. You said the time is dominated by rand(), but both methods call rand() the same number of times.
2. If I change rand() to ones(), I still get
Elapsed time is 0.287218 seconds.
Elapsed time is 0.593293 seconds.
which means the second method is still slower.
3. The actual code is as follows, although it contains many irrelevant functions.
[~,aa]=energy(kx,ky,parameters); % aa is returned by the user-defined function energy; it is just a column of length Nmax^2
x=linspace(nn(1),pp(1),NN);
y=linspace(nn(2),pp(2),NN); % nn(1), pp(1), etc. are just real numbers; you can substitute any numbers
[XX,YY]=meshgrid(x,y);
sumarray=zeros(NN,NN,Nmax*Nmax);
counter=0;
for j=-Nmax:Nmax
    for k=-Nmax:Nmax
        counter=counter+1;
        sumarray(:,:,counter)=aa(counter)*exp(1i*((j*b1(1)+k*b2(1))*x+(j*b1(2)+k*b2(2))*y)); % b1(1), b2(1), etc. are also just real numbers
    end
end
total=sum(sumarray,3);
Jan on 8 May 2018
A cleaned version of the code would be:
x = linspace(nn(1),pp(1),NN);
y = linspace(nn(2),pp(2),NN);
b11 = b1(1);
b12 = b1(2);
b21 = b2(1);
b22 = b2(2);
s = 0;
counter = 0;  % linear index into aa
for j = -Nmax:Nmax
    for k = -Nmax:Nmax
        counter = counter + 1;
        s = s + aa(counter) * ...
            exp(1i*((j*b11 + k*b21) * x + ...
                    (j*b12 + k*b22) * y));
    end
end
The most expensive part here is the exp function. As far as I can see, the argument of exp() is a [1 x NN] vector and not an [NN x NN] matrix, so the code should fail with an error message. Please post running code. Replace functions like your energy with rand, if that is sufficient for the computations.
It is hard to suggest ways to accelerate code that does not run at all. But the general idea is to reduce the number of exp evaluations. Use exp(a+b) = exp(a)*exp(b). Instead of building the matrix argument from two vectors and then calling exp, calculate the exp of the vectors first and create the matrix afterwards. In addition, you might be able to exploit the fact that x and y are created by linspace:
x = linspace(1, 10, 2000);
e1 = exp(x); % 2000 expensive exp() calls
c = x(2) - x(1);
% 2 expensive exp() calls and a cheaper cumulative product:
e2 = cumprod([exp(x(1)), repmat(exp(c), 1, length(x)-1)]);
The cumulative product is much cheaper, but suffers from accumulated rounding error.
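To get a feeling for how large that rounding error is, the two results can be compared directly. This small check repeats the snippet above and measures the largest relative deviation; the size of the error depends on the grid and its length, so the number it prints applies to this example only.

```matlab
% Compare direct exp() evaluation with the cumulative-product shortcut.
x  = linspace(1, 10, 2000);
e1 = exp(x);                          % reference: one exp() call per element
c  = x(2) - x(1);                     % constant spacing of the linspace grid
e2 = cumprod([exp(x(1)), repmat(exp(c), 1, length(x)-1)]);

% Largest relative deviation of the shortcut from the reference.
relErr = max(abs(e2 - e1) ./ e1);
fprintf('max relative error: %g\n', relErr);
```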
You can use an equivalent method to obtain the value of the exp function at k=n+1 from its value at k=n.
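A minimal sketch of that recurrence for the double loop over j and k might look as follows. It assumes the intended arguments are the meshgrid matrices XX and YY; every numeric value (NN, Nmax, b1, b2, the grid limits) is made up for illustration, and aa is replaced by random numbers because energy() is not shown in the thread. Only four full-size exp() evaluations remain; the rest are element-wise multiplications, which accumulate rounding error in the same way as the cumulative product.

```matlab
% All numeric values below are made-up stand-ins for the poster's data.
NN   = 40;
Nmax = 3;
b1 = [1.0, 0.3];
b2 = [-0.5, 0.8];
aa = rand((2*Nmax + 1)^2, 1);          % stand-in for the output of energy()
[XX, YY] = meshgrid(linspace(-1, 1, NN), linspace(-2, 2, NN));

% Step factors: multiplying by P raises j by one, multiplying by Q raises k by one.
P  = exp(1i * (b1(1)*XX + b1(2)*YY));
Q  = exp(1i * (b2(1)*XX + b2(2)*YY));
% Starting values for j = -Nmax and k = -Nmax.
P0 = exp(-1i * Nmax * (b1(1)*XX + b1(2)*YY));
Q0 = exp(-1i * Nmax * (b2(1)*XX + b2(2)*YY));

s = zeros(NN);
counter = 0;
Pj = P0;
for j = -Nmax:Nmax
    Qk = Q0;
    for k = -Nmax:Nmax
        counter = counter + 1;
        s  = s + aa(counter) * (Pj .* Qk);
        Qk = Qk .* Q;                  % advance k -> k+1 without calling exp()
    end
    Pj = Pj .* P;                      % advance j -> j+1 without calling exp()
end

% Reference: evaluate exp() directly for every (j, k) pair.
ref = zeros(NN);
counter = 0;
for j = -Nmax:Nmax
    for k = -Nmax:Nmax
        counter = counter + 1;
        ref = ref + aa(counter) * ...
            exp(1i*((j*b1(1) + k*b2(1))*XX + (j*b1(2) + k*b2(2))*YY));
    end
end
maxErr = max(abs(s(:) - ref(:)));
fprintf('max deviation from direct evaluation: %g\n', maxErr);
```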
I recommend writing down the formula and simplifying the equation with paper and pencil first. Solving the sum by brute computing power is less efficient.
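As one possible outcome of such a simplification, assuming the arguments are meant to be the meshgrid matrices XX and YY: XX depends only on the column and YY only on the row, so each term exp(1i*(alpha*XX + beta*YY)) is the outer product of a vector in y and a vector in x, and the whole weighted sum collapses into a single matrix product. The sketch below illustrates this with made-up values for NN, Nmax, b1, b2 and the grid limits, and with random numbers in place of the output of energy(). The names alpha, beta, Ex and Ey are introduced here and do not appear in the original code. The line computing total relies on implicit expansion (R2016b or later).

```matlab
% All numeric values below are made-up stand-ins for the poster's data.
NN   = 40;
Nmax = 3;
b1 = [1.0, 0.3];
b2 = [-0.5, 0.8];
x = linspace(-1, 1, NN);
y = linspace(-2, 2, NN);

% Enumerate (j, k) in the same order as the nested loops (k runs fastest).
[K, J] = ndgrid(-Nmax:Nmax, -Nmax:Nmax);
jv = J(:);
kv = K(:);
aa = rand(numel(jv), 1);               % stand-in for the output of energy()

alpha = jv*b1(1) + kv*b2(1);           % coefficient of x for each (j, k)
beta  = jv*b1(2) + kv*b2(2);           % coefficient of y for each (j, k)

Ex = exp(1i * alpha * x);              % M-by-NN, exp() of vectors only
Ey = exp(1i * beta  * y);              % M-by-NN

% total(m, n) = sum over c of aa(c) * Ey(c, m) * Ex(c, n).
% Use .' (plain transpose), not ' (conjugate transpose).
total = Ey.' * (aa .* Ex);

% Reference: the straightforward double loop on the meshgrid matrices.
[XX, YY] = meshgrid(x, y);
ref = zeros(NN);
counter = 0;
for j = -Nmax:Nmax
    for k = -Nmax:Nmax
        counter = counter + 1;
        ref = ref + aa(counter) * ...
            exp(1i*((j*b1(1) + k*b2(1))*XX + (j*b1(2) + k*b2(2))*YY));
    end
end
maxErr = max(abs(total(:) - ref(:)));
fprintf('max deviation from the loop: %g\n', maxErr);
```

With M = (2*Nmax+1)^2 terms, this needs 2*M*NN exp evaluations instead of M*NN^2, and the summation itself is one matrix multiplication.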
Haining Pan on 10 May 2018
After several days of trying, I found a very straightforward method using a 3D array. For example, I can use a=rand(400,400,400) to create all the pages of matrices at once and sum(a,3) to get the sum. For this exact problem, I used x+y to create a 2D matrix and multiplied it (.*) by a 1 by 1 by (2Nmax+1)^2 array of j and k to get exp(j*x+ k*y), which is a 3D array. Then I simply take the sum with sum(..,3).
This is about 3 times faster, and about 10 times faster when I use CUDA.
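The exact code was not posted, so the following is only one possible reading of that description. It places the coefficients of all (j, k) pairs along the third dimension and lets implicit expansion (R2016b or later) build the 3D array in one expression. The numeric values are made up, aa is random in place of the output of energy(), and the names alpha, beta and w are introduced here for illustration.

```matlab
% All numeric values below are made-up stand-ins for the poster's data.
NN   = 40;
Nmax = 3;
b1 = [1.0, 0.3];
b2 = [-0.5, 0.8];
[XX, YY] = meshgrid(linspace(-1, 1, NN), linspace(-2, 2, NN));

% Enumerate (j, k) in the same order as the nested loops (k runs fastest).
[K, J] = ndgrid(-Nmax:Nmax, -Nmax:Nmax);
M  = numel(J);
aa = rand(M, 1);                       % stand-in for the output of energy()

% One coefficient per page: 1-by-1-by-M arrays.
alpha = reshape(J(:)*b1(1) + K(:)*b2(1), 1, 1, M);
beta  = reshape(J(:)*b1(2) + K(:)*b2(2), 1, 1, M);
w     = reshape(aa, 1, 1, M);

% NN-by-NN-by-M array built by implicit expansion, then summed over pages.
total = sum(w .* exp(1i * (alpha .* XX + beta .* YY)), 3);

% Reference: the straightforward double loop.
ref = zeros(NN);
counter = 0;
for j = -Nmax:Nmax
    for k = -Nmax:Nmax
        counter = counter + 1;
        ref = ref + aa(counter) * ...
            exp(1i*((j*b1(1) + k*b2(1))*XX + (j*b1(2) + k*b2(2))*YY));
    end
end
maxErr = max(abs(total(:) - ref(:)));
fprintf('max deviation from the loop: %g\n', maxErr);
```

The intermediate array holds NN^2*(2*Nmax+1)^2 complex doubles at 16 bytes each, so memory use grows quickly with NN and Nmax. On a GPU the same expression may work after moving XX and YY over with gpuArray (Parallel Computing Toolbox), but that variant is not tested here.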