Fast calculation of short cumulative product vector

I want to calculate a reasonable short vector many times with variable inputs a and len
%y = [a a^2 a^3...a^(len)]
My methods so far are the following:
y1 = cumprod(a(ones(len,1)))
y2 = a.^(1:len)
y1 seems to be faster than y2 for len>11 or so. Any suggestions for doing this even faster? The knowledge that y(n) = y(n-1)*y(1) should be exploitable somehow, I just dont see how.

Answers (6)

Or this variation, which I don't see in the current posts:
y = cumprod(a.*ones(1,n));
I think the MATLAB JIT may recognize this formulation and not actually do the multiply, but just populate the result ala repmat.
And if this calculation is dominating your overall runtime, you can always turn to a simple mex function to do it and avoid the imtermediate array formulations.
time = zeros(100,3);
for len = 1:100
a = rand(1);
o = ones(1,len);
v = 1:len;
tic
for s = 1:1000; y1 = cumprod(a(o)); end
time(len,1) = toc;
tic
for s = 1:1000; y2 = a.^(v); end
time(len,2) = toc;
tic
for s = 1:1000; y3 = bsxfun(@power,a,v); end
time(len,3) = toc;
end
plot(time)
legend('cumprod','.^','bsxfun')
EDIT cumprod, power, explog, cumprodx, cumprodpow
James' cumprodpow is the fastest.
num_iter = 3000;
time = zeros(100,5);
for len = 1:100
a = rand(1);
tic
for s = 1:num_iter
y1 = cumprod(a(ones(1,len)));
end
time(len,1) = toc;
pause(0.001)
tic
for s = 1:num_iter
y2 = a.^(1:len);
end
time(len,2) = toc;
pause(0.001)
tic
for s = 1:num_iter
y3 = exp(log(a)*(1:len));
end
time(len,3) = toc;
pause(0.001)
tic
for s = 1:num_iter
y4 = cumprodx(a,len);
end
time(len,4) = toc;
pause(0.001)
tic
for s = 1:num_iter
y5 = cumprod(a.*ones(1,len));
end
time(len,5) = toc;
end
plot(time)
legend('cumprod','power','explog','cumprodx','cumprodpow')

1 Comment

Hi friends!
Firstly, +1;
secondly, small additions:
time = zeros(100,5);
for len = 1:100
a = rand(1);
o = ones(1,len);
v = 1:len;
tic
for s = 1:1000; y1 = cumprod(a(o)); end
time(len,1) = toc;
tic
for s = 1:1000; y2 = a.^(v); end
time(len,2) = toc;
tic
for s = 1:1000; y3 = bsxfun(@power,a,v); end
time(len,3) = toc;
tic
for s = 1:1000; y4 = arrayfun(@(x)a^x,v); end
time(len,4) = toc;
tic
for s = 1:1000, for v = len:-1:1, y5(v) = a^v; end; end
time(len,5) = toc;
end
plot(time)
legend('cumprod','power','bsxfun','arrayfun','for..end')

Sign in to comment.

Nobody has followed Knut's suggestion:
y(n) = y(n-1)*y(1)
It is in fact a special case of Horner's Method for evaluating a polynomial
yn(x) = a0 + a1*x + a2*x^2 + ... + an*x^n.
The time behavior of the two methods below is very erratic but the loop or Horner method always seems to beat the CumProd method by 30% to 50% up to about n = 150. After that, CumProd gets better and better. At n = 1000, Horner's time is about 5*CumProd time.
If you have the time try [yc,yh,time] = CumProdvsHorner(1000, 10000);
Has anyone got an explanation for the spikes in the execution times?
function [yc,yh,yx,time] = CumProdvsHorner(nmax,ns);
% Calculation of y = [a a^2 ... a^n]
% USE: [yc,yh,yx,time] = CumProdvsHorner(300,1000);
nmeths=3;
time = zeros(nmax,nmeths);
ntime = zeros(nmax,nmeths);
for n = 2:nmax
x = rand; xsave = x;
yc = zeros(1,n);
yh = zeros(1,n);
yx = zeros(1,n);
tc = tic;
for s = 1:ns
yc = cumprod(x.*ones(1,n));
end
time(n,1) = toc(tc);
tr = tic;
x = xsave;
for s = 1:ns
yh(1) = x;
for k = 2:n
yh(k) = x*yh(k-1);
end
end
time(n,2) = toc(tr);
tx = tic;
for s = 1:ns
yx = cumprodx(x,n);
end
time(n,3) = toc(tx);
end % for n
figure
plot(20:nmax,time(20:nmax,:))
legend('CumProd','Horner','CumProdx')
xlabel('n');
ylabel('Time (secs)')
title('Calculate y(x,n) = [x x^2 x^3 ... x^n]')
for m = 1:nmeths
ntime(:,m) = time(:,m)./(1:nmax)';
end
figure
plot(20:nmax,ntime(20:nmax,:))
legend('CumProd','Horner','CumProdx')
xlabel('n');
ylabel('Normalized Time T(n)/n')
title('Calculate y(x,n) = [x x^2 x^3 ... x^n]')
%
% END CumProdvsHorner
EDIT: Added plots for normalized times, renamed some variables.

5 Comments

Time spikes can be due to additional cores being allocated (automatic multiprocessing within MATLAB proper), or due to MATLAB deciding to switch over to BLAS (which in turn might use multiple cores.)
Which MATLAB version did you time on, Derek ?
Walter,
Using Windows Task Manager, I set the affinity for Matlab to cpu 0, on this setup:
System: Dell Precision 690, Intel 2xQuad-Core E5345 @ 2.33GHz
16GB RAM, Windows7 64-bit Professional.
MATLAB Version 7.6.0.324 (R2008a)
I get the same spikes, no multi-core
Dell Precision 360 @ 2.25 GHz 1 GBRam, windoze XP SP2
Matlab 2008b
maybe time for new computer :(
also spikes are different for Horner & CumProd
@proecsm
Take a look at the normalized times with the updated function using:
[yc,yh,time] = CumProdvsHorner(1000,1000);
CumProd's behavior is very strange. As Walter says, this may be due to switching in and out of BLAS.

Sign in to comment.

Here is the quick and dirty mex solution:
// File cumprodx.c
// c = cumprodx(a,n) does equivalent of c = cumprod(a.*ones(1,n))
// a = real double scalar
// n = real double scalar with integral value > 0
// Caution: No argument checking
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double a = mxGetScalar(prhs[0]);
mwSize i, n = mxGetScalar(prhs[1]);
double *pr;
plhs[0] = mxCreateDoubleMatrix(1,n,mxREAL);
pr = mxGetPr(plhs[0]);
pr[0] = a;
for( i=1; i<n; i++ ) {
pr[i] = pr[i-1] * a;
}
}

3 Comments

James,
For [yc,yh,yx,time] = CumProdvsHorner(1000,10000);
Horner beats cumprod up to n = 150, cumprodx up to n = 250.
After that cumprod beats cumprodx but they seem to be converging to equality at about n = 1500 - 2000.
For the smaller values of n the overhead plays a significant factor as a percentage of the overall timing. The timing advantage of Horner's method in the testing isn't because it is inherently a faster method, but because it avoids the overhead of function calls. If you place Horner's method inside a function I think you will see it suddenly become one of the slowest methods. I am not knocking your previous timing tests ... they are valid and the fact that one can code a method without function calls *is* a real advantage. I am just pointing out that the timing comparison is actually being dominated by function call overhead in the small n cases.
A good point. I had overlooked the function call overhead, which
can be a significant part of the time to do the necessary (n-1)
multiplications.
However, the timings serve as a warning not to forget the simpler,
loopy methods, especially with (an improving?) JIT compiler.
Vectorization has become something of a fetish with some Matlab
programmers, which blinds them to simpler methods.

Sign in to comment.

Thank you for all your help. There seems to be many different ways to accomplish this, each with different speed-tradeoffs. The mex-file might run faster with single-presicion and optimization flags, but I did not bother.
For lengths < ~80 elements, 'Horners Method' seems to be the best choice. I did notice some strange "warm-up" effects: increasing the number of iterations caused the relative performance to shift quite a lot. Is this the JIT playing games on me?
num_iter = 100;
len_span = [1:50 100:100:2000 1e4 3e4 1e5 3e5 1e6];
num_iter_span = 1e6./len_span;
time = zeros(length(len_span),7);
for len_idx = 1:length(len_span)
num_iter = num_iter_span(len_idx);
len = len_span(len_idx);
eps = 1e-7;
a = rand(1);
tic
for s = 1:num_iter;
y1 = cumprod(a*ones(1, len));
end
time(len_idx,1) = toc;
assert(max(abs(y1-y1))<eps);
tic
for s = 1:num_iter;
y2 = a.^(1:len);
end
time(len_idx,2) = toc;
assert(max(abs(y1-y2))<eps);
tic
for s = 1:num_iter;
y3 = bsxfun(@power,a,1:len);
end
time(len_idx,3) = toc;
assert(max(abs(y1-y3))<eps);
tic
for s = 1:num_iter,
for v = len:-1:1
y4(v) = a^v;
end;
end
time(len_idx,4) = toc;
assert(max(abs(y1-y4))<eps);
tic
for s = 1:num_iter,
y5(1) = a;
for v = 2:len
y5(v) = a*y5(v-1);
end;
end
time(len_idx,5) = toc;
assert(max(abs(y1-y5))<eps);
tic
for s = 1:num_iter,
y6 = cumprodx(a,len);
end
time(len_idx,6) = toc;
assert(max(abs(y1-y6))<eps);
tic
for s = 1:num_iter,
y7 = exp(log(a)*(1:len));
end
time(len_idx,7) = toc;
assert(max(abs(y1-y7))<eps);
end
figure
semilogx(len_span, time)
xlabel('vector length')
ylabel('time/vector length')
grid on
axis tight
legend('cumprod(a([1 1,...]))','power','bsxfun','for..end', 'Horners Method', 'mex', 'log')

3 Comments

Why do you keep the 1:len and ones(.) within the tic toc?
Because this is similar to how I use the code in my application:
len_vect = [9 20 19 42 50];
a_vect = [0.7 0.8 0.9 1 1.1];
for t = 1:length(len_vect)
y = a_vect(t).^(1:len_vect(t));
end
As both a and len change for each iteration, it makes sense to include them within the tic and toc: I dont see how I can hide the cost of creating the vectors outside the loop?
You didn't mention that in your original post.
Anyways, in this case I find that James' cumprod(a.*ones(1,len)) is the fastest, always.

Sign in to comment.

Tags

Asked:

on 6 Sep 2011

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!