Can this for-loop code get faster in some way?

I got a big result table named resTbl.
There I need for each row to grab a timestamp (posix time) and construct a period interval vector (I use a constant, ts_length, to construct this).
Then I need to find all those in period_interval that are represented in the variable data (column 2), and take the sum of those in data (column 7).
The below code works as I want it to:
for i = 1 : size(resTbl ,1)
period_interval = resTbl(i,2) : 60000 : resTbl(i,2) + ts_length;
[hd, he] = ismember(period_interval,data(:,2));
resTbl(i,10) = sum(data(he(hd),7));
end
The problem is that it is slow since resTbl has many rows. Does anyone have a suggestion how to make it faster?

Answers (1)

darova
darova on 5 Oct 2019
Edited: darova on 5 Oct 2019
Maybe ismember function can be replaced:
dt = 60000;
period_interval = 0 : dt : ts_length;
n = length(period_interval);
for i = 1 : size(resTbl ,1)
cond = ~mod( data(:,2)-resTbl(i,2),dt ); % multiple by 60 000
mult = (data(:,2)-resTbl(i,2))/dt;
ind = (0 <= mult & mult <= n) & cond; % (0 <= multiplier <= n) and multiple by 60 000
% [hd, he] = ismember(period_interval,data(:,2));
resTbl(i,10) = sum(data(ind,7));
end

4 Comments

Thank you very much. Unfortunately I am not sure how much faster this is. It looks like it is almost the same
What about this?
dt = 60000;
period_interval = 0 : dt : ts_length;
n = length(period_interval);
[D,R] = ndgrid(data(:,2),resTbl(:,2)); % 2D matrix
cond = ~mod( D-R,dt ); % multiple by 60 000
mult = (D-R)/dt; % multiplier
ind = (0 <= mult & mult <= n) & cond; % (0 <= multiplier <= n) and multiple by 60 000
ndata = ind .* repmat( data(:,7), [1 size(resTbl,1)] );
resTbl(:,10) = sum(ndata)';
Pretty brilliant solution I really have to admit!
It is faster but unfortunately it is however also pretty time consuming and VERY heavy on the memory due to the matrices!
With my current memory, 32 GB, I can not run the full resTbl set (3400 rows) on the dataset which is like 2 million rows and 7 columns.
I tried with smaller resTbl and data set, and for some reason I had to alter n to this: n = length(period_interval)-1; to match my own results.
I am not even sure if there exist a better solution to this problem than yours. I gladly hear from you again, but otherwise I say thank you very much!
2 million rows and 7 columns
Maybe time is a price in this case

Sign in to comment.

Products

Asked:

on 5 Oct 2019

Commented:

on 6 Oct 2019

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!