Optimize this loop to take less computing time

Below is a snippet of code that loops through four arrays (stored as fields of two structures) and builds an output array by comparing values between the input arrays, while displaying the growing length of dist_in. The output is exactly what I want: an array of se_dist.dist values ordered by the matching indexes in the other arrays. The problem is that this code takes forever to run, since se_dist.dist is ~[11000 1] and ptime_in.evid is ~[12400 1], which puts the total number of inner-loop iterations above 100 million. I am looking for a way to keep the same output while reducing the computation time.
I have attempted using find, ismember, intersect and others, however intersect seems to only return the first matching index, and the others require arrays of the same length.
h = length(se_dist.dist);
r = length(ptime_in.evid);
dist_in = zeros(r,1);
for k = 1:h
    for n = 1:r
        if strcmp(se_dist.id(k),ptime_in.sta(n))==1 && se_dist.evid(k)==ptime_in.evid(n)
            dist_in(n) = se_dist.dist(k);
        end
    end
end
A bit of extra information. se_dist.dist is [10880x1 double]. ptime_in.evid is [12413x1 double]. se_dist.id is {10880x1 cell}. ptime_in.sta is {12413x1 cell}. se_dist.evid is [10880x1 double].
I don't mind the code taking a while, but this takes ~20 min to run, and this is just a test case. In the future I will be running this on arrays with lengths on the scale of ~100000 instead of ~10000, which, assuming linear scaling, would mean roughly 3.5 hrs just for this nested loop.
Any help is appreciated. I can provide any other necessary information if needed.

5 Comments

With meshgrid you can replicate your cell vector and double vector into matrices. strcmp and == will still work on those, and you can at least reduce the loop to the pairs that actually match (use the [row,col]=find( ) syntax). I am not sure how you could rework the inside so that you avoid a loop altogether.
Let me know if you need some code for this, I'll try to put something together. I suspect that there are really fast solutions involving bsxfun, so you could also take a look there.
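A rough sketch of the pairwise-comparison idea, for illustration only. It swaps meshgrid for unique plus bsxfun: the station names are first mapped to integer codes so that == applies to them. The toy data and the names code, idCode, staCode and match are made up for this example, and the sketch assumes each (station, event) pair occurs at most once in se_dist.

```matlab
% Hypothetical toy data with the same field layout as in the question.
se_dist.id   = {'AAA'; 'BBB'; 'CCC'};
se_dist.evid = [1; 1; 3];
se_dist.dist = [10.5; 20.0; 40.0];
ptime_in.sta  = {'BBB'; 'AAA'; 'ZZZ'; 'CCC'};
ptime_in.evid = [1; 1; 9; 3];

h = length(se_dist.dist);
r = length(ptime_in.evid);

% Map every station name to an integer code so that == can be used on it.
[~,~,code] = unique([se_dist.id; ptime_in.sta]);
idCode  = code(1:h);
staCode = code(h+1:end);

% h-by-r logical matrix: entry (k,n) is true when station and event both agree.
match = bsxfun(@eq,idCode,staCode.') & bsxfun(@eq,se_dist.evid,ptime_in.evid.');

% Visit only the matching pairs instead of all h*r combinations.
[row,col] = find(match);
dist_in = zeros(r,1);
dist_in(col) = se_dist.dist(row);
disp(dist_in.')
```

At the sizes quoted in the question the logical matrix would have about 135 million entries, so memory use is worth checking before trying this on the larger data set.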
Jan on 14 Mar 2017
Edited: Jan on 14 Mar 2017
For an optimization it would be very helpful if you can provide some test data. Otherwise improvements are based on guessing and experiences with totally different problems.
disp requires a lot of time. Is this required?
I will look into the meshgrid and bsxfun commands as suggested. In the meantime, here are the two structures which are used in the loop. They should be all that is needed to run the code snippet.
Also it seems I must have had other things processing when I timed the loop last, as now with nothing running it takes ~20 min on my machine. Ideally I would still like to get this down to a shorter time.
Also disp() was just a convenience for me to make sure that it was running when I was building and testing the code and is not needed. I will take it out. Thanks for the responses.
Hello Clayton, an easy improvement to start with is to take the lookups of se_dist.id(k) and the two other variables out of the inner for loop (Matlab may be smart enough to do that on its own these days; I guess this will show whether it is). Further improvement is definitely in the cards, but you can take a look and see what this does timewise.
for k = 1:h
    % look these up once per k; semicolons suppress console output
    idk = se_dist.id(k);
    evidk = se_dist.evid(k);
    distk = se_dist.dist(k);
    for n = 1:r
        if strcmp(idk,ptime_in.sta(n))==1 && evidk==ptime_in.evid(n)
            dist_in(n) = distk;
            id = find(dist_in~=0); % needed?
            disp(length(id)); % needed?
        end
    end
end
dist_in(n) = se_dist.dist(k);
This will overwrite the Nth value of dist_in if subsequent iterations through k satisfy the if logic. Is this truly the output you desire?
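To illustrate the concern with a made-up case (the values below are invented and are not the poster's data): if two entries of se_dist share the same station and event ID, the nested loop keeps the distance from the later one.

```matlab
% Hypothetical data: entries 1 and 3 of se_dist share the key ('AAA', 7).
se_dist.id   = {'AAA'; 'BBB'; 'AAA'};
se_dist.evid = [7; 7; 7];
se_dist.dist = [1.0; 2.0; 3.0];
ptime_in.sta  = {'AAA'};
ptime_in.evid = 7;

dist_in = zeros(1,1);
for k = 1:length(se_dist.dist)
    for n = 1:length(ptime_in.evid)
        if strcmp(se_dist.id{k},ptime_in.sta{n}) && se_dist.evid(k)==ptime_in.evid(n)
            dist_in(n) = se_dist.dist(k);  % k = 1 writes 1.0, then k = 3 overwrites it with 3.0
        end
    end
end
disp(dist_in)  % the value from the last matching k
```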


 Accepted Answer

Greg on 15 Mar 2017
Edited: Greg on 15 Mar 2017
I didn't run your data through to verify, but I'm pretty sure you can kill the inner loop. Although, the fact that nobody else has suggested this makes me feel like I'm missing a piece. For the record, strcmp() already returns logical, so the ==1 is unnecessary, and slower.
for k = 1:h
    % compare entry k against every element of ptime_in at once
    blnMatch = strcmp(se_dist.id{k},ptime_in.sta) & se_dist.evid(k)==ptime_in.evid;
    dist_in(blnMatch) = se_dist.dist(k);
end
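One way to check that dropping the inner loop preserves the output is to run both versions on a small example. The sketch below uses invented station names, event IDs and distances (not the poster's data), and dist_ref is a hypothetical name for the reference result.

```matlab
% Hypothetical toy data with the same field layout as in the question.
se_dist.id   = {'AAA'; 'BBB'; 'AAA'; 'CCC'};
se_dist.evid = [1; 1; 2; 3];
se_dist.dist = [10.5; 20.0; 30.25; 40.0];
ptime_in.sta  = {'BBB'; 'AAA'; 'ZZZ'; 'AAA'; 'CCC'};
ptime_in.evid = [1; 2; 9; 1; 3];

h = length(se_dist.dist);
r = length(ptime_in.evid);

% Reference result: the original nested loop.
dist_ref = zeros(r,1);
for k = 1:h
    for n = 1:r
        if strcmp(se_dist.id{k},ptime_in.sta{n}) && se_dist.evid(k)==ptime_in.evid(n)
            dist_ref(n) = se_dist.dist(k);
        end
    end
end

% Single loop: compare entry k against all of ptime_in at once.
dist_in = zeros(r,1);
for k = 1:h
    blnMatch = strcmp(se_dist.id{k},ptime_in.sta) & se_dist.evid(k)==ptime_in.evid;
    dist_in(blnMatch) = se_dist.dist(k);
end

disp(isequal(dist_in,dist_ref));
```

Because k still runs in the same order, a later matching entry overwrites an earlier one just as in the nested version.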

1 Comment

Thank you all for your input. Prior to Greg's answer I think I was getting close with the bsxfun command; however, the solution he proposed works as intended and is quite quick (~2.2s). Thank you for the responses.
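For the larger arrays mentioned in the question, a loop-free variant might also be worth trying. This is only a sketch and not something proposed in the thread: it assumes each (station, event) pair appears at most once in se_dist, and the toy data and the names code, keySe and keyPt are invented for illustration. It encodes the station names as integers and matches key rows with ismember and the 'rows' option.

```matlab
% Hypothetical toy data; each (station, event) pair is unique in se_dist.
se_dist.id   = {'AAA'; 'BBB'; 'CCC'};
se_dist.evid = [1; 1; 3];
se_dist.dist = [10.5; 20.0; 40.0];
ptime_in.sta  = {'BBB'; 'AAA'; 'ZZZ'; 'CCC'};
ptime_in.evid = [1; 1; 9; 3];

h = length(se_dist.dist);

% Encode station names as integers shared across both structures.
[~,~,code] = unique([se_dist.id; ptime_in.sta]);
keySe = [code(1:h), se_dist.evid];        % one key row per se_dist entry
keyPt = [code(h+1:end), ptime_in.evid];   % one key row per ptime_in entry

% For each ptime_in row, look up a matching se_dist row (loc is 0 if none).
[tf,loc] = ismember(keyPt,keySe,'rows');
dist_in = zeros(size(keyPt,1),1);
dist_in(tf) = se_dist.dist(loc(tf));
disp(dist_in.')
```

If duplicate keys can occur in se_dist, the entry that ismember picks may differ from the one the loop keeps, so the result should be compared against the loop version before relying on it.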

