ismember runs too slow (others have asked similar questions)

I am trying to use ismember in a long for loop to identify indices shared between arrays. This is pretty similar to a question that was asked here. Hopefully, an answer can be found for my question.
Below is my script that is taking far too long to run. I've included a sample of my workspace with the important variables here.
The main issue I'm encountering is that using ismember takes far too long. I've tried using parfor to see if that would help but it doesn't mitigate the problem much. If there is a faster way to make the second for loop run faster that would be the type of solution I'm looking for.
As suggested by those in the comments, I have profiled the section of my code in the second for loop. The largest amount of total time spent is from using ismember. The function with the largest self time is sortrows>sort_back_to_front. This is after preallocating the TimeAvg variable.
Lcu =length(count_uni);
for i=1:Lcu
count = count_uni(i);
k = find(count_all==count);
% count = 1: 1 unique values
% count = 2: 2 unique values
% etc.
PosAllk = PosAllrep(k,:);
A_allk = A_all(k); PVTr_allk = PVTr_all(k);
count_allk = count_all(k);
if count>1
LPAk = length(PosAllk);
for j=1:LPAk
b = ismember(PosAllrep,PosAllk(j,:),'rows');
Aavg = mean(A_all(b)); Pavg = mean(PVTr_all(b));
g = ismember(PosAlluni,PosAllk(j,:),'rows');
TimeAvg = [TimeAvg;[PosAlluni(g,:),Aavg,Pavg,count]];
end
else
TimeAvg = [PosAllk,A_allk,PVTr_allk,count_allk];
end
end

12 Comments

Use the paper clip to attach data here instead of offsite...
Have you run the profiler to prove it is ismember that is the culprit here? The following dynamic reallocation is a likely candidate for performance bottlenecking, I suggest to preallocate and assign as the first step.
TimeAvg = [TimeAvg;[PosAlluni(g,:),Aavg,Pavg,count]];
Then, providing background on what is actually being done would let others think of algorithmic approaches instead of just trying to do micro-optimation of code or trying to decipher root purpose on their own.
You can only upload data less than 5MB. I provided a link because my data was over 15MB when zipped.
What do you mean "run the profiler"?
TimeAvg = [TimeAvg;[PosAlluni(g,:),Aavg,Pavg,count]];
Seems like you are growing TimeAvg per iteration. Are you sure this isn't what's causing the code to be slow?
And this is the profiler that @dpb is talking about. https://www.mathworks.com/help/matlab/ref/profile.html
I'm certain timing comparisons could be done with much less than 15MB (or even 5MB) of data...
@OCDER, I will look into this profiler link you included. I think they both are potentially causing problems.
I don't know exactly how big the final size of TimeAvg will be, maybe the same number of rows as PosAlluni but I'm not sure. So either way it seems that the variable will continue to grow in size.
As for the effect of using ismember, on one iteration ismember took a few seconds to run, so in the for the loop the size of the one I am using that time is greatly compounded.
@Nathaniel Werner: if you haven't profiled the code then blaming ismember is premature and possibly misleading. You need to actually know what parts of the code are slow, for which you need to use the profiler.
As the others have pointed out, that you expand the TimeAvg array is likely big waste of time, which can be fixed by preallocating:
MATLAB had some easy rules for writing efficient code:
Alright, I had never heard of profiling the code. I will do that. I can also preallocate TimeAvg to have the same number of rows as PosAllrep since TimeAvg cannot exceed the number of rows in PosAllrep. That way I can just remove the extra rows later.
I'd reiterate that if you'd describe the problem it would make thinking of alternate solutions much easier and more often than not it is algorithmic changes rather than "optimizing" that makes significant reductions in efficiency.
I have edited the question to reflect the result of profiling the for loop. It does appear that ismember is the biggest culprit, but I may be interpreting the results incorrectly.
"... profiled ... code [and] largest ... total time spent is ... ismember [then] sortrows>sort_back_to_front. This is after preallocating the TimeAvg variable"
Just for comparison, edit the code to show the timed version so can see it and folks can use it if trying to help.
Also, just for comparison, how does the overall run time of the modified code with preallocation compare to that before--did that make noticeable improvement?
Yet again I go back to my previous comment as likely being the way to get more original ideas generated of alternate solutions.
Ok now I have compared the situations there is not any noticeable difference.
With preallocation.
Without preallocation.

Sign in to comment.

 Accepted Answer

Looks like your positional data could fit in a 3-D histogram (168x24x168). This could be made considerably faster, but hopefully this will be enough to get you started:
load sample
xval = unique(PosAllrep(:,1));
yval = unique(PosAllrep(:,2));
zval = unique(PosAllrep(:,3));
A_i = discretize(PosAllrep(:,1),xval);
A_j = discretize(PosAllrep(:,2),yval);
A_k = discretize(PosAllrep(:,3),zval);
sz = [length(xval), length(yval), length(zval)];
subs = [A_i, A_j, A_k];
A_all_sum = accumarray(subs,A_all,sz);
PVTr_sum = accumarray(subs,PVTr_all,sz);
bin_cnt = accumarray(subs,1,sz);
A_mean = A_all_sum ./ bin_cnt;
PVTr_mean = PVTr_sum ./ bin_cnt;
ind = find(~isnan(A_mean(:)));
[ix,iy,iz] = ind2sub(sz,ind);
TimeAvg = [xval(ix) yval(iy) zval(iz) A_mean(ind) PVTr_mean(ind) bin_cnt(ind)];

2 Comments

Thank you I will check this out and get back to you.
@Greg Dionne, this works very well thank you!

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!