Implicit loop or break for large data set

Hi,
I am not very experienced with MATLAB and am trying to work out the following. I have two large data sets, each a column vector of increasing time values. I want to find the indices of the entries in one vector that occur within x seconds after an entry in the other vector, and also return the indices of the matching entries in that other vector.
My approach: I have column vectors A and B, each with more than 2,000,000 data points, but they are not the same size. Say I want the elements of B that occur within 1 s after an element of A. So I have:
Atol = A + 1; % upper bound for a matching element of B; the lower bound is A itself
Then:
for i = 1:L1 % L1 is the length of A
    indx1 = (B >= A(i) & B <= Atol(i));
end
And then to find the ones in A, I would do the same but the other way around.
for i = 1:L2 % L2 is the length of B
    indx2 = (A <= B(i) & A >= Btol(i)); % Btol is B - 1
end
This would take forever to run, however, simply because the data sets are so large. Is there a way to make it faster? I was thinking of adding a break so that once I find a match, I stop and start the next search from that point, since both vectors are increasing and the match for the second element must come at or after the match for the first, and so on. I have tried to implement this but could not get it to work. Alternatively, I have read that implicit loops are faster, but I could not get that working either.
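The "resume from where the previous search stopped" idea described above can be sketched as a single pass over both vectors. This is only an illustration, not code from the thread: it assumes both A and B are sorted in increasing order, the sample values are made up, and the names Aidx, hasA, Bindices and Aindices are hypothetical.

```matlab
% Hypothetical sample data; both vectors are assumed sorted ascending.
A = [1; 5; 10];
B = [0.5; 1.5; 4; 5.25; 12];

Aidx = zeros(size(B));   % 0 means "no element of A at or before this B"
i = 0;                   % index of the last A element known to be <= current B
for j = 1:numel(B)
    % Advance i from where the previous iteration stopped; never restart at 1.
    while i < numel(A) && A(i + 1) <= B(j)
        i = i + 1;
    end
    Aidx(j) = i;
end

% Keep the B elements that fall at most 1 s after their preceding A element.
hasA = Aidx > 0;
tokeep = false(size(B));
tokeep(hasA) = (B(hasA) - A(Aidx(hasA))) <= 1;

Bindices = find(tokeep);   % indices into B of the matches
Aindices = Aidx(tokeep);   % indices into A of the corresponding elements
```

Because i only ever moves forward, the loop visits each element of A and B once, so the work grows with numel(A) + numel(B) rather than with their product.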

Accepted Answer

Guillaume on 11 Mar 2019
Edited: Guillaume on 11 Mar 2019
If I understood correctly:
Aidx = discretize(B, [A; Inf]); % for each B element, index of the last A element that is less than or equal to it
Bdist = B - A(Aidx); % time elapsed from that A element to the B element
tokeep = Bdist <= 1; % true for elements of B that are at most one second after an element of A
B1sAfterA = B(tokeep); % elements of B that are at most one second after an element of A
Acorresponding = A(Aidx(tokeep)); % corresponding A elements
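To make the steps concrete, here is the same code run on a tiny made-up data set (the values are hypothetical, chosen so that every B is at least as large as the first element of A):

```matlab
A = [1; 5; 10];                     % sorted ascending
B = [1.5; 4; 5.25; 12];             % no element smaller than A(1)

Aidx = discretize(B, [A; Inf]);     % [1; 1; 2; 3]
Bdist = B - A(Aidx);                % [0.5; 3; 0.25; 2]
tokeep = Bdist <= 1;                % [true; false; true; false]
B1sAfterA = B(tokeep);              % [1.5; 5.25]
Acorresponding = A(Aidx(tokeep));   % [1; 5]
```

Here 1.5 follows A(1) = 1 by 0.5 s and 5.25 follows A(2) = 5 by 0.25 s, so both are kept, while 4 and 12 are more than one second after their preceding A elements.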
No idea how fast discretize will run on two vectors of > 2e6 elements.
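One way to find out is to time it on synthetic data of the right size. The sketch below is an assumption-laden test harness, not a benchmark result: the random timestamps, the seed and the vector lengths are invented for illustration, and the timing will depend on the machine and MATLAB release.

```matlab
rng(0);                                   % fixed seed so the run is repeatable
n = 2e6;                                  % roughly the size mentioned in the question
A = cumsum(rand(n, 1));                   % hypothetical increasing timestamps
B = A(1) + cumsum(rand(n + 1000, 1));     % different length, every B >= A(1)

tic;
Aidx = discretize(B, [A; Inf]);           % last A element at or before each B
tokeep = (B - A(Aidx)) <= 1;              % B elements at most 1 s after that A
elapsed = toc;

fprintf('Matched %d of %d elements in %.3f s\n', nnz(tokeep), numel(B), elapsed);
```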
  2 Comments
L N on 11 Mar 2019
Edited: L N on 12 Mar 2019
Hi, thanks for the help. This works, but only if no element of B is smaller than the first element of A; otherwise I get NaN. I have not tried it with my large set yet.
I removed the values of B smaller than the first element of A and it works beautifully with my data. Thank you!!
Guillaume on 12 Mar 2019
Edited: Guillaume on 12 Mar 2019
Yes, sorry, I assumed that the Bs were all greater than the min of A.
I also forgot to say that it requires A to be sorted monotonically increasing. B does not have to be sorted at all.
If you don't want to remove the small Bs, you could modify the code as follows:
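Aidx = discretize(B, [A; Inf]); % NaN for elements of B smaller than the first element of A
nottoosmall = ~isnan(Aidx); % true where B has a preceding A element
Bdist = B(nottoosmall) - A(Aidx(nottoosmall)); % time elapsed since that A element
tokeep = false(size(Aidx)); % elements of B that are too small are never kept
tokeep(nottoosmall) = Bdist <= 1; % keep those at most one second after an element of A
B1sAfterA = B(tokeep);
Acorresponding = A(Aidx(tokeep));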
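Since the original question asked for indices in both vectors rather than the values, the same variables can be turned into index lists. This is a small sketch on made-up data that includes one B smaller than the first element of A; the names Bindices and Aindices and the issorted guard are additions for illustration, not part of the code above.

```matlab
A = [1; 5; 10];
B = [0.5; 1.5; 4; 5.25; 12];             % 0.5 is smaller than A(1)

% The approach relies on A being sorted in increasing order.
assert(issorted(A), 'A must be sorted in increasing order');

Aidx = discretize(B, [A; Inf]);          % [NaN; 1; 1; 2; 3]
nottoosmall = ~isnan(Aidx);
Bdist = B(nottoosmall) - A(Aidx(nottoosmall));
tokeep = false(size(Aidx));
tokeep(nottoosmall) = Bdist <= 1;

Bindices = find(tokeep);                 % [2; 4], positions in B
Aindices = Aidx(tokeep);                 % [1; 2], positions in A
```

With these, B(Bindices) and A(Aindices) give the same values as B1sAfterA and Acorresponding.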
