How to speed up pairwise difference calculation between vectors under certain condition?
8 views (last 30 days)
Show older comments
I have two double arrays A (size: Nx1) and B (Mx1) both with large increasing values (from ~1e9 to ~1e13; not sure this is relevant).
I am trying to compute the pairwise differences between the two vectors but only for pairs that respect a certain condition, that is the absolute value of the difference between the two values is smaller than t.
I first tried to brute force calculate all the pairwise differences but as you would expect a NxM array cannot be stored on a regular computer; it is around a few hundred of terabytes.
So far I have implemented a rather straightforward solution which works fine with small N and M.
Low = 1e9; % lower bound for values in A & B
Upp = 1e13; % upper bound for values in A & B
N = 1e4; % number of elements in A and B (assuming they are equal, which is not the case)
A = sort((Upp-Low)*rand(N,1)+Low,'ascend');
B = sort((Upp-Low)*rand(N,1)+Low,'ascend');
maxdif = 1000000; % maximum difference
PrWD = cell(1,N);
tic
for i = 1:N
% pairwise difference if |B-A(i)|<=maxdif
PrWD{i} = A(i) - B(abs(B-A(i))<=maxdif)';
end
toc
However, I have A and B arrays which are both very large (N and M is the order of a few tens of millions) such that this operation takes a few hours in practice.
I tried arrayfun or bsxfun on small subsets (N=M=1e5) but it does not speed up the process.
At this point, I understand that this is extremely demanding as calculating all the pairwise differences or even the indices verifying the condition seem to scale with N*M.
Is there any (Matlab) way to speed up this process? Maybe taking into acccount the fact that values are increasing, or that not all differences need to be calculated?
1 Comment
Accepted Answer
Jan
on 8 Jun 2019
Instead of creating the huge index matrix abs(B-A(i))<=maxdif in each iteration, it is cheaper to search the lower and upper limits of the matching elements iteratively:
tic
PrWD2 = cell(1,N);
ini = 1;
fin = ini;
for i = 1:N
% Find initial and final index:
while B(ini) < A(i) - maxdif && ini < N
ini = ini + 1;
end
while B(fin) <= A(i) + maxdif && fin < N
fin = fin + 1;
end
if ini <= fin - 1
PrWD2{i} = A(i) - B(ini:fin - 1).';
end
end
toc
% PrWD2(cellfun('isempty', PrWD2)) = {zeros(1,0)};
% isequal(PrWD, PrWD2)
With your example and N=1e5 this is about 1100 times faster than the original code.
This code replies the empty matrix [], if no element is matching, while the original code creates a [1 x 0] matrix. For a comparison of the results, the empty matrices are adjusted, but this is not needed in the productive code most likely.
0 Comments
More Answers (0)
See Also
Categories
Find more on Matrix Indexing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!