Matlab comparison of two large matricies
1 view (last 30 days)
Show older comments
I am trying to retrieve the index of exact matches (row-specific) between two large matricies. I have a n x 61 matrix A containing values from 0 to 9 and another n x 61 matrix B , whereas each row here contains values from 0 to 9 but mostly NaN (only 2 to 8 columns in each row of matrix B contain actual numbers). Matrix A can be expected to have between 1.5 million and 3 million rows, whereas matrix B has around 0.2 to 0.5 million rows. Here is an example of the setup:
% create matrix a with random data
dataSample = [0 9];
numRows = 1000000;
numCols = 61;
A = randi(dataSample,numRows,numCols);
% create matrix B with random data
numRows = 100000;
numCols = 61;
numColsUse = 2:8;
dataRange = 0:9;
B = NaN(numRows,numCols);
for i = 1:size(B,1)
% randomly selet number of columns to fill
numColsFill = datasample(numColsUse,1);
% randomly select column index from available columns
colIdx = datasample([1:numCols],numColsFill);
% randomly select values from 0 to 9
numFill = datasample([0:9],numColsFill);
% insert numbers at respective column in matrix B
B(i,colIdx) = numFill;
end
I want to compare every single row of matrix A with the entire matrix B and find exact matches, where the numbers of matrix B match the numbers of matrix A at their respective positions (columns) - hence the NaN in matrix B are to be ignored.
I can achieve the desired result using cellfun, where I slice matrix A in several subsets and then use a custom function to compare the rows of the subset with each row in matrix B, like so:
% put all rows of matrix B in single cell
cellB = {B};
% take subset of matrix A and convert to cell array
subA = A(1000:5000,:);
subA = num2cell(subA,2);
% prepare cellB to meet cellfun conditions
cellB = repmat(cellB, [size(subA,1) 1]);
% apply cellfun to retrieve index of each exact match
idxContainer = cellfun(@findMatch, cellB, subA, 'UniformOutput', false);
Function findMatch looks as follows:
function [ idx ] = LTableEval( cellB, subA )
idxCheckLT = lt(cellB, repmat(subA, [size(cellB,1) 1]));
idxCheckGT = gt(cellB, repmat(subA, [size(cellB,1) 1]));
idxCheck = idxCheckLT + idxCheckGT;
idxSum = sum(idxCheck,2);
idx = find(idxSum == 0);
end
This approach works, but it appears to be very inefficient, especially RAM-wise, as the cellfun requires all inputs to have the same size and hence a multiplication of the same data sets. Any ideas on how to tackle this problem in a more efficient way? Many thanks!
0 Comments
Accepted Answer
Guillaume
on 28 May 2017
This is how I'd do it:
matches = cell(size(B, 1), 1);
for Brow = 1:size(B, 1)
Bcols = find(~isnan(B(Brow, :)));
matches{Brow} = find(all(A(:, Bcols) == B(Brow, Bcols), 2)); %requires R2016b or later
end
It's certainly a lot more efficient than any of the solutions you already have.
Note: in R2015b or earlier replace the relevant line by:
matches{Brow} = find(all(bsxfun(@eq, A(:, Bcols), B(Brow, Bcols)), 2));
4 Comments
Guillaume
on 29 May 2017
Try this:
matches = cell(size(B, 1), 1);
for Brow = 1:size(B, 1)
Bcols = find(~isnan(B(Brow, :)));
matchedrows = find(all(A(:, Bcols) == B(Brow, Bcols), 2));
matches{Brow} = [matchedrows, repmat(Brow, size(matchedrows))];
end
matches = cell2mat(matches);
finalScores = accumarray(matches(:, 1), matches(:, 2), [size(A, 1), 1], @(Brows) mean(Bscores(Brows)), nan);
However, if most rows of A all match at least some rows of B it may be more efficient to have just one loop over the rows of A. (I regard accumarray as another loop)
More Answers (1)
Matthew Eicholtz
on 26 May 2017
A couple comments:
1. Did you mean to convert to the cell array in this manner?
subA = num2cell(subA);
If you want to look at each row as its own cell, I think you need:
subA = num2cell(subA,2);
2. I am not sure how much more efficient this solution will be, but you can replace
idxContainer = cellfun(@findMatch, cellB, subA, 'UniformOutput', false);
with
idxContainer = cellfun(@(x,y) find(all(isnan(x)|x==y,2)), cellB, subA, 'UniformOutput', false);
Let me know if this helps at all.
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!