Searching for rows of cell arrays containing strings in a different cell array having a collection of multiple other strings

I need to find identical rows between two cell arrays, but not every row of one array appears in the other. For example:
A = {'ABCS' '100'; 'A' '10'; 'C' '0'; 'ASD' '12'};
B = {'ABCS' '100'; 'A' '100'; 'C' '0'};
I have to search for all the rows of B in A.
I am able to do a row-by-row search, but that is difficult when the cell arrays are this large:
A : 19 million rows
B : 29,000 rows
I have gone through most of the related posts but couldn't find a solution.
Thanks
  4 Comments
BR on 3 Jun 2019
Yes, I used ismember() as follows:
[~,index]=ismember(A(:),B(:));
ind=find(index(:,1)==index(:,2));
But this obviously didn't work, because A(:) and B(:) treat both columns as one long list, so the returned indices are not row indices (I guess).
Then I implemented an index-by-index search for B(:,1) in A(:,1) and, for the matched indices, checked the second-column value.
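
To illustrate why the flattened call cannot recover row matches, here is a minimal sketch using the sample A and B from the question. The variable names flatA and flatB are introduced only for this illustration.

```matlab
A = {'ABCS' '100'; 'A' '10'; 'C' '0'; 'ASD' '12'};
B = {'ABCS' '100'; 'A' '100'; 'C' '0'};

% (:) stacks the columns into one long column (column-major order),
% so the pairing between column 1 and column 2 of each row is lost.
flatA = A(:);   % 8-by-1 cell: all of column 1, then all of column 2
flatB = B(:);   % 6-by-1 cell

[~, index] = ismember(flatA, flatB);

% index is 8-by-1: it holds positions in flatB, not row numbers of B,
% and it has no second column, so index(:,2) cannot be used.
disp(size(index))
```

Each entry of index only says where an individual string occurs in the flattened B, which is why a match on the first column cannot be tied to a match on the second column of the same row.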


Accepted Answer

Stephen23 on 3 Jun 2019
This is not a very beautiful solution, nor might it be suitable for such large cell arrays:
>> A = {'ABCS' '100'; 'A' '10'; 'C' '0'; 'ASD' '12'};
>> B = {'ABCS' '100'; 'A' '100'; 'C' '0'};
>> [X,Y] = ismember([char(B(:,1)),char(B(:,2))],[char(A(:,1)),char(A(:,2))],'rows')
X =
1
0
1
Y =
1
0
3
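
A possible variation, sketched here under assumptions rather than taken from the answer: the char-based call needs both padded character matrices to have the same number of columns, which depends on the longest string in each array. Joining the two columns into one key per row and calling ismember on the resulting cell arrays avoids that. The delimiter '|' is an assumption (it must not occur in the data), and keyA, keyB and delim are names introduced for this sketch.

```matlab
A = {'ABCS' '100'; 'A' '10'; 'C' '0'; 'ASD' '12'};
B = {'ABCS' '100'; 'A' '100'; 'C' '0'};

delim = '|';   % assumed never to appear in either column

% Build one key per row, e.g. 'ABCS|100'. The delimiter keeps
% {'AB' 'C'} and {'A' 'BC'} from producing the same key.
keyA = strcat(A(:,1), delim, A(:,2));
keyB = strcat(B(:,1), delim, B(:,2));

% X(k) is true if row k of B occurs in A; Y(k) is the lowest
% matching row of A, or 0 if there is none.
[X, Y] = ismember(keyB, keyA);
```

For the sample data this gives the same X and Y as above. Like the char-based version, it reports only the lowest matching row of A for each row of B, so repeated rows in A are not all listed.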
  5 Comments
Stephen23 on 4 Jun 2019
Something like this will be reasonably efficient at finding all rows in A that match any row in B:
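load('A.mat') % provides the cell array A
load('B.mat') % provides the cell array B
N = size(B,1);
C = cell(N,1); % one list of matches per row of B
for k = 1:N
    % rows of A where both columns equal row k of B
    X = strcmp(B{k,1},A(:,1)) & strcmp(B{k,2},A(:,2));
    Y = find(X);   % matching row numbers in A
    Y(:,2) = k;    % second column: the row number in B
    C{k} = Y;
end
Z = vertcat(C{:}); % all [A row, B row] pairs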
For your sample data it gives this list, where the columns are [A rows, B rows]:
>> Z = vertcat(C{:});
>> Z
Z =
6804 4
8198 43
2358 337
3753 993
2362 1020
2362 1040
2362 1060
8314 1819
86 2008
5757 2018
5757 2039
5757 2060
5757 2081
5757 2102
5757 2123
5757 2144
5757 2165
1753 2471
4504 2496
8673 2746
4836 4280
6805 5099
6805 5120
2961 6004
8006 9811
8666 10100
6799 10386
1060 12578
6552 12625
5751 13859
3340 14120
4503 14231
2998 15221
7932 17269
6805 19850
6805 19871
2961 20755
8006 24562
8666 24851
6799 25137
1060 27329
6552 27376
5751 28610
3340 28871
4503 28982
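
For very large A, the loop over the rows of B could in principle be replaced by a single ismember call on combined keys. The following is only a sketch under assumptions, not the method used above: the delimiter '|' is assumed not to occur in the data, the extra duplicate row in B is added purely for illustration, and all variable names are hypothetical. It has not been timed on data of the size described in the question.

```matlab
A = {'ABCS' '100'; 'A' '10'; 'C' '0'; 'ASD' '12'};
B = {'ABCS' '100'; 'A' '100'; 'C' '0'; 'C' '0'};  % last row duplicated for illustration

delim = '|';   % assumed never to appear in either column
keyA = strcat(A(:,1), delim, A(:,2));
keyB = strcat(B(:,1), delim, B(:,2));

% Group identical rows of B: gB(k) is the group number of row k.
[uB, ~, gB] = unique(keyB);

% One pass over A: gA(i) is the group matched by row i of A (0 if none).
[tf, gA] = ismember(keyA, uB);
rowsA = find(tf);

% For each group, collect the B rows that belong to it.
Bgroups = accumarray(gB, (1:numel(gB))', [], @(v) {v});

% Pair every matched A row with every B row of its group.
C = arrayfun(@(i) [repmat(i, numel(Bgroups{gA(i)}), 1), Bgroups{gA(i)}], ...
    rowsA, 'UniformOutput', false);
Z = sortrows(vertcat(C{:}));   % columns are [A rows, B rows]
% For this small example, Z is [1 1; 3 3; 3 4].
```

The design choice here is to search the large array once against the unique keys of the small array, then expand duplicates in B afterwards, so that repeated B rows still appear as separate pairs.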
BR on 4 Jun 2019
Edited: BR on 4 Jun 2019
That is really amazing. That works much more efficiently. It is still a bit slow for huge datasets, but yeah, it works really well.
Big help, man.
Thanks a lot.
Cheers
