How can I speed up this indexing code?

6 views (last 30 days)
I have a cell array in which each cell contains a string. E.g....
a={'AAAAAA';'BBBBBB';'CCCCCC';'AAAAAA';'DDDDDD'};
Each cell in array a is associated with a row in a numerical array that contains 10 columns. E.g....
b=[0,0,0,1,1,0,0,0,0,0;
0,1,0,0,0,0,0,0,0,0;
1,0,1,0,1,0,2,0,0,0;
3,0,0,0,0,0,0,0,0,1;
0,0,0,0,0,0,2,0,0,1];
Some strings in a are repeated such as 'AAAAAA' as shown above. What I need to do is find all repeated cells in a and sum the assocated columns from b into a single row. This should result in two new arrays (unia and bnew) which have equal numbers of rows but every string in unia is unique.
Easy enough to do with a loop such as:
unia=unique(a);
bnew=zeros(numel(unia),10);
for n=1:numel(unia)
pos=find(strcmp(a,unia{n}));
bnew(n,:)=sum(b(pos,:),1);
end
This works fine for small arrays but I have a case where a has 6 million cells and unia has 300,000 cells so I need something much faster. Any ideas?
Thanks!

Accepted Answer

Ive J
Ive J on 28 Oct 2021
Avoid comparing strings within the loop and instead take advantage of the index vector from unique:
a = ["A", "B2", "A", "C", "AA", "B2", "B2"]; % use strings instead of cell array of characters, they're much more efficinet to work with
b = randi([0 2], numel(a), 3)
b = 7×3
1 0 1 0 1 2 0 1 2 0 2 1 2 0 2 2 2 0 0 2 1
[anew, ~, idx] = unique(a);
bnew = arrayfun(@(x) sum(b(x == idx, :), 1), 1:numel(anew), 'uni', false);
bnew = vertcat(bnew{:})
bnew = 4×3
1 1 3 2 0 2 2 5 3 0 2 1
anew
anew = 1×4 string array
"A" "AA" "B2" "C"
Also, you can use tall arrays when dealing with large arrays.

More Answers (0)

Products


Release

R2017a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!