Efficiently populating an array without for loops
24 views (last 30 days)
Show older comments
Hi Everyone,
I have a list of data with 10,000,000 rows and 3 columns. The columns correspond to the shape, size, and color of an object, which is indexed with a number. There are 100 shapes, 100 sizes, and 50 colors.
I want to create a matrix (100x100x50) that essentially stores the count of each object type, kind of like a histogram for unique objects.
Rather than my following code, which is too slow to run because of the for-loops, does anyone know of a way to complete the same operation using direct matrix operations? It seems these comparisons should be relatively fast, but are extremely slow in Matlab the way I am doing it.
ObjectTypes = zeros(100,100,50);
for Shape=1:100
for Size=1:100
for Color=1:50
ObjectTypes(Shape,Size,Color) = size(MyData(MyData(:,1) == Shape & MyData(:,2) == Size & MyData(:,3) == Color),1);
end
end
end
0 Comments
Accepted Answer
Geoff
on 27 May 2012
Hah... So an alternative in Order(N) time...
for n = 1:size(MyData,1)
row = MyData(n, [1,2,3]);
ObjectTypes(row(1),row(2),row(3)) = ObjectTypes(row(1),row(2),row(3)) + 1;
end
More Answers (2)
Geoff
on 27 May 2012
Yeah that's searching through your data an awful lot every time you do the == comparisons. The way I do this kind of thing when populating a matrix from database results is to have the data sorted by two variables, and then use diff and find to get the data ranges.
So start with this:
MyData = sortrows(MyData);
Grab out the begin and end index for each group of values in column one.
% Partition by shape
begin1 = [1; 1+find(diff(MyData(:,1)))];
end1 = [begin1(2:end)-1; size(MyData,1)];
Now you can combine these into a loop variable, so each time through the loop will give you a 2x1 vector containing the start and end range. You do the same thing again with column 2. Finally I use accumarray to count up all the colours for a given size and shape:
% Process the Shape partitions
for r1 = [begin1, end1]'
Shape = MyData(r1(1), 1); % Single Shape
% Partition by Size
idx1 = r1(1):r1(2);
col2 = MyData(idx, 2);
begin2 = [1; 1+find(diff(col2))];
end2 = [begin2(2:end)-1; numel(col2)];
% Process the Size partitions
for r2 = [begin2, end2]'
Size = col2(r2(1)); % Single Size
idx2 = r1(1)+r2(1):r1(1)+r2(2);
% Count up all the Color occurrences for Shape and Size
Color = MyData(idx2, 3);
colorCount = accumarray(Color, ones(numel(Color),1));
ObjectTypes(Shape, Size, 1:max(Color)) = colorCount;
end
end
I would hope this is faster than your current loop, although there are probably clever ways to use accumarray without all the looping guff I've done. Apologies if there are errors in this code. I just hacked it straight into my web browser =)
1 Comment
Walter Roberson
on 27 May 2012
Are the numbers for the shape, size, color consecutive integers each starting from 1? If they are then the code can be reduced to
ObjectTypes = accumarray(MyData, 1);
If not then you can create the consecutive integers by using the thiree-output version of unique().
[ushape, junk, shapeidx] = unique(MyData(:,1));
[ucol, junk, colidx] = unique(MyData(:,2));
[usize, junk, sizidx] = unique(MyData(:,3));
ObjectTypes = accumarray( [shapeid(:), colidx(:), sizidx(:)], 1);
0 Comments
See Also
Categories
Find more on Data Distribution Plots in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!