MATLAB Answers

How to pre-allocate changeable size arrays in a for-loop?

Asked by Ana Castanheiro on 20 Mar 2017
Latest activity Commented on by Guillaume
on 24 Mar 2017
Hi all!
I have a quite large script, where I deal with different particle size classes (or intervals) depending on given input. Each of the size classes (i=1:NrInt) will result in a cell array with a certain size (nr rows depends on nr of particles, nr columns depends on nr of chemical elements present). These arrays are then submitted to a number of operations in order to produce the corresponding pie charts (1 pie per particle size interval).
I received part of this script from a previous colleague, which I then tried to adapt to my specific problem. After lots of trial and error, the script seems to be working just fine, but I still need to improve speed. As several variables are changing size inside the loop, I'm getting preallocation warnings all over the place.
The code for that part is attached. Can someone help me out?
Many thanks!

  3 Comments

You should always use
doc profile
before undertaking optimisation work. Following the M-lint messages is usually a good idea, but it always pays to have an overall analysis of exactly how long parts of your program take. If only 1% of the time is spent on that particular part of code then there is really no point even attempting to optimise it as the impact will be negligible.
Indeed, Adam. I profiled my script and that specific part does not seem to take a lot of time, at least compared to the total time. I have a few dialog boxes to request input, and these are always more time consuming. Nevertheless, I'd like to solve the warning messages anyway.
Also, I think it might be something simple to do and easily repeatable for all the changeable size variables. So, I'm still looking for suggestions.
Ah, well, dialog boxes will skew profiler results probably in terms of percentage time taken in each part of the program. If the program is waiting for user input it will be sat there with the timer ticking away on that function, giving an erroneous evaluation of the time spent on the function itself.
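One way around the skew described above is to start the profiler only after the user input has been collected. The following is a minimal sketch, not taken from the attached script: the value of NrInt and the loop body are hypothetical stand-ins for the dialog input and the real per-class work.

```matlab
% Hypothetical value: in the real script this would come from a dialog box,
% which runs BEFORE profiling starts so waiting time is not measured.
NrInt = 5;

profile on                          % start collecting timing data
Totals = zeros(1, NrInt);           % placeholder for the per-class results
for i = 1:NrInt
    Totals(i) = sum(rand(1, 1000)); % stand-in for the real computation
end
profile off                         % stop before any further dialogs appear

stats = profile('info');            % timing data as a struct
% In the MATLAB desktop, "profile viewer" opens the same data as a report.
```

The same idea applies if there are dialogs in the middle of the script: switch the profiler off before each one and use "profile resume" afterwards.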
Your code is too complicated for me to just glance at in the time I have though and make any valid suggestions.
You really ought to make use of blank lines in code for readability! A wall of text covering 30 lines or more is really hard to read.
I notice you are using cell arrays and these are never good for performance. Maybe they are totally necessary here, but if you can in any way use numeric arrays instead of cell arrays that would likely improve performance.
As for preallocation, sometimes you simply cannot do it if you have no idea how big your array will be beforehand. If you can estimate an upper bound on the size then you can presize it to this and then just trim it down to the smallest size it can be after the loop. Sometimes I do this if I can make a sensible estimate. Otherwise you can tell it to ignore that warning message. You are right to look into the message and try to solve it first though - only disable warnings when you have evaluated them and are happy to ignore them for valid reasons.
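The "presize to an upper bound, then trim" approach could look roughly like this. It is only a sketch: maxCount and the mod test are made-up placeholders for whatever bound and selection rule apply in the real script.

```matlab
maxCount = 1000;                 % assumed upper bound on the number of results
values = zeros(1, maxCount);     % presize once, before the loop
n = 0;                           % number of slots actually used

for k = 1:maxCount
    if mod(k, 3) == 0            % stand-in for the real selection condition
        n = n + 1;
        values(n) = k;           % fill the next free slot, no resizing
    end
end

values = values(1:n);            % trim the unused tail after the loop
```

Keeping a separate counter n is what makes the trim possible, since the unused slots still hold zeros that may be indistinguishable from real data.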

2 Answers

Answer by Guillaume
on 23 Mar 2017
Edited by Guillaume
on 24 Mar 2017
 Accepted Answer

I'm with Adam and Dhruvesh, your code is very difficult to parse. Better indentation (select all code and press CTRL+I), more white spaces and comments would greatly help.
At a quick glance, I fail to see which variable cannot be pre-allocated. They all seem to be indexed by i which you know will have NrInt steps.
Like Adam, I wonder if all these cell arrays are necessary. There are also several number to string conversions. That's never going to be fast.
I also noticed several instances of
somevar = find(someexpression)
othervar(somevar) = ...
which can be replaced by
othervar(someexpression) = ...
There's no point in using find to convert the logical array returned by someexpression into explicit indices when you can use that logical array directly for indexing. The find call just slows things down.
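As a small illustration with made-up data (the variable names are hypothetical, not from the attached script), both forms give the same result, and the second skips the intermediate index vector:

```matlab
data = [4 -1 7 -3 0];            % example data, invented for illustration

% Version with find: builds an explicit index vector first
viaFind = data;
idx = find(data < 0);
viaFind(idx) = 0;

% Version with logical indexing: uses the logical mask directly
viaMask = data;
viaMask(data < 0) = 0;

disp(isequal(viaFind, viaMask))  % the two results are identical
```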
Note that if you cannot preallocate (which is perfectly fine) you can get rid of the warning either by right clicking on the squiggly line and selecting Suppress ... on ..., or adding %#ok<AGROW> at the end of the line.
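For example, in a loop where the final length genuinely is not known in advance, the suppression comment goes on the line that grows the array. This is a sketch with an invented condition, not code from the attached script:

```matlab
collected = [];                       % final length not known beforehand
for k = 1:10
    if mod(k, 4) == 0                 % stand-in for a data-dependent test
        collected(end+1) = k; %#ok<AGROW>
    end
end
```

The comment only silences the Code Analyzer message on that one line; the array is still resized on each append.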

  3 Comments

Many thanks for the useful tips on how to improve my script writing. I have already tried to implement some of them, and my code looks a bit better now.
I know how many NrInt steps I will have, as I define this using a prompt input, but this number can be different. And more important, depending on the requested NrInt, the arrays in use will have different sizes. That's why I get the message that the variables appear to change size on every loop iteration.
I attach my updated script anyway, in case you have any more suggestions.
But all those variables in the loop are still being indexed by i, so when you get to the end of the loop they should all be of size NrInt so you could presize them all to that.
The code is indeed a lot easier to read.
That the size of the arrays stored in each cell of the cell array differs does not matter. The number of cells of the cell array is fixed at NrInt. Hence all your cell arrays and vectors could be pre-allocated. The only variable whose size is unknown at the beginning of the loop appears to be TotLabel. That one however could be created after the loop.
So any cell array and vector can be predeclared with, e.g.
ClassHistFinal = cell(1, NrInt);
NrParticles = zeros(1, NrInt); %for vectors
As said, TotLabel can be calculated after the loop has ended with:
TotLabel = [Label{:}];
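Put together, the pattern might look like the sketch below. The loop body is invented for illustration: the random matrices stand in for the per-class results, and Label is assumed to hold a cell row of strings per interval, which may not match the real script.

```matlab
NrInt = 3;                               % hypothetical number of size intervals

ClassHistFinal = cell(1, NrInt);         % one cell per interval, contents may differ in size
NrParticles    = zeros(1, NrInt);        % one number per interval
Label          = cell(1, NrInt);         % assumed: one cell row of labels per interval

for i = 1:NrInt
    ClassHistFinal{i} = rand(i + 1, 2);              % stand-in data, different rows each time
    NrParticles(i)    = size(ClassHistFinal{i}, 1);  % rows = number of particles
    Label{i}          = repmat({sprintf('class%d', i)}, 1, i);
end

TotLabel = [Label{:}];                   % built once, after the loop
```

Only the outer containers are preallocated. What goes into each cell can still have a different size on every iteration.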
There are a lot of what look like intermediary results stored in cell arrays. Is that really necessary? Do you really need to keep all the intermediary results (ClassNor, ElementsAllZero, etc.)? If not, get rid of the indexing.
I personally prefer building strings with sprintf or fprintf and a format string, rather than string concatenation and num2str. So I'd rather have:
fprintf('Number of particles between %g and %g µm : %d(%g%%)\n', ThresholdVal(i:i+1), NrParticles(i), NrParticles(i)/TotalNrParticles*100)
than your
disp(horzcat(...
That won't have an impact on speed. I just find the former more readable, and it's easier to customise the number display.
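To show the two styles side by side, here is a small sketch with made-up values. The concatenation line is a hypothetical stand-in, not the actual disp(horzcat(... call from the script.

```matlab
lowerBound = 0.5;                 % invented example values
upperBound = 1;
nParticles = 10;

% Concatenation style: every number goes through num2str
viaConcat = horzcat('Between ', num2str(lowerBound), ' and ', ...
    num2str(upperBound), ' um: ', num2str(nParticles));

% Format-string style: one template, numbers filled in by sprintf
viaSprintf = sprintf('Between %g and %g um: %d', ...
    lowerBound, upperBound, nParticles);

disp(viaSprintf)
```

With the format string, changing the number display (say, %.2f instead of %g) is a one-character edit in the template.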


Answer by Dhruvesh Patel on 23 Mar 2017

Your code is indeed difficult to read. However, here are some general pointers which might help you understand what is going on under the hood when MATLAB resizes an array. How MATLAB resizes an array when the for-loop asks for more elements is nicely explained in the following answer. It covers both normal arrays and cell arrays.
So it is always a good idea to estimate the size and pre-allocate using that estimate, as MATLAB will then not have to resize the array at least until its size reaches the estimated value. This would improve execution time as well as reduce memory fragmentation. Ideally, if you have an upper bound for your loop iterations (it looks like it's 'NrInt' in your case), you can pre-allocate using that.
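A rough way to see the difference on your own machine is to time both variants with tic/toc. This is only a sketch with an arbitrary loop length n; the actual timings depend on the MATLAB version and hardware, so none are quoted here.

```matlab
n = 20000;                            % arbitrary loop length for the comparison

% Variant 1: array grows on every iteration
tic
grown = [];
for k = 1:n
    grown(end+1) = k^2; %#ok<AGROW>
end
tGrown = toc;

% Variant 2: array preallocated to its final size
tic
prealloc = zeros(1, n);
for k = 1:n
    prealloc(k) = k^2;
end
tPrealloc = toc;

fprintf('growing: %.4f s, preallocated: %.4f s\n', tGrown, tPrealloc);
```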
