MATLAB Answers

Large, unused cell array in memory still slows down calculation significantly

Matthijs V. on 27 Nov 2019
Edited: James Tursa on 2 Dec 2019
I encountered a strange problem in my simulation code today: having an initialized, large cell array in memory during a calculation slows down the calculation, even though the cell array is not used anywhere after its initialization! If I instead initialize a multidimensional array of approximately the same size, there is no problem. The two variants are initialized as follows:
%--------------------------------------------------------------------------
% Cell array initialization (note that SRR is not used anywhere!)
%--------------------------------------------------------------------------
SRR = cell(NKstd, NLstd, NManufacture);
for i = 1:NKstd
    for j = 1:NLstd
        for k = 1:NManufacture
            SRR{i, j, k} = complex(zeros(2, 1, length(y)), ...
                                   zeros(2, 1, length(y)));
        end
    end
end
%--------------------------------------------------------------------------
% Multi array initialization (note that SRR is not used anywhere!)
%--------------------------------------------------------------------------
SRR = complex(zeros(2, 1, length(y), NKstd, NLstd, NManufacture), ...
              zeros(2, 1, length(y), NKstd, NLstd, NManufacture));
% Display size
whos SRR;
Here length(y) = 2001, NKstd = NLstd = 21, and NManufacture = 100, leading to a size of approximately 3 GB after initialization. It is important to stress again that SRR is not used anywhere: I can comment out this entire piece of code and my simulation still runs.
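As a quick sanity check on the reported sizes, the payload can be worked out by hand, since a complex double takes 16 bytes. The sketch below only redoes the arithmetic with the sizes stated above. Reading the leftover bytes as per-cell overhead is an inference from the two whos outputs in this post, not a documented figure.

```matlab
% Sizes as stated in the question.
Ny = 2001; NKstd = 21; NLstd = 21; NManufacture = 100;

bytesPerComplexDouble = 16;                         % 8 bytes real + 8 bytes imaginary
bytesPerCell = 2 * 1 * Ny * bytesPerComplexDouble;  % one 2x1x2001 complex block
nCells = NKstd * NLstd * NManufacture;              % number of cells in SRR
payloadBytes = bytesPerCell * nCells;               % raw data, same as the 6-D array

% Difference between the cell and 6-D byte counts reported by whos,
% spread over all cells (assumed here to be per-cell bookkeeping).
overheadPerCell = (2828750400 - payloadBytes) / nCells;

fprintf('%d cells, %d payload bytes, %g extra bytes per cell\n', ...
    nCells, payloadBytes, overheadPerCell);
```

So both variants hold the same amount of data; the cell version differs only by a small amount of extra storage per cell.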
If I comment out the multidimensional array part, I get (with runtime of my simulation, for which timing starts well after the initialization):
  Name      Size                Bytes  Class    Attributes
  SRR       21x21x100      2828750400  cell
Running: 21 21
Done in 136.3 seconds.
If I comment out the cell part, I get (with runtime of my simulation):
  Name      Size                Bytes  Class     Attributes
  SRR       6-D            2823811200  double    complex
Running: 21 21
Done in 57.5 seconds.
If I comment out the entire code above (showing SRR is not used anywhere), I get results very similar to multidimensional initialization:
Running: 21 21
Done in 58.6 seconds.
So, how does an unused cell array cause such a large difference in calculation time? Eventually, I would like to use this array to store some calculation details.
EDIT:
As discussed in some of the comments to the answers below, the timing I'm doing specifically excludes the allocation, so this is not the issue. Both types of allocation only take a few seconds at most anyway.

  2 Comments

Matthijs V. on 29 Nov 2019
So, I did some more digging. It turns out that the first time I run my simulation with the cell array allocation after a fresh start of MATLAB, there is no slowdown:
Done in 54.6 seconds.
However, just running the code 9 more times after that, while still doing the cell allocation, shows a clear slowdown:
Done in 54.6 seconds. (first time)
Done in 60.7 seconds.
Done in 66.0 seconds.
Done in 72.8 seconds.
Done in 83.6 seconds.
Done in 88.8 seconds.
Done in 95.2 seconds.
Done in 103.4 seconds.
Done in 106.0 seconds.
Done in 113.3 seconds.
Then, going back to multidimensional and running the code:
Done in 58.1 seconds.
This is consistent across many repetitions. Now switch back to cell:
Done in 124.4 seconds.
So, the slowdown seems persistent. This must have something to do with the way MATLAB allocates and frees large cell arrays. In the link I posted (https://nl.mathworks.com/matlabcentral/answers/331930-can-anyone-explain-why-matlab-gets-slower-and-slower-until-restart-if-large-cell-or-struct-arrays-ar), there seems to be a similar problem; however, there the cell array is being used in the script. Here, it is not.
As discussed in the comments on that page, this might have to do with Windows memory handling. Is this known? And how can I avoid this? Can MathWorks avoid it by changing the way MATLAB allocates cell arrays?
Matthijs V. on 29 Nov 2019
I have made a test file and a testCalculator class to show this problem. Excuse the unreadability of the testCalculator code; I obfuscated it to simplify and hide what I'm actually simulating. Essentially, it does an elaborate random matrix calculation. The point, however, is that allocating a large cell array M really impacts performance. After a fresh start of MATLAB:
>> for i = 1:10; test(1); end
Large UNUSED cell array allocated in 1.475781 seconds.
Elapsed time is 48.154090 seconds.
Large UNUSED cell array allocated in 1.705383 seconds.
Elapsed time is 53.350614 seconds.
Large UNUSED cell array allocated in 1.826082 seconds.
Elapsed time is 58.685069 seconds.
Large UNUSED cell array allocated in 2.054494 seconds.
Elapsed time is 66.113372 seconds.
Large UNUSED cell array allocated in 2.273053 seconds.
Elapsed time is 71.093404 seconds.
Large UNUSED cell array allocated in 2.427023 seconds.
Elapsed time is 74.004914 seconds.
Large UNUSED cell array allocated in 2.587717 seconds.
Elapsed time is 76.941799 seconds.
Large UNUSED cell array allocated in 2.655030 seconds.
Elapsed time is 78.275337 seconds.
Large UNUSED cell array allocated in 2.963142 seconds.
Elapsed time is 85.800173 seconds.
Large UNUSED cell array allocated in 3.182321 seconds.
Elapsed time is 90.013523 seconds.
>> test(2);
Large multimensional array allocated in 2.099121 seconds.
Elapsed time is 47.672167 seconds.
>> test(0);
No unused data allocated.
Elapsed time is 46.655427 seconds.
So the test files clearly show the problem. Feel free to look and test for yourself. Note that the allocation itself also takes longer and longer with each run.
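For readers without the attachments, the measurement pattern can be sketched roughly as follows. This is not the attached test file: the workload, the reduced sizes, and the variable names are hypothetical stand-ins, so it illustrates only how the allocation is kept outside the timed region and is not expected to reproduce the slowdown.

```matlab
% Hypothetical stand-in for the attached test; sizes are reduced so it runs
% quickly, and the workload is an arbitrary random matrix calculation.
nRuns = 5;
elapsed = zeros(1, nRuns);
for r = 1:nRuns
    % Allocate a cell array that is never read afterwards.
    M = cell(21, 21, 10);
    for idx = 1:numel(M)
        M{idx} = complex(zeros(2, 1, 201), zeros(2, 1, 201));
    end

    % Time only the workload, not the allocation above.
    t = tic;
    acc = 0;
    for k = 1:200
        A = rand(50);
        acc = acc + trace(A * A');
    end
    elapsed(r) = toc(t);
    fprintf('Run %d: %.4f seconds\n', r, elapsed(r));
end
```

Comparing the entries of elapsed across runs, with and without the allocation loop, is the kind of comparison reported in the output above.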
I have filed a service request and MathWorks is looking into the problem.


Answers (2)

Guillaume on 27 Nov 2019
Well, with the cell array you're doing 2 x 44100 matrix allocations with zeros, which all have to be copied into a new complex matrix, so that's 88200 allocations and 88200 copies, plus 44100 allocations of complex matrices, whereas with the matrix it's just 3 allocations and 2 copies (I'm assuming all this; the details of memory management are not public). There's quite a lot of bookkeeping involved in allocating matrices, so that could explain the slowdown (although I wouldn't expect it to be that big).
Note that another way of creating that cell array would be with:
SRR = squeeze(num2cell(complex(zeros(2, 1, length(y), NKstd, NLstd, NManufacture), ...
zeros(2, 1, length(y), NKstd, NLstd, NManufacture)), [1 2 3]));
See if it's any faster.
Of course, since all the matrices are identical you could just do:
SRR = repmat({complex(zeros(2, 1, length(y)), zeros(2, 1, length(y)))}, NKstd, NLstd, NManufacture);
This will definitely be faster to create but, on the other hand, you'll have a slowdown when first assigning to each matrix, when MATLAB needs to split the shared copy between the cells.
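A small sketch, with sizes reduced for illustration (Ny, NK, NL, NM are stand-ins for length(y), NKstd, NLstd, NManufacture), suggesting that the three constructions give the same cell array and that writing into one cell of the repmat version leaves the others untouched:

```matlab
% Small illustrative sizes (the question uses 2001, 21, 21, 100).
Ny = 5; NK = 3; NL = 4; NM = 2;

% Variant 1: explicit loop, as in the question.
A = cell(NK, NL, NM);
for idx = 1:numel(A)
    A{idx} = complex(zeros(2, 1, Ny), zeros(2, 1, Ny));
end

% Variant 2: build the 6-D array, then split along dimensions 1 to 3.
B = squeeze(num2cell(complex(zeros(2, 1, Ny, NK, NL, NM), ...
                             zeros(2, 1, Ny, NK, NL, NM)), [1 2 3]));

% Variant 3: replicate a single cell.
C = repmat({complex(zeros(2, 1, Ny), zeros(2, 1, Ny))}, NK, NL, NM);

sameCells = isequal(A, B) && isequal(B, C);

% Writing into one cell does not change the others.
C{1, 1, 1}(1) = 1 + 2i;
otherUnchanged = (C{2, 1, 1}(1) == 0);

fprintf('identical: %d, other cells unchanged: %d\n', sameCells, otherUnchanged);
```

Wrapping each variant in tic/toc at the full sizes would show which one is fastest to create on a given machine.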

  8 Comments

Guillaume on 27 Nov 2019
Are there any nested or anonymous functions created in the same workspace as the cell array? Like James, I'm making wild guesses at what could be the issue, but these functions do capture the workspace of the function in which they're defined. Anonymous functions in particular have to make a copy of the workspace variables. However, they should only capture the variables that are actually used by the anonymous function, and the copy is probably a shared copy.
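To illustrate what capturing means here, a minimal sketch with made-up variable names (a, big, f). It shows that an anonymous function keeps the value a variable had when the handle was created, and uses functions to inspect what the handle stored. Which variables end up in the stored workspace may depend on the MATLAB release, so the sketch only prints them.

```matlab
a = 2;
big = zeros(1000);        % present in the workspace, not referenced by f
f = @(x) a * x;           % f stores the current value of a
a = 5;                    % later changes to a do not affect f
result = f(3);            % computed with the stored a = 2

info = functions(f);      % query information about the handle
captured = info.workspace{1};
disp(fieldnames(captured));
fprintf('f(3) = %d, stored a = %d\n', result, captured.a);
```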
Matthijs V. on 27 Nov 2019
There are no anonymous functions. The calculations I'm doing are done with function calls on a handle class. The calculations involve a ton of temporary variables, so perhaps James is right and there is something going on with the variable headers?
Running the profiler shows that the exact same lines in my calculations take much longer with the cell array present in memory, even though the cell array is not used anywhere.
Matthijs V. on 29 Nov 2019
I have attached demonstration files to a comment on my question. Feel free to test for yourself; I'm curious whether others experience the same problem.



James Tursa on 27 Nov 2019
Edited: James Tursa on 27 Nov 2019
Can you clarify if the above code is included in your timings? For all we know, you are simply showing that creating 44100 separate MATLAB variables takes longer than creating 3 MATLAB variables.
Also, it is rarely the case that you need to pre-allocate cell contents, since these are typically overwritten downstream in your code. It is only if you are modifying the contents by element that it might make sense. And if only some of the contents will be changing, this might be better:
SRR = cell(NKstd, NLstd, NManufacture);
SRR(:) = {zeros(2, 1, length(y),'like',1i)}; % reference copies fast to create
Finally, your 6D array example is better allocated as:
SRR = zeros(2, 1, length(y), NKstd, NLstd, NManufacture,'like',1i);
This avoids those temporary real & imaginary parts that you are currently using.
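A small check, with reduced sizes for illustration (Ny, NK, NL, NM are stand-ins for the sizes in the question), suggesting that the 'like',1i form yields the same complex zeros as the complex(zeros, zeros) form, and that the SRR(:) = {...} assignment fills every cell:

```matlab
% Small illustrative sizes (the question uses 2001, 21, 21, 100).
Ny = 5; NK = 3; NL = 4; NM = 2;

% 6-D array: two real temporaries combined vs. direct complex allocation.
viaComplex = complex(zeros(2, 1, Ny, NK, NL, NM), ...
                     zeros(2, 1, Ny, NK, NL, NM));
viaLike = zeros(2, 1, Ny, NK, NL, NM, 'like', 1i);

sameValues = isequal(viaComplex, viaLike);   % same size and values
isComplex = ~isreal(viaLike);                % stored as complex

% Cell array: every cell receives the same 2x1xNy block.
SRR = cell(NK, NL, NM);
SRR(:) = {zeros(2, 1, Ny, 'like', 1i)};
allFilled = all(cellfun(@(c) isequal(size(c), [2 1 Ny]), SRR(:)));

fprintf('same values: %d, complex: %d, all cells filled: %d\n', ...
    sameValues, isComplex, allFilled);
```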

  8 Comments

Guillaume on 27 Nov 2019
I think you ought to raise a service request with MathWorks, because only they can really explain what's going on. If you do get a satisfying explanation, do tell us, as it's an interesting topic for advanced users.
Matthijs V. on 27 Nov 2019
That's a good idea, thank you. I'll do that when I'm back in the office tomorrow. I will also try to make a simple demonstrative example script and class for others to test, without putting my simulation code online.
Matthijs V. on 29 Nov 2019
I have attached demonstration files to a comment on my question. Feel free to test for yourself; I'm curious whether others experience the same problem.

Release

R2019b