simple copy task much slower with high memory use, workaround possible?
All memory usage is as reported by the function "memory".
I found that my algorithm gets much slower after running for some time, and the reason is that Matlab takes longer for a quite simple copy task when it already uses a lot of memory. In my case Matlab uses 9.9 GB of 16 GB RAM. When copying matrices like
tic; for i = 1:10000; t = a(1:n); end; toc
and plotting the resulting time over n, the result is:
[figure: copy time over n, 9.9 GB session]
On a Matlab instance with 5.5 GB RAM usage, the same test leads to this instead:
[figure: copy time over n, 5.5 GB session]
So some kind of very time-intensive memory handling sets in with the 9.9 GB Matlab session when copying more than 2000 entries. Why? What can I do to work around that? Using less memory is an option, but it adds time for file handling, so I'd like to keep using ~10 GB.
Edit: I tried generating enough random data from scratch to reach 9.9 GB usage, and the problem doesn't arise in that case! This seems to be a reproducible bug in memory handling with my specific 9.9 GB of data!
2nd edit: Saved to disk, the "real" 9.9 GB of data makes up a 15.2 GB file. The "fake" 9.9 GB of data is only 7.5 GB on disk. Maybe that is important somehow?
10 Comments
Jan
on 10 Mar 2021
The diagrams are far too small to recognize anything. What are the units?
How do you determine the memory usage of Matlab? Remember that "used memory" is not uniquely defined. When Matlab allocates 1 GB of RAM and the corresponding variable is cleared, the OS does not release the corresponding memory immediately. As long as this memory is available to Matlab, it appears in Matlab's memory usage, although it is not currently in use.
I guess in the 2nd diagram you see that Matlab allocates new memory instead of reusing the formerly reserved memory. This can happen when the OS has not yet cleaned the old memory by overwriting it with zeros; then providing "new" RAM is more efficient.
The solution is to avoid unnecessary memory copies.
Jochen Schuettler
on 10 Mar 2021
Edited: Jochen Schuettler
on 10 Mar 2021
Jan
on 11 Mar 2021
In the 9.9 GB version, copying more than 2000 entries 10000 times takes more than 7 s, while slightly fewer takes almost nothing. In the 5.5 GB version, all copying takes less than 50 ms!
I still do not understand what the "9.9" and "5.5 GB versions" are. What does "9.9GB-Matlab" mean?
This is also not clear to me: saved to disk, the "real" 9.9 GB of data makes up 15.2 GB. How do you save the data, and why is it larger?
copying more than 2000 entries 10000 times takes more than 7s, while slightly less takes almost nothing
This sounds like an effect of the cache size: the CPU cache is very fast compared to the RAM. If the data are already available there, the time for reloading from the slow RAM is saved completely. Of course this takes almost no time.
Jochen Schuettler
on 11 Mar 2021
Jan
on 11 Mar 2021
On disk, my "real" 9.9 GB lead to a file of 15.2 GB, while the "fake" data lead to only 7.5 GB of file size.
This is so confusing that I have trouble concentrating on the rest. If Matlab stores 9.9 GB of RAM in a 15.2 GB MAT file, there is a fundamental problem. With the -v7.3 format, the save command compresses the data. I would be surprised if this increased the size.
I do not understand what "fake" data are. For Matlab, all data are numbers.
Jochen Schuettler
on 12 Mar 2021
Jochen Schuettler
on 15 Mar 2021
Jan
on 15 Mar 2021
If the 9.9 GB of RAM are occupied by one variable, it is stored as a contiguous block and the rest of the RAM is free (except for the RAM used by the OS and other programs). If you create thousands of variables in Matlab with a sum of 9.9 GB, the RAM can become fragmented, e.g. by having free blocks of 1 MB between the variables. Then the sum of the free RAM can be large, but there is no space to store a 2 MB variable anywhere.
Fragmented memory is a serious problem, and there is no easy way to fix it after the fact. Therefore it is good programming practice to avoid it, e.g. by not growing arrays iteratively.
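The growing-array pitfall can be sketched like this (a generic illustration, not the poster's code):

```matlab
% Growing an array inside the loop forces repeated reallocation,
% which scatters copies of the data through the heap:
slow = [];
for k = 1:1e5
    slow(end+1) = k^2;   %#ok<SAGROW> reallocates on (almost) every pass
end

% Preallocating once keeps the data in a single contiguous block:
fast = zeros(1, 1e5);
for k = 1:1e5
    fast(k) = k^2;
end
```

Both loops produce the same vector, but the preallocated version touches each memory location exactly once.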
The 15.2 GB MAT file is really strange. Maybe it contains a lot of figure handles? Note that WHOS claims that a figure uses only 8 bytes, because this is the memory of the handle only. Storing this handle on disk writes the contents of the figure as well, which takes much more space.
It would be useful if you provided a minimal working example. You have posted some code, but it does not produce the shown diagrams. My tests with this code did not show any suspicious behaviour in Matlab R2018b:
figure;
axes('NextPlot', 'add', 'XScale', 'log');
a = rand(1, 1e6);
for n = 1:0.5:6
    len = round(10^n);
    tic;
    for i = 1:10000
        t = a(1:len);
    end
    b = toc;
    plot(len, b/len, 'o');
    drawnow;
end
Walter Roberson
on 15 Mar 2021
-v7.3 files are stored in HDF5, and there is a notable amount of overhead for container datatypes such as cell arrays or structs; storing pure numeric arrays is not nearly as bad.
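The container overhead can be seen directly by saving the same numbers once as a plain array and once as a cell array (a minimal sketch; the file names are arbitrary):

```matlab
x = rand(1, 1e5);                  % ~0.8 MB of plain doubles
c = num2cell(x);                   % the same numbers, one cell per element
save('numeric.mat', 'x', '-v7.3'); % stored as one compressed HDF5 dataset
save('cells.mat',   'c', '-v7.3'); % one HDF5 object per cell
dNum  = dir('numeric.mat');
dCell = dir('cells.mat');
fprintf('numeric: %.2f MB, cells: %.2f MB\n', ...
        dNum.bytes / 2^20, dCell.bytes / 2^20);
```

On a typical setup the cell-array file comes out many times larger than the numeric one, which would explain a 9.9 GB workspace of nested cells turning into a 15.2 GB MAT file.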
Jochen Schuettler
on 15 Mar 2021
Edited: Jochen Schuettler
on 16 Mar 2021
Answers (2)
Jochen Schuettler
on 11 Mar 2021
Edited: Jochen Schuettler
on 11 Mar 2021
0 votes
4 Comments
Jochen Schuettler
on 11 Mar 2021
Jochen Schuettler
on 11 Mar 2021
Jan
on 15 Mar 2021
Why does the task manager show so much less memory use than Matlab's "memory"?
The term "memory usage" is not uniquely defined.
x = rand(1, 1e6);
x = 5
Now the large memory block reserved by the first command is freed, but maybe the memory manager of the OS has not yet overwritten it with zeros, so it is not available to other programs. Now it is a question of taste whether this memory belongs to Matlab or not. In theory the OS can decide on its own when to clear the contents; usually it does this only on demand or when idle.
In addition, the OS keeps a file cache for each program. Does this belong to the application or to the operating system?
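On Windows, the two views can be compared directly with the memory function (a sketch; MEMORY is not available on other platforms, and the exact numbers depend on the OS):

```matlab
x = rand(1, 1e8);   % allocate roughly 800 MB
m1 = memory;        % snapshot while the block is alive
clear x             % the variable is gone inside Matlab ...
m2 = memory;        % ... but the OS may still count the pages as used
fprintf('MemUsedMATLAB: %.0f MB -> %.0f MB\n', ...
        m1.MemUsedMATLAB / 2^20, m2.MemUsedMATLAB / 2^20);
```

Comparing MemUsedMATLAB against the task manager's working-set column shows why the two tools rarely agree.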
Would it help to declare the intermediate data as global/persistent, so we don't need for-loops?
Maybe. You did not post the relevant section of your code, which would clarify exactly what you are doing, so I can only speculate. It is more reliable if you simply try it. Remember that the behavior can change with the Matlab version, the OS and the available free RAM.
Jochen Schuettler
on 15 Mar 2021
Jochen Schuettler
on 16 Mar 2021
Edited: Jochen Schuettler
on 16 Mar 2021
7 Comments
Jochen Schuettler
on 17 Mar 2021
I have a hard time reading lines such as:
I2 = [randperm(l(9)-1,l(11)-1) l(9)];
for i = 1:l(11)
To avoid mistakes I recommend not mixing "1"s and "l"s.
But Matlab is not impeded by this. I'm not sure whether there is a better strategy to represent your data. I try to avoid nested cells, but sometimes they are the most efficient way.
Therefore I have a very cheap idea only:
If memory is scarce, install more RAM. If you work with 10 GB of data, 16 GB of RAM is tight. If Matlab starts to use virtual memory, this slows things down massively. 16 GB of additional DDR4 RAM currently costs 72 €. With more RAM the problem of fragmentation is less severe.
Jochen Schuettler
on 18 Mar 2021
Edited: Jochen Schuettler
on 18 Mar 2021
Jan
on 19 Mar 2021
"I don't know, why the structurally similar data takes so much more space with this example."
The OS provides the available memory to an application, when this is possible.
If a problem requires a lot of RAM, there are some tricks, but nothing is as efficient as running the code on a machine with enough resources.
Jochen Schuettler
on 19 Mar 2021
Edited: Jochen Schuettler
on 19 Mar 2021
Jochen Schuettler
on 19 Mar 2021
Jan
on 19 Mar 2021
In this forum, "closing" means that a question is removed soon. So we close questions only if they contain too little information to be answered.