Find out why mat files differ in size

I'm developing a rather complex class hierarchy with a few GB of data embedded in its instances, which might get saved to mat files for later analysis.
I refactored a lot to improve memory and CPU footprints (using dependent properties, customized loadobj and saveobj methods, etc.) and saw that the resulting mat file grew in size (using save() with v7.0 and compression enabled). I screwed something up.
I have some old reference mat files from the former versions that are about 30% smaller. However, if I load them using the current class definitions, the resulting objects in RAM are almost exactly the same size (<1% difference), measured with the great getByteStreamFromArray/getArrayFromByteStream functions (see Serializing/deserializing Matlab data - Undocumented Matlab). That means I can't infer from the instantiated objects what grew in size.
Question: How do I find out what really gets saved to the mat file, i.e. which variable/object is much larger compared to the old versions?
I can roll back to my former version via Git, but that does not really help me understand why exactly the mat files got bigger.
Any ideas?
Thanks,
Jan

Accepted Answer

Jan Kappen on 25 Mar 2024
Got it fixed.
I followed an approach similar to the one @Samay Sagar proposed, but ultimately used getByteStreamFromArray/getArrayFromByteStream (see Serializing/deserializing Matlab data - Undocumented Matlab). I also checked out the old version of my library in a second MATLAB session and compared all properties step by step, skipping Dependent properties via reflection.
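The property-by-property comparison described above could be sketched roughly as follows. This is not the author's code: propertySizes is a hypothetical helper, it relies on the undocumented getByteStreamFromArray function (which may change between releases), it only reads properties with public GetAccess, and it ignores whatever a custom saveobj method does.

```matlab
function report = propertySizes(obj)
%PROPERTYSIZES Serialized size of each stored, publicly readable property.
%   Hypothetical helper, not from the original post.
    mc = metaclass(obj);
    props = mc.PropertyList;
    names = {};
    bytes = [];
    for k = 1:numel(props)
        p = props(k);
        % Dependent, Transient and Constant properties are not written by save
        if p.Dependent || p.Transient || p.Constant
            continue
        end
        % obj.(name) can only be read from outside the class if GetAccess is public
        if ~isequal(p.GetAccess, 'public')
            continue
        end
        names{end+1} = p.Name; %#ok<AGROW>
        % Undocumented call: serialize the value and count the bytes
        bytes(end+1) = numel(getByteStreamFromArray(obj.(p.Name))); %#ok<AGROW>
    end
    report = sortrows(table(names(:), bytes(:), ...
        'VariableNames', {'Property', 'Bytes'}), 'Bytes', 'descend');
end
```

Running propertySizes on the object loaded with the old class definitions in one MATLAB session and on the new one in a second session, then comparing the two tables, should show which property grew. Private or protected properties would need the same loop inside a class method.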
Root cause: I had split a data table (class table) into two class objects, which should have used dependent properties, plus an internal table to store the data. It turned out I forgot to make one block of properties transient/dependent, so they were saved as well.
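To illustrate the kind of fix described here, a minimal sketch of a wrapper class follows. The class name SignalView, its property names and the Time table variable are all made up for illustration. The point is only that the internal table is the single stored property, while derived values are Dependent and in-memory helpers are Transient, so save skips them.

```matlab
classdef SignalView < handle
    % Hypothetical wrapper class, not the author's code.
    properties
        Data = table()      % stored property: written to the mat file
    end
    properties (Dependent)
        Time                % computed on access, never saved
    end
    properties (Transient)
        Cache               % kept in memory only, skipped by save
    end
    methods
        function obj = SignalView(data)
            if nargin > 0
                obj.Data = data;
            end
        end
        function t = get.Time(obj)
            % Derived from the internal table (assumes a variable named Time)
            t = obj.Data.Time;
        end
    end
end
```

If the Dependent or Transient attribute is missing on such a block, the same data ends up in the file twice: once inside the table and once as plain properties.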
Afterwards, the mat file sizes were basically the same. It's quite interesting that it makes no difference whether the table itself is saved or a class wrapping it; both get compressed very efficiently. Very nice, MathWorks!
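To check what a single value costs on disk after compression, rather than in RAM, one option is to save it to a temporary file and read the file size back. A rough sketch follows; savedSize is a hypothetical helper, and it assumes the v7 format mentioned in the question, which cannot store variables larger than 2 GB.

```matlab
function nBytes = savedSize(value)
%SAVEDSIZE Size in bytes of a v7 (compressed) mat file holding only VALUE.
%   Hypothetical helper, not from the original post.
    tmpFile = [tempname '.mat'];
    cleanup = onCleanup(@() delete(tmpFile));   % remove the temp file on exit
    save(tmpFile, 'value', '-v7');              % -v7 compresses the data
    info = dir(tmpFile);
    nBytes = info.bytes;
end
```

For example, savedSize(zeros(1000)) should come out far smaller than savedSize(rand(1000)), because the zeros compress well. Calling it on the wrapped table and on the wrapping object gives a direct comparison of the two.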
PS: I just found out that mat files can be compared visually too, see Compare and Merge MAT-Files - MATLAB & Simulink (mathworks.com). The tool can even "look" into objects, though not arbitrarily nested ones, so it could also be a good starting point.

More Answers (1)

Samay Sagar on 25 Mar 2024
You can use the "whos" command with the '-file' option to list the variables stored in each MAT file together with their sizes, and then compare the two listings to see which variables changed.
Here is a sample script to identify changes between two MAT files:
% List the variables stored in each MAT file (the data itself is not loaded)
oldVariables = whos('-file', 'old_version.mat');
newVariables = whos('-file', 'new_version.mat');
newNames = {newVariables.name};
% Compare variable sizes
for i = 1:numel(oldVariables)
    name = oldVariables(i).name;
    oldSize = oldVariables(i).bytes;
    % Find the corresponding variable in the new version
    j = find(strcmp(name, newNames), 1);
    fprintf('%s:\n', name);
    if isempty(j)
        fprintf('  Variable not found in new version\n\n');
    else
        newSize = newVariables(j).bytes;
        sizeChange = newSize - oldSize;
        % Note: the percentage is Inf or NaN if oldSize is 0
        percentageChange = (sizeChange / oldSize) * 100;
        fprintf('  Old Size: %d bytes\n', oldSize);
        fprintf('  New Size: %d bytes\n', newSize);
        fprintf('  Size Change: %d bytes (%.2f%%)\n\n', sizeChange, percentageChange);
    end
end
Read more about "whos" in the MATLAB documentation.
  1 Comment
Jan Kappen on 25 Mar 2024
Thank you very much for that approach. Unfortunately, it looks like it does not work with handle class objects. Plus, I had just one variable in that mat file, a big class object that encapsulates all the data.

Release

R2023b
