Loading a 600 MB file ends up astronomically large

Hey all!
I'm currently loading a CSV file that has both text and numeric data. The file is 600 MB, but when I load it, my RAM usage goes through the roof! (see attached) Why is this happening? I would presume it would just jump by 600 MB.
Any suggestions at all would be helpful!
Thanks!
Trevor

Answers (1)

dpb on 22 Feb 2021
Edited: dpb on 23 Feb 2021
Attach the file (or at least a portion of it).
Memory usage is only 1:1 with disk storage for numeric data types stored as stream data--otherwise there is overhead associated with higher-level storage classes like cell arrays, structs, tables, etc.
While a .csv file on disk will generally occupy more space than the same numeric values in memory, owing to their representation as character strings instead of internal binary storage, even that is not always true. Consider an integer array: stored as double, each element takes 8 bytes in memory, but it would take integers >1E7 to require 8 digits/characters to store in a .csv file -- 7 characters for the digits plus a comma.
>> v=randi(100,10,1);
>> csvwrite('inttest.csv',v)
>> !dir inttest.csv
 Volume in drive C is OS
 Volume Serial Number is 3260-4552

 Directory of C:\...\MATLAB\Work

02/22/2021  02:31 PM                28 inttest.csv
               1 File(s)             28 bytes
               0 Dir(s)  807,796,576,256 bytes free
>> whos v
  Name      Size            Bytes  Class     Attributes

  v         10x1               80  double
>>
So this simple case resulted in an 80/28 = 2.86X memory multiplier.
Now look at what happens if we turn it into a cell array--
>> c=num2cell(v)
c =
10×1 cell array
{[64]}
{[51]}
{[90]}
{[32]}
{[ 8]}
{[88]}
{[68]}
{[78]}
{[ 7]}
{[19]}
>> whos c
  Name      Size            Bytes  Class    Attributes

  c         10x1             1200  cell
>>
Now it's 1200/28 --> 42.86:1!
You can demonstrate this for all storage classes other than the base numeric ones--there is no free lunch!
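For instance, wrapping the same vector in a table shows the same effect (a sketch, not run here--the exact byte count varies by MATLAB release and the table's metadata):

>> T = table(v);   % wrap the same 10x1 double vector in a table
>> whos T          % Bytes reported will be well above the 80 bytes of the
                   % raw double array, since a table also stores variable
                   % names, dimension names, and other per-table metadata

The payload is still the same 80 bytes of doubles; everything above that is container overhead, which is a fixed cost that matters less as the array inside grows.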
