How to increase reading speed from a gigabyte-sized file?

farzad on 17 Jun 2019

https://se.mathworks.com/matlabcentral/answers/467449-how-to-increase-reading-speed-from-a-gigabyte-large-file

Commented: farzad on 20 Jun 2019

Hi all,

How do I increase reading speed from an Excel file whose rows and columns add up to several gigabytes of data?

dpb on 17 Jun 2019

Increase relative to what?

farzad on 17 Jun 2019

Let me ask it this way:

What is the fastest way to parse the data?

dpb on 17 Jun 2019

Dunno...'pends on what the data are and how they were saved...getting it out of Excel and into a .mat or stream file would undoubtedly be the fastest.

farzad on 17 Jun 2019

The data are float and, let's say, 5 gigabytes.

Why .mat and why a stream file? What would the code look like?

Is using a table useful?

dpb on 17 Jun 2019

'Cuz both .mat and stream files are binary representations of the actual bytes in memory, thus eliminating the need for conversion.

You've still not said which form of file it actually is; if it is .xls(x), then xlsread is fairly slow.

A table would be one choice for internal storage in MATLAB; how useful depends entirely on what the data are and how they need to be processed. As with the actual file itself, you're keeping us totally in the dark on that, so all we can do is guess...

farzad on 17 Jun 2019

I need to know for both csv and xlsx. As I mentioned before, the data type is float.

dpb on 17 Jun 2019

Well, with .xlsx files you have the choice between xlsread and readtable. You'll just have to test which is faster--one presumes probably readtable. If you have R2019a, you can try the new readmatrix which is now recommended instead of xlsread.

For csv files, the historic ways are csvread, textscan, and fscanf, although, again with the caveat of requiring R2019a, readmatrix is now the TMW-recommended alternative.

I don't have R2019a installed yet, so I can't comment on the relative performance between it and alternatives.

Still, if speed and doing this more than once will be required, then doing it once and then using .mat or stream files will undoubtedly beat any of the alternatives.

You could, if your application can live with single precision, cut the file size in half by saving single instead of double. Whether that is a viable alternative is purely a question of what is required of the data itself.
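A minimal sketch of the "convert once, then reuse" idea described above, assuming R2019a or later for readmatrix and writematrix. The file names are hypothetical, and a small random matrix stands in for the real multi-gigabyte CSV so the sketch runs quickly.

```matlab
% Hypothetical file names; a small stand-in matrix replaces the real data.
csvFile = fullfile(tempdir, 'example_data.csv');
matFile = fullfile(tempdir, 'example_data.mat');
writematrix(rand(1000, 50), csvFile);      % stand-in CSV (R2019a or later)

% One-time conversion: parse the text once, then store the binary result.
data = readmatrix(csvFile);                % R2019a or later
save(matFile, 'data', '-v7.3');            % -v7.3 is needed for variables over 2 GB

% Optional: store as single if the application tolerates reduced precision.
dataSingle = single(data);
save(fullfile(tempdir, 'example_data_single.mat'), 'dataSingle', '-v7.3');

% Every later run loads the binary copy instead of re-parsing the text.
S = load(matFile, 'data');
```

The text parsing cost is paid only on the first run; whether single precision is acceptable has to be judged from the data.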

Walter Roberson on 18 Jun 2019

Edited: Walter Roberson on 18 Jun 2019

I wrote out 1e6 by 500 doubles (1e6 x 500 x 8 bytes = 4 gigabytes in binary form) and tested how long loading took.

When saved as space-delimited double using save -ascii -double, load() of the 12501000000-byte text file took 1416 seconds.

textscan() of that same file took 265 seconds.

fscanf() of the same file took 371 seconds.

When saved as a .csv file using dlmwrite() with precision 16, load() took 1107 seconds.

When saved as a -v7.3 .mat file, load() of the 3796914266-byte file took 25 seconds.

When saved as a pure binary file, fread(fid, [1e6 500], '*double') took 14.25 seconds the first time and 2.1 seconds the second time (file in operating system cache). fread(fid, [1 inf], '*double') takes 4.6 seconds when the file is in operating system cache, which tells us that there is more memory management overhead when the size is unknown.

(I will update as I generate more times.)
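For reference, a rough sketch of the pure binary round trip, not the exact script behind the timings above. The file name and the small array size are assumptions chosen so it runs quickly; scale nRows and nCols up to reproduce a multi-gigabyte test.

```matlab
binFile = fullfile(tempdir, 'example_data.bin');   % hypothetical file name
nRows = 1000; nCols = 50;                          % small stand-in size
data = rand(nRows, nCols);

% Write the raw bytes: column-major order, no header, 8 bytes per value.
fid = fopen(binFile, 'w');
assert(fid ~= -1, 'Could not open file for writing');
fwrite(fid, data, 'double');
fclose(fid);

% Read with the size known in advance.
fid = fopen(binFile, 'r');
assert(fid ~= -1, 'Could not open file for reading');
tic;
back = fread(fid, [nRows nCols], '*double');
tKnown = toc;
fclose(fid);

% Read with the size unknown, then reshape afterwards.
fid = fopen(binFile, 'r');
assert(fid ~= -1, 'Could not open file for reading');
tic;
flat = fread(fid, [1 inf], '*double');
tUnknown = toc;
fclose(fid);
flat = reshape(flat, nRows, nCols);

fprintf('known size: %.4f s, unknown size: %.4f s\n', tKnown, tUnknown);
```

A raw binary file carries no size or type information, so the reader has to know the dimensions and precision from somewhere else; a .mat file stores those for you.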

farzad on 18 Jun 2019

Thank you very much, Walter.

That is very much what I was searching for. How do you save as .mat?

Walter Roberson on 18 Jun 2019

data = rand(1e6, 50);
save testdata.mat data -v7.3

but this relies upon having the data in the first place to write out as .mat.
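Reading the data back could look something like the sketch below. The file name and the small stand-in array are assumptions; matfile is shown as one possible way to read only part of a -v7.3 file without loading all of it.

```matlab
data = rand(1000, 50);                                % small stand-in array
matFile = fullfile(tempdir, 'testdata_example.mat');  % hypothetical file name
save(matFile, 'data', '-v7.3');

% Load the whole variable back into memory.
S = load(matFile, 'data');
full = S.data;

% Or read only a slice; -v7.3 files support partial loading through matfile.
m = matfile(matFile);
firstRows = m.data(1:100, :);
```

Partial loading is mostly useful when the full array does not fit comfortably in memory.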

farzad on 18 Jun 2019

On the other hand, what if it's an Excel file with data from outside of MATLAB?

Walter Roberson on 18 Jun 2019

I am having difficulty creating an Excel file that large. I wrote the file as .csv, but my Excel complains about running out of memory when trying to import it, which does not make sense to me.

farzad on 18 Jun 2019

This is a new problem then

Walter Roberson on 18 Jun 2019

I have been updating the timings; you might want to have another look, above.

dpb on 18 Jun 2019

All of which continues to say "ditch Excel" entirely for such large files...

I do find it interesting that textscan manages to beat fscanf -- one would think both would boil down to the same C runtime library call. Just out of curiosity, what were the two specific commands used, Walter? Oh--did you include the overhead to cast the cell array from textscan to double?

Walter Roberson on 18 Jun 2019

Edited: Walter Roberson on 18 Jun 2019

I created a format with repmat of '%f' 50 times. I fopen and then

datacell = textscan(fid, fmt, 'collectoutput', 1);

Because this puts everything into a single cell, the overhead to extract the array is trivial.

The timing with collectoutput 0, without joining the columns afterwards, was a hair higher but not statistically significant.
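Filled out into something runnable, that approach might look like the sketch below. The file name, column count, and small test matrix are assumptions for illustration, not the setup used for the timings.

```matlab
txtFile = fullfile(tempdir, 'example_data.txt');   % hypothetical file name
nCols = 50;                                        % assumed column count
data = rand(200, nCols);                           % small stand-in matrix
save(txtFile, 'data', '-ascii', '-double');        % space-delimited text

% One '%f' per column; whitespace between fields is skipped by default.
fmt = repmat('%f', 1, nCols);

fid = fopen(txtFile, 'r');
assert(fid ~= -1, 'Could not open file for reading');
datacell = textscan(fid, fmt, 'CollectOutput', 1);
fclose(fid);

% With CollectOutput, all numeric columns arrive as one matrix in one cell.
parsed = datacell{1};
```

Without 'CollectOutput', textscan returns one cell per column, and the columns would need to be joined afterwards, for example with [datacell{:}].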

dpb on 18 Jun 2019

Yeah, that's kinda' what I suspected, thanks for confirming, Walter.

I still find it more than strange that there's a 30% reduction over fscanf -- the question is what they're doing wrong in fscanf that leaves that much room for improvement.

These timings couldn't possibly be related to caching issues, I presume; you're too careful for that! :)

farzad on 20 Jun 2019

Thank you all, guys! It was a great answer! Thank you.
