MATLAB Answers

What is the fastest way to load many large files and then reuse that data?

Daniel on 20 Aug 2020
Commented: Daniel on 28 Aug 2020
I have upwards of 200 .csv files of around 500 MB each. Each file contains a one-line text header and 10 columns of numeric data with a very large number of rows. Columns 2-4 are identical in all files, so I only need to load them once, from any one file. From every file, I need only columns 5-8. The files are all in one folder with a systematic naming convention, if that helps.
What is the fastest way to do this the first time? I've tried importdata, textscan, and readmatrix, and either could not get them to do what I describe above or found them still too slow.
Once the data is loaded, I'll do some manipulation and save it as a .mat file to work on later. Am I right that saving as .mat will give the fastest load times in the future?

  3 Comments

dpb on 20 Aug 2020
How large an array does one of these files create in memory? The size of a .csv file is extremely dependent upon how many digits of precision were saved.
This is going to take time no matter what, assuming you can even hold all of the needed data in memory at once.
Reading the full file and then keeping only the columns you need from each in turn is probably as fast as reading only part of the file, given the overhead that partial reads add during the input step.
Is there any real need to have all the data at one time? Can it not be analyzed, plotted, or otherwise processed piecewise?
A .mat file will certainly be quite a lot smaller and quicker to load than all the .csv files, but at that size the load time will still be noticeable. The fastest option would be a straight binary file written with fwrite and read with fread. For a single array, this is trivial code.
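As a minimal sketch of the raw binary route for a single array: the file name, the stand-in array A, and the choice to store the array dimensions as a two-element header are all illustrative assumptions, not details taken from the files in the question.

```matlab
% Stand-in for the data kept from one file (e.g. columns 5-8).
A = rand(1000, 4);
binName = [tempname '.bin'];        % hypothetical output file name

% Write: dimensions first, then the values in column-major order.
fid = fopen(binName, 'w');
assert(fid ~= -1, 'Could not open %s for writing', binName);
fwrite(fid, size(A), 'double');
fwrite(fid, A, 'double');
fclose(fid);

% Read: recover the dimensions, then the array in one call.
fid = fopen(binName, 'r');
assert(fid ~= -1, 'Could not open %s for reading', binName);
sz = fread(fid, [1 2], 'double');
B  = fread(fid, sz, 'double');
fclose(fid);
delete(binName);
```

Because the values are written and read back as double, B should match A exactly. The cost of this approach is that the file carries no variable names or types, so the layout has to be tracked by hand.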
Walter Roberson on 20 Aug 2020
textscan() with %* formats to skip columns is probably about the fastest you are going to get.
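A rough sketch of what the %* approach might look like, assuming comma-delimited numeric fields and exactly one header line. The tiny temporary file stands in for a real 500 MB file, and all variable names are hypothetical.

```matlab
% Build a small stand-in file: one header line, 10 numeric columns, 3 rows.
fname = [tempname '.csv'];          % hypothetical file name
fid = fopen(fname, 'w');
fprintf(fid, 'c1,c2,c3,c4,c5,c6,c7,c8,c9,c10\n');
fprintf(fid, [repmat('%g,', 1, 9) '%g\n'], reshape(1:30, 10, 3));
fclose(fid);

% %*f parses a field and discards it; %f parses a field and keeps it.
fmtFirst = ['%*f' repmat('%f', 1, 7) repmat('%*f', 1, 2)];                % keep columns 2-8
fmtRest  = [repmat('%*f', 1, 4) repmat('%f', 1, 4) repmat('%*f', 1, 2)];  % keep columns 5-8

% First file: columns 2-4 (shared) and 5-8 in a single pass.
fid = fopen(fname, 'r');
C = textscan(fid, fmtFirst, 'Delimiter', ',', 'HeaderLines', 1, ...
    'CollectOutput', true);
fclose(fid);
shared = C{1}(:, 1:3);              % original columns 2-4
vals   = C{1}(:, 4:7);              % original columns 5-8

% Every other file: columns 5-8 only (the same file is reused here for brevity).
fid = fopen(fname, 'r');
C = textscan(fid, fmtRest, 'Delimiter', ',', 'HeaderLines', 1, ...
    'CollectOutput', true);
fclose(fid);
valsOnly = C{1};
delete(fname);
```

With 'CollectOutput' set to true, the kept numeric columns come back as one matrix rather than one cell per column, which avoids a concatenation step afterwards.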
Daniel on 28 Aug 2020
One whole file is 301,409,168 bytes once loaded. I had been loading all the data and plotting some things as a sort of quality check. I don't suppose I'd know how to just load it and save it to a .mat in a piecewise fashion, though that might suffice for what I want.
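One possible way to load and save piecewise is matfile, which writes into a variable inside a Version 7.3 MAT-file without holding the whole array in memory. The sketch below uses small stand-in sizes and placeholder data; the output name, the variable name vals, and the rows-by-4-by-files layout are assumptions made for illustration.

```matlab
outName = [tempname '.mat'];        % hypothetical output file name
m = matfile(outName, 'Writable', true);

nFiles = 3;                         % stand-in for the ~200 real files
nRows  = 5;                         % stand-in for the real row count

% Preallocate the variable on disk by assigning to its last element.
m.vals(nRows, 4, nFiles) = 0;

for k = 1:nFiles
    % Placeholder for columns 5-8 read from file k by whatever reader is used.
    chunk = k * ones(nRows, 4);
    m.vals(1:nRows, 1:4, k) = chunk;    % written to disk, one file at a time
end

% Later, pull back a single file's worth of data without loading the rest.
slab = m.vals(:, :, 2);
sz   = size(m, 'vals');
```

Only one file's data is in memory at any time during the loop, and the same object can be reopened later to plot or check one slice at a time.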

Release

R2020a