How can I load a large csv file?

I have a large CSV file (6 GB) that I am trying to load into MATLAB and save as a structure file.
I am currently using textscan, but MATLAB freezes and the computer stops responding after a while.
The file has 54,200,000 lines with 10 values on each line. I tried loading only a few columns at a time, and it still doesn't work.
Is there a way I can load them all at once?
Thank you in advance~~~

Answers (1)

Cedric on 16 Apr 2013
Edited: Cedric on 16 Apr 2013
Did you try using CSVREAD and DLMREAD? The latter would allow you to load the file in blocks.
Also, what type of data is stored in the file? Could you copy/paste the first two rows here? Storing an array of size 54,200,000 x 10 as double requires a little more than 4 GB of RAM. What kind of system are you working with? If it can't handle this, you could read in blocks and convert to a smaller type/class for storage.
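To make the arithmetic and the block-reading idea concrete, here is a rough MATLAB sketch. The file name, block size, and the assumption that the file is purely numeric are all hypothetical; DLMREAD's range argument uses zero-based row/column indices.

```matlab
% Memory back-of-envelope: 54,200,000 rows x 10 columns of double
% (8 bytes each) is a bit over 4 GB before any intermediate copies.
nBytes = 54.2e6 * 10 * 8;            % ~4.3e9 bytes

% Block-wise reading with DLMREAD; 'data.csv' and blockRows are
% placeholders to adapt.
nRows     = 54200000;
blockRows = 1e6;
for startRow = 0 : blockRows : nRows-1
    % Clip the last block so the range never runs past the file end.
    stopRow = min(startRow + blockRows, nRows) - 1;
    block   = dlmread('data.csv', ',', [startRow 0 stopRow 9]);
    block   = single(block);         % halves the memory footprint
    % ... process or store this block before reading the next one ...
end
```

Casting each block to single before keeping it is what makes the footprint manageable; the full dataset in single is roughly 2 GB instead of 4 GB.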

6 Comments

the file contains only numbers like:
1357014085000 58609 41.4002021500 -88.0114677748 10035.88200 196.20920 151.66576 227.40741 151.66576 ZAU B737 1357014090000 58609 41.3954721678 -88.0080549443 10141.70400 203.76750 151.66603 236.45369 151.66603 ZAU B737 1357014095000 58609 41.3905620328 -88.0045124423 10255.33700 211.04308 151.66644 245.22408 151.66644 ZAU B737
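Since the records end in text fields, TEXTSCAN with an explicit format is the usual route. A minimal sketch, assuming each record is 9 numeric fields followed by two text fields and that the delimiter is a space as in the pasted sample (the file name and format string are assumptions to adjust to the real layout):

```matlab
% Hypothetical format: 9 numeric fields, then two strings per record.
fmt = [repmat('%f ', 1, 9) '%s %s'];

fid = fopen('data.csv');             % placeholder file name
C = textscan(fid, fmt, 'Delimiter', ' ', ...
             'MultipleDelimsAsOne', true, 'CollectOutput', true);
fclose(fid);

numericData = C{1};                  % N x 9 double matrix
textData    = C{2};                  % N x 2 cell array of strings
```

With 'CollectOutput' set to true, the consecutive %f conversions come back as one numeric matrix and the consecutive %s conversions as one cell array, which avoids juggling ten separate output cells.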
I have divided the file into several smaller .csv files, but then I need to do comparisons between the data across the different files.
I am working on a Mac with 12GB RAM~
I will try csvread for the time being.
Thank you ~
Cedric on 16 Apr 2013
Edited: Cedric on 16 Apr 2013
I see; the text in the last columns will prevent you from using CSVREAD or DLMREAD efficiently.
Have you looked at this thread yet?
Sorry for getting back to you this late. I have looked at that thread, but MATLAB still goes into a frozen state and I can't really track its status or tell whether it will eventually die.
Each .csv piece translated into a structure file comes out at 1.2 GB, and I have 20 of them in total. That means even if I loaded them all at once and translated them into a structure file, I would still probably hit the same huge-file problem with the structure file itself.
Is there a way I can fix this?
Thank you so much
No problem! What structure does your structure file have? Do you need all the data from all files in memory before you can start building it, or could you process the whole thing in smaller chunks (i.e. import one CSV file, export part of the structure file, import the second CSV file, export the next part, etc.)? Also, do you need all the columns of the input files or only a few of them?
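The chunked import/export idea could look something like the loop below, which appends each piece to a MAT-file on disk via MATFILE so the whole dataset never sits in RAM at once. Everything here (file names, the 9-numeric-column format with the text fields skipped via %*s, the single variable layout) is a hypothetical sketch, not the poster's actual pipeline:

```matlab
% MATFILE gives read/write access to a v7.3 MAT-file on disk, so the
% combined variable can be grown piece by piece.
m = matfile('combined.mat', 'Writable', true);
fmt = [repmat('%f ', 1, 9) '%*s %*s'];   % %*s skips the text columns

rowsDone = 0;
for k = 1:20
    fid = fopen(sprintf('piece_%02d.csv', k));   % placeholder names
    C = textscan(fid, fmt, 'Delimiter', ' ', ...
                 'MultipleDelimsAsOne', true, 'CollectOutput', true);
    fclose(fid);

    block = single(C{1});                % keep only numeric columns
    n = size(block, 1);
    m.data(rowsDone+1 : rowsDone+n, 1:9) = block;
    rowsDone = rowsDone + n;
end
```

Cross-file comparisons can then index into m.data by row range, pulling only the slices needed at any one time.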
The problem is that I need to do a comparison across all the data in that big CSV file, so the smaller-chunks strategy would require cross-file comparisons. I tried loading only a few columns of the entire 6 GB, and MATLAB doesn't seem happy and stopped working :[
But which columns do you need? All of them? And what kind of processing/comparison do you have to perform?


Asked on 16 Apr 2013
