How to import some large data please

Hi all, I have a file called DJ.csv which has 5 columns: 1) dates (01/02/2007), 2) times (30.42.0), 3) prices (12553, 12442), 4) codes (DJ123) and 5) trade size.
I want to import columns 3 and 5 (price and trade size) into MATLAB. I am having some trouble because the CSV is quite big.
I tried this:
fileID = fopen('K:\test\test\DJ.csv');
A = fread(fileID,'double');
fclose(fileID);
But it only gives me a vector of values which are not the same as my data. Any help would be very much appreciated.
Thanks.

1 Comment

As a note, importdata works, but it is not suitable for very large files.


 Accepted Answer

dpb on 25 Dec 2013
Edited: dpb on 26 Dec 2013
fread is for unformatted (binary) stream files; you have a formatted, delimited text file, so see
doc textscan % and friends
If you really only want/need the two columns, something like this (air-code, untested)
[p,s]=textread('K:\test\test\DJ.csv','%*s%*s,%f%*f%f','delimiter',',');
ought to do, unless the third column really does use a comma as the decimal point inside a comma-delimited file. In that case you've got a problem: you'll have to read three values instead of just two, preprocess the file, or otherwise handle the decimal separator, because MATLAB can't (and you can't expect it to) tell a comma delimiter from a decimal comma.
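To illustrate the "read three values" option, here is a minimal sketch. It assumes, hypothetically, that each line looks like 01/02/2007,00:15:00.000,12553,12442,DJH07,1, where 12553,12442 stands for the price 12553.12442, and that every row has a fractional part. The sample file, the six-field layout and the variable names are assumptions for illustration, not the asker's actual data.

```matlab
% Hypothetical sample: the price uses a decimal comma inside a
% comma-delimited file, so every line splits into six fields.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '01/02/2007,00:15:00.000,12553,12442,DJH07,1\n');
fprintf(fid, '01/02/2007,00:21:58.000,12554,05,DJH07,2\n');
fclose(fid);

% Skip date and time, read the integer part as a number, read the
% fractional digits as text (so leading zeros survive), skip the code,
% then read the trade size.
fid = fopen(fname, 'r');
c = textscan(fid, '%*s%*s%f%s%*s%f', 'Delimiter', ',');
fclose(fid);
delete(fname);

% Rebuild the price from its two halves.
prices = c{1} + str2double(strcat('0.', c{2}));
tradesize = c{3};
```

Reading the fractional digits with %s rather than %f matters here: as a number, 05 would become 5 and the position of the decimal digits would be lost.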

7 Comments

Mate 2u on 2 Jan 2014
Edited: Mate 2u on 2 Jan 2014
Hi, unfortunately this does not work.
My data is in this form:
01/02/2007 21:58.0 12541 DJH07 1
01/02/2007 22:50.0 12541 DJH07 1
01/02/2007 30:42.0 12545 DJH07 1
01/02/2007 11:31.0 12553 DJH07 2
01/02/2007 51:48.0 12554 DJH07 2
01/02/2007 13:30.0 12554 DJH07 1
01/02/2007 16:14.0 12554 DJH07 3
Could somebody help me please?
fid = fopen('K:\test\test\DJ.csv', 'r');
datacell = textscan(fid,'%*s%*s,%f%*s%f','delimiter',',');
fclose(fid);
prices = datacell{1};
tradesize = datacell{2};
Hi there,
Still not working.
When I run the above, I get fid = 3 and datacell is a [1x2 cell] whose entries are empty, so prices and tradesize are empty too.
Note that the data above was pasted from Excel. Thanks for all your help.
The question is what the actual file looks like. Is there perhaps a header row ahead of the data, so that you also need 'headerlines',1 as an argument pair in the textscan call?
At least fid=3 indicates the file did open successfully.
Remember when you're testing to always either
frewind(fid)
or
fid= fclose(fid);
and then reopen between attempts or you'll leave the file pointer somewhere besides the beginning which will be bound to cause confusion at best.
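A minimal sketch of the file-pointer effect, using a two-line sample file written to a temporary location (the file and the variable names first, second and third are assumptions for illustration):

```matlab
% Write a tiny sample file in the same comma-delimited layout.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '01/02/2007,00:15:00.000,12540,DJH07,1\n');
fprintf(fid, '01/02/2007,00:21:58.000,12541,DJH07,1\n');
fclose(fid);

fmt = '%*s%*s%f%*s%f';
fid = fopen(fname, 'r');

% First attempt reads the whole file and leaves the pointer at the end.
first = textscan(fid, fmt, 'Delimiter', ',');

% Second attempt starts at end-of-file, so it comes back empty.
second = textscan(fid, fmt, 'Delimiter', ',');

% After rewinding, the same call sees the data again.
frewind(fid);
third = textscan(fid, fmt, 'Delimiter', ',');

fclose(fid);
delete(fname);
```

Here second{1} should be empty while first{1} and third{1} hold the same prices, which is the kind of confusion a forgotten frewind can cause when retrying different format strings.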
Hi there, there is no header row, just data. The data as opened in Notepad is the following:
01/02/2007,00:15:00.000,12540,DJH07,1
01/02/2007,00:21:58.000,12541,DJH07,1
01/02/2007,00:22:50.000,12541,DJH07,1
01/02/2007,00:30:42.000,12545,DJH07,1
01/02/2007,01:11:31.000,12553,DJH07,2
01/02/2007,01:51:48.000,12554,DJH07,2
01/02/2007,02:13:30.000,12554,DJH07,1
01/02/2007,02:16:14.000,12554,DJH07,3
01/02/2007,02:21:40.000,12554,DJH07,1
01/02/2007,02:26:48.000,12558,DJH07,1
01/02/2007,02:50:44.000,12555,DJH07,1
01/02/2007,03:14:57.000,12557,DJH07,1
01/02/2007,03:22:41.000,12559,DJH07,1
Each data entry is on its own line in the Notepad file. Thanks so much for your help.
datacell = textscan(fid,'%*s%*s%f%*s%f','delimiter',',');
The previous version had a stray comma in the format.
Thank you, worked well.
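Since the original concern was file size, the same format can also be read in blocks by passing a repeat count to textscan. This is a sketch only: the sample file, the chunk size and the names chunkRows, priceParts and sizeParts are assumptions for illustration, and a real file would want a much larger chunk.

```matlab
% Small sample file in the same layout as the data shown above
% (written to a temporary location purely for illustration).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '01/02/2007,00:15:00.000,12540,DJH07,1\n');
fprintf(fid, '01/02/2007,00:21:58.000,12541,DJH07,1\n');
fprintf(fid, '01/02/2007,01:11:31.000,12553,DJH07,2\n');
fclose(fid);

fmt = '%*s%*s%f%*s%f';   % skip date, time and code; keep price and size
chunkRows = 2;           % hypothetical; use e.g. 1e5 or more for a real file
priceParts = {};
sizeParts = {};

fid = fopen(fname, 'r');
while ~feof(fid)
    % Apply the format at most chunkRows times per call.
    c = textscan(fid, fmt, chunkRows, 'Delimiter', ',');
    if isempty(c{1})
        break            % nothing more could be read
    end
    priceParts{end+1} = c{1};
    sizeParts{end+1} = c{2};
end
fclose(fid);
delete(fname);

% Join the per-chunk pieces into two column vectors.
prices = vertcat(priceParts{:});
tradesize = vertcat(sizeParts{:});
```

If the two columns fit in memory, a single textscan call as above is simpler; the block-wise loop is mainly useful when each chunk can be processed and discarded before the next one is read.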



Asked:

on 24 Dec 2013

Commented:

on 5 Jan 2014
