File is all numeric, but csv read does not work fully

Hi, I'm trying to read in a bunch of data files one at a time, each as a matrix, use the find function to locate a certain z location, and then store that row of data in a new matrix. My problem is that no matter what I do, I get this error:
??? Error using ==> dlmread at 145
Mismatch between file and format string.
Trouble reading number from file (row 781666, field 4) ==>
Error in ==> csvread at 52
m=dlmread(filename, ',', r, c);
Error in ==> Velocity_AtPt_vs_Time at 67
datafile=csvread(fullname,1);
The data files all have the same layout: 1 row of column headers followed by 14 columns of all-numeric data. I set csvread to skip the first row and read everything else. My files are approximately 1 million rows x 14 columns.
What's happening is that the code executes for 69 data files, doing exactly the steps I want and filling the new matrix properly, and then stops and gives me this error after the 69th. I tried removing the 70th and 71st files to see what happens; it now stops at 67 files. Very odd. If anyone has suggestions, please let me know! Thanks for reading.
This is the loop that produces the error message:
for k = 1:numel(filenames)
    % Create full file name and partial filename
    fullname = [currentfolder filesep NEWFileNames(k).name];
    % Read in data
    datafile = csvread(fullname,1);
    [rr1,cc1] = find(datafile(:,z)==0.0075000000008515);
    firstrow1 = rr1(1,1);
    firscol1 = cc1(1,1);
    dataset1(k,:) = datafile(firstrow1,:);
end
Note: rr1 and cc1 are there only so that I can take the first instance where this value shows up; they are not the cause of the error.
  15 Comments
Jenna P on 19 Apr 2016
Edited: Jenna P on 19 Apr 2016
I still have not found a solution to this problem. It does not make sense to me.
Edit: Actually, using xlsread instead of csvread may have worked, but it is painfully slow.
dpb on 19 Apr 2016
Did you try the textscan solution? The first response in my earlier Answer should work by simply substituting it (plus the matching fopen/fclose pair, of course) for csvread.
I'd strongly suggest making that attempt before going to xlsread. If something peculiar to csvread/dlmread is causing the error (which I think is a resource issue), textscan is standalone, so if it also aborts, that's a strong indication the problem is more fundamental.
Also, did you file a Service Request with TMW Support on the issue?
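The substitution suggested in this comment might look roughly like the sketch below. It is only an illustration: the sample file, its three columns, and the variable c are made up so that the snippet runs on its own; in the real loop, fullname would come from the file list as in the question.

```matlab
% Hypothetical sample file so the sketch runs on its own
fullname = [tempname '.csv'];
fid = fopen(fullname, 'w');
fprintf(fid, 'x,y,z\n');                       % one header row, as in the question
fprintf(fid, '%g,%g,%g\n', [1 2 3; 4 5 6].');  % two data rows
fclose(fid);

% Stand-in for: datafile = csvread(fullname,1);
fid = fopen(fullname, 'r');
if fid < 0
    error('Could not open %s', fullname);
end
c = textscan(fid, '', 'HeaderLines', 1, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);                                   % pair every fopen with an fclose
datafile = c{1};                               % numeric matrix, one row per record

delete(fullname);                              % remove the sample file
```

Closing the file on every pass matters in a loop over many files, since each open file holds a file handle until it is released.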

Answers (1)

dpb on 7 Apr 2016
Edited: dpb on 20 Apr 2016
OK, to separate this from the long-winded chain of comments: this isn't the full answer yet, but a "getting started" for a textscan solution.
>> fid=fopen('file70_part.csv'); % open the file
>> d=cell2mat(textscan(fid,'','headerlines',1,'delimiter',',','collectoutput',1));
>> whos d
  Name       Size            Bytes  Class     Attributes
  d         26x14             2912  double
>> fid=fclose(fid);
>>
The above is all that's needed to read the full file. There are a couple of things I've done that are worth noting:
  1. Used the empty string '' for the format string. This has the effect that Matlab determines the number of fields per record automagically and returns the proper shape; otherwise you have to know the number of fields per record and write a specific format string to match.
  2. Used cell2mat around the textscan call to return the data as a double array rather than the cell array textscan otherwise returns. 'collectoutput' serves to make a single array, not 14.
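To make the two points above concrete, here is a small sketch that reads the same file with and without 'CollectOutput'. The sample file and the names fname, perColumn, and collected are invented for the illustration, and the file has 3 columns rather than 14 to keep it short.

```matlab
% Hypothetical 2-by-3 sample file (no header) so the sketch is self-contained
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%g,%g,%g\n', [1 2 3; 4 5 6].');
fclose(fid);

% Without 'CollectOutput': textscan returns one cell per column
fid = fopen(fname, 'r');
perColumn = textscan(fid, '', 'Delimiter', ',');
fclose(fid);

% With 'CollectOutput': the numeric columns are gathered into a single cell
fid = fopen(fname, 'r');
collected = textscan(fid, '', 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);

d = cell2mat(collected);   % plain double array, one row per record
delete(fname);             % remove the sample file
```

Comparing size(perColumn) with size(collected) shows the difference in shape; in both reads the empty format string leaves the column count to textscan.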
What's not shown here is reading a counted number of records at a time. That can be as simple as the following (after the fopen, of course):
>> fgetl(fid); % get, throwaway the header row
>> while ~feof(fid) % until run out of data
     d=cell2mat(textscan(fid,'',5,'delimiter',',','collectoutput',1));
     d(:,14).',end
ans =
0.0079 0.0078 0.0076 0.0070 0.0071
ans =
0.0072 0.0074 0.0075 0.0088 0.0087
ans =
0.0085 0.0084 0.0083 0.0081 0.0080
ans =
0.0079 0.0078 0.0076 0.0075 0.0074
ans =
0.0071 0.0072 0.0087 0.0085 0.0084
ans =
0.0083
>>
The last read returns a short group since the record count isn't evenly divisible by 5; there were 26 lines of data. In practice you'll leave the loop early because (hopefully) your search for the particular value will have succeeded; at that point you break, fclose, do whatever you need with the data you found, and go on to the next file.
NB: An earlier version of this loop still had the 'headerlines',1 parameter, so it skipped a record on every pass. The single header record at the beginning of the file is already handled by the fgetl call before the loop.
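Putting the chunked read and the early exit together, one possible shape for the per-file search is sketched below. Everything specific in it is an assumption made for the illustration: the sample file, the z column index zcol, the target value, the chunk size nchunk, and the tolerance tol (the question compares with ==, which corresponds to tol = 0).

```matlab
% Hypothetical sample file: header row plus 7 data rows, z in column 3
fname = [tempname '.csv'];
rows = [(1:7).', (11:17).', [0.5 0.4 0.3 0.25 0.2 0.25 0.1].'];
fid = fopen(fname, 'w');
fprintf(fid, 'x,y,z\n');
fprintf(fid, '%g,%g,%g\n', rows.');
fclose(fid);

zcol     = 3;      % column holding z (assumed)
target   = 0.25;   % value to look for (assumed)
tol      = 1e-12;  % comparison tolerance (assumed; 0 gives an exact match)
nchunk   = 3;      % records per textscan call (assumed; use far more for big files)
foundrow = [];     % stays empty if the value never appears

fid = fopen(fname, 'r');
fgetl(fid);                                % discard the header row once
while ~feof(fid)
    d = cell2mat(textscan(fid, '', nchunk, 'Delimiter', ',', 'CollectOutput', 1));
    if isempty(d), break, end              % nothing more to read
    idx = find(abs(d(:,zcol) - target) <= tol, 1);   % first match in this chunk
    if ~isempty(idx)
        foundrow = d(idx,:);               % keep the whole record
        break                              % stop reading this file
    end
end
fclose(fid);
delete(fname);                             % remove the sample file
```

Because the loop stops at the first match, only the part of the file up to that record is ever read, and only one chunk is held in memory at a time.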
