Need to open and read a large .dat file and then plot it; the problem is that MATLAB is not reading all the lines of the file

The .dat file contains 42672 lines of data that need to be read and plotted. This is the code I have so far:
fileID=fopen('F:\code\matlab\GVRs\data\met\met_backup\full_met_files\met_CR1000_met_all.dat','r');
while ~feof(fileID)
tline = fgetl(fileID);
S = textscan(fileID,'%s %d %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f', 'HeaderLines', 4, 'Delimiter', ','); % HeaderLines: number of rows to skip. Delimiter ',' so MATLAB splits fields on commas instead of assuming whitespace
newChr = strrep(S{1},'"','' );
disp(newChr);
date =strrep(newChr,'''',''); %get rid of single quote
DateString = date;
formatIn = 'yyyy-mm-dd HH:MM:SS';
DateNumber = datenum(DateString,formatIn);
end
I added fgetl(fileID) because originally I had only the textscan call shown below, and with that code not all of the lines in the file are read: MATLAB reads only around 400 of them.
fileID=fopen('F:\code\matlab\GVRs\data\met\met_backup\full_met_files\met_CR1000_met_all.dat','r');
S = textscan(fileID,'%s %d %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f', 'HeaderLines', 4, 'Delimiter', ','); % HeaderLines: number of rows to skip. Delimiter ',' so MATLAB splits fields on commas instead of assuming whitespace
  4 Comments
Walter Roberson on 29 Nov 2022
fgetl() reads one line from the file.
There is no difference in final file position or textscan output between
fgetl(fileID);
textscan(fileID, format, 'HeaderLines', 4)
compared to
textscan(fileID, format, 'HeaderLines', 5) %with no fgetl
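A minimal sketch for checking this equivalence on a small throwaway file. The file contents, the two-column format, and the variable names are made up for illustration and are not the poster's data:

```matlab
% Write a hypothetical file: 5 header lines followed by 3 data rows.
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'h1\nh2\nh3\nh4\nh5\n');
fprintf(fid, '%d,%f\n', [1 2 3; 0.5 1.5 2.5]);   % rows: 1,0.5 / 2,1.5 / 3,2.5
fclose(fid);

% Variant A: fgetl consumes one line, then textscan skips four more.
fid = fopen(fname, 'r');
fgetl(fid);
A = textscan(fid, '%d %f', 'HeaderLines', 4, 'Delimiter', ',');
posA = ftell(fid);
fclose(fid);

% Variant B: textscan skips all five lines itself.
fid = fopen(fname, 'r');
B = textscan(fid, '%d %f', 'HeaderLines', 5, 'Delimiter', ',');
posB = ftell(fid);
fclose(fid);
delete(fname);

% Both the parsed data and the final file position should agree.
sameOutput = isequal(A, B);
samePosition = (posA == posB);
disp([sameOutput, samePosition])
```

So adding fgetl only shifts which lines count as header; it does not change how far textscan gets into the data.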
Carolina Corella Velarde on 29 Nov 2022
I tried this, but it is only reading the beginning portion of the file, around 400 data points, and not the rest. In total I need MATLAB to read and process 42668 data points.

Answers (1)

Walter Roberson on 29 Nov 2022
S = textscan(fileID,'%s %d %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f', 'HeaderLines', 4, 'Delimiter', ','); % HeaderLines: number of rows to skip. Delimiter ',' so MATLAB splits fields on commas instead of assuming whitespace
For all format items except %c, the first thing that is done when a % format is encountered is to examine the current character in the file and skip any character that appears in the Whitespace list or the EndOfLine list; if the 'MultipleDelimsAsOne' option is true, then any character in the Delimiter list is also skipped.
By the time the processing of the format item itself starts (except for %c), the file will be positioned at a non-Whitespace character; if 'MultipleDelimsAsOne' is false, it might be positioned at a Delimiter character.
A %s format will read from that (non-whitespace) character until it encounters something in Delimiter or in EndOfLine, or reaches end of file, "consuming" the (first) delimiter or end-of-line character and leaving the file positioned immediately after it. Note that %s does not specifically look for non-numeric characters: %s is entirely happy to treat (for example) '1984' as text. The only way %s can be considered by textscan to fail to match is if end of file was reached; the skipping of initial whitespace and end of line would have zipped through any empty lines and any leading blanks or tabs on such lines, looking for the first non-blank thing and reading that. Encountering something in the Delimiter list immediately is not considered a failure for %s purposes: it just results in that particular entry being recorded as an empty character vector.
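A small sketch of the %s behaviour described above, run on a character vector instead of a file. The input text is hypothetical, chosen to contain a numeric-looking field, an empty field, and a word:

```matlab
% Hypothetical input: '1984', then an empty field, then 'abc'.
txt = '1984,,abc';
C = textscan(txt, '%s', 'Delimiter', ',');
fields = C{1};

% %s accepts the numeric-looking field as text and stores the empty
% field as an empty character vector rather than failing.
firstIsText = ischar(fields{1}) && strcmp(fields{1}, '1984');
secondIsEmpty = isempty(fields{2});
disp(fields)
disp([firstIsText, secondIsEmpty])
```

None of the three fields causes %s to stop, which is why a leading %s will not by itself detect a line that "looks wrong".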
The %d that follows first triggers the discarding of whitespace, as described above. It then looks for characters that can be present in a (possibly complex-valued) floating-point number. If it finds such a number, it converts it to int32() and saves it. If it does not, the match is considered to have failed and textscan stops processing; in the case where the %s succeeded but the %d failed, the cell for the %s has one more entry than the cell for the %d.
Notice that the processing absolutely does not examine the whole line to determine whether the line as a whole matches. If the input was
header1
header2
header3
header4
antaires 1134 (bunch of numbers)
mars -803 (bunch of numbers)
end part1
header1
header2
header3
header4
zernith 3510 (bunch of numbers)
lanroz 3535 (bunch of numbers)
end part2
then the %s%d%f<etc> format would not look at the "end part1" line, decide "oh, that does not match the template of the format", and stop reading when it reached that line. Instead, the %s would discard any leading blanks, read and store the word "end" from the line, and then the %d would get control and fail because 'part1' is not a valid number. The return values would be {'antaires'; 'mars'; 'end'} (three rows) and {[1134; -803]} (two rows), and textscan would be left positioned at the beginning of the 'part1'.
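The first block of that example can be reproduced with a short sketch. It assumes a single %f column stands in for the "(bunch of numbers)", and the temporary file name and variable names are illustrative only:

```matlab
% Build a hypothetical file matching the first block of the example.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'header1\nheader2\nheader3\nheader4\n');
fprintf(fid, 'antaires 1134 1.5\nmars -803 2.5\nend part1\n');
fclose(fid);

fid = fopen(fname, 'r');
C = textscan(fid, '%s %d %f', 'HeaderLines', 4);
leftover = fgetl(fid);      % whatever remains of the line where parsing stopped
fclose(fid);
delete(fname);

names = C{1};
ints = C{2};
rowCounts = cellfun(@numel, C);   % number of entries collected per column
disp(rowCounts)
disp(leftover)
```

If the description above holds, the %s column ends up one entry longer than the others, with 'end' as its last element.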
I predict, then, that the reason you are not reading as many lines as you expect is that you have incorrect expectations about how textscan determines the end of the repetitions of the format. That %s at the beginning is going to give you problems: it will happily consume any leading keyword or change in style that a human would look at and say "Oh, that's obviously the end of the block". textscan() %s just knows it is text and will grab it; textscan() never looks ahead to see whether the rest of the format matches, and instead just keeps reading until one of the format items fails.
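One way to find where textscan gave up is to check feof after the call, compare the column lengths, and read the rest of the current line with fgetl. The sketch below does this on a made-up comma-separated file whose third data row contains a non-numeric field; the file contents and the three-column format are assumptions for illustration, not the actual CR1000 file:

```matlab
% Hypothetical file: 4 header lines, then rows where the third is malformed.
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'h1\nh2\nh3\nh4\n');
fprintf(fid, 'rowA,1,12.5\nrowB,2,13.0\nrowC,3,oops\nrowD,4,14.0\n');
fclose(fid);

fid = fopen(fname, 'r');
C = textscan(fid, '%s %d %f', 'HeaderLines', 4, 'Delimiter', ',');
stoppedEarly = ~feof(fid);          % true if textscan quit before end of file
counts = cellfun(@numel, C);        % entries collected per column
completeRows = min(counts);         % rows for which every field was parsed
if stoppedEarly
    rest = fgetl(fid);              % remainder of the line that broke the format
    fprintf('Stopped after %d complete rows; column counts %s\n', ...
        completeRows, mat2str(counts));
    fprintf('Unparsed text at the stopping point: %s\n', rest);
end
fclose(fid);
delete(fname);
```

Running the same checks against the real file (with its 17-field format) should show which line, around row 400, contains the field that one of the format items cannot convert.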
