reading only a portion of a text file
I would like to read in a text file that contains a header and footer of information, where the number of rows of the header/footer, and number of rows to read can vary.
Here is an example of a row of data I would like to read from the text file.
All rows start with ARR++ and are delimited with ':'. I know I most likely need to use fopen / fgetl / textscan, but I am hoping someone can help set this up.
One other thing: the rows I would like to read contain date information like 2013320133. Ideally I would read only the first 5 digits of that date and split the year and quarter into separate columns -- 20133 --> 2013 3
Here is a better example of the type of file I would like to read in. I am only interested in the ARR++ lines. I would be interested to have only the first 5 digits of 2013420134. Thanks a lot.
UNA:+.? '
UNB+UNC:140305:1444++'
UNH+:2:1:E6'
BGM+74'
NA+Z02+'
NAD+M+50'
ND+MS+C2'
STS+3+7'
DM+242:20144:203'
GI+AR3'
GS+1:::-'
ARR++Q:S:C:A:1N:2013420134:708:1234.323:A:N'
ARR++Q:S:C:A:1N:2013420134:708:12234.323:A:N'
ARR++Q:S:C:A:1N:2013420134:708:133234.323:A:N'
ARR++Q:S:C:A:1N:2013420134:708:132234.323:A:N'
ARR++Q:S:C:A:1N:2013420134:708:123334.323:A:N'
ARR++Q:S:C:A:1N:2013420134:708:1232134.323:A:N'
ARR++Q:S:C:A:1N:2013420134:708:123324.323:A:N'
ARR++Q:S:C:A:1N:2013420134:708:123234.323:A:N'
UNT+16+'
UNZ+1+I3800'
5 Comments
Image Analyst
on 5 Apr 2014
You'll learn it better if you set it up yourself. You may also find the function strfind() useful for ignoring lines without ARR++ in them.
dpb
on 5 Apr 2014
An actual exact copy of a short segment of a file would help more than a paraphrased one--can't tell what's editorial and what's data as posted.
One useful feature in textscan is the 'commentstyle' optional argument--that'll probably allow you to account for the variable header length if it is delineated as shown w/ the ";" by using them as matching pairs.
The question is what does the actual dataline look like--are these actual lines or header/format lines w/ the ;start/;end messages?
Or, how large a file? If not large, it would be trivial to fgetl a line at a time and find the 'ARR++' strings and then just parse them. The line-at-a-time inefficiency for the i/o isn't generally that bad if files aren't quite large...
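For a file small enough to hold in memory, an alternative to the line-at-a-time loop is to read the whole text at once and pull out the ARR++ lines with a regular expression. This is only a sketch: it assumes each segment sits on its own line, and it uses an inline sample string where a real script would call fileread on the (hypothetical) file name.

```matlab
% In practice: txt = fileread('test.dat');   % file name is an assumption
% A three-line sample stands in for the file contents here.
txt = sprintf('%s\n', 'BGM+74''', ...
    'ARR++Q:S:C:A:1N:2013420134:708:1234.323:A:N''', ...
    'UNT+16+''');

% Match every line that begins with ARR++ ('+' must be escaped in a regex).
arrLines = regexp(txt, '^ARR\+\+.*$', 'match', ...
    'lineanchors', 'dotexceptnewline');

numArr = numel(arrLines);   % number of ARR++ lines found
```

The header and footer lengths never need to be known, because any line that does not begin with ARR++ is simply not matched.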
Jeff
on 5 Apr 2014
Image Analyst
on 5 Apr 2014
In the past 6 hours, have you at least given fgetl() or textscan() a try? Or do you really really need us to do it 100% for you?
Accepted Answer
More Answers (2)
Image Analyst
on 5 Apr 2014
OK Jeff I did it for you. It just took a couple of minutes. I copied the data you gave to a test.dat file. Then I wrote code to read it in using fgetl() and search for lines that start with "ARR++Q:S:C:A:1N:" based on code I got in the help for fgetl. Then I extracted the 5 numerical characters from the string and converted it to a double number. Here is the code for you:
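fid = fopen('test.dat');
if fid == -1
    error('Could not open test.dat');
end
prefix = 'ARR++Q:S:C:A:1N:';
output = []; % Numeric date codes, one per valid line.
k = 1; % Counter for lines that are valid.
tline = fgetl(fid);
while ischar(tline)
    disp(tline) % Echo every line as it is read.
    prefixLocation = strfind(tline, prefix);
    if ~isempty(prefixLocation)
        % The 5 characters right after the prefix hold year + quarter, e.g. '20134'.
        first = prefixLocation(1) + length(prefix);
        subString = tline(first : first + 4);
        output(k) = str2double(subString);
        k = k + 1;
    end
    tline = fgetl(fid);
end
fclose(fid);
% Print output to command window:
output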
Results in the command window:
output =
20134 20134 20134 20134 20134 20134 20134 20134
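The original question also asked for the year and quarter in separate columns. One way to get there from the numeric values above is integer arithmetic; this is a minimal sketch that assumes every value has the 5-digit form YYYYQ, and the variable names year, quarter and yearQuarter are made up for illustration.

```matlab
% Example values in the YYYYQ form produced by the loop above.
output = [20134 20134 20133];

year = floor(output / 10);   % drop the last digit, e.g. 20134 -> 2013
quarter = mod(output, 10);   % keep the last digit, e.g. 20134 -> 4

% One row per ARR++ line: [year quarter]
yearQuarter = [year(:), quarter(:)];
```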
4 Comments
Jeff
on 5 Apr 2014
Image Analyst
on 5 Apr 2014
Just add this line inside the if block, before k is incremented,
outputStrings{k} = tline;
to save the entire line also.
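Once the full line is saved, the remaining fields can be pulled apart on the ':' delimiter. The sketch below works on one sample line from the question; which field means what is an assumption based only on position (field 6 as the date, field 8 as the decimal number), and the variable names are hypothetical.

```matlab
% One saved line, as it would appear in outputStrings{k}.
tline = 'ARR++Q:S:C:A:1N:2013420134:708:1234.323:A:N''';

fields = strsplit(tline, ':');   % cell array of the colon-separated pieces

dateField = fields{6};                     % '2013420134'
yearVal = str2double(dateField(1:4));      % first 4 digits -> year
quarterVal = str2double(dateField(5));     % 5th digit -> quarter
value = str2double(fields{8});             % the decimal field
```

Splitting on ':' avoids hard-coding character positions, so it keeps working if the text before the date changes length.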