Remove or ignore a certain row while reading from text files?
Hello. I have a number of text files in different subfolders of one main folder. My task was to read all the text files, convert the information into a particular format in a cell array, and then write the cell array to an Excel sheet.
The task is completely done, but the data in the text files has changed slightly. The new files I have received contain one extra row. Without that row my script runs fine, but with the new row I get this error:
Subscript indices must either be real positive integers or logicals.
Error in taskFinal (line 52)
newPDU(i) = newPDU(i-1);
What I need is a little help dealing with this useless row.
It is row number 37 in the files. While reading the data from the text files, I would like either to ignore that row or to remove the line from the cell array once the file has been read into it. The row contains only one word, which is " [7E8] ". The m-file and one text file are attached below.
Thank you for any kind of help.
EDIT: Text file attached.
EDIT: The unwanted row is present in some files while in some files it is not.
4 Comments
per isakson
on 22 Jul 2016
Edited: per isakson
on 23 Jul 2016
"however there is a slight change in the data in text files"   This reminds me of a function I wrote a long time ago to read a huge text file with descriptive information from a building automation system, BAS. Each revision of the BAS brought a number of changes to the text file format. The purpose of many of the changes was just to make the text more readable on screen. I guess it wasn't intended to be read automatically. Eventually, I gave up maintaining the function.
Question: Do you foresee a need to maintain this script to account for changes in the file format and/or requirements to extract more information? Currently, you only read a fourth of the file.
Accepted Answer
per isakson
on 21 Jul 2016
Edited: per isakson
on 26 Jul 2016
A quick and dirty solution: delete the row that causes trouble. Try
>> tic, preTaskFinal( 'h:\m\cssm\SS Escape EPA Hwy Cat Mon _6-2-2016_9-25-40 AM.txt' ); toc
Elapsed time is 0.580805 seconds.
where
function preTaskFinal( filespec )
    % Read the file into a cell array with one string per line.
    fid = fopen( filespec, 'r' );
    cac = textscan( fid, '%s', 'Delimiter','\n' );
    [~] = fclose( fid );
    cac = cac{1};
    % Drop every row that starts with the spurious marker.
    is_spurious_row = strncmp( cac, '[7E8]', 5 );
    cac( is_spurious_row ) = [];
    % Write the remaining rows to a temporary file.
    fid = fopen( 'TempTxt4TaskFinal.txt', 'w' );
    for jj = 1 : length( cac )
        fprintf( fid, '%s\r\n', cac{jj} );
    end
    [~] = fclose( fid );
end
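
If the temporary file is not wanted, the same filtering can be done on the cell array in memory, which is the other option the question asks about. This is only a minimal sketch: the three sample rows are made up, and it assumes the file has already been read into a cell array with one row per cell, as textscan does above.

```matlab
% Hypothetical sample standing in for the rows read from one text file.
cac = { 'row 36'; '  [7E8]  '; 'row 38' };

% Flag rows whose trimmed content is exactly the unwanted marker.
is_spurious_row = strcmp( strtrim( cac ), '[7E8]' );

% Drop the flagged rows. A file without the marker passes through unchanged,
% because the logical index is then all false.
cac( is_spurious_row ) = [];
```

Since the index is simply empty when no row matches, the same lines can be run on files with and without the extra row.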
 
Here is a different implementation.
>> source_spec = 'SS Escape EPA Hwy Cat Mon _6-2-2016_9-25-40 AM.txt';
>> row_content = '[7E8]';
>> target_spec = 'temp.txt';
>> tic, remove_specific_row( source_spec, row_content, target_spec ), toc
Elapsed time is 0.150321 seconds.
where
function remove_specific_row( source_spec, row_content, target_spec )
str = fileread( source_spec );
xpr = sprintf( '\\<[ ]*%s\\s+?\\n' ...
, regexptranslate('escape',row_content) );
buf = regexprep( str, xpr, '', 'once' );
fid = fopen( target_spec, 'w' );
fprintf( fid, '%s', buf );
fclose( fid );
end
And a slightly different one, which is faster:
>> tic, remove_specific_row( source_spec, row_content, target_spec ), toc
Elapsed time is 0.028050 seconds.
where
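function remove_specific_row( source_spec, row_content, target_spec )
    % Read the whole file into one string.
    str = fileread( source_spec );
    % Match a row that follows a newline and holds only row_content
    % (optionally padded with whitespace), including its line ending.
    xpr = sprintf( '(?<=\\n)\\s*?%s\\s*?\\n' ...
                 , regexptranslate('escape',row_content) );
    % Remove the first such row and write the result to the target file.
    buf = regexprep( str, xpr, '', 'once' );
    fid = fopen( target_spec, 'w' );
    fprintf( fid, '%s', buf );
    fclose( fid );
end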
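
To see what the lookbehind expression does without touching any files, it can be tried on a short string. This is a sketch only: the sample text is invented, and it assumes CRLF line endings and some padding around the marker. Note that (?<=\n) requires a preceding newline, so a marker on the very first line of a file would not be matched.

```matlab
% Hypothetical file content with CRLF line endings and padded marker row.
str = sprintf( 'row 36\r\n [7E8] \r\nrow 38\r\n' );
row_content = '[7E8]';

% Same expression as in remove_specific_row above.
xpr = sprintf( '(?<=\\n)\\s*?%s\\s*?\\n' ...
             , regexptranslate( 'escape', row_content ) );
buf = regexprep( str, xpr, '', 'once' );

% Text without the marker should come back unchanged.
clean     = sprintf( 'row 36\r\nrow 38\r\n' );
clean_buf = regexprep( clean, xpr, '', 'once' );
```

Here buf should hold only the two neighbouring rows, and clean_buf should equal clean, which matters because the extra row is present in only some of the files.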
5 Comments
per isakson
on 26 Jul 2016
Edited: per isakson
on 26 Jul 2016
I added a faster (and "better") implementation to the answer.