Remove or ignore a certain row while reading from text files?

Hello. I have a number of text files in different subfolders of one main folder. My task was to read all the text files, convert the information into a particular format in a cell array, and then write the cell array to an Excel sheet.
The task is completely done, but the data in the text files has changed slightly. The new files I have received contain one extra row. Without that row my script runs fine. With the new row, I get this error:
Subscript indices must either be real positive
integers or logicals.
Error in taskFinal (line 52)
newPDU(i) = newPDU(i-1);
What I need is a little help with how to deal with this useless row.
It is row number 37 in the files. While reading the data from the text files, I would like either to ignore that row or simply to remove it from the cell array once the file has been read into the cell array. The row contains only one word, which is " [7E8] ". The m-file and one text file are attached below.
Thank you for any kind of help.
EDIT: Text file attached.
EDIT: The unwanted row is present in some files but not in others.
  4 Comments
per isakson on 22 Jul 2016
Edited: per isakson on 23 Jul 2016
"however there is a slight change in the data in text files" This reminds me of a function I wrote a long time ago to read a huge text file with descriptive information from a building automation system, BAS. Each revision of the BAS brought a number of changes to the text file format. The purpose of many of the changes was just to make the text more readable on screen. I guess it wasn't intended to be read automatically. Eventually, I gave up maintaining the function.
Question: Do you foresee a need to maintain this script to account for changes in the file format and/or requirements to extract more information? Currently, you only read a fourth of the file.
yousaf obaid on 25 Jul 2016
Edited: yousaf obaid on 25 Jul 2016
I will probably not need to extract more information from the text files. I might need to in the near future, depending on the needs of my colleague, but right now I only have to read one fourth of the file, as you noted.


Accepted Answer

per isakson on 21 Jul 2016
Edited: per isakson on 26 Jul 2016
A quick and dirty solution: delete the row that causes trouble before running your script. Try
>> tic, preTaskFinal( 'h:\m\cssm\SS Escape EPA Hwy Cat Mon _6-2-2016_9-25-40 AM.txt' ); toc
Elapsed time is 0.580805 seconds.
where
function preTaskFinal( filespec )
    % Read the file as one string per line.
    fid = fopen( filespec, 'r' );
    cac = textscan( fid, '%s', 'Delimiter','\n' );
    [~] = fclose( fid );
    cac = cac{1};
    % Flag and delete every row that starts with '[7E8]'.
    is_spurious_row = strncmp( cac, '[7E8]', 5 );
    cac( is_spurious_row ) = [];
    % Write the remaining rows to a temporary file.
    fid = fopen( 'TempTxt4TaskFinal.txt', 'w' );
    for jj = 1 : length( cac )
        fprintf( fid, '%s\r\n', cac{jj} );
    end
    [~] = fclose( fid );
end
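The filtering step of preTaskFinal can be tried in memory, without touching any file. The following is a minimal sketch on made-up sample rows (the row contents are hypothetical, not taken from the attached file). It also shows a variant that assumes the unwanted row may carry leading or trailing blanks, which the question's " [7E8] " hints at but does not confirm.

```matlab
% Hypothetical sample rows standing in for the lines of one text file.
cac = { 'Mode 01'; '[7E8]'; 'PID 0C'; '  [7E8]'; 'PID 0D' };

% strncmp compares only the first 5 characters, so a row is flagged
% only when '[7E8]' starts in column 1.
is_spurious_row = strncmp( cac, '[7E8]', 5 );
kept_strict = cac( ~is_spurious_row );

% Assumption: the row may be padded with blanks. Trimming each row
% first and comparing the whole row catches the padded case as well.
is_spurious_trimmed = strcmp( strtrim( cac ), '[7E8]' );
kept_trimmed = cac( ~is_spurious_trimmed );
```

Note that the two tests differ in the other direction too: strncmp also flags a row that merely begins with '[7E8]' and continues with other text, whereas the strcmp variant only flags rows that consist of '[7E8]' alone.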
Here is a different implementation.
>> source_spec = 'SS Escape EPA Hwy Cat Mon _6-2-2016_9-25-40 AM.txt';
>> row_content = '[7E8]';
>> target_spec = 'temp.txt';
>> tic, remove_specific_row( source_spec, row_content, target_spec ), toc
Elapsed time is 0.150321 seconds.
where
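function remove_specific_row( source_spec, row_content, target_spec )
    % Read the whole file into one string.
    str = fileread( source_spec );
    % regexptranslate escapes the brackets so that row_content is matched literally.
    xpr = sprintf( '\\<[ ]*%s\\s+?\\n' ...
                 , regexptranslate('escape',row_content) );
    % Remove the first match only and write the result to the target file.
    buf = regexprep( str, xpr, '', 'once' );
    fid = fopen( target_spec, 'w' );
    fprintf( fid, '%s', buf );
    fclose( fid );
end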
And here is a slightly different one, which is faster:
>> tic, remove_specific_row( source_spec, row_content, target_spec ), toc
Elapsed time is 0.028050 seconds.
where
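function remove_specific_row( source_spec, row_content, target_spec )
    % Read the whole file into one string.
    str = fileread( source_spec );
    % The lookbehind (?<=\n) anchors the match right after a line break.
    xpr = sprintf( '(?<=\\n)\\s*?%s\\s*?\\n' ...
                 , regexptranslate('escape',row_content) );
    % Remove the first match only and write the result to the target file.
    buf = regexprep( str, xpr, '', 'once' );
    fid = fopen( target_spec, 'w' );
    fprintf( fid, '%s', buf );
    fclose( fid );
end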
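To see what the lookbehind pattern does, it can be applied to a short string instead of a file. This is a minimal sketch on a made-up string with Windows line endings (the contents are hypothetical). It uses the same pattern as the function above and contrasts the 'once' option with an unrestricted replace.

```matlab
% Hypothetical file contents with two unwanted rows and CRLF line endings.
str = sprintf( 'A 1\r\n[7E8]\r\nB 2\r\n[7E8]\r\nC 3\r\n' );
row_content = '[7E8]';

% Same pattern as in remove_specific_row: the row must follow a line
% break, and any whitespace around it (including '\r') is consumed.
xpr = sprintf( '(?<=\\n)\\s*?%s\\s*?\\n' ...
             , regexptranslate( 'escape', row_content ) );

% With 'once' only the first occurrence is removed.
buf_once = regexprep( str, xpr, '', 'once' );

% Without 'once' every occurrence is removed.
buf_all = regexprep( str, xpr, '' );
```

Two things follow from the pattern itself: because of the lookbehind, a '[7E8]' row on the very first line of a file would not be matched, and with 'once' a second occurrence stays in the output. Neither matters if the row appears exactly once at row 37, as described in the question.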
  5 Comments
yousaf obaid on 26 Jul 2016
Edited: yousaf obaid on 26 Jul 2016
Hello. Thank you for your help. I got it done with a string comparison as you suggested (strcmp rather than strncmp), in a slightly different way. Here is what I did:
if (strcmp(parameter{1}, '[7E8]'))  % compare the first cell with [7E8]
    parameter = parameter(2:end);   % if found, ignore it and start from the next row
end
It looks for "[7E8]" in the first cell of the parameter column and, if it is present there, simply moves on to the second row of the parameter column. I don't know whether it is efficient enough, but it is working for me and that is all I wanted.
Any further input on this issue from your side is appreciated.
Thank you again for your help.
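Since the unwanted row is present in some files and missing in others, the check above has to be a no-op when the row is absent. A minimal sketch of that behaviour follows, using hypothetical parameter columns (the names are invented for illustration). The isempty guard is an addition that is not in the original snippet: without it, parameter{1} raises an indexing error on an empty column.

```matlab
% Hypothetical parameter columns: with the row, without it, and empty.
files = { { '[7E8]'; 'Engine RPM'; 'Vehicle Speed' }, ...
          { 'Engine RPM'; 'Vehicle Speed' }, ...
          {} };

cleaned = cell( size( files ) );
for k = 1 : numel( files )
    parameter = files{k};
    % Added guard: only look at the first cell when there is one.
    if ~isempty( parameter ) && strcmp( parameter{1}, '[7E8]' )
        parameter = parameter(2:end);   % skip the unwanted first row
    end
    cleaned{k} = parameter;
end
```

The first two columns end up identical and the empty one passes through untouched. This only handles the case where "[7E8]" sits in the first cell, which is what the snippet above assumes.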
per isakson on 26 Jul 2016
Edited: per isakson on 26 Jul 2016
I added a faster (and "better") implementation to the answer.

