MATLAB Answers

How to delete rows of characters in Text files?

Lei on 1 Jan 2015
Edited: per isakson on 9 Jan 2015
I am trying to import data from many TXT files, but the files contain rows of text in addition to the numbers. How can I delete the rows that contain characters and create a new TXT file with only the numerical data? An example of the data is as follows:
*****************************************
* Log File Started 11:29:05 Wed Dec 31 2014
* Using PFC3D 4.00-182 (64-bit)
* Serial Number: 262-000-0000-00000
* By:
*
*****************************************
Fish>
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 2
0 0 0 0 0 0 0 0 0
Fish>
*****************************************
* Log File Ended 11:29:07 Wed Dec 31 2014
*****************************************
I would like to delete all the headers and footers. I tried the fgetl function, but only the headers were deleted.


Answers (2)

Pourya Alinezhad on 1 Jan 2015
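Hi there, you can use the following lines of code (nine %n fields, because the example data has nine columns; the default whitespace handling covers the blanks between them):
fid = fopen('txtfile.extension');
textdata = textscan(fid,'%n%n%n%n%n%n%n%n%n','headerlines',8);
fclose(fid);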

  1 Comment

Lei on 2 Jan 2015
The reference link still covers only a method for removing the headers, not the footers. Thank you for this anyway.



per isakson on 2 Jan 2015
Edited: per isakson on 9 Jan 2015
There is no easy way to read blocks of numerical data that are embedded in text. That might not be quite true, as I just learned.
Here are three different functions that read and parse the numerical block of the example file from the question, cssm.txt.
cssm_1 is a straightforward use of textscan. It poses no problems in this case because it is easy to determine the number of lines in the header and in the block of data, respectively. MATLAB evolves gradually and it is easy to miss new behavior. With R2013a it is not necessary to set rows_of_data, the number of times the formatSpec is applied. "[...] and stops when it cannot match formatSpec to the data." is new in the documentation of R2014a.
cssm_2 is based on a different approach. The entire file is read into a string and regexp extracts the blocks of numerical data. str2num converts the blocks to numerical arrays. This function can handle many blocks.
cssm_3 relies on markers. Sometimes the beginning and end of the blocks of tabular data are indicated with special strings. In this case Fish> indicates both the beginning and the end. fileread reads the entire file into a string and regexp extracts the blocks between the beginning and end markers. textscan parses the blocks.
Run on R2013a
>> num = read_block_demo( )
num(:,:,1) =
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 2
0 0 0 0 0 0 0 0 0
num(:,:,2) =
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 2
0 0 0 0 0 0 0 0 0
num(:,:,3) =
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 2
0 0 0 0 0 0 0 0 0
>>
where
function num = read_block_demo()
    filespec     = 'cssm.txt';
    data_frmt    = '%f%f%f%f%f%f%f%f%f';
    rows_of_data = 5;
    header_lines = 8;
    begin_xpr    = '\*{20,}\s+Fish>\s+';
    end_xpr      = '\s+Fish>\s+\*{20,}';
    num(:,:,1) = cssm_1( filespec, data_frmt, rows_of_data, header_lines );
    num(:,:,2) = cssm_2( filespec, 50 );
    num(:,:,3) = cssm_3( filespec, data_frmt, begin_xpr, end_xpr );
    assert( all(all(num(:,:,2)==num(:,:,1)))    ...
        &&  all(all(num(:,:,3)==num(:,:,1)))    ...
        ,   'The methods don''t return identical results' )
end
function num = cssm_1( filespec, data_frmt, rows_of_data, header_lines )
    % textscan skips the header and reads rows_of_data rows
    fid = fopen( filespec );
    cac = textscan( fid, data_frmt, rows_of_data    ...
                ,   'Headerlines'   , header_lines  ...
                ,   'CollectOutput' , true          );
    fclose( fid );
    num = cac{1};
end
function num = cssm_2( filespec, block_size )
    % regexp finds the numerical blocks; the first block is returned
    cac = read_blocks_of_numerical_data( filespec, block_size );
    num = cac{1};
end
function num = cssm_3( filespec, data_frmt, begin_xpr, end_xpr )
    % extract the text between the begin and end markers, then parse it
    str = fileread( filespec );
    cac = regexp( str, ['(?<=',begin_xpr,').+(?=',end_xpr,')'], 'match' );
    cac = textscan( cac{1}, data_frmt, 'CollectOutput', true );
    num = cac{1};
end
function out = read_blocks_of_numerical_data( filespec, block_size, delimiter )
% block_size    lower limit of the number of characters in a numerical block
% delimiter     optional column delimiter, used as part of a regular expression
%
% Within a block all rows must have the same number of "columns".
    narginchk( 2, 3 )
    buffer = fileread( filespec );
    if nargin == 2
        del_xpr = '[ ]+';
        trl_xpr = '[ ]*';
    else
        del_xpr = ['([ ]*',delimiter,'[ ]*)'];
        trl_xpr = ['([ ]*',delimiter,'?[ ]*)'];
    end
    % the optional sign applies to both forms of the number, e.g. -1.5 and -.5
    num_xpr = '([+-]?((\d+(\.\d*)?)|(\.\d+)))';
    sen_xpr = '([EeDd](\+|-)\d{1,3})?'; % optional scientific E notation
    num_xpr = [ num_xpr, sen_xpr ];
    nl_xpr  = '((\r\n)|\n)';
    row_xpr = cat( 2, '(^|', nl_xpr, ')[ ]*('   ...
                ,   num_xpr, del_xpr, ')*'      ...
                ,   num_xpr, trl_xpr, '(?='     ...
                ,   nl_xpr,'|$)'                );
    blk_xpr = ['(',row_xpr,')+'];
    blocks  = regexp( buffer, blk_xpr, 'match' );
    % discard matches that are shorter than block_size characters
    is_long = cellfun( @(str) length(str)>=block_size, blocks );
    blocks(not(is_long)) = [];
    out = cell( 1, length( blocks ) );
    for jj = 1 : length( blocks )
        out{jj} = str2num( blocks{jj} );
    end
end
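The marker idea behind cssm_3 can also be tried without a file. The following is only a minimal sketch, not part of the functions above: the sample string, its three-column layout and the variable names are assumptions made for illustration.

```matlab
% Build a small in-memory sample (hypothetical data, three columns)
str = sprintf( '%s\n', '* header text', 'Fish>' ...
             , '1 1 1', '0 0 0', '2 2 2'        ...
             , 'Fish>', '* footer text' );
% Keep only the text between the two Fish> markers
blk = regexp( str, '(?<=Fish>\s+).+(?=\s+Fish>)', 'match', 'once' );
% textscan accepts a string as input and parses the extracted block
cac = textscan( blk, '%f%f%f', 'CollectOutput', true );
num = cac{1};   % numerical array, one row per line of the block
```

The greedy .+ runs up to the last Fish> marker, so this sketch assumes a single block per string; several blocks would need a lazy quantifier or the approach of cssm_2.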
I learned that textscan in this case handles the free text at the end of a file better than I thought it would.
@Lei, regarding your comment that there is only a method for removing the headers and not the footers: textscan actually ignores the footer automatically in your example, because it stops reading at the first line that does not match the format.
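To get the new text file that the question asks for, the array returned by textscan could be written back to disk. This is a minimal sketch, assuming that every file has the same number of header lines and columns; numeric_block_to_file and its argument names are hypothetical, not an existing MATLAB function.

```matlab
function num = numeric_block_to_file( infile, outfile, n_cols, header_lines )
% Hypothetical helper: read the numerical block of infile with textscan
% and write it, blank-delimited, to outfile.
    fid = fopen( infile, 'r' );
    assert( fid >= 3, 'Cannot open %s', infile )
    % textscan skips the header and stops at the first non-matching line
    cac = textscan( fid, repmat( '%f', 1, n_cols )  ...
                ,   'HeaderLines'   , header_lines  ...
                ,   'CollectOutput' , true          );
    fclose( fid );
    num = cac{1};
    % '%.15g' keeps more digits than the default precision of dlmwrite
    dlmwrite( outfile, num, 'delimiter', ' ', 'precision', '%.15g' );
end
```

For the example file the call might look like numeric_block_to_file( 'cssm.txt', 'cssm_clean.txt', 9, 8 ), and a loop over the result of dir( '*.txt' ) would cover many files. Files whose header length varies would need one of the regexp-based approaches instead.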
