How to import Text File with 2 different Delimiters (how to organize header data and numeric data)

I want to import a text file that contains a header (space-delimited) followed by data (tab-delimited).
The txt-file looks like this:
FORMAT TAB_DELIMITED
NUM_HEADER_BLOCKS 162
NUM_PARAMS 646
PT_COUNT.CND_1 3895
FRAMES.CND_1 16
FILE_TYPE TIME_HISTORY
OPERATION RSP_TO_TAB
DATA_TYPE ASCII_FLOATING_POINT
DATE Fri Jun 23 11:20:24 2017
DELTA_T 9.765625e-02
TOTAL_T 3.803711e+02
PTS_PER_FRAME 256
PTS_PER_GROUP 256
CHANNELS 120
.
.
NUM_ZEROS 5 %end of header with line index 646
RfLongPositionFbk RfLatPositionFbk ...... %start of tab delimited area with the data (120 channels)
mm mm
-12.6182 -4.071238
-12.6192 -4.070237
-12.6182 -4.069237
  1. I want to find the line that contains "NUM_PARAMS" and read its numeric value, which tells me the size of the header section.
  2. After that, I want to read the file up to line 646 into two parts (first the parameter name, second the value).
  3. Then I want to read the data (tab-delimited, 120 channels). Ideally, I could name the channels after the names shown in the line above the units of measurement.
I started by reading the full text file with the following code, to import the header and search for NUM_PARAMS:
fid = fopen(fileName, 'r'); % open the file before calling textscan
s = textscan(fid, '%s%s', 'delimiter', ' ');
fclose(fid);
idx_NUM_PARAMS = find(strcmp(s{1}, 'NUM_PARAMS'), 1, 'first');
NUM_PARAMSdbl = str2double(s{1,2}{idx_NUM_PARAMS,1});
But this also imports the data section as strings, which is not usable because it uses a different delimiter.
So I read the data in a second step:
dataTable = readtable(fileName, 'Delimiter', '\t', 'headerLines',NUM_PARAMSdbl+4,'ReadVariableNames',true);
But this way I cannot label the data with the channel names, only with the line directly above the data (the units of measurement).
Thank you for any hint on how to solve my problem.

Answers (1)

Cedric on 1 Nov 2017
Edited: Cedric on 1 Nov 2017
You may not need to use header information for parsing your file. Look at this example (applied to data.txt attached):
content = fileread( 'data.txt' ) ;
% - Split header/data.
pos = strfind( content, 'RfLongPositionFbk' ) ;
header = strtrim( content(1:pos-1) ) ;
data = content(pos:end) ;
% - Header -> struct with numeric values when possible.
header = regexp( header, '^(\S+)\s+([^\r\n]+)', 'tokens', 'lineanchors' ) ;
header = vertcat( header{:} ) ;
fNames = regexprep( header(:,1), '\W', '_' ) ;
values = strtrim( header(:,2) ) ;
buffer = str2double( values ) ;
isNum = ~isnan( buffer ) ;
values(isNum) = num2cell( buffer(isNum) ) ;
header = cell2struct( values,fNames ) ;
% - Data -> num array.
data = cell2mat( textscan( data, '%f %f', 'headerlines', 2 )) ;
Running this, you get:
>> header
header =
struct with fields:
FORMAT: 'TAB_DELIMITED'
NUM_HEADER_BLOCKS: 162
NUM_PARAMS: 646
PT_COUNT_CND_1: 3895
FRAMES_CND_1: 16
FILE_TYPE: 'TIME_HISTORY'
OPERATION: 'RSP_TO_TAB'
DATA_TYPE: 'ASCII_FLOATING_POINT'
DATE: 'Fri Jun 23 11:20:24 2017'
DELTA_T: 0.0977
TOTAL_T: 380.3711
PTS_PER_FRAME: 256
PTS_PER_GROUP: 256
CHANNELS: 120
NUM_ZEROS: 5
>> data
data =
-12.6182 -4.0712
-12.6192 -4.0702
-12.6182 -4.0692
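The '%f %f' format above matches the two-channel sample. For the full file with 120 channels, the format string can be built from the channel count. This is a minimal sketch, assuming the count is available (for example as header.CHANNELS). The inline sample text and the variable names sampleData, nChannels, fmt, and values are made up for illustration:

```matlab
% Hypothetical stand-in for the data section: names line, units line, values.
sampleData = sprintf(['RfLongPositionFbk\tRfLatPositionFbk\n', ...
                      'mm\tmm\n', ...
                      '-12.6182\t-4.071238\n', ...
                      '-12.6192\t-4.070237\n', ...
                      '-12.6182\t-4.069237\n']);

nChannels = 2;                      % assumption: header.CHANNELS in the real file
fmt = repmat('%f', 1, nChannels);   % one '%f' per channel

% Skip the names and units lines, then read all columns into one matrix.
C = textscan(sampleData, fmt, 'HeaderLines', 2, ...
             'Delimiter', '\t', 'CollectOutput', true);
values = C{1};                      % numeric array, one column per channel
```

With 'CollectOutput' set to true, textscan returns the numeric columns as a single matrix in C{1}, so cell2mat is not needed in this variant.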
Stephen23 on 3 Nov 2017
Ulrich Bretz's "Answer" moved here:
This is my current status:
content = fileread(fileName);
lineStarts = [0, strfind( content, sprintf('\n') )] + 1 ;
numParams_header = str2double( regexp( content, '(?<=NUM_PARAMS\s+)\S+', 'match', 'once' ));
header = content(lineStarts(1):(lineStarts(numParams_header+1)-1));
channels = content(lineStarts(numParams_header +3):(lineStarts(numParams_header +4)-1));
units = content(lineStarts(numParams_header +4):(lineStarts(numParams_header +5)-1));
data = content(lineStarts(numParams_header +6):end);
How can I convert the channels and units from a sequence of characters to a char array?
I use MATLAB R2014a.
Cedric on 3 Nov 2017
Edited: Cedric on 3 Nov 2017
The answer in my comment above does this already. But if you want to follow your current approach, you can use STRSPLIT to get cell arrays of channel names and units (possibly after STRTRIM, to get rid of the \r if STRSPLIT outputs an empty 121st cell).
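A minimal sketch of that STRTRIM/STRSPLIT step, assuming the two lines are tab-delimited and may still carry a trailing \r\n. The sample strings and the names channelLine, unitLine, and tab are made up for illustration:

```matlab
% Hypothetical stand-ins for the channel-name and unit lines cut out of the file.
tab = sprintf('\t');
channelLine = sprintf('RfLongPositionFbk\tRfLatPositionFbk\r\n');
unitLine    = sprintf('mm\tmm\r\n');

% STRTRIM drops the trailing \r\n, so no stray characters end up in the last cell.
channels = strsplit(strtrim(channelLine), tab);   % 1x2 cell array of names
units    = strsplit(strtrim(unitLine), tab);      % 1x2 cell array of units
```

Both results are cell arrays of strings, so numel(channels) gives the channel count needed to reshape the data.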
For the data, I would do it this way:
data = sscanf( data, '%f' ) ; % Long vector of all data.
data = reshape( data, numel(channels), [] ).' ; % Reshape into array.
where channels is a cell array of channel names (output of STRSPLIT).
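To get the channel names onto the data, the reshaped array can be wrapped in a table. This is a sketch only, assuming the channel names are already valid MATLAB identifiers, as in the sample; names with other characters would need cleaning first, for instance with matlab.lang.makeValidName. The sample text and the names dataText and T are made up for illustration:

```matlab
% Hypothetical stand-ins: channel names as returned by STRSPLIT, and the raw data text.
channels = {'RfLongPositionFbk', 'RfLatPositionFbk'};
dataText = sprintf(['-12.6182\t-4.071238\r\n', ...
                    '-12.6192\t-4.070237\r\n', ...
                    '-12.6182\t-4.069237\r\n']);

values = sscanf(dataText, '%f');                   % long column vector of all numbers
values = reshape(values, numel(channels), []).';   % one row per sample, one column per channel

% Attach the channel names as table variable names.
T = array2table(values, 'VariableNames', channels);
```

A column can then be addressed by name, e.g. T.RfLongPositionFbk.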
