- the first column consists of exactly four characters (which may be spaces).
- the date vectors always start with asterisks, but no other lines do.
- no empty lines between the date vectors and the data matrices.
- the matrices contain numeric data only.
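As a rough illustration of the first and last assumptions, the sketch below splits one sample line into its four-character label and its numeric columns. The sample line is an assumption, built from the values quoted in the question; the real file layout may differ.

```matlab
% Hypothetical sample line: four-character label, then numeric columns.
str = 'P  1   6444.951599 -24080.372159  -8934.980576';

% The label is taken to be exactly the first four characters.
label = str(1:4);

% Everything after the label is treated as whitespace-separated numbers.
vals = str2double(regexp(str(5:end),'\S+','match'));

% The number of columns determines how many %f fields textscan needs.
N = numel(vals);
fmt = ['%4[^*]', repmat('%f',1,N)];
```

If a label in the file were shorter or longer than four characters, `str(5:end)` would cut the line in the wrong place, which is why that assumption matters.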
Using regexp for space-delimited strings in a text file
I need to extract the lines that begin with a repeated string from the attached text file. For example, the data file contains two lines that start with the string "P  1" (two spaces after P). I need to extract the 2nd to 4th columns of these lines, as follows:
array_P1=[ 6444.951599 -24080.372159 -8934.980576; 6645.371003 -22892.293251 -11497.619680];
I use the following code (from Stephen Cobeldick) when there is no space in the repeated string (for example P1):
fid = fopen('data_file.txt','rt');
str = fscanf(fid,'%c',Inf);
fclose(fid);
C = regexp(str,'^P1( +\S+)+\s+$','lineanchors','tokens');
C = regexp(vertcat(C{:}),'\S+','match');
N = str2double(vertcat(C{:}));
But this doesn't work if there are spaces in the repeated string, as in my example (P  1).
Accepted Answer
Stephen23 on 26 Jan 2016 (edited by Stephen23 on 26 Jan 2016)
Try this:
% textscan options:
opt = {'MultipleDelimsAsOne',true,'CollectOutput',true};
% required arrays:
str = 'X';
dtv = [];
dat = {};
% open textfile:
fid = fopen('data.txt','rt');
while ischar(str)
    % skip lines until first char is '*' (date vector) or end of file:
    while ischar(str) && ~strncmp(str,'*',1)
        str = fgetl(fid);
    end
    % fgetl returns -1 at end of file: no more date vectors to read
    if ~ischar(str)
        break
    end
    % convert date vector to numeric:
    dtv(end+1,:) = str2double(regexp(str(2:end),'\S+','match')); %#ok<SAGROW>
    % get file position:
    pos = ftell(fid);
    % read first line of matrix:
    str = fgetl(fid);
    if ischar(str)
        % calculate how many columns in the matrix:
        N = numel(regexp(str(5:end),'\S+','match'));
        fmt = repmat('%f',1,N);
        % rewind one line:
        fseek(fid,pos,'bof');
        % read entire matrix:
        dat{end+1} = textscan(fid,['%4[^*]',fmt],opt{:}); %#ok<SAGROW>
    end
end
% close textfile:
fclose(fid);
% concatenate data in cell arrays:
dat = vertcat(dat{:});
mat = vertcat(dat{:,2});
This reads the entire data matrix (between the date vectors) into a numeric matrix inside the cell array dat, and the date vectors in dtv. It automatically adjusts for the different numbers of columns in your matrices. Some important assumptions:
Have a look inside dat, and pick the data that you need:
>> cell2mat(cellfun(@(m)m(1,[1,2,3]),dat(:,2),'UniformOutput',false))
ans =
1.0e+04 *
0.6445 -2.4080 -0.8935
0.6645 -2.2892 -1.1498
I also concatenated the matrices into mat, which gives you all of the matrices in one. This might be easier to access:
>> mat([1,10],[1,2,3])
ans =
1.0e+04 *
0.6445 -2.4080 -0.8935
0.6645 -2.2892 -1.1498
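To see why the two commands return the same rows, here is a small sketch with made-up data. The labels and numbers are placeholders, and each toy block has two rows rather than the nine rows per block implied by the indices above.

```matlab
% Two hypothetical textscan results: {labels, numeric matrix} per block.
dat = {{{'P  1';'P  2'}, [1 2 3; 4 5 6]}, ...
       {{'P  1';'P  2'}, [7 8 9; 10 11 12]}};

% Stack the blocks so that column 2 holds one matrix per block.
dat = vertcat(dat{:});

% Join all block matrices into a single numeric matrix.
mat = vertcat(dat{:,2});

% First row of every block, taken from the cell array...
firstRows = cell2mat(cellfun(@(m)m(1,:),dat(:,2),'UniformOutput',false));

% ...and the same rows indexed directly in the stacked matrix.
sameRows = mat([1,3],:);
```

Indexing `mat` directly works only if every block has the same number of rows, so that the first row of block k sits at a predictable position.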
I tested this code on both of the files that you have provided (this question, and your last question), which are also available here:
More Answers (1)
Guillaume on 26 Jan 2016
This regex should work for you:
'^P\s*1( +\S+)+\s+$'
It simply adds 0 or more (the *) whitespace characters (the \s) between P and 1.
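A minimal sketch of that pattern applied line by line, rather than to the whole file at once. The sample lines are assumptions: the numbers come from the question, the "P  2" line is invented, and each line is given a trailing space because the pattern ends in `\s+`.

```matlab
% Hypothetical sample lines (note the trailing space on each one).
lines = {'P  1   6444.951599 -24080.372159  -8934.980576 '; ...
         'P  2  10000.000000   2000.000000    300.000000 '; ...
         'P  1   6645.371003 -22892.293251 -11497.619680 '};

% Keep only the lines that match the pattern from the answer.
pat = '^P\s*1( +\S+)+\s+$';
isMatch = ~cellfun('isempty', regexp(lines,pat,'match','once'));

% Split the matching lines into whitespace-separated fields.
parts = regexp(lines(isMatch),'\S+','match');
parts = vertcat(parts{:});

% Fields 1 and 2 are 'P' and '1'; the remaining fields are the numbers.
array_P1 = str2double(parts(:,3:end));
```

Because `\s*` also matches zero characters, the same pattern accepts the label P1 without a space. Lines without trailing whitespace would need the final `\s+` relaxed to `\s*`.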