Extracting numbers with decimal places from the body of a text file and assigning them to variables

Hi,
I have a text file that I have read into MATLAB as a character array. The file contains free text, but I am only after specific values in it.
I want to extract those values and assign each one to its own variable.
For example, my text contains something like the following:
Header with text and comments
other text that I am not interested in, etc.
AAA = 18.457
BBB = 34.6
CCC = 4
I would like my results to be a series of variables:
AAA = 18.457
BBB = 34.6
CCC = 4
which I could then use in further calculations.
I tried using the following:
fid = fopen('file','r');
text = textscan(fid,'%s','Delimiter','','endofline','');
text = text{1}{1};
fid = fclose(fid);
expression = 'AAA = (\d+)';
AAA = regexp(text,expression,'tokens');
However, this only returned "18" rather than my desired "18.457" (the match stops at the decimal point). Is there a way to extract a number that may or may not have decimal places?
Ideally, the match would also not depend on the exact number of spaces after the variable name, i.e. it should only need "AAA" rather than "AAA ".
Is there a way to use Matlab to achieve what I want?

Accepted Answer

Stephen23 on 1 Jan 2021
Edited: Stephen23 on 1 Jan 2021
%str = fileread(..) % <- simpler way to import the file data.
str = sprintf('%s\n','Header with text and comments','other text that I am not interested in, etc.','AAA = 18.457','BBB = 34.6','CCC = 4')
str =
    'Header with text and comments
     other text that I am not interested in, etc.
     AAA = 18.457
     BBB = 34.6
     CCC = 4
     '
rgx = '^\s*(\w+)\s*=\s*(\d+\.?\d*)';
tkn = regexp(str,rgx,'tokens','lineanchors');
tkn = vertcat(tkn{:}).';
tkn(2,:) = num2cell(str2double(tkn(2,:)));
out = struct(tkn{:})
out = struct with fields:
    AAA: 18.4570
    BBB: 34.6000
    CCC: 4
out.AAA
ans = 18.4570
Personally I would use a different approach: open the file, read the header lines using fgetl, then import the data using textscan. It would probably be easier than messing about with matching number formats (i.e. don't reinvent the wheel).
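For illustration only, the fgetl/textscan approach might look roughly like the sketch below. It assumes the file has exactly two header lines followed only by "name = value" lines. The temporary file and the variable names (fname, hdr1, hdr2, names) are hypothetical and exist just to make the example self-contained.

```matlab
% Write a small sample file so the sketch can run on its own (hypothetical data).
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'%s\n','Header with text and comments', ...
    'other text that I am not interested in, etc.', ...
    'AAA = 18.457','BBB = 34.6','CCC = 4');
fclose(fid);

% Read the two header lines, then let textscan parse the remaining lines.
fid = fopen(fname,'r');
hdr1 = fgetl(fid);                            % first header line (assumed)
hdr2 = fgetl(fid);                            % second header line (assumed)
C = textscan(fid,'%s %f','Delimiter','=');    % text before '=', number after it
fclose(fid);
delete(fname);

% Trim any spaces around the names and build the same kind of structure.
names = strtrim(C{1});
out = cell2struct(num2cell(C{2}), names, 1);
```

Because textscan does the number parsing, negative values and exponent notation should be handled without changing any pattern. If the number of header lines varies, this sketch would need adapting.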
  4 Comments
James Browne on 2 Jan 2021
Thanks, I made that work with my code.
I added in "\-?" so the token is now "(\-?\d+\.?\d*)" because I also wanted to include negative numbers as possible outputs.
Instead of pulling individual values out of the structure (i.e. with out.AAA), is it possible to turn each field of the structure into a variable with the same name?
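As a rough sketch of the signed-number pattern described in this comment, applied to the accepted answer's regular expression (the sample text, including the negative value, is made up for illustration):

```matlab
% Sample text with one negative value (hypothetical data).
str = sprintf('%s\n','Header with text and comments', ...
    'AAA = -18.457','BBB = 34.6','CCC = 4');

% Same pattern as the accepted answer, with an optional leading minus sign.
rgx = '^\s*(\w+)\s*=\s*(-?\d+\.?\d*)';
tkn = regexp(str,rgx,'tokens','lineanchors');
tkn = vertcat(tkn{:}).';          % row 1: names, row 2: number text
names = tkn(1,:);
vals = str2double(tkn(2,:));      % numeric values in the same order
```

This pattern still requires at least one digit before the decimal point, so inputs such as ".5" or "1e-3" would not be matched in full.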
Stephen23 on 2 Jan 2021
Edited: Stephen23 on 2 Jan 2021
"is it possible to make each variable in the structure array into a variable along with its name?"
Possible yes, but only if you want to force yourself into writing slow, complex, obfuscated, buggy code that is difficult to debug:
There are so many reasons why that is a fragile, bad approach to writing your code. For example, consider what your code would do if the header name happens to be the same name as any existing variable: it would simply overwrite that variable without any warning. Such bad code design allows for all sorts of latent bugs that are difficult to track down because they depend on specific data... ugh.
If you know the headers/variables in advance then by all means allocate them explicitly:
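A minimal sketch of explicit allocation, assuming the three field names from the question are known in advance. The struct literal is only a stand-in for the out structure built by the regexp code, and total is a hypothetical follow-on calculation.

```matlab
% Stand-in for the structure produced by the regexp code in the answer.
out = struct('AAA',18.457,'BBB',34.6,'CCC',4);

% Explicit assignment: every variable name is visible in the code itself.
AAA = out.AAA;
BBB = out.BBB;
CCC = out.CCC;

% The values can then be used like any other numeric variables.
total = AAA + BBB + CCC;
```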
If you do NOT know the headers in advance then magically creating variables from them would be a fragile, buggy, ugly approach: how would you even know what header had been imported? (trivially easy to do with the structure, quite tricky to do with randomly named variables in a workspace)
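To illustrate why the structure is easy to work with when the names are not known in advance, here is a small sketch using fieldnames, isfield and dynamic field names. The struct literal again stands in for the imported data, and doubled is a hypothetical example calculation.

```matlab
% Stand-in for the imported structure.
out = struct('AAA',18.457,'BBB',34.6,'CCC',4);

% List whatever names were imported, without knowing them beforehand.
names = fieldnames(out);
for k = 1:numel(names)
    fprintf('%s = %g\n', names{k}, out.(names{k}));   % dynamic field access
end

% Check for a particular name before using it.
if isfield(out,'BBB')
    doubled = 2*out.BBB;
end

% Collect all values into one numeric vector, in field order.
vals = struct2cell(out);
vals = [vals{:}];
```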
