Regexp to extract standalone numbers from string
Hello,
I'm trying to extract numbers from a text file containing tables whose elements are separated by varying amounts of whitespace.
The content might look like the example below, with a variable number of rows and columns. However, the number of "free" numbers is always the same.
To get the file into MATLAB, I read it line by line using fgetl.
str{1,1} = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -'
My goal is to extract only the numbers that are not part of a text string. That would be 21, 202, 203.02, -204.001, 1, 01, so both numbers with a decimal point and numbers without one.
I've played a bit with regexp patterns, and the closest I've gotten is:
rxpPat = '\d+\.?\d*';
regexp(str{1,1},rxpPat,'match')
The problem is that it also catches the digits inside tokens such as X?YYx0123, which distorts my result.
Do you have an idea how I can approach the problem?
Accepted Answer
Cris LaPierre
on 11 Dec 2022
str{1,1} = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -';
regexp(str{1,1},'(?<=\s)[+-]?\d+\.?\d*(?=\s)', 'match')
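The pattern above is posted without commentary, so here is a minimal sketch that assembles the same expression from named pieces to show what each part does. The variable names (`before`, `signOpt`, `number`, `after`, `tokens`, `values`) are illustrative only and not from the original answer.

```matlab
% Same pattern as above, split into pieces (names are illustrative assumptions)
before  = '(?<=\s)';     % lookbehind: the character before the match must be whitespace
signOpt = '[+-]?';       % optional leading sign
number  = '\d+\.?\d*';   % digits, an optional decimal point, optional fraction digits
after   = '(?=\s)';      % lookahead: the character after the match must be whitespace
pat = [before signOpt number after];

s = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -';
tokens = regexp(s, pat, 'match');   % cell array of matched text
values = str2double(tokens);        % numeric values; note that '01' converts to 1
```

Because both lookarounds require an actual whitespace character, a number at the very start or end of the string has nothing to satisfy them and is skipped.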
2 Comments
Walter Roberson
on 12 Dec 2022
str{1,1} = '404 X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - - 92';
christ = regexp(str{1,1},'(?<=\s)[+-]?\d+\.?\d*(?=\s)', 'match')
wdr = str2double(regexp(str{1,1}, '(?<=^|\s)[+-]?\d+(\.\d*)?(?=\s|$)', 'match'))
That is, the version Cris posted does not find the numbers if they are first or last in the string, but the version I posted in my Answer does.
More Answers (4)
Steven Lord
on 11 Dec 2022
I wouldn't use regexp here. I'd use string, strsplit, and double.
S = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -'
S = string(S);
parts = strsplit(S, ' ')
Because we converted S from a char vector into a string array above, we can use double to turn those elements of parts that are the text representation of valid numbers into those numbers while turning the other strings into NaN. If we'd left them as a char array we'd get the values of the characters that make up the text representations of those numbers, not the numbers themselves.
notWhatWeWant = double(char(parts(5))) % double('21') is not 21
D = double(parts) % double("21") is 21
Now just remove the NaN values. This does assume that NaN is not a valid numeric value in your string that you want to extract.
validparts = D(~isnan(D))
Voss
on 11 Dec 2022
Edited: Voss
on 11 Dec 2022
Very similar to Steven Lord's answer, but using str2double() instead of converting to string and using double():
str{1,1} = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -'
D = str2double(strsplit(str{1,1}));
D = D(~isnan(D))
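To connect this with the line-by-line `fgetl` reading mentioned in the question, here is a minimal sketch that applies the same split-and-convert step to every line of a file. The temporary file, its two sample lines, and the variable names (`fname`, `numbersPerLine`, `tline`) are assumptions made so the example is self-contained; substitute the real file name in practice.

```matlab
% Write a small sample file so the sketch is self-contained (hypothetical data)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'X?YYx0123 [un]   A200.1 21 Xx2222  202 203.02');
fprintf(fid, '%s\n', '-204.001 A(2) B(V31) 1 01 - -');
fclose(fid);

% Read the file line by line and keep only the standalone numbers
fid = fopen(fname, 'r');
numbersPerLine = {};
tline = fgetl(fid);
while ischar(tline)                        % fgetl returns numeric -1 at end of file
    D = str2double(strsplit(tline));       % split on whitespace; non-numbers become NaN
    numbersPerLine{end+1} = D(~isnan(D));  %#ok<SAGROW> one numeric vector per line
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Calling `strsplit` without a delimiter splits on whitespace and collapses consecutive delimiters, so runs of several spaces do not produce empty tokens.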
Image Analyst
on 11 Dec 2022
I don't understand what the problem is. What's wrong with getting the numbers from X?YYx0123?
By the way, here is the new way to get numbers:
str{1,1} = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -'
pat = digitsPattern
numbers = extract(str{1,1}, pat)
Walter Roberson
on 11 Dec 2022
Edited: Walter Roberson
on 11 Dec 2022
format short
S = 'X?YYx0123 [un] 21ZZz20AaaB00 A200.1 21 Xx2222 202 203.02 -204.001 A(2) B(V31) 1 01 - -'
D = str2double(regexp(S, '(?<=^|\s)[+-]?\d+(\.\d*)?(?=\s|$)', 'match'))
- This supports an optional positive or negative sign
- This supports the possibility that the value is an integer with no decimal point
- This supports the possibility that the value has a decimal point but no digits after it
- This specifically checks for whitespace (or the start or end of the string) before and after the number, so A200.1 is not matched. But that also means that a comma directly after a number is not supported.
- This does not support exponent notation with d, D, e, or E, with an optional + or - before the exponent digits
- This does not support a number that starts directly with the decimal point, without a 0 before it, such as .2
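If the last two limitations matter for your data, the pattern could be extended along these lines. This is only a sketch, not part of the original answer: it adds an alternative for a leading decimal point and an optional `e`/`E` exponent. The `d`/`D` exponent markers are deliberately left out here; they could be added to the character class, but check how your conversion step handles them. The sample string and variable names are made up for illustration.

```matlab
% Sketch of an extended pattern (an assumption, not the pattern from the answer)
mantissa = '(?:\d+\.?\d*|\.\d+)';    % 12, 12., 12.5, or a leading-dot form such as .5
exponent = '(?:[eE][+-]?\d+)?';      % optional exponent such as e3 or E-2
pat = ['(?<=^|\s)[+-]?' mantissa exponent '(?=\s|$)'];

s = '.5 1e3 -2.5E-2 A1e3 3e 12, 42';    % hypothetical test input
tokens = regexp(s, pat, 'match');
values = str2double(tokens);
```

In this input, `A1e3`, `3e`, and `12,` are still rejected because each has a non-whitespace character directly touching the number, which keeps the "standalone only" behavior of the original pattern.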