Regexp to extract standalone numbers from string

Question

Dan on 10 Dec 2022


Direct link to this question

https://se.mathworks.com/matlabcentral/answers/1875582-regexp-to-extract-standalone-numbers-from-string

Commented: Walter Roberson on 12 Dec 2022

Hello,

I'm trying to extract numbers from a txt file that contains tables whose elements are separated by varying amounts of whitespace.

The content might look like the example below, with a variable number of rows and columns. However, the number of "free" numbers per line is always the same.

To get the file into MATLAB, I read it line by line using fgetl.

str{1,1} = 'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -'
str = 1×1 cell array
    {'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -'}

My goal is to extract only the numbers that are not part of a text string: 21, 202, 203.02, -204.001, 1, 01. That includes numbers both with and without a decimal point.

I've played a bit with regexp patterns, and the closest I get is this:

rxpPat = '\d+\.?\d*';
regexp(str{1,1},rxpPat,'match')
ans = 1×14 cell array
    {'0123'}    {'21'}    {'20'}    {'00'}    {'200.1'}    {'21'}    {'2222'}    {'202'}    {'203.02'}    {'204.001'}    {'2'}    {'31'}    {'1'}    {'01'}

The problem is that it also catches the digits inside tokens such as X?YYx0123, which distorts my result.

Do you have an idea how I can approach the problem?


Answer 1

Cris LaPierre on 11 Dec 2022


Borrowing heavily from this answer and this doc page.

str{1,1} = 'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -';
regexp(str{1,1},'(?<=\s)[+-]?\d+\.?\d*(?=\s)', 'match')
ans = 1×6 cell array
    {'21'}    {'202'}    {'203.02'}    {'-204.001'}    {'1'}    {'01'}

2 Comments

Dan on 12 Dec 2022

That seems to be working fine for what I need. Thanks!

Walter Roberson on 12 Dec 2022

str{1,1} = '404 X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - - 92';
christ = regexp(str{1,1},'(?<=\s)[+-]?\d+\.?\d*(?=\s)', 'match')
christ = 1×6 cell array
    {'21'}    {'202'}    {'203.02'}    {'-204.001'}    {'1'}    {'01'}
wdr = str2double(regexp(str{1,1}, '(?<=^|\s)[+-]?\d+(\.\d*)?(?=\s|$)', 'match'))
wdr = 1×8
  404.0000   21.0000  202.0000  203.0200 -204.0010    1.0000    1.0000   92.0000

That is, the version Cris posted does not find the numbers if they are first or last in the string, but the version I posted in my Answer does.


Answer 2

Steven Lord on 11 Dec 2022


I wouldn't use regexp here. I'd use string, strsplit, and double.

S = 'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -'
S = 'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -'
S = string(S);
parts = strsplit(S, ' ')
parts = 1×15 string array
    "X?YYx0123"    "[un]"    "21ZZz20AaaB00"    "A200.1"    "21"    "Xx2222"    "202"    "203.02"    "-204.001"    "A(2)"    "B(V31)"    "1"    "01"    "-"    "-"

Because we converted S from a char vector into a string array above, we can use double to turn those elements of parts that are the text representation of valid numbers into those numbers while turning the other strings into NaN. If we'd left them as a char array we'd get the values of the characters that make up the text representations of those numbers, not the numbers themselves.

notWhatWeWant = double(char(parts(5))) % double('21') is not 21
notWhatWeWant = 1×2
    50    49
D = double(parts) % double("21") is 21
D = 1×15
       NaN       NaN       NaN       NaN   21.0000       NaN  202.0000  203.0200 -204.0010       NaN       NaN    1.0000    1.0000       NaN       NaN

Now just remove the NaN values. This does assume that NaN is not a valid numeric value in your string that you want to extract.

validparts = D(~isnan(D))
validparts = 1×6
   21.0000  202.0000  203.0200 -204.0010    1.0000    1.0000


Answer 3

Voss on 11 Dec 2022


Edited: Voss on 11 Dec 2022


Very similar to Steven Lord's answer, but using str2double() instead of converting to string and using double():

str{1,1} = 'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -'
str = 1×1 cell array
    {'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -'}
D = str2double(strsplit(str{1,1}));
D = D(~isnan(D))
D = 1×6
   21.0000  202.0000  203.0200 -204.0010    1.0000    1.0000
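
Since the question mentions reading the file line by line with fgetl, here is a rough sketch of how the same split-and-convert step might be applied to every line. The demo file, its contents, and the variable names (fname, rows, M) are made up for illustration; the final vertcat relies on the statement in the question that every line has the same number of free numbers.

```matlab
% Write a small demo file (hypothetical content) so the sketch is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'X?YYx0123 [un]  21  Xx2222 202  203.02 -204.001  A(2)  1 01  - -');
fprintf(fid, '%s\n', 'Q7  [un]  5  B(V31) 6.5  -7  8 09 10  - -');
fclose(fid);

% Read line by line; fgetl returns -1 (not char) at end of file.
fid = fopen(fname, 'r');
rows = {};
tline = fgetl(fid);
while ischar(tline)
    vals = str2double(strsplit(strtrim(tline)));  % non-numeric tokens become NaN
    rows{end+1} = vals(~isnan(vals));             % keep only the standalone numbers
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

% Assumes every line yields the same count of numbers, as stated in the question.
M = vertcat(rows{:})
```

If some lines could yield a different count, vertcat would error, so keeping the cell array rows may be the safer choice in that case.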


Answer 4

Image Analyst on 11 Dec 2022


I don't understand what the problem is. What's wrong with getting the numbers from X?YYx0123?

By the way, here is the new way to get numbers:

str{1,1} = 'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -'
str = 1×1 cell array
    {'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -'}
pat = digitsPattern
pat = pattern
  Matching:

    digitsPattern
numbers = extract(str{1,1}, pat)
numbers = 17×1 cell array
    {'0123'}
    {'21'  }
    {'20'  }
    {'00'  }
    {'200' }
    {'1'   }
    {'21'  }
    {'2222'}
    {'202' }
    {'203' }
    {'02'  }
    {'204' }
    {'001' }
    {'2'   }
    {'31'  }
    {'1'   }
    {'01'  }
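
As the output shows, digitsPattern on its own also picks up digits inside tokens and splits decimals at the point. A possible way to get only the standalone numbers with pattern objects is sketched below; it assumes R2020b or later, and the names numPat, standalone, tokens, and values are illustrative only. Padding the text with a space on each side is an assumption made here so that numbers at the very start or end are also found.

```matlab
S = 'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -';

% Optional sign, digits, then an optional decimal point with optional digits.
numPat = optionalPattern(characterListPattern("+-")) + digitsPattern + ...
         optionalPattern("." + optionalPattern(digitsPattern));

% Require one whitespace character on each side without including it in the match.
standalone = lookBehindBoundary(whitespacePattern(1)) + numPat + ...
             lookAheadBoundary(whitespacePattern(1));

% Pad with spaces so a number at the very start or end still has whitespace around it.
tokens = extract(" " + S + " ", standalone);
values = double(tokens)
```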


Answer 5

Walter Roberson on 11 Dec 2022


Edited: Walter Roberson on 11 Dec 2022


format short
S = 'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -'
S = 'X?YYx0123 [un]  21ZZz20AaaB00     A200.1  21  Xx2222 202  203.02 -204.001  A(2) B(V31)  1 01  - -'
D = str2double(regexp(S, '(?<=^|\s)[+-]?\d+(\.\d*)?(?=\s|$)', 'match'))
D = 1×6
   21.0000  202.0000  203.0200 -204.0010    1.0000    1.0000

This supports an optional positive or negative sign.
This supports the possibility that the value is an integer with no decimal point.
This supports the possibility that the value has a decimal point but no digits after it.
This specifically checks for whitespace (or the start or end of the string) before and after the number, so A200.1 is not matched. That also means a comma directly after a number is not supported.
This does not support exponent notation with d, D, e, or E, with an optional + or - before the exponent value.
This does not support numbers that start directly with the decimal point, without a 0 before it, such as .2
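
If the last two limitations matter for your data, the pattern could be extended along these lines. This is only a sketch: the test string and the names S2, pat2, tok, and D2 are made up for illustration, and the exponent part handles only e and E (d and D are left out here).

```matlab
S2 = '.5 1e3 -2.5E-2 A200.1 7. x9 12';

% Mantissa: digits with an optional fraction, OR a leading decimal point followed by digits.
% Exponent: optional e/E with an optional sign and at least one digit.
pat2 = '(?<=^|\s)[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?(?=\s|$)';

tok = regexp(S2, pat2, 'match');
D2 = str2double(tok)
```

A200.1 and x9 are still rejected because the lookbehind requires whitespace or the start of the string before the number.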
