How to effectively use look ahead with regexp?

Question

pietro on 26 Jun 2017

https://se.mathworks.com/matlabcentral/answers/346279-how-to-effectively-use-look-ahead-with-regexp

Edited: Stephen23 on 27 Jun 2017

Hi all,

I'm doing some coding with regular expressions, but there are a couple of things I can't understand. Look at the following examples:

1. Searching for the letter 'r' followed by a number:

regexp('19f/4r power shift','(?<=\d*) ?r')
ans = 
  6    12
regexp('19f/4r power shift','(?<=\d)\s?r')
ans = 
    6

Why does the '*' change the result so much? The 'r' at position 12 is not followed by any number.

2. Searching for the word 'Reverser' when it is not preceded by the words 'power' or 'powr':

regexp('power  Reverser','(?<!powe?r) *-? *Reverser','match')
ans = 
    ' Reverser'

Reverser is preceded by the string 'power', so it shouldn't be selected.

Why do these results occur?

Thanks

Best regards,

Pietro

Accepted Answer

Stephen23 on 26 Jun 2017

https://se.mathworks.com/matlabcentral/answers/346279-how-to-effectively-use-look-ahead-with-regexp#answer_271972

Edited: Stephen23 on 26 Jun 2017

1. You describe this as searching for the letter 'r' "followed by" a number, but you actually seem to want the letter 'r' preceded by a number: a lookbehind such as (?<=...) tests the text before the match, not after it. Only the second of your regexps does this. Adding the * to the first regexp makes the digits optional, because the asterisk matches zero or more times. A lookbehind that can match zero characters is always satisfied, so '(?<=\d*) ?r' finds exactly the same positions as the plain pattern ' ?r'. That is why the second r in that short string matches your first regular expression: it is an 'r' preceded by zero spaces (permitted by the ?) and by zero digits (permitted by the *).

You could use + (match one or more) rather than * (match zero or more):

regexp('19f/4r power shift','(?<=\d+)\s?r')

but this is not really necessary: checking for one digit is enough, because whenever several digits precede the 'r', the last of them is still a single digit in the required position.
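To check this, you can compare the pattern with and without the lookbehind. This is a small sketch that assumes a MATLAB release supporting variable-length lookbehind (which the patterns in this thread already rely on); the variable names are only illustrative.

```matlab
str = '19f/4r power shift';

% (?<=\d*) can match zero digits, so it is satisfied at every position
% and the pattern behaves exactly like ' ?r' with no lookbehind at all.
withStar     = regexp(str, '(?<=\d*) ?r')   % returns [6 12]
noLookbehind = regexp(str, ' ?r')           % returns [6 12]

% Requiring at least one digit keeps only the 'r' that follows '4';
% (?<=\d) and (?<=\d+) give the same result here.
oneDigit        = regexp(str, '(?<=\d)\s?r')    % returns 6
oneOrMoreDigits = regexp(str, '(?<=\d+)\s?r')   % returns 6
```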

2. This is a much more subtle problem. The basic causes are the optimism of regular expressions and that * on the space character. The regular expression engine keeps trying new starting positions and combinations until it finds one where the whole pattern matches, which clearly differs from how you expect it to operate (you want it to give up as soon as the lookaround rejects one attempt).

The lookaround correctly rejects the attempt that starts directly after 'power'. But you placed an asterisk * on the space, so the match does not have to start there. When the engine instead tries starting at the second of the two spaces, the lookaround is satisfied: what precedes that position? Another space character! One space is not 'power' or 'powr', so the negative lookbehind is happy, ' *' matches the remaining space, and the rest of the pattern matches 'Reverser'. Therefore the engine returns ' Reverser', with only one leading space.

Basically, what you seem to want is a pessimistic parser (return no match if any one combination is rejected by that lookaround, even if other combinations satisfy it), but in reality regex engines are optimistic: they return a match if any one combination matches. They do reject the one case that you are interested in, but then they simply carry on trying other starting positions until one of them satisfies the entire pattern.
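Asking regexp for the start index as well as the match makes this visible. The sketch below reuses your original pattern; the second string, with only one space, is an extra test case added here for illustration. It shows that the problem is not only the number of spaces: because ' *' can also match zero spaces, the match may start at 'R' itself.

```matlab
pattern = '(?<!powe?r) *-? *Reverser';

% Two spaces: the attempt starting right after 'power' (index 6) is
% rejected by the lookbehind, so the engine retries at index 7, where
% the preceding character is a space and the lookbehind is satisfied.
[s2, m2] = regexp('power  Reverser', pattern, 'start', 'match')

% One space: ' *' may also match zero spaces, so the match can start
% at 'R' itself (index 7), which is preceded by a space, not 'power'.
[s1, m1] = regexp('power Reverser', pattern, 'start', 'match')
```

In both cases the lookbehind only inspects the text immediately before the position where the match starts, and the optional spaces let that start position move past 'power'.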

To see which parts of the string are matched, you can use a dynamic regular expression, e.g. by adding:

(?@disp($1))

into your regexp and watching how the string is parsed. Here $1 refers to the first token, so the part you want to display must be enclosed in parentheses.
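For example, in this sketch the first ' *' of your pattern is wrapped in parentheses so that it becomes token $1, and disp(length($1)) prints how many spaces the current attempt has captured (using length is just one way to make spaces visible, since disp of a space prints a blank line). The printed numbers trace the engine's attempts and backtracking, so treat them as a diagnostic rather than a fixed result. Because (?@...) discards the command's output, the returned match is the same as with your original pattern.

```matlab
str = 'power  Reverser';

% ( *) captures the leading spaces as token 1; (?@disp(length($1)))
% runs each time the engine reaches that point in the pattern and
% prints how many spaces the current attempt has captured. (?@...)
% discards the command's output, so it does not change the match.
m = regexp(str, '(?<!powe?r)( *)(?@disp(length($1)))-? *Reverser', 'match')
```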

Do you really need to match an unknown number of space characters?

2 Comments

pietro on 26 Jun 2017

I got it! Thanks a lot.

Stephen23 on 27 Jun 2017

Edited: Stephen23 on 27 Jun 2017

You could move the space inside the lookaround:

>> regexp('power  Reverser','(?<!powe?r *)Reverser','match')
ans = 
     {}
>> regexp('power X Reverser','(?<!powe?r *)Reverser','match')
ans = 
    'Reverser'
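If you also need the optional hyphen from your original pattern, the same idea should extend to moving ' *-? *' into the lookbehind as well. This is a sketch under the same assumption of variable-length lookbehind support; the test strings are made-up examples.

```matlab
% Everything that may sit between 'power'/'powr' and 'Reverser' is
% inside the negative lookbehind, so the match must start at 'R'.
pattern = '(?<!powe?r *-? *)Reverser';

strs = {'power  Reverser', 'powr - Reverser', 'power-Reverser', ...
        'power X Reverser', 'Reverser'};

results = cell(size(strs));
for k = 1:numel(strs)
    % 'once' returns a char vector, or '' when there is no match
    results{k} = regexp(strs{k}, pattern, 'match', 'once');
    fprintf('%-18s -> ''%s''\n', strs{k}, results{k});
end
```

The 'once' option makes regexp return a single character vector instead of a cell array, which keeps the loop simple.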
