- (?<=^|\s): Positive lookbehind to ensure the match starts at the beginning of the string or immediately after whitespace.
- (?<before>\w+): Captures the initial sequence of word characters (letters, digits, or underscores).
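- (?:\(-(?<digit>\d+)\))?: An optional non-capturing group that matches a parenthesized negative number such as (-5) as a whole; the digits inside are captured in the named token digit.
- &(?<after>\w+): Matches the literal ampersand, then captures the sequence of word characters that follows it.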
- (?=\s|$): Positive lookahead to ensure the match ends before whitespace or at the end of the string.
condition in regular expression
Hi,
I would like to match a regular expression, which can take two forms:
aaa1(-1)&abcd
or
bbb2&aefdg
More specifically,
- the expression starts with word characters (e.g. aaa1 or bbb2)
- it is optionally followed by an opening parenthesis, then a minus sign, then digits, then a closing parenthesis, e.g. (-1)
- it is followed by the ampersand (&)
- it is followed by word characters and then
I would like to capture those expressions, and I wrote the following code, which does not work well:
expr='\<(?<before>\w+)\>(\(-)?(?<digit>\d+)?(\))?&(?<after>\w+)';
regexp('vvv&mp abvg(-5)&ads abvg-5&ads',expr,'names')
It does not work well because the third expression (abvg-5&ads) should not be a match. That is, the digits should be matched only if they are enclosed in parentheses.
I thought that maybe using some form of condition
(?(cond)expr1|expr2)
would help but I was not successful in implementing it. Maybe that is the way to go, maybe there is another way, I don't know.
Any suggestions?
Thanks
Answers (1)
Ishaan
on 22 Apr 2025
Hello,
I understand you need a regular expression that matches two distinct forms, capturing the word characters at the start and end, an optional parenthesized negative number, and an ampersand separator. The challenge lies in excluding matches where digits appear without parentheses.
The following regular expression effectively addresses your requirements:
expr = '(?<=^|\s)(?<before>\w+)(?:\(-(?<digit>\d+)\))?&(?<after>\w+)(?=\s|$)';
It enforces an “all or nothing” presence of the parentheses and minus sign around the digits: because the whole sequence (-digits) sits inside a single optional group, either the complete structure is matched or none of it is, so digits without parentheses cannot match.
Explanation:
With the aforementioned expression,
regexp('vvv&mp abvg(-5)&ads abvg-5&ads', expr, 'names')
returns the following struct:
fields    before    after
1.        'vvv'     'mp'
2.        'abvg'    'ads'
which is the expected outcome.
There is no need for conditional sub-patterns in this case, as the grouping and anchoring in the regular expression are sufficient to enforce the desired structure.
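As a minimal sketch of how the result could be inspected, the snippet below reuses the expression and test string from this thread and also requests the 'match' output, so the matched text can be printed next to the optional digit token. The variable names (str, names, matches) are illustrative only, and the sketch assumes the expression behaves in your MATLAB release as described above, with an unmatched digit token returned as an empty character vector.

```matlab
% Expression and test string taken from this thread.
expr = '(?<=^|\s)(?<before>\w+)(?:\(-(?<digit>\d+)\))?&(?<after>\w+)(?=\s|$)';
str  = 'vvv&mp abvg(-5)&ads abvg-5&ads';

% Outputs are returned in the order the options are listed.
[names, matches] = regexp(str, expr, 'names', 'match');

% Walk through the matches and report whether the optional (-digits)
% group took part in each one.
for k = 1:numel(names)
    if isempty(names(k).digit)
        fprintf('%s: before=%s, after=%s, no digit group\n', ...
            matches{k}, names(k).before, names(k).after);
    else
        fprintf('%s: before=%s, after=%s, digit=-%s\n', ...
            matches{k}, names(k).before, names(k).after, names(k).digit);
    end
end
```

Checking isempty on the digit field is one simple way to tell the two forms apart after matching, without needing a conditional inside the pattern itself.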
Hope this helped.