MATLAB Answers

Regexp lookbehind and lineanchors

alelap83 on 12 Sep 2019
Edited: alelap83 on 16 Sep 2019
Could someone help me to understand why
st = ' a b c';
pattern = '(?<=^\s*)c';
regexp(st,pattern,'lineanchors')
ans =
[]
i.e., does not match (as I expected), while
st2 = [newline,st];
regexp(st2,pattern,'lineanchors')
ans =
7
i.e., finds a match?
My intent is to match a 'c' that is preceded only by the beginning of a line and zero or more whitespace characters. How should I do this?

  2 Comments

Stephen Cobeldick on 12 Sep 2019
Getting an output of 7 seems like a bug to me. Strangely, the bug occurs even when the character in the "zero or more" part of the pattern does not occur in the input string at all (R2012b):
>> regexp([char(10),st],'(?<=^_*)c','lineanchors') % Underscore is not in st.
ans =
7
>> regexp([char(10),st],'(?<=^)c','lineanchors') % expected
ans =
[]
>> regexp(st,'(?<=^_*)c','lineanchors') % expected
ans =
[]
What MATLAB version are you using?
You should report this as a bug, giving a link to this thread.
alelap83 on 12 Sep 2019
R2019a. Reported to Technical Support.
Edit: bug confirmed. Excerpt from MATLAB Support's answer:
The behavior that you observed is indeed a bug in "regexp", which the developers are now aware of, and which might be addressed in some future release.
However, a workaround does exist: give up the 'lineanchors' option (which makes the "^" and "$" metacharacters match at embedded newlines too), and instead group the (absolute) beginning of line "^" and the embedded newline "\n" as two alternatives.
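The excerpt does not include the pattern itself, so the following is only a sketch of how the described workaround might look. The alternation `(?:^|\n)` and the use of `[ \t]*` instead of `\s*` (so that the blank run cannot cross a newline) are assumptions, not something quoted from Technical Support.

```matlab
% Sketch of the workaround described above (assumed pattern, not quoted
% from Technical Support's reply).
st  = ' a b c';
st2 = [char(10), st];          % same as [newline, st] in recent releases

% Without 'lineanchors', ^ matches only at the very start of the input,
% so the start of an embedded line is written explicitly as \n.
% [ \t]* (rather than \s*) keeps the blank run from spanning a newline.
pattern = '(?<=(?:^|\n)[ \t]*)c';

idx1 = regexp(st,  pattern)    % 'c' is preceded by 'a' and 'b' as well
idx2 = regexp(st2, pattern)    % same line content, after a leading newline
```

If look-behind behaves as documented, both calls should return an empty result, because 'a' and 'b' sit between the line start and 'c'. Since the original problem is a bug in this area, it is worth confirming the output on the release in use.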


Accepted Answer

per isakson on 13 Sep 2019
Edited: per isakson on 16 Sep 2019
"My intent is to match a 'c' that is preceded only by the beginning of a line and zero or more whitespace characters."
In the character array ' a b c', the character 'c' is (after the beginning of the line) preceded not only by whitespace but also by the characters 'a' and 'b'. Thus, [] is the expected result. Try
%%
chr = ' a b c';
xpr = '(?<=^[ ab]*)c';
regexp( chr, xpr, 'match', 'lineanchors' )
which returns
ans =
1×1 cell array
{'c'}
I fail to understand the behavior of your second example. I expect [], not 7. It looks like a bug to me.
/R2018b
ADDENDUM
I learned something about the 'once' option the other day: it affects the type of the output. In this case the output is a character row vector instead of a cell array containing that character row vector. Thus,
>> regexp( chr, xpr, 'match', 'lineanchors', 'once' )
ans =
'c'

  2 Comments

alelap83 on 13 Sep 2019
Thank you. It works. Unfortunately, my example is a simplification of my real scenario, where a, b, and c are more complicated expressions, so I cannot use this method.
per isakson on 13 Sep 2019
"Could someone help me to understand why" I think I did that.
I cannot help regarding the "real scenario" because of lack of information.
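For situations like the one mentioned in the comment above, where the text before the target is more complicated, one option that avoids look-behind altogether is to anchor the whole match at the line start and capture the target as a token. This is a hypothetical alternative, not something proposed in the thread. The test string and the variable names `ext` and `starts` are made up for illustration.

```matlab
% Hypothetical alternative (not from the thread): anchor the match itself
% at the line start and capture the target as a token, so no look-behind
% is needed.
st  = ' a b c';
st2 = [char(10), st, char(10), '   c'];   % third line: blanks, then 'c'

% With 'lineanchors', ^ matches at the start of every line.
% [ \t]* consumes leading blanks; (c) is the token of interest.
ext = regexp(st2, '^[ \t]*(c)', 'tokenExtents', 'lineanchors');

% Each cell of ext holds [start end] for the token of one match.
starts = cellfun(@(e) e(1), ext)
```

Here only the 'c' on the third line qualifies, because on the second line 'a' and 'b' come between the line start and 'c'. The trade-off is that the match itself includes the leading blanks, which is why the token extents are requested rather than the match start.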


More Answers (0)
