How to extract matches from results of a regexp match

Question

Bill Tubbs on 8 Jun 2022

https://se.mathworks.com/matlabcentral/answers/1736060-how-to-extract-matches-from-results-of-a-regexp-match

Edited: Stephen23 on 19 Jun 2022

I'm trying to find the columns of a table that match a pattern. This works:

col_names = {'X_est_9', 'X_est_10', 'Y_est_9', 'Y_est_10', 'E_obs_9', 'E_obs_10'};
result = regexp(col_names, 'E_obs_\d*', 'match')

But the result is a cell array of cells (not sure why):

result =

1×6 cell array

{0×0 cell} {0×0 cell} {0×0 cell} {0×0 cell} {1×1 cell} {1×1 cell}

I just want a cell array of the matched results:

matched_col_names =

1×2 cell array

{'E_obs_9'} {'E_obs_10'}

There must be an easier way than this:

matched_col_names = cellfun(@(x) x, result(~cellfun(@isempty, result)))

Answer 1

Stephen23 on 8 Jun 2022

https://se.mathworks.com/matlabcentral/answers/1736060-how-to-extract-matches-from-results-of-a-regexp-match#answer_981115

Edited: Stephen23 on 19 Jun 2022

"But the result is a cell array of cells (not sure why):"

Summary: you need to use the ONCE option.

Explanation: There are two things going on in your question. Firstly you used the default ALL option shown here:

https://www.mathworks.com/help/matlab/ref/regexp.html#btn_p45-option

which returns every occurrence in the input text that matches the regular expression, and there could be two or more. Because there could be multiple matches, the outputs are nested in cell arrays (you can see this by reading through the output descriptions, which are too numerous to copy here).

Because you only want to match the regular expression once (not multiple times), you should specify the ONCE option. This removes one level of nested cell arrays from the output. If you plan to use REGEXP regularly, you will find the ONCE option very useful.

Secondly the MATCH output cell array always has the same size as the input cell array. If you provide it with a six-element cell array, then you will get a six-element cell array at the output. So your expected output size is not supported by REGEXP (and for reasons of traceability should not occur).

But you can remove the empty elements yourself. This is quite easy and much more efficient than your code:

col_names = {'X_est_9', 'X_est_10', 'Y_est_9', 'Y_est_10', 'E_obs_9', 'E_obs_10'};
result = regexp(col_names, 'E_obs_\d*', 'match', 'once')
result = 1×6 cell array
    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {'E_obs_9'}    {'E_obs_10'}
result(cellfun('isempty',result)) = []
result = 1×2 cell array
    {'E_obs_9'}    {'E_obs_10'}
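
To make the effect of the ONCE option concrete, here is a minimal sketch that compares the class of each output element with and without it, and keeps a logical mask so the matched positions stay traceable. The shortened col_names list and the variable names all_matches, first_match, is_match and matched are chosen for illustration only.

```matlab
col_names = {'X_est_9', 'E_obs_9', 'E_obs_10'};

% Default (all matches): every element of the output is itself a cell array.
all_matches = regexp(col_names, 'E_obs_\d*', 'match');
class(all_matches{2})

% With 'once': every element is a char vector, empty where nothing matched.
first_match = regexp(col_names, 'E_obs_\d*', 'match', 'once');
class(first_match{2})

% Logical mask recording which inputs matched.
is_match = ~cellfun('isempty', first_match);

% Keep only the non-empty matches.
matched = first_match(is_match);
```

Keeping is_match around, rather than deleting the empty elements in place, lets you map each match back to its position in col_names if you need it later.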

Bonus: You might find this tool useful when developing regular expressions:

https://www.mathworks.com/matlabcentral/fileexchange/48930-interactive-regular-expression-tool

1 Comment

the cyclist on 8 Jun 2022

Today I learned about the 'once' option (which I did not find, despite looking through the docs). But will I remember?!

:-)

Answer 2

the cyclist on 8 Jun 2022

https://se.mathworks.com/matlabcentral/answers/1736060-how-to-extract-matches-from-results-of-a-regexp-match#answer_981100

Even when using a single character array input along with the 'match' option, MATLAB has to return outputs in a cell array, to be able to handle cases where there are multiple matches within a single input:

regexp('E_obs_9 E_obs_10','E_obs_\d*','match')
ans = 1×2 cell array
    {'E_obs_9'}    {'E_obs_10'}

Because you are passing in a cell array of character arrays, you get out a cell array of cell arrays. You get the empty ones because MATLAB has no way of "knowing" that you don't want them. In particular, if it only output two cells, you would have no way of knowing which two input elements those two outputs corresponded to.

So, I'm afraid that you are stuck doing the post-processing step, as far as I can tell.
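
For what it's worth, one way that post-processing step might look is to flatten the nested output by concatenation. This is only a sketch: the third input element is a hypothetical name containing two matches, added to show that the default option keeps both, and the variable name flat is arbitrary.

```matlab
% The last element is hypothetical; it contains two matches on purpose.
col_names = {'X_est_9', 'E_obs_9', 'E_obs_9 E_obs_10'};

% Default option: one inner cell array per input, possibly empty.
result = regexp(col_names, 'E_obs_\d*', 'match');

% Concatenate the inner cell arrays; the empty ones contribute nothing.
flat = [result{:}];
```

Note that flat no longer records which input each match came from, which is exactly the traceability trade-off described above.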

Answer 3

Bill Tubbs on 19 Jun 2022

https://se.mathworks.com/matlabcentral/answers/1736060-how-to-extract-matches-from-results-of-a-regexp-match#answer_988995

Edited: Bill Tubbs on 19 Jun 2022

Here is a one-line solution. It is based on Stephen23's answer, but instead of returning the matches it uses the start indices of any matches (the default first output of regexp), turns them into a logical mask of matches/non-matches, and uses that mask to index the original cell array.

matched_col_names = col_names(~cellfun('isempty', regexp(col_names, 'E_obs_\d*')))
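
One caveat with the mask approach is that it returns the whole column name whenever the pattern matches anywhere inside it, and \d* also accepts zero digits. The sketch below assumes you want only names that consist of E_obs_ followed by at least one digit; the extra name 'E_obs_10_raw' and the variables loose and strict are hypothetical, added only to show the difference.

```matlab
% 'E_obs_10_raw' is a hypothetical column name used to show the difference.
col_names = {'X_est_9', 'E_obs_9', 'E_obs_10', 'E_obs_10_raw'};

% Unanchored pattern: any name containing a match is kept in full.
loose = col_names(~cellfun('isempty', regexp(col_names, 'E_obs_\d*')));

% Anchored pattern with \d+: the entire name must be E_obs_ plus digits.
strict = col_names(~cellfun('isempty', regexp(col_names, '^E_obs_\d+$')));
```

Here loose keeps 'E_obs_10_raw' as well, while strict keeps only 'E_obs_9' and 'E_obs_10'.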
