Need help with regexpi expression for multiple variants of the same phrase

I have a question about using regexpi to determine whether certain phrases appear in a text file. The text files were created by multiple individuals, who use slightly different phrasing for the same variable. For example, in a text file containing a gait evaluation, a slow cadence may be recorded as either 'slow cadence' or 'slow stepping'. My original code was as follows:
data=fileread('Test.txt');
A=isempty(regexpi(data,{'slow cadence','slow stepping'}));
However, this version can return a false positive, as it appears to mix and match the strings within the {}. For example, the following code for the same file returns '0' from isempty even though neither phrase matches completely:
data=fileread('Test.txt');
A=isempty(regexpi(data,{'fast cadence','slow stepping'}));
I feel like I am missing a simple command to indicate that A can be 'slow cadence' OR 'slow stepping'. Any help is much appreciated.

Answers (2)

Stephen23 on 14 Dec 2022
Edited: Stephen23 on 14 Dec 2022
You will probably find the 'ONCE' option also very very very useful (here I inverted the logical output, because true=contains is usually much simpler to work with than messing-with-your-head true=doesnotcontain):
str = fileread('Test.txt');
idx = ~isempty(regexpi(str, 'slow (cadence|stepping)','once'))
idx = logical
1
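A likely reason the cell-array call in the question always gave 0: when the expression is a cell array, regexpi returns a cell array with one result per expression, and a 1x2 cell is never empty even if every cell inside it is. Here is a minimal sketch of that behaviour, using a made-up sample string in place of fileread('Test.txt'):

```matlab
% Hypothetical sample text standing in for fileread('Test.txt')
str = 'Patient shows fast cadence and a normal stride.';

% A cell array of expressions gives a cell array of results, one per expression
out = regexpi(str, {'slow cadence','slow stepping'});

% The outer cell is 1x2, so ISEMPTY is false even though neither phrase matched
outerIsEmpty = isempty(out)

% Test each expression separately, then combine the results with ANY
hit = ~cellfun(@isempty, out);   % one logical per expression
idx = any(hit)                   % true only if at least one phrase was found
```

So no mixing and matching of words is going on; isempty is simply being asked about the container rather than its contents.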
Using regular expressions requires reading the documentation again and again and again and again and again... it takes quite a while to get proficient and comfortable using them. Also, make sure you read the documentation.
You might also find my interactive tool useful for helping to develop regular expressions:
I should also mention that if you want to use regular expressions then you need to read the documentation. A lot.
PS: Another approach using the newer CONTAINS and patterns:
pat = regexpPattern('slow (cadence|stepping)');
idx = contains(str,pat, 'ignorecase',true)
idx = logical
1
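If the variants are plain phrases with nothing regex-like in them, CONTAINS can also take several patterns at once and returns true when any one of them is found. A minimal sketch, assuming a made-up sample string and a phrase list (both are placeholders, not taken from the actual files):

```matlab
% Hypothetical sample text standing in for fileread('Test.txt')
str = 'Gait evaluation: Slow Stepping noted on the left side.';

% Hypothetical list of phrasings that all mean the same thing
phrases = ["slow cadence", "slow stepping"];

% CONTAINS returns true if ANY element of phrases occurs in str
idx = contains(str, phrases, 'IgnoreCase', true)
```

This keeps the list of phrasings as data, so adding another variant only means adding another string to phrases.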

Fifteen12 on 14 Dec 2022
I think you want to look at making regular expressions. Try this:
A=isempty(regexpi(data,'(slow cadence|slow stepping)'));
You'll probably want to do more case matching as well, using wild cards to substitute for white space, etc. You can find more here
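As a rough illustration of the white space point, \s+ matches one or more white space characters (spaces, tabs, newlines), so it tolerates uneven spacing between the two words. The sample string below is invented for the example:

```matlab
% Hypothetical text with irregular spacing between the words
str = sprintf('Notes: slow   cadence\nSecond line: Slow\tstepping');

% \s+ accepts any run of white space; the group lists the allowed second words
expr = 'slow\s+(cadence|stepping)';

% Return the matched text; regexpi ignores case
matches = regexpi(str, expr, 'match');
nFound = numel(matches)

% Same logical test as before: does the text contain either phrasing?
idx = ~isempty(regexpi(str, expr, 'once'))
```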
