Matching combinations of strings

Marcus Glover on 17 Jun 2024
Edited: DGM on 22 Jun 2024
I have a table TT with a string variable TT.name. I want to return true if TT.name matches any entry in another table variable OK.name. However, I have some complications I am having a hard time parsing.
Many of the strings in TT.name are combinations of strings that appear in OK.name. I want to include these as a true match. Sometimes they have a + symbol, sometimes just a space. Further complicating matters, the table OK contains some entries with spaces, and if they do I want to treat them as an entire entry, and not break them up at the spaces.
I believe I will usually have a combination of 2 strings only, though 3 and 4 may be possible.
TT = table(["Green"; "Red"; "Blue"; "Black Blue"; "Black"; "Blue Green"; "Red + Blue"; "Red Orange"; "Red + White"; "Black Blue Red"], 'VariableNames', {'name'})
TT = 10x1 table
          name
    ________________
    "Green"
    "Red"
    "Blue"
    "Black Blue"
    "Black"
    "Blue Green"
    "Red + Blue"
    "Red Orange"
    "Red + White"
    "Black Blue Red"
OK = table(["Red"; "Green"; "Blue"; "Black Blue"], 'VariableNames', {'name'})
OK = 4x1 table
        name
    ____________
    "Red"
    "Green"
    "Blue"
    "Black Blue"
This is the output I want, but without manually setting rows 6, 7, and 10:
TT.match=ismember(TT.name,OK.name);
TT.match([6 7 10])=1
TT = 10x2 table
          name          match
    ________________    _____
    "Green"             true
    "Red"               true
    "Blue"              true
    "Black Blue"        true
    "Black"             false
    "Blue Green"        true
    "Red + Blue"        true
    "Red Orange"        false
    "Red + White"       false
    "Black Blue Red"    true
In the example, "Blue Green" and "Red + Blue" are true matches, because "Blue", "Green", and "Red" all appear as entries in OK.name.
Similarly, "Black Blue Red" is OK because it is a combination of "Black Blue" and "Red".
"Black" is not a match, because the only entry in OK.name containing it is "Black Blue", and I do not want to split the entries of that table into separate words.
"Red Orange" and "Red + White" are not matches because only "Red" is in the OK table.
  2 Comments
Stephen23 on 18 Jun 2024
Edited: Stephen23 on 18 Jun 2024
The task is ill-defined, and most likely impossible in a general sense: this is due to the same delimiters being used to separate words in OK as well as to separate combinations from TT. Consider:
TT = "black blue" + "red" -> "black blue red"
OK = ["black", "blue red"]
Also note that a naive approach considering all permutations of OK will quickly become intractable.
Questions:
  • what size is OK ?
  • what size is TT ?
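The ambiguity described above can be made concrete by counting how many ways a name splits into OK entries. This is only a minimal sketch: the `okNames` list is hypothetical and deliberately contains both readings of the example, and the variable names are illustrative rather than taken from the thread.

```matlab
% Hypothetical OK list containing both readings of the ambiguous example
okNames = ["black blue"; "red"; "black"; "blue red"];
words = split("black blue red", " ");
n = numel(words);

% The longest OK entry (in words) bounds the phrase length worth trying
maxWords = max(count(okNames, " ")) + 1;

% ways(k) counts the segmentations that cover words 1..k-1
ways = zeros(n + 1, 1);
ways(1) = 1;
for k = 1:n
    for m = 1:min(maxWords, n - k + 1)
        if ismember(join(words(k:k+m-1), " "), okNames)
            ways(k + m) = ways(k + m) + ways(k);
        end
    end
end
nSegmentations = ways(n + 1);   % more than 1 means the name is ambiguous
```

Here both "black blue" + "red" and "black" + "blue red" cover the name, so a plain true/false result cannot say which combination was intended.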
Marcus Glover on 18 Jun 2024
Edited: Marcus Glover on 18 Jun 2024
I think the size of OK (~250 entries) is indeed going to make this intractable; TT has hundreds of thousands of entries. The solution is to fix the issue with delimiters in the data.
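If the data were cleaned so that combinations are always separated by "+", and spaces only ever occur inside a single entry, the check becomes a plain split and lookup. The following is a rough sketch under that assumption; the `names` values are made-up cleaned examples, not data from the thread.

```matlab
% Assumed cleaned data: combinations are always written with "+"
names = ["Black Blue + Red"; "Red + White"; "Black"; "Blue"];
okNames = ["Red"; "Green"; "Blue"; "Black Blue"];

match = false(numel(names), 1);
for i = 1:numel(names)
    % Split on "+" only, so spaces inside a part are preserved
    parts = strip(split(names(i), "+"));
    match(i) = all(ismember(parts, okNames));
end
```

With an unambiguous delimiter, "Black Blue" stays intact as one part and no segmentation search is needed.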


Answers (1)

Umar on 18 Jun 2024
Hi Marcus,

To achieve this, you can use a combination of string manipulation functions and logical comparisons in MATLAB. Here's a step-by-step approach to solving this problem:

1. Iterate through each row in the `TT.name` table.
2. For each row, split the string into individual words based on spaces or the "+" symbol.
3. Check if each individual word exists as an entry in the `OK.name` table.
4. If all words in the split string are found in the `OK.name` table, consider it a match.
5. Update the `TT.match` column accordingly.

Here's some MATLAB code that implements this logic:

```matlab
TT.match = false(size(TT, 1), 1);
for i = 1:size(TT, 1)
    words = strsplit(TT.name{i}, {' ', '+'});
    match_count = sum(ismember(words, OK.name));
    if match_count == numel(words)
        TT.match(i) = true;
    end
end
```

By following these steps, you can efficiently handle combinations of strings and spaces within the `TT.name` table and accurately identify matches based on the entries in the `OK.name` table. This approach ensures that you can automatically identify true matches without manually changing rows, as demonstrated in your desired output example. Additionally, it considers multiple strings combinations while respecting the specific conditions outlined for matching entries.
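Note that splitting on every space also breaks "Black Blue" into "Black" and "Blue", so the loop in this answer would not mark rows 4 and 10 as true. One possible way to keep multi-word OK entries whole is to test runs of consecutive words against OK.name, as in the sketch below. It is an illustration on the question's example data rather than a vetted solution; `normalise`, `reach`, and `maxWords` are names introduced here, and it reports a match when any valid segmentation exists, so it does not resolve the ambiguity raised in the comments.

```matlab
% Example data from the question
TT = table(["Green"; "Red"; "Blue"; "Black Blue"; "Black"; "Blue Green"; ...
            "Red + Blue"; "Red Orange"; "Red + White"; "Black Blue Red"], ...
           'VariableNames', {'name'});
OK = table(["Red"; "Green"; "Blue"; "Black Blue"], 'VariableNames', {'name'});

% Normalise: "+" and runs of spaces become a single space
normalise = @(s) strtrim(regexprep(s, '[ +]+', ' '));
okNames = normalise(OK.name);
ttNames = normalise(TT.name);

% The longest OK entry (in words) bounds the phrase length worth trying
maxWords = max(count(okNames, " ")) + 1;

TT.match = false(height(TT), 1);
for i = 1:height(TT)
    words = split(ttNames(i), " ");   % column string array of words
    n = numel(words);
    % reach(k) is true if words 1..k-1 can be covered by OK entries
    reach = false(n + 1, 1);
    reach(1) = true;
    for k = 1:n
        if ~reach(k), continue; end
        for m = 1:min(maxWords, n - k + 1)
            phrase = join(words(k:k+m-1), " ");
            if ismember(phrase, okNames)
                reach(k + m) = true;
            end
        end
    end
    TT.match(i) = reach(n + 1);
end
```

Each name is scanned once from left to right, so no permutations of OK are enumerated; the work per name grows with its word count and with the number of words in the longest OK entry.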
  9 Comments
DGM on 22 Jun 2024
Edited: DGM on 22 Jun 2024
It's okay. You're still free to think of me as a jerk. I mean, it's fair. Just please try to work on the formatting and stuff.
FWIW, also if you don't have MATLAB, I'm pretty sure you can use MATLAB Online for free for something like 20h a month. It doesn't have as many toolboxes installed as the forum editor, but it does allow the use of certain things (interactive tools) that the forum editor can't use.

Release

R2023b
