Find indices of multiple strings within another string
I am trying to efficiently find which strings (character vectors) match between two cell arrays.
One cell array contains ~1000 equations written as strings that I'm trying to parse by matching to strings in another array (100,000 items). I need to know the indices from the 100,000 items that are found within the ~1000 equations. There may be multiple of the 100,000 items found within each of the 1000 equations.
I'm currently implementing this as such:
Equations.Equation % this is a list of ~1000 equations, a cell array of character vectors
OutputData.DataName % list of ~100,000 possible strings I'm looking for in the equations (my variable names)
for ii = 1:length(Equations)
matches=cellfun(@(x) contains(Equations(ii).Equation,x),OutputData.DataName);
indices = find(matches);
% do some other stuff with the matches found, then move onto the next iteration of the loop
end
This is fairly slow. Is there a way to more efficiently find within Equations(ii).Equation which items within OutputData.DataName are found and the index of those items?
4 Comments
Paul
on 9 Apr 2022
Something's not working with this example data and the code in the question. Is there a typo somewhere?
Equations.Equation = { '(123X + 123Y).^2'; ...
'500 + 456X + 123Z'; ...
'200 * abs(789Z * pi) + 123X'}
OutputData.DataName = {'123A'; '123B'; '123C'; '123X'; '123Y'; '123Z'; '456X'; '456Y'; '456Z'; '789X'; '789Y'; '789Z'};
for ii = 1:length(Equations)
matches=cellfun(@(x) contains(Equations(ii).Equation,x),OutputData.DataName);
indices = find(matches);
% do some other stuff with the matches found, then move onto the next iteration of the loop
end
Voss
on 9 Apr 2022
It seems like Equations is actually a struct array:
Equations = struct('Equation',{ ...
'(123X + 123Y).^2'; ...
'500 + 456X + 123Z'; ...
'200 * abs(789Z * pi) + 123X'})
OutputData.DataName = {'123A'; '123B'; '123C'; '123X'; '123Y'; '123Z'; '456X'; '456Y'; '456Z'; '789X'; '789Y'; '789Z'};
for ii = 1:length(Equations)
matches = cellfun(@(x) contains(Equations(ii).Equation,x),OutputData.DataName).'
indices = find(matches)
% do some other stuff with the matches found, then move onto the next iteration of the loop
end
Accepted Answer
Paul
on 10 Apr 2022
It looks like using string variables with an inner loop is much faster than a cell array with cellfun, at least here on Answers with the data provided.
Original code, modified by @_
Equations = struct('Equation',{ ...
'(123X + 123Y).^2'; ...
'500 + 456X + 123Z'; ...
'200 * abs(789Z * pi) + 123X'});
OutputData.DataName = {'123A'; '123B'; '123C'; '123X'; '123Y'; '123Z'; '456X'; '456Y'; '456Z'; '789X'; '789Y'; '789Z'};
for ii = 1:length(Equations)
matches = cellfun(@(x) contains(Equations(ii).Equation,x),OutputData.DataName).'
indices = find(matches)
% do some other stuff with the matches found, then move onto the next iteration of the loop
end
Convert the cell arrays to strings and implement an inner loop to compute the matches. Verify that the results are the same.
equations = string({Equations.Equation});
dataname = string(OutputData.DataName);
matches = false(1,numel(dataname)); % preallocate the logical result vector
for ii = 1:numel(equations)
for jj = 1:numel(dataname)
matches(jj) = contains(equations(ii),dataname(jj));
end
matches
indices = find(matches)
end
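The loop above prints both results so they can be compared by eye. As a minimal sketch, the agreement could also be checked programmatically with isequal, assuming the example data from this thread. The names matchesCell, matchesStr, and allAgree are introduced here only for illustration.

```matlab
% Example data copied from this thread
Equations = struct('Equation',{ ...
    '(123X + 123Y).^2'; ...
    '500 + 456X + 123Z'; ...
    '200 * abs(789Z * pi) + 123X'});
OutputData.DataName = {'123A'; '123B'; '123C'; '123X'; '123Y'; '123Z'; ...
    '456X'; '456Y'; '456Z'; '789X'; '789Y'; '789Z'};

equations = string({Equations.Equation});
dataname = string(OutputData.DataName);

allAgree = true; % hypothetical flag, set to false on the first mismatch
for ii = 1:numel(Equations)
    % Reference result: cellfun over the cell array, as in the original code
    matchesCell = cellfun(@(x) contains(Equations(ii).Equation,x),OutputData.DataName).';
    % String version with an explicit inner loop
    matchesStr = false(1,numel(dataname));
    for jj = 1:numel(dataname)
        matchesStr(jj) = contains(equations(ii),dataname(jj));
    end
    % Both are 1-by-12 logical row vectors, so isequal compares them directly
    allAgree = allAgree && isequal(matchesCell,matchesStr);
end
assert(allAgree,'The cellfun and string-loop results differ.')
```

If the assert passes silently, both approaches found the same matches for every equation.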
Wrap an outer loop around the original code to test timing.
ntrials = 1e5;
tic
for trials = 1:ntrials
for ii = 1:length(Equations)
matches = cellfun(@(x) contains(Equations(ii).Equation,x),OutputData.DataName).';
indices = find(matches);
% do some other stuff with the matches found, then move onto the next iteration of the loop
end
end
toc
tic
for trials = 1:ntrials
for ii = 1:numel(equations)
for jj = 1:numel(dataname)
matches(jj) = contains(equations(ii),dataname(jj));
end
matches;
indices = find(matches);
end
end
toc
I was actually surprised that there isn't a string function that can replace that inner loop, but I couldn't find one. Maybe it can be done using a particular pattern, but I couldn't figure that out either.
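Two possible variations, offered as sketches and not timed here. The first relies on contains being vectorized over its first argument: looping over the names and testing all equations in one call fills a logical matrix (called matchMatrix here, a name made up for this example) with the same substring semantics as the code above. The second swaps in a different technique: it splits each equation into tokens with regexp and looks them up with ismember. That assumes every variable name appears as a whole run of letters, digits, or underscores, so it is whole-token matching and not substring matching, and it can give different answers when one name is contained in another.

```matlab
% Example data copied from this thread
Equations = struct('Equation',{ ...
    '(123X + 123Y).^2'; ...
    '500 + 456X + 123Z'; ...
    '200 * abs(789Z * pi) + 123X'});
OutputData.DataName = {'123A'; '123B'; '123C'; '123X'; '123Y'; '123Z'; ...
    '456X'; '456Y'; '456Z'; '789X'; '789Y'; '789Z'};

equations = string({Equations.Equation});
dataname = string(OutputData.DataName);

% Variation 1: one contains call per name, vectorized over all equations.
% matchMatrix(ii,jj) is true when equations(ii) contains dataname(jj).
matchMatrix = false(numel(equations),numel(dataname));
for jj = 1:numel(dataname)
    matchMatrix(:,jj) = contains(equations,dataname(jj));
end

% Variation 2 (assumption: names are whole \w+ tokens in the equations).
% tokenIndices{ii} holds the sorted indices into DataName found in equation ii.
tokenIndices = cell(numel(Equations),1);
for ii = 1:numel(Equations)
    tokens = regexp(Equations(ii).Equation,'\w+','match'); % cell array of tokens
    [tf,loc] = ismember(tokens,OutputData.DataName);       % loc is 0 where not found
    tokenIndices{ii} = unique(loc(tf));
end

% Per-equation indices from variation 1, for comparison with variation 2
for ii = 1:numel(equations)
    indices = find(matchMatrix(ii,:))
    tokenIndices{ii}
end
```

With 100,000 names, the ismember lookup avoids testing every name against every equation, which is where most of the time goes in the loops above, but it is only a drop-in replacement if the whole-token assumption holds for the real data.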