Find indices of multiple strings within another string

Question

Nick Smith on 8 Apr 2022

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/1692275-find-indices-of-multiple-strings-within-another-string

Answered: Paul on 10 Apr 2022

I am trying to efficiently find which strings (character vectors) match between two cell arrays.

One cell array contains ~1000 equations written as strings. I'm trying to parse them by matching against the strings in another cell array (~100,000 items). For each equation, I need the indices of the items from the 100,000-item list that appear in it. A single equation may contain several of those items.

I'm currently implementing this as such:

Equations.Equation % this is a list of ~1000 equations, a cell array of character vectors
OutputData.DataName % list of ~100,000 possible strings I'm looking for in the equations (my variable names)
for ii = 1:length(Equations)
    matches = cellfun(@(x) contains(Equations(ii).Equation,x),OutputData.DataName);
    indices = find(matches);
    % do some other stuff with the matches found, then move onto the next iteration of the loop
end

This is fairly slow. Is there a way to more efficiently find within Equations(ii).Equation which items within OutputData.DataName are found and the index of those items?

4 Comments

Paul on 9 Apr 2022

Something's not working with this example data and the code in the question. Is there a typo somewhere?

Equations.Equation = { '(123X + 123Y).^2'; ...
                       '500 + 456X + 123Z'; ...
                       '200 * abs(789Z * pi) + 123X'}
Equations = struct with fields:
    Equation: {3×1 cell}
OutputData.DataName = {'123A'; '123B'; '123C'; '123X'; '123Y'; '123Z'; '456X'; '456Y'; '456Z'; '789X'; '789Y'; '789Z'};
for ii = 1:length(Equations)
    matches=cellfun(@(x) contains(Equations(ii).Equation,x),OutputData.DataName);
    indices = find(matches);
    % do some other stuff with the matches found, then move onto the next iteration of the loop
end
Error using cellfun
Non-scalar in Uniform output, at index 1, output 1.
Set 'UniformOutput' to false.

Voss on 9 Apr 2022

It seems like Equations is actually a struct array:

Equations = struct('Equation',{ ...
    '(123X + 123Y).^2'; ...
    '500 + 456X + 123Z'; ...
    '200 * abs(789Z * pi) + 123X'})
Equations = 3×1 struct array with fields:
    Equation
OutputData.DataName = {'123A'; '123B'; '123C'; '123X'; '123Y'; '123Z'; '456X'; '456Y'; '456Z'; '789X'; '789Y'; '789Z'};
for ii = 1:length(Equations)
    matches = cellfun(@(x) contains(Equations(ii).Equation,x),OutputData.DataName).'
    indices = find(matches)
    % do some other stuff with the matches found, then move onto the next iteration of the loop
end
matches = 1×12 logical array
   0   0   0   1   1   0   0   0   0   0   0   0
indices = 1×2
     4     5
matches = 1×12 logical array
   0   0   0   0   0   1   1   0   0   0   0   0
indices = 1×2
     6     7
matches = 1×12 logical array
   0   0   0   1   0   0   0   0   0   0   0   1
indices = 1×2
     4    12
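
The difference between the two ways of building Equations can be checked directly. This is a minimal sketch using the example data above; the names S_scalar, S_array, r_scalar and r_array are illustrative only. Assigning a cell array to a field gives a scalar struct whose field holds the whole cell, while passing a cell array to struct() creates one struct element per cell.

```matlab
% Scalar struct: the single field holds the entire 3x1 cell array
S_scalar.Equation = {'(123X + 123Y).^2'; ...
                     '500 + 456X + 123Z'; ...
                     '200 * abs(789Z * pi) + 123X'};

% Struct array: struct() expands a cell-array value into one element per cell
S_array = struct('Equation', {'(123X + 123Y).^2'; ...
                              '500 + 456X + 123Z'; ...
                              '200 * abs(789Z * pi) + 123X'});

size(S_scalar)   % 1x1, so a loop over 1:length(S_scalar) runs once
size(S_array)    % 3x1, so the loop visits each equation in turn

% With the scalar struct, contains() receives the whole cell array and
% returns one logical per equation (non-scalar), which cellfun rejects
% when UniformOutput is true.
r_scalar = contains(S_scalar.Equation, '123X');    % 3x1 logical

% With the struct array, each element holds one char vector, so the
% result is a scalar logical as cellfun expects.
r_array = contains(S_array(1).Equation, '123X');   % scalar logical
```

If the real data is stored as a scalar struct with a cell field, looping over Equations.Equation{ii} instead would presumably give the same per-equation behaviour, but that depends on how the data was actually created.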

Answer 1

Paul on 10 Apr 2022

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/1692275-find-indices-of-multiple-strings-within-another-string#answer_939640

It looks like using string variables with an inner loop is much faster than a cell array with cellfun, at least here on Answers with the data provided.

Original code, as modified in the comments above (with Equations built as a struct array):

Equations = struct('Equation',{ ...
    '(123X + 123Y).^2'; ...
    '500 + 456X + 123Z'; ...
    '200 * abs(789Z * pi) + 123X'});
OutputData.DataName = {'123A'; '123B'; '123C'; '123X'; '123Y'; '123Z'; '456X'; '456Y'; '456Z'; '789X'; '789Y'; '789Z'};
for ii = 1:length(Equations)
    matches = cellfun(@(x) contains(Equations(ii).Equation,x),OutputData.DataName).'
    indices = find(matches)
    % do some other stuff with the matches found, then move onto the next iteration of the loop
end
matches = 1×12 logical array
   0   0   0   1   1   0   0   0   0   0   0   0
indices = 1×2
     4     5
matches = 1×12 logical array
   0   0   0   0   0   1   1   0   0   0   0   0
indices = 1×2
     6     7
matches = 1×12 logical array
   0   0   0   1   0   0   0   0   0   0   0   1
indices = 1×2
     4    12

Convert the cell arrays to string arrays and use an inner loop to compute the matches. Verify that the results are the same.

equations = string({Equations.Equation});
dataname = string(OutputData.DataName);
matches = false(1,numel(dataname)); % preallocate the logical result vector
for ii = 1:numel(equations)
    for jj = 1:numel(dataname)
        matches(jj) = contains(equations(ii),dataname(jj));
    end
    matches
    indices = find(matches)
end
matches = 1×12 logical array
   0   0   0   1   1   0   0   0   0   0   0   0
indices = 1×2
     4     5
matches = 1×12 logical array
   0   0   0   0   0   1   1   0   0   0   0   0
indices = 1×2
     6     7
matches = 1×12 logical array
   0   0   0   1   0   0   0   0   0   0   0   1
indices = 1×2
     4    12
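
The "verify the results are the same" step can also be done programmatically rather than by comparing the printed output. This is a sketch under the assumption that the example data above is representative; idxCell and idxStr are illustrative names, not part of the original code.

```matlab
Equations = struct('Equation',{ ...
    '(123X + 123Y).^2'; ...
    '500 + 456X + 123Z'; ...
    '200 * abs(789Z * pi) + 123X'});
OutputData.DataName = {'123A'; '123B'; '123C'; '123X'; '123Y'; '123Z'; ...
                       '456X'; '456Y'; '456Z'; '789X'; '789Y'; '789Z'};

equations = string({Equations.Equation});
dataname  = string(OutputData.DataName);

idxCell = cell(1, numel(Equations));   % indices from the cellfun version
idxStr  = cell(1, numel(equations));   % indices from the string version

for ii = 1:numel(Equations)
    % cellfun over the cell array of names (transposed to a row vector)
    idxCell{ii} = find(cellfun(@(x) contains(Equations(ii).Equation, x), ...
                               OutputData.DataName)).';

    % explicit inner loop over the string array of names
    matches = false(1, numel(dataname));
    for jj = 1:numel(dataname)
        matches(jj) = contains(equations(ii), dataname(jj));
    end
    idxStr{ii} = find(matches);
end

% Errors if the two approaches disagree for any equation
assert(isequal(idxCell, idxStr))
```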

Wrap an outer loop around the original code to test timing.

ntrials = 1e5;
tic
for trials = 1:ntrials
    for ii = 1:length(Equations)
        matches = cellfun(@(x) contains(Equations(ii).Equation,x),OutputData.DataName).';
        indices = find(matches);
        % do some other stuff with the matches found, then move onto the next iteration of the loop
    end
end
toc
Elapsed time is 15.236180 seconds.
tic
for trials = 1:ntrials
    for ii = 1:numel(equations)
        for jj = 1:numel(dataname)
            matches(jj) = contains(equations(ii),dataname(jj));
        end
        matches;
        indices = find(matches);
    end
end
toc
Elapsed time is 2.448469 seconds.

I was actually surprised that there isn't a string function that can replace that inner loop, but I couldn't find one. Maybe it can be done using a particular pattern, but I couldn't figure that out either.
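
One possible way to avoid the inner loop is to swap substring search for a token lookup: split each equation into alphanumeric tokens with regexp and look all of them up in the name list with a single ismember call. This is a different technique from the contains-based code above, offered only as a sketch. It assumes every variable name appears in the equation as a whole token delimited by operators, spaces or parentheses, so it will not find a name that occurs only as part of a longer token. The name indicesPerEq is illustrative, and the approach has not been timed here.

```matlab
Equations = struct('Equation',{ ...
    '(123X + 123Y).^2'; ...
    '500 + 456X + 123Z'; ...
    '200 * abs(789Z * pi) + 123X'});
OutputData.DataName = {'123A'; '123B'; '123C'; '123X'; '123Y'; '123Z'; ...
                       '456X'; '456Y'; '456Z'; '789X'; '789Y'; '789Z'};

indicesPerEq = cell(numel(Equations), 1);
for ii = 1:numel(Equations)
    % Split the equation into runs of letters, digits and underscores,
    % e.g. '(123X + 123Y).^2' gives {'123X','123Y','2'}
    tokens = regexp(Equations(ii).Equation, '\w+', 'match');

    % Look up every token in the name list in one call;
    % loc holds the position in DataName, or 0 if the token is not a name
    [found, loc] = ismember(tokens, OutputData.DataName);

    % Keep only known names, sorted and with repeats removed
    indicesPerEq{ii} = unique(loc(found));
end
```

On the example data this gives the same indices as the loops above (4 5, then 6 7, then 4 12). Whether it is actually faster on the full 100,000-item list would need to be measured.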
