Is it possible to extract numbers from formatted strings without a for loop?
I have {'abc12', 'abc23', 'abc24', 'abc99'} and I need the vector [12,23,24,99]. How can I do this without a for loop?

Accepted Answer

Stephen23 on 29 Jan 2018
Edited: Stephen23 on 29 Jan 2018
Much faster than cellfun, str2double, or strrep, and with no explicit loop:
>> C = {'abc12', 'abc23', 'abc24', 'abc99'};
>> V = sscanf([C{:}],'%*3c%d')
V =
12
23
24
99
>>
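For readers unfamiliar with the format string, here is a brief annotated sketch of the same call. It assumes every element has exactly three leading characters before the number, as in the question; the variable name S is only for illustration.

```matlab
C = {'abc12', 'abc23', 'abc24', 'abc99'};

% Concatenate all elements into one character vector:
% 'abc12abc23abc24abc99'
S = [C{:}];

% '%*3c' reads exactly three characters and discards them (the '*' means skip),
% '%d' reads one integer. sscanf reapplies the format until the input is used up.
V = sscanf(S, '%*3c%d');

% sscanf returns a column vector; transpose it to get the row vector
% asked for in the question.
V = V.';
```

The transpose is cheap, so it costs little if a row vector is required.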
  2 Comments
Jan on 29 Jan 2018
Edited: Jan on 29 Jan 2018
+1. Very efficient.
If C is large (e.g. 5000 elements), the concatenation takes a lot of time. It seems that MATLAB's horzcat has a problem with pre-allocation. Using FEX: Cell2Vec together with the format string 'abc%d', the code is even two times faster. But if the OP only needs this for short string arrays, the time spent compiling the fast C-MEX function might be wasted.
See timings in my answer.
Remark: What a pity that MATLAB's sprintfc is not documented and that there is no corresponding sscanfc.
Stephen23 on 29 Jan 2018
Edited: Stephen23 on 29 Jan 2018
@Jan Simon: thank you for your in-depth timing and investigation.
"with the format string 'abc%d' the code is even two times faster"
I thought this might be the case, but suspected (based on this question) that the user would not always want the same 'abc' characters being matched.
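If the leading characters vary in both content and length, one possible variation is to skip a run of non-digit characters with a character-set conversion instead of a fixed count. This is only a sketch under assumptions that are not from the thread: each element must start with at least one non-digit character, and the prefix itself must not contain digits. The example data is made up for illustration.

```matlab
% Hypothetical input with prefixes of different content and length.
C = {'abc12', 'xy23', 'q24', 'longer99'};

% '%*[^0-9]' skips a run of characters that are not digits,
% '%d' then reads the integer that follows.
V = sscanf([C{:}], '%*[^0-9]%d');
```

If an element starts with a digit, the skip conversion has nothing to match and scanning stops early, so it is worth checking numel(V) against numel(C).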

More Answers (3)

Matt J on 29 Jan 2018
Edited: Matt J on 29 Jan 2018
No, it is not really possible to do this without a for-loop. The suggestions Star Strider and I have given you use str2double and/or cellfun, which have for-loops inside them.
If for-loops hidden in functions don't count for you, then okay, but you could just as easily write your own function to hide the loop.
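As a minimal sketch of that last point, a helper that hides an explicit loop could look like the following. The function name extractNumbers is hypothetical, and the three-character prefix is assumed from the question.

```matlab
function V = extractNumbers(C)
% EXTRACTNUMBERS  Hypothetical helper: read the integer that follows a
% three-character prefix in each element of the cell array C.
V = zeros(1, numel(C));               % pre-allocate the output
for k = 1:numel(C)
    V(k) = sscanf(C{k}, '%*3c%d');    % skip 3 characters, read one integer
end
end
```

Saved as extractNumbers.m, it is called as V = extractNumbers({'abc12', 'abc23', 'abc24', 'abc99'}), and the loop is no more visible to the caller than the ones inside str2double or cellfun.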
  3 Comments
Matt J on 29 Jan 2018
Edited: Matt J on 29 Jan 2018
Hmmm. In R2017b, str2double is M-coded but strrep is not, so maybe both have loops, but they can't be the "same kind".
Jan on 29 Jan 2018
@Matt J: Exactly, this is what I actually wanted to express. I mentioned strrep and str2double to supplement your answer. They contain loops at the M or C code level. As you wrote: "it is not really possible to do this without a for-loop". +1
Remark: Old versions of MATLAB shipped the C source code of many built-in functions, like cellfun.c or histc.c. Even for some P-coded files the corresponding M-files were included. These source codes were very good examples. What a pity that they are no longer available in modern versions.


Jan on 29 Jan 2018
Edited: Jan on 29 Jan 2018
V = sscanf(Cell2Vec(C), 'abc%d');
Some timings:
C = repmat({'abc12', 'abc23', 'abc24', 'abc99'}, 1, 1000);
tic;
for k = 1:100
    V = sscanf([C{:}], '%*3c%d');
end
toc
tic;
for k = 1:100
    V = sscanf(Cell2Vec(C), '%*3c%d');
end
toc
% While '%*3c%d' requires some work, 'abc%d' is cheaper:
tic;
for k = 1:100
    V = sscanf(Cell2Vec(C), 'abc%d');
end
toc
tic;
for k = 1:100
    V = str2double(strrep(C, 'abc', ''));
end
toc
tic;
for k = 1:100
    V = cellfun(@str2double, regexp(C, '\d*', 'match'));
end
toc
Results:
Elapsed time is 0.556899 seconds. % sscanf([C{:}], '%*3c%d')
Elapsed time is 0.311912 seconds. % sscanf(Cell2Vec(C), '%*3c%d')
Elapsed time is 0.254616 seconds. % sscanf(Cell2Vec(C), 'abc%d')
Elapsed time is 12.940544 seconds. % str2double(strrep)
Elapsed time is 22.966292 seconds. % cellfun(@str2double, regexp)

Matt J on 29 Jan 2018
str2double( strrep( {'abc12', 'abc23', 'abc24', 'abc99'},'abc','') )
