Quick conversion of a large (and with mixed data types) cell array into a numerical array

18 views (last 30 days)
Sim
Sim on 17 Feb 2020
Edited: Sim on 17 Feb 2020
Hi, I have a 800'000 x 20 cell array and I would like to convert it into a 800'000 x 20 double precision array.
My cell array has the following features:
  • it is a mix of data types, i.e. some columns are character vectors (which represent numbers) and some columns are numbers.
  • within a column of character vectors, the number of characters is different among the rows.
Here, an example/part of my cell array:
a =
10×3 cell array
{'1170790,889999999897555'} {'1009,697999999999979'} {[ 1]}
{'1170260,709000000031665'} {'869,886999999999944' } {[ 2]}
{'1181959,743999999947846'} {'504,125999999999976' } {[ 3]}
{'1174019,564000000013039'} {'674,142000000000053' } {[ 4]}
{'1173729,793000000063330'} {'689,635999999999967' } {[ 5]}
{'1181555,257999999914318'} {'509,391999999999996' } {[ 6]}
{'1180374,280000000027940'} {'538,126999999999953' } {[ 7]}
{'1180124,550999999977648'} {'568,719000000000051' } {[ 8]}
{'1179305,570999999996275'} {'626,528999999999996' } {[ 9]}
{'1177132,448000000091270'} {'705,443999999999960' } {[10]}
Any suggestion to quickly/efficiently convert this (mixed data type) cell array into a numerical array, possibly without using str2double() ?
Just for information: among my attempts for conversion, I found that str2double() is very slow and
>> cell2mat(a(:,2))
Error using cat
Dimensions of arrays being concatenated are not consistent.
Error in cell2mat (line 83)
m{n} = cat(1,c{:,n});
and
>> str2num(char(a(:,3)))
Error using char
Cell elements must be character arrays.
  2 Comments
Sim
Sim on 17 Feb 2020
Yes, you are right...
I replaced the commas with dots in other parts of my code with
strrep(C,',','.')
....And I should do it for this case too... :)

Sign in to comment.

Accepted Answer

Stephen
Stephen on 17 Feb 2020
Edited: Stephen on 17 Feb 2020
This is likely to be faster than str2double. Use a decimal point rather than a decimal comma for more speed.
>> C = {'1170790,889999999897555','1009,697999999999979',1;'1170260,709000000031665','869,886999999999944',2;'1181959,743999999947846','504,125999999999976',3;'1174019,564000000013039','674,142000000000053',4;'1173729,793000000063330','689,635999999999967',5;'1181555,257999999914318','509,391999999999996',6;'1180374,280000000027940','538,126999999999953',7;'1180124,550999999977648','568,719000000000051',8;'1179305,570999999996275','626,528999999999996',9;'1177132,448000000091270','705,443999999999960',10}
C =
'1170790,889999999897555' '1009,697999999999979' [ 1]
'1170260,709000000031665' '869,886999999999944' [ 2]
'1181959,743999999947846' '504,125999999999976' [ 3]
'1174019,564000000013039' '674,142000000000053' [ 4]
'1173729,793000000063330' '689,635999999999967' [ 5]
'1181555,257999999914318' '509,391999999999996' [ 6]
'1180374,280000000027940' '538,126999999999953' [ 7]
'1180124,550999999977648' '568,719000000000051' [ 8]
'1179305,570999999996275' '626,528999999999996' [ 9]
'1177132,448000000091270' '705,443999999999960' [10]
>> X = cellfun('isclass',C,'char'); % slower: cellfun(@ischar,C)
>> M = nan(size(C));
>> M(~X) = [C{~X}]; % numeric data
>> M(X) = sscanf(strrep(sprintf(' %s',C{X}),',','.'),'%f') % character data
M =
1170790.890 1009.698 1
1170260.709 869.887 2
1181959.744 504.126 3
1174019.564 674.142 4
1173729.793 689.636 5
1181555.258 509.392 6
1180374.280 538.127 7
1180124.551 568.719 8
1179305.571 626.529 9
1177132.448 705.444 10
If the intermediate character vector does not fit into memory then simply do the above for each column of C.
  3 Comments

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!