MATLAB Answers


Is there a faster way of splitting a cell array into numeric array while preserving NaN?

Asked by Alex Wolf on 22 Aug 2019
Latest activity: commented on by TADA on 23 Aug 2019
I am trying to split a dataset into rows and columns of numeric values while preserving the positions of empty entries (as NaN or something similar).
The input is a cell array in which each row is a string whose columns are delimited by semicolons (';'). The first 8 columns contain garbage data, there are many trailing columns with no data at all, and some rows contain no data whatsoever. The attached sample is only 4,000 rows long, but my actual datasets have between 50,000 and 300,000 rows.
I have been using the code below, but the str2double step is incredibly slow. Can anyone suggest an alternative approach that cuts down the processing time?
% split each row on the ';' separator
data = cellfun(@(x) split(x,';'),data,'UniformOutput',false);
% get rid of preceding garbage data in columns 1 to 8
data = cellfun(@(x) x(9:end),data,'UniformOutput',false);
% convert data into double. This step is incredibly slow
data = cellfun(@str2double,data,'UniformOutput',false);
% example of next operations I wish to perform on this data
data_a = cellfun(@(x) x(1:2:end),data,'UniformOutput',false);
data_b = cellfun(@(x) x(2:2:end),data,'UniformOutput',false);
Thank you in advance for any help.


This offers a fairly good improvement to my code. Thank you for your suggestion.
Thank you for the feedback. I've never tried the function myself.





1 Answer

Answer by TADA on 22 Aug 2019 (edited by TADA on 22 Aug 2019)
Accepted Answer

Try this:
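% flag rows that end with ';' (their final empty field is restored below)
endsWithSemicolon = cellfun(@(s) endsWith(s, ';'), data);
% parse each row as doubles; empty fields become NaN
x = cellfun(@(s) textscan(s, '%f', 'Delimiter', ';', 'EmptyValue', nan(), 'Whitespace', ' *\n\t\r\b'), data);
% get rid of preceding garbage data in columns 1 to 8
x = cellfun(@(a) a(9:end), x, 'UniformOutput', false);
% append a NaN for the empty field implied by a trailing ';'
x(endsWithSemicolon) = cellfun(@(a) [a; nan], x(endsWithSemicolon), 'UniformOutput', false);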
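To see how this behaves on a small input, here is a minimal sketch using two made-up rows (the sample strings are hypothetical and not taken from the attached data). It omits the 'Whitespace' option because the sample has no stray characters, and it finishes with the odd/even split from the question.

```matlab
% Hypothetical sample: 8 leading garbage columns, then the values of interest.
% The second row ends with ';', so it has a trailing empty field.
data = {'0;0;0;0;0;0;0;0;1.5;;3;4'; ...
        '0;0;0;0;0;0;0;0;;2;;'};

% flag rows that end with ';'
endsWithSemicolon = cellfun(@(s) endsWith(s, ';'), data);

% parse each row into a column vector of doubles; empty fields become NaN
x = cellfun(@(s) textscan(s, '%f', 'Delimiter', ';', 'EmptyValue', NaN), data);

% drop columns 1 to 8
x = cellfun(@(a) a(9:end), x, 'UniformOutput', false);

% append a NaN for the empty field implied by a trailing ';'
x(endsWithSemicolon) = cellfun(@(a) [a; NaN], x(endsWithSemicolon), 'UniformOutput', false);

% next operations from the question: odd and even positions of each row
data_a = cellfun(@(a) a(1:2:end), x, 'UniformOutput', false);
data_b = cellfun(@(a) a(2:2:end), x, 'UniformOutput', false);

disp(x{1}.');   % first row after removing the garbage columns
```

Each cell of x holds a column vector, so rows of different lengths stay separate rather than being padded into one matrix.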


Your solution works quite well for my purpose. Here is the difference in performance.
Original approach with str2double: 52.586311 seconds
Alternate approach with str2doubleq: 0.731596 seconds
Alternate approach with textscan: 0.343899 seconds
Both your solution and the one offered by Adam Danz improve my code significantly. Thank you.
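For anyone who wants to repeat a comparison like this, the following is a rough sketch of a tic/toc timing harness on synthetic data. The row contents and row count are assumptions made for illustration, so the timings will not match the figures above. It covers only the built-in str2double and textscan approaches.

```matlab
% Synthetic input (hypothetical): 8 garbage columns followed by 60 value
% columns, some of them empty. No row ends with ';'.
nRows = 4000;
row = ['0;0;0;0;0;0;0;0;' strjoin(repmat({'1.25;;3.5'}, 1, 20), ';')];
data = repmat({row}, nRows, 1);

% approach 1: split + str2double
tic;
a = cellfun(@(s) str2double(split(s, ';')), data, 'UniformOutput', false);
a = cellfun(@(v) v(9:end), a, 'UniformOutput', false);
tStr2double = toc;

% approach 2: textscan
tic;
b = cellfun(@(s) textscan(s, '%f', 'Delimiter', ';', 'EmptyValue', NaN), data);
b = cellfun(@(v) v(9:end), b, 'UniformOutput', false);
tTextscan = toc;

fprintf('str2double: %f seconds\n', tStr2double);
fprintf('textscan:   %f seconds\n', tTextscan);

% sanity check: both approaches should give the same values, NaNs included
assert(isequaln(a, b));
```

A single tic/toc pass is noisy; running each approach several times, or using timeit, would give steadier numbers.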
