Textscan: how to ignore single '-' characters, while preserving '-' in negative numbers?

I use the following code to read the block below.
fid = fopen('data.csv');
C = textscan(fid,'%s%f%f%f%f%f%f%f%f','headerlines',1,'delimiter',';');
fclose(fid);
This does not work yet because of the single '-' characters in data.csv. I want to ignore those single '-' characters and get NaN values in their place.
How can I read a single '-' as NaN? I tried 'TreatAsEmpty', but then negative values are turned into positive ones, because negative values also contain a '-' character and 'TreatAsEmpty' removes those as well.
Block:
Headerline
01-01-2006 (00 uur);-;-1.61;-;-0.70;-;1;-;239
01-01-2006 (01 uur);-;-1.66;-;-0.70;-;-;-;1108
01-01-2006 (02 uur);-;-1.68;-;-0.75;-;1;-;1827
01-01-2006 (03 uur);-;-1.64;-;-0.77;-;-;-;-
01-01-2006 (04 uur);-;-1.62;-;-0.74;-;-;-;-
01-01-2006 (05 uur);-;-1.61;-;-0.74;-;1;-;2053
01-01-2006 (06 uur);-;-1.66;-;-0.75;-;-;-;2870
01-01-2006 (07 uur);-;-1.68;-;-0.80;-;0;-;3585
01-01-2006 (08 uur);-;-1.64;-;-0.80;-;-;-;-
01-01-2006 (09 uur);-;-1.63;-;-0.79;-;-;-;-
01-01-2006 (10 uur);-;-1.62;-;-0.77;-;-;-;-
01-01-2006 (11 uur);-;-1.62;-;-0.74;-;1;-;3967
[EDITED, Jan, code and file contents formatted]

 Accepted Answer

Try this:
str = fileread('cssm.txt');
str = strrep( str, '-;', 'nan;' );
nl = [char(13),char(10)];
str = regexprep( str, [';-\s*',nl], [';nan',nl] );
C = textscan( str,'%s%f%f%f%f%f%f%f%f','headerlines',1,'delimiter',';');
where cssm.txt contains the rows of text from the question.
The approach is
  1. read the whole file as text
  2. replace the "-", which stands for missing, with NaN
  3. parse the modified string with textscan
Note: the value of the variable, nl, must match the end of line characters in your file.
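The three steps can also be sketched so that they do not depend on the end-of-line characters. This is only an illustration under assumptions: the sample text is built inline from two rows of the question instead of being read with fileread, and the lookaround pattern is a variant that is not part of the answer above.

```matlab
% Step 1 (stand-in): two sample rows from the question, here with LF line
% endings; in practice str would come from fileread
str = sprintf('%s\n', ...
    'Headerline', ...
    '01-01-2006 (00 uur);-;-1.61;-;-0.70;-;1;-;239', ...
    '01-01-2006 (03 uur);-;-1.64;-;-0.77;-;-;-;-');

% Step 2: replace a '-' that fills a whole field, i.e. one preceded by ';'
% and followed by ';', a line break, or the end of the text
str = regexprep(str, '(?<=;)-(?=;|\r|\n|$)', 'nan');

% Step 3: parse the modified text
C = textscan(str, '%s%f%f%f%f%f%f%f%f', 'HeaderLines', 1, 'Delimiter', ';');
```

Because the lookahead accepts either \r or \n, the same pattern should apply to Windows and Unix line endings without setting nl.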
Jan, thanks for formatting the question.

4 Comments

Hello Per! Thanks for your answer. I am very pleased to receive your help. I tested the first part of your solution. This works very well.
But I believe that this part is not working yet:
%%
nl = [char(13),char(10)];
str = regexprep( str, [';-\s*',nl], [';nan',nl] );
because the ";-" at the end of each input line still remains in the output. Can you help me again?
Best wishes, Roel
How about this:
str = fileread('hyphens.txt');
str = regexprep(str,'-(?!\d)','nan');
C = textscan(str,'%s%f%f%f%f%f%f%f%f','headerlines',1,'delimiter',';');
@Roel, I guess you need to change
nl = [char(13),char(10)];
to
nl = [char(10)];
according to my "Note:". You could check with
double( str(1:80) )
and look for the number "10". Is it preceded by "13" or not?
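The same check can be done programmatically. A minimal sketch, assuming the text uses either CRLF or LF line endings (CR-only files are not handled); the sample string is made up for illustration and stands in for the result of fileread:

```matlab
% Sample text with Windows-style (CRLF) line endings
str = sprintf('a;-\r\nb;-\r\n');

% Look for the pair 13,10 (CR followed by LF) anywhere in the text
hasCRLF = ~isempty(strfind(str, [char(13), char(10)]));

if hasCRLF
    nl = [char(13), char(10)];   % Windows line endings
else
    nl = char(10);               % Unix line endings
end
```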
@Matt, your expression is better: it is shorter and more robust, because it actually checks whether the "-" is followed by a digit. I tried that approach myself but made a mistake :(. So replace
str = strrep( str, '-;', 'nan;' );
nl = [char(13),char(10)];
str = regexprep( str, [';-\s*',nl], [';nan',nl] );
by
str = regexprep(str,'-(?!\d)','nan')
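To see what the single pattern does, it can be applied to one row from the question. This is a small check, not part of the original comments; the variable names row, out and edge are chosen here, and the '-.5' input is a hypothetical edge case that does not occur in the posted data.

```matlab
% One data row from the question
row = '01-01-2006 (03 uur);-;-1.64;-;-0.77;-;-;-;-';

% A '-' is replaced only when the next character is not a digit, so the
% dashes in the date and in the negative numbers are left alone
out = regexprep(row, '-(?!\d)', 'nan');
% out is '01-01-2006 (03 uur);nan;-1.64;nan;-0.77;nan;nan;nan;nan'

% Hypothetical edge case: a negative number written without a leading zero
% is followed by '.', not a digit, so its minus sign is replaced too
edge = regexprep('-.5', '-(?!\d)', 'nan');
% edge is 'nan.5'
```

If the data could contain values such as -.5, the lookahead would need to allow a decimal point as well, for example '-(?![\d.])'.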
Brilliant! This works very well. Thanks a lot, Per and Matt!

Asked:

R V
on 13 Sep 2012