How to read a comma delimited .dat file containing mixed formats

9 views (last 30 days)
Hi, I have been struggling with this for hours: I have a .dat file containing comma separated values. this is how the first lines look like:
Cell 1,13,3-22-2018 05:30,-0.02496409
Cell 1,14,3-22-2018 05:35,-0.02496409
Cell 1,15,3-22-2018 05:40,-0.02496409
Cell 1,16,3-22-2018 05:45,-0.02496409
Cell 1,17,3-22-2018 05:50,-0.02496409
My file has more than 1,100,000 lines so I know I have to use textscan. I have tried textread,dlmread,importdata,readtable... textscan is my best bet, but I can't figure out the format.
I would like to end up with
  • CellNb = vector containing all my cell numbers (in this example it'll be 1 1 1 1 1)
  • EnsNb = vector containing all my ensemble numbers (in this example it'll be 13 14 15 16 17)
  • Date = vector containing my dates + times (datenum format)
  • Val = containing the values on the last column.
So far I am trying:
filename='SimplifiedFile_commas.dat';
fid = fopen(filename);
C=textscan(fid,'%s%d%s%f64\n','delimiter',',');
fclose(fid);
celldisp(C)
which clearly doesn't work because i get:
C{1}{1} =
ÿþC e l l 1
C{2} =
[]
C{3} =
{}
C{4} =
[]
Could anyone give me a hint on the format I have to use? Thanks!!!

Accepted Answer

dpb
dpb on 2 Oct 2018
Edited: dpb on 3 Oct 2018
Just read the file as it's given then parse the data columns...
t=readtable('couzi.csv','ReadVariableNames',0,'Delimiter',','); % read base file
t.Var1=str2num(char(strrep(t.Var1,'Cell ',''))); % parse cell number
t.Var3=datetime(t.Var3,'Format','M-dd-yyyy HH:mm'); % convert to datetime
t.Properties.VariableNames={'Cell','Ens','Date','Val'} % set desired variable names
t =
5×4 table
Cell Ens Date Val
____ ___ _______________ _________
1 13 3-22-2018 05:30 -0.024964
1 14 3-22-2018 05:35 -0.024964
1 15 3-22-2018 05:40 -0.024964
1 16 3-22-2018 05:45 -0.024964
1 17 3-22-2018 05:50 -0.024964
>>
ADDENDUM
As pedagogical note; a little more obtuse but perhaps "more MATLABy" solution to the first column could be
t.Var1=cellfun(@(s) sscanf(s,'Cell%f'),t.Var1);
to read and convert the actual cell content rather than strip/replace. Not sure if there would be any noticeable performance difference for large files or not.
Which leads to the alternate original solution to parse the file directly-- textscan can be used but readtable is more flexible in many ways altho for a very large file the resulting table may be somewhat slow in performance.
fmt='Cell%f%f%q%f';
c=textscan(fid,fmt,'delimiter',',','collectoutput',0);
will return
c =
1×4 cell array
{5×1 double} {5×1 double} {5×1 cell} {5×1 double}
where the time string is a cell array to convert to datetime.
Unfortunately, the '%{}D' time format scanning routine is broken for embedded space in the string between the date and time fields as is so common; the obvious
>> textscan(c{3},'%{M-d-yyyy HH:mm}D')
Error using textscan
Unable to read the DATETIME data with the format 'M-d-yyyy HH:mm'. If the data is not a time, use %q to get
text data.
fails miserably. One can workaround it with
>> t=textscan(strrep(c{3}(1),' ',''),'%{M-d-yyyyHH:mm}D')
t =
1×1 cell array
{[3-22-201805:30]}
and end up with an ugly, illegible output format; this, too, is fixable:
>> t.Format='default'
t =
datetime
22-Mar-2018 05:30:00
  4 Comments
dpb
dpb on 3 Oct 2018
AH! I had not thought of there being the interaction with 'whitespace', Walter.
The textscan doc says when sees %{}D format the field is scanned as '%q' but of course the given string is malformed for embedded blanks by not being surrounded by ".
I suppose it would be too much to ask for but I think (and believe most users would inherently expect that) textscan/%D should be smart enough to read the provided format string as given and parse the corresponding field as told, only reporting error or failing when the string doesn't actually match the data or the format string itself is malformed, embedded delimiter or no.
But, thanks for pointing out the workaround; I'll try to recall it next time it comes up... :)
PS. I don't see the ambiguity you point out in the last example there; if 'whitespace','' is given in conjunction with any format string conversion specifier, textscan does not add the superfluous blank it does with other delimiters. That example seems clear to me it should work unless I'm missing something.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!