Read csv with quotes and datetime stamp (readtable configuration)

10 views (last 30 days)
Hi,
I have a csv file which contains the following format for each line. I understand how fscanf can be used for parsing data if the first digit is just a number: https://www.mathworks.com/matlabcentral/answers/1805220-read-csv-with-quotes-and-numbers. I tried setting this up for a datetime entry in the first field. However this gets tricky as I have to scan individual fields in the date and time if using fscanf.
Is there a more efficient way to parse and scan this using readtable where I can use the datatime variable format so that the date and time can be directly included in the post processing of the data?
Unfortunately I am not able to remove the quotes at this time due to the way in which the datafile is generated. Thanks a lot !
Data format:
14-Sep-2022 10:02:38.140000,"1034,19,AB"
14-Sep-2022 10:02:38.140000,"1034,19,AB"

Accepted Answer

Walter Roberson
Walter Roberson on 14 Sep 2022
lines = {'14-Sep-2022 10:02:38.140000,"1034,19,AB"', '14-Sep-2022 10:02:38.140000,"1034,19,AB"'};
filename = tempname + ".txt";
writelines(lines, filename);
dbtype(filename)
1 14-Sep-2022 10:02:38.140000,"1034,19,AB" 2 14-Sep-2022 10:02:38.140000,"1034,19,AB"
fmt = '%{dd-MMM-uuuu}D %{hh:mm:ss.SSSSSS}T"%f%f%[^"]"';
fid = fopen(filename, 'r');
datacell = textscan(fid, fmt, 'whitespace', ',');
fclose(fid);
Time = datacell{1} + datacell{2};
Time.Format = 'dd-MMM-uuuu HH:mm:ss.SSSSSS';
T = table(Time, datacell{3:end})
T = 2×4 table
Time Var2 Var3 Var4 ___________________________ ____ ____ ______ 14-Sep-2022 10:02:38.140000 1034 19 {'AB'} 14-Sep-2022 10:02:38.140000 1034 19 {'AB'}
  2 Comments
Walter Roberson
Walter Roberson on 15 Sep 2022
There are two tricks here:
  1. it is difficult to get a %D format to include a space when parsing a datetime specification. It is possible by changing the Whitespace specification, but that tends to mess up other fields. So it is typically easier to read the date portion with %D and the time portion with %T and then add the two together
  2. When using datetime() the input format specification for 24 hour days is HH and where hh would be used for 12 hour days (with there typically being an 'a' field for AM/PM indicator). But when you are using duration() then the number of hours is purely a count, not a time of day -- for example 78:32 would be valid for 78 hours 32 minutes for duration() purposes. So there is a conflict between datetime() and duration() as to what to use for hours beyond 12 in a day -- you need HH for datetime but hh for duration. You cannot simply break a datetime() format up into date and time portions; you have to change the HH of datetime to hh for duration purposes.

Sign in to comment.

More Answers (1)

dpb
dpb on 14 Sep 2022
readtable will import the file time field as a datetime variable automagically; you don't need to do anything except read it.
You'll then have to split the second column that will be imported as a cellstr array
I thought could add the " to the 'Whitespace' but that didn't work by itself -- would take a fair amount of effort to modify an import options object to account for the additional variables; think it's simpler to just import as is and split. Or, of course, you could read the file as text and delete the " and rewrite or parse from memory instead.
  1 Comment
AG15
AG15 on 14 Sep 2022
Edited: AG15 on 14 Sep 2022
Thank, yes I was thinking oriniginally of replacing the "" by a comma in a text editor as an intermediate step to allow parsing the data, that I think would also allow the readtable to parse data

Sign in to comment.

Categories

Find more on Dates and Time in Help Center and File Exchange

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!