Using readtable on a csv with missing columns
30 views (last 30 days)
Show older comments
Hey all,
I'm writing matlab script to load a number of CSV files into matlab to process them.
The last column of these CSVs is a notes column, which is often left blank. The software we use to edit our CSVs sometimes doesn't even fill in the last comma, like so:
colA,colB,colC,Notes
1000,2000,3000
1100,2200,3300
1110,2220,3330
1111,2222,3333
My ideal output would be
colA colB colC Notes
____ ____ ____ ____
1000 2000 3000 NaN
1100 2200 3300 NaN
1110 2220 3330 NaN
1111 2222 3333 NaN
I'm currently using readtable to load this data in, and have tried a few different parameters. Unfortunately, they all seem to fall short.
tab = readtable("tt.csv", 'MissingRule', 'fill');
Var1 Var2 Var3
____ ____ ____
1000 2000 3000
1100 2200 3300
1110 2220 3330
1111 2222 3333
tab = readtable("tt.csv", 'ReadVariableNames', true)
x1000 x2000 x3000
_____ _____ _____
1100 2200 3300
1110 2220 3330
1111 2222 3333
Note how both examples are missing the variable names (they should be named with col instead of var), and they're both missing the 4th column.
Any ideas? I could just write a helper script that goes around adding extra commas, but that seems like a lot of extra work for what might just be a setting.
Thanks!
2 Comments
Voss
on 3 Sep 2024
Can you upload (using the paperclip button) at least two of these CSV files - at least one that has a Notes column and at least one that does not?
Accepted Answer
dpb
on 3 Sep 2024
Edited: dpb
on 3 Sep 2024
f=["colA,colB,colC,colD"
"1000,2000,3000"
"1100,2200,3300"
"1110,2220,3330"
"1111,2222,3333"];
writelines(f,'tt.csv')
type tt.csv
opt=detectImportOptions('tt.csv','filetype','text','readvariablenames',1,'Range',1)
t=readtable('tt.csv',opt)
NOTA BENE: If the notes column ever does contain a note, then reading will fail if try to read as numeric.
ERRATUM: I think there is a failure in that setting only 'ReadVariableNames',1 didn't read the first record as the variable names but skipped it and read the first data row instead. I would submit this as a bug/support request. Inferring import options from a general file of almost any description is a tough job; detectImportOptions continues to evolve; example cases that don't work as expected are always good fodder for enhancements.
That the file is malformed in missing the delimiter to always have the fourth column is an excuse, but it is a fairly common thing with csv files.
2 Comments
dpb
on 3 Sep 2024
Edited: dpb
on 3 Sep 2024
The how/why is compare the import options structure with and without it...the comment about 'ReadVariableNames' is that without the range value for some reason the first line was skipped as it should have been for data, but it was not read as the variable names per the documented behavior, but the second line was as your example above. That's what I think is worthy of a bug report.
To work around that, I forced it to begin reading at the first row.
What I didn't try was
f=["colA,colB,colC,colD"
"1000,2000,3000"
"1100,2200,3300"
"1110,2220,3330"
"1111,2222,3333"];
writelines(f,'tt.csv')
%type tt.csv
%opt=detectImportOptions('tt.csv','filetype','text','readvariablenames',1,'Range',1)
t=readtable('tt.csv','readvariablenames',1,'Range',1)
That also works at least on this example without the extra step of the import options object.
More Answers (0)
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!