How can I get MATLAB to preserve column names as variables when importing a csv file to timetable?

169 views (last 30 days)
Hello,
I can't seem to get MATLAB to import a csv file and preserve the column names as variables. The variable names come back as Var1, Var2, etc. Here's one of the approaches I've tried:
opts = detectImportOptions('test.csv',"VariableNamingRule","preserve");
test= readtimetable('test.csv', opts);
I've also left off the options qualifiers, but to no avail. The variable names still come back as Var1, Var2, etc.
Interestingly, the variable names are preserved if I use the Import wizard, but this is not a long-term solution. I have the same problem if readtable is used instead of readtimetable.
Any assistance greatly appreciated! Thanks for reading-
Nina
  12 Comments
Nina Sweeney
Nina Sweeney on 25 Oct 2023
Walter, Star Strider:
Yes, I have no trouble preserving the variable names for the snippet; the problem is that the variable names change to Var1,...,Var14 for the full csv file.
Nina
Nina Sweeney
Nina Sweeney on 25 Oct 2023
I haven't tried datastore yet, but I plan to do so.
Here's the solution, at least for now:
  1. Import the large csv with the Import Tool.
  2. Once in the Import Tool: Import Selection > Generate Script (or Generate Function).
  3. Save and run the script. The script produces the needed options to use the column headers of the csv file as variable names.
  4. The generated code is shown below:
IMPORTFILE Import data from a text file
% NODE202FLIGHTDATAOUT = IMPORTFILE(FILENAME) reads data from text file
% FILENAME for the default selection. Returns the data as a table.
%
% NODE202FLIGHTDATAOUT = IMPORTFILE(FILE, DATALINES) reads data for the
% specified row interval(s) of text file FILENAME. Specify DATALINES as
% a positive scalar integer or a N-by-2 array of positive scalar
% integers for dis-contiguous row intervals.
%
% Example:
% node202flightdataout = importfile("C:\Nina\Aircraft Position\All Logs Update 1\node202_flight_data_out.csv", [2, Inf]);
%
% See also READTABLE.
%
% Auto-generated by MATLAB on 25-Oct-2023 08:07:30
%% Input handling
% If dataLines is not specified, define defaults
if nargin < 2
dataLines = [2, Inf];
end
%% Set up the Import Options and import the data
opts = delimitedTextImportOptions("NumVariables", 15);
% Specify range and delimiter
opts.DataLines = dataLines;
opts.Delimiter = ",";
% Specify column names and types
opts.VariableNames = ["LogSet", "UTCTimeS", "Time", "rxEbno1For257", "rxEbno2For257", "RxLinkRatebps", "TxPowerdBm", "TxLinkRatebps", "Lat", "lon", "Altm", "Headingdeg", "Rolldeg", "Pitchdeg", "VarName15"];
opts.VariableTypes = ["double", "double", "datetime", "double", "double", "double", "double", "double", "double", "double", "double", "double", "double", "double", "string"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Specify variable properties
opts = setvaropts(opts, "VarName15", "WhitespaceRule", "preserve");
opts = setvaropts(opts, "VarName15", "EmptyFieldRule", "auto");
opts = setvaropts(opts, "Time", "InputFormat", "yyyy-MM-dd'T'HH:mm:ss");
% Import the data
node202flightdataout = readtable(filename, opts);
end

Sign in to comment.

Accepted Answer

Nina Sweeney
Nina Sweeney on 26 Oct 2023
I haven't tried datastore yet, but I plan to do so.
Here's the solution, at least for now:
  1. Import the large csv with the Import Tool.
  2. Once in the Import Tool: Import Selection > Generate Script (or Generate Function).
  3. Save and run the script. The script produces the needed options to use the column headers of the csv file as variable names.
  4. The generated code is shown below:
IMPORTFILE Import data from a text file
% NODE202FLIGHTDATAOUT = IMPORTFILE(FILENAME) reads data from text file
% FILENAME for the default selection. Returns the data as a table.
%
% NODE202FLIGHTDATAOUT = IMPORTFILE(FILE, DATALINES) reads data for the
% specified row interval(s) of text file FILENAME. Specify DATALINES as
% a positive scalar integer or a N-by-2 array of positive scalar
% integers for dis-contiguous row intervals.
%
% Example:
% node202flightdataout = importfile("C:\Nina\Aircraft Position\All Logs Update 1\node202_flight_data_out.csv", [2, Inf]);
%
% See also READTABLE.
%
% Auto-generated by MATLAB on 25-Oct-2023 08:07:30
%% Input handling
% If dataLines is not specified, define defaults
if nargin < 2
dataLines = [2, Inf];
end
%% Set up the Import Options and import the data
opts = delimitedTextImportOptions("NumVariables", 15);
% Specify range and delimiter
opts.DataLines = dataLines;
opts.Delimiter = ",";
% Specify column names and types
opts.VariableNames = ["LogSet", "UTCTimeS", "Time", "rxEbno1For257", "rxEbno2For257", "RxLinkRatebps", "TxPowerdBm", "TxLinkRatebps", "Lat", "lon", "Altm", "Headingdeg", "Rolldeg", "Pitchdeg", "VarName15"];
opts.VariableTypes = ["double", "double", "datetime", "double", "double", "double", "double", "double", "double", "double", "double", "double", "double", "double", "string"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Specify variable properties
opts = setvaropts(opts, "VarName15", "WhitespaceRule", "preserve");
opts = setvaropts(opts, "VarName15", "EmptyFieldRule", "auto");
opts = setvaropts(opts, "Time", "InputFormat", "yyyy-MM-dd'T'HH:mm:ss");
% Import the data
node202flightdataout = readtable(filename, opts);
end

More Answers (2)

Star Strider
Star Strider on 24 Oct 2023
Edited: Star Strider on 25 Oct 2023
I am not certain that 'VariableNamingRule' works as part of the options structure.
Try this instead:
test= readtimetable('test.csv', ,"VariableNamingRule","preserve");
EDIT — (25 Oct 2023 at 01:28)
It seems to work correctly here —
opts = detectImportOptions('test.csv',"VariableNamingRule","preserve");
test = readtimetable('test.csv', opts)
test = 3×13 timetable
Time LogSet UTCTimeS rxEbno1For257 rxEbno2For257 RxLinkRate(bps) TxPower(dBm) TxLinkRate(bps) Lat lon Alt(m) Heading(deg) Roll(deg) Pitch(deg) ____________________ ______ __________ _____________ _____________ _______________ ____________ _______________ ______ _______ ______ ____________ _________ __________ 11-Aug-2023 21:02:29 0 1.6918e+09 1.3 9.4 76250 2.2 20250 38.292 -76.419 29 125.15 -0.86387 1.8717 11-Aug-2023 21:02:30 0 1.6918e+09 -1.7 9 76250 3.5 20250 38.292 -76.419 29 124.9 -0.87145 1.9309 11-Aug-2023 21:02:31 0 1.6918e+09 -6.8 8.6 76250 2.9 20250 38.292 -76.419 29 124.24 -0.89849 1.9858
EDIT — (25 Oct 2023 at 11:45)
@Nina Sweeney — See if the same problem occurs with datastore. It also has a VariableNamingRule option. It is designed for large arrays, so it may work with your file.
.
  2 Comments
Star Strider
Star Strider on 25 Oct 2023
I tried that recently and it threw an error, although I don’t remember the context. Using it as a name-value pair (copied and pasted, so spelling or typos were not an issue) in readtable worked.

Sign in to comment.


Walter Roberson
Walter Roberson on 24 Oct 2023
Works for me, provided that the number of header variables does not exceed the number of detected columns.
When I accidentally added extra numeric columns in the data, I did get trailing Var* variables being generated.
All of this requires that detectImportOptions was able to figure out which row had your variable names. If that goes wrong, then Sure, it might end up generating Var* variables.
filename = 'test.csv';
opts = detectImportOptions(filename,"VariableNamingRule","preserve");
%for this PARTICULAR test file, the datetime format is not recognized
opts = setvartype(opts,1,'datetime');
opts = setvaropts(opts,1,'InputFormat', 'MMM d uuuu HH:mm');
test = readtable('test.csv',opts)
test = 2×7 table
first second third one fourth %increase month to date last ____________________ ______ _________ ______ _________ _____________ ____ 03-Jan-2001 18:03:00 1 2 3 4 5 6 17-Aug-2022 05:19:00 9 8 7 6 5 4

Categories

Find more on Data Preprocessing in Help Center and File Exchange

Products


Release

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!