How to convert categorical data to numeric in separate columns?
% Hi! I have a dataset 'data5' with a column 'Location' which contains values Asia, US and Africa.
% I want to convert it into 3 separate columns, one per location, each containing 1 if the row is from that location and 0 otherwise.
% This is the function I have created:
function data = categorical_values(data, var)
uniques = unique(var);
for i = 1:length(uniques)
values(:, i) = double(ismember(var, uniques(i)));
end
t = table;
[rows, cols] = size(values);
for i = 1:cols
t1 = table(values(:, i));
t1.Properties.VariableNames = uniques(i);
t = [t t1];
end
data = [t data];
end
% And this is the code I have been running, in a file called prep.m:
new = categorical_values(data5, data5.Location);
new.Location = []; % delete the old Location column
% I have been getting this error:
Error using categorical_values (line 11)
The VariableNames property is a cell array of character vectors. To
assign multiple variable names, specify names in a string array or a cell
array of character vectors.
Error in prep (line 16)
new = categorical_values(data5, data5.Location);
% Can anyone help? Thanks!
Answers (1)
Adam Danz on 10 Aug 2020 (edited 26 Oct 2020)
Here's a more efficient solution.
% Create demo data
location = categorical({'Asia','US','Asia','Africa','Africa','US','US','Asia'}');
unqCountries = unique(location(:)')
% Create a logical matrix of 1s & 0s.
% Columns are identified by "unqCountries"
countryIdx = location(:) == unqCountries
% If you want to turn it into a table
T = array2table(countryIdx, 'VariableNames', string(unqCountries))
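The error occurs because you're assigning a categorical value as a table variable name. Variable names must be text (a character vector, a cell array of character vectors, or a string array), so convert the value to a string. Note that the variable is uniques, not unique:
t1.Properties.VariableNames = string(uniques(i));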
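Applying the same fix to the function from the question, a minimal sketch might look like the one below. It keeps the question's function name and its `data`/`var` arguments, but converts the categories to text before using them as variable names. It assumes `data5` is a table whose `Location` variable is categorical. The demo table, including its `Value` column and numbers, is made up for illustration.

```matlab
% Hypothetical demo table standing in for 'data5' (values are made up)
data5 = table(categorical({'Asia'; 'US'; 'Africa'; 'Asia'}), [10; 20; 30; 40], ...
    'VariableNames', {'Location', 'Value'});

new = categorical_values(data5, data5.Location);
new.Location = [];   % delete the old Location column
disp(new)

function data = categorical_values(data, var)
    % Prepend one 0/1 column per unique value of VAR to the table DATA.
    uniques = unique(var);                       % sorted unique categories
    values = zeros(numel(var), numel(uniques));  % preallocate N-by-K matrix
    for i = 1:numel(uniques)
        values(:, i) = var == uniques(i);        % 1 where the row matches category i
    end
    % Table variable names must be text, so convert the categories first
    names = cellstr(uniques(:)');
    t = array2table(values, 'VariableNames', names);
    data = [t data];
end
```

Building the whole 0/1 matrix first and calling `array2table` once avoids growing a table column by column inside the loop.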
Adam Danz on 26 Oct 2020
"Is this same as dummy coding or One Hot Encoding?"
T is a one-hot encoding of location: each row contains exactly one true (1), in the column for that row's country, and false (0) in every other column. These columns can be used as dummy variables, for example as predictors in a regression.
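The difference the comment asks about can be sketched with the demo data from the answer. One-hot encoding keeps all K category columns. Dummy (reference) coding, often used when a regression model has an intercept, drops one column, and that category becomes the baseline. Choosing the first category as the baseline here is an arbitrary assumption for illustration.

```matlab
% Demo data from the answer
location = categorical({'Asia','US','Asia','Africa','Africa','US','US','Asia'}');
unqCountries = unique(location(:)');

% One-hot encoding: one logical column per category (K columns)
oneHot = location(:) == unqCountries;

% Sanity check: every row has exactly one true value
assert(all(sum(oneHot, 2) == 1))

% Dummy (reference) coding: drop the first category's column (K-1 columns).
% Rows that are all false belong to the baseline category.
dummy = oneHot(:, 2:end);
refCategory = unqCountries(1);
```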