Prediction error on new input for a model trained on data encoded with the onehotencode function

Hello everybody,
This time I am working with a data set that has some categorical features.
Using the "onehotencode" function, I can easily convert the categorical variables into numeric form and then train a prediction model on them.
My problem is this: when I use the trained model to predict a new input, I get the error "Input Data Sizes do not match net".
I think the reason is that when the categorical features of the new input are processed with onehotencode, the new input ends up with fewer columns than the data set used to train the model.
Is there a way to solve this?
Below is my code.
clear all; close all;clc;
% Load the carbig data set.
data = readtable('modeldata.xlsx');
% To train a network using categorical features, first convert the categorical features to numeric.
categoricalInputNames = ["cyl4" "Mfg" "Origin" "when"];
tbl = convertvars(data,categoricalInputNames,'categorical');
% Loop over the categorical input variables. For each variable, one-hot encode it,
% insert the encoded columns after it, and remove the original variable.
for i = 1:numel(categoricalInputNames)
    name = categoricalInputNames(i);
    oh = onehotencode(tbl(:,name));
    tbl = addvars(tbl,oh,'After',name);
    tbl(:,name) = [];
end
% Split the vectors into separate columns using the splitvars function.
tbl = splitvars(tbl);
x = tbl{:,2:end}'; % input data (one column per observation).
t = tbl{:,1}'; % target data.
% Choose a Training Function
trainFcn = 'trainlm'; % Levenberg-Marquardt backpropagation.
% Create a Fitting Network
hiddenLayerSize = 10;
net = fitnet(hiddenLayerSize,trainFcn);
% Setup Division of Data for Training, Validation, Testing
% (set after fitnet, which would otherwise overwrite these settings)
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train the Network
[net,tr] = train(net,x,t);
% Test the Network
prediction = net(x);
plotregression(t,prediction )
save net
%% New input prediction
clear all; close all; clc; % reset the workspace
newinput = readtable('newinput.xlsx'); % new input import
% Convert the categorical features to numeric.
categoricalInputNames = ["cyl4" "Mfg" "Origin" "when"];
newinput = convertvars(newinput,categoricalInputNames,'categorical');
% Loop over the categorical input variables. For each variable, one-hot encode it,
% insert the encoded columns after it, and remove the original variable.
for i = 1:numel(categoricalInputNames)
    name = categoricalInputNames(i);
    oh = onehotencode(newinput(:,name));
    newinput = addvars(newinput,oh,'After',name);
    newinput(:,name) = [];
end
% Split the vectors into separate columns using the splitvars function.
newinput = splitvars(newinput);
newx = newinput{:,:}'; % input data.
% The error occurs below, when the trained model is used to predict the new input.
load net
newoutput = net(newx);

Answers (1)

Sanjana on 9 Jun 2023
Hi Smithy,
I understand that you are getting an error after creating one-hot encodings for the categorical data in your training data and your new input.
The problem is that, unless told otherwise, the “onehotencode” function determines the number of categories from its input. Because the training data and the new input contain different sets of category values, the one-hot encodings have different numbers of columns, so the new input no longer has the size the network was trained on.
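A minimal sketch of the mismatch, using made-up values for a single feature such as "Origin" (it assumes Deep Learning Toolbox, which provides onehotencode). Each call derives its classes from its own input, so the two encodings end up with different widths:

```matlab
% Made-up example: the training data contains three origins,
% the new input contains only one of them.
trainOrigin = categorical(["USA"; "Japan"; "Europe"; "USA"]);
newOrigin = categorical("Japan");

% Without 'ClassNames', each call takes its classes from its own input,
% so the two encodings get different numbers of columns.
ohTrain = onehotencode(trainOrigin, 2);   % 4-by-3 (Europe, Japan, USA)
ohNew = onehotencode(newOrigin, 2);       % 1-by-1 (Japan only)

disp(size(ohTrain))
disp(size(ohNew))
```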
One possible solution is to fix the categories based on the training data and pass them to “onehotencode” through the “ClassNames” argument, so that the number of encoded columns no longer depends on which categories happen to appear in the input.
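A minimal sketch of the "ClassNames" approach on made-up data for one feature (assumes Deep Learning Toolbox). The class list is taken once from the training values and reused for the new input, so both encodings have one column per training category:

```matlab
% Made-up example data for a single categorical feature.
trainOrigin = categorical(["USA"; "Japan"; "Europe"; "USA"]);
newOrigin = categorical("Japan");

% Fix the class list once, from the training data.
originClasses = categories(trainOrigin);   % {'Europe'; 'Japan'; 'USA'}

% Passing the same list to both calls gives both encodings the same width.
ohTrain = onehotencode(trainOrigin, 2, 'ClassNames', originClasses);  % 4-by-3
ohNew = onehotencode(newOrigin, 2, 'ClassNames', originClasses);      % 1-by-3

disp(ohNew)   % 0 1 0
```

The column order follows the stored class list, so the same list must be used every time the model is fed new data.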
Please refer to the “onehotencode” documentation for further information.
Hope this helps!
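Applied to the script in the question, the idea is to record each categorical variable's class list at training time and save it together with the network. The sketch below uses a hypothetical helper "encodeFeatures" (not a MATLAB function; save it as encodeFeatures.m). It assumes, as in the question, that the target is the first spreadsheet column, and that newinput.xlsx has the same predictor columns in the same order. Building the numeric matrix directly, instead of using addvars/splitvars, is a design choice of this sketch, not part of the answer above.

```matlab
function [X, classNames] = encodeFeatures(tbl, catNames, classNames)
%ENCODEFEATURES Hypothetical helper, not a MATLAB function.
%   Converts a table of predictors into a numeric matrix with one row per
%   observation, replacing each variable listed in catNames with one-hot
%   columns in place, so the column order follows the table.
%
%   [X, classNames] = encodeFeatures(tbl, catNames)
%       learns each categorical variable's class list from tbl (training data).
%   X = encodeFeatures(tbl, catNames, classNames)
%       reuses the stored class lists, so new input gets the same columns.
    learn = nargin < 3;
    if learn
        classNames = struct();   % assumes table variable names are valid field names
    end
    blocks = cell(1, width(tbl));
    for j = 1:width(tbl)
        vname = tbl.Properties.VariableNames{j};
        col = tbl.(vname);
        if any(strcmp(vname, catNames))
            c = categorical(col);
            if learn
                % Class list taken from the training data only.
                classNames.(vname) = categories(c);
            end
            % N-by-K block: one column per stored class, in stored order.
            blocks{j} = onehotencode(c, 2, 'ClassNames', classNames.(vname));
        else
            % Numeric predictor: keep as a single column.
            blocks{j} = double(col);
        end
    end
    X = [blocks{:}];
end
```

Usage, replacing the encoding loops in the two scripts:

```matlab
% Training: column 1 of the spreadsheet is the target.
data = readtable('modeldata.xlsx');
catNames = ["cyl4" "Mfg" "Origin" "when"];
[X, classNames] = encodeFeatures(data(:, 2:end), catNames);
x = X';          % features-by-observations, as train expects
t = data{:, 1}';
net = fitnet(10, 'trainlm');
[net, tr] = train(net, x, t);
save('net.mat', 'net', 'catNames', 'classNames');

% Prediction: reuse the stored class lists.
S = load('net.mat');
newinput = readtable('newinput.xlsx');
newx = encodeFeatures(newinput, S.catNames, S.classNames)';
newoutput = S.net(newx);
```

A category value that never occurs in the training data has no column in this encoding, so such values need separate handling before prediction.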


Release: R2022a
