Regression Learner: Training a model with multiple datasets?

My goal is to train a model with 100 datasets, which are loaded into the workspace as arrays. Each dataset contains 5 predictors and 1 response over a timespan of 1 year. The problem is that the Regression Learner app only allows training a model with 1 dataset, not 100.
1. Is it possible for an array (for example, a tall array) to recognize that it contains different datasets? If so, does the Regression Learner also recognize the datasets, so that the array can be used for training? Simply adding the datasets together doesn't work, because the start and the end of each dataset don't match.
2. As explained at https://de.mathworks.com/help/stats/export-regression-model-to-predict-new-data.html, it should be possible to train the model with more datasets by generating the function. The generated function is saved as "trainRegressionModel":
function [trainedModel, validationRMSE] = trainRegressionModel(trainingData)
% [trainedModel, validationRMSE] = trainRegressionModel(trainingData)
% Returns a trained regression model and its RMSE. This code recreates the
% model trained in Regression Learner app. Use the generated code to
% automate training the same model with new data, or to learn how to
% programmatically train models.
%
% Input:
% trainingData: A matrix with the same number of columns and data type
% as the matrix imported into the app.
%
% Output:
% trainedModel: A struct containing the trained regression model. The
% struct contains various fields with information about the trained
% model.
%
% trainedModel.predictFcn: A function to make predictions on new data.
%
% validationRMSE: A double containing the RMSE. In the app, the Models
% pane displays the RMSE for each model.
%
% Use the code to train the model with new data. To retrain your model,
% call the function from the command line with your original data or new
% data as the input argument trainingData.
%
% For example, to retrain a regression model trained with the original data
% set T, enter:
% [trainedModel, validationRMSE] = trainRegressionModel(T)
%
% To make predictions with the returned 'trainedModel' on new data T2, use
% yfit = trainedModel.predictFcn(T2)
%
% T2 must be a matrix containing only the predictor columns used for
% training. For details, enter:
% trainedModel.HowToPredict
% Auto-generated by MATLAB on 22-Nov-2021 12:01:25
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
% Convert input to table
inputTable = array2table(trainingData, 'VariableNames', {'column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6', 'column_7'});
predictorNames = {'column_1', 'column_2', 'column_3', 'column_4', 'column_5'};
predictors = inputTable(:, predictorNames);
response = inputTable.column_6;
isCategoricalPredictor = [false, false, false, false, false];
% Train a regression model
% This code specifies all the model options and trains the model.
regressionTree = fitrtree(...
predictors, ...
response, ...
'MinLeafSize', 4, ...
'Surrogate', 'off');
% Create the result struct with predict function
predictorExtractionFcn = @(x) array2table(x, 'VariableNames', predictorNames);
treePredictFcn = @(x) predict(regressionTree, x);
trainedModel.predictFcn = @(x) treePredictFcn(predictorExtractionFcn(x));
% Add additional fields to the result struct
trainedModel.RegressionTree = regressionTree;
trainedModel.About = 'This struct is a trained model exported from Regression Learner R2021b.';
trainedModel.HowToPredict = sprintf('To make predictions on a new predictor column matrix, X, use: \n yfit = c.predictFcn(X) \nreplacing ''c'' with the name of the variable that is this struct, e.g. ''trainedModel''. \n \nX must contain exactly 5 columns because this model was trained using 5 predictors. \nX must contain only predictor columns in exactly the same order and format as your training \ndata. Do not include the response column or any columns you did not import into the app. \n \nFor more information, see <a href="matlab:helpview(fullfile(docroot, ''stats'', ''stats.map''), ''appregression_exportmodeltoworkspace'')">How to predict using an exported model</a>.');
% Extract predictors and response
% This code processes the data into the right shape for training the
% model.
% Convert input to table
inputTable = array2table(trainingData, 'VariableNames', {'column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6', 'column_7'});
predictorNames = {'column_1', 'column_2', 'column_3', 'column_4', 'column_5'};
predictors = inputTable(:, predictorNames);
response = inputTable.column_6;
isCategoricalPredictor = [false, false, false, false, false];
% Perform cross-validation
partitionedModel = crossval(trainedModel.RegressionTree, 'KFold', 5);
% Compute validation predictions
validationPredictions = kfoldPredict(partitionedModel);
% Compute validation RMSE
validationRMSE = sqrt(kfoldLoss(partitionedModel, 'LossFun', 'mse'));
After running the function with a new dataset, a struct named "trainedModel" is created in the workspace. The struct contains a lot of information and it is possible to make predictions with it, but I found no way to train the trainedModel further. Is there a way to use the generated struct for further training?
Also, I don't understand why the part "Extract predictors and response" is used a second time after the struct has already been created. Doesn't it have no impact at all, because it executes after the struct is created?

Answers (1)

Shivam Singh on 8 Dec 2021
Hello,
It is my understanding that you want to do Incremental Learning.
To initialize and train an incremental model using the information gained from previous training in the Regression Learner app, you may refer to the following:
This can also be done for a simple linear model returned by fitlm. You can then use the incrementalRegressionLinear and updateMetricsAndFit functions to perform incremental learning. You may refer to the following example:
Mdl = fitlm(X,Y); % X,Y is one dataset
Bias = Mdl.Coefficients.Estimate(1);
Beta = Mdl.Coefficients.Estimate(2:end);
IncrementalMdl = incrementalRegressionLinear('Learner','leastsquares', ...
    'Bias',Bias,'Beta',Beta);
% Xnew, Ynew is another dataset (containing the same predictor and response variables)
IncrementalMdl = updateMetricsAndFit(IncrementalMdl, Xnew, Ynew);
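With many datasets, the same pattern can be put in a loop. The following is only a minimal sketch under some assumptions: the datasets are held in a cell array named datasets (a hypothetical name), each one is a numeric matrix with the 5 predictors in columns 1 to 5 and the response in column 6, and synthetic data stands in for your real data.

```matlab
% Synthetic stand-in data (assumption): one N-by-6 matrix per dataset,
% columns 1-5 are predictors and column 6 is the response.
rng(0);                                   % reproducible random numbers
numSets = 4;                              % use 100 for the real case
trueBeta = [1; -2; 0.5; 3; 0];            % made-up coefficients
datasets = cell(numSets, 1);
for k = 1:numSets
    X = randn(200, 5);
    datasets{k} = [X, X*trueBeta + 2 + 0.1*randn(200, 1)];
end

% Initialize the incremental model from a conventional fit on the first dataset
first = datasets{1};
Mdl = fitlm(first(:, 1:5), first(:, 6));
IncrementalMdl = incrementalRegressionLinear('Learner', 'leastsquares', ...
    'Bias', Mdl.Coefficients.Estimate(1), ...
    'Beta', Mdl.Coefficients.Estimate(2:end));

% Feed the remaining datasets to the model one at a time
for k = 2:numSets
    D = datasets{k};
    IncrementalMdl = updateMetricsAndFit(IncrementalMdl, D(:, 1:5), D(:, 6));
end

% Predict with the updated model
yfit = predict(IncrementalMdl, datasets{1}(:, 1:5));
```

Note that this trains a linear model, not the regression tree from the generated function, so the predictions will differ from those of the tree exported from the app.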
For more examples, you may refer to the following:
Please note that the trained incremental regression model may not give exactly the same results as a model trained on all the data together.
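If the goal is instead a single model trained on all the data together, one option is to stack the datasets row-wise and pass the result to the generated function. This is only a sketch under assumptions: datasets is a hypothetical cell array of matrices with 7 columns each (the number of columns the generated trainRegressionModel expects), random numbers stand in for the real data, and setId is an illustrative helper that is not part of the generated code. fitrtree treats each row as a separate observation, so differing lengths or time ranges between datasets should not prevent stacking, provided no predictor is computed across dataset boundaries.

```matlab
% Synthetic stand-in data (assumption): matrices with identical column layout
rng(0);
datasets = cell(3, 1);
for k = 1:3
    datasets{k} = randn(50 + 10*k, 7);    % row counts may differ per dataset
end

% Stack all rows; only the number of columns has to match
allData = vertcat(datasets{:});

% Optional: record which dataset each row came from (hypothetical helper)
setId = repelem((1:numel(datasets))', cellfun(@(d) size(d, 1), datasets));

% Retrain with the generated function, if it is on the MATLAB path
if exist('trainRegressionModel', 'file') == 2
    [trainedModel, validationRMSE] = trainRegressionModel(allData);
end
```

The setId vector could be used, for example, to hold out entire datasets for validation instead of random rows.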
