predict

Predict responses for Gaussian kernel regression model

Syntax

YFit = predict(Mdl,X)

YFit = predict(Mdl,X,PredictionForMissingValue=prediction)

Description

YFit = predict(Mdl,X) returns a vector of predicted responses for the predictor data in the matrix or table X, based on the Gaussian kernel regression model Mdl.


YFit = predict(Mdl,X,PredictionForMissingValue=prediction) uses the prediction value as the predicted response for observations with missing values in the predictor data X. By default, predict uses the median of the observed response values in the training data. (since R2023b)
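
The following is a minimal sketch of the second syntax, not part of the original example. The synthetic data and the variable names (Xnew, YFitDefault, YFitZero) are assumptions chosen for illustration. It contrasts the default behavior with a fixed numeric replacement value.

```matlab
rng(1)                                   % For reproducibility
X = randn(200,3);                        % Synthetic predictor data (assumed)
Y = X(:,1) - 2*X(:,2) + 0.1*randn(200,1);
Mdl = fitrkernel(X,Y);

Xnew = randn(5,3);
Xnew(2,1) = NaN;                         % Observation 2 has a missing predictor value

% Default behavior: the missing-value observation gets the training-set median
YFitDefault = predict(Mdl,Xnew);

% Use 0 as the predicted response for the missing-value observation instead
YFitZero = predict(Mdl,Xnew,PredictionForMissingValue=0);
```

Only the prediction for observation 2 should differ between the two calls, because the other observations have no missing predictor values.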

Examples


Predict Test Set Responses


Predict the test set responses using a Gaussian kernel regression model for the carbig data set.

Load the carbig data set.

load carbig

Specify the predictor variables (X) and the response variable (Y).

X = [Weight,Cylinders,Horsepower,Model_Year];
Y = MPG;

Delete rows of X and Y where either array has NaN values. Removing rows with NaN values before passing data to fitrkernel can speed up training and reduce memory usage.

R = rmmissing([X Y]); 
X = R(:,1:4); 
Y = R(:,end);

Reserve 10% of the observations as a holdout sample. Extract the training and test indices from the partition definition.

rng(10)  % For reproducibility 
N = length(Y); 
cvp = cvpartition(N,'Holdout',0.1);
idxTrn = training(cvp); % Training set indices
idxTest = test(cvp);    % Test set indices

Train the regression kernel model. Standardize the training data.

Xtrain = X(idxTrn,:);
Ytrain = Y(idxTrn);
Mdl = fitrkernel(Xtrain,Ytrain,'Standardize',true)

Mdl = 
  RegressionKernel
              ResponseName: 'Y'
                   Learner: 'svm'
    NumExpansionDimensions: 128
               KernelScale: 1
                    Lambda: 0.0028
             BoxConstraint: 1
                   Epsilon: 0.8617

Mdl is a RegressionKernel model.

Predict responses for the test set.

Xtest = X(idxTest,:);
Ytest = Y(idxTest);

YFit = predict(Mdl,Xtest);

Create a table containing the first 10 observed response values and predicted response values.

table(Ytest(1:10),YFit(1:10),'VariableNames', ...
    {'ObservedValue','PredictedValue'})

ans=10×2 table
    ObservedValue    PredictedValue
    _____________    ______________

         18              17.616    
         14              25.799    
         24              24.141    
         25              25.018    
         14              13.637    
         14              14.557    
         18              18.584    
         27              26.096    
         21              25.031    
         13              13.324

Estimate the test set regression loss using the mean squared error loss function.

L = loss(Mdl,Xtest,Ytest)

L = 
9.2664
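
To make the link between predict and the mean squared error explicit, the sketch below computes the error by hand from the predicted responses and compares it with the value returned by loss. It uses small synthetic data (an assumption, so the numbers differ from the carbig result above) and assumes the default settings of loss: the 'mse' loss function and equal observation weights.

```matlab
rng(1)                                        % For reproducibility
X = randn(300,2);                             % Synthetic predictor data (assumed)
Y = sin(X(:,1)) + 0.5*X(:,2) + 0.1*randn(300,1);

Mdl = fitrkernel(X(1:250,:),Y(1:250));        % Train on the first 250 observations
Xtest = X(251:end,:);
Ytest = Y(251:end);

YFit = predict(Mdl,Xtest);
mseManual = mean((Ytest - YFit).^2);          % Average squared residual
mseLoss = loss(Mdl,Xtest,Ytest);              % Default loss function is 'mse'
```

With equal weights, mseManual and mseLoss should agree up to floating-point round-off.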

Input Arguments


`Mdl` — Kernel regression model
`RegressionKernel` model object

Kernel regression model, specified as a RegressionKernel model object. You can create a RegressionKernel model object using fitrkernel.

`X` — Predictor data used to generate responses
numeric matrix | table

Predictor data used to generate responses, specified as a numeric matrix or table.

Each row of X corresponds to one observation, and each column corresponds to one variable.

For a numeric matrix:
- The variables in the columns of X must have the same order as the predictor variables that trained Mdl.
- If you trained Mdl using a table (for example, Tbl) and Tbl contains all numeric predictor variables, then X can be a numeric matrix. To treat numeric predictors in Tbl as categorical during training, identify categorical predictors using the CategoricalPredictors name-value pair argument of fitrkernel. If Tbl contains heterogeneous predictor variables (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.

For a table:
- predict does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
- If you trained Mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as those that trained Mdl (stored in Mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Also, Tbl and X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
- If you trained Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames and corresponding predictor variable names in X must be the same. To specify predictor names during training, see the PredictorNames name-value pair argument of fitrkernel. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.

Data Types: double | single | table
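
The rules above for table and matrix input can be sketched as follows. This example is an illustration under stated assumptions: it trains on a small table built from the carbig data set, and the names Tbl, TblNew, and Xmat are hypothetical. It shows that the column order of a table and extra variables do not matter, and that a numeric matrix works when its columns follow the order of Mdl.PredictorNames.

```matlab
load carbig
Tbl = rmmissing(table(Weight,Horsepower,MPG));   % All-numeric table, no missing rows

rng(1)                                           % For reproducibility
Mdl = fitrkernel(Tbl,'MPG');                     % Predictors: Weight, Horsepower

% Table input: same variable names, different column order, extra variable (MPG)
TblNew = Tbl(1:5,{'MPG','Horsepower','Weight'});
YFitTbl = predict(Mdl,Tbl(1:5,:));
YFitReordered = predict(Mdl,TblNew);

% Matrix input: columns must follow the order of the training predictors
Xmat = Tbl{1:5,Mdl.PredictorNames};
YFitMat = predict(Mdl,Xmat);
```

All three calls should return the same predicted responses for the first five observations.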

`prediction` — Predicted response value to use for observations with missing predictor values
`"median"` (default) | `"mean"` | numeric scalar

Since R2023b

Predicted response value to use for observations with missing predictor values, specified as "median", "mean", or a numeric scalar.

Value	Description
`"median"`	`predict` uses the median of the observed response values in the training data as the predicted response value for observations with missing predictor values.
`"mean"`	`predict` uses the mean of the observed response values in the training data as the predicted response value for observations with missing predictor values.
Numeric scalar	`predict` uses this value as the predicted response value for observations with missing predictor values.

Example: "mean"

Example: NaN

Data Types: single | double | char | string
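
As a hedged illustration of the values in the table above, the following sketch compares "mean" with NaN on synthetic data. The data and the names Xnew, hasMissing, YFitMean, and YFitNaN are assumptions introduced for this example. Specifying NaN marks the observations that the model could not score, which can be useful when you want to handle them separately.

```matlab
rng(1)                                   % For reproducibility
X = randn(200,3);                        % Synthetic predictor data (assumed)
Y = 3 + X(:,1) - 2*X(:,2) + 0.1*randn(200,1);
Mdl = fitrkernel(X,Y);

Xnew = randn(6,3);
Xnew(2,1) = NaN;                         % Observations 2 and 5 have missing values
Xnew(5,3) = NaN;
hasMissing = any(ismissing(Xnew),2);     % Logical index of affected observations

% Replace with the mean of the training responses
YFitMean = predict(Mdl,Xnew,PredictionForMissingValue="mean");

% Return NaN for observations with missing predictor values
YFitNaN = predict(Mdl,Xnew,PredictionForMissingValue=NaN);
```

Every observation flagged by hasMissing receives the same replacement value in YFitMean and NaN in YFitNaN. The remaining predictions are unaffected by the choice.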

Output Arguments


`YFit` — Predicted responses
numeric vector

Predicted responses, returned as a numeric vector.

YFit is an n-by-1 vector of the same data type as the response data (Y) used to train Mdl, where n is the number of observations in X.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

The predict function supports tall arrays with the following usage notes and limitations:

predict does not support tall table data.

For more information, see Tall Arrays.
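
A minimal sketch of tall-array prediction follows. It assumes synthetic in-memory data converted with tall (in practice the tall array would usually come from a datastore) and uses mapreducer(0) so that the evaluation runs in the local MATLAB session. The predictor data is a tall numeric matrix, in line with the limitation above.

```matlab
rng(1)                                   % For reproducibility
X = randn(500,3);                        % Synthetic predictor data (assumed)
Y = X*[1; -2; 0.5] + 0.1*randn(500,1);
Mdl = fitrkernel(X,Y);

mapreducer(0);                           % Evaluate tall arrays in the local session
tX = tall(X);                            % Tall numeric matrix (not a tall table)

tYFit = predict(Mdl,tX);                 % Deferred: returns an unevaluated tall array
YFit = gather(tYFit);                    % Trigger evaluation and bring results into memory
```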

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™. (since R2023a)

Usage notes and limitations:

- Use saveLearnerForCoder, loadLearnerForCoder, and codegen (MATLAB Coder) to generate code for the predict function. Save a trained model by using saveLearnerForCoder. Define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the predict function. Then use codegen to generate code for the entry-point function.
- To generate single-precision C/C++ code for predict, specify the name-value argument "DataType","single" when you call the loadLearnerForCoder function.
- If the code generator uses the Open Multiprocessing (OpenMP) library, the generated code of predict splits the predictor data X into multiple chunks and predicts responses for the chunks in parallel. The generated code uses parfor (MATLAB Coder) to create loops that run in parallel on supported shared-memory multicore platforms. If your compiler does not support the OpenMP application interface, or if you disable the OpenMP library, the generated code does not split the predictor data and, therefore, processes one observation at a time. To find supported compilers, see Supported Compilers. To disable the OpenMP library, set the EnableOpenMP property of the configuration object to false. For details, see coder.CodeConfig (MATLAB Coder).
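
The save, load, and codegen workflow described in these notes can be sketched with an entry-point function such as the one below. The function name predictKernel and the file name KernelMdl are hypothetical, and the four predictor columns are an assumption that matches the carbig example on this page. Save the function in its own file, predictKernel.m.

```matlab
function YFit = predictKernel(X) %#codegen
% predictKernel  Entry-point function for code generation (hypothetical name).
%
% Before generating code, save the trained model in MATLAB:
%   saveLearnerForCoder(Mdl,'KernelMdl');
%
% Then generate code, here for a double matrix with a variable number of
% rows and a fixed number of columns (4 is assumed):
%   codegen predictKernel -args {coder.typeof(0,[Inf 4],[1 0])}

% Load the saved model. The file name must be a compile-time constant.
Mdl = loadLearnerForCoder('KernelMdl');

% Predict responses for the observations in X.
YFit = predict(Mdl,X);
end
```

The codegen command requires MATLAB Coder. You can also call predictKernel directly in MATLAB to check that it returns the same values as predict(Mdl,X) before generating code.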

This table contains notes about the arguments of predict. Arguments not included in this table are fully supported.

Argument	Notes and Limitations
`Mdl`	For the usage notes and limitations of the model object, see Code Generation of the `RegressionKernel` object.
`X`	For general code generation, `X` must be a single-precision or double-precision matrix or a table containing numeric variables, categorical variables, or both. The number of rows, or observations, in `X` can be a variable size, but the number of columns in `X` must be fixed. If you want to specify `X` as a table, then your model must be trained using a table, and your entry-point function for prediction must do the following: Accept data as arrays. Create a table from the data input arguments and specify the variable names in the table. Pass the table to `predict`. For an example of this table workflow, see Generate Code to Classify Data in Table. For more information on using tables in code generation, see Code Generation for Tables (MATLAB Coder) and Table Limitations for Code Generation (MATLAB Coder).
Name-value arguments	Names in name-value arguments must be compile-time constants. If the value of `PredictionForMissingValue` is nonnumeric, then it must be a compile-time constant.

For more information, see Introduction to Code Generation.

Version History

Introduced in R2018a


R2023b: Specify predicted response value to use for observations with missing predictor values

Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the PredictionForMissingValue name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also choose to omit observations with missing predictor values.

This table lists the object functions that support the PredictionForMissingValue name-value argument. By default, the functions use the training set median as the predicted response value for observations with missing predictor values.

Model Type	Model Objects	Object Functions
Gaussian process regression (GPR) model	`RegressionGP`, `CompactRegressionGP`	`loss`, `predict`, `resubLoss`, `resubPredict`
Gaussian process regression (GPR) model	`RegressionPartitionedGP`	`kfoldLoss`, `kfoldPredict`
Gaussian kernel regression model	`RegressionKernel`	`loss`, `predict`
Gaussian kernel regression model	`RegressionPartitionedKernel`	`kfoldLoss`, `kfoldPredict`
Linear regression model	`RegressionLinear`	`loss`, `predict`
Linear regression model	`RegressionPartitionedLinear`	`kfoldLoss`, `kfoldPredict`
Neural network regression model	`RegressionNeuralNetwork`, `CompactRegressionNeuralNetwork`	`loss`, `predict`, `resubLoss`, `resubPredict`
Neural network regression model	`RegressionPartitionedNeuralNetwork`	`kfoldLoss`, `kfoldPredict`
Support vector machine (SVM) regression model	`RegressionSVM`, `CompactRegressionSVM`	`loss`, `predict`, `resubLoss`, `resubPredict`
Support vector machine (SVM) regression model	`RegressionPartitionedSVM`	`kfoldLoss`, `kfoldPredict`

In previous releases, the regression model loss and predict functions listed above used NaN predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.
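
The sketch below illustrates the loss-related part of this change for a RegressionKernel model. It assumes synthetic data and assumes that the loss function accepts "omit" as a value of PredictionForMissingValue, as described in the loss documentation. The names LDefault, LOmit, and LComplete are hypothetical.

```matlab
rng(1)                                   % For reproducibility
X = randn(200,3);                        % Synthetic training data (assumed)
Y = X(:,1) - 2*X(:,2) + 0.1*randn(200,1);
Mdl = fitrkernel(X,Y);

Xtest = randn(50,3);
Ytest = Xtest(:,1) - 2*Xtest(:,2) + 0.1*randn(50,1);
Xtest(1:5,2) = NaN;                      % Five test observations have missing values

% Default: the training-set median stands in for the unavailable predictions
LDefault = loss(Mdl,Xtest,Ytest);

% Omit observations with missing predictor values from the loss computation
LOmit = loss(Mdl,Xtest,Ytest,PredictionForMissingValue="omit");

% Reference: loss computed only on the complete observations
complete = ~any(ismissing(Xtest),2);
LComplete = loss(Mdl,Xtest(complete,:),Ytest(complete));
```

Under these assumptions, LOmit should match LComplete, while LDefault also includes the error of the median replacement on the five incomplete observations.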

R2023a: Generate C/C++ code for prediction

You can generate C/C++ code for the predict function.
