
predict

Predict responses using multiresponse regression model

Since R2024b

    Description

    predictedY = predict(Mdl,X) returns the predicted responses for the predictor data in the matrix or table X using the trained multiresponse regression model Mdl. For more information, see Prediction with Regression Chain Ensembles.

    By default, predict returns the predicted responses as a matrix. That is, this syntax is equivalent to predict(Mdl,X,OutputType="matrix").

    predictedY = predict(Mdl,X,OutputType="table") returns the predicted responses as a table.
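The two output types hold the same values and differ only in container. The following sketch checks this by comparing the default matrix output with the table output converted by table2array. It is an illustration rather than part of the documented example. It assumes the fitrchains(X,Y) syntax with a numeric predictor matrix and a numeric response matrix, and it keeps only complete observations for simplicity.

```matlab
% Compare the default matrix output of predict with its table output.
% Assumption: fitrchains accepts a numeric predictor matrix X and a
% numeric response matrix Y with one column per response.
load carbig
X = [Displacement Horsepower Weight];   % numeric predictors
Y = [Acceleration MPG];                   % two response variables
ok = all(~isnan([X Y]),2);                % keep complete observations only
X = X(ok,:);
Y = Y(ok,:);

rng("default")                            % bagging uses random resampling
Mdl = fitrchains(X,Y);

Ymat = predict(Mdl,X(1:5,:));                     % default: numeric matrix
Ytbl = predict(Mdl,X(1:5,:),OutputType="table");  % same values as a table
sameValues = isequal(Ymat,table2array(Ytbl))
```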


    Examples


    Create a regression model with more than one response variable by using fitrchains.

    Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Displacement, Horsepower, and so on, as well as the response variables Acceleration and MPG. Display the first eight rows of the table.

    load carbig
    cars = table(Displacement,Horsepower,Model_Year, ...
        Origin,Weight,Acceleration,MPG);
    head(cars)
        Displacement    Horsepower    Model_Year    Origin     Weight    Acceleration    MPG
        ____________    __________    __________    _______    ______    ____________    ___
    
            307            130            70        USA         3504           12        18 
            350            165            70        USA         3693         11.5        15 
            318            150            70        USA         3436           11        18 
            304            150            70        USA         3433           12        16 
            302            140            70        USA         3449         10.5        17 
            429            198            70        USA         4341           10        15 
            454            220            70        USA         4354            9        14 
            440            215            70        USA         4312          8.5        14 
    

    Categorize the cars based on whether they were made in the USA.

    cars.Origin = categorical(cellstr(cars.Origin));
    cars.Origin = mergecats(cars.Origin,["France","Japan",...
        "Germany","Sweden","Italy","England"],"NotUSA");

    Partition the data into training and test sets. Use approximately 85% of the observations to train a multiresponse model, and 15% of the observations to test the performance of the trained model on new data. Use cvpartition to partition the data.

    rng("default") % For reproducibility
    c = cvpartition(height(cars),"Holdout",0.15);
    carsTrain = cars(training(c),:);
    carsTest = cars(test(c),:);
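The split is only approximately 85/15 because the holdout set must contain a whole number of observations. This sketch recreates a holdout partition of the same size and checks that every observation lands in exactly one of the two sets. It uses the TrainSize and TestSize properties of the cvpartition object.

```matlab
% Check the sizes of a holdout partition like the one created above.
load carbig
n = numel(MPG);                         % number of cars in carbig
rng("default")
c = cvpartition(n,"Holdout",0.15);

% Every observation lands in exactly one of the two sets.
fprintf("Training: %d, test: %d, total: %d\n", ...
    c.TrainSize,c.TestSize,n)
testFraction = c.TestSize/n             % close to, but not exactly, 0.15
```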

    Train a multiresponse regression model by passing the carsTrain training data to the fitrchains function. By default, the function uses bagged ensembles of trees in the regression chains.

    Mdl = fitrchains(carsTrain,["Acceleration","MPG"])
    Mdl = 
      RegressionChainEnsemble
               PredictorNames: {'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
                 ResponseName: ["Acceleration"    "MPG"]
        CategoricalPredictors: 4
            ResponseTransform: 'none'
              NumObservations: 346
    
    
    

    Mdl is a trained RegressionChainEnsemble model object. You can use dot notation to access the properties of Mdl. For example, enter Mdl.Learners to see the bagged ensembles that make up the regression chains.
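The chain structure stored in a trained model can be inspected in the same way. This sketch trains a model on numeric columns of carbig and reports the size of Mdl.Learners and the contents of Mdl.ChainOrders. It assumes the fitrchains(X,Y) matrix syntax, and it assumes that ChainOrders stores one row of response indices per chain, as described under Prediction with Regression Chain Ensembles.

```matlab
% Inspect the regression chains stored in a trained model.
% Assumption: fitrchains(X,Y) matrix syntax; complete observations only.
load carbig
X = [Displacement Horsepower Weight];
Y = [Acceleration MPG];
ok = all(~isnan([X Y]),2);
rng("default")
Mdl = fitrchains(X(ok,:),Y(ok,:));

[numChains,numResponses] = size(Mdl.Learners)  % one row per chain
orders = Mdl.ChainOrders                        % response order in each chain
firstLearnerClass = class(Mdl.Learners{1,1})    % compact model for one link
```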

    Evaluate the performance of the regression model on the test set by computing the test mean squared error (MSE). Smaller MSE values indicate better performance. Return the loss for each response variable separately by setting the OutputType name-value argument to "per-response".

    testMSE = loss(Mdl,carsTest,["Acceleration","MPG"], ...
        OutputType="per-response")
    testMSE = 1×2
    
        2.4909    9.0154
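Because the default loss is the mean squared error, the per-response values can also be recomputed from predict. This sketch does that on a holdout set and compares the result with loss. It assumes that loss accepts numeric X and Y, that all observations have equal weights, and that no response values are missing (incomplete observations are removed first).

```matlab
% Recompute per-response test MSE from predict and compare with loss.
% Assumptions: fitrchains/loss accept numeric X and Y; default MSE loss with
% equal observation weights; complete observations only (no NaN responses).
load carbig
X = [Displacement Horsepower Weight];
Y = [Acceleration MPG];
ok = all(~isnan([X Y]),2);
X = X(ok,:);
Y = Y(ok,:);

rng("default")
c = cvpartition(size(X,1),"Holdout",0.15);
Mdl = fitrchains(X(training(c),:),Y(training(c),:));

Xtest = X(test(c),:);
Ytest = Y(test(c),:);
lossMSE = loss(Mdl,Xtest,Ytest,OutputType="per-response")
manualMSE = mean((Ytest - predict(Mdl,Xtest)).^2,1)   % column-wise mean
```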
    
    

    Predict the response values for the observations in the test set. Return the predicted response values as a table.

    predictedY = predict(Mdl,carsTest,OutputType="table")
    predictedY=60×2 table
        Acceleration     MPG  
        ____________    ______
    
           11.847       16.124
           10.625       13.991
           11.142       12.963
           15.106       21.015
           12.227       13.764
           13.264       14.154
           17.129       30.216
           16.379       29.004
           13.374       14.188
             11.3       13.055
           13.482       13.274
           15.006       20.903
           16.481       24.615
           12.429        15.31
           15.699       19.329
           12.095       13.274
          ⋮
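The table output names its variables after the response variables, so you can extract a single response by name with dot notation. This sketch is a simplified version of the example above that uses numeric predictors only.

```matlab
% Work with the table returned by predict(...,OutputType="table").
% Simplified version of the example above: numeric predictors only.
load carbig
cars = table(Displacement,Horsepower,Weight,Acceleration,MPG);
rng("default")
Mdl = fitrchains(cars,["Acceleration","MPG"]);

predictedY = predict(Mdl,cars(1:5,:),OutputType="table");
names = predictedY.Properties.VariableNames   % one variable per response
predictedMPG = predictedY.MPG                 % access one response by name
```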
    
    

    Input Arguments


    Mdl: Multiresponse regression model, specified as a RegressionChainEnsemble or CompactRegressionChainEnsemble object.
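Either object type can be passed to predict. This sketch assumes that calling compact on a trained RegressionChainEnsemble returns a CompactRegressionChainEnsemble without the training data. It checks that both objects return the same predictions.

```matlab
% A compact model gives the same predictions as the full model.
% Assumption: compact(Mdl) returns a CompactRegressionChainEnsemble.
load carbig
X = [Displacement Horsepower Weight];
Y = [Acceleration MPG];
ok = all(~isnan([X Y]),2);
rng("default")
Mdl = fitrchains(X(ok,:),Y(ok,:));

CMdl = compact(Mdl);            % drops the training data
className = class(CMdl)
samePredictions = isequal(predict(Mdl,X(1:5,:)),predict(CMdl,X(1:5,:)))
```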

    X: Predictor data, specified as a numeric matrix or a table. Each row of X corresponds to one observation, and each column corresponds to one predictor. X must have the same data type as the predictor data used to train Mdl and must contain the same predictors.

    Data Types: single | double | table

    Output Arguments


    predictedY: Predicted responses, returned as a numeric matrix or table. For observation i in X and response variable j, predictedY(i,j) is the mean of the predicted values for response j across the regression chains.

    For more information, see Prediction with Regression Chain Ensembles.

    Algorithms


    Prediction with Regression Chain Ensembles

    A regression chain is a sequence of regression models in which the response variables for previous models become predictor variables for subsequent models [1]. If the training data consists of p predictor variables and k response variables, then a regression chain includes exactly k models, each with a different response variable. The first model has p predictors, the second model has p+1 predictors, and so on, with the last model having p+k-1 predictors.
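The predictor counts in the paragraph above can be checked on a trained model. With p = 3 numeric predictors and k = 2 responses, the two links of a chain should see 3 and 4 predictors. This sketch assumes the fitrchains(X,Y) matrix syntax and assumes that each compact learner exposes a PredictorNames property listing the predictors it uses.

```matlab
% Check the number of predictors seen by each model in a chain.
% Assumptions: fitrchains(X,Y) matrix syntax; each learner exposes
% a PredictorNames property listing its predictors.
load carbig
X = [Displacement Horsepower Weight];   % p = 3 predictors
Y = [Acceleration MPG];                   % k = 2 responses
ok = all(~isnan([X Y]),2);
rng("default")
Mdl = fitrchains(X(ok,:),Y(ok,:));

p = size(X,2);
k = size(Y,2);
numPredictors = zeros(1,k);
for j = 1:k
    numPredictors(j) = numel(Mdl.Learners{1,j}.PredictorNames);
end
numPredictors     % expected p, p+1, ..., p+k-1
```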

    Mdl is a regression chain ensemble, where each row of Mdl.Learners corresponds to one regression chain. Each entry in Mdl.Learners is a compact regression model object. Each model produces predictions for one response variable. For example, for regression chain i:

    • The first model (Mdl.Learners{i,1}) uses the predict object function of the compact object to predict values for response variable Mdl.ChainOrders(i,1) by using the predictor data in X.

    • The second model (Mdl.Learners{i,2}) uses the predict object function to predict values for response variable Mdl.ChainOrders(i,2) by using the predictor data in X and the predicted response values returned by Mdl.Learners{i,1}.

    • The process repeats for each model in the regression chain, so that each response variable has a set of predicted response values.

    After each regression chain produces predicted responses, the software averages the results to return predictedY.
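The following sketch illustrates the prediction procedure above from scratch. It does not reproduce what fitrchains does internally. It swaps in ordinary least squares (the backslash operator) for the bagged tree ensembles and uses synthetic data with two hypothetical chain orders. During training, each link uses the observed values of earlier responses as extra predictors, which is one common choice. During prediction, each link uses the values predicted by earlier links, as described above, and the chains are then averaged.

```matlab
% Regression chain ensemble sketch with ordinary least squares learners.
% Assumptions: synthetic data, hypothetical chain orders, and OLS instead of
% the bagged tree ensembles that fitrchains uses by default. During training,
% each link uses the observed values of earlier responses (one common choice).
rng(0)
n = 200;  p = 3;  k = 2;
Xtrain = randn(n,p);
B = [1 0.5; 2 -1; -1 2];                 % true coefficients, one column per response
Ytrain = Xtrain*B + 0.1*randn(n,k);
Xtest = randn(10,p);

chainOrders = [1 2; 2 1];                % hypothetical: one row per chain
numChains = size(chainOrders,1);
m = size(Xtest,1);
perChain = zeros(m,k,numChains);         % predictions from each chain

for i = 1:numChains
    Atrain = [ones(n,1) Xtrain];         % intercept + p original predictors
    Atest = [ones(m,1) Xtest];
    for j = 1:k
        r = chainOrders(i,j);            % response predicted by link j
        beta = Atrain \ Ytrain(:,r);     % fit link j on p+j-1 predictors
        yhat = Atest*beta;               % predict response r
        perChain(:,r,i) = yhat;
        Atrain = [Atrain Ytrain(:,r)];   % training: observed response becomes a predictor
        Atest = [Atest yhat];            % prediction: predicted response becomes a predictor
    end
end

predictedY = mean(perChain,3);           % average over chains
```

Averaging over chains with different orders reduces the dependence of the final prediction on any single ordering of the response variables.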

    References

    [1] Spyromitros-Xioufis, Eleftherios, Grigorios Tsoumakas, William Groves, and Ioannis Vlahavas. "Multi-Target Regression via Input Space Expansion: Treating Targets as Inputs." Machine Learning 104, no. 1 (July 2016): 55–98. https://doi.org/10.1007/s10994-016-5546-z.

    Version History

    Introduced in R2024b