# fitrchains

## Syntax

## Description

`Mdl = fitrchains(Tbl,ResponseVarNames)` returns a trained multiresponse regression model `Mdl` by using regression chains. The function trains the model using the predictors in the table `Tbl` and the response values in the `ResponseVarNames` table variables. For more information, see Regression Chains.

`Mdl = fitrchains(___,Name=Value)` specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can specify the type of model to use in the regression chains by setting the `Learner` name-value argument.

## Examples

### Train Multiresponse Regression Model with Regression Chains

Create a regression model with more than one response variable by using `fitrchains`.

Load the `carbig` data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables `Displacement`, `Horsepower`, and so on, as well as the response variables `Acceleration` and `MPG`. Display the first eight rows of the table.

    load carbig
    cars = table(Displacement,Horsepower,Model_Year, ...
        Origin,Weight,Acceleration,MPG);
    head(cars)

    Displacement    Horsepower    Model_Year    Origin     Weight    Acceleration    MPG
    ____________    __________    __________    _______    ______    ____________    ___
        307            130            70        USA         3504         12          18
        350            165            70        USA         3693         11.5        15
        318            150            70        USA         3436         11          18
        304            150            70        USA         3433         12          16
        302            140            70        USA         3449         10.5        17
        429            198            70        USA         4341         10          15
        454            220            70        USA         4354          9          14
        440            215            70        USA         4312          8.5        14

Categorize the cars based on whether they were made in the USA.

    cars.Origin = categorical(cellstr(cars.Origin));
    cars.Origin = mergecats(cars.Origin,["France","Japan",...
        "Germany","Sweden","Italy","England"],"NotUSA");

Partition the data into training and test sets. Use approximately 85% of the observations to train a multiresponse model, and 15% of the observations to test the performance of the trained model on new data. Use `cvpartition` to partition the data.

    rng("default") % For reproducibility
    c = cvpartition(height(cars),"Holdout",0.15);
    carsTrain = cars(training(c),:);
    carsTest = cars(test(c),:);

Train a multiresponse regression model by passing the `carsTrain` training data to the `fitrchains` function. By default, the function uses bagged ensembles of trees in the regression chains.

    Mdl = fitrchains(carsTrain,["Acceleration","MPG"])

    Mdl = 
      RegressionChainEnsemble
               PredictorNames: {'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
                 ResponseName: ["Acceleration"    "MPG"]
        CategoricalPredictors: 4
            ResponseTransform: 'none'
              NumObservations: 346

`Mdl` is a trained `RegressionChainEnsemble` model object. You can use dot notation to access the properties of `Mdl`. For example, you can specify `Mdl.Learners` to see the bagged ensembles used to train the model.

Evaluate the performance of the regression model on the test set by computing the test mean squared error (MSE). Smaller MSE values indicate better performance. Return the loss for each response variable separately by setting the `OutputType` name-value argument to `"per-response"`.

    testMSE = loss(Mdl,carsTest,["Acceleration","MPG"], ...
        OutputType="per-response")

`testMSE = `*1×2*
2.4909 9.0154

Predict the response values for the observations in the test set. Return the predicted response values as a table.

`predictedY = predict(Mdl,carsTest,OutputType="table")`

`predictedY=`*60×2 table*
Acceleration MPG
____________ ______
11.847 16.124
10.625 13.991
11.142 12.963
15.106 21.015
12.227 13.764
13.264 14.154
17.129 30.216
16.379 29.004
13.374 14.188
11.3 13.055
13.482 13.274
15.006 20.903
16.481 24.615
12.429 15.31
15.699 19.329
12.095 13.274
⋮

### Specify Multiresponse Regression Model Properties

Train a multiresponse regression model using regression chains. Specify the type of regression models to use in the regression chains, and train the models with predicted values for response variables used as predictors.

Load the `carbig` data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables `Displacement`, `Horsepower`, and so on, as well as the response variables `Acceleration` and `MPG`. Display the first eight rows of the table.

    load carbig
    cars = table(Displacement,Horsepower,Model_Year, ...
        Origin,Weight,Acceleration,MPG);
    head(cars)

    Displacement    Horsepower    Model_Year    Origin     Weight    Acceleration    MPG
    ____________    __________    __________    _______    ______    ____________    ___
        307            130            70        USA         3504         12          18
        350            165            70        USA         3693         11.5        15
        318            150            70        USA         3436         11          18
        304            150            70        USA         3433         12          16
        302            140            70        USA         3449         10.5        17
        429            198            70        USA         4341         10          15
        454            220            70        USA         4354          9          14
        440            215            70        USA         4312          8.5        14

Categorize the cars based on whether they were made in the USA.

    cars.Origin = categorical(cellstr(cars.Origin));
    cars.Origin = mergecats(cars.Origin,["France","Japan",...
        "Germany","Sweden","Italy","England"],"NotUSA");

Remove observations with missing values.

cars = rmmissing(cars);

Train a multiresponse regression model by passing the `cars` data to the `fitrchains` function. Use regression chains composed of regression support vector machine (SVM) models with standardized numeric predictors. When training the SVM models, use the predicted values for the response variables that are treated as predictors.

    Mdl = fitrchains(cars,["Acceleration","MPG"], ...
        Learner=templateSVM(Standardize=true), ...
        ChainPredictedResponse=true);

`Mdl` is a trained `RegressionChainEnsemble` model object. You can use dot notation to access the properties of `Mdl`.

Display the order of the response variables in the regression chains in `Mdl`, and display the trained regression SVM models in the regression chains.

Mdl.ChainOrders

`ans = `*2×2*
1 2
2 1

Mdl.Learners

`ans=`*2×2 cell array*
{1x1 classreg.learning.regr.CompactRegressionSVM} {1x1 classreg.learning.regr.CompactRegressionSVM}
{1x1 classreg.learning.regr.CompactRegressionSVM} {1x1 classreg.learning.regr.CompactRegressionSVM}

In the first regression chain, the first SVM model uses `Acceleration` as the response variable. The second SVM model uses `MPG` as the response variable and the predicted values for `Acceleration` as a predictor variable. The first SVM model provides the predicted `Acceleration` values used by the second SVM model.

Recall that the SVM models use standardized numeric predictors. Find the means (`Mu`) and standard deviations (`Sigma`) used by the second model in the first regression chain.

    Chain1Model2 = Mdl.Learners{1,2};
    Mdl.PredictorNames

`ans = `*1x5 cell*
{'Displacement'} {'Horsepower'} {'Model_Year'} {'Origin'} {'Weight'}

Chain1Model2.ExpandedPredictorNames

`ans = `*1x7 cell*
{'x1'} {'x2'} {'x3'} {'x4 == 1'} {'x4 == 2'} {'x5'} {'x6'}

Chain1Model2.Mu

`ans = `*1×7*
10^3 ×
0.1944 0.1045 0.0760 0 0 2.9776 0.0153

Chain1Model2.Sigma

`ans = `*1×7*
104.6440 38.4912 3.6837 1.0000 1.0000 849.4026 2.2190

The SVM model uses five numeric predictors: `Displacement` (`x1`), `Horsepower` (`x2`), `Model_Year` (`x3`), `Weight` (`x5`), and the predicted values for `Acceleration` (`x6`). The software uses the corresponding `Mu` and `Sigma` values to standardize the predictor data before predicting with the `predict` object function.

The categorical predictor `Origin` is split into two variables (`x4 == 1` and `x4 == 2`) after categorical expansion. The corresponding `Mu` and `Sigma` values (0 and 1, respectively) indicate that the two variables are unchanged after standardization.
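The standardization described above can be reproduced by hand. The following is a minimal sketch, assuming the usual z-score formula `(x - Mu)./Sigma`; it uses the rounded `Mu` and `Sigma` values displayed above and a hypothetical expanded observation `x` that is not taken from the data set.

```matlab
% Mu and Sigma as displayed above for Chain1Model2 (rounded values).
Mu    = [194.4 104.5 76.0 0 0 2977.6 15.3];
Sigma = [104.6440 38.4912 3.6837 1 1 849.4026 2.2190];

% Hypothetical expanded observation (for illustration only), ordered as
% x1, x2, x3, x4 == 1, x4 == 2, x5, x6.
x = [307 130 70 1 0 3504 12];

% Z-score standardization, applied elementwise.
z = (x - Mu)./Sigma;

% The dummy variables in columns 4 and 5 pass through unchanged
% because their Mu is 0 and their Sigma is 1.
disp(z(4:5))
```

Because the dummy variables have a mean of 0 and a standard deviation of 1 in this scheme, only the numeric predictors are rescaled.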

## Input Arguments

`Tbl`

— Sample data

table

Sample data used to train the model, specified as a table. Each row of `Tbl` corresponds to one observation, and each column corresponds to one variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.

`Tbl` must contain columns for the response variables and can contain a column for the observation weights. Each response and observation weight variable must be a numeric vector.

You must specify the response variables in `Tbl` by using `ResponseVarNames` or `formula`, and specify the observation weights in `Tbl` by using `Weights`.

- When you specify the response variables by using `ResponseVarNames`, `fitrchains` uses the remaining variables as predictors. To use a subset of the remaining variables in `Tbl` as predictors, specify predictor variables by using `PredictorNames`.
- When you define a model specification by using `formula`, `fitrchains` uses a subset of the variables in `Tbl` as predictor variables and response variables, as specified in `formula`.

**Data Types:** `table`

`ResponseVarNames`

— Names of response variables

names of variables in `Tbl`

Names of the response variables, specified as the names of variables in `Tbl`. Each response variable must be a numeric vector.

You must specify `ResponseVarNames` as a string array or a cell array of character vectors. For example, if `Tbl` stores the response variables `Y1` and `Y2` as `Tbl.Y1` and `Tbl.Y2`, respectively, then specify `ResponseVarNames` as `["Y1","Y2"]`. Otherwise, the software treats the `Y1` and `Y2` columns of `Tbl` as predictors when training the model.

**Data Types:** `string` | `cell`

`formula`

— Explanatory model of response variables and subset of predictor variables

character vector | string scalar

Explanatory model of the response variables and a subset of the predictor variables, specified as a character vector or string scalar in the form `"Y1,Y2~x1+x2+x3"`. In this form, `Y1` and `Y2` represent the response variables, and `x1`, `x2`, and `x3` represent the predictor variables.

To specify a subset of variables in `Tbl` as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in `Tbl` that do not appear in `formula`, except for observation weights (if specified).

The variable names in the formula must be both variable names in `Tbl` (`Tbl.Properties.VariableNames`) and valid MATLAB® identifiers. You can verify the variable names in `Tbl` by using the `isvarname` function. If the variable names are not valid, then you can convert them by using the `matlab.lang.makeValidName` function.

**Data Types:** `char` | `string`
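The name check described above can be sketched with base MATLAB functions only. The variable names below are hypothetical stand-ins for the contents of `Tbl.Properties.VariableNames`; the sketch tests each name with `isvarname`, converts the names with `matlab.lang.makeValidName`, and then assembles a formula string in the `"Y1,Y2~x1+x2+x3"` form.

```matlab
% Hypothetical variable names, as they might appear in
% Tbl.Properties.VariableNames.
names = ["Y1" "Y2" "x1" "engine size" "2ndGear"];

% isvarname checks one name at a time, so apply it to each element.
isValid = arrayfun(@(s) isvarname(s), names);

% Convert all names to valid MATLAB identifiers.
validNames = matlab.lang.makeValidName(names);

% Build a formula with the first two names as responses and the
% remaining names as predictors.
formula = "Y1,Y2~" + join(validNames(3:end),"+");
disp(formula)
```

If you rename variables this way, the table itself must use the converted names (for example, by assigning them to `Tbl.Properties.VariableNames`) before you pass the formula to `fitrchains`.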

`Y`

— Response data

numeric matrix | numeric table

Response data, specified as a numeric matrix or table. Each row corresponds to an observation, and each column corresponds to a response variable. `Y` must have the same number of rows as the predictor data `X`.

**Data Types:** `single` | `double` | `table`

`X`

— Predictor data

numeric matrix | numeric table

Predictor data, specified as a numeric matrix or table. Each row corresponds to an observation, and each column corresponds to a predictor. Optionally, when `X` is a table, it can contain a column for the observation weights. `X` and `Y` must have the same number of rows.

- If `X` is a matrix, you can specify the names of the predictors in the order of their appearance in `X` by using the `PredictorNames` name-value argument.
- If `X` is a table, you can use a subset of the variables in `X` as predictors. To do so, specify predictor variables by using `PredictorNames`.

**Data Types:** `single` | `double`
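As a minimal sketch of the matrix inputs, the following code trains a model from a predictor matrix and a response matrix. The call form `fitrchains(X,Y,...)` is an assumption inferred from the `X` and `Y` argument descriptions, and the data, predictor names, and response names are synthetic and hypothetical.

```matlab
rng("default") % For reproducibility

% Synthetic data: 100 observations, 3 predictors, 2 responses.
X = rand(100,3);
Y = [X*[1;2;3], X*[3;2;1]] + 0.1*randn(100,2);

% Assumed call form: predictor matrix first, then response matrix.
% PredictorNames and ResponseName label the matrix columns.
Mdl = fitrchains(X,Y,Learner="tree", ...
    PredictorNames=["a" "b" "c"], ...
    ResponseName=["Y1" "Y2"]);

Mdl.PredictorNames
Mdl.ResponseName
```

The `"tree"` learner is chosen here only to keep training fast; any value listed for `Learner` could be used instead.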

**Note**

The software treats `NaN`, empty character vector (`''`), empty string (`""`), `<missing>`, and `<undefined>` elements as missing data. Before training `Mdl`, the software removes observations with missing values in the response data, although the model retains the observations in its data properties (for example, `Mdl.X` and `Mdl.Y`). The treatment of observations with missing values in the predictor data depends on the regression model type specified by the `Learner` name-value argument.
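To see which observations are affected, you can inspect the missing values yourself before training. The following sketch uses a small hypothetical table and base MATLAB functions (`ismissing` and `rmmissing`); it does not reproduce the internal behavior of `fitrchains`, it only separates rows with missing responses from rows with any missing value.

```matlab
% Hypothetical data: x1 is a predictor, Y1 and Y2 are responses.
x1 = [1; NaN; 3; 4];
Y1 = [5; 6; NaN; 8];
Y2 = [1; 2; 3; 4];
Tbl = table(x1,Y1,Y2);

% Rows with a missing value in any response variable.
responseMissing = any(ismissing(Tbl(:,["Y1" "Y2"])),2);

% Keep rows whose responses are complete (predictors can still be missing).
TblResp = Tbl(~responseMissing,:);

% Alternatively, remove every row that contains any missing value.
TblClean = rmmissing(Tbl);
```

Removing all incomplete rows with `rmmissing` makes the training set independent of how the chosen learner handles missing predictor values.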

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

**Example:** `fitrchains(Tbl,["Y1","Y2"],Learner="svm",ChainPredictedResponse=true)` creates a support vector machine (SVM) regression model with two response variables and uses predicted responses in the regression chains to train the model.

`ChainOrder`

— Order of response variables in regression chain

`[]` (default) | positive integer vector

Order of the response variables in the regression chain, specified as a positive integer vector. For more information, see Regression Chains.

If you specify `ChainOrder`, `Mdl` contains only one regression chain.

**Example:** `ChainOrder=[1 3 2]`

**Data Types:** `single` | `double`
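The following sketch shows one way to request a single chain with a fixed order. The table and its variable names are synthetic and hypothetical; the `ChainOrders` and `Learners` properties are the ones displayed in the examples above.

```matlab
rng("default") % For reproducibility

% Synthetic table with three predictors and two responses.
X = rand(100,3);
Y = [X*[1;2;3], X*[3;2;1]] + 0.1*randn(100,2);
Tbl = array2table([X Y],VariableNames=["x1" "x2" "x3" "Y1" "Y2"]);

% Train one chain that models Y2 first, then Y1.
Mdl = fitrchains(Tbl,["Y1" "Y2"],ChainOrder=[2 1],Learner="tree");

% Inspect the chain order and the models in the chain.
Mdl.ChainOrders
Mdl.Learners
```

The indices in `ChainOrder` refer to positions in the response variable list, so `[2 1]` places `Y2` before `Y1` here.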

`ChainPredictedResponse`

— Flag to use predicted responses in regression chains

`false` or `0` (default) | `true` or `1`

Flag to use predicted responses in the regression chains, specified as a numeric or logical `0` (`false`) or `1` (`true`).

- A value of `0` trains the models with the observed values of the response variables that are used as predictors.
- A value of `1` trains the models with the predicted values of the response variables that are used as predictors.

For more information, see Regression Chains.

**Example:** `ChainPredictedResponse=true`

**Data Types:** `single` | `double` | `logical`

`Learner`

— Type of regression model to train

`"bag"` (default) | `"gam"` | `"gp"` | `"kernel"` | `"linear"` | `"lsboost"` | `"svm"` | `"tree"` | template object

Type of regression model to train, specified as one of the values in this table.

Value | Regression Model Type |
---|---|
`"bag"` or `templateEnsemble` template (with the method specified as `"Bag"` and the weak learners specified as `"Tree"`) | Bagged ensemble of trees |
`"gam"` or `templateGAM` template | Generalized additive model (GAM) |
`"gp"` or `templateGP` template | Gaussian process regression (GPR) |
`"kernel"` or `templateKernel` template | Kernel model |
`"linear"` or `templateLinear` template | Linear model |
`"lsboost"` or `templateEnsemble` template (with the method specified as `"LSBoost"` and the weak learners specified as `"Tree"`) | Boosted ensemble of trees |
`"svm"` or `templateSVM` template | Support vector machine (SVM) |
`"tree"` or `templateTree` template | Decision tree |

**Example:** `Learner="svm"`

**Example:** `Learner=templateEnsemble("LSBoost",50,"Tree")`
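As a short sketch of the template form, the following code reuses the `carbig` data from the examples above and passes the boosted-ensemble template from the second example value. The choice of predictors is an assumption made to keep the sketch small.

```matlab
% Load the data and keep a few numeric variables without missing values.
load carbig
cars = rmmissing(table(Displacement,Horsepower,Weight,Acceleration,MPG));

% Template for boosted ensembles of 50 trees.
t = templateEnsemble("LSBoost",50,"Tree");

% Use the template as the learner in every regression chain.
Mdl = fitrchains(cars,["Acceleration","MPG"],Learner=t);

% One column of Mdl.Learners per response variable.
size(Mdl.Learners)
```

A template is useful when you want to change learner options; the string shortcuts such as `"lsboost"` select the corresponding model type without additional settings.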

`MaxNumChains`

— Maximum number of regression chains

`10` (default) | positive scalar

Maximum number of regression chains, specified as a positive scalar. Because each regression chain contains one regression model for each response variable, specify `MaxNumChains` to limit the total number of regression models to train.

**Example:** `MaxNumChains=5`

**Data Types:** `single` | `double`
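A quick calculation shows why this limit matters. The sketch below assumes a hypothetical problem with `k = 3` response variables and counts the distinct chain orders and the largest number of models that `MaxNumChains` allows. How many chains the function actually trains for a given `k` is not stated here, so treat the result as an upper bound only.

```matlab
k = 3;              % hypothetical number of response variables
maxNumChains = 10;  % default value of MaxNumChains

% Number of distinct orderings of k response variables.
numOrders = factorial(k);

% Each chain holds k models, so this bounds the number of trained models.
maxModels = maxNumChains*k;

% All possible chain orders, one per row.
allOrders = perms(1:k);
```

The number of orderings grows factorially with `k`, which is why capping the number of chains keeps training time manageable.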

`CategoricalPredictors`

— Categorical predictors list

vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | `"all"`

Categorical predictors list, specified as one of the values in this table.

Value | Description |
---|---|
Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and *p*, where *p* is the number of predictors used to train the model. |
Logical vector | A `true` entry means that the corresponding predictor is categorical. The length of the vector is *p*. |
Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the entries in `PredictorNames`. Pad the names with extra blanks so each row of the character matrix has the same length. |
String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the entries in `PredictorNames`. |
`"all"` | All predictors are categorical. |

By default, if the predictor data is in a table, `fitrchains` assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. However, learners that use decision trees assume that mathematically ordered categorical vectors are continuous variables. If the predictor data is a matrix, `fitrchains` assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the `CategoricalPredictors` name-value argument.

The software creates dummy variables based on the `Learner` name-value argument and the underlying fitting function used to create the regression models in the `Learners` property of `Mdl`. For more information on how fitting functions treat categorical predictors, see Automatic Creation of Dummy Variables.

**Example:** `CategoricalPredictors="all"`

**Data Types:** `single` | `double` | `logical` | `char` | `string` | `cell`

`Options`

— Options for computing in parallel and setting random streams

structure

Options for computing in parallel and setting random streams, specified as a structure. Create the `Options` structure using `statset`. This table lists the option fields and their values.

Field Name | Value | Default |
---|---|---|
`UseParallel` | Set this value to `true` to run computations in parallel. | `false` |
`UseSubstreams` | Set this value to `true` to run computations in a reproducible manner. To compute reproducibly, set `Streams` to a type that allows substreams: `"mlfg6331_64"` or `"mrg32k3a"`. | `false` |
`Streams` | Specify this value as a `RandStream` object or cell array of such objects. Use a single object except when the `UseParallel` value is `true` and the `UseSubstreams` value is `false`. In that case, use a cell array that has the same size as the parallel pool. | If you do not specify `Streams`, then `fitrchains` uses the default stream or streams. |

**Note**

You need Parallel Computing Toolbox™ to run computations in parallel.

**Example:** `Options=statset(UseParallel=true,UseSubstreams=true,Streams=RandStream("mlfg6331_64"))`

**Data Types:** `struct`

`PredictorNames`

— Predictor variable names

string array | cell array of character vectors

Predictor variable names, specified as a string array or a cell array of character vectors.

- If you supply predictor data using a numeric matrix, then you can use `PredictorNames` to assign names to the predictor variables.
  - The order of the names in `PredictorNames` must correspond to the order of the columns in the matrix.
  - By default, `PredictorNames` is `{'x1','x2',...}`.
- If you supply predictor data using a table, then you can use `PredictorNames` to specify which variables to use as predictors during training.
  - `PredictorNames` must be a subset of the variable names in the table and cannot include the names of response variables.
  - By default, `PredictorNames` contains the names of all predictor variables.

**Example:** `PredictorNames=["SepalLength","SepalWidth","PetalLength","PetalWidth"]`

**Data Types:** `string` | `cell`

`ResponseName`

— Response variable names

string array | cell array of character vectors

Response variable names, specified as a string array or a cell array of character vectors.

- If you supply `Y`, then you can use `ResponseName` to specify names for the response variables.
- If you supply `ResponseVarNames` or `formula`, then you cannot use `ResponseName`.

**Example:** `ResponseName=["Response1","Response2"]`

**Data Types:** `string` | `cell`

`Weights`

— Observation weights

nonnegative numeric vector | name of variable in `X` or `Tbl`

Observation weights, specified as a nonnegative numeric vector or the name of a variable in `X` or `Tbl`. The software weights each observation in `X` or `Tbl` with the corresponding value in `Weights`. The length of `Weights` must equal the number of observations in `X` or `Tbl`.

If you specify the input data as a table, then `Weights` can be the name of a variable in the table that contains a numeric vector. In this case, you must specify `Weights` as a character vector or string scalar. For example, if the weights vector `W` is stored as `Tbl.W`, then specify it as `"W"`. Otherwise, the software treats the `W` column of `Tbl` as a predictor during the training process.

By default, `Weights` is `ones(n,1)`, where `n` is the number of observations in `X` or `Tbl`.

Before training, `fitrchains` normalizes the weights to sum to 1.

**Data Types:** `single` | `double` | `char` | `string`
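The following sketch illustrates both points above: passing weights by variable name and the normalization to a unit sum. The table, its variable names, and the weight values are synthetic and hypothetical, and the normalization line only mirrors the stated rule; it is not the internal implementation.

```matlab
rng("default") % For reproducibility
n = 50;

% Synthetic table with two predictors, two responses, and a weight column W.
Tbl = table(rand(n,1),rand(n,1),rand(n,1),rand(n,1),randi(3,n,1), ...
    VariableNames=["x1" "x2" "Y1" "Y2" "W"]);

% Normalizing the weights to sum to 1, as described above.
wNorm = Tbl.W/sum(Tbl.W);

% Pass the weights by name so that W is not treated as a predictor.
Mdl = fitrchains(Tbl,["Y1" "Y2"],Weights="W",Learner="tree");

Mdl.PredictorNames
```

Because only the relative sizes of the weights matter after normalization, multiplying every weight by the same positive constant leaves the fit unchanged under this rule.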

## Output Arguments

`Mdl`

— Multiresponse regression model

`RegressionChainEnsemble`

model object

Multiresponse regression model, returned as a `RegressionChainEnsemble` model object. To access the properties of `Mdl`, use dot notation.

## Algorithms

### Regression Chains

A *regression chain* is a sequence of regression models in which
the response variables for previous models become predictor variables for subsequent models.
If the training data consists of *p* predictor variables and
*k* response variables, then a regression chain includes exactly
*k* models, each with a different response variable. The first model has
*p* predictors, the second model has *p*+1 predictors, and so on, with the last model having *p*+*k*–1 predictors.

For example, suppose that the predictor data in `X` or `Tbl` consists of three variables, *x1*, *x2*, and *x3*, and the response data in `Y` or `Tbl` consists of two variables, *y1* and *y2*. A regression chain with the chain order `[2 1]` (`ChainOrder`) consists of a model trained on the predictor data [*x1*, *x2*, *x3*] and the response variable *y2*, followed by a model trained on the predictor data [*x1*, *x2*, *x3*, *y2*] and the response variable *y1*.

If you choose to use predicted responses in the regression chains (`ChainPredictedResponse`), the predictor data for the second model is [*x1*, *x2*, *x3*, *yfit2*], where *yfit2* contains the predicted responses returned by the first model.

In general, `fitrchains` returns an ensemble of regression chains `Mdl`, where each row of `Mdl.Learners` corresponds to one regression chain.
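The chain structure described above can be sketched by hand. The following code is an illustration only, not the implementation used by `fitrchains`: it swaps in ordinary least squares (the backslash operator) for the learners, uses synthetic data, and builds a single chain with order `[2 1]` trained on observed responses. All variable names are hypothetical.

```matlab
rng("default") % For reproducibility
n = 200;

% Synthetic data: p = 3 predictors and k = 2 responses, where y1 depends on y2.
X  = randn(n,3);
y2 = X*[1; -2; 0.5] + 0.1*randn(n,1);
y1 = X*[0.5; 1; 1] + 2*y2 + 0.1*randn(n,1);

% Model 1 in the chain: y2 on [x1 x2 x3] (intercept plus p predictors).
b1 = [ones(n,1) X]\y2;

% Model 2 in the chain: y1 on [x1 x2 x3 y2] (intercept plus p+1 predictors).
b2 = [ones(n,1) X y2]\y1;

% Prediction for new data: y2 is unknown, so the second model
% receives the value predicted by the first model.
Xnew  = randn(5,3);
yfit2 = [ones(5,1) Xnew]*b1;
yfit1 = [ones(5,1) Xnew yfit2]*b2;
```

With linear learners, training the second model on fitted values of *y2* would add a column that is an exact linear combination of the other predictors, so this sketch trains on the observed responses and chains predictions only when scoring new data. Nonlinear learners such as SVMs or tree ensembles do not have this degeneracy.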

## References

[1] Spyromitros-Xioufis, Eleftherios,
Grigorios Tsoumakas, William Groves, and Ioannis Vlahavas. "Multi-Target Regression via Input
Space Expansion: Treating Targets as Inputs." *Machine
Learning* 104, no. 1 (July 2016): 55–98.
https://doi.org/10.1007/s10994-016-5546-z.

## Extended Capabilities

### Automatic Parallel Support

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, specify the `Options` name-value argument in the call to this function and set the `UseParallel` field of the options structure to `true` using `statset`:

`Options=statset(UseParallel=true)`

For more information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).

## Version History

**Introduced in R2024b**

## See Also

`RegressionChainEnsemble` | `CompactRegressionChainEnsemble` | `loss` | `predict`
