# quantileError

Quantile loss using bag of regression trees

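## Syntax

`err = quantileError(Mdl,X)`

`err = quantileError(Mdl,X,ResponseVarName)`

`err = quantileError(Mdl,X,Y)`

`err = quantileError(___,Name,Value)`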

## Description

`err = quantileError(Mdl,X)` returns half of the mean absolute deviation (MAD) from comparing the true responses in the table `X` to the predicted medians resulting from applying the bag of regression trees `Mdl` to the observations of the predictor data in `X`.

- `Mdl` must be a `TreeBagger` model object.
- The response variable in `X` must have the same name as the response variable in the table containing the training data.

`err = quantileError(Mdl,X,ResponseVarName)` uses the true response and predictor variables contained in the table `X`. `ResponseVarName` is the name of the response variable, and `Mdl.PredictorNames` contains the names of the predictor variables.

`err = quantileError(Mdl,X,Y)` uses the predictor data in the table or numeric matrix `X` and the true responses in the numeric vector `Y`.

`err = quantileError(___,Name,Value)` uses any of the previous syntaxes and additional options specified by one or more `Name,Value` pair arguments. For example, you can specify quantile probabilities, the error type, or which trees to include in the quantile regression error estimation.
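
The sketch below exercises each calling form on the `carsmall` sample data. It assumes the Statistics and Machine Learning Toolbox is installed. The ensemble size (50 trees), the choice of predictors, and the names `err1` through `err4` are illustrative, not part of the reference syntax.

```matlab
% Illustrative sketch: assumes carsmall and TreeBagger are available.
load carsmall
X = table(Displacement,Weight,MPG);   % two numeric predictors plus the response MPG

rng(1); % For reproducibility
Mdl = TreeBagger(50,X,'MPG','Method','regression');

% Response read from the table variable that has the training response name
err1 = quantileError(Mdl,X);

% Response variable named explicitly
err2 = quantileError(Mdl,X,'MPG');

% Numeric predictor matrix (same column order as Mdl.PredictorNames) plus true responses
err3 = quantileError(Mdl,[Displacement Weight],MPG);

% Any of the forms above followed by name-value arguments
err4 = quantileError(Mdl,X,'Quantile',[0.25 0.5 0.75]);
```

Because `err1` and `err2` read the same response variable, you would expect them to agree; `err4` holds one error per requested quantile probability.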

## Input Arguments

`Mdl` — Bag of regression trees

`TreeBagger` model object

Bag of regression trees, specified as a `TreeBagger` model object created by the `TreeBagger` function. The value of `Mdl.Method` must be `'regression'`.

`X` — Sample data

numeric matrix | table

Sample data used to estimate quantiles, specified as a numeric matrix or table.

Each row of `X` corresponds to one observation, and each column corresponds to one variable. If you specify `Y`, then the number of rows in `X` must be equal to the length of `Y`.

For a numeric matrix:

- The variables making up the columns of `X` must have the same order as the predictor variables that trained `Mdl` (stored in `Mdl.PredictorNames`).
- If you trained `Mdl` using a table (for example, `Tbl`), then `X` can be a numeric matrix if `Tbl` contains all numeric predictor variables. If `Tbl` contains heterogeneous predictor variables (for example, numeric and categorical data types), then `quantileError` throws an error.
- Specify `Y` for the true responses.

For a table:

- `quantileError` does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
- If you trained `Mdl` using a table (for example, `Tbl`), then all predictor variables in `X` must have the same variable names and data types as those variables that trained `Mdl` (stored in `Mdl.PredictorNames`). However, the column order of `X` does not need to correspond to the column order of `Tbl`. `Tbl` and `X` can contain additional variables (response variables, observation weights, etc.).
- If you trained `Mdl` using a numeric matrix, then the predictor names in `Mdl.PredictorNames` and corresponding predictor variable names in `X` must be the same. To specify predictor names during training, see the `PredictorNames` name-value pair argument of the `TreeBagger` function. All predictor variables in `X` must be numeric vectors. `X` can contain additional variables (response variables, observation weights, etc.).
- If `X` contains the response variable:
  - If the response variable has the same name as the response variable that trained `Mdl`, then you do not have to supply the response variable name or vector of true responses. `quantileError` uses that variable for the true responses by default.
  - You can specify `ResponseVarName` or `Y` for the true responses.

**Data Types:** `table` | `double` | `single`

`ResponseVarName` — Response variable name

character vector | string scalar

Response variable name, specified as a character vector or string scalar. `ResponseVarName` must be the name of the response variable in the table of sample data `X`.

If the table `X` contains the response variable, and it has the same name as the response variable used to train `Mdl`, then you do not have to specify `ResponseVarName`. `quantileError` uses that variable for the true responses by default.

**Data Types:** `char` | `string`

`Y` — True responses

numeric vector

True responses, specified as a numeric vector. The number of rows in `X` must be equal to the length of `Y`.

**Data Types:** `double` | `single`

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

*Before R2021a, use commas to separate each name and value, and enclose* `Name` *in quotes.*

`Mode` — Ensemble error type

`'ensemble'` (default) | `'cumulative'` | `'individual'`

Ensemble error type, specified as the comma-separated pair consisting of `'Mode'` and a value in this table. Suppose *tau* is the value of `Quantile`.

| Value | Description |
|---|---|
| `'cumulative'` | `err(j,k)` is the `tau(k)` quantile regression error using the learners in `Mdl.Trees(1:j)` only. |
| `'ensemble'` | `err(k)` is the ensemble `tau(k)` quantile regression error. |
| `'individual'` | `err(j,k)` is the `tau(k)` quantile regression error using the learner in `Mdl.Trees(j)` only. |

For `'cumulative'` and `'individual'`, if you include fewer trees in quantile estimation using `Trees` or `UseInstanceForTree`, then the number of rows in `err` decreases from `Mdl.NumTrees`.

**Example:** `'Mode','cumulative'`
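
To see how `Mode` changes the shape of `err`, the following sketch requests three quantile probabilities from a small ensemble and computes all three error types. It assumes the Statistics and Machine Learning Toolbox and the `carsmall` data set; the 20-tree ensemble and the probabilities in `tau` are arbitrary illustrative choices.

```matlab
% Illustrative sketch: compare the shape of err under each Mode value.
load carsmall
X = table(Displacement,Weight,MPG);

rng(1); % For reproducibility
Mdl = TreeBagger(20,X,'MPG','Method','regression');
tau = [0.25 0.5 0.75];

errEns = quantileError(Mdl,X,'Quantile',tau);                      % 1-by-3
errCum = quantileError(Mdl,X,'Quantile',tau,'Mode','cumulative'); % 20-by-3, row j uses trees 1:j
errInd = quantileError(Mdl,X,'Quantile',tau,'Mode','individual'); % 20-by-3, row j uses tree j only

disp(size(errEns)); disp(size(errCum)); disp(size(errInd))
```

In each case, columns follow the order of `tau`; only the number of rows depends on `Mode`.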

`Weights` — Observation weights

`ones(size(X,1),1)` (default) | numeric vector of positive values

Observation weights, specified as the comma-separated pair consisting of `'Weights'` and a numeric vector of positive values with length equal to `size(X,1)`. `quantileError` uses `Weights` to compute the weighted average of the deviations when estimating the quantile regression error.

By default, `quantileError` attributes a weight of `1` to each observation, which yields an unweighted average of the deviations.
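
Because the quantile regression error divides the weighted deviations by the sum of the weights (see the formula under More About), multiplying every weight by the same positive constant should leave `err` unchanged, whereas nonuniform weights pull the estimate toward the heavily weighted observations. This sketch checks the first property. It assumes the Statistics and Machine Learning Toolbox; the weight vectors, including the hypothetical rule that upweights heavier cars, are illustrative only.

```matlab
load carsmall
X = table(Displacement,Weight,MPG);

rng(1); % For reproducibility
Mdl = TreeBagger(50,X,'MPG','Method','regression');

n = size(X,1);
errDefault = quantileError(Mdl,X);                         % every weight is 1
errScaled  = quantileError(Mdl,X,'Weights',2*ones(n,1));   % every weight is 2

w = ones(n,1);
w(Weight > median(Weight)) = 2;                            % hypothetical: upweight heavier cars
errHeavy = quantileError(Mdl,X,'Weights',w);
```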

`Quantile` — Quantile probability

`0.5` (default) | numeric vector containing values in [0,1]

Quantile probability, specified as the comma-separated pair consisting of `'Quantile'` and a numeric vector containing values in the interval [0,1]. `quantileError` returns one quantile regression error for each probability in `Quantile`.

**Example:** `'Quantile',[0 0.25 0.5 0.75 1]`

**Data Types:** `single` | `double`

`Trees` — Indices of trees to use in response estimation

`'all'` (default) | numeric vector of positive integers

Indices of trees to use in response estimation, specified as the comma-separated pair consisting of `'Trees'` and `'all'` or a numeric vector of positive integers. Indices correspond to the cells of `Mdl.Trees`; each cell contains a tree in the ensemble. The maximum value of `Trees` must be less than or equal to the number of trees in the ensemble (`Mdl.NumTrees`).

For `'all'`, `quantileError` uses all trees in the ensemble (that is, the indices `1:Mdl.NumTrees`).

Values other than the default can affect the number of rows in `err`.

**Example:** `'Trees',[1 10 Mdl.NumTrees]`

**Data Types:** `char` | `string` | `single` | `double`
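
The effect of `Trees` on the number of rows in `err` is easiest to see in cumulative mode. The sketch below, which assumes the Statistics and Machine Learning Toolbox, keeps every other tree of a 50-tree ensemble; that subset is a hypothetical choice made only for illustration.

```matlab
load carsmall
X = table(Displacement,Weight,MPG);

rng(1); % For reproducibility
Mdl = TreeBagger(50,X,'MPG','Method','regression');

trees = 1:2:Mdl.NumTrees;   % hypothetical subset: trees 1, 3, ..., 49 (25 trees)

errCum = quantileError(Mdl,X,'Trees',trees,'Mode','cumulative');  % one row per selected tree
errEns = quantileError(Mdl,X,'Trees',trees);                      % ensemble of the subset only
```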

`TreeWeights` — Weights to attribute to responses from individual trees

`ones(Mdl.NumTrees,1)` (default) | numeric vector of nonnegative values

Weights to attribute to responses from individual trees, specified as the comma-separated pair consisting of `'TreeWeights'` and a numeric vector of `numel(trees)` nonnegative values. *trees* is the value of `Trees`.

If you specify `'Mode','individual'`, then `quantileError` ignores `TreeWeights`.

**Data Types:** `single` | `double`
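
As a sketch of supplying tree weights, the following code keeps the default `'Trees','all'`, so `TreeWeights` needs one entry per tree in the ensemble. It assumes the Statistics and Machine Learning Toolbox; the linearly increasing weights `tw` are a hypothetical choice, not a recommended weighting scheme.

```matlab
load carsmall
X = table(Displacement,Weight,MPG);

rng(1); % For reproducibility
Mdl = TreeBagger(50,X,'MPG','Method','regression');

% Hypothetical nonnegative weights, one per tree used (all trees here)
tw = linspace(0.5,1.5,Mdl.NumTrees)';

err = quantileError(Mdl,X,'TreeWeights',tw);
```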

`UseInstanceForTree` — Indicators specifying which trees to use to make predictions for each observation

`'all'` (default) | logical matrix

Indicators specifying which trees to use to make predictions for each observation, specified as the comma-separated pair consisting of `'UseInstanceForTree'` and an *n*-by-`Mdl.NumTrees` logical matrix. *n* is the number of observations (rows) in `X`. Rows of `UseInstanceForTree` correspond to observations, and columns correspond to learners in `Mdl.Trees`. `'all'` indicates to use all trees for all observations when estimating the quantiles.

If `UseInstanceForTree(j,k)` = `true`, then `quantileError` uses the tree in `Mdl.Trees(k)` when it predicts the response for the observation `X(j,:)`.

You can estimate quantiles using the response data in `Mdl.Y` directly instead of using the predictions from the random forest by specifying a row composed entirely of `false` values. For example, to estimate the quantile for observation *j* using the response data, and to use the predictions from the random forest for all other observations, specify this matrix:

```
UseInstanceForTree = true(size(X,1),Mdl.NumTrees);
UseInstanceForTree(j,:) = false(1,Mdl.NumTrees);
```

Values other than the default can affect the number of rows in `err`. Also, the value of `Trees` affects the value of `UseInstanceForTree`. Suppose that *U* is the value of `UseInstanceForTree`. `quantileError` ignores the columns of *U* corresponding to trees not being used in estimation from the specification of `Trees`. That is, `quantileError` resets the value of `'UseInstanceForTree'` to `U(:,trees)`, where *trees* is the value of `'Trees'`.

**Data Types:** `char` | `string` | `logical`
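
A complete version of the matrix construction described above might look like the following sketch. It assumes the Statistics and Machine Learning Toolbox; the observation index `j = 1` is a hypothetical choice, and `errAll` is computed only for comparison with the default behavior.

```matlab
load carsmall
X = table(Displacement,Weight,MPG);

rng(1); % For reproducibility
Mdl = TreeBagger(50,X,'MPG','Method','regression');

n = size(X,1);                      % number of observations (rows) in X
j = 1;                              % hypothetical observation to estimate from Mdl.Y directly
U = true(n,Mdl.NumTrees);           % n-by-Mdl.NumTrees: use every tree by default
U(j,:) = false(1,Mdl.NumTrees);     % row of all false: use the response data for observation j

errAll = quantileError(Mdl,X);                          % all trees for all observations
errMix = quantileError(Mdl,X,'UseInstanceForTree',U);   % observation j uses the response data
```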

## Output Arguments

`err` — Half of quantile regression error

numeric scalar | numeric matrix

Half of the quantile regression error, returned as a numeric scalar or *T*-by-`numel(tau)` matrix. *tau* is the value of `Quantile`. *T* depends on the values of `Mode`, `Trees`, `UseInstanceForTree`, and `Quantile`. Suppose that you specify `'Trees',trees` and you use the default value of `'UseInstanceForTree'`.

- For `'Mode','cumulative'`, `err` is a `numel(trees)`-by-`numel(tau)` numeric matrix. `err(j,k)` is the cumulative `tau(k)` quantile regression error using the learners in `Mdl.Trees(trees(1:j))`.
- For `'Mode','ensemble'`, `err` is a `1`-by-`numel(tau)` numeric vector. `err(k)` is the cumulative `tau(k)` quantile regression error using the learners in `Mdl.Trees(trees)`.
- For `'Mode','individual'`, `err` is a `numel(trees)`-by-`numel(tau)` numeric matrix. `err(j,k)` is the `tau(k)` quantile regression error using the learner in `Mdl.Trees(trees(j))`.

## Examples

### Estimate In-Sample Quantile Regression Error

Load the `carsmall` data set. Consider a model that predicts the fuel economy of a car given its engine displacement, weight, and number of cylinders. Treat `Cylinders` as a categorical variable.

```
load carsmall
Cylinders = categorical(Cylinders);
X = table(Displacement,Weight,Cylinders,MPG);
```

Train an ensemble of bagged regression trees using the entire data set. Specify 100 weak learners.

```
rng(1); % For reproducibility
Mdl = TreeBagger(100,X,'MPG','Method','regression');
```

`Mdl` is a `TreeBagger` ensemble.

Perform quantile regression, and estimate half the MAD of the entire ensemble using the predicted conditional medians.

```
err = quantileError(Mdl,X)
```

```
err = 1.2339
```

Because `X` is a table that contains the response variable under the same name used for training, you do not have to specify the response variable name or data. However, you can specify the response using this syntax.

```
err = quantileError(Mdl,X,'MPG')
```

```
err = 1.2339
```

### Find Appropriate Ensemble Size Using Quantile Regression Error

Load the `carsmall` data set. Consider a model that predicts the fuel economy of a car given its engine displacement, weight, and number of cylinders.

```
load carsmall
X = table(Displacement,Weight,Cylinders,MPG);
```

Randomly split the data into two sets: 75% training and 25% testing. Extract the subset indices.

```
rng(1); % For reproducibility
cvp = cvpartition(size(X,1),'Holdout',0.25);
idxTrn = training(cvp);
idxTest = test(cvp);
```

Train an ensemble of bagged regression trees using the training set. Specify 250 weak learners.

```
Mdl = TreeBagger(250,X(idxTrn,:),'MPG','Method','regression');
```

Estimate the cumulative 0.25, 0.5, and 0.75 quantile regression errors for the test set. Pass the predictor data in as a numeric matrix, and the response data in as a vector.

```
err = quantileError(Mdl,X{idxTest,1:3},MPG(idxTest),'Quantile',[0.25 0.5 0.75],...
    'Mode','cumulative');
```

`err` is a 250-by-3 matrix of cumulative quantile regression errors. Columns correspond to quantile probabilities, and rows correspond to trees in the ensemble. The errors are cumulative, so they incorporate aggregated predictions from previous trees. Although `Mdl` was trained using a table, you can supply a matrix of predictor data instead because all predictor variables in the table are numeric.

Plot the cumulative quantile regression errors on the same axes.

```
figure;
plot(err);
legend('0.25 quantile error','0.5 quantile error','0.75 quantile error');
ylabel('Quantile error');
xlabel('Tree index');
title('Cumulative Quantile Regression Error')
```

Training using about 60 trees appears to be enough for the first two quartiles, but the third quartile requires about 150 trees.

## More About

### Quantile Regression Error

The *quantile regression error* of
a model given observed predictor data and responses is the weighted
mean absolute deviation (MAD). If the model under-predicts the response,
then deviation weights are *τ*, the quantile
probability. If the model over-predicts, then deviation weights are
1 – *τ*.

That is, the *τ* quantile regression
error is

$${L}_{\tau}=\tau \frac{{\displaystyle \sum _{\{j:{y}_{j}\ge {\widehat{y}}_{\tau ,j}\}}{w}_{j}\left({y}_{j}-{\widehat{y}}_{\tau ,j}\right)}}{{\displaystyle \sum _{j=1}^{n}{w}_{j}}}+\left(1-\tau \right)\frac{{\displaystyle \sum _{\{j:{y}_{j}<{\widehat{y}}_{\tau ,j}\}}{w}_{j}}\left({\widehat{y}}_{\tau ,j}-{y}_{j}\right)}{{\displaystyle \sum _{j=1}^{n}{w}_{j}}}.$$

$$y_j$$ is true response *j*, $${\widehat{y}}_{\tau ,j}$$ is the *τ* quantile that the model predicts for observation *j*, $$w_j$$ is the weight of observation *j*, and *n* is the number of observations.
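
To make the formula concrete, the sketch below evaluates *L*<sub>τ</sub> directly from vectors of true responses, predicted *τ* quantiles, and observation weights. It uses the equivalent compact form max(*τr*, (*τ* - 1)*r*) with *r* = *y* - ŷ, which equals *τr* when the model under-predicts and (1 - *τ*)(ŷ - *y*) when it over-predicts. The anonymous function `quantileLoss` and the toy data are hypothetical and not part of the toolbox.

```matlab
% Hypothetical helper: weighted tau quantile regression error L_tau
% y: true responses, yhat: predicted tau quantiles, w: observation weights
quantileLoss = @(y,yhat,w,tau) ...
    sum(w .* max(tau*(y - yhat), (tau - 1)*(y - yhat))) / sum(w);

y    = [3; 5; 8];   % toy true responses
yhat = [4; 4; 4];   % toy predicted quantiles
w    = [1; 1; 1];   % unit weights

L50 = quantileLoss(y,yhat,w,0.5)    % equals half of mean(abs(y - yhat))
L25 = quantileLoss(y,yhat,w,0.25)   % over-predictions weighted by 0.75
```

With unit weights and *τ* = 0.5, the deviations 1, 1, and 4 give 0.5 × 6 / 3 = 1, which is half of the MAD of 2; this is why `quantileError` with the default `Quantile` reports half of the MAD.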

## Tips

- To tune the number of trees in the ensemble, set `'Mode','cumulative'` and plot the quantile regression errors with respect to tree indices. The maximal number of required trees is the tree index where the quantile regression error appears to level off.
- To investigate the performance of a model when the training sample is small, use `oobQuantileError` instead.
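
One way to turn "appears to level off" into a number is sketched below. It assumes the Statistics and Machine Learning Toolbox, and the stopping rule (the smallest tree index after which every cumulative error stays within 1% of the full-ensemble error) is a hypothetical heuristic, not a toolbox criterion; inspect the plot as well.

```matlab
load carsmall
X = table(Displacement,Weight,MPG);

rng(1); % For reproducibility
Mdl = TreeBagger(100,X,'MPG','Method','regression');

errCum = quantileError(Mdl,X,'Mode','cumulative');   % one row per tree index

% Hypothetical rule: last index outside a 1% band around the final error, plus one
tol = 0.01;
within = abs(errCum - errCum(end)) <= tol*abs(errCum(end));
lastOutside = find(~within,1,'last');
if isempty(lastOutside)
    nTrees = 1;
else
    nTrees = lastOutside + 1;
end

figure;
plot(errCum);
xlabel('Tree index');
ylabel('Cumulative quantile error');
```

The final entry is always inside the band, so `nTrees` never exceeds `Mdl.NumTrees`.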


## Version History

**Introduced in R2016b**

## See Also
