# cvloss

Regression error by cross-validation for regression tree model

## Syntax

`E = cvloss(tree)`

`[E,SE,Nleaf,BestLevel] = cvloss(tree)`

`E = cvloss(tree,Name=Value)`

`[E,SE,Nleaf,BestLevel] = cvloss(___)`

## Description


`E = cvloss(tree)` returns the cross-validated regression error (loss) `E` for the trained regression tree model `tree`.

`[E,SE,Nleaf,BestLevel] = cvloss(tree)` also returns the standard error of `E`, the number of leaf nodes of `tree`, and the optimal pruning level for `tree`.


`E = cvloss(tree,Name=Value)` specifies additional options using one or more name-value arguments. For example, you can specify the pruning level, tree size, and number of cross-validation samples.


`[E,SE,Nleaf,BestLevel] = cvloss(___)` also returns the standard error of `E`, the number of leaf nodes of `tree`, and the optimal pruning level for `tree`, using any of the input argument combinations in the previous syntaxes.

## Examples


### Compute Cross-Validation Error

Compute the cross-validation error for a default regression tree.

Load the `carsmall` data set. Consider `Displacement`, `Horsepower`, and `Weight` as predictors of the response `MPG`.

```
load carsmall
X = [Displacement Horsepower Weight];
```

Grow a regression tree using the entire data set.

`Mdl = fitrtree(X,MPG);`

Compute the cross-validation error.

```
rng(1); % For reproducibility
E = cvloss(Mdl)
```

```
E = 27.6976
```

`E` is the 10-fold cross-validation mean squared error, averaged over the folds with each fold weighted by its number of test observations.

### Find Best Pruning Level

Apply k-fold cross-validation to find the best level at which to prune a regression tree, considering all of its subtrees.

Load the `carsmall` data set. Consider `Displacement`, `Horsepower`, and `Weight` as predictors of the response `MPG`.

```
load carsmall
X = [Displacement Horsepower Weight];
```

Grow a regression tree using the entire data set. View the resulting tree.

```
Mdl = fitrtree(X,MPG);
view(Mdl,Mode="graph")
```

Compute the 5-fold cross-validation error for each subtree, excluding the two lowest pruning levels and the highest pruning level. Specify to return the best pruning level over all subtrees.

```
rng(1); % For reproducibility
m = max(Mdl.PruneList) - 1
```

```
m = 15
```

```
[~,~,~,bestLevel] = cvloss(Mdl,Subtrees=2:m,KFold=5)
```

```
bestLevel = 14
```

Of the `15` pruning levels, the best pruning level is `14`.

Prune the tree to the best level. View the resulting tree.

```
MdlPrune = prune(Mdl,Level=bestLevel);
view(MdlPrune,Mode="graph")
```

## Input Arguments


`tree`: Regression tree model, specified as a `RegressionTree` model object trained with `fitrtree`.

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `[E,SE,Nleaf,BestLevel] = cvloss(tree,KFold=5)` specifies to use 5 cross-validation samples.

`Subtrees`: Pruning level, specified as a vector of nonnegative integers in ascending order or `"all"`.

If you specify a vector, then all elements must be at least `0` and at most `max(tree.PruneList)`. `0` indicates the full, unpruned tree, and `max(tree.PruneList)` indicates the completely pruned tree (that is, just the root node).

If you specify `"all"`, then `cvloss` operates on all subtrees, meaning the entire pruning sequence. This specification is equivalent to using `0:max(tree.PruneList)`.

`cvloss` prunes `tree` to each level specified by `Subtrees`, and then estimates the corresponding output arguments. The size of `Subtrees` determines the size of some output arguments.

To use `Subtrees`, the `PruneList` and `PruneAlpha` properties of `tree` must be nonempty. In other words, grow `tree` by setting `Prune="on"` when you use `fitrtree`, or prune `tree` using `prune`.

Example: `Subtrees="all"`

Data Types: `single` | `double` | `char` | `string`
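For example, a minimal sketch (reusing the `carsmall` setup from the examples above) of how the size of `Subtrees` determines the size of the outputs:

```
load carsmall
X = [Displacement Horsepower Weight];
Mdl = fitrtree(X,MPG);           % fitrtree prunes by default, so PruneList is nonempty

% Subtrees="all" is equivalent to 0:max(Mdl.PruneList), so each output
% vector has one element per pruning level.
[E,SE,Nleaf] = cvloss(Mdl,Subtrees="all");
numel(E)                         % equals max(Mdl.PruneList) + 1
```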

`TreeSize`: Tree size, specified as one of these values:

• `"se"`: `cvloss` returns the best pruning level (`BestLevel`), which corresponds to the highest pruning level whose loss is within one standard error of the minimum (`L`+`se`, where `L` and `se` relate to the smallest value in `Subtrees`).

• `"min"`: `cvloss` returns the best pruning level, which corresponds to the element of `Subtrees` with the smallest loss. This element is usually the smallest element of `Subtrees`.

Example: `TreeSize="min"`

Data Types: `char` | `string`
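As a sketch of the difference between the two settings (reusing the tree grown in the examples above; the random seed is reset before each call so both calls see the same cross-validation folds):

```
load carsmall
X = [Displacement Horsepower Weight];
Mdl = fitrtree(X,MPG);

rng(1)
[~,~,~,levelSE] = cvloss(Mdl,Subtrees="all",TreeSize="se")    % one-standard-error rule
rng(1)
[~,~,~,levelMin] = cvloss(Mdl,Subtrees="all",TreeSize="min")  % minimum-loss level

% levelSE is at least levelMin in general: the one-standard-error rule
% favors a smaller (more heavily pruned) tree.
```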

`KFold`: Number of cross-validation samples, specified as a positive integer greater than 1.

Example: `KFold=8`

Data Types: `single` | `double`

## Output Arguments


`E`: Cross-validation mean squared error (loss), returned as a numeric vector of the same length as `Subtrees`.

`SE`: Standard error of `E`, returned as a numeric vector of the same length as `Subtrees`.

`Nleaf`: Number of leaf nodes in the pruned subtrees, returned as a numeric vector of the same length as `Subtrees`. Leaf nodes are terminal nodes, which give responses, not splits.

`BestLevel`: Best pruning level, returned as a numeric scalar whose value depends on `TreeSize`:

• When `TreeSize` is `"se"`, `cvloss` returns the highest pruning level whose loss is within one standard error of the minimum (`L`+`se`, where `L` and `se` relate to the smallest value in `Subtrees`).

• When `TreeSize` is `"min"`, `cvloss` returns the element of `Subtrees` with the smallest loss, usually the smallest element of `Subtrees`.

## Alternatives

You can create a cross-validated tree model using `crossval`, and call `kfoldLoss` instead of `cvloss`. If you are going to examine the cross-validated tree more than once, then the alternative can save time.

However, unlike `cvloss`, `kfoldLoss` does not return `SE`, `Nleaf`, or `BestLevel`.
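For instance, a sketch of the `crossval`/`kfoldLoss` alternative, assuming the same `carsmall` setup as in the examples above:

```
load carsmall
X = [Displacement Horsepower Weight];
Mdl = fitrtree(X,MPG);

rng(1) % For reproducibility
CVMdl = crossval(Mdl);   % 10-fold cross-validated model, reusable across calls
L = kfoldLoss(CVMdl)     % cross-validation MSE, comparable to E from cvloss
```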

## Version History

Introduced in R2011a