# Tobit

## Description

Create and analyze a `Tobit` model object to calculate loss given default (LGD) using this workflow:

1. Use `fitLGDModel` to create a `Tobit` model object.
2. Use `predict` to predict the LGD.
3. Use `modelDiscrimination` to return AUROC and ROC data. You can plot the results using `modelDiscriminationPlot`.
4. Use `modelCalibration` to return the R-squared, RMSE, correlation, and sample mean error of predicted and observed LGD data. You can plot the results using `modelCalibrationPlot`.

## Creation

### Description

`TobitLGDModel = fitLGDModel(___,Name,Value)` specifies options using one or more name-value pair arguments in addition to the input arguments in the previous syntax. The optional name-value pair arguments set the model object properties. For example,

```
lgdModel = fitLGDModel(data,'tobit','PredictorVars',{'LTV' 'Age' 'Type'},'ResponseVar','LGD','CensoringSide','left','LeftLimit',1e-4)
```

creates an `lgdModel` object using a `Tobit` model type.

### Input Arguments

`data` — Data for loss given default

table

Data for loss given default, specified as a table.

**Data Types:** `table`

`ModelType` — Model type

string with value `"Tobit"` | character vector with value `'Tobit'`

Model type, specified as a string with the value of `"Tobit"` or a character vector with the value of `'Tobit'`.

**Data Types:** `char` | `string`

**Name-Value Arguments**

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

*Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.*

**Example:**

```
lgdModel = fitLGDModel(data,'tobit','PredictorVars',{'LTV' 'Age' 'Type'},'ResponseVar','LGD','CensoringSide','left','LeftLimit',1e-4)
```

`ModelID` — User-defined model ID

`"Tobit"` (default) | string | character vector

User-defined model ID, specified as the comma-separated pair consisting of `'ModelID'` and a string or character vector. The software uses the `ModelID` text to format outputs, so the text is expected to be short.

**Data Types:** `string` | `char`

`Description` — User-defined description for model

`""` (default) | string | character vector

User-defined description for model, specified as the comma-separated pair consisting of `'Description'` and a string or character vector.

**Data Types:** `string` | `char`

`PredictorVars` — Predictor variables

all columns of `data` except for `ResponseVar` (default) | string array | cell array of character vectors

Predictor variables, specified as the comma-separated pair consisting of `'PredictorVars'` and a string array or cell array of character vectors. `PredictorVars` indicates which columns in the `data` input contain the predictor information. By default, `PredictorVars` is set to all the columns in the `data` input except for `ResponseVar`.

**Data Types:** `string` | `cell`

`ResponseVar` — Response variable

last column of `data` (default) | string | character vector

Response variable, specified as the comma-separated pair consisting of `'ResponseVar'` and a string or character vector. The response variable contains the LGD data and must be a numeric variable. An LGD value of `0` indicates no loss (full recovery), `1` indicates total loss (no recovery), and values between `0` and `1` indicate a partial loss. By default, `ResponseVar` is set to the last column.

**Data Types:** `string` | `char`

`CensoringSide` — Censoring side

`"both"` (default) | character vector with value of `'left'`, `'right'`, or `'both'` | string with value of `"left"`, `"right"`, or `"both"`

Censoring side, specified as the comma-separated pair consisting of `'CensoringSide'` and a character vector or string. `CensoringSide` indicates whether the desired Tobit model is left-censored, right-censored, or censored on both sides.

**Data Types:** `string` | `char`

`LeftLimit` — Left-censoring limit

`0` (default) | numeric between `0` and `1`

Left-censoring limit, specified as the comma-separated pair consisting of `'LeftLimit'` and a scalar numeric between `0` and `1`.

**Data Types:** `double`

`RightLimit` — Right-censoring limit

`1` (default) | numeric between `0` and `1`

Right-censoring limit, specified as the comma-separated pair consisting of `'RightLimit'` and a scalar numeric between `0` and `1`.

**Data Types:** `double`

`SolverOptions` — `optimoptions` object

object

Options for fitting, specified as the comma-separated pair consisting of `'SolverOptions'` and an `optimoptions` object that is created using `optimoptions` from Optimization Toolbox™. The defaults for the `optimoptions` object are:

- `"Display"` — `"none"`
- `"Algorithm"` — `"sqp"`
- `"MaxFunctionEvaluations"` — `500` × number of model coefficients
- `"MaxIterations"` — The number of Tobit model coefficients is determined at run time; it depends on the number of predictors and the number of categories in the categorical predictors.

**Note**

When using `optimoptions` with a Tobit model, specify the `SolverName` as `fmincon`.

**Data Types:** `object`
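As a minimal sketch of the note above, the following creates an `optimoptions` object for `fmincon` and passes it through `SolverOptions`. The `LGDData.mat` file and the predictor names come from the example on this page; the specific option values (iterative display, an iteration cap of 200) are illustrative assumptions, not recommended settings.

```matlab
% Load the sample LGD data used in the example on this page.
load LGDData.mat

% Solver options for fmincon. The values below are illustrative choices only.
options = optimoptions('fmincon', ...
    'Algorithm','sqp', ...
    'Display','iter', ...          % print solver progress at each iteration
    'MaxIterations',200);          % hypothetical cap on iterations

% Fit a left-censored Tobit model with the custom solver options.
lgdModel = fitLGDModel(data,'Tobit', ...
    'PredictorVars',{'LTV' 'Age' 'Type'}, ...
    'ResponseVar','LGD', ...
    'CensoringSide','left', ...
    'LeftLimit',1e-4, ...
    'SolverOptions',options);
disp(lgdModel)
```

Setting `'Display'` to `'iter'` can help when diagnosing a fit that appears not to converge.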

## Properties

`ModelID` — User-defined model ID

`"Tobit"` (default) | string

User-defined model ID, returned as a string.

**Data Types:** `string`

`Description` — User-defined description

`""` (default) | string

User-defined description, returned as a string.

**Data Types:** `string`

`UnderlyingModel` — Underlying statistical model

compact linear model

This property is read-only.

Underlying statistical model, returned as a compact linear model object. The compact version of the underlying regression model is an instance of the `classreg.regr.CompactLinearModel` class. For more information, see `fitlm` and `CompactLinearModel`.

**Data Types:** `CompactLinearModel`

`PredictorVars` — Predictor variables

all columns of `data` except for the `ResponseVar` (default) | string array

Predictor variables, returned as a string array.

**Data Types:** `string`

`ResponseVar` — Response variable

last column of `data` (default) | string

Response variable, returned as a string.

**Data Types:** `string`

`CensoringSide` — Censoring side

`"both"` (default) | string with value of `"left"`, `"right"`, or `"both"`

This property is read-only.

Censoring side, returned as a string.

**Data Types:** `string`

`LeftLimit` — Left-censoring limit

`0` (default) | numeric between `0` and `1`

This property is read-only.

Left-censoring limit, returned as a scalar numeric between `0` and `1`.

**Data Types:** `double`

`RightLimit` — Right-censoring limit

`1` (default) | numeric between `0` and `1`

This property is read-only.

Right-censoring limit, returned as a scalar numeric between `0` and `1`.

**Data Types:** `double`

## Object Functions

| Function | Description |
| --- | --- |
| `predict` | Predict loss given default |
| `modelDiscrimination` | Compute AUROC and ROC data |
| `modelDiscriminationPlot` | Plot ROC curve |
| `modelCalibration` | Compute R-squared, RMSE, correlation, and sample mean error of predicted and observed LGDs |
| `modelCalibrationPlot` | Scatter plot of predicted and observed LGDs |

## Examples

### Create Tobit LGD Model

This example shows how to use `fitLGDModel` to create a `Tobit` model for loss given default (LGD).

**Load LGD Data**

Load the LGD data.

```
load LGDData.mat
head(data)
```

```
      LTV        Age         Type           LGD
    _______    _______    ___________    _________

    0.89101    0.39716    residential     0.032659
    0.70176     2.0939    residential      0.43564
    0.72078     2.7948    residential    0.0064766
    0.37013      1.237    residential     0.007947
    0.36492     2.5818    residential            0
      0.796     1.5957    residential      0.14572
    0.60203     1.1599    residential     0.025688
    0.92005    0.50253    investment      0.063182
```

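Partition the data into training and test sets, holding out 40% of the observations for testing.

```
rng('default');                           % for reproducibility
NumObs = height(data);
c = cvpartition(NumObs,'HoldOut',0.4);    % 60% training, 40% test
TrainingInd = training(c);
TestInd = test(c);
```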

**Create Tobit LGD Model**

Use `fitLGDModel` to create a `Tobit` model using the `TrainingInd` data.

```
lgdModel = fitLGDModel(data(TrainingInd,:),'Tobit',...
    'ModelID','Example Tobit',...
    'PredictorVars',{'LTV' 'Age' 'Type'},...
    'ResponseVar','LGD',...
    'CensoringSide','left',...
    'LeftLimit',1e-4);
disp(lgdModel)
```

```
  Tobit with properties:

      CensoringSide: "left"
          LeftLimit: 1.0000e-04
         RightLimit: 1
            ModelID: "Example Tobit"
        Description: ""
    UnderlyingModel: [1x1 risk.internal.credit.TobitModel]
      PredictorVars: ["LTV"    "Age"    "Type"]
        ResponseVar: "LGD"
```

Display the underlying model. The underlying model is a left-censored Tobit model. Use the `'CensoringSide'` argument and the `'LeftLimit'` and `'RightLimit'` arguments to modify the underlying Tobit model.

```
disp(lgdModel.UnderlyingModel)
```

```
Tobit regression model, left-censored:
     LGD = max(0.0001,Y*)
     Y* ~ 1 + LTV + Age + Type

Estimated coefficients:
                       Estimate        SE         tStat       pValue
                       ________    _________    ______    __________

    (Intercept)        0.057356      0.02657    2.1587      0.030985
    LTV                  0.2003     0.030591    6.5475    7.3413e-11
    Age                -0.09405    0.0073019    -12.88             0
    Type_investment     0.10071     0.017916    5.6212      2.15e-08
    (Sigma)             0.28833    0.0055232    52.203             0

Number of observations: 2093
Number of left-censored observations: 547
Number of uncensored observations: 1546
Number of right-censored observations: 0
Log-likelihood: -638.353
```

**Predict LGD**

For Tobit models, use `predict` to calculate the predicted LGD value, which is the unconditional expected value of the response, given the predictor values.

```
predictedLGD = predict(lgdModel,data(TestInd,:))
```

```
predictedLGD = 1394×1

    0.0871
    0.1228
    0.3181
    0.0926
    0.1654
    0.2215
    0.2347
    0.0102
    0.1576
    0.1969
      ⋮
```

**Validate LGD Model**

Use `modelDiscriminationPlot` to plot the ROC curve.

```
modelDiscriminationPlot(lgdModel,data(TestInd,:))
```

Use `modelCalibrationPlot` to show a scatter plot of the predictions.

```
modelCalibrationPlot(lgdModel,data(TestInd,:))
```
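The plots have numeric counterparts in `modelDiscrimination` and `modelCalibration`. The following is a minimal, self-contained sketch that repeats the partition and fit from this example and then displays the returned measures. It assumes the single-output call form of both functions and does not assume any particular layout of the returned values.

```matlab
% Repeat the data preparation and fit from this example.
load LGDData.mat
rng('default');                                   % for reproducibility
c = cvpartition(height(data),'HoldOut',0.4);
TrainingInd = training(c);
TestInd = test(c);

lgdModel = fitLGDModel(data(TrainingInd,:),'Tobit', ...
    'PredictorVars',{'LTV' 'Age' 'Type'}, ...
    'ResponseVar','LGD', ...
    'CensoringSide','left', ...
    'LeftLimit',1e-4);

% Discrimination measure (AUROC) on the test data.
DiscMeasure = modelDiscrimination(lgdModel,data(TestInd,:));
disp(DiscMeasure)

% Calibration measures (R-squared, RMSE, correlation, sample mean error)
% on the test data.
CalMeasure = modelCalibration(lgdModel,data(TestInd,:));
disp(CalMeasure)
```

Evaluating on `data(TestInd,:)` rather than the training rows keeps the measures out-of-sample.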

## More About

### Loss Given Default Tobit Models

The loss given default (LGD) Tobit models fit a Tobit model to LGD data.

Tobit models are “censored” regression models. Tobit models assume that the response variable can be observed only within certain limits, and no value outside the limits can be observed. In the case of LGD models, the limits are typically 0 (total recovery or cure) and 1 (total loss). A distribution of response values where there is a high frequency of observations at the limits is consistent with the model assumptions. For LGD models, it is common to have distributions with a high proportion of cures, or high proportion of total losses, or both.

The Tobit model combines the following two formulas:

$$\begin{array}{l}Y=\mathrm{min}\left\{\mathrm{max}\left\{L,{Y}^{*}\right\},R\right\}\\ {Y}^{*}={\beta}_{0}+{\beta}_{1}{X}_{1}+\mathrm{...}+{\beta}_{p}{X}_{p}+\sigma \epsilon =X\beta +\sigma \epsilon \end{array}$$

where

- *Y* is the observed response variable, the observed LGD data for an LGD model.
- *L* is the left limit, the lower bound for the response values, typically `0` for LGD models.
- *R* is the right limit, the upper bound for the response values, typically `1` for LGD models.
- $$Y^{*}$$ is a latent, unobserved variable.
- $$\beta_{j}$$ is the coefficient of the *j*th predictor (or the intercept for *j* = `0`).
- σ is the standard deviation of the error term.
- ε is the error term, assumed to follow a standard normal distribution.

The first formula above is written using `min` and `max` operators and is equivalent to

$$Y=\left\{\begin{array}{ll}L & \text{if } {Y}^{*}\le L\\ {Y}^{*} & \text{if } L<{Y}^{*}<R\\ R & \text{if } {Y}^{*}\ge R\end{array}\right.$$

The standard deviation of the error is explicitly indicated in the formulas.
Unlike traditional regression least-squares estimation, where the standard deviation
of the error can be inferred from the residuals, for Tobit models the estimation is
via maximum likelihood and the standard deviation needs to be handled explicitly
during the estimation. If there are *p* predictor variables, the
Tobit model estimates *p*+2 coefficients, namely, one coefficient
for each predictor, plus an intercept, plus a standard deviation.

Three censoring side options are supported in the Tobit LGD models with the `CensoringSide` name-value argument:

- `'both'` — This is the default option, with censoring on both sides. The estimation uses left and right limits.
- `'left'` — The left-censored version of the model has no right limit (or *R* = ∞). The relationship between *Y* and $$Y^{*}$$ is $$Y=\max\{L,Y^{*}\}$$.
- `'right'` — The right-censored version of the model has no left limit (or *L* = -∞). The relationship between *Y* and $$Y^{*}$$ is $$Y=\min\{Y^{*},R\}$$.
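The censoring relationships above can be sketched numerically. In this minimal illustration, the latent values in `Ystar` are hand-picked hypothetical numbers (not output of a fitted model), and the limits are the typical LGD limits of 0 and 1. The sketch applies the `min`/`max` form, checks it against the piecewise form, and shows the two one-sided variants.

```matlab
% Censoring limits, typical for LGD models.
L = 0;
R = 1;

% Hypothetical latent values Y* (chosen by hand for illustration).
Ystar = [-0.3; 0; 0.25; 0.8; 1; 1.4];

% Two-sided censoring via the min/max form.
Y = min(max(L,Ystar),R);

% The same mapping written as the three piecewise cases.
Ypiecewise = Ystar;
Ypiecewise(Ystar <= L) = L;
Ypiecewise(Ystar >= R) = R;

% One-sided variants: no right limit, and no left limit.
Yleft  = max(L,Ystar);
Yright = min(Ystar,R);

% Columns: latent value, two-sided, left-censored, right-censored.
disp([Ystar Y Yleft Yright])
disp(isequal(Y,Ypiecewise))
```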

The parameters of the Tobit model are estimated using maximum likelihood. For
observation *i* = 1,…,*n*, the likelihood function is

$$LF(\beta ,\sigma |{X}_{i},{Y}_{i})=\left\{\begin{array}{ll}\Phi (L;{X}_{i}\beta ,\sigma ) & \text{if } {Y}_{i}\le L\\ \varphi ({Y}_{i};{X}_{i}\beta ,\sigma ) & \text{if } L<{Y}_{i}<R\\ 1-\Phi (R;{X}_{i}\beta ,\sigma ) & \text{if } {Y}_{i}\ge R\end{array}\right.$$

where

- $$\Phi$$(*x*; *m*, *s*) is the cumulative normal distribution with mean *m* and standard deviation *s*.
- $$\varphi$$(*x*; *m*, *s*) is the normal density function with mean *m* and standard deviation *s*.

This likelihood function is for models censored on both sides. For left-censored
models, the right limit has no effect, and the likelihood function has two cases
only (*R* = ∞); likewise for right-censored models
(*L* = -∞).

The log-likelihood function is the sum of the logarithm of the likelihood functions for individual observations:

$$LLF(\beta ,\sigma |X,Y)=\sum _{i=1}^{n}\mathrm{log}\left(LF(\beta ,\sigma |{X}_{i},{Y}_{i})\right)$$

The parameters are estimated by maximizing the log-likelihood function. The only constraint is that the σ parameter must be positive.
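The log-likelihood above can be evaluated directly. The following is a minimal sketch for the two-sided case, not the toolbox implementation: the toy data `X` and `Y` and the trial parameters `beta` and `sigma` are hypothetical, and the normal CDF and PDF are built from base functions so that no toolbox is needed. The right-censored term uses the identity 1 − Φ(z) = Φ(−z).

```matlab
% Standard normal CDF and PDF built from base functions.
Phi = @(z) 0.5*erfc(-z/sqrt(2));
phi = @(z) exp(-0.5*z.^2)/sqrt(2*pi);

% Hypothetical toy data: an intercept column plus one predictor.
X = [ones(5,1), [0.2; 0.5; 0.7; 0.9; 1.1]];
Y = [0; 0.1; 0.4; 1; 1];
L = 0;
R = 1;

% Classify each observation by censoring case.
isLeft  = Y <= L;
isRight = Y >= R;
isMid   = ~isLeft & ~isRight;

% Log-likelihood for a model censored on both sides.
% phi(z)/sigma with z = (Y - X*beta)/sigma is the N(X*beta,sigma) density.
tobitLLF = @(beta,sigma) ...
    sum(log(Phi((L - X(isLeft,:)*beta)/sigma))) + ...
    sum(log(phi((Y(isMid) - X(isMid,:)*beta)/sigma)/sigma)) + ...
    sum(log(Phi(-(R - X(isRight,:)*beta)/sigma)));

% Evaluate at hypothetical trial parameters (sigma must be positive).
beta  = [-0.2; 1.0];
sigma = 0.3;
llf = tobitLLF(beta,sigma);
fprintf('Log-likelihood at trial parameters: %.4f\n',llf);
```

Fitting amounts to searching for the `beta` and `sigma` that maximize this function, subject to `sigma` being positive.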

To predict an LGD value, Tobit LGD models return the unconditional expected value of the response, given the predictor values

$$LG{D}_{i}^{pred}=E\left[{Y}_{i}|{X}_{i}\right]$$

The expression for the expected value can be separated into the cases

$$\begin{array}{l}E\left[Y\right]=E\left[Y|Y=L\right]P(Y=L)\\ +E\left[Y|L<Y<R\right]P(L<Y<R)\\ +E\left[Y|Y=R\right]P(Y=R)\end{array}$$

Using the previous expression and the properties of the (truncated) normal distribution, it follows that

$$E\left[{Y}_{i}|{X}_{i}\right]=\Phi ({a}_{i})L+(\Phi ({b}_{i})-\Phi ({a}_{i}))({X}_{i}\beta +\sigma {\lambda}_{i})+(1-\Phi ({b}_{i}))R$$

where

$${a}_{i}=\frac{L-{X}_{i}\beta}{\sigma},\quad {b}_{i}=\frac{R-{X}_{i}\beta}{\sigma},\quad \text{and}\quad {\lambda}_{i}=\frac{\varphi ({a}_{i})-\varphi ({b}_{i})}{\Phi ({b}_{i})-\Phi ({a}_{i})}$$

This expression applies to the models censored on both sides. For models censored
on one side only, the corresponding expressions can be derived from here. For
example, for left-censored models, let the *R* limit in the
expression above go to infinity, and the resulting expression is

$$E\left[{Y}_{i}|{X}_{i}\right]=\Phi ({a}_{i})L+(1-\Phi ({a}_{i}))\left({X}_{i}\beta +\sigma \frac{\varphi ({a}_{i})}{1-\Phi ({a}_{i})}\right)$$

Similarly, for right-censored models, the *L* limit is decreased
to minus infinity to get

$$E\left[{Y}_{i}|{X}_{i}\right]=\Phi ({b}_{i})\left({X}_{i}\beta -\sigma \frac{\varphi ({b}_{i})}{\Phi ({b}_{i})}\right)+(1-\Phi ({b}_{i}))R$$
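The three expected-value expressions can be compared side by side. The following is a minimal sketch, not the toolbox implementation: the linear predictor values `xb` (standing in for the products of predictor rows and coefficients) and `sigma` are hypothetical. Each formula is written with the truncated-normal term multiplied out, which is algebraically identical to the expressions above and avoids dividing by a probability that may be close to zero.

```matlab
% Standard normal CDF and PDF built from base functions.
Phi = @(z) 0.5*erfc(-z/sqrt(2));
phi = @(z) exp(-0.5*z.^2)/sqrt(2*pi);

% Hypothetical linear predictor values X*beta and error standard deviation.
xb    = [-0.2; 0.1; 0.5; 0.9];
sigma = 0.3;
L = 0;
R = 1;

% Standardized limits a_i and b_i.
a = (L - xb)/sigma;
b = (R - xb)/sigma;

% Censored on both sides:
% (Phi(b)-Phi(a))*sigma*lambda simplifies to sigma*(phi(a)-phi(b)).
EYboth = Phi(a)*L + (Phi(b) - Phi(a)).*xb ...
    + sigma*(phi(a) - phi(b)) + Phi(-b)*R;

% Left-censored only (R goes to infinity); 1 - Phi(a) is written as Phi(-a).
EYleft = Phi(a)*L + Phi(-a).*xb + sigma*phi(a);

% Right-censored only (L goes to minus infinity).
EYright = Phi(b).*xb - sigma*phi(b) + Phi(-b)*R;

% Columns: linear predictor, two-sided, left-censored, right-censored.
disp([xb EYboth EYleft EYright])
```

Because removing the upper limit can only raise the response and removing the lower limit can only reduce it, the left-censored prediction is never below the two-sided one, and the right-censored prediction is never above it.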


## Version History

**Introduced in R2021a**

### R2023a: `modelAccuracy` object function is renamed to `modelCalibration` function

The `modelAccuracy` object function is renamed to the `modelCalibration` function. The use of `modelAccuracy` is discouraged; use `modelCalibration` instead.

### R2023a: `modelAccuracyPlot` object function is renamed to `modelCalibrationPlot` function

The `modelAccuracyPlot` object function is renamed to the `modelCalibrationPlot` function. The use of `modelAccuracyPlot` is discouraged; use `modelCalibrationPlot` instead.
