Residuals are useful for detecting outlying *y* values and for checking the linear regression assumptions about the error term in the regression model. High-leverage observations tend to have smaller residuals because they often shift the regression line or surface closer to them. You can also use residuals to detect some forms of heteroscedasticity and autocorrelation.

The `Residuals` property is an *n*-by-4 table containing four types of residuals, with one row for each observation.

Raw residuals are the observed minus the fitted values, that is,

$${r}_{i}={y}_{i}-\widehat{y}{}_{i}.$$

Pearson residuals are raw residuals divided by the root mean squared error, that is,

$$p{r}_{i}=\frac{{r}_{i}}{\sqrt{MSE}},$$

where *r*_{i} is the raw residual and *MSE* is the mean squared error.

Standardized residuals are raw residuals divided by their estimated standard deviation. The standardized residual for observation *i* is

$$s{t}_{i}=\frac{{r}_{i}}{\sqrt{MSE\left(1-{h}_{ii}\right)}},$$

where *MSE* is the mean squared error and *h*_{ii} is the leverage value for observation *i*.

Studentized residuals are the raw residuals divided by an independent estimate of the residual standard deviation. The residual for observation *i* is divided by an estimate of the error standard deviation based on all observations except for observation *i*.

$$s{r}_{i}=\frac{{r}_{i}}{\sqrt{MS{E}_{\left(i\right)}\left(1-{h}_{ii}\right)}},$$

where *MSE*_{(i)} is the mean squared error of the regression fit calculated by removing observation *i*, and *h*_{ii} is the leverage value for observation *i*. Under the model assumptions, the studentized residual *sr*_{i} has a *t*-distribution with *n* – *p* – 1 degrees of freedom, where *n* is the number of observations and *p* is the number of coefficients in the model.
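The four definitions above can be checked numerically with basic matrix operations. The following is a minimal sketch on synthetic data; the data, the variable names, and the use of a QR factorization to obtain the leverages are illustrative assumptions, not part of the original example. It also assumes the standard deletion identity *MSE*_{(i)} = (*dfe*·*MSE* − *r*_{i}²/(1 − *h*_{ii}))/(*dfe* − 1), with *dfe* = *n* – *p*, so that no refitting is needed.

```matlab
% Sketch: compute the four residual types from their definitions.
% Synthetic data; all names here are illustrative assumptions.
rng(0);                              % make the random draw reproducible
n = 50;
x = linspace(0,10,n)';
y = 2 + 0.5*x + randn(n,1);

X = [ones(n,1) x];                   % design matrix with an intercept column
p = size(X,2);                       % number of coefficients (incl. intercept)
b = X\y;                             % least-squares coefficient estimates
r = y - X*b;                         % raw residuals

dfe = n - p;                         % error degrees of freedom
MSE = sum(r.^2)/dfe;                 % mean squared error
[Q,~] = qr(X,0);                     % economy-size QR factorization
h = sum(Q.^2,2);                     % leverages h_ii (diagonal of hat matrix)

pr = r./sqrt(MSE);                   % Pearson residuals
st = r./sqrt(MSE*(1 - h));           % standardized residuals

% Leave-one-out MSE via the deletion identity (no refitting required)
MSEi = (dfe*MSE - r.^2./(1 - h))/(dfe - 1);
sr = r./sqrt(MSEi.*(1 - h));         % studentized residuals
```

For a model fitted to the same data with `fitlm`, these vectors would be expected to agree with the corresponding columns of the `Residuals` table up to rounding.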

After obtaining a fitted model, say, `mdl`, using `fitlm` or `stepwiselm`, you can:

- Find the `Residuals` table under the `mdl` object.
- Obtain any of these columns as a vector by indexing into the property using dot notation, for example, `mdl.Residuals.Raw`.
- Plot any of the residuals for the values fitted by your model using `plotResiduals(mdl)`. For details, see the `plotResiduals` method of the `LinearModel` class.
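As an illustration of the dot-notation access described above, here is a short sketch. It assumes Statistics and Machine Learning Toolbox is available for `fitlm`; the synthetic data and the cutoff of 2 are assumptions made for the example, not recommendations from this page.

```matlab
% Sketch: read residual columns from a fitted model.
% Requires fitlm; the synthetic data below is an illustrative assumption.
rng(1);
x = (1:40)';
y = 3 + 0.2*x + randn(40,1);
mdl = fitlm(x,y);

raw  = mdl.Residuals.Raw;            % n-by-1 vector of raw residuals
stud = mdl.Residuals.Studentized;    % n-by-1 vector of studentized residuals

% Flag observations whose studentized residual exceeds 2 in magnitude.
% The cutoff of 2 is a common rule of thumb, used here only as an example.
flagged = find(abs(stud) > 2);
```

Because each column is an ordinary numeric vector, you can pass it to any function that accepts numeric input.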

This example shows how to assess the model assumptions by examining the residuals of a fitted linear regression model.

Load the sample data and store the independent and response variables in a table.

load imports-85
tbl = table(X(:,7),X(:,8),X(:,9),X(:,15),'VariableNames',...
{'curb_weight','engine_size','bore','price'});

Fit a linear regression model.

mdl = fitlm(tbl)

mdl =
Linear regression model:
    price ~ 1 + curb_weight + engine_size + bore

Estimated Coefficients:
                    Estimate         SE          tStat       pValue
                   __________    _________    _______    __________
    (Intercept)        64.095        3.703     17.309    2.0481e-41
    curb_weight    -0.0086681    0.0011025    -7.8623      2.42e-13
    engine_size     -0.015806     0.013255    -1.1925       0.23452
    bore              -2.6998       1.3489    -2.0015      0.046711

Number of observations: 201, Error degrees of freedom: 197
Root Mean Squared Error: 3.95
R-squared: 0.674, Adjusted R-Squared: 0.669
F-statistic vs. constant model: 136, p-value = 1.14e-47

Plot the histogram of raw residuals.

plotResiduals(mdl)

The histogram shows that the residuals are slightly right skewed.

Plot the box plot of all four types of residuals.

Res = table2array(mdl.Residuals);
boxplot(Res)

You can see the right-skewed structure of the residuals in the box plot as well.

Plot the normal probability plot of the raw residuals.

`plotResiduals(mdl,'probability')`

This normal probability plot also shows the deviation from normality and the skewness on the right tail of the distribution of residuals.

Plot the residuals versus lagged residuals.

`plotResiduals(mdl,'lagged')`

This graph shows a trend, which indicates a possible correlation among the residuals. You can further check this using `dwtest(mdl)`. Serial correlation among residuals usually means that the model can be improved.
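To make the idea of serial correlation concrete, the following sketch computes the Durbin-Watson statistic by hand on synthetic data with autocorrelated errors. The data, the autoregressive coefficient 0.7, and the variable names are assumptions chosen for illustration; this is not the internal implementation of `dwtest`.

```matlab
% Sketch: Durbin-Watson statistic computed from raw residuals.
% Synthetic AR(1) errors; all values here are illustrative assumptions.
rng(2);
n = 200;
x = (1:n)'/n;
e = filter(1,[1 -0.7],randn(n,1));   % errors with positive serial correlation
y = 1 + 2*x + e;

X = [ones(n,1) x];
r = y - X*(X\y);                     % raw residuals from a least-squares fit

dw = sum(diff(r).^2)/sum(r.^2);      % Durbin-Watson statistic, between 0 and 4
rho1 = sum(r(2:end).*r(1:end-1))/sum(r.^2);   % lag-1 residual autocorrelation

% With a LinearModel object, [pval,DW] = dwtest(mdl) returns the statistic
% together with a p-value for the null hypothesis of no serial correlation.
```

Values of the statistic near 2 are consistent with uncorrelated residuals, while values well below 2 point to positive serial correlation.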

Plot the symmetry plot of residuals.

`plotResiduals(mdl,'symmetry')`

This plot also suggests that the residuals are not distributed symmetrically around their median, as would be expected for a normal distribution.

Plot the residuals versus the fitted values.

`plotResiduals(mdl,'fitted')`

The increase in the variance as the fitted values increase suggests possible heteroscedasticity.
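A rough numeric companion to this plot is the correlation between the absolute residuals and the fitted values. The sketch below uses synthetic data whose noise grows with the predictor; the data and the choice of this informal check are assumptions for illustration, and it is not a formal test for heteroscedasticity.

```matlab
% Sketch: informal check for residual spread that grows with the fit.
% Synthetic data; the construction below is an illustrative assumption.
rng(3);
n = 200;
x = linspace(1,10,n)';
y = 1 + 2*x + x.*randn(n,1);         % noise standard deviation grows with x

X = [ones(n,1) x];
yhat = X*(X\y);                      % fitted values
r = y - yhat;                        % raw residuals

C = corrcoef(yhat,abs(r));           % 2-by-2 correlation matrix
c = C(1,2);                          % correlation of |residual| with fitted value
```

A clearly positive value of `c` is consistent with the fan shape seen in a residuals-versus-fitted plot, whereas a value near zero gives no indication of changing variance.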


See also: `dwtest` | `fitlm` | `LinearModel` | `plotDiagnostics` | `plotResiduals` | `stepwiselm`