# devianceTest

Analysis of deviance for generalized linear regression model

## Syntax

``tbl = devianceTest(mdl)``

## Description


`tbl = devianceTest(mdl)` returns an analysis of deviance table for the generalized linear regression model `mdl`. The table `tbl` gives the result of a test that determines whether the model `mdl` fits significantly better than a constant model.

## Examples


Perform a deviance test on a generalized linear regression model.

Generate sample data using Poisson random numbers with two underlying predictors `X(:,1)` and `X(:,2)`.

```
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1),rndvars(:,2)];
mu = exp(1 + X*[1;2]);
y = poissrnd(mu);
```

Create a generalized linear regression model of Poisson data.

`mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson')`
```
mdl = 
Generalized linear regression model:
    log(y) ~ 1 + x1 + x2
    Distribution = Poisson

Estimated Coefficients:
                   Estimate       SE        tStat     pValue
                   ________    _________    ______    ______

    (Intercept)     1.0405      0.022122    47.034      0
    x1              0.9968      0.003362    296.49      0
    x2               1.987     0.0063433    313.24      0

100 observations, 97 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 2.95e+05, p-value = 0
```

Test whether the model differs from a constant in a statistically significant way.

`tbl = devianceTest(mdl)`
```
tbl=2×4 table
                             Deviance     DFE     chi2Stat     pValue
                            __________    ___    __________    ______

    log(y) ~ 1              2.9544e+05    99
    log(y) ~ 1 + x1 + x2         107.4    97     2.9533e+05      0
```

The small p-value indicates that the model significantly differs from a constant. Note that the model display of `mdl` includes the statistics shown in the second row of the table.
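The following is a minimal sketch of how you might read the test result programmatically instead of from the display. It assumes Statistics and Machine Learning Toolbox is available. The 0.05 significance level and the variable names `pFull`, `isSignificant`, `sameDeviance`, and `sameDFE` are illustrative choices, not part of the function's interface. The `double` call is a defensive conversion in case the statistic columns are stored in a display-oriented numeric type.

```matlab
% Illustrative sketch: extract the test result from the deviance table.
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1), rndvars(:,2)];
y = poissrnd(exp(1 + X*[1;2]));
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');
tbl = devianceTest(mdl);

% The second row of tbl describes the fitted model mdl.
pFull = double(tbl.pValue(2));
isSignificant = pFull < 0.05;   % 0.05 is an assumed significance level

% The second-row deviance and DFE should agree with the model properties.
sameDeviance = abs(tbl.Deviance(2) - mdl.Deviance) <= 1e-8*abs(mdl.Deviance);
sameDFE = tbl.DFE(2) == mdl.DFE;
```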

## Input Arguments


Generalized linear regression model, specified as a `GeneralizedLinearModel` object created using `fitglm` or `stepwiseglm`, or a `CompactGeneralizedLinearModel` object created using `compact`.

## Output Arguments


Analysis of deviance summary statistics, returned as a table.

`tbl` contains analysis of deviance statistics for both a constant model and the model `mdl`. The table includes these columns for each model.

| Column | Description |
| --- | --- |
| `Deviance` | Deviance is twice the difference between the loglikelihoods of the corresponding model (`mdl` or constant) and the saturated model. For more information, see Deviance. |
| `DFE` | Degrees of freedom for the error (residuals), equal to n – p, where n is the number of observations and p is the number of estimated coefficients. |
| `chi2Stat` | F-statistic or chi-squared statistic, depending on whether the dispersion is estimated (F-statistic) or not (chi-squared statistic). The F-statistic is the difference between the deviance of the constant model and the deviance of the full model, divided by the estimated dispersion. The chi-squared statistic is the difference between the deviance of the constant model and the deviance of the full model. |
| `pValue` | p-value associated with the test: chi-squared statistic with p – 1 degrees of freedom, or F-statistic with p – 1 numerator degrees of freedom and `DFE` denominator degrees of freedom, where p is the number of estimated coefficients. |
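As a rough check of the column definitions for the case where the dispersion is not estimated, the sketch below recomputes the chi-squared statistic and its p-value from the `Deviance` and `DFE` columns. It assumes Statistics and Machine Learning Toolbox. The names `chi2Manual`, `dfManual`, and `pManual` are illustrative only.

```matlab
% Illustrative sketch: rebuild chi2Stat and pValue from Deviance and DFE.
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1), rndvars(:,2)];
y = poissrnd(exp(1 + X*[1;2]));
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');
tbl = devianceTest(mdl);

% Row 1 is the constant model, row 2 is the fitted model.
chi2Manual = tbl.Deviance(1) - tbl.Deviance(2);

% The DFE difference equals p - 1, where p is the number of coefficients.
dfManual = tbl.DFE(1) - tbl.DFE(2);

% Upper-tail chi-squared probability of the statistic.
pManual = chi2cdf(chi2Manual,dfManual,'upper');
```

The `'upper'` option of `chi2cdf` computes the tail probability directly, which tends to be more accurate than subtracting the cumulative probability from 1 when the p-value is very small.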


### Deviance

Deviance is a generalization of the residual sum of squares. It measures the goodness of fit compared to a saturated model.

Deviance of a model M1 is twice the difference between the loglikelihood of the model M1 and the saturated model Ms. A saturated model is a model with the maximum number of parameters that you can estimate.

For example, if you have n observations (`$y_i$`, i = 1, 2, ..., n) with potentially different values for `$X_i^T\beta$`, then you can define a saturated model with n parameters. Let L(b,y) denote the maximum value of the likelihood function for a model with the parameters b. Then the deviance of the model M1 is

`$-2\left(\mathrm{log}L\left({b}_{1},y\right)-\mathrm{log}L\left({b}_{S},y\right)\right),$`

where b1 and bS contain the estimated parameters for the model M1 and the saturated model, respectively. The deviance has a chi-square distribution with n – p degrees of freedom, where n is the number of parameters in the saturated model and p is the number of parameters in the model M1.
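For a Poisson model, the deviance definition can be evaluated by hand, because the saturated model fits each observation exactly (its fitted mean equals the observed response). The sketch below works under that assumption, using the convention that 0·log(0) = 0, and compares the result with the `Deviance` property of the fitted model. The names `muHat`, `logTerm`, `devManual`, and `relErr` are illustrative.

```matlab
% Illustrative sketch: Poisson deviance computed from its definition.
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1), rndvars(:,2)];
y = poissrnd(exp(1 + X*[1;2]));
mdl = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson');

muHat = mdl.Fitted.Response;   % fitted means of the model

% For Poisson data, log L(saturated) - log L(model) reduces to
% sum(y.*log(y./muHat) - (y - muHat)); the log(y!) terms cancel.
% Observations with y == 0 contribute 0 to the logarithmic term.
logTerm = zeros(size(y));
pos = y > 0;
logTerm(pos) = y(pos).*log(y(pos)./muHat(pos));
devManual = 2*sum(logTerm - (y - muHat));

% Relative difference from the value stored in the model object.
relErr = abs(devManual - mdl.Deviance)/mdl.Deviance;
```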

Assume you have two different generalized linear regression models M1 and M2, and M2 has a subset of the terms in M1. You can assess the fit of the models by comparing the deviances D1 and D2 of the two models. The difference of the deviances is

`$\begin{aligned}D = D_2 - D_1 &= -2\left(\log L\left(b_2,y\right) - \log L\left(b_S,y\right)\right) + 2\left(\log L\left(b_1,y\right) - \log L\left(b_S,y\right)\right)\\ &= -2\left(\log L\left(b_2,y\right) - \log L\left(b_1,y\right)\right).\end{aligned}$`

Asymptotically, the difference D has a chi-square distribution with degrees of freedom v equal to the difference in the number of parameters estimated in M1 and M2. You can obtain the p-value for this test by using `1 - chi2cdf(D,v)`.
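The same comparison can be sketched for any pair of nested models, not only a model against a constant. The example below assumes Statistics and Machine Learning Toolbox and fits a smaller model (`x1` only) and a larger model (`x1` and `x2`) to the same data. The names `mdlSmall` and `mdlFull` are illustrative. The statistic is the deviance of the smaller model minus the deviance of the larger model, so it is nonnegative.

```matlab
% Illustrative sketch: deviance comparison of two nested Poisson models.
rng('default') % For reproducibility
rndvars = randn(100,2);
X = [2 + rndvars(:,1), rndvars(:,2)];
y = poissrnd(exp(1 + X*[1;2]));

mdlSmall = fitglm(X,y,'y ~ x1','Distribution','poisson');      % fewer terms
mdlFull  = fitglm(X,y,'y ~ x1 + x2','Distribution','poisson'); % more terms

% Difference of deviances (smaller model minus larger model).
D = mdlSmall.Deviance - mdlFull.Deviance;

% Degrees of freedom: difference in the number of estimated parameters.
v = mdlSmall.DFE - mdlFull.DFE;

% Upper-tail probability, equivalent to 1 - chi2cdf(D,v).
p = chi2cdf(D,v,'upper');
```

A small value of `p` suggests that the additional term in the larger model improves the fit beyond what chance alone would explain.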

Typically, you examine D using a model M2 with a constant term and no predictors. Therefore, D has a chi-square distribution with p – 1 degrees of freedom. If the dispersion is estimated, the difference divided by the estimated dispersion has an F distribution with p – 1 numerator degrees of freedom and n – p denominator degrees of freedom.

## Version History

Introduced in R2012a