
Lasso is a regularization technique. Use `lasso` to:

- Reduce the number of predictors in a regression model.
- Identify important predictors.
- Select among redundant predictors.
- Produce shrinkage estimates with potentially lower predictive errors than ordinary least squares.

Elastic net is a related technique. Use elastic net when you have several highly correlated variables. `lasso` provides elastic net regularization when you set the `Alpha` name-value pair to a number strictly between `0` and `1`.

See Lasso and Elastic Net Details.

For lasso regularization of regression ensembles, see `regularize`.

Lasso is a regularization technique for performing linear regression.
Lasso includes a penalty term that constrains the size of the estimated
coefficients. Therefore, it resembles ridge regression. Lasso
is a *shrinkage estimator*: it generates coefficient
estimates that are biased to be small. Nevertheless, a lasso estimator
can have smaller mean squared error than an ordinary least-squares
estimator when you apply it to new data.

Unlike ridge regression, as the penalty term increases, lasso sets more coefficients to zero. This means that the lasso estimator is a smaller model, with fewer predictors. As such, lasso is an alternative to stepwise regression and other model selection and dimensionality reduction techniques.
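The contrast with ridge regression is easiest to see for a single coefficient. The following Python sketch is an illustration of the math only; it is not MATLAB code and makes no claim about how `lasso` is implemented. It assumes one standardized predictor whose unpenalized least-squares coefficient is `z` (a hypothetical value), so that each penalized problem has a closed-form solution. The names `soft_threshold` and `ridge_shrink` are chosen here for illustration.

```python
def soft_threshold(z, lam):
    """Lasso case: minimizer of 0.5*(z - b)**2 + lam*abs(b) over b."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # the coefficient is set exactly to zero


def ridge_shrink(z, lam):
    """Ridge case: minimizer of 0.5*(z - b)**2 + 0.5*lam*b**2 over b."""
    return z / (1.0 + lam)  # shrinks toward zero but never reaches it


z = 0.8  # hypothetical least-squares coefficient
for lam in (0.0, 0.5, 1.0):
    print(lam, soft_threshold(z, lam), ridge_shrink(z, lam))
```

Under these assumptions, the lasso coefficient becomes exactly zero once the penalty reaches the size of `z`, while the ridge coefficient only shrinks proportionally.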

Elastic net is a related technique. Elastic net is a hybrid of ridge regression and lasso regularization. Like lasso, elastic net can generate reduced models by generating zero-valued coefficients. Empirical studies have suggested that the elastic net technique can outperform lasso on data with highly correlated predictors.

For a given value of *λ*, a nonnegative parameter, the *lasso* technique, as implemented by `lasso`, solves the regularization problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{2N}{\displaystyle \sum _{i=1}^{N}{\left({y}_{i}-{\beta}_{0}-{x}_{i}^{T}\beta \right)}^{2}}+\lambda {\displaystyle \sum _{j=1}^{p}\left|{\beta}_{j}\right|}\right).$$

- *N* is the number of observations.
- *y*_{i} is the response at observation *i*.
- *x*_{i} is data, a vector of *p* values at observation *i*.
- *λ* is a nonnegative regularization parameter corresponding to one value of `Lambda`.
- The parameters *β*_{0} and *β* are a scalar and a *p*-vector, respectively.
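To make the objective concrete, here is a minimal Python sketch that evaluates it for given parameters. It is an illustration of the formula only, not MATLAB code; the function name `lasso_objective` and the small data set are assumptions made for this example.

```python
def lasso_objective(X, y, beta0, beta, lam):
    """Evaluate (1/(2N)) * sum_i (y_i - beta0 - x_i' beta)^2 + lam * sum_j |beta_j|.

    X is a list of N rows, each a list of p predictor values.
    """
    n = len(y)
    rss = 0.0
    for x_i, y_i in zip(X, y):
        fitted = beta0 + sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        rss += (y_i - fitted) ** 2
    l1_penalty = sum(abs(b_j) for b_j in beta)
    return rss / (2 * n) + lam * l1_penalty


# Hypothetical data: three observations, two predictors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]

# beta = [1, 2] fits y exactly, so only the penalty term contributes.
print(lasso_objective(X, y, 0.0, [1.0, 2.0], lam=0.1))
# beta = [0, 0] has no penalty, so only the squared-error term contributes.
print(lasso_objective(X, y, 0.0, [0.0, 0.0], lam=0.1))
```

The intercept *β*_{0} enters only the squared-error term, which matches the formula: the penalty sums over the *p* components of *β* and excludes the intercept.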

As *λ* increases, the number of nonzero
components of *β* decreases.
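The effect of *λ* on sparsity can be seen on a tiny example. The sketch below minimizes the lasso objective by cyclic coordinate descent, the family of methods described in reference [3]. It is written in Python for illustration, is not MATLAB code, and is not presented as the algorithm inside `lasso`. The function `lasso_cd`, the data, and the *λ* grid are all hypothetical; the predictor columns are chosen to be centered and mutually orthogonal so the result is easy to check by hand.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0.0 when |z| <= lam."""
    return max(z - lam, 0.0) - max(-z - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=50):
    """Cyclic coordinate descent for
    (1/(2N)) * sum_i (y_i - b0 - x_i' b)^2 + lam * sum_j |b_j|.
    Assumes no predictor column is identically zero.
    """
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # The intercept is unpenalized: set it to the mean residual.
        beta0 = sum(
            y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)
        ) / n
        for j in range(p):
            rho = 0.0   # correlation of predictor j with the partial residual
            norm = 0.0  # squared norm of predictor j
            for i in range(n):
                partial = y[i] - beta0 - sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * partial
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta0, beta


# Hypothetical data: y = 1 + 2*x1 + 0.5*x2 + 0.25*x3 with orthogonal columns.
X = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
y = [3.75, 2.25, -0.75, -1.25]

lambdas = [0.125, 0.375, 1.0, 2.5]
counts = []
for lam in lambdas:
    beta0, beta = lasso_cd(X, y, lam)
    counts.append(sum(1 for b in beta if b != 0.0))
    print(lam, beta0, beta)
print(counts)  # -> [3, 2, 1, 0]
```

For this orthogonal design, each coefficient is the soft-thresholded least-squares coefficient, so a predictor drops out as soon as *λ* reaches the magnitude of its least-squares coefficient. With correlated predictors the path is less simple, and this sketch makes no claim about that case.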

The lasso problem involves only the *L*^{1} norm of *β*. By contrast, the elastic net penalty combines the *L*^{1} norm with the squared *L*^{2} norm of *β*.

For an *α* strictly between 0 and 1, and a nonnegative *λ*, the *elastic net* technique solves the regularization problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{2N}{\displaystyle \sum _{i=1}^{N}{\left({y}_{i}-{\beta}_{0}-{x}_{i}^{T}\beta \right)}^{2}}+\lambda {P}_{\alpha}\left(\beta \right)\right),$$

where

$${P}_{\alpha}\left(\beta \right)=\frac{(1-\alpha )}{2}{\Vert \beta \Vert}_{2}^{2}+\alpha {\Vert \beta \Vert}_{1}={\displaystyle \sum _{j=1}^{p}\left(\frac{(1-\alpha )}{2}{\beta}_{j}^{2}+\alpha \left|{\beta}_{j}\right|\right)}.$$
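The penalty term defined above can be evaluated directly. The following Python sketch is illustrative only and is not MATLAB code; the function name `elastic_net_penalty` and the coefficient vector are assumptions made for this example.

```python
def elastic_net_penalty(beta, alpha):
    """P_alpha(beta) = (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return (1.0 - alpha) / 2.0 * l2_squared + alpha * l1


beta = [3.0, -4.0]  # hypothetical coefficients: L1 norm 7, squared L2 norm 25

print(elastic_net_penalty(beta, 1.0))  # 7.0, the L1 norm alone
print(elastic_net_penalty(beta, 0.5))  # 0.25 * 25 + 0.5 * 7 = 9.75
print(elastic_net_penalty(beta, 0.0))  # 12.5, half the squared L2 norm
```

The two endpoint values show how *α* weights the two norms: at *α* = 1 only the *L*^{1} term remains, and at *α* = 0 only the squared *L*^{2} term remains.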

Elastic net is the same as lasso when *α* = 1. As *α* shrinks toward 0, elastic net approaches `ridge` regression. For other values of *α*, the penalty term *P*_{α}(*β*) interpolates between the *L*^{1} norm of *β* and the squared *L*^{2} norm of *β*.

[1] Tibshirani, R. *Regression shrinkage and selection via the lasso.* Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.

[2] Zou, H. and T. Hastie. *Regularization and variable selection via the elastic net.* Journal of the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320, 2005.

[3] Friedman, J., R. Tibshirani, and T. Hastie. *Regularization paths for generalized linear models via coordinate descent.* Journal of Statistical Software, Vol. 33, No. 1, 2010. `https://www.jstatsoft.org/v33/i01`

[4] Hastie, T., R. Tibshirani, and J. Friedman. *The Elements of Statistical Learning,* 2nd edition. Springer, New York, 2008.

See also: `fitrlinear` | `lasso` | `lassoglm` | `lassoPlot` | `ridge`