Lasso or elastic net regularization for linear models

`B = lasso(X,y)`

`B = lasso(X,y,Name,Value)`

```
[B,FitInfo]
= lasso(___)
```

returns fitted least-squares regression coefficients for linear models of the
predictor data `B`

= lasso(`X`

,`y`

)`X`

and the response `y`

. Each
column of `B`

corresponds to a particular regularization
coefficient in `Lambda`

. By default, `lasso`

performs lasso regularization using a geometric sequence of
`Lambda`

values.

fits regularized regressions with additional options specified by one or more
name-value pair arguments. For example, `B`

= lasso(`X`

,`y`

,`Name,Value`

)`'Alpha',0.5`

sets elastic
net as the regularization method, with the parameter `Alpha`

equal
to 0.5.

Construct a data set with redundant predictors and identify those predictors by using `lasso`

.

Create a matrix `X`

of 100 five-dimensional normal variables. Create a response vector `y`

from just two components of `X`

, and add a small amount of noise.

rng default % For reproducibility X = randn(100,5); weights = [0;2;0;-3;0]; % Only two nonzero coefficients y = X*weights + randn(100,1)*0.1; % Small added noise

Construct the default lasso fit.

B = lasso(X,y);

Find the coefficient vector for the 25th `Lambda`

value in `B`

.

B(:,25)

`ans = `*5×1*
0
1.6093
0
-2.5865
0

`lasso`

identifies and removes the redundant predictors.

Construct a data set with redundant predictors and identify those predictors by using cross-validated `lasso`

.

Create a matrix `X`

of 100 five-dimensional normal variables. Create a response vector `y`

from two components of `X`

, and add a small amount of noise.

rng default % For reproducibility X = randn(100,5); weights = [0;2;0;-3;0]; % Only two nonzero coefficients y = X*weights + randn(100,1)*0.1; % Small added noise

Construct the lasso fit by using 10-fold cross-validation with labeled predictor variables.

[B,FitInfo] = lasso(X,y,'CV',10,'PredictorNames',{'x1','x2','x3','x4','x5'});

Display the variables in the model that corresponds to the minimum cross-validated mean squared error (MSE).

idxLambdaMinMSE = FitInfo.IndexMinMSE; minMSEModelPredictors = FitInfo.PredictorNames(B(:,idxLambdaMinMSE)~=0)

`minMSEModelPredictors = `*1x2 cell array*
{'x2'} {'x4'}

Display the variables in the sparsest model within one standard error of the minimum MSE.

idxLambda1SE = FitInfo.Index1SE; sparseModelPredictors = FitInfo.PredictorNames(B(:,idxLambda1SE)~=0)

`sparseModelPredictors = `*1x2 cell array*
{'x2'} {'x4'}

In this example, `lasso`

identifies the same predictors for the two models and removes the redundant predictors.

Visually examine the cross-validated error of various levels of regularization.

Load the sample data.

`load acetylene`

Create a design matrix with interactions and no constant term.

X = [x1 x2 x3]; D = x2fx(X,'interaction'); D(:,1) = []; % No constant term

Construct the lasso fit using 10-fold cross-validation. Include the `FitInfo`

output so you can plot the result.

rng default % For reproducibility [B,FitInfo] = lasso(D,y,'CV',10);

Plot the cross-validated fits.

lassoPlot(B,FitInfo,'PlotType','CV'); legend('show') % Show legend

The green circle and dotted line locate the `Lambda`

with minimum cross-validation error. The blue circle and dotted line locate the point with minimum cross-validation error plus one standard deviation.

Predict students' exam scores using `lasso`

and the elastic net method.

Load the `examgrades`

data set.

```
load examgrades
X = grades(:,1:4);
y = grades(:,5);
```

Split the data into training and test sets.

```
n = length(y);
c = cvpartition(n,'HoldOut',0.3);
idxTrain = training(c,1);
idxTest = ~idxTrain;
XTrain = X(idxTrain,:);
yTrain = y(idxTrain);
XTest = X(idxTest,:);
yTest = y(idxTest);
```

Find the coefficients of a regularized linear regression model using 10-fold cross-validation and the elastic net method with `Alpha`

= 0.75. Use the largest `Lambda`

value such that the mean squared error (MSE) is within one standard error of the minimum MSE.

[B,FitInfo] = lasso(XTrain,yTrain,'Alpha',0.75,'CV',10); idxLambda1SE = FitInfo.Index1SE; coef = B(:,idxLambda1SE); coef0 = FitInfo.Intercept(idxLambda1SE);

Predict exam scores for the test data. Compare the predicted values to the actual exam grades using a reference line.

yhat = XTest*coef + coef0; hold on scatter(yTest,yhat) plot(yTest,yTest) xlabel('Actual Exam Grades') ylabel('Predicted Exam Grades') hold off

`X`

— Predictor datanumeric matrix

Predictor data, specified as a numeric matrix. Each row represents one observation, and each column represents one predictor variable.

**Data Types: **`single`

| `double`

`y`

— Response datanumeric vector

Response data, specified as a numeric vector. `y`

has
length *n*, where *n* is the number of
rows of `X`

. The response `y(i)`

corresponds to the *i*th row of
`X`

.

**Data Types: **`single`

| `double`

Specify optional
comma-separated pairs of `Name,Value`

arguments. `Name`

is
the argument name and `Value`

is the corresponding value.
`Name`

must appear inside quotes. You can specify several name and value
pair arguments in any order as
`Name1,Value1,...,NameN,ValueN`

.

`lasso(X,y,'Alpha',0.75,'CV',10)`

performs elastic net
regularization with 10-fold cross-validation. The `'Alpha',0.75`

name-value pair argument sets the parameter used in the elastic net
optimization.`'AbsTol'`

— Absolute error tolerance`1e–4`

(default) | positive scalarAbsolute error tolerance used to determine the convergence of the
ADMM Algorithm, specified as the
comma-separated pair consisting of `'AbsTol'`

and a
positive scalar. The algorithm converges when successive estimates of
the coefficient vector differ by an amount less than
`AbsTol`

.

This option applies only when you use `lasso`

on tall arrays. See Extended Capabilities for more
information.

**Example: **`'AbsTol',1e–3`

**Data Types: **`single`

| `double`

`'Alpha'`

— Weight of lasso versus ridge optimization`1`

(default) | positive scalarWeight of lasso (*L*^{1})
versus ridge (*L*^{2})
optimization, specified as the comma-separated pair consisting of
`'Alpha'`

and a positive scalar value in the
interval `(0,1]`

. The value
`Alpha = 1`

represents lasso regression,
`Alpha`

close to `0`

approaches
ridge regression, and other
values represent elastic net optimization. See Elastic Net.

**Example: **`'Alpha',0.5`

**Data Types: **`single`

| `double`

`'B0'`

— Initial values for vector of zeros (default) | numeric vector

Initial values for *x*-coefficients in ADMM Algorithm, specified as the
comma-separated pair consisting of `'B0'`

and a numeric
vector.

This option applies only when you use `lasso`

on tall arrays. See Extended Capabilities for more
information.

**Data Types: **`single`

| `double`

`'CV'`

— Cross-validation specification for estimating mean squared error`'resubstitution'`

(default) | positive integer scalar | `cvpartition`

objectCross-validation specification for estimating the mean squared error
(MSE), specified as the comma-separated pair consisting of
`'CV'`

and one of the following:

`'resubstitution'`

—`lasso`

uses`X`

and`y`

to fit the model and to estimate the MSE without cross-validation.Positive scalar integer

`K`

—`lasso`

uses`K`

-fold cross-validation.`cvpartition`

object`cvp`

—`lasso`

uses the cross-validation method expressed in`cvp`

. You cannot use a`'leaveout'`

partition with`lasso`

.

**Example: **`'CV',3`

`'DFmax'`

— Maximum number of nonzero coefficients`Inf`

(default) | positive integer scalarMaximum number of nonzero coefficients in the model, specified as the
comma-separated pair consisting of `'DFmax'`

and a
positive integer scalar. `lasso`

returns results only
for `Lambda`

values that satisfy this
criterion.

**Example: **`'DFmax',5`

**Data Types: **`single`

| `double`

`'Lambda'`

— Regularization coefficientsnonnegative vector

Regularization coefficients, specified as the comma-separated pair
consisting of `'Lambda'`

and a vector of nonnegative
values. See Lasso.

If you do not supply

`Lambda`

, then`lasso`

calculates the largest value of`Lambda`

that gives a nonnull model. In this case,`LambdaRatio`

gives the ratio of the smallest to the largest value of the sequence, and`NumLambda`

gives the length of the vector.If you supply

`Lambda`

, then`lasso`

ignores`LambdaRatio`

and`NumLambda`

.If

`Standardize`

is`true`

, then`Lambda`

is the set of values used to fit the models with the`X`

data standardized to have zero mean and a variance of one.

The default is a geometric sequence of `NumLambda`

values, with only the largest value able to produce
`B`

= `0`

.

**Example: **`'Lambda',linspace(0,1)`

**Data Types: **`single`

| `double`

`'LambdaRatio'`

— Ratio of smallest to largest `Lambda`

values`1e–4`

(default) | positive scalarRatio of the smallest to the largest `Lambda`

values when you do not supply `Lambda`

, specified as
the comma-separated pair consisting of `'LambdaRatio'`

and a positive scalar.

If you set `LambdaRatio`

= 0, then
`lasso`

generates a default sequence of
`Lambda`

values and replaces the smallest one
with `0`

.

**Example: **`'LambdaRatio',1e–2`

**Data Types: **`single`

| `double`

`'MaxIter'`

— Maximum number of iterations allowedpositive integer scalar

Maximum number of iterations allowed, specified as the comma-separated
pair consisting of `'MaxIter'`

and a positive integer
scalar.

If the algorithm executes `MaxIter`

iterations
before reaching the convergence tolerance `RelTol`

,
then the function stops iterating and returns a warning message.

The function can return more than one warning when
`NumLambda`

is greater than
`1`

.

Default values are `1e5`

for standard data and
`1e4`

for tall arrays.

**Example: **`'MaxIter',1e3`

**Data Types: **`single`

| `double`

`'MCReps'`

— Number of Monte Carlo repetitions for cross-validation`1`

(default) | positive integer scalarNumber of Monte Carlo repetitions for cross-validation, specified as
the comma-separated pair consisting of `'MCReps'`

and a
positive integer scalar.

If

`CV`

is`'resubstitution'`

or a`cvpartition`

of type`'resubstitution'`

, then`MCReps`

must be`1`

.If

`CV`

is a`cvpartition`

of type`'holdout'`

, then`MCReps`

must be greater than`1`

.

**Example: **`'MCReps',5`

**Data Types: **`single`

| `double`

`'NumLambda'`

— Number of `Lambda`

values`100`

(default) | positive integer scalarNumber of `Lambda`

values
`lasso`

uses when you do not supply
`Lambda`

, specified as the comma-separated pair
consisting of `'NumLambda'`

and a positive integer
scalar. `lasso`

can return fewer than
`NumLambda`

fits if the residual error of the
fits drops below a threshold fraction of the variance of
`y`

.

**Example: **`'NumLambda',50`

**Data Types: **`single`

| `double`

`'Options'`

— Option to cross-validate in parallel and specify random streamsstructure

Option to cross-validate in parallel and specify the random streams,
specified as the comma-separated pair consisting of
`'Options'`

and a structure. This option requires
Parallel
Computing Toolbox™.

Create the `Options`

structure with `statset`

. The option
fields are:

`UseParallel`

— Set to`true`

to compute in parallel. The default is`false`

.`UseSubstreams`

— Set to`true`

to compute in parallel in a reproducible fashion. For reproducibility, set`Streams`

to a type allowing substreams:`'mlfg6331_64'`

or`'mrg32k3a'`

. The default is`false`

.`Streams`

— A`RandStream`

object or cell array consisting of one such object. If you do not specify`Streams`

, then`lasso`

uses the default stream.

**Example: **`'Options',statset('UseParallel',true)`

**Data Types: **`struct`

`'PredictorNames'`

— Names of predictor variables`{}`

(default) | string array | cell array of character vectorsNames of the predictor variables, in the order in which they appear in
`X`

, specified as the comma-separated pair
consisting of `'PredictorNames'`

and a string array or
cell array of character vectors.

**Example: **`'PredictorNames',{'x1','x2','x3','x4'}`

**Data Types: **`string`

| `cell`

`'RelTol'`

— Convergence threshold for coordinate descent algorithm`1e–4`

(default) | positive scalarConvergence threshold for the coordinate descent algorithm [3], specified as the comma-separated pair
consisting of `'RelTol'`

and a positive scalar. The
algorithm terminates when successive estimates of the coefficient vector
differ in the *L*^{2} norm by a
relative amount less than `RelTol`

.

**Example: **`'RelTol',5e–3`

**Data Types: **`single`

| `double`

`'Rho'`

— Augmented Lagrangian parameterpositive scalar

Augmented Lagrangian parameter *ρ* for the ADMM Algorithm, specified as the
comma-separated pair consisting of `'Rho'`

and a
positive scalar. The default is automatic selection.

This option applies only when you use `lasso`

on tall arrays. See Extended Capabilities for more
information.

**Example: **`'Rho',2`

**Data Types: **`single`

| `double`

`'Standardize'`

— Flag for standardizing predictor data before fitting models`true`

(default) | `false`

Flag for standardizing the predictor data `X`

before fitting the models, specified as the comma-separated pair
consisting of `'Standardize'`

and either
`true`

or `false`

. If
`Standardize`

is `true`

, then
the `X`

data is scaled to have zero mean and a
variance of one. `Standardize`

affects whether the
regularization is applied to the coefficients on the standardized scale
or the original scale. The results are always presented on the original
data scale.

`X`

and `y`

are always
centered.

**Example: **`'Standardize',false`

**Data Types: **`logical`

`'U0'`

— Initial value of scaled dual variablevector of zeros (default) | numeric vector

Initial value of the scaled dual variable *u* in the
ADMM Algorithm, specified as the
comma-separated pair consisting of `'U0'`

and a numeric
vector.

`lasso`

on tall arrays. See Extended Capabilities for more
information.

**Data Types: **`single`

| `double`

`'Weights'`

— Observation weights`1/n*ones(n,1)`

(default) | nonnegative vectorObservation weights, specified as the comma-separated pair consisting
of `'Weights'`

and a nonnegative vector.
`Weights`

has length *n*, where
*n* is the number of rows of
`X`

. The `lasso`

function scales
`Weights`

to sum to `1`

.

**Data Types: **`single`

| `double`

`B`

— Fitted coefficientsnumeric matrix

Fitted coefficients, returned as a numeric matrix. `B`

is a *p*-by-*L* matrix, where
*p* is the number of predictors (columns) in
`X`

, and *L* is the number of
`Lambda`

values. You can specify the number of
`Lambda`

values using the
`NumLambda`

name-value pair argument.

The coefficient corresponding to the intercept term is a field in
`FitInfo`

.

**Data Types: **`single`

| `double`

`FitInfo`

— Fit information of modelsstructure

Fit information of the linear models, returned as a structure with the fields described in this table.

Field in FitInfo | Description |
---|---|

`Intercept` | Intercept term
β_{0} for each
linear model, a `1` -by-L
vector |

`Lambda` | Lambda parameters in ascending order, a
`1` -by-L
vector |

`Alpha` | Value of the `Alpha` parameter, a
scalar |

`DF` | Number of nonzero coefficients in `B`
for each value of `Lambda` , a
`1` -by-L
vector |

`MSE` | Mean squared error (MSE), a
`1` -by-L
vector |

`PredictorNames` | Value of the `PredictorNames` parameter,
stored as a cell array of character vectors |

If you set the `CV`

name-value pair argument to
cross-validate, the `FitInfo`

structure contains these
additional fields.

Field in FitInfo | Description |
---|---|

`SE` | Standard error of MSE for each `Lambda` ,
as calculated during cross-validation, a
`1` -by-L
vector |

`LambdaMinMSE` | `Lambda` value with the minimum MSE, a
scalar |

`Lambda1SE` | Largest `Lambda` value such that MSE is
within one standard error of the minimum MSE, a
scalar |

`IndexMinMSE` | Index of `Lambda` with the value
`LambdaMinMSE` , a scalar |

`Index1SE` | Index of `Lambda` with the value
`Lambda1SE` , a scalar |

For a given value of *λ*, a nonnegative parameter,
`lasso`

solves the problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{2N}{\displaystyle \sum _{i=1}^{N}{\left({y}_{i}-{\beta}_{0}-{x}_{i}^{T}\beta \right)}^{2}}+\lambda {\displaystyle \sum _{j=1}^{p}\left|{\beta}_{j}\right|}\right).$$

*N*is the number of observations.*y*is the response at observation_{i}*i*.*x*is data, a vector of length_{i}*p*at observation*i*.*λ*is a nonnegative regularization parameter corresponding to one value of`Lambda`

.The parameters

*β*_{0}and*β*are a scalar and a vector of length*p*, respectively.

As *λ* increases, the number of nonzero components of
*β* decreases.

The lasso problem involves the *L*^{1}
norm of *β*, as contrasted with the elastic net algorithm.

For *α* strictly between 0 and 1, and nonnegative
*λ*, elastic net solves the problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{2N}{\displaystyle \sum _{i=1}^{N}{\left({y}_{i}-{\beta}_{0}-{x}_{i}^{T}\beta \right)}^{2}}+\lambda {P}_{\alpha}\left(\beta \right)\right),$$

where

$${P}_{\alpha}\left(\beta \right)=\frac{(1-\alpha )}{2}{\Vert \beta \Vert}_{2}^{2}+\alpha {\Vert \beta \Vert}_{1}={\displaystyle \sum _{j=1}^{p}\left(\frac{(1-\alpha )}{2}{\beta}_{j}^{2}+\alpha \left|{\beta}_{j}\right|\right)}.$$

Elastic net is the same as lasso when *α* = 1. For
other values of *α*, the penalty term
*P _{α}*(

`ridge`

regression.When operating on tall arrays, `lasso`

uses an algorithm based
on the Alternating Direction Method of Multipliers (ADMM) [5]. The notation used here is the same as in the reference paper. This method solves
problems of the form

*Minimize*
$$l\left(x\right)+g\left(z\right)$$

*Subject to*
$$Ax+Bz=c$$

Using this notation, the lasso regression problem is

*Minimize*
$$l\left(x\right)+g\left(z\right)=\frac{1}{2}{\Vert Ax-b\Vert}_{2}^{2}+\lambda {\Vert z\Vert}_{1}$$

*Subject to*
$$x-z=0$$

Because the loss function $$l\left(x\right)=\frac{1}{2}{\Vert Ax-b\Vert}_{2}^{2}$$ is quadratic, the iterative updates performed by the algorithm amount to solving a linear system of equations with a single coefficient matrix but several right-hand sides. The updates performed by the algorithm during each iteration are

$$\begin{array}{l}{x}^{k+1}={\left({A}^{T}A+\rho I\right)}^{-1}\left({A}^{T}b+\rho \left({z}^{k}-{u}^{k}\right)\right)\\ {z}^{k+1}={S}_{\lambda /\rho}\left({x}^{k+1}+{u}^{k}\right)\\ {u}^{k+1}={u}^{k}+{x}^{k+1}-{z}^{k+1}\end{array}$$

*A* is the dataset (a tall array), *x* contains
the coefficients, *ρ* is the penalty parameter (augmented
Lagrangian parameter), *b* is the response (a tall array), and
*S* is the soft thresholding operator.

$${S}_{\kappa}\left(a\right)=\{\begin{array}{c}\begin{array}{cc}a-\kappa ,\text{\hspace{0.17em}}& a>\kappa \end{array}\\ \begin{array}{cc}0,\text{\hspace{0.17em}}& \left|a\right|\text{\hspace{0.17em}}\le \kappa \text{\hspace{0.17em}}\end{array}\\ \begin{array}{cc}a+\kappa ,\text{\hspace{0.17em}}& a<\kappa \text{\hspace{0.17em}}\end{array}\end{array}.$$

`lasso`

solves the linear system using Cholesky factorization
because the coefficient matrix $${A}^{T}A+\rho I$$ is symmetric and positive definite. Because $$\rho $$ does not change between iterations, the Cholesky factorization is
cached between iterations.

Even though *A* and *b* are tall arrays, they
appear only in the terms $${A}^{T}A$$ and $${A}^{T}b$$. The results of these two matrix multiplications are small enough
to fit in memory, so they are precomputed and the iterative updates between
iterations are performed entirely within memory.

[1] Tibshirani, R. “Regression Shrinkage and Selection via
the Lasso.” *Journal of the Royal Statistical Society.*
Series B, Vol. 58, No. 1, 1996, pp. 267–288.

[2] Zou, H., and T. Hastie. “Regularization and Variable
Selection via the Elastic Net.” *Journal of the Royal Statistical
Society.* Series B, Vol. 67, No. 2, 2005, pp. 301–320.

[3] Friedman, J., R. Tibshirani, and T. Hastie.
“Regularization Paths for Generalized Linear Models via Coordinate
Descent.” *Journal of Statistical Software.* Vol. 33, No. 1,
2010. `https://www.jstatsoft.org/v33/i01`

[4] Hastie, T., R. Tibshirani, and J. Friedman. *The
Elements of Statistical Learning.* 2nd edition. New York: Springer,
2008.

[5] Boyd, S. “Distributed Optimization and Statistical
Learning via the Alternating Direction Method of Multipliers.”
*Foundations and Trends in Machine Learning.* Vol. 3, No. 1,
2010, pp. 1–122.

Calculate with arrays that have more rows than fit in memory.

This function supports tall arrays for out-of-memory data with some limitations.

With tall arrays,

`lasso`

uses an algorithm based on ADMM (Alternating Direction Method of Multipliers).No elastic net support. The

`'Alpha'`

parameter is always 1.No cross-validation (

`'CV'`

parameter) support, which includes the related parameter`'MCReps'`

.The output

`FitInfo`

does not contain the additional fields`'SE'`

,`'LambdaMinMSE'`

,`'Lambda1SE'`

,`'IndexMinMSE'`

, and`'Index1SE'`

.The

`'Options'`

parameter is not supported because it does not contain options that apply to the ADMM algorithm. You can tune the ADMM algorithm using name-value pair arguments.Supported name-value pair arguments are:

`'Lambda'`

`'LambdaRatio'`

`'NumLambda'`

`'Standardize'`

`'PredictorNames'`

`'RelTol'`

`'Weights'`

Additional name-value pair arguments to control the ADMM algorithm are:

`'Rho'`

— Augmented Lagrangian parameter,*ρ*. The default value is automatic selection.`'AbsTol'`

— Absolute tolerance used to determine convergence. The default value is`1e–4`

.`'MaxIter'`

— Maximum number of iterations. The default value is`1e4`

.`'B0'`

— Initial values for the coefficients*x*. The default value is a vector of zeros.`'U0'`

— Initial values of the scaled dual variable*u*. The default value is a vector of zeros.

For more information, see Tall Arrays (MATLAB).

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the `'UseParallel'`

option to `true`

.

Set the `'UseParallel'`

field of the options structure to `true`

using `statset`

and specify the `'Options'`

name-value pair argument in the call to this function.

For example: `'Options',statset('UseParallel',true)`

For more information, see the `'Options'`

name-value pair argument.

For more general information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).

You clicked a link that corresponds to this MATLAB command:

Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.

Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .

Select web siteYou can also select a web site from the following list:

Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.

- América Latina (Español)
- Canada (English)
- United States (English)

- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)

- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)