# ocsvm

Fit one-class support vector machine (SVM) model for anomaly detection

*Since R2022b*

## Syntax

## Description

Use the `ocsvm` function to fit a one-class support vector machine (SVM) model for outlier detection and novelty detection.

- Outlier detection (detecting anomalies in training data): Use the output argument `tf` of `ocsvm` to identify anomalies in training data.
- Novelty detection (detecting anomalies in new data with uncontaminated training data): Create a `OneClassSVM` object by passing uncontaminated training data (data with no outliers) to `ocsvm`. Detect anomalies in new data by passing the object and the new data to the object function `isanomaly`.
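The two workflows above can be sketched as follows. This is a minimal illustration on synthetic data, not an official example: the data, the injected outliers, and the variable names (`XTrain`, `XClean`, `XNew`) are assumptions, and the sketch uses the numeric-matrix input `X` and the multiple-output form of `ocsvm`.

```matlab
rng(0)                                   % for reproducibility

% --- Outlier detection: anomalies in the training data itself ---
XTrain = randn(200,2);                   % synthetic predictors (assumed)
XTrain(1:5,:) = XTrain(1:5,:) + 8;       % shift five rows far from the bulk
[Mdl,tf,scores] = ocsvm(XTrain,ContaminationFraction=0.05);
flaggedRows = find(tf);                  % indices of training anomalies

% --- Novelty detection: train on clean data, score new data ---
XClean = randn(200,2);                   % training data assumed outlier-free
MdlClean = ocsvm(XClean);
XNew = [randn(10,2); 8*ones(2,2)];       % last two rows are far from the bulk
[tfNew,scoresNew] = isanomaly(MdlClean,XNew);
```

In the outlier-detection half, `tf` has one logical entry per training observation. In the novelty-detection half, `tfNew` and `scoresNew` have one entry per row of `XNew`.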

`Mdl = ocsvm(Tbl)` returns a `OneClassSVM` object (one-class SVM model object) for predictor data in the table `Tbl`.

`Mdl = ocsvm(___,Name=Value)` specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, `ContaminationFraction=0.1` instructs the function to process 10% of the training data as anomalies.
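As a short sketch of the table and name-value syntaxes described above, the following fits a model to a small synthetic table. The table contents and the variable names `x1` and `x2` are assumptions made for illustration only.

```matlab
rng(0)                                   % for reproducibility

% Synthetic predictor table with two numeric variables (assumed data)
Tbl = table(randn(100,1),randn(100,1),VariableNames=["x1","x2"]);

% Treat roughly 10% of the training observations as anomalies
[Mdl,tf] = ocsvm(Tbl,ContaminationFraction=0.1);

disp(class(Mdl))                         % class of the returned model object
disp(Mdl.ContaminationFraction)          % fraction stored in the model
disp(mean(tf))                           % observed fraction of flagged rows
```

The observed fraction of flagged rows should be close to the requested contamination fraction, because the score threshold is chosen from the training scores.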

## Examples

## Input Arguments

## Output Arguments

## More About

## Tips

After training a model, you can generate C/C++ code that finds anomalies for new data. Generating C/C++ code requires MATLAB® Coder™. For details, see Code Generation of the `isanomaly` function and Introduction to Code Generation.

## Algorithms

`ocsvm` considers `NaN`, `''` (empty character vector), `""` (empty string), `<missing>`, and `<undefined>` values in `Tbl` and `NaN` values in `X` to be missing values.

- `ocsvm` removes observations with all missing values.
- `ocsvm` does not use observations with some missing values. The function assigns an anomaly score of `NaN` and an anomaly indicator of `false` (logical 0) to those observations.
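The handling of partially missing observations can be checked with a small sketch like the one below. The data are synthetic, and the row chosen to hold the missing value is arbitrary.

```matlab
rng(1)                                   % for reproducibility

X = randn(50,3);                         % synthetic predictors (assumed)
X(2,1) = NaN;                            % row 2 now has one missing value

[Mdl,tf,scores] = ocsvm(X);

% Per the description above, row 2 is not used for training and
% receives a NaN score and a false anomaly indicator.
disp(scores(2))
disp(tf(2))
```

When you rank observations by score, exclude the `NaN` entries first (for example, with `~isnan(scores)`) so that they do not affect the ordering.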

`ocsvm` minimizes the regularized objective function using a limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) solver with ridge (*L*<sub>2</sub>) regularization. If `ocsvm` requires more memory than the value of `BlockSize` to hold the transformed predictor data, then the function uses a block-wise strategy.

- When `ocsvm` uses a block-wise strategy, it implements LBFGS by distributing the calculation of the loss and gradient among different parts of the data at each iteration. Also, `ocsvm` refines the initial estimates of the linear coefficients and the bias term by fitting the model locally to parts of the data and combining the coefficients by averaging. If you specify `Verbose=1`, then `ocsvm` displays diagnostic information for each data pass.
- When `ocsvm` does not use a block-wise strategy, the initial estimates are zeros. If you specify `Verbose=1`, then `ocsvm` displays diagnostic information for each iteration.
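One way to observe the block-wise strategy is to set `BlockSize` below the memory needed for the transformed predictors, as in the sketch below. It assumes that `BlockSize` is expressed in megabytes and that the `NumExpansionDimensions` name-value argument sets the size of the expanded space. Check both against the input argument descriptions for your release. The data are synthetic.

```matlab
rng(2)                                   % for reproducibility

X = randn(5000,10);                      % synthetic predictors (assumed)

% Transformed data: 5000-by-2048 doubles, about 82 MB.
% A 1 MB BlockSize (assumed unit: megabytes) should therefore force
% the block-wise strategy, and Verbose=1 prints per-pass diagnostics.
MdlBlock = ocsvm(X,NumExpansionDimensions=2048,BlockSize=1,Verbose=1);

% With the default BlockSize the same data should fit in one block,
% so the diagnostics are printed per iteration instead.
MdlFull = ocsvm(X,NumExpansionDimensions=2048,Verbose=1);
```

The two fits can differ slightly, because the block-wise path starts from locally fitted, averaged coefficients rather than from zeros.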

## Alternative Functionality

You can also use the `fitcsvm` function to train a one-class SVM model for anomaly detection.

The `ocsvm` function provides a simpler and preferred workflow for anomaly detection than the `fitcsvm` function.

- The `ocsvm` function returns a `OneClassSVM` object, anomaly indicators, and anomaly scores. You can use the outputs to identify anomalies in training data. To find anomalies in new data, you can use the `isanomaly` object function of `OneClassSVM`. The `isanomaly` function returns anomaly indicators and scores for the new data.
- The `fitcsvm` function supports both one-class and binary classification. If the class label variable contains only one class (for example, a vector of ones), `fitcsvm` trains a model for one-class classification and returns a `ClassificationSVM` object. To identify anomalies, you must first compute anomaly scores by using the `resubPredict` or `predict` object function of `ClassificationSVM`, and then identify anomalies by finding observations that have negative scores.
- Note that a large positive anomaly score indicates an anomaly in `ocsvm`, whereas a negative score indicates an anomaly in `predict` of `ClassificationSVM`.
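The difference between the two workflows, including the opposite sign conventions for scores, can be sketched as follows. The data are synthetic, and the `fitcsvm` options shown (`KernelScale`, `Standardize`, `OutlierFraction`) are illustrative choices rather than recommended settings.

```matlab
rng(3)                                   % for reproducibility

X = randn(200,2);                        % synthetic predictors (assumed)
X(1:4,:) = X(1:4,:) + 6;                 % shift four rows far from the bulk

% ocsvm: indicators and scores come back directly.
% Larger positive scores indicate anomalies.
[~,tfOC,scoresOC] = ocsvm(X,ContaminationFraction=0.02);

% fitcsvm: a single-class label vector triggers one-class learning.
y = ones(size(X,1),1);
MdlC = fitcsvm(X,y,KernelScale="auto",Standardize=true, ...
    OutlierFraction=0.02);
[~,scoresC] = resubPredict(MdlC);        % scores for the training data
tfC = scoresC < 0;                       % negative scores indicate anomalies
```

The two methods use different solvers and formulations, so `tfOC` and `tfC` are not guaranteed to flag the same rows.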

The `ocsvm` function finds the decision boundary based on the primal form of SVM, whereas the `fitcsvm` function finds the decision boundary based on the dual form of SVM.

The solver in `ocsvm` is computationally less expensive than the solver in `fitcsvm` for a large data set (large *n*). Unlike solvers in `fitcsvm`, which require computation of the *n*-by-*n* Gram matrix, the solver in `ocsvm` only needs to form a matrix of size *n*-by-*m*. Here, *m* is the number of dimensions of expanded space, which is typically much less than *n* for big data.
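To give a sense of scale, the following back-of-the-envelope calculation compares the storage for the two matrices in double precision. The values of `n` and `m` are hypothetical and chosen only for illustration.

```matlab
n = 1e6;                                 % number of observations (hypothetical)
m = 1024;                                % expanded-space dimensions (hypothetical)
bytesPerDouble = 8;

gramGB     = n^2 * bytesPerDouble / 1e9; % full n-by-n Gram matrix
expandedGB = n*m * bytesPerDouble / 1e9; % n-by-m transformed data

fprintf("Gram matrix: %.0f GB, expanded data: %.3f GB\n", ...
    gramGB,expandedGB)
```

With these values the Gram matrix would need 8000 GB, whereas the *n*-by-*m* matrix needs about 8.2 GB, which `ocsvm` can further split into blocks.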

## References

## Version History

**Introduced in R2022b**