Configure Incremental Learning Model

An incremental learning model object fully specifies how functions implement incremental fitting and model performance evaluation. To configure (or prepare) an incremental learning model, create one by calling the object directly, or by converting a traditionally trained model to one of the objects. The following table lists the available model types, model objects for incremental learning, and conversion functions.

Objective	Model Type	Model Object for Incremental Learning	Conversion Function
Binary classification	Linear support vector machine (SVM) and logistic regression with Gaussian kernels	`incrementalClassificationKernel`	`incrementalLearner` converts a kernel classification model (`ClassificationKernel`).
Binary classification	Linear SVM and logistic regression	`incrementalClassificationLinear`	`incrementalLearner` converts a linear SVM model (`ClassificationSVM` or `CompactClassificationSVM`). `incrementalLearner` converts a linear classification model (`ClassificationLinear`).
Multiclass classification	Error-correcting output codes (ECOC) model with binary learners	`incrementalClassificationECOC`	`incrementalLearner` converts an ECOC model (`ClassificationECOC` or `CompactClassificationECOC`) with binary learners.
Multiclass classification	Naive Bayes with normal, multinomial, or multivariate multinomial predictor conditional distributions	`incrementalClassificationNaiveBayes`	`incrementalLearner` converts a full naive Bayes classification model (`ClassificationNaiveBayes`).
Regression	Least-squares and linear SVM regression with Gaussian kernels	`incrementalRegressionKernel`	`incrementalLearner` converts a kernel regression model (`RegressionKernel`).
Regression	Least-squares and linear SVM regression	`incrementalRegressionLinear`	`incrementalLearner` converts a linear SVM regression model (`RegressionSVM` or `CompactRegressionSVM`). `incrementalLearner` converts a linear regression model (`RegressionLinear`).

The approach you choose to create an incremental model depends on the information you have and your preferences.

Call object: Create an incremental model to your specifications by calling the object directly. This approach is flexible, enabling you to specify most options to suit your preferences, and the resulting model provides reasonable default values. For more details, see Call Object Directly.
Convert model: Convert a traditionally trained model to an incremental learner to initialize a model for incremental learning by using the incrementalLearner function. The function passes information that the traditionally trained model learned from the data. To convert a traditionally trained model, you must have a set of labeled data to which you can fit a model.
When you use incrementalLearner, you can specify all performance evaluation options and only those training, model, and data options that are unknown during conversion. For more details, see Convert Traditionally Trained Model.

Regardless of the approach you use, consider these configurations:

Model performance evaluation settings, such as the performance metrics to measure. For details, see Performance Evaluation Options and Properties.
For ECOC models:
- Binary learners
- Coding design matrix for the binary learners.
For kernel models:
- Model type, such as SVM
- Objective function solver, such as standard stochastic gradient descent (SGD)
- Hyperparameters for random feature expansion, such as the kernel scale parameter and number of dimensions of expanded space
For linear models:
- Model type, such as SVM
- Coefficient initial values
- Objective function solver, such as standard stochastic gradient descent (SGD)
- Solver hyperparameter values, such as the learning rate of SGD solvers
For naive Bayes models, the conditional distribution of the predictor variables. In a data set, you can specify that real-valued predictors are normally distributed and that categorical predictors (where levels are numeric scalars) are multivariate multinomial. For a bag-of-tokens model, where each predictor is a count, you can specify that all predictors are jointly multinomial.

Call Object Directly

Unlike when working with other machine learning model objects, you can create an incremental learning model by calling the corresponding object directly, with little knowledge about the data. For example, the following code creates a default incremental model for linear regression and a naive Bayes classification model for a data stream containing 5 classes.

MdlLR = incrementalRegressionLinear();
MdlNB = incrementalClassificationNaiveBayes(MaxNumClasses=5)

For linear and kernel models, the only information required to create a model directly is the machine learning problem, either classification or regression. An estimation period might also be required, depending on your specifications.
For naive Bayes and ECOC classification models, you must specify the maximum number of classes or all class names expected in the data during incremental learning.

If you have information about the data to specify, or you want to configure model options or performance evaluation settings, use name-value arguments when you call the object. (All model properties are read-only; you cannot adjust them using dot notation.) For example, the following pseudocode creates an incremental logistic regression model for binary classification, initializes the linear model coefficients Beta and bias Bias (obtained from prior knowledge of the problem), and sets the performance metrics warm-up period to 500 observations.

Mdl = incrementalClassificationLinear(Learner="logistic", ...
    Beta=beta,Bias=bias,MetricsWarmupPeriod=500);
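
As a runnable variant of the pseudocode above, the following sketch fills in beta and bias with made-up values for a hypothetical three-predictor problem. The values are illustrative assumptions, not estimates from any data set. The try block illustrates the read-only behavior of the properties described above.

```matlab
% Hypothetical prior knowledge for a three-predictor problem (illustrative values).
beta = [0.5; -1.2; 0.3];
bias = 0.1;

% Create the incremental logistic regression model with the initial values.
Mdl = incrementalClassificationLinear(Learner="logistic", ...
    Beta=beta,Bias=bias,MetricsWarmupPeriod=500);

% The name-value arguments populate the properties of the same name.
disp(Mdl.Learner)
disp(Mdl.MetricsWarmupPeriod)
disp(Mdl.IsWarm)   % false, because the warm-up period has not elapsed yet

% Properties are read-only, so assignment by dot notation is expected to fail.
try
    Mdl.MetricsWarmupPeriod = 100;
catch err
    disp(err.message)
end
```

To change an option after creation, create a new model object with the desired name-value arguments.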

The following tables briefly describe notable options for the major aspects of incremental learning. For more details on all options, see the Properties section of each incremental model object page.

Model Options and Data Properties

This table contains notable model options and data characteristics.

Model Type	Model Options and Data Properties	Description
Classification	`ClassNames`	For classification, the expected class names in the observation labels
ECOC classification	`BinaryLearners`*	Binary learners
	`CodingMatrix`*	Class assignment codes
	`CodingName`*	Coding design name
Kernel classification or regression	`KernelScale`	Kernel scale parameter that the software uses for random feature expansion
	`Learner`	Model type, such as linear SVM, logistic regression, or least-squares regression
	`NumExpansionDimensions`	Number of dimensions of expanded space
Linear classification or regression	`Beta`	Linear coefficients that also serve as initial values for incremental fitting
	`Bias`	Model intercept that also serves as an initial value for incremental fitting
	`Learner`	Model type, such as linear SVM, logistic regression, or least-squares regression
Naive Bayes classification	`Cost`	Misclassification cost matrix

*You can specify the BinaryLearners property by using the Learners name-value argument, and specify the CodingMatrix and CodingName properties by using the Coding name-value argument. Set the other properties by using name-value argument syntax with the arguments of the same name when you call the object. For example, incrementalClassificationKernel(Learner="logistic") sets the Learner property to "logistic".

Training and Solver Options and Properties

This table contains notable training and solver options and properties.

Model Type	Training and Solver Options and Properties	Description
Kernel classification or regression	`EstimationPeriod`	Pretraining estimation period
	`Solver`	Objective function optimization algorithm
	`Standardize`	Flag to standardize predictor data
	`Mu`**	Predictor variable means
	`Sigma`**	Predictor variable standard deviations
Linear classification or regression	`EstimationPeriod`	Pretraining estimation period
	`Solver`	Objective function optimization algorithm
	`Standardize`	Flag to standardize predictor data
	`Lambda`	Ridge penalty, a model hyperparameter that requires tuning for SGD optimization
	`BatchSize`	Mini-batch size, an SGD hyperparameter
	`LearnRate`	Learning rate, an SGD hyperparameter
	`Mu`**	Predictor variable means
	`Sigma`**	Predictor variable standard deviations
Naive Bayes classification	`DistributionParameters`**	Learned distribution parameters. For each predictor with a conditionally normal distribution given a class, the fitted, weighted mean and standard deviation. For conditionally joint multinomial predictors given a class, relative frequencies of the levels the predictors represent. For each conditionally multivariate multinomial predictor given a class, a vector of relative frequencies of the levels of the predictor.

**You cannot specify the Mu, Sigma, and DistributionParameters properties, whereas you can set the other properties by using name-value argument syntax when you call the object.

Mu and Sigma (linear and kernel models): When you set Standardize=true and specify a positive estimation period, and the properties are empty, incremental fitting functions estimate the means and standard deviations using the estimation period observations. For more details, see Standardize Data.
DistributionParameters (naive Bayes classification models): The property must be fitted to data by fit or updateMetricsAndFit.
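
The following sketch illustrates how Mu and Sigma are populated for a linear regression model. The data stream is synthetic, and the chunk size of 100 and estimation period of 500 are arbitrary choices made for illustration.

```matlab
rng(2)                                   % reproducible synthetic stream
n = 1500;
X = 10 + 5*randn(n,3);                   % predictors on a shifted, scaled range
y = X*[1; 2; 3] + randn(n,1);

% Request standardization and reserve the first 500 observations for estimation.
Mdl = incrementalRegressionLinear(Standardize=true,EstimationPeriod=500);
disp(isempty(Mdl.Mu))                    % moments are unknown before fitting

chunkSize = 100;
for k = 1:n/chunkSize
    idx = (k-1)*chunkSize + (1:chunkSize);
    Mdl = fit(Mdl,X(idx,:),y(idx));      % the first 5 chunks fall in the estimation period
end

disp(Mdl.Mu)                             % estimated predictor means
disp(Mdl.Sigma)                          % estimated predictor standard deviations
```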

For linear classification and regression models:

The estimation period, specified by the number of observations in EstimationPeriod, occurs before training begins (see Incremental Learning Periods). During the estimation period, the incremental fitting function fit or updateMetricsAndFit computes quantities required for training when they are unknown. For example, if you set Standardize=true, incremental learning functions require predictor means and standard deviations to standardize the predictor data. Consequently, the incremental model requires a positive estimation period (the default is 1000).
The default solver is the adaptive scale-invariant solver "scale-invariant" [2], which is hyperparameter free and insensitive to the predictor variable scales; therefore, predictor data standardization is not required. You can specify standard or average SGD instead, "sgd" or "asgd". However, SGD is sensitive to predictor variable scales and requires hyperparameter tuning, which can be difficult or impossible to do during incremental learning. If you plan to use an SGD solver, complete these steps:
1. Obtain labeled data.
2. Traditionally train a linear classification or regression model by calling fitclinear or fitrlinear, respectively. Specify the SGD solver you plan to use for incremental learning, cross-validate to determine an appropriate set of hyperparameters, and standardize the predictor data.
3. Train the model on the entire sample using the specified hyperparameter set.
4. Convert the resulting model to an incremental learner by using incrementalLearner.
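
The four steps above can be sketched as follows for a binary classification problem. The data, the candidate penalty grid lambdas, and the choice of five folds are assumptions made for illustration. This sketch standardizes the predictors manually with zscore before training.

```matlab
rng(1)                                        % reproducible synthetic data
n = 2000;
X = randn(n,5).*[1 10 100 1 1];               % Step 1: labeled data (synthetic, mixed scales)
Y = X(:,1) - 0.2*X(:,2) + 0.3*randn(n,1) > 0; % binary labels

% Step 2: standardize, then cross-validate the ridge penalty for the SGD solver.
[Xs,mu,sigma] = zscore(X);
lambdas = logspace(-5,-1,5);                  % hypothetical candidate grid
cvLoss = zeros(size(lambdas));
for j = 1:numel(lambdas)
    CVMdl = fitclinear(Xs,Y,Solver="sgd",Regularization="ridge", ...
        Lambda=lambdas(j),KFold=5);
    cvLoss(j) = kfoldLoss(CVMdl);             % 5-fold classification error
end
[~,best] = min(cvLoss);

% Step 3: train on the entire sample with the selected penalty.
Mdl = fitclinear(Xs,Y,Solver="sgd",Regularization="ridge", ...
    Lambda=lambdas(best));

% Step 4: convert to an incremental learner; the SGD settings are transferred.
IncrementalMdl = incrementalLearner(Mdl);
disp(IncrementalMdl.Solver)
disp(IncrementalMdl.Lambda)
```

Because the batch model is trained on manually standardized data, this sketch assumes that later chunks are transformed with the same mu and sigma, for example (Xnew - mu)./sigma, before you pass them to the incremental learning functions.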

Performance Evaluation Options and Properties

Performance evaluation properties and options enable you to configure how and when model performance is measured by the incremental learning function updateMetrics or updateMetricsAndFit. Regardless of the options you choose, first familiarize yourself with the incremental learning periods.

This table contains all performance evaluation options and properties.

Performance Evaluation Options and Properties	Description
`Metrics`	Specify the list of performance metrics or loss functions to measure incrementally by using the `Metrics` name-value argument. The `Metrics` property stores a table of tracked cumulative and window metrics.
`MetricsWarmupPeriod`	Number of observations to which the incremental model must be fit before it tracks performance metrics
`MetricsWindowSize`	Number of observations to use to compute window performance metrics
`IsWarm`***	Flag indicating whether the model is warm (measures performance metrics)

***You cannot specify the IsWarm property, whereas you can set the other properties by using name-value argument syntax when you call the object.

The metrics specified by the Metrics name-value argument form a table stored in the Metrics property of the model. For example, if you specify Metrics=["Metric1","Metric2"] when you create an incremental model Mdl, the Metrics property is

>> Mdl.Metrics

ans =

  2×2 table
                Cumulative    Window
                __________    ______

    Metric1        NaN         NaN
    Metric2        NaN         NaN

Specify a positive metrics warm-up period when you believe the model is of low quality and needs to be trained before the function updateMetrics or updateMetricsAndFit tracks performance metrics in the Metrics property. In this case, the IsWarm property is false, and you must pass the incoming data and model to the incremental fitting function fit or updateMetricsAndFit.

When the incremental fitting function processes enough data to satisfy the estimation period (for linear and kernel models) and the metrics warm-up period, the IsWarm property becomes true, and you can measure the model performance on incoming data and optionally train the model. For naive Bayes and ECOC classification models, incremental fitting functions must additionally fit the model to all expected classes to become warm.

When the model is warm, updateMetrics or updateMetricsAndFit tracks all specified metrics cumulatively (from the start of the evaluation) and within a window of observations specified by the MetricsWindowSize property. Cumulative metrics reflect the model performance over the entire incremental learning history; after Performance Evaluation Period 1 starts, cumulative metrics are independent of the evaluation period. Window metrics reflect the model performance only over the specified window size for each performance evaluation period.
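
The following sketch shows the warm-up behavior and the two kinds of metrics on a synthetic regression stream. The data, the chunk size of 100, and the variable warm that records the IsWarm flag after each chunk are assumptions made for illustration.

```matlab
rng(0)                                   % reproducible synthetic stream
n = 3000;
X = randn(n,4);
y = X*[2; -1; 0.5; 0] + 0.1*randn(n,1);

% Track mean squared error after a 500-observation warm-up period,
% with window metrics computed over 200 observations.
Mdl = incrementalRegressionLinear(Metrics="mse", ...
    MetricsWarmupPeriod=500,MetricsWindowSize=200);

chunkSize = 100;
numChunks = n/chunkSize;
warm = false(numChunks,1);
for k = 1:numChunks
    idx = (k-1)*chunkSize + (1:chunkSize);
    % Measure performance on the chunk (when warm), then fit the model to it.
    Mdl = updateMetricsAndFit(Mdl,X(idx,:),y(idx));
    warm(k) = Mdl.IsWarm;
end

disp(find(warm,1))                       % first chunk after which the model is warm
disp(Mdl.Metrics)                        % cumulative and window metrics
```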

Convert Traditionally Trained Model

incrementalLearner enables you to initialize an incremental model using information learned from a traditionally trained model. The converted model can generate predictions and it is warm, which means that incremental learning functions can measure model performance metrics from the start of the data stream. In other words, estimation and performance metrics warm-up periods are not required for incremental learning.

To convert a traditionally trained model to an incremental learner, pass the model and any options specified by name-value arguments to incrementalLearner. For example, the following pseudocode initializes an incremental classification model by using all information that a linear SVM model for binary classification has learned from a batch of data.

Mdl = fitcsvm(X,Y);
IncrementalMdl = incrementalLearner(Mdl,Name=Value);

IncrementalMdl is an incrementalClassificationLinear model object for binary classification.
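
As a runnable variant of this conversion, the following sketch trains a linear SVM on a small synthetic batch and converts it. The data and the MetricsWindowSize value are illustrative assumptions.

```matlab
rng(3)                                        % reproducible synthetic batch
n = 500;
X = randn(n,2);
Y = X(:,1) + X(:,2) + 0.5*randn(n,1) > 0;     % binary labels

Mdl = fitcsvm(X,Y);                           % linear kernel by default for two classes
IncrementalMdl = incrementalLearner(Mdl,MetricsWindowSize=500);

disp(class(IncrementalMdl))                   % class of the converted object
disp(IncrementalMdl.IsWarm)                   % warm-state flag of the converted model

% The converted model can generate predictions before any incremental fitting.
Xnew = randn(5,2);
labels = predict(IncrementalMdl,Xnew);
disp(labels')
```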

Ease of incremental model creation and initialization is offset by decreased flexibility. The software assumes that fitted parameters, hyperparameter values, and data characteristics learned during traditional training are appropriate for incremental learning. Therefore, you cannot set corresponding learned or tuned options when you call incrementalLearner.

This table lists notable read-only properties of IncrementalMdl that the incrementalLearner function transfers from Mdl or infers from other values. For more details, see the output argument description of each incrementalLearner function page.

Model Type	Property	Description
All	`NumPredictors`	Number of predictor variables. For models that dummy-code categorical predictor variables, `NumPredictors` is `numel(Mdl.ExpandedPredictorNames)`, and predictor variables expected during incremental learning correspond to the names. For more details, see Dummy Variables.
Classification	`ClassNames`	All class labels expected during incremental learning
	`Prior`	Prior class distribution
	`ScoreTransform`	A function to apply to classification scores. For example, if you configure an SVM model to compute posterior class probabilities, `ScoreTransform` (containing the score-to-posterior-probability function learned from the data) is transferred.
Regression	`Epsilon`	For an SVM learner, half the width of the epsilon-insensitive band
Regression	`ResponseTransform`	A function to apply to predicted responses
ECOC classification	`BinaryLearners`	Trained binary learners, a cell array of model objects
	`CodingMatrix`	Class assignment codes for the binary learners
	`CodingName`	Coding design name
Kernel classification or regression	`KernelScale`	Kernel scale parameter
	`Learner`	Linear model type
	`Mu`	Predictor variable means
	`NumExpansionDimensions`	Number of dimensions of expanded space, a positive integer
	`Sigma`	Predictor variable standard deviations
Linear classification or regression	`Beta`	Linear model coefficients
	`Bias`	Model intercept
	`Learner`	Linear model type
	`Mu`	For an SVM model object, the predictor variable means
	`Sigma`	For an SVM model object, the predictor variable standard deviations
Naive Bayes classification	`DistributionNames`	Conditional distribution of the predictor variables given the class, having one of the following values: (1) a cell vector of length `NumPredictors` with entries `"normal"`, when the corresponding predictor is normal, or `"mvmn"`, when the corresponding predictor is multivariate multinomial; or (2) `"mn"`, when all predictor variables compose a multinomial distribution. If you convert a naive Bayes classification model containing at least one predictor with a kernel distribution, `incrementalLearner` issues an error.
	`DistributionParameters`	Fitted distribution parameters of each conditional predictor distribution given each class, a `NumPredictors`-by-`K` cell matrix.
	`CategoricalPredictors`	Numeric vector of indices of categorical predictors
	`CategoricalLevels`	Multivariate multinomial predictor levels, a cell vector of length `NumPredictors`

Note

The NumTrainingObservations property of IncrementalMdl does not include the observations used to train Mdl. It only includes the observations used for incremental learning when you call fit or updateMetricsAndFit.
If you specify Standardize=true when you train Mdl, IncrementalMdl is configured to standardize predictors during incremental learning by default.

The following conditions apply when you convert a linear classification or regression model (ClassificationLinear and RegressionLinear, respectively):

Incremental fitting functions support ridge (L2) regularization only.
Incremental fitting functions support the specification of only one regularization value. Therefore, if you specify a regularization path (vector of regularization values) when you call fitclinear or fitrlinear, choose the model associated with one penalty by passing it to selectModels.
If you solve the objective function by using standard or average SGD ("sgd" or "asgd" for the Solver name-value argument), these conditions apply when you call incrementalLearner:
- incrementalLearner transfers the solver used to optimize Mdl to IncrementalMdl.
- You can specify the adaptive scale-invariant solver "scale-invariant" instead, but you cannot specify a different SGD solver.
- If you do not specify the adaptive scale-invariant solver, incrementalLearner transfers model and solver hyperparameter values to the incremental model object, such as the learning rate LearnRate, mini-batch size BatchSize, and ridge penalty Lambda. You cannot modify the transferred properties.

Call Object After Training Model

If you require more flexibility when you create an incremental model, you can call the object directly and initialize the model by individually setting learned information using name-value arguments. The following pseudocode shows two examples:

Initialize an incremental classification model from the coefficients and class names learned by fitting a linear SVM model for binary classification to a batch of data Xc and Yc.
```
Mdl = fitcsvm(Xc,Yc);
IncrementalMdl = incrementalClassificationLinear( ...
    Beta=Mdl.Beta,Bias=Mdl.Bias,ClassNames=Mdl.ClassNames);
```

Initialize an incremental regression model from the coefficients learned by fitting a linear model to a batch of data Xr and Yr.

Mdl = fitlm(Xr,Yr);
bias = Mdl.Coefficients.Estimate(1);
beta = Mdl.Coefficients.Estimate(2:end);
IncrementalMdl = incrementalRegressionLinear( ...
    Learner="leastsquares",Bias=bias,Beta=beta);

References

[1] Bifet, Albert, Ricard Gavaldà, Geoffrey Holmes, and Bernhard Pfahringer. Machine Learning for Data Streams with Practical Examples in MOA. Cambridge, MA: The MIT Press, 2018.

[2] Kempka, Michał, Wojciech Kotłowski, and Manfred K. Warmuth. "Adaptive Scale-Invariant Online Algorithms for Learning Linear Models." Preprint, submitted February 10, 2019. https://arxiv.org/abs/1902.07528.