Predict responses for new observations from naive Bayes classification model for incremental learning
label = predict(Mdl,X,Name,Value) specifies options using one or more name-value arguments. For example, you can specify a custom misclassification cost matrix (in other words, override the value Mdl.Cost) for computing predictions by specifying the Cost argument.
[label,Posterior,Cost] = predict(___) also returns the posterior probabilities (Posterior) and predicted (expected) misclassification costs (Cost) corresponding to the observations (rows) in X using any of the input argument combinations in the previous syntaxes. For each observation in X, the predicted class label corresponds to the minimum expected classification cost among all classes.
Load the human activity data set.
load humanactivity
For details on the data set, enter Description at the command line.
Fit a naive Bayes classification model to the entire data set.
TTMdl = fitcnb(feat,actid)
TTMdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: [1 2 3 4 5]
            ScoreTransform: 'none'
           NumObservations: 24075
         DistributionNames: {1×60 cell}
    DistributionParameters: {5×60 cell}
TTMdl is a ClassificationNaiveBayes model object representing a traditionally trained model.
Convert the traditionally trained model to a naive Bayes classification model for incremental learning.
IncrementalMdl = incrementalLearner(TTMdl)
IncrementalMdl = 
  incrementalClassificationNaiveBayes
                    IsWarm: 1
                   Metrics: [1×2 table]
                ClassNames: [1 2 3 4 5]
            ScoreTransform: 'none'
         DistributionNames: {1×60 cell}
    DistributionParameters: {5×60 cell}
IncrementalMdl is an incrementalClassificationNaiveBayes model object prepared for incremental learning.
The incrementalLearner function initializes the incremental learner by passing it the learned conditional predictor distribution parameters, along with other information that TTMdl learned from the training data.
IncrementalMdl is warm (IsWarm is 1), which means that incremental learning functions can start tracking performance metrics.
An incremental learner created from converting a traditionally trained model can generate predictions without further processing.
Predict class labels for all observations using both models.
ttlabels = predict(TTMdl,feat);
illables = predict(IncrementalMdl,feat);
sameLabels = sum(ttlabels ~= illables) == 0
sameLabels = logical
1
Both models predict the same labels for each observation.
This example shows how to apply misclassification costs for label prediction on incoming chunks of data, while maintaining a balanced misclassification cost for training.
Load the human activity data set. Randomly shuffle the data.
load humanactivity
n = numel(actid);
rng(10) % For reproducibility
idx = randsample(n,n);
X = feat(idx,:);
Y = actid(idx);
Create a naive Bayes classification model for incremental learning; specify the class names. Prepare it for predict by fitting the model to the first 10 observations.
Mdl = incrementalClassificationNaiveBayes(ClassNames=unique(Y));
initobs = 10;
Mdl = fit(Mdl,X(1:initobs,:),Y(1:initobs));
canPredict = size(Mdl.DistributionParameters,1) == numel(Mdl.ClassNames)
canPredict = logical
1
Consider severely penalizing the model for misclassifying "running" (class 4). Create a cost matrix that applies 100 times the penalty for misclassifying running as compared to misclassifying any other class. Rows correspond to the true class, and columns correspond to the predicted class.
k = numel(Mdl.ClassNames);
Cost = ones(k) - eye(k);
Cost(4,:) = Cost(4,:)*100; % Penalty for misclassifying "running"
Cost
Cost = 5×5
0 1 1 1 1
1 0 1 1 1
1 1 0 1 1
100 100 100 0 100
1 1 1 1 0
Simulate a data stream, and perform the following actions on each incoming chunk of 100 observations.
Call predict to predict labels for each observation in the incoming chunk of data.
Call predict again, but specify the misclassification costs by using the Cost argument.
Call fit to fit the model to the incoming chunk. Overwrite the previous incremental model with a new one fitted to the incoming observations.
numObsPerChunk = 100;
nchunk = ceil((n - initobs)/numObsPerChunk);
labels = zeros(n,1);
cslabels = zeros(n,1);
cst = zeros(n,5);
cscst = zeros(n,5);

% Incremental learning
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1 + initobs);
    iend = min(n,numObsPerChunk*j + initobs);
    idx = ibegin:iend;
    % Default-cost predictions and expected costs
    [labels(idx),~,cst(idx,:)] = predict(Mdl,X(idx,:));
    % Cost-sensitive predictions and expected costs
    [cslabels(idx),~,cscst(idx,:)] = predict(Mdl,X(idx,:),Cost=Cost);
    % Update the model with the current chunk
    Mdl = fit(Mdl,X(idx,:),Y(idx));
end

% Drop the observations used to initialize the model
labels = labels((initobs + 1):end);
cslabels = cslabels((initobs + 1):end);
Compare the predicted class distributions between the prediction methods by plotting histograms.
figure;
histogram(labels);
hold on
histogram(cslabels);
legend(["Default-cost prediction" "Cost-sensitive prediction"])
Because the cost-sensitive prediction method penalizes misclassifying class 4 so severely, it assigns more observations to class 4 than the prediction method that uses the default, balanced cost.
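The following sketch shows why the cost-sensitive predictions shift toward class 4. It applies the minimum-expected-cost rule by hand to a few observations and runs in base MATLAB. The posterior values are hypothetical, not taken from the human activity data. The sketch only mirrors the rule described on this page and does not call predict or reproduce its internals.

```matlab
% Hypothetical posterior probabilities: 3 observations (rows), 5 classes (columns)
Posterior = [0.50 0.20 0.10 0.15 0.05;
             0.10 0.70 0.10 0.00 0.10;
             0.35 0.30 0.05 0.25 0.05];
k = size(Posterior,2);

% Default cost: 0 for a correct prediction, 1 for any error
defaultCost = ones(k) - eye(k);

% Cost matrix as in the example: errors on true class 4 cost 100
Cost = defaultCost;
Cost(4,:) = Cost(4,:)*100;

% Expected cost of predicting class k: sum over j of Posterior(:,j)*Cost(j,k)
defaultExpected = Posterior*defaultCost;
csExpected = Posterior*Cost;

% Predict the class with the minimum expected cost in each row
[~,defaultLabels] = min(defaultExpected,[],2);
[~,csLabels] = min(csExpected,[],2);
```

Here defaultLabels is [1;2;1] and csLabels is [4;2;4]. Observation 2 puts no probability on class 4, so its label does not change. The other two observations move to class 4 even though it is not their most probable class, because any chance that the true class is 4 makes every other prediction expensive.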
Load the human activity data set. Randomly shuffle the data.
load humanactivity
n = numel(actid);
rng(10) % For reproducibility
idx = randsample(n,n);
X = feat(idx,:);
Y = actid(idx);
For details on the data set, enter Description at the command line.
Create a naive Bayes classification model for incremental learning; specify the class names. Prepare it for predict by fitting the model to the first 10 observations.
Mdl = incrementalClassificationNaiveBayes('ClassNames',unique(Y));
initobs = 10;
Mdl = fit(Mdl,X(1:initobs,:),Y(1:initobs));
canPredict = size(Mdl.DistributionParameters,1) == numel(Mdl.ClassNames)
canPredict = logical
1
Mdl is an incrementalClassificationNaiveBayes model. All its properties are read-only. The model is configured to generate predictions.
Simulate a data stream, and perform the following actions on each incoming chunk of 100 observations.
Call predict to compute class posterior probabilities for each observation in the incoming chunk of data.
Consider incrementally measuring how well the model predicts whether a subject is dancing (Y is 5). You can accomplish this by computing the AUC of an ROC curve. To build the curve, pass perfcurve a score for each observation in the chunk: the posterior probability of class 5 minus the maximum posterior probability among the other classes.
Call fit to fit the model to the incoming chunk. Overwrite the previous incremental model with a new one fitted to the incoming observations.
numObsPerChunk = 100;
nchunk = floor((n - initobs)/numObsPerChunk) - 1;
Posterior = zeros(nchunk,numel(Mdl.ClassNames));
auc = zeros(nchunk,1);
classauc = 5;

% Incremental learning
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1 + initobs);
    iend = min(n,numObsPerChunk*j + initobs);
    idx = ibegin:iend;
    % Posterior probabilities for the current chunk
    [~,Posterior(idx,:)] = predict(Mdl,X(idx,:));
    % Score: posterior of class 5 minus the largest posterior of the other classes
    diffscore = Posterior(idx,classauc) - max(Posterior(idx,setdiff(Mdl.ClassNames,classauc)),[],2);
    [~,~,~,auc(j)] = perfcurve(Y(idx),diffscore,Mdl.ClassNames(classauc));
    % Update the model with the current chunk
    Mdl = fit(Mdl,X(idx,:),Y(idx));
end
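The score passed to perfcurve in the loop above can be examined on its own. This sketch uses a small hypothetical posterior matrix and hypothetical labels, computes the same difference score, and then substitutes a rank-based (Mann-Whitney) AUC for perfcurve. Because perfcurve belongs to Statistics and Machine Learning Toolbox, the substitution lets the sketch run in base MATLAB. The rank-based AUC counts ties as one half and should agree with the area under the ROC curve, but treat it as an illustration rather than a drop-in replacement.

One detail of the loop is worth noting. setdiff(Mdl.ClassNames,classauc) works as a list of column indices only because the class names happen to be 1 through 5. The sketch therefore indexes the columns explicitly.

```matlab
% Hypothetical posterior probabilities: 4 observations, 5 classes
Posterior = [0.10 0.10 0.10 0.10 0.60;
             0.05 0.05 0.05 0.05 0.80;
             0.50 0.20 0.10 0.10 0.10;
             0.20 0.10 0.10 0.10 0.50];
Y = [5; 1; 1; 5];   % hypothetical true labels
classauc = 5;       % class of interest (dancing)

% Difference score: posterior of class 5 minus the largest posterior
% among the remaining classes (columns indexed explicitly)
others = setdiff(1:size(Posterior,2),classauc);
diffscore = Posterior(:,classauc) - max(Posterior(:,others),[],2);

% Rank-based (Mann-Whitney) AUC: fraction of (positive, negative) pairs in
% which the positive observation scores higher; ties count as one half
isPos = (Y == classauc);
pos = diffscore(isPos);        % column vector
neg = diffscore(~isPos).';     % row vector, so the comparison expands to a matrix
pairs = (pos > neg) + 0.5*(pos == neg);
auc = mean(pairs(:));
```

With these values, diffscore is approximately [0.5; 0.75; -0.4; 0.3] and auc is 0.5. The second observation is not dancing but receives the highest score, so half of the positive/negative pairs are ranked incorrectly.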
Mdl is an incrementalClassificationNaiveBayes model object trained on all the data in the stream.
Plot the AUC on the incoming chunks of data.
plot(auc)
ylabel('AUC')
xlabel('Iteration')
The AUC values suggest that the classifier distinguishes dancing subjects well during incremental learning.
Mdl: Naive Bayes classification model for incremental learning
incrementalClassificationNaiveBayes model object
Naive Bayes classification model for incremental learning, specified as an incrementalClassificationNaiveBayes model object. You can create Mdl directly or by converting a supported, traditionally trained machine learning model using the incrementalLearner function. For more details, see the corresponding reference page.
You must configure Mdl to predict labels for a batch of observations.
If Mdl is a converted, traditionally trained model, you can predict labels without any modifications.
Otherwise, Mdl.DistributionParameters must be a cell matrix with Mdl.NumPredictors > 0 columns and at least one row, where each row corresponds to a class name in Mdl.ClassNames.
X: Batch of predictor data
Batch of predictor data for which to predict labels, specified as an n-by-Mdl.NumPredictors floating-point matrix.
The length of the observation labels Y and the number of observations in X must be equal; Y(j) is the label of observation j (row or column) in X.
Data Types: single | double
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: Cost=[0 2;1 0] attributes double the penalty for misclassifying observations with true class Mdl.ClassNames(1) than for misclassifying observations with true class Mdl.ClassNames(2).
Cost: Cost of misclassifying an observation
Mdl.Cost (default) | square matrix | structure array
Cost of misclassifying an observation, specified as a value in the table, where c is the number of classes in Mdl.ClassNames. The specified value overrides the value of Mdl.Cost.
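Value | Description
--- | ---
c-by-c numeric matrix | Cost(i,j) is the cost of classifying an observation into class Mdl.ClassNames(j) when its true class is Mdl.ClassNames(i). Rows correspond to the true class, and columns correspond to the predicted class.
Structure array | A structure array having two fields: ClassNames, containing the class names (the same value as Mdl.ClassNames), and ClassificationCosts, containing the cost matrix in the c-by-c form described above.
Example: Cost=struct('ClassNames',Mdl.ClassNames,'ClassificationCosts',[0 2; 1 0])
Data Types: single | double | struct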
Prior: Prior class probabilities
Mdl.Prior (default) | numeric vector
Prior class probabilities, specified as a numeric vector. Prior has the same length as the number of classes in Mdl.ClassNames, and the order of the elements corresponds to the class order in Mdl.ClassNames. predict normalizes the vector so that the sum of the result is 1.
The specified value overrides the value of Mdl.Prior.
Data Types: single | double
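Because predict normalizes Prior, only the relative sizes of its elements matter. This minimal base-MATLAB sketch uses hypothetical prior weights for five classes. It shows that two vectors with the same proportions normalize to the same prior probabilities.

```matlab
% Hypothetical prior weights for five classes, in the order of Mdl.ClassNames
priorA = [1 1 1 1 4];
priorB = [2 2 2 2 8];   % same relative weights, different scale

% Normalize so that each vector sums to 1
normA = priorA/sum(priorA);
normB = priorB/sum(priorB);
```

Both normalized vectors equal [0.125 0.125 0.125 0.125 0.5], so either vector specifies the same prior.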
ScoreTransform: Score transformation function
Mdl.ScoreTransform (default) | string scalar | character vector
Score transformation function describing how incremental learning functions transform raw response values, specified as a character vector, string scalar, or function handle. The specified value overrides the value of Mdl.ScoreTransform.
This table describes the available built-in functions for score transformation.
Value | Description
--- | ---
"doublelogit" | 1/(1 + e^(-2x))
"invlogit" | log(x / (1 - x))
"ismax" | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
"logit" | 1/(1 + e^(-x))
"none" or "identity" | x (no transformation)
"sign" | -1 for x < 0; 0 for x = 0; 1 for x > 0
"symmetric" | 2x - 1
"symmetricismax" | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to -1
"symmetriclogit" | 2/(1 + e^(-x)) - 1
Data Types: char | string
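You can write the built-in transformations in the table as MATLAB anonymous functions, which makes it easy to check what each name does to a raw score. This sketch reproduces the formulas as the table lists them; note that, as listed, "logit" is the logistic sigmoid and "invlogit" is its inverse. The handle names are chosen here for illustration only. The sketch follows the table's formulas and does not claim to match the toolbox implementation in every detail, for example tie handling in "ismax".

```matlab
% Anonymous functions that follow the formulas in the score-transformation table
doublelogit    = @(x) 1./(1 + exp(-2*x));
invlogit       = @(x) log(x./(1 - x));
logit          = @(x) 1./(1 + exp(-x));
symmetric      = @(x) 2*x - 1;
symmetriclogit = @(x) 2./(1 + exp(-x)) - 1;

% "ismax": 1 for the largest score in each row, 0 elsewhere
% (in this sketch, tied maxima would all be set to 1)
ismax = @(S) double(S == max(S,[],2));
% "symmetricismax": 1 for the largest score in each row, -1 elsewhere
symmetricismax = @(S) 2*ismax(S) - 1;

% Raw scores for two observations over three classes
S = [0.2 0.5 0.3;
     0.6 0.1 0.3];
ismaxScores = ismax(S);            % [0 1 0; 1 0 0]
symScores   = symmetricismax(S);   % [-1 1 -1; 1 -1 -1]
```

Because ScoreTransform also accepts a function handle, you could presumably pass a custom handle of this kind in place of a built-in name. The exact input and output requirements for such a handle are not stated on this page.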
label: Predicted responses
Predicted responses (or labels), returned as a categorical or character array; floating-point, logical, or string vector; or cell array of character vectors with n rows. n is the number of observations in X, and label(j) is the predicted response for observation j.
label has the same data type as the class names stored in Mdl.ClassNames. (The software treats string arrays as cell arrays of character vectors.)
Posterior: Class posterior probabilities
Class posterior probabilities, returned as an n-by-numel(Mdl.ClassNames) floating-point matrix. Posterior(j,k) is the posterior probability that observation j is in class k. Mdl.ClassNames specifies the order of the classes.
Cost: Expected misclassification costs
Expected misclassification costs, returned as an n-by-numel(Mdl.ClassNames) floating-point matrix.
Cost(j,k) is the expected misclassification cost of the observation in row j of X predicted into class k (Mdl.ClassNames(k)).
A misclassification cost is the relative severity of a classifier labeling an observation into the wrong class.
There are two types of misclassification costs: true and expected. Let K be the number of classes.
True misclassification cost: A K-by-K matrix, where element (i,j) indicates the misclassification cost of predicting an observation into class j if its true class is i. The software stores the misclassification cost in the property Mdl.Cost, and uses it in computations. By default, Mdl.Cost(i,j) = 1 if i ≠ j, and Mdl.Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for any incorrect classification.
Expected misclassification cost: A K-dimensional vector, where element k is the weighted average misclassification cost of classifying an observation into class k, weighted by the class posterior probabilities.
$$c_k=\sum_{j=1}^{K}\hat{P}\left(Y=j\mid x_1,\ldots,x_P\right)\mathrm{Cost}_{jk}.$$
In other words, the software assigns each observation to the class with the lowest expected misclassification cost.
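With the default cost matrix (Cost_jk = 1 for j ≠ k and Cost_kk = 0), the expected-cost rule reduces to choosing the class with the largest posterior probability. This follows because the posterior probabilities of an observation sum to 1:

$$c_k=\sum_{j\neq k}\hat{P}\left(Y=j\mid x_1,\ldots,x_P\right)=1-\hat{P}\left(Y=k\mid x_1,\ldots,x_P\right),\qquad \arg\min_k c_k=\arg\max_k \hat{P}\left(Y=k\mid x_1,\ldots,x_P\right).$$

A nondefault cost matrix, such as the one with the 100-fold penalty for class 4 in the cost-sensitive example, breaks this equivalence.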
The posterior probability is the probability that an observation belongs in a particular class, given the data.
For naive Bayes, the posterior probability that a classification is k for a given observation (x_{1},...,x_{P}) is
$$\hat{P}\left(Y=k\mid x_1,\ldots,x_P\right)=\frac{P\left(X_1,\ldots,X_P\mid Y=k\right)\pi\left(Y=k\right)}{P\left(X_1,\ldots,X_P\right)},$$
where:
$$P\left(X_1,\ldots,X_P\mid Y=k\right)$$ is the conditional joint density of the predictors given they are in class k. Mdl.DistributionNames stores the distribution names of the predictors.
π(Y = k) is the class prior probability distribution. Mdl.Prior stores the prior distribution.
$$P\left(X_1,\ldots,X_P\right)$$ is the joint density of the predictors. The classes are discrete, so $$P\left(X_1,\ldots,X_P\right)=\sum_{k=1}^{K}P\left(X_1,\ldots,X_P\mid Y=k\right)\pi\left(Y=k\right).$$
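You can evaluate the posterior formula above directly for a toy model. This sketch assumes that every predictor follows a normal distribution within each class. It uses made-up means, standard deviations, and priors for K = 2 classes and P = 2 predictors, and it computes the normal density explicitly so that no toolbox is needed. It illustrates the formula only, not how predict is implemented.

```matlab
% mu(k,p): mean of predictor p in class k (hypothetical values)
mu = [0 0; 2 1];
% sigma(k,p): standard deviation of predictor p in class k (hypothetical values)
sigma = [1 1; 1 2];
% pi(Y = k), in class order (hypothetical values)
prior = [0.5 0.5];
% One observation (x_1, x_2)
x = [1 0.5];

% Normal density of each predictor value under each class (K-by-P);
% x is expanded across the rows of mu and sigma
dens = exp(-(x - mu).^2 ./ (2*sigma.^2)) ./ (sqrt(2*pi)*sigma);

% Naive (conditional independence) assumption:
% P(X_1,...,X_P | Y = k) is the product of the per-predictor densities
likelihood = prod(dens,2);          % K-by-1

% Numerator of Bayes' rule: P(X_1,...,X_P | Y = k) * pi(Y = k)
joint = likelihood .* prior(:);

% Denominator P(X_1,...,X_P) is the sum of the numerators over the classes
posterior = joint / sum(joint);     % K-by-1, sums to 1
```

For this observation, posterior is approximately [0.6455; 0.3445]. The first predictor has the same density under both classes, so the second predictor alone decides the split.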