kfoldPredict

Classify observations in cross-validated kernel classification model

Syntax

label = kfoldPredict(CVMdl)

[label,score] = kfoldPredict(CVMdl)

Description

label = kfoldPredict(CVMdl) returns class labels predicted by the cross-validated, binary kernel model (ClassificationPartitionedKernel) CVMdl. For every fold, kfoldPredict predicts class labels for validation-fold observations using a model trained on training-fold observations.

[label,score] = kfoldPredict(CVMdl) also returns classification scores for both classes.

Examples

Classify Observations Using Cross-Validation

Classify observations using a cross-validated, binary kernel classifier, and display the confusion matrix for the resulting classification.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled either bad ('b') or good ('g').

load ionosphere

Cross-validate a binary kernel classification model using the data.

rng(1); % For reproducibility 
CVMdl = fitckernel(X,Y,'Crossval','on')

CVMdl = 
  ClassificationPartitionedKernel
    CrossValidatedModel: 'Kernel'
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'

CVMdl is a ClassificationPartitionedKernel model. By default, the software implements 10-fold cross-validation. To specify a different number of folds, use the 'KFold' name-value pair argument instead of 'Crossval'.

Classify the observations that fitckernel does not use in training the folds.

label = kfoldPredict(CVMdl);

Construct a confusion matrix to compare the true classes of the observations to their predicted labels.

C = confusionchart(Y,label);

[Figure: confusion matrix chart of true classes versus predicted labels]

The CVMdl model misclassifies 32 good ('g') radar returns as being bad ('b') and misclassifies 7 bad radar returns as being good.
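
As a rough illustration of how such counts are tallied, the following sketch builds a 2-by-2 confusion matrix by hand and derives the misclassification rate from it. The labels `Ytrue` and `Ypred` are hypothetical and are not the ionosphere results. With Statistics and Machine Learning Toolbox, `confusionmat(Y,label)` returns the same kind of matrix directly.

```matlab
% Hypothetical true and predicted labels (NOT the ionosphere results)
Ytrue = {'b';'b';'g';'g';'g';'g'};
Ypred = {'b';'g';'g';'g';'b';'g'};
classes = {'b','g'};                 % row/column order of the matrix

C = zeros(numel(classes));           % rows: true class, columns: predicted class
for i = 1:numel(Ytrue)
    r = find(strcmp(classes,Ytrue{i}));
    c = find(strcmp(classes,Ypred{i}));
    C(r,c) = C(r,c) + 1;
end

% Off-diagonal entries are misclassifications
errRate = 1 - trace(C)/sum(C(:));
```

Here `C(2,1)` counts good returns predicted as bad, and `C(1,2)` counts bad returns predicted as good, which matches how the chart above is read.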

Estimate k-Fold Cross-Validation Posterior Class Probabilities

Estimate posterior class probabilities using a cross-validated, binary kernel classifier, and determine the quality of the model by plotting a receiver operating characteristic (ROC) curve. Cross-validated kernel classification models return posterior probabilities for logistic regression learners only.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, which are labeled either bad ('b') or good ('g').

load ionosphere

Cross-validate a binary kernel classification model using the data. Specify the class order, and fit logistic regression learners.

rng(1); % For reproducibility 
CVMdl = fitckernel(X,Y,'Crossval','on', ...
    'ClassNames',{'b','g'},'Learner','logistic')

CVMdl = 
  ClassificationPartitionedKernel
    CrossValidatedModel: 'Kernel'
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'

Predict the posterior class probabilities for the observations that fitckernel does not use in training the folds.

[~,posterior] = kfoldPredict(CVMdl);

The output posterior is an n-by-2 matrix, where n is the number of observations. Column i contains, for each observation, the posterior probability of the class CVMdl.ClassNames(i).

Compute the performance metrics (true positive rates and false positive rates) for a ROC curve and find the area under the ROC curve (AUC) value by creating a rocmetrics object.

rocObj = rocmetrics(Y,posterior,CVMdl.ClassNames);

Plot the ROC curve for the second class by using the plot function of rocmetrics.

plot(rocObj,ClassNames=CVMdl.ClassNames(2))

[Figure: ROC curve for class g, with False Positive Rate on the x-axis and True Positive Rate on the y-axis. AUC = 0.9441; the model operating point is marked.]

The AUC is close to 1, which indicates that the model predicts labels well.
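
One way to interpret the AUC is as the fraction of (positive, negative) observation pairs in which the positive observation receives the higher score. The sketch below computes that fraction for a handful of hypothetical scores. The values are made up for illustration and do not come from the ionosphere model, and `rocmetrics` may use a different numerical procedure.

```matlab
% Hypothetical positive-class scores and true class membership
scores = [0.9; 0.8; 0.35; 0.6; 0.2; 0.1];
isPos  = logical([1; 1; 1; 0; 0; 0]);

pos = scores(isPos);
neg = scores(~isPos);

% Pairwise differences between every positive and every negative score
diffs = pos - neg';                  % numel(pos)-by-numel(neg) matrix

% Correctly ordered pairs count 1, ties count 1/2
auc = (sum(diffs(:) > 0) + 0.5*sum(diffs(:) == 0)) / numel(diffs);
```

In this toy case, 8 of the 9 pairs are ordered correctly, so `auc` is 8/9.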

Input Arguments

`CVMdl` — Cross-validated, binary kernel classification model
`ClassificationPartitionedKernel` model object

Cross-validated, binary kernel classification model, specified as a ClassificationPartitionedKernel model object. You can create a ClassificationPartitionedKernel model by using fitckernel and specifying any one of the cross-validation name-value pair arguments.

To obtain estimates, kfoldPredict applies the same data used to cross-validate the kernel classification model (X and Y).
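
The out-of-fold idea can be sketched without any toolbox. In the sketch below, every observation is predicted exactly once, by a model that never saw it during training. All names and data are hypothetical, and a simple nearest-centroid rule stands in for the kernel classifier, so this illustrates the fold logic only, not the algorithm that kfoldPredict runs.

```matlab
% Hypothetical two-class data: points on circles around (-2,-2) and (2,2)
n = 40;
t = (1:n/2)';
X = [cos(t)-2, sin(t)-2; cos(t)+2, sin(t)+2];
Y = [zeros(n/2,1); ones(n/2,1)];     % numeric class labels 0 and 1

K = 5;
fold = mod((0:n-1)', K) + 1;         % hypothetical fold assignment, values 1..K

label = nan(n,1);
for k = 1:K
    isVal = (fold == k);             % validation-fold observations
    Xtr = X(~isVal,:);               % training-fold observations
    Ytr = Y(~isVal);

    % "Train": class centroids computed from the training folds only
    mu0 = mean(Xtr(Ytr == 0,:), 1);
    mu1 = mean(Xtr(Ytr == 1,:), 1);

    % "Predict": assign each held-out observation to the nearer centroid
    d0 = sum((X(isVal,:) - mu0).^2, 2);
    d1 = sum((X(isVal,:) - mu1).^2, 2);
    label(isVal) = double(d1 < d0);
end
```

After the loop, `label` has one entry per row of `X`, in the original observation order, which mirrors the shape of the output described under Output Arguments.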

Output Arguments

`label` — Predicted class labels
categorical array | character array | logical matrix | numeric matrix | cell array of character vectors

Predicted class labels, returned as a categorical or character array, logical or numeric matrix, or cell array of character vectors.

label has n rows, where n is the number of observations in X, and has the same data type as the observed class labels (Y) used to train CVMdl. (The software treats string arrays as cell arrays of character vectors.)

kfoldPredict classifies observations into the class yielding the highest score.

`score` — Classification scores
numeric array

Classification scores, returned as an n-by-2 numeric array, where n is the number of observations in X. score(i,j) is the score for classifying observation i into class j. The order of the classes is stored in CVMdl.ClassNames.

If CVMdl.Trained{1}.Learner is 'logistic', then classification scores are posterior probabilities.

More About

Classification Score

For kernel classification models, the raw classification score for classifying the observation x, a row vector, into the positive class is defined by

$f(x) = T(x)\beta + b.$

$T(\cdot)$ is a transformation of an observation for feature expansion.
$\beta$ is the estimated column vector of coefficients.
$b$ is the estimated scalar bias.

The raw classification score for classifying x into the negative class is −f(x). The software classifies observations into the class that yields a positive score.

If the kernel classification model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores (see ScoreTransform).
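
The relationship between raw scores, transformed scores, and labels can be sketched as follows. The raw scores `f` are hypothetical, the class order (negative class first, positive class second) is an assumption made for this illustration, and the 'logit' transformation is taken to be 1/(1 + exp(-x)) applied to each score.

```matlab
% Hypothetical raw positive-class scores f(x) for three observations
f = [-2.0; 0.5; 3.0];

% Raw scores per class: column 1 is the negative class (-f), column 2 the positive class (f)
rawScore = [-f, f];

% 'logit' transformation, applied elementwise to the raw scores
posterior = 1 ./ (1 + exp(-rawScore));

% The predicted class is the one with the highest score
[~, idx] = max(rawScore, [], 2);
classNames = {'b','g'};              % assumed order: negative, then positive
label = classNames(idx)';
```

Because the two raw scores are f(x) and −f(x), the two transformed scores in each row sum to 1, and the class with the positive raw score is also the class with the larger posterior probability.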

Version History

Introduced in R2018b

R2023b: Observations with missing predictor values are used in resubstitution and cross-validation computations

Starting in R2023b, the following classification model object functions use observations with missing predictor values as part of resubstitution ("resub") and cross-validation ("kfold") computations for classification edges, losses, margins, and predictions.

Model Type	Model Objects	Object Functions
Discriminant analysis classification model	`ClassificationDiscriminant`	`resubEdge`, `resubLoss`, `resubMargin`, `resubPredict`
Discriminant analysis classification model	`ClassificationPartitionedModel`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Ensemble of discriminant analysis learners for classification	`ClassificationEnsemble`	`resubEdge`, `resubLoss`, `resubMargin`, `resubPredict`
Ensemble of discriminant analysis learners for classification	`ClassificationPartitionedEnsemble`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Gaussian kernel classification model	`ClassificationPartitionedKernel`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Gaussian kernel classification model	`ClassificationPartitionedKernelECOC`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Linear classification model	`ClassificationPartitionedLinear`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Linear classification model	`ClassificationPartitionedLinearECOC`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Neural network classification model	`ClassificationNeuralNetwork`	`resubEdge`, `resubLoss`, `resubMargin`, `resubPredict`
Neural network classification model	`ClassificationPartitionedModel`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`
Support vector machine (SVM) classification model	`ClassificationSVM`	`resubEdge`, `resubLoss`, `resubMargin`, `resubPredict`
Support vector machine (SVM) classification model	`ClassificationPartitionedModel`	`kfoldEdge`, `kfoldLoss`, `kfoldMargin`, `kfoldPredict`

In previous releases, the software omitted observations with missing predictor values from the resubstitution and cross-validation computations.
