Model Building and Assessment

Feature selection, feature engineering, model selection, hyperparameter optimization, cross-validation, predictive performance evaluation, and classification accuracy comparison tests

When you build a high-quality, predictive classification model, it is important to select the right features (or predictors) and tune hyperparameters (model parameters that you set before training rather than estimate from the data).

Feature selection and hyperparameter tuning can yield multiple models. You can compare the k-fold misclassification rates, receiver operating characteristic (ROC) curves, or confusion matrices among the models. Alternatively, conduct a statistical test to detect whether one classification model significantly outperforms another.
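As a minimal sketch of such a comparison, the following assumes the ionosphere sample data set that ships with the toolbox and two arbitrarily chosen learners (a classification tree and a 5-nearest-neighbor classifier). Neither choice is prescribed by this page. It computes the k-fold misclassification rate of each model on a shared partition and then runs testckfold, which by default retrains both models in a repeated cross-validation test.

```matlab
% Sample data: X is a numeric predictor matrix, Y is a cell array of class labels.
load ionosphere
rng(1)                                   % reproducible fold assignment

% Use one stratified 5-fold partition so both models see the same folds.
cvp = cvpartition(Y, 'KFold', 5);

% Cross-validated versions of the two candidate models (illustrative choices).
treeCV = fitctree(X, Y, 'CVPartition', cvp);
knnCV  = fitcknn(X, Y, 'NumNeighbors', 5, 'Standardize', true, 'CVPartition', cvp);

% k-fold misclassification rates.
treeLoss = kfoldLoss(treeCV);
knnLoss  = kfoldLoss(knnCV);

% Statistical comparison. testckfold takes trained (or template) models and
% the predictor data for each model; here both models use the same predictors.
treeMdl = fitctree(X, Y);
knnMdl  = fitcknn(X, Y, 'NumNeighbors', 5, 'Standardize', true);
[h, p] = testckfold(treeMdl, knnMdl, X, X, Y);

fprintf('Tree loss: %.3f, KNN loss: %.3f, h = %d, p = %.3f\n', ...
    treeLoss, knnLoss, h, p);
```

Here h = 1 indicates rejection of the null hypothesis that the two models have equal predictive accuracy at the default significance level, and p is the corresponding p-value.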

To engineer new features before training a classification model, use gencfeatures.
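A short sketch of this workflow, assuming the patients sample data set and an arbitrary request for up to 10 features (both are illustrative choices, not requirements):

```matlab
% Build a table of predictors plus the binary response Smoker.
load patients
Tbl = table(Age, Diastolic, Gender, Height, SelfAssessedHealthStatus, ...
    Systolic, Weight, Smoker);
rng(1)                                   % reproducible feature generation

% Generate up to 10 features. T is a FeatureTransformer object, and
% NewTbl holds the generated features together with the response.
[T, NewTbl] = gencfeatures(Tbl, "Smoker", 10);

% Inspect how each generated feature was derived.
info = describe(T);
disp(info)

% Apply the same transformations to other data with the same variables
% (here, the first five rows stand in for new observations).
newFeatures = transform(T, Tbl(1:5, :));
```

You can then pass NewTbl to a training function in place of the original table.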

To build and assess classification models interactively, use the Classification Learner app.

To automatically select a model with tuned hyperparameters, use fitcauto. This function tries a selection of classification model types with different hyperparameter values and returns a final model that is expected to perform well on new data. Use fitcauto when you are uncertain which classifier types best suit your data.
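The following sketch shows one way to call fitcauto and check the result on held-out data. The data set, the 30% holdout fraction, the restriction to tree and k-nearest-neighbor learners, and the limit of 20 objective evaluations are all assumptions made to keep the example small.

```matlab
load ionosphere
rng(1)                                   % reproducible partition and search

% Hold out 30% of the observations for a final check.
holdout = cvpartition(Y, 'Holdout', 0.3);
XTrain = X(training(holdout), :);
YTrain = Y(training(holdout));
XTest  = X(test(holdout), :);
YTest  = Y(test(holdout));

% Limit the search and suppress plots and console output (illustrative settings).
opts = struct('MaxObjectiveEvaluations', 20, 'ShowPlots', false, 'Verbose', 0);

% Let fitcauto choose among the listed learner types and tune their hyperparameters.
Mdl = fitcauto(XTrain, YTrain, ...
    'Learners', {'tree', 'knn'}, ...
    'HyperparameterOptimizationOptions', opts);

% Misclassification rate of the selected model on the held-out data.
testError = loss(Mdl, XTest, YTest, 'LossFun', 'classiferror');
fprintf('Selected model class: %s, test error: %.3f\n', class(Mdl), testError);
```

Omitting the Learners argument lets fitcauto pick its default set of learner types, at the cost of a longer search.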

To tune hyperparameters of a specific model, select the hyperparameter values and cross-validate the model using those values. For example, to tune an SVM model, choose a set of box constraints and kernel scales, and then cross-validate a model for each pair of values. Certain Statistics and Machine Learning Toolbox™ classification functions offer automatic hyperparameter tuning through Bayesian optimization, grid search, or random search. bayesopt, the main function for implementing Bayesian optimization, is flexible enough for many other applications as well. See Bayesian Optimization Workflow.
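The manual SVM tuning procedure described above can be sketched as follows. The data set and the candidate values for the box constraint and kernel scale are hypothetical and chosen only for illustration.

```matlab
load ionosphere
rng(1)                                   % reproducible fold assignment

% Hypothetical candidate grids; real problems usually need a wider search.
boxConstraints = [0.1 1 10];
kernelScales   = [0.5 1 5];

% One stratified 5-fold partition shared by every candidate pair.
cvp = cvpartition(Y, 'KFold', 5);

cvLoss = zeros(numel(boxConstraints), numel(kernelScales));
for i = 1:numel(boxConstraints)
    for j = 1:numel(kernelScales)
        % Cross-validate an RBF-kernel SVM for this pair of values.
        cvMdl = fitcsvm(X, Y, ...
            'KernelFunction', 'rbf', ...
            'BoxConstraint', boxConstraints(i), ...
            'KernelScale', kernelScales(j), ...
            'Standardize', true, ...
            'CVPartition', cvp);
        cvLoss(i, j) = kfoldLoss(cvMdl); % k-fold misclassification rate
    end
end

% Pick the pair with the lowest cross-validated loss.
[bestLoss, idx] = min(cvLoss(:));
[iBest, jBest] = ind2sub(size(cvLoss), idx);
bestBox   = boxConstraints(iBest);
bestScale = kernelScales(jBest);
fprintf('Best BoxConstraint = %g, KernelScale = %g, loss = %.3f\n', ...
    bestBox, bestScale, bestLoss);
```

For the automatic alternative, fitcsvm accepts the OptimizeHyperparameters name-value argument (for example, 'OptimizeHyperparameters','auto'), which replaces the explicit loops with a built-in search.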

To interpret a classification model, you can use lime, shapley, and plotPartialDependence.

Apps

Classification Learner - Train models to classify data using supervised machine learning

Functions

fscchi2 - Univariate feature ranking for classification using chi-square tests (Since R2020a)
fscmrmr - Rank features for classification using minimum redundancy maximum relevance (MRMR) algorithm (Since R2019b)
fscnca - Feature selection using neighborhood component analysis for classification
oobPermutedPredictorImportance - Predictor importance estimates by permutation of out-of-bag predictor observations for random forest of classification trees
predictorImportance - Estimates of predictor importance for classification tree
predictorImportance - Estimates of predictor importance for classification ensemble of decision trees
relieff - Rank importance of predictors using ReliefF or RReliefF algorithm
selectFeatures - Select important features for NCA classification or regression (Since R2023b)
sequentialfs - Sequential feature selection using custom criterion
gencfeatures - Perform automated feature engineering for classification (Since R2021a)
describe - Describe generated features (Since R2021a)
transform - Transform new data using generated features (Since R2021a)
fitcauto - Automatically select classification model with optimized hyperparameters (Since R2020a)
bayesopt - Select optimal machine learning hyperparameters using Bayesian optimization
hyperparameters - Variable descriptions for optimizing a fit function
optimizableVariable - Variable description for bayesopt or other optimizers
crossval - Estimate loss using cross-validation
cvpartition - Partition data for cross-validation
repartition - Repartition data for cross-validation
test - Test indices for cross-validation
training - Training indices for cross-validation

Local Interpretable Model-Agnostic Explanations (LIME)

lime - Local interpretable model-agnostic explanations (LIME) (Since R2020b)
fit - Fit simple model of local interpretable model-agnostic explanations (LIME) (Since R2020b)
plot - Plot results of local interpretable model-agnostic explanations (LIME) (Since R2020b)

Shapley Values

shapley - Shapley values (Since R2021a)
fit - Compute Shapley values for query point (Since R2021a)
plot - Plot Shapley values (Since R2021a)

Partial Dependence

partialDependence - Compute partial dependence (Since R2020b)
plotPartialDependence - Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots

Confusion Matrix

confusionchart - Create confusion matrix chart for classification problem
confusionmat - Compute confusion matrix for classification problem

Receiver Operating Characteristic (ROC) Curve

rocmetrics - Receiver operating characteristic (ROC) curve and performance metrics for binary and multiclass classifiers (Since R2022a)
addMetrics - Compute additional classification performance metrics (Since R2022a)
average - Compute performance metrics for average receiver operating characteristic (ROC) curve in multiclass problem (Since R2022a)
plot - Plot receiver operating characteristic (ROC) curves and other performance curves (Since R2022a)
perfcurve - Receiver operating characteristic (ROC) curve or other performance curve for classifier output
testcholdout - Compare predictive accuracies of two classification models
testckfold - Compare accuracies of two classification models by repeated cross-validation

Objects

FeatureSelectionNCAClassification - Feature selection for classification using neighborhood component analysis (NCA)
FeatureTransformer - Generated feature transformations (Since R2021a)
BayesianOptimization - Bayesian optimization results

Properties

ConfusionMatrixChart Properties - Confusion matrix chart appearance and behavior
ROCCurve Properties - Receiver operating characteristic (ROC) curve appearance and behavior (Since R2022a)

Topics

Classification Learner App

Feature Selection

Feature Engineering

Automated Model Selection

Hyperparameter Optimization

Model Interpretation

Cross-Validation

Classification Performance Evaluation