
featureSelectionClassificationNCAComponent

Pipeline component for performing feature selection using neighborhood component analysis (NCA) for classification

Since R2026a

    Description

    featureSelectionClassificationNCAComponent is a pipeline component that performs feature selection using neighborhood component analysis (NCA) for classification. The pipeline component uses the functionality of the fscnca function during the learn phase to identify important predictors in the data. During the run phase, the component selects the same predictors from a new data set.
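    The learn and run phases can be approximated outside a pipeline by calling fscnca directly. The following is a minimal sketch, not the component's implementation. It assumes numeric predictors (the meas matrix from the fisheriris MAT-file), and it assumes that selecting a fixed number of features means keeping the predictors with the largest values in the FeatureWeights property of the fitted model.

```matlab
% Minimal sketch of the learn/run behavior using fscnca directly.
% Assumption: selection keeps the predictors with the largest feature weights.
load fisheriris                          % meas: 150-by-4 double, species: 150-by-1 cell
predictorNames = ["SepalLength" "SepalWidth" "PetalLength" "PetalWidth"];

% Learn phase: fit an NCA feature selection model to the training data
mdl = fscnca(meas,species,Solver="lbfgs",Standardize=true);
[~,order] = sort(mdl.FeatureWeights,"descend");
keep = sort(order(1:3));                 % indices of the three highest-weighted predictors
selectedNames = predictorNames(keep)

% Run phase: apply the same column selection to new observations
XNew = meas(1:5,:);
XNewSelected = XNew(:,keep);
```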

    Creation

    Description

    component = featureSelectionClassificationNCAComponent creates a pipeline component for feature selection using an NCA feature selection model. Use the component when creating a pipeline for classification.

    component = featureSelectionClassificationNCAComponent(Name=Value) sets writable properties using one or more name-value arguments. For example, you can specify the regularization parameter, solver type, and method used for model fitting.
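    As an illustration, the regularization parameter, solver type, and fitting method can be set together at creation and then adjusted with dot notation before learning. This sketch uses only properties documented on this page; the values themselves are arbitrary.

```matlab
% Set learn parameters at creation by using name-value arguments
c = featureSelectionClassificationNCAComponent(Lambda=0.01, ...
    Solver="lbfgs",FitMethod="exact",NumFeatures=2);

% Learn parameters can still be changed with dot notation before learn is called
c.Lambda = 0.005;
c.LineSearchMethod = "strongwolfe";   % valid because Solver is "lbfgs"
```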


    Properties


    Structural Parameters

    The software sets structural parameters when you create the component. You cannot modify structural parameters after creating the component.

    UseWeights

    This property is read-only after the component is created.

    Observation weights flag, specified as 0 (false) or 1 (true). If UseWeights is true, the component adds a third input "W" to the Inputs component property, and a third input tag 3 to the InputTags component property.

    Example: c = featureSelectionClassificationNCAComponent(UseWeights=1)

    Data Types: logical
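    A sketch of creating a weighted component and learning from it follows. It assumes that learn accepts the observation weights as a third data argument, in the same order as the input ports; the uniform weight vector w is illustrative only.

```matlab
% Create a component that accepts observation weights as a third input
c = featureSelectionClassificationNCAComponent(UseWeights=true);
disp(c.Inputs)        % expected to list three input names
disp(c.InputTags)     % expected to list three input tags

% Assumption: weights are passed to learn as the third data argument
fisheriris = readtable("fisheriris.csv");
X = fisheriris(:,1:end-1);
Y = fisheriris(:,end);
w = ones(height(X),1);                % equal weights, for illustration only
c = learn(c,X,Y,w);
```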

    Learn Parameters

    The software sets learn parameters when you create the component. You can modify learn parameters using dot notation any time before you use the learn object function. Any unset learn parameters use the corresponding default values.

    FitMethod

    Method for fitting the model, specified as one of the following:

    • "exact" — Performs fitting using all of the data.

    • "none" — No fitting. Use this option to evaluate the generalization error of the NCA model using the initial feature weights.

    • "average" — Divides the data into partitions (subsets), fits each partition using the exact method, and returns the average of the feature weights.

    Example: c = featureSelectionClassificationNCAComponent(FitMethod="none")

    Example: c.FitMethod = "average"

    Data Types: char | string
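    The "average" method can be mimicked by fitting each partition with fscnca using FitMethod="exact" and averaging the resulting feature weights. This sketch assumes a random assignment of observations to five partitions; the partitioning scheme that fscnca and the component use internally may differ, so the averaged weights are not expected to match exactly.

```matlab
% Sketch of the "average" fitting idea using fscnca with FitMethod="exact".
% Assumption: observations are assigned to partitions at random.
load fisheriris
X = meas;  Y = species;
numPartitions = 5;
rng(0)                                               % reproducible partitions
part = mod(randperm(size(X,1)),numPartitions) + 1;   % partition index per observation
W = zeros(size(X,2),numPartitions);
for p = 1:numPartitions
    inPart = (part == p);
    mdl = fscnca(X(inPart,:),Y(inPart),FitMethod="exact");
    W(:,p) = mdl.FeatureWeights;                     % weights fitted on this partition only
end
avgWeights = mean(W,2)                               % average over partitions

% Built-in fscnca option, for comparison (its partitions may differ)
mdlAvg = fscnca(X,Y,FitMethod="average",NumPartitions=numPartitions);
```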

    GradientTolerance

    Relative convergence tolerance on the gradient norm, specified as a positive real scalar.

    This property is valid only when Solver is "lbfgs".

    Example: c = featureSelectionClassificationNCAComponent(GradientTolerance=2e-6)

    Example: c.GradientTolerance = 1e-5

    Data Types: single | double

    HessianHistorySize

    Size of the history buffer for Hessian approximation, specified as a positive integer. At each iteration, the component uses the most recent HessianHistorySize iterations to build an approximation of the inverse Hessian.

    This property is valid only when Solver is "lbfgs".

    Example: c = featureSelectionClassificationNCAComponent(HessianHistorySize=20)

    Example: c.HessianHistorySize = 10

    Data Types: single | double

    InitialLearningRate

    Initial learning rate for the "sgd" solver, specified as a positive real scalar or "auto".

    When Solver is "sgd", the learning rate decays over iterations starting with the value specified for InitialLearningRate.

    When you specify "auto", the initial learning rate is determined using experiments on small subsets of data. Use the NumTuningIterations property to specify the number of iterations for automatically tuning the initial learning rate. Use the TuningSubsetSize property to specify the number of observations to use for automatically tuning the initial learning rate.

    For solver type "minibatch-lbfgs", you can set InitialLearningRate to a very high value. In this case, the component applies LBFGS to each mini-batch separately, using the feature weights from the previous mini-batch as the initial weights.

    Example: c = featureSelectionClassificationNCAComponent(InitialLearningRate=0.9)

    Example: c.InitialLearningRate = "auto"

    Data Types: single | double | char | string

    InitialStepSize

    Initial step size, specified as a positive real scalar or "auto".

    This property is valid only when Solver is "lbfgs".

    Example: c = featureSelectionClassificationNCAComponent(InitialStepSize=0.1)

    Example: c.InitialStepSize = "auto"

    Data Types: single | double | char | string

    IterationLimit

    Maximum number of iterations, specified as a positive integer.

    Each pass through a batch is an iteration. Each pass through all of the data is an epoch. If the data is divided into k mini-batches, then every epoch is equivalent to k iterations.

    If Solver is "sgd", the default value is 10000. If Solver is "lbfgs" or "minibatch-lbfgs", the default value is 1000.

    Example: c = featureSelectionClassificationNCAComponent(IterationLimit=250)

    Example: c.IterationLimit = 1000

    Data Types: single | double
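    The relationship between iterations, mini-batches, and epochs is simple arithmetic. The values below are illustrative, and the sketch assumes that a final partial mini-batch counts as one iteration and that training stops at whichever of IterationLimit or PassLimit is reached first.

```matlab
% Worked arithmetic for iterations versus epochs (illustrative values)
n = 150;                              % observations passed to learn
miniBatchSize = 10;                   % MiniBatchSize
k = ceil(n/miniBatchSize);            % mini-batches per epoch: 15 (assumes a partial batch counts)
passLimit = 5;                        % PassLimit, in epochs
iterationLimit = 10000;               % IterationLimit (default for "sgd")
iterationsFromPasses = k*passLimit;   % 5 epochs contain 75 iterations
% Assumption: training stops at whichever limit is reached first
effectiveLimit = min(iterationLimit,iterationsFromPasses)   % 75
```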

    Lambda

    Regularization parameter to prevent overfitting, specified as a nonnegative scalar.

    As the number of observations increases, the chance of overfitting decreases and the required amount of regularization also decreases.

    The default value is 1/n, where n is the number of observations in the first data argument of learn.

    Example: c = featureSelectionClassificationNCAComponent(Lambda=0.002)

    Example: c.Lambda = 0.01

    Data Types: single | double
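    Because the default Lambda depends only on n, it is often worth comparing a few values around 1/n before fixing the parameter. This sketch evaluates candidate values with fscnca and a stratified holdout split, then passes the best value to the component. The candidate grid and the 30% holdout fraction are arbitrary choices.

```matlab
% Compare a few Lambda values on a holdout set, then use the best one
load fisheriris
X = meas;  Y = species;
n = size(X,1);
lambdaDefault = 1/n;                            % default when Lambda is not set
lambdas = lambdaDefault*[0 0.5 1 5 20];         % candidate values (arbitrary grid)
rng(1)                                          % reproducible partition
cvp = cvpartition(Y,"HoldOut",0.3);             % stratified 70/30 split
err = zeros(size(lambdas));
for i = 1:numel(lambdas)
    mdl = fscnca(X(training(cvp),:),Y(training(cvp)), ...
        Lambda=lambdas(i),Solver="lbfgs",Standardize=true);
    err(i) = loss(mdl,X(test(cvp),:),Y(test(cvp)));   % misclassification rate
end
[~,best] = min(err);
c = featureSelectionClassificationNCAComponent(Lambda=lambdas(best));
```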

    LengthScale

    Width of the kernel, specified as a positive real scalar.

    A length scale value of 1 is sensible when all predictors are on the same scale. If the predictors are of very different magnitudes, then consider standardizing the predictor values using the Standardize property.

    Example: c = featureSelectionClassificationNCAComponent(LengthScale=1.5)

    Example: c.LengthScale = 1.25

    Data Types: single | double
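    The fscnca documentation defines the probability that observation i picks observation j as its reference point using the kernel exp(-d/sigma), where d is a feature-weighted L1 distance and sigma is the kernel width. Assuming LengthScale plays the role of sigma, this toy computation shows that a larger value spreads probability across more distant neighbors.

```matlab
% Reference-point probabilities for observation i under the kernel
% k(d) = exp(-d/sigma), where d is a feature-weighted L1 distance
X = [0 0; 1 0; 0 3];                   % three toy observations, two predictors
w = [1 1];                             % feature weights (all equal here)
i = 1;                                 % query observation
d = sum((w.^2).*abs(X - X(i,:)),2);    % weighted distances: [0; 1; 3]
for sigma = [0.5 2]                    % two candidate LengthScale values
    k = exp(-d/sigma);
    k(i) = 0;                          % an observation cannot pick itself
    p = k/sum(k);
    fprintf("sigma = %.1f: p = [%.3f %.3f %.3f]\n",sigma,p);
end
% prints: sigma = 0.5: p = [0.000 0.982 0.018]
%         sigma = 2.0: p = [0.000 0.731 0.269]
```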

    LineSearchMethod

    Line search method, specified as one of the following:

    • "weakwolfe" — Weak Wolfe line search

    • "strongwolfe" — Strong Wolfe line search

    • "backtracking" — Backtracking line search

    This property is valid only when Solver is "lbfgs".

    Example: c = featureSelectionClassificationNCAComponent(LineSearchMethod="strongwolfe")

    Example: c.LineSearchMethod = "backtracking"

    Data Types: char | string

    LossFun

    Loss function, specified as "classiferror" or a function handle.

    When you specify "classiferror", the component uses the misclassification error as the loss when computing the objective function.

    To specify a custom loss function, use function handle notation. The function must have the form L = lossfun(Yu,Yv), where Yu is a u-by-1 vector, Yv is a v-by-1 vector, and L is a u-by-v matrix of loss values.

    Example: c = featureSelectionClassificationNCAComponent(LossFun=@lossfun)

    Example: c.LossFun = "classiferror"

    Data Types: char | string | function_handle
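    A custom loss can, for example, penalize some misclassifications more than others. This sketch assumes that Yu and Yv contain integer class codes 1 through K so that they can index a cost matrix directly; if the labels arrive in another form (for example, categorical or cell arrays), map them to codes first. The cost matrix costMat is hypothetical.

```matlab
% Hypothetical cost matrix: confusing classes 2 and 3 costs 5, other errors cost 1
costMat = [0 1 1;
           1 0 5;
           1 5 0];
% Assumes Yu and Yv hold integer class codes 1..3; returns a u-by-v matrix
lossfun = @(Yu,Yv) costMat(Yu,Yv);

L = lossfun([1;3],[2;3;1])   % 2-by-3 matrix: [1 1 0; 5 0 1]
c = featureSelectionClassificationNCAComponent(LossFun=lossfun);
```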

    MaxLineSearchIterations

    Maximum number of line search iterations, specified as a positive integer.

    This property is valid only when Solver is "lbfgs".

    Example: c = featureSelectionClassificationNCAComponent(MaxLineSearchIterations=25)

    Example: c.MaxLineSearchIterations = 15

    Data Types: single | double

    MaxWeightFraction

    Maximum weight fraction for selecting features, specified as a numeric scalar in the range (0,1].

    If you do not specify the NumFeatures or MaxWeightFraction value, the software selects all features. You cannot specify both NumFeatures and MaxWeightFraction.

    Example: c = featureSelectionClassificationNCAComponent(MaxWeightFraction=0.5)

    Example: c.MaxWeightFraction = 0.75

    Data Types: single | double

    MiniBatchLBFGSIterations

    Maximum number of iterations per mini-batch LBFGS step, specified as a positive integer.

    This property is valid only when Solver is "minibatch-lbfgs".

    Example: c = featureSelectionClassificationNCAComponent(MiniBatchLBFGSIterations=15)

    Example: c.MiniBatchLBFGSIterations = 20

    Data Types: single | double

    MiniBatchSize

    Number of observations to use in each batch, specified as a positive integer between 1 and n, where n is the number of observations in the first data argument of learn.

    This property is valid only when Solver is "sgd".

    The default value is min(10,n).

    Example: c = featureSelectionClassificationNCAComponent(MiniBatchSize=25)

    Example: c.MiniBatchSize = 20

    Data Types: single | double

    NumFeatures

    Number of features (predictors) to select, specified as a positive integer scalar.

    If you do not specify the NumFeatures or MaxWeightFraction value, the software selects all features. You cannot specify both NumFeatures and MaxWeightFraction.

    Example: c = featureSelectionClassificationNCAComponent(NumFeatures=5)

    Example: c.NumFeatures = 10

    Data Types: single | double
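    The exact selection rules are not stated here, so the following sketch shows one plausible interpretation of each, applied to hypothetical feature weights: NumFeatures keeps the k largest weights, and MaxWeightFraction keeps every feature whose weight is at least that fraction of the largest weight. Both rules are assumptions, and the weights are made-up values, not results.

```matlab
% Hypothetical feature weights for four predictors (illustrative values only)
names = ["SepalLength" "SepalWidth" "PetalLength" "PetalWidth"];
w = [0.02 1.10 3.40 2.50];

% NumFeatures-style rule (assumed): keep the k largest weights
k = 3;
[~,order] = sort(w,"descend");
byCount = names(order(1:k))                  % "PetalLength" "PetalWidth" "SepalWidth"

% MaxWeightFraction-style rule (assumed): keep weights >= fraction*max(w)
fraction = 0.5;
byFraction = names(w >= fraction*max(w))     % "PetalLength" "PetalWidth"
```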

    NumTuningIterations

    Number of tuning iterations, specified as a positive integer.

    This property is valid only when Solver is "sgd" and InitialLearningRate is "auto".

    Example: c = featureSelectionClassificationNCAComponent(NumTuningIterations=15)

    Example: c.NumTuningIterations = 25

    Data Types: single | double

    PassLimit

    Maximum number of passes, specified as a positive integer. Each pass through all of the data is called an epoch.

    This property is valid only when Solver is "sgd".

    Example: c = featureSelectionClassificationNCAComponent(PassLimit=10)

    Example: c.PassLimit = 3

    Data Types: single | double

    Prior

    Prior probabilities for each class, specified as a value in this table.

    Value          Description
    "empirical"    The class prior probabilities are the class relative frequencies. The class relative frequencies are determined by the second data argument of learn.
    "uniform"      All class prior probabilities are equal to 1/K, where K is the number of classes.
    structure      A structure S with two fields:
                   • S.ClassNames contains a list of the class names.
                   • S.ClassProbs contains a vector of corresponding prior probabilities. The component normalizes the elements such that they sum to 1.

    Example: c = featureSelectionClassificationNCAComponent(Prior="uniform")

    Example: c.Prior = "empirical"

    Data Types: char | string | struct
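    A structure prior can be built as follows. The class names must match the classes in the response; this sketch assumes the three fisheriris species stored as character vectors. The last line reproduces the normalization described above.

```matlab
% Prior specified as a structure (class names must match the response classes)
S.ClassNames = {'setosa','versicolor','virginica'};
S.ClassProbs = [2 1 1];                   % unnormalized; the component normalizes these
c = featureSelectionClassificationNCAComponent(Prior=S);

% Normalization as described: divide by the sum so the priors add to 1
normalizedPrior = S.ClassProbs/sum(S.ClassProbs)   % [0.5 0.25 0.25]
```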

    Solver

    Solver type for estimating feature weights, specified as one of the following:

    • "lbfgs" — Limited memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) algorithm

    • "sgd" — Stochastic gradient descent (SGD) algorithm

    • "minibatch-lbfgs" — Stochastic gradient descent with LBFGS algorithm applied to mini-batches

    The default value is "sgd" when n>1000, where n is the number of observations in the first data argument of learn. Otherwise, the default value is "lbfgs".

    Example: c = featureSelectionClassificationNCAComponent(Solver="sgd")

    Example: c.Solver = "lbfgs"

    Data Types: char | string
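    Several learn parameters apply to only one solver. This sketch groups parameters by the solver for which the property descriptions on this page say they are valid; the values are arbitrary. TuningSubsetSize must not exceed the number of observations passed to learn.

```matlab
% "sgd": mini-batch size, pass limit, and automatic learning-rate tuning apply
cSGD = featureSelectionClassificationNCAComponent(Solver="sgd", ...
    MiniBatchSize=20,PassLimit=5,InitialLearningRate="auto", ...
    NumTuningIterations=20,TuningSubsetSize=100);

% "lbfgs": gradient tolerance, Hessian history, and line search settings apply
cLBFGS = featureSelectionClassificationNCAComponent(Solver="lbfgs", ...
    GradientTolerance=1e-6,HessianHistorySize=15, ...
    LineSearchMethod="weakwolfe",MaxLineSearchIterations=20);
```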

    Standardize

    Indicator for standardizing the predictor data, specified as 0 (false) or 1 (true).

    Example: c = featureSelectionClassificationNCAComponent(Standardize=true)

    Example: c.Standardize = false

    Data Types: logical

    StepTolerance

    Convergence tolerance on the step size, specified as a positive real scalar.

    This property is valid only when Solver is "sgd" or "lbfgs".

    The "lbfgs" solver uses an absolute step tolerance, and the "sgd" solver uses a relative step tolerance.

    Example: c = featureSelectionClassificationNCAComponent(StepTolerance=5e-6)

    Example: c.StepTolerance = 1e-5

    Data Types: single | double

    TuningSubsetSize

    Number of observations to use for tuning the initial learning rate, specified as a positive integer value from 1 to n, where n is the number of observations in the first data argument of learn.

    This property is valid only when Solver is "sgd" and InitialLearningRate is "auto".

    The default value is min(100,n).

    Example: c = featureSelectionClassificationNCAComponent(TuningSubsetSize=25)

    Example: c.TuningSubsetSize = 50

    Data Types: single | double

    Component Properties

    The software sets component properties when you create the component. You can modify the component properties (excluding HasLearnables and HasLearned) using dot notation at any time. You cannot modify the HasLearnables and HasLearned properties directly.

    Name

    Component identifier, specified as a character vector or string scalar.

    Example: c = featureSelectionClassificationNCAComponent(Name="FeatureSelector")

    Example: c.Name = "NCASelector"

    Data Types: char | string

    Inputs

    Names of the input ports, specified as a character vector, string array, or cell array of character vectors. If UseWeights is true, the component adds the input port "W" to Inputs.

    Example: c = featureSelectionClassificationNCAComponent(Inputs=["Data1","Data2"])

    Example: c.Inputs = ["X1","Y1"]

    Data Types: char | string | cell

    Outputs

    Names of the output ports, specified as a character vector, string array, or cell array of character vectors.

    Example: c = featureSelectionClassificationNCAComponent(Outputs=["newX","importance"])

    Example: c.Outputs = ["X","S"]

    Data Types: char | string | cell

    InputTags

    Tags that enable the automatic connection of the component inputs with other components or pipelines, specified as a nonnegative integer vector. If you specify InputTags, then the number of tags must match the number of inputs in Inputs. If UseWeights is true, the software adds a third input tag to InputTags.

    Example: c = featureSelectionClassificationNCAComponent(InputTags=[1 0])

    Example: c.InputTags = [1 2]

    Data Types: single | double

    OutputTags

    Tags that enable the automatic connection of the component outputs with other components or pipelines, specified as a nonnegative integer vector. If you specify OutputTags, then the number of tags must match the number of outputs in Outputs.

    Example: c = featureSelectionClassificationNCAComponent(OutputTags=[1 0])

    Example: c.OutputTags = [1 2]

    Data Types: single | double

    HasLearnables

    This property is read-only.

    Indicator for the learnables, returned as 1 (true). A value of 1 indicates that the component contains Learnables.

    Data Types: logical

    HasLearned

    This property is read-only.

    Indicator showing the learning status of the component, returned as 0 (false) or 1 (true). A value of 1 indicates that the learn object function has been applied to the component and the Learnables are nonempty.

    Data Types: logical

    Learnables

    The software sets learnables when you use the learn object function. You cannot modify learnables directly.

    Model

    This property is read-only.

    Neighborhood component analysis model for classification, returned as a FeatureSelectionNCAClassification model object.

    SelectedVariables

    This property is read-only.

    Names of the features selected by the component, returned as a string array. The features correspond to columns in the first data argument of learn.

    Data Types: string

    UsedVariables

    This property is read-only.

    Names of the variables used by the component to select features, returned as a string array. The variables correspond to columns in the first data argument of learn.

    Data Types: string

    Object Functions

    learn       Initialize and evaluate pipeline or component
    run         Execute pipeline or component for inference after learning
    reset       Reset pipeline or component
    series      Connect components in series to create pipeline
    parallel    Connect components or pipelines in parallel to create pipeline
    view        View diagram of pipeline inputs, outputs, components, and connections

    Examples


    Create a featureSelectionClassificationNCAComponent pipeline component. Specify that the component selects three features.

    component = featureSelectionClassificationNCAComponent(NumFeatures=3)
    component = 
      featureSelectionClassificationNCAComponent with properties:
    
                     Name: "FeatureSelectionClassificationNCA"
                   Inputs: ["X"    "Y"]
                InputTags: [1 2]
                  Outputs: ["XSelected"    "Scores"]
               OutputTags: [1 NaN]
    
       
    Learnables (HasLearned = false)
                    Model: []
        SelectedVariables: []
            UsedVariables: []
    
       
    Structural Parameters (locked)
               UseWeights: 0
    
       
    Learn Parameters (unlocked)
              NumFeatures: 3
    
    
    

    component is a featureSelectionClassificationNCAComponent object that contains three learnables: Model, SelectedVariables, and UsedVariables. These properties remain empty until you pass data to the component during the learn phase.

    Read the fisheriris data set into a table. Store the predictor and response data in the tables X and Y, respectively.

    fisheriris = readtable("fisheriris.csv");
    X = fisheriris(:,1:end-1);
    Y = fisheriris(:,end);

    Use the learn object function to select features from the predictor data X.

    component = learn(component,X,Y)
    component = 
      featureSelectionClassificationNCAComponent with properties:
    
                     Name: "FeatureSelectionClassificationNCA"
                   Inputs: ["X"    "Y"]
                InputTags: [1 2]
                  Outputs: ["XSelected"    "Scores"]
               OutputTags: [1 NaN]
    
       
    Learnables (HasLearned = true)
                    Model: [1×1 FeatureSelectionNCAClassification]
        SelectedVariables: ["PetalLength"    "PetalWidth"    "SepalWidth"]
            UsedVariables: ["SepalLength"    "SepalWidth"    "PetalLength"    "PetalWidth"]
    
       
    Structural Parameters (locked)
               UseWeights: 0
    
       
    Learn Parameters (locked)
              NumFeatures: 3
    
    
    

    Note that the HasLearned property is set to true and Model, SelectedVariables, and UsedVariables are nonempty.

    Find the names of the selected features.

    names = component.SelectedVariables
    names = 
    
      1×3 string array
    
        "PetalLength"    "PetalWidth"    "SepalWidth"

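    The run phase selects the same predictors from new data. Assuming those predictors are identified by the names in SelectedVariables, the selection for table data is equivalent to indexing the table by variable name, as in this self-contained sketch. In a pipeline, use the run object function instead.

```matlab
% Reproduce the example, then keep only the selected predictors by name
fisheriris = readtable("fisheriris.csv");
X = fisheriris(:,1:end-1);
Y = fisheriris(:,end);
component = learn(featureSelectionClassificationNCAComponent(NumFeatures=3),X,Y);

% Assumption: the run phase amounts to selecting these table variables
XSelected = X(:,component.SelectedVariables);
head(XSelected,3)
```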
    Version History

    Introduced in R2026a