regressionGPComponent

Pipeline component for Gaussian process regression (GPR)

Since R2026a

    Description

    regressionGPComponent is a pipeline component that creates a Gaussian process regression (GPR) model. The pipeline component uses the functionality of the fitrgp function during the learn phase to train the GPR model. The component uses the functionality of the predict and loss functions during the run phase to perform regression.

    Creation

    Description

    component = regressionGPComponent creates a pipeline component for a Gaussian process regression (GPR) model.

    component = regressionGPComponent(Name=Value) sets writable Properties using one or more name-value arguments. For example, you can specify the explicit basis in the model, form of covariance function, and method used to estimate model parameters.
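
    As a minimal sketch of the second syntax, the following call sets three learn parameters at creation. The property names and values are taken from the Properties section of this page; the particular combination is illustrative only.

```matlab
% Create a GPR component with a linear explicit basis, a Matern 3/2
% kernel, and exact parameter estimation (illustrative values).
component = regressionGPComponent( ...
    BasisFunction="linear", ...
    KernelFunction="matern32", ...
    FitMethod="exact");
```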

    Properties

    Structural Parameters

    The software sets structural parameters when you create the component. You cannot modify structural parameters after creating the component.

    This property is read-only after the component is created.

    Observation weights flag, specified as 0 (false) or 1 (true). If UseWeights is true, the component adds a third input "Weights" to the Inputs component property, and a third input tag 3 to the InputTags component property.

    Example: c = regressionGPComponent(UseWeights=1)

    Data Types: logical

    Learn Parameters

    The software sets learn parameters when you create the component. You can modify learn parameters using dot notation any time before you use the learn object function. Any unset learn parameters use the corresponding default values.

    Observations in the active set, specified as an m-by-1 vector of integers ranging from 1 to n (m ≤ n), or a logical vector of length n with at least one true element. n is the total number of observations in the training data.

    The component uses the observations specified by ActiveSet to train the GPR model. The active set cannot have duplicate elements.

    If you specify a value for ActiveSet, the component does not use ActiveSetMethod or ActiveSetSize to select observations in the active set.

    Example: c = regressionGPComponent(ActiveSet=[1 4 7 13])

    Example: c.ActiveSet = [0 1 0 0 1 1 0 1]

    Data Types: single | double | logical
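
    The two accepted forms of ActiveSet describe the same selection. The following sketch, which uses a hypothetical training set of eight observations, converts an index vector into the equivalent logical vector.

```matlab
n = 8;                        % hypothetical number of training observations
idx = [2 5 6 8];              % active set as integer indices (no duplicates)

mask = false(n,1);            % equivalent logical vector of length n
mask(idx) = true;

% Both forms select the same observations.
sameSelection = isequal(find(mask)', idx)
```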

    Active set selection method, specified as one of the following values.

    Value           Description
    "random"        Random selection
    "sgma"          Sparse greedy matrix approximation
    "entropy"       Differential entropy-based selection
    "likelihood"    Subset of regressors loglikelihood-based selection

    This property is valid only when ActiveSet is empty ([]).

    Example: c = regressionGPComponent(ActiveSetMethod="entropy")

    Example: c.ActiveSetMethod = "sgma"

    Data Types: char | string

    Size of the active set, specified as an integer m, 1 ≤ m ≤ n, where n is the number of observations. This property is valid only when FitMethod is "sd", "sr", or "fic".

    The default value is min(1000,n) when FitMethod is "sr" or "fic", and min(2000,n) when FitMethod is "sd".

    This property is valid only when ActiveSet is empty ([]).

    Example: c = regressionGPComponent(ActiveSetSize=100)

    Example: c.ActiveSetSize = 250

    Data Types: single | double

    Explicit basis in the GPR model, specified as "constant", "none", "linear", "pureQuadratic", or a function handle. If n is the number of observations, the basis function adds the term H*β to the model, where H is the basis matrix and β is a p-by-1 vector of basis coefficients. p is the number of columns in the basis matrix H.

    Explicit Basis    Basis Matrix
    "none"            Empty matrix
    "constant"

    $H = 1$

    H is an n-by-1 vector of 1s, where n is the number of observations.

    "linear"

    $H = [1, X]$

    X is the expanded predictor data after the component creates dummy variables for the categorical variables.

    "pureQuadratic"

    $H = [1, X, X_2]$,

    where

    $$X_2 = \begin{bmatrix} x_{11}^2 & x_{12}^2 & \cdots & x_{1d}^2 \\ x_{21}^2 & x_{22}^2 & \cdots & x_{2d}^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1}^2 & x_{n2}^2 & \cdots & x_{nd}^2 \end{bmatrix}.$$

    For this basis option, the component does not support X with categorical predictors.

    Function handle

    Function handle hfcn, which the component calls as

    $H = \mathrm{hfcn}(X)$,

    where X is an n-by-d matrix of predictors, d is the number of predictors after the component creates dummy variables for the categorical variables, and H is an n-by-p matrix of basis functions.

    Example: c = regressionGPComponent(BasisFunction="linear")

    Example: c.BasisFunction = "none"

    Data Types: char | string | function_handle
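
    As an illustration of the function handle option, the following sketch defines a custom basis that reproduces the "pureQuadratic" layout. The handle name hfcn follows the description above; the predictor matrix is hypothetical.

```matlab
% Custom basis with a constant column, the predictors, and their squares.
hfcn = @(X) [ones(size(X,1),1), X, X.^2];

X = [1 2; 3 4; 5 6];          % hypothetical 3-by-2 predictor matrix (n = 3, d = 2)
H = hfcn(X);                  % n-by-p basis matrix with p = 2*d + 1 = 5
size(H)
```

    You could then pass the handle as the BasisFunction value, for example regressionGPComponent(BasisFunction=hfcn).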

    Initial value of the coefficients, specified as a p-by-1 numeric vector, where p is the number of columns in the basis matrix H. The basis matrix depends on the value of BasisFunction.

    The component uses the initial values of the coefficients as the known coefficient values only when FitMethod is "none".

    Example: c = regressionGPComponent(Beta=[0.5,2,11,0.7,6])

    Example: c.Beta = 8

    Data Types: single | double

    Block size for the BCD (block coordinate descent) method, specified as an integer in the range 1 to n, where n is the number of observations.

    This property is valid only when PredictMethod is "bcd".

    Example: c = regressionGPComponent(BlockSizeBCD=1500)

    Example: c.BlockSizeBCD = 500

    Data Types: single | double

    Cache size in megabytes (MB), specified as a positive scalar. Cache size is the extra memory available beyond the memory required for fitting the model and selecting the active set. The component uses CacheSize to determine the following:

    • Whether the component caches inter-point distances when estimating parameters

    • How the component computes matrix-vector products for the block coordinate descent method and for making predictions

    Example: c = regressionGPComponent(CacheSize=2000)

    Example: c.CacheSize = 1500

    Data Types: single | double

    Method used to compute the loglikelihood and gradient for parameter estimation, specified as "qr" or "v".

    • If ComputationMethod is "qr", the component uses the QR factorization approach, which provides better accuracy.

    • If ComputationMethod is "v", the component uses the V method approach, which provides faster computation.

    This property is valid only when FitMethod is "sr" or "fic".

    Example: c = regressionGPComponent(ComputationMethod="v")

    Example: c.ComputationMethod = "qr"

    Data Types: char | string

    Constant value of Sigma for the noise standard deviation of the Gaussian process model, specified as 0 (false) or 1 (true). When ConstantSigma is true, the component does not optimize the value of Sigma, but instead uses the initial value throughout its computations.

    Example: c = regressionGPComponent(ConstantSigma=true)

    Example: c.ConstantSigma = 0

    Data Types: logical

    Method used to compute inter-point distances to evaluate the built-in kernel functions, specified as "fast" or "accurate".

    • If DistanceMethod is "fast", the component computes $(x - y)^2$ as $x^2 + y^2 - 2xy$.

    • If DistanceMethod is "accurate", the component computes $(x - y)^2$ directly.

    Example: c = regressionGPComponent(DistanceMethod="accurate")

    Example: c.DistanceMethod = "fast"

    Data Types: char | string
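
    The two forms are algebraically identical but can differ in floating-point arithmetic. The following scalar sketch, with hypothetical values, shows the kind of cancellation that the expanded form is exposed to. It illustrates the arithmetic only and is not the component's internal implementation.

```matlab
x = 1e8 + 1;                  % two large, nearly equal values (hypothetical)
y = 1e8;

dAccurate = (x - y)^2;        % direct form, as described for "accurate"
dFast = x^2 + y^2 - 2*x*y;    % expanded form, as described for "fast"

% The expanded form subtracts numbers near 2e16, so rounding in the
% intermediate products can make dFast differ from dAccurate.
fprintf("accurate: %g, fast: %g\n", dAccurate, dFast);
```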

    Method used to estimate the parameters of the GPR model, specified as one of the following.

    Fit Method    Description
    "none"        No estimation. Use the initial parameter values as the known parameter values.
    "exact"       Exact Gaussian process regression. This value is the default if n ≤ 2000, where n is the number of observations.
    "sd"          Subset of data points approximation. This value is the default if n > 2000, where n is the number of observations. "sd" is a sparse method.
    "sr"          Subset of regressors approximation. "sr" is a sparse method.
    "fic"         Fully independent conditional approximation. "fic" is a sparse method.

    Example: c = regressionGPComponent(FitMethod="fic")

    Example: c.FitMethod = "sd"

    Data Types: char | string
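
    The sparse fit methods work together with the active set properties. As a sketch, the following configuration requests a subset of regressors fit with an entropy-selected active set of 100 observations. The values are illustrative and assume that ActiveSet is left empty.

```matlab
% Sparse GPR configuration: the active set is chosen automatically
% because ActiveSet is not specified.
c = regressionGPComponent( ...
    FitMethod="sr", ...
    ActiveSetMethod="entropy", ...
    ActiveSetSize=100);
```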

    Initial step size, specified as a real positive scalar, "auto", or []. InitialStepSize is the approximate maximum absolute value of the first optimization step when Optimizer is "quasinewton" or "lbfgs". The initial step size can determine the initial Hessian approximation during optimization.

    • If InitialStepSize is a real positive scalar, the component uses the value as the initial step size during optimization.

    • If InitialStepSize is "auto", the component determines the initial step size automatically based on initial parameter values.

    • If InitialStepSize is [], the component does not use an initial step size.

    Example: c = regressionGPComponent(InitialStepSize="auto")

    Example: c.InitialStepSize = 0.5

    Data Types: single | double | char | string

    Maximum number of BCD (block coordinate descent) method iterations, specified as a positive integer.

    This property is valid only when PredictMethod is "bcd".

    Example: c = regressionGPComponent(IterationLimitBCD=10000)

    Example: c.IterationLimitBCD = 100000

    Data Types: single | double

    Form of the covariance function, specified as one of the following values.

    Value                      Description
    "exponential"              Exponential kernel
    "squaredexponential"       Squared exponential kernel
    "matern32"                 Matern kernel with parameter 3/2
    "matern52"                 Matern kernel with parameter 5/2
    "rationalquadratic"        Rational quadratic kernel
    "ardexponential"           Exponential kernel with a separate length scale per predictor
    "ardsquaredexponential"    Squared exponential kernel with a separate length scale per predictor
    "ardmatern32"              Matern kernel with parameter 3/2 and a separate length scale per predictor
    "ardmatern52"              Matern kernel with parameter 5/2 and a separate length scale per predictor
    "ardrationalquadratic"     Rational quadratic kernel with a separate length scale per predictor
    Function handle

    Function handle in the form

    Kmn = kfcn(Xm,Xn,theta),

    where Xm is an m-by-d matrix, Xn is an n-by-d matrix, and Kmn is an m-by-n matrix of kernel products such that Kmn(i,j) is the kernel product between Xm(i,:) and Xn(j,:). d is the number of predictor variables after the component creates dummy variables for the categorical variables. theta is the r-by-1 unconstrained parameter vector for kfcn.

    Example: c = regressionGPComponent(KernelFunction="matern32")

    Example: c.KernelFunction = "ardexponential"

    Data Types: char | string | function_handle
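
    As a sketch of the function handle form, the following code defines a squared exponential kernel with the signature kfcn(Xm,Xn,theta). The choice of theta as log length scale and log signal standard deviation is an assumption made for this example, as are the input matrices. The code uses pdist2 from Statistics and Machine Learning Toolbox.

```matlab
% Squared exponential kernel with unconstrained parameters:
%   theta(1) = log(length scale), theta(2) = log(signal standard deviation)
kfcn = @(Xm,Xn,theta) (exp(theta(2))^2) * ...
    exp(-(pdist2(Xm,Xn).^2) / (2*exp(theta(1))^2));

Xm = [0 0; 1 1; 2 2];            % hypothetical 3-by-2 matrix (m = 3, d = 2)
Xn = [0 0; 1 0];                 % hypothetical 2-by-2 matrix (n = 2)
theta0 = [log(1.5); log(0.8)];   % unconstrained parameter vector (r = 2)

Kmn = kfcn(Xm,Xn,theta0);        % m-by-n matrix of kernel products
size(Kmn)
```

    Because a custom kernel has no default parameters, a handle like this would be supplied together with initial values, for example regressionGPComponent(KernelFunction=kfcn,KernelParameters=theta0).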

    Initial values for the kernel parameters, specified as a numeric vector. The size of the vector and its values depend on the value of KernelFunction.

    KernelFunction Value    KernelParameters Value
    "exponential", "squaredexponential", "matern32", or "matern52"

    2-by-1 vector phi, where phi(1) contains the length scale and phi(2) contains the signal standard deviation.

    The default initial value of the length scale parameter is the mean of the standard deviations of the predictors. The signal standard deviation is the standard deviation of the responses divided by the square root of 2. That is, phi = [mean(std(X));std(y)/sqrt(2)].

    "rationalquadratic"

    3-by-1 vector phi, where phi(1) contains the length scale, phi(2) contains the scale-mixture parameter, and phi(3) contains the signal standard deviation.

    The default initial value of the length scale parameter is the mean of the standard deviations of the predictors. The default initial value of the scale-mixture parameter is 1. The signal standard deviation is the standard deviation of the responses divided by the square root of 2. That is, phi = [mean(std(X));1;std(y)/sqrt(2)].

    "ardexponential", "ardsquaredexponential", "ardmatern32", or "ardmatern52"

    (d+1)-by-1 vector phi, where phi(i) contains the length scale for predictor i, and phi(d+1) contains the signal standard deviation. d is the number of predictor variables after the component creates dummy variables for the categorical variables.

    The default initial values of the length scale parameters are the standard deviations of the predictors. The signal standard deviation is the standard deviation of the responses divided by the square root of 2. That is, phi = [std(X)';std(y)/sqrt(2)].

    "ardrationalquadratic"

    (d+2)-by-1 vector phi, where phi(i) contains the length scale for predictor i, phi(d+1) contains the scale-mixture parameter, and phi(d+2) contains the signal standard deviation. d is the number of predictor variables after the component creates dummy variables for the categorical variables.

    The default initial values of the length scale parameters are the standard deviations of the predictors. The default initial value of the scale-mixture parameter is 1. The signal standard deviation is the standard deviation of the responses divided by the square root of 2. That is, phi = [std(X)';1;std(y)/sqrt(2)].

    Function handle

    r-by-1 vector for the initial value of the unconstrained parameter vector phi for the custom kernel function kfcn.

    When KernelFunction is a function handle, you must supply initial values for the kernel parameters.

    For more information about the kernel functions, see Kernel (Covariance) Function Options.

    Example: c = regressionGPComponent(KernelParameters=[3.5,6.2])

    Example: c.KernelParameters = [10,10,10]

    Data Types: single | double
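
    The default initial values given in the table can be reproduced directly from the training data. The following sketch evaluates the formulas for a hypothetical numeric predictor matrix X and response vector y.

```matlab
X = [1 10; 2 20; 4 15; 7 30];    % hypothetical 4-by-2 predictor matrix (d = 2)
y = [3; 5; 4; 9];                % hypothetical response vector

% Default for "exponential", "squaredexponential", "matern32", "matern52":
% one shared length scale and the signal standard deviation.
phiIso = [mean(std(X)); std(y)/sqrt(2)];       % 2-by-1

% Default for the ARD variants of those kernels:
% one length scale per predictor and the signal standard deviation.
phiARD = [std(X)'; std(y)/sqrt(2)];            % (d+1)-by-1
```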

    Number of repetitions for automatic active set selection and parameter estimation, specified as a positive integer.

    If you do not specify an active set, the component automatically selects the active set using a process that repeats NumActiveSetRepeats times. During each repetition, the component selects an active set, estimates parameters based on the active set, and uses the estimated parameters to select a new active set. For more information on automatic active set selection, see NumActiveSetRepeats.

    This property is valid only when ActiveSet is [] and ActiveSetMethod is not "random".

    Example: c = regressionGPComponent(NumActiveSetRepeats=5)

    Example: c.NumActiveSetRepeats = 7

    Data Types: single | double

    Number of greedy selections for the BCD (block coordinate descent) method, specified as an integer in the range 1 to BlockSizeBCD.

    This property is valid only when PredictMethod is "bcd".

    Example: c = regressionGPComponent(NumGreedyBCD=150)

    Example: c.NumGreedyBCD = 50

    Data Types: single | double
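
    The BCD properties apply only when PredictMethod is "bcd", so they are typically set together. The following sketch uses illustrative values and keeps NumGreedyBCD within the range 1 to BlockSizeBCD.

```matlab
% Configure block coordinate descent for prediction (illustrative values).
c = regressionGPComponent(PredictMethod="bcd");
c.BlockSizeBCD = 500;            % block size, between 1 and n
c.NumGreedyBCD = 50;             % greedy selections, at most BlockSizeBCD
c.IterationLimitBCD = 10000;     % maximum number of BCD iterations
```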

    Optimizer used to estimate parameters, specified as one of the values in this table.

    Value            Description
    "quasinewton"    Dense, symmetric rank-1-based, quasi-Newton approximation to the Hessian
    "lbfgs"          LBFGS-based quasi-Newton approximation to the Hessian
    "fminsearch"     Unconstrained nonlinear optimization using the simplex search method of Lagarias et al. [1]
    "fminunc"        Unconstrained nonlinear optimization (requires an Optimization Toolbox™ license)
    "fmincon"        Constrained nonlinear optimization (requires an Optimization Toolbox license)

    Example: c = regressionGPComponent(Optimizer="fminsearch")

    Example: c.Optimizer = "lbfgs"

    Data Types: char | string

    Options for the Optimizer, specified as a structure or object created by optimset, statset, or optimoptions (Optimization Toolbox).

    Optimizer                   Function for Creating Optimizer Options
    "fminsearch"                optimset (structure)
    "quasinewton" or "lbfgs"    statset("fitrgp") (structure)
    "fminunc" or "fmincon"      optimoptions (object)

    The default options depend on the specified optimizer.

    Example: c = regressionGPComponent(OptimizerOptions=statset("fitrgp"))

    Example: c.OptimizerOptions = optimset

    Data Types: struct
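
    As a sketch of pairing an optimizer with its options structure, the following code starts from the statset defaults for fitrgp and changes one field before creating the component. The iteration limit is an illustrative value.

```matlab
% Options structure for the "quasinewton" and "lbfgs" optimizers.
opts = statset("fitrgp");
opts.MaxIter = 500;              % illustrative iteration limit

c = regressionGPComponent( ...
    Optimizer="quasinewton", ...
    OptimizerOptions=opts);
```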

    Method used to make predictions from the Gaussian process model given the parameters, specified as one of the following values.

    Value      Description
    "exact"    Exact Gaussian process regression. This value is the default if n ≤ 10,000.
    "bcd"      Block coordinate descent (BCD). This value is the default if n > 10,000.
    "sd"       Subset of data points approximation
    "sr"       Subset of regressors approximation
    "fic"      Fully independent conditional approximation

    Example: c = regressionGPComponent(PredictMethod="bcd")

    Example: c.PredictMethod = "sr"

    Data Types: char | string

    Random search set size per greedy inclusion for active set selection, specified as a positive integer.

    Example: c = regressionGPComponent(RandomSearchSetSize=30)

    Example: c.RandomSearchSetSize = 45

    Data Types: single | double

    Regularization standard deviation for the subset of regressors and fully independent conditional approximation methods, specified as a positive scalar. The default value is 1e-2*std(y), where y is the second data argument of learn.

    This property is valid only when FitMethod is "sr" or "fic".

    Example: c = regressionGPComponent(Regularization=0.2)

    Example: c.Regularization = 0.5

    Data Types: single | double

    Initial value for the noise standard deviation of the Gaussian process model, specified as a positive scalar.

    The component parameterizes the noise standard deviation as the sum of SigmaLowerBound and exp(η), where η is an unconstrained value. Therefore, Sigma must be larger than SigmaLowerBound by a small tolerance so that the component can initialize η to a finite value. Otherwise, the component resets Sigma to a compatible value.

    The tolerance is 1e-3 when ConstantSigma is false and 1e-6 otherwise. If the tolerance is not small enough relative to the scale of the response variable, you can scale up the response variable so that the tolerance becomes negligible relative to the response values.

    The default value is std(y)/sqrt(2), where y is the second data argument of learn.

    Example: c = regressionGPComponent(Sigma=2)

    Example: c.Sigma = 5

    Data Types: single | double
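
    The parameterization explains why Sigma must exceed SigmaLowerBound. The following sketch inverts the relation Sigma = SigmaLowerBound + exp(η) for hypothetical values. The variable name eta is chosen for this example.

```matlab
Sigma = 2;                       % hypothetical initial noise standard deviation
SigmaLowerBound = 0.02;          % hypothetical lower bound

% Sigma = SigmaLowerBound + exp(eta)  =>  eta = log(Sigma - SigmaLowerBound)
eta = log(Sigma - SigmaLowerBound);
etaIsFinite = isfinite(eta)      % true because Sigma exceeds the lower bound

% If Sigma equals the lower bound, no finite eta exists.
etaAtBound = log(SigmaLowerBound - SigmaLowerBound)   % -Inf
```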

    Lower bound on the noise standard deviation (Sigma), specified as a positive scalar. The default value is 1e-2*std(y), where y is the second data argument of learn.

    Sigma must be larger than SigmaLowerBound by a small tolerance.

    Example: c = regressionGPComponent(SigmaLowerBound=0.02)

    Example: c.SigmaLowerBound = 0.5

    Data Types: single | double

    Flag to standardize the predictors, specified as 0 (false) or 1 (true). If Standardize is true, the component centers and scales each column of the first data argument of learn by the column mean and standard deviation, respectively.

    The component does not standardize the data contained in the dummy variable columns generated for categorical predictors.

    Example: c = regressionGPComponent(Standardize=true)

    Example: c.Standardize = 0

    Data Types: logical

    Absolute tolerance on the step size for terminating the BCD (block coordinate descent) method iterations, specified as a positive scalar.

    This property is valid only when PredictMethod is "bcd".

    Example: c = regressionGPComponent(StepToleranceBCD=0.002)

    Example: c.StepToleranceBCD = 0.005

    Data Types: single | double

    Relative tolerance for terminating the active set selection, specified as a positive scalar.

    Example: c = regressionGPComponent(ToleranceActiveSet=0.0002)

    Example: c.ToleranceActiveSet = 0.00005

    Data Types: single | double

    Relative tolerance on the gradient norm for terminating the BCD (block coordinate descent) method iterations, specified as a positive scalar.

    This property is valid only when PredictMethod is "bcd".

    Example: c = regressionGPComponent(ToleranceBCD=0.002)

    Example: c.ToleranceBCD = 0.005

    Data Types: single | double

    Run Parameters

    The software sets run parameters when you create the component. You can modify the run parameters using dot notation at any time. Any unset run parameters use the corresponding default values.

    Loss function, specified as "mse" (mean squared error) or a function handle.

    To specify a custom loss function, use function handle notation. For more information on custom loss functions, see lossfun.

    Example: c = regressionGPComponent(LossFun=@LossFun)

    Example: c.LossFun = "mse"

    Data Types: char | string | function_handle
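
    As a sketch of a custom loss, the following handle computes a weighted mean absolute error. It assumes the (Y,Yfit,W) argument order that the loss functions for regression models use, with observed responses, predicted responses, and observation weights. This page does not state the signature, so check the lossfun description before relying on it. The handle name maeLoss and the data are hypothetical.

```matlab
% Weighted mean absolute error.
% Y = observed responses, Yfit = predicted responses, W = observation weights.
maeLoss = @(Y,Yfit,W) sum(W .* abs(Y - Yfit)) / sum(W);

Y = [1; 2; 4];                   % hypothetical observed responses
Yfit = [1.5; 2; 3];              % hypothetical predictions
W = [1; 1; 1];                   % equal weights

lossValue = maeLoss(Y,Yfit,W)
```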

    Function for transforming raw response values, specified as a function handle or function name. The default is "none", which means @(y)y, or no transformation. The function must accept a vector (the original response values) and return a vector of the same size (the transformed response values).

    Example: c = regressionGPComponent(ResponseTransform=@(y)exp(y))

    Example: c.ResponseTransform = "exp"

    Data Types: char | string | function_handle

    Component Properties

    The software sets component properties when you create the component. You can modify the component properties (excluding HasLearnables and HasLearned) using dot notation at any time. You cannot modify the HasLearnables and HasLearned properties directly.

    Component identifier, specified as a character vector or string scalar.

    Example: c = regressionGPComponent(Name="GP")

    Example: c.Name = "GPRegression"

    Data Types: char | string

    Names of the input ports, specified as a character vector, string array, or cell array of character vectors. If UseWeights is true, the component adds the input port "Weights" to Inputs.

    Example: c = regressionGPComponent(Inputs=["X","Y"])

    Example: c.Inputs = ["X1","Y1"]

    Data Types: char | string | cell

    Names of the output ports, specified as a character vector, string array, or cell array of character vectors.

    Example: c = regressionGPComponent(Outputs=["Responses","LossVal"])

    Example: c.Outputs = ["X","Y"]

    Data Types: char | string | cell

    Tags that enable the automatic connection of the component inputs with other components or pipelines, specified as a nonnegative integer vector. If you specify InputTags, the number of tags must match the number of inputs in Inputs. If UseWeights is true, the software adds a third input tag to InputTags.

    Example: c = regressionGPComponent(InputTags=[0 1])

    Example: c.InputTags = [1 0]

    Data Types: single | double

    Tags that enable the automatic connection of the component outputs with other components or pipelines, specified as a nonnegative integer vector. If you specify OutputTags, the number of tags must match the number of outputs in Outputs.

    Example: c = regressionGPComponent(OutputTags=[0 1])

    Example: c.OutputTags = [1 2]

    Data Types: single | double

    This property is read-only.

    Indicator for the learnables, returned as 1 (true). A value of 1 indicates that the component contains Learnables.

    Data Types: logical

    This property is read-only.

    Indicator showing the learning status of the component, returned as 0 (false) or 1 (true). A value of 1 indicates that the learn object function has been applied to the component, and the Learnables are nonempty.

    Data Types: logical

    Learnables

    The software sets learnables when you use the learn object function. You cannot modify learnables directly.

    This property is read-only.

    Trained model, returned as a CompactRegressionGP model object.

    Object Functions

    learn       Initialize and evaluate pipeline or component
    run         Execute pipeline or component for inference after learning
    reset       Reset pipeline or component
    series      Connect components in series to create pipeline
    parallel    Connect components or pipelines in parallel to create pipeline
    view        View diagram of pipeline inputs, outputs, components, and connections

    Examples

    Create a regressionGPComponent pipeline component.

    component = regressionGPComponent
    component = 
    
      regressionGPComponent with properties:
    
                Name: "RegressionGP"
              Inputs: ["Predictors"    "Response"]
           InputTags: [1 2]
             Outputs: ["Predictions"    "Loss"]
          OutputTags: [1 0]
    
       
    Learnables (HasLearned = false)
        TrainedModel: []
    
       
    Structural Parameters (locked)
          UseWeights: 0
    
    
    Show all parameters

    component is a regressionGPComponent object that contains one learnable, TrainedModel. This property remains empty until you pass data to the component during the learn phase.

    To use a linear basis function, set the BasisFunction property of the component to "linear".

    component.BasisFunction = "linear";

    Load the carsmall data set and remove missing entries from the data. Separate the predictor and response variables into two tables.

    load carsmall
    carData = table(Cylinders,Displacement,Horsepower,Weight,MPG);
    R = rmmissing(carData);
    X = R(:,["Cylinders","Displacement","Horsepower","Weight"]);
    Y = R(:,"MPG");

    Use the learn object function to train the regressionGPComponent using the car data.

    component = learn(component,X,Y)
    component = 
      regressionGPComponent with properties:
    
                 Name: "RegressionGP"
               Inputs: ["Predictors"    "Response"]
            InputTags: [1 2]
              Outputs: ["Predictions"    "Loss"]
           OutputTags: [1 0]
    
       
    Learnables (HasLearned = true)
         TrainedModel: [1×1 classreg.learning.regr.CompactRegressionGP]
    
       
    Structural Parameters (locked)
           UseWeights: 0
    
       
    Learn Parameters (locked)
        BasisFunction: 'Linear'
    
    
    Show all parameters
    

    Note that the HasLearned property is set to true, which indicates that the software trained the GPR model TrainedModel. You can use component to predict response values for new data using the run object function.

    References

    [1] Lagarias, J. C., J. A. Reeds, M. H. Wright, and P. E. Wright. "Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions." SIAM Journal on Optimization. Vol. 9, Number 1, 1998, pp. 112–147.

    Version History

    Introduced in R2026a