## Kernel (Covariance) Function Options

In supervised learning, points with similar predictor values ${x}_{i}$ are expected to have close response (target) values ${y}_{i}$. In Gaussian processes, the covariance function expresses this similarity [1]. It specifies the covariance between the two latent variables $f\left({x}_{i}\right)$ and $f\left({x}_{j}\right)$, where both ${x}_{i}$ and ${x}_{j}$ are d-by-1 vectors. In other words, it determines how the response at one point ${x}_{i}$ is affected by responses at other points ${x}_{j}$, $i\ne j$, i = 1, 2, ..., n. The covariance function $k\left({x}_{i},{x}_{j}\right)$ can be defined by various kernel functions, and it can be parameterized in terms of the kernel parameters in a vector $\theta$. Hence, the covariance function can be written as $k\left({x}_{i},{x}_{j}|\theta \right)$.

For many standard kernel functions, the kernel parameters are based on the signal standard deviation ${\sigma }_{f}$ and the characteristic length scale ${\sigma }_{l}$. The characteristic length scale briefly defines how far apart the input values ${x}_{i}$ can be for the response values to become uncorrelated. Both ${\sigma }_{l}$ and ${\sigma }_{f}$ must be greater than 0, which can be enforced through the unconstrained parametrization vector $\theta$, such that

`${\theta }_{1}=\mathrm{log}{\sigma }_{l},\text{ }{\theta }_{2}=\mathrm{log}{\sigma }_{f}.$`
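As a quick numeric sketch of this parametrization (in Python rather than MATLAB, with illustrative example values), the log transform maps the positive kernel parameters to the unconstrained vector $\theta$, and exponentiation maps any real-valued $\theta$ back to strictly positive parameters:

```python
import numpy as np

# Hypothetical positive kernel parameters (example values only).
sigma_l, sigma_f = 1.5, 0.7

# Unconstrained parametrization: theta = [log(sigma_l), log(sigma_f)].
theta = np.log([sigma_l, sigma_f])

# Mapping back with exp guarantees positivity for any real-valued theta.
sigma_l_back, sigma_f_back = np.exp(theta)
```

This is why the optimizer can search over all of $\mathbb{R}^{2}$ without ever producing an invalid (nonpositive) length scale or signal standard deviation.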

The built-in kernel (covariance) functions with the same length scale for each predictor are:

• Squared Exponential Kernel

This is one of the most commonly used covariance functions and is the default option for `fitrgp`. The squared exponential kernel function is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\mathrm{exp}\left[-\frac{1}{2}\frac{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}{{\sigma }_{l}^{2}}\right],$`

where ${\sigma }_{l}$ is the characteristic length scale, and ${\sigma }_{f}$ is the signal standard deviation.
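A minimal Python sketch of this formula (the function name and example values are illustrative, not part of `fitrgp`):

```python
import numpy as np

def squared_exponential(xi, xj, sigma_l, sigma_f):
    """Squared exponential kernel for d-dimensional points xi, xj."""
    diff = np.asarray(xi, float) - np.asarray(xj, float)
    return sigma_f**2 * np.exp(-0.5 * (diff @ diff) / sigma_l**2)

# At xi == xj the covariance equals the signal variance sigma_f^2.
k_same = squared_exponential([1.0, 2.0], [1.0, 2.0], sigma_l=1.0, sigma_f=2.0)
```

The covariance decays toward 0 as the squared Euclidean distance between the points grows relative to ${\sigma }_{l}^{2}$.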

• Exponential Kernel

You can specify the exponential kernel function using the `'KernelFunction','exponential'` name-value pair argument. This covariance function is defined by

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\mathrm{exp}\left(-\frac{r}{{\sigma }_{l}}\right),$`

where ${\sigma }_{l}$ is the characteristic length scale and

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
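A corresponding Python sketch (illustrative only), using the Euclidean distance r directly:

```python
import numpy as np

def exponential_kernel(xi, xj, sigma_l, sigma_f):
    """Exponential kernel: sigma_f^2 * exp(-r / sigma_l)."""
    r = np.linalg.norm(np.asarray(xi, float) - np.asarray(xj, float))
    return sigma_f**2 * np.exp(-r / sigma_l)
```

Unlike the squared exponential, this kernel decays in r rather than r², so the resulting sample paths are rougher (not differentiable).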

• Matern 3/2

You can specify the Matern 3/2 kernel function using the `'KernelFunction','matern32'` name-value pair argument. This covariance function is defined by

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\frac{\sqrt{3}r}{{\sigma }_{l}}\right)\mathrm{exp}\left(-\frac{\sqrt{3}r}{{\sigma }_{l}}\right),$`

where

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
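A short Python sketch of the Matern 3/2 formula (function name is illustrative):

```python
import numpy as np

def matern32(xi, xj, sigma_l, sigma_f):
    """Matern 3/2 kernel: sigma_f^2 * (1 + a) * exp(-a), a = sqrt(3)*r/sigma_l."""
    r = np.linalg.norm(np.asarray(xi, float) - np.asarray(xj, float))
    a = np.sqrt(3.0) * r / sigma_l
    return sigma_f**2 * (1.0 + a) * np.exp(-a)
```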

• Matern 5/2

You can specify the Matern 5/2 kernel function using the `'KernelFunction','matern52'` name-value pair argument. The Matern 5/2 covariance function is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\frac{\sqrt{5}r}{{\sigma }_{l}}+\frac{5{r}^{2}}{3{\sigma }_{l}^{2}}\right)\mathrm{exp}\left(-\frac{\sqrt{5}r}{{\sigma }_{l}}\right),$`

where

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
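In Python (illustrative sketch), writing $a=\sqrt{5}r/{\sigma }_{l}$ so that the polynomial factor becomes $1+a+{a}^{2}/3$:

```python
import numpy as np

def matern52(xi, xj, sigma_l, sigma_f):
    """Matern 5/2 kernel: sigma_f^2 * (1 + a + a^2/3) * exp(-a), a = sqrt(5)*r/sigma_l."""
    r = np.linalg.norm(np.asarray(xi, float) - np.asarray(xj, float))
    a = np.sqrt(5.0) * r / sigma_l
    return sigma_f**2 * (1.0 + a + a * a / 3.0) * np.exp(-a)
```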

• Rational Quadratic Kernel

You can specify the rational quadratic kernel function using the `'KernelFunction','rationalquadratic'` name-value pair argument. This covariance function is defined by

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}{\left(1+\frac{{r}^{2}}{2\alpha {\sigma }_{l}^{2}}\right)}^{-\alpha },$`

where ${\sigma }_{l}$ is the characteristic length scale, $\alpha$ is a positive-valued scale-mixture parameter, and

`$r=\sqrt{{\left({x}_{i}-{x}_{j}\right)}^{T}\left({x}_{i}-{x}_{j}\right)}$`

is the Euclidean distance between ${x}_{i}$ and ${x}_{j}$.
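A Python sketch of the rational quadratic formula (names are illustrative). As $\alpha \to \infty$, this kernel approaches the squared exponential kernel with the same ${\sigma }_{l}$:

```python
import numpy as np

def rational_quadratic(xi, xj, sigma_l, sigma_f, alpha):
    """Rational quadratic kernel: sigma_f^2 * (1 + r^2/(2*alpha*sigma_l^2))^(-alpha)."""
    r2 = np.sum((np.asarray(xi, float) - np.asarray(xj, float)) ** 2)
    return sigma_f**2 * (1.0 + r2 / (2.0 * alpha * sigma_l**2)) ** (-alpha)
```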

It is possible to use a separate length scale ${\sigma }_{m}$ for each predictor m, m = 1, 2, ..., d. The built-in kernel (covariance) functions with a separate length scale for each predictor implement automatic relevance determination (ARD) [2]. The unconstrained parametrization $\theta$ in this case is

`$\begin{array}{l}{\theta }_{m}=\mathrm{log}{\sigma }_{m},\text{ }\text{for}\text{\hspace{0.17em}}m=1,2,...,d\text{ }\\ {\theta }_{d+1}=\mathrm{log}{\sigma }_{f}.\end{array}$`

The built-in kernel (covariance) functions with a separate length scale for each predictor are:

• ARD Squared Exponential Kernel

You can specify this kernel function using the `'KernelFunction','ardsquaredexponential'` name-value pair argument. This covariance function is the squared exponential kernel function, with a separate length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\text{exp}\left[-\frac{1}{2}\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}\right].$`
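A Python sketch of the ARD variant (illustrative names), where `sigma_m` is the length-1-per-predictor vector of length scales:

```python
import numpy as np

def ard_squared_exponential(xi, xj, sigma_m, sigma_f):
    """ARD squared exponential: one length scale sigma_m[m] per predictor."""
    z = (np.asarray(xi, float) - np.asarray(xj, float)) / np.asarray(sigma_m, float)
    return sigma_f**2 * np.exp(-0.5 * np.sum(z**2))
```

A large value of `sigma_m[m]` effectively removes predictor m from the covariance, which is what makes this parametrization useful for relevance determination.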

• ARD Exponential Kernel

You can specify this kernel function using the `'KernelFunction','ardexponential'` name-value pair argument. This covariance function is the exponential kernel function, with a separate length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\mathrm{exp}\left(-r\right),$`

where

`$r=\sqrt{\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}}.$`

• ARD Matern 3/2

You can specify this kernel function using the `'KernelFunction','ardmatern32'` name-value pair argument. This covariance function is the Matern 3/2 kernel function, with a different length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\sqrt{3}\text{\hspace{0.17em}}r\right)\text{exp}\left(-\sqrt{3}\text{\hspace{0.17em}}r\right),$`

where

`$r=\sqrt{\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}}.$`

• ARD Matern 5/2

You can specify this kernel function using the `'KernelFunction','ardmatern52'` name-value pair argument. This covariance function is the Matern 5/2 kernel function, with a different length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}\left(1+\sqrt{5}\text{\hspace{0.17em}}r+\frac{5}{3}\text{\hspace{0.17em}}{r}^{2}\right)\mathrm{exp}\left(-\sqrt{5}\text{\hspace{0.17em}}r\right),$`

where

`$r=\sqrt{\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}}.$`

• ARD Rational Quadratic Kernel

You can specify this kernel function using the `'KernelFunction','ardrationalquadratic'` name-value pair argument. This covariance function is the rational quadratic kernel function, with a separate length scale for each predictor. It is defined as

`$k\left({x}_{i},{x}_{j}|\theta \right)={\sigma }_{f}^{2}{\left(1+\frac{1}{2\alpha }\sum _{m=1}^{d}\frac{{\left({x}_{im}-{x}_{jm}\right)}^{2}}{{\sigma }_{m}^{2}}\right)}^{-\alpha }.$`
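A Python sketch of the ARD rational quadratic formula (illustrative names only):

```python
import numpy as np

def ard_rational_quadratic(xi, xj, sigma_m, sigma_f, alpha):
    """ARD rational quadratic: per-predictor length scales sigma_m, mixture parameter alpha."""
    z2 = np.sum(((np.asarray(xi, float) - np.asarray(xj, float))
                 / np.asarray(sigma_m, float)) ** 2)
    return sigma_f**2 * (1.0 + z2 / (2.0 * alpha)) ** (-alpha)
```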

You can specify the kernel function using the `KernelFunction` name-value pair argument in a call to `fitrgp`. You can either specify one of the built-in kernel function options or specify a custom function. When providing the initial kernel parameter values for a built-in kernel function, input the initial values for the signal standard deviation and the characteristic length scale(s) as a numeric vector. When providing the initial kernel parameter values for a custom kernel function, input the initial values for the unconstrained parametrization vector $\theta$. `fitrgp` uses analytical derivatives to estimate parameters when using a built-in kernel function, and numerical derivatives when using a custom kernel function.

## References

[1] Rasmussen, C. E. and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press. Cambridge, Massachusetts, 2006.

[2] Neal, R. M. Bayesian Learning for Neural Networks. Springer, New York. Lecture Notes in Statistics, 118, 1996.