`predict` uses three quantities to classify observations: posterior probability, prior probability, and cost.

`predict` classifies so as to minimize the expected classification cost:

$$\widehat{y}=\underset{y=1,\mathrm{...},K}{\mathrm{arg}\mathrm{min}}{\displaystyle \sum _{k=1}^{K}\widehat{P}\left(k|x\right)C\left(y|k\right)},$$

where

- $$\widehat{y}$$ is the predicted classification.
- *K* is the number of classes.
- $$\widehat{P}\left(k|x\right)$$ is the posterior probability of class *k* for observation *x*.
- $$C\left(y|k\right)$$ is the cost of classifying an observation as *y* when its true class is *k*.
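The minimization above can be sketched in a few lines of base MATLAB. The posterior vector and cost matrix here are hypothetical values chosen for illustration, and the matrix follows the `Cost(i,j)` convention (true class `i`, predicted class `j`), so $$C\left(y|k\right)$$ corresponds to `Cost(k,y)`.

```matlab
% Hypothetical posterior probabilities for one observation, K = 3 classes.
posterior = [0.5 0.3 0.2];          % posterior(k) = P(k|x)

% Hypothetical cost matrix: Cost(i,j) is the cost of predicting class j
% when the true class is i, so C(y|k) in the formula equals Cost(k,y).
Cost = [0 1 1;
        5 0 1;
        1 1 0];

% expectedCost(y) = sum over k of posterior(k) * Cost(k,y)
expectedCost = posterior * Cost;

% The predicted class is the one with the smallest expected cost.
[minCost, yhat] = min(expectedCost);
```

In this sketch class 1 has the largest posterior probability, but class 2 has the smallest expected cost, because predicting class 1 for a true class 2 observation is penalized heavily.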

The space of `X` values divides into regions where a classification `Y` is a particular value. The regions are separated by straight lines for linear discriminant analysis, and by conic sections (ellipses, hyperbolas, or parabolas) for quadratic discriminant analysis. For a visualization of these regions, see Create and Visualize Discriminant Analysis Classifier.

The posterior probability that a point *x* belongs to class *k* is proportional to the product of the prior probability and the multivariate normal density. The density function of the multivariate normal with 1-by-*d* mean $${\mu}_{k}$$ and *d*-by-*d* covariance $${\Sigma}_{k}$$ at a 1-by-*d* point *x* is

$$P\left(x|k\right)=\frac{1}{{\left({\left(2\pi \right)}^{d}\left|{\Sigma}_{k}\right|\right)}^{1/2}}\mathrm{exp}\left(-\frac{1}{2}\left(x-{\mu}_{k}\right){\Sigma}_{k}^{-1}{\left(x-{\mu}_{k}\right)}^{T}\right),$$

where $$\left|{\Sigma}_{k}\right|$$ is the determinant of $${\Sigma}_{k}$$, and $${\Sigma}_{k}^{-1}$$ is the inverse matrix.

Let *P*(*k*) represent the prior probability of
class *k*. Then the posterior probability that an observation
*x* is of class *k* is

$$\widehat{P}\left(k|x\right)=\frac{P\left(x|k\right)P\left(k\right)}{P\left(x\right)},$$

where *P*(*x*) is a normalization constant,
namely, the sum over *k* of
*P*(*x*|*k*)*P*(*k*).
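As a rough illustration of the density and posterior formulas, the following base MATLAB sketch evaluates both for a single point. The means, covariances, priors, and the point itself are hypothetical values, not output from a trained classifier.

```matlab
% Hypothetical two-class problem in d = 2 dimensions.
mu    = [0 0; 2 2];                      % mu(k,:) is the 1-by-d mean of class k
Sigma = cat(3, eye(2), [2 0.5; 0.5 1]);  % Sigma(:,:,k) is the d-by-d covariance
prior = [0.5 0.5];                       % prior(k) = P(k)
x     = [1 1];                           % 1-by-d point to classify

K = size(mu, 1);
d = size(mu, 2);
density = zeros(1, K);
for k = 1:K
    dev = x - mu(k,:);
    S = Sigma(:,:,k);
    % P(x|k) from the density formula; dev / S computes dev * inv(S).
    density(k) = exp(-0.5 * (dev / S) * dev') / sqrt((2*pi)^d * det(S));
end

% Bayes rule: normalize by P(x), the sum over k of P(x|k) * P(k).
joint = density .* prior;
posterior = joint / sum(joint);
```

The division by `sum(joint)` is the normalization constant *P*(*x*), so the entries of `posterior` sum to 1.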

The prior probability is one of three choices:

- `'uniform'`: The prior probability of class `k` is 1 over the total number of classes.
- `'empirical'`: The prior probability of class `k` is the number of training samples of class `k` divided by the total number of training samples.
- A numeric vector: The prior probability of class `k` is the `k`th element of the `Prior` vector. See `fitcdiscr`.
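The three choices can be illustrated with a small base MATLAB sketch. The label vector and the custom prior are hypothetical, and classes are assumed to be coded as integers 1 through `K`.

```matlab
% Hypothetical training labels for K = 3 classes, coded 1..K.
y = [1 1 1 2 2 3 3 3 3 3]';
K = 3;

% 'uniform': each class gets 1/K.
uniformPrior = ones(1, K) / K;

% 'empirical': class counts divided by the number of training samples.
counts = accumarray(y, 1)';         % counts(k) = number of samples in class k
empiricalPrior = counts / numel(y);

% Numeric vector: element k is the prior probability of class k.
customPrior = [0.2 0.3 0.5];
```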

After creating a classifier `obj`, you can set the prior using dot notation:

obj.Prior = v;

where `v` is a vector of positive elements representing the frequency with which each class occurs. You do not need to retrain the classifier when you set a new prior.
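A minimal usage sketch, assuming the Statistics and Machine Learning Toolbox and its bundled `fisheriris` sample data are available. The prior values are arbitrary and chosen only for illustration.

```matlab
load fisheriris                      % meas (150-by-4), species (150-by-1 cell)
obj = fitcdiscr(meas, species);      % trained with the default prior

% Reweight the classes without retraining. Element k applies to the
% kth class in obj.ClassNames.
obj.Prior = [1 1 2];

% Later calls to predict use the new prior.
label = predict(obj, meas(1:5, :));
```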

There are two costs associated with discriminant analysis classification: the true misclassification cost per class, and the expected misclassification cost per observation.

`Cost(i,j)` is the cost of classifying an observation into class `j` if its true class is `i`. By default, `Cost(i,j)=1` if `i~=j`, and `Cost(i,j)=0` if `i=j`. In other words, the cost is `0` for correct classification, and `1` for incorrect classification.

You can set any cost matrix you like when creating a classifier. Pass the cost matrix in the `Cost` name-value pair in `fitcdiscr`.

After you create a classifier `obj`, you can set a custom cost using dot notation:

obj.Cost = B;

`B` is a square matrix of size `K`-by-`K` when there are `K` classes. You do not need to retrain the classifier when you set a new cost.
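Both ways of supplying a cost matrix can be sketched as follows, again assuming the Statistics and Machine Learning Toolbox and the `fisheriris` sample data. The penalty of 10 is a hypothetical value.

```matlab
load fisheriris
K = 3;                               % fisheriris has three classes

% Default cost: 0 on the diagonal, 1 elsewhere.
defaultCost = ones(K) - eye(K);

% Hypothetical custom cost: classifying a true class-2 observation
% into class 3 costs 10 instead of 1.
B = defaultCost;
B(2,3) = 10;

% Option 1: pass the cost matrix when training.
obj1 = fitcdiscr(meas, species, 'Cost', B);

% Option 2: train with the default cost, then assign with dot notation.
obj2 = fitcdiscr(meas, species);
obj2.Cost = B;
```

Rows and columns of `B` follow the class order in the classifier's `ClassNames` property.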

Suppose you have `Nobs` observations that you want to classify with a trained discriminant analysis classifier `obj`. Suppose you have `K` classes. You place the observations into a matrix `Xnew` with one observation per row. The command

[label,score,cost] = predict(obj,Xnew)

returns, among other outputs, a cost matrix of size `Nobs`-by-`K`. Each row of the cost matrix contains the expected (average) cost of classifying the observation into each of the `K` classes. `cost(n,k)` is

$$\sum _{i=1}^{K}\widehat{P}\left(i|Xnew(n)\right)C\left(k|i\right),$$

where

- *K* is the number of classes.
- $$\widehat{P}\left(i|Xnew(n)\right)$$ is the posterior probability of class *i* for observation *Xnew*(*n*).
- $$C\left(k|i\right)$$ is the cost of classifying an observation as *k* when its true class is *i*.
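The sum above can be checked against the third output of `predict`. This sketch assumes the Statistics and Machine Learning Toolbox and the `fisheriris` sample data, and uses the fact that the `score` output of a discriminant analysis classifier holds posterior probabilities. The choice of rows for `Xnew` is arbitrary.

```matlab
load fisheriris
obj = fitcdiscr(meas, species);
Xnew = meas([1 51 101], :);          % Nobs = 3 observations, one per row

[label, score, cost] = predict(obj, Xnew);

% score(n,i) is the posterior probability of class i for row n, and
% obj.Cost(i,k) is C(k|i), so the sum in the formula is a matrix product.
expectedCost = score * obj.Cost;     % Nobs-by-K

% Index of the lowest-cost class for each observation.
[~, idx] = min(cost, [], 2);
```

Under these assumptions, `expectedCost` should match `cost` up to rounding, and `obj.ClassNames(idx)` should agree with `label`.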