
## Gaussian Mixture Models

Gaussian mixture models are formed by combining multivariate normal density components. In Statistics and Machine Learning Toolbox™ software, use the `gmdistribution` class to fit data using an expectation-maximization (EM) algorithm, which assigns posterior probabilities to each component density with respect to each observation. The fitting method uses an iterative algorithm that converges to a local optimum, so the result can depend on the starting values.

Clustering using Gaussian mixture models is sometimes considered a soft clustering method. The posterior probabilities for each point indicate that each data point has some probability of belonging to each cluster. For more information on clustering with Gaussian mixture models, see Clustering Using Gaussian Mixture Models. This section describes their creation.
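As a minimal sketch of what "soft" means here, the following computes posterior probabilities for three points under a small two-component model. The parameter values and the variable names (`gm`, `pts`, `P`, `idx`) are chosen only for illustration; `posterior` and `cluster` are methods of `gmdistribution`.

```matlab
% Illustrative two-component model (parameter values are assumptions).
mu    = [1 2; -3 -5];                      % one row per component mean
sigma = cat(3, [2 0; 0 0.5], [1 0; 0 1]);  % one 2-by-2 covariance per component
gm    = gmdistribution(mu, sigma, [0.5 0.5]);

pts = [1 2; -3 -5; -1 -1.5];   % two points at the means, one in between
P   = posterior(gm, pts);      % P(i,j): probability that point i belongs to component j
idx = cluster(gm, pts);        % hard assignment: component with the largest posterior

disp(P)
disp(idx')
```

Each row of `P` sums to 1. A point near one component mean gets a posterior close to 1 for that component, while a point between the components has its probability split across both.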

### Creating Gaussian Mixture Models

#### Specifying a Model

Use the `gmdistribution` constructor to create Gaussian mixture models with specified means, covariances, and mixture proportions.

First, define the means, covariances, and mixture proportions.

```
MU = [1 2;-3 -5];                    % Means
SIGMA = cat(3,[2 0;0 .5],[1 0;0 1]); % Covariances
p = ones(1,2)/2;                     % Mixing proportions
```

Then, create an object of the `gmdistribution` class defining a two-component mixture of bivariate Gaussian distributions:

```
obj = gmdistribution(MU,SIGMA,p);
```

Display properties of the object with the MATLAB® function `fieldnames`:

```
properties = fieldnames(obj)
```
```
properties = 

    'NumVariables'
    'DistributionName'
    'NumComponents'
    'ComponentProportion'
    'SharedCovariance'
    'NumIterations'
    'RegularizationValue'
    'NegativeLogLikelihood'
    'CovarianceType'
    'mu'
    'Sigma'
    'AIC'
    'BIC'
    'Converged'
```

The `gmdistribution` reference page describes these properties. To access the value of a property, use dot indexing. For example, access the dimensions of the object.

```
dimension = obj.NumVariables
```
```
dimension =

     2
```

Access the distribution name.

```
name = obj.DistributionName
```
```
name =

gaussian mixture distribution
```

Use the methods `pdf` and `cdf` to compute values and visualize the object:

```
figure
ezsurf(@(x,y)pdf(obj,[x y]),[-10 10],[-10 10])
```

```
figure
ezsurf(@(x,y)cdf(obj,[x y]),[-10 10],[-10 10])
```
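To make the meaning of the mixture density concrete, the following sketch evaluates `pdf` at a single point and compares it with the weighted sum of the component densities computed with `mvnpdf`. The query point `x0` and the variable names are arbitrary choices for illustration.

```matlab
MU    = [1 2; -3 -5];
SIGMA = cat(3, [2 0; 0 0.5], [1 0; 0 1]);
p     = [0.5 0.5];
gm    = gmdistribution(MU, SIGMA, p);

x0 = [0 0];   % arbitrary query point

% Density reported by the mixture object.
direct = pdf(gm, x0);

% Same quantity assembled by hand: proportion-weighted component densities.
manual = p(1)*mvnpdf(x0, MU(1,:), SIGMA(:,:,1)) + ...
         p(2)*mvnpdf(x0, MU(2,:), SIGMA(:,:,2));

fprintf('pdf: %.6g, weighted sum: %.6g\n', direct, manual);
```

The two values should agree to within floating-point rounding.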

#### Fitting a Model to Data

You can also create Gaussian mixture models by fitting a parametric model with a specified number of components to data. `fitgmdist` uses the syntax `obj = fitgmdist(X,k)`, where `X` is a data matrix and `k` is the specified number of components. Choosing a suitable number of components `k` is essential for creating a useful model of the data: too few components fail to model the data accurately, and too many lead to an overfit model with singular covariance matrices.
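When a fit runs into ill-conditioned covariance estimates or poor local optima, two `fitgmdist` name-value options can help: `'RegularizationValue'` and `'Replicates'`. The sketch below deliberately requests more components than were used to generate the data. The sample sizes, seed, and option values are illustrative assumptions, not recommendations.

```matlab
rng(1);  % fix the seed so the data and the random initializations are repeatable
X = [mvnrnd([1 2], [2 0; 0 0.5], 200); mvnrnd([-3 -5], eye(2), 200)];

% Request three components although the data come from two.
gm = fitgmdist(X, 3, ...
    'RegularizationValue', 0.01, ...       % small value added to each covariance diagonal
    'Replicates', 5, ...                   % run EM 5 times and keep the best fit
    'Options', statset('MaxIter', 500));

% Smallest eigenvalue across all component covariance matrices.
minEig = min(arrayfun(@(j) min(eig(gm.Sigma(:,:,j))), 1:gm.NumComponents));
fprintf('Converged: %d, smallest covariance eigenvalue: %.4f\n', gm.Converged, minEig);
```

Regularization keeps the covariance estimates positive definite, but it does not make an overfit model a good one. Choosing `k` still matters.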

The following example illustrates this approach.

First, create some data from a mixture of two bivariate Gaussian distributions using the `mvnrnd` function:

```
MU1 = [1 2];
SIGMA1 = [2 0; 0 .5];
MU2 = [-3 -5];
SIGMA2 = [1 0; 0 1];
X = [mvnrnd(MU1,SIGMA1,1000); mvnrnd(MU2,SIGMA2,1000)];

figure
scatter(X(:,1),X(:,2),10,'.')
```

Next, fit a two-component Gaussian mixture model:

```
options = statset('Display','final');
obj = fitgmdist(X,2,'Options',options);

hold on
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);
hold off
```
```
18 iterations, log-likelihood = -7058.35
```

Among the properties of the fit are the parameter estimates.

Display the estimates for the component means, covariances, and mixture proportions:

```
ComponentMeans = obj.mu
ComponentCovariances = obj.Sigma
MixtureProportions = obj.ComponentProportion
```
```
ComponentMeans =

   -2.9617   -4.9727
    0.9539    2.0261


ComponentCovariances(:,:,1) =

    1.0100    0.0059
    0.0059    0.9897


ComponentCovariances(:,:,2) =

    1.9939   -0.0092
   -0.0092    0.4981


MixtureProportions =

    0.5000    0.5000
```

Among models with one to four components, the two-component model minimizes the Akaike information criterion (AIC):

```
AIC = zeros(1,4);
obj = cell(1,4);
for k = 1:4
    obj{k} = fitgmdist(X,k);
    AIC(k)= obj{k}.AIC;
end

[minAIC,numComponents] = min(AIC);
numComponents
```
```
numComponents =

     2
```

Display the model.

```
model = obj{2}
```
```
model = 

Gaussian mixture distribution with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.500000
Mean:   -2.9617   -4.9727

Component 2:
Mixing proportion: 0.500000
Mean:    0.9539    2.0261
```

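Both the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are based on the negative log-likelihood of the data, with penalty terms for the number of estimated parameters. Lower values indicate a better trade-off between fit and model complexity. You can use them to determine an appropriate number of components for a model when the number of components is unspecified.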
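For reference, the standard textbook definitions of the two criteria are given below, where $\mathrm{NLL}$ is the negative log-likelihood of the fitted model, $m$ is the number of estimated parameters, and $n$ is the number of observations. That the `AIC` and `BIC` properties follow exactly this scaling is an assumption here; check the `gmdistribution` reference page.

```latex
\mathrm{AIC} = 2\,\mathrm{NLL} + 2m
\qquad
\mathrm{BIC} = 2\,\mathrm{NLL} + m \ln n
```

Because $\ln n > 2$ once $n \ge 8$, BIC penalizes additional components more heavily than AIC for all but very small samples, so it tends to select the same number of components or fewer.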

### Simulating Gaussian Mixtures

Use the method `random` of the `gmdistribution` class to generate random data from a Gaussian mixture model created with `gmdistribution` or `fitgmdist`.

For example, the following specifies a `gmdistribution` object consisting of a two-component mixture of bivariate Gaussian distributions:

```
MU = [1 2;-3 -5];
SIGMA = cat(3,[2 0;0 .5],[1 0;0 1]);
p = ones(1,2)/2;
obj = gmdistribution(MU,SIGMA,p);
```
```
figure
ezcontour(@(x,y)pdf(obj,[x y]),[-10 10],[-10 10])
hold on
```

Use the `random` method to generate 1000 random values and overlay them on the contour plot:

```
Y = random(obj,1000);
scatter(Y(:,1),Y(:,2),10,'.')
```
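The `random` method can also return the index of the component that generated each sample, which is useful for checking that the simulated data reflect the mixing proportions. The following sketch uses an arbitrary seed and illustrative variable names.

```matlab
rng(0);  % arbitrary seed, for repeatability only
gm = gmdistribution([1 2; -3 -5], cat(3, [2 0; 0 0.5], [1 0; 0 1]), [0.5 0.5]);

% Second output: component index (1 or 2) that generated each row of Y.
[Y, compIdx] = random(gm, 1000);

% Empirical share of samples drawn from each component.
counts = accumarray(compIdx, 1, [gm.NumComponents 1]);
disp(counts' / 1000)
```

With equal mixing proportions, each share should be close to 0.5, with some sampling variation.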