Gaussian Mixture Models

Gaussian mixture models are formed by combining multivariate normal density components. In Statistics and Machine Learning Toolbox™ software, Gaussian mixture models are represented by objects of the gmdistribution class. You can create these objects directly from specified parameters, or fit them to data with the fitgmdist function, which uses an iterative expectation maximization (EM) algorithm that assigns posterior probabilities to each component density with respect to each observation and converges to a local optimum.

Clustering with Gaussian mixture models is sometimes considered a soft clustering method: instead of assigning each observation to exactly one cluster, the model gives each data point a posterior probability of belonging to each cluster. For more information on clustering with Gaussian mixture models, see Clustering Using Gaussian Mixture Models. This section describes how to create Gaussian mixture models.
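
For example, given a fitted or fully specified gmdistribution object, the posterior method returns these membership probabilities. The following is a minimal sketch only; gm and X are placeholders for a model object and an n-by-d data matrix that you supply.

% Sketch: soft cluster memberships from a gmdistribution object gm
% and a data matrix X (both placeholders, not defined in this section)
P = posterior(gm,X);  % n-by-k matrix; P(i,j) is the probability that
                      % observation i belongs to component j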

Creating Gaussian Mixture Models

Specifying a Model

Use the gmdistribution constructor to create Gaussian mixture models with specified means, covariances, and mixture proportions.

First, define the means, covariances, and mixture proportions.

MU = [1 2;-3 -5]; % Means
SIGMA = cat(3,[2 0;0 .5],[1 0;0 1]); % Covariances
p = ones(1,2)/2; % Mixing proportions

Then, create an object of the gmdistribution class defining a two-component mixture of bivariate Gaussian distributions:

obj = gmdistribution(MU,SIGMA,p);

Display properties of the object with the MATLAB® function fieldnames:

properties = fieldnames(obj)
properties = 

    'NumVariables'
    'DistributionName'
    'NumComponents'
    'ComponentProportion'
    'SharedCovariance'
    'NumIterations'
    'RegularizationValue'
    'NegativeLogLikelihood'
    'CovarianceType'
    'mu'
    'Sigma'
    'AIC'
    'BIC'
    'Converged'

The gmdistribution reference page describes these properties. To access the value of a property, use dot indexing. For example, access the dimensions of the object.

dimension = obj.NumVariables
dimension =

     2

Access the distribution name.

name = obj.DistributionName
name =

gaussian mixture distribution

Use the pdf and cdf methods to compute and visualize the probability density function and cumulative distribution function of the mixture:

figure
ezsurf(@(x,y)pdf(obj,[x y]),[-10 10],[-10 10])

figure
ezsurf(@(x,y)cdf(obj,[x y]),[-10 10],[-10 10])

Fitting a Model to Data

You can also create Gaussian mixture models by fitting a parametric model with a specified number of components to data. Use the fitgmdist function with the syntax obj = fitgmdist(X,k), where X is a data matrix and k is the specified number of components. Choosing a suitable number of components k is essential for creating a useful model of the data: too few components fail to model the data accurately, while too many lead to an over-fit model with singular covariance matrices.

The following example illustrates this approach.

First, create some data from a mixture of two bivariate Gaussian distributions using the mvnrnd function:

MU1 = [1 2];
SIGMA1 = [2 0; 0 .5];
MU2 = [-3 -5];
SIGMA2 = [1 0; 0 1];
X = [mvnrnd(MU1,SIGMA1,1000);
     mvnrnd(MU2,SIGMA2,1000)];
figure
scatter(X(:,1),X(:,2),10,'.')

Next, fit a two-component Gaussian mixture model:

options = statset('Display','final');
obj = fitgmdist(X,2,'Options',options);
hold on
h = ezcontour(@(x,y)pdf(obj,[x y]),[-8 6],[-8 6]);
hold off
18 iterations, log-likelihood = -7058.35

Among the properties of the fit are the parameter estimates.

Display the estimates of the means (mu), covariances (Sigma), and mixture proportions:

ComponentMeans = obj.mu
ComponentCovariances = obj.Sigma
MixtureProportions = obj.ComponentProportion
ComponentMeans =

   -2.9617   -4.9727
    0.9539    2.0261


ComponentCovariances(:,:,1) =

    1.0100    0.0059
    0.0059    0.9897


ComponentCovariances(:,:,2) =

    1.9939   -0.0092
   -0.0092    0.4981


MixtureProportions =

    0.5000    0.5000
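
If you request more components than the data support, the covariance estimates can become ill conditioned (the over-fitting problem noted earlier). One possible remedy, sketched here with an arbitrarily chosen value, is the RegularizationValue name-value pair of fitgmdist, which adds a small positive number to the diagonal of every covariance estimate:

% Sketch: regularized fit; the value 0.01 is arbitrary and chosen
% for illustration only
objReg = fitgmdist(X,2,'RegularizationValue',0.01);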

To determine a suitable number of components, fit models with one through four components and compare their Akaike information criterion (AIC) values. The two-component model minimizes the AIC:

AIC = zeros(1,4);
obj = cell(1,4);
for k = 1:4
    obj{k} = fitgmdist(X,k);
    AIC(k)= obj{k}.AIC;
end

[minAIC,numComponents] = min(AIC);
numComponents
numComponents =

     2

Display the model.

model = obj{2}
model = 

Gaussian mixture distribution with 2 components in 2 dimensions
Component 1:
Mixing proportion: 0.500000
Mean:   -2.9617   -4.9727

Component 2:
Mixing proportion: 0.500000
Mean:    0.9539    2.0261



Both the Akaike information criterion (AIC) and the Bayes information criterion (BIC) are negative log-likelihoods of the data with penalty terms for the number of estimated parameters. You can use them to determine an appropriate number of components for a model when the number of components is unspecified.
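
For example, you can compare BIC values in the same way as AIC. This sketch reuses the cell array obj of models created in the loop above:

% Sketch: BIC comparison over the models fit above (obj{1} through obj{4})
BIC = zeros(1,4);
for k = 1:4
    BIC(k) = obj{k}.BIC;
end
[minBIC,numComponentsBIC] = min(BIC);

BIC penalizes the number of estimated parameters more heavily than AIC, so it can select a model with fewer components.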

Simulating Gaussian Mixtures

Use the random method of the gmdistribution class to generate random data from a Gaussian mixture model created with gmdistribution or fitgmdist.

For example, the following specifies a gmdistribution object consisting of a two-component mixture of bivariate Gaussian distributions:

MU = [1 2;-3 -5];
SIGMA = cat(3,[2 0;0 .5],[1 0;0 1]);
p = ones(1,2)/2;
obj = gmdistribution(MU,SIGMA,p);
figure
ezcontour(@(x,y)pdf(obj,[x y]),[-10 10],[-10 10])
hold on

Use random to generate 1000 random values:

Y = random(obj,1000);
scatter(Y(:,1),Y(:,2),10,'.')
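
The sample changes every time you call random. If you need reproducible samples, seed the random number generator first; the seed below is arbitrary:

rng(1)                 % arbitrary seed, for reproducibility only
Y = random(obj,1000);  % returns the same 1000 points on every run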
