## EM algorithm for Gaussian mixture model with background noise

version 1.1.0.0 (3.07 KB)
Standard EM algorithm to fit a GMM with the (optional) consideration of background noise.

Updated 16 May 2012

This is the standard EM algorithm for GMMs, as presented in Chapter 9 of Bishop's book "Pattern Recognition and Machine Learning", with one small exception: a uniform distribution is added to the mixture to pick up background noise/speckle, i.e. data points that one would not want to associate with any cluster.
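As a sketch of what this modification amounts to, the mixture density and the E-step responsibilities can be written as below. The symbol $V$ is an assumption: it stands for the volume of the region over which the noise is taken to be uniform (for example, a bounding box of the data). The description does not say how the submission defines it.

```latex
% Mixture of k Gaussians plus one uniform "background" component (index k+1)
p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j) + \pi_{k+1} \, \frac{1}{V},
\qquad \sum_{j=1}^{k+1} \pi_j = 1

% E-step: responsibilities of each component for data point x_n
\gamma(z_{nj}) = \frac{\pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}{p(x_n)} \quad (j = 1, \dots, k),
\qquad
\gamma(z_{n,k+1}) = \frac{\pi_{k+1} / V}{p(x_n)}

% M-step for the Gaussian components (unchanged from the standard GMM updates)
N_j = \sum_{n=1}^{N} \gamma(z_{nj}), \quad
\mu_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma(z_{nj}) \, x_n, \quad
\Sigma_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma(z_{nj}) (x_n - \mu_j)(x_n - \mu_j)^{\mathsf T}, \quad
\pi_j = \frac{N_j}{N}
```

Under this reading the uniform component has no mean or covariance to update; only its mixing weight $\pi_{k+1} = N_{k+1}/N$ changes between iterations.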

NOTE: This function requires the MATLAB Statistics Toolbox and, for plotting the ellipses, the function error_ellipse, available from http://www.mathworks.com/matlabcentral/fileexchange/4705. It also requires at least MATLAB 7.9 (2009b).

For a demo, simply run GM_EM();
Plotting is provided automatically for 1D/2D cases with 5 GMs or fewer.

Usage:

% GM_EM - fit a Gaussian mixture model to N points located in n-dimensional space.
% GM_EM(X,k) - fit a GMM to X, where X is N x n and k is the number of
%              clusters. Algorithm follows steps outlined in Bishop
%              (2009) 'Pattern Recognition and Machine Learning', Chapter 9.

% Optional inputs
%   bn_noise  - allow for uniform background noise term ('T' or 'F',
%               default 'T'). If 'T', relevant classification uses the
%               (k+1)th cluster
%   reps      - number of repetitions with different initial conditions
%               (default = 10). Note: only the best fit (in a likelihood
%               sense) is returned.
%   max_iters - maximum iteration number for EM algorithm (default = 100)
%   tol       - tolerance value (default = 0.01)

% Outputs
%   idx - classification/labelling of data in X
%   mu  - GM centres
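
A minimal calling sketch based only on the help text above. Two things are assumptions rather than documented facts: that the outputs come back in the order `[idx, mu]`, and that with `bn_noise = 'T'` the label `k+1` in `idx` marks background noise. The synthetic data and the variable names `A`, `B`, `noise` and `is_noise` are illustrative only.

```matlab
% Synthetic 2-D data: two Gaussian blobs plus uniform background speckle.
A = bsxfun(@plus, 0.5*randn(200,2), [-2 0]);  % blob centred near (-2, 0)
B = bsxfun(@plus, 0.5*randn(200,2), [ 2 1]);  % blob centred near ( 2, 1)
noise = 8*rand(40,2) - 4;                     % uniform on [-4, 4] x [-4, 4]
X = [A; B; noise];                            % N x n, here 440 x 2
k = 2;                                        % number of Gaussian clusters

% Only run the fit if GM_EM.m is on the path.
if exist('GM_EM', 'file')
    % ASSUMPTION: outputs are returned in the order listed in the help text.
    [idx, mu] = GM_EM(X, k);
    % ASSUMPTION: with bn_noise = 'T', label k+1 marks background noise.
    is_noise = (idx == k + 1);
    fprintf('%d of %d points labelled as noise\n', nnz(is_noise), size(X,1));
end
```

Note that `X` is N x n (rows are points, columns are dimensions), so a non-square input such as 440 x 2 matches the documented shape.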

### Cite As

Andrew (2020). EM algorithm for Gaussian mixture model with background noise (https://www.mathworks.com/matlabcentral/fileexchange/36721-em-algorithm-for-gaussian-mixture-model-with-background-noise), MATLAB Central File Exchange.

Anders Ueland

Thank you! A very nice contribution.

I used your program on a feature vector with 20 000 samples and tried to make it faster. Replacing the matrix product with a vectorized implementation that avoids the diag function gave a speedup of a factor of 40.

Current matrix product implementation:
% tot_sum = (X'-repmat(mu(:,j),1,N)) * diag(gamma_znk(:,j)) * (X'-repmat(mu(:,j),1,N))';

Suggested implementation:
% tot_sum = bsxfun(@times, X'-repmat(mu(:,j),1,N), gamma_znk(:,j)') * (X'-repmat(mu(:,j),1,N))';
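
The two expressions can be checked against each other on synthetic data, as in the sketch below. It reuses the names `X`, `mu`, `gamma_znk` and `j` from the lines above, but the data, the sizes and the helper variables `D`, `S_diag`, `S_bsx` and `max_abs_diff` are made up for illustration. The likely reason for the speedup is that `diag(...)` builds a full N x N matrix (for N = 20 000, that is 4e8 doubles, about 3.2 GB), whereas `bsxfun` only scales the columns of an n x N matrix.

```matlab
% Self-contained check that both forms give the same weighted scatter matrix.
N = 500; n = 3; j = 1;
X = randn(N, n);                  % N x n synthetic data
mu = mean(X, 1)';                 % n x 1 centre, used as column j of mu
gamma_znk = rand(N, 1);           % synthetic responsibilities for component j

D = X' - repmat(mu(:,j), 1, N);   % n x N centred data

% Original form: materialises an N x N diagonal weight matrix.
S_diag = D * diag(gamma_znk(:,j)) * D';

% Suggested form: scales column i of D by gamma_znk(i,j), no N x N matrix.
S_bsx = bsxfun(@times, D, gamma_znk(:,j)') * D';

% Largest elementwise discrepancy; should be at round-off level.
max_abs_diff = max(abs(S_diag(:) - S_bsx(:)));
```

Computing the centred matrix once (here `D`) rather than twice, as in both original expressions, also saves a `repmat` call per component.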

David Provencher

I'm trying to run the code, but I keep getting this warning:

'Warning: chol failed, algorithm abandoned';

because the cholcov(Sigma(:,:,j),0); line always fails at the 2nd iteration (bn_noise='T') or 3rd iteration (bn_noise='F').

FYI, I have no NaN values in my data, and I get coherent results with kmeans() and emgm() [the submission that inspired this one]. Actually, no matter what data I feed into the function (e.g. square matrix, rand(m,n), ...), this step always fails.

Any insight on this?
Thanks,
David

Jin Wang

The input has to be square, right?
If my input data is not square, like 200x10, what should I do?
Thanks!

peter

An "unknown" cluster: this is what we have been looking for. Thanks a lot.