Gaussian mixture model sometimes seems to fit very badly
In the following code, I fit a Gaussian mixture model (GMM) to some randomly sampled data. I do this twice. Each time, the data come from two well-separated Gaussians; the only difference between the two runs is the seed I use for the random number generator.
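N = 100000;          % samples per component
EFFECT_SIZE = 5;     % distance between the two component means
seedList = [1 6];    % the two seeds compared below
for s = seedList
    rng(s)                                      % seed the random number generator
    X = [randn(N,1); randn(N,1)+EFFECT_SIZE];   % N draws from N(0,1) plus N draws from N(EFFECT_SIZE,1)
    figure
    hist(X,101)                                 % histogram with 101 bins
    GMModel = fitgmdist(X,2)                    % fit a 2-component GMM; no semicolon, so the fit is displayed
end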
If you run that code (you will need the Statistics Toolbox), you will see that the first distribution is fit very well and the second one terribly. I am trying to understand why. I would expect such well-separated peaks to be fit well essentially every time.
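To make "fit very well" versus "fit terribly" concrete, one option is to compare the fitted means with the true ones (0 and EFFECT_SIZE) and to check whether EM reported convergence. This is a minimal sketch, assuming the gmdistribution properties mu, Converged and NumIterations; the 0.5 tolerance (tol) is a hypothetical threshold chosen only for illustration.

```matlab
% Sketch: inspect a single fit (requires the Statistics Toolbox).
N = 100000;
EFFECT_SIZE = 5;
rng(6)                                      % second seed in seedList, the one reported to fit badly
X = [randn(N,1); randn(N,1)+EFFECT_SIZE];
GMModel = fitgmdist(X,2);

muSorted = sort(GMModel.mu);                % component order is arbitrary, so sort the means
tol = 0.5;                                  % hypothetical tolerance for calling a fit "good"
isGoodFit = all(abs(muSorted - [0; EFFECT_SIZE]) < tol);

fprintf('Fitted means: %.3f and %.3f\n', muSorted(1), muSorted(2));
fprintf('Converged: %d after %d iterations\n', GMModel.Converged, GMModel.NumIterations);
fprintf('Good fit: %d\n', isGoodFit);
```

Whether this particular seed fails may depend on the MATLAB release, so the printed values are not fixed.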
This is not a fluke. I ran 1,000 different seeds and got a bad fit about 18% of the time. Also, the bad fits tend to cluster relatively close to the same parameter values.
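A seed sweep like the one described can be sketched as follows. Assumptions: numSeeds is a hypothetical, smaller value to keep the run short, and the same hypothetical tol classifies a fit as bad. Storing the sorted means of every fit makes it easy to see whether the bad fits cluster near the same values. The fraction obtained may differ from 18% across MATLAB releases and default start settings.

```matlab
% Sketch: estimate how often the default fit lands on a wrong solution.
N = 100000;
EFFECT_SIZE = 5;
numSeeds = 50;                  % hypothetical; the experiment above used 1,000 seeds
tol = 0.5;                      % hypothetical tolerance on the fitted means
isBad = false(numSeeds,1);
muAll = zeros(numSeeds,2);      % sorted fitted means, one row per seed
for s = 1:numSeeds
    rng(s)
    X = [randn(N,1); randn(N,1)+EFFECT_SIZE];
    GMModel = fitgmdist(X,2);
    muSorted = sort(GMModel.mu);
    muAll(s,:) = muSorted.';
    isBad(s) = any(abs(muSorted - [0; EFFECT_SIZE]) >= tol);
end
fprintf('Bad fits: %d of %d seeds (%.1f%%)\n', sum(isBad), numSeeds, 100*mean(isBad));
disp(muAll(isBad,:))            % fitted means of the bad fits, to check for clustering
```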
Any thoughts? I am a novice at using GMMs, so maybe I am just naive about how well this should do.
I am running R2014b on Mac OS X Yosemite.
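For context: EM, the algorithm fitgmdist uses, is only guaranteed to reach a local maximum of the likelihood, and its result depends on the starting values; bad fits that cluster near the same parameter values are consistent with EM repeatedly ending at (or stalling near) the same poor solution. A hedged sketch of two standard levers: the 'Replicates' option reruns EM from several starts and keeps the fit with the largest log-likelihood, and statset('MaxIter',...) passed through 'Options' raises the iteration limit. The values 10 and 1000 are arbitrary choices for illustration, not settings taken from this thread.

```matlab
% Sketch: refit the "bad" seed with several EM restarts and a higher iteration cap.
N = 100000;
EFFECT_SIZE = 5;
rng(6)                                          % second seed in seedList
X = [randn(N,1); randn(N,1)+EFFECT_SIZE];

opts = statset('MaxIter', 1000);                % raise the iteration limit (default is 100)
GMModel = fitgmdist(X, 2, ...
    'Replicates', 10, ...                       % 10 starts; the best log-likelihood fit is kept
    'Options', opts);

muSorted = sort(GMModel.mu);
fprintf('Fitted means: %.3f and %.3f\n', muSorted(1), muSorted(2));
fprintf('Converged: %d, negative log-likelihood: %.1f\n', ...
    GMModel.Converged, GMModel.NegativeLogLikelihood);
```

Comparing NegativeLogLikelihood between a bad single-start fit and this fit is a quick way to confirm that the bad fit is a worse local solution rather than a better one.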