# Calculating the Gaussian distribution parameters

Adrien Massiet on 21 Jun 2022
Commented: Adrien Massiet on 21 Jun 2022
Hello,
I'm trying to write a small script to try out the EM algorithm. I have 2 sets of 1-dimensional points that belong to 2 different Gaussians, but I don't know which point belongs to which data set, and the EM algorithm estimates the Gaussian parameters (mean, variance) for both.
For that I first create a small data set:
data1 = normrnd(-6,3,[200 1]);
data2 = normrnd(6,1,[200 1]);
data = [data1;data2];
Then, to compare against the results output by the EM algorithm, I first calculate the Gaussian distribution parameters directly. However, the result I get is slightly different depending on whether I use the MATLAB function fitdist or code the math myself: (left is manual math, right is fitdist)
Why is that?
PS:
The math I did was for mu and sigma:
The manual math is coded as:
% (μ,σ²)
distGauss1.mu = mean(data1);
distGauss1.sigma = mean((data1-distGauss1.mu).^2);
distGauss2.mu = mean(data2);
distGauss2.sigma = mean((data2-distGauss2.mu).^2);

dpb on 21 Jun 2022
Let's try your formula with numbers...
>> data1 = normrnd(-6,3,[200 1]);
>> mean(data1)
ans =
-6.1098
>> std(data1)
ans =
3.0128
>>
OK, that returns what we would expect, pretty close to the input parameters of the RNG...
Now what does your calculation give...
>> mean((data1-mean(data1)).^2)
ans =
9.0315
>>
Woops!!! You forgot two things -- first is
sqrt(mean((data1-mean(data1)).^2))
ans =
3.0052
>>
That's much closer, but still not identical to what std returned -- the second thing is that you used mean, which divides by n, whereas std by default normalizes by n-1 (the square root of the unbiased variance estimate).
So, as the LH plot shows, your distribution is much fatter than it should be...about 3X the width, because the variance (about 9) was used where sigma (about 3) belongs. The result is much closer for the other data set because sqrt(1) --> 1, so the difference just doesn't show up numerically.
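To make the two points above concrete, here is a minimal sketch on a small fixed sample. The sample values and the variable names (x, varN, sigmaN, sigmaU) are made up for illustration and are not from the original script; std(x,1) is the built-in form that normalizes by n instead of n-1.

```matlab
% Hypothetical fixed sample so the numbers are reproducible (not the poster's data)
x  = [2; 4; 4; 4; 5; 5; 7; 9];
n  = numel(x);
mu = mean(x);                               % sample mean = 5

% What the manual formula in the question computes: the variance, dividing by n
varN   = mean((x - mu).^2);                 % = 32/8 = 4

% Fix 1: take the square root to get a standard deviation
sigmaN = sqrt(varN);                        % = 2, matches std(x,1)

% Fix 2: divide by n-1 to match the default std(x)
sigmaU = sqrt(sum((x - mu).^2) / (n - 1));  % = sqrt(32/7), about 2.1381

% Compare against the built-ins
fprintf('variance (n):   %.4f\n', varN);
fprintf('sigma (n):      %.4f  vs std(x,1) = %.4f\n', sigmaN, std(x,1));
fprintf('sigma (n-1):    %.4f  vs std(x)   = %.4f\n', sigmaU, std(x));
```

With only 8 points the n versus n-1 difference is visible in the first decimal; with 200 points, as in the session above, it shrinks to the third decimal (3.0052 versus 3.0128).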
Adrien Massiet on 21 Jun 2022
Oh, and that also fixed the ones inside the EM. I also had an issue where the EM algorithm wasn't working properly: the best it could do was a really rough approximation of what it was supposed to find, and sometimes it didn't work at all, computing a mu and/or sigma so small that MATLAB just returned "NaN". This comment doesn't add anything, just wanted to say thx again.
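The EM script itself is not shown in the thread, so the following is only a rough sketch of what a 1-D, two-component EM loop might look like with the square root applied in the update step. Everything here is an assumption for illustration: the variable names, the initial guesses, the fixed iteration count, and the use of randn in place of normrnd so that no toolbox is needed. It relies on implicit expansion (R2016b or later).

```matlab
% Synthetic data like the question's, built from randn (assumption: avoids normrnd)
data = [-6 + 3*randn(200,1); 6 + 1*randn(200,1)];

% 1-D Gaussian density; takes sigma (standard deviation), NOT the variance
gauss = @(x, m, s) exp(-0.5*((x - m)./s).^2) ./ (s*sqrt(2*pi));

% Arbitrary initial guesses (hypothetical values)
mu    = [-1 1];
sigma = [1 1];
w     = [0.5 0.5];                           % mixing weights

for iter = 1:200
    % E-step: responsibility of each component for each point (400x2)
    p = [w(1)*gauss(data, mu(1), sigma(1)), w(2)*gauss(data, mu(2), sigma(2))];
    r = p ./ sum(p, 2);

    % M-step: responsibility-weighted mean, standard deviation and weight
    Nk    = sum(r, 1);
    mu    = sum(r .* data, 1) ./ Nk;
    sigma = sqrt(sum(r .* (data - mu).^2, 1) ./ Nk);   % sqrt: store sigma, not sigma^2
    w     = Nk / numel(data);
end

fprintf('mu    = [%.3f %.3f]\n', mu);
fprintf('sigma = [%.3f %.3f]\n', sigma);
```

If the sqrt in the M-step is dropped, the density function is fed a variance where it expects a standard deviation on every iteration, which is the same mix-up as in the manual calculation discussed above.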