Fitting data to a Gaussian when data is a little bit skewed (but not too much)

14 views (last 30 days)
Hello, I have some data below that should approx to a Gaussian.
1.00 NaN
2.00 4.00
3.00 3.00
4.00 6.00
5.00 9.00
6.00 7.00
7.00 9.00
8.00 57.00
9.00 141.00
10.00 205.00
11.00 204.00
12.00 190.00
13.00 86.00
14.00 23.00
15.00 5.00
16.00 4.00
17.00 7.00
18.00 5.00
19.00 7.00
20.00 4.00
21.00 5.00
I understand its not perfect, but when I give my gaussian fitting the parameters represented by the orange curve, it fails to do any kind of fit.
This is my Routine
b0 = 11.00
c0 = 1.73 % This is actually sqrt (fwhm guess = 3)
function [fwhm,xdataFine,fitGaus,xpeak,ypeak,rsquared]=myGaussianFit2(xdata,ydata, b0,c0)
%--------------------Gaussian Fit----------------------------------------
%Input is xdata, ydata and must be in column vector format
%only two initial guesses required as the following are determine by ydata
a0 = max(ydata(:)); %-min(ydata(:));
d0 = min(ydata(:));
%b0 is the guess at x-location of peak
%c0 is guess at width of peak
%Output gives the fitted parameters as well as xpeak location and its
%yvalue (i.e. xpeak & ypeak
%-------------------------------------------------------------------------
%Define Gauss Equation (remember the . notation
gaussEqn ='a*exp(-0.5*((x-b)/(c^2)).^2)+d';
%Use startpoint to define guess values (otherwise fit doesn't perform well)
try
%f = fit(xdata,ydata,gaussEqn,'Normalize','off', 'StartPoint',[a0,b0,sqrt(c0),d0]); %use c^2 instred of c to enforce postive sigma/fwhm
[f,gof]=fit(xdata,ydata,gaussEqn,'Normalize','off', 'StartPoint',[a0,b0,sqrt(c0),d0]); %use c^2 instead of c to enforce postive sigma/fwhm
coeffs=coeffvalues(f);
a=coeffs(1); b=coeffs(2);
c=coeffs(3); %need to square it as used c^2 in fitting equation to enforce +ve values
c=c^2;
d=coeffs(4);
rsquared=gof.rsquare;
fwhm=c*sqrt(log(256));
% %Increase resolution of x data (by 30)
xdataFine=(linspace(xdata(1),xdata(end),100))';
% Create high res Gauss function
fitGaus = a*exp(-0.5*((xdataFine-b)/c).^2)+d;
%Find max value and its x location
ypeak=max(fitGaus);
xpeak=b;
xpeak=mean(xpeak(:)); %Take mean incase more than one peak found
catch
a=0;b=0;c=0;d=0;xpeak=0;ypeak=0;
end
Is there anyhting I can do here to make the fit work, although I understand its not great, but I still think it should be able to fit something!
  1 Comment
Jason
Jason on 30 Sep 2024
Edited: Jason on 30 Sep 2024
This seems to help.
Assuming its the peak value that could be wrong, so "exclude" it
[idx,v]=find(ydata == a0); %a0 is max of ydata
try
[f,gof]=fit(xdata,ydata,gaussEqn,'Normalize','off', 'StartPoint',[a0,b0,sqrt(c0),d0]); %use c^2 instead of c to enforce postive sigma/fwhm
catch
[f,gof]=fit(xdata,ydata,gaussEqn,'Normalize','off', 'StartPoint',[a0,b0,sqrt(c0),d0],'Exclude',[idx]);
end

Sign in to comment.

Accepted Answer

dpb
dpb on 30 Sep 2024
xy=[1.00 NaN
2.00 4.00
3.00 3.00
4.00 6.00
5.00 9.00
6.00 7.00
7.00 9.00
8.00 57.00
9.00 141.00
10.00 205.00
11.00 204.00
12.00 190.00
13.00 86.00
14.00 23.00
15.00 5.00
16.00 4.00
17.00 7.00
18.00 5.00
19.00 7.00
20.00 4.00
21.00 5.00];
f=fit(xy(2:end,1),xy(2:end,2),'gauss1')
f =
General model Gauss1: f(x) = a1*exp(-((x-b1)/c1)^2) Coefficients (with 95% confidence bounds): a1 = 224.6 (211, 238.2) b1 = 10.73 (10.62, 10.85) c1 = 2.37 (2.204, 2.536)
Simply removing the NaN observation and using the builtin gaussian seems to work. I didn' take time to try to see what might be going wrong otherwise...
plot(f,xy(:,1),xy(:,2))
  8 Comments
dpb
dpb on 30 Sep 2024
Moved: John D'Errico on 30 Sep 2024
It's always a goal of mine to not leave temporary arrays around if they're not really needed...I don't know if the JIT compiler is smart enough that if one were to instead write
x=x(isfinite(x) & isfinite(y));
y=y(isfinite(x) & isfinite(y));
that doesn't store the logical indexing vector it wouldn't actually do the comparison twice but create the temporary behind the scenes anyway. I'm sure there are those who know enough more about the JIT compiler that could answer that, but I'm not one of them...so I tend to write like the first and sacrifice some memory on the assumption am saving a few compute cycles. Whether that's true or not, I really don't know, though, and there's not enough detail in the documentation to give any hints as to whether one way or the other is better--in fact, they go out of their way to tell you to not try to optimize because "things change" and their suggestion is to write as straightforward code as one can and then and only then if it turns out to be too slow try to figure out ways to speed it up...
Image Analyst
Image Analyst on 1 Oct 2024
You might also be interested in the function fitgmdist
help fitgmdist
fitgmdist - Fit Gaussian mixture model to data This MATLAB function returns a Gaussian mixture distribution model (GMModel) with k components fitted to data (X). Syntax GMModel = fitgmdist(X,k) GMModel = fitgmdist(X,k,Name,Value) Input Arguments X - Data numeric matrix k - Number of components positive integer Name-Value Arguments CovarianceType - Type of covariance matrix 'diagonal' | 'full' (default) Options - Iterative EM algorithm optimization options statset options structure ProbabilityTolerance - Tolerance for posterior probabilities 1e-8 (default) | nonnegative scalar value in range [0,1e-6] RegularizationValue - Regularization parameter value nonnegative scalar | 0 (default) Replicates - Number of times to repeat EM algorithm positive integer | 1 (default) SharedCovariance - Flag indicating whether all covariance matrices are identical logical false (default) | logical true Start - Initial value setting method 'randSample' | 'plus' (default) | vector of integers | structure array Output Arguments GMModel - Fitted Gaussian mixture model gmdistribution model Examples openExample('stats/ClusterDataUsingAGaussianMixtureModelExample') openExample('stats/RegularizeGaussianMixtureModelEstimationExample') openExample('stats/ViewGaussianMixtureModelsOverComponentLevelsExample') openExample('stats/DetermineGaussianMixtureFitUsingAICExample') openExample('stats/SetInitialValuesWhenFittingGaussianMixtureModelsExample') See also gmdistribution, cluster Introduced in Statistics and Machine Learning Toolbox in R2014a Documentation for fitgmdist doc fitgmdist

Sign in to comment.

More Answers (0)

Products


Release

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!