gamfit confidence intervals

Hi,
When using the gamfit function, the confidence intervals of the gamma distribution's parameters are not symmetric around the fitted values. This can be seen for the second (scale) parameter in the example given in the R2011b documentation, and it seems to apply to the first (shape) parameter as well in my R2008b version.
The confidence interval is usually computed from the Fisher information matrix and should be symmetric, since it is given as "fitted value ± c*std", where std is the standard deviation (standard error) of the estimator and c is a numerical constant that sets the desired confidence level under the assumption of a normal distribution.
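In symbols, the interval described here (the Wald interval) for the j-th parameter is shown below. $\mathcal{I}(\hat\theta)$ is the Fisher information matrix of the whole sample evaluated at the estimate, and $z_{1-\alpha/2}$ is the standard normal quantile (about 1.96 for a 95% interval):

```latex
\hat\theta_j \pm z_{1-\alpha/2}\,\sqrt{\left[\mathcal{I}(\hat\theta)^{-1}\right]_{jj}}
```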
So why isn't the fitted value in the middle of the confidence interval? Are the parameters perhaps fitted in a different parameterization, with a nonlinear transformation then used to convert back?

 Accepted Answer

Peter Perkins on 13 Sep 2011
GAMFIT fits the parameters on the log scale, where the asymptotic normal approximation tends to become reasonable with fewer observations. The symmetric CIs for the logged parameters are then exponentiated, which makes them asymmetric on the original scale. If you open GAMFIT in the editor (edit gamfit), you can see exactly what the code does, e.g., lines 279-284 in the current release.
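A minimal Python sketch of the mechanism, not GAMFIT's actual code: a symmetric normal-approximation interval is built for log(theta) and then exponentiated. It assumes the standard error on the log scale is se(theta_hat)/theta_hat (the delta method, which agrees with the Fisher information of the log parameterization at the MLE). The function name `log_scale_ci` and the numbers used are hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def log_scale_ci(theta_hat: float, se_theta: float, level: float = 0.95):
    """Wald CI computed for log(theta), then mapped back with exp().

    Assumes the delta-method standard error on the log scale:
    se_log = se_theta / theta_hat.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    se_log = se_theta / theta_hat
    lo = exp(log(theta_hat) - z * se_log)  # symmetric on the log scale...
    hi = exp(log(theta_hat) + z * se_log)  # ...asymmetric after exp()
    return lo, hi


# Hypothetical estimate and standard error, not from the GAMFIT documentation.
theta_hat, se = 2.0, 0.6
lo, hi = log_scale_ci(theta_hat, se)
print(f"CI = [{lo:.3f}, {hi:.3f}]")
print(f"below the estimate: {theta_hat - lo:.3f}, above it: {hi - theta_hat:.3f}")
```

Because the interval is symmetric for log(theta), the estimate is the geometric mean of the two endpoints rather than their midpoint, so the upper endpoint lies farther from the estimate than the lower one.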

4 Comments

Dear Mr. Perkins,
Thank you for your reply. I have indeed looked at the code and seen that the fit is performed on the log of the parameters. Could you please elaborate on why the normal approximation should need fewer observations that way? To me it seems awkward to take the log of a scale parameter, which usually has dimensions such as time.
Yours,
Ehud Schreiber.
The construction of the standard confidence interval here depends heavily on being able to approximate the sampling distribution of the parameter estimator by a normal distribution. MLEs of scale and shape parameters often have skewed sampling distributions, especially at small sample sizes, and you can demonstrate that for yourself by generating lots (1000, say) of gamma-distributed data sets from the same distribution, fitting each one by maximum likelihood, and making a histogram of the 1000 shape parameters. Do it for a small sample size, then a larger one, then a larger one still.
Or, note that the sampling distribution of the MLE of sigma^2 for a normal distribution is a scaled chi-squared distribution, which is right-skewed.
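For a sample $x_1,\dots,x_n$ from $N(\mu,\sigma^2)$, the standard result behind this remark is:

```latex
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2,
\qquad
\frac{n\,\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-1},
\qquad
\operatorname{skew}\!\left(\chi^2_{n-1}\right) = \sqrt{\frac{8}{n-1}}
```

So even at n = 10 the skewness is about 0.94, and it decays only like $1/\sqrt{n}$, which is why a symmetric interval around $\hat\sigma^2$ fits poorly in small samples.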
The log transform is just a way to get a sampling distribution that's closer to the assumption on which the construction of the CI is based.
To clarify, when I said "by generating lots (1000, say) of gamma-distributed data sets from the same distribution", I meant: generate (say) M=1000 vectors of data with (say) gamrnd, each of length (say) n=25. Then use gamfit to get M=1000 pairs of MLEs. Then try it with n=100 and n=1000.
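The experiment described above can be sketched in Python instead of MATLAB (a substitution for illustration only): `random.gammavariate(shape, scale)` stands in for `gamrnd`, and a small hand-written maximum-likelihood fit stands in for `gamfit`. The true shape 2 and scale 3, the seed, and the helper names (`digamma`, `gamma_mle`, `skewness`, `simulate_shapes`) are hypothetical choices. Instead of drawing histograms, it prints the sample skewness of the M=1000 shape estimates and of their logs for n = 25, 100 and 1000.

```python
import math
import random
import statistics


def digamma(x: float) -> float:
    """Digamma for x > 0: shift x up to >= 6 by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def gamma_mle(x):
    """Maximum-likelihood (shape, scale) estimates for a positive sample x.

    The shape k solves log(k) - digamma(k) = s with
    s = log(mean(x)) - mean(log(x)) > 0. Since
    1/(2k) < log(k) - digamma(k) < 1/k for every k > 0,
    the root lies in [1/(2s), 1/s] and bisection suffices.
    """
    mean_x = statistics.fmean(x)
    s = math.log(mean_x) - statistics.fmean(math.log(v) for v in x)
    lo, hi = 0.5 / s, 1.0 / s
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.log(mid) - digamma(mid) > s:
            lo = mid  # mid is left of the root (the function decreases in k)
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    return k, mean_x / k  # scale MLE is mean / shape


def skewness(values):
    """Sample skewness (population-moment version)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)


def simulate_shapes(n, M=1000, shape=2.0, scale=3.0, seed=0):
    """Fit M independent gamma samples of size n and return the M shape MLEs."""
    rng = random.Random(seed)
    shapes = []
    for _ in range(M):
        sample = [rng.gammavariate(shape, scale) for _ in range(n)]  # args: (shape, scale)
        shapes.append(gamma_mle(sample)[0])
    return shapes


results = {}
for n in (25, 100, 1000):
    k_hat = simulate_shapes(n)
    results[n] = (skewness(k_hat), skewness([math.log(k) for k in k_hat]))
    print(f"n={n:5d}  skew(k_hat)={results[n][0]:6.3f}  skew(log k_hat)={results[n][1]:6.3f}")
```

One would expect the skewness of the raw shape estimates to be clearly positive at n = 25 and to shrink as n grows, with the log-transformed estimates much closer to symmetric. Histograms of `k_hat` and of its logs show the same thing visually.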
Having read Mr. Perkins' new comments, I think I now understand the reasoning. Perhaps it can also be explained as follows, which highlights a problem with the naive construction of the confidence interval (i.e., the Fisher information matrix approach applied directly to the parameters).
The scale and shape parameters must both be positive, i.e., lie in (0, infinity). When the sample size is small, the sampling distribution of the estimators is wide, so a normal approximation to it is inappropriate: it can put substantial probability on negative values, and the resulting interval can extend below zero. Taking the log of the parameters maps their allowed region onto the whole real line, which removes this problem and is therefore advantageous.
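The argument can be made concrete for a 95% interval ($z \approx 1.96$), using the delta-method approximation $\operatorname{se}(\log\hat\theta) \approx \operatorname{se}(\hat\theta)/\hat\theta$:

```latex
\hat\theta - z\,\operatorname{se}(\hat\theta) < 0
\iff \frac{\operatorname{se}(\hat\theta)}{\hat\theta} > \frac{1}{z} \approx 0.51,
\qquad
\left[\hat\theta\, e^{-z\,\operatorname{se}(\hat\theta)/\hat\theta},\;
      \hat\theta\, e^{+z\,\operatorname{se}(\hat\theta)/\hat\theta}\right] \subset (0,\infty)
```

So the naive interval drops below zero as soon as the relative standard error exceeds about 51%, whereas the log-scale interval stays positive for any standard error.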
