How to fit lognormal distribution to a dataset which contains some zero values?
How to fit lognormal distribution to a dataset which contains some zero values?
4 Comments
Walter Roberson on 11 Aug 2023
Exact zeros for rainfall values are common, so the overall dataset cannot be lognormal: a lognormal distribution has support only on strictly positive values. To get any further with a lognormal distribution you would have to start doing calculations based upon absolute humidity or relative humidity, measured multiple times over the day, so that you could calculate "available water".
Answers (2)
Walter Roberson on 10 Aug 2023 (edited by Walter Roberson on 10 Aug 2023)
      Don't do that?
There are a small number of possibilities in that situation:
- That the log-normal distribution is simply the wrong model for the system and you should be choosing a different model instead
- That the zeros are place holders for errors in the data. In such a case those measurements should be removed before trying to fit the data
- That the zeros are round-off for small measurements, perhaps due to limited precision of sensors. You will not be able to learn anything useful from those measurements, so you should remove them before trying to fit the data
- That the zeros are caused by noise in the system. In such a case, the log-normal model is not going to apply, but you might be able to obtain an approximation by removing the zeros (and negatives) before trying to fit the data
- That the zeros are correct points, representing locations where the underlying normally distributed quantity (the log of the measurement) is negative infinity. I would imagine that there are several papers to be written about the physics of such a system, which would probably have deep connections to Bose-Einstein condensates and to Planck distances...
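For the cases above where dropping the zeros is defensible (for example, when they are placeholders for errors), a minimal sketch of the fit might look like the following. The data here are synthetic and purely illustrative, and the variable names are my own. It uses only base MATLAB: the maximum-likelihood estimates for a lognormal are just the mean and the N-normalized standard deviation of log(x).

```matlab
% Synthetic, hypothetical data: lognormal draws with some exact zeros mixed in.
rng(0);                               % reproducible example
x = exp(0.5 + 0.8*randn(1000,1));     % true mu = 0.5, sigma = 0.8
x(1:50) = 0;                          % pretend these 50 readings are bad

xpos = x(x > 0);                      % drop zeros (and any negatives)
logx = log(xpos);

muHat    = mean(logx);                % MLE of mu
sigmaHat = std(logx, 1);              % MLE of sigma (flag 1 normalizes by N)
fracZero = mean(x <= 0);              % fraction of the data that was discarded

fprintf('mu = %.3f, sigma = %.3f, dropped = %.1f%%\n', ...
        muHat, sigmaHat, 100*fracZero);
```

If you have the Statistics and Machine Learning Toolbox, fitdist(xpos,'Lognormal') or lognfit(xpos) should give essentially the same numbers. Either way, it is worth reporting the fraction of discarded points alongside the fit, since a large fraction is a hint that one of the other explanations applies.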
1 Comment
John D'Errico on 10 Aug 2023 (edited by John D'Errico on 10 Aug 2023)
			Be careful.
If the zeros are just low values that were "rounded" off to zero, then simply removing them will be a problem. Essentially you are biasing the estimate, since they SHOULD have been really small values. You are now estimating the parameters of a censored sample.
If that is the case, then you probably need to use MLE for a left censored sample.
A comparable example might be to estimate the distribution parameters of a normal distribution, but where all of the negative numbers were simply discarded. For example:
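n = 100000;
x = randn(n,1);   % standard normal sample: true mean 0, true variance 1
x(x<0) = [];      % discard every negative value
mean(x)           % sample mean of what is left
var(x)            % sample variance of what is left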
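As you should expect, any naive attempt to estimate the normal parameters (which here should be (0,1)) will fail: what remains is a half-normal sample, with mean sqrt(2/pi), roughly 0.80, and variance 1 - 2/pi, roughly 0.36. You only recover (0,1) if you treat this properly as a censored sample (or, strictly, a truncated one, since here the values were discarded outright rather than recorded as "below the limit").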
The point being, you want to understand where the zeros are coming from, and deal with them properly.
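To make the left-censored idea concrete, here is a rough sketch of a censored MLE for the lognormal case, written with base MATLAB only (fminsearch and erfc). It assumes the zeros are readings that fell below a known detection limit d. That limit, the synthetic data, and all variable names are my assumptions for illustration. Each censored point contributes the log of the normal CDF at (log(d) - mu)/sigma instead of a density term.

```matlab
% Synthetic, hypothetical data: lognormal values, zeroed below a limit d.
rng(1);
n = 5000;
x = exp(0 + 1*randn(n,1));         % true mu = 0, sigma = 1
d = 0.5;                           % assumed (hypothetical) detection limit
x(x < d) = 0;                      % sensor reports zero below the limit

obs  = log(x(x > 0));              % logs of the fully observed values
nCen = sum(x == 0);                % number of left-censored readings
logd = log(d);

Phi = @(z) 0.5*erfc(-z/sqrt(2));   % standard normal CDF

% Negative log-likelihood in t = [mu, log(sigma)], constants dropped.
% Observed points use the normal density of log(x); censored points
% use the probability of falling below the limit.
nll = @(t) numel(obs)*t(2) ...
         + sum((obs - t(1)).^2) / (2*exp(2*t(2))) ...
         - nCen*log(Phi((logd - t(1))/exp(t(2))));

% Start from the naive (biased) estimates that ignore the zeros.
muNaive    = mean(obs);
sigmaNaive = std(obs);
tHat = fminsearch(nll, [muNaive, log(sigmaNaive)]);

muHat    = tHat(1);
sigmaHat = exp(tHat(2));
fprintf('naive:    mu = %.3f, sigma = %.3f\n', muNaive, sigmaNaive);
fprintf('censored: mu = %.3f, sigma = %.3f\n', muHat, sigmaHat);
```

Optimizing over log(sigma) keeps sigma positive without needing a constrained solver. With real data the detection limit is often not documented. The smallest positive reading is sometimes used as a stand-in, but that is a judgment call you would want to justify.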
dpb on 11 Aug 2023 (edited by dpb on 13 Aug 2023)
      One analysis technique for daily rainfall modeling divides the problem into two parts -- a "wet-day" model that predicts rainfall amounts for those days that rainfall occurs and an independent Markov or stochastic renewal model to predict the occurrence of the zero-rainfall days.
The Pearson Type-3 or the two-parameter gamma distributions have been able to do a reasonable job of modeling point location rainfall for wet-day amount predictions.  There's extensive literature in the subject field...
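As a rough illustration of that two-part idea, the sketch below estimates a first-order Markov occurrence model (the chance of a wet day given that yesterday was dry or wet) and, separately, a two-parameter gamma for the wet-day amounts. Everything here is an assumption for demonstration: the rainfall series is synthetic, the variable names are mine, and the gamma is fitted by the method of moments rather than maximum likelihood so that only base MATLAB is needed.

```matlab
% Synthetic, hypothetical daily rainfall (mm), for illustration only.
rng(2);
nDays = 3650;
wet   = rand(nDays,1) < 0.3;                 % which days get rain
% Gamma(shape 2, scale 5) amounts, built as a sum of two exponentials.
amtAll = -5*sum(log(rand(nDays,2)), 2);
rain   = wet .* amtAll;                      % dry days are exact zeros

% Part 1: occurrence model (first-order Markov chain on wet/dry).
isWet = rain > 0;
prev  = isWet(1:end-1);
next  = isWet(2:end);
p01 = mean(next(~prev));                     % P(wet today | dry yesterday)
p11 = mean(next(prev));                      % P(wet today | wet yesterday)

% Part 2: wet-day amounts, gamma fit by the method of moments.
amt = rain(isWet);
m = mean(amt);
v = var(amt);
shapeHat = m^2 / v;                          % gamma shape
scaleHat = v / m;                            % gamma scale (mm)

fprintf('p01 = %.3f, p11 = %.3f\n', p01, p11);
fprintf('gamma shape = %.2f, scale = %.2f mm\n', shapeHat, scaleHat);
```

In this synthetic series the wet days are independent, so p01 and p11 come out about equal. In real records p11 is typically larger, which is what motivates the Markov part. If the Statistics and Machine Learning Toolbox is available, gamfit(amt) gives maximum-likelihood estimates of the same two gamma parameters.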
0 Comments