Lognormal Fitting with the Pore Size and its Probabilities

Question

Alex Lee on 8 Jan 2024

0
Link

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/2067676-lognormal-fitting-with-the-pore-size-and-its-probabilities

Commented: William Rose on 21 Feb 2024

Dear all,

I have some pore sizes and its probabilites. Visually the pore size distribution follows the lognormal distribution, and the particle size distribution usually follows lognormal distribution, as the black line in the figure shows. So I try to fit it to a lognormal distribution with the following code:

Counts = [71	73	53	77	79	141	158	237	360	594	958	1659	3094	7183	19034	40051	125001	143248	164224	168209	172406	167600	160578	146530	124505	90064	67769	42179	21645	1070];
PoreSize = [89716.4	71264.3	56607.2	44964.7	35716.7	28370.8	22535.7	17900.8	14219.1	11294.6	8971.6	7126.4	5660.7	2837.1	2253.6	1790.1	712.6	566.1	449.6	357.2	283.7	225.4	179	142.2	112.9	89.7	71.3	56.6	45	35.7];
probs = Counts / sum(Counts);
ExpOfY = sum(PoreSize.*probs); 
Ysqr = PoreSize.^2; 
ExpOfYsqr = sum(Ysqr.*probs);
VarOfY = ExpOfYsqr - ExpOfY1^;  % variance of the scores tabulated in vectors 1 & 2
Invalid expression. Check for missing or extra characters.
normu = log(ExpOfY/sqrt(1+VarOfY/ExpOfY^2));
norvar = log(1+VarOfY/ExpOfY^2);
norsigma = sqrt(norvar);
dist = makedist('Lognormal','mu',normu1,'sigma',norsigma1);
x = linspace(0.001,200000,2000000);  % Use whatever range is appropriate for data
pdfOfx1 = pdf(dist,x);
figure; hold on; plot(PoreSize,Counts);
plot(x,pdfOfx * 6200); % Multiply 6200 just to compare with the original data.

After fitting, the results look wired, comparing with the raw data point distribution. The agreement between the fitting and the original looks quite bed.I have also tried to use the makedist, the results look similar as well. Does anyone know the reason?

Best

18 Comments
Show 16 older commentsHide 16 older comments

Rena Berman on 21 Feb 2024

Edited: Rena Berman on 21 Feb 2024

(Answers Dev) @William Rose, @the cyclist, I have forwarded this on to the appropriate MW developers.

William Rose on 21 Feb 2024

Thank you @Rena Berman.

Sign in to comment.

Sign in to answer this question.

Answer 1

William Rose on 8 Jan 2024

1
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/2067676-lognormal-fitting-with-the-pore-size-and-its-probabilities#answer_1385621

@Alex Lee,

Take the logarithm of the pore sizes, then compute mu and sigma using the log(pore size) vector. The mu, sigma computed that way are what you should pass to makedist('Lognormal',...). Perhaps the rest of your code will work correctly after that.

27 Comments
Show 25 older commentsHide 25 older comments

William Rose on 8 Jan 2024

Open in MATLAB Online

@Alex Lee,

% load data

Counts = [71, 73, 53, 77, 79, 141, 158, 237, 360, 594, 958, 1659, 3094, 7183,19034,40051,125001,143248,164224,168209,172406,167600,160578,146530,124505,90064,67769,42179,21645,1070];

PoreSize=[89716,71264,56607,44965,35717,28371,22536,17901,14219,11295, 8972, 7126, 5661, 2837, 2254, 1790, 712.6, 566.1, 449.6, 357.2,283.7, 225.4, 179.0, 142.2, 112.9, 89.7, 71.3, 56.6, 45.0, 35.7];

% plot data

figure; semilogx(PoreSize,Counts,'-k.');

grid on; xlabel('Pore Size (nm)'); ylabel('Count'); hold on

%estimate distribution

N=sum(Counts); % total number of observations

probs = Counts / N;

Y=log(PoreSize);

EY = sum(Y.*probs);

EY2 = sum(Y.*Y.*probs);

VarY = EY2 - EY^2; % variance log(PoreSize)

sigmaY = sqrt(VarY);

dist = makedist('Lognormal','mu',EY,'sigma',sigmaY);

X=sort(PoreSize); % sort PoreSize into ascending order

Xpdf = pdf(dist,X);

CountPred=Xpdf.*X*N/sum(Xpdf.*X);

% add to plot

semilogx(X,CountPred,'-r.');

legend('Measured','Pred')

When computing the predicted counts, I scale the PDF by an "effective bin width" which is equal to X, the PoreSize:

CountPred=Xpdf.*X*N/sum(Xpdf.*X);

This is necessary because this is a lognormal distribution. The scalar factor N/sum(Xpdf.*X) is needed to make the total number of predicted counts equal the total number of measured counts.

Paul on 9 Jan 2024

Open in MATLAB Online

Hi William,

I'm quite curious about this problem in how the parameters of the lognormal distribution are estimated. My concern is that the parameters aren't being estimated from the actual data, of which there are N points. Rather, the parameters are being estimated from discretized data, and the data is discretized into bins that, as far as I can tell, aren't even equally spaced in any sense. Is there any theoretical underpinning for estimating distribution parameters from discretized data? It's not clear to me that disrcetizing and then computing the sample mean and variance as below is the same as what we'd get from computing the sample mean and variance from the un-discretized data.

Here is your code for estimating the pdf:

Counts   = [71,   73,   53,   77,   79,   141,  158,  237,  360,  594,   958,  1659, 3094, 7183,19034,40051,125001,143248,164224,168209,172406,167600,160578,146530,124505,90064,67769,42179,21645,1070];
PoreSize = [89716,71264,56607,44965,35717,28371,22536,17901,14219,11295, 8972, 7126, 5661, 2837, 2254, 1790, 712.6, 566.1, 449.6, 357.2,283.7, 225.4, 179.0, 142.2, 112.9, 89.7, 71.3, 56.6, 45.0, 35.7];
Counts   = fliplr(Counts);
PoreSize = fliplr(PoreSize);
%estimate distribution
N     = sum(Counts);      % total number of observations
probs = Counts / N;
Y     = log(PoreSize);
EY    = sum(Y.*probs);
EY2   = sum(Y.*Y.*probs);
VarY  = EY2 - EY^2;    % variance log(PoreSize)
sigmaY = sqrt(VarY);
dist1  = makedist('Lognormal','mu',EY,'sigma',sigmaY);

Here is my code for estimating the pdf. The difference is that I'm estimating the sample mean and variance from the discretized data and then finding the parameters of the pdf in the lognormal way, while your code is estimating the sample mean and variance of the log of the discretized data directly.

m     = sum(PoreSize.*Counts)/N;
v     = sum((PoreSize-m).^2.*Counts)/N;
mu    = log(m^2/sqrt(v+m^2));
sigma = sqrt(log(v/m^2+1));
dist2 = makedist('LogNormal','mu',mu,'sigma',sigma);

Comparing the pdfs we see two very different solutions

dist1,dist2

dist1 =

LognormalDistribution Lognormal distribution mu = 5.55692 sigma = 0.84966

dist2 =

LognormalDistribution Lognormal distribution mu = 4.92834 sigma = 1.47884

figure

x = logspace(-2,5,5000);

semilogx(x,pdf(dist1,x),x,pdf(dist2,x)),grid

Despite appearances, both pdfs integrate to unity.

[trapz(x,pdf(dist1,x)) trapz(x,pdf(dist2,x))]
ans = 1×2
    1.0000    1.0000

Comparing dist1 and dist2 to the histogram of the data clearly shows that your solution is superior.

edges = mean([PoreSize(2:end);PoreSize(1:end-1)]);

edges = [2*PoreSize(1)-edges(1), edges, 2*PoreSize(end)-edges(end)];

figure

histogram('BinEdges',edges,'BinCounts',Counts,'Normalization','pdf')

hold on

plot(x,pdf(dist1,x),x,pdf(dist2,x)),grid

legend('data','dist1','dist2')

xlim([0 1000])

Here, I'll generate a full set of PoreSize data in accordance with your pdf.

PoreSize = random(dist2,1,N);

Now use the same code as above to estimate the pdf parameters with this data

Counts = ones(1,N);

probs = Counts / N;

Y = log(PoreSize);

EY = sum(Y.*probs);

EY2 = sum(Y.*Y.*probs);

VarY = EY2 - EY^2; % variance log(PoreSize)

sigmaY = sqrt(VarY);

dist1 = makedist('Lognormal','mu',EY,'sigma',sigmaY);

m = sum(PoreSize.*Counts)/N;

v = sum((PoreSize-m).^2.*Counts)/N;

mu = log(m^2/sqrt(v+m^2));

sigma = sqrt(log(v/m^2+1));

dist2 = makedist('Lognormal','mu',mu,'sigma',sigma);

figure

semilogx(x,pdf(dist1,x),x,pdf(dist2,x)),grid

We essentially get the same results.

Any idea why your method works so much better with the discretized data?

William Rose on 9 Jan 2024

@Paul, the uneven spa ing of the data did defijnitewly throwe me off. The uneven spacing means it is quite an odd data set. It is not linearly or logarithmically spaced. The plot of number of observations vesus pore size looks kind of like a normal distribution when plotted on a log scale, which is why it seems readonable to use a lognormal distribution. HOwever, this plot should not be thought of as representing a histogram, because the uneven bin width has not been taken into account. I realized this when I was trying to use the fitted distribution to predict the observed numbers. #

@Alex Lee,

Please explain the experimental procedure used to obtain the data. The observed pore numbers at sizes 712 and 1740 nm are particularly interesting and somewhat unexpected. There is a large gap (a factor of 2.5) between these two sizes. If each pore size is considered to be the center or the edge of a bin, then the bins at these two sizes (or at least one of them) would have a considerably higher expected count than what was observed. I have been trying to think of an epxerimental procedure that could generate data like this from a lognormal distribution of pore sizes. So far, I am failing. Thanks.

William Rose on 9 Jan 2024

@Paul,

The parameter estimation formulas which I used give the maximum likelihood estimator (MLE) for the lognormal distribution. The MLE is (see here)

The estimates I computed are equivalent to formulas above*, if you assume that there were n1 samples with value x1, and n2 samples with value x2, and so on.

You have used the method of moments(MoM): choose mu and sigma so that the moments of the fitted distribution match the experimentally measured moments. For a normal distribution, the MLE and the MoM give the same formulas for the estimates of mu and sigma. For a lognormal distribution, the MLE and MoM estimates are not the same.

@Paul, you also asked "Is there any theoretical underpinning for estimating distribution parameters from discretized data?". Yes there is. For example, the Kolmogorov-Smirnov test, and the Anderson-Darling test, for deciding if a sample came from a specified distribution, are well suited to working with binned data. When doing the K-S test, it is standard to use the MLE parameters, if the parameters must be estimated. I believe the A-D test for the lognormal distributiuon also uses the MLE parameters (see here, p.3, lower left column).

I am certainly not a statistician, so everything I say is based on my student-level understanding of the topics.

*Demonstration that the MLE formulas from here are equivalent to the formulas I used in my code:

Paul on 10 Jan 2024

Edited: Paul on 10 Jan 2024

Open in MATLAB Online

Yes, the MLE and MOM estimates won't be the same, but I think they'll be pretty close when applied to non-discretized data that actually comes from the assumed, continuous distribution, which is what I showed above.

To be clear, my concern is that by discretizing the data there must be some loss of information (musn't there?) that would compromise the parameter estimation for the continuous distribution. It turns out that for one example my concern is not well founded when the discretized data is the result of discretizing the samples from the actual distribution to be estimated.

Here's the example.

Start with the same data in this question

Counts   = [71,   73,   53,   77,   79,   141,  158,  237,  360,  594,   958,  1659, 3094, 7183,19034,40051,125001,143248,164224,168209,172406,167600,160578,146530,124505,90064,67769,42179,21645,1070];
PoreSize = [89716,71264,56607,44965,35717,28371,22536,17901,14219,11295, 8972, 7126, 5661, 2837, 2254, 1790, 712.6, 566.1, 449.6, 357.2,283.7, 225.4, 179.0, 142.2, 112.9, 89.7, 71.3, 56.6, 45.0, 35.7];
Counts   = fliplr(Counts);
PoreSize = fliplr(PoreSize);

Make the distribution from your procedure as implemented above.

dist1 = makedist('LogNormal','mu',5.55692,'sigma',0.84966);

Generate random data from this actual distribution.

N = sum(Counts);
rng(100);
X = random(dist1,1,N);

Discretize the data assuming that the PoreSise elements define the bin centers and the edges are linearly halfway between the centers (maybe it would be better to assume halfway in a log sense?)

edges = mean([PoreSize(2:end);PoreSize(1:end-1)]);
edges = [2*PoreSize(1)-edges(1), edges, 2*PoreSize(end)-edges(end)];
bincounts = histcounts(X,edges);

Compare the discretized counts from the actual distribution to the Counts data from which dist1 as derived

figure

semilogx(PoreSize,Counts,'-o',PoreSize,bincounts,'-x'),grid

xline(edges)

The discretized counts from sampling dist1 is actually a pretty good match to Counts, execpt for PoreSize = 712, which seems to be the center of the widest bin, probably not a coincidence.

MLE parameter estimates based on the dist1, discretized samples (red points)

probs = bincounts / N;
Y     = log(PoreSize);
EY    = sum(Y.*probs);
EY2   = sum(Y.*Y.*probs);
VarY  = EY2 - EY^2;    % variance log(PoreSize)
sigmaY = sqrt(VarY);
dist2 = makedist('LogNormal','mu',EY,'sigma',sigmaY);

MOM parameter estimates based on dist1, discretized samples (red points)

m     = sum(PoreSize.*bincounts)/N;
v     = sum((PoreSize-m).^2.*bincounts)/N;
mu    = log(m^2/sqrt(v+m^2));
sigma = sqrt(log(v/m^2+1));
dist3 = makedist('LogNormal','mu',mu,'sigma',sigma);

Compare the true and estimated pdfs.

x = logspace(-2,5,5000);

figure

semilogx(x,pdf(dist1,x),x,pdf(dist2,x),x,pdf(dist3,x)),grid

For this case, the MLE and MOM both work pretty well when the discretized data is derived from truly log-normal samples.

But for the original Counts data, the MLE and MOM approaches yield very different results, and I'm not sure why that should be the case. I'm sure there's a reason ....

William Rose on 10 Jan 2024

Open in MATLAB Online

@Paul, thanks for demonstrating that the MLE and MOM approaches to lognormal parameter estimation give similar results when applied to data that is actually from a lognormal distribnution, whether it is binned or not, and thank you for for showing that the MLE and MOM parameter estimates are quite different for the observed data in this example.

As you note, the predicted number of counts at pore size=712 um is much larger than the observed number. The large predicted number is due to the large binwidth at 712. The observed counts at 712 um and at 1790 um (which is also a wide bin, if you assume the bounds are at the midpoints) are lower than you'd expect, if the distribution is actually lognormal. Thus the counts at 712 um and maybe at 1790 um are outliers, in the sense of being highly improbable , if the data is a random sample from a lognormal distribution. The plot of the probability density shows this, sort of:

Counts = [71, 73, 53, 77, 79, 141, 158, 237, 360, 594, 958, 1659, 3094, 7183,19034,40051,125001,143248,164224,168209,172406,167600,160578,146530,124505,90064,67769,42179,21645,1070];

PoreSize = [89716,71264,56607,44965,35717,28371,22536,17901,14219,11295, 8972, 7126, 5661, 2837, 2254, 1790, 712.6, 566.1, 449.6, 357.2,283.7, 225.4, 179.0, 142.2, 112.9, 89.7, 71.3, 56.6, 45.0, 35.7];

Counts = fliplr(Counts); PoreSize = fliplr(PoreSize);

Y=log(PoreSize);

binEdge=(Y(1:end-1)+Y(2:end))/2; % interior bin edges, at the midpoints

binWidth=diff(binEdge); % interior bin widths

% add bin widths at ends, same as nearest neighbors

binWidth=[binWidth(1),binWidth,binWidth(end)];

% Estimate density. Will be normal, if PoreSize is lognormal.

pdf=(Counts/sum(Counts))./binWidth;

plot(Y,pdf,'-rx'); grid on; xlabel('Y=log(PoreSize)'); ylabel('PDF')

text(Y(14),pdf(14),'\leftarrow 712 \mum');

text(Y(15),pdf(15),'\leftarrow 1790 \mum');

I understand that different methods of parameter estimation have different outlier sensitivity. It seems that the MOM is more sensitive to outliers than the MLE, for this distribution.

Emily on 18 Jan 2024

Open in MATLAB Online

@Alex Lee @William Rose

I am reading this post after @Torsten linked it on my question here. I have a question about how you calculate mu and sigma from the data. I was getting values that did not make sense when simply taking mean(log(data)) and std(log(data)), and I am wondering if you can explain the logic behind the following lines:

%estimate distribution
N=sum(Counts);      % total number of observations
probs = Counts / N;
Y=log(PoreSize);
EY = sum(Y.*probs);
EY2 = sum(Y.*Y.*probs);
VarY = EY2 - EY^2;    % variance log(PoreSize)
sigmaY = sqrt(VarY);
dist = makedist('Lognormal','mu',EY,'sigma',sigmaY);
X=sort(PoreSize);   % sort PoreSize into ascending order
Xpdf = pdf(dist,X);
CountPred=Xpdf.*X*N/sum(Xpdf.*X);

If there is any source I could read more from that explains these steps, that would be helpful. I am just not understanding the calculation of mu (EY), and though I did read the explanation of the step of scaling the PDF to get correct predicted counts, I am still a bit confused by the logic behind that step. Thank you in advance for any help! I have tried to find sources online for calculating these parameters but haven't had much luck until this thread.

William Rose on 18 Jan 2024

@Emily,

The code above is for a person who does not have the raw data - they only have a histogram of the data. Does that apply to you? I looked at your other post. It is not obvious to me if you were fitting raw data, or a histogram of the data.

To understand the code above, remember that the raw data (pore sizes) are denoted by X, and Y=log(X). I converted the histogram of X to a histogram of Y. The code above, which you asked about, estimates

and

. This approach is the maximum likelihood estimator (MLE) for mu and sigma.

@Paul uses a different approach. He computes

and var(X). Then he computes the mu and sigma that are consistent with

and

. This approach is the method of moments estimator for mu and sigma.

If the data are really from a lognormal distribution, the MOM and MLE estimates of mu and sigma will match, or be very close. But if the data contain outliers, or are not from a lognormal distribution, then the MOM estimates and MLE estimates can be quite different, and that is what we observed for the data set in this question.

Emily on 18 Jan 2024

Edited: Emily on 18 Jan 2024

Open in MATLAB Online

@William Rose I am fitting raw data - an observed aerosol size distribution. One variable is the count of number of aerosols within each "bin" (osmoke_num_dist_norm), and the other variable designates the central diameter of each bin (odiameters2). The only changes I have made from the raw data is to create equal bin spacing across the diameters variable that is then applied to the bin concentrations, and then I normalized the bin concentrations (each individual bin concentration divided by the sum of all bin concentrations). When I make a semilogx plot with odiameters2 as the x-variable and osmoke_num_dist_norm as the y-variable, it resembles a lognormal PDF. I am trying to calculate the mu and sigma from these observations to create a new lognormal PDF that extends past the current min/max of the observations.

In the code below I use two methods to try to calculate the mu and sigma from the raw data. Both methods give a mu = -10.3954 and sigma = 1.8516. I know from guessing/checking that to plot a fitted distribution that matches the observations the values should be mu = -1.8 and sigma = 0.39. I am including the plot that comes out as a result ("calculated" and "lognfit" come up with the same solutions, so they overlap in the plot).

osmoke_num_dist_norm = load('osmoke_num_dist_norm.mat'); osmoke_num_dist_norm = osmoke_num_dist_norm.osmoke_num_dist_norm; 
odiameters2 = load('odiameters2.mat'); odiameters2 = odiameters2.odiameters2; 
% calculate mu and sigma from the data
ln_x = log(osmoke_num_dist_norm);
mu1 = mean(ln_x);
sig1 = std(ln_x);
% use lognfit to get mu and sigma
pHat = lognfit(osmoke_num_dist_norm);
mu2 = pHat(1,1);
sig2 = pHat(1,2);
% create distributions from both mu/sigma versions
xvalues = 0:0.02:1E02;
pd1 = makedist('Lognormal','mu',mu1,'sigma',sig1);
y1 = pdf(pd1,xvalues);
y1 = y1./sum(y1); % normalize the fitted distribution to match observed
pd2 = makedist('Lognormal','mu',mu2,'sigma',sig2);
y2 = pdf(pd2,xvalues);
y2 = y2./sum(y2); % normalize the fitted distribution to match observed
semilogx(xvalues,y1,'LineWidth',2)
hold on
semilogx(xvalues,y2,'LineWidth',2)
hold on
semilogx(odiameters2,osmoke_num_dist_norm,'LineWidth',2)
legend('calculated','lognfit','data')

I feel like I must be putting the data into the equations incorrectly somehow? Or maybe I'm just confused about something really simple? Sorry in advance if this turns out to be a really stupid error on my end, I just cannot seem to figure it out to save my life.

William Rose on 19 Jan 2024

Open in MATLAB Online

@Emily,

Load data:

load osmoke_num_dist_norm
prob=osmoke_num_dist_norm;
load odiameters2
X=odiameters2;
Y=log(X);

Estimate mu and sigma. Call the estimates

= muHat, sigmaHat, as a reminder that these are only estimates of true, but unknown, parameters.

muHat=sum(prob.*Y);

sigmaHat=sqrt(sum(prob.*Y.^2)-muHat^2);

Display results.

fprintf('mu=%.3f, sigma=%.3f.\n',muHat,sigmaHat)
mu=-1.535, sigma=0.689.

Make probability distribution object

pd1 = makedist('Lognormal','mu',muHat,'sigma',sigmaHat);
pdfEst = pdf(pd1,X);

Plot results. The values in vector prob are not a density. To convert the values to a density, for plotting versus the fitted denisty, we must divide each value in prob by the corresponding bin width. The bin widths in your data are all the same. (The bin widths were irregular in @Alex Lee's data, which made things more complicated.)

binwid=mean(diff(X)); % mean bin width; widths are all the same

pdfX=prob/binwid; % probability density for observed data

semilogx(X,pdfX,'r.',X,pdfEst,'-b');

xlabel('X=particle size'); ylabel('Probability denisity');

grid on; legend('Data','Fitted Lognormal Dist.')

It's not the greatest fit ever, but it's not totally terrible. The data show some significant deviation from log-normality. For example, note the data (red dots) above the blue line from X=0.8 to X=1.0. Furthermore, we cannot see, with the plot above, what's going on in the tail region. A log-log plot will help us see how the fitted distribution compares to the experimental data in the tail.

figure; loglog(X,pdfX,'r.',X,pdfEst,'-b');

xlabel('X=particle size'); ylabel('Probability denisity');

grid on; legend('Data','Fitted Lognormal Dist.')

The log-log plot helps me understand the fit. The first plot (semilogx plot) made me wonder if you could get a better fit by making sigma smaller. This would have made the blue line come closer to the red dots, at least up to about X=10^0. But the second plot (loglog plot) shows that if sigma were smaller, the mismatch between data and fit would be even worse than it is, for particle sizes > 10^0.

William Rose on 19 Jan 2024

@Torsten, good point. There's something to be said for the optical (chi-by-eye) fit. Although, if you plot the PDF for mu,sigma=-1.8,0.39 on a log-log plot, you might think again.

The mu,sigma I got are the MLE, if the data are from a lognormal distibution. If this data is not from a log-normal distribution, then maybe we don't want the MLE. (For @Alex Lee's data, the MLE gave a better optical result than the MOM estimate.)

We could ask "what mu and sigma give a lognormal density that is the best approximation to the observed data?" How should we define "best approximation"? A. The chi-squared goodness of fit statistic. B. The squared differences between observed and measured counts in each cell. C. Something else. Minimizing A or B may give different estimates for mu, sigma than the MLE estimates. I suspect that B will give a mu, sigma whose PDF is more optically pleasing (on a plot with a linear y-axis for the PDF) than the MLE, because B will not be influenced by the data above X=10^0 the way the MLE is. I think statisticians would frown on B.

Torsten on 19 Jan 2024

Edited: Torsten on 19 Jan 2024

Open in MATLAB Online

load osmoke_num_dist_norm

prob=osmoke_num_dist_norm;

load odiameters2

X=odiameters2;

%binwid=mean(diff(X)); % mean bin width; widths are all the same

%pdfX=prob/binwid; % probability density for observed data

pdfX = zeros(1,numel(X));

pdfX(1) = prob(1)/(0.5*(X(2)-X(1)));

pdfX(2:end-1) = prob(2:end-1)./(0.5*(X(2:end-1)-X(1:end-2))+0.5*(X(3:end)-X(2:end-1)));

pdfX(end) = prob(end)/(0.5*(X(end)-X(end-1)));

trapz(X,pdfX)

ans = 1.0000

%MLE

Y=log(X);

muHat=sum(prob.*Y);

sigmaHat=sqrt(sum(prob.*Y.^2)-muHat^2);

fprintf('Maximum Likelihood: mu=%.3f, sigma=%.3f.\n',muHat,sigmaHat)

Maximum Likelihood: mu=-1.535, sigma=0.689.

pd1 = makedist('Lognormal','mu',muHat,'sigma',sigmaHat);

pdfEst = pdf(pd1,X);

%MOM

EX = sum(prob.*X); % E(X)

VX = sum(prob.*X.^2)-EX^2; % var(X)

muhat2 = log(EX^2/sqrt(VX+EX^2)); % MOM estimate for mu

sigmahat2 = sqrt(log(VX/EX^2+1)); % MOM estimate for sigma

fprintf('Method of moments: mu=%.3f, sigma=%.3f.\n',muhat2,sigmahat2)

Method of moments: mu=-2.015, sigma=1.334.

pd2 = makedist('Lognormal','mu',muhat2,'sigma',sigmahat2);

pdfMOM = pdf(pd2,X);

%Function fitting

fun = @lognpdf;

F = @(x,xdata)fun(xdata,x(1),x(2));

ms = lsqcurvefit(F,[-1.8,0.39],X,pdfX);

Local minimum possible. lsqcurvefit stopped because the final change in the sum of squares relative to its initial value is less than the value of the function tolerance.

muhat3 = ms(1);

sigmahat3 = ms(2);

fprintf('Function fitting: mu=%.3f, sigma=%.3f.\n',muhat3,sigmahat3)

Function fitting: mu=-1.706, sigma=0.426.

pd3 = makedist('Lognormal','mu',muhat3,'sigma',sigmahat3);

pdfFF = pdf(pd3,X);

%Plot results

semilogx(X,pdfX,'r.',X,pdfEst,'-b',X,pdfMOM,'-g',X,pdfFF,'-c');

xlabel('X=particle size'); ylabel('Probability density');

grid on; legend('Data','Maximum Likelihood','Method of Moments','Function Fitting')

figure; loglog(X,pdfX,'r.',X,pdfEst,'-b',X,pdfMOM,'-g',X,pdfFF,'-c');

xlabel('X=particle size'); ylabel('Probability density');

grid on; legend('Data','Maximum Likelihood','Method of Moments','Function Fitting')

Torsten on 19 Jan 2024

@William Rose

He used Matlab's lsqcurvefit() to find the mu and sigma that gives the n=best fit, in a least-squares sense, to the data.

Although it will be difficult to find a statistical justification to use function fitting methods in distribution fitting ...

William Rose on 19 Jan 2024

@Torsten, @Emily,

I see from @Torsten's plots above that the MOM estimates of mu, sigma give a PDF that looks terrible (green line) on the semi-log plot. MOM also gave a visually unacceptable solution for @Alex Lee's data.

Sign in to comment.

Answer 2

Hassaan on 8 Jan 2024

1
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/2067676-lognormal-fitting-with-the-pore-size-and-its-probabilities#answer_1385641

Edited: Hassaan on 8 Jan 2024

Open in MATLAB Online

% Given Data

Counts = [71 73 53 77 79 141 158 237 360 594 958 1659 3094 7183 19034 40051 125001 143248 164224 168209 172406 167600 160578 146530 124505 90064 67769 42179 21645 1070];

PoreSize = [89716.4 71264.3 56607.2 44964.7 35716.7 28370.8 22535.7 17900.8 14219.1 11294.6 8971.6 7126.4 5660.7 2837.1 2253.6 1790.1 712.6 566.1 449.6 357.2 283.7 225.4 179 142.2 112.9 89.7 71.3 56.6 45 35.7];

probs = Counts / sum(Counts);

% Calculating Expected Values and Variance

ExpOfY = sum(PoreSize.*probs);

Ysqr = PoreSize.^2;

ExpOfYsqr = sum(Ysqr.*probs);

VarOfY = ExpOfYsqr - ExpOfY^2; % Corrected the typo here

% Calculating the parameters for the lognormal distribution

normu = log(ExpOfY/sqrt(1+VarOfY/ExpOfY^2));

norvar = log(1+VarOfY/ExpOfY^2);

norsigma = sqrt(norvar);

% Create a lognormal distribution object

dist = makedist('Lognormal','mu',normu,'sigma',norsigma);

% Define the range for your data (adjust as needed)

x = linspace(min(PoreSize), max(PoreSize), 1000);

% Calculate the PDF

pdfOfx = pdf(dist,x);

% Plotting the data

figure; hold on;

scatter(PoreSize,Counts, 'b'); % Scatter plot of the original data

plot(x,pdfOfx * max(Counts), 'r'); % Plot of the fitted distribution

xlabel('Pore Size');

ylabel('Counts');

title('Pore Size Distribution with Lognormal Fit');

legend('Data','Lognormal Fit');

hold off;

Syntax and Typographical Errors: Ensure that all variable names and operations are correct. The expression VarOfY = ExpOfYsqr - ExpOfY1^; in your original code seems to contain errors. It's corrected here.
Visualizing Data: I've used a scatter plot for the original data and a line plot for the fitted distribution. Adjust the scaling factor (max(Counts)) as needed to better compare the two.
Data Range: The range of x in the line x = linspace(min(PoreSize), max(PoreSize), 1000); is set from the minimum to the maximum of your pore sizes. Adjust this as necessary.
Distribution Parameters: The 'mu' and 'sigma' for a lognormal distribution are derived from the logarithm of your data. Ensure these are calculated correctly.
Fitting and Model Choice: If the fit is still poor, consider alternative fitting methods or distributions. The fit quality can vary significantly based on the characteristics of your data.

---------------------------------------------------------------------------------------------------------------------------------------------------

If you find the solution helpful and it resolves your issue, it would be greatly appreciated if you could accept the answer. Also, leaving an upvote and a comment are also wonderful ways to provide feedback.

Professional Interests

Technical Services and Consulting
Embedded Systems | Firmware Developement | Simulations
Electrical and Electronics Engineering

1 Comment
Show -1 older commentsHide -1 older comments

Alex Lee on 9 Jan 2024

Many thanks for your help!

Sign in to comment.

Answer 3

Torsten on 8 Jan 2024

1
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/2067676-lognormal-fitting-with-the-pore-size-and-its-probabilities#answer_1385646

Edited: Torsten on 8 Jan 2024

Open in MATLAB Online

Counts = [71 73 53 77 79 141 158 237 360 594 958 1659 3094 7183 19034 40051 125001 143248 164224 168209 172406 167600 160578 146530 124505 90064 67769 42179 21645 1070];

PoreSize = [89716.4 71264.3 56607.2 44964.7 35716.7 28370.8 22535.7 17900.8 14219.1 11294.6 8971.6 7126.4 5660.7 2837.1 2253.6 1790.1 712.6 566.1 449.6 357.2 283.7 225.4 179 142.2 112.9 89.7 71.3 56.6 45 35.7];

data = [];

for i = 1:numel(Counts)

data = [data,PoreSize(i)*ones(1,Counts(i))];

end

phat = mle(data,'Distribution','LogNormal')

phat = 1×2

5.5569 0.8497

x = linspace(0,2000,1000);

hold on

plot(x,lognpdf(x,phat(1),phat(2)),'g')

histogram(data,'Normalization','pdf')

xlim([0 2000])

probs = Counts / sum(Counts);

ExpOfY = sum(PoreSize.*probs);

Ysqr = PoreSize.^2;

ExpOfYsqr = sum(Ysqr.*probs);

VarOfY = ExpOfYsqr - ExpOfY^2; % variance of the scores tabulated in vectors 1 & 2

normu = log(ExpOfY^2/sqrt(VarOfY+ExpOfY^2))

normu = 4.9283

norvar = log(1+VarOfY/ExpOfY^2);

norsigma = sqrt(norvar)

norsigma = 1.4788

plot(x,lognpdf(x,normu,norsigma),'b')

hold off

1 Comment
Show -1 older commentsHide -1 older comments

Alex Lee on 9 Jan 2024

Many thanks for your help!

Sign in to comment.

Lognormal Fitting with the Pore Size and its Probabilities

18 Comments
Show 16 older commentsHide 16 older comments

Accepted Answer

27 Comments
Show 25 older commentsHide 25 older comments

More Answers (2)

1 Comment
Show -1 older commentsHide -1 older comments

1 Comment
Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Products

Community Treasure Hunt

Lognormal Fitting with the Pore Size and its Probabilities

18 Comments Show 16 older commentsHide 16 older comments

Accepted Answer

27 Comments Show 25 older commentsHide 25 older comments

More Answers (2)

1 Comment Show -1 older commentsHide -1 older comments

1 Comment Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Products

Community Treasure Hunt

18 Comments
Show 16 older commentsHide 16 older comments

27 Comments
Show 25 older commentsHide 25 older comments

1 Comment
Show -1 older commentsHide -1 older comments

1 Comment
Show -1 older commentsHide -1 older comments