laurensius adhitya on 17 Sep 2021
Edited: John D'Errico on 17 Sep 2021
Hi everyone, I want to ask how to do curve fitting for an equation with 4 variables. I have variables (a,b,c,d) and I want to find k1, k2, k3, and k4 from the equation a = k1*b^k2*c^k3*d^k4. Do you have any suggestions for what kind of method I can use? Thank you.

John D'Errico on 17 Sep 2021
Edited: John D'Errico on 17 Sep 2021
Invariably, the data will vary by multiple orders of magnitude in such a problem. And that tends to imply the noise in your data is NOT normally distributed. It virtually cannot be Gaussian noise. In fact, odds are the noise is what may be called proportional noise, i.e., multiplicative noise. It might follow a lognormal distribution, as would be common.
A serious problem with multiplicative noise on data is that if you then use a nonlinear regression that treats every data point as equally important, it places far too much importance on some of the data.
Anyway, that should give you the clue to how to solve this problem.
LOG your data. That is, take the log of your model. (As well as your data.) So, if we have:
a=k1*b^k2*c^k3*d^k4
then
log(a) = log(k1) + k2*log(b) + k3*log(c) + k4*log(d)
Feel free to choose what log base you use, thus log10, or the natural log. Whatever floats your boat.
A really nice thing is that any multiplicative noise you may have had before is now viewed as purely additive noise. So now a simple linear least squares will apply. And even better, you should see the coefficients in this model can now be estimated using a simple linear regression. This is because those coefficients that were once exponents are now merely multiplicative constants in a simple additive linear model.
That is, you can now use any tool applied to the logs of your data, such as regress, or fitlm, etc. You can even use the backslash operator.
When all is done, remember to exponentiate the constant term in the model, since it too got logged in that transformation.
As an example, since I lack your data, here is how you would handle it, on some sample data.
n = 100;
X = rand(n,1);
Y = rand(n,1);
coeffs = [2 3 4]; % Ground truth
Z = coeffs(1)*X.^coeffs(2).*Y.^coeffs(3) .* lognrnd(0,.25,n,1);
plot3(X,Y,Z,'o')
view(21,15)
grid on
box on
Actually, the data is not too bad looking.
Now we can fit this using one of two models. First, a nonlinear regression.
ft = fittype('k1*X.^k2.*Y^k3','indep',{'X','Y'})
ft =
General model: ft(k1,k2,k3,X,Y) = k1*X.^k2.*Y^k3
mdl = fit([X,Y],Z,ft)
Warning: Start point not provided, choosing random start point.
General model:
     mdl(X,Y) = k1*X.^k2.*Y^k3
Coefficients (with 95% confidence bounds):
       k1 =       2.342  (2.202, 2.481)
       k2 =        3.62  (3.342, 3.899)
       k3 =       3.847  (3.558, 4.136)
And that would appear to be not too terrible, but need I point out that the confidence intervals for all three coefficients do not even contain what we know to be the ground truth values of [2 3 4]?
I did not even feel the need to give it starting values, but fit handled that well enough. Anyway, the problem is not lack of convergence. Now let me use a linear least squares.
K123 = [ones(n,1),log(X),log(Y)]\log(Z);
K123 = [exp(K123(1));K123(2:3)]
K123 = 3×1
    2.0668
    3.0178
    4.0104
And that seems to hit the ground truth values nearly dead on.
So, how well did the nonlinear model do? Compared to the linear fit, the nonlinear fit was pure crapola in terms of how well the coefficients were estimated. The problem is that the noise is not properly handled by the nonlinear regression, since the noise is truly proportional noise in this case. And the result was that the coefficient estimates were poor when I did the nonlinear regression.
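As a sketch of the fitlm route mentioned above (assuming the Statistics and Machine Learning Toolbox is available), the same log-linear fit can be done on the logged data, with the bonus that fitlm also reports standard errors and confidence intervals. X, Y, and Z here are the variables generated in the example:

```matlab
% Fit the logged model: log(Z) = log(k1) + k2*log(X) + k3*log(Y)
mdl = fitlm([log(X), log(Y)], log(Z));

% Recover the coefficients; remember to un-log the constant term
k1  = exp(mdl.Coefficients.Estimate(1));
k23 = mdl.Coefficients.Estimate(2:3);

% 95% confidence intervals (the interval for k1 is in log space,
% so exponentiate its row as well)
ci = coefCI(mdl);
```

regress, from the same toolbox, would work equally well; the backslash solve above needs no toolbox at all.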
William Rose on 17 Sep 2021
@John D'Errico makes some very good points! I like the "take the log of both sides" idea.

William Rose on 17 Sep 2021
One set of four numbers (a,b,c,d) is not enough to determine the four unknowns (k1,k2,k3,k4). You need four independent sets of (a,b,c,d). Then you will have four equations and four unknowns. If you have more than four sets of (a,b,c,d), then you will want to find the values of k1,k2,k3,k4 which give the best fit. You must define "best fit". For example, you might define best fit as the fit that minimizes the sum squared error between the predicted and measured values of a. In that case, I recommend the function fmincon(). See the documentation for how to use it.
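A minimal sketch of that sum-squared-error approach with fmincon (which requires the Optimization Toolbox); the data below are made up for illustration, and the "true" exponents are my own sample values:

```matlab
% Hypothetical sample data: n measured sets of (a,b,c,d)
n = 50;
b = rand(n,1) + 0.5;
c = rand(n,1) + 0.5;
d = rand(n,1) + 0.5;
a = 2*b.^1.5 .* c.^(-0.7) .* d.^0.3;   % "true" k = [2, 1.5, -0.7, 0.3]

% Sum squared error between measured and predicted a
sse = @(k) sum((a - k(1)*b.^k(2).*c.^k(3).*d.^k(4)).^2);

k0 = [1 1 1 1];                 % starting guess
k  = fmincon(sse, k0, [], [])   % no constraints used here
```

fminsearch would also work here with no toolbox, since the problem is unconstrained, and John's log-transform above turns it into a linear solve that avoids iterative optimization entirely.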