curve fitting and optimisation code problems for f(x)= a.*exp(-b./(x.^p)) to dataset

1 view (last 30 days)
I'm trying to fit a model to my data - I've attempted using the curve fitting tool and writing some code (see below) but both approaches are failing. The model I am trying to fit is: f(x)= a.*exp(-b./(x.^p)), where the only thing I know is that p falls between 0.25 and 0.5.
When I use the curve fitting tool, only a straight line appears at a value that is too far off to even be visible in the dispay (photo attached)
When I use the code, I get an error message popping up as below.
Can anyone explain what could be going wrong here?
t = set6a{1:end,1};
y = set6a{1:end,2};
plot(t,y,'ro')
title('Data points')
F=@(x,xdata) (x(1)*exp(-1*(x(2)./((xdata).^x(3)))));
x0 = [0.00000001 1 0.5];
[x,resnorm,~,exitflag,output] = lsqcurvefit(F,x0,t,y)
hold on
plot(t,F(x,t))
hold off
The error message I'm getting is this:
Error using snls (line 47)
Objective function is returning undefined values at initial point. lsqcurvefit cannot continue.
Error in lsqncommon (line 166)
snls(funfcn,xC,lb,ub,flags.verbosity,options,defaultopt,initVals.F,initVals.J,caller, ...
Error in lsqcurvefit (line 257)
lsqncommon(funfcn,xCurrent,lb,ub,options,defaultopt,caller,...
Error in curvefittingpractice2 (line 44)
[x,resnorm,~,exitflag,output] = lsqcurvefit(F,x0,t,y)
Any thoughts on this would be really helpful!
  2 Comments
Star Strider
Star Strider on 10 May 2020
The data on the plot and the data in the Excel file appear to be significantly different. What subset of data are you using?
There is a significant discontinuity between x-values of about 205 and 297 that does not show up on the plot you posted. That discontinuity makes even the genetic algorithm that I tried impossible to fit.
I nevertheless got these constants for the parameters for the best fit (out of 100 runs):
Rate Constants:
Theta(1) = 126.87726
Theta(2) = 33.15605
Theta(3) = 0.06804
I am not posting this as an Answer because it isn’t one. It’s just an observation of sorts.
Em
Em on 10 May 2020
I'm really sorry! I attached the wrong dataset - the correct dataset is attached to the post now

Sign in to comment.

Accepted Answer

John D'Errico
John D'Errico on 10 May 2020
Edited: John D'Errico on 10 May 2020
A serious problem with this is what I would suggest looks like almost two functions merged into one. Worse, the two "curves" at the bottom end are not consistent in either case with what we see at the top end, so schizophrenic data. And worse yet, you have a least two large outliers. Large residual nonlinear least square problems are difficult to solve, because one data point can really swings things around. It has an effectively huge residual. What else? Oh yeah, your data varies by an order of magnitude. The result of that is the points at one end of the curve impart effectively more influence on the result than they should.
First what happens to your model if we log y? This actually solves one of the issues I mention above, since we no longer have data that varies by powers of 10 when the fit is performed. Effectively, I am suggesting we think of your data in a semi-logy form.
semilogy(x,y,'o')
grid on
xlabel X
ylabel Y
The strange behavior of your data should be clear here. Thus we see the outliers (be careful not to miss the outlier on the very bottom.) We see data that on the right half seems to be nearly a perfectly straight line. The left half we see two pieces that clearly represent two completely different and inconsistent processes. They are also inconsistent with the data on the right. Two different things are happening there, neither of which is consistent with the right half of your data.
What will happen when we try to estimate a model, is your curve will not represent all three of those behaviors.
Before I get into a fit, first, log your model. That is, when we form the semilogy plot, we are changing your model from this:
y = a*exp(-b/(x^p))
into this:
log(y) = log(a) - b*x^(-p)
Dividing by x^p is just a multiply by x^(-p). And log(a) is still just a constant that we don't know. Effectively, we can think of the model in a logged form as almost a polynomial. Most seem to call that a power-law model. Again though, these curves really do seem to be purely linear when we look at the semi-logy plot. That implies p==-1 is about as close as my eye wants to fit this. Remember, you have that -p in there, so two negatives do make a positive in this case. As such, the proper model for your curve prbably has p as very close to -1.
Time to do a first cut at a model. I'll just use the basic fitting tools from the plot figure itself to create that model.
plot(x,log(y),'o')
grid on
On the right half, again, that data really is nearly perfectly linear, implying p==-1. But on the left half, which of those curves should it follow? A curve fit cannot follow both curves. And if it chases one, then it misses the other by much more. And worse, if you do go chase one of those lower personalities, then the curve will miss the upper part by more yet.
The funny thing is, those two nasty looking outlier points almost seem to be the least of your problems. Even so, I would still delete them from the analysis before proceeding further. They are still influencing your estimates.
What would I do?
  1. Delete the two outliers.
  2. Determine why it is your data is so schizophrenic. What happens that drives the two behaviors? Why is there a split at roughly x~150? Choose the process you wish to accept as truth. Only when you excise the split personalities in your data can you build a model that works.
  3. Finally, either split the problem into two pieces, performing a broken-stick regression with the break between the two halves of your curve, or try to build once global model that won't fit either half very well.
So first, in your data set (ignoring the NaNs) we see
Right half of the curve: points # 10:47
The two outliers: # 1, 49
Left curve (lower branch: # 2:9
Left curve, upper bransh: # 48:60
So if I had to guess, you are doing something strange to generate the data, that creates different behaviors that are inconsistent with each other. A model is not going to bring it all together in some magic way.
I might split the data like this:
ind1 = 10:47;
ind2 = 2:9;
ind3 = 48:60;
ind4 = [1 49];
If we just consider the piece at the right, we would see:
p1 = polyfit(x(ind1),log(y(ind1)),1)
p1 =
0.0185627813572888 -27.2729761320311
Nothing more sophisticated than polyfit was needed. So a very good model for that curve segment will be simply
y = exp(-27.2729761320311) * exp(0.0185627813572888 * x)
Note the fact that x in that model did not even require a negative exponent.
Plot the data now, with this model. As you will see, the fit really is very good. I'll even view it in an unlogged form, to show I was sucessful in posing a reasonable model for the entire process, even though the model I would pose is far simpler than what you have intended.
plot(x(ind1),y(ind1),'bo',x(ind2),y(ind2),'go',x(ind3),y(ind3),'ko',x(ind4),y(ind4),'r*')
grid on
hold on
xhat = 40:230;
yhat = exp(p1(2))*exp(p1(1)*xhat);
plot(xhat,yhat,'b-')
In fact, the model fits splendidly well for the upper half. There simply is no good way to resolve the split personality in the lower half though. At least not so until you understand why the split exists, and how to properly deal with it. My suggestion at this point would be to get better data.
Finally, can I reconcile your claim that p should be on the order of 0.25 to 0.5? That seems to be wildly inconsistent with the data you have provided.
  2 Comments
Em
Em on 13 May 2020
Thanks so much for this answer! You have solved all my queries and made me realise that I need to fit another model to my results - along the lines you showed me where P=~1. I think my experiment could be switching between operating in 2 different ways in fact so I will split the problem into 2 pieces as you said.

Sign in to comment.

More Answers (0)

Products


Release

R2017a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!