# How can I extrapolate data?

Hein zaw on 27 Mar 2020
Commented: Ameer Hamza on 27 Mar 2020
I have two data sets. How can I extrapolate to fill in the NaN values?
fs=[0.0001 0.001 0.01 0.1 1 5 25 50 100];
P=[NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

John D'Errico on 27 Mar 2020
Extrapolation is a dangerous thing to do. Best left to someone with sufficient expertise that they know when they are doing something bad. However, even the experts frequently crash and fail on this. Consider the many people who try to extrapolate population trends or the weather, or the stock market, out for any period of time. Also consider the disagreements you will find, even among those who claim to have expertise, in exactly those things.
In your case, however, we have a problem of extrapolation over a short relative distance. But even here there are issues. The biggest of those issues is that you have ONLY 4 data points with valid data. Just as bad is the fact that your data is clearly not perfect, rounded as it is to barely more than one significant digit.
fs=[0.0001 0.001 0.01 0.1 1 5 25 50 100];
P=[NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];
plot(fs,P,'o-')
grid on
Extrapolating down to the small values of fs means we need to know the behavior of this relationship as the independent variable (fs) approaches 0 as a limit from above. Does this system require that P(fs==0) should be 0? Or can it be that this system will approach some non-zero limit?
Depending on which of those is the case, you would then choose to use some model for that system, whatever is appropriate. Thus an appropriate model here might be some sort of exponential process, or perhaps a low order polynomial.
But unfortunately, you have not provided enough data, thus insufficient information to have any confidence what is the answer based only on the data.
I might suggest the curve fitting toolbox here, as it allows fairly general models, and we can easily exclude a constant term, but you could also just use polyfit to estimate some models. A very nice feature of FIT from the curve fitting toolbox is it also easily provides some degree of confidence interval around the coefficients. However, with only 4 data points, there is no simple way to intelligently extrapolate your data. That is, we might do this:
mdl = fit(fs(6:9)',P(6:9)','poly3')
which fits a cubic polynomial through the 4 data points you have. I'm not at all confident that an interpolating polynomial is a good idea though, and extrapolating all the way down near zero is a relatively long distance in this context.
mdl = fit(fs(6:9)',P(6:9)','poly3')
mdl =
Linear model Poly3:
mdl(x) = p1*x^3 + p2*x^2 + p3*x + p4
Coefficients:
p1 = -1.591e-06
p2 = 0.000305
p3 = 0.01208
p4 = 0.03216
As I said, that is an interpolating cubic polynomial. Now, to extrapolate down to the lower values, just do this:
[fs(1:5)',mdl(fs(1:5))]
ans =
0.0001 0.0321649508802425
0.001 0.0321758248664304
0.01 0.0322845919048414
0.1 0.0333749785263151
1 0.0445490526315783
This suggests that, extrapolated all the way to zero, we would expect to see a prediction at fs==0 of
mdl.p4
ans =
0.0321637426900577
which is just the constant term of the polynomial.
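If you do not have the Curve Fitting Toolbox, the same interpolating cubic can be sketched with polyfit and polyval from base MATLAB. The variable names valid, pcubic, Plow and P_at_zero are just names chosen here for illustration.

```matlab
fs = [0.0001 0.001 0.01 0.1 1 5 25 50 100];
P  = [NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

% Keep only the points where P is known.
valid = ~isnan(P);

% A cubic through 4 points interpolates them exactly.
% polyfit returns coefficients in descending powers: [x^3 x^2 x 1].
pcubic = polyfit(fs(valid), P(valid), 3);

% Evaluate the cubic at the fs values where P is missing.
Plow = polyval(pcubic, fs(~valid));

% The constant term is the prediction at fs == 0.
P_at_zero = pcubic(4);
```

Up to floating point error, Plow and P_at_zero should agree with the values from the poly3 fit, since both describe the same cubic.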
As I asked before though, do you expect this process should approach exact zero at fs==0? If so, then you cannot use a polynomial model with a constant term in it. We can build a model without a constant term from that same data.
ft = fittype('p3*x.^3 + p2*x.^2 + p1*x','indep','x')
ft =
General model:
ft(p1,p2,p3,x) = p3*x.^3 + p2*x.^2 + p1*x
mdl2 = fit(fs(6:9)',P(6:9)',ft)
mdl2 =
General model:
mdl2(x) = p3*x.^3 + p2*x.^2 + p1*x
Coefficients (with 95% confidence bounds):
p1 = 0.01531 (-0.01057, 0.04119)
p2 = 0.0002285 (-0.000635, 0.001092)
p3 = -1.115e-06 (-7.274e-06, 5.043e-06)
How does this extrapolate down to the small values of fs?
[fs(1:5)',mdl2(fs(1:5))]
ans =
0.0001 1.53118614221083e-06
0.001 1.53120670264829e-05
0.01 0.000153141229708404
0.1 0.00153346724755357
1 0.0155391736344852
Again, this model will predict zero when fs is exactly zero, since there is no constant term in the model.
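Because this model is linear in its coefficients, the same kind of fit can also be sketched in base MATLAB as an ordinary least squares problem solved with backslash. This is only an illustration of the idea, not what fit does internally, and the names A, c, xlow and Plow0 are chosen here for the example.

```matlab
fs = [0.0001 0.001 0.01 0.1 1 5 25 50 100];
P  = [NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

valid = ~isnan(P);
x = fs(valid)';
y = P(valid)';

% Design matrix with columns x, x^2, x^3 and no column of ones,
% so the fitted curve is forced through the origin.
A = [x, x.^2, x.^3];

% Linear least squares solution; c = [p1; p2; p3].
c = A \ y;

% Extrapolate to the fs values where P is missing.
xlow = fs(~valid)';
Plow0 = [xlow, xlow.^2, xlow.^3] * c;
```

With 4 points and 3 coefficients there is only one degree of freedom left over, which is why the confidence bounds reported by fit are so wide that every interval includes zero.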
Is that the correct extrapolation? Of course it is not perfect. As I said, it depends completely on your assumptions about what happens at zero. There is no way for us to know what the process represents, so choosing how best to extrapolate is just a wild guess, certainly so for me.

Ameer Hamza on 27 Mar 2020
Edited: Ameer Hamza on 27 Mar 2020
The following code performs linear extrapolation on the available data.
fs=[0.0001 0.001 0.01 0.1 1 5 25 50 100];
P=[NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];
% filter data without nan
fs_ = fs(~isnan(P));
P_ = P(~isnan(P));
P_extrap = interp1(fs_, P_, fs, 'linear', 'extrap');

John D'Errico on 27 Mar 2020
Note that the linear extrapolant uses only the lowest two valid data points, essentially ignoring the upper data points for this purpose. As such, it implicitly fits a straight line through the points at fs == 5 and 25, then extrapolates that line all the way down to 1, 0.1, 0.01, 0.001, and 0.0001. It is as if you had used polyfit on those two data points.
An important feature of that approximation is that it does not make any assumption about what happens at fs==0. The data itself is
[fs;P]
ans =
0.0001 0.001 0.01 0.1 1 5 25 50 100
NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7
I am fairly confident that this data was made up, as those first two (non-NaN) data points happen to fall on a line that passes exactly through the origin. Thus the linear extrapolant used will be this one:
poly1model = polyfit(fs(6:7),P(6:7),1)
poly1model =
0.02 -1.7691e-17
Thus a straight line with a constant term of zero (though there is some floating point trash in the result) and a slope of 0.02.
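As a quick check of that claim, here is a small sketch comparing the interp1 extrapolation with the line through the two lowest valid points. The names P_lin, k, slope, P_line and maxdiff are chosen here for illustration.

```matlab
fs = [0.0001 0.001 0.01 0.1 1 5 25 50 100];
P  = [NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

valid = ~isnan(P);

% Linear interpolation with linear extrapolation outside the valid range.
P_lin = interp1(fs(valid), P(valid), fs, 'linear', 'extrap');

% Line through the two lowest valid points, (5, 0.1) and (25, 0.5).
k = find(valid, 2);                 % indices of the first two valid points
slope = diff(P(k)) / diff(fs(k));
P_line = P(k(1)) + slope * (fs - fs(k(1)));

% Below fs == 5 the two agree up to floating point error.
maxdiff = max(abs(P_lin(1:5) - P_line(1:5)));
```

Here maxdiff should be at the level of floating point roundoff, and slope should come out as 0.02, matching the polyfit result.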
If the data was not made up (as I am again fairly confident it was, or at least it was heavily rounded) then the linear extrapolant might not be so well behaved.
Ameer Hamza on 27 Mar 2020
Good analysis. Yes, it appears from the values that the OP just created them as an example, and the actual dataset might have different characteristics. Since the OP didn't provide any further information, my solution was just an example. Unless the OP provides the real dataset, we can only speculate.