
I have two data sets. How can I extrapolate to fill in the NaN values?

fs=[0.0001 0.001 0.01 0.1 1 5 25 50 100];

P=[NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

John D'Errico
on 27 Mar 2020

Extrapolation is a dangerous thing to do. It is best left to someone with enough expertise to know when they are doing something bad. However, even the experts frequently crash and fail at this. Consider the many people who try to extrapolate population trends, the weather, or the stock market out for any period of time. Also consider the disagreements you will find, even among those who claim expertise in exactly those things.

In your case, however, we have a problem of extrapolation over a short relative distance. But even here there are issues. The biggest is that you have ONLY 4 points with valid data. Just as bad, your data is clearly not exact, rounded as it is to barely more than one significant digit.

fs=[0.0001 0.001 0.01 0.1 1 5 25 50 100];

P=[NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

plot(fs,P,'o-')

grid on

To extrapolate, we need to know the behavior of this relationship as the independent variable (fs) approaches 0 as a limit from above. Does this system require that P be 0 at fs==0? Or can this system approach some non-zero limit?

Depending on which of those is the case, you would then choose whatever model is appropriate for that system. An appropriate model here might be some sort of exponential process, or perhaps a low order polynomial.

Unfortunately, you have not provided enough data, so there is not enough information to have any confidence in an answer based only on the data.

I might suggest the curve fitting toolbox here, as it allows fairly general models, and we can easily exclude a constant term, but you could also just use polyfit to estimate some models. A very nice feature of FIT from the curve fitting toolbox is it also easily provides some degree of confidence interval around the coefficients. However, with only 4 data points, there is no simple way to intelligently extrapolate your data. That is, we might do this:

mdl = fit(fs(6:9)',P(6:9)','poly3')

which fits a cubic polynomial through the 4 data points you have. I'm not at all confident that an interpolating polynomial is a good idea though, and extrapolating all the way down near zero is a relatively long distance in this context.

mdl = fit(fs(6:9)',P(6:9)','poly3')

mdl =

Linear model Poly3:

mdl(x) = p1*x^3 + p2*x^2 + p3*x + p4

Coefficients:

p1 = -1.591e-06

p2 = 0.000305

p3 = 0.01208

p4 = 0.03216

As I said, that is an interpolating cubic polynomial. Now, to extrapolate down to the lower values, just do this:

[fs(1:5)',mdl(fs(1:5))]

ans =

0.0001 0.0321649508802425

0.001 0.0321758248664304

0.01 0.0322845919048414

0.1 0.0333749785263151

1 0.0445490526315783

This suggests that, extrapolated all the way to zero, we would expect to see a prediction at fs==0 of

mdl.p4

ans =

0.0321637426900577

so the constant term of the polynomial.
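For readers without the Curve Fitting Toolbox, the same interpolating cubic can be sketched with polyfit and polyval from base MATLAB. This is only an illustration of the step above, not a recommendation of the cubic model. The variable names (x, y, c, fs_low, P_low, P_at_zero) are my own and do not appear in the original post.

```matlab
% The four valid data points from the question
x = [5 25 50 100];
y = [0.1 0.5 1.2 2.7];

% A degree-3 polynomial through 4 points interpolates them exactly.
% polyfit returns coefficients in descending powers: c = [p1 p2 p3 p4].
c = polyfit(x, y, 3);

% Evaluate the cubic at the fs values where P was NaN
fs_low = [0.0001 0.001 0.01 0.1 1];
P_low  = polyval(c, fs_low);

% The prediction at fs == 0 is just the constant term c(4)
P_at_zero = polyval(c, 0);

disp([fs_low(:), P_low(:)])
fprintf('Predicted P at fs = 0: %.6f\n', P_at_zero)
```

The value at zero should agree with mdl.p4 above, since both describe the same cubic through the same four points.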

As I asked before though, do you expect this process to approach exactly zero at fs==0? If so, then you cannot use a polynomial model with a constant term in it. Instead, we can build a model without a constant term and fit it to the same data.

ft = fittype('p3*x.^3 + p2*x.^2 + p1*x','indep','x')

ft =

General model:

ft(p1,p2,p3,x) = p3*x.^3 + p2*x.^2 + p1*x

mdl2 = fit(fs(6:9)',P(6:9)',ft)

mdl2 =

General model:

mdl2(x) = p3*x.^3 + p2*x.^2 + p1*x

Coefficients (with 95% confidence bounds):

p1 = 0.01531 (-0.01057, 0.04119)

p2 = 0.0002285 (-0.000635, 0.001092)

p3 = -1.115e-06 (-7.274e-06, 5.043e-06)

How does this extrapolate down to the small values of fs?

[fs(1:5)',mdl2(fs(1:5))]

ans =

0.0001 1.53118614221083e-06

0.001 1.53120670264829e-05

0.01 0.000153141229708404

0.1 0.00153346724755357

1 0.0155391736344852

Again, this model will predict zero when fs is exactly zero, since there is no constant term in the model.

Is that the correct extrapolation? Of course it is not perfect. As I said, it depends completely on your assumptions about what happens at zero. And since we have no way to know what the process represents, how best to extrapolate is just a wild guess, certainly so for me.
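Because the no-constant cubic is linear in its coefficients, it can also be estimated in base MATLAB with an ordinary least-squares solve. The sketch below assumes that approach (a design matrix and the backslash operator) in place of fit with a custom fittype. The names A, p, fs_low and P_low are mine, chosen for illustration.

```matlab
% The four valid data points, as column vectors
x = [5 25 50 100]';
y = [0.1 0.5 1.2 2.7]';

% Design matrix with columns x, x^2, x^3 and no column of ones,
% so the fitted curve is forced through the origin.
A = [x, x.^2, x.^3];

% Least-squares solution; p = [p1; p2; p3] in the notation of mdl2
p = A \ y;

% Evaluate the model at the fs values where P was NaN
fs_low = [0.0001 0.001 0.01 0.1 1]';
P_low  = [fs_low, fs_low.^2, fs_low.^3] * p;

disp(p')
disp([fs_low, P_low])
```

With three coefficients and four points this is a true least-squares fit rather than an interpolant, so the curve will not pass exactly through the data.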


Ameer Hamza
on 27 Mar 2020

Edited: Ameer Hamza
on 27 Mar 2020

The following code performs linear extrapolation on the available data.

fs=[0.0001 0.001 0.01 0.1 1 5 25 50 100];

P=[NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

% keep only the entries where P is not NaN

fs_ = fs(~isnan(P));

P_ = P(~isnan(P));

% linearly interpolate inside the data and extrapolate outside it

P_extrap = interp1(fs_, P_, fs, 'linear', 'extrap');
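A small variation on the code above, offered as a sketch: use a logical mask so that only the NaN entries are overwritten and the measured values are carried over untouched. The names known and P_filled are my own.

```matlab
fs = [0.0001 0.001 0.01 0.1 1 5 25 50 100];
P  = [NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

% Logical mask marking the measured (non-NaN) points
known = ~isnan(P);

% Start from the original vector, then overwrite only the NaN entries
% with the linearly extrapolated values.
P_filled = P;
P_filled(~known) = interp1(fs(known), P(known), fs(~known), 'linear', 'extrap');

disp([fs(:), P_filled(:)])
```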

John D'Errico
on 27 Mar 2020

Note that the linear extrapolant uses only the lowest two valid data points, essentially ignoring the upper data points for this purpose. As such, it implicitly fits a straight line through the points at fs == 5 and 25, then extrapolates that line all the way down to 1, 0.1, 0.01, 0.001, and 0.0001. It is as if you had used polyfit on those two data points.

An important feature of that approximation is that it makes no assumption about what happens at fs==0. The full data set is

[fs;P]

ans =

0.0001 0.001 0.01 0.1 1 5 25 50 100

NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7

I am fairly confident that this data was made up, as those first two (non-NaN) data points happen to fall on a line that passes exactly through the origin. Thus the linear extrapolant used will be this one:

poly1model = polyfit(fs(6:7),P(6:7),1)

poly1model =

0.02 -1.7691e-17

Thus a straight line with a constant term of zero (though there is some floating point trash in the result) and a slope of 0.02.

If the data was not made up (as I am again fairly confident it was, or at least it was heavily rounded), then the linear extrapolant might not be so well behaved.
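The equivalence described above can be checked directly. The following sketch compares the interp1 extrapolation against the straight line that polyfit puts through the two lowest valid points. The names lo, P_interp1, P_line and max_diff are mine, for illustration only.

```matlab
fs = [0.0001 0.001 0.01 0.1 1 5 25 50 100];
P  = [NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

lo = fs(1:5);   % the fs values where P is NaN

% Linear extrapolation using all four valid points
P_interp1 = interp1(fs(6:9), P(6:9), lo, 'linear', 'extrap');

% Straight line through only the two lowest valid points (fs = 5 and 25)
c = polyfit(fs(6:7), P(6:7), 1);
P_line = polyval(c, lo);

% The two should agree up to floating point error
max_diff = max(abs(P_interp1 - P_line));
fprintf('Largest difference: %g\n', max_diff)
```

The points at fs == 50 and 100 play no part in the extrapolated values, which is the sense in which the upper data is ignored.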

Ameer Hamza
on 27 Mar 2020
