How can I extrapolate data?

Question

Hein zaw on 27 Mar 2020

https://se.mathworks.com/matlabcentral/answers/513367-how-can-i-extrapolate-data

Commented: Ameer Hamza on 27 Mar 2020

Accepted Answer: John D'Errico

I have two data sets. How can I extrapolate to fill in the NaN values?

fs=[0.0001 0.001 0.01 0.1 1 5 25 50 100];

P=[NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

Answer 1

John D'Errico on 27 Mar 2020

https://se.mathworks.com/matlabcentral/answers/513367-how-can-i-extrapolate-data#answer_422365

Extrapolation is a dangerous thing to do. It is best left to someone with enough expertise to know when they are doing something bad. However, even the experts frequently crash and fail on this. Consider the many people who try to extrapolate population trends, the weather, or the stock market out for any period of time. Also consider the disagreements you will find, even among those who claim to have expertise in exactly those things.

In your case, however, the extrapolation spans a relatively short distance. But even here there are issues. The biggest is that you have ONLY 4 data points with valid data. Just as bad, your data is clearly not perfect, rounded as it is to barely more than one significant digit.

fs=[0.0001 0.001 0.01 0.1 1 5 25 50 100];
P=[NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];
plot(fs,P,'o-')
grid on

That means we need to know the behavior of this relationship as the independent variable (fs) approaches 0 as a limit from above. Does this system require that P(fs==0) should be 0? Or can it be that this system will approach some non-zero limit?

Depending on which of those is the case, you would then choose to use some model for that system, whatever is appropriate. Thus an appropriate model here might be some sort of exponential process, or perhaps a low order polynomial.

Unfortunately, you have not provided enough data, so there is not enough information to have any confidence in an answer based only on the data.

I might suggest the curve fitting toolbox here, as it allows fairly general models, and we can easily exclude a constant term, but you could also just use polyfit to estimate some models. A very nice feature of FIT from the curve fitting toolbox is it also easily provides some degree of confidence interval around the coefficients. However, with only 4 data points, there is no simple way to intelligently extrapolate your data. That is, we might do this:

mdl = fit(fs(6:9)',P(6:9)','poly3')

which fits a cubic polynomial through the 4 data points you have. I'm not at all confident that an interpolating polynomial is a good idea though, and extrapolating all the way down near zero is a relatively long distance in this context.

mdl = fit(fs(6:9)',P(6:9)','poly3')
mdl = 
     Linear model Poly3:
     mdl(x) = p1*x^3 + p2*x^2 + p3*x + p4
     Coefficients:
       p1 =  -1.591e-06
       p2 =    0.000305
       p3 =     0.01208
       p4 =     0.03216

As I said, that is an interpolating cubic polynomial. Now, to extrapolate down to the lower values, just do this:

[fs(1:5)',mdl(fs(1:5))]
ans =
                    0.0001        0.0321649508802425
                     0.001        0.0321758248664304
                      0.01        0.0322845919048414
                       0.1        0.0333749785263151
                         1        0.0445490526315783 

This suggests that, extrapolated all the way to zero, we would expect to see a prediction at fs==0 of

mdl.p4
ans =
        0.0321637426900577

so the constant term of the polynomial.
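For readers without the Curve Fitting Toolbox, the same interpolating cubic can be sketched with polyfit and polyval from base MATLAB. This is only an illustration, not part of the original answer: the names valid, x, y, p, lowfs, Plow and P0 are made up for the example, and the constant term should agree with mdl.p4 above to within floating-point error.

```matlab
% Sketch: reproduce the interpolating cubic with base MATLAB only.
fs = [0.0001 0.001 0.01 0.1 1 5 25 50 100];
P  = [NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

valid = ~isnan(P);          % logical mask of the usable data points
x = fs(valid);
y = P(valid);

% Four points and a degree-3 polynomial: the fit passes through every point.
p = polyfit(x, y, 3);       % coefficients, highest power first

lowfs = fs(~valid);         % the locations where P is NaN
Plow  = polyval(p, lowfs);  % extrapolated values at those locations

% The last coefficient is the constant term, i.e. the prediction at fs == 0.
P0 = p(end);
```

Because polyfit returns the coefficients highest power first, p(end) plays the role of p4 in the fit output.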

As I asked before though, do you expect this process to approach exactly zero at fs==0? If so, then you cannot use a polynomial model with a constant term in it. Instead, we can build a model without a constant term and fit it to the same data.

ft = fittype('p3*x.^3 + p2*x.^2 + p1*x','indep','x')
ft = 
     General model:
     ft(p1,p2,p3,x) = p3*x.^3 + p2*x.^2 + p1*x
mdl2 = fit(fs(6:9)',P(6:9)',ft)
mdl2 = 
     General model:
     mdl2(x) = p3*x.^3 + p2*x.^2 + p1*x
     Coefficients (with 95% confidence bounds):
       p1 =     0.01531  (-0.01057, 0.04119)
       p2 =   0.0002285  (-0.000635, 0.001092)
       p3 =  -1.115e-06  (-7.274e-06, 5.043e-06)

How does this extrapolate down to the small values of fs?

[fs(1:5)',mdl2(fs(1:5))]
ans =
                    0.0001      1.53118614221083e-06
                     0.001      1.53120670264829e-05
                      0.01      0.000153141229708404
                       0.1       0.00153346724755357
                         1        0.0155391736344852

Again, this model will predict zero when fs is exactly zero, since there is no constant term in the model.
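Since the zero-intercept cubic is linear in its coefficients, it can also be sketched as an ordinary least-squares problem with the backslash operator, again without any toolbox. This is an assumption-laden illustration rather than the method used above: the names A, c, lowfs, Plow and P0 are hypothetical, and the result should land close to (not necessarily identical with) the coefficients that fit reported.

```matlab
% Sketch: cubic with no constant term, solved as linear least squares.
fs = [0.0001 0.001 0.01 0.1 1 5 25 50 100];
P  = [NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

valid = ~isnan(P);
x = fs(valid)';             % column vector of valid abscissae
y = P(valid)';              % column vector of valid ordinates

% Design matrix with columns x^3, x^2, x and deliberately no column of ones.
A = [x.^3, x.^2, x];
c = A \ y;                  % least-squares coefficients [cubic; quadratic; linear]

% Evaluate the model at the locations where P is NaN.
lowfs = fs(~valid)';
Plow  = [lowfs.^3, lowfs.^2, lowfs] * c;

% Appending a zero constant term lets polyval evaluate the same model.
P0 = polyval([c.' 0], 0);   % prediction at fs == 0 is exactly zero
```

With 4 data points and 3 coefficients there is only one residual degree of freedom, which is why the confidence bounds reported above are so wide.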

Is that the correct extrapolation? Of course it is not perfect. As I said, it completely depends on your assumptions about what will happen at zero. And we have no way to know what the process represents, so deciding how best to extrapolate is just a wild guess, certainly so for me.

Answer 2

Ameer Hamza on 27 Mar 2020

https://se.mathworks.com/matlabcentral/answers/513367-how-can-i-extrapolate-data#answer_422364

Edited: Ameer Hamza on 27 Mar 2020

The following code performs linear extrapolation on the available data.

fs=[0.0001 0.001 0.01 0.1 1 5 25 50 100];
P=[NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];
% keep only the points where P is not NaN
fs_ = fs(~isnan(P));
P_ = P(~isnan(P));
% interpolate linearly, and extrapolate outside the range of fs_
P_extrap = interp1(fs_, P_, fs, 'linear', 'extrap');

2 Comments

John D'Errico on 27 Mar 2020

Note that the linear extrapolant uses only the lowest two valid data points, essentially ignoring the upper data points for this purpose. As such, it implicitly fits a straight line through the points at fs == 5 and 25, then extrapolates that line all the way down to 1, 0.1, 0.01, 0.001, and 0.0001. It is as if you had used polyfit on those two data points.

An important feature of that approximation is that it does not make any assumption about what happens at fs==0. The full data set, including those two points, is

[fs;P]
ans =
 0.0001  0.001   0.01    0.1      1      5     25     50    100
    NaN    NaN    NaN    NaN    NaN    0.1    0.5    1.2    2.7

I am fairly confident that this data was made up, as those first two (non-NaN) data points happen to fall on a line that passes exactly through the origin. Thus the linear extrapolant used will be this one:

poly1model = polyfit(fs(6:7),P(6:7),1)
poly1model =
         0.02  -1.7691e-17

Thus a straight line with a constant term of zero (though there is some floating point trash in the result) and a slope of 0.02.

If the data was not made up (and again, I am fairly confident it was, or at least that it was heavily rounded), then the linear extrapolant might not be so well behaved.
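The equivalence described in this comment can be checked directly. The following is a minimal sketch using only base MATLAB; the names line2, P_interp1, P_line and maxdiff are hypothetical and chosen just for this illustration.

```matlab
% Sketch: interp1 with 'linear','extrap' versus a line through two points.
fs = [0.0001 0.001 0.01 0.1 1 5 25 50 100];
P  = [NaN NaN NaN NaN NaN 0.1 0.5 1.2 2.7];

valid = ~isnan(P);
fs_ = fs(valid);
P_  = P(valid);

% Linear interpolation with linear extrapolation outside [5, 100].
P_interp1 = interp1(fs_, P_, fs, 'linear', 'extrap');

% Straight line through only the two lowest valid points (fs == 5 and 25).
line2  = polyfit(fs_(1:2), P_(1:2), 1);   % [slope, intercept]
P_line = polyval(line2, fs(~valid));

% Largest disagreement between the two at the extrapolated locations.
maxdiff = max(abs(P_interp1(~valid) - P_line));
```

If maxdiff is at the level of floating-point noise, the two approaches agree below fs == 5, and the points at fs == 50 and 100 play no role in the extrapolated values.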

Ameer Hamza on 27 Mar 2020

Good analysis. Yes, it appears from the values that the OP just created them as an example, and the actual dataset might have different characteristics. Since the OP didn't provide any information, my solution was just an example. Unless the OP provides the real dataset, we can only speculate.
