Extrapolation of data points

Hi everyone.
I have plotted a set of x and y values (See pictures attached).
I want to extrapolate the two curves until they intersect.
Can anyone help me with this?
Thank you :)

3 Comments

Probably interp1 will help you
Will they ever intersect?
You have no idea what the data do beyond the values you have measured and plotted. If you have a mathematical model that correctly describes the process that created your data, and you have accurately estimated its parameters, you might be able to extrapolate your data. That still does not mean that the curves will intersect.
So just take a wild guess as to where they will intersect. It will be as accurate as anything you can calculate!
I also noted in my answer that the curves are very close to being parallel and that the extrapolated intersection is likely to be very sensitive to the extrapolation procedure.


 Accepted Answer

Jon on 9 Aug 2019
Edited: Jon on 9 Aug 2019
You could first find the equation of a straight line that goes through the last n points of one curve, and then similarly find the equation of a straight line that goes through the last n points of the second curve. Given these two equations, you can easily solve analytically (routine algebra) to find their intersection, or you could plot the two lines along with your original data and see the intersection graphically.
To get the equation of a straight line through the last n points you could use something like
n = 10; % number of trailing points to fit (example value, choose your own)
p1 = polyfit(x1(end-n+1:end),y1(end-n+1:end),1); % p1(1) is slope, p1(2) is intercept
p2 = polyfit(x2(end-n+1:end),y2(end-n+1:end),1); % p2(1) is slope, p2(2) is intercept
If you want to plot these two straight lines out into the extrapolated region:
% define final value for extrapolation and number of points to plot
numPoints = 50;
xfinal = 100;
% start each fitted line at the first of the n points used for its fit
xfit1 = linspace(x1(end-n+1),xfinal,numPoints);
xfit2 = linspace(x2(end-n+1),xfinal,numPoints);
% evaluate the fitted lines over the extrapolation range
yfit1 = polyval(p1,xfit1);
yfit2 = polyval(p2,xfit2);
plot(x1,y1,xfit1,yfit1,x2,y2,xfit2,yfit2)
With a little more thought you could make the x's, y's etc into matrices, and make little loops so you wouldn't have to do everything twice.
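As a rough sketch of the loop idea, the two curves can be held in cell arrays (rather than matrices, since the curves may have different lengths) and fitted in a single loop. The data, the value of n, and the variable names xs, ys, p, xfit and yfit below are assumptions made up for illustration, not the poster's data.

```matlab
% Synthetic example data (assumed for illustration only)
x1 = (0:0.1:20)';  y1 = 1.20 - 0.004*x1;
x2 = (0:0.1:15)';  y2 = 0.90 + 0.003*x2;

n = 20;            % number of trailing points used for each fit (assumed)
xfinal = 100;      % right-hand limit of the extrapolation (assumed)
numPoints = 50;    % number of points on each extrapolated line

xs = {x1, x2};     % cell arrays allow curves of different lengths
ys = {y1, y2};
numCurves = numel(xs);

p    = zeros(numCurves, 2);   % row k holds [slope, intercept] of curve k
xfit = cell(1, numCurves);
yfit = cell(1, numCurves);

for k = 1:numCurves
    xk = xs{k};
    yk = ys{k};
    % straight-line fit through the last n points of curve k
    p(k,:)  = polyfit(xk(end-n+1:end), yk(end-n+1:end), 1);
    % extrapolated line from the start of the fitted tail out to xfinal
    xfit{k} = linspace(xk(end-n+1), xfinal, numPoints);
    yfit{k} = polyval(p(k,:), xfit{k});
end
```

Each pair xfit{k}, yfit{k} can then be passed to plot together with the original data, and adding a third curve only requires extending xs and ys.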
Note: The curves look like they are quite close to being parallel, so your extrapolated intersection will be very sensitive to how you choose to define your extrapolation (very slight shifts in the slopes or intercepts of the extrapolated lines will make big shifts in the intersection point).
Finally, as @Alex suggests, you could use interp1 to do the extrapolation (it needs the 'extrap' option to return values outside the range of the data). However, if you chose the 'linear' method, for example, I think it would only use the last two data points for the extrapolation, which might be less robust than the approach I outlined above. It would, however, be very simple to use interp1, if it works well enough.
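To make the interp1 comparison concrete, here is a small sketch on made-up, slightly curved data. It compares interp1 with 'linear' and 'extrap' against a hand calculation from the last two points, and against a polyfit line through the last n points. All data and variable names are assumptions for illustration.

```matlab
% Synthetic, slightly curved data (assumed for illustration only)
x  = (0:10)';
y  = 1 + 0.1*x - 0.002*x.^2;
xq = 15;                          % query point beyond the end of the data

% interp1 with 'linear' and 'extrap' continues the last segment
yInterp = interp1(x, y, xq, 'linear', 'extrap');

% the same extrapolation computed by hand from the last two points
slopeLast = (y(end) - y(end-1)) / (x(end) - x(end-1));
yLastTwo  = y(end) + slopeLast*(xq - x(end));

% least-squares line through the last n points, for comparison
n    = 5;
p    = polyfit(x(end-n+1:end), y(end-n+1:end), 1);
yFit = polyval(p, xq);

fprintf('interp1: %.4f, last two points: %.4f, polyfit over %d points: %.4f\n', ...
        yInterp, yLastTwo, n, yFit);
```

With noisy measurements, the slope of the last segment can change a lot from one sample to the next, which is why a fit over several points is usually the steadier choice.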

8 Comments

Did this solve your problem? If this answered your question, please accept the answer.
Hi Jon!
Sorry for the late reply but I was on holiday.
It works for the first series (x1, y1) but not for the second :/
For x2, y2 it says "NaN".
Do you have an idea why it cannot fit?
I would have to look at your data and the code you are actually using when you get the error. If you want you can attach that and I can see if I can tell what is going on.
Thanks for your help Jon!
I did it in Excel and calculated the equation of the line through the last points.
But now that I'm writing the paper and will have to justify how I got the value, I would like to have a more sophisticated justification.
I attached my excel file and the code.
Data = xlsread('Book1.xlsx')
x1 = Data(:,1);
y1 = Data(:,2);
x2 = Data(:,3);
y2 = Data(:,4);
plot(x1,y1,x2,y2,'LineWidth',2)
xlabel ('Axial deformation (%)')
ylabel ('Void ratio (-)')
lgd = legend('6kPa 22RD','6kPa 70RD')
p1 = polyfit(x1(end-100:end),y1(end-100:end),1) % p1(1) is slope, p1(2) is intercept
p2 = polyfit(x2(end-100:end),y2(end-100:end),1) % p2(1) is slope, p2(2) is intercept
% define final values for extrapolation and number of points to plot
numPoints = 50;
xfinal = 100;
xfit1 = linspace(x1(end-400),xfinal,numPoints)
xfit2 = linspace(x2(end-400),xfinal,numPoints)
yfit1 = polyval(p1,xfit1)
yfit2 = polyval(p2,xfit2)
plot(x1,y1,xfit1,yfit1,x2,y2,xfit2,yfit2,'LineWidth',2)
xlabel ('Axial deformation (%)')
ylabel ('Void ratio (-)')
lgd = legend('6kPa 22RD','6kPa 70RD')
Jon on 19 Sep 2019
Edited: Jon on 19 Sep 2019
So the main problem you are having is that in the Excel worksheet, there are fewer data points for the second curve than for the first curve. So there are blanks for those entries in the Excel sheet. MATLAB imports those blanks as NaN (not a number), padding the end of Data(:,3) and Data(:,4) with NaNs so that they have the same number of rows as Data(:,1) and Data(:,2). The NaNs are not plotted, and also give you a misleading idea about where the data ends for the second curve.
In the attached code you can see how to eliminate those NaN values, and everything then works. I also added a little bit to your code to calculate and display the intersection point.
I also made a couple of other tweaks. I put in a figure statement before starting to make each plot so the second plot wouldn't overwrite the first one. Also I removed the left hand side from your call to legend since you never used the variable that this returned.
Data = xlsread('Book1.xlsx');
x1 = Data(:,1);
y1 = Data(:,2);
x2 = Data(:,3);
y2 = Data(:,4);
% xlsread returns NaN for empty cells in Excel sheet.
% this gives a problem because x2 and y2 are not as long as x1 and y1 so
% they get padded with NaN.
% Eliminate the NaN entries
% assume y2 is good whenever x2 is good
ikeep = ~isnan(x2);
x2 = x2(ikeep);
y2 = y2(ikeep);
figure
plot(x1,y1,x2,y2,'LineWidth',2)
xlabel ('Axial deformation (%)')
ylabel ('Void ratio (-)')
legend('6kPa 22RD','6kPa 70RD');
p1 = polyfit(x1(end-100:end),y1(end-100:end),1); % p1(1) is slope, p1(2) is intercept
p2 = polyfit(x2(end-100:end),y2(end-100:end),1); % p2(1) is slope, p2(2) is intercept
% find intersection between extrapolated lines
% solve p1(1)*xmatch + p1(2) = p2(1)*xmatch + p2(2)
xmatch = (p2(2) - p1(2))/(p1(1) - p2(1));
ymatch = polyval(p1,xmatch); % could use either p1 or p2 since y will be the same at intersection
% define final values for extrapolation and number of points to plot
numPoints = 50;
xfinal = 100;
xfit1 = linspace(x1(end-400),xfinal,numPoints);
xfit2 = linspace(x2(end-400),xfinal,numPoints);
yfit1 = polyval(p1,xfit1);
yfit2 = polyval(p2,xfit2);
figure
plot(x1,y1,xfit1,yfit1,x2,y2,xfit2,yfit2,xmatch,ymatch,'o','LineWidth',2)
xlabel ('Axial deformation (%)')
ylabel ('Void ratio (-)')
legend('6kPa 22RD','22RD linear fit','6kPa 70RD','70RD linear fit','intersection'); % one label per plotted series, in plot order
text(20,0.85,['intersection at x = ',num2str(xmatch),' y = ',num2str(ymatch)])
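The code above assumes y2 is valid wherever x2 is valid. If that is not guaranteed, a slightly more defensive mask keeps only rows where both values are present. The sketch below uses a tiny made-up column pair, padded with NaN in the way xlsread pads short columns. The variable names keep, pBad and pGood are hypothetical.

```matlab
% Short columns padded with NaN, as xlsread returns them (made-up values)
x2 = [1; 2; 3; 4; NaN; NaN];
y2 = [0.90; 0.91; NaN; 0.93; NaN; NaN];   % one y value is missing on its own

% fitting the padded data lets the NaN entries propagate into the result
pBad = polyfit(x2, y2, 1);

% keep only rows where both x and y are real numbers
keep = ~isnan(x2) & ~isnan(y2);
x2 = x2(keep);
y2 = y2(keep);

% the fit on the cleaned data returns finite coefficients
pGood = polyfit(x2, y2, 1);   % pGood(1) is slope, pGood(2) is intercept
```

After this step, indexing such as x2(end-100:end) refers to real measurements rather than to the padding.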
John D'Errico on 19 Sep 2019
Edited: John D'Errico on 19 Sep 2019
I would like to echo the fact that the intersection point as predicted will be a virtually random number, given the curves that I saw in the plot. It will be STRONGLY influenced by the number of points in the sample used for those linear polynomial fits. And since you can choose how many points to use there, you can get almost any number out that you want. This is a VERY bad idea to publish in a paper. Were I a reviewer on your paper, I would flag you on that issue alone.
One thing you can do is to use the uncertainties in your estimated polynomial coefficients. Then use an error propagation technique to infer the approximate uncertainty in the intersection point. I would predict the error bounds on that point of intersection will be hugely wide. Were you to then include such information in your paper, and acknowledge that the uncertainty is very wide, then I might no longer complain as a referee on your paper.
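One possible way to do this is first-order (delta method) error propagation, sketched below. It assumes ordinary least squares with independent, equal-variance residuals, and it treats the two fits as independent of each other. Those are assumptions, not something established in this thread. The data are synthetic, and the names Ca, Cb, Ja, Jb and sigmaX are made up for illustration.

```matlab
% Synthetic tails of two nearly parallel curves (assumed data, with a
% small deterministic wiggle standing in for measurement noise)
xa = (10:0.1:20)';  ya = 1.10 - 0.0040*xa + 0.001*sin(7*xa);
xb = (10:0.1:20)';  yb = 1.00 - 0.0020*xb + 0.001*cos(5*xb);

% ordinary least squares for y = m*x + b
Xa = [xa, ones(size(xa))];
Xb = [xb, ones(size(xb))];
ca = Xa \ ya;                       % ca = [slope; intercept] of curve a
cb = Xb \ yb;                       % cb = [slope; intercept] of curve b

% coefficient covariance: residual variance times inv(X'*X)
ra = ya - Xa*ca;
rb = yb - Xb*cb;
Ca = (sum(ra.^2)/(numel(xa) - 2)) * ((Xa'*Xa) \ eye(2));
Cb = (sum(rb.^2)/(numel(xb) - 2)) * ((Xb'*Xb) \ eye(2));

% intersection of the two fitted lines
d      = ca(1) - cb(1);             % difference in slopes
xmatch = (cb(2) - ca(2)) / d;

% partial derivatives of xmatch with respect to [slope, intercept]
Ja = [-xmatch, -1] / d;             % with respect to curve a
Jb = [ xmatch,  1] / d;             % with respect to curve b

% first-order standard deviation of the intersection abscissa
sigmaX = sqrt(Ja*Ca*Ja' + Jb*Cb*Jb');

fprintf('xmatch = %.2f, approximate standard deviation = %.2f\n', xmatch, sigmaX);
```

Every partial derivative is divided by the difference in slopes d, so the more parallel the lines are, the larger the propagated uncertainty becomes.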
Another thing you might do is to look at the sensitivity of the extrapolated intersection point, as a function of the size of the sample you use for the fit near the end. If the prediction varies all over the map, then you should recognize that publishing such a number is a bad idea, again unless you publish the variation in the predictions to indicate your uncertainty.
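A sensitivity check of this kind could look like the sketch below, which repeats the linear fits for several tail lengths and records the resulting intersection. The curves are synthetic (two saturating exponentials that never actually meet), and the list of tail lengths in nValues is an arbitrary assumption.

```matlab
% Synthetic curves that level off and never truly intersect (assumed data)
x1 = (0:0.1:20)';  y1 = 1.20 - 0.10*(1 - exp(-x1/8));
x2 = (0:0.1:20)';  y2 = 0.95 + 0.05*(1 - exp(-x2/6));

nValues = [10 20 50 100 150];       % tail lengths to try (assumed)
xmatch  = zeros(size(nValues));

for k = 1:numel(nValues)
    n  = nValues(k);
    % straight-line fits through the last n points of each curve
    p1 = polyfit(x1(end-n+1:end), y1(end-n+1:end), 1);
    p2 = polyfit(x2(end-n+1:end), y2(end-n+1:end), 1);
    % intersection of the two extrapolated lines
    xmatch(k) = (p2(2) - p1(2)) / (p1(1) - p2(1));
    fprintf('n = %3d  ->  xmatch = %.2f\n', n, xmatch(k));
end

% range of predicted intersections across the tail lengths tried
spread = max(xmatch) - min(xmatch);
fprintf('spread of xmatch over these tail lengths: %.2f\n', spread);
```

Reporting the range of xmatch over a reasonable set of tail lengths, alongside the chosen value, gives readers a direct view of how much the result depends on that choice.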
But to use extrapolation and publish a result based on it, with no statement about the uncertainty in the numbers, is bad mathematics, and worthy of inclusion in the book "How To Lie With Statistics". Yes, I accept that you would not do so intentionally. But having now been warned about the issues, I would strongly suggest that you consider doing as I have suggested above.
Thanks very much Jon!
It works perfectly! :)
You are right John!
I tested and knew this before.
My solution will depend on the number of points I choose.
I think for the calculations I want to do, 1.0341 or 1.0357 will not have a big impact. But I'm not sure, I need to look at that.
I used the error propagation before in some measurements of my experiments, so I can use it for this as well.
Thanks :)
@Jon Hello Dr. Jon, I hope you are doing well.
I am trying to extrapolate data from an Excel file, but the values I get from the extrapolation are NaN, and when I change the method, for example to the linear method, I get negative values. Could you help me, please?



Asked:

on 9 Aug 2019

Commented:

on 22 Jun 2021
