Extrapolation from correlated data

Hi,
I've got a bit of a problem.
I've got two data sets that are very highly correlated with one another. Knowing the future values of one data set, I would like to extrapolate the other. How would I go about doing this?

 Accepted Answer

Hint:
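If your data are linear, the regression slope is the covariance between X and Y divided by the variance of X:
m = cov(x,y)/var(x);
Because cov(x,y) returns a 2-by-2 covariance matrix, the slope is the off-diagonal element m(2,1). A point x0 would fall along the line at
y0 = m(2,1)*x0
assuming the y-intercept is at 0. If it is not, add the intercept b = mean(y) - m(2,1)*mean(x), so that y0 = m(2,1)*x0 + b.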
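The same slope formula can be checked outside MATLAB with a short Python sketch. It is a minimal illustration, not the answerer's code: `mean` and `fit_line` are hypothetical helper names, and the data are made up to lie exactly on y = 2x + 1. Covariance and variance both divide by n - 1 (as MATLAB's `cov` and `var` do by default), so that factor cancels in the slope. The sketch also computes the intercept from the means, so the prediction does not depend on the line passing through the origin.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def fit_line(x, y):
    """Least-squares line y ~ m*x + b with slope m = cov(x, y) / var(x)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    # Sample covariance and sample variance, both normalized by n - 1
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    m = cov_xy / var_x
    # The least-squares line passes through (mean(x), mean(y)), which fixes b
    b = my - m * mx
    return m, b


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [3.0, 5.0, 7.0, 9.0]  # made-up data lying exactly on y = 2x + 1
    m, b = fit_line(x, y)
    x0 = 10.0
    print(f"slope={m:.3f} intercept={b:.3f} y0={m * x0 + b:.3f}")  # prints slope=2.000 intercept=1.000 y0=21.000
```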

4 Comments

Fan Hu on 3 Dec 2019
Edited: Adam Danz on 3 Dec 2019
No, the data are not linear; in fact, they don't have much of a pattern at all, but as you can see, they are highly correlated. I want to predict the future of the blue line given future values of the orange line.
These look like they could be very noisy linear data with nearly flat slopes and a vertical offset between the two data sets. I'm not even sure if those data are rising and falling in synchrony.
If the orange and blue lines are variables y1 and y2, could you share the results of this plot:
plot(diff(y1), 'o')    % sample-to-sample changes of y1 (orange)
hold on
plot(diff(y2), '-s')   % sample-to-sample changes of y2 (blue)
grid on
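The diff plot is a visual check of whether the two series rise and fall together. A numeric version of the same check can be sketched in Python; this is a suggested supplement to the plot, with hypothetical helper names and made-up data. It compares the first differences directly: their Pearson correlation, and the fraction of steps in which both series move in the same direction. Values near 1 for both would support the idea that the lines move in synchrony.

```python
def diff(values):
    """First differences, like MATLAB's diff: values[k+1] - values[k]."""
    return [b - a for a, b in zip(values, values[1:])]


def corr(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def same_direction_fraction(y1, y2):
    """Fraction of steps where y1 and y2 change with the same sign."""
    d1, d2 = diff(y1), diff(y2)
    same = sum(1 for a, b in zip(d1, d2) if a * b > 0)
    return same / len(d1)


if __name__ == "__main__":
    y1 = [1.0, 3.0, 2.0, 5.0, 4.0]            # made-up "orange" values
    y2 = [301.0, 304.0, 302.0, 308.0, 306.0]  # made-up "blue" values
    print(corr(diff(y1), diff(y2)))
    print(same_direction_fraction(y1, y2))
```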
Without knowing anything more about the data, I see three possible trends, though I'm not certain of any of them:
  1. Both data sets (orange and blue) seem to be noise that varies about a flat line (slope=0)
  2. The difference between the trends is a vertical offset.
  3. The blue seems to have a slightly larger modulation amplitude.
To estimate the vertical offset, I'd average the difference between the orange and blue.
vOffset = mean(orange - blue)
which should be a positive number (maybe around 300).
To estimate the gain in modulation, you could try
gain = mean(diff(blue)./diff(orange))
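assuming the blue and orange data have the same number of data points and that orange never stays flat between consecutive samples (a zero in diff(orange) would divide by zero). This should also be a positive value.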
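Given a new value of orange, the blue would be something like
% b0 is the new blue value
% g0 is the given orange value
b0 = mean(blue) + gain*(g0 - mean(orange))
With a gain of 1 this reduces to b0 = g0 - vOffset, since vOffset = mean(orange - blue).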
Of course, none of this has been tested, so you might have to play around with it.
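The offset-and-gain recipe can be sketched in Python as follows. This is a minimal sketch with hypothetical helper names and made-up data, and it makes two assumptions beyond the recipe: steps where orange does not change are skipped when averaging the diff ratios (they would otherwise divide by zero), and the gain is applied to the deviation of orange from its mean before blue's mean level is restored, b0 = mean(blue) + gain*(g0 - mean(orange)). With a gain of 1 that equals g0 - vOffset, which keeps the sign consistent with vOffset = mean(orange - blue).

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def diff(values):
    """First differences, like MATLAB's diff."""
    return [b - a for a, b in zip(values, values[1:])]


def estimate_offset(orange, blue):
    """vOffset = mean(orange - blue); positive when orange lies above blue."""
    return mean([o - b for o, b in zip(orange, blue)])


def estimate_gain(orange, blue):
    """gain = mean(diff(blue) ./ diff(orange)), skipping steps where orange is flat."""
    ratios = [db / do for do, db in zip(diff(orange), diff(blue)) if do != 0]
    return mean(ratios)


def predict_blue(g0, orange, blue, gain):
    """Scale orange's deviation from its mean by gain, then add back blue's mean."""
    return mean(blue) + gain * (g0 - mean(orange))


if __name__ == "__main__":
    orange = [1000.0, 1010.0, 1005.0, 1020.0]  # made-up history
    blue = [700.0, 710.0, 705.0, 720.0]        # here blue is exactly orange - 300
    v_offset = estimate_offset(orange, blue)
    gain = estimate_gain(orange, blue)
    print(v_offset, gain, predict_blue(1030.0, orange, blue, gain))  # prints 300.0 1.0 730.0
```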
You may also want to add noise to the estimate. Otherwise, all of your estimated values will have a correlation of 1 with orange, whereas your real data are clearly not that highly correlated.
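One way to add that noise, sketched in Python under stated assumptions: take the residuals (historical blue minus the model's prediction for the same points), measure their sample standard deviation, and add zero-mean Gaussian noise of that size to each new estimate. The Gaussian choice, the helper name `add_noise`, the optional seed, and the data are assumptions for illustration. The noisy values are not more accurate than the plain estimates; they only restore a realistic amount of scatter.

```python
import random
import statistics


def add_noise(estimates, residuals, seed=None):
    """Add zero-mean Gaussian noise whose spread matches the historical residuals."""
    # Sample standard deviation of the residuals (needs at least two of them)
    sigma = statistics.stdev(residuals)
    rng = random.Random(seed)  # seeded generator for reproducible runs
    return [e + rng.gauss(0.0, sigma) for e in estimates]


if __name__ == "__main__":
    # Made-up residuals: historical blue minus the model's fitted blue values
    residuals = [4.0, -3.0, 1.5, -2.5, 0.0]
    estimates = [730.0, 735.0, 728.0]  # made-up noise-free predictions
    print(add_noise(estimates, residuals, seed=0))
```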


More Answers (1)

Can you attach some data?
If the values are correlated, how about scattering one versus the other and then fitting a line through them? Now, if you have the future values of set #1, y1, then you must also have the future times, tFuture. So what if you just extrapolate them? You can add some noise if you want.
coefficients = polyfit(y1, y2, 1); % Fit a line through the scatterplot of y2 vs. y1.
y1Future = y1(tFuture); % Get the y1 values at the future times (assumes tFuture holds valid indices into y1).
y2Future = polyval(coefficients, y1Future); % Predict the y2 values when y1 takes these values.
plot(tFuture, y2Future, 'ms-'); % Plot the y2Future values vs. time.
That's just off the top of my head. Can you attach some data so we can see if it seems reasonable?

Release: R2019b

Asked: 2 Dec 2019

Answered: 3 Dec 2019
