How to calculate R^2 based on external linear equation?

Question

Lu Da Silva on 15 Mar 2022


https://se.mathworks.com/matlabcentral/answers/1671949-how-to-calculate-r-2-based-on-external-linear-equation

Edited: John D'Errico on 15 Mar 2022

I have scattered data, and I calculated the slope and intercept of the best-fitting line (using orthogonal regression).

So now I have scattered data points and an equation of the form y=mx+c.

How can I calculate R^2 based on this new equation I found?


Answer 1

John D'Errico on 15 Mar 2022


https://se.mathworks.com/matlabcentral/answers/1671949-how-to-calculate-r-2-based-on-external-linear-equation#answer_918074

Edited: John D'Errico on 15 Mar 2022


Unfortunately, R^2 is not a completely simple concept when working with an orthogonal regression. To compute a valid R^2, we need to understand what it means. So start there. (Sigh, this may get long. Sadly, far too often people use R^2, and in fact rely on it for curve fitting, without really understanding it.)

What does R^2 mean? In the context of fitting a line (or a curve) to data, where the noise is assumed to be in y only, R^2 asks how well the model fits the data when compared to a constant model. For example, you might have some simple data, with the noise in y only.

x = (-5:5)';
y = 2*x + randn(size(x));
[mdl,goodness] = fit(x,y,'poly1')
mdl = 
     Linear model Poly1:
     mdl(x) = p1*x + p2
     Coefficients (with 95% confidence bounds):
       p1 =       1.968  (1.676, 2.259)
       p2 =     -0.2897  (-1.212, 0.6329)
goodness = struct with fields:
           sse: 16.4651
       rsquare: 0.9628
           dfe: 9
    adjrsquare: 0.9586
          rmse: 1.3526

So we see that R^2 was a number close to 1. In fact, for a linear least squares fit, R^2 will always be a number between 0 and 1 as long as there is a constant term in your model (worth a long explanation even there).

plot(mdl,'b')
hold on
plot(x,y,'ro')
yline(mean(y),'g')
hold off

Why did I include a green line there at the mean of y? Because that is what R^2 compares your model to. Effectively, how well does the line in blue fit the data, when compared to a model where the function is just a constant?

Now we can compute R^2 for this problem ourselves. I hope we get the same result. :)

constantModel = mean(y)
constantModel = -0.2897
totalSumOfSquaresForConstantModel = sum((y - constantModel).^2)
totalSumOfSquaresForConstantModel = 442.3126
sumOfSquaresForLinearModel = sum((y - mdl(x)).^2)
sumOfSquaresForLinearModel = 16.4651
Rsq = 1 - sumOfSquaresForLinearModel/totalSumOfSquaresForConstantModel
Rsq = 0.9628

In fact, it just happens to be exactly the same value as fit gave us. WHEW, did I get lucky, or what?
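As an aside, the same ratio works for a line whose slope and intercept came from somewhere other than fit. The sketch below is only an illustration (lineR2, xd, yd, R2exact and R2poor are names made up for it), and it still measures the residuals vertically, in y only. When the line is not the least squares line, nothing forces the result to stay above zero.

```matlab
% Classical (vertical-residual) R^2 for an externally supplied line y = m*x + c.
% lineR2 is an illustrative helper, not part of any toolbox.
lineR2 = @(x,y,m,c) 1 - sum((y - (m*x + c)).^2) / sum((y - mean(y)).^2);

xd = (-5:5)';
yd = 2*xd + 1;                  % noise-free data lying exactly on y = 2*x + 1

R2exact = lineR2(xd,yd, 2,1);   % the true line
R2poor  = lineR2(xd,yd,-2,1);   % a line with the wrong slope
```

Here the exact line gives 1, while the line with the wrong slope gives -3, meaning it fits worse than the constant model does.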

That data was actually pretty good, with a fairly high signal to noise ratio, so we expect R^2 to be close to 1. For data that is just completely random, with no signal at all, we will see R^2 values that are near zero. So people have come to effectively rely on R^2 as telling them something important. How good is their model? (If I'm not careful here, I might start to rant and rave about why R^2 is not as useful or important as they think. GRRRR. set('rantMode','off'). Ok. Better now.)

Now let me create some total least squares data as an example, where there is some noise in both variables. (Even that has issues in terms of how we would compute R^2. UGH.)

x = (-10:10)';
y = 2*x;
x = x + randn(size(x))*2;
y = y + randn(size(x))*2;

Compute the total least squares model. Some call it an errors in variables model. Some will call it an orthogonal regression. And some might call it a PCA model, because of how it might be computed. I'll use svd to do the work here.

A = [x - mean(x),y - mean(y)];
[~,~,V] = svd(A,0);
coef = V(:,end)
coef = 2×1
   -0.9104
    0.4137

So coef contains the coefficients in the model

coef(1)*(x - mean(x)) + coef(2)*(y - mean(y)) = 0

If we want to write the model in the form y = m*x+b, be careful, in case the coefficient on y was zero!!!!!! A zero there means the fitted line is vertical, so the divisions below would fail.

m = -coef(1)/coef(2)
m = 2.2008
b = mean(x)*coef(1)/coef(2) + mean(y)
b = -0.5112
xp = linspace(min(x),max(x));
yp = m*xp + b;
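To make the warning about a zero coefficient on y concrete, here is a small sketch of one way to guard the conversion. The data (xv, yv) is deliberately a vertical column of points, all the names in it are introduced here for illustration, and the tolerance sqrt(eps) is an arbitrary choice.

```matlab
% Deliberately degenerate data: every point shares the same x
xv = zeros(5,1);
yv = (1:5)';

Av = [xv - mean(xv), yv - mean(yv)];
[~,~,Vv] = svd(Av,0);
coefv = Vv(:,end);              % unit normal to the fitted line

tol = sqrt(eps);                % arbitrary threshold for "effectively zero"
isVertical = abs(coefv(2)) < tol;
if isVertical
    % The line is x = mean(xv); no finite slope or intercept exists
    mv = Inf;
    bv = NaN;
else
    mv = -coefv(1)/coefv(2);
    bv = mean(xv)*coefv(1)/coefv(2) + mean(yv);
end
```

For a vertical (or nearly vertical) line it is safer to keep working with the normal vector itself rather than with m and b.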

Now, what does it mean to compare the orthogonal regression to a default constant model? That default constant model is just the point at (mean(x),mean(y)). I'll plot that point in green again.

plot(x,y,'ro',xp,yp,'b-',mean(x),mean(y),'gs')

Again, R^2 for an orthogonal regression should mean how well does the straight line model compare to no model at all? If the data were perfectly spherical noise, we might hope to see an R^2 near zero.

So here, I'll compute R^2 much as we did before.

totalSumOfSquaresForConstantModel = sum((x - mean(x)).^2) + sum((y - mean(y)).^2)
totalSumOfSquaresForConstantModel = 3.7580e+03

For the linear model, we need the sum of squares of the orthogonal distances from each point to the line. There are many ways we might do that of course. Being too lazy to derive it here, I'll just give you a reference for one way:

https://en.wikipedia.org/wiki/Distance_from_a_point_to_a_line

sumOfSquaresForlinearModel = sum((-y + m*x + b).^2/(1 + m^2))
sumOfSquaresForlinearModel = 78.8675
OrthRegR2 = 1 - sumOfSquaresForlinearModel/totalSumOfSquaresForConstantModel
OrthRegR2 = 0.9790

So a reasonable estimate of R^2 for this orthogonal regression model (based on the same intuitive idea of what R^2 means in the classical sense) is 0.9790.
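When the line comes from the svd as above, the same number can be had without converting to slope and intercept at all, which also sidesteps the vertical line problem. This is only a sketch under that assumption: it regenerates similar random data (so the value will differ from 0.9790), and the names n, d, s, ssOrth, ssTotal and R2 are introduced here for illustration.

```matlab
rng(0);                               % fixed seed so the sketch is repeatable
x = (-10:10)' + 2*randn(21,1);        % noise in x
y = 2*(-10:10)' + 2*randn(21,1);      % noise in y

A = [x - mean(x), y - mean(y)];       % centered data
[~,S,V] = svd(A,0);
s = diag(S);                          % singular values, largest first

n = V(:,end);                         % unit normal to the fitted line
d = A*n;                              % signed orthogonal distance of each point
ssOrth  = sum(d.^2);                  % matches s(end)^2 up to rounding
ssTotal = sum(s.^2);                  % matches sum(A(:).^2) up to rounding

R2 = 1 - ssOrth/ssTotal;
```

Written this way, R2 is simply s(1)^2/(s(1)^2 + s(2)^2). Since s(2) can never exceed s(1), this measure cannot drop below 0.5 for the fitted orthogonal line.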

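Again, R^2 tells us how much better we fit the data using a linear model, when compared to essentially no fit at all, that is, compared to the total scatter in the data as if the data were entirely noise. A perfect fit would have R^2 equal to 1. One caveat with this definition: in two dimensions the best orthogonal line always accounts for at least half of the total sum of squares, so complete garbage for data would show an R^2 near 0.5, not near zero as in the classical case.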


Answer 2

KSSV on 15 Mar 2022


https://se.mathworks.com/matlabcentral/answers/1671949-how-to-calculate-r-2-based-on-external-linear-equation#answer_917924

Edited: KSSV on 15 Mar 2022


Substitute your x in the equation y = mx+c to get (x1,y1), and let (x0,y0) be your existing values. Then you can use regression or plotregression.

% Demo example:
% Demo line
m = rand; c = randi(10);
x = 1:10;
y = m*x + c;
% Make scattered data along the line
x0 = x;
y0 = y + 2*rand(size(x0));
% Fit a line (ordinary least squares)
p = polyfit(x0,y0,1);
x1 = x0;
y1 = polyval(p,x1);
% regression (Deep Learning Toolbox) returns the correlation coefficient R; R^2 is R.^2
R = regression(y0,y1)
R = 0.6264
figure(1)
plot(x0,y0,'.r',x1,y1,'b')
% figure(2)
% plotregression(y0,y1)

2 Comments

John D'Errico on 15 Mar 2022

Note that what you showed was NOT an orthogonal regression. It is very different when you need to use an orthogonal regression.

Sam Chak on 15 Mar 2022

Perhaps the info on this page is helpful.

https://www.mathworks.com/help/stats/fitting-an-orthogonal-regression-using-principal-components-analysis.html
