regress and stats
In the regress function there is an option to return stats, which includes R^2 among other things. I am trying to see the relationship between R^2 and corrcoef. With simple linear regression (response variable y and one independent variable x):

R = corrcoef(x, y);

and also

R = corrcoef(y, y_from_regress_function);

However, when I have, say, two independent variables x1 and x2, the relationships above no longer hold. One relationship still has to hold, though: R from the regress output should still equal corrcoef(y, y_from_regress_function). Any suggestions on why MATLAB does not produce the expected R^2 in multiple regression? Here is the code I use:

X = [ones(size(x1)) x1 x2 x1.*x2];
[b,bind,r,rint,stats] = regress(y,X);
model = b(1) + b(2)*x1 + b(3)*x3 + b(4).*x1.*x2;
corr = corrcoef(model,y);

I expected stats(1) = corr^2, but it is not. Any suggestions?
2 Comments
the cyclist
on 20 Jan 2012
It would be helpful if you used the Code button to format your code more readably.
the cyclist
on 20 Jan 2012
It would also be helpful if you posted code with specification of x1, etc., such that it is a self-contained example that exhibits the issue. That saves people who might help you a lot of guesswork, and gives a common example to work with.
Answers (2)
Léon
on 24 Jan 2012
That is not really a MATLAB-related question; it is a matter of econometrics/statistics.
In every case the coefficient of determination R^2 is the ratio of the explained sum of squares to the total sum of squares, R^2 = SSE / SST (here SSE denotes the explained sum of squares).

In the bivariate case we can show that the Pearson correlation coefficient is sufficient to describe the explanatory power of the model, so that r^2 = R^2. That is, the covariance between y and x (where x is the single explanatory variable) grows with the variance in y and x and captures the variation that can be explained by that specific model. In other words, R^2 in the bivariate case can be rewritten as r_(y,x) * (beta_x * s_x/s_y). Hence the coefficient of determination is also the correlation coefficient weighted by the standardized regression coefficient of x.

This relationship carries over to the trivariate and multivariate case, where R^2 can be expressed as the sum of all bivariate correlations, each weighted by its standardized regression coefficient. So the point is that only in the bivariate case does the standardized regression coefficient equal the correlation coefficient (!), such that

R^2 = r * r = r * (beta_x * s_x / s_y)   (bivariate case).
I hope this helps you see the relation between R^2 and the correlation between your variables more clearly. But once again, this is the subject of elementary econometrics courses/books, and you should be aware of these things before using such models; otherwise they might give you biased/wrong results.
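The decomposition above can be checked numerically. Here is a minimal sketch (the data and variable names are illustrative, not from the original posts): R^2 from regress equals the sum over predictors of the bivariate correlation r(y,xj), each weighted by the standardized coefficient beta_j * s_xj / s_y.

```matlab
% Illustrative data: y depends linearly on two predictors plus noise
rng(0);                          % reproducible example
n  = 200;
x1 = randn(n,1);
x2 = randn(n,1);
y  = 2 + 3*x1 - 1.5*x2 + randn(n,1);

X = [ones(n,1) x1 x2];
[b,~,~,~,stats] = regress(y,X);  % stats(1) is R^2

% Weighted-correlation decomposition of R^2
r1 = corr(y,x1);
r2 = corr(y,x2);
R2 = r1*(b(2)*std(x1)/std(y)) + r2*(b(3)*std(x2)/std(y));

disp([stats(1) R2])              % the two values agree
```

Note that with a single predictor the sum collapses to r * (beta * s_x/s_y) = r^2, which is exactly the bivariate special case described above.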
0 Comments
Tom Lane
on 24 Jan 2012
One problem is that the model you fit is not the same as the "model" value you computed afterward: your formula uses x3 where the design matrix has x2. Or maybe the "x3" was just a typo. Either way, here's some code showing that the square of the correlation between the observed and fitted y is equal to the R^2 value in the stats output:
x1 = randn(100,1); x2 = 5*rand(100,1);   % example data
y = 100 + 10*x1 - 4*x1.*x2 + 3*x2.^2;
X = [ones(size(x1)) x1 x2 x1.*x2];       % design matrix with intercept
[b,bind,r,rint,stats] = regress(y,X);    % stats(1) is R^2
model = X*b;                             % fitted values
corr = corrcoef(model,y)                 % off-diagonal entry ...
sqrt(stats(1))                           % ... matches the square root of R^2
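As a companion check (again with illustrative data, not from the original thread), the following sketch shows why the asker's first identity holds only in simple regression: with a single predictor, corrcoef(x,y) squared matches stats(1), but with two predictors only the correlation between y and the fitted values does.

```matlab
x1 = randn(100,1); x2 = 5*rand(100,1);
y  = 1 + 2*x1 + 3*x2 + randn(100,1);

% Simple regression: r(x1,y)^2 equals R^2
[~,~,~,~,s1] = regress(y,[ones(100,1) x1]);
c1 = corrcoef(x1,y);
disp([s1(1) c1(1,2)^2])          % equal

% Multiple regression: only r(y,yhat)^2 equals R^2
X = [ones(100,1) x1 x2];
[b,~,~,~,s2] = regress(y,X);
c2 = corrcoef(X*b,y);
disp([s2(1) c2(1,2)^2])          % equal; corrcoef(x1,y)^2 is not
```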
0 Comments