Getting NaN when computing partialcorr (no NaNs in data)

Hi, I am using partialcorr on several series of data and it sometimes returns NaNs. Why is that? I am sure there are no NaNs in my data and no missing or empty entries. Sometimes using partialcorr([x y], 'rows','complete') helps, but it does not always fix the problem. Thanks for any help.
  4 Comments
dpb on 10 Oct 2022
Edited: dpb on 10 Oct 2022
tF=readtable(websave('Test_data.txt','https://www.mathworks.com/matlabcentral/answers/uploaded_files/125764/Test_data.txt'));
partialcorr([tF.flower_date,tF.cum_temp],[tF.Var1,tF.Var2])
ans = 2×2
     1   NaN
   NaN   NaN
fitlm(tF,'predictorVars',{'cum_temp','Var1','Var2'},'ResponseVar','flower_date','intercept',true)
Warning: Regression design matrix is rank deficient to within machine precision.
ans =
Linear regression model:
    flower_date ~ 1 + Var1 + Var2 + cum_temp

Estimated Coefficients:
                   Estimate       SE          tStat       pValue
                   ________    _________    _______    __________

    (Intercept)           0            0        NaN           NaN
    Var1             17.841      0.25253     70.647    1.8066e-59
    Var2           -0.42291     0.016155    -26.178    1.5975e-34
    cum_temp        0.36047    0.0049775     72.419    4.1539e-60

Number of observations: 64, Error degrees of freedom: 61
Root Mean Squared Error: 3.28
R-squared: 0.845,  Adjusted R-Squared: 0.84
F-statistic vs. constant model: 167, p-value = 1.9e-25
So partialcorr isn't lying to us; let's see what's going on between the independent variables themselves...
corrcoef([tF.cum_temp,tF.Var1,tF.Var2])
ans = 3×3
    1.0000   -0.9174   -0.4560
   -0.9174    1.0000    0.7726
   -0.4560    0.7726    1.0000
OK, none of the off-diagonal entries is exactly 1 or -1. cum_temp is very highly correlated with Var1, and Var1 and Var2 are fairly highly correlated with each other, but no single pair is perfectly correlated. So the conclusion has to be that cum_temp is a linear combination of the other two. Let's check that next:
fitlm(tF,'predictorVars',{'Var1','Var2'},'ResponseVar','cum_temp','intercept',true)
ans =
Linear regression model:
    cum_temp ~ 1 + Var1 + Var2

Estimated Coefficients:
                   Estimate    SE    tStat    pValue
                   ________    __    _____    ______

    (Intercept)      427       0      Inf       0
    Var1             -61       0     -Inf       0
    Var2               1       0      Inf       0

Number of observations: 64, Error degrees of freedom: 61
R-squared: 1,  Adjusted R-Squared: 1
F-statistic vs. constant model: 8.54e+29, p-value = 0
That last fit shows that cum_temp is predicted exactly by a linear combination of Var1 and Var2 (cum_temp = 427 - 61*Var1 + Var2), which is what produces the NaN results and the rank-deficiency warning above.
This probably means that Var1 and Var2 are derived rather than observed variables. That may cast doubt on the earlier analyses as well, depending on how those variables were defined and on what kept the same result from showing up in the other cases.
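The check above can be done without fitlm. The following is a minimal sketch, not taken from the thread: the data are made up, and only the relation cum_temp = 427 - 61*Var1 + Var2 comes from the fit above. The names X, Z, relResid and collinear, and the sqrt(eps) threshold, are illustrative assumptions. It regresses each variable on the controls and reports how much of its spread is left over; a value near machine precision flags a variable that the controls reproduce exactly.

```matlab
% Made-up stand-in data (assumption); only the exact relation
% cum_temp = 427 - 61*Var1 + Var2 is taken from the fit in the thread.
n    = 64;
t    = (1:n)';
Var1 = linspace(0, 1, n)';
Var2 = 100*sin(t).^2;
cum_temp    = 427 - 61*Var1 + Var2;
flower_date = 150 + 10*cos(3*t);

names = {'flower_date', 'cum_temp'};
X     = [flower_date, cum_temp];      % variables to be correlated
Z     = [ones(n,1), Var1, Var2];      % controls plus an intercept column

% Least-squares residuals of each column of X after regressing on Z.
R = X - Z*(Z\X);

% Residual size relative to each variable's spread about its mean.
relResid = sqrt(sum(R.^2, 1)) ./ sqrt(sum((X - mean(X,1)).^2, 1));

for k = 1:numel(names)
    fprintf('%-12s relative residual = %.3g\n', names{k}, relResid(k));
end

% Assumed threshold: flag variables the controls explain to roundoff level.
collinear = relResid < sqrt(eps);
```

Any variable flagged in collinear has no variation left once the controls are accounted for, so a partial correlation involving it is undefined.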


Answers (1)

Adam Danz on 4 May 2021
The same basic problem is happening with the partial correlation.
MATLAB's partialcorr follows the steps explained in Wikipedia's Partial Correlation article.
When correlating variable X with variable Y while controlling for variable Z, X may be predicted exactly by Z, so its residuals would be 0 or very close to 0. To avoid returning a spurious correlation produced by floating-point roundoff, the partialcorr function detects residuals close to 0 and sets them to 0. If you look at the equation in the wiki article, it will be clear why NaN values are returned in those cases, since 0/0 = NaN.
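For reference, the residual form of the partial correlation can be written as follows. This assumes both regressions on Z include an intercept, so the residuals have zero mean; the symbols e_{X,i} and e_{Y,i} (the residuals of X and Y after regressing each on Z) are notation introduced here, not taken from the thread.

```latex
r_{XY\cdot Z}
  = \frac{\sum_{i=1}^{N} e_{X,i}\, e_{Y,i}}
         {\sqrt{\sum_{i=1}^{N} e_{X,i}^{2}}\;\sqrt{\sum_{i=1}^{N} e_{Y,i}^{2}}}
\qquad\text{so that}\qquad
e_{X,i} = 0 \;\; \forall i
\;\Longrightarrow\;
r_{XY\cdot Z} = \frac{0}{0} = \mathrm{NaN}.
```

When Z predicts X exactly, both the numerator and the factor of the denominator involving e_X vanish, which is the 0/0 case.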
The partialcorr.m file contains valuable comments by its authors explaining this, just above the lines of code that compute the correlation coefficients (R2021a).
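The behavior described above can be imitated by hand. This is a rough sketch and not the actual partialcorr source: the data are made up, and the tolerance tol is an assumed stand-in for whatever rule partialcorr uses to decide that a residual is effectively zero.

```matlab
% Made-up data (assumption): x is an exact linear function of z1 and z2.
n  = 64;
t  = (1:n)';
z1 = linspace(0, 1, n)';
z2 = 100*sin(t).^2;
x  = 427 - 61*z1 + z2;          % exactly predictable from the controls
y  = 150 + 10*cos(3*t);         % not predictable from the controls

Z  = [ones(n,1), z1, z2];       % controls plus an intercept column
ex = x - Z*(Z\x);               % residual of x given Z (roundoff-sized)
ey = y - Z*(Z\y);               % residual of y given Z

% Assumed tolerance for "effectively zero"; partialcorr's own rule may differ.
tol = sqrt(eps);
if norm(ex) < tol*norm(x), ex(:) = 0; end
if norm(ey) < tol*norm(y), ey(:) = 0; end

% Correlation of the residuals; with ex all zero this is 0/0.
r = (ex'*ey) / (norm(ex)*norm(ey));
disp(r)
```

Without the snap-to-zero step, r would be the correlation between roundoff noise and ey: a finite but meaningless number.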
