Getting NaN when computing partialcorr (no NaNs in data)
Hi, I am using partialcorr on several data series, and it sometimes returns NaN. Why is that? I am sure there are no NaNs and no missing or empty entries in my data. Sometimes using partialcorr([x y], 'rows','complete') helps, but it does not always fix the problem. Thanks for your help.
4 Comments
dpb
on 10 Oct 2022
Edited: dpb
on 10 Oct 2022
tF=readtable(websave('Test_data.txt','https://www.mathworks.com/matlabcentral/answers/uploaded_files/125764/Test_data.txt'));
partialcorr([tF.flower_date,tF.cum_temp],[tF.Var1,tF.Var2])
fitlm(tF,'predictorVars',{'cum_temp','Var1','Var2'},'ResponseVar','flower_date','intercept',true)
So partialcorr isn't lying to us; let's see what's going on between the independent variables themselves...
corrcoef([tF.cum_temp,tF.Var1,tF.Var2])
OK, none of those are identically 1. Although cum_temp is very highly correlated with Var1, and Var1 and Var2 are fairly highly correlated with each other, no pair is perfectly correlated. So the conclusion has to be that cum_temp is a linear combination of the other two. Let's check that next:
fitlm(tF,'predictorVars',{'Var1','Var2'},'ResponseVar','cum_temp','intercept',true)
That last fit shows that cum_temp is predicted exactly by a linear combination of Var1 and Var2, which leads to the NaN results seen earlier.
This probably means that Var1 and Var2 are derived rather than observed variables. That may cast doubt on the rest of the prior analyses as well, depending on just how those auxiliary variables were defined and on what prevented the same result from appearing in the other cases.
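The check above can be reproduced without the original file. The following is a minimal sketch on synthetic data: the variable names (z1, z2, x, y) and the coefficients are made up for illustration and are not taken from Test_data.txt. It uses only core MATLAB functions (rank and the backslash operator) to test whether one variable is an exact linear combination of the controls, even though no pairwise correlation equals 1.

```matlab
% Synthetic illustration (hypothetical data, not the poster's file).
rng(0);                      % reproducible random numbers
n  = 50;
z1 = randn(n,1);             % first control variable
z2 = randn(n,1);             % second control variable
x  = 2*z1 - 3*z2 + 1;        % exact linear combination of the controls plus a constant
y  = randn(n,1);             % unrelated response

Z = [ones(n,1) z1 z2];       % design matrix with intercept column

% If appending x does not raise the rank, x lies in the column space of Z.
rankZ  = rank(Z);
rankXZ = rank([Z x]);

% Residual of x after least-squares regression on Z, relative to the spread of x.
resX   = x - Z*(Z\x);
relRes = norm(resX) / norm(x - mean(x));

% Pairwise correlations alone do not reveal the dependence: none of the
% off-diagonal entries needs to be exactly 1.
R = corrcoef([x z1 z2]);

fprintf('rank(Z) = %d, rank([Z x]) = %d, relative residual = %g\n', ...
        rankZ, rankXZ, relRes);

% With the Statistics and Machine Learning Toolbox installed, the matching
% call would be partialcorr(x, y, [z1 z2]).
```

An unchanged rank together with a relative residual at roundoff level indicates that the variable carries no information beyond the controls, which is the situation described in this thread as producing NaN.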
Answers (1)
Adam Danz
on 4 May 2021
See similar question: getting a NaN in correlation coefficient
The same basic problem is happening with the partial correlation.
When correlating variable X with variable Y while controlling for variable Z, X may be predicted exactly by Z, so the residuals of X after regressing on Z are 0 or very close to 0. To avoid returning a spurious correlation caused by floating-point roundoff error, the partialcorr function detects residuals that are close to 0 and sets them to 0. The partial correlation is the correlation between the two sets of residuals, so if you look at the equation in the wiki article, it is clear why NaN is returned in those cases: both the numerator and the denominator are 0, and 0/0 = NaN.
The partialcorr.m file contains valuable comments by its authors explaining this, just above the lines of code that compute the correlation coefficients (R2021a).
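The 0/0 mechanism described above can be sketched with core MATLAB only. This is not the code in partialcorr.m: the data are synthetic, and the threshold tol is a hypothetical choice made for illustration. The sketch regresses X and Y on Z, zeroes the X residuals when they are at roundoff level, and then correlates the residuals.

```matlab
% Residual-based partial correlation on synthetic data (illustrative only).
rng(1);                      % reproducible random numbers
n = 40;
z = randn(n,1);              % control variable Z
x = 3*z + 2;                 % X is predicted exactly by Z
y = z + randn(n,1);          % Y depends on Z plus noise

Z  = [ones(n,1) z];          % design matrix with intercept column
ex = x - Z*(Z\x);            % residuals of X given Z (roundoff-level values)
ey = y - Z*(Z\y);            % residuals of Y given Z

% Hypothetical threshold: treat residuals that are tiny relative to the
% spread of x as exactly zero, so roundoff cannot yield a spurious value.
tol = sqrt(eps) * norm(x - mean(x));
if norm(ex) < tol
    ex(:) = 0;
end

% Correlation of the residuals; with ex all zero this evaluates 0/0.
r = sum(ex .* ey) / sqrt(sum(ex.^2) * sum(ey.^2));

disp(r)                      % NaN
```

Without the thresholding step, the same expression would divide one roundoff-sized number by another and could return an arbitrary value between -1 and 1, which is the spurious correlation the answer refers to.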