# Correlation and regression between matrixes with NaN values

Carlotta Dentico on 5 Sep 2023
Commented: dpb on 6 Sep 2023
Hello!
I want to calculate the regression and correlation coefficients between two matrices (temperature and sea level pressure) that have the same dimensions, 241 x 81, but contain some NaN values.
The final goal is a two-dimensional matrix that I can plot (see attached image), i.e. for every point on the map I have a value for my correlation and regression coefficients.
Thank you a lot!
##### 3 Comments
MarKf on 5 Sep 2023
Edited: MarKf on 5 Sep 2023
Maybe there is no need for data. If the 2 matrices are 2D and have the same dimensions then they can be correlated even if they have NaNs. However, you wouldn't obtain 241 x 81 values; the result would not be a matrix of the same size. Unless you cross-correlate, the correlations would give you a vector of rhos or R-squares (depending on what kind of correlation) with either 81 or 241 entries (depending on how you correlate), or fewer (depending on missing values and what you decide to do with those). Cross-correlating will not give you a matrix corresponding to the same 2D locations, so I'm guessing that's not what you want. So maybe having the data can help us understand.
As for the NaNs, you have a few options, like using the 'rows','complete' name-value pair to ignore rows with NaN values, which is likely what you need ( R = corr(A,B, 'rows','complete') ).
Carlotta Dentico on 5 Sep 2023
Here's the data :)

MarKf on 5 Sep 2023
I see, "array1" has some islands of values in a sea of NaNs.
ar1 = load('array1'); ar2 = load('array2'); % the two attached MAT-files
a1 = ar1.array3; a2 = ar2.d;
ar1_0s = a1; ar1_0s(isnan(ar1_0s)) = 0; imagesc(ar1_0s*10^2 + a2); % here to visualize what I mean
So you have only sum(sum(~isnan(a1))) = 1719 non-NaN values to correlate. You cannot do a map with 2D locations of those islands, as I mention in the comment above, unless you have a couple of vectors for each of those locations you want to correlate. I just thought that you could also use normxcorr2, but again that's probably not what you want given that these are geo/meteorological data.
You could still correlate the values for each location that you have, that is a1(:) and a2(:) (converting each input into its vector representation), corrcoef does that automatically:
corrcoef(a1,a2, 'rows','complete')
ans = 2×2
    1.0000    0.5341
    0.5341    1.0000
To get rho = 0.5341
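To make explicit which points enter that number, here is a minimal sketch on made-up data (the small `a1` and `a2` below are hypothetical stand-ins, not the attached arrays). It masks the positions that are finite in both inputs, computes Pearson's rho by hand, and compares it with what `corrcoef` returns with 'rows','complete'.

```matlab
% Hypothetical 3x3 stand-ins for the attached 241x81 arrays
a1 = [NaN 2 3; 4 NaN 6; 7 8 10];
a2 = [1 2 2; 5 5 NaN; 6 9 9];

% Keep only the positions that are finite in BOTH matrices
ok = isfinite(a1) & isfinite(a2);
x = a1(ok);                 % logical indexing returns column vectors
y = a2(ok);

% Pearson correlation computed by hand on the complete pairs
xc = x - mean(x);
yc = y - mean(y);
rhoManual = sum(xc .* yc) / sqrt(sum(xc.^2) * sum(yc.^2));

% corrcoef vectorizes matrix inputs and drops the incomplete pairs itself
R = corrcoef(a1, a2, 'rows', 'complete');
rhoBuiltin = R(1,2);
```

The two values should agree to rounding error, which confirms that the result is a single coefficient over all valid grid points rather than a map.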
dpb on 5 Sep 2023
sum(sum(~isnan(a1)))
-->
nnz(isfinite(a1))
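A small sketch of the difference between the two counts, using a hypothetical matrix `M`: they agree when the only non-finite entries are NaN, but `isfinite` also excludes `Inf`.

```matlab
% Hypothetical matrix with NaN and Inf entries
M = [1 NaN Inf; NaN 4 5];

nNotNaN = sum(sum(~isnan(M)));   % counts 1, Inf, 4, 5  -> 4
nFinite = nnz(isfinite(M));      % counts 1, 4, 5       -> 3
nNotNaN2 = nnz(~isnan(M));       % same as the double sum, in one call
```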

dpb on 5 Sep 2023
Edited: dpb on 5 Sep 2023
"...regression and correlation coefficent between two matrixes (temperature and sea level pressure), ... to have a .... for every point in the map .. value for my correlations and regression coefficents"
whos -file array1
  Name          Size             Bytes  Class     Attributes

  array3      241x81            156168  double
whos -file array2
  Name        Size             Bytes  Class     Attributes

  d         241x81            156168  double
array3(1:5,1:5)
ans = 5×5
   NaN   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN   NaN
   NaN   NaN   NaN   NaN   NaN
d(1:5,1:5)
ans = 5×5
   1.0e+05 *
    1.0038    1.0039    1.0040    1.0041    1.0041
    1.0038    1.0039    1.0040    1.0041    1.0041
    1.0038    1.0039    1.0039    1.0041    1.0041
    1.0038    1.0039    1.0039    1.0040    1.0040
    1.0038    1.0039    1.0039    1.0040    1.0040
Pretty meaningless variable names, one presumes the 10E5 must be P and the other by elimination T?
However, for each point in the 2D array there is only one value each for T, P, so there is no "regression" or "correlation" of the two on a pointwise basis. You can look at the overall correlation between the two variables, but there's nothing to regress against or compare pointwise.
[r,p]=corrcoef(d,array3,'rows','complete')
r = 2×2
    1.0000    0.5341
    0.5341    1.0000
p = 2×2
    1.0000    0.0000
    0.0000    1.0000
gives the overall correlation between the two arrays for the locations that are finite in both at the same positions; that's about all there is to be gained from these data in that regard.
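If an overall regression coefficient is wanted alongside the overall correlation, one option (an assumption on my part, not something computed in this thread) is a least-squares line fitted to the same complete pairs. The sketch below uses hypothetical `P` and `T` values standing in for `d` and `array3`, and centres the predictor because pressures near 1e5 make the raw fit poorly conditioned.

```matlab
% Hypothetical stand-ins for d (pressure) and array3 (temperature)
P = [1.0038 1.0039 1.0040; 1.0041 NaN 1.0043] * 1e5;
T = [NaN 10.2 10.4; 10.9 11.0 11.3];

% Use only the locations that are finite in both arrays
ok = isfinite(P) & isfinite(T);
p = P(ok);
t = T(ok);

% First-order fit of T on centred P; coef(1) is the slope, coef(2) the
% intercept at the mean pressure
coef = polyfit(p - mean(p), t, 1);
slope = coef(1);

% Consistency check: the slope rescaled by the standard deviations is r
R = corrcoef(p, t);
rFromSlope = slope * std(p) / std(t);
```

Like the correlation, this is a single number for the whole field, not one value per grid point.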
What might be interesting would be
scatter(d,array3)
Indeed...there are some pretty clear correlations amongst given sets of data; the various columns are heavily correlated in having a definite set of trends but it is the relationship from one observation to another that is correlated, not that the two variables are highly (linearly) correlated.
Wonder how many columns contain at least one observation...
nnz(any(isfinite(array3)))
ans = 41
So, 41 out of the 81 columns have at least one observation so there are 41 separate traces above...
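To look at those traces one at a time, a possible sketch is to pick out the columns that contain observations and correlate each of them separately. The `T` and `P` arrays below are small hypothetical stand-ins for `array3` and `d`, and the minimum of three points per column is an arbitrary choice.

```matlab
% Hypothetical 6x5 fields: T is mostly NaN, P is complete
T = nan(6, 5);
P = 1e5 + reshape(1:30, 6, 5);
T(2:5, 2) = [10; 11; 12; 13];
T(1:3, 4) = [8; 9; 10];

% Columns holding at least one observation (same test as above)
cols = find(any(isfinite(T), 1));

% One correlation coefficient per observed column
rho = nan(size(cols));
for k = 1:numel(cols)
    c = cols(k);
    ok = isfinite(T(:,c)) & isfinite(P(:,c));
    if nnz(ok) >= 3                      % arbitrary minimum sample size
        R = corrcoef(P(ok,c), T(ok,c));
        rho(k) = R(1,2);
    end
end
```

On the real data this would give up to 41 coefficients, one per trace, which can differ a lot from the pooled value of 0.5341.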
What this means, I dunno, but it is pretty interesting -- and it indicates that the overall correlation coefficient doesn't really tell you much and is probably of no practical value.
Carlotta Dentico on 6 Sep 2023
Actually there is a very useful toolbox, called Climate Data Toolbox, with which you can calculate the correlation between a time series and a 3D dataset (maybe you already knew it).
It is very useful especially for people working with climate, oceanographic data.
And indeed using the corr3 function I got the same map :)
dpb on 6 Sep 2023
"instead of comparing the two matrixes i need to callculate the correlation coefficent between the pressure matrix and the time series of temperature."
As the other respondent noted, you would have to have multiple arrays at differing times to do that which I suppose you probably do have.
But the correlations above are by position and probably just reflect the changing depth as one traverses the latitude. Also, you haven't told us which temperature is actually measured, nor precisely what the pressure measurement is a pressure of...if it's atmospheric pressure at sea level, then it's going to be greatly influenced by what else is going on in the global weather patterns at the time.
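For reference, a pointwise correlation map of the kind discussed here can be sketched in base MATLAB once there is a stack of maps over time. Everything below is hypothetical: `P3` is an assumed rows x columns x time pressure array, `ts` an assumed temperature time series of the same length, and the loop is only an illustration of the idea, not the Climate Data Toolbox implementation of `corr3`.

```matlab
% Hypothetical data: nt pressure maps and one temperature time series
rng(1);
[nr, nc, nt] = deal(4, 3, 50);
ts = randn(nt, 1);
P3 = randn(nr, nc, nt);
P3(1,1,:) = 2*reshape(ts, 1, 1, nt) + 5;  % one cell tied exactly to ts
P3(2,2,10) = NaN;                         % a single missing sample
P3(4,3,:) = NaN;                          % a cell with no data at all

% Rearrange so that each column holds the time series of one grid cell
X = reshape(P3, nr*nc, nt).';             % nt-by-(nr*nc)

% Correlate every grid cell with ts, skipping missing samples
rmap = nan(nr, nc);
for k = 1:nr*nc
    ok = isfinite(X(:,k)) & isfinite(ts);
    if nnz(ok) >= 3                       % arbitrary minimum sample size
        R = corrcoef(X(ok,k), ts(ok));
        rmap(k) = R(1,2);                 % linear index k matches cell k
    end
end
```

Here `rmap` has the same size as one map, so it can be passed to `imagesc` or `pcolor`; cells without enough data stay NaN.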