cross correlation using 'xcorr' in the presence of NaN or missing values
Hi, I am trying to calculate the cross-correlation of two time series at different lags, but my data have a lot of NaN values. When I calculate the cross-correlation as below, corln comes back as all NaNs.
[corln, lags] = xcorr(ave_precp_india(:), aod_all(:, 1), 15);
I want to specify something like 'rows', 'pairwise' in calculating correlation so that NaNs are ignored. How can I specify 'rows', 'pairwise' option in xcorr?
Answers (2)
Adam Danz
on 15 Apr 2020
Edited: Adam Danz
on 15 Apr 2020
There isn't a simple solution to this problem. If you have a single NaN value within the window, the correlation for that window will be NaN.
Depending on how many missing values are in the data and how far they are spread apart, you may be able to work around the problem.
If there are relatively few missing values and they are spread apart, you could fill in the NaN values by interpolation or with MATLAB's fillmissing() function, but you must do so in a responsible and meaningful way. Merely getting rid of the NaN values does not mean the solution is a good one. After filling the missing values, plot the data and make sure the updated values make sense and are reasonable.
If the NaN values are clustered together, interpolation and fillmissing() won't be reasonable solutions. You may have to analyze the data in chunks but even that has problems since the number of data points within the window becomes smaller at the beginning and end of each chunk of data.
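For the first case (a few isolated gaps), a minimal sketch of the fill-then-correlate idea could look like the following. The series here are synthetic and purely illustrative, linear interpolation is only one of the methods fillmissing() offers, and subtracting the means before calling xcorr (Signal Processing Toolbox) is an assumption made so that the 'coeff' output behaves roughly like a correlation coefficient.

```matlab
% Synthetic example data (hypothetical, not the poster's series)
rng(0);
n = 200;
x = sin(2*pi*(1:n)'/25) + 0.1*randn(n,1);
y = circshift(x, 3);            % y is x delayed by 3 samples
x([20 75 140]) = NaN;           % a few isolated gaps
y([33 101])    = NaN;

% Fill isolated gaps by linear interpolation; use nearest value at the ends
xf = fillmissing(x, 'linear', 'EndValues', 'nearest');
yf = fillmissing(y, 'linear', 'EndValues', 'nearest');

% Always inspect the filled series before trusting the result
% plot(1:n, xf, '-', 1:n, x, 'o');

% Remove the means, then compute the normalized cross-correlation
[c, lags] = xcorr(xf - mean(xf), yf - mean(yf), 15, 'coeff');
[~, k] = max(c);
bestLag = lags(k);              % lag at which the correlation peaks
```

With xcorr's sign convention, a delay of y relative to x shows up as a peak at a negative lag.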
2 Comments
Brian Sweis
on 15 Apr 2020
Thanks Adam, I'm realizing this is certainly not so simple, as I've been browsing the web all day learning about other people's situations with similar issues. I appreciate your (recent) commentary as a lot of people's comments online are from pretty old posts.
To make it a little more complicated, I'm trying to do this in 2 dimensions much like xcorr2, which also doesn't have an obvious way to spit out the normalized correlation coefficient like xcorr can.
This function normxcorr2_general seems to work a bit more flexibly than MATLAB's built-in normxcorr2.
And this function nanxcov seems to try to handle NaNs in a way xcorr doesn't and then normalizes with means removed.
I reached out to the author of the normxcorr2_general code above, Dirk Padfield, who just today pointed me in the direction of this paper with corresponding code:
"Masked object registration in the Fourier domain" on how to quickly compute masked correlation that you can find along with the code at http://www.dirkpadfield.com/papers. This approach/code enables you to specify a mask with any arbitrary pixels turned on that you want, and all pixels that are turned off will be ignored in the computation. The code does not include a "maxLag" parameter but because the computation is fast you can crop the output to what you need after it processes all lags in multiple dimensions.
Adam Danz
on 15 Apr 2020
Thanks for sharing what you've found. It will likely be useful to future visitors here.
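The masked-correlation idea mentioned in the comments above can be illustrated in one dimension. The following is only a sketch in that spirit, not the code from the paper: it zero-fills the gaps, builds validity masks, and uses xcorr (Signal Processing Toolbox) to accumulate the per-lag sums needed for a Pearson coefficient over pairs where both samples are present. The data, the variable names, and the minimum of 3 pairs per lag are all assumptions for illustration.

```matlab
% Hypothetical data: y is x delayed by 4 samples, both with gaps
rng(1);
n = 300;
x = randn(n,1);
y = [zeros(4,1); x(1:end-4)] + 0.2*randn(n,1);
x(50:70) = NaN;
y([10 120:135 250]) = NaN;
maxLag = 15;

mx = double(~isnan(x));  my = double(~isnan(y));   % validity masks
x0 = x;  x0(isnan(x)) = 0;     % zero-fill so gaps add nothing to the sums
y0 = y;  y0(isnan(y)) = 0;

% Per-lag sums over pairs where both samples are valid
[Sxy, lags] = xcorr(x0, y0, maxLag);         % sum of x.*y
N   = round(xcorr(mx, my, maxLag));          % number of valid pairs
Sx  = xcorr(x0,    my,    maxLag);           % sum of x
Sy  = xcorr(mx,    y0,    maxLag);           % sum of y
Sxx = xcorr(x0.^2, my,    maxLag);           % sum of x.^2
Syy = xcorr(mx,    y0.^2, maxLag);           % sum of y.^2

% Pearson correlation at each lag from the accumulated sums
covxy = Sxy - Sx.*Sy./N;
varx  = max(Sxx - Sx.^2./N, 0);              % clip tiny negative round-off
vary  = max(Syy - Sy.^2./N, 0);
r = covxy ./ sqrt(varx .* vary);
r(N < 3) = NaN;                              % too few pairs to be meaningful
```

Here r(lags == m) should match what a pairwise-complete correlation of the two series shifted by m would give, and N shows how many pairs each value rests on.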
Marco Sandoval Belmar
on 22 May 2021
Edited: Marco Sandoval Belmar
on 22 May 2021
Hi,
I agree with the comment above: there is no straightforward way to deal with this. However, I have code that computes the ordinary correlation with the 'rows','complete' option of 'corr' and then shifts the time series manually for each lag. I have noticed that this produces some artificial "wiggles" in a lag vs. correlation graph, which I assume come from the NaNs, for the reason explained in the comment above. So, something like:
function [R,L,pvalue] = nanxcorr_ms(s1,s2,Lag)
% [R,L,pvalue] = nanxcorr_ms(s1,s2,Lag)
% Function that allows obtaining the cross-correlation
% of a pair of time series containing gaps
%
% Input:
%   s1   time series [vector]
%   s2   time series [vector, same length as s1]
%   Lag  number of lags to correlate (ex. 20)
%
% Output:
%   R       correlation coefficient at each lag
%   L       lag
%   pvalue  p-value of each correlation
%
% sam 04/16/2013
% Marco Sandoval Belmar 4/1/2018
s1 = s1(:); s2 = s2(:); % force columns so row or column input both work
[r,p] = corr(s1,s2,'rows','complete'); % correlation at lag == 0
% Performs the correlation for the different lags
L = 0; R = r; pvalue = p;
for i1 = 1:Lag
    % negative lag: early part of s1 against later part of s2
    s11 = s1(1:end-i1);
    s21 = s2(i1+1:end);
    [c,pp] = corr(s11,s21,'rows','complete');
    R = [c;R];
    pvalue = [pp;pvalue];
    L = [-i1;L];
    % positive lag: later part of s1 against early part of s2
    s21 = s2(1:end-i1);
    s11 = s1(i1+1:end);
    [c,pp] = corr(s11,s21,'rows','complete');
    R = [R;c];
    pvalue = [pvalue;pp];
    L = [L;i1];
end
end
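One way to investigate the "wiggles" is to count how many complete pairs each lag actually uses, since 'rows','complete' drops a different set of samples at every shift. The following is a small sketch under that assumption, using made-up series and base MATLAB only, with the same lag sign convention as the function above.

```matlab
% Hypothetical example series with gaps (column vectors)
s1 = (1:30)';     s1([4 5 6 17]) = NaN;
s2 = (30:-1:1)';  s2([10 22 23]) = NaN;
maxLag = 5;

lags   = (-maxLag:maxLag)';
nPairs = zeros(size(lags));
for k = 1:numel(lags)
    m = lags(k);
    if m >= 0
        a = s1(1+m:end);  b = s2(1:end-m);   % positive lag
    else
        a = s1(1:end+m);  b = s2(1-m:end);   % negative lag
    end
    % pairs that survive 'rows','complete' at this lag
    nPairs(k) = sum(~isnan(a) & ~isnan(b));
end
```

Plotting nPairs against lags next to the correlation curve makes it easier to see whether a wiggle coincides with a lag that rests on noticeably fewer pairs.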
1 Comment
Vasilisa Iatckova
on 26 Aug 2024
With small edits, assuming the inputs are column vectors:
function [R,L,pvalue] = nanxcorr(s1,s2,Lag)
% [R,L,pvalue] = nanxcorr(s1,s2,Lag)
% Function that allows obtaining the cross-correlation
% of a pair of time series containing gaps
%
% Input:
%   s1   time series [column vector]
%   s2   time series [column vector, same length as s1]
%   Lag  number of lags to correlate (ex. 20)
%
% Output:
%   R       correlation coefficient at each lag
%   L       lag
%   pvalue  p-value of each correlation
%
% sam 04/16/2013
% Marco Sandoval Belmar 4/1/2018
% Vasilisa Iatckova 8/26/2024
[r,p] = corr(s1,s2,'rows','complete'); % correlation at lag == 0
% Performs the correlation for the different lags
L = 0; R = r; pvalue = p;
for i1 = 1:Lag
    % negative lag: early part of s1 against later part of s2
    s11 = s1(1:end-i1);
    s21 = s2(i1+1:end);
    [c,pp] = corr(s11,s21,'rows','complete');
    R = [c;R];
    pvalue = [pp;pvalue];
    L = [-i1;L];
    % positive lag: later part of s1 against early part of s2
    s21 = s2(1:end-i1);
    s11 = s1(i1+1:end);
    [c,pp] = corr(s11,s21,'rows','complete');
    R = [R;c];
    pvalue = [pvalue;pp];
    L = [L;i1];
end
end