Correlation computation using a window of 3

Hello
I have two columns, x and y:
x = 5, 8, 9, 4, 9, 6, 0, 7, 8, 5, 4
y = 6, 4, 8, 7, 3, 7, 8, 7, 6, 4, 7
I want to compute the correlation over windows of size 3.
For instance, the first window gives the correlation for x = 5, 8, 9 and y = 6, 4, 8.
If the last window has fewer than 3 numbers, the correlation of the numbers present should be computed. In this case, that is the correlation for x = 5, 4 and y = 4, 7.
The result should be a new column of correlations between x and y with 4 rows.
I need a single value for each correlation, but the corrcoef function is giving me matrices.
Thanks for your help in advance.

Accepted Answer

Dana on 17 Aug 2020
I don't entirely understand what you're trying to do, but you may want to use corr(a,b) instead of corrcoef(a,b) or corrcoef(C).
For a matrix C, corrcoef(C) returns a correlation matrix, i.e., a matrix whose (i,j) element is the correlation coefficient between the i-th and j-th columns of C. For column vectors a and b, the syntax corrcoef(a,b) is the same thing as corrcoef([a,b]) (i.e., MATLAB just puts the two vectors together into a single matrix, and then finds the correlation matrix).
On the other hand, corr(a,b) simply returns the correlation coefficient between the vectors a and b. Note, however, that corr([a,b]) = corrcoef(a,b) = corrcoef([a,b]), i.e., that syntax will also return the correlation matrix. So if you just want the one correlation coefficient, you need to use corr(a,b).
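As a small illustration of the difference, here is a sketch using the first window from the question. The variable names R, r and rc are only for this example. Note that corr ships with the Statistics and Machine Learning Toolbox, so if that toolbox is not available, picking the off-diagonal element of the corrcoef result is one way to get the same single number.

```matlab
a = [5; 8; 9];          % first window of x, as a column vector
b = [6; 4; 8];          % first window of y, as a column vector

R = corrcoef(a, b);     % 2x2 correlation matrix with ones on the diagonal
r = R(1,2);             % off-diagonal element: correlation between a and b

rc = corr(a, b);        % scalar; requires Statistics and Machine Learning Toolbox
```

Here r and rc should agree up to floating-point rounding.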
Dana on 17 Aug 2020
Edited: Dana on 17 Aug 2020
I see now what you're trying to do. There are any number of different approaches you could take. Here's one:
x = [5, 8, 9, 4, 9, 6, 0, 7, 8, 5, 4];
y = [6, 4, 8, 7, 3, 7, 8, 7, 6, 4, 7];
winsz = 3;                % window size
xy = [x; y];              % combine data
nxy = size(xy,2);         % number of observations
ngr = ceil(nxy/winsz);    % number of groups of size winsz
% We will pad the data with extra elements so that the total # of elements
% is evenly divisible by winsz; pdsz is the size of the padded array
pdsz = ngr*winsz;
xy(:,nxy+1:pdsz) = NaN;   % pad to desired size with NaN
% Reshape into a 3-D array, where 1st and 2nd row correspond to x and y,
% columns to winsz observations, and the 3rd dimension to different
% groupings of size winsz
xy = reshape(xy,2,winsz,ngr);
% dv is a 1x1xngr array whose j-th element will be the number of observations
% in the j-th group; this will be equal to winsz in all but the last group
dv = winsz*ones(1,1,ngr);
dv(ngr) = winsz-(pdsz-nxy);
xymeans = sum(xy,2,'omitnan')./dv;            % compute means of x and y for each group
xyc = xy - xymeans;                           % de-mean the observations
xystds = sqrt(sum(xyc.^2,2,'omitnan')./dv);   % compute s.d.'s of x and y for each group
xycovs = sum(prod(xyc,1,'omitnan'),2)./dv;    % compute covariances of x and y for each group
% Get correlation coefficients, and then reshape 3-D result to a row vector
xycorr = reshape(xycovs./prod(xystds,1),1,ngr);
EDIT to say: the above uses the sample mean from each group of 3 as the mean estimate for that group. This is what would be done if you just ran a loop and called the corr function for each grouping of 3. You could substitute some other mean estimate if you wanted, though, e.g., use the same mean from the entire vectors x and y for each group. To do that, you'd instead use xyc = xy-mean([x;y],2).
Also, in hindsight, it's probably an easier option to just run a loop here. That would be noticeably slower for large arrays, but in this case it won't make an appreciable difference. So:
x = [5, 8, 9, 4, 9, 6, 0, 7, 8, 5, 4].';
y = [6, 4, 8, 7, 3, 7, 8, 7, 6, 4, 7].';
winsz = 3;               % window size
nxy = numel(x);          % number of observations
ngr = ceil(nxy/winsz);   % number of groups of size winsz
xycorr = zeros(ngr,1);
for j = 1:ngr
    indsj = ((j-1)*winsz+1:min(j*winsz,nxy)).';   % indices of the j-th group
    xycorr(j) = corr(x(indsj),y(indsj));
end
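As a last note, this loop method delivers the same answer as the vectorized method above, except in the last group. That last group has only two observations, and in that scenario you need to be more careful in calculating the correlation coefficient. In particular, +/- 1 are the only possible correlations when you have only two observations, and the vectorized method won't give you that answer. The reason is the NaN padding: the padded column of the last group is all NaN, so prod(xyc,1,'omitnan') evaluates to 1 for that column rather than skipping it, and that extra 1 gets added into the covariance sum.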
Furthermore, if you were to apply either of these methods in a situation where the last group has only 1 observation, it's not going to work at all.
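For reference, here is a sketch of one way the vectorized version could be adjusted so that it agrees with the loop in the last group. This is a variation on the code above, not a tested general-purpose routine. The changes are my own assumptions about what is wanted: the covariance multiplies the x and y deviations explicitly, so the padded NaN entries are dropped by sum(...,'omitnan') instead of turning into a product of 1, and any group with fewer than two observations is set to NaN explicitly. It relies on implicit expansion (R2016b or later).

```matlab
x = [5, 8, 9, 4, 9, 6, 0, 7, 8, 5, 4];
y = [6, 4, 8, 7, 3, 7, 8, 7, 6, 4, 7];
winsz = 3;                                   % window size
xy = [x; y];
nxy = size(xy,2);                            % number of observations
ngr = ceil(nxy/winsz);                       % number of groups
pdsz = ngr*winsz;                            % padded length
xy(:,nxy+1:pdsz) = NaN;                      % pad with NaN
xy = reshape(xy,2,winsz,ngr);                % 2 x winsz x ngr
dv = winsz*ones(1,1,ngr);                    % observations per group
dv(ngr) = winsz-(pdsz-nxy);
xyc = xy - sum(xy,2,'omitnan')./dv;          % de-mean within each group
xystds = sqrt(sum(xyc.^2,2,'omitnan')./dv);  % s.d. of x and y per group
% Multiply x and y deviations element-wise; padded entries stay NaN here
% and are then skipped by 'omitnan' in the sum
xycovs = sum(xyc(1,:,:).*xyc(2,:,:),2,'omitnan')./dv;
xycorr = reshape(xycovs./prod(xystds,1),1,ngr);
% Assumption: a correlation is undefined with fewer than 2 observations
xycorr(reshape(dv,1,ngr) < 2) = NaN;
```

With the data from the question, the last element of xycorr comes out as -1, matching what the loop gives for the two-observation group.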
Tino on 18 Aug 2020
Thank you very much, I am really grateful.
