Why does repeating pca on randomized data switch the order of the points? Compare the test case and the Monte-Carlo case.

Question

dodgeball on 12 Nov 2021

https://se.mathworks.com/matlabcentral/answers/1584739-why-does-repeating-pca-on-randomized-data-switch-the-order-of-the-points-compare-the-test-case-and

Edited: the cyclist on 12 Nov 2021

Dear Help,

I would like to repeat a PCA analysis many times using slightly randomized data. The idea is to produce a "cloud" of points in PCA space that surrounds the real data. I use the simple MATLAB functions below to code this, but I find that the red and black points intersperse. Am I misunderstanding something about the pca output?

-Dodgeball

% Note: The first three columns of X correspond to the "black" data category, whereas the last three columns correspond to "red" category.
X= [21.3082,   20.4909,   21.7057,   27.0204,   26.8216,   26.8253;
    24.7031, 24.7590, 24.7928, 18.7839, 18.6898, 21.7545;
    27.1607, 27.2170, 26.7247, 25.1205, 20.8257, 21.5048;
    26.8413, 26.6575, 26.8508, 21.2030, 20.9727, 23.0522;
    26.2660, 26.0913, 26.4202, 20.9011, 21.6699, 20.8864;
    26.7285, 26.5326, 26.7244, 20.9167, 22.2356, 22.0653;
    26.2849, 26.3539, 26.4534, 21.3777, 21.7901, 21.4655;
    27.1331, 27.1494, 27.1535, 21.3922, 22.9945, 22.9521;
    26.3820, 26.4615, 26.6303, 21.2554, 22.3799, 21.8397;
    21.7944, 26.0630, 26.1678, 28.0560, 28.8596, 28.0929;
    21.3088, 21.1267, 21.8997, 26.0820, 25.3791, 25.9918;
    21.1613, 22.0565, 21.2904, 25.7505, 25.8322, 25.7237;
    20.7289, 21.1625, 21.1503, 24.6710, 24.8744, 24.9068;
    24.7395, 24.7855, 24.8222, 21.3117, 20.7364, 21.6248;
    23.2372, 23.4656, 21.3302, 25.8519, 25.9230, 25.9140];
% test case
[coeff,score,latent,tsquared,explained] = pca(X');
plot(score(1:3,1), score(1:3,2),'.k');
hold on;
plot(score(4:6,1), score(4:6,2),'.r');
% end test case
% begin Monte-Carlo case
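HOBART = X;
% Note: these labels are the reverse of the test case above, where columns 1:3 are plotted black and columns 4:6 red.
HOBART_red = HOBART(:,1:3);
HOBART_black = HOBART(:,4:6);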
cov_red = cov(HOBART_red');     % covariance matrix 15x15
cov_black = cov(HOBART_black'); % covariance matrix 15x15
mu_red = mean(HOBART_red');     % mu 1x15
mu_black = mean(HOBART_black'); % mu 1x15
nDRAWS_MV = 1000;
XDATA_red_MV = zeros(3*nDRAWS_MV,1);
YDATA_red_MV = zeros(3*nDRAWS_MV,1);
XDATA_black_MV = zeros(3*nDRAWS_MV,1);
YDATA_black_MV = zeros(3*nDRAWS_MV,1);
cankick = 0;
avgexplained1 = zeros(nDRAWS_MV,1);
avgexplained2 = zeros(nDRAWS_MV,1);
for i = 1:nDRAWS_MV
    % Draw 3 synthetic samples per category; after the transpose below, rows 1:3 are "red" and rows 4:6 are "black"
    sample_HUGEmatrixC_MV = [mvnrnd(mu_red,cov_red,3)', mvnrnd(mu_black,cov_black,3)'];

    % [coeff,score,latent,tsquared,explained] = pca(sample_HUGEmatrixC_MV','VariableWeights','variance');
    [coeff,score,latent,tsquared,explained] = pca(sample_HUGEmatrixC_MV');

    % Rows of the output vectors filled by this draw: (old_cankick+1):cankick
    old_cankick = cankick;
    cankick = cankick+3;

    XDATA_red_MV((old_cankick+1):cankick,1) = score(1:3,1);
    YDATA_red_MV((old_cankick+1):cankick,1) = score(1:3,2);

    XDATA_black_MV((old_cankick+1):cankick,1) = score(4:6,1);
    YDATA_black_MV((old_cankick+1):cankick,1) = score(4:6,2);

    % Percentage of variance explained by the first two components in this draw
    avgexplained1(i,1) = explained(1);
    avgexplained2(i,1) = explained(2);
end
figure; % plot figure of the scatter of the Monte Carlo
hold on;
plot(XDATA_red_MV,YDATA_red_MV,'.r');
plot(XDATA_black_MV,YDATA_black_MV,'.k');
xlabel(strcat('PCA1:',num2str(round(mean(avgexplained1))),'%'))
ylabel(strcat('PCA2:',num2str(round(mean(avgexplained2))),'%'))
% end Monte-Carlo case

Answer 1

the cyclist on 12 Nov 2021

https://se.mathworks.com/matlabcentral/answers/1584739-why-does-repeating-pca-on-randomized-data-switch-the-order-of-the-points-compare-the-test-case-and#answer_829694

Edited: the cyclist on 12 Nov 2021

I have to admit that I did not spend the time to come to a complete understanding of your code.

However, I'm guessing that your unexpected result is explained by the fact that the principal component vectors are the eigenvectors of the covariance matrix, and the negative of an eigenvector is also an eigenvector. Therefore, flipping the sign of any of your PCA components (coefficients and scores together) will give equally valid PCA components.
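The sign ambiguity can be checked numerically. The following is only a minimal sketch on made-up data (the matrix `A` is not the asker's `X`), using base MATLAB `cov` and `eig` rather than `pca`: it verifies that both an eigenvector and its negative satisfy the eigenvector equation, and that the projected scores simply change sign.

```matlab
% Made-up data for illustration only: 5 observations (rows) x 2 variables.
A = [2 1; 3 4; 5 3; 6 7; 8 6];

C = cov(A);                       % 2x2 sample covariance matrix
[V, D] = eig(C);                  % columns of V are eigenvectors of C
[lambda, idx] = max(diag(D));     % largest eigenvalue, i.e. the first PC
v = V(:, idx);

% Both v and -v satisfy the eigenvector equation C*v = lambda*v.
residual_pos = norm(C*v    - lambda*v);
residual_neg = norm(C*(-v) - lambda*(-v));

% Projecting the centered data on -v instead of v negates every score.
Ac = A - mean(A, 1);              % implicit expansion, R2016b or later
score_pos = Ac * v;
score_neg = Ac * (-v);

fprintf('residuals: %.2e, %.2e\n', residual_pos, residual_neg);
disp([score_pos, score_neg]);     % second column is the negative of the first
```

Both residuals should be at rounding-error level, which is why neither sign is "more correct" than the other.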

I'm guessing that the randomness you are inserting is enough that the PCA algorithm sometimes lands on eigenvectors with the opposite sign to those of the original X.

I am not certain about the following, and you should carefully think about this yourself, but I think if you insert the line

score = score*sign(score(1)); % Flip signs of all components in PC space if needed, to ensure first one is positive

after you calculate the PCA, then all your MC black/red distinctions will be retained consistently, and still be valid. (To make it consistent with the original data, force the MC score(1,1) to have the same sign as the original PCA.)
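The one-line fix above applies a single sign to every component. Because each component's sign is independent of the others, a per-component variant may be more robust. The sketch below is an assumption-laden illustration, not a tested drop-in: `coeff_ref`, `coeff_mc`, `score_mc`, and `flip` are hypothetical names, and the small matrices are made up to stand in for the outputs of a reference `pca` call and one Monte-Carlo `pca` call.

```matlab
% Made-up stand-ins: reference loadings, and a re-run whose second
% component came out with the opposite sign.
coeff_ref = [0.6  0.8; 0.8 -0.6];
coeff_mc  = [0.6 -0.8; 0.8  0.6];
score_mc  = [1.0  0.5; -2.0  0.3; 1.0 -0.8];

% One sign per component: +1 if the re-run axis points the same way as the
% reference axis (positive dot product), -1 if it points the opposite way.
flip = sign(sum(coeff_ref .* coeff_mc, 1));
flip(flip == 0) = 1;                 % orthogonal axes: leave unchanged

% Flip loadings and scores together so that score = Xcentered*coeff still holds.
coeff_aligned = coeff_mc .* flip;    % implicit expansion, R2016b or later
score_aligned = score_mc .* flip;

disp(flip);
disp(score_aligned);
```

In the Monte-Carlo loop, this would presumably mean saving `coeff` from the test-case `pca(X')` call as the reference before the loop, then aligning each draw's `coeff` and `score` before storing the scores. Note that this only removes sign flips; the axes of each draw can still rotate relative to the reference when the randomized data differ enough.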

1 Comment

dodgeball on 12 Nov 2021

Thank you so much for your insight! This seems a very plausible explanation. I would be surprised if there is an argument against your approach, but I will think about it further. If you or anyone else on this thread cares to weigh in on this or to comment further on the theory, please feel free to do so!

-Dodgeball
