Dimensional error using PCA
Hello everyone. I have written code that uses a Gaussian correlation kernel to generate 1000 realizations of a stochastic process and then performs PCA on the resulting process. The realizations are stored in a 501*1000 matrix.
However, when I perform PCA on this matrix, the results contradict the documentation at https://la.mathworks.com/help/stats/pca.html
The documentation says that if one supplies an n*p matrix, coeff will be a p*p matrix and score an n*p matrix. Here I get different results: coeff is a p*n matrix and score is p*p. The weird thing is that the process is reconstructed properly. Can anyone tell me what is happening?
Thanks.
Additionally, according to the theory, the coeffs should be standard normal random variables. If I plot the histograms, the resulting variables are normal but not standard. If someone could tell me why they are not standard, I would be very thankful.
The code in question:
close all
clear
clc
[X,Y] = meshgrid(0:0.002:1,0:0.002:1);
Z=exp((-1)*abs(X-Y));
tam=size(X, 1);
number_realizations=1000;
realizacion_mat=zeros(tam, number_realizations);
cov_mat=cov(Z);
[evec_mal, evalM_mal]=eig(cov_mat);
eval_mal=eig(evalM_mal);
num_eval=size(eval_mal,1);
for i=1:num_eval
    eval(i)=eval_mal(num_eval-i+1);
    evec(:,i)=evec_mal(:,num_eval-i+1);
end
figure
hold on
for j=1:number_realizations
    realizacion=zeros(tam, 1);
    for i=1:tam
        v_a = normrnd(0,1);
        realizacion=realizacion+sqrt(eval(i))*evec(:,i)*v_a;
    end
    realizacion_mat(:,j)=realizacion;
    plot(realizacion)
    clear('realizacion')
end
[coeff,score,latent,tsquared,explained,mu] = pca(realizacion_mat,'Centered',false);
reconstruction_process=score*coeff';
diference=reconstruction_process-realizacion_mat;
figure
plot(diference)
for i=1:5
    figure
    histogram(coeff(:,i), 20)
end
Accepted Answer
Jon
on 9 Jul 2019
Edited: Jon
on 9 Jul 2019
The first argument to pca should be n by p, where n is the number of observations and p is the number of variables. You are supplying it with a p by n matrix, so the returned outputs are not dimensioned as you expect. I do not see anything in the MATLAB documentation that discusses the distribution (standard normal) of the coefficients. Maybe this is something specific to your application. In any case, if you supply pca with an array in which each row is an observation, you will be off to a good start.
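As a minimal sketch of the dimension point above: the data here are just random numbers, and the names n_points, n_realizations and realizations are made up for illustration (they are not from the original code). It stores one realization per column, as in the question, and transposes before calling pca so that each row is one observation.

```matlab
% Hypothetical small example: 1000 realizations sampled at 50 points.
rng(0);                                  % fixed seed, only for reproducibility
n_points = 50;                           % number of variables (p)
n_realizations = 1000;                   % number of observations (n)

% One realization per column, i.e. a p-by-n layout as in the question.
realizations = randn(n_points, n_realizations);

% pca expects one observation per row, so pass the transpose (n-by-p).
[coeff, score] = pca(realizations.');

size(coeff)   % p-by-p
size(score)   % n-by-p
```

With the transposed input, the output sizes should match what the documentation describes for an n-by-p matrix.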
I also suggest that you do not use the variable name eval for the eigenvalues. eval is a MATLAB function that evaluates an expression. You did not get an error message because MATLAB assumes you want to use eval as a variable name rather than as a function. At the very least, this is confusing to anyone reading the code who knows what the eval function does, and if at some later point you actually wanted to call eval as a function, you would have problems.
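For example, the eigenvalue reordering could be written without shadowing eval. This is only a sketch on a small illustrative matrix; the names C, lambda and order are my own choices, not anything from the original code.

```matlab
% Small symmetric matrix used only for illustration.
C = [2 1; 1 3];

[V, D] = eig(C);

% Sort the eigenvalues from largest to smallest and reorder the
% eigenvectors to match, using names that do not shadow built-ins.
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);
```

Using sort with its second output also avoids the explicit loop that flips the eigenvalues and eigenvectors one index at a time.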
5 Comments
Jon
on 9 Jul 2019
Hi, I'm not familiar with the theoretical background for your problem, and I have not used principal component analysis in this particular context, so I do not have an immediate answer as to why they are not standard normal variables. I'm sorry, I do not have time to dig deeper, but I would guess that there is a scaling factor somewhere that is not consistent between the two implementations (MATLAB pca and the reference that you are working from).
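One possible candidate for such a scaling factor, offered as an assumption rather than a confirmed diagnosis: if the reference writes the process as a sum of sqrt(eigenvalue) * eigenvector * random variable, as the loop in the question does, then the quantities expected to have unit variance would be the pca scores divided by sqrt(latent), not the raw outputs. The sketch below uses a small exponential kernel directly as the covariance and default centering. All names in it (x, C, lambda, xi, z, k) are hypothetical.

```matlab
rng(1);                                  % fixed seed, only for reproducibility
x = linspace(0, 1, 50).';                % 50 sample points on [0, 1]
C = exp(-abs(x - x.'));                  % exponential kernel used as covariance
                                         % (implicit expansion, R2016b or later)

[V, D] = eig(C);
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);

n_realizations = 2000;
xi = randn(numel(x), n_realizations);    % standard normal variables
realizations = V * (sqrt(lambda) .* xi); % one realization per column

% One observation per row for pca (default centering).
[coeff, score, latent] = pca(realizations.');

% The raw scores have sample variance latent(i); dividing by sqrt(latent)
% rescales the leading components to unit sample variance.
k = 5;
z = score(:, 1:k) ./ sqrt(latent(1:k)).';

var(z)    % unit sample variance, up to rounding
mean(z)   % zero mean, up to rounding
```

Note that the unit variance of z holds by construction of the rescaling, so this only shows where a factor of sqrt(latent) would enter. Whether that matches the convention in your reference is something you would need to check.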