MATLAB SVD & PCA - which singular value belongs to which variable?

Hello everybody,
I'm currently doing a Singular Value Decomposition of a matrix A of size 64800 x 454. (Each of the 454 columns corresponds to a unique timestamp, so these are my variables, and the 64800 rows correspond to values at longitude/latitude grid points.) I would like to do a Principal Component Analysis from this SVD data, but I've run into a problem.
According to my understanding, when computing [U,S,V] = svd(A), the diagonal entries of S correspond to the singular values of A. So if you square these values, you should obtain the eigenvalues of the principal components (PC), and from here you can get the variance each PC accounts for.
However, MATLAB sorts the singular values in descending order, so the largest one comes first, and so on. So how can I figure out which singular value belongs to which PC, and thus which timestamp? The whole point is to see which timestamp is the most 'significant' or interesting.

Answers (2)

JESUS DAVID ARIZA ROYETH on 6 May 2018
Assuming that X is your matrix, with one variable per column:
mu = mean(X); % mean of each column
Xmean = bsxfun(@minus, X, mu); % centered data
[coeff, latent] = svd(cov(Xmean)); % SVD of the covariance matrix
[latent, ind] = sort(diag(latent), 'descend'); % variances in descending order
explained = 100*latent/sum(latent); % percentage of variance explained by each principal component
coeff = coeff(:, ind); % principal component coefficients (each column holds the coefficients of one component)
score = Xmean * coeff; % project the centered data; each column of score is one principal component
plot(score(:,1), score(:,2), 'r*') % plot the first two principal components
title(['First two principal components (', num2str(round(sum(explained(1:2)))), '% variance explained)'])
xlabel('First component')
ylabel('Second component')
To see which timestamp is the most 'significant' or interesting, look at the coefficients in "coeff".
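As a minimal sketch of the same computation, the SVD can also be applied directly to the centered data rather than to its covariance matrix, which avoids forming cov(X). The toy matrix X and the variable names below are illustrative assumptions, not the asker's data. With n observations, the squared singular values divided by n-1 should match the eigenvalues of cov(X).

```matlab
% Deterministic toy data (an assumption for illustration):
% 50 observations (rows) of 4 variables (columns).
t = (1:50)';
X = [sin(t), cos(2*t), 0.1*t, sin(t) + 0.5*cos(3*t)];

n  = size(X, 1);
Xc = bsxfun(@minus, X, mean(X, 1));      % center each column

% Economy-size SVD of the centered data: Xc = U*S*V'
[U, S, V] = svd(Xc, 'econ');
s = diag(S);                             % singular values, already in descending order

latent    = s.^2 / (n - 1);              % variance of each principal component
explained = 100 * latent / sum(latent);  % percentage of total variance per component
coeff     = V;                           % column k holds the loadings of component k
score     = Xc * coeff;                  % component scores (equal to U*S)

% Cross-check against the eigenvalues of the covariance matrix.
evals = sort(eig(cov(X)), 'descend');
disp(max(abs(latent - evals)));          % should be at round-off level
disp(norm(score - U*S, 'fro'));          % should be at round-off level
```

Each column of coeff mixes all of the original variables, so explained(k) describes component k as a whole and not any single column of X.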

John D'Errico on 6 May 2018
A singular value does NOT belong to ANY variable. There is NO correspondence, nor certainly any ownership.
The sort that is done sorts the singular values, and the corresponding singular vectors consistently. But a singular vector defines a LINEAR combination of the original variables.
Could you decide to look at the elements of that first singular vector? You might perhaps decide that because one coefficient in that linear combination is larger than the other coefficients, it designates the variable that had the most importance. Be careful, as that presumes that all of your data were scaled to have unit variance. Otherwise, I could arbitrarily change the units of one of my variables and change the coefficients along with it.
For example, suppose I had two variables, both with units of length. Except that I can choose to express one variable in nanometers, the second variable in kilo-parsecs. If I did that, the weights for those variables would be far different than if I had swapped that choice of units in reverse. The problem is the same in either case, but you would come to a far different conclusion for each case as to which was the more important variable, if you looked ONLY at the linear combination chosen from the PCA.
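The effect of units can be sketched with a small experiment. The data below are hypothetical (two made-up length measurements, with millimetres and metres standing in for the more extreme units above), and the helper names center and standardize are assumptions introduced only for this illustration. The same measurements are expressed in two different unit choices, and the first-component loadings are compared with and without scaling each variable to unit variance.

```matlab
% Hypothetical data: two length measurements, both recorded in metres.
t = linspace(0, 1, 200)';
a = t + 0.1*sin(20*t);
b = 0.5*t + 0.1*cos(15*t);

% The same measurements under two arbitrary choices of units.
Xa = [1e3*a, b];        % a in millimetres, b in metres
Xb = [a, 1e3*b];        % a in metres, b in millimetres

% Illustrative helpers: center columns, or center and scale to unit variance.
center      = @(X) bsxfun(@minus, X, mean(X, 1));
standardize = @(X) bsxfun(@rdivide, center(X), std(X, 0, 1));

% First-component loadings from the centered, unscaled data.
[~, ~, Va] = svd(center(Xa), 'econ');
[~, ~, Vb] = svd(center(Xb), 'econ');
rawA = abs(Va(:, 1));
rawB = abs(Vb(:, 1));

% First-component loadings after scaling every variable to unit variance.
[~, ~, Vsa] = svd(standardize(Xa), 'econ');
[~, ~, Vsb] = svd(standardize(Xb), 'econ');
stdA = abs(Vsa(:, 1));
stdB = abs(Vsb(:, 1));

% Columns: raw (units A), raw (units B), standardized (A), standardized (B).
disp([rawA, rawB, stdA, stdB]);
```

In the unscaled case, the variable expressed in the smaller unit dominates the first component, so the apparent "most important" variable flips with the choice of units. After standardizing, both unit choices give the same loadings up to sign.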