# princomp

Principal component analysis (PCA) on data

`princomp` will be removed in a future release. Use `pca` instead.
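As a hedged migration sketch, the call below requests the same four outputs from `pca`. It assumes a Statistics and Machine Learning Toolbox release that ships `pca` and the `hald` sample data; the variable names are illustrative only.

```matlab
% Sketch only: assumes pca and the hald sample data are available.
load hald                        % provides the 13-by-4 matrix "ingredients"

% Same roles as COEFF, SCORE, latent, tsquare from princomp;
% "explained" is an additional pca output (percent variance per component).
[coeff,score,latent,tsquared,explained] = pca(ingredients);

size(coeff)                      % one column of coefficients per component
```

When `n <= p`, `pca` is understood to return the economy-size output by default, so `pca(X,'Economy',false)` would be the closer match to `princomp(X)`; check the `pca` reference page for the release in use.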

## Syntax

`[COEFF,SCORE] = princomp(X)`

`[COEFF,SCORE,latent] = princomp(X)`

`[COEFF,SCORE,latent,tsquare] = princomp(X)`

`[...] = princomp(X,'econ')`

## Description

`COEFF = princomp(X)` performs principal components analysis (PCA) on the n-by-p data matrix `X`, and returns the principal component coefficients, also known as loadings. Rows of `X` correspond to observations, columns to variables. `COEFF` is a p-by-p matrix, each column containing coefficients for one principal component. The columns are in order of decreasing component variance.
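As a rough illustration of what the coefficient matrix contains, the sketch below derives an equivalent set of coefficients from the sample covariance matrix using only base MATLAB functions. The data matrix and the names `coeffSketch` and `variances` are made up for this example. This is not necessarily how `princomp` computes its result internally, and the sign of each column may differ from what `princomp` returns.

```matlab
% Illustrative data (made up): 5 observations (rows), 3 variables (columns)
X = [2 4 1; 3 7 0; 5 8 2; 6 11 1; 9 15 3];

% Eigendecomposition of the sample covariance matrix (cov centers X itself)
[V,D] = eig(cov(X));

% Reorder so that columns run in order of decreasing component variance
[variances,order] = sort(diag(D),'descend');
coeffSketch = V(:,order);        % p-by-p, one column per principal component

size(coeffSketch)                % 3-by-3 for this data
```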

`princomp` centers `X` by subtracting the column means, but does not rescale the columns of `X`. To perform principal components analysis with standardized variables, that is, based on correlations, use `princomp(zscore(X))`. To perform principal components analysis directly on a covariance or correlation matrix, use `pcacov`.
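To see why standardizing amounts to an analysis based on correlations, the following sketch standardizes a small made-up matrix with base MATLAB functions and compares the covariance of the result with the correlation matrix of the original data. The manual standardization is assumed to match `zscore(X)` with its default settings; the variable names are illustrative.

```matlab
% Illustrative data (made up): 5 observations, 3 variables
X = [2 4 1; 3 7 0; 5 8 2; 6 11 1; 9 15 3];

% Center each column and divide by its sample standard deviation
Z = bsxfun(@rdivide, bsxfun(@minus, X, mean(X)), std(X));

% The covariance of the standardized data should equal the correlation
% matrix of the original data, up to round-off
covZ = cov(Z);
R = corrcoef(X);
maxDiff = max(abs(covZ(:) - R(:)));
```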

`[COEFF,SCORE] = princomp(X)` returns `SCORE`, the principal component scores; that is, the representation of `X` in the principal component space. Rows of `SCORE` correspond to observations, columns to components.
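The relationship between the data, the coefficients, and the scores can be sketched as below: the scores are the centered data projected onto the coefficient columns. This version obtains the coefficients from a singular value decomposition of the centered data, which is one common approach and is assumed here rather than taken from the `princomp` source. The data and names are made up.

```matlab
% Illustrative data (made up): 5 observations, 3 variables
X = [2 4 1; 3 7 0; 5 8 2; 6 11 1; 9 15 3];

% Center the columns, as princomp is described as doing
mu = mean(X);
Xc = bsxfun(@minus, X, mu);

% Right singular vectors play the role of COEFF (up to column signs)
[U,S,V] = svd(Xc,'econ');
coeffSketch = V;

% Scores: rows are observations, columns are components
scoreSketch = Xc * coeffSketch;

% With all components kept, the original data can be recovered
Xrebuilt = bsxfun(@plus, scoreSketch * coeffSketch', mu);
```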

`[COEFF,SCORE,latent] = princomp(X)` returns `latent`, a vector containing the eigenvalues of the covariance matrix of `X`.

`[COEFF,SCORE,latent,tsquare] = princomp(X)` returns `tsquare`, which contains Hotelling's T² statistic for each data point.

The scores are the data formed by transforming the original data into the space of the principal components. The values of the vector `latent` are the variances of the columns of `SCORE`. Hotelling's T² is a measure of the multivariate distance of each observation from the center of the data set.
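Both statements can be checked numerically. The sketch below, using made-up full-rank data and base MATLAB only, compares the component variances with the variances of the score columns, and computes T² as the sum of squared scores scaled by their component variances. For full-rank data this is assumed to coincide with the squared Mahalanobis distance from the column means. All variable names are illustrative.

```matlab
% Illustrative data (made up): 5 observations, 3 variables
X = [2 4 1; 3 7 0; 5 8 2; 6 11 1; 9 15 3];
n = size(X,1);

Xc = bsxfun(@minus, X, mean(X));
[U,S,V] = svd(Xc,'econ');
scoreSketch  = Xc * V;
latentSketch = diag(S).^2 / (n-1);      % component variances

% Variance of each score column (should match latentSketch)
scoreVar = var(scoreSketch)';

% T-squared: squared scores, each scaled by its component variance
tsqSketch = sum(bsxfun(@rdivide, scoreSketch.^2, latentSketch'), 2);

% Cross-check: squared Mahalanobis distance from the column means
tsqCheck = sum((Xc / cov(X)) .* Xc, 2);
```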

When `n <= p`, `SCORE(:,n:p)` and `latent(n:p)` are necessarily zero, and the columns of `COEFF(:,n:p)` define directions that are orthogonal to `X`.

`[...] = princomp(X,'econ')` returns only the elements of `latent` that are not necessarily zero, and the corresponding columns of `COEFF` and `SCORE`, that is, when `n <= p`, only the first `n-1`. This can be significantly faster when `p` is much larger than `n`.
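The `n <= p` case can be illustrated with a made-up 3-by-5 matrix. Centering removes one degree of freedom, so at most `n-1` component variances are nonzero, and an economy-size result keeps only those. The sketch uses base MATLAB functions and illustrative names; it is not taken from the `princomp` implementation.

```matlab
% Illustrative data (made up): n = 3 observations, p = 5 variables
X = [1 2 0 4 3; 2 1 5 3 0; 4 0 1 2 6];
n = size(X,1);

% Full set of p eigenvalues of the covariance matrix, largest first
latentFull = sort(eig(cov(X)),'descend');
tailMax = max(abs(latentFull(n:end)));  % zero up to round-off

% Economy-size sketch: keep only the first n-1 components
Xc = bsxfun(@minus, X, mean(X));
[U,S,V] = svd(Xc,'econ');
s = diag(S);
coeffEcon  = V(:,1:n-1);                % p-by-(n-1)
scoreEcon  = Xc * coeffEcon;            % n-by-(n-1)
latentEcon = s(1:n-1).^2 / (n-1);
```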

## Examples

Compute principal components for the `ingredients` data in the Hald data set, and the variance accounted for by each component.

```
load hald;
[pc,score,latent,tsquare] = princomp(ingredients);
pc,latent

pc =
   -0.0678   -0.6460    0.5673    0.5062
   -0.6785   -0.0200   -0.5440    0.4933
    0.0290    0.7553    0.4036    0.5156
    0.7309   -0.1085   -0.4684    0.4844

latent =
  517.7969
   67.4964
   12.4054
    0.2372
```

The following command and plot show that two components account for 98% of the variance:

```
cumsum(latent)./sum(latent)

ans =
    0.86597
    0.97886
    0.9996
    1

biplot(pc(:,1:2),'Scores',score(:,1:2),'VarLabels',...
    {'X1' 'X2' 'X3' 'X4'})
```

For a more detailed example and explanation of this analysis method, see Principal Component Analysis (PCA).
