Problem 2043. Six Steps to PCA - Step 1: Centre and Standardize
Introduction
Principal Component Analysis (PCA) is a classic among the many methods of multivariate data analysis. Invented in 1901 by Karl Pearson, the method is mostly used today as a tool for exploratory data analysis and dimension reduction, but also for building predictive models in machine learning.
Step 1: Centre and Standardize
For many multivariate methods, the first step is to remove the influence of location and scale from the variables in the raw data. The result, Z, is commonly known as the z-scores of X: a transformation of X whose columns are centred to have mean 0 and scaled to have standard deviation 1 (unless a column of X is constant, in which case the corresponding column of Z is constant at 0). Strictly speaking, z-scores are based on population parameters, whereas the analogous calculation based on the sample mean and standard deviation is the Student's t-statistic.
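The transformation described above can be written out column by column. The notation here is an assumption, not part of the original problem: X has n rows and entries x_ij, with i indexing rows and j indexing columns. The n - 1 divisor is also an assumption, chosen to match the default behaviour of MATLAB's std function.

```latex
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_{ij}-\mu_j)^2}, \qquad
z_{ij} =
\begin{cases}
  \dfrac{x_{ij}-\mu_j}{\sigma_j} & \text{if } \sigma_j > 0,\\[4pt]
  0 & \text{if } \sigma_j = 0.
\end{cases}
```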
Task
Write a function to centre and standardize the input matrix X, returning as the output a structure with the following fields:
- Z: the centred and standardized matrix corresponding to the input X
- Mu: a vector of the original means of columns of X
- Sigma: a vector of the original standard deviations of columns of X
Tips
- MATLAB's zscore function is part of the Statistics Toolbox, which is not available in Cody, so you will have to write your own.
- You should take care to avoid division by zero when a column is invariant.
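One possible way to put the task and the tips together is sketched below. It is not the reference solution: the function name centre_and_standardize is hypothetical, the n - 1 normalisation is an assumption based on the default of std, and the exact comparison of Sigma with zero is an assumption that may not match how the test suite defines a constant column.

```matlab
function S = centre_and_standardize(X)
% Sketch: centre and standardize the columns of X without zscore.
% The function name is hypothetical; the fields Z, Mu and Sigma follow the task.
    Mu    = mean(X, 1);      % row vector of column means
    Sigma = std(X, 0, 1);    % column standard deviations (n-1 divisor, an assumption)

    % Use a divisor of 1 wherever the deviation is exactly zero, so that a
    % constant column becomes 0 after centring instead of NaN (0/0).
    scale = Sigma;
    scale(scale == 0) = 1;

    % bsxfun keeps the sketch compatible with releases that lack
    % implicit expansion.
    Z = bsxfun(@rdivide, bsxfun(@minus, X, Mu), scale);

    S = struct('Z', Z, 'Mu', Mu, 'Sigma', Sigma);
end
```

For example, calling centre_and_standardize([1 5; 2 5; 3 5]) gives Mu = [2 5] and Sigma = [1 0], and the second column of Z is all zeros. Note that a column that is constant only up to floating-point rounding can have a tiny non-zero standard deviation, in which case comparing against a small tolerance instead of exact zero may be more appropriate.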
Problem Comments
Your definition of constant (or invariant) data using rand is problematic. If you increase the size of your data (n=1000, n=10000, ...), you can always increase the deviations, so what threshold should be used for sigma? I think that with real data this artifact isn't possible. No?