Classical Multidimensional Scaling

Open Live Script

This example shows how to use cmdscale to perform classical (metric) multidimensional scaling, also known as principal coordinates analysis.

cmdscale takes as an input a matrix of inter-point distances and creates a configuration of points. Ideally, those points are in two or three dimensions, and the Euclidean distances between them reproduce the original distance matrix. Thus, a scatter plot of the points created by cmdscale provides a visual representation of the original distances.

As a very simple example, you can reconstruct a set of points from only their inter-point distances. First, create some four dimensional points with a small component in their fourth coordinate, and reduce them to distances.

rng default;  % For reproducibility
X = [normrnd(0,1,10,3),normrnd(0,.1,10,1)];
D = pdist(X,'euclidean');

Next, use cmdscale to find a configuration with those inter-point distances. cmdscale accepts distances as either a square matrix, or, as in this example, in the vector upper-triangular form produced by pdist.

[Y,eigvals] = cmdscale(D);

cmdscale produces two outputs. The first output, Y, is a matrix containing the reconstructed points. The second output, eigvals, is a vector containing the sorted eigenvalues of what is often referred to as the "scalar product matrix," which, in the simplest case, is equal to Y*Y'. The relative magnitudes of those eigenvalues indicate the relative contribution of the corresponding columns of Y in reproducing the original distance matrix D with the reconstructed points.

format short g
[eigvals eigvals/max(abs(eigvals))]

ans = 10×2

        35.41            1
       11.158      0.31511
       1.6894      0.04771
       0.1436    0.0040553
   4.5646e-15   1.2891e-16
   2.2494e-15   6.3524e-17
   1.5788e-15   4.4585e-17
   9.8852e-16   2.7916e-17
  -1.1872e-15  -3.3526e-17
  -2.7265e-15  -7.6998e-17

If eigvals contains only positive and zero (within round-off error) eigenvalues, the columns of Y corresponding to the positive eigenvalues provide an exact reconstruction of D, in the sense that their inter-point Euclidean distances, computed using pdist, for example, are identical (within round-off) to the values in D.

maxerr4 = max(abs(D - pdist(Y)))   % Exact reconstruction

maxerr4 = 
   2.6645e-15

If two or three of the eigenvalues in eigvals are much larger than the rest, then the distance matrix based on the corresponding columns of Y nearly reproduces the original distance matrix D. In this sense, those columns form a lower-dimensional representation that adequately describes the data. However it is not always possible to find a good low-dimensional reconstruction.

maxerr3 = max(abs(D - pdist(Y(:,1:3))))  % Good reconstruction in 3D

maxerr3 = 
     0.043142

maxerr2 = max(abs(D - pdist(Y(:,1:2))))  % Poor reconstruction in 2D

maxerr2 = 
      0.98315

The reconstruction in three dimensions reproduces D very well, but the reconstruction in two dimensions has errors that are of the same order of magnitude as the largest values in D.

max(max(D))

ans = 
       5.8974

Often, eigvals contains some negative eigenvalues, indicating that the distances in D cannot be reproduced exactly. That is, there might not be any configuration of points whose inter-point Euclidean distances are given by D. If the largest negative eigenvalue is small in magnitude relative to the largest positive eigenvalues, then the configuration returned by cmdscale might still reproduce D well.