How to change color of the data on biplot by the result of clustering

Hello, everyone
I'm a beginner with MATLAB.
I want to show the result of clustering using PCA and biplot.
However, I don't know how to color the data points on the biplot according to the clustering result. In the picture, all of the data points are red. I want to separate the data into 3 colors (because there are 3 clusters).
Could you share your ideas?
D=readmatrix("Test.xlsx");
[coeff,score,latent]=pca(D);
[idx,H,sumd]=kmeans(D,3,MaxIter=1000,Display="final",Replicates=5);
Replicate 1, 10 iterations, total sum of distances = 10012.1.
Replicate 2, 12 iterations, total sum of distances = 10011.4.
Replicate 3, 11 iterations, total sum of distances = 10012.1.
Replicate 4, 10 iterations, total sum of distances = 10012.1.
Replicate 5, 13 iterations, total sum of distances = 10017.1.
Best total sum of distances = 10011.4
vbls = {'Depth','Sample','Ping','sea bottom mean','Length','Height','Perimeter','Area','BAmean','TAmean','Elongation','UNEVENNESS1','UNEVENNESS"','Lectangularity','Fractual demensiton','Circularity'};
figure
biplot(coeff(:,1:3),'scores',score(:,1:3),"VarLabels",vbls)

Answers (1)

Atsushi Ueno on 11 Nov 2022
Edited: Atsushi Ueno on 11 Nov 2022
For the attached data, the output of the biplot function is shown below.
The graphics handle array "h" in this example contains 1013 object handles.
  • Handles h(1:16) are the line handles for the 16 variables.
  • Handles h(17:32) are the marker handles for the 16 variables.
  • Handles h(33:48) are the text handles for the 16 variables.
  • Handles h(49:1012) are the line handles for the 964 observations.
  • The last handle h(1013) is the line handle for the axis lines.
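The handle layout listed above can be checked rather than assumed. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available and that biplot tags its observation markers as 'obsmarker' (the tag used in the comment later in this thread). The random matrix X is a hypothetical stand-in for the real data.

```matlab
rng(0);                                    % fixed seed so the stand-in data are repeatable
X = randn(50,4);                           % hypothetical data: 50 observations, 4 variables
[coeff,score] = pca(X);

fig = figure('Visible','off');
h = biplot(coeff(:,1:3),'Scores',score(:,1:3));

% Count how many handles carry each tag, in the order biplot created them
tags = get(h,'Tag');                       % cell array, one tag per handle
[tagNames,~,grp] = unique(tags,'stable');
counts = accumarray(grp,1);
disp(table(tagNames,counts))

% Locate the observation markers by tag instead of by index arithmetic
isObs    = strcmp(tags,'obsmarker');
nObs     = nnz(isObs);
firstObs = find(isObs,1);
fprintf('observation handles: %d, first at index %d (3*nVars+1 = %d)\n', ...
    nObs, firstObs, 3*size(X,2)+1);
```

If the printed first index matches 3*nVars+1, the offset used for indexing into h is consistent with the layout described above for that MATLAB release.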
In addition, the "cluster indices" (idx), one of the outputs of the kmeans function, are used as the color index.
One drawback is that these values (from 1 to 3 in this case) can be assigned to different clusters each time the code is run.
D=readmatrix("https://jp.mathworks.com/matlabcentral/answers/uploaded_files/1188973/Test.xlsx");
[coeff,score,latent]=pca(D);
[idx,H,sumd]=kmeans(D,3,MaxIter=1000,Display="final",Replicates=5);
Replicate 1, 14 iterations, total sum of distances = 10080.8.
Replicate 2, 14 iterations, total sum of distances = 10804.3.
Replicate 3, 9 iterations, total sum of distances = 10014.8.
Replicate 4, 12 iterations, total sum of distances = 10796.3.
Replicate 5, 8 iterations, total sum of distances = 11103.3.
Best total sum of distances = 10014.8
vbls = {'Depth','Sample','Ping','sea bottom mean','Length','Height','Perimeter','Area','BAmean','TAmean','Elongation','UNEVENNESS1','UNEVENNESS"','Lectangularity','Fractual demensiton','Circularity'};
figure
h = biplot(coeff(:,1:3),'scores',score(:,1:3),"VarLabels",vbls); % output h has been added
% added from here
xlim([-0.1 0.5]); ylim([-0.1 0.5]); zlim([-0.5 0.3]); % to make it look good
color = 'rgb'; % just for this example
for k = 1:size(D,1)
    % skip the 3*size(D,2) variable handles (lines, markers, text) to reach the observation handles
    h(k + size(D,2)*3).MarkerEdgeColor = color(idx(k)); % change the color of each data point
end
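Regarding the drawback that cluster numbers change between runs: one possible workaround is to renumber the clusters by a deterministic rule after kmeans returns, so that a given color keeps referring to the same group. The following is a minimal sketch, assuming that ordering the centroids by their first coordinate is meaningful for the data. The synthetic matrix X and the names newLabel, idxStable, and Cstable are illustrative only.

```matlab
rng(1);                                          % repeatable stand-in data
X = [randn(30,2)-4; randn(30,2); randn(30,2)+4]; % hypothetical data with three groups
[idx,C] = kmeans(X,3,'Replicates',5);

% order(j) is the old label of the cluster with the j-th smallest first centroid coordinate
[~,order] = sort(C(:,1));

newLabel = zeros(3,1);
newLabel(order) = 1:3;          % lookup table: old label -> new label
idxStable = newLabel(idx);      % relabeled cluster indices, same size as idx
Cstable   = C(order,:);         % centroids arranged in the new label order

colors = 'rgb';
markerColors = colors(idxStable);   % one color character per observation
```

idxStable can then be used in place of idx when assigning colors. The sorting rule is a design choice; any rule that is deterministic for the data (for example, cluster size) would serve the same purpose.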
  1 Comment
Adam Danz on 11 Nov 2022
I would encourage you to investigate this approach further by using a simpler data set with fewer points so you can see what's going on and confirm that this is what you want to do.
The demo below plots the results twice using the same data and same exact code but the results differ. This is because kmeans uses a random starting point so the grouping indices will likely differ each time you run it.
Load and compute data
load carsmall
X = [Acceleration Displacement Horsepower MPG Weight];
X = rmmissing(X);
Z = zscore(X); % Standardized data
Plot the results
[coefs,score] = pca(Z);
nClusters = width(Z);
[idx,H,sumd]=kmeans(Z,nClusters,MaxIter=1000,Replicates=5);
figure()
h = biplot(coefs(:,1:2),'Scores',score(:,1:2));
% Change color of varlines and observations according to kmeans results
colors = lines(width(Z));
tags = {h.Tag};
observationHandles = h(strcmp(tags, 'obsmarker'));
for i = 1:nClusters
    h(i).Color = colors(i,:);
    h(i).LineWidth = 2;
    set(observationHandles(idx==i), 'Color', colors(i,:))
end
set(observationHandles, 'MarkerSize', 12)
Copy-pasted from the block above to plot this again
[coefs,score] = pca(Z);
nClusters = width(Z);
[idx,H,sumd]=kmeans(Z,nClusters,MaxIter=1000,Replicates=5);
figure()
h = biplot(coefs(:,1:2),'Scores',score(:,1:2));
% Change color of varlines and observations according to kmeans results
colors = lines(width(Z));
tags = {h.Tag};
observationHandles = h(strcmp(tags, 'obsmarker'));
for i = 1:nClusters
    h(i).Color = colors(i,:);
    h(i).LineWidth = 2;
    set(observationHandles(idx==i), 'Color', colors(i,:))
end
set(observationHandles, 'MarkerSize', 12)
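If identical groupings across runs are the goal, one option is to reset the random number generator to the same seed before each kmeans call. The following is a minimal sketch that reuses the carsmall setup from the demo above. The seed value 42 and the choice of 3 clusters are arbitrary assumptions.

```matlab
load carsmall
X = rmmissing([Acceleration Displacement Horsepower MPG Weight]);
Z = zscore(X);                                  % standardized data

rng(42);                                        % arbitrary seed
idx1 = kmeans(Z,3,'MaxIter',1000,'Replicates',5);

rng(42);                                        % same seed again before the second call
idx2 = kmeans(Z,3,'MaxIter',1000,'Replicates',5);

fprintf('Label vectors identical: %d\n', isequal(idx1,idx2));
```

Fixing the seed only makes a given script repeatable; it does not make the cluster numbering meaningful, so the colors still depend on which seed was chosen.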
