K-Means Matlab cluster assignment

Hello,
So I ran the k-means algorithm on a data set, and it determines that there are 4 different clusters, but the cluster numbers are not assigned the way I want. To be more specific, I would like the cluster numbers to be assigned in increasing order.
E = evalclusters(c,'kmeans','DaviesBouldin','klist',[3:10])
kidx = kmeans(c,E.OptimalK);

Accepted Answer

Image Analyst on 19 Feb 2020
Anastasis, below is a full demo of how to sort the labels according to how far the cluster centroid is from the origin, and how to relabel the class numbers so that class 1 will be closest and class 4 will be farthest away from the origin.
Don't be afraid of the length of the code. It's actually simple; it just looks long because I had to put in code to make some sample clustered data (which you won't need), code at the end to double-check and verify the results (which you won't need either), and lots of comments to help explain it to you (which you should probably leave in).
Adapt as needed.
% Demo to show how you can redefine the class numbers assigned by kmeans() to different numbers.
% In this demo, the original, arbitrary class numbers will be reassigned a new number
% according to how far the cluster centroid is from the origin.
% Author: Image Analyst, Feb. 2020.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clearvars;
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 18;
%-------------------------------------------------------------------------------------------------------------------------------------------
% CREATE SAMPLE DATA.
% Make up 4 clusters with 150 points each.
pointsPerCluster = 150;
spread = 0.03;
offsets = [0.3, 0.5, 0.7, 0.9];
% offsets = [0.62, 0.73, 0.84, 0.95];
xa = spread * randn(pointsPerCluster, 1) + offsets(1);
ya = spread * randn(pointsPerCluster, 1) + offsets(1);
xb = spread * randn(pointsPerCluster, 1) + offsets(2);
yb = spread * randn(pointsPerCluster, 1) + offsets(2);
xc = spread * randn(pointsPerCluster, 1) + offsets(3);
yc = spread * randn(pointsPerCluster, 1) + offsets(3);
xd = spread * randn(pointsPerCluster, 1) + offsets(4);
yd = spread * randn(pointsPerCluster, 1) + offsets(4);
x = [xa; xb; xc; xd];
y = [ya; yb; yc; yd];
xy = [x, y];
%-------------------------------------------------------------------------------------------------------------------------------------------
% K-MEANS CLUSTERING.
% Now do kmeans clustering.
% Determine what the best k is:
evaluationObject = evalclusters(xy, 'kmeans', 'DaviesBouldin', 'klist', [3:10])
% Do the kmeans with that k:
[assignedClass, clusterCenters] = kmeans(xy, evaluationObject.OptimalK);
clusterCenters % Echo to command window
% Do a scatter plot with the original class numbers assigned by kmeans.
hfig = figure;
subplot(1, 2, 1);
gscatter(x, y, assignedClass);
legend('FontSize', fontSize, 'Location', 'northwest');
grid on;
xlabel('x', 'fontSize', fontSize);
ylabel('y', 'fontSize', fontSize);
title('Original Class Numbers Assigned by kmeans()', 'fontSize', fontSize);
hfig.WindowState = 'maximized'; % Maximize the figure window so that it takes up the full screen.
%-------------------------------------------------------------------------------------------------------------------------------------------
% SORTING ALGORITHM
% Sort the clusters according to how far each cluster center is from the origin.
% First get the distance of each cluster center (as reported by the kmeans function) from the origin.
distancesFromOrigin = sqrt(clusterCenters(:, 1) .^ 2 + clusterCenters(:, 2) .^2)
%-------------------------------------------------------------------------------------------------------------------------------------------
% NOW GET NEW CLASS NUMBERS ACCORDING TO THAT SORTING ALGORITHM.
% Now, say for example, that you want to give the classes numbers according to how far from the origin they are.
% Determine what the new order to sort them in should be:
[sortedDistances, sortOrder] = sort(distancesFromOrigin, 'ascend') % Sort the centroid distances from the origin.
% Get new class numbers for each point. For example, if class 4 is closest to the origin,
% what used to be class 4 will now be class 1.
% (The actual numbers may change for each run since kmeans starts from randomly chosen initial cluster centers.)
% Instantiate a vector that will tell each point what its new class number will be.
newClassNumbers = zeros(length(x), 1);
% For each class, find out where it is
for k = 1 : size(clusterCenters, 1)
% First find out what points have this current class,
% and where they are by creating this logical vector.
currentClassLocations = assignedClass == k;
% Now assign all of those locations to their new class.
newClassNumber = find(k == sortOrder); % Find index in sortOrder where this class number appears.
fprintf('Initially the center of cluster %d is (%.2f, %.2f), %.2f from the origin.\n', ...
k, clusterCenters(k, 1), clusterCenters(k, 2), distancesFromOrigin(k));
fprintf(' Relabeling all points in initial cluster #%d to cluster #%d.\n', k, newClassNumber);
% Do the relabeling right here:
newClassNumbers(currentClassLocations) = newClassNumber;
end
% Plot the clusters with their new labels and colors.
subplot(1, 2, 2);
gscatter(x, y, newClassNumbers);
grid on;
xlabel('x', 'fontSize', fontSize);
ylabel('y', 'fontSize', fontSize);
title('New Class Numbers', 'fontSize', fontSize);
legend('FontSize', fontSize, 'Location', 'northwest');
% Basically, we're done now.
%-------------------------------------------------------------------------------------------------------------------------------------------
% DOUBLE CHECK, VERIFICATION, PROOF.
% To verify, let's get the mean (x,y) of each class after the relabeling.
fprintf('Now, after relabeling:\n');
numClusters = size(clusterCenters, 1);
meanx = zeros(1, numClusters); % Preallocate the per-class means.
meany = zeros(1, numClusters);
for k = 1 : numClusters
% Find out which points have this class by creating a logical vector.
currentClassLocations = newClassNumbers == k;
% Compute the mean (x, y) of all points in this class.
meanx(k) = mean(x(currentClassLocations));
meany(k) = mean(y(currentClassLocations));
fprintf('The center of cluster %d is (%.2f, %.2f).\n', k, meanx(k), meany(k));
end
% cc = [assignedClass, newClassNumbers]; % Class assignments, side-by-side.
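If you prefer to avoid the verification loop, accumarray can average the coordinates of all points sharing a label in one call. Below is a minimal sketch, assuming x, y, and newClassNumbers are column vectors as in the demo; the handful of points here are made-up stand-ins (not kmeans output) so the snippet runs on its own, and recomputedDistances and isAscending are names introduced only for this illustration. After a correct relabeling, the recomputed centroid distances from the origin should increase with the class number.

```matlab
% Made-up stand-ins for the demo's x, y and newClassNumbers (column vectors).
x = [0.31; 0.29; 0.52; 0.48; 0.71; 0.69];
y = [0.30; 0.32; 0.50; 0.49; 0.70; 0.72];
newClassNumbers = [1; 1; 2; 2; 3; 3];

% Average x and y separately over the points of each class; row k is class k.
meanx = accumarray(newClassNumbers, x, [], @mean);
meany = accumarray(newClassNumbers, y, [], @mean);

% Distance of each recomputed class center from the origin.
recomputedDistances = sqrt(meanx .^ 2 + meany .^ 2);

% The relabeling is consistent if class 1 is closest and the distances ascend.
isAscending = issorted(recomputedDistances);

% Print one line per class: number, mean x, mean y.
fprintf('The center of cluster %d is (%.2f, %.2f).\n', [1:numel(meanx); meanx'; meany']);
fprintf('Distances ascend with class number: %d\n', isAscending);
```

Checking that the distances ascend, rather than comparing exact coordinates, keeps the check independent of the random kmeans initialization.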

More Answers (1)

KSSV on 18 Feb 2020
Edited: KSSV on 18 Feb 2020
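If idx holds the cluster indices returned by kmeans and P is the matrix of points you clustered (one row per point), you can plot each cluster in its own color: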
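figure
hold on
numClusters = max(idx); % Number of clusters found by kmeans.
for i = 1:numClusters
% Plot the points assigned to cluster i in their own color.
plot(P(idx==i,1),P(idx==i,2),'.') ;
end
legend(arrayfun(@num2str, 1:numClusters, 'UniformOutput', false))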
Anastasis Pk on 18 Feb 2020
The problem is that the indices in kidx are not assigned in the order I want. I want them to be assigned in increasing centroid order.
KSSV on 18 Feb 2020
That is not a problem: get the centroids, sort them, and relabel the indices accordingly.
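Here is a minimal sketch of that recipe, assuming one-dimensional data so that "increasing centroid order" simply means sorting the centroid values. In real use, C and kidx would come from [kidx, C] = kmeans(c, E.OptimalK), since kmeans returns the centroids as its second output; the values below are made-up stand-ins so the snippet runs without the toolbox, and newIndexOf and kidxSorted are hypothetical names.

```matlab
% Made-up stand-ins for the outputs of [kidx, C] = kmeans(c, E.OptimalK)
% on one-dimensional data.
C    = [2.1; 7.9; 0.4; 5.0];      % one centroid per cluster, arbitrary order
kidx = [3; 1; 2; 4; 4; 3; 2];     % cluster index of each data point

% 1. Sort the centroids; sortOrder(j) is the old index of the j-th smallest.
[sortedC, sortOrder] = sort(C, 'ascend');

% 2. Invert the permutation: newIndexOf(oldIndex) gives the new index.
newIndexOf = zeros(size(C));
newIndexOf(sortOrder) = 1 : numel(C);

% 3. Relabel every point with one indexing step (no loop needed).
kidxSorted = newIndexOf(kidx);

% Each point still maps to the same centroid value, now numbered in increasing order.
disp([kidx, kidxSorted, sortedC(kidxSorted)]);
```

For multi-dimensional data, sort a scalar key per centroid instead, for example the distance from the origin used in the accepted answer, sqrt(sum(C .^ 2, 2)), and reorder the centroid rows with C(sortOrder, :). The inverse permutation can also be obtained as [~, newIndexOf] = sort(sortOrder).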
