Is there any alternative to evalclusters (Evaluate clustering solutions)?
Hello, I hope you are doing well.
I have implemented K-means, and now I want to find the value of K automatically.
I have applied evalclusters, but it is very slow: it takes more than 10 seconds to give an answer for 59000 values.
Is there any other way to find the optimal K?
Answers (1)
Walter Roberson
on 24 May 2022
Yes, for kmeans, the optimal number of clusters is always
size(unique(Data, 'rows'), 1)
That is, the optimal number of clusters is always the number of unique points.
When each unique point is the center of its own cluster, the distance between every point and its cluster center is 0 (no matter which distance measure is used). With every point at distance 0 from its center, the overall fit is perfect and cannot be improved.
Using fewer clusters than the number of unique points always leaves at least one point at a non-zero distance from its center, so the fit cannot be as good.
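The argument above can be checked numerically. This is a minimal sketch on a small made-up Data matrix (hypothetical, not from the question): it uses each unique row as its own cluster center and computes the total within-cluster sum of squared distances, which comes out as exactly 0. It uses only base MATLAB functions (unique with the 'rows' option and its third output).

```matlab
% Hypothetical toy data: 6 points, of which only 4 are unique
Data = [1 2; 1 2; 3 4; 5 6; 5 6; 7 8];

% C holds the unique rows; idx gives, for each point, its row in C,
% so that Data == C(idx, :)
[C, ~, idx] = unique(Data, 'rows');
K = size(C, 1);                 % same as size(unique(Data, 'rows'), 1)

% Treat each unique point as the center of its own cluster
residual = Data - C(idx, :);    % each point minus its assigned center
sse = sum(residual(:).^2);      % total within-cluster sum of squares

fprintf('K = %d, within-cluster SSE = %g\n', K, sse);
```

For this toy matrix it prints K = 4 and an SSE of 0, so no smaller K can fit better on this criterion alone.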
If you were expecting something else, then either you had not thought about this, or you have a per-cluster cost in mind. But there is no "objective" non-zero per-cluster cost, so unless you have an explicit cost value you cannot proceed. And if you do have an explicit cost value, you should probably also have an explicit cost function for pricing a proposed cluster.
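One way to use an explicit per-cluster cost is to add it to the kmeans fit and pick the K with the lowest total. The sketch below is an illustration of that idea, not evalclusters' criterion and not the answerer's method: the data, the value of costPerCluster and the range 1..Kmax are all assumptions. It relies on the third output of kmeans (Statistics and Machine Learning Toolbox), sumd, which holds the within-cluster sums of point-to-centroid distances (squared Euclidean by default).

```matlab
% Hypothetical data: three well-separated 2-D blobs
rng(0);                                   % reproducible random numbers
Data = [randn(200,2); randn(200,2) + 8; randn(200,2) + [8 -8]];

costPerCluster = 500;   % ASSUMED price of one cluster, in squared-distance units
Kmax = 10;              % ASSUMED largest K worth considering
totalCost = zeros(Kmax, 1);

for k = 1:Kmax
    % sumd(j): sum of squared Euclidean distances from the points in
    % cluster j to that cluster's centroid
    [~, ~, sumd] = kmeans(Data, k, 'Replicates', 5);
    % Fit cost plus the explicit per-cluster cost
    totalCost(k) = sum(sumd) + costPerCluster * k;
end

[~, bestK] = min(totalCost);
fprintf('Chosen K = %d\n', bestK);
```

A larger costPerCluster pushes the choice toward fewer clusters. With costPerCluster = 0 the fit term generally keeps falling as K grows, which is the answer's point: without a cost for each cluster, the criterion always prefers more clusters.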
For example, if the context were building power distribution stations, then obviously there should be a cost per station. Is the cost fixed, or does it depend on the number of points served by the station? Is there a maximum station size? Is there a cost for electrical cables between the houses and the station? It is unlikely that each house has a direct wire to the station; there would probably be runs of cable and maybe splitters, and the cabling to the station would need to be optimized... And since the power distribution stations do not generate power, you also need costs from the distribution stations to the power source...
Thus you can choose the abstract "give each point its own cluster" if you want mathematical simplicity. If you do not want that simplicity because you are trying to represent "real" costs, then you need a lot more detail, and kmeans is probably not an appropriate algorithm: kmeans can only model point-to-centroid costs, not within-cluster costs that depend on how the points relate to each other.