MATLAB Answers

Kmeans failed to converge after 10 million iterations! (BIG DATA)

Ame ZL on 2 Aug 2018
Commented: Image Analyst on 30 Dec 2020
Hello everyone,
I'm working with a matrix X (11 000 000 x 27, single precision).
I want to cluster my data into k clusters, where k can be any integer from 4 to 20. I have standardised my data because the variables have very different units, and I'm using the kmeans() function in MATLAB R2018a.
X = bsxfun(@minus, X, mean(X));
X = bsxfun(@rdivide,X,std(X));
rng(0) % for repeatability
[km_ind,~,sumd] = kmeans(X,k,'MaxIter',10000000,'Replicates',5);
I have tried setting 'MaxIter' as high as 10 million, but I still don't get convergence. I have tried different values for k, and the warning message doesn't change. Sometimes the warning appears within a matter of seconds, and I doubt that 10 million iterations were completed in a couple of seconds.
Warning: Failed to converge in 10000000 iterations during replicate 1.
In kmeans/loopBody (line 476)
In internal.stats.parallel.smartForReduce (line 136)
In kmeans (line 343)
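One way to see how many iterations are actually performed is the 'Display' option of kmeans, which prints one line per iteration when set to 'iter'. The following is only a minimal sketch: the random stand-in data, the value of k, and the small 'MaxIter' cap are assumptions chosen so the example runs on its own, not values from the question.

```matlab
% Stand-in data so the sketch runs on its own; replace with the real X and k.
rng(0)                             % for repeatability
X = randn(6000, 27, 'single');     % hypothetical sample with the same width as the question's X
k = 4;                             % hypothetical choice within the 4 to 20 range

% 'Display','iter' prints one line per iteration, so the number of
% iterations actually performed is visible in the Command Window.
[km_ind, ~, sumd] = kmeans(X, k, 'MaxIter', 50, 'Display', 'iter');

totalSumD = sum(sumd);             % total within-cluster sum of distances
```

If the printed iteration count stays far below 'MaxIter' while the warning still appears, that is worth reporting together with the output.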
What am I missing? What am I doing wrong? Any suggestions?
Thanks very much
EDIT 1: I have uploaded the first 60 000 observations of my data (already standardised). I have the same problem when clustering this subset: it does not converge after 10 million iterations.
EDIT 2: New information. I've compared the clustering results using:
- 10^9 iterations (a thousand million iterations!)
- 10^8 iterations
- 10^7 iterations
- 10^5 iterations
- 10^4 iterations
- 10^3 iterations
- 10^2 iterations
- 10 iterations
and several values between 10 and 50 iterations. Although I always receive the non-convergence warning, the result actually stops changing somewhere between the 15th and 20th iteration.
What could make MATLAB issue the non-convergence warning even when the results seem to have converged?
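The comparison described in EDIT 2 can be made concrete by running kmeans twice from the same random seed with two different iteration caps and counting how many assignments differ. This is a rough sketch under stated assumptions: the stand-in data, k, and the names capA and capB are hypothetical, and resetting the seed before each call is what makes the two label vectors directly comparable.

```matlab
% Stand-in data so the sketch runs on its own; replace with the real X and k.
rng(0)
X = randn(6000, 27, 'single');   % hypothetical sample
k = 4;                           % hypothetical number of clusters

capA = 20;                       % hypothetical iteration caps to compare
capB = 200;

rng(0)                           % same seed before each call, so both runs start identically
[idxA, ~, sumdA] = kmeans(X, k, 'MaxIter', capA);
rng(0)
[idxB, ~, sumdB] = kmeans(X, k, 'MaxIter', capB);

nChanged = nnz(idxA ~= idxB);                           % observations assigned differently
relDiff  = abs(sum(sumdA) - sum(sumdB)) / sum(sumdB);   % relative change in total distance
fprintf('%d assignments differ; relative change in total sumd = %g\n', nChanged, relDiff);
```

If nChanged is zero for caps well below 'MaxIter', the assignments have stopped moving even though the warning is still issued.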

Answers (1)

Image Analyst on 2 Aug 2018
What makes you think there are 4 to 20 clusters? Any basis to justify that belief?
If you have some observations that you believe belong to different clusters, use them as training points and try k-nearest neighbors (KNN). I believe, from the nature of KNN, it must converge. Or try a random forest, which is kind of like an ad hoc big if-then-else statement.
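As a rough illustration of the KNN suggestion: KNN is a supervised classifier, so it needs labeled training points and has no iterative optimisation that could fail to converge. The sketch below uses fitcknn and predict from the Statistics and Machine Learning Toolbox; the training data, labels, and the choice of 5 neighbors are all hypothetical and only there to make the example run.

```matlab
rng(0)                                   % for repeatability
% Hypothetical labeled training points: two groups in 27 dimensions.
Xtrain = [randn(100, 27); randn(100, 27) + 5];
ytrain = [ones(100, 1); 2 * ones(100, 1)];

% Fit a k-nearest-neighbor classifier on the labeled points.
mdl = fitcknn(Xtrain, ytrain, 'NumNeighbors', 5);

% Assign each unlabeled observation the majority class of its nearest neighbors.
Xnew   = [randn(10, 27); randn(10, 27) + 5];
labels = predict(mdl, Xnew);
```

With real data, Xtrain and ytrain would be the observations whose cluster membership is already believed to be known, and Xnew would be the remaining rows of X.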
  5 Comments
Image Analyst on 30 Dec 2020
xiaoyu, this is not an answer to Ame's question. Please start a separate, new discussion thread of your own, then edit your post above to remove the code and data and replace them with a link to the new question, so that we don't keep sending emails to Ame about new activity on this thread.
