I'm working with a matrix X (11,000,000 x 27, single precision).
I want to cluster my data into k clusters, where k can be any integer from 4 to 20. I have standardised my data because the variables have very different units, and I'm using the kmeans() function in MATLAB R2018a:
X = bsxfun(@minus, X, mean(X));
X = bsxfun(@rdivide,X,std(X));
rng(0) % for repeatability
[km_ind,~,sumd] = kmeans(X,k,'MaxIter',10000000,'Replicates',5);
I have tried 'MaxIter' values of up to 10 million, but I still don't get convergence. Trying different values of k doesn't change the warning message. Sometimes the warning appears within a few seconds, and I doubt that 10 million iterations were actually run in that time:
Warning: Failed to converge in 10000000 iterations during replicate 1.
In kmeans/loopBody (line 476)
In internal.stats.parallel.smartForReduce (line 136)
In kmeans (line 343)
What am I missing? What am I doing wrong? Any suggestions?
Thanks very much.
EDIT 1: I have uploaded the first 60,000 observations of my data (already standardised). I have the same problem when clustering this subset: it does not converge even after 10 million iterations.
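In case it helps anyone reproduce this, here is a minimal sketch of how the per-iteration progress could be inspected on a subset of that size. The matrix Xsub below is a random, hypothetical stand-in for the uploaded data (not the data itself), and k = 4 is just one of the values I tried. The 'Display','iter' option is the documented kmeans option for printing progress at each iteration.

```matlab
% Hypothetical stand-in for the uploaded 60,000 x 27 standardised subset.
rng(0)                                  % for repeatability
Xsub = randn(60000, 27, 'single');      % random placeholder, NOT the real data
k = 4;                                  % one of the k values tried (4 to 20)

% 'Display','iter' prints one line per iteration, so the number of
% iterations actually performed can be read off the command window.
[km_ind, ~, sumd] = kmeans(Xsub, k, ...
    'MaxIter', 100, ...
    'Replicates', 1, ...
    'Display', 'iter');
```

If the printed iteration count stops far below 'MaxIter' while the warning still appears, that would at least confirm that the iteration limit itself is not what ends the run.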
EDIT 2: New information. I've compared the clustering results using:
- 10^9 iterations (a thousand million iterations!)
- 10^8 iterations
- 10^7 iterations
- 10^5 iterations
- 10^4 iterations
- 10^3 iterations
- 10^2 iterations
- 10 iterations
and several values between 10 and 50 iterations. Although I always receive the non-convergence warning, the result actually stops changing somewhere between the 15th and 20th iteration. What could make MATLAB issue the non-convergence warning even when the results appear to have converged?
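The comparison across 'MaxIter' values can be sketched roughly as follows. This is only an illustration of the procedure, not my exact script: Xsub is again a random, hypothetical placeholder, and the list of iteration limits is shortened. Resetting the seed before each call gives every run the same initial centroids, so the label vectors can be compared directly.

```matlab
% Hypothetical placeholder data; replace with the standardised subset.
Xsub  = randn(2000, 27, 'single');
k     = 4;
iters = [10 20 50 100];                 % shortened list of 'MaxIter' values

labels = zeros(size(Xsub, 1), numel(iters));
for i = 1:numel(iters)
    rng(0)                              % same initialisation for every run
    labels(:, i) = kmeans(Xsub, k, 'MaxIter', iters(i));
end

% Number of points whose cluster differs from the run with the largest
% 'MaxIter'; a value of 0 means the assignment no longer changes.
nChanged = sum(labels(:, 1:end-1) ~= labels(:, end), 1)
```

On my data this count reaches zero somewhere between 15 and 20 iterations, which is why the warning is puzzling.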