Hello all, I really need your help and advice. I am trying to do clustering on a big dataset with mixed data, from which I selected only some of the variables. When I run the k-means algorithm, an error is shown in my Command Window. Could you advise me where the bug is?
Thank you very much.

 Accepted Answer

the cyclist on 3 Dec 2015

What data type are you entering as the first argument to the kmeans command? According to the documentation, it has to be an n-by-p data matrix, but it looks like you have input a dataset instead.

8 Comments

Yes, it is a dataset. How should I convert this dataset into a matrix?
It's not possible to give a general answer on how to convert a dataset array to a matrix, because dataset arrays can hold a variety of other data types. I think you need to read the documentation for dataset arrays to see how to extract the contents and figure out exactly which variable you want to do the kmeans analysis on.
Also, it seems you are using the Import Data tool. There may have been a different way to import your data, so that you get a matrix rather than a dataset from the import.
So, should I use dataset2table to convert the heterogeneous data to a matrix (table)? Or what is a simple way to import data into a matrix when I have mixed data? I tried using [num txt raw]=xlsread(*.xlsx), but then all my data ends up in the raw variable.
A table is a specific type of MATLAB data object, and is different from a matrix (which is sort of the "default" numeric object you get just by typing things like X = [1 2; 3 4]).
The input to the kmeans function must be a matrix, and a matrix cannot hold mixed data types. So, it won't do you any good to convert from a dataset to a table.
However you import, you need to end up with a matrix. I'm not sure I can advise on the best way to do that.
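To illustrate the point about ending up with a matrix, here is a minimal sketch. It assumes the data is already in a table (for a dataset array, dataset2table would get you there first). The variable names and values are made up, since the actual file was never posted, and simply dropping the non-numeric columns is only one possible choice.

```matlab
% Hypothetical mixed-type data standing in for the imported spreadsheet.
T = table([1.0; 1.2; 8.9; 9.1], [10; 11; 52; 50], {'a'; 'b'; 'a'; 'b'}, ...
    'VariableNames', {'Height', 'Weight', 'Label'});

% Flag the numeric variables; kmeans needs an n-by-p numeric matrix.
isNum = varfun(@isnumeric, T, 'OutputFormat', 'uniform');

% Extract just those columns into an ordinary double matrix (4-by-2 here).
X = table2array(T(:, isNum));

rng(1);                 % fix the random seed so the run is repeatable
idx = kmeans(X, 2);     % idx(i) is the cluster assigned to row i
```

The text column is left out entirely here, so whether that is acceptable depends on what the non-numeric variables mean in your data.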
But how can I load data into MATLAB as a matrix? And what should I do with nominal data? Is there any way to use nominal data in clustering?
I hope you are not offended by this statement, but it seems that you are new to MATLAB and new to clustering techniques, so it is difficult to try to explain both to you at the same time.
Regarding clustering:
Strictly speaking, k-means will not work on categorical data (to my knowledge), because it relies on a numerical distance function. There are related techniques like k-medians and k-modes, but I don't know if MATLAB will help you with that.
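One common workaround, which is not the k-modes approach mentioned above, is to recode each nominal variable as 0/1 indicator ("dummy") columns so that everything becomes numeric. The sketch below assumes the Statistics Toolbox functions grp2idx, dummyvar, and zscore are available. The data is invented, and Euclidean distance on indicator columns is a fairly crude measure, so treat this as a starting point rather than a recommendation.

```matlab
% Hypothetical data: one numeric variable and one nominal variable.
age   = [21; 23; 60; 62];
color = {'red'; 'red'; 'blue'; 'blue'};

% Map each category to an integer index, then to one 0/1 column per category.
D = dummyvar(grp2idx(color));   % 4-by-2 indicator matrix

% Standardize the numeric column so its scale does not dominate the distance.
X = [zscore(age), D];           % 4-by-3 numeric matrix

rng(1);                 % fix the random seed so the run is repeatable
idx = kmeans(X, 2);     % idx(i) is the cluster assigned to row i
```

How much weight the indicator columns should get relative to the numeric ones is a modelling decision, and the equal weighting used here is just an assumption.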
Regarding getting a matrix from your Excel data:
I recommend that you open a new question about that. Post a small sample of your Excel file, and ask how to get the data into a matrix. Without seeing the data, it is almost impossible to suggest a specific technique.
Yes, you are right, I am a newbie. Thank you for your time, help, and advice. I really appreciate it. I will try to follow your recommendation.

More Answers (1)

Ravi Injeti on 14 Dec 2019

Why don't you try to use
find(zeros(x),2))) instead of any(isNan), because it is a Python function?
