Sort Group Data Trends

T on 3 Apr 2013
Suppose you had a vector
A =
1.2
1.3
1.4
1.5
2.4
2.5
2.6
2.7
2.8
2.9
4.1
4.2
4.3
4.4
4.6
4.7
Is it possible to sort this data into groups automatically, without knowing in advance whether the vector has groups at 1.X, 2.X or 4.X? Is it possible to detect a trend? The result I am after is:
Group#1:
1.2
1.3
1.4
1.5
Group#2:
2.4
2.5
2.6
2.7
2.8
2.9
Group#3:
4.1
4.2
4.3
4.4
4.6
4.7
  3 Comments
T on 3 Apr 2013
Edited: T on 3 Apr 2013
Right, but suppose you don't know what the other groups could be and you just have a bunch of data, say:
7885.93420752164
7929.78836160714
7936.53852291667
7945.56820951705
7954.26671073718
7961.92029910714
7968.44986160714
7983.83251
7988.44219642857
8014.91854589844
11128.7494221591
11174.72721875
11177.7595013587
11181.1245013587
11184.4331732955
11187.3921853147
11190.4290520833
11241.0803513889
11250.4949040616
11284.1499960937
15498.1340036932
15525.80777125
15533.8052548077
15542.8122678571
15551.8893194444
15559.6083854167
15565.6444630682
15587.30253125
15595.4106951531
15623.6680576002
There are three groups here: 7k, 11k and 15k. In the simple example you showed, you assumed that we already know what the groups would look like.
Sven on 3 Apr 2013
So how do you know that there are three groups?
If your criterion is "my human brain looked at the numbers and, using all my years of experience with numbers and my knowledge of the current problem, I chose 3", then things will be difficult.
If instead you can describe a thought process such as:
  1. I took the smallest number (7885) and saw it had 4 digits
  2. I divided all numbers by 10^(4-1)
  3. I rounded all the numbers down, and grouped by the result.
... then we might be able to do something.


Accepted Answer

Sven on 3 Apr 2013
Edited: Sven on 3 Apr 2013
Anthony, here's a solution that implements the steps from my comment above. Note that by this rule there are in fact 4 groups: 7k, 8k, 11k, and 15k.
nums = [7885.93420752164
7929.78836160714
7936.53852291667
7945.56820951705
7954.26671073718
7961.92029910714
7968.44986160714
7983.83251
7988.44219642857
8014.91854589844
11128.7494221591
11174.72721875
11177.7595013587
11181.1245013587
11184.4331732955
11187.3921853147
11190.4290520833
11241.0803513889
11250.4949040616
11284.1499960937
15498.1340036932
15525.80777125
15533.8052548077
15542.8122678571
15551.8893194444
15559.6083854167
15565.6444630682
15587.30253125
15595.4106951531
15623.6680576002]
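% Number of digits in the integer part of the smallest value (4 for 7885.93...)
minDigits = length(num2str(round(min(nums))));
% Rescale so the leading digit of the smallest value sits in the ones place
rescaledNums = nums / 10^(minDigits-1);
% Group by the integer part; no trailing semicolon, so the outputs are echoed
[grpPrefixes, ~, groups] = unique(floor(rescaledNums))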
grpPrefixes =
7
8
11
15
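The third output, groups, holds one group index per element of nums, so it can be used to count or pull out the members of each group. A minimal sketch of that, repeating the data from this thread so it runs on its own; the variable names counts and secondGroup are only illustrative:

```matlab
% Data from the thread, written as a column vector
nums = [7885.93420752164 7929.78836160714 7936.53852291667 7945.56820951705 7954.26671073718 ...
        7961.92029910714 7968.44986160714 7983.83251 7988.44219642857 8014.91854589844 ...
        11128.7494221591 11174.72721875 11177.7595013587 11181.1245013587 11184.4331732955 ...
        11187.3921853147 11190.4290520833 11241.0803513889 11250.4949040616 11284.1499960937 ...
        15498.1340036932 15525.80777125 15533.8052548077 15542.8122678571 15551.8893194444 ...
        15559.6083854167 15565.6444630682 15587.30253125 15595.4106951531 15623.6680576002]';

% Same grouping rule as above
minDigits = length(num2str(round(min(nums))));
rescaledNums = nums / 10^(minDigits-1);
[grpPrefixes, ~, groups] = unique(floor(rescaledNums));

% Number of elements that fall in each group (one count per prefix)
counts = accumarray(groups, 1);

% Members of one particular group, here the second one (prefix 8)
secondGroup = nums(groups == 2);
```

With this data the 8k group contains a single value (8014.9...), which is why the rule reports 4 groups rather than 3.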
  3 Comments
Sven on 4 Apr 2013
Anthony, in your first example you separated numbers that were less than 1 apart. Here you're merging numbers "because their difference is 26". Can you see the inconsistency? I agree that a human mind can see patterns, but can you see that if you want a computer to find the same pattern as your mind you need to describe how you are choosing your separation?
If you can describe clearly why you chose 3 groups for your first example and then (using the exact same reasoning!) have it also choose 3 groups for your second example, then we can help you cluster your data. For example, the logic I gave in my first comment clusters your first example into 3 groups and your second example into 4 groups, but at least it is completely unambiguous and can therefore be coded.
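As a quick check of that claim, here is the same rule applied to the vector A from the original question. This is only a sketch; prefixOf and grouped are hypothetical names introduced for the illustration:

```matlab
% First example from the question, as a column vector
A = [1.2 1.3 1.4 1.5 2.4 2.5 2.6 2.7 2.8 2.9 4.1 4.2 4.3 4.4 4.6 4.7]';

% Hypothetical helper: scale by the digit count of the smallest value, then floor
prefixOf = @(v) floor(v / 10^(length(num2str(round(min(v)))) - 1));

% Group by the resulting prefix
[grpPrefixes, ~, groups] = unique(prefixOf(A));

% Collect the members of each group into a cell array
grouped = accumarray(groups, A, [], @(x) {x});
```

Here the smallest value has one digit before the decimal point, so nothing is rescaled and the prefixes are simply 1, 2 and 4.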
If you know in advance how many clusters there are, you can use kmeans() from the Statistics Toolbox, which may (or may not) match how you would expect the clustering to be done:
groups = kmeans(nums,3)
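If the toolbox is not available, a rough stand-in for one-dimensional data with a known number of groups is to cut the sorted values at the k-1 widest gaps. This is a different technique from kmeans and is not guaranteed to give the same labels. The sketch below uses a shortened subset of the values above, rounded to one decimal, and all variable names are illustrative:

```matlab
% Shortened, rounded subset of the thread's data (an assumption for brevity)
nums = [7885.9 7929.8 7988.4 8014.9 11128.7 11190.4 11284.1 15498.1 15565.6 15623.7]';
k = 3;                                   % number of groups, assumed known

sortedNums = sort(nums);
gaps = diff(sortedNums);                 % distance between neighbouring values

% Positions of the k-1 widest gaps, in ascending order
[~, order] = sort(gaps, 'descend');
cutIdx = sort(order(1:k-1));

% A new group starts at the first element and right after every cut
isNewGroup = [true; false(numel(gaps), 1)];
isNewGroup(cutIdx + 1) = true;

% Running count of group starts gives a group index for each sorted value
groups = cumsum(isNewGroup);
```

Note that groups is aligned with sortedNums, not with the original order of nums, and that this only works when k really is known beforehand.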
Does that help you out?
T on 4 Apr 2013
I'll look into it. Thanks!
