Array Row Similarity/Comparison
Tyler Smith
on 16 Nov 2016
Commented: Tyler Smith
on 17 Nov 2016
I want to compare rows of two arrays to see which rows are most similar to one another, sort of like clustering. To be clear, I don't want to compare the differences between individual numbers, but rather each row as a whole. Another thing I would like to be able to do is see whether particular numbers in the SLP variable occur more often when a given number shows up at the same index in the 500z variable. Both variables have 20 columns and 49 rows, but in the following example variables I only included 3 rows and 5 columns.
SLP = [1,3,4,2,3;
       4,7,6,5,6;
       1,4,3,3,2];
% z500 is the "500z" variable from the text; MATLAB names cannot start with a digit
z500 = [9,6,7,6,6;
        7,5,7,6,8;
        9,7,6,6,6];
An example of the output I would like:
1.) A measure of row similarity (perhaps a percentage of similarity or even a cluster number). The most similar rows in SLP are rows 1 and 3, so an example output could be a 3x3 matrix (SLP rows 1-3 going down and 500z rows 1-3 going across) with the percentage of similarity between each pair of rows. Or it could be in the form of clusters, e.g. rows 1 and 3 belonging to cluster 1 and row 2 belonging to cluster 2.
2.) Which numbers occur most frequently at the same index in the two variables. Looking at the sample variables, I would get that SLP 1 tends to occur with 500z 9, SLP 3 tends to occur with 500z 6, SLP 4 tends to occur with 500z 7, and so on. This could be output as simply as an array where column 1 is the SLP value and column 2 is the 500z value of each pair. It would be great to have a column 3 as well saying how often the pair occurred.
I've been stuck on this for a while so any help or suggestions of how to best approach this problem would be awesome! I am also fairly new to Matlab, so my wording may not be the best.
Accepted Answer
Walter Roberson
on 17 Nov 2016
One technique for measuring similarity is the sum of squared differences between the rows.
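As a rough illustration of that idea, here is a minimal sketch in base MATLAB (no toolbox needed) that fills a matrix with the sum of squared differences for every pair of rows in the sample SLP data from the question. The name ssd is just a placeholder chosen for this example.

```matlab
% Sample SLP data from the question.
SLP = [1,3,4,2,3; 4,7,6,5,6; 1,4,3,3,2];

n = size(SLP, 1);
ssd = zeros(n, n);                 % ssd(i,j): row i compared with row j
for i = 1:n
    for j = 1:n
        d = SLP(i,:) - SLP(j,:);   % element-wise differences of the two rows
        ssd(i,j) = sum(d.^2);      % sum of squared differences
    end
end
disp(ssd)
```

Smaller entries mean more similar rows. For the sample data, ssd(1,3) is 4 while ssd(1,2) and ssd(2,3) are both 47, which agrees with rows 1 and 3 being the closest pair.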
It so happens that the square root of the sum of squared differences is the Euclidean distance. Therefore you can get a similarity measure by using pdist() on the rows; smaller distances mean more similar rows.
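A minimal sketch of how this might look on the sample data, assuming the Statistics and Machine Learning Toolbox is installed (pdist, pdist2, squareform, linkage and cluster all come from it). The variable names, the rename of 500z to z500, the use of pdist2 for the SLP-versus-500z comparison, and the choice of average linkage with two clusters are assumptions made for this illustration, not something stated in the answer above.

```matlab
% Sample data from the question; z500 stands in for "500z",
% since MATLAB variable names cannot begin with a digit.
SLP  = [1,3,4,2,3; 4,7,6,5,6; 1,4,3,3,2];
z500 = [9,6,7,6,6; 7,5,7,6,8; 9,7,6,6,6];

% Pairwise Euclidean distances between the rows of SLP.
% pdist returns a condensed vector; squareform expands it
% into a symmetric matrix with zeros on the diagonal.
D_slp = squareform(pdist(SLP));

% Distances between every SLP row (going down) and every
% z500 row (going across).
D_cross = pdist2(SLP, z500);

% Optional: group the SLP rows into two clusters using
% hierarchical clustering on the same Euclidean distances.
Z = linkage(SLP, 'average');
clusterId = cluster(Z, 'maxclust', 2);

disp(D_slp)
disp(D_cross)
disp(clusterId)
```

With this data, D_slp(1,3) is 2, the smallest off-diagonal entry, and rows 1 and 3 end up with the same cluster label while row 2 gets the other one. If a "percentage of similarity" is wanted, the distances would still need to be rescaled somehow, and that choice is up to you.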