Grouping multi-variable data points

Gabriel Stanley on 10 Aug 2022
Answered: Divyam on 18 Sep 2024
I have three different data sources, of increasing generality. Group1 is a set of 2-value data points. Group2 is a good estimate of how the data in Group1 should be distributed (e.g. Group2 tells me there should be N data points with (x=7, y=2)). Group3 is a collection of vague ranges into which I need to group the entries in Group2 and Group1 (e.g. Group3(1) = [5,8 ; 0,4]; Group3(2) = [7,9 ; 0,4]). I am trying to do two separate things with these data sets, and whether it's from lack of sleep or coffee, I cannot figure out which MATLAB functions I should be looking at to do the heavy lifting. I'm thinking one or more of histcounts2, discretize, and/or maybe findgroups.
The tasks I'm trying to complete are:
1) Check that all the elements in Group1 align with the expected groups in Group2, and get some metadata on any outliers (e.g. to which Group2 element is any given unmatched Group1 element closest?)
2) ?Cluster? the elements in Group2 using the elements in Group3. E.g. if Group2(1) = [N,x=7,y=2], then it falls within both Group3(1) and Group3(2) as described above.
If any of y'all could help direct me to the appropriate functions I should focus on understanding & learning how to use, I would appreciate it.

Answers (1)

Divyam on 18 Sep 2024
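To check whether the data in Group1 aligns with the expected distribution in Group2, you can use the "pdist2" function (Statistics and Machine Learning Toolbox) to calculate the distance between every Group1 point and every Group2 point, take the minimum along each row to get the closest Group2 element, and then use the "find" function to identify the outliers whose minimum distance exceeds a threshold you choose.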
% Example Group1 and Group2 data
Group1 = [7.1, 2.2; 6.9, 2.1; 7.5, 2.5];
Group2 = [7, 2; 8, 3];
% Calculate pairwise distances
distances = pdist2(Group1, Group2);
% Find the closest Group2 element for each Group1 point
[minDistances, closestIndices] = min(distances, [], 2);
% Determine outliers based on a threshold
outlierThreshold = 0.5;
outliers = find(minDistances > outlierThreshold);
% Display results
fprintf('Closest Group2 elements for each Group1 point: [%s]\n', join(string(closestIndices), ','));
Closest Group2 elements for each Group1 point: [1,1,1]
fprintf('Outliers: [%s]\n', join(string(outliers), ','));
Outliers: [3]
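Since Group2 is also meant to say how many points to expect at each location, the same distances can be used to compare observed and expected counts. The following is only a sketch: "expectedCounts" is a hypothetical vector holding the N for each Group2 row (the values are made up for illustration), and "outlierTable" is just one possible way to collect the outlier metadata.

```matlab
% Example data (same as above)
Group1 = [7.1, 2.2; 6.9, 2.1; 7.5, 2.5];
Group2 = [7, 2; 8, 3];
% Assumption: expectedCounts(k) is the N that Group2 predicts for row k
expectedCounts = [2; 1];

% Nearest Group2 row for every Group1 point (ties go to the first row)
[minDistances, closestIndices] = min(pdist2(Group1, Group2), [], 2);

% Points within the threshold of their nearest Group2 row count as matched
outlierThreshold = 0.5;
isMatched = minDistances <= outlierThreshold;

% Count matched points per Group2 row and compare with the expectation
observedCounts = accumarray(closestIndices(isMatched), 1, [size(Group2, 1), 1]);
countDifference = observedCounts - expectedCounts;

% Per-outlier metadata: Group1 row, nearest Group2 row, and distance
outlierRows = find(~isMatched);
outlierTable = table(outlierRows, closestIndices(outlierRows), minDistances(outlierRows), ...
    'VariableNames', {'Group1Row', 'NearestGroup2Row', 'Distance'});
disp(outlierTable)
```

Note that the third Group1 point is exactly equidistant from both Group2 rows, so "min" reports the first one. If ties like this matter for your data, you may want to inspect the full distance matrix rather than only the minimum.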
To determine which elements of Group2 fall within the ranges specified in Group3, you can use either the "discretize" function or logical indexing.
% Example Group3 ranges
Group3 = {[5, 8; 0, 4], [7, 9; 0, 4]};
% Initialize clusters
clusters = cell(size(Group3));
% Check membership for each Group2 element using logical indexing
for i = 1:length(Group3)
    limits = Group3{i}; % row 1: [xMin, xMax], row 2: [yMin, yMax]
    inCluster = Group2(:, 1) >= limits(1, 1) & Group2(:, 1) <= limits(1, 2) & ...
                Group2(:, 2) >= limits(2, 1) & Group2(:, 2) <= limits(2, 2);
    clusters{i} = find(inCluster);
end
% Display cluster assignments
for i = 1:length(clusters)
    fprintf('Group3(%d) contains Group2 elements: [%s]\n', i, join(string(clusters{i}), ','));
end
Group3(1) contains Group2 elements: [1,2]
Group3(2) contains Group2 elements: [1,2]
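The loop above uses logical indexing. A "discretize" version could look roughly like the sketch below. Because the Group3 ranges overlap, a single call with one shared set of bin edges cannot represent them, so this sketch calls "discretize" once per range and per coordinate. The variable names "membership" and "limits" are my own choices for illustration.

```matlab
% Example data (same as above)
Group2 = [7, 2; 8, 3];
Group3 = {[5, 8; 0, 4], [7, 9; 0, 4]};

% membership(j, i) is true when Group2 row j lies inside Group3{i}
membership = false(size(Group2, 1), numel(Group3));
for i = 1:numel(Group3)
    limits = Group3{i}; % row 1: x edges, row 2: y edges
    % With only two edges there is a single bin that includes both ends,
    % so discretize returns 1 inside the range and NaN outside it
    inX = ~isnan(discretize(Group2(:, 1), limits(1, :)));
    inY = ~isnan(discretize(Group2(:, 2), limits(2, :)));
    membership(:, i) = inX & inY;
end
disp(membership)
```

A logical matrix like this keeps the overlap visible: a Group2 element that falls in several Group3 ranges simply has more than one true entry in its row, and "find(membership(:, i))" recovers the per-range index lists.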
For more information regarding the "pdist2" and "find" functions, refer to their pages in the MATLAB documentation.

Release

R2019b
