Find knn (nearest neighbour) point give a data set

5 views (last 30 days)
Hello, I have this simple data set:
Class1 = [1 0; 0.5 1; 0.5 1.5];
How do I go about finding the "nearest neighbor" point of that data set using the euclidean distance?
That is, generate a point based on the knn algorithm that will be closest to the three points.
The point highlighted in orange should be where the calculated "nearest neighbor" point should be?
  5 Comments
Walter Roberson
Walter Roberson on 17 Oct 2016
Not quite. For option 2 the answer would be the centroid, but the centroid would not necessarily become one of the nearest k neighbors of each of the three points.
It is necessary to first find the k nearest neighbors of the three points, and find the euclidean distance from the last of those k points in each case. That gives you a radius around each point; if the generated new point is further away than the radius then it would not become one of the k nearest neighbors relative to that point. The generated point must now be some point that is within the area of intersection of the three radii relative to their respective points. This will not be possible in the general case; it would not even be generally possible if there were only 2 points.
Consider that for any given point, up to (all but 2) could be close by to them, so your k could end up having to be high in order for the furthest away from the first point to be a radius large enough to include the ranges from the second and third points.
monkey_matlab
monkey_matlab on 17 Oct 2016
Edited: Walter Roberson on 17 Oct 2016
Hello Walter, how do I go about implementing this in Matlab? I tried
IDX2 = knnsearch(x_c2,y_c2)
but this just give an index. Thanks for your time and help.

Sign in to comment.

Accepted Answer

Walter Roberson
Walter Roberson on 17 Oct 2016
In the below, I will assume that the coordinates of all points in your graph (including the three special points) are stored in x and y -- the J'th point being located at [x(J), y(J)]. Notice that these are X Y coordinates, not row and column coordinates -- in MATLAB, X corresponds to columns and Y corresponds to rows
The below assumes that the three distinguished points have their coordinates in special_x and special_y . The code does not assume that the coordinates are exact copies of what is in x and y, because of the potential for round-off error, but the code will "snap" each [special_x(J), special_y(J)] to its closest match in x, y
k = 5; %the number of closest neighbours eligible to be considered
XY = [x(:), y(:)];
special_XY = [special_x(:), special_y(:)];
%We are finding every point relative to every other point. That includes finding every
%point relative to itself. Which will be distance 0, which will be "nearest". We want
%to exclude the point to itself, so we need to find k+1 nearest neighbors and then
%ignore the first. When you bug-proof this code be sure to enhance it to take into
%account the possibility of duplicate points
[IDX, D] = knnsearch(XY, XY, 'K', k+1);
IDX = IDX(:, 2:end); %discard the point to itself
D = D(:, 2:end); %discard the distance of the point to itself
furthest_k_distance = D(:,end);
%now furthest_k_distance(J) is the distance from [X(J), Y(J)] to its K'th closest
%neighbour. If we insert a new point then it cannot be further away than this or else
%it will not be one of the K-closest neighbours.
%we do not count on the coordinates in special_x and special_y to be exact matches
%so we need to snap them to indices in x and y
special_IDX = knnsearch(XY, special_XY, 'K', 1);
%now we want coordinates of the special points corrected to be exactly in x, y
snapped_special_XY = XY(special_IDX, :);
k_closest_to_special = IDX(special_IDX,:); %indices into XY
special_furthest_distance = furthest_k_distance(special_IDX);
%now if the proposed point to be generated is not within the distance
%special_furthest_distance(J) of the corrected special points
%snapped_special_XY(J,:), then the proposed point would not be one of the K closest
%neighbours of [special_x(J), special_y(J)] and that would be disallowed
After this... you can be somewhat stuck. Consider for example,
x = [ 7 3 6 1 9 5 4 10 5 3 1 6 2 1 6];
y = [ 3 4 8 9 4 10 10 3 8 7 10 7 8 8 4];
special_x = x([4 7 8]);
special_y = y([4 7 8]);
scatter(x(:), y(:), 'b');
hold on
scatter(special_x(:), special_y(:), 70, 'r', 'filled', 'd')
The above shows the regular points in the graph as blue circles, and the three special points as red diamonds.
Then run the code I showed, but with k = 3. You will get
special_furthest_distance = [1.4142135623731; 2.82842712474619; 4.12310562561766]
(which is [sqrt(2), 2*sqrt(2), sqrt(17)])
Now,
viscircles(snapped_special_XY, special_furthest_distance, 'color', 'g')
hold off
this will show you, for each of the special points identified in red, a green circle around them that includes the 3 nearest neighbours. Two of the points will be obvious in each case; the third one is right on the circle.
Now, if the proposed point is not on or inside all of the circles simultaneously, then the proposed point is too far away to be one of the 3 nearest neighbours of the circle it is outside of.
But notice that in this example the large circle in the lower right does not intersect the two smaller circles in the upper left. Therefore no such point is possible -- any point that was close enough to the two special points in the upper left to be one of their three nearest neighbours, cannot possibly be one of the three nearest neighbours of the special point in the bottom right.
====
I was able to show that the centroid will not necessarily be within range for all points. I was also able to find code to get the solution if one is available:
%minimize sum of squares of euclidean distances to special points
G = @(Q)sum(sum((special_XY-Q).^2,2));
%under the constraint that the distances cannot be exceeded
nonlcon = @(Q)deal(sqrt(sum((special_XY-Q).^2,2))-special_furthest_distance,[]);
%use the centroid as the initial guess.
XY0 = mean(snapped_special_XY);
A = []; b = []; Aeq = []; beq = []; lb = []; ub = [];
try
[bestXY, ~, exitflag] = fmincon(G, XY0, A, b, Aeq, beq, lb, ub, nonlcon);
catch
exitflag = -2;
end
if exitflag < 0
fprintf('Search failed, no good trying again\n');
best_x = []; best_y = [];
elseif exitflag ~= 1
fprintf('Search failed, but perhaps a different starting point or more iterations would work\n');
best_x = []; best_y = [];
else
best_x = bestXY(1);
best_y = bestXY(2);
end

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!