Find knn (nearest neighbour) point give a data set
5 views (last 30 days)
monkey_matlab on 16 Oct 2016
Hello, I have this simple data set:
Class1 = [1 0; 0.5 1; 0.5 1.5];
How do I go about finding the "nearest neighbor" point of that data set using the euclidean distance?
That is, generate a point based on the knn algorithm that will be closest to the three points.
The point highlighted in orange should be where the calculated "nearest neighbor" point should be?
Walter Roberson on 17 Oct 2016
In the below, I will assume that the coordinates of all points in your graph (including the three special points) are stored in x and y -- the J'th point being located at [x(J), y(J)]. Notice that these are X Y coordinates, not row and column coordinates -- in MATLAB, X corresponds to columns and Y corresponds to rows
The below assumes that the three distinguished points have their coordinates in special_x and special_y . The code does not assume that the coordinates are exact copies of what is in x and y, because of the potential for round-off error, but the code will "snap" each [special_x(J), special_y(J)] to its closest match in x, y
k = 5; %the number of closest neighbours eligible to be considered
XY = [x(:), y(:)];
special_XY = [special_x(:), special_y(:)];
%We are finding every point relative to every other point. That includes finding every
%point relative to itself. Which will be distance 0, which will be "nearest". We want
%to exclude the point to itself, so we need to find k+1 nearest neighbors and then
%ignore the first. When you bug-proof this code be sure to enhance it to take into
%account the possibility of duplicate points
[IDX, D] = knnsearch(XY, XY, 'K', k+1);
IDX = IDX(:, 2:end); %discard the point to itself
D = D(:, 2:end); %discard the distance of the point to itself
furthest_k_distance = D(:,end);
%now furthest_k_distance(J) is the distance from [X(J), Y(J)] to its K'th closest
%neighbour. If we insert a new point then it cannot be further away than this or else
%it will not be one of the K-closest neighbours.
%we do not count on the coordinates in special_x and special_y to be exact matches
%so we need to snap them to indices in x and y
special_IDX = knnsearch(XY, special_XY, 'K', 1);
%now we want coordinates of the special points corrected to be exactly in x, y
snapped_special_XY = XY(special_IDX, :);
k_closest_to_special = IDX(special_IDX,:); %indices into XY
special_furthest_distance = furthest_k_distance(special_IDX);
%now if the proposed point to be generated is not within the distance
%special_furthest_distance(J) of the corrected special points
%snapped_special_XY(J,:), then the proposed point would not be one of the K closest
%neighbours of [special_x(J), special_y(J)] and that would be disallowed
After this... you can be somewhat stuck. Consider for example,
x = [ 7 3 6 1 9 5 4 10 5 3 1 6 2 1 6];
y = [ 3 4 8 9 4 10 10 3 8 7 10 7 8 8 4];
special_x = x([4 7 8]);
special_y = y([4 7 8]);
scatter(x(:), y(:), 'b');
scatter(special_x(:), special_y(:), 70, 'r', 'filled', 'd')
The above shows the regular points in the graph as blue circles, and the three special points as red diamonds.
Then run the code I showed, but with k = 3. You will get
special_furthest_distance = [1.4142135623731; 2.82842712474619; 4.12310562561766]
(which is [sqrt(2), 2*sqrt(2), sqrt(17)])
viscircles(snapped_special_XY, special_furthest_distance, 'color', 'g')
this will show you, for each of the special points identified in red, a green circle around them that includes the 3 nearest neighbours. Two of the points will be obvious in each case; the third one is right on the circle.
Now, if the proposed point is not on or inside all of the circles simultaneously, then the proposed point is too far away to be one of the 3 nearest neighbours of the circle it is outside of.
But notice that in this example the large circle in the lower right does not intersect the two smaller circles in the upper left. Therefore no such point is possible -- any point that was close enough to the two special points in the upper left to be one of their three nearest neighbours, cannot possibly be one of the three nearest neighbours of the special point in the bottom right.
I was able to show that the centroid will not necessarily be within range for all points. I was also able to find code to get the solution if one is available:
%minimize sum of squares of euclidean distances to special points
G = @(Q)sum(sum((special_XY-Q).^2,2));
%under the constraint that the distances cannot be exceeded
nonlcon = @(Q)deal(sqrt(sum((special_XY-Q).^2,2))-special_furthest_distance,);
%use the centroid as the initial guess.
XY0 = mean(snapped_special_XY);
A = ; b = ; Aeq = ; beq = ; lb = ; ub = ;
[bestXY, ~, exitflag] = fmincon(G, XY0, A, b, Aeq, beq, lb, ub, nonlcon);
exitflag = -2;
if exitflag < 0
fprintf('Search failed, no good trying again\n');
best_x = ; best_y = ;
elseif exitflag ~= 1
fprintf('Search failed, but perhaps a different starting point or more iterations would work\n');
best_x = ; best_y = ;
best_x = bestXY(1);
best_y = bestXY(2);