Find knn (nearest neighbour) point give a data set

Question

monkey_matlab on 16 Oct 2016

0
Link

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/307578-find-knn-nearest-neighbour-point-give-a-data-set

Answered: Walter Roberson on 17 Oct 2016

Accepted Answer: Walter Roberson

Open in MATLAB Online

Hello, I have this simple data set:

Class1 = [1 0; 0.5 1; 0.5 1.5];

How do I go about finding the "nearest neighbor" point of that data set using the euclidean distance?

That is, generate a point based on the knn algorithm that will be closest to the three points.

The point highlighted in orange should be where the calculated "nearest neighbor" point should be?

5 Comments
Show 3 older commentsHide 3 older comments

Walter Roberson on 16 Oct 2016

Is the question:

Generate a new point (typically not already present in the dataset) such that the sum of the Euclidean distances to the three given points will be smallest?
Generate a new point (typically not already present in the dataset) such that the sum of the squares of the Euclidean distances to the three points will be smallest?
Generate a new point (typically not already present in the dataset) such that the new point would be one of the k closest neighbors of each of the three points and such that the sum of the Euclidean distances to the three given points will be smallest?
Generate a new point (typically not already present in the dataset) such that the new point would be one of the k closest neighbors of each of the three points and such that the sum of the squares of the Euclidean distances to the three points will be smallest?
Find an existing point already in the dataset such that the sum of the Euclidean distances to the three given points will be smallest?
Find an existing point already in the dataset such that the sum of the squares of the Euclidean distances to the three given points will be smallest?
Find an existing point already in the dataset such that point is one of the k closest neighbors for each of the three given points, and such that the sum of the Euclidean distances to the three given points will be smallest?
Find an existing point already in the dataset such that point is one of the k closest neighbors for each of the three given points, and such that the sum of the squares of the Euclidean distances to the three given points will be smallest?

Walter Roberson on 17 Oct 2016

Not quite. For option 2 the answer would be the centroid, but the centroid would not necessarily become one of the nearest k neighbors of each of the three points.

It is necessary to first find the k nearest neighbors of the three points, and find the euclidean distance from the last of those k points in each case. That gives you a radius around each point; if the generated new point is further away than the radius then it would not become one of the k nearest neighbors relative to that point. The generated point must now be some point that is within the area of intersection of the three radii relative to their respective points. This will not be possible in the general case; it would not even be generally possible if there were only 2 points.

Consider that for any given point, up to (all but 2) could be close by to them, so your k could end up having to be high in order for the furthest away from the first point to be a radius large enough to include the ranges from the second and third points.

monkey_matlab on 17 Oct 2016

Edited: Walter Roberson on 17 Oct 2016

Open in MATLAB Online

Hello Walter, how do I go about implementing this in Matlab? I tried

IDX2 = knnsearch(x_c2,y_c2)

but this just give an index. Thanks for your time and help.

Sign in to comment.

Sign in to answer this question.

Answer 1

Walter Roberson on 17 Oct 2016

2
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/307578-find-knn-nearest-neighbour-point-give-a-data-set#answer_239322

Open in MATLAB Online

In the below, I will assume that the coordinates of all points in your graph (including the three special points) are stored in x and y -- the J'th point being located at [x(J), y(J)]. Notice that these are X Y coordinates, not row and column coordinates -- in MATLAB, X corresponds to columns and Y corresponds to rows

The below assumes that the three distinguished points have their coordinates in special_x and special_y . The code does not assume that the coordinates are exact copies of what is in x and y, because of the potential for round-off error, but the code will "snap" each [special_x(J), special_y(J)] to its closest match in x, y

k = 5;      %the number of closest neighbours eligible to be considered
XY = [x(:), y(:)];
special_XY = [special_x(:), special_y(:)];
%We are finding every point relative to every other point. That includes finding every
%point relative to itself. Which will be distance 0, which will be "nearest". We want
%to exclude the point to itself, so we need to find k+1 nearest neighbors and then
%ignore the first. When you bug-proof this code be sure to enhance it to take into
%account the possibility of duplicate points
[IDX, D] = knnsearch(XY, XY, 'K', k+1);  
IDX = IDX(:, 2:end);    %discard the point to itself
D = D(:, 2:end);        %discard the distance of the point to itself
furthest_k_distance = D(:,end);
%now furthest_k_distance(J) is the distance from [X(J), Y(J)] to its K'th closest
%neighbour. If we insert a new point then it cannot be further away than this or else
%it will not be one of the K-closest neighbours.
%we do not count on the coordinates in special_x and special_y to be exact matches
%so we need to snap them to indices in x and y
special_IDX = knnsearch(XY, special_XY, 'K', 1);
%now we want coordinates of the special points corrected to be exactly in x, y
snapped_special_XY = XY(special_IDX, :);
k_closest_to_special = IDX(special_IDX,:);    %indices into XY
special_furthest_distance = furthest_k_distance(special_IDX);
%now if the proposed point to be generated is not within the distance
%special_furthest_distance(J) of the corrected special points
%snapped_special_XY(J,:), then the proposed point would not be one of the K closest
%neighbours of [special_x(J), special_y(J)] and that would be disallowed

After this... you can be somewhat stuck. Consider for example,

x = [ 7  3  6  1  9  5  4 10  5  3  1  6  2  1  6];
y = [ 3  4  8  9  4 10 10  3  8  7 10  7  8  8  4];
special_x = x([4 7 8]);
special_y = y([4 7 8]);
scatter(x(:), y(:), 'b');
hold on
scatter(special_x(:), special_y(:), 70, 'r', 'filled', 'd')

The above shows the regular points in the graph as blue circles, and the three special points as red diamonds.

Then run the code I showed, but with k = 3. You will get

special_furthest_distance = [1.4142135623731; 2.82842712474619; 4.12310562561766]

(which is [sqrt(2), 2*sqrt(2), sqrt(17)])

Now,

viscircles(snapped_special_XY, special_furthest_distance, 'color', 'g')
hold off

this will show you, for each of the special points identified in red, a green circle around them that includes the 3 nearest neighbours. Two of the points will be obvious in each case; the third one is right on the circle.

Now, if the proposed point is not on or inside all of the circles simultaneously, then the proposed point is too far away to be one of the 3 nearest neighbours of the circle it is outside of.

But notice that in this example the large circle in the lower right does not intersect the two smaller circles in the upper left. Therefore no such point is possible -- any point that was close enough to the two special points in the upper left to be one of their three nearest neighbours, cannot possibly be one of the three nearest neighbours of the special point in the bottom right.

====

I was able to show that the centroid will not necessarily be within range for all points. I was also able to find code to get the solution if one is available:

%minimize sum of squares of euclidean distances to special points
G = @(Q)sum(sum((special_XY-Q).^2,2));
%under the constraint that the distances cannot be exceeded
nonlcon = @(Q)deal(sqrt(sum((special_XY-Q).^2,2))-special_furthest_distance,[]);
%use the centroid as the initial guess.
XY0 = mean(snapped_special_XY);
A = []; b = []; Aeq = []; beq = []; lb = []; ub = [];
try
  [bestXY, ~, exitflag] = fmincon(G, XY0, A, b, Aeq, beq, lb, ub, nonlcon);
catch
  exitflag = -2;
end
if exitflag < 0
  fprintf('Search failed, no good trying again\n');
  best_x = []; best_y = [];
elseif exitflag ~= 1
  fprintf('Search failed, but perhaps a different starting point or more iterations would work\n');
  best_x = []; best_y = [];
else
  best_x = bestXY(1);
  best_y = bestXY(2);
end

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Find knn (nearest neighbour) point give a data set

5 Comments
Show 3 older commentsHide 3 older comments

Accepted Answer

0 Comments
Show -2 older commentsHide -2 older comments

More Answers (0)

See Also

Categories

Tags

Products

Community Treasure Hunt

Find knn (nearest neighbour) point give a data set

5 Comments Show 3 older commentsHide 3 older comments

Accepted Answer

0 Comments Show -2 older commentsHide -2 older comments

More Answers (0)

See Also

Categories

Tags

Products

Community Treasure Hunt

5 Comments
Show 3 older commentsHide 3 older comments

0 Comments
Show -2 older commentsHide -2 older comments