how to average the close points ?

hi all, I have a 1200*2 matrix let's call it 'center', which means different points in a photo. while several points are very close, for example, center(10,:)=[120,200]; and center(13,:)=[122,200]; or center(110,:)=[1420,2400]; and center(113,:)=[1420,2402] etc. I want to find get a new matrix only with the average value of close points. which means the points center(10,:) and center(13,:) could be presented as a new element[121,200], so as other points. Is there any efficient way to to this? thanks a lot.

Answers (2)

Walter Roberson
Walter Roberson on 17 Jun 2015
No, you cannot do this because "close" is not transitive.
If point A is close to point B, and point B is close to point C, then point A might not be close to point C. If you happened to be looking at point A then you would average A and B but leave C out because it was too far, but if you were looking at point B then you would average A, B, and C because both A and C are close to B. The result is thus going to depend upon the order of processing, and clearly that is not what you would want.
In order to do this you would need to determine through some other means which coordinates (does not need to be an existing point) that are to be the "nucleus" to gather together close points.
You are implicitly asking to do clustering, with all the difficulties that implies.

2 Comments

Tnx a lot. The attached file temp.txt, is the file I am working on. You can find that several of the points are extremely close to each other. What I want is use one point to show all of these closet points. I am trying to use 'kmeans' to solve it, still working on it now.
The file did not get attached.

Sign in to comment.

If you do not have "lines" of close points (the transitiveness issue that Walter mentioned), a simple and quick way to do this could be (here X is your 1200x2 matrix of point coordinates):
mindist = 5; % desired minimum distance between two points
nX = sum(X.^2,2);
d = bsxfun(@plus,nX,nX')-2*X*X';
[p,~,r] = dmperm(d<mindist^2);
nvoxels = diff(r);
for n=find(nvoxels>1)
idx = p(r(n):r(n+1)-1);
X(idx,:) = repmat(mean(X(idx,:),1),numel(idx),1);
X(idx(2:end),:) = nan;
end
X(any(isnan(X),2),:) = [];
If you do have "lines" of close points, and you do not want them collapsed into a single point, then yes, you would need some sort of clustering algorithm. If nothing more sophisticated is available (e.g. using the Statistics toolbox) you could try something like the following:
mindist = 5; % desired minimum distance between two points
while 1
nX = sum(X.^2,2);
d = bsxfun(@plus,nX,nX')-2*X*X';
[i,j] = find(d<mindist^2);
idx = find(i~=j,1);
if isempty(idx), break; end
X(i(idx),:) = mean(X([i(idx),j(idx)],:),1);
X(j(idx),:) = [];
end

4 Comments

It doesn't take "lines" as such. If you have a cluster of points around a centroid and you have a cutoff radius, then the points towards the boundary of the circle around the centroid would be up to sqrt(2) * radius apart at 90 degrees, and up to 2 * radius apart at 180 degrees. And of course if you are finding the centroid of the points then some points will be on one side of the centroid and some will be on the other side.
You can take a random starting point and add in nearest neighbours until some point would be further than the cutoff from an existing point. Getting the cutoff conditions right might be a bit tricky. But even if you got them right, which point you started with would make a difference.
Sorry if the "lines" comment was not clear, I probably should have used "connected sets" or some other clearer term. What I meant was the following:
Let's say you are going to paint at each point a marker/circle with radius r, and in your data there are only a few points that are "touching" in that plot (two points are touching if they are at distance less than 2*r). If you simply want to group those points that are touching, and these points do not form long "lines" (connected sets of touching points that contain more than two or a few points and extend well beyond r) then you could avoid the whole clustering approach and use a simpler graph approach like in my first code example.
If, on the other hand, your data does not fit this scenario then you are back to clustering methods, and yes, many of those will be dependent on your initial conditions (like k-means, or my second code example).
Take 3 equidistant points of distance R apart, an equilateral triangle. Draw circles of radius R around each of them. The area that is shared by all three is the area where any extra points would not be further than R away from any of the three points. I do not have the formula for the area of overlap at the moment, but it amounts to about 0.224 * Pi * R^2. The feasible area falls quickly.
I believe I understand your point, but I am not claiming that the graph approach is a suitable substitute for clustering, I am simply saying than in some simple cases your distribution of point-to-point distances has some clearly discernable "small-distance" outliers and for those cases only you can use this simpler approach to combine those abnormally-close points. For example, in my experience, I have run into these cases when I use a general-mixture model to fit some data (and I get multiple centroids that are effectively defining the same cluster so I would like to combine those), or when I run a filter-bank to find interesting features in an image (where again I often get small clusters of suprathreshold feature detectors effectively pointing at the same feature), and in many other scenarios, so I can assure you that these cases do exist, no matter how unlikely they would seem if your points followed a normal or poisson distribution :)

Sign in to comment.

Categories

Asked:

on 16 Jun 2015

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!