
Find similar but not equal rows within the same matrix

Hi,
I have an m-by-3 matrix and need to find which rows are very close to one another. For example:
[a b c
d e f
a b c+1e-8
d e f
g h i
g h+1e-5 i
];
ans = [1,3 ; 5,6] % ignoring the rows that have no similar counterpart. It's something like a min_tol to accept and a max_tol to reject.
I've been thinking about using tolerances with find(), unique(), uniquetol(), intersect(?), or performing some calculations to get it, but I have only figured out ways that enlarge my data too much. Any guidance is welcome. Thanks!
  2 Comments
Stephen23 on 17 May 2017
Why are rows 2 and 4 not listed in your example output?
Libros Construccion on 17 May 2017
Because I don't want equal rows, just similar ones. unique() removes one of them, but I still get row 2 in the output when I try it.


Accepted Answer

Libros Construccion on 17 May 2017
Edited: Libros Construccion on 17 May 2017
Hey, thanks! I'll try to follow your suggestions. Maybe I can combine them with this, which is getting me where I want:
A=[1 2 3; 1+1e-5 2 3; 1 2 3;1+1e-8 2 3];
simA=find(ismember(A,uniquetol(A,1e-6,'ByRows',true),'rows')==0);
ans=4

More Answers (1)

Jan on 17 May 2017
Edited: Jan on 17 May 2017
pdist returns the pairwise distances between the rows. You can then filter the values:
D = pdist(Data);
Match = (D ~= 0 & D < 1e-4);
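The logical vector above marks matching pairs in the condensed order used by pdist, not as row numbers. A minimal sketch of one way to recover the row indices, assuming the Statistics Toolbox function squareform is available; the sample matrix and the 1e-4 limit are made up for illustration and are not from the question:

```matlab
% Hypothetical sample data: rows 1/3 and 5/6 are similar, rows 2/4 are equal
Data = [1 2 3; 4 5 6; 1 2 3+1e-8; 4 5 6; 7 8 9; 7 8+1e-5 9];

% Expand the condensed distance vector into a full symmetric matrix
D = squareform(pdist(Data));

% Similar but not equal; keep the upper triangle so each pair appears once
Match = triu(D > 0 & D < 1e-4, 1);

% Row/column subscripts of the matches are the indices of the paired rows
[r, c] = find(Match);
pairs = sortrows([r, c]);
disp(pairs)
```

Each row of pairs holds the indices of two rows of Data that are close but not identical. The full matrix needs memory proportional to the square of the row count, so this is only practical for moderately sized inputs.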
Without the Statistics Toolbox you can also calculate the distance matrix manually. If the input is not huge, a loop might be useful too, because you can remove the unwanted elements directly:
n = 1000;
data = rand(n, 3);
check = true(1, n);
result = zeros(n, 2); % Pre-allocate
iResult = 0;
limit = 1e-4 ^ 2; % Squared limit to avoid SQRT
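for k = 1:n
    if check(k)  % Not included before (is this wanted?!)
        % Squared distances from row k to all later rows (implicit expansion, R2016b+)
        dist = sum((data(k, :) - data(k+1:n, :)).^2, 2);
        match = find(dist > 0 & dist < limit);  % Indices are relative to row k+1
        ...
    end
end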
Sorry, I cannot finish this code, because I found some open questions. What should happen for:
a b c
d e f
a b c+1e-4
a+1e-4 b c
Or if the limit is 1e-4, what about this:
a b c
a b c+1e-4
a b c+2e-4
What is the wanted result?
Why does the output have fewer columns than the input?
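Until those questions are settled, one possible reading can still be sketched without the Statistics Toolbox. The sketch below assumes that every pair of rows is judged on its own (so a chain such as c, c+1e-4, c+2e-4 is not merged into one group), that exact duplicates are skipped, and that the output lists row indices. The sample matrix and the name tol are illustrative assumptions, not part of the thread:

```matlab
% Hypothetical sample data and tolerance (not from the thread)
data = [1 2 3; 4 5 6; 1 2 3+1e-8; 4 5 6; 7 8 9; 7 8+1e-5 9];
tol  = 1e-4;                 % assumed upper limit on the Euclidean distance

n  = size(data, 1);
D2 = zeros(n);               % matrix of squared pairwise distances
for j = 1:size(data, 2)
    d  = data(:, j) - data(:, j).';   % n-by-n differences (R2016b or newer)
    D2 = D2 + d.^2;                   % accumulate column by column
end

% Similar but not equal: compare against the squared limit to avoid SQRT,
% and keep only the upper triangle so each pair is reported once
isSimilar = triu(D2 > 0 & D2 < tol^2, 1);
[r, c] = find(isSimilar);
pairs = sortrows([r, c]);
disp(pairs)
```

For large inputs the n-by-n matrix may not fit in memory; in that case the row-by-row loop shown earlier trades speed for a much smaller footprint.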
