Find duplicated rows in MATLAB without a for loop
Hello Friends,
I have a very large matrix with 2 columns, and I need to find the positions of its duplicated rows. I don't want to use a for loop, because I have already tried that (see the attached code) and it takes a long time. Is there an alternative way to do this? I would be grateful for any suggestions.
Best, 
Mina 
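x = [File(:,1) File(:,2)];             % grid coordinates (columns 1 and 2)
Grid = unique(x,'rows');               % one row per distinct grid point
Total = [];                            % assumed to start empty; not shown in the original post
for j = 1:length(DD)
    idx = find(day_of_year==DD(j));    % rows recorded on day DD(j)
    File2 = File(idx,:);
    for g = 1:size(Grid,1)             % size(...,1) counts rows; length() returns the largest dimension
        index1 = ismember(File2(:,[1 2]),Grid(g,:),'rows');
        idx2 = find(index1);           % rows of File2 at grid point g
        Total = [Total; Grid(g,1) Grid(g,2) DD(j) mean(File2(idx2,3)) mean(File2(idx2,4)) mean(File2(idx2,5)) mean(File2(idx2,6))];
    end
end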
Accepted Answer
Matt J on 21 Jul 2023 (Edited: Matt J on 21 Jul 2023)
[~,I] = unique(x,'rows');              % I holds the first occurrence of each unique row
locations = setdiff(1:height(x),I)     % locations of duplicate rows
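The loop attached to the question goes beyond locating duplicates: it averages columns 3 to 6 for every grid point on every day. As a rough sketch of how that could be done without loops, the group number from the third output of unique can be passed to splitapply. The sample values in File and day_of_year below are hypothetical stand-ins, and the column layout (coordinates in columns 1 and 2, values in columns 3 to 6, one day label per row) is an assumption read off the question's code.

```matlab
% Hypothetical stand-in data: columns 1-2 are grid coordinates, 3-6 are values.
File = [1 2 10 20 30 40;
        1 2 30 40 50 60;
        0 4  1  2  3  4;
        1 2  5  6  7  8];
day_of_year = [100; 100; 100; 101];    % assumed: one day label per row of File

% One group per distinct (grid point, day) combination.
% ic(k) is the group number of row k.
[keys, ~, ic] = unique([File(:,1:2) day_of_year], 'rows');

% Average the four value columns within each group.
% mean(v,1) keeps a 1-by-4 result even when a group has a single row.
means = splitapply(@(v) mean(v,1), File(:,3:6), ic);

% Same column order as Total in the question:
% [col1 col2 day mean3 mean4 mean5 mean6]
Total = [keys means];
```

Unlike the nested loops, this only produces rows for combinations that actually occur in the data, so grid points with no samples on a given day are omitted instead of appearing with NaN means.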
2 Comments
Matt J on 21 Jul 2023 (Edited: Matt J on 21 Jul 2023)
"It seems that for each line your code is only finding the position of one of the duplicated lines."
I don't think so. It should return the indices of all rows that have been seen before. As you can see below, the final locations list includes all rows except for 1 and 3, which are where a new row is first encountered.
x = [1 2;
     1 2;
     0 4;
     1 2;
     0 4;
     0 4];
[~,I] = unique(x,'rows');
locations = setdiff(1:height(x),I)     % locations of duplicate rows
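If the first occurrence of each repeated row should be reported as well, one possible variation is to count how many rows fall into each group and flag every member of a group with more than one row. This is only a sketch; the extra row [7 8] and the names counts, isDuplicated and allLocations are illustrative additions.

```matlab
% Example matrix with one extra row, [7 8], that occurs only once.
x = [1 2;
     1 2;
     0 4;
     1 2;
     0 4;
     0 4;
     7 8];

% ic(k) is the group number of row k (third output of unique).
[~, ~, ic] = unique(x, 'rows');

% Number of rows in each group.
counts = accumarray(ic, 1);

% A row belongs to a duplicated set when its group has more than one member.
isDuplicated = counts(ic) > 1;
allLocations = find(isDuplicated);     % rows 1 to 6 here; row 7 is not flagged
```

Keeping the logical mask isDuplicated around can be handy, since x(~isDuplicated,:) then gives the rows that occur exactly once.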
More Answers (1)
Walter Roberson on 21 Jul 2023 (Moved: Matt J on 21 Jul 2023)
The third output of unique gives the "group number" for each entry. There are different ways of handling that; one of them is:
[unique_rows, ~, ic] = unique(x,'rows');
appears_in_rows = accumarray(ic, (1:size(x,1)).', [], @(v) {v});
T = table(unique_rows, appears_in_rows);
This creates a table in which the first variable holds each unique row, and the second variable is a cell array of the row indices at which that unique row appears. Each cell will always have at least one entry, but might have more.
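To narrow such a table down to the rows that really are duplicated, one option is to count the entries in each cell and keep only the groups with more than one. The sketch below rebuilds the table on a small hypothetical matrix; the variable n_occurrences and the name T_dup are illustrative, and the sort inside the anonymous function is an added precaution because accumarray does not promise to keep values in their original order when the subscripts are unsorted.

```matlab
% Small hypothetical matrix: [1 2] and [0 4] repeat, [7 8] does not.
x = [1 2;
     1 2;
     0 4;
     1 2;
     0 4;
     0 4;
     7 8];

% Build the table of unique rows and the row indices where each one appears.
[unique_rows, ~, ic] = unique(x, 'rows');
appears_in_rows = accumarray(ic, (1:size(x,1)).', [], @(v) {sort(v)});
T = table(unique_rows, appears_in_rows);

% Number of occurrences of each unique row.
T.n_occurrences = cellfun(@numel, T.appears_in_rows);

% Keep only the unique rows that occur more than once.
T_dup = T(T.n_occurrences > 1, :);
```

With this input, T_dup keeps the entries for [0 4] and [1 2] and drops [7 8].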