Efficient way to identify duplicate edges.

Hello,
I am having trouble coming up with an efficient method to identify duplicate edges in a given edge list.
Assume we have the following edge list:
SampleEdgeList = [9 8; 1 3; 4 6; 7 3; 2 4; 3 1];
The first column holds the starting nodes and the second column holds the corresponding ending nodes. I am working with undirected edges, so the second row [1 3] and the sixth row [3 1] mean the same thing. As a result, I have duplicate edges connecting node 1 and node 3.
My EdgeList contains millions of edges, so I would like to avoid for-loops and find the most efficient way to identify those duplicate edges.
Ultimately, I would like to regenerate one end node of each duplicate edge, so that no duplicate edges remain.
Thanks for the help in advance!
Louis

 Accepted Answer

the cyclist on 8 Aug 2013
Edited: the cyclist on 8 Aug 2013
uniqueEdgeList = unique(sort(SampleEdgeList,2),'rows')
You can also use the second and third outputs of the unique() function to know which of the original rows map to the unique rows, and vice versa. See
>> doc unique
for details.
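As a minimal sketch of that suggestion, using the sample edge list from the question: the third output of unique() can be counted with accumarray to flag every row that takes part in a duplicate. The variable names sortedEdges, counts, isDuplicated and isRepeat are only illustrative, and the isRepeat step assumes unique() returns the first occurrence in its second output, which I believe is the default in recent releases.

```matlab
SampleEdgeList = [9 8; 1 3; 4 6; 7 3; 2 4; 3 1];

% Put the smaller node first in every row so [3 1] and [1 3] look identical
sortedEdges = sort(SampleEdgeList, 2);

% ia: rows of sortedEdges kept in uniqueEdgeList
% ic: for each original row, the index of its unique row
[uniqueEdgeList, ia, ic] = unique(sortedEdges, 'rows');

% Count how many original rows map to each unique edge
counts = accumarray(ic, 1);

% Logical mask of all original rows that belong to a duplicated edge
isDuplicated = counts(ic) > 1;
duplicateRows = find(isDuplicated);   % rows 2 and 6 for this sample

% Rows other than the one unique() kept (assumes ia marks first occurrences)
isRepeat = true(size(ic));
isRepeat(ia) = false;
```

Everything here is vectorized, so it should scale to a large edge list without a for-loop; the rows flagged in isRepeat are the ones whose end node would need to be regenerated.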

4 Comments

Thank you! I didn't realize that a simple sort would make unique work in this case.
Respected Sir, I am reading the data from a text file. For example, the edges are 1 2, 1 3, 1 4, 2 1, 3 1. Here (1, 3) and (3, 1) are duplicates, as are (1, 2) and (2, 1). How can I remove these duplicates from the dataset? The statement you gave above is not working for me.
fileID = fopen('C:\Users\TR RAO\Desktop\rao1.txt','r');
C = textscan(fileID, '%s %s');
fclose(fileID);
d1 = cellstr(C{1,1});
d2 = cellstr(C{1,2});
G = graph(d1,d2);
A = adjacency(G)
full(A);
But this is not working for duplicate edges.
TR RAO, see Christine Tobler's comment on Anurag's answer.
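For node IDs that are numbers read in as text, one possible approach (a sketch, not tested against the actual file) is to convert them to numeric values first, so that the sort-then-unique line from the accepted answer applies directly. The cell arrays d1 and d2 below are hypothetical stand-ins for what textscan would return for the five edges listed above.

```matlab
% Hypothetical stand-ins for the two columns returned by textscan with '%s %s'
d1 = {'1'; '1'; '1'; '2'; '3'};
d2 = {'2'; '3'; '4'; '1'; '1'};

% Convert the text labels to numbers, one column per endpoint
edges = [str2double(d1), str2double(d2)];

% Sort within each row, then keep one copy of each undirected edge
edges = unique(sort(edges, 2), 'rows');

% Build the graph from the de-duplicated edge list
G = graph(edges(:, 1), edges(:, 2));
A = full(adjacency(G));
```

Reading the file with '%f %f' instead of '%s %s' would presumably make the str2double step unnecessary.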

More Answers (1)

I have the same question. However, my edges are text stored in a cell array. For example:
'A0A023PXA5' 'O23144'
'A0A023PXP4' 'O23171'
'A0A023PYF7' 'O23171'
sort (with DIM and MODE) does not work on cell arrays. I have also tried building source and target as separate cell arrays to use
G= graph(s,t)
But I am still getting the same error:
Error using matlab.internal.graph.MLGraph
Duplicate edges not supported.
Error in matlab.internal.graph.constructFromEdgeList (line 125)
G = underlyingCtor(double(s), double(t), totalNodes);
Error in matlab.internal.graph.constructFromTable (line 40)
What am I doing wrong?

2 Comments

I realize this is late, but I'll add it in case it's still helpful: sort and unique along the rows are not supported for cell arrays of character vectors (cellstr), but they are supported for the newer string class. So you can do the following:
edgesString = string([s(:), t(:)]);
edgesUnique = cellstr(unique(edgesString, 'rows'))
g = graph(edgesUnique(:, 1), edgesUnique(:, 2));
It's not very nice-looking, but it should work.
My data is (1,2) (1,3) (1,4) (3,1) (2,1). The line edgesUnique = cellstr(unique(edgesString, 'rows')) works, but g = graph(edgesUnique(:, 1), edgesUnique(:, 2)); does not work for this data.
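The failure reported here is consistent with unique(..., 'rows') treating ["1" "3"] and ["3" "1"] as different rows, so both survive and graph() still sees a duplicate undirected edge. A possible fix, sketched below, is to sort each row of the string array first, as the accepted answer does for numeric edges. The s and t values are hypothetical sample data matching the edges in the comment, and edgesSorted is an illustrative name.

```matlab
% Hypothetical source and target labels for (1,2) (1,3) (1,4) (3,1) (2,1)
s = {'1'; '1'; '1'; '3'; '2'};
t = {'2'; '3'; '4'; '1'; '1'};

% Combine into an N-by-2 string array so sort and unique can work along rows
edgesString = string([s(:), t(:)]);

% Order the two labels within each row so (3,1) becomes (1,3)
edgesSorted = sort(edgesString, 2);

% Now unique removes the reversed duplicates as well
edgesUnique = cellstr(unique(edgesSorted, 'rows'));

g = graph(edgesUnique(:, 1), edgesUnique(:, 2));
```

Note that strings are sorted lexicographically (so "10" comes before "9"), which does not matter here: the sort only needs to put each pair into a consistent order.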

Asked: 8 Aug 2013
Commented: 19 Jan 2018
