How can I remove inverted repeat pairs of strings from a table?
Hi, I want to extract the inverted repeat pairs of strings from a 650x2 table. Let's say I have the following pairs in a table:
A123.B123   B123.C123
A456.B456   B456.C456
A789.B789   B789.C789
B123.C123   A123.B123
B456.C456   A456.B456
...         ...
As you can see, some pairs become identical to another pair when their order is inverted, for example the first pair and the fourth pair. I want to extract those inverted repeated pairs from my table, but I don't know how to do it. I tried the "unique" function, but it doesn't seem to work for inverted repeats. Any suggestions?
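Why "unique" alone does not catch these: it compares each pair exactly as written, so A123.B123 B123.C123 and B123.C123 A123.B123 count as different pairs. One workaround, a different technique from the answers below, is to put the two strings of every row into a fixed alphabetical order before calling unique. A minimal sketch, assuming the data is an N-by-2 cell array of character vectors named `pairs` (a hypothetical name; the five sample rows stand in for the real 650x2 table) and that the separator `|` never occurs inside the strings:

```matlab
% Hypothetical N-by-2 cell array; the sample rows from the question
% stand in for the real 650x2 table.
pairs = {'A123.B123', 'B123.C123';
         'A456.B456', 'B456.C456';
         'A789.B789', 'B789.C789';
         'B123.C123', 'A123.B123';
         'B456.C456', 'A456.B456'};

% Joining the columns as written: unique() sees 5 distinct keys,
% because 'A|B' and 'B|A' are different character vectors.
rawKeys = strcat(pairs(:,1), '|', pairs(:,2));
nRaw = numel(unique(rawKeys));          % 5

% Sort the two strings in each row so that a pair and its reverse
% produce the same canonical row.
canon = pairs;
for k = 1:size(pairs, 1)
    canon(k, :) = sort(pairs(k, :));
end

% One key per row (assumes '|' never appears inside the data).
canonKeys = strcat(canon(:,1), '|', canon(:,2));

% 'stable' returns the first occurrence of each key, in original order.
[~, keep] = unique(canonKeys, 'stable');
dedup = pairs(keep, :);                 % rows 1, 2 and 3 of the sample

% Rows that repeat an earlier pair (reversed or not).
dropped = setdiff((1:size(pairs, 1))', keep);   % rows 4 and 5
```

Note that this key also treats exact duplicates (the same pair in the same order) as repeats, so `dropped` contains both kinds.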
Comments
Dyuman Joshi
on 7 May 2024 (edited the same day)
@Paul Jimenez, there are no inverted string pairs in the data you have:
readtable('table.csv')
Accepted Answer
Voss
on 7 May 2024 (edited the same day)
T = readtable('table.csv')
Here's one way to find pairs of reversed rows:
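% temp(i,j) is true when column 1 of row i equals column 2 of row j
temp = string(T.(1)) == string(T.(2)).';
% row i is a reversed copy of row j when the match holds in both directions
[r2,r1] = find(temp & temp.');
r = [r1 r2];
disp(r)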
That says row 1 is a reversed copy of row 104, row 2 is a reversed copy of row 33, and so on.
Checking the first few, they do seem to be reversed pairs of rows:
T{r(1,:),:} % rows 1 and 104
T{r(2,:),:} % rows 2 and 33
T{r(3,:),:} % rows 3 and 69
I'm not sure exactly what you want to do with this information.
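The comparison matrix `temp & temp.'` is symmetric, so each reversed pair shows up twice in `r` (once as (i,j) and once as (j,i)), and a row whose two columns are equal would match itself on the diagonal. A minimal sketch of the same idea that reports each pair once, assuming an N-by-2 cell array of character vectors called `pairs` (hypothetical) instead of a table, and using `strcmp` in place of `string`, so it does not rely on the string class introduced in R2016b:

```matlab
% Hypothetical N-by-2 cell array; sample rows from the question.
pairs = {'A123.B123', 'B123.C123';
         'A456.B456', 'B456.C456';
         'A789.B789', 'B789.C789';
         'B123.C123', 'A123.B123';
         'B456.C456', 'A456.B456'};
n = size(pairs, 1);

% temp(i,j) is true when column 1 of row i equals column 2 of row j,
% i.e. the same matrix as string(T.(1)) == string(T.(2)).'
temp = false(n);
for j = 1:n
    temp(:, j) = strcmp(pairs(:, 1), pairs{j, 2});
end

% Row i is the reverse of row j when the match holds both ways.
[r2, r1] = find(temp & temp.');

% Keep each unordered pair once; r1 < r2 also drops the diagonal.
once = r1 < r2;
r = [r1(once) r2(once)];   % [1 4; 2 5] for the sample rows
```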
1 Comment
Voss
on 7 May 2024
Here's a slight modification that's useful for removing one of each pair of reversed rows from the table:
T = readtable('table.csv')
temp = string(T.(1)) == string(T.(2)).';
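% keep only the lower triangle so each reversed pair is counted once
idx = tril(temp & temp.');
% to avoid removing a row that is the reverse of itself,
% set elements of idx along the diagonal to false
idx(1:size(T,1)+1:end) = false;
[r,~] = find(idx); % r holds the later row of each reversed pair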
T(r,:) = []
Checking again for reversed pairs of rows confirms that the only ones left are the reverse of themselves:
temp = string(T.(1)) == string(T.(2)).';
[r,~] = find(tril(temp & temp.'))
T(r,:)
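Since the question asks to extract the reversed rows, the same `idx` can also split the table into two parts instead of deleting rows in place. A minimal sketch, assuming a small table built from the question's sample rows (the variable names `Pair1`, `Pair2`, `isLaterCopy`, `kept` and `extracted` are hypothetical):

```matlab
% Hypothetical table built from the sample rows in the question.
T = table({'A123.B123'; 'A456.B456'; 'A789.B789'; 'B123.C123'; 'B456.C456'}, ...
          {'B123.C123'; 'B456.C456'; 'B789.C789'; 'A123.B123'; 'A456.B456'}, ...
          'VariableNames', {'Pair1', 'Pair2'});

% Same comparison and lower-triangle filtering as in the comment above.
temp = string(T.(1)) == string(T.(2)).';
idx = tril(temp & temp.');
idx(1:size(T,1)+1:end) = false;

% A row is flagged if it is the later copy of some reversed pair.
isLaterCopy = any(idx, 2);
extracted = T(isLaterCopy, :);   % rows 4 and 5 of the sample
kept      = T(~isLaterCopy, :);  % rows 1, 2 and 3
```

Using `any(idx, 2)` gives a logical mask, so a row that reverses several earlier rows is still selected only once.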
More Answers (1)
Mathieu NOE
on 6 May 2024
Hello Paul,
here is my suggestion. I have attached your data, simply pasted into a text file.
Hope it helps.
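clc
out = readcell('data.txt');   % N-by-2 cell array of character vectors
first_col = out(:,1);
second_col = out(:,2);
% main loop: for each row k, find the rows whose second column
% equals the first column of row k
ind1 = [];
ind2 = [];
for k = 1:numel(first_col)
    tf = strcmp(first_col{k},second_col);
    if any(tf)
        idx = find(tf);                           % may be more than one row
        ind1 = [ind1; repmat(k, numel(idx), 1)]; %#ok<AGROW>
        ind2 = [ind2; idx];                      %#ok<AGROW>
    end
end
% all matching pairs
out = [ind1 ind2]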
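The loop above only checks whether the first string of row k appears somewhere in the second column; it does not check that the second string of row k is also the first string of the matching row. For the sample rows in the question the two coincide, but requiring both directions is what makes a match a true inverted pair. A minimal sketch of that stricter loop, assuming the same N-by-2 cell layout that readcell returns, with the sample rows standing in for data.txt and self-matching rows excluded (both are assumptions, not part of the answer above):

```matlab
% Sample rows from the question in place of readcell('data.txt').
out = {'A123.B123', 'B123.C123';
       'A456.B456', 'B456.C456';
       'A789.B789', 'B789.C789';
       'B123.C123', 'A123.B123';
       'B456.C456', 'A456.B456'};
first_col  = out(:,1);
second_col = out(:,2);

ind1 = [];   % row k
ind2 = [];   % rows that are the reverse of row k
for k = 1:numel(first_col)
    % both strings must match, crossed over, for an inverted pair
    tf = strcmp(first_col{k}, second_col) & strcmp(second_col{k}, first_col);
    tf(k) = false;   % assumption: ignore a row that matches itself
    j = find(tf);
    ind1 = [ind1; repmat(k, numel(j), 1)]; %#ok<AGROW>
    ind2 = [ind2; j];                      %#ok<AGROW>
end
matches = [ind1 ind2]   % [1 4; 2 5; 4 1; 5 2]: each pair appears twice
```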
Comments
Mathieu NOE
on 7 May 2024
Hello again,
it seems that in the CSV file each column contains duplicate strings, so I simply ran the same process using only the unique strings of each column. Of course, the result is then not the same list as your original file.
Is this what you wanted?
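data = readcell('table.csv');
first_col = unique(data(:,1));    % unique strings of column 1 (sorted)
second_col = unique(data(:,2));   % unique strings of column 2 (sorted)
% main loop
n = 0;
ind1 = [];
ind2 = [];
for k = 1:numel(first_col)
    tf = strcmp(first_col{k},second_col);
    if any(tf)
        n = n + 1; % increase counter
        ind1(n,1) = k;
        ind2(n,1) = find(tf);   % second_col is unique, so at most one match
    end
end
% all matching pairs (indices into first_col and second_col, not table rows)
out = [ind1 ind2]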
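Because this version loops over the unique strings of each column, `ind1` and `ind2` index into `first_col` and `second_col` rather than into rows of the CSV file. The loop amounts to a set intersection, so `intersect` returns the same indices here, and `ismember` can map the common strings back to row numbers of the original data. A minimal sketch, with the question's sample rows standing in for table.csv (the names `common`, `ia`, `ib` and `rows` are hypothetical):

```matlab
% Sample rows from the question in place of readcell('table.csv').
data = {'A123.B123', 'B123.C123';
        'A456.B456', 'B456.C456';
        'A789.B789', 'B789.C789';
        'B123.C123', 'A123.B123';
        'B456.C456', 'A456.B456'};

% Strings that occur in both columns; ia/ib index into the sorted
% unique lists, like ind1/ind2 in the loop above.
first_col  = unique(data(:,1));
second_col = unique(data(:,2));
[common, ia, ib] = intersect(first_col, second_col);

% Map the common strings back to row numbers of the original data.
rows = find(ismember(data(:,1), common));   % rows 1, 2, 4 and 5
```

For the sample rows, `ia` and `ib` equal the `ind1` and `ind2` produced by the loop.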