How can I remove inverted repeat pairs of strings from a table?

Question

Paul Jimenez on 6 May 2024

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/2115566-how-can-i-remove-inverted-repeat-pairs-of-strings-from-a-table

Commented: Voss on 7 May 2024

table.csv

Hi, I want to extract the inverted repeat pairs of strings from a 650x2 table. Let's say I have the following pairs in a table:

A123.B123 B123.C123

A456.B456 B456.C456

A789.B789 B789.C789

B123.C123 A123.B123

B456.C456 A456.B456

. .

As you can see, some pairs become identical to another pair when the order of pairing is inverted, for example the first pair and the fourth pair. I want to extract those inverted repeated pairs from my table, but I don't know how to do it. I tried the "unique" function, but that doesn't seem to work for inverted repeats. Any suggestions?

3 Comments

Dyuman Joshi on 7 May 2024

Edited: Dyuman Joshi on 7 May 2024

table.csv

@Paul Jimenez, There are no inverted string pairs in the data you have -

readtable('table.csv')
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
ans = 656x2 table
     x10N_286_54_E7        x10N_286_54_B1  
    _________________    __________________

    {'10N.286.55.E1'}    {'10N.286.54.B1' }
    {'10N.286.46.E8'}    {'10N.261.52.F7' }
    {'10N.286.54.A6'}    {'10N.286.54.B1' }
    {'10N.286.54.A6'}    {'10N.286.54.A11'}
    {'10N.286.54.F7'}    {'10N.286.54.A11'}
    {'10N.286.54.F7'}    {'10N.286.54.A1' }
    {'10N.286.54.A6'}    {'10N.286.55.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.A6' }
    {'10N.286.55.E1'}    {'10N.286.54.E1' }
    {'12B09'        }    {'12B09'         }
    {'10N.286.54.E7'}    {'10N.286.55.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.E1' }
    {'10N.286.54.E1'}    {'10N.286.55.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.A1' }
    {'10N.286.55.E8'}    {'10N.286.55.B8' }
    {'10N.286.54.A6'}    {'10N.286.54.E1' }

Paul Jimenez on 7 May 2024

Yes, there are. For example, look at the first pair under the headers:

10N.286.55.E1 10N.286.54.B1

and then look at the 104th pair in the table, which is:

10N.286.54.B1 10N.286.55.E1

This is essentially the same pair in the opposite order.

Answer 1

Voss on 7 May 2024

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/2115566-how-can-i-remove-inverted-repeat-pairs-of-strings-from-a-table#answer_1453947

Edited: Voss on 7 May 2024

table.csv

T = readtable('table.csv')
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T = 656x2 table
     x10N_286_54_E7        x10N_286_54_B1  
    _________________    __________________

    {'10N.286.55.E1'}    {'10N.286.54.B1' }
    {'10N.286.46.E8'}    {'10N.261.52.F7' }
    {'10N.286.54.A6'}    {'10N.286.54.B1' }
    {'10N.286.54.A6'}    {'10N.286.54.A11'}
    {'10N.286.54.F7'}    {'10N.286.54.A11'}
    {'10N.286.54.F7'}    {'10N.286.54.A1' }
    {'10N.286.54.A6'}    {'10N.286.55.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.A6' }
    {'10N.286.55.E1'}    {'10N.286.54.E1' }
    {'12B09'        }    {'12B09'         }
    {'10N.286.54.E7'}    {'10N.286.55.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.E1' }
    {'10N.286.54.E1'}    {'10N.286.55.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.A1' }
    {'10N.286.55.E8'}    {'10N.286.55.B8' }
    {'10N.286.54.A6'}    {'10N.286.54.E1' }

Here's one way to find pairs of reversed rows:

temp = string(T.(1)) == string(T.(2)).';
[r2,r1] = find(temp & temp.');
r = [r1 r2];
disp(r)
     1   104
     2    33
     3    69
   ...
(609 rows in total; the remaining rows are omitted here)

That says row 1 is a reversed copy of row 104, row 2 is a reversed copy of row 33, and so on.

Checking the first few, they do seem to be reversed pairs of rows:

T{r(1,:),:} % rows 1 and 104
ans = 2x2 cell array
    {'10N.286.55.E1'}    {'10N.286.54.B1'}
    {'10N.286.54.B1'}    {'10N.286.55.E1'}
T{r(2,:),:} % rows 2 and 33
ans = 2x2 cell array
    {'10N.286.46.E8'}    {'10N.261.52.F7'}
    {'10N.261.52.F7'}    {'10N.286.46.E8'}
T{r(3,:),:} % rows 3 and 69
ans = 2x2 cell array
    {'10N.286.54.A6'}    {'10N.286.54.B1'}
    {'10N.286.54.B1'}    {'10N.286.54.A6'}

I'm not sure exactly what you want to do with this information.
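
To see why the comparison matrix finds reversed rows, here is a minimal sketch on the five example pairs from the question rather than on table.csv. The variable names A, B and M are illustrative and do not appear in the answer above; the logic is meant to mirror the `temp & temp.'` step.

```matlab
% Toy data: the five example pairs from the question (not table.csv)
A = ["A123.B123"; "A456.B456"; "A789.B789"; "B123.C123"; "B456.C456"];
B = ["B123.C123"; "B456.C456"; "B789.C789"; "A123.B123"; "A456.B456"];

% M(i,j) is true when column 1 of row i equals column 2 of row j
M = A == B.';

% Row i is the reverse of row j when M(i,j) and M(j,i) are both true
[r2, r1] = find(M & M.');
r = [r1 r2];
disp(r)
```

Each reversed pair is listed twice here, once from each side, which is why a later step is needed if only one row of each pair should be removed.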

1 Comment

Voss on 7 May 2024

table.csv

Here's a slight modification that's useful for removing one of each pair of reversed rows from the table:

T = readtable('table.csv')
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T = 656x2 table
     x10N_286_54_E7        x10N_286_54_B1  
    _________________    __________________

    {'10N.286.55.E1'}    {'10N.286.54.B1' }
    {'10N.286.46.E8'}    {'10N.261.52.F7' }
    {'10N.286.54.A6'}    {'10N.286.54.B1' }
    {'10N.286.54.A6'}    {'10N.286.54.A11'}
    {'10N.286.54.F7'}    {'10N.286.54.A11'}
    {'10N.286.54.F7'}    {'10N.286.54.A1' }
    {'10N.286.54.A6'}    {'10N.286.55.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.A6' }
    {'10N.286.55.E1'}    {'10N.286.54.E1' }
    {'12B09'        }    {'12B09'         }
    {'10N.286.54.E7'}    {'10N.286.55.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.E1' }
    {'10N.286.54.E1'}    {'10N.286.55.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.A1' }
    {'10N.286.55.E8'}    {'10N.286.55.B8' }
    {'10N.286.54.A6'}    {'10N.286.54.E1' }
temp = string(T.(1)) == string(T.(2)).';
idx = tril(temp & temp.');
idx(1:size(T,1)+1:end) = false; % to avoid removing a row that is the reverse of itself, 
                                % set elements of idx along the diagonal to false
[r,~] = find(idx);
T(r,:) = []
T = 353x2 table
     x10N_286_54_E7        x10N_286_54_B1  
    _________________    __________________

    {'10N.286.55.E1'}    {'10N.286.54.B1' }
    {'10N.286.46.E8'}    {'10N.261.52.F7' }
    {'10N.286.54.A6'}    {'10N.286.54.B1' }
    {'10N.286.54.A6'}    {'10N.286.54.A11'}
    {'10N.286.54.F7'}    {'10N.286.54.A11'}
    {'10N.286.54.F7'}    {'10N.286.54.A1' }
    {'10N.286.54.A6'}    {'10N.286.55.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.A6' }
    {'10N.286.55.E1'}    {'10N.286.54.E1' }
    {'12B09'        }    {'12B09'         }
    {'10N.286.54.E7'}    {'10N.286.55.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.E1' }
    {'10N.286.55.A1'}    {'10N.286.54.A1' }
    {'10N.286.55.E8'}    {'10N.286.55.B8' }
    {'10N.286.54.A6'}    {'10N.286.54.E1' }
    {'10N.222.49.C7'}    {'10N.222.46.C2' }

Checking again for reversed pairs of rows confirms that the only ones left are the reverse of themselves:

temp = string(T.(1)) == string(T.(2)).';
[r,~] = find(tril(temp & temp.'))
r = 3x1
    10
    35
   213
T(r,:)
ans = 3x2 table
    x10N_286_54_E7    x10N_286_54_B1
    ______________    ______________

      {'12B09'}         {'12B09'}   
      {'12B01'}         {'12B01'}   
      {'12G01'}         {'12G01'}   
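
A different route to a similar result, not taken from the code above, is to sort the two strings within each row so that reversed pairs become identical, and then keep the first occurrence of each key. This is only a sketch on the toy pairs from the question. The names S, sorted, keys and keep are illustrative, and it assumes the "|" delimiter never occurs in the data.

```matlab
% Toy data: the five example pairs from the question (not table.csv)
S = ["A123.B123" "B123.C123"
     "A456.B456" "B456.C456"
     "A789.B789" "B789.C789"
     "B123.C123" "A123.B123"
     "B456.C456" "A456.B456"];

% Sort within each row so that (x,y) and (y,x) give the same ordered pair
sorted = sort(S, 2);

% Build one key per row; assumes "|" does not appear in any string
keys = sorted(:,1) + "|" + sorted(:,2);

% Keep the first occurrence of each key, preserving the original row order
[~, keep] = unique(keys, 'stable');
S_unique = S(keep, :);
disp(S_unique)
```

Unlike the tril-based code, this variant also drops exact duplicate rows (same strings in the same order), so the two approaches can give different results when such duplicates are present.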

Answer 2

Mathieu NOE on 6 May 2024

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/2115566-how-can-i-remove-inverted-repeat-pairs-of-strings-from-a-table#answer_1453051

data.txt

Hello Paul,

This would be my suggestion.

Attached is your data, simply pasted into a text file.

Hope it helps.

clc
out = readcell('data.txt');
first_col = out(:,1);
second_col = out(:,2);
% main loop
n = 0;
for k = 1:numel(first_col)
    tf = strcmp(first_col{k},second_col);
    if any(tf)
        n = n + 1; % increase counter
        ind1(n,1) = k;
        ind2(n,1) = find(tf);
    end
end
 % all matching pairs 
out = [ind1 ind2]
out = 4x2
     1     4
     2     5
     4     1
     5     2

2 Comments

Paul Jimenez on 6 May 2024

Thanks a lot for your answer. Unfortunately I got the following error:

Unable to perform assignment because the indices on the left side are not compatible with the size of the right side.

Error in strains (line 19)

ind2(n,1) = find(tf);

Also, is there a way to remove only one of the pairs from the out list?
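
The error most likely arises because find(tf) returns more than one index when a string occurs several times in the second column, and several indices cannot be assigned to the single element ind2(n,1). Below is a minimal sketch of a loop that compares both columns and tolerates multiple matches. The toy data and the names c1, c2 and pairs are illustrative, not from the thread. It lists each reversed pair only once and skips rows that are the reverse of themselves.

```matlab
% Toy data with repeated strings: rows 1 and 3 are both (a,b), row 2 is (b,a)
c1 = {'a'; 'b'; 'a'; 'c'};
c2 = {'b'; 'a'; 'b'; 'c'};

pairs = zeros(0, 2);   % each row will hold [k j] with j > k
for k = 1:numel(c1)
    % Row j is the reverse of row k when c1{j} equals c2{k} and c2{j} equals c1{k}
    j = find(strcmp(c1, c2{k}) & strcmp(c2, c1{k}));
    j = j(j > k);      % list each pair once and skip self-reversed rows
    for m = 1:numel(j)
        pairs(end+1, :) = [k, j(m)]; %#ok<AGROW>
    end
end
disp(pairs)
```

Since every listed pair has its larger row index in the second column, deleting the rows in unique(pairs(:,2)) would leave one representative of each group of reversed rows.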

Mathieu NOE on 7 May 2024

table.csv

Hello again,

It seems that in the CSV file each column contains duplicate strings.

So I simply ran the same process on only the unique strings in each column, but of course that is not the same list as in your original file.

Is this what you wanted?

data = readcell('table.csv');
first_col = unique(data(:,1));
second_col = unique(data(:,2));
% main loop
n = 0;
for k = 1:numel(first_col)
    tf = strcmp(first_col{k},second_col);
    if any(tf)
        n = n + 1; % increase counter
        ind1(n,1) = k;
        ind2(n,1) = find(tf);
    end
end
 % all matching pairs  
out = [ind1 ind2]
out = 97x2
     1     1
     2     2
     3     3
     4     4
     5     5
     6     6
     7     7
     8     8
     9     9
    10    10
