# How do I count the total number 'AA' repeats in the text? I managed to count how many times it appears in each cell, but don't know how to add it up.

4 views (last 30 days)
Nitzan Kahn on 3 May 2018
Commented: Nitzan Kahn on 5 May 2018
clear all;
Seq_Out_File='./Seq_Out.txt';
fileID=fopen(Seq_Out_File);
C=textscan(fileID, '%s');
NAA=(strfind(C{1},'AA'));
x=cellfun('length', NAA);
##### 2 CommentsShow 1 older commentHide 1 older comment
Nitzan Kahn on 5 May 2018
AAA is two matches

Walter Roberson on 3 May 2018
no_overlap_count = length(regexp(S, 'AA'));
with_overlap_count = length(regexp(S, 'A(?=A)'));
Nitzan Kahn on 5 May 2018
This is exactly what i needed, thanks for your help!

John BG on 4 May 2018
Edited: John BG on 4 May 2018
Hi Nitzan Khan
1.-
According to
AAA counts as 2x AA and AAAA would count as 3x AA.
.
2.-
Also, you asked for AA match but you may want to really count all possible outcomes, all possible pairs of the basic sequence 'ACTG'
A basic equivalent to the Stack Overflow code in MATLAB would be:
A1='CTACTGCGACTTATGCCCATAATTGGCCACAATAAGTTTCTCGGATTCGCAGGTACCCTCGAGAGTATGGTCGTGGACTCAACCTTAGAGGCAACGGAGT'
L1='ACTG'
nL=combinator(4,2) % SChwarz's legendary function available here:
L2=L1(nL)
cell1={}
for k=1:1:size(L2,1)
cell1=[cell1 L2(k,:)]
end
nRep=[];
for k=1:1:size(L2,1)
[t1,t2]=regexp(A1,L2(k,:))
nRep=[nRep numel(t1)]
end
for k=1:1:size(L2,1)
str1=['n' L2(k,:) '=' num2str(nRep(k))]
evalin('base',str1)
end
L3=[repmat('n',size(L2,1),1) L2 repmat(',',size(L2,1),1)]'
L3=L3(:)'
L3(end)=[]
str3=['T1=table(' L3 ')']
evalin('base',str3)
T1 =
1×16 table
nAA nAC nAT nAG nCA nCC nCT nCG nTA nTC nTT nTG nGA nGC nGT nGG
___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
5 7 6 7 6 4 7 6 7 6 5 5 7 5 6 7
.
3.-
For a complete sequence in a text file:
clear all;clc;close all
L1='ACTG'
nL=combinator(4,2) % SChwarz's legendary function available here:
L2=L1(nL)
cell1={}
for k=1:1:size(L2,1)
cell1=[cell1 L2(k,:)]
end
nRep=[];
for k=1:1:size(L2,1)
[t1,t2]=regexp(A,L2(k,:));
nRep=[nRep numel(t1)];
end
for k=1:1:size(L2,1)
str1=['n' L2(k,:) '=' num2str(nRep(k))]
evalin('base',str1)
end
L32=[repmat('n',size(L2,1),1) L2 repmat(',',size(L2,1),1)]'
L32=L32(:)'
L32(end)=[]
.
the count table for all pairs is:
.
str3=['T2=table(' L32 ')']
T2 =
1×16 table
nAA nAC nAT nAG nCA nCC nCT nCG nTA nTC nTT nTG nGA nGC nGT nGG
_____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____
48684 62452 62140 62799 62493 48543 62702 62323 62328 62690 48502 62482 62491 62410 62683 48427
.
Besides the sequence used, also attached the saved variables in .mat file.
You can add more sequences, one each row, as show in the link above
.
thanks in advance for time and attention
John BG