
How to search nucleotide sequences with regexp?

Hello everyone,
I am trying to search a large list of 23,322 DNA sequences for this sequence:
XTTATTATTATTATTATTATTATTY
Here T and A are the usual bases, and X and Y can each be any single base (A, C, T, or G). I am looking for the (TTA)7TT repeat core and want to find out which bases immediately flank it.
So I am using the regular expression:
[ACTG]{1,1}TTATTATTATTATTATTATTATT[ACTG]{1,1}
This gives 30 results. When I instead search for each pair of flanking bases separately and sum up those results, using regular expressions like these:
ATTATTATTATTATTATTATTATTA
GTTATTATTATTATTATTATTATTG
CTTATTATTATTATTATTATTATTC
TTTATTATTATTATTATTATTATTT
and so on, I get 47 results in total. The first regular expression should find all of these in one go, but apparently it does not, so I think I have made an error in constructing it. If there are any regular expression masters out there, I would greatly appreciate your help.
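
One possible cause worth checking (an assumption, not verified against the actual data): regular expression engines normally report non-overlapping matches. If some sequences contain more than seven TTA repeats, the single combined pattern consumes the first hit and skips any hit that overlaps it, whereas separate per-flank patterns can each match a different overlapping window, so their sum comes out higher. The sketch below uses Python's `re` module purely for illustration; the names `CORE`, `consuming`, `overlapping`, and `flank_pairs` are hypothetical. It moves the core and the right flank into a lookahead so that only one base is consumed per hit.

```python
import re

CORE = "TTA" * 7 + "TT"  # the (TTA)7TT core from the question

# Consuming pattern, equivalent to the combined expression in the question.
consuming = re.compile("[ACGT]" + CORE + "[ACGT]")

# Lookahead variant: only the left flank is consumed, so the scan advances
# one base at a time and overlapping hits are all reported.
overlapping = re.compile("([ACGT])(?=" + CORE + "([ACGT]))")


def flank_pairs(seq):
    """Return (0-based position, left flank, right flank) for every hit."""
    return [(m.start(), m.group(1), m.group(2)) for m in overlapping.finditer(seq)]


# Made-up test sequence with nine TTA repeats (not taken from the real data).
example = "C" + "TTA" * 9 + "TTG"

print(len(consuming.findall(example)))  # 1 hit: overlapping windows are skipped
print(flank_pairs(example))             # 3 hits: (C, A), (A, A), (A, G)
```

In this made-up sequence the combined pattern reports one hit, while the per-flank patterns for C...A, A...A, and A...G would each report one, giving three in total. MATLAB's `regexp` also documents lookahead of the form `(?=...)`, so a similar pattern may work there, though that has not been tested here.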

Answers (0)
