Problem 79. DNA N-Gram Distribution
Given a string s and a number n, find the most frequently occurring n-gram in the string, where the n-grams can begin at any point in the string. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.
So for
s = 'AACTGAACG'
and
n = 3
we get the following n-grams (trigrams):
AAC, ACT, CTG, TGA, GAA, AAC, ACG
Since AAC appears twice and every other trigram appears once, the answer, hifreq, is AAC. There will always be exactly one highest-frequency n-gram.
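The sliding-window count described above can be sketched in a few lines. This is a minimal Python illustration, not the reference solution for this problem; the function name `most_frequent_ngram` is a hypothetical choice, and the sketch assumes `s` has at least `n` characters.

```python
from collections import Counter

def most_frequent_ngram(s, n):
    """Return the n-gram that occurs most often in s (hypothetical helper)."""
    # Slide a window of width n across s, starting at every position.
    grams = [s[i:i + n] for i in range(len(s) - n + 1)]
    # Counter tallies each distinct n-gram; most_common(1) gives the top (gram, count) pair.
    return Counter(grams).most_common(1)[0][0]

hifreq = most_frequent_ngram('AACTGAACG', 3)
print(hifreq)  # AAC
```

Because the problem guarantees a single highest-frequency n-gram, the sketch does not need a tie-breaking rule.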
Problem Comments
E Chang
on 22 Oct 2018
Note that spaces in s should be ignored; otherwise test suites 3 and 5 fail.
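One way to follow this advice is to strip whitespace before forming the n-grams. The Python sketch below is only an illustration: the function name `hifreq_ignoring_spaces` is hypothetical, and the spaced input is made up for this example, not taken from the actual test suites.

```python
from collections import Counter

def hifreq_ignoring_spaces(s, n):
    """Most frequent n-gram in s after removing whitespace (hypothetical helper)."""
    # str.split() with no arguments splits on any whitespace run;
    # joining the pieces back together drops the whitespace entirely.
    cleaned = ''.join(s.split())
    grams = (cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    return Counter(grams).most_common(1)[0][0]

# Illustrative input only: the example sequence with spaces inserted.
print(hifreq_ignoring_spaces('AAC TGA ACG', 3))  # AAC
```

If the spaces were left in, windows such as `'AC '` and `' TG'` would be counted as trigrams, and in this illustrative input every trigram would then occur exactly once, so no single winner would exist.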