Problem 811. Genome Sequence 004: Long 3rd Generation Segment Correction
The genome of the budgerigar parrot, Melopsittacus undulatus, was successfully sequenced in July 2012 using long 3rd-generation reads provided by PacBio. The Assemblathon Genome Contest led the team of Phillippy, Koren, and Jarvis to sequence the parrot DNA by combining PacBio 3rd-generation data with Illumina 2nd-generation data.
The PacBio 3rd-generation reads are very long, 1K-20K bases, but have a 15% error rate. The Illumina reads are 100-500 bases long with a <1% error rate. Jarvis and his team combined the two data sets to achieve an error rate below 0.1%.
Genome Challenge 004 is the correction of simplified, simulated PacBio reads with a high error rate.
Input:
Call 1: empty array, segment Width, Flag=0
Call 2: N PacBio DNA vectors (N x width), Segment Width, Flag=1
Output:
Call 1: empty vector, Number of Requested Vectors
Call 2: Corrected DNA vector, Number of Requested Vectors
Score: the number of vectors N used to produce the correct vector for the w=1024 case
The first call to the PacBio_fix routine returns the number of vectors it requests in order to produce the final result. This number may be a function of w.
The second call to PacBio_fix receives a DNA matrix (N x width) and flag=1.
The response to the second call is the corrected DNA sequence, a vector of width w.
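Because the first call must commit to N before any data is seen, it can help to estimate how N might scale with w. The following is a rough Python sketch, not part of the problem statement. It assumes each base is substituted independently with probability p, and it uses the 15% figure quoted above for real PacBio data purely as an illustration, since the simulated error rate is not stated. The function names `column_failure_bound` and `reads_needed` are hypothetical. A column is counted as failed whenever the true base does not hold a strict majority, which is a conservative bound for a plurality vote over four symbols.

```python
from math import comb


def column_failure_bound(n, p):
    """Probability that at least half of n independent reads are wrong at one position.

    This is an upper bound on the failure rate of a per-column vote:
    the true base always wins when it holds a strict majority.
    """
    k_min = (n + 1) // 2  # fewest errors that deny the true base a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def reads_needed(width, p, target):
    """Smallest odd n whose union bound over `width` columns is at most `target`."""
    if not 0 <= p < 0.5:
        raise ValueError("p must be in [0, 0.5) for voting to converge")
    if target <= 0:
        raise ValueError("target must be positive")
    n = 1
    # Union bound: P(any column fails) <= width * P(one column fails)
    while width * column_failure_bound(n, p) > target:
        n += 2  # odd n avoids two-way ties between the true base and one wrong base
    return n


if __name__ == "__main__":
    # Assumed values for illustration only: p = 0.15, 1% chance of any wrong base
    for w in (16, 128, 1024):
        print(w, reads_needed(w, 0.15, 0.01))
```

Since the score is the number of vectors requested, the bound is only a starting point: it is pessimistic, so a smaller N may still pass for the w=1024 case.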
Example: the first call returns N=3
01230123111122223333 Truth
Input example:
01232123112122221332 Injected errors
01130123111122123323
11230133121122223333
Output: 01230123111122223333 Truth, hopefully
This data is simplified: the only errors are substitutions, and the data sets are provided pre-aligned.
Real PacBio data is considerably more complicated. Bases may be inserted, deleted, or substituted, and reads vary in length, which causes alignment issues.
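With pre-aligned reads and substitution-only errors, one plausible approach (not necessarily the intended one) is a per-column vote across the N reads. The sketch below is a Python stand-in for the two-call interface described above. The name `pacbio_fix`, the `n_requested` parameter, and the fixed choice of N=3 are assumptions made for illustration, and an actual Cody submission would be written in MATLAB.

```python
from collections import Counter


def pacbio_fix(reads, width, flag, n_requested=3):
    """Hypothetical stand-in for the two-call PacBio_fix routine.

    flag == 0: return an empty vector and the number of reads requested.
    flag == 1: return the column-wise most common base and the same count.
    """
    if flag == 0:
        return [], n_requested
    corrected = []
    for col in range(width):
        counts = Counter(read[col] for read in reads)
        # most_common breaks ties in favor of the value seen first
        corrected.append(counts.most_common(1)[0][0])
    return corrected, n_requested


# The three noisy reads from the example in the problem statement
truth = "01230123111122223333"
noisy = [
    "01232123112122221332",
    "01130123111122123323",
    "11230133121122223333",
]
reads = [[int(ch) for ch in row] for row in noisy]

_, n = pacbio_fix([], len(truth), 0)          # call 1: ask how many reads to send
fixed, _ = pacbio_fix(reads[:n], len(truth), 1)  # call 2: correct the reads
print("".join(str(b) for b in fixed))  # prints 01230123111122223333 for this example
```

In this example no two reads share an error in the same column, so three reads are enough. With a higher error rate or a wider segment, the vote can be outnumbered and more reads would be needed.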
Follow-Up Challenges: Sample data for Lambda Phage from the PacBio site will be molded into various challenges. Possible challenges are correcting individual long segments and assembling multiple long segments into the full Lambda Phage genome. The parrot genome is too big for Cody to solve in 50 seconds.