How to perform taxonomic analysis of 16S rRNA NGS .fastq files?

I have raw files from next-generation sequencing of 16S rRNA in .fastq format, and I want to analyse them to obtain the OTUs and the taxonomic relative abundance of all the microbial species present in the sample.
Thank you.

Accepted Answer

Tim DeFreitas on 29 Mar 2019
A complete answer to this question is outside the scope of a single MATLAB Answers post. I suggest reading some published papers on the various approaches to reconstructing phylogeny with 16S rRNA. Here's one such paper, though there are many others: https://academic.oup.com/nar/article/36/18/e120/1070009.
In general, you will need to perform the following series of steps:
  1. Obtain reference sequences of the 16S gene (likely in FASTA format) for each of the microbial species you wish to test for. These can likely be obtained from public databases such as NCBI: https://www.ncbi.nlm.nih.gov/gene/?term=16s%20rrna. For particular sequences of interest, you can retrieve them in MATLAB using getgenbank.
  2. Assign each of your input reads to its closest species match. There are several methods to do so. One way is to use blastlocal, with the FASTA reference sequences from step 1 as the database and your FASTQ reads as the queries. The relative abundance of each species can be inferred from the number of matches to each of your reference sequences.
  3. To construct a taxonomy, you must then perform a multiple alignment of the 16S gene for each of your observed species (likely a subset of your references from step 1), and construct a phylogenetic tree using the distances between the sequences. In MATLAB, this can be done with multialign, seqpdist, and seqlinkage. The definition of an OTU is not set in stone, but in general it is a group of very similar sequences. From the phytree created with seqlinkage, you can construct OTUs by providing a similarity threshold using cluster(phytree).
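The steps above could be sketched roughly as follows. This is only a minimal illustration, assuming the Bioinformatics Toolbox is installed. The sequences, species names, and the 0.05 threshold are made-up toy values, not real 16S data. To keep the example self-contained, read assignment uses swalign (a simple Smith-Waterman best-score match) as a stand-in for blastlocal, which needs a local BLAST installation. It is far too slow for real NGS data.

```matlab
% Toy stand-ins for real data (NOT real 16S sequences). In practice the
% references might come from fastaread or getgenbank, and the reads from
% fastqread on your own .fastq file.
refs = struct( ...
    'Header',   {'speciesA', 'speciesB', 'speciesC'}, ...
    'Sequence', {'ACGTACGTTAGCCGATAGGCTTAACGGA', ...
                 'ACGTACGTTAGCCGATAGGCTTAACGGT', ...
                 'TTGACCGGTACGATCGGATCCAGTTTAC'});
reads = struct('Sequence', {'ACGTTAGCCGATAGG', 'TAGCCGATAGGCTTAACGGA', ...
                            'GCTTAACGGT', 'GGTACGATCGGATCC', ...
                            'CCGGTACGATCGGAT'});

% Step 2: assign each read to the reference with the best local
% alignment score (ties go to the first reference in the list).
nRefs  = numel(refs);
counts = zeros(1, nRefs);
for i = 1:numel(reads)
    scores = zeros(1, nRefs);
    for j = 1:nRefs
        scores(j) = swalign(reads(i).Sequence, refs(j).Sequence, ...
                            'Alphabet', 'NT');
    end
    [~, best] = max(scores);
    counts(best) = counts(best) + 1;
end
relAbundance = counts / sum(counts);   % fraction of reads per reference

% Step 3: align the observed references, compute pairwise distances,
% build a tree, and cut it into clusters (candidate OTUs).
observed = refs(counts > 0);
aligned  = multialign(observed);
dist     = seqpdist(aligned, 'Method', 'p-distance', 'Alphabet', 'NT');
tree     = seqlinkage(dist, 'average', {observed.Header});
otu      = cluster(tree, 0.05);        % 0.05 is an arbitrary toy threshold

disp(table({refs.Header}', counts', relAbundance', ...
     'VariableNames', {'Reference', 'Reads', 'RelAbundance'}));
disp(otu');
```

Here otu holds one cluster index per leaf of the tree, so references that share an index fall into the same candidate OTU at the chosen threshold. The distance method, the linkage method, and the threshold are all choices you would need to adapt to your own data.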
Feel free to ask more specific questions about any of these steps in a follow-up question. If you need broader help with constructing a pipeline to do this analysis, we do offer consulting.
Hope this helps,
-Tim
  1 Comment
Mattana Pongsopon on 2 Apr 2019
Hi Tim,
Thank you so much for your clear guidelines. I will work through them and see if I need further help.
Best,
Mattana
