blastreadlocal

Read data from local BLAST report

Syntax

Data = blastreadlocal(BLASTReport, Format)

Input Arguments

BLASTReport

BLAST report specified by any of the following:

  • File name or path and file name of a locally created BLAST report file, such as one created by the blastlocal function with the 'ToFile' property.

  • MATLAB® character array that contains the text for a local BLAST report.

If you specify only a file name, that file must be on the MATLAB search path or in the current folder.
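The two input forms can be sketched as follows. This is a minimal illustration, not a definitive usage pattern: the file name myecoli_nt8.txt and the format value 8 (tabular) are assumptions taken from the examples on this page, and the exist guard is there only so the snippet runs when no report file is present.

```matlab
% Assumed file name: a report written earlier by blastlocal with
% 'tofile', using the tabular alignment format (8).
reportFile = 'myecoli_nt8.txt';

if exist(reportFile, 'file') == 2
    % Option 1: pass the file name directly.
    DataFromFile = blastreadlocal(reportFile, 8);

    % Option 2: load the report text into a character array first,
    % then pass the text itself.
    reportText = fileread(reportFile);
    DataFromText = blastreadlocal(reportText, 8);
else
    warning('Report file %s not found; skipping the example.', reportFile);
end
```

In both cases the Format value must match the alignment format that was used when the report was created.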

Format

Integer specifying the alignment format used to create BLASTReport. Choices are:

  • 0 — Pairwise

  • 1 — Query-anchored, showing identities

  • 2 — Query-anchored, no identities

  • 3 — Flat query-anchored, showing identities

  • 4 — Flat query-anchored, no identities

  • 5 — Query-anchored, no identities and blunt ends

  • 6 — Flat query-anchored, no identities and blunt ends

  • 7 — Not used

  • 8 — Tabular

  • 9 — Tabular with comment lines

Output Arguments

Data

MATLAB structure or array of structures (if multiple query sequences) containing fields corresponding to BLAST keywords and data from a local BLAST report.

Description

The Basic Local Alignment Search Tool (BLAST) offers a fast and powerful comparative analysis of protein and nucleotide sequences against known sequences in online and local databases. BLAST reports can be lengthy, and parsing the data from the various formats can be cumbersome.

Data = blastreadlocal(BLASTReport, Format) reads BLASTReport, a locally created BLAST report file, and returns Data, a MATLAB structure or array of structures (if multiple query sequences) containing fields corresponding to BLAST keywords and data from a local BLAST report. Format is an integer specifying the alignment format used to create BLASTReport.

    Note:   The function assumes the BLAST report was produced using version 2.2.17 of the blastall executable.

Data contains a subset of the following fields, based on the specified alignment format.

Field                       Description
Algorithm                   NCBI algorithm used to do a BLAST search.
Query                       Identifier of the query sequence submitted to a BLAST search.
Length                      Length of the query sequence.
Database                    All databases searched.
Hits.Name                   Name of a database sequence (subject sequence) that matched the query sequence.
Hits.Score                  Alignment score between the query sequence and the subject sequence.
Hits.Expect                 Expectation value for the alignment between the query sequence and the subject sequence.
Hits.Length                 Length of a subject sequence.
Hits.HSPs.Score             Pairwise alignment score for a high-scoring sequence pair between the query sequence and a subject sequence.
Hits.HSPs.Expect            Expectation value for a high-scoring sequence pair between the query sequence and a subject sequence.
Hits.HSPs.Identities        Identities (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject sequence.
Hits.HSPs.Positives         Identical or similar residues (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject amino acid sequence.
                            Note: This field applies only to translated nucleotide or amino acid query sequences and/or databases.
Hits.HSPs.Gaps              Nonaligned residues (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject sequence.
Hits.HSPs.Mismatches        Residues that are not similar to each other (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject sequence.
Hits.HSPs.Frame             Reading frame of the translated nucleotide sequence for a high-scoring sequence pair between the query sequence and a subject sequence.
                            Note: This field applies only when performing translated searches, that is, when using tblastx, tblastn, and blastx.
Hits.HSPs.Strand            Sense (Plus = 5' to 3' and Minus = 3' to 5') of the DNA strands for a high-scoring sequence pair between the query sequence and a subject sequence.
                            Note: This field applies only when using a nucleotide query sequence and database.
Hits.HSPs.Alignment         Three-row matrix showing the alignment for a high-scoring sequence pair between the query sequence and a subject sequence.
Hits.HSPs.QueryIndices      Indices of the query sequence residue positions for a high-scoring sequence pair between the query sequence and a subject sequence.
Hits.HSPs.SubjectIndices    Indices of the subject sequence residue positions for a high-scoring sequence pair between the query sequence and a subject sequence.
Hits.HSPs.AlignmentLength   Length of the pairwise alignment for a high-scoring sequence pair between the query sequence and a subject sequence.
Alignment                   Entire alignment for the query sequence and the subject sequence(s).
Statistics                  Summary of statistical details about the performed search, such as lambda values, gap penalties, number of sequences searched, and number of hits.
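
Because Data is an array of structures with nested Hits entries, and only a subset of the fields is present for a given format, code that walks the results may need to guard field access. The following is a minimal sketch of one way to do that. The Data structure built at the top is a hypothetical stand-in with placeholder values (not real BLAST output), included only so the snippet runs without a report file. It also assumes that Hits.Score and Hits.Expect are numeric.

```matlab
% Hypothetical stand-in for the output of blastreadlocal. The names and
% values below are placeholders, not real BLAST results.
Data = struct('Query', {'queryA', 'queryB'}, 'Hits', {[], []});
Data(1).Hits = struct('Name', 'subject1', 'Score', 100, 'Expect', 1e-20);
Data(2).Hits = struct('Name', {}, 'Score', {}, 'Expect', {});   % query with no hits

% Count the hits for each query and print a one-line summary per hit.
nHits = zeros(1, numel(Data));
for q = 1:numel(Data)
    if ~isfield(Data, 'Hits')
        break   % this alignment format did not produce a Hits field
    end
    nHits(q) = numel(Data(q).Hits);
    fprintf('Query %s: %d hit(s)\n', Data(q).Query, nHits(q));
    for h = 1:nHits(q)
        hit = Data(q).Hits(h);
        % Guard optional fields, since only a subset may be present.
        if isfield(hit, 'Score') && isfield(hit, 'Expect')
            fprintf('  %s: score %g, expect %g\n', hit.Name, hit.Score, hit.Expect);
        end
    end
end
```

With a real report, replace the stand-in assignment with the structure returned by blastreadlocal; the loop itself does not change.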

Examples

The following examples assume that you have a FASTA nucleotide file for E. coli, such as NC_004431.fna (which you can download from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/), saved in your MATLAB current folder.

Reading Data Using a Tabular Alignment Format

  1. Create a local blastable database from the NC_004431.fna FASTA file.

    blastformat('inputdb', 'NC_004431.fna', 'protein', 'false');
  2. Use the getgenbank function to retrieve two sequences from the GenBank® database.

    S1 = getgenbank('M28570.1');
    S2 = getgenbank('M12565');
  3. Use the fastawrite function to create a query FASTA file named query_multi_nt.fa from these two sequences, using only the accession number as the header.

    Seqs(1).Header = S1.Accession;
    Seqs(1).Sequence = S1.Sequence;
    Seqs(2).Header = S2.Accession;
    Seqs(2).Sequence = S2.Sequence;
    fastawrite('query_multi_nt.fa', Seqs);
  4. Submit the query sequences in the query_multi_nt.fa FASTA file for a BLAST search of the local nucleotide database NC_004431.fna. Specify the BLAST program blastn and a tabular alignment format. Save the contents of the BLAST report to a file named myecoli_nt8.txt, and then read the local BLAST report.

    blastlocal('inputquery', 'query_multi_nt.fa',...
               'database', 'NC_004431.fna',...
               'tofile', 'myecoli_nt8.txt', 'program', 'blastn',...
               'format', 8);
    results = blastreadlocal('myecoli_nt8.txt', 8);
    

Reading Data Using a Query-Anchored Format

  1. If you have not already done so, create a local blastable database and a query file as described in steps 1 through 3 in Reading Data Using a Tabular Alignment Format.

  2. Submit the query sequences in the query_multi_nt.fa FASTA file for a BLAST search of the local nucleotide database NC_004431.fna. Specify the BLAST program blastn and a query-anchored format. Save the contents of the BLAST report to a file named myecoli_nt1.txt, and then read the local BLAST report, saving the results in results, an array of structures.

    blastlocal('inputquery', 'query_multi_nt.fa',...
               'database', 'NC_004431.fna',...
               'tofile', 'myecoli_nt1.txt', 'program', 'blastn',...
               'format', 1);
    results = blastreadlocal('myecoli_nt1.txt', 1);
    

References

For more information about reading and interpreting BLAST reports, see:

http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs