blastlocal

Perform search on local BLAST database to create BLAST report

Syntax

blastlocal('InputQuery', InputQueryValue)
Data = blastlocal('InputQuery', InputQueryValue)

... blastlocal(..., 'Program', ProgramValue, ...)
... blastlocal(..., 'Database', DatabaseValue, ...)
... blastlocal(..., 'BlastPath', BlastPathValue, ...)
... blastlocal(..., 'Expect', ExpectValue, ...)
... blastlocal(..., 'Format', FormatValue, ...)
... blastlocal(..., 'ToFile', ToFileValue, ...)
... blastlocal(..., 'Filter', FilterValue, ...)
... blastlocal(..., 'GapOpen', GapOpenValue, ...)
... blastlocal(..., 'GapExtend', GapExtendValue, ...)
... blastlocal(..., 'BLASTArgs', BLASTArgsValue, ...)

Input Arguments

InputQueryValue

String specifying the file name, or path and file name, of a FASTA file containing query nucleotide or amino acid sequence(s). (This corresponds to the blastall option -i.)

ProgramValue

String specifying a BLAST program. Choices are:

  • 'blastp' (default) — Search protein query versus protein database.

  • 'blastn' — Search nucleotide query versus nucleotide database.

  • 'blastx' — Search translated nucleotide query versus protein database.

  • 'tblastn' — Search protein query versus translated nucleotide database.

  • 'tblastx' — Search translated nucleotide query versus translated nucleotide database.

(The ProgramValue argument corresponds to the blastall option -p.)

DatabaseValue

String specifying a file name, or path and file name, of a local BLAST database (formatted using the NCBI formatdb function) to search. Default is a local version of the nr database in the MATLAB® current folder. (This corresponds to the blastall option -d.)

BlastPathValue

String specifying the full path to the blastall executable file, including the name and extension of the executable file. Default is the system path.

ExpectValue

Value specifying the statistical significance threshold for matches against database sequences. Choices are any real number. Default is 10. (This corresponds to the blastall option -e.)

FormatValue

Integer specifying the alignment format of the BLAST search results. Choices are:
  • 0 (default) — Pairwise

  • 1 — Query-anchored, showing identities

  • 2 — Query-anchored, no identities

  • 3 — Flat query-anchored, showing identities

  • 4 — Flat query-anchored, no identities

  • 5 — Query-anchored, no identities and blunt ends

  • 6 — Flat query-anchored, no identities and blunt ends

  • 8 — Tabular

  • 9 — Tabular with comment lines

(This corresponds to the blastall option -m.)

ToFileValue

String specifying a file name, or path and file name, in which to save the contents of the BLAST report. (This corresponds to the blastall option -o.)

FilterValue

Controls the application of a filter (DUST filter for the blastn program or SEG filter for other programs) to the query sequence(s). Choices are true (default) or false. (This corresponds to the blastall option -F.)

GapOpenValue

Integer that specifies the penalty for opening a gap in the alignment of sequences. Default is -1. (This corresponds to the blastall option -G.)

GapExtendValue

Integer that specifies the penalty for extending a gap in the alignment of sequences. Default is -1. (This corresponds to the blastall option -E.)

BLASTArgsValue

NCBI blastall command string, that is, a string containing one or more instances of -x and the option associated with it, used to specify input arguments. For an example, see step 2 in Performing a Nucleotide Search and Creating a Formatted Report.

Output Arguments

Data

MATLAB structure or array of structures (if multiple query sequences) containing fields corresponding to BLAST keywords and data from a local BLAST report.

Description

This function assumes that

The Basic Local Alignment Search Tool (BLAST) offers a fast and powerful comparative analysis of protein and nucleotide sequences against known sequences in online or local databases.

    Note:   To use the blastlocal function, you must have a local copy of the NCBI blastall executable file (version 2.2.17) available from your system. You can download the blastall executable file by accessing BLAST+ executables, then clicking the download link under the blast column for your platform. Run the downloaded executable and configure it for your system.

    For more information, see the readme file on the NCBI ftp site.

    For convenience, consider placing the NCBI blastall executable file on your system path.

blastlocal('InputQuery', InputQueryValue) submits query sequence(s) specified by InputQueryValue, a FASTA file containing nucleotide or amino acid sequence(s), for a BLAST search of a local BLAST database, by calling a local version of the NCBI blastall executable file. The BLAST search results are displayed in the MATLAB Command Window. (This corresponds to the blastall option -i.)

Data = blastlocal('InputQuery', InputQueryValue) returns the BLAST search results in Data, a MATLAB structure or array of structures (if multiple query sequences) containing fields corresponding to BLAST keywords and data from a local BLAST report.

Data contains a subset of the following fields, based on the specified alignment format.

Field
Description

Algorithm
NCBI algorithm used to do a BLAST search.

Query
Identifier of the query sequence submitted to a BLAST search.

Length
Length of the query sequence.

Database
All databases searched.

Hits.Name
Name of a database sequence (subject sequence) that matched the query sequence.

Hits.Score
Alignment score between the query sequence and the subject sequence.

Hits.Expect
Expectation value for the alignment between the query sequence and the subject sequence.

Hits.Length
Length of a subject sequence.

Hits.HSPs.Score
Pairwise alignment score for a high-scoring sequence pair between the query sequence and a subject sequence.

Hits.HSPs.Expect
Expectation value for a high-scoring sequence pair between the query sequence and a subject sequence.

Hits.HSPs.Identities
Identities (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject sequence.

Hits.HSPs.Positives
Identical or similar residues (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject amino acid sequence.

    Note:   This field applies only to translated nucleotide or amino acid query sequences and/or databases.

Hits.HSPs.Gaps
Nonaligned residues (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject sequence.

Hits.HSPs.Mismatches
Residues that are not similar to each other (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject sequence.

Hits.HSPs.Frame
Reading frame of the translated nucleotide sequence for a high-scoring sequence pair between the query sequence and a subject sequence.

    Note:   This field applies only when performing translated searches, that is, when using tblastx, tblastn, and blastx.

Hits.HSPs.Strand
Sense (Plus = 5' to 3' and Minus = 3' to 5') of the DNA strands for a high-scoring sequence pair between the query sequence and a subject sequence.

    Note:   This field applies only when using a nucleotide query sequence and database.

Hits.HSPs.Alignment
Three-row matrix showing the alignment for a high-scoring sequence pair between the query sequence and a subject sequence.

Hits.HSPs.QueryIndices
Indices of the query sequence residue positions for a high-scoring sequence pair between the query sequence and a subject sequence.

Hits.HSPs.SubjectIndices
Indices of the subject sequence residue positions for a high-scoring sequence pair between the query sequence and a subject sequence.

Hits.HSPs.AlignmentLength
Length of the pairwise alignment for a high-scoring sequence pair between the query sequence and a subject sequence.

Alignment
Entire alignment for the query sequence and the subject sequence(s).

Statistics
Summary of statistical details about the performed search, such as lambda values, gap penalties, number of sequences searched, and number of hits.
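
As a rough illustration of how these fields nest, the following sketch builds a small stand-in structure by hand and selects the hits whose best HSP expectation value falls below a chosen threshold. The hit names, scores, expectation values, and the threshold are invented for the example and are not blastlocal output. With a real search, Data would come from a blastlocal call, and the fields present would depend on the alignment format.

```matlab
% Hypothetical stand-in for a structure returned by blastlocal.
% All names and numbers below are made up for illustration only.
hsps1 = struct('Score', {250, 40}, 'Expect', {1e-60, 0.5});
hsps2 = struct('Score', {90},      'Expect', {1e-12});
Data.Query = 'query1';
Data.Hits  = struct('Name', {'subjectA', 'subjectB'}, ...
                    'HSPs', {hsps1, hsps2});

% Collect the smallest HSP expectation value for each hit.
nHits = numel(Data.Hits);
bestExpect = zeros(1, nHits);
for k = 1:nHits
    bestExpect(k) = min([Data.Hits(k).HSPs.Expect]);
end

% Keep only the names of hits below an (arbitrary) threshold.
threshold = 1e-20;
keptNames = {Data.Hits(bestExpect < threshold).Name};
```

Because Hits and HSPs are structure arrays, the bracket and brace expressions above gather one field across all elements without an explicit inner loop.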

... blastlocal(..., 'PropertyName', PropertyValue, ...) calls blastlocal with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows.


... blastlocal(..., 'Program', ProgramValue, ...) specifies the BLAST program. Choices are 'blastp' (default), 'blastn', 'blastx', 'tblastn', and 'tblastx'. (This corresponds to the blastall option -p.) For help in selecting an appropriate BLAST program, visit:

http://blast.ncbi.nlm.nih.gov/producttable.shtml

... blastlocal(..., 'Database', DatabaseValue, ...) specifies the local BLAST database (formatted using the NCBI formatdb function) to search. Default is a local version of the nr database in the MATLAB current folder. (This corresponds to the blastall option -d.)

... blastlocal(..., 'BlastPath', BlastPathValue, ...) specifies the full path to the blastall executable file, including the name and extension of the executable file. Default is the system path.

... blastlocal(..., 'Expect', ExpectValue, ...) specifies a statistical significance threshold for matches against database sequences. Choices are any real number. Default is 10. (This corresponds to the blastall option -e.) You can learn more about the statistics of local sequence comparison at:

http://blast.ncbi.nlm.nih.gov/tutorial/Altschul-1.html#head2

... blastlocal(..., 'Format', FormatValue, ...) specifies the alignment format of the BLAST search results. Choices are:

  • 0 (default) — Pairwise

  • 1 — Query-anchored, showing identities

  • 2 — Query-anchored, no identities

  • 3 — Flat query-anchored, showing identities

  • 4 — Flat query-anchored, no identities

  • 5 — Query-anchored, no identities and blunt ends

  • 6 — Flat query-anchored, no identities and blunt ends

  • 7 — Not used

  • 8 — Tabular

  • 9 — Tabular with comment lines

(This corresponds to the blastall option -m.)

... blastlocal(..., 'ToFile', ToFileValue, ...) saves the contents of the BLAST report to the specified file. (This corresponds to the blastall option -o.)

... blastlocal(..., 'Filter', FilterValue, ...) specifies whether a filter (DUST filter for the blastn program or SEG filter for other programs) is applied to the query sequence(s). Choices are true (default) or false. (This corresponds to the blastall option -F.)

... blastlocal(..., 'GapOpen', GapOpenValue, ...) specifies the penalty for opening a gap in the alignment of sequences. Default is -1. (This corresponds to the blastall option -G.)

... blastlocal(..., 'GapExtend', GapExtendValue, ...) specifies the penalty for extending a gap in the alignment of sequences. Default is -1. (This corresponds to the blastall option -E.)

... blastlocal(..., 'BLASTArgs', BLASTArgsValue, ...) specifies options using the input arguments for the NCBI blastall function. BLASTArgsValue is a string containing one or more instances of -x and the option associated with it. For example, to specify the BLOSUM 45 matrix, you would use the following syntax:

blastlocal('InputQuery', 'ecoliquery.txt', 'BLASTArgs', '-M BLOSUM45')

    Tip:   Use the 'BLASTArgs' property to specify blastall options for which there are no corresponding property name/property value pairs.

    Note:   For a complete list of valid input arguments for the NCBI blastall function, make sure that the blastall executable file is located on your system path or current folder, then type the following at your system's command prompt.

    blastall -
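
When several blastall options need to be passed this way, it can help to assemble the BLASTArgs string from option/value pairs rather than typing it by hand. The following is a minimal sketch under that assumption. The variable names (opts, parts, blastArgs) are hypothetical, and the -e value is an arbitrary choice for illustration.

```matlab
% Hypothetical option/value pairs; each row is one "-x value" instance.
opts = {'-M', 'BLOSUM45'; ...
        '-e', '0.001'};

% Join each flag with its value, then join the pairs with spaces.
parts = cell(1, size(opts, 1));
for k = 1:size(opts, 1)
    parts{k} = sprintf('%s %s', opts{k, 1}, opts{k, 2});
end
blastArgs = strjoin(parts, ' ');

% The resulting string could then be passed to blastlocal, for example:
% Data = blastlocal('InputQuery', 'ecoliquery.txt', 'BLASTArgs', blastArgs);
```

The blastlocal call is left commented out because it needs a local blastall executable and a formatted database.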

Using blastall Syntax

You can also use the syntax and input arguments accepted by the NCBI blastall function, instead of the property name/property value pairs listed previously. To do so, supply a single string containing multiple options using the -x option syntax. For example, you can specify the ecoliquery.txt FASTA file as your query sequences, the blastp program, and the ecoli local database, by using

blastlocal('-i ecoliquery.txt -p blastp -d ecoli')

    Note:   For a complete list of valid input arguments for the NCBI blastall function, make sure that the blastall executable file is located on your system path or current folder, then type the following at your system's command prompt.

    blastall -
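
If the query file, database, or threshold are held in variables, the single-string form can be assembled with sprintf. This is only a sketch: the variable names are hypothetical, and the flags used are the ones this page maps to InputQuery (-i), Database (-d), Program (-p), and Expect (-e).

```matlab
% Hypothetical inputs; substitute your own file and database names.
queryFile   = 'query_nt.fa';
database    = 'NC_004431.fna';
program     = 'blastn';
expectValue = 0.0001;

% Build the blastall-style option string.
cmd = sprintf('-i %s -d %s -p %s -e %g', ...
              queryFile, database, program, expectValue);

% The string could then be passed as the only input, for example:
% results = blastlocal(cmd);
```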

Examples

The following examples assume you have a FASTA nucleotide file and a FASTA amino acid file for E. coli, such as the files NC_004431.fna and NC_004431.faa, which you can download from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/, saved to your MATLAB current folder.

Performing a Nucleotide Translated Search

  1. Create local blastable databases from the NC_004431.fna and NC_004431.faa FASTA files by using the blastformat function.

    % Nucleotide database: set 'protein' to 'false'.
    blastformat('inputdb', 'NC_004431.fna', 'protein', 'false');
    % Protein database.
    blastformat('inputdb', 'NC_004431.faa');
  2. Use the getgenbank function to retrieve sequence information for the E. coli threonine operon from the GenBank® database.

    S = getgenbank('M28570');
  3. Create a query file by using the fastawrite function to create a FASTA file named query_nt.fa from this sequence information, using only the accession number as the header.

    S.Header = S.Accession;
    fastawrite('query_nt.fa', S);
  4. Use MATLAB syntax to submit the query sequence in the query_nt.fa FASTA file for a BLAST search of the local amino acid database NC_004431.faa. Specify the BLAST program blastx. Return the BLAST search results in results, a MATLAB structure.

    results = blastlocal('inputquery', 'query_nt.fa',...
                         'database', 'NC_004431.faa',...
                         'program',  'blastx');

Performing a Nucleotide Search Using blastall Syntax

  1. If you have not already done so, create local blastable databases and a query file as described in steps 1 through 3 in Performing a Nucleotide Translated Search.

  2. Use blastall syntax to submit the query sequence in the query_nt.fa FASTA file for a BLAST search of the local nucleotide database NC_004431.fna. Specify the BLAST program blastn and an expectation value of 0.0001. Return the BLAST search results in results, a MATLAB structure.

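    % Concatenate the option string across lines; a quoted string
    % cannot be continued with ... inside the quotation marks.
    results = blastlocal(['-i query_nt.fa -d NC_004431.fna ', ...
                          '-p blastn -e 0.0001']);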

Performing a Nucleotide Search and Creating a Formatted Report

  1. If you have not already done so, create local blastable databases and a query file as described in steps 1 through 3 in Performing a Nucleotide Translated Search.

  2. Submit the query sequence in the query_nt.fa FASTA file for a BLAST search of the local nucleotide database NC_004431.fna. Specify the BLAST program blastn and a tabular alignment format. Save the contents of the BLAST report to a file named myecoli_nt.txt.

    blastlocal('inputquery', 'query_nt.fa',...
               'database',   'NC_004431.fna',...
               'tofile',     'myecoli_nt.txt',...
               'blastargs',  '-p blastn -m 8');
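
A saved tabular report can be read back into MATLAB with textscan. The sketch below assumes the standard 12-column, tab-delimited layout of the blastall tabular format: query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, and bit score. To stay self-contained it writes two made-up rows to a temporary file first. To read a real report, point reportFile at myecoli_nt.txt and skip the writing and deleting steps.

```matlab
% Write a made-up two-row tabular report (values are illustrative only).
reportFile = [tempname '.txt'];
fid = fopen(reportFile, 'w');
fprintf(fid, 'M28570\tsubjectA\t99.50\t200\t1\t0\t1\t200\t501\t700\t1e-100\t380\n');
fprintf(fid, 'M28570\tsubjectB\t85.00\t100\t15\t0\t10\t109\t30\t129\t2e-20\t95.1\n');
fclose(fid);

% Read the 12 tab-delimited columns: two text IDs, then ten numbers.
fid = fopen(reportFile, 'r');
C = textscan(fid, '%s %s %f %f %f %f %f %f %f %f %f %f', ...
             'Delimiter', '\t');
fclose(fid);
delete(reportFile);   % remove the temporary sample file

% Pull out a few columns by position.
subjectIds      = C{2};
percentIdentity = C{3};
eValues         = C{11};
bitScores       = C{12};
```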

References

For more information on the NCBI blastall function, see:

http://blast.ncbi.nlm.nih.gov/docs/blastall.html