BWAMEMOptions
Description
Creation
Description
bwamemOpt = BWAMEMOptions creates a BWAMEMOptions object with the default property values.
BWAMEMOptions requires the BWA Support Package for Bioinformatics Toolbox™. If the support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.
bwamemOpt = BWAMEMOptions(Name,Value) sets the object properties using one or more name-value pair arguments. Enclose each property name in quotes. For example, bwamemOpt = BWAMEMOptions('BandWidth',90) sets the maximum allowable gap length to 90.
bwamemOpt = BWAMEMOptions(S) specifies optional parameters using a string or character vector S.
Input Arguments
S
— bwamem options
character vector | string
bwamem options, specified as a character vector or string. S must be in the bwa mem option syntax (prefixed by one or two dashes).
Example: '-k14 -W20 -r10'
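The three creation syntaxes can be compared side by side. The following is a minimal sketch that assumes the BWA support package is installed; the variable names are illustrative only.

```matlab
% Default option set.
optDefault = BWAMEMOptions;

% Name-value syntax: set the maximum allowable gap length to 90.
optNameValue = BWAMEMOptions('BandWidth',90);

% Native bwa mem syntax passed as a character vector
% (the example string from this page).
optNative = BWAMEMOptions('-k14 -W20 -r10');
```

Any of these objects can then be passed to bwamem as the options argument.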
Properties
AlternativeHitsThreshold
— Threshold for determining which hits receive XA tag in output SAM file
[5 200]
(default) | nonnegative integer | two-element numeric vector
Threshold for determining which hits receive an XA tag in the output SAM file, specified as a nonnegative integer n or two-element numeric vector [n m], where n and m must be nonnegative integers.
If a read has fewer than n hits with a score greater than 80% of the best score for that read, all hits receive an XA tag in the output SAM file.
When you also specify m, the software returns up to m hits if the hit list contains a hit to an ALT contig.
Data Types: double
AppendReadCommentsToSAM
— Flag to append FASTA or FASTQ comments to output SAM file
false
(default) | true
Flag to append FASTA or FASTQ comments to the output SAM file, specified as true or false. The comments appear as text after a space in the file header.
Data Types: logical
BandWidth
— Maximum allowable gap length
100
(default) | nonnegative integer
Maximum allowable gap length, specified as a nonnegative integer.
Data Types: double
BasesPerBatch
— Number of bases per batch
[]
(default) | positive integer
Number of bases per batch, specified as a positive integer.
If you do not specify BasesPerBatch, the software uses 1e7 * NumThreads by default. NumThreads is the number of parallel threads available when you run bwamem.
If you specify BasesPerBatch, the software uses that exact number and does not multiply the number by NumThreads. This rule applies regardless of whether you explicitly set NumThreads or not.
However, if you specify NumThreads but not BasesPerBatch, the software uses 1e7 * NumThreads.
By default, the batch size is proportional to the number of parallel threads in use, so using different numbers of threads might produce different outputs. Specifying BasesPerBatch helps with the reproducibility of results.
Data Types: double
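The batch-size rule above can be sketched as follows. This is an illustrative example, not a prescribed workflow; it assumes the BWA support package is installed, and the variable defaultBatch is hypothetical.

```matlab
opt = BWAMEMOptions;
opt.NumThreads = 4;

% With BasesPerBatch left empty, the documented default batch size is
% 1e7 * NumThreads, which is 4e7 bases for 4 threads.
defaultBatch = 1e7 * opt.NumThreads;

% Fix the batch size explicitly so that it no longer depends on the
% thread count. The software uses this exact number.
opt.BasesPerBatch = 1e7;
```

Keeping BasesPerBatch fixed while varying NumThreads is one way to compare runs under the same batching.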
ClipPenalty
— Penalty for clipped alignments
[5 5]
(default) | nonnegative integer | two-element numeric vector
Penalty for clipped alignments, specified as a nonnegative integer or two-element numeric vector. Each read has the best score for an alignment that spans the length of the read. The software does not clip alignments that do not span the length of the read and do not score higher than the sum of ClipPenalty and the best score of the full-length read.
Specify a nonnegative integer to set the same penalty for both 5' and 3' clipping.
Specify a two-element numeric vector to set different penalties for 5' and 3' clipping.
Data Types: double
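As a brief illustration of the scalar and vector forms, the sketch below sets ClipPenalty both ways. It assumes the BWA support package is installed and that the first vector element applies to 5' clipping and the second to 3' clipping, which is an assumption based on the order in which this page names them.

```matlab
opt = BWAMEMOptions;

% Scalar form: same penalty for 5' and 3' clipping.
opt.ClipPenalty = 5;

% Vector form: different penalties (assumed order: [5' 3']).
opt.ClipPenalty = [5 10];
```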
DropChainFraction
— Threshold for dropping chains relative to longest overlapping chain
0.5
(default) | scalar between 0 and 1
Threshold for dropping chains relative to the longest overlapping chain, specified as a scalar between 0 and 1.
The software drops chains that are shorter than DropChainFraction * (longest overlapping chain length).
Data Types: double
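The drop rule can be worked through with plain arithmetic. The chain lengths below are hypothetical values chosen for illustration, and the code does not call the toolbox.

```matlab
dropChainFraction = 0.5;      % default value of DropChainFraction
longestOverlapLength = 120;   % hypothetical longest overlapping chain length
chainLengths = [40 60 90];    % hypothetical overlapping chain lengths

% Chains shorter than this cutoff are dropped: 0.5 * 120 = 60.
cutoff = dropChainFraction * longestOverlapLength;

% Only the first chain (40) is shorter than the cutoff.
isDropped = chainLengths < cutoff;
```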
DropChainLength
— Minimum number of bases
0
(default) | nonnegative integer
Minimum number of bases in seeds forming a chain, specified as a nonnegative integer. The software drops chains shorter than DropChainLength.
Data Types: double
ExtraCommand
— Additional commands
""
(default) | character vector | string
Additional commands, specified as a character vector or string.
The commands must be in the native syntax (prefixed by one or two dashes). Use this option to apply undocumented flags and flags without corresponding MATLAB® properties.
When the software converts the original flags to MATLAB properties, it stores any unrecognized flags in this property.
Example: '-y'
Data Types: char | string
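A minimal sketch of passing a native flag through ExtraCommand follows. It assumes the BWA support package is installed; that the flag appears unchanged in the getCommand output is an assumption, not something this page states.

```matlab
opt = BWAMEMOptions;

% Pass a native flag that has no corresponding MATLAB property
% (the example flag from this page).
opt.ExtraCommand = '-y';

% Translate the object to the original options syntax for inspection.
cmd = getCommand(opt);
```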
FastaHeaderToXR
— Flag to include FASTA header in XR tag
false
(default) | true
Flag to include the FASTA header in the XR tag, specified as true or false.
Data Types: logical
GapExtensionPenalty
— Gap extension penalty
[1 1]
(default) | nonnegative integer | two-element numeric vector
Gap extension penalty, specified as a nonnegative integer or two-element numeric vector [n m]. n is the penalty for extending a deletion. m is the penalty for extending an insertion.
If you specify a nonnegative integer, the software uses it as the penalty for extending a deletion or an insertion.
Data Types: double
GapOpenPenalty
— Gap opening penalty
[6 6]
(default) | nonnegative integer | two-element numeric vector
Gap opening penalty, specified as a nonnegative integer or two-element numeric vector [n m]. n is the penalty for opening a deletion. m is the penalty for opening an insertion.
If you specify a nonnegative integer, the software uses it as the penalty for opening a deletion or an insertion.
Data Types: double
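The scalar and [n m] forms of the two gap penalties can be set as in the sketch below, which assumes the BWA support package is installed. The penalty values are arbitrary illustrations, not recommendations.

```matlab
opt = BWAMEMOptions;

% Vector form: deletion opening penalty 6, insertion opening penalty 8.
opt.GapOpenPenalty = [6 8];

% Scalar form: the same extension penalty for deletions and insertions.
opt.GapExtensionPenalty = 2;
```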
HeaderInsert
— Text to insert into header of output SAM file
[0x0 string]
(default) | character vector | string
Text to insert into the header of the output SAM file, specified as a character vector or string.
Use one of the following:
Character vector or string that starts with @ to insert the exact text into the SAM header
Character vector or string that is a file name, where each line in the file must start with @
Data Types: char | string
IncludeAll
— Flag to use all object properties
false
(default) | true
Flag to include all the object properties with the corresponding default values when converting to the original options syntax, specified as true or false. You can convert the properties to the original syntax prefixed by one or two dashes (such as '-d 100 -e 80') by using getCommand. The default value false means that when you call getCommand(optionsObject), it converts only the specified properties. If the value is true, getCommand converts all available properties, with default values for unspecified properties, to the original syntax.
Note
If you set IncludeAll to true, the software converts all available properties, using default values for unspecified properties. The only exception is when the default value of a property is NaN, Inf, [], '', or "". In this case, the software does not translate the corresponding property.
Example: true
Data Types: logical
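The effect of IncludeAll on getCommand can be sketched as follows, assuming the BWA support package is installed. The variable names are illustrative, and the exact strings returned are not shown because they depend on the installed version.

```matlab
opt = BWAMEMOptions;
opt.NumThreads = 4;

% IncludeAll is false by default, so only the explicitly set
% property is converted.
cmdSpecified = getCommand(opt);

% With IncludeAll set to true, all properties with translatable
% default values are converted as well.
opt.IncludeAll = true;
cmdAll = getCommand(opt);
```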
InsertSizeStatistics
— Insert size distribution parameters
[1x0 double]
(default) | four-element numeric array
Insert size distribution parameters, specified as a four-element numeric array [mean std max min].
mean is the mean insert size.
std is the standard deviation.
max is the maximum insert size.
min is the minimum insert size.
If you specify an array of n elements, where n is less than four, the elements specify the first n distribution parameters. By default, the software infers unspecified parameters from the data.
Data Types: double
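A partial and a full specification might look like the sketch below. It assumes the BWA support package is installed; the numeric values are hypothetical.

```matlab
opt = BWAMEMOptions;

% Two elements: mean and standard deviation only. The maximum and
% minimum insert sizes are left for the software to infer from the data.
opt.InsertSizeStatistics = [500 50];

% Four elements: [mean std max min].
opt.InsertSizeStatistics = [500 50 800 200];
```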
MarkShortSplitsSecond
— Flag to mark shorter split hits as secondary
false
(default) | true
Flag to mark the shorter split hits as secondary in the SAM flag, specified as true or false.
Data Types: logical
MarkSmallestCoordinatePrimary
— Flag to mark segment with smallest coordinates as primary
false
(default) | true
Flag to mark the segment with the smallest coordinates as primary when the alignment is split, specified as true or false.
Data Types: logical
MatchScore
— Score for sequence match
1
(default) | nonnegative integer
Score for a sequence match, specified as a nonnegative integer.
Data Types: double
MaxMemOccurrence
— Maximum number of MEM occurrences
500
(default) | positive integer
Maximum number of MEM (maximal exact match) occurrences for each read before it is discarded, specified as a positive integer.
Data Types: double
MaxRoundsMateRescue
— Maximum number of rounds of mate rescue
50
(default) | nonnegative integer
Maximum number of rounds of mate rescue for each read, specified as a nonnegative integer. The software uses the Smith-Waterman (SW) algorithm for the mate rescue.
Data Types: double
MinSeedLength
— Minimum seed length
19
(default) | positive integer
Minimum seed length, specified as a positive integer. The software discards any matches shorter than the minimum seed length.
Data Types: double
MismatchPenalty
— Penalty for alignment mismatch
4
(default) | nonnegative integer
Penalty for an alignment mismatch, specified as a nonnegative integer.
Data Types: double
NumThreads
— Number of parallel threads to use
1
(default) | positive integer
Number of parallel threads to use, specified as a positive integer. Threads are run on separate processors or cores. Increasing the number of threads generally improves the runtime significantly, but increases the memory footprint.
Data Types: double
OutputAllAlignments
— Flag to return all found alignments
false
(default) | true
Flag to return all found alignments including unpaired and paired-end reads, specified as true or false. If the value is true, the software returns all found alignments and marks them as secondary alignments.
Data Types: logical
OutputScoreThreshold
— Score threshold for returning alignments
30
(default) | positive integer
Score threshold for returning alignments, specified as a positive integer. Specify the minimum score that alignments must have to be in the output file.
Data Types: double
ReadGroupLine
— Text to insert into read group header
[0x0 string]
(default) | character vector | string
Text to insert into the read group (RG) header line in the output file, specified as a character vector or string.
Data Types: char | string
ReadType
— Type of reads to align
[0x0 string]
(default) | 'pacbio' | 'ont2d' | 'intractg'
Type of reads to align, specified as a character vector or string. Each read type has different default parameter values to use during alignment. You can override any of these parameters. Valid options are:
'pacbio' — PacBio reads
'ont2d' — Oxford Nanopore 2D reads
'intractg' — Intra-species contigs
Each read type has its own set of parameter values and an equivalent native syntax.
Data Types: char | string
ReduceSupplementaryMAPQ
— Flag to reduce mapping quality (MAPQ) score of supplementary alignments
true
(default) | false
Flag to reduce the mapping quality (MAPQ) score of supplementary alignments, specified as true or false.
Data Types: logical
SeedSplitRatio
— Threshold for reseeding
1.50
(default) | nonnegative scalar
Threshold for reseeding, specified as a nonnegative scalar. Specify the seed length at which reseeding happens relative to the minimum seed length MinSeedLength. Specifically, if a MEM (maximal exact match) is longer than MinSeedLength * SeedSplitRatio, reseeding occurs.
Data Types: double
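The reseeding condition can be checked with plain arithmetic using the default values from this page. The MEM lengths are hypothetical, and the code does not call the toolbox.

```matlab
minSeedLength = 19;     % default value of MinSeedLength
seedSplitRatio = 1.5;   % default value of SeedSplitRatio

% Reseeding threshold: 19 * 1.5 = 28.5 bases.
reseedThreshold = minSeedLength * seedSplitRatio;

% Hypothetical MEM lengths. Only MEMs longer than the threshold
% (here 29 and 40) trigger reseeding.
memLengths = [25 28 29 40];
needsReseeding = memLengths > reseedThreshold;
```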
SkipMateRescue
— Flag to skip mate rescue
false
(default) | true
Flag to skip mate rescue, specified as true or false.
Mate rescue uses the Smith-Waterman (SW) algorithm to align unmapped reads with mates that are properly aligned.
Data Types: logical
SkipPairing
— Flag to skip read pairing
false
(default) | true
Flag to skip read pairing, specified as true or false.
If true, for paired-end reads, the software uses the Smith-Waterman (SW) algorithm to rescue missing hits only and does not try to find hits that fit a proper pair.
Data Types: logical
SmartPairing
— Flag to perform smart pairing
false
(default) | true
Flag to perform smart pairing, specified as true or false. If the value is true, the software pairs adjacent reads that are in the same file and have the same name. Such FASTQ files are also known as interleaved files.
Data Types: logical
SoftClipSupplementary
— Flag to soft clip supplemental alignments
false
(default) | true
Flag to soft clip supplemental alignments, specified as true or false. If the value is true, the software soft clips both the supplemental alignments and the primary alignment.
The default value is false, which means that the software soft clips the primary alignment and hard clips the supplemental alignments.
Data Types: logical
TreatAltAsPrimary
— Flag to treat ALT contigs as part of primary assembly
false
(default) | true
Flag to treat ALT contigs as part of the primary assembly, specified as true or false.
Data Types: logical
UnpairedReadPenalty
— Penalty for mapping read pairs as unpaired
17
(default) | nonnegative integer
Penalty for mapping read pairs as unpaired, specified as a nonnegative integer.
The alignment score for a paired read pair is read1 score + read2 score - insert penalty. The alignment score for an unpaired read pair is read1 score + read2 score - UnpairedReadPenalty. The software compares these two scores to force read pairing. A larger UnpairedReadPenalty value leads to a more aggressive read pairing.
Data Types: double
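The score comparison can be illustrated with a small calculation. The sketch assumes, following the bwa mem manual, that a paired score is read1 score + read2 score - insert penalty and an unpaired score is read1 score + read2 score - UnpairedReadPenalty. All score values are hypothetical, and the rule that pairing is preferred when the paired score is not lower is an assumption made for illustration.

```matlab
read1Score = 60;      % hypothetical alignment score of read 1
read2Score = 55;      % hypothetical alignment score of read 2
insertPenalty = 20;   % hypothetical insert penalty

% Compare the default penalty (17) with a larger one (25).
for unpairedReadPenalty = [17 25]
    pairedScore = read1Score + read2Score - insertPenalty;
    unpairedScore = read1Score + read2Score - unpairedReadPenalty;
    % Assumption: pairing is preferred when the paired score is not lower.
    preferPairing = pairedScore >= unpairedScore;
    fprintf('Penalty %d: paired %d, unpaired %d, prefer pairing: %d\n', ...
        unpairedReadPenalty, pairedScore, unpairedScore, preferPairing);
end
```

With these numbers, a penalty of 17 leaves the unpaired score higher (98 versus 95), while a penalty of 25 makes the paired score higher (95 versus 90), which matches the statement that a larger value pairs reads more aggressively.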
Verbosity
— Verbosity level of information printed
0
(default) | nonnegative integer
Verbosity level of information printed to the MATLAB command line while the software is running, specified as a nonnegative integer. Valid options are:
0 — For disabling all outputs to the command line.
1 — For printing error messages.
2 — For printing warning and error messages.
3 — For printing all messages.
4 — For debugging purposes only.
Data Types: double
Version
— Supported version
string
This property is read-only.
Supported version of the original bwa software, returned as a string.
Example: "0.7.17"
Data Types: string
ZDropOff
— Cutoff for Smith-Waterman extension
100
(default) | nonnegative integer
Cutoff for the Smith-Waterman (SW) extension, specified as a nonnegative integer. The software uses the following expression: |i - j| * MatchScore + ZDropOff, where i and j are the current positions of the query and reference, respectively. When the difference between the best score and current extension score is larger than this expression value, the software terminates the SW extension.
Data Types: double
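A worked example of the termination test follows. It assumes the cutoff expression given in the bwa mem manual, |i - j| * A + INT, where A is the match score and INT is the Z-dropoff value. The positions and scores are hypothetical, and the code does not call the toolbox.

```matlab
matchScore = 1;   % default value of MatchScore
zDropOff = 100;   % default value of ZDropOff

i = 250;          % hypothetical current query position
j = 230;          % hypothetical current reference position

% Cutoff: |250 - 230| * 1 + 100 = 120.
cutoff = abs(i - j) * matchScore + zDropOff;

bestScore = 180;     % hypothetical best extension score so far
currentScore = 50;   % hypothetical current extension score

% The drop of 130 exceeds the cutoff of 120, so the extension stops.
stopExtension = (bestScore - currentScore) > cutoff;
```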
Object Functions
getCommand | Translate object properties to original options syntax |
getOptionsTable | Return table with all properties and equivalent options in original syntax |
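Both object functions can be called on an options object as in the sketch below, which assumes the BWA support package is installed. The property values are arbitrary illustrations, and the outputs are not shown because they depend on the installed version.

```matlab
% Create an options object with two properties set explicitly.
opt = BWAMEMOptions('MinSeedLength',25,'NumThreads',2);

% Translate the specified properties to the original options syntax.
cmd = getCommand(opt);

% Return a table of all properties and their equivalent native options.
tbl = getOptionsTable(opt);
```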
Examples
Align Reads to Reference Sequence Using BWA
This example requires the BWA Support Package for Bioinformatics Toolbox™. If the support package is not installed, the software provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.
Build a set of index files for the Drosophila genome. This example uses the reference sequence Dmel_chr4.fa, provided with the toolbox. The 'Prefix' argument lets you define the prefix of the output index files. You can also include the file path information. For this example, define the prefix as Dmel_chr4 and save the index files in the current directory.
bwaindex('Dmel_chr4.fa','Prefix','./Dmel_chr4');
As an alternative to specifying name-value pair arguments, you can use the BWAIndexOptions object to specify the indexing options.
indexOpt = BWAIndexOptions;
indexOpt.Prefix = './Dmel_chr4';
indexOpt.Algorithm = 'bwtsw';
bwaindex('Dmel_chr4.fa',indexOpt);
Once the index files are ready, map the read sequences to the reference using bwamem. Two paired-end read input files are already provided with the toolbox. Using name-value pair arguments, you can specify different alignment options, such as the number of parallel threads to use.
bwamem('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4.sam','NumThreads',4);
Alternatively, you can use BWAMEMOptions to specify the alignment options.
alignOpt = BWAMEMOptions;
alignOpt.NumThreads = 4;
bwamem('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4.sam',alignOpt);
Version History
Introduced in R2020b