bioinfo.pipeline.block.SeqTrim
Description
A SeqTrim block enables you to trim sequences based on a specified criterion.
Creation
Syntax
b = bioinfo.pipeline.block.SeqTrim
b = bioinfo.pipeline.block.SeqTrim(options)
b = bioinfo.pipeline.block.SeqTrim(Name=Value)
Description
b = bioinfo.pipeline.block.SeqTrim creates a SeqTrim block.
b = bioinfo.pipeline.block.SeqTrim(options) also specifies additional options.
b = bioinfo.pipeline.block.SeqTrim(Name=Value) specifies additional options as the property names and values of a SeqTrimOptions object. This object is set as the value of the Options property of the block.
Note
The block always overwrites existing output files, unlike the seqtrim function.
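The three creation syntaxes above can be sketched as follows. This is a minimal illustration, not a definitive usage pattern: it assumes Bioinformatics Toolbox R2023a or later is installed, and that properties of a SeqTrimOptions object (here Encoding) can be assigned after the object is constructed. The variable names are arbitrary.

```matlab
% Sketch only: requires Bioinformatics Toolbox (R2023a or later).
import bioinfo.pipeline.block.SeqTrim
import bioinfo.pipeline.options.SeqTrimOptions

% 1) Block with default options.
b1 = SeqTrim;

% 2) Block built from an options object.
%    ASSUMPTION: Encoding is settable after the object is constructed.
opts = SeqTrimOptions;
opts.Encoding = "Sanger";
b2 = SeqTrim(opts);

% 3) The same setting passed as a name-value argument.
b3 = SeqTrim(Encoding="Sanger");

% In both cases the setting is stored in the Options property of the block.
disp(b2.Options.Encoding)
disp(b3.Options.Encoding)
```

The options-object form is convenient when several blocks share the same settings, while the name-value form keeps a one-off configuration on a single line.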
Input Arguments
options — SeqTrim options
bioinfo.pipeline.options.SeqTrimOptions
SeqTrim options, specified as a SeqTrimOptions object.
Data Types: char | string
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Note
The following list of arguments is a partial list. For the complete list, refer to the properties of the SeqTrimOptions object.
Encoding — Base quality encoding format
'Illumina18' (default) | 'Sanger' | 'Solexa' | 'Illumina13' | 'Illumina15'
Base quality encoding format, specified as a character vector or string.
Method — Criterion to trim sequences
'MaxNumberLowQualityBases' (default) | 'MaxPercentLowQualityBases' | 'MeanQuality' | 'BasePositions' | 'Termini'
Criterion to trim sequences, specified as one of the following options. Specify only one trimming criterion per function call.
'MaxNumberLowQualityBases' – applies a maximum threshold on the number of low-quality bases allowed before trimming a sequence starting at the 5' end.
'MaxPercentLowQualityBases' – applies a maximum threshold on the percentage of low-quality bases allowed before trimming a sequence starting at the 5' end.
'MeanQuality' – applies a minimum threshold on the running average base quality allowed before trimming a sequence starting at the 5' end.
'BasePositions' – trims each sequence according to the base positions (first base and last base) starting at the 5' end.
'Termini' – trims each sequence from either the 5' or 3' end or from both ends.
Use this name-value pair argument together with 'Threshold' to specify the appropriate threshold value. Depending on the trimming criterion, the corresponding value for 'Threshold' varies. See the 'Threshold' option for the default values.
Note
Sequences resulting in empty sequences after trimming are saved in the output files as empty sequences. To remove empty sequences from files, use the seqfilter function with the 'MinLength' option set to the value of 1.
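As a hedged illustration of pairing Method with Threshold, the sketch below configures two blocks with non-default criteria. The shape of each Threshold value is an assumption carried over from the seqtrim function: a two-element vector giving the number of bases to remove from the 5' and 3' ends for 'Termini', and the first and last base positions to keep for 'BasePositions'. Check the Threshold property of SeqTrimOptions before relying on these forms; the numbers are arbitrary.

```matlab
% Sketch only: requires Bioinformatics Toolbox (R2023a or later).
import bioinfo.pipeline.block.SeqTrim

% Trim a fixed number of bases from both ends of every read.
% ASSUMPTION: for 'Termini', Threshold is [n5 n3], the number of bases
% removed from the 5' and 3' ends (as for the seqtrim function).
trimEnds = SeqTrim(Method="Termini", Threshold=[5 5]);

% Keep only a fixed range of base positions in every read.
% ASSUMPTION: for 'BasePositions', Threshold is [first last].
trimRange = SeqTrim(Method="BasePositions", Threshold=[10 80]);

% Inspect what was stored on each block.
disp(trimEnds.Options.Method)
disp(trimEnds.Options.Threshold)
disp(trimRange.Options.Method)
disp(trimRange.Options.Threshold)
```

Because only one criterion applies per block, combining criteria (for example, clipping the ends and then trimming by quality) would mean chaining two SeqTrim blocks in a pipeline.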
Properties
ErrorHandler — Function to handle errors from run method
function handle
Function to handle errors from the run method of the block, specified as a function handle. The handle specifies the function to call if the run method encounters an error within a pipeline. For the pipeline to continue after a block fails, ErrorHandler must return a structure that is compatible with the output ports of the block. The error handling function is called with the following two inputs:
- Structure with these fields:
Field | Description |
identifier | Identifier of the error that occurred |
message | Text of the error message |
index | Linear index indicating which block process failed in the parallel run. By default, the index is 1 because there is only one run per block. For details on how block inputs can be split across different dimensions for multiple run calls, see Bioinformatics Pipeline SplitDimension. |
- Input structure passed to the run method when it fails
Data Types: function_handle
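The handler contract described above might look like the following sketch. The function name passThroughOnError and the fallback policy (returning the untrimmed input files and NaN counts) are hypothetical choices made for illustration; whether passing the inputs through is an acceptable fallback depends on the downstream blocks. The only requirement taken from the text is that the returned structure has a field for each output port of the block.

```matlab
% Sketch only: requires Bioinformatics Toolbox (R2023a or later).
import bioinfo.pipeline.block.SeqTrim

ST = SeqTrim;
ST.ErrorHandler = @passThroughOnError;   % hypothetical handler, defined below

function out = passThroughOnError(errInfo, inputs)
    % errInfo: structure with the fields identifier, message, and index.
    % inputs:  input structure that was passed to the failing run call.
    warning("SeqTrimExample:blockFailed", ...
        "SeqTrim run %d failed (%s): %s", ...
        errInfo.index, errInfo.identifier, errInfo.message);

    % Return one field per output port so that the pipeline can continue.
    % ASSUMPTION: handing back the untrimmed files is acceptable downstream.
    out.TrimmedFASTQFiles = inputs.FASTQFiles;
    out.NumTrimmed = NaN;     % counts are unknown because trimming failed
    out.NumUntrimmed = NaN;
end
```

Using NaN rather than 0 for the counts keeps a failed run distinguishable from a run in which no sequence needed trimming.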
Inputs — Input ports
structure
This property is read-only.
Input ports of the block, specified as a structure. The field names of the structure are the names of the block input ports, and the field values are bioinfo.pipeline.Input objects. These objects describe the input port behaviors. The input port names are the expected field names of the input structure that you pass to the block run method.
The SeqTrim block Inputs structure has the following field:
FASTQFiles — Names of FASTQ-formatted files with sequence and quality information. This input is a required input that must be satisfied. The default value is a bioinfo.pipeline.datatypes.Unset object, which means that the input value is not set yet.
Data Types: struct
Outputs — Output ports
structure
This property is read-only.
Output ports of the block, specified as a structure. The field names of the structure are the names of the block output ports, and the field values are bioinfo.pipeline.Output objects. These objects describe the output port behaviors. The field names of the output structure returned by the block run method are the same as the output port names.
The SeqTrim block Outputs structure has the following fields:
TrimmedFASTQFiles — Output file names. By default, the name of each output file consists of the input file name followed by the output suffix ('_trimmed').
Tip
To see the actual location of these files, first get the results of the block. Then use the unwrap method as shown in the Examples section.
NumTrimmed — Number of sequences trimmed from each input file, returned as a scalar or an n-by-1 vector where n is the number of input files. If there are multiple input files, the order in NumTrimmed corresponds to the order of the input files.
NumUntrimmed — Number of sequences untrimmed from each input file, returned as a scalar or an n-by-1 vector where n is the number of input files. If there are multiple input files, the order in NumUntrimmed corresponds to the order of the input files.
Data Types: struct
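The port names described above can be inspected directly on a block, as in this small sketch. It assumes only that Inputs and Outputs are ordinary structures (as stated above) and uses the emptyInputs function listed under Object Functions; the variable names are arbitrary.

```matlab
% Sketch only: requires Bioinformatics Toolbox (R2023a or later).
import bioinfo.pipeline.block.SeqTrim

ST = SeqTrim;

% Port names are the field names of the Inputs and Outputs structures.
inputNames  = fieldnames(ST.Inputs);
outputNames = fieldnames(ST.Outputs);
disp(inputNames)
disp(outputNames)

% emptyInputs returns an input structure for use with the run method;
% its fields should mirror the input port names.
in = emptyInputs(ST);
disp(fieldnames(in))
```

According to the port descriptions, the lists should contain FASTQFiles for the inputs and TrimmedFASTQFiles, NumTrimmed, and NumUntrimmed for the outputs.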
Options — SeqTrim options
bioinfo.pipeline.options.SeqTrimOptions object (default)
SeqTrim options, specified as a SeqTrimOptions object. The default value is a default SeqTrimOptions object.
Object Functions
compile | Perform block-specific additional checks and validations |
copy | Copy array of handle objects |
emptyInputs | Create input structure for use with run method |
eval | Evaluate block object |
run | Run block object |
Examples
Trim Sequences Using SeqTrim Block
Use a SeqTrim block to start scanning the sequence at the 5' end and trim at the first base with a low quality score of 10 (default).
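import bioinfo.pipeline.block.*
import bioinfo.pipeline.Pipeline
% Select the sample FASTQ file and create a SeqTrim block with default options.
FC = FileChooser(which("SRR6008575_10k_1.fq"));
ST = SeqTrim;
% Build the pipeline and connect the Files output port to the FASTQFiles input port.
P = Pipeline;
addBlock(P,[FC,ST]);
connect(P,FC,ST,["Files","FASTQFiles"]);
% Run the pipeline and get the results of the SeqTrim block.
run(P);
R = results(P,ST)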
R = struct with fields:
    TrimmedFASTQFiles: [1×1 bioinfo.pipeline.datatypes.File]
           NumTrimmed: 1495
         NumUntrimmed: 8505
Call unwrap on TrimmedFASTQFiles to see the location of the output file.
unwrap(R.TrimmedFASTQFiles)
ans = "C:\PipelineResults\SeqTrim_1\1\SRR6008575_10k_1_trimmed.fastq"
Version History
Introduced in R2023a