
bioinfo.pipeline.block.SeqTrim

Bioinformatics pipeline block to trim sequences

Since R2023a


Description

A SeqTrim block enables you to trim sequences based on a specified criterion.

Creation

Description

b = bioinfo.pipeline.block.SeqTrim creates a SeqTrim block.


b = bioinfo.pipeline.block.SeqTrim(options) sets the block options using a SeqTrimOptions object.

b = bioinfo.pipeline.block.SeqTrim(Name=Value) specifies additional options as the property names and values of a SeqTrimOptions object. This object is set as the value of the Options property of the block.
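A minimal sketch of the two option syntaxes, configuring the same hypothetical trimming settings both ways. The option name TrimCriterion is an assumption (this section does not show the name of the criterion argument), as is the no-argument SeqTrimOptions constructor. Threshold is the option this page refers to, and the [first last] value follows the 'BasePositions' description in the criterion list.

```matlab
% Name=Value syntax.
% Assumption: the criterion option is named TrimCriterion.
% With 'BasePositions', Threshold holds the first and last base positions.
b1 = bioinfo.pipeline.block.SeqTrim(TrimCriterion="BasePositions", Threshold=[1 50]);

% Options-object syntax.
% Assumption: SeqTrimOptions has a no-argument constructor and the same
% property names. Set the criterion before its threshold.
opts = SeqTrimOptions;
opts.TrimCriterion = "BasePositions";
opts.Threshold = [1 50];
b2 = bioinfo.pipeline.block.SeqTrim(opts);

% Both blocks store their settings in the Options property.
disp(b1.Options.Threshold)
disp(b2.Options.Threshold)
```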

Note

The block always overwrites existing output files, unlike the seqtrim function.

Input Arguments


SeqTrim options, specified as a SeqTrimOptions object.

Data Types: char | string

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Note

The following list of arguments is a partial list. For the complete list, refer to the properties of the SeqTrimOptions object.

Base quality encoding format, specified as a character vector or string.
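The encoding determines how the quality characters in a FASTQ file map to numeric Phred scores. This plain-MATLAB sketch decodes one made-up quality string with the two common ASCII offsets, 33 (Sanger and Illumina 1.8+) and 64 (older Illumina 1.3 to 1.7 files). It does not use the toolbox's own encoding names, which this section does not list.

```matlab
% Made-up quality string for a 5-base read.
qual = 'II5+#';

% Phred+33: subtract the ASCII code of '!' (33).
scores33 = double(qual) - 33;   % [40 40 20 10 2]

% Phred+64: subtract 64. Reading Phred+33 data this way produces
% impossible negative scores, a common sign of the wrong encoding.
scores64 = double(qual) - 64;
```

The trimming thresholds are compared against decoded scores, so choosing the wrong encoding shifts every score by 31.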

Criterion to trim sequences, specified as one of the following options. Specify only one trimming criterion per function call.

  • 'MaxNumberLowQualityBases': applies a maximum threshold on the number of low-quality bases allowed before trimming a sequence, starting at the 5' end.

  • 'MaxPercentLowQualityBases': applies a maximum threshold on the percentage of low-quality bases allowed before trimming a sequence, starting at the 5' end.

  • 'MeanQuality': applies a minimum threshold on the running average base quality allowed before trimming a sequence, starting at the 5' end.

  • 'BasePositions': trims each sequence to the specified base positions (first base and last base), counted from the 5' end.

  • 'Termini': trims each sequence from the 5' end, the 3' end, or both ends.

Use this name-value argument together with 'Threshold' to specify the threshold value. The expected 'Threshold' value depends on the trimming criterion. For the default values, see the 'Threshold' property of the SeqTrimOptions object.
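To make the default 'MaxNumberLowQualityBases' criterion concrete, here is a simplified model in plain MATLAB (no toolbox calls). It assumes that a base counts as low quality when its score is below the quality threshold, and that the read is cut just before the low-quality base that exceeds the allowed count. The exact comparison and cut rules in seqtrim may differ, so treat this as an illustration rather than the toolbox algorithm; the scores and threshold values are hypothetical.

```matlab
% Simplified model of 'MaxNumberLowQualityBases' (not the seqtrim code).
scores  = [38 37 35 8 36 9 30];   % Phred scores of one read (made up)
maxLow  = 0;                      % low-quality bases tolerated
qThresh = 10;                     % assumed: scores below this are low quality

lowCount = cumsum(scores < qThresh);     % running count from the 5' end
cutAt    = find(lowCount > maxLow, 1);   % first position over the limit
if isempty(cutAt)
    keepLen = numel(scores);             % limit never exceeded: keep all
else
    keepLen = cutAt - 1;                 % keep the bases before that position
end
trimmed = scores(1:keepLen);
```

In this model, raising maxLow to 1 lets the read run to position 5, because the second low-quality base (score 9 at position 6) is the first to exceed the limit.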

Note

Sequences that are empty after trimming are saved in the output files as empty sequences. To remove empty sequences from the files, use the seqfilter function with the 'MinLength' option set to 1.
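The filtering rule in the note keeps only records with at least one base. For reads already loaded into memory, for example as the structure array that fastqread returns (fields Header, Sequence, and Quality), a minimal sketch of the same rule follows. The records are made up; for files on disk, seqfilter remains the documented route.

```matlab
% In-memory records shaped like fastqread output (made-up data).
reads = struct("Header",   {'r1', 'r2', 'r3'}, ...
               "Sequence", {'ACGT', '', 'GG'}, ...
               "Quality",  {'IIII', '', 'II'});

% Keep only records whose sequence has at least one base,
% mirroring a minimum-length filter of 1.
keep  = ~cellfun(@isempty, {reads.Sequence});
reads = reads(keep);
```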

Properties


Function to handle errors from the run method of the block, specified as a function handle. The handle specifies the function to call if the run method encounters an error within a pipeline. For the pipeline to continue after a block fails, ErrorHandler must return a structure that is compatible with the output ports of the block. The error handling function is called with the following two inputs:

  • Structure with these fields:

    Field        Description
    identifier   Identifier of the error that occurred
    message      Text of the error message
    index        Linear index indicating which block process failed in the parallel run. By default, the index is 1 because there is only one run per block. For details on how block inputs can be split across different dimensions for multiple run calls, see Bioinformatics Pipeline SplitDimension.

  • Input structure passed to the run method when it fails

Data Types: function_handle
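A minimal sketch of an ErrorHandler that lets a pipeline keep going when the SeqTrim block fails. It returns a structure with the three output port names listed under Outputs; the placeholder values (an empty string for TrimmedFASTQFiles, NaN for the counts) are assumptions, so check that downstream blocks accept them. The handler is anonymous and ignores both inputs; a named function with the same two inputs could log err.identifier, err.message, and err.index instead.

```matlab
% Placeholder result whose field names match the SeqTrim output ports.
% Assumed values: "" for the file and NaN for the counts, so a failed
% run is not mistaken for one with zero trimmed reads.
fallback = struct("TrimmedFASTQFiles", "", ...
                  "NumTrimmed", NaN, ...
                  "NumUntrimmed", NaN);

% The handler is called with (errorStruct, inputStruct); this one
% ignores both and always returns the placeholder.
handler = @(err, inputs) fallback;

ST = bioinfo.pipeline.block.SeqTrim;
ST.ErrorHandler = handler;
```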

This property is read-only.

Input ports of the block, specified as a structure. The field names of the structure are the names of the block input ports, and the field values are bioinfo.pipeline.Input objects. These objects describe the input port behaviors. The input port names are the expected field names of the input structure that you pass to the block run method.

The SeqTrim block Inputs structure has the following field:

  • FASTQFiles — Names of FASTQ-formatted files with sequence and quality information. This input is required. The default value is a bioinfo.pipeline.datatypes.Unset object, which means that the input value is not set yet.

Data Types: struct
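A short sketch of filling the input structure with emptyInputs and running the block outside a pipeline. It reuses the sample file from the Examples section. Passing a plain file name to FASTQFiles, and where the output file is written in a standalone run, are assumptions that this page does not cover.

```matlab
ST = bioinfo.pipeline.block.SeqTrim;

% One field per input port (here, FASTQFiles).
in = emptyInputs(ST);

% Satisfy the required input with the sample FASTQ file.
in.FASTQFiles = which("SRR6008575_10k_1.fq");

% Run the block directly; the output structure has one field per output port.
out = run(ST, in);
disp(out.NumTrimmed)
```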

This property is read-only.

Output ports of the block, specified as a structure. The field names of the structure are the names of the block output ports, and the field values are bioinfo.pipeline.Output objects. These objects describe the output port behaviors. The field names of the output structure returned by the block run method are the same as the output port names.

The SeqTrim block Outputs structure has the following fields:

  • TrimmedFASTQFiles — Output file names. By default, the name of each output file consists of the input file name followed by the output suffix ('_trimmed').

    Tip

    To see the actual location of these files, first get the results of the block. Then use the unwrap method, as shown in the Examples section.

  • NumTrimmed — Number of sequences trimmed from each input file, returned as a scalar or an n-by-1 vector where n is the number of input files. If there are multiple input files, the order in NumTrimmed corresponds to the order of the input files.

  • NumUntrimmed — Number of sequences left untrimmed in each input file, returned as a scalar or an n-by-1 vector where n is the number of input files. If there are multiple input files, the order in NumUntrimmed corresponds to the order of the input files.

Data Types: struct
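Because NumTrimmed and NumUntrimmed follow the order of the input files, element-wise arithmetic gives per-file summaries. In this sketch the first row reuses the counts from the Examples section and the other two rows are made up.

```matlab
% Per-file counts for three input files (rows 2 and 3 are made up).
numTrimmed   = [1495; 1200; 980];
numUntrimmed = [8505; 8800; 9020];

totalReads  = numTrimmed + numUntrimmed;   % reads per input file
fracTrimmed = numTrimmed ./ totalReads;    % share of reads trimmed per file
```

The same code works unchanged when there is a single input file and both outputs are scalars.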

SeqTrim options, specified as a SeqTrimOptions object. The default value is a default SeqTrimOptions object.

Object Functions

compile      Perform block-specific additional checks and validations
copy         Copy array of handle objects
emptyInputs  Create input structure for use with run method
eval         Evaluate block object
run          Run block object

Examples


Use a SeqTrim block with default options to scan each sequence from the 5' end and trim it at the first low-quality base, using the default base quality threshold of 10.

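import bioinfo.pipeline.block.*
import bioinfo.pipeline.Pipeline

% Select the sample FASTQ file found on the MATLAB path.
FC = FileChooser(which("SRR6008575_10k_1.fq"));
% Create a SeqTrim block with default options.
ST = SeqTrim;

% Add both blocks to a pipeline and connect the FileChooser output
% port (Files) to the SeqTrim input port (FASTQFiles).
P = Pipeline;
addBlock(P,[FC,ST]);
connect(P,FC,ST,["Files","FASTQFiles"]);

% Run the pipeline and get the results of the SeqTrim block.
run(P);
R = results(P,ST)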
R = 

  struct with fields:

    TrimmedFASTQFiles: [1×1 bioinfo.pipeline.datatypes.File]
           NumTrimmed: 1495
         NumUntrimmed: 8505

Call unwrap on TrimmedFASTQFiles to see the location of the output file.

unwrap(R.TrimmedFASTQFiles)
ans = 

    "C:\PipelineResults\SeqTrim_1\1\SRR6008575_10k_1_trimmed.fastq"

Version History

Introduced in R2023a