BioRead class

Contain sequence and quality data

Description

The BioRead class contains data from short-read sequences, including sequence headers, nucleotide sequences, and the quality scores for the sequences. This data is typically obtained from a high-throughput sequencing instrument.

You construct a BioRead object from short-read sequence data. Each element in the object has a sequence, header, and quality score associated with it. Use the object properties and methods to explore, access, filter, and manipulate all or a subset of the data, before doing subsequent analyses or sequence alignment and mapping.

Construction

BioReadobj = BioRead constructs BioReadobj, an empty BioRead object.

BioReadobj = BioRead(File) constructs BioReadobj, a BioRead object, from File, a FASTQ- or SAM-formatted file. The data remains in the source file, and the BioRead object accesses it using an auxiliary index file. The index file must have the same name as the source file, but with an .IDX extension. If the index file is not present in the same folder as the source file, the BioRead constructor function creates the index file in that folder.

    Note:   Because the data remains in the source file:

    • Do not delete the source file (FASTQ or SAM) or the auxiliary index file.

    • You cannot modify BioReadobj properties.

BioReadobj = BioRead(Struct) constructs BioReadobj, a BioRead object, from Struct, a MATLAB® structure containing Header, Sequence, and Quality fields, such as returned by the fastqread or the samread function. The data from Struct is kept in memory, which lets you modify the properties of BioReadobj.

BioReadobj = BioRead(Seqs) constructs BioReadobj, a BioRead object, from Seqs, a cell array of strings containing the letter representations of nucleotide sequences.

BioReadobj = BioRead(Seqs,Quals) constructs BioReadobj, a BioRead object, from Seqs and from Quals, a cell array of strings containing the ASCII representation of per-base quality scores for the nucleotide sequences.

BioReadobj = BioRead(Seqs,Quals,Headers) constructs BioReadobj, a BioRead object, from Seqs, Quals, and Headers, a cell array of strings containing header text for the nucleotide sequences.

BioReadobj = BioRead(___,'PropertyName',PropertyValue) constructs a BioRead object using options, specified as name-value pair arguments.

BioReadobj = BioRead(File,'InMemory',InMemoryValue) specifies whether to place the data in memory or leave the data in the source file. Leaving the data in the source file and accessing it via an index file is more memory efficient, but does not let you modify properties of BioReadobj. Choices are true or false (default). If the first input argument is not a file name, then this name-value pair argument is ignored, and the data is automatically placed in memory.

    Tip   Set the InMemory name-value pair argument to true if you want to modify the properties of BioReadobj.
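As a minimal sketch of the in-memory option, the following assumes that the SRR005164_1_50.fastq sample file used in the Examples section is on the MATLAB path. The replacement header text 'read_001' is made up for illustration.

```matlab
% Load the reads into memory so that the object's properties can be modified.
memObj = BioRead('SRR005164_1_50.fastq','InMemory',true);

% Replace the header of the first read (hypothetical header text).
% setHeader returns a modified copy, so assign the output.
memObj = setHeader(memObj,{'read_001'},1);

% Retrieve the header of the first read to confirm the change.
firstHeader = getHeader(memObj,1);
```

With the default 'InMemory',false, the same setHeader call is expected to fail, because the data stays in the source file.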

BioReadobj = BioRead(___,'IndexDir',IndexDirValue) specifies the path to the folder where the index file either exists or will be created.

    Tip   Use the IndexDir name-value pair argument if you do not have write access to the folder where the source file is located.
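A short sketch of this option, assuming the SRR005164_1_50.fastq sample file from the Examples section is on the MATLAB path. The choice of the system temporary folder (returned by the base MATLAB function tempdir) is only an illustration; any writable folder should serve.

```matlab
% Keep the index file in a writable folder instead of next to the source file.
idxFolder = tempdir;
BRObj = BioRead('SRR005164_1_50.fastq','IndexDir',idxFolder);

% The object still reads sequence data from the source file on demand.
firstSeq = getSequence(BRObj,1);
nReads = BRObj.NSeqs;
```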

BioReadobj = BioRead(___,'Sequence',SequenceValue) constructs BioReadobj, a BioRead object, from SequenceValue, a cell array of strings containing the letter representations of nucleotide sequences. This name-value pair works only if the data is read into memory.

BioReadobj = BioRead(___,'Quality',QualityValue) constructs BioReadobj, a BioRead object, from QualityValue, a cell array of strings containing the ASCII representation of per-base quality scores for nucleotide sequences. This name-value pair works only if the data is read into memory.

BioReadobj = BioRead(___,'Header',HeaderValue) constructs BioReadobj, a BioRead object, from HeaderValue, a cell array of strings containing header text for nucleotide sequences. This name-value pair works only if the data is read into memory.

BioReadobj = BioRead(___,'Name',NameValue) constructs BioReadobj, a BioRead object, and then sets the Name property to NameValue, a string describing the object. Default is '', an empty string.
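For instance, a small in-memory object can be named at construction time. The sequences, quality strings, headers, and the name 'Demo reads' below are made-up values for illustration.

```matlab
% Made-up reads: one sequence, quality string, and header per read.
seqs    = {'ACGTACGTAC';'TTGACCA'};
quals   = {'IIIIIIIIII';'FFFFFFF'};
headers = {'read1';'read2'};

% Construct the object and set its Name property in the same call.
BRObj = BioRead(seqs,quals,headers,'Name','Demo reads');
objName = BRObj.Name;
```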

Input Arguments

File

String specifying a FASTQ- or SAM-formatted file.

Struct

MATLAB structure containing Header, Sequence, and Quality fields, such as returned by the fastqread or the samread function.

InMemoryValue

Logical specifying whether to place the data in memory or leave the data in the source file. Leaving the data in the source file and accessing it via an index file is more memory efficient, but does not let you modify properties of the BioRead object. If the first input argument is not a file name, then this name-value pair argument is ignored, and the data is automatically placed in memory.

Default: false

IndexDirValue

String specifying the path to the folder where the index file either exists or will be created.

Default: Folder where File is located

Seqs

Cell array of strings containing the letter representations of nucleotide sequences. This information populates the BioRead object's Sequence property.

Quals

Cell array of strings containing the ASCII representation of per-base quality scores for nucleotide sequences. This information populates the BioRead object's Quality property.

Headers

Cell array of strings containing header text for nucleotide sequences. This information populates the BioRead object's Header property.

SequenceValue

Cell array of strings containing the letter representations of nucleotide sequences. This information populates the BioRead object's Sequence property. This name-value pair works only if the data is read into memory.

QualityValue

Cell array of strings containing the ASCII representation of per-base quality scores for nucleotide sequences. This information populates the BioRead object's Quality property. This name-value pair works only if the data is read into memory.

Default: Empty cell array

HeaderValue

Cell array of strings containing header text for nucleotide sequences. This information populates the BioRead object's Header property. This name-value pair works only if the data is read into memory.

Default: Empty cell array

NameValue

String describing the BioRead object. This information populates the object's Name property.

Default: '', an empty string

Properties

Header

Headers associated with all sequences represented in the BioRead object.

Cell array of strings, such that there is a header for each sequence in the object. Header strings can be empty. There is a one-to-one relationship between the number and order of elements in Header and Sequence, unless Header is an empty cell array.

Name

Description of the BioRead object.

Single string describing the BioRead object.

Default: '', an empty string

NSeqs

Number of sequences in the BioRead object.

This property is read-only.

Quality

Per-base quality scores associated with all sequences represented in the BioRead object.

Cell array of strings, such that there is a quality string for each sequence in the object. Each quality string is an ASCII representation of per-base quality scores for a nucleotide sequence or an empty string. A one-to-one relationship exists between the number and order of elements in Quality and Sequence, unless Quality is an empty cell array.
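To illustrate what "ASCII representation" means in practice, the sketch below converts quality strings to numeric scores using only base MATLAB functions. It assumes the common Phred+33 encoding (score = ASCII code minus 33). This page does not state which encoding a given file uses, so check the source of your data before relying on the offset.

```matlab
% Quality strings in the form stored in a BioRead Quality property.
quals = {'!!!!';'%%%';'&&'};

% Assumption: Phred+33 encoding, so score = ASCII code - 33.
phredOffset = 33;
scores = cellfun(@(q) double(q) - phredOffset, quals, 'UniformOutput', false);

% Mean quality score for each read.
meanScores = cellfun(@mean, scores);
```

Here '!', '%', and '&' have ASCII codes 33, 37, and 38, so under this assumption they decode to scores 0, 4, and 5.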

Sequence

Nucleotide sequences in the BioRead object.

Cell array of strings containing the letter representations of the nucleotide sequences.

Methods

combine           Combine two objects
get               Retrieve property of object
getHeader         Retrieve sequence headers from object
getQuality        Retrieve sequence quality scores from object
getSequence       Retrieve sequences from object
getSubsequence    Retrieve partial sequences from object
getSubset         Create object containing subset of elements from object
plotSummary       Plot summary statistics of BioRead object
set               Set property of object
setHeader         Set sequence headers for object
setQuality        Set sequence quality scores for object
setSequence       Set sequences for object
setSubsequence    Set partial sequences for object
setSubset         Set elements for object
write             Write contents of BioRead or BioMap object to file
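A brief sketch of how a few of these methods might be combined to filter reads by length. The read data is made up, and the length threshold of 10 bases is an arbitrary choice for illustration.

```matlab
% Made-up reads held in memory.
seqs    = {'ACGTACGTAC';'TTGACCA';'GGGCCCAAATTT'};
quals   = {'IIIIIIIIII';'FFFFFFF';'555555555555'};
headers = {'read1';'read2';'read3'};
BRObj = BioRead(seqs,quals,headers);

% Retrieve the headers and sequences of the first two reads.
h = getHeader(BRObj,[1 2]);
s = getSequence(BRObj,[1 2]);

% Keep only reads that are at least 10 bases long (logical-vector subset).
isLong = cellfun(@length,BRObj.Sequence) >= 10;
longReads = getSubset(BRObj,isLong);
nLong = longReads.NSeqs;
```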

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB Programming Fundamentals documentation.
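In practice, value semantics mean that assigning an object to a new variable creates an independent copy. A minimal sketch with made-up in-memory data:

```matlab
% Made-up in-memory object.
A = BioRead({'ACGT';'GGCC'},{'IIII';'FFFF'},{'r1';'r2'},'Name','original');

% B is an independent copy of A, not a reference to it.
B = A;
B.Name = 'copy';

% Changing B leaves A untouched.
nameA = A.Name;
nameB = B.Name;
```

For the same reason, methods such as setHeader return a modified copy, so assign their output to a variable to keep the change.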

Indexing

BioRead objects support dot (.) indexing to extract, assign, and delete data.
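A short sketch of extracting and assigning with dot indexing on an in-memory object. The data is made up, and subset indexing of the form Property(Indices) is assumed to behave like the corresponding get and set methods.

```matlab
% Made-up in-memory object.
seqs    = {'ACGTACGTAC';'TTGACCA';'GGGCCCAAATTT'};
quals   = {'IIIIIIIIII';'FFFFFFF';'555555555555'};
headers = {'read1';'read2';'read3'};
BRObj = BioRead(seqs,quals,headers);

% Extract a whole property and a subset of a property.
n = BRObj.NSeqs;
firstTwo = BRObj.Sequence(1:2);

% Assign a new header to the second read (hypothetical header text).
BRObj.Header(2) = {'renamed'};
newHeaders = BRObj.Header;
```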

Examples


Construct BioRead Object from FASTQ File

Construct a BioRead object from a FASTQ-formatted file that is provided with Bioinformatics Toolbox™.

BRObj1 = BioRead('SRR005164_1_50.fastq','Name','MyObject')
BRObj1 = 

  BioRead with properties:

     Quality: [50x1 File indexed property]
    Sequence: [50x1 File indexed property]
      Header: [50x1 File indexed property]
       NSeqs: 50
        Name: 'MyObject'

Construct BioRead Object from MATLAB Workspace Variables

Create variables containing sequences, quality scores, and headers.

seqs = {randseq(10);randseq(15);randseq(20)};
quals = {repmat('!', 1, 10);repmat('%', 1, 15);repmat('&', 1, 20)};
headers = {'H1';'H2';'H3'};

Construct a BioRead object from these three variables.

BRObj2 = BioRead(seqs,quals,headers)
BRObj2 = 

  BioRead with properties:

     Quality: {3x1 cell}
    Sequence: {3x1 cell}
      Header: {3x1 cell}
       NSeqs: 3
        Name: ''

Construct BioRead Object from MATLAB Structure

Create variables containing sequences, quality scores, and headers.

seqs = {randseq(10);randseq(15);randseq(20)};
quals = {repmat('!',1,10); repmat('%',1,15);repmat('&',1,20)};
headers = {'H1';'H2';'H3'};

Construct a structure containing Header, Sequence, and Quality fields.

BRStruct = struct('Header',headers,'Sequence',seqs,'Quality',quals);

Construct a BioRead object from this structure.

BRObj3 = BioRead(BRStruct)
BRObj3 = 

  BioRead with properties:

     Quality: {3x1 cell}
    Sequence: {3x1 cell}
      Header: {3x1 cell}
       NSeqs: 3
        Name: ''