getCounts

Class: BioMap

Return count of read sequences aligned to reference sequence in BioMap object

Syntax

Count = getCounts(BioObj, StartPos, EndPos)
GroupCount = getCounts(BioObj, StartPos, EndPos, Groups)
GroupCount = getCounts(BioObj, StartPos, EndPos, Groups, R)
___ = getCounts(___, Name,Value)

Description

Count = getCounts(BioObj, StartPos, EndPos) returns Count, a nonnegative integer specifying the number of read sequences in BioObj, a BioMap object, that align to a specific range or set of ranges in the reference sequence. The range or set of ranges is defined by StartPos and EndPos. StartPos and EndPos can be two nonnegative integers such that StartPos is less than EndPos and both are smaller than the length of the reference sequence. StartPos and EndPos can also be two column vectors of equal length that together define a set of ranges (overlapping or segmented).

By default, getCounts counts each read only once. Therefore, if a read spans multiple ranges, that read instance is counted only once. When StartPos and EndPos specify overlapping ranges, the overlapping ranges are considered as one range.
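The counting rule above can be illustrated with a small sketch in plain MATLAB. It does not call getCounts and is not a description of how getCounts is implemented; the read positions are made up for illustration, and the variable names (readStart, readStop, hits, and so on) are hypothetical. It assumes R2016b or later for implicit expansion.

```matlab
% Hypothetical reads (start and stop positions on the reference); not taken from any file.
readStart = [ 5; 40; 45; 60;  90];
readStop  = [20; 55; 80; 68; 120];

% Two segmented ranges, 1:50 and 71:100.
rangeStart = [1; 71];
rangeEnd   = [50; 100];

% hits(i,j) is true when read i overlaps range j by at least one base.
hits = readStart <= rangeEnd.' & readStop >= rangeStart.';

% Each read counted once, no matter how many ranges it touches.
countOnce = nnz(any(hits, 2));      % 4

% Each range counted on its own, so a read spanning both ranges is counted twice.
countPerRange = sum(hits, 1).';     % [3; 2]
```

Here the third read (45 to 80) touches both ranges, which is why sum(countPerRange) exceeds countOnce by one.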

GroupCount = getCounts(BioObj, StartPos, EndPos, Groups) specifies Groups, a vector of integers or cell array of character vectors or string vector, indicating groups that segmented ranges belong to. The segmented ranges are treated independently.

GroupCount = getCounts(BioObj, StartPos, EndPos, Groups, R) specifies a reference for each of the segmented ranges defined by StartPos, EndPos, and Groups.

___ = getCounts(___, Name,Value) uses additional options specified by one or more Name,Value pair arguments.

Input Arguments

BioObj

Object of the BioMap class.

StartPos

Either of the following:

  • Nonnegative integer that defines the start of a range in the reference sequence. StartPos must be less than EndPos, and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the start of a range in the reference sequence.

EndPos

Either of the following:

  • Nonnegative integer that defines the end of a range in the reference sequence. EndPos must be greater than StartPos, and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the end of a range in the reference sequence.

Groups

Row vector of integers, cell array of character vectors, or string vector of the same size as StartPos and EndPos. This vector indicates the group to which each range belongs.

R

Vector of positive integers indexing the SequenceDictionary property of BioObj, or a cell array of character vectors or string vector of the reference names. R must be scalar or must have the same number of elements as Groups.

For a given value of Groups, all the corresponding elements in R must be the same.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Independent

Logical that specifies whether to treat the ranges defined by StartPos and EndPos independently. If true, Count is a column vector containing the same number of elements as StartPos and EndPos. In this case, a read that spans multiple ranges is counted once in each range.

Note

This name-value pair argument is ignored when using the Groups input argument, because getCounts assumes that each group of ranges is independent.

Default: false

Overlap

Specifies the minimum number of base positions that a read must overlap in a range or set of ranges, to be counted. This value can be any of the following:

  • Positive integer

  • 'full' — A read must be fully contained in a range or set of ranges to be counted.

  • 'start' — A read's start position must lie within a range or set of ranges to be counted.

Default: 1
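As a minimal sketch of the Overlap options, the following calls count reads over the range 1:50 under each setting. It assumes the ex1.sam file used in the Examples section is on the MATLAB path; the output variable names are illustrative, and no counts are shown because they depend on the data.

```matlab
obj = BioMap('ex1.sam');

% Default: a read is counted if it overlaps the range by at least 1 base.
nAny = getCounts(obj, 1, 50);

% Require at least 10 overlapping base positions.
nTen = getCounts(obj, 1, 50, 'Overlap', 10);

% Count only reads fully contained in the range.
nFull = getCounts(obj, 1, 50, 'Overlap', 'full');

% Count only reads whose start position lies within the range.
nStart = getCounts(obj, 1, 50, 'Overlap', 'start');
```

Each of the stricter settings selects a subset of the reads counted by the default, so nTen, nFull, and nStart should not exceed nAny.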

Spliced

Logical specifying whether short reads are spliced during mapping (as in mRNA-to-genome mapping). N symbols in the Signature property of the object are not counted.

Default: false

Method

Character vector or string specifying the method to measure the abundance of reads. Choices are:

  • 'raw' — Raw counts

  • 'rpkm' — Counts of reads per kilobase pairs per million aligned reads

  • 'mean' — Average coverage depth computed base-by-base

  • 'max' — Maximum coverage depth computed base-by-base

  • 'min' — Minimum coverage depth computed base-by-base

  • 'sum' — Sum of all aligned bases in all the reads

Default: 'raw'
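The Method option can be combined with the other arguments. The sketch below, again assuming the ex1.sam file from the Examples section, requests a few of the measures for the same pair of ranges treated independently; the variable names are illustrative, and no values are shown because they depend on the data.

```matlab
obj = BioMap('ex1.sam');
startPos = [1; 71];
endPos   = [50; 100];

% Raw read counts per range (the default method).
rawCounts = getCounts(obj, startPos, endPos, 'Independent', true);

% Counts normalized by range length and by the number of aligned reads.
rpkmCounts = getCounts(obj, startPos, endPos, 'Independent', true, 'Method', 'rpkm');

% Average base-by-base coverage depth in each range.
meanDepth = getCounts(obj, startPos, endPos, 'Independent', true, 'Method', 'mean');
```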

Output Arguments

Count

Either of the following:

  • When Independent is false, this value is a nonnegative integer. The integer specifies the number of reads that align to a range or set of ranges (overlapping or segmented) of the reference sequence in BioObj, a BioMap object. Each read is counted only once, even if the read spans multiple ranges.

  • When Independent is true, this value is a vector of nonnegative integers. This vector indicates the number of reads that align to the independent ranges specified by StartPos and EndPos. This vector contains the same number of elements as StartPos and EndPos.

GroupCount

Either of the following:

  • If no reference or a single reference is specified, this value is a vector containing the number of reads for each unique group in Groups. The order of elements in GroupCount corresponds to the ascending order of unique elements in Groups.

  • If multiple references are specified, GroupCount is a cell array, where the ith element contains the number of reads for each unique group in the ith reference. The order of elements in GroupCount corresponds to the ascending order of unique elements in R.

Examples

Create a BioMap object.

obj = BioMap('ex1.sam');

Return the number of reads that cover at least one base of the segmented ranges 1:50 and 71:100. By default, the ranges are not treated independently; that is, a read is counted once even if it maps to both ranges.

counts_1 = getCounts(obj,[1;71],[50;100])
counts_1 = 37

Compute the number of reads, treating the segmented ranges [1:50] and [71:100] independently. Observe that sum(counts_2) is greater than counts_1 because four reads span both segments and are counted twice in the second case.

counts_2 = getCounts(obj,[1;71],[50;100], 'Independent', true)
counts_2 = 2×1

    20
    21

Compute the number of reads that align to the segmented range 30:60 (associated with group 1) and the segmented range [1:10 50:60] (associated with group 2).

counts_3 = getCounts(obj,[1;30;50],[10;60;60],[2 1 2])
counts_3 = 2×1

    25
    22

Return the total number of reads aligned to the reference sequence.

getCounts(obj, min(getStart(obj)), max(getStop(obj)))
ans = 1482