basecount

Count nucleotides in sequence

Syntax

NTStruct = basecount(SeqNT)
NTStruct = basecount(SeqNT, ...'Ambiguous', AmbiguousValue, ...)
NTStruct = basecount(SeqNT, ...'Gaps', GapsValue, ...)
NTStruct = basecount(SeqNT, ...'Chart', ChartValue, ...)

Input Arguments

SeqNT

Nucleotide sequence in which to count bases.

AmbiguousValue

String specifying how to treat ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, or N). Choices are:

  • 'ignore' (default) — Skips ambiguous characters

  • 'bundle' — Counts ambiguous characters and reports the total count in the Ambiguous field.

  • 'prorate' — Counts ambiguous characters and distributes them proportionately in the appropriate fields. For example, the counts for the character R are distributed evenly between the A and G fields.

  • 'individual' — Counts ambiguous characters and reports them in individual fields.

  • 'warn' — Skips ambiguous characters and displays a warning.
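
The 'prorate' arithmetic can be sketched by hand in base MATLAB. This is an illustration only, not the toolbox implementation: the lookup table `codes`, the variable names, and the test sequence are hypothetical, and the base sets are the standard IUPAC expansions of the ambiguous codes.

```matlab
% Hand-rolled illustration of 'prorate' counting (not the toolbox code).
% Standard IUPAC expansions of the ambiguous codes; names are hypothetical.
codes = struct('R','AG', 'Y','CT', 'K','GT', 'M','AC', 'S','CG', 'W','AT', ...
               'B','CGT', 'D','AGT', 'H','ACT', 'V','ACG', 'N','ACGT');

seq = 'ARGN';                                   % R is A or G, N is any base
counts = struct('A',0, 'C',0, 'G',0, 'T',0);

for k = 1:numel(seq)
    ch = seq(k);
    if isfield(counts, ch)                      % unambiguous base
        counts.(ch) = counts.(ch) + 1;
    elseif isfield(codes, ch)                   % ambiguous: split one count
        bases = codes.(ch);
        for b = bases
            counts.(b) = counts.(b) + 1/numel(bases);
        end
    end
end

disp(counts)
```

In this sketch A and G each end at 1.75 and C and T at 0.25, so the four fields still sum to the sequence length of 4.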

GapsValue

Specifies whether gaps, indicated by a hyphen (-), are counted or ignored. Choices are true or false (default).

ChartValue

String specifying a chart type. Choices are 'pie' or 'bar'.

Output Arguments

NTStruct

1-by-1 MATLAB structure containing the fields A, C, G, and T.

Description

NTStruct = basecount(SeqNT) counts the number of each type of base in SeqNT, a nucleotide sequence, and returns the counts in NTStruct, a 1-by-1 MATLAB structure containing the fields A, C, G, and T.

  • The character U is added to the T field.

  • Ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, or N) and gaps (indicated by a hyphen, -) are ignored by default.

  • Unrecognized characters are ignored and trigger the following warning:

    Warning: Unknown symbols appear in the sequence. These will be ignored.
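
The default rules above can be tried with a short sketch. It assumes Bioinformatics Toolbox is installed; the two sequences are made up for illustration.

```matlab
% Requires Bioinformatics Toolbox. The sequences are made up for illustration.
rna = basecount('UAGCUU');     % each U is counted in the T field
dna = basecount('TAGC-RTT');   % the gap (-) and the ambiguous R are skipped

disp(rna)
disp(dna)
```

By the rules listed above, both structures should report one A, one C, one G, and three in the T field.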

NTStruct = basecount(SeqNT, ...'PropertyName', PropertyValue, ...) calls basecount with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

NTStruct = basecount(SeqNT, ...'Ambiguous', AmbiguousValue, ...) specifies how to treat ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, or N). Choices are:

  • 'ignore' (default)

  • 'bundle'

  • 'prorate'

  • 'individual'

  • 'warn'
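
As a minimal sketch, two of these options can be compared on a sequence that contains the ambiguous codes B and D. It assumes Bioinformatics Toolbox is installed; the variable names are illustrative.

```matlab
% Requires Bioinformatics Toolbox.
seq = 'ABCDGGCCAAGCGAGCTTG';   % contains the ambiguous codes B and D

bundled  = basecount(seq, 'Ambiguous', 'bundle');    % B and D totalled in the Ambiguous field
prorated = basecount(seq, 'Ambiguous', 'prorate');   % B and D shared among the base fields

disp(bundled)
disp(prorated)
```

With 'bundle' the two ambiguous characters are reported together, while with 'prorate' the A, C, G, and T fields can hold fractional values.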

NTStruct = basecount(SeqNT, ...'Gaps', GapsValue, ...) specifies whether gaps, indicated by a hyphen (-), are counted or ignored. Choices are true or false (default).
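
A short sketch of the 'Gaps' property, assuming Bioinformatics Toolbox is installed. The sequence is made up for illustration, and the sketch lists the output fields rather than assuming the name of the field that holds the gap count.

```matlab
% Requires Bioinformatics Toolbox. The sequence is made up for illustration.
seq = 'TAGC--TTAG';

withoutGaps = basecount(seq);                 % hyphens ignored (default)
withGaps    = basecount(seq, 'Gaps', true);   % hyphens counted

disp(fieldnames(withoutGaps))
disp(fieldnames(withGaps))                    % inspect which field holds the gap count
```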

NTStruct = basecount(SeqNT, ...'Chart', ChartValue, ...) creates a chart showing the relative proportions of the nucleotides. ChartValue can be 'pie' or 'bar'.
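
A minimal charting sketch, assuming Bioinformatics Toolbox and a graphical display are available. Opening a new figure first is a precaution added here, not something the function requires.

```matlab
% Requires Bioinformatics Toolbox.
seq = 'TAGCTGGCCAAGCGAGCTTG';

figure                                     % new window so no existing plot is overwritten
counts = basecount(seq, 'Chart', 'bar');   % pass 'pie' instead for a pie chart
```

The counts are still returned in the structure, so the chart can be requested without giving up the numeric output.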

Examples

  1. Count the bases in a DNA sequence and return the results in a structure.

    Bases = basecount('TAGCTGGCCAAGCGAGCTTG')
    
    Bases = 
    
        A: 4
        C: 5
        G: 7
        T: 4
    
  2. Get the count for adenine (A) bases.

    Bases.A
    
    ans =
    
        4
    
  3. Count the bases in a DNA sequence containing ambiguous characters, listing the ambiguous characters in separate fields.

    basecount('ABCDGGCCAAGCGAGCTTG','Ambiguous','individual')
    
    ans =
     
        A: 4
        C: 5
        G: 6
        T: 2
        R: 0
        Y: 0
        K: 0
        M: 0
        S: 0
        W: 0
        B: 1
        D: 1
        H: 0
        V: 0
        N: 0
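
Because the result is an ordinary structure, its fields can be combined with plain arithmetic. The sketch below derives the GC fraction of the sequence from the first example; the variable names `total` and `gcFraction` are illustrative, and Bioinformatics Toolbox is assumed for the `basecount` call.

```matlab
% Requires Bioinformatics Toolbox for basecount; the rest is base MATLAB.
Bases = basecount('TAGCTGGCCAAGCGAGCTTG');

total      = Bases.A + Bases.C + Bases.G + Bases.T;   % 4 + 5 + 7 + 4 = 20 bases
gcFraction = (Bases.G + Bases.C) / total;             % (7 + 5) / 20 = 0.6

fprintf('GC fraction: %.2f\n', gcFraction)
```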
    