cuffgffread
Filter and convert GFF and GTF files
Description
cuffgffread(input,output) reads the input GFF or GTF file and writes the mandatory columns to the output GFF file [1]. The function can also write a GTF-format file when you use the 'GTFOutput' option.
cuffgffread requires the Cufflinks Support Package for the Bioinformatics Toolbox™. If the support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.
cuffgffread(input,output,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, cuffgffread('gyrAB.gtf','gyrAB.gff','PreserveAttributes',true) retains all attributes in the output file.
Examples
Convert GTF to GFF Format
Convert a GTF file to a GFF file while retaining all attributes.
cuffgffread('gyrAB.gtf','gyrABOut.gff','PreserveAttributes',true)
You can also set the options using a CuffGFFReadOptions object. For instance, specify GTF output and retain all attributes.
opt = CuffGFFReadOptions;
opt.GTFOutput = true;
opt.PreserveAttributes = true;
cuffgffread('gyrAB.gtf','gyrABOut.gtf',opt);
Once you have the options object, you can use getOptionsTable to list the original gffread flag that corresponds to each object property.
getOptionsTable(opt)
ans =

  33×3 table

                                 PropertyName                  FlagName          FlagShortName
                                 ___________________________   ________________  _____________

    AppendDescription            'AppendDescription'           '-A'              ''
    CheckOppositeStrand          'CheckOppositeStrand'         '-B'              ''
    CheckPhase                   'CheckPhase'                  '-H'              ''
    Cluster                      'Cluster'                     '--cluster-only'  ''
    CodingOnly                   'CodingOnly'                  '-C'              ''
    CollapseContainer            'CollapseContainer'           '-K'              ''
    CollapseFull                 'CollapseFull'                '-Q'              ''
    CoordinateRange              'CoordinateRange'             '-r'              ''
    DiscardInvalidCDS            'DiscardInvalidCDS'           '-J'              ''
    DiscardNonCanonicalSplice    'DiscardNonCanonicalSplice'   '-N'              ''
    DiscardSingleExon            'DiscardSingleExon'           '-U'              ''
    DiscardTerminatedCDS         'DiscardTerminatedCDS'        '-V'              ''
    FastaCDSFile                 'FastaCDSFile'                '-x'              ''
    FastaExonsFile               'FastaExonsFile'              '-w'              ''
    FastaProteinFile             'FastaProteinFile'            '-y'              ''
    FirstExonOnly                'FirstExonOnly'               '-G'              ''
    ForceExons                   'ForceExons'                  '--force-exons'   ''
    FullyContained               'FullyContained'              '-R'              ''
    GTFOutput                    'GTFOutput'                   '-T'              ''
    MaxIntronLength              'MaxIntronLength'             '-i'              ''
    Merge                        'Merge'                       '--merge'         '-M'
    MergeCloseExons              'MergeCloseExons'             '-Z'              ''
    MergeInfoFile                'MergeInfoFile'               '-d'              ''
    PreserveAttributes           'PreserveAttributes'          '-F'              ''
    Pseudo                       'Pseudo'                      '--no-pseudo'     ''
    ReplacementTable             'ReplacementTable'            '-m'              ''
    SequenceFile                 'SequenceFile'                '-g'              ''
    SequenceInfo                 'SequenceInfo'                '-s'              ''
    UrlDecode                    'UrlDecode'                   '-D'              ''
    UseEnsemblConversion         'UseEnsemblConversion'        '-L'              ''
    UseNonTranscript             'UseNonTranscript'            '-O'              ''
    UseTrackName                 'UseTrackName'                '-t'              ''
    WriteCoordinates             'WriteCoordinates'            '-W'              ''
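The table pairs each property with the gffread flag it maps to. To illustrate that mapping, the following Python sketch turns a few property values into a gffread-style flag string. It is not the toolbox's implementation: the FLAG_NAMES subset is taken from the table, while the "<flag> <value>" formatting and the flag order are assumptions.

```python
# Hypothetical sketch: translate CuffGFFReadOptions-style property values into
# gffread flags. Flag names come from the getOptionsTable output; the value
# formatting and flag order are assumptions, not toolbox behavior.
FLAG_NAMES = {
    "GTFOutput": "-T",
    "PreserveAttributes": "-F",
    "MaxIntronLength": "-i",
    "CoordinateRange": "-r",
}


def to_gffread_flags(options: dict) -> str:
    """Return a flag string for the given property/value pairs."""
    parts = []
    for prop, value in options.items():
        flag = FLAG_NAMES[prop]
        if isinstance(value, bool):
            # Logical properties become bare switches, emitted only when true.
            if value:
                parts.append(flag)
        else:
            # Valued properties are written as "<flag> <value>" (assumed format).
            parts.append(f"{flag} {value}")
    return " ".join(parts)


if __name__ == "__main__":
    print(to_gffread_flags({"GTFOutput": True, "PreserveAttributes": True}))
```

For the GTFOutput and PreserveAttributes settings used in the options-object example, the sketch produces the string "-T -F".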
Input Arguments
input
— Input file name
string | character vector
Input file name, specified as a string or character vector. The file can be a GTF or GFF file.
Example: 'gyrAB.gtf'
Data Types: char | string
output
— Output file name
string | character vector
Output file name, specified as a string or character vector. By default, the output is a GFF file. Set 'GTFOutput' to true to get a GTF output file.
Example: 'gyrAB.gff'
Data Types: char | string
opt
— cuffgffread options
CuffGFFReadOptions object | string | character vector
cuffgffread options, specified as a CuffGFFReadOptions object, string, or character vector. The string or character vector must be in the original gffread option syntax (prefixed by one or two dashes) [1].
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: cuffgffread('gyrAB.gtf','gyrAB.gff','CoordinateRange','+NC_000912.1:4821..7340')
AppendDescription
— Flag to add sequence descriptions to descr attribute
false (default) | true
Flag to add sequence descriptions from the sequence information file to the descr attribute of the output GFF record, specified as true or false. Specify the sequence information file using the SequenceInfo option.
Example: 'AppendDescription',true
Data Types: logical
CheckOppositeStrand
— Flag to check opposite strand when checking for in-frame stop codons
false (default) | true
Flag to check the opposite strand when checking for in-frame stop codons, specified as true or false.
Example: 'CheckOppositeStrand',true
Data Types: logical
CheckPhase
— Flag to adjust coding sequence phase
false (default) | true
Flag to adjust coding sequence phase when checking for in-frame stop codons, specified as true or false.
Example: 'CheckPhase',true
Data Types: logical
Cluster
— Flag to cluster input transcripts into loci
true (default) | false
Flag to cluster the input transcripts into loci, specified as true or false. This option is the same as the Merge property, except that it does not collapse fully contained transcripts with identical introns.
Example: 'Cluster',false
Data Types: logical
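Conceptually, clustering groups transcripts that share genomic territory. The following Python sketch is a simplified stand-in. It assumes that two transcripts belong to the same locus when their spans overlap on the same chromosome and strand, whereas gffread's actual clustering rules may be stricter (for example, requiring exon overlap). The tuple layout is hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (transcript_id, chrom, strand, start, end),
# with 1-based inclusive coordinates as in GFF/GTF.
Transcript = Tuple[str, str, str, int, int]


def cluster_loci(transcripts: List[Transcript]) -> List[List[str]]:
    """Group transcripts whose spans overlap on the same chromosome and strand."""
    loci: List[List[str]] = []
    current_key = None
    current_end = -1
    # Sorting by (chrom, strand, start) places overlapping transcripts next to each other.
    for tid, chrom, strand, start, end in sorted(
        transcripts, key=lambda t: (t[1], t[2], t[3], t[4])
    ):
        if (chrom, strand) == current_key and start <= current_end:
            loci[-1].append(tid)  # overlaps the current locus, so extend it
            current_end = max(current_end, end)
        else:
            loci.append([tid])  # no overlap, so start a new locus
            current_key = (chrom, strand)
            current_end = end
    return loci
```

Sorting first keeps the sketch to a single linear pass after the sort, because any transcript that overlaps the current locus must appear before the first one that does not.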
CodingOnly
— Flag to discard transcripts with no coding sequence
false (default) | true
Flag to discard transcripts with no coding sequence feature (CDS), specified as true or false.
Example: 'CodingOnly',true
Data Types: logical
CollapseContainer
— Flag to collapse fully contained transcripts
false (default) | true
Flag to collapse fully contained transcripts that are shorter and have fewer introns than the container transcript, specified as true or false. This property applies only when you set Merge to true.
Example: 'CollapseContainer',true
Data Types: logical
CollapseFull
— Flag to collapse shorter transcripts overlapping at least 80% with another exon
false (default) | true
Flag to collapse shorter transcripts overlapping at least 80% with another single exon transcript, specified as true or false. This property applies only when you set Merge to true.
Example: 'CollapseFull',true
Data Types: logical
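The 80% criterion can be expressed as an overlap fraction. The following Python sketch assumes that the fraction is measured against the length of the shorter transcript and that coordinates are 1-based and inclusive, as in GFF. Both are assumptions about how gffread applies the threshold.

```python
def overlap_fraction(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Fraction of the shorter interval covered by the overlap (1-based inclusive)."""
    overlap = min(a_end, b_end) - max(a_start, b_start) + 1
    if overlap <= 0:
        return 0.0  # disjoint intervals
    shorter = min(a_end - a_start + 1, b_end - b_start + 1)
    return overlap / shorter


def collapsible(a: tuple, b: tuple, threshold: float = 0.8) -> bool:
    """True if two single-exon spans overlap by at least `threshold` (assumed rule)."""
    return overlap_fraction(*a, *b) >= threshold
```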
CoordinateRange
— Genomic range to filter transcripts
string | character vector
Genomic range to filter transcripts, specified as a string or character vector. The format must be "[[<strand>]<chr>:]<start>..<end>", where start and end are genomic positions, chr is an optional chromosome or contig name, and strand is an optional strand ('+' or '-').
Example: 'CoordinateRange',"+NC_000912.1:4821..7340"
Data Types: char | string
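The following Python sketch splits a string in the documented "[[<strand>]<chr>:]<start>..<end>" format into its parts. It only illustrates the format, and the GenomicRange name is hypothetical. As the brackets in the format imply, a strand can appear only together with a chromosome name.

```python
import re
from typing import NamedTuple, Optional


class GenomicRange(NamedTuple):
    strand: Optional[str]  # '+', '-', or None when omitted
    chrom: Optional[str]   # chromosome or contig name, or None when omitted
    start: int
    end: int


# Mirrors [[<strand>]<chr>:]<start>..<end>; the strand is nested inside the
# optional "<chr>:" group, so it cannot appear without a chromosome name.
_RANGE_RE = re.compile(
    r"^(?:(?P<strand>[+-])?(?P<chrom>[^:]+):)?(?P<start>\d+)\.\.(?P<end>\d+)$"
)


def parse_coordinate_range(text: str) -> GenomicRange:
    """Split a CoordinateRange-style string into strand, chrom, start, and end."""
    match = _RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"expected [[<strand>]<chr>:]<start>..<end>, got {text!r}")
    return GenomicRange(
        match["strand"], match["chrom"], int(match["start"]), int(match["end"])
    )
```

For example, parse_coordinate_range("+NC_000912.1:4821..7340") yields the strand '+', the contig NC_000912.1, and positions 4821 and 7340.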
DiscardInvalidCDS
— Flag to ignore mRNA transcripts either lacking start or stop codon or having in-frame stop codon
false (default) | true
Flag to ignore mRNA transcripts either lacking a start or stop codon or having an in-frame stop codon, specified as true or false.
Example: 'DiscardInvalidCDS',true
Data Types: logical
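The following Python sketch illustrates the kinds of checks this flag describes. It assumes the standard genetic code, ATG as the only start codon, and a CDS that is already spliced and oriented 5' to 3'. The in-frame stop scan also corresponds to what DiscardTerminatedCDS describes. This is an illustration, not gffread's implementation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def cds_problems(cds: str) -> list:
    """List the reasons a spliced, 5'->3' CDS would fail the checks."""
    cds = cds.upper()
    # Split into complete codons; a trailing partial codon is ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    problems = []
    if not codons or codons[0] != "ATG":
        problems.append("missing start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing stop codon")
    # Any stop codon before the final codon is an in-frame (premature) stop.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame stop codon")
    return problems
```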
DiscardNonCanonicalSplice
— Flag to ignore multiexon mRNA transcripts that have intron with noncanonical splice sequence
false (default) | true
Flag to ignore multiexon mRNA transcripts that have an intron with a noncanonical splice sequence, specified as true or false. A noncanonical splice sequence is any splice sequence other than "GT-AG", "GC-AG", or "AT-AC".
Example: 'DiscardNonCanonicalSplice',true
Data Types: logical
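The following Python sketch checks the first and last two bases of each intron, read on the transcript strand. It assumes the canonical donor-acceptor pairs GT-AG, GC-AG, and AT-AC, 1-based inclusive plus-strand coordinates, and a genome held in a single string. It illustrates the rule and is not gffread's implementation.

```python
# Canonical (donor, acceptor) dinucleotides, read on the transcript strand (assumption).
CANONICAL_SPLICE = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def intron_is_canonical(genome: str, start: int, end: int, strand: str = "+") -> bool:
    """Check one intron given 1-based inclusive plus-strand coordinates."""
    intron = genome[start - 1:end].upper()
    if strand == "-":
        # On the minus strand, read the intron as its reverse complement.
        intron = reverse_complement(intron)
    return (intron[:2], intron[-2:]) in CANONICAL_SPLICE


def has_noncanonical_intron(genome: str, introns, strand: str = "+") -> bool:
    """True if any intron of a multiexon transcript is noncanonical."""
    return any(not intron_is_canonical(genome, s, e, strand) for s, e in introns)
```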
DiscardSingleExon
— Flag to ignore transcripts spanning single exon
false (default) | true
Flag to ignore transcripts spanning a single exon, specified as true or false.
Example: 'DiscardSingleExon',true
Data Types: logical
DiscardTerminatedCDS
— Flag to ignore transcripts with in-frame stop codon
false (default) | true
Flag to ignore transcripts with an in-frame stop codon, specified as true or false.
Example: 'DiscardTerminatedCDS',true
Data Types: logical
ExtraCommand
— Additional commands
""
(default) | character vector | string
The commands must be in the native syntax (prefixed by one or two dashes). Use this option to apply undocumented flags and flags without corresponding MATLAB® properties.
Example: 'ExtraCommand',"-E"
Data Types: char
| string
FastaCDSFile
— Name of file to save spliced coding sequences
string | character vector
Name of a file to save the spliced coding sequences in the FASTA format, specified as a string or character vector.
Example: 'FastaCDSFile',"splicedCoding.FASTA"
Data Types: char | string
FastaExonsFile
— Name of file to save spliced exons
string | character vector
Name of a file to save the spliced exons in the FASTA format, specified as a string or character vector.
Example: 'FastaExonsFile',"splicedExon.FASTA"
Data Types: char | string
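A spliced sequence is the concatenation of a transcript's exons (or, for FastaCDSFile, of its CDS segments). The following Python sketch builds one from a genome string and formats it as a FASTA record. The exon-list layout, the 60-character line width, and the record name are illustrative assumptions, not gffread's exact output format.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def spliced_sequence(genome: str, exons, strand: str = "+") -> str:
    """Concatenate exons (1-based inclusive, plus-strand coordinates) in genomic order."""
    seq = "".join(genome[start - 1:end] for start, end in sorted(exons)).upper()
    # Minus-strand transcripts are reported 5'->3' on their own strand.
    return reverse_complement(seq) if strand == "-" else seq


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```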
FastaProteinFile
— Name of file to save protein translation of coding sequences
string | character vector
Name of a file to save the protein translation of coding sequences in the FASTA format, specified as a string or character vector.
Example: 'FastaProteinFile',"translated.FASTA"
Data Types: char | string
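Protein translation reads the spliced coding sequence three bases at a time. The following Python sketch assumes the standard genetic code, writes stop codons as '*' and unrecognized codons as 'X', and ignores any trailing partial codon. gffread's handling of these edge cases may differ.

```python
from itertools import product

# Standard genetic code, enumerated in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a 5'->3' coding sequence; '*' marks stops, 'X' unknown codons."""
    cds = cds.upper()
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )
```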
FirstExonOnly
— Flag to parse additional attributes only from first exon
false (default) | true
Flag to parse additional attributes only from the first exon, specified as true or false.
Example: 'FirstExonOnly',true
Data Types: logical
ForceExons
— Flag to list lowest-level GFF features as exon features
false (default) | true
Flag to list the lowest-level GFF features as exon features in the output file, specified as true or false.
Example: 'ForceExons',true
Data Types: logical
FullyContained
— Flag to discard transcripts not fully contained in range
false (default) | true
Flag to discard transcripts that are not fully contained within the range, specified as true or false. Specify the range using the CoordinateRange option.
Example: 'FullyContained',true
Data Types: logical
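The difference between overlapping a range and lying fully inside it can be sketched as two interval tests. The following Python sketch assumes that, without this flag, a transcript is kept when it overlaps the range at all. The (id, start, end) tuples are hypothetical, and coordinates are 1-based and inclusive.

```python
def overlaps(t_start: int, t_end: int, r_start: int, r_end: int) -> bool:
    """True if the transcript shares at least one position with the range."""
    return t_start <= r_end and t_end >= r_start


def fully_contained(t_start: int, t_end: int, r_start: int, r_end: int) -> bool:
    """True if the whole transcript lies inside the range."""
    return r_start <= t_start and t_end <= r_end


def filter_by_range(transcripts, r_start: int, r_end: int, fully: bool = False):
    """Keep transcript IDs that overlap (or, if fully=True, lie within) the range."""
    keep = fully_contained if fully else overlaps
    return [tid for tid, start, end in transcripts if keep(start, end, r_start, r_end)]
```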
GTFOutput
— Flag to output GTF-format transcript files
false (default) | true
Flag to output GTF-format transcript files, specified as true or false.
Example: 'GTFOutput',true
Data Types: logical
IncludeAll
— Flag to apply all available options
false (default) | true
Flag to apply all available options, specified as true or false. The original (native) syntax is prefixed by one or two dashes. By default, the function converts only the specified options to the original syntax. If the value is true, the software converts all available options, with default values for unspecified options, to the original syntax.
Note
If you set IncludeAll to true, the software converts all available properties, using default values for unspecified properties. The only exception is when the default value of a property is NaN, Inf, [], '', or "". In this case, the software does not translate the corresponding property.
Example: 'IncludeAll',true
Data Types: logical
MaxIntronLength
— Maximum intron length for transcript to include in output
Inf (default) | positive integer
Maximum intron length for a transcript to include in the output file, specified as a positive integer. Inf, the default value, sets no limit on the intron length.
Example: 'MaxIntronLength',500
Data Types: double
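Intron lengths follow from the gaps between consecutive exons. The following Python sketch computes them from 1-based inclusive exon coordinates and applies a maximum. The "no intron may exceed the limit" reading and the inclusive comparison at the boundary are assumptions.

```python
import math


def intron_lengths(exons):
    """Lengths of the gaps between consecutive exons (1-based inclusive coordinates)."""
    ordered = sorted(exons)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]


def within_max_intron(exons, max_intron_length=math.inf) -> bool:
    """True if no intron is longer than max_intron_length (inf means no limit)."""
    return all(length <= max_intron_length for length in intron_lengths(exons))
```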
Merge
— Flag to merge transcripts into loci
false (default) | true
Flag to merge transcripts into loci by collapsing transcripts with identical introns, specified as true or false.
Example: 'Merge',true
Data Types: logical
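Collapsing transcripts with identical introns starts from comparing intron chains. The following Python sketch shows only that first step: it groups multiexon transcripts whose intron chains match exactly. It ignores the containment and single-exon rules that gffread applies (compare CollapseContainer and CollapseFull) and is not its implementation. The record layout is hypothetical.

```python
from collections import defaultdict


def intron_chain(exons) -> tuple:
    """Intron coordinates implied by 1-based inclusive exons, in genomic order."""
    ordered = sorted(exons)
    return tuple((prev[1] + 1, nxt[0] - 1) for prev, nxt in zip(ordered, ordered[1:]))


def group_by_intron_chain(transcripts):
    """Return groups of transcript IDs that share chrom, strand, and intron chain."""
    groups = defaultdict(list)
    for tid, chrom, strand, exons in transcripts:
        chain = intron_chain(exons)
        if chain:  # single-exon transcripts have no introns to compare
            groups[(chrom, strand, chain)].append(tid)
    return [ids for ids in groups.values() if len(ids) > 1]
```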
MergeCloseExons
— Flag to merge exons into single exon
false (default) | true
Flag to merge adjacent exons into a single exon when the intron separating them is shorter than 4 base pairs, specified as true or false.
Example: 'MergeCloseExons',true
Data Types: logical
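Merging close exons amounts to closing very short gaps. The following Python sketch merges neighboring exons whose separating intron is shorter than 4 bp. It assumes 1-based inclusive coordinates and treats the 4 bp limit as exclusive, following the description above.

```python
def merge_close_exons(exons, min_intron: int = 4):
    """Merge exons separated by an intron shorter than min_intron bases."""
    merged = []
    for start, end in sorted(exons):
        # Gap length between this exon and the previous (possibly merged) exon.
        if merged and start - merged[-1][1] - 1 < min_intron:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```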
MergeInfoFile
— Name of file to save information on duplicates when merging
string | character vector
Name of a file to save information on duplicates when merging, specified as a string or character vector. This property applies only when you set Merge to true.
Example: 'MergeInfoFile',"duplicates.txt"
Data Types: char | string
PreserveAttributes
— Flag to retain all attributes in output
false (default) | true
Flag to retain all attributes in the output file, specified as true or false.
Example: 'PreserveAttributes',true
Data Types: logical
Pseudo
— Flag to filter out records containing "pseudo"
true (default) | false
Flag to filter out records containing the word "pseudo", specified as true or false.
Example: 'Pseudo',false
Data Types: logical
ReplacementTable
— Name of file containing replacement table
string | character vector
Name of a file containing a replacement table, specified as a string or character vector. The table must have two columns, where the first column contains the original transcript IDs and the second column contains the new transcript IDs. An example table follows.
origTranscript1 | newTranscript1 |
origTranscript2 | newTranscript2 |
origTranscript3 | newTranscript3 |
If you provide a replacement table, the function replaces the transcript IDs found in the first column with the new transcript IDs from the second column and filters out transcripts whose IDs are not listed in the first column.
Example: 'ReplacementTable',"replaceTbl.txt"
Data Types: char | string
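The following Python sketch applies the replace-and-filter behavior described above. It assumes the table file has two whitespace-separated columns per line (the vertical bars in the example table are treated as layout only, not as part of the file format). The function names are hypothetical.

```python
def read_replacement_table(text: str) -> dict:
    """Parse two whitespace-separated columns: original ID, new ID."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def apply_replacement(transcript_ids, mapping: dict):
    """Rename listed transcripts and drop any that are not in the table."""
    return [mapping[tid] for tid in transcript_ids if tid in mapping]
```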
SequenceFile
— Name of FASTA-format file containing genomic sequences
string | character vector
Name of a FASTA-format file containing genomic sequences for all input mappings, specified as a string or character vector.
Example: 'SequenceFile',"seqs.fasta"
Data Types: char | string
SequenceInfo
— Name of tab-delimited file with additional information on input sequence
string | character vector
Name of a tab-delimited file with additional information on each input sequence, specified as a string or character vector. This file must have three columns: a sequence name column, a sequence length column, and a sequence description column. If AppendDescription is true, the sequence description is included as an attribute in the output GFF file.
Example: 'SequenceInfo',"seqinfo.txt"
Data Types: char | string
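The following Python sketch reads a three-column, tab-delimited sequence information file and appends the matching description as a descr attribute, which is the combination AppendDescription describes. GFF3 escaping of special characters in the attribute value is omitted, and the helper names are hypothetical.

```python
def read_sequence_info(text: str) -> dict:
    """Parse name<TAB>length<TAB>description lines into {name: (length, description)}."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, length, description = line.split("\t", 2)
        info[name] = (int(length), description)
    return info


def append_descr(attributes: str, seqid: str, info: dict) -> str:
    """Append a descr attribute taken from the sequence information table."""
    if seqid not in info:
        return attributes
    return f"{attributes};descr={info[seqid][1]}"
```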
UrlDecode
— Flag to decode URL-encoded characters in attribute names
false (default) | true
Flag to decode URL-encoded characters in attribute names, specified as true or false. For instance, "transcript%20description" is decoded to "transcript description".
Example: 'UrlDecode',true
Data Types: logical
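Python's standard urllib.parse.unquote performs the same kind of percent-decoding. The following sketch applies it to the attribute names of a GFF3 column-9 string and leaves the values untouched, which matches the description's focus on names. The helper name is hypothetical.

```python
from urllib.parse import unquote


def decode_attribute_names(column9: str) -> str:
    """Decode %XX escapes in the attribute names of a GFF3 column-9 string."""
    decoded = []
    for field in column9.split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        name, sep, value = field.partition("=")
        # Only the name is decoded; the value is kept as written.
        decoded.append(unquote(name) + sep + value)
    return ";".join(decoded)
```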
UseEnsemblConversion
— Flag to use GTF-to-GFF3 conversion method from Ensembl
false (default) | true
Flag to use the GTF-to-GFF3 conversion method from Ensembl, specified as true or false.
Example: 'UseEnsemblConversion',true
Data Types: logical
UseNonTranscript
— Flag to include nontranscript GFF records in output file
false (default) | true
Flag to include nontranscript GFF records in the output file, specified as true or false.
Example: 'UseNonTranscript',true
Data Types: logical
UseTrackName
— Flag to use track name in second column of GFF output line
false (default) | true
Flag to use the track name in the second column of the GFF output line, specified as true or false.
Example: 'UseTrackName',true
Data Types: logical
WriteCoordinates
— Flag to write exon coordinates projected onto spliced sequence
false (default) | true
Flag to write the exon coordinates projected onto the spliced sequence, specified as true or false. This property applies only when FastaExonsFile or FastaCDSFile is specified.
Example: 'WriteCoordinates',true
Data Types: logical
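Projecting exons onto the spliced sequence maps each genomic exon to its position within the concatenated transcript. The following Python sketch assumes plus-strand exons with 1-based inclusive coordinates. For minus-strand transcripts the exon order would be reversed, and gffread's exact output layout is not reproduced here.

```python
def project_exons(exons):
    """Map genomic exons to 1-based inclusive positions in the spliced sequence."""
    projected = []
    offset = 0  # number of spliced bases emitted so far
    for start, end in sorted(exons):
        length = end - start + 1
        projected.append((offset + 1, offset + length))
        offset += length
    return projected
```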
References
[1] Trapnell, Cole, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold, and Lior Pachter. “Transcript Assembly and Quantification by RNA-Seq Reveals Unannotated Transcripts and Isoform Switching during Cell Differentiation.” Nature Biotechnology 28, no. 5 (May 2010): 511–15.
Version History
Introduced in R2019a