clustergram

Object containing hierarchical clustering analysis data

Description

The clustergram function creates a clustergram object. The object contains hierarchical clustering analysis data that you can view in a heatmap and dendrogram.

Creation

Syntax

clustergram(data)

clustergram(data,Name,Value)

Description

cgObj = clustergram(data) performs hierarchical clustering analysis on the values in data. The returned clustergram object cgObj contains analysis data and displays a dendrogram and heatmap.

cgObj = clustergram(data,Name,Value) sets the object properties using name-value pairs. For example, clustergram(data,'Standardize','column') standardizes the values along the columns of data. You can specify multiple name-value pairs. Enclose each property name in quotes.

Input Arguments

`data` — Source data
DataMatrix object | numeric matrix

Source data, specified as a DataMatrix object or numeric matrix. Typically, if the matrix contains gene expression data, each row corresponds to a gene and each column corresponds to a sample.

Name-Value Arguments

Use comma-separated name-value pair arguments to set the object properties. Enclose each property name in single quotes.

Example: cg = clustergram(data,'Colormap',redbluecmap,'Annotate',true)

Properties

`Standardize` — Dimension for standardizing data values
`'none'` (default) | `'row'` | `'column'` | `3` | `2` | `1`

Dimension for standardizing data values, specified as a character vector, string, or positive integer. Choices are:

'column' or 1 — Standardize along the columns of data.
'row' or 2 — Standardize along the rows of data.
'none' or 3 — Do not standardize.

If you specify 'column' or 'row', the function transforms the data values so that the mean is 0 and the standard deviation is 1 in the specified dimension.

Example: 'column'

Data Types: double | char | string
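
As a rough illustration of the standardization described for this property, the following sketch transforms a small made-up matrix with base MATLAB. It assumes the sample standard deviation (normalized by N-1) and implicit expansion (R2016b or later). The variable names and data are illustrative only, and the internal computation in clustergram may differ in detail.

```matlab
% Illustrative data: 3 genes (rows) by 4 samples (columns)
X = [1 2 3 4; 10 20 30 50; 5 5 6 8];

% Standardize along rows: each row ends up with mean 0 and std 1
Zrow = (X - mean(X,2)) ./ std(X,0,2);

% Standardize along columns: each column ends up with mean 0 and std 1
Zcol = (X - mean(X,1)) ./ std(X,0,1);
```

Row standardization is the usual choice when each row is a gene and you want to compare expression patterns rather than absolute levels.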

`Symmetric` — Flag to make the heatmap color scale symmetric around zero
`true` (default) | `false`

Flag to make the heatmap color scale symmetric around zero, specified as true or false.

Example: false

Data Types: logical

`ImputeFun` — Name of function or function handle to impute missing data
character vector | cell array

Name of a function or function handle to impute missing data, specified as a character vector or cell array. If you specify a cell array, the first element must be the name of a function or function handle, and the remaining elements must be name-value pairs used as inputs to the function. Missing data points are colored gray in the heatmap.

If data points are missing, use this property to impute the missing values. Otherwise, the clustergram function errors.

Example: 'func1'

Data Types: char
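
A minimal sketch of supplying an imputation function, assuming that knnimpute (Bioinformatics Toolbox) is a suitable choice for your data. The matrix below is made up for illustration.

```matlab
% Made-up expression matrix with one missing value
rng(0);                 % reproducible random data
data = randn(10,5);
data(3,2) = NaN;

% Fill the missing entry by nearest-neighbor imputation before clustering
cg = clustergram(data,'ImputeFun',@knnimpute);
```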

`Colormap` — Heatmap colors
`redgreencmap` (default) | matrix | name of function handle

Heatmap colors, specified as a three-column (M-by-3) matrix of red-green-blue (RGB) values or the name of a function handle that returns a colormap, such as redgreencmap or redbluecmap.

The default colormap is redgreencmap, in which red represents values above the mean, black represents the mean, and green represents values below the mean of a row (gene) across all columns (samples).

Example: redbluecmap

Data Types: double | char
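
If neither built-in colormap fits, you can build the M-by-3 matrix yourself. The sketch below constructs an 11-level blue-white-red map in base MATLAB and passes it to clustergram. The data and variable names are illustrative assumptions.

```matlab
% Build an 11-by-3 diverging colormap: blue -> white -> red
n = 11;
half = (n - 1)/2;
up = linspace(0,1,half + 1)';          % ramp from 0 to 1
down = flipud(up);                     % ramp from 1 to 0
cmap = [up, up, ones(half + 1,1); ...  % blue fading to white
        ones(half,1), down(2:end), down(2:end)];  % white fading to red

% Use the custom map on made-up data
rng(0);
data = randn(10,5);
cg = clustergram(data,'Colormap',cmap);
```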

`ColumnLabels` — Column labels
`[1x0 double]` (default) | string vector | cell array of character vectors | numeric vector

Column labels, specified as a string vector, cell array of character vectors, or numeric vector. The size of the vector must match the number of columns in the input data.

If the number of column labels is 200 or more, the labels do not appear in the clustergram plot.

Example: ["sample1","sample2","sample3"]

Data Types: double | string | cell

`RowLabels` — Row labels
`[]` (default) | string vector | cell array of character vectors | numeric vector

Row labels, specified as a string vector, cell array of character vectors, or numeric vector. The size of the vector must match the number of rows in the input data.

If the number of row labels is 200 or more, the labels do not appear in the clustergram plot.

Example: ["gene1","gene2","gene3"]

Data Types: double | string | cell
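
A short sketch of supplying row and column labels at creation, with one label per row and one per column. The data and label names are made up for illustration.

```matlab
% Made-up 4-by-3 data matrix
rng(0);
data = randn(4,3);

rowNames = ["gene1","gene2","gene3","gene4"];   % one label per row
colNames = ["sample1","sample2","sample3"];     % one label per column

cg = clustergram(data,'RowLabels',rowNames,'ColumnLabels',colNames);
```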

`ColumnLabelsRotate` — Orientation of column labels
`90` (default) | numeric scalar

Orientation of column labels, specified as a numeric scalar. Specify the value of rotation in degrees (positive angles cause counterclockwise rotation).

Example: 30

Data Types: double

`RowLabelsRotate` — Orientation of row labels
`0` (default) | numeric scalar

Orientation of row labels, specified as a numeric scalar. Specify the value of rotation in degrees (positive angles cause counterclockwise rotation).

Example: 30

Data Types: double

`Annotate` — Flag to display data values in heatmap
`false` (default) | `true`

Flag to display data values in the heatmap, specified as true or false.

Example: true

Data Types: logical

`AnnotPrecision` — Display precision of data values
`2` (default) | numeric scalar

Display precision of data values in the heatmap, specified as a numeric scalar. The default number of digits of precision is 2.

Example: 3

Data Types: double

`LabelsWithMarkers` — Flag to display colored markers for row and column labels
`false` (default) | `true`

Flag to display colored markers instead of colored text for the row and column labels, specified as true or false.

Example: true

Data Types: logical

`AnnotColor` — Text color of displayed data values
`'w'` (default) | character vector | string | three-element numeric vector

Text color of displayed data values in the heatmap, specified as a character vector, string, or three-element numeric vector. For example, to use cyan, you can enter [0 1 1], 'c', "c", "cyan", or 'cyan'. For details, see Color Options.

Example: 'red'

Data Types: char | string | double

`DisplayRange` — Display range of standardized values
`3` (default) | positive scalar

Display range of standardized values, specified as a positive scalar.

The default value 3 means that there is a color variation for values between -3 and 3, but values greater than 3 are the same color as 3, and values less than -3 are the same color as -3.

For example, if you specify redgreencmap for the 'Colormap' property, pure red represents values greater than or equal to the specified display range value and pure green represents values less than or equal to the negative of the specified display range value.

Example: 3

Data Types: double
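
The saturation behavior can be pictured as clamping values to the display range before they are mapped to colors. The sketch below shows only that clamping step on made-up values. It illustrates the effect on the color scale and is not a statement about how clustergram stores the data.

```matlab
Z = [-4.2 -3 -1.5 0 2.9 3 7];   % made-up standardized values
r = 3;                          % display range

% Values outside [-r, r] take the color of the nearest limit
Zshown = min(max(Z,-r),r);      % [-3 -3 -1.5 0 2.9 3 3]
```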

`ColumnLabelsColor` — Color information for column labels
`[]` (default) | structure | structure array

Warning

This property will be removed in a future release. Set LabelsWithMarkers to true for colored markers instead of colored text.

Color information for column labels, specified as a structure or structure array.

For a single structure, you must specify the following fields.

Labels — Cell array of character vectors specifying column labels listed in the ColumnLabels property.
Colors — Character vector or string specifying a color for the column labels. If this field is empty, the default color (black) is used.

For a structure array, you must specify a single element in each field for each structure.

Labels — Character vector or string specifying a column label listed in the ColumnLabels property.
Colors — Character vector or string specifying a color for the column labels. If this field is empty, the default color (black) is used.

For more information on specifying colors, see Color Options.

Data Types: struct

`RowLabelsColor` — Color information for row labels
`[]` (default) | structure | structure array

Warning

This property will be removed in a future release. Set LabelsWithMarkers to true for colored markers instead of colored text.

Color information for row labels, specified as a structure or structure array.

For a single structure, you must specify the following fields.

Labels — Cell array of character vectors specifying row labels listed in the RowLabels property.
Colors — Character vector or string specifying a color for the row labels. If this field is empty, the default color (black) is used.

For a structure array, you must specify a single element in each field for each structure.

Labels — Character vector or string specifying a row label listed in the RowLabels property.
Colors — Character vector or string specifying a color for the row labels. If this field is empty, the default color (black) is used.

For more information on specifying colors, see Color Options.

`Cluster` — Dimension for data clustering
`'all'` (default) | `1` | `2` | `3` | `'column'` | `'row'`

Dimension for data clustering, specified as a positive integer, character vector, or string. Choices are:

'column' or 1 — Cluster along the columns of data only, which results in clustered rows.
'row' or 2 — Cluster along the rows of data only, which results in clustered columns.
'all' or 3 — Cluster along the columns of data, then cluster along the rows of row-clustered data.

Example: 2

Data Types: double | char | string

`ColumnGroupMarker` — Information for annotating groups of columns
structure | structure array

Information for annotating groups of columns, specified as a structure or structure array.

If you specify a single structure, each field must contain a cell array of elements. If you specify a structure array, each structure must have a single element in each field.

The fields are:

GroupNumber — Scalar specifying the column group number to annotate.
Annotation — Character vector specifying text to annotate the column group.
Color — Character vector or three-element vector of RGB values specifying a color to label the column group. For more information on specifying colors, see Color Options. If this field is empty, the default value is 'blue'.

Data Types: struct
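
The two accepted forms differ only in how the struct function is called. The sketch below builds both in base MATLAB. The group numbers, annotations, and colors are hypothetical.

```matlab
% Form 1: structure array, one element per group, one value per field
cmArray = struct('GroupNumber',{1,2}, ...
                 'Annotation',{'Early','Late'}, ...
                 'Color',{'r','b'});

% Form 2: single structure whose fields hold cell arrays
% (the double braces stop struct from creating a structure array)
cmSingle = struct('GroupNumber',{{1,2}}, ...
                  'Annotation',{{'Early','Late'}}, ...
                  'Color',{{'r','b'}});
```

The structure array form is usually easier to read and extend, because each group is a self-contained element.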

`ColumnPDist` — Distance metric to pass to `pdist` function
`'euclidean'` (default) | character vector | cell array

Distance metric to pass to the pdist function to calculate the pairwise distances between columns, specified as a character vector or cell array. Specify a cell array if the distance metric requires extra arguments. For example, to use the Minkowski distance with an exponent p, specify {'minkowski',p}.

Example: 'jaccard'

Data Types: char | cell
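
To see what a cell array such as {'minkowski',p} corresponds to, the sketch below calls pdist directly (Statistics and Machine Learning Toolbox). It assumes that the cell array elements are forwarded to pdist as the distance name and its parameter. Because pdist measures distances between rows, the matrix is transposed to compare columns. The data is made up.

```matlab
% Made-up data: 3 rows, 2 columns
X = [0 3; 4 0; 1 1];
p = 3;                          % Minkowski exponent

% Distance between the two columns of X
D = pdist(X','minkowski',p);    % (3^3 + 4^3 + 0^3)^(1/3) = 91^(1/3)
```

The corresponding clustergram setting would be 'ColumnPDist',{'minkowski',p}.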

`Dendrogram` — Color threshold information to pass to `dendrogram` function
scalar | two-element numeric vector | character vector | cell array of character vectors

Color threshold information to pass to the dendrogram function to create a dendrogram plot, specified as a scalar, two-element numeric vector, character vector, or cell array of character vectors. This option sets the 'ColorThreshold' property of the dendrogram plot. If you specify a two-element numeric vector or cell array, the first element is for the rows, and the second element is for the columns.

Data Types: double | cell

`DisplayRatio` — Ratio of space that row and column dendrograms occupy
`1/5` (default) | scalar between `0` and `1` | two-element vector

Ratio of space that the row and column dendrograms occupy relative to the heatmap, specified as a scalar between 0 and 1 or two-element vector. If you specify a scalar, the function uses it as the ratio for both row and column dendrograms. If you specify a two-element vector, the function uses the first element for the ratio of the row dendrogram width to the heatmap width, and the second element for the ratio of the column dendrogram height to the heatmap height. The second element is ignored for one-dimensional clustergrams.

Example: 0.5

Data Types: double

`Linkage` — Linkage method to create hierarchical cluster tree
`'average'` (default) | character vector | two-element cell array of character vectors

Linkage method passed to the linkage function to create the hierarchical cluster tree for rows and columns, specified as a character vector or two-element cell array of character vectors. If you specify a cell array, the function uses the first element for linkage between rows, and the second element for linkage between columns.

Example: 'centroid'

Data Types: char | cell
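
As a sketch of what the two-element form means, the code below builds separate row and column trees with pdist and linkage (Statistics and Machine Learning Toolbox), then requests the same pair of methods from clustergram. The direct calls are an assumption about the underlying steps, shown for orientation only, and the data is made up.

```matlab
rng(1);
data = randn(8,4);     % made-up data: 8 rows, 4 columns

% Row tree with average linkage, column tree with complete linkage
rowTree = linkage(pdist(data,'euclidean'),'average');
colTree = linkage(pdist(data','euclidean'),'complete');

% Same choice expressed through the Linkage property:
% first element for rows, second element for columns
cg = clustergram(data,'Linkage',{'average','complete'});
```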

`LogTrans` — Flag to log₂ transform data
`false` (default) | `true`

Flag to log₂ transform the data from natural scale, specified as true or false.

Example: true

Data Types: logical
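
For reference, a log₂ transform turns each doubling of a natural-scale value into an increase of 1. The sketch below applies the transform to made-up positive values in base MATLAB.

```matlab
raw = [1 2 4 8; 16 32 64 128];   % made-up natural-scale values
logged = log2(raw);              % [0 1 2 3; 4 5 6 7]
```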

`OptimalLeafOrder` — Flag to calculate optimal leaf order
`true` | `false`

Flag to calculate the optimal leaf order that maximizes the similarity between neighboring leaves, specified as true or false. The default value depends on the size of the input data. If the number of rows or columns in data exceeds 1500, the default value is false. Otherwise, the default value is true.

Disabling the optimal leaf ordering calculation can be useful when working with large datasets because this calculation consumes a lot of memory and time.

Example: true

Data Types: logical
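
The sketch below shows the kind of calculation this flag controls, using optimalleaforder from Statistics and Machine Learning Toolbox on made-up data, and then how to switch the calculation off in clustergram. Whether clustergram calls optimalleaforder in exactly this way is an assumption.

```matlab
rng(2);
data = randn(6,3);                 % made-up data: 6 rows to cluster

D = pdist(data);                   % pairwise row distances
tree = linkage(D,'average');       % hierarchical cluster tree
order = optimalleaforder(tree,D);  % permutation of the 6 leaves

% Skip the calculation, for example on a large dataset
cg = clustergram(data,'OptimalLeafOrder',false);
```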

`RowGroupMarker` — Information for annotating groups of rows
structure | structure array

Information for annotating groups of rows, specified as a structure or structure array.

If you specify a single structure, each field must contain a cell array of elements. If you specify a structure array, each structure must have a single element in each field.

The fields are:

GroupNumber — Scalar specifying the row group number to annotate.
Annotation — Character vector specifying text to annotate the row group.
Color — Character vector or three-element vector of RGB values specifying a color to label the row group. For more information on specifying colors, see Color Options. If this field is empty, the default value is 'blue'.

Data Types: struct

`RowPDist` — Distance metric to pass to `pdist` function
`'euclidean'` (default) | character vector | cell array

Distance metric to pass to the pdist function to calculate the pairwise distances between rows, specified as a character vector or cell array. Specify a cell array if the distance metric requires extra arguments. For example, to use the Minkowski distance with an exponent p, specify {'minkowski',p}.

Example: 'jaccard'

Data Types: char | cell

`ShowDendrogram` — Flag to show dendrogram tree diagrams with clustergram
`'on'` (default) | `'off'`

Flag to show the dendrogram tree diagrams with the clustergram, specified as 'on' or 'off'.

Example: 'off'

Data Types: char

Object Functions

`view`	Display heatmap or clustergram
`plot`	Render heatmap or clustergram
`addTitle`	Add title to heatmap or clustergram
`addXLabel`	Label x-axis of heatmap or clustergram
`addYLabel`	Label y-axis of heatmap or clustergram
`clusterGroup`	Select cluster group
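
A brief sketch of the labeling functions applied to a clustergram built from made-up data. The title and axis label text are illustrative.

```matlab
rng(3);
data = randn(10,4);                      % made-up data
cg = clustergram(data,'Standardize','row');

addTitle(cg,'Example clustergram');      % title above the plot
addXLabel(cg,'Samples');                 % x-axis label
addYLabel(cg,'Genes');                   % y-axis label
```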

Examples

Perform Hierarchical Clustering on Gene Expression Data

Load microarray data containing gene expression levels of Saccharomyces cerevisiae (yeast) during the metabolic shift from fermentation to respiration [1].

load filteredyeastdata

This MAT file includes three variables, which are added to the MATLAB® workspace:

- yeastvalues - A matrix of gene expression data from Saccharomyces cerevisiae during the metabolic shift from fermentation to respiration
- genes - A cell array of GenBank® accession numbers for labeling the rows in yeastvalues
- times - A vector of time values for labeling the columns in yeastvalues

Create a clustergram object to display the heat map from the gene expression data in the first 30 rows of the yeastvalues matrix and standardize along the rows of data.

cgo = clustergram(yeastvalues(1:30,:),'Standardize','Row')

Clustergram object with 30 rows of nodes and 7 columns of nodes.

Use the set method and the genes and times vectors to add meaningful row and column labels to the clustergram.

set(cgo,'RowLabels',genes(1:30),'ColumnLabels',times)

Add a color bar to the clustergram by clicking the Insert Colorbar button on the toolbar.

View a data tip containing the intensity value, row label, and column label for a specific area of the heat map by clicking the Data Cursor button on the toolbar, then clicking an area in the heat map. To delete this data tip, right-click it, then select Delete Current Datatip.

Display intensity values for each area of the heat map by clicking the Annotate button on the toolbar. Click the Annotate button again to remove the intensity values.

Tip: If the amount of data is large enough, the cells within the clustergram
are too small to display the intensity annotations. Zoom in to see the
intensity annotations.

Remove the dendrogram tree diagrams from the figure by clicking the Show Dendrogram button on the toolbar. Click it again to display the dendrograms.

Use the get method to display the properties of the clustergram object, cgo.

get(cgo)

               Cluster: 'ALL'
              RowPDist: {'Euclidean'}
           ColumnPDist: {'Euclidean'}
               Linkage: {'Average'}
            Dendrogram: {}
      OptimalLeafOrder: 1
              LogTrans: 0
          DisplayRatio: [0.2000 0.2000]
        RowGroupMarker: []
     ColumnGroupMarker: []
        ShowDendrogram: 'on'
           Standardize: 'ROW'
             Symmetric: 1
          DisplayRange: 3
              Colormap: [11x3 double]
             ImputeFun: []
          ColumnLabels: {1x7 cell}
             RowLabels: {30x1 cell}
    ColumnLabelsRotate: 90
       RowLabelsRotate: 0
              Annotate: 'off'
        AnnotPrecision: 2
            AnnotColor: 'w'
     ColumnLabelsColor: []
        RowLabelsColor: []
     LabelsWithMarkers: 0

Change the clustering parameters by changing the linkage method and changing the color of the groups of nodes in the dendrogram whose linkage is less than a threshold of 3.

set(cgo,'Linkage','complete','Dendrogram',3)

Place the cursor on a branch node in the dendrogram to highlight (in blue) the group associated with it. Press and hold the mouse button to display a data tip listing the group number and the nodes (genes or samples) in the group.

Right-click a branch node in the dendrogram to display a menu of options.

The following options are available:

- Set Group Color - Change the cluster group color.
- Print Group to Figure - Print the group to a figure window.
- Copy Group to New Clustergram - Copy the group to a new clustergram window.
- Export Group to Workspace - Create a clustergram object of the group in the MATLAB workspace.
- Export Group Info to Workspace - Create a structure containing information about the group in the MATLAB workspace. The structure contains these fields:
  - GroupNames - Cell array of character vectors containing the names of the row or column groups.
  - RowNodeNames - Cell array of character vectors containing the names of the row nodes.
  - ColumnNodeNames - Cell array of character vectors containing the names of the column nodes.
  - ExprValues - An M-by-N matrix of intensity values, where M and N are the number of row nodes and column nodes, respectively. If the matrix contains gene expression data, typically each row corresponds to a gene and each column corresponds to a sample.

Create a clustergram object for Group 18 in the MATLAB workspace. Right-click Group 18, then select Export Group to Workspace. In the Export to Workspace dialog box, type Group18, then click OK.

Use the view method to view the clustergram object, Group18.

view(Group18)

View all the gene expression data using a diverging red and blue colormap and standardize along the rows of data.

cgo_all = clustergram(yeastvalues,'Colormap',redbluecmap,'Standardize','Row')

Clustergram object with 614 rows of nodes and 7 columns of nodes.

Create structure arrays to specify marker colors and annotations for two groups of rows (510 and 593) and two groups of columns (4 and 5).

rm = struct('GroupNumber',{510,593},'Annotation',{'A','B'},...
     'Color',{'b','m'});
cm = struct('GroupNumber',{4,5},'Annotation',{'Time1','Time2'},...
     'Color',{[1 1 0],[0.6 0.6 1]});

Use the RowGroupMarker and ColumnGroupMarker properties to add the color markers and annotations to the clustergram.

set(cgo_all,'RowGroupMarker',rm,'ColumnGroupMarker',cm)

More About

Color Options

The following lists the predefined colors and their RGB triplet equivalents. The short names and long names are character vectors that specify one of eight preset colors. The RGB triplet is a three-element row vector whose elements specify the intensities of the red, green, and blue components of the color; the intensities must be in the range [0 1].

RGB Triplet	Short Name	Long Name
`[1 1 0]`	`y`	`yellow`
`[1 0 1]`	`m`	`magenta`
`[0 1 1]`	`c`	`cyan`
`[1 0 0]`	`r`	`red`
`[0 1 0]`	`g`	`green`
`[0 0 1]`	`b`	`blue`
`[1 1 1]`	`w`	`white`
`[0 0 0]`	`k`	`black`

References

[1] DeRisi, J. L. “Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale.” Science 278, no. 5338 (October 24, 1997): 680–86.

Version History

Introduced before R2006a
