# dendrogram

Dendrogram plot

## Description

dendrogram(tree) generates a dendrogram plot of the hierarchical binary cluster tree, tree. A dendrogram consists of many U-shaped lines that connect data points in a hierarchical tree. The height of each U represents the distance between the two data points or clusters being connected.

• If there are 30 or fewer data points in the original data set, then each leaf in the dendrogram corresponds to one data point.

• If there are more than 30 data points, then dendrogram collapses lower branches so that there are 30 leaf nodes. As a result, some leaves in the plot correspond to more than one data point.
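The default collapsing behavior can be checked with a short sketch. The data set size (100 points) and the average linkage method are assumptions chosen only for illustration, and T is the second output described later on this page.

```matlab
rng(0,"twister")               % For reproducibility
X = rand(100,2);               % 100 data points (illustrative size)
tree = linkage(X,"average");   % Linkage method is an assumption

[~,T] = dendrogram(tree);      % No P specified, so lower branches are collapsed
numLeaves = numel(unique(T))   % Number of leaf nodes shown in the plot
```

Because the data set has more than 30 points, numLeaves should equal the default maximum of 30.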

dendrogram(tree,P) generates a dendrogram plot with no more than P leaf nodes. If there are more than P data points in the original data set, then dendrogram collapses the lower branches of the tree. As a result, some leaves in the plot correspond to more than one data point.

dendrogram(___,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the order of leaf nodes and the orientation of the dendrogram plot.

dendrogram(ax,___) displays the plot in the target axes. Specify ax as the first input argument followed by any of the input argument combinations in the previous syntaxes.

H = dendrogram(___) returns a vector of Line objects. You can use any of the argument combinations from the previous syntaxes.

[H,T,outperm] = dendrogram(___) also returns a vector T containing the leaf node number for each object in the original data set, and a vector outperm giving the order of the node labels of the leaves as shown in the dendrogram.

• Returning T is useful when the number of leaf nodes, P, is less than the total number of data points, so that some leaf nodes in the display correspond to multiple data points.

• The order of the node labels given in outperm is from left to right for a vertical dendrogram, and from bottom to top for a horizontal dendrogram.
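As a rough sketch of how T and outperm work together, the following lists the data points in each leaf in display order. The sample size, the linkage method, and the choice of 5 leaf nodes are assumptions for illustration.

```matlab
rng(0,"twister")               % For reproducibility
X = rand(20,2);                % 20 data points (illustrative size)
tree = linkage(X,"average");   % Linkage method is an assumption

[~,T,outperm] = dendrogram(tree,5);   % Collapse the tree to 5 leaf nodes

% Walk the leaves in display order (left to right for this vertical plot)
for k = outperm
    members = find(T == k)';           % Data points merged into leaf k
    fprintf("Leaf %d: %s\n",k,mat2str(members));
end
```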

## Examples

Generate sample data and use it to create a hierarchical binary cluster tree using the linkage function. Plot a dendrogram of the tree.

rng(0,"twister") % For reproducibility
X = rand(10,3);
tree = linkage(X,"average"); % Average linkage is assumed here
dendrogram(tree)

Generate sample data and use it to create a hierarchical binary cluster tree using the linkage function.

rng(0,"twister") % For reproducibility
X = rand(10,3);
tree = linkage(X,"average"); % Average linkage is assumed here

Calculate an optimal leaf order and plot a dendrogram.

D = pdist(X);
leafOrder = optimalleaforder(tree,D)
leafOrder = 1×10

3     7     6     1     4     9     5     8    10     2

dendrogram(tree,Reorder=leafOrder)

The order of the leaf nodes in the dendrogram plot corresponds to the permutation in leafOrder.

Generate sample data.

rng(0,"twister") % For reproducibility
X = rand(100,2);

There are 100 data points in the original data set, X.

Create a hierarchical binary cluster tree using the linkage function. Then, plot the dendrogram for the complete tree (100 leaf nodes) by setting the input argument P equal to 0.

tree = linkage(X,"average"); % Average linkage is assumed here
dendrogram(tree,0)

Now, plot the dendrogram with only 25 leaf nodes. Return the mapping of the original data points to the leaf nodes shown in the plot.

figure
[~,T] = dendrogram(tree,25);

List the original data points that are in leaf node 7 of the dendrogram plot.

find(T==7)
ans = 7×1

7
33
60
70
74
76
86
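To summarize the whole mapping rather than a single leaf, one option is to count the data points per leaf with accumarray. This is a minimal sketch; the linkage method is an assumption, and counts, maxCount, and biggestLeaf are illustrative variable names.

```matlab
rng(0,"twister")               % For reproducibility
X = rand(100,2);
tree = linkage(X,"average");   % Linkage method is an assumption
[~,T] = dendrogram(tree,25);   % Plot with 25 leaf nodes

counts = accumarray(T,1);      % counts(k) is the number of data points in leaf k
[maxCount,biggestLeaf] = max(counts);   % Largest leaf and its size
```

The entries of counts sum to the number of data points, so a leaf with a count of 1 corresponds to a single data point.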

Generate sample data and use it to create a hierarchical binary cluster tree using the linkage function.

rng(0,"twister") % For reproducibility
X = rand(10,3);
tree = linkage(X,"average"); % Average linkage is assumed here

Plot the dendrogram with a horizontal orientation, using the default color threshold. Return the Line objects so you can change the dendrogram line widths.

H = dendrogram(tree,Orientation="left",ColorThreshold="default");
set(H,LineWidth=2)

Since R2024b

Generate sample data and use it to create a hierarchical binary cluster tree using the linkage function.

rng(0,"twister") % For reproducibility
X = rand(25,3);
tree = linkage(X,"average"); % Average linkage is assumed here

Assign the leaf nodes to clusters.

clusterAssignments = cluster(tree,Cutoff=1.1,Criterion="inconsistent");

Plot the dendrogram. Color the groups of nodes and leaf node markers according to their cluster assignments. Show a dashed line indicating where the tree is cut to produce the cluster assignments.

dendrogram(tree,ClusterIndices=clusterAssignments, ...
    ShowMarkers=true,ShowCut=true);

## Input Arguments

tree: Hierarchical binary cluster tree, specified as an (M – 1)-by-3 matrix that you generate using the linkage function, where M is the number of data points in the original data set.
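A quick way to confirm the expected shape of the input is to inspect the linkage output directly. This sketch assumes average linkage and 10 data points. Per the linkage function, the first two columns hold the indices of the clusters merged at each step and the third column holds the linkage distance.

```matlab
rng(0,"twister")               % For reproducibility
X = rand(10,3);
M = size(X,1);                 % Number of data points
tree = linkage(X,"average");   % Linkage method is an assumption

treeSize = size(tree)          % (M - 1)-by-3
mergedClusters = tree(:,1:2);  % Indices of the clusters joined at each step
linkDistances = tree(:,3);     % Distance at which each pair is joined
```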

P: Maximum number of leaf nodes to include in the dendrogram plot, specified as a positive integer value.

• If there are P or fewer data points in the original data set, then each leaf in the dendrogram corresponds to one data point.

• If there are more than P data points, then dendrogram collapses lower branches so that there are P leaf nodes. As a result, some leaves in the plot correspond to more than one data point.

If you do not specify P, then dendrogram uses 30 as the maximum number of leaf nodes. To display the complete tree, set P equal to 0.

Data Types: single | double

ax: Axes for the plot, specified as an Axes or UIAxes object. If you do not specify ax, then dendrogram creates the plot using the current axes. For more information on creating an axes object, see axes and uiaxes.
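One possible use of the ax argument is to place two views of the same tree side by side. The sketch below assumes a tiled layout created with tiledlayout and nexttile; the sample data and linkage method are also assumptions.

```matlab
rng(0,"twister")               % For reproducibility
X = rand(10,3);
tree = linkage(X,"average");   % Linkage method is an assumption

figure
tiledlayout(1,2)
ax1 = nexttile;
dendrogram(ax1,tree)                      % Vertical dendrogram in the first tile
ax2 = nexttile;
dendrogram(ax2,tree,Orientation="left")   % Horizontal dendrogram in the second tile
```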

### Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: dendrogram(tree,Orientation="left",Reorder=myOrder) specifies a horizontal dendrogram with leaves in the order specified by myOrder.
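The two name-value syntaxes can be compared with a short sketch. Here, myOrder is computed with optimalleaforder purely to obtain a valid leaf order; the sample data and linkage method are assumptions.

```matlab
rng(0,"twister")               % For reproducibility
X = rand(10,3);
tree = linkage(X,"average");   % Linkage method is an assumption
myOrder = optimalleaforder(tree,pdist(X));   % One valid leaf order

% Name=Value syntax (R2021a and later)
[~,~,perm1] = dendrogram(tree,Orientation="left",Reorder=myOrder);

% Equivalent comma-separated syntax with quoted names
figure
[~,~,perm2] = dendrogram(tree,"Orientation","left","Reorder",myOrder);
```

Both calls should produce the same plot, so perm1 and perm2 are expected to match.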

Reorder: Order of leaf nodes in the dendrogram plot, specified as a numeric vector giving the order of nodes in the complete tree. The order vector must be a permutation of the vector 1:M, where M is the number of data points in the original data set. Specify the order from left to right for vertical dendrograms, and from bottom to top for horizontal dendrograms.

If M is greater than P, the number of leaf nodes in the dendrogram plot (30 by default), then the permutation vector must not separate the groups of leaves that correspond to collapsed nodes.

Data Types: single | double

CheckCrossing: Indicator for whether to check for crossing branches in the dendrogram plot, specified as a numeric or logical 1 (true) or 0 (false). This option is useful only when you specify a value for Reorder.

When CheckCrossing has the value true, dendrogram issues a warning if the order of the leaf nodes causes crossing branches in the plot. If the dendrogram plot does not show a complete tree (because the number of data points in the original data set is greater than P), dendrogram issues a warning only when the order of the leaf nodes causes branches to cross in the dendrogram as shown in the plot. That is, there is no warning if the order causes crossing branches in the complete tree but not in the dendrogram as shown in the plot.
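As an illustration, an arbitrary leaf order is likely to make branches cross. The sketch below turns the check off so that no crossing warning is issued. The random permutation, sample data, and linkage method are assumptions.

```matlab
rng(0,"twister")               % For reproducibility
X = rand(10,3);
tree = linkage(X,"average");   % Linkage method is an assumption

order = randperm(10);          % Arbitrary permutation of 1:M, likely to cross branches

% Plot with the crossing check turned off
dendrogram(tree,Reorder=order,CheckCrossing=false)
```

Setting CheckCrossing=true instead would let dendrogram warn about crossings in the plotted tree.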

Since R2024b

ClusterIndices: Cluster assignments for leaf nodes, specified as a numeric vector of length M, where M is the number of data points in the original data set (one more than the number of rows in tree). Each value in ClusterIndices must be an integer in the range [1,C], where C is the number of clusters. When you specify ClusterIndices, the function ignores the specified ColorThreshold value, and instead colors groups of nodes according to the cluster assignments. If you specify ShowMarkers=true, the function also colors the leaf node markers according to the cluster assignments.

Example: ClusterIndices=[1 1 2 1 2 1]

Data Types: single | double

ColorThreshold: Threshold for unique colors in the dendrogram plot, specified as either "default" or a scalar value in the range (0,max(tree(:,3))). If ColorThreshold has the value T, then dendrogram assigns a unique color to each group of nodes in the dendrogram whose linkage is less than T.

• If ColorThreshold has the value "default", then the threshold, T, is 70% of the maximum linkage, 0.7*max(tree(:,3)).

• If you do not specify a value for ColorThreshold, or if you specify a threshold outside the range (0,max(tree(:,3))), then dendrogram uses only one color for the dendrogram plot.

• If you specify ClusterIndices, the function ignores the specified ColorThreshold value, and instead, colors groups of nodes according to the cluster assignments.

Example: ColorThreshold=0.5

Data Types: single | double | string | char
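The "default" setting can be reproduced by computing the threshold explicitly. This is a minimal sketch with assumed sample data and linkage method; threshold is an illustrative variable name.

```matlab
rng(0,"twister")               % For reproducibility
X = rand(10,3);
tree = linkage(X,"average");   % Linkage method is an assumption

threshold = 0.7*max(tree(:,3));                  % Value that "default" resolves to
H1 = dendrogram(tree,ColorThreshold=threshold);

figure
H2 = dendrogram(tree,ColorThreshold="default");  % Should color the tree the same way
```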

Since R2024b

ShowCut: Indicator to show the cut line, specified as a numeric or logical 0 (false) or 1 (true). If you specify ClusterIndices and ShowCut=true, dendrogram plots a dashed line showing where the tree is cut to produce the cluster assignments in ClusterIndices. If you specify ColorThreshold and ShowCut=true, and do not specify ClusterIndices, dendrogram plots a dashed line at the ColorThreshold value.

Example: ShowCut=true

Data Types: logical

Since R2024b

ShowMarkers: Indicator to show the leaf node markers, specified as a numeric or logical 0 (false) or 1 (true). If you specify ShowMarkers=true, dendrogram plots an unfilled black circle marker at each leaf node. If you additionally specify ClusterIndices, the circle marker is filled and colored according to the cluster assignment if all rows in that leaf are in the same cluster. Move the cursor over a leaf node marker to display a data tip with the tree row numbers (and cluster assignments, if specified) for that leaf. The data tip displays a maximum of three row numbers and three cluster assignments for a leaf.

Example: ShowMarkers=true

Data Types: logical

Orientation: Orientation of the dendrogram, specified as one of these values:

| Value | Description |
| --- | --- |
| "top" | Top to bottom |
| "bottom" | Bottom to top |
| "left" | Left to right |
| "right" | Right to left |

Specify "top" or "bottom" for a vertical dendrogram, where the leaf nodes are arranged horizontally.

Specify "left" or "right" for a horizontal dendrogram, where the leaf nodes are arranged vertically.

Labels: Label for each data point in the original data set, specified as a character vector, string array, or cell array of character vectors. dendrogram labels any leaves in the dendrogram plot containing a single data point with that data point’s label.
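For instance, labels can be supplied as a string array with one element per data point. The label text below is hypothetical, and the sample data and linkage method are assumptions.

```matlab
rng(0,"twister")               % For reproducibility
X = rand(10,3);
tree = linkage(X,"average");   % Linkage method is an assumption

labels = "Obs " + (1:10)';     % Hypothetical labels, one per data point
dendrogram(tree,Labels=labels)
```

Because this tree has only 10 data points, every leaf contains a single data point and is labeled.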

Parent: Parent container, specified as a Figure or Panel object. For more information on these object properties, see Figure Properties and Panel.

## Output Arguments

H: Lines in the dendrogram plot, returned as a vector of Line objects.

T: Leaf node numbers for each data point in the original data set, returned as a column vector of length M, where M is the number of data points in the original data set.

When there are P or fewer data points in the original data set (P is 30, by default), all data points are displayed in the dendrogram, with each leaf node containing a single data point. In this case, T is the identity map, T = (1:M)'.

T is useful when P is less than the total number of data points, that is, when some leaf nodes in the dendrogram correspond to multiple data points. For example, to find out which data points are contained in leaf node k of the dendrogram plot, use find(T==k).
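The identity-map case can be verified with a small data set. This sketch assumes 10 data points and average linkage.

```matlab
rng(0,"twister")               % For reproducibility
X = rand(10,3);                % M = 10, below the default P of 30
tree = linkage(X,"average");   % Linkage method is an assumption

[~,T] = dendrogram(tree);
isIdentity = isequal(T,(1:10)')   % True when each data point is its own leaf
```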

outperm: Permutation of the node labels of the leaves of the dendrogram as shown in the plot, returned as a row vector. outperm gives the order from left to right for a vertical dendrogram, and from bottom to top for a horizontal dendrogram. If there are P leaves in the dendrogram plot, outperm is a permutation of the vector 1:P.
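A short check of the permutation property, assuming 100 data points, average linkage, and P = 25:

```matlab
rng(0,"twister")               % For reproducibility
X = rand(100,2);
tree = linkage(X,"average");   % Linkage method is an assumption

P = 25;
[~,~,outperm] = dendrogram(tree,P);
isPermutation = isequal(sort(outperm),1:P)   % Each leaf label appears exactly once
firstLeaf = outperm(1);                      % Label of the leftmost leaf in this vertical plot
```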

## Version History

Introduced before R2006a
