crosstab

Cross-tabulation

Syntax

tbl = crosstab(x1,x2)

tbl = crosstab(x1,...,xn)

tbl = crosstab(datatbl)

tbl = crosstab(___,Name=Value)

[tbl,chi2,p] = crosstab(___)

[tbl,chi2,p,labels] = crosstab(___)

Description

tbl = crosstab(x1,x2) returns a cross-tabulation, tbl, of two vectors of the same length, x1 and x2.

tbl = crosstab(x1,...,xn) returns a multi-dimensional cross-tabulation, tbl, of data for multiple input vectors, x1, x2, ..., xn.

tbl = crosstab(datatbl) returns a cross-tabulation of the variables in the table datatbl.

tbl = crosstab(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify whether to include missing groups, and whether to return the output as a matrix, table, or stacked table.

[tbl,chi2,p] = crosstab(___) also returns the chi-square statistic, chi2, and the p-value, p, for a chi-square test of independence. The null hypothesis of the test is that the proportion in any entry of tbl is the product of the proportions in each dimension.

[tbl,chi2,p,labels] = crosstab(___) also returns a cell array, labels, which contains one column of labels for each input argument, x1 ... xn.

Examples

Cross-Tabulate Two Data Vectors

Create two sample data vectors, containing three and four distinct values, respectively.

x = [1 1 2 3 1];
y = [1 2 5 3 1];

Cross-tabulate x and y.

table = crosstab(x,y)

table = 3×4

     2     1     0     0
     0     0     0     1
     0     0     1     0

The rows in table correspond to the three distinct values in x, and the columns correspond to the four distinct values in y.

Cross-Tabulate Independent Data Vectors

Generate two independent vectors, x1 and x2, each containing 50 discrete uniform random numbers in the range 1:3.

rng default;  % for reproducibility
x1 = unidrnd(3,50,1);
x2 = unidrnd(3,50,1);

Cross-tabulate x1 and x2.

[table,chi2,p] = crosstab(x1,x2)

table = 3×3

     1     6     7
     5     5     2
    11     7     6

chi2 = 
7.5449

p = 
0.1097

The returned p-value of 0.1097 indicates that, at the 5% significance level, the chi-square test fails to reject the null hypothesis that table is independent in each dimension.

Cross-Tabulate Grouped Data

Load the sample data, which contains measurements of large model cars during the years 1970-1982.

load carbig

Cross-tabulate the data of four-cylinder cars (cyl4) based on model year (when) and country of origin (org).

[table,chi2,p,labels] = crosstab(cyl4,when,org);

Use labels to determine the index location in table for the number of four-cylinder cars made in the USA during the late period of the data.

labels

labels=3×3 cell array
    {'Other'   }    {'Early'}    {'USA'   }
    {'Four'    }    {'Mid'  }    {'Europe'}
    {0×0 double}    {'Late' }    {'Japan' }

The first column of labels corresponds to the data in cyl4, and indicates that row 2 of table contains data on cars with four cylinders. The second column of labels corresponds to the data in when, and indicates that column 3 of table contains data on cars made during the late period. The third column of labels corresponds to the data in org, and indicates that location 1 of the third dimension of table contains data on cars made in the USA.

Therefore, table(2,3,1) contains the number of four-cylinder cars made in the USA during the late period.

table(2,3,1)

ans = 
38

The data contains 38 four-cylinder cars made in the USA during the late period.
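Instead of reading the index positions from labels by eye, you can look them up programmatically. The following is a minimal sketch, assuming the carbig sample data is available. The names counts, iCyl, iWhen, iOrg, and nFourLateUSA are chosen here for illustration only (counts is used in place of table to avoid shadowing the MATLAB table function).

```matlab
load carbig
[counts,~,~,labels] = crosstab(cyl4,when,org);

% Find the position of each label within its column of labels.
% strcmp returns false for the empty, non-text padding cell in column 1.
iCyl  = find(strcmp(labels(:,1),'Four'));
iWhen = find(strcmp(labels(:,2),'Late'));
iOrg  = find(strcmp(labels(:,3),'USA'));

% Index the three-dimensional cross-tabulation with the looked-up positions.
nFourLateUSA = counts(iCyl,iWhen,iOrg)
```

With the labels displayed in this example, the lookup resolves to counts(2,3,1), so nFourLateUSA should match the count of 38 reported above.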

Count `NaN` Values

Load the grouping variables and display the grouping variables table.

load grouping_variables.mat
datatbl

datatbl=4×3 table
     x      y      z 
    ___    ___    ___

    NaN    NaN      5
      5      1      5
      1      2      1
      5      3    NaN

datatbl is a table that contains values for three grouping variables: x, y, and z. All three variables contain a NaN value.

Generate a cross-tabulation table using the grouping variables in datatbl. Include counts for the NaN entries.

tbl = crosstab(datatbl,IncludeMissingGroups=true)

tbl=36×4 table
     x      y     z    Counts
    ___    ___    _    ______

      1      1    1      0   
      5      1    1      0   
    NaN      1    1      0   
      1      2    1      1   
      5      2    1      0   
    NaN      2    1      0   
      1      3    1      0   
      5      3    1      0   
    NaN      3    1      0   
      1    NaN    1      0   
      5    NaN    1      0   
    NaN    NaN    1      0   
      1      1    5      0   
      5      1    5      1   
    NaN      1    5      0   
      1      2    5      0   
      ⋮

The cross-tabulation table tbl includes counts for every unique combination of the grouping variable values, including NaN values. Each row of the table corresponds to a unique combination, and the last column contains the count for each combination.

Generate and Visualize Contingency Table

Create a contingency table from data, and visualize the table in a heatmap chart.

Load the hospital data.

load hospital

The hospital dataset array contains data on 100 hospital patients, including last name, gender, age, weight, smoking status, and systolic and diastolic blood pressure measurements.

Convert the dataset array to a MATLAB® table.

Tbl = dataset2table(hospital);

Determine whether smoking status is independent of gender by creating a 2-by-2 contingency table of smokers and nonsmokers, grouped by gender.

[conttbl,chi2,p,labels] = crosstab(Tbl.Sex,Tbl.Smoker)

conttbl = 2×2

    40    13
    26    21

chi2 = 
4.5083

p = 
0.0337

labels = 2×2 cell
    {'Female'}    {'0'}
    {'Male'  }    {'1'}

The rows of the resulting contingency table conttbl correspond to patient gender, with row 1 containing data for females and row 2 containing data for males. The columns correspond to patient smoking status, with column 1 containing data for nonsmokers and column 2 containing data for smokers. The returned result chi2 = 4.5083 is the value of the chi-squared test statistic for a Pearson's chi-squared test of independence. The p-value for the test, p = 0.0337, suggests, at a 5% level of significance, rejection of the null hypothesis that gender and smoking status are independent.

Visualize the contingency table in a heatmap. Plot smoking status on the x-axis and gender on the y-axis.

heatmap(Tbl,'Smoker','Sex')

[Figure: heatmap chart titled "Count of Sex vs. Smoker"]

Input Arguments

`x1` — Input vector
vector of grouping variables

Input vector, specified as a vector of grouping variables. All input vectors, including x1, x2, ..., xn, must be the same length.

`x2` — Input vector
vector of grouping variables

Input vector, specified as a vector of grouping variables. All input vectors, including x1, x2, ..., xn, must be the same length.

`x1,...,xn` — Input vectors
vectors of grouping variables

Input vectors, specified as vectors of grouping variables. If you use this syntax to specify more than two input vectors, then crosstab generates a multi-dimensional cross-tabulation table. All input vectors, including x1, x2, ..., xn, must be the same length.

`datatbl` — Table of input vectors
table of grouping variables

Table of input vectors, specified as a table of grouping variables. If datatbl contains more than two variables, crosstab returns the cross-tabulation in the stacked-table format by default. For more information, see OutputFormat.

Example: array2table(randi(5,3,4))

Data Types: table

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: crosstab(datatbl,IncludeMissingGroups=true,OutputFormat="matrix") cross-tabulates the data in datatbl, including missing data, and returns the cross-tabulation table as an array.

`IncludeMissingGroups` — Flag to include missing data
`false` or `0` (default) | `true` or `1`

Flag to include missing data, specified as a numeric or logical 0 (false) or 1 (true). When IncludeMissingGroups is true, crosstab includes counts for NaN, <missing>, <undefined>, and {''} values in the cross-tabulation table.

Example: IncludeMissingGroups=true

Data Types: single | double | logical
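As a rough illustration of the flag, the following sketch cross-tabulates two short vectors that each contain one NaN, first with the default setting and then with IncludeMissingGroups=true. The data and variable names are made up for this sketch. It compares only the total counts, because the position of the missing group within the output is not assumed here.

```matlab
x = [1; 1; NaN; 2];
y = [1; 2; 2; NaN];

% Default: observations with a missing value in x or y are not counted.
tblDefault = crosstab(x,y);

% Missing values form their own group and are counted.
tblMissing = crosstab(x,y,IncludeMissingGroups=true);

nDefault = sum(tblDefault,"all")   % observations with no missing value
nMissing = sum(tblMissing,"all")   % all observations
```

If the flag behaves as described above, nDefault is 2 (the two complete observations) and nMissing is 4 (every observation).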

`OutputFormat` — Cross-tabulation table format
string | character vector

Cross-tabulation table format, specified as a string or character vector containing one of these values:

"matrix" — crosstab returns the cross-tabulation table as an array with n dimensions, where n is the number of grouping variables. Each dimension corresponds to a grouping variable, and the size of the dimension is the number of distinct values (groups) in the variable. The elements of the cross-tabulation table are counts for the combination of grouping variable values corresponding to the element's index. This value is the default when you specify the input data using the input arguments x1 and x2, or x1,...,xn.
"table" — crosstab returns the cross-tabulation table as a table with labels for the rows and columns. The rows of the table correspond to the values of the first grouping variable, and the columns correspond to the values of the second grouping variable. The elements of the cross-tabulation table are counts for the corresponding combination of grouping variable values. You can specify "table" only when you have at most two grouping variables. This value is the default when you specify two grouping variables using the datatbl input argument.
"stacked-table" — crosstab returns the cross-tabulation table as a table with columns corresponding to the grouping variables, and a column for the counts. Each row in tbl contains a unique combination of grouping variable values and the count for the unique combination. This value is the default when you specify three or more grouping variables using the datatbl argument.

Example: OutputFormat="matrix"

Data Types: string | char
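To compare the three formats side by side, the following sketch requests each one for the same two-variable table. The table contents and the names a, b, m, t, and s are illustrative assumptions. No output is shown, because the exact display depends on the release.

```matlab
a = [1; 1; 2; 2; 2];
b = [1; 2; 2; 2; 1];
datatbl = table(a,b);   % two grouping variables named a and b

m = crosstab(datatbl,OutputFormat="matrix");          % numeric array of counts
t = crosstab(datatbl,OutputFormat="table");           % labeled table (the default here)
s = crosstab(datatbl,OutputFormat="stacked-table");   % one row per combination

% Inspect the type and size of each result.
class(m), size(m)
class(t), size(t)
class(s), size(s)
```

The matrix form is convenient for numeric work such as computing marginal totals, whereas the stacked form extends to three or more grouping variables.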

Output Arguments

`tbl` — Cross-tabulation table
matrix of integer values | table

Cross-tabulation table, returned as a matrix of integer values or a table.

You can specify the format of the cross-tabulation table using the OutputFormat name-value argument.

`chi2` — Chi-square statistic
positive scalar value

Chi-square statistic, returned as a positive scalar value. The null hypothesis is that the proportion in any entry of tbl is the product of the proportions in each dimension.
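For a two-way table, the statistic can be checked by hand with the standard Pearson formula, in which the expected count under the null hypothesis is the product of the row and column totals divided by the sample size. The following sketch applies that formula to the 2-by-2 counts from the hospital example. It illustrates the textbook calculation and is not a description of how crosstab is implemented internally. The variable names are chosen for this sketch.

```matlab
observed = [40 13; 26 21];            % counts from the hospital example

n         = sum(observed,"all");      % total number of observations
rowTotals = sum(observed,2);          % 2-by-1 column of row sums
colTotals = sum(observed,1);          % 1-by-2 row of column sums
expected  = rowTotals*colTotals/n;    % outer product gives expected counts

% Pearson chi-square statistic and its degrees of freedom.
chi2 = sum((observed - expected).^2 ./ expected,"all");
df   = (size(observed,1) - 1)*(size(observed,2) - 1);

% Upper-tail probability of the chi-square distribution.
p = chi2cdf(chi2,df,'upper');
```

For these counts, chi2 is approximately 4.5083 and p is approximately 0.0337, which agrees with the values returned by crosstab in that example.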

`p` — p-Value
scalar value in the range `[0,1]`

p-value for the chi-square test statistic, returned as a scalar value in the range [0,1]. crosstab tests that tbl is independent in each dimension.

`labels` — Data labels
cell array

Data labels, returned as a cell array. The entries in the first column are labels for the rows of tbl, the entries in the second column are labels for the columns, and so on, for a multi-dimensional tbl.

Algorithms

crosstab uses grp2idx to assign a positive integer to each distinct value. For two input vectors, tbl(i,j) is a count of indices where grp2idx(x1) is i and grp2idx(x2) is j. The numerical order of grp2idx(x1) and grp2idx(x2) determines the order of the rows and columns of tbl, respectively.
For more than two input vectors, tbl(i,j,k,...) is a count of indices where grp2idx(x1) is i, grp2idx(x2) is j, grp2idx(x3) is k, and so on.
crosstab computes the p-value of the chi-square test statistic using a formula that is asymptotically valid for a large sample size. The approximation is less accurate for small samples or samples with uneven marginal distributions. If your sample includes only two variables and each has two levels, you can use fishertest instead. This function performs Fisher’s exact test, which does not depend on large-sample distribution assumptions.
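The grouping-and-counting step described at the start of this section can be sketched with grp2idx and accumarray. This is an illustration of the idea under the assumption that the data contains no missing values (grp2idx returns NaN for missing values, which accumarray does not accept as a subscript). It is not the actual crosstab source code, and the variable names are chosen for this sketch.

```matlab
x = [1 1 2 3 1];
y = [1 2 5 3 1];

% Map each distinct value to a positive integer group index.
[gx,namesX] = grp2idx(x(:));
[gy,namesY] = grp2idx(y(:));

% Count how often each (i,j) pair of group indices occurs.
counts = accumarray([gx gy],1,[numel(namesX) numel(namesY)]);

% Compare with the result of crosstab.
sameAsCrosstab = isequal(counts,crosstab(x,y))
```

For these vectors, counts is the same 3-by-4 matrix shown in the first example on this page.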

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

This function supports tall arrays for out-of-memory data with the limitation:

The fourth output, labels, is returned as a cell array containing M unevaluated tall cell arrays, where M is the number of input grouping variables. Each unevaluated tall cell array, labels{j}, contains the labels for one grouping variable.

For more information, see Tall Arrays for Out-of-Memory Data.

Thread-Based Environment
Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.

The crosstab function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.

Version History

Introduced before R2006a

R2025a: Include missing counts and specify the cross-tabulation table format

Use the IncludeMissingGroups name-value argument to count missing values. Use the OutputFormat name-value argument to specify the format of the returned cross-tabulation table.

R2025a: Specify grouping variables in table format

Use the datatbl input argument to specify table grouping variables.
