indexcrossentropy

Index cross-entropy loss for classification tasks

Since R2024b

Syntax

loss = indexcrossentropy(Y,targets)

loss = indexcrossentropy(Y,targets,weights)

loss = indexcrossentropy(___,Name=Value)

Description

The index cross-entropy operation computes the cross-entropy loss between network predictions and targets specified as integer class indices for single-label classification tasks.

Index cross-entropy loss, also known as sparse cross-entropy loss, is a more memory-efficient and computationally efficient alternative to the standard cross-entropy loss. It does not require binary or one-hot encoded targets. Instead, the function requires targets specified as integer class indices. Index cross-entropy loss is particularly well suited to targets that span many classes, where one-hot encoded data adds unnecessary memory overhead.
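To make the memory argument concrete, this sketch compares the storage of index targets with the equivalent one-hot array using only base MATLAB. The sizes (1000 classes, 256 observations) and the variable names are illustrative assumptions, not values from this page.

```matlab
% Illustrative sizes (assumptions, not taken from the documentation).
numClasses = 1000;
numObservations = 256;

% Index targets: one class index per observation.
T = randi(numClasses,[1 numObservations]);

% Equivalent one-hot targets: one column per observation with a single 1.
Tonehot = zeros(numClasses,numObservations);
Tonehot(sub2ind(size(Tonehot),T,1:numObservations)) = 1;

% Compare the storage in bytes of the two encodings (both double precision).
infoIndex = whos('T');
infoOneHot = whos('Tonehot');
ratio = infoOneHot.bytes/infoIndex.bytes;
fprintf("One-hot targets use %d times the memory of index targets.\n",ratio)
```

With both arrays stored in the same data type, the ratio equals the number of classes, which is why the saving grows with the size of the label set.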

loss = indexcrossentropy(Y,targets) calculates the categorical cross-entropy loss between the formatted predictions Y and the integer class indices targets for single-label classification tasks.

For unformatted input data, use the DataFormat argument.

loss = indexcrossentropy(Y,targets,weights) applies weights to the calculated loss values. Use this syntax to weight the contributions of classes, observations, regions, or individual elements of the input to the calculated loss values.

loss = indexcrossentropy(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of the input arguments from previous syntaxes. For example, DataFormat="BC" specifies that the first and second dimensions of the input data correspond to the batch and channel dimensions, respectively.
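As a minimal sketch of the Name=Value syntax, the following code passes unformatted predictions with observations along the first dimension and classes along the second, and describes that layout with DataFormat="BC". The sizes and the choice of a column vector of targets are assumptions made for illustration, following the rule that targets have the same layout as Y without the channel dimension.

```matlab
numClasses = 5;
numObservations = 7;

% Unformatted predictions: rows are observations, columns are classes.
Y = dlarray(rand(numObservations,numClasses));
Y = softmax(Y,DataFormat="BC");

% One class index per observation, matching the batch dimension of Y.
T = randi(numClasses,[numObservations 1]);

% Describe the layout of the unformatted data with DataFormat.
loss = indexcrossentropy(Y,T,DataFormat="BC")
```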

Examples

Index Cross-Entropy Loss for Single-Label Classification

Create an array of prediction scores for seven observations over five classes.

numClasses = 5;
numObservations = 7;

Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y)

Y = 
  5(C) × 7(B) dlarray

    0.2205    0.1175    0.1140    0.1153    0.1963    0.2416    0.3104
    0.2415    0.1408    0.2571    0.1526    0.1056    0.2381    0.1582
    0.1109    0.1842    0.2537    0.2500    0.2381    0.1677    0.2021
    0.2434    0.2777    0.1583    0.2210    0.2592    0.2182    0.1605
    0.1837    0.2798    0.2169    0.2612    0.2008    0.1344    0.1688

Create an array of targets specified as class indices.

T = randi(numClasses,[1 numObservations])

T = 1×7

     5     4     2     5     1     3     2

Compute the index cross-entropy loss between the predictions and the targets.

loss = indexcrossentropy(Y,T)

loss = 
  1×1 dlarray

    1.5620
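As a cross-check, the displayed result can be reproduced by hand in base MATLAB from the rounded values shown above. This sketch assumes the default options, so the loss is the sum of the negative log probabilities of the target classes divided by the number of observations. The variable names idx, pTarget, and manualLoss are illustrative.

```matlab
% Rounded prediction scores copied from the displayed output.
Y = [0.2205 0.1175 0.1140 0.1153 0.1963 0.2416 0.3104
     0.2415 0.1408 0.2571 0.1526 0.1056 0.2381 0.1582
     0.1109 0.1842 0.2537 0.2500 0.2381 0.1677 0.2021
     0.2434 0.2777 0.1583 0.2210 0.2592 0.2182 0.1605
     0.1837 0.2798 0.2169 0.2612 0.2008 0.1344 0.1688];
T = [5 4 2 5 1 3 2];

% Pick the probability of the target class for each observation, Y(T(n),n).
idx = sub2ind(size(Y),T,1:numel(T));
pTarget = Y(idx);

% Mean negative log probability over the seven observations.
manualLoss = -mean(log(pTarget))
```

The result is approximately 1.562, which agrees with the displayed loss up to the rounding of the inputs.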

Weighted Index Cross-Entropy Loss

Create an array of prediction scores for seven observations over five classes.

numClasses = 5;
numObservations = 7;

Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y)

Y = 
  5(C) × 7(B) dlarray

    0.2205    0.1175    0.1140    0.1153    0.1963    0.2416    0.3104
    0.2415    0.1408    0.2571    0.1526    0.1056    0.2381    0.1582
    0.1109    0.1842    0.2537    0.2500    0.2381    0.1677    0.2021
    0.2434    0.2777    0.1583    0.2210    0.2592    0.2182    0.1605
    0.1837    0.2798    0.2169    0.2612    0.2008    0.1344    0.1688

Create an array of targets specified as class indices.

T = randi(numClasses,[1 numObservations])

T = 1×7

     5     4     2     5     1     3     2

Compute the weighted index cross-entropy loss between the predictions and the targets using a vector of class weights. Specify a weights format of "UC" (unspecified, channel) using the WeightsFormat argument.

weights = rand(1,numClasses)

weights = 1×5

    0.7655    0.7952    0.1869    0.4898    0.4456

loss = indexcrossentropy(Y,T,weights,WeightsFormat="UC")

loss = 
  1×1 dlarray

    0.8725
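The weighted result can also be checked by hand in base MATLAB. This sketch uses the rounded values displayed above and assumes the default options, so each negative log probability is scaled by the weight of its target class and the sum is divided by the number of observations. The names pTarget and manualLoss are illustrative.

```matlab
% Probability that the displayed Y assigns to the target class of each
% observation, that is, Y(T(n),n).
pTarget = [0.1837 0.2777 0.2571 0.2612 0.1963 0.1677 0.1582];
T = [5 4 2 5 1 3 2];

% Class weights copied from the displayed output.
weights = [0.7655 0.7952 0.1869 0.4898 0.4456];

% Weight each observation by the weight of its target class, then average.
manualLoss = -sum(weights(T).*log(pTarget))/numel(T)
```

The result is approximately 0.8725, matching the displayed loss up to the rounding of the inputs.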

Input Arguments

`Y` — Predictions
`dlarray` object | numeric array

Predictions, specified as a formatted or unformatted dlarray object, or a numeric array. When Y is not a formatted dlarray, you must specify the dimension format using the DataFormat argument.

If Y is a numeric array, targets must be a dlarray object.

`targets` — Target classification labels
`dlarray` object | numeric array

Target classification labels, specified as a formatted or unformatted dlarray object, or a numeric array.

Specify the targets as an array containing integer class indices with the same size and format as Y, excluding the channel dimension. Each element of targets must be a positive integer less than or equal to the size of the channel dimension of Y (the number of classes), or equal to the MaskIndex argument value.

If targets and Y are formatted dlarray objects, then the format of targets must be the same as the format of Y, excluding the "C" (channel) dimension. If targets is a formatted dlarray object and Y is not a formatted dlarray object, then the format of targets must be the same as the DataFormat argument value, excluding the "C" (channel) dimension.

If targets is an unformatted dlarray or a numeric array, then the function applies the format of Y or the value of DataFormat to targets.

Tip

Formatted dlarray objects automatically permute the dimensions of the underlying data to have the order "S" (spatial), "C" (channel), "B" (batch), "T" (time), then "U" (unspecified). To ensure that the dimensions of Y and targets are consistent, when Y is a formatted dlarray, also specify targets as a formatted dlarray.
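The permutation described in this tip can be observed directly. In this small sketch, the sizes are arbitrary and the variable names are illustrative. The data is created with observations first and classes second, and the formatted dlarray reorders the dimensions so that the channel dimension comes first.

```matlab
% Create data with 7 observations (rows) and 5 classes (columns).
X = rand(7,5);

% Label the dimensions as batch-by-channel.
dlX = dlarray(X,"BC");

% The formatted dlarray stores the data in channel-by-batch order.
fmt = dims(dlX)
sz = size(dlX)
```

Because of this reordering, a numeric targets array laid out to match the original data might no longer line up with the predictions, which is the situation that formatted targets avoid.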

`weights` — Weights
`dlarray` object | numeric array

Weights, specified as a dlarray object or a numeric array.

To specify class weights, specify a vector with a "C" (channel) dimension with size matching the "C" (channel) dimension of Y and a singleton "U" (unspecified) dimension. Specify the dimensions of the class weights by using a formatted dlarray object or by using the WeightsFormat argument.

To specify observation weights, specify a vector with a "B" (batch) dimension with size matching the "B" (batch) dimension of Y. Specify the "B" (batch) dimension of the observation weights by using a formatted dlarray object or by using the WeightsFormat argument.

To specify weights for each element of the input independently, specify the weights as an array of the same size as Y. In this case, if weights is not a formatted dlarray object, then the function uses the same format as Y. Alternatively, specify the weights format using the WeightsFormat argument.
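As a hedged sketch of observation weights, the following code passes one weight per observation as a row vector. The format "UB" (unspecified, batch) is an assumption modeled on the "UC" format that the class-weights example on this page uses. The sizes and the name obsWeights are illustrative.

```matlab
numClasses = 5;
numObservations = 7;

% Predictions and index targets, as in the examples on this page.
Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
T = randi(numClasses,[1 numObservations]);

% One weight per observation, stored as a 1-by-numObservations row vector.
obsWeights = rand(1,numObservations);

% Assumption: "UB" labels the singleton first dimension as unspecified and
% the second dimension as batch, mirroring the "UC" class-weights example.
loss = indexcrossentropy(Y,T,obsWeights,WeightsFormat="UB")
```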

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: indexcrossentropy(Y,T,DataFormat="BC") specifies that the first and second dimensions of the input data correspond to the batch and channel dimensions, respectively.

`MaskIndex` — Masked value index
`0` (default) | numeric scalar

Masked value index, specified as a numeric scalar.

The function excludes elements of the input data from loss computation when the target elements match the mask index.
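The following sketch illustrates masking for padded sequence data. The sizes, the padded positions, and the manual comparison are assumptions made for illustration: the comparison sums the negative log probabilities of the unmasked targets and divides by the number of observations, which is how the default options are described on this page.

```matlab
numClasses = 4;
numObservations = 3;
numTimeSteps = 5;

% Predictions for a batch of sequences in channel-by-batch-by-time layout.
Y = softmax(dlarray(rand(numClasses,numObservations,numTimeSteps),"CBT"));

% Index targets with one entry per observation and time step.
T = randi(numClasses,[numObservations numTimeSteps]);

% Mark the last two time steps of the second sequence as padding.
T(2,4:5) = 0;

% Exclude targets equal to the mask index from the loss.
loss = indexcrossentropy(Y,dlarray(T,"BT"),MaskIndex=0)

% Manual comparison: skip masked targets, then divide by the batch size.
P = extractdata(Y);
manualLoss = 0;
for n = 1:numObservations
    for t = 1:numTimeSteps
        if T(n,t) ~= 0
            manualLoss = manualLoss - log(P(T(n,t),n,t));
        end
    end
end
manualLoss = manualLoss/numObservations
```

If the function follows the masked formula in the Algorithms section with the default normalization, the two values should agree up to floating-point error.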

`Reduction` — Loss value array reduction mode
`"sum"` (default) | `"none"`

Loss value array reduction mode, specified as "sum" or "none".

If the Reduction argument is "sum", then the function sums all elements in the array of loss values. In this case, the output loss is a scalar.

If the Reduction argument is "none", then the function does not reduce the array of loss values. In this case, the output loss is an unformatted dlarray object of the same size as Y.

`NormalizationFactor` — Divisor for normalizing reduced loss
`"batch-size"` (default) | `"all-elements"` | `"targets-included"` | `"none"`

Divisor for normalizing the reduced loss, specified as one of these options:

"batch-size" — Normalize the loss by dividing it by the number of observations in Y.
"all-elements" — Normalize the loss by dividing it by the number of elements of Y.
"targets-included" — Normalize the loss by dividing the loss values by the product of the number of observations and the number of elements that are not excluded according to the MaskIndex argument.
"none" — Do not normalize the loss.

If Reduction is "none", then this option has no effect.
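To see how these options interact, this sketch computes the loss with the default normalization, without normalization, and without reduction. The sizes and variable names are illustrative. The expected relationship between the first two values follows from the "batch-size" description above and is stated here as an expectation, not as a documented guarantee.

```matlab
numClasses = 5;
numObservations = 7;

Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
T = randi(numClasses,[1 numObservations]);

% Default: sum the loss values and divide by the number of observations.
lossDefault = indexcrossentropy(Y,T);

% Sum the loss values without dividing.
lossSum = indexcrossentropy(Y,T,NormalizationFactor="none");

% Keep the loss values unreduced for inspection.
lossValues = indexcrossentropy(Y,T,Reduction="none");

% Under "batch-size" normalization, this ratio should be close to
% numObservations.
ratio = extractdata(lossSum)/extractdata(lossDefault)
```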

`DataFormat` — Description of data dimensions
character vector | string scalar

Description of the data dimensions, specified as a character vector or string scalar.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

"S" — Spatial
"C" — Channel
"B" — Batch
"T" — Time
"U" — Unspecified

For example, consider an array that represents a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can describe the data as having the format "CBT" (channel, batch, time).

You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing "U" dimensions after the second dimension.

If the input data is not a formatted dlarray object, then you must specify the DataFormat option.

For more information, see Deep Learning Data Formats.

Data Types: char | string

`WeightsFormat` — Description of dimensions of weights
character vector | string scalar

Description of the dimensions of the weights, specified as a character vector or string scalar.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

"S" — Spatial
"C" — Channel
"B" — Batch
"T" — Time
"U" — Unspecified

If weights is a numeric vector and Y has two or more nonsingleton dimensions, then you must specify the WeightsFormat option.

If weights is not a vector, or weights and Y are both vectors, then the default value of WeightsFormat is the same as the format of Y.

For more information, see Deep Learning Data Formats.

Data Types: char | string

Output Arguments

`loss` — Index cross-entropy loss
unformatted `dlarray` object

Index cross-entropy loss, returned as an unformatted dlarray object with the same underlying data type as the input Y.

If the Reduction argument is "sum", then the function sums all elements in the array of loss values. In this case, the output loss is a scalar.

If the Reduction argument is "none", then the function does not reduce the array of loss values. In this case, the output loss is an unformatted dlarray object of the same size as Y.

Algorithms

Index Cross-Entropy Loss

For each prediction in the input, the standard cross-entropy loss function requires targets specified as 1-by-K vectors, each containing only one nonzero element. To avoid the dense encoding of the zero and nonzero elements, the index cross-entropy function requires targets specified as scalars that represent the indices of the nonzero elements.

For single-label classification, the standard cross-entropy function uses the formula

$loss = - \frac{1}{N} \sum_{n = 1}^{N} \sum_{i = 1}^{K} T_{n, i} \ln Y_{n, i},$

where T is an array of one-hot encoded targets, Y is an array of predictions, and N and K are the numbers of observations and classes, respectively.

For single-label classification, the index cross-entropy loss function uses the formula:

$loss = - \frac{1}{N} \sum_{n = 1}^{N} \ln Y_{n, T_{n}},$

where T is an array of targets, specified as class indices.
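The equivalence of the two formulas can be checked numerically in base MATLAB. In this sketch, the sizes and variable names are illustrative, and the softmax is written out by hand so that no toolbox functions are needed. The one-hot formula multiplies every log probability by 0 or 1, whereas the index formula reads only the entries selected by the class indices.

```matlab
K = 5;   % number of classes
N = 7;   % number of observations

% Column-wise softmax of random scores, so that each column sums to 1.
scores = rand(K,N);
Y = exp(scores)./sum(exp(scores),1);

% Targets as class indices and as the equivalent one-hot array.
T = randi(K,[1 N]);
idx = sub2ind([K N],T,1:N);
Tonehot = zeros(K,N);
Tonehot(idx) = 1;

% Standard cross-entropy: sum over all classes and observations.
lossOneHot = -sum(Tonehot.*log(Y),"all")/N;

% Index cross-entropy: read only the target-class probabilities.
lossIndex = -sum(log(Y(idx)))/N;

difference = abs(lossOneHot - lossIndex)
```

The difference is zero up to floating-point error, because the one-hot sum only adds terms that are multiplied by zero.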

This table shows the index cross-entropy loss formulas for different tasks.

| Task | Description | Loss |
| --- | --- | --- |
| Single-label classification | Index cross-entropy loss for mutually exclusive classes. This is useful when observations must have only a single label. | $loss = - \frac{1}{N} \sum_{n = 1}^{N} \ln Y_{n, T_{n}},$ where N is the number of observations. |
| Single-label classification with weighted classes | Index cross-entropy loss with class weights. This is useful for datasets with imbalanced classes. | $loss = - \frac{1}{N} \sum_{n = 1}^{N} w_{T_{n}} \ln Y_{n, T_{n}},$ where N is the number of observations, and $w_i$ denotes the weight for class i. |
| Sequence-to-sequence classification | Index cross-entropy loss with masked time steps. This is useful for ignoring loss values that correspond to padded data. | $loss = - \frac{1}{N} \sum_{n = 1}^{N} \sum_{t = 1}^{S} [T_{n, t} \neq m] \ln Y_{n, t, T_{n, t}},$ where $[T_{n, t} \neq m] = \begin{cases} 1 & \text{if } T_{n, t} \neq m \\ 0 & \text{if } T_{n, t} = m \end{cases}$, N and S are the numbers of observations and time steps, respectively, and m denotes the mask index. |

Deep Learning Array Formats

Most deep learning networks and functions operate on different dimensions of the input data in different ways.

For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.

To provide input data with labeled dimensions or input data with additional layout information, you can use data formats.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

"S" — Spatial
"C" — Channel
"B" — Batch
"T" — Time
"U" — Unspecified

To create formatted input data, create a dlarray object and specify the format using the second argument.

To provide additional layout information with unformatted data, specify the formats using the DataFormat and WeightsFormat arguments.

For more information, see Deep Learning Data Formats.
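The two ways of supplying layout information can be compared side by side. This sketch computes the loss once with a formatted dlarray and once with an unformatted dlarray plus the DataFormat argument. The sizes and variable names are illustrative, and the expectation that both results match is an assumption based on the descriptions above.

```matlab
numClasses = 5;
numObservations = 7;

scores = rand(numClasses,numObservations);
T = randi(numClasses,[1 numObservations]);

% Option 1: attach the format to the data.
Yformatted = softmax(dlarray(scores,"CB"));
lossFormatted = indexcrossentropy(Yformatted,T);

% Option 2: keep the data unformatted and describe the layout per call.
Yunformatted = softmax(dlarray(scores),DataFormat="CB");
lossUnformatted = indexcrossentropy(Yunformatted,T,DataFormat="CB");

% Both calls describe the same layout, so the difference should be
% negligible.
difference = abs(extractdata(lossFormatted) - extractdata(lossUnformatted))
```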

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

The indexcrossentropy function supports GPU array input with these usage notes and limitations:

When at least one of these input arguments is a gpuArray or a dlarray with underlying data of type gpuArray, this function runs on the GPU:
- Y
- targets
- weights
- MaskIndex

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
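As a minimal sketch of GPU usage, the following code moves the predictions to the GPU when one is available and otherwise runs on the CPU. The sizes are illustrative, and the guard with canUseGPU is a design choice for portability, not a requirement stated on this page.

```matlab
numClasses = 5;
numObservations = 7;

Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
T = randi(numClasses,[1 numObservations]);

% Move the predictions to the GPU only if a supported device is available.
if canUseGPU
    Y = gpuArray(Y);
end

% According to the note above, the function runs on the GPU when at least
% one of the listed inputs holds gpuArray data.
loss = indexcrossentropy(Y,T)
```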

Version History

Introduced in R2024b
