indexcrossentropy
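Syntax
loss = indexcrossentropy(Y,targets)
loss = indexcrossentropy(Y,targets,weights)
loss = indexcrossentropy(___,Name=Value)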
Description
The index cross-entropy operation computes the cross-entropy loss between network predictions and targets specified as integer class indices for single-label classification tasks.
Index cross-entropy loss, also known as sparse cross-entropy loss, is a more memory-efficient and computationally efficient alternative to the standard cross-entropy loss algorithm. It does not require binary or one-hot encoded targets. Instead, the function requires targets specified as integer class indices. Index cross-entropy loss is particularly well suited to targets that span many classes, where one-hot encoded data incurs unnecessary memory overhead.
loss = indexcrossentropy(Y,targets) calculates the categorical cross-entropy loss between the formatted predictions Y and the integer class indices targets for single-label classification tasks.
For unformatted input data, use the DataFormat argument.
loss = indexcrossentropy(Y,targets,weights) applies weights to the calculated loss values. Use this syntax to weight the contributions of classes, observations, or individual elements to the loss.
loss = indexcrossentropy(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of the input arguments from previous syntaxes. For example, DataFormat="BC" specifies that the first and second dimensions of the input data correspond to the batch and channel dimensions, respectively.
Examples
Create an array of prediction scores for seven observations over five classes.
numClasses = 5;
numObservations = 7;
Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y)
Y =
5(C) × 7(B) dlarray
0.2205 0.1175 0.1140 0.1153 0.1963 0.2416 0.3104
0.2415 0.1408 0.2571 0.1526 0.1056 0.2381 0.1582
0.1109 0.1842 0.2537 0.2500 0.2381 0.1677 0.2021
0.2434 0.2777 0.1583 0.2210 0.2592 0.2182 0.1605
0.1837 0.2798 0.2169 0.2612 0.2008 0.1344 0.1688
Create an array of targets specified as class indices.
T = randi(numClasses,[1 numObservations])
T = 1×7
5 4 2 5 1 3 2
Compute the index cross-entropy loss between the predictions and the targets.
loss = indexcrossentropy(Y,T)
loss =
1×1 dlarray
1.5620
Create an array of prediction scores for seven observations over five classes.
numClasses = 5;
numObservations = 7;
Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y)
Y =
5(C) × 7(B) dlarray
0.2205 0.1175 0.1140 0.1153 0.1963 0.2416 0.3104
0.2415 0.1408 0.2571 0.1526 0.1056 0.2381 0.1582
0.1109 0.1842 0.2537 0.2500 0.2381 0.1677 0.2021
0.2434 0.2777 0.1583 0.2210 0.2592 0.2182 0.1605
0.1837 0.2798 0.2169 0.2612 0.2008 0.1344 0.1688
Create an array of targets specified as class indices.
T = randi(numClasses,[1 numObservations])
T = 1×7
5 4 2 5 1 3 2
Compute the weighted index cross-entropy loss between the predictions and the targets using a vector of class weights. Specify a weights format of "UC" (unspecified, channel) using the WeightsFormat argument.
weights = rand(1,numClasses)
weights = 1×5
0.7655 0.7952 0.1869 0.4898 0.4456
loss = indexcrossentropy(Y,T,weights,WeightsFormat="UC")
loss =
1×1 dlarray
0.8725
Input Arguments
Y: Predictions, specified as a formatted or unformatted dlarray object,
or a numeric array. When Y is not a formatted
dlarray, you must specify the dimension format using the
DataFormat argument.
If Y is a numeric array, targets must be a
dlarray object.
targets: Target classification labels, specified as a formatted or unformatted
dlarray object, or a numeric array.
Specify the targets as an array containing integer class indices with the same size
and format as Y, excluding the channel dimension. Each element of
targets must be a positive integer less than or equal to the size
of the channel dimension of Y (the number of classes), or equal to
the MaskIndex argument value.
If targets and Y are formatted
dlarray objects, then the format of targets must
be the same as the format of Y, excluding the
"C" (channel) dimension. If targets is a
formatted dlarray object and Y is not a formatted
dlarray object, then the format of targets must
be the same as the DataFormat argument value, excluding the
"C" (channel) dimension.
If targets is an unformatted dlarray or a
numeric array, then the function applies the format of Y or the
value of DataFormat to targets.
Tip
Formatted dlarray objects automatically permute the dimensions of the
underlying data to have the order "S" (spatial), "C"
(channel), "B" (batch), "T" (time), then
"U" (unspecified). To ensure that the dimensions of
Y and targets are consistent, when
Y is a formatted dlarray, also specify
targets as a formatted dlarray.
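The permutation described in this tip can be seen in a short sketch. It assumes Deep Learning Toolbox is available. The variable names and sizes are illustrative only and are not part of the original example.

```matlab
% Illustrative sketch (assumes Deep Learning Toolbox).
% Create data laid out as 7 observations by 5 classes and label it "BC".
X = rand(7,5);
X = dlarray(X,"BC");

% The formatted dlarray stores the channel dimension before the batch
% dimension, so the underlying data is now 5-by-7.
sz = size(X)     % 5 7
fmt = dims(X)    % 'CB'
```

Because the underlying data is reordered, an unformatted targets array laid out for the original dimension order might no longer line up with Y. Formatting targets as well avoids this mismatch.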
weights: Weights, specified as a dlarray object or a numeric array.
To specify class weights, specify a vector with a "C" (channel) dimension
with size matching the "C" (channel) dimension of
Y and a singleton "U" (unspecified)
dimension. Specify the dimensions of the class weights by using a formatted
dlarray object or by using the WeightsFormat
argument.
To specify observation weights, specify a vector with a "B" (batch)
dimension with size matching the "B" (batch) dimension of
Y. Specify the "B" (batch) dimension of the
observation weights by using a formatted dlarray object or by using the
WeightsFormat argument.
To specify weights for each element of the input independently, specify the weights as an
array of the same size as Y. In this case, if
weights is not a formatted dlarray object, then
the function uses the same format as Y. Alternatively, specify the
weights format using the WeightsFormat argument.
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example: indexcrossentropy(Y,T,DataFormat="BC") specifies that the
first and second dimensions of the input data correspond to the batch and channel dimensions,
respectively.
MaskIndex: Masked value index, specified as a numeric scalar.
The function excludes elements of the input data from loss computation when the target elements match the mask index.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
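The exclusion rule can be sketched in plain MATLAB without calling indexcrossentropy. The prediction values, the targets, and the mask value of 0 are assumptions chosen for illustration. The sketch shows only which elements contribute to the loss, before any normalization.

```matlab
% Hypothetical values for illustration: 3 classes, 4 observations.
% Each column of Y sums to 1.
Y = [0.7 0.2 0.1 0.3
     0.2 0.5 0.3 0.3
     0.1 0.3 0.6 0.4];
T = [1 0 3 2];      % the second target equals the mask value
maskIndex = 0;      % assumed mask value for this sketch

% Keep only the observations whose target differs from the mask value.
keep = (T ~= maskIndex);
cols = find(keep);

% Select the predicted score of the target class for each kept observation.
idx = sub2ind(size(Y),T(keep),cols);
lossValues = -log(Y(idx));     % one loss value per included target
maskedSum = sum(lossValues);   % unnormalized sum over included targets
```

Here, three of the four observations contribute to the sum. The second observation is skipped because its target matches the mask value.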
Reduction: Loss value array reduction mode, specified as "sum" or
"none".
If the Reduction argument is "sum", then the function
sums all elements in the array of loss values. In this case, the output
loss is a scalar.
If the Reduction argument is "none", then the
function does not reduce the array of loss values. In this case, the output
loss is an unformatted dlarray object
of the same size as Y.
NormalizationFactor: Divisor for normalizing the reduced loss, specified as one of these options:
- "batch-size": Normalize the loss by dividing it by the number of observations in Y.
- "all-elements": Normalize the loss by dividing it by the number of elements of Y.
- "targets-included": Normalize the loss by dividing the loss values by the product of the number of observations and the number of elements that are not excluded according to the MaskIndex argument.
- "none": Do not normalize the loss.
If Reduction is "none", then this option
has no effect.
DataFormat: Description of the data dimensions, specified as a character vector or string scalar.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
- "S": Spatial
- "C": Channel
- "B": Batch
- "T": Time
- "U": Unspecified
For example, consider an array that represents a batch of sequences where the first,
second, and third dimensions correspond to channels, observations, and time steps,
respectively. You can describe the data as having the format "CBT"
(channel, batch, time).
You can specify multiple dimensions labeled "S" or "U".
You can use the labels "C", "B", and
"T" once each, at most. The software ignores singleton trailing
"U" dimensions after the second dimension.
If the input data is not a formatted dlarray object, then you must
specify the DataFormat option.
For more information, see Deep Learning Data Formats.
Data Types: char | string
WeightsFormat: Description of the dimensions of the weights, specified as a character vector or string scalar.
A weights format is a string of characters, where each character describes the type of the corresponding dimension of the weights. The characters are the same as for a data format: "S" (spatial), "C" (channel), "B" (batch), "T" (time), and "U" (unspecified).
For example, the format "UC" describes weights whose first dimension is unspecified and whose second dimension corresponds to channels.
You can specify multiple dimensions labeled "S" or "U".
You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing
"U" dimensions after the second dimension.
If weights is a numeric vector and
Y has two or more nonsingleton
dimensions, then you must specify the
WeightsFormat option.
If weights is not a vector, or
weights and
Y are both vectors, then the
default value of WeightsFormat is the same
as the format of Y.
For more information, see Deep Learning Data Formats.
Data Types: char | string
Output Arguments
loss: Index cross-entropy loss, returned as an unformatted dlarray
object with the same underlying data type as the input Y.
If the Reduction argument is "sum", then the function
sums all elements in the array of loss values. In this case, the output
loss is a scalar.
If the Reduction argument is "none", then the
function does not reduce the array of loss values. In this case, the output
loss is an unformatted dlarray object
of the same size as Y.
Algorithms
Index cross-entropy loss, also known as sparse cross-entropy loss, takes targets as integer class indices rather than as binary or one-hot encoded vectors.
For each prediction in the input, the standard cross-entropy loss function requires a target specified as a 1-by-K vector that contains exactly one nonzero element. To avoid densely encoding the zero elements, the index cross-entropy function instead requires each target as a scalar that represents the index of the nonzero element.
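The difference between the two target encodings can be illustrated in plain MATLAB. The sizes below (1000 classes, 64 observations) are assumptions chosen for illustration, and the one-hot array is laid out as classes by observations to match the "CB" layout used in the examples.

```matlab
% Illustrative sizes (assumptions): 1000 classes, 64 observations.
numClasses = 1000;
numObservations = 64;

% Index targets: one integer per observation.
T = randi(numClasses,[1 numObservations]);

% Equivalent one-hot targets: a numClasses-by-numObservations array
% with a single 1 in each column.
Tonehot = zeros(numClasses,numObservations);
Tonehot(sub2ind(size(Tonehot),T,1:numObservations)) = 1;

% The class indices can be recovered from the one-hot array.
[~,Trecovered] = max(Tonehot,[],1);

numel(T)         % 64 stored values
numel(Tonehot)   % 64000 stored values
```

Both arrays carry the same label information, but the one-hot array stores numClasses times as many elements.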
For single-label classification, the standard cross-entropy function uses the formula
$$\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K} T_{ni}\,\ln Y_{ni}$$
where T is an array of one-hot encoded targets, Y is an array of predictions, and N and K are the numbers of observations and classes, respectively.
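For single-label classification, the index cross-entropy loss function uses the formula:
$$\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\ln Y_{nT_n}$$
where T is an array of targets, specified as class indices, so that $T_n$ is the class index for observation n.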
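The standard and index forms can be compared numerically in plain MATLAB. This is a sketch, not the function's implementation. It assumes the loss is the sum over observations of the negative natural logarithm of the score predicted for the target class, divided by the number of observations. The scores and targets are copied from the displayed output of the first example, so the scores are rounded to four digits.

```matlab
% Prediction scores from the displayed example output
% (5 classes by 7 observations, rounded to four digits).
Y = [0.2205 0.1175 0.1140 0.1153 0.1963 0.2416 0.3104
     0.2415 0.1408 0.2571 0.1526 0.1056 0.2381 0.1582
     0.1109 0.1842 0.2537 0.2500 0.2381 0.1677 0.2021
     0.2434 0.2777 0.1583 0.2210 0.2592 0.2182 0.1605
     0.1837 0.2798 0.2169 0.2612 0.2008 0.1344 0.1688];
T = [5 4 2 5 1 3 2];   % class index targets from the example
N = size(Y,2);         % number of observations

% Index form: pick the predicted score of each target class directly.
idx = sub2ind(size(Y),T,1:N);
lossIndex = -sum(log(Y(idx)))/N;

% Standard form: the same targets, one-hot encoded.
Tonehot = zeros(size(Y));
Tonehot(idx) = 1;
lossOnehot = -sum(Tonehot.*log(Y),"all")/N;
```

Both computations give the same value, approximately 1.562, which agrees with the loss displayed in the first example up to the rounding of the scores.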
This table shows the index cross-entropy loss formulas for different tasks.
| Task | Description | Loss |
|---|---|---|
| Single-label classification | Index cross-entropy loss for mutually exclusive classes. This is useful when observations must have only a single label. | $\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\ln Y_{nT_n}$, where N is the number of observations. |
| Single-label classification with weighted classes | Index cross-entropy loss with class weights. This is useful for datasets with imbalanced classes. | $\text{loss} = -\frac{1}{N}\sum_{n=1}^{N} w_{T_n}\ln Y_{nT_n}$, where N is the number of observations, and $w_i$ denotes the weight for class i. |
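| Sequence-to-sequence classification | Index cross-entropy loss with masked time steps. This is useful for ignoring loss values that correspond to padded data. | $\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{s=1}^{S}\mathbb{1}[T_{ns}\neq m]\,\ln Y_{nsT_{ns}}$, where $\mathbb{1}[\cdot]$ is 1 when its condition holds and 0 otherwise, N, S, and K are the numbers of observations, time steps, and classes, respectively (each unmasked target $T_{ns}$ is an index from 1 to K), and m denotes the mask index. |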
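The weighted-classes row of the table can be checked against the weighted example in plain MATLAB. This is a sketch under the assumption that each observation's loss value is scaled by the weight of its target class and that the sum is divided by the number of observations. The scores, targets, and weights are copied from the displayed example output, rounded to four digits.

```matlab
% Prediction scores, targets, and class weights from the displayed
% output of the weighted example (rounded to four digits).
Y = [0.2205 0.1175 0.1140 0.1153 0.1963 0.2416 0.3104
     0.2415 0.1408 0.2571 0.1526 0.1056 0.2381 0.1582
     0.1109 0.1842 0.2537 0.2500 0.2381 0.1677 0.2021
     0.2434 0.2777 0.1583 0.2210 0.2592 0.2182 0.1605
     0.1837 0.2798 0.2169 0.2612 0.2008 0.1344 0.1688];
T = [5 4 2 5 1 3 2];
weights = [0.7655 0.7952 0.1869 0.4898 0.4456];
N = size(Y,2);   % number of observations

% Scale each observation's loss value by the weight of its target class.
idx = sub2ind(size(Y),T,1:N);
lossWeighted = -sum(weights(T).*log(Y(idx)))/N;
```

The result is approximately 0.8725, which agrees with the loss displayed in the weighted example up to the rounding of the inputs.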
Most deep learning networks and functions operate on different dimensions of the input data in different ways.
For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.
To provide input data with labeled dimensions or input data with additional layout information, you can use data formats.
A data format is a string of characters, where each character describes the type of the corresponding data dimension: "S" (spatial), "C" (channel), "B" (batch), "T" (time), or "U" (unspecified).
For example, the format "CBT" (channel, batch, time) describes an array that represents a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively.
To create formatted input data, create a dlarray object and specify the format using the second argument.
To provide additional layout information with unformatted data, specify the formats using the DataFormat and WeightsFormat arguments.
For more information, see Deep Learning Data Formats.
Extended Capabilities
The indexcrossentropy function supports GPU array input with these usage notes and limitations:
When at least one of these input arguments is a gpuArray or a dlarray with underlying data of type gpuArray, this function runs on the GPU:
- Y
- targets
- weights
- MaskIndex
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2024b
See Also
dlarray | dlgradient | dlfeval | crossentropy | softmax | sigmoid | huber | l1loss | l2loss