crossentropy

Cross-entropy loss for classification tasks

Description

The cross-entropy operation computes the cross-entropy loss between network predictions and target values for single-label and multi-label classification tasks.

The crossentropy function computes the cross-entropy loss between predictions and targets represented as dlarray data. Using dlarray objects makes working with high-dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the 'S', 'T', 'C', and 'B' labels, respectively. For unspecified and other dimensions, use the 'U' label. For dlarray object functions that operate over particular dimensions, you can specify the dimension labels by formatting the dlarray object directly, or by using the 'DataFormat' option.

Note

To calculate the cross-entropy loss within a layerGraph object or Layer array for use with the trainNetwork function, use classificationLayer.

loss = crossentropy(dlY,targets) returns the categorical cross-entropy loss between the formatted dlarray object dlY containing the predictions and the target values targets for single-label classification tasks. The output loss is an unformatted dlarray scalar.

For unformatted input data, use the 'DataFormat' option.

loss = crossentropy(dlY,targets,weights) applies weights to the calculated loss values. Use this syntax to weight the contributions of classes, observations, regions, or individual elements of the input to the calculated loss values.

loss = crossentropy(___,'DataFormat',FMT) also specifies the dimension format FMT when dlY is not a formatted dlarray.

loss = crossentropy(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, 'TargetCategories','independent' computes the cross-entropy loss for a multi-label classification task.

Examples

Single-Label Classification

Create an array of prediction scores for 12 observations over 10 classes.

numClasses = 10;
numObservations = 12;

Y = rand(numClasses,numObservations);
dlY = dlarray(Y,'CB');
dlY = softmax(dlY);

View the size and format of the prediction scores.

size(dlY)
ans = 1×2

    10    12

dims(dlY)
ans = 
'CB'

Create an array of targets encoded as one-hot vectors.

labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,'ClassNames',1:numClasses);

View the size of the targets.

size(targets)
ans = 1×2

    10    12

Compute the cross-entropy loss between the predictions and the targets.

loss = crossentropy(dlY,targets)
loss = 
  1x1 dlarray

    2.3343
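
As a rough sanity check on the value above, the single-label formula from the Algorithms section can be evaluated by hand on plain numeric arrays. The sketch below is not the toolbox implementation. It assumes uniform predictions (every class scored 1/numClasses), for which the loss equals ln(numClasses), about 2.3026 for 10 classes. Softmax-normalized random scores are close to uniform, which is why the example loss lands near that value. The names Yuniform, T, and manualLoss are illustrative.

```matlab
numClasses = 10;
numObservations = 12;

% Assumption: uniform predictions, every class gets probability 1/numClasses.
Yuniform = ones(numClasses,numObservations) / numClasses;

% One-hot targets built by hand (illustrative labels cycling through the classes).
labels = mod(0:numObservations-1,numClasses) + 1;
T = zeros(numClasses,numObservations);
T(sub2ind(size(T),labels,1:numObservations)) = 1;

% Single-label cross-entropy, averaged over the observations.
manualLoss = -sum(sum(T .* log(Yuniform))) / numObservations
```

A trained network should drive the loss well below this ln(numClasses) baseline.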

Multi-Label Classification

Create an array of prediction scores for 12 observations over 10 classes.

numClasses = 10;
numObservations = 12;
Y = rand(numClasses,numObservations);
dlY = dlarray(Y,'CB');

View the size and format of the prediction scores.

size(dlY)
ans = 1×2

    10    12

dims(dlY)
ans = 
'CB'

Create a random array of targets encoded as a numeric array of zeros and ones. Each observation can have multiple classes.

targets = rand(numClasses,numObservations) > 0.75;
targets = single(targets);

View the size of the targets.

size(targets)
ans = 1×2

    10    12

Compute the cross-entropy loss between the predictions and the targets. To specify cross-entropy loss for multi-label classification, set the 'TargetCategories' option to 'independent'.

loss = crossentropy(dlY,targets,'TargetCategories','independent')
loss = 
  1x1 single dlarray

    9.8853

Weighted Cross-Entropy Loss

Create an array of prediction scores for 12 observations over 10 classes.

numClasses = 10;
numObservations = 12;

Y = rand(numClasses,numObservations);
dlY = dlarray(Y,'CB');
dlY = softmax(dlY);

View the size and format of the prediction scores.

size(dlY)
ans = 1×2

    10    12

dims(dlY)
ans = 
'CB'

Create an array of targets encoded as one-hot vectors.

labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,'ClassNames',1:numClasses);

View the size of the targets.

size(targets)
ans = 1×2

    10    12

Compute the weighted cross-entropy loss between the predictions and the targets using a vector of class weights. Specify a weights format of 'UC' (unspecified, channel) using the 'WeightsFormat' option.

weights = rand(1,numClasses);
loss = crossentropy(dlY,targets,weights,'WeightsFormat','UC')
loss = 
  1x1 dlarray

    1.1261

Input Arguments

Predictions (dlY), specified as a formatted dlarray, an unformatted dlarray, or a numeric array. When dlY is not a formatted dlarray, you must specify the dimension format using the 'DataFormat' option.

If dlY is a numeric array, targets must be a dlarray.

Target classification labels (targets), specified as a formatted or unformatted dlarray or a numeric array.

Specify the targets as an array containing one-hot encoded labels with the same size and format as dlY. For example, if dlY is a numObservations-by-numClasses array, then targets(n,i) = 1 if observation n belongs to class i, and targets(n,i) = 0 otherwise.

If targets is a formatted dlarray, its dimension format must be the same as the format of dlY, or the same as 'DataFormat' if dlY is unformatted.

If targets is an unformatted dlarray or a numeric array, then the format of dlY or the value of 'DataFormat' is implicitly applied to targets.

Tip

Formatted dlarray objects automatically sort their dimensions. To ensure that the dimensions of dlY and targets are consistent, when dlY is a formatted dlarray, also specify targets as a formatted dlarray.
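
The one-hot layout described above can be sketched with plain numeric arrays. This minimal example assumes the numObservations-by-numClasses layout used in the description and illustrative names (labels, T). With the toolbox, onehotencode builds the same kind of array, as the examples on this page show.

```matlab
% Integer class labels for 4 observations over 3 classes (illustrative data).
labels = [2 1 3 2];
numClasses = 3;
numObservations = numel(labels);

% T(n,i) = 1 if observation n belongs to class i, and 0 otherwise.
T = zeros(numObservations,numClasses);
T(sub2ind(size(T),1:numObservations,labels)) = 1;

disp(T)
```

Each row of T contains exactly one 1, which is the property the single-label loss relies on.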

Weights (weights), specified as a dlarray or a numeric array.

To specify class weights, specify a vector with a 'C' (channel) dimension with size matching the 'C' (channel) dimension of dlY. Specify the 'C' (channel) dimension of the class weights by using a formatted dlarray object or by using the 'WeightsFormat' option.

To specify observation weights, specify a vector with a 'B' (batch) dimension with size matching the 'B' (batch) dimension of dlY. Specify the 'B' (batch) dimension of the observation weights by using a formatted dlarray object or by using the 'WeightsFormat' option.

To specify weights for each element of the input independently, specify the weights as an array of the same size as dlY. In this case, if weights is not a formatted dlarray object, then the function uses the same format as dlY. Alternatively, specify the weights format using the 'WeightsFormat' option.
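
How class weights and observation weights scale the loss can be sketched on plain arrays. The example assumes a 'CB' layout (classes along rows, observations along columns) and uses implicit expansion to broadcast a column of class weights or a row of observation weights. It mirrors the documented behavior but is not the toolbox code, and all variable names are illustrative.

```matlab
% Predictions (columns sum to 1) and one-hot targets:
% 3 classes (rows) by 2 observations (columns).
Y = [0.7 0.2; 0.2 0.5; 0.1 0.3];
T = [1 0; 0 1; 0 0];

% Element-wise single-label loss values, same size as Y.
lossValues = -T .* log(Y);

% Class weights vary along 'C' (rows); observation weights vary along 'B' (columns).
classWeights = [1; 2; 4];        % numClasses-by-1
observationWeights = [0.5 1];    % 1-by-numObservations

% Weighted sums, averaged over the observations.
numObservations = size(Y,2);
classWeightedLoss = sum(sum(classWeights .* lossValues)) / numObservations;
observationWeightedLoss = sum(sum(observationWeights .* lossValues)) / numObservations;
```

The orientation of the weight vector is what distinguishes the two cases here, which is the role the 'WeightsFormat' option plays for unformatted weights.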

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'TargetCategories','independent','DataFormat','CB' evaluates the cross-entropy loss for multi-label classification tasks and specifies the dimension order of the input data as 'CB'.

Type of classification task, specified as the comma-separated pair consisting of 'TargetCategories' and one of the following:

  • 'exclusive' — Single-label classification. Each observation in the predictions dlY is exclusively assigned to one category. The function computes the loss between the target value for the single category specified by targets and the corresponding prediction in dlY, averaged over the number of observations.

  • 'independent' — Multi-label classification. Each observation in the predictions dlY can be assigned to one or more independent categories. The function computes the sum of the loss between each category specified by targets and the predictions in dlY for those categories, averaged over the number of observations. Cross-entropy loss for this type of classification task is also known as binary cross-entropy loss.
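
The difference between the two settings can be sketched by evaluating both formulas by hand on the same small arrays. The example assumes a 'CB' layout and predictions strictly between 0 and 1. It follows the formulas in the Algorithms section rather than the toolbox source, and the variable names are illustrative.

```matlab
Y = [0.8 0.3; 0.1 0.6; 0.1 0.1];   % predictions, 3 classes by 2 observations
T = [1 0; 0 1; 0 0];               % targets
numObservations = size(Y,2);

% 'exclusive': only the prediction for the target class contributes.
exclusiveLoss = -sum(sum(T .* log(Y))) / numObservations;

% 'independent': every class contributes a binary cross-entropy term,
% so predictions for classes with a target of 0 are penalized as well.
independentLoss = -sum(sum(T .* log(Y) + (1 - T) .* log(1 - Y))) / numObservations;
```

For the same inputs the independent loss is the larger of the two, because it adds the nonnegative terms for the classes whose target is 0.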

Mask indicating which elements to include for loss computation, specified as the comma-separated pair consisting of 'Mask' and a dlarray object, a logical array, or a numeric array with the same size as dlY.

The function includes and excludes elements of the input data for loss computation when the corresponding value in the mask is 1 and 0, respectively.

The default value is a logical array of ones with the same size as dlY.

Tip

Formatted dlarray objects automatically sort their dimensions. To ensure that the dimensions of dlY and the mask are consistent, when dlY is a formatted dlarray, also specify the mask as a formatted dlarray.
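
The effect of a mask can be sketched as an element-wise multiplication applied to the loss values before any reduction. The example below works on plain arrays in an assumed 'CB' layout and is only an illustration of the include (1) and exclude (0) semantics, not the toolbox implementation. The variable names are illustrative.

```matlab
Y = [0.8 0.3 0.5; 0.2 0.7 0.5];   % 2 classes by 3 observations
T = [1 0 1; 0 1 0];
lossValues = -T .* log(Y);

% Exclude the third observation (for example, padding) from the loss.
mask = [1 1 0; 1 1 0];
maskedValues = mask .* lossValues;

% Default mask: a logical array of ones, so nothing is excluded.
defaultMask = true(size(Y));
unmaskedValues = defaultMask .* lossValues;
```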

Mode for reducing array of loss values, specified as the comma-separated pair consisting of 'Reduction' and one of the following:

  • 'sum' – Sum all of the elements in the array of loss values. In this case, the output loss is scalar.

  • 'none' – Do not reduce the array of loss values. In this case, the output loss is an unformatted dlarray object with the same size as dlY.
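
The two reduction modes can be sketched on plain arrays as follows. This is a hand-written illustration using the single-label loss values, with normalization left out on purpose. It is not the toolbox code, and the names lossNone and lossSum are illustrative.

```matlab
Y = [0.9 0.4; 0.1 0.6];
T = [1 0; 0 1];

% 'none': keep one loss value per element (same size as Y).
lossNone = -T .* log(Y);

% 'sum': add all loss values into a scalar.
lossSum = sum(lossNone(:));
```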

Divisor for normalizing reduced loss when 'Reduction' is 'sum', specified as the comma-separated pair consisting of 'NormalizationFactor' and one of the following:

  • 'batch-size' – Normalize the loss by dividing by the number of observations in dlY.

  • 'all-elements' – Normalize the loss by dividing by the number of elements of dlY.

  • 'mask-included' – Normalize the loss by dividing the loss values by the number of included elements specified by the mask for each observation independently. To use this option, you must specify a mask using the 'Mask' option.

  • 'none' – Do not normalize the loss.
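
The divisors can be compared by hand on a small masked example. The sketch below assumes a 'CB' layout and covers three of the four options. It leaves out 'mask-included', because the description above does not pin down its per-observation behavior precisely enough to reproduce here. Variable names are illustrative.

```matlab
Y = [0.9 0.4 0.5; 0.1 0.6 0.5];   % 2 classes by 3 observations ('CB')
T = [1 0 1; 0 1 0];
mask = [1 1 0; 1 1 0];            % third observation excluded

% Masked loss values summed into a scalar ('Reduction' of 'sum').
summedLoss = sum(sum(mask .* (-T .* log(Y))));

lossBatchSize   = summedLoss / size(Y,2);   % 'batch-size': number of observations
lossAllElements = summedLoss / numel(Y);    % 'all-elements': number of elements
lossUnscaled    = summedLoss;               % 'none': no normalization
```

Note that in this sketch 'batch-size' still divides by all three observations, even though the mask excludes one of them.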

Dimension order of unformatted input data, specified as the comma-separated pair consisting of 'DataFormat' and a character vector or string scalar FMT that provides a label for each dimension of the data.

When specifying the format of a dlarray object, each character provides a label for each dimension of the data and must be one of the following:

  • 'S' — Spatial

  • 'C' — Channel

  • 'B' — Batch (for example, samples and observations)

  • 'T' — Time (for example, time steps of sequences)

  • 'U' — Unspecified

You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', and 'T' at most once.

You must specify 'DataFormat' when the input data is not a formatted dlarray.

Example: 'DataFormat','SSCB'

Data Types: char | string
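
The labeling rules above can be expressed as a small check. The helper isValidFormat below is hypothetical (it is not a toolbox function) and encodes only the two rules stated here: every character is one of S, C, B, T, or U, and the labels C, B, and T each appear at most once.

```matlab
% Hypothetical helper, not part of the toolbox.
isValidFormat = @(fmt) ~isempty(fmt) ...
    && all(ismember(fmt,'SCBTU')) ...
    && sum(fmt == 'C') <= 1 ...
    && sum(fmt == 'B') <= 1 ...
    && sum(fmt == 'T') <= 1;

isValidFormat('SSCB')   % valid: multiple 'S' dimensions are allowed
isValidFormat('CBT')    % valid
isValidFormat('CCB')    % invalid: 'C' appears twice
isValidFormat('XCB')    % invalid: 'X' is not a recognized label
```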

Dimension order of the class weights, specified as the comma-separated pair consisting of 'WeightsFormat' and a character vector or string scalar that provides a label for each dimension of the weights.

When specifying the format of a dlarray object, each character provides a label for each dimension of the data and must be one of the following:

  • 'S' — Spatial

  • 'C' — Channel

  • 'B' — Batch (for example, samples and observations)

  • 'T' — Time (for example, time steps of sequences)

  • 'U' — Unspecified

You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', and 'T' at most once.

You must specify 'WeightsFormat' when weights is a numeric vector and dlY has two or more nonsingleton dimensions.

If weights is not a vector, or both weights and dlY are vectors, then the default value of 'WeightsFormat' is the same as the format of dlY.

Example: 'WeightsFormat','CB'

Data Types: char | string

Output Arguments

Cross-entropy loss, returned as an unformatted dlarray with the same underlying data type as the input dlY.

The size of loss depends on the 'Reduction' option.

Algorithms

Cross-Entropy Loss

For each element $Y_j$ of the input, the crossentropy function computes the corresponding element-wise cross-entropy loss value using the formula

$$\text{loss}_j = -\left(T_j \ln Y_j + (1 - T_j)\ln(1 - Y_j)\right),$$

where $T_j$ is the target value corresponding to $Y_j$.

The function then reduces the element-wise loss values to a scalar loss using the formula

$$\text{loss} = \frac{1}{N}\sum_{j} m_j w_j \,\text{loss}_j,$$

where $N$ is the normalization factor, $m_j$ is the mask value for element $j$, and $w_j$ is the weight value for element $j$.

If not reducing the loss, the function applies the mask and the weights to the loss values directly:

$$\text{loss}_j^{*} = m_j w_j \,\text{loss}_j$$
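
As a worked illustration of these formulas, consider two elements with values chosen here purely for illustration: $Y_1 = 0.8$ with $T_1 = 1$, and $Y_2 = 0.4$ with $T_2 = 0$. The calculation assumes unit weights and mask values, a normalization factor of $N = 2$, and an element-wise loss that carries the conventional leading minus sign so that the loss values are nonnegative.

```latex
\begin{aligned}
\text{loss}_1 &= -\left(1 \cdot \ln 0.8 + 0 \cdot \ln 0.2\right) = -\ln 0.8 \approx 0.2231,\\
\text{loss}_2 &= -\left(0 \cdot \ln 0.4 + 1 \cdot \ln 0.6\right) = -\ln 0.6 \approx 0.5108,\\
\text{loss}   &= \tfrac{1}{2}\left(0.2231 + 0.5108\right) \approx 0.3670.
\end{aligned}
```

Setting the mask value of the second element to 0 instead would remove its contribution, leaving $\text{loss} = \tfrac{1}{2}(0.2231) \approx 0.1116$ for the same $N$.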

This table shows the loss formulations for different tasks.

Task: Single-label classification
Description: Cross-entropy loss for mutually exclusive classes. This is useful when observations must have a single label only.
Loss:

$$\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K} T_{ni}\ln Y_{ni},$$

where $N$ and $K$ are the numbers of observations and classes, respectively.

Task: Multi-label classification
Description: Cross-entropy loss for independent classes. This is useful when observations can have multiple labels.
Loss:

$$\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}\left(T_{ni}\ln Y_{ni} + (1 - T_{ni})\ln(1 - Y_{ni})\right),$$

where $N$ and $K$ are the numbers of observations and classes, respectively.

Task: Single-label classification with weighted classes
Description: Cross-entropy loss with class weights. This is useful for datasets with imbalanced classes.
Loss:

$$\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K} w_i T_{ni}\ln Y_{ni},$$

where $N$ and $K$ are the numbers of observations and classes, respectively, and $w_i$ denotes the weight for class $i$.

Task: Sequence-to-sequence classification
Description: Cross-entropy loss with masked time steps. This is useful for ignoring loss values that correspond to padded data.
Loss:

$$\text{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{t=1}^{S} m_{nt}\sum_{i=1}^{K} T_{nti}\ln Y_{nti},$$

where $N$, $S$, and $K$ are the numbers of observations, time steps, and classes, respectively, and $m_{nt}$ denotes the mask value for time step $t$ of observation $n$.
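
The sequence-to-sequence formulation can be sketched on plain arrays. The example assumes a 'CBT' layout (classes by observations by time steps) and a mask with one value per observation and time step. It is a hand-written illustration of the masked formula, not the toolbox implementation, and the data and names are illustrative.

```matlab
numClasses = 2;
numObservations = 2;
numTimeSteps = 3;

% Predictions in 'CBT' layout; every class is scored 0.5 for simplicity.
Y = 0.5 * ones(numClasses,numObservations,numTimeSteps);

% One-hot targets: class 1 at every time step.
T = zeros(size(Y));
T(1,:,:) = 1;

% Mask (observations by time steps): the second sequence has only 2 real steps.
mask = [1 1 1; 1 1 0];

% Sum over classes, then apply the mask per observation and time step.
perStepLoss = squeeze(-sum(T .* log(Y),1));   % numObservations-by-numTimeSteps
loss = sum(sum(mask .* perStepLoss)) / numObservations;
```

Five unmasked time steps each contribute ln 2, so the result is 5 ln(2) / 2, about 1.7329. The padded step of the second sequence contributes nothing.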

Introduced in R2019b