kfoldLoss

Loss for cross-validated partitioned quantile regression model

Since R2025a

Syntax

L = kfoldLoss(CVMdl)

L = kfoldLoss(CVMdl,Name=Value)

Description

L = kfoldLoss(CVMdl) returns the quantile loss obtained by the cross-validated quantile regression model CVMdl. For every fold, kfoldLoss computes the loss for validation-fold observations using a model trained on training-fold observations. CVMdl.X and CVMdl.Y contain both the training-fold and the validation-fold observations.

L = kfoldLoss(CVMdl,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the quantiles for which to return loss values.

Examples

Compare Holdout and k-Fold Cross-Validation Quantile Losses

Compute the quantile loss for a quantile neural network regression model, first partitioned using holdout validation and then partitioned using 5-fold cross-validation. Compare the two losses.

Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Acceleration, Cylinders, Displacement, and so on, as well as the response variable MPG. View the first eight observations.

load carbig
cars = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Origin,Weight,MPG);
head(cars)

    Acceleration    Cylinders    Displacement    Horsepower    Model_Year    Origin     Weight    MPG
    ____________    _________    ____________    __________    __________    _______    ______    ___

          12            8            307            130            70        USA         3504     18 
        11.5            8            350            165            70        USA         3693     15 
          11            8            318            150            70        USA         3436     18 
          12            8            304            150            70        USA         3433     16 
        10.5            8            302            140            70        USA         3449     17 
          10            8            429            198            70        USA         4341     15 
           9            8            454            220            70        USA         4354     14 
         8.5            8            440            215            70        USA         4312     14

Remove rows of cars where the table has missing values.

cars = rmmissing(cars);

Categorize the cars based on whether they were made in the USA.

cars.Origin = categorical(cellstr(cars.Origin));
cars.Origin = mergecats(cars.Origin,["France","Japan",...
    "Germany","Sweden","Italy","England"],"NotUSA");

Partition the data using cvpartition. First, create a partition for holdout validation, using approximately 80% of the observations for the training data and 20% for the test data. Then, create a partition for 5-fold cross-validation.

rng(0,"twister") % For reproducibility
holdoutPartition = cvpartition(height(cars),Holdout=0.20);
kfoldPartition = cvpartition(height(cars),KFold=5);

Train a quantile neural network regression model using the cars data. Specify MPG as the response variable, and standardize the numeric predictors. Use the default 0.5 quantile (median).

Mdl = fitrqnet(cars,"MPG",Standardize=true);

Create the partitioned quantile regression models using crossval.

holdoutMdl = crossval(Mdl,CVPartition=holdoutPartition)

holdoutMdl = 
  RegressionPartitionedQuantileModel
      CrossValidatedModel: 'QuantileNeuralNetwork'
           PredictorNames: {'Acceleration'  'Cylinders'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
    CategoricalPredictors: 6
             ResponseName: 'MPG'
          NumObservations: 392
                    KFold: 1
                Partition: [1×1 cvpartition]
        ResponseTransform: 'none'
                Quantiles: 0.5000


  Properties, Methods

kfoldMdl = crossval(Mdl,CVPartition=kfoldPartition)

kfoldMdl = 
  RegressionPartitionedQuantileModel
      CrossValidatedModel: 'QuantileNeuralNetwork'
           PredictorNames: {'Acceleration'  'Cylinders'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
    CategoricalPredictors: 6
             ResponseName: 'MPG'
          NumObservations: 392
                    KFold: 5
                Partition: [1×1 cvpartition]
        ResponseTransform: 'none'
                Quantiles: 0.5000


  Properties, Methods

Compute the quantile loss for holdoutMdl and kfoldMdl by using the kfoldLoss object function.

holdoutL = kfoldLoss(holdoutMdl)

holdoutL = 
0.9488

kfoldL = kfoldLoss(kfoldMdl)

kfoldL = 
0.9628

holdoutL is the quantile loss computed using one holdout set, while kfoldL is an average quantile loss computed using five validation folds. Because kfoldL averages over folds that together cover every observation, it depends less on a single random split and tends to be a better indicator of the model's performance on unseen data.

Specify Prediction for Observations with Missing Values in Loss Computation

Before computing the loss for a cross-validated quantile regression model, specify the prediction for observations with missing predictor values.

Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a matrix X containing the predictor variables Acceleration, Displacement, Horsepower, and Weight. Store the response variable MPG in the variable Y.

load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
Y = MPG;

Train a cross-validated quantile linear regression model. Specify to use the 0.25, 0.50, and 0.75 quantiles (that is, the lower quartile, median, and upper quartile). To improve the model fit, change the beta tolerance to 1e-6 instead of the default value 1e-4, and use a ridge (L2) regularization term of 1. Specify 10-fold cross-validation by setting CrossVal="on".

rng(0,"twister") % For reproducibility
CVMdl = fitrqlinear(X,Y,Quantiles=[0.25,0.50,0.75], ...
    BetaTolerance=1e-6,Lambda=1,CrossVal="on")

CVMdl = 
  RegressionPartitionedQuantileModel
    CrossValidatedModel: 'QuantileLinear'
         PredictorNames: {'x1'  'x2'  'x3'  'x4'}
           ResponseName: 'Y'
        NumObservations: 398
                  KFold: 10
              Partition: [1×1 cvpartition]
      ResponseTransform: 'none'
              Quantiles: [0.2500 0.5000 0.7500]


  Properties, Methods

CVMdl is a RegressionPartitionedQuantileModel.

Compute the quantile loss for each fold and quantile. Use a NaN prediction for test set observations with missing predictor values.

L = kfoldLoss(CVMdl,Mode="individual",PredictionForMissingValue=NaN)

L = 10×3

    1.5388    1.6703    1.3547
       NaN       NaN       NaN
    1.9140    2.1864    2.0922
       NaN       NaN       NaN
    1.4339    2.2040    1.7293
    1.5513    1.9968    1.8037
       NaN       NaN       NaN
    1.3979    2.0011    2.0695
       NaN       NaN       NaN
    1.8021    2.2161    1.5746

The rows of L correspond to folds, and the columns correspond to quantiles. The NaN values in L indicate that the data set includes observations with missing predictor values. For example, at least one of the observations in the second test set has a missing predictor value. You can find the predictor values for the observations in the second test set by using the following code.

test2Indices = test(CVMdl.Partition,2);
test2Observations = CVMdl.X(test2Indices,:)

Instead of using a NaN prediction for test set observations with missing predictor values, remove the observations from the computation.

newL = kfoldLoss(CVMdl,Mode="individual", ...
    PredictionForMissingValue="omitted")

newL = 10×3

    1.5388    1.6703    1.3547
    1.6612    2.1528    1.4820
    1.9140    2.1864    2.0922
    2.1431    2.6693    2.0767
    1.4339    2.2040    1.7293
    1.5513    1.9968    1.8037
    1.2971    1.8850    1.8236
    1.3979    2.0011    2.0695
    1.6716    2.0485    1.5921
    1.8021    2.2161    1.5746

Input Arguments

`CVMdl` — Cross-validated quantile regression model
`RegressionPartitionedQuantileModel` object

Cross-validated quantile regression model, specified as a RegressionPartitionedQuantileModel object.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: kfoldLoss(CVMdl,Quantiles=[0.25 0.5 0.75]) returns the quantile loss for the 0.25, 0.5, and 0.75 quantiles.

`Quantiles` — Quantiles for which to compute loss
`"all"` (default) | vector of values in `CVMdl.Quantiles`

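Quantiles for which to compute the loss, specified as "all" or a vector of values in CVMdl.Quantiles. If you specify "all", the software returns loss values for every quantile in CVMdl.Quantiles. Otherwise, the software returns loss values only for the quantiles specified in Quantiles.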

Example: Quantiles=[0.4 0.6]

Data Types: single | double | char | string
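
As a minimal sketch of the Quantiles argument, the following code trains a cross-validated quantile linear model on the carbig data (the same setup as in the examples on this page, with default fitting options) and requests the loss for only two of the three trained quantiles. The variable names outerL and allL are illustrative.

```matlab
% Train a cross-validated quantile linear model for three quantiles.
load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
Y = MPG;

rng(0,"twister") % For reproducibility
CVMdl = fitrqlinear(X,Y,Quantiles=[0.25,0.50,0.75],CrossVal="on");

% Request the loss for the lower and upper quartiles only.
% Both values must appear in CVMdl.Quantiles.
outerL = kfoldLoss(CVMdl,Quantiles=[0.25,0.75]);

% Quantiles="all" (the default) returns one loss value per trained quantile.
allL = kfoldLoss(CVMdl,Quantiles="all");
```

With the default Mode="average", outerL has one column per requested quantile, and allL has one column per quantile in CVMdl.Quantiles.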

`Folds` — Fold indices to use
`1:CVMdl.KFold` (default) | positive integer vector

Fold indices to use, specified as a positive integer vector. The elements of Folds must be within the range from 1 to CVMdl.KFold. The software uses only the folds specified in Folds.

Example: Folds=[1 4 10]

Data Types: single | double
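
The following sketch shows one way to restrict the computation to a subset of folds, assuming the same carbig setup as in the examples on this page. The variable name foldL is illustrative.

```matlab
% Train a 10-fold cross-validated quantile linear model.
load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
Y = MPG;

rng(0,"twister") % For reproducibility
CVMdl = fitrqlinear(X,Y,Quantiles=[0.25,0.50,0.75],CrossVal="on");

% Use only the first three folds, and return one row per selected fold.
foldL = kfoldLoss(CVMdl,Folds=[1 2 3],Mode="individual");
```

Each row of foldL corresponds to one of the selected folds, and each column corresponds to a quantile in CVMdl.Quantiles.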

`LossFun` — Loss function
`"quantile"` (default) | function handle

Loss function, specified as "quantile" or a function handle.

"quantile" — Quantile loss.
Function handle — To specify a custom loss function, use a function handle. The function must have this form:
```
lossval = lossfun(Y,YFit,W,q)
```
- The output argument lossval is a numeric scalar.
- You specify the function name (lossfun).
- Y is a length-n numeric vector of observed responses.
- YFit is a length-n numeric vector of corresponding predicted responses.
- W is an n-by-1 numeric vector of observation weights.
- q is a numeric scalar in the range [0,1] corresponding to a quantile.

Example: LossFun=@lossfun

Data Types: char | string | function_handle
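
As an illustration of the custom loss signature, the following sketch defines a weighted pinball loss as a function handle and passes it to kfoldLoss. The names pinballTerm and weightedPinball are hypothetical, and the formula is the standard weighted pinball loss; this page does not confirm that it matches the built-in "quantile" loss exactly. The inputs are reshaped to columns so that the handle returns a scalar regardless of vector orientation.

```matlab
% Pinball term for residuals r = Y - YFit at quantile q.
pinballTerm = @(r,q) max(q*r,(q-1)*r);

% Custom loss with the required form lossval = lossfun(Y,YFit,W,q).
weightedPinball = @(Y,YFit,W,q) ...
    sum(W(:).*pinballTerm(Y(:)-YFit(:),q))/sum(W(:));

% Quick check on toy data before using the handle with a model.
toyL = weightedPinball([1;2;3],[2;2;2],[1;1;1],0.25);

% Train a cross-validated model and evaluate it with the custom loss.
load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
Y = MPG;

rng(0,"twister") % For reproducibility
CVMdl = fitrqlinear(X,Y,Quantiles=[0.25,0.50,0.75],CrossVal="on");
customL = kfoldLoss(CVMdl,LossFun=weightedPinball);
```

Checking the handle on a few hand-computed values first, as with toyL, makes it easier to catch sign or weighting mistakes before they affect cross-validation results.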

`Mode` — Aggregation level for output
`"average"` (default) | `"individual"`

Aggregation level for the output, specified as "average" or "individual".

Value	Description
`"average"`	The output is a 1-by-q vector of loss values, averaged over the folds specified by the `Folds` name-value argument. q is the number of quantiles specified by the `Quantiles` name-value argument.
`"individual"`	The output is a k-by-q matrix of loss values, where k is the number of folds specified by the `Folds` name-value argument and q is the number of quantiles specified by the `Quantiles` name-value argument.

Example: Mode="individual"

Data Types: char | string
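
The two aggregation levels can be compared side by side. The following sketch, which assumes the same carbig setup as in the examples on this page, returns both forms and then uses the per-fold values to gauge how much the loss varies from fold to fold. The variable names are illustrative.

```matlab
% Train a 10-fold cross-validated quantile linear model.
load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
Y = MPG;

rng(0,"twister") % For reproducibility
CVMdl = fitrqlinear(X,Y,Quantiles=[0.25,0.50,0.75],CrossVal="on");

% One loss value per quantile, averaged over the folds.
avgL = kfoldLoss(CVMdl);

% One row per fold, one column per quantile.
indL = kfoldLoss(CVMdl,Mode="individual");

% Standard deviation across folds (dimension 1) for each quantile.
foldSpread = std(indL,0,1);
```

A large value in foldSpread relative to the corresponding value in avgL suggests that the loss estimate for that quantile is sensitive to how the data is partitioned.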

`PredictionForMissingValue` — Predicted response value to use for observations with missing predictor values
`"quantile"` (default) | `"omitted"` | numeric scalar | numeric vector

Predicted response value to use for observations with missing predictor values, specified as "quantile", "omitted", a numeric scalar, or a numeric vector.

Value	Description
`"quantile"`	`kfoldLoss` uses the specified quantile of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values.
`"omitted"`	`kfoldLoss` excludes observations with missing predictor values from the loss computation.
Numeric scalar or vector	If `PredictionForMissingValue` is a scalar, then `kfoldLoss` uses this value as the predicted response value for observations with missing predictor values. The function uses the same value for all quantiles. If `PredictionForMissingValue` is a vector, its length must be equal to the number of quantiles specified by the `Quantiles` name-value argument. `kfoldLoss` uses element i in the vector as the quantile i predicted response value for observations with missing predictor values.

If an observation is missing an observed response value or an observation weight, then kfoldLoss does not use the observation in the loss computation.

Example: PredictionForMissingValue="omitted"

Data Types: single | double | char | string
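
The following sketch shows the numeric vector form of PredictionForMissingValue, assuming the same carbig setup as in the examples on this page. The fallback predictions here are the sample quantiles of the full response vector, computed with the quantile function. This choice is only an illustration: unlike the "quantile" option, it uses responses from all folds, including the validation folds.

```matlab
% Train a 10-fold cross-validated quantile linear model.
load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
Y = MPG;

rng(0,"twister") % For reproducibility
CVMdl = fitrqlinear(X,Y,Quantiles=[0.25,0.50,0.75],CrossVal="on");

% One fallback prediction per quantile. The quantile function ignores
% NaN values in Y.
fallback = quantile(Y,CVMdl.Quantiles);

% Element i of fallback is the prediction used at quantile i for
% observations with missing predictor values.
vecL = kfoldLoss(CVMdl,PredictionForMissingValue=fallback);
```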

Output Arguments

`L` — Loss
numeric row vector | numeric matrix

Loss, returned as a numeric row vector or numeric matrix. kfoldLoss computes the loss with the function specified by LossFun, comparing the observed responses in each validation fold with the predictions of a quantile regression model trained on the corresponding training-fold observations.

- If Mode is "average", then L is a 1-by-q vector of loss values, averaged over the folds specified by the Folds name-value argument, where q is the number of quantiles specified by the Quantiles name-value argument.
- If Mode is "individual", then L is a k-by-q matrix of loss values, where k is the number of folds specified by the Folds name-value argument and q is the number of quantiles specified by the Quantiles name-value argument.

Algorithms

kfoldLoss computes losses according to the loss object function of the trained compact models in CVMdl (CVMdl.Trained). For more information, see the model-specific loss function reference pages in the following table.

Model Type	`loss` Function
Quantile linear regression model	`loss`
Quantile neural network model for regression	`loss`
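
To make the relationship with the loss object function concrete, the following sketch evaluates a single fold by hand and, for comparison, requests the same fold from kfoldLoss. It assumes that CVMdl.Trained is a cell array with one compact model per fold and that the compact model's loss function accepts predictor and response data in the form loss(Mdl,X,Y). The two results should be comparable, but this page does not guarantee that they are identical.

```matlab
% Train a 10-fold cross-validated quantile linear model.
load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
Y = MPG;

rng(0,"twister") % For reproducibility
CVMdl = fitrqlinear(X,Y,Quantiles=[0.25,0.50,0.75],CrossVal="on");

% Logical indices of the validation-fold observations for fold 1.
fold = 1;
valIdx = test(CVMdl.Partition,fold);

% Compact model trained on the remaining (training-fold) observations.
% Assumption: Trained is a cell array indexed by fold.
foldModel = CVMdl.Trained{fold};

% Loss of the fold model on its validation-fold observations.
manualL = loss(foldModel,X(valIdx,:),Y(valIdx));

% Loss for the same fold as reported by kfoldLoss.
kfoldRow = kfoldLoss(CVMdl,Folds=fold,Mode="individual");
```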

Version History

Introduced in R2025a
