crossval
Description
CVMdl = crossval(Mdl,TSPartition) returns a cross-validated (partitioned) direct forecasting model (CVMdl) from a trained direct forecasting model (Mdl). The crossval function uses the cross-validation scheme specified by TSPartition.
You can assess the predictive performance of Mdl on cross-validated data by using the object functions of CVMdl (cvloss and cvpredict).
Examples
Evaluate Model Using Expanding Window Cross-Validation
Create a cross-validated direct forecasting model using expanding window cross-validation. To evaluate the performance of the model:
- Compute the mean squared error (MSE) on each test set using the cvloss object function.
- For each test set, compare the true response values to the predicted response values using the cvpredict object function.
Load the sample file TemperatureData.csv, which contains average daily temperature from January 2015 through July 2016. Read the file into a table. Observe the first eight observations in the table.
Tbl = readtable("TemperatureData.csv");
head(Tbl)
    Year       Month       Day    TemperatureF
    ____    ___________    ___    ____________
    2015    {'January'}     1          23
    2015    {'January'}     2          31
    2015    {'January'}     3          25
    2015    {'January'}     4          39
    2015    {'January'}     5          29
    2015    {'January'}     6          12
    2015    {'January'}     7          10
    2015    {'January'}     8           4
Create a datetime variable t that contains the year, month, and day information for each observation in Tbl.
numericMonth = month(datetime(Tbl.Month, ...
    InputFormat="MMMM",Locale="en_US"));
t = datetime(Tbl.Year,numericMonth,Tbl.Day);
Plot the temperature values in Tbl over time.
plot(t,Tbl.TemperatureF)
xlabel("Date")
ylabel("Temperature in Fahrenheit")
Create a direct forecasting model by using the data in Tbl. Train the model using a bagged ensemble of trees. All three of the predictors (Year, Month, and Day) are leading predictors because their future values are known. To create new predictors by shifting the leading predictor and response variables backward in time, specify the leading predictor lags and the response variable lags.
Mdl = directforecaster(Tbl,"TemperatureF", ...
    Learner="bag", ...
    LeadingPredictors="all",LeadingPredictorLags={0:1,0:1,0:7}, ...
    ResponseLags=1:7)
Mdl =
  DirectForecaster
                  Horizon: 1
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {[1x1 classreg.learning.regr.CompactRegressionEnsemble]}
                   MaxLag: 7
          NumObservations: 565
Mdl is a DirectForecaster model object. By default, the horizon is one step ahead. That is, Mdl predicts a value that is one step into the future.
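The idea of shifting a variable backward in time to create lagged predictors can be illustrated with a small sketch in base MATLAB. The toy series y and the loop below are illustrative assumptions and are not how directforecaster builds its predictors internally; they only show why the first rows, up to the maximum lag, have no complete set of lagged values.

```matlab
% Toy response series (hypothetical data, not the temperature table)
y = (1:10)';
lags = 1:3;                      % analogous to ResponseLags=1:3
n = numel(y);

% Column j holds y shifted backward in time by lags(j) steps;
% rows without enough history stay NaN
lagged = NaN(n,numel(lags));
for j = 1:numel(lags)
    k = lags(j);
    lagged(k+1:end,j) = y(1:end-k);
end

disp(lagged(1:5,:))
```

In this sketch, row 4 is the first row with a full set of lagged values, because the largest lag is 3. This mirrors the role of the MaxLag value shown in the model display.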
Partition the time series data in Tbl using an expanding window cross-validation scheme. Create three training sets and three test sets, where each test set has 100 observations. Note that each observation in Tbl is in at most one test set.
CVPartition = tspartition(size(Mdl.X,1),"ExpandingWindow",3, ...
    TestSize=100)
CVPartition =
  tspartition
               Type: 'expanding-window'
    NumObservations: 565
        NumTestSets: 3
          TrainSize: [265 365 465]
           TestSize: [100 100 100]
           StepSize: 100
The training sets increase in size from 265 observations in the first window to 465 observations in the third window.
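The window sizes reported above can be reproduced with simple index arithmetic. The following is a minimal sketch in base MATLAB that assumes the test sets are contiguous, nonoverlapping, and end at the last observation, which is consistent with the displayed TrainSize and TestSize values. It does not call tspartition and is not a description of its internal algorithm.

```matlab
n = 565;            % number of observations, as in the example
numWindows = 3;     % number of training/test splits
testSize = 100;     % observations per test set

% Work backward from the last observation to locate each test set
testEnd = n - (numWindows - (1:numWindows)) * testSize;
testStart = testEnd - testSize + 1;

% In an expanding window, training uses everything before the test set
trainSize = testStart - 1;

for i = 1:numWindows
    fprintf("Window %d: train 1:%d, test %d:%d\n", ...
        i,trainSize(i),testStart(i),testEnd(i));
end
```

Under these assumptions, the training sizes come out as 265, 365, and 465, matching the partition display.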
Create a cross-validated direct forecasting model using the partition specified in CVPartition. Inspect the Learners property of the resulting CVMdl object.
CVMdl = crossval(Mdl,CVPartition)
CVMdl =
  PartitionedDirectForecaster
                Partition: [1x1 tspartition]
                  Horizon: 1
             ResponseLags: [1 2 3 4 5 6 7]
        LeadingPredictors: [1 2 3]
     LeadingPredictorLags: {[0 1]  [0 1]  [0 1 2 3 4 5 6 7]}
             ResponseName: 'TemperatureF'
           PredictorNames: {'Year'  'Month'  'Day'}
    CategoricalPredictors: 2
                 Learners: {3x1 cell}
                   MaxLag: 7
          NumObservations: 565
CVMdl.Learners
ans=3×1 cell array
{1x1 timeseries.forecaster.CompactDirectForecaster}
{1x1 timeseries.forecaster.CompactDirectForecaster}
{1x1 timeseries.forecaster.CompactDirectForecaster}
CVMdl is a PartitionedDirectForecaster model object. The crossval function trains CVMdl.Learners{1} using the observations in the first training set, CVMdl.Learners{2} using the observations in the second training set, and CVMdl.Learners{3} using the observations in the third training set.
Compute the average test set MSE.
averageMSE = cvloss(CVMdl)
averageMSE = 53.3480
To obtain more information, compute the MSE for each test set.
individualMSE = cvloss(CVMdl,Mode="individual")
individualMSE = 3×1
44.1352
84.0695
31.8393
The models trained on the first and third training sets seem to perform better than the model trained on the second training set.
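As a quick consistency check, the average MSE reported earlier agrees with the simple mean of the three individual values. The sketch below hard-codes the numbers from the output above. Treating the average as an unweighted mean is an assumption here; because all three test sets have 100 observations, a mean weighted by test set size would give the same result.

```matlab
% Per-test-set MSE values copied from the individualMSE output above
individualMSE = [44.1352; 84.0695; 31.8393];

% Unweighted mean across the three test sets
averageMSE = mean(individualMSE);
fprintf("%.4f\n",averageMSE)
```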
For each test set observation, predict the temperature value using the corresponding model in CVMdl.Learners.
predictedY = cvpredict(CVMdl);
predictedY(260:end,:)
ans=306×1 table
TemperatureF_Step1
__________________
NaN
NaN
NaN
NaN
NaN
NaN
50.963
57.363
57.04
60.705
59.606
58.302
58.023
61.39
67.229
61.083
⋮
Only the last 300 observations appear in any test set. For observations that do not appear in a test set, the predicted response value is NaN.
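To make the relationship between the NaN padding and the per-test-set error concrete, the following sketch computes an MSE per window by hand. It uses synthetic stand-in vectors yTrue and yPred (hypothetical data, not the temperature series or actual cvpredict output) and assumes the same contiguous test windows as in the example. It is meant as an illustration of the bookkeeping, not as a reimplementation of cvloss.

```matlab
n = 565; testSize = 100; numWindows = 3;
testStart = n - (numWindows:-1:1) * testSize + 1;   % first index of each test set

% Synthetic stand-ins: a smooth "true" series and predictions that are
% NaN for every observation outside the test sets
yTrue = 50 + 10*sin((1:n)'/30);
yPred = NaN(n,1);
for i = 1:numWindows
    idx = testStart(i):(testStart(i) + testSize - 1);
    yPred(idx) = yTrue(idx) + 2;    % pretend each model is off by 2 degrees
end

% Observations that are in no test set keep their NaN prediction
numNotTested = sum(isnan(yPred));

% Mean squared error restricted to each test window
mseByWindow = zeros(numWindows,1);
for i = 1:numWindows
    idx = testStart(i):(testStart(i) + testSize - 1);
    mseByWindow(i) = mean((yTrue(idx) - yPred(idx)).^2);
end
```

With 565 observations and three test sets of 100, numNotTested is 265, which corresponds to the leading NaN rows in the cvpredict output.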
For each test set, plot the true response values and the predicted response values.
tiledlayout(3,1)

nexttile
idx1 = test(CVPartition,1);
plot(t(idx1),Tbl.TemperatureF(idx1))
hold on
plot(t(idx1),predictedY.TemperatureF_Step1(idx1))
legend("True Response","Predicted Response", ...
    Location="eastoutside")
xlabel("Date")
ylabel("Temperature")
title("Test Set 1")
hold off

nexttile
idx2 = test(CVPartition,2);
plot(t(idx2),Tbl.TemperatureF(idx2))
hold on
plot(t(idx2),predictedY.TemperatureF_Step1(idx2))
legend("True Response","Predicted Response", ...
    Location="eastoutside")
xlabel("Date")
ylabel("Temperature")
title("Test Set 2")
hold off

nexttile
idx3 = test(CVPartition,3);
plot(t(idx3),Tbl.TemperatureF(idx3))
hold on
plot(t(idx3),predictedY.TemperatureF_Step1(idx3))
legend("True Response","Predicted Response", ...
    Location="eastoutside")
xlabel("Date")
ylabel("Temperature")
title("Test Set 3")
hold off
Overall, the cross-validated direct forecasting model is able to predict the trend in temperatures. If you are satisfied with the performance of the cross-validated model, you can use the full DirectForecaster model Mdl for forecasting at time steps beyond the available data.
Input Arguments
Mdl — Direct forecasting model
DirectForecaster model object
Direct forecasting model, specified as a DirectForecaster model object.
TSPartition — Cross-validation partition for time series data
tspartition object
Cross-validation partition for time series data, specified as a tspartition object. TSPartition uses an expanding window cross-validation, sliding window cross-validation, or holdout validation scheme (as specified by the tspartition function).
Output Arguments
CVMdl — Cross-validated direct forecasting model
PartitionedDirectForecaster model object
Cross-validated direct forecasting model, returned as a PartitionedDirectForecaster model object.
Version History
Introduced in R2023b
See Also
DirectForecaster | PartitionedDirectForecaster | tspartition