estimate
Fit univariate ARIMA or ARIMAX model to data
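Syntax
EstMdl = estimate(Mdl,y)
[EstMdl,EstParamCov,logL,info] = estimate(___)
EstMdl = estimate(Mdl,Tbl1)
[___] = estimate(___,Name=Value)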
Description
EstMdl = estimate(Mdl,y) returns the fully specified ARIMA model EstMdl. This model stores the estimated parameter values that result from fitting the partially specified ARIMA model Mdl to the observed univariate time series y by using maximum likelihood. EstMdl and Mdl are the same model type and have the same structure.
[EstMdl,EstParamCov,logL,info] = estimate(___) also returns the estimated variance-covariance matrix EstParamCov associated with the estimated parameters, the optimized loglikelihood objective function value logL, and a data structure of summary information info.
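As a minimal sketch, assuming Econometrics Toolbox is installed, the following code requests all four outputs for a simulated ARMA(2,1) path and passes logL to aicbic to compute information criteria. The seed, the sample size, and the hand-counted parameter total numParams are illustrative assumptions, not values prescribed by estimate.

```matlab
% Sketch: request all four outputs of estimate.
% Assumes Econometrics Toolbox; the seed and sample size are arbitrary.
rng(1,"twister");
DGP = arima(AR={0.5,-0.3},MA=0.2,Constant=0,Variance=0.1);
y = simulate(DGP,500);                 % 500-by-1 simulated response path

Mdl = arima(2,0,1);                    % Constant, AR, MA, and Variance are NaN
[EstMdl,EstParamCov,logL,info] = estimate(Mdl,y,Display="off");

% Five estimated parameters: constant, two AR terms, one MA term, variance
numParams = 5;
[aic,bic] = aicbic(logL,numParams,numel(y))
```

Because the BIC penalty per parameter, log(500), exceeds the AIC penalty of 2, bic is larger than aic for this fit.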
EstMdl = estimate(Mdl,Tbl1) fits the partially specified ARIMA model Mdl to the response variable in the input table or timetable Tbl1, which contains time series data, and returns the fully specified, estimated ARIMA model EstMdl. estimate selects the response variable named in Mdl.SeriesName or the sole variable in Tbl1. To select a different response variable in Tbl1 to fit the model to, use the ResponseVariable name-value argument. (since R2023b)
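The following is a minimal sketch of the default selection rule. It assumes a timetable with two hypothetical variables, Other and Resp. Because Mdl.SeriesName is set to "Resp", estimate should fit the model to TT.Resp even though no ResponseVariable argument is passed.

```matlab
% Sketch: estimate selects the table variable whose name matches Mdl.SeriesName.
% The variable names "Other" and "Resp" are hypothetical.
rng(2,"twister");
numobs = 200;
resp = simulate(arima(AR=0.5,Constant=1,Variance=1),numobs);
other = randn(numobs,1);                        % unrelated second variable
t = datetime(2020,1,1) + days(0:numobs-1)';     % regular daily time step
TT = timetable(t,other,resp,VariableNames=["Other" "Resp"]);

Mdl = arima(1,0,0);
Mdl.SeriesName = "Resp";                        % selects TT.Resp by default
EstMdl = estimate(Mdl,TT,Display="off");
```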
[___] = estimate(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. estimate returns the output argument combination for the corresponding input arguments. For example, estimate(Mdl,y,Y0=y0,X=Pred) fits the ARIMA model Mdl to the vector of response data y, specifies the vector of presample response data y0, and includes a linear regression term in the model for the exogenous predictor data Pred.
Supply all input data using the same data type. Specifically:
If you specify the numeric vector y, optional data sets must be numeric arrays, and you must use the appropriate name-value argument. For example, to specify a presample, set the Y0 name-value argument to a numeric column vector of presample data.
If you specify the table or timetable Tbl1, optional data sets must be tables or timetables, respectively, and you must use the appropriate name-value argument. For example, to specify a presample, set the Presample name-value argument to a table or timetable of presample data.
Examples
Fit ARMA Model to Vector of Simulated Response Data
Fit an ARMA(2,1) model to simulated data.
Simulate Data from Known Model
Suppose that the data generating process (DGP) is

$$y_t = 0.5y_{t-1} - 0.3y_{t-2} + \varepsilon_t + 0.2\varepsilon_{t-1},$$

where $\varepsilon_t$ is a series of iid Gaussian random variables with mean 0 and variance 0.1.
Create the ARMA(2,1) model representing the DGP.
DGP = arima(AR={0.5,-0.3},MA=0.2,Constant=0, ...
Variance=0.1)
DGP = 
  arima with properties:

     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: 0
              AR: {0.5 -0.3} at lags [1 2]
             SAR: {}
              MA: {0.2} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.1
DGP
is a fully specified arima
model object.
Simulate a random 500 observation path from the ARMA(2,1) model.
rng(5,"twister"); % For reproducibility
T = 500;
y = simulate(DGP,T);
y is a 500-by-1 column vector representing a simulated response path from the ARMA(2,1) model DGP
.
Estimate Model
Create an ARMA(2,1) model template for estimation.
Mdl = arima(2,0,1)
Mdl = 
  arima with properties:

     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: NaN
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
Mdl
is a partially specified arima
model object. Only required, nonestimable parameters that determine the model structure are specified. NaN-valued properties, including $c$, $\phi_1$, $\phi_2$, $\theta_1$, and $\sigma^2$, are unknown model parameters to be estimated.
Fit the ARMA(2,1) model to y
.
EstMdl = estimate(Mdl,y)
ARIMA(2,0,1) Model (Gaussian Distribution):
 
                  Value      StandardError    TStatistic      PValue  
                _________    _____________    __________    __________

    Constant    0.0089018      0.018417        0.48334         0.62886
    AR{1}         0.49563       0.10323         4.8013      1.5767e-06
    AR{2}        -0.25495      0.070155        -3.6341      0.00027897
    MA{1}         0.27737       0.10732         2.5846       0.0097491
    Variance      0.10004     0.0066577         15.027      4.9017e-51
EstMdl = 
  arima with properties:

     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: 0.00890178
              AR: {0.495632 -0.254951} at lags [1 2]
             SAR: {}
              MA: {0.27737} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.100043
MATLAB® displays a table containing an estimation summary, which includes parameter estimates and inferences. For example, the Value column contains the corresponding maximum likelihood estimates, and the PValue column contains p-values for the asymptotic t-test of the null hypothesis that the corresponding parameter is 0.
EstMdl
is a fully specified, estimated arima
model object; its estimates resemble the parameter values of the DGP.
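The printed TStatistic and PValue columns are consistent with a standard normal reference distribution. For instance, for the constant, 2(1 - Φ(0.48334)) ≈ 0.629, which matches the printed p-value. The following sketch recomputes those columns from EstParamCov. It assumes the parameter order Constant, AR{1}, AR{2}, MA{1}, Variance (the row order of the summary table) and uses erfc so that no additional toolbox is needed for the p-values.

```matlab
% Sketch: recompute standard errors, t statistics, and p-values from EstParamCov.
% Assumes the parameter order Constant, AR{1}, AR{2}, MA{1}, Variance.
rng(5,"twister");
DGP = arima(AR={0.5,-0.3},MA=0.2,Constant=0,Variance=0.1);
y = simulate(DGP,500);
[EstMdl,EstParamCov] = estimate(arima(2,0,1),y,Display="off");

est = [EstMdl.Constant; EstMdl.AR{1}; EstMdl.AR{2}; EstMdl.MA{1}; EstMdl.Variance];
se = sqrt(diag(EstParamCov));
tStat = est./se;
pValue = erfc(abs(tStat)/sqrt(2));   % two-sided normal p-value, 2*(1 - Phi(|t|))
table(est,se,tStat,pValue,RowNames={'Constant','AR{1}','AR{2}','MA{1}','Variance'})
```

Up to display rounding, these values should agree with the corresponding columns of the estimation summary.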
Apply Equality Constraints to Parameters During Estimation
Fit an AR(2) model to simulated data while holding the model constant fixed during estimation.
Simulate Data from Known Model
Suppose the DGP is

$$y_t = 0.5y_{t-1} - 0.3y_{t-2} + \varepsilon_t,$$

where $\varepsilon_t$ is a series of iid Gaussian random variables with mean 0 and variance 0.1.
Create the AR(2) model representing the DGP.
DGP = arima(AR={0.5,-0.3},Constant=0,Variance=0.1);
Simulate a random 500 observation path from the model.
rng(5,"twister"); % For reproducibility
T = 500;
y = simulate(DGP,T);
Create Model Object Specifying Constraint
Assume that the mean of $y_t$ is 0, which implies that $c$ is 0.
Create an AR(2) model for estimation. Set $c$ to 0.
Mdl = arima(ARLags=1:2,Constant=0)
Mdl = 
  arima with properties:

     Description: "ARIMA(2,0,0) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: 0
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
Mdl
is a partially specified arima
model object. Specified parameters include all required parameters and the model constant. NaN-valued properties, including $\phi_1$, $\phi_2$, and $\sigma^2$, are unknown model parameters to be estimated.
Estimate Model
Fit the AR(2) model template containing the constraint to y
.
EstMdl = estimate(Mdl,y)
ARIMA(2,0,0) Model (Gaussian Distribution):
 
                 Value     StandardError    TStatistic      PValue  
                ________    _____________    __________    __________

    Constant           0              0            NaN           NaN
    AR{1}        0.56342       0.044225          12.74    3.5474e-37
    AR{2}       -0.29355       0.041786        -7.0252     2.137e-12
    Variance     0.10022       0.006644         15.085    2.0476e-51
EstMdl = 
  arima with properties:

     Description: "ARIMA(2,0,0) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: 0
              AR: {0.563425 -0.293554} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.100222
EstMdl
is a fully specified, estimated arima
model object; its estimates resemble the parameter values of the AR(2) model DGP
. The value of $c$ in the estimation summary and object display is 0, and the corresponding inferences are trivial or do not apply.
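The same equality constraint can also be imposed by assigning the property after you create the template, rather than by passing Constant=0 to arima. The following is a minimal sketch under that assumption. It also assumes that EstParamCov keeps a row and column for the fixed constant, as the 0 standard error in the summary above suggests.

```matlab
% Sketch: impose the equality constraint by assigning a property
% after creating the template, instead of passing Constant=0 to arima.
rng(5,"twister");
DGP = arima(AR={0.5,-0.3},Constant=0,Variance=0.1);
y = simulate(DGP,500);

Mdl = arima(2,0,0);      % Constant is NaN (estimable) by default
Mdl.Constant = 0;        % now held fixed at 0 during estimation
[EstMdl,EstParamCov] = estimate(Mdl,y,Display="off");
EstMdl.Constant          % remains 0
```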
Compute Estimated Standard Errors
Load the US equity index data set Data_EquityIdx
.
load Data_EquityIdx
The table DataTable
includes the time series variable NYSE
, which contains daily NYSE composite closing prices from January 1990 through December 2001.
Convert the table to a timetable.
dt = datetime(dates,'ConvertFrom','datenum','Format','yyyy-MM-dd');
TT = table2timetable(DataTable,'RowTimes',dt);
Suppose that an ARIMA(1,1,1) model is appropriate for modeling the NYSE composite series during the sample period.
Fit an ARIMA(1,1,1) model to the data, and return the estimated parameter covariance matrix.
Mdl = arima(1,1,1);
[EstMdl,EstParamCov] = estimate(Mdl,TT{:,"NYSE"});
ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value     StandardError    TStatistic     PValue 
                ________    _____________    __________    ________

    Constant     0.15745        0.09783        1.6094       0.10752
    AR{1}       -0.21995        0.15642       -1.4062       0.15968
    MA{1}        0.28539        0.15382        1.8554      0.063544
    Variance      17.159        0.20038        85.632             0
EstParamCov
EstParamCov = 4×4
0.0096 -0.0002 0.0002 0.0023
-0.0002 0.0245 -0.0240 -0.0060
0.0002 -0.0240 0.0237 0.0057
0.0023 -0.0060 0.0057 0.0402
EstMdl
is a fully specified, estimated arima
model object. Rows and columns of EstParamCov
correspond to the rows in the table of estimates and inferences; for example, EstParamCov(1,2) is the estimated covariance of $\hat{c}$ and $\hat{\phi}_1$.
Compute estimated parameter standard errors by taking the square root of the diagonal elements of the covariance matrix.
estParamSE = sqrt(diag(EstParamCov))
estParamSE = 4×1
0.0978
0.1564
0.1538
0.2004
Compute a Wald-based 95% confidence interval on $\phi_1$.
T = size(TT,1); % Effective sample size
phihat = EstMdl.AR{1};
sephihat = estParamSE(2);
ciphi = phihat + tinv([0.025 0.975],T - 3)*sephihat
ciphi = 1×2
-0.5266 0.0867
The interval contains 0, which suggests that $\phi_1$ is not significantly different from 0 at the 5% level.
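EstParamCov also shows how the estimates move together. The following sketch converts the displayed covariance matrix into a correlation matrix using base MATLAB only. It assumes the rounded four-decimal values shown above, so the result is approximate. The AR{1} and MA{1} estimates have a correlation of roughly -0.996. This is consistent with the lag polynomials 1 + 0.21995L (AR side) and 1 + 0.28539L (MA side) nearly canceling, which may help explain why neither term is individually significant.

```matlab
% Sketch: correlation matrix implied by the EstParamCov values displayed above.
% The entries are the rounded, 4-decimal values shown in this example, so
% the result is approximate. Base MATLAB only.
C = [ 0.0096 -0.0002  0.0002  0.0023
     -0.0002  0.0245 -0.0240 -0.0060
      0.0002 -0.0240  0.0237  0.0057
      0.0023 -0.0060  0.0057  0.0402];
se = sqrt(diag(C));          % approximate standard errors
R = C ./ (se*se');           % corr(i,j) = cov(i,j)/(se(i)*se(j))
rARMA = R(2,3)               % correlation of the AR{1} and MA{1} estimates
```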
Fit ARIMA Model to Response Variable in Timetable
Since R2023b
Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Supply a timetable of data and specify the series for the fit.
Load Data
Load the US equity index data set Data_EquityIdx
.
load Data_EquityIdx
T = height(DataTimeTable)
T = 3028
The timetable DataTimeTable
includes the time series variable NYSE
, which contains daily NYSE composite closing prices from January 1990 through December 2001.
Plot the daily NYSE price series.
figure
plot(DataTimeTable.Time,DataTimeTable.NYSE)
title("NYSE Daily Closing Prices: 1990 - 2001")
Prepare Timetable for Estimation
When you plan to supply a timetable, you must ensure it has all the following characteristics:
The selected response variable is numeric and does not contain any missing values.
The timestamps in the
Time
variable are regular, and they are ascending or descending.
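The two conditions above can be checked programmatically before you call estimate. The following is a minimal sketch on a small hypothetical timetable. Replace TT, the variable name, and the time unit with your own. Because the requirement allows either ascending or descending timestamps, the sketch uses the "strictmonotonic" direction of issorted.

```matlab
% Sketch: check the two conditions listed above before calling estimate.
% TT is a small hypothetical timetable; replace it and the names with your own.
t = datetime(2020,1,1) + days(0:9)';
TT = timetable(t,(1:10)',VariableNames={'NYSE'});
y = TT.NYSE;

isNumericNoMissing = isnumeric(y) && ~any(isnan(y))
isRegularStep = isregular(TT,"days")
isSortedTime = issorted(TT.Properties.RowTimes,"strictmonotonic")
```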
Create a new timetable, DTT, by removing all rows that contain a missing value in the NYSE price series.
DTT = rmmissing(DataTimeTable,DataVariables="NYSE");
T_DTT = height(DTT)
T_DTT = 3028
Because all sample times have observed NYSE prices, rmmissing
does not remove any observations.
Determine whether the sampling timestamps have a regular frequency and are sorted.
areTimestampsRegular = isregular(DTT,"days")
areTimestampsRegular = logical
0
areTimestampsSorted = issorted(DTT.Time)
areTimestampsSorted = logical
1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular, and areTimestampsSorted = 1 indicates that the timestamps are sorted. The timestamps are irregular because observations occur only on business days.
Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.
DTTW = convert2weekly(DTT,Aggregation="mean");
areTimestampsRegular = isregular(DTTW,"weeks")
areTimestampsRegular = logical
1
T_DTTW = height(DTTW)
T_DTTW = 627
The timetable DTTW
is regular.
figure
plot(DTTW.Time,DTTW.NYSE)
title("NYSE Weekly Average Closing Prices: 1990 - 2001")
Create Model Template for Estimation
Create an ARIMA(1,1,1) model template for estimation.
Mdl = arima(1,1,1)
Mdl = 
  arima with properties:

     Description: "ARIMA(1,1,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 1
               Q: 1
        Constant: NaN
              AR: {NaN} at lag [1]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
Mdl
is a partially specified arima
model object.
Fit Model to Data
Fit an ARIMA(1,1,1) model to weekly average NYSE closing prices. Specify the entire series and the response variable name.
EstMdl = estimate(Mdl,DTTW,ResponseVariable="NYSE");
ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value     StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant     0.86386        0.46496        1.8579          0.06318
    AR{1}       -0.37582        0.22719       -1.6542          0.09809
    MA{1}        0.47221        0.21741         2.172         0.029858
    Variance       55.89          1.832        30.507      2.1199e-204
EstMdl
is a fully specified, estimated arima
model object. By default, estimate
backcasts for the required Mdl.P = 2
presample responses.
Initialize Model Estimation Using Presample Response Data
Since R2023b
Because an ARIMA model is a function of previous values, estimate
requires presample data to initialize the model early in the sampling period. Although estimate
backcasts for presample data by default, you can specify required presample data instead. The P
property of an arima
model object specifies the required number of presample observations.
Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Supply timetables of presample and estimation data sets.
Load Data
Load the US equity index data set Data_EquityIdx
.
load Data_EquityIdx
Prepare Timetable for Estimation
The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.
DTTW = convert2weekly(DataTimeTable,Aggregation="mean");
Create Model Template for Estimation
Create an ARIMA(1,1,1) model template for estimation.
Mdl = arima(1,1,1)
Mdl = 
  arima with properties:

     Description: "ARIMA(1,1,1) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 2
               D: 1
               Q: 1
        Constant: NaN
              AR: {NaN} at lag [1]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
Mdl.P
is 2
. Therefore, estimate
requires 2 presample observations to initialize the model for estimation.
Partition Sample
Partition the entire sample DTTW
into presample and estimation sample timetables. The presample occurs first and contains two observations, and the estimation sample contains the remaining observations in DTTW
.
PS = DTTW(1:Mdl.P,:);
ES = DTTW((Mdl.P+1):end,:);
Estimate Model
Fit an ARIMA(1,1,1) model to the estimation sample. Specify the presample and the response variable names.
EstMdl = estimate(Mdl,ES,ResponseVariable="NYSE", ...
    Presample=PS,PresampleResponseVariable="NYSE");
ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value     StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant     0.83624          0.453         1.846         0.064891
    AR{1}       -0.32862        0.23526       -1.3968          0.16246
    MA{1}        0.42703        0.22613        1.8885         0.058965
    Variance      56.065         1.8433        30.416      3.3809e-203
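For comparison, the following sketch performs the same partition with numeric inputs. It assumes that passing the first Mdl.P weekly averages through Y0 is equivalent to the timetable presample PS. Under that assumption, the estimates are expected to agree with the fit above. EstMdlNum is a hypothetical variable name.

```matlab
% Sketch: the same presample/estimation partition using numeric vectors
% and the Y0 name-value argument instead of timetables.
load Data_EquityIdx
DTTW = convert2weekly(DataTimeTable,Aggregation="mean");
Mdl = arima(1,1,1);

y = DTTW.NYSE;                 % weekly average closing prices
y0 = y(1:Mdl.P);               % first Mdl.P = 2 observations as presample
yEst = y((Mdl.P+1):end);       % remaining observations for estimation
EstMdlNum = estimate(Mdl,yEst,Y0=y0,Display="off");
```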
Specify Initial Parameter Values for Optimization
Since R2023b
Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Specify initial parameter values obtained from an analysis of a pilot sample.
Load Data
Load the US equity index data set Data_EquityIdx
.
load Data_EquityIdx
Prepare Timetable for Estimation
The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.
DTTW = convert2weekly(DataTimeTable,Aggregation="mean");
Create Model Template for Estimation
Create an ARIMA(1,1,1) model template for estimation. Specify the response series name as NYSE
.
Mdl = arima(ARLags=1,D=1,MALags=1,SeriesName="NYSE");
Fit Model to Pilot Sample
Treat the first two years (1990 and 1991) as a pilot sample for obtaining initial parameter values when fitting the model to the remaining data. Fit the model to the pilot sample. By default, estimate
uses the response data in the table variable that matches Mdl.SeriesName
.
endPilot = datetime(1991,12,31);
DTTW0 = DTTW(DTTW.Time <= endPilot,:);
EstMdl0 = estimate(Mdl,DTTW0,Display="off");
EstMdl0
is a fully specified, estimated arima
model object.
Estimate Model
Fit an ARIMA(1,1,1) model to the estimation sample. Specify the estimated parameters from the pilot sample fit as initial values for optimization.
DTTWEst = DTTW(DTTW.Time > endPilot,:);
c0 = EstMdl0.Constant;
ar0 = EstMdl0.AR;
ma0 = EstMdl0.MA;
var0 = EstMdl0.Variance;
EstMdl = estimate(Mdl,DTTWEst,Constant0=c0,AR0=ar0, ...
MA0=ma0,Variance0=var0);
ARIMA(1,1,1) Model (Gaussian Distribution):
 
                 Value     StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant     0.93922        0.55503        1.6922         0.090609
    AR{1}       -0.38996        0.26259       -1.4851          0.13753
    MA{1}        0.48477        0.25108        1.9308         0.053513
    Variance      64.661         2.4853        26.018      3.1308e-149
Estimate ARIMA Model Containing Exogenous Predictors (ARIMAX)
Fit an ARIMAX model to simulated time series data.
Simulate Predictor and Response Data
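Create the ARIMAX(2,1,0) model for the DGP, represented by $y_t$ in the equation

$$(1 - 0.5L + 0.3L^2)(1 - L)y_t = 2 + 1.5x_{1t} + 2.6x_{2t} - 0.3x_{3t} + \varepsilon_t,$$

where $L$ is the lag operator and $\varepsilon_t$ is a series of iid Gaussian random variables with mean 0 and variance 0.1.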
DGP = arima(AR={0.5,-0.3},D=1,Constant=2, ...
Variance=0.1,Beta=[1.5 2.6 -0.3]);
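Assume that the exogenous variables $x_{1t}$, $x_{2t}$, and $x_{3t}$ are represented by the AR(1) processes

$$x_{1t} = 0.1x_{1,t-1} + \eta_{1t}$$
$$x_{2t} = 0.2x_{2,t-1} + \eta_{2t}$$
$$x_{3t} = 0.3x_{3,t-1} + \eta_{3t},$$

where $\eta_{it}$ follows a Gaussian distribution with mean 0 and variance 0.01 for $i = 1,2,3$. Create ARIMA models that represent the exogenous variables.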
MdlX1 = arima(AR=0.1,Constant=0,Variance=0.01);
MdlX2 = arima(AR=0.2,Constant=0,Variance=0.01);
MdlX3 = arima(AR=0.3,Constant=0,Variance=0.01);
Simulate length 1000 exogenous series from the AR models. Store the simulated data in a matrix.
T = 1000;
rng(10,"twister"); % For reproducibility
x1 = simulate(MdlX1,T);
x2 = simulate(MdlX2,T);
x3 = simulate(MdlX3,T);
X = [x1 x2 x3];
X
is a 1000-by-3 matrix of simulated time series data. Each row corresponds to an observation in the time series, and each column corresponds to an exogenous variable.
Simulate a length 1000 series from the DGP. Specify the simulated exogenous data.
y = simulate(DGP,T,X=X);
y
is a 1000-by-1 vector of response data.
Estimate Model
Create an ARIMA(2,1,0) model template for estimation.
Mdl = arima(2,1,0)
Mdl = 
  arima with properties:

     Description: "ARIMA(2,1,0) Model (Gaussian Distribution)"
      SeriesName: "Y"
    Distribution: Name = "Gaussian"
               P: 3
               D: 1
               Q: 0
        Constant: NaN
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
The model description (Description
property) and value of Beta
suggest that the partially specified arima
model object Mdl
is agnostic of the exogenous predictors.
Estimate the ARIMAX(2,1,0) model and specify the exogenous predictor data. Because estimate backcasts for presample responses (a process that requires presample predictor data for ARIMAX models), fit the model to the latest T - Mdl.P responses. (Alternatively, you can specify presample responses by using the Y0 name-value argument.)
EstMdl = estimate(Mdl,y((Mdl.P + 1):T),X=X);
ARIMAX(2,1,0) Model (Gaussian Distribution):
 
                 Value     StandardError    TStatistic      PValue   
                ________    _____________    __________    ___________

    Constant      1.7519       0.021143        82.859                0
    AR{1}        0.56076       0.016511        33.963      7.9428e-253
    AR{2}       -0.26625       0.015966       -16.676       1.9633e-62
    Beta(1)       1.4764        0.10157        14.536       7.1229e-48
    Beta(2)       2.5638        0.10445        24.547      4.6635e-133
    Beta(3)     -0.34422       0.098623       -3.4903       0.00048249
    Variance     0.10673      0.0047273        22.577      7.3157e-113
EstMdl
is a fully specified, estimated arima
model object.
When you estimate the model by using estimate
and supply the exogenous data by specifying the X
name-value argument, MATLAB® recognizes the model as an ARIMAX(2,1,0) model and includes a linear regression component for the exogenous variables.
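The estimated model is

$$(1 - 0.56076L + 0.26625L^2)(1 - L)y_t = 1.7519 + 1.4764x_{1t} + 2.5638x_{2t} - 0.34422x_{3t} + \varepsilon_t,$$

where $\varepsilon_t$ is Gaussian with mean 0 and variance 0.10673. This model resembles the DGP represented by DGP. Because MATLAB returns the AR coefficients of the model expressed in difference-equation notation, their signs are opposite in the lag-operator equation.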
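As the parenthetical earlier in this example notes, you can instead pass presample responses through Y0. The following sketch follows that approach. The first Mdl.P simulated responses serve as the presample. Because Y0 is supplied, X needs only as many rows as the estimation sample, in line with the row requirements described for the X argument. The data generation mirrors this example, but the resulting estimates are not claimed to match the table above.

```matlab
% Sketch: ARIMAX fit with an explicit presample instead of relying on backcasting.
% Mirrors the DGP and simulation settings of this example.
DGP = arima(AR={0.5,-0.3},D=1,Constant=2,Variance=0.1,Beta=[1.5 2.6 -0.3]);
T = 1000;
rng(10,"twister");
X = [simulate(arima(AR=0.1,Constant=0,Variance=0.01),T) ...
     simulate(arima(AR=0.2,Constant=0,Variance=0.01),T) ...
     simulate(arima(AR=0.3,Constant=0,Variance=0.01),T)];
y = simulate(DGP,T,X=X);

Mdl = arima(2,1,0);
y0 = y(1:Mdl.P);               % Mdl.P = 3 presample responses
yEst = y((Mdl.P+1):T);         % T - Mdl.P responses to fit
% With Y0 supplied, X needs only as many rows as yEst
EstMdl = estimate(Mdl,yEst,Y0=y0,X=X((Mdl.P+1):T,:),Display="off");
EstMdl.Beta
```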
Compute Fitted Response Values
Since R2023b
Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Compute the estimated weekly average closing prices within the time range of the data.
Load the US equity index data set Data_EquityIdx
.
load Data_EquityIdx
The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables.
DTTW = convert2weekly(DataTimeTable,Aggregation="mean");
numobs = height(DTTW)
numobs = 627
Create an ARIMA(1,1,1) model template for estimation. Specify the response series name as NYSE
.
Mdl = arima(1,1,1);
Mdl.SeriesName = "NYSE";
Fit an ARIMA(1,1,1) model to the entire sample. Suppress the estimation display.
EstMdl = estimate(Mdl,DTTW,Display="off");
Infer residuals from the estimated model.
ResidTT = infer(EstMdl,DTTW);
tail(ResidTT)
       Time        NYSE     NASDAQ    NYSE_Residual    NYSE_Variance
    ___________    ______    ______    _____________    _____________

    16-Nov-2001    577.11    1886.9        5.8562           55.89    
    23-Nov-2001       583    1898.3        5.4409           55.89    
    30-Nov-2001    581.41    1925.8       -2.8105           55.89    
    07-Dec-2001    584.96    1998.1        3.4212           55.89    
    14-Dec-2001    574.03      1981       -12.071           55.89    
    21-Dec-2001     582.1    1967.9        8.7933           55.89    
    28-Dec-2001    590.28    1967.2        6.2015           55.89    
    04-Jan-2002     589.8    1950.4       -1.2004           55.89    
ResidTT is a 627-by-4 timetable containing the variables of DTTW, the residuals NYSE_Residual, and the estimated conditional variances NYSE_Variance from the fit. Because the model variance is constant, every element of NYSE_Variance equals 55.89, which is the model variance estimate.
Compute the fitted values and store them in ResidTT
.
ResidTT.NYSE_YHat = ResidTT.NYSE - ResidTT.NYSE_Residual;
tail(ResidTT)
       Time        NYSE     NASDAQ    NYSE_Residual    NYSE_Variance    NYSE_YHat
    ___________    ______    ______    _____________    _____________    _________

    16-Nov-2001    577.11    1886.9        5.8562           55.89         571.25 
    23-Nov-2001       583    1898.3        5.4409           55.89         577.56 
    30-Nov-2001    581.41    1925.8       -2.8105           55.89         584.22 
    07-Dec-2001    584.96    1998.1        3.4212           55.89         581.54 
    14-Dec-2001    574.03      1981       -12.071           55.89          586.1 
    21-Dec-2001     582.1    1967.9        8.7933           55.89          573.3 
    28-Dec-2001    590.28    1967.2        6.2015           55.89         584.08 
    04-Jan-2002     589.8    1950.4       -1.2004           55.89            591 
Plot the last 200 observations with corresponding fitted values on the same graph.
figure
h = plot(ResidTT.Time((end-199):end),ResidTT{(end-199):end,["NYSE" "NYSE_YHat"]});
h(2).LineStyle = "--";
legend(["Observations" "Fitted values"])
title("Model of NYSE Weekly Average Closing Prices")
The fitted values closely track the observations.
Plot the residuals versus the fitted values.
figure
plot(ResidTT.NYSE_YHat,ResidTT.NYSE_Residual,".",MarkerSize=15)
ylabel("Residuals")
xlabel("Fitted Values")
title("Residual Plot")
The residual variance appears larger for larger fitted values. One remedy for this behavior is to apply the log transform to the data.
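The following is a minimal sketch of the log-transform remedy. It assumes the same weekly averages and works with numeric vectors: fit the ARIMA(1,1,1) model to log prices, infer residuals, and redraw the residual plot on the log scale. Whether the spread actually stabilizes is something to judge from the resulting plot; no result is claimed here.

```matlab
% Sketch: refit on the log scale to check whether the residual spread stabilizes.
load Data_EquityIdx
DTTW = convert2weekly(DataTimeTable,Aggregation="mean");

logY = log(DTTW.NYSE);                       % log weekly average prices
EstMdlLog = estimate(arima(1,1,1),logY,Display="off");
res = infer(EstMdlLog,logY);                 % residuals on the log scale
logYHat = logY - res;                        % fitted values on the log scale

figure
plot(logYHat,res,".",MarkerSize=15)
xlabel("Fitted Values (log scale)")
ylabel("Residuals")
title("Residual Plot, Log Model")
```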
Input Arguments
Mdl
— Partially specified ARIMA model
arima
model object
Partially specified ARIMA model used to indicate constrained and estimable model
parameters, specified as an arima
model object returned by
arima
. Properties of Mdl
describe the model
structure and can specify parameter values.
estimate
fits unspecified (NaN
-valued)
parameters to the data y
.
estimate
treats specified parameters as equality constraints
during estimation.
y
— Single path of observed response data $y_t$
numeric column vector
Tbl1
— Time series data
table | timetable
Since R2023b
Time series data, to which estimate
fits the model,
specified as a table or timetable with numvars
variables and
numobs
rows.
The selected response variable is a numeric vector representing a single path of numobs observations. You can optionally select a response variable $y_t$ from Tbl1 by using the ResponseVariable name-value argument, and you can select numpreds predictor variables $x_t$ for the exogenous regression component by using the PredictorVariables name-value argument.
Each row is an observation, and measurements in each row occur simultaneously.
Variables in Tbl1
represent the continuation of corresponding
variables in Presample
.
If Tbl1
is a timetable, it must represent a sample with a
regular datetime time step (see isregular
), and the datetime vector Tbl1.Time
must be
strictly ascending or descending.
If Tbl1
is a table, the last row contains the latest
observation.
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN
, where Name
is
the argument name and Value
is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name
in quotes.
Example: estimate(Mdl,y,Y0=y0,X=Pred) uses the vector y0 as presample responses for estimation and includes a linear regression component for the exogenous predictor data in Pred.
ResponseVariable
— Response variable $y_t$ to select from Tbl1
string scalar | character vector | integer | logical vector
Since R2023b
Response variable $y_t$ to select from Tbl1 containing the response data, specified as one of the following data types:
String scalar or character vector containing a variable name in Tbl1.Properties.VariableNames
Variable index (integer) to select from Tbl1.Properties.VariableNames
A length numvars logical vector, where ResponseVariable(j) = true selects variable j from Tbl1.Properties.VariableNames, and sum(ResponseVariable) is 1
The selected variable must be a numeric vector and cannot contain missing values (NaN).
If Tbl1 has one variable, the default specifies that variable. Otherwise, the default is the variable whose name matches Mdl.SeriesName.
Example: ResponseVariable="StockRate2"
Example: ResponseVariable=[false false true false]
or
ResponseVariable=3
selects the third table variable as the
response variable.
Data Types: double
| logical
| char
| cell
| string
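The following sketch selects the same response variable in the three supported ways: by name, by index, and by logical mask. It assumes a hypothetical two-variable timetable in which the response is the second variable. Because the data and model are identical, the three fits are expected to give the same estimates.

```matlab
% Sketch: three equivalent ways to select the second variable of a timetable.
% The variable names "Other" and "Resp" are hypothetical.
rng(2,"twister");
numobs = 200;
t = datetime(2020,1,1) + days(0:numobs-1)';
TT = timetable(t,randn(numobs,1), ...
    simulate(arima(AR=0.5,Constant=1,Variance=1),numobs), ...
    VariableNames=["Other" "Resp"]);
Mdl = arima(1,0,0);

EstByName  = estimate(Mdl,TT,ResponseVariable="Resp",Display="off");
EstByIndex = estimate(Mdl,TT,ResponseVariable=2,Display="off");
EstByMask  = estimate(Mdl,TT,ResponseVariable=[false true],Display="off");
```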
X
— Exogenous predictor data
numeric matrix
Exogenous predictor data for the linear regression component, specified as a
numeric matrix containing numpreds
columns. Use
X
only when you supply a vector of response data
y
.
numpreds is the number of predictor variables.
Rows correspond to observations, and the last row contains the latest observation. estimate does not use the regression component in the presample period. X must have at least as many observations as are used after the presample period:
If you specify Y0, X must have at least numobs rows.
Otherwise, X must have at least numobs + Mdl.P observations to account for the presample removal.
In either case, if you supply more rows than necessary, estimate uses the latest observations only.
estimate
synchronizes X
and
y
so that the latest observations (last rows) occur
simultaneously.
Columns correspond to individual predictor variables.
By default, estimate
excludes the regression component,
regardless of its presence in Mdl
.
Data Types: double
PredictorVariables
— Exogenous predictor variables $x_t$ to select from Tbl1
string vector | cell vector of character vectors | vector of integers | logical vector
Since R2023b
Exogenous predictor variables $x_t$ to select from Tbl1 containing predictor data for the regression component, specified as one of the following data types:
String vector or cell vector of character vectors containing numpreds variable names in Tbl1.Properties.VariableNames
A length numpreds vector of unique indices (positive integers) of variables to select from Tbl1.Properties.VariableNames
A length numvars logical vector, where PredictorVariables(j) = true selects variable j from Tbl1.Properties.VariableNames, and sum(PredictorVariables) is numpreds
The selected variables must be numeric vectors and cannot contain missing values (NaN).
If you specify PredictorVariables, you must also specify presample response data by using the Presample and PresampleResponseVariable name-value arguments. For more details, see Algorithms.
By default, estimate
excludes the regression component,
regardless of its presence in Mdl
.
Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"]
Example: PredictorVariables=[true false true false] or PredictorVariables=[1 3] selects the first and third table variables to supply the predictor data.
Data Types: double
| logical
| char
| cell
| string
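The following is a minimal sketch of ARIMAX estimation from a timetable, including the presample responses that PredictorVariables requires. The variable names Y and X1, the simulated data, and the coefficient values are hypothetical. The first Mdl.P rows serve as the presample, and the remaining rows form the estimation sample that immediately follows them.

```matlab
% Sketch: ARIMAX estimation from a timetable. Variable names are hypothetical.
rng(3,"twister");
T = 300;
x = randn(T,1);                                   % exogenous predictor
y = simulate(arima(AR=0.5,Constant=1,Variance=1,Beta=2),T,X=x);
t = datetime(2020,1,1) + days(0:T-1)';
TT = timetable(t,y,x,VariableNames=["Y" "X1"]);

Mdl = arima(1,0,0);
PS = TT(1:Mdl.P,:);                               % presample (responses)
ES = TT((Mdl.P+1):end,:);                         % estimation sample
EstMdl = estimate(Mdl,ES,ResponseVariable="Y",PredictorVariables="X1", ...
    Presample=PS,PresampleResponseVariable="Y",Display="off");
EstMdl.Beta
```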
Options
— Optimization options
optimoptions
optimization controller
Optimization options, specified as an optimoptions
optimization
controller. For details on modifying the default values of the optimizer, see optimoptions
or fmincon
in Optimization Toolbox™.
For example, to change the constraint tolerance to 1e-6, set options = optimoptions(@fmincon,ConstraintTolerance=1e-6,Algorithm="sqp"). Then, pass options to estimate by using Options=options.
By default, estimate
uses the same default options as
fmincon
, except Algorithm
is
"sqp"
and ConstraintTolerance
is
1e-7
.
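The following is a minimal sketch of the steps above on simulated AR(2) data, assuming Optimization Toolbox is available for optimoptions. The tolerance value is the one from the preceding example. Whether a different tolerance changes the estimates noticeably depends on the problem.

```matlab
% Sketch: pass custom fmincon options to estimate. Requires Optimization Toolbox.
rng(5,"twister");
y = simulate(arima(AR={0.5,-0.3},Constant=0,Variance=0.1),500);

options = optimoptions(@fmincon,ConstraintTolerance=1e-6,Algorithm="sqp");
EstMdl = estimate(arima(2,0,0),y,Options=options,Display="off");
```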
Display
— Command Window display option
"params"
(default) | "diagnostics"
| "full"
| "iter"
| "off"
| string vector | cell vector of character vectors
Command Window display option, specified as one or more of the values in this table.
Value | Information Displayed |
---|---|
"diagnostics" | Optimization diagnostics |
"full" | Maximum likelihood parameter estimates, standard errors, t statistics, iterative optimization information, and optimization diagnostics |
"iter" | Iterative optimization information |
"off" | None |
"params" | Maximum likelihood parameter estimates, standard errors, and t statistics and p-values of coefficient significance tests |
Example: Display="off"
is well suited for running a simulation that
estimates many models.
Example: Display=["params" "diagnostics"]
displays all estimation
results and the optimization diagnostics.
Data Types: char
| cell
| string
Y0
— Presample response data $y_t$
numeric column vector
Presample response data $y_t$ to initialize
the model, specified as a numpreobs
-by-1 numeric column vector. Use
Y0
only when you supply the vector of response data
y
.
numpreobs
is the number of presample observations. Each row is
a presample observation. The last row contains the latest presample observation.
numpreobs
must be at least Mdl.P
. If
numpreobs
> Mdl.P
,
estimate
uses the latest required number of observations
only. The last element or row contains the latest observation.
By default, estimate
backward forecasts (backcasts) for
the necessary amount of presample responses.
For details on partitioning data for estimation, see Time Base Partitions for ARIMA Model Estimation.
Data Types: double
E0
— Presample residual data $e_t$
numeric column vector
Presample residual data $e_t$ to initialize the
model, specified as a numpreobs
-by-1 numeric column vector. Use
E0
only when you supply the vector of response data
y
.
numpreobs
is the number of presample observations. Each row is
a presample observation. The last row contains the latest presample observation.
numpreobs
must be at least Mdl.Q
. If
numpreobs
> Mdl.Q
,
estimate
uses the latest required number of observations
only. The last element or row contains the latest observation.
If Mdl.Variance
is a conditional variance model object, such as
a garch
model, estimate
can require more than Mdl.Q
presample innovations.
By default, estimate
sets all required presample residuals
to 0
, which is the expected value of the corresponding innovations
series.
Data Types: double
V0
— Presample conditional variances $\sigma_t^2$
numeric positive column vector
Presample conditional variances $\sigma_t^2$ to initialize any conditional variance model, specified as a numpreobs-by-1 positive column vector. If Mdl.Variance
is a conditional variance
model, V0
provides initial values for that model. Use
V0
only when you supply the vector of response data
y
.
Each row is a presample observation. numpreobs
must be at least
the number of observations required to initialize the conditional variance model type in
Mdl.Variance
(see estimate
). If V0
has extra rows,
estimate
uses only the latest observations. The last row
contains the latest presample observation.
If the variance is constant, estimate
ignores
V0
.
By default, estimate
sets the necessary presample
conditional variances to the average squared value of the inferred residuals.
Data Types: double
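The following sketch supplies all three presamples for an AR(1) model with GARCH(1,1) innovations. It assumes that one row each suffices for Y0 (AR order 1), E0, and V0 (GARCH(1,1) orders 1). The choice of 0 for the presample residual and the sample variance for the presample conditional variance is illustrative, not a recommendation.

```matlab
% Sketch: supply Y0, E0, and V0 for an AR(1) model with GARCH(1,1) innovations.
% Assumption: each presample needs one row (AR order 1, GARCH(1,1) orders 1).
rng(7,"twister");
DGP = arima(AR=0.5,Constant=0, ...
    Variance=garch(Constant=0.05,GARCH=0.8,ARCH=0.1));
y = simulate(DGP,1000);

Mdl = arima(1,0,0);
Mdl.Variance = garch(1,1);     % conditional variance model to estimate
y0 = y(1);                     % presample response
yEst = y(2:end);
e0 = 0;                        % presample residual
v0 = var(yEst);                % positive presample conditional variance
EstMdl = estimate(Mdl,yEst,Y0=y0,E0=e0,V0=v0,Display="off");
```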
Presample
— Presample data
table | timetable
Since R2023b
Presample data containing the response
$y_t$, residual $e_t$, or conditional variance $\sigma_t^2$ series to
initialize the model for estimation, specified as a table or timetable, the same type
as Tbl1
, with numprevars
variables and
numpreobs
rows. Use Presample
only when you
supply a table or timetable of data Tbl1
.
Each selected variable is a single path of numpreobs
observations representing the presample of responses, residuals, or conditional
variances for the selected response variable in Tbl1
.
Each row is a presample observation, and measurements in each row occur
simultaneously. numpreobs
must satisfy one of the following conditions:
numpreobs ≥ Mdl.P when Presample provides only presample responses
numpreobs ≥ Mdl.Q when Presample provides only presample residuals
numpreobs ≥ max([Mdl.P Mdl.Q]) when Presample provides presample responses and residuals
Mdl can require more presample observations than specified in the other conditions when Presample provides presample conditional variances. For more details, see estimate.
If you supply more rows than necessary, estimate uses only the latest required number of observations.
If Presample is a timetable, all the following conditions must be true:
- Presample must represent a sample with a regular datetime time step (see isregular).
- The inputs Tbl1 and Presample must be consistent in time such that Presample immediately precedes Tbl1 with respect to the sampling frequency and order.
- The datetime vector of sample timestamps Presample.Time must be ascending or descending.
If Presample is a table, the last row contains the latest presample observation.
By default:
- When Mdl is an ARIMA model without an exogenous linear regression component, estimate backcasts for necessary presample responses, sets necessary presample residuals to 0, and sets necessary presample variances to the average squared value of inferred residuals.
- When Mdl is an ARIMAX model (you specify the PredictorVariables name-value argument), you must specify presample response data because estimate cannot backcast for presample responses. estimate sets necessary presample residuals to 0 and necessary presample variances to the average squared value of inferred residuals.
If you specify Presample, you must name each presample variable you supply by using the corresponding name-value argument: PresampleResponseVariable for responses, PresampleInnovationVariable for residuals, and PresampleVarianceVariable for conditional variances.
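The following sketch shows a timetable workflow (R2023b or later). A presample of responses immediately precedes the estimation sample. The daily timestamps, the variable name "Y", and the ARMA(1,1) parameter values are hypothetical choices. Daily steps are used so that the timetable has a regular time step.

```matlab
rng(2)                                   % for reproducibility
TrueMdl = arima('Constant',1,'AR',{0.5},'MA',{0.3},'Variance',1);
y = simulate(TrueMdl,110);

% Regular daily time step, as required for timetable inputs
Time = datetime(2020,1,1) + days(0:109)';
TT = timetable(Time,y,'VariableNames',{'Y'});

PreTT = TT(1:5,:);                       % presample: 5 >= max([Mdl.P Mdl.Q])
EstTT = TT(6:end,:);                     % estimation sample follows the presample

Mdl = arima(1,0,1);
% Presample supplies only responses, so presample residuals default to 0
EstMdl = estimate(Mdl,EstTT,ResponseVariable="Y",Presample=PreTT, ...
    PresampleResponseVariable="Y",Display="off");
```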
PresampleResponseVariable
— Response variable yₜ to select from Presample
string scalar | character vector | integer | logical vector
Since R2023b
Response variable yₜ to select from Presample containing presample response data, specified as one of the following data types:
- String scalar or character vector containing the variable name to select from Presample.Properties.VariableNames
- Variable index (positive integer) to select from Presample.Properties.VariableNames
- A logical vector, where PresampleResponseVariable(j) = true selects variable j from Presample.Properties.VariableNames
The selected variable must be a numeric vector and cannot contain missing values (NaNs).
If you specify presample response data by using the Presample name-value argument, you must specify PresampleResponseVariable.
Example: PresampleResponseVariable="GDP"
Example: PresampleResponseVariable=[false false true false] or PresampleResponseVariable=3 selects the third table variable for presample response data.
Data Types: double | logical | char | cell | string
PresampleInnovationVariable
— Residual variable eₜ to select from Presample
string scalar | character vector | integer | logical vector
Since R2023b
Residual variable eₜ to select from Presample containing presample residual data, specified as one of the following data types:
- String scalar or character vector containing the variable name to select from Presample.Properties.VariableNames
- Variable index (positive integer) to select from Presample.Properties.VariableNames
- A logical vector, where PresampleInnovationVariable(j) = true selects variable j from Presample.Properties.VariableNames
The selected variable must be a numeric vector and cannot contain missing values (NaNs).
If you specify presample residual data by using the Presample name-value argument, you must specify PresampleInnovationVariable.
Example: PresampleInnovationVariable="GDPInnov"
Example: PresampleInnovationVariable=[false false true false] or PresampleInnovationVariable=3 selects the third table variable for presample residual data.
Data Types: double | logical | char | cell | string
PresampleVarianceVariable
— Conditional variance variable σₜ² to select from Presample
string scalar | character vector | integer | logical vector
Since R2023b
Conditional variance variable σₜ² to select from Presample containing presample conditional variance data, specified as one of the following data types:
- String scalar or character vector containing a variable name in Presample.Properties.VariableNames
- Variable index (positive integer) to select from Presample.Properties.VariableNames
- A logical vector, where PresampleVarianceVariable(j) = true selects variable j from Presample.Properties.VariableNames
The selected variable must be a numeric vector and cannot contain missing values (NaNs).
If you specify presample conditional variance data by using the Presample name-value argument, you must specify PresampleVarianceVariable.
Example: PresampleVarianceVariable="StockRateVar0"
Example: PresampleVarianceVariable=[false false true false] or PresampleVarianceVariable=3 selects the third table variable as the presample conditional variance variable.
Data Types: double | logical | char | cell | string
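The following sketch uses each of the three selector forms once, with a presample table that holds responses, residuals, and conditional variances for an AR(1)-GARCH(1,1) model. It assumes R2023b or later. The variable names "Y", "E", and "V" and the parameter values are hypothetical. The simulated innovations stand in for inferred residuals.

```matlab
rng(3)                                   % for reproducibility
TrueMdl = arima('Constant',0.5,'AR',{0.6});
TrueMdl.Variance = garch('Constant',0.1,'GARCH',{0.7},'ARCH',{0.2});
[y,e,v] = simulate(TrueMdl,505);

Tbl = table(y,e,v,'VariableNames',{'Y','E','V'});
PreTbl = Tbl(1:5,:);                     % presample responses, residuals, variances
EstTbl = Tbl(6:end,'Y');                 % estimation sample: responses only

Mdl = arima(1,0,0);
Mdl.Variance = garch(1,1);

EstMdl = estimate(Mdl,EstTbl,ResponseVariable="Y",Presample=PreTbl, ...
    PresampleResponseVariable="Y", ...                 select by name
    PresampleInnovationVariable=[false true false], ... select by logical vector
    PresampleVarianceVariable=3,Display="off");        % select by index
```

After a continuation ellipsis (...), MATLAB ignores the rest of the line, so the trailing notes on those lines act as comments.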
Constant0
— Initial estimate of model constant
numeric scalar
Initial estimate of the model constant c, specified as a numeric scalar.
By default, estimate derives initial estimates using standard time series techniques.
Data Types: double
AR0
— Initial estimates of nonseasonal AR polynomial coefficients
numeric vector
Initial estimates of the nonseasonal AR polynomial coefficients, specified as a numeric vector.
Elements of AR0 correspond to nonzero cells of Mdl.AR.
By default, estimate derives initial estimates using standard time series techniques.
Data Types: double
SAR0
— Initial estimates of seasonal autoregressive polynomial coefficients
numeric vector
Initial estimates of the seasonal autoregressive polynomial coefficients, specified as a numeric vector.
Elements of SAR0 correspond to nonzero cells of Mdl.SAR.
By default, estimate derives initial estimates using standard time series techniques.
Data Types: double
MA0
— Initial estimates of nonseasonal moving average polynomial coefficients
numeric vector
Initial estimates of the nonseasonal moving average polynomial coefficients, specified as a numeric vector.
Elements of MA0 correspond to elements of Mdl.MA.
By default, estimate derives initial estimates using standard time series techniques.
Data Types: double
SMA0
— Initial estimates of seasonal moving average polynomial coefficients
numeric vector
Initial estimates of the seasonal moving average polynomial coefficients, specified as a numeric vector.
Elements of SMA0 correspond to nonzero cells of Mdl.SMA.
By default, estimate derives initial estimates using standard time series techniques.
Data Types: double
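The following sketch supplies initial values for an ARMA(1,1) template instead of relying on the defaults. The true parameters and starting values are hypothetical. The fourth output, info, exposes the initial parameter vector that the optimizer receives.

```matlab
rng(4)                                   % for reproducibility
TrueMdl = arima('Constant',2,'AR',{0.5},'MA',{-0.3},'Variance',0.5);
y = simulate(TrueMdl,300);

Mdl = arima(1,0,1);
% One initial value per estimated parameter: AR0 and MA0 map to the nonzero lags
[EstMdl,~,~,info] = estimate(Mdl,y,Constant0=1,AR0=0.4,MA0=-0.2, ...
    Variance0=1,Display="off");

info.X0                                  % initial parameter vector passed to the optimizer
```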
Beta0
— Initial estimates of regression coefficients
numeric vector
Initial estimates of the regression coefficients β, specified as a numeric vector.
The length of Beta0 must equal numpreds, the number of predictor variables. Elements of Beta0 correspond to the predictor variables represented by the columns of X or PredictorVariables.
By default, estimate derives initial estimates using standard time series techniques.
Data Types: double
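The following sketch fits an ARIMAX model with two predictors and passes one initial value per predictor column. The predictor data, coefficients, and starting values are hypothetical. A one-observation response presample is supplied explicitly so that the example does not depend on default presample handling for the regression model.

```matlab
rng(5)                                   % for reproducibility
T = 200;
X = randn(T,2);                          % two hypothetical exogenous predictors
TrueMdl = arima('Constant',1,'AR',{0.5},'Beta',[2 -1],'Variance',1);
y = simulate(TrueMdl,T,'X',X);

y0   = y(1);                             % presample response for the AR(1) term
yEst = y(2:end);
XEst = X(2:end,:);                       % predictors aligned with yEst

Mdl = arima(1,0,0);
% numel(Beta0) must equal size(XEst,2)
EstMdl = estimate(Mdl,yEst,Y0=y0,X=XEst,Beta0=[1.5 -0.5],Display="off");
```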
DoF0
— Initial estimate of t-distribution degrees-of-freedom parameter
10
(default) | positive scalar
Initial estimate of the t-distribution degrees-of-freedom parameter ν, specified as a positive scalar. DoF0 must exceed 2.
Data Types: double
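DoF0 applies only when the template uses t-distributed innovations. The following sketch sets that distribution on the template and starts the degrees-of-freedom search at 8. The true degrees of freedom (5) and other parameter values are hypothetical.

```matlab
rng(6)                                   % for reproducibility
TrueMdl = arima('Constant',0,'AR',{0.4},'Variance',1, ...
    'Distribution',struct('Name','t','DoF',5));
y = simulate(TrueMdl,400);

Mdl = arima(1,0,0);
Mdl.Distribution = 't';                  % degrees of freedom left unknown (NaN)
EstMdl = estimate(Mdl,y,DoF0=8,Display="off");   % DoF0 must exceed 2

EstMdl.Distribution                      % struct with fields Name and DoF
```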
Variance0
— Initial estimates of variances of innovations
positive scalar | cell vector of name-value arguments
Initial estimates of variances of innovations, specified as a positive scalar or a cell vector of name-value arguments.
Mdl.Variance Value | Description | 'Variance0' Value |
---|---|---|
Numeric scalar or NaN | Constant variance | Positive scalar |
garch, egarch, or gjr model object | Conditional variance model | Cell vector of name-value arguments for specifying initial estimates (see the estimate function of the conditional variance model objects). The cell vector must have the form {'Name1',value1,'Name2',value2,...}. |
By default, estimate derives initial estimates using standard time series techniques.
Example: For a model with a constant variance, set Variance0=2 to specify an initial variance estimate of 2.
Example: For a composite conditional mean and variance model, set Variance0={'Constant0',2,'ARCH0',0.1} to specify an initial estimate of 2 for the conditional variance model constant, and an initial estimate of 0.1 for the lag 1 coefficient in the ARCH polynomial.
Data Types: double | cell
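In the following sketch, the cell-vector form of Variance0 forwards initial values to the GARCH(1,1) component of a composite model. The names Constant0, GARCH0, and ARCH0 are initial-value arguments of the garch estimate function. The numeric starting values and the simulated process are hypothetical.

```matlab
rng(7)                                   % for reproducibility
TrueMdl = arima('Constant',0.5,'AR',{0.6});
TrueMdl.Variance = garch('Constant',0.1,'GARCH',{0.7},'ARCH',{0.2});
y = simulate(TrueMdl,500);

Mdl = arima(1,0,0);
Mdl.Variance = garch(1,1);

% Initial values for the conditional variance model, passed as name-value pairs
EstMdl = estimate(Mdl,y, ...
    Variance0={'Constant0',0.2,'GARCH0',0.5,'ARCH0',0.1},Display="off");
```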
Note
- NaN values in y, X, Y0, E0, and V0 indicate missing values. estimate removes missing values from specified data by listwise deletion.
  - For the presample, estimate horizontally concatenates Y0, E0, and V0, and then it removes any row of the concatenated matrix containing at least one NaN.
  - For the estimation sample, estimate horizontally concatenates y and X, and then it removes any row of the concatenated matrix containing at least one NaN.
  - Regardless of sample, estimate synchronizes the specified, possibly jagged vectors with respect to the latest observation of the sample (last row).
  This type of data reduction reduces the effective sample size and can create an irregular time series.
- estimate issues an error when any table or timetable input contains missing values.
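The following sketch reproduces the listwise-deletion rule by hand for an equal-length estimation sample. The data are hypothetical, and the sketch illustrates the rule stated above, not the internal implementation of estimate.

```matlab
% Hypothetical estimation sample: y is missing row 2, X is missing row 3
y = [1.2; NaN; 0.8; 1.1; 0.9; 1.0];
X = [0.5; 0.3; NaN; 0.2; 0.4; 0.1];

Data = [y X];                            % horizontal concatenation
keep = ~any(isnan(Data),2);              % true for rows with no NaN
Data = Data(keep,:);                     % listwise deletion: rows 2 and 3 removed

find(keep)'                              % surviving row indices
```

The surviving rows are 1, 4, 5, and 6. The effective sample shrinks from six to four observations, and the gap between rows 1 and 4 makes the retained series irregular in time.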
Output Arguments
EstParamCov
— Estimated covariance matrix of maximum likelihood estimates
positive semidefinite numeric matrix
Estimated covariance matrix of maximum likelihood estimates known to the optimizer, returned as a positive semidefinite numeric matrix.
The rows and columns contain the covariances of the parameter estimates. The standard error of each parameter estimate is the square root of the corresponding main diagonal entry.
The rows and columns corresponding to any parameters held fixed as equality constraints are zero vectors.
Parameters corresponding to the rows and columns of EstParamCov appear in the following order:
- Constant
- Nonzero AR coefficients at positive lags, from the smallest to largest lag
- Nonzero SAR coefficients at positive lags, from the smallest to largest lag
- Nonzero MA coefficients at positive lags, from the smallest to largest lag
- Nonzero SMA coefficients at positive lags, from the smallest to largest lag
- Regression coefficients (when you specify exogenous data), ordered by the columns of X or entries of PredictorVariables
- Variance parameters, a scalar for constant variance models and a vector for conditional variance models (see the estimate function of the conditional variance model object for the order of parameters)
- Degrees of freedom (t-innovation distribution only)
Data Types: double
logL
— Optimized loglikelihood objective function value
numeric scalar
Optimized loglikelihood objective function value, returned as a numeric scalar.
Data Types: double
info
— Optimization summary
structure array
Optimization summary, returned as a structure array with the fields described in this table.
Field | Description |
---|---|
exitflag | Optimization exit flag (see fmincon in Optimization Toolbox) |
options | Optimization options controller (see optimoptions and fmincon in Optimization Toolbox) |
X | Vector of final parameter estimates |
X0 | Vector of initial parameter estimates |
For example, you can display the vector of final estimates by entering info.X in the Command Window.
Data Types: struct
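The following sketch requests all four outputs from an ARMA(1,1) fit in which the constant is held fixed at 0 as an equality constraint. It shows how to compute standard errors from EstParamCov and confirms that the fixed parameter's row is zero. The simulated process is hypothetical.

```matlab
rng(8)                                   % for reproducibility
TrueMdl = arima('Constant',0,'AR',{0.5},'MA',{0.3},'Variance',1);
y = simulate(TrueMdl,400);

Mdl = arima(1,0,1);
Mdl.Constant = 0;                        % equality constraint: not estimated
[EstMdl,EstParamCov,logL,info] = estimate(Mdl,y,Display="off");

% Parameter order: Constant, AR{1}, MA{1}, Variance
se = sqrt(diag(EstParamCov));            % standard errors (0 for the fixed constant)
info.exitflag                            % fmincon exit flag
info.X                                   % final parameter estimates
```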
Tips
Algorithms
- estimate infers innovations and conditional variances (when present) of the underlying response series, and then uses constrained maximum likelihood to fit the model Mdl to the response data y.
- Because you can specify numeric presample data inputs Y0, E0, and V0 of differing lengths, estimate assumes that all specified sets have these characteristics:
  - The final observation (row) in each set occurs simultaneously.
  - The first observation in the estimation sample immediately follows the last observation in the presample, with respect to the sampling frequency.
- If you specify the Display name-value argument, the value overrides the Diagnostics and Display settings of the Options name-value argument. Otherwise, estimate displays optimization information using Options settings.
- estimate uses the outer product of gradients (OPG) method to perform covariance matrix estimation.
- If you supply data in the table or timetable Tbl1 to estimate an ARIMAX model, estimate cannot backcast for presample responses. Therefore, if you specify PredictorVariables, you must also specify presample response data by using the Presample and PresampleResponseVariable name-value arguments.
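The following sketch shows the override rule for display settings. The optimizer options request iterative output, but Display="off" takes precedence, so the fit runs silently. It assumes Optimization Toolbox is available for optimoptions and fmincon. The iteration cap and simulated data are hypothetical.

```matlab
rng(9)                                   % for reproducibility
TrueMdl = arima('Constant',1,'AR',{0.5},'Variance',1);
y = simulate(TrueMdl,250);

% Optimizer settings that would normally print every iteration
opts = optimoptions(@fmincon,'Display','iter','MaxIterations',500);

Mdl = arima(1,0,0);
% Display overrides the Display and Diagnostics settings in opts
EstMdl = estimate(Mdl,y,Options=opts,Display="off");
```

To see the iterative output from opts instead, omit the Display argument.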
References
[1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
[2] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995.
[3] Greene, William. H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Prentice Hall, 2008.
[4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
Version History
Introduced in R2012a
R2023b: estimate accepts input data in tables and timetables
In addition to accepting input data (in-sample and presample data) in numeric arrays, estimate accepts input data in tables or regular timetables. When you supply data in a table or timetable, estimate chooses the default series on which to operate, but you can use the corresponding optional name-value argument to select a different series.
Name-value arguments to support tabular workflows include:
- ResponseVariable specifies the variable name of the response series in the input data Tbl1, to which the model is fit.
- Presample specifies the input table or timetable of presample response, residual, and conditional variance data.
- PresampleResponseVariable specifies the variable name of the response series to select from Presample.
- PresampleInnovationVariable specifies the variable name of the residual series to select from Presample.
- PresampleVarianceVariable specifies the variable name of the conditional variance series to select from Presample.
- PredictorVariables specifies the names of the predictor series to select from the input data for the exogenous regression component.
R2019b: estimate includes the final lag in all estimated univariate time series model polynomials
estimate includes the final polynomial lag as specified in the input model template for estimation. In other words, the polynomial degrees of an input model template returned by an object creation function equal the corresponding polynomial degrees of the estimated model returned by estimate.
Before R2019b, estimate removed trailing lags estimated below the tolerance of 1e-12.
Polynomial degrees determine the minimum number of presample observations required for operations downstream of estimation, such as model forecasting and simulation. If a model template in your code does not describe the data generating process well, then the polynomials in the estimated model can have higher degrees than in previous releases. Consequently, you must supply additional presample responses for operations on the estimated model; otherwise, the function issues an error. For more details, see the Y0 name-value argument.
R2018a: The Print name-value argument is removed
Replace all instances of 'Print',true with 'Display','on', and 'Print',false with 'Display','off'.
See Also
Topics
- Time Base Partitions for ARIMA Model Estimation
- Estimate Multiplicative ARIMA Model
- Estimate Conditional Mean and Variance Model
- Model Seasonal Lag Effects Using Indicator Variables
- Maximum Likelihood Estimation for Conditional Mean Models
- Conditional Mean Model Estimation with Equality Constraints
- Presample Data for Conditional Mean Model Estimation
- Initial Values for Conditional Mean Model Estimation
- Optimization Settings for Conditional Mean Model Estimation