filter
Forward recursion of statespace models
Description
filter
computes statedistribution moments for each
period of the specified response data by recursively applying the Kalman
filter.
To compute updated statedistribution moments efficiently during only the final period
of the specified response data by applying one recursion of the Kalman filter, use
update
instead.
returns filtered states (X
= filter(Mdl
,Y
)X
)
from performing forward recursion of the fully specified statespace model Mdl
.
That is, filter
applies the standard Kalman filter using Mdl
and
the observed responses Y
.
uses additional options specified by one or more X
= filter(Mdl
,Y
,Name,Value
)Name,Value
arguments. For example, specify the regression coefficients and predictor data to
deflate the observations, or specify to use the squareroot filter.
If Mdl
is not fully specified, then you must specify the unknown parameters
as known scalars using the
'
Params
'
Name,Value
argument.
[
uses any of the input arguments
in the previous syntaxes to additionally return the loglikelihood
value (X
,logL
,Output
]
= filter(___)logL
) and an output structure array (Output
)
using any of the input arguments in the previous syntaxes. Output
contains:
Filtered and forecasted states
Estimated covariance matrices of the filtered and forecasted states
Loglikelihood value
Forecasted observations and its estimated covariance matrix
Adjusted Kalman gain
Vector indicating which data the software used to filter
Examples
Filter States of TimeInvariant StateSpace Model
Suppose that a latent process is an AR(1). The state equation is
$${x}_{t}=0.5{x}_{t1}+{u}_{t},$$
where $${u}_{t}$$ is Gaussian with mean 0 and standard deviation 1.
Generate a random series of 100 observations from $${x}_{t}$$, assuming that the series starts at 1.5.
T = 100; ARMdl = arima('AR',0.5,'Constant',0,'Variance',1); x0 = 1.5; rng(1); % For reproducibility x = simulate(ARMdl,T,'Y0',x0);
Suppose further that the latent process is subject to additive measurement error. The observation equation is
$${y}_{t}={x}_{t}+{\epsilon}_{t},$$
where $${\epsilon}_{t}$$ is Gaussian with mean 0 and standard deviation 0.75. Together, the latent process and observation equations compose a statespace model.
Use the random latent state process (x
) and the observation equation to generate observations.
y = x + 0.75*randn(T,1);
Specify the four coefficient matrices.
A = 0.5; B = 1; C = 1; D = 0.75;
Specify the statespace model using the coefficient matrices.
Mdl = ssm(A,B,C,D)
Mdl = Statespace model type: ssm State vector length: 1 Observation vector length: 1 State disturbance vector length: 1 Observation innovation vector length: 1 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State equation: x1(t) = (0.50)x1(t1) + u1(t) Observation equation: y1(t) = x1(t) + (0.75)e1(t) Initial state distribution: Initial state means x1 0 Initial state covariance matrix x1 x1 1.33 State types x1 Stationary
Mdl
is an ssm
model. Verify that the model is correctly specified using the display in the Command Window. The software infers that the state process is stationary. Subsequently, the software sets the initial state mean and covariance to the mean and variance of the stationary distribution of an AR(1) model.
Filter states for periods 1 through 100. Plot the true state values and the filtered state estimates.
filteredX = filter(Mdl,y); figure plot(1:T,x,'k',1:T,filteredX,':r','LineWidth',2) title({'State Values'}) xlabel('Period') ylabel('State') legend({'True state values','Filtered state values'})
The true values and filter estimates are approximately the same.
Filter States of StateSpace Model Containing Regression Component
Suppose that the linear relationship between the change in the unemployment rate and the nominal gross national product (nGNP) growth rate is of interest. Suppose further that the first difference of the unemployment rate is an ARMA(1,1) series. Symbolically, and in statespace form, the model is
$$\begin{array}{l}\left[\begin{array}{c}{x}_{1,t}\\ {x}_{2,t}\end{array}\right]=\left[\begin{array}{cc}\varphi & \theta \\ 0& 0\end{array}\right]\left[\begin{array}{c}{x}_{1,t1}\\ {x}_{2,t1}\end{array}\right]+\left[\begin{array}{c}1\\ 1\end{array}\right]{u}_{1,t}\\ {y}_{t}\beta {Z}_{t}={x}_{1,t}+\sigma {\epsilon}_{t},\end{array}$$
where:
$${x}_{1,t}$$ is the change in the unemployment rate at time t.
$${x}_{2,t}$$ is a dummy state for the MA(1) effect.
$${y}_{1,t}$$ is the observed change in the unemployment rate being deflated by the growth rate of nGNP ($${Z}_{t}$$).
$${u}_{1,t}$$ is the Gaussian series of state disturbances having mean 0 and standard deviation 1.
$${\epsilon}_{t}$$ is the Gaussian series of observation innovations having mean 0 and standard deviation $$\sigma $$.
Load the NelsonPlosser data set, which contains the unemployment rate and nGNP series, among other things.
load Data_NelsonPlosser
Preprocess the data by taking the natural logarithm of the nGNP series, and the first difference of each series. Also, remove the starting NaN
values from each series.
isNaN = any(ismissing(DataTable),2); % Flag periods containing NaNs gnpn = DataTable.GNPN(~isNaN); u = DataTable.UR(~isNaN); T = size(gnpn,1); % Sample size Z = [ones(T1,1) diff(log(gnpn))]; y = diff(u);
Though this example removes missing values, the software can accommodate series containing missing values in the Kalman filter framework.
Specify the coefficient matrices.
A = [NaN NaN; 0 0]; B = [1; 1]; C = [1 0]; D = NaN;
Specify the statespace model using ssm
.
Mdl = ssm(A,B,C,D);
Estimate the model parameters, and use a random set of initial parameter values for optimization. Specify the regression component and its initial value for optimization using the 'Predictors'
and 'Beta0'
namevalue pair arguments, respectively. Restrict the estimate of $$\sigma $$ to all positive, real numbers.
params0 = [0.3 0.2 0.2]; [EstMdl,estParams] = estimate(Mdl,y,params0,'Predictors',Z,... 'Beta0',[0.1 0.2],'lb',[Inf,Inf,0,Inf,Inf]);
Method: Maximum likelihood (fmincon) Sample size: 61 Logarithmic likelihood: 99.7245 Akaike info criterion: 209.449 Bayesian info criterion: 220.003  Coeff Std Err t Stat Prob  c(1)  0.34098 0.29608 1.15164 0.24948 c(2)  1.05003 0.41377 2.53771 0.01116 c(3)  0.48592 0.36790 1.32079 0.18657 y < z(1)  1.36121 0.22338 6.09358 0 y < z(2)  24.46711 1.60018 15.29024 0   Final State Std Dev t Stat Prob x(1)  1.01264 0.44690 2.26592 0.02346 x(2)  0.77718 0.58917 1.31912 0.18713
EstMdl
is an ssm
model, and you can access its properties using dot notation.
Filter the estimated statespace model. EstMdl
does not store the data or the regression coefficients, so you must pass in them in using the namevalue pair arguments 'Predictors'
and 'Beta'
, respectively. Plot the estimated, filtered states. Recall that the first state is the change in the unemployment rate, and the second state helps build the first.
filteredX = filter(EstMdl,y,'Predictors',Z,'Beta',estParams(end1:end)); figure plot(dates(end(T1)+1:end),filteredX(:,1)); xlabel('Period') ylabel('Change in the unemployment rate') title('Filtered Change in the Unemployment Rate')
Filter Series in Real Time
Consider nowcasting the model in Filter States of TimeInvariant StateSpace Model.
Generate a random series of 100 observations from $${x}_{t}$$.
T = 100;
A = 0.5;
B = 1;
C = 1;
D = 0.75;
Mdl = ssm(A,B,C,D);
rng(1); % For reproducibility
y = simulate(Mdl,T);
Suppose the final 10 observations are in the forecast horizon.
fh = 10; yf = y((endfh+1):end); % Holdout sample responses y = y(1:endfh); % Insample responses
Filter the observations through the model to obtain filtered states for each period.
[xhat,logL,output] = filter(Mdl,y); xhatvar = output.FilteredStatesCov;
xhat
and xhatvar
are 90by1 vectors of insample filtered states and corresponding variances, respectively. xhat(
t
)
is the estimate of $\mathit{E}\left({\mathit{x}}_{\mathit{t}}{\mathit{y}}_{1},...,{\mathit{y}}_{\mathit{t}}\right)$, and xhatvar
is the estimate of $\mathrm{Var}\left({\mathit{x}}_{\mathit{t}}{\mathit{y}}_{1},...,{\mathit{y}}_{\mathit{t}}\right)$.
Call filter
again, but specify the realtime update option.
[xhatRT,logLRT,outputRT] = filter(Mdl,y,'RealTimeUpdate',true);
xhatRTvar = outputRT.FilteredStatesCov;
xhatRT
and xhatRTvar
are scalars representing the estimate of $\mathit{E}\left({\mathit{x}}_{90}{\mathit{y}}_{1},...,{\mathit{y}}_{90}\right)$ and its corresponding variance, respectively.
Compare the filtered states and variances of period 90.
tol = 1e10; areMeansEqual = (xhat(end)  xhatRT) < tol
areMeansEqual = logical
1
areVarsEqual = (xhatvar(end)  xhatRTvar) < tol
areVarsEqual = logical
0
areLogLsEqual = (logL  logLRT) < tol;
In the last period, the filtered states, their variances, and the loglikelihoods are equal.
Nowcast the model into the forecast horizon by performing this procedure for each successive period:
Set the initial state and its variance to their current filter estimates. This action changes the statespace model.
As an observation becomes available, filter it through the model in real time.
% Initialize state and variance state0 = xhatRT; var0 = xhatRTvar; % Preallocate xhatRTF = zeros(fh,1); xhatRTvarF = zeros(fh,1); for j = 1:fh Mdl.Mean0 = state0; Mdl.Cov0 = var0; [xhatRTF(j),~,outputRT] = filter(Mdl,yf(j),'RealTimeUpdate',true); % Alternatively, use update xhatRTvarF(j) = outputRT.FilteredStatesCov; state0 = xhatRTF(j); var0 = xhatRTvarF(j); end
Plot the data and nowcasts.
figure plot((Tfh20):T,[y(end20:end); yf],'b',(Tfh+1):T,xhatRTF,'r*') legend(["Data" "Nowcasts"],'Location',"best")
Input Arguments
Mdl
— Standard statespace model
ssm
model object
Standard statespace model, specified as an ssm
model
object returned by ssm
or estimate
.
If Mdl
is not fully specified (that is, Mdl
contains
unknown parameters), then specify values for the unknown parameters using the
'
Params
'
namevalue
argument. Otherwise, the software issues an error. estimate
returns
fullyspecified statespace models.
Mdl
does not store observed responses or predictor data. Supply the data
wherever necessary using the appropriate input or namevalue arguments.
Y
— Observed response data
numeric matrix  cell vector of numeric vectors
Observed response data, specified as a numeric matrix or a cell vector of numeric vectors.
If
Mdl
is time invariant with respect to the observation equation, thenY
is a Tbyn matrix, where each row corresponds to a period and each column corresponds to a particular observation in the model. T is the sample size and m is the number of observations per period. The last row ofY
contains the latest observations.If
Mdl
is time varying with respect to the observation equation, thenY
is a Tby1 cell vector. Each element of the cell vector corresponds to a period and contains an n_{t}dimensional vector of observations for that period. The corresponding dimensions of the coefficient matrices inMdl.C{t}
andMdl.D{t}
must be consistent with the matrix inY{t}
for all periods. The last cell ofY
contains the latest observations.
NaN
elements indicate missing observations. For details on how the
Kalman filter accommodates missing observations, see Algorithms.
NameValue Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN
, where Name
is
the argument name and Value
is the corresponding value.
Namevalue arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name
in quotes.
Example: 'Beta',beta,'Predictors',Z
specifies
to deflate the observations by the regression component composed of
the predictor data Z
and the coefficient matrix beta
.
Beta
— Regression coefficients
[]
(default)  numeric matrix
Regression coefficients corresponding to predictor variables,
specified as the commaseparated pair consisting of 'Beta'
and
a dbyn numeric matrix. d is
the number of predictor variables (see Predictors
)
and n is the number of observed response series
(see Y
).
If Mdl
is an estimated statespace model,
then specify the estimated regression coefficients stored in estParams
.
Params
— Values for unknown parameters
numeric vector
Values for unknown parameters in the statespace model, specified as the commaseparated pair consisting of 'Params'
and a numeric vector.
The elements of Params
correspond to the unknown parameters in the statespace model matrices A
, B
, C
, and D
, and, optionally, the initial state mean Mean0
and covariance matrix Cov0
.
If you created
Mdl
explicitly (that is, by specifying the matrices without a parametertomatrix mapping function), then the software maps the elements ofParams
toNaN
s in the statespace model matrices and initial state values. The software searches forNaN
s columnwise following the orderA
,B
,C
,D
,Mean0
, andCov0
.If you created
Mdl
implicitly (that is, by specifying the matrices with a parametertomatrix mapping function), then you must set initial parameter values for the statespace model matrices, initial state values, and state types within the parametertomatrix mapping function.
If Mdl
contains unknown parameters, then you must specify their values. Otherwise, the software ignores the value of Params
.
Data Types: double
Predictors
— Predictor variables in statespace model observation equation
[]
(default)  numeric matrix
Predictor variables in the statespace model observation equation,
specified as the commaseparated pair consisting of 'Predictors'
and
a Tbyd numeric matrix. T is
the number of periods and d is the number of predictor
variables. Row t corresponds to the observed predictors
at period t (Z_{t}).
The expanded observation equation is
$${y}_{t}{Z}_{t}\beta =C{x}_{t}+D{u}_{t}.$$
That is, the software deflates the observations using the regression component. β is the timeinvariant vector of regression coefficients that the software estimates with all other parameters.
If there are n observations per period, then the software regresses all predictor series onto each observation.
If you specify Predictors
, then Mdl
must
be time invariant. Otherwise, the software returns an error.
By default, the software excludes a regression component from the statespace model.
Data Types: double
SquareRoot
— Square root filter method flag
false
(default)  true
Square root filter method flag, specified as the commaseparated pair consisting of
'SquareRoot'
and true
or
false
. If true
, then
filter
applies the square root filter method when
implementing the Kalman filter.
If you suspect that the eigenvalues of the filtered state or
forecasted observation covariance matrices are close to zero, then
specify 'SquareRoot',true
. The square root filter
is robust to numerical issues arising from finite the precision of
calculations, but requires more computational resources.
Example: 'SquareRoot',true
Data Types: logical
Tolerance
— Forecast uncertainty threshold
0
(default)  nonnegative scalar
Forecast uncertainty threshold, specified as the commaseparated
pair consisting of 'Tolerance'
and a nonnegative
scalar.
If the forecast uncertainty for a particular observation is
less than Tolerance
during numerical estimation,
then the software removes the uncertainty corresponding to the observation
from the forecast covariance matrix before its inversion.
It is best practice to set Tolerance
to a
small number, for example, le15
, to overcome numerical
obstacles during estimation.
Example: 'Tolerance',le15
Data Types: double
Univariate
— Univariate treatment of multivariate series flag
false
(default)  true
Univariate treatment of a multivariate series flag, specified
as the commaseparated pair consisting of 'Univariate'
and true
or false
.
Univariate treatment of a multivariate series is also known as sequential
filtering.
The univariate treatment can accelerate and improve numerical stability of the Kalman filter. However, all observation innovations must be uncorrelated. That is, D_{t}D_{t}' must be diagonal, where D_{t}, t = 1,...,T, is one of the following:
The matrix
D{t}
in a timevarying statespace modelThe matrix
D
in a timeinvariant statespace model
Example: 'Univariate',true
Data Types: logical
RealTimeUpdate
— Flag indicating whether to apply realtime filter
false
(default)  true
Flag indicating whether to apply the realtime filter, specified as a value in the table.
Value  Description 

true 

false 

Example: 'RealTimeUpdate',true
Data Types: logical
Output Arguments
X
— Filtered states
numeric matrix  cell vector of numeric vectors
Filtered states, returned as a numeric matrix or a cell vector of numeric vectors.
If Mdl
is time invariant, then the number
of rows of X
is the sample size, T,
and the number of columns of X
is the number of
states, m. The last row of X
contains
the latest filtered states.
If Mdl
is time varying, then X
is
a cell vector with length equal to the sample size. Cell t of X
contains
a vector of filtered states with length equal to the number of states
in period t. The last cell of X
contains
the latest filtered states.
If you set the RealTimeUpdate
namevalue argument to
true
, filter
returns only
the filtered state for time T, a
1bym vector. For more details, see RealTimeUpdate
.
logL
— Loglikelihood function value
scalar
Loglikelihood function value, returned as a scalar.
Missing observations do not contribute to the loglikelihood.
Output
— Filtering results by period
structure array
Filtering results by period, returned as a structure array.
Output
is a Tby1 structure,
where element t corresponds to the filtering result
at time t.
If
Univariate
isfalse
(it is by default), then the following table outlines the fields ofOutput
.Field Description Estimate of LogLikelihood
Scalar loglikelihood objective function value N/A FilteredStates
m_{t}by1 vector of filtered states $$E\left({x}_{t}{y}_{1},\mathrm{...},{y}_{t}\right)$$ FilteredStatesCov
m_{t}bym_{t} variancecovariance matrix of filtered states $$Var\left({x}_{t}{y}_{1},\mathrm{...},{y}_{t}\right)$$ ForecastedStates
m_{t}by1 vector of state forecasts $$E\left({x}_{t}{y}_{1},\mathrm{...},{y}_{t1}\right)$$ ForecastedStatesCov
m_{t}bym_{t} variancecovariance matrix of state forecasts $$Var\left({x}_{t}{y}_{1},\mathrm{...},{y}_{t1}\right)$$ ForecastedObs
h_{t}by1 forecasted observation vector $$E\left({y}_{t}{y}_{1},\mathrm{...},{y}_{t1}\right)$$ ForecastedObsCov
h_{t}byh_{t} variancecovariance matrix of forecasted observations $$Var\left({y}_{t}{y}_{1},\mathrm{...},{t}_{t1}\right)$$ KalmanGain
m_{t}byn_{t} adjusted Kalman gain matrix N/A DataUsed
h_{t}by1 logical vector indicating whether the software filters using a particular observation. For example, if observation i at time t is a NaN
, then element i inDataUsed
at time t is0
.N/A If
Univarite
istrue
, then the fields ofOutput
are the same as in the previous table, except for the following amendments.Field Changes ForecastedObs
Same dimensions as if Univariate = 0
, but only the first elements are equalForecastedObsCov
nby1 vector of forecasted observation variances.
The first element of this vector is equivalent to
ForecastedObsCov(1,1)
whenUnivariate
isfalse
. The rest of the elements are not necessarily equivalent to their corresponding values inForecastObsCov
whenUnivariate
.KalmanGain
Same dimensions as if Univariate
isfalse
, thoughKalmanGain
might have different entries.
If you set the RealTimeUpdate
namevalue argument to
true
, filter
returns only
the filtered states for time T, its covariance matrix,
and the loglikelihood (in other words, the sum of the loglikelihoods
returned by update
). For more details, see RealTimeUpdate
.
Tips
Mdl
does not store the response data, predictor data, and the regression coefficients. Supply the data wherever necessary using the appropriate input or namevalue arguments.To accelerate estimation for lowdimensional, timeinvariant models, set
'Univariate',true
. Using this specification, the software sequentially updates rather then updating all at once during the filtering process.
Algorithms
The Kalman filter accommodates missing data by not updating filtered state estimates corresponding to missing observations. In other words, suppose there is a missing observation at period t. Then, the state forecast for period t based on the previous t – 1 observations and filtered state for period t are equivalent.
For explicitly defined statespace models,
filter
applies all predictors to each response series. However, each response series has its own set of regression coefficients.
Alternative Functionality
To filter a standard statespace model in real time by performing one forward
recursion of the Kalman filter, call the update
function instead. Unlike filter
, update
performs minimal input validation for computational efficiency.
References
[1] Durbin, J, and Siem Jan Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012.
Version History
Introduced in R2014a
Open Example
You have a modified version of this example. Do you want to open this example with your edits?
MATLAB Command
You clicked a link that corresponds to this MATLAB command:
Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list:
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
 América Latina (Español)
 Canada (English)
 United States (English)
Europe
 Belgium (English)
 Denmark (English)
 Deutschland (Deutsch)
 España (Español)
 Finland (English)
 France (Français)
 Ireland (English)
 Italia (Italiano)
 Luxembourg (English)
 Netherlands (English)
 Norway (English)
 Österreich (Deutsch)
 Portugal (English)
 Sweden (English)
 Switzerland
 United Kingdom (English)