Lecture 14: The ARMA(p,q) process
STAT352: Applied Time Series
Tilman M. Davies
Dept. of Mathematics & Statistics, University of Otago

This Lecture
Bringing the autoregressive and moving average models together; ARMA fitting.

AR + MA = ARMA
We have covered the AR and MA models extensively. However, up until now we have treated them as completely separate entities. The ARMA(p,q) process incorporates both under a single framework.

The ARMA(p,q) process
X is an autoregressive moving average process of order (p,q) if
Xt = µ + Wt + θ1 Wt−1 + θ2 Wt−2 + ... + θq Wt−q + φ1 (Xt−1 − µ) + φ2 (Xt−2 − µ) + ... + φp (Xt−p − µ),
where µ is the stationary mean of the series, W is white noise with variance σ², and θ = {θ1, ..., θq} and φ = {φ1, ..., φp} are constants.

The ARMA(p,q) process: important notes
It looks a little ugly in its general form, but the ARMA(p,q) process is just the sum of the AR(p) terms and the MA(q) terms around some stationary mean! OMG! WTF? LOL @ ARMA.
As you can see from the definition, the pure AR(p) and MA(q) models are just special cases of the ARMA(p,q) model:
The ARMA(p,0) model is exactly an AR(p) model.
The ARMA(0,q) model is exactly an MA(q) model.
Because of this, we should try to fit an ARMA model to a stationary time series as a first step, since pure AR and MA models are automatically included as possibilities under this umbrella.

Aw c'mon Tilly. Why do we care?
By being able to incorporate behaviour apparent in both the pure AR and MA processes, the ARMA process allows far more flexibility in modelling an assumed stationary time series. It also typically requires us to estimate far fewer parameters than if we considered the AR and MA models separately. This is known as the Principle of Parsimony.
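The defining equation above translates directly into a simulation recursion: each new value is the mean plus the current noise, the last q noise terms weighted by θ, and the last p mean-centred values weighted by φ. The course's own functions are in R; the sketch below is a minimal Python illustration of that recursion, and all parameter values in it are purely illustrative.

```python
import random

def simulate_arma(n, mu, phi, theta, sigma=1.0, seed=1, burn=200):
    """Simulate n values from an ARMA(p,q) process:
    X_t = mu + W_t + sum_j theta[j]*W_(t-1-j) + sum_i phi[i]*(X_(t-1-i) - mu),
    with W_t ~ N(0, sigma^2).  Pre-sample values are taken as W = 0 and
    X = mu; a burn-in period discards the effect of that initialisation."""
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    x = [mu] * max(p, 1)      # pre-sample X values set to the mean
    w = [0.0] * max(q, 1)     # pre-sample white-noise values set to 0
    out = []
    for _ in range(n + burn):
        wt = rng.gauss(0.0, sigma)
        xt = mu + wt
        xt += sum(theta[j] * w[-1 - j] for j in range(q))       # MA part
        xt += sum(phi[i] * (x[-1 - i] - mu) for i in range(p))  # AR part
        w.append(wt)
        x.append(xt)
        out.append(xt)
    return out[burn:]

# An ARMA(1,1) with mu = 10, phi1 = 0.6, theta1 = 0.3 (illustrative values)
series = simulate_arma(5000, mu=10.0, phi=[0.6], theta=[0.3])
print(len(series), sum(series) / len(series))  # sample mean should sit near mu
```

Setting `phi=[]` or `theta=[]` recovers a pure MA(q) or AR(p) process, mirroring the special-case remark above.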
Principle of Parsimony and ARMA models
The Principle of Parsimony says we want a model with as few parameters as possible that still gives an adequate representation of the data at hand. The ARMA model satisfies this principle: a stationary time series may often be adequately modelled by an ARMA model involving fewer parameters than a pure MA or AR process by itself.

Properties of the ARMA(1,1) model
The model is
Xt = µ + Wt + θ1 Wt−1 + φ1 (Xt−1 − µ).
Using similar analytical methods as for the pure AR process, it can be shown that
E[Xt] = µ,
Var[Xt] = σ² + σ² (θ1 + φ1)² (1 − φ1²)^(−1),
and, for h ≥ 1,
γX(h) = Cov[Xt, Xt+h] = (θ1 + φ1) φ1^(h−1) σ² + (θ1 + φ1)² σ² φ1^h (1 − φ1²)^(−1).
The methods used to derive these properties are the same as for the more general ARMA(p,q) model.

Conditions on the parameters of an ARMA(p,q) process
Remember, there are restrictions on the parameter values of an MA(q) model for it to satisfy the condition of invertibility, and further restrictions on the parameter values of an AR(p) model for it to be stationary and causal. Especially for the AR(p) process, these restrictions are difficult to clarify further without some rather involved mathematics. But just be aware that they do exist, i.e. we can't just use any random values as the parameters.
These restrictions for the pure AR and MA models flow directly through to the ARMA model, so that we end up with a stationary, causal and invertible process. The parameter estimation techniques we consider automatically produce values for θ and φ which result in a process that satisfies these conditions.

Parameter estimation for the ARMA(p,q) process
Estimation of θ and φ in an ARMA model given some observed time series X = {x1, ..., xN} follows the same ideas as parameter estimation for the MA(q) process. That is, we use a conditional-least-squares-followed-by-maximum-likelihood approach.
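The ARMA(1,1) moment formulas above are easy to sanity-check by simulation. The sketch below (Python rather than the course's R, with illustrative parameter values) generates a long ARMA(1,1) realisation and compares the sample variance and lag-1 autocovariance against the closed forms.

```python
import random

def arma11(n, mu, phi1, theta1, sigma=1.0, seed=42, burn=500):
    """Simulate X_t = mu + W_t + theta1*W_(t-1) + phi1*(X_(t-1) - mu)."""
    rng = random.Random(seed)
    x_prev, w_prev = mu, 0.0
    out = []
    for _ in range(n + burn):
        wt = rng.gauss(0.0, sigma)
        xt = mu + wt + theta1 * w_prev + phi1 * (x_prev - mu)
        x_prev, w_prev = xt, wt
        out.append(xt)
    return out[burn:]

mu, phi1, theta1, sigma = 0.0, 0.7, 0.3, 1.0
x = arma11(100_000, mu, phi1, theta1, sigma)
n = len(x)
xbar = sum(x) / n

# Theoretical moments from the slide
var_theory = sigma**2 + sigma**2 * (theta1 + phi1)**2 / (1 - phi1**2)
gamma1_theory = ((theta1 + phi1) * sigma**2
                 + (theta1 + phi1)**2 * sigma**2 * phi1 / (1 - phi1**2))

# Empirical counterparts
var_hat = sum((v - xbar)**2 for v in x) / n
gamma1_hat = sum((x[t] - xbar) * (x[t + 1] - xbar) for t in range(n - 1)) / n

print(var_theory, var_hat)        # theory is about 2.961; sample close
print(gamma1_theory, gamma1_hat)  # theory is about 2.373; sample close
```

With these parameters the theoretical variance is σ²[1 + (θ1 + φ1)²/(1 − φ1²)] ≈ 2.961, and the Monte Carlo estimates should agree to within sampling error.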
Remember, the conditional least squares method finds the combination of parameter values that minimises the estimated residual sum of squares RSS = Σt ŵt².

Example 1: Conditional least squares for the ARMA(1,1) model
Suppose we have some observed data X = {x1, ..., xN} to which we wish to fit an ARMA(1,1) model. As on the previous slide, the model is
Xt = µ + Wt + θ1 Wt−1 + φ1 (Xt−1 − µ).
We then guess some values for µ, θ1 and φ1, set ŵ0 = 0 and x0 = µ, and then calculate the residuals recursively:
ŵ1 = x1 − µ
ŵ2 = x2 − µ − θ1 ŵ1 − φ1 (x1 − µ)
up to
ŵN = xN − µ − θ1 ŵN−1 − φ1 (xN−1 − µ).
Then RSS = Σt ŵt² may be calculated.

Example 1: Conditional least squares for the ARMA(1,1) model (cont.)
The procedure on the previous slide is repeated for different combinations of µ, θ1 and φ1. The conditional least squares estimates are given by the combination which results in the smallest value of RSS. In the computer functions, these estimates are then used as starting values for a numerical procedure to find the parameter values which maximise a specific likelihood function. The value of this log-transformed likelihood function, at the maximum likelihood estimates of the parameters, is provided in the R output as log likelihood after running arma.fit (we'll see this in a moment), ma.fit or ar.fit. Note that if we decided beforehand that µ = 0, this discussion still holds, except that we only seek estimates of θ1 and φ1.

Order selection for the ARMA(p,q) model
Recall that the AIC allowed us to automate the procedure for selecting an appropriate order for the pure AR(p) and MA(q) processes. Minimisation of this criterion provides a model in which we aim to balance goodness-of-fit against complexity, in terms of the number of parameters requiring estimation given some observed time series.
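The residual recursion described above is only a few lines of code. Here is a hedged sketch in Python (the actual course workflow uses the R function arma.fit) of the ARMA(1,1) residual recursion and a coarse grid search for the conditional least squares estimates; the simulated data and its parameter values are purely illustrative.

```python
import random

def css_rss(x, mu, theta1, phi1):
    """Conditional least squares RSS for ARMA(1,1): set w0 = 0 and x0 = mu,
    then w_t = x_t - mu - theta1*w_(t-1) - phi1*(x_(t-1) - mu)."""
    w_prev, x_prev, rss = 0.0, mu, 0.0
    for xt in x:
        wt = xt - mu - theta1 * w_prev - phi1 * (x_prev - mu)
        rss += wt * wt
        w_prev, x_prev = wt, xt
    return rss

# Simulate ARMA(1,1) data with known (illustrative) parameters
rng = random.Random(7)
mu_true, theta_true, phi_true = 0.0, 0.3, 0.7
x, x_prev, w_prev = [], 0.0, 0.0
for _ in range(1000):
    wt = rng.gauss(0.0, 1.0)
    xt = mu_true + wt + theta_true * w_prev + phi_true * (x_prev - mu_true)
    x.append(xt)
    x_prev, w_prev = xt, wt

# Coarse grid search over (theta1, phi1), taking mu = 0 as known
grid = [i / 10 for i in range(-9, 10)]
best = min(((css_rss(x, 0.0, th, ph), th, ph) for th in grid for ph in grid))
rss_best, theta_hat, phi_hat = best
print(theta_hat, phi_hat)  # should land near the true values 0.3 and 0.7
```

In practice the grid search is only used to find starting values; the software then refines them by maximising the likelihood, as the slide describes.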
So, given a time series, we can successively fit higher-order models and find the combination of p and q which gives the smallest AIC. Since the pure AR and MA processes are special cases of the ARMA model, this essentially renders the R functions ma.orderselect and ar.orderselect obsolete.

arma.orderselect
Order selection via AIC for a general ARMA(p,q) model, which of course includes the possibility of selecting a pure AR(p) or MA(q) process, is achieved in R with the function arma.orderselect. It has precisely the same three arguments and is used in exactly the same way as ma.orderselect and ar.orderselect. There is one small note to make: unlike ma.orderselect and ar.orderselect, the argument order.max has no default value and must be set by the user.

arma.orderselect (cont.)
Also, order.max in arma.orderselect refers to both p and q, i.e. the search for the best order according to AIC will only take place up to the ARMA(order.max,order.max) model. The function can be particularly computationally expensive; more so than ma.orderselect or ar.orderselect. You should start with a modest value, e.g. order.max=10 or lower.

arma.fit
Much like ma.fit and ar.fit, arma.fit uses conditional-least-squares-followed-by-maximum-likelihood to fit an ARMA model once your order has been selected. The function has four arguments:
1. X: your time series (a numeric vector)
2. p: your selected value of p for the AR component of the model
3. q: your selected value of q for the MA component of the model
4. include.mean: whether or not we wish to include µ in the estimation procedure
Using arma.fit with p=0 will produce identical results to using ma.fit. Using arma.fit with q=0 will produce identical results to using ar.fit.
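The idea behind AIC-based order selection can be sketched with the same conditional least squares machinery: for each candidate (p,q), minimise the RSS and score the fit with a Gaussian-approximation AIC, N log(RSS/N) + 2k, where k counts the estimated parameters. This is a simplification of what the course's R functions do (they use the maximised likelihood itself); the Python below is only an illustration comparing ARMA(0,0) against ARMA(1,1) on simulated ARMA(1,1) data, with all values illustrative.

```python
import math
import random

def css_rss(x, theta1, phi1, mu=0.0):
    """Conditional least squares RSS for a (possibly degenerate) ARMA(1,1)."""
    w_prev, x_prev, rss = 0.0, mu, 0.0
    for xt in x:
        wt = xt - mu - theta1 * w_prev - phi1 * (x_prev - mu)
        rss += wt * wt
        w_prev, x_prev = wt, xt
    return rss

def approx_aic(x, theta1, phi1, n_par):
    """Gaussian-approximation AIC: N*log(RSS/N) + 2*(number of parameters)."""
    n = len(x)
    return n * math.log(css_rss(x, theta1, phi1) / n) + 2 * n_par

# Simulate from an ARMA(1,1) with illustrative parameters (mu = 0)
rng = random.Random(11)
x, x_prev, w_prev = [], 0.0, 0.0
for _ in range(500):
    wt = rng.gauss(0.0, 1.0)
    xt = wt + 0.3 * w_prev + 0.7 * x_prev
    x.append(xt)
    x_prev, w_prev = xt, wt

grid = [i / 10 for i in range(-9, 10)]

# ARMA(0,0): no theta/phi terms; one parameter (the noise variance)
aic_00 = approx_aic(x, 0.0, 0.0, n_par=1)
# ARMA(1,1): grid-minimised RSS; three parameters (theta1, phi1, variance)
aic_11 = min(approx_aic(x, th, ph, n_par=3) for th in grid for ph in grid)

print(aic_00 > aic_11)  # the richer model wins despite its larger penalty
```

A full order-selection routine would repeat this for every (p,q) up to some maximum, which is exactly why arma.orderselect grows expensive as order.max increases.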
Example 2: Level of Lake Heron

[Figure: time series plot of the level of Lake Heron (ft), 1875-1972; the level fluctuates between roughly 576 and 582 ft.]

Example 2: Level of Lake Heron (cont.)
> arma.orderselect(heron,include.mean=TRUE,order.max=10)
  |===============================================| 100%
     p      q      aic
1.0000 1.0000 214.4905
>
> heron.arma <- arma.fit(heron,p=1,q=1,include.mean=TRUE)
> heron.arma

Call:
arima(x = ts(X, 1, length(X)), order = c(p, 0, q), include.mean = TRUE)

Coefficients:
         ar1     ma1  intercept
      0.7449  0.3206   579.0555
s.e.  0.0777  0.1135     0.3501

sigma^2 estimated as 0.4749:  log likelihood = -103.25,  aic = 214.49

AIC has selected an ARMA(1,1) model for the Lake Heron data. In the output, which estimates correspond to θ̂1 and φ̂1? Why have we chosen to include µ? What is its estimate?

Example 2: Level of Lake Heron (cont.)
> acf(heron,lag.max=lgm,main="Lake Heron ACF")
> lines(0:lgm,ARMAacf(ar=coef(heron.arma)[1],ma=coef(heron.arma)[2],lag.max=lgm),col=2,lwd=2,lty=2)

[Figure: sample ACF of the Lake Heron series with the fitted ARMA(1,1) theoretical ACF overlaid.]

Example 2: Level of Lake Heron (cont.)
As always, we should make sure via diagnostic plots that the estimated residuals under this model do appear to be white noise.
> tsdiag(heron.arma)

[Figure: standardised residuals, ACF of residuals, and p-values for the Ljung-Box statistic; no evidence against white noise.]

and everything looks sweet as.

Example 3: Pure AR or MA model for the Lake Heron data...?
> ar.orderselect(heron,include.mean=TRUE,order.max=10)
  |=============================================| 100%
     p      aic
2.0000 215.2664
> ma.orderselect(heron,include.mean=TRUE,order.max=10)
  |=============================================| 100%
     q      aic
3.0000 222.1263

The above results show that the ARMA(1,1) model has a smaller AIC than the 'best' model AIC could select for either a pure AR or a pure MA process. The Principle of Parsimony is also exhibited here: a larger number of parameters is required for the pure MA process than for the selected ARMA process.

Example 4: Comparing the functions
> ar.fit(heron,p=2,include.mean=TRUE)

Call:
arima(x = ts(X, 1, length(X)), order = c(p, 0, 0), include.mean = TRUE)

Coefficients:
         ar1      ar2  intercept
      1.0436  -0.2495   579.0473
s.e.  0.0983   0.1008     0.3319

sigma^2 estimated as 0.4788:  log likelihood = -103.63,  aic = 215.27

> arma.fit(heron,p=2,q=0,include.mean=TRUE)

Call:
arima(x = ts(X, 1, length(X)), order = c(p, 0, q), include.mean = TRUE)

Coefficients:
         ar1      ar2  intercept
      1.0436  -0.2495   579.0473
s.e.  0.0983   0.1008     0.3319

sigma^2 estimated as 0.4788:  log likelihood = -103.63,  aic = 215.27

Example 4: Comparing the functions (cont.)
> ma.fit(heron,q=3,include.mean=TRUE)

Call:
arima(x = ts(X, 1, length(X)), order = c(0, 0, q), include.mean = TRUE)

Coefficients:
         ma1     ma2     ma3  intercept
      1.0872  0.7444  0.3671   579.0086
s.e.  0.0958  0.1160  0.1083     0.2266

sigma^2 estimated as 0.5029:  log likelihood = -106.06,  aic = 222.13

> arma.fit(heron,p=0,q=3,include.mean=TRUE)

Call:
arima(x = ts(X, 1, length(X)), order = c(p, 0, q), include.mean = TRUE)

Coefficients:
         ma1     ma2     ma3  intercept
      1.0872  0.7444  0.3671   579.0086
s.e.  0.0958  0.1160  0.1083     0.2266

sigma^2 estimated as 0.5029:  log likelihood = -106.06,  aic = 222.13

Example 4: Comparing the model ACFs

[Figure: Lake Heron sample ACF with the theoretical ACFs of the fitted ARMA(1,1), AR(2) and MA(3) models overlaid.]

Next lecture...
What's the forecast? Prediction from stationary ARMA models.
