Intervention Analysis

Econ 5316 time Series Econometrics(Spring 2017) / Homework5 Problem2 Keunyoung (Kay) Kim

1. Introduction

This intervention analysis is used to assess of how the mean level of a series changes after July 2008 recession. For this, first, identify and estimate a seasonal ARIMA for pre intervention period. Second, identify and estimate a transfer function model until Dec. 2014. Finally, using estimated transfer function model, produce forecasts until the end of 2016 and evaluate the accuracy of them.

2.Data

The monthly data for Retail and Food Services Sales FRED/RSAFSNA are imported from Quandl website (FRED/RSAFSNA). This time series shows the seasonality and trend. Therefore we need to consider a seasonal AR or MA component as a regular AR or MA component. In addition to that we should take an intervention approach since it looks there was a shock around July 2008.

I followed the Box-Jenkins methodology to build a time series model based on the data until June 2008. Seasonally(period=12) and regularly differencde data, \(\Delta\Delta_{12} log(y)\) shows better stationarity.

3. Methodology

We need to use intervention analysis approach to incorporate explicitly the effect of 2008 recession on this time series:

(1) Identify and estimate a seasonal ARIMA model for pre-intervention period up until June 2008

I checked ACF and PACF to estimate model.

I considered several alternative models to find the better one. As a result, model m2:ARIMA(9,1,0)(2,1,0)(12) for log(y) is selected since it shows the higher p-values for Ljung-Box statistics and look more adequate. These model also have smaller AICc.

# estimate model for pre-intervention period, based on ACF and PACF using just regular MA(1)
m1 <- Arima(ly, order=c(3,1,0),seasonal=list(order=c(1,1,1), period=12), fixed=c(NA, NA, 0, NA, NA))
m1

## Series: ly 
## ARIMA(3,1,0)(1,1,1)[12]                    
## 
## Coefficients:
##           ar1      ar2  ar3    sar1     sma1
##       -0.7545  -0.5054    0  0.1279  -0.7232
## s.e.   0.0674   0.0681    0  0.1108   0.0752
## 
## sigma^2 estimated as 0.0003439:  log likelihood=476.79
## AIC=-943.57   AICc=-943.23   BIC=-927.47

tsdiag(m1, gof.lag=maxlag)

# seasonal AR(2) lowers both AICc and BIC, S-MA(1) term is significant
m2 <- Arima(ly, order=c(9,1,0),seasonal=list(order=c(2,1,0), period=12))
m2

## Series: ly 
## ARIMA(9,1,0)(2,1,0)[12]                    
## 
## Coefficients:
##           ar1     ar2      ar3      ar4      ar5     ar6     ar7     ar8
##       -0.7507  -0.537  -0.1356  -0.2393  -0.0264  0.0235  0.0421  0.0329
## s.e.   0.0700   0.092   0.1052   0.1045   0.1068  0.1037  0.1048  0.0949
##          ar9     sar1     sar2
##       0.3087  -0.5686  -0.3380
## s.e.  0.0746   0.0804   0.0743
## 
## sigma^2 estimated as 0.0003089:  log likelihood=490.38
## AIC=-956.75   AICc=-954.94   BIC=-918.11

tsdiag(m2, gof.lag=maxlag)

Next I forecasted based on the model(m2) for the pre-intervention period. This forecast is based on the pre-intervention period model, so it does not reflect the intervention effect(refer to blue line in the plot below).

The figure of multistep forecast error indicates the short term and long term effect with different size and signs. It can be achieved as combination of pulse function and step function.

(2) Identify and estimate a transfer function model for the period up until December 2014, using July 2008 as the date of intervention

Next, I estimated model with transfer function.

I checked the adequacy of transfer function. As a result, model m4:ARIMA(9,1,0)(2,1,1)(12) is preferred, since it shows the higher p-values for Ljung-Box statistics and smaller AICc and BIC. Here, the transfer function assumes that the effect of July 2008 recession disappears but with an additional term so that \(f_t = f^a_t + f^b_t\).

m3 <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,0), period=12),
             xtransf=data.frame(P2008), transfer=list(c(1,0)), method="ML")
m3

## 
## Call:
## arimax(x = log(y), order = c(9, 1, 0), seasonal = list(order = c(2, 1, 0), period = 12), 
##     method = "ML", xtransf = data.frame(P2008), transfer = list(c(1, 0)))
## 
## Coefficients:
##           ar1      ar2     ar3     ar4     ar5     ar6      ar7     ar8
##       -0.5390  -0.2515  0.1816  0.0198  0.1216  0.0546  -0.0141  0.0177
## s.e.   0.0605   0.0740  0.0723  0.0730  0.0717  0.0721   0.0719  0.0708
##          ar9     sar1     sar2  P2008-AR1  P2008-MA0
##       0.2732  -0.5509  -0.3404     0.6005     0.0286
## s.e.  0.0621   0.0646   0.0612     0.1525     0.0146
## 
## sigma^2 estimated as 0.0003453:  log likelihood = 671.89,  aic = -1317.78

tsdiag(m3, gof.lag=maxlag)

m4 <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,1), period=12),
             xtransf=data.frame(P2008,P2008), transfer=list(c(0,0),c(1,0)), method="ML")
m4

## 
## Call:
## arimax(x = log(y), order = c(9, 1, 0), seasonal = list(order = c(2, 1, 1), period = 12), 
##     method = "ML", xtransf = data.frame(P2008, P2008), transfer = list(c(0, 
##         0), c(1, 0)))
## 
## Coefficients:
##           ar1      ar2     ar3      ar4     ar5     ar6      ar7     ar8
##       -0.5113  -0.2652  0.0946  -0.0197  0.0949  0.1005  -0.0013  0.0262
## s.e.   0.0619   0.0734  0.0726   0.0720  0.0714  0.0713   0.0721  0.0711
##          ar9    sar1     sar2     sma1  P2008-MA0  P2008.1-AR1
##       0.2166  0.0140  -0.1448  -0.6574    -0.0292       0.4939
## s.e.  0.0644  0.1026   0.0762   0.0815     0.0360       0.1799
##       P2008.1-MA0
##            0.0573
## s.e.       0.0377
## 
## sigma^2 estimated as 0.0003154:  log likelihood = 682.27,  aic = -1334.54

tsdiag(m4, gof.lag=maxlag)

(3) Use the estimated transfer function model to produce and plot a multistep forecast, and a sequence of one month ahead forecasts until the end of 2016.

I constructed forecast for the model with transfer function using both model m3 and m4.

## Series: log(y) 
## Regression with ARIMA(9,1,0)(2,1,1)[12] errors 
## 
## Coefficients:
##           ar1      ar2     ar3      ar4     ar5     ar6      ar7    ar8
##       -0.5033  -0.2596  0.0955  -0.0220  0.0929  0.0976  -0.0095  0.017
## s.e.   0.0618   0.0726  0.0723   0.0722  0.0712  0.0709   0.0712  0.070
##          ar9    sar1     sar2     sma1  tf3[1:length(y)]
##       0.2112  0.0207  -0.1477  -0.6610            1.0351
## s.e.  0.0640  0.1016   0.0756   0.0803            0.5345
## 
## sigma^2 estimated as 0.0003403:  log likelihood=681.84
## AIC=-1335.68   AICc=-1333.98   BIC=-1285.67

## Series: log(y) 
## Regression with ARIMA(9,1,0)(2,1,1)[12] errors 
## 
## Coefficients:
##           ar1      ar2     ar3      ar4     ar5     ar6      ar7     ar8
##       -0.5113  -0.2652  0.0946  -0.0197  0.0949  0.1005  -0.0013  0.0261
## s.e.   0.0618   0.0731  0.0725   0.0720  0.0714  0.0712   0.0721  0.0710
##          ar9    sar1     sar2     sma1  m4$coef["P2008-MA0"] * P2008x
##       0.2166  0.0140  -0.1448  -0.6574                         1.0001
## s.e.  0.0643  0.1025   0.0762   0.0814                         0.9853
##       m4$coef["P2008.1-MA0"] * filter(P2008x, filter = m4$coef["P2008.1-AR1"], 
##                                                                          1.0001
## s.e.                                                                     0.5500
## 
## sigma^2 estimated as 0.0003406:  log likelihood=682.27
## AIC=-1334.54   AICc=-1332.6   BIC=-1280.96

I plotted forecast for both m3 and m4 models with transfer function and they look very similar.

I compared the estimated 2008 recession effects on Retail and Food Services Sales. Both models look similar.

I plotted actual and fitted values of model m4. The error terms oscillate around zero.

I created a one-month ahead rolling forecast based on model m4 from January 2015 to December 2016.

(4) Compare accuracy : multistep vs. one month ahead

RMSE of rolling scheme model using ARIMA(9,1,0)(2,1,1)(12) is much smller than RMSE of multi-step model using same model when we forecast log(y).

Multi-step model using ARIMA(9,1,0)(2,1,1)(12): RMSE=0.0237

Rolling scheme model using ARIMA(9,1,0)(2,1,1)(12): RMSE= 0.0138

##                   ME      RMSE        MAE        MPE      MAPE       ACF1
## Test set -0.02130619 0.0238394 0.02130619 -0.1639048 0.1639048 -0.1793825
##          Theil's U
## Test set 0.2531675

##                     ME       RMSE        MAE         MPE       MAPE
## Test set -0.0006494715 0.01394127 0.01049293 -0.00522695 0.08078912
##                ACF1 Theil's U
## Test set -0.2665874 0.1461409

4.Conclusions

In conclusion, one-month rolling scheme forecast based on ARIMA(9,1,0)(2,1,1)(12) has the smallest RMSE and shows the best performance for forecasting.

* R code for analysis

library(Quandl)
Quandl.api_key("XzdSwkDsE98Mxj3ixQzG")

# Monthly retail and Food Services Sales: FRED/FSAFSNA

#library(xts)
#rfs <- Quandl("FRED/RSAFSNA", type = "ts")

library(TSA)
library(zoo)
rfs <- Quandl("FRED/RSAFSNA", type = "zoo")
y=rfs
plot(y, xlab="Year", ylab="Spending", main="Monthly retail and Food Services Sales")

yall <- window(rfs, end=2016+11/12)

# identify the model for the process over the pre-intervention period

# pre-intervention period
y <- window(yall, end=2008+5/12)

ly <- log(y)
dly1 <-diff(ly)
dly12 <- diff(ly, lag=12)
d2ly1_12 <- diff(diff(ly), lag=12)

library(forecast)
# Original and transformed data
par(mfrow = c(2, 3))
plot(y, main = expression(y))
plot(ly, main = expression(log(y)))

plot.new()
plot(dly1, ylab='', xlab='year', main=expression(paste(Delta, "log(y)")))
plot(dly12, ylab='', xlab='year', main=expression(paste(Delta[12], "log(y)")))
plot(d2ly1_12, ylab='', xlab='year', main=expression(paste(Delta,Delta[12], "log(y)")))


# ACFs and PACFs
maxlag <- 60
par(mfrow=c(2,4), mar=c(3,3,4,2))
Acf(ly, type="correlation", lag.max=maxlag, main=expression(paste("ACF for log(y)")))
Acf(dly1, type="correlation", lag.max=maxlag, main=expression(paste("ACF for ",Delta, "log(y)")))
Acf(dly12, type="correlation", lag.max=maxlag, main=expression(paste("ACF for ",Delta[12], "log(y)")))
Acf(d2ly1_12, type="correlation", lag.max=maxlag, main=expression(paste("ACF for ",Delta,Delta[12], "log(y)")))

Acf(ly, type="partial", lag.max=maxlag, main=expression(paste("PACF for log(y)")))
Acf(dly1, type="partial", lag.max=maxlag, main=expression(paste("PACF for ",Delta, "log(y)")))
Acf(dly12, type="partial", lag.max=maxlag, main=expression(paste("PACF for ",Delta[12], "log(y)")))
Acf(d2ly1_12, type="partial", lag.max=maxlag, main=expression(paste("PACF for ",Delta,Delta[12], "log(y)")))

# estimate model for pre-intervention period, based on ACF and PACF using just regular MA(1)
m1 <- Arima(ly, order=c(3,1,0),seasonal=list(order=c(1,1,1), period=12), fixed=c(NA, NA, 0, NA, NA))
m1
tsdiag(m1, gof.lag=maxlag)

# seasonal AR(2) lowers both AICc and BIC, S-MA(1) term is significant
m2 <- Arima(ly, order=c(9,1,0),seasonal=list(order=c(2,1,0), period=12))
m2
tsdiag(m2, gof.lag=maxlag)

# model m2 has lower AICc and BIC

# forecast based on the model for the pre-intervention period
# y <- yall
m <- m2
m.f <- forecast(m, h=48)

# m.f.err <- log(yall)- as.zoo(m.f$mean)
m.f.err <- as.ts(log(yall))- m.f$mean

#plot forecast
par(mfrow=c(2,2), mar=c(2,4,2,2), cex=0.9)
plot(m.f, ylim=c(11.5,13.2))
lines(index(yall), log(yall))

plot(m.f.err, ylim=c(-0.17, 0), main="Multistep Forecast Error")
abline(h=0, lty="dashed")

#library(dygraphs)
#dygraph(m.f.err, main="Multistep Forecast Error")

Acf(m.f.err, type="correlation", lag.max=48, ylab="ACF", main="")
Acf(m.f.err, type="partial", lag.max=48, ylab="PACF", main="")


# 2.(2)
# estimate model with transfer function

y <- window(yall, end=2014+11/12)

# Create pulse function for Sept
p2008 <- 1*(index(y)==2008+(7-1)/12)


#m31 <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,0), period=12),
             # xtransf=data.frame(p2008,p2008), transfer=list(c(0,0),c(1,1)), method="ML")
#m31
# tsdiag(m4, gof.lag=maxlag)


m3 <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,0), period=12),
             xtransf=data.frame(p2008), transfer=list(c(1,0)), method="ML")
m3
tsdiag(m3, gof.lag=maxlag)


#short term and long term effect with different size and signs
# f_t = f^A_t + f^B_t +f^C_t (refer to p.12)
#f^A_t : omega_0 
#f^B_t : delta_1 
#f^C_t : omega_2
m4 <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,1), period=12),
             xtransf=data.frame(p2008,p2008,p2008), transfer=list(c(0,0),c(1,1),c(0,2)), method="ML")
m4
tsdiag(m4, gof.lag=maxlag)



# construct forecast for the model with transfer function
hmax <- 24
p2008x <- c(p2008, rep(0,hmax))

tf3 <- m3$coef["p2008-MA0"]*filter(p2008x, filter=m3$coef["p2008-AR1"], method='recursive', sides=1)
m3x <- Arima(log(y), order=c(9,1,0),  seasonal=list(order=c(2,1,1), period=12), xreg=tf3[1:length(y)])
m3x
m3x.f.h <- forecast(m3x, h=hmax, xreg=tf3[(length(y)+1):(length(y)+hmax)])

#tf4 <- cbind(m4$coef["p2008-MA0"]*p2008x, m4$coef["p2008.1-MA0"]*filter(p2008x, filter=m4$coef["p2008.1-AR1"], method='recursive', sides=1))
tf4 <- cbind(m4$coef["p2008-MA0"]*p2008x, m4$coef["p2008.1-MA0"]*filter(p2008x, filter=m4$coef["p2008.1-AR1"], method='recursive', sides=1))
m4x <- Arima(log(y), order=c(9,1,0),  seasonal=list(order=c(2,1,1), period=12), xreg=tf4[1:length(y),])
m4x
m4x.f.h <- forecast(m4x, h=hmax, xreg=tf4[(length(y)+1):(length(y)+hmax),])

# plot forecast for the model with transfer function
#par(mfrow=c(2,1))
#plot(m3x.f.h)
#lines(log(yall), col="black", lwd=2)
plot(m4x.f.h, xlim =c(2005, 2016), ylim=c(12.5, 13.2))
lines(log(yall), col="black", lwd=2)


# construct estimated 2008 recession effects on Retail and Food Services Sales
f3 <- m3$coef["p2008-MA0"]*filter(p2008, filter=m3$coef["p2008-AR1"], method = "recursive")
f4 <- m4$coef["p2008-MA0"]*p2008 + m4$coef["p2008.1-MA0"]*filter(p2008, filter=m4$coef["p2008.1-AR1"], method="recursive")
f3 <- ts(f3, start=1992, frequency=12)
f4 <- ts(f4, start=1992, frequency=12)
plot(f4, type="l", col=4, lty=1,
     ylab="", main="Estimated Effect of 2008 recession on Retail and Food Services Sales, in log points")
lines(f4, type="l", col=2, lty=2)
legend("bottomleft",c("m3 - one component", "m4 - Three components"), bty="n", col=c(4,2), lty=c(1,2))


# construct actual and fitted values
library(zoo)
lyhat3 <- rbind(as.zoo(fitted(m3x)), as.zoo(m3x.f.h$mean))
lyhat4 <- rbind(as.zoo(fitted(m4x)), as.zoo(m4x.f.h$mean))
lyhat3 <- window(lyhat3, end=2016+11/12)
lyhat4 <- window(lyhat4, end=2016+11/12)
m3x.f.h.err <- as.zoo(log(yall)) - lyhat3
m4x.f.h.err <- as.zoo(log(yall)) - lyhat4

# plot actual and fitted values
par(mfrow=c(2,2))
plot(lyhat3, col="blue", xlim =c(2005, 2016), ylim=c(12.5, 13.2))
lines(log(yall))
plot(m3x.f.h.err)
abline(h=0)

plot(lyhat4, col="blue", xlim =c(2005, 2016), ylim=c(12.5, 13.2))
lines(log(yall))
plot(m4x.f.h.err)
abline(h=0)

# m3, m4 almost same 

# create a one month ahead recursive scheme forecast
yall <- rfs
lstM <- 2014+11/12
y1 <- window(yall, end=lstM)
y2 <- window(yall, start=lstM+1/12, end=2016+11/12)
p2008 <- 1*(index(y)==2008+(6-1)/12)

m4x.f.rec <- list()
for(i in 1:(length(y2)+1))
{
    y <- window(yall, end=lstM+(i-1)/12)
    m4.updt <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,1), period=12), fixed=NULL,
               xtransf=data.frame(p2008), transfer=list(c(1,0)), method="ML")
    
    #m4.updt <-  arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,1), period=12),
    #                   xtransf=data.frame(p2008,p2008,p2008), transfer=list(c(0,0),c(1,1),c(0,2)), method="ML")
    #m4.updt <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,0), period=12),
    #                  xtransf=data.frame(p2008), transfer=list(c(1,0)), method="ML")
    
   # p2008 <- c(p2008, 0)
    #tf4 <- cbind(m4$coef["p2008-MA0"]*p2008,  m4$coef["p2008.1-MA0"]*filter(p2008, filter=m4$coef["p2008.1-AR1"], method='recursive'))
    #tf4 <- m4$coef["p2008-MA0"]*filter(p2008, filter=m4$coef["p2008-AR1"], method='recursive')
    
    #m4x.updt <- Arima(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,0), period=12), xreg=tf4[1:length(y),])
    #m4x.updt.f.1 <- forecast(m4x.updt, h=1, xreg=t(as.matrix(tf4[(length(y)+1):(length(y)+1),])))
    #m4x.f.rec$mean <- rbind(m4x.f.rec$mean, as.zoo(m4x.updt.f.1$mean))
    #m4x.f.rec$lower <- rbind(m4x.f.rec$lower, m4x.updt.f.1$lower)
    #m4x.f.rec$upper <- rbind(m4x.f.rec$upper, m4x.updt.f.1$upper)
}
m4x.f.rec$mean <- as.ts(m4x.f.rec$mean)
m4x.f.rec$level <- m4x.updt.f.1$level
m4x.f.rec$x <- window(m4x.updt.f.1$x, end=lstM)
class(m4x.f.rec) <- class(m4x.f.h)

# plot multistep and 1 steap ahead forecasts
par(mfrow=c(2,1), mar=c(2,4,2,2))

# plot multistep ahead forecasts
plot(m4x.f.h, xlim=c(2005,2016), ylim=c(12,13.2), main="Multistep Ahead Forecasts")
lines(log(yall))

# plot 1 step ahead rolling forecasts form model m4
plot(m4x.f.rec, xlim=c(2005,2016), ylim=c(12,13.2), main="1-Month Ahead Recursive Forecasts")
lines(log(yall))


# plot forecast errors for multistep and 1 steap ahead forecasts
par(mfrow=c(2,1), mar=c(2,4,2,2))
# plot multistep ahead forecasts
plot(log(y2)-m4x.f.h$mean, xlim=c(1992,2016), main="Forecast Errors: Multistep Ahead Forecasts")
abline(h=0, lty="dashed")
# plot 1 step ahead rolling forecasts form model m4
plot(log(y2)-m4x.f.rec$mean, xlim=c(1992,2016), main="Forecast Errors: 1-Month Ahead Recursive Forecasts")
abline(h=0, lty="dashed")


# evaluate forecast accuracy
accuracy(m4x.f.h$mean, log(y2))
accuracy(m4x.f.rec$mean, log(y2))


# undo log transformation
m4x.f.h$mean <- exp(m4x.f.h$mean)
m4x.f.h$lower <- exp(m4x.f.h$lower)
m4x.f.h$upper <- exp(m4x.f.h$upper)
m4x.f.h$x <- exp(m4x.f.h$x)

m4x.f.rec$mean <- exp(m4x.f.rec$mean)
m4x.f.rec$lower <- exp(m4x.f.rec$lower)
m4x.f.rec$upper <- exp(m4x.f.rec$upper)
m4x.f.rec$x <- exp(m4x.f.rec$x)


# plot forecast in levels
par(mfrow=c(2,1), mar=c(2,4,2,2))
# plot multistep ahead forecasts
plot(m4x.f.h, xlim=c(1992,2016), main="Multistep Ahead Forecasts")
lines(yall)
# plot 1 step ahead rolling forecasts form model m4
plot(m4x.f.rec, xlim=c(1992,2016), main="1-Month Ahead Recursive Forecasts")
lines(yall)

# evaluate forecast accuracy for levels
accuracy(m4x.f.h$mean, y2)
accuracy(m4x.f.rec$mean, y2)