Econ 5316 time Series Econometrics(Spring 2017) / Homework5 Problem2 Keunyoung (Kay) Kim
This intervention analysis is used to assess of how the mean level of a series changes after July 2008 recession. For this, first, identify and estimate a seasonal ARIMA for pre intervention period. Second, identify and estimate a transfer function model until Dec. 2014. Finally, using estimated transfer function model, produce forecasts until the end of 2016 and evaluate the accuracy of them.
The monthly data for Retail and Food Services Sales FRED/RSAFSNA are imported from Quandl website (FRED/RSAFSNA). This time series shows the seasonality and trend. Therefore we need to consider a seasonal AR or MA component as a regular AR or MA component. In addition to that we should take an intervention approach since it looks there was a shock around July 2008.
I followed the Box-Jenkins methodology to build a time series model based on the data until June 2008. Seasonally(period=12) and regularly differencde data, \(\Delta\Delta_{12} log(y)\) shows better stationarity.
We need to use intervention analysis approach to incorporate explicitly the effect of 2008 recession on this time series:
I checked ACF and PACF to estimate model.
I considered several alternative models to find the better one. As a result, model m2:ARIMA(9,1,0)(2,1,0)(12) for log(y) is selected since it shows the higher p-values for Ljung-Box statistics and look more adequate. These model also have smaller AICc.
# estimate model for pre-intervention period, based on ACF and PACF using just regular MA(1)
m1 <- Arima(ly, order=c(3,1,0),seasonal=list(order=c(1,1,1), period=12), fixed=c(NA, NA, 0, NA, NA))
m1
## Series: ly
## ARIMA(3,1,0)(1,1,1)[12]
##
## Coefficients:
## ar1 ar2 ar3 sar1 sma1
## -0.7545 -0.5054 0 0.1279 -0.7232
## s.e. 0.0674 0.0681 0 0.1108 0.0752
##
## sigma^2 estimated as 0.0003439: log likelihood=476.79
## AIC=-943.57 AICc=-943.23 BIC=-927.47
tsdiag(m1, gof.lag=maxlag)
# seasonal AR(2) lowers both AICc and BIC, S-MA(1) term is significant
m2 <- Arima(ly, order=c(9,1,0),seasonal=list(order=c(2,1,0), period=12))
m2
## Series: ly
## ARIMA(9,1,0)(2,1,0)[12]
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8
## -0.7507 -0.537 -0.1356 -0.2393 -0.0264 0.0235 0.0421 0.0329
## s.e. 0.0700 0.092 0.1052 0.1045 0.1068 0.1037 0.1048 0.0949
## ar9 sar1 sar2
## 0.3087 -0.5686 -0.3380
## s.e. 0.0746 0.0804 0.0743
##
## sigma^2 estimated as 0.0003089: log likelihood=490.38
## AIC=-956.75 AICc=-954.94 BIC=-918.11
tsdiag(m2, gof.lag=maxlag)
Next I forecasted based on the model(m2) for the pre-intervention period. This forecast is based on the pre-intervention period model, so it does not reflect the intervention effect(refer to blue line in the plot below).
The figure of multistep forecast error indicates the short term and long term effect with different size and signs. It can be achieved as combination of pulse function and step function.
Next, I estimated model with transfer function.
I checked the adequacy of transfer function. As a result, model m4:ARIMA(9,1,0)(2,1,1)(12) is preferred, since it shows the higher p-values for Ljung-Box statistics and smaller AICc and BIC. Here, the transfer function assumes that the effect of July 2008 recession disappears but with an additional term so that \(f_t = f^a_t + f^b_t\).
m3 <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,0), period=12),
xtransf=data.frame(P2008), transfer=list(c(1,0)), method="ML")
m3
##
## Call:
## arimax(x = log(y), order = c(9, 1, 0), seasonal = list(order = c(2, 1, 0), period = 12),
## method = "ML", xtransf = data.frame(P2008), transfer = list(c(1, 0)))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8
## -0.5390 -0.2515 0.1816 0.0198 0.1216 0.0546 -0.0141 0.0177
## s.e. 0.0605 0.0740 0.0723 0.0730 0.0717 0.0721 0.0719 0.0708
## ar9 sar1 sar2 P2008-AR1 P2008-MA0
## 0.2732 -0.5509 -0.3404 0.6005 0.0286
## s.e. 0.0621 0.0646 0.0612 0.1525 0.0146
##
## sigma^2 estimated as 0.0003453: log likelihood = 671.89, aic = -1317.78
tsdiag(m3, gof.lag=maxlag)
m4 <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,1), period=12),
xtransf=data.frame(P2008,P2008), transfer=list(c(0,0),c(1,0)), method="ML")
m4
##
## Call:
## arimax(x = log(y), order = c(9, 1, 0), seasonal = list(order = c(2, 1, 1), period = 12),
## method = "ML", xtransf = data.frame(P2008, P2008), transfer = list(c(0,
## 0), c(1, 0)))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8
## -0.5113 -0.2652 0.0946 -0.0197 0.0949 0.1005 -0.0013 0.0262
## s.e. 0.0619 0.0734 0.0726 0.0720 0.0714 0.0713 0.0721 0.0711
## ar9 sar1 sar2 sma1 P2008-MA0 P2008.1-AR1
## 0.2166 0.0140 -0.1448 -0.6574 -0.0292 0.4939
## s.e. 0.0644 0.1026 0.0762 0.0815 0.0360 0.1799
## P2008.1-MA0
## 0.0573
## s.e. 0.0377
##
## sigma^2 estimated as 0.0003154: log likelihood = 682.27, aic = -1334.54
tsdiag(m4, gof.lag=maxlag)
I constructed forecast for the model with transfer function using both model m3 and m4.
## Series: log(y)
## Regression with ARIMA(9,1,0)(2,1,1)[12] errors
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8
## -0.5033 -0.2596 0.0955 -0.0220 0.0929 0.0976 -0.0095 0.017
## s.e. 0.0618 0.0726 0.0723 0.0722 0.0712 0.0709 0.0712 0.070
## ar9 sar1 sar2 sma1 tf3[1:length(y)]
## 0.2112 0.0207 -0.1477 -0.6610 1.0351
## s.e. 0.0640 0.1016 0.0756 0.0803 0.5345
##
## sigma^2 estimated as 0.0003403: log likelihood=681.84
## AIC=-1335.68 AICc=-1333.98 BIC=-1285.67
## Series: log(y)
## Regression with ARIMA(9,1,0)(2,1,1)[12] errors
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8
## -0.5113 -0.2652 0.0946 -0.0197 0.0949 0.1005 -0.0013 0.0261
## s.e. 0.0618 0.0731 0.0725 0.0720 0.0714 0.0712 0.0721 0.0710
## ar9 sar1 sar2 sma1 m4$coef["P2008-MA0"] * P2008x
## 0.2166 0.0140 -0.1448 -0.6574 1.0001
## s.e. 0.0643 0.1025 0.0762 0.0814 0.9853
## m4$coef["P2008.1-MA0"] * filter(P2008x, filter = m4$coef["P2008.1-AR1"],
## 1.0001
## s.e. 0.5500
##
## sigma^2 estimated as 0.0003406: log likelihood=682.27
## AIC=-1334.54 AICc=-1332.6 BIC=-1280.96
I plotted forecast for both m3 and m4 models with transfer function and they look very similar.
I compared the estimated 2008 recession effects on Retail and Food Services Sales. Both models look similar.
I plotted actual and fitted values of model m4. The error terms oscillate around zero.
I created a one-month ahead rolling forecast based on model m4 from January 2015 to December 2016.
RMSE of rolling scheme model using ARIMA(9,1,0)(2,1,1)(12) is much smller than RMSE of multi-step model using same model when we forecast log(y).
Multi-step model using ARIMA(9,1,0)(2,1,1)(12): RMSE=0.0237
Rolling scheme model using ARIMA(9,1,0)(2,1,1)(12): RMSE= 0.0138
## ME RMSE MAE MPE MAPE ACF1
## Test set -0.02130619 0.0238394 0.02130619 -0.1639048 0.1639048 -0.1793825
## Theil's U
## Test set 0.2531675
## ME RMSE MAE MPE MAPE
## Test set -0.0006494715 0.01394127 0.01049293 -0.00522695 0.08078912
## ACF1 Theil's U
## Test set -0.2665874 0.1461409
In conclusion, one-month rolling scheme forecast based on ARIMA(9,1,0)(2,1,1)(12) has the smallest RMSE and shows the best performance for forecasting.
library(Quandl)
Quandl.api_key("XzdSwkDsE98Mxj3ixQzG")
# Monthly retail and Food Services Sales: FRED/FSAFSNA
#library(xts)
#rfs <- Quandl("FRED/RSAFSNA", type = "ts")
library(TSA)
library(zoo)
rfs <- Quandl("FRED/RSAFSNA", type = "zoo")
y=rfs
plot(y, xlab="Year", ylab="Spending", main="Monthly retail and Food Services Sales")
yall <- window(rfs, end=2016+11/12)
# identify the model for the process over the pre-intervention period
# pre-intervention period
y <- window(yall, end=2008+5/12)
ly <- log(y)
dly1 <-diff(ly)
dly12 <- diff(ly, lag=12)
d2ly1_12 <- diff(diff(ly), lag=12)
library(forecast)
# Original and transformed data
par(mfrow = c(2, 3))
plot(y, main = expression(y))
plot(ly, main = expression(log(y)))
plot.new()
plot(dly1, ylab='', xlab='year', main=expression(paste(Delta, "log(y)")))
plot(dly12, ylab='', xlab='year', main=expression(paste(Delta[12], "log(y)")))
plot(d2ly1_12, ylab='', xlab='year', main=expression(paste(Delta,Delta[12], "log(y)")))
# ACFs and PACFs
maxlag <- 60
par(mfrow=c(2,4), mar=c(3,3,4,2))
Acf(ly, type="correlation", lag.max=maxlag, main=expression(paste("ACF for log(y)")))
Acf(dly1, type="correlation", lag.max=maxlag, main=expression(paste("ACF for ",Delta, "log(y)")))
Acf(dly12, type="correlation", lag.max=maxlag, main=expression(paste("ACF for ",Delta[12], "log(y)")))
Acf(d2ly1_12, type="correlation", lag.max=maxlag, main=expression(paste("ACF for ",Delta,Delta[12], "log(y)")))
Acf(ly, type="partial", lag.max=maxlag, main=expression(paste("PACF for log(y)")))
Acf(dly1, type="partial", lag.max=maxlag, main=expression(paste("PACF for ",Delta, "log(y)")))
Acf(dly12, type="partial", lag.max=maxlag, main=expression(paste("PACF for ",Delta[12], "log(y)")))
Acf(d2ly1_12, type="partial", lag.max=maxlag, main=expression(paste("PACF for ",Delta,Delta[12], "log(y)")))
# estimate model for pre-intervention period, based on ACF and PACF using just regular MA(1)
m1 <- Arima(ly, order=c(3,1,0),seasonal=list(order=c(1,1,1), period=12), fixed=c(NA, NA, 0, NA, NA))
m1
tsdiag(m1, gof.lag=maxlag)
# seasonal AR(2) lowers both AICc and BIC, S-MA(1) term is significant
m2 <- Arima(ly, order=c(9,1,0),seasonal=list(order=c(2,1,0), period=12))
m2
tsdiag(m2, gof.lag=maxlag)
# model m2 has lower AICc and BIC
# forecast based on the model for the pre-intervention period
# y <- yall
m <- m2
m.f <- forecast(m, h=48)
# m.f.err <- log(yall)- as.zoo(m.f$mean)
m.f.err <- as.ts(log(yall))- m.f$mean
#plot forecast
par(mfrow=c(2,2), mar=c(2,4,2,2), cex=0.9)
plot(m.f, ylim=c(11.5,13.2))
lines(index(yall), log(yall))
plot(m.f.err, ylim=c(-0.17, 0), main="Multistep Forecast Error")
abline(h=0, lty="dashed")
#library(dygraphs)
#dygraph(m.f.err, main="Multistep Forecast Error")
Acf(m.f.err, type="correlation", lag.max=48, ylab="ACF", main="")
Acf(m.f.err, type="partial", lag.max=48, ylab="PACF", main="")
# 2.(2)
# estimate model with transfer function
y <- window(yall, end=2014+11/12)
# Create pulse function for Sept
p2008 <- 1*(index(y)==2008+(7-1)/12)
#m31 <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,0), period=12),
# xtransf=data.frame(p2008,p2008), transfer=list(c(0,0),c(1,1)), method="ML")
#m31
# tsdiag(m4, gof.lag=maxlag)
m3 <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,0), period=12),
xtransf=data.frame(p2008), transfer=list(c(1,0)), method="ML")
m3
tsdiag(m3, gof.lag=maxlag)
#short term and long term effect with different size and signs
# f_t = f^A_t + f^B_t +f^C_t (refer to p.12)
#f^A_t : omega_0
#f^B_t : delta_1
#f^C_t : omega_2
m4 <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,1), period=12),
xtransf=data.frame(p2008,p2008,p2008), transfer=list(c(0,0),c(1,1),c(0,2)), method="ML")
m4
tsdiag(m4, gof.lag=maxlag)
# construct forecast for the model with transfer function
hmax <- 24
p2008x <- c(p2008, rep(0,hmax))
tf3 <- m3$coef["p2008-MA0"]*filter(p2008x, filter=m3$coef["p2008-AR1"], method='recursive', sides=1)
m3x <- Arima(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,1), period=12), xreg=tf3[1:length(y)])
m3x
m3x.f.h <- forecast(m3x, h=hmax, xreg=tf3[(length(y)+1):(length(y)+hmax)])
#tf4 <- cbind(m4$coef["p2008-MA0"]*p2008x, m4$coef["p2008.1-MA0"]*filter(p2008x, filter=m4$coef["p2008.1-AR1"], method='recursive', sides=1))
tf4 <- cbind(m4$coef["p2008-MA0"]*p2008x, m4$coef["p2008.1-MA0"]*filter(p2008x, filter=m4$coef["p2008.1-AR1"], method='recursive', sides=1))
m4x <- Arima(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,1), period=12), xreg=tf4[1:length(y),])
m4x
m4x.f.h <- forecast(m4x, h=hmax, xreg=tf4[(length(y)+1):(length(y)+hmax),])
# plot forecast for the model with transfer function
#par(mfrow=c(2,1))
#plot(m3x.f.h)
#lines(log(yall), col="black", lwd=2)
plot(m4x.f.h, xlim =c(2005, 2016), ylim=c(12.5, 13.2))
lines(log(yall), col="black", lwd=2)
# construct estimated 2008 recession effects on Retail and Food Services Sales
f3 <- m3$coef["p2008-MA0"]*filter(p2008, filter=m3$coef["p2008-AR1"], method = "recursive")
f4 <- m4$coef["p2008-MA0"]*p2008 + m4$coef["p2008.1-MA0"]*filter(p2008, filter=m4$coef["p2008.1-AR1"], method="recursive")
f3 <- ts(f3, start=1992, frequency=12)
f4 <- ts(f4, start=1992, frequency=12)
plot(f4, type="l", col=4, lty=1,
ylab="", main="Estimated Effect of 2008 recession on Retail and Food Services Sales, in log points")
lines(f4, type="l", col=2, lty=2)
legend("bottomleft",c("m3 - one component", "m4 - Three components"), bty="n", col=c(4,2), lty=c(1,2))
# construct actual and fitted values
library(zoo)
lyhat3 <- rbind(as.zoo(fitted(m3x)), as.zoo(m3x.f.h$mean))
lyhat4 <- rbind(as.zoo(fitted(m4x)), as.zoo(m4x.f.h$mean))
lyhat3 <- window(lyhat3, end=2016+11/12)
lyhat4 <- window(lyhat4, end=2016+11/12)
m3x.f.h.err <- as.zoo(log(yall)) - lyhat3
m4x.f.h.err <- as.zoo(log(yall)) - lyhat4
# plot actual and fitted values
par(mfrow=c(2,2))
plot(lyhat3, col="blue", xlim =c(2005, 2016), ylim=c(12.5, 13.2))
lines(log(yall))
plot(m3x.f.h.err)
abline(h=0)
plot(lyhat4, col="blue", xlim =c(2005, 2016), ylim=c(12.5, 13.2))
lines(log(yall))
plot(m4x.f.h.err)
abline(h=0)
# m3, m4 almost same
# create a one month ahead recursive scheme forecast
yall <- rfs
lstM <- 2014+11/12
y1 <- window(yall, end=lstM)
y2 <- window(yall, start=lstM+1/12, end=2016+11/12)
p2008 <- 1*(index(y)==2008+(6-1)/12)
m4x.f.rec <- list()
for(i in 1:(length(y2)+1))
{
y <- window(yall, end=lstM+(i-1)/12)
m4.updt <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,1), period=12), fixed=NULL,
xtransf=data.frame(p2008), transfer=list(c(1,0)), method="ML")
#m4.updt <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,1), period=12),
# xtransf=data.frame(p2008,p2008,p2008), transfer=list(c(0,0),c(1,1),c(0,2)), method="ML")
#m4.updt <- arimax(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,0), period=12),
# xtransf=data.frame(p2008), transfer=list(c(1,0)), method="ML")
# p2008 <- c(p2008, 0)
#tf4 <- cbind(m4$coef["p2008-MA0"]*p2008, m4$coef["p2008.1-MA0"]*filter(p2008, filter=m4$coef["p2008.1-AR1"], method='recursive'))
#tf4 <- m4$coef["p2008-MA0"]*filter(p2008, filter=m4$coef["p2008-AR1"], method='recursive')
#m4x.updt <- Arima(log(y), order=c(9,1,0), seasonal=list(order=c(2,1,0), period=12), xreg=tf4[1:length(y),])
#m4x.updt.f.1 <- forecast(m4x.updt, h=1, xreg=t(as.matrix(tf4[(length(y)+1):(length(y)+1),])))
#m4x.f.rec$mean <- rbind(m4x.f.rec$mean, as.zoo(m4x.updt.f.1$mean))
#m4x.f.rec$lower <- rbind(m4x.f.rec$lower, m4x.updt.f.1$lower)
#m4x.f.rec$upper <- rbind(m4x.f.rec$upper, m4x.updt.f.1$upper)
}
m4x.f.rec$mean <- as.ts(m4x.f.rec$mean)
m4x.f.rec$level <- m4x.updt.f.1$level
m4x.f.rec$x <- window(m4x.updt.f.1$x, end=lstM)
class(m4x.f.rec) <- class(m4x.f.h)
# plot multistep and 1 steap ahead forecasts
par(mfrow=c(2,1), mar=c(2,4,2,2))
# plot multistep ahead forecasts
plot(m4x.f.h, xlim=c(2005,2016), ylim=c(12,13.2), main="Multistep Ahead Forecasts")
lines(log(yall))
# plot 1 step ahead rolling forecasts form model m4
plot(m4x.f.rec, xlim=c(2005,2016), ylim=c(12,13.2), main="1-Month Ahead Recursive Forecasts")
lines(log(yall))
# plot forecast errors for multistep and 1 steap ahead forecasts
par(mfrow=c(2,1), mar=c(2,4,2,2))
# plot multistep ahead forecasts
plot(log(y2)-m4x.f.h$mean, xlim=c(1992,2016), main="Forecast Errors: Multistep Ahead Forecasts")
abline(h=0, lty="dashed")
# plot 1 step ahead rolling forecasts form model m4
plot(log(y2)-m4x.f.rec$mean, xlim=c(1992,2016), main="Forecast Errors: 1-Month Ahead Recursive Forecasts")
abline(h=0, lty="dashed")
# evaluate forecast accuracy
accuracy(m4x.f.h$mean, log(y2))
accuracy(m4x.f.rec$mean, log(y2))
# undo log transformation
m4x.f.h$mean <- exp(m4x.f.h$mean)
m4x.f.h$lower <- exp(m4x.f.h$lower)
m4x.f.h$upper <- exp(m4x.f.h$upper)
m4x.f.h$x <- exp(m4x.f.h$x)
m4x.f.rec$mean <- exp(m4x.f.rec$mean)
m4x.f.rec$lower <- exp(m4x.f.rec$lower)
m4x.f.rec$upper <- exp(m4x.f.rec$upper)
m4x.f.rec$x <- exp(m4x.f.rec$x)
# plot forecast in levels
par(mfrow=c(2,1), mar=c(2,4,2,2))
# plot multistep ahead forecasts
plot(m4x.f.h, xlim=c(1992,2016), main="Multistep Ahead Forecasts")
lines(yall)
# plot 1 step ahead rolling forecasts form model m4
plot(m4x.f.rec, xlim=c(1992,2016), main="1-Month Ahead Recursive Forecasts")
lines(yall)
# evaluate forecast accuracy for levels
accuracy(m4x.f.h$mean, y2)
accuracy(m4x.f.rec$mean, y2)