Econ 5316 time Series Econometrics(Spring 2017) / Homework5 Problem1 Keunyoung (Kay) Kim
For the monthly Total Private Residential Construction Spending data, build a time series model by following the Box-Jenkins methodology. Then produce a multistep forecast and a one month ahead forecasts by using the estimated model. Finally, Evaluate their accuracy.
Data for this model are imported from Quandl website (FRED/PRRESCON). Based on the ACF and PACF, it looks that AR(2) is the preferred model for original time series. However, this time series shows the seasonality and trend. So we need to consider a seasonal AR or MA component as well as a regular AR or MA component.
I followed the Box-Jenkins methodology to build a time series model based on the data until the end of 2013. Seasonally(period=12) and regularly differenced data, \(\Delta\Delta_{12} log(y)\) shows better stationarity.
Based on ACF and PACF, AR(2) or AR(9) is preferred for data log(y).
I checked model adequacy using tsdiag. As a result, model m3 and m4 show the higher p-values for Ljung-Box statistics and look more adequate. These model also have smaller AICc and BIC.
# estimate model - twice differenced data1
m1 <- Arima(dly12_1, order = c(3,0,0), seasonal=list(order=c(1,0,0), period=12))
m1
## Series: dly12_1
## ARIMA(3,0,0)(1,0,0)[12] with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 sar1 mean
## 0.4186 0.3296 -0.1060 -0.0974 2e-04
## s.e. 0.0672 0.0684 0.0649 0.0680 3e-03
##
## sigma^2 estimated as 0.0003371: log likelihood=618.48
## AIC=-1224.97 AICc=-1224.61 BIC=-1204.11
tsdiag(m1, gof.lag=48)
# estimate model - seasonally differenced data
m2 <- Arima(dly12, order = c(3,0,2))
m2
## Series: dly12
## ARIMA(3,0,2) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 mean
## 0.7913 0.9239 -0.7440 0.6660 -0.2076 0.0452
## s.e. 0.0655 0.0541 0.0621 0.0924 0.0860 0.0559
##
## sigma^2 estimated as 0.0003256: log likelihood=623.46
## AIC=-1232.91 AICc=-1232.43 BIC=-1208.55
tsdiag(m2, gof.lag=48)
# estimate model - regulary differenced data
#m3 <- Arima(dly1, order = c(3,0,0), seasonal=list(order=c(1,0,1), period=12))
m3 <- Arima(dly1, order = c(9,0,0), seasonal=list(order=c(1,0,1), period=12))
m3
## Series: dly1
## ARIMA(9,0,0)(1,0,1)[12] with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8
## 0.4618 0.2876 -0.1073 -0.0205 0.0520 -0.0071 -0.1629 0.1678
## s.e. 0.0641 0.0685 0.0704 0.0707 0.0701 0.0692 0.0702 0.0678
## ar9 sar1 sma1 mean
## -0.1551 0.9767 -0.0348 0.0138
## s.e. 0.0625 0.0088 0.0618 0.0422
##
## sigma^2 estimated as 0.0003192: log likelihood=641.88
## AIC=-1257.75 AICc=-1256.21 BIC=-1211.92
tsdiag(m3, gof.lag=48)
# estimate model - not differenced data
m4<- Arima(ly, order = c(9,1,0), seasonal=list(order=c(1,0,1), period=12), fixed=c(NA,NA, 0,0,0,0,NA,NA,NA,NA,NA))
m4
## Series: ly
## ARIMA(9,1,0)(1,0,1)[12]
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8 ar9 sar1
## 0.4347 0.2466 0 0 0 0 -0.1596 0.1638 -0.1584 0.9790
## s.e. 0.0619 0.0616 0 0 0 0 0.0619 0.0651 0.0625 0.0079
## sma1
## -0.0384
## s.e. 0.0613
##
## sigma^2 estimated as 0.0003206: log likelihood=640.43
## AIC=-1264.86 AICc=-1264.26 BIC=-1236.65
tsdiag(m4, gof.lag=48)
I constructed the multistep forecasts for each model(m1, m2, m3, m4).
Since m4 has the smallest AICc and BIC, I finally selected m4 to compare the performance of multistep model with a rolling scheme model.
I created one-month ahead rolling forecast based on model m4.
RMSE of rolling scheme model using ARIMA(9,1,0)(1,0,1)(12) is much smller than RMSE of multi-step model using same model when we forecast log(y).
multi-step model using ARIMA(9,1,0)(1,0,1)(12): RMSE=0.2082
Rolling scheme model using ARIMA(9,1,0)(1,0,1)(12): RMSE=0.0216
## ME RMSE MAE MPE MAPE
## Test set -0.1832089 0.2076869 0.1832089 -1.744177 1.744177
## ME RMSE MAE MPE MAPE
## Test set -0.001606052 0.02168935 0.01611569 -0.0155215 0.1539586
In conclusion, one-month rolling scheme forecast based on ARIMA(9,1,0)(1,0,1)(12) has the smallest RMSE and shows the best performance for forecasting.
library(Quandl)
Quandl.api_key("XzdSwkDsE98Mxj3ixQzG")
# Total Private Residential Construction Spending : FRED/PRRESCON
library(xts)
library(zoo)
prc <- Quandl("FRED/PRRESCON", type = "zoo")
y=prc
plot(y, xlab="Year", ylab="Spending", main="Total Private Residential Construction Spending")
library(forecast)
par(mfrow = c(2, 1))
Acf(y, type="correlation", lag=48, main="ACF")
Acf(y, type="partial", lag=48, main="PACF")
str(y)
# Multiplicative Seasonal AR Model
yall<-y
fstM <- 1993+0/12
lstM <- 2013+11/12
y1 <- window(yall, end=lstM)
y2 <- window(yall, start=lstM+1/12)
y<- y1
ly <- log(y)
dly1 <- diff(ly)
dly12 <- diff(ly,12)
dly12_1 <- diff(diff(ly),12)
# Original and transformed data
par(mfrow = c(2, 3))
plot(y, main = expression(y))
plot(ly, main = expression(log(y)))
plot.new()
plot(dly1, main = expression(paste(Delta, "log(y)")))
plot(dly12, main = expression(paste(Delta[12], "log(y)")))
plot(dly12_1, main = expression(paste(Delta, Delta[12], "log(y)")))
# ACFs and PACFs
maxlag <- 48
par(mfrow=c(2,4), mar=c(3,3,4,2))
Acf(ly, type='correlation', lag=maxlag, ylab="", main=expression(paste("ACF for log(y)")))
Acf(dly1, type='correlation', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("ACF for ", Delta,"log(y)")))
Acf(dly12, type='correlation', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("ACF for ", Delta[12], "log(y)")))
Acf(dly12_1, type='correlation', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("ACF for ", Delta, Delta[12], "log(y)")))
Acf(ly, type='partial', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("PACF for log(y)")))
Acf(dly1, type='partial', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("PACF for ", Delta, "log(y)")))
Acf(dly12, type='partial', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("PACF for ", Delta[12], "log(y)")))
Acf(dly12_1, type='partial', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("PACF for ", Delta,Delta[12], "log(y)")))
# Estimate model and Check model for adqeuacy :the smallest AIC and higher p-value for Lyung-Box are preferred
# estimate model - twice differenced data1
m1 <- Arima(dly12_1, order = c(3,0,0), seasonal=list(order=c(1,0,0), period=12))
m1
tsdiag(m1, gof.lag=48)
# estimate model - seasonally differenced data
m2 <- Arima(dly12, order = c(3,0,2))
m2
tsdiag(m2, gof.lag=48)
# estimate model - regulary differenced data
#m3 <- Arima(dly1, order = c(3,0,0), seasonal=list(order=c(1,0,1), period=12))
m3 <- Arima(dly1, order = c(9,0,0), seasonal=list(order=c(1,0,1), period=12))
m3
tsdiag(m3, gof.lag=48)
# estimate model - not differenced data
m4<- Arima(ly, order = c(9,1,0), seasonal=list(order=c(1,0,1), period=12), fixed=c(NA,NA, 0,0,0,0,NA,NA,NA,NA,NA))
m4
tsdiag(m4, gof.lag=48)
# construct 36 month ahead forecasts
hmax <-36
m1.f.h <- forecast(m1, h=hmax)
m2.f.h <- forecast(m2, h=hmax)
m3.f.h <- forecast(m3, h=hmax)
m4.f.h <- forecast(m4, h=hmax)
# Plot forecast
par(mfrow=c(2,2), cex=0.7, mar=c(2,4,3,1))
plot(m1.f.h, xlim=c(1993,2016))
lines(diff(diff(log(yall),12)))
plot(m2.f.h, xlim=c(1993,2016))
lines(diff(log(yall),12))
plot(m3.f.h, xlim=c(1993,2016))
lines(diff(log(yall)))
plot(m4.f.h, xlim=c(1993,2016))
lines(log(yall))
# construct 1 month ahead rolling forecasts from model m4, together with their confidence intervals
m4.f.rol <- list()
for(i in 1:(length(y2)+1))
{
y <- window(yall, start=fstM+(i-1)/12, end=lstM+(i-1)/12 )
ly <- log(y)
m4.updt <- Arima(ly, order=c(9,1,0), seasonal=list(order=c(1,0,1),period=12))
m4.updt.f.1 <- forecast(m4.updt,1)
m4.f.rol$mean <- rbind(m4.f.rol$mean, as.zoo(m4.updt.f.1$mean))
m4.f.rol$lower <- rbind(m4.f.rol$lower, m4.updt.f.1$lower)
m4.f.rol$upper <- rbind(m4.f.rol$upper, m4.updt.f.1$upper)
}
m4.f.rol$mean <- as.ts(m4.f.rol$mean)
m4.f.rol$level <- m4.updt.f.1$level
m4.f.rol$x <- window(m4.updt.f.1$x, end=lstM)
class(m4.f.rol) <- class(m4.f.h)
par(mfrow=c(2,1), mar=c(2,4,2,2))
# plot multistep ahead forecasts
plot(m4.f.h, pch=20, xlim=c(2010,2016), ylim=c(9.5, 11.5), main="Multistep Ahead Forecasts")
lines(m4.f.h$mean, type="p", pch=20, lty="dashed", col="blue")
lines(log(yall), type="o", pch=20, lty="dashed")
# plot 1 step ahead rolling forecasts form model m4
plot(m4.f.rol, pch=20, xlim=c(2010,2016), ylim=c(9.5, 11.5), main="1-month Ahead Rolling Forecasts")
lines(m4.f.rol$mean, type="p", pch=20, lty="dashed", col="blue")
lines(log(yall), type="o", pch=20, lty="dashed")
# evaluate forecast accuracy
accuracy(m4.f.h$mean, log(y2))
accuracy(m4.f.rol$mean, log(y2))