Econ 5316 time Series Econometrics(Spring 2017) / Homework5 Problem1 Keunyoung (Kay) Kim

1. Introduction

For the monthly Total Private Residential Construction Spending data, build a time series model by following the Box-Jenkins methodology. Then produce a multistep forecast and a one month ahead forecasts by using the estimated model. Finally, Evaluate their accuracy.

2.Data

Data for this model are imported from Quandl website (FRED/PRRESCON). Based on the ACF and PACF, it looks that AR(2) is the preferred model for original time series. However, this time series shows the seasonality and trend. So we need to consider a seasonal AR or MA component as well as a regular AR or MA component.

3. Methodology

(1) Estimate model

I followed the Box-Jenkins methodology to build a time series model based on the data until the end of 2013. Seasonally(period=12) and regularly differenced data, \(\Delta\Delta_{12} log(y)\) shows better stationarity.

Based on ACF and PACF, AR(2) or AR(9) is preferred for data log(y).

(2) Check model adequacy

I checked model adequacy using tsdiag. As a result, model m3 and m4 show the higher p-values for Ljung-Box statistics and look more adequate. These model also have smaller AICc and BIC.

# estimate model - twice differenced data1
m1 <- Arima(dly12_1, order = c(3,0,0), seasonal=list(order=c(1,0,0), period=12))
m1
## Series: dly12_1 
## ARIMA(3,0,0)(1,0,0)[12] with non-zero mean 
## 
## Coefficients:
##          ar1     ar2      ar3     sar1   mean
##       0.4186  0.3296  -0.1060  -0.0974  2e-04
## s.e.  0.0672  0.0684   0.0649   0.0680  3e-03
## 
## sigma^2 estimated as 0.0003371:  log likelihood=618.48
## AIC=-1224.97   AICc=-1224.61   BIC=-1204.11
tsdiag(m1, gof.lag=48)

# estimate model - seasonally differenced data
m2 <- Arima(dly12, order = c(3,0,2))
m2
## Series: dly12 
## ARIMA(3,0,2) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2      ar3     ma1      ma2    mean
##       0.7913  0.9239  -0.7440  0.6660  -0.2076  0.0452
## s.e.  0.0655  0.0541   0.0621  0.0924   0.0860  0.0559
## 
## sigma^2 estimated as 0.0003256:  log likelihood=623.46
## AIC=-1232.91   AICc=-1232.43   BIC=-1208.55
tsdiag(m2, gof.lag=48)

# estimate model - regulary differenced data

#m3 <- Arima(dly1, order = c(3,0,0), seasonal=list(order=c(1,0,1), period=12)) 
m3 <- Arima(dly1, order = c(9,0,0), seasonal=list(order=c(1,0,1), period=12))
m3
## Series: dly1 
## ARIMA(9,0,0)(1,0,1)[12] with non-zero mean 
## 
## Coefficients:
##          ar1     ar2      ar3      ar4     ar5      ar6      ar7     ar8
##       0.4618  0.2876  -0.1073  -0.0205  0.0520  -0.0071  -0.1629  0.1678
## s.e.  0.0641  0.0685   0.0704   0.0707  0.0701   0.0692   0.0702  0.0678
##           ar9    sar1     sma1    mean
##       -0.1551  0.9767  -0.0348  0.0138
## s.e.   0.0625  0.0088   0.0618  0.0422
## 
## sigma^2 estimated as 0.0003192:  log likelihood=641.88
## AIC=-1257.75   AICc=-1256.21   BIC=-1211.92
tsdiag(m3, gof.lag=48)

# estimate model - not differenced data
m4<- Arima(ly, order = c(9,1,0), seasonal=list(order=c(1,0,1), period=12), fixed=c(NA,NA, 0,0,0,0,NA,NA,NA,NA,NA))
m4
## Series: ly 
## ARIMA(9,1,0)(1,0,1)[12]                    
## 
## Coefficients:
##          ar1     ar2  ar3  ar4  ar5  ar6      ar7     ar8      ar9    sar1
##       0.4347  0.2466    0    0    0    0  -0.1596  0.1638  -0.1584  0.9790
## s.e.  0.0619  0.0616    0    0    0    0   0.0619  0.0651   0.0625  0.0079
##          sma1
##       -0.0384
## s.e.   0.0613
## 
## sigma^2 estimated as 0.0003206:  log likelihood=640.43
## AIC=-1264.86   AICc=-1264.26   BIC=-1236.65
tsdiag(m4, gof.lag=48)

(3) Construct multistep forecasts

I constructed the multistep forecasts for each model(m1, m2, m3, m4).

Since m4 has the smallest AICc and BIC, I finally selected m4 to compare the performance of multistep model with a rolling scheme model.

(4) Construct one-month ahead forecasts

I created one-month ahead rolling forecast based on model m4.

(5) Compare accuracy : multistep vs. one month ahead

RMSE of rolling scheme model using ARIMA(9,1,0)(1,0,1)(12) is much smller than RMSE of multi-step model using same model when we forecast log(y).

multi-step model using ARIMA(9,1,0)(1,0,1)(12): RMSE=0.2082

Rolling scheme model using ARIMA(9,1,0)(1,0,1)(12): RMSE=0.0216

##                  ME      RMSE       MAE       MPE     MAPE
## Test set -0.1832089 0.2076869 0.1832089 -1.744177 1.744177
##                    ME       RMSE        MAE        MPE      MAPE
## Test set -0.001606052 0.02168935 0.01611569 -0.0155215 0.1539586

4. Conclusions

In conclusion, one-month rolling scheme forecast based on ARIMA(9,1,0)(1,0,1)(12) has the smallest RMSE and shows the best performance for forecasting.


* R code for analysis

library(Quandl)
Quandl.api_key("XzdSwkDsE98Mxj3ixQzG")

# Total Private Residential Construction Spending : FRED/PRRESCON
library(xts)

library(zoo)
prc <- Quandl("FRED/PRRESCON", type = "zoo")
y=prc
plot(y, xlab="Year", ylab="Spending", main="Total Private Residential Construction Spending")

library(forecast)
par(mfrow = c(2, 1))
Acf(y, type="correlation", lag=48, main="ACF")
Acf(y, type="partial", lag=48, main="PACF")
str(y)

# Multiplicative Seasonal AR Model
yall<-y
fstM <- 1993+0/12
lstM <- 2013+11/12
y1 <- window(yall, end=lstM)
y2 <- window(yall, start=lstM+1/12)

y<- y1
ly <- log(y)
dly1 <- diff(ly)
dly12 <- diff(ly,12)
dly12_1 <- diff(diff(ly),12)


# Original and transformed data
par(mfrow = c(2, 3))
plot(y, main = expression(y))
plot(ly, main = expression(log(y)))

plot.new()
plot(dly1, main = expression(paste(Delta, "log(y)")))
plot(dly12, main = expression(paste(Delta[12], "log(y)"))) 
plot(dly12_1, main = expression(paste(Delta, Delta[12], "log(y)"))) 


# ACFs and PACFs
maxlag <- 48
par(mfrow=c(2,4), mar=c(3,3,4,2))
Acf(ly, type='correlation', lag=maxlag, ylab="", main=expression(paste("ACF for log(y)")))
Acf(dly1, type='correlation', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("ACF for ", Delta,"log(y)")))
Acf(dly12, type='correlation', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("ACF for ", Delta[12], "log(y)")))
Acf(dly12_1, type='correlation', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("ACF for ", Delta, Delta[12], "log(y)")))

Acf(ly, type='partial', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("PACF for log(y)")))
Acf(dly1, type='partial', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("PACF for ", Delta, "log(y)")))
Acf(dly12, type='partial', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("PACF for ", Delta[12], "log(y)")))
Acf(dly12_1, type='partial', lag=maxlag, na.action=na.omit, ylab="", main=expression(paste("PACF for ", Delta,Delta[12], "log(y)")))

# Estimate model and Check model for adqeuacy :the smallest AIC and higher p-value for Lyung-Box are preferred 

# estimate model - twice differenced data1
m1 <- Arima(dly12_1, order = c(3,0,0), seasonal=list(order=c(1,0,0), period=12))
m1
tsdiag(m1, gof.lag=48)

# estimate model - seasonally differenced data
m2 <- Arima(dly12, order = c(3,0,2))
m2
tsdiag(m2, gof.lag=48)

# estimate model - regulary differenced data

#m3 <- Arima(dly1, order = c(3,0,0), seasonal=list(order=c(1,0,1), period=12)) 
m3 <- Arima(dly1, order = c(9,0,0), seasonal=list(order=c(1,0,1), period=12))
m3
tsdiag(m3, gof.lag=48)


# estimate model - not differenced data
m4<- Arima(ly, order = c(9,1,0), seasonal=list(order=c(1,0,1), period=12), fixed=c(NA,NA, 0,0,0,0,NA,NA,NA,NA,NA))
m4
tsdiag(m4, gof.lag=48)


# construct 36 month ahead forecasts
hmax <-36
m1.f.h <- forecast(m1, h=hmax)
m2.f.h <- forecast(m2, h=hmax)
m3.f.h <- forecast(m3, h=hmax)
m4.f.h <- forecast(m4, h=hmax)

# Plot forecast
par(mfrow=c(2,2), cex=0.7, mar=c(2,4,3,1))

plot(m1.f.h, xlim=c(1993,2016))
lines(diff(diff(log(yall),12)))

plot(m2.f.h, xlim=c(1993,2016))
lines(diff(log(yall),12))

plot(m3.f.h, xlim=c(1993,2016))
lines(diff(log(yall)))

plot(m4.f.h, xlim=c(1993,2016))
lines(log(yall))


# construct 1 month ahead rolling forecasts from model m4, together with their confidence intervals
m4.f.rol <- list()
for(i in 1:(length(y2)+1))
{
  y <- window(yall, start=fstM+(i-1)/12, end=lstM+(i-1)/12 )
  ly <- log(y)
  m4.updt <- Arima(ly, order=c(9,1,0), seasonal=list(order=c(1,0,1),period=12))
  m4.updt.f.1 <- forecast(m4.updt,1)
  m4.f.rol$mean <- rbind(m4.f.rol$mean, as.zoo(m4.updt.f.1$mean))
  m4.f.rol$lower <- rbind(m4.f.rol$lower, m4.updt.f.1$lower)
  m4.f.rol$upper <- rbind(m4.f.rol$upper, m4.updt.f.1$upper)
}
m4.f.rol$mean <- as.ts(m4.f.rol$mean)
m4.f.rol$level <- m4.updt.f.1$level
m4.f.rol$x <- window(m4.updt.f.1$x, end=lstM)
class(m4.f.rol) <- class(m4.f.h)

par(mfrow=c(2,1), mar=c(2,4,2,2))

# plot multistep ahead forecasts
plot(m4.f.h, pch=20, xlim=c(2010,2016), ylim=c(9.5, 11.5), main="Multistep Ahead Forecasts")
lines(m4.f.h$mean, type="p", pch=20, lty="dashed", col="blue")
lines(log(yall), type="o", pch=20, lty="dashed")

# plot 1 step ahead rolling forecasts form model m4
plot(m4.f.rol, pch=20, xlim=c(2010,2016), ylim=c(9.5, 11.5), main="1-month Ahead Rolling Forecasts")
lines(m4.f.rol$mean, type="p", pch=20, lty="dashed", col="blue")
lines(log(yall), type="o", pch=20, lty="dashed")


# evaluate forecast accuracy
accuracy(m4.f.h$mean, log(y2))
accuracy(m4.f.rol$mean, log(y2))