Introduction

This assignment involves an analysis of real GDP, using actual historical data. Taking the data for real GDP from 1947 to 2008, a model will be created to forecast real GDP for the time period from 2009 to 2016.

Data

Real Personal Consumption Expenditures

Quarterly data of Real GDP from 1947 to 2016 can be found on the FRED website. A graph of this data can be found below.

# import time series for Real Gross Domestic Product
RGDP <- Quandl("FRED/GDPC1", type="zoo")
plot(RGDP, main = "Real Gross Domestic Product", sub = "Billions of 2005 Dollars")

Log Changes

The data was placed into a time series plot of log changes involving Real GDP: \(y_t=\varDelta log c_t = log c_t - log c_{t-1}\). The plot of this graph is shown below.

# construct log change in RGDP
RGDP_time <- diff(log(RGDP))
plot(RGDP_time, main = "Real Gross Domestic Product - Change of Logs")

Methodlogy

The data was split into two time periods. The first time period is from 1947 to 2008, while the second time period is from 2009 to 2016. This was done so that the data from the first time period could be used to forecast the real GDP for the second time period.

dlrGDPp1 <- window(RGDP_time, end="2008 Q4")
dlrGDPp2 <- window(RGDP_time, start="2009 Q1")

Autocorrelation and Partial Autocorrelation Functions

Both the ACF and PACF functions for the first time period were created. These two functions for \(y_t\) are plotted below.

par(mfrow=c(2,1))
Acf(dlrGDPp1,type='correlation', main = "ACF for Real GDP")
Acf(dlrGDPp1,type='partial', main = "PACF for Real GDP")

Estimating Suitable Models

The best ARMA model was chosen using the auto.arima function. It was decided that an ARMA(2,2) model would be the best fit, which seems to fit the ACF and PACF functions found previously.

auto.arima(dlrGDPp1, ic="aicc", stepwise = FALSE)
## Series: dlrGDPp1 
## ARIMA(2,0,2) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2  intercept
##       1.2808  -0.6706  -0.9728  0.4962     0.0081
## s.e.  0.1719   0.1646   0.2083  0.1792     0.0008
## 
## sigma^2 estimated as 8.294e-05:  log likelihood=812.48
## AIC=-1612.96   AICc=-1612.61   BIC=-1591.91
ARMA22 <- Arima(dlrGDPp1, order = c(2,0,2))

Testing for Adequacy and Analyzing the Residuals

The model was then tested for adequacy. This was done by checking for stationarity and invertibility using plot inverted AR and MA roots. The plots below show that all roots can be found inside of the unit circle, indicating that they pass the test. The residuals for the model were also diagnosed. Based on the p-values for the residuals in the Ljung-Box statistic, the ARMA(2,2) model is acceptable.

plot.Arima(ARMA22)

tsdiag(ARMA22)

Generating Forecasts

The ARMA(2,2) model was then forecasted for the period from 2009 to 2016. This was done using both a multistep forecast and a 1 step ahead rolling forecast. A plot was then created showing these forecasts, along with the actual real GDP for this time period gathered from the FRED website.

ARMA22.f.h <- forecast(ARMA22, length(dlrGDPp2))
fstQ <- 1947.25
lstQ <- 2008.75
m1.f.rol <- zoo()
for(i in 1:length(dlrGDPp2))
{
  y <- window( RGDP_time, start=fstQ+(i-1)/4, end=lstQ+(i-1)/4)
  m1.updt <- arima(y, order=c(2,0,2))
  m1.f.rol <- c(m1.f.rol, forecast(m1.updt, 1)$mean)
}
m1.f.rol <- as.ts(m1.f.rol)

plot(ARMA22.f.h, type="o", pch=16, xlim=c(2005,2016), ylim=c(-0.03,0.03),
  main="ARMA (2,2) Multistep and Rolling Forecasts - U.S. Real GDP")
lines(ARMA22.f.h$mean, type="p", pch=16, col="blue")
lines(m1.f.rol, type="o", pch=16, col="red")
lines(RGDP_time, type="o", pch=16)
legend(2010, -0.01, c("Log of Real GDP", "ARMA(2,2) forecast", "Rolling Scheme forecast"), 
       col = c(1, 2, 4), lty = c(1, 1, 1), merge = TRUE, bg = "gray90")

Accuracy

The sample accuracy of the two forecasts were found below. The accuracy of an AR(1) model was also found and compared to these two forecasts.

AR1 <- Arima(dlrGDPp1, order = c(1,0,0))
AR1.f.h <- forecast(AR1, length(dlrGDPp2))

fstQ <- 1947.25
lstQ <- 2008.75
m2.f.rol <- zoo()
for(i in 1:length(dlrGDPp2))
{
  y <- window( RGDP_time, start=fstQ+(i-1)/4, end=lstQ+(i-1)/4)
  m2.updt <- arima(y, order=c(1,0,0))
  m2.f.rol <- c(m2.f.rol, forecast(m2.updt, 1)$mean)
}
m2.f.rol <- as.ts(m2.f.rol)


accuracy(ARMA22.f.h$mean, dlrGDPp2)
##                    ME        RMSE         MAE       MPE     MAPE
## Test set -0.003506586 0.005294129 0.004239407 -159.3768 234.5007
accuracy(m1.f.rol, dlrGDPp2)
##                    ME        RMSE         MAE       MPE    MAPE
## Test set -0.002202307 0.004997696 0.003901092 -96.81412 164.633
accuracy(AR1.f.h$mean, dlrGDPp2)
##                    ME        RMSE         MAE       MPE     MAPE
## Test set -0.003124662 0.005080373 0.004035371 -145.1532 227.4407
accuracy(m2.f.rol, dlrGDPp2)
##                    ME        RMSE         MAE       MPE     MAPE
## Test set -0.001829009 0.004835928 0.003944857 -94.16649 169.6169

Conclusion

For both the multistep and rolling forecasts, the AR(1) had a lower real mean square error, mean absolute error and mean absolute percentage error than the ARMA(2,2) model. It should be noted though, that the p values for the Ljung-Box of the AR(1) model were barely adequate, however.

This could indicate a flaw when using the auto.arima function, regarding overfitting the data. The fact that the AR(1) projection better fit the data could also be due to random chance, or because the recession of 2008 could have had an impact on Real GDP, as shown by the first graph.