View Code
gdp <- read.csv("GDPC1.csv")
ue <- read.csv("UNRATE.csv")
b_rec <- read.csv("USRECD.csv")
f_rec <- read.csv("RECPROUSM156N.csv")
cons <- read.csv("CSCICP03USM665S.csv")
dat <- sqldf("SELECT A.DATE AS DATE
,A.USRECD AS REC_IND
,B.RECPROUSM156N AS REC_PROB
,C.GDPC1 AS GDP
,D.UNRATE
,E.CSCICP03USM665S AS CONS_OP
FROM b_rec AS A
LEFT JOIN f_rec As B
ON A.DATE = B.DATE
LEFT JOIN gdp AS C
ON A.DATE = C.DATE
LEFT JOIN ue AS D
ON A.DATE = D.DATE
LEFT JOIN cons AS E
ON A.DATE = E.DATE
WHERE A.DATE < '2020-01-01'")
dat <- na.omit(dat)
dat$DATE = as.Date(dat$DATE)
View Code
gdp_plot <- ggplot(data = dat, aes(x=DATE))+
geom_vline(data = subset(dat, REC_IND == 1),aes(xintercept=DATE),color = "#00bc8d", lwd=1.25) +
geom_line(aes(y=GDP),color = "white", lwd = 1) +
ylab("GDP") +
xlab("Date") +
ggtitle("GDP 1967 - 2019 with Recession Indicator") +
theme_dark()
un_plot <- ggplot(data = dat, aes(x=DATE))+
geom_vline(data = subset(dat, REC_IND == 1),aes(xintercept=DATE),color = "#00bc8d", lwd=1.25) +
geom_line(aes(y=UNRATE),color = "white", lwd = 1) +
ylab("Unemployment Rate") +
xlab("Date") +
ggtitle("Unemployment 1967 - 2019 with Recession Indicator") +
theme_dark()
cons_plot <- ggplot(data = dat, aes(x=DATE))+
geom_vline(data = subset(dat, REC_IND == 1),aes(xintercept=DATE),color = "#00bc8d", lwd=1.25) +
geom_line(aes(y=CONS_OP),color = "white", lwd = 1) +
ylab("Consumer Opinion Index") +
xlab("Date") +
ggtitle("Consumer Opinion 1967 - 2019 with Recession Indicator") +
theme_dark()
prob_plot <- ggplot(data = dat, aes(x=DATE))+
geom_vline(data = subset(dat, REC_IND == 1),aes(xintercept=DATE),color = "#00bc8d", lwd=1.25) +
geom_line(aes(y=REC_PROB),color = "white", lwd = 1) +
ylab("Recession Probability") +
xlab("Date") +
ggtitle("Recession Probability 1967 - 2019 with Recession Indicator") +
theme_dark()
View Code
gdp <- ts(dat$GDP, start = c(1967, 1), frequency = 4)
split <- splitTrainTest(gdp, numTrain = length(gdp) - 9)
train <- split$train
test <- split$test
tt <- as.data.frame(train)
tt$x <- 0
ts <- as.data.frame(test)
ts$x <- 1
tt <- rbind(tt, ts)
dat <- cbind(dat, tt)
tt_plot <-
ggplot(data = dat, aes(x=DATE)) +
geom_line(data = subset(dat, x==0), aes(y=GDP, color = "Train"), lwd = 1.25) +
geom_line(data = subset(dat, x==1), aes(y=GDP, color = "Test"), lwd = 2) +
theme_dark() +
ggtitle("Train and Test Separation") +
theme(legend.title = element_blank())
Summary and ACF/PACF of GDP
There is a consistent trend upward over the entire time span. There are no signs of seasonality. It is stationarity with a sharp, negative decline happening with the great recession of 2007- 2008. There is increasing variation as we move across time. ACF has significant positive correlation that tapers to 0 and PACF starts with non-zero values after first lag.
We let the auto.arima() function determine the best order ARIMA (p,d,q)(P,D,Q)s model. It selected ARIMA(2,2,2)(1,0,2)[4].
Using the Ljung-Box test for the Arima model we see that the p-value of 0.1615 is greater than .05, suggesting that there are NO significant autocorrelations between successive forecasting errors and the residuals is white noise. The Ljung-Box test suggests that the data is independently distributed.
## Series: train
## ARIMA(2,2,2)(1,0,2)[4]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sma1 sma2
## -0.5108 0.3432 -0.1406 -0.7211 0.2712 -0.3893 -0.1847
## s.e. 0.3704 0.1139 0.3719 0.2293 0.2048 0.2092 0.0868
##
## sigma^2 estimated as 4853: log likelihood=-1124.42
## AIC=2264.85 AICc=2265.6 BIC=2291.19
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 3.245213 68.08589 50.39667 0.0345882 0.5385398 0.1656864
## ACF1
## Training set -0.02947297
GDP Residuals
We see a residual value that is a negative outlier – this is from the Housing Crisis recession in 2009. This will affect the coverage of our prediction intervals.
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,2,2)(1,0,2)[4]
## Q* = 5.1449, df = 3, p-value = 0.1615
##
## Model df: 7. Total lags used: 10
Plot of Predicted Forecast
We can see that the mean point forecast increase linearly over time. As we move closer to 2019, the prediction intervals also increase.
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2017 Q2 18293.05 18203.78 18382.33 18156.52 18429.59
## 2017 Q3 18391.38 18241.50 18541.26 18162.15 18620.61
## 2017 Q4 18473.76 18263.48 18684.05 18152.17 18795.36
## 2018 Q1 18552.81 18290.57 18815.04 18151.76 18953.86
## 2018 Q2 18633.84 18325.03 18942.65 18161.55 19106.13
## 2018 Q3 18719.51 18368.78 19070.23 18183.12 19255.89
## 2018 Q4 18802.55 18409.37 19195.74 18201.23 19403.88
## 2019 Q1 18878.44 18444.39 19312.48 18214.62 19542.25
## 2019 Q2 18960.33 18491.88 19428.77 18243.90 19676.75
## Qtr1 Qtr2 Qtr3 Qtr4
## 2017 66.37804 139.10334 180.61830
## 2018 199.54862 180.08393 230.84107 218.04650
## 2019 263.30698 293.63324
## [1] 385011.5
Exponential Smoothing Summary
The best ETS model is additive level, damped-additive trend and has no seasonality. The smoothing parameters consist of \(\alpha\) = 0.9999, \(\beta\) = 0.357, \(\phi\) = 0.9447, \(\sigma\) = 72.4475
Using the Ljung-Box test for the ETS model we see that the p-value of 0.005529 is less than .05, suggesting that there are significant autocorrelations between successive forecasting errors and the residuals is not white noise. The model has some significant autocorrelation in the residuals, indicating that our prediction intervals may not have accurate coverage.
## ETS(A,Ad,N)
##
## Call:
## ets(y = train)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 0.357
## phi = 0.9447
##
## Initial states:
## l = 4570.7731
## b = 43.7152
##
## sigma: 72.4475
##
## AIC AICc BIC
## 2794.612 2795.045 2814.431
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 10.16714 71.54071 54.24635 0.1036165 0.5779917 0.1783428
## ACF1
## Training set 0.009280819
Residuals
We see the same negative residual outlier as from the ARIMA model. This will affect the coverage of our prediction intervals.
##
## Ljung-Box test
##
## data: Residuals from ETS(A,Ad,N)
## Q* = 12.622, df = 3, p-value = 0.005529
##
## Model df: 5. Total lags used: 8
View Code
Results of Sliding Window
k <- 140
n <- length(gdp)
p <- 4
H <- 6
st <- tsp(gdp)[1]+(k-2)/p # gives the start time in time units,
mae_arima_expand <- matrix(NA,n-k,H)
mae_arima_slide <- matrix(NA,n-k,H)
mae_ets_expand <- matrix(NA,n-k,H)
mae_ets_slide <- matrix(NA,n-k,H)
rmse_arima_expand <- matrix(NA,n-k,H)
rmse_arima_slide <- matrix(NA,n-k,H)
rmse_ets_expand <- matrix(NA,n-k,H)
rmse_ets_slide <- matrix(NA,n-k,H)
aicc_values <- data.frame(iter = numeric(), aicc_arima_expand = numeric(),
aicc_arima_slide = numeric(), aicc_ets_expand = numeric(),
aicc_ets_slide = numeric())
for(i in 1:(n-k)) {
### One Month rolling forecasting
# Expanding Window
train_expand <- window(gdp, end=st + i/p) ## Window Length: k+i
# Sliding Window - keep the training window of fixed length.
# The training set always consists of k observations.
train_slide <- window(gdp, start=st+(i-k+1)/p, end=st+i/p) ## Window Length: k
test <- window(gdp, start=st + (i+1)/p, end=st + (i+H)/p) ## Window Length: H
fit_1 <- Arima(train_expand, order=c(1,2,2),
include.drift=FALSE, lambda=0, method="ML")
fcast_arima_expand <- forecast(fit_1, h=H)
fit_2 <- Arima(train_slide, order=c(1,2,2),
include.drift=FALSE, lambda=0, method="ML")
fcast_arima_slide <- forecast(fit_2, h=H)
ets_expand <- ets(train_expand)
fcast_ets_expand <- forecast(ets_expand, h = H)
ets_slide <- ets(train_slide)
fcast_ets_slide <- forecast(ets_slide, h = H)
mae_arima_expand[i,1:length(test)] <- abs(fcast_arima_expand[['mean']]-test)
mae_arima_slide[i,1:length(test)] <- abs(fcast_arima_slide[['mean']]-test)
mae_ets_expand[i,1:length(test)] <- abs(fcast_ets_expand[['mean']]-test)
mae_ets_slide[i,1:length(test)] <- abs(fcast_ets_slide[['mean']]-test)
rmse_arima_expand[i,1:length(test)]<- (fcast_arima_expand[['mean']]-test)^2
rmse_arima_slide[i,1:length(test)]<- (fcast_arima_slide[['mean']]-test)^2
rmse_ets_expand[i,1:length(test)]<- (fcast_ets_expand[['mean']]-test)^2
rmse_ets_slide[i,1:length(test)]<- (fcast_ets_slide[['mean']]-test)^2
aicc_arima_expand <- fcast_arima_expand$model$aicc
aicc_arima_slide <- fcast_arima_slide$model$aicc
aicc_ets_expand <- fcast_ets_expand$model$aicc
aicc_ets_slide <- fcast_ets_slide$model$aicc
add_to_aicc <- data.frame(iter = i, aicc_arima_expand, aicc_arima_slide,
aicc_ets_expand, aicc_ets_slide)
aicc_values <- rbind(aicc_values, add_to_aicc)
}
mae_rmse <- data.frame(h = 1:H, mae_arima_expand = colMeans(mae_arima_expand, na.rm = TRUE),
mae_arima_slide = colMeans(mae_arima_slide, na.rm = TRUE),
mae_ets_expand = colMeans(mae_ets_expand, na.rm = TRUE),
mae_ets_slide = colMeans(mae_ets_slide, na.rm = TRUE),
rmse_arima_expand = sqrt(colMeans(rmse_arima_expand, na.rm = TRUE)),
rmse_arima_slide = sqrt(colMeans(rmse_arima_slide, na.rm = TRUE)),
rmse_ets_expand = sqrt(colMeans(rmse_ets_expand, na.rm = TRUE)),
rmse_ets_slide = sqrt(colMeans(rmse_ets_slide, na.rm = TRUE)))
colors <- c("Arima Expanding Window" = "darkred", "Arima Sliding Window" = "darkblue",
"ETS Expanding Window" = "green", "ETS Sliding Window" = "cyan")
p_rmse <-
ggplot(data = mae_rmse, aes(x = h)) +
geom_line(aes(y = rmse_arima_expand, color = "Arima Expanding Window"), lwd=1.15) +
geom_line(aes(y = rmse_arima_slide, color = "Arima Sliding Window"), lwd=1.15) +
geom_line(aes(y = rmse_ets_expand, color = "ETS Expanding Window"), lwd=1.15) +
geom_line(aes(y = rmse_ets_slide, color = "ETS Sliding Window"), lwd=1.15) +
scale_color_manual(name = "", values = colors) +
labs(x = "Horizon",
y = "RMSE",
color = "Legend",
title = "Root-square Forecast Error (RMSE)") +
theme_dark() +
theme(legend.title = element_blank(),legend.position = "bottom")
p_mae <-
ggplot(data = mae_rmse, aes(x = h)) +
geom_line(aes(y = mae_arima_expand, color = "Arima Expanding Window"), lwd=1.15) +
geom_line(aes(y = mae_arima_slide, color = "Arima Sliding Window"), lwd=1.15) +
geom_line(aes(y = mae_ets_expand, color = "ETS Expanding Window"), lwd=1.15) +
geom_line(aes(y = mae_ets_slide, color = "ETS Sliding Window"), lwd=1.15) +
scale_color_manual(name = "", values = colors) +
labs(x = "Horizion",
y = "MAE",
color = "Legend",
title = "Mean Absolute Forecast Error (MAE)") +
theme_dark() +
theme(legend.title = element_blank(),legend.position = "bottom")
p_aic <-
ggplot(data = aicc_values, aes(x = iter)) +
geom_line(aes(y = aicc_arima_expand, color = "Arima Expanding Window"), lwd=1.15) +
geom_line(aes(y = aicc_arima_slide, color = "Arima Sliding Window"), lwd=1.15) +
geom_line(aes(y = aicc_ets_expand, color = "ETS Expanding Window"), lwd=1.15) +
geom_line(aes(y = aicc_ets_slide, color = "ETS Sliding Window"), lwd=1.15) +
scale_color_manual(name = "", values = colors) +
labs(x = "Iteration Number",
y = "AICc",
color = "Legend",
title = "AICc vs Iteration Number") +
theme_dark() +
theme(legend.title = element_blank(),legend.position = "bottom")
p_aic
View Code
data<-read.csv('All_Data.csv')
data_ts<-ts(data,frequency=4,start=c(1967,3))
log_gdp<-log(data_ts[,'GDP'])
log_unrate<-log(data_ts[,'UNRATE'])
log_consop<-log(data_ts[,'CONS_OP'])
log_recprob<-log(data_ts[,'REC_PROB']+0.001)
reorg_ts<-cbind(log_gdp,log_unrate,log_consop,log_recprob)
View Code
diff_log_gdp<-diff(log_gdp,lag=1,difference=1)
diff_log_unrate<-diff(log_unrate,lag=1,difference=1)
diff_log_consop<-diff(log_consop,lag=1,difference=1)
diff_log_recprob<-diff(log_recprob,lag=1,difference=1)
plot_df <- as.data.frame(cbind(diff_log_gdp, log_unrate, log_consop, log_recprob))
p_unrate <-
ggplot(data = plot_df, aes(x=diff_log_gdp)) +
geom_point(aes(y=log_unrate), color = "#00bc8d") +
xlab("Log GDP") +
ylab("Log Unemployment Rate") +
ggtitle("Log GDP vs Log Unemployment Rate") +
theme_dark()
p_cons <-
ggplot(data = plot_df, aes(x=diff_log_gdp)) +
geom_point(aes(y=log_consop), color = "#00bc8d") +
xlab("Log GDP") +
ylab("Log Consumer Opinion") +
ggtitle("Log GDP vs Log Consumer Opinion") +
theme_dark()
p_recprob <-
ggplot(data = plot_df, aes(x=diff_log_gdp)) +
geom_point(aes(y=log_recprob), color = "#00bc8d") +
xlab("Log GDP") +
ylab("Log Recession Probability") +
ggtitle("Log GDP vs Log Recession Probability Rate") +
theme_dark()
gdp_cor <- data.frame("Log GDP",cor(diff_log_gdp,diff_log_unrate),cor(diff_log_gdp,diff_log_consop),cor(diff_log_gdp,diff_log_recprob))
x <- c("x","Log Unemployment", "Log Consumer", "Log Rec. Prob")
colnames(gdp_cor) <- x
KPSS Tests
GDP KPSS Test
##
## KPSS Test for Level Stationarity
##
## data: diff_log_gdp
## KPSS Level = 0.23151, Truncation lag parameter = 4, p-value = 0.1
Consumer Opinion KPSS Test
##
## KPSS Test for Level Stationarity
##
## data: diff_log_consop
## KPSS Level = 0.051138, Truncation lag parameter = 4, p-value = 0.1
Recession Probability KPSS Test
##
## KPSS Test for Level Stationarity
##
## data: diff_log_recprob
## KPSS Level = 0.014937, Truncation lag parameter = 4, p-value = 0.1
Using the VARselect() function in R, we can determine the best possible value for the lag length in our VAR model. Based on all selection tools p = 2 seems to be the most appropriate. Thus, we will be running a VAR(2) model.
train_gdp<-window(diff_log_gdp,end=c(2010,1))
test_gdp<-window(diff_log_gdp,start=c(2012,2))
train_unrate<-window(diff_log_unrate,end=c(2010,1))
test_unrate<-window(diff_log_unrate,start=c(2012,2))
train_consop<-window(diff_log_unrate,end=c(2010,1))
test_consop<-window(diff_log_unrate,start=c(2012,2))
train_recprob<-window(diff_log_recprob,end=c(2010,1))
test_recprob<-window(diff_log_recprob,start=c(2012,2))
VARselect(cbind(train_gdp,train_unrate))
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 2 2 2 2
##
## $criteria
## 1 2 3 4
## AIC(n) -1.598185e+01 -1.612772e+01 -1.609286e+01 -1.606589e+01
## HQ(n) -1.593502e+01 -1.604968e+01 -1.598360e+01 -1.592541e+01
## SC(n) -1.586653e+01 -1.593552e+01 -1.582378e+01 -1.571994e+01
## FPE(n) 1.145972e-07 9.904606e-08 1.025670e-07 1.053839e-07
## 5 6 7 8
## AIC(n) -1.603749e+01 -1.599569e+01 -1.595857e+01 -1.595076e+01
## HQ(n) -1.586579e+01 -1.579277e+01 -1.572443e+01 -1.568540e+01
## SC(n) -1.561465e+01 -1.549597e+01 -1.538197e+01 -1.529728e+01
## FPE(n) 1.084414e-07 1.131029e-07 1.174255e-07 1.184061e-07
## 9 10
## AIC(n) -1.595009e+01 -1.597817e+01
## HQ(n) -1.565352e+01 -1.565038e+01
## SC(n) -1.521973e+01 -1.517093e+01
## FPE(n) 1.185614e-07 1.153702e-07
var_model<-VAR(cbind(train_gdp,train_unrate),p=2,type="both")
summary(var_model)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: train_gdp, train_unrate
## Deterministic variables: both
## Sample size: 168
## Log Likelihood: 890.164
## Roots of the characteristic polynomial:
## 0.4928 0.4928 0.4089 0.4089
## Call:
## VAR(y = cbind(train_gdp, train_unrate), p = 2, type = "both")
##
##
## Estimation results for equation train_gdp:
## ==========================================
## train_gdp = train_gdp.l1 + train_unrate.l1 + train_gdp.l2 + train_unrate.l2 + const + trend
##
## Estimate Std. Error t value Pr(>|t|)
## train_gdp.l1 2.119e-01 9.139e-02 2.318 0.02168 *
## train_unrate.l1 -1.958e-02 1.508e-02 -1.298 0.19602
## train_gdp.l2 1.529e-01 9.697e-02 1.577 0.11683
## train_unrate.l2 1.302e-02 1.385e-02 0.940 0.34846
## const 5.288e-03 1.754e-03 3.015 0.00299 **
## trend -9.261e-06 1.294e-05 -0.716 0.47530
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.008012 on 162 degrees of freedom
## Multiple R-Squared: 0.14, Adjusted R-squared: 0.1134
## F-statistic: 5.273 on 5 and 162 DF, p-value: 0.0001629
##
##
## Estimation results for equation train_unrate:
## =============================================
## train_unrate = train_gdp.l1 + train_unrate.l1 + train_gdp.l2 + train_unrate.l2 + const + trend
##
## Estimate Std. Error t value Pr(>|t|)
## train_gdp.l1 -2.396e+00 5.115e-01 -4.684 5.93e-06 ***
## train_unrate.l1 5.197e-02 8.439e-02 0.616 0.539
## train_gdp.l2 -2.308e+00 5.427e-01 -4.252 3.57e-05 ***
## train_unrate.l2 6.897e-02 7.752e-02 0.890 0.375
## const 4.600e-02 9.817e-03 4.685 5.89e-06 ***
## trend -8.383e-05 7.244e-05 -1.157 0.249
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.04484 on 162 degrees of freedom
## Multiple R-Squared: 0.4121, Adjusted R-squared: 0.3939
## F-statistic: 22.71 on 5 and 162 DF, p-value: < 2.2e-16
##
##
##
## Covariance matrix of residuals:
## train_gdp train_unrate
## train_gdp 6.419e-05 -0.0001922
## train_unrate -1.922e-04 0.0020107
##
## Correlation matrix of residuals:
## train_gdp train_unrate
## train_gdp 1.000 -0.535
## train_unrate -0.535 1.000
We can use a Portmanteau test to determine if there is any serial correlation between the residuals. Because our p-value (.1552) is greater than .05, we fail to reject the null hypothesis, thus suggesting that there is little to no serial correlation. The model fits well with our data.
Serial Test of Var Model
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var_model
## Chi-squared = 66.693, df = 56, p-value = 0.1552
forecast_obj<-forecast(var_model,h=9)
autoplot(forecast_obj)
log_gdp_obj = forecast_obj[[2]]$train_gdp
log_gdp_fc = log_gdp_obj[3]$mean
gdp<-exp(diffinv(c(log_gdp[1],train_gdp,log_gdp_fc)))
gdp_fc<-tail(gdp,n=9)
gdp_value<-window(data_ts[,'GDP'],start=c(2010,2),end=c(2012,2))
accuracy(gdp_fc,gdp_value)
## ME RMSE MAE MPE MAPE ACF1
## Test set -36.63795 75.47476 66.07281 -0.2291987 0.4170155 0.4614275
## Theil's U
## Test set 0.7583586