Data Sets

FRED



Federal Reserve Economic Data
Database maintained by the Federal Reserve Bank of St. Louis, with more than 765,000 economic time series from 96 different sources coverning:
  • banking
  • business/fiscal
  • consumer indexes
  • employment and population
  • exchange rates
  • gross domestic product
  • interest rates
  • monetary aggregates
  • producer price indexes
  • reserves and monetary base
  • U.S. trade and international transactions
  • U.S. financial data

Data Selection


Recession Data
USRECD - NBER based Recession Indicators for the United States from the Period following the Peak through the Trough
This time series is an interpretation of US Business Cycle Expansions and Contractions data provided by The National Bureau of Economic Research (NBER) at http://www.nber.org/cycles/cyclesmain.html. The NBER identifies months and quarters of turning points without designating a date within the period that turning points occurred. The dummy variable adopts an arbitrary convention that the turning point occurred at a specific date within the period. The arbitrary convention does not reflect any judgment on this issue by the NBER’s Business Cycle Dating Committee. Our time series is composed of dummy variables that represent periods of expansion and recession. A value of 1 is a recessionary period, while a value of 0 is an expansionary period. For this time series, the recession begins on the 15th day of the month of the peak and ends on the 15th day of the month of the trough.
Data profile:
  • Range: 1854-12-01 to 2020-08-17
  • Frequency: Daily
  • Data Type: Binary
  • Not Seasonally Adjusted



RECPROUSM156N - Smoothed U.S. Recession Probabilities
Smoothed recession probabilities for the United States are obtained from a dynamic-factor markov-switching model applied to four monthly coincident variables: non-farm payroll employment, the index of industrial production, real personal income excluding transfer payments, and real manufacturing and trade sales. This model was originally developed in Chauvet, M., “An Economic Characterization of Business Cycle Dynamics with Factor Structure and Regime Switching.”
Data profile:
  • Range: 1967-06-01 to 2020-06-01
  • Frequency: Monthly
  • Data Type: Float
  • Not Seasonally Adjusted


GDP Data
GDPC1 - Real Gross Domestic Product
Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States.For more information see the Guide to the National Income and Product Accounts of the United States (NIPA).
Data profile:
  • Range: 1947-01-01 to 2020-04-01
  • Frequency: Quarterly
  • Data Type: Float
  • Seasonally Adjusted


Unemployment Data
UNRATE - Unemployment Rate
The unemployment rate represents the number of unemployed as a percentage of the labor force. Labor force data are restricted to people 16 years of age and older, who currently reside in 1 of the 50 states or the District of Columbia, who do not reside in institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces.
Data profile:
  • Range: 1948-01-01 to 2020-07-01
  • Frequency: Monthly
  • Data Type: Float
  • Seasonally Adjusted


Consumer Opinion Data
CSCICP03USM665S - Consumer Opinion Surveys: Confidence Indicators: Composite Indicators: OECD Indicator for the United States
This consumer confidence indicator provides an indication of future developments of households’ consumption and saving, based upon answers regarding their expected financial situation, their sentiment about the general economic situation, unemployment and capability of savings. An indicator above 100 signals a boost in the consumers’ confidence towards the future economic situation, as a consequence of which they are less prone to save, and more inclined to spend money on major purchases in the next 12 months. Values below 100 indicate a pessimistic attitude towards future developments in the economy, possibly resulting in a tendency to save more and consume less
Data profile:
  • Range: 1960-01-01 to 2020-07-01
  • Frequency: Monthly
  • Data Type: Float
  • Seasonally Adjusted
  • Normalized (Normal = 100)

Data Preparation

Data combination and Truncation
This data presented several different frequencies and periods of reporting. The recesssion indicator went back to 1854 and was reported DAILY. However, most of our data was reported on quarterly and the latest a data set started was 1960. We truncated 2020 (under instruction) and ended up with 210 observations, represented below.

View Code

gdp <- read.csv("GDPC1.csv")
ue <- read.csv("UNRATE.csv")
b_rec <- read.csv("USRECD.csv")
f_rec <- read.csv("RECPROUSM156N.csv")
cons <- read.csv("CSCICP03USM665S.csv")
dat <- sqldf("SELECT  A.DATE AS DATE
                          ,A.USRECD AS REC_IND
                          ,B.RECPROUSM156N AS REC_PROB
                          ,C.GDPC1 AS GDP
                          ,D.UNRATE
                          ,E.CSCICP03USM665S AS CONS_OP
                  FROM    b_rec AS A
                  LEFT JOIN f_rec As B
                  ON A.DATE = B.DATE
                  LEFT JOIN gdp AS C
                  ON A.DATE = C.DATE
                  LEFT JOIN ue AS D
                  ON A.DATE = D.DATE
                  LEFT JOIN cons AS E
                  ON A.DATE = E.DATE
                  WHERE A.DATE < '2020-01-01'")

dat <- na.omit(dat)
dat$DATE = as.Date(dat$DATE)


Visual Inspection As a precursory measure we wanted to just take a look at each of our variables. The recession indicator was binary so we overlayed that with the other four variables. GDP, although seasonally adjusted, definitely has a trend. Unemployment is seasonally adjusted but we will still need to examine for trend. Consumer opinion index does not appear to have seasonality or trend but we will still test for that.

View Code

gdp_plot <- ggplot(data = dat, aes(x=DATE))+
  geom_vline(data = subset(dat, REC_IND == 1),aes(xintercept=DATE),color = "#00bc8d", lwd=1.25) +
  geom_line(aes(y=GDP),color = "white", lwd = 1) +
  ylab("GDP") +
  xlab("Date") +
  ggtitle("GDP 1967 - 2019 with Recession Indicator") +
  theme_dark()

un_plot <- ggplot(data = dat, aes(x=DATE))+
  geom_vline(data = subset(dat, REC_IND == 1),aes(xintercept=DATE),color = "#00bc8d", lwd=1.25) +
  geom_line(aes(y=UNRATE),color = "white", lwd = 1) +
  ylab("Unemployment Rate") +
  xlab("Date") +
  ggtitle("Unemployment 1967 - 2019 with Recession Indicator") +
  theme_dark()

cons_plot <- ggplot(data = dat, aes(x=DATE))+
  geom_vline(data = subset(dat, REC_IND == 1),aes(xintercept=DATE),color = "#00bc8d", lwd=1.25) +
  geom_line(aes(y=CONS_OP),color = "white", lwd = 1) +
  ylab("Consumer Opinion Index") +
  xlab("Date") +
  ggtitle("Consumer Opinion 1967 - 2019 with Recession Indicator") +
  theme_dark()

prob_plot <- ggplot(data = dat, aes(x=DATE))+
  geom_vline(data = subset(dat, REC_IND == 1),aes(xintercept=DATE),color = "#00bc8d", lwd=1.25) +
  geom_line(aes(y=REC_PROB),color = "white", lwd = 1) +
  ylab("Recession Probability") +
  xlab("Date") +
  ggtitle("Recession Probability 1967 - 2019 with Recession Indicator") +
  theme_dark()


Train and Test

View Code

gdp <- ts(dat$GDP, start = c(1967, 1), frequency = 4)
split <- splitTrainTest(gdp, numTrain = length(gdp) - 9)
train <- split$train
test <- split$test

tt <- as.data.frame(train)
tt$x <- 0
ts <- as.data.frame(test)
ts$x <- 1
tt <- rbind(tt, ts)
dat <- cbind(dat, tt)


tt_plot <- 
  ggplot(data = dat, aes(x=DATE)) + 
  geom_line(data = subset(dat, x==0), aes(y=GDP, color = "Train"), lwd = 1.25) + 
  geom_line(data = subset(dat, x==1), aes(y=GDP, color = "Test"), lwd = 2) +
  theme_dark() +
  ggtitle("Train and Test Separation") +
  theme(legend.title = element_blank())

ARIMA


Summary and ACF/PACF of GDP
There is a consistent trend upward over the entire time span. There are no signs of seasonality. It is stationarity with a sharp, negative decline happening with the great recession of 2007- 2008. There is increasing variation as we move across time. ACF has significant positive correlation that tapers to 0 and PACF starts with non-zero values after first lag.

We let the auto.arima() function determine the best order ARIMA (p,d,q)(P,D,Q)s model. It selected
ARIMA(2,2,2)(1,0,2)[4].

Using the Ljung-Box test for the Arima model we see that the p-value of
0.1615 is greater than .05, suggesting that there are NO significant autocorrelations between successive forecasting errors and the residuals is white noise. The Ljung-Box test suggests that the data is independently distributed.

## Series: train 
## ARIMA(2,2,2)(1,0,2)[4] 
## 
## Coefficients:
##           ar1     ar2      ma1      ma2    sar1     sma1     sma2
##       -0.5108  0.3432  -0.1406  -0.7211  0.2712  -0.3893  -0.1847
## s.e.   0.3704  0.1139   0.3719   0.2293  0.2048   0.2092   0.0868
## 
## sigma^2 estimated as 4853:  log likelihood=-1124.42
## AIC=2264.85   AICc=2265.6   BIC=2291.19
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE      MAPE      MASE
## Training set 3.245213 68.08589 50.39667 0.0345882 0.5385398 0.1656864
##                     ACF1
## Training set -0.02947297


GDP Residuals
We see a residual value that is a negative outlier – this is from the Housing Crisis recession in 2009. This will affect the coverage of our prediction intervals.

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,2,2)(1,0,2)[4]
## Q* = 5.1449, df = 3, p-value = 0.1615
## 
## Model df: 7.   Total lags used: 10


Plot of Predicted Forecast
We can see that the mean point forecast increase linearly over time. As we move closer to 2019, the prediction intervals also increase.

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2017 Q2       18293.05 18203.78 18382.33 18156.52 18429.59
## 2017 Q3       18391.38 18241.50 18541.26 18162.15 18620.61
## 2017 Q4       18473.76 18263.48 18684.05 18152.17 18795.36
## 2018 Q1       18552.81 18290.57 18815.04 18151.76 18953.86
## 2018 Q2       18633.84 18325.03 18942.65 18161.55 19106.13
## 2018 Q3       18719.51 18368.78 19070.23 18183.12 19255.89
## 2018 Q4       18802.55 18409.37 19195.74 18201.23 19403.88
## 2019 Q1       18878.44 18444.39 19312.48 18214.62 19542.25
## 2019 Q2       18960.33 18491.88 19428.77 18243.90 19676.75
##           Qtr1      Qtr2      Qtr3      Qtr4
## 2017            66.37804 139.10334 180.61830
## 2018 199.54862 180.08393 230.84107 218.04650
## 2019 263.30698 293.63324
## [1] 385011.5

Exponential Smoothing


Exponential Smoothing Summary
The best ETS model is additive level, damped-additive trend and has no seasonality. The smoothing parameters consist of \(\alpha\) = 0.9999, \(\beta\) = 0.357, \(\phi\) = 0.9447, \(\sigma\) = 72.4475

Using the Ljung-Box test for the ETS model we see that the p-value of
0.005529 is less than .05, suggesting that there are significant autocorrelations between successive forecasting errors and the residuals is not white noise. The model has some significant autocorrelation in the residuals, indicating that our prediction intervals may not have accurate coverage.

## ETS(A,Ad,N) 
## 
## Call:
##  ets(y = train) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.357 
##     phi   = 0.9447 
## 
##   Initial states:
##     l = 4570.7731 
##     b = 43.7152 
## 
##   sigma:  72.4475
## 
##      AIC     AICc      BIC 
## 2794.612 2795.045 2814.431 
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE      MAPE      MASE
## Training set 10.16714 71.54071 54.24635 0.1036165 0.5779917 0.1783428
##                     ACF1
## Training set 0.009280819



Residuals

We see the same negative residual outlier as from the ARIMA model. This will affect the coverage of our prediction intervals.

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,N)
## Q* = 12.622, df = 3, p-value = 0.005529
## 
## Model df: 5.   Total lags used: 8

Sliding Window

We applied the time-series cross validation to train and test models. We used expanding and sliding training windows for both the ARIMA and ETS models. Based on the three graphs below, it is clear that the ARIMA model is a better fit, no matter which forecast horizon is used. Overall, it seems that the ARIMA Expanding window is a better fit – it has a lower AICc value for each iteration number, a lower MAE as the forecast horizon increases, and a similar RMSE to the ARIMA Sliding window.

View Code Results of Sliding Window

k <- 140
n <- length(gdp)

p <- 4
H <- 6

st <- tsp(gdp)[1]+(k-2)/p #  gives the start time in time units,

mae_arima_expand <- matrix(NA,n-k,H)
mae_arima_slide <- matrix(NA,n-k,H)
mae_ets_expand <- matrix(NA,n-k,H)
mae_ets_slide <- matrix(NA,n-k,H)

rmse_arima_expand <- matrix(NA,n-k,H)
rmse_arima_slide <- matrix(NA,n-k,H)
rmse_ets_expand <- matrix(NA,n-k,H)
rmse_ets_slide <- matrix(NA,n-k,H)

aicc_values <- data.frame(iter = numeric(), aicc_arima_expand = numeric(),
                          aicc_arima_slide = numeric(), aicc_ets_expand = numeric(),
                          aicc_ets_slide = numeric())

for(i in 1:(n-k)) {
  
  ### One Month rolling forecasting
  # Expanding Window 
  train_expand <- window(gdp, end=st + i/p)  ## Window Length: k+i
  
  # Sliding Window - keep the training window of fixed length. 
  # The training set always consists of k observations.
  train_slide <- window(gdp, start=st+(i-k+1)/p, end=st+i/p) ## Window Length: k
  
  test <- window(gdp, start=st + (i+1)/p, end=st + (i+H)/p) ## Window Length: H
  
  fit_1 <- Arima(train_expand, order=c(1,2,2),
                 include.drift=FALSE, lambda=0, method="ML")
  fcast_arima_expand <- forecast(fit_1, h=H)
  
  fit_2 <- Arima(train_slide, order=c(1,2,2), 
                 include.drift=FALSE, lambda=0, method="ML")
  fcast_arima_slide <- forecast(fit_2, h=H)
  
  ets_expand <- ets(train_expand)
  fcast_ets_expand <- forecast(ets_expand, h = H)  
  
  ets_slide <- ets(train_slide)
  fcast_ets_slide <- forecast(ets_slide, h = H)
  
  mae_arima_expand[i,1:length(test)] <- abs(fcast_arima_expand[['mean']]-test)
  mae_arima_slide[i,1:length(test)] <- abs(fcast_arima_slide[['mean']]-test)
  mae_ets_expand[i,1:length(test)] <- abs(fcast_ets_expand[['mean']]-test)
  mae_ets_slide[i,1:length(test)] <- abs(fcast_ets_slide[['mean']]-test)
  
  rmse_arima_expand[i,1:length(test)]<- (fcast_arima_expand[['mean']]-test)^2
  rmse_arima_slide[i,1:length(test)]<- (fcast_arima_slide[['mean']]-test)^2
  rmse_ets_expand[i,1:length(test)]<- (fcast_ets_expand[['mean']]-test)^2
  rmse_ets_slide[i,1:length(test)]<- (fcast_ets_slide[['mean']]-test)^2
  
  aicc_arima_expand <- fcast_arima_expand$model$aicc
  aicc_arima_slide <- fcast_arima_slide$model$aicc
  aicc_ets_expand <- fcast_ets_expand$model$aicc
  aicc_ets_slide <- fcast_ets_slide$model$aicc
  
  add_to_aicc <- data.frame(iter = i, aicc_arima_expand, aicc_arima_slide,
                            aicc_ets_expand, aicc_ets_slide)
  aicc_values <- rbind(aicc_values, add_to_aicc)
  
}


mae_rmse <- data.frame(h = 1:H, mae_arima_expand = colMeans(mae_arima_expand, na.rm = TRUE),
                       mae_arima_slide = colMeans(mae_arima_slide, na.rm = TRUE),
                       mae_ets_expand = colMeans(mae_ets_expand, na.rm = TRUE),
                       mae_ets_slide = colMeans(mae_ets_slide, na.rm = TRUE),
                       rmse_arima_expand = sqrt(colMeans(rmse_arima_expand, na.rm = TRUE)),
                       rmse_arima_slide = sqrt(colMeans(rmse_arima_slide, na.rm = TRUE)),
                       rmse_ets_expand = sqrt(colMeans(rmse_ets_expand, na.rm = TRUE)),
                       rmse_ets_slide = sqrt(colMeans(rmse_ets_slide, na.rm = TRUE)))

colors <- c("Arima Expanding Window" = "darkred", "Arima Sliding Window" = "darkblue", 
            "ETS Expanding Window" = "green", "ETS Sliding Window" = "cyan")

p_rmse <- 
  ggplot(data = mae_rmse, aes(x = h)) +
  geom_line(aes(y = rmse_arima_expand, color = "Arima Expanding Window"), lwd=1.15) +
  geom_line(aes(y = rmse_arima_slide, color = "Arima Sliding Window"), lwd=1.15) +  
  geom_line(aes(y = rmse_ets_expand, color = "ETS Expanding Window"), lwd=1.15) +
  geom_line(aes(y = rmse_ets_slide, color = "ETS Sliding Window"), lwd=1.15) +
  scale_color_manual(name = "", values = colors) +
  labs(x = "Horizon",
       y = "RMSE",
       color = "Legend",
       title = "Root-square Forecast Error (RMSE)") +
  theme_dark() +
  theme(legend.title = element_blank(),legend.position = "bottom")

p_mae <- 
  ggplot(data = mae_rmse, aes(x = h)) +
  geom_line(aes(y = mae_arima_expand, color = "Arima Expanding Window"), lwd=1.15) +
  geom_line(aes(y = mae_arima_slide, color = "Arima Sliding Window"), lwd=1.15) +  
  geom_line(aes(y = mae_ets_expand, color = "ETS Expanding Window"), lwd=1.15) +
  geom_line(aes(y = mae_ets_slide, color = "ETS Sliding Window"), lwd=1.15) +
  scale_color_manual(name = "", values = colors) +
  labs(x = "Horizion",
       y = "MAE",
       color = "Legend",
       title = "Mean Absolute Forecast Error (MAE)") +
  theme_dark() +
  theme(legend.title = element_blank(),legend.position = "bottom")


p_aic <- 
  ggplot(data = aicc_values, aes(x = iter)) +
  geom_line(aes(y = aicc_arima_expand, color = "Arima Expanding Window"), lwd=1.15) +
  geom_line(aes(y = aicc_arima_slide, color = "Arima Sliding Window"), lwd=1.15) +  
  geom_line(aes(y = aicc_ets_expand, color = "ETS Expanding Window"), lwd=1.15) +
  geom_line(aes(y = aicc_ets_slide, color = "ETS Sliding Window"), lwd=1.15) +
  scale_color_manual(name = "", values = colors) +
  labs(x = "Iteration Number",
       y = "AICc",
       color = "Legend",
       title = "AICc vs Iteration Number") +
  theme_dark() +
  theme(legend.title = element_blank(),legend.position = "bottom")
p_aic

VAR

Construct MTS

View Code

data<-read.csv('All_Data.csv')
data_ts<-ts(data,frequency=4,start=c(1967,3))
log_gdp<-log(data_ts[,'GDP'])
log_unrate<-log(data_ts[,'UNRATE'])
log_consop<-log(data_ts[,'CONS_OP'])
log_recprob<-log(data_ts[,'REC_PROB']+0.001)
reorg_ts<-cbind(log_gdp,log_unrate,log_consop,log_recprob)

Correlation Estimation

Correlation Plots By taking the first-order differences for each variable, we get stronger correlations between the variables and GDP. In addition, since we had a few negative outliers due to the 2009-10 recession, we scaled the data by taking the logarithm of each variable.

View Code

diff_log_gdp<-diff(log_gdp,lag=1,difference=1)
diff_log_unrate<-diff(log_unrate,lag=1,difference=1)
diff_log_consop<-diff(log_consop,lag=1,difference=1)
diff_log_recprob<-diff(log_recprob,lag=1,difference=1)

plot_df <- as.data.frame(cbind(diff_log_gdp, log_unrate, log_consop, log_recprob))
p_unrate <- 
  ggplot(data = plot_df, aes(x=diff_log_gdp)) +
  geom_point(aes(y=log_unrate), color = "#00bc8d") +
  xlab("Log GDP") +
  ylab("Log Unemployment Rate") +
  ggtitle("Log GDP vs Log Unemployment Rate") +
  theme_dark()
p_cons <- 
  ggplot(data = plot_df, aes(x=diff_log_gdp)) +
  geom_point(aes(y=log_consop), color = "#00bc8d") +
  xlab("Log GDP") +
  ylab("Log Consumer Opinion") +
  ggtitle("Log GDP vs Log Consumer Opinion") +
  theme_dark()
p_recprob <- 
  ggplot(data = plot_df, aes(x=diff_log_gdp)) +
  geom_point(aes(y=log_recprob), color = "#00bc8d") +
  xlab("Log GDP") +
  ylab("Log Recession Probability") +
  ggtitle("Log GDP vs Log Recession Probability Rate") +
  theme_dark()

gdp_cor <- data.frame("Log GDP",cor(diff_log_gdp,diff_log_unrate),cor(diff_log_gdp,diff_log_consop),cor(diff_log_gdp,diff_log_recprob))
x <- c("x","Log Unemployment", "Log Consumer", "Log Rec. Prob")
colnames(gdp_cor) <- x

We can see that there is a moderately strong, negative correlation between the log of unemployment rate and the log of GDP. That indicates that as the unemployment rate declines, GDP increases. The log of consumer opinion has a weak, positive correlation with the log of GDP – as more consumers note that the economy is going well, the higher GDP will be. Finally, we see a weak, negative correlation with the log of GDP and the log of the probability of a recession. The lower the probability of a recession is, the higher GDP will be. All three of these correlations make sense logically.


KPSS Tests

GDP KPSS Test

## 
##  KPSS Test for Level Stationarity
## 
## data:  diff_log_gdp
## KPSS Level = 0.23151, Truncation lag parameter = 4, p-value = 0.1



Consumer Opinion KPSS Test

## 
##  KPSS Test for Level Stationarity
## 
## data:  diff_log_consop
## KPSS Level = 0.051138, Truncation lag parameter = 4, p-value = 0.1



Recession Probability KPSS Test

## 
##  KPSS Test for Level Stationarity
## 
## data:  diff_log_recprob
## KPSS Level = 0.014937, Truncation lag parameter = 4, p-value = 0.1

For each time series (GDP, Consumer Opinion, Recession Probability), a KPSS test rejects the null hypothesis, suggesting that the process is stationary

Var Model Construction

Using the VARselect() function in R, we can determine the best possible value for the lag length in our VAR model. Based on all selection tools p = 2 seems to be the most appropriate. Thus, we will be running a VAR(2) model.

train_gdp<-window(diff_log_gdp,end=c(2010,1))
test_gdp<-window(diff_log_gdp,start=c(2012,2))
train_unrate<-window(diff_log_unrate,end=c(2010,1))
test_unrate<-window(diff_log_unrate,start=c(2012,2))
train_consop<-window(diff_log_unrate,end=c(2010,1))
test_consop<-window(diff_log_unrate,start=c(2012,2))
train_recprob<-window(diff_log_recprob,end=c(2010,1))
test_recprob<-window(diff_log_recprob,start=c(2012,2))
VARselect(cbind(train_gdp,train_unrate))
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      2      2      2      2 
## 
## $criteria
##                    1             2             3             4
## AIC(n) -1.598185e+01 -1.612772e+01 -1.609286e+01 -1.606589e+01
## HQ(n)  -1.593502e+01 -1.604968e+01 -1.598360e+01 -1.592541e+01
## SC(n)  -1.586653e+01 -1.593552e+01 -1.582378e+01 -1.571994e+01
## FPE(n)  1.145972e-07  9.904606e-08  1.025670e-07  1.053839e-07
##                    5             6             7             8
## AIC(n) -1.603749e+01 -1.599569e+01 -1.595857e+01 -1.595076e+01
## HQ(n)  -1.586579e+01 -1.579277e+01 -1.572443e+01 -1.568540e+01
## SC(n)  -1.561465e+01 -1.549597e+01 -1.538197e+01 -1.529728e+01
## FPE(n)  1.084414e-07  1.131029e-07  1.174255e-07  1.184061e-07
##                    9            10
## AIC(n) -1.595009e+01 -1.597817e+01
## HQ(n)  -1.565352e+01 -1.565038e+01
## SC(n)  -1.521973e+01 -1.517093e+01
## FPE(n)  1.185614e-07  1.153702e-07
var_model<-VAR(cbind(train_gdp,train_unrate),p=2,type="both")
summary(var_model)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: train_gdp, train_unrate 
## Deterministic variables: both 
## Sample size: 168 
## Log Likelihood: 890.164 
## Roots of the characteristic polynomial:
## 0.4928 0.4928 0.4089 0.4089
## Call:
## VAR(y = cbind(train_gdp, train_unrate), p = 2, type = "both")
## 
## 
## Estimation results for equation train_gdp: 
## ========================================== 
## train_gdp = train_gdp.l1 + train_unrate.l1 + train_gdp.l2 + train_unrate.l2 + const + trend 
## 
##                   Estimate Std. Error t value Pr(>|t|)   
## train_gdp.l1     2.119e-01  9.139e-02   2.318  0.02168 * 
## train_unrate.l1 -1.958e-02  1.508e-02  -1.298  0.19602   
## train_gdp.l2     1.529e-01  9.697e-02   1.577  0.11683   
## train_unrate.l2  1.302e-02  1.385e-02   0.940  0.34846   
## const            5.288e-03  1.754e-03   3.015  0.00299 **
## trend           -9.261e-06  1.294e-05  -0.716  0.47530   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.008012 on 162 degrees of freedom
## Multiple R-Squared:  0.14,   Adjusted R-squared: 0.1134 
## F-statistic: 5.273 on 5 and 162 DF,  p-value: 0.0001629 
## 
## 
## Estimation results for equation train_unrate: 
## ============================================= 
## train_unrate = train_gdp.l1 + train_unrate.l1 + train_gdp.l2 + train_unrate.l2 + const + trend 
## 
##                   Estimate Std. Error t value Pr(>|t|)    
## train_gdp.l1    -2.396e+00  5.115e-01  -4.684 5.93e-06 ***
## train_unrate.l1  5.197e-02  8.439e-02   0.616    0.539    
## train_gdp.l2    -2.308e+00  5.427e-01  -4.252 3.57e-05 ***
## train_unrate.l2  6.897e-02  7.752e-02   0.890    0.375    
## const            4.600e-02  9.817e-03   4.685 5.89e-06 ***
## trend           -8.383e-05  7.244e-05  -1.157    0.249    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.04484 on 162 degrees of freedom
## Multiple R-Squared: 0.4121,  Adjusted R-squared: 0.3939 
## F-statistic: 22.71 on 5 and 162 DF,  p-value: < 2.2e-16 
## 
## 
## 
## Covariance matrix of residuals:
##               train_gdp train_unrate
## train_gdp     6.419e-05   -0.0001922
## train_unrate -1.922e-04    0.0020107
## 
## Correlation matrix of residuals:
##              train_gdp train_unrate
## train_gdp        1.000       -0.535
## train_unrate    -0.535        1.000

Estimate Model

We can use a Portmanteau test to determine if there is any serial correlation between the residuals. Because our p-value (.1552) is greater than .05, we fail to reject the null hypothesis, thus suggesting that there is little to no serial correlation. The model fits well with our data.

Serial Test of Var Model

## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object var_model
## Chi-squared = 66.693, df = 56, p-value = 0.1552

Predict

forecast_obj<-forecast(var_model,h=9)
autoplot(forecast_obj)

log_gdp_obj = forecast_obj[[2]]$train_gdp
log_gdp_fc = log_gdp_obj[3]$mean
gdp<-exp(diffinv(c(log_gdp[1],train_gdp,log_gdp_fc)))
gdp_fc<-tail(gdp,n=9)
gdp_value<-window(data_ts[,'GDP'],start=c(2010,2),end=c(2012,2))
accuracy(gdp_fc,gdp_value)
##                 ME     RMSE      MAE        MPE      MAPE      ACF1
## Test set -36.63795 75.47476 66.07281 -0.2291987 0.4170155 0.4614275
##          Theil's U
## Test set 0.7583586

Future Work

Additional Data Sources
PAYEMS - All Employees, Total Nonfarm
All Employees: Total Nonfarm, commonly known as Total Nonfarm Payroll, is a measure of the number of U.S. workers in the economy that excludes proprietors, private household employees, unpaid volunteers, farm employees, and the unincorporated self-employed. This measure accounts for approximately 80 percent of the workers who contribute to Gross Domestic Product (GDP).
This measure provides useful insights into the current economic situation because it can represent the number of jobs added or lost in an economy. Increases in employment might indicate that businesses are hiring which might also suggest that businesses are growing. Additionally, those who are newly employed have increased their personal incomes, which means (all else constant) their disposable incomes have also increased, thus fostering further economic expansion.

Data profile:
  • Range: 1939-01-01 to 2020-07-01
  • Frequency: Monthly
  • Data Type: Float
  • Not Seasonally Adjusted



INDPRO - Industrial Production Index
Since 1997, the Industrial Production Index has been determined from 312 individual series based on the 2007 North American Industrial Classification System (NAICS) codes. These individual series are classified in two ways (1) market groups and (2) industry groups. (1) The Board of Governors defines markets groups as products (aggregates of final products) and materials (inputs used in the manufacture of products). Consumer goods and business equipment can be examples of market groups. “Industry groups are defined as three digit NAICS industries and aggregates of these industries such as durable and nondurable manufacturing, mining, and utilities.”

The index is compiled on a monthly basis to bring attention to short- term changes in industrial production,. It measures movements in production output and highlights structural developments in the economy. (1) Growth in the production index from month to month is an indicator of growth in the industry.

Data profile:
  • Range: 1919-01-01 to 2020-07-01
  • Frequency: Monthly
  • Data Type: Float
  • Not Seasonally Adjusted