Analysis, Model Development, and Forecasting for Civilian Unemployment Rate Data



## 'zooreg' series from Jan 1948 to Jan 2016
##   Data: num [1:817] 3.4 3.8 4 3.9 3.5 3.6 3.6 3.9 3.8 3.7 ...
##   Index: Class 'yearmon'  num [1:817] 1948 1948 1948 1948 1948 ...
##   Frequency: 12
## 'zooreg' series from Jan 1948 to Dec 2014
##   Data: num [1:804] 3.4 3.8 4 3.9 3.5 3.6 3.6 3.9 3.8 3.7 ...
##   Index: Class 'yearmon'  num [1:804] 1948 1948 1948 1948 1948 ...
##   Frequency: 12



The above graph shows Civilian Unemployment Rate data. We can see by viewing this data that there is no exponential growth and no obvious reason to Log Transform the data.




The ACF and PACF suggest nonstationarity and the presence of deterministic or stochastic trend.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  UNRATE
## Dickey-Fuller = -4.0617, Lag order = 9, p-value = 0.01
## alternative hypothesis: stationary


The simple ADF test above seems to suggest we Reject the null that this is a difference stationary time series, due to the small p-value.

We will investigate this further with a KPSS test.

# KPSS Test - H0-Trend Stationary (tseries)
kpss.test(UNRATE, null = "Trend")
## 
##  KPSS Test for Trend Stationarity
## 
## data:  UNRATE
## KPSS Trend = 0.50053, Truncation lag parameter = 6, p-value = 0.01
# KPSS Test - H0-Level Stationary (tseries)
kpss.test(UNRATE, null = "Level")
## 
##  KPSS Test for Level Stationarity
## 
## data:  UNRATE
## KPSS Level = 2.0589, Truncation lag parameter = 6, p-value = 0.01
# KPSS Test - H0-Trend Stationary (tseries)
kpss.test(diff_UNRATE, null = "Trend")
## 
##  KPSS Test for Trend Stationarity
## 
## data:  diff_UNRATE
## KPSS Trend = 0.030042, Truncation lag parameter = 6, p-value = 0.1
# KPSS Test - H0-Level Stationary (tseries)
kpss.test(diff_UNRATE, null = "Level")
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff_UNRATE
## KPSS Level = 0.051729, Truncation lag parameter = 6, p-value = 0.1


Interestingly the KPSS test is telling us that we Reject the null that this time series is either Trend or Level stationary. We will look at one more set of KPSS tests before we conclude this is difference stationary.

Also interesting is when we pass the test the differenced data we Fail to Reject the null that this is trend stationary.

# KPSS Test - H0-Trend Stationary-mu (urca)
UNRATE_mu.kpss <- ur.kpss(UNRATE, type = "mu")
summary(UNRATE_mu.kpss)
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 2.0589 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
# KPSS Test - H0-Trend Stationary-tau (urca)
UNRATE_tau.kpss <- ur.kpss(UNRATE, type = "tau")
summary(UNRATE_tau.kpss)
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: tau with 6 lags. 
## 
## Value of test-statistic is: 0.5005 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.119 0.146  0.176 0.216


The above KPSS tests again suggest that we are dealing with a difference stationary time series. The test statistic is well into the rejection region.


Since we went back and forth a few times before we determined we could be dealing more with a difference stationary time series over a trend stationary time series, we will now start to look more closely at a more sophisticated Augmented Dickey-Fuller test to try to determine what type of difference stationary time series we might be dealing with.


Before we start the ADF testing we need to determine the appropriate lag length to select based on this time series data set. However, care must be taken to ensure that an appropriate lag length is chosen. If the lag length is too short then the residuals will be serially correlated and the test will be biased. If the lag length is too large the power of the test will suffer. There is some literate precedence to suggest, based on boot strapping the lag length, that erring on the side of too large is better than erring on the side of too short.

To get a staring point for \(p_{max}\) I will use a procedure developed by G. William Schwert in his 1989 paper in The Journal of Business of Economic Statistics.
\[{l_{12}} = {\mathop{\rm int}} \left\{ {12{{\left( {{T \over {100}}} \right)}^{{1 \over 4}}}} \right\}\]
This differs some from the suggestions by Said and Dickey but since I am only using this as a guide it seems reasonable to use.

Once a starting point has been determined, I will use this \(p_{max}\) value as my lag parameter and then plot the residuals of the ADF test to determine the appropriateness of the lag length. If the residuals appear correlated a larger lag value will be chosen. This process will continue until all serial correlation has disappeared. This process is more of an art form than a science but I feel like it is worth the time to ensure a proper lag length is determined.

After executing the above mentioned process, I have determined that a lag length of \(p_{max}=26\).

We will follow the ā€œProposed Full Procedure for ADF Testā€ guidelines to determine what type of difference stationary model we are dealing with.

We will start by testing some hypnosis against Model C, below:

# Model C
t1 <- ur.df(UNRATE, type="trend", lags=26, selectlags = "Fixed")
plot(t1)

summary(t1)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.64593 -0.10792 -0.01042  0.10648  0.73198 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.284e-02  2.727e-02   2.671 0.007729 ** 
## z.lag.1      -1.534e-02  5.100e-03  -3.007 0.002725 ** 
## tt            4.089e-05  3.192e-05   1.281 0.200513    
## z.diff.lag1   8.588e-02  3.637e-02   2.361 0.018457 *  
## z.diff.lag2   1.885e-01  3.650e-02   5.165 3.08e-07 ***
## z.diff.lag3   1.301e-01  3.658e-02   3.556 0.000400 ***
## z.diff.lag4   1.229e-01  3.686e-02   3.334 0.000898 ***
## z.diff.lag5   1.216e-01  3.545e-02   3.430 0.000638 ***
## z.diff.lag6   2.697e-02  3.502e-02   0.770 0.441413    
## z.diff.lag7   6.080e-06  3.496e-02   0.000 0.999861    
## z.diff.lag8   3.864e-02  3.485e-02   1.109 0.267934    
## z.diff.lag9   1.529e-02  3.494e-02   0.438 0.661788    
## z.diff.lag10 -8.720e-02  3.493e-02  -2.496 0.012771 *  
## z.diff.lag11  2.250e-02  3.500e-02   0.643 0.520524    
## z.diff.lag12 -1.596e-01  3.499e-02  -4.560 5.97e-06 ***
## z.diff.lag13 -6.586e-03  3.549e-02  -0.186 0.852822    
## z.diff.lag14 -1.453e-02  3.548e-02  -0.410 0.682176    
## z.diff.lag15  3.487e-02  3.498e-02   0.997 0.319205    
## z.diff.lag16  1.904e-02  3.498e-02   0.544 0.586453    
## z.diff.lag17  7.988e-03  3.469e-02   0.230 0.817946    
## z.diff.lag18  5.547e-03  3.468e-02   0.160 0.872980    
## z.diff.lag19  8.824e-02  3.465e-02   2.546 0.011084 *  
## z.diff.lag20  6.846e-02  3.475e-02   1.970 0.049200 *  
## z.diff.lag21  6.262e-03  3.484e-02   0.180 0.857391    
## z.diff.lag22 -2.529e-02  3.458e-02  -0.731 0.464734    
## z.diff.lag23 -3.521e-02  3.439e-02  -1.024 0.306198    
## z.diff.lag24 -1.560e-01  3.407e-02  -4.578 5.48e-06 ***
## z.diff.lag25  3.195e-02  3.380e-02   0.945 0.344736    
## z.diff.lag26  3.139e-02  3.370e-02   0.931 0.352047    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1778 on 748 degrees of freedom
## Multiple R-squared:  0.2236, Adjusted R-squared:  0.1945 
## F-statistic: 7.693 on 28 and 748 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -3.0071 3.0183 4.5255 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2  6.09  4.68  4.03
## phi3  8.27  6.25  5.34


We start by testing against Model C, \[\Delta {y_t} = \gamma {y_{t - 1}} + \mu + \beta t + \sum\limits_{i = 1}^{p - 1} {{p_i}} \Delta {y_{t - i}} + {e_t}\] where we look at the \(\phi_3\) statistic to test \(H_0:\) \(\gamma=\beta=0\)

According to the critical region we Fail to Reject the null which implies that this time series data set contains a unit root. This is further confirmed by looking at \(\tau_3\) statistic for the lagged endogenous variable in levels to see that we Fail to Reject the null that \(H_0:\) \(\gamma=0\). Again a unit root can not be rejected.

We next look at the \(\phi_2\) statistic to determine if this time series data is a random walk or a random walk with drift. The \(H_0: \gamma=\mu=\beta=0\) but since we have already determined that \(\gamma=\beta=0\) in previous hypothesis test we are really testing for \(\mu=0\). We find that we Fail to reject this hypnosis which indicates we are dealing with a Pure Random Walk.

This can be further confirmed by now testing several hypothesis on Model B.

#Model B
t2 <- ur.df(UNRATE, type="drift", lags=26, selectlags = "Fixed")
plot(t2)

summary(t2)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.65368 -0.10897 -0.01111  0.10680  0.72441 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.072571   0.027281   2.660 0.007979 ** 
## z.lag.1      -0.012380   0.004550  -2.721 0.006662 ** 
## z.diff.lag1   0.085089   0.036377   2.339 0.019592 *  
## z.diff.lag2   0.187409   0.036506   5.134 3.63e-07 ***
## z.diff.lag3   0.129111   0.036587   3.529 0.000443 ***
## z.diff.lag4   0.121770   0.036868   3.303 0.001003 ** 
## z.diff.lag5   0.120948   0.035462   3.411 0.000683 ***
## z.diff.lag6   0.025594   0.035019   0.731 0.465106    
## z.diff.lag7  -0.001546   0.034949  -0.044 0.964736    
## z.diff.lag8   0.036935   0.034841   1.060 0.289439    
## z.diff.lag9   0.013571   0.034927   0.389 0.697706    
## z.diff.lag10 -0.088963   0.034921  -2.548 0.011047 *  
## z.diff.lag11  0.020603   0.034988   0.589 0.556135    
## z.diff.lag12 -0.161681   0.034971  -4.623 4.45e-06 ***
## z.diff.lag13 -0.008267   0.035478  -0.233 0.815811    
## z.diff.lag14 -0.016112   0.035469  -0.454 0.649775    
## z.diff.lag15  0.033749   0.034989   0.965 0.335069    
## z.diff.lag16  0.017705   0.034983   0.506 0.612928    
## z.diff.lag17  0.006997   0.034697   0.202 0.840247    
## z.diff.lag18  0.004644   0.034689   0.134 0.893546    
## z.diff.lag19  0.087363   0.034659   2.521 0.011922 *  
## z.diff.lag20  0.067327   0.034753   1.937 0.053081 .  
## z.diff.lag21  0.004903   0.034836   0.141 0.888103    
## z.diff.lag22 -0.026877   0.034569  -0.777 0.437115    
## z.diff.lag23 -0.036651   0.034384  -1.066 0.286795    
## z.diff.lag24 -0.157707   0.034054  -4.631 4.29e-06 ***
## z.diff.lag25  0.029847   0.033772   0.884 0.377105    
## z.diff.lag26  0.029078   0.033670   0.864 0.388074    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1779 on 749 degrees of freedom
## Multiple R-squared:  0.2219, Adjusted R-squared:  0.1938 
## F-statistic:  7.91 on 27 and 749 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -2.7209 3.7036 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1  6.43  4.59  3.78


Model B,

\[\Delta {y_t} = \gamma {y_{t - 1}} + \mu + \sum\limits_{i = 1}^{p - 1} {{p_i}} \Delta {y_{t - i}} + {e_t}\]
Since we had some difficulty determining if this time series was trend stationary or difference stationary I will increase my level of significance to 95% for the critical regions of the F-test.

In Model B we first look at the \(\tau_2\) statistic under the null, \(H_0:\) \(\gamma=0\); Fail to Reject at the 95% level. Next we look at \(\phi_1\) statistic under the null \(H_0:\) \(\gamma=\mu=0\) which really means, based on the previous result, that we are testing if \(\mu=0\); Reject at the 95% level, which confirms again we are dealing with a Pure Random Walk.

Since we suspect we are dealing with a pure random walk model we will investigate Model A.

#Model A
t3 <- ur.df(UNRATE, type="none", lags=26, selectlags = "Fixed")
plot(t3)

summary(t3)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.68279 -0.10639 -0.00401  0.10974  0.75352 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## z.lag.1      -6.133e-04  1.070e-03  -0.573 0.566825    
## z.diff.lag1   8.248e-02  3.651e-02   2.259 0.024161 *  
## z.diff.lag2   1.839e-01  3.663e-02   5.021 6.41e-07 ***
## z.diff.lag3   1.254e-01  3.671e-02   3.417 0.000666 ***
## z.diff.lag4   1.173e-01  3.698e-02   3.172 0.001578 ** 
## z.diff.lag5   1.145e-01  3.552e-02   3.223 0.001323 ** 
## z.diff.lag6   1.909e-02  3.508e-02   0.544 0.586496    
## z.diff.lag7  -8.916e-03  3.498e-02  -0.255 0.798880    
## z.diff.lag8   2.888e-02  3.485e-02   0.829 0.407462    
## z.diff.lag9   5.233e-03  3.493e-02   0.150 0.880937    
## z.diff.lag10 -9.724e-02  3.492e-02  -2.785 0.005495 ** 
## z.diff.lag11  1.339e-02  3.502e-02   0.382 0.702275    
## z.diff.lag12 -1.691e-01  3.500e-02  -4.833 1.63e-06 ***
## z.diff.lag13 -1.405e-02  3.556e-02  -0.395 0.692934    
## z.diff.lag14 -2.147e-02  3.556e-02  -0.604 0.546193    
## z.diff.lag15  2.994e-02  3.510e-02   0.853 0.393945    
## z.diff.lag16  1.369e-02  3.509e-02   0.390 0.696477    
## z.diff.lag17  3.471e-03  3.481e-02   0.100 0.920609    
## z.diff.lag18  1.024e-03  3.480e-02   0.029 0.976529    
## z.diff.lag19  8.341e-02  3.477e-02   2.399 0.016680 *  
## z.diff.lag20  6.292e-02  3.485e-02   1.805 0.071414 .  
## z.diff.lag21  2.183e-05  3.493e-02   0.001 0.999502    
## z.diff.lag22 -3.291e-02  3.463e-02  -0.950 0.342329    
## z.diff.lag23 -4.353e-02  3.443e-02  -1.264 0.206448    
## z.diff.lag24 -1.654e-01  3.407e-02  -4.856 1.46e-06 ***
## z.diff.lag25  2.217e-02  3.378e-02   0.656 0.511899    
## z.diff.lag26  2.142e-02  3.368e-02   0.636 0.525068    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1786 on 750 degrees of freedom
## Multiple R-squared:  0.2145, Adjusted R-squared:  0.1863 
## F-statistic: 7.588 on 27 and 750 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -0.573 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62


Model A,

\[\Delta {y_t} = \gamma {y_{t - 1}} + \sum\limits_{i = 1}^{p - 1} {{p_i}} \Delta {y_{t - i}} + {e_t}\]
We are testing \(H_0: \gamma=0\). When we look at \(\tau_1\) we see that we Fail to Reject the null which confirms, in the strongest way, that we are dealing with a Pure Random Walk.

We will look again at Model C to determine if differencing the series once I(1) will suffice or if we need to difference twice I(2).

# Model C Differenced Data
t4 <- ur.df(diff_UNRATE, type="trend", lags=26, selectlags = "Fixed")
plot(t4)

summary(t4)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6691 -0.1127 -0.0061  0.1102  0.7285 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.057e-03  1.343e-02   0.228 0.820046    
## z.lag.1      -5.666e-01  1.024e-01  -5.534 4.34e-08 ***
## tt           -5.983e-06  2.850e-05  -0.210 0.833796    
## z.diff.lag1  -3.527e-01  1.006e-01  -3.506 0.000481 ***
## z.diff.lag2  -1.697e-01  9.827e-02  -1.727 0.084543 .  
## z.diff.lag3  -2.748e-02  9.517e-02  -0.289 0.772881    
## z.diff.lag4   9.438e-02  9.337e-02   1.011 0.312427    
## z.diff.lag5   1.884e-01  9.197e-02   2.048 0.040920 *  
## z.diff.lag6   2.245e-01  9.010e-02   2.492 0.012909 *  
## z.diff.lag7   2.071e-01  8.928e-02   2.319 0.020647 *  
## z.diff.lag8   2.270e-01  8.788e-02   2.583 0.009991 ** 
## z.diff.lag9   2.343e-01  8.578e-02   2.731 0.006454 ** 
## z.diff.lag10  1.382e-01  8.369e-02   1.651 0.099145 .  
## z.diff.lag11  1.549e-01  8.128e-02   1.906 0.057026 .  
## z.diff.lag12 -1.520e-02  7.883e-02  -0.193 0.847135    
## z.diff.lag13 -2.735e-02  7.559e-02  -0.362 0.717577    
## z.diff.lag14 -4.431e-02  7.224e-02  -0.613 0.539819    
## z.diff.lag15 -5.293e-04  6.898e-02  -0.008 0.993880    
## z.diff.lag16  1.474e-02  6.717e-02   0.219 0.826352    
## z.diff.lag17  2.154e-02  6.524e-02   0.330 0.741402    
## z.diff.lag18  2.089e-02  6.406e-02   0.326 0.744412    
## z.diff.lag19  9.928e-02  6.287e-02   1.579 0.114716    
## z.diff.lag20  1.670e-01  6.161e-02   2.711 0.006862 ** 
## z.diff.lag21  1.683e-01  6.078e-02   2.769 0.005764 ** 
## z.diff.lag22  1.233e-01  5.993e-02   2.058 0.039976 *  
## z.diff.lag23  6.509e-02  5.775e-02   1.127 0.260003    
## z.diff.lag24 -1.150e-01  5.400e-02  -2.130 0.033467 *  
## z.diff.lag25 -1.079e-01  4.725e-02  -2.283 0.022701 *  
## z.diff.lag26 -8.191e-02  3.351e-02  -2.444 0.014746 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1778 on 747 degrees of freedom
## Multiple R-squared:  0.5113, Adjusted R-squared:  0.493 
## F-statistic: 27.92 on 28 and 747 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -5.5338 10.2363 15.3468 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2  6.09  4.68  4.03
## phi3  8.27  6.25  5.34


Looking at the results from the above test we can conclude, by viewing the \(\tau_3\) statistic that we Reject the idea that we need to difference the data more than once. In other words, I(1) differencing is sufficient for this time series.

It certainly seems that we are dealing with a difference stationary time series. We will now follow the Box-Jenkins Method to build a time series model.


According to the ACF and PACF plots we can clearly see that we have an AR(4) component and possibly a MA(q) component. Despite the fact that, according to the FRED data, we are dealing with seasonally adjusted data, we can clearly see a seasonal trend at S=12 lag.

We will start by modeling an ARIMA model with a seasonal component, \(ARIMA(4,1,2)X(2,0,2)_{12}\).

m1 <- Arima(UNRATE, order=c(4,1,2), seasonal= list(order=c(2,0,2), period=12))
m1
## Series: UNRATE 
## ARIMA(4,1,2)(2,0,2)[12]                    
## 
## Coefficients:
##          ar1      ar2      ar3     ar4      ma1     ma2    sar1     sar2
##       1.5617  -0.5619  -0.1768  0.0692  -1.5793  0.7863  0.5517  -0.1136
## s.e.  0.0829   0.1051   0.0759  0.0573   0.0738  0.0818  0.3013   0.2041
##          sma1     sma2
##       -0.7486  -0.0013
## s.e.   0.2997   0.2652
## 
## sigma^2 estimated as 0.03535:  log likelihood=200.49
## AIC=-378.99   AICc=-378.66   BIC=-327.42
tsdiag(m1, gof.lag=24)

m1.fcast <- forecast(m1, h=24)

m2 <- Arima(UNRATE, order=c(4,1,2), seasonal= list(order=c(1,0,1), period=12))
m2
## Series: UNRATE 
## ARIMA(4,1,2)(1,0,1)[12]                    
## 
## Coefficients:
##           ar1     ar2     ar3     ar4     ma1      ma2    sar1     sma1
##       -0.3297  0.6874  0.2536  0.0317  0.3323  -0.4779  0.5587  -0.8237
## s.e.   0.1926  0.1035  0.0597  0.0561  0.1903   0.1021  0.0685   0.0479
## 
## sigma^2 estimated as 0.03595:  log likelihood=194
## AIC=-370.01   AICc=-369.78   BIC=-327.81
tsdiag(m2, gof.lag=24)

m2.fcast <- forecast(m2, h=24)

auto.arima(UNRATE, max.order = 10, stepwise = F)
## Series: UNRATE 
## ARIMA(3,1,1)(2,0,2)[12]                    
## 
## Coefficients:
##          ar1     ar2     ar3      ma1    sar1     sar2     sma1    sma2
##       0.5092  0.2104  0.0761  -0.5122  0.5571  -0.1242  -0.8072  0.0622
## s.e.  0.0919  0.0395  0.0483   0.0868  0.3685   0.2206   0.3681  0.3108
## 
## sigma^2 estimated as 0.03592:  log likelihood=194.51
## AIC=-371.02   AICc=-370.79   BIC=-328.82
m3 <- Arima(UNRATE, order=c(3,1,1), seasonal= list(order=c(2,0,2), period=12))
m3
## Series: UNRATE 
## ARIMA(3,1,1)(2,0,2)[12]                    
## 
## Coefficients:
##          ar1     ar2     ar3      ma1    sar1     sar2     sma1    sma2
##       0.5092  0.2104  0.0761  -0.5122  0.5571  -0.1242  -0.8072  0.0622
## s.e.  0.0919  0.0395  0.0483   0.0868  0.3685   0.2206   0.3681  0.3108
## 
## sigma^2 estimated as 0.03592:  log likelihood=194.51
## AIC=-371.02   AICc=-370.79   BIC=-328.82
tsdiag(m3, gof.lag=24)

m3.fcast <- forecast(m3, h=24)

m4 <- Arima(UNRATE, order=c(4,1,2))
m4
## Series: UNRATE 
## ARIMA(4,1,2)                    
## 
## Coefficients:
##           ar1     ar2     ar3     ar4     ma1      ma2
##       -0.4795  0.7153  0.2766  0.0290  0.5145  -0.4743
## s.e.   0.1055  0.0969  0.0580  0.0526  0.1000   0.0989
## 
## sigma^2 estimated as 0.039:  log likelihood=162.61
## AIC=-311.21   AICc=-311.07   BIC=-278.39
tsdiag(m4, gof.lag=24)

m4.fcast <- forecast(m4, h=24)

m5 <- Arima(UNRATE, order=c(5,1,5))
m5
## Series: UNRATE 
## ARIMA(5,1,5)                    
## 
## Coefficients:
##           ar1     ar2    ar3      ar4      ar5     ma1      ma2      ma3
##       -0.3731  0.6940  0.749  -0.0813  -0.6692  0.3911  -0.5174  -0.5468
## s.e.   0.0606  0.0674  0.076   0.0669   0.0553  0.0599   0.0577   0.0658
##          ma4     ma5
##       0.1404  0.7584
## s.e.  0.0634  0.0595
## 
## sigma^2 estimated as 0.03745:  log likelihood=178.63
## AIC=-335.27   AICc=-334.93   BIC=-283.7
tsdiag(m5, gof.lag=24)

m5.fcast <- forecast(m5, h=24)

m6 <- Arima(UNRATE, order=c(4,1,0))
m6
## Series: UNRATE 
## ARIMA(4,1,0)                    
## 
## Coefficients:
##          ar1     ar2     ar3     ar4
##       0.0278  0.2423  0.1717  0.0789
## s.e.  0.0353  0.0348  0.0348  0.0354
## 
## sigma^2 estimated as 0.03965:  log likelihood=156.4
## AIC=-302.79   AICc=-302.72   BIC=-279.35
tsdiag(m6, gof.lag=24)

m6.fcast <- forecast(m6, h=24)


According to the above results our best model is \(ARIMA(4,1,2)X(2,0,2)_{12}\).

The Diagnostics on this model look adequate. The only concern is an sar2 and sma2 component that appear to not be significant but then when removed from the model look like they are contributing something.

I will now create some forecasts and view the results to try to determine which model forecasts best.


Interesting results! It seems like all of the statistics and test were pointing to the \(ARIMA(4,1,2)X(2,0,2)_{12}\) model, but when developing forecasts for this model it seems to fall apart after a few periods. The forecast fits the data nicely until it starts to deviate upward for some reason. According to the ACF and PACF plots there was a clear seasonal trend but when I try to model this trend with a seasonal ARIMA it does not do well. It appears that a non-seasonal \(ARIMA(4,1,2)\) or a pure differenced AR(p) model, \(ARIMA(4,1,0)\) are working the best for the forecasting of this data. I am not sure why this is the case…???