The first step is to import the monthly time series for Civillian Unemployment Rate, FRED/UNRATE, using Quandl packages. We will denote FRED/UNRATE as \(Unemp_{t}\).
unemp <- Quandl("FRED/UNRATE", type="ts")
str(unemp)
Time-Series [1:817] from 1948 to 2016: 3.4 3.8 4 3.9 3.5 3.6 3.6 3.9 3.8 3.7 ...
plot(unemp, ylab="Unemployment Rate")
We then split the sample into two parts (up to 2014 and after 2014) in order to be able to show out model accuracy after forecast.
# split sample into two parts - estimation sample and prediction sample
unemp_all <- unemp
unemp1 <- window(unemp_all, end=c(2014,12))
unemp2 <- window(unemp_all, start=c(2015,1))
# first part used to identify and estimate the model
unemp <- unemp1
Based on the information from https://www.quandl.com/data/FRED/UNRATE-Civilian-Unemployment-Rate, the series that we use is actually a seasonally adjusted series. Hence, it is safe to transform our unemployment rate to log changes only (month to month) instead of seasonal differencing. We then construct a new time series \(LUnemp_{t}\) defined as the log changes of monthly Civillian Unemployment Rate, \(LUnemp_{t}= \Delta \log{Unemp_{t}} = \log{Unemp_{t}} - \log{Unemp_{t-1}}\), where \(Unemp_{t}\) is the original monthly Civillian Unemployment Rate, we can implement the calculation in R using the following code:
lunemp <- diff(log(unemp), differences=1)
To check for the unit-root, we will run ADF test and KPSS test for \(LUnemp_{t}\) as follow:
lunemp.urdf <- ur.df(lunemp, type = "trend", selectlags = "BIC")
summary(lunemp.urdf)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.222527 -0.019520 -0.000486 0.020714 0.222800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.384e-04 2.627e-03 0.357 0.721
## z.lag.1 -6.668e-01 4.514e-02 -14.770 < 2e-16 ***
## tt -1.795e-06 5.663e-06 -0.317 0.751
## z.diff.lag -2.330e-01 3.427e-02 -6.799 2.07e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03704 on 797 degrees of freedom
## Multiple R-squared: 0.4652, Adjusted R-squared: 0.4632
## F-statistic: 231.1 on 3 and 797 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -14.77 72.7304 109.0867
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2 6.09 4.68 4.03
## phi3 8.27 6.25 5.34
test1 <- kpss.test(lunemp, null = "Trend")
test1
KPSS Test for Trend Stationarity
data: lunemp
KPSS Trend = 0.027704, Truncation lag parameter = 6, p-value = 0.1
From ADF test we conclude that \(LUnemp_{t}\) is trend stationary. However KPSS test only indicate a borderline rejection of \(H_{0}: LUnemp_{t}\) is stationary. To confirm our finding, we will run an additional Phillips-Perron Test as follow:
To confirm our finding, we will run an additional Phillips-Perron Test as follow:
lunemp.urpp <- ur.pp(lunemp)
summary(lunemp.urpp)
##################################
# Phillips-Perron Unit Root Test #
##################################
Test regression with intercept
Call:
lm(formula = y ~ y.l1)
Residuals:
Min 1Q Median 3Q Max
-0.235391 -0.020688 -0.000393 0.019684 0.234429
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.000393 0.001344 0.292 0.770039
y.l1 0.135899 0.034863 3.898 0.000105 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.03805 on 800 degrees of freedom
Multiple R-squared: 0.01864, Adjusted R-squared: 0.01741
F-statistic: 15.2 on 1 and 800 DF, p-value: 0.0001051
Value of test-statistic, type: Z-alpha is: -942.8991
aux. Z statistics
Z-tau-mu 0.3165
The null hypothesis of Phillips-Perron test is that the \(LUnemp_{t}\) contains a unit root. From the magnitude of the test statistics of -942.8991, compared to the critical value for the test, we found conclusively that \(LUnemp_{t}\) is trend stationary.
The first step in model identification is to plot the ACF and PACF of \(LUnemp_{t}\) :
From the ACF and PACF, the most likely specification for \(LUnemp_{t}\) is ARIMA(1,0,1)(2,0,1)[12]. The next step is to decide whether to use zero mean or non-zero mean specification. To make the determination, we can compare the ADF test from both specification as follow:
lunemp.urdf.nonzeromean <- ur.df(lunemp, type ="drift", selectlags = "BIC")
lunemp.urdf.zeromean <- ur.df(lunemp, selectlags = "BIC")
We then compare the \(\tau_{3}\) statistics for non-zero and zero mean and found the following:
lunemp.urdf.nonzeromean@teststat[1]
[1] -14.77562
lunemp.urdf.zeromean@teststat[1]
[1] -14.78429
We found that zero mean specification is more preferable compared to non-zero mean.Hence, we will estimate ARIMA(1,0,1)(2,0,1)[12] with zero mean as our preferred specification and the base of our forecast. This can also be confirmed using auto.arima function as follow:
m1 <- auto.arima(lunemp, ic="bic", stepwise = FALSE)
m1
## Series: lunemp
## ARIMA(1,0,1)(2,0,1)[12] with zero mean
##
## Coefficients:
## ar1 ma1 sar1 sar2 sma1
## 0.8886 -0.7500 0.4505 -0.0608 -0.7090
## s.e. 0.0310 0.0412 0.1056 0.0548 0.1011
##
## sigma^2 estimated as 0.001243: log likelihood=1545.4
## AIC=-3078.8 AICc=-3078.7 BIC=-3050.67
Forecasting with ARIMA(1,0,1)(2,0,1)[12] using \(LUnemp_{t}\) transformation:
m1.fcast <- forecast(m1, h=13)
plot(m1.fcast, xlim=c(2012,2016))
lines(diff(log(unemp_all), differences=1))