Choose “monthly US employment data for leisure and hospitality jobs” series from
us_employment, the total employment in different industries in the United States.
- Produce an STL decomposition of the data and describe the trend and seasonality.
The data seems to be unpredictable with some increasing trend
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
model(STL(Employed)) %>%
components() %>% autoplot()
- Do the data need transforming? If so, find a suitable transformation.
Yes, the data need some transformation
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
model(STL(log(Employed))) %>%
components() %>% autoplot()
- Are the data stationary? If not, find an appropriate differencing which yields stationary data.
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
autoplot(Employed)
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
features(Employed, unitroot_kpss)
## # A tibble: 1 x 3
## Series_ID kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 CEU7000000001 11.9 0.01
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
mutate(diff_employed = difference(Employed)) %>%
features(diff_employed, unitroot_kpss)
## # A tibble: 1 x 3
## Series_ID kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 CEU7000000001 0.161 0.1
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
features(Employed, unitroot_ndiffs)
## # A tibble: 1 x 2
## Series_ID ndiffs
## <chr> <int>
## 1 CEU7000000001 1
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
mutate(log_employed = log(Employed)) %>%
features(log_employed, unitroot_nsdiffs)
## # A tibble: 1 x 2
## Series_ID nsdiffs
## <chr> <int>
## 1 CEU7000000001 1
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
mutate(log_employed = difference(log(Employed), 12)) %>%
features(log_employed, unitroot_ndiffs)
## # A tibble: 1 x 2
## Series_ID ndiffs
## <chr> <int>
## 1 CEU7000000001 1
The data are not stationary since the test statistic (11.95) is bigger than the 0.01 critical value, so the p-value is less than 0.01, so the data are not stationary. We can difference the data, we differentiate the data and we apply the test again to double check. We have a different result and in this case by differenciating, we have a bigger p-value , so the differenciate data are stationary.
- Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AICc values?
Asking R to choose our model, we have ARIMA(2,1,2) (0,1,2). Looking at the ACF and PACF plot, we can suggest a ARIMA(3,1,0) and (2,1,0).
According to the AICc values, the best model is the stepwise and auto. They are the same since they have the lower AICc= 9423.102
us_employment %>%
filter(Title == "Leisure and Hospitality") %>%
gg_tsdisplay(difference(Employed), plot_type='partial')
us_fit <- us_employment %>%
filter(Title == "Leisure and Hospitality") %>%
model(arima310 = ARIMA(Employed ~ pdq(3,1,0)),
arima210 = ARIMA(Employed ~ pdq(2,1,0)),
stepwise = ARIMA(Employed),
auto = ARIMA(Employed, stepwise=FALSE))
report (us_fit)
## # A tibble: 4 x 9
## Series_ID .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 CEU7000000001 arima310 1160. -4728. 9467. 9467. 9497. <cpl [3]> <cpl [24]>
## 2 CEU7000000001 arima210 1161. -4729. 9468. 9468. 9492. <cpl [2]> <cpl [24]>
## 3 CEU7000000001 stepwise 1104. -4705. 9423. 9423. 9457. <cpl [2]> <cpl [26]>
## 4 CEU7000000001 auto 1104. -4705. 9423. 9423. 9457. <cpl [2]> <cpl [26]>
- Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.
fit2 <- us_employment %>%
filter(Title == "Leisure and Hospitality") %>%
model(ARIMA(Employed)) %>%
report(fit2)
## Series: Employed
## Model: ARIMA(2,1,2)(0,1,2)[12]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1 sma2
## 1.6621 -0.9333 -1.5105 0.7822 -0.4322 -0.1297
## s.e. 0.0327 0.0299 0.0585 0.0489 0.0342 0.0359
##
## sigma^2 estimated as 1104: log likelihood=-4704.55
## AIC=9423.1 AICc=9423.22 BIC=9457.14
fit2 %>% gg_tsresiduals()
augment(fit2) %>% features(.innov, ljung_box, lag=10, dof=2)
## # A tibble: 1 x 4
## Series_ID .model lb_stat lb_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 CEU7000000001 ARIMA(Employed) 21.0 0.00724
Since we have a low p-value, it means that we do have some white noise
- Forecast the next 3 years of data. Get the latest figures from https://fred.stlouisfed.org/categories/11 to check the accuracy of your forecasts.
Checking the the latest figures from https://fred.stlouisfed.org/categories/11, we can see that the Us employment for Leisure and Hospitality decreased in 2020 and was around 10,000 in May 2020 which means that our forecast might not be accurate
ARIMA <- us_employment %>%
filter(Title == "Leisure and Hospitality") %>%
model(ARIMA(Employed))
ARIMA %>%
forecast(h = "3 years") %>%
autoplot(us_employment, level = NULL)
- Eventually, the prediction intervals are so wide that the forecasts are not particularly useful. How many years of forecasts do you think are sufficiently accurate to be usable?
I think that one year forecast will be sufficiently accurate to be usable, even less than a year.
PI <- ARIMA %>%
forecast(h = "3 years") %>%
mutate(PI = hilo(Employed, level = 95))
PI
## # A fable: 36 x 6 [1M]
## # Key: Series_ID, .model [1]
## Series_ID .model Month Employed .mean PI
## <chr> <chr> <mth> <dist> <dbl> <hilo>
## 1 CEU7000000001 ARIMA(Employed) 2019 Oct N(16717, 1104) 16717. [16652.22, 16782.49]95
## 2 CEU7000000001 ARIMA(Employed) 2019 Nov N(16519, 2569) 16519. [16419.55, 16618.24]95
## 3 CEU7000000001 ARIMA(Employed) 2019 Dec N(16536, 4302) 16536. [16407.56, 16664.67]95
## 4 CEU7000000001 ARIMA(Employed) 2020 Jan N(16206, 6108) 16206. [16053.25, 16359.61]95
## 5 CEU7000000001 ARIMA(Employed) 2020 Feb N(16347, 7774) 16347. [16173.82, 16519.45]95
## 6 CEU7000000001 ARIMA(Employed) 2020 Mar N(16606, 9159) 16606. [16418.33, 16793.48]95
## 7 CEU7000000001 ARIMA(Employed) 2020 Apr N(16928, 10234) 16928. [16729.77, 17126.31]95
## 8 CEU7000000001 ARIMA(Employed) 2020 May N(17309, 11062) 17309. [17103.16, 17515.45]95
## 9 CEU7000000001 ARIMA(Employed) 2020 Jun N(17746, 11753) 17746. [17533.32, 17958.29]95
## 10 CEU7000000001 ARIMA(Employed) 2020 Jul N(17815, 12422) 17815. [17596.34, 18033.23]95
## # ... with 26 more rows