Choose “monthly US employment data for leisure and hospitality jobs” series from us_employment, the total employment in different industries in the United States.

  1. Produce an STL decomposition of the data and describe the trend and seasonality.

The data seems to be unpredictable with some increasing trend

us_employment %>% filter(Title == "Leisure and Hospitality") %>%
  model(STL(Employed)) %>%
  components() %>% autoplot()

  1. Do the data need transforming? If so, find a suitable transformation.

Yes, the data need some transformation

us_employment %>% filter(Title == "Leisure and Hospitality") %>%
  model(STL(log(Employed))) %>%
  components() %>% autoplot()

  1. Are the data stationary? If not, find an appropriate differencing which yields stationary data.
us_employment %>% filter(Title == "Leisure and Hospitality")  %>%
  autoplot(Employed)

us_employment %>% filter(Title == "Leisure and Hospitality") %>%
  features(Employed, unitroot_kpss)
## # A tibble: 1 x 3
##   Series_ID     kpss_stat kpss_pvalue
##   <chr>             <dbl>       <dbl>
## 1 CEU7000000001      11.9        0.01
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
  mutate(diff_employed = difference(Employed)) %>%
  features(diff_employed, unitroot_kpss)
## # A tibble: 1 x 3
##   Series_ID     kpss_stat kpss_pvalue
##   <chr>             <dbl>       <dbl>
## 1 CEU7000000001     0.161         0.1
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
  features(Employed, unitroot_ndiffs)
## # A tibble: 1 x 2
##   Series_ID     ndiffs
##   <chr>          <int>
## 1 CEU7000000001      1
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
  mutate(log_employed = log(Employed)) %>%
  features(log_employed, unitroot_nsdiffs)
## # A tibble: 1 x 2
##   Series_ID     nsdiffs
##   <chr>           <int>
## 1 CEU7000000001       1
us_employment %>% filter(Title == "Leisure and Hospitality") %>%
  mutate(log_employed = difference(log(Employed), 12)) %>%
  features(log_employed, unitroot_ndiffs)
## # A tibble: 1 x 2
##   Series_ID     ndiffs
##   <chr>          <int>
## 1 CEU7000000001      1

The data are not stationary since the test statistic (11.95) is bigger than the 0.01 critical value, so the p-value is less than 0.01, so the data are not stationary. We can difference the data, we differentiate the data and we apply the test again to double check. We have a different result and in this case by differenciating, we have a bigger p-value , so the differenciate data are stationary.

  1. Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AICc values?

Asking R to choose our model, we have ARIMA(2,1,2) (0,1,2). Looking at the ACF and PACF plot, we can suggest a ARIMA(3,1,0) and (2,1,0).

According to the AICc values, the best model is the stepwise and auto. They are the same since they have the lower AICc= 9423.102

us_employment %>%
  filter(Title == "Leisure and Hospitality")  %>%
  gg_tsdisplay(difference(Employed), plot_type='partial')

us_fit <- us_employment %>%
  filter(Title == "Leisure and Hospitality") %>%
  model(arima310 = ARIMA(Employed ~ pdq(3,1,0)),
        arima210 = ARIMA(Employed ~ pdq(2,1,0)),
        stepwise = ARIMA(Employed),
        auto = ARIMA(Employed, stepwise=FALSE))
report (us_fit)
## # A tibble: 4 x 9
##   Series_ID     .model   sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots  
##   <chr>         <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>    
## 1 CEU7000000001 arima310  1160.  -4728. 9467. 9467. 9497. <cpl [3]> <cpl [24]>
## 2 CEU7000000001 arima210  1161.  -4729. 9468. 9468. 9492. <cpl [2]> <cpl [24]>
## 3 CEU7000000001 stepwise  1104.  -4705. 9423. 9423. 9457. <cpl [2]> <cpl [26]>
## 4 CEU7000000001 auto      1104.  -4705. 9423. 9423. 9457. <cpl [2]> <cpl [26]>
  1. Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.
fit2 <- us_employment %>%
  filter(Title == "Leisure and Hospitality") %>%
  model(ARIMA(Employed)) %>%
  report(fit2)
## Series: Employed 
## Model: ARIMA(2,1,2)(0,1,2)[12] 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2     sma1     sma2
##       1.6621  -0.9333  -1.5105  0.7822  -0.4322  -0.1297
## s.e.  0.0327   0.0299   0.0585  0.0489   0.0342   0.0359
## 
## sigma^2 estimated as 1104:  log likelihood=-4704.55
## AIC=9423.1   AICc=9423.22   BIC=9457.14
fit2 %>% gg_tsresiduals()

augment(fit2) %>% features(.innov, ljung_box, lag=10, dof=2)
## # A tibble: 1 x 4
##   Series_ID     .model          lb_stat lb_pvalue
##   <chr>         <chr>             <dbl>     <dbl>
## 1 CEU7000000001 ARIMA(Employed)    21.0   0.00724

Since we have a low p-value, it means that we do have some white noise

  1. Forecast the next 3 years of data. Get the latest figures from https://fred.stlouisfed.org/categories/11 to check the accuracy of your forecasts.

Checking the the latest figures from https://fred.stlouisfed.org/categories/11, we can see that the Us employment for Leisure and Hospitality decreased in 2020 and was around 10,000 in May 2020 which means that our forecast might not be accurate

ARIMA <- us_employment %>%
  filter(Title == "Leisure and Hospitality") %>%
  model(ARIMA(Employed))

ARIMA %>%
  forecast(h = "3 years") %>%
  autoplot(us_employment, level = NULL)

  1. Eventually, the prediction intervals are so wide that the forecasts are not particularly useful. How many years of forecasts do you think are sufficiently accurate to be usable?

I think that one year forecast will be sufficiently accurate to be usable, even less than a year.

PI <- ARIMA %>%
  forecast(h = "3 years") %>%
  mutate(PI = hilo(Employed, level = 95))
PI
## # A fable: 36 x 6 [1M]
## # Key:     Series_ID, .model [1]
##    Series_ID     .model             Month        Employed  .mean                     PI
##    <chr>         <chr>              <mth>          <dist>  <dbl>                 <hilo>
##  1 CEU7000000001 ARIMA(Employed) 2019 Oct  N(16717, 1104) 16717. [16652.22, 16782.49]95
##  2 CEU7000000001 ARIMA(Employed) 2019 Nov  N(16519, 2569) 16519. [16419.55, 16618.24]95
##  3 CEU7000000001 ARIMA(Employed) 2019 Dec  N(16536, 4302) 16536. [16407.56, 16664.67]95
##  4 CEU7000000001 ARIMA(Employed) 2020 Jan  N(16206, 6108) 16206. [16053.25, 16359.61]95
##  5 CEU7000000001 ARIMA(Employed) 2020 Feb  N(16347, 7774) 16347. [16173.82, 16519.45]95
##  6 CEU7000000001 ARIMA(Employed) 2020 Mar  N(16606, 9159) 16606. [16418.33, 16793.48]95
##  7 CEU7000000001 ARIMA(Employed) 2020 Apr N(16928, 10234) 16928. [16729.77, 17126.31]95
##  8 CEU7000000001 ARIMA(Employed) 2020 May N(17309, 11062) 17309. [17103.16, 17515.45]95
##  9 CEU7000000001 ARIMA(Employed) 2020 Jun N(17746, 11753) 17746. [17533.32, 17958.29]95
## 10 CEU7000000001 ARIMA(Employed) 2020 Jul N(17815, 12422) 17815. [17596.34, 18033.23]95
## # ... with 26 more rows