Load packages and data

library(fpp3)
library(readxl)

Questions

Exercise 1

A

Each ACF shows the same number of lags, and none of the spikes fall outside the critical bounds, so all three series resemble white noise. The main difference between the data sets is the number of observations: the series with the fewest observations has the widest critical bounds and the noisiest sample autocorrelations, so it is the most likely to show a spuriously large spike by chance. The opposite is true for the data sets with more observations.

B

As the sample size T increases, the critical values (approximately ±1.96/√T) get smaller. The autocorrelations differ from figure to figure because the numbers are generated at random: with a small sample the estimated autocorrelations bounce around considerably by chance, while with a larger sample they settle down much closer to zero.
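
As an illustration (a minimal sketch; the series lengths 36, 360 and 1,000 are my assumption about the figure, not values taken from the exercise), white-noise series of different lengths can be simulated and their ACFs compared; the critical bounds, approximately ±1.96/√T, shrink as T grows:

set.seed(1)
wn <- bind_rows(
  tibble(n = "T = 36",   idx = 1:36,   y = rnorm(36)),
  tibble(n = "T = 360",  idx = 1:360,  y = rnorm(360)),
  tibble(n = "T = 1000", idx = 1:1000, y = rnorm(1000))
) %>%
  as_tsibble(index = idx, key = n)

# One ACF panel per series; only the shortest series is likely to show
# spikes that come close to its (wide) critical bounds
wn %>%
  ACF(y) %>%
  autoplot()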

Exercise 2

gafa_stock %>%
    filter(Symbol == 'AMZN') %>%
    gg_tsdisplay(Close, plot_type = 'partial')
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

The Amazon closing price is not stationary. A stationary series has no predictable patterns and no trend or seasonality, whereas the Amazon series shows a clear upward trend. The ACF also decays very slowly, and the first spike of the PACF is extremely high, meaning there is very strong correlation between each observation and the one before it. All of this means that the Amazon stock is not stationary and there is a strong case for differencing.
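
As a rough sketch of why differencing helps here (taking the log first to stabilise the variance is my own choice, not something the exercise specifies), the differenced log closing price can be checked with the same display:

# First difference of the log closing price; its ACF/PACF should look much
# closer to white noise than those of the raw series
gafa_stock %>%
    filter(Symbol == 'AMZN') %>%
    gg_tsdisplay(difference(log(Close)), plot_type = 'partial')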

Exercise 3

turkey <- global_economy %>%
  filter(Country == "Turkey") %>%
  mutate(GDP = (GDP/1000000)) %>%
  select(Year, GDP) %>%
  na.omit()

turkey %>%
  ggplot(aes(x = Year, y = GDP)) +
  geom_line()

turkey %>%
  gg_tsdisplay(GDP, plot_type = 'partial')

turkeyOG <- turkey %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)   # extract the Guerrero lambda as a number for box_cox()

turkeygang <- turkey %>%
  mutate(GDP = box_cox(GDP, turkeyOG))

turkeygang %>%
  features(GDP, unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1
turkeydiff <- turkeygang %>%
  mutate(GDP = difference(GDP)) %>%
  na.omit()

turkeydiff %>%
  ggplot(aes(x = Year, y = GDP)) +
  geom_line()

turkeydiff %>%
  features(GDP, unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0889         0.1

The time plot and ACF show that this data is not stationary: there is a strong upward trend with non-constant variance. To deal with the changing variance I applied a Box-Cox transformation, and then took a first difference to remove the trend. After the Box-Cox transformation and differencing, the data appears stationary, which the KPSS test confirms (p-value of 0.1 > 0.05, so we cannot reject stationarity).

Notation: (1-B)wt, where wt denotes the Box-Cox transformed GDP series

B

tasmania <- aus_accommodation %>%
  filter(State == "Tasmania") %>%
  select(Date, Takings)

tasmania %>%
  ggplot(aes(x = Date, y = Takings)) +
  geom_line()

tasmania %>%
  gg_tsdisplay(Takings, plot_type = 'partial')

tasmania %>%
  features(Takings, unitroot_nsdiffs)
## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1
tasmaniadiff <- tasmania %>%
  mutate(Takings = difference(Takings, 4)) %>%
  na.omit()

tasmaniadiff %>%
  features(Takings, unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      0
tasmaniadiff %>%
  ggplot(aes(x = Date, y = Takings)) +
  geom_line()

tasmaniadiff %>%
  features(Takings, unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1     0.196         0.1

There is an upward trend with roughly constant variance. The ACF and PACF of the series show that the data is not stationary, with strong correlation at the seasonal lags. Using unitroot_nsdiffs I found that one seasonal difference was needed, and unitroot_ndiffs confirmed that no further first difference was required after that. After seasonally differencing, I ran a KPSS test to check whether the data is now stationary; the p-value is larger than 0.05, so we cannot reject stationarity and the seasonally differenced data can be treated as stationary.

Notation: (1-B^4)yt

C

souvenirs %>%
  ggplot(aes(x = Month, y = Sales)) +
  geom_line()

souvenirs %>%
  gg_tsdisplay(Sales, plot_type = 'partial')

souvenirsOG <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)   # extract the Guerrero lambda as a number for box_cox()

souvenirsgang <- souvenirs %>%
  mutate(Sales = box_cox(Sales, souvenirsOG))

souvenirsgang %>%
  features(Sales, unitroot_nsdiffs)
## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1
souvenirsdiff <- souvenirsgang %>%
  mutate(Sales = difference(Sales, 12)) %>%
  na.omit()

souvenirsdiff %>%
  features(Sales, unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      0
souvenirsdiff %>%
  ggplot(aes(x = Month, y = Sales)) +
  geom_line()

souvenirsdiff %>%
  features(Sales, unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1     0.240         0.1

This data has an upward trend with non-constant variance, similar to part A, so I again start with a Box-Cox transformation. Using unitroot_nsdiffs I found that, unlike in part A, a seasonal difference was needed, so I used the seasonal difference (lag 12, since the data is monthly) in place of the first difference. After the Box-Cox transformation and seasonal differencing, the plot looks stationary, and the KPSS test confirms it (p-value above 0.05).

Notation: (1-B^12)wt, where wt denotes the Box-Cox transformed sales series

Exercise 4

A & B

ar1 <- function(phi, n = 100) {
  y <- numeric(n)
  e <- rnorm(n)
  for (i in 2:n) {
    y[i] <- phi * y[i - 1] + e[i]
  }
  tsibble(idx = seq_len(n), y = y, index = idx)
}
ar1(0.6) %>%
  autoplot() +
  labs(title = "AR1(0.6)")
## Plot variable not specified, automatically selected `.vars = y`

ar1(0) %>%
  autoplot() +
  labs(title = "AR1(0)")
## Plot variable not specified, automatically selected `.vars = y`

ar1(-1) %>%
  autoplot() +
  labs(title = "AR1(-1)")
## Plot variable not specified, automatically selected `.vars = y`

ar1(1) %>%
  autoplot() +
  labs(title = "AR1(1)")
## Plot variable not specified, automatically selected `.vars = y`

As I change phi, the pattern of variation changes. With phi = 0 the data resembles white noise; with phi = 1 the data follows a random walk and can wander with an apparent trend; with phi = -1 the data oscillates, flipping sign each period; and with phi = 0.6 the data is stationary with moderate positive autocorrelation (this is the best-behaved case).
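
A small sketch (my own addition, not part of the exercise code): the difference between these cases is often clearer in the ACF than in the time plots.

# AR(1) with phi = 0.6: the ACF should decay geometrically from a positive lag-1 spike
ar1(0.6) %>%
  ACF(y) %>%
  autoplot()

# phi = 0 (white noise): typically no spikes beyond the critical bounds
ar1(0) %>%
  ACF(y) %>%
  autoplot()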

C & D

ma1 = function(param = 0.6)
{
  set.seed(123)    # `seed = 123` on its own did not actually set the seed
  y <- ts(numeric(100))
  e <- rnorm(100)  # one error series; an MA(1) reuses the previous period's error
  for(i in 2:100)
    y[i] <- param * e[i-1] + e[i]
  return(y)
}

ma1(0.6) %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "MA1(0.6)")
## Plot variable not specified, automatically selected `.vars = value`

ma1(0) %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "MA1(0)")
## Plot variable not specified, automatically selected `.vars = value`

ma1(1) %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "MA1(1)")
## Plot variable not specified, automatically selected `.vars = value`

ma1(-1) %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "MA1(-1)")
## Plot variable not specified, automatically selected `.vars = value`

For each value of theta, the series still looks a lot like white noise in the time plot. This makes sense: an MA(1) process is stationary for any theta, and changing theta mainly changes the lag-1 autocorrelation rather than the overall appearance of the series.
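
To make that concrete (another small sketch of mine): for theta = 0.6 the ACF should show a single significant spike at lag 1, which is the main feature separating an MA(1) process from pure white noise.

# ACF of the MA(1) with theta = 0.6: expect one clear spike at lag 1
ma1(0.6) %>%
  as_tsibble() %>%
  ACF(value) %>%
  autoplot()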

E

arma1 = function(alpha = 0.6, theta = 0.6)
{
  set.seed(123)    # `seed = 123` on its own did not actually set the seed
  y <- ts(numeric(100))
  e <- rnorm(100)  # one error series shared by the AR and MA parts
  for(i in 2:100)
    y[i] <- alpha * y[i-1] + theta * e[i-1] + e[i]
  return(y)
}
arma1() %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "ARMA(1,1)")
## Plot variable not specified, automatically selected `.vars = value`

F

ar2 = function(alpha1 = -0.8,  alpha2 = 0.3)
{
  set.seed(123)   # `seed = 123` on its own did not actually set the seed
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 3:100)
    y[i] <- alpha1 * y[i-1] + alpha2 * y[i-2] + e[i]
  return(y)
}

ar2() %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "AR(2)")
## Plot variable not specified, automatically selected `.vars = value`

G

The ARMA(1,1) series has no trend or seasonality and no patterns I can identify, so I can confidently say it is stationary; a model of this kind would be ready to forecast from. The AR(2) series does not look stationary: it oscillates, flipping sign each period with exponentially growing amplitude, because these coefficients violate the AR(2) stationarity conditions. The biggest difference between the two is that the AR(2) process could not be forecast accurately, while the ARMA(1,1) could.
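
As a quick numerical check of that claim (my addition), the AR(2) stationarity condition can be verified from the roots of the characteristic polynomial 1 - phi1*z - phi2*z^2 with phi1 = -0.8 and phi2 = 0.3; the process is stationary only if every root lies outside the unit circle.

# Roots of 1 + 0.8z - 0.3z^2 (coefficients supplied in increasing powers of z)
Mod(polyroot(c(1, 0.8, -0.3)))

One of the roots has modulus below one, so the stationarity condition fails, which matches the explosive oscillation seen in the plot.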

Exercise 5

A

aus_airpassengers %>%
  filter(Year < 2012)
## # A tsibble: 42 x 2 [1Y]
##     Year Passengers
##    <dbl>      <dbl>
##  1  1970       7.32
##  2  1971       7.33
##  3  1972       7.80
##  4  1973       9.38
##  5  1974      10.7 
##  6  1975      11.1 
##  7  1976      10.9 
##  8  1977      11.3 
##  9  1978      12.1 
## 10  1979      13.0 
## # … with 32 more rows
airauto <- aus_airpassengers %>%
  model(
    'Auto' = ARIMA(Passengers)
  )

airauto %>%
  gg_tsresiduals()

augment(airauto) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 Auto      6.70     0.754
airauto %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

B

Notation: (1-2B+B^2)yt = (1-B)^2 yt, i.e. the selected model applies second-order differencing to the series; the fitted AR/MA terms complete the right-hand side of the model.
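
The coefficients needed to write out that right-hand side can be read from the model report (a small addition of mine; report() simply prints the fitted model and its coefficients):

# Print the automatically selected ARIMA model and its fitted coefficients
airauto %>%
  report()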

C

aus010 <- aus_airpassengers %>%
  model(
    ARIMA(Passengers ~ pdq(0,1,0))
  )

aus010 %>%
  gg_tsresiduals()

augment(aus010) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model                           lb_stat lb_pvalue
##   <chr>                              <dbl>     <dbl>
## 1 ARIMA(Passengers ~ pdq(0, 1, 0))    6.77     0.747
aus010 %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

D

aus212 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))

aus212 %>%
  forecast(h= 10) %>%
  autoplot(aus_airpassengers)

aus212.noc <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
aus212.noc %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 row(s) containing missing values (geom_path).

I couldn’t get this to work without warning messages or errors: when the constant is removed, the ARIMA(2,1,2) model fails to estimate (the "non-stationary AR part from CSS" error), so the mable contains a NULL model and the forecast plot comes out empty, which is what the remaining warnings refer to.

E

aus021 <- aus_airpassengers %>%
  model(
    ARIMA(Passengers ~ 1 + pdq(0,2,1))
  )
## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.
aus021 %>%
  gg_tsresiduals()

augment(aus021) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model                               lb_stat lb_pvalue
##   <chr>                                  <dbl>     <dbl>
## 1 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))    6.91     0.734
aus021 %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

Exercise 6

A

usa <- global_economy %>%
  filter(Code == "USA") %>%
  mutate(GDP = (GDP/1000000000000)) %>%
  select(Year, GDP)

usa %>%
  ggplot(aes(x = Year, y = GDP)) +
  geom_line()

usaOG <- usa %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)   # extract the Guerrero lambda as a number for box_cox()

usagang <- usa %>%
  mutate(GDP = box_cox(GDP, usaOG))

usagang %>%
  autoplot()
## Plot variable not specified, automatically selected `.vars = GDP`

B

usa %>%
  gg_tsdisplay(GDP, plot_type = 'partial')

usa %>%
  features(GDP, unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      2
usaauto <- usa %>%
  model(
    'Auto' = ARIMA(GDP)
  )

usaauto %>%
  report()
## Series: GDP 
## Model: ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1      ma2
##       -0.4206  -0.3048
## s.e.   0.1197   0.1078
## 
## sigma^2 estimated as 0.02615:  log likelihood=23.26
## AIC=-40.52   AICc=-40.06   BIC=-34.45

C

usacomp <- usa %>%
  model(
    'Auto' = ARIMA(GDP),
    'Diffs Only' = ARIMA(GDP ~ pdq(0,2,0)),
    'Autoregression + Diff' = ARIMA(GDP ~ pdq(1,2,0)),
    "Moving Average + Diff" = ARIMA(GDP ~ pdq(0,2,1)),
    "AR + Diff + MA" = ARIMA(GDP ~ pdq(1,2,1))
  )

usacomp %>%
  report()
## Warning in report.mdl_df(.): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.
## # A tibble: 5 × 8
##   .model                sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>                  <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 Auto                  0.0261    23.3 -40.5 -40.1 -34.4 <cpl [0]> <cpl [2]>
## 2 Diffs Only            0.0310    17.8 -33.5 -33.5 -31.5 <cpl [0]> <cpl [0]>
## 3 Autoregression + Diff 0.0306    18.6 -33.3 -33.0 -29.2 <cpl [1]> <cpl [0]>
## 4 Moving Average + Diff 0.0292    19.8 -35.6 -35.3 -31.5 <cpl [0]> <cpl [1]>
## 5 AR + Diff + MA        0.0263    23.1 -40.2 -39.7 -34.1 <cpl [1]> <cpl [1]>

D

usaallofem <- usa %>%
  model(
    "AR + Diff + MA" = ARIMA(GDP ~ pdq(1,2,1))
  )
usaallofem %>%
  gg_tsresiduals()

augment(usaauto) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 Auto      12.2     0.273
augment(usaallofem) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model         lb_stat lb_pvalue
##   <chr>            <dbl>     <dbl>
## 1 AR + Diff + MA    14.1     0.169

E

usaallofem %>%
  forecast(h = 10) %>%
  autoplot(usa)

This forecast looks reasonable: it continues the upward trend and is consistent with the historical behaviour of the series. The main weakness is that a series like GDP is subject to exogenous shocks, and the prediction intervals don’t necessarily account for a sudden large change.
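
To inspect those prediction intervals numerically rather than only on the plot (a small sketch I have added), hilo() extracts the 80% and 95% intervals from the forecast distributions:

# 80% and 95% prediction intervals for the "AR + Diff + MA" (ARIMA(1,2,1)) forecasts
usaallofem %>%
  forecast(h = 10) %>%
  hilo()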

F

usaets <- usa %>%
  model('Auto' = ETS(GDP),
        'Multiplicative Trend' = ETS(GDP ~ error("A") + trend("M") + season("N"))
        )
usaets %>%
  report()
## Warning in report.mdl_df(.): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.
## # A tibble: 2 × 9
##   .model                 sigma2 log_lik   AIC  AICc   BIC    MSE  AMSE    MAE
##   <chr>                   <dbl>   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>
## 1 Auto                 0.000455    23.8 -37.7 -36.5 -27.4 0.0280 0.128 0.0169
## 2 Multiplicative Trend 0.0318     -15.7  41.4  42.6  51.7 0.0296 0.137 0.0983
usaets %>%
  forecast(h = 10) %>%
  autoplot(usa)

Exercise 7

A

japanar <- aus_arrivals %>%
  filter(Origin == "Japan")

autoplot(japanar)  
## Plot variable not specified, automatically selected `.vars = Arrivals`

B

japanardiff <- japanar %>%
  gg_tsdisplay(difference(Arrivals, 4) %>% 
                 difference(), plot_type = "partial" , lag = 16)
japanardiff
## Warning: Removed 5 row(s) containing missing values (geom_path).
## Warning: Removed 5 rows containing missing values (geom_point).

C

There is very high autocorrelation at lag 4. This means the data is strongly seasonal: the data is quarterly, so lag 4 corresponds to the same quarter in the following year. Since the data measures arrivals to Australia and travel is highly seasonal, this pattern makes sense.

D

The PACF still shows autocorrelation even after the effects of the earlier lags are removed. This data is extremely seasonal, so it makes sense that substantial partial autocorrelation remains. The PACF has spikes beyond the critical bounds at lags 8 and 12, which means the same quarter in different years is correlated.

E

Using the two graphs I can see that this data is very seasonal, so I need to difference it to make it stationary; differencing more than necessary risks over-differencing the series and destroying its structure. Knowing all of this, the model I would start from uses a seasonal difference at lag 4 (the unit-root features sketched below give a quick cross-check on how much differencing is needed).
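
A quick cross-check on the differencing choice (my addition), using the unit-root features on the raw series and then on the seasonally differenced series:

# Number of seasonal differences suggested for the raw arrivals series
japanar %>%
  features(Arrivals, unitroot_nsdiffs)

# Whether a further first difference is suggested after one lag-4 seasonal difference
japanar %>%
  mutate(Arrivals = difference(Arrivals, 4)) %>%
  na.omit() %>%
  features(Arrivals, unitroot_ndiffs)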

F

japanfitty <- japanar %>%
  model(ARIMA(Arrivals))

report(japanfitty)
## Series: Arrivals 
## Model: ARIMA(0,1,1)(1,1,1)[4] 
## 
## Coefficients:
##           ma1     sar1     sma1
##       -0.4228  -0.2305  -0.5267
## s.e.   0.0944   0.1337   0.1246
## 
## sigma^2 estimated as 174801727:  log likelihood=-1330.66
## AIC=2669.32   AICc=2669.66   BIC=2680.54
japanarima <- japanar %>%
  model(ARIMA(Arrivals ~ pdq(0,1,1) + PDQ(1,1,1)))

report(japanarima)
## Series: Arrivals 
## Model: ARIMA(0,1,1)(1,1,1)[4] 
## 
## Coefficients:
##           ma1     sar1     sma1
##       -0.4228  -0.2305  -0.5267
## s.e.   0.0944   0.1337   0.1246
## 
## sigma^2 estimated as 174801727:  log likelihood=-1330.66
## AIC=2669.32   AICc=2669.66   BIC=2680.54
japanarimaagain <- japanar %>%
  model(ARIMA(Arrivals ~ pdq(1,1,0) + PDQ(1,1,1)))

report(japanarimaagain)
## Series: Arrivals 
## Model: ARIMA(1,1,0)(1,1,1)[4] 
## 
## Coefficients:
##           ar1     sar1     sma1
##       -0.3091  -0.2129  -0.5534
## s.e.   0.0889   0.1321   0.1224
## 
## sigma^2 estimated as 180712140:  log likelihood=-1332.66
## AIC=2673.32   AICc=2673.67   BIC=2684.54

G

?????

Exercise 8

A

NatUE <- read_excel("~/Downloads/Plz let me see .xlsx")

unemploy <- NatUE %>%
  mutate(Date = yearquarter(Date)) %>%
  as_tsibble(index = Date)

B

unemploy %>% 
  autoplot() +  
  labs(title = "Natural Rate of Unemployment")
## Plot variable not specified, automatically selected `.vars = Value`

unemploy %>%
  features(Value, unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1
NUEcomp <- unemploy %>% 
  model(
    'Auto' = ARIMA(Value),
    'Diffs Only' = ARIMA(Value ~ pdq(0,1,0)),
    'Autoregression + Diff' = ARIMA(Value ~ pdq(1,1,0)),
    "Moving Average + Diff" = ARIMA(Value ~ pdq(0,1,1)),
    "All" = ARIMA(Value ~ pdq(1,1,1))
        ) 
NUEcomp %>% 
  report()
## Warning in report.mdl_df(.): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.
## # A tibble: 5 × 8
##   .model                  sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>                    <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 Auto                  0.000370    211. -408. -407. -391. <cpl [3]> <cpl [3]>
## 2 Diffs Only            0.00106     165. -325. -325. -318. <cpl [8]> <cpl [0]>
## 3 Autoregression + Diff 0.000453    202. -398. -398. -391. <cpl [1]> <cpl [4]>
## 4 Moving Average + Diff 0.000778    179. -352. -352. -345. <cpl [0]> <cpl [5]>
## 5 All                   0.000444    203. -398. -398. -389. <cpl [1]> <cpl [5]>
NUEcomp8 <- unemploy %>% 
  model(
    'Auto' = ARIMA(Value))

NUEcomp8 %>%
  report()
## Series: Value 
## Model: ARIMA(3,1,3) 
## 
## Coefficients:
##          ar1     ar2      ar3      ma1      ma2     ma3
##       0.8988  0.3956  -0.4571  -0.3412  -0.3247  0.8130
## s.e.  0.1208  0.1751   0.1147   0.0950   0.0752  0.0886
## 
## sigma^2 estimated as 0.0003698:  log likelihood=211.01
## AIC=-408.03   AICc=-406.53   BIC=-391.1

C

NUEauto <- unemploy %>%
  model('Auto' = ARIMA(Value))

NUEauto %>%
  gg_tsresiduals()

augment(NUEauto) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 Auto      8.05     0.624

Looking at the graph of the residuals, there is still some structure over time and slight autocorrelation at a few lags, but the residuals look roughly normally distributed around zero. I ran a Ljung-Box test to check the residuals: the p-value (0.62) is well above 0.05, so I cannot reject the null hypothesis that the residuals are white noise; they are consistent with white noise despite the small visible patterns.

D

NUEauto %>%
  forecast(h = 4) %>%
  autoplot(unemploy) 

E

unemploy %>% 
  model(
    'Auto' = ETS(Value),
    'Simple' = ETS(Value ~ error("A") + trend("N") + season("N")),
    'Holt' = ETS(Value ~ error("A") + trend("A") + season("N")),
    'Multiplicative Trend' = ETS(Value ~ error("A") + trend("M") + season("N"))
    ) %>% 
  report(unemploy)
## Warning in report.mdl_df(., unemploy): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 4 × 9
##   .model                sigma2 log_lik   AIC  AICc   BIC     MSE    AMSE     MAE
##   <chr>                  <dbl>   <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>
## 1 Auto                 1.59e-5   145.  -277. -276. -263. 4.36e-4 0.00241 0.00190
## 2 Simple               2.46e-3    67.2 -128. -128. -121. 2.41e-3 0.0104  0.0321 
## 3 Holt                 4.77e-4   137.  -264. -264. -252. 4.54e-4 0.00261 0.0104 
## 4 Multiplicative Trend 4.81e-4   137.  -264. -263. -251. 4.58e-4 0.00265 0.0102

F

NUEholt <- unemploy %>%
    model(
      'Holt' = ETS(Value ~ error("A") + trend("A") + season("N"))
      )

NUEholt %>%
  gg_tsresiduals()

augment(NUEholt) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 Holt      17.5    0.0636

Using the Holt model, the residuals look almost the same as those from the best ARIMA model. I ran a Ljung-Box test to check them, and the p-value (0.064) is still above 0.05, so I again fail to reject that the residuals are white noise, although the evidence is noticeably weaker here than for the ARIMA model.

G

NUEholt %>%
  forecast(h = 4) %>%
  autoplot(unemploy)

H

unemploy %>%
  stretch_tsibble(.init = 10) %>%
  model(
    'Holt' = ETS(Value ~ error("A") + trend("A") + season("N")),
    'Auto ARIMA' = ARIMA(Value)) %>%
  forecast(h = 1) %>%
  accuracy(unemploy)
## Warning in sqrt(diag(best$var.coef)): NaNs produced

## Warning in sqrt(diag(best$var.coef)): NaNs produced

## Warning in sqrt(diag(best$var.coef)): NaNs produced
## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 1 observation is missing at 2021 Q1
## # A tibble: 2 × 10
##   .model     .type        ME   RMSE    MAE      MPE  MAPE  MASE RMSSE   ACF1
##   <chr>      <chr>     <dbl>  <dbl>  <dbl>    <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Auto ARIMA Test  -0.00662  0.0315 0.0187 -0.128   0.360 0.147 0.169 -0.109
## 2 Holt       Test  -0.000680 0.0242 0.0128 -0.00277 0.241 0.101 0.130  0.110

I used cross-validation to compare the accuracy of the two models. The Holt method has a slightly lower RMSE (and MAE), which suggests it is the more accurate of the two for one-step-ahead forecasts. Both models’ residuals are reasonably consistent with white noise, so either could be used; the remaining caveat is that the cross-validation produced estimation warnings on some of the shorter training folds, so I would treat the comparison with a little caution.