library(fpp3)

Introduction

In this document, we work through exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, and 9.8 from Forecasting: Principles and Practice (3rd ed).

Exercises

9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Figure 9.32 (from the textbook)
  • Explain the differences among these figures. Do they all indicate that the data are white noise?

The main difference among these figures is that the autocorrelations at each lag shrink towards zero, and the critical-value bounds narrow, as the sample size grows. In each case the data are consistent with white noise, since almost all spikes lie within the bounds.

  • Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values move closer to zero as the sample size grows because, with more data, the sample autocorrelations of white noise are expected to lie closer to the population value of zero. With a very large sample, even a small sample autocorrelation would be evidence of genuine autocorrelation in the population, so the threshold for significance shrinks. The same reasoning explains why the plotted autocorrelations themselves are smaller in the larger samples: the more data we have, the less incidental correlation shows up around the true autocorrelation of random data, which is zero.
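
For reference, the dashed bounds in these ACF plots are approximately ±1.96/√T. A quick sketch (my own addition) of how they shrink with the sample size:

# approximate 95% bounds for the ACF of a white noise series of length T
tibble(T = c(36, 360, 1000)) |>
  mutate(critical_value = 1.96 / sqrt(T))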

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock |>
  filter(Symbol == "AMZN") |>
  gg_tsdisplay(Close, plot_type='partial')

The time plot shows that the series is non-stationary: there is a clear increasing trend up until recently. The ACF confirms this, decaying only very slowly and remaining large and significant even 30 lags out; for a stationary series the autocorrelations would drop to near zero quickly. The PACF tells a similar story, with an extremely large spike at lag 1, meaning each observation is dominated by the one immediately before it. That same property suggests that differencing the data, i.e. working with the daily change in the closing price, should yield a stationary series.
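
As a quick check (my own addition, not part of the exercise), differencing the closing prices gives a series whose ACF and PACF look far more like white noise:

gafa_stock |>
  filter(Symbol == "AMZN") |>
  gg_tsdisplay(difference(Close), plot_type='partial')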

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

  • Turkish GDP from global_economy.

We first plot the data to get an idea of what kind of series it is. Here we see that it is non-seasonal data with an increasing trend and increasing variance over time.

df <- global_economy |>
  filter(Country == "Turkey") 

df |>
  gg_tsdisplay(GDP)

As an experiment, I decided to check whether transforming a series whose change in variance seems minimal is even necessary. Below we can see that a log transformation does help: the differenced log series is noticeably more homoscedastic than the differenced raw series.

df |>
  transmute(
    `GDP` = GDP,
    `Log GDP` = log(GDP),
    `Annual change in GDP` = difference(GDP),
    `Annual change in log GDP` = difference(log(GDP)),
    `Doubly differenced GDP` = difference(GDP, differences = 2),
    `Doubly differenced log GDP` = difference(log(GDP), differences = 2)
    ) |>
  # pivot only the GDP columns (the key column Country must be excluded)
  pivot_longer(-c(Country, Year), names_to="Type", values_to="GDP") |>
  mutate(
    Type = factor(Type, levels = c(
      "GDP",
      "Log GDP",
      "Annual change in GDP",
      "Annual change in log GDP",
      "Doubly differenced GDP",
      "Doubly differenced log GDP"
      ))
  ) |>
  ggplot(aes(x = Year, y = GDP)) +
  geom_line() +
  facet_grid(vars(Type), scales = "free_y") +
  labs(title = "Turkish GDP", y = NULL)

Next we find an appropriate Box-Cox lambda using the guerrero feature, and use the unit-root features to determine how many seasonal and first differences to take. Note that, since the plots show no seasonality, we already expect that no seasonal differencing will be required.

df |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero) -> lambda

df |>
  features(box_cox(GDP, lambda), features = unitroot_nsdiffs) |>
  pull(nsdiffs) -> nsdiffs

df |>
  features(box_cox(GDP, lambda), features = unitroot_ndiffs) |>
  pull(ndiffs) -> ndiffs

cat("The optimal box-cox lambda value for this series is", lambda,
    "\nThe optimal number of seasonal differences for this series is", nsdiffs,
    "\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.1572187 
## The optimal number of seasonal differences for this series is 0 
## The optimal number of first differences for this series is 1

Below we can see that the lambda and order of differencing found above work well.

df |>
  transmute(
    `GDP` = GDP,
    `Box-Cox GDP` = box_cox(GDP, lambda),
    `First Differenced Box-Cox GDP` = difference(box_cox(GDP, lambda), differences = ndiffs)
    ) |>
  pivot_longer(-c(Country, Year), names_to="Type", values_to="GDP") |>
  mutate(
    Type = factor(Type, levels = c(
      "GDP",
      "Box-Cox GDP",
      "First Differenced Box-Cox GDP"
      ))
  ) |>
  ggplot(aes(x = Year, y = GDP)) +
  geom_line() +
  facet_grid(vars(Type), scales = "free_y") +
  labs(title = "Turkish GDP", y = NULL)

  • Accommodation takings in the state of Tasmania from aus_accommodation.

We first plot the data to get an idea of what kind of series it is. Here we see that it is seasonal data with an increasing trend and increasing variance over time.

df <- aus_accommodation |>
  filter(State == "Tasmania")

df |>
  gg_tsdisplay(Takings)

Next we find an appropriate Box-Cox lambda using the guerrero feature, and use the unit-root features to determine how many seasonal and first differences to take.

Note that because lambda is so small we could just use a log transform instead, but since interpretability is not a factor here I choose to use the small lambda.

df |>
  features(Takings, features = guerrero) |>
  pull(lambda_guerrero) -> lambda

df |>
  features(box_cox(Takings, lambda), features = unitroot_nsdiffs) |>
  pull(nsdiffs) -> nsdiffs

df |>
  features(difference(box_cox(Takings, lambda),lag = 4, differences = nsdiffs), features = unitroot_ndiffs) |>
  pull(ndiffs) -> ndiffs

cat("The optimal box-cox lambda value for this series is", lambda,
    "\nThe optimal number of seasonal differences for this series is", nsdiffs,
    "\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.001819643 
## The optimal number of seasonal differences for this series is 1 
## The optimal number of first differences for this series is 0
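
As a quick sanity check on the claim that a lambda this close to zero is essentially a log transform, we can compare the two directly (this check is my own addition):

# with lambda near 0, box_cox() is nearly indistinguishable from log()
df |>
  as_tibble() |>
  summarise(largest_gap = max(abs(box_cox(Takings, lambda) - log(Takings))))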

Below we see that although the transformation and seasonal difference remove the seasonality, there seems to be more variance at the beginning of the series, so an ARIMA model based on this data might not be optimal.

df |>
  transmute(
    `Takings` = Takings,
    `Box-Cox Takings` = box_cox(Takings, lambda),
    `Seasonal Differenced Box-Cox Takings` = difference(box_cox(Takings, lambda), lag = 4, differences = nsdiffs)
    ) |>
  pivot_longer(-c(State, Date), names_to="Type", values_to="Takings") |>
  mutate(
    Type = factor(Type, levels = c(
      "Takings",
      "Box-Cox Takings",
      "Seasonal Differenced Box-Cox Takings"
      ))
  ) |>
  ggplot(aes(x = Date, y = Takings)) +
  geom_line() +
  facet_grid(vars(Type), scales = "free_y") +
  labs(title = "Tasmanian Accommodation Takings", y = NULL)

  • Monthly sales from souvenirs.

We first plot the data to get an idea of what kind of series it is. Here we see that it is seasonal data with a slightly increasing trend but heavily increasing variance over time.

df <- souvenirs

df |>
  gg_tsdisplay(Sales)

Next we find an appropriate Box-Cox lambda using the guerrero feature, and use the unit-root features to determine how many seasonal and first differences to take.

Note that because lambda is so small we could just use a log transform instead, but since interpretability is not a factor here I choose to use the small lambda.

df |>
  features(Sales, features = guerrero) |>
  pull(lambda_guerrero) -> lambda

df |>
  features(box_cox(Sales, lambda), features = unitroot_nsdiffs) |>
  pull(nsdiffs) -> nsdiffs

df |>
  features(difference(box_cox(Sales, lambda),lag = 12, differences = nsdiffs), features = unitroot_ndiffs) |>
  pull(ndiffs) -> ndiffs

cat("The optimal box-cox lambda value for this series is", lambda,
    "\nThe optimal number of seasonal differences for this series is", nsdiffs,
    "\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.002118221 
## The optimal number of seasonal differences for this series is 1 
## The optimal number of first differences for this series is 0

Below we see that both the seasonality and the heteroscedasticity are removed well with the given values.

df |>
  transmute(
    `Sales` = Sales,
    `Box-Cox Sales` = box_cox(Sales, lambda),
    `Seasonal Differenced Box-Cox Sales` = difference(box_cox(Sales, lambda), lag = 12, differences = nsdiffs)
    ) |>
  pivot_longer(-Month, names_to="Type", values_to="Sales") |>
  mutate(
    Type = factor(Type, levels = c(
      "Sales",
      "Box-Cox Sales",
      "Seasonal Differenced Box-Cox Sales"
      ))
  ) |>
  ggplot(aes(x = Month, y = Sales)) +
  geom_line() +
  facet_grid(vars(Type), scales = "free_y") +
  labs(title = "Souvenir Sales", y = NULL)

9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(1234567)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
df <- myseries

We first plot the data to get an idea of what kind of series it is. Here we see that it is seasonal data with an increasing trend and increasing variance over time.

df |>
  gg_tsdisplay(Turnover)

Next we find an appropriate Box-Cox lambda using the guerrero feature, and use the unit-root features to determine how many seasonal and first differences to take.

Note that because lambda is so small we could just use a log transform instead, but since interpretability is not a factor here I choose to use the small lambda.

df |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero) -> lambda

df |>
  features(box_cox(Turnover, lambda), features = unitroot_nsdiffs) |>
  pull(nsdiffs) -> nsdiffs

df |>
  features(difference(box_cox(Turnover, lambda),lag = 12, differences = nsdiffs), features = unitroot_ndiffs) |>
  pull(ndiffs) -> ndiffs

cat("The optimal box-cox lambda value for this series is", lambda,
    "\nThe optimal number of seasonal differences for this series is", nsdiffs,
    "\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.1761103 
## The optimal number of seasonal differences for this series is 1 
## The optimal number of first differences for this series is 0

Below we see that both the seasonality and the heteroscedasticity are removed with the given values. Yet there still seems to be a leftover pattern of autocorrelated dips and peaks; perhaps this is a remaining cyclical pattern that would be better removed with a first difference.

df |>
  transmute(
    `Turnover` = Turnover,
    `Box-Cox Turnover` = box_cox(Turnover, lambda),
    `Seasonal Differenced Box-Cox Turnover` = difference(box_cox(Turnover, lambda), lag = 12, differences = nsdiffs)
    ) |>
  pivot_longer(-c(State, Industry, Month), names_to="Type", values_to="Turnover") |>
  mutate(
    Type = factor(Type, levels = c(
      "Turnover",
      "Box-Cox Turnover",
      "Seasonal Differenced Box-Cox Turnover"
      ))
  ) |>
  ggplot(aes(x = Month, y = Turnover)) +
  geom_line() +
  facet_grid(vars(Type), scales = "free_y") +
  labs(title = "Souvenir Turnover", y = NULL)

The doubly differenced series below looks much more like random noise than the seasonally differenced series alone did, so we update our chosen orders accordingly. This is a teachable moment about not blindly following the purely statistical suggestions.

df |>
  autoplot(
    difference(difference(box_cox(Turnover, lambda), lag = 12, differences = nsdiffs))
  )

ndiffs <- 1

cat("The optimal box-cox lambda value for this series is", lambda,
    "\nThe optimal number of seasonal differences for this series is", nsdiffs,
    "\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.1761103 
## The optimal number of seasonal differences for this series is 1 
## The optimal number of first differences for this series is 1
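
As an extra check on this choice of one seasonal and one first difference, we can apply the KPSS unit-root test to the doubly differenced series (a sketch using the objects defined above):

df |>
  features(
    difference(difference(box_cox(Turnover, lambda), lag = 12), differences = 1),
    features = unitroot_kpss
  )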

9.6

Simulate and plot some data from simple ARIMA models.

For context, we first generate and plot the white noise errors that will drive all of the simulations below.

e <- rnorm(100)

tsibble(idx = seq_len(100),e=e, index = idx) |>
  autoplot(e)

  • Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0.
y <- numeric(100)
# grid of phi values from -1 to 1, with 0 inserted so the pure white-noise case is included
spr <- seq(-1,1,0.4)
spr <- append(spr, 0, after=3)
sim <- tsibble(idx = seq_len(100), index = idx)
for(phi in spr){
  for(i in 2:100){
    y[i] <- phi*y[i-1] + e[i]  # AR(1): y_t = phi * y_{t-1} + e_t, starting from y_1 = 0
  }
  sim[,paste0('ph1_',phi)] <- y
}
  • Produce a time plot for the series. How does the plot change as you change ϕ1?

As the magnitude of ϕ1 increases, each observation depends more strongly on the previous one, so the series becomes smoother and wanders further from zero; once ϕ1 = 1 we have a random walk. As the magnitude of ϕ1 decreases, neighbouring points influence each other less, until at ϕ1 = 0 the series is just the purely random error term, i.e. white noise. Making ϕ1 negative induces oscillatory behaviour, since consecutive values tend to flip sign, and this is strongest at ϕ1 = -1.
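
A quick numeric check of the random-walk claim (my own addition), using the sim and e objects generated above: with ϕ1 = 1 the recursion y_t = y_{t-1} + e_t is just a cumulative sum of the errors, shifted so that y_1 = 0.

all.equal(sim$ph1_1, cumsum(e) - e[1])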

sim |>
  gather(variable,value, `ph1_-1`:ph1_1, factor_key = TRUE) |>
  autoplot(value) + facet_wrap(~variable, scales = "free_y", ncol = 2)

  • Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.
y <- numeric(100)
# same grid of theta values, reusing the same errors e so the series are comparable
spr <- seq(-1,1,0.4)
spr <- append(spr, 0, after=3)
sim <- tsibble(idx = seq_len(100), index = idx)
for(thet in spr){
  for(i in 2:100){
    y[i] <- thet*e[i-1] + e[i]  # MA(1): y_t = theta * e_{t-1} + e_t
  }
  sim[,paste0('thet1_',thet)] <- y
}
  • Produce a time plot for the series. How does the plot change as you change θ1?

As the magnitude of θ1 decreases towards 0, the effect of the previous error fades, and at θ1 = 0 we are left with the random white noise of the error distribution. As θ1 moves towards -1 the series becomes increasingly oscillatory, since consecutive values tend to move in opposite directions. As θ1 moves towards 1 we get a smoothing effect: consecutive points share an error term, so the series is less likely to change drastically between observations.
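
A rough numeric check (my own addition): for an MA(1), Var(y_t) = (1 + θ1^2)σ2, so the θ1 = ±1 series should have a standard deviation of roughly √2 ≈ 1.4, while the θ1 = 0 series stays near 1 (the sample values will vary).

sim |>
  as_tibble() |>
  summarise(across(c(`thet1_-1`, thet1_0, thet1_1), sd))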

sim |>
  gather(variable,value, `thet1_-1`:thet1_1, factor_key = TRUE) |>
  autoplot(value) + facet_wrap(~variable, scales = "free_y", ncol = 2)

  • Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.
y <- numeric(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), ARMA = y, index = idx)
  • Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)
y <- numeric(100)
for(i in 3:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
sim$AR2 <- y
  • Graph the latter two series and compare them.

The ARMA(1,1) model generates a fairly ordinary-looking stationary series whose values do not stray too far from the original errors, apart from an increased standard deviation. The AR(2) model, with these non-stationary parameters, starts from the same errors but grows in amplitude over time until it completely eclipses the magnitude of the original errors. In addition, the AR(2) series oscillates with a regular, almost seasonal-looking pattern that was not present in the errors, and this oscillation also grows over time.
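
We can also quantify this comparison (a small sketch using the simulated series above): the AR(2) series should have a much larger spread than both the ARMA(1,1) series and the original errors.

sim |>
  as_tibble() |>
  summarise(sd_errors = sd(e), sd_ARMA = sd(ARMA), sd_AR2 = sd(AR2))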

# par() has no effect on ggplot objects such as those returned by autoplot(),
# so we simply draw the two plots one after the other
sim |>
  autoplot(AR2)

sim |>
  autoplot(ARMA)

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

  • Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
(models <- aus_airpassengers |>
  model(
    ARIMA = ARIMA(Passengers),
    # Has a constant = with drift
    `ARIMA010` = ARIMA(Passengers ~ 1 + pdq(0,1,0)),
    # Has a constant = with drift
    `ARIMA212` = ARIMA(Passengers ~ 1 + pdq(2,1,2)),
    # Constant Removed
    `ARIMA212-C` = ARIMA(Passengers ~ 0 + pdq(2,1,2), method="ML"),
    # Constant Added
    `ARIMA021` = ARIMA(Passengers ~ 1 + pdq(0,2,1))
  ))
## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.
## # A mable: 1 x 5
##            ARIMA                ARIMA010                ARIMA212   `ARIMA212-C`
##          <model>                 <model>                 <model>        <model>
## 1 <ARIMA(0,2,1)> <ARIMA(0,1,0) w/ drift> <ARIMA(2,1,2) w/ drift> <ARIMA(2,1,2)>
## # ℹ 1 more variable: ARIMA021 <model>
models|>
  report()
## Warning in report.mdl_df(models): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 5 × 8
##   .model     sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>       <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 ARIMA        4.31   -97.0  198.  198.  202. <cpl [0]> <cpl [1]>
## 2 ARIMA010     4.27   -98.2  200.  201.  204. <cpl [0]> <cpl [0]>
## 3 ARIMA212     4.03   -96.2  204.  207.  215. <cpl [2]> <cpl [2]>
## 4 ARIMA212-C   3.90   -97.2  204.  206.  214. <cpl [2]> <cpl [2]>
## 5 ARIMA021     3.85   -95.1  196.  197.  202. <cpl [0]> <cpl [1]>

We get an ARIMA(0,2,1) model.

models |>
  select(ARIMA)|>
  gg_tsresiduals()

The residuals mostly look like white noise, but their variance increases over time and there are a few large outliers.

models |>
  augment() |>
  filter(.model=='ARIMA') |>
  features(.innov, ljung_box, lag = 10, dof = 1)
## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 ARIMA     6.70     0.669

Using the Ljung-Box portmanteau test, the large p-value means we cannot reject the hypothesis that the residuals are white noise.

models |>
  select(ARIMA)|>
  forecast(h=10) |>
  autoplot(aus_airpassengers)

The prediction intervals span tens of millions of passengers within a few years, so these forecasts should be treated with caution.
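
To put a number on how wide those intervals get, we can extract the 95% interval at the end of the horizon (a quick sketch; hilo() comes from fabletools):

models |>
  select(ARIMA) |>
  forecast(h=10) |>
  hilo(level = 95) |>
  tail(1)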

  • Write the model in terms of the backshift operator.
models |>
  select(ARIMA)|>
  report()
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

(1 − B)² y_t = (1 − 0.8963B) ε_t

Note the absence of the constant c (and hence of the μt²/2! term in the book's parametrisation): ARIMA() does not include a constant by default when d ≥ 2, since it would induce a quadratic trend.
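
We can read the estimated MA coefficient used above directly from the fitted model, since tidy() works on a mable:

models |>
  select(ARIMA) |>
  tidy()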

  • Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
models |>
  select(ARIMA, ARIMA010)|>
  forecast(h=10) |>
  autoplot(aus_airpassengers)

The ARIMA(0,1,0) forecasts grow less quickly than those of the auto-selected ARIMA(0,2,1) model. The (0,1,0) model is simply a random walk with drift, with the constant giving the forecasts their upward slope. Its AICc (about 201) is very close to the automatic model's 198 (keeping in mind that AICc values are only strictly comparable between models with the same order of differencing), which suggests simple models are still worth considering when computation is a bottleneck.

  • Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
models |>
  select(ARIMA, ARIMA010, ARIMA212)|>
  forecast(h=10) |>
  autoplot(aus_airpassengers)

The ARIMA(2,1,2) forecasts closely resemble those of the ARIMA(0,1,0) model, except that they are not quite linear, rising with a very slight sinusoidal wobble. Its AICc is worse than both of the previous models despite the added complexity, telling us that more complex is not necessarily better.

With the default estimation settings, the constant cannot simply be removed: the CSS estimation produces non-stationary AR coefficients, so ARIMA() will not return a model. Forcing full maximum-likelihood estimation (method = "ML") allows the model without a constant to be fitted.

models |>
  select(ARIMA, ARIMA010,`ARIMA212-C`)|>
  forecast(h=10) |>
  autoplot(aus_airpassengers)

Removing the constant straightens out the forecasts and makes them noticeably more optimistic than those of the ARIMA(2,1,2) model with drift; in fact they are the most optimistic so far, beating even the automatically selected model. The AICc is still similar to the version with the constant, so there is little reason to prefer the (2,1,2) specification here in any form.

  • Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
models |>
  select(ARIMA, ARIMA021)|>
  forecast(h=10) |>
  autoplot(aus_airpassengers)

Again we get more optimistic forecasts than the automatic ARIMA(0,2,1) model: with d = 2 a constant implies a quadratic polynomial trend in the long-run forecasts, which is why ARIMA() issues a warning when this model is fitted. Checking the glance output, this model actually achieves a slightly lower AICc than the automatic model, because the automatic algorithm never considers a constant when d > 1. So it can be worth experimenting with small changes to an automatically selected ARIMA model, although a quadratic trend is generally discouraged for forecasting.

9.8

For the United States GDP series (from global_economy):

df <-
  global_economy |>
  filter(Code == "USA") |>
  select(GDP)

autoplot(df, GDP)

  • if necessary, find a suitable Box-Cox transformation for the data;

A Box-Cox transformation is worthwhile here: the series appears to be growing roughly exponentially, so a transformation will make the trend closer to linear and stabilise the variance.

Below we use a familiar algorithm to find the optimal Box-Cox lambda.

df |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero) -> lambda

df |>
  features(box_cox(GDP, lambda), features = unitroot_ndiffs) |>
  pull(ndiffs) -> ndiffs

cat("The optimal box-cox lambda value for this series is", lambda,
    "\nThe optimal number of first differences for this series is", ndiffs)
## The optimal box-cox lambda value for this series is 0.2819443 
## The optimal number of first differences for this series is 1
df |>
  mutate(GDP_Box = box_cox(GDP,lambda)) -> df
  
autoplot(df,GDP_Box)

The Box-Cox transformation has successfully flattened the exponential-looking growth into a much more linear trend.

df |>
  gg_tsdisplay(GDP_Box, plot_type = "partial")

  • fit a suitable ARIMA model to the transformed data using ARIMA();

The automatically selected ARIMA model is an ARIMA(1,1,0) with drift.

(models <- df |>
  model(
    ARIMA = ARIMA(box_cox(GDP,lambda)),
    ARIMA_Manual = ARIMA(box_cox(GDP,lambda) ~ 0 + pdq(1,1,0)),
    ARIMA_SemiAuto = ARIMA(box_cox(GDP,lambda) ~ pdq(p=1,d=2:4,q=0:4)),
    ARIMA_FullAuto = ARIMA(box_cox(GDP,lambda), greedy = FALSE, approximation = FALSE, stepwise = FALSE)
))
## # A mable: 1 x 4
##                     ARIMA   ARIMA_Manual ARIMA_SemiAuto          ARIMA_FullAuto
##                   <model>        <model>        <model>                 <model>
## 1 <ARIMA(1,1,0) w/ drift> <ARIMA(1,1,0)> <ARIMA(1,2,1)> <ARIMA(1,1,0) w/ drift>
models |>
  glance()
## # A tibble: 4 × 8
##   .model         sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>           <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 ARIMA           5479.   -325.  657.  657.  663. <cpl [1]> <cpl [0]>
## 2 ARIMA_Manual    7038.   -334.  672.  672.  676. <cpl [1]> <cpl [0]>
## 3 ARIMA_SemiAuto  5761.   -321.  648.  649.  655. <cpl [1]> <cpl [1]>
## 4 ARIMA_FullAuto  5479.   -325.  657.  657.  663. <cpl [1]> <cpl [0]>
  • try some other plausible models by experimenting with the orders chosen;

We try a few other plausible ARIMA specifications suggested by the ACF and PACF, but there is little room to manoeuvre: a (1,1,X) model already looks close to optimal given the autocorrelation plots and the unit-root tests.

Letting ARIMA() search over the restricted space we give it appears to yield a better AICc than the automatically selected model. However, since that model uses a different order of differencing, its AICc is not directly comparable.
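
One way to compare models with different orders of differencing on a common footing is out-of-sample accuracy rather than AICc. A sketch (the five-year holdout mirrors the comparison at the end of this exercise; the model names are my own):

df |>
  filter_index(~ "2012") |>
  model(
    ARIMA_110 = ARIMA(box_cox(GDP, lambda) ~ pdq(1,1,0)),
    ARIMA_121 = ARIMA(box_cox(GDP, lambda) ~ pdq(1,2,1))
  ) |>
  forecast(h = 5) |>
  accuracy(df)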

  • choose what you think is the best model and check the residual diagnostics;

Thus, we stick with the automatically produced model.

models |>
  select(ARIMA) |>
  gg_tsresiduals()

Checking the residual diagnostics, there is no significant autocorrelation left, but there are some very large outliers that skew the residual distribution heavily to the left. So we also look at the ARIMA(1,2,1) found by the restricted search.

models |>
  select(ARIMA_SemiAuto) |>
  gg_tsresiduals()

Unfortunately, the residual distribution and time plot look largely the same. So we continue with the automatically selected model, keeping in mind that the outlying residuals mean prediction intervals that assume normality may not be reliable.
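
To back up the earlier claim that no significant autocorrelation remains, we can run the same Ljung-Box test used in exercise 9.7 on the chosen model's residuals (the lag and dof choices are my own):

models |>
  augment() |>
  filter(.model == 'ARIMA') |>
  features(.innov, ljung_box, lag = 10, dof = 1)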

  • produce forecasts of your fitted model. Do the forecasts look reasonable?
models |>
  select(ARIMA) |>
  forecast(h=10) |>
  autoplot(df)

The forecasts do seem like they follow the overall trend of the series.

  • compare the results with what you would obtain using ETS() (with no transformation).
models <- df |>
  model(
    ARIMA = ARIMA(box_cox(GDP,lambda)),
    ETS = ETS(GDP)
)

models |>
  glance()
## # A tibble: 2 × 11
##   .model    sigma2 log_lik   AIC  AICc   BIC ar_roots ma_roots      MSE     AMSE
##   <chr>      <dbl>   <dbl> <dbl> <dbl> <dbl> <list>   <list>      <dbl>    <dbl>
## 1 ARIMA    5.48e+3   -325.  657.  657.  663. <cpl>    <cpl>    NA       NA      
## 2 ETS      6.78e-4  -1590. 3191. 3192. 3201. <NULL>   <NULL>    2.79e22  1.20e23
## # ℹ 1 more variable: MAE <dbl>
models |>
  forecast(h=10) |>
  autoplot(df)

Visually the ETS model seems to continue the trend better; however, its prediction intervals are extremely wide, which leaves us with less confidence in that model. Let's compare forecast accuracy over a five-year holdout to compare the two models more directly.

(models <- df |>
  filter_index(~ "2012") |>
  model(
    ARIMA = ARIMA(box_cox(GDP,lambda)),
    ETS = ETS(GDP)
  ))
## # A mable: 1 x 2
##                     ARIMA          ETS
##                   <model>      <model>
## 1 <ARIMA(1,1,0) w/ drift> <ETS(M,A,N)>
models |>
  forecast(h = 5) |>
  accuracy(df)
## # A tibble: 2 × 10
##   .model .type       ME          RMSE          MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr>    <dbl>         <dbl>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA  Test  -2.18e11 249926621810.      2.18e11 -1.18  1.18 0.699 0.666 0.400
## 2 ETS    Test   2.91e11 345234202897.      2.91e11  1.57  1.57 0.935 0.920 0.191

We can confirm that the ARIMA model has the lower RMSE on this test set, so it appears to be the better choice here. That said, we made no attempt to tune the ETS model at all.
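
As a small follow-up (my own extension, not required by the exercise), one cheap way to make the comparison fairer would be to let ETS work on a transformed scale as well, for example on log GDP:

df |>
  filter_index(~ "2012") |>
  model(ETS_log = ETS(log(GDP))) |>
  forecast(h = 5) |>
  accuracy(df)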