Homework 5

8.1

8.1.1

By using the ETS function with an additive error and no trend or season we model simple exponential smoothing. The optimal α is 0.3221247 and the optimal l[0] is 100646.6 for this model.

vic_pigs <- aus_livestock |> 
  filter(Animal == "Pigs", State == "Victoria") |>
  select(Month, Count)

pig_model <- vic_pigs |>
  model(ETS(Count ~ error("A") + trend("N") + season("N"))) 

pig_model |>
  report()

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

Below are the forecasts for the next four months,

pig_forecast <- pig_model |>
  forecast(h = 4)

pig_forecast |>
  autoplot(vic_pigs |> filter(year(Month) >= 2016))

8.1.2

Looking at the calculated intervals vs. the generated intervals, the calculated is extremely close to the interval of the first month forecast, being ever so slightly tighter than the generated. With subsequent generated predictions the interval gradually increases so the difference between the calculated and generated increases.

# Mean is the same for all values
y = pig_forecast$.mean[1]

# Getting residuals and other stuff
augmented <- augment(pig_model)

# Get sd from residuals
s_d <- sd(augmented$.resid)

# 95% intervals
upper <- y + (s_d * 1.96)
lower <- y - (s_d * 1.96)

upper

## [1] 113502.1

lower

## [1] 76871.01

pig_forecast |>
  hilo(95) |>
  select(`95%`)

## # A tsibble: 4 x 2 [1M]
##                    `95%`    Month
##                   <hilo>    <mth>
## 1 [76854.79, 113518.3]95 2019 Jan
## 2 [75927.17, 114445.9]95 2019 Feb
## 3 [75042.22, 115330.9]95 2019 Mar
## 4 [74194.54, 116178.6]95 2019 Apr

8.5

8.5.1

Looking at the exports for Denmark we can see exports are relatively flat from 1960-1980 before spiking over a short span and stagnating again at the new height until the end of the 90s. From then on the exports have a strong positive trend. There are dips in exports corresponding to various financial crises such as in 2008 but virtually none of these declines last more than a couple years.

# Using Denmark as the country
denmark_exports <- global_economy |> 
  filter(Country == "Denmark") |>
  select(Year, Exports)

denmark_exports |>
  autoplot()

## Plot variable not specified, automatically selected `.vars = Exports`

8.5.2

# Doing model and forecast as variables to make later steps easy
den_model <- denmark_exports |>
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))

den_forecast <- den_model |>
  forecast(h = 5)

den_forecast |>
  autoplot(denmark_exports) +
  geom_line(aes(y = .fitted), col="red",
            data = augment(den_model))

8.5.3

den_model |>
  accuracy() |>
  pull(RMSE)

## [1] 2.007573

8.5.4

Both models have very similar error values in virtually every error metric, with relatively significant differences only arising in ME and MPE. Given the general lack of difference in the error metrics it would be difficult to recommend one method over the other based solely on these metrics. Looking at the trend of the data we can see that the periods of stagnation in Denmark’s exports are largely at the beginning of the period and the general trend of data since the 90s has a strong positive trend it seems reasonable to choose the AAN model, however the relatively flat exports from 2010 onward combined with the frequent periods of stagnation make a case for the ANN model.

compare_models <- denmark_exports |>
  model(
    simple = ETS(Exports ~ error("A") + trend("N") + season("N")),
    trend = ETS(Exports ~ error("A") + trend("A") + season("N"))
  )

accuracy(compare_models)

## # A tibble: 2 × 10
##   .model .type         ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
##   <chr>  <chr>      <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 simple Training  0.395   2.01  1.41  0.796  3.69 0.983 0.991 0.0503
## 2 trend  Training -0.0264  1.97  1.43 -0.380  3.83 0.993 0.972 0.0521

8.5.5

Plotting the two forecasts together does not really give us a better idea of which to choose. Ultimately the choice of model comes down less to performance metrics and more to how one predicts Denmark’s economy will perform. It seems reasonable to anticipate continued growth, so I would choose the AAN model. The difficulty in choosing largely comes down to the fact the exports flatten at the end of the period, increasing the merit of the ANN model.

den_model_2 <- denmark_exports |>
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

den_forecast_2 <- den_model_2 |>
  forecast(h = 5)

den_forecast_2 |>
  autoplot(denmark_exports) +
  autolayer(den_forecast, alpha = 0.7, color = "orange") +
  autolayer(den_forecast_2, alpha = 0.7, color = "blue")

## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.

8.5.6

The calculated intervals are virtually identical to the generated interval for the ANN model, but the AAN calculations are about a tenth further from the generated intervals. Ultimately the difference is so small as to be irrelevant.

# different from earlier, need to use rmse
rmse <- accuracy(compare_models) |>
  select(.model, RMSE)

simple_rmse <- rmse |>
  filter(.model == "simple") |>
  pull(RMSE)
trend_rmse <- rmse |>
  filter(.model == "trend") |>
  pull(RMSE)

simple_mean <- den_forecast |>
  filter(Year == 2018) |>
  pull(.mean)
trend_mean <- den_forecast_2 |>
  filter(Year == 2018) |>
  pull(.mean)

simple_95 <- c(simple_mean - 2 * simple_rmse, simple_mean + 2 * simple_rmse)
trend_95 <- c(trend_mean - 2 * trend_rmse, trend_mean + 2 * trend_rmse)

simple_95

## [1] 51.19479 59.22508

trend_95

## [1] 51.69044 59.56762

den_forecast |>
  hilo(95) |>
  select(`95%`)

## # A tsibble: 5 x 2 [1Y]
##                    `95%`  Year
##                   <hilo> <dbl>
## 1 [51.20551, 59.21435]95  2018
## 2 [49.54712, 60.87275]95  2019
## 3 [48.27454, 62.14532]95  2020
## 4 [47.20170, 63.21816]95  2021
## 5 [46.25650, 64.16336]95  2022

den_forecast_2 |>
  hilo(95) |>
  select(`95%`)

## # A tsibble: 5 x 2 [1Y]
##                    `95%`  Year
##                   <hilo> <dbl>
## 1 [51.62888, 59.62918]95  2018
## 2 [50.39102, 61.70514]95  2019
## 3 [49.53844, 63.39582]95  2020
## 4 [48.88529, 64.88708]95  2021
## 5 [48.35956, 66.25091]95  2022

8.6

We can see from a simple plot of the data there is no seasonal component, so we will omit that from the models we test.

forecast_window <- 20

chinese_gdp <- global_economy |>
  filter(Country == "China") |>
  select(Year, GDP)

chinese_gdp |>
  autoplot()

## Plot variable not specified, automatically selected `.vars = GDP`

Looking at the following charts we can clearly see the multiplicative models have significantly larger confidence intervals, and we can also see the effect of using a damped trend. Applying a box-cox transformation makes the damping component much more significant for the additive models, and the multiplicative models look significantly better with much smaller prediction intervals. When transformed the additive and multiplicative models look very similar.

chinese_gdp |>
  model(
    additive = ETS(GDP ~ error("A") + trend("A") + season("N")),
    add_damp = ETS(GDP ~ error("A") + trend("Ad") + season("N"))
    ) |>
  forecast(h = forecast_window) |>
  autoplot(chinese_gdp) + 
  labs(title = "Additive Models")

forecast_window <- 20

chinese_gdp <- global_economy |>
  filter(Country == "China") |>
  select(Year, GDP)

chinese_gdp |>
  model(
    multiplicative = ETS(GDP ~ error("M") + trend("A") + season("N")),
    multiplicative_damp = ETS(GDP ~ error("M") + trend("Ad") + season("N"))
    ) |>
  forecast(h = forecast_window) |>
  autoplot(chinese_gdp) + 
  labs(title = "Multiplicative Models")

lambda_gdp <- chinese_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

trans_chinese_gdp <- chinese_gdp |>
  mutate(trans_GDP = box_cox(GDP, lambda = lambda_gdp)) |>
  select(Year, trans_GDP)

trans_chinese_gdp |>
  model(
    additive = ETS(trans_GDP ~ error("A") + trend("A") + season("N")),
    add_damp = ETS(trans_GDP ~ error("A") + trend("Ad") + season("N"))
    ) |>
  forecast(h = forecast_window) |>
  autoplot(trans_chinese_gdp) +
  labs(title = "Additive Models with Box-Cox Transformation")

lambda_gdp <- chinese_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

trans_chinese_gdp <- chinese_gdp |>
  mutate(trans_GDP = box_cox(GDP, lambda = lambda_gdp)) |>
  select(Year, trans_GDP)

trans_chinese_gdp |>
  model(
    additive = ETS(trans_GDP ~ error("M") + trend("A") + season("N")),
    add_damp = ETS(trans_GDP ~ error("M") + trend("Ad") + season("N"))
    ) |>
  forecast(h = forecast_window) |>
  autoplot(trans_chinese_gdp) +
  labs(title = "Multiplicative Models with Box-Cox Transformation")

8.7

We are going to let the ETS model pick its own best parameters for this case, we can see it went with a MAM model. The multiplicative seasonality is necessary as the seasonal component is increasing with time.

aus_gas <- aus_production |>
  select(Gas)

gas_model <- aus_gas |>
  model(ETS(Gas))

gas_model

## # A mable: 1 x 1
##     `ETS(Gas)`
##        <model>
## 1 <ETS(M,A,M)>

gas_model |>
  forecast(h = 12) |>
  autoplot(aus_gas)

If we add a damped trend the forecast hardly changes over the prediction period, which is 12 quarters. I would not say it improves the forecast in this case.

gas_model_damped <- aus_gas |>
  model(ETS(Gas ~ error("M") + trend("Ad") + season("M")))

gas_model_damped |>
  forecast(h = 12) |>
  autoplot(aus_gas)

8.8

8.8.1

The retail series is charted below for context. We can see the series has an increasing seasonality as time goes on which necesitates a multiplicative seasonality when using ETS models.

set.seed(38472)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries |>
  autoplot()

## Plot variable not specified, automatically selected `.vars = Turnover`

8.8.2

For the period in question there is not a huge difference between the two methods, however we can begin to see the effects of the damping trend in the later periods.

retail_model <- myseries |>
  model(
    holt_winters_mult = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    holt_winters_damped = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))
    )

retail_forecast <- retail_model |>
  forecast(h = 36)

retail_forecast |>
  autoplot(myseries |> filter(Month > ym("2010-01")))

## Warning: There was 1 warning in `filter()`.
## ℹ In argument: `Month > ym("2010-01")`.
## Caused by warning:
## ! Incompatible methods (">.vctrs_vctr", ">.Date") for ">"

8.8.3

The RMSE for the two models is very close, as we would expect given similarity in their plots. Overall I would prefer the damped model as the trend does appear to be leveling off in the retail data.

accuracy(retail_model) |> select(.model, RMSE)

## # A tibble: 2 × 2
##   .model               RMSE
##   <chr>               <dbl>
## 1 holt_winters_mult    4.29
## 2 holt_winters_damped  4.27

8.8.4

Looking at the residuals, and particularly the acf plot it appears the residuals do not look like white noise as too many values are significant and there does appear to be a pattern in the acf.

retail_model |>
  select(holt_winters_damped) |>
  gg_tsresiduals()

8.8.5

Both the damped and undamped holt-winters methods far outperform the seasonal naive, with the damped doing slightly better than the undamped.

training <- myseries |>
  filter(year(Month) < 2011)

training_model <- training |>
  model(
    holt_winters_mult = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    holt_winters_damped = ETS(Turnover ~ error("M") + trend("Ad") + season("M")),
    s_naive = SNAIVE(Turnover)
    )

accuracy(training_model) |>
  select(.model, RMSE)

## # A tibble: 3 × 2
##   .model               RMSE
##   <chr>               <dbl>
## 1 holt_winters_mult    3.92
## 2 holt_winters_damped  3.67
## 3 s_naive              6.50

8.9

The transformed models drastically outperform the untransformed models when looking at RMSE and MAPE. The difference is similar to that between the Holt-Winters ETS methods and the Naive method.

lambda <- training |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

transformed <- training |>
  mutate(trans_turn = box_cox(Turnover, lambda))

stl_model <- transformed |>
  model(
    stl_trans = STL(trans_turn ~ season(window = "periodic"))
    )

seasonally_adjusted <- stl_model |>
  components() |>
  select(-.model)

season_adj_model <- seasonally_adjusted |>
  model(ETS(season_adjust))

prev_best <- training |>
  model(
    holt_winters_damped = ETS(Turnover ~ error("M") + trend("Ad") + season("M")),
    )

rbind(accuracy(stl_model), accuracy(season_adj_model), accuracy(prev_best))

## # A tibble: 3 × 12
##   State  Industry .model .type       ME   RMSE    MAE      MPE  MAPE  MASE RMSSE
##   <chr>  <chr>    <chr>  <chr>    <dbl>  <dbl>  <dbl>    <dbl> <dbl> <dbl> <dbl>
## 1 Queen… Footwea… stl_t… Trai… -1.11e-4 0.0351 0.0268 -0.0140  0.757 0.354 0.362
## 2 Queen… Footwea… ETS(s… Trai… -1.03e-4 0.0415 0.0315  0.00131 0.895 0.417 0.428
## 3 Queen… Footwea… holt_… Trai…  3.23e-1 3.67   2.20    0.364   3.75  0.464 0.564
## # ℹ 1 more variable: ACF1 <dbl>