DATA 624: Homework 5

Exercise 8.1

First, let’s construct our time series for pigs slaughtere din Victoria and plot

victoria_pigs <- aus_livestock %>% filter(Animal == "Pigs" & State == "Victoria")

victoria_pigs %>% autoplot(Count)

There doesn’t appread to be strong seasonality or a general overall trend in this time series, so simple exponential smoothing makes sense to use. Now, let’s construct a model via the ETS function for SES. We’ll mirror the techniques used in Hyndman

# Fit model with only an error term to reflect SES
fit <- victoria_pigs %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))
fc <- fit |>
  forecast(h = 4)

Plotting our forecast over our time series gives us the following

# Plot victorian pigs model
fc |>
  autoplot(victoria_pigs) +
  geom_line(aes(y = .fitted), col="#D55E00",
            data = augment(fit)) +
  labs(y="Number of Pigs", title="Victorian Pigs Slaughtered") +
  guides(colour = "none")

To find the optimized values fo \(\alpha\) and \(l_0\), we can use the report function on our fit mable. We see optimized values for each parameter in our SES model below:

fit %>% report()

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

Exercise 8.5

We can look at the export data from France

# Add fill_gaps for later pltting/forecasting
france <- global_economy %>% filter(Country == "France") %>% fill_gaps()
head(france)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country Code   Year           GDP Growth   CPI Imports Exports Population
##   <fct>   <fct> <dbl>         <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 France  FRA    1960  62651474947.  NA     10.4    12.6    14.4   46814237
## 2 France  FRA    1961  68346741504.   5.51  10.7    12.3    13.9   47444751
## 3 France  FRA    1962  76313782252.   6.67  11.3    12.1    12.8   48119649
## 4 France  FRA    1963  85551113767.   5.35  11.8    12.5    12.6   48803680
## 5 France  FRA    1964  94906593388.   6.52  12.2    13.0    12.6   49449403
## 6 France  FRA    1965 102160571409.   4.78  12.5    12.6    13.2   50023774

france %>% autoplot(Exports)

While this data isn’t strongly seasonal (no noticeable patterns), there is a general upward trend in this data that may need to be accounted for.

Let’s fit an \((A, N, N)\) ETS model to this dataset and forecast out 5 years

france_fit <- france %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))
france_fc <- france_fit %>%
  forecast(h = 5)

france_fc %>% autoplot(france) +
  # geom_line(aes(y = .fitted), color="#456fd4" ,
  #           data = augment(france_fit)) +
  labs(y="Exports", title="French Export Forecast") +
  guides(colour = "none")

Now we’cc ompute RMSE values for the training data ismilar to the method used in Hyndman

# Get accuracy, including RMSE
acc_ANN <- france_fit %>% accuracy()
acc_ANN

## # A tibble: 1 × 11
##   Country .model        .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE     ACF1
##   <fct>   <chr>         <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 France  "ETS(Exports… Trai… 0.284  1.15 0.840  1.17  3.86 0.983 0.991 -0.00542

We see an RMSE of 1.152 on our trainging data (the france time series). We can compare this accuracy to an \((A, A, N)\) model on the same

france_fit_AAN <- france %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))
france_fc_AAN <- france_fit_AAN %>%
  forecast(h = 5)

# Display both accuracies
acc_AAN <- france_fit_AAN %>% accuracy()
rbind(acc_ANN, acc_AAN)

## # A tibble: 2 × 11
##   Country .model     .type      ME  RMSE   MAE    MPE  MAPE  MASE RMSSE     ACF1
##   <fct>   <chr>      <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 France  "ETS(Expo… Trai…  0.284   1.15 0.840  1.17   3.86 0.983 0.991 -0.00542
## 2 France  "ETS(Expo… Trai… -0.0111  1.12 0.800 -0.243  3.81 0.936 0.964  0.0264

With an \((A, A, N)\) ETS model we see an improvement in our RMSE value. In this case a model with an (additive) trend term makes sense as the export series itself does have a general upward trend. In addition, the trend forecast (below with prediction intervals) does continue a general increase, which could be a resonable assumption for export data (generally, GDP/Export numbers will increase YoY). However, the non-trended forecaste does incorporate events that would break the upward trend (recession, etc.) better, and may be more realistic. In this case, a damped trended forecast could be a good “middle ground”

france_fc_AAN %>% autoplot(france)

##### Prediction Intervals

Our general formula for a prediction interval is:

\[\begin{aligned} \hat{y}_{T+h|T} = c\sigma_h \end{aligned}\]

Where \(c=1.96\) as our Z-score value for a 95% confidence interval, and \(\signa_h\) is our forecast variance (given by the Hyndman text here)

c = 1.96
# Get residuals of ANN fit
france_ANN_aug <- france_fit %>% augment()
var_ann <- var(france_ANN_aug$.resid)

france_ANN_aug$interval_lower <- france_ANN_aug$.fitted - c * var_ann
france_ANN_aug$interval_upper <- france_ANN_aug$.fitted + c * var_ann

# Get residuals of AAN fit
france_AAN_aug <- france_fit_AAN %>% augment()
var_aan <- var(france_AAN_aug$.resid)

# Calculate values
france_AAN_aug$interval_lower <- france_AAN_aug$.fitted - c * var_aan
france_AAN_aug$interval_upper <- france_AAN_aug$.fitted + c * var_aan
head(france_AAN_aug)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country, .model [1]
##   Country .model               Year Exports .fitted .resid .innov interval_lower
##   <fct>   <chr>               <dbl>   <dbl>   <dbl>  <dbl>  <dbl>          <dbl>
## 1 France  "ETS(Exports ~ err…  1960    14.4    13.7  0.711  0.711           11.2
## 2 France  "ETS(Exports ~ err…  1961    13.9    14.7 -0.752 -0.752           12.2
## 3 France  "ETS(Exports ~ err…  1962    12.8    14.3 -1.44  -1.44            11.8
## 4 France  "ETS(Exports ~ err…  1963    12.6    13.2 -0.625 -0.625           10.7
## 5 France  "ETS(Exports ~ err…  1964    12.6    12.9 -0.304 -0.304           10.4
## 6 France  "ETS(Exports ~ err…  1965    13.2    13.0  0.265  0.265           10.5
## # ℹ 1 more variable: interval_upper <dbl>

Alternatively, e can use the hilo method as well to construct a 95% confidence interval

france_fc_AAN %>% hilo(level = c(95))

## # A tsibble: 5 x 6 [1Y]
## # Key:       Country, .model [1]
##   Country .model                    Year    Exports .mean                  `95%`
##   <fct>   <chr>                    <dbl>     <dist> <dbl>                 <hilo>
## 1 France  "ETS(Exports ~ error(\"…  2018 N(31, 1.3)  31.2 [28.89915, 33.44901]95
## 2 France  "ETS(Exports ~ error(\"…  2019 N(31, 2.6)  31.5 [28.35198, 34.62019]95
## 3 France  "ETS(Exports ~ error(\"…  2020 N(32, 3.8)  31.8 [27.99403, 35.60215]95
## 4 France  "ETS(Exports ~ error(\"…  2021   N(32, 5)  32.1 [27.73743, 36.48275]95
## 5 France  "ETS(Exports ~ error(\"…  2022 N(32, 6.2)  32.4 [27.54660, 37.29758]95

Exercise 8.6

china <- global_economy %>% filter(Country == "China")

china_fit <- china %>% model(
  `(A, A, N)` = ETS(Exports ~ error("A") + trend("A") + season("N")),
  `(A, N, N)` = ETS(Exports ~ error("A") + trend("N") + season("N")),
   `Damped Trend (A, A, N)` = ETS(Exports ~ error("A") +
                                trend("Ad", phi = 0.9) + season("N"))
  )
china_fc<- china_fit %>%
  forecast(h = 20)

china_fc %>% autoplot(china) + labs(y="Exports", title="Forecasts of Chinese exports: 2018 - 2038")

Looking at this forecast of 20 years out, we see that the damped method (blue) tends to have a less “extreme” impact than the forecast with a trend. This helps to account for reversals in the trend that the naive additive method (with a trend term) does not always account for.

Exercise 8.7

aus_gas <- aus_production %>% select(Gas)
aus_gas %>% autoplot(Gas)

In this case, a multiplicative model would be best since the variance of our seasonality changes with time.

gas_fit <- aus_gas %>% 
    model(ETS(Gas ~ error("A") + trend("A") + season("M")))


gas_fc <- gas_fit %>% forecast(h=4)

# Plot gas forecast
gas_fc %>% autoplot(aus_gas)

china_aug <- china_fit %>% augment()
var(china_aug$.resid)

## [1] 3.589456

Exercise 8.8

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

# Plot retail data
myseries %>% autoplot()

## Plot variable not specified, automatically selected `.vars = Turnover`

In this case, multiplicative seasonality since the seasonal variance increases with time.

Fitting Holt-Winters’ multiplicative model to the retail data, we see the following:

fit <- myseries |>
  model(
    multiplicative = ETS(Turnover ~ error("M") + trend("A") +
                                                season("M")),
    damped = ETS(Turnover ~ error("M") + trend("Ad") +
                                                season("M"))
  )
fc <- fit |> forecast(h = "5 years")
fc |>
  autoplot(myseries) +
  labs(title="Australian Retail Turnover",
       y="Turnover") +
  guides(colour = guide_legend(title = "Forecast"))

With a damped trend (orange) we see a flattening of the forecast level with time, while the reegular trend method from Holt-Winters continues to increase in level. We can compare the RMSE of the two methods via the accuracy function:

fit %>% accuracy()

## # A tibble: 2 × 12
##   State      Industry .model .type      ME  RMSE   MAE     MPE  MAPE  MASE RMSSE
##   <chr>      <chr>    <chr>  <chr>   <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 Northern … Clothin… multi… Trai… -0.0128 0.613 0.450 -0.469   5.15 0.513 0.529
## 2 Northern … Clothin… damped Trai…  0.0398 0.616 0.444 -0.0723  5.06 0.507 0.531
## # ℹ 1 more variable: ACF1 <dbl>

We see a slightly better performance in terms of RMSE from the damped method as opposed to the regular trended method. We can check the residuals from the aumented method with our fit passed

augmented <- fit %>% augment()
hist(augmented$.resid)

train <- myseries %>% filter(Month < yearmonth("Jan 2011"))

# Train a model on new test set with damped method
fit_train <- train |> model(
    damped = ETS(Turnover ~ error("M") + trend("Ad") +
                                                season("M"))
  )

prediction_interval <- "10 years"
fc <- fit_train |> forecast(h = prediction_interval)
fc |>
  autoplot(train) +
  labs(title="Australian Retail Turnover",
       y="Turnover") +
  guides(colour = guide_legend(title = "Forecast"))

Now let’s look at the RMSE of our fit trained up until 2011, we see a better RMSE than our other model from exercise 5.7 (\(0.51 < 1.51\))

fit_train %>% accuracy()

## # A tibble: 1 × 12
##   State  Industry .model .type     ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <chr>  <chr>    <chr>  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 North… Clothin… damped Trai… 0.0357 0.519 0.392 0.158  5.34 0.428 0.427 0.0233

Exercise 8.9

We can transform our retail timeseries via the same method applied

lambda <- train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# Transform turnover series based on returned lambda
train$turnover_bc <- box_cox(train$Turnover, lambda=lambda)

train %>% autoplot(turnover_bc) + labs(title="Box-Cox transformed Australian Retail Turnover")

Now that we have our Box-Cox transformed series, we

# Fit an STL
fit_bc_stl <- train %>% model(
  STL(turnover_bc)
)

comp_bc <- components(fit_bc_stl) 
comp_bc %>% autoplot()

Now we can apply ETS on the seasonally-adjusted data from our comp variable

fit_bc_ets <- comp_bc %>% as_tsibble() %>%
  model(
    seasonal_adjustment = ETS(season_adjust ~ error("A") + trend("A"))
  ) %>% dplyr::select(-c(.model))

# forecast based on our box-cox transformed seasonally-adjusted data
fc_bc <- fit_bc_ets %>% 
  forecast(h=prediction_interval)

fit_bc_ets %>% accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type      ME   RMSE    MAE     MPE  MAPE  MASE RMSSE
##   <chr>    <chr>    <chr>  <chr>   <dbl>  <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… seaso… Trai… 1.21e-4 0.0890 0.0684 -0.0473  3.16 0.363 0.354
## # ℹ 1 more variable: ACF1 <dbl>

Here we see a much lower RMSE value of 0.089. This is an improvement on the models above