Data 624 Homework 5

library(fpp3)
library(tidyverse)

Question 1

Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

A

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.

#?aus_livestock
pig_fit <- aus_livestock %>%
  filter(State == "Victoria") %>%
  filter(Animal == "Pigs") %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N"))) 

pig_fc <- pig_fit %>%
  forecast(h = 4)

pig_fc %>%
  autoplot(aus_livestock) +
  ggtitle("The Number of Pigs Slaughtered in Victoria")

report(pig_fit)
## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

The optimal values for α is 0.322124 and for ℓ0 is 100646.6

B

Compute a 95% prediction interval for the first forecast using ^y±1.96 s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

# R
pig_fc %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [76854.79, 113518.3]95
# Calculated
mean <- 95186.56
s <- sqrt(87480760)
z <- 1.96
margin <- z * s
lower <- mean - margin
upper <- mean + margin

paste(lower, upper)
## [1] "76854.4546212935 113518.665378707"

As shown above we can see that the 95% prediction interval calculated by hand and by R are only off by a small margin of .34 on the lower and .33 on the upper. This shows that they are basically identical as it is an error rate of less than 0.01%.

Question 5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

data("global_economy")
A

Plot the Exports series and discuss the main features of the data.

#?global_economy
global_economy %>%
  filter(Country == "Norway") %>%
  autoplot(Exports) +
  labs(y="% of GDP", title="Norway Annual Exports")

Looking at the Norwegian Exports, we can see that there was a consistency of around 35-40% of the GDP being in exports. With a short term spike in around early 1980’s followed by a sharp decrease in the late 1980s. From then there seems to be a steady increase of Export’s percentage share of the GDP to peak around 2008. From then we see a heavy decrease until mid 2010s. It seems like the heavy decreases are following global economics decrease, with the late 1980s recession, 2000 .com burst and the housing crisis/recession period in 2008.

B

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

Nw_ft <- global_economy %>%
  filter(Country == "Norway") %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N"))) 

Nw_fc <- Nw_ft %>%
  forecast(h = 6)

Nw_fc %>%
  autoplot(global_economy) +
  labs(y="% of GDP", title="Norway Annual Exports", subtitle = "ETS(A,N,N)")

C

Compute the RMSE values for the training data.

accuracy(Nw_ft)$RMSE
## [1] 2.485073
D.

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

Nw_ft2 <- global_economy %>%
  filter(Country == "Norway") %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

Nw_fc2 <- Nw_ft2 %>%
  forecast(h = 6)

Nw_fc2 %>%
  autoplot(global_economy) +
  labs(y="% of GDP", title="Norway Annual Exports", subtitle = "ETS(A,A,N)")

accuracy(Nw_ft2)$RMSE
## [1] 2.489792

The difference between ETS(A,A,N) and ETS(A,N,N), is the A/N which signals the trend component of the model. A being for additive showcasing that there is a linear trend within the data compared to N being for none meaning that there was no specific trend within the data.

E

Compare the forecasts from both methods. Which do you think is best?

I think for this particular dataset, ETS(A,N,N) is better. In general there doesn’t seem to be a trend within this timeframe from the data. As the Root Mean Square Error is roughly the same for both forecast but slightly smaller in the ETS(A,N,N) at 2.485073.

F

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

#head(Nw_fc)
#head(Nw_fc2)

# R Calculated
Nw_fc %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [30.51542, 40.42916]95
# Hand Calculated 
mean_fc1 <- 35.5
s_fc1 <- sqrt(7.5)
z_fc1 <- 1.96
margin_fc1 <- z_fc1 * s_fc1
lower_fc1 <- mean_fc1 - margin_fc1
upper_fc1 <- mean_fc1 + margin_fc1

paste(lower_fc1, upper_fc1)
## [1] "30.1323189364494 40.8676810635506"
# R Calculated
Nw_fc2 %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [30.34156, 40.45638]95
# Hand Calculated 
mean_fc2 <- 35.4
s_fc2 <- sqrt(7.5)
z_fc2 <- 1.96
margin_fc2 <- z_fc2 * s_fc2
lower_fc2 <- mean_fc2 - margin_fc2
upper_fc2 <- mean_fc2 + margin_fc2
paste(lower_fc2, upper_fc2)
## [1] "30.0323189364494 40.7676810635506"

As we can see the calculated interval, the ranges on the hand calculated are slightly larger compared to the R calculated.

Question 6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts. [Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

Lambda_china <- global_economy %>%
  filter(Country == "China") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

Ch_fit <- global_economy %>%
  filter(Country == "China") %>%
  model(`Standard` = ETS(GDP ~ error("A") + trend("N") + season("N")),
        `Holt's method` = ETS(GDP ~ error("A") + trend("A") + season("N")),
        `Damped Holt's method` = ETS(GDP ~ error("A") + trend("Ad", phi = 0.8) + season("N")),
        `Box-Cox` = ETS(box_cox(GDP,Lambda_china) ~ error("A") + trend("Ad") + season("N")),
        `Damped Box-Cox` = ETS(box_cox(GDP,Lambda_china) ~ error("A") + trend("Ad", phi = 0.8) + season("N"))) 

Ch_fc <- Ch_fit %>%
  forecast(h = 20)

Ch_fc %>%
  autoplot(global_economy, level = NULL) +
  labs(title="Chinese GDP") +
  guides(colour = guide_legend(title = "Forecast"))

Question 7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

fit_gas <- aus_production %>%
  model(`Additive` = ETS(Gas ~ error("A") + trend("A") + season("A")),
        `Multiplicative` = ETS(Gas ~ error("M") + trend("A") + season("M")),
        `Damped Multiplicative` = ETS(Gas ~ error("M") + trend("Ad", phi = 0.9) + season("M"))) 

aus_production %>%
  model(`Additive` = ETS(Gas ~ error("A") + trend("A") + season("A")),
        `Multiplicative` = ETS(Gas ~ error("M") + trend("A") + season("M")),
        `Damped Multiplicative` = ETS(Gas ~ error("M") + trend("Ad", phi = 0.9))) %>%
  forecast(h=20) %>%
  autoplot(aus_production, level = NULL) +
  labs(title="Australian Gas Production") +
  guides(colour = guide_legend(title = "Forecast"),
         subtitle="Additive vs. Multiplicative Seasonality")

report(fit_gas)
## # A tibble: 3 × 9
##   .model                  sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE    MAE
##   <chr>                    <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Additive              23.6       -927. 1872. 1873. 1903.  22.7  29.7 3.35  
## 2 Multiplicative         0.00324   -831. 1681. 1682. 1711.  21.1  32.2 0.0413
## 3 Damped Multiplicative  0.00340   -835. 1688. 1689. 1719.  21.0  32.4 0.0424
accuracy(fit_gas)
## # A tibble: 3 × 10
##   .model            .type       ME  RMSE   MAE    MPE  MAPE  MASE RMSSE     ACF1
##   <chr>             <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 Additive          Trai…  0.00525  4.76  3.35 -4.69  10.9  0.600 0.628  0.0772 
## 2 Multiplicative    Trai… -0.115    4.60  3.02  0.199  4.08 0.542 0.606 -0.0131 
## 3 Damped Multiplic… Trai…  0.255    4.58  3.04  0.655  4.15 0.545 0.604 -0.00451

Multiplicative method is preferred when the seasonal variation are changed in relation to the level. This is different then if the seasonal variation was constant in which case the additive method would be preferred. But as we can see there is a proportioanl increase within the seasonal cycle through out the timeframe, multiplicative method is the most appropriate. As we can see that making the trend damped does improve the forecast by decreasing the RMSE from 4.60 to 4.58.

Question 8

Recall your retail time series data (from Exercise 7 in Section 2.10).

set.seed(334)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>% autoplot()
## Plot variable not specified, automatically selected `.vars = Turnover`

A

Why is multiplicative seasonality necessary for this series?

Looking at the retail data we can see that multiplicative seasonality would be necessary here because the trend is increasing over time as well as the amplitude of the seasonality is also increasing. We can interpret then that the variation of the seasonal pattern is proportional to the level of the time series.

B

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fit_my <- myseries %>%
  model(`Multiplicative` = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        `Damped Multiplicative` = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))) 

fit_my %>%
  forecast(h=36) %>%
  autoplot(myseries, level = NULL) +
  labs(title="Australian Retail Turnover") +
  guides(colour = guide_legend(title = "Forecast"))

C.

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

accuracy(fit_my) %>%
  select(.model, RMSE)
## # A tibble: 2 × 2
##   .model                 RMSE
##   <chr>                 <dbl>
## 1 Multiplicative         4.05
## 2 Damped Multiplicative  4.01

It seems that the preferred method would be Damped Multiplicative forecast as it has a RMSE of 4.01 compared to regular multiplicative which has a RMSE value of 4.05.

D

Check that the residuals from the best method look like white noise.

myseries %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  gg_tsresiduals() +
  ggtitle("Multiplicative Method")

# Box-Pierce test
myseries %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  augment() %>% 
  features(.innov, box_pierce, lag = 24, dof = 0)
## # A tibble: 1 × 5
##   State           Industry                              .model bp_stat bp_pvalue
##   <chr>           <chr>                                 <chr>    <dbl>     <dbl>
## 1 South Australia Hardware, building and garden suppli… multi…    21.5     0.611
# Ljung-Box test
myseries %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  augment() %>% 
  features(.innov, ljung_box, lag = 24, dof = 0)
## # A tibble: 1 × 5
##   State           Industry                              .model lb_stat lb_pvalue
##   <chr>           <chr>                                 <chr>    <dbl>     <dbl>
## 1 South Australia Hardware, building and garden suppli… multi…    21.9     0.585
E

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

fit_train <- myseries_train %>%
  model(multi = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        snaive = SNAIVE(Turnover))

# Forecasts
fc <- fit_train %>%
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc %>% autoplot(myseries, level = NULL)

accuracy(fit_train) %>%
  select(.type, .model, RMSE)
## # A tibble: 2 × 3
##   .type    .model  RMSE
##   <chr>    <chr>  <dbl>
## 1 Training multi   3.72
## 2 Training snaive  8.10
fc %>% accuracy(myseries)  %>%
  select(.type, .model, RMSE)
## # A tibble: 2 × 3
##   .type .model  RMSE
##   <chr> <chr>  <dbl>
## 1 Test  multi   16.6
## 2 Test  snaive  13.8

I couldn’t beat the Seasonal Naive approach as it has a smaller RMSE.

Question 9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

lambda_retail <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# STL Box Cox transformed data
myseries %>%
  model(STL(box_cox(Turnover,lambda_retail) ~ season(window = "periodic"), robust = TRUE)) %>%
  components() %>%
  autoplot() +
  ggtitle("STL with Box-Cox Transformation")

# Box-Cox STL
Decomp <- myseries %>%
  model(STL_box = STL(box_cox(Turnover,lambda_retail) ~ season(window = "periodic"), robust = TRUE)) %>%
  components()

# Seasonally Adjusted
myseries$Turnover_sa <- Decomp$season_adjust

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

fit <- myseries_train %>%
  model(ETS(Turnover_sa ~ error("M") + trend("A") + season("M")))
fit %>% gg_tsresiduals()  +
  ggtitle("Australian Retail Turnover Residual")

# Forecasted Test Data
fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover,
## Turnover_sa)`
fit %>% accuracy() %>%
  select(.model, .type, RMSE)
## # A tibble: 1 × 3
##   .model                                                           .type    RMSE
##   <chr>                                                            <chr>   <dbl>
## 1 "ETS(Turnover_sa ~ error(\"M\") + trend(\"A\") + season(\"M\"))" Traini… 0.426
fc %>% accuracy(myseries) %>%
  select(.model, .type, RMSE)
## # A tibble: 1 × 3
##   .model                                                           .type  RMSE
##   <chr>                                                            <chr> <dbl>
## 1 "ETS(Turnover_sa ~ error(\"M\") + trend(\"A\") + season(\"M\"))" Test   2.08