# Import required R libraries
library(fpp3)

Exercise 8.1

Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

Section a

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(\ell_{0}\), and generate forecasts for the next four months.

vic_pigs <- aus_livestock %>%
  filter(State == 'Victoria',
           Animal == 'Pigs')
# Estimate parameters
fit <- vic_pigs %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))

# Output model fit
report(fit)

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

Results: Optimal values based on the simple exponential smoothing of ETS() function are \(\alpha\) equal to 0.32 and \(\ell_{0}\) equal to 100646.6

fc <- fit %>%
  forecast(h = 4)

fc

## # A fable: 4 x 6 [1M]
## # Key:     Animal, State, .model [1]
##   Animal State    .model                          Month             Count  .mean
##   <fct>  <fct>    <chr>                           <mth>            <dist>  <dbl>
## 1 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Jan N(95187, 8.7e+07) 95187.
## 2 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Feb N(95187, 9.7e+07) 95187.
## 3 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Mar N(95187, 1.1e+08) 95187.
## 4 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Apr N(95187, 1.1e+08) 95187.

Above output shows the forecasted value for the next 4 months, which is the same value: 95187.

vic_pigs_2010 <- vic_pigs %>%
  filter(year(Month) > 2009)

fc %>%
  autoplot(vic_pigs_2010) +
  labs(y="Count", title="Pigs Slaughtered: Victoria") +
  guides(colour = "none")

The above plot shows the forecasted value for the next 4 months. Note: the data has been truncated so as to better show the forecasted values in context. Given the values are the same, the forecast line is horizontal.

Section b

Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.

# y-hat is the predicted value (.mean)
y_hat <- mean(fc$.mean[1])

# Apply augment function to get the residuals
aug_fit <- augment(fit)

# Calculate standard deviation based on the residuals from augment
s <- sd(aug_fit$.resid)

# Calculate the 95% prediction intervals
upper_limit_95 <- y_hat + (s * 1.96)
lower_limit_95 <- y_hat - (s * 1.96)

int_95 <- c(lower_limit_95, upper_limit_95)

# Output calculated interval values
int_95

## [1]  76871.01 113502.10

Above is the calculated lower and upper limit for the 95% confidence interval.

# Determine the model forecast 95% intervals
fc_hilo <- fc %>% hilo()

# Output model interval values
fc_hilo$`95%`[1]

## <hilo[1]>
## [1] [76854.79, 113518.3]95

Above is the interval produced by R.

The model low value is 16.22 lower than my calculated low interval value. The model high value is 16.2 higher than my calculate high interval value. It appears the model calculated values provide a slightly larger 95% prediction interval than the calculations I performed. That being said, the intervals are very similar.

Exercise 8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

fra_exports <- global_economy %>%
  filter(Country == 'France')

# Selected France for no particular reason besides it contains data for the entirety of the defined time series.
head(fra_exports)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country Code   Year           GDP Growth   CPI Imports Exports Population
##   <fct>   <fct> <dbl>         <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 France  FRA    1960  62651474947.  NA     10.4    12.6    14.4   46814237
## 2 France  FRA    1961  68346741504.   5.51  10.7    12.3    13.9   47444751
## 3 France  FRA    1962  76313782252.   6.67  11.3    12.1    12.8   48119649
## 4 France  FRA    1963  85551113767.   5.35  11.8    12.5    12.6   48803680
## 5 France  FRA    1964  94906593388.   6.52  12.2    13.0    12.6   49449403
## 6 France  FRA    1965 102160571409.   4.78  12.5    12.6    13.2   50023774

Selected France for no particular reason besides it is not the United States and contains data for the entirety of the defined time series.

Section a

Plot the Exports series and discuss the main features of the data.

# Exports:  Exports of goods and services (% of GDP).
fra_exports %>%
  autoplot(Exports) +
  labs(y="% of GDP", title="Exports: France") +
  guides(colour = "none")

Main features: Overall, there appears to be an upward trend in the Exports, but I do not see any seasonality, nor cyclic nature to the time series. Departing from the overall upward trend, a dip appears between 1960 and 1970. And dip occurs between 1985 and 1998. Also, a very distinct negative spike occurs between 2008 and 2011.

Section b

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

# Estimate parameters
fit <- fra_exports %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))

report(fit)

## Series: Exports 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.9998995 
## 
##   Initial states:
##     l[0]
##  14.3989
## 
##   sigma^2:  1.3745
## 
##      AIC     AICc      BIC 
## 257.9200 258.3644 264.1013

# Set the forecast to 10 for 10 years
fc <- fit %>%
  forecast(h = 10)

fc %>%
  autoplot(fra_exports) +
  labs(y="% of GDP", title="Exports: France") +
  guides(colour = "none")

Above plot shows the forecast for the next 10 years. The forecast value is 30.9(%) as noted below in the tsibble output.

head(fc)

## # A fable: 6 x 5 [1Y]
## # Key:     Country, .model [1]
##   Country .model                                           Year    Exports .mean
##   <fct>   <chr>                                           <dbl>     <dist> <dbl>
## 1 France  "ETS(Exports ~ error(\"A\") + trend(\"N\") + s…  2018 N(31, 1.4)  30.9
## 2 France  "ETS(Exports ~ error(\"A\") + trend(\"N\") + s…  2019 N(31, 2.7)  30.9
## 3 France  "ETS(Exports ~ error(\"A\") + trend(\"N\") + s…  2020 N(31, 4.1)  30.9
## 4 France  "ETS(Exports ~ error(\"A\") + trend(\"N\") + s…  2021 N(31, 5.5)  30.9
## 5 France  "ETS(Exports ~ error(\"A\") + trend(\"N\") + s…  2022 N(31, 6.9)  30.9
## 6 France  "ETS(Exports ~ error(\"A\") + trend(\"N\") + s…  2023 N(31, 8.2)  30.9

Section c

Compute the RMSE values for the training data.

fit_acc <- accuracy(fit)

fit_acc

## # A tibble: 1 × 11
##   Country .model        .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE     ACF1
##   <fct>   <chr>         <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 France  "ETS(Exports… Trai… 0.284  1.15 0.840  1.17  3.86 0.983 0.991 -0.00542

The RMSE of the training data for the ETS(A,N,N) model is 1.15.

Section d

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

fit_2 <- fra_exports %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

fit_2_acc <- accuracy(fit_2)
fit_2_acc

## # A tibble: 1 × 11
##   Country .model       .type      ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
##   <fct>   <chr>        <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 France  "ETS(Export… Trai… -0.0111  1.12 0.800 -0.243  3.81 0.936 0.964 0.0264

The RMSE of the training data for the ETS(A,A,N) model is 1.12. A slightly better performance by the ETS(A,A,N) model over the ETS(A,N,N) model. Not a surprising result, as the ETS(A,A,N) model takes into account trending, which the initial plot of Exports clearly indicated.

Section e

Compare the forecasts from both methods. Which do you think is best?

# Set the forecast to 10 for 10 years
fc_2 <- fit_2 %>%
  forecast(h = 10)

fc_2 %>%
  autoplot(fra_exports) +
  labs(y="% of GDP", title="Exports: France") +
  guides(colour = "none")

The above plot shows the forecasted values for the ETS(A,A,N) model.

fc_2

## # A fable: 10 x 5 [1Y]
## # Key:     Country, .model [1]
##    Country .model                                          Year    Exports .mean
##    <fct>   <chr>                                          <dbl>     <dist> <dbl>
##  1 France  "ETS(Exports ~ error(\"A\") + trend(\"A\") + …  2018 N(31, 1.3)  31.2
##  2 France  "ETS(Exports ~ error(\"A\") + trend(\"A\") + …  2019 N(31, 2.6)  31.5
##  3 France  "ETS(Exports ~ error(\"A\") + trend(\"A\") + …  2020 N(32, 3.8)  31.8
##  4 France  "ETS(Exports ~ error(\"A\") + trend(\"A\") + …  2021   N(32, 5)  32.1
##  5 France  "ETS(Exports ~ error(\"A\") + trend(\"A\") + …  2022 N(32, 6.2)  32.4
##  6 France  "ETS(Exports ~ error(\"A\") + trend(\"A\") + …  2023 N(33, 7.4)  32.7
##  7 France  "ETS(Exports ~ error(\"A\") + trend(\"A\") + …  2024 N(33, 8.6)  33.0
##  8 France  "ETS(Exports ~ error(\"A\") + trend(\"A\") + …  2025 N(33, 9.8)  33.4
##  9 France  "ETS(Exports ~ error(\"A\") + trend(\"A\") + …  2026  N(34, 11)  33.7
## 10 France  "ETS(Exports ~ error(\"A\") + trend(\"A\") + …  2027  N(34, 12)  34.0

As noted in Section B, the forecast values for the ETS(A,N,N) model is the same value 30.9 for every year forward. As the above plot and forecast values for the ETS(A,A,N) model shows, the forecasted values for ETS(A,A,N) increases due to the additive trend parameter. The constant increase for this model of 0.3 does visually appear more in line with the increasing trend of the Exports data.

Section f

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

# 95% prediction interval for the first forecast: ETS(A,N,N)
y_hat <- fc$.mean[1]

lower_limit_95_fc <- y_hat - (fit_acc$RMSE * 1.96)
upper_limit_95_fc <- y_hat + (fit_acc$RMSE * 1.96)

ets_ann_interval <- c(lower_limit_95_fc, upper_limit_95_fc)

ets_ann_interval

## [1] 28.62385 33.13971

Above is the calculated 95% interval for the ETS(A,N,N) model.

# 95% prediction interval for the second forecast: ETS(A,A,N)
y_hat_2 <- fc_2$.mean[1]

lower_limit_95_fc2 <- y_hat_2 - (fit_2_acc$RMSE * 1.96)
upper_limit_95_fc2 <- y_hat_2 + (fit_2_acc$RMSE * 1.96)

ets_aan_interval <- c(lower_limit_95_fc2, upper_limit_95_fc2)

ets_aan_interval

## [1] 28.97896 33.36920

Above is the calculated 95% interval for the ETS(A,A,N) model.

# Determine the model forecast 95% intervals
fc_hilo <- fc %>% hilo()

# Output model interval values
fc_hilo$`95%`[1]

## <hilo[1]>
## [1] [28.58393, 33.17963]95

Above is the 95% interval for the ETS(A,N,N) model produced by R. The R produced values are 0.04 lower for the lower limit and 0.04 higher for the higher limit.

# Determine the model forecast 95% intervals
fc_2_hilo <- fc_2 %>% hilo()

# Output model interval values
fc_2_hilo$`95%`[1]

## <hilo[1]>
## [1] [28.89915, 33.44901]95

Above is the 95% interval for the ETS(A,A,N) model produced by R. The R produced values are 0.08 lower for the lower limit and 0.08 higher for the higher limit.

As noted in Exercise 8.1 the R produced confidence intervals appear a bit broader than the calculated values, but given the small differences, the calculated and R produced confidence intervals do appear in line with each other.

Exercise 8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

chn_gdp <- global_economy %>%
  filter(Country == 'China') %>%
  mutate(GDP = GDP/1e9) %>%
  select(c(Country, Code, Year, GDP))

(chn_gdp)

## # A tsibble: 58 x 4 [1Y]
## # Key:       Country [1]
##    Country Code   Year   GDP
##    <fct>   <fct> <dbl> <dbl>
##  1 China   CHN    1960  59.7
##  2 China   CHN    1961  50.1
##  3 China   CHN    1962  47.2
##  4 China   CHN    1963  50.7
##  5 China   CHN    1964  59.7
##  6 China   CHN    1965  70.4
##  7 China   CHN    1966  76.7
##  8 China   CHN    1967  72.9
##  9 China   CHN    1968  70.8
## 10 China   CHN    1969  79.7
## # … with 48 more rows

# GDP:  Gross domestic product (in $USD February 2019).
chn_gdp %>%
  autoplot(GDP) +
  labs(y="Billions $USD", title="GDP: China") +
  guides(colour = "none")

Above plot shows the Chinese GDP for initial visual inspection and understanding of the data.

fit <- chn_gdp %>%
  model(
    SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
    Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
    Damped = ETS(GDP ~ error("A") + trend("Ad") + season("N"))
  )

# Forecast for 20 years
fc <- fit %>% forecast(h = 20)

fc %>%
  autoplot(chn_gdp, level=NULL) +
  labs(y="Billions $USD", title="GDP: China") +
  guides(colour = guide_legend(title = "Forecast"))

Above plot shows the forecasted values for the next 20 years of the Chinese GDP. The simple exponential smoothing forecast, as expected, produces a horizontal line which doesn’t appear to follow the increasing trend. The Holt forecast shows an increasing forecast with constant growth. And the Holt dampened forecast shows increasing trend that does slow over time. The Holt dampened forecast appears the most realistic of the three given it would be unlikely for the Chinese GDP to grow at a constant rate for 20 years.

# Calculate lambda for Box-Cox
lambda <- chn_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

fit_bc <- chn_gdp %>%
  model(
    SES = ETS(box_cox(GDP, lambda) ~ error("A") + trend("N") + season("N")),
    Holt = ETS(box_cox(GDP, lambda) ~ error("A") + trend("A") + season("N")),
    Damped = ETS(box_cox(GDP, lambda) ~ error("A") + trend("Ad") + season("N"))
  )

# Forecast for 20 years
fc_bc <- fit_bc %>% forecast(h = 20)

fc_bc %>%
  autoplot(chn_gdp, level=NULL) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed China GDP with $\\lambda$ = ",
         round(lambda,2)))) +
  guides(colour = guide_legend(title = "Forecast"))

Now, applying the Box-Cox transformation along with the ETS() function produces some interesting results. But first, in the not-so surprising result, the simple exponential smoothing model produced a horizontal line. The Holt approach produced a forecast that follows an exponential growth that clearly takes off after 5 years. The dampened method also starts to increase but not quite at the rate of the Holt method.

Given the 6 forecasts above, I personally would select the dampened method on the non-transformed data. I’m not expert on the Chinese economy, but the continue growth rate seems difficult to maintain according to the Hold method forecast.

Exercise 8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

# Gas:  Gas production in petajoules.
aus_gas <- aus_production %>%
  select(Quarter, Gas)

aus_gas %>%
  autoplot(Gas)

fit <- aus_gas %>%
  model(
    add = ETS(Gas ~ error("A") + trend("A") + season("N")),
    mult = ETS(Gas ~ error("M") + trend("A") + season("N")),
    add_sea = ETS(Gas ~ error("A") + trend("A") + season("A")),
    mult_sea = ETS(Gas ~ error("M") + trend("A") + season("M")),
    mult_sea_damp = ETS(Gas ~ error("M") + trend("Ad") + season("M"))
  )

# Forecast for 5 years (interval is quarters)
fc <- fit %>% forecast(h = 20)

fc %>%
  autoplot(aus_gas, level=NULL) +
  labs(y="Petajoules", title="Gas Production: Australia") +
  guides(colour = guide_legend(title = "Forecast"))

Multiplicative seasonality is necessary because as the Hyndman textbook points out “multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series.” As the initial data plot and the forecast data plot indicate, the seasonal nature of the time series increases proportional to the level of the series itself.

I attempted 5 different ETS() based models. The purely forecasts without seasonality did not fare well at all. Based on the above answer, I’m also ignoring the forecast with the additive seasonality. So between the multiplicative seasonality forecasts with and without dampening, a small difference is visible with the dampening applied. I would argue the dampening does improve the forecast as the increasing trend is not a constant.

Exercise 8.8

Recall your retail time series data (from Exercise 8 in Section 2.10).

set.seed(8675309)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries)

Section a

Why is multiplicative seasonality necessary for this series?

Answer: As mentioned in the answer to Exercise 8.7, multiplicative seasonality is necessary because as the Hyndman textbook points out “multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series.” The retail time series displays increasing proportional seasonality as the level of the series increases.

Section b

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fit_ms <- myseries %>%
  model(
    M_S = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    M_S_D = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))
  )

# Forecast for 5 years (interval is monthly)
fc_ms <- fit_ms %>% forecast(h = 60)

fc_ms %>%
  autoplot(myseries, level=NULL) +
  labs(y="$Million AUD", title="Retail Turnover: Australia") +
  guides(colour = guide_legend(title = "Forecast"))

As expected, the damped method shows an increasing overall trend but not as constant as the straightforward Holt-Winters’ multiplicative method.

Section c

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

acc_fit <- accuracy(fit_ms)

acc_fit

## # A tibble: 2 × 12
##   State      Industry   .model .type    ME  RMSE   MAE     MPE  MAPE  MASE RMSSE
##   <chr>      <chr>      <chr>  <chr> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 Queensland Takeaway … M_S    Trai… 0.215  6.88  5.06 -0.0490  3.35 0.401 0.405
## 2 Queensland Takeaway … M_S_D  Trai… 0.465  6.52  4.75  0.313   3.10 0.376 0.384
## # … with 1 more variable: ACF1 <dbl>

The RMSE of the damped model is 6.52, which slightly outperforms the RMSE of 6.88 from the non-damped model. Given just this diagnostic for evaluation, I prefer the damped model.

Section d

Check that the residuals from the best method look like white noise.

aug_fit <- augment(fit_ms)

#msd <- fit_ms %>%
#  filter(.model == "M_S_D") %>%
#  gg_tsresiduals()

#model <- fit_ms$M_S_D
#m <- mable(fit_ms$M_S_D)
#m %>% gg_tsresiduals()

fit_msd <- myseries %>%
  model(ETS(Turnover ~ error("M") + trend("Ad") + season("M")))

# fit_msd

fit_msd %>% gg_tsresiduals()

The plot of the innovation residuals shows a plot that fits the definition of white noise of constant mean around zero and a near constant variance.

Section e

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

# Define training set
myseries_train <- myseries %>%
  filter(year(Month) < 2011)

# From the exercise 7 in Chapter 5
# lambda from Assignment 2
lambda <- 0.24

fit <- myseries_train %>%
  model(SNAIVE(box_cox(Turnover, lambda) ~ drift()))

# Check residuals
fit %>% gg_tsresiduals()

fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))
fc %>% autoplot(myseries)

fit %>% accuracy()

## # A tibble: 1 × 12
##   State  Industry .model .type    ME  RMSE   MAE     MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr>    <chr>  <chr> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Queen… Takeawa… SNAIV… Trai… -1.37  14.1  10.4 -0.0720  7.86 0.901 0.937 0.828

fc %>% accuracy(myseries)

## # A tibble: 1 × 12
##   .model   State  Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>  <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(… Queen… Takeawa… Test  0.630  27.6  21.8 0.163  7.17  1.89  1.84 0.850

The RMSE of the previous exercise for the forecasted values is 27.6.

fit_msd_train <- myseries_train %>%
  model(ETS(Turnover ~ error("M") + trend("M") + season("M")))

fc_msd <- fit_msd_train %>%
  forecast(new_data = anti_join(myseries, myseries_train))

fc_msd %>% autoplot(myseries)

fit_msd_train %>% accuracy()

## # A tibble: 1 × 12
##   State  Industry .model .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr>    <chr>  <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Queen… Takeawa… "ETS(… Trai… -0.540  6.26  4.35 -0.434  3.42 0.377 0.417 0.170

fc_msd %>% accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "ETS(Tur… Quee… Takeawa… Test   28.0  35.6  29.3  9.22  9.67  2.54  2.37 0.891

Can I beat the seasonal naïve approach from Exercise 7? Answer: Nope. Unfortunately, I couldn’t get the forecast to follow along the overall increasing trend of the data. By applying an ETS(M,A,M) model, the forecasted values started to follow a decreasing trend and resulted in an RMSE of 101.3121. By applying an ETS(M,Ad,M) model, the RMSE is better at 81.3 and the trend line did increase, but the increasing trend did not come close to the strong increase of the real data between 2010 and 2015. In the end, the seasonal naïve approach is victorious.

Even though the textbook didn’t indicate the option “M” for the trend parameter of the ETS() function, I tried it anyway. So with an ETS(M,M,M) model, the RMSE resulted in 35.6 which far beat out the other two ETS() models and appears much closer to the seasonal naïve approach.

Exercise 8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

# Initial STL decomposition on the data (no Box-Cox)
myseries_stl <- myseries_train %>% 
  model(stl = STL(Turnover))

components(myseries_stl) %>% autoplot()

Above is the STL decomposition applied to the raw (non-transformed) series.

# STL on box-cox transformation
myseries_stl_bc <- myseries_train %>% 
  model(stl = STL(box_cox(Turnover, lambda)))

components(myseries_stl_bc) %>% autoplot()

Above is the STL decomposition applied to the Box-Cox transformed series.

# Now apply the ETS
my_series_bc_ets_fit <- myseries_train %>%
  model(ETS(box_cox(Turnover, lambda)))

my_series_bc_ets_fc <- my_series_bc_ets_fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))

my_series_bc_ets_fc %>% autoplot(myseries)

my_series_bc_ets_fc %>% accuracy(myseries)

## # A tibble: 1 × 12
##   .model   State  Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>  <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS(box… Queen… Takeawa… Test   19.5  30.0  23.9  6.41  7.92  2.08  2.00 0.862

With the application of the ETS() function on the seasonally adjusted data via Box-Cox transformation, the visual result shows a forecast much more in-line with the actual data than compared to the results of the previous exercise. The calculated RMSE for the ETS approach on the transformed series is 30.0, much better than the results from the previous exercise. This RMSE is only slightly worse than the seasonal naïve approach at 27.6. Turns out the seasonal naïve approach is the best, at least so far.

DATA 624 Assignment 5

CUNY Fall 2021

Philip Tanofsky

10 October 2021

Exercise 8.1

Section a

Section b

Exercise 8.5

Section a

Section b

Section c

Section d

Section e

Section f

Exercise 8.6

Exercise 8.7

Exercise 8.8

Section a

Section b

Section c

Section d

Section e

Exercise 8.9