R Markdown

library(AppliedPredictiveModeling)
library(mlbench)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(corrplot)
## corrplot 0.94 loaded
library(purrr)
library(tidyr)
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble      3.2.1     ✔ feasts      0.3.2
## ✔ lubridate   1.9.3     ✔ fable       0.3.4
## ✔ tsibble     1.1.5     ✔ fabletools  0.4.2
## ✔ tsibbledata 0.4.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

1. Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

a. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values and generate forecasts for the next four months.

p <- aus_livestock %>% 
  filter(Animal == 'Pigs' & State == 'Victoria')

pigs <- p %>%
  autoplot(Count) +
  labs(title = 'Timeseries')
pigs

fit <- p %>%
    model(ses = ETS(Count ~ error('A') + trend('N') + season('N')))
opt_val <- fit %>% report()
## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07
pigs_forecast <- fit %>%
  forecast(h = 4)

pigs_forecast
## # A fable: 4 x 6 [1M]
## # Key:     Animal, State, .model [1]
##   Animal State    .model    Month             Count  .mean
##   <fct>  <fct>    <chr>     <mth>            <dist>  <dbl>
## 1 Pigs   Victoria ses    2019 Jan N(95187, 8.7e+07) 95187.
## 2 Pigs   Victoria ses    2019 Feb N(95187, 9.7e+07) 95187.
## 3 Pigs   Victoria ses    2019 Mar N(95187, 1.1e+08) 95187.
## 4 Pigs   Victoria ses    2019 Apr N(95187, 1.1e+08) 95187.
plot <- fit %>%
  forecast(h = 4) %>%
  autoplot(filter(p, Month >= yearmonth('2016 Jan'))) +
  labs(title = 'Four Month Forecast')
plot

b. Compute a 95% prediction interval for the first forecast using y^ 1.96 s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

y <- pigs_forecast %>%
  pull(Count) %>%
  head(1)


sD <- augment(fit) %>%
  pull(.resid) %>%
  sd()

# Calculate the lower and upper confidence intervals. 
lowerCi <- y - 1.96 * sD
upperCi <- y + 1.96 * sD

z <- c(lowerCi, upperCi)
names(z) <- c('Lower', 'Upper')
z
## <distribution[2]>
##              Lower              Upper 
##  N(76871, 8.7e+07) N(113502, 8.7e+07)
#For this first forecast, the 95% interval lower and upper bounds are 76871 to 113502.

hilo(pigs_forecast$Count, 95)
## <hilo[4]>
## [1] [76854.79, 113518.3]95 [75927.17, 114445.9]95 [75042.22, 115330.9]95
## [4] [74194.54, 116178.6]95
# The intervals produced by R are different but close to our calculated intervals.

5. Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

a. Plot the Exports series and discuss the main features of the data.

global_economy %>%
  filter(Country == "Turkey") %>%
  autoplot(Exports) +
  labs(y="% of GDP", title="Exports: Turkey")

We can see there is an increasing trend over time, but no seasonality. From 1980 to around 1983 there is a large jump, while the late 1990s saw one of the biggest dips. More recent years have less volatility.

b. Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

fit_tk <- global_economy %>%
  filter(Country == "Turkey") %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N"))) 

forecast_tk <- fit_tk %>%
  forecast(h = 5)

forecast_tk %>%
  autoplot(global_economy) +
  labs(y="% of GDP", title="Exports of Turkey")

c. Compute the RMSE values for the training data.

fittk <- accuracy(fit_tk)$RMSE
fittk
## [1] 2.183255

d. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

fit_tk2 <- global_economy %>%
  filter(Country == "Turkey") %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

forecast_tk2 <- fit_tk2 %>%
  forecast(h = 5)

forecast_tk2 %>%
  autoplot(global_economy) +
  labs(y="GDP %", title="Exports of Turkey")

fittk2 <- accuracy(fit_tk2)$RMSE
fittk2
## [1] 2.144857
#This model's forecast is better since the RMSE is lower for the ETS (A,A,N) model.

e. Compare the forecasts from both methods. Which do you think is best?

I think that the ETS(A,A,N) model is better because of its lower RMSE and that it can better capture the data’s increasing trend. The ETS(A,A,N) also has a smaller interval than the ETS(A,N,N).

f. Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

s_tk <- residuals(fit_tk)$.resid %>% sd()
hat_y_tk <- forecast_tk$.mean[1]

forecast_tk %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [20.14458, 28.85427]95

95% prediction interval: (20.2673177, 28.7315304) 95% prediction interval using RMSE: (20.2202452, 28.778603)

The interval computed by R using hilo() is a slightly larger interval compared to the others. When I use RMSE as the s I have a slightly larger interval, compared to when I use the standard deviation of the residuals as my s.

s_tk_2 <- residuals(fit_tk2)$.resid %>% sd()
hat_y_tk_2 <- forecast_tk2$.mean[1]

forecast_tk2 %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [20.37251, 29.08602]95

95% prediction interval: (20.4909565, 28.9675735) 95% prediction interval using RMSE: (20.5253448, 28.9331852)

The interval computed by R using hilo() is a slightly larger interval compared to the others. When I use RMSE as the s I have a smaller interval, compared to when I use the standard deviation of the residuals as my s.

6. Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

lambda <- global_economy %>%
  filter(Country == "China") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

fit_ch <- global_economy %>%
  filter(Country == "China") %>%
  model(`Simple` = ETS(GDP ~ error("A") + trend("N") + season("N")),
        `Holt's method` = ETS(GDP ~ error("A") + trend("A") + season("N")),
        `Damped Holt's method` = ETS(GDP ~ error("A") + trend("Ad", phi = 0.8) + season("N")),
        `Box-Cox` = ETS(box_cox(GDP,lambda) ~ error("A") + trend("A") + season("N")),
        `Box-Cox Damped` = ETS(box_cox(GDP,lambda) ~ error("A") + trend("Ad", phi = 0.8) + season("N")),
        `Log` = ETS(log(GDP) ~ error("A") + trend("A") + season("N")),
        `Log Damped` = ETS(log(GDP) ~ error("A") + trend("Ad", phi = 0.8) + season("N"))
        ) 

fc_ch <- fit_ch %>%
  forecast(h = 15)

fc_ch %>%
  autoplot(global_economy, level = NULL) +
  labs(title="GDP: China") +
  guides(colour = guide_legend(title = "Forecast"))

The Chinese GDP has an increasing trend, but does not show seasonality. It would be best to have additive errors and “N” for season.

It was interesting to work with phi in the damped trend models as it changed the severity of the forecasts. When I used phi as 0.9, the forecasts seemed larger. I chose phi as 0.8 to be more realistic and since the GDP has very large numbers. Log and Box-Cox transformations without any damping, seem to over-forecast quite a bit as h gets larger. Damped log and Holt’s Method at phi=0.8 seem to have similar effects. The Damped Holt’s method without any transformation seemed to be midway between the Holt’s and the simple exponential method.

7. Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

Multiplicative seasonality is necessary here because the variation of the seasonal pattern appears to be proportional to the level of the time series. With an increasing trend, the amplitude of the seasonal pattern increases as well.

fit_gas <- aus_production %>%
  model(additive = ETS(Gas ~ error("A") + trend("A") + season("A")),
        multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
        `damped multiplicative` = ETS(Gas ~ error("M") + trend("Ad", phi = 0.9) + season("M"))) 

aus_production %>%
   model(additive = ETS(Gas ~ error("A") + trend("A") + season("A")),
        multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M"))) %>%
   forecast(h=20) %>%
  autoplot(aus_production, level = NULL) +
  labs(title="Australian Gas Production") +
  guides(colour = guide_legend(title = "Forecast"),
         subtitle="Additive vs. Multiplicative Seasonality")

aus_production %>%
  model(multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
        `damped multiplicative` = ETS(Gas ~ error("M") + trend("Ad", phi = 0.9) + season("M"))) %>%
  forecast(h=20) %>%
  autoplot(aus_production, level= NULL) +
  labs(title="Australian Gas Production") +
  guides(colour = guide_legend(title = "Forecast"),
         Subtitle = "Additive vs Damped Trend")

report(fit_gas)
## Warning in report.mdl_df(fit_gas): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 3 × 9
##   .model                  sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE    MAE
##   <chr>                    <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 additive              23.6       -927. 1872. 1873. 1903.  22.7  29.7 3.35  
## 2 multiplicative         0.00324   -831. 1681. 1682. 1711.  21.1  32.2 0.0413
## 3 damped multiplicative  0.00340   -835. 1688. 1689. 1719.  21.0  32.4 0.0424
accuracy(fit_gas)
## # A tibble: 3 × 10
##   .model            .type       ME  RMSE   MAE    MPE  MAPE  MASE RMSSE     ACF1
##   <chr>             <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 additive          Trai…  0.00525  4.76  3.35 -4.69  10.9  0.600 0.628  0.0772 
## 2 multiplicative    Trai… -0.115    4.60  3.02  0.199  4.08 0.542 0.606 -0.0131 
## 3 damped multiplic… Trai…  0.255    4.58  3.04  0.655  4.15 0.545 0.604 -0.00451

The RMSE did improve with a damped multiplicative method, however the AIC worsened a bit with each method. For improving the model with dampening, You can choose to either base it off the RMSE or the AIC, AICc, and BIC.

Since all three methods have the same number of parameters to estimate, I believe the RMSE can be most appropriate, as we see that the damped multiplicative method was improved slightly.

8.Recall your retail time series data (from Exercise 8 in Section 2.10).

a. Why is multiplicative seasonality necessary for this series?

Multiplicative seasonality is necessary for this series because the variation of the seasonal pattern appears to be proportional to the level of the time series. Because the trend is increasing over time, the amplitude of the seasonality increases with time simultaneously.

b. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

set.seed(1234)
my_series <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

fit <- my_series %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        `damped multiplicative` = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))) 

fit %>%
  forecast(h=36) %>%
  autoplot(my_series, level = NULL) +
  labs(title="AU Retail Turnover") +
  guides(colour = guide_legend(title = "Forecast"))

The multiplicative method increased more over time and produced a higher forecas, however the damped method show smaller changes and seemed somewhat stagnant. It also produced higher forecasts.

c. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

accuracy(fit) %>%
  select(.model, RMSE)
## # A tibble: 2 × 2
##   .model                 RMSE
##   <chr>                 <dbl>
## 1 multiplicative         1.34
## 2 damped multiplicative  1.36

For he RMSE, we can see that it is slightly lower for the multiplicative version, therefore the damped multiplicative version would be more preferred.

d. Check that the residuals from the best method look like white noise.

my_series %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  gg_tsresiduals() +
  ggtitle("Multiplicative Method")

#Box-Pierce test
my_series %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  augment() %>% 
  features(.innov, box_pierce, lag = 24, dof = 0)
## # A tibble: 1 × 5
##   State    Industry                                     .model bp_stat bp_pvalue
##   <chr>    <chr>                                        <chr>    <dbl>     <dbl>
## 1 Tasmania Cafes, restaurants and takeaway food servic… multi…    27.2     0.296
#Ljung-Box test
my_series %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  augment() %>% 
  features(.innov, ljung_box, lag = 24, dof = 0)
## # A tibble: 1 × 5
##   State    Industry                                     .model lb_stat lb_pvalue
##   <chr>    <chr>                                        <chr>    <dbl>     <dbl>
## 1 Tasmania Cafes, restaurants and takeaway food servic… multi…    28.0     0.260

Both tests show that the results are not significant at a 0.05 level, and so the residuals are not distinguishable from a white noise series.

e. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

my_series_training <- my_series %>%
  filter(year(Month) < 2011)

fit_train <- my_series_training %>%
  model(multi = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        snaive = SNAIVE(Turnover))

#producing forecasts
fc <- fit_train %>%
  forecast(new_data = anti_join(my_series, my_series_training))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc %>% autoplot(my_series, level = NULL)

#Compare the RMSE
accuracy(fit_train) %>%
  select(.type, .model, RMSE)
## # A tibble: 2 × 3
##   .type    .model  RMSE
##   <chr>    <chr>  <dbl>
## 1 Training multi   1.18
## 2 Training snaive  2.90
fc %>% accuracy(my_series)  %>%
  select(.type, .model, RMSE)
## # A tibble: 2 × 3
##   .type .model  RMSE
##   <chr> <chr>  <dbl>
## 1 Test  multi   3.99
## 2 Test  snaive  9.13

Here, the multiplicative method seems to forecast the data better since the RMSE is significantly lower compared to the seasonal naïve approach.

9. For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

lambda <- my_series_training %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

#stl decomp applied to the box cox transformed data
my_series_training %>%
  model(
    STL(box_cox(Turnover,lambda) ~ season(window = "periodic"), robust = TRUE)) %>%
  components() %>%
  autoplot() +
  ggtitle("STL with Box-Cox")

#computed the seasonally adjusted data , stl decomp applied to the box cox transformed data
dcmp <- my_series_training %>%
  model(STL_box = STL(box_cox(Turnover,lambda) ~ season(window = "periodic"), robust = TRUE)) %>%
  components()

#replacing turnover with the seasonally adjusted data
my_series_training$Turnover <- dcmp$season_adjust

#modeling on the seasonally adjusted data
fit <- my_series_training %>%
  model(ETS(Turnover ~ error("M") + trend("A") + season("M")))

#checking the residuals
fit %>% gg_tsresiduals()  +
  ggtitle("Residual Plots for Australian Retail Turnover")

#produce forecasts for test data
fc <- fit %>%
  forecast(new_data = anti_join(my_series, my_series_training))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fit %>% accuracy() %>%
  select(.model, .type, RMSE)
## # A tibble: 1 × 3
##   .model                                                        .type     RMSE
##   <chr>                                                         <chr>    <dbl>
## 1 "ETS(Turnover ~ error(\"M\") + trend(\"A\") + season(\"M\"))" Training 0.141
fc %>% accuracy(my_series) %>%
  select(.model, .type, RMSE)
## # A tibble: 1 × 3
##   .model                                                        .type  RMSE
##   <chr>                                                         <chr> <dbl>
## 1 "ETS(Turnover ~ error(\"M\") + trend(\"A\") + season(\"M\"))" Test   21.0

I was a bit confused with the process for STL decomposition, so I believe this code may have some issues. I attempted to apply the box cox transformation on the training data, but this did result in a RMSE that is a bit larger than I expected on the test data while the train data has a smaller RMSE. My main obstacle is trying to see at what point is best to perform the STL decomposition (prior or after splitting the data) but here I decided to perform the transformation before splitting. Any advice for this would be much appreciated, thank you!