Name            NIM           RPubs
Julian Salomo   20194920003   https://rpubs.com/juliansalomo
Vanessa Supit   20194920014   https://rpubs.com/vanessasupit
Kefas Ronaldo   20194920004   https://rpubs.com/kefasronaldo

Department : Business Statistics
Address    : ARA Center, Matana University Tower
             Jl. CBD Barat Kav, RT.1, Curug Sangereng, Kelapa Dua, Tangerang, Banten 15810.



1 Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

Below is the plot of the number of pigs slaughtered in Victoria.

aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria") %>%
  autoplot(Count)+
  labs(title="Pigs slaughtered in Victoria")

1.1 Use the ETS() function in R to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.

fit <- aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria") %>%
  model(ses = ETS(Count ~ error("A") + trend("N") + season("N")))
report_pig <- fit %>% report()
## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##         l
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

So the optimal values are:

  • \(\alpha=0.3221247\)
  • \(\ell_0=100646.6\)

Next, we will generate forecasts for the next four months.

fc <- fit %>% forecast(h = "4 months")
fc
## # A fable: 4 x 6 [1M]
## # Key:     Animal, State, .model [1]
##   Animal State    .model    Month             Count  .mean
##   <fct>  <fct>    <chr>     <mth>            <dist>  <dbl>
## 1 Pigs   Victoria ses    2019 Jan N(95187, 8.7e+07) 95187.
## 2 Pigs   Victoria ses    2019 Feb N(95187, 9.7e+07) 95187.
## 3 Pigs   Victoria ses    2019 Mar N(95187, 1.1e+08) 95187.
## 4 Pigs   Victoria ses    2019 Apr N(95187, 1.1e+08) 95187.

Above are the forecast values for the next four months. All four point forecasts are identical because simple exponential smoothing produces flat forecasts: \(\hat{y}_{T+h|T} = \ell_T\) for all \(h\). Now we will plot the forecasts together with the last eight months of data.

fc %>%
  autoplot(filter(aus_livestock, Month >= yearmonth("2018 May"))) +
  labs(title="Forecast: Victorian Pigs")

1.2 Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.

s <- augment(fit) %>% 
  pull(.resid) %>% 
  sd()
# Count is a distribution column, so yhat below is a distribution; adding
# the +/- bound shifts its mean, and the printed means are the interval
# endpoints.
yhat <- fc %>% 
  pull(Count) %>% 
  head(1)
yhat + c(-1, 1) * 1.96 * s
## <distribution[2]>
## [1] N(76871, 8.7e+07)  N(113502, 8.7e+07)

So the manual prediction interval is \(76871 \le y_{T+1} \le 113502\).

Next, we will generate the prediction interval produced by R

fc %>%
  mutate(interval = hilo(Count, 95)) %>%
  pull(interval)
## <hilo[4]>
## [1] [76854.79, 113518.3]95 [75927.17, 114445.9]95 [75042.22, 115330.9]95
## [4] [74194.54, 116178.6]95

Although the intervals are similar, they are not identical. This is because R computes the standard deviation of the forecast distribution differently from a simple residual standard deviation (it accounts for the degrees of freedom), and it uses the exact normal quantile rather than just 1.96.
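As a cross-check, here is a minimal sketch (reusing the fit and fc objects above; sigma2 and yhat1 are just local names) that rebuilds the first interval from the model’s estimated variance, obtained via glance(), and the exact normal quantile. It should reproduce R’s interval almost exactly.

# Rebuild the 95% interval from the model's variance estimate and the
# exact normal quantile, instead of the residual SD and 1.96.
sigma2 <- glance(fit) %>% pull(sigma2)
yhat1 <- fc %>% pull(.mean) %>% head(1)
yhat1 + c(-1, 1) * qnorm(0.975) * sqrt(sigma2)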

2 Write your own function to implement simple exponential smoothing. The function should take arguments y (the time series), alpha (the smoothing parameter \(\alpha\)) and level (the initial level \(\ell_0\)). It should return the forecast of the next observation in the series. Does it give the same forecast as ETS()?

Below is our own SES function.

ses1 <- function(y, alpha, level) {
  # yhat[i] holds the one-step forecast of y[i]; yhat[1] is the initial level
  yhat <- numeric(length(y) + 1)
  yhat[1] <- level
  for (i in 2:length(yhat)) {
    yhat[i] <- alpha * y[i - 1] + (1 - alpha) * yhat[i - 1]
  }
  # the final element is the forecast of the next, unobserved value
  last(yhat)
}

Now we will compute the forecast of the next observation in the series.

vic_pigs <- aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria") %>%
  pull(Count)
ses_fc <- vic_pigs %>%
  ses1(alpha = 0.3221, level = 100646.6) %>%
  round(0)
c(`Our own SES` = ses_fc, `fable::ETS()` = fc$Count[1])
## $`Our own SES`
## [1] 95187
## 
## $`fable::ETS()`
## N(95187, 8.7e+07)

Our function gives the same forecast value as ETS().

3 Modify your function from the previous exercise to return the sum of squared errors rather than the forecast of the next observation. Then use the optim() function to find the optimal values of \(\alpha\) and \(\ell_0\). Do you get the same values as the ETS() function?

ses2 <- function(par, y) {
  alpha <- par[1]
  level <- par[2]
  n <- length(y)
  yhat <- numeric(n)
  yhat[1] <- level
  for (i in 2:n) {
    yhat[i] <- alpha * y[i - 1] + (1 - alpha) * yhat[i - 1]
  }
  # return the sum of squared one-step errors
  round(sum((y - yhat)^2), 3)
}

optim(c(0.00001, vic_pigs[1]), ses2, y=vic_pigs)$par
## [1] 3.236967e-01 9.422438e+04

These are the estimates of \(\alpha\) and \(\ell_0\) found by minimising the SSE with our function. Now let’s compare them with the estimates reported by tidy():

tidy(fit)
## # A tibble: 2 x 5
##   Animal State    .model term    estimate
##   <fct>  <fct>    <chr>  <chr>      <dbl>
## 1 Pigs   Victoria ses    alpha      0.322
## 2 Pigs   Victoria ses    l     100647.

We can see that our function produces estimates close to those from ETS(), but they are not identical. This is due to the different starting values and optimisation routines being used.
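As a further sanity check, a small sketch that restarts optim() from the ETS() estimates; it should stay essentially at those values, suggesting both routines are finding the same optimum.

# Restart the optimisation from the ETS() estimates.
optim(c(0.3221247, 100646.6), ses2, y = vic_pigs)$par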

4 Combine your previous two functions to produce a function that both finds the optimal values of \(\alpha\) and \(\ell_0\), and produces a forecast of the next observation in the series.

# modify SES function to find the optimal values of alpha and l0, and produce the next observation forecast.
SES <- function(init_pars, data){
  # init_pars is c(init_alpha, init_l0)
  # make next observation forecast variable
  fc_next <- 0
  
  # make SSE function to get SSE if parameters and data are given.
  SSE <- function(pars, data){
    error <- 0
    SSE <- 0
    alpha <- pars[1]
    l0 <- pars[2]
    y_hat <- l0
    
    for(index in 1:length(data)){
      error <- data[index] - y_hat
      SSE <- SSE + error^2
      
      y_hat <- alpha*data[index] + (1 - alpha)*y_hat 
    }
    # use superassignment to make forecast value possible to use outside SSE function.
    fc_next <<- y_hat
    return(SSE)
  }
  
  # use optim function to get optimal values of alpha and l0.
  optim_pars <- optim(par = init_pars, data = data, fn = SSE)
  
  # return results
  return(list(
    Next_observation_forecast = fc_next,
    alpha = optim_pars$par[1],
    l0 = optim_pars$par[2]
    ))
}

SES(c(0.5, vic_pigs[1]), vic_pigs)
## $Next_observation_forecast
## [1] 95186.55
## 
## $alpha
## [1] 0.3221274
## 
## $l0
## [1] 99223.08
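Note that the next-observation forecast and \(\alpha\) essentially match the ETS() output, while \(\ell_0\) differs somewhat; with a series this long, the SSE is not very sensitive to the initial level, so different optimisation runs can settle on slightly different values of \(\ell_0\) while producing almost identical forecasts.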

5 Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

5.1 Plot the Exports series and discuss the main features of the data.

# Here we will plot the annual Exports of Belgium, with code BEL.
bel_eco <- global_economy %>% filter(Code == 'BEL')
bel_eco %>% autoplot(Exports)

This series has an upward trend that persists over the years.

5.2 Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

fit <- bel_eco %>% 
  model(ANN = ETS (Exports ~ error("A") + trend("N") + season("N")))
fc <- fit %>% forecast(h = 4)
fc %>% autoplot(bel_eco) +
  labs(title="Forecast: Belgium's Exports (ANN)")

5.3 Compute the RMSE values for the training data.

fit %>% accuracy()
## # A tibble: 1 x 11
##   Country .model .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <fct>   <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 Belgium ANN    Training 0.877  3.23  2.52  1.35  4.30 0.987 0.989 -0.0642

The training RMSE is \(3.227668\).

5.4 Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

fit_compare <- bel_eco %>%
  model(
    ANN = ETS (Exports ~ error("A") + trend("N") + season("N")),
    AAN = ETS (Exports ~ error("A") + trend("A") + season("N"))
    )
accuracy(fit_compare)
## # A tibble: 2 x 11
##   Country .model .type          ME  RMSE   MAE    MPE  MAPE  MASE RMSSE    ACF1
##   <fct>   <chr>  <chr>       <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 Belgium ANN    Training 0.877     3.23  2.52  1.35   4.30 0.987 0.989 -0.0642
## 2 Belgium AAN    Training 0.000848  3.08  2.31 -0.232  3.97 0.905 0.943  0.0980

The AAN model reduces the RMSE, which suggests it fits the training data more accurately. Remember, though, that the trended model uses one more parameter than the simpler model, so a lower training RMSE alone does not settle the comparison.
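Since the trended model uses the extra parameter, a penalised comparison is also informative; a sketch comparing the two models on AICc via glance():

glance(fit_compare) %>% select(.model, AICc)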

5.5 Compare the forecasts from both methods. Which do you think is best?

fit_compare %>% 
  forecast(h=4) %>% 
  autoplot(bel_eco, level=NULL) +
  labs(title="Forecast Comparison: Belgium's Exports")

The forecasts from the AAN model are higher than those from the ANN model because the AAN model allows for an increasing trend. Since this series shows a clear upward trend, the AAN model seems the better choice.

5.6 Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

# Extract the RMSE
s <- fit_compare %>%
  select(Country, AAN) %>%     # We use AAN model because it has better prediction
  accuracy() %>%
  transmute(Country, s = RMSE)

fit_compare %>%
  select(Country, AAN) %>% 
  forecast(h=1) %>% 
# Join in the RMSE calculated earlier
  left_join(s, by = 'Country') %>%
# Compute the interval, manually and from the distribution
  mutate(
    low = Exports - 1.96 * s,
    high = Exports + 1.96 * s
    ) %>%
  select(Country, Exports, low, high) 
## # A fable: 1 x 5 [?]
## # Key:     Country [1]
##   Country   Exports       low      high  Year
##   <fct>      <dist>    <dist>    <dist> <dbl>
## 1 Belgium N(85, 10) N(79, 10) N(91, 10)  2018

The Exports forecast for the first year ahead (2018) is \(85\), with a 95% prediction interval of roughly \(79 \le y \le 91\).
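For comparison, a sketch of the corresponding interval produced by R directly from the forecast distribution, mirroring the hilo() approach used in question 1:

fit_compare %>%
  select(Country, AAN) %>%
  forecast(h = 1) %>%
  mutate(interval = hilo(Exports, 95)) %>%
  pull(interval)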

6 Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

china <- global_economy %>%
  filter(Country == "China")

china %>% autoplot(GDP) +
  labs(title="Chinese GDP")

Due to the increasing variance, the series clearly needs a fairly strong transformation. To stabilise the variance, we will use a Box-Cox transformation.

lambda <- china %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
china %>% autoplot(box_cox(GDP, lambda))

After the transformation the series looks much closer to linear, so we will try a damped trend model here, along with several other variants.

fit <- china %>%
  model(
    # Automatic model from ETS
    ets = ETS(GDP),
    # Damped model
    ets_damped = ETS(GDP ~ trend("Ad")),
    # Automatic model with Box-Cox transformation
    ets_bc = ETS(box_cox(GDP, lambda)),
    # Damped model with Box-Cox transformation
    ets_bc_damped = ETS(box_cox(GDP, lambda) ~ trend("Ad")),
    # Automatic model with log transformation
    ets_log = ETS(log(GDP))
)

fit %>%
  forecast(h="10 years") %>%
  autoplot(china, level =NULL)+
  labs(title="Forecast: Chinese GDP (10 Years)")

From the plot we can clearly see that both the log and Box-Cox transformations have a large impact on the forecasts, while damping alone has a relatively small effect. When we combine damping with the Box-Cox transformation, we get forecasts that extend the historical trend most plausibly.

7 Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

aus_production %>% autoplot(Gas)

fit <- aus_production %>%
  filter(Quarter > yearquarter("1990 Q4")) %>%
  model(fit = ETS(Gas))
report(fit)
## Series: Gas 
## Model: ETS(A,A,A) 
##   Smoothing parameters:
##     alpha = 0.4902673 
##     beta  = 0.0001000091 
##     gamma = 0.0001001453 
## 
##   Initial states:
##         l        b        s1       s2       s3       s4
##  133.4487 1.227524 -12.80569 26.37931 10.65398 -24.2276
## 
##   sigma^2:  28.4997
## 
##      AIC     AICc      BIC 
## 610.6743 613.3213 631.8847
fit %>%
  forecast(h = "3 years") %>%
  autoplot(aus_production)

Why is multiplicative seasonality necessary here? Because the seasonal variation increases with the level of the series.
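The exercise also asks whether making the trend damped improves the forecasts. A sketch (reusing the same post-1990 training window as above; the model names are just labels) comparing the automatically selected model with a damped-trend variant on training accuracy:

fit_damped <- aus_production %>%
  filter(Quarter > yearquarter("1990 Q4")) %>%
  model(
    auto = ETS(Gas),
    damped = ETS(Gas ~ trend("Ad"))
  )
accuracy(fit_damped) %>% select(.model, RMSE)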

8 Recall your retail time series data (from Exercise 8 in Section 2.10).

set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == "A3349337W")

myseries %>% autoplot(Turnover)

8.1 Why is multiplicative seasonality necessary for this series?

It is clear from the graph that the seasonal variation grows over time. Multiplicative seasonality is therefore the right approach: the additive method can only handle seasonal variation of constant size.

8.2 Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fit <- myseries %>%
  model(
    "Holt-Winters' Damped" = ETS(Turnover ~ error("M") + 
                                            trend("Ad") +
                                            season("M")),
    "Holt-Winters' Multiplicative" = ETS(Turnover ~ error("M") + 
                                                    trend("A") +
                                                    season("M"))
  )
fc <- fit %>% forecast(h = "10 years")
fc %>%
  autoplot(myseries, level = NULL) +
  labs(title="Australian retail",
       y="Turnover") +
  guides(colour = guide_legend(title = "Forecast"))

We can see that the damped method generates less aggressive (flatter) forecasts than the un-damped method.

8.3 Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

accuracy(fit) %>% select(.model, RMSE)
## # A tibble: 2 x 2
##   .model                        RMSE
##   <chr>                        <dbl>
## 1 Holt-Winters' Damped          14.6
## 2 Holt-Winters' Multiplicative  15.0

The results show that the Holt-Winters’ damped model has a lower RMSE than the Holt-Winters’ multiplicative method, so the damped method is the better of the two for this series in terms of training RMSE.

8.4 Check that the residuals from the best method look like white noise.

fit %>% select("Holt-Winters' Damped") %>% gg_tsresiduals()

From the time plot and histogram the residuals look roughly like white noise with occasional spikes, but the ACF plot suggests otherwise: more than 5% of the spikes fall outside the bounds, so the residuals are not white noise.
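To back up this visual impression, a sketch applying the Ljung-Box test to the damped model’s residuals; a small p-value would confirm the remaining autocorrelation.

fit %>%
  select("Holt-Winters' Damped") %>%
  augment() %>%
  features(.resid, ljung_box, lag = 24)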
So we will try the other method (multiplicative).

fit %>% select("Holt-Winters' Multiplicative") %>% gg_tsresiduals()

The multiplicative method yields better residual plots than the damped method: fewer ACF spikes fall outside the bounds than before.

8.5 Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

# Split the data we use
myseries_train <- myseries %>%
  filter(year(Month) < 2011)

# Add seasonal naïve model to our fit model
fit <- myseries_train %>%
  model(
    "Holt-Winters' Damped" = ETS(Turnover ~ error("M") + 
                                            trend("Ad") +
                                            season("M")),
    "Holt-Winters' Multiplicative" = ETS(Turnover ~ error("M") + 
                                                    trend("A") +
                                                    season("M")),
    "Seasonal Naïve Forecast" = SNAIVE(Turnover)
  )

# Keep the data from 2011 onwards as the test set for forecast comparison
Compare <- anti_join(myseries, myseries_train, 
                     by = c("State", "Industry", "Series ID", "Month", "Turnover"))

# Do the forecasting according to comparison data
fc <- fit %>%
      forecast(Compare)

# Call the Comparison plot
autoplot(Compare, Turnover) +
  autolayer(fc, level = NULL) +
  guides(colour=guide_legend(title="Forecast")) +
  ggtitle('Comparison of Forecast') 

The comparison shows that the Holt-Winters’ multiplicative method beats the seasonal naïve approach, while the Holt-Winters’ damped method does not differ much from the seasonal naïve forecasts.
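To quantify this, a sketch computing the test-set RMSE of the three methods from the forecasts above:

fc %>%
  accuracy(myseries) %>%
  select(.model, RMSE)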

9 For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

lambda <- myseries_train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
ts_bc <- myseries_train %>%
  mutate(
    bc = box_cox(Turnover, lambda)
  )

# Models fitted to the Box-Cox transformed series
fit <- ts_bc %>%
  model(
    'STL (BoxCox)' = STL(bc ~ season(window = "periodic"), robust = TRUE),
    'ETS (BoxCox)' = ETS(bc)
  )

# Our best previous model is multiplicative method
fit_best <-ts_bc %>%
  model(
    "Holt-Winters' Multiplicative" = ETS(Turnover ~ error("M") + 
                                                    trend("A") +
                                                    season("M"))
  )

rbind(accuracy(fit),accuracy(fit_best))
## # A tibble: 3 x 12
##   State  Industry .model .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr>    <chr>  <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 New S~ Hardwar~ STL (~ Trai~  0.320  12.9  8.67 -0.399  5.40 0.421 0.507 0.355
## 2 New S~ Hardwar~ ETS (~ Trai~ -1.74   13.2  9.37 -1.47   5.85 0.455 0.519 0.221
## 3 New S~ Hardwar~ Holt-~ Trai~ -1.14   13.5  9.55 -0.912  5.79 0.450 0.515 0.247

From the RMSE values we can see that the STL decomposition applied to the Box-Cox transformed series gives the best in-sample accuracy, even better than our previous best model.
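Strictly, the exercise asks for a comparison on the test set. A sketch (reusing lambda and the Compare test data from earlier; fit_bc is just an illustrative name) that puts the transformation inside the model formulas, so fable back-transforms the forecasts and the RMSE is computed on the original Turnover scale:

fit_bc <- myseries_train %>%
  model(
    'STL (BoxCox)' = decomposition_model(
      STL(box_cox(Turnover, lambda) ~ season(window = "periodic"), robust = TRUE),
      ETS(season_adjust)
    ),
    'ETS (BoxCox)' = ETS(box_cox(Turnover, lambda))
  )
fit_bc %>%
  forecast(Compare) %>%
  accuracy(myseries) %>%
  select(.model, RMSE)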

10 Compute the total domestic overnight trips across Australia from the tourism dataset.

10.1 Plot the data and describe the main features of the series.

aus_trips <- tourism %>%
  summarise(Trips = sum(Trips))

aus_trips %>%
  autoplot(Trips)

The data is seasonal. There is a moderate downward trend until 2010, then it is replaced by a stronger upward trend.

10.2 Decompose the series using STL and obtain the seasonally adjusted data.

stl_dcmp <- aus_trips %>%
  model(STL(Trips)) %>% components()

stl_dcmp %>%
  as_tsibble() %>%
  autoplot(season_adjust)

10.3 Forecast the next two years of the series using an additive damped trend method applied to the seasonally adjusted data. (This can be specified using decomposition_model().)

aus_trips %>%
  model(
    decomposition_model(STL(Trips),
                        ETS(season_adjust ~ error("A") + 
                                            trend("Ad") + 
                                            season("N")
                            )
                        )
    ) %>%
  forecast(h = "2 years") %>%
  autoplot(aus_trips)

10.4 Forecast the next two years of the series using an appropriate model for Holt’s linear method applied to the seasonally adjusted data (as before but without damped trend).

aus_trips %>%
  model(
    decomposition_model(STL(Trips),
                        ETS(season_adjust ~ error("A") + 
                                            trend("A") + 
                                            season("N")
                            )
                        )
    ) %>%
  forecast(h = "2 years") %>%
  autoplot(aus_trips)

10.5 Now use ETS() to choose a seasonal model for the data.

aus_trips %>%
  model(
    ETS(Trips)
    ) %>%
  forecast(h = "2 years") %>%
  autoplot(aus_trips)

10.6 Compare the RMSE of the ETS model with the RMSE of the models you obtained using STL decompositions. Which gives the better in-sample fits?

fit <- aus_trips %>%
  model(
    STL_AAdN = decomposition_model(STL(Trips),
                                   ETS(season_adjust ~ error("A") + 
                                                       trend("Ad") + 
                                                       season("N"))),
    STL_AAN = decomposition_model(STL(Trips),
                                  ETS(season_adjust ~ error("A") + 
                                                      trend("A") + 
                                                      season("N"))),
    ETS = ETS(Trips)
    )

accuracy(fit)
## # A tibble: 3 x 10
##   .model   .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE     ACF1
##   <chr>    <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 STL_AAdN Training  105.  778.  592. 0.375  2.80 0.623 0.641 -0.00450
## 2 STL_AAN  Training  108.  778.  589. 0.397  2.79 0.620 0.640 -0.00584
## 3 ETS      Training  105.  794.  604. 0.379  2.86 0.636 0.653 -0.00151

In terms of RMSE, the STL decomposition with an ETS(A,A,N) model on the seasonally adjusted data (STL_AAN) gives a slightly better in-sample fit.

10.7 Compare the forecasts from the three approaches. Which seems most reasonable?

fit %>%
  forecast(h = "2 years") %>%
  autoplot(aus_trips, level = NULL) +
  guides(colour=guide_legend(title="Forecast")) 

Since the forecasts from the three approaches are quite similar, the model with the smallest RMSE, the STL decomposition with the ETS(A,A,N) model (STL_AAN), is most likely the best.

10.8 Check the residuals of your preferred model.

best_model <- fit %>%
  select(STL_AAN) 
best_model %>%
  gg_tsresiduals()

augment(best_model) %>%
  features(.resid, ljung_box, lag = 24, dof = 5)
## # A tibble: 1 x 3
##   .model  lb_stat lb_pvalue
##   <chr>     <dbl>     <dbl>
## 1 STL_AAN    36.5   0.00904

At a glance the residuals look like white noise, but the Ljung-Box test gives an lb_pvalue below 0.05, which shows the residuals are not white noise. This is mainly due to the large spike at lag 14.

11 For this exercise use the quarterly number of arrivals to Australia from New Zealand, 1981 Q1 – 2012 Q3, from data set aus_arrivals.

nz_visit <- aus_arrivals %>% filter(Origin == 'NZ')

11.1 Make a time plot of your data and describe the main features of the series.

autoplot(nz_visit, Arrivals) +
  ggtitle("Arrivals from New Zealand")

  • The data has an upward trend.
  • The data has a seasonal pattern, with the number of arrivals dipping in the first quarter of each year.

11.2 Create a training set that withholds the last two years of available data. Forecast the test set using an appropriate model for Holt-Winters’ multiplicative method.

train <- nz_visit %>%
  slice(1:(nrow(nz_visit) - 4 * 2))   # quarterly data, so two years = 2 x 4 = 8 observations

train %>%
  model(
    ETS(Arrivals ~ error("M") +
                   trend("A") +
                   season("M"))
  ) %>%
  forecast(h="2 years") %>%
  autoplot(level = NULL) + 
  autolayer(nz_visit, Arrivals) +
  labs(title = "Train forecast comparison to the actual data.")

11.3 Why is multiplicative seasonality necessary here?

In this case, multiplicative seasonality is important because the size of the seasonal pattern grows in proportion to the level of the trend. In a model with multiplicative seasonality, the seasonal pattern’s behavior will be captured and projected.

11.4 Forecast the two-year test set using each of the following methods:

  • an ETS model;
  • an additive ETS model applied to a log transformed series;
  • a seasonal naïve method;
  • an STL decomposition applied to the log transformed data followed by an ETS model applied to the seasonally adjusted (transformed) data.

compare <- anti_join(nz_visit, train,
                     by = c("Quarter", "Origin", "Arrivals"))

fit <- train %>%
  model(
    'ETS model' = ETS(Arrivals),
    'Additive to a Log Transformed' = ETS(log(Arrivals) ~ error("A") +
                                                          trend("A") +
                                                          season("A")),
    'Seasonal Naïve Method' = SNAIVE(Arrivals),
    'STL decomposition applied to the log transformed' = decomposition_model(STL(log(Arrivals)), 
                                                                             ETS(season_adjust))
  ) 

fc <- fit %>%
  forecast(h="2 years") 

fc %>%
  autoplot(level = NULL) + 
  autolayer(compare, Arrivals) +
  guides(colour=guide_legend(title="Forecast")) +
  labs(title = "New Zealand Visitors Train Forecast Comparison to the Actual Data.")

11.5 Which method gives the best forecasts? Does it pass the residual tests?

fc %>% accuracy(nz_visit)
## # A tibble: 4 x 11
##   .model     Origin .type      ME   RMSE    MAE    MPE  MAPE  MASE RMSSE    ACF1
##   <chr>      <chr>  <chr>   <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 Additive ~ NZ     Test   -7093. 17765. 12851. -2.17   4.20 0.864 0.918  0.0352
## 2 ETS model  NZ     Test   -3495. 14913. 11421. -0.964  3.78 0.768 0.771 -0.0260
## 3 Seasonal ~ NZ     Test    9709. 18051. 17156.  3.44   5.80 1.15  0.933 -0.239 
## 4 STL decom~ NZ     Test  -11967. 22749. 16289. -3.82   5.26 1.09  1.18   0.104

From the plot and the RMSE values, the ETS model gives the forecasts closest to the actual data.

best_model <- fit %>% select('ETS model')
best_model %>%
  gg_tsresiduals()

augment(best_model) %>%
  features(.resid, ljung_box)
## # A tibble: 1 x 3
##   .model    lb_stat lb_pvalue
##   <chr>       <dbl>     <dbl>
## 1 ETS model    1.29     0.256

Both the ACF plot and the Ljung-Box test show that the ETS model passes the residual tests: all ACF spikes lie within the bounds and the Ljung-Box p-value is greater than 0.05.

11.6 Compare the same four methods using time series cross-validation instead of using a training and test set. Do you come to the same conclusions?

nz_cv <- nz_visit %>%
  slice(1:(n() - 3)) %>%
  stretch_tsibble(.init = 36, .step = 3)

nz_cv %>% 
  model(
    'ETS model' = ETS(Arrivals),
    'Additive to a Log Transformed' = ETS(log(Arrivals) ~ error("A") +
                                                          trend("A") +
                                                          season("A")),
    'Seasonal Naïve Method' = SNAIVE(Arrivals),
    'STL decomposition applied to the log transformed' = decomposition_model(STL(log(Arrivals)), 
                                                                             ETS(season_adjust))
  ) %>%
  forecast(h = 3) %>%
  accuracy(nz_visit)
## # A tibble: 4 x 11
##   .model          Origin .type    ME   RMSE    MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>           <chr>  <chr> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Additive to a ~ NZ     Test  -375. 14347. 11021. 0.200  5.83 0.741 0.746 0.309
## 2 ETS model       NZ     Test  4627. 15327. 11799. 2.23   6.45 0.793 0.797 0.283
## 3 Seasonal Naïve~ NZ     Test  8244. 18768. 14422. 3.83   7.76 0.970 0.976 0.566
## 4 STL decomposit~ NZ     Test  3599. 16367. 12528. 1.81   6.53 0.842 0.851 0.256

Under time series cross-validation, the best method is the additive ETS model applied to the log-transformed series, followed by our previous best model, the plain ETS model. So the conclusion changes slightly from the train/test comparison.

12 Aus Production

12.1 Apply cross-validation techniques to produce 1 year ahead ETS and seasonal naïve forecasts for Portland cement production (from aus_production). Use a stretching data window with initial size of 5 years, and increment the window by one observation.

cement_cv <- aus_production %>%
  slice(1:(n() - 4)) %>%                      # hold out the last year (4 quarters)
  stretch_tsibble(.init = 5 * 4, .step = 1)   # initial window of 5 years = 20 observations

fc <- cement_cv %>%
  model(ETS(Cement), 
        SNAIVE(Cement)
        ) %>%
  forecast(h = "1 year")

12.2 Compute the MSE of the resulting \(4\)-step-ahead errors. Comment on which forecasts are more accurate. Is this what you expected?

fc %>%
  group_by(.id, .model) %>%
  mutate(h = row_number()) %>%
  ungroup() %>%
  accuracy(aus_production, by = c(".model", "h"))
## # A tibble: 8 x 11
##   .model           h .type      ME  RMSE   MAE    MPE  MAPE  MASE RMSSE     ACF1
##   <chr>        <int> <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 ETS(Cement)      1 Test  -0.0902  82.9  60.1 -0.227  3.95 0.597 0.625 -0.00185
## 2 ETS(Cement)      2 Test  -0.653  101.   72.0 -0.325  4.74 0.708 0.756  0.495  
## 3 ETS(Cement)      3 Test  -1.71   119.   87.0 -0.492  5.80 0.856 0.894  0.616  
## 4 ETS(Cement)      4 Test  -0.729  137.  102.  -0.543  6.65 1.01  1.03   0.699  
## 5 SNAIVE(Ceme~     1 Test  30.9    138.  107.   1.97   6.99 1.06  1.04   0.640  
## 6 SNAIVE(Ceme~     2 Test  30.0    139.  107.   1.90   6.96 1.05  1.04   0.649  
## 7 SNAIVE(Ceme~     3 Test  29.5    139.  107.   1.85   6.95 1.05  1.04   0.651  
## 8 SNAIVE(Ceme~     4 Test  30.8    140.  108    1.91   6.99 1.06  1.05   0.637

For the first three horizons the ETS results are much better, but at h = 4 the two methods are much closer. This is what we expected: with a long series like this, ETS has no trouble estimating its parameters and will include a trend if necessary.
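The exercise asks for the MSE of the 4-step-ahead errors; since accuracy() reports RMSE, a sketch deriving the h = 4 MSE from the same table:

fc %>%
  group_by(.id, .model) %>%
  mutate(h = row_number()) %>%
  ungroup() %>%
  accuracy(aus_production, by = c(".model", "h")) %>%
  filter(h == 4) %>%
  mutate(MSE = RMSE^2) %>%
  select(.model, h, MSE)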

13 Compare ETS(), SNAIVE() and decomposition_model(STL, ???) on the following five time series. You might need to use a Box-Cox transformation for the STL decomposition forecasts. Use a test set of three years to decide what gives the best forecasts.

  • Beer and bricks production from aus_production

aus_production %>% autoplot(Beer) 

No strong change in variance is apparent, so a Box-Cox transformation is not strictly necessary (the log inside the STL model below acts only as a mild variance stabiliser).

fc <- aus_production %>%
  slice(1:(nrow(aus_production)-12)) %>%
  model(
    'ETS' = ETS(Beer), 
    'Seasonal Naïve' = SNAIVE(Beer),
    'STL Decomposition' = decomposition_model(STL(log(Beer)),
                                              ETS(season_adjust))
    ) %>%
  forecast(h = "3 years")

fc %>% 
  autoplot(filter_index(aus_production, "2000 Q1" ~ .),level=NULL) +
  guides(colour=guide_legend(title="Forecast")) +
  ggtitle("Beer production from aus_production")

fc %>% accuracy(aus_production)
## # A tibble: 3 x 10
##   .model            .type     ME  RMSE   MAE      MPE  MAPE  MASE RMSSE  ACF1
##   <chr>             <chr>  <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS               Test   0.127  9.62  8.92  0.00998  2.13 0.563 0.489 0.376
## 2 Seasonal Naïve    Test  -2.92  10.8   9.75 -0.651    2.34 0.616 0.549 0.325
## 3 STL Decomposition Test  -3.36   9.91  8.65 -0.816    2.10 0.546 0.504 0.342

Based on the test set results, ETS and the STL decomposition perform best on this series.

tidy_bricks <- aus_production %>%
  filter(!is.na(Bricks))

tidy_bricks %>% autoplot(Bricks)

Again, no strong change in variance is apparent, so a Box-Cox transformation is not strictly necessary (as before, the log inside the STL model is only a mild stabiliser).

fc <- tidy_bricks %>%
  slice(1:(nrow(tidy_bricks)-12)) %>%
  model(
    'ETS' = ETS(Bricks), 
    'Seasonal Naïve' = SNAIVE(Bricks),
    'STL Decomposition' = decomposition_model(STL(log(Bricks)),
                                              ETS(season_adjust))
    ) %>%
  forecast(h = "3 years")

fc %>% 
  autoplot(filter_index(tidy_bricks, "1980 Q1" ~ .), level = NULL) +
  guides(colour = guide_legend(title = "Forecast")) +
  ggtitle("Bricks production from aus_production")

fc %>% accuracy(tidy_bricks)
## # A tibble: 3 x 10
##   .model            .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr>             <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 ETS               Test   2.27  17.5  13.2 0.474  3.31 0.365 0.354  0.339 
## 2 Seasonal Naïve    Test  32.6   36.5  32.6 7.85   7.85 0.898 0.739 -0.322 
## 3 STL Decomposition Test  10.5   18.9  15.2 2.49   3.70 0.418 0.382  0.0910

Based on the test set results, ETS performs best on this series, with the STL decomposition close behind; both clearly beat the seasonal naïve method.

  • Cost of drug subsidies for diabetes (ATC2 == "A10") and corticosteroids (ATC2 == "H02") from PBS

subsidies <- PBS %>%
  filter(ATC2 %in% c("A10", "H02")) %>%
  group_by(ATC2) %>%
  summarise(Cost = sum(Cost))

subsidies %>% 
  autoplot(Cost) +
  facet_grid(vars(ATC2), scales = "free_y") +
  ggtitle("Cost of drug subsidies for diabetes and corticosteroids")

A variance problem is apparent, so we need a Box-Cox transformation for the STL models. We handle the two series separately, because each needs its own lambda.

# Diabetes
diabet <- subsidies %>%
  filter(ATC2 %in% "A10")

lambda1 <- diabet %>%
  features(Cost, features = guerrero) %>%
  pull(lambda_guerrero)

fc1 <- diabet %>%
  filter(Month < max(Month) - 35) %>%
  model(
    'ETS' = ETS(Cost), 
    'Seasonal Naïve' = SNAIVE(Cost), 
    'STL Decomposition' = decomposition_model(STL(box_cox(log(Cost), lambda1)),
                                              ETS(season_adjust))
  ) %>%
  forecast(h = "3 years")

# Corticosteroids
corti <- subsidies %>%
  filter(ATC2 %in% "H02")

lambda2 <- corti %>%
  features(Cost, features = guerrero) %>%
  pull(lambda_guerrero)

fc2 <- corti %>%
  filter(Month < max(Month) - 35) %>%
  model(
    'ETS' = ETS(Cost), 
    'Seasonal Naïve' = SNAIVE(Cost), 
    'STL Decomposition' = decomposition_model(STL(box_cox(log(Cost), lambda2)),
                                              ETS(season_adjust))
  ) %>%
  forecast(h = "3 years")

# Combine the data again
fc <- bind_rows(fc1,fc2)
fc %>% autoplot(subsidies, level=NULL) +
  guides(colour=guide_legend(title="Forecast")) +
  ggtitle("Forecasting for 3 years")

fc %>% 
  accuracy(subsidies) %>%
  arrange(ATC2)
## # A tibble: 6 x 11
##   .model     ATC2  .type       ME   RMSE    MAE    MPE  MAPE  MASE RMSSE    ACF1
##   <chr>      <chr> <chr>    <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 ETS        A10   Test  1382519. 2.36e6 1.85e6  5.83   8.74 1.89  2.00   0.177 
## 2 Seasonal ~ A10   Test  4318158. 5.18e6 4.33e6 19.8   19.9  4.42  4.40   0.638 
## 3 STL Decom~ A10   Test    87980. 1.59e6 1.32e6 -0.296  6.77 1.35  1.35  -0.168 
## 4 ETS        H02   Test    27005. 7.65e4 6.45e4  1.99   7.05 1.09  1.07  -0.0990
## 5 Seasonal ~ H02   Test   -14795. 8.55e4 7.16e4 -1.31   7.88 1.21  1.20   0.0226
## 6 STL Decom~ H02   Test    23313. 6.93e4 5.79e4  1.69   6.37 0.977 0.974 -0.222

For both the A10 and H02 series, the STL decomposition method gives the lowest test RMSE, so it appears to be the best here.

  • Total food retailing turnover for Australia from aus_retail.

food <- aus_retail %>%
  filter(Industry == "Food retailing") %>%
  summarise(Turnover = sum(Turnover))
autoplot(food, Turnover) +
  ggtitle('Total food retailing turnover for Australia')

There is a variance problem apparent here too, so we need a Box-Cox transformation for the STL model.

lambda <- food %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

fc <- food %>%
  filter(Month < max(Month) - 35) %>%
  model(
    'ETS' = ETS(Turnover), 
    'Seasonal Naïve' = SNAIVE(Turnover), 
    'STL Decomposition' = decomposition_model(STL(box_cox(log(Turnover), lambda)),
                                              ETS(season_adjust))
  ) %>%
  forecast(h = "3 years")

fc %>% 
  autoplot(filter_index(food, "2005 Jan"~ .), level=NULL)+
  guides(colour=guide_legend(title="Forecast")) +
  ggtitle("Forecasting for 3 years")

fc %>% accuracy(food)
## # A tibble: 3 x 10
##   .model            .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
##   <chr>             <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS               Test  -151.   194.  170. -1.47   1.65 0.639 0.634 0.109
## 2 Seasonal Naïve    Test   625.   699.  625.  5.86   5.86 2.35  2.29  0.736
## 3 STL Decomposition Test   -15.8  152.  130. -0.205  1.23 0.490 0.498 0.249

The STL decomposition with the log and Box-Cox transformation gives the best result, as we can see from the RMSE values.

14 Tourism

14.1 Use ETS() to select an appropriate model for the following series: total number of trips across Australia using tourism, the closing prices for the four stocks in gafa_stock, and the lynx series in pelt. Does it always give good forecasts?

  • Total number of trips across Australia using tourism

aus_trips <- tourism %>%
  summarise(Trips = sum(Trips))

fit <- aus_trips %>%
  model(ETS(Trips))

report(fit)
## Series: Trips 
## Model: ETS(A,A,A) 
##   Smoothing parameters:
##     alpha = 0.4495675 
##     beta  = 0.04450178 
##     gamma = 0.0001000075 
## 
##   Initial states:
##         l         b        s1        s2        s3       s4
##  21689.64 -58.46946 -125.8548 -816.3416 -324.5553 1266.752
## 
##   sigma^2:  699901.4
## 
##      AIC     AICc      BIC 
## 1436.829 1439.400 1458.267
fit %>%
  forecast() %>%
  autoplot(aus_trips) +
  ggtitle("Forecasting of total number of trips across Australia")

The ETS model’s forecasts look reasonable.

  • The closing prices for the four stocks in gafa_stock

gafa_stock %>%
  autoplot(Close) +
  facet_grid(vars(Symbol), scales = "free_y")

gafa_regular <- gafa_stock %>%
  group_by(Symbol) %>%
  mutate(trading_day = row_number()) %>%
  ungroup() %>%
  as_tsibble(index = trading_day, regular = TRUE)

fit <- gafa_regular %>%
  model(
    ETS(Close)
  )

report(fit)
## # A tibble: 4 x 10
##   Symbol .model       sigma2 log_lik    AIC   AICc    BIC    MSE   AMSE    MAE
##   <chr>  <chr>         <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 AAPL   ETS(Close) 0.000228  -5299. 10604. 10604. 10620.   4.39   8.96 0.0106
## 2 AMZN   ETS(Close) 0.000383  -7778. 15561. 15561. 15577. 359.   700.   0.0129
## 3 FB     ETS(Close) 0.000356  -5442. 10890. 10890. 10906.   5.82  11.3  0.0126
## 4 GOOG   ETS(Close) 0.000219  -7529. 15064. 15064. 15079. 144.   291.   0.0102
fit %>%
  forecast(h=50) %>%
  autoplot(gafa_regular %>% group_by_key() %>% slice((n() - 100):n()))

The ETS models’ forecasts look reasonable.

  • Pelt trading records

pelt %>%
  model(
    ETS(Lynx)
    ) %>%
  report()
## Series: Lynx 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.9998998 
## 
##   Initial states:
##         l
##  25737.12
## 
##   sigma^2:  162584145
## 
##      AIC     AICc      BIC 
## 2134.976 2135.252 2142.509
pelt %>%
  model(
    ETS(Lynx)
    ) %>%
  forecast(h=10) %>%
  autoplot(pelt) +
  ggtitle("Forecasting of PELT trading records.")

The lynx data’s cyclic behavior is completely lost here. There is nothing that can be done to improve this because ETS models are not designed to handle cyclic data.

14.2 Find an example where it does not work well. Can you figure out why?

As seen in the pelt data set above, ETS does not work well with cyclic data.

15 Show that the point forecasts from an ETS(M,A,M) model are the same as those obtained using Holt-Winters’ multiplicative method.
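Before the empirical check, a short sketch of the algebraic argument. In an ETS(M,A,M) model, point forecasts are obtained by setting all future errors \(\varepsilon_t\) to zero in the state space equations, which gives \(\hat{y}_{T+h|T} = (\ell_T + h b_T)\, s_{T+h-m(k+1)}\), where \(m\) is the seasonal period and \(k\) is the integer part of \((h-1)/m\). This is exactly the forecast equation of Holt-Winters’ multiplicative method, so the two produce identical point forecasts; they differ only in how prediction intervals are constructed.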

As an empirical check, we’ll use the usdeaths data with the older forecast-package functions ets() and hw().

library(fma)   # usdeaths comes from the fma package, which also loads forecast
usdeaths.ets <- ets(usdeaths, model = "MAM")
usdeaths.etsF <- forecast(usdeaths.ets, h = 1)
usdeaths.HWM <- hw(usdeaths, seasonal = "multiplicative", h = 1)
usdeaths.etsF$mean
##           Jan
## 1979 8233.107
usdeaths.HWM$mean
##          Jan
## 1979 8217.64

Both methods produce very close point forecasts (the small difference comes from slightly different parameter estimates in the two fitting routines), which is consistent with Holt-Winters’ multiplicative method giving the same point forecasts as an ETS(M,A,M) model.