library(fpp3)

Introduction

In this document, we will be going through exercises 5.1, 5.2, 5.3, 5.4 and 5.7 from Forecasting: Principles and Practice (3rd ed).

Exercises

5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

  • Australian Population (global_economy)
data <- global_economy
#head(data)
data_filter <- data |>
  filter(Country == "Australia") |>
  select(Population)
data_filter |>
  autoplot(Population) +
  labs(title= "Australian Population Over Time", y = "People") 

A consistent upward trend without seasonality indicates a drift forecast would be most appropriate.

data_filter |>
  model(RW(Population ~ drift())) |>
  forecast(h = 20) |>
  autoplot(data_filter, level = 95)  +
  labs(title= "Australian Population Over Time w/ Forecast", y = "People") 

  • Bricks (aus_production)
data <- aus_production
#head(data)
data_filter <- data |>
  select(Bricks) |>
  na.omit()
data_filter |>
  autoplot(Bricks) +
  labs(title= "Bricks Produced in Australia Over Time", y = "Bricks")

Seasonality is very heavily apparent, thus we want to utilize seasonal naive.

data_filter |>
  model(SNAIVE(Bricks ~ lag("year")))|>
  forecast(h = 20) |>
  autoplot(data_filter, level = 95)  +
  labs(title= "Bricks Produced in Australia Over Time w/ Forecast", y = "Bricks") 

  • NSW Lambs (aus_livestock)
data <- aus_livestock
data_filter <- data |>
  filter(Animal == "Lambs", State == "New South Wales") |>
  select(Count)
data_filter |>
  autoplot(Count) +
  labs(title= "Australian Lambs slaughtered Over Time", y = "Lambs") 

Seasonality is very heavily apparent, thus we want to utilize seasonal naive.

data_filter |>
  model(SNAIVE(Count)) |>
  forecast(h = 20) |>
  autoplot(data_filter, level = 95)  +
  labs(title= "Australian Lambs slaughtered Over Time w/ Forecast", y = "Lambs") 

  • Household wealth (hh_budget).
data <- hh_budget
#head(data)
data_filter <- data |>
  filter(Country == "Australia") |>
  select(Wealth)
data_filter |>
  autoplot(Wealth) +
  labs(title= "Australian Household Wealth Over Time", y = "% Net Disposable Income") 

A consistent upward trend without seasonality indicates a drift forecast would be most appropriate.

data_filter |>
  model(RW(Wealth ~ drift())) |>
  forecast(h = 20) |>
  autoplot(data_filter, level = 95)  +
  labs(title= "Australian Household Wealth Over Time w/ Forecast", y = "% Net Disposable Income") 

  • Australian takeaway food turnover (aus_retail).
data <- aus_retail
#head(data)
data_filter <- data |>
  filter(Industry == "Cafes, restaurants and takeaway food services") |>
  group_by(Industry) |>
  summarise(Turnover = sum(Turnover))
data_filter |>
  autoplot(Turnover) +
  labs(title= "Australian Turnover Over Time", y = "Turnover") 

Seasonality is very heavily apparent, thus we want to utilize seasonal naive.

data_filter |>
  model(SNAIVE(Turnover)) |>
  forecast(h = 20) |>
  autoplot(data_filter, level = 95)  +
  labs(title= "Australian Turnover Over Time w/ Forecast", y = "Turnover") 

5.2

Use the Facebook stock price (data set gafa_stock) to do the following:

  • Produce a time plot of the series.
data <- gafa_stock
#head(data)
data_filter <- data |>
  filter(Symbol == "FB") |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE) |>
  select(Close)
data_filter |>
  autoplot(Close) +
  labs(title= "Facebook Stock Price Over Time", y = "Price (USD)") 

  • Produce forecasts using the drift method and plot them.
data_filter |>
  model(RW(Close ~ drift())) |>
  forecast(h = 100) |>
  autoplot(data_filter, level = 95)  +
  labs(title= "Facebook Stock Price Over Time w/ Forecast", y = "Price (USD)") 

  • Show that the forecasts are identical to extending the line drawn between the first and last observations.

Here we add an AB line overlay that utilizes the equation of a line between the first and last observations.

data_filter |>
  model(RW(Close ~ drift())) |>
  forecast(h = 100) |>
  autoplot(data_filter, level = 95)  +
  geom_abline(intercept = data_filter[[1,1]], slope = (tail(data_filter$Close, n=1) - data_filter[[1,1]]) / length(data_filter$Close), col = "light blue", linetype = "dashed") +
  labs(title= "Facebook Stock Price Over Time w/ Forecast", y = "Price (USD)")

  • Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
data_filter |>
  model(Mean = MEAN(Close),
    `Naïve` = NAIVE(Close),
    Drift = RW(Close ~ drift())) |>
  forecast(h = 100) |>
  autoplot(data_filter, level = NULL)  +
  labs(title= "Facebook Stock Price Over Time w/ Forecast", y = "Price (USD)") 

Here, and in general with financial markets, of the basic benchmark functions it makes the most sense to utilize a naïve forecast. This is because a drift model will almost always imply that a stock is going to increase with its predictions, which in this case is the exact opposite behavior the Facebook stock is exhibiting with the most recent data. The mean model will tend to under or overvalue a stock and be unconnected to the high autocorrelation that stock prices exhibit. Which leaves the naive model of simply extending the most recent stock price to be the most reasonable model to follow.

5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

What do you conclude?

The innovation residuals plot shows us that for the most part the residuals from our forecasts are homoscedastic with no noticeable trends in variance. However, they do not seem to be uncorrelated with a seasonal pattern shown in both this graph and the ACF, despite our model being seasonal naive there is still significant quarterly seasonality remaining within the residuals. It is likely that this can be improved on with our more advanced models or seasonal decomposition. Despite being bimodal, the residuals have a zero mean and close to a normal distribution. Our model satisfies some residual properties but not others indicating it can be further improved.

5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

# Extract data of interest
recent_production <- global_economy |>
  filter(Country == "Australia")
# Define and estimate a model
fit <- recent_production |> model(NAIVE(Exports))
# Look at the residuals
fit |> gg_tsresiduals()

# Look a some forecasts
fit |> forecast(h = 3) |> autoplot(recent_production)

Since our data is only yearly there can not be seasonality, thus we use a naive model. We get surprisingly good results from our residuals, indicating that a naive model may be a good fit here. There does not seem to be significant correlation between the residuals. The residuals have a mean of zero with an almost perfect normal distribution. Finally, the residuals are mostly homoscedastic with a slight increase in variance towards the end of the data but that seems to normalize.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) <= 2003)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Bricks))
# Look at the residuals
fit |> gg_tsresiduals()

# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)

Overall, a seasonal naive model from this data without any transformations seems like a bad idea. The innovation residuals plot shows us heteroscedasticity in the residuals with variance increasing after 1970 and peaking near 1985. Additionally, the residuals are following a rough seasonal pattern with significant autocorrelation shown in the ACF. Finally, the distribution of the residual seems highly leftward skewed with our model having some big underestimations caused by seeming black swan events in 1975 and 1985. The only saving grace is that the mean of the residuals does seem to settle around zero.

5.7

For your retail time series (from Exercise 7 in Section 2.10):

  • Create a training dataset consisting of observations before 2011 using
set.seed(1234567)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries_train <- myseries |>
  filter(year(Month) < 2011)
  • Check that your data have been split appropriately by producing the following plot.
autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

  • Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).
fit <- myseries_train |>
  model(SNAIVE(Turnover))
  • Check the residuals.
fit |> gg_tsresiduals()

Do the residuals appear to be uncorrelated and normally distributed?

There seems to be significant autocorrelation left within the data and there is not a normal distribution with a rightward skewing of the residuals. Additionally, the residuals have a mean of above 0 and are heteroscedastic.

  • Produce forecasts for the test data
fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
fc |> autoplot(myseries)

  • Compare the accuracy of your forecasts against the actual values.
fit |> accuracy()
## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Victoria Cafes, … SNAIV… Trai…  22.1  37.6  27.6  6.81  8.76     1     1 0.829
fc |> accuracy(myseries)
## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Vict… Cafes, … Test   69.0  116.  89.9  7.68  10.8  3.25  3.09 0.921

Every accuracy measure against the training data is vastly better (lower) than when measuring accuracy against the testing data. This indicates that utilizing a seasonal naive model is prone to overfitting the data here. Likely due to the fact that it does not take into account future trends that can increase the heteroscedasticity of the seasonality. Which leads to bigger errors as time goes on.

  • How sensitive are the accuracy measures to the amount of training data used?
myseries_autoval <- myseries |>
  stretch_tsibble(.init = 36, .step = 6) |>
  relocate(Month, State, Industry, .id)

myseries_autoval |>
  model(SNAIVE(Turnover)) |>
  forecast(h = 6) |>
  accuracy(myseries)
## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Vict… Cafes, … Test   24.4  40.9  31.4  6.13  8.07  1.06  1.03 0.812

By taking a look at testing with a cross-validation accuracy, we see that the amount of data used for this seasonal naive model greatly matters with a better test accuracy in each measure compared to the test accuracy in a regular split where we are missing almost a decade of data that could be used.