library(fpp3)
library(knitr)

Exercise 5.1:

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Because the data are annual, so there is no seasonality to exploit, and there is a clear upward trend, the Drift method is the most appropriate choice for the Australian population data.

data(global_economy)
aus_population <- global_economy |>
    filter(Country == "Australia") |>
    select(c("Country", "Code", "Year", "Population")) |>
    mutate(Population_Millions = Population / 1000000)
train <- aus_population |>
    filter_index("1960" ~ "2006")
aus_population_fit <- train |>
    model(Drift = RW(Population_Millions ~ drift()),
          Naive = NAIVE(Population_Millions))
aus_population_fc <- aus_population_fit |>
    forecast(h = 12)
aus_population_fc |>
    autoplot(train, level=NULL) +
    autolayer(filter_index(aus_population, "2006" ~ .),
              Population_Millions,
              colour = "black") +
    labs(y = "Population in Millions",
         title = "Annual Forecasts for Australian Population") +
    guides(color = guide_legend(title = "Forecast"))

We can confirm that, over the held-out period (2007 onwards), the Drift forecasts are much closer to the actual values than the Naive forecasts.
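
As a quick numerical check (not required by the exercise), we can compare the test-set accuracy of the two methods; the exact figures depend on the 1960-2006 training split used above.

# Test-set accuracy of the Drift and Naive forecasts against the full series
aus_population_fc |>
    accuracy(aus_population) |>
    select(.model, RMSE, MAE, MAPE)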

We would expect SNAIVE(y) to be the most appropriate forecast method for the Australian bricks production data.

data(aus_production)
bricks <- aus_production |>
    select(c("Quarter", "Bricks")) |>
    filter(!is.na(Bricks))
train <- bricks |>
    filter_index("1956 Q1" ~ "2001 Q4")
bricks_fit <- train |>
    model(Drift = RW(Bricks ~ drift()),
          `Seasonal Naive` = SNAIVE(Bricks ~ lag("year")))
bricks_fc <- bricks_fit |>
    forecast(h = 14)
bricks_fc |>
    autoplot(train, level=NULL) +
    autolayer(filter_index(bricks, "2001 Q4" ~ .),
              Bricks,
              colour = "black") +
    labs(y = "Bricks",
         title = "Quarterly Forecasts for Australian Brick Production") +
    guides(color = guide_legend(title = "Forecast"))

However, the Seasonal Naive forecasts fall well short of the actual values over the last 3.5 years of data; most of the actual values lie above even the Drift forecasts.
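
To quantify this, we can look at the test-set accuracy of the two fitted methods (a quick check; a positive mean error, ME, means the actual values sit above the forecasts on average):

# Positive ME indicates the forecasts undershoot the actual values
bricks_fc |>
    accuracy(bricks) |>
    select(.model, ME, RMSE, MAE)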

For the NSW lambs data, we would again expect SNAIVE(y) to be the most appropriate forecast method.

data(aus_livestock)
nsw_lambs <- aus_livestock |>
    filter(State == "New South Wales" & Animal == "Lambs")
train <- nsw_lambs |>
    filter_index("1972 Jul" ~ "2016 Dec")
nsw_lambs_fit <- train |>
    model(`Seasonal Naive` = SNAIVE(Count ~ lag("year")))
nsw_lambs_fc <- nsw_lambs_fit |>
    forecast(h = 24)
nsw_lambs_fc |>
    autoplot(filter_index(train, "2011 Jan" ~ .), level=NULL) +
    autolayer(filter_index(nsw_lambs, "2016 Dec" ~ .),
              Count,
              colour = "black") +
    labs(y = "Lambs",
         title = "Monthly Forecasts for Australian Meat Production: Lambs") +
    guides(color = guide_legend(title = "Forecast"))

Here, we’ve limited the plotted period to 2011-2018 so the seasonal pattern is easier to see. The Seasonal Naive forecasts do come close to many of the actual values over the last two years of data, although they miss some notable dips near the beginning and end of the forecast period. Even so, this method handles the strong seasonality better than the Naive or Drift methods would, as the check below suggests.
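
To back up that comparison, we could refit the same training period with the Naive and Drift methods as well and compare test-set accuracy (a sketch, not part of the original fit above):

# Refit all three benchmark methods and compare on the held-out 2017-2018 data
train |>
    model(`Seasonal Naive` = SNAIVE(Count ~ lag("year")),
          Naive = NAIVE(Count),
          Drift = RW(Count ~ drift())) |>
    forecast(h = 24) |>
    accuracy(nsw_lambs) |>
    select(.model, RMSE, MAE, MASE)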

For the household wealth data, we expect Drift method forecasts to best capture the trends for the four countries measured.

data(hh_budget)
hh_wealth <- hh_budget |>
    select(c("Country", "Year", "Wealth"))
train <- hh_wealth |>
    filter_index("1995" ~ "2011")
hh_wealth_fit <- train |>
    model(Drift = RW(Wealth ~ drift()),
          Naive = NAIVE(Wealth))
hh_wealth_fc <- hh_wealth_fit |>
    forecast(h = 5)
hh_wealth_fc |>
    autoplot(train, level=NULL) +
    autolayer(filter_index(hh_wealth, "2011" ~ .),
              Wealth,
              colour = "black") +
    labs(y = "Wealth",
         title = "Annual Forecasts for Household Wealth by Country") +
    guides(color = guide_legend(title = "Forecast")) +
    facet_grid(Country ~ .)

For most countries, the Drift forecasts are indeed closer to the actual values over the last five years of data than the NAIVE(y) forecasts. For Australia, the two methods perform almost identically, so their forecast lines overlap in the figure above: Australia's wealth series is nearly flat over the training period, whereas the other countries show at least a small upward trend.
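
We can check this country by country with the test-set accuracy measures (a quick check using the forecasts above):

# Test-set accuracy by country and method
hh_wealth_fc |>
    accuracy(hh_wealth) |>
    select(Country, .model, RMSE, MAE) |>
    arrange(Country, .model)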

We expect SNAIVE(y) to produce the best forecasts for the Australian takeaway food turnover data.

data(aus_retail)
takeaway <- aus_retail |>
    filter(Industry == "Takeaway food services") |>
    as_tibble() |>
    group_by(Industry, Month) |>
    summarize(Turnover_Sum = sum(Turnover)) |>
    as_tsibble(index=Month)
train <- takeaway |>
    filter_index("1982 April" ~ "2016 Dec")
takeaway_fit <- train |>
    model(Drift = RW(Turnover_Sum ~ drift()),
          `Seasonal Naive` = SNAIVE(Turnover_Sum ~ lag("year")))
takeaway_fc <- takeaway_fit |>
    forecast(h = 24)
takeaway_fc |>
    autoplot(filter_index(train, "2011 Jan" ~ .), level=NULL) +
    autolayer(filter_index(takeaway, "2016 Dec" ~ .),
              Turnover_Sum,
              colour = "black") +
    labs(y = "Turnover",
         title = "Monthly Turnover Forecasts for Australian Takeaway Industry") +
    guides(color = guide_legend(title = "Forecast"))

The Seasonal Naive forecasts for the last two years of data are indeed close to the actual values. Given how strong the seasonality is, a non-seasonal method such as Naive, Drift or Mean would not provide as reliable a forecast.
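
Again, a quick look at the test-set accuracy of the two fitted methods supports this:

# Seasonal Naive vs Drift over the 2017-2018 test period
takeaway_fc |>
    accuracy(takeaway) |>
    select(.model, RMSE, MAE, MAPE)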

Exercise 5.2:

Use the Facebook stock price (data set gafa_stock) to do the following:

data(gafa_stock)
facebook_stock <- gafa_stock |>
    filter(Symbol == "FB") |>
    mutate(day = row_number()) |>
    update_tsibble(index = day, regular = TRUE)
facebook_stock |>
    autoplot(Close)

train <- facebook_stock |>
    filter_index("1" ~ "1228")
facebook_stock_fit <- train |>
    model(Drift = RW(Close ~ drift()))
facebook_stock_fc <- facebook_stock_fit |>
    forecast(h = 30)
p1 <- facebook_stock_fc |>
    autoplot(train, level=NULL) +
    autolayer(filter_index(facebook_stock, "1228" ~ .),
              Close,
              colour = "black") +
    labs(y = "Close",
         title = "Daily Forecasts for Facebook Stock") +
    guides(color = guide_legend(title = "Forecast"))
p1

To show that the drift forecasts simply extend the line drawn between the first and last observations of the training period, we add that line to the plot:

# First and last observations of the training period (days 1 and 1228)
x0 <- 1
y0 <- facebook_stock$Close[x0]
x1 <- 1228
y1 <- facebook_stock$Close[x1]
# The dotted segment joining them lies along the Drift forecasts
p1 <- p1 + 
    geom_segment(aes(x = x0, y = y0, xend = x1, yend = y1),
                 color = "blue", lty = "dotted")
p1

facebook_stock_fit <- train |>
    model(Drift = RW(Close ~ drift()),
          Mean = MEAN(Close),
          Naive = NAIVE(Close))
facebook_stock_fc <- facebook_stock_fit |>
    forecast(h = 30)
p2 <- facebook_stock_fc |>
    autoplot(train, level=NULL) +
    autolayer(filter_index(facebook_stock, "1228" ~ .),
              Close,
              colour = "black") +
    labs(y = "Close",
         title = "Daily Forecasts for Facebook Stock") +
    guides(color = guide_legend(title = "Forecast"))
p2

The NAIVE(y) method is the best of the three here: its forecasts are generally closer to the actual values over the last 30 days of data than those of either the Mean or Drift method. This is what we would expect if the closing price behaves roughly like a random walk, in which case the most recent observation is the best simple predictor.
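
We can back this up with the test-set accuracy over the 30-day holdout (a quick check using the forecasts above):

# Compare the three benchmark methods on the last 30 trading days
facebook_stock_fc |>
    accuracy(facebook_stock) |>
    select(.model, RMSE, MAE, MAPE) |>
    arrange(RMSE)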

Exercise 5.3:

Apply a seasonal naive method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
    filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |>
    model(SNAIVE(Beer))
# Look at the residuals
fit |>
    gg_tsresiduals()

# Look at some forecasts
fit |>
    forecast() |>
    autoplot(recent_production)

What do you conclude?

There’s a large spike outside the blue bounds of the ACF plot at lag 4 (r4), which indicates autocorrelation; the residuals are not white noise. We can confirm with a Ljung-Box test (using lag = 8, i.e. twice the seasonal period for quarterly data) that the residuals are distinguishable from white noise:

aug <- fit |>
    augment() |>
    features(.innov, ljung_box, lag = 8)
kable(aug, format="simple")
.model          lb_stat   lb_pvalue
-------------  --------  ----------
SNAIVE(Beer)   32.26893    8.34e-05

The p-value is tiny (8.34e-05), so we reject the null hypothesis that the residuals are white noise.

Exercise 5.4:

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

aus_exports <- global_economy |>
    filter(Country == "Australia") |>
    select(c("Country", "Code", "Year", "Exports"))
fit <- aus_exports |>
    model(NAIVE(Exports))
# Look at the residuals
fit |>
    gg_tsresiduals()

# Look at some forecasts
fit |>
    forecast() |>
    autoplot(aus_exports)

For the Australian exports data, only one spike lies outside the blue bounds of the ACF plot, and it is not a large one. There is little evidence of autocorrelation here; the residuals resemble white noise.

aug <- fit |>
    augment() |>
    features(.innov, ljung_box, lag = 10)
kable(aug, format="simple")
Country     .model            lb_stat   lb_pvalue
----------  ---------------  --------  ----------
Australia   NAIVE(Exports)    16.3655   0.0896368

Using the Ljung-Box test again, the p-value (about 0.09) is large enough that we do not reject the null hypothesis that the residuals are white noise.

bricks <- aus_production |>
    select(c("Quarter", "Bricks")) |>
    filter(!is.na(Bricks))
fit <- bricks |>
    model(SNAIVE(Bricks))
# Look at the residuals
fit |>
    gg_tsresiduals()

# Look at some forecasts
fit |>
    forecast() |>
    autoplot(bricks)

For the Australian bricks production data, many spikes lie outside the blue bounds of the ACF plot, which is strong evidence of autocorrelation; the residuals are not white noise. We can confirm this with the Ljung-Box test:

aug <- fit |>
    augment() |>
    features(.innov, ljung_box, lag = 8)
kable(aug, format="simple")
.model            lb_stat   lb_pvalue
---------------  --------  ----------
SNAIVE(Bricks)   274.1714           0

The p-value is essentially zero, so we reject the null hypothesis that the residuals are white noise.

Exercise 5.7:

For your retail time series (from Exercise 7 in Section 2.10):

data(aus_retail)
set.seed(1221)
my_series <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
my_series_train <- my_series |>
    filter(year(Month) < 2011)
autoplot(my_series, Turnover) +
    autolayer(my_series_train, Turnover, color = "red")

Yes, the data have been split appropriately.

fit <- my_series_train |>
    model(SNAIVE(Turnover))
fit |>
    gg_tsresiduals()

Do the residuals appear to be uncorrelated and normally distributed?

aug <- fit |>
    augment() |>
    features(.innov, ljung_box, lag = 8)
kable(aug, format="simple")
State                          Industry                       .model              lb_stat   lb_pvalue
-----------------------------  -----------------------------  -----------------  --------  ----------
Australian Capital Territory   Newspaper and book retailing   SNAIVE(Turnover)   805.9453           0

The histogram suggests the residuals are roughly normally distributed, but they are clearly correlated: many spikes lie well outside the blue bounds of the ACF plot, and the Ljung-Box test gives a p-value of essentially zero, so we reject the white-noise hypothesis.

fc <- fit |>
    forecast(new_data = anti_join(my_series, my_series_train))
fc |>
    autoplot(my_series)

fit |>
    accuracy() |>
    select(c(".model", ".type", "RMSE", "MAE", "MAPE", "MASE")) |>
    kable(format = "simple")
.model             .type           RMSE        MAE       MAPE   MASE
-----------------  ---------  ---------  ---------  ---------  -----
SNAIVE(Turnover)   Training    1.551014   1.105706   14.95283      1
fc |>
    accuracy(my_series) |>
    select(c(".model", ".type", "RMSE", "MAE", "MAPE", "MASE")) |>
    kable(format = "simple")
.model             .type        RMSE        MAE       MAPE       MASE
-----------------  -----  ----------  ---------  ---------  ---------
SNAIVE(Turnover)   Test     3.624842   3.511458   66.29122   3.175762

Our forecasts from this model are not very accurate. The test-set MASE is well above 1 (about 3.18), which means the test-set errors are roughly three times as large as the in-sample one-step seasonal naive errors used to scale them.
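
We can also see where the test-set MASE comes from: because the fitted model is the seasonal naive method, the MASE scaling factor is exactly its own training-set MAE, so the test MASE is just the ratio of the two MAE values reported above.

# Test MASE = test MAE / training MAE (the seasonal naive in-sample MAE)
3.511458 / 1.105706   # ~ 3.1758, matching the reported test MASE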

The MASE is scale-independent, which makes it comparable across series, but like the other accuracy measures it is not independent of the amount of training data used: changing the training window changes the fitted model, the forecasts and the in-sample scaling factor, as the quick re-fit below illustrates.
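
A quick way to see this sensitivity is to refit the same model on a longer training window and recompute the test-set accuracy (a sketch; the 2013 cutoff is an arbitrary choice for illustration, and the resulting figures will differ from those above):

# Refit on a longer training window and re-evaluate on the remaining data
my_series_train2 <- my_series |>
    filter(year(Month) < 2013)
fit2 <- my_series_train2 |>
    model(SNAIVE(Turnover))
fit2 |>
    forecast(new_data = anti_join(my_series, my_series_train2)) |>
    accuracy(my_series) |>
    select(c(".model", ".type", "RMSE", "MAE", "MAPE", "MASE")) |>
    kable(format = "simple")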