Chapter 5 Exercises

5.1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

library(fpp3)

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──

## ✔ tibble      3.3.0     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.5.0
## ✔ ggplot2     4.0.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

# 1. Australian Population (Trend -> Drift)
fit_pop <- global_economy %>%
  filter(Country == "Australia") %>%
  model(Drift = RW(Population ~ drift()))

# 2. Bricks (Seasonality -> SNAIVE)
fit_bricks <- aus_production %>%
  filter(!is.na(Bricks)) %>%
  model(SNaive = SNAIVE(Bricks))

# 3. NSW Lambs (Seasonality -> SNAIVE)
fit_lambs <- aus_livestock %>%
  filter(Animal == "Lambs", State == "New South Wales") %>%
  model(SNaive = SNAIVE(Count))

# 4. Household Wealth (Trend -> Drift)
fit_wealth <- hh_budget %>%
  model(Drift = RW(Wealth ~ drift()))

# 5. Takeaway Food (Seasonality -> SNAIVE)
fit_takeaway <- aus_retail %>%
  filter(Industry == "Takeaway food services") %>%
  summarise(Turnover = sum(Turnover)) %>%
  model(SNaive = SNAIVE(Turnover))

# Example: Plotting one of them
fit_pop %>% forecast(h = "10 years") %>% autoplot(global_economy)
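
For reference, the point forecasts behind the three benchmark methods used above can be written as follows. The notation is an assumption introduced here and does not appear in the exercise text: \(T\) is the number of observations, \(h\) the forecast horizon, and \(m\) the seasonal period.

\[
\begin{aligned}
\text{NAIVE:} \quad & \hat{y}_{T+h|T} = y_T \\
\text{SNAIVE:} \quad & \hat{y}_{T+h|T} = y_{T+h-m(k+1)}, \quad k = \lfloor (h-1)/m \rfloor \\
\text{Drift:} \quad & \hat{y}_{T+h|T} = y_T + h \, \frac{y_T - y_1}{T-1}
\end{aligned}
\]

A series dominated by trend therefore suits the drift method, while a series dominated by a repeating seasonal pattern suits SNAIVE.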

5.2 Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series. Produce forecasts using the drift method and plot them. Show that the forecasts are identical to extending the line drawn between the first and last observations. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

# 1. Prepare data 
fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)

# 2. Produce Time Plot
fb_stock %>% autoplot(Close) + labs(title = "Facebook Closing Stock Price", x = "Trading Day")

# 3. Drift Forecasts
fit_fb <- fb_stock %>%
  model(Drift = RW(Close ~ drift()))

fc_fb <- fit_fb %>% forecast(h = 100)

fc_fb %>% autoplot(fb_stock) + labs(title = "Facebook Drift Forecast")

# 4. Compare with other benchmarks
fit_benchmarks <- fb_stock %>%
  model(
    Mean = MEAN(Close),
    Naive = NAIVE(Close),
    Drift = RW(Close ~ drift()))

fc_benchmarks <- fit_benchmarks %>% forecast(h = 100)
fc_benchmarks %>% autoplot(fb_stock, level = NULL) + 
  labs(title = "FB Benchmark Forecasts", y = "Price")

FB Benchmark Forecast Analysis

- Drift Method (Red): This model assumes that the average historical change continues. Its forecasts lie exactly on the straight line through the first and last observations of the series, extended beyond the last observation.
- Naive Method (Blue): The most common benchmark for financial data; it predicts that all future values will equal the last observed price (\(y_T\)).
- Mean Method (Green): This is inappropriate for stock data because it forecasts the historical average at every horizon, ignoring the long-term growth of the asset.
- Best Model: The Naive method is generally best for stocks. Since prices behave approximately like a “random walk,” the current price is usually the most reliable predictor of future prices, especially after the sharp decline seen at the end of the plot.
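
The exercise also asks to show that the drift forecasts match the line through the first and last observations. The following is a minimal sketch of one way to check this, numerically and on the plot. The names n_obs, first_close, last_close, slope, and line_values are illustrative and not part of the original solution.

```r
library(fpp3)

# Rebuild the regularly indexed Facebook series used above
fb_stock <- gafa_stock |>
  filter(Symbol == "FB") |>
  mutate(trading_day = row_number()) |>
  update_tsibble(index = trading_day, regular = TRUE)

# Slope of the line joining the first and last observations
n_obs       <- nrow(fb_stock)
first_close <- fb_stock$Close[1]
last_close  <- fb_stock$Close[n_obs]
slope       <- (last_close - first_close) / (n_obs - 1)

# Drift forecasts from fable
fc_drift <- fb_stock |>
  model(Drift = RW(Close ~ drift())) |>
  forecast(h = 100)

# The same line extended 100 steps beyond the last observation
line_values <- last_close + slope * seq_len(100)

# Compare the two sets of point forecasts (up to numerical tolerance)
all.equal(fc_drift$.mean, line_values)

# Overlay the first-to-last line on the forecast plot
fc_drift |>
  autoplot(fb_stock) +
  annotate("segment",
           x = 1, xend = n_obs,
           y = first_close, yend = last_close,
           colour = "blue", linetype = "dashed") +
  labs(title = "Drift forecast and line through first and last observations")
```

If the two vectors agree, the dashed segment and the forecast mean form one continuous straight line on the plot.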

5.3 Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

### Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
### Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
### Look at the residuals
fit |> gg_tsresiduals()

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

### Look at some forecasts
fit |> forecast() |> autoplot(recent_production)

What do you conclude?

The residuals do not look like white noise. Because there is significant autocorrelation left in the residuals, the seasonal naïve method is not an optimal model for this series. There is still information in the data that could be used to improve the forecasts (for example, by using an ETS or ARIMA model).
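
The visual impression from the residual plots can be backed by a portmanteau test. The sketch below applies a Ljung-Box test to the innovation residuals. The choice of lag = 8 (twice the seasonal period) is an assumption based on a common rule of thumb, not something stated in the exercise.

```r
library(fpp3)

# Same data and model as in the exercise code above
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
fit <- recent_production |> model(SNAIVE(Beer))

# Ljung-Box test on the innovation residuals.
# lag = 8 is an assumed choice (2 x seasonal period for quarterly data).
lb <- augment(fit) |>
  features(.innov, ljung_box, lag = 8)
lb
```

A small lb_pvalue would indicate that the residuals are distinguishable from white noise, in line with the conclusion drawn from the ACF plot.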

5.4 Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

# 1. Australian Exports (Annual data -> No seasonality -> NAIVE)
fit_exports <- global_economy %>%
  filter(Country == "Australia") %>%
  model(Naive = NAIVE(Exports))

fit_exports %>% gg_tsresiduals()

fit_exports %>% forecast(h = 5) %>% autoplot(global_economy)

# 2. Bricks Production (Quarterly data -> Seasonality -> SNAIVE)
fit_bricks <- aus_production %>%
  filter(!is.na(Bricks)) %>%
  model(SNaive = SNAIVE(Bricks))

fit_bricks %>% gg_tsresiduals()

fit_bricks %>% forecast(h = "5 years") %>% autoplot(aus_production)

Australian Exports

Model: NAIVE() is used because the data is annual (no seasons to repeat).

Residuals: The naive residuals are simply the year-to-year changes in exports. Any upward drift in the series shows up as a slightly positive residual mean, so they may not be perfect white noise.

Conclusion: The Naive model simply projects the last observed export value forward. It is a weak model if a long-term economic trend exists.

Bricks Production

Model: SNAIVE() is appropriate due to the clear quarterly seasonal cycles in construction materials.

Residuals: Check the ACF plot; if you see significant spikes at lag 4 (quarterly) or lag 8, the model failed to capture all seasonality.

Conclusion: If the ACF shows significant spikes, the residuals are not white noise, and a more advanced model (like ETS or ARIMA) would be better to capture the remaining information.
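
The ACF check described above can also be done numerically rather than by eye. The sketch below lists the residual autocorrelations for the Bricks model and flags those outside the approximate 95% white-noise bounds. The variable names and the choice of lag_max = 8 are illustrative assumptions.

```r
library(fpp3)

# Same Bricks model as above
fit_bricks <- aus_production |>
  filter(!is.na(Bricks)) |>
  model(SNaive = SNAIVE(Bricks))

bricks_aug <- augment(fit_bricks)

# Approximate 95% bounds for white noise: +/- 1.96 / sqrt(n),
# where n counts the non-missing residuals
n_resid <- sum(!is.na(bricks_aug$.innov))
bound   <- 1.96 / sqrt(n_resid)

# Autocorrelations of the innovation residuals up to lag 8 (two years)
acf_tbl <- bricks_aug |>
  ACF(.innov, lag_max = 8) |>
  as_tibble() |>
  mutate(significant = abs(acf) > bound)
acf_tbl
```

Rows with significant equal to TRUE at lag 4 or lag 8 would point to seasonal structure that SNAIVE has left in the residuals.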

5.7 For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

myseries_train <- myseries |>
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE())

Check the residuals.

fit |> gg_tsresiduals()

Do the residuals appear to be uncorrelated and normally distributed?

Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
fc |> autoplot(myseries)

Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()
fc |> accuracy(myseries)

How sensitive are the accuracy measures to the amount of training data used?

# 1. Create the retail series (Example: NT Takeaway)
myseries <- aus_retail |>
  filter(State == "Northern Territory", Industry == "Takeaway food services")

# 2. Split Data
myseries_train <- myseries |>
  filter(year(Month) < 2011)

# 3. Visualization of the split
autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red") +
  labs(title = "Retail Turnover: Full Data vs. Training Data (Red)")

# 4. Fit Model and Check Residuals
fit <- myseries_train |> model(SNAIVE(Turnover))
fit |> gg_tsresiduals()

# 5. Forecast and Accuracy
fc <- fit |> forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

# Training vs Test Accuracy
fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Takeawa… SNAIV… Trai… 0.581  1.74  1.15  4.62  13.5     1     1 0.837

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Nort… Takeawa… Test   2.90  4.19  3.37  12.9  16.0  2.94  2.40 0.866

Do the residuals appear to be uncorrelated and normally distributed?

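Uncorrelated: No. The training-set ACF1 of 0.837 in the accuracy table above indicates strong lag-1 autocorrelation in the residuals, so the Seasonal Naïve model leaves trend and other structure unexplained. The ACF plot is expected to show significant spikes accordingly.

Normally Distributed: Generally, no. The training ME of 0.581 shows that the residuals are centred above zero, and the histogram usually shows a right skew or heavy tails, typical in retail data where variance increases over time.

Accuracy Comparison: Training vs. Test

The training accuracy (from fit) is much better than the test accuracy (from fc): RMSE rises from 1.74 to 4.19, MAE from 1.15 to 3.37, and MASE from 1 to 2.94.

Performance: The SNAIVE model performs poorly on the test set because it only repeats the last year of training data, ignoring the long-term upward trend present in most retail series. The positive test ME of 2.90 is consistent with forecasts that sit below the actual values.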

Sensitivity of Accuracy Measures

Data Volume: For SNAIVE, the point forecasts depend only on the final seasonal cycle of the training set, so adding earlier history does not change them. Accuracy is instead sensitive to where the split falls, because that determines which year is repeated and how long the test period is.

Time Relevance: Accuracy decreases as the forecast horizon grows. Using pre-2011 data to predict several years ahead often results in higher errors (RMSE and MAE) because the model cannot adapt to structural changes in the economy or shifts in consumer behavior.
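
One way to probe the sensitivity question empirically is to refit the model on different training windows and compare the test-set measures. The sketch below is a minimal version of that idea. The helper snaive_test_accuracy() and the particular start and split years are hypothetical choices made for illustration only.

```r
library(fpp3)

myseries <- aus_retail |>
  filter(State == "Northern Territory", Industry == "Takeaway food services")

# Hypothetical helper: train on [start_year, split_year), test on the rest
snaive_test_accuracy <- function(start_year, split_year) {
  train <- myseries |>
    filter(year(Month) >= start_year, year(Month) < split_year)
  test <- myseries |>
    filter(year(Month) >= split_year)

  train |>
    model(SNaive = SNAIVE(Turnover)) |>
    forecast(new_data = test) |>
    accuracy(myseries) |>
    mutate(start_year = start_year, split_year = split_year) |>
    select(start_year, split_year, RMSE, MAE, MAPE)
}

# Vary how far back the training data starts, keeping the 2011 split
vary_start <- bind_rows(
  lapply(c(1990, 2000, 2005), snaive_test_accuracy, split_year = 2011)
)

# Vary the split year, keeping the training start fixed
vary_split <- bind_rows(
  lapply(c(2005, 2008, 2011), snaive_test_accuracy, start_year = 1990)
)

vary_start
vary_split
```

Because a seasonal naïve forecast only reuses the last observed year, the rows of vary_start should report the same test-set errors, while the rows of vary_split can differ since both the repeated year and the test period change.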

Data 624_Homework 3

Arutam Antunish

2026-02-21