Forecasting: Principles and Practice (3rd ed.). Exercise pages for Sections 1.8, 2.10, 3.7, 5.11, 7.10, 8.8, and 9.11

Introduction

1) Section 1.8, Problem 1

For cases 3 and 4 in Section 1.5, list the possible predictor variables that might be useful, assuming that the relevant data are available.

For both cases, the predictors should be things that actually affect what you’re trying to forecast.

For Case 3, important predictors would be things like price, advertising, promotions, and what competitors are doing. Internal factors like advertising and external ones like competitor actions matter differently because they don’t affect demand at the same time or in the same way.

For Case 4, things like time of day and day of the week are really important because call volume changes a lot during the day. Other useful predictors could be seasonality, trends over time, and how busy things have been recently.

2) Section 2.10, Problem 6

The aus_arrivals data set comprises quarterly international arrivals to Australia from Japan, New Zealand, UK and the US. Use autoplot(), gg_season() and gg_subseries() to compare the differences between the arrivals from these four countries. Can you identify any unusual observation?

# Inspect data
glimpse(aus_arrivals)

## Rows: 508
## Columns: 3
## Key: Origin [4]
## $ Quarter  <qtr> 1981 Q1, 1981 Q2, 1981 Q3, 1981 Q4, 1982 Q1, 1982 Q2, 1982 Q3…
## $ Origin   <chr> "Japan", "Japan", "Japan", "Japan", "Japan", "Japan", "Japan"…
## $ Arrivals <int> 14763, 9321, 10166, 19509, 17117, 10617, 11737, 20961, 20671,…

# Time plot
aus_arrivals |>
  autoplot(Arrivals) +
  labs(
    title = "Quarterly International Arrivals to Australia",
    subtitle = "Japan, New Zealand, UK, and US",
    x = "Quarter",
    y = "Arrivals"
  )

# Seasonal plot
aus_arrivals |>
  gg_season(Arrivals, labels = "both") +
  labs(
    title = "Seasonal Plot of International Arrivals to Australia",
    x = "Quarter",
    y = "Arrivals"
  )

# Seasonal subseries plot
aus_arrivals |>
  gg_subseries(Arrivals) +
  labs(
    title = "Seasonal Subseries Plot of International Arrivals to Australia",
    x = "Quarter",
    y = "Arrivals"
  )

The four countries all show different patterns in arrivals over time.

The UK has a really clear spike in Q3, which makes sense because that’s summer in the Northern Hemisphere. New Zealand is kind of the opposite, with higher arrivals in Q1 and Q4, which matches summer in the Southern Hemisphere.

Japan shows a noticeable drop around 2011, which looks unusual compared to the rest of the data. Overall, all four countries have seasonality, but the timing and strength of that seasonality are different.

##3) Section 3.7, Problem 7 Use the last five years of Gas data from aus_production, plot the series, use classical decomposition with multiplicative type, compute and plot the seasonally adjusted data, then examine the effect of introducing an outlier in the middle versus near the end of the series

# Extract last five years of Gas data
gas <- tail(aus_production, 5 * 4) |>
  select(Quarter, Gas)

# 1. Plot the data
autoplot(gas, Gas) +
  labs(
    title = "Last Five Years of Gas Production",
    x = "Quarter",
    y = "Gas"
  )

# 2. Classical multiplicative decomposition
gas_fit <- gas |>
  model(classical_decomposition(Gas, type = "multiplicative"))

gas_comp <- components(gas_fit)

autoplot(gas_comp) +
  labs(title = "Classical Multiplicative Decomposition of Gas")

# 3 and 4. Seasonally adjusted data
gas_comp |>
  autoplot(season_adjust) +
  labs(
    title = "Seasonally Adjusted Gas Series",
    x = "Quarter",
    y = "Seasonally Adjusted Gas"
  )

# 5. Add outlier in the middle
gas_out_mid <- gas |>
  mutate(Gas = if_else(row_number() == 10, Gas + 300, Gas))

gas_out_mid_fit <- gas_out_mid |>
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_out_mid_fit) |>
  autoplot() +
  labs(title = "Decomposition with Outlier in Middle")

# 6. Add outlier near the end
gas_out_end <- gas |>
  mutate(Gas = if_else(row_number() == n(), Gas + 300, Gas))

gas_out_end_fit <- gas_out_end |>
  model(classical_decomposition(Gas, type = "multiplicative"))

components(gas_out_end_fit) |>
  autoplot() +
  labs(title = "Decomposition with Outlier Near End")

The gas data shows a clear pattern with both a trend and strong seasonal changes.

Using multiplicative decomposition makes sense because the seasonal pattern changes depending on the level of the data. When the data is higher, the seasonal swings are bigger.

The seasonally adjusted data removes the repeating seasonal pattern and makes the overall movement easier to see.

When an outlier is added in the middle, it messes up the trend and seasonal components because classical decomposition isn’t very good at handling outliers. The effect spreads out across the data instead of staying in one spot. When the outlier is at the end, it still causes problems, but the impact is more limited.

Time series regression, autocorrelation, moving averages.

4) Section 5.11, Problem 7

For your retail time series from Exercise 7 in Section 2.10, create a training set using observations before 2011, verify the split, fit a seasonal naïve model with SNAIVE(), check residuals, forecast the test period, compare accuracy, and comment on how sensitive the accuracy measures are to the amount of training data used.

# Assuming 'retail_data' is the retail time series dataset

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# 1. Training data before 2011
myseries_train <- myseries |>
  filter(year(Month) < 2011)

# 2. Check split
autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red") +
  labs(
    title = "Retail Series with Training Data Highlighted",
    x = "Month",
    y = "Turnover"
  )

# 3. Fit seasonal naive model
fit <- myseries_train |>
  model(SNAIVE(Turnover))

# 4. Residual diagnostics
fit |> gg_tsresiduals()

# 5. Forecast test data
fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

fc |>
  autoplot(myseries) +
  labs(
    title = "Seasonal Naive Forecasts for Retail Series",
    x = "Month",
    y = "Turnover"
  )

# 6. Accuracy
fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… SNAIV… Trai… 0.439  1.21 0.915  5.23  12.4     1     1 0.768

fc  |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Nort… Clothin… Test  0.836  1.55  1.24  5.94  9.06  1.36  1.28 0.601

The seasonal naïve model works well here because retail data usually repeats the same pattern each year.

The residual plots help check if there’s anything the model is missing, like patterns or autocorrelation.

When comparing accuracy, the test set results matter more than the training results because they show how well the model predicts new data.

One important thing is that this model depends a lot on the last year of data. If that year is unusual, then all future forecasts will copy that pattern, which can make the model less reliable.

5) Section 7.10, Problem 6

Use the annual population of Afghanistan from global_economy, plot the data, comment on its features and whether the Soviet-Afghan war is visible, fit a linear trend and a piecewise linear trend with knots at 1980 and 1989, then forecast five years beyond the end of the data and comment on the results.

# Extract Afghanistan population data
afg <- global_economy |>
  filter(Country == "Afghanistan") |>
  select(Year, Population)

# 1. Plot the data
autoplot(afg, Population) +
  labs(
    title = "Population of Afghanistan",
    x = "Year",
    y = "Population"
  )

# Piecewise predictors
afg <- afg |>
  mutate(
    trend = row_number(),
    knot1980 = pmax(0, Year - 1980),
    knot1989 = pmax(0, Year - 1989)
  )

# 2. Fit models
fit_afg <- afg |>
  model(
    Linear = TSLM(Population ~ trend),
    Piecewise = TSLM(Population ~ trend + knot1980 + knot1989)
  )

report(fit_afg)

## # A tibble: 2 × 15
##   .model  r_squared adj_r_squared  sigma2 statistic  p_value    df log_lik   AIC
##   <chr>       <dbl>         <dbl>   <dbl>     <dbl>    <dbl> <int>   <dbl> <dbl>
## 1 Linear      0.838         0.835 1.02e13      290. 8.37e-24     2   -950. 1741.
## 2 Piecew…     0.999         0.999 9.05e10    12929. 4.34e-77     4   -812. 1469.
## # ℹ 6 more variables: AICc <dbl>, BIC <dbl>, CV <dbl>, deviance <dbl>,
## #   df.residual <int>, rank <int>

# Plot fitted values
augment(fit_afg) |>
  ggplot(aes(x = Year)) +
  geom_line(data = afg, aes(y = Population), colour = "black") +
  geom_line(aes(y = .fitted, colour = .model)) +
  labs(
    title = "Linear vs Piecewise Trend Models for Afghanistan Population",
    y = "Population",
    colour = "Model"
  )

# 3. Forecast 5 years ahead
future_afg <- new_data(afg, 5) |>
  mutate(
    Year = max(afg$Year) + row_number(),
    trend = max(afg$trend) + row_number(),
    knot1980 = pmax(0, Year - 1980),
    knot1989 = pmax(0, Year - 1989)
  )

fc_afg <- fit_afg |>
  forecast(new_data = future_afg)

fc_afg |>
  autoplot(afg, level = NULL) +
  labs(
    title = "Five-Year Population Forecasts for Afghanistan",
    x = "Year",
    y = "Population"
  )

The Afghanistan population shows overall growth, but it doesn’t increase at a constant rate the whole time.

During the 1980s, the population growth slows down or even drops, which matches the time of the Soviet-Afghan war. The piecewise model captures this change better because it allows the trend to shift at specific years.

The regular linear model just draws one straight line, so it misses these changes. Because of that, the piecewise model gives more realistic forecasts.

Exponential smoothing methods

6) Section 8.8, Problem 5

Choose one country from global_economy, plot its Exports series, forecast with ETS(A,N,N), compute training RMSE, compare with ETS(A,A,N), discuss the merits of the two methods, compare forecasts, and manually compute a 95% prediction interval for the first forecast from each model using RMSE and normality.

library(fpp3)

# Pick one country
exports_data <- global_economy |>
  filter(Country == "Australia") |>
  select(Year, Exports)

# 1. Plot the series
autoplot(exports_data, Exports) +
  labs(
    title = "Australian Exports",
    x = "Year",
    y = "Exports"
  )

# 2 to 5. Fit ETS models and compare
fit_exports <- exports_data |>
  model(
    ANN = ETS(Exports ~ error("A") + trend("N") + season("N")),
    AAN = ETS(Exports ~ error("A") + trend("A") + season("N"))
  )

report(fit_exports)

## # A tibble: 2 × 9
##   .model sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE   MAE
##   <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ANN      1.36   -126.  257.  258.  264.  1.32  1.79 0.914
## 2 AAN      1.34   -124.  258.  259.  269.  1.25  1.59 0.893

# Forecasts
fc_exports <- fit_exports |>
  forecast(h = 10)

fc_exports |>
  autoplot(exports_data, level = NULL) +
  labs(
    title = "ETS Forecasts for Exports",
    x = "Year",
    y = "Exports"
  )

# 3. Training RMSE
acc_exports <- fit_exports |> accuracy()
acc_exports

## # A tibble: 2 × 10
##   .model .type              ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
##   <chr>  <chr>           <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 ANN    Training  0.232        1.15 0.914  1.09   5.41 0.928 0.928 0.0125
## 2 AAN    Training -0.000000746  1.12 0.893 -0.387  5.39 0.907 0.904 0.109

# 6. Manual 95% prediction intervals for the first forecast
# Use RMSE as sigma and assume normal errors:
# forecast +/- 1.96 * RMSE

fc_first <- fc_exports |>
  as_tibble() |>
  group_by(.model) |>
  slice(1) |>
  ungroup()

rmse_vals <- acc_exports |>
  as_tibble() |>
  select(.model, RMSE)

pi_calc <- fc_first |>
  left_join(rmse_vals, by = ".model") |>
  mutate(
    lower_95_manual = .mean - 1.96 * RMSE,
    upper_95_manual = .mean + 1.96 * RMSE
  )

pi_calc

## # A tibble: 2 × 7
##   .model  Year    Exports
##   <chr>  <dbl>     <dist>
## 1 AAN     2018 N(21, 1.3)
## 2 ANN     2018 N(21, 1.4)
## # ℹ 4 more variables: .mean <dbl>, RMSE <dbl>, lower_95_manual <dbl>,
## #   upper_95_manual <dbl>

The exports data should first be checked to see if there’s a trend.

The ETS(A,N,N) model assumes no trend, while ETS(A,A,N) includes a trend. If the data clearly increases over time, the model with a trend usually works better.

The better model isn’t just the one with lower error, but the one that also makes sense when forecasting into the future.

The prediction interval was calculated using the formula:

forecast ± 1.96 × RMSE

This works for one-step forecasts, but it assumes the errors are normally distributed, which should be checked using the residual plots.

ARIMA Models

7) Section 9.11, Problem 6

Simulate and plot data from several simple ARIMA-type models: AR(1), MA(1), ARMA(1,1), and AR(2), and compare how the series behave as parameters change. The exercise specifies AR(1) with (_1=0.6), MA(1) with (_1=0.6), ARMA(1,1) with (_1=0.6,_1=0.6), and AR(2) with (_1=-0.8,_2=0.3).

# Function to simulate AR(1) series
simulate_ar1 <- function(phi, n = 100){

  y <- numeric(n)
  e <- rnorm(n)

  for(i in 2:n){
    y[i] <- phi * y[i-1] + e[i]
  }

  tsibble(
    phi = as.character(phi),   # series identifier (key)
    idx = 1:n,                 # time index
    y = y,
    key = phi,
    index = idx
  )
}

# Combine simulations
ar1_multi <- bind_rows(
  simulate_ar1(0.2),
  simulate_ar1(0.6),
  simulate_ar1(0.9),
  simulate_ar1(-0.6)
)

# Plot results
ar1_multi |>
  autoplot(y) +
  facet_wrap(~phi, scales = "free_y") +
  labs(
    title = "AR(1) Simulations with Different Phi Values",
    x = "Time",
    y = "y"
  )

The AR(1) simulations show how the value of φ changes the behavior of the series.

When φ is small, like 0.2, the series moves around randomly and doesn’t depend much on past values. When φ is larger, like 0.9, the series becomes more persistent and moves slowly back toward the mean.

When φ is negative, the series tends to bounce up and down more.

The MA(1) model only depends on recent shocks, so its effects don’t last very long. The ARMA model combines both patterns, showing both persistence and short-term shocks.