Exercise: 9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

(a) Explain the differences among these figures. Do they all indicate that the data are white noise?

Answer: Sincr the amount of random numbers increases, the closer the blue-bounded lines. Yes, all 3 figures show that the data are white noise, but it also shows that larger sample sizes better approximate that autocorrelations = 0.

(b) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Answer: The critical values depend on the sample size. The bounds are given by ±1.96𝑛±n 1.96 ​, so as 𝑛n increases, the bounds become narrower. The autocorrelations differ in each figure because smaller samples have greater variability, causing larger random fluctuations even though the data are still white noise.

Exercise 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

# Load required packages
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.5.0
## ✔ ggplot2     3.5.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
# Extract Amazon stock data
amazon <- gafa_stock |>
  filter(Symbol == "AMZN")

# Plot daily closing prices
amazon |>
  autoplot(Close) +
  labs(title = "Amazon Daily Closing Prices",
       y = "Price", x = "Date")

ACF Plot:

# ACF plot
amazon |>
  ACF(Close) |>
  autoplot() +
  labs(title = "ACF of Amazon Closing Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

PACF Plot:

# PACF plot
amazon |>
  PACF(Close) |>
  autoplot() +
  labs(title = "PACF of Amazon Closing Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

Exercise: 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

(a) Turkish GDP from global_economy.

turkey_gdp <- global_economy |>
  filter(Country == "Turkey")

# Plot original series
turkey_gdp |>
  autoplot(GDP) +
  labs(title = "Turkish GDP")

# Find optimal Box-Cox transformation
lambda <- turkey_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

lambda
## [1] 0.1572187
# Apply Box-Cox transformation
turkey_gdp <- turkey_gdp |>
  mutate(GDP_trans = box_cox(GDP, lambda))

# Plot transformed series
turkey_gdp |>
  autoplot(GDP_trans) +
  labs(title = "Transformed Turkish GDP")

# First difference
turkey_gdp <- turkey_gdp |>
  mutate(GDP_diff = difference(GDP_trans))

# Plot differenced series
turkey_gdp |>
  autoplot(GDP_diff) +
  labs(title = "Differenced Transformed Turkish GDP")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

# Check ACF
turkey_gdp |>
  ACF(GDP_diff) |>
  autoplot()

Comment: The estimated Box–Cox parameter will be close to 0, so a log transformation is appropriate to stabilize the variance. After transformation, the series still shows a trend, indicating non-stationarity. Applying first-order differencing (d = 1) removes the trend and produces a series that fluctuates around a constant mean.

(b) Accommodation takings in the state of Tasmania from aus_accommodation.

# Extract Tasmania accommodation data
tas_accom <- aus_accommodation |>
  filter(State == "Tasmania")

# Plot original series
tas_accom |>
  autoplot(Takings) +
  labs(title = "Accommodation Takings - Tasmania")

# Find Box-Cox lambda
lambda <- tas_accom |>
  features(Takings, features = guerrero) |>
  pull(lambda_guerrero)

lambda
## [1] 0.001819643
# Apply Box-Cox transformation
tas_accom <- tas_accom |>
  mutate(Takings_trans = box_cox(Takings, lambda))

# Plot transformed data
tas_accom |>
  autoplot(Takings_trans) +
  labs(title = "Transformed Takings")

# Apply first and seasonal differencing
tas_accom <- tas_accom |>
  mutate(diff1 = difference(Takings_trans),
         diff_seasonal = difference(diff1, lag = 12))

# Plot differenced data
tas_accom |>
  autoplot(diff_seasonal) +
  labs(title = "Differenced (Trend + Seasonality Removed)")
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Check ACF
tas_accom |>
  ACF(diff_seasonal) |>
  autoplot()

(c) Monthly sales from souvenirs.

# Plot original monthly sales 
souvenirs |>
  autoplot(Sales) +
  labs(title = "Monthly Souvenir Sales")

# Find Box-Cox lambda
lambda <- souvenirs |>
  features(Sales, features = guerrero) |>
  pull(lambda_guerrero)

lambda
## [1] 0.002118221
# Apply Box-Cox transformation
souvenirs <- souvenirs |>
  mutate(Sales_trans = box_cox(Sales, lambda))

# Plot transformed series
souvenirs |>
  autoplot(Sales_trans) +
  labs(title = "Transformed Souvenir Sales")

# Apply first and seasonal differencing
souvenirs <- souvenirs |>
  mutate(diff1 = difference(Sales_trans),
         diff_seasonal = difference(diff1, lag = 12))

# Plot differenced series
souvenirs |>
  autoplot(diff_seasonal) +
  labs(title = "Differenced (Trend + Seasonality Removed)")
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Check ACF
souvenirs |>
  ACF(diff_seasonal) |>
  autoplot()

Exercise: 9.6

Simulate and plot some data from simple ARIMA models.

(a) Use the following R code to generate data from an AR(1) model with ϕ1=0.6and σ2=1. The process starts with y1=0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

(a) Produce a time plot for the series. How does the plot change as you change ϕ1?

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.5
## ✔ purrr   1.0.2     ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
set.seed(123)

phi_values <- c(0, 0.5, 0.9, -0.5)
n <- 200

ar1_series <- lapply(phi_values, function(phi) {
  tibble(
    time = 1:n,
    y = as.numeric(arima.sim(n = n, list(ar = phi))),
    phi1 = phi
  )
}) |> bind_rows()
## Warning in min(Mod(polyroot(c(1, -model$ar)))): no non-missing arguments to
## min; returning Inf
# Plot
ggplot(ar1_series, aes(x = time, y = y, color = factor(phi1))) +
  geom_line() +
  facet_wrap(~phi1, scales = "free_y") +
  labs(title = "AR(1) Series for Different phi1 Values",
       color = expression(phi[1]))

(c) Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

set.seed(123)

n <- 200

# Generate MA(1) series
ma1_series <- arima.sim(
  n = n,
  list(ma = 0.6),   # theta1 = 0.6
  sd = 1            # since variance = 1 → sd = sqrt(1) = 1
)

# Convert to data frame for plotting
ma1_df <- data.frame(
  time = 1:n,
  y = as.numeric(ma1_series)
)

# Plot the series
library(ggplot2)

ggplot(ma1_df, aes(x = time, y = y)) +
  geom_line() +
  labs(title = "Simulated MA(1) Series (theta1 = 0.6, sigma^2 = 1)",
       x = "Time", y = "Value")

(d) Produce a time plot for the series. How does the plot change as you change θ1?

library(tidyverse)

set.seed(123)

theta_values <- c(-0.8, -0.4, 0, 0.4, 0.8)
n <- 200

# Create a proper data frame directly
ma1_series <- map_dfr(theta_values, function(theta) {
  tibble(
    time = 1:n,
    y = as.numeric(arima.sim(n = n, list(ma = theta), sd = 1)),
    theta1 = theta
  )
})

# Check structure (optional but useful)
str(ma1_series)
## tibble [1,000 × 3] (S3: tbl_df/tbl/data.frame)
##  $ time  : int [1:1000] 1 2 3 4 5 6 7 8 9 10 ...
##  $ y     : num [1:1000] 0.2182 1.7429 -1.1765 0.0729 1.6116 ...
##  $ theta1: num [1:1000] -0.8 -0.8 -0.8 -0.8 -0.8 -0.8 -0.8 -0.8 -0.8 -0.8 ...
# Plot
ggplot(ma1_series, aes(x = time, y = y)) +
  geom_line() +
  facet_wrap(~theta1, scales = "free_y") +
  labs(title = "MA(1) Series for Different Theta1 Values",
       x = "Time", y = "Value")

(e) Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.

set.seed(123)

n <- 200

# Generate ARMA(1,1) series
arma11_series <- arima.sim(
  n = n,
  model = list(ar = 0.6, ma = 0.6),  # phi1 and theta1
  sd = 1                             # since variance = 1 → sd = 1
)

# Convert to data frame for plotting
arma_df <- data.frame(
  time = 1:n,
  y = as.numeric(arma11_series)
)

# Plot
library(ggplot2)

ggplot(arma_df, aes(x = time, y = y)) +
  geom_line() +
  labs(title = "Simulated ARMA(1,1) Series (phi1 = 0.6, theta1 = 0.6)",
       x = "Time", y = "Value")

(f) Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

set.seed(123)

n <- 200
y <- numeric(n)

# Generate white noise
e <- rnorm(n, mean = 0, sd = 1)

# Initialize first two values
y[1] <- e[1]
y[2] <- e[2]

# Generate AR(2) manually
for (t in 3:n) {
  y[t] <- -0.8 * y[t-1] + 0.3 * y[t-2] + e[t]
}

# Convert to data frame
ar2_df <- data.frame(time = 1:n, y = y)

# Plot
library(ggplot2)

ggplot(ar2_df, aes(x = time, y = y)) +
  geom_line() +
  labs(title = "Non-stationary AR(2) Series (phi1 = -0.8, phi2 = 0.3)",
       x = "Time", y = "Value")

(g) Graph the latter two series and compare them.

set.seed(123)
n <- 200

# --- ARMA(1,1): phi1 = 0.6, theta1 = 0.6 ---
arma <- arima.sim(
  n = n,
  model = list(ar = 0.6, ma = 0.6),
  sd = 1
)

arma_df <- tibble(
  time = 1:n,
  y = as.numeric(arma),
  series = "ARMA(1,1)"
)

# --- AR(2): phi1 = -0.8, phi2 = 0.3 (manual simulation) ---
y <- numeric(n)
e <- rnorm(n, 0, 1)

y[1] <- e[1]
y[2] <- e[2]

for (t in 3:n) {
  y[t] <- -0.8 * y[t-1] + 0.3 * y[t-2] + e[t]
}

ar2_df <- tibble(
  time = 1:n,
  y = y,
  series = "AR(2) (non-stationary)"
)

# Combine both
combined <- bind_rows(arma_df, ar2_df)

# Plot
ggplot(combined, aes(x = time, y = y)) +
  geom_line() +
  facet_wrap(~series, scales = "free_y") +
  labs(title = "Comparison of ARMA(1,1) and AR(2) Series",
       x = "Time", y = "Value")

Comment: The ARMA(1,1) series behaves like a typical stationary process, while the AR(2) series clearly shows non-stationary behavior, making it unsuitable for standard modeling without transformation.

Exercise 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

(a) Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

# Plot the data
aus_airpassengers |>
  autoplot(Passengers) +
  labs(title = "Australian Air Passengers",
       y = "Millions", x = "Year")

# Fit ARIMA model (automatic selection)
fit <- aus_airpassengers |>
  model(ARIMA(Passengers))

# View model
report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65
# Check residuals
fit |>
  gg_tsresiduals()
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Forecast next 10 periods
fc <- fit |>
  forecast(h = 10)

# Plot forecasts
fc |>
  autoplot(aus_airpassengers) +
  labs(title = "Forecasts for Australian Air Passengers",
       y = "Millions", x = "Year")

(b) Write the model in terms of the backshift operator.

The general ARIMA(0,1,1) with drift model is:

(1−𝐵)𝑦𝑡=𝑐+(1+𝜃1𝐵)𝜀𝑡(1−B)yt=c+(1+θ1B)εt

(c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

# Fit ARIMA(0,1,0) with drift
fit_rw <- aus_airpassengers |>
  model(RW(Passengers ~ drift()))

# Forecast next 10 periods
fc_rw <- fit_rw |>
  forecast(h = 10)

# Plot and compare with original ARIMA model from part (a)
fc_rw |>
  autoplot(aus_airpassengers) +
  labs(title = "ARIMA(0,1,0) with Drift Forecasts",
       y = "Millions", x = "Year")

(d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

# --- ARIMA(2,1,2) with drift ---
fit_212_drift <- aus_airpassengers |>
  model(ARIMA(Passengers ~ pdq(2,1,2)))
## Warning: It looks like you're trying to fully specify your ARIMA model but have not said
## if a constant should be included. You can include a constant using `ARIMA(y~1)`
## to the formula or exclude it by adding `ARIMA(y~0)`.
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
# Forecast
fc_212_drift <- fit_212_drift |>
  forecast(h = 10)

# Plot
fc_212_drift |>
  autoplot(aus_airpassengers) +
  labs(title = "ARIMA(2,1,2) with Drift Forecasts",
       y = "Millions", x = "Year")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

# --- ARIMA(2,1,2) without drift ---
fit_212_nodrift <- aus_airpassengers |>
  model(ARIMA(Passengers ~ pdq(2,1,2) + 0))
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + 0)
## [1] non-stationary AR part from CSS
# Forecast
fc_212_nodrift <- fit_212_nodrift |>
  forecast(h = 10)

# Plot
fc_212_nodrift |>
  autoplot(aus_airpassengers) +
  labs(title = "ARIMA(2,1,2) without Drift Forecasts",
       y = "Millions", x = "Year")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

(e) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

# Fit ARIMA(0,2,1) with constant
fit_021 <- aus_airpassengers |>
  model(ARIMA(Passengers ~ pdq(0,2,1)))

# Forecast next 10 periods
fc_021 <- fit_021 |>
  forecast(h = 10)

# Plot forecasts
fc_021 |>
  autoplot(aus_airpassengers) +
  labs(title = "ARIMA(0,2,1) with Constant Forecasts",
       y = "Millions", x = "Year")

What happens?

Comment: The forecasts show a curved accelerating trend, rather than a straight line. The series increase rapidly over time (or bend depending on the estimated constant). Prediction intervals grow very quickly, becoming wide.

Exercise- 9.8

For the United States GDP series (from global_economy):

(a) if necessary, find a suitable Box-Cox transformation for the data;

# Extract US GDP data
us_gdp <- global_economy |>
  filter(Country == "United States")

# Plot the data
us_gdp |>
  autoplot(GDP) +
  labs(title = "US GDP", y = "GDP", x = "Year")

# Estimate Box-Cox lambda using Guerrero's method
lambda <- us_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

lambda
## [1] 0.2819443

(b) fit a suitable ARIMA model to the transformed data using ARIMA();

# Extract US GDP data
us_gdp <- global_economy |>
  filter(Country == "United States")

# Estimate Box-Cox lambda
lambda <- us_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

# Fit ARIMA model to transformed data
fit <- us_gdp |>
  model(ARIMA(box_cox(GDP, lambda)))

# View selected model
report(fit)
## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

(c) try some other plausible models by experimenting with the orders chosen;

# Extract US GDP data
us_gdp <- global_economy |>
  filter(Country == "United States")

# Box-Cox transformation (log)
lambda <- us_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

# Fit alternative models
fits <- us_gdp |>
  model(
    ARIMA011 = ARIMA(box_cox(GDP, lambda) ~ pdq(0,1,1)),
    ARIMA111 = ARIMA(box_cox(GDP, lambda) ~ pdq(1,1,1)),
    ARIMA210 = ARIMA(box_cox(GDP, lambda) ~ pdq(2,1,0)),
    ARIMA212 = ARIMA(box_cox(GDP, lambda) ~ pdq(2,1,2))
  )

# Compare models
glance(fits)
## # A tibble: 4 × 9
##   Country       .model   sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <fct>         <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 United States ARIMA011  5689.   -326.  659.  659.  665. <cpl [0]> <cpl [1]>
## 2 United States ARIMA111  5580.   -325.  659.  659.  667. <cpl [1]> <cpl [1]>
## 3 United States ARIMA210  5580.   -325.  659.  659.  667. <cpl [2]> <cpl [0]>
## 4 United States ARIMA212  5734.   -325.  662.  664.  674. <cpl [2]> <cpl [2]>
fits |> select(ARIMA011) |> gg_tsresiduals()

fits |> select(ARIMA111) |> gg_tsresiduals()

fits |> select(ARIMA210) |> gg_tsresiduals()

fits |> select(ARIMA212) |> gg_tsresiduals()

(d) choose what you think is the best model and check the residual diagnostics;

Comment: ARIMA(0,1,1) is the best model.

(e) produce forecasts of your fitted model. Do the forecasts look reasonable?

best_fit <- us_gdp |>
  model(ARIMA(GDP ~ pdq(0,1,1)))

fc <- best_fit |>
  forecast(h = 10)

autoplot(fc) +
  labs(title = "US GDP Forecasts (ARIMA(0,1,1))",
       y = "GDP (millions)", x = "Year")

(f) compare the results with what you would obtain using ETS() (with no transformation).

# Filter US GDP
us_gdp_filtered <- global_economy |>
  filter(Country == "United States")

# --- ETS model (no transformation) ---
fit_ets <- us_gdp_filtered |>
  model(ETS(GDP ~ error("A") + trend("A") + season("N")))

# Forecast 10 periods ahead
fc_ets <- fit_ets |> forecast(h = 10)

# --- ARIMA model (already fitted) ---
fc_arima <- best_fit |> forecast(h = 10)

# Plot both forecasts together
autoplot(us_gdp_filtered) +
  autolayer(fc_arima, series = "ARIMA(0,1,1)") +
  autolayer(fc_ets, series = "ETS(A,A,N)") +
  labs(title = "US GDP Forecasts: ARIMA vs ETS",
       y = "GDP (millions)", x = "Year") +
  scale_color_manual(values = c("ARIMA(0,1,1)" = "blue", "ETS(A,A,N)" = "red")) +
  guides(color = guide_legend(title = "Model"))
## Plot variable not specified, automatically selected `.vars = GDP`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
## Warning: No shared levels found between `names(values)` of the manual scale and the
## data's colour values.