Exercise: 9.1
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers
and 1,000 random numbers.
(a) Explain the differences among these figures. Do they all
indicate that the data are white noise?
Answer: Sincr the amount of random numbers increases, the closer the
blue-bounded lines. Yes, all 3 figures show that the data are white
noise, but it also shows that larger sample sizes better approximate
that autocorrelations = 0.
(b) Why are the critical values at different distances from the mean
of zero? Why are the autocorrelations different in each figure when they
each refer to white noise?
Answer: The critical values depend on the sample size. The bounds
are given by ±1.96𝑛±n 1.96 , so as 𝑛n increases, the bounds become
narrower. The autocorrelations differ in each figure because smaller
samples have greater variability, causing larger random fluctuations
even though the data are still white noise.
Exercise 9.2
A classic example of a non-stationary series are stock prices. Plot
the daily closing prices for Amazon stock (contained in gafa_stock),
along with the ACF and PACF. Explain how each plot shows that the series
is non-stationary and should be differenced.
# Load required packages
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.5.0
## ✔ ggplot2 3.5.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
# Extract Amazon stock data
amazon <- gafa_stock |>
filter(Symbol == "AMZN")
# Plot daily closing prices
amazon |>
autoplot(Close) +
labs(title = "Amazon Daily Closing Prices",
y = "Price", x = "Date")

ACF Plot:
# ACF plot
amazon |>
ACF(Close) |>
autoplot() +
labs(title = "ACF of Amazon Closing Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

PACF Plot:
# PACF plot
amazon |>
PACF(Close) |>
autoplot() +
labs(title = "PACF of Amazon Closing Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

Exercise: 9.3
For the following series, find an appropriate Box-Cox transformation
and order of differencing in order to obtain stationary data.
(a) Turkish GDP from global_economy.
turkey_gdp <- global_economy |>
filter(Country == "Turkey")
# Plot original series
turkey_gdp |>
autoplot(GDP) +
labs(title = "Turkish GDP")

# Find optimal Box-Cox transformation
lambda <- turkey_gdp |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
lambda
## [1] 0.1572187
# Apply Box-Cox transformation
turkey_gdp <- turkey_gdp |>
mutate(GDP_trans = box_cox(GDP, lambda))
# Plot transformed series
turkey_gdp |>
autoplot(GDP_trans) +
labs(title = "Transformed Turkish GDP")

# First difference
turkey_gdp <- turkey_gdp |>
mutate(GDP_diff = difference(GDP_trans))
# Plot differenced series
turkey_gdp |>
autoplot(GDP_diff) +
labs(title = "Differenced Transformed Turkish GDP")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

# Check ACF
turkey_gdp |>
ACF(GDP_diff) |>
autoplot()

Comment: The estimated Box–Cox parameter will be close to 0, so a
log transformation is appropriate to stabilize the variance. After
transformation, the series still shows a trend, indicating
non-stationarity. Applying first-order differencing (d = 1) removes the
trend and produces a series that fluctuates around a constant mean.
(b) Accommodation takings in the state of Tasmania from
aus_accommodation.
# Extract Tasmania accommodation data
tas_accom <- aus_accommodation |>
filter(State == "Tasmania")
# Plot original series
tas_accom |>
autoplot(Takings) +
labs(title = "Accommodation Takings - Tasmania")

# Find Box-Cox lambda
lambda <- tas_accom |>
features(Takings, features = guerrero) |>
pull(lambda_guerrero)
lambda
## [1] 0.001819643
# Apply Box-Cox transformation
tas_accom <- tas_accom |>
mutate(Takings_trans = box_cox(Takings, lambda))
# Plot transformed data
tas_accom |>
autoplot(Takings_trans) +
labs(title = "Transformed Takings")

# Apply first and seasonal differencing
tas_accom <- tas_accom |>
mutate(diff1 = difference(Takings_trans),
diff_seasonal = difference(diff1, lag = 12))
# Plot differenced data
tas_accom |>
autoplot(diff_seasonal) +
labs(title = "Differenced (Trend + Seasonality Removed)")
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Check ACF
tas_accom |>
ACF(diff_seasonal) |>
autoplot()

(c) Monthly sales from souvenirs.
# Plot original monthly sales
souvenirs |>
autoplot(Sales) +
labs(title = "Monthly Souvenir Sales")

# Find Box-Cox lambda
lambda <- souvenirs |>
features(Sales, features = guerrero) |>
pull(lambda_guerrero)
lambda
## [1] 0.002118221
# Apply Box-Cox transformation
souvenirs <- souvenirs |>
mutate(Sales_trans = box_cox(Sales, lambda))
# Plot transformed series
souvenirs |>
autoplot(Sales_trans) +
labs(title = "Transformed Souvenir Sales")

# Apply first and seasonal differencing
souvenirs <- souvenirs |>
mutate(diff1 = difference(Sales_trans),
diff_seasonal = difference(diff1, lag = 12))
# Plot differenced series
souvenirs |>
autoplot(diff_seasonal) +
labs(title = "Differenced (Trend + Seasonality Removed)")
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Check ACF
souvenirs |>
ACF(diff_seasonal) |>
autoplot()

Exercise: 9.6
Simulate and plot some data from simple ARIMA models.
(a) Use the following R code to generate data from an AR(1) model
with ϕ1=0.6and σ2=1. The process starts with y1=0.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
(a) Produce a time plot for the series. How does the plot change as
you change ϕ1?
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ purrr 1.0.2 ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
set.seed(123)
phi_values <- c(0, 0.5, 0.9, -0.5)
n <- 200
ar1_series <- lapply(phi_values, function(phi) {
tibble(
time = 1:n,
y = as.numeric(arima.sim(n = n, list(ar = phi))),
phi1 = phi
)
}) |> bind_rows()
## Warning in min(Mod(polyroot(c(1, -model$ar)))): no non-missing arguments to
## min; returning Inf
# Plot
ggplot(ar1_series, aes(x = time, y = y, color = factor(phi1))) +
geom_line() +
facet_wrap(~phi1, scales = "free_y") +
labs(title = "AR(1) Series for Different phi1 Values",
color = expression(phi[1]))

(c) Write your own code to generate data from an MA(1) model with
θ1=0.6 and σ2=1.
set.seed(123)
n <- 200
# Generate MA(1) series
ma1_series <- arima.sim(
n = n,
list(ma = 0.6), # theta1 = 0.6
sd = 1 # since variance = 1 → sd = sqrt(1) = 1
)
# Convert to data frame for plotting
ma1_df <- data.frame(
time = 1:n,
y = as.numeric(ma1_series)
)
# Plot the series
library(ggplot2)
ggplot(ma1_df, aes(x = time, y = y)) +
geom_line() +
labs(title = "Simulated MA(1) Series (theta1 = 0.6, sigma^2 = 1)",
x = "Time", y = "Value")

(d) Produce a time plot for the series. How does the plot change as
you change θ1?
library(tidyverse)
set.seed(123)
theta_values <- c(-0.8, -0.4, 0, 0.4, 0.8)
n <- 200
# Create a proper data frame directly
ma1_series <- map_dfr(theta_values, function(theta) {
tibble(
time = 1:n,
y = as.numeric(arima.sim(n = n, list(ma = theta), sd = 1)),
theta1 = theta
)
})
# Check structure (optional but useful)
str(ma1_series)
## tibble [1,000 × 3] (S3: tbl_df/tbl/data.frame)
## $ time : int [1:1000] 1 2 3 4 5 6 7 8 9 10 ...
## $ y : num [1:1000] 0.2182 1.7429 -1.1765 0.0729 1.6116 ...
## $ theta1: num [1:1000] -0.8 -0.8 -0.8 -0.8 -0.8 -0.8 -0.8 -0.8 -0.8 -0.8 ...
# Plot
ggplot(ma1_series, aes(x = time, y = y)) +
geom_line() +
facet_wrap(~theta1, scales = "free_y") +
labs(title = "MA(1) Series for Different Theta1 Values",
x = "Time", y = "Value")

(e) Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and
σ2=1.
set.seed(123)
n <- 200
# Generate ARMA(1,1) series
arma11_series <- arima.sim(
n = n,
model = list(ar = 0.6, ma = 0.6), # phi1 and theta1
sd = 1 # since variance = 1 → sd = 1
)
# Convert to data frame for plotting
arma_df <- data.frame(
time = 1:n,
y = as.numeric(arma11_series)
)
# Plot
library(ggplot2)
ggplot(arma_df, aes(x = time, y = y)) +
geom_line() +
labs(title = "Simulated ARMA(1,1) Series (phi1 = 0.6, theta1 = 0.6)",
x = "Time", y = "Value")

(f) Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1.
(Note that these parameters will give a non-stationary series.)
set.seed(123)
n <- 200
y <- numeric(n)
# Generate white noise
e <- rnorm(n, mean = 0, sd = 1)
# Initialize first two values
y[1] <- e[1]
y[2] <- e[2]
# Generate AR(2) manually
for (t in 3:n) {
y[t] <- -0.8 * y[t-1] + 0.3 * y[t-2] + e[t]
}
# Convert to data frame
ar2_df <- data.frame(time = 1:n, y = y)
# Plot
library(ggplot2)
ggplot(ar2_df, aes(x = time, y = y)) +
geom_line() +
labs(title = "Non-stationary AR(2) Series (phi1 = -0.8, phi2 = 0.3)",
x = "Time", y = "Value")

(g) Graph the latter two series and compare them.
set.seed(123)
n <- 200
# --- ARMA(1,1): phi1 = 0.6, theta1 = 0.6 ---
arma <- arima.sim(
n = n,
model = list(ar = 0.6, ma = 0.6),
sd = 1
)
arma_df <- tibble(
time = 1:n,
y = as.numeric(arma),
series = "ARMA(1,1)"
)
# --- AR(2): phi1 = -0.8, phi2 = 0.3 (manual simulation) ---
y <- numeric(n)
e <- rnorm(n, 0, 1)
y[1] <- e[1]
y[2] <- e[2]
for (t in 3:n) {
y[t] <- -0.8 * y[t-1] + 0.3 * y[t-2] + e[t]
}
ar2_df <- tibble(
time = 1:n,
y = y,
series = "AR(2) (non-stationary)"
)
# Combine both
combined <- bind_rows(arma_df, ar2_df)
# Plot
ggplot(combined, aes(x = time, y = y)) +
geom_line() +
facet_wrap(~series, scales = "free_y") +
labs(title = "Comparison of ARMA(1,1) and AR(2) Series",
x = "Time", y = "Value")

Comment: The ARMA(1,1) series behaves like a typical stationary
process, while the AR(2) series clearly shows non-stationary behavior,
making it unsuitable for standard modeling without transformation.
Exercise 9.7
Consider aus_airpassengers, the total number of passengers (in
millions) from Australian air carriers for the period 1970-2011.
(a) Use ARIMA() to find an appropriate ARIMA model. What model was
selected. Check that the residuals look like white noise. Plot forecasts
for the next 10 periods.
# Plot the data
aus_airpassengers |>
autoplot(Passengers) +
labs(title = "Australian Air Passengers",
y = "Millions", x = "Year")

# Fit ARIMA model (automatic selection)
fit <- aus_airpassengers |>
model(ARIMA(Passengers))
# View model
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
# Check residuals
fit |>
gg_tsresiduals()
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Forecast next 10 periods
fc <- fit |>
forecast(h = 10)
# Plot forecasts
fc |>
autoplot(aus_airpassengers) +
labs(title = "Forecasts for Australian Air Passengers",
y = "Millions", x = "Year")

(b) Write the model in terms of the backshift operator.
The general ARIMA(0,1,1) with drift model is:
(1−𝐵)𝑦𝑡=𝑐+(1+𝜃1𝐵)𝜀𝑡(1−B)yt=c+(1+θ1B)εt
(c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare
these to part a.
# Fit ARIMA(0,1,0) with drift
fit_rw <- aus_airpassengers |>
model(RW(Passengers ~ drift()))
# Forecast next 10 periods
fc_rw <- fit_rw |>
forecast(h = 10)
# Plot and compare with original ARIMA model from part (a)
fc_rw |>
autoplot(aus_airpassengers) +
labs(title = "ARIMA(0,1,0) with Drift Forecasts",
y = "Millions", x = "Year")

(d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare
these to parts a and c. Remove the constant and see what happens.
# --- ARIMA(2,1,2) with drift ---
fit_212_drift <- aus_airpassengers |>
model(ARIMA(Passengers ~ pdq(2,1,2)))
## Warning: It looks like you're trying to fully specify your ARIMA model but have not said
## if a constant should be included. You can include a constant using `ARIMA(y~1)`
## to the formula or exclude it by adding `ARIMA(y~0)`.
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
# Forecast
fc_212_drift <- fit_212_drift |>
forecast(h = 10)
# Plot
fc_212_drift |>
autoplot(aus_airpassengers) +
labs(title = "ARIMA(2,1,2) with Drift Forecasts",
y = "Millions", x = "Year")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

# --- ARIMA(2,1,2) without drift ---
fit_212_nodrift <- aus_airpassengers |>
model(ARIMA(Passengers ~ pdq(2,1,2) + 0))
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + 0)
## [1] non-stationary AR part from CSS
# Forecast
fc_212_nodrift <- fit_212_nodrift |>
forecast(h = 10)
# Plot
fc_212_nodrift |>
autoplot(aus_airpassengers) +
labs(title = "ARIMA(2,1,2) without Drift Forecasts",
y = "Millions", x = "Year")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

(e) Plot forecasts from an ARIMA(0,2,1) model with a constant. What
happens?
# Fit ARIMA(0,2,1) with constant
fit_021 <- aus_airpassengers |>
model(ARIMA(Passengers ~ pdq(0,2,1)))
# Forecast next 10 periods
fc_021 <- fit_021 |>
forecast(h = 10)
# Plot forecasts
fc_021 |>
autoplot(aus_airpassengers) +
labs(title = "ARIMA(0,2,1) with Constant Forecasts",
y = "Millions", x = "Year")

What happens?
Exercise- 9.8
For the United States GDP series (from global_economy):
(c) try some other plausible models by experimenting with the orders
chosen;
# Extract US GDP data
us_gdp <- global_economy |>
filter(Country == "United States")
# Box-Cox transformation (log)
lambda <- us_gdp |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
# Fit alternative models
fits <- us_gdp |>
model(
ARIMA011 = ARIMA(box_cox(GDP, lambda) ~ pdq(0,1,1)),
ARIMA111 = ARIMA(box_cox(GDP, lambda) ~ pdq(1,1,1)),
ARIMA210 = ARIMA(box_cox(GDP, lambda) ~ pdq(2,1,0)),
ARIMA212 = ARIMA(box_cox(GDP, lambda) ~ pdq(2,1,2))
)
# Compare models
glance(fits)
## # A tibble: 4 × 9
## Country .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 United States ARIMA011 5689. -326. 659. 659. 665. <cpl [0]> <cpl [1]>
## 2 United States ARIMA111 5580. -325. 659. 659. 667. <cpl [1]> <cpl [1]>
## 3 United States ARIMA210 5580. -325. 659. 659. 667. <cpl [2]> <cpl [0]>
## 4 United States ARIMA212 5734. -325. 662. 664. 674. <cpl [2]> <cpl [2]>
fits |> select(ARIMA011) |> gg_tsresiduals()

fits |> select(ARIMA111) |> gg_tsresiduals()

fits |> select(ARIMA210) |> gg_tsresiduals()

fits |> select(ARIMA212) |> gg_tsresiduals()

(d) choose what you think is the best model and check the residual
diagnostics;
(e) produce forecasts of your fitted model. Do the forecasts look
reasonable?
best_fit <- us_gdp |>
model(ARIMA(GDP ~ pdq(0,1,1)))
fc <- best_fit |>
forecast(h = 10)
autoplot(fc) +
labs(title = "US GDP Forecasts (ARIMA(0,1,1))",
y = "GDP (millions)", x = "Year")

(f) compare the results with what you would obtain using ETS() (with
no transformation).
# Filter US GDP
us_gdp_filtered <- global_economy |>
filter(Country == "United States")
# --- ETS model (no transformation) ---
fit_ets <- us_gdp_filtered |>
model(ETS(GDP ~ error("A") + trend("A") + season("N")))
# Forecast 10 periods ahead
fc_ets <- fit_ets |> forecast(h = 10)
# --- ARIMA model (already fitted) ---
fc_arima <- best_fit |> forecast(h = 10)
# Plot both forecasts together
autoplot(us_gdp_filtered) +
autolayer(fc_arima, series = "ARIMA(0,1,1)") +
autolayer(fc_ets, series = "ETS(A,A,N)") +
labs(title = "US GDP Forecasts: ARIMA vs ETS",
y = "GDP (millions)", x = "Year") +
scale_color_manual(values = c("ARIMA(0,1,1)" = "blue", "ETS(A,A,N)" = "red")) +
guides(color = guide_legend(title = "Model"))
## Plot variable not specified, automatically selected `.vars = GDP`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
## Warning: No shared levels found between `names(values)` of the manual scale and the
## data's colour values.

Comment: The time plot of the closing prices shows a clear upward trend over time, indicating that the mean is not constant but increasing, which violates the assumption of stationarity and provides strong evidence that the series is non-stationary. This is further supported by the ACF plot, where the autocorrelations start very close to 1 and decline only slowly across many lags; such slow decay is a classic sign of a non-stationary, trending series, whereas a stationary series would show correlations dropping quickly to zero. The PACF plot also reinforces this conclusion, as it displays a large spike at lag 1 followed by smaller yet still significant spikes, without a clear cutoff, suggesting persistence driven by an underlying trend rather than a stable autoregressive structure.