Figure 9.32 shows sample ACFs from white noise, but
the sample size changes (36, 360, 1000).
With only 36 observations, the ACF estimates bounce around more, so a
few spikes can look “large” just by chance.
As the sample size increases (360, then 1000), the ACF estimates get
closer to 0 at all lags and look much calmer.
Yes — all three are consistent with white noise: most autocorrelations are near 0 and any spikes are random sampling variation.
The dashed lines are approximate 95% limits, about ± 1.96 /
√n.
So with small n (36) the limits are wider; with
larger n (360, 1000) they shrink toward 0.
The ACF bars differ because they are sample estimates. Even when the true autocorrelations are 0 (white noise), the estimates vary randomly, and that variability decreases as n increases.
gafa_stockamazon <- gafa_stock |>
filter(Symbol == "AMZN") |>
select(Date, Close)
autoplot(amazon, Close) +
labs(title = "AMZN daily closing price", y = "Close")
amazon |> ACF(Close) |> autoplot() + labs(title = "ACF of AMZN Close")
amazon |> PACF(Close) |> autoplot() + labs(title = "PACF of AMZN Close")
Why non-stationary? The time plot shows a changing
level (trend) and long swings.
The ACF decays very slowly (stays high for many lags), which is typical
of a non-stationary series.
The PACF has a large spike at lag 1 and then stays important for a
while, also suggesting differencing is needed.
We’ll use a Box–Cox transformation suggestion
(BoxCox.lambda) and differencing orders using ADF-based
ndiffs() (to avoid the KPSS/urca
dependency).
get_lambda <- function(x){
x <- as.numeric(x)
x <- x[is.finite(x)]
if (any(x <= 0)) return(NA_real_)
forecast::BoxCox.lambda(x, method = "guerrero")
}
get_d <- function(x){
x <- as.numeric(x)
x <- x[is.finite(x)]
forecast::ndiffs(x, test = "adf")
}
get_D <- function(x, m){
x <- as.numeric(x)
x <- x[is.finite(x)]
forecast::nsdiffs(x, m = m)
}
global_economyturkey_gdp <- global_economy |>
filter(Country == "Turkey") |>
select(Year, GDP)
lambda_turkey <- get_lambda(turkey_gdp$GDP)
d_turkey <- get_d(turkey_gdp$GDP)
lambda_turkey
## [1] 0.1571804
d_turkey
## [1] 1
Answer (Turkey GDP): A log-like Box–Cox (λ near 0) is usually appropriate for GDP, and 1 non-seasonal difference is typically enough.
aus_accommodationSome environments store the index column as
Date(notMonth).
The code below renames the first column toDateso it will knit no matter what it’s called.
tas_raw <- aus_accommodation |>
filter(State == "Tasmania") |>
as_tibble()
idx_name <- names(tas_raw)[1]
tas <- tas_raw |>
rename(Date = all_of(idx_name)) |>
select(Date, Takings)
lambda_tas <- get_lambda(tas$Takings)
d_tas <- get_d(tas$Takings)
D_tas <- get_D(tas$Takings, m = 12)
lambda_tas
## [1] -0.005076712
d_tas
## [1] 0
D_tas
## [1] 1
Answer (Tasmania takings): A Box–Cox transform is often helpful, and you typically need one seasonal difference (D = 1) plus possibly one regular difference (d = 1).
For the selected model ARIMA(0,2,1) (no drift/constant), we have \(\phi(B)=1\) and \(d=2\), with one MA term. So the backshift-operator form is
\[ (1-B)^2 y_t = (1 + \theta_1 B)\varepsilon_t, \]
and using the estimated MA(1) coefficient from the fitted model (\(\theta_1 \approx -0.8963\)),
\[ (1-B)^2 y_t = (1 - 0.8963\,B)\varepsilon_t. \]
souvenirslambda_souv <- get_lambda(souvenirs$Sales)
d_souv <- get_d(souvenirs$Sales)
D_souv <- get_D(souvenirs$Sales, m = 12)
lambda_souv
## [1] -0.2444328
d_souv
## [1] 1
D_souv
## [1] 1
Answer (Souvenirs): Strong seasonality → D = 1; trend → often d = 1. A log-like Box–Cox (λ near 0) is commonly reasonable.
If you don’t remember your earlier retail series, here is a standard
choice using the aus_retail dataset.
retail_series <- aus_retail |>
filter(State == "New South Wales", Industry == "Department stores") |>
select(Month, Turnover)
autoplot(retail_series, Turnover) +
labs(title = "Retail turnover (NSW, Department stores)")
lambda_retail <- get_lambda(retail_series$Turnover)
d_retail <- get_d(retail_series$Turnover)
D_retail <- get_D(retail_series$Turnover, m = 12)
lambda_retail
## [1] -0.9999242
d_retail
## [1] 0
D_retail
## [1] 1
Answer: Typically D = 1 (seasonal) and d = 1 (trend). A Box–Cox transform is often helpful for variance stabilization.
set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100){
y[i] <- 0.6*y[i-1] + e[i]
}
sim_ar1 <- tsibble(idx = 1:100, y = y, index = idx)
autoplot(sim_ar1, y) + labs(title = "Simulated AR(1): phi=0.6", y = "y")
What happens as phi changes? Larger φ makes the series smoother and more persistent; smaller φ makes it look noisier.
set.seed(123)
e <- rnorm(101)
y <- numeric(100)
for(i in 1:100){
y[i] <- e[i+1] + 0.6*e[i]
}
sim_ma1 <- tsibble(idx = 1:100, y = y, index = idx)
autoplot(sim_ma1, y) + labs(title = "Simulated MA(1): theta=0.6", y = "y")
What happens as theta changes? Larger |θ| increases short-run dependence; the series can look “chunkier” but without long persistence.
set.seed(123)
sim_arma11 <- arima.sim(list(ar = 0.6, ma = 0.6), n = 200)
autoplot(tsibble(idx = 1:200, y = as.numeric(sim_arma11), index = idx), y) +
labs(title = "Simulated ARMA(1,1): phi=0.6, theta=0.6")
set.seed(123)
# NOTE: The book points out these AR(2) parameters produce a non-stationary series.
# arima.sim() checks stationarity and will error, so we simulate it "by hand".
n <- 200
e <- rnorm(n, mean = 0, sd = 1)
sim_ar2 <- numeric(n)
sim_ar2[1] <- 0
sim_ar2[2] <- 0
for (t in 3:n) {
sim_ar2[t] <- (-0.8) * sim_ar2[t - 1] + (0.3) * sim_ar2[t - 2] + e[t]
}
p1 <- autoplot(tsibble(idx = 1:n, y = sim_ar2, index = idx), y) +
labs(title = "Simulated AR(2): phi1=-0.8, phi2=0.3 (non-stationary example)")
p2 <- autoplot(tsibble(idx = 1:n, y = as.numeric(sim_arma11), index = idx), y) +
labs(title = "Simulated ARMA(1,1): phi=0.6, theta=0.6")
p1
p2
Comparison: The AR(2) with a negative φ1 often shows more oscillation (alternating behavior) than the ARMA(1,1).
aus_airpassengers ARIMA comparisonsair <- aus_airpassengers
fit_auto <- air |> model(ARIMA(Passengers))
report(fit_auto)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
fit_auto |> gg_tsresiduals()
fc_auto <- fit_auto |> forecast(h = 10)
autoplot(air, Passengers) +
autolayer(fc_auto, level = 95) +
labs(title = "Auto ARIMA forecasts (10 periods)")
Use the ARIMA orders reported above. For example, ARIMA(p,d,q)
corresponds to
\[ \phi(B)(1-B)^d y_t = c + \theta(B)\varepsilon_t. \]
fit_rw_drift <- air |> model(ARIMA(Passengers ~ pdq(0,1,0) + drift()))
fc_rw_drift <- fit_rw_drift |> forecast(h = 10)
autoplot(air, Passengers) +
autolayer(fc_auto, series = "auto", alpha = 0.7) +
autolayer(fc_rw_drift, series = "RW drift", alpha = 0.7) +
labs(title = "Forecast comparison: auto ARIMA vs RW with drift")
fit_212_drift <- air |> model(ARIMA(Passengers ~ pdq(2,1,2) + drift()))
fit_212_nocon <- air |> model(ARIMA(Passengers ~ pdq(2,1,2)))
fit_021_const <- air |> model(ARIMA(Passengers ~ pdq(0,2,1) + constant()))
fc_212_drift <- fit_212_drift |> forecast(h = 10)
fc_212_nocon <- fit_212_nocon |> forecast(h = 10)
fc_021_const <- fit_021_const |> forecast(h = 10)
autoplot(air, Passengers) +
autolayer(fc_212_drift, series = "ARIMA(2,1,2)+drift", alpha = 0.7) +
autolayer(fc_212_nocon, series = "ARIMA(2,1,2) no const", alpha = 0.7) +
autolayer(fc_021_const, series = "ARIMA(0,2,1)+const", alpha = 0.7) +
labs(title = "More ARIMA forecast comparisons")
us <- global_economy |>
filter(Country == "United States") |>
select(Year, GDP)
lambda_us <- get_lambda(us$GDP)
lambda_us
## [1] 0.2819714
# Transform (if lambda is NA, we skip)
us <- us |> mutate(GDP_bc = if (is.na(lambda_us)) GDP else forecast::BoxCox(GDP, lambda_us))
fit_us_auto <- us |> model(ARIMA(GDP_bc))
report(fit_us_auto)
## Series: GDP_bc
## Model: ARIMA(1,1,0) w/ drift
##
## Coefficients:
## ar1 constant
## 0.4586 118.2764
## s.e. 0.1198 9.5123
##
## sigma^2 estimated as 5488: log likelihood=-325.37
## AIC=656.74 AICc=657.19 BIC=662.87
fit_us_auto |> gg_tsresiduals()
fc_us_auto <- fit_us_auto |> forecast(h = 10)
autoplot(us, GDP_bc) +
autolayer(fc_us_auto, level = 95) +
labs(title = "US GDP forecasts (ARIMA on Box–Cox scale)", y = "Box–Cox(GDP)")
# ETS on original scale
fit_us_ets <- us |> model(ETS(GDP))
fc_us_ets <- fit_us_ets |> forecast(h = 10)
autoplot(us, GDP) +
autolayer(fc_us_ets, level = 95) +
labs(title = "US GDP forecasts (ETS on original scale)", y = "GDP")
Discussion: If the Box–Cox ARIMA forecasts look smoother and the residuals look more like white noise, it’s usually preferred. ETS is a helpful baseline; often it gives similar medium-term forecasts but may differ in uncertainty and trend handling.