Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
The three ACF plots differ because of their sample sizes. In the left plot with 36 observations, the autocorrelations appear more irregular and show larger spikes due to greater variability from the small sample. In the middle plot with 360 observations, the ACF becomes more stable, with smaller spikes that stay closer to zero. In the right plot with 1,000 observations, the autocorrelations are very close to zero with minimal fluctuation, reflecting a much more accurate estimate. Despite these differences, all three plots indicate white noise because there is no clear pattern and most values lie within the confidence bounds.
Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values are different because they depend on sample size. Smaller samples have wider bounds since there is more uncertainty, while larger samples have narrower bounds because the estimates are more precise. The autocorrelations differ because of random variation. Even though all three are white noise, smaller samples show more fluctuation, while larger samples stay closer to zero.
A classic example of a non-stationary series are stock prices.
Plot the daily closing prices for Amazon stock (contained in
gafa_stock), along with the ACF and PACF. Explain how each
plot shows that the series is non-stationary and should be
differenced.
The daily closing prices for Amazon display a clear upward trend, indicating that the mean changes over time and the series is non-stationary. In the ACF plot, the correlations stay high and decrease slowly across many lags instead of dropping off quickly, showing strong dependence over time. The PACF plot also has noticeable spikes at the first few lags, suggesting the presence of structure in the data. These features indicate that the series should be differenced to remove the trend and make it more stationary.
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.2 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.5.0
## ✔ ggplot2 4.0.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
# Filter Amazon stock data
amazon <- gafa_stock %>%
filter(Symbol == "AMZN")
# Plot daily closing prices
amazon %>%
autoplot(Close) +
labs(title = "Amazon Daily Closing Prices")
# ACF plot
amazon %>%
ACF(Close) %>%
autoplot() +
labs(title = "ACF of Amazon Stock Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
# PACF plot
amazon %>%
PACF(Close) %>%
autoplot() +
labs(title = "PACF of Amazon Stock Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Turkish GDP from global_economy.
After log transformation and first differencing, the series fluctuates around zero with no clear trend, suggesting the data is now stationary. The variation looks fairly stable over time, so one difference is sufficient.
# --- Turkish GDP ---
turkey_gdp <- global_economy %>%
filter(Country == "Turkey")
# Find Box-Cox lambda
lambda_gdp <- turkey_gdp %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
lambda_gdp
## [1] 0.1572187
# Apply transformation and differencing
turkey_gdp %>%
mutate(GDP_trans = box_cox(GDP, lambda_gdp),
GDP_diff = difference(GDP_trans)) %>%
autoplot(GDP_diff) +
labs(title = "Differenced Turkish GDP")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
Accommodation takings in the state of Tasmania from
aus_accommodation.
After log transformation and both seasonal and first differencing, the series centers around zero with no visible seasonal pattern. The fluctuations are consistent, indicating the trend and seasonality have been removed.
# --- Tasmania Accommodation ---
tas_accom <- aus_accommodation %>%
filter(State == "Tasmania")
# Find Box-Cox lambda
lambda_accom <- tas_accom %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
lambda_accom
## [1] 0.001819643
# Apply transformation and differencing
tas_accom %>%
mutate(Takings_trans = box_cox(Takings, lambda_accom),
Takings_diff = difference(Takings_trans, lag = 12),
Takings_diff2 = difference(Takings_diff)) %>%
autoplot(Takings_diff2) +
labs(title = "Differenced Tasmania Accommodation Takings")
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
Monthly sales from souvenirs.
After log transformation and both seasonal and first differencing, the series shows no clear trend or repeating seasonal pattern. The values move randomly around zero with stable variation, suggesting the series is now stationary.
# --- Souvenir Sales ---
souvenir <- souvenirs
# Find Box-Cox lambda
lambda_souvenir <- souvenir %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
lambda_souvenir
## [1] 0.002118221
# Apply transformation and differencing
souvenir %>%
mutate(Sales_trans = box_cox(Sales, lambda_souvenir),
Sales_diff = difference(Sales_trans, lag = 12),
Sales_diff2 = difference(Sales_diff)) %>%
autoplot(Sales_diff2) +
labs(title = "Differenced Souvenir Sales")
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
For the souvenirs data, write down the differences
you chose above using backshift operator notation.
For the souvenirs data, both a regular difference and a seasonal difference are used because the data shows a trend and yearly seasonality. In backshift notation, this is written as (1 - B)(1 - B^12) y_t, where (1 - B) removes the trend and (1 - B^12) removes the seasonal pattern that repeats every 12 periods.
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
For the retail data, you first apply a transformation like a log to stabilize the variance. Then, you use a first difference to remove the trend and a seasonal difference if there is a repeating pattern, such as monthly seasonality. This means using one regular difference and one seasonal difference to make the data stationary.
Simulate and plot some data from simple ARIMA models.
library(tsibble)
library(ggplot2)
set.seed(123) # for reproducibility
# simulate AR(1)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100) {
y[i] <- 0.6 * y[i-1] + e[i]
}
# create tsibble
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
# plot
autoplot(sim, y) +
labs(title = "Simulated AR(1) Process (phi = 0.6)",
y = "y",
x = "Time")
```
y <- numeric(100) e <- rnorm(100) for(i in 2:100) y[i] <- 0.6*y[i-1] + e[i] sim <- tsibble(idx = seq_len(100), y = y, index = idx)
```
set.seed(123)
phi_vals <- c(-0.6, 0.2, 0.6, 0.9)
sim_data <- data.frame()
for (phi in phi_vals) {
y <- numeric(100)
e <- rnorm(100)
for (i in 2:100) {
y[i] <- phi * y[i-1] + e[i]
}
temp <- data.frame(
idx = 1:100,
y = y,
phi = as.factor(phi)
)
sim_data <- rbind(sim_data, temp)
}
sim_ts <- as_tsibble(sim_data, index = idx, key = phi)
autoplot(sim_ts, y) +
labs(title = "AR(1) Processes for Different phi1 Values",
x = "Time",
y = "y")
**As phi1 changes the time series changes behavior. When phi1 is near 0 the series looks like white noise with little connection between values. As phi1 increases the series becomes smoother and more persistent so changes happen more slowly. When phi1 is close to 1 the series shows strong dependence and long trends. When phi1 is negative the values switch direction often creating a zigzag pattern.**
set.seed(123)
# generate noise
e <- rnorm(100)
# initialize series
y <- numeric(100)
# MA(1) process
for(i in 2:100) {
y[i] <- e[i] + 0.6 * e[i-1]
}
# view first few values
head(y)
## [1] 0.0000000 -0.5664629 1.4206018 1.0057334 0.1715928 1.7926376
Each value depends on the current noise plus 0.6 times the previous noise. This creates short term dependence between observations but unlike AR models it does not persist far into the future.
set.seed(123)
theta_vals <- c(-0.6, 0, 0.6, 0.9)
sim_data <- data.frame()
for (theta in theta_vals) {
e <- rnorm(100)
y <- numeric(100)
for (i in 2:100) {
y[i] <- e[i] + theta * e[i-1]
}
temp <- data.frame(
idx = 1:100,
y = y,
theta = as.factor(theta)
)
sim_data <- rbind(sim_data, temp)
}
ggplot(sim_data, aes(x = idx, y = y)) +
geom_line() +
facet_wrap(~theta, scales = "free_y") +
labs(title = "MA(1) Processes for Different theta1 Values",
x = "Time",
y = "y")
set.seed(123)
# generate noise
e <- rnorm(100)
# initialize series
y <- numeric(100)
# ARMA(1,1) process
for (i in 2:100) {
y[i] <- 0.6 * y[i-1] + e[i] + 0.6 * e[i-1]
}
# view first few values
head(y)
## [1] 0.0000000 -0.5664629 1.0807241 1.6541678 1.1640935 2.4910937
Each value depends on the previous value and both the current and previous noise. This combines the persistence of an AR model with the short term shock effects of an MA model.
set.seed(123)
# generate noise
e <- rnorm(100)
# initialize series
y <- numeric(100)
# set initial values
y[1] <- 0
y[2] <- 0
# AR(2) process
for (i in 3:100) {
y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
}
# view first few values
head(y)
## [1] 0.000000 0.000000 1.558708 -1.176458 1.538067 0.131674
**Each value depends on the two previous values along with random noise. The negative phi1 makes the series alternate direction, while phi2 adds more influence from earlier values. These parameters make the series non stationary, so it may not stay around a constant mean and can show unstable behavior over time.**
set.seed(123)
# --- ARMA(1,1) ---
e1 <- rnorm(100)
y1 <- numeric(100)
for (i in 2:100) {
y1[i] <- 0.6 * y1[i-1] + e1[i] + 0.6 * e1[i-1]
}
# --- AR(2) ---
e2 <- rnorm(100)
y2 <- numeric(100)
for (i in 3:100) {
y2[i] <- -0.8 * y2[i-1] + 0.3 * y2[i-2] + e2[i]
}
# combine data
data <- data.frame(
idx = rep(1:100, 2),
y = c(y1, y2),
series = rep(c("ARMA(1,1)", "AR(2)"), each = 100)
)
# plot
ggplot(data, aes(x = idx, y = y)) +
geom_line() +
facet_wrap(~series, scales = "free_y") +
labs(title = "Comparison of ARMA(1,1) and AR(2) Series",
x = "Time",
y = "y")
The ARMA(1,1) series is smoother and more stable since it uses both past values and recent noise, keeping it close to a constant level. The AR(2) series is more uneven and unstable, with bigger swings and less consistency. Overall, the ARMA(1,1) series is steadier, while the AR(2) series is more volatile.
Consider aus_airpassengers, the total number of
passengers (in millions) from Australian air carriers for the period
1970-2011.
ARIMA() to find an appropriate ARIMA model. What
model was selected. Check that the residuals look like white noise. Plot
forecasts for the next 10 periods.# Load data
data <- aus_airpassengers
# Fit ARIMA automatically
fit <- data %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
#check for white noise
gg_tsresiduals(fit)
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
augment(fit) %>%
features(.innov, ljung_box, lag = 24)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 ARIMA(Passengers) 24.0 0.463
#Forecasts For Next 10 Cycles
fc <- fit %>%
forecast(h = 10)
autoplot(data, level = NULL) +
autolayer(fc)
## Plot variable not specified, automatically selected `.vars = Passengers`
## Warning in geom_line(...): Ignoring unknown parameters: `level`
The selected model is ARIMA(0,1,1)(0,1,1)[12]. It applies both
regular and seasonal differencing to remove trend and seasonality. The
residuals behave like white noise, with no clear pattern and no
significant autocorrelations. The forecasts indicate a continued upward
trend with a repeating seasonal pattern and wider prediction intervals
over time.
2. Write the model in terms of the backshift operator.
yt​=yt−1​+yt−12​−yt−13​+εt​+θ1​εt−1​+Θ1​εt−12​+θ1​Θ1​εt−13​
3. Plot forecasts from an ARIMA(0,1,0) model with drift and compare
these to part a.
ata <- aus_airpassengers
# Model from part (a)
fit <- data %>%
model(ARIMA(Passengers))
fc <- fit %>%
forecast(h = 10)
# Random walk with drift
fit_rw <- data %>%
model(ARIMA(Passengers ~ pdq(0,1,0) + 1))
fc_rw <- fit_rw %>%
forecast(h = 10)
# Plot comparison
autoplot(data) +
autolayer(fc, series = "Seasonal ARIMA", level = NULL) +
autolayer(fc_rw, series = "RW with drift", level = NULL)
## Plot variable not specified, automatically selected `.vars = Passengers`
## Warning in geom_line(mapping = without(mapping, "shape"), data = unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters: `series`
## Ignoring unknown parameters: `series`
**The ARIMA(0,1,0) model with drift produces forecasts that follow a
straight upward line, reflecting a constant growth rate over time. The
prediction intervals widen as the forecast horizon increases, showing
increasing uncertainty.
Compared to the ARIMA(0,1,1)(0,1,1)[12] model from part (a), the forecasts here do not capture any seasonal variation and are much smoother. The seasonal ARIMA model provides more detailed forecasts with repeating patterns, making it a better fit for the data.**
4. Plot forecasts from an ARIMA(2,1,2) model with drift and compare
these to parts a and c. Remove the constant and see what
happens.
#with constant
fit_212_drift <- data %>%
model(ARIMA(Passengers ~ pdq(2,1,2) + 1))
fc_212_drift <- fit_212_drift %>%
forecast(h = 10)
#without constant
fit_212_nodrift <- data %>%
model(ARIMA(Passengers ~ pdq(2,1,2) + 0))
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + 0)
## [1] non-stationary AR part from CSS
fc_212_nodrift <- fit_212_nodrift %>%
forecast(h = 10)
#all models plotted together
autoplot(data) +
autolayer(fc, series = "Seasonal ARIMA", level = NULL) +
autolayer(fc_rw, series = "RW with drift", level = NULL) +
autolayer(fc_212_drift, series = "ARIMA(2,1,2) with drift", level = NULL) +
autolayer(fc_212_nodrift, series = "ARIMA(2,1,2) no drift", level = NULL)
## Plot variable not specified, automatically selected `.vars = Passengers`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data = unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters: `series`
## Ignoring unknown parameters: `series`
## Ignoring unknown parameters: `series`
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
The ARIMA(2,1,2) model with drift produces upward trending forecasts that are more flexible than the random walk model but still do not capture seasonality like the ARIMA(0,1,1)(0,1,1)[12] model. When the constant is removed, the forecasts flatten and lose the upward trend, showing that the drift term controls long-term growth.
5. Plot forecasts from an ARIMA(0,2,1) model with a constant. What
happens?
fit_021 <- data %>%
model(ARIMA(Passengers ~ pdq(0,2,1) + 1))
## Warning: Model specification induces a quadratic or higher order polynomial trend. This
## is generally discouraged, consider removing the constant or reducing the number
## of differences.
fc_021 <- fit_021 %>%
forecast(h = 10)
autoplot(data) +
autolayer(fc_021, level = NULL)
## Plot variable not specified, automatically selected `.vars = Passengers`
The ARIMA(0,2,1) model with a constant generates forecasts that
follow a curved, accelerating pattern instead of a straight line. The
prediction intervals expand quickly, reflecting greater uncertainty over
time. This happens because second differencing with a constant produces
a quadratic trend in the forecasts.