All three series are white noise: none of the ACF plots shows a clear pattern, and the individual spikes are just random variation. The only difference is that as the sample size increases, the sampling variation in the autocorrelation estimates gets smaller, making the plots look “cleaner.”
The blue dashed lines show the 95% critical values for the autocorrelations, calculated as \(\pm 1.96/\sqrt{T}\), where \(T\) is the number of observations. As \(T\) increases, \(\sqrt{T}\) gets larger, so the bounds get narrower.
Hence, the critical bounds are wider for 36 observations and narrower for 1,000.
The differences in autocorrelation spikes come from random sampling. Even though all three are white noise (mean 0, no true correlation), each random sample produces slightly different autocorrelation estimates. With more data, these random ups and downs average out, so large samples show ACFs that hug zero more closely.
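As a quick check on the bound formula (a minimal sketch; 36 and 1,000 come from the plots above, while 360 is an assumed intermediate sample size):
# 95% ACF critical bounds shrink like 1/sqrt(T)
T_sizes <- c(36, 360, 1000)   # 360 is just an illustrative middle value
data.frame(T = T_sizes, bound = round(1.96 / sqrt(T_sizes), 3))
# bounds: 0.327, 0.103, 0.062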
# Filter for Amazon
amazon <- gafa_stock %>%
filter(Symbol == "AMZN")
# time series
amazon %>%
autoplot(Close) +
labs(title = "Daily Closing Price for Amazon (AMZN)",
y = "Price (USD)",
x = "Date")
# ACF and PACF
amazon %>%
ACF(Close) %>%
autoplot() +
labs(title = "ACF of Amazon Closing Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
amazon %>%
PACF(Close) %>%
autoplot() +
labs(title = "PACF of Amazon Closing Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
The closing prices trend upward with no stable mean, the ACF decays very slowly while staying far above the critical bounds, and the PACF has a single dominant spike at lag 1. All three features show the series is non-stationary and should be differenced before modelling.
turkey <- global_economy %>%
filter(Country == "Turkey") %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
turkey
## [1] 0.1572187
# original series
global_economy %>%
filter(Country == "Turkey") %>%
gg_tsdisplay(GDP, plot_type = "partial") +
labs(title = "Turkish GDP — Original Series")
# number of differences required
global_economy %>%
filter(Country == "Turkey") %>%
features(box_cox(GDP, turkey), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
global_economy %>%
filter(Country == "Turkey") %>%
gg_tsdisplay(difference(box_cox(GDP, turkey)), plot_type = "partial") +
labs(title = "Turkish GDP — After Box-Cox and First Difference")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
The Turkish GDP data clearly isn't stationary: it trends upward over time. The Guerrero method gave a Box-Cox lambda close to 0 (about 0.16), meaning a log-like transform works best to even out the variance. After that, taking the first difference made the data look stable around a constant mean. The ACF confirms it: after one difference, the autocorrelations drop off quickly, showing the series is now stationary.
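As an extra confirmation (a follow-up check, not in the original output, reusing the objects above), a KPSS test on the transformed and differenced series should fail to reject stationarity if the conclusion above is right:
# KPSS unit-root test on the transformed, differenced series;
# failing to reject (p > 0.05) supports the stationarity conclusion
global_economy %>%
  filter(Country == "Turkey") %>%
  features(difference(box_cox(GDP, turkey)), unitroot_kpss)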
tasmania <- aus_accommodation %>%
filter(State == "Tasmania") %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
tasmania
## [1] 0.001819643
# original series
aus_accommodation %>%
filter(State == "Tasmania") %>%
gg_tsdisplay(Takings, plot_type = "partial") +
labs(title = "Tasmanian Accommodation Takings — Original Series")
# seasonal differences
aus_accommodation %>%
filter(State == "Tasmania") %>%
features(box_cox(Takings, tasmania), unitroot_nsdiffs)
## # A tibble: 1 × 2
## State nsdiffs
## <chr> <int>
## 1 Tasmania 1
aus_accommodation %>%
filter(State == "Tasmania") %>%
gg_tsdisplay(difference(box_cox(Takings, tasmania), 4), plot_type = "partial") +
labs(title = "Tasmanian Accommodation Takings — After Seasonal Difference")
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
The Tasmanian accommodation takings show a strong upward trend and clear quarterly seasonality. The Guerrero method gave a lambda of about 0, so a log transform evens out the variance. The seasonal unit-root test indicated one seasonal difference was needed to make the series stationary. After the transformation and differencing, the series looks much more stable, and the ACF and PACF mostly hover around zero with only a few small spikes left.
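One more check worth running (not in the original output): after the seasonal difference, unitroot_ndiffs says whether an additional first difference is still needed:
# Is a further first difference needed after the seasonal one?
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(difference(box_cox(Takings, tasmania), 4), unitroot_ndiffs)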
souv <- souvenirs %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
souv
## [1] 0.002118221
# original series
souvenirs %>%
gg_tsdisplay(Sales, plot_type = "partial") +
labs(title = "Monthly Souvenir Sales — Original Series")
# seasonal differences
souvenirs %>%
features(box_cox(Sales, souv), unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
souvenirs %>%
gg_tsdisplay(difference(box_cox(Sales, souv), 12), plot_type = "partial") +
labs(title = "Monthly Souvenir Sales — After Seasonal Difference")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
The monthly souvenir sales clearly show a rising trend and strong yearly seasonality. The Guerrero method gave a lambda of about 0, so a log transform was used to stabilize the variance. The seasonal unit-root test suggested one seasonal difference was needed. After transforming and differencing, the pattern looks much steadier: the ACF and PACF show no strong or repeating spikes, so the series is now close to stationary.
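Since the estimated lambda (about 0.002) is essentially zero, the Box-Cox transform here is effectively a log transform; a small sketch of that equivalence, reusing the souv value computed above:
# box_cox() with lambda = 0 is exactly log(); with lambda ~ 0.002 the gap
# is tiny relative to the scale of the transformed data
souvenirs %>%
  as_tibble() %>%
  summarise(max_abs_gap = max(abs(box_cox(Sales, souv) - log(Sales))))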
set.seed(2000)
myseries <- aus_retail %>%
filter(`Series ID` == "A3349791W")
s_lambda <- myseries %>%
features(Turnover, guerrero) %>%
pull(lambda_guerrero)
s_lambda
## [1] -0.4026063
# original series
myseries %>%
gg_tsdisplay(Turnover, plot_type = "partial") +
labs(title = "Retail Series A3349791W — Original Data")
# seasonal differencing
myseries %>%
features(box_cox(Turnover, s_lambda), unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 New South Wales Other recreational goods retailing 1
myseries %>%
gg_tsdisplay(difference(box_cox(Turnover, s_lambda), 12),
plot_type = "partial") +
labs(title = "Retail Series A3349791W — After Box–Cox and Seasonal Differencing")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
The retail turnover data for New South Wales show an upward trend and strong yearly seasonality. The Guerrero method gave a lambda of about -0.4, which helped stabilize the variance. The seasonal unit-root test indicated one seasonal difference was needed to make the series stationary. After transforming and differencing, the pattern looks much steadier, and the ACF and PACF no longer show long-lasting spikes, so the series is now close to stationary.
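To put a number on what the ACF plot shows, a Ljung-Box test on the transformed, seasonally differenced series (a follow-up check, not in the original output):
# Ljung-Box statistic over 24 lags; a small p-value would flag remaining
# autocorrelation for an ARIMA model to pick up
myseries %>%
  features(difference(box_cox(Turnover, s_lambda), 12), ljung_box, lag = 24)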
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>%
autoplot(y) +
labs(
title = "Simulated AR(1) Process (phi1 = 0.6, sigma^2 = 1)",
y = expression(y[t]),
x = "Time"
)
# ACF and PACF
sim %>%
gg_tsdisplay(y, plot_type = "partial") +
labs(title = "ACF and PACF of Simulated AR(1) Process")
The ACF decays gradually, as expected for an AR process, while the PACF cuts off sharply after lag 1, the classic signature of an AR(1) model. Together they show that each value depends strongly on the previous one: the series has positive persistence and tends to "stick" around recent levels.
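For an AR(1) process the theoretical autocorrelation at lag k is phi1^k, so the decay seen in the ACF panel can be checked against the exact values (a quick sketch):
# Theoretical AR(1) autocorrelations rho(k) = 0.6^k for the first few lags;
# compare these with the ACF spikes in the plot above
data.frame(lag = 1:5, rho = round(0.6^(1:5), 3))
# 0.600, 0.360, 0.216, 0.130, 0.078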
ar1 <- function(phi) {
set.seed(2000)
y <- numeric(100)
e <- rnorm(100)
for (i in 2:100)
y[i] <- phi * y[i - 1] + e[i]
tibble(
Time = 1:100,
y = y,
phi = as.factor(phi)
)
}
sims <- bind_rows(
ar1(1),
ar1(0.6),
ar1(0),
ar1(-1)
)
sims <- sims %>%
as_tsibble(index = Time, key = phi)
# Plot
sims %>%
ggplot(aes(x = Time, y = y)) +
geom_line(color = "steelblue") +
facet_wrap(~phi, ncol = 2, scales = "fixed") +
labs(
title = "Simulated AR(1) Processes for Different phi1 Values",
subtitle = "Higher phi1 → smoother & persistent; Negative phi1 → oscillating pattern",
x = "Time",
y = expression(y[t])
) +
theme_minimal()
Varying phi1 in the AR(1) model changes the overall pattern of the time plot: how "sticky" or bouncy the series looks.
phi1 = 1: behaves like a random walk. It is not stationary; the series drifts over time and its excursions keep growing.
phi1 = 0.6: stationary with positive correlation; the plot is smooth and shocks decay gradually.
phi1 = 0: white noise; no persistence, so values bounce randomly around zero.
phi1 = -1: a sharp back-and-forth, zig-zag pattern where values tend to flip sign each step; like phi1 = 1, this case is non-stationary.
Summary: Larger absolute values of phi1 make the series more persistent and smoother. The sign of phi1 determines the shape: positive gives smoother changes, while negative makes it oscillate. The process is stationary only when |phi1| < 1, as the root check below confirms.
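The stationarity condition can be read off the AR characteristic polynomial 1 - phi1*z: its root z = 1/phi1 must lie outside the unit circle. A minimal sketch (phi1 = 0 has no AR root, so it is skipped):
# AR(1) characteristic root for each phi1; stationary iff |root| > 1,
# so phi1 = 1 and phi1 = -1 (roots on the unit circle) are non-stationary
for (phi in c(1, 0.6, -1)) {
  cat("phi1 =", phi, " |root| =", abs(polyroot(c(1, -phi))), "\n")
}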
# Simulate MA(1) process
set.seed(2000)
n <- 100
theta1 <- 0.6
sigma <- 1
e <- rnorm(n, mean = 0, sd = sigma)
y <- numeric(n)
y[1] <- e[1]
for (i in 2:n) y[i] <- e[i] + theta1 * e[i-1]
sim_ma1 <- tsibble(
idx = 1:n,
y = y,
index = idx
)
# simulated MA(1) process
sim_ma1 %>%
autoplot(y) +
labs(
title = "Simulated MA(1) Process (Theta1 = 0.6, Sigma^2 = 1)",
x = "Time",
y = expression(y[t])
) +
theme_minimal()
# ACF and PACF
sim_ma1 %>%
gg_tsdisplay(y, plot_type = "partial") +
labs(title = "ACF and PACF of Simulated MA(1) Process (Theta1 = 0.6)")
The MA(1) series shows only short-term dependence: each value is affected by the current and most recent random shock, not older ones.
The ACF cuts off after lag 1, exactly what we expect for an MA(1) process, while the PACF decays away gradually.
Overall, the series has quick, random ups and downs around zero, showing the short-lived correlation typical of a stationary moving average model; the value of that correlation is computed below.
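For an MA(1) process the theoretical ACF is rho(1) = theta1/(1 + theta1^2) and zero at every higher lag; with the parameter used above:
# Theoretical lag-1 autocorrelation for theta1 = 0.6 (all higher lags are 0)
theta1 / (1 + theta1^2)   # 0.6 / 1.36 = 0.441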
# MA(1) simulator
ma1 <- function(theta) {
set.seed(123)
n <- 100
e <- rnorm(n) # white noise
y <- numeric(n)
for (i in 2:n)
y[i] <- e[i] + theta * e[i - 1]
tibble(
Time = 1:n,
y = y,
theta = as.factor(theta)
)
}
# multiple MA(1) series for different theta values
sims <- bind_rows(
ma1(1),
ma1(0.6),
ma1(0),
ma1(-0.6),
ma1(-1)
)
sims <- sims %>%
as_tsibble(index = Time, key = theta)
# all MA(1) simulations
sims %>%
ggplot(aes(x = Time, y = y)) +
geom_line(color = "steelblue") +
facet_wrap(~theta, ncol = 2, scales = "fixed") +
labs(
title = "Simulated MA(1) Processes for Different theta1 Values",
subtitle = "Higher |theta1| → stronger short-term correlation; Negative theta1 → alternating pattern",
x = "Time",
y = expression(y[t])
) +
theme_minimal()
Changing theta1 affects the short-term correlation between consecutive observations:
Positive values (like 0.6 or 1) produce smoother short-term movements, so neighboring points tend to move in the same direction.
Negative values (like -0.6 or -1) produce alternating up-and-down movements, giving an oscillating pattern.
As |theta1| increases, the short-term correlation becomes stronger, making these patterns more pronounced.
Summary: Increasing |theta1| strengthens short-term dependence, while the sign of theta1 determines whether the series moves smoothly (positive) or oscillates (negative); the sketch below computes the lag-1 correlation each value implies. Unlike the AR case, an MA(1) is stationary for any theta1; |theta1| < 1 is instead the condition for invertibility.
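The implied lag-1 autocorrelations for the five theta1 values make this concrete (same formula as in the previous exercise):
# Lag-1 autocorrelation implied by each theta1: theta / (1 + theta^2);
# the maximum possible magnitude is 0.5, reached at theta1 = +/-1
thetas <- c(1, 0.6, 0, -0.6, -1)
round(thetas / (1 + thetas^2), 3)   # 0.5, 0.441, 0, -0.441, -0.5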
# Simulate ARMA(1,1): phi1 = 0.6, theta1 = 0.6, sigma^2 = 1
set.seed(123)
n <- 100
phi <- 0.6
theta <- 0.6
e <- rnorm(n, mean = 0, sd = 1) # white noise
y <- numeric(n)
for (i in 2:n) {
y[i] <- phi * y[i - 1] + e[i] + theta * e[i - 1]
}
arma_sim <- tsibble(
Time = 1:n,
y = y,
index = Time
)
# Plot the simulated ARMA(1,1) series
arma_sim %>%
autoplot(y) +
labs(
title = "Simulated ARMA(1,1) Process (phi1 = 0.6, theta1 = 0.6, sigma^2 = 1)",
x = "Time", y = expression(y[t])
) +
theme_minimal()
The simulated ARMA(1,1) series mixes autoregressive and moving average effects.
The AR part (phi1 = 0.6) makes each value partly depend on the previous one, adding persistence; the MA part (theta1 = 0.6) carries each shock forward one extra step.
Since both parameters are positive and moderate, the series stays stable and moves smoothly around zero.
Overall, it shows short-term correlation without any long-term trend: the typical pattern of a stationary ARMA(1,1) process.
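For a side-by-side comparison with the AR(1) and MA(1) cases above, the same gg_tsdisplay view can be produced for this series (a follow-up plot, not in the original output):
# ACF and PACF of the simulated ARMA(1,1): both should tail off gradually,
# with no sharp cut-off in either panel
arma_sim %>%
  gg_tsdisplay(y, plot_type = "partial") +
  labs(title = "ACF and PACF of Simulated ARMA(1,1) Process")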
set.seed(2000)
n <- 100
phi1 <- -0.8
phi2 <- 0.3
sigma <- 1
# Simulate AR(2)
e <- rnorm(n, mean = 0, sd = sigma)
y <- numeric(n)
y[1:2] <- 0
for (i in 3:n) {
y[i] <- phi1 * y[i - 1] + phi2 * y[i - 2] + e[i]
}
ar2_sim <- tsibble(
Time = 1:n,
y = y,
index = Time
)
# Plot
ar2_sim %>%
autoplot(y) +
labs(
title = "Simulated AR(2) Process (phi1 = -0.8, phi2 = 0.3, sigma^2 = 1)",
subtitle = "This parameter combination produces a non-stationary series",
x = "Time", y = expression(y[t])
) +
theme_minimal()
The combination of a large negative first coefficient (phi1 = -0.8) and a positive second coefficient (phi2 = 0.3) makes the series oscillate with swings that grow over time.
In the time plot the swings get larger instead of settling down: the shocks don't fade, they build up.
Summary: this AR(2) process is non-stationary. It spirals out with bigger and bigger oscillations instead of returning to a steady pattern, because the coefficients violate the AR(2) stationarity conditions (here phi2 - phi1 = 1.1 > 1).
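The explosive behaviour can be verified from the AR characteristic polynomial 1 - phi1*z - phi2*z^2, whose roots must all lie outside the unit circle for stationarity (a quick sketch):
# Roots of 1 + 0.8*z - 0.3*z^2 (i.e. 1 - phi1*z - phi2*z^2 with
# phi1 = -0.8, phi2 = 0.3); one root has modulus ~0.93 < 1, so the
# process is non-stationary and its oscillations grow
abs(polyroot(c(1, 0.8, -0.3)))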
set.seed(2000)
n <- 100
# ARMA(1,1): phi1 = 0.6, theta1 = 0.6, sigma^2 = 1
e1 <- rnorm(n)
y_arma11 <- numeric(n)
y_arma11[1] <- 0
for (i in 2:n) {
y_arma11[i] <- 0.6 * y_arma11[i - 1] + e1[i] + 0.6 * e1[i - 1]
}
arma11 <- tibble(
Time = 1:n,
y = y_arma11,
Model = "ARMA(1,1)"
)
# AR(2): phi1 = -0.8, phi2 = 0.3, sigma^2 = 1 (non-stationary)
e2 <- rnorm(n)
y_ar2 <- numeric(n)
y_ar2[1:2] <- 0
for (i in 3:n) {
y_ar2[i] <- -0.8 * y_ar2[i - 1] + 0.3 * y_ar2[i - 2] + e2[i]
}
ar2 <- tibble(
Time = 1:n,
y = y_ar2,
Model = "AR(2)"
)
combined <- bind_rows(arma11, ar2)
# Plot
ggplot(combined, aes(x = Time, y = y)) +
geom_line(color = "steelblue") +
facet_wrap(~Model, ncol = 1, scales = "free_y") +
labs(
title = "Comparison of ARMA(1,1) and AR(2) Simulated Series",
y = expression(y[t]),
x = "Time"
) +
theme_minimal()
The two simulated series behave very differently:
ARMA(1,1) (phi1 = 0.6, theta1 = 0.6): stationary. It moves smoothly around zero with moderate persistence, and shocks fade out gradually.
AR(2) (phi1 = -0.8, phi2 = 0.3): non-stationary. The plot shows oscillations that grow with time: each swing amplifies instead of dying out, because the parameters break the stationarity conditions.
Comparison: the ARMA(1,1) model stays steady and predictable, while the AR(2) model blows up. It's a clear example of how the choice of coefficients controls whether a time series remains stable or spirals out of control.
aus_airpassengers %>% autoplot(Passengers) +
labs(
title = "Australian Air Passenger Numbers (1970–2011)",
y = "Passengers (millions)", x = "Year"
)
fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
fit %>%
gg_tsresiduals()
augment(fit) %>%
features(.innov, ljung_box, lag = 24, dof = 2)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 ARIMA(Passengers) 24.0 0.348
fit %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers) +
labs(
title = "10-Year Forecast of Australian Air Passenger Numbers",
y = "Passengers (millions)", x = "Year"
)
The fitted model for the Australian air passenger data is ARIMA(0,2,1); the data are annual, so there is no seasonal component.
In terms of the backshift operator:
\[ (1 - B)^2 y_t = (1 + \theta_1 B)\varepsilon_t \]
where
- \(\varepsilon_t\) represents white noise errors, and
- \(\theta_1 \approx -0.90\) is the MA(1) parameter.
# Fit ARIMA(0,1,0) with drift using explicit intercept (same as drift)
fit_drift <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))
# Summarize model
report(fit_drift)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
# Forecast 10 periods ahead
fc_drift <- fit_drift %>%
forecast(h = 10)
# Plot forecasts
autoplot(fc_drift, aus_airpassengers) +
labs(title = "Forecasts from ARIMA(0,1,0) with Drift",
y = "Passengers (millions)", x = "Year")
# Compare with model from part (a)
fit_a <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit_a)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
fc_compare <- bind_rows(
forecast(fit_drift, h = 10) %>% mutate(Model = "ARIMA(0,1,0) with Drift"),
forecast(fit_a, h = 10) %>% mutate(Model = "Model from part (a)")
)
autoplot(fc_compare, aus_airpassengers) +
facet_wrap(~Model, ncol = 2) +
labs(title = "Forecast Comparison: ARIMA(0,1,0) with Drift vs Model (a)",
y = "Passengers (millions)", x = "Year")
The ARIMA(0,1,0) with drift model is basically a random walk with a steady upward trend. The drift term (about 1.42) means passenger numbers are growing by roughly 1.4 million per year on average.
Compared to the ARIMA(0,2,1) model from part (a), this one gives a simpler, straight-line forecast — it doesn’t capture as much short-term movement. Both models predict continued growth, but ARIMA(0,2,1) fits the past data a bit better and has slightly lower AIC values, so it’s the stronger overall model.
fit_drift_212 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))
report(fit_drift_212)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
fc_drift_212 <- fit_drift_212 %>%
forecast(h = 10)
autoplot(fc_drift_212, aus_airpassengers) +
labs(title = "Forecasts from ARIMA(2,1,2) with Drift",
y = "Passengers (millions)", x = "Year")
fit_nodrift_212 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ pdq(2,1,2)))
## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
report(fit_nodrift_212)
## Series: Passengers
## Model: NULL model
## NULL model
fc_compare_all <- bind_rows(
forecast(fit_a, h = 10) %>% mutate(Model = "Model from part (a)"),
forecast(fit_drift, h = 10) %>% mutate(Model = "ARIMA(0,1,0) with Drift"),
forecast(fit_drift_212, h = 10) %>% mutate(Model = "ARIMA(2,1,2) with Drift"),
forecast(fit_nodrift_212, h = 10) %>% mutate(Model = "ARIMA(2,1,2) without Drift")
)
autoplot(fc_compare_all, aus_airpassengers) +
facet_wrap(~Model, ncol = 2) +
labs(title = "Forecast Comparison: Parts (a), (c), and (d)",
y = "Passengers (millions)", x = "Year")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
The ARIMA(2,1,2) with drift model fits the data well. Note the reported constant (about 3.22) is not the drift itself: with AR coefficients of -0.55 and -0.73, it implies a drift of roughly 3.22/(1 + 0.55 + 0.73), or about 1.4 million passengers per year, consistent with the drift model in part (c) and the steady growth seen in air travel.
When the drift is removed, estimation fails and the model can't generate valid forecasts (the NULL model above), showing that with these orders the long-term upward trend has to be absorbed by the constant.
Comparing across models:
ARIMA(0,1,0) with drift gives a straight-line trend.
ARIMA(0,2,1) picks up short-term changes but misses some of the long-run pattern.
ARIMA(2,1,2) with drift handles both short-term fluctuations and long-term growth, though its extra parameters come at the cost of a higher AICc (206.6 versus 198.3 for the part (a) model).
Without the drift, the model cannot be estimated at all, underlining that the steady increase in passengers depends on that constant term.
# Fit ARIMA(0,2,1) with a constant (equivalent to adding a linear trend)
fit_const_021 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit_const_021)
## Series: Passengers
## Model: ARIMA(0,2,1) w/ poly
##
## Coefficients:
## ma1 constant
## -1.0000 0.0571
## s.e. 0.0585 0.0213
##
## sigma^2 estimated as 3.855: log likelihood=-95.1
## AIC=196.21 AICc=196.79 BIC=201.63
fc_const_021 <- fit_const_021 %>%
forecast(h = 10)
autoplot(fc_const_021, aus_airpassengers) +
labs(
title = "Forecasts from ARIMA(0,2,1) with a Constant",
y = "Passengers (millions)", x = "Year"
)
fit_no_const_021 <- aus_airpassengers %>% model(ARIMA(Passengers ~ pdq(0,2,1)))
fc_compare <- bind_rows(
forecast(fit_const_021, h=10) %>% mutate(Model="ARIMA(0,2,1)+constant"),
forecast(fit_no_const_021, h=10) %>% mutate(Model="ARIMA(0,2,1)")
)
autoplot(fc_compare, aus_airpassengers) + facet_wrap(~Model)
When an ARIMA(0,2,1) model is fitted with a constant, the forecasts curve upward instead of following a straight line: the constant induces a quadratic trend.
That happens because including a constant when differencing twice (d = 2) makes the model generate a second-order polynomial trend, which is rarely realistic for long-term forecasting (hence the warning printed above).
The MA(1) coefficient is also estimated at -1.00, putting the model on the non-invertibility boundary and making it numerically unstable.
In short, the constant makes the model exaggerate future growth. It's better to drop the constant or reduce the differencing to keep the trend reasonable, as the backshift form below makes explicit.
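In the backshift notation used earlier, the model with a constant \(c\) is:
\[ (1 - B)^2 y_t = c + (1 + \theta_1 B)\varepsilon_t \]
Undoing the two differences means cumulatively summing \(c\) twice, which is what injects the quadratic (\(t^2\)) component into the forecast path.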
us_gdp <- global_economy %>%
filter(Country == "United States")
us_lambda <- us_gdp %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
us_lambda
## [1] 0.2819443
us_gdp %>%
autoplot(GDP) +
labs(title = "U.S. GDP (Original Scale)", y = "GDP (US$ billions)")
# Apply Box-Cox transformation
us_gdp %>%
mutate(GDP_transformed = box_cox(GDP, us_lambda)) %>%
autoplot(GDP_transformed) +
labs(title = "U.S. GDP (Box-Cox Transformed)", y = "Transformed GDP")
Using the Guerrero method, the estimated Box-Cox parameter for U.S. GDP is about lambda = 0.28, reflecting the strongly level-dependent variance in the raw series.
After applying the Box-Cox transformation (lambda = 0.28 sits between a log and a cube-root transform), the growth becomes more linear and consistent over time. The variance is now much steadier, but there's still a clear upward trend, so the series will need differencing to become stationary.
# Filter U.S. GDP data
us_gdp <- global_economy %>%
filter(Country == "United States")
us_lambda <- 0.2819443
# Fit ARIMA model to transformed series
fit_us_gdp <- us_gdp %>%
model(ARIMA(box_cox(GDP, us_lambda)))
report(fit_us_gdp)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, us_lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1823
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
# Check residual diagnostics
fit_us_gdp %>%
gg_tsresiduals()
# Plot 10-year forecasts
fit_us_gdp %>%
forecast(h = "10 years") %>%
autoplot(us_gdp) +
labs(
title = "Forecasts for U.S. GDP (Box-Cox transformed)",
y = "GDP (Box-Cox scale)", x = "Year"
)
Residual Diagnostics
The residuals scatter randomly around 0, so no clear patterns remain.
The ACF values lie within the 95% bounds, indicating no significant autocorrelation.
The histogram of residuals is roughly symmetric, suggesting approximate normality.
Interpretation
The fitted ARIMA(1,1,0) with drift model captures the long-term growth trend in U.S. GDP while allowing for moderate short-term autocorrelation.
The reported constant (about 118) corresponds, once the AR(1) coefficient is accounted for, to an average annual increase of about 118/(1 - 0.46), roughly 218 on the Box-Cox-transformed scale, indicating persistent long-run growth.
The model's residuals behave like white noise, confirming that the ARIMA specification is appropriate for forecasting.
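A formal version of the white-noise check (a follow-up, not in the original output; dof = 1 counts the single AR parameter):
# Ljung-Box test on the innovation residuals; a large p-value is
# consistent with white noise
augment(fit_us_gdp) %>%
  features(.innov, ljung_box, lag = 10, dof = 1)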
Forecast
# Filter U.S. GDP data
us_gdp <- global_economy %>%
filter(Country == "United States")
us_lambda <- us_gdp %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
us_lambda
## [1] 0.2819443
us_gdp <- us_gdp %>%
mutate(GDP_bc = box_cox(GDP, us_lambda))
us_models <- us_gdp %>%
model(
arima110 = ARIMA(GDP_bc ~ 1 + pdq(1,1,0)),
arima011 = ARIMA(GDP_bc ~ 1 + pdq(0,1,1)),
arima210 = ARIMA(GDP_bc ~ 1 + pdq(2,1,0)),
arima111 = ARIMA(GDP_bc ~ 1 + pdq(1,1,1)),
rw_drift = ARIMA(GDP_bc ~ 1 + pdq(0,1,0))
)
# Compare AIC, AICc, BIC
us_models %>%
glance() %>%
select(.model, AIC, AICc, BIC) %>%
arrange(AICc)
## # A tibble: 5 × 4
## .model AIC AICc BIC
## <chr> <dbl> <dbl> <dbl>
## 1 arima110 657. 657. 663.
## 2 arima011 659. 659. 665.
## 3 arima111 659. 659. 667.
## 4 arima210 659. 659. 667.
## 5 rw_drift 668. 668. 672.
I tried several ARIMA models with different combinations of autoregressive (AR) and moving average (MA) terms to see which one best fits the Box–Cox-transformed U.S. GDP data.
Comparing the AIC, AICc, and BIC values, the ARIMA(1,1,0) model had the lowest scores, making it the best overall fit. The ARIMA(0,1,1) and ARIMA(1,1,1) models were close behind, but adding more parameters didn’t really improve performance.
Overall, the ARIMA(1,1,0) model is the simplest option that still explains the data well.
us_models %>% select(arima011) %>% gg_tsresiduals()
us_models %>% select(arima110) %>% gg_tsresiduals()
I compared the residuals from the ARIMA(1,1,0) and ARIMA(0,1,1) models to decide which one fits better. Both models show residuals that bounce randomly around zero with no clear patterns, and their ACF plots have no significant spikes- meaning there’s no leftover autocorrelation.
The histograms look roughly bell-shaped, so the residuals are close to normal. Overall, both models seem well-fitted, but since ARIMA(1,1,0) had the lowest AICc, it’s the better choice for forecasting.
# Forecast comparison
us_models %>%
forecast(h = 10) %>%
autoplot(us_gdp) +
facet_wrap(~.model) +
labs(
title = "Forecast Comparison: Candidate ARIMA Models for U.S. GDP",
y = "GDP (Box–Cox scale)", x = "Year"
)
All five ARIMA models produce similar forecast patterns, showing a steady upward trend in U.S. GDP. The prediction intervals widen over time, which makes sense because uncertainty naturally increases when forecasting further into the future.
The forecasts from the ARIMA(1,1,0) model look the most reasonable: they extend the long-term growth smoothly without overreacting to short-term noise. Overall, the forecasts make economic sense and align with the historical pattern of steady GDP growth.
us_gdp <- global_economy %>%
filter(Country == "United States")
# Box–Cox transform
us_lambda <- us_gdp %>%
features(GDP, guerrero) %>%
pull(lambda_guerrero)
us_gdp <- us_gdp %>%
mutate(GDP_bc = box_cox(GDP, us_lambda))
us_fit_best <- us_gdp %>%
model(ARIMA(GDP_bc ~ 1 + pdq(1,1,0)))
report(us_fit_best)
## Series: GDP_bc
## Model: ARIMA(1,1,0) w/ drift
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
fit_ets <- us_gdp %>%
model(ETS(GDP))
report(fit_ets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
fc_arima <- us_fit_best %>%
forecast(h = 10) %>%
as_tibble() %>%
mutate(GDP = inv_box_cox(.mean, us_lambda),
Model = "ARIMA(1,1,0) with drift (Box–Cox)") %>%
select(Year, GDP, Model)
fc_ets <- fit_ets %>%
forecast(h = 10) %>%
as_tibble() %>%
mutate(GDP = .mean, # use the point forecasts; the GDP column itself is a distribution
Model = "ETS(M,A,N) – no transformation") %>%
select(Year, GDP, Model)
fc_compare_final <- bind_rows(fc_arima, fc_ets)
# Plot comparison
fc_compare_final %>%
mutate(
Year = as.numeric(Year),
GDP_bn = GDP / 1e9 # scale each forecast to billions; mean(GDP) would collapse the series to a single constant
) %>%
ggplot(aes(x = Year, y = GDP_bn)) +
geom_line(color = "steelblue") +
facet_wrap(~Model, scales = "free_y", ncol = 2) +
labs(
title = "Forecast Comparison: ARIMA vs ETS for U.S. GDP",
y = "Forecasted GDP (Billions of US$)",
x = "Year"
) +
theme_minimal()
Both the ARIMA(1,1,0) with drift and the ETS(M,A,N) model give very similar forecasts: a steady increase in U.S. GDP over time. The ARIMA model uses the Box–Cox transformation to stabilize variance, giving a smoother linear trend, while the ETS model fits directly on the original scale and captures a bit more curvature. Overall, both produce realistic forecasts, but the ETS model might be slightly more flexible for long-term growth patterns.