For this section, I analyze the exports of Algeria as a percentage of
GDP, using the global_economy dataset.
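# Load the fpp3 packages (tsibble, tsibbledata, feasts, fable, dplyr, tidyr, ggplot2)
library(fpp3)
# 1. Choose a time series
algeria_economy <- global_economy |>
  filter(Country == "Algeria") |>
  drop_na(Exports)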
# Create a Train/Test split
train_data <- algeria_economy |> filter(Year <= 2012)
test_data <- algeria_economy |> filter(Year > 2012)
# Plot ACF and PACF to justify manual parameter choice
train_data |> gg_tsdisplay(Exports, plot_type = 'partial')
# Fit Manual and Auto ARIMA models
# Note: pdq(1, 1, 0) is the manual choice read from the ACF/PACF plots.
# Adjust it if the actual ACF/PACF output suggests a different order.
fit_part1 <- train_data |> model(
  manual_arima = ARIMA(Exports ~ pdq(1, 1, 0)),
  auto_arima = ARIMA(Exports)
)
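# Compare the manual and automatic fits.
# glance() returns one row of fit statistics per model; report() only
# supports a single model and falls back to this same table with a warning.
glance(fit_part1)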
## # A tibble: 2 × 9
##   Country .model       sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots
##   <fct>   <chr>         <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>
## 1 Algeria manual_arima   37.4   -167.  339.  339.  343. <cpl [1]> <cpl [0]>
## 2 Algeria auto_arima     37.6   -168.  338.  338.  340. <cpl [0]> <cpl [0]>
# Forecast for the test period
fc_part1 <- fit_part1 |> forecast(new_data = test_data)
# Plot historical, actual, predictions, and 95% PI
fc_part1 |>
  autoplot(algeria_economy, level = 95) +
  labs(title = "Algerian Exports: Manual vs Auto ARIMA",
       y = "Exports (% of GDP)",
       x = "Year") +
  theme_minimal()
Discussion: Based on the gg_tsdisplay output, the ACF decays very slowly, indicating that the series is non-stationary and has a trend. This justifies applying a first difference (\(d=1\)). In the PACF, there is a significant spike at lag 1 followed by a sharp cutoff, suggesting one autoregressive term (\(p=1\)). My manual choice was therefore an \(ARIMA(1,1,0)\).

When comparing the manual model with the one selected automatically by ARIMA(), the results are very close. The manual_arima model yielded an AICc of 339.1158, while auto_arima yielded a slightly lower (better) AICc of 338.2751. The automatic model also has the lower BIC (340 versus 343), and its empty ar_roots and ma_roots columns show that it contains no AR or MA terms. It is therefore the more parsimonious of the two: it reaches a similar fit with fewer parameters and a lower estimated information loss than the manual guess.
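One way to check the differencing choice and the model comparison above is sketched below. It assumes the fpp3 packages are installed; the object names (algeria, train, test, fits, acc) are illustrative and are rebuilt here so the block runs on its own. unitroot_ndiffs() reports how many first differences a sequence of KPSS tests suggests, the ACF/PACF of the differenced series help confirm the AR order after differencing, and accuracy() compares the two models on the held-out years rather than on AICc alone.

```r
library(fpp3)

# Rebuild the series and the train/test split used above
algeria <- global_economy |>
  filter(Country == "Algeria") |>
  drop_na(Exports)
train <- algeria |> filter(Year <= 2012)
test  <- algeria |> filter(Year > 2012)

# Number of first differences suggested by repeated KPSS tests
ndiffs_tbl <- train |> features(Exports, unitroot_ndiffs)
ndiffs_tbl

# ACF/PACF of the differenced series, to read off p and q after differencing
train |> gg_tsdisplay(difference(Exports), plot_type = "partial")

# Out-of-sample accuracy of both models on the test years
fits <- train |> model(
  manual_arima = ARIMA(Exports ~ pdq(1, 1, 0)),
  auto_arima   = ARIMA(Exports)
)
acc <- fits |>
  forecast(new_data = test) |>
  accuracy(algeria)
acc |> select(.model, RMSE, MAE, MAPE)
```

With only a handful of test years, small differences in RMSE or MAE between the two models should be read cautiously.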
For seasonal data, I look at US employment in the Leisure and Hospitality sector, using the us_employment dataset.
# Choose a seasonal time series
us_employment_leisure <- us_employment |>
  filter(Title == "Leisure and Hospitality")
# Fit a Seasonal ARIMA model automatically
fit_part2 <- us_employment_leisure |> model(
  sarima = ARIMA(Employed)
)
# View coefficients for interpretation
report(fit_part2)
## Series: Employed
## Model: ARIMA(2,1,2)(0,1,2)[12]
##
## Coefficients:
##          ar1      ar2      ma1     ma2     sma1     sma2
##       1.6621  -0.9333  -1.5105  0.7822  -0.4322  -0.1297
## s.e.  0.0327   0.0299   0.0585  0.0489   0.0342   0.0359
##
## sigma^2 estimated as 1104: log likelihood=-4704.55
## AIC=9423.1 AICc=9423.22 BIC=9457.14
Discussion: The automatic model selection yielded an \(ARIMA(2,1,2)(0,1,2)[12]\).

Non-Seasonal Part (2,1,2): The model applied a first difference (\(d=1\)) to remove the overall trend. After differencing, the current value depends on the differenced values of the previous two months (\(ar1 = 1.6621\), \(ar2 = -0.9333\)) and on the residual errors of the previous two months (\(ma1 = -1.5105\), \(ma2 = 0.7822\)).

Seasonal Part (0,1,2)[12]: The model applied a seasonal difference (\(D=1\)) to remove annual seasonality (period = 12 months). There are no seasonal autoregressive terms (\(P=0\)), but there are two seasonal moving average terms (\(Q=2\)). This means that, after differencing, current employment is affected by the seasonal shocks (unpredictable shifts) from 12 months ago (\(sma1 = -0.4322\)) and 24 months ago (\(sma2 = -0.1297\)). Both seasonal MA coefficients lie more than three standard errors from zero.
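Written out in backshift notation (with \(B y_t = y_{t-1}\) and \(y_t\) denoting Employed), the fitted model can be sketched as follows. This assumes the sign convention used by fable and base R's arima(), in which MA coefficients enter with a plus sign, and that no constant was estimated, since none appears in the report output.

\[(1 - 1.6621B + 0.9333B^2)(1 - B)(1 - B^{12})\,y_t = (1 - 1.5105B + 0.7822B^2)(1 - 0.4322B^{12} - 0.1297B^{24})\,\epsilon_t\]

The factors \((1 - B)\) and \((1 - B^{12})\) are the first and seasonal differences, so the AR and MA polynomials act on the doubly differenced series rather than on the raw employment levels.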
1. White Noise (WN): White noise represents pure randomness with no discernible pattern or memory. \[y_t = \epsilon_t\] (Assumption: \(\epsilon_t \sim WN(0, \sigma^2)\), meaning the errors are uncorrelated with a mean of zero and constant variance. The simulation below additionally draws them as independent and identically distributed normal values.)
2. Random Walk (RW): A random walk accumulates white noise over time, carrying permanent memory of past shocks. \[y_t = y_{t-1} + \epsilon_t\] Relationship: Taking the first difference of a RW returns white noise. \[y_t - y_{t-1} = \epsilon_t\]
3. Random Walk with Drift: This model accumulates white noise but adds a constant shift (\(c\)) at each step, introducing a systematic trend. \[y_t = c + y_{t-1} + \epsilon_t\] Relationship: Taking the first difference of a RW with drift returns white noise plus a constant. \[y_t - y_{t-1} = c + \epsilon_t\]
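Substituting each recursion into itself makes the permanent memory explicit. The fixed starting value \(y_0\) is an assumption introduced here for illustration, and the errors are taken to be uncorrelated with variance \(\sigma^2\). For the random walk:

\[y_t = y_0 + \sum_{i=1}^{t} \epsilon_i, \qquad E[y_t] = y_0, \qquad Var(y_t) = t\sigma^2\]

and for the random walk with drift:

\[y_t = y_0 + ct + \sum_{i=1}^{t} \epsilon_i, \qquad E[y_t] = y_0 + ct, \qquad Var(y_t) = t\sigma^2\]

Every past shock enters with weight 1, so none of them decays, and the variance grows linearly in \(t\). This is why neither process is stationary, while the drift adds a mean that changes by \(c\) per step.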
set.seed(123)
n_obs <- 100
# Simulate the data
simulations <- tsibble(
  t = 1:n_obs,
  wn = rnorm(n_obs, mean = 0, sd = 1),
  index = t
) |>
  mutate(
    rw = cumsum(wn),
    rw_drift = cumsum(0.5 + wn) # 0.5 represents the constant drift
  )
# Plot the three series
simulations |>
  pivot_longer(cols = c(wn, rw, rw_drift), names_to = "Series", values_to = "Value") |>
  ggplot(aes(x = t, y = Value, color = Series)) +
  geom_line() +
  facet_wrap(~Series, scales = "free_y", ncol = 1) +
  labs(title = "Simulated Stochastic Processes: WN, RW, and RW with Drift",
       x = "Time (t)", y = "Value") +
  theme_minimal() +
  theme(legend.position = "none")
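The differencing relationships stated for the random walk and the random walk with drift can also be checked numerically. The sketch below uses base R only and regenerates the noise with the same seed; the variable names (drift, rw_ok, drift_ok) are illustrative.

```r
set.seed(123)
n_obs <- 100
drift <- 0.5                      # same constant drift as in the simulation

wn       <- rnorm(n_obs, mean = 0, sd = 1)
rw       <- cumsum(wn)            # y_t = y_{t-1} + e_t
rw_drift <- cumsum(drift + wn)    # y_t = c + y_{t-1} + e_t

# First difference of the random walk recovers the white noise
rw_ok <- isTRUE(all.equal(diff(rw), wn[-1]))

# First difference of the drift series recovers white noise plus the constant
drift_ok <- isTRUE(all.equal(diff(rw_drift), drift + wn[-1]))

c(rw = rw_ok, rw_drift = drift_ok)
```

Both checks compare up to floating-point tolerance, since cumsum() followed by diff() does not reproduce the inputs bit for bit. The first observation is dropped because differencing shortens the series by one.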
| Attribute | White Noise (WN) | Random Walk (RW) | RW with Drift |
|---|---|---|---|
| Trend | None | No deterministic trend (stochastic trend only; wanders unpredictably) | Upward or downward, depending on the sign of the drift \(c\) |
| Shock Effect | Immediate and temporary | Permanent (shifts the entire future path) | Permanent (shifts the entire future path) |
| Memory | None | Infinite (permanent memory) | Infinite (permanent memory) |
| Stationarity | Yes (constant mean and variance) | No (variance grows with time) | No (mean and variance change over time) |
| Forecast Shape | Constant (flat line at the mean, usually 0) | Flat line (equal to the last observed value) | Trending line (slope equal to the drift \(c\)) |
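The forecast shapes in the last row can be illustrated with fable's benchmark models, used here as stand-ins for the three processes: MEAN() for white noise, NAIVE() for a random walk, and RW() with drift() for a random walk with drift. This is a sketch assuming the fpp3 packages are installed; the series are regenerated so the block runs on its own, and the object names (sim, h, fc_wn, fc_rw, fc_drift) are illustrative.

```r
library(fpp3)

set.seed(123)
sim <- tsibble(t = 1:100, wn = rnorm(100), index = t) |>
  mutate(rw = cumsum(wn), rw_drift = cumsum(0.5 + wn))

h <- 10  # forecast horizon (arbitrary choice for illustration)

fc_wn    <- sim |> model(MEAN(wn)) |> forecast(h = h)
fc_rw    <- sim |> model(NAIVE(rw)) |> forecast(h = h)
fc_drift <- sim |> model(RW(rw_drift ~ drift())) |> forecast(h = h)

# Point forecasts: constant at the sample mean, constant at the last
# observation, and changing by the same estimated drift at each step
range(fc_wn$.mean)
range(fc_rw$.mean)
diff(fc_drift$.mean)
```

The slope of the drift forecast is the drift estimated from the sample, so it will be close to, but not exactly, the 0.5 used in the simulation.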