Module 8 Discussion

Part I: Non-Seasonal ARIMA

For this section, I analyze the exports of Algeria as a percentage of GDP, using the global_economy dataset.

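# Load fpp3 (attaches tsibble, fable, feasts, dplyr, tidyr, ggplot2 and the example datasets)
library(fpp3)

# Choose a time series
algeria_economy <- global_economy |> 
  filter(Country == "Algeria") |> 
  drop_na(Exports)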

# Create a Train/Test split
train_data <- algeria_economy |> filter(Year <= 2012)
test_data  <- algeria_economy |> filter(Year > 2012)

# Plot ACF and PACF to justify manual parameter choice
train_data |> gg_tsdisplay(Exports, plot_type = 'partial')

# Fit Manual and Auto ARIMA models
# pdq(1, 1, 0) is the manual specification suggested by the ACF/PACF plot above.
# ARIMA(Exports) without pdq() lets the automatic search choose the orders.
fit_part1 <- train_data |> model(
  manual_arima = ARIMA(Exports ~ pdq(1, 1, 0)), 
  auto_arima   = ARIMA(Exports)
)

# Compare fit statistics for the manual and automatic models
# (report() handles a single model only; glance() returns the table below)
glance(fit_part1)

## # A tibble: 2 × 9
##   Country .model       sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <fct>   <chr>         <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 Algeria manual_arima   37.4   -167.  339.  339.  343. <cpl [1]> <cpl [0]>
## 2 Algeria auto_arima     37.6   -168.  338.  338.  340. <cpl [0]> <cpl [0]>

# Forecast for the test period
fc_part1 <- fit_part1 |> forecast(new_data = test_data)

# Plot historical, actual, predictions, and 95% PI
fc_part1 |> 
  autoplot(algeria_economy, level = 95) +
  labs(title = "Algerian Exports: Manual vs Auto ARIMA",
       y = "Exports (% of GDP)",
       x = "Year") +
  theme_minimal()
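
The plot compares the two forecasts visually; a numeric check against the held-out years makes the comparison concrete. The following is a minimal sketch, assuming the fpp3 package is installed and using the same split as above. It rebuilds the objects so it can run on its own; the names fit_check and test_accuracy are only illustrative, and no particular result is claimed here.

```r
library(fpp3)

# Same series and train/test split as above
algeria_economy <- global_economy |>
  filter(Country == "Algeria") |>
  drop_na(Exports)
train_data <- algeria_economy |> filter(Year <= 2012)
test_data  <- algeria_economy |> filter(Year > 2012)

# Refit both candidate models on the training years
fit_check <- train_data |> model(
  manual_arima = ARIMA(Exports ~ pdq(1, 1, 0)),
  auto_arima   = ARIMA(Exports)
)

# Score the forecasts against the actual test-period values
test_accuracy <- fit_check |>
  forecast(new_data = test_data) |>
  accuracy(algeria_economy) |>
  select(.model, RMSE, MAE, MAPE)

test_accuracy
```

Lower RMSE, MAE, and MAPE on the test years indicate the better out-of-sample forecast, which complements the in-sample AICc comparison.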

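Discussion: Based on the gg_tsdisplay output, the ACF decays very slowly, which indicates that the series is non-stationary. This justifies applying a first difference (\(d=1\)). The PACF shows a significant spike at lag 1 and then cuts off, suggesting one autoregressive term (\(p=1\)). My manual guess was therefore an \(ARIMA(1,1,0)\). One caveat: these plots are of the undifferenced series, so the choice of \(p\) is provisional and should strictly be confirmed on the ACF/PACF of the differenced series.

When comparing the manual model to the one selected automatically, the results are very close. The manual_arima yielded an AICc of 339.1158, while the auto_arima yielded a slightly lower (better) AICc of 338.2751, along with a lower BIC (340 versus 343). The empty ar_roots and ma_roots entries in the table show that the automatic model contains no AR or MA terms, so it is more parsimonious than the manual guess: dropping the AR(1) term costs almost nothing in fit (log-likelihoods of about -168 versus -167) while saving a parameter. The information criteria therefore favor the simpler model, although only by a small margin. This comparison assumes both models use the same order of differencing, since information criteria are not comparable across different values of \(d\).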
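
The order \(p\) describes the series after differencing, so it can help to inspect the ACF and PACF of the differenced data as well. A minimal sketch, assuming fpp3 is installed: unitroot_ndiffs is the feasts feature that suggests a number of first differences based on repeated KPSS tests, and diff_suggestion is just an illustrative name.

```r
library(fpp3)

# Training data, rebuilt so the block runs on its own
train_data <- global_economy |>
  filter(Country == "Algeria") |>
  drop_na(Exports) |>
  filter(Year <= 2012)

# Suggested number of first differences (column `ndiffs`)
diff_suggestion <- train_data |> features(Exports, unitroot_ndiffs)
diff_suggestion

# ACF and PACF of the first-differenced series
train_data |> gg_tsdisplay(difference(Exports), plot_type = "partial")
```

If the differenced series shows no significant spikes at all, that points toward a model with no AR or MA terms; a single PACF spike at lag 1 would support \(p=1\).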

Part II: Seasonal ARIMA (SARIMA)

For seasonal data, I look at US employment in the Leisure and Hospitality sector, using the us_employment dataset.

# Choose a seasonal time series
us_employment_leisure <- us_employment |> 
  filter(Title == "Leisure and Hospitality")

# Fit a Seasonal ARIMA model automatically
fit_part2 <- us_employment_leisure |> model(
  sarima = ARIMA(Employed)
)

# View coefficients for interpretation
report(fit_part2)

## Series: Employed 
## Model: ARIMA(2,1,2)(0,1,2)[12] 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2     sma1     sma2
##       1.6621  -0.9333  -1.5105  0.7822  -0.4322  -0.1297
## s.e.  0.0327   0.0299   0.0585  0.0489   0.0342   0.0359
## 
## sigma^2 estimated as 1104:  log likelihood=-4704.55
## AIC=9423.1   AICc=9423.22   BIC=9457.14

Discussion: The automatic model selection yielded an \(ARIMA(2,1,2)(0,1,2)[12]\).

Non-Seasonal Part (2,1,2): The model applies a first difference (\(d=1\)) to remove the overall trend. After differencing, the series depends on its own values from the previous two months (\(ar1 = 1.6621\), \(ar2 = -0.9333\)) and on the residual errors of the previous two months (\(ma1 = -1.5105\), \(ma2 = 0.7822\)). Note that the AR and MA terms act on the differenced series, not on the raw employment levels.

Seasonal Part (0,1,2)[12]: The model applies a seasonal difference (\(D=1\)) to remove annual seasonality (period = 12 months). There are no seasonal autoregressive terms (\(P=0\)), but there are two seasonal moving average terms (\(Q=2\)). This means the differenced series is affected by the shocks (unpredictable shifts) from 12 months ago (\(sma1 = -0.4322\)) and 24 months ago (\(sma2 = -0.1297\)). Both coefficients are more than three standard errors from zero (s.e. 0.0342 and 0.0359), so both seasonal terms contribute to the fit.
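
Written out in backshift notation, where \(B y_t = y_{t-1}\), the fitted model takes the form below. This is a sketch that assumes the sign convention of the fpp3 text (AR coefficients enter their polynomial with a minus sign, MA coefficients with a plus sign) and plugs in the estimates from the report output.

\[(1 - 1.6621B + 0.9333B^2)(1 - B)(1 - B^{12})\,y_t = (1 - 1.5105B + 0.7822B^2)(1 - 0.4322B^{12} - 0.1297B^{24})\,\epsilon_t\]

The factors \((1 - B)\) and \((1 - B^{12})\) are the first and seasonal differences, which is why the AR and MA terms describe the differenced series rather than the employment levels themselves.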

Part III: WN, RW, and RW with Drift

Equations and Relationships

1. White Noise (WN) White noise represents pure randomness with no discernible pattern or memory. \[y_t = \epsilon_t\] (Assumption: \(\epsilon_t \sim WN(0, \sigma^2)\), meaning the errors are uncorrelated with a mean of zero and constant variance. The simulation below uses the stronger assumption of independent, identically distributed normal errors.)

2. Random Walk (RW) A random walk accumulates white noise over time, carrying permanent memory of past shocks. \[y_t = y_{t-1} + \epsilon_t\] Relationship: If you take the first difference of a RW model, you revert to White Noise. \[y_t - y_{t-1} = \epsilon_t\]

3. Random Walk with Drift This model accumulates white noise but includes a constant shift (\(c\)) at each step, introducing a systematic trend. \[y_t = c + y_{t-1} + \epsilon_t\] Relationship: If you take the first difference of a RW with Drift, you get White Noise plus a constant. \[y_t - y_{t-1} = c + \epsilon_t\]
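
Unrolling the recursions makes these relationships explicit. As a short worked derivation, assuming a fixed starting value \(y_0\) (an assumption not stated above), repeated substitution gives:

\[y_t = y_0 + \sum_{i=1}^{t} \epsilon_i \quad \text{(RW)}, \qquad y_t = y_0 + ct + \sum_{i=1}^{t} \epsilon_i \quad \text{(RW with drift)}\]

Taking the mean and variance with uncorrelated errors:

\[E[y_t] = y_0 \ \text{(RW)}, \qquad E[y_t] = y_0 + ct \ \text{(RW with drift)}, \qquad Var(y_t) = t\sigma^2 \ \text{(both)}\]

Every past shock stays in the sum, which is the permanent memory described above, and the variance grows linearly with \(t\), so neither process is stationary even though their first differences are.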

Visualizing the Processes

set.seed(123)
n_obs <- 100

# Simulate the data
simulations <- tsibble(
  t = 1:n_obs,
  wn = rnorm(n_obs, mean = 0, sd = 1),
  index = t
) |> 
  mutate(
    rw = cumsum(wn),
    rw_drift = cumsum(0.5 + wn) # 0.5 represents the constant drift
  )

# Plot the three series
simulations |> 
  pivot_longer(cols = c(wn, rw, rw_drift), names_to = "Series", values_to = "Value") |> 
  ggplot(aes(x = t, y = Value, color = Series)) +
  geom_line() +
  facet_wrap(~Series, scales = "free_y", ncol = 1) +
  labs(title = "Simulated Stochastic Processes: WN, RW, and RW with Drift", 
       x = "Time (t)", y = "Value") +
  theme_minimal() +
  theme(legend.position = "none")
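
The differencing relationships from the equations can be checked numerically on the same simulated values. A minimal sketch in base R, assuming the same seed and drift of 0.5 as the simulation above; the names diff_rw and diff_rw_drift are illustrative.

```r
set.seed(123)
n_obs <- 100
drift <- 0.5

# Same three processes as above, as plain vectors
wn       <- rnorm(n_obs, mean = 0, sd = 1)
rw       <- cumsum(wn)
rw_drift <- cumsum(drift + wn)

# First differences (one observation is lost)
diff_rw       <- diff(rw)
diff_rw_drift <- diff(rw_drift)

# Differencing the RW recovers the white noise;
# differencing the RW with drift recovers white noise plus the constant
all.equal(diff_rw, wn[-1])
all.equal(diff_rw_drift, drift + wn[-1])
```

Both comparisons hold up to floating-point tolerance, because cumsum() and diff() are inverse operations apart from the first observation.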

Process Comparison

Attribute	White Noise (WN)	Random Walk (RW)	RW with Drift
Trend	None	None (wanders unpredictably)	Upward or downward (driven by drift \(c\))
Shock Effect	Immediate and temporary	Permanent (shifts the entire future path)	Permanent (shifts the entire future path)
Memory	None	Infinite (permanent memory)	Infinite (permanent memory)
Stationarity	Yes (constant mean and variance)	No (variance grows with time)	No (mean and variance change over time)
Forecast Shape	Constant (flat line at the mean, usually 0)	Flat line (equal to the last observed value)	Trending line (starts at the last observed value, with slope equal to the drift \(c\))
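
The forecast shapes in the last row can be reproduced by hand. The sketch below uses base R and assumes the same simulated series as above with an illustrative horizon of h = 10. The three calculations mirror what the MEAN(), NAIVE(), and RW(y ~ drift()) benchmark models in fable produce as point forecasts, with the drift estimated as the average first difference.

```r
set.seed(123)
n_obs <- 100
h <- 10  # forecast horizon (illustrative choice)

wn       <- rnorm(n_obs, mean = 0, sd = 1)
rw       <- cumsum(wn)
rw_drift <- cumsum(0.5 + wn)

# White noise: the forecast is the sample mean at every horizon
fc_wn <- rep(mean(wn), h)

# Random walk: the forecast is the last observed value at every horizon
fc_rw <- rep(rw[n_obs], h)

# Random walk with drift: last value plus estimated drift times the horizon
c_hat <- mean(diff(rw_drift))
fc_rw_drift <- rw_drift[n_obs] + c_hat * seq_len(h)

# Step-to-step change in each forecast path: 0, 0, and c_hat
range(diff(fc_wn))
range(diff(fc_rw))
range(diff(fc_rw_drift))
```

The first two paths are flat and the third rises by the estimated drift at each step, matching the table.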