Module 8 Discussion

Author

Jincheng Xie

library(fpp3)
library(ggplot2)
set.seed(69)

Part I: ARIMA Model Fitting

egypt <- global_economy |>
  filter(Country == "Egypt, Arab Rep.") |>
  select(Year, Exports)

egypt |> autoplot(Exports) +
  labs(title = "Egyptian Exports (% of GDP)", y = "% of GDP")

The time series shows considerable fluctuation over time without a strong linear trend, but there are extended periods of increase and decrease, suggesting possible non-stationarity.

ACF and PACF Analysis

egypt |> features(Exports, unitroot_kpss)

# A tibble: 1 × 2
  kpss_stat kpss_pvalue
      <dbl>       <dbl>
1     0.192         0.1

egypt |> features(Exports, unitroot_ndiffs)

# A tibble: 1 × 1
  ndiffs
   <int>
1      0

egypt |> ACF(Exports) |> autoplot() + labs(title = "ACF of Exports")

egypt |> PACF(Exports) |> autoplot() + labs(title = "PACF of Exports")

The ACF of the original series shows a slow, gradual decay — a hallmark of a non-stationary or strongly autoregressive series. The PACF shows a large significant spike at lag 1, with subsequent lags mostly within the significance bounds.

egypt |> ACF(difference(Exports)) |> autoplot() +
  labs(title = "ACF of Differenced Exports")

egypt |> PACF(difference(Exports)) |> autoplot() +
  labs(title = "PACF of Differenced Exports")

After differencing, the ACF and PACF help us identify the MA and AR orders respectively:

If the ACF cuts off sharply after lag \(q\), this suggests an MA(\(q\)) component.
If the PACF cuts off sharply after lag \(p\), this suggests an AR(\(p\)) component.

Based on the patterns observed, we select a candidate model. Given the typical behavior of this series (ACF with possible cutoff near lag 1-2, PACF with possible cutoff near lag 1), reasonable candidates include ARIMA(0,1,1), ARIMA(1,1,0), or ARIMA(1,1,1).

Manual vs Auto ARIMA

fit <- egypt |>
  model(
    manual_111 = ARIMA(Exports ~ pdq(1,1,1)),
    manual_011 = ARIMA(Exports ~ pdq(0,1,1)),
    manual_110 = ARIMA(Exports ~ pdq(1,1,0)),
    auto        = ARIMA(Exports)
  )

fit

# A mable: 1 x 4
      manual_111     manual_011     manual_110                   auto
         <model>        <model>        <model>                <model>
1 <ARIMA(1,1,1)> <ARIMA(0,1,1)> <ARIMA(1,1,0)> <ARIMA(2,0,1) w/ mean>

glance(fit) |> select(.model, AICc, BIC)

# A tibble: 4 × 3
  .model      AICc   BIC
  <chr>      <dbl> <dbl>
1 manual_111  298.  304.
2 manual_011  296.  300.
3 manual_110  296.  300.
4 auto        294.  303.

The model with the lowest AICc is preferred.

fit |> select(auto) |> report()

Series: Exports 
Model: ARIMA(2,0,1) w/ mean 

Coefficients:
         ar1      ar2      ma1  constant
      1.6764  -0.8034  -0.6896    2.5623
s.e.  0.1111   0.0928   0.1492    0.1161

sigma^2 estimated as 8.046:  log likelihood=-141.57
AIC=293.13   AICc=294.29   BIC=303.43

Reconciliation: The auto algorithm selected an ARIMA(2,0,1) model with a non-zero mean. This differs from our manual candidates, which all applied a first difference (\(d=1\)). The automatic selection suggests that while the original ACF decayed slowly (often a sign of non-stationarity), the algorithm found that a second-order autoregressive process (\(p=2\)) along with an MA(1) term was sufficient to capture the high persistence in the data without strictly requiring differencing. By optimizing for the lowest AICc, the algorithm determined that modeling this persistent AR(2) process with a mean provided a more parsimonious and better-fitting specification than forcing a strict \(d=1\) difference.

Forecasting

fc <- fit |> select(auto) |> forecast(h = 5)

fc |>
  autoplot(egypt, level = 95) +
  labs(title = "Egyptian Exports: Forecast with 95% Prediction Interval",
       y = "% of GDP")

The black line shows historical values, the blue line shows point forecasts, and the shaded region is the 95% prediction interval. The interval widens over the forecast horizon because uncertainty accumulates — each future step depends on the previous unknown values plus a new shock.

Part II: Seasonal ARIMA

myseries <- aus_retail |>
  filter(
    State == "New South Wales",
    Industry == "Department stores"
  ) |>
  select(Month, Turnover)

myseries |> autoplot(Turnover) +
  labs(title = "NSW Department Store Turnover", y = "Turnover (A$ Million)")

The plot shows a clear upward trend and recurring seasonal peaks (typically around December due to holiday shopping), confirming this data is well-suited for a Seasonal ARIMA model.

Fit a Seasonal ARIMA Model

fit_seasonal <- myseries |>
  model(sarima = ARIMA(Turnover))

fit_seasonal |> report()

Series: Turnover 
Model: ARIMA(2,1,1)(1,1,2)[12] 

Coefficients:
          ar1      ar2      ma1     sar1    sma1     sma2
      -0.3126  -0.1833  -0.7290  -0.6271  0.2332  -0.3270
s.e.   0.0682   0.0609   0.0523   0.2523  0.2483   0.0968

sigma^2 estimated as 370.4:  log likelihood=-1872.01
AIC=3758.02   AICc=3758.29   BIC=3786.43

The report() output shows the selected ARIMA(p,d,q)(P,D,Q)[12] specification. The “[12]” denotes a seasonal period of 12 months. Let’s examine each component:

d (non-seasonal differencing): Removes the overall trend. A first difference (\(d=1\)) transforms the data so the mean level is stable over time.
D (seasonal differencing): Removes the repeating seasonal pattern. A seasonal difference (\(D=1\)) subtracts the value from the same month last year, eliminating the annual cycle.
p (non-seasonal AR order): The AR coefficients capture how much the current value depends on recent past values (at lags 1, 2, …, \(p\)). A positive AR1 coefficient means high values tend to follow high values in the short run.
q (non-seasonal MA order): The MA coefficients capture how much recent forecast errors influence the current value. They model the short-run adjustment to shocks.
P (seasonal AR order): Captures year-over-year autoregressive dependence. For example, a seasonal AR1 coefficient tells us how much this January depends on last January (after differencing).
Q (seasonal MA order): Captures how seasonal forecast errors propagate. A seasonal MA1 coefficient indicates that a shock felt in, say, last December still influences this December.

fit_seasonal |> tidy()

# A tibble: 6 × 6
  .model term  estimate std.error statistic  p.value
  <chr>  <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1 sarima ar1     -0.313    0.0682    -4.58  5.97e- 6
2 sarima ar2     -0.183    0.0609    -3.01  2.76e- 3
3 sarima ma1     -0.729    0.0523   -13.9   1.28e-36
4 sarima sar1    -0.627    0.252     -2.49  1.33e- 2
5 sarima sma1     0.233    0.248      0.939 3.48e- 1
6 sarima sma2    -0.327    0.0968    -3.38  7.98e- 4

The tidy() output provides the statistical significance of our model’s components. Looking at the p-values, almost all coefficients (ar1, ar2, ma1, sar1, and sma2) are highly significant (p-value < 0.05). However, it is interesting to note that the sma1 (Seasonal MA 1) coefficient has a p-value of approximately 0.348, indicating it is not statistically significant at the 5% level. This suggests that while the auto.arima algorithm included it to minimize the overall AICc, this specific first-order seasonal shock term doesn’t strongly drive the model compared to the second-order seasonal shock (sma2) and the seasonal autoregressive term (sar1). The negative non-seasonal AR coefficients (ar1 and ar2) suggest an oscillating, mean-reverting behavior in the short-term differences.

Forecast

fit_seasonal |> forecast(h = 24) |>
  autoplot(myseries |> filter(year(Month) >= 2010)) +
  labs(title = "Seasonal ARIMA Forecast: NSW Department Stores",
       y = "Turnover (A$ Million)")

The forecast reproduces the seasonal pattern (peaks in December, troughs in February) and continues the overall trend. The prediction intervals widen over the 2-year horizon as uncertainty grows.

Part III: White Noise, Random Walk, and Random Walk with Drift

Equations and Assumptions

White Noise (WN)

\[Y_t = \varepsilon_t, \quad \varepsilon_t \sim \text{iid } N(0, \sigma^2)\]

Each observation is an independent, identically distributed random draw.
No dependence on the past whatsoever.
\(E[Y_t] = 0\), \(\text{Var}(Y_t) = \sigma^2\) — both constant over time.

Random Walk (RW)

\[Y_t = Y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \text{iid } N(0, \sigma^2)\]

Today’s value is yesterday’s value plus a new shock.
By recursive substitution (assuming \(Y_0 = 0\)): \(Y_t = \sum_{i=1}^{t} \varepsilon_i\) — a cumulative sum of shocks.
\(E[Y_t] = 0\), but \(\text{Var}(Y_t) = t\sigma^2\) — variance grows linearly with time.

Random Walk with Drift (RW + Drift)

\[Y_t = \delta + Y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \text{iid } N(0, \sigma^2)\]

Same as RW but a constant \(\delta\) is added each period.
By recursive substitution: \(Y_t = \delta t + \sum_{i=1}^{t} \varepsilon_i\).
\(E[Y_t] = \delta t\) — the mean drifts linearly. \(\text{Var}(Y_t) = t\sigma^2\) — variance still grows.

Relationship Through Differencing

Differencing a Random Walk gives White Noise: \[\Delta Y_t = Y_t - Y_{t-1} = (Y_{t-1} + \varepsilon_t) - Y_{t-1} = \varepsilon_t \quad \Rightarrow \quad \text{WN}\]

Differencing a Random Walk with Drift gives White Noise + constant: \[\Delta Y_t = Y_t - Y_{t-1} = (\delta + Y_{t-1} + \varepsilon_t) - Y_{t-1} = \delta + \varepsilon_t \quad \Rightarrow \quad \text{WN} + \delta\]

Conversely, a RW is the cumulative sum of WN, and a RW with drift is the cumulative sum of WN + constant. They are linked through integration (cumulation) and differencing.

Simulation and Visualization

n <- 200
sigma <- 1
delta <- 0.2

eps <- rnorm(n, mean = 0, sd = sigma)

wn <- eps
rw <- cumsum(eps)
rw_drift <- cumsum(delta + eps)

sim_data <- tibble(
  t = rep(1:n, 3),
  value = c(wn, rw, rw_drift),
  process = rep(c("White Noise", "Random Walk", "RW with Drift"), each = n)
) |>
  mutate(process = factor(process, levels = c("White Noise", "Random Walk", "RW with Drift")))

ggplot(sim_data, aes(x = t, y = value)) +
  geom_line(color = "steelblue") +
  facet_wrap(~process, scales = "free_y", ncol = 1) +
  labs(title = "Simulated WN, RW, and RW with Drift",
       x = "Time", y = "Value") +
  theme_minimal()

What to observe in the plots:

White Noise fluctuates rapidly around zero with no visible pattern — values at different times are unrelated.
Random Walk wanders away from zero in an unpredictable path — it has no tendency to return to any particular level. The range it covers grows over time.
RW with Drift shows the same wandering behavior but superimposed on a systematic upward trend (because \(\delta = 0.2 > 0\)). The drift pulls the series steadily upward on average.

Verify the Differencing Relationship

rw_diff <- diff(rw)
rw_drift_diff <- diff(rw_drift)

verify_data <- tibble(
  t = rep(1:(n-1), 2),
  value = c(rw_diff, rw_drift_diff),
  series = rep(c("Differenced RW → WN (mean ≈ 0)",
                  "Differenced RW+Drift → WN + δ (mean ≈ 0.2)"), each = n-1)
)

ggplot(verify_data, aes(x = t, y = value)) +
  geom_line(color = "steelblue") +
  facet_wrap(~series, ncol = 1) +
  geom_hline(data = tibble(
    series = c("Differenced RW → WN (mean ≈ 0)",
               "Differenced RW+Drift → WN + δ (mean ≈ 0.2)"),
    yint = c(0, delta)),
    aes(yintercept = yint), color = "red", linetype = "dashed") +
  labs(title = "Verification: Differencing Recovers White Noise",
       x = "Time", y = "Value") +
  theme_minimal()

The red dashed lines show the theoretical means. Differencing the RW produces noise centered at 0 (pure WN). Differencing the RW with drift produces noise centered at \(\delta = 0.2\) (WN + constant). This confirms the mathematical relationship. ✓

cat("Mean of differenced RW:        ", mean(rw_diff), "\n")

Mean of differenced RW:         0.0277581

cat("Mean of differenced RW+Drift:  ", mean(rw_drift_diff), "\n")

Mean of differenced RW+Drift:   0.2277581

cat("Expected mean of WN:            0\n")

Expected mean of WN:            0

cat("Expected mean of WN + delta:    0.2\n")

Expected mean of WN + delta:    0.2

The sample means are close to the theoretical values (small deviations are expected due to finite sample size).

Comparison Table

Property	White Noise	Random Walk	RW with Drift
Trend	None	None (no systematic direction)	Linear trend (direction set by sign of \(\delta\))
Shock	Independent at each \(t\); no accumulation	Each shock \(\varepsilon_t\) permanently changes the level	Each shock permanently changes the level
Memory	None — past shocks have zero effect on future	Permanent — every past shock is embedded in \(Y_t\) forever	Permanent — same as RW
Stationarity	Yes — constant mean (0) and constant variance (\(\sigma^2\))	No — mean is 0 but variance (\(t\sigma^2\)) grows without bound	No — both mean (\(\delta t\)) and variance (\(t\sigma^2\)) change over time
Forecast	Flat line at the mean (0)	Flat line at the last observed value \(Y_T\)	Trending line from last value: \(Y_T + \delta h\) for \(h\) steps ahead