Module 8

ARIMA

setwd("~/Library/Mobile Documents/com~apple~CloudDocs/Boston College/Spring 2026/Forecasting/Discussions/Module 8")

library(fpp3)       
library(fredr)

fredr_set_key(Sys.getenv("FRED_API_KEY"))

indpro_raw <- fredr(
  series_id = "INDPRO",
  observation_start = as.Date("2010-01-01")
)

indpro <- indpro_raw |>
  mutate(Month = yearmonth(date)) |>
  select(Month, value) |>
  rename(IP = value) |>
  as_tsibble(index = Month)

indpro |> autoplot(IP) +
  labs(title = "US Industrial Production Index",
       y = "Index (2017 = 100)", x = NULL)

indpro |>
  gg_tsdisplay(IP, plot_type = "partial")

Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
ℹ Please use `ggtime::gg_tsdisplay()` instead.

indpro |>
  gg_tsdisplay(difference(IP), plot_type = "partial")

Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_line()`).

Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).

Parameters:

d = 1: The ACF of the raw series decays slowly and nearly linearly, indicating non-stationarity and requiring one difference.

p = 0 and q =2: It is hard to tell by just looking at the ACF and PACF plot, but I chose these values because the ACF plot of the differenced series shows significant spikes at lags 1 and 2 and then cuts off, which is the signature of an MA(2) process.

ARIMA

fit <- indpro |>
  model(auto = ARIMA(IP))

report(fit)

Series: IP 
Model: ARIMA(0,1,3) 

Coefficients:
         ma1      ma2      ma3
      0.1997  -0.2529  -0.1503
s.e.  0.0707   0.0729   0.0720

sigma^2 estimated as 1.375:  log likelihood=-303.21
AIC=614.42   AICc=614.63   BIC=627.47

Interpretation: My manual ARIMA(0, 1, 2) and the automatically selected ARIMA(0, 1, 3) agree on the MA form and differ only by one lag, suggesting my reading of the correlograms was consistent with the algorithm’s formal AICc-based selection.

Forecast

fit |>
  forecast(h = 12) |>
  autoplot(indpro, level = 95) +
  labs(title = "Industrial Production: Auto ARIMA Forecast",
       subtitle = "12-month forecast with 95% prediction interval",
       y = "IP Index", x = NULL)

SARIMA

houst_raw <- fredr(
  series_id = "HOUSTNSA",
  observation_start = as.Date("2010-01-01")
)

houst <- houst_raw |>
  mutate(Month = yearmonth(date)) |>
  select(Month, value) |>
  rename(Starts = value) |>
  as_tsibble(index = Month)

houst |> autoplot(Starts) +
  labs(title = "US Housing Starts (Not Seasonally Adjusted)",
       y = "Thousands of units", x = NULL)

houst |> gg_season(Starts) +
  labs(title = "Seasonal plot of Housing Starts",
       y = "Thousands of units")

Warning: `gg_season()` was deprecated in feasts 0.4.2.
ℹ Please use `ggtime::gg_season()` instead.

fit_sarima <- houst |>
  model(sarima = ARIMA(Starts))

report(fit_sarima)

Series: Starts 
Model: ARIMA(0,1,3)(1,1,0)[12] 

Coefficients:
          ma1     ma2      ma3     sar1
      -0.4979  -0.049  -0.1808  -0.4353
s.e.   0.0733   0.088   0.0876   0.0673

sigma^2 estimated as 107.3:  log likelihood=-675.8
AIC=1361.6   AICc=1361.94   BIC=1377.56

Parameters

p = 0: No non-seasonal AR terms, meaning housing starts are not directly regressed on their own recent past values.

d = 1: One non-seasonal difference was applied to remove the trend in housing starts.

q = 3: Three non-seasonal MA terms, meaning this month’s housing starts depend on the forecast errors from the previous three months.

P = 1: One seasonal AR term, meaning this month’s value depends on the value from 12 months ago.

D = 1: One seasonal difference was applied to remove the annual seasonal cycle.

Q = 0: No seasonal MA terms were included.

Coefficients

ma1 = -0.4979 (significant): A negative and strongly significant coefficient meaning a positive shock last month is followed by a downward correction of about half that size this month, capturing short-run mean reversion.

ma2 = -0.049 (not significant): Essentially zero and statistically insignificant, indicating that two-month-ago shocks have no meaningful effect on current housing starts.

ma3 = -0.1808 (significant): A smaller but still significant negative effect, suggesting that shocks from three months ago continue to exert a modest downward correction as housing starts return to trend.

sar1 = -0.4353 (significant): A strongly significant negative coefficient meaning that if last year’s same-month residual was unusually high, this month’s residual tends to be unusually low, capturing year-over-year mean reversion after seasonal differencing.

White Noise and Random Walk

White Noise:

\[ y_t = \varepsilon_t \]

Random Walk:

\[y_t = y_{t-1} + \varepsilon_t\]

Random Walk with Drift:

\[y_t = c + y_{t-1} + \varepsilon_t\]

Differencing a Random Walk:

\[\Delta y_t = y_t - y_{t-1} = \varepsilon_t\]

Differencing a Random Walk with Drift:

\[\Delta y_t = y_t - y_{t-1} = c + \varepsilon_t\]

Where:

\(y_t\) is the value of the series at time t
\(\varepsilon_t\) is a random shock with a mean of 0 and constant variance of \(\sigma^2\)
\(c\) is a constant drift term that adds a systematic push each period

n <- 200
eps <- rnorm(n, mean = 0, sd = 1)

wn  <- eps
rw  <- cumsum(eps)
rwd <- cumsum(0.3 + eps)

sim <- tibble(
  t = 1:n,
  `White Noise` = wn,
  `Random Walk` = rw,
  `Random Walk with Drift` = rwd
) |>
  pivot_longer(cols = -t, names_to = "Process", values_to = "Value") |>
  as_tsibble(index = t, key = Process)

sim |>
  autoplot(Value) +
  facet_wrap(~ Process, scales = "free_y", ncol = 1) +
  labs(title = "Simulated White Noise, Random Walk, and Random Walk with Drift",
       y = "Value", x = "Time") +
  theme(legend.position = "none")

Interpretation: The three simulated series differ in appearance and behavior. White noise fluctuates tightly around zero with no trend, no memory, and constant variance, making it the only stationary process of the three. The random walk drifts aimlessly because each new value is built on the previous one plus a fresh shock, giving it permanent memory and a variance that grows linearly with time. The random walk with drift looks similar but gains a systematic direction from the constant term, producing a visible trend on top of the random walk.

Comparison

Feature	White Noise	Random Walk	Random Walk with Drift
Trend	None	None, but wanders unpredictably	Upward or downward depending on sign of \(c\)
Shock	Each value is a fresh shock	Each period adds a new shock to the previous value	Each period adds a new shock plus a constant drift
Memory	None	Permanent	Permanent
Stationarity	Yes	No, variance grows with time	No, mean and variance grow with time
Forecast	Flat at the mean (zero)	Flat at the last observed value	Linear trend from the last value with slope \(c\)