ARIMA, ADL, ARCH, and GARCH Models

Author

Ethan

Published

April 30, 2026

Part 1

ARIMA(p, d, q)

ARIMA models a time series using its own past values and past forecast errors, applied to the series after differencing it d times; \(\tilde{y}_t\) below denotes the differenced series.

\[ \tilde{y}_t = \mu + \phi_1 \tilde{y}_{t-1} + \cdots + \phi_p \tilde{y}_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q} \]

The difference between this and ARCH/GARCH is that ARIMA models the conditional mean and assumes homoskedastic (constant-variance) errors.
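As a concrete illustration, here is a minimal sketch fitting an ARIMA(1, 1, 1) with base R's arima(); the series y is simulated, not from the text.

# Minimal sketch: ARIMA(1,1,1) on a simulated integrated (I(1)) toy series
set.seed(42)
y <- cumsum(rnorm(500))                    # random walk, so d = 1 is appropriate
fit_arima <- arima(y, order = c(1, 1, 1))  # p = 1 AR lag, d = 1 difference, q = 1 MA lag
coef(fit_arima)
# Note: arima() writes the MA part with plus signs (epsilon_t + theta_1 epsilon_{t-1}),
# so its theta estimate has the opposite sign of the theta in the equation above.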

ADL(p, q)

The ADL model builds on the AR structure with a regressor and its lags:

\[ y_t = \mu + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \beta_0 x_t + \beta_1 x_{t-1} + \cdots + \beta_q x_{t-q} + \varepsilon_t \]

Adding exogenous regressors to ARIMA in the same way gives the ARIMAX framework. Both ARIMA and ADL model only the mean, with homoskedastic errors.
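Because the ADL model is linear in its lags, it can be estimated by OLS. Here is a minimal sketch of an ADL(1, 1) via lm(); the data frame and its columns are placeholders, not from the text.

# Minimal sketch: ADL(1,1) estimated by OLS on placeholder data
library(dplyr)
set.seed(1)
df <- data.frame(x = rnorm(200))
df$y <- 0.5 + 0.3 * df$x + rnorm(200)   # toy data-generating process
df <- df |>
  mutate(y_l1 = dplyr::lag(y, 1),       # y_{t-1}
         x_l1 = dplyr::lag(x, 1))       # x_{t-1}
fit_adl <- lm(y ~ y_l1 + x + x_l1, data = df)
summary(fit_adl)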

ARCH(q)

A method for modelling the conditional variance, capturing volatility clustering: large shocks tend to be followed by large shocks.

Mean equation:

\[ y_t = \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \quad z_t \overset{\text{i.i.d.}}{\sim} N(0,1) \]

Variance equation:

\[ \sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2 \]

While ARIMA makes the mean autoregressive, ARCH makes the variance autoregressive: today's conditional variance depends on past squared shocks.

GARCH(p, q) Generalized ARCH

Adds lags of the conditional variance itself (\(\sigma_{t-j}^2\)) to the ARCH specification.

Mean equation:

\[ y_t = \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \quad z_t \overset{\text{i.i.d.}}{\sim} N(0,1) \]

Variance equation:

\[ \sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \cdots + \beta_p \sigma_{t-p}^2 \]

When p = 0, GARCH reduces to ARCH, and when the variance lags (\(\beta_j\)) contribute little, GARCH behaves much like ARCH. In that case it is better to just use ARCH, since it is less complex while yielding essentially the same results (see the sketch below).
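One way to check whether the variance lags earn their keep is to fit both specifications and compare information criteria. A sketch using rugarch, where r is a placeholder return series:

library(rugarch)

r <- rnorm(1000)   # placeholder returns; substitute a real series
spec_arch  <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 0)),  # ARCH(1)
  mean.model     = list(armaOrder = c(0, 0)))
spec_garch <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),  # GARCH(1,1)
  mean.model     = list(armaOrder = c(0, 0)))
fit_arch  <- ugarchfit(spec_arch,  data = r)
fit_garch <- ugarchfit(spec_garch, data = r)
# Lower AIC/BIC wins; if they are close, prefer the simpler ARCH model
cbind(ARCH = infocriteria(fit_arch), GARCH = infocriteria(fit_garch))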

Part 2

Data

library(rugarch)
library(ggplot2)
library(patchwork)
library(scales)
library(knitr)
library(tidyquant)
library(dplyr)
# Download five years of daily S&P 500 prices, then compute simple returns (%)
sp500_daily <- tq_get("^GSPC",
                      from = "2021-03-01",
                      to   = "2026-03-31")
sp500_daily <- sp500_daily |>
  mutate(returns = (adjusted / dplyr::lag(adjusted) - 1) * 100)

ggplot(sp500_daily |> na.omit(), aes(x = date, y = returns)) +
  geom_line(color = "blue", linewidth = 0.3, alpha = 0.8) +
  geom_hline(yintercept = mean(sp500_daily$returns, na.rm = TRUE), color = "orange",
             linetype = "dashed", linewidth = 0.8) +
  labs(title = "S&P 500 Daily Returns",
       subtitle = "Dashed line = sample mean",
       x = NULL, y = "Return (%)") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", color = "black"))

The plot shows clear volatility clustering: turbulent and calm periods each arrive in bunches rather than being spread evenly through time.

Model Specification (Pre-estimation)

Mean equation:

\[y_t = \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \quad z_t \overset{\text{i.i.d.}}{\sim} N(0,1)\]

Variance equation:

\[\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2\]

# GARCH(1,1) with a constant mean and Gaussian innovations
spec <- ugarchspec(
  variance.model     = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model         = list(armaOrder = c(0, 0), include.mean = TRUE),
  distribution.model = "norm"
)

# Fit to the return series (dropping the leading NA)
fit <- ugarchfit(spec = spec, data = na.omit(sp500_daily$returns))
fit

*---------------------------------*
*          GARCH Model Fit        *
*---------------------------------*

Conditional Variance Dynamics   
-----------------------------------
GARCH Model : sGARCH(1,1)
Mean Model  : ARFIMA(0,0,0)
Distribution    : norm 

Optimal Parameters
------------------------------------
        Estimate  Std. Error  t value Pr(>|t|)
mu      0.073698    0.023652   3.1160 0.001833
omega   0.034497    0.008918   3.8685 0.000110
alpha1  0.104803    0.017859   5.8682 0.000000
beta1   0.862712    0.022040  39.1435 0.000000

Robust Standard Errors:
        Estimate  Std. Error  t value Pr(>|t|)
mu      0.073698    0.021746   3.3890 0.000701
omega   0.034497    0.010489   3.2888 0.001006
alpha1  0.104803    0.021885   4.7888 0.000002
beta1   0.862712    0.022987  37.5310 0.000000

LogLikelihood : -1732.655 

Information Criteria
------------------------------------
                   
Akaike       2.7220
Bayes        2.7382
Shibata      2.7220
Hannan-Quinn 2.7281

Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
                        statistic p-value
Lag[1]                    0.04202  0.8376
Lag[2*(p+q)+(p+q)-1][2]   0.47659  0.7047
Lag[4*(p+q)+(p+q)-1][5]   3.01256  0.4048
d.o.f=0
H0 : No serial correlation

Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
                        statistic p-value
Lag[1]                     0.4459  0.5043
Lag[2*(p+q)+(p+q)-1][5]    4.8945  0.1617
Lag[4*(p+q)+(p+q)-1][9]    7.7663  0.1431
d.o.f=2

Weighted ARCH LM Tests
------------------------------------
            Statistic Shape Scale P-Value
ARCH Lag[3]    0.1355 0.500 2.000 0.71280
ARCH Lag[5]    6.9047 1.440 1.667 0.03670
ARCH Lag[7]    7.7790 2.315 1.543 0.05873

Nyblom stability test
------------------------------------
Joint Statistic:  0.5499
Individual Statistics:              
mu     0.06488
omega  0.04749
alpha1 0.15132
beta1  0.12752

Asymptotic Critical Values (10% 5% 1%)
Joint Statistic:         1.07 1.24 1.6
Individual Statistic:    0.35 0.47 0.75

Sign Bias Test
------------------------------------
                   t-value     prob sig
Sign Bias           1.9120 0.056102   *
Negative Sign Bias  0.5961 0.551206    
Positive Sign Bias  0.2371 0.812654    
Joint Effect       11.3949 0.009771 ***


Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
  group statistic p-value(g-1)
1    20     44.75    0.0007423
2    30     55.05    0.0024445
3    40     73.28    0.0007297
4    50     75.80    0.0083429


Elapsed time : 0.02457881 

Estimated Equations (Filled-In Coefficients)

cf        <- coef(fit)
mu_hat    <- cf["mu"]
omega_hat <- cf["omega"]
alpha_hat <- cf["alpha1"]
beta_hat  <- cf["beta1"]

persist   <- alpha_hat + beta_hat       # volatility persistence
lr_var    <- omega_hat / (1 - persist)  # long-run (unconditional) variance
lr_vol    <- sqrt(as.numeric(lr_var))   # long-run daily volatility (%)
half_life <- log(0.5) / log(persist)    # days for a shock to decay halfway

params_tbl <- data.frame(
  Parameter     = c("mu", "omega", "alpha1", "beta1"),
  Estimate      = round(c(mu_hat, omega_hat, alpha_hat, beta_hat), 5),
  `Std. Error`  = round(fit@fit$matcoef[, 2], 5),
  `t-stat`      = round(fit@fit$matcoef[, 3], 3),
  `p-value`     = signif(fit@fit$matcoef[, 4], 3),
  row.names     = NULL,   # avoid duplicating the parameter names as row labels
  check.names   = FALSE   # keep readable column headers
)
kable(params_tbl, caption = "GARCH(1,1) Estimated Coefficients")
GARCH(1,1) Estimated Coefficients

Parameter  Estimate  Std. Error   t-stat  p-value
mu          0.07370     0.02365    3.116  0.00183
omega       0.03450     0.00892    3.869  0.00011
alpha1      0.10480     0.01786    5.868  0.00000
beta1       0.86271     0.02204   39.143  0.00000

Updated mean equation:

\[\hat{y}_t = {0.07370} + \varepsilon_t\]

Updated variance equation:

\[\hat{\sigma}_t^2 = {0.0345} + {0.1048}\,\varepsilon_{t-1}^2 + {0.8627}\,\sigma_{t-1}^2\]

Conditional Volatility Plot

cond_vol <- sigma(fit)
df_vol   <- data.frame(date = sp500_daily$date[!is.na(sp500_daily$returns)],
                       vol  = as.numeric(cond_vol))

ggplot(df_vol, aes(x = date, y = vol)) +
  geom_line(color = "orange", linewidth = 0.5) +
  geom_hline(yintercept = lr_vol, color = "red",
             linetype = "dashed", linewidth = 0.9) +
  annotate("text", x = max(df_vol$date) - 200, y = lr_vol + 0.05,
           label = paste0("LR Vol = ", round(lr_vol, 3), "%"),
           color = "red", size = 3.5) +
  labs(title = expression("Conditional Volatility " * hat(sigma)[t]),
       subtitle = "Dashed = long-run unconditional volatility",
       x = NULL, y = expression(hat(sigma)[t] * " (%)")) +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold", color = "black"))

Interpretation

Coefficient Interpretation

  • \(\hat{\mu} = 0.0737\): Average daily return ≈ 7.4 basis points

  • \(\hat{\omega} = 0.0345\): Variance intercept

  • \(\hat{\alpha}_1 = 0.1048\) (ARCH): A unit increase in yesterday’s squared shock raises today’s conditional variance by 0.1048. The low magnitude means individual shocks don’t spike volatility dramatically on their own.

  • \(\hat{\beta}_1 = 0.8627\) (GARCH): Yesterday’s conditional variance carries forward with weight 0.8627. Variance is highly persistent, meaning high follows high and low follows low.

Volatility Clustering

The ACF of squared returns shows significant autocorrelation extending well beyond lag 20, the standard diagnostic for ARCH effects (the sketch below reproduces this check). The conditional volatility plot confirms that turbulent periods cluster together rather than being randomly distributed. This is precisely what \(\hat{\beta}_1 \approx 0.86\) encodes: today's variance is largely inherited from yesterday's.
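A minimal way to run that diagnostic on the return series used here, with base R's acf() and Box.test():

# ARCH-effect diagnostics: autocorrelation in squared returns
r <- na.omit(sp500_daily$returns)
acf(r^2, lag.max = 30, main = "ACF of squared returns")  # slow decay = clustering
Box.test(r^2, lag = 20, type = "Ljung-Box")              # H0: no serial correlation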

Persistence

\[\hat{\alpha}_1 + \hat{\beta}_1 = 0.1048 + 0.8627 = 0.9675\]

cat("Persistence (alpha + beta):", round(persist, 6), "\n")
Persistence (alpha + beta): 0.967514 

Persistence of 0.9675 is very high, meaning volatility reverts to its long-run level only slowly.

Long-Run Variance and Mean Reversion
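Both quantities below come from the standard GARCH(1,1) formula for the unconditional variance, filled in with the estimates above:

\[\bar{\sigma}^2 = \frac{\hat{\omega}}{1 - \hat{\alpha}_1 - \hat{\beta}_1} = \frac{0.0345}{1 - 0.9675} \approx 1.062\]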

derived <- data.frame(
  Quantity = c("Persistence (α + β)", "Long-run daily variance",
               "Long-run daily vol",
               "Half-life of shock (days)"),
  Value    = c(round(persist, 6),
               round(as.numeric(lr_var), 5),
               round(sqrt(as.numeric(lr_var)), 5),
               round(as.numeric(half_life), 1))
)
kable(derived, caption = "Derived GARCH(1,1) Quantities")
Derived GARCH(1,1) Quantities

Quantity                       Value
Persistence (α + β)         0.967514
Long-run daily variance     1.061920
Long-run daily vol (%)      1.030500
Half-life of shock (days)   21.0

The unconditional daily volatility of ~1.03% means that whenever volatility spikes above or dips below that level, GARCH forecasts a gradual decay back towards it (see the sketch below).
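As a sketch of that mean reversion, rugarch's ugarchforecast() can project \(\hat{\sigma}_t\) forward from the fitted model; the 60-day horizon here is an arbitrary choice.

# Forecast sigma_t forward: it should decay toward the long-run level
fc <- ugarchforecast(fit, n.ahead = 60)
plot(as.numeric(sigma(fc)), type = "l",
     xlab = "Horizon (days)", ylab = "Forecast volatility (%)")
abline(h = lr_vol, lty = 2)   # dashed = long-run unconditional volatility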

Half-Life of a Volatility Shock

A volatility shock takes approximately 21 trading days to decay halfway back to the long-run level, meaning large shocks take a long time to be forgotten.
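This follows directly from the persistence, since a shock's effect on variance shrinks by the factor \(\hat{\alpha}_1 + \hat{\beta}_1\) each day:

\[t_{1/2} = \frac{\ln 0.5}{\ln(\hat{\alpha}_1 + \hat{\beta}_1)} = \frac{\ln 0.5}{\ln 0.9675} \approx 21 \text{ days}\]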