library(tidyverse)
library(patchwork)
library(knitr)
library(kableExtra)
library(gridExtra)
library(writexl)
library(tseries)
library(forecast)
library(quantmod)
library(rugarch)
Sys.setlocale("LC_TIME", "en_US.UTF-8")
remove(list=ls())Discussion14_GARCH
PART I. Comparison of Time Series Models: ARIMA, ADL, ARCH, and GARCH
1. ARIMA (Autoregressive Integrated Moving Average)
The ARIMA model captures persistence in the mean of a time series. It models the current value \(y_t\) based on its own past values (Autoregressive) and past random shocks (Moving Average).
Estimating Equation: \[y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} + \epsilon_t\]
Coefficient Interpretation:
\(c\) Constant/Intercept: The baseline level of the series.
\(\phi_i\) AR coefficients: Measures the influence of past observations on the current value.
\(\theta_j\) MA coefficients: Measures the impact of past random shocks on the current value.
\(\epsilon_t\) The unpredictable random error at time \(t\).
2. ADL (Autoregressive Distributed Lag)
Structurally, the ADL model is an extension of the AR (Autoregressive) process that incorporates exogenous lags. However, it shares the fundamental philosophy of the ARIMA framework by modeling the dynamic persistence of a time series. While it omits the Moving Average (MA) component found in ARIMA to maintain structural simplicity, ADL expands the scope from univariate analysis to a multivariate perspective, capturing how \(y_t\) depends on both its own history and the lagged impacts of external predictors (\(x\)).
Estimating Equation: \[y_t = \beta_0 + \sum_{i=1}^{p} \beta_i y_{t-i} + \sum_{j=0}^{q} \gamma_j x_{t-j} + \epsilon_t\]
Coefficient Interpretation:
\(\beta_0\)Intercept: The constant term in the mean relationship.
\(\beta_i\) Lagged \(y\) coefficients: Represents the persistence of \(y\) based on its own history.
\(\gamma_j\) Lagged \(x\) coefficients: Quantifies how past values of the exogenous predictor \(x\) affect the current \(y\).
\(x_{t-j}\) Exogenous regressor: Independent variables used to improve the model’s forecasting power.
3. ARCH (Autoregressive Conditional Heteroskedasticity)
The ARCH model focuses on the dynamics of the variance of errors. It assumes today’s volatility depends on the magnitude of past squared shocks.
Estimating Equation: \[\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i u_{t-i}^2\]
Coefficient Interpretation:
\(\sigma_t^2\)Conditional variance: The predicted volatility at time \(t\) given past information.
\(\alpha_0\) Baseline volatility constant: The long-term constant component of the variance.
\(\alpha_i\) ARCH terms: Measures how much past shocks increase today’s volatility.
4. GARCH (Generalized ARCH)
GARCH is a generalized version of ARCH, where today’s volatility depends on both past squared shocks and past volatility levels.
Estimating Equation: \[\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i u_{t-i}^2 + \sum_{j=1}^{q} \phi_j \sigma_{t-j}^2\]
Coefficient Interpretation:
\(\alpha_0\) Constant: The intercept of the variance equation.
\(\alpha_i\): ARCH effects: Represents the immediate impact of new information or “shocks”.
\(\phi_j\) GARCH effects: Measures the persistence of volatility over time.
Connections & Associations
Mean vs. Variance: ARIMA and ADL model the mean, while ARCH and GARCH model the volatility (conditional variance).
Evolutionary Path: ADL is essentially ARIMA with exogenous predictors. GARCH generalizes ARCH by adding lagged variance terms to capture long-term volatility more parsimoniously.
Part II
I chose the historical stock prices of TSMC (TSM) for this analysis. As a global leader in the semiconductor industry, its stock price is highly sensitive to macroeconomic shifts and industry cycles, exhibiting significant volatility patterns ideal for GARCH modeling.
I used Adjusted Price instead of the raw Close Price. Adjusted prices account for dividends and stock splits, providing a more accurate representation of total returns and preventing artificial volatility caused by corporate actions from biasing the model estimates.
# Fetch TSMC (TSM) historical data with adjusted price to account for dividends and splits
getSymbols("TSM", src = "yahoo")[1] "TSM"
price <- TSM$TSM.AdjustedTo ensure stationarity, I converted the raw prices into log returns. This helps remove the price trend and shifts the focus to the dynamic behavior of the returns.
# Calculate Log Returns
ret <- diff(log(price))[-1]
colnames(ret) <- "Returns"From the TSMC Daily Log Returns plot below, we can observe the following key characteristics:
Volatility Clustering: The plot exhibits clear volatility clustering, meaning “large changes tend to be followed by large changes, and small changes by small changes.” This is evident during major events such as the 2008 Financial Crisis, the 2020 COVID-19 pandemic, and recent AI-driven market shifts, where high-volatility spikes appear in prolonged clusters.
Mean Reversion: Despite periods of intense turbulence, the returns consistently oscillate around the zero axis over the long run. This stationarity provides a solid foundation for estimating conditional variance using the GARCH model.
# Plot the time series
plot(ret, main ="TSMC Daily Percentage Changes (Log Returns)", col = "blue")Model Equations and Variable Definitions
Textbook Theoretical Framework
According to the textbook definition, the GARCH(1,1) model system is:
\[ \begin{cases} R_t = \beta_0 + u_t, & u_t \sim \mathcal{N}(0, \sigma_t^2) \quad \text{(Mean Equation)} \\ \sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \phi_1 \sigma_{t-1}^2 & \text{(Variance Equation)} \end{cases} \]
Detailed Explanation of Variables:
| Symbol | Definition |
|---|---|
| \(R_t\) | The percentage change in period \(t\) (returns) |
| \(u_t\) | Error term, assumed to be normally distributed with conditional mean zero |
| \(\sigma_t^2\) | Variance \(\sigma_t^2\) which depends on \(t\) following the GARCH recursion |
| \(\beta_0\) | Unknown coefficient in the mean equation (constant return) |
| \(\alpha_0, \alpha_1, \phi_1\) | Unknown coefficients in the variance equation |
Empirical Results for TSM
Updating the equations based on the results from R:
\[ \begin{cases} R_t = 0.001005 + u_t \\ \sigma_t^2 = 0.000004 + 0.048237 u_{t-1}^2 + 0.941859 \sigma_{t-1}^2 \end{cases} \]
Analysis and Comments
- Volatility Clustering: The highly significant \(\alpha_1\) (0.0482) confirms volatility clustering in TSMC’s returns. This indicates that yesterday’s market shocks (\(u_{t-1}^2\)) directly impact today’s risk level.
- Persistence: The volatility persistence (\(\alpha_1 + \phi_1\)) is 0.9901. This value, being very close to 1, indicates that volatility is highly persistent; shocks will affect the system for an extended period.
- Long-run Variance & Mean reversion: Despite high persistence, since the sum is less than 1, the process exhibits mean reversion. Volatility will eventually return to its long-run average level (approximately 0.000404).
# Specify GARCH(1,1) model parameters
spec <- ugarchspec(
variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
distribution.model = "norm"
)
# Fit the GARCH model to the data
fit <- ugarchfit(spec = spec, data = ret)
fit
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Conditional Variance Dynamics
-----------------------------------
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(0,0,0)
Distribution : norm
Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu 0.001005 0.000255 3.9356 0.000083
omega 0.000004 0.000002 2.0243 0.042944
alpha1 0.048237 0.006492 7.4297 0.000000
beta1 0.941859 0.008940 105.3547 0.000000
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.001005 0.000240 4.18450 0.000029
omega 0.000004 0.000010 0.46286 0.643462
alpha1 0.048237 0.019567 2.46525 0.013692
beta1 0.941859 0.033158 28.40477 0.000000
LogLikelihood : 12256.83
Information Criteria
------------------------------------
Akaike -5.0454
Bayes -5.0401
Shibata -5.0454
Hannan-Quinn -5.0436
Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
statistic p-value
Lag[1] 10.18 0.0014193
Lag[2*(p+q)+(p+q)-1][2] 10.33 0.0015650
Lag[4*(p+q)+(p+q)-1][5] 13.96 0.0008411
d.o.f=0
H0 : No serial correlation
Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
statistic p-value
Lag[1] 2.751 0.09716
Lag[2*(p+q)+(p+q)-1][5] 3.521 0.32009
Lag[4*(p+q)+(p+q)-1][9] 5.268 0.39119
d.o.f=2
Weighted ARCH LM Tests
------------------------------------
Statistic Shape Scale P-Value
ARCH Lag[3] 0.009825 0.500 2.000 0.9210
ARCH Lag[5] 0.564401 1.440 1.667 0.8645
ARCH Lag[7] 2.187321 2.315 1.543 0.6777
Nyblom stability test
------------------------------------
Joint Statistic: 3.1442
Individual Statistics:
mu 0.1172
omega 1.2047
alpha1 0.4181
beta1 0.4381
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.07 1.24 1.6
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
------------------------------------
t-value prob sig
Sign Bias 0.8992 0.3686
Negative Sign Bias 0.3922 0.6949
Positive Sign Bias 1.2326 0.2178
Joint Effect 2.1549 0.5409
Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
group statistic p-value(g-1)
1 20 112.3 2.944e-15
2 30 120.2 4.570e-13
3 40 148.9 9.395e-15
4 50 169.5 3.429e-15
Elapsed time : 0.1763082
Plot illustrates the returns superimposed with the dynamic risk boundaries (\(\pm 2\sigma_t\)). The expansion of these red bands during high-volatility periods (such as 2008 and 2020) provides direct visual evidence that the model successfully captures volatility clustering and the persistent nature of market shocks, validating the effectiveness of the estimated \(\alpha_1\) and \(\phi_1\) parameters.
plot(fit, which = 1)