A time series is simply a sequence of data points collected over time. The key feature is that the order matters - each observation is linked to a specific moment in time.
Examples of time series include daily stock prices, monthly unemployment rates, quarterly GDP, and hourly temperature readings.
What makes time series special?
Unlike regular data where observations are independent (like heights of random people), time series observations are typically dependent on each other. Today’s stock price is related to yesterday’s price. This dependency is both a challenge (standard methods that assume independent observations no longer apply) and an opportunity (the past carries information we can model and forecast).
The distinction between prices and returns below is a crucial concept that many students miss.
The Problem with Prices:
Stock prices tend to wander - they go up, down, and don’t settle around any particular value. Statistically, we call this non-stationarity. A price of $100 today tells us nothing about whether prices will average $100 in the future.
The Solution - Returns:
Returns measure the percentage change in price. Unlike prices, returns tend to fluctuate around zero (or a small positive average). This is stationarity - the statistical properties stay stable over time.
Simple Return: \[R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1\]
This is the intuitive “percentage change” - if a stock goes from $100 to $105, the simple return is 5%.
Log Return (Continuously Compounded Return): \[r_t = \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln(P_t) - \ln(P_{t-1})\]
Why do we prefer log returns?
Time Additivity: Multi-period log returns simply add up. If Monday’s log return is 2% and Tuesday’s is 3%, the two-day log return is exactly 5%. Simple returns don’t add this way (they compound).
Symmetry: A +10% log return followed by -10% log return brings you back to start. Simple returns: +10% then -10% leaves you at 99% of original.
Statistical Properties: Log returns are more likely to be normally distributed.
Approximation: For small returns (roughly below 10%), log returns ≈ simple returns, since \(\ln(1 + R) \approx R\) when \(R\) is small. A quick numerical check of these properties follows below.
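A minimal numerical sketch (made-up prices, base R only) verifying the additivity and symmetry claims above:
# Hypothetical three-day price path
p <- c(100, 102, 105)
simple_ret <- diff(p) / head(p, -1)     # simple return for each day
log_ret <- diff(log(p))                 # log return for each day
sum(log_ret)                            # two-day log return: the daily values add up...
log(p[3] / p[1])                        # ...and equal the total log return exactly
prod(1 + simple_ret) - 1                # simple returns compound rather than add
100 * exp(0.10) * exp(-0.10)            # +10% then -10% log return: exactly 100
100 * (1 + 0.10) * (1 - 0.10)           # +10% then -10% simple return: 99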
In this analysis, we use LOG RETURNS throughout.
Our analysis follows this logical flow: download prices → compute log returns → explore their distribution → test for stationarity → fit an ARIMA model for the conditional mean → check residuals and test for ARCH effects → fit a GARCH model for the conditional variance → forecast returns and volatility.
Each step builds on the previous one. We can’t skip steps because later steps assume earlier conditions are met.
# Run this ONCE to install packages (not every time you knit)
install.packages(c("quantmod", "xts", "zoo", "tseries", "forecast",
"rugarch", "FinTS", "lmtest", "ggplot2", "moments",
"knitr", "kableExtra", "scales"))
library(quantmod)
library(xts)
library(zoo)
library(tseries)
library(forecast)
library(rugarch)
library(FinTS)
library(ggplot2)
library(moments)
library(knitr)
library(kableExtra)
library(scales)
message("All packages loaded successfully!")
We need historical stock prices to analyze. Yahoo Finance provides free daily data including Open, High, Low, Close, Volume, and Adjusted Close. We use Adjusted Close because it reflects the true value an investor would have experienced, accounting for corporate actions like stock splits and dividend payments.
# Define parameters
ticker <- "AAPL"
start_date <- "2019-01-01"
end_date <- Sys.Date()
# Download data from Yahoo Finance
stock_data <- getSymbols(
Symbols = ticker,
src = "yahoo",
from = start_date,
to = end_date,
auto.assign = FALSE
)
# Extract Adjusted Close price
price <- Ad(stock_data)
colnames(price) <- "Price"
# Generate data summary
data_summary <- paste0(
"DATA SUMMARY\n",
"============\n",
"Stock: ", ticker, "\n",
"Period: ", as.character(index(price)[1]), " to ",
as.character(index(price)[nrow(price)]), "\n",
"Trading days: ", nrow(price), "\n",
"Starting price: $", round(as.numeric(price[1]), 2), "\n",
"Ending price: $", round(as.numeric(price[nrow(price)]), 2), "\n",
"Missing values: ", sum(is.na(price)), "\n"
)
cat(data_summary)
## DATA SUMMARY
## ============
## Stock: AAPL
## Period: 2019-01-02 to 2025-12-19
## Trading days: 1753
## Starting price: $37.54
## Ending price: $273.67
## Missing values: 0
A price chart reveals the trend and volatility patterns in the data.
price_df <- data.frame(Date = index(price), Price = as.numeric(price))
ggplot(price_df, aes(x = Date, y = Price)) +
geom_line(color = "steelblue", linewidth = 0.6) +
labs(title = paste(ticker, "Daily Adjusted Closing Price"),
x = "Date", y = "Price (USD)") +
theme_minimal(base_size = 12) +
scale_y_continuous(labels = dollar_format())
Daily Closing Prices
Definition of Log Return: \[r_t = \ln(P_t) - \ln(P_{t-1})\]
# Calculate log returns
returns <- diff(log(price))
returns <- na.omit(returns)
colnames(returns) <- "Returns"
returns_vec <- as.numeric(returns)
# Store key dates
start_dt <- as.character(index(price)[1])
end_dt <- as.character(index(price)[nrow(price)])
n_obs <- nrow(price)
Understanding these statistics is crucial for financial analysis:
Location Measures: Mean (average return) and Median (middle value)
Dispersion Measures: Standard Deviation (volatility/risk measure), Variance, Range
Shape Measures: Skewness (asymmetry of the distribution) and Kurtosis (tail thickness)
# Calculate all statistics
mean_ret <- mean(returns_vec)
median_ret <- median(returns_vec)
sd_ret <- sd(returns_vec)
var_ret <- var(returns_vec)
min_ret <- min(returns_vec)
max_ret <- max(returns_vec)
skew_ret <- skewness(returns_vec)
kurt_ret <- kurtosis(returns_vec)
excess_kurt <- kurt_ret - 3
annual_ret <- mean_ret * 252
annual_vol <- sd_ret * sqrt(252)
# Create comprehensive summary
summary_text <- paste0(
"\n=== SUMMARY STATISTICS ===\n\n",
"CENTRAL TENDENCY\n",
" Mean daily return: ", sprintf("%.6f", mean_ret),
" (", sprintf("%.4f%%", mean_ret * 100), ")\n",
" Median daily return: ", sprintf("%.6f", median_ret), "\n",
" Annualized return: ", sprintf("%.2f%%", annual_ret * 100), "\n\n",
"RISK MEASURES\n",
" Daily volatility: ", sprintf("%.6f", sd_ret),
" (", sprintf("%.4f%%", sd_ret * 100), ")\n",
" Annualized volatility: ", sprintf("%.2f%%", annual_vol * 100), "\n",
" Minimum return: ", sprintf("%.4f%%", min_ret * 100), "\n",
" Maximum return: ", sprintf("%.4f%%", max_ret * 100), "\n\n",
"DISTRIBUTION SHAPE\n",
" Skewness: ", sprintf("%.4f", skew_ret),
ifelse(abs(skew_ret) < 0.5, " (approximately symmetric)\n",
ifelse(skew_ret < 0, " (left-skewed: larger losses)\n",
" (right-skewed: larger gains)\n")),
" Kurtosis: ", sprintf("%.4f", kurt_ret), "\n",
" Excess Kurtosis: ", sprintf("%.4f", excess_kurt),
ifelse(excess_kurt > 0, " (fat tails: extreme events more likely)\n",
" (thin tails)\n")
)
cat(summary_text)
##
## === SUMMARY STATISTICS ===
##
## CENTRAL TENDENCY
## Mean daily return: 0.001134 (0.1134%)
## Median daily return: 0.001306
## Annualized return: 28.57%
##
## RISK MEASURES
## Daily volatility: 0.019586 (1.9586%)
## Annualized volatility: 31.09%
## Minimum return: -13.7708%
## Maximum return: 14.2617%
##
## DISTRIBUTION SHAPE
## Skewness: -0.0837 (approximately symmetric)
## Kurtosis: 9.6024
## Excess Kurtosis: 6.6024 (fat tails: extreme events more likely)
returns_df <- data.frame(Date = index(returns), Returns = returns_vec)
ggplot(returns_df, aes(x = Date, y = Returns)) +
geom_line(color = "steelblue", linewidth = 0.4) +
geom_hline(yintercept = 0, color = "red", linetype = "dashed") +
labs(title = paste(ticker, "Daily Log Returns"),
subtitle = "Notice: Periods of high volatility cluster together",
x = "Date", y = "Log Return") +
theme_minimal(base_size = 12) +
scale_y_continuous(labels = percent_format())
Daily Log Returns showing volatility clustering
ggplot(returns_df, aes(x = Returns)) +
geom_histogram(aes(y = after_stat(density)), bins = 50,
fill = "steelblue", color = "white", alpha = 0.7) +
stat_function(fun = dnorm, args = list(mean = mean_ret, sd = sd_ret),
color = "red", linewidth = 1.2, linetype = "dashed") +
labs(title = "Distribution of Returns vs Normal Distribution",
subtitle = "Blue = actual, Red dashed = normal. Note the fat tails.",
x = "Log Return", y = "Density") +
theme_minimal(base_size = 12)
Distribution of Returns vs Normal Distribution
The Jarque-Bera test checks whether data follows a normal distribution by examining skewness and kurtosis.
jb_test <- jarque.bera.test(returns_vec)
jb_result <- paste0(
"\n=== JARQUE-BERA TEST FOR NORMALITY ===\n\n",
"Hypotheses:\n",
" H0: Returns are normally distributed\n",
" H1: Returns are NOT normally distributed\n\n",
"Results:\n",
" Test Statistic: ", sprintf("%.4f", jb_test$statistic), "\n",
" p-value: ", sprintf("%.6f", jb_test$p.value), "\n\n",
"Conclusion:\n",
ifelse(jb_test$p.value < 0.05,
paste0(" REJECT H0 at 5% significance level.\n",
" Returns are NOT normally distributed.\n\n",
" Implications:\n",
" - Risk measures assuming normality may underestimate true risk\n",
" - Fat tails mean extreme losses are more likely than normal predicts\n",
" - This justifies using GARCH models for volatility\n"),
" Cannot reject H0. No strong evidence against normality.\n")
)
cat(jb_result)
##
## === JARQUE-BERA TEST FOR NORMALITY ===
##
## Hypotheses:
## H0: Returns are normally distributed
## H1: Returns are NOT normally distributed
##
## Results:
## Test Statistic: 3184.2718
## p-value: 0.000000
##
## Conclusion:
## REJECT H0 at 5% significance level.
## Returns are NOT normally distributed.
##
## Implications:
## - Risk measures assuming normality may underestimate true risk
## - Fat tails mean extreme losses are more likely than normal predicts
## - This justifies using GARCH models for volatility
Autocorrelation measures how correlated a time series is with its own past values. It answers: “If I know today’s return, does that tell me anything about tomorrow’s?”
Mathematical Definition: \[\rho_k = \frac{Cov(r_t, r_{t-k})}{Var(r_t)}\]
Interpretation: \(\rho_k \approx 0\) means returns \(k\) periods apart have no linear relationship; values near +1 or -1 indicate strong positive or negative linear dependence at lag \(k\).
Why This Matters:
The Efficient Market Hypothesis states that returns should be unpredictable. If we find significant autocorrelation, it suggests some predictability exists.
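As a quick sanity check of the definition (a minimal sketch reusing the returns_vec object computed earlier), the lag-1 autocorrelation can be computed by hand and compared with R’s acf():
# Manual lag-1 autocorrelation vs acf(); both use the mean-centered, n-divisor estimator
rc <- returns_vec - mean(returns_vec)
n_r <- length(rc)
rho1_manual <- sum(rc[-1] * rc[-n_r]) / sum(rc^2)
rho1_acf <- acf(returns_vec, plot = FALSE)$acf[2]   # element 1 is lag 0
c(manual = rho1_manual, acf = rho1_acf)             # the two values should agree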
ACF (Autocorrelation Function): Shows total correlation at each lag, including direct and indirect effects.
PACF (Partial ACF): Shows correlation at lag k after removing effects of intermediate lags. Isolates the direct relationship.
Pattern Recognition for Model Selection (a simulated illustration follows the table):
| True Model | ACF Pattern | PACF Pattern |
|---|---|---|
| AR(p) | Decays gradually | Cuts off after lag p |
| MA(q) | Cuts off after lag q | Decays gradually |
| ARMA(p,q) | Decays gradually | Decays gradually |
| White Noise | All near 0 | All near 0 |
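To make these patterns concrete, here is a small simulation sketch (hypothetical AR and MA coefficients) whose ACF/PACF plots reproduce the table:
# Simulate an AR(1) and an MA(1) series and inspect their ACF/PACF signatures
set.seed(123)
ar1_sim <- arima.sim(model = list(ar = 0.7), n = 1000)   # AR(1) with phi = 0.7
ma1_sim <- arima.sim(model = list(ma = 0.7), n = 1000)   # MA(1) with theta = 0.7
par(mfrow = c(2, 2))
acf(ar1_sim, main = "AR(1): ACF decays gradually")
pacf(ar1_sim, main = "AR(1): PACF cuts off after lag 1")
acf(ma1_sim, main = "MA(1): ACF cuts off after lag 1")
pacf(ma1_sim, main = "MA(1): PACF decays gradually")
par(mfrow = c(1, 1))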
par(mfrow = c(2, 1))
acf(returns_vec, lag.max = 30, main = paste("ACF of", ticker, "Returns"),
col = "steelblue", lwd = 2)
pacf(returns_vec, lag.max = 30, main = paste("PACF of", ticker, "Returns"),
col = "steelblue", lwd = 2)
ACF and PACF of Returns
par(mfrow = c(1, 1))
# Calculate ACF/PACF and analyze
n <- length(returns_vec)
ci_bound <- 1.96 / sqrt(n)
acf_result <- acf(returns_vec, lag.max = 30, plot = FALSE)
pacf_result <- pacf(returns_vec, lag.max = 30, plot = FALSE)
acf_values <- acf_result$acf[-1]
pacf_values <- pacf_result$acf
sig_acf <- which(abs(acf_values) > ci_bound)
sig_pacf <- which(abs(pacf_values) > ci_bound)
acf_analysis <- paste0(
"\n=== ACF/PACF ANALYSIS ===\n\n",
"Sample size: ", n, "\n",
"95% confidence bounds: ±", sprintf("%.4f", ci_bound), "\n\n",
"Significant ACF lags: ",
ifelse(length(sig_acf) == 0, "None", paste(sig_acf, collapse = ", ")), "\n",
"Significant PACF lags: ",
ifelse(length(sig_pacf) == 0, "None", paste(sig_pacf, collapse = ", ")), "\n\n",
"Interpretation:\n",
ifelse(length(sig_acf) == 0 && length(sig_pacf) == 0,
paste0(" Both ACF and PACF show no significant correlations.\n",
" Returns appear to be WHITE NOISE (unpredictable).\n",
" This is consistent with the Efficient Market Hypothesis.\n",
" Implication: ARIMA(0,0,0) may be appropriate.\n"),
paste0(" Some significant lags detected.\n",
" An ARIMA model may capture this predictability.\n"))
)
cat(acf_analysis)
##
## === ACF/PACF ANALYSIS ===
##
## Sample size: 1752
## 95% confidence bounds: ±0.0468
##
## Significant ACF lags: 1, 6, 7, 8, 9, 10, 18, 22
## Significant PACF lags: 1, 6, 8, 9, 18
##
## Interpretation:
## Some significant lags detected.
## An ARIMA model may capture this predictability.
The Ljung-Box test checks whether autocorrelations up to lag m are jointly equal to zero.
lb_5 <- Box.test(returns_vec, lag = 5, type = "Ljung-Box")
lb_10 <- Box.test(returns_vec, lag = 10, type = "Ljung-Box")
lb_20 <- Box.test(returns_vec, lag = 20, type = "Ljung-Box")
lb_result <- paste0(
"\n=== LJUNG-BOX TEST FOR WHITE NOISE ===\n\n",
"Hypotheses:\n",
" H0: Returns are white noise (no autocorrelation)\n",
" H1: Autocorrelation exists\n\n",
"Results:\n",
sprintf(" Lag 5: Q = %7.4f, p-value = %.4f %s\n",
lb_5$statistic, lb_5$p.value,
ifelse(lb_5$p.value < 0.05, "→ Reject H0", "→ Cannot reject H0")),
sprintf(" Lag 10: Q = %7.4f, p-value = %.4f %s\n",
lb_10$statistic, lb_10$p.value,
ifelse(lb_10$p.value < 0.05, "→ Reject H0", "→ Cannot reject H0")),
sprintf(" Lag 20: Q = %7.4f, p-value = %.4f %s\n",
lb_20$statistic, lb_20$p.value,
ifelse(lb_20$p.value < 0.05, "→ Reject H0", "→ Cannot reject H0")),
"\nConclusion:\n",
ifelse(all(c(lb_5$p.value, lb_10$p.value, lb_20$p.value) > 0.05),
paste0(" Cannot reject H0 at any lag.\n",
" Returns appear to be WHITE NOISE.\n",
" Past returns do not help predict future returns.\n"),
paste0(" Reject H0 - significant autocorrelation detected.\n",
" An ARIMA model may capture this predictability.\n"))
)
cat(lb_result)
##
## === LJUNG-BOX TEST FOR WHITE NOISE ===
##
## Hypotheses:
## H0: Returns are white noise (no autocorrelation)
## H1: Autocorrelation exists
##
## Results:
## Lag 5: Q = 17.6678, p-value = 0.0034 → Reject H0
## Lag 10: Q = 64.9937, p-value = 0.0000 → Reject H0
## Lag 20: Q = 86.4768, p-value = 0.0000 → Reject H0
##
## Conclusion:
## Reject H0 - significant autocorrelation detected.
## An ARIMA model may capture this predictability.
A time series is stationary if its statistical properties don’t change over time: the mean is constant, the variance is constant, and the covariance between observations depends only on the lag between them, not on when they occur.
Why Stationarity Matters: standard time series models (including ARIMA) assume stationarity; with non-stationary data, estimated relationships can be spurious and forecasts unreliable.
Stock Prices vs Returns: prices are typically non-stationary (they trend and wander), while returns are typically stationary, which is why we model returns rather than prices.
The Augmented Dickey-Fuller (ADF) test checks for a unit root - the signature of non-stationarity.
We want to REJECT H0 to confirm stationarity.
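Concretely, the ADF test is based on a regression of the form (the deterministic terms included, a constant and possibly a trend, vary by implementation): \[\Delta r_t = \alpha + \beta t + \gamma r_{t-1} + \sum_{i=1}^{k} \delta_i \Delta r_{t-i} + \varepsilon_t\] The test statistic is the t-ratio on \(\gamma\); H0: \(\gamma = 0\) (unit root) against H1: \(\gamma < 0\) (stationary).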
adf_result <- adf.test(returns, alternative = "stationary")
adf_output <- paste0(
"\n=== AUGMENTED DICKEY-FULLER TEST ===\n\n",
"Hypotheses:\n",
" H0: Unit root exists (series is NON-STATIONARY)\n",
" H1: No unit root (series is STATIONARY)\n\n",
"Results:\n",
" Test Statistic: ", sprintf("%.4f", adf_result$statistic), "\n",
" Lag Order: ", adf_result$parameter, "\n",
" p-value: ", sprintf("%.6f", adf_result$p.value), "\n\n",
"Critical Values (approximate):\n",
" 1% level: -3.43\n",
" 5% level: -2.86\n",
" 10% level: -2.57\n\n",
"Conclusion:\n",
ifelse(adf_result$p.value < 0.05,
paste0(" p-value < 0.05: REJECT H0 at 5% significance level.\n",
" The returns series is STATIONARY.\n"),
paste0(" p-value >= 0.05: Cannot reject H0.\n",
" Evidence suggests non-stationarity (unusual for returns).\n"))
)
cat(adf_output)
##
## === AUGMENTED DICKEY-FULLER TEST ===
##
## Hypotheses:
## H0: Unit root exists (series is NON-STATIONARY)
## H1: No unit root (series is STATIONARY)
##
## Results:
## Test Statistic: -11.8165
## Lag Order: 12
## p-value: 0.010000
##
## Critical Values (approximate):
## 1% level: -3.43
## 5% level: -2.86
## 10% level: -2.57
##
## Conclusion:
## p-value < 0.05: REJECT H0 at 5% significance level.
## The returns series is STATIONARY.
The KPSS test has hypotheses opposite to the ADF test: H0 is that the series is stationary, H1 that it is not.
We want to FAIL TO REJECT H0 to confirm stationarity.
Using both tests gives a more robust conclusion.
kpss_result <- kpss.test(returns, null = "Level")
kpss_output <- paste0(
"\n=== KPSS TEST ===\n\n",
"Hypotheses (OPPOSITE of ADF):\n",
" H0: Series is STATIONARY\n",
" H1: Series is NON-STATIONARY\n\n",
"Results:\n",
" Test Statistic: ", sprintf("%.4f", kpss_result$statistic), "\n",
" Truncation Lag: ", kpss_result$parameter, "\n",
" p-value: ",
ifelse(kpss_result$p.value > 0.1, "> 0.10", sprintf("%.4f", kpss_result$p.value)), "\n\n",
"Critical Values:\n",
" 10% level: 0.347\n",
" 5% level: 0.463\n",
" 1% level: 0.739\n\n",
"Conclusion:\n",
ifelse(kpss_result$p.value > 0.05,
" p-value > 0.05: Cannot reject H0. Evidence supports STATIONARITY.\n",
" p-value < 0.05: Reject H0. Evidence against stationarity.\n")
)
cat(kpss_output)
##
## === KPSS TEST ===
##
## Hypotheses (OPPOSITE of ADF):
## H0: Series is STATIONARY
## H1: Series is NON-STATIONARY
##
## Results:
## Test Statistic: 0.2224
## Truncation Lag: 8
## p-value: 0.1000
##
## Critical Values:
## 10% level: 0.347
## 5% level: 0.463
## 1% level: 0.739
##
## Conclusion:
## p-value > 0.05: Cannot reject H0. Evidence supports STATIONARITY.
adf_stationary <- adf_result$p.value < 0.05
kpss_stationary <- kpss_result$p.value > 0.05
combined_result <- paste0(
"\n=== COMBINED STATIONARITY ASSESSMENT ===\n\n",
"Test Results:\n",
" ADF Test: ", ifelse(adf_stationary, "Stationary (reject unit root)",
"Non-stationary (cannot reject unit root)"), "\n",
" KPSS Test: ", ifelse(kpss_stationary, "Stationary (cannot reject stationarity)",
"Non-stationary (reject stationarity)"), "\n\n",
"Overall Conclusion:\n",
ifelse(adf_stationary && kpss_stationary,
paste0(" BOTH TESTS AGREE: Returns are STATIONARY.\n",
" This is the expected result for financial returns.\n",
" We can proceed with ARIMA modeling using d = 0 (no differencing).\n"),
ifelse(!adf_stationary && !kpss_stationary,
" TESTS CONFLICT: Results are inconclusive. Further investigation needed.\n",
" MIXED RESULTS: One test suggests stationarity, one doesn't.\n"))
)
cat(combined_result)
##
## === COMBINED STATIONARITY ASSESSMENT ===
##
## Test Results:
## ADF Test: Stationary (reject unit root)
## KPSS Test: Stationary (cannot reject stationarity)
##
## Overall Conclusion:
## BOTH TESTS AGREE: Returns are STATIONARY.
## This is the expected result for financial returns.
## We can proceed with ARIMA modeling using d = 0 (no differencing).
ARIMA = AutoRegressive Integrated Moving Average
An ARIMA(p, d, q) model has three components: p autoregressive (AR) lags, d orders of differencing (the "Integrated" part), and q moving-average (MA) lags.
AR(1) Model: \(r_t = c + \phi_1 r_{t-1} + \varepsilon_t\)
“Today’s return = constant + (coefficient × yesterday’s return) + random shock”
MA(1) Model: \(r_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1}\)
“Today’s return = constant + today’s shock + (coefficient × yesterday’s shock)”
The MA component captures how the series responds to news/innovations.
For our returns: Since returns are typically stationary, we use d = 0. An ARIMA(p, 0, q) is also called ARMA(p, q).
We use Information Criteria, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), to balance fit against complexity:
Lower is better. BIC penalizes complexity more, choosing simpler models.
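For reference, with \(k\) estimated parameters, \(n\) observations, and maximized likelihood \(\hat{L}\): \[\text{AIC} = 2k - 2\ln\hat{L}, \qquad \text{BIC} = k\ln(n) - 2\ln\hat{L}\] Since \(\ln(n) > 2\) whenever \(n > 7\), BIC’s per-parameter penalty is heavier, which is why it tends to select simpler models.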
# Automatic model selection
auto_fit <- auto.arima(
returns,
d = 0,
max.p = 5,
max.q = 5,
ic = "aic",
stepwise = FALSE,
approximation = FALSE
)
p_order <- auto_fit$arma[1]
q_order <- auto_fit$arma[2]
d_order <- auto_fit$arma[6]
arima_selection <- paste0(
"\n=== AUTOMATIC ARIMA SELECTION ===\n\n",
"Search parameters: p ∈ [0,5], d = 0, q ∈ [0,5]\n",
"Selection criterion: AIC\n\n",
"SELECTED MODEL: ARIMA(", p_order, ", ", d_order, ", ", q_order, ")\n\n",
"Model fit statistics:\n",
" AIC: ", sprintf("%.4f", AIC(auto_fit)), "\n",
" BIC: ", sprintf("%.4f", BIC(auto_fit)), "\n"
)
cat(arima_selection)
##
## === AUTOMATIC ARIMA SELECTION ===
##
## Search parameters: p ∈ [0,5], d = 0, q ∈ [0,5]
## Selection criterion: AIC
##
## SELECTED MODEL: ARIMA(4, 0, 1)
##
## Model fit statistics:
## AIC: -8827.4883
## BIC: -8789.2088
# Fit candidate models
m000 <- Arima(returns, order = c(0, 0, 0))
m100 <- Arima(returns, order = c(1, 0, 0))
m001 <- Arima(returns, order = c(0, 0, 1))
m101 <- Arima(returns, order = c(1, 0, 1))
models <- list(m000, m100, m001, m101)
model_names <- c("ARIMA(0,0,0)", "ARIMA(1,0,0)", "ARIMA(0,0,1)", "ARIMA(1,0,1)")
comparison <- data.frame(
Model = model_names,
AIC = sapply(models, AIC),
BIC = sapply(models, BIC)
)
comparison <- comparison[order(comparison$AIC), ]
kable(comparison, digits = 2, row.names = FALSE,
caption = "Model Comparison (lower is better)") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Model | AIC | BIC |
|---|---|---|
| ARIMA(1,0,0) | -8816.53 | -8800.12 |
| ARIMA(0,0,1) | -8815.90 | -8799.49 |
| ARIMA(1,0,1) | -8815.88 | -8794.01 |
| ARIMA(0,0,0) | -8805.98 | -8795.04 |
final_arima <- auto_fit
cat("\n=== FINAL ARIMA MODEL ===\n\n")
##
## === FINAL ARIMA MODEL ===
summary(final_arima)
## Series: returns
## ARIMA(4,0,1) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 mean
## -0.9174 -0.0443 -0.0111 -0.0756 0.8417 0.0011
## s.e. 0.0576 0.0329 0.0326 0.0249 0.0531 0.0004
##
## sigma^2 = 0.0003779: log likelihood = 4420.74
## AIC=-8827.49 AICc=-8827.42 BIC=-8789.21
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -6.209177e-06 0.01940518 0.01357841 -Inf Inf 0.6897393
## ACF1
## Training set 0.003339183
# Interpretation
coefs <- coef(final_arima)
interpretation <- paste0(
"\n=== MODEL INTERPRETATION ===\n\n"
)
if (length(coefs) == 0 || (length(coefs) == 1 && names(coefs) == "intercept")) {
interpretation <- paste0(interpretation,
"Model: ARIMA(0,0,0) - White Noise with Mean\n\n",
"This means:\n",
" • Returns are UNPREDICTABLE from past values\n",
" • The best forecast for tomorrow is simply the mean return\n",
" • This is consistent with the Efficient Market Hypothesis\n",
" • Past returns contain no information about future returns\n"
)
if ("intercept" %in% names(coefs)) {
interpretation <- paste0(interpretation,
"\nMean return: ", sprintf("%.6f", coefs["intercept"]),
" (", sprintf("%.4f%%", coefs["intercept"]*100), " daily)\n"
)
}
} else {
interpretation <- paste0(interpretation, "Estimated coefficients:\n\n")
for (nm in names(coefs)) {
interpretation <- paste0(interpretation,
" ", nm, " = ", sprintf("%.6f", coefs[nm]), "\n"
)
}
}
cat(interpretation)
##
## === MODEL INTERPRETATION ===
##
## Estimated coefficients:
##
## ar1 = -0.917423
## ar2 = -0.044339
## ar3 = -0.011130
## ar4 = -0.075634
## ma1 = 0.841684
## intercept = 0.001140
If our ARIMA model is correct, the residuals should be uncorrelated white noise.
arima_residuals <- residuals(final_arima)
resid_vec <- as.numeric(arima_residuals)
par(mfrow = c(2, 2))
plot(arima_residuals, main = "Residuals Over Time", ylab = "Residuals", col = "steelblue")
abline(h = 0, col = "red", lty = 2)
acf(resid_vec, main = "ACF of Residuals", col = "steelblue", lwd = 2)
hist(resid_vec, breaks = 40, freq = FALSE, col = "steelblue",
main = "Histogram of Residuals", xlab = "Residuals", border = "white")
curve(dnorm(x, mean(resid_vec), sd(resid_vec)), add = TRUE, col = "red", lwd = 2)
qqnorm(resid_vec, main = "Q-Q Plot", col = "steelblue", pch = 20)
qqline(resid_vec, col = "red", lwd = 2)
ARIMA Model Diagnostics
par(mfrow = c(1, 1))
lb_resid <- Box.test(resid_vec, lag = 10, type = "Ljung-Box")
jb_resid <- jarque.bera.test(resid_vec)
diag_result <- paste0(
"\n=== RESIDUAL DIAGNOSTIC TESTS ===\n\n",
"1. Ljung-Box Test (autocorrelation in residuals)\n",
" H0: Residuals are white noise\n",
" Q-statistic: ", sprintf("%.4f", lb_resid$statistic), "\n",
" p-value: ", sprintf("%.4f", lb_resid$p.value), "\n",
" Result: ", ifelse(lb_resid$p.value > 0.05,
"PASS - No autocorrelation remains ✓",
"FAIL - Some autocorrelation remains ✗"), "\n\n",
"2. Jarque-Bera Test (normality of residuals)\n",
" H0: Residuals are normally distributed\n",
" JB-statistic: ", sprintf("%.4f", jb_resid$statistic), "\n",
" p-value: ", sprintf("%.6f", jb_resid$p.value), "\n",
" Result: ", ifelse(jb_resid$p.value > 0.05,
"Residuals appear normal ✓",
"Non-normal residuals (common for financial data)"), "\n"
)
cat(diag_result)
##
## === RESIDUAL DIAGNOSTIC TESTS ===
##
## 1. Ljung-Box Test (autocorrelation in residuals)
## H0: Residuals are white noise
## Q-statistic: 13.0497
## p-value: 0.2209
## Result: PASS - No autocorrelation remains ✓
##
## 2. Jarque-Bera Test (normality of residuals)
## H0: Residuals are normally distributed
## JB-statistic: 2111.5059
## p-value: 0.000000
## Result: Non-normal residuals (common for financial data)
Our ARIMA model predicts expected returns (the mean), not actual returns: \[r_{t+1} = E[r_{t+1}] + \varepsilon_{t+1}\]
We can predict \(E[r_{t+1}]\) but not \(\varepsilon_{t+1}\) (the random shock).
Prediction intervals capture forecast uncertainty - they widen as we forecast further ahead.
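For an ARMA model written in MA(∞) form with weights \(\psi_j\) (\(\psi_0 = 1\)) and innovation variance \(\sigma^2\), the h-step forecast error variance grows with the horizon, which is why the intervals widen: \[\text{Var}(e_{t+h}) = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2\] and the 95% prediction interval is approximately \(\hat{r}_{t+h} \pm 1.96\sqrt{\text{Var}(e_{t+h})}\), assuming roughly normal forecast errors.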
h <- 5
fc <- forecast(final_arima, h = h)
# Forecast table
fc_table <- data.frame(
Day = 1:h,
Forecast_Pct = as.numeric(fc$mean) * 100,
Lower_95_Pct = as.numeric(fc$lower[, 2]) * 100,
Upper_95_Pct = as.numeric(fc$upper[, 2]) * 100
)
kable(fc_table, digits = 4,
col.names = c("Day Ahead", "Forecast (%)", "Lower 95%", "Upper 95%"),
caption = "Return Forecasts with 95% Prediction Intervals") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Day Ahead | Forecast (%) | Lower 95% | Upper 95% |
|---|---|---|---|
| 1 | 0.0065 | -3.8034 | 3.8164 |
| 2 | 0.2787 | -3.5421 | 4.0995 |
| 3 | -0.0383 | -3.8603 | 3.7837 |
| 4 | 0.2151 | -3.6087 | 4.0389 |
| 5 | 0.0342 | -3.7939 | 3.8623 |
plot(fc, main = paste(ticker, "Return Forecast"),
xlab = "Time", ylab = "Return", col = "steelblue", lwd = 2)
5-Day Return Forecast
Since log returns are: \(r_t = \ln(P_t) - \ln(P_{t-1})\)
We recover prices: \(P_t = P_{t-1} \times e^{r_t}\)
last_price <- as.numeric(tail(price, 1))
cum_returns <- cumsum(as.numeric(fc$mean))
price_forecasts <- last_price * exp(cum_returns)
price_result <- paste0(
"\n=== PRICE FORECASTS ===\n\n",
"Last known price: $", sprintf("%.2f", last_price), "\n\n",
"Forecasted prices:\n"
)
for (i in 1:h) {
price_result <- paste0(price_result,
" Day ", i, ": $", sprintf("%.2f", price_forecasts[i]),
" (cumulative return: ", sprintf("%.4f%%", cum_returns[i] * 100), ")\n"
)
}
price_result <- paste0(price_result,
"\n5-day forecast: $", sprintf("%.2f", last_price), " → $",
sprintf("%.2f", price_forecasts[h]),
" (", sprintf("%.2f%%", (price_forecasts[h]/last_price - 1) * 100), " change)\n"
)
cat(price_result)
##
## === PRICE FORECASTS ===
##
## Last known price: $273.67
##
## Forecasted prices:
## Day 1: $273.69 (cumulative return: 0.0065%)
## Day 2: $274.45 (cumulative return: 0.2852%)
## Day 3: $274.35 (cumulative return: 0.2469%)
## Day 4: $274.94 (cumulative return: 0.4620%)
## Day 5: $275.03 (cumulative return: 0.4962%)
##
## 5-day forecast: $273.67 → $275.03 (0.50% change)
ARIMA models the conditional mean but assumes constant variance. However, financial returns show volatility clustering - periods of high volatility tend to persist, as do calm periods.
This is visible in our returns plot: large moves (positive or negative) cluster together. GARCH models capture this predictable volatility.
ARCH = AutoRegressive Conditional Heteroskedasticity
ARCH(1) Model: \[\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2, \qquad \varepsilon_t = \sigma_t z_t, \quad z_t \sim \text{i.i.d.}(0, 1)\]
In words: Today’s variance depends on yesterday’s squared shock. Big shocks (positive or negative) increase tomorrow’s expected volatility.
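A minimal ARCH(1) simulation sketch (hypothetical parameters; names carry a _sim suffix to avoid clashing with objects used later) makes the clustering mechanism visible:
# Simulate an ARCH(1) process: sigma2_t = omega + alpha * eps_{t-1}^2
set.seed(42)
n_sim <- 1000
omega_sim <- 0.00001                         # baseline variance
alpha_sim <- 0.6                             # reaction to yesterday's squared shock
eps_sim <- numeric(n_sim)
sig2_sim <- numeric(n_sim)
sig2_sim[1] <- omega_sim / (1 - alpha_sim)   # start at the unconditional variance
eps_sim[1] <- sqrt(sig2_sim[1]) * rnorm(1)
for (t in 2:n_sim) {
  sig2_sim[t] <- omega_sim + alpha_sim * eps_sim[t - 1]^2
  eps_sim[t] <- sqrt(sig2_sim[t]) * rnorm(1)
}
plot(eps_sim, type = "l", col = "steelblue",
     main = "Simulated ARCH(1): large shocks cluster", xlab = "Time", ylab = "Return")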
Before fitting GARCH, we verify that ARCH effects exist.
par(mfrow = c(2, 1))
plot(index(returns), resid_vec^2, type = "h", col = "steelblue",
main = "Squared Residuals (Volatility Proxy)",
xlab = "Date", ylab = "Squared Residuals")
abline(h = mean(resid_vec^2), col = "red", lty = 2)
acf(resid_vec^2, main = "ACF of Squared Residuals", lag.max = 30,
col = "steelblue", lwd = 2)
Evidence of Volatility Clustering
par(mfrow = c(1, 1))
arch_test_5 <- ArchTest(resid_vec, lags = 5)
arch_test_10 <- ArchTest(resid_vec, lags = 10)
arch_exists <- arch_test_5$p.value < 0.05 || arch_test_10$p.value < 0.05
arch_result <- paste0(
"\n=== ARCH-LM TEST ===\n\n",
"Hypotheses:\n",
" H0: No ARCH effects (variance is constant)\n",
" H1: ARCH effects present (volatility is predictable)\n\n",
"Results:\n",
sprintf(" Lag 5: Chi-sq = %.4f, p-value = %.4f %s\n",
arch_test_5$statistic, arch_test_5$p.value,
ifelse(arch_test_5$p.value < 0.05, "→ Reject H0", "→ Cannot reject")),
sprintf(" Lag 10: Chi-sq = %.4f, p-value = %.4f %s\n",
arch_test_10$statistic, arch_test_10$p.value,
ifelse(arch_test_10$p.value < 0.05, "→ Reject H0", "→ Cannot reject")),
"\nConclusion:\n",
ifelse(arch_exists,
paste0(" ARCH effects are PRESENT.\n",
" Volatility is predictable from past squared residuals.\n",
" A GARCH model is appropriate.\n"),
paste0(" No strong evidence of ARCH effects.\n",
" Variance may be constant. GARCH may not improve the model.\n"))
)
cat(arch_result)
##
## === ARCH-LM TEST ===
##
## Hypotheses:
## H0: No ARCH effects (variance is constant)
## H1: ARCH effects present (volatility is predictable)
##
## Results:
## Lag 5: Chi-sq = 247.1889, p-value = 0.0000 → Reject H0
## Lag 10: Chi-sq = 268.0612, p-value = 0.0000 → Reject H0
##
## Conclusion:
## ARCH effects are PRESENT.
## Volatility is predictable from past squared residuals.
## A GARCH model is appropriate.
GARCH(1,1) - The workhorse model: \[\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2\]
Components: \(\omega\) sets the baseline variance, \(\alpha_1\) measures the reaction to yesterday’s squared shock (the ARCH term), and \(\beta_1\) measures how much of yesterday’s variance carries over (the GARCH term).
Key derived quantities: the persistence \(\alpha_1 + \beta_1\), the half-life of a volatility shock, and the long-run (unconditional) variance \(\omega / (1 - \alpha_1 - \beta_1)\).
# Specify GARCH(1,1) with Student-t distribution (handles fat tails)
spec_t <- ugarchspec(
variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
distribution.model = "std"
)
spec_norm <- ugarchspec(
variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
distribution.model = "norm"
)
# Fit models
fit_t <- ugarchfit(spec_t, data = returns, solver = "hybrid")
fit_norm <- ugarchfit(spec_norm, data = returns, solver = "hybrid")
# Compare
compare_df <- data.frame(
Model = c("GARCH(1,1) Normal", "GARCH(1,1) Student-t"),
LogLik = c(likelihood(fit_norm), likelihood(fit_t)),
AIC = c(infocriteria(fit_norm)[1], infocriteria(fit_t)[1]),
BIC = c(infocriteria(fit_norm)[2], infocriteria(fit_t)[2])
)
kable(compare_df, digits = 4,
caption = "GARCH Model Comparison (lower AIC/BIC is better)") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Model | LogLik | AIC | BIC |
|---|---|---|---|
| GARCH(1,1) Normal | 4598.455 | -5.2448 | -5.2323 |
| GARCH(1,1) Student-t | 4673.228 | -5.3290 | -5.3134 |
# Select best model
if (infocriteria(fit_t)[2] < infocriteria(fit_norm)[2]) {
final_garch <- fit_t
dist_used <- "Student-t"
} else {
final_garch <- fit_norm
dist_used <- "Normal"
}
cat("\nSelected model:", dist_used, "GARCH(1,1)\n")
##
## Selected model: Student-t GARCH(1,1)
cat("\n=== FINAL GARCH MODEL ===\n\n")
##
## === FINAL GARCH MODEL ===
print(final_garch)
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001612 0.000346 4.6622 3e-06
## omega 0.000013 0.000002 7.6447 0e+00
## alpha1 0.097152 0.003036 31.9960 0e+00
## beta1 0.871233 0.013214 65.9346 0e+00
## shape 4.803874 0.475949 10.0933 0e+00
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001612 0.000355 4.5340 6e-06
## omega 0.000013 0.000003 4.9654 1e-06
## alpha1 0.097152 0.010973 8.8538 0e+00
## beta1 0.871233 0.012951 67.2702 0e+00
## shape 4.803874 0.578671 8.3016 0e+00
##
## LogLikelihood : 4673.228
##
## Information Criteria
## ------------------------------------
##
## Akaike -5.3290
## Bayes -5.3134
## Shibata -5.3290
## Hannan-Quinn -5.3233
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.004474 0.9467
## Lag[2*(p+q)+(p+q)-1][2] 0.111083 0.9120
## Lag[4*(p+q)+(p+q)-1][5] 1.030053 0.8528
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.531 0.4662
## Lag[2*(p+q)+(p+q)-1][5] 1.992 0.6204
## Lag[4*(p+q)+(p+q)-1][9] 3.649 0.6488
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.1778 0.500 2.000 0.6733
## ARCH Lag[5] 2.8358 1.440 1.667 0.3145
## ARCH Lag[7] 3.8934 2.315 1.543 0.3622
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 19.6985
## Individual Statistics:
## mu 0.6105
## omega 4.9202
## alpha1 0.6530
## beta1 0.5412
## shape 0.9963
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.28 1.47 1.88
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.30538 0.7601
## Negative Sign Bias 1.32420 0.1856
## Positive Sign Bias 0.07407 0.9410
## Joint Effect 3.59422 0.3087
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 18.07 0.5179
## 2 30 27.97 0.5197
## 3 40 30.92 0.8185
## 4 50 35.10 0.9326
##
##
## Elapsed time : 0.1088591
coefs <- coef(final_garch)
mu <- coefs["mu"]
omega <- coefs["omega"]
alpha1 <- coefs["alpha1"]
beta1 <- coefs["beta1"]
persistence <- alpha1 + beta1
halflife <- log(0.5) / log(persistence)
longrun_var <- omega / (1 - alpha1 - beta1)
longrun_vol_daily <- sqrt(longrun_var)
longrun_vol_annual <- longrun_vol_daily * sqrt(252) * 100
garch_interp <- paste0(
"\n=== GARCH COEFFICIENT INTERPRETATION ===\n\n",
"MEAN EQUATION:\n",
" μ (mean return) = ", sprintf("%.6f", mu),
" (", sprintf("%.4f%%", mu * 100), " daily)\n",
" Annualized return: ", sprintf("%.2f%%", mu * 252 * 100), "\n\n",
"VARIANCE EQUATION: σ²_t = ω + α₁ε²_{t-1} + β₁σ²_{t-1}\n\n",
" ω (omega) = ", sprintf("%.10f", omega), "\n",
" Baseline variance component\n\n",
" α₁ (alpha) = ", sprintf("%.6f", alpha1), "\n",
" ARCH effect: Impact of yesterday's shock on today's variance\n",
" A 1% unexpected return increases tomorrow's variance by ",
sprintf("%.4f", alpha1 * 0.0001 * 10000), " basis points\n\n",
" β₁ (beta) = ", sprintf("%.6f", beta1), "\n",
" GARCH effect: Persistence of past variance\n",
" ", sprintf("%.1f%%", beta1 * 100), " of yesterday's variance carries over\n\n",
"DERIVED QUANTITIES:\n\n",
" Persistence (α + β) = ", sprintf("%.6f", persistence), "\n",
ifelse(persistence < 1,
" Less than 1: Model is stationary ✓\n",
" Equals/exceeds 1: Integrated GARCH (shocks permanent)\n"),
"\n Half-life = ", sprintf("%.1f", halflife), " days\n",
" Time for a volatility shock to decay by half\n",
"\n Long-run daily volatility = ", sprintf("%.4f%%", longrun_vol_daily * 100), "\n",
" Long-run annual volatility = ", sprintf("%.2f%%", longrun_vol_annual), "\n",
" Volatility eventually returns to this level\n"
)
if ("shape" %in% names(coefs)) {
shape <- coefs["shape"]
garch_interp <- paste0(garch_interp,
"\n Shape (ν) = ", sprintf("%.4f", shape), "\n",
" Student-t degrees of freedom (lower = fatter tails)\n"
)
}
cat(garch_interp)
##
## === GARCH COEFFICIENT INTERPRETATION ===
##
## MEAN EQUATION:
## μ (mean return) = 0.001612 (0.1612% daily)
## Annualized return: 40.61%
##
## VARIANCE EQUATION: σ²_t = ω + α₁ε²_{t-1} + β₁σ²_{t-1}
##
## ω (omega) = 0.0000125662
## Baseline variance component
##
## α₁ (alpha) = 0.097152
## ARCH effect: Impact of yesterday's shock on today's variance
## A 1% unexpected return increases tomorrow's variance by 0.0972 basis points
##
## β₁ (beta) = 0.871233
## GARCH effect: Persistence of past variance
## 87.1% of yesterday's variance carries over
##
## DERIVED QUANTITIES:
##
## Persistence (α + β) = 0.968385
## Less than 1: Model is stationary ✓
##
## Half-life = 21.6 days
## Time for a volatility shock to decay by half
##
## Long-run daily volatility = 1.9937%
## Long-run annual volatility = 31.65%
## Volatility eventually returns to this level
##
## Shape (ν) = 4.8039
## Student-t degrees of freedom (lower = fatter tails)
If GARCH correctly captures volatility dynamics, the standardized residuals \(z_t = \varepsilon_t / \sigma_t\) should be white noise with no remaining ARCH effects.
# Standardized residuals: raw residuals divided by the conditional sigma
# (equivalently, residuals(final_garch, standardize = TRUE))
std_resid <- as.numeric(residuals(final_garch) / sigma(final_garch))
cond_vol <- as.numeric(sigma(final_garch))
par(mfrow = c(3, 2))
# Conditional volatility
plot(index(returns), cond_vol, type = "l", col = "steelblue",
main = "Conditional Volatility Over Time", xlab = "Date", ylab = "Daily σ")
# Returns with volatility bands
plot(index(returns), returns_vec, type = "l", col = "gray60",
main = "Returns with ±2σ Bands", xlab = "Date", ylab = "Return")
lines(index(returns), 2 * cond_vol, col = "red", lty = 2)
lines(index(returns), -2 * cond_vol, col = "red", lty = 2)
# Standardized residuals
plot(index(returns), std_resid, type = "l", col = "steelblue",
main = "Standardized Residuals", xlab = "Date", ylab = "z_t")
abline(h = c(-2, 0, 2), col = c("red", "black", "red"), lty = c(2, 1, 2))
# ACF of squared standardized residuals
acf(std_resid^2, main = "ACF of Squared Std Residuals", lag.max = 30,
col = "steelblue", lwd = 2)
# Histogram
hist(std_resid, breaks = 50, freq = FALSE, col = "steelblue",
main = "Distribution of Std Residuals", xlab = "z_t", border = "white")
curve(dnorm(x), add = TRUE, col = "red", lwd = 2)
# Q-Q plot
qqnorm(std_resid, main = "Q-Q Plot of Std Residuals", col = "steelblue", pch = 20)
qqline(std_resid, col = "red", lwd = 2)
GARCH Model Diagnostics
par(mfrow = c(1, 1))
lb_std <- Box.test(std_resid, lag = 10, type = "Ljung-Box")
arch_std <- ArchTest(std_resid, lags = 10)
lb_sq <- Box.test(std_resid^2, lag = 10, type = "Ljung-Box")
pass_lb <- lb_std$p.value > 0.05
pass_arch <- arch_std$p.value > 0.05
pass_sq <- lb_sq$p.value > 0.05
pass_count <- sum(c(pass_lb, pass_arch, pass_sq))
garch_diag <- paste0(
"\n=== GARCH DIAGNOSTIC TESTS ===\n\n",
"1. Ljung-Box on Standardized Residuals\n",
" Purpose: Check if mean dynamics are captured\n",
" H0: No remaining autocorrelation\n",
" Q = ", sprintf("%.4f", lb_std$statistic),
", p-value = ", sprintf("%.4f", lb_std$p.value), "\n",
" Result: ", ifelse(pass_lb, "PASS ✓", "FAIL ✗"), "\n\n",
"2. ARCH-LM on Standardized Residuals\n",
" Purpose: Check if volatility dynamics are captured\n",
" H0: No remaining ARCH effects\n",
" Chi-sq = ", sprintf("%.4f", arch_std$statistic),
", p-value = ", sprintf("%.4f", arch_std$p.value), "\n",
" Result: ", ifelse(pass_arch, "PASS ✓", "FAIL ✗"), "\n\n",
"3. Ljung-Box on Squared Standardized Residuals\n",
" Purpose: Alternative volatility test\n",
" H0: No autocorrelation in squared residuals\n",
" Q = ", sprintf("%.4f", lb_sq$statistic),
", p-value = ", sprintf("%.4f", lb_sq$p.value), "\n",
" Result: ", ifelse(pass_sq, "PASS ✓", "FAIL ✗"), "\n\n",
"=== OVERALL DIAGNOSTIC SUMMARY ===\n\n",
"Tests passed: ", pass_count, " out of 3\n\n",
ifelse(pass_count == 3,
"CONCLUSION: GARCH model adequately captures both mean and volatility dynamics.\n",
ifelse(pass_count >= 2,
"CONCLUSION: GARCH model captures most dynamics. Minor improvements possible.\n",
"CONCLUSION: Consider alternative specifications (EGARCH, GJR-GARCH, etc.)\n"))
)
cat(garch_diag)
##
## === GARCH DIAGNOSTIC TESTS ===
##
## 1. Ljung-Box on Standardized Residuals
## Purpose: Check if mean dynamics are captured
## H0: No remaining autocorrelation
## Q = 7.1911, p-value = 0.7073
## Result: PASS ✓
##
## 2. ARCH-LM on Standardized Residuals
## Purpose: Check if volatility dynamics are captured
## H0: No remaining ARCH effects
## Chi-sq = 9.2725, p-value = 0.5065
## Result: PASS ✓
##
## 3. Ljung-Box on Squared Standardized Residuals
## Purpose: Alternative volatility test
## H0: No autocorrelation in squared residuals
## Q = 7.5790, p-value = 0.6699
## Result: PASS ✓
##
## === OVERALL DIAGNOSTIC SUMMARY ===
##
## Tests passed: 3 out of 3
##
## CONCLUSION: GARCH model adequately captures both mean and volatility dynamics.
GARCH forecasts conditional volatility - expected standard deviation given current information.
For GARCH(1,1), forecasts converge to long-run volatility as horizon increases: \[E[\sigma^2_{t+h}] = \bar{\sigma}^2 + (\alpha + \beta)^{h-1}(\sigma^2_{t+1} - \bar{\sigma}^2)\]
h_vol <- 20
garch_fc <- ugarchforecast(final_garch, n.ahead = h_vol)
vol_forecast <- as.numeric(sigma(garch_fc))
ret_forecast <- as.numeric(fitted(garch_fc))
# Forecast table
vol_table <- data.frame(
Day = 1:10,
Daily_Vol_Pct = vol_forecast[1:10] * 100,
Annual_Vol_Pct = vol_forecast[1:10] * sqrt(252) * 100
)
kable(vol_table, digits = 4,
col.names = c("Day Ahead", "Daily Vol (%)", "Annualized Vol (%)"),
caption = "Volatility Forecasts (First 10 Days)") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Day Ahead | Daily Vol (%) | Annualized Vol (%) |
|---|---|---|
| 1 | 1.2152 | 19.2902 |
| 2 | 1.2472 | 19.7994 |
| 3 | 1.2775 | 20.2803 |
| 4 | 1.3062 | 20.7353 |
| 5 | 1.3334 | 21.1667 |
| 6 | 1.3592 | 21.5762 |
| 7 | 1.3837 | 21.9654 |
| 8 | 1.4070 | 22.3359 |
| 9 | 1.4293 | 22.6889 |
| 10 | 1.4505 | 23.0256 |
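As a sketch, the same forecast path can be reproduced from the closed-form formula above, using omega, alpha1, beta1, and vol_forecast from the earlier chunks:
# Reproduce the volatility forecast path from the closed-form GARCH(1,1) recursion
h_seq <- 1:h_vol
sigma2_one <- vol_forecast[1]^2                  # one-step-ahead variance
lr_var <- omega / (1 - alpha1 - beta1)           # long-run (unconditional) variance
manual_vol <- sqrt(lr_var + (alpha1 + beta1)^(h_seq - 1) * (sigma2_one - lr_var))
max(abs(manual_vol - vol_forecast))              # should be essentially zero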
# Plot
longrun_vol <- sqrt(omega / (1 - alpha1 - beta1)) * 100
plot(1:h_vol, vol_forecast * 100, type = "b", pch = 19, col = "steelblue",
main = paste(ticker, "Volatility Forecast"),
xlab = "Days Ahead", ylab = "Daily Volatility (%)",
ylim = c(0, max(vol_forecast * 100) * 1.5))
abline(h = longrun_vol, col = "red", lty = 2, lwd = 2)
legend("topright", legend = c("Forecast", "Long-run level"),
col = c("steelblue", "red"), lty = c(1, 2), pch = c(19, NA))
20-Day Volatility Forecast
current_vol <- vol_forecast[1] * 100
end_vol <- vol_forecast[h_vol] * 100
vol_interp <- paste0(
"\n=== VOLATILITY FORECAST INTERPRETATION ===\n\n",
"Current (1-day) forecast: ", sprintf("%.4f%%", current_vol),
" daily (", sprintf("%.2f%%", current_vol * sqrt(252)), " annualized)\n",
"20-day forecast: ", sprintf("%.4f%%", end_vol),
" daily (", sprintf("%.2f%%", end_vol * sqrt(252)), " annualized)\n",
"Long-run volatility: ", sprintf("%.4f%%", longrun_vol),
" daily (", sprintf("%.2f%%", longrun_vol * sqrt(252)), " annualized)\n\n",
ifelse(current_vol > longrun_vol * 1.1,
paste0("Current volatility is ABOVE long-run level.\n",
"Volatility expected to DECREASE toward equilibrium.\n",
"Recent market turbulence is subsiding.\n"),
ifelse(current_vol < longrun_vol * 0.9,
paste0("Current volatility is BELOW long-run level.\n",
"Volatility expected to INCREASE toward equilibrium.\n",
"Current calm period may not persist.\n"),
paste0("Current volatility is NEAR long-run level.\n",
"Volatility expected to stay relatively stable.\n"))),
"\nHalf-life of shocks: ", sprintf("%.1f", halflife), " days\n",
"After ", round(halflife), " days, a volatility shock is halved.\n"
)
cat(vol_interp)
##
## === VOLATILITY FORECAST INTERPRETATION ===
##
## Current (1-day) forecast: 1.2152% daily (19.29% annualized)
## 20-day forecast: 1.6180% daily (25.69% annualized)
## Long-run volatility: 1.9937% daily (31.65% annualized)
##
## Current volatility is BELOW long-run level.
## Volatility expected to INCREASE toward equilibrium.
## Current calm period may not persist.
##
## Half-life of shocks: 21.6 days
## After 22 days, a volatility shock is halved.
exec_summary <- paste0(
"================================================================\n",
" EXECUTIVE SUMMARY: TIME SERIES ANALYSIS \n",
"================================================================\n\n",
"STOCK: ", ticker, "\n",
"PERIOD: ", start_dt, " to ", end_dt, " (", n_obs, " trading days)\n\n",
"--- RETURN CHARACTERISTICS ---\n\n",
" Mean daily return: ", sprintf("%.4f%%", mean_ret * 100),
" (", sprintf("%.2f%%", annual_ret * 100), " annualized)\n",
" Daily volatility: ", sprintf("%.4f%%", sd_ret * 100),
" (", sprintf("%.2f%%", annual_vol * 100), " annualized)\n",
" Skewness: ", sprintf("%.4f", skew_ret),
ifelse(abs(skew_ret) < 0.5, " (symmetric)\n",
ifelse(skew_ret < 0, " (negatively skewed)\n", " (positively skewed)\n")),
" Excess Kurtosis: ", sprintf("%.4f", excess_kurt),
" (fat tails - extreme events likely)\n\n",
"--- STATIONARITY ---\n\n",
" ADF test: ", ifelse(adf_result$p.value < 0.05, "Stationary", "Non-stationary"),
" (p = ", sprintf("%.4f", adf_result$p.value), ")\n",
" KPSS test: ", ifelse(kpss_result$p.value > 0.05, "Stationary", "Non-stationary"), "\n",
" Conclusion: Returns are stationary as expected.\n\n",
"--- ARIMA MODEL ---\n\n",
" Selected: ARIMA(", p_order, ", ", d_order, ", ", q_order, ")\n",
ifelse(p_order == 0 && q_order == 0,
" Interpretation: Returns are unpredictable (white noise)\n",
" Some return predictability detected\n"),
" Consistent with Efficient Market Hypothesis.\n\n",
"--- VOLATILITY (GARCH) ---\n\n",
" ARCH effects: ", ifelse(arch_exists, "Present", "Not detected"), "\n",
" Model: GARCH(1,1) with ", dist_used, " distribution\n",
" Persistence (α + β): ", sprintf("%.4f", persistence), "\n",
" Half-life: ", sprintf("%.1f", halflife), " days\n",
" Long-run volatility: ", sprintf("%.2f%%", longrun_vol_annual), " (annualized)\n\n",
"--- FORECASTS ---\n\n",
" 1-day return forecast: ", sprintf("%.4f%%", ret_forecast[1] * 100), "\n",
" 1-day volatility: ", sprintf("%.4f%%", vol_forecast[1] * 100),
" (", sprintf("%.2f%%", vol_forecast[1] * sqrt(252) * 100), " annualized)\n"
)
cat(exec_summary)
## ================================================================
## EXECUTIVE SUMMARY: TIME SERIES ANALYSIS
## ================================================================
##
## STOCK: AAPL
## PERIOD: 2019-01-02 to 2025-12-19 (1753 trading days)
##
## --- RETURN CHARACTERISTICS ---
##
## Mean daily return: 0.1134% (28.57% annualized)
## Daily volatility: 1.9586% (31.09% annualized)
## Skewness: -0.0837 (symmetric)
## Excess Kurtosis: 6.6024 (fat tails - extreme events likely)
##
## --- STATIONARITY ---
##
## ADF test: Stationary (p = 0.0100)
## KPSS test: Stationary
## Conclusion: Returns are stationary as expected.
##
## --- ARIMA MODEL ---
##
## Selected: ARIMA(4, 0, 1)
## Interpretation: Some return predictability detected, but it is weak
## Broadly consistent with the Efficient Market Hypothesis.
##
## --- VOLATILITY (GARCH) ---
##
## ARCH effects: Present
## Model: GARCH(1,1) with Student-t distribution
## Persistence (α + β): 0.9684
## Half-life: 21.6 days
## Long-run volatility: 31.65% (annualized)
##
## --- FORECASTS ---
##
## 1-day return forecast: 0.1612%
## 1-day volatility: 1.2152% (19.29% annualized)
takeaways <- paste0(
"\n=== KEY POINTS FOR YOUR WRITTEN REPORT ===\n\n",
"1. WHY LOG RETURNS?\n",
" Log returns are stationary, time-additive, and more suitable for\n",
" statistical modeling than raw prices.\n\n",
"2. STATIONARITY MATTERS\n",
" We confirmed stationarity using both ADF and KPSS tests.\n",
" This validates using ARIMA without differencing (d = 0).\n\n",
"3. EFFICIENT MARKETS IMPLICATION\n",
ifelse(p_order == 0 && q_order == 0,
paste0(" ARIMA(0,0,0) suggests returns are unpredictable.\n",
" Consistent with market efficiency - technical analysis\n",
" based on past returns would not generate excess profits.\n\n"),
paste0(" Some predictability detected, but effect is typically small\n",
" and may not survive transaction costs.\n\n")),
"4. VOLATILITY IS PREDICTABLE\n",
" Unlike returns, volatility shows strong persistence.\n",
" Implications for:\n",
" - Risk management: Adjust positions based on predicted volatility\n",
" - Option pricing: Use GARCH volatility forecasts\n",
" - Portfolio allocation: Reduce exposure in high-volatility periods\n\n",
"5. FAT TAILS WARNING\n",
" Excess kurtosis > 0 means extreme events occur more often\n",
" than normal distribution predicts. Standard VaR may\n",
" underestimate tail risk.\n\n",
"6. MODEL LIMITATIONS\n",
" - GARCH assumes symmetric response to positive/negative shocks\n",
" - Historical models may not capture regime changes\n",
" - Forecasts degrade over longer horizons\n"
)
cat(takeaways)
##
## === KEY POINTS FOR YOUR WRITTEN REPORT ===
##
## 1. WHY LOG RETURNS?
## Log returns are stationary, time-additive, and more suitable for
## statistical modeling than raw prices.
##
## 2. STATIONARITY MATTERS
## We confirmed stationarity using both ADF and KPSS tests.
## This validates using ARIMA without differencing (d = 0).
##
## 3. EFFICIENT MARKETS IMPLICATION
## Some predictability detected, but effect is typically small
## and may not survive transaction costs.
##
## 4. VOLATILITY IS PREDICTABLE
## Unlike returns, volatility shows strong persistence.
## Implications for:
## - Risk management: Adjust positions based on predicted volatility
## - Option pricing: Use GARCH volatility forecasts
## - Portfolio allocation: Reduce exposure in high-volatility periods
##
## 5. FAT TAILS WARNING
## Excess kurtosis > 0 means extreme events occur more often
## than normal distribution predicts. Standard VaR may
## underestimate tail risk.
##
## 6. MODEL LIMITATIONS
## - GARCH assumes symmetric response to positive/negative shocks
## - Historical models may not capture regime changes
## - Forecasts degrade over longer horizons
Key Academic Papers:
Engle, R. F. (1982). “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica, 50(4), 987-1007.
Bollerslev, T. (1986). “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics, 31(3), 307-327.
Box, G. E., & Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day.
Tsay, R. S. (2010). Analysis of Financial Time Series. 3rd Edition, Wiley.
End of Part A Analysis
Document generated on 2025-12-21 using R version 4.4.3 (2025-02-28 ucrt)