A time series is simply a sequence of data points collected over time. The key feature is that the order matters - each observation is linked to a specific moment in time.
Examples of time series include daily stock prices, monthly unemployment rates, quarterly GDP, and hourly temperature readings.
What makes time series special?
Unlike regular data where observations are independent (like heights of random people), time series observations are typically dependent on each other. Today’s stock price is related to yesterday’s price. This dependency is both a challenge (standard methods that assume independent observations no longer apply) and an opportunity (the past carries information we can model and forecast).
The distinction between prices and returns below is a crucial concept that many students miss.
The Problem with Prices:
Stock prices tend to wander - they go up, down, and don’t settle around any particular value. Statistically, we call this non-stationarity. A price of $100 today tells us nothing about whether prices will average $100 in the future.
The Solution - Returns:
Returns measure the percentage change in price. Unlike prices, returns tend to fluctuate around zero (or a small positive average). This is stationarity - the statistical properties stay stable over time.
Simple Return: \[R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1\]
This is the intuitive “percentage change” - if a stock goes from $100 to $105, the simple return is 5%.
Log Return (Continuously Compounded Return): \[r_t = \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln(P_t) - \ln(P_{t-1})\]
Why do we prefer log returns?
Time Additivity: Multi-period log returns simply add up. If Monday’s log return is 2% and Tuesday’s is 3%, the two-day log return is exactly 5%. Simple returns don’t add this way (they compound).
Symmetry: A +10% log return followed by -10% log return brings you back to start. Simple returns: +10% then -10% leaves you at 99% of original.
Statistical Properties: Log returns are more likely to be normally distributed.
Approximation: For small returns (roughly below 10%), log returns ≈ simple returns, since \(\ln(1 + R) \approx R\) when \(R\) is small. A quick numerical check of these properties follows below.
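A minimal numerical sketch (made-up prices, base R only) verifying the additivity and symmetry claims above:
# Hypothetical three-day price path
p <- c(100, 102, 105)
simple_ret <- diff(p) / head(p, -1)     # simple return for each day
log_ret <- diff(log(p))                 # log return for each day
sum(log_ret)                            # two-day log return: the daily values add up...
log(p[3] / p[1])                        # ...and equal the total log return exactly
prod(1 + simple_ret) - 1                # simple returns compound rather than add
100 * exp(0.10) * exp(-0.10)            # +10% then -10% log return: exactly 100
100 * (1 + 0.10) * (1 - 0.10)           # +10% then -10% simple return: 99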
In this analysis, we use LOG RETURNS throughout.
Our analysis follows this logical flow: download prices → compute log returns → explore their distribution → test for stationarity → fit an ARIMA model for the conditional mean → check residuals and test for ARCH effects → fit a GARCH model for the conditional variance → forecast returns and volatility.
Each step builds on the previous one. We can’t skip steps because later steps assume earlier conditions are met.
# Run this ONCE to install packages (not every time you knit)
install.packages(c("quantmod", "xts", "zoo", "tseries", "forecast",
"rugarch", "FinTS", "lmtest", "ggplot2", "moments",
"knitr", "kableExtra", "scales"))
library(quantmod)
library(xts)
library(zoo)
library(tseries)
library(forecast)
library(rugarch)
library(FinTS)
library(ggplot2)
library(moments)
library(knitr)
library(kableExtra)
library(scales)
message("All packages loaded successfully!")
We need historical stock prices to analyze. Yahoo Finance provides free daily data including Open, High, Low, Close, Volume, and Adjusted Close. We use Adjusted Close because it reflects the true value an investor would have experienced, accounting for corporate actions like stock splits and dividend payments.
# Define parameters
ticker <- "AAPL"
start_date <- "2019-01-01"
end_date <- Sys.Date()
# Download data from Yahoo Finance
stock_data <- getSymbols(
Symbols = ticker,
src = "yahoo",
from = start_date,
to = end_date,
auto.assign = FALSE
)
# Extract Adjusted Close price
price <- Ad(stock_data)
colnames(price) <- "Price"
# Generate data summary
data_summary <- paste0(
"DATA SUMMARY\n",
"============\n",
"Stock: ", ticker, "\n",
"Period: ", as.character(index(price)[1]), " to ",
as.character(index(price)[nrow(price)]), "\n",
"Trading days: ", nrow(price), "\n",
"Starting price: $", round(as.numeric(price[1]), 2), "\n",
"Ending price: $", round(as.numeric(price[nrow(price)]), 2), "\n",
"Missing values: ", sum(is.na(price)), "\n"
)
cat(data_summary)
## DATA SUMMARY
## ============
## Stock: AAPL
## Period: 2019-01-02 to 2025-12-19
## Trading days: 1753
## Starting price: $37.54
## Ending price: $273.67
## Missing values: 0
A price chart reveals the trend and volatility patterns in the data.
price_df <- data.frame(Date = index(price), Price = as.numeric(price))
ggplot(price_df, aes(x = Date, y = Price)) +
geom_line(color = "steelblue", linewidth = 0.6) +
labs(title = paste(ticker, "Daily Adjusted Closing Price"),
x = "Date", y = "Price (USD)") +
theme_minimal(base_size = 12) +
scale_y_continuous(labels = dollar_format())
Daily Closing Prices
Definition of Log Return: \[r_t = \ln(P_t) - \ln(P_{t-1})\]
# Calculate log returns
returns <- diff(log(price))
returns <- na.omit(returns)
colnames(returns) <- "Returns"
returns_vec <- as.numeric(returns)
# Store key dates
start_dt <- as.character(index(price)[1])
end_dt <- as.character(index(price)[nrow(price)])
n_obs <- nrow(price)
Understanding these statistics is crucial for financial analysis:
Location Measures: Mean (average return) and Median (middle value)
Dispersion Measures: Standard Deviation (volatility/risk measure), Variance, Range
Shape Measures: Skewness (asymmetry of the distribution) and Kurtosis (tail thickness)
# Calculate all statistics
mean_ret <- mean(returns_vec)
median_ret <- median(returns_vec)
sd_ret <- sd(returns_vec)
var_ret <- var(returns_vec)
min_ret <- min(returns_vec)
max_ret <- max(returns_vec)
skew_ret <- skewness(returns_vec)
kurt_ret <- kurtosis(returns_vec)
excess_kurt <- kurt_ret - 3
annual_ret <- mean_ret * 252
annual_vol <- sd_ret * sqrt(252)
# Create comprehensive summary
summary_text <- paste0(
"\n=== SUMMARY STATISTICS ===\n\n",
"CENTRAL TENDENCY\n",
" Mean daily return: ", sprintf("%.6f", mean_ret),
" (", sprintf("%.4f%%", mean_ret * 100), ")\n",
" Median daily return: ", sprintf("%.6f", median_ret), "\n",
" Annualized return: ", sprintf("%.2f%%", annual_ret * 100), "\n\n",
"RISK MEASURES\n",
" Daily volatility: ", sprintf("%.6f", sd_ret),
" (", sprintf("%.4f%%", sd_ret * 100), ")\n",
" Annualized volatility: ", sprintf("%.2f%%", annual_vol * 100), "\n",
" Minimum return: ", sprintf("%.4f%%", min_ret * 100), "\n",
" Maximum return: ", sprintf("%.4f%%", max_ret * 100), "\n\n",
"DISTRIBUTION SHAPE\n",
" Skewness: ", sprintf("%.4f", skew_ret),
ifelse(abs(skew_ret) < 0.5, " (approximately symmetric)\n",
ifelse(skew_ret < 0, " (left-skewed: larger losses)\n",
" (right-skewed: larger gains)\n")),
" Kurtosis: ", sprintf("%.4f", kurt_ret), "\n",
" Excess Kurtosis: ", sprintf("%.4f", excess_kurt),
ifelse(excess_kurt > 0, " (fat tails: extreme events more likely)\n",
" (thin tails)\n")
)
cat(summary_text)
##
## === SUMMARY STATISTICS ===
##
## CENTRAL TENDENCY
## Mean daily return: 0.001134 (0.1134%)
## Median daily return: 0.001306
## Annualized return: 28.57%
##
## RISK MEASURES
## Daily volatility: 0.019586 (1.9586%)
## Annualized volatility: 31.09%
## Minimum return: -13.7708%
## Maximum return: 14.2617%
##
## DISTRIBUTION SHAPE
## Skewness: -0.0837 (approximately symmetric)
## Kurtosis: 9.6024
## Excess Kurtosis: 6.6024 (fat tails: extreme events more likely)
returns_df <- data.frame(Date = index(returns), Returns = returns_vec)
ggplot(returns_df, aes(x = Date, y = Returns)) +
geom_line(color = "steelblue", linewidth = 0.4) +
geom_hline(yintercept = 0, color = "red", linetype = "dashed") +
labs(title = paste(ticker, "Daily Log Returns"),
subtitle = "Notice: Periods of high volatility cluster together",
x = "Date", y = "Log Return") +
theme_minimal(base_size = 12) +
scale_y_continuous(labels = percent_format())
Daily Log Returns showing volatility clustering
ggplot(returns_df, aes(x = Returns)) +
geom_histogram(aes(y = after_stat(density)), bins = 50,
fill = "steelblue", color = "white", alpha = 0.7) +
stat_function(fun = dnorm, args = list(mean = mean_ret, sd = sd_ret),
color = "red", linewidth = 1.2, linetype = "dashed") +
labs(title = "Distribution of Returns vs Normal Distribution",
subtitle = "Blue = actual, Red dashed = normal. Note the fat tails.",
x = "Log Return", y = "Density") +
theme_minimal(base_size = 12)
Distribution of Returns vs Normal Distribution
The Jarque-Bera test checks whether data follows a normal distribution by examining skewness and kurtosis.
jb_test <- jarque.bera.test(returns_vec)
jb_result <- paste0(
"\n=== JARQUE-BERA TEST FOR NORMALITY ===\n\n",
"Hypotheses:\n",
" H0: Returns are normally distributed\n",
" H1: Returns are NOT normally distributed\n\n",
"Results:\n",
" Test Statistic: ", sprintf("%.4f", jb_test$statistic), "\n",
" p-value: ", sprintf("%.6f", jb_test$p.value), "\n\n",
"Conclusion:\n",
ifelse(jb_test$p.value < 0.05,
paste0(" REJECT H0 at 5% significance level.\n",
" Returns are NOT normally distributed.\n\n",
" Implications:\n",
" - Risk measures assuming normality may underestimate true risk\n",
" - Fat tails mean extreme losses are more likely than normal predicts\n",
" - This justifies using GARCH models for volatility\n"),
" Cannot reject H0. No strong evidence against normality.\n")
)
cat(jb_result)
##
## === JARQUE-BERA TEST FOR NORMALITY ===
##
## Hypotheses:
## H0: Returns are normally distributed
## H1: Returns are NOT normally distributed
##
## Results:
## Test Statistic: 3184.2718
## p-value: 0.000000
##
## Conclusion:
## REJECT H0 at 5% significance level.
## Returns are NOT normally distributed.
##
## Implications:
## - Risk measures assuming normality may underestimate true risk
## - Fat tails mean extreme losses are more likely than normal predicts
## - This justifies using GARCH models for volatility
Autocorrelation measures how correlated a time series is with its own past values. It answers: “If I know today’s return, does that tell me anything about tomorrow’s?”
Mathematical Definition: \[\rho_k = \frac{Cov(r_t, r_{t-k})}{Var(r_t)}\]
Interpretation: \(\rho_k \approx 0\) means returns \(k\) periods apart have no linear relationship; values near +1 or -1 indicate strong positive or negative linear dependence at lag \(k\).
Why This Matters:
The Efficient Market Hypothesis states that returns should be unpredictable. If we find significant autocorrelation, it suggests some predictability exists.
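As a quick sanity check of the definition (a minimal sketch reusing the returns_vec object computed earlier), the lag-1 autocorrelation can be computed by hand and compared with R’s acf():
# Manual lag-1 autocorrelation vs acf(); both use the mean-centered, n-divisor estimator
rc <- returns_vec - mean(returns_vec)
n_r <- length(rc)
rho1_manual <- sum(rc[-1] * rc[-n_r]) / sum(rc^2)
rho1_acf <- acf(returns_vec, plot = FALSE)$acf[2]   # element 1 is lag 0
c(manual = rho1_manual, acf = rho1_acf)             # the two values should agree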
ACF (Autocorrelation Function): Shows total correlation at each lag, including direct and indirect effects.
PACF (Partial ACF): Shows correlation at lag k after removing effects of intermediate lags. Isolates the direct relationship.
Pattern Recognition for Model Selection (a simulated illustration follows the table):
| True Model | ACF Pattern | PACF Pattern |
|---|---|---|
| AR(p) | Decays gradually | Cuts off after lag p |
| MA(q) | Cuts off after lag q | Decays gradually |
| ARMA(p,q) | Decays gradually | Decays gradually |
| White Noise | All near 0 | All near 0 |
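To make these patterns concrete, here is a small simulation sketch (hypothetical AR and MA coefficients) whose ACF/PACF plots reproduce the table:
# Simulate an AR(1) and an MA(1) series and inspect their ACF/PACF signatures
set.seed(123)
ar1_sim <- arima.sim(model = list(ar = 0.7), n = 1000)   # AR(1) with phi = 0.7
ma1_sim <- arima.sim(model = list(ma = 0.7), n = 1000)   # MA(1) with theta = 0.7
par(mfrow = c(2, 2))
acf(ar1_sim, main = "AR(1): ACF decays gradually")
pacf(ar1_sim, main = "AR(1): PACF cuts off after lag 1")
acf(ma1_sim, main = "MA(1): ACF cuts off after lag 1")
pacf(ma1_sim, main = "MA(1): PACF decays gradually")
par(mfrow = c(1, 1))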
par(mfrow = c(2, 1))
acf(returns_vec, lag.max = 30, main = paste("ACF of", ticker, "Returns"),
col = "steelblue", lwd = 2)
pacf(returns_vec, lag.max = 30, main = paste("PACF of", ticker, "Returns"),
col = "steelblue", lwd = 2)
ACF and PACF of Returns
par(mfrow = c(1, 1))
# Calculate ACF/PACF and analyze
n <- length(returns_vec)
ci_bound <- 1.96 / sqrt(n)
acf_result <- acf(returns_vec, lag.max = 30, plot = FALSE)
pacf_result <- pacf(returns_vec, lag.max = 30, plot = FALSE)
acf_values <- acf_result$acf[-1]
pacf_values <- pacf_result$acf
sig_acf <- which(abs(acf_values) > ci_bound)
sig_pacf <- which(abs(pacf_values) > ci_bound)
acf_analysis <- paste0(
"\n=== ACF/PACF ANALYSIS ===\n\n",
"Sample size: ", n, "\n",
"95% confidence bounds: ±", sprintf("%.4f", ci_bound), "\n\n",
"Significant ACF lags: ",
ifelse(length(sig_acf) == 0, "None", paste(sig_acf, collapse = ", ")), "\n",
"Significant PACF lags: ",
ifelse(length(sig_pacf) == 0, "None", paste(sig_pacf, collapse = ", ")), "\n\n",
"Interpretation:\n",
ifelse(length(sig_acf) == 0 && length(sig_pacf) == 0,
paste0(" Both ACF and PACF show no significant correlations.\n",
" Returns appear to be WHITE NOISE (unpredictable).\n",
" This is consistent with the Efficient Market Hypothesis.\n",
" Implication: ARIMA(0,0,0) may be appropriate.\n"),
paste0(" Some significant lags detected.\n",
" An ARIMA model may capture this predictability.\n"))
)
cat(acf_analysis)
##
## === ACF/PACF ANALYSIS ===
##
## Sample size: 1752
## 95% confidence bounds: ±0.0468
##
## Significant ACF lags: 1, 6, 7, 8, 9, 10, 18, 22
## Significant PACF lags: 1, 6, 8, 9, 18
##
## Interpretation:
## Some significant lags detected.
## An ARIMA model may capture this predictability.
The Ljung-Box test checks whether autocorrelations up to lag m are jointly equal to zero.
lb_5 <- Box.test(returns_vec, lag = 5, type = "Ljung-Box")
lb_10 <- Box.test(returns_vec, lag = 10, type = "Ljung-Box")
lb_20 <- Box.test(returns_vec, lag = 20, type = "Ljung-Box")
lb_result <- paste0(
"\n=== LJUNG-BOX TEST FOR WHITE NOISE ===\n\n",
"Hypotheses:\n",
" H0: Returns are white noise (no autocorrelation)\n",
" H1: Autocorrelation exists\n\n",
"Results:\n",
sprintf(" Lag 5: Q = %7.4f, p-value = %.4f %s\n",
lb_5$statistic, lb_5$p.value,
ifelse(lb_5$p.value < 0.05, "→ Reject H0", "→ Cannot reject H0")),
sprintf(" Lag 10: Q = %7.4f, p-value = %.4f %s\n",
lb_10$statistic, lb_10$p.value,
ifelse(lb_10$p.value < 0.05, "→ Reject H0", "→ Cannot reject H0")),
sprintf(" Lag 20: Q = %7.4f, p-value = %.4f %s\n",
lb_20$statistic, lb_20$p.value,
ifelse(lb_20$p.value < 0.05, "→ Reject H0", "→ Cannot reject H0")),
"\nConclusion:\n",
ifelse(all(c(lb_5$p.value, lb_10$p.value, lb_20$p.value) > 0.05),
paste0(" Cannot reject H0 at any lag.\n",
" Returns appear to be WHITE NOISE.\n",
" Past returns do not help predict future returns.\n"),
paste0(" Reject H0 - significant autocorrelation detected.\n",
" An ARIMA model may capture this predictability.\n"))
)
cat(lb_result)
##
## === LJUNG-BOX TEST FOR WHITE NOISE ===
##
## Hypotheses:
## H0: Returns are white noise (no autocorrelation)
## H1: Autocorrelation exists
##
## Results:
## Lag 5: Q = 17.6678, p-value = 0.0034 → Reject H0
## Lag 10: Q = 64.9937, p-value = 0.0000 → Reject H0
## Lag 20: Q = 86.4768, p-value = 0.0000 → Reject H0
##
## Conclusion:
## Reject H0 - significant autocorrelation detected.
## An ARIMA model may capture this predictability.
A time series is stationary if its statistical properties don’t change over time: the mean is constant, the variance is constant, and the covariance between observations depends only on the lag between them, not on when they occur.
Why Stationarity Matters: standard time series models (including ARIMA) assume stationarity; with non-stationary data, estimated relationships can be spurious and forecasts unreliable.
Stock Prices vs Returns: prices are typically non-stationary (they trend and wander), while returns are typically stationary, which is why we model returns rather than prices.
The Augmented Dickey-Fuller (ADF) test checks for a unit root - the signature of non-stationarity.
We want to REJECT H0 to confirm stationarity.
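Concretely, the ADF test is based on a regression of the form (the deterministic terms included, a constant and possibly a trend, vary by implementation): \[\Delta r_t = \alpha + \beta t + \gamma r_{t-1} + \sum_{i=1}^{k} \delta_i \Delta r_{t-i} + \varepsilon_t\] The test statistic is the t-ratio on \(\gamma\); H0: \(\gamma = 0\) (unit root) against H1: \(\gamma < 0\) (stationary).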
adf_result <- adf.test(returns, alternative = "stationary")
adf_output <- paste0(
"\n=== AUGMENTED DICKEY-FULLER TEST ===\n\n",
"Hypotheses:\n",
" H0: Unit root exists (series is NON-STATIONARY)\n",
" H1: No unit root (series is STATIONARY)\n\n",
"Results:\n",
" Test Statistic: ", sprintf("%.4f", adf_result$statistic), "\n",
" Lag Order: ", adf_result$parameter, "\n",
" p-value: ", sprintf("%.6f", adf_result$p.value), "\n\n",
"Critical Values (approximate):\n",
" 1% level: -3.43\n",
" 5% level: -2.86\n",
" 10% level: -2.57\n\n",
"Conclusion:\n",
ifelse(adf_result$p.value < 0.05,
paste0(" p-value < 0.05: REJECT H0 at 5% significance level.\n",
" The returns series is STATIONARY.\n"),
paste0(" p-value >= 0.05: Cannot reject H0.\n",
" Evidence suggests non-stationarity (unusual for returns).\n"))
)
cat(adf_output)
##
## === AUGMENTED DICKEY-FULLER TEST ===
##
## Hypotheses:
## H0: Unit root exists (series is NON-STATIONARY)
## H1: No unit root (series is STATIONARY)
##
## Results:
## Test Statistic: -11.8165
## Lag Order: 12
## p-value: 0.010000
##
## Critical Values (approximate):
## 1% level: -3.43
## 5% level: -2.86
## 10% level: -2.57
##
## Conclusion:
## p-value < 0.05: REJECT H0 at 5% significance level.
## The returns series is STATIONARY.
The KPSS test has hypotheses opposite to the ADF test: H0 is that the series is stationary, H1 that it is not.
We want to FAIL TO REJECT H0 to confirm stationarity.
Using both tests gives a more robust conclusion.
kpss_result <- kpss.test(returns, null = "Level")
kpss_output <- paste0(
"\n=== KPSS TEST ===\n\n",
"Hypotheses (OPPOSITE of ADF):\n",
" H0: Series is STATIONARY\n",
" H1: Series is NON-STATIONARY\n\n",
"Results:\n",
" Test Statistic: ", sprintf("%.4f", kpss_result$statistic), "\n",
" Truncation Lag: ", kpss_result$parameter, "\n",
" p-value: ",
ifelse(kpss_result$p.value > 0.1, "> 0.10", sprintf("%.4f", kpss_result$p.value)), "\n\n",
"Critical Values:\n",
" 10% level: 0.347\n",
" 5% level: 0.463\n",
" 1% level: 0.739\n\n",
"Conclusion:\n",
ifelse(kpss_result$p.value > 0.05,
" p-value > 0.05: Cannot reject H0. Evidence supports STATIONARITY.\n",
" p-value < 0.05: Reject H0. Evidence against stationarity.\n")
)
cat(kpss_output)
##
## === KPSS TEST ===
##
## Hypotheses (OPPOSITE of ADF):
## H0: Series is STATIONARY
## H1: Series is NON-STATIONARY
##
## Results:
## Test Statistic: 0.2224
## Truncation Lag: 8
## p-value: 0.1000
##
## Critical Values:
## 10% level: 0.347
## 5% level: 0.463
## 1% level: 0.739
##
## Conclusion:
## p-value > 0.05: Cannot reject H0. Evidence supports STATIONARITY.
adf_stationary <- adf_result$p.value < 0.05
kpss_stationary <- kpss_result$p.value > 0.05
combined_result <- paste0(
"\n=== COMBINED STATIONARITY ASSESSMENT ===\n\n",
"Test Results:\n",
" ADF Test: ", ifelse(adf_stationary, "Stationary (reject unit root)",
"Non-stationary (cannot reject unit root)"), "\n",
" KPSS Test: ", ifelse(kpss_stationary, "Stationary (cannot reject stationarity)",
"Non-stationary (reject stationarity)"), "\n\n",
"Overall Conclusion:\n",
ifelse(adf_stationary && kpss_stationary,
paste0(" BOTH TESTS AGREE: Returns are STATIONARY.\n",
" This is the expected result for financial returns.\n",
" We can proceed with ARIMA modeling using d = 0 (no differencing).\n"),
ifelse(!adf_stationary && !kpss_stationary,
" TESTS CONFLICT: Results are inconclusive. Further investigation needed.\n",
" MIXED RESULTS: One test suggests stationarity, one doesn't.\n"))
)
cat(combined_result)
##
## === COMBINED STATIONARITY ASSESSMENT ===
##
## Test Results:
## ADF Test: Stationary (reject unit root)
## KPSS Test: Stationary (cannot reject stationarity)
##
## Overall Conclusion:
## BOTH TESTS AGREE: Returns are STATIONARY.
## This is the expected result for financial returns.
## We can proceed with ARIMA modeling using d = 0 (no differencing).
ARIMA = AutoRegressive Integrated Moving Average
An ARIMA(p, d, q) model has three components: p autoregressive (AR) lags, d orders of differencing (the "Integrated" part), and q moving-average (MA) lags.
AR(1) Model: \(r_t = c + \phi_1 r_{t-1} + \varepsilon_t\)
“Today’s return = constant + (coefficient × yesterday’s return) + random shock”
MA(1) Model: \(r_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1}\)
“Today’s return = constant + today’s shock + (coefficient × yesterday’s shock)”
The MA component captures how the series responds to news/innovations.
For our returns: Since returns are typically stationary, we use d = 0. An ARIMA(p, 0, q) is also called ARMA(p, q).
We use Information Criteria, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), to balance fit against complexity:
Lower is better. BIC penalizes complexity more, choosing simpler models.
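For reference, with \(k\) estimated parameters, \(n\) observations, and maximized likelihood \(\hat{L}\): \[\text{AIC} = 2k - 2\ln\hat{L}, \qquad \text{BIC} = k\ln(n) - 2\ln\hat{L}\] Since \(\ln(n) > 2\) whenever \(n > 7\), BIC’s per-parameter penalty is heavier, which is why it tends to select simpler models.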
# Automatic model selection
auto_fit <- auto.arima(
returns,
d = 0,
max.p = 5,
max.q = 5,
ic = "aic",
stepwise = FALSE,
approximation = FALSE
)
p_order <- auto_fit$arma[1]
q_order <- auto_fit$arma[2]
d_order <- auto_fit$arma[6]
arima_selection <- paste0(
"\n=== AUTOMATIC ARIMA SELECTION ===\n\n",
"Search parameters: p ∈ [0,5], d = 0, q ∈ [0,5]\n",
"Selection criterion: AIC\n\n",
"SELECTED MODEL: ARIMA(", p_order, ", ", d_order, ", ", q_order, ")\n\n",
"Model fit statistics:\n",
" AIC: ", sprintf("%.4f", AIC(auto_fit)), "\n",
" BIC: ", sprintf("%.4f", BIC(auto_fit)), "\n"
)
cat(arima_selection)
##
## === AUTOMATIC ARIMA SELECTION ===
##
## Search parameters: p ∈ [0,5], d = 0, q ∈ [0,5]
## Selection criterion: AIC
##
## SELECTED MODEL: ARIMA(4, 0, 1)
##
## Model fit statistics:
## AIC: -8827.4883
## BIC: -8789.2088
# Fit candidate models
m000 <- Arima(returns, order = c(0, 0, 0))
m100 <- Arima(returns, order = c(1, 0, 0))
m001 <- Arima(returns, order = c(0, 0, 1))
m101 <- Arima(returns, order = c(1, 0, 1))
models <- list(m000, m100, m001, m101)
model_names <- c("ARIMA(0,0,0)", "ARIMA(1,0,0)", "ARIMA(0,0,1)", "ARIMA(1,0,1)")
comparison <- data.frame(
Model = model_names,
AIC = sapply(models, AIC),
BIC = sapply(models, BIC)
)
comparison <- comparison[order(comparison$AIC), ]
kable(comparison, digits = 2, row.names = FALSE,
caption = "Model Comparison (lower is better)") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Model | AIC | BIC |
|---|---|---|
| ARIMA(1,0,0) | -8816.53 | -8800.12 |
| ARIMA(0,0,1) | -8815.90 | -8799.49 |
| ARIMA(1,0,1) | -8815.88 | -8794.01 |
| ARIMA(0,0,0) | -8805.98 | -8795.04 |
final_arima <- auto_fit
cat("\n=== FINAL ARIMA MODEL ===\n\n")
##
## === FINAL ARIMA MODEL ===
summary(final_arima)
## Series: returns
## ARIMA(4,0,1) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 mean
## -0.9174 -0.0443 -0.0111 -0.0756 0.8417 0.0011
## s.e. 0.0576 0.0329 0.0326 0.0249 0.0531 0.0004
##
## sigma^2 = 0.0003779: log likelihood = 4420.74
## AIC=-8827.49 AICc=-8827.42 BIC=-8789.21
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -6.209177e-06 0.01940518 0.01357841 -Inf Inf 0.6897393
## ACF1
## Training set 0.003339183
# Interpretation
coefs <- coef(final_arima)
interpretation <- paste0(
"\n=== MODEL INTERPRETATION ===\n\n"
)
if (length(coefs) == 0 || (length(coefs) == 1 && names(coefs) == "intercept")) {
interpretation <- paste0(interpretation,
"Model: ARIMA(0,0,0) - White Noise with Mean\n\n",
"This means:\n",
" • Returns are UNPREDICTABLE from past values\n",
" • The best forecast for tomorrow is simply the mean return\n",
" • This is consistent with the Efficient Market Hypothesis\n",
" • Past returns contain no information about future returns\n"
)
if ("intercept" %in% names(coefs)) {
interpretation <- paste0(interpretation,
"\nMean return: ", sprintf("%.6f", coefs["intercept"]),
" (", sprintf("%.4f%%", coefs["intercept"]*100), " daily)\n"
)
}
} else {
interpretation <- paste0(interpretation, "Estimated coefficients:\n\n")
for (nm in names(coefs)) {
interpretation <- paste0(interpretation,
" ", nm, " = ", sprintf("%.6f", coefs[nm]), "\n"
)
}
}
cat(interpretation)
##
## === MODEL INTERPRETATION ===
##
## Estimated coefficients:
##
## ar1 = -0.917423
## ar2 = -0.044339
## ar3 = -0.011130
## ar4 = -0.075634
## ma1 = 0.841684
## intercept = 0.001140
If our ARIMA model is correct, the residuals should be uncorrelated white noise.
arima_residuals <- residuals(final_arima)
resid_vec <- as.numeric(arima_residuals)
par(mfrow = c(2, 2))
plot(arima_residuals, main = "Residuals Over Time", ylab = "Residuals", col = "steelblue")
abline(h = 0, col = "red", lty = 2)
acf(resid_vec, main = "ACF of Residuals", col = "steelblue", lwd = 2)
hist(resid_vec, breaks = 40, freq = FALSE, col = "steelblue",
main = "Histogram of Residuals", xlab = "Residuals", border = "white")
curve(dnorm(x, mean(resid_vec), sd(resid_vec)), add = TRUE, col = "red", lwd = 2)
qqnorm(resid_vec, main = "Q-Q Plot", col = "steelblue", pch = 20)
qqline(resid_vec, col = "red", lwd = 2)
ARIMA Model Diagnostics
par(mfrow = c(1, 1))
lb_resid <- Box.test(resid_vec, lag = 10, type = "Ljung-Box")
jb_resid <- jarque.bera.test(resid_vec)
diag_result <- paste0(
"\n=== RESIDUAL DIAGNOSTIC TESTS ===\n\n",
"1. Ljung-Box Test (autocorrelation in residuals)\n",
" H0: Residuals are white noise\n",
" Q-statistic: ", sprintf("%.4f", lb_resid$statistic), "\n",
" p-value: ", sprintf("%.4f", lb_resid$p.value), "\n",
" Result: ", ifelse(lb_resid$p.value > 0.05,
"PASS - No autocorrelation remains ✓",
"FAIL - Some autocorrelation remains ✗"), "\n\n",
"2. Jarque-Bera Test (normality of residuals)\n",
" H0: Residuals are normally distributed\n",
" JB-statistic: ", sprintf("%.4f", jb_resid$statistic), "\n",
" p-value: ", sprintf("%.6f", jb_resid$p.value), "\n",
" Result: ", ifelse(jb_resid$p.value > 0.05,
"Residuals appear normal ✓",
"Non-normal residuals (common for financial data)"), "\n"
)
cat(diag_result)
##
## === RESIDUAL DIAGNOSTIC TESTS ===
##
## 1. Ljung-Box Test (autocorrelation in residuals)
## H0: Residuals are white noise
## Q-statistic: 13.0497
## p-value: 0.2209
## Result: PASS - No autocorrelation remains ✓
##
## 2. Jarque-Bera Test (normality of residuals)
## H0: Residuals are normally distributed
## JB-statistic: 2111.5059
## p-value: 0.000000
## Result: Non-normal residuals (common for financial data)
Our ARIMA model predicts expected returns (the mean), not actual returns: \[r_{t+1} = E[r_{t+1}] + \varepsilon_{t+1}\]
We can predict \(E[r_{t+1}]\) but not \(\varepsilon_{t+1}\) (the random shock).
Prediction intervals capture forecast uncertainty - they widen as we forecast further ahead.
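For an ARMA model written in MA(∞) form with weights \(\psi_j\) (\(\psi_0 = 1\)) and innovation variance \(\sigma^2\), the h-step forecast error variance grows with the horizon, which is why the intervals widen: \[\text{Var}(e_{t+h}) = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2\] and the 95% prediction interval is approximately \(\hat{r}_{t+h} \pm 1.96\sqrt{\text{Var}(e_{t+h})}\), assuming roughly normal forecast errors.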
h <- 5
fc <- forecast(final_arima, h = h)
# Forecast table
fc_table <- data.frame(
Day = 1:h,
Forecast_Pct = as.numeric(fc$mean) * 100,
Lower_95_Pct = as.numeric(fc$lower[, 2]) * 100,
Upper_95_Pct = as.numeric(fc$upper[, 2]) * 100
)
kable(fc_table, digits = 4,
col.names = c("Day Ahead", "Forecast (%)", "Lower 95%", "Upper 95%"),
caption = "Return Forecasts with 95% Prediction Intervals") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Day Ahead | Forecast (%) | Lower 95% | Upper 95% |
|---|---|---|---|
| 1 | 0.0065 | -3.8034 | 3.8164 |
| 2 | 0.2787 | -3.5421 | 4.0995 |
| 3 | -0.0383 | -3.8603 | 3.7837 |
| 4 | 0.2151 | -3.6087 | 4.0389 |
| 5 | 0.0342 | -3.7939 | 3.8623 |
plot(fc, main = paste(ticker, "Return Forecast"),
xlab = "Time", ylab = "Return", col = "steelblue", lwd = 2)
5-Day Return Forecast
Since log returns are: \(r_t = \ln(P_t) - \ln(P_{t-1})\)
We recover prices: \(P_t = P_{t-1} \times e^{r_t}\)
last_price <- as.numeric(tail(price, 1))
cum_returns <- cumsum(as.numeric(fc$mean))
price_forecasts <- last_price * exp(cum_returns)
price_result <- paste0(
"\n=== PRICE FORECASTS ===\n\n",
"Last known price: $", sprintf("%.2f", last_price), "\n\n",
"Forecasted prices:\n"
)
for (i in 1:h) {
price_result <- paste0(price_result,
" Day ", i, ": $", sprintf("%.2f", price_forecasts[i]),
" (cumulative return: ", sprintf("%.4f%%", cum_returns[i] * 100), ")\n"
)
}
price_result <- paste0(price_result,
"\n5-day forecast: $", sprintf("%.2f", last_price), " → $",
sprintf("%.2f", price_forecasts[h]),
" (", sprintf("%.2f%%", (price_forecasts[h]/last_price - 1) * 100), " change)\n"
)
cat(price_result)
##
## === PRICE FORECASTS ===
##
## Last known price: $273.67
##
## Forecasted prices:
## Day 1: $273.69 (cumulative return: 0.0065%)
## Day 2: $274.45 (cumulative return: 0.2852%)
## Day 3: $274.35 (cumulative return: 0.2469%)
## Day 4: $274.94 (cumulative return: 0.4620%)
## Day 5: $275.03 (cumulative return: 0.4962%)
##
## 5-day forecast: $273.67 → $275.03 (0.50% change)
ARIMA models the conditional mean but assumes constant variance. However, financial returns show volatility clustering - periods of high volatility tend to persist, as do calm periods.
This is visible in our returns plot: large moves (positive or negative) cluster together. GARCH models capture this predictable volatility.
ARCH = AutoRegressive Conditional Heteroskedasticity
ARCH(1) Model: \[\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2, \qquad \varepsilon_t = \sigma_t z_t, \quad z_t \sim \text{i.i.d.}(0, 1)\]
In words: Today’s variance depends on yesterday’s squared shock. Big shocks (positive or negative) increase tomorrow’s expected volatility.
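A minimal ARCH(1) simulation sketch (hypothetical parameters; names carry a _sim suffix to avoid clashing with objects used later) makes the clustering mechanism visible:
# Simulate an ARCH(1) process: sigma2_t = omega + alpha * eps_{t-1}^2
set.seed(42)
n_sim <- 1000
omega_sim <- 0.00001                         # baseline variance
alpha_sim <- 0.6                             # reaction to yesterday's squared shock
eps_sim <- numeric(n_sim)
sig2_sim <- numeric(n_sim)
sig2_sim[1] <- omega_sim / (1 - alpha_sim)   # start at the unconditional variance
eps_sim[1] <- sqrt(sig2_sim[1]) * rnorm(1)
for (t in 2:n_sim) {
  sig2_sim[t] <- omega_sim + alpha_sim * eps_sim[t - 1]^2
  eps_sim[t] <- sqrt(sig2_sim[t]) * rnorm(1)
}
plot(eps_sim, type = "l", col = "steelblue",
     main = "Simulated ARCH(1): large shocks cluster", xlab = "Time", ylab = "Return")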
Before fitting GARCH, we verify that ARCH effects exist.
par(mfrow = c(2, 1))
plot(index(returns), resid_vec^2, type = "h", col = "steelblue",
main = "Squared Residuals (Volatility Proxy)",
xlab = "Date", ylab = "Squared Residuals")
abline(h = mean(resid_vec^2), col = "red", lty = 2)
acf(resid_vec^2, main = "ACF of Squared Residuals", lag.max = 30,
col = "steelblue", lwd = 2)
Evidence of Volatility Clustering
par(mfrow = c(1, 1))
arch_test_5 <- ArchTest(resid_vec, lags = 5)
arch_test_10 <- ArchTest(resid_vec, lags = 10)
arch_exists <- arch_test_5$p.value < 0.05 || arch_test_10$p.value < 0.05
arch_result <- paste0(
"\n=== ARCH-LM TEST ===\n\n",
"Hypotheses:\n",
" H0: No ARCH effects (variance is constant)\n",
" H1: ARCH effects present (volatility is predictable)\n\n",
"Results:\n",
sprintf(" Lag 5: Chi-sq = %.4f, p-value = %.4f %s\n",
arch_test_5$statistic, arch_test_5$p.value,
ifelse(arch_test_5$p.value < 0.05, "→ Reject H0", "→ Cannot reject")),
sprintf(" Lag 10: Chi-sq = %.4f, p-value = %.4f %s\n",
arch_test_10$statistic, arch_test_10$p.value,
ifelse(arch_test_10$p.value < 0.05, "→ Reject H0", "→ Cannot reject")),
"\nConclusion:\n",
ifelse(arch_exists,
paste0(" ARCH effects are PRESENT.\n",
" Volatility is predictable from past squared residuals.\n",
" A GARCH model is appropriate.\n"),
paste0(" No strong evidence of ARCH effects.\n",
" Variance may be constant. GARCH may not improve the model.\n"))
)
cat(arch_result)
##
## === ARCH-LM TEST ===
##
## Hypotheses:
## H0: No ARCH effects (variance is constant)
## H1: ARCH effects present (volatility is predictable)
##
## Results:
## Lag 5: Chi-sq = 247.1889, p-value = 0.0000 → Reject H0
## Lag 10: Chi-sq = 268.0612, p-value = 0.0000 → Reject H0
##
## Conclusion:
## ARCH effects are PRESENT.
## Volatility is predictable from past squared residuals.
## A GARCH model is appropriate.
GARCH(1,1) - The workhorse model: \[\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2\]
Components: \(\omega\) sets the baseline variance, \(\alpha_1\) measures the reaction to yesterday’s squared shock (the ARCH term), and \(\beta_1\) measures how much of yesterday’s variance carries over (the GARCH term).
Key derived quantities: the persistence \(\alpha_1 + \beta_1\), the half-life of a volatility shock, and the long-run (unconditional) variance \(\omega / (1 - \alpha_1 - \beta_1)\).
# Specify GARCH(1,1) with Student-t distribution (handles fat tails)
spec_t <- ugarchspec(
variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
distribution.model = "std"
)
spec_norm <- ugarchspec(
variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
distribution.model = "norm"
)
# Fit models
fit_t <- ugarchfit(spec_t, data = returns, solver = "hybrid")
fit_norm <- ugarchfit(spec_norm, data = returns, solver = "hybrid")
# Compare
compare_df <- data.frame(
Model = c("GARCH(1,1) Normal", "GARCH(1,1) Student-t"),
LogLik = c(likelihood(fit_norm), likelihood(fit_t)),
AIC = c(infocriteria(fit_norm)[1], infocriteria(fit_t)[1]),
BIC = c(infocriteria(fit_norm)[2], infocriteria(fit_t)[2])
)
kable(compare_df, digits = 4,
caption = "GARCH Model Comparison (lower AIC/BIC is better)") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Model | LogLik | AIC | BIC |
|---|---|---|---|
| GARCH(1,1) Normal | 4598.455 | -5.2448 | -5.2323 |
| GARCH(1,1) Student-t | 4673.228 | -5.3290 | -5.3134 |
# Select best model
if (infocriteria(fit_t)[2] < infocriteria(fit_norm)[2]) {
final_garch <- fit_t
dist_used <- "Student-t"
} else {
final_garch <- fit_norm
dist_used <- "Normal"
}
cat("\nSelected model:", dist_used, "GARCH(1,1)\n")
##
## Selected model: Student-t GARCH(1,1)
cat("\n=== FINAL GARCH MODEL ===\n\n")
##
## === FINAL GARCH MODEL ===
print(final_garch)
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : std
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001612 0.000346 4.6622 3e-06
## omega 0.000013 0.000002 7.6447 0e+00
## alpha1 0.097152 0.003036 31.9960 0e+00
## beta1 0.871233 0.013214 65.9346 0e+00
## shape 4.803874 0.475949 10.0933 0e+00
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001612 0.000355 4.5340 6e-06
## omega 0.000013 0.000003 4.9654 1e-06
## alpha1 0.097152 0.010973 8.8538 0e+00
## beta1 0.871233 0.012951 67.2702 0e+00
## shape 4.803874 0.578671 8.3016 0e+00
##
## LogLikelihood : 4673.228
##
## Information Criteria
## ------------------------------------
##
## Akaike -5.3290
## Bayes -5.3134
## Shibata -5.3290
## Hannan-Quinn -5.3233
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.004474 0.9467
## Lag[2*(p+q)+(p+q)-1][2] 0.111083 0.9120
## Lag[4*(p+q)+(p+q)-1][5] 1.030053 0.8528
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.531 0.4662
## Lag[2*(p+q)+(p+q)-1][5] 1.992 0.6204
## Lag[4*(p+q)+(p+q)-1][9] 3.649 0.6488
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.1778 0.500 2.000 0.6733
## ARCH Lag[5] 2.8358 1.440 1.667 0.3145
## ARCH Lag[7] 3.8934 2.315 1.543 0.3622
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 19.6985
## Individual Statistics:
## mu 0.6105
## omega 4.9202
## alpha1 0.6530
## beta1 0.5412
## shape 0.9963
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.28 1.47 1.88
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.30538 0.7601
## Negative Sign Bias 1.32420 0.1856
## Positive Sign Bias 0.07407 0.9410
## Joint Effect 3.59422 0.3087
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 18.07 0.5179
## 2 30 27.97 0.5197
## 3 40 30.92 0.8185
## 4 50 35.10 0.9326
##
##
## Elapsed time : 0.1088591
coefs <- coef(final_garch)
mu <- coefs["mu"]
omega <- coefs["omega"]
alpha1 <- coefs["alpha1"]
beta1 <- coefs["beta1"]
persistence <- alpha1 + beta1
halflife <- log(0.5) / log(persistence)
longrun_var <- omega / (1 - alpha1 - beta1)
longrun_vol_daily <- sqrt(longrun_var)
longrun_vol_annual <- longrun_vol_daily * sqrt(252) * 100
garch_interp <- paste0(
"\n=== GARCH COEFFICIENT INTERPRETATION ===\n\n",
"MEAN EQUATION:\n",
" μ (mean return) = ", sprintf("%.6f", mu),
" (", sprintf("%.4f%%", mu * 100), " daily)\n",
" Annualized return: ", sprintf("%.2f%%", mu * 252 * 100), "\n\n",
"VARIANCE EQUATION: σ²_t = ω + α₁ε²_{t-1} + β₁σ²_{t-1}\n\n",
" ω (omega) = ", sprintf("%.10f", omega), "\n",
" Baseline variance component\n\n",
" α₁ (alpha) = ", sprintf("%.6f", alpha1), "\n",
" ARCH effect: Impact of yesterday's shock on today's variance\n",
" A 1% unexpected return increases tomorrow's variance by ",
sprintf("%.4f", alpha1 * 0.0001 * 10000), " basis points\n\n",
" β₁ (beta) = ", sprintf("%.6f", beta1), "\n",
" GARCH effect: Persistence of past variance\n",
" ", sprintf("%.1f%%", beta1 * 100), " of yesterday's variance carries over\n\n",
"DERIVED QUANTITIES:\n\n",
" Persistence (α + β) = ", sprintf("%.6f", persistence), "\n",
ifelse(persistence < 1,
" Less than 1: Model is stationary ✓\n",
" Equals/exceeds 1: Integrated GARCH (shocks permanent)\n"),
"\n Half-life = ", sprintf("%.1f", halflife), " days\n",
" Time for a volatility shock to decay by half\n",
"\n Long-run daily volatility = ", sprintf("%.4f%%", longrun_vol_daily * 100), "\n",
" Long-run annual volatility = ", sprintf("%.2f%%", longrun_vol_annual), "\n",
" Volatility eventually returns to this level\n"
)
if ("shape" %in% names(coefs)) {
shape <- coefs["shape"]
garch_interp <- paste0(garch_interp,
"\n Shape (ν) = ", sprintf("%.4f", shape), "\n",
" Student-t degrees of freedom (lower = fatter tails)\n"
)
}
cat(garch_interp)
##
## === GARCH COEFFICIENT INTERPRETATION ===
##
## MEAN EQUATION:
## μ (mean return) = 0.001612 (0.1612% daily)
## Annualized return: 40.61%
##
## VARIANCE EQUATION: σ²_t = ω + α₁ε²_{t-1} + β₁σ²_{t-1}
##
## ω (omega) = 0.0000125662
## Baseline variance component
##
## α₁ (alpha) = 0.097152
## ARCH effect: Impact of yesterday's shock on today's variance
## A 1% unexpected return increases tomorrow's variance by 0.0972 basis points
##
## β₁ (beta) = 0.871233
## GARCH effect: Persistence of past variance
## 87.1% of yesterday's variance carries over
##
## DERIVED QUANTITIES:
##
## Persistence (α + β) = 0.968385
## Less than 1: Model is stationary ✓
##
## Half-life = 21.6 days
## Time for a volatility shock to decay by half
##
## Long-run daily volatility = 1.9937%
## Long-run annual volatility = 31.65%
## Volatility eventually returns to this level
##
## Shape (ν) = 4.8039
## Student-t degrees of freedom (lower = fatter tails)
If GARCH correctly captures volatility dynamics, the standardized residuals \(z_t = \varepsilon_t / \sigma_t\) should be white noise with no remaining ARCH effects.
# Standardized residuals: raw residuals divided by the conditional sigma
# (equivalently, residuals(final_garch, standardize = TRUE))
std_resid <- as.numeric(residuals(final_garch) / sigma(final_garch))
cond_vol <- as.numeric(sigma(final_garch))
par(mfrow = c(3, 2))
# Conditional volatility
plot(index(returns), cond_vol, type = "l", col = "steelblue",
main = "Conditional Volatility Over Time", xlab = "Date", ylab = "Daily σ")
# Returns with volatility bands
plot(index(returns), returns_vec, type = "l", col = "gray60",
main = "Returns with ±2σ Bands", xlab = "Date", ylab = "Return")
lines(index(returns), 2 * cond_vol, col = "red", lty = 2)
lines(index(returns), -2 * cond_vol, col = "red", lty = 2)
# Standardized residuals
plot(index(returns), std_resid, type = "l", col = "steelblue",
main = "Standardized Residuals", xlab = "Date", ylab = "z_t")
abline(h = c(-2, 0, 2), col = c("red", "black", "red"), lty = c(2, 1, 2))
# ACF of squared standardized residuals
acf(std_resid^2, main = "ACF of Squared Std Residuals", lag.max = 30,
col = "steelblue", lwd = 2)
# Histogram
hist(std_resid, breaks = 50, freq = FALSE, col = "steelblue",
main = "Distribution of Std Residuals", xlab = "z_t", border = "white")
curve(dnorm(x), add = TRUE, col = "red", lwd = 2)
# Q-Q plot
qqnorm(std_resid, main = "Q-Q Plot of Std Residuals", col = "steelblue", pch = 20)
qqline(std_resid, col = "red", lwd = 2)
GARCH Model Diagnostics
par(mfrow = c(1, 1))
lb_std <- Box.test(std_resid, lag = 10, type = "Ljung-Box")
arch_std <- ArchTest(std_resid, lags = 10)
lb_sq <- Box.test(std_resid^2, lag = 10, type = "Ljung-Box")
pass_lb <- lb_std$p.value > 0.05
pass_arch <- arch_std$p.value > 0.05
pass_sq <- lb_sq$p.value > 0.05
pass_count <- sum(c(pass_lb, pass_arch, pass_sq))
garch_diag <- paste0(
"\n=== GARCH DIAGNOSTIC TESTS ===\n\n",
"1. Ljung-Box on Standardized Residuals\n",
" Purpose: Check if mean dynamics are captured\n",
" H0: No remaining autocorrelation\n",
" Q = ", sprintf("%.4f", lb_std$statistic),
", p-value = ", sprintf("%.4f", lb_std$p.value), "\n",
" Result: ", ifelse(pass_lb, "PASS ✓", "FAIL ✗"), "\n\n",
"2. ARCH-LM on Standardized Residuals\n",
" Purpose: Check if volatility dynamics are captured\n",
" H0: No remaining ARCH effects\n",
" Chi-sq = ", sprintf("%.4f", arch_std$statistic),
", p-value = ", sprintf("%.4f", arch_std$p.value), "\n",
" Result: ", ifelse(pass_arch, "PASS ✓", "FAIL ✗"), "\n\n",
"3. Ljung-Box on Squared Standardized Residuals\n",
" Purpose: Alternative volatility test\n",
" H0: No autocorrelation in squared residuals\n",
" Q = ", sprintf("%.4f", lb_sq$statistic),
", p-value = ", sprintf("%.4f", lb_sq$p.value), "\n",
" Result: ", ifelse(pass_sq, "PASS ✓", "FAIL ✗"), "\n\n",
"=== OVERALL DIAGNOSTIC SUMMARY ===\n\n",
"Tests passed: ", pass_count, " out of 3\n\n",
ifelse(pass_count == 3,
"CONCLUSION: GARCH model adequately captures both mean and volatility dynamics.\n",
ifelse(pass_count >= 2,
"CONCLUSION: GARCH model captures most dynamics. Minor improvements possible.\n",
"CONCLUSION: Consider alternative specifications (EGARCH, GJR-GARCH, etc.)\n"))
)
cat(garch_diag)
##
## === GARCH DIAGNOSTIC TESTS ===
##
## 1. Ljung-Box on Standardized Residuals
## Purpose: Check if mean dynamics are captured
## H0: No remaining autocorrelation
## Q = 7.1911, p-value = 0.7073
## Result: PASS ✓
##
## 2. ARCH-LM on Standardized Residuals
## Purpose: Check if volatility dynamics are captured
## H0: No remaining ARCH effects
## Chi-sq = 9.2725, p-value = 0.5065
## Result: PASS ✓
##
## 3. Ljung-Box on Squared Standardized Residuals
## Purpose: Alternative volatility test
## H0: No autocorrelation in squared residuals
## Q = 7.5790, p-value = 0.6699
## Result: PASS ✓
##
## === OVERALL DIAGNOSTIC SUMMARY ===
##
## Tests passed: 3 out of 3
##
## CONCLUSION: GARCH model adequately captures both mean and volatility dynamics.
GARCH forecasts conditional volatility - expected standard deviation given current information.
For GARCH(1,1), forecasts converge to long-run volatility as horizon increases: \[E[\sigma^2_{t+h}] = \bar{\sigma}^2 + (\alpha + \beta)^{h-1}(\sigma^2_{t+1} - \bar{\sigma}^2)\]
h_vol <- 20
garch_fc <- ugarchforecast(final_garch, n.ahead = h_vol)
vol_forecast <- as.numeric(sigma(garch_fc))
ret_forecast <- as.numeric(fitted(garch_fc))
# Forecast table
vol_table <- data.frame(
Day = 1:10,
Daily_Vol_Pct = vol_forecast[1:10] * 100,
Annual_Vol_Pct = vol_forecast[1:10] * sqrt(252) * 100
)
kable(vol_table, digits = 4,
col.names = c("Day Ahead", "Daily Vol (%)", "Annualized Vol (%)"),
caption = "Volatility Forecasts (First 10 Days)") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Day Ahead | Daily Vol (%) | Annualized Vol (%) |
|---|---|---|
| 1 | 1.2152 | 19.2902 |
| 2 | 1.2472 | 19.7994 |
| 3 | 1.2775 | 20.2803 |
| 4 | 1.3062 | 20.7353 |
| 5 | 1.3334 | 21.1667 |
| 6 | 1.3592 | 21.5762 |
| 7 | 1.3837 | 21.9654 |
| 8 | 1.4070 | 22.3359 |
| 9 | 1.4293 | 22.6889 |
| 10 | 1.4505 | 23.0256 |
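As a sketch, the same forecast path can be reproduced from the closed-form formula above, using omega, alpha1, beta1, and vol_forecast from the earlier chunks:
# Reproduce the volatility forecast path from the closed-form GARCH(1,1) recursion
h_seq <- 1:h_vol
sigma2_one <- vol_forecast[1]^2                  # one-step-ahead variance
lr_var <- omega / (1 - alpha1 - beta1)           # long-run (unconditional) variance
manual_vol <- sqrt(lr_var + (alpha1 + beta1)^(h_seq - 1) * (sigma2_one - lr_var))
max(abs(manual_vol - vol_forecast))              # should be essentially zero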
# Plot
longrun_vol <- sqrt(omega / (1 - alpha1 - beta1)) * 100
plot(1:h_vol, vol_forecast * 100, type = "b", pch = 19, col = "steelblue",
main = paste(ticker, "Volatility Forecast"),
xlab = "Days Ahead", ylab = "Daily Volatility (%)",
ylim = c(0, max(vol_forecast * 100) * 1.5))
abline(h = longrun_vol, col = "red", lty = 2, lwd = 2)
legend("topright", legend = c("Forecast", "Long-run level"),
col = c("steelblue", "red"), lty = c(1, 2), pch = c(19, NA))
20-Day Volatility Forecast
current_vol <- vol_forecast[1] * 100
end_vol <- vol_forecast[h_vol] * 100
vol_interp <- paste0(
"\n=== VOLATILITY FORECAST INTERPRETATION ===\n\n",
"Current (1-day) forecast: ", sprintf("%.4f%%", current_vol),
" daily (", sprintf("%.2f%%", current_vol * sqrt(252)), " annualized)\n",
"20-day forecast: ", sprintf("%.4f%%", end_vol),
" daily (", sprintf("%.2f%%", end_vol * sqrt(252)), " annualized)\n",
"Long-run volatility: ", sprintf("%.4f%%", longrun_vol),
" daily (", sprintf("%.2f%%", longrun_vol * sqrt(252)), " annualized)\n\n",
ifelse(current_vol > longrun_vol * 1.1,
paste0("Current volatility is ABOVE long-run level.\n",
"Volatility expected to DECREASE toward equilibrium.\n",
"Recent market turbulence is subsiding.\n"),
ifelse(current_vol < longrun_vol * 0.9,
paste0("Current volatility is BELOW long-run level.\n",
"Volatility expected to INCREASE toward equilibrium.\n",
"Current calm period may not persist.\n"),
paste0("Current volatility is NEAR long-run level.\n",
"Volatility expected to stay relatively stable.\n"))),
"\nHalf-life of shocks: ", sprintf("%.1f", halflife), " days\n",
"After ", round(halflife), " days, a volatility shock is halved.\n"
)
cat(vol_interp)
##
## === VOLATILITY FORECAST INTERPRETATION ===
##
## Current (1-day) forecast: 1.2152% daily (19.29% annualized)
## 20-day forecast: 1.6180% daily (25.69% annualized)
## Long-run volatility: 1.9937% daily (31.65% annualized)
##
## Current volatility is BELOW long-run level.
## Volatility expected to INCREASE toward equilibrium.
## Current calm period may not persist.
##
## Half-life of shocks: 21.6 days
## After 22 days, a volatility shock is halved.
exec_summary <- paste0(
"================================================================\n",
" EXECUTIVE SUMMARY: TIME SERIES ANALYSIS \n",
"================================================================\n\n",
"STOCK: ", ticker, "\n",
"PERIOD: ", start_dt, " to ", end_dt, " (", n_obs, " trading days)\n\n",
"--- RETURN CHARACTERISTICS ---\n\n",
" Mean daily return: ", sprintf("%.4f%%", mean_ret * 100),
" (", sprintf("%.2f%%", annual_ret * 100), " annualized)\n",
" Daily volatility: ", sprintf("%.4f%%", sd_ret * 100),
" (", sprintf("%.2f%%", annual_vol * 100), " annualized)\n",
" Skewness: ", sprintf("%.4f", skew_ret),
ifelse(abs(skew_ret) < 0.5, " (symmetric)\n",
ifelse(skew_ret < 0, " (negatively skewed)\n", " (positively skewed)\n")),
" Excess Kurtosis: ", sprintf("%.4f", excess_kurt),
" (fat tails - extreme events likely)\n\n",
"--- STATIONARITY ---\n\n",
" ADF test: ", ifelse(adf_result$p.value < 0.05, "Stationary", "Non-stationary"),
" (p = ", sprintf("%.4f", adf_result$p.value), ")\n",
" KPSS test: ", ifelse(kpss_result$p.value > 0.05, "Stationary", "Non-stationary"), "\n",
" Conclusion: Returns are stationary as expected.\n\n",
"--- ARIMA MODEL ---\n\n",
" Selected: ARIMA(", p_order, ", ", d_order, ", ", q_order, ")\n",
ifelse(p_order == 0 && q_order == 0,
" Interpretation: Returns are unpredictable (white noise)\n",
" Some return predictability detected\n"),
" Consistent with Efficient Market Hypothesis.\n\n",
"--- VOLATILITY (GARCH) ---\n\n",
" ARCH effects: ", ifelse(arch_exists, "Present", "Not detected"), "\n",
" Model: GARCH(1,1) with ", dist_used, " distribution\n",
" Persistence (α + β): ", sprintf("%.4f", persistence), "\n",
" Half-life: ", sprintf("%.1f", halflife), " days\n",
" Long-run volatility: ", sprintf("%.2f%%", longrun_vol_annual), " (annualized)\n\n",
"--- FORECASTS ---\n\n",
" 1-day return forecast: ", sprintf("%.4f%%", ret_forecast[1] * 100), "\n",
" 1-day volatility: ", sprintf("%.4f%%", vol_forecast[1] * 100),
" (", sprintf("%.2f%%", vol_forecast[1] * sqrt(252) * 100), " annualized)\n"
)
cat(exec_summary)
## ================================================================
## EXECUTIVE SUMMARY: TIME SERIES ANALYSIS
## ================================================================
##
## STOCK: AAPL
## PERIOD: 2019-01-02 to 2025-12-19 (1753 trading days)
##
## --- RETURN CHARACTERISTICS ---
##
## Mean daily return: 0.1134% (28.57% annualized)
## Daily volatility: 1.9586% (31.09% annualized)
## Skewness: -0.0837 (symmetric)
## Excess Kurtosis: 6.6024 (fat tails - extreme events likely)
##
## --- STATIONARITY ---
##
## ADF test: Stationary (p = 0.0100)
## KPSS test: Stationary
## Conclusion: Returns are stationary as expected.
##
## --- ARIMA MODEL ---
##
## Selected: ARIMA(4, 0, 1)
## Interpretation: Some return predictability detected, but it is weak
## Broadly consistent with the Efficient Market Hypothesis.
##
## --- VOLATILITY (GARCH) ---
##
## ARCH effects: Present
## Model: GARCH(1,1) with Student-t distribution
## Persistence (α + β): 0.9684
## Half-life: 21.6 days
## Long-run volatility: 31.65% (annualized)
##
## --- FORECASTS ---
##
## 1-day return forecast: 0.1612%
## 1-day volatility: 1.2152% (19.29% annualized)
takeaways <- paste0(
"\n=== KEY POINTS FOR YOUR WRITTEN REPORT ===\n\n",
"1. WHY LOG RETURNS?\n",
" Log returns are stationary, time-additive, and more suitable for\n",
" statistical modeling than raw prices.\n\n",
"2. STATIONARITY MATTERS\n",
" We confirmed stationarity using both ADF and KPSS tests.\n",
" This validates using ARIMA without differencing (d = 0).\n\n",
"3. EFFICIENT MARKETS IMPLICATION\n",
ifelse(p_order == 0 && q_order == 0,
paste0(" ARIMA(0,0,0) suggests returns are unpredictable.\n",
" Consistent with market efficiency - technical analysis\n",
" based on past returns would not generate excess profits.\n\n"),
paste0(" Some predictability detected, but effect is typically small\n",
" and may not survive transaction costs.\n\n")),
"4. VOLATILITY IS PREDICTABLE\n",
" Unlike returns, volatility shows strong persistence.\n",
" Implications for:\n",
" - Risk management: Adjust positions based on predicted volatility\n",
" - Option pricing: Use GARCH volatility forecasts\n",
" - Portfolio allocation: Reduce exposure in high-volatility periods\n\n",
"5. FAT TAILS WARNING\n",
" Excess kurtosis > 0 means extreme events occur more often\n",
" than normal distribution predicts. Standard VaR may\n",
" underestimate tail risk.\n\n",
"6. MODEL LIMITATIONS\n",
" - GARCH assumes symmetric response to positive/negative shocks\n",
" - Historical models may not capture regime changes\n",
" - Forecasts degrade over longer horizons\n"
)
cat(takeaways)
##
## === KEY POINTS FOR YOUR WRITTEN REPORT ===
##
## 1. WHY LOG RETURNS?
## Log returns are stationary, time-additive, and more suitable for
## statistical modeling than raw prices.
##
## 2. STATIONARITY MATTERS
## We confirmed stationarity using both ADF and KPSS tests.
## This validates using ARIMA without differencing (d = 0).
##
## 3. EFFICIENT MARKETS IMPLICATION
## Some predictability detected, but effect is typically small
## and may not survive transaction costs.
##
## 4. VOLATILITY IS PREDICTABLE
## Unlike returns, volatility shows strong persistence.
## Implications for:
## - Risk management: Adjust positions based on predicted volatility
## - Option pricing: Use GARCH volatility forecasts
## - Portfolio allocation: Reduce exposure in high-volatility periods
##
## 5. FAT TAILS WARNING
## Excess kurtosis > 0 means extreme events occur more often
## than normal distribution predicts. Standard VaR may
## underestimate tail risk.
##
## 6. MODEL LIMITATIONS
## - GARCH assumes symmetric response to positive/negative shocks
## - Historical models may not capture regime changes
## - Forecasts degrade over longer horizons
Key Academic Papers:
Engle, R. F. (1982). “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica, 50(4), 987-1007.
Bollerslev, T. (1986). “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics, 31(3), 307-327.
Box, G. E., & Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day.
Tsay, R. S. (2010). Analysis of Financial Time Series. 3rd Edition, Wiley.
End of Part A Analysis
Document generated on 2025-12-21 using R version 4.4.3 (2025-02-28 ucrt)