Time series analysis is a critical tool for understanding and forecasting financial markets. This project focuses on Bitcoin (BTC), the leading cryptocurrency, to demonstrate a comprehensive approach to time series modeling. Cryptocurrencies exhibit unique characteristics including high volatility, 24/7 trading, and strong volatility clustering—making them ideal candidates for advanced time series techniques.
This analysis aims to model Bitcoin's daily log returns and their volatility, and to produce short-horizon forecasts with quantified uncertainty. Our approach follows a systematic pipeline:
| Step | Description |
|---|---|
| Data Preparation | Load, clean, and validate the dataset |
| Stationarity Analysis | Confirm non-stationarity and transform to stationary series |
| Mean Modeling | Fit OLS and ARIMAX models for log returns |
| Volatility Modeling | Test for ARCH effects and fit eGARCH model |
| Forecasting | Generate 5-day ahead predictions with uncertainty quantification |
# Load required libraries
library(dplyr)
library(ggplot2)
library(gridExtra)
library(forecast)
library(lmtest)
library(tseries)
library(zoo)
library(lubridate)
library(knitr)
library(rugarch)
library(FinTS)
We begin by loading the Bitcoin price data and performing essential data cleaning operations, including date parsing, type conversion, and handling of missing values.
# Load and clean data function
load_and_clean_data <- function(file_path) {
# Read CSV file
data <- read.csv(file_path, stringsAsFactors = FALSE)
# Drop the ticker metadata row (e.g., "BTC-USD") that some CSV exports place below the header
if (nrow(data) > 1 && grepl("BTC-USD|USD", data[1, 2], ignore.case = TRUE)) {
data <- data[-1, ]
}
# Parse Date column
data$Date <- as.Date(data$Date, format = "%Y-%m-%d")
# Convert numeric columns
data$Close <- as.numeric(data$Close)
data$Open <- as.numeric(data$Open)
data$High <- as.numeric(data$High)
data$Low <- as.numeric(data$Low)
data$Volume <- as.numeric(data$Volume)
# Sort by date (ascending)
data <- data %>% arrange(Date)
# Remove duplicate dates
data <- data %>% distinct(Date, .keep_all = TRUE)
# Remove rows with missing values
data <- data %>% filter(!is.na(Date), !is.na(Close), !is.na(Volume))
return(data)
}
# Function to check trading days
check_trading_days <- function(data) {
data$Weekday <- wday(data$Date, week_start = 1)
has_weekends <- any(data$Weekday %in% c(6, 7))
total_days <- nrow(data)
weekend_count <- sum(data$Weekday %in% c(6, 7))
list(
has_weekends = has_weekends,
total_days = total_days,
weekend_count = weekend_count
)
}
# Load Bitcoin data
csv_file <- "archive/bitcoin.csv"
data <- load_and_clean_data(csv_file)
# Display dataset information
cat("Dataset Summary:\n")## Dataset Summary:
## - Date Range: 2014-09-17 to 2026-01-05
## - Total Observations: 4129
## - Trading Days Info:
## $has_weekends
## [1] TRUE
##
## $total_days
## [1] 4129
##
## $weekend_count
## [1] 1180
# Close Price over time
p1 <- ggplot(data, aes(x = Date, y = Close)) +
geom_line(color = "steelblue", linewidth = 0.6) +
labs(title = "Bitcoin Close Price Over Time",
x = "Date",
y = "Close Price (USD)") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Volume over time
p2 <- ggplot(data, aes(x = Date, y = Volume)) +
geom_line(color = "darkgreen", linewidth = 0.6) +
labs(title = "Bitcoin Trading Volume Over Time",
x = "Date",
y = "Volume") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
grid.arrange(p1, p2, ncol = 1)
# Calculate summary statistics
summary_stats <- data %>%
summarise(
Mean_Price = mean(Close, na.rm = TRUE),
Median_Price = median(Close, na.rm = TRUE),
SD_Price = sd(Close, na.rm = TRUE),
Min_Price = min(Close, na.rm = TRUE),
Max_Price = max(Close, na.rm = TRUE),
Mean_Volume = mean(Volume, na.rm = TRUE)
)
kable(summary_stats, caption = "Summary Statistics for Bitcoin", digits = 2)
| Mean_Price | Median_Price | SD_Price | Min_Price | Max_Price | Mean_Volume |
|---|---|---|---|---|---|
| 26925.96 | 10779.9 | 31709.9 | 178.1 | 124752.5 | 21643259806 |
Time series models assume stationarity—constant mean and variance over time. Financial price levels typically violate this assumption due to trends and changing volatility. We must transform the data to achieve stationarity before modeling.
# Create log-transformed series
data$log_Close <- log(data$Close)
data$log_Volume <- log(data$Volume)
# Calculate log returns
data$returns <- c(NA, diff(data$log_Close))
# Rolling statistics window
window <- min(60, floor(nrow(data) / 10))
# Calculate rolling mean and SD for Close price
data$Close_rolling_mean <- rollmean(data$Close, k = window, fill = NA, align = "right")
data$Close_rolling_sd <- rollapply(data$Close, width = window, FUN = sd, fill = NA, align = "right")
# Plot 1: Close Price with rolling statistics
p1 <- ggplot(data, aes(x = Date)) +
geom_line(aes(y = Close), color = "steelblue", linewidth = 0.6) +
geom_line(aes(y = Close_rolling_mean), color = "red", linewidth = 1, linetype = "dashed", na.rm = TRUE) +
geom_ribbon(aes(ymin = Close_rolling_mean - Close_rolling_sd,
ymax = Close_rolling_mean + Close_rolling_sd),
alpha = 0.2, fill = "red", na.rm = TRUE) +
labs(title = "Close Price with Rolling Mean ± SD (60-day window)",
subtitle = "Non-stationary: trending mean and changing variance",
x = "Date", y = "Close Price") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Plot 2: ACF of Close Price
acf_close <- acf(data$Close, plot = FALSE, na.action = na.pass)
acf_df_close <- data.frame(Lag = acf_close$lag[,1,1], ACF = acf_close$acf[,1,1]) %>%
filter(Lag > 0)
p2 <- ggplot(acf_df_close, aes(x = Lag, y = ACF)) +
geom_bar(stat = "identity", fill = "steelblue", width = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_hline(yintercept = c(-1.96, 1.96) / sqrt(length(data$Close)),
linetype = "dashed", color = "red", alpha = 0.7) +
labs(title = "ACF of Close Price",
subtitle = "Slow decay → non-stationarity",
x = "Lag", y = "ACF") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
grid.arrange(p1, p2, ncol = 2)
Diagnosis: Close price shows a clear trend, and its ACF stays near 1 with slow decay → non-stationary.
Solution: Use log returns: \(r_t = \log(C_t) - \log(C_{t-1})\)
ggplot(data, aes(x = Date, y = returns)) +
geom_line(color = "steelblue", linewidth = 0.5) +
geom_hline(yintercept = 0, color = "red", linetype = "dashed", alpha = 0.7) +
labs(title = "Log Returns: r(t) = log(Cₜ) - log(Cₜ₋₁)",
subtitle = "Mean-reverting around zero with volatility clustering",
x = "Date",
y = "Log Returns") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Remove NA values
returns_clean <- na.omit(data$returns)
# ADF test
# H0: Unit root (non-stationary)
adf.test(returns_clean)
##
## Augmented Dickey-Fuller Test
##
## data: returns_clean
## Dickey-Fuller = -14.595, Lag order = 16, p-value = 0.01
## alternative hypothesis: stationary
##
# KPSS test
# H0: trend stationarity
kpss.test(returns_clean, null = "Trend")
##
## KPSS Test for Trend Stationarity
##
## data: returns_clean
## KPSS Trend = 0.077407, Truncation lag parameter = 10, p-value = 0.1
Conclusion: Log returns are stationary—the ADF test rejects the unit-root null, and the KPSS test fails to reject trend stationarity. Visually, returns mean-revert around zero while their variance changes over time: stationary in mean, with pronounced volatility clustering. We can proceed with modeling.
We create the dependent variable (log returns) and relevant predictors:
prepare_regression_data <- function(data) {
# Dependent variable: log returns
data$log_returns <- c(NA, diff(log(data$Close)))
# Regressors
data$lag_return <- dplyr::lag(data$log_returns, 1)  # dplyr's vector lag (stats::lag would not shift a plain vector)
data$intraday_range <- log(data$High / data$Low)
data$log_Volume <- log(data$Volume)
data$delta_log_Volume <- c(NA, diff(data$log_Volume))
# Remove incomplete observations
data <- data %>% filter(!is.na(log_returns),
!is.na(lag_return),
!is.na(delta_log_Volume),
!is.na(intraday_range))
return(data)
}
data_reg <- prepare_regression_data(data)
cat("Observations for modeling:", nrow(data_reg), "\n")## Observations for modeling: 4127
\[r_t = \beta_0 + \beta_1 r_{t-1} + \beta_2 \Delta\log(Vol_t) + \beta_3 \log(H_t/L_t) + \varepsilon_t\]
model1 <- lm(log_returns ~ lag_return + delta_log_Volume + intraday_range,
data = data_reg)
summary(model1)
##
## Call:
## lm(formula = log_returns ~ lag_return + delta_log_Volume + intraday_range,
## data = data_reg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.41120 -0.01483 -0.00204 0.01393 0.25323
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.006891 0.000890 7.743 1.22e-14 ***
## lag_return -0.035920 0.015536 -2.312 0.0208 *
## delta_log_Volume 0.007332 0.001733 4.231 2.37e-05 ***
## intraday_range -0.128431 0.016223 -7.916 3.12e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03511 on 4123 degrees of freedom
## Multiple R-squared: 0.01615, Adjusted R-squared: 0.01543
## F-statistic: 22.56 on 3 and 4123 DF, p-value: 1.752e-14
The regression finds small but genuine mean effects: slight mean reversion (a negative coefficient on the lagged return), a positive association with volume changes, and a strong negative relationship with the intraday range.
Still, the model is a weak return predictor (R² ≈ 1.6%).
The residuals likely exhibit heteroskedasticity and heavy tails, so pairing this regression with ARMA errors and/or a GARCH volatility model is the natural next step.
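Two quick diagnostics can confirm these suspicions before we move on (a minimal sketch; lmtest and tseries are already loaded):
# Sketch: check for heteroskedasticity and heavy tails in the OLS errors
bptest(model1)                          # Breusch-Pagan; H0: homoskedastic errors
jarque.bera.test(residuals(model1))     # Jarque-Bera; H0: normality (no heavy tails/skew)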
res1 <- residuals(model1)
# ACF of residuals
acf_res <- acf(res1, lag.max = 40, plot = FALSE, na.action = na.pass)
acf_df <- data.frame(Lag = acf_res$lag[,1,1], ACF = acf_res$acf[,1,1])
p_acf <- ggplot(acf_df, aes(x = Lag, y = ACF)) +
geom_bar(stat = "identity", fill = "steelblue", width = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_hline(yintercept = c(-1.96, 1.96) / sqrt(length(res1)),
linetype = "dashed", color = "red", alpha = 0.7) +
labs(title = "ACF of OLS Residuals",
x = "Lag", y = "ACF") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# PACF of residuals
pacf_res <- pacf(res1, plot = FALSE, na.action = na.pass)
pacf_df <- data.frame(Lag = as.numeric(pacf_res$lag), PACF = as.numeric(pacf_res$acf))
p_pacf <- ggplot(pacf_df, aes(x = Lag, y = PACF)) +
geom_bar(stat = "identity", fill = "darkgreen", width = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_hline(yintercept = c(-1.96, 1.96) / sqrt(length(res1)),
linetype = "dashed", color = "red", alpha = 0.7) +
labs(title = "PACF of OLS Residuals",
x = "Lag", y = "PACF") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
grid.arrange(p_acf, p_pacf, ncol = 2)
# Ljung-Box test on OLS residuals (10 lags)
Box.test(res1, lag = 10, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: res1
## X-squared = 23.231, df = 10, p-value = 0.009927
Interpretation:
The residual ACF is mostly near zero, but a few lags exceed the 95% bounds, indicating mild residual autocorrelation.
With a sample this large, even tiny correlations become statistically significant; the Ljung–Box test confirms the residuals are not white noise.
For forecasting, we therefore upgrade to regression with ARMA errors (ARIMAX), because the residuals are not fully independent.
We model the regression error as an ARMA process. Since cryptocurrencies trade 7 days/week, we also test a seasonal component with period \(s=7\).
y <- data_reg$log_returns
xreg <- cbind(lag_return = data_reg$lag_return,
              delta_log_Volume = data_reg$delta_log_Volume,
              intraday_range = data_reg$intraday_range)
# Non-seasonal ARIMAX
fit_ns <- auto.arima(y, xreg = xreg, seasonal = FALSE)
# Seasonal ARIMAX (weekly seasonality)
y7 <- ts(y, frequency = 7)
fit_s <- auto.arima(y7, xreg = xreg, seasonal = TRUE)
# Model comparison
comparison <- data.frame(
Model = c("Non-Seasonal ARIMAX", "Seasonal ARIMAX (s=7)"),
AIC = c(AIC(fit_ns), AIC(fit_s)),
BIC = c(BIC(fit_ns), BIC(fit_s))
)
kable(comparison, digits = 2, caption = "Model Selection Criteria")
| Model | AIC | BIC |
|---|---|---|
| Non-Seasonal ARIMAX | -15956.29 | -15899.36 |
| Seasonal ARIMAX (s=7) | -15964.46 | -15901.21 |
# Residual diagnostics for the seasonal model
checkresiduals(fit_s, plot = FALSE)
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,2)(0,0,1)[7] errors
## Q* = 11.901, df = 9, p-value = 0.2189
##
## Model df: 5. Total lags used: 14
Selected Model: Seasonal ARIMAX with ARIMA\((2,0,2)(0,0,1)_{7}\) errors achieves lower AIC and BIC and passes the Ljung-Box test (p = 0.22).
Financial time series often exhibit volatility clustering—large returns tend to be followed by large returns. We test for ARCH effects in the ARIMAX residuals.
res2 <- residuals(fit_s)
res2 <- res2[!is.na(res2)]
# Visual check: ACF of squared residuals (volatility clustering)
acf_sq <- acf(res2^2, lag.max = 40, plot = FALSE, na.action = na.pass)
acf_sq_df <- data.frame(Lag = as.numeric(acf_sq$lag[,1,1]),
ACF = as.numeric(acf_sq$acf[,1,1]))
ggplot(acf_sq_df, aes(x = Lag, y = ACF)) +
geom_bar(stat = "identity", fill = "steelblue", width = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_hline(yintercept = c(-1.96, 1.96) / sqrt(length(res2)),
linetype = "dashed", color = "red", alpha = 0.7) +
labs(title = "ACF of Squared Residuals",
subtitle = "Significant spikes indicate volatility clustering / ARCH effects",
x = "Lag", y = "ACF") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# ARCH LM tests (FinTS) at increasing lags
ArchTest(res2, lags = 7)
ArchTest(res2, lags = 14)
ArchTest(res2, lags = 21)
##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: res2
## Chi-squared = 197.26, df = 7, p-value < 2.2e-16
##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: res2
## Chi-squared = 209.63, df = 14, p-value < 2.2e-16
##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: res2
## Chi-squared = 220.49, df = 21, p-value < 2.2e-16
Conclusion: Strong ARCH effects detected at every lag tested (all p < 0.001). We proceed with GARCH modeling.
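As a sanity check on the choice of eGARCH below (a sketch, not part of the original model selection; a symmetric sGARCH(1,1) serves as the baseline), one can compare information criteria:
# Sketch: symmetric GARCH(1,1) baseline for comparison
spec_base <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(0, 0), include.mean = FALSE),
  distribution.model = "sstd"
)
fit_base <- ugarchfit(spec_base, data = res2, solver = "hybrid")
infocriteria(fit_base)  # compare against the eGARCH fit below (lower is better)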
We use an eGARCH(1,1) model with skewed Student-t innovations to capture persistent volatility clustering, asymmetric (leverage-type) responses to positive versus negative shocks, and the skewed, heavy-tailed distribution of return innovations.
# eGARCH specification
spec <- ugarchspec(
variance.model = list(model = "eGARCH", garchOrder = c(1,1)),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE),
distribution.model = "sstd"
)
# Fit on ARIMAX residuals
fit_vol <- ugarchfit(spec, data = res2, solver = "hybrid")
show(fit_vol)
##
## *---------------------------------*
## * GARCH Model Fit *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : eGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : sstd
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## omega -0.098688 0.032346 -3.05099 0.002281
## alpha1 -0.008444 0.014671 -0.57557 0.564906
## beta1 0.984950 0.005059 194.67468 0.000000
## gamma1 0.226942 0.039111 5.80253 0.000000
## skew 1.139943 0.017382 65.58249 0.000000
## shape 2.879734 0.154490 18.64026 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## omega -0.098688 0.059388 -1.66176 0.096560
## alpha1 -0.008444 0.023182 -0.36426 0.715665
## beta1 0.984950 0.009557 103.05761 0.000000
## gamma1 0.226942 0.078354 2.89635 0.003775
## skew 1.139943 0.015684 72.68141 0.000000
## shape 2.879734 0.172840 16.66129 0.000000
##
## LogLikelihood : 8856.944
##
## Information Criteria
## ------------------------------------
##
## Akaike -4.2893
## Bayes -4.2801
## Shibata -4.2893
## Hannan-Quinn -4.2860
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 12.10 0.0005051
## Lag[2*(p+q)+(p+q)-1][2] 12.78 0.0003522
## Lag[4*(p+q)+(p+q)-1][5] 14.57 0.0005797
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 2.006 0.1566
## Lag[2*(p+q)+(p+q)-1][5] 2.820 0.4408
## Lag[4*(p+q)+(p+q)-1][9] 3.278 0.7122
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 1.025 0.500 2.000 0.3113
## ARCH Lag[5] 1.190 1.440 1.667 0.6776
## ARCH Lag[7] 1.311 2.315 1.543 0.8584
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 3.306
## Individual Statistics:
## omega 0.1661
## alpha1 1.0930
## beta1 0.1642
## gamma1 1.4742
## skew 0.4420
## shape 0.1234
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.49 1.68 2.12
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 2.116 0.03442 **
## Negative Sign Bias 1.294 0.19565
## Positive Sign Bias 1.291 0.19682
## Joint Effect 4.795 0.18740
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 23.06 0.23468
## 2 30 38.04 0.12140
## 3 40 49.12 0.12853
## 4 50 63.20 0.08361
##
##
## Elapsed time : 0.345021
Interpretation:
The eGARCH(1,1) model with skewed Student-\(t\) innovations fits volatility well: volatility is highly persistent \((\beta_1 = 0.985,\ p < 0.001)\) and there is sign asymmetry/leverage \((\gamma_1 = 0.227,\ \text{robust } p = 0.0038)\). The shock-magnitude term is not significant \((\alpha_1,\ p > 0.7)\). The skewed-\(t\) parameters are strongly significant (skew \(= 1.14\), shape \(= 2.88\)), confirming skewness and fat tails.
There is no remaining ARCH/volatility clustering (Ljung–Box on squared residuals \(p = 0.16\text{–}0.71\); ARCH–LM \(p = 0.31\text{–}0.86\)), but standardized residuals are still autocorrelated (Ljung–Box \(p < 0.001\)), suggesting leftover mean dynamics not captured in this specific fit. Parameter stability is a concern: the Nyblom joint test rejects stability \((3.306 > 2.12)\), especially for \(\alpha_1\) and \(\gamma_1\). Sign-bias is mildly significant \((p = 0.034)\), while overall distributional fit is acceptable (Pearson GoF \(p \approx 0.08\text{–}0.23\)).
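To verify the leftover mean dynamics flagged above (a minimal sketch using rugarch's standardized residuals):
# Sketch: Ljung-Box on standardized residuals from the eGARCH fit
z_std <- as.numeric(residuals(fit_vol, standardize = TRUE))
Box.test(z_std, lag = 10, type = "Ljung-Box")  # H0: no serial correlation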
We combine the mean model (seasonal ARIMAX) and volatility model (eGARCH) to generate forecasts with proper uncertainty quantification.
# Forecast horizon
h <- 5
# Future regressors (use last known values as proxy)
last_values <- tail(data_reg, 1)
xreg_future <- matrix(
  rep(c(last_values$lag_return, last_values$delta_log_Volume, last_values$intraday_range), h),
  nrow = h, ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("lag_return", "delta_log_Volume", "intraday_range"))  # match training xreg names
)
# Forecast mean returns
fc_mean <- forecast(fit_s, h = h, xreg = xreg_future)
mu_hat <- as.numeric(fc_mean$mean)
# Forecast volatility
res_all <- as.numeric(residuals(fit_s))
res_all <- res_all[is.finite(res_all)]
# Re-specify and refit the eGARCH model on the full residual series (same specification as above)
spec <- ugarchspec(
variance.model = list(model = "eGARCH", garchOrder = c(1,1)),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE),
distribution.model = "sstd"
)
fit_vol <- ugarchfit(spec, data = res_all, solver = "hybrid")
fc_vol <- ugarchforecast(fit_vol, n.ahead = h)
sigma_hat <- as.numeric(sigma(fc_vol))
# Create forecast table
last_date <- max(data_reg$Date)
forecast_dates <- last_date + 1:h
forecast_results <- data.frame(
Date = forecast_dates,
Forecasted_Return = mu_hat,
Forecasted_Volatility = sigma_hat,
Lower_95 = mu_hat - 1.96 * sigma_hat,  # normal-approximation bounds; the fitted sstd
Upper_95 = mu_hat + 1.96 * sigma_hat   # density is skewed/heavy-tailed, so these are approximate
)
kable(forecast_results, digits = 6, caption = "Return Forecasts with 95% Confidence Intervals")
| Date | Forecasted_Return | Forecasted_Volatility | Lower_95 | Upper_95 |
|---|---|---|---|---|
| 2026-01-06 | 0.014201 | 0.023308 | -0.031482 | 0.059885 |
| 2026-01-07 | 0.002576 | 0.023477 | -0.043440 | 0.048591 |
| 2026-01-08 | 0.002770 | 0.023645 | -0.043574 | 0.049114 |
| 2026-01-09 | 0.002733 | 0.023811 | -0.043938 | 0.049403 |
| 2026-01-10 | 0.002592 | 0.023976 | -0.044401 | 0.049586 |
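Because \(r_t = \log(C_t) - \log(C_{t-1})\), these return forecasts imply price levels by compounding from the last observed close (a sketch; it assumes forecast errors are approximately independent across days so that variances add, and reuses the normal-approximation interval from the table):
# Sketch: translate cumulative log-return forecasts into price levels
last_close <- tail(data_reg$Close, 1)
cum_mu <- cumsum(mu_hat)
cum_sd <- sqrt(cumsum(sigma_hat^2))  # assumes roughly independent daily forecast errors
price_forecast <- data.frame(
  Date     = forecast_results$Date,
  Price    = last_close * exp(cum_mu),
  Lower_95 = last_close * exp(cum_mu - 1.96 * cum_sd),
  Upper_95 = last_close * exp(cum_mu + 1.96 * cum_sd)
)
kable(price_forecast, digits = 2, caption = "Implied Close Price Forecasts")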
# Plot 1: Return forecasts with confidence intervals
p1 <- ggplot(forecast_results, aes(x = Date)) +
geom_ribbon(aes(ymin = Lower_95, ymax = Upper_95), alpha = 0.3, fill = "steelblue") +
geom_line(aes(y = Forecasted_Return), color = "steelblue", linewidth = 1.2) +
geom_point(aes(y = Forecasted_Return), color = "steelblue", size = 3) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red", alpha = 0.7) +
labs(title = "5-Day Log Return Forecast",
subtitle = "Shaded region: 95% confidence interval",
x = "Date",
y = "Forecasted Log Return") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Plot 2: Volatility forecast
p2 <- ggplot(forecast_results, aes(x = Date, y = Forecasted_Volatility)) +
geom_line(color = "darkgreen", linewidth = 1.2) +
geom_point(color = "darkgreen", size = 3) +
geom_area(alpha = 0.3, fill = "darkgreen") +
labs(title = "5-Day Volatility Forecast",
subtitle = "Forecasted standard deviation of returns",
x = "Date",
y = "Forecasted Volatility (σ)") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
grid.arrange(p1, p2, ncol = 1)
# Show recent returns with forecast
recent_data <- tail(data_reg, 30)
recent_returns <- data.frame(
Date = recent_data$Date,
Return = recent_data$log_returns,
Type = "Historical"
)
forecast_data <- data.frame(
Date = forecast_results$Date,
Return = forecast_results$Forecasted_Return,
Type = "Forecast"
)
combined <- rbind(recent_returns, forecast_data)
ggplot(combined, aes(x = Date, y = Return, color = Type)) +
geom_line(linewidth = 0.8) +
geom_point(size = 2) +
geom_ribbon(data = forecast_results,
aes(x = Date, ymin = Lower_95, ymax = Upper_95),
inherit.aes = FALSE, alpha = 0.2, fill = "coral") +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
scale_color_manual(values = c("Historical" = "steelblue", "Forecast" = "coral")) +
labs(title = "Recent History and 5-Day Forecast",
subtitle = "With 95% confidence interval for forecasts",
x = "Date",
y = "Log Return") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "bottom")
| Aspect | Result |
|---|---|
| Data | 4129 observations covering 2014-09-17 to 2026-01-05 |
| Stationarity | Price levels non-stationary; log returns are stationary (ADF/KPSS confirmed) |
| Mean Model | Seasonal ARIMAX with ARIMA(2,0,2)(0,0,1)₇ errors and external regressors |
| Volatility Model | eGARCH(1,1) with skewed Student-t distribution |
| ARCH Effects | Strong evidence of volatility clustering (p < 0.001) |
Preliminary Analysis: Loaded and cleaned the data, confirmed Bitcoin trades 7 days/week, and identified non-stationarity in price levels
Model Creation: Fit an OLS baseline for log returns, upgraded the mean model to a seasonal ARIMAX, and added an eGARCH(1,1) model with skewed Student-t innovations for the variance
Forecasting: Generated 5-day ahead predictions combining mean and volatility forecasts with 95% confidence intervals
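A natural validation step (a sketch with illustrative settings; forecast.length and refit.every are hypothetical choices, not values used above) would be a rolling out-of-sample backtest of the volatility model:
# Sketch: rolling 1-step-ahead re-estimation and 5% VaR backtest
roll <- ugarchroll(spec, data = res_all, n.ahead = 1,
                   forecast.length = 500, refit.every = 50,
                   refit.window = "moving", solver = "hybrid")
report(roll, type = "VaR", VaR.alpha = 0.05, conf.level = 0.95)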