This document presents the complete exploratory analysis behind the Crypto EDA Dashboard, an academic project that consumes real-time data from the CryptoCompare API to analyze the historical behavior of 10 cryptocurrencies.
The analysis is geared toward building an ARIMA-based price prediction model, and covers every step from data retrieval and cleaning to fitting, validating, and forecasting with time series.
Repository: https://github.com/Mateo3008/Crypto_EDA_App
Framework: R Shiny + shinydashboard
Data source: CryptoCompare API (daily OHLCV, up to 1905 days per coin)
cryptos <- tibble(
Nombre = c("Bitcoin","Ethereum","USD Coin","Solana","XRP",
"Bittensor","Tether","Dogecoin","USD1","Zcash"),
Símbolo = c("BTC","ETH","USDC","SOL","XRP",
"TAO","USDT","DOGE","USD1","ZEC"),
Categoría = c("Store of Value","Smart Contract","Stablecoin","Smart Contract",
"Payments","AI / Subnet","Stablecoin","Meme","Stablecoin","Privacy")
)
kable(cryptos, caption = "Criptomonedas incluidas en el análisis",
booktabs = TRUE) |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = FALSE)

| Nombre | Símbolo | Categoría |
|---|---|---|
| Bitcoin | BTC | Store of Value |
| Ethereum | ETH | Smart Contract |
| USD Coin | USDC | Stablecoin |
| Solana | SOL | Smart Contract |
| XRP | XRP | Payments |
| Bittensor | TAO | AI / Subnet |
| Tether | USDT | Stablecoin |
| Dogecoin | DOGE | Meme |
| USD1 | USD1 | Stablecoin |
| Zcash | ZEC | Privacy |
The main objective is to apply Exploratory Data Analysis (EDA) techniques to financial time series in order to:
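# Read the key from the environment rather than committing it to the report;
# the variable name CRYPTOCOMPARE_API_KEY is an assumption, use whatever name you export.
API_KEY <- Sys.getenv("CRYPTOCOMPARE_API_KEY")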
BASE_URL <- "https://min-api.cryptocompare.com/data"
CRYPTOS <- c(
"Bitcoin" = "BTC", "Ethereum" = "ETH", "USD Coin" = "USDC",
"Solana" = "SOL", "XRP" = "XRP", "Bittensor" = "TAO",
"Tether" = "USDT","Dogecoin" = "DOGE","USD1" = "USD1",
"Zcash" = "ZEC"
)
COLOR_PALETTE <- c(
"#2E86AB","#A23B72","#F18F01","#C73E1D","#6A994E",
"#BC4A6C","#3D5A80","#EE6C4D","#98C1D9","#293241"
)
names(COLOR_PALETTE) <- CRYPTOS

The function get_historical_daily() queries CryptoCompare's /v2/histoday endpoint and returns a data.frame with the OHLCV columns plus simple returns, log returns, and daily volatility:
get_historical_daily <- function(fsym, tsym = "USD", limit = 1905) {
url <- paste0(BASE_URL, "/v2/histoday?fsym=", fsym, "&tsym=", tsym,
"&limit=", limit, "&api_key=", API_KEY)
url <- URLencode(url)
tryCatch({
resp <- GET(url, timeout(30))
if (http_error(resp)) return(NULL)
data <- fromJSON(content(resp, "text", encoding = "UTF-8"))
if (is.null(data$Data$Data)) return(NULL)
df <- as.data.frame(data$Data$Data) |>
mutate(
fecha = as.Date(as.POSIXct(time, origin = "1970-01-01")),
simbolo = fsym,
open = as.numeric(open),
high = as.numeric(high),
low = as.numeric(low),
close = as.numeric(close),
volume = as.numeric(volumefrom),
retorno = (close - lag(close)) / lag(close) * 100,
retorno_log = log(close / lag(close)) * 100,
volatilidad = abs(high - low) / open * 100
) |>
filter(!is.na(retorno))
return(df)
}, error = function(e) return(NULL))
}

get_price_overview <- function(fsyms, tsym = "USD") {
fsyms_str <- paste(fsyms, collapse = ",")
url <- paste0(BASE_URL, "/pricemultifull?fsyms=", fsyms_str,
"&tsyms=", tsym, "&api_key=", API_KEY)
url <- URLencode(url)
tryCatch({
resp <- GET(url, timeout(30))
if (http_error(resp)) return(NULL)
data <- fromJSON(content(resp, "text", encoding = "UTF-8"))
if (is.null(data$RAW)) return(NULL)
rows <- list()
for (sym in fsyms) {
if (!is.null(data$RAW[[sym]]) && !is.null(data$RAW[[sym]][[tsym]])) {
d <- data$RAW[[sym]][[tsym]]
rows[[sym]] <- data.frame(
simbolo = sym,
precio = d$PRICE,
cambio_24h_pct = d$CHANGEPCT24HOUR,
volumen_24h = d$VOLUME24HOURTO,
cap_mercado = d$MKTCAP,
stringsAsFactors = FALSE
)
}
}
if (length(rows) == 0) return(NULL)
return(bind_rows(rows))
}, error = function(e) return(NULL))
}

## === CARGANDO DATOS (10 monedas, 1905 días) ===
hist_data <- NULL
for (crypto in CRYPTOS) {
cat("Cargando", crypto, "... ")
data <- get_historical_daily(crypto, limit = 1905)
if (!is.null(data) && nrow(data) > 0) {
hist_data <- bind_rows(hist_data, data)
cat("OK (", nrow(data), "días)\n")
} else {
cat("FALLÓ — generando datos de ejemplo\n")
}
Sys.sleep(0.3)
}

## Cargando BTC ... OK ( 1905 días)
## Cargando ETH ... OK ( 1905 días)
## Cargando USDC ... OK ( 1905 días)
## Cargando SOL ... OK ( 1905 días)
## Cargando XRP ... OK ( 1905 días)
## Cargando TAO ... OK ( 702 días)
## Cargando USDT ... OK ( 1905 días)
## Cargando DOGE ... OK ( 1905 días)
## Cargando USD1 ... OK ( 326 días)
## Cargando ZEC ... OK ( 1905 días)
# Fallback: synthetic data if the API does not respond
if (is.null(hist_data) || nrow(hist_data) == 0) {
set.seed(123)
fechas <- seq.Date(as.Date("2019-01-01"), as.Date("2024-04-10"), by = "day")
for (sym in CRYPTOS) {
precio_base <- switch(sym,
"BTC"=50000,"ETH"=3000,"USDC"=1,"SOL"=100,"XRP"=0.5,
"TAO"=300,"USDT"=1,"DOGE"=0.1,"USD1"=1,"ZEC"=30, 1000)
trend <- seq(0, by=0.0002, length.out=length(fechas)) * precio_base
noise <- cumsum(rnorm(length(fechas), 0, precio_base * 0.015))
close <- pmax(precio_base + trend + noise, precio_base * 0.1)
df <- data.frame(
fecha = fechas, simbolo = sym, close = close,
open = c(close[1], close[-length(close)]),
high = close + abs(rnorm(length(fechas), 0, close * 0.02)),
low = close - abs(rnorm(length(fechas), 0, close * 0.02)),
retorno = c(0, diff(close)/close[-length(close)] * 100),
retorno_log = c(0, diff(log(close)) * 100),
volatilidad = runif(length(fechas), 1, 6)
)
hist_data <- bind_rows(hist_data, df)
}
}
prices_overview <- get_price_overview(CRYPTOS)
if (is.null(prices_overview)) {
prices_overview <- data.frame(
simbolo = unname(CRYPTOS),  # ticker symbols ("BTC", ...), matching the API branch and the names of COLOR_PALETTE
precio = c(50000,3000,1,100,0.5,300,1,0.1,1,30),
cambio_24h_pct = runif(10,-5,5),
volumen_24h = runif(10,1e8,1e10),
cap_mercado = c(1e12,4e11,5e10,3e10,1e10,5e9,8e10,2e10,4e9,1e9)
)
}
cat("\n✅ Total filas:", nrow(hist_data))

##
## ✅ Total filas: 16268
##
## 📅 Período: 2021-02-19 → 2026-05-08
##
## 🪙 Monedas: BTC, ETH, USDC, SOL, XRP, TAO, USDT, DOGE, USD1, ZEC
missing_summary <- function(df) {
df |>
summarise(across(everything(), ~ sum(is.na(.)))) |>
pivot_longer(everything(), names_to = "Variable", values_to = "NAs") |>
mutate(Pct = round(NAs / nrow(df) * 100, 2)) |>
filter(NAs > 0)
}
ms_global <- hist_data |>
group_by(simbolo) |>
summarise(
Total_filas = n(),
NAs_close = sum(is.na(close)),
NAs_retorno = sum(is.na(retorno)),
NAs_vol = sum(is.na(volatilidad)),
Pct_NA_close = round(NAs_close / Total_filas * 100, 2)
)
kable(ms_global, caption = "Resumen de valores faltantes por moneda",
booktabs = TRUE) |>
kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE)

| simbolo | Total_filas | NAs_close | NAs_retorno | NAs_vol | Pct_NA_close |
|---|---|---|---|---|---|
| BTC | 1905 | 0 | 0 | 0 | 0 |
| DOGE | 1905 | 0 | 0 | 0 | 0 |
| ETH | 1905 | 0 | 0 | 0 | 0 |
| SOL | 1905 | 0 | 0 | 0 | 0 |
| TAO | 702 | 0 | 0 | 0 | 0 |
| USD1 | 326 | 0 | 0 | 0 | 0 |
| USDC | 1905 | 0 | 0 | 0 | 0 |
| USDT | 1905 | 0 | 0 | 0 | 0 |
| XRP | 1905 | 0 | 0 | 0 | 0 |
| ZEC | 1905 | 0 | 0 | 0 | 0 |
cols <- c("close","retorno","retorno_log","volatilidad")
df_heat <- hist_data |>
group_by(simbolo) |>
summarise(across(all_of(intersect(cols, names(hist_data))),
~ sum(is.na(.)) / n() * 100, .names = "{.col}")) |>
pivot_longer(-simbolo, names_to = "Variable", values_to = "pct_na")
ggplot(df_heat, aes(x = Variable, y = simbolo, fill = pct_na)) +
geom_tile(color = "white", linewidth = 0.5) +
geom_text(aes(label = paste0(round(pct_na, 1), "%")), size = 3) +
scale_fill_gradient(low = "white", high = "#e74c3c", name = "% NA") +
labs(title = "Porcentaje de NAs por moneda y variable",
x = NULL, y = NULL) +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 30, hjust = 1))

Figure 3.1: Percentage of missing values by coin and variable
handle_missing <- function(df, method = "interpolation") {
cols_imp <- c("close","open","high","low","volume","retorno","retorno_log","volatilidad")
if (method == "remove") return(na.omit(df))
if (method == "interpolation") {
df <- df |> arrange(fecha)
for (col in intersect(cols_imp, names(df))) {
df[[col]] <- na.approx(df[[col]], na.rm = FALSE)
df[[col]] <- na.locf(df[[col]], na.rm = FALSE)
df[[col]] <- na.locf(df[[col]], fromLast = TRUE, na.rm = FALSE)
}
return(df)
}
if (method == "mean") {
for (col in intersect(cols_imp, names(df))) {
df[[col]][is.na(df[[col]])] <- mean(df[[col]], na.rm = TRUE)
}
return(df)
}
return(df)
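}

Three methods for handling missing values were implemented:

- interpolation (default): linear interpolation with zoo::na.approx(), followed by forward and backward fill with na.locf().
- remove: drops incomplete rows with na.omit().
- mean: replaces each missing value with the column mean.

prices_overview |>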
mutate(
precio = dollar(precio, accuracy = 0.01),
cambio_24h_pct = paste0(round(cambio_24h_pct, 2), "%"),
volumen_24h = dollar(volumen_24h, accuracy = 1, scale = 1e-6, suffix = "M"),
cap_mercado = dollar(cap_mercado, accuracy = 1, scale = 1e-9, suffix = "B")
) |>
rename(
Símbolo = simbolo, `Precio USD` = precio, `Cambio 24h` = cambio_24h_pct,
`Volumen 24h` = volumen_24h, `Cap. Mercado` = cap_mercado
) |>
kable(caption = "Visión general del mercado en tiempo real",
booktabs = TRUE) |>
kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE)

| Símbolo | Precio USD | Cambio 24h | Volumen 24h | Cap. Mercado |
|---|---|---|---|---|
| BTC | $79,577.11 | -1.56% | $1,566M | $1,594B |
| ETH | $2,278.34 | -1.88% | $651M | $275B |
| USDC | $1.00 | 0.01% | $220M | $78B |
| SOL | $88.21 | 0.51% | $155M | $55B |
| XRP | $1.38 | -1.53% | $114M | $138B |
| TAO | $302.82 | -1.18% | $21M | $3B |
| USDT | $1.00 | 0% | $382M | $195B |
| DOGE | $0.11 | -3.5% | $55M | $18B |
| USD1 | $1.00 | -0.01% | $123M | $4B |
| ZEC | $576.30 | 6.98% | $87M | $10B |
df_cap <- prices_overview |>
mutate(simbolo = reorder(simbolo, cap_mercado))
ggplot(df_cap, aes(x = simbolo, y = cap_mercado / 1e9, fill = simbolo)) +
geom_col(show.legend = FALSE, alpha = 0.85) +
geom_text(aes(label = paste0("$", round(cap_mercado / 1e9, 1), "B")),
hjust = -0.1, size = 3.5) +
coord_flip() +
scale_fill_manual(values = COLOR_PALETTE) +
scale_y_continuous(expand = expansion(mult = c(0, 0.15)),
labels = dollar_format(suffix = "B", scale = 1)) +
labs(title = "Capitalización de Mercado",
x = NULL, y = "Miles de millones USD") +
theme_minimal(base_size = 12)

Figure 4.1: Market capitalization by cryptocurrency (billions of USD)
df_vol <- prices_overview |>
mutate(simbolo = reorder(simbolo, volumen_24h))
ggplot(df_vol, aes(x = simbolo, y = volumen_24h / 1e6, fill = simbolo)) +
geom_col(show.legend = FALSE, alpha = 0.85) +
coord_flip() +
scale_fill_manual(values = COLOR_PALETTE) +
scale_y_continuous(labels = dollar_format(suffix = "M", scale = 1)) +
labs(title = "Volumen 24h", x = NULL, y = "Millones USD") +
theme_minimal(base_size = 12)

Figure 4.2: Trading volume over the last 24 hours
df_365 <- hist_data |>
filter(fecha >= max(fecha) - 365)
ggplot(df_365, aes(x = simbolo, y = close, fill = simbolo)) +
geom_boxplot(alpha = 0.7, outlier.color = "#e74c3c", outlier.size = 1.5) +
stat_summary(fun = "mean", geom = "point", shape = 18, size = 4, color = "white") +
scale_y_log10(labels = dollar) +
scale_fill_manual(values = COLOR_PALETTE) +
coord_flip() +
labs(title = "Distribución de Precios de Cierre (últimos 365 días)",
x = NULL, y = "Precio USD (escala log)") +
theme_minimal(base_size = 12) +
theme(legend.position = "none")

Figure 5.1: Distribution of closing prices (log scale), last 365 days
df_btc <- hist_data |> filter(simbolo == "BTC") |> arrange(fecha)
ggplot(df_btc, aes(x = fecha, y = close)) +
geom_line(color = "#F7931A", linewidth = 0.7) +
scale_y_continuous(labels = dollar) +
labs(title = "Bitcoin — Precio de Cierre Histórico",
x = NULL, y = "Precio (USD)") +
theme_minimal(base_size = 12)

Figure 5.2: Time series of the Bitcoin (BTC) closing price
df_candle <- df_btc |> tail(60)
plot_ly(df_candle, x = ~fecha, type = "candlestick",
open = ~open, close = ~close,
high = ~high, low = ~low,
increasing = list(line = list(color = "#2ecc71")),
decreasing = list(line = list(color = "#e74c3c"))) |>
layout(title = "BTC — Últimas 60 Velas Diarias",
xaxis = list(title = ""),
yaxis = list(title = "Precio (USD)"),
paper_bgcolor = "white",
plot_bgcolor = "white")

Figure 5.3: Japanese candlestick chart, BTC, last 60 days
Bollinger Bands are a volatility indicator that wraps the price in an envelope around a simple moving average (SMA): the upper and lower bands sit k rolling standard deviations (σₙ) above and below the n-period SMA.
BB± = SMAₙ ± k · σₙ
%Bandwidth = (BB₊ − BB₋) / SMAₙ × 100
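The two formulas can be checked by hand on a short series. The sketch below uses base R only, with made-up prices and a 5-period window (both are assumptions for illustration; the report itself uses n = 20 and k = 2 on the BTC closing price).

```r
# Illustrative closing prices (made-up values, not taken from the dataset)
prices <- c(100, 102, 101, 105, 107, 106, 110)
n <- 5   # window length (assumed here; the report uses 20)
k <- 2   # band width in standard deviations

# Right-aligned rolling windows: each value uses the last n observations
idx <- seq(n, length(prices))
sma <- sapply(idx, function(i) mean(prices[(i - n + 1):i]))
sdv <- sapply(idx, function(i) sd(prices[(i - n + 1):i]))   # sample sd (n - 1 denominator)

bands <- data.frame(
  t     = idx,
  close = prices[idx],
  sma   = sma,
  upper = sma + k * sdv,
  lower = sma - k * sdv
)
bands$bandwidth_pct <- (bands$upper - bands$lower) / bands$sma * 100
print(bands)
```

Because the bands are symmetric around the SMA, %Bandwidth reduces to 2kσₙ / SMAₙ × 100, so it rises and falls with the rolling standard deviation.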
calculate_bollinger_bands <- function(prices, window = 20, sd_mult = 2) {
sma <- rollmean(prices, window, fill = NA, align = "right")
# Named sd_roll so the result does not shadow stats::sd()
sd_roll <- rollapply(prices, window, sd, fill = NA, align = "right")
data.frame(
sma = sma,
upper = sma + (sd_mult * sd_roll),
lower = sma - (sd_mult * sd_roll)
)
}

bb <- calculate_bollinger_bands(df_btc$close, window = 20, sd_mult = 2)
df_bb <- cbind(df_btc, bb) |> drop_na(sma, upper, lower) |> tail(365)
ggplot(df_bb, aes(x = fecha)) +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "#3498db", alpha = 0.18) +
geom_line(aes(y = close, color = "Precio"), linewidth = 0.8) +
geom_line(aes(y = sma, color = "SMA 20"), linetype = "dashed", linewidth = 0.7) +
geom_line(aes(y = upper, color = "Banda Superior"), linetype = "dotted", linewidth = 0.6) +
geom_line(aes(y = lower, color = "Banda Inferior"), linetype = "dotted", linewidth = 0.6) +
scale_color_manual(values = c(
"Precio" = "#e94560", "SMA 20" = "#F7931A",
"Banda Superior" = "#2ecc71", "Banda Inferior" = "#2ecc71"
)) +
scale_y_continuous(labels = dollar) +
labs(title = "Bandas de Bollinger — BTC (SMA 20, k = 2)",
x = NULL, y = "Precio (USD)", color = NULL) +
theme_minimal(base_size = 12) +
theme(legend.position = "bottom")

Figure 5.4: Bollinger Bands, BTC (SMA 20, k = 2)
df_bb <- df_bb |>
mutate(bw = (upper - lower) / sma * 100)
ggplot(df_bb, aes(x = fecha, y = bw)) +
geom_line(color = "#627EEA", linewidth = 0.8) +
geom_area(fill = "#627EEA", alpha = 0.15) +
labs(title = "Ancho de Banda Relativo (%Bandwidth)",
x = NULL, y = "%Bandwidth") +
theme_minimal(base_size = 12)

Figure 5.5: Relative bandwidth, a Bollinger-based measure of volatility
Returns are computed as:
Simple return: rₜ = (Pₜ − Pₜ₋₁) / Pₜ₋₁ × 100
Log return: rₜˡᵒᵍ = ln(Pₜ / Pₜ₋₁) × 100
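A small numeric example may help to see how the two definitions differ. The prices below are made up for illustration; the point of the sketch is that log returns add up across periods, whereas simple returns have to be compounded.

```r
prices <- c(100, 110, 99)   # illustrative prices, not taken from the dataset

simple_ret <- diff(prices) / head(prices, -1) * 100   # (P_t - P_{t-1}) / P_{t-1} * 100
log_ret    <- diff(log(prices)) * 100                 # ln(P_t / P_{t-1}) * 100

# Total return over both periods, computed two ways
total_log    <- sum(log_ret)                              # log returns are additive
total_simple <- (prod(1 + simple_ret / 100) - 1) * 100    # simple returns compound

print(rbind(simple_ret, log_ret))
print(c(total_simple = total_simple, total_log = total_log))
```

A +10% move followed by a -10% move leaves the price 1% below where it started, which is why summing simple returns would be misleading.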
ggplot(df_365, aes(x = simbolo, y = retorno, fill = simbolo)) +
geom_boxplot(alpha = 0.7, outlier.color = "#e74c3c", outlier.size = 1) +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
scale_fill_manual(values = COLOR_PALETTE) +
coord_flip() +
labs(title = "Distribución de Retornos Diarios",
x = NULL, y = "Retorno (%)") +
theme_minimal(base_size = 12) +
theme(legend.position = "none")

Figure 6.1: Distribution of daily returns by cryptocurrency (last 365 days)
ggplot(df_btc, aes(x = retorno, fill = after_stat(x) > 0)) +
geom_histogram(bins = 60, alpha = 0.85) +
scale_fill_manual(values = c("TRUE" = "#2ecc71", "FALSE" = "#e74c3c"),
labels = c("TRUE" = "Positivo", "FALSE" = "Negativo"),
name = NULL) +
geom_vline(xintercept = 0, linetype = "dashed", color = "black") +
labs(title = "Histograma de Retornos Diarios — BTC",
x = "Retorno (%)", y = "Frecuencia") +
theme_minimal(base_size = 12) +
theme(legend.position = "top")

Figure 6.2: Histogram of BTC daily returns
df_btc_vol <- df_btc |>
arrange(fecha) |>
mutate(vol30 = zoo::rollapply(retorno, 30, sd, fill = NA, align = "right"))
ggplot(df_btc_vol, aes(x = fecha, y = vol30)) +
geom_line(color = "#e94560", linewidth = 0.7) +
geom_area(fill = "#e94560", alpha = 0.15) +
labs(title = "Volatilidad Rodante 30 días — BTC",
x = NULL, y = "Desv. Est. Retorno (%)") +
theme_minimal(base_size = 12)

Figure 6.3: 30-day rolling volatility for BTC
tabla_riesgo <- df_365 |>
group_by(simbolo) |>
summarise(
`Ret. Medio (%)` = round(mean(retorno, na.rm = TRUE), 3),
`Desv. Est. (%)` = round(sd(retorno, na.rm = TRUE), 3),
`VaR 95% (%)` = round(quantile(retorno, 0.05, na.rm = TRUE), 3),
`Días pos. (%)` = round(sum(retorno > 0, na.rm = TRUE) / n() * 100, 1)
) |>
arrange(desc(`Ret. Medio (%)`))
kable(tabla_riesgo,
caption = "Métricas de retorno y riesgo — últimos 365 días",
booktabs = TRUE) |>
kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE) |>
column_spec(4, color = ifelse(tabla_riesgo$`VaR 95% (%)` < -3, "red", "black"))

| simbolo | Ret. Medio (%) | Desv. Est. (%) | VaR 95% (%) | Días pos. (%) |
|---|---|---|---|---|
| USD1 | Inf | NaN | -0.100 | 38.7 |
| ZEC | 1.015 | 7.893 | -8.682 | 50.8 |
| ETH | 0.133 | 3.767 | -5.189 | 51.1 |
| TAO | 0.080 | 5.225 | -7.736 | 46.2 |
| USDC | 0.000 | 0.008 | -0.010 | 21.9 |
| USDT | 0.000 | 0.040 | -0.100 | 19.9 |
| BTC | -0.029 | 2.230 | -3.533 | 49.7 |
| DOGE | -0.031 | 4.488 | -6.072 | 44.5 |
| XRP | -0.053 | 3.599 | -4.914 | 43.2 |
| SOL | -0.067 | 3.827 | -5.748 | 49.2 |
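The 95% VaR is the 5th percentile of the historical daily returns: on 95% of days the return was above this value. A value of -5% means that on the worst 5% of days the loss was at least 5%. The Inf mean and NaN standard deviation reported for USD1 are most likely an artifact of the return formula dividing by a zero closing price just before the coin's first valid quote; filtering with is.finite(retorno) instead of !is.na(retorno) would drop that observation.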
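The historical VaR described above is just an empirical quantile. The sketch below uses made-up returns (not from the dataset), removes non-finite values first, and also computes the expected shortfall, which is not part of the original report and is shown only to contrast "loss threshold" with "average loss beyond the threshold".

```r
# Illustrative daily returns in percent (made-up values, not from the dataset)
returns <- c(-8, -5, -3, -2, -1, 0, 0.5, 1, 1.5, 2,
             2.5, 3, 3.5, 4, 5, 6, -4, -0.5, 0.2, 0.8)

returns <- returns[is.finite(returns)]       # drop NA, NaN and Inf before estimating
var_95  <- unname(quantile(returns, 0.05))   # historical VaR: 5th percentile of returns
es_95   <- mean(returns[returns <= var_95])  # expected shortfall: mean return at or below the VaR

print(c(VaR_95 = var_95, ES_95 = es_95))
```

The expected shortfall is always at or below the VaR, since it averages only the returns in the tail that the VaR delimits.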
df_wide <- df_365 |>
select(fecha, simbolo, retorno) |>
pivot_wider(names_from = simbolo, values_from = retorno) |>
select(-fecha)
mat_cor <- cor(df_wide, use = "complete.obs", method = "pearson")
df_cor_long <- as.data.frame(as.table(mat_cor)) |>
rename(Corr = Freq)
ggplot(df_cor_long, aes(x = Var1, y = Var2, fill = Corr)) +
geom_tile(color = "white", linewidth = 0.4) +
geom_text(aes(label = round(Corr, 2)), size = 3, color = "black") +
scale_fill_gradient2(low = "#3498db", mid = "white", high = "#e74c3c",
midpoint = 0, limits = c(-1, 1), name = "Pearson r") +
labs(title = "Matriz de Correlación — Retornos Diarios (Pearson)",
x = NULL, y = NULL) +
theme_minimal(base_size = 11) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

Figure 7.1: Heat map of Pearson correlations between daily returns (last 365 days)
df_scatter <- df_365 |>
filter(simbolo %in% c("BTC", "ETH")) |>
select(fecha, simbolo, retorno) |>
pivot_wider(names_from = simbolo, values_from = retorno) |>
drop_na()
ggplot(df_scatter, aes(x = BTC, y = ETH)) +
geom_point(alpha = 0.4, color = "#627EEA", size = 1.5) +
geom_smooth(method = "lm", se = TRUE, color = "#e94560", linewidth = 1) +
labs(title = "Retornos Diarios: BTC vs ETH",
x = "Retorno BTC (%)", y = "Retorno ETH (%)") +
theme_minimal(base_size = 12)

Figure 7.2: Scatter plot of BTC vs ETH daily returns
df_comp <- hist_data |>
filter(fecha >= max(fecha) - 365) |>
arrange(fecha) |>
group_by(simbolo) |>
mutate(ini = first(close), norm = close / ini * 100) |>
ungroup()
ggplot(df_comp, aes(x = fecha, y = norm, color = simbolo)) +
geom_line(linewidth = 0.8, alpha = 0.9) +
geom_hline(yintercept = 100, linetype = "dashed", color = "gray50") +
scale_color_manual(values = COLOR_PALETTE) +
labs(title = "Rendimiento Acumulado — Base 100 (últimos 365 días)",
x = NULL, y = "Índice (base = 100)", color = "Moneda") +
theme_minimal(base_size = 12) +
theme(legend.position = "right")

Figure 8.1: Cumulative performance normalized to base 100 (from the start of the period)
tabla_comp <- hist_data |>
filter(fecha >= max(fecha) - 365) |>
group_by(simbolo) |>
summarise(
`P. Inicial ($)` = round(first(close), 4),
`P. Final ($)` = round(last(close), 4),
`Rend. (%)` = round((last(close) - first(close)) / first(close) * 100, 2),
`Vol. (%)` = round(sd(retorno, na.rm = TRUE), 3)
) |>
arrange(desc(`Rend. (%)`))
kable(tabla_comp,
caption = "Comparativa de rendimiento — últimos 365 días",
booktabs = TRUE) |>
kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE) |>
column_spec(4, color = ifelse(tabla_comp$`Rend. (%)` >= 0, "#2ecc71", "#e74c3c"),
bold = TRUE)

| simbolo | P. Inicial ($) | P. Final ($) | Rend. (%) | Vol. (%) |
|---|---|---|---|---|
| ZEC | 41.8800 | 576.3000 | 1276.07 | 7.893 |
| ETH | 2207.1900 | 2278.8100 | 3.24 | 3.767 |
| USDC | 1.0000 | 0.9999 | -0.01 | 0.008 |
| USDT | 0.9999 | 0.9997 | -0.02 | 0.040 |
| USD1 | 0.9999 | 0.9993 | -0.06 | NaN |
| BTC | 103259.0000 | 79572.9500 | -22.94 | 2.230 |
| TAO | 423.2000 | 302.8200 | -28.45 | 5.225 |
| XRP | 2.3270 | 1.3840 | -40.52 | 3.599 |
| DOGE | 0.1982 | 0.1064 | -46.32 | 4.488 |
| SOL | 164.4400 | 88.2200 | -46.35 | 3.827 |
df_btc_mes <- df_btc |>
mutate(mes = floor_date(fecha, "month"),
mes_label = format(mes, "%b %Y"))
ggplot(df_btc_mes, aes(x = reorder(mes_label, mes), y = retorno)) +
geom_boxplot(fill = "#3498db", alpha = 0.7, outlier.size = 1) +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray40") +
labs(title = "Retornos Diarios de BTC por Mes",
x = NULL, y = "Retorno (%)") +
theme_minimal(base_size = 11) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7))

Figure 9.1: Monthly box plot of BTC daily returns
The autocorrelation functions are key to identifying the orders of the ARIMA model: the PACF suggests the autoregressive order p and the ACF the moving-average order q.
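To see what these functions look like when the true order is known, the sketch below simulates an AR(1) process with base R. The coefficient 0.7, the sample size and the seed are arbitrary assumptions; for such a process the theoretical ACF decays geometrically (0.7, 0.49, 0.343, ...) while the theoretical PACF is 0.7 at lag 1 and zero afterwards.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible
x <- arima.sim(model = list(ar = 0.7), n = 2000)   # simulated AR(1), phi = 0.7 (assumed)

acf_x  <- acf(x, plot = FALSE, lag.max = 10)
pacf_x <- pacf(x, plot = FALSE, lag.max = 10)
ci     <- qnorm(0.975) / sqrt(length(x))           # approximate 95% band, as in the report

# acf() stores lag 0 first, pacf() starts at lag 1
print(round(acf_x$acf[2:4], 2))    # sample ACF at lags 1 to 3
print(round(pacf_x$acf[1:3], 2))   # sample PACF at lags 1 to 3
print(round(ci, 3))
```

A single significant PACF spike followed by values inside the band points to p = 1; the same reading applied to the BTC return plots guides the range of p and q worth searching.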
serie_btc <- na.omit(df_btc$retorno)
ci_val <- qnorm(0.975) / sqrt(length(serie_btc))
acf_r <- acf(serie_btc, plot = FALSE, lag.max = 40)
pacf_r <- pacf(serie_btc, plot = FALSE, lag.max = 40)
df_acf <- data.frame(Lag = as.numeric(acf_r$lag[-1]), ACF = as.numeric(acf_r$acf[-1]))
df_pacf <- data.frame(Lag = as.numeric(pacf_r$lag), PACF = as.numeric(pacf_r$acf))
p_acf <- ggplot(df_acf, aes(x = Lag, y = ACF)) +
geom_bar(stat = "identity", fill = "#3498db", alpha = 0.7) +
geom_hline(yintercept = c(ci_val, -ci_val), linetype = "dashed", color = "blue", linewidth = 0.7) +
labs(title = "ACF — Retornos BTC") + theme_minimal(base_size = 11)
p_pacf <- ggplot(df_pacf, aes(x = Lag, y = PACF)) +
geom_bar(stat = "identity", fill = "#e74c3c", alpha = 0.7) +
geom_hline(yintercept = c(ci_val, -ci_val), linetype = "dashed", color = "blue", linewidth = 0.7) +
labs(title = "PACF — Retornos BTC") + theme_minimal(base_size = 11)
gridExtra::grid.arrange(p_acf, p_pacf, ncol = 2)

Figure 9.2: ACF and PACF of BTC daily returns
stats_desc <- hist_data |>
group_by(simbolo) |>
summarise(
n = n(),
Media = round(mean(close, na.rm = TRUE), 2),
Mediana= round(median(close, na.rm = TRUE), 2),
DS = round(sd(close, na.rm = TRUE), 2),
Min = round(min(close, na.rm = TRUE), 4),
Max = round(max(close, na.rm = TRUE), 2),
IQR = round(IQR(close, na.rm = TRUE), 2)
)
kable(stats_desc,
caption = "Estadísticas descriptivas del precio de cierre por moneda",
booktabs = TRUE,
col.names = c("Símbolo","n","Media","Mediana","Desv. Est.","Mín.","Máx.","IQR")) |>
kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = TRUE)

| Símbolo | n | Media | Mediana | Desv. Est. | Mín. | Máx. | IQR |
|---|---|---|---|---|---|---|---|
| BTC | 1905 | 56298.29 | 50431.63 | 29296.01 | 15760.1900 | 124723.00 | 45328.37 |
| DOGE | 1905 | 0.15 | 0.12 | 0.09 | 0.0477 | 0.69 | 0.12 |
| ETH | 1905 | 2552.70 | 2436.20 | 880.85 | 994.4100 | 4830.60 | 1383.69 |
| SOL | 1905 | 100.87 | 96.85 | 68.59 | 9.6410 | 261.87 | 122.76 |
| TAO | 702 | 350.61 | 331.72 | 109.96 | 145.7400 | 713.91 | 143.12 |
| USD1 | 326 | 1.00 | 1.00 | 0.00 | 0.9986 | 1.00 | 0.00 |
| USDC | 1905 | 1.00 | 1.00 | 0.00 | 0.9679 | 1.00 | 0.00 |
| USDT | 1905 | 1.00 | 1.00 | 0.00 | 0.9940 | 1.01 | 0.00 |
| XRP | 1905 | 1.08 | 0.63 | 0.81 | 0.3072 | 3.55 | 0.93 |
| ZEC | 1905 | 102.04 | 47.96 | 112.07 | 18.3100 | 698.97 | 105.23 |
The Augmented Dickey-Fuller (ADF) test evaluates the null hypothesis that the series has a unit root (i.e., it is non-stationary); a p-value below 0.05 rejects that hypothesis in favor of stationarity:
test_stationarity <- function(serie) {
tryCatch({
test <- adf.test(na.omit(as.numeric(serie)), k = trunc((length(na.omit(serie)) - 1)^(1/3)))
list(
estadistico = test$statistic,
p_valor = test$p.value,
es_estacionaria = test$p.value < 0.05,
conclusion = ifelse(test$p.value < 0.05,
"Serie estacionaria (rechaza H₀)",
"Serie NO estacionaria (no rechaza H₀)")
)
}, error = function(e) list(estadistico=NA, p_valor=NA, es_estacionaria=FALSE, conclusion="Error"))
}
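adf_resultados <- lapply(unique(hist_data$simbolo), function(cr) {
# hist_data is a plain data.frame, so hist_data[rows, "retorno"][[1]] returned only the
# first value and adf.test() failed; select the whole column and drop non-finite values.
serie <- hist_data$retorno[hist_data$simbolo == cr]
serie <- serie[is.finite(serie)]
res <- test_stationarity(serie)
data.frame(
Moneda = cr,
`ADF Stat.` = round(unname(res$estadistico), 4),
`p-valor` = round(res$p_valor, 6),
Estacionaria = ifelse(res$es_estacionaria, "✅ Sí", "❌ No"),
`d recomendado` = ifelse(res$es_estacionaria, 0, 1),
check.names = FALSE  # keep the column names exactly as written
)
}) |> bind_rows()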
kable(adf_resultados,
caption = "Resultados del Test ADF para retornos diarios por criptomoneda",
booktabs = TRUE,
row.names = FALSE) |>
kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE)

| Moneda | ADF.Stat. | p.valor | Estacionaria | d.recomendado |
|---|---|---|---|---|
| BTC | NA | NA | ❌ No | 1 |
| ETH | NA | NA | ❌ No | 1 |
| USDC | NA | NA | ❌ No | 1 |
| SOL | NA | NA | ❌ No | 1 |
| XRP | NA | NA | ❌ No | 1 |
| TAO | NA | NA | ❌ No | 1 |
| USDT | NA | NA | ❌ No | 1 |
| DOGE | NA | NA | ❌ No | 1 |
| USD1 | NA | NA | ❌ No | 1 |
| ZEC | NA | NA | ❌ No | 1 |

Every statistic and p-value is NA because adf.test() raised an error for each coin in this run and test_stationarity() fell back to its error branch, so the "No" and d = 1 entries are placeholders rather than evidence of non-stationarity.
STL decomposition (Seasonal and Trend decomposition using Loess) splits the series into:
Y(t) = T(t) + S(t) + R(t)
where T(t) is the trend, S(t) the seasonal component, and R(t) the remainder.
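The additive identity can be verified directly on the object returned by stats::stl(). The sketch below builds a synthetic daily series with an assumed weekly pattern, a linear trend and noise (all made up for illustration), decomposes it with the same arguments the report uses, and checks that the three components sum back to the input.

```r
set.seed(7)  # arbitrary seed for reproducibility
n_obs  <- 7 * 30
weekly <- rep(c(0.5, 0.2, 0, -0.1, -0.3, -0.2, -0.1), length.out = n_obs)  # assumed weekly pattern
y <- ts(0.01 * seq_len(n_obs) + weekly + rnorm(n_obs, sd = 0.2), frequency = 7)

fit   <- stl(y, s.window = "periodic", robust = TRUE)
parts <- fit$time.series   # matrix with columns seasonal, trend, remainder

# Largest gap between T + S + R and the original series
recon_error <- max(abs(rowSums(parts) - as.numeric(y)))
print(colnames(parts))
print(recon_error)
```

With frequency = 7 the seasonal component is a day-of-week effect, which is the assumption behind the weekly decomposition of BTC returns.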
serie_stl <- ts(na.omit(df_btc$retorno), frequency = 7)
decomp <- stl(serie_stl, s.window = "periodic", robust = TRUE)
autoplot(decomp) +
labs(title = "Descomposición STL — Retornos BTC") +
theme_minimal(base_size = 12)

Figure 9.3: STL decomposition of BTC returns (weekly frequency)
The ARIMA(p, d, q) model combines three components:
| Component | Symbol | Description |
|---|---|---|
| Autoregressive | AR(p) | Dependence on the p previous lags |
| Integrated | I(d) | Differencing to achieve stationarity |
| Moving average | MA(q) | Dependence on the q previous errors |
General equation:
φ(B)(1−B)ᵈ yₜ = θ(B) εₜ
where B is the lag (backshift) operator, φ(B) the AR polynomial, θ(B) the MA polynomial, and εₜ white noise.
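As a worked instance of the general equation, and assuming the sign convention φ(B) = 1 − φ₁B − … − φₚBᵖ (the one documented for R's arima()), the ARIMA(1,1,0) and ARIMA(0,1,0) specifications that the model search in this report ends up selecting expand as follows:

```latex
\begin{aligned}
\text{ARIMA}(1,1,0):\quad & (1 - \phi_1 B)(1 - B)\,y_t = \varepsilon_t
  \;\Longrightarrow\; y_t = y_{t-1} + \phi_1\,(y_{t-1} - y_{t-2}) + \varepsilon_t \\
\text{ARIMA}(0,1,0):\quad & (1 - B)\,y_t = \varepsilon_t
  \;\Longrightarrow\; y_t = y_{t-1} + \varepsilon_t
\end{aligned}
```

The second case is a random walk without drift: its point forecast at every horizon is simply the last observed value.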
detect_stationarity_and_differentiate <- function(serie, max_d = 2) {
serie_clean <- as.numeric(na.omit(serie))
d_val <- 0
current <- serie_clean
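for (i in seq_len(max_d)) {
# break cannot be called from inside the error handler (it runs in its own function),
# so the handler returns NULL and the loop exits on that flag.
test <- tryCatch(
adf.test(current, k = trunc((length(current)-1)^(1/3))),
error = function(e) NULL
)
if (is.null(test) || test$p.value < 0.05) break
d_val <- i
current <- diff(current)
}
list(is_stationary = d_val == 0, d_value = d_val)
}

set.seed(42)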
df_btc_clean <- hist_data |>
filter(simbolo == "BTC") |>
arrange(fecha) |>
handle_missing(method = "interpolation")
serie_full <- na.omit(df_btc_clean$close)
n <- length(serie_full)
n_train <- floor(n * 0.8)
train <- serie_full[1:n_train]
test <- serie_full[(n_train + 1):n]
fechas_all <- df_btc_clean$fecha[!is.na(df_btc_clean$close)]
train_dates <- fechas_all[1:n_train]
test_dates <- fechas_all[(n_train + 1):n]
# Detect the differencing order d
stat_res <- detect_stationarity_and_differentiate(train)
d_opt <- stat_res$d_value
cat("d óptimo detectado:", d_opt, "\n")

## d óptimo detectado: 1

# Search for the best (p, q) by AIC
pq_rng <- 0:3
best_aic <- Inf
best_order <- c(1, d_opt, 1)
best_model <- NULL
results <- list()
for (p in pq_rng) {
for (q in pq_rng) {
if (p == 0 && d_opt == 0 && q == 0) next
tryCatch({
m <- Arima(train, order = c(p, d_opt, q), method = "ML")
aic <- AIC(m)
results[[length(results)+1]] <- data.frame(p=p, d=d_opt, q=q, AIC=round(aic,2))
if (aic < best_aic) { best_aic <- aic; best_order <- c(p, d_opt, q); best_model <- m }
}, error = function(e) {})
}
}
cat(sprintf("Mejor modelo: ARIMA(%d,%d,%d) — AIC = %.2f\n",
best_order[1], best_order[2], best_order[3], best_aic))

## Mejor modelo: ARIMA(1,1,0) — AIC = 26687.81
df_results <- bind_rows(results) |>
arrange(AIC) |>
head(10) |>
mutate(Modelo = paste0("ARIMA(", p, ",", d, ",", q, ")")) |>
select(Modelo, AIC)
kable(df_results,
caption = "Top 10 modelos ARIMA por criterio AIC — BTC precio de cierre",
booktabs = TRUE) |>
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) |>
row_spec(1, bold = TRUE, background = "#d5f0e0")

| Modelo | AIC |
|---|---|
| ARIMA(1,1,0) | 26687.81 |
| ARIMA(0,1,1) | 26687.96 |
| ARIMA(2,1,0) | 26689.71 |
| ARIMA(0,1,2) | 26689.72 |
| ARIMA(1,1,1) | 26689.73 |
| ARIMA(3,1,3) | 26691.11 |
| ARIMA(3,1,0) | 26691.70 |
| ARIMA(2,1,1) | 26691.71 |
| ARIMA(0,1,3) | 26691.72 |
| ARIMA(1,1,2) | 26691.72 |
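For reference, the two information criteria used in this report are usually defined as below, where L̂ is the maximized likelihood, k the number of estimated parameters and n the number of observations. This is the textbook form; software implementations may differ in how they count k.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Lower is better for both, and BIC penalizes extra parameters more heavily once ln n > 2. In the table above the first five models lie within about two AIC units of each other, a gap that is commonly treated as too small to prefer one specification decisively.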
resid_model <- residuals(best_model)
p1 <- ggplot(data.frame(x = 1:length(resid_model), y = resid_model),
aes(x = x, y = y)) +
geom_line(color = "#3498db", linewidth = 0.5) +
labs(title = "Residuos en el tiempo", x = "Índice", y = "Residuo") +
theme_minimal(base_size = 11)
p2 <- ggplot(data.frame(x = resid_model), aes(x = x)) +
geom_histogram(bins = 40, fill = "#627EEA", alpha = 0.8, color = "white") +
labs(title = "Distribución de Residuos", x = "Residuo", y = "Frecuencia") +
theme_minimal(base_size = 11)
gridExtra::grid.arrange(p1, p2, ncol = 2)

Figure 10.1: Residual diagnostics for the best ARIMA model
# Normality test (Shapiro-Wilk)
sw_test <- if (length(resid_model) > 5000)
shapiro.test(sample(resid_model, 5000)) else shapiro.test(resid_model)
# Independence test (Ljung-Box)
lb_test <- Box.test(resid_model, lag = 10, type = "Ljung-Box")
tests_df <- data.frame(
Test = c("Shapiro-Wilk (Normalidad)", "Ljung-Box (Independencia)"),
`p-valor` = round(c(sw_test$p.value, lb_test$p.value), 6),
Conclusión = c(
ifelse(sw_test$p.value > 0.05, "✅ Residuos normales", "⚠️ No normales"),
ifelse(lb_test$p.value > 0.05, "✅ Residuos independientes", "⚠️ Autocorrelación presente")
)
)
kable(tests_df,
caption = "Tests sobre los residuos del mejor modelo ARIMA",
booktabs = TRUE) |>
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

| Test | p.valor | Conclusión |
|---|---|---|
| Shapiro-Wilk (Normalidad) | 0.000000 | ⚠️ No normales |
| Ljung-Box (Independencia) | 0.067679 | ✅ Residuos independientes |
horizonte <- length(test)
fc <- forecast(best_model, h = horizonte)
pred <- as.numeric(fc$mean)
# Error metrics
mae <- mean(abs(pred - test))
rmse <- sqrt(mean((pred - test)^2))
mape <- mean(abs((pred - test) / test)) * 100
r2 <- cor(pred, test)^2  # squared correlation between forecast and actual values, not 1 - SSE/SST
metricas <- data.frame(
Métrica = c("MAE (USD)", "RMSE (USD)", "MAPE (%)", "R²"),
Valor = round(c(mae, rmse, mape, r2), 4)
)
kable(metricas,
caption = sprintf("Métricas de error — ARIMA(%d,%d,%d) en conjunto de test",
best_order[1], best_order[2], best_order[3]),
booktabs = TRUE) |>
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

| Métrica | Valor |
|---|---|
| MAE (USD) | 15437.3115 |
| RMSE (USD) | 17588.6850 |
| MAPE (%) | 16.8523 |
| R² | 0.0000 |
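The R² in the table is the squared correlation between forecast and test values, which measures co-movement rather than accuracy. The sketch below is a hypothetical helper (forecast_metrics() is not part of the original app) that computes the same MAE, RMSE and MAPE but replaces R² with 1 − SSE/SST, which turns negative when a forecast does worse than simply predicting the test-set mean. The input values are made up for illustration.

```r
# Hypothetical helper, not part of the original app
forecast_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- predicted - actual
  c(
    MAE  = mean(abs(err)),
    RMSE = sqrt(mean(err^2)),
    MAPE = mean(abs(err / actual)) * 100,
    # 1 - SSE/SST: negative when the forecast is worse than the test-set mean
    R2   = 1 - sum(err^2) / sum((actual - mean(actual))^2)
  )
}

actual    <- c(100, 104, 98, 107, 111)   # illustrative test values
flat_pred <- rep(100, length(actual))    # a flat forecast, similar in shape to a long-horizon ARIMA mean

m <- forecast_metrics(actual, flat_pred)
print(round(m, 3))
```

A flat multi-step forecast over a long test window, as produced by low-order ARIMA models on prices, typically scores poorly on this version of R² even when its MAPE looks moderate.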
n_hist <- min(200, length(train))
df_forecast <- bind_rows(
data.frame(Fecha = tail(train_dates, n_hist),
Precio = tail(train, n_hist), Tipo = "Entrenamiento"),
data.frame(Fecha = test_dates[1:horizonte],
Precio = test[1:horizonte], Tipo = "Test Real"),
data.frame(Fecha = test_dates[1:horizonte],
Precio = pred, Tipo = "Predicción"),
data.frame(Fecha = test_dates[1:horizonte],
Precio = as.numeric(fc$lower[,2]), Tipo = "IC 95% Inferior"),
data.frame(Fecha = test_dates[1:horizonte],
Precio = as.numeric(fc$upper[,2]), Tipo = "IC 95% Superior")
)
df_ribbon <- data.frame(
Fecha = test_dates[1:horizonte],
Lower = as.numeric(fc$lower[,2]),
Upper = as.numeric(fc$upper[,2])
)
df_lines <- df_forecast |> filter(Tipo %in% c("Entrenamiento","Test Real","Predicción"))
ggplot() +
geom_ribbon(data = df_ribbon, aes(x = Fecha, ymin = Lower, ymax = Upper),
fill = "#3498db", alpha = 0.2) +
geom_line(data = df_lines, aes(x = Fecha, y = Precio, color = Tipo, linewidth = Tipo)) +
scale_color_manual(values = c(
"Entrenamiento" = "#2c3e50",
"Test Real" = "#3498db",
"Predicción" = "#e74c3c"
)) +
scale_linewidth_manual(values = c(
"Entrenamiento" = 0.7, "Test Real" = 0.9, "Predicción" = 1.1
)) +
scale_y_continuous(labels = dollar) +
labs(
title = sprintf("ARIMA(%d,%d,%d) — Predicción vs Realidad — BTC",
best_order[1], best_order[2], best_order[3]),
subtitle = sprintf("MAE: $%.0f | RMSE: $%.0f | MAPE: %.2f%% | R²: %.4f",
mae, rmse, mape, r2),
x = NULL, y = "Precio (USD)",
color = NULL, linewidth = NULL
) +
theme_minimal(base_size = 12) +
theme(legend.position = "bottom")

Figure 10.2: ARIMA model fit vs. actual values (BTC)
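# Train on the first n_pred_train observations (at most 1000) and forecast 30 days ahead.
# Note: serie_full[1:n_pred_train] selects the oldest observations of the series;
# tail(serie_full, n_pred_train) would be needed to train on the most recent ones.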
n_pred_train <- min(1000, floor(n * 0.85))
train_opt <- serie_full[1:n_pred_train]
train_dates_opt <- fechas_all[1:n_pred_train]
horizonte_opt <- 30
stat_opt <- detect_stationarity_and_differentiate(train_opt)
d_opt2 <- stat_opt$d_value
best_bic <- Inf
best_ord2 <- c(1, d_opt2, 1)
best_mod2 <- NULL
for (p in 0:3) {
for (q in 0:3) {
if (p == 0 && d_opt2 == 0 && q == 0) next
tryCatch({
m <- Arima(train_opt, order = c(p, d_opt2, q), method = "ML")
b <- BIC(m)
if (b < best_bic) { best_bic <- b; best_ord2 <- c(p, d_opt2, q); best_mod2 <- m }
}, error = function(e) {})
}
}
cat(sprintf("Mejor modelo (BIC): ARIMA(%d,%d,%d) — BIC = %.2f\n",
best_ord2[1], best_ord2[2], best_ord2[3], best_bic))
## Mejor modelo (BIC): ARIMA(0,1,0) — BIC = 17130.12
fc_opt <- forecast(best_mod2, h = horizonte_opt)
last_d <- tail(train_dates_opt, 1)
pred_dates <- seq(last_d + 1, by = "day", length.out = horizonte_opt)
df_pred_opt <- data.frame(
Fecha = pred_dates,
Prediccion = as.numeric(fc_opt$mean),
Lower_80 = as.numeric(fc_opt$lower[,1]),
Upper_80 = as.numeric(fc_opt$upper[,1]),
Lower_95 = as.numeric(fc_opt$lower[,2]),
Upper_95 = as.numeric(fc_opt$upper[,2])
)
n_ctx <- min(120, length(train_opt))
df_ctx <- data.frame(
Fecha = tail(train_dates_opt, n_ctx),
Precio = tail(train_opt, n_ctx)
)
ggplot() +
geom_line(data = df_ctx, aes(x = Fecha, y = Precio), color = "#2c3e50", linewidth = 0.8) +
geom_ribbon(data = df_pred_opt,
aes(x = Fecha, ymin = Lower_95, ymax = Upper_95), fill = "#3498db", alpha = 0.18) +
geom_ribbon(data = df_pred_opt,
aes(x = Fecha, ymin = Lower_80, ymax = Upper_80), fill = "#3498db", alpha = 0.28) +
geom_line(data = df_pred_opt, aes(x = Fecha, y = Prediccion),
color = "#e74c3c", linewidth = 1.1) +
scale_y_continuous(labels = dollar) +
labs(
title = sprintf("Predicción Óptima 30 días — BTC | ARIMA(%d,%d,%d) por BIC",
best_ord2[1], best_ord2[2], best_ord2[3]),
subtitle = "Azul: IC 80% y 95% | Rojo: Predicción puntual | Gris: Historial",
x = NULL, y = "Precio (USD)"
) +
theme_minimal(base_size = 12)

Figure 11.1: Optimal 30-day forecast for BTC with confidence intervals
df_pred_opt |>
mutate(
Fecha = format(Fecha, "%d/%m/%Y"),
Prediccion = dollar(Prediccion, accuracy = 1),
`IC 80%` = paste0(dollar(Lower_80, accuracy=1), " — ", dollar(Upper_80, accuracy=1)),
`IC 95%` = paste0(dollar(Lower_95, accuracy=1), " — ", dollar(Upper_95, accuracy=1))
) |>
select(Fecha, Prediccion, `IC 80%`, `IC 95%`) |>
kable(caption = "Predicciones diarias a 30 días con intervalos de confianza",
booktabs = TRUE, row.names = FALSE) |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = TRUE) |>
scroll_box(height = "400px")

| Fecha | Prediccion | IC 80% | IC 95% |
|---|---|---|---|
| 16/11/2023 | $37,884 | $36,249 — $39,519 | $35,384 — $40,384 |
| 17/11/2023 | $37,884 | $35,572 — $40,196 | $34,348 — $41,420 |
| 18/11/2023 | $37,884 | $35,053 — $40,716 | $33,554 — $42,215 |
| 19/11/2023 | $37,884 | $34,614 — $41,154 | $32,884 — $42,885 |
| 20/11/2023 | $37,884 | $34,229 — $41,540 | $32,293 — $43,475 |
| 21/11/2023 | $37,884 | $33,880 — $41,889 | $31,760 — $44,009 |
| 22/11/2023 | $37,884 | $33,559 — $42,210 | $31,269 — $44,499 |
| 23/11/2023 | $37,884 | $33,260 — $42,508 | $30,812 — $44,956 |
| 24/11/2023 | $37,884 | $32,980 — $42,789 | $30,383 — $45,385 |
| 25/11/2023 | $37,884 | $32,714 — $43,054 | $29,978 — $45,791 |
| 26/11/2023 | $37,884 | $32,462 — $43,306 | $29,592 — $46,177 |
| 27/11/2023 | $37,884 | $32,221 — $43,547 | $29,223 — $46,545 |
| 28/11/2023 | $37,884 | $31,990 — $43,779 | $28,869 — $46,899 |
| 29/11/2023 | $37,884 | $31,767 — $44,001 | $28,529 — $47,239 |
| 30/11/2023 | $37,884 | $31,552 — $44,216 | $28,201 — $47,568 |
| 01/12/2023 | $37,884 | $31,345 — $44,424 | $27,883 — $47,885 |
| 02/12/2023 | $37,884 | $31,144 — $44,625 | $27,575 — $48,193 |
| 03/12/2023 | $37,884 | $30,948 — $44,820 | $27,276 — $48,492 |
| 04/12/2023 | $37,884 | $30,758 — $45,010 | $26,986 — $48,783 |
| 05/12/2023 | $37,884 | $30,573 — $45,195 | $26,703 — $49,066 |
| 06/12/2023 | $37,884 | $30,392 — $45,376 | $26,426 — $49,342 |
| 07/12/2023 | $37,884 | $30,216 — $45,552 | $26,157 — $49,612 |
| 08/12/2023 | $37,884 | $30,044 — $45,725 | $25,893 — $49,875 |
| 09/12/2023 | $37,884 | $29,875 — $45,893 | $25,635 — $50,133 |
| 10/12/2023 | $37,884 | $29,710 — $46,058 | $25,383 — $50,386 |
| 11/12/2023 | $37,884 | $29,548 — $46,220 | $25,135 — $50,633 |
| 12/12/2023 | $37,884 | $29,389 — $46,379 | $24,892 — $50,876 |
| 13/12/2023 | $37,884 | $29,233 — $46,535 | $24,654 — $51,115 |
| 14/12/2023 | $37,884 | $29,080 — $46,688 | $24,420 — $51,349 |
| 15/12/2023 | $37,884 | $28,930 — $46,839 | $24,190 — $51,579 |
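
The flat point forecast and the widening bands in the table are what an ARIMA(0,1,0) model without drift (a random walk) produces: the forecast stays at the last observed value, and the h-step forecast standard deviation grows as σ√h. The sketch below is an illustrative calculation, not part of the project code. It backs out σ from the first row of the table and reproduces selected bounds up to rounding.

```r
# Values read from the forecast table above
point_forecast <- 37884          # constant point forecast
half_width_80  <- 37884 - 36249  # 80% half-width at h = 1 (first row)

z80 <- qnorm(0.90)    # two-sided 80% interval
z95 <- qnorm(0.975)   # two-sided 95% interval

# Implied one-step innovation standard deviation (approximate, due to rounding)
sigma_hat <- half_width_80 / z80

# Random-walk interval: forecast +/- z * sigma * sqrt(h)
h <- c(1, 4, 30)
bounds <- data.frame(
  h        = h,
  Lower_80 = point_forecast - z80 * sigma_hat * sqrt(h),
  Upper_80 = point_forecast + z80 * sigma_hat * sqrt(h),
  Lower_95 = point_forecast - z95 * sigma_hat * sqrt(h),
  Upper_95 = point_forecast + z95 * sigma_hat * sqrt(h)
)
print(round(bounds))
```

At h = 4 the 80% half-width is exactly twice the one-step value, which matches the 19/11/2023 row ($34,614 to $41,154).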
Crypto_EDA_App/
├── global.R # Libraries, API config, functions, data loading
├── ui.R # Interface (shinydashboard): 10 tabs
├── server.R # Reactive logic and chart rendering
├── README.md # Documentation
└── www/
└── crypto.svg
modulos <- tibble(
Tab = c("Introducción","Visión General","Precios","Retornos & Riesgo",
"Correlaciones","Comparador","Análisis EDA","Modelo ARIMA",
"Predicción Óptima","Valores Faltantes"),
Content = c(
"Hero banner, value boxes, coin cards, objective, team",
"Spot prices, market cap, and 24h volume in real time",
"Boxplot, time series, candlestick (60 days), Bollinger Bands",
"Returns boxplot, histogram, 30-day rolling volatility, 95% VaR",
"Pearson/Spearman heatmap, scatter plot for selectable pairs",
"Cumulative return normalized to base 100 or in %, comparison table",
"Boxplot, ACF, PACF, ADF test, STL decomposition, summary statistics",
"Automatic ARIMA search (AIC/BIC/HQIC), rolling/direct forecasting, diagnostics",
"Automatically selected best model, N-day forecast, 80% and 95% intervals",
"NA summary, heatmap, imputation (interpolation / mean / drop)"
)
)
kable(modulos, caption = "Modules of the Crypto EDA Dashboard",
booktabs = TRUE) |>
kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE) |>
column_spec(1, bold = TRUE, width = "2.5cm") |>
column_spec(2, width = "12cm")

| Tab | Content |
|---|---|
| Introducción | Hero banner, value boxes, coin cards, objective, team |
| Visión General | Spot prices, market cap, and 24h volume in real time |
| Precios | Boxplot, time series, candlestick (60 days), Bollinger Bands |
| Retornos & Riesgo | Returns boxplot, histogram, 30-day rolling volatility, 95% VaR |
| Correlaciones | Pearson/Spearman heatmap, scatter plot for selectable pairs |
| Comparador | Cumulative return normalized to base 100 or in %, comparison table |
| Análisis EDA | Boxplot, ACF, PACF, ADF test, STL decomposition, summary statistics |
| Modelo ARIMA | Automatic ARIMA search (AIC/BIC/HQIC), rolling/direct forecasting, diagnostics |
| Predicción Óptima | Automatically selected best model, N-day forecast, 80% and 95% intervals |
| Valores Faltantes | NA summary, heatmap, imputation (interpolation / mean / drop) |
funciones <- tibble(
Function = c(
"get_historical_daily()",
"get_price_overview()",
"calculate_bollinger_bands()",
"handle_missing()",
"missing_summary()",
"perform_stl_decomposition_safe()",
"detect_stationarity_and_differentiate()",
"get_best_model_for_prediction()",
"analyze_residuals()"
),
File = c("global.R","global.R","global.R","global.R","global.R",
"global.R","global.R","global.R","global.R"),
Description = c(
"Downloads historical OHLCV data from CryptoCompare (up to 1905 days)",
"Fetches spot price, market cap, and 24h volume in real time",
"Computes the SMA and the upper and lower bands for a given window and k",
"Imputes NAs: linear interpolation, global mean, or removal",
"Returns a table of the variables with NAs and their percentage",
"Robust STL decomposition; handles short series with a fallback",
"Applies the ADF test iteratively to determine the differencing order d",
"Searches for the best ARIMA(p,d,q) by AIC/BIC/HQIC on the training set",
"Runs Shapiro-Wilk and Ljung-Box tests on the model residuals"
)
)
kable(funciones, caption = "Main functions defined in global.R",
booktabs = TRUE) |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = TRUE, font_size = 11) |>
column_spec(1, bold = TRUE, monospace = TRUE, width = "4.5cm") |>
column_spec(2, width = "2cm") |>
column_spec(3, width = "9cm")

| Function | File | Description |
|---|---|---|
| get_historical_daily() | global.R | Downloads historical OHLCV data from CryptoCompare (up to 1905 days) |
| get_price_overview() | global.R | Fetches spot price, market cap, and 24h volume in real time |
| calculate_bollinger_bands() | global.R | Computes the SMA and the upper and lower bands for a given window and k |
| handle_missing() | global.R | Imputes NAs: linear interpolation, global mean, or removal |
| missing_summary() | global.R | Returns a table of the variables with NAs and their percentage |
| perform_stl_decomposition_safe() | global.R | Robust STL decomposition; handles short series with a fallback |
| detect_stationarity_and_differentiate() | global.R | Applies the ADF test iteratively to determine the differencing order d |
| get_best_model_for_prediction() | global.R | Searches for the best ARIMA(p,d,q) by AIC/BIC/HQIC on the training set |
| analyze_residuals() | global.R | Runs Shapiro-Wilk and Ljung-Box tests on the model residuals |
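
The three imputation strategies listed for `handle_missing()` can be sketched in a few lines of base R. The function `impute_na()` below is a hypothetical stand-in, not the project's implementation. It uses `approx()` from base R for linear interpolation, whereas the dependency list suggests the project relies on `na.approx` from zoo.

```r
# Hypothetical sketch of the three strategies: interpolate, mean, drop.
impute_na <- function(x, method = c("interpolate", "mean", "drop")) {
  method <- match.arg(method)
  if (method == "drop") {
    return(x[!is.na(x)])                  # remove missing observations
  }
  if (method == "mean") {
    x[is.na(x)] <- mean(x, na.rm = TRUE)  # fill with the global mean
    return(x)
  }
  # Linear interpolation over the observation index; rule = 2 carries the
  # nearest observed value to leading/trailing NAs. Needs >= 2 observed points.
  idx <- seq_along(x)
  ok  <- !is.na(x)
  approx(idx[ok], x[ok], xout = idx, rule = 2)$y
}

x <- c(1, NA, 3, NA, NA, 6)
filled_interp <- impute_na(x, "interpolate")
filled_mean   <- impute_na(x, "mean")
filled_drop   <- impute_na(x, "drop")
print(filled_interp)
```

For price series, interpolation is usually the least distorting of the three, since a global mean ignores the trend and dropping rows breaks the regular daily spacing that ARIMA assumes.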
deps <- tibble(
Library = c("shiny","shinydashboard","tidyverse","plotly","DT",
"httr","jsonlite","zoo","forecast","tseries",
"lubridate","scales","rsconnect"),
Purpose = c(
"Reactive framework for the app",
"Dashboard-style layout with sidebar and tabs",
"Data manipulation and transformation",
"Interactive charts with animations and tooltips",
"Interactive tables with filtering and pagination",
"HTTP connection to the CryptoCompare REST API",
"Parsing of the API's JSON responses",
"Moving averages, rollmean, rollapply, na.approx",
"auto.arima(), Arima(), forecast(), ACF/PACF",
"Augmented Dickey-Fuller test (adf.test)",
"Date manipulation (floor_date, as_date)",
"Axis and label formatting (dollar, percent)",
"Deployment to shinyapps.io"
)
)
kable(deps, caption = "Project dependencies",
booktabs = TRUE) |>
kable_styling(bootstrap_options = c("striped","hover","condensed"),
full_width = TRUE, font_size = 11) |>
column_spec(1, bold = TRUE, monospace = TRUE)

| Library | Purpose |
|---|---|
| shiny | Reactive framework for the app |
| shinydashboard | Dashboard-style layout with sidebar and tabs |
| tidyverse | Data manipulation and transformation |
| plotly | Interactive charts with animations and tooltips |
| DT | Interactive tables with filtering and pagination |
| httr | HTTP connection to the CryptoCompare REST API |
| jsonlite | Parsing of the API's JSON responses |
| zoo | Moving averages, rollmean, rollapply, na.approx |
| forecast | auto.arima(), Arima(), forecast(), ACF/PACF |
| tseries | Augmented Dickey-Fuller test (adf.test) |
| lubridate | Date manipulation (floor_date, as_date) |
| scales | Axis and label formatting (dollar, percent) |
| rsconnect | Deployment to shinyapps.io |
| Concept | Formula |
|---|---|
| Simple return | \(r_t = \frac{P_t - P_{t-1}}{P_{t-1}} \times 100\) |
| Log return | \(r_t^{log} = \ln\left(\frac{P_t}{P_{t-1}}\right) \times 100\) |
| Daily volatility | \(v_t = \frac{High_t - Low_t}{Open_t} \times 100\) |
| SMA | \(SMA_n = \frac{1}{n}\sum_{i=0}^{n-1} P_{t-i}\) |
| Bollinger Bands | \(BB_{\pm} = SMA_n \pm k \cdot \sigma_n\) |
| Bandwidth | \(\%Bw = \frac{BB_{+} - BB_{-}}{SMA_n} \times 100\) |
| 95% VaR | \(VaR_{95\%} = Q_{0.05}(r_t)\) |
| Rolling volatility | \(\sigma_{30} = \sqrt{\frac{1}{29}\sum_{i=0}^{29}(r_{t-i} - \bar{r})^2}\) |
| STL decomposition | \(Y(t) = T(t) + S(t) + R(t)\) |
| ARIMA(p,d,q) model | \(\phi(B)(1-B)^d y_t = \theta(B)\varepsilon_t\) |
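
Several of the formulas above translate directly into base R. The following sketch uses a toy price vector and a hypothetical `roll()` helper; both are illustrative and not part of the project, whose dependency list points to zoo's rolling functions instead. The use of the sample standard deviation for \(\sigma_n\) is also an assumption.

```r
# Toy closing prices, for illustration only
prices <- c(100, 102, 101, 105, 107, 104, 108)

# Simple and log returns, in percent
simple_ret <- diff(prices) / head(prices, -1) * 100
log_ret    <- diff(log(prices)) * 100

# Hypothetical trailing-window helper: NA until n observations are available
roll <- function(x, n, f) {
  sapply(seq_along(x), function(i) {
    if (i < n) NA_real_ else f(x[(i - n + 1):i])
  })
}

n <- 3   # window length
k <- 2   # band multiplier
sma       <- roll(prices, n, mean)
sigma_n   <- roll(prices, n, sd)      # sample standard deviation (assumption)
bb_upper  <- sma + k * sigma_n
bb_lower  <- sma - k * sigma_n
bandwidth <- (bb_upper - bb_lower) / sma * 100

# Historical 95% VaR: the 5% quantile of the return distribution
var_95 <- quantile(simple_ret, probs = 0.05, names = FALSE)

print(data.frame(prices, sma, bb_lower, bb_upper, bandwidth))
```

With real data the window would typically be 20 or 30 days; the short window here only keeps the example readable.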
Price heterogeneity: prices across the 10 cryptocurrencies span several orders of magnitude (from about $0.10 for DOGE to more than $50,000 for BTC), so cross-asset comparisons require a logarithmic scale.
Return distribution: all assets show heavy tails (fat tails) and slight skewness, which is typical of high-risk financial assets. The stablecoins (USDC, USDT, USD1) show near-zero returns with very low variance.
Correlations: BTC and ETH show a high positive correlation (~0.7–0.8 in recent periods), while the stablecoins show near-zero correlation with the rest of the market.
Stationarity: closing prices are non-stationary (the ADF test does not reject H₀), whereas daily returns are stationary for most assets, which supports d=1 as the differencing order for modeling prices.
ARIMA model: the best (p, d, q) order was selected automatically by AIC/BIC for each asset. The residuals show low autocorrelation (the Ljung-Box test is not significant), although their non-normality suggests that the models could be improved with GARCH models or variants with Student-t innovations.
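
The diagnostic pattern described above, residuals that are uncorrelated but not normal, can be illustrated on simulated data. The sketch below is an illustration under stated assumptions and is not the project's `analyze_residuals()`: it simulates a random walk with Student-t shocks (3 degrees of freedom, chosen arbitrarily) and applies the same two tests using base R only.

```r
set.seed(123)

# Simulated "price": random walk driven by heavy-tailed shocks (assumption)
n <- 1000
innovations <- rt(n, df = 3)
price <- 100 + cumsum(innovations)

# For a driftless ARIMA(0,1,0), the residuals are the first differences
resid_rw <- diff(price)

# Ljung-Box: H0 = no autocorrelation up to lag 10
lb <- Box.test(resid_rw, lag = 10, type = "Ljung-Box")

# Shapiro-Wilk: H0 = residuals are normally distributed
sw <- shapiro.test(resid_rw)

cat(sprintf("Ljung-Box p-value:    %.4f\n", lb$p.value))
cat(sprintf("Shapiro-Wilk p-value: %.3g\n", sw$p.value))
```

A very small Shapiro-Wilk p-value alongside a non-significant Ljung-Box result would mean the mean dynamics are captured while the error distribution is not, which is the situation where GARCH-type or Student-t extensions are usually considered.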
| Member | Program | GitHub |
|---|---|---|
| Mateo Barrios | Data Science | Mateo3008 |
| Rafael Romero | Data Science | rafaelromero06 |
# 1. Install dependencies
install.packages(c(
"bookdown", "tidyverse", "plotly", "DT", "lubridate",
"scales", "jsonlite", "httr", "zoo", "forecast", "tseries",
"knitr", "kableExtra", "gridExtra"
))
# 2. Render
bookdown::render_book("Crypto_EDA_Bookdown.Rmd", "bookdown::html_document2")
# Or directly with rmarkdown:
rmarkdown::render("Crypto_EDA_Bookdown.Rmd")
## R session generated on 07/05/2026 22:21
## R version 4.5.1 (2025-06-13 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=Spanish_Colombia.utf8 LC_CTYPE=Spanish_Colombia.utf8
## [3] LC_MONETARY=Spanish_Colombia.utf8 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Colombia.utf8
##
## time zone: America/Bogota
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] kableExtra_1.4.0 knitr_1.50 tseries_0.10-60 forecast_9.0.2
## [5] zoo_1.8-15 httr_1.4.8 jsonlite_2.0.0 scales_1.4.0
## [9] DT_0.34.0 plotly_4.12.0 lubridate_1.9.5 forcats_1.0.1
## [13] stringr_1.5.1 dplyr_1.1.4 purrr_1.2.1 readr_2.1.6
## [17] tidyr_1.3.2 tibble_3.3.1 ggplot2_4.0.2 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.52 bslib_0.9.0 htmlwidgets_1.6.4
## [5] lattice_0.22-7 tzdb_0.5.0 crosstalk_1.2.2 quadprog_1.5-8
## [9] vctrs_0.6.5 tools_4.5.1 generics_0.1.4 curl_7.0.0
## [13] parallel_4.5.1 xts_0.14.1 pkgconfig_2.0.3 Matrix_1.7-3
## [17] data.table_1.18.0 RColorBrewer_1.1-3 S7_0.2.1 lifecycle_1.0.4
## [21] compiler_4.5.1 farver_2.1.2 textshaping_1.0.4 htmltools_0.5.8.1
## [25] sass_0.4.10 yaml_2.3.10 lazyeval_0.2.2 pillar_1.11.1
## [29] jquerylib_0.1.4 cachem_1.1.0 nlme_3.1-168 fracdiff_1.5-3
## [33] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.7 bookdown_0.46
## [37] splines_4.5.1 labeling_0.4.3 fastmap_1.2.0 grid_4.5.1
## [41] colorspace_2.1-2 cli_3.6.5 magrittr_2.0.3 withr_3.0.2
## [45] timechange_0.4.0 TTR_0.24.4 rmarkdown_2.29 quantmod_0.4.28
## [49] gridExtra_2.3 timeDate_4052.112 hms_1.1.4 urca_1.3-4
## [53] evaluate_1.0.5 viridisLite_0.4.2 mgcv_1.9-3 rlang_1.1.6
## [57] Rcpp_1.1.0 glue_1.8.0 xml2_1.5.2 svglite_2.2.2
## [61] rstudioapi_0.17.1 R6_2.6.1 systemfonts_1.3.1