1 Introduction

1.1 Project Description

This document presents the full exploratory analysis behind the Crypto EDA Dashboard, an academic project that pulls live data from the CryptoCompare API to analyze the historical behavior of 10 cryptocurrencies.

The analysis is geared toward building an ARIMA-based price forecasting model. It covers data retrieval and cleaning, then model fitting, validation, and time-series forecasting.

Repository: https://github.com/Mateo3008/Crypto_EDA_App
Framework: R Shiny + shinydashboard
Data source: CryptoCompare API (daily OHLCV, up to 1905 days per coin; TAO and USD1 have shorter histories)

1.2 Cryptocurrencies Analyzed

cryptos <- tibble(
  Nombre     = c("Bitcoin","Ethereum","USD Coin","Solana","XRP",
                 "Bittensor","Tether","Dogecoin","USD1","Zcash"),
  Símbolo    = c("BTC","ETH","USDC","SOL","XRP",
                 "TAO","USDT","DOGE","USD1","ZEC"),
  Categoría  = c("Store of Value","Smart Contract","Stablecoin","Smart Contract",
                 "Payments","AI / Subnet","Stablecoin","Meme","Stablecoin","Privacy")
)

kable(cryptos, caption = "Criptomonedas incluidas en el análisis",
      booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE)

Table 1.1: Cryptocurrencies included in the analysis
Nombre	Símbolo	Categoría
Bitcoin	BTC	Store of Value
Ethereum	ETH	Smart Contract
USD Coin	USDC	Stablecoin
Solana	SOL	Smart Contract
XRP	XRP	Payments
Bittensor	TAO	AI / Subnet
Tether	USDT	Stablecoin
Dogecoin	DOGE	Meme
USD1	USD1	Stablecoin
Zcash	ZEC	Privacy

1.3 Academic Objective

The main objective is to apply Exploratory Data Analysis (EDA) techniques to financial time series in order to:

Characterize the distribution of prices and returns for each asset.
Quantify risk with metrics such as rolling volatility and Value at Risk (VaR).
Identify correlations between cryptocurrencies.
Assess the stationarity of the series and apply any transformations required.
Fit ARIMA(p, d, q) models for short-term price forecasting.

2 Setup and Data Loading

2.1 API Parameters

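# Read the key from the environment instead of hard-coding it in the report.
# CRYPTOCOMPARE_API_KEY is an assumed variable name; define it in ~/.Renviron.
API_KEY  <- Sys.getenv("CRYPTOCOMPARE_API_KEY")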
BASE_URL <- "https://min-api.cryptocompare.com/data"

CRYPTOS <- c(
  "Bitcoin"   = "BTC", "Ethereum"  = "ETH", "USD Coin"  = "USDC",
  "Solana"    = "SOL", "XRP"       = "XRP", "Bittensor" = "TAO",
  "Tether"    = "USDT","Dogecoin"  = "DOGE","USD1"      = "USD1",
  "Zcash"     = "ZEC"
)

COLOR_PALETTE <- c(
  "#2E86AB","#A23B72","#F18F01","#C73E1D","#6A994E",
  "#BC4A6C","#3D5A80","#EE6C4D","#98C1D9","#293241"
)
names(COLOR_PALETTE) <- CRYPTOS

2.2 Extraction Functions

2.2.1 Historical OHLCV Data

The get_historical_daily() function queries CryptoCompare's /v2/histoday endpoint and returns a data.frame with the OHLCV columns plus simple returns, log returns, and a daily volatility proxy (the high-low range as a percentage of the open):

get_historical_daily <- function(fsym, tsym = "USD", limit = 1905) {
  url <- paste0(BASE_URL, "/v2/histoday?fsym=", fsym, "&tsym=", tsym,
                "&limit=", limit, "&api_key=", API_KEY)
  url <- URLencode(url)

  tryCatch({
    resp <- GET(url, timeout(30))
    if (http_error(resp)) return(NULL)

    data <- fromJSON(content(resp, "text", encoding = "UTF-8"))
    if (is.null(data$Data$Data)) return(NULL)

    df <- as.data.frame(data$Data$Data) |>
      mutate(
        fecha      = as.Date(as.POSIXct(time, origin = "1970-01-01")),
        simbolo    = fsym,
        open       = as.numeric(open),
        high       = as.numeric(high),
        low        = as.numeric(low),
        close      = as.numeric(close),
        volume     = as.numeric(volumefrom),
        retorno    = (close - lag(close)) / lag(close) * 100,
        retorno_log = log(close / lag(close)) * 100,
        volatilidad = abs(high - low) / open * 100
      ) |>
      filter(!is.na(retorno))
    return(df)
  }, error = function(e) return(NULL))
}
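
One edge case is worth guarding against in the return calculation. Assuming the API pads the dates before a coin's listing with zero prices (an assumption, not confirmed by the source), the first real observation is divided by a previous close of 0 and its return becomes Inf, which a filter on `!is.na()` does not remove. A minimal base-R sketch of a stricter filter follows; `safe_returns()` is a hypothetical helper, not part of the original app.

```r
# Hypothetical helper (not in the original app): compute percentage returns
# and keep only the rows where both returns are finite.
safe_returns <- function(close) {
  prev        <- c(NA, head(close, -1))        # previous close (lag 1)
  retorno     <- (close - prev) / prev * 100   # simple return, in %
  retorno_log <- log(close / prev) * 100       # log return, in %
  out <- data.frame(close, retorno, retorno_log)
  out[is.finite(out$retorno) & is.finite(out$retorno_log), , drop = FALSE]
}

# Toy series: two zero-padded days before the coin starts trading at 1.00
res <- safe_returns(c(0, 0, 1.00, 1.01, 0.99))
print(res)
```

Only the last two rows survive: the NA, NaN and Inf returns produced by the missing or zero previous closes are dropped.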

2.2.2 Real-Time Prices

get_price_overview <- function(fsyms, tsym = "USD") {
  fsyms_str <- paste(fsyms, collapse = ",")
  url <- paste0(BASE_URL, "/pricemultifull?fsyms=", fsyms_str,
                "&tsyms=", tsym, "&api_key=", API_KEY)
  url <- URLencode(url)

  tryCatch({
    resp <- GET(url, timeout(30))
    if (http_error(resp)) return(NULL)

    data <- fromJSON(content(resp, "text", encoding = "UTF-8"))
    if (is.null(data$RAW)) return(NULL)

    rows <- list()
    for (sym in fsyms) {
      if (!is.null(data$RAW[[sym]]) && !is.null(data$RAW[[sym]][[tsym]])) {
        d <- data$RAW[[sym]][[tsym]]
        rows[[sym]] <- data.frame(
          simbolo        = sym,
          precio         = d$PRICE,
          cambio_24h_pct = d$CHANGEPCT24HOUR,
          volumen_24h    = d$VOLUME24HOURTO,
          cap_mercado    = d$MKTCAP,
          stringsAsFactors = FALSE
        )
      }
    }
    if (length(rows) == 0) return(NULL)
    return(bind_rows(rows))
  }, error = function(e) return(NULL))
}

2.3 Data Download

cat("=== CARGANDO DATOS (10 monedas, 1905 días) ===\n")

## === CARGANDO DATOS (10 monedas, 1905 días) ===

hist_data <- NULL

for (crypto in CRYPTOS) {
  cat("Cargando", crypto, "... ")
  data <- get_historical_daily(crypto, limit = 1905)
  if (!is.null(data) && nrow(data) > 0) {
    hist_data <- bind_rows(hist_data, data)
    cat("OK (", nrow(data), "días)\n")
  } else {
    cat("FALLÓ — generando datos de ejemplo\n")
  }
  Sys.sleep(0.3)
}

## Cargando BTC ... OK ( 1905 días)
## Cargando ETH ... OK ( 1905 días)
## Cargando USDC ... OK ( 1905 días)
## Cargando SOL ... OK ( 1905 días)
## Cargando XRP ... OK ( 1905 días)
## Cargando TAO ... OK ( 702 días)
## Cargando USDT ... OK ( 1905 días)
## Cargando DOGE ... OK ( 1905 días)
## Cargando USD1 ... OK ( 326 días)
## Cargando ZEC ... OK ( 1905 días)

# Fallback: synthetic data if the API does not respond
if (is.null(hist_data) || nrow(hist_data) == 0) {
  set.seed(123)
  fechas <- seq.Date(as.Date("2019-01-01"), as.Date("2024-04-10"), by = "day")

  for (sym in CRYPTOS) {
    precio_base <- switch(sym,
      "BTC"=50000,"ETH"=3000,"USDC"=1,"SOL"=100,"XRP"=0.5,
      "TAO"=300,"USDT"=1,"DOGE"=0.1,"USD1"=1,"ZEC"=30, 1000)

    trend <- seq(0, by=0.0002, length.out=length(fechas)) * precio_base
    noise <- cumsum(rnorm(length(fechas), 0, precio_base * 0.015))
    close <- pmax(precio_base + trend + noise, precio_base * 0.1)

    df <- data.frame(
      fecha       = fechas, simbolo = sym, close = close,
      open        = c(close[1], close[-length(close)]),
      high        = close + abs(rnorm(length(fechas), 0, close * 0.02)),
      low         = close - abs(rnorm(length(fechas), 0, close * 0.02)),
      retorno     = c(0, diff(close)/close[-length(close)] * 100),
      retorno_log = c(0, diff(log(close)) * 100),
      volatilidad = runif(length(fechas), 1, 6)
    )
    hist_data <- bind_rows(hist_data, df)
  }
}

prices_overview <- get_price_overview(CRYPTOS)
if (is.null(prices_overview)) {
  prices_overview <- data.frame(
    simbolo        = unname(CRYPTOS),   # ticker symbols; names(CRYPTOS) holds the full coin names
    precio         = c(50000,3000,1,100,0.5,300,1,0.1,1,30),
    cambio_24h_pct = runif(10,-5,5),
    volumen_24h    = runif(10,1e8,1e10),
    cap_mercado    = c(1e12,4e11,5e10,3e10,1e10,5e9,8e10,2e10,4e9,1e9)
  )
}

cat("\n✅ Total filas:", nrow(hist_data))

## 
## ✅ Total filas: 16268

cat("\n📅 Período:", as.character(min(hist_data$fecha)),
    "→", as.character(max(hist_data$fecha)))

## 
## 📅 Período: 2021-02-19 → 2026-05-08

cat("\n🪙 Monedas:", paste(unique(hist_data$simbolo), collapse=", "), "\n")

## 
## 🪙 Monedas: BTC, ETH, USDC, SOL, XRP, TAO, USDT, DOGE, USD1, ZEC

3 Data Quality

3.1 Missing Values

missing_summary <- function(df) {
  df |>
    summarise(across(everything(), ~ sum(is.na(.)))) |>
    pivot_longer(everything(), names_to = "Variable", values_to = "NAs") |>
    mutate(Pct = round(NAs / nrow(df) * 100, 2)) |>
    filter(NAs > 0)
}

ms_global <- hist_data |>
  group_by(simbolo) |>
  summarise(
    Total_filas  = n(),
    NAs_close    = sum(is.na(close)),
    NAs_retorno  = sum(is.na(retorno)),
    NAs_vol      = sum(is.na(volatilidad)),
    Pct_NA_close = round(NAs_close / Total_filas * 100, 2)
  )

kable(ms_global, caption = "Resumen de valores faltantes por moneda",
      booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE)

Table 3.1: Summary of missing values per coin
simbolo	Total_filas	NAs_close	NAs_retorno	NAs_vol	Pct_NA_close
BTC	1905	0	0	0	0
DOGE	1905	0	0	0	0
ETH	1905	0	0	0	0
SOL	1905	0	0	0	0
TAO	702	0	0	0	0
USD1	326	0	0	0	0
USDC	1905	0	0	0	0
USDT	1905	0	0	0	0
XRP	1905	0	0	0	0
ZEC	1905	0	0	0	0

3.2 NA Heatmap

cols <- c("close","retorno","retorno_log","volatilidad")

df_heat <- hist_data |>
  group_by(simbolo) |>
  summarise(across(all_of(intersect(cols, names(hist_data))),
                   ~ sum(is.na(.)) / n() * 100, .names = "{.col}")) |>
  pivot_longer(-simbolo, names_to = "Variable", values_to = "pct_na")

ggplot(df_heat, aes(x = Variable, y = simbolo, fill = pct_na)) +
  geom_tile(color = "white", linewidth = 0.5) +
  geom_text(aes(label = paste0(round(pct_na, 1), "%")), size = 3) +
  scale_fill_gradient(low = "white", high = "#e74c3c", name = "% NA") +
  labs(title = "Porcentaje de NAs por moneda y variable",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

Figure 3.1: Percentage of missing values by coin and variable

3.3 Imputation Methods

handle_missing <- function(df, method = "interpolation") {
  cols_imp <- c("close","open","high","low","volume","retorno","retorno_log","volatilidad")

  if (method == "remove") return(na.omit(df))

  if (method == "interpolation") {
    df <- df |> arrange(fecha)
    for (col in intersect(cols_imp, names(df))) {
      df[[col]] <- na.approx(df[[col]], na.rm = FALSE)
      df[[col]] <- na.locf(df[[col]], na.rm = FALSE)
      df[[col]] <- na.locf(df[[col]], fromLast = TRUE, na.rm = FALSE)
    }
    return(df)
  }

  if (method == "mean") {
    for (col in intersect(cols_imp, names(df))) {
      df[[col]][is.na(df[[col]])] <- mean(df[[col]], na.rm = TRUE)
    }
    return(df)
  }
  return(df)
}

Three strategies for handling missing values were implemented:

Linear interpolation (default): zoo::na.approx() fills interior gaps, then na.locf() carries values forward and backward to fill any NAs left at the ends of the series.
Global mean: replaces each NA with the mean of its column.
Row removal: drops incomplete records with na.omit().
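
The difference between the first two strategies is easiest to see on a toy series. The sketch below uses base R's stats::approx() as a stand-in for zoo::na.approx() (a substitution made here so the example runs without extra packages); the values are illustrative only.

```r
# Toy price series with a two-day interior gap and a missing last value
x   <- c(10, NA, NA, 16, 18, NA)
idx <- seq_along(x)

# Linear interpolation of interior gaps; rule = 2 repeats the nearest
# observed value at the ends, similar in spirit to na.locf()
interp <- approx(idx[!is.na(x)], x[!is.na(x)], xout = idx, rule = 2)$y

# Global-mean imputation for comparison
mean_imp <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

print(rbind(interp, mean_imp))
```

Interpolation follows the local trend of the series (10, 12, 14, 16), whereas the global mean inserts the same value everywhere, which can create artificial jumps in a trending price series.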

4 Market Overview

4.1 Current Prices and Market Capitalization

prices_overview |>
  mutate(
    precio         = dollar(precio, accuracy = 0.01),
    cambio_24h_pct = paste0(round(cambio_24h_pct, 2), "%"),
    volumen_24h    = dollar(volumen_24h, accuracy = 1, scale = 1e-6, suffix = "M"),
    cap_mercado    = dollar(cap_mercado, accuracy = 1, scale = 1e-9, suffix = "B")
  ) |>
  rename(
    Símbolo = simbolo, `Precio USD` = precio, `Cambio 24h` = cambio_24h_pct,
    `Volumen 24h` = volumen_24h, `Cap. Mercado` = cap_mercado
  ) |>
  kable(caption = "Visión general del mercado en tiempo real",
        booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE)

Table 4.1: Real-time market overview
Símbolo	Precio USD	Cambio 24h	Volumen 24h	Cap. Mercado
BTC	$79,577.11	-1.56%	$1,566M	$1,594B
ETH	$2,278.34	-1.88%	$651M	$275B
USDC	$1.00	0.01%	$220M	$78B
SOL	$88.21	0.51%	$155M	$55B
XRP	$1.38	-1.53%	$114M	$138B
TAO	$302.82	-1.18%	$21M	$3B
USDT	$1.00	0%	$382M	$195B
DOGE	$0.11	-3.5%	$55M	$18B
USD1	$1.00	-0.01%	$123M	$4B
ZEC	$576.30	6.98%	$87M	$10B

4.2 Market Capitalization

df_cap <- prices_overview |>
  mutate(simbolo = reorder(simbolo, cap_mercado))

ggplot(df_cap, aes(x = simbolo, y = cap_mercado / 1e9, fill = simbolo)) +
  geom_col(show.legend = FALSE, alpha = 0.85) +
  geom_text(aes(label = paste0("$", round(cap_mercado / 1e9, 1), "B")),
            hjust = -0.1, size = 3.5) +
  coord_flip() +
  scale_fill_manual(values = COLOR_PALETTE) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15)),
                     labels = dollar_format(suffix = "B", scale = 1)) +
  labs(title = "Capitalización de Mercado",
       x = NULL, y = "Miles de millones USD") +
  theme_minimal(base_size = 12)

Figure 4.1: Market capitalization by cryptocurrency (billions of USD)

4.3 Trading Volume (24h)

df_vol <- prices_overview |>
  mutate(simbolo = reorder(simbolo, volumen_24h))

ggplot(df_vol, aes(x = simbolo, y = volumen_24h / 1e6, fill = simbolo)) +
  geom_col(show.legend = FALSE, alpha = 0.85) +
  coord_flip() +
  scale_fill_manual(values = COLOR_PALETTE) +
  scale_y_continuous(labels = dollar_format(suffix = "M", scale = 1)) +
  labs(title = "Volumen 24h", x = NULL, y = "Millones USD") +
  theme_minimal(base_size = 12)

Figure 4.2: Trading volume over the last 24 hours

5 Price Analysis

5.1 Price Distribution (Boxplot)

df_365 <- hist_data |>
  filter(fecha >= max(fecha) - 365)

ggplot(df_365, aes(x = simbolo, y = close, fill = simbolo)) +
  geom_boxplot(alpha = 0.7, outlier.color = "#e74c3c", outlier.size = 1.5) +
  stat_summary(fun = "mean", geom = "point", shape = 18, size = 4, color = "white") +
  scale_y_log10(labels = dollar) +
  scale_fill_manual(values = COLOR_PALETTE) +
  coord_flip() +
  labs(title = "Distribución de Precios de Cierre (últimos 365 días)",
       x = NULL, y = "Precio USD (escala log)") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

Figure 5.1: Distribution of closing prices (log scale), last 365 days

5.2 Price Time Series: BTC

df_btc <- hist_data |> filter(simbolo == "BTC") |> arrange(fecha)

ggplot(df_btc, aes(x = fecha, y = close)) +
  geom_line(color = "#F7931A", linewidth = 0.7) +
  scale_y_continuous(labels = dollar) +
  labs(title = "Bitcoin — Precio de Cierre Histórico",
       x = NULL, y = "Precio (USD)") +
  theme_minimal(base_size = 12)

Figure 5.2: Time series of the Bitcoin (BTC) closing price

5.3 Candlestick: Last 60 Days (BTC)

df_candle <- df_btc |> tail(60)

plot_ly(df_candle, x = ~fecha, type = "candlestick",
        open  = ~open,  close = ~close,
        high  = ~high,  low   = ~low,
        increasing = list(line = list(color = "#2ecc71")),
        decreasing = list(line = list(color = "#e74c3c"))) |>
  layout(title  = "BTC — Últimas 60 Velas Diarias",
         xaxis  = list(title = ""),
         yaxis  = list(title = "Precio (USD)"),
         paper_bgcolor = "white",
         plot_bgcolor  = "white")

Figure 5.3: Candlestick chart, BTC, last 60 days

5.4 Bollinger Bands

Bollinger Bands are a volatility indicator: an envelope placed k rolling standard deviations (σₙ) above and below an n-period simple moving average (SMA) of the price.

BB± = SMAₙ ± k · σₙ
%Bandwidth = (BB₊ − BB₋) / SMAₙ × 100

calculate_bollinger_bands <- function(prices, window = 20, sd_mult = 2) {
  sma <- rollmean(prices, window, fill = NA, align = "right")
  sd  <- rollapply(prices, window, sd, fill = NA, align = "right")
  data.frame(
    sma   = sma,
    upper = sma + (sd_mult * sd),
    lower = sma - (sd_mult * sd)
  )
}

bb <- calculate_bollinger_bands(df_btc$close, window = 20, sd_mult = 2)
df_bb <- cbind(df_btc, bb) |> drop_na(sma, upper, lower) |> tail(365)

ggplot(df_bb, aes(x = fecha)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "#3498db", alpha = 0.18) +
  geom_line(aes(y = close, color = "Precio"),         linewidth = 0.8) +
  geom_line(aes(y = sma,   color = "SMA 20"),         linetype = "dashed", linewidth = 0.7) +
  geom_line(aes(y = upper, color = "Banda Superior"), linetype = "dotted", linewidth = 0.6) +
  geom_line(aes(y = lower, color = "Banda Inferior"), linetype = "dotted", linewidth = 0.6) +
  scale_color_manual(values = c(
    "Precio" = "#e94560", "SMA 20" = "#F7931A",
    "Banda Superior" = "#2ecc71", "Banda Inferior" = "#2ecc71"
  )) +
  scale_y_continuous(labels = dollar) +
  labs(title = "Bandas de Bollinger — BTC (SMA 20, k = 2)",
       x = NULL, y = "Precio (USD)", color = NULL) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")

Figure 5.4: Bollinger Bands, BTC (SMA 20, k = 2)

df_bb <- df_bb |>
  mutate(bw = (upper - lower) / sma * 100)

ggplot(df_bb, aes(x = fecha, y = bw)) +
  geom_line(color = "#627EEA", linewidth = 0.8) +
  geom_area(fill = "#627EEA", alpha = 0.15) +
  labs(title = "Ancho de Banda Relativo (%Bandwidth)",
       x = NULL, y = "%Bandwidth") +
  theme_minimal(base_size = 12)

Figure 5.5: Relative bandwidth, a Bollinger-based measure of volatility

6 Returns and Risk

6.1 Return Distribution

Returns are computed as:

Simple return: rₜ = (Pₜ − Pₜ₋₁) / Pₜ₋₁ × 100
Log return: rₜˡᵒᵍ = ln(Pₜ / Pₜ₋₁) × 100
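
A short worked example may help show how the two definitions relate. The prices below are made-up values, not taken from the dataset; the point is that log returns add up across days while simple returns compound, and both recover the same total return.

```r
# Toy closing prices (illustrative values, not taken from the dataset)
p <- c(100, 105, 99.75, 110)

r_simple <- diff(p) / head(p, -1) * 100   # simple returns, in %
r_log    <- diff(log(p)) * 100            # log returns, in %

# Log returns are summed; simple returns are compounded
total_from_log    <- (exp(sum(r_log) / 100) - 1) * 100
total_from_simple <- (prod(1 + r_simple / 100) - 1) * 100

print(round(cbind(r_simple, r_log), 3))
print(c(total_from_log, total_from_simple))   # both match the 100 -> 110 move
```

For small daily moves the two measures are nearly identical; they diverge for large moves, which matters for volatile assets.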

ggplot(df_365, aes(x = simbolo, y = retorno, fill = simbolo)) +
  geom_boxplot(alpha = 0.7, outlier.color = "#e74c3c", outlier.size = 1) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  scale_fill_manual(values = COLOR_PALETTE) +
  coord_flip() +
  labs(title = "Distribución de Retornos Diarios",
       x = NULL, y = "Retorno (%)") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

Figure 6.1: Distribution of daily returns by cryptocurrency (last 365 days)

6.2 Return Histogram: BTC

ggplot(df_btc, aes(x = retorno, fill = after_stat(x) > 0)) +
  geom_histogram(bins = 60, alpha = 0.85) +
  scale_fill_manual(values = c("TRUE" = "#2ecc71", "FALSE" = "#e74c3c"),
                    labels = c("TRUE" = "Positivo", "FALSE" = "Negativo"),
                    name = NULL) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "black") +
  labs(title = "Histograma de Retornos Diarios — BTC",
       x = "Retorno (%)", y = "Frecuencia") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "top")

Figure 6.2: Histogram of BTC daily returns

6.3 Rolling Volatility (30 Days)

df_btc_vol <- df_btc |>
  arrange(fecha) |>
  mutate(vol30 = zoo::rollapply(retorno, 30, sd, fill = NA, align = "right"))

ggplot(df_btc_vol, aes(x = fecha, y = vol30)) +
  geom_line(color = "#e94560", linewidth = 0.7) +
  geom_area(fill = "#e94560", alpha = 0.15) +
  labs(title = "Volatilidad Rodante 30 días — BTC",
       x = NULL, y = "Desv. Est. Retorno (%)") +
  theme_minimal(base_size = 12)

Figure 6.3: 30-day rolling volatility for BTC

6.4 Risk Table by Cryptocurrency

tabla_riesgo <- df_365 |>
  group_by(simbolo) |>
  summarise(
    `Ret. Medio (%)` = round(mean(retorno, na.rm = TRUE), 3),
    `Desv. Est. (%)` = round(sd(retorno, na.rm = TRUE), 3),
    `VaR 95% (%)`    = round(quantile(retorno, 0.05, na.rm = TRUE), 3),
    `Días pos. (%)`  = round(sum(retorno > 0, na.rm = TRUE) / n() * 100, 1)
  ) |>
  arrange(desc(`Ret. Medio (%)`))

kable(tabla_riesgo,
      caption = "Métricas de retorno y riesgo — últimos 365 días",
      booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE) |>
  column_spec(4, color = ifelse(tabla_riesgo$`VaR 95% (%)` < -3, "red", "black"))

Table 6.1: Return and risk metrics, last 365 days
simbolo	Ret. Medio (%)	Desv. Est. (%)	VaR 95% (%)	Días pos. (%)
USD1	Inf	NaN	-0.100	38.7
ZEC	1.015	7.893	-8.682	50.8
ETH	0.133	3.767	-5.189	51.1
TAO	0.080	5.225	-7.736	46.2
USDC	0.000	0.008	-0.010	21.9
USDT	0.000	0.040	-0.100	19.9
BTC	-0.029	2.230	-3.533	49.7
DOGE	-0.031	4.488	-6.072	44.5
XRP	-0.053	3.599	-4.914	43.2
SOL	-0.067	3.827	-5.748	49.2

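The 95% VaR reported here is the 5th percentile of the historical daily returns: on 95% of days the return was above this threshold, and on the worst 5% of days the loss was at least this large. A value of -5%, for example, means that on the most adverse 5% of days the loss was 5% or more. The Inf mean and NaN standard deviation shown for USD1 are a division-by-zero artifact: a return computed against a previous close of 0 is infinite, so such rows should be excluded (for example with is.finite()) before summarizing.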
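
The percentile reading of VaR can be checked on simulated data. This is a minimal sketch assuming normally distributed returns, which is a simplification (real crypto returns have fatter tails); it is not the computation used for the table.

```r
set.seed(1)
# Simulated daily returns in % (illustrative only)
ret <- rnorm(1000, mean = 0, sd = 3)

# Historical VaR at 95%: the 5th percentile of the return distribution
var_95 <- quantile(ret, 0.05, names = FALSE)

# Share of days whose return was at or below the VaR threshold
share_below <- mean(ret <= var_95)

print(c(var_95 = var_95, share_below = share_below))
```

By construction, about 5% of the simulated days fall at or below the threshold, which is the sense in which the VaR separates the worst 5% of days from the rest.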

7 Correlation Analysis

7.1 Correlation Matrix (Pearson)

df_wide <- df_365 |>
  select(fecha, simbolo, retorno) |>
  pivot_wider(names_from = simbolo, values_from = retorno) |>
  select(-fecha)

mat_cor <- cor(df_wide, use = "complete.obs", method = "pearson")

df_cor_long <- as.data.frame(as.table(mat_cor)) |>
  rename(Var1 = Var1, Var2 = Var2, Corr = Freq)

ggplot(df_cor_long, aes(x = Var1, y = Var2, fill = Corr)) +
  geom_tile(color = "white", linewidth = 0.4) +
  geom_text(aes(label = round(Corr, 2)), size = 3, color = "black") +
  scale_fill_gradient2(low = "#3498db", mid = "white", high = "#e74c3c",
                       midpoint = 0, limits = c(-1, 1), name = "Pearson r") +
  labs(title = "Matriz de Correlación — Retornos Diarios (Pearson)",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 11) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Figure 7.1: Heatmap of Pearson correlations between daily returns (last 365 days)

7.2 Scatter Plot: BTC vs ETH

df_scatter <- df_365 |>
  filter(simbolo %in% c("BTC", "ETH")) |>
  select(fecha, simbolo, retorno) |>
  pivot_wider(names_from = simbolo, values_from = retorno) |>
  drop_na()

ggplot(df_scatter, aes(x = BTC, y = ETH)) +
  geom_point(alpha = 0.4, color = "#627EEA", size = 1.5) +
  geom_smooth(method = "lm", se = TRUE, color = "#e94560", linewidth = 1) +
  labs(title = "Retornos Diarios: BTC vs ETH",
       x = "Retorno BTC (%)", y = "Retorno ETH (%)") +
  theme_minimal(base_size = 12)

Figure 7.2: Scatter plot of BTC vs ETH daily returns

8 Performance Comparison

8.1 Cumulative Performance (Base 100)

df_comp <- hist_data |>
  filter(fecha >= max(fecha) - 365) |>
  arrange(fecha) |>
  group_by(simbolo) |>
  mutate(ini = first(close), norm = close / ini * 100) |>
  ungroup()

ggplot(df_comp, aes(x = fecha, y = norm, color = simbolo)) +
  geom_line(linewidth = 0.8, alpha = 0.9) +
  geom_hline(yintercept = 100, linetype = "dashed", color = "gray50") +
  scale_color_manual(values = COLOR_PALETTE) +
  labs(title = "Rendimiento Acumulado — Base 100 (últimos 365 días)",
       x = NULL, y = "Índice (base = 100)", color = "Moneda") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "right")

Figure 8.1: Cumulative performance normalized to base 100 at the start of the period

8.2 Performance Comparison Table

tabla_comp <- hist_data |>
  filter(fecha >= max(fecha) - 365) |>
  group_by(simbolo) |>
  summarise(
    `P. Inicial ($)` = round(first(close), 4),
    `P. Final ($)`   = round(last(close), 4),
    `Rend. (%)`      = round((last(close) - first(close)) / first(close) * 100, 2),
    `Vol. (%)`       = round(sd(retorno, na.rm = TRUE), 3)
  ) |>
  arrange(desc(`Rend. (%)`))

kable(tabla_comp,
      caption = "Comparativa de rendimiento — últimos 365 días",
      booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE) |>
  column_spec(4, color = ifelse(tabla_comp$`Rend. (%)` >= 0, "#2ecc71", "#e74c3c"),
              bold = TRUE)

Table 8.1: Performance comparison, last 365 days
simbolo	P. Inicial ($)	P. Final ($)	Rend. (%)	Vol. (%)
ZEC	41.8800	576.3000	1276.07	7.893
ETH	2207.1900	2278.8100	3.24	3.767
USDC	1.0000	0.9999	-0.01	0.008
USDT	0.9999	0.9997	-0.02	0.040
USD1	0.9999	0.9993	-0.06	NaN
BTC	103259.0000	79572.9500	-22.94	2.230
TAO	423.2000	302.8200	-28.45	5.225
XRP	2.3270	1.3840	-40.52	3.599
DOGE	0.1982	0.1064	-46.32	4.488
SOL	164.4400	88.2200	-46.35	3.827

9 Model-Oriented EDA

9.1 Monthly Boxplot of Returns: BTC

df_btc_mes <- df_btc |>
  mutate(mes = floor_date(fecha, "month"),
         mes_label = format(mes, "%b %Y"))

ggplot(df_btc_mes, aes(x = reorder(mes_label, mes), y = retorno)) +
  geom_boxplot(fill = "#3498db", alpha = 0.7, outlier.size = 1) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray40") +
  labs(title = "Retornos Diarios de BTC por Mes",
       x = NULL, y = "Retorno (%)") +
  theme_minimal(base_size = 11) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7))

Figure 9.1: Monthly boxplot of BTC daily returns

9.2 ACF and PACF

The autocorrelation functions are the main tools for choosing the orders of an ARIMA model:

ACF (autocorrelation function): for a pure MA(q) process it cuts off after lag q, so it suggests the MA order q.
PACF (partial autocorrelation function): for a pure AR(p) process it cuts off after lag p, so it suggests the AR order p.
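
The cut-off behavior is easiest to recognize on a process whose order is known. The sketch below simulates an AR(1) series with base R (illustrative parameters, not market data): the ACF decays gradually while the PACF is large at lag 1 and close to zero afterwards.

```r
set.seed(42)
# Simulated AR(1) process with phi = 0.7 (illustrative, not market data)
x <- arima.sim(model = list(ar = 0.7), n = 2000)

acf_x  <- acf(x,  lag.max = 10, plot = FALSE)$acf[-1]        # drop lag 0
pacf_x <- as.numeric(pacf(x, lag.max = 10, plot = FALSE)$acf)

# Approximate 95% band for a white-noise series of the same length
band <- qnorm(0.975) / sqrt(length(x))

print(round(rbind(acf = acf_x, pacf = pacf_x), 3))
print(band)
```

Reading the real BTC plots the same way, bars that stay inside the band at every lag suggest that low orders of p and q are sufficient.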

serie_btc <- na.omit(df_btc$retorno)
ci_val <- qnorm(0.975) / sqrt(length(serie_btc))

acf_r  <- acf(serie_btc,  plot = FALSE, lag.max = 40)
pacf_r <- pacf(serie_btc, plot = FALSE, lag.max = 40)

df_acf  <- data.frame(Lag = as.numeric(acf_r$lag[-1]),  ACF  = as.numeric(acf_r$acf[-1]))
df_pacf <- data.frame(Lag = as.numeric(pacf_r$lag),     PACF = as.numeric(pacf_r$acf))

p_acf <- ggplot(df_acf, aes(x = Lag, y = ACF)) +
  geom_bar(stat = "identity", fill = "#3498db", alpha = 0.7) +
  geom_hline(yintercept = c(ci_val, -ci_val), linetype = "dashed", color = "blue", linewidth = 0.7) +
  labs(title = "ACF — Retornos BTC") + theme_minimal(base_size = 11)

p_pacf <- ggplot(df_pacf, aes(x = Lag, y = PACF)) +
  geom_bar(stat = "identity", fill = "#e74c3c", alpha = 0.7) +
  geom_hline(yintercept = c(ci_val, -ci_val), linetype = "dashed", color = "blue", linewidth = 0.7) +
  labs(title = "PACF — Retornos BTC") + theme_minimal(base_size = 11)

gridExtra::grid.arrange(p_acf, p_pacf, ncol = 2)

Figure 9.2: ACF and PACF of BTC daily returns

9.3 Descriptive Statistics

stats_desc <- hist_data |>
  group_by(simbolo) |>
  summarise(
    n      = n(),
    Media  = round(mean(close, na.rm = TRUE), 2),
    Mediana= round(median(close, na.rm = TRUE), 2),
    DS     = round(sd(close, na.rm = TRUE), 2),
    Min    = round(min(close, na.rm = TRUE), 4),
    Max    = round(max(close, na.rm = TRUE), 2),
    IQR    = round(IQR(close, na.rm = TRUE), 2)
  )

kable(stats_desc,
      caption = "Estadísticas descriptivas del precio de cierre por moneda",
      booktabs = TRUE,
      col.names = c("Símbolo","n","Media","Mediana","Desv. Est.","Mín.","Máx.","IQR")) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = TRUE)

Table 9.1: Descriptive statistics of the closing price per coin
Símbolo	n	Media	Mediana	Desv. Est.	Mín.	Máx.	IQR
BTC	1905	56298.29	50431.63	29296.01	15760.1900	124723.00	45328.37
DOGE	1905	0.15	0.12	0.09	0.0477	0.69	0.12
ETH	1905	2552.70	2436.20	880.85	994.4100	4830.60	1383.69
SOL	1905	100.87	96.85	68.59	9.6410	261.87	122.76
TAO	702	350.61	331.72	109.96	145.7400	713.91	143.12
USD1	326	1.00	1.00	0.00	0.9986	1.00	0.00
USDC	1905	1.00	1.00	0.00	0.9679	1.00	0.00
USDT	1905	1.00	1.00	0.00	0.9940	1.01	0.00
XRP	1905	1.08	0.63	0.81	0.3072	3.55	0.93
ZEC	1905	102.04	47.96	112.07	18.3100	698.97	105.23

9.4 Stationarity Test (ADF)

The Augmented Dickey-Fuller (ADF) test evaluates the null hypothesis that the series has a unit root, that is, that it is non-stationary:

H₀: the series has a unit root (non-stationary)
H₁: the series is stationary
H₀ is rejected if the p-value < 0.05
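
The logic of the test can be sketched without extra packages. The following is a simplified Dickey-Fuller regression (no lagged differences and no trend term, unlike tseries::adf.test()), applied to a simulated random walk and to its first difference. `df_stat()` is a hypothetical helper, and the critical value of about -2.86 is the usual large-sample 5% value for the constant-only case; treat the whole block as an illustration rather than a replacement for the packaged test.

```r
set.seed(7)
# Hypothetical helper: t-statistic of the Dickey-Fuller regression
#   diff(y)_t = a + rho * y_{t-1} + e_t   (no lagged differences, no trend)
df_stat <- function(y) {
  dy    <- diff(y)
  y_lag <- head(y, -1)
  fit   <- lm(dy ~ y_lag)
  summary(fit)$coefficients["y_lag", "t value"]
}

price_like  <- cumsum(rnorm(500))   # random walk: has a unit root
return_like <- diff(price_like)     # its first difference: white noise

# Approximate 5% critical value for the constant-only case (large samples)
crit_5pct <- -2.86

stats <- c(level = df_stat(price_like), diff = df_stat(return_like))
print(stats)
print(stats < crit_5pct)   # TRUE means the unit-root null is rejected
```

The differenced series produces a strongly negative statistic and rejects the unit root, which is why price levels usually need d = 1 while returns are modeled with d = 0.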

test_stationarity <- function(serie) {
  tryCatch({
    test <- adf.test(na.omit(as.numeric(serie)), k = trunc((length(na.omit(serie)) - 1)^(1/3)))
    list(
      estadistico    = test$statistic,
      p_valor        = test$p.value,
      es_estacionaria = test$p.value < 0.05,
      conclusion     = ifelse(test$p.value < 0.05,
                              "Serie estacionaria (rechaza H₀)",
                              "Serie NO estacionaria (no rechaza H₀)")
    )
  }, error = function(e) list(estadistico=NA, p_valor=NA, es_estacionaria=FALSE, conclusion="Error"))
}

adf_resultados <- lapply(unique(hist_data$simbolo), function(cr) {
  # Full return vector for this coin, keeping finite values only
  serie <- hist_data$retorno[hist_data$simbolo == cr]
  serie <- serie[is.finite(serie)]
  res   <- test_stationarity(serie)
  data.frame(
    Moneda          = cr,
    `ADF Stat.`     = round(as.numeric(res$estadistico), 4),
    `p-valor`       = round(res$p_valor, 6),
    Estacionaria    = ifelse(res$es_estacionaria, "✅ Sí", "❌ No"),
    `d recomendado` = ifelse(res$es_estacionaria, 0, 1),
    check.names     = FALSE   # keep the column names exactly as written
  )
}) |> bind_rows()

kable(adf_resultados,
      caption = "Resultados del Test ADF para retornos diarios por criptomoneda",
      booktabs = TRUE,
      row.names = FALSE) |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE)

Table 9.3: ADF test results for daily returns per cryptocurrency
Moneda	ADF Stat.	p-valor	Estacionaria	d recomendado
BTC	NA	NA	❌ No	1
ETH	NA	NA	❌ No	1
USDC	NA	NA	❌ No	1
SOL	NA	NA	❌ No	1
XRP	NA	NA	❌ No	1
TAO	NA	NA	❌ No	1
USDT	NA	NA	❌ No	1
DOGE	NA	NA	❌ No	1
USD1	NA	NA	❌ No	1
ZEC	NA	NA	❌ No	1

Every row shows NA for both the statistic and the p-value, which is what the tryCatch fallback in test_stationarity() returns when adf.test() raises an error. The "No" and d = 1 entries are therefore fallback defaults, not conclusions of the test.

9.5 STL Decomposition: BTC

STL (Seasonal and Trend decomposition using Loess) splits the series into three additive components:

Y(t) = T(t) + S(t) + R(t)

where T(t) is the trend, S(t) the seasonal component, and R(t) the remainder.
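
The additive identity can be verified on a synthetic series whose weekly pattern is known in advance. This sketch uses only base R's stats::stl(); the trend slope, weekly pattern and noise level are made-up values.

```r
set.seed(3)
# Synthetic daily series: slow trend + fixed weekly pattern + noise
n      <- 7 * 40
weekly <- rep(c(2, 1, 0, -1, -2, 0, 0), times = 40)
y      <- ts(0.01 * seq_len(n) + weekly + rnorm(n, sd = 0.5), frequency = 7)

decomp <- stl(y, s.window = "periodic", robust = TRUE)
parts  <- decomp$time.series   # columns: seasonal, trend, remainder

# The three components add back up to the original series
recon_error <- max(abs(rowSums(parts) - as.numeric(y)))
print(recon_error)

# Estimated weekly pattern (first seven seasonal values)
print(round(as.numeric(parts[1:7, "seasonal"]), 2))
```

With s.window = "periodic" the seasonal component repeats identically every seven observations, so the first week is enough to read the estimated pattern.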

serie_stl <- ts(na.omit(df_btc$retorno), frequency = 7)
decomp    <- stl(serie_stl, s.window = "periodic", robust = TRUE)

autoplot(decomp) +
  labs(title = "Descomposición STL — Retornos BTC") +
  theme_minimal(base_size = 12)

Figure 9.3: STL decomposition of BTC returns (weekly frequency)

10 ARIMA Modeling

10.1 Theoretical Framework

The ARIMA(p, d, q) model combines three components:

Component	Symbol	Description
Autoregressive	AR(p)	Dependence on the p most recent lagged values
Integrated	I(d)	Differencing d times to achieve stationarity
Moving average	MA(q)	Dependence on the q most recent forecast errors

General equation:
φ(B)(1−B)ᵈ yₜ = θ(B) εₜ

where B is the lag (backshift) operator, φ(B) the AR polynomial, θ(B) the MA polynomial, and εₜ white noise.
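
As a concrete case, for ARIMA(1,1,0) the general equation reduces to (1 − φ₁B)(1 − B) yₜ = εₜ, which expands to yₜ = (1 + φ₁) yₜ₋₁ − φ₁ yₜ₋₂ + εₜ. The sketch below simulates such a process with base R, refits it with stats::arima(), and checks the expanded recursion; φ₁ = 0.5 is an arbitrary illustrative value, not an estimate from the BTC data.

```r
set.seed(11)
phi <- 0.5
# Simulate an ARIMA(1,1,0) path (illustrative parameters, not fitted to BTC)
y <- arima.sim(model = list(order = c(1, 1, 0), ar = phi), n = 2000)

# Fit the same structure with base R's stats::arima()
fit     <- arima(y, order = c(1, 1, 0), method = "ML")
phi_hat <- unname(coef(fit)["ar1"])

# Recover the shocks from the expanded recursion:
#   e_t = y_t - (1 + phi) * y_{t-1} + phi * y_{t-2}
yv <- as.numeric(y)
m  <- length(yv)
e  <- yv[3:m] - (1 + phi) * yv[2:(m - 1)] + phi * yv[1:(m - 2)]

print(c(phi_hat = phi_hat, sd_shocks = sd(e)))
```

The estimated coefficient lands near the true value and the recovered shocks have a standard deviation near 1, the default innovation scale of arima.sim().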

10.2 Function: Stationarity Detection

detect_stationarity_and_differentiate <- function(serie, max_d = 2) {
  serie_clean <- as.numeric(na.omit(serie))
  d_val <- 0
  current <- serie_clean

  for (i in seq_len(max_d)) {
    # A failed test yields NA and stops the search, as does a significant result
    p_val <- tryCatch(
      adf.test(current, k = trunc((length(current) - 1)^(1/3)))$p.value,
      error = function(e) NA_real_
    )
    if (is.na(p_val) || p_val < 0.05) break
    d_val   <- i
    current <- diff(current)
  }
  list(is_stationary = d_val == 0, d_value = d_val)
}

10.3 Fitting the Best Model: BTC

set.seed(42)
df_btc_clean <- hist_data |>
  filter(simbolo == "BTC") |>
  arrange(fecha) |>
  handle_missing(method = "interpolation")

serie_full  <- na.omit(df_btc_clean$close)
n           <- length(serie_full)
n_train     <- floor(n * 0.8)
train       <- serie_full[1:n_train]
test        <- serie_full[(n_train + 1):n]
fechas_all  <- df_btc_clean$fecha[!is.na(df_btc_clean$close)]
train_dates <- fechas_all[1:n_train]
test_dates  <- fechas_all[(n_train + 1):n]

# Detect the optimal d
stat_res  <- detect_stationarity_and_differentiate(train)
d_opt     <- stat_res$d_value
cat("d óptimo detectado:", d_opt, "\n")

## d óptimo detectado: 1

# Search for the best (p, q) by AIC
pq_rng    <- 0:3
best_aic  <- Inf
best_order <- c(1, d_opt, 1)
best_model <- NULL
results   <- list()

for (p in pq_rng) {
  for (q in pq_rng) {
    if (p == 0 && d_opt == 0 && q == 0) next
    tryCatch({
      m   <- Arima(train, order = c(p, d_opt, q), method = "ML")
      aic <- AIC(m)
      results[[length(results)+1]] <- data.frame(p=p, d=d_opt, q=q, AIC=round(aic,2))
      if (aic < best_aic) { best_aic <- aic; best_order <- c(p, d_opt, q); best_model <- m }
    }, error = function(e) {})
  }
}

cat(sprintf("Mejor modelo: ARIMA(%d,%d,%d) — AIC = %.2f\n",
            best_order[1], best_order[2], best_order[3], best_aic))

## Mejor modelo: ARIMA(1,1,0) — AIC = 26687.81

10.4 Top Models by AIC

df_results <- bind_rows(results) |>
  arrange(AIC) |>
  head(10) |>
  mutate(Modelo = paste0("ARIMA(", p, ",", d, ",", q, ")")) |>
  select(Modelo, AIC)

kable(df_results,
      caption = "Top 10 modelos ARIMA por criterio AIC — BTC precio de cierre",
      booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) |>
  row_spec(1, bold = TRUE, background = "#d5f0e0")

Table 10.1: Top 10 ARIMA models by AIC, BTC closing price
Modelo	AIC
ARIMA(1,1,0)	26687.81
ARIMA(0,1,1)	26687.96
ARIMA(2,1,0)	26689.71
ARIMA(0,1,2)	26689.72
ARIMA(1,1,1)	26689.73
ARIMA(3,1,3)	26691.11
ARIMA(3,1,0)	26691.70
ARIMA(2,1,1)	26691.71
ARIMA(0,1,3)	26691.72
ARIMA(1,1,2)	26691.72
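
The AIC values in this table are very close to one another. A common rule of thumb treats candidates within roughly 2 AIC units of the best model as having similar support, so the ranking here should be read with caution. The sketch below computes the AIC differences and Akaike weights for the top five rows, using the values printed in the table. The weights are relative to this subset of candidates only.

```r
# AIC values copied from the table above
aic <- c("ARIMA(1,1,0)" = 26687.81, "ARIMA(0,1,1)" = 26687.96,
         "ARIMA(2,1,0)" = 26689.71, "ARIMA(0,1,2)" = 26689.72,
         "ARIMA(1,1,1)" = 26689.73)

delta   <- aic - min(aic)            # AIC difference to the best candidate
weights <- exp(-delta / 2)
weights <- weights / sum(weights)    # Akaike weights, summing to 1 over these candidates

round(data.frame(delta_AIC = delta, weight = weights), 3)
```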

10.5 Residual Diagnostics

resid_model <- residuals(best_model)

p1 <- ggplot(data.frame(x = 1:length(resid_model), y = resid_model),
             aes(x = x, y = y)) +
  geom_line(color = "#3498db", linewidth = 0.5) +
  labs(title = "Residuos en el tiempo", x = "Índice", y = "Residuo") +
  theme_minimal(base_size = 11)

p2 <- ggplot(data.frame(x = resid_model), aes(x = x)) +
  geom_histogram(bins = 40, fill = "#627EEA", alpha = 0.8, color = "white") +
  labs(title = "Distribución de Residuos", x = "Residuo", y = "Frecuencia") +
  theme_minimal(base_size = 11)

gridExtra::grid.arrange(p1, p2, ncol = 2)

Figure 10.1: Residual diagnostics for the best ARIMA model

# Normality test (Shapiro-Wilk); shapiro.test() accepts at most 5000 observations
sw_test <- if (length(resid_model) > 5000)
  shapiro.test(sample(resid_model, 5000)) else shapiro.test(resid_model)

# Independence test (Ljung-Box)
lb_test <- Box.test(resid_model, lag = 10, type = "Ljung-Box")

tests_df <- data.frame(
  Test         = c("Shapiro-Wilk (normality)", "Ljung-Box (independence)"),
  `p-value`    = round(c(sw_test$p.value, lb_test$p.value), 6),
  Conclusion   = c(
    ifelse(sw_test$p.value > 0.05, "✅ Normal residuals", "⚠️ Not normal"),
    ifelse(lb_test$p.value > 0.05, "✅ Independent residuals", "⚠️ Autocorrelation present")
  )
)

kable(tests_df,
      caption = "Tests on the residuals of the best ARIMA model",
      booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

Table 10.3: Tests on the residuals of the best ARIMA model
Test	p.value	Conclusion
Shapiro-Wilk (normality)	0.000000	⚠️ Not normal
Ljung-Box (independence)	0.067679	✅ Independent residuals
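
The Ljung-Box p-value above is close to the 0.05 threshold, so the choice of degrees of freedom may matter. When the test is applied to residuals of a fitted ARMA(p, q) model, the degrees of freedom are commonly reduced by p + q through the `fitdf` argument of Box.test(). The sketch below illustrates the adjustment on a simulated series, which stands in for the BTC training data. It uses base R's arima() rather than the project's Arima() call.

```r
set.seed(123)
# Simulated ARIMA(1,1,0) series (assumption: stand-in for the real training data)
y   <- cumsum(as.numeric(arima.sim(model = list(ar = 0.3), n = 500)))
fit <- arima(y, order = c(1, 1, 0))

n_params <- 1   # p + q for ARIMA(1,1,0)
lb_plain <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box")
lb_adj   <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = n_params)

# Same statistic, different reference distribution
c(df_plain = unname(lb_plain$parameter), df_adjusted = unname(lb_adj$parameter))
```

Because the statistic is unchanged and the degrees of freedom shrink, the adjusted p-value is never larger than the unadjusted one.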

10.6 Forecast and Error Metrics

horizonte <- length(test)
fc        <- forecast(best_model, h = horizonte)
pred      <- as.numeric(fc$mean)

# Error metrics on the held-out test set
mae  <- mean(abs(pred - test))
rmse <- sqrt(mean((pred - test)^2))
mape <- mean(abs((pred - test) / test)) * 100
r2   <- cor(pred, test)^2   # squared Pearson correlation, not 1 - SSE/SST

metricas <- data.frame(
  Metric = c("MAE (USD)", "RMSE (USD)", "MAPE (%)", "R²"),
  Value  = round(c(mae, rmse, mape, r2), 4)
)
kable(metricas,
      caption = sprintf("Error metrics for ARIMA(%d,%d,%d) on the test set",
                        best_order[1], best_order[2], best_order[3]),
      booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

Table 10.5: Error metrics for ARIMA(1,1,0) on the test set
Metric	Value
MAE (USD)	15437.3115
RMSE (USD)	17588.6850
MAPE (%)	16.8523
R²	0.0000
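
The R² reported above is the squared Pearson correlation between forecast and actual values, which only measures linear association. An alternative definition, 1 - SSE/SST, compares the forecast against the mean of the test set and can be negative when the forecast does worse than that mean. The sketch below computes the four metrics with this alternative R² on toy numbers. The function name `forecast_metrics` and the data are illustrative and not part of the project.

```r
# Hypothetical helper: error metrics with R2 defined as 1 - SSE/SST
forecast_metrics <- function(actual, pred) {
  err <- pred - actual
  c(MAE  = mean(abs(err)),
    RMSE = sqrt(mean(err^2)),
    MAPE = mean(abs(err / actual)) * 100,
    R2   = 1 - sum(err^2) / sum((actual - mean(actual))^2))
}

actual <- c(100, 110, 120, 130)   # toy "test" prices with an upward trend
flat   <- rep(100, 4)             # flat forecast, as a random-walk-like model would give
forecast_metrics(actual, flat)
```

A flat forecast against a trending series yields a negative R² under this definition, which makes the failure more visible than a value of zero.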

n_hist <- min(200, length(train))
df_forecast <- bind_rows(
  data.frame(Fecha = tail(train_dates, n_hist),
             Precio = tail(train, n_hist), Tipo = "Entrenamiento"),
  data.frame(Fecha = test_dates[1:horizonte],
             Precio = test[1:horizonte],  Tipo = "Test Real"),
  data.frame(Fecha = test_dates[1:horizonte],
             Precio = pred,               Tipo = "Predicción"),
  data.frame(Fecha = test_dates[1:horizonte],
             Precio = as.numeric(fc$lower[,2]), Tipo = "IC 95% Inferior"),
  data.frame(Fecha = test_dates[1:horizonte],
             Precio = as.numeric(fc$upper[,2]), Tipo = "IC 95% Superior")
)

df_ribbon <- data.frame(
  Fecha = test_dates[1:horizonte],
  Lower = as.numeric(fc$lower[,2]),
  Upper = as.numeric(fc$upper[,2])
)

df_lines <- df_forecast |> filter(Tipo %in% c("Entrenamiento","Test Real","Predicción"))

ggplot() +
  geom_ribbon(data = df_ribbon, aes(x = Fecha, ymin = Lower, ymax = Upper),
              fill = "#3498db", alpha = 0.2) +
  geom_line(data = df_lines, aes(x = Fecha, y = Precio, color = Tipo, linewidth = Tipo)) +
  scale_color_manual(values = c(
    "Entrenamiento" = "#2c3e50",
    "Test Real"     = "#3498db",
    "Predicción"    = "#e74c3c"
  )) +
  scale_linewidth_manual(values = c(
    "Entrenamiento" = 0.7, "Test Real" = 0.9, "Predicción" = 1.1
  )) +
  scale_y_continuous(labels = dollar) +
  labs(
    title    = sprintf("ARIMA(%d,%d,%d) — Predicción vs Realidad — BTC",
                       best_order[1], best_order[2], best_order[3]),
    subtitle = sprintf("MAE: $%.0f | RMSE: $%.0f | MAPE: %.2f%% | R²: %.4f",
                       mae, rmse, mape, r2),
    x = NULL, y = "Precio (USD)",
    color = NULL, linewidth = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")

Figure 10.2: ARIMA model fit vs. actual values (BTC)

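11 Optimal Forecast

11.1 Automatic Search for the Best Model

# Train on the first n_pred_train observations (at most 1000 days), then forecast 30 days.
# Note: 1:n_pred_train selects the oldest data; tail(serie_full, n_pred_train) would select the most recent.
n_pred_train <- min(1000, floor(n * 0.85))
train_opt    <- serie_full[1:n_pred_train]
train_dates_opt <- fechas_all[1:n_pred_train]
horizonte_opt   <- 30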

stat_opt <- detect_stationarity_and_differentiate(train_opt)
d_opt2   <- stat_opt$d_value

best_bic  <- Inf
best_ord2 <- c(1, d_opt2, 1)
best_mod2 <- NULL

for (p in 0:3) {
  for (q in 0:3) {
    if (p == 0 && d_opt2 == 0 && q == 0) next
    tryCatch({
      m   <- Arima(train_opt, order = c(p, d_opt2, q), method = "ML")
      b   <- BIC(m)
      if (b < best_bic) { best_bic <- b; best_ord2 <- c(p, d_opt2, q); best_mod2 <- m }
    }, error = function(e) {})
  }
}

cat(sprintf("Mejor modelo (BIC): ARIMA(%d,%d,%d) — BIC = %.2f\n",
            best_ord2[1], best_ord2[2], best_ord2[3], best_bic))

## Mejor modelo (BIC): ARIMA(0,1,0) — BIC = 17130.12

11.2 30-Day Forecast

fc_opt  <- forecast(best_mod2, h = horizonte_opt)
last_d  <- tail(train_dates_opt, 1)
pred_dates <- seq(last_d + 1, by = "day", length.out = horizonte_opt)

df_pred_opt <- data.frame(
  Fecha      = pred_dates,
  Prediccion = as.numeric(fc_opt$mean),
  Lower_80   = as.numeric(fc_opt$lower[,1]),
  Upper_80   = as.numeric(fc_opt$upper[,1]),
  Lower_95   = as.numeric(fc_opt$lower[,2]),
  Upper_95   = as.numeric(fc_opt$upper[,2])
)

n_ctx <- min(120, length(train_opt))
df_ctx <- data.frame(
  Fecha  = tail(train_dates_opt, n_ctx),
  Precio = tail(train_opt, n_ctx)
)

ggplot() +
  geom_line(data = df_ctx, aes(x = Fecha, y = Precio), color = "#2c3e50", linewidth = 0.8) +
  geom_ribbon(data = df_pred_opt,
              aes(x = Fecha, ymin = Lower_95, ymax = Upper_95), fill = "#3498db", alpha = 0.18) +
  geom_ribbon(data = df_pred_opt,
              aes(x = Fecha, ymin = Lower_80, ymax = Upper_80), fill = "#3498db", alpha = 0.28) +
  geom_line(data = df_pred_opt, aes(x = Fecha, y = Prediccion),
            color = "#e74c3c", linewidth = 1.1) +
  scale_y_continuous(labels = dollar) +
  labs(
    title    = sprintf("Predicción Óptima 30 días — BTC | ARIMA(%d,%d,%d) por BIC",
                       best_ord2[1], best_ord2[2], best_ord2[3]),
    subtitle = "Azul: IC 80% y 95% | Rojo: Predicción puntual | Gris: Historial",
    x = NULL, y = "Precio (USD)"
  ) +
  theme_minimal(base_size = 12)

Figure 11.1: Optimal 30-day forecast for BTC with prediction intervals

11.3 Forecast Table

df_pred_opt |>
  mutate(
    Fecha      = format(Fecha, "%d/%m/%Y"),
    Prediccion = dollar(Prediccion, accuracy = 1),
    `IC 80%`   = paste0(dollar(Lower_80, accuracy=1), " — ", dollar(Upper_80, accuracy=1)),
    `IC 95%`   = paste0(dollar(Lower_95, accuracy=1), " — ", dollar(Upper_95, accuracy=1))
  ) |>
  select(Fecha, Prediccion, `IC 80%`, `IC 95%`) |>
  kable(caption = "Daily 30-day forecasts with prediction intervals",
        booktabs = TRUE, row.names = FALSE) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = TRUE) |>
  scroll_box(height = "400px")

Table 11.1: Daily 30-day forecasts with prediction intervals
Fecha	Prediccion	IC 80%	IC 95%
16/11/2023	$37,884	$36,249 — $39,519	$35,384 — $40,384
17/11/2023	$37,884	$35,572 — $40,196	$34,348 — $41,420
18/11/2023	$37,884	$35,053 — $40,716	$33,554 — $42,215
19/11/2023	$37,884	$34,614 — $41,154	$32,884 — $42,885
20/11/2023	$37,884	$34,229 — $41,540	$32,293 — $43,475
21/11/2023	$37,884	$33,880 — $41,889	$31,760 — $44,009
22/11/2023	$37,884	$33,559 — $42,210	$31,269 — $44,499
23/11/2023	$37,884	$33,260 — $42,508	$30,812 — $44,956
24/11/2023	$37,884	$32,980 — $42,789	$30,383 — $45,385
25/11/2023	$37,884	$32,714 — $43,054	$29,978 — $45,791
26/11/2023	$37,884	$32,462 — $43,306	$29,592 — $46,177
27/11/2023	$37,884	$32,221 — $43,547	$29,223 — $46,545
28/11/2023	$37,884	$31,990 — $43,779	$28,869 — $46,899
29/11/2023	$37,884	$31,767 — $44,001	$28,529 — $47,239
30/11/2023	$37,884	$31,552 — $44,216	$28,201 — $47,568
01/12/2023	$37,884	$31,345 — $44,424	$27,883 — $47,885
02/12/2023	$37,884	$31,144 — $44,625	$27,575 — $48,193
03/12/2023	$37,884	$30,948 — $44,820	$27,276 — $48,492
04/12/2023	$37,884	$30,758 — $45,010	$26,986 — $48,783
05/12/2023	$37,884	$30,573 — $45,195	$26,703 — $49,066
06/12/2023	$37,884	$30,392 — $45,376	$26,426 — $49,342
07/12/2023	$37,884	$30,216 — $45,552	$26,157 — $49,612
08/12/2023	$37,884	$30,044 — $45,725	$25,893 — $49,875
09/12/2023	$37,884	$29,875 — $45,893	$25,635 — $50,133
10/12/2023	$37,884	$29,710 — $46,058	$25,383 — $50,386
11/12/2023	$37,884	$29,548 — $46,220	$25,135 — $50,633
12/12/2023	$37,884	$29,389 — $46,379	$24,892 — $50,876
13/12/2023	$37,884	$29,233 — $46,535	$24,654 — $51,115
14/12/2023	$37,884	$29,080 — $46,688	$24,420 — $51,349
15/12/2023	$37,884	$28,930 — $46,839	$24,190 — $51,579
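
The constant point forecast in this table follows from the selected model. ARIMA(0,1,0) without drift is a random walk, so every forecast equals the last observed price and the forecast standard error grows with the square root of the horizon h. The table agrees with this: the 80% half-width at h = 4 ($3,270) is twice the half-width at h = 1 ($1,635). The sketch below rebuilds the intervals from the first row alone. The one-step standard error `sigma` is inferred from the rounded table values, so results may differ from the table by a few dollars.

```r
last_price <- 37884                      # point forecast from the table above
half_80_h1 <- (39519 - 36249) / 2        # half-width of the 80% interval at h = 1
sigma      <- half_80_h1 / qnorm(0.90)   # implied one-step standard error

h <- 1:30
intervals <- data.frame(
  h        = h,
  lower_80 = last_price - qnorm(0.90)  * sigma * sqrt(h),
  upper_80 = last_price + qnorm(0.90)  * sigma * sqrt(h),
  lower_95 = last_price - qnorm(0.975) * sigma * sqrt(h),
  upper_95 = last_price + qnorm(0.975) * sigma * sqrt(h)
)

round(intervals[c(1, 4, 30), ])   # compare with rows 1, 4 and 30 of the table
```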

12 Dashboard Architecture

12.1 Project Structure

Crypto_EDA_App/
├── global.R      # Libraries, API config, functions, data loading
├── ui.R          # Interface (shinydashboard): 10 tabs
├── server.R      # Reactive logic and plot rendering
├── README.md     # Documentation
└── www/
    └── crypto.svg

12.2 Dashboard Modules

modulos <- tibble(
  Tab        = c("Introduction","Overview","Prices","Returns & Risk",
                 "Correlations","Comparator","EDA Analysis","ARIMA Model",
                 "Optimal Forecast","Missing Values"),
  Content    = c(
    "Hero banner, value boxes, coin cards, objective, team",
    "Spot prices, market cap, 24h volume in real time",
    "Boxplot, time series, candlestick (60 days), Bollinger Bands",
    "Returns boxplot, histogram, 30-day rolling volatility, 95% VaR",
    "Pearson/Spearman heatmap, scatter plots for selectable pairs",
    "Cumulative return normalized to base 100 or %, comparison table",
    "Boxplot, ACF, PACF, ADF test, STL decomposition, statistics",
    "Automatic ARIMA search (AIC/BIC/HQIC), rolling/direct, diagnostics",
    "Automatic best model, N-day forecast, 80% and 95% intervals",
    "NA summary, heatmap, imputation (interpolation / mean / drop)"
  )
)

kable(modulos, caption = "Crypto EDA Dashboard modules",
      booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE) |>
  column_spec(1, bold = TRUE, width = "2.5cm") |>
  column_spec(2, width = "12cm")

Table 12.1: Crypto EDA Dashboard modules
Tab	Content
Introduction	Hero banner, value boxes, coin cards, objective, team
Overview	Spot prices, market cap, 24h volume in real time
Prices	Boxplot, time series, candlestick (60 days), Bollinger Bands
Returns & Risk	Returns boxplot, histogram, 30-day rolling volatility, 95% VaR
Correlations	Pearson/Spearman heatmap, scatter plots for selectable pairs
Comparator	Cumulative return normalized to base 100 or %, comparison table
EDA Analysis	Boxplot, ACF, PACF, ADF test, STL decomposition, statistics
ARIMA Model	Automatic ARIMA search (AIC/BIC/HQIC), rolling/direct, diagnostics
Optimal Forecast	Automatic best model, N-day forecast, 80% and 95% intervals
Missing Values	NA summary, heatmap, imputation (interpolation / mean / drop)

12.3 Main Functions

funciones <- tibble(
  Function = c(
    "get_historical_daily()",
    "get_price_overview()",
    "calculate_bollinger_bands()",
    "handle_missing()",
    "missing_summary()",
    "perform_stl_decomposition_safe()",
    "detect_stationarity_and_differentiate()",
    "get_best_model_for_prediction()",
    "analyze_residuals()"
  ),
  File      = c("global.R","global.R","global.R","global.R","global.R",
                "global.R","global.R","global.R","global.R"),
  Description = c(
    "Downloads historical OHLCV data from CryptoCompare (up to 1905 days)",
    "Gets spot price, market cap, and 24h volume in real time",
    "Computes the SMA and the upper and lower bands given window and k",
    "Imputes NAs: linear interpolation, global mean, or removal",
    "Returns a table of variables with NAs and their percentage",
    "Robust STL decomposition; handles short series with a fallback",
    "Applies an iterative ADF test to determine the differencing order d",
    "Searches for the best ARIMA(p,d,q) by AIC/BIC/HQIC on the training set",
    "Runs Shapiro-Wilk and Ljung-Box tests on the model residuals"
  )
)

kable(funciones, caption = "Main functions defined in global.R",
      booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = TRUE, font_size = 11) |>
  column_spec(1, bold = TRUE, monospace = TRUE, width = "4.5cm") |>
  column_spec(2, width = "2cm") |>
  column_spec(3, width = "9cm")

Table 12.3: Main functions defined in global.R
Function	File	Description
get_historical_daily()	global.R	Downloads historical OHLCV data from CryptoCompare (up to 1905 days)
get_price_overview()	global.R	Gets spot price, market cap, and 24h volume in real time
calculate_bollinger_bands()	global.R	Computes the SMA and the upper and lower bands given window and k
handle_missing()	global.R	Imputes NAs: linear interpolation, global mean, or removal
missing_summary()	global.R	Returns a table of variables with NAs and their percentage
perform_stl_decomposition_safe()	global.R	Robust STL decomposition; handles short series with a fallback
detect_stationarity_and_differentiate()	global.R	Applies an iterative ADF test to determine the differencing order d
get_best_model_for_prediction()	global.R	Searches for the best ARIMA(p,d,q) by AIC/BIC/HQIC on the training set
analyze_residuals()	global.R	Runs Shapiro-Wilk and Ljung-Box tests on the model residuals
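
The table describes handle_missing() as offering linear interpolation, but its implementation is not shown here. As a rough illustration of that option, the sketch below fills NAs with base R's approx(). The function name `interpolate_na` is hypothetical, and the project may well rely on zoo::na.approx instead, which is listed among its dependencies.

```r
# Hypothetical helper: linear interpolation of missing values with base R
interpolate_na <- function(x) {
  idx <- seq_along(x)
  ok  <- !is.na(x)
  # approx() interpolates linearly between observed points;
  # rule = 2 extends the nearest observed value to leading/trailing NAs
  approx(idx[ok], x[ok], xout = idx, rule = 2)$y
}

interpolate_na(c(NA, 10, NA, NA, 16, NA))
# returns 10 10 12 14 16 16
```

Extending the edge values with `rule = 2` is a choice of this sketch; dropping leading and trailing NAs would be an equally reasonable alternative for price series.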

12.4 Dependencies

deps <- tibble(
  Library    = c("shiny","shinydashboard","tidyverse","plotly","DT",
                 "httr","jsonlite","zoo","forecast","tseries",
                 "lubridate","scales","rsconnect"),
  Use        = c(
    "Reactive framework for the app",
    "Dashboard layout with sidebar and tabs",
    "Data manipulation and transformation",
    "Interactive plots with animations and tooltips",
    "Interactive tables with filters and pagination",
    "HTTP connection to the CryptoCompare REST API",
    "Parsing of JSON responses from the API",
    "Moving averages, rollmean, rollapply, na.approx",
    "auto.arima(), Arima(), forecast(), ACF/PACF",
    "Augmented Dickey-Fuller test (adf.test)",
    "Date manipulation (floor_date, as_date)",
    "Axis and label formatting (dollar, percent)",
    "Publishing to shinyapps.io"
  )
)

kable(deps, caption = "Project dependencies",
      booktabs = TRUE) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = TRUE, font_size = 11) |>
  column_spec(1, bold = TRUE, monospace = TRUE)

Table 12.5: Project dependencies
Library	Use
shiny	Reactive framework for the app
shinydashboard	Dashboard layout with sidebar and tabs
tidyverse	Data manipulation and transformation
plotly	Interactive plots with animations and tooltips
DT	Interactive tables with filters and pagination
httr	HTTP connection to the CryptoCompare REST API
jsonlite	Parsing of JSON responses from the API
zoo	Moving averages, rollmean, rollapply, na.approx
forecast	auto.arima(), Arima(), forecast(), ACF/PACF
tseries	Augmented Dickey-Fuller test (adf.test)
lubridate	Date manipulation (floor_date, as_date)
scales	Axis and label formatting (dollar, percent)
rsconnect	Publishing to shinyapps.io

13 Equations and Mathematical Framework

13.1 Formula Summary

Concept	Formula
Simple return	$r_t = \frac{P_t - P_{t-1}}{P_{t-1}} \times 100$
Log return	$r_t^{log} = \ln\left(\frac{P_t}{P_{t-1}}\right) \times 100$
Daily volatility	$v_t = \frac{High_t - Low_t}{Open_t} \times 100$
SMA	$SMA_n = \frac{1}{n}\sum_{i=0}^{n-1} P_{t-i}$
Bollinger Band	$BB_{\pm} = SMA_n \pm k \cdot \sigma_n$
Bandwidth	$\%Bw = \frac{BB_{+} - BB_{-}}{SMA_n} \times 100$
VaR 95%	$VaR_{95\%} = Q_{0.05}(r_t)$
Rolling volatility	$\sigma_{30} = \sqrt{\frac{1}{29}\sum_{i=0}^{29}(r_{t-i} - \bar{r})^2}$
STL decomposition	$Y(t) = T(t) + S(t) + R(t)$
ARIMA(p,d,q) model	$\phi(B)(1-B)^d y_t = \theta(B)\varepsilon_t$
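
Several of these formulas translate directly into a few lines of base R. The sketch below applies them to a short toy price vector, with a window of n = 3 so the output stays readable. The data, the window size and the use of the sample standard deviation for the band width are assumptions of this example and not the project's settings.

```r
close <- c(100, 102, 101, 105, 107, 104, 108, 110)   # toy closing prices

ret_simple <- diff(close) / head(close, -1) * 100    # simple return (%)
ret_log    <- diff(log(close)) * 100                 # log return (%)

n <- 3; k <- 2                                       # window and band multiplier
sma <- stats::filter(close, rep(1 / n, n), sides = 1)   # trailing SMA, NA for the first n - 1 points
sdn <- sapply(seq_along(close), function(t)
  if (t < n) NA else sd(close[(t - n + 1):t]))          # trailing sample std. dev.
bb_upper <- sma + k * sdn                            # upper Bollinger band
bb_lower <- sma - k * sdn                            # lower Bollinger band

var_95 <- quantile(ret_simple, 0.05)                 # historical VaR at 95%: 5% quantile of returns
```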

14 Conclusions and Next Steps

14.1 Main Findings

Price heterogeneity: prices across the 10 cryptocurrencies span several orders of magnitude (from about $0.10 for DOGE to more than $50,000 for BTC), so comparisons require a logarithmic scale.
Return distribution: all assets show heavy tails (fat tails) and slight skewness, which is typical of high-risk financial assets. The stablecoins (USDC, USDT, USD1) show near-zero returns with very low variance.
Correlations: BTC and ETH are strongly positively correlated (~0.7–0.8 in recent periods), while the stablecoins show near-zero correlation with the rest of the market.
Stationarity: closing prices are non-stationary (the ADF test fails to reject H₀), whereas daily returns are stationary for most assets, which supports d = 1 as the differencing order for modeling prices.
ARIMA model: the best (p, d, q) order was selected automatically by AIC/BIC for each asset. Residuals show low autocorrelation (Ljung-Box not significant; p ≈ 0.068 for BTC), but they are not normal, which suggests that GARCH models or variants with Student-t innovations could improve the fit.
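
One inexpensive way to check whether a GARCH-type model is worth pursuing is to apply the Ljung-Box test to the squared series: returns can be uncorrelated while their squares are strongly autocorrelated, which is the signature of volatility clustering. The sketch below illustrates this on a simulated ARCH(1) process. The simulation parameters are arbitrary and the series does not come from the project's data.

```r
set.seed(2024)
n     <- 2000
alpha <- 0.5                      # ARCH(1) coefficient of the simulated process
r     <- numeric(n)
r[1]  <- rnorm(1)
for (t in 2:n) {
  sigma2_t <- 0.2 + alpha * r[t - 1]^2      # conditional variance depends on the last shock
  r[t]     <- rnorm(1, sd = sqrt(sigma2_t))
}

lb_level   <- Box.test(r,   lag = 10, type = "Ljung-Box")   # autocorrelation in the returns
lb_squared <- Box.test(r^2, lag = 10, type = "Ljung-Box")   # autocorrelation in squared returns
c(level = lb_level$p.value, squared = lb_squared$p.value)
```

Applied to the ARIMA residuals, a small p-value for the squared series would point to conditional heteroskedasticity that ARIMA alone does not capture.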

14.2 Next Steps

GARCH models: capture the conditional heteroskedasticity (volatility clustering) present in cryptocurrency series.
Walk-forward validation: rolling forecasts with an expanding window for robust out-of-sample evaluation.
Multivariate models: VAR / VECM to exploit cross-asset correlations in forecasting.
Neural networks and other forecasters: compare ARIMA with LSTM and Prophet on financial time series.
Calibrated prediction intervals: improve the empirical coverage of the intervals through bootstrapping.
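
The walk-forward idea can be sketched in a few lines: refit the model on all data up to time t, forecast one step ahead, then move t forward. The example below does this with base R's arima() on a simulated random walk. The function name `walk_forward`, the fixed order and the data are assumptions made for illustration.

```r
# Hypothetical helper: one-step-ahead forecasts with an expanding training window
walk_forward <- function(y, n_train, order = c(1, 1, 0)) {
  n     <- length(y)
  preds <- numeric(n - n_train)
  for (i in seq_along(preds)) {
    end      <- n_train + i - 1                       # last observation available at this step
    fit      <- arima(y[1:end], order = order, method = "ML")
    preds[i] <- as.numeric(predict(fit, n.ahead = 1)$pred)
  }
  preds
}

set.seed(7)
y     <- 100 + cumsum(rnorm(120))     # simulated price-like series
preds <- walk_forward(y, n_train = 100)
rmse  <- sqrt(mean((preds - y[101:120])^2))
```

Unlike a single direct multi-step forecast, each prediction here uses all information available up to the previous day, so the resulting errors reflect how the model would behave when refitted daily.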

15 Project Information

15.1 Team

Member	Program	GitHub
Mateo Barrios	Data Science	Mateo3008
Rafael Romero	Data Science	rafaelromero06

15.2 How to Reproduce This Document

# 1. Install dependencies
install.packages(c(
  "bookdown", "tidyverse", "plotly", "DT", "lubridate",
  "scales", "jsonlite", "httr", "zoo", "forecast", "tseries",
  "knitr", "kableExtra", "gridExtra"
))

# 2. Render
bookdown::render_book("Crypto_EDA_Bookdown.Rmd", "bookdown::html_document2")

# Or with rmarkdown directly:
rmarkdown::render("Crypto_EDA_Bookdown.Rmd")

15.3 Sources

Data: CryptoCompare API
Framework: R Shiny + shinydashboard
Deployment: shinyapps.io
Repository: github.com/Mateo3008/Crypto_EDA_App

## Sesión R generada el 07/05/2026 22:21

## R version 4.5.1 (2025-06-13 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=Spanish_Colombia.utf8  LC_CTYPE=Spanish_Colombia.utf8   
## [3] LC_MONETARY=Spanish_Colombia.utf8 LC_NUMERIC=C                     
## [5] LC_TIME=Spanish_Colombia.utf8    
## 
## time zone: America/Bogota
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] kableExtra_1.4.0 knitr_1.50       tseries_0.10-60  forecast_9.0.2  
##  [5] zoo_1.8-15       httr_1.4.8       jsonlite_2.0.0   scales_1.4.0    
##  [9] DT_0.34.0        plotly_4.12.0    lubridate_1.9.5  forcats_1.0.1   
## [13] stringr_1.5.1    dplyr_1.1.4      purrr_1.2.1      readr_2.1.6     
## [17] tidyr_1.3.2      tibble_3.3.1     ggplot2_4.0.2    tidyverse_2.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       xfun_0.52          bslib_0.9.0        htmlwidgets_1.6.4 
##  [5] lattice_0.22-7     tzdb_0.5.0         crosstalk_1.2.2    quadprog_1.5-8    
##  [9] vctrs_0.6.5        tools_4.5.1        generics_0.1.4     curl_7.0.0        
## [13] parallel_4.5.1     xts_0.14.1         pkgconfig_2.0.3    Matrix_1.7-3      
## [17] data.table_1.18.0  RColorBrewer_1.1-3 S7_0.2.1           lifecycle_1.0.4   
## [21] compiler_4.5.1     farver_2.1.2       textshaping_1.0.4  htmltools_0.5.8.1 
## [25] sass_0.4.10        yaml_2.3.10        lazyeval_0.2.2     pillar_1.11.1     
## [29] jquerylib_0.1.4    cachem_1.1.0       nlme_3.1-168       fracdiff_1.5-3    
## [33] tidyselect_1.2.1   digest_0.6.37      stringi_1.8.7      bookdown_0.46     
## [37] splines_4.5.1      labeling_0.4.3     fastmap_1.2.0      grid_4.5.1        
## [41] colorspace_2.1-2   cli_3.6.5          magrittr_2.0.3     withr_3.0.2       
## [45] timechange_0.4.0   TTR_0.24.4         rmarkdown_2.29     quantmod_0.4.28   
## [49] gridExtra_2.3      timeDate_4052.112  hms_1.1.4          urca_1.3-4        
## [53] evaluate_1.0.5     viridisLite_0.4.2  mgcv_1.9-3         rlang_1.1.6       
## [57] Rcpp_1.1.0         glue_1.8.0         xml2_1.5.2         svglite_2.2.2     
## [61] rstudioapi_0.17.1  R6_2.6.1           systemfonts_1.3.1

Crypto EDA Dashboard

Exploratory Analysis of Cryptocurrencies with R / Shiny

Mateo Barrios, Data Science

Rafael Romero, Data Science

May 7, 2026