1. Introduction

This report analyzes the network structure of the cryptocurrency market to understand the relationships and influence among major assets. By modeling the market as a network, we can uncover its core structure, identify key players, and assess systemic properties like shock propagation and hierarchy.

2. Data preprocessing

We fetch daily OHLC files from CryptoDataDownload and keep the Close column.

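# Packages used throughout the report
library(tidyverse); library(glue); library(lubridate)
library(xts); library(scales); library(reshape2)
library(NetworkToolbox); library(igraph)
library(tidygraph); library(ggraph); library(pheatmap)

data_dir <- "data/crypto_cdd"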
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)

tickers <- c("BTCUSDT","ETHUSDT","BNBUSDT","XRPUSDT","ADAUSDT",
             "SOLUSDT","DOGEUSDT","DOTUSDT","TRXUSDT","MATICUSDT")

grab_cdd <- function(pair){
  local_file <- file.path(data_dir, paste0("Binance_", pair, "_d.csv"))
  
  if (!file.exists(local_file)){
    url <- glue("https://www.cryptodatadownload.com/cdd/Binance_{pair}_d.csv")
    download.file(url, destfile = local_file, quiet = TRUE)
  }
  
  read_csv(local_file, skip = 1, col_types = cols(), show_col_types = FALSE) %>% 
    select(date = Date, close = Close) %>% 
    mutate(date = as_date(date)) %>% 
    arrange(date) %>% 
    rename(!!pair := close)
}

price_list <- map(tickers, grab_cdd)
prices     <- reduce(price_list, inner_join, by = "date")  # overlap window


xs <- xts(prices[-1], order.by = prices$date) %>% 
      na.locf()

summary(xs)
##      Index               BTCUSDT         ETHUSDT          BNBUSDT      
##  Min.   :2020-08-18   Min.   :10127   Min.   : 320.7   Min.   : 19.47  
##  1st Qu.:2021-08-23   1st Qu.:23450   1st Qu.:1567.7   1st Qu.:241.70  
##  Median :2022-08-29   Median :35413   Median :1920.8   Median :307.55  
##  Mean   :2022-08-29   Mean   :37274   Mean   :2178.9   Mean   :327.59  
##  3rd Qu.:2023-09-04   3rd Qu.:49068   3rd Qu.:2974.7   3rd Qu.:424.40  
##  Max.   :2024-09-09   Max.   :73072   Max.   :4808.0   Max.   :711.20  
##     XRPUSDT          ADAUSDT           SOLUSDT           DOGEUSDT       
##  Min.   :0.2114   Min.   :0.07663   Min.   :  1.198   Min.   :0.002514  
##  1st Qu.:0.4057   1st Qu.:0.32777   1st Qu.: 20.275   1st Qu.:0.063705  
##  Median :0.5233   Median :0.45420   Median : 34.825   Median :0.083690  
##  Mean   :0.5892   Mean   :0.70547   Mean   : 65.598   Mean   :0.114671  
##  3rd Qu.:0.6650   3rd Qu.:1.04350   3rd Qu.:109.233   3rd Qu.:0.148050  
##  Max.   :1.8347   Max.   :2.96600   Max.   :258.440   Max.   :0.689820  
##     DOTUSDT          TRXUSDT          MATICUSDT      
##  Min.   : 2.833   Min.   :0.02299   Min.   :0.01222  
##  1st Qu.: 5.340   1st Qu.:0.06103   1st Qu.:0.54038  
##  Median : 7.027   Median :0.07114   Median :0.81565  
##  Mean   :12.943   Mean   :0.07869   Mean   :0.86071  
##  3rd Qu.:18.152   3rd Qu.:0.10225   3rd Qu.:1.12607  
##  Max.   :53.820   Max.   :0.16640   Max.   :2.87600

3. Log-returns and Correlation matrix

Log-returns

df_price <- fortify.zoo(xs) %>%
  pivot_longer(-Index, names_to = "coin", values_to = "price")

df_ret <- df_price %>%
  group_by(coin) %>%
  arrange(Index) %>%
  mutate(return = c(NA, diff(log(price)))) %>%
  ungroup()

roll_sd <- function(x) rollapplyr(x, 30, sd, fill = NA)

vol_df <- df_ret %>%
  group_by(coin) %>%
  mutate(roll_vol = roll_sd(return)) %>%
  ungroup()

corr_df <- df_ret %>%
  filter(coin %in% c("BTCUSDT", "ETHUSDT")) %>%
  select(Index, coin, return) %>%   # drop price so each date pivots to a single row
  pivot_wider(names_from = coin, values_from = return) %>%
  arrange(Index) %>%
  mutate(roll_corr = rollapplyr(
    cbind(BTCUSDT, ETHUSDT), 30,
    \(m) if (sum(complete.cases(m)) < 2) NA_real_ else
      cor(m[, 1], m[, 2], use = "complete.obs"),
    by.column = FALSE, fill = NA
  ))

pal <- viridis_pal(option = "turbo")(length(unique(df_ret$coin)))

# ── Plot 1: log-returns ---------------------------------------------------
p_returns <- ggplot(df_ret, aes(Index, return, colour = coin)) +
  geom_line(na.rm = TRUE, linewidth = 0.4) +
  scale_colour_manual(values = pal, name = "Coin") +
  labs(title = "Daily log-returns") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "right")

# Save Plot 1
ggsave(
  filename = "daily_log_returns.png",
  plot     = p_returns,
  width    = 8,     # in inches
  height   = 4,     # in inches
  dpi      = 300
)

# ── Plot 2: rolling volatility -------------------------------------------
p_vol <- ggplot(vol_df, aes(Index, roll_vol, colour = coin)) +
  geom_line(na.rm = TRUE, linewidth = 0.4) +
  scale_colour_manual(values = pal, name = "Coin") +
  labs(title = "30-day rolling volatility", y = "σ") +
  theme_minimal(base_size = 11)

# Save Plot 2
ggsave(
  filename = "rolling_volatility.png",
  plot     = p_vol,
  width    = 8,
  height   = 4,
  dpi      = 300
)

# Display plots in R session if desired
print(p_returns)

print(p_vol)

The log-return panel shows two clear volatility regimes: an extremely noisy spell in late 2020 – mid 2021, followed by much calmer behaviour after mid-2022. The tall green and cyan spikes belong to DOGE and SOL, confirming they experienced the sharpest single-day moves. The rolling-volatility plot quantifies that impression: 30-day σ for all coins peaked above 30 % in spring 2021, then trended below 10 % for most of 2023–24, with only brief flare-ups.

Among the ten coins, the majors BTC and ETH now sit in the middle of the volatility pack, while smaller caps (DOT, ADA, TRX) have converged toward similar, low risk levels. The multi-colour overlay also shows that the relative ordering of volatilities changes over time, underlining the need for rolling rather than static risk estimates.
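
The case for rolling estimates can be illustrated on synthetic data. The sketch below uses base R only; `roll_sd_base` is a hypothetical helper written for this illustration (the report itself relies on `rollapplyr`), and the calm and wild regimes are simulated rather than taken from the crypto series.

```r
# Simulated returns: 100 calm days followed by 100 turbulent days (made-up regimes)
set.seed(1)
r <- c(rnorm(100, sd = 0.01), rnorm(100, sd = 0.05))

# Hypothetical helper: right-aligned rolling standard deviation
roll_sd_base <- function(x, width) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= width) out[i] <- sd(x[(i - width + 1):i])
  }
  out
}

static_sd <- sd(r)                 # one number for the whole sample
rolling   <- roll_sd_base(r, 30)   # one number per day, NA for the first 29 days

cat(sprintf("static sd      : %.4f\n", static_sd))
cat(sprintf("rolling, day 100: %.4f\n", rolling[100]))
cat(sprintf("rolling, day 200: %.4f\n", rolling[200]))
```

The static figure blends both regimes into a single value, whereas the rolling series tracks the switch between them.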

Correlation matrix

Instead of looking at raw prices we take the natural-log difference between two consecutive days. The result approximates the percentage change but is mathematically nicer: returns add over time and are symmetric for up- and down-moves. Using log-returns also makes most financial series closer to “stationary”—their statistical properties do not drift as badly as prices do.
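
A tiny worked example makes the additivity and symmetry claims concrete. The three-day price path below is hypothetical and chosen only so that the price ends where it started.

```r
# Hypothetical price path: up 10 %, then back to the starting level
price <- c(100, 110, 100)

simple_ret <- diff(price) / head(price, -1)   # (P_t - P_{t-1}) / P_{t-1}
log_ret    <- diff(log(price))                # log(P_t) - log(P_{t-1})

# Additivity: log-returns sum to the log of the overall price ratio, here log(1) = 0
total_log <- sum(log_ret)

# Symmetry: the up-move and the down-move cancel in log terms,
# whereas the simple returns (+10 % and about -9.09 %) do not sum to zero
print(simple_ret)
print(log_ret)
print(total_log)
```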

returns <- diff(log(xs))[-1, ]                        
R  <- cor(returns)                      
knitr::kable(round(R[1:6, 1:6], 2), caption = "Upper-left of the ρ matrix")
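
Table: Upper-left of the ρ matrix

|         | BTCUSDT | ETHUSDT | BNBUSDT | XRPUSDT | ADAUSDT | SOLUSDT |
|---------|---------|---------|---------|---------|---------|---------|
| BTCUSDT | 1.00    | 0.81    | 0.65    | 0.56    | 0.67    | 0.55    |
| ETHUSDT | 0.81    | 1.00    | 0.68    | 0.59    | 0.71    | 0.63    |
| BNBUSDT | 0.65    | 0.68    | 1.00    | 0.52    | 0.60    | 0.56    |
| XRPUSDT | 0.56    | 0.59    | 0.52    | 1.00    | 0.59    | 0.48    |
| ADAUSDT | 0.67    | 0.71    | 0.60    | 0.59    | 1.00    | 0.57    |
| SOLUSDT | 0.55    | 0.63    | 0.56    | 0.48    | 0.57    | 1.00    |
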
R_sub <- R[1:10, 1:10]

## correlation matrix 1

ggplot(melt(R_sub), aes(Var1, Var2, fill = value)) +
  geom_tile(colour = "grey90", linewidth = .3) +
  scale_fill_gradient2(low  = "#08306B",  
                       mid  = "white",
                       high = "#67000D",  
                       midpoint = 0.5,     
                       limits   = c(0, 1),
                       name = "ρ") +
  coord_fixed() +
  theme_minimal(base_size = 12) +
  theme(axis.text.x  = element_text(angle = 45, hjust = 1),
        panel.grid   = element_blank())+
  labs(title = "Correlation heat-map (log-return series)",x = "", y = "")

## correlation matrix 2

ggplot(melt(R_sub), aes(Var1, Var2, fill = value)) +
  geom_tile(colour = "grey90") +
  geom_text(aes(label = sprintf("%.2f", value)), size = 3) +
  scale_fill_gradient(low = "#fee5d9", high = "#cb181d", limits = c(0, 1)) +
  coord_fixed() +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.grid  = element_blank())+
  labs(title = "Correlation heat-map (log-return series)",x = "", y = "")

4. From noise to signal

Random-matrix check (Marchenko–Pastur)

When we build a big correlation matrix from noisy data, some apparent “patterns” are just random fluctuations. Random-matrix theory gives a benchmark for what the eigenvalues (the λ’s) should look like if nothing but noise were present. The famous Marchenko–Pastur (MP) law predicts a compact bulk: every eigenvalue should fall between λmin and λmax (the red line) when the series are uncorrelated white noise of the same length. Anything that pokes above λmax is informative—it carries common structure shared by many assets. Anything inside the bulk is indistinguishable from randomness and can be safely ignored in factor models.
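
To see what the benchmark looks like in practice, one can simulate pure noise and compare its eigenvalues with the MP bounds \(\lambda_{\pm} = (1 \pm \sqrt{N/T})^2\). This is a sketch under stated assumptions: the sample length of 1500 is an illustrative choice, not the length of the report's data, and the MP law is asymptotic, so in finite samples the extreme eigenvalues need not sit exactly inside the bounds.

```r
set.seed(1)
n_assets <- 10      # N, matching the ten coins in the report
n_obs    <- 1500    # T, an assumed sample length for this illustration

# Pure white noise: no common factor by construction
noise    <- matrix(rnorm(n_obs * n_assets), nrow = n_obs, ncol = n_assets)
ev_noise <- eigen(cor(noise), symmetric = TRUE, only.values = TRUE)$values

q      <- n_assets / n_obs
mp_min <- (1 - sqrt(q))^2
mp_max <- (1 + sqrt(q))^2

cat(sprintf("MP bulk          : [%.3f, %.3f]\n", mp_min, mp_max))
cat(sprintf("Noise eigenvalues: [%.3f, %.3f]\n", min(ev_noise), max(ev_noise)))
```

Because the eigenvalues of a correlation matrix always sum to N, one large eigenvalue in real data necessarily pushes the remaining ones down.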

N <- ncol(returns);  Tn <- nrow(returns)
lambda_max <- (1 + sqrt(N/Tn))^2
ev <- eigen(R, only.values = TRUE)$values

hist(ev, breaks = 30, xlab = "λ", main = "Eigenvalue density")
abline(v = lambda_max, col = "red", lwd = 2)
legend("topright", legend = "MP upper bound", col = "red", lty = 1, bty = "n")

The histogram shows one large outlier near λ ≈ 6, far above λmax, while every other eigenvalue sits well below the MP bound. That single spike is the “market mode”: a factor that moves all ten cryptocurrencies together (think of it as overall risk-on / risk-off sentiment). The absence of intermediate outliers means there is little evidence for a second strong factor (e.g. a DeFi-specific or meme-coin factor) in daily returns. In practice you could model the system with just one common component plus idiosyncratic noise.

Spectrally filtered correlation (non-random modes)

In this section we denoise the raw correlation matrix with a spectral filter based on random-matrix theory. We first compute all eigen-values of \(R\) and compare them with the Marchenko–Pastur upper bound \(\lambda_{\max}\). Only the eigen-values that exceed \(\lambda_{\max}\) by at least 3 % are kept; their corresponding eigen-vectors are then recombined to form a spectrally filtered matrix \(C_{\text{spec}}\). The goal is to retain the common factors that carry genuine market information while discarding the bulk of sampling noise.
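
In symbols, and assuming the eigen-decomposition \(R = \sum_k \lambda_k v_k v_k^{\top}\) with orthonormal eigen-vectors \(v_k\) (notation introduced here for illustration), the filter described above can be written as

\[
C_{\text{spec}} \;=\; \sum_{k:\ \lambda_k > (1+\varepsilon)\,\lambda_{\max}} \lambda_k\, v_k v_k^{\top},
\qquad
\lambda_{\max} = \left(1 + \sqrt{N/T}\right)^{2},
\qquad
\varepsilon = 0.03,
\]

where \(N\) is the number of coins and \(T\) the number of return observations. Note that \(C_{\text{spec}}\) generally does not have a unit diagonal, because the discarded modes also contribute to the diagonal of \(R\).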

N  <- ncol(returns);  Tn <- nrow(returns)   # T is the number of observations, not nrow(R)
ev <- eigen(R)
lambda_max <- (1 + sqrt(N / Tn))^2
epsilon    <- 1.03     

inform_idx <- which(ev$values > lambda_max*epsilon)

if (length(inform_idx) == 0) {
  message("No non-random modes beyond the MP bound ⇒ fall back to raw R.")
  cors_spec <- R                        # use the full correlation matrix
} else {
  cors_spec <- matrix(0, N, N)
  for (i in inform_idx)
    cors_spec <- cors_spec + ev$values[i] * tcrossprod(ev$vectors[, i])
}

dimnames(cors_spec) <- list(colnames(R), colnames(R))

pheatmap(cors_spec,
         cluster_rows = FALSE, cluster_cols = FALSE,
         main = "Spectrally filtered correlation (non-random modes)")

With the 3 % buffer in place, three eigen-modes survive: the dominant “market mode” plus two smaller factors. The filtered heat-map is therefore still mostly warm along the BTC–ETH–BNB block, confirming that these coins move in lock-step even after noise suppression. Secondary structure is now clearer: the PoS trio ADA–SOL–DOT shares elevated correlations, whereas DOGE and XRP sit on noticeably cooler rows, signalling weaker linkage to the blue-chip core. Off-diagonal cells are more pastel than in the raw matrix, showing that nearly half the original covariance energy was random fluctuation. Practically, this means a three-factor model can explain the bulk of daily co-movements; everything else is idiosyncratic noise that should not drive portfolio allocation.

5. Network backbones

MST & threshold projection

A minimum-spanning tree links all nodes with the shortest total distance (here distance d = √(2·(1 – ρ)), so ρ = 1 maps to d = 0 and ρ = –1 to d = 2). It keeps the graph connected while using only n – 1 edges, so it highlights the backbone of strongest similarities without clutter. In finance, the MST is useful to spot hubs and clusters quickly; edges with small distances represent tight price co-movement. A threshold projection is an even simpler filter: you keep only those edges whose correlation exceeds a chosen cut-off.
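
The construction can be followed by hand on a toy example. The sketch below is base R only and uses made-up correlations between four hypothetical assets A to D; `prim_mst` is an illustrative helper implementing Prim's algorithm, not the routine used in the report.

```r
# Made-up correlations between four hypothetical assets A-D
rho <- matrix(c(1.0, 0.8, 0.3, 0.2,
                0.8, 1.0, 0.5, 0.1,
                0.3, 0.5, 1.0, 0.6,
                0.2, 0.1, 0.6, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(LETTERS[1:4], LETTERS[1:4]))

d <- sqrt(2 * (1 - rho))   # rho = 1 gives 0, rho = 0 gives sqrt(2), rho = -1 gives 2

# Prim's algorithm: grow the tree from the first node, always adding the
# shortest edge that joins a node inside the tree to a node outside it
prim_mst <- function(d) {
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  from <- character(n - 1); to <- character(n - 1); dist <- numeric(n - 1)
  for (step in seq_len(n - 1)) {
    cand <- d[in_tree, !in_tree, drop = FALSE]
    k <- which(cand == min(cand), arr.ind = TRUE)[1, ]
    from[step] <- rownames(cand)[k[1]]
    to[step]   <- colnames(cand)[k[2]]
    dist[step] <- cand[k[1], k[2]]
    in_tree[rownames(d) == to[step]] <- TRUE
  }
  data.frame(from, to, dist)
}

mst_edges <- prim_mst(d)
print(mst_edges)
```

The tree keeps the three strongest links (A–B, B–C, C–D) and drops the weaker ones, which is the "backbone" reading used in the text.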

set.seed(123)

D <- sqrt(2 * (1 - R))                 
g_full <- graph_from_adjacency_matrix(
            D, mode = "undirected", 
            weighted = TRUE, diag = FALSE)

g_mst <- mst(g_full, weights = E(g_full)$weight)   # igraph uses Prim’s algorithm for weighted graphs

layout <- igraph::layout_with_fr(g_mst)         
plot(g_mst,
     layout = layout,
     vertex.size = 10,
     vertex.label.cex = 0.9,
     vertex.label.color = "black",
     vertex.color = "#fee08b",
     edge.width = 2,
     edge.color = "grey70",
     vertex.label.dist = 3, 
     main = "Minimum-spanning tree (√(2·(1 – ρ)) distance)")

The tree is star-like with ETHUSDT at the centre: Ethereum is the shortest bridge between most other coins, confirming its role as a price hub. BTCUSDT is just one branch away, while coins such as DOGE and XRP sit on outer limbs, meaning their moves are transmitted through ETH rather than directly. The layout suggests that shocks propagating through Ethereum will quickly reach all other assets.

After we have a full n × n correlation matrix we can keep only the strongest links by applying a cut-off: if ρij is larger than a chosen threshold th we draw an edge, otherwise we drop it. This “threshold projection” is the simplest way to turn a dense matrix into a readable graph: weak, noisy relationships vanish, and what remains are pairs of assets whose daily log-returns have a correlation above 0.55. Unlike the MST (which always gives n − 1 edges) the number of links here depends on th: a high threshold yields a sparse graph, a low one yields something almost complete. Analysts use such projections to reveal clusters of very similar instruments and to spot assets that act as tightly bonded hubs.
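
How strongly the edge count depends on th can be checked on the 6 × 6 excerpt of the ρ matrix printed in Section 3. The values below are copied from that table (rounded to two decimals); `count_edges` is a hypothetical helper, and the two higher thresholds are illustrative choices.

```r
coins <- c("BTC", "ETH", "BNB", "XRP", "ADA", "SOL")

# Upper-left 6 x 6 block of the correlation matrix, as printed in Section 3
R6 <- matrix(c(1.00, 0.81, 0.65, 0.56, 0.67, 0.55,
               0.81, 1.00, 0.68, 0.59, 0.71, 0.63,
               0.65, 0.68, 1.00, 0.52, 0.60, 0.56,
               0.56, 0.59, 0.52, 1.00, 0.59, 0.48,
               0.67, 0.71, 0.60, 0.59, 1.00, 0.57,
               0.55, 0.63, 0.56, 0.48, 0.57, 1.00),
             nrow = 6, byrow = TRUE, dimnames = list(coins, coins))

# Hypothetical helper: number of pairs whose correlation strictly exceeds th
count_edges <- function(R, th) sum(R > th & upper.tri(R))

thresholds <- c(0.55, 0.65, 0.80)
n_edges    <- sapply(thresholds, count_edges, R = R6)
print(data.frame(th = thresholds, edges = n_edges))
```

Of the 15 possible pairs, 12 survive at th = 0.55, 4 at 0.65, and only BTC–ETH at 0.80.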

th <- 0.55                              
Elist <- which(R > th & upper.tri(R), arr.ind = TRUE)
g_thr <- graph_from_data_frame(
  data.frame(from = tickers[Elist[,1]],
             to   = tickers[Elist[,2]],
             w    = R[Elist]),
  directed = FALSE)

set.seed(123) 
                        
layout <- layout_with_fr(g_thr)

plot(
  g_thr,
  layout            = layout,
  vertex.size       = 24,
  vertex.color      = "#fee08b",
  vertex.label.cex  = 0.9,          
  vertex.label.color= "black",
  edge.width        = E(g_thr)$w * 8,
  edge.color        = adjustcolor("grey70", alpha.f = 0.7),
  asp               = 0,                 
  margin            = 0.1,          
  main = bquote("ρ" > .(th) * " projection")
)

Alternative backbone filters (filter-mast, filter-eco, filter-tmfg)

While MSTs and simple thresholding are common, more advanced methods can provide nuanced views. Here we test three: MaST (Maximum Spanning Tree), which builds a backbone from the strongest correlations; ECO (Efficiency-Cost Optimization), which finds an optimal trade-off between network cost and information efficiency; and TMFG (Triangulated Maximally Filtered Graph), which creates a planar graph that retains important triangular structures (cliques).

pow_pos <- c(BTCUSDT = "PoW", ETHUSDT = "PoS", BNBUSDT = "PoS",
             XRPUSDT = "Other", ADAUSDT = "PoS", SOLUSDT = "PoS",
             DOGEUSDT = "PoW", DOTUSDT = "PoS", TRXUSDT = "PoS",
             MATICUSDT = "PoS")

mat_abs <- abs(cors_spec)
mat_abs[is.na(mat_abs)] <- 0               

A_mast <- MaST(mat_abs)
A_eco  <- ECO(mat_abs)
A_tmfg <- TMFG(mat_abs)$A

g_mast <- graph_from_adjacency_matrix(A_mast, mode = "undirected", weighted = TRUE)
g_eco  <- graph_from_adjacency_matrix(A_eco , mode = "undirected", weighted = TRUE)
g_tmfg <- graph_from_adjacency_matrix(A_tmfg, mode = "undirected", weighted = TRUE)

plot_backbone <- function(g, title){
  powpos_col <- ifelse(pow_pos[V(g)$name] == "PoW", "#1f78b4",
                ifelse(pow_pos[V(g)$name] == "PoS", "#33a02c", "#ff7f00"))
  lay <- igraph::layout_with_fr(g, niter = 1500, grid = "nogrid")
  plot(g, layout = lay,
       vertex.color = powpos_col,
       vertex.size  = 22,
       vertex.label.cex = 0.9,
       edge.width   = E(g)$weight * 6,
       edge.color   = "grey70",
       vertex.label.family = "sans",
       main = title)
  legend("topright", legend = c("PoW","PoS","Other"),
         pt.bg   = c("#1f78b4","#33a02c","#ff7f00"),
         pch     = 21, pt.cex = 1.5, bty = "n")
}


plot_backbone(g_mast, "MaST backbone")

plot_backbone(g_eco , "ECO backbone")

plot_backbone(g_tmfg, "TMFG backbone")

Reading the three graphs

Community quality table

cl_louv_mast <- cluster_louvain(g_mast)
cl_louv_eco  <- cluster_louvain(g_eco)
cl_louv_tmfg <- cluster_louvain(g_tmfg)

mods <- tibble(
  Graph = c("MaST","ECO","TMFG"),
  Q     = c(modularity(g_mast, membership(cl_louv_mast)),
            modularity(g_eco , membership(cl_louv_eco )),
            modularity(g_tmfg, membership(cl_louv_tmfg)))
)

nmi_tbl <- tibble(
  Graph = c("MaST","ECO","TMFG"),
  NMI   = c(compare(membership(cl_louv_mast), unname(pow_pos), method="nmi"),
            compare(membership(cl_louv_eco ), unname(pow_pos), method="nmi"),
            compare(membership(cl_louv_tmfg), unname(pow_pos), method="nmi"))
)

mods
## # A tibble: 3 × 2
##   Graph     Q
##   <chr> <dbl>
## 1 MaST  0.388
## 2 ECO   0.409
## 3 TMFG  0.275
nmi_tbl
## # A tibble: 3 × 2
##   Graph   NMI
##   <chr> <dbl>
## 1 MaST  0.440
## 2 ECO   0.493
## 3 TMFG  0.206

| Metric | What it means here |
|---|---|
| Modularity Q: ECO 0.409 > MaST 0.388 > TMFG 0.275 | ECO’s sparse graph carves out the clearest internal / external edge contrast; TMFG spreads edges so clusters overlap more, hence lower Q. |
| NMI vs PoW/PoS: ECO 0.493 > MaST 0.440 > TMFG 0.206 | ECO and MaST both encode a moderate amount of consensus-type information, with ECO slightly ahead; TMFG’s partition aligns with the labels much less. |

In words: the ECO filter produces the most modular partition, and that partition is only modestly better aligned with the PoW/PoS labels than MaST’s. TMFG trades modularity for richer local geometry, which is useful if you plan to run contagion simulations where triangles matter.

6. Community detection & centralities

6.1 Louvain result

  • Community detection tries to split the graph into groups of nodes that are more densely linked to each other than to the outside world.

  • The Louvain algorithm is a fast, greedy routine that maximises modularity—a score that compares the actual number of in-group links to what you would expect by chance.

  • After finding the groups we compute three centrality measures:

    • Degree = how many direct neighbours a coin has.
    • Betweenness = how often a coin sits on the shortest path between two others (good proxy for “traffic through me”).
    • Eigenvector = a recursive score that is high when you are linked to other important nodes (Google’s PageRank is a cousin of this).
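
Two of these measures are easy to compute by hand on a toy graph. The sketch below is base R only and uses a hypothetical five-node star (one hub, four leaves); it covers degree and eigenvector centrality and leaves betweenness to the graph library.

```r
# Hypothetical star graph: node "hub" linked to four leaves
nodes <- c("hub", "leaf1", "leaf2", "leaf3", "leaf4")
A <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
A["hub", -1] <- 1
A[-1, "hub"] <- 1

deg <- rowSums(A)   # degree = number of direct neighbours

# Eigenvector centrality = leading eigenvector of A, rescaled so the largest score is 1
lead <- eigen(A, symmetric = TRUE)$vectors[, 1]
eig  <- abs(lead) / max(abs(lead))
names(eig) <- nodes

print(deg)
print(round(eig, 3))
```

The hub has degree 4 and score 1, while each leaf has degree 1 and score 0.5: a leaf inherits importance only through its single link to the hub.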
tg <- as_tbl_graph(g_mst) |>
      mutate(comm = as.factor(cluster_louvain(g_mst)$membership),
             deg  = centrality_degree(),
             btw  = centrality_betweenness(),
             eig  = centrality_eigen())

set.seed(123)

ggraph(tg, layout = "fr") +               
  geom_edge_link(colour = "grey75", width = 0.8) +
  geom_node_point(aes(colour = comm, size = eig)) +
  geom_node_text(aes(label = name), repel = TRUE, size = 3) +
  scale_size_continuous(range = c(4, 10), name = "Eigenvector") +
  scale_colour_brewer(palette = "Set1", name = "Community") +
  theme_void(base_size = 12) +
  labs(title = "Communities (Louvain) & eigenvector centrality")


| Observation | Meaning |
|---|---|
| Three communities emerge: green core (ETH + DOT + BNB + TRX + SOL + MATIC), a tiny red pair (BTC + DOGE), and a blue pair (ADA + XRP). | The market backbone is one large cluster centred on Ethereum, while Bitcoin, with its meme-coin satellite DOGE, behaves just differently enough to sit in its own pocket. |
| ETHUSDT tops every centrality (Degree = 7 of 9 possible; Betweenness = 34; Eigenvector = 1.0). | Ethereum is the main price hub and the key bridge: remove it and average path lengths jump. |
| BTCUSDT ties with ADA for Betweenness (8) but sits in a separate community. | Bitcoin channels some flow but its strongest links (to DOT, TRX, etc.) are weaker than ETH’s, so modularity prefers to isolate it. |
| DOT, MATIC, BNB round out the top-6 eigenvector list. | They are the best connected secondary players inside the ETH-dominated cluster. |
| XRP’s eigenvector is low (0.16) and it sits on a thin spoke of the graph. | Echoes the earlier finding that XRP marches to its own drummer. |

Take-away: if you need one coin to proxy “the market”, Ethereum is safest; for diversification, look at assets outside the green cluster (BTC, DOGE, ADA, XRP).

6.2 Algorithm face-off

comm_louv    <- cluster_louvain (g_thr)
comm_leiden  <- cluster_leiden  (g_thr, objective_function = "modularity")
comm_infomap <- cluster_infomap (g_thr)
comm_walkraw <- cluster_walktrap(g_thr)        
comm_walk    <- cut_at(comm_walkraw, no = length(unique(membership(comm_louv))))

mdl <- tibble(
  Method     = c("Louvain", "Leiden", "Infomap", "Walktrap"),
  Modularity = c(
    modularity(g_thr, membership(comm_louv)),
    modularity(g_thr, membership(comm_leiden)),
    modularity(g_thr, membership(comm_infomap)),
    modularity(g_thr, comm_walk)                
  )
)
print(mdl)
## # A tibble: 4 × 2
##   Method   Modularity
##   <chr>         <dbl>
## 1 Louvain      0.0422
## 2 Leiden       0.0422
## 3 Infomap      0     
## 4 Walktrap     0.0351

What the numbers say

  1. All four algorithms see very little modular structure — the best modularity-score is just ≈ 0.04. Values below 0.1 usually mean the network does not split naturally into well-separated clusters.

  2. Louvain and Leiden tie at 0.042 : they find the exact same partition (Leiden refines Louvain, so when Louvain is already “optimal” it cannot improve).

  3. Walktrap is only a hair lower (0.035) → its random-walk logic agrees with the greedy methods: any community pattern is faint.

  4. Infomap reports 0 because it decided the whole graph is a single community; that is perfectly consistent with the other scores being so small.
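
Why a single-community partition scores exactly 0 follows directly from the definition of modularity, \(Q = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j)\). The sketch below evaluates it in base R on a hypothetical six-node graph (two triangles joined by one edge); `modularity_q` is an illustrative helper, not a library function.

```r
# Hypothetical graph: triangles 1-2-3 and 4-5-6, joined by the edge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
A <- matrix(0, 6, 6)
A[edges]         <- 1   # fill one direction
A[edges[, 2:1]]  <- 1   # and the other, so A is symmetric

# Illustrative helper: Newman modularity of an undirected, unweighted graph
modularity_q <- function(A, membership) {
  k     <- rowSums(A)                            # node degrees
  two_m <- sum(A)                                # twice the number of edges
  same  <- outer(membership, membership, "==")   # TRUE where i and j share a community
  sum((A - outer(k, k) / two_m) * same) / two_m
}

q_two <- modularity_q(A, c(1, 1, 1, 2, 2, 2))   # split into the two triangles
q_one <- modularity_q(A, rep(1, 6))             # everything in one community

cat(sprintf("Q (two triangles)   = %.4f\n", q_two))
cat(sprintf("Q (single community) = %.4f\n", q_one))
```

The two-triangle split gives Q = 5/14 ≈ 0.357, whereas the single-community partition gives 0 for any graph, because observed and expected edge weights then cancel exactly.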

Insights

  • The ten-coin threshold graph is highly homogeneous; correlations are spread rather evenly, so algorithms cannot carve out dense sub-blocks with sparse borders.
  • Practical takeaway: treating the market as one connected component (plus perhaps spotlighting hubs like ETH) is more defensible than forcing a multi-community narrative.
vi <- tibble(
  Method = c("Leiden", "Infomap", "Walktrap"),
  VI     = c(
    compare(membership(comm_louv), membership(comm_leiden),  method = "vi"),
    compare(membership(comm_louv), membership(comm_infomap), method = "vi"),
    compare(membership(comm_louv), comm_walk,                method = "vi")
  )
)
print(vi)
## # A tibble: 3 × 2
##   Method      VI
##   <chr>    <dbl>
## 1 Leiden   0    
## 2 Infomap  0.687
## 3 Walktrap 0.798

What the numbers mean

  1. VI = 0 means the two partitions are identical; the higher the number, the less they overlap (0 = identity, 1 ≈ very different for a 3-group split of 10 nodes, > 0.7 counts as a big divergence in such a small graph).

  2. Louvain and Leiden fully agree, confirming the modularity-greedy view: there is at most a faint two-or-three-group structure.

  3. Infomap and Walktrap disagree strongly with Louvain/Leiden because they optimise different principles (information flow and random-walk trapping). Their high VI is not “bad” — it simply shows that with so little modular signal, changing the objective function reshuffles nodes easily.
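
For readers who want to see how VI behaves, the sketch below computes it in base R as \(VI = 2H(X,Y) - H(X) - H(Y)\) with natural logarithms. `vi_dist` is a hypothetical helper and the two partitions are made up; with natural logs VI is bounded above by log n, i.e. about 2.30 for ten nodes.

```r
# Hypothetical helper: variation of information between two membership vectors
vi_dist <- function(a, b) {
  p_ab <- table(a, b) / length(a)   # joint distribution of the two labels
  p_a  <- rowSums(p_ab)
  p_b  <- colSums(p_ab)
  entropy <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
  2 * entropy(p_ab) - entropy(p_a) - entropy(p_b)
}

two_groups <- c(1, 1, 1, 2, 2, 2)   # made-up partition into two equal groups
one_group  <- rep(1, 6)             # everything in a single community

vi_same  <- vi_dist(two_groups, two_groups)
vi_split <- vi_dist(two_groups, one_group)

cat(sprintf("VI(identical partitions) = %.3f\n", vi_same))
cat(sprintf("VI(two groups vs one)    = %.3f\n", vi_split))
```

Comparing any partition with a single-community partition returns just the entropy of the former, here log 2 ≈ 0.693.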

7. Small-world & scale-free diagnostics

library(poweRlaw)   # for displ

C_obs <- igraph::transitivity(g_thr, type = "global")
L_obs <- igraph::mean_distance(g_thr, directed = FALSE, weights = NA)

set.seed(123)
ws_stats <- replicate(100, {
  g_ws <- igraph::sample_smallworld(1,
            vcount(g_thr),
            nei = round(mean(igraph::degree(g_thr))/2),
            p   = 1)
  c(C = igraph::transitivity(g_ws, type = "global"),
    L = igraph::mean_distance(g_ws, directed = FALSE, weights = NA))
})

SW_index <- (C_obs / mean(ws_stats["C", ])) /
            (L_obs / mean(ws_stats["L", ]))
cat(sprintf("Small-world index ≈ %.2f  ( >1 ⇒ small-world behaviour)\n", SW_index))
## Small-world index ≈ 1.19  ( >1 ⇒ small-world behaviour)
deg_vec <- igraph::degree(g_thr)
deg_vec <- deg_vec[deg_vec > 0]
m_pl    <- displ$new(deg_vec)
m_pl$setXmin(estimate_xmin(m_pl))
gamma   <- estimate_pars(m_pl)$pars
cat(sprintf("Power-law tail γ ≈ %.2f\n", gamma))
## Power-law tail γ ≈ 5.50

| Metric | Value | What it means (plain words) |
|---|---|---|
| Small-world index | 1.19 (> 1) | Our threshold graph has higher clustering and almost the same short average path as a random graph of the same size. In everyday terms, coins form tight triangles yet any coin is still only two-or-so steps from any other: classic small-world behaviour. |
| Power-law tail | γ ≈ 5.5 | In contrast to scale-free networks (like social networks, with γ between 2 and 3), this means extreme hubs are very rare. While ETH is a central node, its dominance is not absolute; the network is more egalitarian and less vulnerable to the failure of a single super-hub. |

Insights

  1. Small-world structure implies that a market shock can spread quickly despite being localised—few hops are needed.
  2. High γ tells us the “too-big-to-fail” problem is moderate: ETH is important, but the gap to second-tier coins isn’t as extreme as in classic hub-and-spoke systems.
  3. Together, the stats say the crypto market is connected enough to transmit risk swiftly, yet diverse enough that no single node (not even ETH) fully dominates the flow.

8. Direction of influence

In the previous sections every network was undirected: a link simply meant two coins moved together. Here we flip the question and ask who drives whom? For each ordered pair of log-return series we:

  1. compute a two-lag Granger-causality (GC) coefficient – if the past of coin i helps predict coin j beyond j’s own history the coefficient is > 0;
  2. estimate non-linear information flow via Transfer Entropy (TE) with 50 shuffle surrogates;
  3. visualise the full matrices as heat-maps and keep only GC edges stronger than 0.02 to build a directed backbone.

Both metrics use daily log-returns computed from the NA-cleaned, synchronised prices of Section 2, so any influence we pick up is at the one-day horizon.
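
The idea behind step 1 can be sketched with the textbook regression form of the Granger test: compare a model of j on its own lags with one that adds the lags of i, and test the added terms with an F-test. This is an assumption-laden illustration, not the report's exact GC coefficient: `granger_p` is a hypothetical helper, it returns a p-value rather than a coefficient, and the two series are simulated so that x drives y by construction.

```r
set.seed(42)
n <- 500
x <- rnorm(n)                       # simulated "driver" series
y <- numeric(n)
for (t in 2:n) y[t] <- 0.5 * x[t - 1] + rnorm(1)   # y depends on yesterday's x

# Hypothetical helper: p-value of the F-test "lags of `cause` add nothing"
granger_p <- function(cause, effect, lags = 2) {
  # embed() puts the current value in column 1 and lag k in column k + 1
  E <- embed(effect, lags + 1)
  C <- embed(cause,  lags + 1)
  y_now   <- E[, 1]
  own_lag <- E[, -1, drop = FALSE]
  oth_lag <- C[, -1, drop = FALSE]
  restricted   <- lm(y_now ~ own_lag)             # own history only
  unrestricted <- lm(y_now ~ own_lag + oth_lag)   # plus the other series' history
  anova(restricted, unrestricted)$`Pr(>F)`[2]
}

p_xy <- granger_p(x, y)   # does x help predict y?
p_yx <- granger_p(y, x)   # does y help predict x?

cat(sprintf("x -> y p-value: %.3g\n", p_xy))
cat(sprintf("y -> x p-value: %.3g\n", p_yx))
```

Running the helper over every ordered pair of coins would fill an asymmetric matrix, which is the kind of object the heat-map in this section displays.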

The GC heat-map is sparse and asymmetric: dark tiles concentrate in the BTCUSDT row, confirming that Bitcoin’s history contains predictive power for several other coins, whereas the reverse columns are pale. A second, localised hotspot appears for DOGE → MATIC / XRP, hinting at episodic meme-coin spill-overs. When we threshold GC at 0.02 the resulting graph is so thin that no node ends up with more than one outgoing arrow: information flows exist but are weak and fragmented. Taken together, the directional analysis says shocks chiefly radiate from Bitcoin, with only modest secondary channels; for portfolio stress tests a single market factor plus idiosyncratic noise remains an adequate first approximation.

9. Null-models & statistical backbones

Soft-Configuration Model – are our hubs significant?

In this sub-section we test whether the hubness we observed in the empirical threshold graph could be explained solely by its degree sequence. We feed the binary graph \(g_{\text{thr}}\) into the soft-configuration model (SCM), which generates an ensemble of random graphs that preserve each node’s expected degree but randomise everything else. One realisation of that ensemble is plotted next to the empirical graph, and we overlay the degree distributions of the two networks. If BTC or ETH are only big because “someone has to be”, the SCM curve should match the empirical curve at high degrees; if not, the observed hubs are statistically exceptional.

library(ghypernet); library(igraph)

g_bin <- g_thr                                      

for(attr in c("w", "weight")){
  if (attr %in% edge_attr_names(g_bin)){
    g_bin <- delete_edge_attr(g_bin, attr)
  }
}

E(g_bin)$weight <- 1

conf_mod <- scm(graph = g_bin,
                directed   = FALSE,
                selfloops  = FALSE)

g_scm <- graph_from_adjacency_matrix(
           rghype(1, conf_mod), mode = "undirected")

par(mfrow = c(1, 2))
plot(g_bin, main = "Empirical threshold graph")
plot(g_scm, main = "Soft-configuration null")

# degree comparison (reset the two-panel layout first)
par(mfrow = c(1, 1))
plot(degree_distribution(g_bin),  col = 2, pch = 16, type = "o",
     main = "Degree distribution: empirical vs SCM",
     xlab = "k", ylab = "P(k)")
points(degree_distribution(g_scm), col = 4, pch = 16, type = "o")
legend("topright", legend = c("Observed", "SCM"), col = c(2, 4), pch = 16)

The empirical graph clearly shows BTC and ETH with 7–8 links each, whereas the SCM realisation caps out at degree 6. In the degree-distribution plot the red (observed) line stays above the blue (SCM) line for \(k\ge 7\), meaning such high degrees are over-represented in reality compared with the null model. Conversely, low degrees (\(k\le 3\)) occur more often in the SCM, indicating the real network is more centralised than a null graph with the same degree expectation. The takeaway is that Bitcoin and Ethereum are genuine super-connectors, not an artefact of the degree sequence: some additional mechanism—market leadership, liquidity, investor attention—makes them attract more strong links than random chance would predict.

Bipartite backbone – “co-positive-return” network

Here we treat each trading day as one set of nodes and each coin as the other, declaring a link in the incidence matrix whenever a coin posts a positive daily return. We then project that bipartite graph onto the coin layer and apply the Stochastic Degree Sequence Model (SDSM) filter: it keeps only coin–coin links whose number of shared up-days is significantly higher than expected under a random bipartite null model that approximately preserves each coin’s and each day’s degree. The resulting adjacency forms a “co-positive-return” backbone: pairs of coins that rally together significantly often. Finally we visualise the probability matrix from the BICM (bicm()) as a heat-map to see which pairs are statistically tight and which are essentially independent.
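
The unfiltered projection that the SDSM then prunes is just a matrix product. The sketch below shows it in base R on made-up returns for three hypothetical coins over five days; it covers only the raw co-occurrence counts, not the significance test.

```r
# Made-up daily returns for three hypothetical coins over five days
ret <- cbind(A = c( 0.02,  0.01, -0.03,  0.04, -0.01),
             B = c( 0.01, -0.02, -0.01,  0.03,  0.02),
             C = c(-0.01,  0.03,  0.02, -0.02, -0.01))

inc_toy <- (ret > 0) * 1        # days x coins incidence matrix (1 = up-day)
co_pos  <- crossprod(inc_toy)   # t(inc) %*% inc: coins x coins counts of shared up-days

print(inc_toy)
print(co_pos)
```

The diagonal holds each coin's own number of up-days (3, 3 and 2), and the off-diagonal cells count shared up-days: A and B rally together twice, A and C once, B and C never. A backbone filter asks which of these counts are larger than the coins' individual up-day frequencies would already imply.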

library(backbone); library(pheatmap)

return_mat <- coredata(returns)             
return_mat <- return_mat[complete.cases(return_mat), ] 

inc <- (return_mat > 0) * 1      # days × coins incidence matrix
inc <- t(inc)                    # coins in rows: sdsm() projects onto the row nodes

bb_sdsm <- sdsm(inc)                     

g_bb <- graph_from_adjacency_matrix(bb_sdsm, mode = "undirected")

plot(g_bb, main = "Backbone of co-positive-return days")

prob_mat <- bicm(inc)
pheatmap(prob_mat, main = "Prob. of co-positive returns (BICM)")

The raw coincidence of positive days is enormous—almost every coin rallies with every other at least once—so the unfiltered projection is a dense hairball. After SDSM filtering, however, only a thin shell of links survives, tightening the plot around a few recurrent pairings. BTC’s node degree drops sharply, signalling that its positive days are not uniquely synchronised beyond what its activity level would predict. The BICM probability heat-map confirms this: warm rows concentrate around the ADA–SOL–DOT triple and a looser ETH–MATIC stripe, whereas probabilities involving DOGE and XRP stay close to the null baseline. In plain English, up-days are broadly shared across the market, but a handful of PoS coins still move together unusually often—useful knowledge if you are trying to build a momentum basket that is not already subsumed by BTC moves.

10. Final Take-Aways

Bottom line:
A single market mode governs returns, routed through the ETH hub, pushed by BTC signals, with a secondary PoS sub-cluster that sometimes moves together but never overrides the core factor.