This report analyzes the network structure of the cryptocurrency market to understand the relationships and influence among major assets. By modeling the market as a network, we can uncover its core structure, identify key players, and assess systemic properties like shock propagation and hierarchy.
We fetch daily OHLC files from CryptoDataDownload and keep the Close column.
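# Packages used by the data-loading, return and plotting chunks
library(tidyverse)  # readr, dplyr, tidyr, purrr, ggplot2
library(lubridate)  # as_date()
library(glue)       # glue()
library(xts)        # xts(); attaches zoo for na.locf(), rollapplyr(), fortify.zoo()
library(scales)     # viridis_pal()
data_dir <- "data/crypto_cdd"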
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
tickers <- c("BTCUSDT","ETHUSDT","BNBUSDT","XRPUSDT","ADAUSDT",
"SOLUSDT","DOGEUSDT","DOTUSDT","TRXUSDT","MATICUSDT")
grab_cdd <- function(pair){
local_file <- file.path(data_dir, paste0("Binance_", pair, "_d.csv"))
if (!file.exists(local_file)){
url <- glue("https://www.cryptodatadownload.com/cdd/Binance_{pair}_d.csv")
download.file(url, destfile = local_file, quiet = TRUE)
}
read_csv(local_file, skip = 1, col_types = cols(), show_col_types = FALSE) %>%
select(date = Date, close = Close) %>%
mutate(date = as_date(date)) %>%
arrange(date) %>%
rename(!!pair := close)
}
price_list <- map(tickers, grab_cdd)
prices <- reduce(price_list, inner_join, by = "date") # overlap window
xs <- xts(prices[-1], order.by = prices$date) %>%
na.locf()
summary(xs)
## Index BTCUSDT ETHUSDT BNBUSDT
## Min. :2020-08-18 Min. :10127 Min. : 320.7 Min. : 19.47
## 1st Qu.:2021-08-23 1st Qu.:23450 1st Qu.:1567.7 1st Qu.:241.70
## Median :2022-08-29 Median :35413 Median :1920.8 Median :307.55
## Mean :2022-08-29 Mean :37274 Mean :2178.9 Mean :327.59
## 3rd Qu.:2023-09-04 3rd Qu.:49068 3rd Qu.:2974.7 3rd Qu.:424.40
## Max. :2024-09-09 Max. :73072 Max. :4808.0 Max. :711.20
## XRPUSDT ADAUSDT SOLUSDT DOGEUSDT
## Min. :0.2114 Min. :0.07663 Min. : 1.198 Min. :0.002514
## 1st Qu.:0.4057 1st Qu.:0.32777 1st Qu.: 20.275 1st Qu.:0.063705
## Median :0.5233 Median :0.45420 Median : 34.825 Median :0.083690
## Mean :0.5892 Mean :0.70547 Mean : 65.598 Mean :0.114671
## 3rd Qu.:0.6650 3rd Qu.:1.04350 3rd Qu.:109.233 3rd Qu.:0.148050
## Max. :1.8347 Max. :2.96600 Max. :258.440 Max. :0.689820
## DOTUSDT TRXUSDT MATICUSDT
## Min. : 2.833 Min. :0.02299 Min. :0.01222
## 1st Qu.: 5.340 1st Qu.:0.06103 1st Qu.:0.54038
## Median : 7.027 Median :0.07114 Median :0.81565
## Mean :12.943 Mean :0.07869 Mean :0.86071
## 3rd Qu.:18.152 3rd Qu.:0.10225 3rd Qu.:1.12607
## Max. :53.820 Max. :0.16640 Max. :2.87600
Log-returns
df_price <- fortify.zoo(xs) %>%
pivot_longer(-Index, names_to = "coin", values_to = "price")
df_ret <- df_price %>%
group_by(coin) %>%
arrange(Index) %>%
mutate(return = c(NA, diff(log(price)))) %>%
ungroup()
roll_sd <- function(x) rollapplyr(x, 30, sd, fill = NA)
vol_df <- df_ret %>%
group_by(coin) %>%
mutate(roll_vol = roll_sd(return)) %>%
ungroup()
corr_df <- df_ret %>%
filter(coin %in% c("BTCUSDT", "ETHUSDT")) %>%
pivot_wider(names_from = coin, values_from = return) %>%
mutate(roll_corr = rollapplyr(
cbind(BTCUSDT, ETHUSDT), 30,
\(m) if (sum(complete.cases(m)) < 2) NA_real_ else cor(m[,1], m[,2]),
by.column = FALSE, fill = NA
))
pal <- viridis_pal(option = "turbo")(length(unique(df_ret$coin)))
# ── Plot 1: log-returns ---------------------------------------------------
p_returns <- ggplot(df_ret, aes(Index, return, colour = coin)) +
geom_line(na.rm = TRUE, linewidth = 0.4) +
scale_colour_manual(values = pal, name = "Coin") +
labs(title = "Daily log-returns") +
theme_minimal(base_size = 11) +
theme(legend.position = "right")
# Save Plot 1
ggsave(
filename = "daily_log_returns.png",
plot = p_returns,
width = 8, # in inches
height = 4, # in inches
dpi = 300
)
# ── Plot 2: rolling volatility -------------------------------------------
p_vol <- ggplot(vol_df, aes(Index, roll_vol, colour = coin)) +
geom_line(na.rm = TRUE, linewidth = 0.4) +
scale_colour_manual(values = pal, name = "Coin") +
labs(title = "30-day rolling volatility", y = "σ") +
theme_minimal(base_size = 11)
# Save Plot 2
ggsave(
filename = "rolling_volatility.png",
plot = p_vol,
width = 8,
height = 4,
dpi = 300
)
# Display plots in R session if desired
print(p_returns)
print(p_vol)
The log-return panel shows two clear volatility regimes: an extremely noisy spell in late 2020 – mid 2021, followed by much calmer behaviour after mid-2022. The tall green and cyan spikes belong to DOGE and SOL, confirming they experienced the sharpest single-day moves. The rolling-volatility plot quantifies that impression: 30-day σ for all coins peaked above 30 % in spring 2021, then trended below 10 % for most of 2023–24, with only brief flare-ups.
The majors BTC and ETH now sit in the middle of the volatility pack, while smaller caps (DOT, ADA, TRX) have converged toward similar, low risk levels. The multi-colour overlay also shows that the relative ordering of volatilities changes over time, underlining the need for rolling rather than static risk estimates.
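To make the rolling estimate concrete, here is a minimal base-R sketch of a right-aligned rolling standard deviation. The helper name `roll_sd_base` and the toy series are assumptions made for illustration; the report itself relies on `rollapplyr()`.

```r
# Hypothetical helper: right-aligned rolling standard deviation in base R.
# The first (width - 1) positions stay NA, as with rollapplyr(..., fill = NA).
roll_sd_base <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n >= width) {
    for (i in width:n) {
      out[i] <- sd(x[(i - width + 1):i])  # window ends at observation i
    }
  }
  out
}

toy <- c(1, 2, 3, 4, 5)           # made-up series, not market data
toy_vol <- roll_sd_base(toy, 3)
print(toy_vol)
```

Because each value only uses the most recent `width` observations, the estimate can rise and fall with the regime, which a single full-sample standard deviation cannot do.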
Correlation matrix
Instead of looking at raw prices we take the natural-log difference between two consecutive days. The result approximates the percentage change but is mathematically nicer: returns add over time and are symmetric for up- and down-moves. Using log-returns also makes most financial series closer to “stationary”—their statistical properties do not drift as badly as prices do.
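The two properties claimed above (additivity over time and symmetry of up- and down-moves) can be checked on a small example. The prices below are invented for illustration and are not taken from the data set.

```r
# Made-up prices, used only to illustrate the algebra of log-returns
price <- c(100, 110, 99, 105)

log_ret <- diff(log(price))                      # daily log-returns
total_from_sum <- sum(log_ret)                   # add the daily values ...
total_direct <- log(price[length(price)] / price[1])  # ... or take one log-ratio

# Symmetry: a move from 100 to 110 and back again nets out to zero
round_trip_log <- diff(log(c(100, 110, 100)))
round_trip_simple <- diff(c(100, 110, 100)) / c(100, 110)

print(c(total_from_sum, total_direct))
print(sum(round_trip_log))      # cancels up to floating-point error
print(sum(round_trip_simple))   # simple returns do not cancel
```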
returns <- diff(log(xs))[-1, ]
R <- cor(returns)
knitr::kable(round(R[1:6, 1:6], 2), caption = "Upper-left of the ρ matrix")
| | BTCUSDT | ETHUSDT | BNBUSDT | XRPUSDT | ADAUSDT | SOLUSDT |
|---|---|---|---|---|---|---|
| BTCUSDT | 1.00 | 0.81 | 0.65 | 0.56 | 0.67 | 0.55 |
| ETHUSDT | 0.81 | 1.00 | 0.68 | 0.59 | 0.71 | 0.63 |
| BNBUSDT | 0.65 | 0.68 | 1.00 | 0.52 | 0.60 | 0.56 |
| XRPUSDT | 0.56 | 0.59 | 0.52 | 1.00 | 0.59 | 0.48 |
| ADAUSDT | 0.67 | 0.71 | 0.60 | 0.59 | 1.00 | 0.57 |
| SOLUSDT | 0.55 | 0.63 | 0.56 | 0.48 | 0.57 | 1.00 |
R_sub <- R[1:10, 1:10]
## correlation matrix 1
ggplot(melt(R_sub), aes(Var1, Var2, fill = value)) +
geom_tile(colour = "grey90", linewidth = .3) +
scale_fill_gradient2(low = "#08306B",
mid = "white",
high = "#67000D",
midpoint = 0.5,
limits = c(0, 1),
name = "ρ") +
coord_fixed() +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.grid = element_blank())+
labs(title = "Correlation heat-map (log-return series)",x = "", y = "")
## correlation matrix 2
ggplot(melt(R_sub), aes(Var1, Var2, fill = value)) +
geom_tile(colour = "grey90") +
geom_text(aes(label = sprintf("%.2f", value)), size = 3) +
scale_fill_gradient(low = "#fee5d9", high = "#cb181d", limits = c(0, 1)) +
coord_fixed() +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.grid = element_blank())+
labs(title = "Correlation heat-map (log-return series)",x = "", y = "")
Random-matrix check (Marchenko–Pastur)
When we build a big correlation matrix from noisy data, some apparent “patterns” are just random fluctuations. Random-matrix theory gives a benchmark for what the eigenvalues (the λ’s) should look like if nothing but noise were present. The famous Marchenko–Pastur (MP) law predicts a compact bulk: every eigenvalue should fall between λmin and λmax (the red line) when the series are uncorrelated white noise of the same length. Anything that pokes above λmax is informative—it carries common structure shared by many assets. Anything inside the bulk is indistinguishable from randomness and can be safely ignored in factor models.
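For reference, the bounds mentioned above can be written out explicitly. This is the standard form of the Marchenko–Pastur law for unit-variance series, with \(N\) series of length \(T\); the symbol \(q\) is introduced here only for compactness, and the numerical example uses a hypothetical \(T = 1000\) rather than the report's actual sample length.

```latex
\[
q = \frac{N}{T}, \qquad
\lambda_{\pm} = \left(1 \pm \sqrt{q}\right)^{2}, \qquad
\rho(\lambda) = \frac{\sqrt{(\lambda_{+}-\lambda)\,(\lambda-\lambda_{-})}}{2\pi q\,\lambda},
\quad \lambda_{-} \le \lambda \le \lambda_{+}.
\]
% Illustration with assumed values N = 10, T = 1000:
\[
q = 0.01, \qquad \lambda_{+} = (1 + 0.1)^{2} = 1.21, \qquad \lambda_{-} = (1 - 0.1)^{2} = 0.81 .
\]
```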
N <- ncol(returns); Tn <- nrow(returns)
lambda_max <- (1 + sqrt(N/Tn))^2
ev <- eigen(R, only.values = TRUE)$values
hist(ev, breaks = 30, xlab = "λ", main = "Eigenvalue density")
abline(v = lambda_max, col = "red", lwd = 2)
legend("topright", legend = "MP upper bound", col = "red", lty = 1, bty = "n")
The histogram shows one large outlier around 6, far above λmax, while every other eigenvalue sits well below the MP bound. That single spike is the “market mode”: a factor that moves all ten cryptocurrencies together (think of it as overall risk-on / risk-off sentiment). The absence of intermediate outliers means there is little evidence for a second strong factor (e.g. a DeFi-specific or meme-coin factor) in daily returns. In practice you could model the system with just one common component plus idiosyncratic noise.
Spectrally filtered correlation (non-random modes)
In this section we denoise the raw correlation matrix with a spectral filter based on random-matrix theory. We first compute all eigen-values of \(R\) and compare them with the Marchenko–Pastur upper bound \(\lambda_{\max}\). Only the eigen-values that exceed \(\lambda_{\max}\) by at least 3 % are kept; their corresponding eigen-vectors are then recombined to form a spectrally filtered matrix \(C_{\text{spec}}\). The goal is to retain the common factors that carry genuine market information while discarding the bulk of sampling noise.
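The filtering recipe can be tried on simulated data where the answer is known. The sketch below assumes a single common factor with a loading of 0.7 and toy dimensions; all names (`X`, `R_toy`, `C_spec_toy`) and parameter values are illustrative and unrelated to the report's data.

```r
set.seed(1)
T_obs <- 500; N_toy <- 6                 # hypothetical sample size and asset count
market <- rnorm(T_obs)                   # one shared factor
X <- sapply(seq_len(N_toy), function(i) 0.7 * market + rnorm(T_obs))

R_toy <- cor(X)
eig <- eigen(R_toy, symmetric = TRUE)

# Marchenko-Pastur upper edge uses the number of observations, not ncol(R_toy)
lambda_max_toy <- (1 + sqrt(N_toy / T_obs))^2
keep <- which(eig$values > 1.03 * lambda_max_toy)   # same 3 % buffer as in the text

# Rebuild the matrix from the surviving modes only
V <- eig$vectors[, keep, drop = FALSE]
C_spec_toy <- V %*% diag(eig$values[keep], nrow = length(keep)) %*% t(V)

print(round(eig$values, 2))
print(keep)
```

With one planted factor, only the leading eigenvalue should clear the bound, so the filtered matrix has rank one.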
N <- ncol(R); Tn <- nrow(returns) # Tn = number of daily observations, not the dimension of R
ev <- eigen(R)
lambda_max <- (1 + sqrt(N / Tn))^2
epsilon <- 1.03
inform_idx <- which(ev$values > lambda_max*epsilon)
if (length(inform_idx) == 0) {
message("No non-random modes beyond the MP bound ⇒ fall back to raw R.")
cors_spec <- R # use the full correlation matrix
} else {
cors_spec <- matrix(0, N, N)
for (i in inform_idx)
cors_spec <- cors_spec + ev$values[i] * tcrossprod(ev$vectors[, i])
}
dimnames(cors_spec) <- list(colnames(R), colnames(R))
pheatmap(cors_spec,
cluster_rows = FALSE, cluster_cols = FALSE,
main = "Spectrally filtered correlation (non-random modes)")
With the 3 % buffer in place, three eigen-modes survive: the dominant “market mode” plus two smaller factors. The filtered heat-map is therefore still mostly warm along the BTC–ETH–BNB block, confirming that these coins move in lock-step even after noise suppression. Secondary structure is now clearer: the PoS trio ADA–SOL–DOT shares elevated correlations, whereas DOGE and XRP sit on noticeably cooler rows, signalling weaker linkage to the blue-chip core. Off-diagonal cells are more pastel than in the raw matrix, showing that nearly half the original covariance energy was random fluctuation. Practically, this means a three-factor model can explain the bulk of daily co-movements; everything else is idiosyncratic noise that should not drive portfolio allocation.
MST & threshold projection
A minimum-spanning tree links all nodes with the shortest total distance (here distance = √(2·(1 – ρ))). It keeps the graph connected while using only n – 1 edges, so it highlights the backbone of strongest similarities without clutter. In finance, the MST is useful to spot hubs and clusters quickly; edges with small distances represent tight price co-movement. A threshold projection is an even simpler filter: you keep only those edges whose correlation exceeds a chosen cut-off.
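As an illustration of how such a tree is grown, the following base-R sketch runs Prim's algorithm by hand on a toy 4 × 4 correlation matrix. The matrix values and asset labels are made up, and the loop is a teaching aid rather than the routine the report uses (that is igraph's `mst()`).

```r
# Toy correlation matrix for four hypothetical assets (values are invented)
rho <- matrix(c(1.0, 0.8, 0.6, 0.3,
                0.8, 1.0, 0.7, 0.4,
                0.6, 0.7, 1.0, 0.5,
                0.3, 0.4, 0.5, 1.0), 4, 4,
              dimnames = list(LETTERS[1:4], LETTERS[1:4]))
d <- sqrt(2 * (1 - rho))            # correlation distance

# Prim's algorithm: start from node 1 and repeatedly add the shortest
# edge that connects a node outside the tree to a node inside it
in_tree <- 1
tree_edges <- NULL
while (length(in_tree) < nrow(d)) {
  out <- setdiff(seq_len(nrow(d)), in_tree)
  cand <- d[in_tree, out, drop = FALSE]
  idx <- which(cand == min(cand), arr.ind = TRUE)[1, ]
  tree_edges <- rbind(tree_edges,
                      c(from = rownames(d)[in_tree[idx[1]]],
                        to   = colnames(d)[out[idx[2]]]))
  in_tree <- c(in_tree, out[idx[2]])
}
print(tree_edges)
```

The result always has n – 1 edges; here the tree chains the four assets along their strongest correlations.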
set.seed(123)
D <- sqrt(2 * (1 - R))
g_full <- graph_from_adjacency_matrix(
D, mode = "undirected",
weighted = TRUE, diag = FALSE)
g_mst <- mst(g_full, weights = E(g_full)$weight) # igraph runs Prim's algorithm on weighted graphs
layout <- igraph::layout_with_fr(g_mst)
plot(g_mst,
layout = layout,
vertex.size = 10,
vertex.label.cex = 0.9,
vertex.label.color = "black",
vertex.color = "#fee08b",
edge.width = 2,
edge.color = "grey70",
vertex.label.dist = 3,
main = "Minimum-spanning tree (√(2·(1 – ρ)) distance)")
The tree is star-like with ETHUSDT at the centre: Ethereum is the shortest bridge between most other coins, confirming its role as a price hub. BTCUSDT is just one branch away, while coins such as DOGE and XRP sit on outer limbs, meaning their moves are transmitted through ETH rather than directly. The layout suggests that shocks propagating through Ethereum will quickly reach all other assets.
After we have a full n × n correlation matrix we can keep only the strongest links by applying a cut-off: if ρij is larger than a chosen threshold th we draw an edge, otherwise we drop it. This “threshold projection” is the simplest way to turn a dense matrix into a readable graph: weak, noisy relationships vanish, and what remains are pairs of assets whose return correlation exceeds the cut-off (0.55 here). Unlike the MST (which always gives n − 1 edges) the number of links here depends on th: a high threshold yields a sparse graph, a low one yields something almost complete. Analysts use such projections to reveal clusters of very similar instruments and to spot assets that act as tightly bonded hubs.
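How strongly the edge count depends on the cut-off is easy to see on a toy matrix. The correlation values and the helper name `edge_count` below are assumptions chosen for illustration only.

```r
# Toy correlation matrix for four hypothetical assets (values are invented)
rho <- matrix(c(1.0, 0.8, 0.6, 0.3,
                0.8, 1.0, 0.7, 0.4,
                0.6, 0.7, 1.0, 0.5,
                0.3, 0.4, 0.5, 1.0), 4, 4)

# Count each undirected pair once by looking at the upper triangle only
edge_count <- function(th) sum(rho > th & upper.tri(rho))

thresholds <- c(0.35, 0.55, 0.75)
counts <- sapply(thresholds, edge_count)
print(data.frame(th = thresholds, edges = counts))
```

Raising the threshold from 0.35 to 0.75 takes this toy graph from nearly complete to a single edge, which is why the choice of th should be reported alongside any threshold graph.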
th <- 0.55
Elist <- which(R > th & upper.tri(R), arr.ind = TRUE)
g_thr <- graph_from_data_frame(
data.frame(from = tickers[Elist[,1]],
to = tickers[Elist[,2]],
w = R[Elist]),
directed = FALSE)
set.seed(123)
layout <- layout_with_fr(g_thr)
plot(
g_thr,
layout = layout,
vertex.size = 24,
vertex.color = "#fee08b",
vertex.label.cex = 0.9,
vertex.label.color= "black",
edge.width = E(g_thr)$w * 8,
edge.color = adjustcolor("grey70", alpha.f = 0.7),
asp = 0,
margin = 0.1,
main = bquote("ρ" > .(th) * " projection")
)
Alternative backbone filters (filter-mast, filter-eco, filter-tmfg)
While MSTs and simple thresholding are common, more advanced methods can provide nuanced views. Here we test three: MaST (Maximum Spanning Tree), which builds a backbone from the strongest correlations; ECO (Efficiency-Cost Optimization), which finds an optimal trade-off between network cost and information efficiency; and TMFG (Triangulated Maximally Filtered Graph), which creates a planar graph that retains important triangular structures (cliques).
pow_pos <- c(BTCUSDT = "PoW", ETHUSDT = "PoS", BNBUSDT = "PoS",
XRPUSDT = "Other", ADAUSDT = "PoS", SOLUSDT = "PoS",
DOGEUSDT = "PoW", DOTUSDT = "PoS", TRXUSDT = "PoS",
MATICUSDT = "PoS")
mat_abs <- abs(cors_spec)
mat_abs[is.na(mat_abs)] <- 0
A_mast <- MaST(mat_abs)
A_eco <- ECO(mat_abs)
A_tmfg <- TMFG(mat_abs)$A
g_mast <- graph_from_adjacency_matrix(A_mast, mode = "undirected", weighted = TRUE)
g_eco <- graph_from_adjacency_matrix(A_eco , mode = "undirected", weighted = TRUE)
g_tmfg <- graph_from_adjacency_matrix(A_tmfg, mode = "undirected", weighted = TRUE)
plot_backbone <- function(g, title){
powpos_col <- ifelse(pow_pos[V(g)$name] == "PoW", "#1f78b4",
ifelse(pow_pos[V(g)$name] == "PoS", "#33a02c", "#ff7f00"))
lay <- igraph::layout_with_fr(g, niter = 1500, grid = "nogrid")
plot(g, layout = lay,
vertex.color = powpos_col,
vertex.size = 22,
vertex.label.cex = 0.9,
edge.width = E(g)$weight * 6,
edge.color = "grey70",
vertex.label.family = "sans",
main = title)
legend("topright", legend = c("PoW","PoS","Other"),
pt.bg = c("#1f78b4","#33a02c","#ff7f00"),
pch = 21, pt.cex = 1.5, bty = "n")
}
plot_backbone(g_mast, "MaST backbone")
plot_backbone(g_eco , "ECO backbone")
plot_backbone(g_tmfg, "TMFG backbone")
Reading the three graphs
Community quality table
cl_louv_mast <- cluster_louvain(g_mast)
cl_louv_eco <- cluster_louvain(g_eco)
cl_louv_tmfg <- cluster_louvain(g_tmfg)
mods <- tibble(
Graph = c("MaST","ECO","TMFG"),
Q = c(modularity(g_mast, membership(cl_louv_mast)),
modularity(g_eco , membership(cl_louv_eco )),
modularity(g_tmfg, membership(cl_louv_tmfg)))
)
nmi_tbl <- tibble(
Graph = c("MaST","ECO","TMFG"),
NMI = c(compare(membership(cl_louv_mast), unname(pow_pos), method="nmi"),
compare(membership(cl_louv_eco ), unname(pow_pos), method="nmi"),
compare(membership(cl_louv_tmfg), unname(pow_pos), method="nmi"))
)
mods
## # A tibble: 3 × 2
## Graph Q
## <chr> <dbl>
## 1 MaST 0.388
## 2 ECO 0.409
## 3 TMFG 0.275
nmi_tbl
## # A tibble: 3 × 2
## Graph NMI
## <chr> <dbl>
## 1 MaST 0.440
## 2 ECO 0.493
## 3 TMFG 0.206
| Metric | What it means here |
|---|---|
| Modularity Q: ECO = 0.409 > MaST = 0.388 > TMFG = 0.275 | ECO’s sparse graph carves out the clearest internal / external edge contrast; TMFG spreads edges so clusters overlap more, hence lower Q. |
| NMI vs PoW/PoS: ECO = 0.493 > MaST = 0.440 > TMFG = 0.206 | All three topologies encode some consensus-type information; ECO aligns best with the labels, MaST is close behind, and TMFG trails clearly. |
In words: the ECO filter produces the most modular partition and also the one best aligned with PoW/PoS labels, although its margin over MaST is small on both counts. TMFG trades modularity for richer local geometry, which is useful if you plan to run contagion simulations where triangles matter.
Community detection tries to split the graph into groups of nodes that are more densely linked to each other than to the outside world.
The Louvain algorithm is a fast, greedy routine that maximises modularity—a score that compares the actual number of in-group links to what you would expect by chance.
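To show what the modularity score measures, the sketch below evaluates it by hand for a toy graph of two triangles joined by one edge. The graph, the partition and all variable names are illustrative assumptions; the formula is the standard Newman modularity for an unweighted, undirected graph.

```r
# Toy graph: triangles 1-2-3 and 4-5-6, bridged by the edge 3-4
edge_list <- rbind(c(1, 2), c(2, 3), c(1, 3),
                   c(4, 5), c(5, 6), c(4, 6),
                   c(3, 4))
A <- matrix(0, 6, 6)
A[edge_list] <- 1
A[edge_list[, 2:1]] <- 1            # make the adjacency matrix symmetric

k <- rowSums(A)                     # node degrees
m <- sum(A) / 2                     # number of edges

# Observed minus expected edge weight for every pair of nodes
B <- A - outer(k, k) / (2 * m)

groups <- c(1, 1, 1, 2, 2, 2)       # one community per triangle
same_group <- outer(groups, groups, "==")
Q_split <- sum(B[same_group]) / (2 * m)

Q_single <- sum(B) / (2 * m)        # everything in one community
print(c(split = Q_split, single = Q_single))
```

Putting every node in one community always gives a modularity of zero, which is the baseline any detected partition has to beat.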
After finding the groups we compute three centrality measures (degree, betweenness, and eigenvector centrality):
tg <- as_tbl_graph(g_mst) |>
mutate(comm = as.factor(cluster_louvain(g_mst)$membership),
deg = centrality_degree(),
btw = centrality_betweenness(),
eig = centrality_eigen())
set.seed(123)
ggraph(tg, layout = "fr") +
geom_edge_link(colour = "grey75", width = 0.8) +
geom_node_point(aes(colour = comm, size = eig)) +
geom_node_text(aes(label = name), repel = TRUE, size = 3) +
scale_size_continuous(range = c(4, 10), name = "Eigenvector") +
scale_colour_brewer(palette = "Set1", name = "Community") +
theme_void(base_size = 12) +
labs(title = "Communities (Louvain) & eigenvector centrality")
| Observation | Meaning |
|---|---|
| Three communities emerge: green core (ETH + DOT + BNB + TRX + SOL + MATIC), a tiny red pair (BTC + DOGE), and a blue pair (ADA + XRP). | The market backbone is one large cluster centred on Ethereum, while Bitcoin—with its meme-coin satellite DOGE—behaves just differently enough to sit in its own pocket. |
| ETHUSDT tops every centrality (Degree = 7 of 9 possible; Betweenness = 34; Eigenvector = 1.0). | Ethereum is the main price hub and the key bridge—remove it and average path lengths jump. |
| BTCUSDT ties with ADA for Betweenness (8) but sits in a separate community. | Bitcoin channels some flow but its strongest links (to DOT, TRX, etc.) are weaker than ETH’s, so modularity prefers to isolate it. |
| DOT, MATIC, BNB round out the top-6 eigenvector list. | They are the best connected secondary players inside the ETH-dominated cluster. |
| XRP’s eigenvector is low (0.16) and it sits on a thin spoke of the graph. | Echoes the earlier finding that XRP marches to its own drummer. |
Take-away: if you need one coin to proxy “the market”, Ethereum is safest; for diversification, look at assets outside the green cluster (BTC, DOGE, XRP).
comm_louv <- cluster_louvain (g_thr)
comm_leiden <- cluster_leiden (g_thr, objective_function = "modularity")
comm_infomap <- cluster_infomap (g_thr)
comm_walkraw <- cluster_walktrap(g_thr)
comm_walk <- cut_at(comm_walkraw, no = length(unique(membership(comm_louv))))
mdl <- tibble(
Method = c("Louvain", "Leiden", "Infomap", "Walktrap"),
Modularity = c(
modularity(g_thr, membership(comm_louv)),
modularity(g_thr, membership(comm_leiden)),
modularity(g_thr, membership(comm_infomap)),
modularity(g_thr, comm_walk)
)
)
print(mdl)
## # A tibble: 4 × 2
## Method Modularity
## <chr> <dbl>
## 1 Louvain 0.0422
## 2 Leiden 0.0422
## 3 Infomap 0
## 4 Walktrap 0.0351
What the numbers say
All four algorithms see very little modular structure — the best modularity-score is just ≈ 0.04. Values below 0.1 usually mean the network does not split naturally into well-separated clusters.
Louvain and Leiden tie at 0.042 : they find the exact same partition (Leiden refines Louvain, so when Louvain is already “optimal” it cannot improve).
Walktrap is only a hair lower (0.035): its random-walk logic agrees with the greedy methods that any community pattern is faint.
Infomap reports 0 because it decided the whole graph is a single community; that is perfectly consistent with the other scores being so small.
Insights
vi <- tibble(
Method = c("Leiden", "Infomap", "Walktrap"),
VI = c(
compare(membership(comm_louv), membership(comm_leiden), method = "vi"),
compare(membership(comm_louv), membership(comm_infomap), method = "vi"),
compare(membership(comm_louv), comm_walk, method = "vi")
)
)
print(vi)
## # A tibble: 3 × 2
## Method VI
## <chr> <dbl>
## 1 Leiden 0
## 2 Infomap 0.687
## 3 Walktrap 0.798
What the numbers mean
VI = 0 means the two partitions are identical; the higher the number, the less they overlap (0 = identity, 1 ≈ very different for a 3-group split of 10 nodes, > 0.7 counts as a big divergence in such a small graph).
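For readers who want to see where the VI number comes from, here is a small base-R sketch of the variation of information between two label vectors, computed as H(A) + H(B) − 2·I(A;B) with natural logarithms. The function name `vi_nats` and the example partitions are assumptions for illustration; the report's values come from igraph's `compare()`.

```r
# Hypothetical helper: variation of information between two partitions (in nats)
vi_nats <- function(a, b) {
  joint <- table(a, b) / length(a)          # joint distribution of the two labels
  pa <- rowSums(joint)
  pb <- colSums(joint)
  entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))
  indep <- outer(pa, pb)                    # product of the marginals
  pos <- joint > 0
  mutual_info <- sum(joint[pos] * log(joint[pos] / indep[pos]))
  entropy(pa) + entropy(pb) - 2 * mutual_info
}

vi_same <- vi_nats(c(1, 1, 2, 2, 3), c(1, 1, 2, 2, 3))         # identical partitions
vi_relabel <- vi_nats(c(1, 1, 2, 2), c("b", "b", "a", "a"))    # same groups, new names
vi_cross <- vi_nats(c(1, 1, 2, 2), c(1, 2, 1, 2))              # fully crossed splits
print(c(vi_same, vi_relabel, vi_cross))
```

Identical partitions give zero regardless of how the groups are named, while two splits that carry no information about each other reach the sum of their entropies.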
Louvain and Leiden fully agree, confirming the modularity-greedy view: there is at most a faint two-or-three-group structure.
Infomap and Walktrap disagree strongly with Louvain/Leiden because they optimise different principles (information flow and random-walk trapping). Their high VI is not “bad” — it simply shows that with so little modular signal, changing the objective function reshuffles nodes easily.
library(poweRlaw) # for displ
C_obs <- igraph::transitivity(g_thr, type = "global")
L_obs <- igraph::mean_distance(g_thr, directed = FALSE, weights = NA)
set.seed(123)
ws_stats <- replicate(100, {
g_ws <- igraph::sample_smallworld(1,
vcount(g_thr),
nei = round(mean(igraph::degree(g_thr))/2),
p = 1)
c(C = igraph::transitivity(g_ws, type = "global"),
L = igraph::mean_distance(g_ws, directed = FALSE, weights = NA))
})
SW_index <- (C_obs / mean(ws_stats["C", ])) /
(L_obs / mean(ws_stats["L", ]))
cat(sprintf("Small-world index ≈ %.2f ( >1 ⇒ small-world behaviour)\n", SW_index))
## Small-world index ≈ 1.19 ( >1 ⇒ small-world behaviour)
deg_vec <- igraph::degree(g_thr)
deg_vec <- deg_vec[deg_vec > 0]
m_pl <- displ$new(deg_vec)
m_pl$setXmin(estimate_xmin(m_pl))
gamma <- estimate_pars(m_pl)$pars
cat(sprintf("Power-law tail γ ≈ %.2f\n", gamma))
## Power-law tail γ ≈ 5.50
| Metric | Value | What it means (plain words) |
|---|---|---|
| Small-world index | 1.19 > 1 | Our threshold graph has higher clustering and almost the same short average path as a random graph of the same size. In everyday terms, coins form tight triangles yet any coin is still only two-or-so steps from any other—classic small-world behaviour. |
| Power-law tail γ | ≈ 5.5 | In contrast to scale-free networks (like social networks, with γ between 2 and 3), this means extreme hubs are very rare. While ETH is a central node, its dominance is not absolute; the network is more egalitarian and less vulnerable to the failure of a single super-hub. |
Insights
In the previous sections every network was undirected: a link simply meant two coins moved together. Here we flip the question and ask who drives whom? For each ordered pair of log-return series we:
Both metrics use the NA-cleaned, synchronised daily returns from Section 2, so any influence we pick up is at the one-day horizon.
The GC heat-map is sparse and asymmetric: dark tiles concentrate in the BTCUSDT row, confirming that Bitcoin’s history contains predictive power for several other coins, whereas the reverse columns are pale. A second, localised hotspot appears for DOGE → MATIC / XRP, hinting at episodic meme-coin spill-overs. When we threshold GC at 0.02 the resulting graph is so thin that no node ends up with more than one outgoing arrow: information flows exist but are weak and fragmented. Taken together, the directional analysis says shocks chiefly radiate from Bitcoin, with only modest secondary channels; for portfolio stress tests a single market factor plus idiosyncratic noise remains an adequate first approximation.
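The code behind the directional analysis is not shown in this section, so the following is only a generic sketch of a one-lag Granger-causality test on simulated data. It assumes a single lag and an F-test between nested linear models; the series, coefficients and variable names are all invented and may differ from what the report actually computed.

```r
set.seed(7)
n <- 300
x <- rnorm(n)                                   # simulated "driver" series
y <- c(0, 0.5 * x[-n]) + rnorm(n, sd = 0.5)     # y depends on yesterday's x

lagged <- data.frame(y = y[-1], y_lag = y[-n],
                     x = x[-1], x_lag = x[-n])

# Does lagged x improve the forecast of y beyond y's own lag?
p_x_to_y <- anova(lm(y ~ y_lag, data = lagged),
                  lm(y ~ y_lag + x_lag, data = lagged))$`Pr(>F)`[2]

# Reverse direction: does lagged y help forecast x?
p_y_to_x <- anova(lm(x ~ x_lag, data = lagged),
                  lm(x ~ x_lag + y_lag, data = lagged))$`Pr(>F)`[2]

print(c(x_to_y = p_x_to_y, y_to_x = p_y_to_x))
```

Running the test in both directions for every ordered pair is what makes the resulting matrix asymmetric, unlike the correlation matrices used earlier.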
Soft-Configuration Model – are our hubs significant?
In this sub-section we test whether the hubness we observed in the empirical threshold graph could be explained solely by its degree sequence. We feed the binary graph \(g_{\text{thr}}\) into the soft-configuration model (SCM), which generates an ensemble of random graphs that preserve each node’s expected degree but randomise everything else. One realisation of that ensemble is plotted next to the empirical graph, and we overlay the degree distributions of the two networks. If BTC or ETH are only big because “someone has to be”, the SCM curve should match the empirical curve at high degrees; if not, the observed hubs are statistically exceptional.
library(ghypernet); library(igraph)
g_bin <- g_thr
for(attr in c("w", "weight")){
if (attr %in% edge_attr_names(g_bin)){
g_bin <- delete_edge_attr(g_bin, attr)
}
}
E(g_bin)$weight <- 1
conf_mod <- scm(graph = g_bin,
directed = FALSE,
selfloops = FALSE)
g_scm <- graph_from_adjacency_matrix(
rghype(1, conf_mod), mode = "undirected")
par(mfrow = c(1, 2))
plot(g_bin, main = "Empirical threshold graph")
plot(g_scm, main = "Soft-configuration null")
# degree comparison
plot(degree_distribution(g_bin), col = 2, pch = 16, type = "o",
main = "Degree distribution: empirical vs SCM",
xlab = "k", ylab = "P(k)")
points(degree_distribution(g_scm), col = 4, pch = 16, type = "o")
legend("topright", legend = c("Observed", "SCM"), col = c(2, 4), pch = 16)
The empirical graph clearly shows BTC and ETH with 7–8 links each, whereas the SCM realisation caps out at degree 6. In the degree-distribution plot the red (observed) line stays above the blue (SCM) line for \(k\ge 7\), meaning such high degrees are over-represented in reality compared with the null model. Conversely, low degrees (\(k\le 3\)) occur more often in the SCM, indicating the real network is more centralised than a null graph with the same degree expectation. The takeaway is that Bitcoin and Ethereum are genuine super-connectors, not an artefact of the degree sequence: some additional mechanism—market leadership, liquidity, investor attention—makes them attract more strong links than random chance would predict.
Bipartite backbone – “co-positive-return” network
Here we treat each trading day as one set of nodes and each coin as the other, declaring a link in the incidence matrix whenever a coin posts a positive daily return. We then project that bipartite graph onto the coin layer and filter the projection with the Stochastic Degree Sequence Model (SDSM): it keeps only coin–coin edges whose co-occurrence count is significantly larger than expected under a random bipartite null model that preserves the degree sequences on average. The resulting adjacency forms a “co-positive-return” backbone: pairs of coins that rally together significantly often. Finally we visualise the probability matrix estimated by the BICM (bicm()) as a heat-map to see which pairs are statistically tight and which are essentially independent.
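The unfiltered projection that the null model is compared against can be built in two lines. The sketch below uses simulated returns for three hypothetical coins; names and sizes are illustrative assumptions, and no significance filtering is applied here.

```r
set.seed(42)
# Simulated daily returns: 20 days (rows) x 3 hypothetical coins (columns)
toy_ret <- matrix(rnorm(20 * 3), nrow = 20,
                  dimnames = list(NULL, c("A", "B", "C")))

toy_inc <- (toy_ret > 0) * 1        # day x coin incidence: 1 on positive-return days

# Coin x coin projection: entry (i, j) counts days on which both coins were up
co_pos <- crossprod(toy_inc)        # same as t(toy_inc) %*% toy_inc
print(co_pos)
```

The diagonal holds each coin's own number of up-days, which is exactly the degree information a degree-preserving null model has to respect when judging whether an off-diagonal count is unusually large.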
library(backbone); library(pheatmap)
return_mat <- coredata(returns)
return_mat <- return_mat[complete.cases(return_mat), ]
inc <- (return_mat > 0) * 1
inc <- as.matrix(inc)
bb_sdsm <- sdsm(t(inc)) # backbone projects onto the rows, so coins must be the rows
g_bb <- graph_from_adjacency_matrix(bb_sdsm, mode = "undirected")
plot(g_bb, main = "Backbone of co-positive-return days")
prob_mat <- bicm(inc)
pheatmap(prob_mat, main = "Prob. of co-positive returns (BICM)")
The raw coincidence of positive days is enormous—almost every coin rallies with every other at least once—so the unfiltered projection is a dense hairball. After SDSM filtering, however, only a thin shell of links survives, tightening the plot around a few recurrent pairings. BTC’s node degree drops sharply, signalling that its positive days are not uniquely synchronised beyond what its activity level would predict. The BICM probability heat-map confirms this: warm rows concentrate around the ADA–SOL–DOT triple and a looser ETH–MATIC stripe, whereas probabilities involving DOGE and XRP stay close to the null baseline. In plain English, up-days are broadly shared across the market, but a handful of PoS coins still move together unusually often—useful knowledge if you are trying to build a momentum basket that is not already subsumed by BTC moves.
Bottom line:
A single market mode governs returns, routed through the ETH hub, pushed by BTC signals, with a secondary PoS sub-cluster that sometimes moves together but never overrides the core factor.