Swarms in Codex
Introduction
How do you download a file in Codex?
Before:
Problems
Implementation:
naive manifests: all block CIDs that make up a file;
large, expensive to store and download;
addressed by the Merkelization effort.
Bitswap:
designed for IPFS;
general and expensive.
My issues:
overlay shape/guarantees;
performance and load balancing.
Bitswap
General storage (Merkle DAG) with ambitious goals:
files;
folders;
version graphs;
blockchains;
yikes.
Bitswap
Any piece of the DAG must be addressable, because it can be shared.
Download: incrementally learn about DAG;
asking for child blocks by CID at every step.
Efficient Block Lists
Codex exchanges
files
.
File: totally ordered set of blocks of known length;
no versioning (mutability);
no sharing.
Blocks can be identified by a
\((\text{CID}, I)\)
pair where
\(I = [i_1, \cdots, i_n]\)
is a list of integers.
From Indexing to Discovery
Roaring
[1]
bitsets:
as low as
\(0.34\)
bit per block on average;
that is
\(752\times\)
more efficient than a CID;
this is all potential: experimentation/benchmarking required.
Enables efficient
block discovery
with
swarms
;
nodes interested in a given file form a
swarm
;
exchange lists with the blocks they have.
Discovery, Downloads, and Overlays
Codex Swarm Overlays
Known goals:
low diameter;
uniform degree distribution;
as close to “uniformly random” links as possible (Erdős–Rényi
[2]
).
Start simple: why can’t we copy Bittorrent?
CodexTorrent (tm)
First Questions
Overlay is “similar” to Erdős–Rényi graphs, but not exactly the same.
The diameter of Erdős–Rényi graphs is relatively uncharted territory
[3]
;
becomes constant to as the graph gets denser;
intermediate densities less clear.
Let us analyze, for our graphs:
degree distributions;
dissemination speed.
Degree Distribution
Swarms of sizes
\(|V_i| = 10 \times 2^i\)
with
\(i \in \{0, \cdots, 7\}\)
;
bootstrap size
\(1 \leq d \leq \left\lceil\ln |V_i|\right\rceil\)
.
Degree Distribution
Degree Distribution
Bias towards older nodes.
Some equivalent solutions:
compensate on sampling (“tracker” returns newer nodes first);
have “overloaded” nodes refuse new neighbors.
Both make it easier to hijack the swarm with Sybils;
unclear how to solve;
IPFS has related vulnerabilities.
Dissemination Speed
How long does it take for information to disseminate?
Without a link capacity model, graph topology (diameter) should dominate.
Multi-BFS experiment on the same graphs as before using
\(1\)
,
\(2\)
,
\(3\)
, and
\(4\)
random sources.
Dissemination Speed
Graph Diameters
References
[1]
S. Chambi, D. Lemire, O. Kaser, and R. Godin,
“
Better bitmap performance with roaring bitmaps
,”
Software: Practice and Experience
, vol. 46, no. 5, pp. 709–719, 2016.
[2]
B. Bollobás,
Random graphs
, 2nd ed. Cambridge University Press, 2001.
[3]
A. K. Hartmann and M. Mézard,
“
Distribution of diameters for erdős-rényi random graphs
,”
Phys. Rev. E
, vol. 97, Mar. 2018.