Swarms in Codex

Introduction

How do you download a file in Codex?
Before:

Problems

Implementation:
- naive manifests: all block CIDs that make up a file;
- large, expensive to store and download;
- addressed by the Merkelization effort.
Bitswap:
- designed for IPFS;
- general and expensive.
My issues:
- overlay shape/guarantees;
- performance and load balancing.

Bitswap

General storage (Merkle DAG) with ambitious goals:
- files;
- folders;
- version graphs;
- blockchains;
- yikes.

Bitswap

Any piece of the DAG must be addressable, because it can be shared.
Download: incrementally learn about DAG;
- asking for child blocks by CID at every step.

Efficient Block Lists

Codex exchanges files.
File: totally ordered set of blocks of known length;
- no versioning (mutability);
- no sharing.
Blocks can be identified by a \((\text{CID}, I)\) pair where \(I = [i_1, \cdots, i_n]\) is a list of integers.

From Indexing to Discovery

Roaring [1] bitsets:
- as low as \(0.34\) bit per block on average;
- that is \(752\times\) more efficient than a CID;
- this is all potential: experimentation/benchmarking required.
Enables efficient block discovery with swarms;
- nodes interested in a given file form a swarm;
- exchange lists with the blocks they have.

Discovery, Downloads, and Overlays

Codex Swarm Overlays

Known goals:
- low diameter;
- uniform degree distribution;
- as close to “uniformly random” links as possible (Erdős–Rényi [2]).
Start simple: why can’t we copy Bittorrent?

CodexTorrent (tm)

First Questions

Overlay is “similar” to Erdős–Rényi graphs, but not exactly the same.
The diameter of Erdős–Rényi graphs is relatively uncharted territory [3];
- becomes constant to as the graph gets denser;
- intermediate densities less clear.
Let us analyze, for our graphs:
- degree distributions;
- dissemination speed.

Degree Distribution

Swarms of sizes \(|V_i| = 10 \times 2^i\) with \(i \in \{0, \cdots, 7\}\);
bootstrap size \(1 \leq d \leq \left\lceil\ln |V_i|\right\rceil\).

Degree Distribution

Degree Distribution

Bias towards older nodes.
Some equivalent solutions:
- compensate on sampling (“tracker” returns newer nodes first);
- have “overloaded” nodes refuse new neighbors.
Both make it easier to hijack the swarm with Sybils;
- unclear how to solve;
- IPFS has related vulnerabilities.

Dissemination Speed

How long does it take for information to disseminate?
Without a link capacity model, graph topology (diameter) should dominate.
Multi-BFS experiment on the same graphs as before using \(1\), \(2\), \(3\), and \(4\) random sources.

Dissemination Speed

Graph Diameters

References

[1]

S. Chambi, D. Lemire, O. Kaser, and R. Godin, “Better bitmap performance with roaring bitmaps,” Software: Practice and Experience, vol. 46, no. 5, pp. 709–719, 2016.

[2]

B. Bollobás, Random graphs, 2nd ed. Cambridge University Press, 2001.

[3]

A. K. Hartmann and M. Mézard, “Distribution of diameters for erdős-rényi random graphs,” Phys. Rev. E, vol. 97, Mar. 2018.