Swarms in Codex

Introduction

  • How do you download a file in Codex?
  • Before:

Problems

  • Implementation:
    • naive manifests: all block CIDs that make up a file;
    • large, expensive to store and download;
    • addressed by the Merkelization effort.
  • Bitswap:
    • designed for IPFS;
    • general and expensive.
  • My issues:
    • overlay shape/guarantees;
    • performance and load balancing.

Bitswap

  • General storage (Merkle DAG) with ambitious goals:
    • files;
    • folders;
    • version graphs;
    • blockchains;
    • yikes.

Bitswap

  • Any piece of the DAG must be addressable, because it can be shared.
  • Download: incrementally learn about DAG;
    • asking for child blocks by CID at every step.

Efficient Block Lists

  • Codex exchanges files.
  • File: totally ordered set of blocks of known length;
    • no versioning (mutability);
    • no sharing.
  • Blocks can be identified by a \((\text{CID}, I)\) pair where \(I = [i_1, \cdots, i_n]\) is a list of integers.

From Indexing to Discovery

  • Roaring [1] bitsets:
    • as low as \(0.34\) bit per block on average;
    • that is \(752\times\) more efficient than a CID;
    • this is all potential: experimentation/benchmarking required.
  • Enables efficient block discovery with swarms;
    • nodes interested in a given file form a swarm;
    • exchange lists with the blocks they have.

Discovery, Downloads, and Overlays

Codex Swarm Overlays

  • Known goals:
    • low diameter;
    • uniform degree distribution;
    • as close to “uniformly random” links as possible (Erdős–Rényi [2]).
  • Start simple: why can’t we copy Bittorrent?

CodexTorrent (tm)

First Questions

  • Overlay is “similar” to Erdős–Rényi graphs, but not exactly the same.
  • The diameter of Erdős–Rényi graphs is relatively uncharted territory [3];
    • becomes constant to as the graph gets denser;
    • intermediate densities less clear.
  • Let us analyze, for our graphs:
    • degree distributions;
    • dissemination speed.

Degree Distribution

  • Swarms of sizes \(|V_i| = 10 \times 2^i\) with \(i \in \{0, \cdots, 7\}\);
  • bootstrap size \(1 \leq d \leq \left\lceil\ln |V_i|\right\rceil\).

Degree Distribution

Degree Distribution

  • Bias towards older nodes.
  • Some equivalent solutions:
    • compensate on sampling (“tracker” returns newer nodes first);
    • have “overloaded” nodes refuse new neighbors.
  • Both make it easier to hijack the swarm with Sybils;
    • unclear how to solve;
    • IPFS has related vulnerabilities.

Dissemination Speed

  • How long does it take for information to disseminate?
  • Without a link capacity model, graph topology (diameter) should dominate.
  • Multi-BFS experiment on the same graphs as before using \(1\), \(2\), \(3\), and \(4\) random sources.

Dissemination Speed

Graph Diameters

References

[1]
S. Chambi, D. Lemire, O. Kaser, and R. Godin, Better bitmap performance with roaring bitmaps,” Software: Practice and Experience, vol. 46, no. 5, pp. 709–719, 2016.
[2]
B. Bollobás, Random graphs, 2nd ed. Cambridge University Press, 2001.
[3]
A. K. Hartmann and M. Mézard, Distribution of diameters for erdős-rényi random graphs,” Phys. Rev. E, vol. 97, Mar. 2018.