library(tidyverse)
library(patchwork)

Problem

Block hashes provide us with a sequence of random numbers \(r_i\) at discrete time instants \(i\) which can be converted into Bernoulli trials. This allows us to easily construct a discrete renewal process in which inter-arrival times are geometrically distributed by simply picking a probability \(p\) and then constructing a function which, for each random number \(r_i\), generates a Bernoulli trial that succeeds with probability \(p\). The average inter-arrival time for such a process will be \(1/p\). Each renewal corresponds to a proof that needs to be submitted on-chain.
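
As a minimal sketch of this mechanism (the `hash_to_unit` helper and the 7-hex-digit truncation are illustrative assumptions, not part of any protocol), each trial reduces to comparing a hash-derived number in \([0, 1)\) against \(p\):

hash_to_unit <- function(hash_hex) {
  # Use the first 7 hex digits (28 bits) so the value fits R's 32-bit integers.
  strtoi(substr(hash_hex, 1, 7), base = 16L) / 16^7
}
# One Bernoulli trial with success probability p, driven by a block hash.
bernoulli_from_hash <- function(hash_hex, p) hash_to_unit(hash_hex) < p
# E.g., with p = 1/24, each block triggers a renewal with probability 1/24.
bernoulli_from_hash('a3f91c7e0b42d8f1', 1/24)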

The problem with geometric inter-arrivals is that they are:

  1. Unbounded. Since the geometric distribution has unbounded support, the time between two successive renewals and, therefore, between two consecutive proofs, can be arbitrarily large.
  2. Skewed. The shape of the geometric distribution means most renewals will be short. Setting a mean of \(24\) therefore does not mean that one proof per day is representative of the load one should prepare for: in fact, one should prepare for periods in which many proofs need to be submitted in a single day, followed by long periods in which none need to be submitted at all. This results in a bursty, uneven load, which is not desirable, as the short calculation below illustrates.
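
To put numbers on this: with \(p = 1/24\), nearly two thirds of renewals arrive within the mean, while multi-day droughts are far from rare. R's `pgeom` (which counts failures before the first success, hence the off-by-one) makes this easy to check:

p <- 1/24
pgeom(23, p)       # P(X <= 24) ~ 0.64: most renewals are short
1 - pgeom(71, p)   # P(X > 72)  ~ 0.047: three-day droughts are not rare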

In the remainder of this document, I will show how to generate uniform inter-arrival times using Bernoulli trials instead. This method can be readily converted to use block hashes in a decentralized fashion, as before, but the characteristics of the uniform distribution are much more desirable. In particular, the uniform distribution is:

  1. Bounded. Parameters can be tuned such that nodes will need to provide at least one proof per \(24\text{h}\);
  2. Not skewed. The uniform distribution does not concentrate mass around short renewals: any value is as likely as any other. This makes it easier to maintain non-bursty loads, especially when holding multiple datasets¹.

Generating Uniform Variates

At every discrete time instant, we are allowed to perform one Bernoulli trial with probability \(p_i\). Since we are constructing a distribution with bounded support, the number of time instants that can go by without a proof is also bounded. For simplicity, let us set the support of our distribution to \([1, 24]\), which means we need to submit at least one proof per day and, on average, will be submitting one proof every \(12.5\text{h}\)².

Assume, without loss of generality, that we are at time zero and have just submitted a proof; i.e., this is not a delayed renewal process. One way of seeing this, then, is that we want each of the next \(24\) time slots to be equally likely to host the next renewal (proof); i.e., each slot should be selected with probability \(1/24\).

Since uniform distributions are not memoryless, implementing this with Bernoulli trials means we need to condition on each failed trial and adjust the probability weights accordingly. Let \(X\) be a random variable representing the next time instant, with respect to the last renewal, at which we will need to provide a proof. We want \(P(X = j) = 1/24\) for every \(j \in [1, 24]\).

First slot. This is easy: the first draw is done with \(p_1 = 1/24\). This ensures \(P(X = 1) = 1/24\).

Second slot. If we got to the second time instant, then we know \(X > 1\), so we condition on that:

\[ p_2 = P(X = 2 \mid X > 1) = \frac{P(X = 2 \cap X > 1)}{P(X > 1)} = \frac{\frac{1}{24}}{\frac{23}{24}} = \frac{1}{23} \]

\(k^{th}\) slot. If we are at the \(k^{th}\) slot, where \(1 \leq k \leq 24\), then, in general:

\[ p_k = P(X = k \mid X > k - 1) = \frac{P(X = k \cap X > k - 1)}{P(X > k - 1)} = \frac{\frac{1}{24}}{\frac{24 - k + 1}{24}} = \frac{1}{24 - k + 1} \]

Note that \(p_{24} = 1\): if we reach the last slot without a renewal, the trial is guaranteed to succeed. This is precisely what makes the support bounded.
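
As a quick numerical check (a sketch, using only the \(p_k\) just derived), we can verify that these conditional probabilities do induce a uniform marginal: the first success lands on slot \(j\) with probability \(p_j \prod_{i < j} (1 - p_i)\), which should equal \(1/24\) for every \(j\).

k <- 24
p <- 1/(k - (1:k) + 1)                # p_j = 1/(24 - j + 1)
# P(X = j) = p_j * prod_{i < j} (1 - p_i)
marginal <- p * cumprod(c(1, 1 - p[-k]))
all.equal(marginal, rep(1/k, k))      # TRUE: every slot has probability 1/24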

Simulation

We simulate by generating Bernoulli trials with the prescribed probabilities until the first success, recording the index of that success relative to the last renewal; this is the time elapsed until the next proof. We then repeat this experiment \(50\,000\) times and look at the resulting distribution.

# Vectorised Bernoulli trials: element i succeeds with probability p[i].
bernoulli <- function(p) runif(length(p)) < p
# Draw windows of trials until the first success; return its overall index,
# i.e., the number of time instants elapsed until the next renewal.
single_experiment <- function(p) {
  offset <- 0
  while (TRUE) {
    outcomes <- bernoulli(p)
    if (!any(outcomes)) {
      # No success in this window; shift the offset and draw a new window.
      offset <- offset + length(p)
    } else {
      # which.max returns the index of the first TRUE, i.e., the first success.
      return(offset + which.max(outcomes))
    }
  }
}

We use a (discrete) uniform distribution with mean \(12.5\) and support \([1, 24]\).

# Conditional probabilities p_j = 1/(k - j + 1) for slots j = 1, ..., k.
unif_p <- function(k) 1/(k - (1:k) + 1)
outcomes_unif <- sapply(1:50000, function(i) single_experiment(unif_p(24)))
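
As a quick sanity check on the sample (values are approximate, from one run): the mean should be close to \(12.5\) and, by construction, the maximum can never exceed \(24\).

mean(outcomes_unif)   # ~12.5
max(outcomes_unif)    # 24: the support is bounded, as intended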

We will compare this with a geometric distribution with mean \(12.5\).

# Constant success probability 1/k; note rep() truncates the window to
# floor(k) = 12 trials, which is harmless since the process is memoryless.
geom_p <- function(k) rep(1/k, k)
outcomes_geom <- sapply(1:50000, function(i) single_experiment(geom_p(12.5)))
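
The same check on the geometric sample highlights the contrast: the mean matches, but the sample maximum runs far past \(24\) (exact values vary from run to run).

mean(outcomes_geom)   # ~12.5
max(outcomes_geom)    # typically well above 100: the support is unbounded

Plotting the two histograms side by side with patchwork:
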
ggplot(tibble(x = outcomes_unif)) +
  geom_histogram(aes(x = x, y = after_stat(density)), binwidth = 1, col = 'black') +
  geom_vline(xintercept = 12.5, col = 'orange', lty = 2, lwd = 1.5) +
  ggtitle('Uniform Inter-Proof Times') +
  xlab('time between proofs (hours)') +
  ylab('density') |
ggplot(tibble(x = outcomes_geom)) +
  geom_histogram(aes(x = x, y = after_stat(density)), binwidth = 1, col = 'black') +
  geom_vline(xintercept = 12.5, col = 'orange', lty = 2, lwd = 1.5) +
  ggtitle('Geometric Inter-Proof Times') +
  xlab('time between proofs (hours)') +
  ylab('density')


  1. As I was writing this, I started to wonder: if we want values around the average to be more likely, then we should probably generate Gaussian variates, or truncated Gaussian variates for bounded support.↩︎

  2. Having an average of \(12\) would require us to set the support of the distribution to \([0, 24]\), which would lead to \(25\) buckets and include \(0\); i.e., one might have to provide a proof immediately after another. Setting the support to \([1, 24]\) makes more sense, even if it results in a slightly off-center average.↩︎