R Functions dnbinom, pnbinom, and rnbinom

Random varaible \(X\) is distributed \(X \sim NB(r,p)\) with mean \(\mu=\frac{{r}}{{p}}\) and variance \(\sigma^2 = \frac{{r(1-p)}}{{p^2}}\) if \(X\) is the count of independent Bernoulli trials required to achieve the \(r^{th}\) successful trial when the probability of success is constant \(p\). The probability of \(X=n\) trials is \(f(X=n) = \binom{n-1}{r-1}p^r(1-p)^{n-r}\).

R function dnbinom(x, size, prob) is the probability of x failures prior to the rth success (note the difference) when the probability of success is prob. R function pgeom(q, prob, lower.tail) is the cumulative probability (lower.tail = TRUE for left tail, lower.tail = FALSE for right tail) of less than or equal to q failures prior to success. R function rgeom(n, size, prob) returns n random numbers from the geometric distribution x~geom(prob). R function qgeom(p, prob, lower.tail) is the number of failures at the qth percentile (lower.tail = TRUE).

Example

An oil company has a p = 0.20 chance of striking oil when drilling a well. What is the probability the company drills x = 7 wells to strike oil r = 3 times?

r = 3
p = 0.20
n = 7 - r
# exact
dnbinom(x = n, size = r, prob = p)
## [1] 0.049152
# simulated
mean(rnbinom(n = 10000, size = r, prob = p) == n)
## [1] 0.0463
library(dplyr)
library(ggplot2)

data.frame(x = 0:10, prob = dnbinom(x = 0:10, size = r, prob = p)) %>%
  mutate(Failures = ifelse(x == n, n, "other")) %>%
ggplot(aes(x = factor(x), y = prob, fill = Failures)) +
  geom_col() +
  geom_text(
    aes(label = round(prob,2), y = prob + 0.01),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0
  ) +
  labs(title = "Probability of r = 3 Successes in X = 7 Trials",
       subtitle = "NB(3,.2)",
       x = "Failed Trials (X - r)",
       y = "Probability") 

Example

What is the expected number of trials to achieve r = 3 successes when the probability of success is p = 0.2?

r = 3
p = 0.20
# mean
# exact
r / p
## [1] 15
# simulated
mean(rnbinom(n = 10000, size = 3, prob = p)) + r
## [1] 15.0766
# Variance
# exact
r * (1 - p) / p^2
## [1] 60
# simulated
var(rnbinom(n = 100000, size = r, prob = p))
## [1] 59.76289
library(dplyr)
library(ggplot2)

data.frame(x = 1:20, 
           pmf = dnbinom(x = 1:20, size = r, prob = p),
           cdf = pnbinom(q = 1:20, size = r, prob = p, lower.tail = TRUE)) %>%
ggplot(aes(x = factor(x), y = cdf)) +
  geom_col() +
  geom_text(
    aes(label = round(cdf,2), y = cdf + 0.01),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0
  ) +
  labs(title = "Cumulative Probability of X = x failed trials to achieve 3rd success",
       subtitle = "NB(3,.2)",
       x = "Failed Trials (x)",
       y = "probability")