Negative Binomial Distribution

Author

Sai Narsa Reddy

Abstract

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of successes occurs. For example, we can define rolling a 6 on a die as a success, and rolling any other number as a failure, and ask how many failure rolls will occur before we see the third success. In such a case, the probability distribution of the number of failures that appear will be a negative binomial distribution.

An alternative formulation models the total number of trials instead of the number of failures. For a specified (non-random) number of successes (r), the number of failures (n − r) is random because the total number of trials (n) is random. For example, we could use the negative binomial distribution to model the number of days n (random) a certain machine works (specified by r) before it breaks down.
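The link between the two formulations can be checked numerically. The sketch below reuses the die example (success = rolling a 6) and assumes an illustrative total of 10 rolls; the variable names are introduced here and are not part of the original text.

```r
# Sketch: the "failures" and "total trials" formulations describe the same event.
r <- 3          # required number of successes
p <- 1 / 6      # probability of rolling a 6
n_total <- 10   # illustrative total number of rolls (an assumption)

# P(third 6 occurs exactly on roll n_total), via the number of failures n_total - r
prob_via_failures <- dnbinom(n_total - r, size = r, prob = p)

# Same probability from first principles: r - 1 successes in the first
# n_total - 1 rolls, followed by a success on roll n_total
prob_direct <- dbinom(r - 1, size = n_total - 1, prob = p) * p

all.equal(prob_via_failures, prob_direct)
```

Both expressions count the same sequences of rolls, so the comparison should report that they agree.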

Introduction

The Pascal distribution and Polya distribution are special cases of the negative binomial distribution. A convention among engineers, climatologists, and others is to use “negative binomial” or “Pascal” for the case of an integer-valued stopping-time parameter (r) and use “Polya” for the real-valued case.

For occurrences of associated discrete events, like tornado outbreaks, the Polya distribution can give more accurate models than the Poisson distribution because it allows the variance to differ from the mean. The negative binomial distribution has variance μ/p, where μ = r(1 − p)/p is the mean and r is the number of successes, and it becomes identical to the Poisson distribution in the limit p → 1 for a given mean μ. This makes the distribution a useful alternative to the Poisson distribution, for example as a modification of Poisson regression. In epidemiology, it has been used to model transmission of infectious diseases where the likely number of onward infections may vary considerably from individual to individual and from setting to setting. More generally, it may be appropriate where events have positively correlated occurrences, which produce a larger variance than independent occurrences would because of a positive covariance term.
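The Poisson limit can be illustrated with a small numerical sketch. It holds the mean fixed at an illustrative value and lets the size parameter grow, so that the implied p = size / (size + mean) approaches 1. The helper name nb_poisson_gap is hypothetical; it relies on the mu argument of dnbinom from base R.

```r
# Largest pointwise gap between NB(size, mean = mu) and Poisson(mu) on a grid.
nb_poisson_gap <- function(size, mu, x = 0:20) {
  max(abs(dnbinom(x, size = size, mu = mu) - dpois(x, lambda = mu)))
}

mu <- 2  # illustrative mean, held fixed
for (size in c(1, 10, 1000)) {
  p <- size / (size + mu)  # success probability implied by size and mu
  cat(sprintf("size = %6g  p = %.4f  variance mu/p = %.4f  gap = %.2e\n",
              size, p, mu / p, nb_poisson_gap(size, mu)))
}
```

As size increases, the variance μ/p shrinks toward μ and the gap to the Poisson probabilities shrinks with it.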

Definitions

Probability mass function
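Under the convention used in the abstract, where X counts the failures before the r-th success and p is the probability of success on each trial, the probability mass function can be written as follows. The symbols X and k are notation introduced here.

$$
\Pr(X = k) = \binom{k + r - 1}{k} (1 - p)^{k} p^{r}, \qquad k = 0, 1, 2, \dots
$$

The binomial coefficient counts the ways to arrange k failures among the first k + r − 1 trials, since the last trial must be the r-th success. In R this quantity is dnbinom(k, size = r, prob = p).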

Cumulative distribution function
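With X the number of failures before the r-th success and p the probability of success, the cumulative distribution function is the partial sum of the probability mass function. It can also be expressed through the regularized incomplete beta function, written here as I_p; the symbols k and j are notation introduced here.

$$
F(k) = \Pr(X \le k) = \sum_{j=0}^{k} \binom{j + r - 1}{j} (1 - p)^{j} p^{r} = I_{p}(r,\, k + 1)
$$

The last equality holds because X ≤ k exactly when at least r successes occur in the first k + r trials. In R the left-hand side is pnbinom(k, size = r, prob = p) and the right-hand side is pbeta(p, r, k + 1).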

Properties

Expectation

When the experiment counts successes and stops at the r-th failure, the expected number of successes in a negative binomial distribution with parameters (r, p) is rp/(1 − p). To see this, imagine the experiment is repeated many times: a set of trials is performed until r failures are obtained, then another set of trials, and so on. Write down the number of trials performed in each experiment, a, b, c, ..., and set a + b + c + ... = N. We would expect about Np successes and N(1 − p) failures in total. If the experiment was performed n times, there are exactly nr failures in total, so we would expect nr = N(1 − p), that is, N/n = r/(1 − p). Here N/n is the average number of trials per experiment, which is what we mean by “expectation”. The average number of successes per experiment is therefore N/n − r = r/(1 − p) − r = rp/(1 − p). Swapping the roles of success and failure gives the mean under the convention used in the abstract: the expected number of failures before the r-th success is r(1 − p)/p.
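The counting argument can be mimicked by simulation. The sketch below assumes illustrative values r = 3 and p = 0.4 and a fixed seed. Because rnbinom counts failures before size successes, the roles are swapped by passing prob = 1 − p, so that the draws are successes before the r-th failure.

```r
set.seed(1)            # fixed seed so the sketch is reproducible
r <- 3                 # experiment stops at the r-th failure
p <- 0.4               # probability of success (illustrative value)
n_experiments <- 20000

# Successes before the r-th failure: swap roles by using prob = 1 - p
successes <- rnbinom(n_experiments, size = r, prob = 1 - p)
trials <- successes + r            # trials per experiment: a, b, c, ...
N <- sum(trials)                   # total number of trials over all experiments

c(avg_trials       = N / n_experiments,  # compare with r / (1 - p)
  theory_trials    = r / (1 - p),
  avg_successes    = mean(successes),    # compare with r * p / (1 - p)
  theory_successes = r * p / (1 - p))
```

The simulated averages should sit close to the theoretical values, with the gap shrinking as n_experiments grows.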

Variance

When counting the number of successes given the number r of failures, the variance is rp/(1 − p)². When counting the number of failures before the r-th success, the variance is r(1 − p)/p².
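Both variance formulas can be checked against moments computed directly from the probability mass function. This is a minimal sketch: nb_moments is a hypothetical helper, the infinite sums are truncated at k_max (an assumption that is harmless here because the tail is negligible), and r = 3, p = 0.2 are illustrative values.

```r
# Mean and variance from the PMF, truncating the infinite sums at k_max.
nb_moments <- function(size, prob, k_max = 5000) {
  k <- 0:k_max
  pmf <- dnbinom(k, size = size, prob = prob)
  m <- sum(k * pmf)
  v <- sum((k - m)^2 * pmf)
  c(mean = m, variance = v)
}

r <- 3
p <- 0.2

# Failures before the r-th success
failures_case <- nb_moments(r, p)
rbind(numeric = failures_case,
      formula = c(r * (1 - p) / p, r * (1 - p) / p^2))

# Successes before the r-th failure: swap the roles of success and failure
successes_case <- nb_moments(r, 1 - p)
rbind(numeric = successes_case,
      formula = c(r * p / (1 - p), r * p / (1 - p)^2))
```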

Relation to Binomial Theorem
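The name of the distribution comes from the binomial series with a negative exponent. A short derivation, with q and k as notation introduced here, where k counts failures before the r-th success and p is the probability of success:

$$
(1 - q)^{-r} = \sum_{k=0}^{\infty} \binom{-r}{k} (-q)^{k} = \sum_{k=0}^{\infty} \binom{k + r - 1}{k} q^{k}, \qquad |q| < 1
$$

Setting q = 1 − p and multiplying both sides by p^r shows that the probabilities sum to one:

$$
\sum_{k=0}^{\infty} \binom{k + r - 1}{k} (1 - p)^{k} p^{r} = p^{r} \bigl(1 - (1 - p)\bigr)^{-r} = 1
$$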

Formulas of the Distribution

Example

An oil company has a p = 0.20 chance of striking oil when drilling a well. What is the probability the company drills x = 7 wells to strike oil r = 3 times?

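r = 3        # number of successes required
p = 0.20     # probability of success on each well
n = 7 - r    # number of failures before the r-th success
# exact
dnbinom(x = n, size = r, prob = p)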

[1] 0.049152
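The same probability can be obtained by hand, as a check on the dnbinom call. The third strike must occur on well 7, so the first 6 wells contain exactly 2 strikes. The names x and by_hand are introduced here for illustration.

```r
r <- 3      # strikes required
p <- 0.20   # probability of striking oil on each well
x <- 7      # total wells drilled

# Choose which r - 1 of the first x - 1 wells strike oil,
# then multiply by the probability of r strikes and x - r dry wells
by_hand <- choose(x - 1, r - 1) * p^r * (1 - p)^(x - r)
by_hand
```

Here choose(6, 2) = 15, so the product is 15 × 0.2³ × 0.8⁴ = 0.049152, in agreement with the exact value above.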

# simulated estimate (varies from run to run because no seed is set)
mean(rnbinom(n = 10000, size = r, prob = p) == n)

[1] 0.0486

library(dplyr)


library(ggplot2)

data.frame(x = 0:10, prob = dnbinom(x = 0:10, size = r, prob = p)) %>%
  mutate(Failures = ifelse(x == n, n, "other")) %>%
  ggplot(aes(x = factor(x), y = prob, fill = Failures)) +
  geom_col() +
  geom_text(
    aes(label = round(prob,2), y = prob + 0.01),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0
  ) +
  labs(title = "Probability of r = 3 Successes in X = 7 Trials",
       subtitle = "NB(3,.2)",
       x = "Failed Trials (X - r)",
       y = "Probability")

References

https://en.wikipedia.org/wiki/Negative_binomial_distribution#References

https://rpubs.com/mpfoley73/458738

https://www.cuemath.com/algebra/negative-binomial-distribution/