Notes 3

Binomial and Poisson Distributions

Loading packages

library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Binomial distribution

Wikipedia is a great source for information on distributions; see its article on the Binomial distribution. The binomial distribution describes the number of “successes” in \(n\) independent experiments, each giving a binary result (“success” or “failure”) with the same probability \(p\) of “success”.

Examples

  • Flipping a (fair) coin 10 times and counting how many heads you get. n = 10, p = 0.5

  • Rolling a single 6-sided die three times and counting how many times you get a one or two. n = 3, p = 1/3

Discrete vs. Continuous Distributions

The binomial distribution is a discrete probability distribution because the only possible values are 0, 1, 2, 3, …, n.

The normal distribution is a continuous distribution because it can take any value in an interval.

Probability mass function

Thus, the binomial distribution has a probability mass function, where there is a probability associated with each individual value.

The normal distribution has a probability density function, where probability is only associated with ranges of numbers. For a continuous distribution, the probability of any specific value is zero.
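To make the "ranges only" idea concrete, here is a minimal sketch using pnorm() for a standard normal. The choice of interval widths is arbitrary and only for illustration: as the range around 0 shrinks, its probability shrinks toward zero.

```r
# Probability of a range: area under the standard normal density between -1 and 1
pnorm(1) - pnorm(-1)

# Shrinking the range around 0 drives the probability toward zero
widths <- c(1, 0.1, 0.01, 0.001)
range_probs <- pnorm(widths / 2) - pnorm(-widths / 2)
range_probs
```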

The probability mass function for the Binomial(\(n\), \(p\)) distribution is

\[f(x|n, p) = \left(\begin{array}{c} n \\ x \end{array}\right) p^x (1-p)^{n-x},\]

for getting \(x\) successes out of \(n\) trials. Note that \(\left(\begin{array}{c} n \\ x \end{array}\right)\) is the number of combinations (“\(n\) choose \(x\)”), so

\[\left(\begin{array}{c} n \\ x \end{array}\right) = \frac{n!}{x!(n-x)!}\]

Why? Say we are flipping a coin 4 times.

  • \(p^x (1-p)^{n-x}\) is the probability of one particular arrangement of \(x\) successes and \(n-x\) failures

  • \(\left(\begin{array}{c} n \\ x \end{array}\right)\) is the number of arrangements. For example, if we got 1 head out of four tosses, there are 4 arrangements: THTT, HTTT, TTTH, TTHT.
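The two pieces above can be checked by hand in R. This is only a sketch for the case of 1 head in 4 fair tosses; the variable names are made up for illustration, while choose() and factorial() are base R functions.

```r
n <- 4    # number of tosses
x <- 1    # number of heads
p <- 0.5  # probability of heads on one toss

# Number of arrangements with exactly one head (THTT, HTTT, TTTH, TTHT)
arrangements <- choose(n, x)

# The same count from the factorial formula n! / (x! (n - x)!)
arrangements_factorial <- factorial(n) / (factorial(x) * factorial(n - x))

# Probability of any one particular arrangement
one_arrangement <- p^x * (1 - p)^(n - x)

# Probability mass function evaluated by hand
by_hand <- arrangements * one_arrangement
by_hand
```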

Binomial R functions

Following the template _<dist> from Notes 2 with <dist> = binom gives us R functions for the Binomial distribution. Let’s look at the documentation:

# ?dbinom

So the parameters are given as

  • size = number of trials (\(n\)) and

  • prob = probability of success (\(p\))

dbinom

For instance, we can use dbinom to give us the probability of 0, 1, 2, 3, 4 heads out of 4 tosses:

dbinom(0:4, size = 4, prob = .5)
## [1] 0.0625 0.2500 0.3750 0.2500 0.0625

What is the value of dbinom() for other \(x\) values (for instance, decimals)?

Does it work? No: the binomial distribution takes only integer values, so dbinom() returns 0 for a non-integer \(x\) and issues a warning.

dbinom(.4, size = 4, prob = .5)
## Warning in dbinom(0.4, size = 4, prob = 0.5): non-integer x = 0.400000
## [1] 0

Let’s plot the Binomial distribution for \(p = 1/6\) and \(n = 60\). This might be used for a dice rolling game like Settlers of Catan.

tibble(
  x = 0:60, 
  Probability = dbinom(x, size = 60, prob = 1/6)
) |> 
  ggplot() + 
  geom_segment(aes(x = x, xend = x, y = Probability, yend = 0))

Why do we display the distribution with these vertical lines instead of a density curve? Because only integer values have nonzero probability.

Let’s change the above so the x-axis gives the proportion of successes.

tibble(
  x = 0:60, 
  Probability = dbinom(x, size = 60, prob = 1/6),
  proportion = x/60
) |> 
  ggplot() + 
  geom_segment(aes(x = proportion, xend = proportion, y = Probability, yend = 0))
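As a sanity check on the plotted values, the sketch below confirms that the probabilities sum to 1 and that the expected number of successes is \(np = 10\). The mean formula \(np\) is a standard fact about the binomial distribution rather than something derived in these notes.

```r
x <- 0:60
probs <- dbinom(x, size = 60, prob = 1/6)

# Total probability across all possible counts
sum(probs)

# Expected number of successes: sum of x * P(X = x), which should equal n * p
sum(x * probs)

# Most likely number of successes (the tallest line in the plot)
x[which.max(probs)]
```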

pbinom

How are the values given by pbinom related to the values given by dbinom?

dbinom(10, size = 60, prob = 1/6)
## [1] 0.1370131
pbinom(10, size = 60, prob = 1/6)
## [1] 0.5833866

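pbinom gives the cumulative probability \(P(X \le x)\), the sum of the dbinom values from 0 up to \(x\). For example, with the same size and prob, pbinom(3) gives you dbinom(0) + dbinom(1) + dbinom(2) + dbinom(3).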
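This relationship can be verified numerically. The sketch below reuses \(n = 60\) and \(p = 1/6\) from above and compares the running sum of the dbinom values with pbinom; the variable names are illustrative only.

```r
x <- 0:60
pmf <- dbinom(x, size = 60, prob = 1/6)
cdf <- pbinom(x, size = 60, prob = 1/6)

# The cumulative sum of the mass function should match pbinom at every x
all.equal(cumsum(pmf), cdf)

# The single case from above: P(X <= 10) as a sum of individual probabilities
sum_up_to_10 <- sum(dbinom(0:10, size = 60, prob = 1/6))
sum_up_to_10
```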

Poisson distribution

The Poisson distribution is a discrete distribution for the count of events in a fixed interval of time.

Example: Consider a call center which receives, randomly, an average of \(\lambda = 3\) calls per minute at all times of day. If the calls are independent, receiving one does not change the probability of when the next one will arrive. Under these assumptions, the number \(x\) of calls received during any minute has a Poisson probability distribution.

Probability mass function:

\[f(x | \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}\]

The abbreviation in _<dist> is <dist> = pois. Let’s look at the documentation:

# ?dpois
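Before plotting, the formula can be compared with dpois directly. This is a minimal sketch for \(\lambda = 3\) and \(x = 0, \dots, 15\); the variable names are illustrative only.

```r
lambda <- 3
x <- 0:15

# Probability mass function evaluated from the formula
by_hand <- lambda^x * exp(-lambda) / factorial(x)

# Should agree with R's built-in Poisson mass function
all.equal(by_hand, dpois(x, lambda = lambda))
```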

Let’s use this to plot the probability of \(x\) = the number of calls in the next minute in the call center described above with mean \(\lambda = 3\) calls per minute:

tibble(
  x = seq(from = 0, to = 15, by = 1), 
  Probability = dpois(x, lambda = 3)
) |> 
  ggplot() + 
  geom_segment(aes(x = x, xend = x, y = Probability, yend = 0))

What is the probability of getting 4 or fewer calls in the next minute?

ppois(4, lambda = 3)
## [1] 0.8152632

What is the probability of getting more than 5 calls in the next minute?

1 - ppois(5, lambda = 3)
## [1] 0.08391794
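Both answers can be cross-checked in other ways. The sketch below sums dpois for the "at most 4" case and uses the lower.tail = FALSE argument of ppois for the "more than 5" case; the variable names are illustrative only.

```r
lambda <- 3

# P(X <= 4) as a sum of individual probabilities
p_at_most_4 <- sum(dpois(0:4, lambda = lambda))

# P(X > 5) taken directly from the upper tail
p_more_than_5 <- ppois(5, lambda = lambda, lower.tail = FALSE)

p_at_most_4
p_more_than_5
```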