Notes 3
Binomial and Poisson Distributions
Loading packages
## Warning: package 'ggplot2' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Binomial distribution
Wikipedia is a great source for information on distributions: Binomial distribution. For the binomial distribution, there are \(n\) independent experiments, each giving a binary result (“success” or “failure”), where \(p\) is the probability of “success”.
Examples
Flipping a (fair) coin 10 times and counting how many heads you get. n = 10, p = 0.5
Rolling a single 6-sided die three times and counting how many times you get a one or two. n = 3, p = 1/3
Discrete vs. Continuous Distributions
The binomial distribution is a discrete probability distribution because the only possible values are 0, 1, 2, 3, …, n.
The normal distribution is a continuous distribution because it can take any value in an interval (in fact, any real number).
Probability mass function
Thus, the binomial distribution has a probability mass function, where there is a probability associated with each individual value.
The normal distribution has a probability density function, where probability is only associated with ranges of numbers. For a continuous distribution, the probability of any specific value is zero.
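The difference between a mass function and a density function can be checked numerically. The following is a minimal sketch using base R's dbinom(), dnorm(), and pnorm(); the variable names and the choice of Binomial(10, 0.5) and the standard normal are only for illustration.

```r
# Binomial(10, 0.5): each value 0..10 carries its own probability,
# and those probabilities sum to 1
pmf_total <- sum(dbinom(0:10, size = 10, prob = 0.5))

# Standard normal: dnorm() returns the height of the density curve,
# which is not itself a probability
density_at_zero <- dnorm(0)

# For a continuous distribution, probability comes from a range, here [-1, 1]
prob_within_one_sd <- pnorm(1) - pnorm(-1)

# A range of zero width has probability zero
prob_exactly_zero <- pnorm(0) - pnorm(0)
```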
The probability mass function for the Binomial(\(n\), \(p\)) distribution is
\[f(x|n, p) = \left(\begin{array}{c} n \\ x \end{array}\right) p^x (1-p)^{n-x},\]
for getting \(x\) successes out of \(n\) trials. Note that \(\left(\begin{array}{c} n \\ x \end{array}\right)\) is the number of combinations (“\(n\) choose \(x\)”), so
\[\left(\begin{array}{c} n \\ x \end{array}\right) = \frac{n!}{x!(n-x)!}\]
Why? Say we are flipping a coin 4 times:
The \(p^x (1-p)^{n-x}\) is the probability of one specific arrangement of \(x\) successes and \(n-x\) failures
\(\left(\begin{array}{c} n \\ x \end{array}\right)\) is the number of arrangements. For example, if we got 1 head out of four tosses, there are 4 arrangements: THTT, HTTT, TTTH, TTHT.
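The two pieces of the formula can be multiplied by hand for the 1-head-in-4-tosses case and compared with R's built-in dbinom(). This is a small sketch; the variable names are chosen here for illustration only.

```r
n <- 4    # number of tosses
x <- 1    # number of heads
p <- 0.5  # probability of heads on a fair coin

# Number of arrangements with exactly 1 head: THTT, HTTT, TTTH, TTHT
n_arrangements <- choose(n, x)

# Probability of any one specific arrangement
prob_one_arrangement <- p^x * (1 - p)^(n - x)

# Probability mass function evaluated by hand
by_hand <- n_arrangements * prob_one_arrangement

# The same value from the built-in function
built_in <- dbinom(x, size = n, prob = p)

all.equal(by_hand, built_in)
```

Here there are 4 arrangements, each with probability 0.0625, so the probability of exactly one head is 0.25.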
Binomial R functions
Following the template _<dist> from Notes 2 with <dist> = binom gives us R functions for the Binomial distribution. Let’s look at the documentation:
So the parameters are given as size = number of trials (\(n\)) and prob = probability of success (\(p\)).
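As a rough illustration of how size and prob are passed, here is a sketch that calls each of base R's four binomial functions. It assumes the blank in the template stands for the usual d, p, q, and r prefixes; the variable names and the seed are arbitrary.

```r
# ?dbinom   # opens the shared help page in an interactive session

# d: probability of exactly 2 successes in 4 trials
d_val <- dbinom(2, size = 4, prob = 0.5)

# p: probability of 2 or fewer successes in 4 trials
p_val <- pbinom(2, size = 4, prob = 0.5)

# q: smallest x such that P(X <= x) is at least 0.5
q_val <- qbinom(0.5, size = 4, prob = 0.5)

# r: five random draws from Binomial(4, 0.5)
set.seed(1)  # arbitrary seed so the draws are reproducible
draws <- rbinom(5, size = 4, prob = 0.5)
```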
dbinom
For instance, we can use dbinom to give us the probability of 0, 1, 2, 3, 4 heads out of 4 tosses:
## [1] 0.0625 0.2500 0.3750 0.2500 0.0625
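The output above is consistent with a call like the following. This is a reconstruction, not necessarily the exact call used in the notes, and the name probs is chosen here for illustration.

```r
# Probability of 0, 1, 2, 3, 4 heads in 4 tosses of a fair coin;
# dbinom() is vectorized over its first argument
probs <- dbinom(0:4, size = 4, prob = 0.5)
probs
```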
What is the value of dbinom() for other \(x\) values (for instance, decimals)?
Does it work? No: the binomial distribution takes only integer values, so dbinom() gives a warning and returns 0 for a non-integer \(x\).
## Warning in dbinom(0.4, size = 4, prob = 0.5): non-integer x = 0.400000
## [1] 0
Let’s plot the Binomial distribution for \(p = 1/6\) and \(n = 60\). This might be used for a dice rolling game like Settlers of Catan.
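# Binomial(60, 1/6) probabilities for every possible count 0..60
tibble(
  x = 0:60,
  Probability = dbinom(x, size = 60, prob = 1/6)
) |>
  ggplot() +
  # one vertical segment per integer value, from 0 up to its probability
  geom_segment(aes(x = x, xend = x, y = Probability, yend = 0))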
Why do we display the distribution with these vertical lines instead of a density curve? Because only integer values have nonzero probability.
Let’s change the above so the x-axis gives the proportion of successes.
# Same probabilities, plotted against the proportion of successes x/60
tibble(
  x = 0:60,
  Probability = dbinom(x, size = 60, prob = 1/6),
  proportion = x/60
) |>
  ggplot() +
  geom_segment(aes(x = proportion, xend = proportion, y = Probability, yend = 0))
Poisson distribution
Distribution for the count of events: Poisson distribution
Example: Consider a call center that receives calls at random, at an average rate of \(\lambda = 3\) calls per minute at all times of day. If the calls are independent, receiving one does not change the probability of when the next one will arrive. Under these assumptions, the number \(x\) of calls received during any minute has a Poisson probability distribution.
Probability mass function:
\[f(x | \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}\]
The abbreviation in _<dist> is <dist> = pois. Let’s look at the documentation:
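As a quick check that dpois() matches the probability mass function above, the formula can be evaluated by hand and compared. This is a minimal sketch; the variable names and the range 0 to 5 are chosen here only for illustration.

```r
# ?dpois   # opens the help page in an interactive session

lambda <- 3  # mean number of events
x <- 0:5     # counts at which to evaluate the mass function

# Probability mass function written out term by term
by_hand <- lambda^x * exp(-lambda) / factorial(x)

# The same values from the built-in function
built_in <- dpois(x, lambda = lambda)

all.equal(by_hand, built_in)
```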
Let’s use this to plot the probability of \(x\) = the number of calls in the next minute in the call center described above with mean \(\lambda = 3\) calls per minute:
# Poisson(3) probabilities for 0..15 calls in the next minute
tibble(
  x = seq(from = 0, to = 15, by = 1),
  Probability = dpois(x, lambda = 3)
) |>
  ggplot() +
  geom_segment(aes(x = x, xend = x, y = Probability, yend = 0))
What is the probability of getting 4 or fewer calls in the next minute?
## [1] 0.8152632
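One way to arrive at this value, sketched here as a reconstruction rather than the exact call used in the notes, is the cumulative distribution function ppois(), which gives P(X ≤ q). Summing dpois() over 0 to 4 gives the same result.

```r
# P(X <= 4) for a Poisson distribution with mean 3
p_at_most_4 <- ppois(4, lambda = 3)

# Equivalent: add up the individual probabilities of 0, 1, 2, 3, 4 calls
p_sum <- sum(dpois(0:4, lambda = 3))

p_at_most_4
```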
What is the probability of getting more than 5 calls in the next minute?
## [1] 0.08391794
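"More than 5" is the complement of "5 or fewer", so this value can be obtained from ppois() either by subtracting from 1 or by asking for the upper tail. The sketch below is a reconstruction, not necessarily the call used in the notes.

```r
# P(X > 5) via the upper tail of the Poisson(3) distribution
p_more_than_5 <- ppois(5, lambda = 3, lower.tail = FALSE)

# Equivalent: 1 - P(X <= 5)
p_complement <- 1 - ppois(5, lambda = 3)

p_more_than_5
```

Note that the cutoff is 5, not 6: ppois(5, ...) covers the counts 0 through 5, so its complement covers 6 and above.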