The different probability distribution functions all start with one of the following 4 letters:
d \(\rightarrow\) density: Finds the probability of a specific value: \(P(Y = a)\)
p \(\rightarrow\) cumulative probability: Finds the probability of the specific value and all values less than it: \(P(Y \le a)\)
q \(\rightarrow\) quantile: Finds the smallest value of the random variable, \(a\), so that \(P(Y \le a) \ge p\)
r \(\rightarrow\) random: Generates random values from the distribution
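As a quick sketch of the naming pattern, here are the prefixes applied to the Poisson distribution covered in this section (prefix + `pois`). The mean of 4 and the input values are assumptions picked only for illustration.

```r
lambda <- 4  # assumed mean, chosen only for illustration

dpois(x = 2, lambda = lambda)    # d: P(Y = 2)
ppois(q = 2, lambda = lambda)    # p: P(Y <= 2)
qpois(p = 0.5, lambda = lambda)  # q: smallest a with P(Y <= a) >= 0.5
rpois(n = 3, lambda = lambda)    # r: three random draws (output varies run to run)
```

The same four prefixes work with the other distribution suffixes in base R, such as `binom` and `nbinom`.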
The Poisson distribution is used for unbounded count data, similar to a negative binomial.
It is typically used to count how often a certain outcome occurs during an event, such as a fixed period of time.
Unlike the other 3 distributions, a Poisson has only 1 parameter: the average number of occurrences during the event.
The parameter is often denoted by \(\lambda\), although the letter \(\mu\) is occasionally used since it is an average.
The equation to calculate the probability is:
\[P(Y = a) = \frac{\lambda^a}{a!} e^{-\lambda} \]
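One way to check that \(\lambda\) really is the average is to sum \(a \cdot P(Y = a)\) over the possible values. The sketch below uses the formula's built-in equivalent, `dpois()`, and assumes that cutting the infinite range off at 100 loses only a negligible amount of probability when \(\lambda = 4\).

```r
lambda <- 4
a <- 0:100  # truncated support; assumed wide enough for lambda = 4

# The probabilities should sum to (very nearly) 1
total_prob <- sum(dpois(x = a, lambda = lambda))

# The probability-weighted average should be (very nearly) lambda
mean_y <- sum(a * dpois(x = a, lambda = lambda))

total_prob
mean_y
```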
dpois()
dpois() has 2 arguments:
x = the value of the random variable
lambda = the mean of the Poisson distribution
Let’s look at an example with \(\lambda = 4\) and \(a = 5\).
# P(Y = 5 | lambda = 4)
lambda <- 4; a <- 5
# Manual probability:
lambda ^ a / gamma(a + 1) * exp(-lambda)
## [1] 0.1562935
# Using the function
dpois(x = 5, lambda = 4)
## [1] 0.1562935
# Let's look at values of 0 to 20 and find their probabilities
pois_df <-
tibble(
Y = 0:20,
`P(Y = a)` = dpois(x = Y, lambda = lambda)
)
pois_df |>
round(digits = 4)
## # A tibble: 21 × 2
## Y `P(Y = a)`
## <dbl> <dbl>
## 1 0 0.0183
## 2 1 0.0733
## 3 2 0.146
## 4 3 0.195
## 5 4 0.195
## 6 5 0.156
## 7 6 0.104
## 8 7 0.0595
## 9 8 0.0298
## 10 9 0.0132
## # ℹ 11 more rows
ggplot(
data = pois_df,
mapping = aes(x = Y, y = `P(Y = a)`)
) +
geom_col(fill = "steelblue") +
labs(
title = 'Poisson(lambda = 4)',
x = "a",
y = "P(Y = a)"
) +
theme(plot.title = element_text(hjust = 0.5, size = 16)) +
scale_y_continuous(expand = c(0, 0, 0.05, 0))
ppois()
ppois() is used to find \(P(Y \le a)\). It can also find \(P(Y > a)\) if lower.tail = FALSE is included (R's partial argument matching lets this be abbreviated to lower = F).
# P(Y <= 5 | lambda = 4)
ppois(q = 5, lambda = 4)
## [1] 0.7851304
# P(Y > 5 | lambda = 4)
ppois(q = 5, lambda = 4, lower = F)
## [1] 0.2148696
# P(Y >= 5 | lambda = 4)
ppois(q = 5 - 1, lambda = 4, lower = F)
## [1] 0.3711631
# adding the cumulative probabilities to the data frame
pois_df <-
pois_df |>
mutate(
`P(Y <= a)` = ppois(q = Y, lambda = 4)
)
pois_df |>
round(digits = 4)
## # A tibble: 21 × 3
## Y `P(Y = a)` `P(Y <= a)`
## <dbl> <dbl> <dbl>
## 1 0 0.0183 0.0183
## 2 1 0.0733 0.0916
## 3 2 0.146 0.238
## 4 3 0.195 0.434
## 5 4 0.195 0.629
## 6 5 0.156 0.785
## 7 6 0.104 0.889
## 8 7 0.0595 0.949
## 9 8 0.0298 0.979
## 10 9 0.0132 0.992
## # ℹ 11 more rows
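The relationships among these probabilities can be checked directly. The sketch below is an illustration rather than part of the original walkthrough: it rebuilds \(P(Y \le 5)\) by summing individual probabilities, recovers \(P(Y > 5)\) as a complement, and computes an interval probability, \(P(3 \le Y \le 6)\), as a difference of two cumulative probabilities.

```r
lambda <- 4

# P(Y <= 5) is the sum of P(Y = 0), ..., P(Y = 5)
cum_by_sum <- sum(dpois(x = 0:5, lambda = lambda))
all.equal(cum_by_sum, ppois(q = 5, lambda = lambda))

# P(Y > 5) is the complement of P(Y <= 5)
upper_by_complement <- 1 - ppois(q = 5, lambda = lambda)
all.equal(upper_by_complement, ppois(q = 5, lambda = lambda, lower.tail = FALSE))

# P(3 <= Y <= 6) = P(Y <= 6) - P(Y <= 2)
interval_prob <- ppois(q = 6, lambda = lambda) - ppois(q = 2, lambda = lambda)
all.equal(interval_prob, sum(dpois(x = 3:6, lambda = lambda)))
```

Note the `q = 2` in the last step: because the distribution is discrete, including 3 in the interval means subtracting everything up to and including 2.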
qpois()
qpois() can be used to find the smallest \(a\) where \(P(Y \le a) \ge p\)
For instance, a drive-thru averages 20 customers per hour. What is the 90th percentile for the number of customers in an hour?
qpois(p = 0.90, lambda = 20)
## [1] 26
# Let's look at the table of probabilities for Y with probability +-5% from 90%
tibble(
Y = 10:30,
`P(Y <= a)` = ppois(q = Y, lambda = 20)
) |>
filter(between(`P(Y <= a)`, 0.85, 0.95))
## # A tibble: 3 × 2
## Y `P(Y <= a)`
## <int> <dbl>
## 1 25 0.888
## 2 26 0.922
## 3 27 0.948
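So 25 is the 88.8th percentile and 26 is the 92.2nd percentile. Since 26 is the smallest value with \(P(Y \le a) \ge 0.90\), qpois() returns 26 as the 90th percentile.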
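The same search can be written out by hand, which makes the definition of the quantile concrete. This is a minimal sketch, not how qpois() is implemented internally, and it assumes that candidate values from 0 to 100 are wide enough for \(\lambda = 20\).

```r
p <- 0.90
lambda <- 20

a <- 0:100  # candidate values; assumed wide enough for lambda = 20
cum_prob <- ppois(q = a, lambda = lambda)

# Smallest a whose cumulative probability reaches p
manual_quantile <- min(a[cum_prob >= p])

manual_quantile == qpois(p = p, lambda = lambda)
```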
rpois()
rpois() can be used to generate random Poisson variables with a certain mean, \(\lambda\)
# Generating 20 Poisson R.V. with lambda = 4
rpois(n = 20, lambda = 4)
## [1] 4 6 6 4 2 2 7 3 5 4 2 5 4 4 3 1 3 4 4 3
Let’s generate 100,000 Poisson RVs and plot them:
N <- 1e5  # 100,000 draws
tibble(
Y = rpois(n = N, lambda = 4)
) |>
# Counting how often each Y occurs
count(Y) |>
# Creating the graph
ggplot(
mapping = aes(x = factor(Y), y = n/N)
) +
geom_col(fill = "steelblue") +
labs(
x = "Randomly Generated Poisson",
y = "Probability"
) +
scale_y_continuous(expand = c(0, 0, 0.05, 0))
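To see how closely the simulated proportions track the theoretical probabilities, the two can be placed side by side. The sketch below uses base R only. The seed value is an arbitrary assumption added so the draws are repeatable, and the comparison is limited to values 0 through 10 for brevity.

```r
set.seed(2024)  # arbitrary seed, assumed here only for repeatability
N <- 1e5
lambda <- 4

draws <- rpois(n = N, lambda = lambda)

# Observed proportion of each value from 0 to 10 (zero counts are kept)
observed <- as.numeric(table(factor(draws, levels = 0:10))) / N

# Theoretical probabilities for the same values
expected <- dpois(x = 0:10, lambda = lambda)

comparison <- data.frame(Y = 0:10, observed = observed, expected = expected)
round(comparison, digits = 4)

# Largest gap between simulation and theory
max(abs(observed - expected))
```

With this many draws the gap should be small, and it should shrink further as N grows.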