Initial Description: Four Distribution Letters

The different probability distribution functions all start with one of the following 4 letters:

  1. d \(\rightarrow\) density: Find the probability for a specific value: \(P(Y=a)\)

  2. p \(\rightarrow\) probability: Find the cumulative probability, that is, the probability of the specific value and all values less than it: \(P(Y \le a)\)

  3. q \(\rightarrow\) quantile: Find the smallest value of the random variable, \(a\), so that \(P(Y \le a) \ge p\)

  4. r \(\rightarrow\) random: Generate a random value of the random variable Y given the parameters
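
As a quick illustration of the four prefixes, the sketch below calls each one for a Poisson distribution (covered in the next section). The mean of 4, the value 2, and the variable names are arbitrary choices made for this example only.

```r
# Illustrative values only: a Poisson distribution with mean 4
lambda <- 4

d_val  <- dpois(x = 2, lambda = lambda)    # d: P(Y = 2)
p_val  <- ppois(q = 2, lambda = lambda)    # p: P(Y <= 2)
q_val  <- qpois(p = 0.5, lambda = lambda)  # q: smallest a with P(Y <= a) >= 0.5

set.seed(1)                                # makes the random draws repeatable
r_vals <- rpois(n = 3, lambda = lambda)    # r: three random values of Y

c(d = d_val, p = p_val, q = q_val)
r_vals
```

The same prefixes attach to the other distribution names in base R (for example `binom` or `norm`), with each function taking that distribution's own parameters.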

Poisson Distribution

The Poisson distribution is used for unbounded count data, similar to the negative binomial.

It is most often used to count how many times a certain outcome occurs during an event:

  • Number of eggs laid by a mosquito
  • Number of cars that pass through an intersection during rush hour
  • Number of customers at a drive-thru in a day

Unlike the other 3 distributions, a Poisson has only 1 parameter: the average number of occurrences during the event.

  • 250 eggs laid on average
  • 1750 cars on average during rush hour
  • 123 customers on a typical day

The parameter is often denoted by \(\lambda\), although the letter \(\mu\) is occasionally used since it is an average.

The equation to calculate the probability is:

\[P(Y = a) = \frac{\lambda^a}{a!} e^{-\lambda} \]
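
As a sanity check on the formula, the sketch below evaluates it directly over a range of values and confirms that the probabilities sum to (nearly) 1 and that their weighted average is \(\lambda\). The choice \(\lambda = 4\) and the cutoff of 100 are assumptions for illustration; the cutoff only needs to be large enough that the remaining tail probability is negligible.

```r
# Evaluate the Poisson formula by hand for a = 0, 1, ..., 100
lambda <- 4        # illustrative mean
a      <- 0:100    # truncated support; the tail beyond 100 is negligible here

pmf <- lambda^a / factorial(a) * exp(-lambda)

total_prob <- sum(pmf)      # should be very close to 1
mean_val   <- sum(a * pmf)  # should be very close to lambda

c(total = total_prob, mean = mean_val)
```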

1) dpois()

dpois() has 2 arguments:

  1. x = the value of the random variable

  2. lambda = the mean of the Poisson distribution

Let’s look at an example with \(\lambda = 4\) and \(a = 5\).

# P(Y = 5 | lambda = 4)
lambda <- 4; a <- 5

# Manual probability:
lambda ^ a / gamma(a + 1) * exp(-lambda)
## [1] 0.1562935
# Using the function
dpois(x = 5, lambda = 4)
## [1] 0.1562935
# Let's look at values of 0 to 20 and find their probabilities
pois_df <- 
  tibble(
    Y = 0:20,
    `P(Y = a)` = dpois(x = Y, lambda = lambda)
    )

pois_df |> 
  round(digits = 4)
## # A tibble: 21 × 2
##        Y `P(Y = a)`
##    <dbl>      <dbl>
##  1     0     0.0183
##  2     1     0.0733
##  3     2     0.146 
##  4     3     0.195 
##  5     4     0.195 
##  6     5     0.156 
##  7     6     0.104 
##  8     7     0.0595
##  9     8     0.0298
## 10     9     0.0132
## # ℹ 11 more rows
ggplot(
  data = pois_df,
  mapping = aes(x = Y, y = `P(Y = a)`)
  ) + 

  geom_col(fill = "steelblue") + 

  labs(
    title = "Poisson(lambda = 4)",
    x = "a",
    y = "P(Y = a)"
    ) +

  theme(plot.title = element_text(hjust = 0.5, size = 16)) +

  scale_y_continuous(expand = c(0, 0, 0.05, 0))

2) ppois()

ppois() finds \(P(Y \le a)\), or \(P(Y > a)\) if lower.tail = FALSE is included (R's partial argument matching also accepts the abbreviation lower = F).

# P(Y <= 5 | lambda = 4)
ppois(q = 5, lambda = 4)
## [1] 0.7851304
# P(Y > 5 | lambda = 4)
ppois(q = 5, lambda = 4, lower.tail = FALSE)
## [1] 0.2148696
# P(Y >= 5 | lambda = 4)
ppois(q = 5 - 1, lambda = 4, lower.tail = FALSE)
## [1] 0.3711631
# adding the cumulative probabilities to the data frame
pois_df <- 
  pois_df |> 
  mutate(
    `P(Y <= a)` = ppois(q = Y, lambda = 4)
    )

pois_df |> 
  round(digits = 4)
## # A tibble: 21 × 3
##        Y `P(Y = a)` `P(Y <= a)`
##    <dbl>      <dbl>       <dbl>
##  1     0     0.0183      0.0183
##  2     1     0.0733      0.0916
##  3     2     0.146       0.238 
##  4     3     0.195       0.434 
##  5     4     0.195       0.629 
##  6     5     0.156       0.785 
##  7     6     0.104       0.889 
##  8     7     0.0595      0.949 
##  9     8     0.0298      0.979 
## 10     9     0.0132      0.992 
## # ℹ 11 more rows
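
The cumulative column is just a running total of the individual probabilities, which the following sketch checks with base R. It also shows one way to get the probability of a range of values by subtracting two cumulative probabilities; the range 3 to 6 is an arbitrary example, not taken from the text above.

```r
lambda <- 4
a      <- 0:20

# Running total of P(Y = a) should match ppois() at every a
running_total <- cumsum(dpois(x = a, lambda = lambda))
cumulative    <- ppois(q = a, lambda = lambda)
same_values   <- isTRUE(all.equal(running_total, cumulative))
same_values

# P(3 <= Y <= 6) = P(Y <= 6) - P(Y <= 2)
range_by_cdf <- ppois(q = 6, lambda = lambda) - ppois(q = 2, lambda = lambda)
range_by_sum <- sum(dpois(x = 3:6, lambda = lambda))
c(range_by_cdf, range_by_sum)
```

Note the subtraction uses 2 rather than 3, because \(P(Y \le 2)\) is the part that must be removed to keep 3 inside the range.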

3) qpois()

qpois() can be used to find the smallest \(a\) where \(P(Y \le a) \ge p\)

For instance, a drive-thru averages 20 customers per hour. What is the 90th percentile for the number of customers in an hour?

qpois(p = 0.90, lambda = 20)
## [1] 26
# Let's look at the cumulative probabilities for Y that fall within 0.05 of 0.90
tibble(
  Y = 10:30,
  `P(Y <= a)` = ppois(q = Y, lambda = 20)
  ) |> 
  
  filter(between(`P(Y <= a)`, 0.85, 0.95))
## # A tibble: 3 × 2
##       Y `P(Y <= a)`
##   <int>       <dbl>
## 1    25       0.888
## 2    26       0.922
## 3    27       0.948

So 25 is the 88.8th percentile and 26 is the 92.2nd percentile. Since 26 is the smallest value with \(P(Y \le a) \ge 0.90\), it is the value qpois() returns.
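
The definition of the quantile can also be applied by hand: compute the cumulative probabilities over a grid of values and pick the first one that reaches 0.90. This is only a sketch of the definition, not how qpois() is implemented internally; the grid 0:100 is an assumption chosen to be wide enough for \(\lambda = 20\).

```r
lambda <- 20
p      <- 0.90
a_grid <- 0:100   # candidate values of Y; wide enough for lambda = 20

cum_probs <- ppois(q = a_grid, lambda = lambda)

# First grid value whose cumulative probability is at least p
a_manual <- a_grid[which(cum_probs >= p)[1]]

a_manual
a_manual == qpois(p = p, lambda = lambda)
```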

4) rpois()

rpois() can be used to generate random values from a Poisson distribution with a given mean, \(\lambda\)

# Generating 20 Poisson R.V. with lambda = 4
rpois(n = 20, lambda = 4)
##  [1] 4 6 6 4 2 2 7 3 5 4 2 5 4 4 3 1 3 4 4 3

Let’s generate 100,000 Poisson RVs and plot them:

N <- 1e5
tibble(
  Y = rpois(n = N, lambda = 4)
  ) |> 
  
  # Counting how often each Y occurs
  count(Y) |> 
  
  # Creating the graph
  ggplot(
    mapping = aes(x = factor(Y), y = n/N)
  ) + 
  
  geom_col(fill = "steelblue") + 
  
  labs(
    x = "Randomly Generated Poisson",
    y = "Probability"
    ) + 
  
  scale_y_continuous(expand = c(0, 0, 0.05, 0))
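
To see how closely the simulated proportions track the theoretical probabilities, the sketch below compares them numerically using base R only. The seed, the variable names, and the choice to compare values 0 through 10 are assumptions made for this example; a different seed gives slightly different proportions.

```r
set.seed(2024)   # arbitrary seed so the simulation is repeatable
N      <- 1e5
lambda <- 4

y <- rpois(n = N, lambda = lambda)

# Proportion of simulated values equal to 0, 1, ..., 10
empirical   <- as.numeric(table(factor(y, levels = 0:10))) / N
theoretical <- dpois(x = 0:10, lambda = lambda)

comparison <- data.frame(
  Y           = 0:10,
  empirical   = round(empirical, 4),
  theoretical = round(theoretical, 4)
)
comparison

# Largest gap between simulated and theoretical probabilities
max_gap <- max(abs(empirical - theoretical))
max_gap

# The sample mean should land close to lambda
sample_mean <- mean(y)
sample_mean
```

With a sample this large the gaps should be small, and they tend to shrink further as N grows.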