Poisson Distribution in R

Initial Description: Four Distribution Letters

The different probability distribution functions all start with one of the following 4 letters:

d \(\rightarrow\) density: Find the probability of a specific value: \(P(Y = a)\)
p \(\rightarrow\) probability: Find the probability of a specific value or any value less than it (aka, the cumulative probability): \(P(Y \le a)\)
q \(\rightarrow\) quantile: Find the smallest value of the random variable, \(a\), so that \(P(Y \le a) \ge p\)

It’s basically p in reverse: If we know the probability, what is the value of the random variable?

r \(\rightarrow\) random: Generate a value of the random variable \(Y\) given the parameters
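As a minimal sketch of how the four letters relate, the lines below apply each one to a Poisson distribution. The values `lambda = 4`, `a = 5`, and `p = 0.70` are arbitrary choices for illustration, and the object names are hypothetical.

```r
# Illustrative values only: lambda = 4 and a = 5 are arbitrary choices
lambda <- 4
a <- 5

# d: probability of exactly a
p_exact <- dpois(x = a, lambda = lambda)

# p: probability of a or less, which equals the sum of the d values from 0 to a
p_cumulative <- ppois(q = a, lambda = lambda)
all.equal(p_cumulative, sum(dpois(x = 0:a, lambda = lambda)))

# q: p in reverse. 0.70 falls between P(Y <= 4) and P(Y <= 5), so q returns 5
a_from_p <- qpois(p = 0.70, lambda = lambda)

# r: one random draw (changes on every run unless a seed is set)
y_draw <- rpois(n = 1, lambda = lambda)
```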

Poisson Distribution

The Poisson distribution is used for unbounded count data, similar to a negative binomial.

It is most often used to count how many times a certain outcome occurs during an event:

Number of eggs laid by a mosquito
Number of cars that pass through an intersection during rush hour
Number of customers at a drive-thru in a day

Unlike the other 3 distributions, a Poisson has only 1 parameter: the average number of occurrences during the event

250 eggs laid on average
1750 cars on average during rush hour
123 customers on a typical day

The parameter is often denoted by \(\lambda\), although the letter \(\mu\) is occasionally used since it is an average.

The equation to calculate the probability is:

\[P(Y = a) = \frac{\lambda^a}{a!} e^{-\lambda} \]
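As a rough check on the formula, the sketch below evaluates it over a range of counts and confirms that the probabilities add to 1 and that the mean equals \(\lambda\). The choice of `lambda = 4` and the cutoff at 100 are assumptions made for illustration; the cutoff only works because the probability beyond 100 is negligible for this \(\lambda\).

```r
lambda <- 4     # arbitrary example value
a <- 0:100      # truncation point; the probability beyond 100 is negligible here

# The formula applied to every value of a at once
prob <- lambda^a / factorial(a) * exp(-lambda)

sum(prob)       # total probability, should be very close to 1
sum(a * prob)   # mean of the distribution, should be very close to lambda
```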

1) `dpois()`

dpois() has 2 main arguments:

x = the value of the random variable
lambda = the mean of the Poisson distribution

Let’s look at an example with \(\lambda = 4\) and \(a = 5\).

# P(Y = 5 | lambda = 4)
lambda <- 4; a <- 5

# Manual probability:
lambda ^ a / gamma(a + 1) * exp(-lambda)

## [1] 0.1562935

# Using the function
dpois(x = 5, lambda = 4)

## [1] 0.1562935

# Let's look at values of 0 to 20 and find their probabilities
library(tidyverse)  # provides tibble(), mutate(), and ggplot()

pois_df <- 
  tibble(
    Y = 0:20,
    `P(Y = a)` = dpois(x = Y, lambda = lambda)
  )

pois_df |> 
  round(digits = 4)

## # A tibble: 21 × 2
##        Y `P(Y = a)`
##    <dbl>      <dbl>
##  1     0     0.0183
##  2     1     0.0733
##  3     2     0.146 
##  4     3     0.195 
##  5     4     0.195 
##  6     5     0.156 
##  7     6     0.104 
##  8     7     0.0595
##  9     8     0.0298
## 10     9     0.0132
## # ℹ 11 more rows

ggplot(
  data = pois_df,
  mapping = aes(x = Y, y = `P(Y = a)`)
) + 
  geom_col(fill = "steelblue") + 
  labs(
    title = "Poisson(lambda = 4)",
    x = "a",
    y = "P(Y = a)"
  ) +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  scale_y_continuous(expand = c(0, 0, 0.05, 0))

2) `ppois()`

ppois() finds \(P(Y \le a)\). Including lower.tail = FALSE (which lower = F abbreviates) returns \(P(Y > a)\) instead.

# P(Y <= 5 | lambda = 4)
ppois(q = 5, lambda = 4)

## [1] 0.7851304

# P(Y > 5 | lambda = 4)
ppois(q = 5, lambda = 4, lower.tail = FALSE)

## [1] 0.2148696

# P(Y >= 5 | lambda = 4)
# Y is a count, so P(Y >= 5) = P(Y > 4): subtract 1 from q
ppois(q = 5 - 1, lambda = 4, lower.tail = FALSE)

## [1] 0.3711631

# adding the cumulative probabilities to the data frame
pois_df <- 
  pois_df |> 
  mutate(
    `P(Y <= a)` = ppois(q = Y, lambda = 4)
    )

pois_df |> 
  round(digits = 4)

## # A tibble: 21 × 3
##        Y `P(Y = a)` `P(Y <= a)`
##    <dbl>      <dbl>       <dbl>
##  1     0     0.0183      0.0183
##  2     1     0.0733      0.0916
##  3     2     0.146       0.238 
##  4     3     0.195       0.434 
##  5     4     0.195       0.629 
##  6     5     0.156       0.785 
##  7     6     0.104       0.889 
##  8     7     0.0595      0.949 
##  9     8     0.0298      0.979 
## 10     9     0.0132      0.992 
## # ℹ 11 more rows
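Cumulative probabilities can also be subtracted to get the probability of a range of values. The sketch below finds \(P(2 \le Y \le 5)\) for \(\lambda = 4\); the range 2 to 5 is an arbitrary choice for illustration.

```r
lambda <- 4

# P(2 <= Y <= 5) = P(Y <= 5) - P(Y <= 1)
p_interval <- ppois(q = 5, lambda = lambda) - ppois(q = 1, lambda = lambda)
p_interval

# Same probability found by adding the individual dpois() values
sum(dpois(x = 2:5, lambda = lambda))

# The lower and upper tails at the same cutoff add to 1
ppois(q = 5, lambda = lambda) + ppois(q = 5, lambda = lambda, lower.tail = FALSE)
```

Note that the lower cutoff is 1, not 2: subtracting \(P(Y \le 2)\) would drop \(Y = 2\) from the range.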

3) `qpois()`

qpois() can be used to find the smallest \(a\) where \(P(Y \le a) \ge p\)

For instance, a drive-thru averages 20 customers per hour. What is the 90th percentile for the number of customers in an hour?

qpois(p = 0.90, lambda = 20)

## [1] 26

# Let's look at the values of Y whose cumulative probability is within 5 percentage points of 90%
tibble(
  Y = 10:30,
  `P(Y <= a)` = ppois(q = Y, lambda = 20)
  ) |> 
  
  filter(between(`P(Y <= a)`, 0.85, 0.95))

## # A tibble: 3 × 2
##       Y `P(Y <= a)`
##   <int>       <dbl>
## 1    25       0.888
## 2    26       0.922
## 3    27       0.948

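So 25 is the 88.8th percentile and 26 is the 92.2nd percentile. Since 26 is the smallest value with a cumulative probability of at least 0.90, it is the 90th percentile.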
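The definition of the quantile can be checked directly with ppois(). This is a small sketch using the drive-thru numbers from above; the extra percentiles in the last line are arbitrary choices for illustration.

```r
lambda <- 20
p <- 0.90

a <- qpois(p = p, lambda = lambda)

ppois(q = a, lambda = lambda) >= p      # TRUE: a reaches the target probability
ppois(q = a - 1, lambda = lambda) < p   # TRUE: a - 1 falls short of it

# qpois() is vectorized over p, so several percentiles can be requested at once
qpois(p = c(0.25, 0.50, 0.75), lambda = lambda)
```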

4) `rpois()`

rpois() can be used to generate random Poisson variables with a certain mean, \(\lambda\)

# Generating 20 Poisson R.V. with lambda = 4
rpois(n = 20, lambda = 4)

##  [1] 4 6 6 4 2 2 7 3 5 4 2 5 4 4 3 1 3 4 4 3

Let’s generate 100,000 Poisson RVs and plot them:

N <- 1e5
tibble(
  Y = rpois(n = N, lambda = 4)
  ) |> 
  
  # Counting how often each Y occurs
  count(Y) |> 
  
  # Creating the graph
  ggplot(
    mapping = aes(x = factor(Y), y = n/N)
  ) + 
  
  geom_col(fill = "steelblue") + 
  
  labs(
    x = "Randomly Generated Poisson",
    y = "Probability"
  ) + 
  
  scale_y_continuous(expand = c(0, 0, 0.05, 0))
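The simulated values can also be compared to the theoretical distribution numerically. The sketch below is one way to do that, assuming an arbitrary seed (`5350`) so the draws are reproducible; the object names are hypothetical.

```r
set.seed(5350)   # arbitrary seed so the draws are reproducible

N <- 1e5
lambda <- 4
y <- rpois(n = N, lambda = lambda)

# The sample mean should be close to lambda
mean(y)

# Observed proportion of each value compared to dpois()
values <- 0:20
observed <- as.numeric(table(factor(y, levels = values))) / N
expected <- dpois(x = values, lambda = lambda)

# Largest gap between the simulated and theoretical probabilities
max(abs(observed - expected))
```

With this many draws the gap should be small, and it shrinks further as N grows.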

Module 1: Probability Distributions

STAT 5350