R Functions dpois, ppois, qpois, and rpois

A random variable \(X\) is distributed \(X \sim P(\lambda)\), with mean \(\mu = \lambda\) and variance \(\sigma^2 = \lambda\), if \(X = x\) is the number of successes in \(n\) (many) trials when the probability of success \(\lambda / n\) is small. The probability of \(X = k\) successes is \(Pr(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}\).
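As a quick check of the formula, the sketch below evaluates \(e^{-\lambda} \lambda^k / k!\) by hand and compares it with dpois(). It also illustrates the "many trials, small probability" description by computing binomial probabilities with success probability \(\lambda / n\) for a large n. The values lambda = 3 and n = 10,000 are illustrative assumptions.

```r
lambda <- 3   # illustrative rate (assumption)
k <- 0:10     # counts to evaluate

# Pr(X = k) = e^(-lambda) lambda^k / k!, written out by hand
pmf_formula <- exp(-lambda) * lambda^k / factorial(k)
pmf_dpois <- dpois(x = k, lambda = lambda)
all.equal(pmf_formula, pmf_dpois)   # TRUE: same values up to rounding

# n many trials, each succeeding with small probability lambda / n
n <- 10000    # illustrative number of trials (assumption)
pmf_binom <- dbinom(x = k, size = n, prob = lambda / n)
max(abs(pmf_binom - pmf_dpois))     # small gap between binomial and Poisson
```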

R function dpois(x, lambda) is the probability of x successes in a period when the expected number of events is lambda. R function ppois(q, lambda, lower.tail) is the cumulative probability: with lower.tail = TRUE (the default) it returns the left tail \(Pr(X \le q)\), and with lower.tail = FALSE it returns the right tail \(Pr(X > q)\). R function rpois(n, lambda) returns n random numbers from the Poisson distribution \(X \sim P(\lambda)\). R function qpois(p, lambda, lower.tail) returns the value (quantile) at the specified cumulative probability (percentile) p; with lower.tail = TRUE this is the smallest x such that \(Pr(X \le x) \ge p\).
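A minimal sketch of the four functions side by side, using lambda = 3 as an illustrative rate. It shows that lower.tail = FALSE gives the strict right tail, that ppois() is the running sum of dpois(), that qpois() inverts ppois(), and that the sample mean and variance of rpois() draws land near lambda. The seed and the sample size of 10,000 are arbitrary choices.

```r
lambda <- 3

dpois(x = 2, lambda = lambda)                      # Pr(X = 2)
ppois(q = 2, lambda = lambda)                      # Pr(X <= 2), lower.tail = TRUE by default
ppois(q = 2, lambda = lambda, lower.tail = FALSE)  # Pr(X > 2), i.e. 1 - Pr(X <= 2)

# ppois() is the cumulative sum of dpois()
cdf <- ppois(q = 0:10, lambda = lambda)
all.equal(cdf, cumsum(dpois(x = 0:10, lambda = lambda)))  # TRUE

# qpois() returns the smallest x with Pr(X <= x) >= p
qpois(p = 0.95, lambda = lambda)                   # 6

# rpois() draws random counts; set.seed() makes the draws reproducible
set.seed(1)
draws <- rpois(n = 10000, lambda = lambda)
c(mean = mean(draws), variance = var(draws))       # both near lambda = 3
```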

Example

What is the probability of making 2 to 4 sales in a week if the average sales rate is 3 per week?

# Using cumulative probability
ppois(q = 4, lambda = 3, lower.tail = TRUE) - 
  ppois(q = 1, lambda = 3, lower.tail = TRUE)
## [1] 0.616115
# Using exact probability
dpois(x = 2, lambda = 3) +
  dpois(x = 3, lambda = 3) +
  dpois(x = 4, lambda = 3)
## [1] 0.616115
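Because dpois() and ppois() are vectorized, the same probability can be written more compactly. A short sketch; both lines should agree with the cumulative-probability result above (about 0.616).

```r
# Exact probabilities for 2, 3, and 4 sales, summed in one call
sum(dpois(x = 2:4, lambda = 3))

# Same probability from the CDF: Pr(X <= 4) - Pr(X <= 1)
diff(ppois(q = c(1, 4), lambda = 3))
```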
# expected number of sales = lambda = 3

# variance = lambda = 3
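The mean and variance can also be checked numerically from the PMF, using \(\mu = \sum x \, Pr(X = x)\) and \(\sigma^2 = \sum (x - \mu)^2 Pr(X = x)\). This sketch truncates the sum at x = 100, an assumption that is harmless here because the probability beyond that point is negligible for lambda = 3.

```r
x <- 0:100                        # truncated support (assumption)
px <- dpois(x = x, lambda = 3)    # Pr(X = x)

mu <- sum(x * px)                 # expected number of sales
sigma2 <- sum((x - mu)^2 * px)    # variance
c(mean = mu, variance = sigma2)   # both 3, up to rounding
```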

library(ggplot2)
library(dplyr)
options(scipen = 999, digits = 2) # sig digits

events <- 0:10
density <- dpois(x = events, lambda = 3)
prob <- ppois(q = events, lambda = 3, lower.tail = TRUE)
df <- data.frame(events, density, prob)
# numeric x so the PMF bars and the CDF line share one continuous axis
ggplot(df, aes(x = events, y = density)) +
  geom_col() +                     # bars: PMF, Pr(X = x)
  geom_text(
    aes(label = round(density, 2), y = density + 0.01),
    size = 3,
    vjust = 0
  ) +
  geom_line(aes(y = prob)) +       # line: CDF, Pr(X <= x)
  scale_x_continuous(breaks = events) +
  labs(title = "PMF and CDF of Poisson Distribution",
       subtitle = "P(3).",
       x = "Events (x)",
       y = "Density")

Example

Suppose a baseball player has a p=.300 batting average. What is the probability of X<=150 hits in n=500 at bats? X=150? X>150?

# probability of x <= 150
ppois(q = 150, lambda = .300 * 500, lower.tail = TRUE)
## [1] 0.52
# probability of x = 150
dpois(x = 150, lambda = .300 * 500)
## [1] 0.033
# probability of x > 150
ppois(q = 150, lambda = .300 * 500, lower.tail = FALSE) 
## [1] 0.48
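Hits in a fixed number of at bats are more naturally modeled as binomial, and p = .300 is far from a small success probability. As a point of comparison (not part of the original example), the same three questions can be answered with the binomial functions, assuming 500 independent at bats with the same p. The binomial variance \(np(1-p) = 105\) is smaller than the Poisson variance \(\lambda = 150\), so the binomial places more probability on exactly 150 hits.

```r
n <- 500      # at bats
p <- 0.300    # batting average, treated as a per-at-bat success probability (assumption)

pbinom(q = 150, size = n, prob = p)                      # Pr(X <= 150)
dbinom(x = 150, size = n, prob = p)                      # Pr(X = 150)
pbinom(q = 150, size = n, prob = p, lower.tail = FALSE)  # Pr(X > 150)

# Variances of the two models
c(binomial = n * p * (1 - p), poisson = n * p)
```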
library(ggplot2)
library(dplyr)
options(scipen = 999, digits = 2) # sig digits

hits <- 0:100 * 3
density <- dpois(x = hits, lambda = .300 * 500)
prob <- ppois(q = hits, lambda = .300 * 500, lower.tail = TRUE)
df <- data.frame(hits, density, prob)
ggplot(df, aes(x = hits, y = density)) +
  geom_col() +
  labs(title = "Poisson(150)",
       subtitle = "PMF and CDF of Poisson(150) distribution.",
       x = "Hits (x)",
       y = "Density") +
  geom_line(data = df, aes(x = hits, y = prob))

The Poisson distribution approximates the binomial distribution with \(\lambda = np\) if \(n \ge 20\) and \(p \le 0.05\). The baseball example above, with \(p = .300\), does not meet this condition.
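One way to see the rule of thumb is to measure the total variation distance between the two PMFs, \(\frac{1}{2}\sum_x |Pr_{binom}(X = x) - Pr_{Pois}(X = x)|\). The helper tv_distance() below is hypothetical (not a base R function), and summing only over \(x = 0, \dots, n\) is an assumption that ignores the Poisson mass above n, which is negligible in these two cases.

```r
# Hypothetical helper: total variation distance between Binomial(n, p) and Poisson(np)
tv_distance <- function(n, p) {
  x <- 0:n
  0.5 * sum(abs(dbinom(x = x, size = n, prob = p) - dpois(x = x, lambda = n * p)))
}

tv_distance(n = 50, p = 0.03)    # meets n >= 20 and p <= 0.05
tv_distance(n = 500, p = 0.300)  # baseball example: p far above 0.05
```

The first distance comes out smaller than the second, which is consistent with the rule of thumb.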

Example

What is the distribution of successes from a sample of n = 50 when the probability of success is p = .03?

library(ggplot2)
library(dplyr)
library(tidyr)
options(scipen = 999, digits = 2) # sig digits

x <- 0:10  # number of successes (events)
df <- data.frame(events = x,
                 Poisson = dpois(x = x, lambda = .03 * 50),
                 Binomial = dbinom(x = x, size = 50, prob = .03))
df_tidy <- gather(df, key = "Distribution", value = "density", -events)
ggplot(df_tidy, aes(x = factor(events), y = density, fill = Distribution)) +
  geom_col(position = "dodge") +
  labs(title = "Poisson(1.5) and Binomial(50, .03)",
       subtitle = "Poisson approximates binomial when n >= 20 and p <= .05.",
       x = "Events (x)",
       y = "Density")

Example

Suppose the probability that a drug produces a certain side effect is p = 0.1% (0.001) and n = 1,000 patients in a clinical trial receive the drug. What is the probability that 0 people experience the side effect?

# The expected value is np
1000 * .001
## [1] 1
# The probability of measuring 0 when the expected value is 1
dpois(x = 0, lambda = 1000 * .001) 
## [1] 0.37
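Since \(Pr(X = 0) = e^{-\lambda}\), the Poisson answer is simply \(e^{-1} \approx 0.37\). For comparison, the exact binomial probability that none of the 1,000 patients has the side effect is \(0.999^{1000}\), assuming patients respond independently; the two are very close because p is small. The last line gives the complementary probability that at least one patient experiences the side effect.

```r
exp(-1)                                        # Poisson Pr(X = 0) = e^(-lambda), lambda = 1
dbinom(x = 0, size = 1000, prob = 0.001)       # exact binomial, equal to 0.999^1000
ppois(q = 0, lambda = 1, lower.tail = FALSE)   # Pr(X >= 1) = 1 - Pr(X = 0)
```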
library(ggplot2)
library(dplyr)
options(scipen = 999, digits = 2) # sig digits

x <- 0:10
density <- dpois(x = x, lambda = 1000 * .001)
prob <- ppois(q = x, lambda = 1000 * .001, lower.tail = TRUE)
df <- data.frame(x, density, prob)
ggplot(df, aes(x = x, y = density)) +
  geom_col() +
  geom_text(
    aes(label = round(density,2), y = density + 0.01),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0
  ) +
  labs(title = "Poisson(1)",
       subtitle = "PMF and CDF of Poisson(1) distribution.",
       x = "Events (x)",
       y = "Density") +
  geom_line(data = df, aes(x = x, y = prob))