R Functions dgeom, pgeom, and rgeom

Random varaible \(X\) is distributed \(X \sim Geometric(p)\) with mean \(\mu=\frac{{n}}{{p}}\) and variance \(\sigma^2 = \frac{{(1-p)}}{{p^2}}\) if \(X\) is the count of independent Bernoulli trials required to achieve the first successful trial when the probability of success is constant \(p\). The probability of \(X=n\) trials is \(f(X=n) = p(1-p)^{n-1}\). The probability of \(X<=n\) trials is \(F(X=n) = 1 - (1-p)^n\).

R function dgeom(x, prob) is the probability of x failures prior to the first success (note the difference) when the probability of success is prob. R function pgeom(q, prob, lower.tail) is the cumulative probability (lower.tail = TRUE for left tail, lower.tail = FALSE for right tail) of less than or equal to q failures prior to success. R function rgeom(n, size, prob) returns n random numbers from the geometric distribution x~geom(prob). R function qgeom(p, prob, lower.tail) is the number of failures at the qth percentile (lower.tail = TRUE).

Example

A sports marketer randomly selects persons on the street until he encounters someone who attended a game last season. What is the probability the market encounters x = 3 people who did not attend a game before the first success when p = 0.20 of the population attended a game?

p = 0.20
n = 3
# exact
dgeom(x = n, prob = p)
## [1] 0.1024
# simulated
mean(rgeom(n = 10000, prob = p) == 3)
## [1] 0.1042
library(dplyr)
library(ggplot2)

data.frame(x = 0:10, prob = dgeom(x = 0:10, prob = p)) %>%
  mutate(Failures = ifelse(x == n, n, "other")) %>%
ggplot(aes(x = factor(x), y = prob, fill = Failures)) +
  geom_col() +
  geom_text(
    aes(label = round(prob,2), y = prob + 0.01),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0
  ) +
  labs(title = "Probability of X = 3 Failures Prior to First Success",
       subtitle = "Geometric(.2)",
       x = "Failures prior to first success (x)",
       y = "Probability") 

Example

From the prior example, what is the probability the marketer fails to find someone who attended a game in x <= 5 trials before finding someone who attended a game on the next trial when the population probability is p = 0.20?

p = 0.20
n = 5
# exact
pgeom(q = n, prob = p, lower.tail = TRUE)
## [1] 0.737856
# simulated
mean(rgeom(n = 10000, prob = p) <= n)
## [1] 0.7421
library(dplyr)
library(ggplot2)

data.frame(x = 0:10, 
           pmf = dgeom(x = 0:10, prob = p),
           cdf = pgeom(q = 0:10, prob = p, lower.tail = TRUE)) %>%
  mutate(Failures = ifelse(x <= n, n, "other")) %>%
ggplot(aes(x = factor(x), y = cdf, fill = Failures)) +
  geom_col() +
  geom_text(
    aes(label = round(cdf,2), y = cdf + 0.01),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0
  ) +
  labs(title = "Cumulative Probability of X = 5 Failures.",
       subtitle = "Geometric(.3)",
       x = "Failures prior to first success (x)",
       y = "probability") 

Example

From the first example, what is the probability the marketer fails to find someone who attended a game on x >= 5 trials before finding someone who attended a game on the next trial when the population probability is p = 0.20?

p = 0.20
n = 5
# exact
pgeom(q = n, prob = p, lower.tail = FALSE)
## [1] 0.262144
# simulated
mean(rgeom(n = 10000, prob = p) > n)
## [1] 0.2613
library(dplyr)
library(ggplot2)

data.frame(x = 0:10, 
           pmf = dgeom(x = -1:9, prob = p),
           cdf = pgeom(q = -1:9, prob = p, lower.tail = FALSE)) %>%
  mutate(Failures = ifelse(x >= n + 1, n + 1, "other")) %>%
ggplot(aes(x = factor(x), y = cdf, fill = Failures)) +
  geom_col() +
  geom_text(
    aes(label = round(cdf,2), y = cdf + 0.01),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0
  ) +
  labs(title = "Cumulative Probability of X = 6 Failures (Right Tail).",
       subtitle = "Geometric(.3)",
       x = "Failures prior to first success (x)",
       y = "probability") 

Example

What is the expected number of trials to achieve the first success when the probability of success is p = 0.2?

p = 0.20
# mean
# exact
1 / p
## [1] 5
# simulated
mean(rgeom(n = 10000, prob = p)) + 1
## [1] 5.0183
# Variance
# exact
(1 - p) / p^2
## [1] 20
# simulated
var(rgeom(n = 100000, prob = p))
## [1] 19.7152
library(dplyr)
library(ggplot2)

data.frame(x = 1:20, 
           pmf = dgeom(x = 0:19, prob = p),
           cdf = pgeom(q = 0:19, prob = p, lower.tail = TRUE)) %>%
ggplot(aes(x = factor(x), y = cdf)) +
  geom_col() +
  geom_text(
    aes(label = round(cdf,2), y = cdf + 0.01),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0
  ) +
  labs(title = "Probability of X = x trials to achieve first success",
       subtitle = "Geometric(.2)",
       x = "Trials (x)",
       y = "probability")