The Binomial distribution is one of the most frequently used discrete probability distributions. It models the number of “successes” in a fixed number of independent trials, where each trial has only two possible outcomes (success or failure).
To understand the Binomial distribution, we must first define a Bernoulli trial. A Bernoulli trial is a single experiment with:

- Exactly two outcomes: Success (S) or Failure (F).
- A constant probability of success, \(p\).
- A probability of failure \(q = 1 - p\).
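The Binomial distribution is the distribution of the sum of \(n\) independent Bernoulli trials that share the same success probability \(p\), where each success counts as 1 and each failure as 0.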
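To see this relationship numerically, here is a minimal simulation sketch. It assumes illustrative values \(n = 10\) and \(p = 0.3\) (not taken from the text), draws each Bernoulli trial as `rbinom(n, size = 1, prob = p)`, sums the trials, and compares the simulated frequencies with the exact Binomial probabilities from `dbinom()`.

```r
# Simulate X = sum of n independent Bernoulli(p) trials and compare the
# empirical frequencies with the exact Binomial PMF.
set.seed(1)            # for reproducibility
n <- 10                # number of trials (illustrative choice)
p <- 0.3               # success probability (illustrative choice)
n_sims <- 50000        # number of simulated experiments

# A Bernoulli trial is a draw from B(1, p): 1 = success, 0 = failure
sums <- replicate(n_sims, sum(rbinom(n, size = 1, prob = p)))

# Empirical P(X = k) for k = 0..n (levels keep values that never occurred)
empirical <- as.numeric(table(factor(sums, levels = 0:n))) / n_sims
exact     <- dbinom(0:n, size = n, prob = p)

round(cbind(k = 0:n, empirical, exact), 4)
```

With this many simulated experiments the two columns should agree to roughly two decimal places; the remaining gap is simulation noise.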
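For a random variable \(X\) to follow a Binomial distribution (\(X \sim B(n, p)\)), four conditions must be met (often remembered by the acronym BINS):

- Binary: each trial has exactly two outcomes, success or failure.
- Independent: the outcome of one trial does not affect any other trial.
- Number of trials is fixed: \(n\) is set in advance.
- Same probability: the success probability \(p\) is identical on every trial.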
The probability of getting exactly \(k\) successes in \(n\) trials is given by the formula:
\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]
Where:

- \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\) is the binomial coefficient (“n choose k”).
- \(n\): total number of trials.
- \(k\): number of successes (\(0, 1, 2, \dots, n\)).
- \(p\): probability of success on a single trial.
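As a quick check of the formula, the sketch below evaluates it term by term with base R's `choose()` and compares the result with the built-in `dbinom()`. The helper `binom_pmf()` is hypothetical, written only for illustration, and the values \(n = 10\), \(p = 0.25\), \(k = 3\) are arbitrary.

```r
# Hypothetical helper: the Binomial PMF written directly from the formula
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 10
p <- 0.25
k <- 3

# choose(n, k) is the binomial coefficient n! / (k! (n - k)!)
coef_choose    <- choose(n, k)
coef_factorial <- factorial(n) / (factorial(k) * factorial(n - k))

manual  <- binom_pmf(k, n, p)
builtin <- dbinom(k, size = n, prob = p)

c(manual = manual, builtin = builtin)   # both about 0.2503

# Summing the PMF over the whole support 0..n gives 1
sum(binom_pmf(0:n, n, p))
```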
R provides four core functions for the Binomial distribution. In each, the second argument is the number of trials (R names it `size`) and the third is the success probability (named `prob`):

| Function | Purpose |
|---|---|
| `dbinom(k, n, p)` | Probability mass function (PMF): \(P(X = k)\) |
| `pbinom(k, n, p)` | Cumulative distribution function (CDF): \(P(X \le k)\) |
| `qbinom(q, n, p)` | Quantile function: the smallest \(k\) such that \(P(X \le k) \ge q\) |
| `rbinom(m, n, p)` | Random generation: \(m\) random draws from \(B(n, p)\) |
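Here is a minimal sketch of the four functions side by side, using arbitrary example values \(n = 10\) and \(p = 0.5\). It also illustrates two relationships from the table: the CDF is the running sum of the PMF, and `qbinom()` returns the smallest \(k\) whose cumulative probability reaches the requested level.

```r
n <- 10
p <- 0.5

d3     <- dbinom(3, size = n, prob = p)          # P(X = 3) = 120 / 1024
p3     <- pbinom(3, size = n, prob = p)          # P(X <= 3) = 176 / 1024
p3_sum <- sum(dbinom(0:3, size = n, prob = p))   # same value, summed from the PMF

# Median: smallest k with P(X <= k) >= 0.5
q50 <- qbinom(0.5, size = n, prob = p)           # 5

set.seed(123)                                    # reproducible random draws
draws <- rbinom(5, size = n, prob = p)           # five draws, each between 0 and n

print(c(d3 = d3, p3 = p3, p3_sum = p3_sum, q50 = q50))
print(draws)
```

The median is 5 because \(P(X \le 4) = 386/1024 < 0.5\) while \(P(X \le 5) = 638/1024 \ge 0.5\).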
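The shape of the Binomial distribution depends on \(n\) and \(p\): it is symmetric when \(p = 0.5\), right-skewed when \(p < 0.5\), and left-skewed when \(p > 0.5\), and the skew shrinks as \(n\) grows.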
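The skew can be quantified. As a minimal sketch (with \(n = 20\) and \(p \in \{0.1, 0.5, 0.9\}\) chosen only for illustration), the code below computes the skewness of \(B(n, p)\) numerically from the `dbinom()` probabilities and compares it with the closed-form value \((1 - 2p)/\sqrt{np(1 - p)}\). A positive value means a long right tail, a negative value a long left tail. The helper `binom_skew()` is hypothetical.

```r
# Hypothetical helper: skewness of B(n, p) computed from the PMF
binom_skew <- function(n, p) {
  k   <- 0:n
  pmf <- dbinom(k, size = n, prob = p)
  mu  <- sum(k * pmf)                    # mean
  sdv <- sqrt(sum((k - mu)^2 * pmf))     # standard deviation
  sum((k - mu)^3 * pmf) / sdv^3          # third standardized moment
}

n <- 20
p_values <- c(0.1, 0.5, 0.9)

numeric_skew <- sapply(p_values, function(p) binom_skew(n, p))
closed_form  <- (1 - 2 * p_values) / sqrt(n * p_values * (1 - p_values))

round(data.frame(p = p_values, numeric_skew, closed_form), 4)
# p = 0.1: positive (right-skewed); p = 0.5: 0 (symmetric); p = 0.9: negative (left-skewed)
```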
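The `ggplot2` bar chart below plots the PMF for the symmetric case \(n = 20\), \(p = 0.5\). Drawing separate bars rather than a smooth curve emphasizes that the distribution is discrete.

```r
library(ggplot2)

# Parameters
n_trials <- 20
p_val <- 0.5

# Create data: every possible number of successes and its probability
data <- data.frame(
  Successes = 0:n_trials,
  Probability = dbinom(0:n_trials, size = n_trials, prob = p_val)
)

# Plot: one bar per value of k, since X is discrete
ggplot(data, aes(x = factor(Successes), y = Probability)) +
  geom_col(fill = "steelblue", color = "white") +
  labs(title = paste0("Binomial Distribution (n = ", n_trials, ", p = ", p_val, ")"),
       x = "Number of Successes",
       y = "Probability") +
  theme_minimal()
```

Scenario: A multiple-choice quiz has 10 questions, each with 4 options of which only one is correct. If a student guesses randomly on every question, what is:

1. the probability of getting exactly 5 questions right,
2. the probability of passing, that is, getting 6 or more right, and
3. the expected number of correct answers?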
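```r
n <- 10
p <- 0.25  # 1 correct option out of 4

# 1. P(X = 5)
prob_5 <- dbinom(5, size = n, prob = p)

# 2. P(X >= 6) = 1 - P(X <= 5)
prob_pass <- 1 - pbinom(5, size = n, prob = p)
# Equivalent, and more accurate for very small tail probabilities:
# pbinom(5, size = n, prob = p, lower.tail = FALSE)

# 3. Expected value E[X] = n * p
expected_val <- n * p

print(c(prob_5 = prob_5, prob_pass = prob_pass, expected_val = expected_val))
# prob_5 is about 0.0584, prob_pass about 0.0197, expected_val = 2.5
```

| Feature | Details |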
|---|---|
| Notation | \(X \sim B(n, p)\) |
| Parameters | \(n\) (trials), \(p\) (prob of success) |
| Support | \(k \in \{0, 1, 2, \dots, n\}\) |
| PMF | \(\binom{n}{k} p^k q^{n-k}\) |
| Mean | \(np\) |
| Variance | \(npq\) |
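The Mean and Variance rows can be verified directly from the PMF. The sketch below reuses the quiz example's \(n = 10\) and \(p = 0.25\), computes \(E[X] = \sum_k k\,P(X = k)\) and \(\mathrm{Var}(X) = \sum_k (k - E[X])^2\,P(X = k)\) with `dbinom()`, and compares them with \(np = 2.5\) and \(npq = 1.875\).

```r
n <- 10
p <- 0.25
q <- 1 - p

k   <- 0:n                              # full support of X
pmf <- dbinom(k, size = n, prob = p)    # P(X = k) for every k

mean_from_pmf <- sum(k * pmf)
var_from_pmf  <- sum((k - mean_from_pmf)^2 * pmf)

print(c(mean_from_pmf = mean_from_pmf, np = n * p))     # both 2.5
print(c(var_from_pmf = var_from_pmf, npq = n * p * q))  # both 1.875
```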