The Binomial Distribution is one of the most frequently used discrete probability distributions. It describes the number of “successes” in a fixed number of independent “trials,” where each trial has only two possible outcomes (Success or Failure).
To understand the Binomial distribution, we must first define a Bernoulli trial. A Bernoulli trial is a random experiment where:

1. There are exactly two possible outcomes (e.g., Heads/Tails, Pass/Fail, Yes/No).
2. The probability of success (\(p\)) remains constant every time the experiment is performed.
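A Bernoulli trial can be simulated in R as a Binomial draw with a single trial. The sketch below is only illustrative: the value of `p`, the number of trials, and the seed are assumptions chosen for the example, not values from the text.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
p <- 0.3      # assumed probability of success (illustrative value)

# Ten Bernoulli trials: each draw is 1 (success) or 0 (failure)
trials <- rbinom(10, size = 1, prob = p)
trials

# Counting the successes gives one realization of a Binomial variable
sum(trials)
```

Summing the 0/1 outcomes is what links the Bernoulli trial to the Binomial distribution: the count of successes across a fixed number of such trials is the quantity the Binomial describes.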
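A random variable \(X\) follows a Binomial distribution if it meets the BINS criteria:

* B (Binary): each trial results in one of two outcomes, success or failure.
* I (Independent): the outcome of one trial does not affect any other trial.
* N (Number): the number of trials \(n\) is fixed in advance.
* S (Same probability): the probability of success \(p\) is the same on every trial.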
If \(X\) is a binomial random variable, denoted as \(X \sim B(n, p)\), the probability of getting exactly \(k\) successes in \(n\) trials is given by the Probability Mass Function (PMF):
\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]
Where:

* \(n\): Total number of trials.
* \(k\): Number of successes (\(k = 0, 1, 2, \ldots, n\)).
* \(p\): Probability of success on a single trial.
* \((1-p) = q\): Probability of failure.
* \(\binom{n}{k}\): The binomial coefficient, calculated as \(\frac{n!}{k!(n-k)!}\).
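To see that the formula and R's built-in function agree, the PMF can be written out directly. This is a minimal sketch: `binom_pmf` is a hypothetical helper name (not part of base R), and the values of `n` and `p` are illustrative assumptions.

```r
# Hand-rolled PMF, written directly from the formula above.
# binom_pmf is a hypothetical helper name, not a base R function.
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 5     # illustrative number of trials
p <- 0.4   # illustrative probability of success
k <- 0:n

manual  <- binom_pmf(k, n, p)
builtin <- dbinom(k, size = n, prob = p)

all.equal(manual, builtin)  # the two should agree up to floating-point error
sum(manual)                 # a PMF sums to 1 over k = 0, ..., n
```

In practice `dbinom` is the better choice, since it is written to stay accurate for large `n`; the manual version is shown only to make the formula concrete.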
The shape of the Binomial distribution changes with \(n\) and \(p\): it is symmetric when \(p = 0.5\), right-skewed when \(p < 0.5\), and left-skewed when \(p > 0.5\), and the skew fades as \(n\) grows.

    library(ggplot2)

    # Parameters
    n_trials <- 20
    p_val <- 0.5 # Symmetric distribution

    # Generate data
    data <- data.frame(
      Successes = 0:n_trials,
      Probability = dbinom(0:n_trials, size = n_trials, prob = p_val)
    )

    # Plot
    ggplot(data, aes(x = factor(Successes), y = Probability)) +
      geom_bar(stat = "identity", fill = "steelblue", color = "white") +
      labs(title = "Binomial Distribution (n=20, p=0.5)",
           x = "Number of Successes",
           y = "Probability") +
      theme_minimal()

R provides four essential functions for the Binomial distribution:

* dbinom(k, n, p): Probability Mass Function, \(P(X = k)\).
* pbinom(k, n, p): Cumulative Distribution Function, \(P(X \leq k)\).
* qbinom(q, n, p): Quantile function (finds the smallest \(k\) such that \(P(X \leq k) \geq q\)).
* rbinom(m, n, p): Generates \(m\) random values from \(B(n, p)\).
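The four functions are easier to tell apart when called side by side. The sketch below uses illustrative values of `n` and `p` and an arbitrary seed, all of which are assumptions made for the example.

```r
n <- 8    # illustrative number of trials
p <- 0.5  # illustrative probability of success

dbinom(3, size = n, prob = p)         # P(X = 3)
pbinom(3, size = n, prob = p)         # P(X <= 3)
sum(dbinom(0:3, size = n, prob = p))  # same value: the CDF is a running sum of the PMF

# Quantile function: smallest k with P(X <= k) >= 0.5
k_med <- qbinom(0.5, size = n, prob = p)
k_med
pbinom(k_med, size = n, prob = p) >= 0.5  # TRUE by definition of the quantile

set.seed(1)                    # arbitrary seed for reproducibility
rbinom(5, size = n, prob = p)  # five simulated values of X
```

Naming the `size` and `prob` arguments, as done here, avoids confusing the number of draws in `rbinom` with the number of trials.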
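Scenario: A multiple-choice quiz has 10 questions. Each question has 4 options (only one is correct). If a student guesses randomly, each question is a trial with \(p = 1/4 = 0.25\), so the number of correct answers is \(X \sim B(10, 0.25)\):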
Q1: What is the probability of getting exactly 3 correct?

    dbinom(3, size = 10, prob = 0.25)
    ## [1] 0.2502823
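As a check on the arithmetic, the PMF can be evaluated by hand for this scenario with \(n = 10\), \(k = 3\), and \(p = 0.25\) (the intermediate value \(0.75^7\) is rounded):

\[P(X = 3) = \binom{10}{3} (0.25)^3 (0.75)^7 = 120 \times 0.015625 \times 0.1334839 \approx 0.2503\]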
Q2: What is the probability of passing (getting 5 or more correct)? We need \(P(X \geq 5)\), which is \(1 - P(X \leq 4)\).

    1 - pbinom(4, size = 10, prob = 0.25)
    ## [1] 0.07812691

    # Or using lower.tail = FALSE (calculates P(X > 4))
    pbinom(4, size = 10, prob = 0.25, lower.tail = FALSE)
    ## [1] 0.07812691
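A common slip with upper tails is an off-by-one in the cutoff. The sketch below, using the quiz values \(n = 10\) and \(p = 0.25\), checks the tail probability against a direct sum of the PMF; the variable names are illustrative.

```r
n <- 10
p <- 0.25

# P(X >= 5) as a direct sum of the PMF over k = 5, ..., 10
upper_by_sum <- sum(dbinom(5:n, size = n, prob = p))

# P(X > 4), which is the same event for an integer-valued X
upper_by_tail <- pbinom(4, size = n, prob = p, lower.tail = FALSE)

all.equal(upper_by_sum, upper_by_tail)

# Off-by-one trap: a cutoff of 5 gives P(X > 5) = P(X >= 6), which is smaller
off_by_one <- pbinom(5, size = n, prob = p, lower.tail = FALSE)
off_by_one < upper_by_tail
```

Because `lower.tail = FALSE` returns a strict inequality, "at least \(k\)" always needs a cutoff of \(k - 1\).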
| Feature | Formula / Function |
|---|---|
| Notation | \(X \sim B(n, p)\) |
| R Function (Exact) | dbinom(k, n, p) |
| R Function (Cumulative) | pbinom(k, n, p) |
| Mean | \(np\) |
| Variance | \(np(1-p)\) |
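The mean and variance formulas in the table can be checked numerically from the PMF. This is a small sketch using the quiz values \(n = 10\) and \(p = 0.25\); the variable names are illustrative.

```r
n <- 10
p <- 0.25
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)

# Mean as the probability-weighted average of k
mean_from_pmf <- sum(k * pmf)

# Variance as the probability-weighted squared deviation from the mean
var_from_pmf <- sum((k - mean_from_pmf)^2 * pmf)

c(from_pmf = mean_from_pmf, formula = n * p)
c(from_pmf = var_from_pmf, formula = n * p * (1 - p))
```

Both pairs should match up to floating-point error, which confirms that \(np\) and \(np(1-p)\) are shortcuts for the sums over the full distribution.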
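A manufacturing plant claims that only 5% of its chips are defective. A sample of 50 chips is drawn. Find the expected number of defective chips in the sample and the probability that at least 5 are defective.

    # Exercise Solution Snippet
    n <- 50
    p <- 0.05

    # Mean
    mu <- n * p

    # Prob of 5 or more: P(X >= 5) = P(X > 4)
    prob_5_plus <- pbinom(4, n, p, lower.tail = FALSE)

    cat("The mean is:", mu, "\n")
    cat("The probability of at least 5 defective chips is:", prob_5_plus, "\n")
    ## The mean is: 2.5
    ## The probability of at least 5 defective chips is: 0.1036168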
This tutorial uses ggplot2 to visualize the distribution, which is essential for modern data science.