1. Introduction

In this module, we explore one of the most fundamental discrete probability distributions: the Binomial Distribution.

The Binomial distribution describes the probability of obtaining a specific number of “successes” in a fixed number of independent trials, where each trial has only two possible outcomes (it is binary) and the same probability of success.

1.1 The Bernoulli Trial

To understand the Binomial distribution, we first define a Bernoulli Trial. This is a random experiment with exactly two possible outcomes:

  1. Success (usually denoted as 1)
  2. Failure (usually denoted as 0)

Let \(p\) be the probability of success. Then \(1-p\) (often denoted as \(q\)) is the probability of failure.
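A Bernoulli trial can be simulated in base R as a Binomial draw with a single trial. The following is a minimal sketch: the value p = 0.3, the seed, and the variable names are assumptions chosen only for illustration.

```r
# Simulate Bernoulli trials as Binomial draws with size = 1
set.seed(42)   # arbitrary seed, only for reproducibility
p <- 0.3       # assumed probability of success (illustrative)
trials <- rbinom(n = 10000, size = 1, prob = p)

# Each outcome is either 1 (success) or 0 (failure)
table(trials)

# The observed proportion of successes should be close to p
mean(trials)
```

The proportion of failures, 1 - mean(trials), plays the role of \(q\).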


2. Conditions for a Binomial Experiment

For a random variable \(X\) to follow a Binomial distribution, the following four conditions (often remembered by the acronym BINS) must be met:

  1. Binary: Each trial has only two outcomes (Success/Failure).
  2. Independent: The outcome of one trial does not affect the others.
  3. Number: The number of trials, \(n\), is fixed in advance.
  4. Same probability: The probability of success, \(p\), remains constant for each trial.

Notation: \(X \sim B(n, p)\)


3. Mathematical Formulation

3.1 Probability Mass Function (PMF)

The probability of getting exactly \(k\) successes in \(n\) trials is given by the formula:

\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]

Where:

  • \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\) is the binomial coefficient, the number of ways to choose which \(k\) of the \(n\) trials are successes.
  • \(p^k\) is the probability that those \(k\) trials are all successes.
  • \((1-p)^{n-k}\) is the probability that the remaining \(n-k\) trials are all failures.
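To make the formula concrete, it can be coded by hand and compared with R's built-in dbinom. This is a sketch only: the helper name binom_pmf and the values n = 5, p = 0.5, k = 2 are hypothetical choices for illustration.

```r
# Hand-coded PMF for X ~ B(n, p), following the formula term by term
# (binom_pmf is a hypothetical helper, not a base R function)
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 5; p <- 0.5; k <- 2           # illustrative values

binom_pmf(k, n, p)                 # 10 * 0.5^5 = 0.3125
dbinom(k, size = n, prob = p)      # built-in equivalent

# The two agree for every possible k, and the PMF sums to 1
all.equal(binom_pmf(0:n, n, p), dbinom(0:n, size = n, prob = p))
sum(binom_pmf(0:n, n, p))
```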

3.2 Mean and Variance

  • Mean (Expected Value): \(\mu = E(X) = n \cdot p\)
  • Variance: \(\sigma^2 = Var(X) = n \cdot p \cdot (1-p)\)
  • Standard Deviation: \(\sigma = \sqrt{n \cdot p \cdot (1-p)}\)
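These closed-form results can be checked against the definitions of expectation and variance by summing over the PMF. The sketch below assumes illustrative values n = 20 and p = 0.3; the variable names are not from the module.

```r
n <- 20; p <- 0.3                  # illustrative values
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)

# Mean and variance computed directly from their definitions
mu     <- sum(k * pmf)
sigma2 <- sum((k - mu)^2 * pmf)

# Compare with the closed forms n*p and n*p*(1-p)
c(from_pmf = mu,     closed_form = n * p)
c(from_pmf = sigma2, closed_form = n * p * (1 - p))
```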

4. R Functions for Binomial Distribution

R provides four standard functions for the binomial distribution:

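| Function | Description | Usage |
|---|---|---|
| dbinom(x, size, prob) | Probability Mass Function. Calculates \(P(X=x)\). | Finding an exact probability. |
| pbinom(q, size, prob) | Cumulative Distribution Function. Calculates \(P(X \le q)\). | Finding the probability of “at most” \(q\) successes. |
| qbinom(p, size, prob) | Quantile Function. Inverse of pbinom: returns the smallest \(x\) with \(P(X \le x) \ge p\). | Finding cut-off values. |
| rbinom(n, size, prob) | Random Number Generation. Here n is the number of random values to draw and size is the number of trials. | Simulating experiments. |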
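A short sketch of the four functions side by side may help. The values n = 10 and p = 0.5 and the seed are assumptions chosen so the results are easy to check by hand.

```r
n <- 10; p <- 0.5                        # illustrative values

d <- dbinom(3, size = n, prob = p)       # P(X = 3)  = 120/1024
c_prob <- pbinom(3, size = n, prob = p)  # P(X <= 3) = 176/1024
q <- qbinom(0.5, size = n, prob = p)     # smallest x with P(X <= x) >= 0.5

set.seed(1)                              # arbitrary seed for reproducibility
draws <- rbinom(5, size = n, prob = p)   # five simulated counts in 0..n

d; c_prob; q; draws

# pbinom is the running sum of dbinom
all.equal(c_prob, sum(dbinom(0:3, size = n, prob = p)))
```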

5. Visualizing the Distribution

Visualizations help us understand the shape of the distribution based on the parameters \(n\) and \(p\).

5.1 Symmetric vs. Skewed

If \(p = 0.5\), the distribution is symmetric. If \(p < 0.5\), it is right-skewed. If \(p > 0.5\), it is left-skewed.
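The direction of the skew can also be checked numerically. The sketch below uses the standard skewness formula for the Binomial distribution, \((1-2p)/\sqrt{np(1-p)}\), which is not stated in this module; the helper name binom_skew is hypothetical.

```r
# Skewness of B(n, p): (1 - 2p) / sqrt(n * p * (1 - p))
# (binom_skew is a hypothetical helper, not a base R function)
binom_skew <- function(n, p) {
  (1 - 2 * p) / sqrt(n * p * (1 - p))
}

n <- 20
binom_skew(n, 0.5)   # zero: symmetric
binom_skew(n, 0.1)   # positive: right-skewed
binom_skew(n, 0.9)   # negative: left-skewed
```

The formula also shows that the skew shrinks toward zero as \(n\) grows for any fixed \(p\).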

Let’s visualize a scenario where we flip a coin 20 times (\(n=20\)) with varying probabilities of success.

# Load the plotting library
library(ggplot2)

# Parameters
n <- 20
k <- 0:n

# Create data frames for plotting
df_sym <- data.frame(Successes = k, Probability = dbinom(k, n, 0.5), Type = "p = 0.5 (Symmetric)")
df_skew <- data.frame(Successes = k, Probability = dbinom(k, n, 0.1), Type = "p = 0.1 (Right Skewed)")
df_comb <- rbind(df_sym, df_skew)

# Plot using ggplot2
ggplot(df_comb, aes(x = Successes, y = Probability, fill = Type)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.7) +
  labs(title = "Binomial Distribution Shapes",
       subtitle = "Comparison of p=0.5 vs p=0.1 with n=20",
       y = "Probability P(X=k)") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set1")


6. Real-Life Examples

Example A: Quality Control in Manufacturing

Scenario: A factory produces light bulbs. Historically, 5% of the bulbs produced are defective. A quality control engineer randomly selects a batch of 10 bulbs for testing.

  • Trial: Inspecting one bulb.
  • Success: Bulb is defective (statistically speaking, “success” is the event of interest).
  • n: 10
  • p: 0.05

Question: What is the probability that exactly 2 bulbs in the batch are defective?

n <- 10
p <- 0.05
k <- 2

prob_2_defective <- dbinom(x = k, size = n, prob = p)
print(paste("Probability of exactly 2 defective bulbs:", round(prob_2_defective, 4)))
## [1] "Probability of exactly 2 defective bulbs: 0.0746"
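As a sanity check, the exact answer can be compared with a simulation of many batches. This is only a sketch: the number of simulated batches and the seed are arbitrary assumptions, and the simulated value will vary slightly with the seed.

```r
set.seed(123)                     # arbitrary seed for reproducibility
n_batches <- 100000               # assumed number of simulated batches

# Each draw is the number of defective bulbs in one batch of 10
defects <- rbinom(n_batches, size = 10, prob = 0.05)

# Share of simulated batches with exactly 2 defective bulbs
sim_prob <- mean(defects == 2)

sim_prob
dbinom(2, size = 10, prob = 0.05) # exact value for comparison
```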

Example B: Clinical Drug Trials

Scenario: A new drug claims to cure 80% of patients with a specific infection. A doctor administers the drug to 15 patients.

  • n: 15
  • p: 0.80

Question: What is the probability that at least 13 patients are cured?

\[ P(X \ge 13) = P(X=13) + P(X=14) + P(X=15) \]

In R, we can use sum(dbinom(...)) or the cumulative function pbinom. Note that pbinom(q, size, prob) gives \(P(X \le q)\). Because \(X\) only takes integer values, \(P(X \ge 13) = 1 - P(X \le 12)\).

n <- 15
p <- 0.80

# Method 1: Summing specific probabilities
prob_at_least_13 <- sum(dbinom(13:15, size = n, prob = p))

# Method 2: Using Cumulative Distribution Function (1 - lower tail)
prob_at_least_13_cdf <- 1 - pbinom(12, size = n, prob = p)

print(paste("Probability at least 13 are cured:", round(prob_at_least_13, 4)))
## [1] "Probability at least 13 are cured: 0.398"
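A third option is the lower.tail argument of pbinom, which returns the upper tail \(P(X > q)\) directly and avoids the subtraction. The sketch below reuses the values from this example; the variable name prob_upper is an assumption.

```r
n <- 15
p <- 0.80

# lower.tail = FALSE gives P(X > 12), which equals P(X >= 13)
prob_upper <- pbinom(12, size = n, prob = p, lower.tail = FALSE)

# All three methods agree
all.equal(prob_upper, sum(dbinom(13:15, size = n, prob = p)))
all.equal(prob_upper, 1 - pbinom(12, size = n, prob = p))
round(prob_upper, 4)
```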

7. Comprehensive Worked Example

Problem: A marketing email has a click-through rate (CTR) of 20%. You send the email to 50 random customers.

  1. What is the expected number of clicks?
  2. What is the standard deviation?
  3. Plot the full probability distribution for this scenario.

Steps 1 & 2: Mean and Standard Deviation

n <- 50
p <- 0.20

# Expected Value
expected_clicks <- n * p

# Variance and Standard Deviation
variance <- n * p * (1 - p)
std_dev <- sqrt(variance)

cat("Expected Clicks:", expected_clicks, "\n")
## Expected Clicks: 10
cat("Standard Deviation:", round(std_dev, 2), "\n")
## Standard Deviation: 2.83

Step 3: Plotting the Distribution

We will highlight the area representing “more than 15 clicks” to visualize tail probabilities.

# Load packages for the pipe, mutate(), and plotting
library(dplyr)
library(ggplot2)

# Generate Data
clicks <- 0:50
probs <- dbinom(clicks, size = n, prob = p)
plot_data <- data.frame(clicks, probs)

# Determine colors based on condition (More than 15)
plot_data <- plot_data %>%
  mutate(Highlight = ifelse(clicks > 15, "Tail (>15)", "Normal"))

# Plot
ggplot(plot_data, aes(x = clicks, y = probs, fill = Highlight)) +
  geom_col(width = 0.7) +
  scale_fill_manual(values = c("Normal" = "gray70", "Tail (>15)" = "red")) +
  geom_vline(xintercept = expected_clicks, linetype = "dashed", color = "blue", linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  annotate("text", x = expected_clicks + 2, y = 0.10, label = "Mean = 10", color="blue") +
  labs(title = "Binomial Distribution of Email Clicks",
       subtitle = "n=50, p=0.20. Red bars indicate >15 clicks.",
       x = "Number of Clicks",
       y = "Probability") +
  theme_classic()
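The highlighted tail can also be quantified. The sketch below computes the probability of more than 15 clicks in two equivalent ways; the variable names are assumptions, and the values of n and p are repeated so the block runs on its own.

```r
n <- 50
p <- 0.20

# P(X > 15) as the complement of the CDF at 15
tail_cdf <- 1 - pbinom(15, size = n, prob = p)

# The same probability as the sum of the red bars (16 through 50 clicks)
tail_sum <- sum(dbinom(16:n, size = n, prob = p))

all.equal(tail_cdf, tail_sum)
tail_cdf
```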


8. Summary

  • The Binomial distribution models the count of successes in a fixed number of independent trials.
  • Key parameters are \(n\) (trials) and \(p\) (probability of success).
  • R makes it easy to calculate exact probabilities (dbinom) and cumulative probabilities (pbinom).
  • For large \(n\), with \(p\) not too close to 0 or 1, the Binomial distribution is well approximated by a Normal distribution with mean \(np\) and variance \(np(1-p)\) (a consequence of the Central Limit Theorem).
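The Normal approximation in the last point can be examined numerically. The sketch below compares the Binomial PMF with the Normal density that has the same mean and variance; the helper name max_gap, the choice p = 0.5, and the sample sizes are illustrative assumptions.

```r
# Largest absolute gap between the Binomial PMF and the matching Normal density
# (max_gap is a hypothetical helper, not a base R function)
max_gap <- function(n, p) {
  k <- 0:n
  exact <- dbinom(k, size = n, prob = p)
  normal_approx <- dnorm(k, mean = n * p, sd = sqrt(n * p * (1 - p)))
  max(abs(exact - normal_approx))
}

p <- 0.5
max_gap(10, p)    # small n: visible discrepancy
max_gap(100, p)   # larger n: much closer agreement
```

The gap should shrink as \(n\) grows; for \(p\) near 0 or 1 a larger \(n\) is needed before the two curves agree well.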


Instructions to Use This File:

  1. Install R and RStudio: Ensure you have these installed.
  2. Install Required Packages: Open RStudio and run the following in the console to ensure you have the plotting libraries: install.packages(c("ggplot2", "dplyr", "rmarkdown"))
  3. Create the File:
    • Click File -> New File -> R Markdown.
    • Delete the default template content.
    • Paste the code block provided above.
  4. Render/Knit:
    • Click the Knit button (icon with a ball of yarn) in the toolbar.
    • This will generate a beautifully formatted HTML file with the text, the math equations rendered, the R code calculated, and the graphs generated automatically.