1. Introduction to Binomial Distribution

The Binomial distribution is one of the most frequently used discrete probability distributions. It models the number of “successes” in a fixed number of independent trials, where each trial has only two possible outcomes (success or failure).

1.1 Bernoulli Trials

To understand the Binomial distribution, we must first define a Bernoulli trial. A Bernoulli trial is a single experiment with:

  • Exactly two outcomes: Success (S) or Failure (F).
  • A probability of success \(p\) that stays the same each time the trial is repeated.
  • A probability of failure \(q = 1 - p\).

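A Binomial random variable is the sum of \(n\) independent Bernoulli trials that share the same success probability \(p\). If each trial is recorded as 1 (success) or 0 (failure), then \(X = X_1 + X_2 + \dots + X_n\) counts the number of successes.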
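As a quick sanity check of this relationship, the sketch below simulates many experiments. Each experiment adds up \(n\) Bernoulli draws, generated with `rbinom(..., size = 1)`, and the simulated counts are then compared with the Binomial PMF from `dbinom()`. The values \(n = 10\) and \(p = 0.3\) and the number of repetitions are arbitrary illustrative choices.

```r
set.seed(42)  # make the simulation reproducible

n <- 10        # Bernoulli trials per experiment (illustrative value)
p <- 0.3       # success probability on each trial (illustrative value)
reps <- 10000  # number of simulated experiments

# Each experiment: draw n Bernoulli(p) outcomes (0 = failure, 1 = success)
# and count the successes by summing them.
sums <- replicate(reps, sum(rbinom(n, size = 1, prob = p)))

# The simulated counts should behave like draws from B(n, p)
mean(sums)  # close to n * p = 3
var(sums)   # close to n * p * (1 - p) = 2.1

# Empirical frequency of each count next to the theoretical PMF
empirical <- table(factor(sums, levels = 0:n)) / reps
round(cbind(empirical = as.numeric(empirical),
            theoretical = dbinom(0:n, size = n, prob = p)), 3)
```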


2. Properties and Assumptions

For a random variable \(X\) to follow a Binomial distribution (\(X \sim B(n, p)\)), four conditions must be met (often remembered by the acronym BINS):

  1. Binary: There are only two possible outcomes for each trial.
  2. Independent: The outcome of one trial does not affect the others.
  3. Number: The number of trials (\(n\)) is fixed in advance.
  4. Success: The probability of success (\(p\)) is the same for each trial.

3. The Probability Mass Function (PMF)

The probability of getting exactly \(k\) successes in \(n\) trials is given by the formula:

\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]

Where:

  • \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\) is the binomial coefficient (“n choose k”), the number of ways to choose which \(k\) of the \(n\) trials are successes.
  • \(n\): Total number of trials.
  • \(k\): Number of successes (\(0, 1, 2, \dots, n\)).
  • \(p\): Probability of success on a single trial.
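The formula translates directly into R. The sketch below implements it with base R's `choose()` (the binomial coefficient) and checks the result against the built-in `dbinom()`. The function name `binom_pmf` is a hypothetical helper written for this note. It is not part of R.

```r
# Hypothetical helper: P(X = k) for X ~ B(n, p), written straight from the PMF
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 10
p <- 0.25
k <- 0:n

manual  <- binom_pmf(k, n, p)
builtin <- dbinom(k, size = n, prob = p)

all.equal(manual, builtin)  # TRUE: both give the same probabilities
sum(manual)                 # equals 1 up to floating-point rounding

# choose(n, k) agrees with the factorial form n! / (k! (n - k)!)
all.equal(choose(10, 3), factorial(10) / (factorial(3) * factorial(7)))  # TRUE
```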

Mean and Variance

  • Mean (\(\mu\)): \(E(X) = np\)
  • Variance (\(\sigma^2\)): \(Var(X) = np(1-p)\)
  • Standard Deviation (\(\sigma\)): \(\sqrt{np(1-p)}\)
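You can confirm these closed-form results numerically by computing the moments directly from the PMF, using \(E(X) = \sum_k k\,P(X = k)\) and \(Var(X) = \sum_k (k - \mu)^2 P(X = k)\). The parameter values below are illustrative.

```r
n <- 20
p <- 0.3
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)

# Moments computed from the definition of expectation
mu_direct  <- sum(k * pmf)
var_direct <- sum((k - mu_direct)^2 * pmf)

# Closed-form results
mu_formula  <- n * p            # np = 6
var_formula <- n * p * (1 - p)  # np(1 - p) = 4.2
sd_formula  <- sqrt(var_formula)

c(mu_direct, mu_formula)    # agree up to floating-point rounding
c(var_direct, var_formula)
```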

4. Binomial Distribution in R

R provides four core functions for the Binomial distribution. They are shown below with positional arguments; R's own names for the last two arguments are `size` (\(n\)) and `prob` (\(p\)).

| Function | Purpose |
|---|---|
| `dbinom(k, n, p)` | Probability Mass Function (PMF): \(P(X = k)\) |
| `pbinom(k, n, p)` | Cumulative Distribution Function (CDF): \(P(X \le k)\) |
| `qbinom(a, n, p)` | Quantile function: the smallest \(k\) such that \(P(X \le k) \ge a\) |
| `rbinom(m, n, p)` | Random generation: \(m\) random draws from \(B(n, p)\) |
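The following sketch shows the four functions in action, using a fair-coin setting (\(n = 10\), \(p = 0.5\)) chosen purely for illustration. In R's documentation the first argument is named `x`, `q`, `p` and `n` respectively, so `rbinom()`'s `n` is the number of draws and not the number of trials.

```r
n <- 10
p <- 0.5

# PMF: probability of exactly 3 heads
dbinom(3, size = n, prob = p)    # 120 / 1024 = 0.1171875

# CDF: probability of at most 3 heads
pbinom(3, size = n, prob = p)    # (1 + 10 + 45 + 120) / 1024 = 0.171875

# Quantile: smallest k with P(X <= k) >= 0.5 (the median)
qbinom(0.5, size = n, prob = p)  # 5

# Random generation: 5 simulated experiments of 10 tosses each
set.seed(1)
rbinom(5, size = n, prob = p)    # five counts, each between 0 and 10
```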

5. Visualizing the Distribution

The shape of the Binomial distribution depends on the values of \(n\) and \(p\).

```r
library(ggplot2)

# Parameters
n_trials <- 20
p_val <- 0.5

# PMF value for every possible number of successes
data <- data.frame(
  Successes = 0:n_trials,
  Probability = dbinom(0:n_trials, size = n_trials, prob = p_val)
)

# Bar chart: one bar per value of k, emphasizing that X is discrete
ggplot(data, aes(x = factor(Successes), y = Probability)) +
  geom_col(fill = "steelblue", color = "white") +
  labs(title = paste0("Binomial Distribution (n = ", n_trials, ", p = ", p_val, ")"),
       x = "Number of Successes",
       y = "Probability") +
  theme_minimal()
```

Impact of ‘p’ on Shape

  • If \(p = 0.5\), the distribution is symmetric.
  • If \(p < 0.5\), the distribution is skewed right.
  • If \(p > 0.5\), the distribution is skewed left.
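You can check the direction of skew with the skewness of the Binomial distribution, \((1 - 2p)/\sqrt{np(1-p)}\). A positive value means right skew, zero means symmetry and a negative value means left skew. The same formula also shows the skew fading as \(n\) grows. In the sketch below, `binom_skew` is a hypothetical helper, and \(n = 20\) with \(p \in \{0.2, 0.5, 0.8\}\) are illustrative values. The formula is cross-checked against the third standardized moment computed from the PMF.

```r
# Hypothetical helper: skewness of B(n, p) = (1 - 2p) / sqrt(n p (1 - p))
binom_skew <- function(n, p) (1 - 2 * p) / sqrt(n * p * (1 - p))

# Sign of the skewness for n = 20 and three values of p
p_values <- c(0.2, 0.5, 0.8)
skews <- binom_skew(20, p_values)
names(skews) <- paste0("p = ", p_values)
round(skews, 3)  # positive, zero, negative: right-skewed, symmetric, left-skewed

# Cross-check p = 0.2 against the third standardized moment of the PMF
k <- 0:20
pmf <- dbinom(k, size = 20, prob = 0.2)
mu <- sum(k * pmf)
sigma <- sqrt(sum((k - mu)^2 * pmf))
direct_skew <- sum((k - mu)^3 * pmf) / sigma^3  # matches binom_skew(20, 0.2)

# The same p gives a smaller skew with more trials
binom_skew(c(20, 200), 0.2)
```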

6. Real-Life Examples

  1. Quality Control: A factory produces light bulbs with a 1% defect rate. If you pick 100 bulbs, what is the probability that exactly 2 are defective? (\(n=100, p=0.01\)).
  2. Public Health: A vaccine is 90% effective, so each vaccinated person stays healthy with probability 0.9. If 50 vaccinated people are exposed to the virus, how many are expected to remain healthy? (\(n=50, p=0.9\)).
  3. Marketing: If a marketing email has a 5% Click-Through Rate (CTR), what is the probability that out of 500 recipients, at least 20 click the link? (\(n=500, p=0.05\)).
  4. Sports: If a basketball player has a 70% free-throw average, what is the probability they make exactly 8 out of 10 shots? (\(n=10, p=0.7\)).
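Each scenario maps onto one of the R functions from Section 4. The sketch below computes all four answers under the \(B(n, p)\) models listed above. It assumes, as in the scenarios themselves, that trials are independent with a constant \(p\).

```r
# 1. Quality control: X ~ B(100, 0.01), P(X = 2)
p_two_defective <- dbinom(2, size = 100, prob = 0.01)

# 2. Public health: X ~ B(50, 0.9), expected number who stay healthy
expected_healthy <- 50 * 0.9  # E(X) = np = 45

# 3. Marketing: X ~ B(500, 0.05), P(X >= 20) = 1 - P(X <= 19)
p_at_least_20 <- 1 - pbinom(19, size = 500, prob = 0.05)

# 4. Sports: X ~ B(10, 0.7), P(X = 8)
p_make_8 <- dbinom(8, size = 10, prob = 0.7)

round(c(quality = p_two_defective, health = expected_healthy,
        marketing = p_at_least_20, sports = p_make_8), 4)
```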

7. Solved Example Problem

Scenario: A multiple-choice quiz has 10 questions. Each question has 4 options (only one correct). If a student guesses randomly on every question:

  1. What is the probability of getting exactly 5 correct?
  2. What is the probability of passing (getting at least 6 correct)?
  3. What is the expected number of correct answers?

Solution using R:

```r
n <- 10
p <- 0.25  # 1 correct option out of 4

# 1. P(X = 5)
prob_5 <- dbinom(5, size = n, prob = p)

# 2. P(X >= 6) = 1 - P(X <= 5)
prob_pass <- 1 - pbinom(5, size = n, prob = p)

# 3. Expected value E(X) = np
expected_val <- n * p

round(c(prob_5, prob_pass, expected_val), 4)
```

Results:

  • Probability of exactly 5 correct: 0.0584
  • Probability of passing (\(\ge 6\)): 0.0197
  • Expected number of correct answers: 2.5
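Two equivalent ways to obtain the passing probability also serve as a check on part 2. The first sums the PMF over \(k = 6, \dots, 10\) directly. The second calls `pbinom()` with `lower.tail = FALSE`, which returns \(P(X > k)\) and avoids subtracting from 1.

```r
n <- 10
p <- 0.25

# Direct sum of P(X = 6) + ... + P(X = 10)
pass_sum <- sum(dbinom(6:10, size = n, prob = p))

# Upper tail: lower.tail = FALSE gives P(X > 5) = P(X >= 6)
pass_upper <- pbinom(5, size = n, prob = p, lower.tail = FALSE)

# Both agree with 1 - pbinom(5, size = n, prob = p)
round(c(pass_sum, pass_upper), 4)  # 0.0197 0.0197
```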

8. Summary Table

| Feature | Details |
|---|---|
| Notation | \(X \sim B(n, p)\) |
| Parameters | \(n\) (number of trials), \(p\) (probability of success) |
| Support | \(k \in \{0, 1, 2, \dots, n\}\) |
| PMF | \(\binom{n}{k} p^k q^{n-k}\), where \(q = 1 - p\) |
| Mean | \(np\) |
| Variance | \(npq\) |
