1. Introduction to Binomial Distribution

The Binomial Distribution is one of the most frequently used discrete probability distributions. It describes the number of “successes” in a fixed number of independent “trials,” where each trial has only two possible outcomes (Success or Failure).

1.1 Bernoulli Trials

To understand the Binomial distribution, we must first define a Bernoulli Trial. A Bernoulli trial is a random experiment where:

1. There are exactly two possible outcomes (e.g., Heads/Tails, Pass/Fail, Yes/No).
2. The probability of success (\(p\)) remains constant every time the experiment is performed.
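As a small illustration of the definition above, a Bernoulli trial can be simulated in R as a binomial draw with a single trial. The values of `p`, `n_sims`, and the seed are assumptions chosen for the sketch, not values from the module.

```r
set.seed(42)  # arbitrary seed, used only so the run is reproducible

p <- 0.3         # assumed probability of success
n_sims <- 10000  # assumed number of simulated trials

# A Bernoulli trial is a binomial draw with size = 1: each value is 0 or 1
trials <- rbinom(n_sims, size = 1, prob = p)

table(trials)  # counts of failures (0) and successes (1)
mean(trials)   # observed proportion of successes, expected to be close to p
```

Because \(p\) is held constant across draws, the observed proportion of successes settles near \(p\) as the number of simulated trials grows.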

2. Key Characteristics (BINS)

A random variable \(X\) follows a Binomial Distribution if it meets the BINS criteria:

B - Binary: Outcomes are either “Success” or “Failure.”
I - Independent: The outcome of one trial does not affect the next.
N - Number: There is a fixed number of trials (\(n\)).
S - Success: The probability of success (\(p\)) is the same for each trial.

3. The Mathematical Formula

If \(X\) is a binomial random variable, denoted as \(X \sim B(n, p)\), the probability of getting exactly \(k\) successes in \(n\) trials is given by the Probability Mass Function (PMF):

\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]

Where:

* \(n\): Total number of trials.
* \(k\): Number of successes (\(k = 0, 1, 2, \ldots, n\)).
* \(p\): Probability of success on a single trial.
* \(q = 1-p\): Probability of failure.
* \(\binom{n}{k}\): The binomial coefficient, calculated as \(\frac{n!}{k!(n-k)!}\).
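To connect the formula to computation, the PMF can be written out by hand and compared with R's built-in `dbinom()`. The helper name `binom_pmf` and the values of `n` and `p` are assumptions made for this sketch.

```r
# Hand-written PMF: choose(n, k) * p^k * (1 - p)^(n - k)
# (binom_pmf is a hypothetical helper, not part of base R)
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 10    # assumed number of trials
p <- 0.25  # assumed probability of success
k <- 0:n   # every possible number of successes

manual  <- binom_pmf(k, n, p)
builtin <- dbinom(k, size = n, prob = p)

all.equal(manual, builtin)  # TRUE: both give the same probabilities
sum(manual)                 # the probabilities over k = 0..n add up to 1
```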

Mean and Variance

Mean (\(\mu\)): \(E(X) = np\)
Variance (\(\sigma^2\)): \(Var(X) = np(1-p)\)
Standard Deviation (\(\sigma\)): \(\sqrt{np(1-p)}\)
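The closed-form results above can be checked numerically by computing the mean and variance directly from the PMF. This is a sketch with assumed values of `n` and `p`.

```r
n <- 20   # assumed number of trials
p <- 0.3  # assumed probability of success
k <- 0:n

pmf <- dbinom(k, size = n, prob = p)

# Definitions: E(X) = sum of k * P(X = k), Var(X) = sum of (k - mu)^2 * P(X = k)
mean_from_pmf <- sum(k * pmf)
var_from_pmf  <- sum((k - mean_from_pmf)^2 * pmf)

c(from_pmf = mean_from_pmf, formula = n * p)            # both 6
c(from_pmf = var_from_pmf,  formula = n * p * (1 - p))  # both 4.2
sqrt(var_from_pmf)                                      # standard deviation
```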

4. Visualizing the Distribution in R

The shape of the Binomial distribution changes based on the values of \(n\) and \(p\).

library(ggplot2) # Needed for ggplot(), geom_bar(), labs(), theme_minimal()

# Parameters
n_trials <- 20
p_val <- 0.5 # Symmetric distribution

# Generate data
data <- data.frame(
  Successes = 0:n_trials,
  Probability = dbinom(0:n_trials, size = n_trials, prob = p_val)
)

# Plot
ggplot(data, aes(x = factor(Successes), y = Probability)) +
  geom_bar(stat = "identity", fill = "steelblue", color = "white") +
  labs(title = "Binomial Distribution (n=20, p=0.5)",
       x = "Number of Successes",
       y = "Probability") +
  theme_minimal()

Effect of ‘p’ on Shape

If \(p < 0.5\), the distribution is Right-Skewed.
If \(p > 0.5\), the distribution is Left-Skewed.
If \(p = 0.5\), the distribution is Symmetric.
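The direction of the skew can be checked with the standard skewness formula for the binomial distribution, \(\frac{1-2p}{\sqrt{np(1-p)}}\). This formula is not stated elsewhere in the module, and the helper name `binom_skewness` and the chosen values are assumptions for the sketch.

```r
# Skewness of B(n, p): (1 - 2p) / sqrt(n * p * (1 - p))
# (binom_skewness is a hypothetical helper, not part of base R)
binom_skewness <- function(n, p) {
  (1 - 2 * p) / sqrt(n * p * (1 - p))
}

n <- 20                       # assumed number of trials
p_values <- c(0.2, 0.5, 0.8)  # one value below, at, and above 0.5

skew <- binom_skewness(n, p_values)

# Positive = right-skewed, zero = symmetric, negative = left-skewed
data.frame(p = p_values, skewness = round(skew, 3))
```

Since \(n\) sits in the denominator, the skew shrinks toward zero as the number of trials grows, for any fixed \(p\) strictly between 0 and 1.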

5. Real-Life Examples

Quality Control: A factory produces light bulbs with a 1% defect rate. In a box of 100 bulbs, what is the probability that exactly 2 are defective? (\(n=100, p=0.01\)).
Medical Trials: If a drug has an 80% cure rate, what is the probability that exactly 8 out of 10 patients are cured? (\(n=10, p=0.8\)).
Marketing: A marketing email has a “click-through” rate of 5%. If you send it to 500 people, how many clicks can you expect on average? (\(E(X) = 500 \times 0.05\)).
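The three scenarios above can be worked out with `dbinom()` and the mean formula. This is a sketch; the variable names are assumptions chosen for readability.

```r
# Quality control: exactly 2 defective bulbs in a box of 100 (p = 0.01)
p_two_defects <- dbinom(2, size = 100, prob = 0.01)

# Medical trials: exactly 8 of 10 patients cured (p = 0.8)
p_eight_cured <- dbinom(8, size = 10, prob = 0.8)

# Marketing: expected number of clicks from 500 emails (p = 0.05), E(X) = n * p
expected_clicks <- 500 * 0.05

round(c(p_two_defects, p_eight_cured), 3)  # roughly 0.185 and 0.302
expected_clicks                            # 25
```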

6. Solving Problems using R Functions

R provides four essential functions for the Binomial Distribution:

* dbinom(k, n, p): Probability Mass Function, \(P(X = k)\).
* pbinom(k, n, p): Cumulative Distribution Function, \(P(X \leq k)\).
* qbinom(q, n, p): Quantile function; returns the smallest \(k\) such that \(P(X \leq k) \geq q\).
* rbinom(m, n, p): Generates \(m\) random values from \(B(n, p)\).

In all four functions, \(n\) and \(p\) correspond to R's `size` and `prob` arguments.
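The two functions not used in the worked example that follows, `qbinom()` and `rbinom()`, can be sketched as below. The parameters, the seed, and the variable names are assumptions for illustration.

```r
set.seed(1)  # arbitrary seed, used only so the simulated values are reproducible

n <- 10    # assumed number of trials
p <- 0.25  # assumed probability of success

# Quantile function: smallest k with P(X <= k) >= 0.5, i.e. the median
median_k <- qbinom(0.5, size = n, prob = p)
median_k                                   # 2

pbinom(median_k, size = n, prob = p)       # at least 0.5
pbinom(median_k - 1, size = n, prob = p)   # still below 0.5

# Random generation: five simulated counts of successes out of n trials
draws <- rbinom(5, size = n, prob = p)
draws
```

Here `qbinom()` acts as the inverse of `pbinom()`: it returns a count, while `pbinom()` returns a probability.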

Example Problem

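Scenario: A multiple-choice quiz has 10 questions. Each question has 4 options (only one is correct). If a student guesses randomly, each question is a Bernoulli trial with \(p = 1/4 = 0.25\), so the number of correct answers is \(X \sim B(10, 0.25)\):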

Q1: What is the probability of getting exactly 3 correct?

dbinom(3, size = 10, prob = 0.25)

## [1] 0.2502823

Q2: What is the probability of passing (getting 5 or more correct)? We need \(P(X \geq 5)\), which is \(1 - P(X \leq 4)\).

1 - pbinom(4, size = 10, prob = 0.25)

## [1] 0.07812691

# Or using lower.tail = FALSE (calculates P(X > 4))
pbinom(4, size = 10, prob = 0.25, lower.tail = FALSE)

## [1] 0.07812691
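As a cross-check on Q2, the same tail probability can be obtained by adding the individual PMF values for 5 through 10 correct answers. This is a sketch; the variable names are assumptions.

```r
n <- 10    # questions on the quiz
p <- 0.25  # chance of guessing one question correctly

# P(X >= 5) as a direct sum of P(X = 5), ..., P(X = 10)
p_pass_sum <- sum(dbinom(5:n, size = n, prob = p))

# The same probability from the upper tail of the CDF, P(X > 4)
p_pass_cdf <- pbinom(4, size = n, prob = p, lower.tail = FALSE)

all.equal(p_pass_sum, p_pass_cdf)  # TRUE: both approaches agree
```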

7. Summary Table

Feature	Formula / Function
Notation	\(X \sim B(n, p)\)
R Function (Exact)	`dbinom(k, n, p)`
R Function (Cumulative)	`pbinom(k, n, p)`
Mean	\(np\)
Variance	\(np(1-p)\)

8. Practice Exercise

A manufacturing plant claims that only 5% of its chips are defective. A sample of 50 chips is drawn.

Calculate the mean and standard deviation of the number of defective chips in the sample.
Using R, plot the probability distribution for this scenario.
What is the probability that at least 5 chips are defective?

# Exercise Solution Snippet
n <- 50
p <- 0.05
# Mean
mu <- n * p
# Prob of 5 or more
prob_5_plus <- pbinom(4, n, p, lower.tail = FALSE)

cat("The mean is:", mu, "\n")

## The mean is: 2.5

cat("The probability of at least 5 defective chips is:", prob_5_plus)

## The probability of at least 5 defective chips is: 0.1036168
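The snippet above covers the mean and the tail probability. One possible sketch of the remaining parts, the standard deviation and the plot, is shown here. It uses base R's `barplot()` instead of ggplot2 to stay dependency-free, and the cutoff of 15 on the x-axis is an assumption made because larger counts have negligible probability.

```r
n <- 50
p <- 0.05

# Standard deviation: sqrt(n * p * (1 - p))
sigma <- sqrt(n * p * (1 - p))
cat("The standard deviation is:", round(sigma, 3), "\n")  # about 1.541

# Probability distribution, plotted for k = 0..15 (assumed display cutoff)
k <- 0:15
probs <- dbinom(k, size = n, prob = p)

barplot(probs,
        names.arg = k,
        col = "steelblue",
        main = "Binomial Distribution (n=50, p=0.05)",
        xlab = "Number of Defective Chips",
        ylab = "Probability")
```

With \(p\) well below 0.5, the bars pile up near zero and trail off to the right, which matches the right-skewed shape described in Section 4.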

Module IV: The Binomial Distribution

Probability and Statistics for Data Science

Farhan Mohamed Saed

2025-12-29