Essentials of Probability

Week 10



# Packages used throughout this ebook
library(tidyverse)
library(readr)
library(ggplot2)
library(dplyr)
library(ggridges)
library(knitr)
library(DT)

1 Introduction

Probability is built from a sequence of ideas that connect together—starting with the basic structure of outcomes, then moving toward relationships between events, and finally arriving at probability models that describe repeated processes. This ebook brings all those ideas into one clean flow. Each chapter highlights one essential building block:

  • What outcomes are possible (sample space)

  • How to compute simple probabilities

  • How events combine through unions and intersections

  • The difference between mutually exclusive, overlapping, and exhaustive events

  • How independence changes the way probabilities multiply

  • How dependent events require conditional reasoning

  • How repeated trials lead to the binomial distribution

By understanding these steps in order, probability becomes more intuitive and the formulas make sense rather than feel memorized. This short guide follows the structure of the video lessons and turns them into a smoother, book-style explanation.

2 Probability, Sample Spaces & the Complement Rule

Probability forms one of the fundamental pillars of statistical thinking. Almost every decision involving uncertainty—weather forecasting, financial risk analysis, medical testing, quality control, machine learning classification—relies on the logic of probability.

This chapter builds a rigorous but intuitive understanding of how probability originates from the structure of random experiments, how outcomes are organized into sample spaces, and how mathematical rules ensure consistency in evaluating uncertainty.

We also explore the crucial Complement Rule, one of the most powerful tools in problem-solving, especially when direct computation is complex.

Video Reference

2.1 What is Probability?

Probability is a numerical measure of uncertainty. Whenever we perform an experiment whose outcome cannot be predicted with absolute certainty, we enter the realm of probability.

More formally:

Probability assigns a value between 0 and 1 to any event. It quantifies the long-run proportion of times an event would occur if the experiment were repeated infinitely many times. It also serves as a mathematical model for rational decision-making under uncertainty.

Intuitive Foundations

1. Long-Run Frequency: the probability of an event reflects the proportion of times it happens over a very large number of trials.

2. Subjective Probability: sometimes probability reflects a degree of belief, such as estimating the chance of a political candidate winning.

3. Theoretical Probability: for experiments with equally likely outcomes (like rolling a fair die), probability is derived by counting:

\[ P(A) = \frac{\text{Number of outcomes in } A}{\text{Total number of outcomes in the sample space}} \]

where \(A\) is the event of interest and \(S\) is the sample space.

Why Probability Matters in Statistics

  • All inferential statistics—confidence intervals, hypothesis testing, regression interpretation—are built on probability models.

  • Data is inherently variable; probability explains how the variation behaves.

  • Probability enables us to transform randomness into mathematical structure.

Defining the Sample Space (Deep Dive)

A sample space, denoted \(S\), is the set of every possible outcome of a random experiment. It serves as the universal set in probability theory.

Why Sample Spaces Matter

  • They determine what is possible and what is not.
  • They define the boundaries for any event we wish to analyze.
  • Without a clearly defined sample space, probability loses meaning.

Types of Sample Spaces

Finite Sample Space

\[S = \{1,2,3,4,5,6\}\]

(e.g., rolling a six-sided die)

Countably Infinite Sample Space

\[ S = \{0, 1, 2, 3, \dots\} \]

(e.g., counting the number of emails received in a day)

Uncountably Infinite Sample Space

\[ S = [0,1] \]

(e.g., selecting a random real number between 0 and 1)

Formal Set-Notation Definition

\[ S = \{\omega_1, \omega_2, \dots\} \] where each \(\omega_i\) denotes a single possible outcome of the experiment.

2.2 Sample Space Diagram

Visualizing the sample space helps convert abstract probability into intuitive structure. Several diagram types exist:

1. Tree Diagrams. Used for multi-step experiments:

  • Tossing a coin twice

  • Drawing two cards

  • Sequential processes

A tree diagram displays:

  • branches (possible outcomes)

  • paths (joint outcomes)

  • independence or dependence between steps

2. Grid (Table) Diagrams

Helpful for two-variable experiments, like rolling two dice. It organizes outcomes into a matrix structure, making counting straightforward.

3. Venn Diagrams

  • intersections \((A \cap B)\)

  • unions \((A \cup B)\)

  • complements \((A^c)\)

  • mutually exclusive events

Example: Two Dice Sample Space

\[ S = \{(i,j) : i = 1, \dots, 6;\ j = 1, \dots, 6\} \]

This forms a 6 × 6 grid (36 total outcomes).
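As a quick illustration, here is a minimal R sketch (not part of the original text) that enumerates this sample space with base R's expand.grid and confirms it contains 36 equally likely outcomes:

# Enumerate the two-dice sample space as a 36-row grid
two_dice <- expand.grid(first = 1:6, second = 1:6)
nrow(two_dice)                                   # 36 outcomes
head(two_dice)                                   # first few (i, j) pairs

# Example event: the two dice sum to 7
mean((two_dice$first + two_dice$second) == 7)    # 6/36 = 1/6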

2.3 Probability Rules

Probability follows fundamental axioms (Kolmogorov’s Axioms):

Axiom 1: Non-negativity

\[P(A)≥0\]

This axiom ensures that probability behaves like a real-world measure — you cannot have “negative chance”.

Axiom 2: Normalization

\[P(S)=1\]

The sample space \(S\) contains all possible outcomes, so the chance that “something happens” is always 1. This establishes probability as a normalized measure where 1 represents certainty.

Axiom 3: Additivity

\[ P(A \cup B) = P(A) + P(B) \quad \text{if } A \cap B = \varnothing \]

If two events cannot occur at the same time (for example, rolling a die and getting both a 2 and a 5), then the probability that either occurs is simply the sum of their probabilities. This reflects that disjoint events do not overlap in the sample space.

For mutually exclusive events \(A\) and \(B\):

\[P(A \cup B) = P(A) + P(B)\]

General Addition Rule

Used when overlap exists:

\[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]

Use this rule when events can occur together. The intersection \(A \cap B\) is subtracted because outcomes counted in both events would otherwise be counted twice. This rule generalizes the previous axiom.

Consequences of These Rules

From these axioms, more results follow:

  • Complement rule

  • Law of total probability

  • Conditional probability

  • Bayes’ theorem

This chapter focuses on the complement rule.

2.4 The Complement Rule

One of the most useful tools in probability, especially when:

  • counting the direct event is hard,

  • but counting what doesn’t happen is easy.

Definition of Complement

\[A^c = \{\omega \in S : \omega \notin A\}\]

Complement Rule

\[P(A^c) = 1 - P(A)\]

The complement \(A^c\) represents everything in the sample space that is not in event \(A\). This formula states: the chance that an event does not occur equals one minus the chance that it does occur. It is one of the most frequently used tools in basic probability.

Interpretation

The entire sample space has probability 1. An event and its complement do not overlap:

\[A∩A^c =∅\]

Together they form the entire space:

\[A \cup A^c = S\]

Why It’s Powerful

Many problems become trivial using complements:

  • “At least one success”

  • “None of the events occur”

  • “At least one defective item”

  • “At least one person shares your birthday”
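As a small illustration of this “at least one” pattern, here is a hedged R sketch (not from the video): the probability of rolling at least one six in four rolls of a fair die. Computing it directly means adding several cases, but the complement makes it a one-liner.

# P(at least one six in four rolls) = 1 - P(no six in any of the four rolls)
p_no_six   <- (5/6)^4          # all four rolls avoid a six
p_at_least <- 1 - p_no_six
p_at_least                     # about 0.5177

# Simulation check (illustrative only)
set.seed(42)
rolls <- replicate(100000, any(sample(1:6, 4, replace = TRUE) == 6))
mean(rolls)                    # close to the exact value above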

3 Probability of Independent and Dependent Events

The Framework of Compound Events

In probability theory, real-world scenarios often require us to calculate the likelihood of multiple events occurring in sequence. This is known as finding the probability of compound events. The fundamental distinction that dictates the computational methodology is whether these events are independent or dependent. This chapter delineates the formal definitions and calculation rules for these two distinct classes of probabilistic relationships.

Video Reference

3.1 Learning Objectives

  • Define independent events and understand when events do not influence each other.

  • Use the multiplication rule for independent events: \[ P(A \cap B) = P(A)P(B) \]

  • Understand dependent events, where earlier outcomes affect later probabilities.

  • Apply conditional probability for dependent events: \[ P(A \cap B) = P(A)P(B \mid A) \]

  • Distinguish independent from dependent scenarios through examples.

This chapter establishes the roadmap for the reader. Probability is not simply about memorizing formulas. Instead, it is about developing intuition: when does one event affect another, and when do events occur in isolation? The learning objectives serve as the compass for the entire topic. By the end of these chapters, readers should move beyond mechanical computation and understand the story behind every probability expression. Events are not just symbols; they represent actions and outcomes in the real world: tossing coins, drawing cards, selecting samples, making decisions.

This grounding prepares them for deeper statistical thinking.

3.2 Independent Events

Two events \(A\) and \(B\) are independent if the outcome of one does not change the probability of the other.

The condition for independence is: \[ P(A \cap B) = P(A)P(B) \]

Example: Flip a coin and roll a die.

  • \(P(\text{Heads}) = \frac{1}{2}\)
  • \(P(\text{Roll a 4}) = \frac{1}{6}\)

Thus: \[ P(\text{Heads and 4}) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12} \]

Interpretation

Independence is one of the most elegant ideas in probability. Two events are independent when they exist in different “worlds.” Tossing a coin doesn’t whisper any information to the die. Rolling a die doesn’t influence which side of a coin will appear.

This concept is powerful because it simplifies complex sequences. When events don’t affect each other, we don’t need to consider changing sample spaces or conditional behavior.

Notice the structure:

  • A fair coin always has a 50% chance of landing heads.

  • A fair die always has a 1 in 6 chance of landing on 4.

These fixed probabilities make independence a key building block for multi-event probability models in science, gaming, simulation, and statistical reasoning.

The symbolic formula isn’t just math — it says, “If events don’t touch each other, you can simply multiply their chances.” When two events are independent: \[ P(A \cap B) = P(A) \cdot P(B) \]

For three mutually independent events: \[ P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C) \]

Example

Drawing with replacement:

Probability of drawing an Ace twice, where each draw has \(P(\text{Ace}) = \frac{4}{52}\).

Thus: \[ P(\text{Ace then Ace}) = \frac{4}{52} \times \frac{4}{52} = \frac{1}{169} \approx 0.0059 \]

Interpretation

This chapter expands the use of independence into multi-event situations. The key phrase is with replacement. When an experiment resets the sample space each time — such as putting a card back, reshuffling, or restarting a machine — each trial becomes identical to the last.

This stable structure allows the probability of repeated events to scale predictably. Drawing an Ace two times with replacement is no more complicated than multiplying the chance of one Ace by itself.

This type of reasoning is essential in real-world settings:

  • computer simulations

  • quality control

  • repeated testing

  • genetic probability models

Independence is not just a mathematical convenience — it reflects real systems where events start fresh each time.
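To make the with-replacement idea concrete, here is a short R sketch (an assumed illustration, not taken from the video) that estimates the probability of drawing an Ace twice with replacement and compares it to the exact value \(1/169\):

# A 52-card deck represented only by rank; 4 of the 52 cards are Aces
deck <- rep(c("Ace", "Other"), times = c(4, 48))

set.seed(123)
draw_two_with_replacement <- function() {
  draws <- sample(deck, size = 2, replace = TRUE)   # the deck "resets" before each draw
  all(draws == "Ace")
}

mean(replicate(100000, draw_two_with_replacement()))  # simulated estimate
(4/52)^2                                              # exact: 1/169, about 0.0059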

3.3 The Probability of Dependent Events

Events are dependent if the outcome of the first event changes the probability of the second.

The multiplication rule becomes: \[ P(A \cap B) = P(A)P(B \mid A) \]

Example: A bag contains 3 green and 2 red balls (5 total).

Event \(A\): First draw is green
\[ P(A) = \frac{3}{5} \]

Event \(B\): Second draw is red given the first was green
Remaining balls = 4
\[ P(B \mid A) = \frac{2}{4} \]

Joint probability: \[ P(G \text{ then } R) = \frac{3}{5} \times \frac{2}{4} = \frac{6}{20} = \frac{3}{10} = 0.3 \]

Interpretation

This chapter introduces dependence: when drawing without replacement or making sequential decisions, earlier outcomes modify the sample space. The conditional probability \(P(B \mid A)\) reflects this new reality. This form shows how information accumulates — once the first event happens, the world changes, so calculations must adapt accordingly. Dependence introduces the idea that probability is dynamic — the future can be shaped by the past. Once the first ball is drawn and not replaced, the composition of the bag changes. The sample space shrinks. The probabilities shift. The entire situation evolves.

This is the essence of conditional probability:

“Now that we know this happened, how does it affect our expectations?”

Most real phenomena are dependent, not independent:

  • selecting students for a group project

  • removing defective items from a batch

  • drawing cards without replacement

  • diagnostic testing

  • sequential machine operations

Humans instinctively understand dependence (“if the first ticket is winning, the next one is less likely”), but the mathematical representation formalizes that intuition.

The formula is not merely symbolic — it represents probability adapting to new information.
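A brief simulation can confirm the bag example above (a minimal R sketch under the stated 3-green, 2-red setup; the helper name is just illustrative):

# Bag with 3 green and 2 red balls; draw twice without replacement
bag <- c(rep("green", 3), rep("red", 2))

set.seed(7)
green_then_red <- function() {
  draws <- sample(bag, size = 2, replace = FALSE)   # sample space shrinks after draw 1
  draws[1] == "green" && draws[2] == "red"
}

mean(replicate(100000, green_then_red()))  # simulated estimate
(3/5) * (2/4)                              # exact: 6/20 = 0.3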

Summary

Independent events: no influence between events.
\[ P(A \cap B) = P(A)P(B) \]

Dependent events: the probability of the second event changes after the first.
\[ P(A \cap B) = P(A)P(B \mid A) \]

Key Distinction

Check whether the first event changes:

  • the sample space
  • the probability of the second event
  • the amount of information available

4 Union Of Events

Probability provides a systematic way to measure uncertainty. Whether we are analyzing data, predicting outcomes, or understanding random processes, probability gives us the language and tools to reason logically. Every concept in probability begins with a simple idea: identifying all possible outcomes, determining what counts as a favorable outcome, and measuring how likely that outcome is.

This expanded explanation follows the structure of the video and develops each idea with deeper clarity, examples, and interpretations. You can extend each section even further whenever needed.

Video Reference


4.1 Understanding the Sample Space

What is a Sample Space?

A sample space is the complete set of all possible outcomes that can occur in an experiment. It serves as the foundation of all probability calculations.

Every probability question begins with identifying the sample space. If the sample space is wrong or incomplete, every result afterward becomes unreliable.

\[ S = \{\text{all possible outcomes}\} \]

Examples

  1. Coin Flip
    \[ S = \{H, T\} \]

  2. Rolling a Six-sided Die
    \[ S = \{1,2,3,4,5,6\} \]

  3. Drawing a Card numbered 1–10
    \[ S = \{1,2,3,\dots,10\} \]

Why It Matters?

It defines the boundaries of what is possible, ensures we correctly count outcomes, avoids logical errors such as missing or double-counting events, and forms the denominator in probability formulas. A strong grasp of the sample space avoids misunderstandings later in unions, intersections, and conditional probability.

4.2 Simple Probability

Simple probability refers to the likelihood of a single event occurring, assuming each outcome in the sample space is equally likely.

\[ P(A) = \frac{\text{Number of outcomes in } A}{\text{Total outcomes in } S} \]

Examples

  • Rolling a 4 on a die

\[ P(4) = \frac{1}{6} \] Only one outcome is “4”, and the sample space contains 6 equally likely outcomes.

  • Flipping Heads on a coin

\[ P(H) = \frac{1}{2} \] There are 2 equally likely outcomes (H and T).

  • Drawing an even number from 1–10

\[ A = \{2,4,6,8,10\} \\ P(A) = \frac{5}{10} = 0.5 \] The even numbers \(\{2, 4, 6, 8, 10\}\) give 5 favorable outcomes out of 10 in total, so the probability is 1/2.

Interpretation

Simple probability allows us to quantify the likelihood of an event occurring under equally likely outcomes.

  • A probability close to 1 → the event is very likely.

  • A probability close to 0 → the event is very unlikely.

  • A probability of 0.5 → the event is equally likely to happen or not.
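The counting definition above can be checked directly in R (a minimal sketch, assuming the 1–10 card experiment described earlier):

# Sample space: cards numbered 1 through 10
S <- 1:10

# Event A: draw an even number
A <- S[S %% 2 == 0]

length(A) / length(S)   # 5/10 = 0.5, matching the counting formula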

4.3 Practice Review Questions

Example 1 — Rolling a Die

Event: roll a number ≤ 3.

\[ A = \{1,2,3\} \\ P(A) = \frac{3}{6} = \frac{1}{2} \] Favorable outcomes are \(\{1, 2, 3\}\), so 3 of the 6 equally likely outcomes qualify.

Example 2 — Drawing a Card

Event: draw an odd number from cards 1–10.

\[ A = \{1,3,5,7,9\} \\ P(A) = \frac{5}{10} = 0.5 \] The odd numbers \(\{1, 3, 5, 7, 9\}\) account for 5 of the 10 outcomes.

Example 3 — Coin Toss

\[ P(H) = \frac{1}{2} \]

These questions help reinforce the concepts of sample spaces and event probabilities.


4.4 Probability of the Union of Events

The union of events, written \(A \cup B\), includes outcomes in event \(A\), in event \(B\), or in both.

The formula for the probability of the union of two events is:

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

Why Do We Subtract the Intersection?

When adding \(P(A)\) and \(P(B)\), the overlapping outcomes are counted twice, so the intersection must be subtracted once.

Example

Suppose in a class:

  • 40% like Math
  • 30% like Physics
  • 10% like both subjects

Then:

\[ P(\text{Math} \cup \text{Physics}) = 0.40 + 0.30 - 0.10 = 0.60 \]

Interpretation:
60% of students like at least one of the two subjects. If we add 40% + 30%, the students who like both (10%) get counted twice. So we subtract them once to avoid overcounting.

Final probability = 60%.
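The same bookkeeping can be written out in R (a trivial but explicit sketch of the general addition rule using the class-survey numbers above):

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_math    <- 0.40
p_physics <- 0.30
p_both    <- 0.10

p_math + p_physics - p_both   # 0.60: at least one of the two subjects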

Key Insight

The union describes the probability that at least one of the events occurs.


4.5 Visualizing Unions with a Venn Diagram

A Venn Diagram visually represents event relationships.

  • One circle = event \(A\)
  • Another circle = event \(B\)
  • Overlap = \(A \cap B\)
  • Combined shaded area = \(A \cup B\)

Why It Helps

  • Makes intersections easier to see
  • Helps avoid double counting
  • Supports intuition for beginners

Staying Connected

The video encourages continued learning and engagement.
In this topic context: Keep exploring concepts, solving problems, and applying probability to real data. Probability becomes clearer through constant practice and real-world examples.


5 Mutually Exclusive and Exhaustive Events

In probability, understanding how events relate to one another is just as important as calculating the probabilities themselves. This chapter focuses on three foundational ideas: mutually exclusive events, exhaustive events, and overlapping events. These concepts determine whether events can occur together, whether they cover all possible outcomes, and how their relationships affect the formulas we use.

Mutually exclusive events cannot happen at the same time — one outcome rules out the other. Exhaustive events, on the other hand, ensure that every possible outcome is included somewhere in the set. Between these two ideas lies the concept of overlapping events, where two or more events share some common outcomes.

By clarifying how events are connected within the sample space, you’ll be able to apply probability rules more accurately, avoid double-counting, and better interpret real-world situations. This chapter provides the core intuition needed before moving into more advanced topics like conditional probability, Bayes’ theorem, and probability distributions.

Video Reference

5.1 Learning Objectives & Review

When you finish reading this chapter you will be able to:

  • Define what it means for events to be mutually exclusive.
  • Understand what exhaustive (collectively exhaustive) events are.
  • Recognize when a set of events is both mutually exclusive and exhaustive.
  • Understand overlapping vs non-overlapping exhaustive events.
  • Apply probability formulas (union, addition) correctly under these different types of events.
  • Analyze examples to decide whether events are mutually exclusive / exhaustive / overlapping.

This chapter sets the stage. We begin by clarifying the core concepts we will explore — not merely formulas, but definitions tying directly to intuitive scenarios like coin tosses, dice rolls, or drawing cards. The review reminds the reader of prior foundational ideas (sample space, simple probability, union of events), and primes them to appreciate the nuance between different kinds of event relationships. The goal: after reading this, you should no longer be confused about what it means for two events to exclude each other, or to cover all possible outcomes, or to overlap.

5.2 Mutually Exclusive Events

Two events \(A\) and \(B\) are mutually exclusive (also called disjoint) if they cannot happen at the same time.
That is:

\[ A \cap B = \varnothing \quad\Longrightarrow\quad P(A \cap B) = 0 \]

Example When rolling a fair six-sided die:

  • Let \(A\) = “roll a 2”
  • Let \(B\) = “roll a 5”

Since the die cannot show both 2 and 5 at once,
\(A\) and \(B\) are mutually exclusive.

Thus:

\[ P(A \cup B) = P(A) + P(B) \]
because \(P(A \cap B) = 0\).

Narrative

Mutual exclusivity is one of the simplest — yet most powerful — ideas in probability. It captures the intuitive idea of “either-or, but not both.” When you roll a die, you can’t get both 2 and 5 simultaneously. When you toss a coin, you can’t get both heads and tails at once.

Understanding this allows you to simplify probability calculations — especially when dealing with “or” events (unions): the probability of “A or B” becomes the sum of their separate probabilities, provided they are mutually exclusive. That makes reasoning straightforward and avoids double-counting outcomes.

5.3 Exhaustive (Collectively Exhaustive) Events

A set of events \(\{E_1, E_2, \dots, E_n\}\) is called exhaustive (or collectively exhaustive) when their union covers the entire sample space \(S\):

\[ E_1 \cup E_2 \cup \dots \cup E_n = S \]

This means whenever the experiment is conducted, at least one of the events must occur.

Example

  • Tossing a fair coin: events “Heads” (H) and “Tails” (T).
    Together: \(\{H, T\} = S\) — they are exhaustive.
  • Rolling a six-sided die: events “odd” and “even”.
    Together they cover all possible outcomes \(\{1,2,3,4,5,6\}\), so they are exhaustive.

Narrative

Exhaustive events ensure that no matter what happens, you have covered every possibility. It’s like having a guarantee that one of your defined events will occur — no surprises, no outside outcomes.

This is particularly useful when partitioning the sample space: by breaking down all possible outcomes into a collection of events whose union is the entire space, you create a complete framework for analyzing probabilities. An exhaustive partition helps when you want to compute probabilities over complex events by considering all scenarios.

5.4 The Union of Exhaustive Events

When events \(E_1, E_2, \dots, E_n\) are exhaustive:

\[ E_1 \cup E_2 \cup \dots \cup E_n = S \]

Since the occurrence of some event from the set is certain:

\[ P(E_1 \cup E_2 \cup \dots \cup E_n) = 1 \]

If the events are also mutually exclusive (no overlap), then:

\[ P(E_1) + P(E_2) + \dots + P(E_n) = 1 \]

Narrative

This chapter builds on the prior definitions to show why exhaustiveness is more than conceptual — it leads to powerful probability identities. If your events collectively exhaust the sample space, then the union of them has probability 1 — meaning when the trial is run, you’re certain one of those events happens.

When the events are both exhaustive and mutually exclusive, computing probabilities becomes even simpler: the sum of the probabilities of individual events equals 1. In such cases, you’ve partitioned the entire sample space into neat, disjoint events. This property is the backbone of many statistical and probability models, including discrete distributions and partition-based reasoning.

5.5 Overlapping vs Non-overlapping Exhaustive Events

  • Non-overlapping exhaustive events: the events are mutually exclusive and their union is the sample space.
    Then: \[ \bigcup_i E_i = S,\quad E_i \cap E_j = \varnothing \; (i \neq j) \] \[ \sum_i P(E_i) = 1 \]

  • Overlapping exhaustive events: the events together cover the sample space, but they may overlap (i.e. not mutually exclusive).
    In that case, their union is still \(S\), but
    \[ \sum_i P(E_i) \ge 1 \]
    because overlap causes double-counting if one simply sums probabilities without correction.

Narrative

This chapter explores nuance. Not every exhaustive collection is tidy and disjoint. Sometimes events overlap — meaning a particular outcome could satisfy more than one event. For example, imagine you cover a die’s sample space with the events “even numbers,” “odd numbers,” and “numbers greater than 4.” These three events together cover all outcomes, but they overlap (e.g., 6 is both even and greater than 4).

In overlapping exhaustive sets, simply adding up event probabilities will typically overcount. That’s why being clear about whether events are overlapping or not is crucial before applying probability rules. The careful use of union formulas and subtraction of intersections becomes necessary in overlapping partitions. This distinction helps avoid common probability mistakes.
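Here is a tiny R check of these two situations on a six-sided die (an illustrative sketch; the specific events are just examples):

S <- 1:6   # sample space for one die

# Non-overlapping exhaustive events: "odd" and "even"
p_odd  <- mean(S %% 2 == 1)
p_even <- mean(S %% 2 == 0)
p_odd + p_even            # exactly 1

# Overlapping exhaustive events: "even", "odd", "greater than 4"
p_gt4 <- mean(S > 4)
p_even + p_odd + p_gt4    # about 1.33 > 1 because of double-counting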

6 The Binomial Experiment and the Binomial Formula

Many real-world situations involve repeating the same action several times while observing how often a certain outcome occurs. Whether a student guesses answers on a quiz, a doctor examines how often a treatment succeeds, or an engineer measures how often a machine part fails, the underlying question is the same: How many successes should we expect, and how likely is each possible number of successes?

The binomial distribution is the essential tool for answering these questions. It describes the behavior of a process made up of repeated, independent trials—each with only two possible outcomes, such as success or failure, hit or miss, correct or incorrect. Even though each trial is simple, the combined results across many trials produce rich probability patterns that allow us to model uncertainty with precision.

This chapter introduces the binomial setting, the conditions that make a scenario binomial, and the formula used to compute exact probabilities. Rather than memorizing formulas, you will learn to recognize when the binomial model truly fits and how to interpret its parameters in practical contexts. Through examples and guided practice, you will see how the binomial distribution captures both randomness and structure: the probability of success on each trial, the number of trials performed, and the different ways successes can occur.

By understanding this distribution, you’ll gain a powerful foundation for probability, statistics, and data analysis. The concepts here serve as stepping stones for confidence intervals, hypothesis testing, sampling distributions, and many applied fields. Once you can identify a binomial situation and compute a binomial probability, you will be prepared for a wide range of statistical reasoning.

Video Reference

6.1 Binomial Probability Distribution

A binomial probability distribution describes the number of successes in a fixed number of independent trials, each with the same probability of success.

If a random variable \(X\) counts the number of successes, then: \[ X \sim \text{Binomial}(n, p) \] where:

  • \(n\): number of trials
  • \(p\): probability of success
  • \(1 - p\): probability of failure

The probability mass function is: \[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \]

Interpretation

The binomial distribution is one of the most widely used probability models because it represents simple but common situations: success/failure, yes/no, correct/incorrect. It counts how many successes occur, not which ones. This abstraction makes the distribution extremely flexible, used across quality control, medicine, finance, and everyday decision-making.

A scenario is binomial when four conditions hold: a fixed number of trials \(n\); only two outcomes (success or failure) on each trial; the same success probability \(p\) on every trial; and independence between trials. These four conditions act like a checklist. Many learners incorrectly try to force binomial calculations into situations where independence or constant probability fail. By verifying these criteria, the reader ensures mathematical accuracy. The power of the binomial distribution lies precisely in its simplicity—once these conditions are satisfied, everything else follows naturally.

Practice Problem

A basketball player makes free throws with probability \(p = 0.7\).
He attempts \(n = 5\) shots.

Find the probability that he makes exactly 3 shots.

We compute: \[ P(X = 3) = \binom{5}{3} (0.7)^3 (0.3)^2 = 10 \times 0.343 \times 0.09 \approx 0.3087 \]

Interpretation

This problem highlights the transition from concept to application. The free-throw example is classic: repeated attempts with constant probability. The combination term \(\binom{5}{3}\) counts how many ways the success–failure pattern can occur, while the powers \(p^{3}\) and \((1-p)^{2}\) describe the probability of each specific pattern.

The model neatly separates counting from likelihood.

This scenario shows how the binomial model treats guessing or randomness. With equal probabilities and independence, the distribution becomes symmetrical. Such problems prepare students for analyzing standardized tests, reliability systems, and research experiments where guessing or chance is involved.
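In R, this calculation can be written either from the formula directly or with the built-in dbinom() function (a minimal sketch using the free-throw numbers above):

n <- 5    # attempts
k <- 3    # made shots
p <- 0.7  # success probability per attempt

choose(n, k) * p^k * (1 - p)^(n - k)   # formula: 10 * 0.343 * 0.09 = 0.3087
dbinom(k, size = n, prob = p)          # built-in binomial pmf, same value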

6.2 The Binomial Formula


The binomial probability formula is: \[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \]

The combination term counts the number of success arrangements: \[ \binom{n}{k} = \frac{n!}{k!(n-k)!} \]

The probability term assigns likelihood to each arrangement:

  • \(p^k\) for the successes

  • \((1-p)^{n-k}\) for the failures

Multiplying them gives the total probability.

7 The Binomial Distribution & Formula

The binomial distribution is one of the most fundamental tools in probability. It helps us describe situations where an event can only result in success or failure, repeated over several trials. Whether we are modeling coin flips, product defects, or survey responses, the binomial distribution gives us a clear way to understand how likely different outcomes are.

In this chapter, we focus on visual intuition. Instead of just applying formulas, we explore how the distribution changes shape when we adjust two key parameters:

  • \(n\): number of trials

  • \(p\): probability of success

By observing how the graph shifts, spreads, or becomes more symmetric, we develop a stronger sense of what the numbers actually mean. This visual understanding will prepare you for later topics, including when and why the binomial distribution starts to resemble the normal distribution.

7.1 The Binomial Distribution & Formula

This opening sets our learning intentions: we’re not just computing probabilities, but visualizing distributions. Understanding how the binomial distribution behaves as you adjust \(n\) and \(p\) helps build intuition — which is vital when handling real-world data or probabilistic models. We review the formula and then move deeper: seeing distributions as shapes, not just numbers.

Recall that a binomial random variable
\[ X \sim \mathrm{Binomial}(n, p) \]
counts the number of successes in \(n\) independent trials, each with success probability \(p\).

The formula for exactly \(k\) successes is:

\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \]

Interpretation

The formula blends counting and probability into one elegant expression. The combination part expresses structure, while the probability part expresses chance. Students often memorize the formula without appreciating that it simply sums identical probabilities over all possible success patterns. Understanding this decomposition helps prevent errors and reveals why the binomial distribution is so powerful.

7.2 From Formula to Distribution — What Happens When You Change \(p\)

As we vary the success probability \(p\) (keeping \(n\) fixed), the binomial distribution shifts.

  • If \(p\) is small (e.g., \(p = 0.1\)), the distribution is heavily skewed toward \(0\) successes.

  • If \(p\) is moderate (e.g., \(p = 0.5\)), the distribution is more symmetric.

  • If \(p\) is large (e.g., \(p = 0.9\)), the distribution shifts toward \(n\) and is skewed to the left (its long tail points toward fewer successes).

This section connects parameter \(p\) to the intuitive “chance of success.” When success is unlikely, most of the mass clusters near zero successes. As success becomes more probable, the most likely outcomes cluster around a central value. And when success is almost certain, you expect many successes. Visualizing these effects helps you predict distribution behavior just by glancing at \(p\). It also helps in modeling — for example, anticipating how likely “many successes” are when success probability changes.

From Formula to Distribution — What Happens When You Change \(n\)

Holding \(p\) fixed and increasing the number of trials \(n\) has the following effects:

  • The distribution becomes more spread out (larger variance).
  • The possible range for \(X\) increases (from 0 up to \(n\)).
  • For moderate \(p\), the distribution often becomes more “bell-shaped” as \(n\) grows.

Interpretation

This part explores the effect of trial count. With few trials, outcomes are limited; randomness can dominate. As trials increase, the law of large numbers begins to play out: outcomes start clustering around the mean \(np\), and randomness “averages out.” For many practical applications, this behavior is why binomial distributions with large \(n\) begin to resemble the normal distribution. It builds intuition about why sampling many times tends to smooth out variability.
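The shape changes described above are easy to see by tabulating the pmf with dbinom() and plotting it with ggplot2, which is already loaded (a hedged sketch; the parameter values are just illustrative):

library(ggplot2)

# Binomial pmfs for several (n, p) combinations
pmf_grid <- expand.grid(k = 0:50, n = c(10, 20, 50), p = c(0.1, 0.5, 0.9))
pmf_grid <- subset(pmf_grid, k <= n)
pmf_grid$prob <- dbinom(pmf_grid$k, size = pmf_grid$n, prob = pmf_grid$p)

ggplot(pmf_grid, aes(x = k, y = prob)) +
  geom_col() +
  facet_grid(p ~ n, labeller = label_both) +
  labs(x = "Number of successes k", y = "P(X = k)",
       title = "Binomial pmf as n and p change")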

7.3 When to Consider Normal Approximation (Guidelines)

When the number of trials \(n\) is large and the probability of success \(p\) is not extremely small or extremely large (for example, \(p \approx 0.5\)), the Binomial distribution can often be approximated by a Normal distribution. A common rule of thumb is to require both \(np\) and \(n(1-p)\) to be at least about 10.

Under these conditions, the Binomial random variable is approximately Normal with mean and variance

\[ \mu = np, \qquad \sigma^2 = np(1 - p) \]

As \(n\) increases, the histogram of a Binomial distribution becomes smoother and begins to resemble a bell curve, making the Normal approximation increasingly accurate.

Narrative Explanation

This section introduces a powerful practical insight: large-sample behavior. When you have many trials and a moderate probability of success, the binomial distribution behaves predictably and smooths out — becoming roughly symmetric and “normal.” This approximation is used widely because Normal curves are easier to manipulate analytically and computationally. Understanding when this works — and when it fails — is key for applied statistics, simulation, and data modeling.
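A quick visual check of the approximation (an illustrative R sketch, assuming n = 50 and p = 0.4) overlays the Normal density on the Binomial pmf:

n <- 50
p <- 0.4
mu    <- n * p                 # 20
sigma <- sqrt(n * p * (1 - p))

pmf_df <- data.frame(k = 0:n, prob = dbinom(0:n, size = n, prob = p))

ggplot(pmf_df, aes(x = k, y = prob)) +
  geom_col(alpha = 0.6) +
  stat_function(fun = dnorm, args = list(mean = mu, sd = sigma), colour = "red") +
  labs(x = "Number of successes", y = "Probability",
       title = "Binomial(50, 0.4) pmf with Normal overlay")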

7.4 Recap & Key Takeaways

Key Points

  • The binomial distribution transforms the discrete trial-based model into a full probability distribution over counts of successes.

  • Changing \(p\) shifts the distribution toward fewer or more successes.

  • Changing \(n\) affects spread and smoothness: large \(n\) → smoother, more predictable distribution.

  • For large \(n\) and moderate \(p\), binomial distributions often approximate the normal distribution.

8 Visualization

When the probability of success (p) and the number of trials (n) are entered into the formula, the probability for each value of k can be depicted as a bar graph. In the case of a fair coin tossed twice (p = 0.5, n = 2), the chart is symmetrical because the chances of success and failure are the same. In other words, a binomial graph can be created simply by evaluating the binomial formula at each value of k.

Binomial Distribution Parameters

If a random variable X follows a binomial distribution, then

  • Mean: \(\mu = np\), the expected average number of successes.

  • Variance: \(\sigma^2 = np(1 - p)\), a measure of the spread of the data around the mean.

  • Standard deviation: \(\sigma = \sqrt{np(1 - p)}\), the square root of the variance.
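These formulas can be verified against a simulation (a small R sketch; n = 20 and p = 0.3 are arbitrary illustrative values):

n <- 20
p <- 0.3

# Theoretical parameters
n * p                    # mean: 6
n * p * (1 - p)          # variance: 4.2
sqrt(n * p * (1 - p))    # standard deviation: about 2.05

# Simulation check
set.seed(1)
x <- rbinom(100000, size = n, prob = p)
mean(x); var(x); sd(x)   # close to the theoretical values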

In the video, it is also shown how the value of p affects the shape of the distribution:

  • p = 0.5 (e.g., a fair coin): symmetric, bell-like.

  • p < 0.5 (e.g., p = 0.1): right-skewed, with few successes expected.

  • p > 0.5 (e.g., p = 0.8): left-skewed, with many successes expected.

In the video, it is also shown how the value of n (number of attempts) affects the form of the distribution:

  • Small n (e.g., n = 10): the distribution is still wide and less smooth.

  • Medium n (e.g., n = 20): the distribution starts to look smoother and more centered around the expected value (np).

  • Large n (e.g., n = 50): the distribution becomes much smoother and more symmetrical, and relative to its possible range it looks narrower.

9 Summary

In this collection of probability chapters, we moved through the full foundation of discrete probability in a structured and connected way.

We began by defining a sample space, the complete list of everything that can happen in a random experiment. From there, we learned that simple probability is just comparing the size of an event to the size of the possible outcomes.

Next, we explored how events interact. The union rule taught us how to compute the probability that at least one event occurs, and we saw why overlap matters. This naturally linked to mutually exclusive events (no overlap) and exhaustive events (cover the entire sample space). We also discussed how some exhaustive sets overlap while others do not, and how this affects probability calculations.

We then introduced the crucial distinction between independent and dependent events. Independent events multiply directly because one has no influence on the other. Dependent events require conditional probability, where we update the chance of an event based on what already happened.

With these building blocks complete, we moved to the binomial probability distribution, which models repeated independent trials with only two outcomes. It uses combinations to count the possible success patterns and

\[P(X = k) = \binom{n}{k} \, p^{k} (1 - p)^{\,n-k}\]

to measure the probability of each pattern. Together, these ideas explain how to compute the likelihood of getting exactly \(k\) successes out of \(n\) attempts.

Overall, the chapters create a clear progression: from basic outcomes → to event structure → to combining events → to independence → to repeated-trial models. These concepts form the foundation for all later statistical reasoning, making this set of chapters a complete introduction to the logic of probability.

References

[1] Jacob Clifford. Sample Space and Simple Probability. YouTube Video. https://youtu.be/vqKAbhCqSTc

[2] Jacob Clifford. Independent and Dependent Events. YouTube Video. https://youtu.be/LS-_ihDKr2M

[3] Jacob Clifford. Mutually Exclusive and Exhaustive Events. YouTube Video. https://youtu.be/f7agTv9nA5k

[4] Jacob Clifford. Binomial Probability Distribution. YouTube Video. https://youtu.be/nRuQAtajJYk

[5] Jacob Clifford. Conditional Probability. YouTube Video. https://youtu.be/ynjHKBCiGXY

[6] Jacob Clifford. Compound Probability Example. YouTube Video. https://youtu.be/Y2-vSWFmgyI

(Free & Accessible)

[7] OpenStax. (2019). Introductory Statistics. Rice University.
Available at: https://openstax.org/books/introductory-statistics

[8] Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2019). OpenIntro Statistics (4th ed.).
Available at: https://www.openintro.org/book/os/

[9] Grinstead, C. M., & Snell, J. L. (2012). Introduction to Probability.
Available at: https://math.dartmouth.edu/~prob/prob.pdf

[10] Statistics LibreTexts (2023). Probability Theory.
Available at: https://stats.libretexts.org/