Probability not only helps us understand how likely an event is to occur, but also forms the foundation of many statistical methods used for decision-making. When a process or experiment produces varying outcomes, we use a random variable to represent those outcomes and a probability distribution to describe how the probabilities are assigned to each possible value. Understanding the shape and properties of a distribution is essential because it determines how data behave, how we calculate probabilities, and how we make predictions. From distributions for continuous variables to the behavior of statistics such as sample means, probability distributions serve as the core of inferential statistics.
Understanding these basics will provide a strong foundation as we transition into the main topic of this video: Continuous Random Variables and Their Probability Distributions.
To understand continuous random variables, it is essential to know how probability is represented using a Probability Density Function (PDF). Unlike discrete random variables, a continuous random variable does not assign probability to individual points. Instead, probability is obtained from the area under the PDF curve.
Key Characteristics:
The variable takes values in an interval such as \((a, b)\) or even \((-\infty, +\infty)\).
The probability of any single point is always zero: \[P(X = x) = 0\]
Probabilities are meaningful only over intervals: \[P(a \leq X \leq b) = \int_{a}^{b} f(x) \, dx\]
A valid PDF must satisfy two conditions:
1. Non-negativity \[f(x) \geq 0 \quad \forall x\]
2. Total Area Equals 1 \[\int_{-\infty}^{\infty} f(x) \, dx = 1\]
Interpretation:
Larger values of \(f(x)\) indicate higher probability density around that value.
However, \(f(x)\) is not a probability; probabilities come from the area under the curve.
Example PDF: \(f(x) = 3x^2\) on \([0, 1]\)
Consider the probability density function:
\[f(x) = 3x^2, \quad 0 \leq x \leq 1\]
Validation:
\[\int_{0}^{1} 3x^2 \, dx = \left[ x^3 \right]_0^1 = 1\]
\[P(a \leq X \leq b) = \int_{a}^{b} f(x) \, dx\]
Example:
\[P(0.5 \leq X \leq 1) = \int_{0.5}^{1} 3x^2 \, dx = 1^3 - (0.5)^3 = 0.875\]
The cumulative distribution function (CDF) gives the probability accumulated up to \(x\):
\[F(x) = P(X \leq x) = \int_{0}^{x} 3t^2 \, dt = x^3, \quad 0 \leq x \leq 1\]
Relationship between PDF and CDF:
\[f(x) = F'(x)\]
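To verify these results numerically, the short R sketch below (using base R's `integrate`; the function names `pdf_f` and `F_cdf` are just illustrative choices) checks the total area, the interval probability, and the CDF shortcut:

```r
# Example PDF: f(x) = 3x^2 on [0, 1]
pdf_f <- function(x) 3 * x^2

# Total area under the curve should equal 1
integrate(pdf_f, lower = 0, upper = 1)    # 1 with negligible error

# P(0.5 <= X <= 1) as the area under the curve
integrate(pdf_f, lower = 0.5, upper = 1)  # 0.875

# The CDF F(x) = x^3 gives the same interval probability
F_cdf <- function(x) x^3
F_cdf(1) - F_cdf(0.5)                     # 0.875
```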
Understanding sampling distributions is crucial for making inferences about a population based on sample data.
Population Distribution: This distribution is created by measuring every single individual in the population. It is described by the Population Mean (\(\mu\)) and the Population Standard Deviation (\(\sigma\)).
Sample Distribution: This distribution is created by measuring every single individual in a single, specific sample taken from the population. It is described by the Sample Mean (\(\bar{x}\)) and the Sample Standard Deviation (\(s\)).
Sampling Distribution: This distribution is created by repeatedly taking random samples from a population, calculating a statistic (like the mean, \(\bar{x}\)) for each sample, and then combining those statistics into a distribution.
| Characteristic | Population Distribution | Sampling Distribution of \(\bar{x}\) |
|---|---|---|
| Mean | \(\mu\) | \(\mu_{\bar{x}} = \mu\) (Mean of the sampling distribution equals the population mean) |
| Standard Deviation | \(\sigma\) | \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\) (Also called the Standard Error) |
| Notation | \(X \sim N(\mu, \sigma)\) | \(\bar{X} \sim N(\mu, \frac{\sigma}{\sqrt{n}})\) |
| Standardization (Z-score) | \(Z = \frac{x - \mu}{\sigma}\) | \(Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}\) |
Detailed Distribution Characteristics
1. Mean
The mean of all possible sample means (\(\mu_{\bar{x}}\)) is equal to the population mean (\(\mu\)).
\[\mu_{\bar{x}} = \mu\]
2. Standard Deviation (Standard Error)
The standard deviation of the sampling distribution of the mean (\(\sigma_{\bar{x}}\)) is called the Standard Error. It indicates how far a sample mean typically falls from the population mean, and it decreases as the sample size (\(n\)) increases.
\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\]
3. Z-score
To calculate the probability of a specific sample mean (\(\bar{x}\)) occurring, we use the Z-score standardized by the Standard Error:
\[Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}\]
Note on Standard Deviation (\(\sigma_{\bar{x}}\)): The standard deviation of the sampling distribution (the Standard Error) is smaller than the population standard deviation (\(\sigma\)) whenever \(n > 1\). This is because averages (sample means) are less variable than individual observations.
Suppose the heights of all Canadians follow a normal distribution with:
Population Mean (\(\mu\)): \(160 \text{ cm}\)
Population Standard Deviation (\(\sigma\)): \(7 \text{ cm}\)
Example 1: Probability of Sample Mean
Question: What is the probability that the average height of a sample of \(n=10\) random Canadians is less than \(157 \text{ cm}\)? (\(P(\bar{X} < 157)\))
This question deals with the Sampling Distribution (\(\bar{X}\)), so we must use the standard error and the Z-score formula for the sampling distribution.
Step 1: Calculate the Standard Error (\(\sigma_{\bar{x}}\))
\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{7}{\sqrt{10}} \approx 2.2136\]
Step 2: Calculate the Z-score
\[Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{157 - 160}{2.2136} \approx -1.36\]
Step 3: Find the Probability
Using the Z-score table (or the `pnorm` function in R), we find the area to the left of \(Z = -1.36\).
\[P(\bar{X} < 157) = P(Z < -1.36)\]
The probability is approximately 0.0869, or 8.69%.
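This result can be reproduced in R (a quick sketch using base R's `pnorm`):

```r
# P(X-bar < 157) with mu = 160, sigma = 7, n = 10
se <- 7 / sqrt(10)                  # standard error, ~ 2.2136
pnorm(157, mean = 160, sd = se)     # ~ 0.0877 with the unrounded Z
pnorm(-1.36)                        # ~ 0.0869 with the table-rounded Z
```

The small difference comes from rounding the Z-score to two decimals before consulting the table.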
Example 2: Probability of an Individual Observation (Population Distribution)
Question: What is the proportion of all people that have heights greater than \(170 \text{ cm}\)? (\(P(X > 170)\))
This requires the Population Distribution formula.
Step 1: Calculate the Z-score (Population)
\[Z = \frac{x - \mu}{\sigma} = \frac{170 - 160}{7} \approx 1.43\]
Step 2: Find the Probability (Area to the Right)
Since we want \(P(X > 170)\), we look for the area to the right (\(1\) minus the area to the left).
\[P(X > 170) = 1 - P(X \leq 170) = 1 - P(Z \leq 1.43)\]
The proportion is approximately 0.0764, or 7.64%.
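The same check in R (a sketch):

```r
# P(X > 170) for one individual, mu = 160, sigma = 7
1 - pnorm(170, mean = 160, sd = 7)  # ~ 0.0766 with the unrounded Z
1 - pnorm(1.43)                     # ~ 0.0764 with the table-rounded Z
```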
Before exploring the concept of sampling distributions in detail, this video provides a clear visual explanation of how statistics such as sample means behave when repeatedly drawn from the same population. It offers an intuitive foundation for understanding variability, uncertainty, and why sampling distributions are essential in statistical inference. Please watch the video below before continuing with the material.
This material discusses a crucial concept in inferential statistics, the Central Limit Theorem (CLT), and its relationship with the sampling distribution.
Understanding these basics will provide a strong foundation as we transition into the main topic of the Central Limit Theorem (CLT).
A sampling distribution is a probability distribution formed from a sample statistic (such as the mean, \(\bar{x}\)) calculated from many random samples of the same size taken from the same population.
To form the sampling distribution of the sample mean (\(\bar{x}\)), the following steps are performed repeatedly:
Take a Simple Random Sample of size \(n\) from the Population.
Calculate the Statistic: Compute the sample mean (\(\bar{x}\)) for that sample.
Plot the Value: Plot the obtained \(\bar{x}\) value onto a graph to form the distribution.
The CLT is a theorem that predicts the shape of the sampling distribution based on the sample size (\(n\)).
If the sample size (\(\mathbf{n}\)) is large enough, the sampling distribution of the sample mean (\(\mathbf{\bar{x}}\)) will approximate a Normal distribution, regardless of the shape of the original population distribution.
Intuitively, even if the original population may be skewed or uniform, the means of many samples tend to balance out and cluster around the population mean, resulting in a bell shape (Normal distribution) for the \(\bar{x}\) distribution.
The condition under which the CLT can be safely applied to assume the sampling distribution is Normal is:
\[\mathbf{n \ge 30}\]
If \(n\) is 30 or greater, the sampling distribution of \(\bar{x}\) is considered a good approximation of the Normal distribution.
If the original population distribution is already known to be Normal, then:
The sampling distribution of \(\bar{x}\) will be Normal for any sample size (even if \(n < 30\)).
Nevertheless, a larger sample size is still recommended for more precise estimation and higher reliability.
The Central Limit Theorem is highly important because it allows statisticians to use inference methods based on the Normal Distribution (such as calculating Z-scores or confidence intervals) on large sample data, without needing to know or assume the exact shape of the original population distribution.
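The R sketch below illustrates this empirically; the population (an exponential distribution), sample size, and number of repetitions are arbitrary choices for demonstration:

```r
set.seed(42)
n    <- 30      # sample size (meets the n >= 30 rule of thumb)
reps <- 5000    # number of repeated samples

# Population: exponential with mean 1 (heavily right-skewed)
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# The histogram of sample means is approximately bell-shaped
hist(sample_means, breaks = 40, main = "Sampling distribution of the mean")
mean(sample_means)   # ~ 1, the population mean
sd(sample_means)     # ~ 1 / sqrt(30) ~ 0.183, the standard error
```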
Having understood the Sampling Distribution of the Sample Mean (\(\bar{x}\)) and how the Central Limit Theorem (CLT) applies to it, we now shift our focus to another crucial type of statistic: the Sample Proportion (\(\hat{p}\)).
The Proportion is used when dealing with categorical data (e.g., the percentage of people who vote yes/no, or the proportion of items that are defective). The video below explains how the sampling distribution of \(\hat{p}\) is formed and when it can be treated as approximately Normal.
Please watch this video to build a solid foundation on the Sampling Distribution of the Sample Proportion before proceeding with related calculations.
This material explains the concept of the Sampling Distribution when the statistic used is the Sample Proportion (\(\hat{p}\)).
A sampling distribution is a probability distribution that involves a repetitive process: repeatedly taking samples from a population, calculating a statistic (such as \(\bar{x}\) or \(\hat{p}\)) for each sample, and then plotting those values onto a graph to create a distribution.
A proportion describes the fraction of outcomes with a given characteristic relative to the total number of observations.
Example: The proportion of people with green eyes in a population or a sample.
Population Proportion: Represented by the symbol \(\mathbf{P}\) (a fixed value for the population).
Sample Proportion: Represented by the symbol \(\mathbf{\hat{p}}\) (P-hat) (a value that varies from sample to sample).
If we repeatedly take samples from the population and calculate \(\hat{p}\) for each sample, the different \(\hat{p}\) values obtained will form the Sampling Distribution of the Sample Proportion.
When the Central Limit Theorem (CLT) conditions (discussed below) are met, the sampling distribution of the sample proportion approximates a Normal distribution and possesses three main characteristics:
1. Mean
The mean (average) of all combined sample proportions is equal to the true population proportion.
\[\mu_{\hat{p}} = P\]
2. Standard Error
The standard deviation of the \(\hat{p}\) sampling distribution is called the Standard Error of the proportion. It measures the spread of the \(\hat{p}\) values around \(P\).
\[\sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}}\]
Where \(n\) is the sample size, and \((1-P)\) is often denoted as \(Q\).
3. Z-score
If the \(\hat{p}\) distribution is Normal, we can use the Z-score formula to standardize \(\hat{p}\) and calculate the area (probability) under the curve:
\[Z = \frac{\hat{p} - P}{\sigma_{\hat{p}}} = \frac{\hat{p} - P}{\sqrt{\frac{P(1-P)}{n}}}\]
It is crucial to note that the CLT conditions for proportions are different from those for means (\(n \ge 30\)).
For the Sampling Distribution of the Sample Proportion (\(\hat{p}\)) to be considered approximately Normal, both of the following conditions must be met:
1. Expected Number of Successes \(\ge 10\):
\[n P \ge 10\]
2. Expected Number of Failures \(\ge 10\):
\[n (1-P) \ge 10\]
If both requirements are satisfied, the CLT can be applied, and the Normal Distribution can be used for probability calculations involving \(\hat{p}\).
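In R, this check is a one-liner (using hypothetical values for \(P\) and \(n\)):

```r
# Normality check for the sampling distribution of p-hat
P <- 0.4; n <- 100                 # hypothetical proportion and sample size
n * P >= 10 && n * (1 - P) >= 10   # TRUE -> Normal approximation is reasonable
```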
To bridge the gap between theoretical characteristics and practical application, this video provides a detailed review using practical examples.
The video will show you when to apply each of the three key probability methods summarized below.
This understanding is crucial for selecting the correct tool for probability calculations in real-world scenarios.
This document summarizes the three main methods used in the video to solve probability problems involving repeated trials (drawing marbles), categorized by sample size (\(n\)).
Scenario Setup: A jar contains 200 Green (G) marbles and 300 Blue (B) marbles.
Total Marbles: 500
Probability of Success (P): \(P(\text{Green}) = 200 / 500 = \mathbf{0.4}\)
Probability of Failure (Q): \(P(\text{Blue}) = 300 / 500 = \mathbf{0.6}\)
Question: If a marble is drawn three times with replacement, what is the probability of drawing at least two green marbles?
Method Used: Sample Space and Direct Probability
For very small samples, listing all outcomes in the sample space and calculating their probabilities is feasible.
Exactly 2 G: GGB, GBG, BGG
Exactly 3 G: GGG
\(P(\text{GGB}) = 0.4 \times 0.4 \times 0.6 = 0.096\)
\(P(\text{GBG}) = 0.4 \times 0.6 \times 0.4 = 0.096\)
\(P(\text{BGG}) = 0.6 \times 0.4 \times 0.4 = 0.096\)
\(P(\text{GGG}) = 0.4 \times 0.4 \times 0.4 = 0.064\)
Summing the favorable outcomes: \(P(\text{at least 2 G}) = 3(0.096) + 0.064 = \mathbf{0.352}\)
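As a cross-check, the same answer follows from R's binomial functions (a sketch):

```r
# P(at least 2 greens in 3 draws with replacement), p = 0.4
sum(dbinom(2:3, size = 3, prob = 0.4))   # 0.288 + 0.064 = 0.352
```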
Question: If a marble is drawn five times with replacement, what is the probability of drawing at least two green marbles?
Method Used: Binomial Distribution Formula
Since listing the sample space for \(n=5\) becomes tedious, the Binomial formula is used to find the probability of getting exactly \(k\) successes in \(n\) trials:
\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]
With \(n=5\), \(k=2\), \(p=0.4\):
\(P(X=2) = \binom{5}{2} (0.4)^2 (0.6)^3 = 0.3456\)
For “at least two” green marbles, we sum the terms for \(k = 2, \dots, 5\), or equivalently take the complement:
\(P(X \geq 2) = 1 - P(X=0) - P(X=1) = 1 - (0.6)^5 - 5(0.4)(0.6)^4 = \mathbf{0.66304}\)
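The same answer in R (a sketch):

```r
# P(at least 2 greens in 5 draws), p = 0.4
sum(dbinom(2:5, size = 5, prob = 0.4))   # 0.66304
1 - pbinom(1, size = 5, prob = 0.4)      # 0.66304 (complement form)
```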
Question: If a marble is drawn 100 times with replacement, what is the approximate probability of drawing at least 35 green marbles?
Method Used: Normal Approximation (CLT) for Proportions
When \(n\) is very large, evaluating the Binomial formula term by term (66 terms in this case) is impractical by hand. The Central Limit Theorem is used to approximate the Binomial distribution with the Normal distribution.
For the Normal approximation to be valid, both conditions for the sample proportion must be met:
Successes: \(n P = 100 \times 0.4 = \mathbf{40} \ge 10\) (Condition met)
Failures: \(n (1-P) = 100 \times 0.6 = \mathbf{60} \ge 10\) (Condition met)
Since both are \(\ge 10\), the distribution of \(\hat{p}\) is approximately Normal.
The desired minimum number of successes (\(X=35\)) is converted to a proportion:
\[\hat{p} = \frac{X}{n} = \frac{35}{100} = \mathbf{0.35}\]
The question becomes: Find \(P(\hat{p} \ge 0.35)\).
\[\sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}} = \sqrt{\frac{0.4 \times 0.6}{100}} = \sqrt{0.0024} \approx 0.04899\]
\[Z = \frac{\hat{p} - P}{\sigma_{\hat{p}}} = \frac{0.35 - 0.40}{0.04899} \approx \mathbf{-1.02}\]
The Z-score table gives the area to the left of \(Z=-1.02\): \(0.1539\).
Since the question asks for “at least 35” (the area to the right), we subtract from 1.
\(P(Z \ge -1.02) = 1 - 0.1539 = \mathbf{0.8461}\)
The approximate probability of drawing at least 35 green marbles is \(\mathbf{84.61\%}\).
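Both the approximation and the exact binomial answer are easy to check in R (a sketch):

```r
# Normal approximation via the sampling distribution of p-hat
se_phat <- sqrt(0.4 * 0.6 / 100)            # ~ 0.04899
1 - pnorm(0.35, mean = 0.40, sd = se_phat)  # ~ 0.846

# Exact binomial: P(X >= 35) with n = 100, p = 0.4
1 - pbinom(34, size = 100, prob = 0.4)      # ~ 0.87 (the approximation runs
                                            # slightly low without a continuity correction)
```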
Key Takeaway: Using the CLT provides an approximate probability, which is generally close enough for practical use in introductory statistics.
This section provides a classification of the learning resources used in this document, organized by the main statistical concepts discussed.
Sampling Distributions and Inference
Sampling Distribution and Central Limit Theorem (CLT): Core concepts of the sampling distribution and the conditions for applying the CLT are explained using references from HEI Publishing Indonesia (2024) and EcampusOntario.
Comparison of Probability Methods: Guidance on choosing between exact probability methods (Binomial) and approximation (Normal) is sourced from Dsciencelabs.