PROBABILITY DISTRIBUTIONS
WEEK 11 ASSIGNMENT
7 Introduction to Probability Distributions
Welcome to the Probability Distributions assignment for Week 11. This presentation covers fundamental concepts in probability distributions, including continuous random variables, sampling distributions, the Central Limit Theorem, and sample proportions.
In this assignment, we will explore how probability theory forms the foundation of statistical inference and decision-making in data science. Understanding these concepts is crucial for analyzing data, making predictions, and drawing meaningful conclusions from statistical analyses.
The topics covered in this module include:
- Understanding continuous random variables and their properties
- Working with probability density functions (PDF) and cumulative distribution functions (CDF)
- Analyzing sampling distributions and their characteristics
- Applying the Central Limit Theorem to real-world problems
- Estimating population proportions using sample data
7.1 Continuous Random Variables
Introduction to Continuous Random Variables
A continuous random variable is a variable that can take on any value within a specified range or interval. Unlike discrete random variables that take on countable values, continuous random variables can assume infinitely many values within their range. Examples include height, weight, temperature, time, and distance measurements.
The key characteristic of continuous random variables is that the probability of the variable taking on any single exact value is essentially zero. Instead, we calculate probabilities over intervals or ranges of values.
Key Differences: Discrete vs Continuous
- Discrete: Countable outcomes (e.g., number of students: 10, 11, 12)
- Continuous: Uncountable outcomes (e.g., height: 170.5 cm, 170.51 cm, 170.512 cm…)
- Discrete: P(X = x) > 0 for specific values
- Continuous: P(X = x) = 0 for any specific value
- Discrete: Probability Mass Function (PMF)
- Continuous: Probability Density Function (PDF)
7.1.1 Random Variable
Formal Definition
A random variable is a function that assigns a numerical value to each outcome in the sample space of a random experiment. For continuous random variables, these values form a continuous range.
Notation: Random variables are typically denoted by capital letters (X, Y, Z), while their specific values are denoted by lowercase letters (x, y, z).
Example 1: Temperature
Let X = temperature in Celsius in Cikarang. X can take any value within a range, say [20°C, 35°C]. Possible values: 25.3°C, 28.75°C, 31.256°C, etc.
Example 2: Student Heights
Let Y = height of university students in centimeters. Y might range from 150 cm to 195 cm, with values like 165.5 cm, 172.83 cm, etc.
Example 3: Service Time
Let T = time (in minutes) to serve a customer at a bank. T ≥ 0, with values like 2.5 minutes, 5.75 minutes, 10.123 minutes, etc.
7.1.2 Probability Density Function (PDF)
Definition of PDF
The Probability Density Function (PDF), denoted as f(x), is a function that describes the relative likelihood of a continuous random variable taking on a given value. The PDF must satisfy two conditions:
- Non-negativity: f(x) ≥ 0 for all x
- Total area = 1: ∫_{-∞}^{∞} f(x) dx = 1
Important: The value f(x) itself is NOT a probability. It’s a density. Probabilities are calculated as areas under the PDF curve.
Properties of PDF:
1. f(x) ≥ 0 for all x
2. ∫_{-∞}^{∞} f(x) dx = 1
3. P(X = a) = 0 for any specific value a
4. P(a ≤ X ≤ b) = ∫_a^b f(x) dx
Consider a uniform distribution on the interval [0, 10]. The PDF is:
f(x) = 1/10 for 0 ≤ x ≤ 10
f(x) = 0 otherwise
Verification:
∫_0^{10} (1/10) dx = (1/10) × 10 = 1 ✓
The total area under the curve equals 1, satisfying the PDF condition.
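This check can also be done numerically. Below is a minimal Python sketch (not part of the original material) that integrates the Uniform[0, 10] density with scipy.integrate.quad; the function name f is just illustrative.

```python
from scipy.integrate import quad

# PDF of the Uniform[0, 10] distribution from the example above
def f(x):
    return 1 / 10 if 0 <= x <= 10 else 0.0

# Integrate the density over its support; the area should come out to 1
area, abs_error = quad(f, 0, 10)
print(area)  # -> 1.0
```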
The exponential distribution with rate parameter λ = 0.5 has PDF:
f(x) = 0.5e^{-0.5x} for x ≥ 0
f(x) = 0 for x < 0
This distribution is commonly used to model waiting times, such as the time until the next customer arrives.
7.1.3 Probability on an Interval
Calculating Probabilities Over Intervals
For continuous random variables, we calculate probabilities over intervals by finding the area under the PDF curve between two points. This is computed using integration:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx
Important properties:
- P(a < X < b) = P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b)
- The probability is the same whether endpoints are included or excluded
- P(X = a) = 0 for any specific value a
Key Probability Formulas:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx
P(X ≥ a) = ∫_a^∞ f(x) dx = 1 - P(X < a)
P(X ≤ b) = ∫_{-∞}^b f(x) dx
P(X = a) = 0 (for any specific value)
Given X ~ Uniform[0, 10], find various probabilities:
1. P(3 ≤ X ≤ 7):
P(3 ≤ X ≤ 7) = ∫_3^7 (1/10) dx = (1/10) × (7-3) = 4/10 = 0.4
2. P(X > 6):
P(X > 6) = ∫_6^{10} (1/10) dx = (1/10) × (10-6) = 4/10 = 0.4
3. P(X < 2.5):
P(X < 2.5) = ∫_0^{2.5} (1/10) dx = (1/10) × 2.5 = 0.25
4. P(X = 5):
P(X = 5) = 0 (probability of any exact value is zero)
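For readers who prefer to verify these results in software, here is a small sketch using scipy.stats.uniform, which parameterizes Uniform[a, b] with loc = a and scale = b - a:

```python
from scipy.stats import uniform

# X ~ Uniform[0, 10]; SciPy's uniform uses loc = a and scale = b - a
X = uniform(loc=0, scale=10)

print(X.cdf(7) - X.cdf(3))  # P(3 <= X <= 7) -> 0.4
print(X.sf(6))              # P(X > 6)       -> 0.4  (sf = 1 - cdf)
print(X.cdf(2.5))           # P(X < 2.5)     -> 0.25
# P(X = 5) = 0: X.pdf(5) returns the density 0.1, which is not a probability
```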
Suppose X ~ Normal(μ = 100, σ = 15), representing IQ scores.
Find P(85 ≤ X ≤ 115):
This interval is within one standard deviation of the mean (100 ± 15).
Using the empirical rule or standard normal table:
P(85 ≤ X ≤ 115) ≈ 0.683 or 68.3%
This means approximately 68.3% of the population has IQ scores between 85 and 115.
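The same probability can be computed directly from the normal CDF rather than a table; a one-line check with scipy.stats.norm (SciPy's scale argument is the standard deviation):

```python
from scipy.stats import norm

# P(85 <= X <= 115) for X ~ Normal(mu=100, sigma=15)
p = norm.cdf(115, loc=100, scale=15) - norm.cdf(85, loc=100, scale=15)
print(round(p, 4))  # -> 0.6827, consistent with the empirical rule
```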
7.1.4 Cumulative Distribution Function (CDF)
Definition of CDF
The Cumulative Distribution Function (CDF), denoted as F(x), gives the probability that the random variable X takes on a value less than or equal to x:
F(x) = P(X ≤ x) = ∫_{-∞}^x f(t) dt
The CDF is the integral (area) of the PDF from negative infinity up to x.
Properties of CDF:
1. F(x) is non-decreasing: if x₁ < x₂, then F(x₁) ≤ F(x₂)
2. 0 ≤ F(x) ≤ 1 for all x
3. lim_{x→-∞} F(x) = 0
4. lim_{x→∞} F(x) = 1
5. P(a < X ≤ b) = F(b) - F(a)
6. f(x) = dF(x)/dx (the PDF is the derivative of the CDF)
For X ~ Uniform[0, 10], the CDF is:
F(x) = 0 for x < 0
F(x) = x/10 for 0 ≤ x ≤ 10
F(x) = 1 for x > 10
Calculations:
F(5) = 5/10 = 0.5 → P(X ≤ 5) = 0.5
F(7.5) = 7.5/10 = 0.75 → P(X ≤ 7.5) = 0.75
P(3 < X ≤ 8) = F(8) - F(3) = 0.8 - 0.3 = 0.5
For exponential distribution with λ = 0.5:
F(x) = 1 - e^{-0.5x} for x ≥ 0
F(x) = 0 for x < 0
Interpretation:
F(2) = 1 - e^{-1} ≈ 0.632 → 63.2% probability that the waiting time is ≤ 2 units
P(X > 3) = 1 - F(3) = e^{-1.5} ≈ 0.223 → 22.3% probability of waiting more than 3 units
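These CDF values are easy to reproduce with scipy.stats.expon; note that SciPy parameterizes the exponential by scale = 1/λ rather than by the rate itself.

```python
from scipy.stats import expon

# Exponential with rate lambda = 0.5, i.e. scale = 1/0.5 = 2
T = expon(scale=2)

print(T.cdf(2))  # F(2) = 1 - e^(-1)    -> ~0.632
print(T.sf(3))   # P(X > 3) = e^(-1.5)  -> ~0.223
```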
Relationship Between PDF and CDF
- CDF is the integral of PDF: F(x) = ∫_{-∞}^x f(t) dt
- PDF is the derivative of CDF: f(x) = dF(x)/dx
- CDF is cumulative: It accumulates probability from left to right
- PDF shows density: It shows where probability is concentrated
- CDF ranges [0,1]: Always between 0 and 1
- PDF can exceed 1: f(x) can be greater than 1 (it’s a density, not a probability)
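The derivative relationship in the list above can be checked numerically; the sketch below compares a central-difference derivative of the exponential CDF against its PDF at an arbitrarily chosen point (x = 1.7 is illustrative, not from the original).

```python
from scipy.stats import expon

T = expon(scale=2)   # the exponential example with lambda = 0.5
x, h = 1.7, 1e-6

# Central-difference approximation of dF/dx, which should match f(x)
dF_dx = (T.cdf(x + h) - T.cdf(x - h)) / (2 * h)
print(dF_dx, T.pdf(x))  # the two values agree to several decimal places
```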
7.2 Sampling Distributions
Introduction to Sampling Distributions
A sampling distribution is the probability distribution of a statistic (such as the sample mean, sample proportion, or sample variance) obtained through repeated sampling from a population. It describes how the statistic varies from sample to sample.
Understanding sampling distributions is crucial because:
- We rarely have access to entire populations
- We use sample statistics to estimate population parameters
- We need to quantify uncertainty in our estimates
- It forms the foundation for statistical inference
Key Terminology
Population Parameter: A numerical characteristic of the population (denoted by Greek letters: μ, σ, π)
Sample Statistic: A numerical characteristic of a sample (denoted by Roman letters: x̄, s, p̂)
Sampling Distribution: The distribution of a sample statistic over all possible samples of size n
Standard Error: The standard deviation of a sampling distribution
Suppose we have a population of student heights with μ = 170 cm and σ = 10 cm.
We repeatedly take samples of n = 25 students and calculate x̄ for each sample.
Results after many samples:
- Sample 1: x̄₁ = 168.5 cm
- Sample 2: x̄₂ = 171.2 cm
- Sample 3: x̄₃ = 169.8 cm
- Sample 4: x̄₄ = 170.5 cm
- … (continue for many samples)
The distribution of all these x̄ values is the sampling distribution of the mean.
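This repeated-sampling idea is easy to simulate. The sketch below is illustrative: it assumes a normally distributed population of heights (the original does not state the population shape), and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)      # arbitrary seed for reproducibility
mu, sigma, n = 170, 10, 25

# 10,000 samples of size 25; one sample mean per row
sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

print(sample_means.mean())  # -> ~170, the population mean
print(sample_means.std())   # -> ~2.0, i.e. sigma/sqrt(n) = 10/5
```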
Properties of Sampling Distribution of the Mean
For the sample mean x̄:
- Mean of the sampling distribution: μx̄ = μ (unbiased estimator)
- Standard error: SE = σx̄ = σ/√n (decreases as n increases)
- Shape: approaches the normal distribution as n increases (Central Limit Theorem)
Important Observations
- Center: The sampling distribution is centered at the population mean μ
- Spread: The spread decreases as sample size increases (σ/√n)
- Shape: Becomes more normal as n increases, regardless of population shape
- Variability: Sample means vary less than individual observations
Population: μ = 100, σ = 15
- n = 2: SE = 15/√2 ≈ 10.6 → sample means vary widely
- n = 5: SE = 15/√5 ≈ 6.7 → sample means are less variable
- n = 10: SE = 15/√10 ≈ 4.7 → sample means are more concentrated
- n = 30: SE = 15/√30 ≈ 2.7 → sample means cluster tightly around μ
- n = 100: SE = 15/√100 = 1.5 → sample means are very close to μ
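The standard errors listed above follow directly from SE = σ/√n; a short loop reproduces them:

```python
import math

sigma = 15
for n in (2, 5, 10, 30, 100):
    # SE shrinks like 1/sqrt(n): quadrupling n halves the standard error
    print(n, round(sigma / math.sqrt(n), 1))
```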
7.3 Central Limit Theorem
The Central Limit Theorem (CLT)
The Central Limit Theorem is one of the most important theorems in statistics. It states that:
For a random sample of size n drawn from any population with mean μ and standard deviation σ, as n becomes large, the sampling distribution of the sample mean x̄ approaches a normal distribution with mean μ and standard deviation σ/√n, regardless of the shape of the original population distribution.
Central Limit Theorem:
As n → ∞: x̄ ~ N(μ, σ²/n)
Or equivalently: Z = (x̄ - μ)/(σ/√n) ~ N(0, 1)
Rule of thumb:
- n ≥ 30 is generally sufficient for the CLT to apply
- For symmetric populations, smaller n may suffice
- For highly skewed populations, larger n may be needed
Why CLT is Powerful
- Universality: Works for ANY population distribution (normal, skewed, uniform, etc.)
- Predictability: We can predict behavior of sample means even without knowing population distribution
- Foundation for Inference: Enables hypothesis testing and confidence intervals
- Normal Approximation: Allows use of normal distribution tables and methods
- Quality Control: Basis for control charts and process monitoring
Scenario: Population is highly skewed (exponential distribution)
n = 2: Sampling distribution still strongly skewed
n = 5: Beginning to smooth out
n = 10: Approaching normal shape
n = 30: Very close to normal
n = 100: Essentially indistinguishable from normal
This demonstrates the CLT: as n increases, the sampling distribution becomes increasingly normal regardless of the population's shape.
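A simulation makes this concrete. The sketch below (illustrative; seed and sample counts are arbitrary choices) draws sample means from an exponential population with mean 2 and reports the skewness of the sampling distribution, which should fall toward 0 as n grows.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed

# Heavily skewed population: exponential with mean 2 (rate lambda = 0.5)
for n in (2, 5, 10, 30, 100):
    means = rng.exponential(scale=2, size=(20_000, n)).mean(axis=1)
    skew = np.mean((means - means.mean()) ** 3) / means.std() ** 3
    print(n, round(skew, 2))  # skewness of x-bar is roughly 2/sqrt(n)
```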
A factory produces bolts with mean length μ = 5 cm and standard deviation σ = 0.2 cm. A quality inspector takes a random sample of n = 36 bolts.
Question: What is the probability that the sample mean length is between 4.95 and 5.05 cm?
Solution:
By CLT, x̄ ~ N(5, 0.2²/36)
SE = σ/√n = 0.2/√36 = 0.2/6 ≈ 0.0333
Standardize:
Z₁ = (4.95 - 5)/0.0333 ≈ -1.50
Z₂ = (5.05 - 5)/0.0333 ≈ 1.50
P(-1.50 < Z < 1.50) ≈ 0.8664 or 86.64%
Interpretation: There’s an 86.64% chance the sample mean will fall between 4.95 and 5.05 cm.
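The hand calculation above can be verified directly; a minimal sketch using scipy.stats.norm:

```python
from scipy.stats import norm

mu, sigma, n = 5, 0.2, 36
se = sigma / n ** 0.5  # 0.0333...

# P(4.95 <= x-bar <= 5.05) with x-bar ~ N(mu, se^2), by the CLT
p = norm.cdf(5.05, loc=mu, scale=se) - norm.cdf(4.95, loc=mu, scale=se)
print(round(p, 4))  # -> ~0.8664
```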
7.4 Sample Proportion
Definition of Sample Proportion
The sample proportion p̂ (read as “p-hat”) is the fraction of individuals in a sample that possess a certain characteristic of interest. If we have a sample of size n and X individuals have the characteristic, then:
p̂ = X/n
The sample proportion is used to estimate the population proportion π.
Sampling Distribution of p̂
Properties of Sample Proportion Distribution
When sampling from a population with proportion π:
- Mean: E[p̂] = π (unbiased estimator)
- Standard Deviation (Standard Error): σp̂ = √[π(1-π)/n]
Sampling Distribution of p̂:
Mean: μp̂ = π
Standard error: SEp̂ = √[π(1-π)/n]
Normal approximation (when conditions are met):
p̂ ~ N(π, π(1-π)/n)
Z = (p̂ - π)/√[π(1-π)/n] ~ N(0, 1)
Conditions for the normal approximation:
nπ ≥ 10 AND n(1-π) ≥ 10
In a population, 60% support a certain policy (π = 0.6). A random sample of n = 100 people is taken.
Question 1: What is the probability that between 55% and 65% of the sample supports the policy?
Solution:
Check conditions: nπ = 100(0.6) = 60 ≥ 10 ✓
n(1-π) = 100(0.4) = 40 ≥ 10 ✓
SE = √[0.6(0.4)/100] = √0.0024 ≈ 0.049
Z₁ = (0.55 - 0.60)/0.049 ≈ -1.02
Z₂ = (0.65 - 0.60)/0.049 ≈ 1.02
P(-1.02 < Z < 1.02) ≈ 0.6922 or 69.22%
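A quick software check of this normal-approximation calculation (small differences from the table-based answer come from rounding the z-scores):

```python
from scipy.stats import norm

pi, n = 0.6, 100
se = (pi * (1 - pi) / n) ** 0.5  # -> ~0.049

# P(0.55 <= p-hat <= 0.65) under the normal approximation
p = norm.cdf(0.65, loc=pi, scale=se) - norm.cdf(0.55, loc=pi, scale=se)
print(round(p, 4))  # -> ~0.69, matching the hand calculation
```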
A manufacturer claims a defect rate of 5% (π = 0.05). An inspector samples 200 items.
Question: What’s the probability of finding more than 8% defects in the sample?
Solution:
Check conditions: nπ = 200(0.05) = 10 ≥ 10 ✓
n(1-π) = 200(0.95) = 190 ≥ 10 ✓
SE = √[0.05(0.95)/200] = √0.0002375 ≈ 0.0154
P(p̂ > 0.08) = P(Z > (0.08-0.05)/0.0154)
= P(Z > 1.95)
≈ 0.0256 or 2.56%
Interpretation: If the true defect rate is 5%, there is only a 2.56% chance of finding 8% or more defects in a sample of 200. This would be unusual and might suggest the true defect rate is higher.
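Since the number of defects in the sample is a binomial count, the approximation can also be compared against the exact binomial tail; p̂ > 0.08 with n = 200 corresponds to 17 or more defects. A minimal sketch:

```python
from scipy.stats import binom, norm

n, pi = 200, 0.05
se = (pi * (1 - pi) / n) ** 0.5

# Normal approximation used above: P(p-hat > 0.08)
print(norm.sf(0.08, loc=pi, scale=se))  # -> ~0.026

# Exact binomial tail: P(X >= 17) = 1 - P(X <= 16)
print(binom.sf(16, n, pi))              # exact value, similar magnitude
```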
7.5 Review of Sampling Distributions
Key Concepts Summary
Essential Formulas:
For the sample mean:
- μx̄ = μ
- SEx̄ = σ/√n
- Z = (x̄ - μ)/(σ/√n)
For the sample proportion:
- μp̂ = π
- SEp̂ = √[π(1-π)/n]
- Z = (p̂ - π)/√[π(1-π)/n]
Conditions:
- CLT: n ≥ 30 (general rule)
- Normal approximation for p̂: nπ ≥ 10 AND n(1-π) ≥ 10
Common Mistakes to Avoid
- Confusing PDF value with probability: f(x) is NOT a probability; it’s a density
- Thinking P(X = a) > 0 for continuous variables: Always P(X = a) = 0
- Using wrong standard error: For means use σ/√n; for proportions use √[π(1-π)/n]
- Ignoring conditions: Check conditions before using normal approximation
- Confusing σ and SE: σ is population SD; SE is SD of sampling distribution
- Misapplying CLT: CLT applies to sample means, not individual observations
A university wants to estimate average study hours per week. From past data: μ = 20 hours, σ = 5 hours.
Scenario 1: Random sample of n = 25 students
SE = 5/√25 = 1 hour
P(19 ≤ x̄ ≤ 21) = P(-1 ≤ Z ≤ 1) ≈ 0.68 or 68%
Scenario 2: Increase sample to n = 100
SE = 5/√100 = 0.5 hour
P(19 ≤ x̄ ≤ 21) = P(-2 ≤ Z ≤ 2) ≈ 0.95 or 95%
Observation: Larger sample size → smaller SE → more precise estimate → higher confidence in narrower range
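Both scenarios can be reproduced in a few lines (a sketch with scipy.stats.norm):

```python
from scipy.stats import norm

mu, sigma = 20, 5
for n in (25, 100):
    se = sigma / n ** 0.5
    # P(19 <= x-bar <= 21) under x-bar ~ N(mu, se^2)
    p = norm.cdf(21, loc=mu, scale=se) - norm.cdf(19, loc=mu, scale=se)
    print(n, round(p, 2))  # n=25 -> 0.68, n=100 -> 0.95
```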
References
Primary Sources
- Walpole, R. E., Myers, R. H., Myers, S. L., & Ye, K. (2016). Probability & Statistics for Engineers & Scientists (9th ed.). Pearson Education.
- Ross, S. M. (2014). Introduction to Probability and Statistics for Engineers and Scientists (5th ed.). Academic Press.
- Montgomery, D. C., & Runger, G. C. (2018). Applied Statistics and Probability for Engineers (7th ed.). John Wiley & Sons.
- Devore, J. L. (2015). Probability and Statistics for Engineering and the Sciences (9th ed.). Cengage Learning.
Additional Resources
- Course Lecture Notes: Essential of Probability - Week 11, Universitas Pelita Harapan, 2024
- Online Resources:
  - Khan Academy - Probability and Statistics
  - StatQuest with Josh Starmer - YouTube Channel
  - MIT OpenCourseWare - Probability and Statistics
- Statistical Software Documentation:
  - R Documentation (stats package)
  - Python SciPy.stats documentation
Note on Data and Examples: All numerical examples and scenarios presented in this document are for educational purposes. While based on realistic situations, specific values may be hypothetical to illustrate statistical concepts clearly.
Learning Outcomes
Upon completing this module, students should be able to:
- Distinguish between discrete and continuous random variables
- Understand and work with probability density functions (PDF) and cumulative distribution functions (CDF)
- Calculate probabilities for continuous distributions
- Explain the concept of sampling distributions
- Apply the Central Limit Theorem to real-world problems
- Work with sampling distributions of proportions
- Understand the relationship between sample size and precision of estimates
- Recognize when to apply normal approximations