PROBABILITY DISTRIBUTIONS

WEEK 11 ASSIGNMENT

Frenkhy Tonga Retang

7 Introduction to Probability Distributions

Welcome to the Probability Distributions assignment for Week 11. This presentation covers fundamental concepts in probability distributions, including continuous random variables, sampling distributions, the Central Limit Theorem, and sample proportions.

In this assignment, we will explore how probability theory forms the foundation of statistical inference and decision-making in data science. Understanding these concepts is crucial for analyzing data, making predictions, and drawing meaningful conclusions from statistical analyses.

The topics covered in this module include:

  • Continuous random variables, probability density functions (PDF), and cumulative distribution functions (CDF)
  • Sampling distributions and the standard error
  • The Central Limit Theorem
  • Sample proportions and the normal approximation

7.1 Continuous Random Variables

Introduction to Continuous Random Variables

A continuous random variable is a variable that can take on any value within a specified range or interval. Unlike discrete random variables that take on countable values, continuous random variables can assume infinitely many values within their range. Examples include height, weight, temperature, time, and distance measurements.

The key characteristic of continuous random variables is that the probability of the variable taking on any single exact value is essentially zero. Instead, we calculate probabilities over intervals or ranges of values.

Key Differences: Discrete vs Continuous

  • Discrete: Countable outcomes (e.g., number of students: 10, 11, 12)
  • Continuous: Uncountable outcomes (e.g., height: 170.5 cm, 170.51 cm, 170.512 cm…)
  • Discrete: P(X = x) > 0 for specific values
  • Continuous: P(X = x) = 0 for any specific value
  • Discrete: Probability Mass Function (PMF)
  • Continuous: Probability Density Function (PDF)

7.1.1 Random Variable

Formal Definition

A random variable is a function that assigns a numerical value to each outcome in the sample space of a random experiment. For continuous random variables, these values form a continuous range.

Notation: Random variables are typically denoted by capital letters (X, Y, Z), while their specific values are denoted by lowercase letters (x, y, z).

Examples of Continuous Random Variables

Example 1: Temperature

Let X = temperature in Celsius in Cikarang. X can take any value in the range, say [20°C, 35°C]. Possible values: 25.3°C, 28.75°C, 31.256°C, etc.

Example 2: Student Heights

Let Y = height of university students in centimeters. Y might range from 150 cm to 195 cm, with values like 165.5 cm, 172.83 cm, etc.

Example 3: Service Time

Let T = time (in minutes) to serve a customer at a bank. T ≥ 0, with values like 2.5 minutes, 5.75 minutes, 10.123 minutes, etc.

7.1.2 Probability Density Function (PDF)

Definition of PDF

The Probability Density Function (PDF), denoted as f(x), is a function that describes the relative likelihood of a continuous random variable taking on a given value. The PDF must satisfy two conditions:

  1. Non-negativity: f(x) ≥ 0 for all x
  2. Total Area = 1: ∫_{-∞}^{∞} f(x) dx = 1

Important: The value f(x) itself is NOT a probability. It’s a density. Probabilities are calculated as areas under the PDF curve.

Properties of PDF:

1. f(x) ≥ 0 for all x
2. ∫_{-∞}^{∞} f(x) dx = 1
3. P(X = a) = 0 for any specific value a
4. P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Example: Uniform Distribution

Consider a uniform distribution on the interval [0, 10]. The PDF is:

f(x) = 1/10 for 0 ≤ x ≤ 10

f(x) = 0 otherwise

Verification:

∫_0^{10} (1/10) dx = (1/10) × 10 = 1 ✓

The total area under the curve equals 1, satisfying the PDF condition.
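
As a quick numerical check, the following minimal Python sketch (assuming scipy is available, as listed in the software resources at the end) integrates this PDF over its support:

    # Verify that the Uniform[0, 10] PDF integrates to 1 (illustrative sketch).
    from scipy.integrate import quad

    def f(x):
        # Density is the constant 1/10 on [0, 10] and 0 elsewhere.
        return 1/10 if 0 <= x <= 10 else 0.0

    total_area, _ = quad(f, 0, 10)  # integrate over the support
    print(total_area)               # ≈ 1.0, satisfying the PDF condition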

Example: Exponential Distribution

The exponential distribution with rate parameter λ = 0.5 has PDF:

f(x) = 0.5e^(-0.5x) for x ≥ 0

f(x) = 0 for x < 0

This is commonly used to model waiting times, such as the time until the next customer arrives.
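
A minimal sketch of this distribution using scipy.stats (note that scipy parameterizes the exponential by scale = 1/λ, so λ = 0.5 corresponds to scale = 2):

    # Exponential PDF with rate λ = 0.5 via scipy.stats.
    from scipy import stats

    X = stats.expon(scale=2)  # scale = 1/λ = 2
    print(X.pdf(0))           # 0.5: the density at x = 0 equals λ
    print(X.pdf(2))           # 0.5·e^(-1) ≈ 0.184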

7.1.3 Probability on an Interval

Calculating Probabilities Over Intervals

For continuous random variables, we calculate probabilities over intervals by finding the area under the PDF curve between two points. This is computed using integration:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Important properties:

  • P(a < X < b) = P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b)
  • The probability is the same whether endpoints are included or excluded
  • P(X = a) = 0 for any specific value a

Key Probability Formulas:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

P(X ≥ a) = ∫_a^{∞} f(x) dx = 1 - P(X < a)

P(X ≤ b) = ∫_{-∞}^{b} f(x) dx

P(X = a) = 0 (for any specific value)

Example: Uniform Distribution Probabilities

Given X ~ Uniform[0, 10], find various probabilities:

1. P(3 ≤ X ≤ 7):

P(3 ≤ X ≤ 7) = ∫_3^7 (1/10) dx = (1/10) × (7-3) = 4/10 = 0.4

2. P(X > 6):

P(X > 6) = ∫_6^{10} (1/10) dx = (1/10) × (10-6) = 4/10 = 0.4

3. P(X < 2.5):

P(X < 2.5) = ∫_0^{2.5} (1/10) dx = (1/10) × 2.5 = 0.25

4. P(X = 5):

P(X = 5) = 0 (probability of any exact value is zero)
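
All four results can be reproduced from the CDF of scipy.stats.uniform; a minimal sketch, assuming the library is available:

    # Interval probabilities for X ~ Uniform[0, 10].
    from scipy import stats

    X = stats.uniform(loc=0, scale=10)  # support is [loc, loc + scale]
    print(X.cdf(7) - X.cdf(3))          # P(3 ≤ X ≤ 7) = 0.4
    print(1 - X.cdf(6))                 # P(X > 6)     = 0.4
    print(X.cdf(2.5))                   # P(X < 2.5)   = 0.25
    # P(X = 5) is exactly 0: only intervals carry probability for continuous X.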

Example: Normal Distribution

Suppose X ~ Normal(μ = 100, σ = 15), representing IQ scores.

Find P(85 ≤ X ≤ 115):

This interval is within one standard deviation of the mean (100 ± 15).

Using the empirical rule or standard normal table:

P(85 ≤ X ≤ 115) ≈ 0.683 or 68.3%

This means approximately 68.3% of the population has IQ scores between 85 and 115.
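
A one-line check of this value with scipy.stats, as a sketch:

    # P(85 ≤ X ≤ 115) for X ~ Normal(μ = 100, σ = 15).
    from scipy import stats

    X = stats.norm(loc=100, scale=15)
    print(X.cdf(115) - X.cdf(85))  # ≈ 0.6827, matching the empirical rule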

7.1.4 Cumulative Distribution Function (CDF)

Definition of CDF

The Cumulative Distribution Function (CDF), denoted as F(x), gives the probability that the random variable X takes on a value less than or equal to x:

F(x) = P(X ≤ x) = ∫_{-∞}^{x} f(t) dt

The CDF is the integral (area) of the PDF from negative infinity up to x.

Properties of CDF:

1. F(x) is non-decreasing: if x₁ < x₂, then F(x₁) ≤ F(x₂)
2. 0 ≤ F(x) ≤ 1 for all x
3. lim_{x→-∞} F(x) = 0
4. lim_{x→∞} F(x) = 1
5. P(a < X ≤ b) = F(b) - F(a)
6. f(x) = dF(x)/dx (PDF is derivative of CDF)

Example: Uniform Distribution CDF

For X ~ Uniform[0, 10], the CDF is:

F(x) = 0 for x < 0

F(x) = x/10 for 0 ≤ x ≤ 10

F(x) = 1 for x > 10

Calculations:

F(5) = 5/10 = 0.5 → P(X ≤ 5) = 0.5

F(7.5) = 7.5/10 = 0.75 → P(X ≤ 7.5) = 0.75

P(3 < X ≤ 8) = F(8) - F(3) = 0.8 - 0.3 = 0.5

Example: Exponential Distribution CDF

For exponential distribution with λ = 0.5:

F(x) = 1 - e^(-0.5x) for x ≥ 0

F(x) = 0 for x < 0

Interpretation:

F(2) = 1 - e^(-1) ≈ 0.632 → 63.2% probability that waiting time is ≤ 2 units

P(X > 3) = 1 - F(3) = e^(-1.5) ≈ 0.223 → 22.3% probability of waiting more than 3 units
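
Both values can be verified with scipy.stats; a minimal sketch (again with scale = 1/λ = 2):

    # CDF and survival function of the Exponential(λ = 0.5) distribution.
    from scipy import stats

    X = stats.expon(scale=2)
    print(X.cdf(2))  # F(2) = 1 - e^(-1) ≈ 0.632
    print(X.sf(3))   # P(X > 3) = e^(-1.5) ≈ 0.223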

Relationship Between PDF and CDF

  • CDF is the integral of PDF: F(x) = ∫_{-∞}^{x} f(t) dt
  • PDF is the derivative of CDF: f(x) = dF(x)/dx
  • CDF is cumulative: It accumulates probability from left to right
  • PDF shows density: It shows where probability is concentrated
  • CDF ranges [0,1]: Always between 0 and 1
  • PDF can exceed 1: f(x) can be greater than 1 (it’s a density, not probability)
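
The derivative relationship can be checked numerically; a minimal sketch that applies a central difference to the exponential CDF from the earlier example:

    # Numerically verify f(x) = dF(x)/dx for the Exponential(λ = 0.5) example.
    from scipy import stats

    X = stats.expon(scale=2)  # scale = 1/λ
    x, h = 1.0, 1e-6
    deriv = (X.cdf(x + h) - X.cdf(x - h)) / (2 * h)  # central difference of the CDF
    print(deriv, X.pdf(x))                           # the two values agree closely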

7.2 Sampling Distributions

Introduction to Sampling Distributions

A sampling distribution is the probability distribution of a statistic (such as the sample mean, sample proportion, or sample variance) obtained through repeated sampling from a population. It describes how the statistic varies from sample to sample.

Understanding sampling distributions is crucial because:

  • We rarely have access to entire populations
  • We use sample statistics to estimate population parameters
  • We need to quantify uncertainty in our estimates
  • It forms the foundation for statistical inference

Key Terminology

Population Parameter: A numerical characteristic of the population (denoted by Greek letters: μ, σ, π)

Sample Statistic: A numerical characteristic of a sample (denoted by Roman letters: x̄, s, p̂)

Sampling Distribution: The distribution of a sample statistic over all possible samples of size n

Standard Error: The standard deviation of a sampling distribution

Example: Sampling Distribution of Sample Mean

Suppose we have a population of student heights with μ = 170 cm and σ = 10 cm.

We repeatedly take samples of n = 25 students and calculate x̄ for each sample.

Results after many samples:

  • Sample 1: x̄₁ = 168.5 cm
  • Sample 2: x̄₂ = 171.2 cm
  • Sample 3: x̄₃ = 169.8 cm
  • Sample 4: x̄₄ = 170.5 cm
  • … (continue for many samples)

The distribution of all these x̄ values is the sampling distribution of the mean.
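
A short simulation makes this concrete; the sketch below assumes numpy and, for simplicity, a normally distributed population of heights:

    # Simulate the sampling distribution of x̄ for μ = 170 cm, σ = 10 cm, n = 25.
    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.normal(170, 10, size=(10_000, 25))  # 10,000 samples of n = 25
    means = samples.mean(axis=1)                      # one x̄ per sample

    print(means.mean())  # ≈ 170, centered at μ
    print(means.std())   # ≈ 2.0, close to σ/√n = 10/√25 = 2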

Properties of Sampling Distribution of the Mean

For Sample Mean (x̄):

Mean of sampling distribution: μ_x̄ = μ
(unbiased estimator)

Standard Error: SE = σ_x̄ = σ/√n
(decreases as n increases)

Shape: Approaches normal as n increases
(Central Limit Theorem)

Important Observations

  • Center: The sampling distribution is centered at the population mean μ
  • Spread: The spread decreases as sample size increases (σ/√n)
  • Shape: Becomes more normal as n increases, regardless of population shape
  • Variability: Sample means vary less than individual observations

Example: Effect of Sample Size

Population: μ = 100, σ = 15

For n = 2:

SE = 15/√2 ≈ 10.6

Sample means vary widely

For n = 5:

SE = 15/√5 ≈ 6.7

Sample means less variable

For n = 10:

SE = 15/√10 ≈ 4.7

Sample means more concentrated

For n = 30:

SE = 15/√30 ≈ 2.7

Sample means tightly clustered around μ

For n = 100:

SE = 15/√100 = 1.5

Sample means very close to μ
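
A minimal sketch that reproduces these standard errors:

    # SE = σ/√n shrinks like 1/√n (population: μ = 100, σ = 15).
    import math

    for n in (2, 5, 10, 30, 100):
        print(n, 15 / math.sqrt(n))  # 10.6, 6.7, 4.7, 2.7, 1.5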

7.3 Central Limit Theorem

The Central Limit Theorem (CLT)

The Central Limit Theorem is one of the most important theorems in statistics. It states that:

For a random sample of size n drawn from any population with mean μ and standard deviation σ, as n becomes large, the sampling distribution of the sample mean x̄ approaches a normal distribution with mean μ and standard deviation σ/√n, regardless of the shape of the original population distribution.

Central Limit Theorem:

As n → ∞:
x̄ ~ N(μ, σ²/n)

Or equivalently:
Z = (x̄ - μ)/(σ/√n) ~ N(0, 1)

Rule of Thumb:
n ≥ 30 is generally sufficient for CLT to apply
For symmetric populations, smaller n may suffice
For highly skewed populations, larger n may be needed

Why CLT is Powerful

  • Universality: Works for ANY population distribution (normal, skewed, uniform, etc.)
  • Predictability: We can predict behavior of sample means even without knowing population distribution
  • Foundation for Inference: Enables hypothesis testing and confidence intervals
  • Normal Approximation: Allows use of normal distribution tables and methods
  • Quality Control: Basis for control charts and process monitoring

Visual Demonstration of CLT

Scenario: Population is highly skewed (exponential distribution)

n = 2: Sampling distribution still strongly right-skewed

n = 5: Beginning to smooth out

n = 10: Approaching normal shape

n = 30: Very close to normal

n = 100: Essentially indistinguishable from normal

This demonstrates CLT: as n increases, sampling distribution becomes increasingly normal regardless of population shape.
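
A simulation sketch of this demonstration, assuming numpy and scipy (sample skewness near 0 indicates an approximately normal shape):

    # CLT demo: sample means from a skewed exponential population.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    for n in (2, 5, 10, 30, 100):
        means = rng.exponential(scale=2, size=(10_000, n)).mean(axis=1)
        # Skewness of the sample means decays toward 0 as n grows.
        print(n, stats.skew(means))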

Example: Application of CLT

A factory produces bolts with mean length μ = 5 cm and standard deviation σ = 0.2 cm. A quality inspector takes a random sample of n = 36 bolts.

Question: What is the probability that the sample mean length is between 4.95 and 5.05 cm?

Solution:

By CLT, x̄ ~ N(5, 0.2²/36)

SE = σ/√n = 0.2/√36 = 0.2/6 ≈ 0.0333

Standardize:

Z₁ = (4.95 - 5)/0.0333 ≈ -1.50

Z₂ = (5.05 - 5)/0.0333 ≈ 1.50

P(-1.50 < Z < 1.50) ≈ 0.8664 or 86.64%

Interpretation: There’s an 86.64% chance the sample mean will fall between 4.95 and 5.05 cm.
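
A minimal sketch of the same calculation with scipy.stats:

    # P(4.95 ≤ x̄ ≤ 5.05) where, by the CLT, x̄ ~ N(5, (0.2/√36)²).
    import math
    from scipy import stats

    se = 0.2 / math.sqrt(36)                # ≈ 0.0333
    xbar = stats.norm(loc=5, scale=se)
    print(xbar.cdf(5.05) - xbar.cdf(4.95))  # ≈ 0.8664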

7.4 Sample Proportion

Definition of Sample Proportion

The sample proportion p̂ (read as “p-hat”) is the fraction of individuals in a sample that possess a certain characteristic of interest. If we have a sample of size n and X individuals have the characteristic, then:

p̂ = X/n

The sample proportion is used to estimate the population proportion π.

Sampling Distribution of p̂

Properties of Sample Proportion Distribution

When sampling from a population with proportion π:

  • Mean: E[p̂] = π (unbiased estimator)
  • Standard Deviation (Standard Error): σ_p̂ = √[π(1-π)/n]

Sampling Distribution of p̂:

Mean: μ_p̂ = π

Standard Error: SE = √[π(1-π)/n]

Normal Approximation (when conditions met):
p̂ ~ N(π, π(1-π)/n)

Z = (p̂ - π)/√[π(1-π)/n] ~ N(0, 1)

Conditions for Normal Approximation:
nπ ≥ 10 AND n(1-π) ≥ 10

Example: Political Survey

In a population, 60% support a certain policy (π = 0.6). A random sample of n = 100 people is taken.

Question 1: What is the probability that between 55% and 65% of the sample supports the policy?

Solution:

Check conditions: nπ = 100(0.6) = 60 ≥ 10 ✓

n(1-π) = 100(0.4) = 40 ≥ 10 ✓

SE = √[0.6(0.4)/100] = √0.0024 ≈ 0.049

Z₁ = (0.55 - 0.60)/0.049 ≈ -1.02

Z₂ = (0.65 - 0.60)/0.049 ≈ 1.02

P(-1.02 < Z < 1.02) ≈ 0.6922 or 69.22%
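
A minimal sketch of this normal-approximation calculation:

    # P(0.55 ≤ p̂ ≤ 0.65) for π = 0.6, n = 100.
    import math
    from scipy import stats

    pi, n = 0.6, 100
    se = math.sqrt(pi * (1 - pi) / n)         # ≈ 0.049
    p_hat = stats.norm(loc=pi, scale=se)
    print(p_hat.cdf(0.65) - p_hat.cdf(0.55))  # ≈ 0.69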

Example: Quality Control

A manufacturer claims defect rate is 5% (π = 0.05). Inspector samples 200 items.

Question: What’s the probability of finding more than 8% defects in the sample?

Solution:

Check conditions: nπ = 200(0.05) = 10 ≥ 10 ✓

n(1-π) = 200(0.95) = 190 ≥ 10 ✓

SE = √[0.05(0.95)/200] = √0.0002375 ≈ 0.0154

P(p̂ > 0.08) = P(Z > (0.08-0.05)/0.0154)

= P(Z > 1.95)

≈ 0.0256 or 2.56%

Interpretation: If true defect rate is 5%, there’s only 2.56% chance of finding 8% or more defects in a sample of 200. This would be unusual and might suggest the true defect rate is higher.
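
As a cross-check, the approximate tail probability can be compared with the exact binomial tail; a minimal sketch (sf gives the upper-tail probability in scipy.stats):

    # Normal approximation vs. exact binomial for P(p̂ > 0.08), n = 200, π = 0.05.
    import math
    from scipy import stats

    se = math.sqrt(0.05 * 0.95 / 200)         # ≈ 0.0154
    print(stats.norm.sf((0.08 - 0.05) / se))  # normal approximation ≈ 0.026
    print(stats.binom.sf(16, 200, 0.05))      # exact P(X ≥ 17), i.e. p̂ > 0.08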

7.5 Review of Sampling Distributions

Key Concepts Summary

  • Continuous Random Variables: Can take any value in an interval; probabilities calculated over ranges, not at specific points
  • PDF (Probability Density Function): Describes relative likelihood; area under curve = probability
  • CDF (Cumulative Distribution Function): P(X ≤ x); integral of PDF; always non-decreasing
  • Sampling Distribution: Distribution of a statistic over all possible samples
  • Central Limit Theorem: Sample means approach normal distribution as n increases, regardless of population shape
  • Standard Error: Standard deviation of sampling distribution; measures variability of statistic
  • Sample Proportion: Used to estimate population proportion; follows approximately normal distribution when conditions met
  • Essential Formulas:

    For Sample Mean:
    μ_x̄ = μ
    SE = σ/√n
    Z = (x̄ - μ)/(σ/√n)

    For Sample Proportion:
    μ_p̂ = π
    SE = √[π(1-π)/n]
    Z = (p̂ - π)/√[π(1-π)/n]

    Conditions:
    CLT: n ≥ 30 (general rule)
    Normal approx for p̂: nπ ≥ 10 AND n(1-π) ≥ 10

Common Mistakes to Avoid

  • Confusing PDF value with probability: f(x) is NOT a probability; it’s a density
  • Thinking P(X = a) > 0 for continuous variables: Always P(X = a) = 0
  • Using wrong standard error: For means use σ/√n; for proportions use √[π(1-π)/n]
  • Ignoring conditions: Check conditions before using normal approximation
  • Confusing σ and SE: σ is population SD; SE is SD of sampling distribution
  • Misapplying CLT: CLT applies to sample means, not individual observations

Comprehensive Example

A university wants to estimate average study hours per week. From past data: μ = 20 hours, σ = 5 hours.

Scenario 1: Random sample of n = 25 students

SE = 5/√25 = 1 hour

P(19 ≤ x̄ ≤ 21) = P(-1 ≤ Z ≤ 1) ≈ 0.68 or 68%

Scenario 2: Increase sample to n = 100

SE = 5/√100 = 0.5 hour

P(19 ≤ x̄ ≤ 21) = P(-2 ≤ Z ≤ 2) ≈ 0.95 or 95%

Observation: Larger sample size → smaller SE → more precise estimate → higher confidence in narrower range
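
A minimal sketch reproducing both scenarios:

    # Precision gain from a larger sample: P(19 ≤ x̄ ≤ 21) for μ = 20, σ = 5.
    import math
    from scipy import stats

    for n in (25, 100):
        se = 5 / math.sqrt(n)
        xbar = stats.norm(loc=20, scale=se)
        print(n, se, xbar.cdf(21) - xbar.cdf(19))  # n = 25 → ≈ 0.68; n = 100 → ≈ 0.95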

References

Primary Sources

1. Walpole, R. E., Myers, R. H., Myers, S. L., & Ye, K. (2016). Probability & Statistics for Engineers & Scientists (9th ed.). Pearson Education.
2. Ross, S. M. (2014). Introduction to Probability and Statistics for Engineers and Scientists (5th ed.). Academic Press.
3. Montgomery, D. C., & Runger, G. C. (2018). Applied Statistics and Probability for Engineers (7th ed.). John Wiley & Sons.
4. Devore, J. L. (2015). Probability and Statistics for Engineering and the Sciences (9th ed.). Cengage Learning.

Additional Resources

1. Course Lecture Notes: Essential of Probability - Week 11, Universitas Pelita Harapan, 2024
2. Online Resources:
  • Khan Academy - Probability and Statistics
  • StatQuest with Josh Starmer - YouTube Channel
  • MIT OpenCourseWare - Probability and Statistics
3. Statistical Software Documentation:
  • R Documentation (stats package)
  • Python SciPy.stats documentation

Note on Data and Examples: All numerical examples and scenarios presented in this document are for educational purposes. While based on realistic situations, specific values may be hypothetical to illustrate statistical concepts clearly.

Learning Outcomes

Upon completing this module, students should be able to:

  • Distinguish between discrete and continuous random variables
  • Understand and work with probability density functions (PDF) and cumulative distribution functions (CDF)
  • Calculate probabilities for continuous distributions
  • Explain the concept of sampling distributions
  • Apply the Central Limit Theorem to real-world problems
  • Work with sampling distributions of proportions
  • Understand the relationship between sample size and precision of estimates
  • Recognize when to apply normal approximations

Final Remarks

  • Understanding sampling distributions is fundamental to statistical inference
  • The Central Limit Theorem bridges probability theory and statistical practice
  • Larger sample sizes generally lead to more precise estimates
  • Always verify conditions before applying approximations
  • Visual representations (graphs, charts) aid in understanding distributions
  • Practice with diverse examples solidifies conceptual understanding
  • These concepts form the foundation for hypothesis testing and confidence intervals (covered in future modules)