Week 11 Assignment ~ Probability Distribution

Name: Adinda Adelia Futri

Student ID (NIM): 52250055


Student Profile

🎓 Student Majoring in Data Science at ITSB
📊 Data Science
📈 Statistics
💻 R Programming

Introduction to Probability Distributions of Continuous Variables

▶️ Watch: Introduction to Probability Distributions
Before exploring specific continuous probability distributions, it is important to understand the fundamental differences between discrete and continuous variables. Unlike discrete variables that take specific, countable values, continuous variables can assume any value within a given range or interval. This distinction fundamentally changes how we calculate and interpret probabilities. In continuous distributions, we focus on the probability of a variable falling within a certain range rather than taking exact values.

Comparison of Discrete and Continuous Distributions

[Figure panels: Discrete Distribution; Continuous Distribution]


Probability Density Function (PDF)

Probability Density Function (PDF):
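The Probability Density Function describes the relative likelihood of a continuous random variable taking values near a given point. The value \(f(x)\) itself is not a probability (it can even exceed 1); probabilities are areas under \(f\) over an interval.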

Mathematical Definition: \[ f(x)= \frac{d}{dx} F(x) \]
where: \(f(x)\) is the probability density function
\(F(x)\) is the cumulative distribution function
\(x\) is the value of the continuous random variable

Properties of PDF:
\(f(x) \geq 0\) for all \(x\) (Non-negativity)
\(\int_{-\infty}^{\infty} f(x) dx = 1\) (Normalization)
\(P(a \leq X \leq b) = \int_{a}^{b} f(x) dx\) (Area under curve)
Example (normal distribution with mean \(\mu\) and standard deviation \(\sigma\)): \(f(x)=\frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\)
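These properties can be checked numerically. The sketch below is a minimal example using base R's `dnorm()`, `pnorm()` and `integrate()`, with arbitrary illustrative parameters μ = 120 and σ = 15 (an assumption, not values from the text above). It compares the normal formula with `dnorm()`, approximates \(\frac{d}{dx}F(x)\) by a central difference of the CDF, and integrates the PDF.

```r
# Illustrative normal parameters (arbitrary choice)
mu <- 120
sigma <- 15

# Normal PDF typed in directly from the formula above
f <- function(x) 1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((x - mu) / sigma)^2)

x_grid <- seq(60, 180, by = 5)

# 1. The formula matches R's built-in density dnorm()
max_formula_gap <- max(abs(f(x_grid) - dnorm(x_grid, mean = mu, sd = sigma)))

# 2. f(x) = dF/dx: a central difference of the CDF pnorm() recovers the PDF
h <- 1e-4
cdf_slope <- (pnorm(x_grid + h, mu, sigma) - pnorm(x_grid - h, mu, sigma)) / (2 * h)
max_slope_gap <- max(abs(cdf_slope - f(x_grid)))

# 3. Normalization: integrate over mu +/- 10 sigma (the tails beyond are negligible)
total_area <- integrate(f, lower = mu - 10 * sigma, upper = mu + 10 * sigma,
                        rel.tol = 1e-8)$value

# 4. P(a <= X <= b) is the area under f between a and b
area_100_140 <- integrate(f, lower = 100, upper = 140, rel.tol = 1e-8)$value

cat("max |formula - dnorm|:", max_formula_gap, "\n")
cat("max |dF/dx - f|:      ", max_slope_gap, "\n")
cat("total area:           ", total_area, "\n")
cat("P(100 <= X <= 140):   ", round(area_100_140, 4), "\n")
```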

Introduction to Random Variables and Probability Distributions

What are Random Variables?
Random variables are functions that assign numerical values to outcomes of random experiments. They provide a mathematical framework for quantifying uncertainty and randomness in statistical analysis.

🔢 Discrete Random Variables

  • Definition: Variables with countable outcomes (finite or countably infinite)
  • Examples: Number of heads in coin tosses, dice roll results
  • Probability Function: Probability Mass Function (PMF)
  • Key Properties: Sum of probabilities equals 1
  • R Functions: dbinom(), dpois(), dgeom()
PMF Formula: \[ P(X = x_i) = p_i, \quad p_i \ge 0 \] Properties: \[ \sum_{i} p_i = 1 \]

📈 Continuous Random Variables

  • Definition: Variables that can take any value in an interval (uncountably many outcomes)
  • Examples: Height, weight, temperature, time
  • Probability Function: Probability Density Function (PDF)
  • Key Properties: Area under curve equals 1
  • R Functions: dnorm(), dexp(), dunif()
PDF Conditions: \[ f(x) \geq 0 \] Properties: \[ \int_{-\infty}^{\infty} f(x) \, dx = 1 \]

📊 Cumulative Distribution Function

  • Definition: Probability that X ≤ x
  • Universal: Works for both discrete and continuous
  • Properties: Non-decreasing, right-continuous
  • Range: 0 ≤ F(x) ≤ 1
  • R Functions: pbinom(), pnorm(), ppois()
CDF Formula: \[ F(x) = P(X \leq x) \] Discrete: \[ F(x) = \sum_{x_i \leq x} P(X = x_i) \] Continuous: \[ F(x) = \int_{-\infty}^{x} f(t) dt \]

🎯 Expectation and Variance

  • Expected Value (μ): Average/mean value
  • Variance (σ²): Measure of spread/dispersion
  • Standard Deviation: Square root of variance
  • Linearity: E[aX + b] = aE[X] + b
  • R Calculations: mean(), var(), sd()
Expectation: \[ E[X] = \sum_i x_i p_i \quad (\text{discrete}) \] \[ E[X] = \int_{-\infty}^{\infty} x f(x) \, dx \quad (\text{continuous}) \] Variance: \[ \operatorname{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2 \]
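The expectation and variance formulas can be evaluated directly in R. A minimal sketch, assuming two arbitrary example distributions (Binomial(10, 0.3) for the discrete case, Exponential with rate 2 for the continuous case): the sum uses `dbinom()`, the integrals use `integrate()`, and `mean()`, `var()` and `sd()` give the sample analogues from simulated draws.

```r
# Discrete case: X ~ Binomial(n = 10, p = 0.3)
x <- 0:10
p_x <- dbinom(x, size = 10, prob = 0.3)
E_disc <- sum(x * p_x)                   # E[X] = sum of x_i * p_i   (theory: np = 3)
Var_disc <- sum(x^2 * p_x) - E_disc^2    # E[X^2] - (E[X])^2        (theory: np(1-p) = 2.1)

# Continuous case: X ~ Exponential(rate = 2), support [0, Inf)
E_cont <- integrate(function(t) t * dexp(t, rate = 2), 0, Inf)$value      # theory: 1/2
E2_cont <- integrate(function(t) t^2 * dexp(t, rate = 2), 0, Inf)$value   # E[X^2]
Var_cont <- E2_cont - E_cont^2                                            # theory: 1/4

# Linearity of expectation: E[aX + b] = aE[X] + b
a <- 3
b <- 5
E_linear <- sum((a * x + b) * p_x)

# Sample analogues from simulated data
set.seed(1)
draws <- rexp(1e5, rate = 2)
sample_mean <- mean(draws)
sample_var <- var(draws)
sample_sd <- sd(draws)

cat("Binomial(10, 0.3): E =", E_disc, " Var =", round(Var_disc, 4), "\n")
cat("Exponential(2):    E =", round(E_cont, 4), " Var =", round(Var_cont, 4), "\n")
cat("E[3X + 5] =", E_linear, " vs 3E[X] + 5 =", a * E_disc + b, "\n")
cat("Simulated: mean =", round(sample_mean, 4), " var =", round(sample_var, 4),
    " sd =", round(sample_sd, 4), "\n")
```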

Detailed Comparison: Discrete vs Continuous

Characteristic | Discrete Random Variable | Continuous Random Variable
Possible Values | Countable (finite or countably infinite) | Uncountable (any value within an interval)
Probability at a Point | P(X = x) can be positive | P(X = x) = 0 for all x
Probability Function | Probability Mass Function (PMF) | Probability Density Function (PDF)
Total Probability | ∑ P(X = x) = 1 | ∫ f(x) dx = 1 (over the whole real line)
Cumulative Distribution | F(x) = ∑ P(X = x_i) over all x_i ≤ x | F(x) = ∫ f(t) dt from -∞ to x
Example Distributions | Binomial, Poisson, Geometric | Normal, Exponential, Uniform
Real-world Examples | Number of customers, defect counts | Height, temperature, waiting time

R Implementation of Random Variables
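A minimal sketch of how R's distribution functions map onto the ideas above. R follows a `d`/`p`/`q`/`r` naming pattern (PMF or PDF, CDF, quantile, random draws); the distributions and parameters below, including the height parameters, are illustrative assumptions. Note how a single value has positive probability in the discrete case, while for a continuous variable the probability of a shrinking interval around a point goes to 0.

```r
# Discrete: number of heads in 10 fair coin tosses, X ~ Binomial(10, 0.5)
p_exactly_3 <- dbinom(3, size = 10, prob = 0.5)        # P(X = 3) > 0 (PMF)
p_at_most_3 <- pbinom(3, size = 10, prob = 0.5)        # F(3) = P(X <= 3) (CDF)
pmf_total <- sum(dbinom(0:10, size = 10, prob = 0.5))  # PMF sums to 1

# Continuous: standard normal Z
density_at_0 <- dnorm(0)                 # f(0) is a density value, not a probability
eps <- c(0.1, 0.01, 0.001)
p_shrinking <- pnorm(eps) - pnorm(-eps)  # P(-eps <= Z <= eps) -> 0 as eps -> 0
p_within_1 <- pnorm(1) - pnorm(-1)       # P(-1 <= Z <= 1) = F(1) - F(-1)
z_975 <- qnorm(0.975)                    # quantile: inverse CDF, about 1.96

# Random draws (r-functions)
set.seed(42)
heads <- rbinom(5, size = 10, prob = 0.5)
heights <- rnorm(5, mean = 165, sd = 7)  # hypothetical height distribution in cm

cat("P(X = 3) =", p_exactly_3, " P(X <= 3) =", p_at_most_3, " sum of PMF =", pmf_total, "\n")
cat("f(0) =", round(density_at_0, 4), " P(|Z| <= eps) =", signif(p_shrinking, 3), "\n")
cat("P(-1 <= Z <= 1) =", round(p_within_1, 4), " qnorm(0.975) =", round(z_975, 4), "\n")
print(heads)
print(round(heights, 1))
```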

Cumulative Distribution Function (CDF)

Cumulative Distribution Function (CDF):
The CDF gives the probability that a random variable \(X\) will take a value less than or equal to \(x\).

Mathematical Definition: \[ F(x)=P(X≤x)=\int_{-\infty}^{x} f(t)dt \]
Properties of CDF:
\(F(x)\) is non-decreasing
\(\lim_{x \to -\infty} F(x) = 0\)
\(\lim_{x \to \infty} F(x) = 1\)
\(P(a < X \leq b) = F(b) - F(a)\)

Example Problem: Continuous Distribution

Problem: The time required to complete a standardized test follows a normal distribution with mean 120 minutes and standard deviation 15 minutes. What is the probability that a randomly selected student will complete the test in less than 100 minutes?

Solution Step by Step:

Step 1: Identify the parameters
\(\mu = 120\) minutes, \(\sigma = 15\) minutes, \(x = 100\) minutes
Step 2: Calculate the z-score
\(z = \frac{x - \mu}{\sigma} = \frac{100 - 120}{15} = \frac{-20}{15} = -1.33\)
Step 3: Find probability using standard normal distribution
\(P(X < 100) = P(Z < -1.33)\)
Step 4: Look up the value in z-table or use statistical software
\(P(Z < -1.33) = 0.0918\)
Step 5: Interpret the result
The probability that a student completes the test in less than 100 minutes is 9.18%.
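The same probability can be computed in R with `pnorm()`; a minimal sketch follows. Working from the unrounded z-score (-1.3333) gives about 0.0912, while the 0.0918 in Step 4 comes from rounding z to -1.33 before the table lookup, so the two agree to two decimal places.

```r
mu <- 120      # mean completion time (minutes)
sigma <- 15    # standard deviation (minutes)
x <- 100       # cutoff (minutes)

z <- (x - mu) / sigma                          # Step 2: -1.3333...
p_exact_z <- pnorm(x, mean = mu, sd = sigma)   # P(X < 100) using the unrounded z
p_table <- pnorm(round(z, 2))                  # z rounded to -1.33, as in a z-table lookup

cat("z =", round(z, 4), "\n")
cat("P(X < 100) with unrounded z:", round(p_exact_z, 4), "\n")
cat("P(Z < -1.33) (table value): ", round(p_table, 4), "\n")
```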


Book Reference

Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K. (2012). Probability & Statistics for Engineers & Scientists. Pearson Education.
Chapter 4: Continuous Random Variables and Probability Distributions
Pages: 110-156
Relevance: This textbook provides comprehensive coverage of continuous probability distributions with engineering applications.

Sampling Distribution of the Sample Proportion

Video: Sampling Distribution of the Sample Proportion
In many practical applications, we are concerned with proportions rather than means. Whether studying voting patterns, quality control in manufacturing, medical treatment success rates, or customer satisfaction levels, proportions provide valuable insights about categorical data.

Sampling Distribution of \(\hat{p}\)

Sampling Distribution of Sample Proportion (\(\hat{p}\)):
\[ \hat{p} = \frac{X}{n} \]
where: \(X\) = number of successes in the sample
\(n\) = sample size

\(E(\hat{p}) = p\)

Explanation:
This formula states that the expected value of the sample proportion estimator \(\hat{p}\) is equal to the true population proportion \(p\). This means that \(\hat{p}\) is an unbiased estimator of the population proportion.

\(\hat{p}\) = sample proportion

\(p\) = population proportion

\(E(\hat{p})\) = expected value (mean) of the sample proportion, which equals the population proportion

Variance of Sampling Distribution: \(\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}\)

Standard Error: \[ SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \]

Normal Approximation Conditions: For the sampling distribution to be approximately normal:

\(np \ge 10\)

\(n(1-p) \ge 10\)

Normal Approximation Formula: \[ \hat{p} \sim N\left(p,\; \frac{p(1-p)}{n}\right) \]

Z-score for Proportion: \[ z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \]
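Because \(\hat{p} = X/n\) with \(X \sim \text{Binomial}(n, p)\), the mean and variance formulas can be verified exactly by summing over every possible count. A minimal sketch, assuming illustrative values p = 0.3, n = 50 and an observed proportion of 0.38:

```r
p <- 0.3
n <- 50
k <- 0:n                                   # every possible number of successes X
prob_k <- dbinom(k, size = n, prob = p)    # P(X = k)
p_hat <- k / n                             # corresponding sample proportions

mean_p_hat <- sum(p_hat * prob_k)                    # E(p-hat), should equal p
var_p_hat <- sum((p_hat - mean_p_hat)^2 * prob_k)    # Var(p-hat), should equal p(1-p)/n
se_p_hat <- sqrt(p * (1 - p) / n)                    # standard error formula

# Normal approximation conditions (15 and 35 here)
conditions_met <- n * p >= 10 && n * (1 - p) >= 10

# z-score for an observed sample proportion of 0.38
z_obs <- (0.38 - p) / se_p_hat

cat("E(p-hat) =", mean_p_hat, " Var(p-hat) =", var_p_hat, " p(1-p)/n =", p * (1 - p) / n, "\n")
cat("SE =", round(se_p_hat, 4), " conditions met:", conditions_met, " z =", round(z_obs, 3), "\n")
```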

Understanding Sample Proportion Distribution

Why Sample Proportion Distribution Matters:

1. From Binomial to Normal:
The sample proportion \(\hat{p}\) is essentially a binomial random variable (X) divided by n. As n increases, the binomial distribution approaches a normal distribution.
2. Central Limit Theorem for Proportions:
For large sample sizes, the sampling distribution of \(\hat{p}\) is approximately normal even though each individual observation is only 0 or 1 (a Bernoulli variable), as long as np ≥ 10 and n(1-p) ≥ 10.
3. Practical Applications:
• Election polls and political surveys
• Quality control and defect rates
• Medical trial success rates
• Market research and customer satisfaction
• A/B testing in web development

Visualizing Sampling Distribution of Proportions

[Figure: Effect of Sample Size on Proportion Distribution]

Comparison: Sample Mean vs Sample Proportion

Aspect | Sample Mean (\(\bar{X}\)) | Sample Proportion (\(\hat{p}\))
Data Type | Continuous numerical data | Categorical/binary data
Formula | \(\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\) | \(\hat{p} = \frac{X}{n}\)
Expected Value | \(E(\bar{X}) = \mu\) | \(E(\hat{p}) = p\)
Standard Error | \(SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}\) | \(SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}\)
Distribution | \(N(\mu, \frac{\sigma^2}{n})\) for large n | \(N(p, \frac{p(1-p)}{n})\) when np ≥ 10 and n(1-p) ≥ 10
Applications | Average height, test scores, income | Voting percentages, defect rates, success rates

Important Properties and Notes

Key Properties of Sample Proportion Distribution:

1. Bias and Consistency:
\(\hat{p}\) is an unbiased estimator of p: \(E(\hat{p}) = p\)
As n increases, \(\hat{p}\) converges to p (consistent estimator)
2. Maximum Variance:
The variance \(p(1-p)\) is maximized when p = 0.5
This means proportions near 0.5 have the largest standard errors
3. Finite Population Correction:
When sampling without replacement from a finite population of size N:
\[ SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n} \times \frac{N-n}{N-1}} \]
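The standard error, with and without the finite population correction, fits in a small helper. `se_p_hat()` is a hypothetical function name, and the numbers in the example (n = 100 drawn from N = 1000) are illustrative; the last lines confirm that the standard error peaks at p = 0.5.

```r
# Hypothetical helper: standard error of p-hat, optionally with the
# finite population correction sqrt((N - n) / (N - 1))
se_p_hat <- function(p, n, N = Inf) {
  se <- sqrt(p * (1 - p) / n)
  if (is.finite(N)) {
    se <- se * sqrt((N - n) / (N - 1))   # sampling without replacement
  }
  se
}

se_infinite <- se_p_hat(0.5, n = 100)            # 0.05
se_finite <- se_p_hat(0.5, n = 100, N = 1000)    # slightly smaller than 0.05

# Standard error across p: largest at p = 0.5
p_grid <- seq(0.05, 0.95, by = 0.05)
se_grid <- se_p_hat(p_grid, n = 100)
p_at_max <- p_grid[which.max(se_grid)]

cat("SE (infinite population):", se_infinite, "\n")
cat("SE (N = 1000, with FPC): ", round(se_finite, 5), "\n")
cat("p with the largest SE:   ", p_at_max, "\n")
```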

Reference Books

1. Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K. (2012). Probability & Statistics for Engineers & Scientists. Pearson Education.
Chapter: 7 - Sampling Distributions
Pages: 245-280
Relevance: Comprehensive coverage of sampling distributions including proportion distributions.

2. Devore, J.L. (2015). Probability and Statistics for Engineering and the Sciences. Cengage Learning.
Chapter: 6 - Statistics and Sampling Distributions
Pages: 270-310
Relevance: Practical applications of sampling distributions in engineering contexts.

3. Montgomery, D.C., & Runger, G.C. (2018). Applied Statistics and Probability for Engineers. Wiley.
Chapter: 7 - Point Estimation of Parameters and Sampling Distributions
Pages: 255-290
Relevance: Engineering-focused approach to proportion estimation and sampling theory.

Central Limit Theorem - Complete Summary

Based on Video: The Central Limit Theorem Explained

Central Limit Theorem: Foundation of Statistical Inference

What is the Central Limit Theorem (CLT)?
The Central Limit Theorem is one of the most important concepts in statistics. It states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the population has a finite variance. This theorem forms the foundation for statistical inference, allowing us to make probabilistic statements about population parameters.

🎯 Core Statement

  • Definition: Sample means approach a normal distribution
  • Requirements: Independent, identically distributed observations
  • Sample Size: n ≥ 30 is generally sufficient (a rule of thumb)
  • Key Insight: Works for any population distribution with finite variance
  • Mathematical Notation: \(\bar{X} \approx N(\mu, \frac{\sigma^2}{n})\) for large n
CLT Formula:
\[ \bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right) \text{ for large } n \] \[ \text{As } n \to \infty, \quad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1) \]

📊 Key Properties

  • Mean Preservation: \(E(\bar{X}) = \mu\)
  • Variance Reduction: \(Var(\bar{X}) = \frac{\sigma^2}{n}\)
  • Standard Error: \(SE = \frac{\sigma}{\sqrt{n}}\)
  • Sample Size Effect: Larger n → smaller SE
  • Distribution Shape: Becomes more normal with larger n
Important Relationships:
\[ E(\bar{X}) = \mu \] \[ Var(\bar{X}) = \frac{\sigma^2}{n} \] \[ SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} \]

🔢 Conditions & Requirements

  • Independence: Samples must be independent
  • Random Sampling: Samples randomly selected
  • Sample Size: n ≥ 30 for most distributions
  • Finite Variance: Population variance must be finite
  • Identical Distribution: Samples from same population
Sample Size Guidelines:
• Normal population: Any n works
• Slightly skewed: n ≥ 15
• Moderately skewed: n ≥ 30
• Highly skewed: n ≥ 50+

📈 Practical Applications

  • Confidence Intervals: Estimating population parameters
  • Hypothesis Testing: Testing claims about means
  • Quality Control: Process monitoring and improvement
  • Survey Analysis: Polling and market research
  • Medical Research: Clinical trial analysis
Z-score Formula:
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \] Confidence Interval:
\[ \bar{X} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]

R Implementation: CLT Demonstration

💻 Central Limit Theorem Simulation in R
## === CENTRAL LIMIT THEOREM DEMONSTRATION ===
## Creating population distributions...
## Running CLT simulations...
## 
## === SIMULATION RESULTS (n = 30 , samples = 1000 ) ===
## Exponential Distribution:
##   Population mean (μ): 1.004 
##   Population SD (σ): 1 
##   Theoretical SE: 0.183 
##   Empirical mean of sample means: 1.003 
##   Empirical SE: 0.18 
##   Difference in means: 0 
##   Ratio SE(empirical)/SE(theoretical): 0.984 
## 
## Uniform Distribution:
##   Population mean (μ): 4.998 
##   Population SD (σ): 2.894 
##   Theoretical SE: 0.528 
##   Empirical mean of sample means: 5 
##   Empirical SE: 0.514 
##   Difference in means: 0.002 
##   Ratio SE(empirical)/SE(theoretical): 0.973 
## 
## Bimodal Distribution:
##   Population mean (μ): 49.936 
##   Population SD (σ): 20.735 
##   Theoretical SE: 3.786 
##   Empirical mean of sample means: 49.836 
##   Empirical SE: 3.89 
##   Difference in means: 0.1 
##   Ratio SE(empirical)/SE(theoretical): 1.028
## Creating visualizations...

## 
## === INTERPRETATION ===
## Based on the Central Limit Theorem (CLT) simulation above, several important conclusions can be drawn:
## 1. DISTRIBUTION SHAPE:
##    - Exponential distribution (top left): strongly right-skewed
##    - Uniform distribution (middle left): flat, rectangular shape
##    - Bimodal distribution (bottom left): two distinct peaks
##    - Nevertheless, the distributions of the sample means for all three (right column) are close to normal
## 2. CONSISTENCY OF RESULTS:
##    - The empirical mean of the sample means is close to the population mean (μ)
##    - The empirical standard error is close to the theoretical standard error (σ/√n)
##    - The ratio SE(empirical)/SE(theoretical) is close to 1 for all distributions
## 3. PRACTICAL IMPLICATIONS:
##    - The CLT holds even when the population distribution is not normal
##    - With a sample size of n = 30, the distribution of sample means is already fairly close to normal
##    - This makes statistical inference (hypothesis tests, confidence intervals) possible
##    - The CLT is the basis of many parametric statistical methods
## 4. IMPORTANCE OF SAMPLE SIZE:
##    - The larger n is, the closer the distribution of sample means is to normal
##    - For highly skewed populations, a larger n may be needed
##    - Rule of thumb: n ≥ 30 is usually enough to apply the CLT
## === SIMULATION COMPLETE ===
## 
## Plot has been saved as 'clt_simulation_rpubs.png'
## 
## === INSTRUCTIONS FOR RPUBS ===
## 1. Install the packages if needed: install.packages(c('ggplot2', 'gridExtra'))
## 2. Copy all of this code into RStudio
## 3. Run all the code (Ctrl + A, then Ctrl + Enter)
## 4. The plot will appear in the Plots pane and be saved as a PNG file
## 5. For RPubs: upload this R file or publish directly from RStudio
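The code that produced the numbers above is not shown. The sketch below regenerates the same kind of summary for three non-normal populations; the population size, the uniform range and the two-component normal mixture used for the bimodal population are assumptions, so the printed values will differ somewhat from the output above, while the SE ratios should stay close to 1.

```r
set.seed(123)
pop_size <- 10000   # assumed population size
n <- 30             # sample size, as in the output above
n_samples <- 1000   # number of repeated samples, as in the output above

# Three non-normal populations (shapes and parameters are assumptions)
populations <- list(
  Exponential = rexp(pop_size, rate = 1),
  Uniform     = runif(pop_size, min = 0, max = 10),
  Bimodal     = c(rnorm(pop_size / 2, mean = 30, sd = 8),
                  rnorm(pop_size / 2, mean = 70, sd = 8))
)

# Draw repeated samples from one population and compare with CLT predictions
clt_summary <- function(pop) {
  sample_means <- replicate(n_samples, mean(sample(pop, n, replace = TRUE)))
  mu <- mean(pop)
  sigma <- sqrt(mean((pop - mu)^2))   # population SD (divides by N, not N - 1)
  se_theory <- sigma / sqrt(n)
  c(mu = mu,
    sigma = sigma,
    se_theory = se_theory,
    mean_of_means = mean(sample_means),
    se_empirical = sd(sample_means),
    se_ratio = sd(sample_means) / se_theory)
}

results <- t(sapply(populations, clt_summary))
print(round(results, 3))
```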

Reference Books:

1. Montgomery, D.C., & Runger, G.C. (2018). Applied Statistics and Probability for Engineers. Wiley.
Chapter: 7 - Point Estimation and Sampling Distributions
Pages: 255-290
Relevance: Excellent engineering-focused explanation of CLT with practical examples.

2. Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K. (2012). Probability & Statistics for Engineers & Scientists. Pearson Education.
Chapter: 8 - Fundamental Sampling Distributions
Pages: 281-320
Relevance: Comprehensive coverage of sampling distributions including CLT proofs.

3. Devore, J.L. (2015). Probability and Statistics for Engineering and the Sciences. Cengage Learning.
Chapter: 6 - Statistics and Sampling Distributions
Pages: 270-310
Relevance: Practical applications of CLT in engineering and scientific contexts.

Central Limit Theorem Summary

Video: The Central Limit Theorem Explained

Central Limit Theorem: Key Concepts

What is CLT?
The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution (provided its variance is finite). This enables statistical inference about population parameters.

🎯 Core Concept

  • Sample means approach normal distribution
  • Works for any population distribution
  • n ≥ 30 usually sufficient
  • Independent random samples required
  • \( \bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right) \) for large n

    📊 Key Properties

    • Mean: \(E(\bar{X}) = \mu\)
    • Variance: \(Var(\bar{X}) = \frac{\sigma^2}{n}\)
    • Standard Error: \(SE = \frac{\sigma}{\sqrt{n}}\)
    • Larger n → smaller SE

    \[ SE = \frac{\sigma}{\sqrt{n}} \]

    CLT Simulation in R

    💻 CLT Demonstration
    library(ggplot2)
    library(gridExtra)
    set.seed(123)
    
    # Parameters
    pop_size <- 5000   # population size
    n <- 30            # size of each sample
    samples <- 1000    # number of repeated samples
    
    # Skewed (exponential) population
    pop_skewed <- rexp(pop_size, rate = 1)
    
    # Draw repeated samples and store each sample mean
    sample_means <- numeric(samples)
    for (i in seq_len(samples)) {
      sample_means[i] <- mean(sample(pop_skewed, n, replace = TRUE))
    }
    
    # Shared theme for both panels
    clt_theme <- theme_minimal() +
      theme(
        plot.title = element_text(size = 12),
        plot.background = element_rect(fill = "#F5F1E8"),
        panel.background = element_rect(fill = "#F5F1E8")
      )
    
    # Left panel: the population itself
    # (after_stat() replaces the deprecated ..density.. notation)
    p1 <- ggplot(data.frame(x = pop_skewed), aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)), bins = 40,
                     fill = "#8D6E63", alpha = 0.6) +
      labs(title = "Population Distribution (Exponential)",
           x = "Value", y = "Density") +
      clt_theme
    
    # Right panel: sample means with a fitted normal curve overlaid
    # (linewidth replaces size for lines in ggplot2 >= 3.4.0)
    p2 <- ggplot(data.frame(x = sample_means), aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)), bins = 40,
                     fill = "#A1887F", alpha = 0.6) +
      stat_function(fun = dnorm,
                    args = list(mean = mean(sample_means),
                                sd = sd(sample_means)),
                    color = "#5D4037", linewidth = 0.8) +
      labs(title = paste("Sample Means (n =", n, ")"),
           x = "Sample Mean", y = "Density") +
      clt_theme
    
    grid.arrange(p1, p2, ncol = 2,
                 top = grid::textGrob(
                   "Central Limit Theorem Demonstration",
                   gp = grid::gpar(fontsize = 14, fontface = "bold", col = "#5D4037")
                 ))

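    Worked example (population mean μ = 100 cm, σ = 2 cm, sample size n = 36, the same values used in the R calculation below):
    1. Standard Error:
    \(SE = \frac{2}{\sqrt{36}} = 0.333\) cm
    2. Probability that the sample mean is below 99.5 cm:
    \(Z = \frac{99.5-100}{0.333} = -1.5\)
    \(P(\bar{X} < 99.5) = P(Z < -1.5) = 0.0668\) (6.68%)
    3. Central 95% interval for the sample mean (where 95% of sample means fall):
    \(100 \pm 1.96 \times 0.333 = [99.35, 100.65]\) cm
    (Replacing 100 with an observed sample mean \(\bar{x}\) turns the same calculation into a 95% confidence interval for \(\mu\).)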
    R Calculation
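    # Example values: population mean and SD (cm) and sample size
    mu <- 100
    sigma <- 2
    n <- 36
    
    # Calculations
    se <- sigma / sqrt(n)        # standard error of the sample mean
    z <- (99.5 - mu) / se        # z-score for a sample mean of 99.5 cm
    p_val <- pnorm(z)            # P(sample mean < 99.5)
    
    # Central 95% interval for the sample mean: mu +/- 1.96 SE
    ci_lower <- mu - 1.96 * se
    ci_upper <- mu + 1.96 * se
    
    # Soft brown color
    color <- "#8B5E3C"
    
    output <- paste0(
      "<span style='color:", color, "; font-weight:600;'>Standard Error: ", round(se, 3), " cm</span><br>",
      "<span style='color:", color, "; font-weight:600;'>Z-score for 99.5 cm: ", round(z, 3), "</span><br>",
      "<span style='color:", color, "; font-weight:600;'>Probability: ", round(p_val, 4), "</span><br>",
      "<span style='color:", color, "; font-weight:600;'>95% interval for the sample mean: [",
          round(ci_lower, 2), ", ", round(ci_upper, 2), "] cm</span>"
    )
    
    # Print the HTML (in R Markdown, use the chunk option results = 'asis' so it renders)
    cat(output)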
    Sample Size | Standard Error (cm) | 95% CI Width (cm)
    n = 9 | 0.667 | 2.61
    n = 36 | 0.333 | 1.31
    n = 100 | 0.200 | 0.78
    Key Points:

    1. Importance:
    Enables inference without knowing the shape of the population distribution.
    2. Sample Size:
    Larger n → the distribution of sample means is closer to normal.
    3. Applications:
    Confidence intervals, hypothesis testing, quality control.

    Essential Formulas:

    1. Standard Error (SE)

    \[ SE = \frac{\sigma}{\sqrt{n}} \]

    2. Z-Score

    \[ Z = \frac{\bar{X} - \mu}{SE} \]

    3. Confidence Interval (CI)

    \[ \bar{X} \pm z_{\alpha/2} \times SE \]

    Video Takeaways:
    1. CLT enables statistical inference
    2. Works for any population shape
    3. Larger samples give better approximations
    4. Foundation for confidence intervals and hypothesis tests

    Sampling Distribution of the Sample Proportion

    The sampling distribution of the sample proportion is a fundamental concept in inferential statistics that describes how sample proportions vary from sample to sample. It provides the foundation for constructing confidence intervals and conducting hypothesis tests about population proportions.

    Core Concepts of Sample Proportion Distribution

    Sample Proportion Formula:
    \[ \hat{p} = \frac{X}{n} \]
    Where:
    \(X\) = number of successes in the sample
    \(n\) = sample size
    \(\hat{p}\) = sample proportion (pronounced “p-hat”)

    Key Parameters of Sampling Distribution

    Mean of Sampling Distribution:
    \[ \mu_{\hat{p}} = E(\hat{p}) = p \]
    Explanation:
    The expected value of all possible sample proportions equals the population proportion
    This makes \(\hat{p}\) an unbiased estimator of \(p\)
    Variance and Standard Error:
    \[ \sigma^2_{\hat{p}} = \frac{p(1-p)}{n} \] \[ SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \]
    Important Notes:
    1. Standard error decreases as sample size increases
    2. Maximum variance occurs when \(p = 0.5\)
    3. Variance is smallest when \(p\) is near 0 or 1

    Normal Approximation Conditions

    Central Limit Theorem for Proportions:
    The sampling distribution of \(\hat{p}\) is approximately normal if:
    1. \(np \ge 10\)
    2. \(n(1-p) \ge 10\)
    (These are practical rules of thumb.)
    When satisfied:
    \[ \hat{p} \sim N\left(p,\; \frac{p(1-p)}{n}\right) \]
    Z-score Calculation for Proportions:
    \[ z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \]
    Used for:
    • Finding probabilities for sample proportions
    • Constructing confidence intervals
    • Hypothesis testing about proportions

    Practical Example from Video

    Video Example: Voter Support Survey
    Scenario:
    • Population proportion: \(p = 0.60\) (60% support candidate)
    • Sample size: \(n = 100\)
    • Question: What is \(P(\hat{p} > 0.65)\)?
    Solution Steps:
    1. Check normal approximation: \(np = 60 \ge 10\) and \(n(1-p) = 40 \ge 10\), so the normal approximation applies
    2. Calculate standard error: \(SE = \sqrt{\frac{0.6 \times 0.4}{100}} = 0.049\)
    3. Compute z-score: \(z = \frac{0.65 - 0.60}{0.049} = 1.02\)
    4. Find probability: \(P(Z > 1.02) = 0.1539\)
    Conclusion: There’s a 15.39% chance of getting a sample with over 65% support
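The calculation can be reproduced with `pnorm()`. As a cross-check, this minimal sketch also computes the exact binomial probability with `pbinom()`: \(\hat{p} > 0.65\) means more than 65 supporters out of 100. The exact value comes out lower than the plain normal approximation, and the ±0.5/n continuity correction listed under Important Considerations moves the approximation toward it.

```r
p <- 0.60      # population proportion
n <- 100       # sample size
cutoff <- 0.65

se <- sqrt(p * (1 - p) / n)                     # about 0.049
z <- (cutoff - p) / se                          # about 1.02
p_normal <- pnorm(z, lower.tail = FALSE)        # P(p-hat > 0.65), normal approximation

# Continuity correction: shift the cutoff by 0.5/n
z_cc <- (cutoff + 0.5 / n - p) / se
p_normal_cc <- pnorm(z_cc, lower.tail = FALSE)

# Exact: P(X > 65) for X ~ Binomial(100, 0.6)
p_exact <- pbinom(65, size = n, prob = p, lower.tail = FALSE)

cat("SE =", round(se, 4), " z =", round(z, 3), "\n")
cat("Normal approximation:       ", round(p_normal, 4), "\n")
cat("With continuity correction: ", round(p_normal_cc, 4), "\n")
cat("Exact binomial:             ", round(p_exact, 4), "\n")
```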

    Visualizing the Sampling Distribution

    Characteristics of \(\hat{p}\) Distribution
    Shape Changes with Sample Size:
    • Small \(n\): Discrete distribution (binomial-like)
    • Large \(n\): Bell-shaped curve (normal approximation)
    • As \(n\) increases: Distribution becomes more concentrated around \(p\)
    Effect of \(p\) on Distribution:
    • When \(p = 0.5\): Symmetric distribution
    • When \(p\) near 0 or 1: Skewed distribution (needs larger \(n\) for normal approximation)

    Comparison: Mean vs Proportion Sampling

    Aspect | Sample Mean (\(\bar{x}\)) | Sample Proportion (\(\hat{p}\))
    Population Parameter | \(\mu\) (population mean) | \(p\) (population proportion)
    Sample Statistic | \(\bar{x} = \frac{\sum x_i}{n}\) | \(\hat{p} = \frac{X}{n}\)
    Sampling Distribution Mean | \(\mu_{\bar{x}} = \mu\) | \(\mu_{\hat{p}} = p\)
    Standard Error Formula | \(SE = \frac{\sigma}{\sqrt{n}}\) | \(SE = \sqrt{\frac{p(1-p)}{n}}\)
    Normality Conditions | \(n \ge 30\) (CLT) | \(np \ge 10\) and \(n(1-p) \ge 10\)
    Data Type | Quantitative (continuous) | Categorical (binary: success/failure)
    Distribution Shape | Normal for large \(n\) | Normal when conditions are met

    R Implementation and Simulation

    📊 Simulating Sampling Distribution of \(\hat{p}\)
    ## === SAMPLING DISTRIBUTION SIMULATION RESULTS ===
    ## Population proportion (p): 0.6
    ## Sample size (n): 100
    ## Number of simulations: 10000
    ## Mean of simulated p̂: 0.6006
    ## Theoretical Standard Error: 0.049
    ## Empirical Standard Error: 0.049
    ## === NORMALITY CONDITIONS CHECK ===
    ## np = 60
    ## n(1-p) = 40
    ## ✓ Conditions met: Normal approximation is valid
    ## 
    ## === PROBABILITY CALCULATION ===
    ## P(p̂ > 0.65 ):
    ## Z-score: 1.021
    ## Probability: 0.1537
    ## Percentage: 15.37 %
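The code behind this output is not shown; a minimal sketch that produces the same kind of summary with `rbinom()` follows (the seed is an arbitrary choice, so the simulated values will differ slightly). Note that the simulated share of samples with \(\hat{p} > 0.65\) estimates the exact binomial probability, so it can sit noticeably below the normal-approximation value of 0.1537.

```r
set.seed(2024)   # arbitrary seed
p <- 0.6
n <- 100
n_sims <- 10000

# One simulated survey = count supporters X ~ Binomial(n, p), then p-hat = X / n
p_hat_sim <- rbinom(n_sims, size = n, prob = p) / n

se_theory <- sqrt(p * (1 - p) / n)
se_empirical <- sd(p_hat_sim)
conditions_met <- n * p >= 10 && n * (1 - p) >= 10

cat("Mean of simulated p-hat:", round(mean(p_hat_sim), 4), "\n")
cat("Theoretical SE:", round(se_theory, 3), " Empirical SE:", round(se_empirical, 3), "\n")
cat("Conditions met:", conditions_met, "\n")

# Normal-approximation probability vs simulated share of p-hat above 0.65
z <- (0.65 - p) / se_theory
cat("Normal approximation P(p-hat > 0.65):", round(pnorm(z, lower.tail = FALSE), 4), "\n")
cat("Simulated share with p-hat > 0.65:   ", round(mean(p_hat_sim > 0.65), 4), "\n")
```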

    Interpretation of Results

    Key Insights from Simulation:
    1. Unbiasedness Confirmation:
    • Simulated mean of \(\hat{p}\) ≈ population \(p\) (0.60)
    • Empirical SE ≈ theoretical SE
    • Demonstrates \(\hat{p}\) is an unbiased estimator
    2. Practical Applications:
    • Quality control: Probability of defect rate exceeding threshold
    • Election forecasting: Chance of candidate getting certain vote percentage
    • Medical studies: Probability of treatment success rate
    3. Decision Making:
    • With 15.39% probability, a sample of 100 could show >65% support
    • Important for interpreting survey results and their margin of error

    Real-World Applications

    Common Applications in Various Fields:
    1. Political Polling:
      • Estimating candidate support
      • Calculating margin of error
      • Determining sample size needed for desired precision
    2. Quality Control:

      • Monitoring defect rates in manufacturing
      • Setting control limits for proportion defective
      • Determining acceptable quality levels

    3. Medical Research:

      • Estimating treatment success rates
      • Comparing proportions between treatment groups
      • Calculating required sample size for clinical trials

    4. Market Research:

      • Estimating market share
      • Calculating customer satisfaction rates
      • Determining brand preference proportions

    Statistical Formulas for Practice

    Confidence Interval for Proportion:
    \[ \hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

    Margin of Error Formula:

    \[ ME = z_{\alpha/2} \times \sqrt{\frac{p(1-p)}{n}} \]

    Sample Size Determination:

    \[ n = \left( \frac{z_{\alpha/2}}{ME} \right)^2 \, p(1-p) \]

    Note: When \(p\) is unknown, use \(p = 0.5\) for maximum sample size
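A minimal sketch of these three formulas in R, assuming a hypothetical survey with 540 "yes" answers out of 900 and a target margin of error of 3 percentage points; `sample_size_for_me()` is a hypothetical helper name. `qnorm(0.975)` supplies \(z_{\alpha/2}\) for 95% confidence, and the sample size is rounded up because n must be a whole number at least as large as the formula value.

```r
# Hypothetical survey: 540 "yes" answers out of n = 900
x <- 540
n <- 900
p_hat <- x / n                     # 0.60
z <- qnorm(0.975)                  # z_{alpha/2} for 95% confidence, about 1.96

# Confidence interval for p (uses p-hat in the standard error)
me <- z * sqrt(p_hat * (1 - p_hat) / n)
ci <- c(lower = p_hat - me, upper = p_hat + me)

# Hypothetical helper: sample size for a target margin of error
sample_size_for_me <- function(me, p = 0.5, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  ceiling((z / me)^2 * p * (1 - p))   # round up to a whole number of respondents
}

n_conservative <- sample_size_for_me(0.03)          # p unknown: use p = 0.5
n_with_guess <- sample_size_for_me(0.03, p = 0.2)   # smaller when p is far from 0.5

cat("95% CI:", round(ci, 4), " margin of error:", round(me, 4), "\n")
cat("n for ME = 0.03 (p = 0.5):", n_conservative, "\n")
cat("n for ME = 0.03 (p = 0.2):", n_with_guess, "\n")
```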

    Important Considerations

    Critical Points to Remember:
    1. Independence Assumption: Samples must be independent (random sampling)
    2. 10% Condition: Sample should be ≤ 10% of population when sampling without replacement
    3. Success-Failure Condition: Must check \(np ≥ 10\) and \(n(1-p) ≥ 10\) for normal approximation
    4. Continuity Correction: For small samples, consider adding ±0.5/n to improve normal approximation
    When Normal Approximation Fails:
    Alternative Methods:
    1. Exact Binomial Test: Use when \(n\) is small or \(p\) is extreme
    2. Wilson Score Interval: Better for small samples or extreme proportions
    3. Clopper-Pearson Interval: Conservative exact method
    4. Simulation/Bootstrap: Resampling methods for any sample size
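The difference between these methods shows up clearly with a small sample and an extreme proportion. A minimal sketch, assuming a hypothetical result of 1 success in 20 trials: the Wald interval is the normal-approximation confidence interval formula above, `prop.test(correct = FALSE)` reports the Wilson score interval, and `binom.test()` reports the Clopper-Pearson exact interval.

```r
# Hypothetical small sample with an extreme proportion
x <- 1
n <- 20
p_hat <- x / n

# Wald (normal-approximation) interval: p-hat +/- z * sqrt(p-hat(1 - p-hat)/n)
wald <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)

# Wilson score interval; prop.test() warns that the chi-squared
# approximation may be poor with counts this small, so the warning is suppressed
wilson <- as.numeric(suppressWarnings(prop.test(x, n, correct = FALSE))$conf.int)

# Clopper-Pearson exact interval
clopper <- as.numeric(binom.test(x, n)$conf.int)

intervals <- rbind(Wald = wald, Wilson = wilson, ClopperPearson = clopper)
colnames(intervals) <- c("lower", "upper")
print(round(intervals, 4))   # the Wald lower limit falls below 0
```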

    Reference Books and Resources

    Recommended Textbooks:
    1. “Statistics for Business and Economics” by Anderson, Sweeney, and Williams
      • Chapter 7: Sampling and Sampling Distributions
      • Excellent practical examples with business applications
    2. “Introductory Statistics” by Prem S. Mann

      • Chapter 7: Sampling Distributions
      • Clear explanations with step-by-step solutions

    3. “The Practice of Statistics” by Starnes, Tabor, Yates, and Moore

      • Chapter 7: Sampling Distributions
      • Modern approach with emphasis on data analysis

    4. “Statistical Inference” by George Casella and Roger L. Berger

      • Chapter 5: Properties of a Random Sample
      • Theoretical foundation for sampling distributions

    5. “OpenIntro Statistics” by Diez, Barr, and Çetinkaya-Rundel

      • Free online textbook
      • Chapter 4: Foundations for Inference
      • Includes R examples and practice problems

    Binomial Distribution Review

      Introduction to Binomial Distribution

      The binomial distribution is one of the most fundamental discrete probability distributions in statistics. It models the number of successes in a fixed number of independent Bernoulli trials, where each trial has only two possible outcomes: success or failure.

      Binomial Distribution Formulas

      Binomial Probability Mass Function:
      \[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]

      where: \(n\) = number of trials
      \(k\) = number of successes
      \(p\) = probability of success on each trial
      \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\) = binomial coefficient

      Mean of Binomial Distribution: \[ \mu = E(X) = np \]

      Brief explanation:
      \(\mu\) = mean of the distribution
      \(E(X)\) = expected value of the random variable \(X\)
      \(n\) = number of trials
      \(p\) = probability of success on each trial

      Variance of Binomial Distribution: \[ \sigma^2 = \operatorname{Var}(X) = np(1-p) \]

      Standard Deviation: \[ \sigma = \sqrt{np(1-p)} \]

      Conditions for Binomial Distribution:
      Fixed number of trials \((n)\)
      Independent trials
      Two possible outcomes (success/failure)
      Constant probability of success \((p)\)

      Normal Approximation to the Binomial:

      Normal approximation is appropriate if:
      \(np \ge 10\)
      \(n(1-p) \ge 10\)

      \[ \text{Normal Approximation:} \quad X \sim N \big( n p, \, n p (1 - p) \big) \]

      Brief explanation:
      \(X\) = binomial random variable
      \(N(\mu,\sigma^2)\) = normal distribution with mean \(\mu\) and variance \(\sigma^2\)
      \(\mu = np\) = mean of the binomial distribution
      \(\sigma^2 = np(1-p)\) = variance of the binomial distribution
      Used when \(n\) is large and \(p\) is not too close to 0 or 1
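A minimal sketch comparing the exact binomial CDF with the normal approximation, assuming illustrative values n = 100, p = 0.5 and the event \(X \le 55\). The second approximation adds the 0.5 continuity correction, which usually brings it much closer to the exact value.

```r
n <- 100
p <- 0.5
mu <- n * p                      # np = 50
sigma <- sqrt(n * p * (1 - p))   # sqrt(np(1 - p)) = 5

conditions_met <- n * p >= 10 && n * (1 - p) >= 10

exact <- pbinom(55, size = n, prob = p)              # P(X <= 55), exact
approx_plain <- pnorm(55, mean = mu, sd = sigma)     # normal approximation
approx_cc <- pnorm(55.5, mean = mu, sd = sigma)      # with continuity correction

cat("Conditions met:", conditions_met, "\n")
cat("Exact P(X <= 55):          ", round(exact, 4), "\n")
cat("Normal approximation:      ", round(approx_plain, 4), "\n")
cat("With continuity correction:", round(approx_cc, 4), "\n")
```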

      Example Problem: Binomial Distribution

      Problem: A multiple-choice test has 20 questions, each with 5 choices. A student guesses randomly on all questions. What is the probability that the student gets exactly 5 questions correct?

      Solution Step by Step:

      Step 1: Identify parameters
      \(n = 20\) (number of questions)
      \(p = \frac{1}{5} = 0.2\) (probability of guessing correctly)
      \(k = 5\) (number of correct answers wanted)

      Step 2: Calculate binomial coefficient
      \(\binom{20}{5} = \frac{20!}{5!15!} = \frac{20 \times 19 \times 18 \times 17 \times 16}{5 \times 4 \times 3 \times 2 \times 1} = 15504\)

      Step 3: Apply binomial formula
      \(P(X = 5) = \binom{20}{5} (0.2)^5 (0.8)^{15}\)

      Step 4: Calculate each component
      \((0.2)^5 = 0.00032\)
      \((0.8)^{15} \approx 0.035184\)

      Step 5: Multiply all components
      \(P(X = 5) = 15504 \times 0.00032 \times 0.035184 = 15504 \times 0.00001125888 \approx 0.1746\)


      Answer: The probability that the student gets exactly 5 questions correct is 17.46%.
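The hand calculation can be checked in R. In this minimal sketch, `choose()` gives the binomial coefficient, `dbinom()` the PMF directly, and `pbinom()` answers a related cumulative question (at least 5 correct), included only as an illustration.

```r
n <- 20      # questions
p <- 1 / 5   # probability of guessing one question correctly
k <- 5       # number of correct answers

coef_20_5 <- choose(n, k)                               # 15504
by_formula <- coef_20_5 * p^k * (1 - p)^(n - k)         # C(20,5) (0.2)^5 (0.8)^15
by_dbinom <- dbinom(k, size = n, prob = p)              # same value from the PMF
at_least_5 <- pbinom(k - 1, size = n, prob = p, lower.tail = FALSE)   # P(X >= 5)

cat("C(20, 5) =", coef_20_5, "\n")
cat("P(X = 5) by formula:", round(by_formula, 4), " by dbinom:", round(by_dbinom, 4), "\n")
cat("P(X >= 5):", round(at_least_5, 4), "\n")
```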

      Book Reference

      Ross, S.M. (2014). A First Course in Probability. Pearson.
      Chapter 4: Discrete Random Variables
      Pages: 125-160
      Relevance: Excellent coverage of binomial distribution with numerous examples and applications.

      Summary and Conclusion

      Key Takeaways from Week 11:

      1. Continuous Probability Distributions:
      Understanding PDF and CDF is fundamental for working with continuous variables

      2. Sampling Distributions:
      The distribution of sample statistics forms the basis for statistical inference

      3. Central Limit Theorem:
      One of the most powerful results in statistics, enabling normal approximations

      4. Sampling Distribution of Proportions:
      Essential for categorical data analysis and proportion inference

      5. Binomial Distribution:
      Foundation for understanding proportions and binary outcomes

    