Week 11 Assignment ~ Probability Distribution

Name: Adinda Adelia Futri

Student ID (NIM): 52250055

Photo of Adinda Adelia Futri

Student Profile

🎓 Student Majoring in Data Science at ITSB
📊 Data Science
📈 Statistics
💻 R Programming

Introduction to Probability Distributions of Continuous Variables

▶️ Watch: Introduction to Probability Distributions
Before exploring specific continuous probability distributions, it is important to understand the fundamental differences between discrete and continuous variables. Unlike discrete variables that take specific, countable values, continuous variables can assume any value within a given range or interval. This distinction fundamentally changes how we calculate and interpret probabilities. In continuous distributions, we focus on the probability of a variable falling within a certain range rather than taking exact values.

Comparison of Discrete and Continuous Distributions

Discrete Distribution

Continuous Distribution

Probability Density Function (PDF)

Probability Density Function (PDF):
The Probability Density Function describes the relative likelihood for a continuous random variable to take on a given value.

Mathematical Definition: \[ f(x)= \frac{d}{dx} F(x) \]
where: \(f(x)\) is the probability density function
\(F(x)\) is the cumulative distribution function
\(x\) is the value of the continuous random variable

Properties of PDF:
\(f(x) \geq 0\) for all \(x\) (Non-negativity)
\(\int_{-\infty}^{\infty} f(x) dx = 1\) (Normalization)
\(P(a \leq X \leq b) = \int_{a}^{b} f(x) dx\) (Area under curve)
Example of a PDF (normal distribution with mean \(\mu\) and standard deviation \(\sigma\)): \(f(x)=\frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\)
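As a quick numerical check of the properties listed above, the sketch below uses base R's `dnorm()`, `pnorm()`, and `integrate()`. The parameter values (mean 0, standard deviation 1, interval from -1 to 1) are illustrative assumptions, not values taken from the text.

```r
# Illustrative parameters (assumed): a standard normal distribution
mu <- 0
sigma <- 1

# Non-negativity: the density is >= 0 at every point of a grid
x_grid <- seq(-5, 5, by = 0.1)
all_nonneg <- all(dnorm(x_grid, mean = mu, sd = sigma) >= 0)

# Normalization: the total area under the density is 1
total_area <- integrate(dnorm, lower = -Inf, upper = Inf,
                        mean = mu, sd = sigma)$value

# Area under the curve between a and b equals P(a <= X <= b)
a <- -1
b <- 1
area_ab <- integrate(dnorm, lower = a, upper = b,
                     mean = mu, sd = sigma)$value
cdf_diff <- pnorm(b, mu, sigma) - pnorm(a, mu, sigma)

print(all_nonneg)
print(c(total_area = total_area, area_ab = area_ab, cdf_diff = cdf_diff))
```

The integral over the interval and the difference of CDF values should agree up to numerical integration error.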

Introduction to Random Variables and Probability Distributions

What are Random Variables?
Random variables are functions that assign numerical values to outcomes of random experiments. They provide a mathematical framework for quantifying uncertainty and randomness in statistical analysis.

🔢 Discrete Random Variables

  • Definition: Variables with countable outcomes (finite or countably infinite)
  • Examples: Number of heads in coin tosses, dice roll results
  • Probability Function: Probability Mass Function (PMF)
  • Key Properties: Sum of probabilities equals 1
  • R Functions: dbinom(), dpois(), dgeom()
PMF Formula: \[ P(X = x_i) = p_i \] Properties: \[ \sum_{i=1}^{\infty} p_i = 1 \]

📈 Continuous Random Variables

  • Definition: Variables that can take any value in an interval (uncountably many possible values)
  • Examples: Height, weight, temperature, time
  • Probability Function: Probability Density Function (PDF)
  • Key Properties: Area under curve equals 1
  • R Functions: dnorm(), dexp(), dunif()
PDF Formula: \[ f(x) \geq 0 \] Properties: \[ \int_{-\infty}^{\infty} f(x) dx = 1 \]

📊 Cumulative Distribution Function

  • Definition: Probability that X ≤ x
  • Universal: Works for both discrete and continuous
  • Properties: Non-decreasing, right-continuous
  • Range: 0 ≤ F(x) ≤ 1
  • R Functions: pbinom(), pnorm(), ppois()
CDF Formula: \[ F(x) = P(X \leq x) \] Discrete: \[ F(x) = \sum_{x_i \leq x} P(X = x_i) \] Continuous: \[ F(x) = \int_{-\infty}^{x} f(t) dt \]
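The two CDF formulas above can be checked numerically in base R. This is a minimal sketch; the binomial parameters (10 trials, success probability 0.3) and the evaluation point for the normal case are assumptions chosen only for illustration.

```r
# Discrete case (assumed example): X ~ Binomial(size = 10, prob = 0.3)
size <- 10
prob <- 0.3
x <- 0:size

pmf <- dbinom(x, size = size, prob = prob)   # P(X = x_i)
cdf_from_pmf <- cumsum(pmf)                  # F(x) = sum of P(X = x_i) for x_i <= x
cdf_builtin <- pbinom(x, size = size, prob = prob)

# Continuous case (assumed example): X ~ Normal(0, 1), F(1) as an integral
cdf_integral <- integrate(dnorm, lower = -Inf, upper = 1)$value
cdf_pnorm <- pnorm(1)

print(round(data.frame(x, pmf, cdf_from_pmf, cdf_builtin), 4))
print(c(cdf_integral = cdf_integral, cdf_pnorm = cdf_pnorm))
```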

🎯 Expectation and Variance

  • Expected Value (μ): Average/mean value
  • Variance (σ²): Measure of spread/dispersion
  • Standard Deviation: Square root of variance
  • Linearity: E[aX + b] = aE[X] + b
  • R Calculations: mean(), var(), sd()
Expectation: \[ E[X] = \sum x_i p_i \quad (\text{discrete}) \] \[ E[X] = \int x f(x) dx \quad (\text{continuous}) \]
Variance: \[ \mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2 \]
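To make the expectation and variance formulas concrete, here is a small sketch in base R for a discrete distribution. The fair six-sided die and the constants `a` and `b` are assumptions introduced for this example.

```r
# Assumed example: a fair six-sided die
x <- 1:6
p <- rep(1 / 6, 6)

ex <- sum(x * p)                   # E[X] = sum of x_i * p_i
var_def <- sum((x - ex)^2 * p)     # Var(X) = E[(X - mu)^2]
var_short <- sum(x^2 * p) - ex^2   # Var(X) = E[X^2] - (E[X])^2
sd_x <- sqrt(var_def)              # standard deviation

# Linearity of expectation: E[aX + b] = a * E[X] + b (a and b assumed)
a <- 2
b <- 3
e_lin <- sum((a * x + b) * p)

print(c(ex = ex, var_def = var_def, var_short = var_short,
        sd_x = sd_x, e_lin = e_lin))
```

Note that `mean()`, `var()`, and `sd()` applied to data give sample estimates of these quantities, while the code above computes the theoretical values directly from the probabilities.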

Detailed Comparison: Discrete vs Continuous

| Characteristic | Discrete Random Variable | Continuous Random Variable |
|---|---|---|
| Possible Values | Countable (finite or countably infinite) | Uncountable, any value within an interval |
| Probability at a Point | P(X = x) can be positive | P(X = x) = 0 for all x |
| Probability Function | Probability Mass Function (PMF) | Probability Density Function (PDF) |
| Total Probability | ∑ P(X = x) = 1 | ∫ f(x) dx = 1 |
| Cumulative Distribution | F(x) = ∑ P(X = xᵢ) over all xᵢ ≤ x | F(x) = ∫ f(t) dt from -∞ to x |
| Example Distributions | Binomial, Poisson, Geometric | Normal, Exponential, Uniform |
| Real-world Examples | Number of customers, defect counts | Height, temperature, waiting time |

R Implementation of Random Variables

Cumulative Distribution Function (CDF)

Cumulative Distribution Function (CDF):
The CDF gives the probability that a random variable \(X\) will take a value less than or equal to \(x\).

Mathematical Definition: \[ F(x)=P(X \leq x)=\int_{-\infty}^{x} f(t)\,dt \]
Properties of CDF:
\(F(x)\) is non-decreasing
\(\lim_{x \to -\infty} F(x) = 0\)
\(\lim_{x \to \infty} F(x) = 1\)
\(P(a < X \leq b) = F(b) - F(a)\)

Example Problem: Continuous Distribution

Problem: The time required to complete a standardized test follows a normal distribution with mean 120 minutes and standard deviation 15 minutes. What is the probability that a randomly selected student will complete the test in less than 100 minutes?

Solution Step by Step:

Step 1: Identify the parameters
\(\mu = 120\) minutes, \(\sigma = 15\) minutes, \(x = 100\) minutes
Step 2: Calculate the z-score
\(z = \frac{x - \mu}{\sigma} = \frac{100 - 120}{15} = \frac{-20}{15} = -1.33\)
Step 3: Find probability using standard normal distribution
\(P(X < 100) = P(Z < -1.33)\)
Step 4: Look up the value in z-table or use statistical software
\(P(Z < -1.33) = 0.0918\)
Step 5: Interpret the result
The probability that a student completes the test in less than 100 minutes is 9.18%.
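The same calculation can be done with `pnorm()` in base R. This sketch uses the parameters from the problem statement. The table lookup rounds z to two decimals, so the software value with the unrounded z differs slightly in the fourth decimal place.

```r
# Parameters from the problem statement
mu <- 120      # mean completion time (minutes)
sigma <- 15    # standard deviation (minutes)
x <- 100       # threshold (minutes)

z <- (x - mu) / sigma                        # z-score, about -1.333

p_table <- pnorm(round(z, 2))                # mimics the z-table lookup at z = -1.33
p_exact <- pnorm(x, mean = mu, sd = sigma)   # uses the unrounded z

print(round(c(z = z, p_table = p_table, p_exact = p_exact), 4))
```

Both values round to about 9%, so the interpretation in Step 5 is unchanged.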

Visual Representation:

Book Reference

Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K. (2012). Probability & Statistics for Engineers & Scientists. Pearson Education.
Chapter 4: Continuous Random Variables and Probability Distributions
Pages: 110-156
Relevance: This textbook provides comprehensive coverage of continuous probability distributions with engineering applications.

Sampling Distribution of the Sample Proportion

Video: Sampling Distribution of the Sample Proportion
In many practical applications, we are concerned with proportions rather than means. Whether studying voting patterns, quality control in manufacturing, medical treatment success rates, or customer satisfaction levels, proportions provide valuable insights about categorical data.

Sampling Distribution of \(\hat{p}\)

Sampling Distribution of Sample Proportion (\(\hat{p}\)):
\[ \hat{p} = \frac{X}{n} \]
where: \(X\) = number of successes in the sample
\(n\) = sample size

\(E(\hat{p}) = p\)

Explanation:
This formula states that the expected value of the sample proportion estimator \(\hat{p}\) is equal to the true population proportion \(p\). This means that \(\hat{p}\) is an unbiased estimator of the population proportion.

\(\hat{p}\) = sample proportion

\(p\) = population proportion

\(E(\hat{p})\) = expected value (mean) of the sample proportion, which equals the population proportion

Variance of Sampling Distribution: \(\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}\)

Standard Error: \[ SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \]

Normal Approximation Conditions: For the sampling distribution to be approximately normal:

\(np \ge 10\)

\(n(1-p) \ge 10\)

Normal Approximation Formula: \[ \hat{p} \sim N\left(p,\; \frac{p(1-p)}{n}\right) \]

Z-score for Proportion: \[ z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \]
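The formulas above can be collected into one small helper. This is only a sketch: the function name `phat_summary` and the example inputs (p = 0.5, n = 200, observed proportion 0.56) are hypothetical and chosen for illustration.

```r
# Hypothetical helper: summarises the sampling distribution of p-hat
phat_summary <- function(p, n, phat) {
  se <- sqrt(p * (1 - p) / n)                  # SE(p-hat) = sqrt(p(1-p)/n)
  list(
    conditions_ok = (n * p >= 10) && (n * (1 - p) >= 10),  # np >= 10 and n(1-p) >= 10
    se = se,
    z = (phat - p) / se                        # z-score for the observed proportion
  )
}

# Assumed example values
res <- phat_summary(p = 0.5, n = 200, phat = 0.56)
print(res)

# Probability of a sample proportion at least this large, under the normal approximation
p_upper <- pnorm(res$z, lower.tail = FALSE)
print(p_upper)
```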

Understanding Sample Proportion Distribution

Why Sample Proportion Distribution Matters:

1. From Binomial to Normal:
The sample proportion \(\hat{p}\) is essentially a binomial random variable (X) divided by n. As n increases, the binomial distribution approaches a normal distribution.
2. Central Limit Theorem for Proportions:
For large sample sizes, the sampling distribution of \(\hat{p}\) is approximately normal even though each individual observation is binary (success or failure), as long as np ≥ 10 and n(1-p) ≥ 10.
3. Practical Applications:
• Election polls and political surveys
• Quality control and defect rates
• Medical trial success rates
• Market research and customer satisfaction
• A/B testing in web development

Visualizing Sampling Distribution of Proportions

Effect of Sample Size on Proportion Distribution

Comparison: Sample Mean vs Sample Proportion

| Aspect | Sample Mean (\(\bar{X}\)) | Sample Proportion (\(\hat{p}\)) |
|---|---|---|
| Data Type | Continuous numerical data | Categorical/binary data |
| Formula | \(\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\) | \(\hat{p} = \frac{X}{n}\) |
| Expected Value | \(E(\bar{X}) = \mu\) | \(E(\hat{p}) = p\) |
| Standard Error | \(SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}\) | \(SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}\) |
| Distribution | Approximately \(N(\mu, \frac{\sigma^2}{n})\) for large n | Approximately \(N(p, \frac{p(1-p)}{n})\) when np ≥ 10, n(1-p) ≥ 10 |
| Applications | Average height, test scores, income | Voting percentages, defect rates, success rates |

Important Properties and Notes

Key Properties of Sample Proportion Distribution:

1. Bias and Consistency:
\(\hat{p}\) is an unbiased estimator of p: \(E(\hat{p}) = p\)
As n increases, \(\hat{p}\) converges to p (consistent estimator)
2. Maximum Variance:
The variance \(\frac{p(1-p)}{n}\) is maximized, for a fixed \(n\), when p = 0.5
This means proportions near 0.5 have the largest standard errors
3. Finite Population Correction:
When sampling without replacement from a finite population of size N:
\[ SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n} \times \frac{N-n}{N-1}} \]
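A short base R sketch of two of the properties above: the location of the maximum of \(p(1-p)\), and the effect of the finite population correction. The values p = 0.4, n = 100, and N = 1000 are assumptions made for this example.

```r
# Maximum variance: p(1 - p) over a grid of p values
p_grid <- seq(0.01, 0.99, by = 0.01)
p_at_max <- p_grid[which.max(p_grid * (1 - p_grid))]

# Finite population correction (assumed example values)
p <- 0.4     # population proportion
n <- 100     # sample size
N <- 1000    # population size

se_plain <- sqrt(p * (1 - p) / n)            # sampling with replacement
fpc <- (N - n) / (N - 1)                     # correction factor
se_fpc <- sqrt(p * (1 - p) / n * fpc)        # sampling without replacement

print(c(p_at_max = p_at_max, se_plain = se_plain, fpc = fpc, se_fpc = se_fpc))
```

Because the correction factor is below 1 whenever n > 1, the corrected standard error is smaller than the uncorrected one; the difference becomes negligible when n is a small fraction of N.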

Reference Books

1. Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K. (2012). Probability & Statistics for Engineers & Scientists. Pearson Education.
Chapter: 7 - Sampling Distributions
Pages: 245-280
Relevance: Comprehensive coverage of sampling distributions including proportion distributions.

2. Devore, J.L. (2015). Probability and Statistics for Engineering and the Sciences. Cengage Learning.

Chapter: 6 - Statistics and Sampling Distributions

Pages: 270-310

Relevance: Practical applications of sampling distributions in engineering contexts.

3. Montgomery, D.C., & Runger, G.C. (2018). Applied Statistics and Probability for Engineers. Wiley.

Chapter: 7 - Point Estimation of Parameters and Sampling Distributions

Pages: 255-290

Relevance: Engineering-focused approach to proportion estimation and sampling theory.

Central Limit Theorem - Complete Summary

Based on Video: The Central Limit Theorem Explained

Central Limit Theorem: Foundation of Statistical Inference

What is the Central Limit Theorem (CLT)?
The Central Limit Theorem is one of the most important concepts in statistics. It states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the population variance is finite. This theorem forms the foundation for statistical inference, allowing us to make probabilistic statements about population parameters.

🎯 Core Statement

  • Definition: Sample means approach normal distribution
  • Requirements: Independent, identically distributed samples
  • Sample Size: n ≥ 30 generally sufficient
  • Key Insight: Works for any population distribution with finite variance
  • Mathematical Notation: \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\) (approximately, for large \(n\))
CLT Formula:
\[ \bar{X} \;\overset{\text{approx.}}{\sim}\; N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{for large } n \] \[ \text{As } n \to \infty, \quad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1) \]

📊 Key Properties

  • Mean Preservation: \(E(\bar{X}) = \mu\)
  • Variance Reduction: \(Var(\bar{X}) = \frac{\sigma^2}{n}\)
  • Standard Error: \(SE = \frac{\sigma}{\sqrt{n}}\)
  • Sample Size Effect: Larger n → smaller SE
  • Distribution Shape: Becomes more normal with larger n
Important Relationships:
\[ E(\bar{X}) = \mu \] \[ Var(\bar{X}) = \frac{\sigma^2}{n} \] \[ SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} \]

🔢 Conditions & Requirements

  • Independence: Samples must be independent
  • Random Sampling: Samples randomly selected
  • Sample Size: n ≥ 30 for most distributions
  • Finite Variance: Population variance must be finite
  • Identical Distribution: Samples from same population
Sample Size Guidelines:
• Normal population: Any n works
• Slightly skewed: n ≥ 15
• Moderately skewed: n ≥ 30
• Highly skewed: n ≥ 50+

📈 Practical Applications

  • Confidence Intervals: Estimating population parameters
  • Hypothesis Testing: Testing claims about means
  • Quality Control: Process monitoring and improvement
  • Survey Analysis: Polling and market research
  • Medical Research: Clinical trial analysis
Z-score Formula:
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
Confidence Interval:
\[ \bar{X} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]
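A minimal sketch of the z-score and confidence-interval formulas above in base R. The data vector, the hypothesised mean `mu0`, and the assumption that the population standard deviation `sigma` is known are all illustrative choices for this example, not values from the text.

```r
# Hypothetical sample (assumed data for illustration)
x <- c(98, 102, 101, 99, 100, 103, 97, 100, 101)
sigma <- 2      # assumed known population standard deviation
mu0 <- 100      # assumed hypothesised population mean

n <- length(x)
xbar <- mean(x)
se <- sigma / sqrt(n)                 # SE = sigma / sqrt(n)

z <- (xbar - mu0) / se                # Z = (xbar - mu) / (sigma / sqrt(n))

alpha <- 0.05
z_crit <- qnorm(1 - alpha / 2)        # z_{alpha/2}, about 1.96 for 95%
ci <- xbar + c(-1, 1) * z_crit * se   # xbar +/- z_{alpha/2} * SE

print(c(xbar = xbar, se = se, z = z, ci_lower = ci[1], ci_upper = ci[2]))
```

With a sample this small the interval relies on the population itself being roughly normal; the example is meant only to show the arithmetic.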

R Implementation: CLT Demonstration

💻 Central Limit Theorem Simulation in R
## === CENTRAL LIMIT THEOREM DEMONSTRATION ===
## Creating population distributions...
## Running CLT simulations...
## 
## === SIMULATION RESULTS (n = 30 , samples = 1000 ) ===
## Exponential Distribution:
##   Population mean (μ): 1.004 
##   Population SD (σ): 1 
##   Theoretical SE: 0.183 
##   Empirical mean of sample means: 1.003 
##   Empirical SE: 0.18 
##   Difference in means: 0 
##   Ratio SE(empirical)/SE(theoretical): 0.984 
## 
## Uniform Distribution:
##   Population mean (μ): 4.998 
##   Population SD (σ): 2.894 
##   Theoretical SE: 0.528 
##   Empirical mean of sample means: 5 
##   Empirical SE: 0.514 
##   Difference in means: 0.002 
##   Ratio SE(empirical)/SE(theoretical): 0.973 
## 
## Bimodal Distribution:
##   Population mean (μ): 49.936 
##   Population SD (σ): 20.735 
##   Theoretical SE: 3.786 
##   Empirical mean of sample means: 49.836 
##   Empirical SE: 3.89 
##   Difference in means: 0.1 
##   Ratio SE(empirical)/SE(theoretical): 1.028
## Creating visualizations...

## 
## === INTERPRETATION ===
## Based on the Central Limit Theorem (CLT) simulation above, several important conclusions can be drawn:
## 1. SHAPE OF THE DISTRIBUTIONS:
##    - Exponential distribution (top left): strongly right-skewed
##    - Uniform distribution (middle left): flat, rectangular shape
##    - Bimodal distribution (bottom left): two distinct peaks
##    - However, the distributions of the sample means for all three (right column) are close to normal
## 2. CONSISTENCY OF RESULTS:
##    - The empirical mean of the sample means is close to the population mean (μ)
##    - The empirical standard error is close to the theoretical standard error (σ/√n)
##    - The ratio SE(empirical)/SE(theoretical) is close to 1 for all distributions
## 3. PRACTICAL IMPLICATIONS:
##    - The CLT holds even when the population distribution is not normal
##    - With sample size n=30, the distribution of the sample means is already close to normal
##    - This enables the use of statistical inference (hypothesis tests, confidence intervals)
##    - The CLT is the basis for many parametric statistical methods
## 4. IMPORTANCE OF SAMPLE SIZE:
##    - The larger n is, the closer the distribution of the sample means is to normal
##    - For highly skewed populations, a larger n may be needed
##    - Rule of thumb: n ≥ 30 is usually enough to apply the CLT
## === SIMULATION COMPLETE ===
## 
## The plot has been saved as 'clt_simulation_rpubs.png'
## 
## === INSTRUCTIONS FOR RPUBS ===
## 1. Install the packages if needed: install.packages(c('ggplot2', 'gridExtra'))
## 2. Copy all of this code into RStudio
## 3. Run all the code (Ctrl + A, then Ctrl + Enter)
## 4. The plot will appear in the Plots panel and be saved as a PNG file
## 5. For RPubs: upload this R file or publish directly from RStudio

Reference Books:

1. Montgomery, D.C., & Runger, G.C. (2018). Applied Statistics and Probability for Engineers. Wiley.

Chapter: 7 - Point Estimation and Sampling Distributions

Pages: 255-290

Relevance: Excellent engineering-focused explanation of CLT with practical examples.

2. Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K. (2012). Probability & Statistics for Engineers & Scientists. Pearson Education.

Chapter: 8 - Fundamental Sampling Distributions

Pages: 281-320

Relevance: Comprehensive coverage of sampling distributions including CLT proofs.

3. Devore, J.L. (2015). Probability and Statistics for Engineering and the Sciences. Cengage Learning.

Chapter: 6 - Statistics and Sampling Distributions

Pages: 270-310

Relevance: Practical applications of CLT in engineering and scientific contexts.

Central Limit Theorem Summary

Video: The Central Limit Theorem Explained

Central Limit Theorem: Key Concepts

What is CLT?
The Central Limit Theorem states that the sampling distribution of sample means approaches a normal distribution as sample size increases, regardless of population distribution shape (provided the population variance is finite). This enables statistical inference about population parameters.

🎯 Core Concept

  • Sample means approach normal distribution
  • Works for any population distribution with finite variance
  • n ≥ 30 usually sufficient
  • Independent random samples required
  • \[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{(approximately, for large } n\text{)} \]

    📊 Key Properties

    • Mean: \(E(\bar{X}) = \mu\)
    • Variance: \(Var(\bar{X}) = \frac{\sigma^2}{n}\)
    • Standard Error: \(SE = \frac{\sigma}{\sqrt{n}}\)
    • Larger n → smaller SE

    \[ SE = \frac{\sigma}{\sqrt{n}} \]

    CLT Simulation in R

    💻 CLT Demonstration
    library(ggplot2)    # after_stat() needs ggplot2 >= 3.3.0; linewidth needs >= 3.4.0
    library(gridExtra)  # grid.arrange() for the side-by-side layout
    set.seed(123)       # make the simulation reproducible
    
    # Parameters
    pop_size <- 5000    # size of the simulated population
    n <- 30             # observations per sample
    samples <- 1000     # number of samples drawn
    
    # Skewed population
    pop_skewed <- rexp(pop_size, rate = 1)
    sample_means <- numeric(samples)
    
    # Draw repeated samples and store the mean of each one
    for (i in seq_len(samples)) {
      sample_means[i] <- mean(sample(pop_skewed, n, replace = TRUE))
    }
    
    # Create plots
    # Left panel: histogram of the skewed population
    p1 <- ggplot(data.frame(x = pop_skewed), aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)), bins = 40, 
                     fill = "#8D6E63", alpha = 0.6) +
      labs(title = "Population Distribution (Exponential)",
           x = "Value", y = "Density") +
      theme_minimal() +
      theme(
        plot.title = element_text(size = 12),
        plot.background = element_rect(fill = "#F5F1E8"),
        panel.background = element_rect(fill = "#F5F1E8")
      )
    
    # Right panel: histogram of the sample means with a fitted normal curve
    p2 <- ggplot(data.frame(x = sample_means), aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)), bins = 40,
                     fill = "#A1887F", alpha = 0.6) +
      stat_function(fun = dnorm, 
                    args = list(mean = mean(sample_means), 
                              sd = sd(sample_means)),
                    color = "#5D4037", linewidth = 0.8) +
      labs(title = paste("Sample Means (n =", n, ")"),
           x = "Sample Mean", y = "Density") +
      theme_minimal() +
      theme(
        plot.title = element_text(size = 12),
        plot.background = element_rect(fill = "#F5F1E8"),
        panel.background = element_rect(fill = "#F5F1E8")
      )
    
    # Arrange both panels in one row under a shared title
    grid.arrange(p1, p2, ncol = 2,
                 top = grid::textGrob(
                   "Central Limit Theorem Demonstration",
                   gp = grid::gpar(fontsize = 14, fontface = "bold", col = "#5D4037")
                 ))

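    Worked example (parameters as in the R calculation that follows): population mean \(\mu = 100\) cm, population standard deviation \(\sigma = 2\) cm, sample size \(n = 36\).
    1. Standard Error:
    \(SE = \frac{2}{\sqrt{36}} = 0.333\) cm
    2. Probability that the sample mean is below 99.5 cm:
    \(Z = \frac{99.5-100}{0.333} = -1.5\)
    \(P(\bar{X} < 99.5) = P(Z < -1.5) = 0.0668\) (6.68%)
    3. 95% Confidence Interval:
    \(100 \pm 1.96 \times 0.333 = [99.35, 100.65]\) cm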
    R Calculation
    mu <- 100
    sigma <- 2
    n <- 36
    
    # Calculations
    se <- sigma / sqrt(n)
    z <- (99.5 - mu) / se
    p_val <- pnorm(z)
    
    ci_lower <- mu - 1.96 * se
    ci_upper <- mu + 1.96 * se
    
    # Soft brown color
    color <- "#8B5E3C"
    
    output <- paste0(
      "<span style='color:", color, "; font-weight:600;'>Standard Error: ", round(se, 3), " cm</span><br>",
      "<span style='color:", color, "; font-weight:600;'>Z-score for 99.5 cm: ", round(z, 3), "</span><br>",
      "<span style='color:", color, "; font-weight:600;'>Probability: ", round(p_val, 4), "</span><br>",
      "<span style='color:", color, "; font-weight:600;'>95% CI: [", 
          round(ci_lower, 2), ", ", round(ci_upper, 2), "] cm</span>"
    )
    | Sample Size | Standard Error (cm) | 95% CI Width (cm) |
    |---|---|---|
    | n = 9 | 0.667 | 2.61 |
    | n = 36 | 0.333 | 1.31 |
    | n = 100 | 0.200 | 0.78 |
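The pattern in this table can be reproduced with a few lines of base R. The sketch assumes \(\sigma = 2\) cm, as in the R calculation above, and uses the 1.96 multiplier for a 95% interval.

```r
sigma <- 2                   # population standard deviation (cm), as in the example above
n_values <- c(9, 36, 100)    # sample sizes from the table

se <- sigma / sqrt(n_values)     # standard error for each sample size
ci_width <- 2 * 1.96 * se        # full width of the 95% confidence interval

se_table <- data.frame(
  n = n_values,
  standard_error = round(se, 3),
  ci_width = round(ci_width, 2)
)
print(se_table)
```

Quadrupling the sample size (from 9 to 36) halves the standard error, since the standard error shrinks with the square root of n.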
    Key Points:

    1. Importance:
    Enables inference without knowing population distribution.
    2. Sample Size:
    Larger n → more normal distribution of means.
    3. Applications:
    Confidence intervals, hypothesis testing, quality control.

    Essential Formulas:

    1. Standard Error (SE)

    \[ SE = \frac{\sigma}{\sqrt{n}} \]

    2. Z-Score

    \[ Z = \frac{\bar{X} - \mu}{SE} \]

    3. Confidence Interval (CI)

    \[ \bar{X} \pm z_{\alpha/2} \times SE \]

    Video Takeaways:
    1. CLT enables statistical inference
    2. Works for any population shape
    3. Larger samples give better approximations
    4. Foundation for confidence intervals and hypothesis tests

    Sampling Distribution of the Sample Proportion

    The sampling distribution of the sample proportion is a fundamental concept in inferential statistics that describes how sample proportions vary from sample to sample. It provides the foundation for constructing confidence intervals and conducting hypothesis tests about population proportions.

    Core Concepts of Sample Proportion Distribution

    Sample Proportion Formula:
    \[ \hat{p} = \frac{X}{n} \]
    Where:
    \(X\) = number of successes in the sample
    \(n\) = sample size
    \(\hat{p}\) = sample proportion (pronounced “p-hat”)

    Key Parameters of Sampling Distribution

    Mean of Sampling Distribution:
    \[ \mu_{\hat{p}} = E(\hat{p}) = p \]
    Explanation:
    The expected value of all possible sample proportions equals the population proportion
    This makes \(\hat{p}\) an unbiased estimator of \(p\)
    Variance and Standard Error:
    \[ \sigma^2_{\hat{p}} = \frac{p(1-p)}{n} \] \[ SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \]
    Important Notes:
    1. Standard error decreases as sample size increases
    2. Maximum variance occurs when \(p = 0.5\)
    3. Variance is smallest when \(p\) is near 0 or 1

    Normal Approximation Conditions

    Central Limit Theorem for Proportions:
    The sampling distribution of \(\hat{p}\) is approximately normal if:
    1. \(np \ge 10\)
    2. \(n(1-p) \ge 10\)
    (These are practical rules of thumb)
    When satisfied (the second parameter is the variance):
    \[ \hat{p} \sim N\left(p,\; \frac{p(1-p)}{n}\right) \]
    Z-score Calculation for Proportions:
    \[ z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \]
    Used for:
    • Finding probabilities for sample proportions
    • Constructing confidence intervals
    • Hypothesis testing about proportions

    Practical Example from Video

    Video Example: Voter Support Survey
    Scenario:
    • Population proportion: \(p = 0.60\) (60% support candidate)
    • Sample size: \(n = 100\)
    • Question: What is \(P(\hat{p} > 0.65)\)?
    Solution Steps:
    1. Check normal approximation: \(np = 60\), \(n(1-p) = 40\)
    2. Calculate standard error: \(SE = \sqrt{\frac{0.6 \times 0.4}{100}} = 0.049\)
    3. Compute z-score: \(z = \frac{0.65 - 0.60}{0.049} = 1.02\)
    4. Find probability: \(P(Z > 1.02) = 0.1539\)
    Conclusion: There’s a 15.39% chance of getting a sample with over 65% support
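The steps of this example can be reproduced in base R with the values given in the scenario. The table lookup rounds z to 1.02; using the unrounded z gives a slightly different fourth decimal place.

```r
# Values from the scenario
p <- 0.60          # population proportion
n <- 100           # sample size
threshold <- 0.65  # sample proportion of interest

# Step 1: check the normal approximation conditions
conditions_ok <- (n * p >= 10) && (n * (1 - p) >= 10)

# Step 2: standard error of p-hat
se <- sqrt(p * (1 - p) / n)

# Step 3: z-score
z <- (threshold - p) / se

# Step 4: upper-tail probability
prob_table <- pnorm(round(z, 2), lower.tail = FALSE)   # z rounded to 1.02, as in a z-table
prob_exact <- pnorm(z, lower.tail = FALSE)             # unrounded z

print(conditions_ok)
print(round(c(se = se, z = z, prob_table = prob_table, prob_exact = prob_exact), 4))
```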

    Visualizing the Sampling Distribution

    Characteristics of \(\hat{p}\) Distribution
    Shape Changes with Sample Size:
    • Small \(n\): Discrete distribution (binomial-like)
    • Large \(n\): Bell-shaped curve (normal approximation)
    • As \(n\) increases: Distribution becomes more concentrated around \(p\)
    Effect of \(p\) on Distribution:
    • When \(p = 0.5\): Symmetric distribution
    • When \(p\) near 0 or 1: Skewed distribution (needs larger \(n\) for normal approximation)

    Comparison: Mean vs Proportion Sampling

    | Aspect | Sample Mean (\(\bar{x}\)) | Sample Proportion (\(\hat{p}\)) |
    |---|---|---|
    | Population Parameter | \(\mu\) (population mean) | \(p\) (population proportion) |
    | Sample Statistic | \(\bar{x} = \frac{\sum x_i}{n}\) | \(\hat{p} = \frac{X}{n}\) |
    | Sampling Distribution Mean | \(\mu_{\bar{x}} = \mu\) | \(\mu_{\hat{p}} = p\) |
    | Standard Error Formula | \(SE = \frac{\sigma}{\sqrt{n}}\) | \(SE = \sqrt{\frac{p(1-p)}{n}}\) |
    | Normality Conditions | \(n \ge 30\) (CLT) | \(np \ge 10\) and \(n(1-p) \ge 10\) |
    | Data Type | Quantitative (continuous) | Categorical (binary: success/failure) |
    | Distribution Shape | Normal for large \(n\) | Normal when conditions met |

    R Implementation and Simulation

    📊 Simulating Sampling Distribution of \(\hat{p}\)
    ## === SAMPLING DISTRIBUTION SIMULATION RESULTS ===
    ## Population proportion (p): 0.6
    ## Sample size (n): 100
    ## Number of simulations: 10000
    ## Mean of simulated p̂: 0.6006
    ## Theoretical Standard Error: 0.049
    ## Empirical Standard Error: 0.049
    ## === NORMALITY CONDITIONS CHECK ===
    ## np = 60
    ## n(1-p) = 40
    ## ✓ Conditions met: Normal approximation is valid
    ## 
    ## === PROBABILITY CALCULATION ===
    ## P(p̂ > 0.65 ):
    ## Z-score: 1.021
    ## Probability: 0.1537
    ## Percentage: 15.37 %

    Interpretation of Results

    Key Insights from Simulation:
    1. Unbiasedness Confirmation:
    • Simulated mean of \(\hat{p}\) ≈ population \(p\) (0.60)
    • Empirical SE ≈ theoretical SE
    • Consistent with \(\hat{p}\) being an unbiased estimator of \(p\)
    2. Practical Applications:
    • Quality control: Probability of defect rate exceeding threshold
    • Election forecasting: Chance of candidate getting certain vote percentage
    • Medical studies: Probability of treatment success rate
    3. Decision Making:
    • With 15.39% probability, a sample of 100 could show >65% support
    • Important for interpreting survey/margin of error

    Real-World Applications

    Common Applications in Various Fields:
    1. Political Polling:
      • Estimating candidate support
      • Calculating margin of error
      • Determining sample size needed for desired precision
    2. Quality Control:

      • Monitoring defect rates in manufacturing
      • Setting control limits for proportion defective
      • Determining acceptable quality levels

    3. Medical Research:

      • Estimating treatment success rates
      • Comparing proportions between treatment groups
      • Calculating required sample size for clinical trials

    4. Market Research:

      • Estimating market share
      • Calculating customer satisfaction rates
      • Determining brand preference proportions

    Statistical Formulas for Practice

    Confidence Interval for Proportion:
    \[ \hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

    Margin of Error Formula:

    \[ ME = z_{\alpha/2} \times \sqrt{\frac{p(1-p)}{n}} \]

    Sample Size Determination:

    \[ n = \left( \frac{z_{\alpha/2}}{ME} \right)^2 \, p(1-p) \]

    Note: When \(p\) is unknown, use \(p = 0.5\), which maximizes \(p(1-p)\) and therefore gives the most conservative (largest) required sample size
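As an illustration of the sample size formula, the sketch below wraps it in a small base R function. The function name `sample_size_prop`, its defaults, and the 3-percentage-point margin of error are hypothetical choices for this example.

```r
# Hypothetical helper (name and defaults chosen for this sketch)
sample_size_prop <- function(me, conf_level = 0.95, p = 0.5) {
  z <- qnorm(1 - (1 - conf_level) / 2)   # z_{alpha/2}
  ceiling((z / me)^2 * p * (1 - p))      # round up to a whole observation
}

# p unknown: use the conservative value p = 0.5
n_conservative <- sample_size_prop(me = 0.03)

# With an assumed prior guess of p = 0.2, fewer observations are needed
n_with_guess <- sample_size_prop(me = 0.03, p = 0.2)

print(c(n_conservative = n_conservative, n_with_guess = n_with_guess))
```

Rounding up with `ceiling()` keeps the achieved margin of error at or below the target.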

    Important Considerations

    Critical Points to Remember:
    1. Independence Assumption: Samples must be independent (random sampling)
    2. 10% Condition: Sample should be ≤ 10% of population when sampling without replacement
    3. Success-Failure Condition: Must check \(np \ge 10\) and \(n(1-p) \ge 10\) for normal approximation
    4. Continuity Correction: For small samples, consider adding ±0.5/n to improve normal approximation
    When Normal Approximation Fails:
    Alternative Methods:
    1. Exact Binomial Test: Use when \(n\) is small or \(p\) is extreme
    2. Wilson Score Interval: Better for small samples or extreme proportions
    3. Clopper-Pearson Interval: Conservative exact method
    4. Simulation/Bootstrap: Resampling methods for any sample size
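Two of the alternatives listed above are available in base R's `stats` package: `binom.test()` reports the Clopper-Pearson interval, and `prop.test()` with `correct = FALSE` reports the Wilson score interval. The sketch below compares them with the simple normal-approximation (Wald) interval; the data (3 successes in 20 trials) are an assumption chosen so that the success-failure condition fails.

```r
# Assumed example: 3 successes in 20 trials (n * p-hat = 3 < 10)
x <- 3
n <- 20
phat <- x / n
z <- qnorm(0.975)

# Normal-approximation (Wald) interval: can fall outside [0, 1]
wald <- phat + c(-1, 1) * z * sqrt(phat * (1 - phat) / n)

# Exact (Clopper-Pearson) interval
clopper_pearson <- as.numeric(binom.test(x, n)$conf.int)

# Wilson score interval (no continuity correction)
wilson <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

print(rbind(wald = wald, clopper_pearson = clopper_pearson, wilson = wilson))
```

In this example the Wald lower limit is negative, which is not a possible proportion, while the other two intervals stay inside [0, 1].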

    Reference Books and Resources

    Recommended Textbooks:
    1. “Statistics for Business and Economics” by Anderson, Sweeney, and Williams
      • Chapter 7: Sampling and Sampling Distributions
      • Excellent practical examples with business applications
    2. “Introductory Statistics” by Prem S. Mann

      • Chapter 7: Sampling Distributions
      • Clear explanations with step-by-step solutions

    3. “The Practice of Statistics” by Starnes, Tabor, Yates, and Moore

      • Chapter 7: Sampling Distributions
      • Modern approach with emphasis on data analysis

    4. “Statistical Inference” by George Casella and Roger L. Berger

      • Chapter 5: Properties of a Random Sample
      • Theoretical foundation for sampling distributions

    5. “OpenIntro Statistics” by Diez, Barr, and Çetinkaya-Rundel

      • Free online textbook
      • Chapter 4: Foundations for Inference
      • Includes R examples and practice problems

    Binomial Distribution Review

      Introduction to Binomial Distribution

      The binomial distribution is one of the most fundamental discrete probability distributions in statistics. It models the number of successes in a fixed number of independent Bernoulli trials, where each trial has only two possible outcomes: success or failure.

      Binomial Distribution Formulas

      Binomial Probability Mass Function:
      \[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]

      where: \(n\) = number of trials
      \(k\) = number of successes
      \(p\) = probability of success on each trial
      \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\) = binomial coefficient

      \[ \text{Mean of Binomial Distribution:} \quad \mu = E(X) = n p \]

      Brief explanation:
      \(\mu\) = mean of the distribution
      \(E(X)\) = expected value of the random variable \(X\)
      \(n\) = number of trials
      \(p\) = probability of success on each trial

      Variance of Binomial Distribution: \[ \sigma^2 = \operatorname{Var}(X) = np(1-p) \]

      Standard Deviation: \[ \sigma = \sqrt{np(1-p)} \]

      Conditions for Binomial Distribution:
      Fixed number of trials \((n)\)
      Independent trials
      Two possible outcomes (success/failure)
      Constant probability of success \((p)\)

      Normal Approximation to the Binomial:

      Normal approximation is appropriate if:
      \(np \ge 10\)
      \(n(1-p) \ge 10\)

      \[ \text{Normal Approximation:} \quad X \sim N \big( n p, \, n p (1 - p) \big) \]

      Brief explanation:
      \(X\) = binomial random variable
      \(N(\mu,\sigma^2)\) = normal distribution with mean \(\mu\) and variance \(\sigma^2\)
      \(\mu = np\) = mean of the binomial distribution
      \(\sigma^2 = np(1-p)\) = variance of the binomial distribution
      Used when \(n\) is large and \(p\) is not too close to 0 or 1
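To see how well the normal approximation tracks the binomial distribution, the sketch below compares an exact binomial probability with its normal approximation in base R. The values n = 100, p = 0.3, and the cutoff k = 35 are assumptions for illustration; the half-unit continuity correction is a standard refinement that is not part of the formula above.

```r
# Assumed example: X ~ Binomial(100, 0.3), probability that X <= 35
n <- 100
p <- 0.3
k <- 35

conditions_ok <- (n * p >= 10) && (n * (1 - p) >= 10)

mu <- n * p                      # mean of the binomial distribution
sigma <- sqrt(n * p * (1 - p))   # standard deviation of the binomial distribution

exact <- pbinom(k, size = n, prob = p)             # exact P(X <= 35)
approx_plain <- pnorm(k, mean = mu, sd = sigma)    # normal approximation
approx_cc <- pnorm(k + 0.5, mean = mu, sd = sigma) # with continuity correction

print(conditions_ok)
print(round(c(exact = exact, approx_plain = approx_plain, approx_cc = approx_cc), 4))
```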

      Example Problem: Binomial Distribution

      Problem: A multiple-choice test has 20 questions, each with 5 choices. A student guesses randomly on all questions. What is the probability that the student gets exactly 5 questions correct?

      Solution Step by Step:

      Step 1: Identify parameters
      \(n = 20\) (number of questions)
      \(p = \frac{1}{5} = 0.2\) (probability of guessing correctly)
      \(k = 5\) (number of correct answers wanted)

      Step 2: Calculate binomial coefficient
      \(\binom{20}{5} = \frac{20!}{5!15!} = \frac{20 \times 19 \times 18 \times 17 \times 16}{5 \times 4 \times 3 \times 2 \times 1} = 15504\)

      Step 3: Apply binomial formula
      \(P(X = 5) = \binom{20}{5} (0.2)^5 (0.8)^{15}\)

      Step 4: Calculate each component
      \((0.2)^5 = 0.00032\)
      \((0.8)^{15} = 0.03518\)

      Step 5: Multiply all components
      \(P(X = 5) = 15504 \times 0.00032 \times 0.03518 = 15504 \times 0.0000112576 = 0.1746\)


      Answer: The probability that the student gets exactly 5 questions correct is 17.46%.
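The hand calculation can be verified in base R, either by following the same steps with `choose()` or directly with `dbinom()`. The parameters are those of the problem statement.

```r
# Parameters from the problem statement
n <- 20      # number of questions
p <- 1 / 5   # probability of guessing a question correctly
k <- 5       # number of correct answers of interest

# Step-by-step, mirroring the manual solution
coef <- choose(n, k)                        # binomial coefficient
prob_manual <- coef * p^k * (1 - p)^(n - k) # binomial formula

# Direct calculation with the built-in PMF
prob_dbinom <- dbinom(k, size = n, prob = p)

print(coef)
print(round(c(prob_manual = prob_manual, prob_dbinom = prob_dbinom), 4))
```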

      Book Reference

      Ross, S.M. (2014). A First Course in Probability. Pearson.
      Chapter 4: Discrete Random Variables
      Pages: 125-160
      Relevance: Excellent coverage of binomial distribution with numerous examples and applications.

      Summary and Conclusion

      Key Takeaways from Week 11:

      1. Continuous Probability Distributions:
      Understanding PDF and CDF is fundamental for working with continuous variables

      2. Sampling Distributions:
      The distribution of sample statistics forms the basis for statistical inference

      3. Central Limit Theorem:
      One of the most powerful results in statistics, enabling normal approximations

      4. Sampling Distribution of Proportions:
      Essential for categorical data analysis and proportion inference

      5. Binomial Distribution:
      Foundation for understanding proportions and binary outcomes

    ---
title: "Probability Distribution" 
subtitle: "Week 11 Assignment"
author: "Adinda Adelia Futri"
date: "`r format(Sys.Date(), '%B %d, %Y')`" 
output:
  html_document:
    theme: cerulean
    highlight: textmate
    toc: false
    toc_float: false
    number_sections: true
    code_folding: show
    code_download: yes
---

<style>
/* ========== RESET AND GLOBAL STYLES ========== */
* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}

body {
  font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
  line-height: 1.6;
  color: #5D4037;
  background-color: #F5F1E8;
  max-width: 800px;
  margin: 0 auto;
  padding: 15px;
  position: relative;
}

/* ========== TABLE OF CONTENTS ========== */
#toc-container {
  position: fixed;
  left: 20px;
  top: 50%;
  transform: translateY(-50%);
  width: 220px;
  background: linear-gradient(135deg, #8D6E63 0%, #A1887F 100%);
  border-radius: 10px;
  padding: 15px;
  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15);
  z-index: 1000;
  display: none;
}

#toc-container.show {
  display: block;
}

#toc-title {
  color: #FFF8E1;
  font-size: 1.1em;
  margin-bottom: 10px;
  text-align: center;
  border-bottom: 2px solid rgba(255, 255, 255, 0.2);
  padding-bottom: 8px;
}

#toc-list {
  list-style-type: none;
}

#toc-list li {
  margin: 6px 0;
}

#toc-list a {
  color: #FFF8E1;
  text-decoration: none;
  font-size: 0.9em;
  display: block;
  padding: 5px 8px;
  border-radius: 4px;
  transition: all 0.3s ease;
  position: relative;
}

#toc-list a:hover {
  background-color: rgba(255, 255, 255, 0.15);
  transform: translateX(5px);
}

#toc-list a:hover::after {
  content: attr(data-tooltip);
  position: absolute;
  left: 100%;
  top: 50%;
  transform: translateY(-50%);
  background-color: #5D4037;
  color: #FFF8E1;
  padding: 6px 10px;
  border-radius: 4px;
  font-size: 0.8em;
  white-space: nowrap;
  margin-left: 10px;
  z-index: 1001;
}

.toc-toggle {
  position: fixed;
  left: 20px;
  top: 20px;
  background: linear-gradient(135deg, #8D6E63 0%, #A1887F 100%);
  color: #FFF8E1;
  border: none;
  border-radius: 50%;
  width: 40px;
  height: 40px;
  cursor: pointer;
  z-index: 999;
  display: flex;
  align-items: center;
  justify-content: center;
  box-shadow: 0 3px 8px rgba(0, 0, 0, 0.2);
  transition: all 0.3s ease;
}

.toc-toggle:hover {
  transform: scale(1.1);
  box-shadow: 0 5px 15px rgba(0, 0, 0, 0.3);
}

.toc-toggle:hover::after {
  content: "Show Table of Contents";
  position: absolute;
  left: 100%;
  top: 50%;
  transform: translateY(-50%);
  background-color: #5D4037;
  color: #FFF8E1;
  padding: 6px 10px;
  border-radius: 4px;
  font-size: 0.8em;
  white-space: nowrap;
  margin-left: 10px;
  z-index: 1001;
}

/* ========== HEADER SECTION ========== */
.header-section {
  text-align: center;
  margin-bottom: 20px;
  padding: 15px;
  background: linear-gradient(135deg, #8D6E63 0%, #A1887F 100%);
  color: #FFF8E1;
  border-radius: 8px;
  box-shadow: 0 3px 5px rgba(0, 0, 0, 0.1);
  position: relative;
  overflow: hidden;
  transition: transform 0.3s ease;
  cursor: pointer;
}

.header-section:hover {
  transform: translateY(-3px);
}

.header-section h1 {
  font-size: 2em;
  margin-bottom: 8px;
  font-weight: 700;
  position: relative;
}

.header-section p {
  font-size: 1em;
  margin: 4px 0;
}

/* Tooltip for the header section */
.header-section:hover::after {
  content: "Click to copy student info";
  position: absolute;
  bottom: -30px;
  left: 50%;
  transform: translateX(-50%);
  background-color: #5D4037;
  color: #FFF8E1;
  padding: 5px 10px;
  border-radius: 4px;
  font-size: 0.8em;
  white-space: nowrap;
  z-index: 1000;
  opacity: 0.9;
}

/* ========== VIDEO CONTAINER ========== */
.video-container {
  text-align: center;
  margin: 20px 0;
  padding: 15px;
  background: linear-gradient(to right, #8D6E63, #A1887F);
  border-radius: 8px;
  transition: all 0.3s ease;
  cursor: pointer;
  position: relative;
}

.video-container:hover {
  transform: translateY(-3px);
  box-shadow: 0 5px 15px rgba(0, 0, 0, 0.1);
}

.video-link {
  color: #FFF8E1;
  text-decoration: none;
  font-weight: bold;
  font-size: 1em;
  display: inline-flex;
  align-items: center;
  gap: 8px;
  padding: 10px 20px;
  background-color: rgba(255, 255, 255, 0.1);
  border-radius: 5px;
  transition: background-color 0.3s;
}

.video-link:hover {
  background-color: rgba(255, 255, 255, 0.2);
  text-decoration: none;
}

.video-link i {
  font-size: 1.2em;
}

/* Tooltip for the video container */
.video-container:hover::after {
  content: "Click to watch video about probability distributions";
  position: absolute;
  bottom: -35px;
  left: 50%;
  transform: translateX(-50%);
  background-color: #5D4037;
  color: #FFF8E1;
  padding: 5px 10px;
  border-radius: 4px;
  font-size: 0.8em;
  white-space: nowrap;
  z-index: 1000;
  opacity: 0.9;
}

/* ========== PHOTO CONTAINER ========== */
.photo-container {
  text-align: center;
  margin: 20px 0;
  position: relative;
  cursor: pointer;
}

.student-photo {
  width: 150px;
  height: 150px;
  border-radius: 50%;
  object-fit: cover;
  border: 4px solid #D7CCC8;
  box-shadow: 0 3px 10px rgba(0, 0, 0, 0.2);
  transition: all 0.3s ease;
}

.student-photo:hover {
  transform: scale(1.05);
  border-color: #8D6E63;
}

/* Tooltip for the photo */
.photo-container:hover::after {
  content: "Student Photo - Adinda Adelia Futri";
  position: absolute;
  bottom: -30px;
  left: 50%;
  transform: translateX(-50%);
  background-color: #5D4037;
  color: #FFF8E1;
  padding: 5px 10px;
  border-radius: 4px;
  font-size: 0.8em;
  white-space: nowrap;
  z-index: 1000;
  opacity: 0.9;
}

/* ========== TYPOGRAPHY ========== */
h1, h2, h3, h4 {
  color: #4E342E;
  margin-top: 20px;
  margin-bottom: 10px;
  text-align: center;
  cursor: help;
  position: relative;
  transition: color 0.3s ease;
}

h1 {
  font-size: 1.8em;
  border-bottom: 2px solid #A1887F;
  padding-bottom: 8px;
  margin-bottom: 20px;
}

h1:hover {
  color: #8D6E63;
}

h2 {
  font-size: 1.5em;
  padding-bottom: 6px;
}

h3 {
  font-size: 1.2em;
}

/* Tooltip for headings */
h1:hover::after,
h2:hover::after,
h3:hover::after,
h4:hover::after {
  content: attr(title);
  position: absolute;
  bottom: 100%;
  left: 50%;
  transform: translateX(-50%);
  background-color: #5D4037;
  color: #FFF8E1;
  padding: 5px 10px;
  border-radius: 3px;
  font-size: 0.75em;
  white-space: nowrap;
  z-index: 1000;
  opacity: 0.9;
}

/* ========== UNIFORM BOX STYLES ========== */
.concept-intro,
.formula-box,
.example-box,
.reference-box,
.step-box,
.distribution-card,
.random-variable-card {
  background-color: #EFEBE9;
  border: 1px solid #D7CCC8;
  border-radius: 6px;
  padding: 15px;
  margin: 15px 0;
  transition: all 0.3s ease;
  position: relative;
  cursor: help;
}

/* Tooltip for all boxes */
.concept-intro:hover::after,
.formula-box:hover::after,
.example-box:hover::after,
.reference-box:hover::after,
.step-box:hover::after,
.distribution-card:hover::after,
.random-variable-card:hover::after {
  content: attr(data-tooltip);
  position: absolute;
  top: -10px;
  left: 50%;
  transform: translateX(-50%);
  background-color: #5D4037;
  color: #FFF8E1;
  padding: 5px 10px;
  border-radius: 3px;
  font-size: 0.75em;
  white-space: nowrap;
  z-index: 1000;
  opacity: 0.9;
}

.concept-intro:hover,
.formula-box:hover,
.example-box:hover,
.reference-box:hover,
.step-box:hover,
.distribution-card:hover,
.random-variable-card:hover {
  transform: translateY(-3px);
  box-shadow: 0 5px 15px rgba(0, 0, 0, 0.05);
  border-color: #8D6E63;
}

.concept-intro {
  border-left: 3px solid #8D6E63;
}

.formula-box {
  background-color: #F5EFE6;
  text-align: center;
}

.example-box {
  background-color: #F5EDE3;
}

.step-box {
  border-left: 2px solid #8D6E63;
  margin: 8px 0;
}

.reference-box {
  background-color: #EFEBE9;
  border: 1px solid #BCAAA4;
}

/* ========== DISTRIBUTION CARDS ========== */
.distribution-types {
  display: flex;
  gap: 15px;
  margin: 15px 0;
}

.distribution-card {
  flex: 1;
  min-height: 200px;
}

.distribution-card h3 {
  color: #5D4037;
  text-align: center;
  margin-bottom: 10px;
}

.distribution-points {
  list-style-type: none;
  padding-left: 10px;
}

.distribution-points li {
  margin: 5px 0;
  padding-left: 15px;
  position: relative;
}

.distribution-points li:before {
  content: "•";
  color: #8D6E63;
  position: absolute;
  left: 0;
}

/* ========== RANDOM VARIABLE CARDS ========== */
.random-variable-grid {
  display: grid;
  grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
  gap: 15px;
  margin: 15px 0;
}

.random-variable-card {
  min-height: 250px;
}

.random-variable-card h3 {
  color: #5D4037;
  text-align: center;
  margin-bottom: 10px;
  display: flex;
  align-items: center;
  justify-content: center;
  gap: 8px;
}

.card-points {
  list-style-type: none;
  margin-bottom: 10px;
}

.card-points li {
  margin: 5px 0;
  padding-left: 15px;
  position: relative;
}

.card-points li:before {
  content: "›";
  color: #8D6E63;
  position: absolute;
  left: 0;
}

/* ========== STUDENT INFO BADGES ========== */
.info-badges {
  display: flex;
  justify-content: center;
  flex-wrap: wrap;
  gap: 10px;
  margin: 20px 0;
}

.badge {
  background: linear-gradient(135deg, #8D6E63 0%, #A1887F 100%);
  color: #FFF8E1;
  padding: 8px 15px;
  border-radius: 20px;
  font-size: 0.9em;
  display: flex;
  align-items: center;
  gap: 8px;
  cursor: help;
  position: relative;
  transition: transform 0.3s ease;
}

.badge:hover {
  transform: translateY(-3px);
}

/* Tooltip for badges */
.badge:hover::after {
  content: attr(data-tooltip);
  position: absolute;
  bottom: 100%;
  left: 50%;
  transform: translateX(-50%);
  background-color: #5D4037;
  color: #FFF8E1;
  padding: 5px 10px;
  border-radius: 3px;
  font-size: 0.75em;
  white-space: nowrap;
  z-index: 1000;
  opacity: 0.9;
  margin-bottom: 5px;
}

/* ========== TABLES ========== */
table {
  margin: 15px 0;
  box-shadow: 0 1px 2px rgba(0,0,0,0.1);
  border-radius: 6px;
  overflow: hidden;
  cursor: help;
  position: relative;
}

table:hover::after {
  content: "Comparison table between discrete and continuous variables";
  position: absolute;
  top: -30px;
  left: 50%;
  transform: translateX(-50%);
  background-color: #5D4037;
  color: #FFF8E1;
  padding: 5px 10px;
  border-radius: 3px;
  font-size: 0.75em;
  white-space: nowrap;
  z-index: 1000;
  opacity: 0.9;
}

th, td {
  padding: 8px 10px;
  text-align: left;
}

th {
  background-color: #8D6E63;
  color: #FFF8E1;
  font-weight: bold;
}

tr:nth-child(even) {
  background-color: #EFEBE9;
}

tr:hover {
  background-color: #F5EDE3;
}

/* ========== FOOTER SECTION ========== */
.footer-section {
  margin-top: 30px;
  padding: 15px;
  background-color: #5D4037;
  color: #FFF8E1;
  border-radius: 8px;
  text-align: center;
  cursor: help;
  position: relative;
}

.footer-section:hover::after {
  content: "End of document - Created for Week 11 Assignment";
  position: absolute;
  top: -30px;
  left: 50%;
  transform: translateX(-50%);
  background-color: #5D4037;
  color: #FFF8E1;
  padding: 5px 10px;
  border-radius: 3px;
  font-size: 0.75em;
  white-space: nowrap;
  z-index: 1000;
  opacity: 0.9;
}

/* ========== CODE BLOCKS ========== */
pre {
  padding: 15px;
  margin: 15px 0;
  font-size: 0.9em;
  background-color: #3E2723;
  color: #D7CCC8;
  border-radius: 6px;
  border-left: 3px solid #8D6E63;
  cursor: help;
  position: relative;
}

pre:hover::after {
  content: "R code chunk for visualization";
  position: absolute;
  top: -30px;
  left: 50%;
  transform: translateX(-50%);
  background-color: #5D4037;
  color: #FFF8E1;
  padding: 5px 10px;
  border-radius: 3px;
  font-size: 0.75em;
  white-space: nowrap;
  z-index: 1000;
  opacity: 0.9;
}

/* ========== MATH CONTAINER ========== */
.math-container {
  margin: 15px 0;
  padding: 12px;
  background-color: #F5EFE6;
  border-radius: 6px;
  border: 1px solid #D7CCC8;
  text-align: center;
}

/* ========== RESPONSIVE DESIGN ========== */
@media (max-width: 1100px) {
  #toc-container {
    display: none !important;
  }
  
  .toc-toggle {
    display: none !important;
  }
}

/* ========== TOC SCROLLBAR ========== */
#toc-list {
  max-height: 400px;
  overflow-y: auto;
}

#toc-list::-webkit-scrollbar {
  width: 6px;
}

#toc-list::-webkit-scrollbar-track {
  background: #f1f1f1;
}

#toc-list::-webkit-scrollbar-thumb {
  background: #888;
  border-radius: 3px;
}

#toc-list::-webkit-scrollbar-thumb:hover {
  background: #555;
}

@media (max-width: 768px) {
  body {
    padding: 10px;
  }
  
  .header-section h1 {
    font-size: 1.6em;
  }
  
  h1 {
    font-size: 1.5em;
  }
  
  h2 {
    font-size: 1.3em;
  }
  
  .concept-intro,
  .formula-box,
  .example-box,
  .reference-box,
  .distribution-card,
  .random-variable-card {
    padding: 12px;
    margin: 10px 0;
  }
  
  .distribution-types {
    flex-direction: column;
  }
  
  .random-variable-grid {
    grid-template-columns: 1fr;
  }
  
  .info-badges {
    flex-direction: column;
    align-items: center;
  }
  
  .badge {
    width: 100%;
    max-width: 300px;
    justify-content: center;
  }
}

/* ========== ANIMATIONS ========== */
@keyframes fadeIn {
  from {
    opacity: 0;
    transform: translateY(20px);
  }
  to {
    opacity: 1;
    transform: translateY(0);
  }
}

.concept-intro,
.formula-box,
.example-box,
.reference-box,
.distribution-card,
.random-variable-card {
  animation: fadeIn 0.6s ease-out;
}
</style>

<script>
// JavaScript for the table of contents (TOC)
document.addEventListener('DOMContentLoaded', function() {
  // Create the TOC toggle button
  const tocToggle = document.createElement('button');
  tocToggle.className = 'toc-toggle';
  tocToggle.innerHTML = '📖';
  document.body.appendChild(tocToggle);
  
  // Create the TOC container
  const tocContainer = document.createElement('div');
  tocContainer.id = 'toc-container';
  
  const tocTitle = document.createElement('h3');
  tocTitle.id = 'toc-title';
  tocTitle.textContent = '📚 Table of Contents';
  
  const tocList = document.createElement('ul');
  tocList.id = 'toc-list';
  
  // Collect all headings
  const headings = document.querySelectorAll('h1, h2');
  headings.forEach((heading, index) => {
    if (heading.id === '' || !heading.id) {
      heading.id = 'section-' + index;
    }
    
    const listItem = document.createElement('li');
    const link = document.createElement('a');
    link.href = '#' + heading.id;
    link.textContent = heading.textContent;
    
    // Add a tooltip
    if (heading.title) {
      link.setAttribute('data-tooltip', heading.title);
    }
    
    link.addEventListener('click', (e) => {
      e.preventDefault();
      document.getElementById(heading.id).scrollIntoView({
        behavior: 'smooth'
      });
      tocContainer.classList.remove('show');
    });
    
    listItem.appendChild(link);
    tocList.appendChild(listItem);
  });
  
  tocContainer.appendChild(tocTitle);
  tocContainer.appendChild(tocList);
  document.body.appendChild(tocContainer);
  
  // Toggle TOC
  tocToggle.addEventListener('click', () => {
    tocContainer.classList.toggle('show');
  });
  
  // Close the TOC when clicking outside of it
  document.addEventListener('click', (e) => {
    if (!tocContainer.contains(e.target) && !tocToggle.contains(e.target)) {
      tocContainer.classList.remove('show');
    }
  });
  
  // Copy student info when the header is clicked
  const headerSection = document.querySelector('.header-section');
  const headerTitle = headerSection ? headerSection.querySelector('h1') : null;
  if (headerSection && headerTitle && navigator.clipboard) {
    // Store the title once so repeated clicks never save the "Copied" message
    const originalTitle = headerTitle.textContent;
    headerSection.addEventListener('click', () => {
      const studentInfo = `Name: Adinda Adelia Futri\nNIM: 52250055\nAssignment: Week 11 Probability Distribution`;
      navigator.clipboard.writeText(studentInfo)
        .then(() => {
          headerTitle.textContent = '✓ Copied to clipboard!';
          setTimeout(() => {
            headerTitle.textContent = originalTitle;
          }, 2000);
        })
        .catch((err) => console.error('Could not copy student info:', err));
    });
  }
});
</script>

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  warning = FALSE,
  message = FALSE,
  fig.align = "center",
  fig.width = 7,
  fig.height = 4,
  out.width = "90%"
)

# Load required libraries
library(ggplot2)
library(dplyr)
library(gridExtra)
library(knitr)
library(kableExtra)
library(scales)
library(rmarkdown)
library(prettydoc)
```

<div class="header-section" title="Student Information Header"> <h1>Week 11 Assignment ~ Probability Distribution</h1> <p><strong>Name: Adinda Adelia Futri</strong></p> <p><strong>NIM: 52250055</strong></p> </div><div class="photo-container"> <img src="https://raw.githubusercontent.com/adindaadeliafutri6-gif/adinduy/main/akudinda.jpeg" alt="Photo of Adinda Adelia Futri" class="student-photo"> </div><h2 title="Student Profile Information">Student Profile</h2><div class="info-badges"> <div class="badge" data-tooltip="Institut Teknologi Sains Bandung"> <i>🎓</i> <span>Student Majoring in Data Science at ITSB</span> </div> <div class="badge" data-tooltip="Data Science &amp; Analytics"> <i>📊</i> <span>Data Science</span> </div> <div class="badge" data-tooltip="Statistical Analysis"> <i>📈</i> <span>Statistics</span> </div> <div class="badge" data-tooltip="R Programming Language"> <i>💻</i> <span>R Programming</span> </div> </div><h1 title="Introduction to Continuous Probability Distributions">Introduction to Probability Distributions of Continuous Variables</h1><!-- VIDEO LINK BELOW THE FIRST TOPIC HEADING --><div class="video-container" data-tooltip="Video introduction to probability distributions"> <a href="https://youtu.be/ZyUzRVa6hCM?si=D0p75cuybvouJE-4" class="video-link" target="_blank"> <i>▶️</i> Watch: Introduction to Probability Distributions </a> </div><div class="concept-intro" data-tooltip="Fundamental concepts about continuous variables"> Before exploring specific continuous probability distributions, it is important to understand the fundamental differences between discrete and continuous variables. Unlike discrete variables, which take specific, countable values, continuous variables can assume any value within a given range or interval. This distinction changes how we calculate and interpret probabilities. For a continuous variable, the probability of any single exact value is zero, so we focus on the probability that the variable falls within a certain range. 
</div><h1 title="Comparison of Discrete and Continuous Distributions">Comparison of Discrete and Continuous Distributions</h1><div class="distribution-types"> <div class="distribution-card" data-tooltip="Characteristics of discrete distributions"> <h3>Discrete Distribution</h3> <ul class="distribution-points"> <li>Variables can only take countable, specific values</li> <li>Used for countable data (number of events)</li> <li>Probability calculated at specific points</li> <li>Uses Probability Mass Function (PMF)</li> <li>Examples: number of heads in coin tosses, defect counts</li> </ul> </div> <div class="distribution-card" data-tooltip="Characteristics of continuous distributions"> <h3>Continuous Distribution</h3> <ul class="distribution-points"> <li>Variables can take any value within an interval</li> <li>Used for measured data (measurements)</li> <li>Probability calculated as area under the curve</li> <li>Uses Probability Density Function (PDF)</li> <li>Examples: height, time, temperature, weight</li> </ul> </div> </div><h2 title="Probability Density Function">Probability Density Function (PDF)</h2><div class="formula-box" data-tooltip="Mathematical definition of PDF"> <strong>Probability Density Function (PDF):</strong><br> The Probability Density Function describes the relative likelihood that a continuous random variable takes a value near a given point. The value $f(x)$ is a density, not a probability.<br><br> <strong>Mathematical Definition:</strong> $$ f(x)= \frac{d}{dx} F(x) $$<br> where: $f(x)$ is the probability density function<br> $F(x)$ is the cumulative distribution function<br> $x$ is the value of the continuous random variable<br><br> <strong>Properties of PDF:</strong><br> $f(x) \geq 0$ for all $x$ (Non-negativity)<br> $\int_{-\infty}^{\infty} f(x) dx = 1$ (Normalization)<br> $P(a \leq X \leq b) = \int_{a}^{b} f(x) dx$ (Area under curve)<br><br> <strong>Example (normal PDF with mean $\mu$ and standard deviation $\sigma$):</strong><br> $f(x)=\frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$ </div><h2 title="Random Variables">Introduction to Random Variables and 
Probability Distributions</h2><div class="concept-intro" data-tooltip="Basic concepts of random variables"> <strong>What are Random Variables?</strong><br> Random variables are functions that assign numerical values to outcomes of random experiments. They provide a mathematical framework for quantifying uncertainty and randomness in statistical analysis. </div><div class="random-variable-grid"> <div class="random-variable-card" data-tooltip="Discrete random variables definition and properties"> <h3><i>🔢</i> Discrete Random Variables</h3> <div class="card-content"> <ul class="card-points"> <li><strong>Definition:</strong> Variables with countable outcomes (finite or countably infinite)</li> <li><strong>Examples:</strong> Number of heads in coin tosses, dice roll results</li> <li><strong>Probability Function:</strong> Probability Mass Function (PMF)</li> <li><strong>Key Properties:</strong> Sum of probabilities equals 1</li> <li><strong>R Functions:</strong> dbinom(), dpois(), dgeom()</li> </ul> <div class="formula-box"> <strong>PMF Formula:</strong> $$ P(X = x_i) = p_i $$ <strong>Properties:</strong> $$ \sum_{i=1}^{\infty} p_i = 1 $$ </div> </div> </div> <div class="random-variable-card" data-tooltip="Continuous random variables definition and properties"> <h3><i>📈</i> Continuous Random Variables</h3> <div class="card-content"> <ul class="card-points"> <li><strong>Definition:</strong> Variables that can take any value in an interval (uncountably many outcomes)</li> <li><strong>Examples:</strong> Height, weight, temperature, time</li> <li><strong>Probability Function:</strong> Probability Density Function (PDF)</li> <li><strong>Key Properties:</strong> Area under curve equals 1</li> <li><strong>R Functions:</strong> dnorm(), dexp(), dunif()</li> </ul> <div class="formula-box"> <strong>PDF Formula:</strong> $$ f(x) \geq 0 $$ <strong>Properties:</strong> $$ \int_{-\infty}^{\infty} f(x) dx = 1 $$ </div> </div> </div> <div class="random-variable-card" data-tooltip="Cumulative distribution function concepts"> <h3><i>📊</i> 
Cumulative Distribution Function</h3> <div class="card-content"> <ul class="card-points"> <li><strong>Definition:</strong> Probability that X ≤ x</li> <li><strong>Universal:</strong> Works for both discrete and continuous</li> <li><strong>Properties:</strong> Non-decreasing, right-continuous</li> <li><strong>Range:</strong> 0 ≤ F(x) ≤ 1</li> <li><strong>R Functions:</strong> pbinom(), pnorm(), ppois()</li> </ul> <div class="formula-box"> <strong>CDF Formula:</strong> $$ F(x) = P(X \leq x) $$ <strong>Discrete:</strong> $$ F(x) = \sum_{x_i \leq x} P(X = x_i) $$ <strong>Continuous:</strong> $$ F(x) = \int_{-\infty}^{x} f(t) dt $$ </div> </div> </div> <div class="random-variable-card" data-tooltip="Expectation and variance concepts"> <h3><i>🎯</i> Expectation and Variance</h3> <div class="card-content"> <ul class="card-points"> <li><strong>Expected Value (μ):</strong> Average/mean value</li> <li><strong>Variance (σ²):</strong> Measure of spread/dispersion</li> <li><strong>Standard Deviation:</strong> Square root of variance</li> <li><strong>Linearity:</strong> E[aX + b] = aE[X] + b</li> <li><strong>R Calculations:</strong> mean(), var(), sd()</li> </ul> <div class="formula-box"> <strong>Expectation:</strong> $$ E[X] = \sum x_i p_i \quad (\text{discrete}) $$ $$ E[X] = \int x f(x) dx \quad (\text{continuous}) $$ <strong>Variance:</strong> $$ Var(X) = E[(X - μ)^2] = E[X^2] - (E[X])^2 $$ </div> </div> </div> </div><h3 class="random-variable-subtitle" title="Detailed Comparison">Detailed Comparison: Discrete vs Continuous</h3><table class="comparison-table"> <thead> <tr> <th>Characteristic</th> <th>Discrete Random Variable</th> <th>Continuous Random Variable</th> </tr> </thead> <tbody> <tr> <td><strong>Possible Values</strong></td> <td>Countable, finite or countably infinite</td> <td>Uncountable, infinite within interval</td> </tr> <tr> <td><strong>Probability at Point</strong></td> <td>P(X = x) can be positive</td> <td>P(X = x) = 0 for all x</td> </tr> <tr> 
<td><strong>Probability Function</strong></td> <td>Probability Mass Function (PMF)</td> <td>Probability Density Function (PDF)</td> </tr> <tr> <td><strong>Total Probability</strong></td> <td>∑ P(X = x) = 1</td> <td>∫ f(x) dx = 1</td> </tr> <tr> <td><strong>Cumulative Distribution</strong></td> <td>F(x) = ∑ P(X = xᵢ) over all xᵢ ≤ x</td> <td>F(x) = ∫ f(t) dt from -∞ to x</td> </tr> <tr> <td><strong>Example Distributions</strong></td> <td>Binomial, Poisson, Geometric</td> <td>Normal, Exponential, Uniform</td> </tr> <tr> <td><strong>Real-world Examples</strong></td> <td>Number of customers, defect counts</td> <td>Height, temperature, waiting time</td> </tr> </tbody> </table><h3 title="Cumulative Distribution Function">Cumulative Distribution Function (CDF)</h3><div class="formula-box" data-tooltip="Mathematical definition of CDF"> <strong>Cumulative Distribution Function (CDF):</strong><br> The CDF gives the probability that a random variable $X$ takes a value less than or equal to $x$.<br><br> <strong>Mathematical Definition (continuous case):</strong> $$ F(x)=P(X \leq x)=\int_{-\infty}^{x} f(t)dt $$<br> <strong>Properties of CDF:</strong><br> $F(x)$ is non-decreasing<br> $\lim_{x \to -\infty} F(x) = 0$<br> $\lim_{x \to \infty} F(x) = 1$<br> $P(a < X \leq b) = F(b) - F(a)$ </div><h2 title="Example Problem">Example Problem: Continuous Distribution</h2><div class="example-box" data-tooltip="Step-by-step solution for normal distribution problem"> <strong>Problem:</strong> The time required to complete a standardized test follows a normal distribution with mean 120 minutes and standard deviation 15 minutes. What is the probability that a randomly selected student will complete the test in less than 100 minutes?<br><br>
<strong>Solution Step by Step:</strong>

<div class="step-box" data-tooltip="Step 1: Identify parameters"> <strong>Step 1:</strong> Identify the parameters<br> $\mu = 120$ minutes, $\sigma = 15$ minutes, $x = 100$ minutes </div> <div class="step-box" data-tooltip="Step 2: Calculate z-score"> <strong>Step 2:</strong> Calculate the z-score<br> $z = \frac{x - \mu}{\sigma} = \frac{100 - 120}{15} = \frac{-20}{15} = -1.33$ </div> <div class="step-box" data-tooltip="Step 3: Find probability using standard normal"> <strong>Step 3:</strong> Find probability using standard normal distribution<br> $P(X < 100) = P(Z < -1.33)$ </div> <div class="step-box" data-tooltip="Step 4: Look up z-table value"> <strong>Step 4:</strong> Look up the value in z-table or use statistical software<br> $P(Z < -1.33) = 0.0918$ </div> <div class="step-box" data-tooltip="Step 5: Interpret the result"> <strong>Step 5:</strong> Interpret the result<br> The probability that a student completes the test in less than 100 minutes is 9.18%. </div> </div>
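
The five steps above can be reproduced with a short R sketch. The variable names are illustrative, and `pnorm()` is the base R normal CDF. The sketch also shows the small difference between using the rounded z-score from a z-table and the unrounded one.

```r
mu    <- 120  # mean completion time (minutes)
sigma <- 15   # standard deviation (minutes)
x     <- 100  # threshold (minutes)

z <- (x - mu) / sigma                        # unrounded z-score, about -1.333

p_exact <- pnorm(x, mean = mu, sd = sigma)   # probability from the unrounded z-score
p_table <- pnorm(-1.33)                      # z rounded to two decimals, as in a z-table

print(round(c(z = z, p_exact = p_exact, p_table = p_table), 4))
```

The table-based value rounds to 0.0918, as in Step 4, while the unrounded calculation gives about 0.0912. The gap comes only from rounding the z-score.
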
<strong>Visual Representation:</strong>
```{r}
library(ggplot2)

# Evaluate the N(120, 15^2) density on a grid spanning the mean +/- 4 standard deviations
x <- seq(60, 180, length.out = 1000)
dens <- dnorm(x, mean = 120, sd = 15)

df <- data.frame(x = x, dens = dens)

ggplot(df, aes(x = x, y = dens)) +
  # Density curve (linewidth replaces size for lines in ggplot2 >= 3.4.0)
  geom_line(color = "#8D6E63", linewidth = 1.2) +
  # Shade the area to the left of 100 minutes, which represents P(X < 100)
  geom_area(data = subset(df, x < 100), aes(x = x, y = dens),
            fill = "#A1887F", alpha = 0.5) +
  geom_vline(xintercept = 100, color = "#5D4037", linetype = "dashed", linewidth = 0.8) +
  labs(title = "Normal Distribution: Test Completion Time",
       subtitle = "μ = 120 minutes, σ = 15 minutes",
       x = "Time (minutes)",
       y = "Probability Density") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
        plot.subtitle = element_text(hjust = 0.5, size = 11),
        plot.background = element_rect(fill = "#F5F1E8"),
        panel.background = element_rect(fill = "#F5F1E8"),
        axis.title = element_text(size = 10))
```

<h2 title="Book Reference">Book Reference</h2><div class="reference-box" data-tooltip="Reference textbook for continuous probability distributions"> <strong>Walpole, R.E., Myers, R.H., Myers, S.L., &amp; Ye, K.</strong> (2012). <em>Probability &amp; Statistics for Engineers &amp; Scientists</em>. Pearson Education.<br> <strong>Chapter 4:</strong> Continuous Random Variables and Probability Distributions<br> <strong>Pages:</strong> 110-156<br> <strong>Relevance:</strong> This textbook provides comprehensive coverage of continuous probability distributions with engineering applications. </div>


<h1 title="Sampling Distribution of Sample Proportion">Sampling Distribution of the Sample Proportion</h1><div class="video-box" data-tooltip="Educational video about proportion sampling"> <strong>Video:</strong> <a href="https://youtu.be/q2e4mK0FTbw?si=wjeJzXrcTtVySAFU" target="_blank"> Sampling Distribution of the Sample Proportion </a> </div><div class="concept-intro" data-tooltip="Introduction to sample proportion concepts"> In many practical applications, we are concerned with proportions rather than means. Whether studying voting patterns, quality control in manufacturing, medical treatment success rates, or customer satisfaction levels, proportions provide valuable insights about categorical data. </div><h2 title="Sampling Distribution of Sample Proportion">Sampling Distribution of $\hat{p}$</h2><div class="formula-box" data-tooltip="Mathematical formulas for sample proportion distribution"> <strong>Sampling Distribution of Sample Proportion ($\hat{p}$):</strong><br> $$ \hat{p} = \frac{X}{n} $$<br> where: $X$ = number of successes in the sample<br> $n$ = sample size<br><br>
<strong>Mean of Sampling Distribution:</strong>
$$
E(\hat{p}) = p
$$

<strong>Explanation:</strong>
The expected value of the sample proportion $\hat{p}$ equals the true population proportion $p$, so $\hat{p}$ is an <strong>unbiased estimator</strong> of the population proportion.

$\hat{p}$ = sample proportion

$p$ = population proportion

$E(\hat{p})$ = expected value (mean) of the sample proportion


<strong>Variance of Sampling Distribution:</strong>
$$
\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}
$$


<strong>Standard Error:</strong>
$$
SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
$$

 

<strong>Normal Approximation Conditions:</strong>
For the sampling distribution to be approximately normal:

$np \ge 10$

$n(1-p) \ge 10$


<strong>Normal Approximation Formula:</strong>
$$
\hat{p} \sim N\left(p,\; \frac{p(1-p)}{n}\right)
$$


<strong>Z-score for Proportion:</strong>
$$
z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}
$$
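
The formulas above can be evaluated directly in R. The following is a minimal sketch, not part of the original material: the values `ex_p = 0.30`, `ex_n = 200`, and `ex_p_hat = 0.35` are illustrative assumptions chosen only to show the calculation.

```{r}
# Illustrative values (assumptions for this sketch, not taken from the text)
ex_p     <- 0.30   # assumed population proportion
ex_n     <- 200    # assumed sample size
ex_p_hat <- 0.35   # assumed observed sample proportion

# Rule-of-thumb check for the normal approximation: np >= 10 and n(1-p) >= 10
ex_normal_ok <- (ex_n * ex_p >= 10) && (ex_n * (1 - ex_p) >= 10)

# Standard error and z-score of the sample proportion
ex_se <- sqrt(ex_p * (1 - ex_p) / ex_n)
ex_z  <- (ex_p_hat - ex_p) / ex_se

# Approximate probability of observing a sample proportion above ex_p_hat
ex_prob_above <- pnorm(ex_z, lower.tail = FALSE)

cat("Normal approximation reasonable:", ex_normal_ok, "\n")
cat("SE:", round(ex_se, 4), "| z:", round(ex_z, 3),
    "| P(p-hat > 0.35):", round(ex_prob_above, 4), "\n")
```

The condition check comes first on purpose: when it fails, the z-score can still be computed, but the probability from `pnorm()` should be treated as a rough guide only.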

 

</div><h2 title="Understanding the Concepts">Understanding Sample Proportion Distribution</h2><div class="concept-intro" data-tooltip="Key concepts and importance of proportion distribution"> <strong>Why Sample Proportion Distribution Matters:</strong><br><br> <div class="step-box" data-tooltip="Relationship between binomial and normal distributions"> <strong>1. From Binomial to Normal:</strong><br> The sample proportion $\hat{p}$ is essentially a binomial random variable (X) divided by n. As n increases, the binomial distribution approaches a normal distribution. </div> <div class="step-box" data-tooltip="Central Limit Theorem application for proportions"> <strong>2. Central Limit Theorem for Proportions:</strong><br> For large sample sizes, the sampling distribution of $\hat{p}$ is approximately normal, regardless of the shape of the population distribution (as long as np ≥ 10 and n(1-p) ≥ 10). </div> <div class="step-box" data-tooltip="Real-world applications of proportion analysis"> <strong>3. Practical Applications:</strong><br> • Election polls and political surveys<br> • Quality control and defect rates<br> • Medical trial success rates<br> • Market research and customer satisfaction<br> • A/B testing in web development </div> </div><h2 title="Visualizing Sampling Distribution">Visualizing Sampling Distribution of Proportions</h2><div class="sampling-proportion-viz" data-tooltip="Visualization of proportion distributions with different sample sizes"> <h3 class="sampling-proportion-title">Effect of Sample Size on Proportion Distribution</h3>

```{r}
library(ggplot2)
library(gridExtra)

# Parameters for visualization
p <- 0.05  # Population proportion
sample_sizes <- c(50, 100, 200, 500)
x_limits <- c(0, 0.15)

# Create plots for different sample sizes
plots <- list()

for (i in seq_along(sample_sizes)) {
  n <- sample_sizes[i]
  
  # Calculate parameters for normal approximation
  mu <- p
  se <- sqrt(p * (1 - p) / n)
  
  # Create normal distribution curve
  x_vals <- seq(x_limits[1], x_limits[2], length.out = 1000)
  y_vals <- dnorm(x_vals, mean = mu, sd = se)
  
  plot_data <- data.frame(x = x_vals, y = y_vals)
  
  # Create plot
  p_plot <- ggplot(plot_data, aes(x = x, y = y)) +
    geom_line(color = "#8D6E63", linewidth = 1.2) +
    geom_area(fill = "#A1887F", alpha = 0.3) +
    geom_vline(xintercept = mu, color = "#5D4037", linetype = "dashed", linewidth = 0.8) +
    labs(title = paste0("n = ", n),
         x = "Sample Proportion (p̂)",
         y = "Density") +
    xlim(x_limits) +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5, face = "bold", size = 12),
      plot.background = element_rect(fill = "#F5F1E8"),
      panel.background = element_rect(fill = "#F5F1E8"),
      axis.title = element_text(size = 10),
      axis.text = element_text(size = 9)
    )
  
  # Add standard error annotation
  p_plot <- p_plot + 
    annotate("text", x = mu + se, y = max(y_vals) * 0.9, 
             label = paste0("SE = ", round(se, 4)), 
             color = "#795548", size = 3.5, fontface = "bold")
  
  plots[[i]] <- p_plot
}

# Arrange and display the plots
grid.arrange(
  grobs = plots, 
  ncol = 2,
  top = grid::textGrob(
    "Sampling Distribution of p̂ for Different Sample Sizes (p = 0.05)",
    gp = grid::gpar(fontsize = 14, fontface = "bold", col = "#5D4037")
  ),
  bottom = grid::textGrob(
    "As sample size increases, standard error decreases and distribution becomes more concentrated around p",
    gp = grid::gpar(fontsize = 11, col = "#795548")
  )
)
```
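
A quick simulation can complement the normal curves above. The sketch below draws sample proportions with `rbinom()` and compares their empirical standard deviation with $\sqrt{p(1-p)/n}$; it also flags which sample sizes meet the $np \ge 10$ rule of thumb. The seed and the number of replications (10,000) are arbitrary choices made for illustration. Note that with $p = 0.05$ the two smallest sample sizes give $np = 2.5$ and $np = 5$, so the smooth normal curves drawn for $n = 50$ and $n = 100$ are only rough guides.

```{r}
set.seed(2024)                       # arbitrary seed, for reproducibility only
sim_p     <- 0.05                    # same population proportion as in the plot
sim_sizes <- c(50, 100, 200, 500)    # same sample sizes as in the plot
sim_reps  <- 10000                   # simulated samples per size (assumption)

sim_check <- do.call(rbind, lapply(sim_sizes, function(n) {
  # Each binomial draw is the success count X of one sample; X / n is p-hat
  p_hats <- rbinom(sim_reps, size = n, prob = sim_p) / n
  data.frame(
    n              = n,
    np             = n * sim_p,
    meets_rule     = (n * sim_p >= 10) && (n * (1 - sim_p) >= 10),
    theoretical_se = sqrt(sim_p * (1 - sim_p) / n),
    empirical_se   = sd(p_hats)
  )
}))

print(sim_check, digits = 3)
```

The standard error formula holds for every sample size; what the rule of thumb governs is whether the shape of the distribution is close enough to normal.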

<h2 title="Comparison with Sample Mean">Comparison: Sample Mean vs Sample Proportion</h2><table class="comparison-table"> <thead> <tr> <th data-tooltip="Comparison aspects">Aspect</th> <th data-tooltip="Sample mean characteristics">Sample Mean ($\bar{X}$)</th> <th data-tooltip="Sample proportion characteristics">Sample Proportion ($\hat{p}$)</th> </tr> </thead> <tbody> <tr> <td><strong>Data Type</strong></td> <td>Continuous numerical data</td> <td>Categorical/binary data</td> </tr> <tr> <td><strong>Formula</strong></td> <td>$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$</td> <td>$\hat{p} = \frac{X}{n}$</td> </tr> <tr> <td><strong>Expected Value</strong></td> <td>$E(\bar{X}) = \mu$</td> <td>$E(\hat{p}) = p$</td> </tr> <tr> <td><strong>Standard Error</strong></td> <td>$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$</td> <td>$SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$</td> </tr> <tr> <td><strong>Distribution</strong></td> <td>$N(\mu, \frac{\sigma^2}{n})$ for large n</td> <td>$N(p, \frac{p(1-p)}{n})$ when np≥10, n(1-p)≥10</td> </tr> <tr> <td><strong>Applications</strong></td> <td>Average height, test scores, income</td> <td>Voting percentages, defect rates, success rates</td> </tr> </tbody> </table><h2 title="Important Properties">Important Properties and Notes</h2><div class="concept-intro" data-tooltip="Key statistical properties of proportion distributions"> <strong>Key Properties of Sample Proportion Distribution:</strong><br><br> <div class="step-box" data-tooltip="Statistical properties of estimators"> <strong>1. Bias and Consistency:</strong><br> $\hat{p}$ is an unbiased estimator of p: $E(\hat{p}) = p$<br> As n increases, $\hat{p}$ converges to p (consistent estimator) </div> <div class="step-box" data-tooltip="Variance properties of proportions"> <strong>2. 
Maximum Variance:</strong><br> The variance $p(1-p)$ is maximized when p = 0.5<br> This means proportions near 0.5 have the largest standard errors </div> <div class="step-box" data-tooltip="Correction factor for finite populations"> <strong>3. Finite Population Correction:</strong><br> When sampling without replacement from a finite population of size N:<br> $$ SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n} \times \frac{N-n}{N-1}} $$ </div> </div><h2 title="Book References">Reference Books</h2><div class="reference-box" data-tooltip="Recommended textbooks for further study"> <strong>1. Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K.</strong> (2012). <em>Probability & Statistics for Engineers & Scientists</em>. Pearson Education.<br> <strong>Chapter:</strong> 7 - Sampling Distributions<br> <strong>Pages:</strong> 245-280<br> <strong>Relevance:</strong> Comprehensive coverage of sampling distributions including proportion distributions.<br><br>
<strong>2. Devore, J.L.</strong> (2015). <em>Probability and Statistics for Engineering and the Sciences</em>. Cengage Learning.

<strong>Chapter:</strong> 6 - Statistics and Sampling Distributions

<strong>Pages:</strong> 270-310

<strong>Relevance:</strong> Practical applications of sampling distributions in engineering contexts.


<strong>3. Montgomery, D.C., & Runger, G.C.</strong> (2018). <em>Applied Statistics and Probability for Engineers</em>. Wiley.

<strong>Chapter:</strong> 7 - Point Estimation of Parameters and Sampling Distributions

<strong>Pages:</strong> 255-290

<strong>Relevance:</strong> Engineering-focused approach to proportion estimation and sampling theory.
</div>


<div class="header-section"> <h1>Central Limit Theorem - Complete Summary</h1> <p><strong>Based on Video: The Central Limit Theorem Explained</strong></p> </div><!-- Video link placed below the chapter's opening heading --><div class="video-box" data-tooltip="Watch the complete Central Limit Theorem explanation"> <strong>Video:</strong> <a href="https://youtu.be/ivd8wEHnMCg?si=S2hlMdu37cYVl0ka" target="_blank"> The Central Limit Theorem - Understanding the Most Important Concept in Statistics </a> </div><div class="clt-summary"> <h2 class="clt-title" title="Central Limit Theorem Complete Guide">Central Limit Theorem: Foundation of Statistical Inference</h2><div class="concept-intro" data-tooltip="Introduction to Central Limit Theorem concepts"> <strong>What is the Central Limit Theorem (CLT)?</strong><br> The Central Limit Theorem is one of the most important concepts in statistics. It states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the observations are independent and the population variance is finite. This theorem forms the foundation for statistical inference, allowing us to make probabilistic statements about population parameters. 
</div><div class="clt-grid"><div class="clt-card" data-tooltip="Core concepts and mathematical formulation"> <h3><i>🎯</i> Core Statement</h3> <div class="card-content"> <ul class="card-points"> <li><strong>Definition:</strong> Sample means approach normal distribution</li> <li><strong>Requirements:</strong> Independent, identically distributed samples</li> <li><strong>Sample Size:</strong> n ≥ 30 generally sufficient</li> <li><strong>Key Insight:</strong> Works for ANY population distribution</li> <li><strong>Mathematical Notation:</strong> $\bar{X} \sim N(\mu, \frac{\sigma^2}{n})$</li> </ul> <div class="formula-box"> <strong>CLT Formula:</strong><br> $$ \bar{X} \xrightarrow{d} N\left(\mu, \frac{\sigma^2}{n}\right) $$ $$ \text{As } n \to \infty, \quad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1) $$ </div> </div> </div><div class="clt-card" data-tooltip="Statistical properties and relationships"> <h3><i>📊</i> Key Properties</h3> <div class="card-content"> <ul class="card-points"> <li><strong>Mean Preservation:</strong> $E(\bar{X}) = \mu$</li> <li><strong>Variance Reduction:</strong> $Var(\bar{X}) = \frac{\sigma^2}{n}$</li> <li><strong>Standard Error:</strong> $SE = \frac{\sigma}{\sqrt{n}}$</li> <li><strong>Sample Size Effect:</strong> Larger n → smaller SE</li> <li><strong>Distribution Shape:</strong> Becomes more normal with larger n</li> </ul> <div class="formula-box"> <strong>Important Relationships:</strong><br> $$ E(\bar{X}) = \mu $$ $$ Var(\bar{X}) = \frac{\sigma^2}{n} $$ $$ SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} $$ </div> </div> </div><div class="clt-card" data-tooltip="Requirements and conditions for CLT application"> <h3><i>🔢</i> Conditions & Requirements</h3> <div class="card-content"> <ul class="card-points"> <li><strong>Independence:</strong> Samples must be independent</li> <li><strong>Random Sampling:</strong> Samples randomly selected</li> <li><strong>Sample Size:</strong> n ≥ 30 for most distributions</li> <li><strong>Finite 
Variance:</strong> Population variance must be finite</li> <li><strong>Identical Distribution:</strong> Samples from same population</li> </ul> <div class="formula-box"> <strong>Sample Size Guidelines:</strong><br> • Normal population: Any n works<br> • Slightly skewed: n ≥ 15<br> • Moderately skewed: n ≥ 30<br> • Highly skewed: n ≥ 50+<br> </div> </div> </div><div class="clt-card" data-tooltip="Real-world applications and uses"> <h3><i>📈</i> Practical Applications</h3> <div class="card-content"> <ul class="card-points"> <li><strong>Confidence Intervals:</strong> Estimating population parameters</li> <li><strong>Hypothesis Testing:</strong> Testing claims about means</li> <li><strong>Quality Control:</strong> Process monitoring and improvement</li> <li><strong>Survey Analysis:</strong> Polling and market research</li> <li><strong>Medical Research:</strong> Clinical trial analysis</li> </ul> <div class="formula-box"> <strong>Z-score Formula:</strong><br> $$ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} $$ <strong>Confidence Interval:</strong><br> $$ \bar{X} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} $$ </div> </div> </div></div><h3 class="clt-subtitle" title="R Implementation">R Implementation: CLT Demonstration</h3><div class="r-code-block" data-tooltip="R code for Central Limit Theorem simulation"> <div class="r-code-title"><i>💻</i> Central Limit Theorem Simulation in R</div>
```{r}
# ============================================
# CENTRAL LIMIT THEOREM SIMULATION - Rpubs Version
# ============================================

cat("=== CENTRAL LIMIT THEOREM DEMONSTRATION ===\n\n")

# Load libraries
library(ggplot2)
library(gridExtra)

# Set simulation parameters
set.seed(123)  # For reproducibility
population_size <- 10000
sample_size <- 30
n_samples <- 1000

# Create different population distributions
cat("Creating population distributions...\n\n")

# 1. Exponential distribution (strongly right-skewed)
pop_exponential <- rexp(population_size, rate = 1)
mu_exp <- mean(pop_exponential)
sigma_exp <- sd(pop_exponential)

# 2. Uniform distribution
pop_uniform <- runif(population_size, min = 0, max = 10)
mu_uni <- mean(pop_uniform)
sigma_uni <- sd(pop_uniform)

# 3. Bimodal distribution (two peaks)
pop_bimodal <- c(rnorm(population_size/2, mean = 30, sd = 5),
                 rnorm(population_size/2, mean = 70, sd = 5))
mu_bim <- mean(pop_bimodal)
sigma_bim <- sd(pop_bimodal)

# Function to simulate the CLT
simulate_clt <- function(population, pop_name, mu, sigma, n, k) {
  sample_means <- numeric(k)
  
  for (i in 1:k) {
    sample_data <- sample(population, size = n, replace = TRUE)
    sample_means[i] <- mean(sample_data)
  }
  
  # Compute the theoretical standard error
  theoretical_se <- sigma / sqrt(n)
  
  # Build the results data frame
  results <- data.frame(
    Population = pop_name,
    SampleMeans = sample_means,
    TheoreticalMean = mu,
    TheoreticalSE = theoretical_se,
    EmpiricalMean = mean(sample_means),
    EmpiricalSE = sd(sample_means)
  )
  
  return(results)
}

# Run the simulation for all distributions
cat("Running CLT simulations...\n")

results_exp <- simulate_clt(pop_exponential, "Exponential", mu_exp, sigma_exp, sample_size, n_samples)
results_uni <- simulate_clt(pop_uniform, "Uniform", mu_uni, sigma_uni, sample_size, n_samples)
results_bim <- simulate_clt(pop_bimodal, "Bimodal", mu_bim, sigma_bim, sample_size, n_samples)

# Combine the results
all_results <- rbind(results_exp, results_uni, results_bim)

# Print summary statistics
cat("\n=== SIMULATION RESULTS (n =", sample_size, ", samples =", n_samples, ") ===\n\n")

for (dist in c("Exponential", "Uniform", "Bimodal")) {
  subset_data <- all_results[all_results$Population == dist, ]
  
  cat(dist, "Distribution:\n")
  cat("  Population mean (μ):", round(unique(subset_data$TheoreticalMean), 3), "\n")
  cat("  Population SD (σ):", round(unique(subset_data$TheoreticalSE * sqrt(sample_size)), 3), "\n")
  cat("  Theoretical SE:", round(unique(subset_data$TheoreticalSE), 3), "\n")
  cat("  Empirical mean of sample means:", round(mean(subset_data$SampleMeans), 3), "\n")
  cat("  Empirical SE:", round(sd(subset_data$SampleMeans), 3), "\n")
  cat("  Difference in means:", round(abs(mean(subset_data$SampleMeans) - unique(subset_data$TheoreticalMean)), 3), "\n")
  cat("  Ratio SE(empirical)/SE(theoretical):", round(sd(subset_data$SampleMeans)/unique(subset_data$TheoreticalSE), 3), "\n\n")
}

# Create visualizations with an enhanced brown theme
cat("Creating visualizations...\n")

# Function to build the distribution plots with the brown theme
create_distribution_plot <- function(pop_data, sample_means_data, title, color) {
  # Population distribution plot
  p_pop <- ggplot(data.frame(x = pop_data), aes(x = x)) +
    geom_histogram(aes(y = after_stat(density)), bins = 50, fill = color, alpha = 0.7, 
                   color = "white", linewidth = 0.2) +
    geom_density(color = "#3E2723", linewidth = 0.8) +
    labs(title = paste("Population:", title),
         x = "Value", y = "Density") +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5, face = "bold", size = 11, color = "#5D4037"),
      plot.background = element_rect(fill = "#EFEBE9", color = "#8D6E63", linewidth = 1),
      panel.background = element_rect(fill = "#F5F1E8"),
      panel.grid.major = element_line(color = "#D7CCC8", linewidth = 0.3),
      panel.grid.minor = element_line(color = "#D7CCC8", linewidth = 0.1),
      axis.title = element_text(size = 10, color = "#5D4037"),
      axis.text = element_text(size = 9, color = "#795548"),
      plot.margin = margin(15, 15, 15, 15)
    )
  
  # Sampling distribution of the sample mean, with a fitted normal curve (dashed)
  p_means <- ggplot(data.frame(x = sample_means_data), aes(x = x)) +
    geom_histogram(aes(y = after_stat(density)), bins = 40, fill = color, alpha = 0.7,
                   color = "white", linewidth = 0.2) +
    geom_density(color = "#3E2723", linewidth = 0.8) +
    stat_function(fun = dnorm, 
                  args = list(mean = mean(sample_means_data), 
                            sd = sd(sample_means_data)),
                  color = "#4E342E", linewidth = 0.8, linetype = "dashed") +
    labs(title = paste("Sample Means (n =", sample_size, ")"),
         x = "Sample Mean", y = "Density") +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5, face = "bold", size = 11, color = "#5D4037"),
      plot.background = element_rect(fill = "#EFEBE9", color = "#8D6E63", linewidth = 1),
      panel.background = element_rect(fill = "#F5F1E8"),
      panel.grid.major = element_line(color = "#D7CCC8", linewidth = 0.3),
      panel.grid.minor = element_line(color = "#D7CCC8", linewidth = 0.1),
      axis.title = element_text(size = 10, color = "#5D4037"),
      axis.text = element_text(size = 9, color = "#795548"),
      plot.margin = margin(15, 15, 15, 15)
    )
  
  return(list(pop = p_pop, means = p_means))
}

# Brown color palette, one shade per distribution
colors <- c("#8D6E63", "#A1887F", "#BCAAA4")

dist_names <- c("Exponential", "Uniform", "Bimodal")
pop_data_list <- list(pop_exponential, pop_uniform, pop_bimodal)
means_data_list <- list(results_exp$SampleMeans, results_uni$SampleMeans, results_bim$SampleMeans)

all_plots <- list()

for (i in 1:3) {
  plots <- create_distribution_plot(pop_data_list[[i]], means_data_list[[i]], 
                                   dist_names[i], colors[i])
  all_plots[[2*i-1]] <- plots$pop
  all_plots[[2*i]] <- plots$means
}

# Arrange all plots in a grid; grid.arrange() draws the figure and returns the layout
grid_plot <- grid.arrange(
  grobs = all_plots,
  ncol = 2,
  nrow = 3,
  top = grid::textGrob(
    paste("CENTRAL LIMIT THEOREM DEMONSTRATION\nSample Size n =", sample_size, "| Number of Samples =", n_samples),
    gp = grid::gpar(fontsize = 16, fontface = "bold", col = "#5D4037", lineheight = 1.2)
  ),
  bottom = grid::textGrob(
    "Interpretation: the sample means (right column) are approximately normal even though the original populations (left column) are not",
    gp = grid::gpar(fontsize = 12, col = "#795548", fontface = "italic"),
    y = unit(1, "npc") - unit(15, "mm")
  ),
  padding = unit(0.5, "cm")
)

# No print() call is needed: grid.arrange() has already drawn the figure,
# and printing the returned gtable would only show its text layout

cat("\n=== INTERPRETATION ===\n\n")
cat("Based on the Central Limit Theorem (CLT) simulation above, several conclusions follow:\n\n")
cat("1. DISTRIBUTION SHAPE:\n")
cat("   - Exponential distribution (top left): strongly right-skewed\n")
cat("   - Uniform distribution (middle left): flat, rectangular shape\n")
cat("   - Bimodal distribution (bottom left): two distinct peaks\n")
cat("   - The sample means for all three (right column) are nevertheless approximately normal\n\n")

cat("2. CONSISTENCY OF RESULTS:\n")
cat("   - The empirical mean of the sample means is close to the population mean (μ)\n")
cat("   - The empirical standard error is close to the theoretical standard error (σ/√n)\n")
cat("   - The ratio SE(empirical)/SE(theoretical) is close to 1 for all distributions\n\n")

cat("3. PRACTICAL IMPLICATIONS:\n")
cat("   - The CLT holds even when the population distribution is not normal\n")
cat("   - With sample size n = 30, the distribution of sample means is already close to normal\n")
cat("   - This enables statistical inference (hypothesis tests, confidence intervals)\n")
cat("   - The CLT underlies many parametric statistical methods\n\n")

cat("4. IMPORTANCE OF SAMPLE SIZE:\n")
cat("   - The larger n is, the closer the distribution of sample means is to normal\n")
cat("   - Strongly skewed populations may require a larger n\n")
cat("   - Rule of thumb: n ≥ 30 is usually sufficient to apply the CLT\n\n")

cat("=== SIMULATION COMPLETE ===\n")

# Save the plot at high resolution
ggsave("clt_simulation_rpubs.png", grid_plot, 
       width = 14, height = 10, dpi = 300, bg = "#EFEBE9")

cat("\nPlot saved as 'clt_simulation_rpubs.png'\n")

cat("\n=== INSTRUCTIONS FOR RPUBS ===\n")
cat("1. Install the packages if needed: install.packages(c('ggplot2', 'gridExtra'))\n")
cat("2. Copy all of this code into RStudio\n")
cat("3. Run all of the code (Ctrl + A, then Ctrl + Enter)\n")
cat("4. The plot appears in the Plots pane and is saved as a PNG file\n")
cat("5. For RPubs: upload this R file or publish directly from RStudio\n\n")
```
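
The sample-size guidelines listed earlier can be explored with a short sketch. For an exponential population, the skewness of the sample mean is $2/\sqrt{n}$ in theory (the sum of $n$ independent exponential variables follows a gamma distribution), so it shrinks toward 0, the value for a normal distribution, as $n$ grows. The helper `skewness_hat()`, the seed, the number of replications, and the inclusion of $n = 5$ are assumptions made for this illustration.

```{r}
set.seed(42)  # arbitrary seed, for reproducibility only

# Hypothetical helper: moment-based sample skewness
skewness_hat <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

demo_sizes <- c(5, 15, 30, 50)   # sample sizes around the guideline thresholds
demo_reps  <- 20000              # simulated sample means per size (assumption)

skew_table <- data.frame(
  n                = demo_sizes,
  theoretical_skew = 2 / sqrt(demo_sizes),
  simulated_skew   = sapply(demo_sizes, function(n) {
    # Each column of the matrix is one sample of size n from Exp(rate = 1)
    means <- colMeans(matrix(rexp(n * demo_reps, rate = 1), nrow = n))
    skewness_hat(means)
  })
)

print(skew_table, digits = 3)
```

Some skewness remains even at $n = 30$, which is why the guidelines suggest larger samples for highly skewed populations.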
</div> </div><div class="reference-box" data-tooltip="Recommended textbooks for further study"> <strong>Reference Books:</strong><br><br>
<strong>1. Montgomery, D.C., & Runger, G.C.</strong> (2018). <em>Applied Statistics and Probability for Engineers</em>. Wiley.

<strong>Chapter:</strong> 7 - Point Estimation and Sampling Distributions

<strong>Pages:</strong> 255-290

<strong>Relevance:</strong> Excellent engineering-focused explanation of CLT with practical examples.


<strong>2. Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K.</strong> (2012). <em>Probability & Statistics for Engineers & Scientists</em>. Pearson Education.

<strong>Chapter:</strong> 8 - Fundamental Sampling Distributions

<strong>Pages:</strong> 281-320

<strong>Relevance:</strong> Comprehensive coverage of sampling distributions including CLT proofs.


<strong>3. Devore, J.L.</strong> (2015). <em>Probability and Statistics for Engineering and the Sciences</em>. Cengage Learning.

<strong>Chapter:</strong> 6 - Statistics and Sampling Distributions

<strong>Pages:</strong> 270-310

<strong>Relevance:</strong> Practical applications of CLT in engineering and scientific contexts.
</div>


<div class="header-section"> 
<h1>Central Limit Theorem Summary</h1> 
<p><strong>Video: The Central Limit Theorem Explained</strong></p>
</div>

<div class="clt-summary">
<h2 class="clt-title" title="Central Limit Theorem Complete Guide">Central Limit Theorem: Key Concepts</h2>
  
<!-- Video link placed below the chapter's opening heading -->
<div class="video-box" data-tooltip="Watch Central Limit Theorem explanation video">
<strong>Video:</strong> 
<a href="https://youtu.be/ivd8wEHnMCg?si=S2hlMdu37cYVl0ka" target="_blank">
The Central Limit Theorem - Understanding the Most Important Concept in Statistics
</a>
</div>
  
<div class="concept-intro" data-tooltip="Introduction to Central Limit Theorem">
<strong>What is CLT?</strong><br>
The Central Limit Theorem states that the sampling distribution of sample means approaches a normal distribution as sample size increases, regardless of population distribution shape. This enables statistical inference about population parameters.
</div>
  
<div class="clt-grid">
    
<div class="clt-card" data-tooltip="Core concepts of CLT">
<h3><i>🎯</i> Core Concept</h3>
<div class="card-content">
<ul class="card-points">
<li>Sample means approach normal distribution</li>
<li>Works for any population distribution</li>
<li>n ≥ 30 usually sufficient</li>
<li>Independent random samples required</li>
</ul>
<div class="formula-box" data-tooltip="CLT mathematical formula">
$$ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) $$
</div>
</div>
</div>
  
<div class="clt-card" data-tooltip="Key statistical properties">
<h3><i>📊</i> Key Properties</h3>
<div class="card-content">
<ul class="card-points">
<li>Mean: $E(\bar{X}) = \mu$</li>
<li>Variance: $Var(\bar{X}) = \frac{\sigma^2}{n}$</li>
<li>Standard Error: $SE = \frac{\sigma}{\sqrt{n}}$</li>
<li>Larger n → smaller SE</li>
</ul>
<div class="formula-box" data-tooltip="Standard error formula">
$$ SE = \frac{\sigma}{\sqrt{n}} $$
</div>
</div>
</div>
    
</div>
  
<h3 class="clt-subtitle" title="R Implementation">CLT Simulation in R</h3>
  
<div class="r-code-block" data-tooltip="R code for CLT demonstration">
<div class="r-code-title"><i>💻</i> CLT Demonstration</div>

```{r clt-demo, echo=TRUE, warning=FALSE, message=FALSE, fig.width=7, fig.height=5}
library(ggplot2)
set.seed(123)

# Parameters
pop_size <- 5000
n <- 30
samples <- 1000

# Skewed population
pop_skewed <- rexp(pop_size, rate = 1)
sample_means <- numeric(samples)

for(i in 1:samples) {
  sample_means[i] <- mean(sample(pop_skewed, n, replace = TRUE))
}

# Create plots
p1 <- ggplot(data.frame(x = pop_skewed), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 40, 
                 fill = "#8D6E63", alpha = 0.6) +
  labs(title = "Population Distribution (Exponential)",
       x = "Value", y = "Density") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12),
    plot.background = element_rect(fill = "#F5F1E8"),
    panel.background = element_rect(fill = "#F5F1E8")
  )

# Histogram of the sample means with a fitted normal curve
p2 <- ggplot(data.frame(x = sample_means), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 40,
                 fill = "#A1887F", alpha = 0.6) +
  stat_function(fun = dnorm, 
                args = list(mean = mean(sample_means), 
                          sd = sd(sample_means)),
                color = "#5D4037", linewidth = 0.8) +
  labs(title = paste("Sample Means (n =", n, ")"),
       x = "Sample Mean", y = "Density") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12),
    plot.background = element_rect(fill = "#F5F1E8"),
    panel.background = element_rect(fill = "#F5F1E8")
  )

library(gridExtra)
grid.arrange(p1, p2, ncol = 2,
             top = grid::textGrob(
               "Central Limit Theorem Demonstration",
               gp = grid::gpar(fontsize = 14, fontface = "bold", col = "#5D4037")
             ))
```

<div class="step-box" data-tooltip="Problem scenario"> <strong>Worked Example:</strong><br> A population has mean $\mu = 100$ cm and standard deviation $\sigma = 2$ cm, and a random sample of size $n = 36$ is drawn. </div><div class="step-box" data-tooltip="Standard error calculation"> <strong>1. Standard Error:</strong><br> $SE = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{36}} = 0.333$ cm </div><div class="step-box" data-tooltip="Probability calculation"> <strong>2. Probability that the sample mean is below 99.5 cm:</strong><br> $Z = \frac{99.5-100}{0.333} = -1.5$<br> $P = 0.0668$ (6.68%) </div><div class="step-box" data-tooltip="Confidence interval calculation"> <strong>3. 95% Confidence Interval:</strong><br> $100 \pm 1.96 \times 0.333 = [99.35, 100.65]$ cm </div></div> <div class="r-code-block" data-tooltip="R calculations for CLT example"> <div class="r-code-title"><i></i>R Calculation</div>

```{r echo=TRUE, warning=FALSE, message=FALSE, results='asis'}
mu <- 100
sigma <- 2
n <- 36

# Calculations
se <- sigma / sqrt(n)
z <- (99.5 - mu) / se
p_val <- pnorm(z)

ci_lower <- mu - 1.96 * se
ci_upper <- mu + 1.96 * se

# Soft brown color
color <- "#8B5E3C"

output <- paste0(
  "<span style='color:", color, "; font-weight:600;'>Standard Error: ", round(se, 3), " cm</span><br>",
  "<span style='color:", color, "; font-weight:600;'>Z-score for 99.5 cm: ", round(z, 3), "</span><br>",
  "<span style='color:", color, "; font-weight:600;'>Probability: ", round(p_val, 4), "</span><br>",
  "<span style='color:", color, "; font-weight:600;'>95% CI: [", 
      round(ci_lower, 2), ", ", round(ci_upper, 2), "] cm</span>"
)

# Emit the HTML so that results='asis' renders it in the report
cat(output)
```

</div> <table class="comparison-table"> <thead> <tr> <th>Sample Size</th> <th>Standard Error</th> <th>95% CI Width</th> </tr> </thead> <tbody> <tr> <td>n = 9</td> <td>0.667</td> <td>2.61 cm</td> </tr> <tr> <td>n = 36</td> <td>0.333</td> <td>1.31 cm</td> </tr> <tr> <td>n = 100</td> <td>0.200</td> <td>0.78 cm</td> </tr> </tbody> </table> <div class="concept-intro" data-tooltip="Key points about CLT"> <strong>Key Points:</strong><br><br> <div class="step-box" data-tooltip="Importance of CLT"> <strong>1. Importance:</strong><br> Enables inference without knowing population distribution. </div> <div class="step-box" data-tooltip="Effect of sample size"> <strong>2. Sample Size:</strong><br> Larger n → more normal distribution of means. </div> <div class="step-box" data-tooltip="Applications of CLT"> <strong>3. Applications:</strong><br> Confidence intervals, hypothesis testing, quality control. </div></div> <div class="formula-box" data-tooltip="Essential statistical formulas"> <strong>Essential Formulas:</strong><br><br>
<strong>1. Standard Error (SE):</strong>
$$
SE = \frac{\sigma}{\sqrt{n}}
$$

<strong>2. Z-Score:</strong>
$$
Z = \frac{\bar{X} - \mu}{SE}
$$

<strong>3. Confidence Interval (CI):</strong>
$$
\bar{X} \pm z_{\alpha/2} \times SE
$$
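
As a sketch of how these formulas combine, the code below recomputes the standard error and the 95% confidence-interval width for the sample sizes in the comparison table, using $\sigma = 2$ cm from the worked example. The variable names are illustrative, and `qnorm(0.975)` is used in place of the rounded critical value 1.96.

```{r}
tbl_sigma <- 2                 # population SD in cm, from the worked example
tbl_n     <- c(9, 36, 100)     # sample sizes from the comparison table
tbl_z     <- qnorm(0.975)      # critical value for 95% confidence (about 1.96)

tbl_se    <- tbl_sigma / sqrt(tbl_n)   # SE = sigma / sqrt(n)
tbl_width <- 2 * tbl_z * tbl_se        # full width of the interval: 2 * z * SE

ci_table <- data.frame(
  n        = tbl_n,
  se       = round(tbl_se, 3),
  ci_width = round(tbl_width, 2)
)
print(ci_table)
```

Because the width is proportional to $1/\sqrt{n}$, quadrupling the sample size from 9 to 36 halves both the standard error and the interval width.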


</div> <div class="video-box" data-tooltip="Video key takeaways"> <strong>Video Takeaways:</strong><br> 1. CLT enables statistical inference<br> 2. Works for any population shape with finite variance<br> 3. Larger samples give better approximations<br> 4. Foundation for confidence intervals and hypothesis tests </div> </div> <h1 title="Sampling Distribution of the Sample Proportion">Sampling Distribution of the Sample Proportion</h1><!-- Video link placed below the chapter heading --><div class="video-box" data-tooltip="Watch complete video about sampling distribution of proportions"> <strong>Video:</strong> <a href="https://youtu.be/q2e4mK0FTbw?si=wjeJzXrcTtVySAFU" target="_blank"> Sampling Distribution of the Sample Proportion by Joshua Emmanuel </a> </div><div class="concept-intro" data-tooltip="Introduction to sample proportion concepts"> The sampling distribution of the sample proportion is a fundamental concept in inferential statistics that describes how sample proportions vary from sample to sample. It provides the foundation for constructing confidence intervals and conducting hypothesis tests about population proportions. 
</div><h2 title="Core Concepts of Sample Proportion Distribution">Core Concepts of Sample Proportion Distribution</h2><div class="formula-box" data-tooltip="Basic formula for sample proportion"> <strong>Sample Proportion Formula:</strong><br> $$ \hat{p} = \frac{X}{n} $$ <div class="step-box" data-tooltip="Variable definitions"> <strong>Where:</strong><br> $X$ = number of successes in the sample<br> $n$ = sample size<br> $\hat{p}$ = sample proportion (pronounced "p-hat") </div> </div><h3 title="Key Parameters of Sampling Distribution">Key Parameters of Sampling Distribution</h3><div class="math-container" data-tooltip="Mean of sampling distribution"> <div class="math"> <strong>Mean of Sampling Distribution:</strong><br> $$ \mu_{\hat{p}} = E(\hat{p}) = p $$ </div> <div class="step-box" data-tooltip="Explanation of unbiased estimator"> <strong>Explanation:</strong><br> The expected value of all possible sample proportions equals the population proportion<br> This makes $\hat{p}$ an <strong>unbiased estimator</strong> of $p$ </div> </div><div class="formula-box" data-tooltip="Variance and standard error formulas"> <strong>Variance and Standard Error:</strong><br> $$ \sigma^2_{\hat{p}} = \frac{p(1-p)}{n} $$ $$ SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} $$ <div class="step-box" data-tooltip="Important notes about variance"> <strong>Important Notes:</strong><br> 1. Standard error decreases as sample size increases<br> 2. Maximum variance occurs when $p = 0.5$<br> 3. Variance is smallest when $p$ is near 0 or 1 </div> </div><h3 title="Normal Approximation Conditions">Normal Approximation Conditions</h3><div class="example-box" data-tooltip="CLT conditions for proportions"> <strong>Central Limit Theorem for Proportions:</strong><br> The sampling distribution of $\hat{p}$ is approximately normal if: <div class="step-box" data-tooltip="Practical conditions"> 1. $np \ge 10$<br> 2. 
$n(1-p) \ge 10$<br> <em>(These are practical rules of thumb)</em> </div> <div class="step-box" data-tooltip="Normal approximation formula"> <strong>When satisfied:</strong><br> $$ \hat{p} \sim N\left(p,\; \frac{p(1-p)}{n}\right) $$ </div> </div><div class="formula-box" data-tooltip="Z-score calculation for proportions"> <strong>Z-score Calculation for Proportions:</strong><br> $$ z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} $$ <div class="step-box" data-tooltip="Applications of z-score"> <strong>Used for:</strong><br> • Finding probabilities for sample proportions<br> • Constructing confidence intervals<br> • Hypothesis testing about proportions </div> </div><h2 title="Practical Example from Video">Practical Example from Video</h2><div class="example-box" data-tooltip="Example problem from video"> <strong>Video Example: Voter Support Survey</strong><br> <div class="step-box" data-tooltip="Problem scenario"> <strong>Scenario:</strong><br> • Population proportion: $p = 0.60$ (60% support the candidate)<br> • Sample size: $n = 100$<br> • Question: What is $P(\hat{p} > 0.65)$? </div> <div class="step-box" data-tooltip="Step-by-step solution"> <strong>Solution Steps:</strong><br> 1. Check normal approximation: $np = 60$, $n(1-p) = 40$ ✓<br> 2. Calculate standard error: $SE = \sqrt{\frac{0.6 \times 0.4}{100}} = 0.049$<br> 3. Compute z-score: $z = \frac{0.65 - 0.60}{0.049} = 1.02$<br> 4. 
Find probability: $P(Z > 1.02) = 0.1539$<br> <em>Conclusion: There's a 15.39% chance of getting a sample with over 65% support</em> </div> </div><h3 title="Visualizing the Sampling Distribution">Visualizing the Sampling Distribution</h3><div class="sampling-proportion-viz" data-tooltip="Visual characteristics of proportion distribution"> <div class="sampling-proportion-title"> Characteristics of $\hat{p}$ Distribution </div> <div class="step-box" data-tooltip="Effect of sample size on shape"> <strong>Shape Changes with Sample Size:</strong><br> • Small $n$: Discrete distribution (binomial-like)<br> • Large $n$: Bell-shaped curve (normal approximation)<br> • As $n$ increases: Distribution becomes more concentrated around $p$ </div> <div class="step-box" data-tooltip="Effect of population proportion"> <strong>Effect of $p$ on Distribution:</strong><br> • When $p = 0.5$: Symmetric distribution<br> • When $p$ near 0 or 1: Skewed distribution (needs larger $n$ for normal approximation) </div> </div><h3 title="Comparison: Mean vs Proportion">Comparison: Mean vs Proportion Sampling</h3><table class="comparison-table"> <thead> <tr> <th data-tooltip="Comparison aspects">Aspect</th> <th data-tooltip="Sample mean characteristics">Sample Mean ($\bar{x}$)</th> <th data-tooltip="Sample proportion characteristics">Sample Proportion ($\hat{p}$)</th> </tr> </thead> <tbody> <tr> <td><strong>Population Parameter</strong></td> <td>$\mu$ (population mean)</td> <td>$p$ (population proportion)</td> </tr> <tr> <td><strong>Sample Statistic</strong></td> <td>$\bar{x} = \frac{\sum x_i}{n}$</td> <td>$\hat{p} = \frac{X}{n}$</td> </tr> <tr> <td><strong>Sampling Distribution Mean</strong></td> <td>$\mu_{\bar{x}} = \mu$</td> <td>$\mu_{\hat{p}} = p$</td> </tr> <tr> <td><strong>Standard Error Formula</strong></td> <td>$SE = \frac{\sigma}{\sqrt{n}}$</td> <td>$SE = \sqrt{\frac{p(1-p)}{n}}$</td> </tr> <tr> <td><strong>Normality Conditions</strong></td> <td>$n \ge 30$ (CLT)</td> <td>$np \ge 10$ and 
$n(1-p) \ge 10$</td> </tr> <tr> <td><strong>Data Type</strong></td> <td>Quantitative (continuous)</td> <td>Categorical (binary: success/failure)</td> </tr> <tr> <td><strong>Distribution Shape</strong></td> <td>Normal for large $n$</td> <td>Normal when conditions met</td> </tr> </tbody> </table><h3 title="R Implementation and Simulation">R Implementation and Simulation</h3><div class="r-code-block" data-tooltip="R simulation of proportion sampling"> <div class="r-code-title"> <i>📊</i> Simulating Sampling Distribution of $\hat{p}$ </div>

```{r}
# SIMULATION OF SAMPLE PROPORTION DISTRIBUTION
# ============================================

# Population parameters
p_population <- 0.60      # True population proportion
n_sample <- 100           # Sample size
n_simulations <- 10000    # Number of samples to simulate

# Function to generate one sample proportion
generate_sample_proportion <- function(p, n) {
  # Generate binary data (Bernoulli trials)
  sample_data <- rbinom(n, 1, p)
  # Calculate sample proportion
  p_hat <- mean(sample_data)
  return(p_hat)
}

# Run simulation
set.seed(123)  # For reproducibility
sample_proportions <- replicate(n_simulations, 
                                generate_sample_proportion(p_population, n_sample))

# Calculate descriptive statistics
mean_simulated <- mean(sample_proportions)
se_theoretical <- sqrt(p_population * (1-p_population) / n_sample)
se_empirical <- sd(sample_proportions)

# Display results
cat("=== SAMPLING DISTRIBUTION SIMULATION RESULTS ===\n")
cat("Population proportion (p):", p_population, "\n")
cat("Sample size (n):", n_sample, "\n")
cat("Number of simulations:", n_simulations, "\n\n")
cat("Mean of simulated p̂:", round(mean_simulated, 4), "\n")
cat("Theoretical Standard Error:", round(se_theoretical, 4), "\n")
cat("Empirical Standard Error:", round(se_empirical, 4), "\n\n")

# Check normality conditions
cat("=== NORMALITY CONDITIONS CHECK ===\n")
cat("np =", n_sample * p_population, "\n")
cat("n(1-p) =", n_sample * (1-p_population), "\n")
if (n_sample * p_population >= 10 && n_sample * (1-p_population) >= 10) {
  cat("✓ Conditions met: Normal approximation is valid\n")
} else {
  cat("✗ Conditions not met: Consider alternative methods\n")
}

# Calculate probability for specific value
target_value <- 0.65
z_calculated <- (target_value - p_population) / se_theoretical
probability <- 1 - pnorm(z_calculated)

cat("\n=== PROBABILITY CALCULATION ===\n")
cat("P(p̂ >", target_value, "):\n")
cat("Z-score:", round(z_calculated, 3), "\n")
cat("Probability:", round(probability, 4), "\n")
cat("Percentage:", round(probability * 100, 2), "%\n")
```
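
The normal-approximation answer can be checked against the exact binomial probability. The chunk below is a minimal, self-contained sketch for the same scenario ($p = 0.60$, $n = 100$, cutoff $0.65$); the variable names (`p_true`, `n_obs`, `x_cut`, and so on) are illustrative choices, not part of the original simulation.

```{r}
# Sketch: three ways to compute P(p_hat > 0.65) when n = 100 and p = 0.60
p_true <- 0.60
n_obs  <- 100
cutoff <- 0.65

se <- sqrt(p_true * (1 - p_true) / n_obs)

# 1. Normal approximation without continuity correction
p_normal <- pnorm((cutoff - p_true) / se, lower.tail = FALSE)

# 2. Exact binomial: p_hat > 0.65 is the same event as X > 65 (X >= 66)
x_cut   <- floor(cutoff * n_obs + 1e-9)  # largest count not exceeding the cutoff
p_exact <- pbinom(x_cut, size = n_obs, prob = p_true, lower.tail = FALSE)

# 3. Normal approximation with continuity correction (boundary moved to 65.5)
p_cc <- pnorm((x_cut + 0.5 - n_obs * p_true) / sqrt(n_obs * p_true * (1 - p_true)),
              lower.tail = FALSE)

cat("Normal approximation:      ", round(p_normal, 4), "\n")
cat("Exact binomial:            ", round(p_exact, 4), "\n")
cat("With continuity correction:", round(p_cc, 4), "\n")
```

Because $\hat{p}$ only takes values in steps of $1/n$, the uncorrected normal approximation tends to overstate this tail probability slightly; the continuity-corrected value should sit closer to the exact one.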

</div><h3 title="Interpretation of Results">Interpretation of Results</h3><div class="example-box" data-tooltip="Interpretation of simulation results"> <strong>Key Insights from Simulation:</strong> <div class="step-box" data-tooltip="Unbiasedness confirmation"> <strong>1. Unbiasedness Confirmation:</strong><br> • Simulated mean of $\hat{p}$ ≈ population $p$ (0.60)<br> • Empirical SE ≈ theoretical SE<br> • Demonstrates $\hat{p}$ is an unbiased estimator </div> <div class="step-box" data-tooltip="Practical applications"> <strong>2. Practical Applications:</strong><br> • Quality control: Probability of defect rate exceeding threshold<br> • Election forecasting: Chance of candidate getting certain vote percentage<br> • Medical studies: Probability of treatment success rate </div> <div class="step-box" data-tooltip="Decision making implications"> <strong>3. Decision Making:</strong><br> • With 15.39% probability, a sample of 100 could show >65% support<br> • Important for interpreting survey/margin of error </div> </div><h2 title="Real-World Applications">Real-World Applications</h2><div class="reference-box" data-tooltip="Real-world applications of proportion analysis"> <strong>Common Applications in Various Fields:</strong> <ol> <li><strong>Political Polling:</strong> <div class="step-box" data-tooltip="Political polling applications"> • Estimating candidate support<br> • Calculating margin of error<br> • Determining sample size needed for desired precision </div> </li>

<li><strong>Quality Control:</strong>
<div class="step-box" data-tooltip="Quality control applications">
• Monitoring defect rates in manufacturing<br>
• Setting control limits for proportion defective<br>
• Determining acceptable quality levels
</div>
</li>

<li><strong>Medical Research:</strong>
<div class="step-box" data-tooltip="Medical research applications">
• Estimating treatment success rates<br>
• Comparing proportions between treatment groups<br>
• Calculating required sample size for clinical trials
</div>
</li>

<li><strong>Market Research:</strong>
<div class="step-box" data-tooltip="Market research applications">
• Estimating market share<br>
• Calculating customer satisfaction rates<br>
• Determining brand preference proportions
</div>
</li>
</ol> </div><h3 title="Statistical Formulas for Practice">Statistical Formulas for Practice</h3><div class="formula-box" data-tooltip="Important statistical formulas"> <strong>Confidence Interval for Proportion:</strong><br> $$ \hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$
<strong>Margin of Error Formula:</strong>

$$
ME = z_{\alpha/2} \times \sqrt{\frac{p(1-p)}{n}}
$$


<strong>Sample Size Determination:</strong>

$$
n = \left( \frac{z_{\alpha/2}}{ME} \right)^2 \, p(1-p)
$$
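
As a rough illustration of the three formulas above, the chunk below computes a 95% confidence interval and a required sample size. The survey numbers (65 successes out of 100) and the target margin of error (0.03) are hypothetical values chosen only for this sketch, and $p = 0.5$ is used as the conservative planning value.

```{r}
# Sketch: confidence interval, margin of error, and sample size for a proportion
conf_level <- 0.95
z_crit <- qnorm(1 - (1 - conf_level) / 2)   # z_{alpha/2}, about 1.96

# Hypothetical survey result (assumed for illustration)
x_success <- 65
n_survey  <- 100
p_hat     <- x_success / n_survey

# Margin of error and confidence interval based on p_hat
me_survey <- z_crit * sqrt(p_hat * (1 - p_hat) / n_survey)
ci_wald   <- c(lower = p_hat - me_survey, upper = p_hat + me_survey)

# Sample size needed for a target margin of error, using p = 0.5
me_target  <- 0.03
p_plan     <- 0.5
n_required <- ceiling((z_crit / me_target)^2 * p_plan * (1 - p_plan))

cat("Margin of error:", round(me_survey, 4), "\n")
cat("95% CI: [", round(ci_wald["lower"], 4), ",", round(ci_wald["upper"], 4), "]\n")
cat("Required n for ME = 0.03:", n_required, "\n")   # 1068
```

The result is rounded up with `ceiling()` because a fractional respondent is not possible and rounding down would miss the target margin of error.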


<div class="step-box" data-tooltip="Note about sample size calculation"> <strong>Note:</strong> When $p$ is unknown, use $p = 0.5$, which maximizes $p(1-p)$ and therefore gives the largest (most conservative) sample size </div> </div><h2 title="Important Considerations">Important Considerations</h2><div class="concept-intro" data-tooltip="Critical points to remember"> <strong>Critical Points to Remember:</strong> <ol> <li><strong>Independence Assumption:</strong> Observations must be independent (random sampling)</li> <li><strong>10% Condition:</strong> Sample should be ≤ 10% of population when sampling without replacement</li> <li><strong>Success-Failure Condition:</strong> Must check $np \ge 10$ and $n(1-p) \ge 10$ for normal approximation</li> <li><strong>Continuity Correction:</strong> For small samples, consider shifting the cutoff by $\pm 0.5/n$ to improve the normal approximation</li> </ol> </div><div class="example-box" data-tooltip="Alternative methods when normal approximation fails"> <strong>When Normal Approximation Fails:</strong> <div class="step-box" data-tooltip="Alternative statistical methods"> <strong>Alternative Methods:</strong><br> 1. <strong>Exact Binomial Test:</strong> Use when $n$ is small or $p$ is extreme<br> 2. <strong>Wilson Score Interval:</strong> Better for small samples or extreme proportions<br> 3. <strong>Clopper-Pearson Interval:</strong> Conservative exact method<br> 4. <strong>Simulation/Bootstrap:</strong> Resampling methods for any sample size </div> </div><h3 title="Reference Books and Resources">Reference Books and Resources</h3><div class="reference-box" data-tooltip="Recommended textbooks"> <strong>Recommended Textbooks:</strong> <ol> <li><strong>"Statistics for Business and Economics"</strong> by Anderson, Sweeney, and Williams <div class="step-box" data-tooltip="Book details"> • Chapter 7: Sampling and Sampling Distributions<br> • Excellent practical examples with business applications </div> </li>

<li><strong>"Introductory Statistics"</strong> by Prem S. Mann
<div class="step-box" data-tooltip="Book details">
• Chapter 7: Sampling Distributions<br>
• Clear explanations with step-by-step solutions
</div>
</li>

<li><strong>"The Practice of Statistics"</strong> by Starnes, Tabor, Yates, and Moore
<div class="step-box" data-tooltip="Book details">
• Chapter 7: Sampling Distributions<br>
• Modern approach with emphasis on data analysis
</div>
</li>

<li><strong>"Statistical Inference"</strong> by George Casella and Roger L. Berger
<div class="step-box" data-tooltip="Book details">
• Chapter 5: Properties of a Random Sample<br>
• Theoretical foundation for sampling distributions
</div>
</li>

<li><strong>"OpenIntro Statistics"</strong> by Diez, Barr, and Çetinkaya-Rundel
<div class="step-box" data-tooltip="Book details">
• Free online textbook<br>
• Chapter 4: Foundations for Inference<br>
• Includes R examples and practice problems
</div>
</li>
</ol> </div>


<h1 title="Binomial Distribution Review">Binomial Distribution Review</h1>

<!-- Video link placed below the chapter title -->
<div class="video-box" data-tooltip="Watch comprehensive review of statistical distributions">
<strong>Video Review:</strong>
<a href="https://youtu.be/c0mFEL_SWzE?si=LZdOZXSePZn_nIld" target="_blank">
Review: Sampling Distribution of the Sample Proportion, Binomial Distribution, Probability
</a>
</div>

<h2 title="Introduction to Binomial Distribution">Introduction to Binomial Distribution</h2>

<div class="concept-intro" data-tooltip="Basic concepts of binomial distribution">
The binomial distribution is one of the most fundamental discrete probability distributions in statistics. It models the number of successes in a fixed number of independent Bernoulli trials, where each trial has only two possible outcomes: success or failure.
</div>

<h2 title="Binomial Distribution Formulas">Binomial Distribution Formulas</h2>

<div class="formula-box" data-tooltip="Mathematical formulas for binomial distribution">
<strong>Binomial Probability Mass Function:</strong><br>
$$ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} $$<br>
  
  where:
  $n$ = number of trials<br>
  $k$ = number of successes<br>
  $p$ = probability of success on each trial<br>
  $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ = binomial coefficient<br><br>
  
  <strong>Mean of Binomial Distribution:</strong>
  $$ \mu = E(X) = np $$<br>
  
  Brief explanation:<br>
  $\mu$ = mean of the distribution<br>
  $E(X)$ = expected value of the random variable $X$<br>
  $n$ = number of trials<br>
  $p$ = probability of success on each trial<br><br>
  
  <strong>Variance of Binomial Distribution:</strong>
  $$ \sigma^2 = \operatorname{Var}(X) = np(1-p) $$<br>
  
  <strong>Standard Deviation:</strong>
  $$ \sigma = \sqrt{np(1-p)} $$<br>
  
  <strong>Conditions for Binomial Distribution:</strong><br>
  Fixed number of trials $(n)$<br>
  Independent trials<br>
  Two possible outcomes (success/failure)<br>
  Constant probability of success $(p)$<br><br>
  
  <strong>Normal Approximation to the Binomial:</strong><br>
  
  Normal approximation is appropriate if:<br>
  $np \ge 10$<br>
  $n(1-p) \ge 10$<br><br>
  
  <strong>Normal Approximation:</strong>
  $$ X \sim N\big(np,\; np(1-p)\big) \quad \text{(approximately)} $$<br>
  
  Brief explanation:<br>
  $X$ = binomial random variable<br>
  $N(\mu,\sigma^2)$ = normal distribution with mean $\mu$ and variance $\sigma^2$<br>
  $\mu = np$ = mean of the binomial distribution<br>
  $\sigma^2 = np(1-p)$ = variance of the binomial distribution<br>
  Used when $n$ is large and $p$ is not too close to 0 or 1
</div>
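
The formulas above can be checked numerically. The chunk below is a small sketch that assumes illustrative values $n = 50$ and $p = 0.3$ (not taken from the video): it compares the PMF formula with R's built-in `dbinom()`, recovers the mean and variance from the PMF, and compares an exact cumulative probability with its normal approximation.

```{r}
# Sketch: verify the binomial formulas for illustrative values n = 50, p = 0.3
n_trials  <- 50
p_success <- 0.3
k_values  <- 0:n_trials

# PMF from the formula and from the built-in function
pmf_manual  <- choose(n_trials, k_values) * p_success^k_values *
               (1 - p_success)^(n_trials - k_values)
pmf_builtin <- dbinom(k_values, size = n_trials, prob = p_success)

# Mean and variance computed directly from the PMF
mean_pmf <- sum(k_values * pmf_builtin)
var_pmf  <- sum((k_values - mean_pmf)^2 * pmf_builtin)

cat("Max difference between PMFs:", max(abs(pmf_manual - pmf_builtin)), "\n")
cat("Mean from PMF:", mean_pmf, " vs np =", n_trials * p_success, "\n")
cat("Variance from PMF:", var_pmf, " vs np(1-p) =",
    n_trials * p_success * (1 - p_success), "\n")

# Normal approximation (np = 15 and n(1-p) = 35, both at least 10)
k_cut     <- 18
p_exact   <- pbinom(k_cut, size = n_trials, prob = p_success)
p_approx  <- pnorm(k_cut + 0.5,                       # continuity correction
                   mean = n_trials * p_success,
                   sd   = sqrt(n_trials * p_success * (1 - p_success)))
cat("P(X <= 18) exact:", round(p_exact, 4),
    " normal approx:", round(p_approx, 4), "\n")
```

Note that `pnorm()` takes the standard deviation, not the variance, so the square root of $np(1-p)$ is passed as `sd`.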

<h2 title="Example Problem: Binomial Distribution">Example Problem: Binomial Distribution</h2>

<div class="example-box" data-tooltip="Step-by-step binomial distribution problem">
<strong>Problem:</strong> A multiple-choice test has 20 questions, each with 5 choices. A student guesses randomly on all questions. What is the probability that the student gets exactly 5 questions correct?<br><br>

<strong>Solution Step by Step:</strong>
  
<div class="step-box" data-tooltip="Step 1: Identify parameters">
<strong>Step 1:</strong> Identify parameters<br>
$n = 20$ (number of questions)<br>
$p = \frac{1}{5} = 0.2$ (probability of guessing correctly)<br>
$k = 5$ (number of correct answers wanted)
</div>
  
<div class="step-box" data-tooltip="Step 2: Calculate binomial coefficient">
<strong>Step 2:</strong> Calculate binomial coefficient<br>
$\binom{20}{5} = \frac{20!}{5!15!} = \frac{20 \times 19 \times 18 \times 17 \times 16}{5 \times 4 \times 3 \times 2 \times 1} = 15504$
</div>
  
<div class="step-box" data-tooltip="Step 3: Apply binomial formula">
<strong>Step 3:</strong> Apply binomial formula<br>
$P(X = 5) = \binom{20}{5} (0.2)^5 (0.8)^{15}$
</div>
  
<div class="step-box" data-tooltip="Step 4: Calculate each component">
<strong>Step 4:</strong> Calculate each component<br>
$(0.2)^5 = 0.00032$<br>
$(0.8)^{15} \approx 0.0351844$
</div>
  
<div class="step-box" data-tooltip="Step 5: Multiply all components">
<strong>Step 5:</strong> Multiply all components<br>
$P(X = 5) \approx 15504 \times 0.00032 \times 0.0351844 \approx 15504 \times 0.0000112590 \approx 0.1746$
</div>
  
<br>
<strong>Answer:</strong> The probability that the student gets exactly 5 questions correct is approximately 17.46%.
</div>
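
The hand calculation can be reproduced in R. This is a minimal sketch of the same problem ($n = 20$, $p = 0.2$, $k = 5$); the variable names are illustrative.

```{r}
# Sketch: check the multiple-choice example (20 questions, 5 choices each)
n_questions <- 20
p_guess     <- 1 / 5
k_correct   <- 5

# Step 2: binomial coefficient
coef_20_5 <- choose(n_questions, k_correct)   # 15504

# Steps 3-5: apply the PMF formula by hand
p_manual <- coef_20_5 * p_guess^k_correct *
            (1 - p_guess)^(n_questions - k_correct)

# Same probability from the built-in binomial PMF
p_builtin <- dbinom(k_correct, size = n_questions, prob = p_guess)

cat("Binomial coefficient:", coef_20_5, "\n")
cat("P(X = 5) by formula: ", round(p_manual, 4), "\n")   # 0.1746
cat("P(X = 5) via dbinom: ", round(p_builtin, 4), "\n")  # 0.1746
```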

<h2 title="Book Reference for Binomial Distribution">Book Reference</h2>

<div class="reference-box" data-tooltip="Textbook reference for binomial distribution">
<strong>Ross, S.M.</strong> (2014). <em>A First Course in Probability</em>. Pearson.<br>
<strong>Chapter 4:</strong> Discrete Random Variables<br>
<strong>Pages:</strong> 125-160<br>
<strong>Relevance:</strong> Excellent coverage of binomial distribution with numerous examples and applications.
</div>

<h1 title="Week 11 Summary and Conclusion">Summary and Conclusion</h1>

<div class="summary-container" data-tooltip="Week 11 key takeaways summary">
<div class="concept-intro" data-tooltip="Main concepts learned in Week 11">
<strong>Key Takeaways from Week 11:</strong><br><br>
    
<div class="summary-points">
<div class="summary-point" data-tooltip="Continuous distributions concepts">
<strong>1. Continuous Probability Distributions:</strong><br>
Understanding PDF and CDF is fundamental for working with continuous variables
</div>
      
<div class="summary-point" data-tooltip="Sampling distributions importance">
<strong>2. Sampling Distributions:</strong><br>
The distribution of sample statistics forms the basis for statistical inference
</div>
      
<div class="summary-point" data-tooltip="Central Limit Theorem significance">
<strong>3. Central Limit Theorem:</strong><br>
One of the most powerful results in statistics, enabling normal approximations
</div>
      
<div class="summary-point" data-tooltip="Proportions sampling applications">
<strong>4. Sampling Distribution of Proportions:</strong><br>
Essential for categorical data analysis and proportion inference
</div>
      
<div class="summary-point" data-tooltip="Binomial distribution fundamentals">
<strong>5. Binomial Distribution:</strong><br>
Foundation for understanding proportions and binary outcomes
</div>
</div>
</div>
</div>

<div class="footer-section" data-tooltip="Assignment completion information">
<p><strong>Week 11 - Probability Distribution</strong></p>
<p class="footer-text">end | Statistics for Data Science</p>
</div>

