Nama: Adinda Adelia Futri
NIM: 52250055
| Characteristic | Discrete Random Variable | Continuous Random Variable |
|---|---|---|
| Possible Values | Countable, finite or countably infinite | Uncountable, infinite within interval |
| Probability at Point | P(X = x) can be positive | P(X = x) = 0 for all x |
| Probability Function | Probability Mass Function (PMF) | Probability Density Function (PDF) |
| Total Probability | ∑ P(X = x) = 1 | ∫ f(x) dx = 1 |
| Cumulative Distribution | F(x) = ∑ P(X = t) for t ≤ x | F(x) = ∫ f(t) dt from -∞ to x |
| Example Distributions | Binomial, Poisson, Geometric | Normal, Exponential, Uniform |
| Real-world Examples | Number of customers, defect counts | Height, temperature, waiting time |
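As a quick illustration of the "Probability at Point" and "Total Probability" rows, the short R sketch below uses a Binomial(10, 0.3) PMF and the standard normal PDF as assumed examples (these particular distributions are illustrative choices, not part of the material above).

```r
# Illustrative check (assumed examples: Binomial(10, 0.3) and the standard normal)
sum(dbinom(0:10, size = 10, prob = 0.3))            # discrete: PMF values sum to 1
integrate(dnorm, lower = -Inf, upper = Inf)$value   # continuous: PDF integrates to 1
dnorm(1.5)   # a density value, not a probability; P(X = 1.5) = 0 for a continuous X
pnorm(1.5)   # CDF value: P(X <= 1.5)
```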
Problem: The time required to complete a standardized test follows a normal distribution with mean 120 minutes and standard deviation 15 minutes. What is the probability that a randomly selected student will complete the test in less than 100 minutes?
Solution Step by Step:
Step 1: Identify the parameters: \(\mu = 120\), \(\sigma = 15\), \(x = 100\).
Step 2: Compute the z-score: \(z = \frac{100 - 120}{15} \approx -1.33\).
Step 3: Find the probability from the standard normal distribution: \(P(X < 100) = P(Z < -1.33) \approx 0.0912\).
Answer: The probability that a randomly selected student completes the test in less than 100 minutes is about 9.1%.
Visual Representation:
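A minimal base-R sketch (an illustration, assuming \(X \sim N(120, 15^2)\) exactly as stated in the problem) that checks the calculation and shades the region corresponding to \(P(X < 100)\):

```r
# Verify P(X < 100) for X ~ N(120, 15^2) and shade the corresponding area
mu <- 120; sigma <- 15
pnorm(100, mean = mu, sd = sigma)   # approximately 0.0912

x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 400)
plot(x, dnorm(x, mu, sigma), type = "l", xlab = "Minutes", ylab = "Density",
     main = "P(X < 100) for X ~ N(120, 15^2)")
xs <- seq(min(x), 100, length.out = 200)
polygon(c(min(xs), xs, max(xs)), c(0, dnorm(xs, mu, sigma), 0),
        col = "grey80", border = NA)
lines(x, dnorm(x, mu, sigma))   # redraw the curve over the shaded area
```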
Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K. (2012). Probability & Statistics for Engineers & Scientists. Pearson Education.
Chapter 4: Continuous Random Variables and Probability Distributions
Pages: 110-156
Relevance: This textbook provides comprehensive coverage of continuous probability distributions with engineering applications.
Sampling Distribution of Sample Proportion (\(\hat{p}\)):
\[ \hat{p} = \frac{X}{n} \]
where: \(X\) = number of successes in the sample, \(n\) = sample size
\(E(\hat{p}) = p\)
Explanation: This formula states that the expected value of the sample proportion estimator \(\hat{p}\) is equal to the true population proportion \(p\). This means that \(\hat{p}\) is an unbiased estimator of the population proportion.
\(\hat{p}\) = sample proportion
\(p\) = population proportion
\(E(\hat{p})\) = expected value (mean) of the sample proportion, which equals the population proportion
Variance of Sampling Distribution: \(\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}\)
Standard Error: \[ SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \]
Normal Approximation Conditions: For the sampling distribution to be approximately normal:
\(np \ge 10\)
\(n(1-p) \ge 10\)
Normal Approximation Formula: \[ \hat{p} \sim N\left(p,\; \frac{p(1-p)}{n}\right) \]
Z-score for Proportion: \[ z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \]
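A small sketch applying these formulas; the numeric inputs (p = 0.5, n = 200, observed \(\hat{p} = 0.56\)) are illustrative assumptions, not values from the material above.

```r
# Illustrative example (assumed values): SE and z-score for a sample proportion
p <- 0.5; n <- 200; p_hat <- 0.56
se <- sqrt(p * (1 - p) / n)        # standard error of p-hat
z  <- (p_hat - p) / se             # z-score
c(SE = se, z = z, prob_above = 1 - pnorm(z))   # P(p-hat > 0.56) under the normal approximation
```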
| Aspect | Sample Mean (\(\bar{X}\)) | Sample Proportion (\(\hat{p}\)) |
|---|---|---|
| Data Type | Continuous numerical data | Categorical/binary data |
| Formula | \(\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\) | \(\hat{p} = \frac{X}{n}\) |
| Expected Value | \(E(\bar{X}) = \mu\) | \(E(\hat{p}) = p\) |
| Standard Error | \(SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}\) | \(SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}\) |
| Distribution | \(N(\mu, \frac{\sigma^2}{n})\) for large n | \(N(p, \frac{p(1-p)}{n})\) when np≥10, n(1-p)≥10 |
| Applications | Average height, test scores, income | Voting percentages, defect rates, success rates |
1. Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K. (2012). Probability & Statistics for Engineers & Scientists. Pearson Education.
Chapter: 7 - Sampling Distributions
Pages: 245-280
Relevance: Comprehensive coverage of sampling distributions including proportion distributions.
2. Devore, J.L. (2015). Probability and Statistics for Engineering and the Sciences. Cengage Learning.
Chapter: 6 - Statistics and Sampling Distributions
Pages: 270-310
Relevance: Practical applications of sampling distributions in engineering contexts.
3. Montgomery, D.C., & Runger, G.C. (2018). Applied Statistics and Probability for Engineers. Wiley.
Chapter: 7 - Point Estimation of Parameters and Sampling Distributions
Pages: 255-290
Relevance: Engineering-focused approach to proportion estimation and sampling theory.
Based on Video: The Central Limit Theorem Explained
## === CENTRAL LIMIT THEOREM DEMONSTRATION ===
## Creating population distributions...
## Running CLT simulations...
##
## === SIMULATION RESULTS (n = 30 , samples = 1000 ) ===
## Exponential Distribution:
## Population mean (μ): 1.004
## Population SD (σ): 1
## Theoretical SE: 0.183
## Empirical mean of sample means: 1.003
## Empirical SE: 0.18
## Difference in means: 0
## Ratio SE(empirical)/SE(theoretical): 0.984
##
## Uniform Distribution:
## Population mean (μ): 4.998
## Population SD (σ): 2.894
## Theoretical SE: 0.528
## Empirical mean of sample means: 5
## Empirical SE: 0.514
## Difference in means: 0.002
## Ratio SE(empirical)/SE(theoretical): 0.973
##
## Bimodal Distribution:
## Population mean (μ): 49.936
## Population SD (σ): 20.735
## Theoretical SE: 3.786
## Empirical mean of sample means: 49.836
## Empirical SE: 3.89
## Difference in means: 0.1
## Ratio SE(empirical)/SE(theoretical): 1.028
## Creating visualizations...
##
## === INTERPRETATION ===
## Based on the Central Limit Theorem (CLT) simulation above, several key conclusions can be drawn:
## 1. DISTRIBUTION SHAPE:
## - Exponential distribution (top left): strongly skewed to the right
## - Uniform distribution (middle left): flat, rectangular shape
## - Bimodal distribution (bottom left): two distinct peaks
## - Yet the distributions of the sample means for all three (right column) are approximately normal
## 2. CONSISTENCY OF RESULTS:
## - The empirical mean of the sample means is close to the population mean (μ)
## - The empirical standard error is close to the theoretical standard error (σ/√n)
## - The ratio SE(empirical)/SE(theoretical) is close to 1 for all distributions
## 3. PRACTICAL IMPLICATIONS:
## - The CLT holds even when the population distribution is not normal
## - With a sample size of n = 30, the distribution of sample means is already close to normal
## - This justifies standard statistical inference (hypothesis tests, confidence intervals)
## - The CLT underlies many parametric statistical methods
## 4. IMPORTANCE OF SAMPLE SIZE:
## - The larger n is, the closer the distribution of sample means is to normal
## - For heavily skewed populations, a larger n may be required
## - Rule of thumb: n ≥ 30 is usually sufficient to apply the CLT
## === SIMULATION COMPLETE ===
##
## The plot has been saved as 'clt_simulation_rpubs.png'
##
## === INSTRUCTIONS FOR RPUBS ===
## 1. Install the packages if needed: install.packages(c('ggplot2', 'gridExtra'))
## 2. Copy the entire code into RStudio
## 3. Run all of the code (Ctrl + A, then Ctrl + Enter)
## 4. The plot will appear in the Plot pane and be saved as a PNG file
## 5. For RPubs: upload this R file or publish directly from RStudio
Reference Books:
1. Montgomery, D.C., & Runger, G.C. (2018). Applied Statistics and Probability for Engineers. Wiley.
Chapter: 7 - Point Estimation and Sampling Distributions
Pages: 255-290
Relevance: Excellent engineering-focused explanation of CLT with practical examples.
2. Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K. (2012). Probability & Statistics for Engineers & Scientists. Pearson Education.
Chapter: 8 - Fundamental Sampling Distributions
Pages: 281-320
Relevance: Comprehensive coverage of sampling distributions including CLT proofs.
3. Devore, J.L. (2015). Probability and Statistics for Engineering and the Sciences. Cengage Learning.
Chapter: 6 - Statistics and Sampling Distributions
Pages: 270-310
Relevance: Practical applications of CLT in engineering and scientific contexts.
Video: The Central Limit Theorem Explained
What is CLT?
The Central Limit Theorem states that the sampling distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution. This enables statistical inference about population parameters.
\[ SE = \frac{\sigma}{\sqrt{n}} \]
library(ggplot2)
set.seed(123)
# Parameters
pop_size <- 5000
n <- 30
samples <- 1000
# Skewed population
pop_skewed <- rexp(pop_size, rate = 1)
sample_means <- numeric(samples)
for (i in 1:samples) {
  sample_means[i] <- mean(sample(pop_skewed, n, replace = TRUE))
}
# Create plots
p1 <- ggplot(data.frame(x = pop_skewed), aes(x = x)) +
  geom_histogram(aes(y = ..density..), bins = 40,
                 fill = "#8D6E63", alpha = 0.6) +
  labs(title = "Population Distribution (Exponential)",
       x = "Value", y = "Density") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12),
    plot.background = element_rect(fill = "#F5F1E8"),
    panel.background = element_rect(fill = "#F5F1E8")
  )

p2 <- ggplot(data.frame(x = sample_means), aes(x = x)) +
  geom_histogram(aes(y = ..density..), bins = 40,
                 fill = "#A1887F", alpha = 0.6) +
  stat_function(fun = dnorm,
                args = list(mean = mean(sample_means),
                            sd = sd(sample_means)),
                color = "#5D4037", size = 0.8) +
  labs(title = paste("Sample Means (n =", n, ")"),
       x = "Sample Mean", y = "Density") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12),
    plot.background = element_rect(fill = "#F5F1E8"),
    panel.background = element_rect(fill = "#F5F1E8")
  )

library(gridExtra)
grid.arrange(p1, p2, ncol = 2,
             top = grid::textGrob(
               "Central Limit Theorem Demonstration",
               gp = grid::gpar(fontsize = 14, fontface = "bold", col = "#5D4037")
             ))
mu <- 100
sigma <- 2
n <- 36
# Calculations
se <- sigma / sqrt(n)
z <- (99.5 - mu) / se
p_val <- pnorm(z)
ci_lower <- mu - 1.96 * se
ci_upper <- mu + 1.96 * se
# Soft brown color
color <- "#8B5E3C"
output <- paste0(
  "<span style='color:", color, "; font-weight:600;'>Standard Error: ", round(se, 3), " cm</span><br>",
  "<span style='color:", color, "; font-weight:600;'>Z-score for 99.5 cm: ", round(z, 3), "</span><br>",
  "<span style='color:", color, "; font-weight:600;'>Probability: ", round(p_val, 4), "</span><br>",
  "<span style='color:", color, "; font-weight:600;'>95% CI: [",
  round(ci_lower, 2), ", ", round(ci_upper, 2), "] cm</span>"
)
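The string built above is raw HTML and is never printed in the chunk itself; one way to display it when the document is knitted (a usage note, not part of the original code) is a chunk with `results = 'asis'`:

```r
# In an R Markdown chunk with results = 'asis', emit the raw HTML string with:
cat(output)
```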
| Sample Size | Standard Error | 95% CI Width |
|---|---|---|
| n = 9 | 0.667 | 2.61 cm |
| n = 36 | 0.333 | 1.31 cm |
| n = 100 | 0.200 | 0.78 cm |
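The table values can be reproduced with a short sketch, using \(\sigma = 2\) cm from the code above and a 95% CI width of \(2 \times 1.96 \times SE\):

```r
# Reproduce the SE and 95% CI width columns for sigma = 2 cm
sigma <- 2
n <- c(9, 36, 100)
se <- sigma / sqrt(n)
data.frame(n = n, SE = round(se, 3), CI_width_cm = round(2 * 1.96 * se, 2))
```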
Essential Formulas:
\[ SE = \frac{\sigma}{\sqrt{n}} \]
\[ Z = \frac{\bar{X} - \mu}{SE} \]
\[ \bar{X} \pm z_{\alpha/2} \times SE \]
| Aspect | Sample Mean (\(\bar{x}\)) | Sample Proportion (\(\hat{p}\)) |
|---|---|---|
| Population Parameter | \(\mu\) (population mean) | \(p\) (population proportion) |
| Sample Statistic | \(\bar{x} = \frac{\sum x_i}{n}\) | \(\hat{p} = \frac{X}{n}\) |
| Sampling Distribution Mean | \(\mu_{\bar{x}} = \mu\) | \(\mu_{\hat{p}} = p\) |
| Standard Error Formula | \(SE = \frac{\sigma}{\sqrt{n}}\) | \(SE = \sqrt{\frac{p(1-p)}{n}}\) |
| Normality Conditions | \(n \ge 30\) (CLT) | \(np \ge 10\) and \(n(1-p) \ge 10\) |
| Data Type | Quantitative (continuous) | Categorical (binary: success/failure) |
| Distribution Shape | Normal for large \(n\) | Normal when conditions met |
## === SAMPLING DISTRIBUTION SIMULATION RESULTS ===
## Population proportion (p): 0.6
## Sample size (n): 100
## Number of simulations: 10000
## Mean of simulated p̂: 0.6006
## Theoretical Standard Error: 0.049
## Empirical Standard Error: 0.049
## === NORMALITY CONDITIONS CHECK ===
## np = 60
## n(1-p) = 40
## ✓ Conditions met: Normal approximation is valid
##
## === PROBABILITY CALCULATION ===
## P(p̂ > 0.65 ):
## Z-score: 1.021
## Probability: 0.1537
## Percentage: 15.37 %
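A minimal sketch that reproduces the simulation and probability calculation summarized above (p = 0.6, n = 100, 10,000 replications; the exact numbers will differ slightly depending on the random seed):

```r
# Simulate the sampling distribution of p-hat and compare with theory
set.seed(123)
p <- 0.6; n <- 100; nsim <- 10000
p_hat <- rbinom(nsim, size = n, prob = p) / n
c(mean_p_hat = mean(p_hat),
  SE_theoretical = sqrt(p * (1 - p) / n),
  SE_empirical = sd(p_hat))

# P(p-hat > 0.65) under the normal approximation
z <- (0.65 - p) / sqrt(p * (1 - p) / n)
1 - pnorm(z)   # roughly 0.15
```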
• Monitoring defect rates in manufacturing
• Setting control limits for proportion defective
• Determining acceptable quality levels
• Estimating treatment success rates
• Comparing proportions between treatment groups
• Calculating required sample size for clinical trials
• Estimating market share
• Calculating customer satisfaction rates
• Determining brand preference proportions
Confidence Interval for Proportion:
\[ \hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
Margin of Error Formula:
\[ ME = z_{\alpha/2} \times \sqrt{\frac{p(1-p)}{n}} \]
Sample Size Determination:
\[ n = \left( \frac{z_{\alpha/2}}{ME} \right)^2 \, p(1-p) \]
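A short sketch applying the three formulas above; the inputs (130 successes out of n = 200, 95% confidence, target margin of error 0.03) are illustrative assumptions for the example:

```r
# Confidence interval, margin of error, and required sample size for a proportion
x <- 130; n <- 200; conf <- 0.95
z <- qnorm(1 - (1 - conf) / 2)              # 1.96 for 95% confidence
p_hat <- x / n
me <- z * sqrt(p_hat * (1 - p_hat) / n)     # margin of error at p-hat
c(lower = p_hat - me, upper = p_hat + me)   # confidence interval for p

# Required n for a target margin of error of 0.03, using p = 0.5 as the conservative value
ceiling((z / 0.03)^2 * 0.5 * (1 - 0.5))
```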
• Chapter 7: Sampling Distributions
• Clear explanations with step-by-step solutions
• Chapter 7: Sampling Distributions
• Modern approach with emphasis on data analysis
• Chapter 5: Properties of a Random Sample
• Theoretical foundation for sampling distributions
• Free online textbook
• Chapter 4: Foundations for Inference
• Includes R examples and practice problems
Video Review: Review: Sampling Distribution of the Sample Proportion, Binomial Distribution, Probability
The binomial distribution is one of the most fundamental discrete probability distributions in statistics. It models the number of successes in a fixed number of independent Bernoulli trials, where each trial has only two possible outcomes: success or failure.
Binomial Probability Mass Function:
\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]
where: \(n\) = number of trials
\(k\) = number of successes
\(p\) = probability of success on each trial
\(\binom{n}{k} = \frac{n!}{k!(n-k)!}\) = binomial coefficient
\[ \text{Mean of Binomial Distribution:} \quad \mu = E(X) = n p \]
Brief explanation:
\(\mu\) = mean of the distribution
\(E(X)\) = expected value of the random variable \(X\)
\(n\) = number of trials
\(p\) = probability of success on each trial
Variance of Binomial Distribution:
\[ \sigma^2 = \operatorname{Var}(X) = np(1-p) \]
Standard Deviation:
\[ \sigma = \sqrt{np(1-p)} \]
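In R, these quantities map directly onto built-in functions; the sketch below assumes an illustrative Binomial(n = 12, p = 0.3):

```r
# PMF, mean, variance, and standard deviation for an assumed Binomial(12, 0.3)
n <- 12; p <- 0.3
dbinom(4, size = n, prob = p)    # P(X = 4) from the PMF
n * p                            # mean: mu = np
n * p * (1 - p)                  # variance: sigma^2 = np(1-p)
sqrt(n * p * (1 - p))            # standard deviation
```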
Conditions for Binomial Distribution:
Fixed number of trials \((n)\)
Independent trials
Two possible outcomes (success/failure)
Constant probability of success \((p)\)
Normal Approximation to the Binomial:
Normal approximation is appropriate if:
\(np \ge 10\)
\(n(1-p) \ge 10\)
\[ \text{Normal Approximation:} \quad X \sim N \big( n p, \, n p (1 - p) \big) \]
Brief explanation:
\(X\) = binomial random variable
\(N(\mu,\sigma^2)\) = normal distribution with mean \(\mu\) and variance \(\sigma^2\)
\(\mu = np\) = mean of the binomial distribution
\(\sigma^2 = np(1-p)\) = variance of the binomial distribution
Used when \(n\) is large and \(p\) is not too close to 0 or 1
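A quick check of the approximation against the exact binomial probability; the values n = 100, p = 0.3 are illustrative, and the continuity correction of 0.5 is an extra refinement not discussed above:

```r
# Exact binomial probability vs. normal approximation for P(X <= 25), n = 100, p = 0.3
n <- 100; p <- 0.3
pbinom(25, size = n, prob = p)                               # exact binomial
pnorm(25 + 0.5, mean = n * p, sd = sqrt(n * p * (1 - p)))    # normal approximation with continuity correction
```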
Problem: A multiple-choice test has 20 questions, each with 5 choices. A student guesses randomly on all questions. What is the probability that the student gets exactly 5 questions correct?
Solution Step by Step:
Step 1: Identify parameters
\(n = 20\) (number of questions)
\(p = \frac{1}{5} = 0.2\) (probability of guessing correctly)
\(k = 5\) (number of correct answers wanted)
Step 2: Calculate the binomial coefficient
\(\binom{20}{5} = \frac{20!}{5!\,15!} = \frac{20 \times 19 \times 18 \times 17 \times 16}{5 \times 4 \times 3 \times 2 \times 1} = 15504\)
Step 3: Apply the binomial formula
\(P(X = 5) = \binom{20}{5} (0.2)^5 (0.8)^{15}\)
Step 4: Calculate each component
\((0.2)^5 = 0.00032\)
\((0.8)^{15} = 0.03518\)
Step 5: Multiply all components
\(P(X = 5) = 15504 \times 0.00032 \times 0.03518 = 15504 \times 0.0000112576 \approx 0.1746\)
Answer: The probability that the student gets exactly 5 questions correct is about 17.46%.
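The worked example can be verified directly in R:

```r
# Check the binomial coefficient from Step 2 and the probability from Step 3
choose(20, 5)                      # 15504
dbinom(5, size = 20, prob = 0.2)   # approximately 0.1746
```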
Ross, S.M. (2014). A First Course in Probability. Pearson.
Chapter 4: Discrete Random Variables
Pages: 125-160
Relevance: Excellent coverage of binomial distribution with numerous examples and applications.
Key Takeaways from Week 11:
1. Continuous Probability Distributions: Understanding PDF and CDF is fundamental for working with continuous variables
2. Sampling Distributions: The distribution of sample statistics forms the basis for statistical inference
3. Central Limit Theorem: One of the most powerful results in statistics, enabling normal approximations
4. Sampling Distribution of Proportions: Essential for categorical data analysis and proportion inference
5. Binomial Distribution: Foundation for understanding proportions and binary outcomes