Definition: Statistics is the branch of science concerned with developing and applying methods for collecting, organizing, analyzing, and interpreting numerical data in order to support decision-making under uncertainty (Walpole et al., 2012).
| Aspect | Descriptive Statistics | Inferential Statistics |
|---|---|---|
| Purpose | Summarize the observed data | Generalize to the population |
| Object | Sample or full population | Sample → conclusions about the population |
| Output | Mean, SD, histogram, boxplot | Hypothesis tests, confidence intervals |
| Uncertainty | None | Present (error \(\alpha\)) |
Data are classified by measurement scale, which determines which statistical operations are valid:
| Scale | Examples | Valid Operations |
|---|---|---|
| Nominal | Sex, color | Mode, frequency |
| Ordinal | Rank, Likert scale | Median, percentiles |
| Interval | Temperature (°C), IQ | Mean, SD (ratios are not meaningful) |
| Ratio | Weight, height, age | All statistical operations |
\[\bar{X} = \frac{\sum_{i=1}^n X_i}{n} \qquad \text{Median} = \text{middle value of the sorted data}\]
\[\text{Mode} = \text{most frequently occurring value}\]
\[s^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1} \qquad s = \sqrt{s^2}\]
\[\text{IQR} = Q_3 - Q_1 \qquad \text{CV} = \frac{s}{\bar{X}} \times 100\%\]
Problem 1.1: Basic Descriptive Analysis
Heights of 10 students (cm): 165, 170, 168, 172, 175, 160, 180, 169, 171, 173.
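The plotting code in this document calls `pal`, `theme_metstat()`, `grid.arrange()`, and `percent_format()`, none of which are defined in the text shown here; they presumably live in a hidden setup chunk. A minimal stand-in might look like the sketch below. It assumes the packages ggplot2, gridExtra, and scales are installed, and the hex colors and theme settings are illustrative placeholders rather than the author's originals.

```r
# Hypothetical setup chunk: packages the plots rely on
library(ggplot2)    # ggplot(), geom_*(), labs(), ...
library(gridExtra)  # grid.arrange()
library(scales)     # percent_format()

# Placeholder palette: the names match the keys used in the text,
# the hex values are assumptions
pal <- c(teal = "#2a9d8f", coral = "#e76f51", amber = "#e9c46a",
         indigo = "#3d405b", purple = "#7b5ea7")

# Placeholder theme standing in for the undefined theme_metstat()
theme_metstat <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(plot.title = element_text(face = "bold"))
}
```

Any palette with the same five names would work equally well, since the later code only looks colors up by name.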
tinggi <- c(165, 170, 168, 172, 175, 160, 180, 169, 171, 173)
# Full summary
cat("=== Statistik Deskriptif ===\n")
## === Statistik Deskriptif ===
## Mean : 170.30 cm
## Median : 170.50 cm
## Std Dev : 5.46 cm
## Varians : 29.79 cm²
## IQR : 4.50 cm
## CV : 3.20 %
## Min : 160 cm | Max: 180 cm
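The statements that produced the figures above are not shown in the text. One way to reproduce them with base R is sketched here; the vector name `stat` is an assumption made for illustration.

```r
tinggi <- c(165, 170, 168, 172, 175, 160, 180, 169, 171, 173)

stat <- c(
  mean   = mean(tinggi),
  median = median(tinggi),
  sd     = sd(tinggi),                     # sample SD (divisor n - 1)
  var    = var(tinggi),                    # sample variance
  iqr    = IQR(tinggi),                    # R's default quantile type 7
  cv     = sd(tinggi) / mean(tinggi) * 100 # coefficient of variation, in %
)
print(round(stat, 2))
```

Note that `IQR()` depends on the quantile definition in use; other software may report a slightly different value for the same data.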
# Visualisasi: Histogram + Boxplot berdampingan
df <- data.frame(tinggi = tinggi)
p1 <- ggplot(df, aes(x = tinggi)) +
geom_histogram(binwidth = 4, fill = pal["teal"], color = "white", alpha = 0.85) +
geom_vline(xintercept = mean(tinggi), color = pal["coral"], linewidth = 1.2,
linetype = "dashed") +
annotate("text", x = mean(tinggi) + 1.2, y = 2.5,
label = paste("Mean =", round(mean(tinggi),1)),
color = pal["coral"], hjust = 0, size = 4) +
labs(title = "Distribusi Tinggi Badan Mahasiswa",
subtitle = "Histogram dengan garis mean",
x = "Tinggi Badan (cm)", y = "Frekuensi") +
theme_metstat()
p2 <- ggplot(df, aes(y = tinggi, x = "")) +
geom_boxplot(fill = pal["amber"], color = pal["indigo"],
outlier.color = pal["coral"], outlier.size = 3,
linewidth = 0.8, width = 0.5) +
stat_summary(fun = mean, geom = "point", shape = 18,
size = 4, color = pal["coral"]) +
labs(title = "Boxplot Tinggi Badan",
subtitle = "Berlian merah = mean",
x = "", y = "Tinggi Badan (cm)") +
theme_metstat()
grid.arrange(p1, p2, ncol = 2)
Problem 1.2: Outlier Detection with Z-Scores
Use Z-scores to check whether any student's height qualifies as an outlier (|z| > 2).
z_scores <- (tinggi - mean(tinggi)) / sd(tinggi)
hasil <- data.frame(
Mahasiswa = paste("Mhs", 1:10),
Tinggi = tinggi,
Z_Score = round(z_scores, 3),
Status = ifelse(abs(z_scores) > 2, "⚠️ Pencilan", "✅ Normal")
)
print(hasil)
## Mahasiswa Tinggi Z_Score Status
## 1 Mhs 1 165 -0.971 ✅ Normal
## 2 Mhs 2 170 -0.055 ✅ Normal
## 3 Mhs 3 168 -0.421 ✅ Normal
## 4 Mhs 4 172 0.311 ✅ Normal
## 5 Mhs 5 175 0.861 ✅ Normal
## 6 Mhs 6 160 -1.887 ✅ Normal
## 7 Mhs 7 180 1.777 ✅ Normal
## 8 Mhs 8 169 -0.238 ✅ Normal
## 9 Mhs 9 171 0.128 ✅ Normal
## 10 Mhs 10 173 0.495 ✅ Normal
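💡 Tip: As a rule of thumb, values with \(|Z| > 2\) can be treated as mild outliers and values with \(|Z| > 3\) as extreme outliers. In a sample of size \(n\), \(|Z|\) computed with the sample SD can never exceed \((n-1)/\sqrt{n}\), which is about 2.85 for \(n = 10\), so the \(|Z| > 3\) rule cannot trigger on this data set.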
Probability is a quantitative measure of the uncertainty of an event, taking values between 0 (impossible) and 1 (certain) (Ross, 2014).
For every event \(A\) in the sample space \(S\): \(0 \le P(A) \le 1\), with \(P(S) = 1\) and \(P(\emptyset) = 0\).
\[{}_{n}P_r = \frac{n!}{(n-r)!} \qquad \text{(order matters)}\]
\[{}_{n}C_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!} \qquad \text{(order does not matter)}\]
Multiplication rule: if event 1 can occur in \(m\) ways and event 2 can occur in \(n\) ways, the two together can occur in \(m \times n\) ways.
\[P(A \cup B) = P(A) + P(B) - P(A \cap B)\] \[P(A^c) = 1 - P(A)\] \[P(A \cup B \cup C) = P(A)+P(B)+P(C) - P(A\cap B) - P(A\cap C) - P(B\cap C) + P(A\cap B\cap C)\]
Problem 2.1: Permutations (Competition)
From 10 finalists, a 1st, 2nd, and 3rd place winner are chosen. How many possible orderings of winners are there?
n <- 10; r <- 3
permutasi <- factorial(n) / factorial(n - r)
cat(sprintf("10P3 = %d! / %d! = %d susunan\n", n, n-r, permutasi))
## 10P3 = 10! / 7! = 720 susunan
Problem 2.2: Combinations (Team Selection)
From 12 applicants, 4 assistants are chosen. In how many ways can the selection be made?
## 12C4 = 495 cara pemilihan
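The code behind this result is not shown in the text. A minimal sketch using base R's `choose()` follows; the variable names are assumptions chosen for illustration.

```r
n_pelamar <- 12   # applicants
r_asisten <- 4    # assistants to select

# choose(n, r) evaluates n! / (r! (n - r)!) without forming the large factorials
kombinasi <- choose(n_pelamar, r_asisten)
cat(sprintf("12C4 = %d cara pemilihan\n", kombinasi))

# Cross-check against the factorial formula from the text
stopifnot(kombinasi == factorial(12) / (factorial(4) * factorial(8)))
```

For large `n`, `choose()` is preferable to the explicit factorial ratio because the factorials overflow to `Inf` much earlier.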
Problem 2.3: Probability of a Union (Playing Cards)
Find the probability of drawing a King OR a red card from a standard 52-card deck.
# Hitung peluang
p_king <- 4/52
p_merah <- 26/52
p_king_merah <- 2/52
p_union <- p_king + p_merah - p_king_merah
cat(sprintf("P(King) = 4/52 = %.4f\n", p_king))
## P(King) = 4/52 = 0.0769
## P(Merah) = 26/52 = 0.5000
## P(King ∩ Merah) = 2/52 = 0.0385
## P(King ∪ Merah) = 0.5385
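# Bar-chart view of the three disjoint pieces of the union (not a true Venn diagram).
# Labels now match the plotted values: black Kings only, red Kings, red non-Kings.
df_venn <- data.frame(
set = c("King saja\n(2/52)", "Irisan\nKing Merah\n(2/52)", "Merah saja\n(24/52)"),
nilai = c(2/52, 2/52, 24/52),
x = c(1, 2, 3)
)
ggplot(df_venn, aes(x = x, y = nilai, fill = set)) +
geom_col(width = 0.6, color = "white", linewidth = 0.8) +
# unname(): a named values vector is matched to factor levels by name (assumes pal is a named vector)
scale_fill_manual(values = unname(c(pal["indigo"], pal["coral"], pal["teal"]))) +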
scale_y_continuous(labels = percent_format()) +
labs(title = "Komposisi Peluang: King dan Kartu Merah",
subtitle = paste0("P(King ∪ Merah) = ", round(p_union, 4)),
x = "", y = "Peluang", fill = "") +
theme_metstat()
Problem 2.4: Multiplication Rule (Tree Diagram)
A box contains 3 red balls (M) and 2 blue balls (B). Two balls are drawn without replacement. Compute the probability of every outcome.
# All ordered outcomes when drawing 2 balls without replacement (one row per tree branch)
hasil_pohon <- data.frame(
Pengambilan_1 = c("Merah", "Merah", "Biru", "Biru"),
Pengambilan_2 = c("Merah", "Biru", "Merah", "Biru"),
Peluang = c(3/5 * 2/4, 3/5 * 2/4, 2/5 * 3/4, 2/5 * 1/4)
)
# Ringkasan
cat("P(MM) = (3/5)(2/4) =", round(3/5 * 2/4, 4), "\n")
## P(MM) = (3/5)(2/4) = 0.3
## P(MB) = (3/5)(2/4) = 0.3
## P(BM) = (2/5)(3/4) = 0.3
## P(BB) = (2/5)(1/4) = 0.1
## Total = 1
The conditional probability \(P(A|B)\) measures the probability of event \(A\) given that event \(B\) has occurred.
\[P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0\]
\[P(A \cap B) = P(A|B)\,P(B) = P(B|A)\,P(A)\] \[P(A \cap B \cap C) = P(A)\,P(B|A)\,P(C|A \cap B)\]
If \(\{H_1, H_2, \ldots, H_k\}\) is a partition of \(S\):
\[P(E) = \sum_{i=1}^{k} P(E|H_i)\,P(H_i)\]
\[\boxed{P(H_i|E) = \frac{P(E|H_i)\,P(H_i)}{\displaystyle\sum_{j=1}^k P(E|H_j)\,P(H_j)}}\]
Derivation of Bayes' theorem:
From the definition of conditional probability: \[P(H|E) = \frac{P(H \cap E)}{P(E)} \quad \text{and} \quad P(E|H) = \frac{P(H \cap E)}{P(H)}\] Hence \(P(H \cap E) = P(E|H)\,P(H)\). Substituting into the first formula: \[P(H|E) = \frac{P(E|H)\,P(H)}{P(E)}\]
Problem 3.1: Medical Diagnosis (Bayes' Theorem)
COVID test: sensitivity 95%, specificity 99%, prevalence 2%. What is the probability that a person is truly infected given a positive test result?
# Parameter
prev <- 0.02 # P(Sakit)
sens <- 0.95 # P(+ | Sakit) – sensitivitas
spec <- 0.99 # P(- | Sehat) – spesifisitas
fpr <- 1 - spec # P(+ | Sehat) – false positive rate
p_pos <- sens * prev + fpr * (1 - prev) # Hukum peluang total
ppv <- (sens * prev) / p_pos # Positive Predictive Value
npv <- (spec * (1-prev)) / (1 - p_pos) # Negative Predictive Value
cat("=== Analisis Tes Diagnostik ===\n")
## === Analisis Tes Diagnostik ===
## P(Sakit) = 0.0200
## P(+ | Sakit) sensitivitas = 0.9500
## P(+ | Sehat) false pos rate = 0.0100
## P(Tes +) peluang total = 0.0288
## PPV: P(Sakit | Tes +) = 0.6597 (66.0%)
## NPV: P(Sehat | Tes -) = 0.9990 (99.9%)
# Visualisasi pengaruh prevalensi terhadap PPV
prev_seq <- seq(0.001, 0.3, length = 200)
ppv_seq <- (sens * prev_seq) / (sens*prev_seq + fpr*(1-prev_seq))
df_ppv <- data.frame(prevalensi = prev_seq * 100, PPV = ppv_seq * 100)
ggplot(df_ppv, aes(x = prevalensi, y = PPV)) +
geom_line(color = pal["teal"], linewidth = 1.5) +
geom_vline(xintercept = prev*100, color = pal["coral"],
linetype = "dashed", linewidth = 1) +
geom_point(x = prev*100, y = ppv*100, size = 4, color = pal["coral"]) +
annotate("text", x = prev*100 + 1.5, y = ppv*100 - 5,
label = sprintf("Prev = %.0f%%\nPPV = %.1f%%", prev*100, ppv*100),
color = pal["coral"], hjust = 0, size = 3.5) +
scale_x_continuous(labels = function(x) paste0(x, "%")) +
scale_y_continuous(labels = function(x) paste0(x, "%")) +
labs(title = "Pengaruh Prevalensi Terhadap Nilai Prediktif Positif (PPV)",
subtitle = "Sensitivitas = 95%, Spesifisitas = 99%",
x = "Prevalensi Penyakit (%)", y = "PPV (%)") +
theme_metstat()
Problem 3.2: Independent Events (Dice)
Two dice are rolled. A: die 1 shows 4. B: the sum is 7. Are A and B independent?
S <- expand.grid(d1 = 1:6, d2 = 1:6)
n_S <- nrow(S)
p_A <- nrow(subset(S, d1 == 4)) / n_S
p_B <- nrow(subset(S, d1 + d2 == 7)) / n_S
p_A_B <- nrow(subset(S, d1 == 4 & d1 + d2 == 7)) / n_S
p_AxB <- p_A * p_B
cat(sprintf("P(A) = %.4f\n", p_A))
## P(A) = 0.1667
## P(B) = 0.1667
## P(A ∩ B) = 0.0278
## P(A)×P(B) = 0.0278
cat(sprintf("\nKesimpulan: %s\n",
ifelse(isTRUE(all.equal(p_A_B, p_AxB)), "✅ A dan B INDEPENDEN", "❌ A dan B tidak independen")))
##
## Kesimpulan: ✅ A dan B INDEPENDEN
A random variable is a function \(X: S \rightarrow \mathbb{R}\) that maps each element of the sample space to a real number.
PMF: \(f(x) = P(X = x) \geq 0\), with \(\sum_x f(x) = 1\)
CDF: \(F(x) = P(X \leq x) = \sum_{t \leq x} f(t)\)
Expected value and variance: \[E(X) = \sum_x x\,f(x) \qquad Var(X) = E(X^2) - [E(X)]^2\]
\[P(X=x) = \binom{n}{x}p^x(1-p)^{n-x}, \quad x=0,1,\ldots,n\]
\[E(X) = np \qquad Var(X) = np(1-p)\]
Conditions (BINS): Binary outcomes, Independent trials, Number of trials fixed, Same \(p\) on every trial.
\[P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x=0,1,2,\ldots\]
\[E(X) = Var(X) = \lambda\]
The Poisson distribution approximates the Binomial when \(n\) is large and \(p\) is small, with \(\lambda = np\).
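A quick numerical check of this approximation can make the claim concrete. The sketch below compares the two PMFs for \(n = 1000\) and \(p = 0.005\); these values are illustrative assumptions, not taken from the text.

```r
n <- 1000; p <- 0.005     # illustrative: large n, small p
lambda <- n * p           # matching Poisson mean, here 5
x <- 0:15

approx_tbl <- data.frame(
  x        = x,
  binomial = dbinom(x, n, p),
  poisson  = dpois(x, lambda)
)

# Largest pointwise difference between the two PMFs on 0..15
max_gap <- max(abs(approx_tbl$binomial - approx_tbl$poisson))
print(head(round(approx_tbl, 4)))
cat("Largest absolute gap:", signif(max_gap, 2), "\n")
```

Repeating the comparison with a larger `p` (for example 0.2 with `n = 25`, which keeps the same mean) should show a visibly larger gap, which is why the approximation is stated for small `p`.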
Hypergeometric: population of size \(N\) containing \(K\) successes, sample of size \(n\) drawn without replacement:
\[P(X=x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}, \quad x = \max(0, n+K-N), \ldots, \min(n,K)\]
\[E(X) = \frac{nK}{N} \qquad Var(X) = n\frac{K}{N}\frac{N-K}{N}\frac{N-n}{N-1}\]
Problem 4.1: Binomial Distribution (Full Visualization)
A machine produces components with a 10% defect rate. If 20 components are sampled, visualize the distribution of the number of defects.
n <- 20; p <- 0.10
x <- 0:n
pmf <- dbinom(x, n, p)
cdf <- pbinom(x, n, p)
# PMF
df_bin <- data.frame(x = x, pmf = pmf, cdf = cdf)
p1 <- ggplot(df_bin, aes(x = factor(x), y = pmf)) +
geom_col(fill = ifelse(x == 3, pal["coral"], pal["teal"]),
color = "white", alpha = 0.85) +
geom_text(aes(label = ifelse(round(pmf,3)>0.01, round(pmf,3), "")),
vjust = -0.3, size = 2.8, color = "#333") +
labs(title = "PMF — Binomial(20, 0.10)",
subtitle = "Batang merah: P(X = 3)",
x = "Jumlah Komponen Cacat (X)", y = "P(X = x)") +
theme_metstat() +
theme(axis.text.x = element_text(size = 8))
p2 <- ggplot(df_bin, aes(x = x, y = cdf)) +
geom_step(color = pal["indigo"], linewidth = 1.2) +
geom_point(color = pal["coral"], size = 2) +
labs(title = "CDF — Binomial(20, 0.10)",
x = "x", y = "P(X ≤ x)") +
theme_metstat()
grid.arrange(p1, p2, ncol = 2)
##
## P(X = 3) = 0.1901
## P(X ≤ 2) = 0.6769
## P(X ≥ 5) = 0.0432
## E(X) = np = 2.0
## SD(X) = 1.3416
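The printed probabilities can be reproduced directly from `dbinom()` and `pbinom()`. The statements that generated them are not shown in the text, so the following is a sketch with assumed variable names.

```r
n <- 20; p <- 0.10

p_eq3 <- dbinom(3, n, p)                      # P(X = 3)
p_le2 <- pbinom(2, n, p)                      # P(X <= 2)
p_ge5 <- pbinom(4, n, p, lower.tail = FALSE)  # P(X >= 5) = 1 - P(X <= 4)
ex    <- n * p                                # E(X) = np
sdx   <- sqrt(n * p * (1 - p))                # SD(X) = sqrt(np(1 - p))

cat(sprintf("P(X = 3) = %.4f\nP(X <= 2) = %.4f\nP(X >= 5) = %.4f\n",
            p_eq3, p_le2, p_ge5))
cat(sprintf("E(X) = %.1f, SD(X) = %.4f\n", ex, sdx))
```

The upper-tail call uses 4 rather than 5 because `lower.tail = FALSE` returns \(P(X > q)\), a strict inequality.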
Problem 4.2: Poisson Distribution (Bank)
On average 5 customers arrive per hour. Visualize the distribution and compute \(P(X > 8)\).
lambda <- 5
x <- 0:15
pmf_poi <- dpois(x, lambda)
df_poi <- data.frame(x = x, pmf = pmf_poi,
warna = ifelse(x > 8, "X > 8", "X ≤ 8"))
ggplot(df_poi, aes(x = factor(x), y = pmf, fill = warna)) +
geom_col(color = "white", alpha = 0.85) +
scale_fill_manual(values = c("X > 8" = pal[["coral"]], "X ≤ 8" = pal[["teal"]])) +
labs(title = paste0("PMF — Poisson(λ = ", lambda, ")"),
subtitle = sprintf("P(X > 8) = %.4f (area merah)", 1 - ppois(8, lambda)),
x = "Jumlah Nasabah (X)", y = "P(X = x)", fill = "") +
theme_metstat()
## P(X > 8) = 1 - P(X ≤ 8) = 0.0681
Problem 4.3: Hypergeometric Distribution
From 20 products (5 defective), 8 are drawn without replacement. What is \(P(X = 2)\)?
N <- 20; K <- 5; n <- 8
x_val <- 0:min(n, K)
pmf_hyp <- dhyper(x_val, K, N - K, n)
df_hyp <- data.frame(x = x_val, pmf = pmf_hyp)
cat("Distribusi Hipergeometrik(N=20, K=5, n=8):\n")
## Distribusi Hipergeometrik(N=20, K=5, n=8):
## x pmf
## 1 0 0.051083591
## 2 1 0.255417957
## 3 2 0.397316821
## 4 3 0.238390093
## 5 4 0.054179567
## 6 5 0.003611971
##
## P(X = 2) = 0.3973
## E(X) = nK/N = 2.0000
A continuous random variable \(X\) has a probability density function (PDF) \(f(x)\) such that \(P(a < X < b) = \int_a^b f(x)\,dx\) and \(\int_{-\infty}^{\infty} f(x)\,dx = 1\).
\[f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]\]
Key properties: symmetric, bell-shaped, \(E(X)=\mu\), \(Var(X)=\sigma^2\).
Empirical rule:
- \(P(\mu - \sigma < X < \mu + \sigma) \approx 68.27\%\)
- \(P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 95.45\%\)
- \(P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 99.73\%\)
\[Z = \frac{X - \mu}{\sigma} \sim N(0,1)\]
Derivation: \(E(Z) = \frac{E(X)-\mu}{\sigma} = 0\) and \(Var(Z) = \frac{Var(X)}{\sigma^2} = 1\).
Exponential distribution: \[f(x) = \lambda e^{-\lambda x}, \quad x \geq 0\] \[E(X) = \frac{1}{\lambda} \qquad Var(X) = \frac{1}{\lambda^2}\] Memoryless property: \(P(X > s+t \mid X > s) = P(X > t)\).
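The memoryless property can be verified numerically from the survival function \(P(X > x) = e^{-\lambda x}\). The values of `lambda`, `s0`, and `t0` below are arbitrary choices made for illustration.

```r
lambda <- 0.5; s0 <- 2; t0 <- 3   # illustrative values

# Survival function P(X > x) of the exponential distribution
surv <- function(x) pexp(x, rate = lambda, lower.tail = FALSE)

lhs <- surv(s0 + t0) / surv(s0)   # P(X > s + t | X > s)
rhs <- surv(t0)                   # P(X > t)

cat(sprintf("P(X > s+t | X > s) = %.6f\nP(X > t)           = %.6f\n", lhs, rhs))
```

Both sides reduce to \(e^{-\lambda t}\), so the two printed numbers agree for any non-negative choice of `s0` and `t0`.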
Gamma \(\Gamma(\alpha, \beta)\): \(f(x) = \frac{1}{\beta^\alpha\Gamma(\alpha)}x^{\alpha-1}e^{-x/\beta}\), \(E(X)=\alpha\beta\)
Chi-square \(\chi^2(\nu)\): a special case of the Gamma distribution with \(\alpha=\nu/2\), \(\beta=2\). It is used in hypothesis tests for variances and for contingency tables.
Problem 5.1: The Normal Distribution in Full (Exam Scores)
Exam scores follow \(N(75, 10^2)\). Visualize the empirical rule.
mu <- 75; sigma <- 10
x <- seq(mu - 4*sigma, mu + 4*sigma, length = 500)
y <- dnorm(x, mu, sigma)
df_norm <- data.frame(x = x, y = y)
ggplot(df_norm, aes(x, y)) +
# Area 3 sigma
geom_area(data = subset(df_norm, x >= mu - 3*sigma & x <= mu + 3*sigma),
aes(x, y), fill = "#cce5ff", alpha = 0.6) +
# Area 2 sigma
geom_area(data = subset(df_norm, x >= mu - 2*sigma & x <= mu + 2*sigma),
aes(x, y), fill = "#99ccff", alpha = 0.6) +
# Area 1 sigma
geom_area(data = subset(df_norm, x >= mu - sigma & x <= mu + sigma),
aes(x, y), fill = pal["teal"], alpha = 0.5) +
geom_line(color = pal["indigo"], linewidth = 1.5) +
geom_vline(xintercept = mu, color = pal["coral"], linetype = "dashed", linewidth = 1) +
annotate("text", x = mu + 0.5, y = 0.041, label = "μ = 75",
color = pal["coral"], hjust = 0, size = 4) +
# Place the white label inside the ±1σ band so it stays visible against the fill
annotate("text", x = mu, y = 0.015, label = "68.27%\n(±1σ)", size = 3.5,
color = "white", fontface = "bold") +
annotate("text", x = 47, y = 0.005, label = "95.45%\n(±2σ)", size = 3, color = "#446699") +
annotate("text", x = 39, y = 0.001, label = "99.73% (±3σ)", size = 3, color = "#335588") +
labs(title = "Distribusi Normal N(75, 100) — Aturan Empiris",
x = "Nilai Ujian", y = "Kepekatan Peluang f(x)") +
theme_metstat()
## === Perhitungan Peluang N(75, 100) ===
## P(80 < X < 90) = 0.2417
## P(X < 60) = 0.0668
## P(X > 90) = 0.0668
## Persentil ke-95 = 91.45
## Persentil ke-5 = 58.55
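The probabilities and percentiles above follow from `pnorm()` and `qnorm()`. The generating statements are not shown in the text, so this is a sketch with assumed variable names.

```r
mu <- 75; sigma <- 10

p_80_90 <- pnorm(90, mu, sigma) - pnorm(80, mu, sigma)  # P(80 < X < 90)
p_lt_60 <- pnorm(60, mu, sigma)                         # P(X < 60)
p_gt_90 <- pnorm(90, mu, sigma, lower.tail = FALSE)     # P(X > 90)
q95     <- qnorm(0.95, mu, sigma)                       # 95th percentile
q05     <- qnorm(0.05, mu, sigma)                       # 5th percentile

cat(sprintf("P(80 < X < 90) = %.4f\nP(X < 60) = %.4f\nP(X > 90) = %.4f\n",
            p_80_90, p_lt_60, p_gt_90))
cat(sprintf("95th percentile = %.2f\n5th percentile  = %.2f\n", q95, q05))
```

\(P(X < 60)\) and \(P(X > 90)\) coincide because 60 and 90 are both 1.5 standard deviations from the mean and the density is symmetric.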
Problem 5.2: Comparing Three Continuous Distributions
Compare the shapes of the Normal, Exponential, and Uniform distributions visually.
x1 <- seq(-4, 4, length = 300)
x2 <- seq(0, 5, length = 300)
x3 <- seq(0, 1, length = 300)
df_dist <- rbind(
data.frame(x = x1, y = dnorm(x1), dist = "Normal(0,1)"),
data.frame(x = x2, y = dexp(x2, 1), dist = "Eksponensial(λ=1)"),
data.frame(x = x3, y = dunif(x3, 0,1), dist = "Uniform(0,1)")
)
ggplot(df_dist, aes(x, y, color = dist, fill = dist)) +
geom_area(alpha = 0.2) +
geom_line(linewidth = 1.3) +
facet_wrap(~dist, scales = "free") +
scale_color_manual(values = unname(c(pal["teal"], pal["coral"], pal["purple"]))) +
scale_fill_manual(values = unname(c(pal["teal"], pal["coral"], pal["purple"]))) +
labs(title = "Perbandingan Tiga Distribusi Kontinu Utama",
x = "x", y = "f(x)") +
theme_metstat() +
theme(legend.position = "none",
strip.text = element_text(face = "bold", color = "#3d405b"))
A sampling distribution is the probability distribution of a statistic (for example \(\bar{X}\)) computed over all possible samples of size \(n\) from the population.
| Quantity | Population Parameter | Sample Statistic |
|---|---|---|
| Mean | \(\mu\) | \(\bar{X}\) |
| Variance | \(\sigma^2\) | \(s^2\) |
| Proportion | \(p\) | \(\hat{p}\) |
| Correlation | \(\rho\) | \(r\) |
Central Limit Theorem: if \(X_1, X_2, \ldots, X_n\) are an IID random sample from a population with \(E(X)=\mu\) and \(Var(X)=\sigma^2 < \infty\), then as \(n \to \infty\):
\[Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1) \quad \text{so that, for large } n, \quad \bar{X} \overset{\text{approx.}}{\sim} N\!\left(\mu,\, \frac{\sigma^2}{n}\right)\]
In practice, \(n \geq 30\) is a common rule of thumb when the population distribution is not strongly skewed.
Derivation of \(Var(\bar{X}) = \sigma^2/n\):
Since \(\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\) and the \(X_i\) are independent: \[Var(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^n Var(X_i) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}\] Its square root, \(SE(\bar{X}) = \sigma/\sqrt{n}\), is called the standard error.
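The result \(Var(\bar{X}) = \sigma^2/n\) can be checked by simulation. The sketch below draws from Uniform(0, 1), whose variance is \(1/12\); the seed, sample size, and number of replications are arbitrary assumptions.

```r
set.seed(1)
sigma2 <- 1 / 12   # variance of Uniform(0, 1)
n      <- 25       # sample size (illustrative)
n_rep  <- 20000    # number of simulated samples (illustrative)

# Sample mean of each simulated sample
xbar <- replicate(n_rep, mean(runif(n)))

var_sim    <- var(xbar)    # empirical variance of the sample means
var_theory <- sigma2 / n   # sigma^2 / n from the derivation
print(c(simulated = var_sim, theory = var_theory))
```

The two values should agree to roughly two significant digits; the remaining difference is Monte Carlo error and shrinks as `n_rep` grows.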
The \(t\) distribution with \(\nu = n-1\) degrees of freedom (used when \(\sigma\) is unknown and the sample comes from a normal population):
\[T = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}\]
The \(\chi^2\) (chi-square) distribution with \(\nu = n-1\) degrees of freedom (for the sample variance from a normal population):
\[\chi^2 = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}\]
Problem 6.1: Simulating the Central Limit Theorem
Population Uniform(0,1): draw 2000 samples each of size \(n=5\), \(n=30\), and \(n=100\), and show the histograms of the sample means.
set.seed(42)
sim_clm <- function(n_size, n_sim = 2000) {
replicate(n_sim, mean(runif(n_size, 0, 1)))
}
df_clm <- rbind(
data.frame(xbar = sim_clm(5), n = "n = 5"),
data.frame(xbar = sim_clm(30), n = "n = 30"),
data.frame(xbar = sim_clm(100), n = "n = 100")
)
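df_clm$n <- factor(df_clm$n, levels = c("n = 5","n = 30","n = 100"))
# Theoretical N(0.5, 1/(12n)) density, computed separately for each panel
# (the original overlay reused the n = 5 standard deviation in every facet)
df_curve <- do.call(rbind, lapply(c(5, 30, 100), function(k) {
s_k <- sqrt(1/12/k)
xs <- seq(0.5 - 4*s_k, 0.5 + 4*s_k, length.out = 200)
data.frame(xbar = xs, dens = dnorm(xs, 0.5, s_k), n = paste("n =", k))
}))
df_curve$n <- factor(df_curve$n, levels = levels(df_clm$n))
ggplot(df_clm, aes(x = xbar, fill = n)) +
geom_histogram(aes(y = after_stat(density)), bins = 40,
color = "white", alpha = 0.85) +
geom_line(data = df_curve, aes(x = xbar, y = dens), inherit.aes = FALSE,
color = pal["indigo"], linewidth = 1) +
facet_wrap(~n, ncol = 3, scales = "free_y") +
scale_fill_manual(values = unname(c(pal["amber"], pal["teal"], pal["coral"]))) +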
labs(title = "Simulasi Dalil Limit Pusat — Populasi Uniform(0,1)",
subtitle = "Semakin besar n, distribusi rata-rata sampel mendekati Normal",
x = "Rata-rata Sampel (X̄)", y = "Kepekatan") +
theme_metstat() +
theme(legend.position = "none",
strip.text = element_text(face = "bold", size = 12))
Problem 6.2: Effect of Sample Size on the Standard Error
sigma <- 50
n_vals <- 1:500
se_vals <- sigma / sqrt(n_vals)
df_se <- data.frame(n = n_vals, se = se_vals)
ggplot(df_se, aes(x = n, y = se)) +
geom_line(color = pal["teal"], linewidth = 1.2) +
geom_hline(yintercept = c(5, 10), color = pal["coral"],
linetype = "dashed", alpha = 0.7) +
annotate("text", x = 480, y = 6, label = "SE = 5 (n ≈ 100)",
color = pal["coral"], size = 3.5, hjust = 1) +
annotate("text", x = 480, y = 11, label = "SE = 10 (n ≈ 25)",
color = pal["coral"], size = 3.5, hjust = 1) +
labs(title = "Standard Error vs Ukuran Sampel (σ = 50)",
subtitle = "SE turun cepat di awal, lalu semakin lambat (law of diminishing returns)",
x = "Ukuran Sampel (n)", y = "Standard Error (SE)") +
theme_metstat()
##
## Tabel Standard Error:
## n SE
## 1 10 15.811
## 2 30 9.129
## 3 50 7.071
## 4 100 5.000
## 5 200 3.536
## 6 500 2.236
## 7 1000 1.581
Problem 6.3: Comparing the Normal and Student's t Distributions
z <- seq(-4, 4, length = 400)
df_zt <- rbind(
data.frame(z = z, y = dnorm(z), dist = "Normal(0,1)"),
data.frame(z = z, y = dt(z, df=3), dist = "t (df = 3)"),
data.frame(z = z, y = dt(z, df=10), dist = "t (df = 10)"),
data.frame(z = z, y = dt(z, df=30), dist = "t (df = 30)")
)
ggplot(df_zt, aes(x = z, y = y, color = dist, linetype = dist)) +
geom_line(linewidth = 1.1) +
scale_color_manual(values = unname(c(pal["indigo"], pal["coral"], pal["teal"], pal["amber"]))) +
scale_linetype_manual(values = c("solid","dashed","dotdash","dotted")) +
labs(title = "Perbandingan Distribusi Normal Baku vs t-Student",
subtitle = "Semakin besar df, distribusi t mendekati Normal",
x = "z / t", y = "Kepekatan", color = "Distribusi", linetype = "Distribusi") +
theme_metstat()
Parameter estimation is the process of using sample statistics to estimate unknown population parameters.
| Property | Definition |
|---|---|
| Unbiased | \(E(\hat\theta) = \theta\) |
| Efficient | Smallest variance among all unbiased estimators (for an unbiased estimator, MSE equals variance) |
| Consistent | \(\hat\theta \xrightarrow{p} \theta\) as \(n \to \infty\) |
| Sufficient | Uses all the information in the sample about \(\theta\) |
Confidence interval for \(\mu\) when \(\sigma\) is known (Z statistic): \[\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\]
When \(\sigma\) is unknown (t statistic, \(df = n-1\)): \[\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\]
Margin of error: \(E = Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\) or \(E = t_{\alpha/2,n-1}\dfrac{s}{\sqrt{n}}\)
Confidence interval for a proportion: \[\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
This is valid when \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\).
Sample size for estimating a mean with margin of error \(E\): \[n \geq \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2\]
Sample size for estimating a proportion (if \(p\) is unknown, use the conservative value \(p=0.5\)): \[n \geq \frac{Z_{\alpha/2}^2\,p(1-p)}{E^2}\]
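As a worked instance of the proportion formula, the sketch below finds the minimum sample size for a 95% confidence level and a margin of error of 5 percentage points with the conservative \(p = 0.5\). These targets are illustrative assumptions, not values from the text.

```r
conf    <- 0.95   # confidence level (illustrative)
E       <- 0.05   # target margin of error (illustrative)
p_guess <- 0.5    # conservative choice: maximizes p(1 - p)

z     <- qnorm(1 - (1 - conf) / 2)   # Z_{alpha/2}
n_raw <- z^2 * p_guess * (1 - p_guess) / E^2
n_min <- ceiling(n_raw)              # always round up to meet the target

cat(sprintf("Z = %.4f, raw n = %.2f, minimum n = %d\n", z, n_raw, n_min))
```

Using \(p = 0.5\) gives the largest possible \(p(1-p)\), so the resulting `n_min` is sufficient whatever the true proportion turns out to be.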
Problem 7.1: Z Confidence Interval (Iron Bars)
\(n=50\), \(\bar{X}=150\) kg, \(\sigma=8\) kg. Compute the 90%, 95%, and 99% confidence intervals.
n <- 50; x_bar <- 150; sigma <- 8
alphas <- c(0.10, 0.05, 0.01)
levels <- c("90%","95%","99%")
ci_df <- data.frame(
level = levels,
alpha = alphas,
z = qnorm(1 - alphas/2),
lower = x_bar - qnorm(1 - alphas/2) * sigma/sqrt(n),
upper = x_bar + qnorm(1 - alphas/2) * sigma/sqrt(n)
)
ci_df$width <- ci_df$upper - ci_df$lower
cat("=== Selang Kepercayaan (σ diketahui) ===\n")
## === Selang Kepercayaan (σ diketahui) ===
## level z lower upper width
## 1 90% 1.644854 148.1391 151.8609 3.721879
## 2 95% 1.959964 147.7826 152.2174 4.434892
## 3 99% 2.575829 147.0858 152.9142 5.828436
# Visualisasi CI
ggplot(ci_df, aes(y = level, x = x_bar)) +
geom_errorbarh(aes(xmin = lower, xmax = upper, color = level),
height = 0.3, linewidth = 1.5) +
geom_point(color = pal["coral"], size = 4) +
scale_color_manual(values = unname(c(pal["teal"], pal["amber"], pal["coral"]))) +
labs(title = "Visualisasi Selang Kepercayaan untuk μ",
subtitle = "Titik merah = estimasi titik (x̄ = 150 kg)",
x = "Kekuatan Batang Besi (kg)", y = "Tingkat Kepercayaan",
color = "") +
theme_metstat()
Problem 7.2: t Confidence Interval (Bottle Contents)
\(n=10\) bottle-content measurements. Compute the 99% confidence interval with \(\sigma\) unknown.
isi <- c(502, 498, 501, 497, 499, 500, 503, 496, 501, 502)
n <- length(isi)
x_bar <- mean(isi)
s <- sd(isi)
t_crit <- qt(0.995, df = n - 1) # alpha = 0.01, two-tailed
moe <- t_crit * s / sqrt(n)
cat("=== Selang Kepercayaan t (σ tidak diketahui) ===\n")
## === Selang Kepercayaan t (σ tidak diketahui) ===
## n = 10
## x̄ = 499.90 ml
## s = 2.3310 ml
## t(0.005, 9) = 3.2498
## Margin of Error = 2.3955
## CI 99%: [497.50, 502.30] ml
# Menggunakan fungsi bawaan R untuk konfirmasi
ci_auto <- t.test(isi, conf.level = 0.99)
cat(sprintf("Konfirmasi (t.test): [%.4f, %.4f]\n",
ci_auto$conf.int[1], ci_auto$conf.int[2]))
## Konfirmasi (t.test): [497.5045, 502.2955]
Problem 7.3: Confidence Interval for a Proportion
Of 200 students, 130 graduated on time. Compute the 95% confidence interval for the population proportion.
n_tot <- 200
x_suk <- 130
p_hat <- x_suk / n_tot
z_95 <- qnorm(0.975)
se_p <- sqrt(p_hat * (1-p_hat) / n_tot)
moe_p <- z_95 * se_p
cat("=== CI untuk Proporsi ===\n")
## === CI untuk Proporsi ===
## p̂ = 130/200 = 0.6500
## SE(p̂) = 0.0337
## CI 95%: [0.5839, 0.7161]
cat(sprintf("Artinya: antara %.1f%% hingga %.1f%% lulus tepat waktu.\n",
(p_hat - moe_p)*100, (p_hat + moe_p)*100))
## Artinya: antara 58.4% hingga 71.6% lulus tepat waktu.
# Visualisasi dengan berbagai ukuran sampel
n_seq <- seq(50, 1000, by = 10)
moe_seq <- qnorm(0.975) * sqrt(p_hat*(1-p_hat)/n_seq)
ggplot(data.frame(n = n_seq, moe = moe_seq), aes(x = n, y = moe*100)) +
geom_line(color = pal["teal"], linewidth = 1.3) +
geom_vline(xintercept = 200, color = pal["coral"], linetype = "dashed") +
geom_point(x = 200, y = moe_p*100, size = 4, color = pal["coral"]) +
annotate("text", x = 220, y = moe_p*100 + 0.5,
label = sprintf("n=200\nMoE=%.1f%%", moe_p*100),
color = pal["coral"], hjust = 0, size = 3.5) +
labs(title = "Hubungan Ukuran Sampel dan Margin of Error (Proporsi)",
subtitle = "p̂ = 0.65, α = 0.05",
x = "Ukuran Sampel (n)", y = "Margin of Error (%)") +
theme_metstat()
Problem 7.4: Determining the Minimum Sample Size
A study aims to estimate mean income with a maximum error of Rp 500,000 at 95% confidence. It is known that \(\sigma \approx\) Rp 3,000,000.
sigma_pop <- 3000000
E_target <- 500000
z_val <- qnorm(0.975) # 95% CI
n_min <- ceiling((z_val * sigma_pop / E_target)^2)
cat(sprintf("Z(α/2) = %.4f\n", z_val))
## Z(α/2) = 1.9600
## σ = Rp 3e+06
## E (toleransi)= Rp 5e+05
## n minimum = ⌈(1.9600 × 3000000 / 500000)²⌉ = 139 orang
Hypothesis testing is a statistical procedure for reaching a decision about a claim concerning a population parameter, based on sample data.
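Testing steps: state \(H_0\) and \(H_1\), fix \(\alpha\), compute the test statistic, obtain the \(p\)-value (or the critical region), then decide and interpret.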
Types of error:
| Decision | \(H_0\) True | \(H_0\) False |
|---|---|---|
| Reject \(H_0\) | Type I error (\(\alpha\)) | Correct decision (power = \(1-\beta\)) |
| Fail to reject \(H_0\) | Correct decision | Type II error (\(\beta\)) |
Test statistics for \(\mu\):
When \(\sigma\) is known: \(Z = \dfrac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\)
When \(\sigma\) is unknown: \(T = \dfrac{\bar{X}-\mu_0}{s/\sqrt{n}} \sim t_{n-1}\)
The \(p\)-value is the probability of obtaining a result at least as extreme as the one observed, assuming \(H_0\) is true.
Problem 8.1: One-Sample t Test
The standard bottle content is 500 ml. A sample of 10 bottles gives: 502, 498, 501, 497, 499, 500, 503, 496, 501, 502. Do the bottle contents meet the standard (\(\alpha = 0.05\))?
isi <- c(502, 498, 501, 497, 499, 500, 503, 496, 501, 502)
# H0: mu = 500; H1: mu ≠ 500 (two-tailed)
hasil_uji <- t.test(isi, mu = 500, alternative = "two.sided", conf.level = 0.95)
cat("=== Uji t Satu Sampel ===\n")
## === Uji t Satu Sampel ===
## H0: μ = 500 ml
## H1: μ ≠ 500 ml
## x̄ = 499.90
## s = 2.3310
## t-hitung = -0.1357
## df = 9
## p-value = 0.8951
cat(sprintf("Keputusan: %s H0 pada α = 0.05\n",
ifelse(hasil_uji$p.value < 0.05, "TOLAK", "GAGAL TOLAK")))
## Keputusan: GAGAL TOLAK H0 pada α = 0.05
## Kesimpulan: Tidak cukup bukti bahwa isi botol berbeda dari 500 ml.
# Visualisasi: distribusi t dan daerah penolakan
t_val <- as.numeric(hasil_uji$statistic)
df_t <- hasil_uji$parameter
t_seq <- seq(-4, 4, length = 400)
t_crit_val <- qt(0.975, df_t)
df_vis <- data.frame(t = t_seq, y = dt(t_seq, df_t))
ggplot(df_vis, aes(t, y)) +
geom_area(data = subset(df_vis, t < -t_crit_val),
fill = pal["coral"], alpha = 0.4) +
geom_area(data = subset(df_vis, t > t_crit_val),
fill = pal["coral"], alpha = 0.4) +
geom_line(color = pal["indigo"], linewidth = 1.3) +
geom_vline(xintercept = c(-t_crit_val, t_crit_val),
color = pal["coral"], linetype = "dashed", linewidth = 1) +
geom_vline(xintercept = t_val, color = pal["teal"],
linetype = "solid", linewidth = 1.2) +
annotate("text", x = t_val + 0.2, y = 0.3,
label = sprintf("t = %.2f", t_val),
color = pal["teal"], hjust = 0, size = 4) +
annotate("text", x = 3.5, y = 0.05,
label = sprintf("Tolak\nα/2=%.3f", 0.025),
color = pal["coral"], size = 3.5) +
annotate("text", x = -3.5, y = 0.05,
label = sprintf("Tolak\nα/2=%.3f", 0.025),
color = pal["coral"], size = 3.5) +
labs(title = sprintf("Distribusi t(%d) — Uji Dua Arah (α = 0.05)", df_t),
subtitle = "Area merah = daerah penolakan | Garis hijau = t hitung",
x = "Nilai t", y = "f(t)") +
theme_metstat()
Problem 8.2: Test of a Proportion
Claim: 60% of customers are satisfied. In a survey of 150 customers, 82 are satisfied. Test this claim (\(\alpha = 0.05\)).
# H0: p = 0.60; H1: p ≠ 0.60
n_survey <- 150; x_puas <- 82
p0 <- 0.60; p_hat_sur <- x_puas / n_survey
z_stat <- (p_hat_sur - p0) / sqrt(p0 * (1 - p0) / n_survey)
p_val <- 2 * pnorm(-abs(z_stat))
cat("=== Uji Proporsi Satu Sampel ===\n")
## === Uji Proporsi Satu Sampel ===
## H0: p = 0.60 | H1: p ≠ 0.60
## p̂ = 82/150 = 0.5467
## Z hitung = -1.3333
## p-value = 0.1824
## Keputusan: GAGAL TOLAK H0 pada α = 0.05
# Buat diagram batang ringkasan topik
topik <- data.frame(
bab = c("Deskriptif","Peluang","Bayes","Diskret","Kontinu","Sampling","Pendugaan","Uji Hipotesis"),
level = c(1, 2, 2, 3, 3, 4, 5, 5),
ket = c("Statistik Dasar","Kaidah Peluang","Peluang Bersyarat",
"Distribusi Diskret","Distribusi Kontinu","Sebaran Percontohan",
"Estimasi Parameter","Uji Hipotesis")
)
topik$bab <- factor(topik$bab, levels = topik$bab)
ggplot(topik, aes(x = bab, y = level, fill = factor(level))) +
geom_col(color = "white", linewidth = 0.8, width = 0.7) +
geom_text(aes(label = ket), vjust = -0.4, size = 3, color = "#333", fontface = "bold") +
scale_fill_manual(values = c("#d4f1ee","#aae6df","#6dcfc5","#2a9d8f",
"#1e7a71","#145c55","#0d3d38","#061f1d")) +
scale_y_continuous(breaks = 1:5, labels = c("Bab 1","Bab 2-3","Bab 4-5","Bab 6","Bab 7-8")) +
labs(title = "Hierarki Materi Statistika Matematika",
subtitle = "Setiap bab membangun fondasi untuk bab berikutnya",
x = "Topik", y = "Level Kompleksitas") +
theme_metstat() +
theme(legend.position = "none",
axis.text.x = element_text(angle = 15, hjust = 1, size = 9))
tbl <- data.frame(
Distribusi = c("Binomial","Poisson","Hipergeometrik","Normal","Eksponensial","t-Student","Chi-Square"),
Parameter = c("n, p","λ","N, K, n","μ, σ²","λ","ν=n-1","ν=n-1"),
EX = c("np","λ","nK/N","μ","1/λ","0","ν"),
VarX = c("np(1-p)","λ","...","σ²","1/λ²","ν/(ν-2)","2ν"),
Penggunaan = c("Percobaan Bernoulli berulang","Kejadian langka dalam interval",
"Sampling tanpa pengembalian","Variabel kontinu simetris",
"Waktu antar kejadian","CI/uji dengan σ tdk diketahui",
"Uji varians & kecocokan")
)
knitr::kable(tbl, caption = "Ringkasan Distribusi Probabilitas Utama",
col.names = c("Distribusi","Parameter","E(X)","Var(X)","Penggunaan"))
| Distribusi | Parameter | E(X) | Var(X) | Penggunaan |
|---|---|---|---|---|
| Binomial | n, p | np | np(1-p) | Percobaan Bernoulli berulang |
| Poisson | λ | λ | λ | Kejadian langka dalam interval |
| Hipergeometrik | N, K, n | nK/N | … | Sampling tanpa pengembalian |
| Normal | μ, σ² | μ | σ² | Variabel kontinu simetris |
| Eksponensial | λ | 1/λ | 1/λ² | Waktu antar kejadian |
| t-Student | ν=n-1 | 0 | ν/(ν-2) | CI/uji dengan σ tdk diketahui |
| Chi-Square | ν=n-1 | ν | 2ν | Uji varians & kecocokan |
- Walpole, R.E., Myers, R.H., Myers, S.L., & Ye, K. (2012). Probability & Statistics for Engineers and Scientists (9th ed.). Pearson.
- Ross, S.M. (2014). Introduction to Probability and Statistics for Engineers and Scientists (5th ed.). Academic Press.
- DeGroot, M.H., & Schervish, M.J. (2012). Probability and Statistics (4th ed.). Addison-Wesley.
- Casella, G., & Berger, R.L. (2002). Statistical Inference (2nd ed.). Duxbury Press.
This document was created with R Markdown by Nida Khairunnissa. All R code can be re-run directly.