Contingency tables are among the most fundamental tools in inferential statistics, particularly for studying the relationship between two categorical variables. In practice, many research questions ask whether two categorical factors are related: for example, whether a habit is associated with a disease, or whether demographic characteristics influence a person's preferences.
This assignment carries out a complete statistical inference on two-way contingency tables using two real cases of different dimensions. The analysis covers point estimation, confidence intervals, several measures of association, and formal hypothesis tests.
Case 1, \(2 \times 2\) table: Smoking and Lung Cancer
This case explores the relationship between two binary variables: smoking status (Smoker vs Non-Smoker) and lung cancer status (Cancer (+) vs Control (−)). The measures of association computed are the Risk Difference (RD), Relative Risk (RR), and Odds Ratio (OR), complemented by four hypothesis tests: the two-proportion test, the chi-square test, the likelihood ratio test, and Fisher's exact test.
Case 2, \(2 \times 3\) table: Gender and Political Party Identification
This case explores the relationship between gender (Female vs Male) and party identification (Democrat, Republican, Independent). The analysis covers the chi-square test of independence, Pearson and standardized residuals, and a partition of the chi-square statistic to identify which contrast contributes most to the departure from independence.
Each case is solved in two complementary ways: a manual calculation that follows the formulas step by step, and a verification with R functions (chisq.test, fisher.test, prop.test, etc.) to confirm the manual calculation and produce more complete output.
The significance level used in all tests is \(\alpha = 0{,}05\).
data_k1 <- matrix(
c(688, 650, 21, 59),
nrow = 2, byrow = TRUE,
dimnames = list(
"Status Merokok" = c("Smoker", "Non-Smoker"),
"Status" = c("Cancer (+)", "Control (-)")
)
)
addmargins(data_k1)
## Status
## Status Merokok Cancer (+) Control (-) Sum
## Smoker 688 650 1338
## Non-Smoker 21 59 80
## Sum 709 709 1418
Cell notation for the \(2 \times 2\) table:
| | Cancer (+) | Control (−) | Total |
|---|---|---|---|
| Smoker | \(n_{11} = 688\) | \(n_{12} = 650\) | \(n_{1+} = 1338\) |
| Non-Smoker | \(n_{21} = 21\) | \(n_{22} = 59\) | \(n_{2+} = 80\) |
| Total | \(n_{+1} = 709\) | \(n_{+2} = 709\) | \(n = 1418\) |
n11 <- data_k1[1, 1]; n12 <- data_k1[1, 2]
n21 <- data_k1[2, 1]; n22 <- data_k1[2, 2]
R_k1 <- rowSums(data_k1) # Smoker=1338, Non-Smoker=80
C_k1 <- colSums(data_k1) # Cancer(+)=709, Control(-)=709
n <- sum(data_k1) # 1418
n1 <- R_k1[1]; n2 <- R_k1[2] # row marginal totals
nc1 <- C_k1[1]; nc2 <- C_k1[2] # column marginal totals
p1 <- n11 / n1; p2 <- n21 / n2 # proportion of cancer cases in each row (used by the code below)
\[\hat{p}_1 = \frac{n_{11}}{n_{1+}} = \frac{688}{1338}, \qquad \hat{p}_2 = \frac{n_{21}}{n_{2+}} = \frac{21}{80}\]
\[\hat{p}_1 = \frac{688}{1338} = 0{,}5142\]
\[\hat{p}_2 = \frac{21}{80} = 0{,}2625\]
## p1 (Smoker) : 0.5142
## p2 (Non-Smoker): 0.2625
Interpretation: In this sample, about 51.42% of smokers have lung cancer, compared with only 26.25% of non-smokers.
Wald formula:
\[CI_{95\%}(\hat{p}_1) = \hat{p}_1 \pm z_{0{,}025} \cdot \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}}\]
Manual calculation:
\[SE(\hat{p}_1) = \sqrt{\frac{0{,}5142 \times 0{,}4858}{1338}} = \sqrt{\frac{0{,}2498}{1338}} = \sqrt{0{,}0001867} = 0{,}01366\]
\[CI_{95\%} = 0{,}5142 \pm 1{,}96 \times 0{,}01366 = 0{,}5142 \pm 0{,}02677\]
\[CI_{95\%} = [0{,}4874 \;;\; 0{,}5410]\]
z <- qnorm(0.975)
se1 <- sqrt(p1 * (1 - p1) / n1)
ci1 <- c(p1 - z * se1, p1 + z * se1)
cat("SE(p1) :", round(se1, 5), "\n")
## SE(p1) : 0.01366
## CI 95% p1 : [ 0.4874 ; 0.541 ]
Manual calculation:
\[SE(\hat{p}_2) = \sqrt{\frac{0{,}2625 \times 0{,}7375}{80}} = \sqrt{\frac{0{,}1936}{80}} = \sqrt{0{,}002420} = 0{,}04919\]
\[CI_{95\%} = 0{,}2625 \pm 1{,}96 \times 0{,}04919 = 0{,}2625 \pm 0{,}09641\]
\[CI_{95\%} = [0{,}1661 \;;\; 0{,}3589]\]
se2 <- sqrt(p2 * (1 - p2) / n2)
ci2 <- c(p2 - z * se2, p2 + z * se2)
cat("SE(p2) :", round(se2, 5), "\n")
## SE(p2) : 0.04919
## CI 95% p2 : [ 0.1661 ; 0.3589 ]
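Both Wald intervals follow the same recipe, so they can be wrapped in a small helper. The sketch below is illustrative only: `wald_ci` is a hypothetical name that does not appear in the original analysis, and it assumes the normal approximation is adequate for both groups.

```r
# Hypothetical helper (not part of the original analysis):
# Wald confidence interval for a single proportion x / n.
wald_ci <- function(x, n, conf = 0.95) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - conf) / 2)        # e.g. 1.96 for a 95% interval
  se <- sqrt(p_hat * (1 - p_hat) / n)   # Wald standard error
  c(estimate = p_hat, se = se,
    lower = p_hat - z * se, upper = p_hat + z * se)
}

ci_smoker <- wald_ci(688, 1338)     # cancer cases among smokers
ci_nonsmoker <- wald_ci(21, 80)     # cancer cases among non-smokers
print(round(rbind(Smoker = ci_smoker, `Non-Smoker` = ci_nonsmoker), 4))
```

The interval for non-smokers is much wider because it rests on only 80 observations.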
Formula:
\[RD = \hat{p}_1 - \hat{p}_2\]
\[SE(RD) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]
\[CI_{95\%}(RD) = RD \pm z_{0{,}025} \cdot SE(RD)\]
Manual calculation:
\[RD = 0{,}5142 - 0{,}2625 = 0{,}2517\]
\[SE(RD) = \sqrt{\frac{0{,}5142 \times 0{,}4858}{1338} + \frac{0{,}2625 \times 0{,}7375}{80}}\]
\[SE(RD) = \sqrt{0{,}000187 + 0{,}002420} = \sqrt{0{,}002607} = 0{,}05106\]
\[CI_{95\%}(RD) = 0{,}2517 \pm 1{,}96 \times 0{,}05106 = 0{,}2517 \pm 0{,}1001\]
\[CI_{95\%}(RD) = [0{,}1516 \;;\; 0{,}3518]\]
RD <- p1 - p2
se_RD <- sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
ci_RD <- c(RD - z*se_RD, RD + z*se_RD)
cat("RD :", round(RD, 4), "\n")
## RD : 0.2517
## SE(RD) : 0.05106
## CI 95% RD: [ 0.1516 ; 0.3518 ]
Interpretation: RD = 0.2517 means that the risk of lung cancer among smokers is 25.17 percentage points higher than among non-smokers. The 95% CI does not include 0, which indicates a significant difference.
Formula:
\[RR = \frac{\hat{p}_1}{\hat{p}_2}\]
\[SE(\ln RR) = \sqrt{\frac{1 - \hat{p}_1}{n_1 \hat{p}_1} + \frac{1 - \hat{p}_2}{n_2 \hat{p}_2}}\]
\[CI_{95\%}(RR) = \exp\!\left(\ln RR \pm z_{0{,}025} \cdot SE(\ln RR)\right)\]
Manual calculation:
\[RR = \frac{0{,}5142}{0{,}2625} = 1{,}9589\]
\[SE(\ln RR) = \sqrt{\frac{0{,}4858}{1338 \times 0{,}5142} + \frac{0{,}7375}{80 \times 0{,}2625}}\]
\[= \sqrt{\frac{0{,}4858}{688} + \frac{0{,}7375}{21}} = \sqrt{0{,}000706 + 0{,}035119} = \sqrt{0{,}035825} = 0{,}18928\]
\[\ln RR = \ln(1{,}9589) = 0{,}6724\]
\[CI_{95\%}(\ln RR) = 0{,}6724 \pm 1{,}96 \times 0{,}18928 = [0{,}3014 \;;\; 1{,}0434]\]
\[CI_{95\%}(RR) = \left[e^{0{,}3014} \;;\; e^{1{,}0434}\right] \approx [1{,}3517 \;;\; 2{,}8387]\]
RR <- p1 / p2
se_lnRR <- sqrt((1-p1)/(n1*p1) + (1-p2)/(n2*p2))
ci_RR <- exp(c(log(RR) - z*se_lnRR, log(RR) + z*se_lnRR))
cat("RR :", round(RR, 4), "\n")
## RR : 1.9589
## ln(RR) : 0.6724
## SE(ln RR) : 0.18928
## CI 95% RR : [ 1.3517 ; 2.8387 ]
Interpretation: RR = 1.9589 means that smokers have almost twice the risk of lung cancer compared with non-smokers. The 95% CI does not include 1.
Formula:
\[OR = \frac{n_{11} \cdot n_{22}}{n_{12} \cdot n_{21}}\]
\[SE(\ln OR) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}\]
\[CI_{95\%}(OR) = \exp\!\left(\ln OR \pm z_{0{,}025} \cdot SE(\ln OR)\right)\]
Manual calculation:
\[OR = \frac{688 \times 59}{650 \times 21} = \frac{40592}{13650} = 2{,}9738\]
\[SE(\ln OR) = \sqrt{\frac{1}{688} + \frac{1}{650} + \frac{1}{21} + \frac{1}{59}}\]
\[= \sqrt{0{,}001453 + 0{,}001538 + 0{,}047619 + 0{,}016949} = \sqrt{0{,}067559} = 0{,}25992\]
\[\ln OR = \ln(2{,}9738) = 1{,}0898\]
\[CI_{95\%}(\ln OR) = 1{,}0898 \pm 1{,}96 \times 0{,}25992 = [0{,}5804 \;;\; 1{,}5992]\]
\[CI_{95\%}(OR) = \left[e^{0{,}5804} \;;\; e^{1{,}5992}\right] \approx [1{,}7867 \;;\; 4{,}9494]\]
OR <- (n11 * n22) / (n12 * n21)
se_lnOR <- sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)
ci_OR <- exp(c(log(OR) - z*se_lnOR, log(OR) + z*se_lnOR))
cat("OR :", round(OR, 4), "\n")
## OR : 2.9738
## ln(OR) : 1.0898
## SE(ln OR) : 0.25992
## CI 95% OR : [ 1.7867 ; 4.9494 ]
Interpretation: OR = 2.9738 means that the odds of lung cancer among smokers are almost 3 times those among non-smokers. The 95% CI does not include 1.
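The RR and OR calculations share the same log-scale Wald construction, which the following sketch collects in one function. The name `ratio_measures` and its argument names are hypothetical, introduced only for illustration; the formulas are the ones given above.

```r
# Hypothetical helper (illustrative only): RR and OR with Wald intervals
# built on the log scale and then exponentiated.
# n11, n12 = first row (event, no event); n21, n22 = second row.
ratio_measures <- function(n11, n12, n21, n22, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  n1 <- n11 + n12
  n2 <- n21 + n22
  p1 <- n11 / n1
  p2 <- n21 / n2
  rr <- p1 / p2
  se_ln_rr <- sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
  or <- (n11 * n22) / (n12 * n21)
  se_ln_or <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
  rbind(
    RR = c(estimate = rr,
           lower = exp(log(rr) - z * se_ln_rr),
           upper = exp(log(rr) + z * se_ln_rr)),
    OR = c(estimate = or,
           lower = exp(log(or) - z * se_ln_or),
           upper = exp(log(or) + z * se_ln_or))
  )
}

res <- ratio_measures(688, 650, 21, 59)
print(round(res, 4))
```

Working on the log scale keeps both interval limits positive and gives an interval that is asymmetric around the point estimate, which suits ratio measures.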
# Verification using the epitools package
if (!require(epitools)) install.packages("epitools")
library(epitools)
# Odds ratio with epitools
cat("=== Verifikasi OR dengan epitools ===\n")
## === Verifikasi OR dengan epitools ===
## $data
## Status
## Status Merokok Control (-) Cancer (+) Total
## Non-Smoker 59 21 80
## Smoker 650 688 1338
## Total 709 709 1418
##
## $measure
## odds ratio with 95% C.I.
## Status Merokok estimate lower upper
## Non-Smoker 1.000000 NA NA
## Smoker 2.973773 1.786737 4.949427
##
## $p.value
## two-sided
## Status Merokok midp.exact fisher.exact chi.square
## Non-Smoker NA NA NA
## Smoker 9.747013e-06 1.476303e-05 1.221601e-05
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
##
## === Verifikasi RR dengan epitools ===
## $data
## Status
## Status Merokok Control (-) Cancer (+) Total
## Non-Smoker 59 21 80
## Smoker 650 688 1338
## Total 709 709 1418
##
## $measure
## risk ratio with 95% C.I.
## Status Merokok estimate lower upper
## Non-Smoker 1.000000 NA NA
## Smoker 1.958858 1.351735 2.838667
##
## $p.value
## two-sided
## Status Merokok midp.exact fisher.exact chi.square
## Non-Smoker NA NA NA
## Smoker 9.747013e-06 1.476303e-05 1.221601e-05
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
# Visualization
library(vcd)
# Mosaic plot to inspect the association
mosaic(data_k1, shade = TRUE, legend = TRUE,
main = "Mosaic Plot: Kebiasaan Merokok vs Kanker Paru")
# Bar plot to compare the proportions
prop_k1 <- prop.table(data_k1, margin = 1)
barplot(t(prop_k1), beside = TRUE,
col = c("firebrick", "darkblue"),
legend = colnames(data_k1),
main = "Proporsi Status Kesehatan: Perokok vs Bukan Perokok",
xlab = "Kebiasaan Merokok", ylab = "Proporsi")
Interpretation: The mosaic plot points to an association between smoking and lung cancer. With expected counts of 669 in each Smoker cell and 40 in each Non-Smoker cell, the Pearson residuals are about ±0.73 in the Smoker row and ±3.00 in the Non-Smoker row, so the clearest departure from independence is the shortfall of cancer cases among non-smokers (21 observed against 40 expected). This agrees with the Relative Risk (RR) of 1.96: the proportion of cancer cases among smokers is almost twice that among non-smokers. With \(p < 0{,}05\), the association is statistically significant.
Hypotheses:
\[H_0: p_1 = p_2 \qquad \text{vs} \qquad H_1: p_1 \neq p_2\]
Test statistic:
\[Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \hat{p} = \frac{n_{11} + n_{21}}{n}\]
Rejection region: Reject \(H_0\) if \(|Z| > z_{0{,}025} = 1{,}96\), or if \(p\text{-value} < 0{,}05\).
Manual calculation:
\[\hat{p} = \frac{688 + 21}{1418} = \frac{709}{1418} = 0{,}5000\]
\[SE = \sqrt{0{,}5000 \times 0{,}5000 \times \left(\frac{1}{1338} + \frac{1}{80}\right)} = \sqrt{0{,}25 \times 0{,}013247} = \sqrt{0{,}003312} = 0{,}05755\]
\[Z = \frac{0{,}5142 - 0{,}2625}{0{,}05755} = \frac{0{,}2517}{0{,}05755} = 4{,}374\]
\[p\text{-value} = 2 \times P(Z > 4{,}374) = 2 \times 6{,}11 \times 10^{-6} \approx 1{,}22 \times 10^{-5}\]
p_pool <- (n11 + n21) / n
SE_z <- sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
Z_stat <- (p1 - p2) / SE_z
pval_z <- 2 * pnorm(-abs(Z_stat))
cat("p_pool (gabungan):", round(p_pool, 4), "\n")
## p_pool (gabungan): 0.5
## SE : 0.05755
## Statistik Z : 4.3737
## Z^2 : 19.1292
## P-value : 1.222e-05
# Verification with the built-in R function
prop.test(c(n11, n21), c(n1, n2), correct = FALSE, alternative = "two.sided")
##
## 2-sample test for equality of proportions without continuity correction
##
## data: c(n11, n21) out of c(n1, n2)
## X-squared = 19.129, df = 1, p-value = 1.222e-05
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.1516343 0.3517663
## sample estimates:
## prop 1 prop 2
## 0.5142003 0.2625000
Decision: \(|Z| = 4{,}374 > 1{,}96\), and \(p\text{-value} \approx 1{,}22 \times 10^{-5} < 0{,}05\). Reject \(H_0\).
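For a \(2 \times 2\) table, the square of the pooled two-proportion \(Z\) statistic equals the Pearson chi-square statistic without continuity correction. The short check below is a sketch with illustrative variable names (`tab`, `z_stat`, `x2` are not from the original code); it recomputes \(Z\) from the counts and compares \(Z^2\) with the statistic returned by `chisq.test`.

```r
# Illustrative check: Z^2 from the pooled two-proportion test equals the
# Pearson chi-square statistic (no continuity correction) for a 2 x 2 table.
tab <- matrix(c(688, 650, 21, 59), nrow = 2, byrow = TRUE)

n_row <- rowSums(tab)                    # 1338 smokers, 80 non-smokers
p_hat <- tab[, 1] / n_row                # proportion of cancer cases per row
p_pool <- sum(tab[, 1]) / sum(tab)       # pooled proportion under H0

z_stat <- (p_hat[1] - p_hat[2]) /
  sqrt(p_pool * (1 - p_pool) * sum(1 / n_row))
p_value <- 2 * pnorm(-abs(z_stat))       # two-sided p-value

x2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
cat("Z =", round(z_stat, 4), " Z^2 =", round(z_stat^2, 4),
    " X^2 =", round(x2, 4), "\n")
```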
Hypotheses:
\[H_0: \text{Smoking status and lung cancer are independent} \qquad \text{vs} \qquad H_1: \text{The two are associated}\]
Expected frequencies:
\[E_{ij} = \frac{n_{i+} \cdot n_{+j}}{n}\]
Test statistic:
\[\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}\]
Rejection region: Reject \(H_0\) if \(\chi^2 > \chi^2_{0{,}05; 1} = 3{,}841\), or if \(p\text{-value} < 0{,}05\).
Manual calculation of the expected frequencies:
\[E_{11} = \frac{1338 \times 709}{1418} = \frac{948\,642}{1418} = 669{,}0\]
\[E_{12} = \frac{1338 \times 709}{1418} = 669{,}0 \quad \text{(because } n_{+1}=n_{+2}=709\text{)}\]
\[E_{21} = \frac{80 \times 709}{1418} = \frac{56\,720}{1418} = 40{,}0\]
\[E_{22} = \frac{80 \times 709}{1418} = 40{,}0\]
Chi-square components:
\[\frac{(688 - 669{,}0)^2}{669{,}0} = \frac{(19{,}0)^2}{669{,}0} = \frac{361{,}0}{669{,}0} = 0{,}5396\]
\[\frac{(650 - 669{,}0)^2}{669{,}0} = \frac{(-19{,}0)^2}{669{,}0} = \frac{361{,}0}{669{,}0} = 0{,}5396\]
\[\frac{(21 - 40{,}0)^2}{40{,}0} = \frac{(-19{,}0)^2}{40{,}0} = \frac{361{,}0}{40{,}0} = 9{,}025\]
\[\frac{(59 - 40{,}0)^2}{40{,}0} = \frac{(19{,}0)^2}{40{,}0} = \frac{361{,}0}{40{,}0} = 9{,}025\]
\[\chi^2 = 0{,}5396 + 0{,}5396 + 9{,}025 + 9{,}025 = 19{,}129\]
\[p\text{-value} = P(\chi^2_1 > 19{,}129) \approx 1{,}22 \times 10^{-5}\]
## Frekuensi harapan:
## Cancer (+) Control (-)
## Smoker 669 669
## Non-Smoker 40 40
##
## Komponen chi-square:
## Status
## Status Merokok Cancer (+) Control (-)
## Smoker 0.5396 0.5396
## Non-Smoker 9.0250 9.0250
chi2_manual <- sum(komponen)
pval_chi2 <- pchisq(chi2_manual, df = 1, lower.tail = FALSE)
cat("\nChi-square manual:", round(chi2_manual, 4), "\n")
##
## Chi-square manual: 19.1292
## P-value : 1.222e-05
##
## Pearson's Chi-squared test
##
## data: data_k1
## X-squared = 19.129, df = 1, p-value = 1.222e-05
Decision: \(\chi^2 = 19{,}129 > 3{,}841\), and \(p\text{-value} < 0{,}05\). Reject \(H_0\).
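The R output above refers to objects (expected frequencies and per-cell components) whose defining code is not shown in this report. A minimal, self-contained sketch of how they could be computed is given below; the names `expected`, `components`, and `x2` are assumptions chosen for illustration and need not match the original script.

```r
# Illustrative reconstruction: expected counts and chi-square components
# for the 2 x 2 smoking table, computed from the margins.
tab <- matrix(c(688, 650, 21, 59), nrow = 2, byrow = TRUE,
              dimnames = list(c("Smoker", "Non-Smoker"),
                              c("Cancer (+)", "Control (-)")))

# E_ij = (row total i) * (column total j) / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Per-cell contribution (O - E)^2 / E and their sum
components <- (tab - expected)^2 / expected
x2 <- sum(components)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

print(round(expected, 1))
print(round(components, 4))
cat("X^2 =", round(x2, 4), " df =", df, " p =", signif(p_value, 4), "\n")
```

Because the two column totals are equal, both cells in a row share the same expected count, so the two components in each row are identical.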
Hypotheses: The same as for the chi-square test.
Test statistic:
\[G^2 = 2 \sum n_{ij} \ln \left( \frac{n_{ij}}{\hat{\mu}_{ij}} \right)\]
Here \(n_{ij} = O_{ij}\) are the observed counts and \(\hat{\mu}_{ij} = E_{ij}\) are the expected frequencies under independence.
Rejection region: Reject \(H_0\) if \(G^2 > \chi^2_{0{,}05; 1} = 3{,}841\).
Manual calculation for each cell:
\[O_{11}\ln\!\frac{O_{11}}{E_{11}} = 688 \times \ln\!\frac{688}{669{,}0} = 688 \times \ln(1{,}02840) = 688 \times 0{,}028005 = 19{,}27\]
\[O_{12}\ln\!\frac{O_{12}}{E_{12}} = 650 \times \ln\!\frac{650}{669{,}0} = 650 \times \ln(0{,}97160) = 650 \times (-0{,}028812) = -18{,}73\]
\[O_{21}\ln\!\frac{O_{21}}{E_{21}} = 21 \times \ln\!\frac{21}{40{,}0} = 21 \times \ln(0{,}5250) = 21 \times (-0{,}64436) = -13{,}53\]
\[O_{22}\ln\!\frac{O_{22}}{E_{22}} = 59 \times \ln\!\frac{59}{40{,}0} = 59 \times \ln(1{,}4750) = 59 \times 0{,}38866 = 22{,}93\]
\[G^2 = 2 \times (19{,}27 - 18{,}73 - 13{,}53 + 22{,}93) = 2 \times 9{,}94 = 19{,}88\]
\[p\text{-value} = P(\chi^2_1 > 19{,}88) \approx 8{,}25 \times 10^{-6}\]
komponen_G2 <- data_k1 * log(data_k1 / E)
cat("Komponen O*ln(O/E):\n"); print(round(komponen_G2, 4))
## Komponen O*ln(O/E):
## Status
## Status Merokok Cancer (+) Control (-)
## Smoker 19.2673 -18.7276
## Non-Smoker -13.5315 22.9308
G2 <- 2 * sum(komponen_G2)
pval_G2 <- pchisq(G2, df = 1, lower.tail = FALSE)
cat("\nG^2 :", round(G2, 4), "\n")
##
## G^2 : 19.878
## P-value : 8.254e-06
Decision: \(G^2 = 19{,}878 > 3{,}841\), and \(p\text{-value} < 0{,}05\). Reject \(H_0\).
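The likelihood ratio statistic can be computed next to the Pearson statistic from the same expected counts. The sketch below is self-contained and uses illustrative names (`tab`, `expected`, `g2`, `x2`); it assumes every observed count is positive, since a zero cell would need special handling of the `0 * log(0)` term.

```r
# Illustrative sketch: likelihood ratio statistic G^2 for the 2 x 2 table.
# Assumes all observed counts are > 0 (a zero cell would give NaN here).
tab <- matrix(c(688, 650, 21, 59), nrow = 2, byrow = TRUE,
              dimnames = list(c("Smoker", "Non-Smoker"),
                              c("Cancer (+)", "Control (-)")))
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

g2 <- 2 * sum(tab * log(tab / expected))   # likelihood ratio statistic
x2 <- sum((tab - expected)^2 / expected)   # Pearson statistic, for comparison
p_g2 <- pchisq(g2, df = 1, lower.tail = FALSE)

cat("G^2 =", round(g2, 4), " X^2 =", round(x2, 4),
    " p(G^2) =", signif(p_g2, 4), "\n")
```

The two statistics are close but not identical (about 19.88 against 19.13); both are referred to the same \(\chi^2_1\) distribution.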
Fisher's exact test uses the hypergeometric distribution. With all margins held fixed, the probability of a table is:
\[P(n_{11} = k) = \frac{\dbinom{n_{1+}}{k}\dbinom{n_{2+}}{n_{+1}-k}}{\dbinom{n}{n_{+1}}}\]
The \(p\text{-value}\) is the sum of the probabilities of all tables that are as extreme as, or more extreme than, the observed one.
Probability of the observed table:
\[P(n_{11} = 688) = \frac{\dbinom{1338}{688}\dbinom{80}{21}}{\dbinom{1418}{709}}\]
Because these binomial coefficients involve very large numbers, the calculation is done with an R function:
##
## Fisher's Exact Test for Count Data
##
## data: data_k1
## p-value = 1.476e-05
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.755611 5.210711
## sample estimates:
## odds ratio
## 2.971634
Decision: \(p\text{-value} < 0{,}05\). Reject \(H_0\).
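The hypergeometric formula above can be evaluated directly with `dhyper`, which avoids the huge binomial coefficients. The following sketch is one possible way to reproduce the two-sided p-value by summing the probabilities of all tables that are no more likely than the observed one; the variable names and the small relative tolerance used to treat near-equal probabilities as ties are assumptions made for this illustration.

```r
# Illustrative sketch: two-sided Fisher p-value from the hypergeometric
# distribution of n11 given fixed margins.
tab <- matrix(c(688, 650, 21, 59), nrow = 2, byrow = TRUE)
n1 <- sum(tab[1, ])   # row 1 total (smokers), 1338
n2 <- sum(tab[2, ])   # row 2 total (non-smokers), 80
c1 <- sum(tab[, 1])   # column 1 total (cancer cases), 709

# All values of n11 compatible with the margins
support <- max(0, c1 - n2):min(c1, n1)
probs <- dhyper(support, m = n1, n = n2, k = c1)

# Probability of the observed table, P(n11 = 688)
p_obs <- dhyper(tab[1, 1], m = n1, n = n2, k = c1)

# Sum over tables at most as probable as the observed one; the factor
# (1 + 1e-7) is an assumed tolerance for floating-point ties
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

cat("P(n11 = 688) =", signif(p_obs, 4), "\n")
cat("Two-sided p  =", signif(p_two_sided, 4), "\n")
```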
fisher_res <- fisher.test(data_k1)
tbl_banding <- data.frame(
Uji = c("Uji Dua Proporsi","Chi-Square","Likelihood Ratio (G2)","Fisher Exact"),
Hipotesis = c("p1=p2","Independensi","Independensi","Independensi"),
Statistik_Uji = c(
paste0("Z=",round(Z_stat,3)," (Z^2=",round(Z_stat^2,3),")"),
paste0("chi^2=",round(sum(komponen),3)),
paste0("G^2=",round(G2,3)),
"Hipergeometrik"
),
P_value = c(
format(pval_z, scientific=TRUE, digits=3),
format(pval_chi2, scientific=TRUE, digits=3),
format(pval_G2, scientific=TRUE, digits=3),
format(fisher_res$p.value, scientific=TRUE, digits=3)
),
Keputusan = rep("Tolak H0", 4),
Asumsi = c("Aproks. normal","E_ij >= 5","E_ij >= 5","Tanpa asumsi")
)
knitr::kable(tbl_banding, caption="Perbandingan Hasil Keempat Uji Hipotesis - Kasus 1")
| Uji | Hipotesis | Statistik_Uji | P_value | Keputusan | Asumsi |
|---|---|---|---|---|---|
| Uji Dua Proporsi | p1=p2 | Z=4.374 (Z^2=19.129) | 1.22e-05 | Tolak H0 | Aproks. normal |
| Chi-Square | Independensi | chi^2=19.129 | 1.22e-05 | Tolak H0 | E_ij >= 5 |
| Likelihood Ratio (G2) | Independensi | G^2=19.878 | 8.25e-06 | Tolak H0 | E_ij >= 5 |
| Fisher Exact | Independensi | Hipergeometrik | 1.48e-05 | Tolak H0 | Tanpa asumsi |
Interpretation: All four tests lead to the same conclusion: there is a highly significant association between smoking and lung cancer (\(p\)-value \(\ll 0{,}05\)). For a \(2 \times 2\) table \(Z^2\) equals the Pearson \(\chi^2\) (both 19.129), and \(G^2 = 19.878\) is close to them, which shows that the methods agree. Fisher's exact test serves as confirmation because it does not rely on an asymptotic distribution.
Based on the full analysis of the \(2 \times 2\) table (Smoking vs Lung Cancer):
| Measure | Value | 95% CI | Interpretation |
|---|---|---|---|
| \(\hat{p}_1\) (Smoker) | 0.5142 | [0.4874 ; 0.5410] | 51.4% of smokers have cancer |
| \(\hat{p}_2\) (Non-Smoker) | 0.2625 | [0.1661 ; 0.3589] | 26.3% of non-smokers have cancer |
| RD | 0.2517 | [0.1516 ; 0.3518] | Absolute risk difference of 25.2 percentage points |
| RR | 1.9589 | [1.3517 ; 2.8387] | Smokers have ~2× the risk |
| OR | 2.9738 | [1.7867 ; 4.9494] | Smokers have ~3× the odds of cancer |
All hypothesis tests: reject H₀ with \(p\text{-value} < 0{,}001\).
Conclusion: There is a highly significant association between smoking and lung cancer. Smokers have almost twice the risk and almost three times the odds compared with non-smokers.
data_k2 <- matrix(
c(495, 272, 590,
330, 265, 498),
nrow = 2, byrow = TRUE,
dimnames = list(
"Gender" = c("Female","Male"),
"Partai" = c("Democrat","Republican","Independent")
)
)
addmargins(data_k2)
## Partai
## Gender Democrat Republican Independent Sum
## Female 495 272 590 1357
## Male 330 265 498 1093
## Sum 825 537 1088 2450
Notation: \(n_{ij}\) is the frequency in row \(i\), column \(j\), with \(n_{1+}=1357\), \(n_{2+}=1093\), \(n=2450\).
R <- rowSums(data_k2) # c(1357, 1093)
C <- colSums(data_k2) # c(825, 537, 1088)
N <- sum(data_k2) # 2450
Formula:
\[E_{ij} = \frac{n_{i+} \cdot n_{+j}}{n}\]
Manual calculation:
\[E_{11} = \frac{1357 \times 825}{2450} = \frac{1\,119\,525}{2450} = 456{,}95\]
\[E_{12} = \frac{1357 \times 537}{2450} = \frac{728\,709}{2450} = 297{,}43\]
\[E_{13} = \frac{1357 \times 1088}{2450} = \frac{1\,476\,416}{2450} = 602{,}62\]
\[E_{21} = \frac{1093 \times 825}{2450} = \frac{901\,725}{2450} = 368{,}05\]
\[E_{22} = \frac{1093 \times 537}{2450} = \frac{586\,941}{2450} = 239{,}57\]
\[E_{23} = \frac{1093 \times 1088}{2450} = \frac{1\,189\,184}{2450} = 485{,}38\]
## Frekuensi Harapan E_ij:
## Democrat Republican Independent
## Female 456.95 297.43 602.62
## Male 368.05 239.57 485.38
##
## Semua E_ij > 5: TRUE
All \(E_{ij} > 5\), so the expected-count assumption of the chi-square test is satisfied.
Hypotheses:
\[H_0: \text{Gender and party identification are independent}\]
\[H_1: \text{Gender and party identification are associated}\]
Test statistic:
\[\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(2-1)(3-1)} = \chi^2_2\]
Rejection region: Reject \(H_0\) if \(\chi^2 > \chi^2_{0{,}05;2} = 5{,}991\).
Manual calculation of the component for each cell:
\[\frac{(495 - 456{,}95)^2}{456{,}95} = \frac{(38{,}05)^2}{456{,}95} = \frac{1447{,}8}{456{,}95} = 3{,}167\]
\[\frac{(272 - 297{,}43)^2}{297{,}43} = \frac{(-25{,}43)^2}{297{,}43} = \frac{646{,}7}{297{,}43} = 2{,}175\]
\[\frac{(590 - 602{,}62)^2}{602{,}62} = \frac{(-12{,}62)^2}{602{,}62} = \frac{159{,}3}{602{,}62} = 0{,}264\]
\[\frac{(330 - 368{,}05)^2}{368{,}05} = \frac{(-38{,}05)^2}{368{,}05} = \frac{1447{,}8}{368{,}05} = 3{,}933\]
\[\frac{(265 - 239{,}57)^2}{239{,}57} = \frac{(25{,}43)^2}{239{,}57} = \frac{646{,}7}{239{,}57} = 2{,}700\]
\[\frac{(498 - 485{,}38)^2}{485{,}38} = \frac{(12{,}62)^2}{485{,}38} = \frac{159{,}3}{485{,}38} = 0{,}328\]
\[\chi^2 = 3{,}167 + 2{,}175 + 0{,}264 + 3{,}933 + 2{,}700 + 0{,}328 = 12{,}567\]
\[p\text{-value} = P(\chi^2_2 > 12{,}567) = 0{,}00186\]
## Komponen chi-square per sel:
## Partai
## Gender Democrat Republican Independent
## Female 3.1686 2.1746 0.2642
## Male 3.9339 2.6999 0.3281
chi2_k2 <- sum(komponen_k2)
pval_k2 <- pchisq(chi2_k2, df = 2, lower.tail = FALSE)
cat("\nChi-square manual:", round(chi2_k2, 4), "\n")
##
## Chi-square manual: 12.5693
## Derajat bebas : 2
## P-value : 0.00186
##
## Pearson's Chi-squared test
##
## data: data_k2
## X-squared = 12.569, df = 2, p-value = 0.001865
Decision: \(\chi^2 \approx 12{,}57 > 5{,}991\) (12.567 from the rounded manual arithmetic, 12.569 from R) and \(p\text{-value} = 0{,}00186 < 0{,}05\). Reject \(H_0\).
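The expected frequencies and per-cell components printed above come from code that is not shown in this report. The sketch below is a self-contained reconstruction under that assumption; the names `tab2`, `expected2`, and `components2` are illustrative and may differ from the original script.

```r
# Illustrative reconstruction: chi-square test of independence for the
# 2 x 3 gender-by-party table, computed from the margins.
tab2 <- matrix(c(495, 272, 590,
                 330, 265, 498),
               nrow = 2, byrow = TRUE,
               dimnames = list(Gender = c("Female", "Male"),
                               Partai = c("Democrat", "Republican", "Independent")))

expected2 <- outer(rowSums(tab2), colSums(tab2)) / sum(tab2)
components2 <- (tab2 - expected2)^2 / expected2

x2 <- sum(components2)
df <- (nrow(tab2) - 1) * (ncol(tab2) - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

print(round(expected2, 2))
cat("All E_ij > 5:", all(expected2 > 5), "\n")
cat("X^2 =", round(x2, 4), " df =", df, " p =", signif(p_value, 4), "\n")
```

Working with unrounded expected counts gives 12.5693, which explains the small gap from the hand calculation based on two-decimal values.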
Formula:
\[r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}\]
Manual calculation:
\[r_{11} = \frac{495 - 456{,}95}{\sqrt{456{,}95}} = \frac{38{,}05}{21{,}376} = +1{,}780\]
\[r_{12} = \frac{272 - 297{,}43}{\sqrt{297{,}43}} = \frac{-25{,}43}{17{,}246} = -1{,}475\]
\[r_{13} = \frac{590 - 602{,}62}{\sqrt{602{,}62}} = \frac{-12{,}62}{24{,}549} = -0{,}514\]
\[r_{21} = \frac{330 - 368{,}05}{\sqrt{368{,}05}} = \frac{-38{,}05}{19{,}185} = -1{,}984\]
\[r_{22} = \frac{265 - 239{,}57}{\sqrt{239{,}57}} = \frac{25{,}43}{15{,}478} = +1{,}643\]
\[r_{23} = \frac{498 - 485{,}38}{\sqrt{485{,}38}} = \frac{12{,}62}{22{,}031} = +0{,}573\]
## Residual Pearson:
## Partai
## Gender Democrat Republican Independent
## Female 1.7801 -1.4747 -0.5140
## Male -1.9834 1.6431 0.5728
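The Pearson residuals can be cross-checked in two ways: against the `residuals` component returned by `chisq.test`, and through the identity that their squares sum to the chi-square statistic. The sketch below uses illustrative names (`tab2`, `fit`, `pearson`) that are not taken from the original script.

```r
# Illustrative check of the Pearson residuals for the 2 x 3 table.
tab2 <- matrix(c(495, 272, 590,
                 330, 265, 498),
               nrow = 2, byrow = TRUE,
               dimnames = list(Gender = c("Female", "Male"),
                               Partai = c("Democrat", "Republican", "Independent")))

fit <- chisq.test(tab2, correct = FALSE)

# r_ij = (O_ij - E_ij) / sqrt(E_ij), using the expected counts from the fit
pearson <- (tab2 - fit$expected) / sqrt(fit$expected)

print(round(pearson, 4))
cat("Sum of squared residuals:", round(sum(pearson^2), 4), "\n")
cat("Chi-square statistic    :", round(unname(fit$statistic), 4), "\n")
```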
Formula:
\[d_{ij} = \frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij} (1 - p_{i+})(1 - p_{+j})}}\]
where \(p_{i+} = n_{i+}/n\) and \(p_{+j} = n_{+j}/n\).
Manual calculation for cell \((1,1)\), Female–Democrat:
\[p_{1+} = \frac{1357}{2450} = 0{,}5539, \quad p_{+1} = \frac{825}{2450} = 0{,}3367\]
\[d_{11} = \frac{38{,}05}{\sqrt{456{,}95 \times (1 - 0{,}5539)(1 - 0{,}3367)}}\]
\[= \frac{38{,}05}{\sqrt{456{,}95 \times 0{,}4461 \times 0{,}6633}} = \frac{38{,}05}{\sqrt{135{,}19}} = \frac{38{,}05}{11{,}627} = +3{,}273\]
Manual calculation for cell \((2,1)\), Male–Democrat:
\[p_{2+} = \frac{1093}{2450} = 0{,}4461\]
\[d_{21} = \frac{-38{,}05}{\sqrt{368{,}05 \times (1 - 0{,}4461)(1 - 0{,}3367)}}\]
\[= \frac{-38{,}05}{\sqrt{368{,}05 \times 0{,}5539 \times 0{,}6633}} = \frac{-38{,}05}{\sqrt{135{,}19}} = \frac{-38{,}05}{11{,}627} = -3{,}273\]
p_row <- R / N # row marginal proportions p_{i+}
p_col <- C / N # column marginal proportions p_{+j}
res_std <- (data_k2 - E_k2) / sqrt(E_k2 * outer(1 - p_row, 1 - p_col))
cat("Standardized Residual (manual):\n")
## Standardized Residual (manual):
## Partai
## Gender Democrat Republican Independent
## Female 3.2724 -2.4986 -1.0322
## Male -3.2724 2.4986 1.0322
##
## Verifikasi (chisq.test stdres):
## Partai
## Gender Democrat Republican Independent
## Female 3.2724 -2.4986 -1.0322
## Male -3.2724 2.4986 1.0322
library(vcd)
mosaic(data_k2, shade = TRUE, legend = TRUE,
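main = "Mosaic Plot: Gender vs Identifikasi Partai Politik")
Interpretation of the residuals: Cells with \(|d_{ij}| > 2\) depart significantly from independence. Here these are the Democrat cells (+3.27 for Female, −3.27 for Male) and the Republican cells (−2.50 for Female, +2.50 for Male); the Independent cells (∓1.03) do not exceed the threshold.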
For a \(2 \times 3\) table, \(\chi^2_{total}\) with 2 degrees of freedom can be partitioned into two components with 1 degree of freedom each:
\[\chi^2_{(2)} = \chi^2_{P1(1)} + \chi^2_{P2(1)}\]
The first component uses the \(2 \times 2\) subtable with only the Democrat and Republican columns.
Expected frequencies of the subtable (row totals = 767 and 595; column totals = 825 and 537; \(n^* = 1362\)):
\[E^*_{11} = \frac{767 \times 825}{1362} = \frac{632\,775}{1362} = 464{,}59\]
\[E^*_{12} = \frac{767 \times 537}{1362} = \frac{411\,879}{1362} = 302{,}41\]
\[E^*_{21} = \frac{595 \times 825}{1362} = \frac{490\,875}{1362} = 360{,}41\]
\[E^*_{22} = \frac{595 \times 537}{1362} = \frac{319\,515}{1362} = 234{,}59\]
Chi-square components:
\[\chi^2_{P1} = \frac{(495-464{,}59)^2}{464{,}59} + \frac{(272-302{,}41)^2}{302{,}41} + \frac{(330-360{,}41)^2}{360{,}41} + \frac{(265-234{,}59)^2}{234{,}59} \approx 11{,}55\]
sub1 <- data_k2[, c("Democrat","Republican")]
chi_P1 <- chisq.test(sub1, correct=FALSE)
cat("=== Partisi 1: Democrat vs Republican ===\n")
## === Partisi 1: Democrat vs Republican ===
## Partai
## Gender Democrat Republican Sum
## Female 495 272 767
## Male 330 265 595
## Sum 825 537 1362
## Frekuensi harapan:
## Partai
## Gender Democrat Republican
## Female 464.59 302.41
## Male 360.41 234.59
## Chi-square: 11.5545
## df : 1
## P-value : 0.000676
demrep <- rowSums(data_k2[, c("Democrat","Republican")])
ind <- data_k2[, "Independent"]
sub2 <- cbind("Dem+Rep"=demrep, "Independent"=ind)
chi_P2 <- chisq.test(sub2, correct=FALSE)
cat("=== Partisi 2: (Dem+Rep) vs Independent ===\n")
## === Partisi 2: (Dem+Rep) vs Independent ===
## Dem+Rep Independent Sum
## Female 767 590 1357
## Male 595 498 1093
## Sum 1362 1088 2450
## Frekuensi harapan:
## Dem+Rep Independent
## Female 754.38 602.62
## Male 607.62 485.38
## Chi-square: 1.0654
## df : 1
## P-value : 0.301979
\[\chi^2_{P1} + \chi^2_{P2} \approx \chi^2_{total}\]
## Chi-square P1 : 11.5545
## Chi-square P2 : 1.0654
## P1 + P2 : 12.62
## Chi-square total : 12.5693
Note: The sum \(\chi^2_{P1} + \chi^2_{P2}\) (12.62) is close to, but not exactly equal to, \(\chi^2_{total}\) (12.569). Pearson statistics computed on subtables in this way do not form an exact Lancaster orthogonal partition; an exact partition of the Pearson statistic requires specially weighted components. The decomposition is still useful for interpreting how much each contrast between categories contributes.
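Unlike the Pearson statistic, the likelihood ratio statistic \(G^2\) does add up exactly under this partitioning scheme (first the two-column subtable, then the combined columns against the remaining one). The sketch below illustrates the contrast; `g2_stat` and `x2_stat` are hypothetical helper names, and the \(G^2\) helper assumes no zero counts.

```r
# Illustrative comparison: G^2 partitions exactly, Pearson X^2 only
# approximately, for the same pair of subtables.
g2_stat <- function(tab) {          # hypothetical helper; assumes counts > 0
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  2 * sum(tab * log(tab / expected))
}
x2_stat <- function(tab) {          # hypothetical helper
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

tab2 <- matrix(c(495, 272, 590,
                 330, 265, 498),
               nrow = 2, byrow = TRUE,
               dimnames = list(Gender = c("Female", "Male"),
                               Partai = c("Democrat", "Republican", "Independent")))

sub1 <- tab2[, c("Democrat", "Republican")]             # Dem vs Rep
sub2 <- cbind(`Dem+Rep` = rowSums(sub1),                # (Dem+Rep) vs Ind
              Independent = tab2[, "Independent"])

g2_total <- g2_stat(tab2)
g2_parts <- g2_stat(sub1) + g2_stat(sub2)
x2_total <- x2_stat(tab2)
x2_parts <- x2_stat(sub1) + x2_stat(sub2)

cat("G^2 total:", round(g2_total, 4), " sum of parts:", round(g2_parts, 4), "\n")
cat("X^2 total:", round(x2_total, 4), " sum of parts:", round(x2_parts, 4), "\n")
```

If an exactly additive decomposition is wanted, reporting the \(G^2\) components alongside the Pearson ones is one option.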
tbl_banding2 <- data.frame(
Uji = c("Chi-Square Keseluruhan",
"Partisi 1: Dem vs Rep",
"Partisi 2: (Dem+Rep) vs Ind"),
Chi_square = c(round(chi2_k2, 4),
round(chi_P1$statistic, 4),
round(chi_P2$statistic, 4)),
df = c(2, 1, 1),
P_value = c(round(pval_k2, 5),
round(chi_P1$p.value, 5),
round(chi_P2$p.value, 5)),
Keputusan = c("Tolak H0",
ifelse(chi_P1$p.value<0.05,"Tolak H0","Gagal Tolak H0"),
ifelse(chi_P2$p.value<0.05,"Tolak H0","Gagal Tolak H0"))
)
knitr::kable(tbl_banding2, caption="Perbandingan Partisi Chi-Square - Kasus 2")
| Uji | Chi_square | df | P_value | Keputusan |
|---|---|---|---|---|
| Chi-Square Keseluruhan | 12.5693 | 2 | 0.00186 | Tolak H0 |
| Partisi 1: Dem vs Rep | 11.5545 | 1 | 0.00068 | Tolak H0 |
| Partisi 2: (Dem+Rep) vs Ind | 1.0654 | 1 | 0.30198 | Gagal Tolak H0 |
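Interpretation of the partition: Partition 1 (Democrat vs Republican) is significant (\(\chi^2 = 11{,}5545\), \(p = 0{,}00068\)), whereas Partition 2 ((Dem+Rep) vs Independent) is not (\(\chi^2 = 1{,}0654\), \(p = 0{,}30198\)). The association between gender and party identification therefore comes mainly from the Democrat versus Republican contrast.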
## Kontribusi chi-square per sel:
## Partai
## Gender Democrat Republican Independent
## Female 3.1686 2.1746 0.2642
## Male 3.9339 2.6999 0.3281
##
## Kontribusi (% dari total chi-square):
## Partai
## Gender Democrat Republican Independent
## Female 25.21 17.30 2.10
## Male 31.30 21.48 2.61
proporsi_k2 <- prop.table(data_k2, margin = 1)
barplot(t(proporsi_k2),
beside = TRUE,
col = c("steelblue","tomato","seagreen"),
legend = colnames(data_k2),
args.legend = list(x="topright", bty="n"),
main = "Proporsi Identifikasi Partai Politik per Gender",
xlab = "Gender", ylab = "Proporsi",
ylim = c(0, 0.60))
Interpretation: The Female–Democrat and Male–Democrat cells contribute about 25% and 31% of the overall \(\chi^2\), respectively. This agrees with the largest standardized residuals (\(|d| \approx 3{,}27\)), which occur in the Democrat category.
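The percentage contributions quoted above can be obtained by squaring the Pearson residuals and dividing by their total. The code that produced the printed table is not shown in this report, so the following is only a sketch with illustrative names (`tab2`, `fit`, `contrib`, `pct`).

```r
# Illustrative sketch: share of each cell in the overall chi-square.
tab2 <- matrix(c(495, 272, 590,
                 330, 265, 498),
               nrow = 2, byrow = TRUE,
               dimnames = list(Gender = c("Female", "Male"),
                               Partai = c("Democrat", "Republican", "Independent")))

fit <- chisq.test(tab2, correct = FALSE)

contrib <- fit$residuals^2             # squared Pearson residuals = (O - E)^2 / E
pct <- 100 * contrib / sum(contrib)    # percentage of the total chi-square

print(round(contrib, 4))
print(round(pct, 2))
cat("Democrat column share:", round(sum(pct[, "Democrat"]), 2), "%\n")
```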
Based on the full analysis of the \(2 \times 3\) table (Gender vs Party Identification):
| Component | Result | Decision |
|---|---|---|
| Overall chi-square | \(\chi^2 = 12{,}569\), \(df = 2\), \(p = 0{,}00186\) | Reject H₀ |
| Partition 1: Dem vs Rep | \(p < 0{,}05\) | Reject H₀ |
| Partition 2: Partisan vs Ind | \(p > 0{,}05\) | Fail to reject H₀ |
Largest residuals: the Democrat column, with standardized residuals of +3.27 (Female) and −3.27 (Male), followed by the Republican column (−2.50 for Female, +2.50 for Male).
Conclusion: Gender is significantly associated with party identification, mainly in the choice between Democrat and Republican. There is no evident gender difference in the tendency to identify as Independent.
| Case | Variables | Main Methods | Result |
|---|---|---|---|
| Case 1 | Smoking vs Lung Cancer | RD, RR, OR + 4 hypothesis tests | Highly significant association |
| Case 2 | Gender vs Political Party | Chi-square, residuals, partition | Significant association |
Both cases show that inference on two-way contingency tables with several testing methods leads to mutually consistent conclusions. The manual calculations and the R verification confirm each other, which supports the reliability of the statistical procedures used.
The choice of method depends on sample size and data structure: for a \(2 \times 2\) table with an adequate sample, all four tests give similar results; for a larger table (\(2 \times 3\)), partitioning the chi-square statistic gives additional insight into which contrast between categories contributes most to the observed association.