library(knitr)
library(kableExtra)
library(epitools)
library(vcd)
library(DescTools)
library(ggplot2)
library(dplyr)
library(tidyr)
library(scales)
library(RColorBrewer)

1 Introduction

Two-way contingency tables are one of the main tools in categorical data analysis for exploring and testing the relationship between two categorical variables. This assignment carries out inference for two cases:

  1. Case 1 (2×2): the relationship between smoking habit and lung cancer.
  2. Case 2 (2×3): the relationship between gender and political party identification.

1.1 Measures of Association in Contingency Tables

In the analysis of a 2×2 contingency table, three main measures of association quantify the strength of the relationship between two categorical variables: the Risk Difference (RD), the Risk Ratio (RR), and the Odds Ratio (OR).

1.1.1 Risk Difference (RD)

Definition: The Risk Difference (RD) is an absolute measure of association: the difference in the probability of the event (the risk) between the exposed group and the unexposed group.

\[\text{RD} = p_1 - p_2\]

where \(p_1\) is the proportion of events in the exposed group and \(p_2\) is the proportion in the unexposed group.

Interpretation:

  • \(\text{RD} = 0\) : no difference in risk (null value).
  • \(\text{RD} > 0\) : the exposed group has a higher risk.
  • \(\text{RD} < 0\) : the exposed group has a lower risk (protective).

For example, \(\text{RD} = 0.25\) means the risk in the exposed group is 25 percentage points higher than in the unexposed group. RD is intuitive because it states the difference in probabilities directly, and it is useful for measuring absolute impact in public health policy.


1.1.2 Risk Ratio (RR)

Definition: The Risk Ratio (RR), or relative risk, is a relative measure of association that compares the risk in the two groups as a ratio.

\[\text{RR} = \frac{p_1}{p_2}\]

Interpretation:

  • \(\text{RR} = 1\) : the risk is the same in both groups (null value).
  • \(\text{RR} > 1\) : the exposed group has a higher relative risk.
  • \(\text{RR} < 1\) : the exposed group has a lower risk (protective factor).

For example, \(\text{RR} = 2\) means the exposed group has twice the risk of the unexposed group. RR is easy to interpret, but it cannot be estimated directly from a case-control study because the proportion of cases is fixed by the investigator.


1.1.3 Odds Ratio (OR)

Definition: The Odds Ratio (OR) is a measure of association that compares the odds of the event between two groups. The odds are the ratio of the probability that the event occurs to the probability that it does not: \(\text{odds} = p/(1-p)\).

\[\text{OR} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{ad}{bc}\]

where \(a, b, c, d\) are the cell frequencies of the 2×2 table (\(a, b\) in the exposed row, \(c, d\) in the unexposed row, with the event in the first column).

Interpretation:

  • \(\text{OR} = 1\) : the odds are the same in both groups (null value).
  • \(\text{OR} > 1\) : the exposed group has higher odds.
  • \(\text{OR} < 1\) : the exposed group has lower odds (protective).

For example, \(\text{OR} = 3\) means the odds of the event in the exposed group are three times the odds in the unexposed group. OR is the most flexible measure of association: it remains valid across study designs, including case-control. When the event is rare (the rare disease assumption, commonly taken as prevalence < 10%), OR is close to RR.
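
The rare-disease approximation can be illustrated with a minimal sketch. The risks below are hypothetical values chosen only for illustration (they do not come from the case data), and `or_vs_rr` is an illustrative helper name, not part of any package.

```r
# Hypothetical helper: RR and OR implied by two risks p1 (exposed) and p2 (unexposed).
or_vs_rr <- function(p1, p2) {
  rr <- p1 / p2
  or <- (p1 / (1 - p1)) / (p2 / (1 - p2))
  c(RR = rr, OR = or)
}

rare   <- or_vs_rr(p1 = 0.02, p2 = 0.01)  # rare outcome (hypothetical risks)
common <- or_vs_rr(p1 = 0.60, p2 = 0.30)  # common outcome (hypothetical risks)

print(round(rbind(rare, common), 3))
```

Both scenarios have RR = 2, but the OR stays near 2 only when the outcome is rare; with the common outcome it rises to 3.5.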


1.1.4 Brief Comparison of RD, RR, and OR

| Measure | Null value | Type | Main use |
|---|---|---|---|
| RD | 0 | Absolute | Cohort, cross-sectional; policy impact |
| RR | 1 | Relative | Cohort, cross-sectional |
| OR | 1 | Relative (odds) | All designs, including case-control |
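
As a compact illustration of the three definitions, the sketch below computes RD, RR, and OR from a single 2×2 matrix. The helper name `assoc_2x2` and the toy counts are assumptions made for this example; rows are taken to be exposed/unexposed and columns event/no event.

```r
# Hypothetical helper: rows = exposed / unexposed, columns = event / no event.
assoc_2x2 <- function(tab) {
  stopifnot(is.matrix(tab), all(dim(tab) == c(2, 2)))
  p1 <- tab[1, 1] / sum(tab[1, ])   # risk in the exposed row
  p2 <- tab[2, 1] / sum(tab[2, ])   # risk in the unexposed row
  c(RD = p1 - p2,
    RR = p1 / p2,
    OR = (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
}

# Toy counts (not from the case data): 30/100 events exposed, 10/100 unexposed.
toy <- matrix(c(30, 70,
                10, 90), nrow = 2, byrow = TRUE)
res <- assoc_2x2(toy)
print(round(res, 4))
```

With these toy counts the three measures give RD = 0.2, RR = 3, and OR ≈ 3.857, which shows how the same table yields different magnitudes depending on the scale.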

1.2 Testing Methods Used

The analysis covers four hypothesis tests: the two-proportion test, Pearson's chi-square test, the likelihood ratio test (\(G^2\)), and Fisher's exact test; plus chi-square partitioning for the 2×3 case.

Main references:

  • Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Wiley.
  • Fleiss, J. L., Levin, B., & Paik, M. C. (2003). Statistical Methods for Rates and Proportions (3rd ed.). Wiley.
  • Stokes, M. E., Davis, C. S., & Koch, G. G. (2012). Categorical Data Analysis Using SAS (3rd ed.). SAS Institute.
  • Zar, J. H. (2010). Biostatistical Analysis (5th ed.). Pearson Prentice Hall.

2 Case 1: 2×2 Contingency Table, Smoking and Lung Cancer

2.1 Constructing the Contingency Table

The data describe the relationship between smoking status (Smoker vs Non-Smoker) and lung cancer status (Cancer (+) vs Control (-)) in a case-control study.

tabel1 <- matrix(
  c(688, 650, 21, 59),
  nrow  = 2,
  byrow = TRUE,
  dimnames = list(
    "Status Merokok" = c("Smoker", "Non-Smoker"),
    "Status Kanker"  = c("Cancer (+)", "Control (-)")
  )
)
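tabel1_margin <- addmargins(tabel1)

# Cell counts and totals used by the calculations below. These definitions are
# inferred from the 2x2 cell notation (a, b, c, d) and the reported output.
# `c` shadows the name of base::c(), but calls such as c(1, 2) still work
# because R skips non-function objects when looking up a function.
a <- tabel1[1, 1]; b <- tabel1[1, 2]
c <- tabel1[2, 1]; d <- tabel1[2, 2]
n1  <- a + b          # row total: Smoker
n2  <- c + d          # row total: Non-Smoker
n   <- n1 + n2        # grand total
z95 <- qnorm(0.975)   # two-sided 95% critical value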

kable(tabel1_margin,
      caption = "Tabel 1. Tabel Kontingensi 2x2: Status Merokok dan Kanker Paru",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position = "center") |>
  row_spec(nrow(tabel1_margin), bold=TRUE, background="#dce8f5") |>
  column_spec(ncol(tabel1_margin), bold=TRUE, background="#dce8f5") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac")
Tabel 1. Tabel Kontingensi 2x2: Status Merokok dan Kanker Paru

| | Cancer (+) | Control (-) | Sum |
|---|---|---|---|
| Smoker | 688 | 650 | 1338 |
| Non-Smoker | 21 | 59 | 80 |
| Sum | 709 | 709 | 1418 |

Cell notation for the 2×2 table:

| | Cancer (+) | Control (−) | Total |
|---|---|---|---|
| Smoker | \(a = 688\) | \(b = 650\) | \(n_{1+} = 1338\) |
| Non-Smoker | \(c = 21\) | \(d = 59\) | \(n_{2+} = 80\) |
| Total | \(n_{+1} = 709\) | \(n_{+2} = 709\) | \(n = 1418\) |

2.2 Point Estimates of the Proportions

The estimated proportion of lung cancer cases in each group:

\[\hat{p}_1 = \frac{a}{n_{1+}} = \frac{688}{1338} = 0.5142\]

\[\hat{p}_2 = \frac{c}{n_{2+}} = \frac{21}{80} = 0.2625\]

p1_hat <- a / n1
p2_hat <- c / n2
cat("Proporsi Smoker     (p1_hat):", round(p1_hat, 4), "\n")
## Proporsi Smoker     (p1_hat): 0.5142
cat("Proporsi Non-Smoker (p2_hat):", round(p2_hat, 4), "\n")
## Proporsi Non-Smoker (p2_hat): 0.2625

Interpretation: The proportion of lung cancer cases is 0.5142 (51.42%) among smokers and 0.2625 (26.25%) among non-smokers. Descriptively, smokers show a higher proportion of lung cancer; because the design is case-control, these proportions describe the sample rather than population risks.


2.3 95% Confidence Intervals

2.3.1 Confidence Interval for the Proportion in Each Group

The Wilson score method is used; it is more accurate than the Wald interval, especially for proportions close to 0 or 1 (Agresti, 2013):

\[\text{CI}_{95\%}(p) = \frac{\hat{p} + \dfrac{z^2}{2n} \pm z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}}\]

For Smoker (\(\hat{p}_1 = 0.5142,\ n_1 = 1338,\ z_{0.025} = 1.96\)):

\[\text{Lower bound} = \frac{0.5142 + \frac{3.8416}{2\times1338} - 1.96\sqrt{\frac{0.5142\times0.4858}{1338} + \frac{3.8416}{4\times1338^2}}}{1 + \frac{3.8416}{1338}} = \frac{0.5156 - 1.96\times0.01368}{1.00287} = \frac{0.4888}{1.00287} \approx 0.4874\]

\[\text{Upper bound} = \frac{0.5156 + 1.96\times0.01368}{1.00287} = \frac{0.5424}{1.00287} \approx 0.5409\]

For Non-Smoker (\(\hat{p}_2 = 0.2625,\ n_2 = 80\)):

\[\text{CI}_{95\%}(p_2) \approx \left[\frac{0.2865 - 1.96\times0.05070}{1.04802},\ \frac{0.2865 + 1.96\times0.05070}{1.04802}\right] = [0.1786,\ 0.3682]\]

ci_wilson <- function(x, n_obs, conf = 0.95) {
  z  <- qnorm(1 - (1 - conf)/2)
  p  <- x / n_obs
  lo <- (p + z^2/(2*n_obs) - z*sqrt(p*(1-p)/n_obs + z^2/(4*n_obs^2))) / (1 + z^2/n_obs)
  hi <- (p + z^2/(2*n_obs) + z*sqrt(p*(1-p)/n_obs + z^2/(4*n_obs^2))) / (1 + z^2/n_obs)
  c(estimate = p, lower = lo, upper = hi)
}

ci_p1 <- ci_wilson(a, n1)
ci_p2 <- ci_wilson(c, n2)

ci_prop_df <- data.frame(
  Kelompok       = c("Smoker","Non-Smoker"),
  n              = c(n1, n2),
  Proporsi       = c(round(p1_hat,4), round(p2_hat,4)),
  "CI Lower 95%" = c(round(ci_p1["lower"],4), round(ci_p2["lower"],4)),
  "CI Upper 95%" = c(round(ci_p1["upper"],4), round(ci_p2["upper"],4)),
  check.names = FALSE
)

kable(ci_prop_df,
      caption = "Tabel 2. Estimasi Proporsi dan CI 95% (Wilson Score)",
      align = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position="center") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac")
Tabel 2. Estimasi Proporsi dan CI 95% (Wilson Score)

| Kelompok | n | Proporsi | CI Lower 95% | CI Upper 95% |
|---|---|---|---|---|
| Smoker | 1338 | 0.5142 | 0.4874 | 0.5409 |
| Non-Smoker | 80 | 0.2625 | 0.1786 | 0.3682 |
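
The Wilson interval can be cross-checked against base R. The sketch below assumes that the interval returned by `prop.test()` for a single proportion without continuity correction is the Wilson score interval; `wilson_manual` is an illustrative name, and the counts are those of the Smoker row.

```r
# Wilson score interval written out from the formula (illustrative helper).
wilson_manual <- function(x, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p <- x / n
  centre <- p + z^2 / (2 * n)
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  c(lower = (centre - half) / (1 + z^2 / n),
    upper = (centre + half) / (1 + z^2 / n))
}

# Smoker row of the case table: 688 cancer cases out of 1338 smokers.
ci_manual <- wilson_manual(688, 1338)
ci_base   <- prop.test(688, 1338, correct = FALSE)$conf.int

print(round(ci_manual, 4))
print(round(as.numeric(ci_base), 4))
```

If the two printed intervals agree, the hand-written function and the built-in test use the same score interval.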

2.3.2 Confidence Interval for the Risk Difference (RD)

Manual calculation:

\[\text{RD} = \hat{p}_1 - \hat{p}_2 = 0.5142 - 0.2625 = 0.2517\]

\[\text{SE}(\text{RD}) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{0.5142\times0.4858}{1338} + \frac{0.2625\times0.7375}{80}}\]

\[= \sqrt{\frac{0.24980}{1338} + \frac{0.19359}{80}} = \sqrt{0.000187 + 0.002420} = \sqrt{0.002607} = 0.05106\]

\[\text{CI}_{95\%}(\text{RD}) = 0.2517 \pm 1.96 \times 0.05106 = 0.2517 \pm 0.1001 = [0.1516,\ 0.3518]\]

RD    <- p1_hat - p2_hat
SE_RD <- sqrt(p1_hat*(1-p1_hat)/n1 + p2_hat*(1-p2_hat)/n2)
CI_RD <- c(RD - z95*SE_RD, RD + z95*SE_RD)

cat("RD          :", round(RD, 4), "\n")
## RD          : 0.2517
cat("SE(RD)      :", round(SE_RD, 4), "\n")
## SE(RD)      : 0.0511
cat("95% CI RD   : [", round(CI_RD[1],4), ";", round(CI_RD[2],4), "]\n")
## 95% CI RD   : [ 0.1516 ; 0.3518 ]

Interpretation: \(\text{RD} = 0.2517\), meaning the proportion with lung cancer is 25.17 percentage points higher among smokers than among non-smokers. The 95% CI [0.1516, 0.3518] excludes 0, indicating a statistically significant difference.


2.3.3 Confidence Interval for the Risk Ratio (RR)

Manual calculation:

\[\text{RR} = \frac{\hat{p}_1}{\hat{p}_2} = \frac{0.5142}{0.2625} = 1.9589\]

\[\text{SE}(\ln\text{RR}) = \sqrt{\frac{1-\hat{p}_1}{a} + \frac{1-\hat{p}_2}{c}} = \sqrt{\frac{0.4858}{688} + \frac{0.7375}{21}} = \sqrt{0.000706 + 0.035119} = \sqrt{0.035825} = 0.1893\]

\[\ln\text{RR} = \ln(1.9589) = 0.6724\]

\[\text{CI}_{95\%}(\ln\text{RR}) = 0.6724 \pm 1.96 \times 0.1893 = [0.3014,\ 1.0434]\]

\[\text{CI}_{95\%}(\text{RR}) = \left[e^{0.3014},\ e^{1.0434}\right] \approx [1.352,\ 2.839]\]

RR      <- p1_hat / p2_hat
SE_lnRR <- sqrt((1-p1_hat)/a + (1-p2_hat)/c)
CI_RR   <- exp(log(RR) + c(-1,1)*z95*SE_lnRR)

cat("RR          :", round(RR, 4), "\n")
## RR          : 1.9589
cat("ln(RR)      :", round(log(RR), 4), "\n")
## ln(RR)      : 0.6724
cat("SE(ln RR)   :", round(SE_lnRR, 4), "\n")
## SE(ln RR)   : 0.1893
cat("95% CI RR   : [", round(CI_RR[1],4), ";", round(CI_RR[2],4), "]\n")
## 95% CI RR   : [ 1.3517 ; 2.8387 ]

Interpretation: \(\text{RR} = 1.9589\), meaning the proportion with lung cancer among smokers is 1.96 times that among non-smokers. The 95% CI [1.3517, 2.8387] excludes 1.


2.3.4 Confidence Interval for the Odds Ratio (OR)

A case-control design does not allow valid estimation of RD and RR, because the proportion of cases is fixed by the investigator. OR is therefore the most appropriate measure of association for these data (Fleiss et al., 2003).

Manual calculation:

\[\text{OR} = \frac{ad}{bc} = \frac{688 \times 59}{650 \times 21} = \frac{40592}{13650} = 2.9738\]

\[\text{SE}(\ln\text{OR}) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} = \sqrt{\frac{1}{688} + \frac{1}{650} + \frac{1}{21} + \frac{1}{59}}\]

\[= \sqrt{0.001453 + 0.001538 + 0.047619 + 0.016949} = \sqrt{0.067559} = 0.2599\]

\[\ln\text{OR} = \ln(2.9738) = 1.0898\]

\[\text{CI}_{95\%}(\ln\text{OR}) = 1.0898 \pm 1.96 \times 0.2599 = [0.5804,\ 1.5992]\]

\[\text{CI}_{95\%}(\text{OR}) = \left[e^{0.5804},\ e^{1.5992}\right] \approx [1.787,\ 4.949]\]

OR      <- (a * d) / (b * c)
SE_lnOR <- sqrt(1/a + 1/b + 1/c + 1/d)
CI_OR   <- exp(log(OR) + c(-1,1)*z95*SE_lnOR)

cat("OR          :", round(OR, 4), "\n")
## OR          : 2.9738
cat("ln(OR)      :", round(log(OR), 4), "\n")
## ln(OR)      : 1.0898
cat("SE(ln OR)   :", round(SE_lnOR, 4), "\n")
## SE(ln OR)   : 0.2599
cat("95% CI OR   : [", round(CI_OR[1],4), ";", round(CI_OR[2],4), "]\n")
## 95% CI OR   : [ 1.7867 ; 4.9494 ]

Interpretation: \(\text{OR} = 2.9738\), meaning the odds of lung cancer among smokers are 2.97 times the odds among non-smokers. The 95% CI [1.7867, 4.9494] lies well above 1, indicating a strong and statistically significant association.
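
The same odds ratio and Wald interval can also be obtained from a logistic regression, which is one way to cross-check the manual calculation. This is a sketch that uses only base R; the data frame layout and the names `dat`, `fit`, `or_glm`, and `ci_glm` are illustrative choices, not part of the original analysis.

```r
# Case table as grouped binomial data: cases and controls per smoking group.
dat <- data.frame(
  smoker  = c(1, 0),
  cancer  = c(688, 21),
  control = c(650, 59)
)

# Saturated logistic model: the slope for `smoker` is the log odds ratio.
fit <- glm(cbind(cancer, control) ~ smoker, family = binomial, data = dat)

or_glm <- exp(coef(fit)[["smoker"]])
ci_glm <- exp(confint.default(fit)["smoker", ])  # Wald interval on the log scale

cat("OR (logistic regression):", round(or_glm, 4), "\n")
cat("95% Wald CI             :", round(ci_glm, 4), "\n")
```

`confint.default()` is used deliberately because it returns the Wald interval, which is the interval computed by hand above; a profile-likelihood interval would give slightly different limits.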


2.3.5 Summary of the Association Measures

asosiasi_df <- data.frame(
  "Ukuran Asosiasi" = c("Risk Difference (RD)","Risk Ratio (RR)","Odds Ratio (OR)"),
  "Estimasi"    = c(round(RD,4), round(RR,4), round(OR,4)),
  "CI Lower 95%" = c(round(CI_RD[1],4), round(CI_RR[1],4), round(CI_OR[1],4)),
  "CI Upper 95%" = c(round(CI_RD[2],4), round(CI_RR[2],4), round(CI_OR[2],4)),
  "Nilai Null"  = c("0","1","1"),
  "Kesimpulan"  = rep("Signifikan", 3),
  check.names = FALSE
)

kable(asosiasi_df,
      caption = "Tabel 3. Ringkasan Ukuran Asosiasi dan CI 95%",
      align = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = TRUE) |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  column_spec(6, bold=TRUE, color="#1a7a1a")
Tabel 3. Ringkasan Ukuran Asosiasi dan CI 95%

| Ukuran Asosiasi | Estimasi | CI Lower 95% | CI Upper 95% | Nilai Null | Kesimpulan |
|---|---|---|---|---|---|
| Risk Difference (RD) | 0.2517 | 0.1516 | 0.3518 | 0 | Signifikan |
| Risk Ratio (RR) | 1.9589 | 1.3517 | 2.8387 | 1 | Signifikan |
| Odds Ratio (OR) | 2.9738 | 1.7867 | 4.9494 | 1 | Signifikan |

2.4 Two-Proportion Test

Hypotheses:

\[H_0: p_1 = p_2 \quad \text{vs} \quad H_1: p_1 \neq p_2\]

Test statistic (pooled):

\[z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\!\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\]

Manual calculation:

Pooled proportion: \[\hat{p} = \frac{a + c}{n} = \frac{688 + 21}{1418} = \frac{709}{1418} = 0.5000\]

Standard error: \[\text{SE} = \sqrt{0.5000 \times 0.5000 \times \left(\frac{1}{1338}+\frac{1}{80}\right)} = \sqrt{0.2500 \times \left(0.0007474 + 0.0125000\right)} = \sqrt{0.2500 \times 0.0132474} = \sqrt{0.0033118} = 0.057549\]

Test statistic: \[z = \frac{0.51420 - 0.26250}{0.057549} = \frac{0.25170}{0.057549} = 4.3737\]

\(p\)-value (two-sided): \[p = 2 \times P(Z > 4.3737) = 2 \times (1 - \Phi(4.3737)) \approx 1.22 \times 10^{-5}\]

p_pool  <- (a + c) / n
SE_pool <- sqrt(p_pool*(1-p_pool)*(1/n1 + 1/n2))
z_stat  <- (p1_hat - p2_hat) / SE_pool
p_val_z <- 2 * pnorm(-abs(z_stat))

cat("p_pool  :", round(p_pool, 4), "\n")
## p_pool  : 0.5
cat("SE pool :", round(SE_pool, 4), "\n")
## SE pool : 0.0575
cat("z       :", round(z_stat, 4), "\n")
## z       : 4.3737
cat("p-value :", format(p_val_z, scientific=TRUE, digits=4), "\n\n")
## p-value : 1.222e-05
cat("--- Konfirmasi prop.test() ---\n")
## --- Konfirmasi prop.test() ---
print(prop.test(c(a, c), c(n1, n2), correct=FALSE))
## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  c(a, c) out of c(n1, n2)
## X-squared = 19.129, df = 1, p-value = 1.222e-05
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1516343 0.3517663
## sample estimates:
##    prop 1    prop 2 
## 0.5142003 0.2625000

Decision and interpretation: \(z = 4.3737\), \(p \approx 1.22 \times 10^{-5} \ll \alpha = 0.05\). \(H_0\) is rejected. The proportions differ significantly between smokers and non-smokers.


2.5 Chi-Square Test of Independence

Hypotheses:

\[H_0: \text{smoking status is independent of cancer status} \quad \text{vs} \quad H_1: \text{there is an association}\]

Degrees of freedom: \(df = (2-1)(2-1) = 1\)

Manual calculation of the expected frequencies:

\[E_{11} = \frac{n_{1+} \times n_{+1}}{n} = \frac{1338 \times 709}{1418} = \frac{948642}{1418} = 669\]

\[E_{12} = \frac{n_{1+} \times n_{+2}}{n} = \frac{1338 \times 709}{1418} = 669 \quad \text{(equal to } E_{11} \text{ because } n_{+1}=n_{+2}\text{)}\]

\[E_{21} = \frac{n_{2+} \times n_{+1}}{n} = \frac{80 \times 709}{1418} = \frac{56720}{1418} = 40\]

\[E_{22} = \frac{80 \times 709}{1418} = 40\]

Chi-square statistic:

\[\chi^2 = \frac{(688-669)^2}{669} + \frac{(650-669)^2}{669} + \frac{(21-40)^2}{40} + \frac{(59-40)^2}{40}\]

\[= \frac{(19)^2}{669} + \frac{(-19)^2}{669} + \frac{(-19)^2}{40} + \frac{(19)^2}{40}\]

\[= \frac{361}{669} + \frac{361}{669} + \frac{361}{40} + \frac{361}{40}\]

\[= 0.5396 + 0.5396 + 9.0250 + 9.0250 = 19.1292\]

E11 <- (n1*(a+c))/n; E12 <- (n1*(b+d))/n
E21 <- (n2*(a+c))/n; E22 <- (n2*(b+d))/n
E_mat <- matrix(c(E11,E12,E21,E22), 2, 2, byrow=TRUE, dimnames=dimnames(tabel1))

cat("Frekuensi Harapan:\n")
## Frekuensi Harapan:
print(round(E_mat, 4))
##               Status Kanker
## Status Merokok Cancer (+) Control (-)
##     Smoker            669         669
##     Non-Smoker         40          40
chi2_stat <- sum((tabel1 - E_mat)^2 / E_mat)
p_chi     <- pchisq(chi2_stat, df=1, lower.tail=FALSE)

cat("\nChi-square :", round(chi2_stat,4), "| df: 1 | p-value:", format(p_chi, scientific=TRUE), "\n\n")
## 
## Chi-square : 19.1292 | df: 1 | p-value: 1.221601e-05
cat("--- Konfirmasi chisq.test() ---\n")
## --- Konfirmasi chisq.test() ---
print(chisq.test(tabel1, correct=FALSE))
## 
##  Pearson's Chi-squared test
## 
## data:  tabel1
## X-squared = 19.129, df = 1, p-value = 1.222e-05

Decision and interpretation: \(\chi^2 = 19.1292\), \(df = 1\), \(p \approx 1.22 \times 10^{-5}\). \(H_0\) is rejected: there is a significant association between smoking and lung cancer. Note that \(z^2 = (4.3737)^2 \approx 19.129 = \chi^2\); for a 2×2 table the pooled two-proportion z test and Pearson's chi-square test are equivalent.
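
For a 2×2 table the Pearson statistic also has a well-known shortcut form, \(\chi^2 = n(ad-bc)^2 / (n_{1+}\,n_{2+}\,n_{+1}\,n_{+2})\), which avoids computing the expected frequencies. The sketch below checks it against `chisq.test()`; the names `n11` to `n22`, `chi2_short`, and `chi2_ref` are illustrative.

```r
# Cell counts of the case table (n11 = a, n12 = b, n21 = c, n22 = d).
n11 <- 688; n12 <- 650
n21 <- 21;  n22 <- 59
n_tot <- n11 + n12 + n21 + n22

# Shortcut formula: n (ad - bc)^2 divided by the product of the four margins.
chi2_short <- n_tot * (n11 * n22 - n12 * n21)^2 /
  ((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))

# Reference value from the general Pearson formula, without Yates correction.
tab      <- matrix(c(n11, n12, n21, n22), nrow = 2, byrow = TRUE)
chi2_ref <- unname(chisq.test(tab, correct = FALSE)$statistic)

cat("Shortcut  :", round(chi2_short, 4), "\n")
cat("chisq.test:", round(chi2_ref, 4), "\n")
```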


2.6 Likelihood Ratio Test (\(G^2\))

Test statistic:

\[G^2 = 2\sum_{i,j} O_{ij} \ln\!\left(\frac{O_{ij}}{E_{ij}}\right)\]

Manual calculation:

\[G^2 = 2\!\left[688\ln\!\left(\frac{688}{669}\right) + 650\ln\!\left(\frac{650}{669}\right) + 21\ln\!\left(\frac{21}{40}\right) + 59\ln\!\left(\frac{59}{40}\right)\right]\]

\[= 2\!\left[688\ln(1.02840) + 650\ln(0.97160) + 21\ln(0.52500) + 59\ln(1.47500)\right]\]

\[= 2\!\left[688(0.028005) + 650(-0.028812) + 21(-0.644357) + 59(0.388658)\right]\]

\[= 2\!\left[19.267 + (-18.728) + (-13.531) + 22.931\right] = 2 \times 9.939 = 19.878\]

G2_stat <- 2 * sum(tabel1 * log(tabel1 / E_mat))
p_G2    <- pchisq(G2_stat, df=1, lower.tail=FALSE)

cat("G2 statistik:", round(G2_stat,4), "| df: 1 | p-value:", format(p_G2, scientific=TRUE), "\n\n")
## G2 statistik: 19.878 | df: 1 | p-value: 8.25441e-06
cat("--- Konfirmasi GTest() ---\n")
## --- Konfirmasi GTest() ---
print(GTest(tabel1))
## 
##  Log likelihood ratio (G-test) test of independence without correction
## 
## data:  tabel1
## G = 19.878, X-squared df = 1, p-value = 8.254e-06

Decision and interpretation: \(G^2 = 19.878\), \(p \approx 8.25 \times 10^{-6}\). \(H_0\) is rejected. \(G^2\) differs slightly from Pearson's \(\chi^2\) because it is based on the log-likelihood ratio rather than on squared deviations; under \(H_0\) the two statistics are asymptotically equivalent, and here both detect the association.


2.7 Fisher's Exact Test

This test computes exact probabilities from the hypergeometric distribution without relying on an asymptotic approximation, which makes it especially useful when \(n\) is small or some \(E_{ij} < 5\).

Hypergeometric formula:

\[P(X = a \mid n_{1+}, n_{2+}, n_{+1}) = \frac{\dbinom{n_{1+}}{a}\dbinom{n_{2+}}{n_{+1}-a}}{\dbinom{n}{n_{+1}}} = \frac{\dbinom{1338}{688}\dbinom{80}{21}}{\dbinom{1418}{709}}\]

The \(p\)-value is the sum of the probabilities of all tables that are as extreme as, or more extreme than, the observed table.
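
The definition above can be sketched directly with `dhyper()`. The example assumes the usual two-sided convention in which "as extreme or more extreme" means every table whose probability does not exceed that of the observed table; the variable names and the small tolerance for floating-point ties are illustrative choices.

```r
# Fixed margins of the case table: 1338 smokers, 80 non-smokers, 709 cancer cases.
n_smoker <- 1338; n_nonsmoker <- 80; n_cancer <- 709
a_obs <- 688   # observed number of smokers among the cancer cases

# All values of the (1,1) cell that are possible with these margins.
support <- max(0, n_cancer - n_nonsmoker):min(n_cancer, n_smoker)
probs   <- dhyper(support, m = n_smoker, n = n_nonsmoker, k = n_cancer)
p_obs   <- dhyper(a_obs,   m = n_smoker, n = n_nonsmoker, k = n_cancer)

# Two-sided p-value: sum over tables no more probable than the observed one.
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

cat("Two-sided exact p-value:", format(p_two_sided, scientific = TRUE), "\n")
```

Enumerating the support makes it explicit that the margins are treated as fixed, which is the conditioning that makes the test exact.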

ft <- fisher.test(tabel1, alternative="two.sided")
cat("OR (MLE)    :", round(ft$estimate, 4), "\n")
## OR (MLE)    : 2.9716
cat("95% CI OR   : [", round(ft$conf.int[1],4), ";", round(ft$conf.int[2],4), "]\n")
## 95% CI OR   : [ 1.7556 ; 5.2107 ]
cat("p-value     :", format(ft$p.value, scientific=TRUE), "\n\n")
## p-value     : 1.476303e-05
print(ft)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  tabel1
## p-value = 1.476e-05
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.755611 5.210711
## sample estimates:
## odds ratio 
##   2.971634

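Decision and interpretation: \(p \approx 1.48 \times 10^{-5}\). \(H_0\) is rejected. The odds ratio reported by fisher.test (2.9716) is a conditional maximum likelihood estimate, so it differs slightly from the sample odds ratio \(ad/bc = 2.9738\); the two values are closely consistent.


2.8 Comparison of the Four Tests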

comp_df <- data.frame(
  "Metode Uji"    = c("Uji Dua Proporsi (Z)","Chi-Square Pearson",
                      "Likelihood Ratio (G2)","Fisher Exact Test"),
  "H0"            = rep("Tidak ada asosiasi", 4),
  "Statistik Uji" = c(paste0("Z = ",  round(z_stat,4)),
                      paste0("chi2 = ",round(chi2_stat,4)),
                      paste0("G2 = ", round(G2_stat,4)),
                      "Hipergeometrik (exact)"),
  "df"            = c("1","1","1","—"),
  "p-value"       = c(format(p_val_z,    scientific=TRUE, digits=3),
                      format(p_chi,      scientific=TRUE, digits=3),
                      format(p_G2,       scientific=TRUE, digits=3),
                      format(ft$p.value, scientific=TRUE, digits=3)),
  "Keputusan"     = rep("Tolak H0", 4),
  "Catatan"       = c("Z^2 = chi^2 (ekuivalen)",
                      "Asimptotik; syarat E>=5",
                      "Asimptotik; berbasis log-likelihood",
                      "Exact; valid untuk n kecil"),
  check.names = FALSE
)

kable(comp_df,
      caption = "Tabel 4. Perbandingan Hasil Keempat Metode Pengujian",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = TRUE) |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  column_spec(6, bold=TRUE, color="#1a7a1a")
Tabel 4. Perbandingan Hasil Keempat Metode Pengujian

| Metode Uji | H0 | Statistik Uji | df | p-value | Keputusan | Catatan |
|---|---|---|---|---|---|---|
| Uji Dua Proporsi (Z) | Tidak ada asosiasi | Z = 4.3737 | 1 | 1.22e-05 | Tolak H0 | Z^2 = chi^2 (ekuivalen) |
| Chi-Square Pearson | Tidak ada asosiasi | chi2 = 19.1292 | 1 | 1.22e-05 | Tolak H0 | Asimptotik; syarat E>=5 |
| Likelihood Ratio (G2) | Tidak ada asosiasi | G2 = 19.878 | 1 | 8.25e-06 | Tolak H0 | Asimptotik; berbasis log-likelihood |
| Fisher Exact Test | Tidak ada asosiasi | Hipergeometrik (exact) | — | 1.48e-05 | Tolak H0 | Exact; valid untuk n kecil |

Discussion:

  • The two-proportion (Z) test and Pearson's chi-square test are mathematically equivalent for a 2×2 table: \(Z^2 = \chi^2 = 19.1292\).
  • The likelihood ratio statistic \(G^2\) gives a slightly different but consistent value; \(G^2\) and \(\chi^2\) are asymptotically equivalent under \(H_0\) and agree closely in large samples.
  • Fisher's exact test gives an exact \(p\)-value with no asymptotic assumption.
  • All four methods consistently reject \(H_0\) with very small \(p\)-values.

2.9 Visualization for Case 1

mosaic(tabel1,
       shade    = TRUE,
       legend   = TRUE,
       main     = "Mosaic Plot: Status Merokok vs Kanker Paru",
       labeling = labeling_border(rot_labels=c(0,0,0,0)),
       gp       = shading_hcl)
Figure 1. Mosaic plot: smoking status vs lung cancer

prop_df <- data.frame(
  Kelompok = c("Smoker","Non-Smoker"),
  Proporsi = c(p1_hat, p2_hat),
  Lower    = c(ci_p1["lower"], ci_p2["lower"]),
  Upper    = c(ci_p1["upper"], ci_p2["upper"])
)

ggplot(prop_df, aes(x=Kelompok, y=Proporsi, fill=Kelompok)) +
  geom_col(width=0.5, alpha=0.9, color="white") +
  geom_errorbar(aes(ymin=Lower, ymax=Upper),
                width=0.12, linewidth=1.1, color="#222222") +
  geom_text(aes(label=paste0(round(Proporsi*100,2),"%")),
            vjust=-2.0, fontface="bold", size=5) +
  scale_fill_manual(values=c("Non-Smoker"="#4393c3", "Smoker"="#d6604d")) +
  scale_y_continuous(labels=percent_format(), limits=c(0,0.72)) +
  labs(title    = "Proporsi Kejadian Kanker Paru per Kelompok",
       subtitle = "Error bar = CI 95% (Wilson Score)",
       x=NULL, y="Proporsi", fill=NULL) +
  theme_minimal(base_size=13) +
  theme(legend.position     = "none",
        plot.title           = element_text(face="bold", size=14),
        plot.subtitle        = element_text(color="grey40"),
        panel.grid.major.x   = element_blank())
Figure 2. Proportion of lung cancer per group with 95% CI


2.10 Conclusions for Case 1

  1. The proportion with lung cancer among smokers (51.42%) is much higher than among non-smokers (26.25%).
  2. All three association measures indicate a clear association: RD = 0.2517, RR = 1.9589, OR = 2.9738, each with a 95% CI that excludes its null value. Because the design is case-control, OR is the measure to interpret.
  3. All four tests consistently reject \(H_0\) (very small \(p\)-values).
  4. Substantive conclusion: there is very strong statistical evidence that smoking is associated with lung cancer. The odds of lung cancer among smokers are 2.97 times the odds among non-smokers.

3 Case 2: 2×3 Contingency Table, Gender and Political Party Identification

3.1 Constructing the Contingency Table

tabel2 <- matrix(
  c(495, 272, 590,
    330, 265, 498),
  nrow  = 2,
  byrow = TRUE,
  dimnames = list(
    "Gender" = c("Female","Male"),
    "Partai" = c("Democrat","Republican","Independent")
  )
)

tabel2_margin <- addmargins(tabel2)

kable(tabel2_margin,
      caption = "Tabel 5. Tabel Kontingensi 2x3: Gender dan Identifikasi Partai Politik",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position="center") |>
  row_spec(nrow(tabel2_margin), bold=TRUE, background="#dce8f5") |>
  column_spec(ncol(tabel2_margin), bold=TRUE, background="#dce8f5") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac")
Tabel 5. Tabel Kontingensi 2x3: Gender dan Identifikasi Partai Politik

| | Democrat | Republican | Independent | Sum |
|---|---|---|---|---|
| Female | 495 | 272 | 590 | 1357 |
| Male | 330 | 265 | 498 | 1093 |
| Sum | 825 | 537 | 1088 | 2450 |

3.2 Expected Frequencies

Under \(H_0\) of independence: \(E_{ij} = \dfrac{n_{i+} \cdot n_{+j}}{n}\)

Manual calculation (\(n = 2450\), \(n_{F+} = 1357\), \(n_{M+} = 1093\)):

\[E_{F,\text{Dem}} = \frac{1357 \times 825}{2450} = \frac{1119525}{2450} = 456.949 \qquad E_{F,\text{Rep}} = \frac{1357 \times 537}{2450} = \frac{728709}{2450} = 297.432\]

\[E_{F,\text{Ind}} = \frac{1357 \times 1088}{2450} = \frac{1476416}{2450} = 602.619 \qquad E_{M,\text{Dem}} = \frac{1093 \times 825}{2450} = \frac{901725}{2450} = 368.051\]

\[E_{M,\text{Rep}} = \frac{1093 \times 537}{2450} = \frac{586941}{2450} = 239.568 \qquad E_{M,\text{Ind}} = \frac{1093 \times 1088}{2450} = \frac{1189184}{2450} = 485.381\]

n_row2 <- rowSums(tabel2)
n_col2 <- colSums(tabel2)
n2_tot <- sum(tabel2)
E2     <- outer(n_row2, n_col2) / n2_tot

E2_margin <- addmargins(E2)
rownames(E2_margin)[nrow(E2_margin)] <- "Total"
colnames(E2_margin)[ncol(E2_margin)] <- "Total"

kable(round(E2_margin, 3),
      caption = "Tabel 6. Frekuensi Harapan (E_ij) di Bawah H0 Independensi",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position="center") |>
  row_spec(nrow(E2_margin), bold=TRUE, background="#dce8f5") |>
  column_spec(ncol(E2_margin), bold=TRUE, background="#dce8f5") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac")
Tabel 6. Frekuensi Harapan (E_ij) di Bawah H0 Independensi

| | Democrat | Republican | Independent | Total |
|---|---|---|---|---|
| Female | 456.949 | 297.432 | 602.619 | 1357 |
| Male | 368.051 | 239.568 | 485.381 | 1093 |
| Total | 825.000 | 537.000 | 1088.000 | 2450 |

cat("Minimum E_ij:", round(min(E2),3),
    "->", ifelse(min(E2)>=5,"Syarat E>=5 terpenuhi ✓","TIDAK terpenuhi"), "\n")
## Minimum E_ij: 239.568 -> Syarat E>=5 terpenuhi ✓

3.3 Chi-Square Test of Independence (Overall)

Hypotheses:

\[H_0: \text{gender and party are independent} \quad \text{vs} \quad H_1: \text{there is an association}\]

Degrees of freedom: \(df = (2-1)(3-1) = 2\)

Manual calculation:

\[\chi^2 = \frac{(495-456.95)^2}{456.95} + \frac{(272-297.43)^2}{297.43} + \frac{(590-602.62)^2}{602.62} + \frac{(330-368.05)^2}{368.05} + \frac{(265-239.57)^2}{239.57} + \frac{(498-485.38)^2}{485.38}\]

\[= \frac{(38.05)^2}{456.95} + \frac{(-25.43)^2}{297.43} + \frac{(-12.62)^2}{602.62} + \frac{(-38.05)^2}{368.05} + \frac{(25.43)^2}{239.57} + \frac{(12.62)^2}{485.38}\]

\[= \frac{1447.80}{456.95} + \frac{646.68}{297.43} + \frac{159.26}{602.62} + \frac{1447.80}{368.05} + \frac{646.68}{239.57} + \frac{159.26}{485.38}\]

\[= 3.168 + 2.174 + 0.264 + 3.934 + 2.699 + 0.328 = 12.567 \approx 12.57\]

cs2 <- chisq.test(tabel2, correct=FALSE)

chi2_manual <- sum((tabel2 - E2)^2 / E2)
cat("Chi-square manual      :", round(chi2_manual,4), "\n")
## Chi-square manual      : 12.5693
cat("Chi-square (chisq.test):", round(cs2$statistic,4), "\n")
## Chi-square (chisq.test): 12.5693
cat("df                     :", cs2$parameter, "\n")
## df                     : 2
cat("p-value                :", format(cs2$p.value, scientific=TRUE), "\n\n")
## p-value                : 1.86475e-03
cramer_V <- sqrt(cs2$statistic / (n2_tot * (min(nrow(tabel2),ncol(tabel2))-1)))
cat("Cramer's V             :", round(cramer_V,4), "\n\n")
## Cramer's V             : 0.0716
print(cs2)
## 
##  Pearson's Chi-squared test
## 
## data:  tabel2
## X-squared = 12.569, df = 2, p-value = 0.001865

Decision and interpretation: \(\chi^2 = 12.5693\), \(df = 2\), \(p = 0.001865\). \(H_0\) is rejected: there is a significant association between gender and party identification. Cramér's \(V = 0.0716\) indicates that the association, although statistically significant, is weak.
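
As a hand check of this \(p\)-value: the chi-square distribution with 2 degrees of freedom has a closed-form upper tail (it is an exponential distribution with mean 2), so no table lookup is needed. A worked sketch using the statistic reported above:

\[P(\chi^2_2 > x) = e^{-x/2} \quad\Rightarrow\quad p = e^{-12.5693/2} = e^{-6.28465} \approx 0.001865\]

This shortcut applies only when \(df = 2\); for other degrees of freedom the tail probability has to be evaluated numerically.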


3.4 Pearson and Standardized Residuals

Pearson residual: \[r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}\]

Standardized (adjusted) residual: \[r_{ij}^{\text{std}} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}(1-p_{i+})(1-p_{+j})}}\]

Manual calculation for the Female-Democrat cell:

\[r_{F,D} = \frac{495 - 456.95}{\sqrt{456.95}} = \frac{38.05}{21.38} = 1.780\]

Marginal proportions: \(p_{F+} = 1357/2450 = 0.5539\); \(p_{+D} = 825/2450 = 0.3367\)

\[r_{F,D}^{\text{std}} = \frac{38.05}{\sqrt{456.95 \times (1-0.5539) \times (1-0.3367)}} = \frac{38.05}{\sqrt{456.95 \times 0.4461 \times 0.6633}} = \frac{38.05}{\sqrt{135.21}} = \frac{38.05}{11.63} = 3.272\]

Because \(|r_{F,D}^{\text{std}}| = 3.272 > 2\), this cell deviates significantly from independence and is a major contributor to the chi-square statistic.

pearson_res <- cs2$residuals
std_res     <- cs2$stdres

res_df <- data.frame(
  Sel = c("Female-Democrat","Female-Republican","Female-Independent",
          "Male-Democrat","Male-Republican","Male-Independent"),
  O   = as.vector(t(tabel2)),
  E   = round(as.vector(t(E2)),3),
  "Pearson Residual"      = round(as.vector(t(pearson_res)),4),
  "Standardized Residual" = round(as.vector(t(std_res)),4),
  "Signifikan (|r|>2)"   = ifelse(abs(as.vector(t(std_res)))>2,"Ya","Tidak"),
  check.names = FALSE
)

kable(res_df,
      caption = "Tabel 7. Residual Pearson dan Standardized Residual",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position="center") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  row_spec(which(abs(as.vector(t(std_res)))>2), bold=TRUE, background="#fff3b0")
Tabel 7. Residual Pearson dan Standardized Residual

| Sel | O | E | Pearson Residual | Standardized Residual | Signifikan (\|r\|>2) |
|---|---|---|---|---|---|
| Female-Democrat | 495 | 456.949 | 1.7801 | 3.2724 | Ya |
| Female-Republican | 272 | 297.432 | -1.4747 | -2.4986 | Ya |
| Female-Independent | 590 | 602.619 | -0.5140 | -1.0322 | Tidak |
| Male-Democrat | 330 | 368.051 | -1.9834 | -3.2724 | Ya |
| Male-Republican | 265 | 239.568 | 1.6431 | 2.4986 | Ya |
| Male-Independent | 498 | 485.381 | 0.5728 | 1.0322 | Tidak |

res_long        <- as.data.frame(as.table(std_res))
colnames(res_long) <- c("Gender","Partai","Residual")
res_long$label  <- round(res_long$Residual, 3)
res_long$text_col <- ifelse(abs(res_long$Residual) > 1.5, "white", "#333333")

ggplot(res_long, aes(x=Partai, y=Gender, fill=Residual)) +
  geom_tile(color="white", linewidth=1.2) +
  geom_text(aes(label=label, color=text_col), size=6, fontface="bold") +
  scale_color_identity() +
  scale_fill_gradient2(low="#b2182b", mid="#f7f7f7", high="#2166ac",
                       midpoint=0, name="Std.\nResidual", limits=c(-4,4)) +
  labs(title    = "Heatmap Standardized Residuals",
       subtitle = "Biru = lebih tinggi dari harapan | Merah = lebih rendah dari harapan",
       x="Partai Politik", y="Gender") +
  theme_minimal(base_size=13) +
  theme(plot.title    = element_text(face="bold", size=14),
        plot.subtitle = element_text(color="grey40"),
        axis.text     = element_text(size=12, face="bold"),
        panel.grid    = element_blank())
Figure 3. Heatmap of standardized residuals: Gender vs Party

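Interpretation of the residuals:

  • Female-Democrat (\(r^{\text{std}} = 3.272\), \(|r|>2\), significant): more women identify as Democrat than expected under independence.
  • Male-Democrat (\(r^{\text{std}} = -3.272\), \(|r|>2\), significant): fewer men identify as Democrat than expected.
  • Female-Republican (\(r^{\text{std}} = -2.499\)) and Male-Republican (\(r^{\text{std}} = 2.499\)) also exceed the \(|r|>2\) threshold: women are under-represented and men over-represented among Republicans.
  • The two Independent cells (\(r^{\text{std}} = \mp 1.032\)) have \(|r^{\text{std}}| < 2\) and are not individually significant.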


3.5 Chi-Square Partitioning

Chi-square partitioning splits the overall test (\(df=2\)) into two orthogonal component tests (\(df=1\) each):

\[\chi^2_{\text{total}}(df=2) \approx \chi^2_{\text{Dem vs Rep}}(df=1) + \chi^2_{\text{(Dem+Rep) vs Ind}}(df=1)\]

3.5.1 Partition 1: Democrat vs Republican

The sub-table keeps only the Democrat and Republican columns (\(n = 1362\)):

Gender Democrat Republican Total
Female 495 272 767
Male 330 265 595
Total 825 537 1362

Expected frequencies:

\[E_{F,D}^{(1)} = \frac{767 \times 825}{1362} = \frac{632{,}775}{1362} = 464.59 \qquad E_{F,R}^{(1)} = \frac{767 \times 537}{1362} = 302.41\]

\[E_{M,D}^{(1)} = \frac{595 \times 825}{1362} = 360.41 \qquad E_{M,R}^{(1)} = \frac{595 \times 537}{1362} = 234.59\]

Chi-square statistic:

\[\chi^2_1 = \frac{(495-464.59)^2}{464.59} + \frac{(272-302.41)^2}{302.41} + \frac{(330-360.41)^2}{360.41} + \frac{(265-234.59)^2}{234.59}\]

\[= \frac{(30.41)^2}{464.59} + \frac{(-30.41)^2}{302.41} + \frac{(-30.41)^2}{360.41} + \frac{(30.41)^2}{234.59}\]

\[= \frac{924.77}{464.59} + \frac{924.77}{302.41} + \frac{924.77}{360.41} + \frac{924.77}{234.59} = 1.990 + 3.058 + 2.566 + 3.942 = 11.556\]

The small gap from the value computed in R (11.5545) comes from rounding the expected frequencies to two decimals.

sub1    <- tabel2[, c("Democrat","Republican")]
cs_sub1 <- chisq.test(sub1, correct=FALSE)
cat("Sub-tabel Partisi 1:\n"); print(addmargins(sub1))
## Sub-tabel Partisi 1:
##         Partai
## Gender   Democrat Republican  Sum
##   Female      495        272  767
##   Male        330        265  595
##   Sum         825        537 1362
cat("\nChi-square:", round(cs_sub1$statistic,4),
    "| df:", cs_sub1$parameter,
    "| p-value:", format(cs_sub1$p.value, scientific=TRUE), "\n")
## 
## Chi-square: 11.5545 | df: 1 | p-value: 6.758479e-04

3.5.2 Partition 2: (Democrat + Republican) vs Independent

Gender Dem+Rep Independent Total
Female 767 590 1357
Male 595 498 1093
Total 1362 1088 2450

Expected frequencies:

\[E_{F,DR}^{(2)} = \frac{1357 \times 1362}{2450} = \frac{1{,}848{,}234}{2450} = 754.381 \qquad E_{F,I}^{(2)} = \frac{1357 \times 1088}{2450} = 602.619\]

\[E_{M,DR}^{(2)} = \frac{1093 \times 1362}{2450} = 607.619 \qquad E_{M,I}^{(2)} = \frac{1093 \times 1088}{2450} = 485.381\]

Chi-square statistic:

\[\chi^2_2 = \frac{(767-754.38)^2}{754.38} + \frac{(590-602.62)^2}{602.62} + \frac{(595-607.62)^2}{607.62} + \frac{(498-485.38)^2}{485.38}\]

\[= \frac{(12.62)^2}{754.38} + \frac{(-12.62)^2}{602.62} + \frac{(-12.62)^2}{607.62} + \frac{(12.62)^2}{485.38}\]

\[= 0.211 + 0.264 + 0.262 + 0.328 = 1.065\]

tabel2_p2 <- cbind(
  "Dem+Rep"     = tabel2[,"Democrat"] + tabel2[,"Republican"],
  "Independent" = tabel2[,"Independent"]
)
cs_sub2 <- chisq.test(tabel2_p2, correct=FALSE)
cat("Sub-tabel Partisi 2:\n"); print(addmargins(tabel2_p2))
## Sub-tabel Partisi 2:
##        Dem+Rep Independent  Sum
## Female     767         590 1357
## Male       595         498 1093
## Sum       1362        1088 2450
cat("\nChi-square:", round(cs_sub2$statistic,4),
    "| df:", cs_sub2$parameter,
    "| p-value:", format(cs_sub2$p.value, scientific=TRUE), "\n")
## 
## Chi-square: 1.0654 | df: 1 | p-value: 3.01979e-01

3.6 Comparing the Partitions with the Overall Chi-Square

chi_sum <- cs_sub1$statistic + cs_sub2$statistic

partisi_df <- data.frame(
  "Uji" = c("Chi-Square Keseluruhan (2x3)",
            "Partisi 1: Dem vs Rep (df=1)",
            "Partisi 2: (Dem+Rep) vs Ind (df=1)",
            "Jumlah Partisi (df=2)"),
  "Chi-Square" = c(round(cs2$statistic,4),
                   round(cs_sub1$statistic,4),
                   round(cs_sub2$statistic,4),
                   round(chi_sum,4)),
  "df"         = c(2,1,1,2),
  "p-value"    = c(format(cs2$p.value,  scientific=TRUE, digits=3),
                   format(cs_sub1$p.value, scientific=TRUE, digits=3),
                   format(cs_sub2$p.value, scientific=TRUE, digits=3),
                   format(pchisq(chi_sum,2,lower.tail=FALSE), scientific=TRUE, digits=3)),
  # Partisi 2 has p = 0.302 > 0.05, so H0 is not rejected for that row
  "Keputusan"  = c("Tolak H0","Tolak H0","Gagal Tolak H0","-"),
  check.names = FALSE
)

kable(partisi_df,
      caption = "Tabel 8. Perbandingan Chi-Square Keseluruhan vs Partisi",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position="center") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  row_spec(4, bold=TRUE, background="#dce8f5")
Tabel 8. Perbandingan Chi-Square Keseluruhan vs Partisi
Uji Chi-Square df p-value Keputusan
Chi-Square Keseluruhan (2x3) 12.5693 2 1.86e-03 Tolak H0
Partisi 1: Dem vs Rep (df=1) 11.5545 1 6.76e-04 Tolak H0
Partisi 2: (Dem+Rep) vs Ind (df=1) 1.0654 1 3.02e-01 Gagal Tolak H0
Jumlah Partisi (df=2) 12.6200 2 1.82e-03 -
cat("Aditivitas:", round(cs_sub1$statistic,4), "+", round(cs_sub2$statistic,4),
    "=", round(chi_sum,4), "~=", round(cs2$statistic,4), "\n")
## Aditivitas: 11.5545 + 1.0654 = 12.62 ~= 12.5693

Discussion:

  • Partition 1 (Dem vs Rep): \(\chi^2 = 11.5545\), \(p = 6.76 \times 10^{-4}\), highly significant; the gender difference is clearest in the choice between Democrat and Republican.
  • Partition 2 ((Dem+Rep) vs Ind): \(\chi^2 = 1.0654\), \(p = 0.302\), not significant; gender does not meaningfully separate identifiers with the two major parties from Independents.
  • The sum of the partition statistics (\(12.62\)) is close to the overall \(\chi^2\) (\(12.5693\)). For the Pearson statistic this additivity is only approximate; it holds exactly for the likelihood-ratio statistic \(G^2\).
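The distinction between approximate and exact additivity can be illustrated with a minimal sketch. The helper `g2()` and the object names below are assumptions introduced for this example only; the helper assumes every observed count is positive so that the logarithm is defined.

```r
# Illustrative copy of the Gender x Party table (counts taken from this section)
tab <- matrix(c(495, 272, 590,
                330, 265, 498),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Partai = c("Democrat", "Republican", "Independent")))

# Hypothetical helper: likelihood-ratio statistic G^2 = 2 * sum(O * log(O / E))
g2 <- function(m) {
  e <- outer(rowSums(m), colSums(m)) / sum(m)
  2 * sum(m * log(m / e))
}

# Same two sub-tables as in the partition above
part1 <- tab[, c("Democrat", "Republican")]
part2 <- cbind("Dem+Rep"     = rowSums(part1),
               "Independent" = tab[, "Independent"])

g2_total <- g2(tab)
g2_parts <- c(part1 = g2(part1), part2 = g2(part2))

# G^2 components add up to the overall G^2 (up to floating-point error)
c(total = g2_total, sum_of_parts = sum(g2_parts))
isTRUE(all.equal(g2_total, sum(g2_parts)))
```

Under these assumptions the two \(G^2\) components reproduce the overall \(G^2\), whereas the Pearson components differ from the overall Pearson statistic in the second decimal place.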


3.7 Categories Contributing Most to the Association

kontrib        <- (tabel2 - E2)^2 / E2
persen_kontrib <- kontrib / sum(kontrib) * 100

kont_df <- as.data.frame(as.table(round(persen_kontrib,2)))
colnames(kont_df) <- c("Gender","Partai","Kontribusi (%)")
kont_df <- kont_df[order(-kont_df[,"Kontribusi (%)"]),]
kont_df[,"Rank"] <- 1:nrow(kont_df)

kable(kont_df,
      caption = "Tabel 9. Kontribusi Setiap Sel terhadap Chi-Square Total (%)",
      align="c", row.names=FALSE) |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width=FALSE, position="center") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  row_spec(1:2, bold=TRUE, background="#fff3b0")
Tabel 9. Kontribusi Setiap Sel terhadap Chi-Square Total (%)
Gender Partai Kontribusi (%) Rank
Male Democrat 31.30 1
Female Democrat 25.21 2
Male Republican 21.48 3
Female Republican 17.30 4
Male Independent 2.61 5
Female Independent 2.10 6
kont_all        <- as.data.frame(as.table(round(persen_kontrib,2)))
colnames(kont_all) <- c("Gender","Partai","Kontribusi")
kont_all$Sel    <- paste(kont_all$Gender, kont_all$Partai, sep="\n")
kont_all$Partai <- factor(kont_all$Partai,
                           levels=c("Democrat","Republican","Independent"))

ggplot(kont_all, aes(x=reorder(Sel,-Kontribusi), y=Kontribusi, fill=Partai)) +
  geom_col(alpha=0.9, color="white", linewidth=0.5) +
  geom_text(aes(label=paste0(round(Kontribusi,1),"%")),
            vjust=-0.4, size=4.2, fontface="bold") +
  scale_fill_manual(values=c("Democrat"    = "#2166ac",
                             "Republican"  = "#d6604d",
                             "Independent" = "#1a9850")) +
  scale_y_continuous(limits=c(0,35)) +
  labs(title="Kontribusi Setiap Sel terhadap Chi-Square Total",
       x="Sel (Gender x Partai)", y="Kontribusi (%)", fill="Partai") +
  theme_minimal(base_size=13) +
  theme(plot.title          = element_text(face="bold", size=14),
        panel.grid.major.x  = element_blank())
Figure 4. Contribution of each cell to the total chi-square
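The cell-level percentages can also be aggregated by party category, which makes it easier to see which column drives the association. The sketch below is self-contained; `tab`, `cell_contrib`, and `col_share` are illustrative names, not objects defined elsewhere in this report.

```r
# Illustrative copy of the Gender x Party table (counts taken from this section)
tab <- matrix(c(495, 272, 590,
                330, 265, 498),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Partai = c("Democrat", "Republican", "Independent")))

expected     <- outer(rowSums(tab), colSums(tab)) / sum(tab)
cell_contrib <- (tab - expected)^2 / expected   # per-cell chi-square components

# Share of the total chi-square attributable to each party column, in percent
col_share <- colSums(cell_contrib) / sum(cell_contrib) * 100
round(col_share, 1)

# Party category with the largest share
names(which.max(col_share))
```

Summing the two Democrat cells from Tabel 9 (31.30% and 25.21%) gives the same column share that this sketch computes for Democrat.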


3.8 Additional Visualizations for Case 2

mosaic(tabel2,
       shade    = TRUE,
       legend   = TRUE,
       main     = "Mosaic Plot: Gender x Identifikasi Partai Politik",
       labeling = labeling_border(rot_labels=c(0,0,0,0)),
       gp       = shading_hcl)
Figure 5. Mosaic plot: Gender vs Political Party

prop2         <- as.data.frame(prop.table(tabel2, margin=1))
colnames(prop2) <- c("Gender","Partai","Proporsi")
prop2$Partai  <- factor(prop2$Partai,
                        levels=c("Democrat","Republican","Independent"))

ggplot(prop2, aes(x=Gender, y=Proporsi, fill=Partai)) +
  geom_col(position="fill", alpha=0.9, width=0.55,
           color="white", linewidth=0.5) +
  geom_text(aes(label=paste0(round(Proporsi*100,1),"%")),
            position=position_fill(vjust=0.5),
            color="white", fontface="bold", size=5) +
  scale_fill_manual(values=c("Democrat"    = "#2166ac",
                             "Republican"  = "#d6604d",
                             "Independent" = "#1a9850")) +
  scale_y_continuous(labels=percent_format()) +
  labs(title    = "Distribusi Identifikasi Partai per Gender",
       subtitle = "Proporsi baris (row percentage)",
       x=NULL, y="Proporsi", fill="Partai Politik") +
  theme_minimal(base_size=13) +
  theme(plot.title         = element_text(face="bold", size=14),
        plot.subtitle      = element_text(color="grey40"),
        panel.grid.major.x = element_blank())
Figure 6. Distribution of party identification proportions by gender


3.9 Conclusions for Case 2

  1. The overall chi-square test (\(\chi^2 = 12.5693\), \(df=2\), \(p = 0.0019 < 0.05\)) indicates a significant association between gender and party identification.
  2. All expected frequencies are \(\geq 5\) (minimum = 239.6), so the chi-square approximation is valid.
  3. The standardized residuals flag Female-Democrat (\(r^{\text{std}} = 3.272\)) and Male-Democrat (\(r^{\text{std}} = -3.272\)) as the largest departures from independence. The Republican cells (\(r^{\text{std}} = \mp 2.499\)) also exceed \(|r|>2\), while the Independent cells do not.
  4. Chi-square partitioning shows that the gender difference is concentrated in Democrat vs Republican (\(\chi^2 = 11.5545\), highly significant), whereas the comparison of the two major parties against Independent is not significant.
  5. Substantive conclusion: The Democrat category contributes most to the association (about 56.5% of the total chi-square). Women are more likely than men to identify as Democrat (36.5% vs 30.2%).
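Statistical significance does not by itself say how strong the association is. As a rough sketch, Cramér's V can be computed from the overall chi-square as \(V = \sqrt{\chi^2 / (n \cdot \min(r-1, c-1))}\). The object names below are illustrative assumptions; the summary table in the next section reports the same quantity as `cramer_V`.

```r
# Illustrative copy of the Gender x Party table (counts taken from this section)
tab <- matrix(c(495, 272, 590,
                330, 265, 498),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Partai = c("Democrat", "Republican", "Independent")))

x2 <- unname(chisq.test(tab, correct = FALSE)$statistic)  # Pearson chi-square
n  <- sum(tab)                                            # total sample size
k  <- min(nrow(tab), ncol(tab)) - 1                       # min(r - 1, c - 1)

# Cramér's V: chi-square rescaled to the range [0, 1]
cramers_v <- sqrt(x2 / (n * k))
round(cramers_v, 3)
```

A value this close to zero suggests that the association, although statistically significant with \(n = 2450\), is weak in magnitude.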

4 General Conclusions

kesimpulan_df <- data.frame(
  "Kasus"        = c("Kasus 1 (2x2)","Kasus 2 (2x3)"),
  "Variabel"     = c("Merokok – Kanker Paru","Gender – Partai Politik"),
  "Asosiasi"     = c(paste0("RD=",round(RD,3),"; RR=",round(RR,3),"; OR=",round(OR,3)),
                     paste0("V=",round(cramer_V,3))),
  # unname() drops the "X-squared" name carried by cs2$statistic,
  # which would otherwise leak into the row names of the table
  "Chi-Square"   = unname(c(round(chi2_stat,3), round(cs2$statistic,3))),
  "p-value"      = c(format(p_chi, scientific=TRUE, digits=2),
                     format(cs2$p.value, scientific=TRUE, digits=2)),
  "Keputusan"    = c("Tolak H0","Tolak H0"),
  "Temuan Utama" = c("OR=2,97; asosiasi kuat & signifikan",
                     "Democrat paling membedakan gender"),
  check.names = FALSE
)

kable(kesimpulan_df,
      caption = "Tabel 10. Ringkasan Kesimpulan Kedua Kasus",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = TRUE) |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  column_spec(6, bold=TRUE, color="#1a7a1a")
Tabel 10. Ringkasan Kesimpulan Kedua Kasus
Kasus Variabel Asosiasi Chi-Square p-value Keputusan Temuan Utama
Kasus 1 (2x2) Merokok – Kanker Paru RD=0.252; RR=1.959; OR=2.974 19.129 1.2e-05 Tolak H0 OR=2,97; asosiasi kuat & signifikan
Kasus 2 (2x3) Gender – Partai Politik V=0.072 12.569 1.9e-03 Tolak H0 Democrat paling membedakan gender

Both cases illustrate the value of a comprehensive inferential analysis: not only a test of statistical significance, but also estimates of the measures of association with their confidence intervals, and an analysis of cell contributions through residuals. This combined approach gives a fuller and more substantive picture of the relationship between categorical variables.


5 Referensi

  • Agresti, A. (2013). Categorical Data Analysis (3rd ed.). John Wiley & Sons.
  • Fleiss, J. L., Levin, B., & Paik, M. C. (2003). Statistical Methods for Rates and Proportions (3rd ed.). John Wiley & Sons.
  • Stokes, M. E., Davis, C. S., & Koch, G. G. (2012). Categorical Data Analysis Using SAS (3rd ed.). SAS Institute Inc.
  • R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Zar, J. H. (2010). Biostatistical Analysis (5th ed.). Pearson Prentice Hall.