library(knitr)
library(kableExtra)
library(epitools)
library(vcd)
library(DescTools)
library(ggplot2)
library(dplyr)
library(tidyr)
library(scales)
library(RColorBrewer)

1 Introduction

A two-way contingency table is one of the main tools of categorical data analysis for exploring and testing the relationship between two categorical variables. This assignment carries out inferential analysis for two cases:

Case 1 (2×2): the relationship between smoking and lung cancer.
Case 2 (2×3): the relationship between gender and political party identification.

1.1 Measures of Association in Contingency Tables

For a 2×2 contingency table, three main measures quantify the strength of the relationship between two categorical variables: the Risk Difference (RD), the Risk Ratio (RR), and the Odds Ratio (OR).

1.1.1 Risk Difference (RD)

Definition: The Risk Difference (RD) is an absolute measure of association. It is the difference in the probability of the event (the risk) between the exposed and the unexposed group.

\[\text{RD} = p_1 - p_2\]

where \(p_1\) is the proportion of events in the exposed group and \(p_2\) is the proportion in the unexposed group.

Interpretation:

\(\text{RD} = 0\) : no difference in risk (null value).
\(\text{RD} > 0\) : the exposed group has a higher risk.
\(\text{RD} < 0\) : the exposed group has a lower risk (protective).

For example, \(\text{RD} = 0{,}25\) means the risk in the exposed group is 25 percentage points higher than in the unexposed group. RD is intuitive because it states the difference in probabilities directly. It is useful for gauging absolute impact in public health policy.

1.1.2 Risk Ratio (RR)

Definition: The Risk Ratio (RR), also called the relative risk, is a relative measure of association. It compares the risk in the two groups as a ratio.

\[\text{RR} = \frac{p_1}{p_2}\]

Interpretation:

\(\text{RR} = 1\) : equal risk in both groups (null value).
\(\text{RR} > 1\) : the exposed group has a higher relative risk.
\(\text{RR} < 1\) : the exposed group has a lower risk (protective factor).

For example, \(\text{RR} = 2\) means the exposed group has twice the risk of the unexposed group. RR is easy to interpret, but it cannot be estimated directly from a case-control design, because the researcher fixes the proportion of cases in the sample.

1.1.3 Odds Ratio (OR)

Definition: The Odds Ratio (OR) compares the odds of the event between two groups. The odds are the probability that an event occurs divided by the probability that it does not: \(\text{odds} = p/(1-p)\).

\[\text{OR} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{ad}{bc}\]

where \(a, b, c, d\) are the cell counts of the 2×2 table. Cells \(a, b\) are in the exposed row and \(c, d\) in the unexposed row, with events in the first column.

Interpretation:

\(\text{OR} = 1\) : equal odds in both groups (null value).
\(\text{OR} > 1\) : the exposed group has higher odds.
\(\text{OR} < 1\) : the exposed group has lower odds (protective).

For example, \(\text{OR} = 3\) means the odds of the event in the exposed group are three times those in the unexposed group. OR is the most flexible measure of association: it is valid for every study design, including case-control studies. When the event is rare (the rare disease assumption, prevalence < 10%), OR approximates RR.

1.1.4 Brief Comparison of RD, RR, and OR

Measure	Null Value	Type	Main Use
RD	0	Absolute	Cohort, cross-sectional; policy impact
RR	1	Relative	Cohort, cross-sectional
OR	1	Relative (odds)	All designs, including case-control
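The three formulas can be made concrete with a minimal R sketch. The helper `assoc_2x2()` is hypothetical and does not come from any package. It assumes the cell layout used in this document: \(a, b\) in the exposed row, \(c, d\) in the unexposed row, and events in the first column. The second call uses made-up counts for a rare event, to illustrate that OR approaches RR in that setting.

```r
# Hypothetical helper: RD, RR and OR from the four cells of a 2x2 table,
# assuming rows = exposed / unexposed and columns = event / no event.
assoc_2x2 <- function(a, b, c, d) {
  p1 <- a / (a + b)            # risk in the exposed group
  p2 <- c / (c + d)            # risk in the unexposed group
  c(RD = p1 - p2,
    RR = p1 / p2,
    OR = (a * d) / (b * c))    # equals (p1/(1-p1)) / (p2/(1-p2))
}

# Case 1 counts (Table 1): RD 0.2517, RR 1.9589, OR 2.9738
round(assoc_2x2(688, 650, 21, 59), 4)

# Hypothetical rare event (2% vs 1% risk): RR = 2, OR = 2.0204, so OR is close to RR
round(assoc_2x2(20, 980, 10, 990), 4)
```

The function's argument named `c` does not break the call to `c()` inside it, because R skips non-function objects when it looks up a name in call position.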

1.2 Testing Methods Used

The analysis uses four hypothesis tests:

- the two-proportion test,
- Pearson's chi-square test,
- the likelihood ratio test (\(G^2\)),
- Fisher's exact test.

For the 2×3 case, the chi-square statistic is also partitioned.

Main references:

Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Wiley.
Fleiss, J. L., Levin, B., & Paik, M. C. (2003). Statistical Methods for Rates and Proportions (3rd ed.). Wiley.
Stokes, M. E., Davis, C. S., & Koch, G. G. (2012). Categorical Data Analysis Using SAS (3rd ed.). SAS Institute.
Zar, J. H. (2010). Biostatistical Analysis (5th ed.). Pearson Prentice Hall.

2 Case 1: 2×2 Contingency Table, Smoking and Lung Cancer

2.1 Constructing the Contingency Table

The data describe the relationship between smoking status (Smoker vs Non-Smoker) and lung cancer status (Cancer (+) vs Control (-)) in a case-control study design. The column totals (709 cases and 709 controls) are therefore fixed by the design.

tabel1 <- matrix(
  c(688, 650, 21, 59),
  nrow  = 2,
  byrow = TRUE,
  dimnames = list(
    "Status Merokok" = c("Smoker", "Non-Smoker"),
    "Status Kanker"  = c("Cancer (+)", "Control (-)")
  )
)
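tabel1_margin <- addmargins(tabel1)

# Cell counts, margins and critical value used by the later chunks
# (`c` shadows the name, but R still finds the c() function when it is called)
a <- tabel1[1, 1]; b <- tabel1[1, 2]
c <- tabel1[2, 1]; d <- tabel1[2, 2]
n1 <- a + b; n2 <- c + d; n <- sum(tabel1)
z95 <- qnorm(0.975)

kable(tabel1_margin,
      caption = "Table 1. 2x2 Contingency Table: Smoking Status and Lung Cancer",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position = "center") |>
  row_spec(nrow(tabel1_margin), bold=TRUE, background="#dce8f5") |>
  column_spec(ncol(tabel1_margin), bold=TRUE, background="#dce8f5") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac")

Table 1. 2x2 Contingency Table: Smoking Status and Lung Cancer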
	Cancer (+)	Control (-)	Sum
Smoker	688	650	1338
Non-Smoker	21	59	80
Sum	709	709	1418

Cell notation for the 2×2 table:

	Cancer (+)	Control (−)	Total
Smoker	\(a = 688\)	\(b = 650\)	\(n_{1+} = 1338\)
Non-Smoker	\(c = 21\)	\(d = 59\)	\(n_{2+} = 80\)
Total	\(n_{+1} = 709\)	\(n_{+2} = 709\)	\(n = 1418\)

2.2 Point Estimates of the Proportions

Estimated proportion of lung cancer cases within each smoking group:

\[\hat{p}_1 = \frac{a}{n_{1+}} = \frac{688}{1338} = 0{,}5142\]

\[\hat{p}_2 = \frac{c}{n_{2+}} = \frac{21}{80} = 0{,}2625\]

p1_hat <- a / n1
p2_hat <- c / n2
cat("Proportion Smoker     (p1_hat):", round(p1_hat, 4), "\n")

## Proportion Smoker     (p1_hat): 0.5142

cat("Proportion Non-Smoker (p2_hat):", round(p2_hat, 4), "\n")

## Proportion Non-Smoker (p2_hat): 0.2625

Interpretation: The proportion of lung cancer cases is 0.5142 (51.42%) among Smokers and 0.2625 (26.25%) among Non-Smokers. Descriptively, smokers in this sample show a much higher proportion of lung cancer.

2.3 95% Confidence Intervals

2.3.1 Confidence Interval for Each Group's Proportion

The Wilson score method is used because it is more accurate than the Wald interval, especially for proportions close to 0 or 1 (Agresti, 2013):

\[\text{CI}_{95\%}(p) = \frac{\hat{p} + \dfrac{z^2}{2n} \pm z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}}\]

For Smokers (\(\hat{p}_1 = 0{,}5142,\ n_1 = 1338,\ z_{0.025} = 1{,}96\)):

\[\text{Lower bound} = \frac{0{,}5142 + \frac{3{,}8416}{2\times1338} - 1{,}96\sqrt{\frac{0{,}5142\times0{,}4858}{1338} + \frac{3{,}8416}{4\times1338^2}}}{1 + \frac{3{,}8416}{1338}} = \frac{0{,}5156 - 1{,}96\times0{,}01368}{1{,}00287} = \frac{0{,}4888}{1{,}00287} \approx 0{,}4874\]

\[\text{Upper bound} = \frac{0{,}5156 + 1{,}96\times0{,}01368}{1{,}00287} = \frac{0{,}5424}{1{,}00287} \approx 0{,}5409\]

For Non-Smokers (\(\hat{p}_2 = 0{,}2625,\ n_2 = 80\)):

\[\text{CI}_{95\%}(p_2) \approx \left[\frac{0{,}2865 - 1{,}96\times0{,}05070}{1{,}04802};\ \frac{0{,}2865 + 1{,}96\times0{,}05070}{1{,}04802}\right] = [0{,}1786;\ 0{,}3682]\]

ci_wilson <- function(x, n_obs, conf = 0.95) {
  z  <- qnorm(1 - (1 - conf)/2)
  p  <- x / n_obs
  lo <- (p + z^2/(2*n_obs) - z*sqrt(p*(1-p)/n_obs + z^2/(4*n_obs^2))) / (1 + z^2/n_obs)
  hi <- (p + z^2/(2*n_obs) + z*sqrt(p*(1-p)/n_obs + z^2/(4*n_obs^2))) / (1 + z^2/n_obs)
  c(estimate = p, lower = lo, upper = hi)
}

ci_p1 <- ci_wilson(a, n1)
ci_p2 <- ci_wilson(c, n2)

ci_prop_df <- data.frame(
  Group          = c("Smoker","Non-Smoker"),
  n              = c(n1, n2),
  Proportion     = c(round(p1_hat,4), round(p2_hat,4)),
  "CI Lower 95%" = c(round(ci_p1["lower"],4), round(ci_p2["lower"],4)),
  "CI Upper 95%" = c(round(ci_p1["upper"],4), round(ci_p2["upper"],4)),
  check.names = FALSE
)

kable(ci_prop_df,
      caption = "Table 2. Estimated Proportions and 95% CIs (Wilson Score)",
      align = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position="center") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac")

Table 2. Estimated Proportions and 95% CIs (Wilson Score)
Group	n	Proportion	CI Lower 95%	CI Upper 95%
Smoker	1338	0.5142	0.4874	0.5409
Non-Smoker	80	0.2625	0.1786	0.3682
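As a cross-check on Table 2, base R's one-sample `prop.test()` with `correct = FALSE` returns the Wilson score interval. It should therefore reproduce the hand-coded values. In this minimal sketch, the helper `wilson()` is a local re-declaration of the same formula as `ci_wilson()`, so the block runs on its own.

```r
# Wilson score interval, same formula as ci_wilson() above
wilson <- function(x, n, conf = 0.95) {
  z   <- qnorm(1 - (1 - conf) / 2)
  p   <- x / n
  mid <- p + z^2 / (2 * n)
  rad <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  c(lower = (mid - rad) / (1 + z^2 / n),
    upper = (mid + rad) / (1 + z^2 / n))
}

# One-sample prop.test() without continuity correction gives the Wilson interval
for (grp in list(c(x = 688, n = 1338), c(x = 21, n = 80))) {
  manual  <- wilson(grp[["x"]], grp[["n"]])
  builtin <- prop.test(grp[["x"]], grp[["n"]], correct = FALSE)$conf.int
  print(round(rbind(manual = manual, prop.test = as.numeric(builtin)), 4))
}
```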

2.3.2 Confidence Interval for the Risk Difference (RD)

Manual calculation:

\[\text{RD} = \hat{p}_1 - \hat{p}_2 = 0{,}5142 - 0{,}2625 = 0{,}2517\]

\[\text{SE}(\text{RD}) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{0{,}5142\times0{,}4858}{1338} + \frac{0{,}2625\times0{,}7375}{80}}\]

\[= \sqrt{\frac{0{,}24980}{1338} + \frac{0{,}19359}{80}} = \sqrt{0{,}000187 + 0{,}002420} = \sqrt{0{,}002607} = 0{,}05106\]

\[\text{CI}_{95\%}(\text{RD}) = 0{,}2517 \pm 1{,}96 \times 0{,}05106 = 0{,}2517 \pm 0{,}1001 = [0{,}1516;\ 0{,}3518]\]

RD    <- p1_hat - p2_hat
SE_RD <- sqrt(p1_hat*(1-p1_hat)/n1 + p2_hat*(1-p2_hat)/n2)
CI_RD <- c(RD - z95*SE_RD, RD + z95*SE_RD)

cat("RD          :", round(RD, 4), "\n")

## RD          : 0.2517

cat("SE(RD)      :", round(SE_RD, 4), "\n")

## SE(RD)      : 0.0511

cat("95% CI RD   : [", round(CI_RD[1],4), ";", round(CI_RD[2],4), "]\n")

## 95% CI RD   : [ 0.1516 ; 0.3518 ]

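Interpretation: \(\text{RD} = 0.2517\), so the proportion of lung cancer among smokers is 25.17 percentage points higher than among non-smokers. The 95% CI [\(0.1516\); \(0.3518\)] does not include 0, which indicates a statistically significant difference. Because the data come from a case-control design, this is a difference in sample proportions rather than a population risk difference (see Section 2.3.4).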

2.3.3 Confidence Interval for the Risk Ratio (RR)

Manual calculation:

\[\text{RR} = \frac{\hat{p}_1}{\hat{p}_2} = \frac{0{,}5142}{0{,}2625} = 1{,}9589\]

\[\text{SE}(\ln\text{RR}) = \sqrt{\frac{1-\hat{p}_1}{a} + \frac{1-\hat{p}_2}{c}} = \sqrt{\frac{0{,}4858}{688} + \frac{0{,}7375}{21}} = \sqrt{0{,}000706 + 0{,}035119} = \sqrt{0{,}035825} = 0{,}1893\]

\[\ln\text{RR} = \ln(1{,}9589) = 0{,}6724\]

\[\text{CI}_{95\%}(\ln\text{RR}) = 0{,}6724 \pm 1{,}96 \times 0{,}1893 = [0{,}3014;\ 1{,}0434]\]

\[\text{CI}_{95\%}(\text{RR}) = \left[e^{0{,}3014};\ e^{1{,}0434}\right] \approx [1{,}352;\ 2{,}839]\]

RR      <- p1_hat / p2_hat
SE_lnRR <- sqrt((1-p1_hat)/a + (1-p2_hat)/c)
CI_RR   <- exp(log(RR) + c(-1,1)*z95*SE_lnRR)

cat("RR          :", round(RR, 4), "\n")

## RR          : 1.9589

cat("ln(RR)      :", round(log(RR), 4), "\n")

## ln(RR)      : 0.6724

cat("SE(ln RR)   :", round(SE_lnRR, 4), "\n")

## SE(ln RR)   : 0.1893

cat("95% CI RR   : [", round(CI_RR[1],4), ";", round(CI_RR[2],4), "]\n")

## 95% CI RR   : [ 1.3517 ; 2.8387 ]

Interpretation: \(\text{RR} = 1.9589\), so the proportion of lung cancer among Smokers is 1.96 times that among Non-Smokers. The 95% CI [\(1.3517\); \(2.8387\)] does not include 1.

2.3.4 Confidence Interval for the Odds Ratio (OR)

A case-control design does not allow valid estimation of RD and RR, because the researcher fixes the proportion of cases. The OR is not affected by this. The odds ratio of exposure given disease status equals the odds ratio of disease given exposure, since both reduce to \(ad/bc\). The OR is therefore the most appropriate measure of association for these data (Fleiss et al., 2003).
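The claim that the sampling fraction of a case-control study distorts RR but not OR can be checked numerically. This sketch assumes, purely for illustration, that ten times as many controls had been sampled, so the control column is multiplied by 10. The scaled table is hypothetical, not observed data, and the helpers `rr()` and `or()` are local illustration functions.

```r
# Observed Case 1 table (rows: Smoker / Non-Smoker; cols: Cancer / Control)
obs <- matrix(c(688, 650,
                 21,  59), nrow = 2, byrow = TRUE)

# Hypothetical design with 10 controls per case: only the control column changes
scaled <- obs
scaled[, 2] <- scaled[, 2] * 10

rr <- function(t) (t[1, 1] / sum(t[1, ])) / (t[2, 1] / sum(t[2, ]))
or <- function(t) (t[1, 1] * t[2, 2]) / (t[1, 2] * t[2, 1])

# RR changes with the control sampling fraction; OR stays the same
rbind(observed = c(RR = rr(obs),    OR = or(obs)),
      scaled   = c(RR = rr(scaled), OR = or(scaled)))
```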

Manual calculation:

\[\text{OR} = \frac{ad}{bc} = \frac{688 \times 59}{650 \times 21} = \frac{40{.}592}{13{.}650} = 2{,}9738\]

\[\text{SE}(\ln\text{OR}) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} = \sqrt{\frac{1}{688} + \frac{1}{650} + \frac{1}{21} + \frac{1}{59}}\]

\[= \sqrt{0{,}001453 + 0{,}001538 + 0{,}047619 + 0{,}016949} = \sqrt{0{,}067559} = 0{,}2599\]

\[\ln\text{OR} = \ln(2{,}9738) = 1{,}0898\]

\[\text{CI}_{95\%}(\ln\text{OR}) = 1{,}0898 \pm 1{,}96 \times 0{,}2599 = [0{,}5804;\ 1{,}5992]\]

\[\text{CI}_{95\%}(\text{OR}) = \left[e^{0{,}5804};\ e^{1{,}5992}\right] \approx [1{,}787;\ 4{,}949]\]

OR      <- (a * d) / (b * c)
SE_lnOR <- sqrt(1/a + 1/b + 1/c + 1/d)
CI_OR   <- exp(log(OR) + c(-1,1)*z95*SE_lnOR)

cat("OR          :", round(OR, 4), "\n")

## OR          : 2.9738

cat("ln(OR)      :", round(log(OR), 4), "\n")

## ln(OR)      : 1.0898

cat("SE(ln OR)   :", round(SE_lnOR, 4), "\n")

## SE(ln OR)   : 0.2599

cat("95% CI OR   : [", round(CI_OR[1],4), ";", round(CI_OR[2],4), "]\n")

## 95% CI OR   : [ 1.7867 ; 4.9494 ]

Interpretation: \(\text{OR} = 2.9738\), so the odds of lung cancer among Smokers are 2.97 times those among Non-Smokers. The 95% CI [\(1.7867\); \(4.9494\)] lies well above 1, which indicates a strong and statistically significant association.

2.3.5 Ringkasan Ukuran Asosiasi

asosiasi_df <- data.frame(
  "Measure of Association" = c("Risk Difference (RD)","Risk Ratio (RR)","Odds Ratio (OR)"),
  "Estimate"     = c(round(RD,4), round(RR,4), round(OR,4)),
  "CI Lower 95%" = c(round(CI_RD[1],4), round(CI_RR[1],4), round(CI_OR[1],4)),
  "CI Upper 95%" = c(round(CI_RD[2],4), round(CI_RR[2],4), round(CI_OR[2],4)),
  "Null Value"   = c("0","1","1"),
  "Conclusion"   = rep("Significant", 3),
  check.names = FALSE
)

kable(asosiasi_df,
      caption = "Table 3. Summary of Measures of Association and 95% CIs",
      align = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = TRUE) |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  column_spec(6, bold=TRUE, color="#1a7a1a")

Table 3. Summary of Measures of Association and 95% CIs
Measure of Association	Estimate	CI Lower 95%	CI Upper 95%	Null Value	Conclusion
Risk Difference (RD)	0.2517	0.1516	0.3518	0	Significant
Risk Ratio (RR)	1.9589	1.3517	2.8387	1	Significant
Odds Ratio (OR)	2.9738	1.7867	4.9494	1	Significant

2.4 Two-Proportion Test

Hypotheses:

\[H_0: p_1 = p_2 \quad \text{vs} \quad H_1: p_1 \neq p_2\]

Test statistic (pooled):

\[z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\!\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\]

Manual calculation:

Pooled proportion: \[\hat{p} = \frac{a + c}{n} = \frac{688 + 21}{1418} = \frac{709}{1418} = 0{,}5000\]

Standard error: \[\text{SE} = \sqrt{0{,}5000 \times 0{,}5000 \times \left(\frac{1}{1338}+\frac{1}{80}\right)} = \sqrt{0{,}2500 \times \left(0{,}000747 + 0{,}012500\right)} = \sqrt{0{,}2500 \times 0{,}013247} = \sqrt{0{,}003312} = 0{,}05755\]

Test statistic: \[z = \frac{0{,}5142 - 0{,}2625}{0{,}05755} = \frac{0{,}2517}{0{,}05755} \approx 4{,}374\]

Two-sided \(p\)-value: \[p = 2 \times P(Z > 4{,}374) = 2 \times (1 - \Phi(4{,}374)) \approx 1{,}22 \times 10^{-5}\]

p_pool  <- (a + c) / n
SE_pool <- sqrt(p_pool*(1-p_pool)*(1/n1 + 1/n2))
z_stat  <- (p1_hat - p2_hat) / SE_pool
p_val_z <- 2 * pnorm(-abs(z_stat))

cat("p_pool  :", round(p_pool, 4), "\n")

## p_pool  : 0.5

cat("SE pool :", round(SE_pool, 4), "\n")

## SE pool : 0.0575

cat("z       :", round(z_stat, 4), "\n")

## z       : 4.3737

cat("p-value :", format(p_val_z, scientific=TRUE, digits=4), "\n\n")

## p-value : 1.222e-05

cat("--- Check with prop.test() ---\n")

## --- Check with prop.test() ---

print(prop.test(c(a, c), c(n1, n2), correct=FALSE))

## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  c(a, c) out of c(n1, n2)
## X-squared = 19.129, df = 1, p-value = 1.222e-05
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1516343 0.3517663
## sample estimates:
##    prop 1    prop 2 
## 0.5142003 0.2625000

Decision and interpretation: \(z = 4.3737\), \(p \approx 1.22 \times 10^{-5} \ll \alpha = 0.05\). \(H_0\) is rejected: the proportion of lung cancer differs significantly between Smokers and Non-Smokers.

2.5 Chi-Square Test of Independence

Hypotheses:

\[H_0: \text{Smoking status is independent of cancer status} \quad \text{vs} \quad H_1: \text{there is an association}\]

Degrees of freedom: \(df = (2-1)(2-1) = 1\)

Manual calculation (expected frequencies):

Because \(n_{+1} = n_{+2} = 709 = n/2\), each expected count is exactly half of its row total:

\[E_{11} = \frac{n_{1+} \times n_{+1}}{n} = \frac{1338 \times 709}{1418} = \frac{948{.}642}{1418} = 669\]

\[E_{12} = \frac{n_{1+} \times n_{+2}}{n} = \frac{1338 \times 709}{1418} = 669 \quad \text{(identical because }n_{+1}=n_{+2}\text{)}\]

\[E_{21} = \frac{n_{2+} \times n_{+1}}{n} = \frac{80 \times 709}{1418} = \frac{56{.}720}{1418} = 40\]

\[E_{22} = \frac{n_{2+} \times n_{+2}}{n} = \frac{80 \times 709}{1418} = 40\]

Chi-square statistic:

\[\chi^2 = \frac{(688-669)^2}{669} + \frac{(650-669)^2}{669} + \frac{(21-40)^2}{40} + \frac{(59-40)^2}{40}\]

\[= \frac{19^2}{669} + \frac{(-19)^2}{669} + \frac{(-19)^2}{40} + \frac{19^2}{40} = \frac{361}{669} + \frac{361}{669} + \frac{361}{40} + \frac{361}{40}\]

\[= 0{,}5396 + 0{,}5396 + 9{,}0250 + 9{,}0250 = 19{,}1292\]

E11 <- (n1*(a+c))/n; E12 <- (n1*(b+d))/n
E21 <- (n2*(a+c))/n; E22 <- (n2*(b+d))/n
E_mat <- matrix(c(E11,E12,E21,E22), 2, 2, byrow=TRUE, dimnames=dimnames(tabel1))

cat("Expected frequencies:\n")

## Expected frequencies:

print(round(E_mat, 4))

##               Status Kanker
## Status Merokok Cancer (+) Control (-)
##     Smoker            669         669
##     Non-Smoker         40          40

chi2_stat <- sum((tabel1 - E_mat)^2 / E_mat)
p_chi     <- pchisq(chi2_stat, df=1, lower.tail=FALSE)

cat("\nChi-square :", round(chi2_stat,4), "| df: 1 | p-value:", format(p_chi, scientific=TRUE), "\n\n")

## 
## Chi-square : 19.1292 | df: 1 | p-value: 1.221601e-05

cat("--- Check with chisq.test() ---\n")

## --- Check with chisq.test() ---

print(chisq.test(tabel1, correct=FALSE))

## 
##  Pearson's Chi-squared test
## 
## data:  tabel1
## X-squared = 19.129, df = 1, p-value = 1.222e-05

Decision and interpretation: \(\chi^2 = 19.1292\), \(df = 1\), \(p \approx 1.22 \times 10^{-5}\). \(H_0\) is rejected: there is a significant association between smoking and lung cancer. Note that \(z^2 = (4.3737)^2 = 19.1292 = \chi^2\), which illustrates that the two tests are equivalent for a 2×2 table.

2.6 Likelihood Ratio Test (\(G^2\))

Test statistic:

\[G^2 = 2\sum_{i,j} O_{ij} \ln\!\left(\frac{O_{ij}}{E_{ij}}\right)\]

Manual calculation (using \(E_{11} = E_{12} = 669\) and \(E_{21} = E_{22} = 40\)):

\[G^2 = 2\!\left[688\ln\!\left(\frac{688}{669}\right) + 650\ln\!\left(\frac{650}{669}\right) + 21\ln\!\left(\frac{21}{40}\right) + 59\ln\!\left(\frac{59}{40}\right)\right]\]

\[= 2\!\left[688\ln(1{,}02840) + 650\ln(0{,}97160) + 21\ln(0{,}52500) + 59\ln(1{,}47500)\right]\]

\[= 2\!\left[688(0{,}028005) + 650(-0{,}028812) + 21(-0{,}644357) + 59(0{,}388658)\right]\]

\[= 2\!\left[19{,}2674 + (-18{,}7278) + (-13{,}5315) + 22{,}9308\right] = 2 \times 9{,}9389 \approx 19{,}878\]

G2_stat <- 2 * sum(tabel1 * log(tabel1 / E_mat))
p_G2    <- pchisq(G2_stat, df=1, lower.tail=FALSE)

cat("G2 statistic:", round(G2_stat,4), "| df: 1 | p-value:", format(p_G2, scientific=TRUE), "\n\n")

## G2 statistic: 19.878 | df: 1 | p-value: 8.25441e-06

cat("--- Check with GTest() ---\n")

## --- Check with GTest() ---

print(GTest(tabel1))

## 
##  Log likelihood ratio (G-test) test of independence without correction
## 
## data:  tabel1
## G = 19.878, X-squared df = 1, p-value = 8.254e-06

Decision and interpretation: \(G^2 = 19.878\), \(p \approx 8.25 \times 10^{-6}\). \(H_0\) is rejected. \(G^2\) differs slightly from Pearson's \(\chi^2\) because it is based on the log-likelihood ratio rather than on squared deviations. The two statistics are asymptotically equivalent under \(H_0\), and both detect the association.

2.7 Fisher's Exact Test

This test computes exact probabilities from the hypergeometric distribution, conditioning on both margins of the table, instead of relying on an asymptotic approximation. It is especially useful when \(n\) is small or some \(E_{ij} < 5\).

Hypergeometric formula:

\[P(X = a \mid n_{1+}, n_{2+}, n_{+1}) = \frac{\dbinom{n_{1+}}{a}\dbinom{n_{2+}}{n_{+1}-a}}{\dbinom{n}{n_{+1}}} = \frac{\dbinom{1338}{688}\dbinom{80}{21}}{\dbinom{1418}{709}}\]

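The \(p\)-value is the sum of the probabilities of all tables with the same margins that are as extreme as, or more extreme than, the observed table. In R's two-sided version, "as extreme" means having a probability no greater than that of the observed table.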
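The "as extreme or more extreme" rule can be made explicit with base R's `dhyper()`. This sketch enumerates every value of \(a\) compatible with the observed margins, using the parameterization of the formula above. It then sums the probabilities that do not exceed that of the observed table. The small relative tolerance `1e-7` guards against floating-point ties, similar to what `fisher.test()` does internally. Variable names such as `n1_row` are illustrative.

```r
# Margins of Table 1
n1_row <- 1338   # Smokers      (n_{1+})
n2_row <- 80     # Non-Smokers  (n_{2+})
n_col1 <- 709    # Cancer cases (n_{+1})
a_obs  <- 688

# Support of the cell count a when all margins are fixed
a_vals <- max(0, n_col1 - n2_row):min(n1_row, n_col1)

# P(X = a) = choose(n1+, a) * choose(n2+, n+1 - a) / choose(n, n+1)
probs <- dhyper(a_vals, m = n1_row, n = n2_row, k = n_col1)
p_obs <- probs[a_vals == a_obs]

# Two-sided p-value: total probability of tables no more likely than the observed one
p_fisher_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

tab <- matrix(c(688, 650, 21, 59), nrow = 2, byrow = TRUE)
c(manual = p_fisher_manual, fisher.test = fisher.test(tab)$p.value)
```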

ft <- fisher.test(tabel1, alternative="two.sided")
cat("OR (MLE)    :", round(ft$estimate, 4), "\n")

## OR (MLE)    : 2.9716

cat("95% CI OR   : [", round(ft$conf.int[1],4), ";", round(ft$conf.int[2],4), "]\n")

## 95% CI OR   : [ 1.7556 ; 5.2107 ]

cat("p-value     :", format(ft$p.value, scientific=TRUE), "\n\n")

## p-value     : 1.476303e-05

print(ft)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  tabel1
## p-value = 1.476e-05
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.755611 5.210711
## sample estimates:
## odds ratio 
##   2.971634

Decision and interpretation: \(p \approx 1.48 \times 10^{-5}\). \(H_0\) is rejected. The OR reported by Fisher's exact test is a conditional maximum likelihood estimate, 2.9716. It is very close to the sample estimate of 2.9738 computed manually.

2.8 Comparison of the Four Tests

comp_df <- data.frame(
  "Test"           = c("Two-Proportion Test (Z)","Pearson Chi-Square",
                       "Likelihood Ratio (G2)","Fisher Exact Test"),
  "H0"             = rep("No association", 4),
  "Test Statistic" = c(paste0("Z = ",  round(z_stat,4)),
                       paste0("chi2 = ",round(chi2_stat,4)),
                       paste0("G2 = ", round(G2_stat,4)),
                       "Hypergeometric (exact)"),
  "df"             = c("1","1","1","n/a"),
  "p-value"        = c(format(p_val_z,    scientific=TRUE, digits=3),
                       format(p_chi,      scientific=TRUE, digits=3),
                       format(p_G2,       scientific=TRUE, digits=3),
                       format(ft$p.value, scientific=TRUE, digits=3)),
  "Decision"       = rep("Reject H0", 4),
  "Note"           = c("Z^2 = chi^2 (equivalent)",
                       "Asymptotic; requires E>=5",
                       "Asymptotic; based on log-likelihood",
                       "Exact; valid for small n"),
  check.names = FALSE
)

kable(comp_df,
      caption = "Table 4. Comparison of Results from the Four Tests",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = TRUE) |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  column_spec(6, bold=TRUE, color="#1a7a1a")

Table 4. Comparison of Results from the Four Tests
Test	H0	Test Statistic	df	p-value	Decision	Note
Two-Proportion Test (Z)	No association	Z = 4.3737	1	1.22e-05	Reject H0	Z^2 = chi^2 (equivalent)
Pearson Chi-Square	No association	chi2 = 19.1292	1	1.22e-05	Reject H0	Asymptotic; requires E>=5
Likelihood Ratio (G2)	No association	G2 = 19.878	1	8.25e-06	Reject H0	Asymptotic; based on log-likelihood
Fisher Exact Test	No association	Hypergeometric (exact)	n/a	1.48e-05	Reject H0	Exact; valid for small n

Discussion:

The two-proportion test (Z) and Pearson's chi-square test are mathematically equivalent for a 2×2 table: \(Z^2 = (4.3737)^2 = 19.1292 = \chi^2\).
The likelihood ratio statistic \(G^2\) gives a slightly different but consistent value; the two statistics converge in large samples.
Fisher's exact test gives an exact \(p\)-value without asymptotic assumptions.
All four tests consistently reject \(H_0\) with very small \(p\)-values.

2.9 Visualization for Case 1

mosaic(tabel1,
       shade    = TRUE,
       legend   = TRUE,
       main     = "Mosaic Plot: Smoking Status vs Lung Cancer",
       labeling = labeling_border(rot_labels=c(0,0,0,0)),
       gp       = shading_hcl)

Figure 1. Mosaic Plot: Smoking Status vs Lung Cancer

prop_df <- data.frame(
  Kelompok = c("Smoker","Non-Smoker"),
  Proporsi = c(p1_hat, p2_hat),
  Lower    = c(ci_p1["lower"], ci_p2["lower"]),
  Upper    = c(ci_p1["upper"], ci_p2["upper"])
)

ggplot(prop_df, aes(x=Kelompok, y=Proporsi, fill=Kelompok)) +
  geom_col(width=0.5, alpha=0.9, color="white") +
  geom_errorbar(aes(ymin=Lower, ymax=Upper),
                width=0.12, linewidth=1.1, color="#222222") +
  geom_text(aes(label=paste0(round(Proporsi*100,2),"%")),
            vjust=-2.0, fontface="bold", size=5) +
  scale_fill_manual(values=c("Non-Smoker"="#4393c3", "Smoker"="#d6604d")) +
  scale_y_continuous(labels=percent_format(), limits=c(0,0.72)) +
  labs(title    = "Proportion of Lung Cancer Cases by Group",
       subtitle = "Error bars = 95% CI (Wilson score)",
       x=NULL, y="Proportion", fill=NULL) +
  theme_minimal(base_size=13) +
  theme(legend.position     = "none",
        plot.title           = element_text(face="bold", size=14),
        plot.subtitle        = element_text(color="grey40"),
        panel.grid.major.x   = element_blank())

Figure 2. Proportion of Lung Cancer by Group with 95% CIs

2.10 Conclusions for Case 1

The proportion of lung cancer among Smokers (51.42%) is much higher than among Non-Smokers (26.25%).
All three measures of association indicate a strong relationship: RD = 0.2517, RR = 1.9589 and OR = 2.9738. Each has a 95% CI that excludes its null value. For this case-control design, OR is the appropriate measure to report.
All four tests consistently reject \(H_0\) (very small \(p\)-values).
Substantive conclusion: There is very strong statistical evidence that smoking is associated with lung cancer. The odds of lung cancer among smokers are 2.97 times those among non-smokers.

3 Case 2: 2×3 Contingency Table, Gender and Political Party Identification

3.1 Constructing the Contingency Table

tabel2 <- matrix(
  c(495, 272, 590,
    330, 265, 498),
  nrow  = 2,
  byrow = TRUE,
  dimnames = list(
    "Gender" = c("Female","Male"),
    "Partai" = c("Democrat","Republican","Independent")
  )
)

tabel2_margin <- addmargins(tabel2)

kable(tabel2_margin,
      caption = "Table 5. 2x3 Contingency Table: Gender and Political Party Identification",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position="center") |>
  row_spec(nrow(tabel2_margin), bold=TRUE, background="#dce8f5") |>
  column_spec(ncol(tabel2_margin), bold=TRUE, background="#dce8f5") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac")

Table 5. 2x3 Contingency Table: Gender and Political Party Identification
	Democrat	Republican	Independent	Sum
Female	495	272	590	1357
Male	330	265	498	1093
Sum	825	537	1088	2450

3.2 Expected Frequencies

Under \(H_0\) of independence: \(E_{ij} = \dfrac{n_{i+} \cdot n_{+j}}{n}\)

Manual calculation (\(n = 2450\), \(n_{F+} = 1357\), \(n_{M+} = 1093\)):

\[E_{F,\text{Dem}} = \frac{1357 \times 825}{2450} = \frac{1{.}119{.}525}{2450} = 456{,}949 \qquad E_{F,\text{Rep}} = \frac{1357 \times 537}{2450} = \frac{728{.}709}{2450} = 297{,}432\]

\[E_{F,\text{Ind}} = \frac{1357 \times 1088}{2450} = \frac{1{.}476{.}416}{2450} = 602{,}619 \qquad E_{M,\text{Dem}} = \frac{1093 \times 825}{2450} = \frac{901{.}725}{2450} = 368{,}051\]

\[E_{M,\text{Rep}} = \frac{1093 \times 537}{2450} = \frac{586{.}941}{2450} = 239{,}568 \qquad E_{M,\text{Ind}} = \frac{1093 \times 1088}{2450} = \frac{1{.}189{.}184}{2450} = 485{,}381\]

n_row2 <- rowSums(tabel2)
n_col2 <- colSums(tabel2)
n2_tot <- sum(tabel2)
E2     <- outer(n_row2, n_col2) / n2_tot

E2_margin <- addmargins(E2)
rownames(E2_margin)[nrow(E2_margin)] <- "Total"
colnames(E2_margin)[ncol(E2_margin)] <- "Total"

kable(round(E2_margin, 3),
      caption = "Table 6. Expected Frequencies (E_ij) Under H0 of Independence",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position="center") |>
  row_spec(nrow(E2_margin), bold=TRUE, background="#dce8f5") |>
  column_spec(ncol(E2_margin), bold=TRUE, background="#dce8f5") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac")

Table 6. Expected Frequencies (E_ij) Under H0 of Independence
	Democrat	Republican	Independent	Total
Female	456.949	297.432	602.619	1357
Male	368.051	239.568	485.381	1093
Total	825.000	537.000	1088.000	2450

cat("Minimum E_ij:", round(min(E2),3),
    "->", ifelse(min(E2)>=5,"Condition E>=5 met ✓","Condition E>=5 NOT met"), "\n")

## Minimum E_ij: 239.568 -> Condition E>=5 met ✓

3.3 Chi-Square Test of Independence (Overall)

Hypotheses:

\[H_0: \text{Gender and party identification are independent} \quad \text{vs} \quad H_1: \text{there is an association}\]

Degrees of freedom: \(df = (2-1)(3-1) = 2\)

Manual calculation:

\[\chi^2 = \frac{(495-456{,}95)^2}{456{,}95} + \frac{(272-297{,}43)^2}{297{,}43} + \frac{(590-602{,}62)^2}{602{,}62} + \frac{(330-368{,}05)^2}{368{,}05} + \frac{(265-239{,}57)^2}{239{,}57} + \frac{(498-485{,}38)^2}{485{,}38}\]

\[= \frac{(38{,}05)^2}{456{,}95} + \frac{(-25{,}43)^2}{297{,}43} + \frac{(-12{,}62)^2}{602{,}62} + \frac{(-38{,}05)^2}{368{,}05} + \frac{(25{,}43)^2}{239{,}57} + \frac{(12{,}62)^2}{485{,}38}\]

\[= \frac{1447{,}80}{456{,}95} + \frac{646{,}69}{297{,}43} + \frac{159{,}26}{602{,}62} + \frac{1447{,}80}{368{,}05} + \frac{646{,}69}{239{,}57} + \frac{159{,}26}{485{,}38}\]

\[= 3{,}168 + 2{,}175 + 0{,}264 + 3{,}934 + 2{,}700 + 0{,}328 = 12{,}569\]

cs2 <- chisq.test(tabel2, correct=FALSE)

chi2_manual <- sum((tabel2 - E2)^2 / E2)
cat("Chi-square manual      :", round(chi2_manual,4), "\n")

## Chi-square manual      : 12.5693

cat("Chi-square (chisq.test):", round(cs2$statistic,4), "\n")

## Chi-square (chisq.test): 12.5693

cat("df                     :", cs2$parameter, "\n")

## df                     : 2

cat("p-value                :", format(cs2$p.value, scientific=TRUE), "\n\n")

## p-value                : 1.86475e-03

cramer_V <- sqrt(cs2$statistic / (n2_tot * (min(nrow(tabel2),ncol(tabel2))-1)))
cat("Cramer's V             :", round(cramer_V,4), "\n\n")

## Cramer's V             : 0.0716

print(cs2)

## 
##  Pearson's Chi-squared test
## 
## data:  tabel2
## X-squared = 12.569, df = 2, p-value = 0.001865

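Decision and interpretation: \(\chi^2 = 12.5693\), \(df = 2\), \(p \approx 1.86 \times 10^{-3}\). \(H_0\) is rejected: there is a significant association between gender and party identification. However, Cramér's \(V = 0.0716\) indicates that the association is weak. The large sample size makes even a small departure from independence statistically significant.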
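The Cramér's V computed in the code above uses the standard definition. Here \(k = \min(\text{rows}, \text{columns}) = \min(2, 3) = 2\):

\[V = \sqrt{\frac{\chi^2}{n\,(k-1)}} = \sqrt{\frac{12{,}5693}{2450 \times 1}} = \sqrt{0{,}005130} \approx 0{,}0716\]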

3.4 Pearson Residuals and Standardized Residuals

Pearson residual: \[r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}\]

Standardized (adjusted) residual: \[r_{ij}^{\text{std}} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}(1-p_{i+})(1-p_{+j})}}\]
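Both residual formulas can be checked against the `residuals` and `stdres` components returned by `chisq.test()`. This minimal sketch rebuilds the data of Table 5 as `tab2`. All names in the block (`tab2`, `p_row`, `p_col`, `E`) are local to this illustration.

```r
tab2 <- matrix(c(495, 272, 590,
                 330, 265, 498), nrow = 2, byrow = TRUE,
               dimnames = list(Gender = c("Female", "Male"),
                               Partai = c("Democrat", "Republican", "Independent")))

n_tot <- sum(tab2)
p_row <- rowSums(tab2) / n_tot            # p_{i+}
p_col <- colSums(tab2) / n_tot            # p_{+j}
E     <- n_tot * outer(p_row, p_col)      # expected counts under H0

pearson_r  <- (tab2 - E) / sqrt(E)
adjusted_r <- (tab2 - E) / sqrt(E * outer(1 - p_row, 1 - p_col))

round(adjusted_r, 4)
```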

Manual calculation for the Female-Democrat cell:

\[r_{F,D} = \frac{495 - 456{,}95}{\sqrt{456{,}95}} = \frac{38{,}05}{21{,}38} = 1{,}780\]

Marginal proportions: \(p_{F+} = 1357/2450 = 0{,}5539\); \(p_{+D} = 825/2450 = 0{,}3367\)

\[r_{F,D}^{\text{std}} = \frac{38{,}05}{\sqrt{456{,}95 \times (1-0{,}5539) \times (1-0{,}3367)}} = \frac{38{,}05}{\sqrt{456{,}95 \times 0{,}4461 \times 0{,}6633}} = \frac{38{,}05}{\sqrt{135{,}21}} = \frac{38{,}05}{11{,}63} = 3{,}272\]

Because \(|r_{F,D}^{\text{std}}| = 3{,}272 > 2\), the observed count in this cell differs significantly from the count expected under independence. Under \(H_0\), adjusted residuals are approximately standard normal.

pearson_res <- cs2$residuals
std_res     <- cs2$stdres

res_df <- data.frame(
  Cell = c("Female-Democrat","Female-Republican","Female-Independent",
           "Male-Democrat","Male-Republican","Male-Independent"),
  O    = as.vector(t(tabel2)),
  E    = round(as.vector(t(E2)),3),
  "Pearson Residual"      = round(as.vector(t(pearson_res)),4),
  "Standardized Residual" = round(as.vector(t(std_res)),4),
  "Significant (|r|>2)"   = ifelse(abs(as.vector(t(std_res)))>2,"Yes","No"),
  check.names = FALSE
)

kable(res_df,
      caption = "Table 7. Pearson Residuals and Standardized Residuals",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position="center") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  row_spec(which(abs(as.vector(t(std_res)))>2), bold=TRUE, background="#fff3b0")

Table 7. Pearson Residuals and Standardized Residuals
Cell	O	E	Pearson Residual	Standardized Residual	Significant (|r|>2)
Female-Democrat	495	456.949	1.7801	3.2724	Yes
Female-Republican	272	297.432	-1.4747	-2.4986	Yes
Female-Independent	590	602.619	-0.5140	-1.0322	No
Male-Democrat	330	368.051	-1.9834	-3.2724	Yes
Male-Republican	265	239.568	1.6431	2.4986	Yes
Male-Independent	498	485.381	0.5728	1.0322	No

res_long        <- as.data.frame(as.table(std_res))
colnames(res_long) <- c("Gender","Partai","Residual")
res_long$label  <- round(res_long$Residual, 3)
res_long$text_col <- ifelse(abs(res_long$Residual) > 1.5, "white", "#333333")

ggplot(res_long, aes(x=Partai, y=Gender, fill=Residual)) +
  geom_tile(color="white", linewidth=1.2) +
  geom_text(aes(label=label, color=text_col), size=6, fontface="bold") +
  scale_color_identity() +
  scale_fill_gradient2(low="#b2182b", mid="#f7f7f7", high="#2166ac",
                       midpoint=0, name="Std.\nResidual", limits=c(-4,4)) +
  labs(title    = "Heatmap of Standardized Residuals",
       subtitle = "Blue = higher than expected | Red = lower than expected",
       x="Political Party", y="Gender") +
  theme_minimal(base_size=13) +
  theme(plot.title    = element_text(face="bold", size=14),
        plot.subtitle = element_text(color="grey40"),
        axis.text     = element_text(size=12, face="bold"),
        panel.grid    = element_blank())

Figure 3. Heatmap of Standardized Residuals: Gender vs Party

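Interpretation of the residuals:

- Female-Democrat (\(r^{\text{std}} = 3.272\), \(|r|>2\), significant): women identify as Democrat more often than expected under independence.
- Male-Democrat (\(r^{\text{std}} = -3.272\), significant): men identify as Democrat less often than expected.
- Female-Republican (\(r^{\text{std}} = -2.499\), significant): women identify as Republican less often than expected.
- Male-Republican (\(r^{\text{std}} = 2.499\), significant): men identify as Republican more often than expected.
- Female-Independent and Male-Independent (\(|r^{\text{std}}| = 1.032 < 2\)): not individually significant.

Because the table has only two rows, the two standardized residuals in each column are equal in magnitude and opposite in sign. Each pair of cells therefore describes the same deviation, seen from each gender's side.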
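The significance flags and the mirror-image pattern described above can be checked directly. This short sketch again assumes the counts are re-entered as a hypothetical matrix `tab`. `which(..., arr.ind = TRUE)` returns the row and column position of every cell that passes the \(|r| > 2\) rule of thumb.

```r
# Counts re-entered from the document (hypothetical object name `tab`)
tab <- matrix(c(495, 272, 590,
                330, 265, 498), nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Partai = c("Democrat", "Republican", "Independent")))
r_std <- chisq.test(tab)$stdres

# Cells whose standardized residual exceeds the |r| > 2 rule of thumb
signif_cells <- which(abs(r_std) > 2, arr.ind = TRUE)
data.frame(Gender = rownames(r_std)[signif_cells[, "row"]],
           Partai = colnames(r_std)[signif_cells[, "col"]],
           r_std  = round(r_std[signif_cells], 3))

# With only two rows, each column's two residuals are mirror images
all.equal(r_std["Female", ], -r_std["Male", ])
```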

3.5 Chi-Square Partitioning

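Chi-square partitioning splits the overall test (\(df=2\)) into two orthogonal components, each with \(df=1\). For Pearson's \(\chi^2\) the components add up only approximately; the decomposition is exact for the likelihood-ratio statistic \(G^2\):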

\[\chi^2_{\text{total}}(df=2) \approx \chi^2_{\text{Dem vs Rep}}(df=1) + \chi^2_{\text{(Dem+Rep) vs Ind}}(df=1)\]
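To see why the relation holds only approximately for Pearson's statistic, compute both statistics for the full table and for the two partitions. The sketch below uses three small helper functions, `expected()`, `x2()` and `g2()`. These names are hypothetical and do not come from any package. Each helper computes expected counts from the margins. The sketch assumes all counts are positive, so that \(O \log(O/E)\) is defined.

```r
# Counts re-entered from the document (hypothetical object name `tab`)
tab <- matrix(c(495, 272, 590,
                330, 265, 498), nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Partai = c("Democrat", "Republican", "Independent")))

# Expected counts under independence, from a table's margins
expected <- function(m) outer(rowSums(m), colSums(m)) / sum(m)

# Pearson statistic X^2 = sum((O - E)^2 / E)
x2 <- function(m) { E <- expected(m); sum((m - E)^2 / E) }

# Likelihood-ratio statistic G^2 = 2 * sum(O * log(O / E)); requires O > 0
g2 <- function(m) { E <- expected(m); 2 * sum(m * log(m / E)) }

part1 <- tab[, c("Democrat", "Republican")]                           # Dem vs Rep
part2 <- cbind("Dem+Rep"   = tab[, "Democrat"] + tab[, "Republican"], # collapsed
               Independent = tab[, "Independent"])

c(X2_total = x2(tab), X2_sum = x2(part1) + x2(part2))   # close, but not identical
c(G2_total = g2(tab), G2_sum = g2(part1) + g2(part2))   # equal up to floating point
```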

3.5.1 Partition 1: Democrat vs Republican

Sub-table containing only the Democrat and Republican columns (\(n = 1362\)):

	Democrat	Republican	Total
Female	495	272	767
Male	330	265	595
Total	825	537	1362

Expected frequencies:

\[E_{F,D}^{(1)} = \frac{767 \times 825}{1362} = \frac{632\,775}{1362} = 464{,}59 \qquad E_{F,R}^{(1)} = \frac{767 \times 537}{1362} = 302{,}41\]

\[E_{M,D}^{(1)} = \frac{595 \times 825}{1362} = 360{,}41 \qquad E_{M,R}^{(1)} = \frac{595 \times 537}{1362} = 234{,}59\]

Chi-square statistic:

\[\chi^2_1 = \frac{(495-464{,}59)^2}{464{,}59} + \frac{(272-302{,}41)^2}{302{,}41} + \frac{(330-360{,}41)^2}{360{,}41} + \frac{(265-234{,}59)^2}{234{,}59}\]

\[= \frac{(30{,}41)^2}{464{,}59} + \frac{(-30{,}41)^2}{302{,}41} + \frac{(-30{,}41)^2}{360{,}41} + \frac{(30{,}41)^2}{234{,}59}\]

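\[= \frac{924{,}77}{464{,}59} + \frac{924{,}77}{302{,}41} + \frac{924{,}77}{360{,}41} + \frac{924{,}77}{234{,}59} = 1{,}9905 + 3{,}0580 + 2{,}5659 + 3{,}9421 = 11{,}5565\]

The small gap from the exact value 11.5545 reported by R comes from rounding \(E\) and \(O-E\) to two decimals. The exact deviation is \(|O-E| = 30{,}4075\).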

sub1    <- tabel2[, c("Democrat","Republican")]
cs_sub1 <- chisq.test(sub1, correct=FALSE)
cat("Sub-table for Partition 1:\n"); print(addmargins(sub1))

## Sub-table for Partition 1:

##         Partai
## Gender   Democrat Republican  Sum
##   Female      495        272  767
##   Male        330        265  595
##   Sum         825        537 1362

cat("\nChi-square:", round(cs_sub1$statistic,4),
    "| df:", cs_sub1$parameter,
    "| p-value:", format(cs_sub1$p.value, scientific=TRUE), "\n")

## 
## Chi-square: 11.5545 | df: 1 | p-value: 6.758479e-04
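`chisq.test()` hides the expected-frequency step. The sketch below redoes the hand calculation for Partition 1 by applying `outer()` to the margins. It uses a hypothetical matrix `sub1_manual`, re-entered from the sub-table above. Because it keeps full precision, it reproduces 11.5545 rather than the rounded hand calculation.

```r
# Partition 1 sub-table re-entered from the document (hypothetical name `sub1_manual`)
sub1_manual <- matrix(c(495, 272,
                        330, 265), nrow = 2, byrow = TRUE,
                      dimnames = list(Gender = c("Female", "Male"),
                                      Partai = c("Democrat", "Republican")))

# Expected counts: (row total x column total) / n, with n = 1362
E1 <- outer(rowSums(sub1_manual), colSums(sub1_manual)) / sum(sub1_manual)
round(E1, 4)   # Female: 464.5925, 302.4075; Male: 360.4075, 234.5925

# In a 2x2 table every |O - E| is the same (30.4075 here)
round(sub1_manual - E1, 4)

# Pearson statistic without continuity correction, as in chisq.test(correct = FALSE)
X2_1 <- sum((sub1_manual - E1)^2 / E1)
round(X2_1, 4)
```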

3.5.2 Partition 2: (Democrat + Republican) vs Independent

	Dem+Rep	Independent	Total
Female	767	590	1357
Male	595	498	1093
Total	1362	1088	2450

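Expected frequencies (row total × column total / \(n\), with \(n = 2450\)):

\[E_{F,DR}^{(2)} = \frac{1357 \times 1362}{2450} = \frac{1\,848\,234}{2450} = 754{,}381 \qquad E_{F,I}^{(2)} = \frac{1357 \times 1088}{2450} = 602{,}619\]

\[E_{M,DR}^{(2)} = \frac{1093 \times 1362}{2450} = 607{,}619 \qquad E_{M,I}^{(2)} = \frac{1093 \times 1088}{2450} = 485{,}381\]

These equal the sums of the corresponding expected frequencies in the full table (for example, \(456{,}949 + 297{,}432 = 754{,}381\)).

Chi-square statistic:

\[\chi^2_2 = \frac{(767-754{,}381)^2}{754{,}381} + \frac{(590-602{,}619)^2}{602{,}619} + \frac{(595-607{,}619)^2}{607{,}619} + \frac{(498-485{,}381)^2}{485{,}381}\]

\[= \frac{(12{,}619)^2}{754{,}381} + \frac{(-12{,}619)^2}{602{,}619} + \frac{(-12{,}619)^2}{607{,}619} + \frac{(12{,}619)^2}{485{,}381}\]

\[= 0{,}211 + 0{,}264 + 0{,}262 + 0{,}328 = 1{,}065\]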

tabel2_p2 <- cbind(
  "Dem+Rep"     = tabel2[,"Democrat"] + tabel2[,"Republican"],
  "Independent" = tabel2[,"Independent"]
)
cs_sub2 <- chisq.test(tabel2_p2, correct=FALSE)
cat("Sub-table for Partition 2:\n"); print(addmargins(tabel2_p2))

## Sub-table for Partition 2:

##        Dem+Rep Independent  Sum
## Female     767         590 1357
## Male       595         498 1093
## Sum       1362        1088 2450

cat("\nChi-square:", round(cs_sub2$statistic,4),
    "| df:", cs_sub2$parameter,
    "| p-value:", format(cs_sub2$p.value, scientific=TRUE), "\n")

## 
## Chi-square: 1.0654 | df: 1 | p-value: 3.01979e-01
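The Partition 2 hand calculation has a useful check. Collapsing columns does not change the row totals, so the expected counts of the collapsed table are simply sums of the expected counts in the full 2×3 table. For example, Female, Dem+Rep gives \(456{,}949 + 297{,}432 = 754{,}381\). A minimal sketch, assuming the counts are re-entered as a hypothetical matrix `tab`:

```r
# Counts re-entered from the document (hypothetical object name `tab`)
tab <- matrix(c(495, 272, 590,
                330, 265, 498), nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Partai = c("Democrat", "Republican", "Independent")))
E_full <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Collapse Democrat + Republican into a single column
sub2 <- cbind("Dem+Rep"   = tab[, "Democrat"] + tab[, "Republican"],
              Independent = tab[, "Independent"])
E2_sub <- outer(rowSums(sub2), colSums(sub2)) / sum(sub2)

# Expected counts of the collapsed column equal the summed expected counts of the full table
all.equal(E2_sub[, "Dem+Rep"], E_full[, "Democrat"] + E_full[, "Republican"])

round(E2_sub, 3)   # Female: 754.381, 602.619; Male: 607.619, 485.381
X2_2 <- sum((sub2 - E2_sub)^2 / E2_sub)
round(X2_2, 4)
```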

3.6 Comparing the Partitions with the Overall Chi-Square

chi_sum <- cs_sub1$statistic + cs_sub2$statistic

# p-values of the overall test and the two partitions (alpha = 0.05)
p_tests <- c(cs2$p.value, cs_sub1$p.value, cs_sub2$p.value)

partisi_df <- data.frame(
  "Test" = c("Overall Chi-Square (2x3)",
             "Partition 1: Dem vs Rep (df=1)",
             "Partition 2: (Dem+Rep) vs Ind (df=1)",
             "Sum of Partitions (df=2)"),
  "Chi-Square" = c(round(cs2$statistic,4),
                   round(cs_sub1$statistic,4),
                   round(cs_sub2$statistic,4),
                   round(chi_sum,4)),
  "df"         = c(2,1,1,2),
  "p-value"    = c(format(cs2$p.value,  scientific=TRUE, digits=3),
                   format(cs_sub1$p.value, scientific=TRUE, digits=3),
                   format(cs_sub2$p.value, scientific=TRUE, digits=3),
                   format(pchisq(chi_sum,2,lower.tail=FALSE), scientific=TRUE, digits=3)),
  # Decision derived from each p-value instead of hard-coded; the sum row is not a separate test
  "Decision"   = c(ifelse(p_tests < 0.05, "Reject H0", "Fail to reject H0"), "-"),
  check.names = FALSE
)

kable(partisi_df,
      caption = "Table 8. Overall Chi-Square vs Partitions",
      align   = "c") |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position="center") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  row_spec(4, bold=TRUE, background="#dce8f5")

Table 8. Overall Chi-Square vs Partitions
Test	Chi-Square	df	p-value	Decision
Overall Chi-Square (2x3)	12.5693	2	1.86e-03	Reject H0
Partition 1: Dem vs Rep (df=1)	11.5545	1	6.76e-04	Reject H0
Partition 2: (Dem+Rep) vs Ind (df=1)	1.0654	1	3.02e-01	Fail to reject H0
Sum of Partitions (df=2)	12.6200	2	1.82e-03	-

cat("Additivity:", round(cs_sub1$statistic,4), "+", round(cs_sub2$statistic,4),
    "=", round(chi_sum,4), "~=", round(cs2$statistic,4), "\n")

## Additivity: 11.5545 + 1.0654 = 12.62 ~= 12.5693

Discussion:

- Partition 1 (Dem vs Rep): \(\chi^2 = 11.5545\), \(p \approx 6.76 \times 10^{-4}\). This is highly significant: the gender difference is clearest in the choice between Democrat and Republican.
- Partition 2 ((Dem+Rep) vs Ind): \(\chi^2 = 1.0654\), \(p \approx 0.302\). This is not significant: gender does not meaningfully separate supporters of the two major parties from Independents.
- The sum of the partition statistics (\(12.62\)) is close to the overall \(\chi^2\) (\(12.5693\)). This illustrates the approximate additivity of Pearson's statistic; exact additivity holds for \(G^2\).

3.7 Categories Contributing Most to the Association

# Contribution of each cell: (O - E)^2 / E, as a percentage of the total chi-square
kontrib        <- (tabel2 - E2)^2 / E2
persen_kontrib <- kontrib / sum(kontrib) * 100

kont_df <- as.data.frame(as.table(round(persen_kontrib,2)))
colnames(kont_df) <- c("Gender","Party","Contribution (%)")
kont_df <- kont_df[order(-kont_df[,"Contribution (%)"]),]
kont_df[,"Rank"] <- 1:nrow(kont_df)

kable(kont_df,
      caption = "Table 9. Contribution of Each Cell to the Total Chi-Square (%)",
      align="c", row.names=FALSE) |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width=FALSE, position="center") |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  row_spec(1:2, bold=TRUE, background="#fff3b0")

Table 9. Contribution of Each Cell to the Total Chi-Square (%)
Gender	Party	Contribution (%)	Rank
Male	Democrat	31.30	1
Female	Democrat	25.21	2
Male	Republican	21.48	3
Female	Republican	17.30	4
Male	Independent	2.61	5
Female	Independent	2.10	6
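Each cell's contribution \((O-E)^2/E\) is exactly its squared Pearson residual. The percentages in Table 9 can therefore also be obtained from `chisq.test()$residuals`. A sketch with the counts re-entered as a hypothetical matrix `tab`:

```r
# Counts re-entered from the document (hypothetical object name `tab`)
tab <- matrix(c(495, 272, 590,
                330, 265, 498), nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Partai = c("Democrat", "Republican", "Independent")))
cs <- chisq.test(tab)
X2 <- unname(cs$statistic)

# (O - E)^2 / E equals the squared Pearson residual of each cell
contrib <- cs$residuals^2
pct     <- 100 * contrib / X2

round(pct, 2)            # per-cell share of the total chi-square (%)
round(colSums(pct), 2)   # share per party; the two Democrat cells together give about 56.5%
```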

kont_all        <- as.data.frame(as.table(round(persen_kontrib,2)))
colnames(kont_all) <- c("Gender","Partai","Kontribusi")
kont_all$Sel    <- paste(kont_all$Gender, kont_all$Partai, sep="\n")
kont_all$Partai <- factor(kont_all$Partai,
                           levels=c("Democrat","Republican","Independent"))

ggplot(kont_all, aes(x=reorder(Sel,-Kontribusi), y=Kontribusi, fill=Partai)) +
  geom_col(alpha=0.9, color="white", linewidth=0.5) +
  geom_text(aes(label=paste0(round(Kontribusi,1),"%")),
            vjust=-0.4, size=4.2, fontface="bold") +
  scale_fill_manual(values=c("Democrat"    = "#2166ac",
                             "Republican"  = "#d6604d",
                             "Independent" = "#1a9850")) +
  scale_y_continuous(limits=c(0,35)) +
  labs(title="Contribution of Each Cell to the Total Chi-Square",
       x="Cell (Gender x Party)", y="Contribution (%)", fill="Party") +
  theme_minimal(base_size=13) +
  theme(plot.title          = element_text(face="bold", size=14),
        panel.grid.major.x  = element_blank())

Figure 4. Contribution of Each Cell to the Total Chi-Square

3.8 Additional Visualizations for Case 2

mosaic(tabel2,
       shade    = TRUE,
       legend   = TRUE,
       main     = "Mosaic Plot: Gender x Political Party Identification",
       labeling = labeling_border(rot_labels=c(0,0,0,0)),
       gp       = shading_hcl)

Figure 5. Mosaic Plot: Gender vs Political Party

prop2         <- as.data.frame(prop.table(tabel2, margin=1))
colnames(prop2) <- c("Gender","Partai","Proporsi")
prop2$Partai  <- factor(prop2$Partai,
                        levels=c("Democrat","Republican","Independent"))

ggplot(prop2, aes(x=Gender, y=Proporsi, fill=Partai)) +
  geom_col(position="fill", alpha=0.9, width=0.55,
           color="white", linewidth=0.5) +
  geom_text(aes(label=paste0(round(Proporsi*100,1),"%")),
            position=position_fill(vjust=0.5),
            color="white", fontface="bold", size=5) +
  scale_fill_manual(values=c("Democrat"    = "#2166ac",
                             "Republican"  = "#d6604d",
                             "Independent" = "#1a9850")) +
  scale_y_continuous(labels=percent_format()) +
  labs(title    = "Party Identification by Gender",
       subtitle = "Row proportions (row percentage)",
       x=NULL, y="Proportion", fill="Political Party") +
  theme_minimal(base_size=13) +
  theme(plot.title         = element_text(face="bold", size=14),
        plot.subtitle      = element_text(color="grey40"),
        panel.grid.major.x = element_blank())

Figure 6. Distribution of Party Identification by Gender
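The row percentages in Figure 6 can be tabulated directly with `prop.table(..., margin = 1)`. The difference between the female and male rows gives the gender gap for each party in percentage points. A sketch, assuming the counts are re-entered as a hypothetical matrix `tab`:

```r
# Counts re-entered from the document (hypothetical object name `tab`)
tab <- matrix(c(495, 272, 590,
                330, 265, 498), nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Partai = c("Democrat", "Republican", "Independent")))

row_prop <- prop.table(tab, margin = 1)   # each row sums to 1
round(100 * row_prop, 1)

# Female share minus male share, in percentage points
gap <- 100 * (row_prop["Female", ] - row_prop["Male", ])
round(gap, 1)   # Democrat 6.3, Republican -4.2, Independent -2.1
```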

3.9 Conclusions for Case 2

The overall chi-square test (\(\chi^2 = 12.5693\), \(df=2\), \(p < 0{,}05\)) indicates a significant association between gender and party identification.
All expected frequencies are \(\geq 5\) (minimum = 239.6), so the chi-square approximation is valid.
The standardized residuals flag four significant cells. These are Female-Democrat (\(r^{\text{std}} = 3.272\)) and Male-Democrat (\(r^{\text{std}} = -3.272\)), plus Female-Republican (\(r^{\text{std}} = -2.499\)) and Male-Republican (\(r^{\text{std}} = 2.499\)); the Independent cells are not significant.
Chi-square partitioning shows that the gender difference is concentrated in Democrat vs Republican (\(\chi^2 = 11.5545\), highly significant). The comparison of the two major parties against Independent is not significant.
Substantive conclusion: the Democrat category contributes most to the association, with its two cells accounting for about 56.5% of the total chi-square. Women are more likely than men to identify as Democrat. Although statistically significant, the association is weak in magnitude (Cramér's \(V = 0{,}072\)).

4 General Conclusions

kesimpulan_df <- data.frame(
  "Case"         = c("Case 1 (2x2)","Case 2 (2x3)"),
  "Variables"    = c("Smoking – Lung Cancer","Gender – Political Party"),
  "Association"  = c(paste0("RD=",round(RD,3),"; RR=",round(RR,3),"; OR=",round(OR,3)),
                     paste0("V=",round(cramer_V,3))),
  # unname() drops the "X-squared" name, which otherwise becomes a stray row name
  "Chi-Square"   = unname(c(round(chi2_stat,3), round(cs2$statistic,3))),
  "p-value"      = c(format(p_chi, scientific=TRUE, digits=2),
                     format(cs2$p.value, scientific=TRUE, digits=2)),
  "Decision"     = c("Reject H0","Reject H0"),
  "Main Finding" = c("OR=2.97; strong & significant association",
                     "Democrat category separates the genders most"),
  check.names = FALSE
)

kable(kesimpulan_df,
      caption   = "Table 10. Summary of Conclusions for Both Cases",
      align     = "c",
      row.names = FALSE) |>
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = TRUE) |>
  row_spec(0, bold=TRUE, color="white", background="#2166ac") |>
  # without a row-name column, column 6 is the Decision column
  column_spec(6, bold=TRUE, color="#1a7a1a")

Table 10. Summary of Conclusions for Both Cases
Case	Variables	Association	Chi-Square	p-value	Decision	Main Finding
Case 1 (2x2)	Smoking – Lung Cancer	RD=0.252; RR=1.959; OR=2.974	19.129	1.2e-05	Reject H0	OR=2.97; strong & significant association
Case 2 (2x3)	Gender – Political Party	V=0.072	12.569	1.9e-03	Reject H0	Democrat category separates the genders most
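The \(V = 0{,}072\) reported for Case 2 follows from the overall chi-square via \(V = \sqrt{\chi^2 / (n \cdot (\min(r, c) - 1))}\). Below is a minimal base-R sketch, assuming the counts are re-entered as a hypothetical matrix `tab`. The document's own `cramer_V` may have been computed another way, so treat this only as a cross-check.

```r
# Counts re-entered from the document (hypothetical object name `tab`)
tab <- matrix(c(495, 272, 590,
                330, 265, 498), nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Partai = c("Democrat", "Republican", "Independent")))

X2 <- unname(chisq.test(tab)$statistic)   # Pearson chi-square of the full table
n  <- sum(tab)                            # 2450
k  <- min(dim(tab)) - 1                   # min(r, c) - 1 = 1 for a 2x3 table
V  <- sqrt(X2 / (n * k))
round(V, 3)
```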

Both cases show the value of a comprehensive inferential analysis. Significance tests alone are not enough; the analysis also needs estimated association measures with their confidence intervals and an examination of cell contributions through residuals. This combined approach gives a fuller, more substantive picture of the relationship between categorical variables.

5 References

Agresti, A. (2013). Categorical Data Analysis (3rd ed.). John Wiley & Sons.
Fleiss, J. L., Levin, B., & Paik, M. C. (2003). Statistical Methods for Rates and Proportions (3rd ed.). John Wiley & Sons.
Stokes, M. E., Davis, C. S., & Koch, G. G. (2012). Categorical Data Analysis Using SAS (3rd ed.). SAS Institute Inc.
R Core Team. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Zar, J. H. (2010). Biostatistical Analysis (5th ed.). Pearson Prentice Hall.

Assignment 6: Inference for Two-Way Contingency Tables

Gregorius Adiyatma Pradana/140610240065

2026-04-09
