1 Introduction

This report covers statistical inference for two-way contingency tables of categorical data. The analysis includes point estimates of proportions, confidence intervals, and several hypothesis tests: the two-proportion test, the Pearson chi-square test, the likelihood ratio test (\(G^2\)), and Fisher's exact test.

Two cases are analyzed:

Case 1: a 2×2 table relating smoking habit to lung cancer.
Case 2: a 2×3 table relating gender to political party identification.

2 Case 1: 2×2 Contingency Table

2.1 Data and Contingency Table

The data describe the relationship between smoking habit and the occurrence of lung cancer.

# Build the contingency table (rows: smoking status, columns: diagnosis)
tabel1 <- matrix(
  c(688, 650, 21, 59),
  nrow  = 2,
  byrow = TRUE,
  dimnames = list(
    "Status Merokok" = c("Smoker", "Non-Smoker"),
    "Diagnosis"      = c("Cancer (+)", "Control (-)")
  )
)

# Display the table with marginal totals
addmargins(tabel1)

              Diagnosis
Status Merokok Cancer (+) Control (-)  Sum
    Smoker            688         650 1338
    Non-Smoker         21          59   80
    Sum               709         709 1418

library(knitr)
library(kableExtra)

df1 <- data.frame(
  "Status Merokok" = c("Smoker", "Non-Smoker", "Total"),
  "Cancer (+)"     = c(688, 21, 709),
  "Control (-)"    = c(650, 59, 709),
  "Total"          = c(1338, 80, 1418),
  check.names = FALSE
)

kable(df1, align = "c", caption = "2×2 Contingency Table: Smoking and Lung Cancer") |>
  kable_styling(
    bootstrap_options = c("striped", "bordered", "hover"),
    full_width        = TRUE
  ) |>
  row_spec(0, bold = TRUE, background = "#000000", color = "white") |>
  row_spec(3, bold = TRUE, background = "#d0d0d0")

2×2 Contingency Table: Smoking and Lung Cancer
Status Merokok	Cancer (+)	Control (-)	Total
Smoker	688	650	1338
Non-Smoker	21	59	80
Total	709	709	1418

2.2 Point Estimates of Proportions

\[\hat{p}_{\text{Smoker}} = \frac{688}{1338}, \quad \hat{p}_{\text{Non-Smoker}} = \frac{21}{80}\]

n1 <- 1338; x1 <- 688  # Smoker
n2 <- 80;   x2 <- 21   # Non-Smoker

p1 <- x1 / n1
p2 <- x2 / n2

cat("Proportion, Smoker     :", round(p1, 4), "\n")

Proportion, Smoker     : 0.5142

cat("Proportion, Non-Smoker :", round(p2, 4), "\n")

Proportion, Non-Smoker : 0.2625

2.3 95% Confidence Intervals

2.3.1 Proportion in Each Group

\[\hat{p} \pm z_{0.025} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

z <- qnorm(0.975)

# CI Smoker
ci1_low <- p1 - z * sqrt(p1*(1-p1)/n1)
ci1_up  <- p1 + z * sqrt(p1*(1-p1)/n1)

# CI Non-Smoker
ci2_low <- p2 - z * sqrt(p2*(1-p2)/n2)
ci2_up  <- p2 + z * sqrt(p2*(1-p2)/n2)

cat("95% CI Smoker     : [", round(ci1_low,4), ",", round(ci1_up,4), "]\n")

95% CI Smoker     : [ 0.4874 , 0.541 ]

cat("95% CI Non-Smoker : [", round(ci2_low,4), ",", round(ci2_up,4), "]\n")

95% CI Non-Smoker : [ 0.1661 , 0.3589 ]
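The Wald interval above can be unreliable when \(n\) is small or \(\hat{p}\) is far from 0.5, which is relevant for the non-smoker group (\(n = 80\)). The following is a minimal sketch comparing it with the Wilson score interval, which `prop.test()` returns for a single proportion when `correct = FALSE`. The helper names `wald_ci` and `wilson_ci` are illustrative and not part of the original analysis.

```r
# Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
wald_ci <- function(x, n, conf = 0.95) {
  p    <- x / n
  z    <- qnorm(1 - (1 - conf) / 2)
  half <- z * sqrt(p * (1 - p) / n)
  c(lower = p - half, upper = p + half)
}

# Wilson score interval, taken from prop.test() without continuity correction
wilson_ci <- function(x, n, conf = 0.95) {
  ci <- prop.test(x, n, conf.level = conf, correct = FALSE)$conf.int
  c(lower = ci[1], upper = ci[2])
}

# Counts from the 2x2 table: 688 of 1338 smokers, 21 of 80 non-smokers
ci_wald   <- rbind(smoker = wald_ci(688, 1338),   non_smoker = wald_ci(21, 80))
ci_wilson <- rbind(smoker = wilson_ci(688, 1338), non_smoker = wilson_ci(21, 80))

print(round(ci_wald, 4))
print(round(ci_wilson, 4))
```

With a sample as large as the smoker group the two intervals should nearly coincide; the Wilson interval for the non-smoker group is pulled slightly toward 0.5.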

2.3.2 Risk Difference (RD)

\[\text{RD} = \hat{p}_1 - \hat{p}_2, \quad \text{SE}(\text{RD}) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]

RD    <- p1 - p2
SE_RD <- sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
ci_RD <- c(RD - z*SE_RD, RD + z*SE_RD)

cat("Risk Difference (RD) :", round(RD, 4), "\n")

Risk Difference (RD) : 0.2517

cat("95% CI RD            : [", round(ci_RD[1],4), ",", round(ci_RD[2],4), "]\n")

95% CI RD            : [ 0.1516 , 0.3518 ]

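Interpretation: The lung cancer risk among smokers is higher than among non-smokers by about 0.2517, that is, roughly 25 percentage points. The 95% CI [0.1516, 0.3518] excludes 0, so the difference is statistically significant.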

2.3.3 Relative Risk (RR)

\[\text{RR} = \frac{\hat{p}_1}{\hat{p}_2}, \quad \text{SE}(\ln\text{RR}) = \sqrt{\frac{1-\hat{p}_1}{x_1} + \frac{1-\hat{p}_2}{x_2}}\]

RR     <- p1 / p2
lnRR   <- log(RR)
SE_lnRR <- sqrt((1-p1)/x1 + (1-p2)/x2)
ci_RR  <- exp(c(lnRR - z*SE_lnRR, lnRR + z*SE_lnRR))

cat("Relative Risk (RR) :", round(RR, 4), "\n")

Relative Risk (RR) : 1.9589

cat("95% CI RR          : [", round(ci_RR[1],4), ",", round(ci_RR[2],4), "]\n")

95% CI RR          : [ 1.3517 , 2.8387 ]

Interpretation: The risk of lung cancer among smokers is 1.96 times the risk among non-smokers. The 95% CI [1.3517, 2.8387] does not include 1, so the association is statistically significant.

2.3.4 Odds Ratio (OR)

\[\text{OR} = \frac{x_1(n_2 - x_2)}{x_2(n_1 - x_1)}, \quad \text{SE}(\ln\text{OR}) = \sqrt{\frac{1}{x_1} + \frac{1}{n_1-x_1} + \frac{1}{x_2} + \frac{1}{n_2-x_2}}\]

OR     <- (x1*(n2-x2)) / (x2*(n1-x1))
lnOR   <- log(OR)
SE_lnOR <- sqrt(1/x1 + 1/(n1-x1) + 1/x2 + 1/(n2-x2))
ci_OR  <- exp(c(lnOR - z*SE_lnOR, lnOR + z*SE_lnOR))

cat("Odds Ratio (OR) :", round(OR, 4), "\n")

Odds Ratio (OR) : 2.9738

cat("95% CI OR       : [", round(ci_OR[1],4), ",", round(ci_OR[2],4), "]\n")

95% CI OR       : [ 1.7867 , 4.9494 ]

Interpretation: The odds of lung cancer among smokers are 2.97 times the odds among non-smokers. The 95% CI [1.7867, 4.9494] excludes 1, so the effect is sizeable and statistically significant.
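The three measures above share the same inputs, so they can be gathered into one helper. The sketch below assumes a 2×2 matrix whose first column holds the event of interest; the function name `assoc_2x2` is hypothetical, and the formulas are the Wald-type ones given in Sections 2.3.2 to 2.3.4.

```r
# Risk difference, relative risk and odds ratio with Wald-type confidence intervals
assoc_2x2 <- function(tab, conf = 0.95) {
  stopifnot(all(dim(tab) == c(2, 2)))
  z  <- qnorm(1 - (1 - conf) / 2)
  x1 <- tab[1, 1]; n1 <- sum(tab[1, ])
  x2 <- tab[2, 1]; n2 <- sum(tab[2, ])
  p1 <- x1 / n1;   p2 <- x2 / n2

  # Risk difference on the original scale
  rd    <- p1 - p2
  se_rd <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

  # Relative risk and odds ratio on the log scale
  ln_rr    <- log(p1 / p2)
  se_ln_rr <- sqrt((1 - p1) / x1 + (1 - p2) / x2)
  ln_or    <- log((x1 * (n2 - x2)) / (x2 * (n1 - x1)))
  se_ln_or <- sqrt(1 / x1 + 1 / (n1 - x1) + 1 / x2 + 1 / (n2 - x2))

  data.frame(
    measure  = c("RD", "RR", "OR"),
    estimate = c(rd, exp(ln_rr), exp(ln_or)),
    lower    = c(rd - z * se_rd, exp(ln_rr - z * se_ln_rr), exp(ln_or - z * se_ln_or)),
    upper    = c(rd + z * se_rd, exp(ln_rr + z * se_ln_rr), exp(ln_or + z * se_ln_or))
  )
}

tab <- matrix(c(688, 650, 21, 59), nrow = 2, byrow = TRUE)
res <- assoc_2x2(tab)
print(res, digits = 4)
```

Working on the log scale for RR and OR keeps the back-transformed limits positive, which is why those two intervals are not symmetric around the estimate.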

2.4 Hypothesis Tests

2.4.1 Two-Proportion Test

\[H_0: p_1 = p_2 \quad \text{vs} \quad H_1: p_1 \neq p_2\]

\[Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}, \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\]

uji_prop <- prop.test(c(x1, x2), c(n1, n2), correct = FALSE)
print(uji_prop)


    2-sample test for equality of proportions without continuity correction

data:  c(x1, x2) out of c(n1, n2)
X-squared = 19.129, df = 1, p-value = 1.222e-05
alternative hypothesis: two.sided
95 percent confidence interval:
 0.1516343 0.3517663
sample estimates:
   prop 1    prop 2 
0.5142003 0.2625000
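`prop.test()` reports `X-squared` rather than \(Z\). As a rough check, the sketch below computes \(Z\) by hand from the pooled proportion and confirms that \(Z^2\) matches the reported statistic. The variable names `p_pool`, `se0`, `z_stat` and `p_val` are introduced here for illustration only.

```r
x1 <- 688; n1 <- 1338   # smokers: cancer cases, group size
x2 <- 21;  n2 <- 80     # non-smokers: cancer cases, group size

p1 <- x1 / n1
p2 <- x2 / n2

# Pooled proportion under H0: p1 = p2
p_pool <- (x1 + x2) / (n1 + n2)

# Standard error under H0 and the resulting Z statistic
se0    <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_stat <- (p1 - p2) / se0

# Two-sided p-value from the standard normal distribution
p_val <- 2 * pnorm(-abs(z_stat))

# Reference result from prop.test() without continuity correction
pt <- prop.test(c(x1, x2), c(n1, n2), correct = FALSE)

cat("Z =", round(z_stat, 4), " Z^2 =", round(z_stat^2, 4), "\n")
cat("p-value =", signif(p_val, 4), "\n")
```

The confidence interval printed by `prop.test()` uses the unpooled standard error, so it coincides with the risk difference interval from Section 2.3.2 rather than with the pooled standard error used for the test.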

2.4.2 Chi-Square Test of Independence

\[H_0: \text{Smoking and lung cancer are independent}\]

\[\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\]

uji_chi <- chisq.test(tabel1, correct = FALSE)
print(uji_chi)


    Pearson's Chi-squared test

data:  tabel1
X-squared = 19.129, df = 1, p-value = 1.222e-05

cat("Expected counts:\n")

Expected counts:

print(uji_chi$expected)

              Diagnosis
Status Merokok Cancer (+) Control (-)
    Smoker            669         669
    Non-Smoker         40          40
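The expected counts and the statistic can also be reproduced directly from the margins, which makes the formula \(\chi^2 = \sum (O_{ij} - E_{ij})^2 / E_{ij}\) concrete. This is a minimal sketch; the short dimnames and the names `expected`, `chi2` and `p_val` are illustrative.

```r
tab <- matrix(
  c(688, 650, 21, 59),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Smoker", "Non-Smoker"), c("Cancer", "Control"))
)

N <- sum(tab)

# E_ij = (row total i) * (column total j) / N
expected <- outer(rowSums(tab), colSums(tab)) / N

# Pearson chi-square statistic and its p-value on (r - 1)(c - 1) degrees of freedom
chi2  <- sum((tab - expected)^2 / expected)
df    <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_val <- pchisq(chi2, df = df, lower.tail = FALSE)

print(expected)
cat("chi-square =", round(chi2, 4), " df =", df, " p-value =", signif(p_val, 4), "\n")
```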

2.4.3 Likelihood Ratio Test (G²)

\[G^2 = 2\sum O_{ij} \ln\left(\frac{O_{ij}}{E_{ij}}\right)\]

library(DescTools)
G2_stat <- GTest(tabel1)
print(G2_stat)


    Log likelihood ratio (G-test) test of independence without correction

data:  tabel1
G = 19.878, X-squared df = 1, p-value = 8.254e-06
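For readers without the DescTools package, \(G^2\) can be computed in base R from the formula above. The sketch below assumes every cell is non-zero, since a zero count would make `log()` return `-Inf`; the names `expected`, `g2` and `p_val` are illustrative.

```r
tab <- matrix(c(688, 650, 21, 59), nrow = 2, byrow = TRUE)

# Expected counts under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# G^2 = 2 * sum(O * log(O / E)); valid as written only when all cells are > 0
g2    <- 2 * sum(tab * log(tab / expected))
df    <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_val <- pchisq(g2, df = df, lower.tail = FALSE)

cat("G^2 =", round(g2, 3), " df =", df, " p-value =", signif(p_val, 4), "\n")
```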

2.4.4 Fisher Exact Test

uji_fisher <- fisher.test(tabel1)
print(uji_fisher)


    Fisher's Exact Test for Count Data

data:  tabel1
p-value = 1.476e-05
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1.755611 5.210711
sample estimates:
odds ratio 
  2.971634
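Fisher's exact test conditions on both margins, so under \(H_0\) the count in the first cell follows a hypergeometric distribution. The sketch below recomputes the p-values from `phyper()` and `dhyper()`. The two-sided rule used here (summing the probabilities of all tables no more likely than the observed one, with a small relative tolerance) is the commonly described convention and is stated as an assumption about how `fisher.test()` arrives at its result.

```r
tab <- matrix(c(688, 650, 21, 59), nrow = 2, byrow = TRUE)

x <- tab[1, 1]       # observed count: smokers with cancer
m <- sum(tab[, 1])   # column 1 total (cancer cases)
n <- sum(tab[, 2])   # column 2 total (controls)
k <- sum(tab[1, ])   # row 1 total (smokers)

# One-sided p-value: P(X >= x) under the hypergeometric null distribution
p_greater <- phyper(x - 1, m, n, k, lower.tail = FALSE)

# Two-sided p-value: probability of all tables no more likely than the observed one
support <- max(0, k - n):min(k, m)
probs   <- dhyper(support, m, n, k)
p_two   <- sum(probs[probs <= dhyper(x, m, n, k) * (1 + 1e-7)])

cat("one-sided p =", signif(p_greater, 4), "\n")
cat("two-sided p =", signif(p_two, 4), "\n")
```

The odds ratio reported by `fisher.test()` (2.9716) is a conditional maximum likelihood estimate, which is why it differs slightly from the sample odds ratio of 2.9738 in Section 2.3.4.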

2.5 Comparison of Test Results

compare_df <- data.frame(
  "Test"           = c("Two-Proportion Test", "Chi-Square", "Likelihood Ratio (G²)", "Fisher Exact Test"),
  "Test Statistic" = c(
    # prop.test() reports X-squared = Z^2, so the square root recovers |Z|
    paste0("Z = ", round(sqrt(uji_prop$statistic), 4)),
    paste0("χ² = ", round(uji_chi$statistic, 4)),
    paste0("G² = ", round(G2_stat$statistic, 4)),
    "-"
  ),
  "p-value"   = c(
    formatC(uji_prop$p.value, format="e", digits=3),
    formatC(uji_chi$p.value,  format="e", digits=3),
    formatC(G2_stat$p.value,  format="e", digits=3),
    formatC(uji_fisher$p.value, format="e", digits=3)
  ),
  "df" = c(1, 1, 1, "-"),
  "Decision" = rep("Reject H₀", 4),
  check.names = FALSE
)

kable(compare_df, align = "c", caption = "Comparison of Hypothesis Test Results (α = 0.05)") |>
  kable_styling(bootstrap_options = c("striped","bordered","hover"), full_width=TRUE) |>
  row_spec(0, bold=TRUE, background="#000000", color="white")

Comparison of Hypothesis Test Results (α = 0.05)
Test	Test Statistic	p-value	df	Decision
Two-Proportion Test	Z = 4.3737	1.222e-05	1	Reject H₀
Chi-Square	χ² = 19.1292	1.222e-05	1	Reject H₀
Likelihood Ratio (G²)	G² = 19.878	8.254e-06	1	Reject H₀
Fisher Exact Test	-	1.476e-05	-	Reject H₀

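Comparative interpretation:
All four methods lead to the same decision: reject \(H_0\) at \(\alpha = 0.05\). Every p-value is below 0.0001, which is strong statistical evidence of an association. For a 2×2 table the two-proportion test and the Pearson chi-square test are equivalent (\(Z^2 = \chi^2\)), which is why their p-values are identical. Fisher's exact test is intended for tables with small cell counts; here all expected frequencies are large (the smallest is 40), so it agrees closely with the large-sample tests.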

2.6 Conclusion for Case 1

There is a significant association between smoking habit and lung cancer (p < 0.0001 in all tests). Smokers have 1.96 times the risk (RR) and 2.97 times the odds (OR) of lung cancer compared with non-smokers. The risk difference of 0.2517 indicates a substantial absolute difference. All three measures of association are statistically significant according to their 95% confidence intervals.

3 Case 2: 2×3 Contingency Table

3.1 Data and Contingency Table

The data describe the relationship between gender and political party identification.

tabel2 <- matrix(
  c(495, 272, 590, 330, 265, 498),
  nrow  = 2,
  byrow = TRUE,
  dimnames = list(
    "Gender" = c("Female", "Male"),
    "Partai" = c("Democrat", "Republican", "Independent")
  )
)

addmargins(tabel2)

        Partai
Gender   Democrat Republican Independent  Sum
  Female      495        272         590 1357
  Male        330        265         498 1093
  Sum         825        537        1088 2450

df2 <- data.frame(
  "Gender"      = c("Female", "Male", "Total"),
  "Democrat"    = c(495, 330, 825),
  "Republican"  = c(272, 265, 537),
  "Independent" = c(590, 498, 1088),
  "Total"       = c(1357, 1093, 2450),
  check.names   = FALSE
)

kable(df2, align="c", caption="2×3 Contingency Table: Gender and Political Party Identification") |>
  kable_styling(bootstrap_options=c("striped","bordered","hover"), full_width=TRUE) |>
  row_spec(0, bold=TRUE, background="#000000", color="white") |>
  row_spec(3, bold=TRUE, background="#d0d0d0")

2×3 Contingency Table: Gender and Political Party Identification
Gender	Democrat	Republican	Independent	Total
Female	495	272	590	1357
Male	330	265	498	1093
Total	825	537	1088	2450

3.2 Expected Frequencies

\[E_{ij} = \frac{(\text{Row Total}_i) \times (\text{Column Total}_j)}{N}\]

chi2_full <- chisq.test(tabel2, correct=FALSE)
cat("Expected frequencies (E_ij):\n")

Expected frequencies (E_ij):

print(round(chi2_full$expected, 2))

        Partai
Gender   Democrat Republican Independent
  Female   456.95     297.43      602.62
  Male     368.05     239.57      485.38

All expected frequencies are well above 5 (the smallest is 239.57), so the large-sample assumption of the chi-square test is satisfied.

3.3 Overall Chi-Square Test

\[H_0: \text{Gender and party identification are independent}\]

\[\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r-1)(c-1) = (2-1)(3-1) = 2\]

print(chi2_full)


    Pearson's Chi-squared test

data:  tabel2
X-squared = 12.569, df = 2, p-value = 0.001865
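The decision can be read either from the p-value or from the critical value of the chi-square distribution. The sketch below takes the reported statistic (12.569, rounded) as given and assumes \(\alpha = 0.05\); the names `chi2_stat`, `crit` and `p_val` are illustrative.

```r
chi2_stat <- 12.569            # overall statistic reported above (rounded)
df        <- (2 - 1) * (3 - 1) # (r - 1)(c - 1) for a 2x3 table
alpha     <- 0.05

# Critical value: reject H0 when the statistic exceeds it
crit <- qchisq(1 - alpha, df = df)

# Upper-tail p-value
p_val <- pchisq(chi2_stat, df = df, lower.tail = FALSE)

cat("critical value =", round(crit, 3), "\n")
cat("p-value        =", signif(p_val, 4), "\n")
cat("reject H0      =", chi2_stat > crit, "\n")
```

For \(df = 2\) the upper-tail probability has the closed form \(e^{-\chi^2/2}\), which gives a quick way to verify the p-value by hand.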

3.4 Pearson and Standardized Residuals

3.4.1 Pearson Residuals

\[r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}\]

3.4.2 Standardized Residuals

\[d_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}(1 - p_{i\cdot})(1 - p_{\cdot j})}}\]

# Pearson residuals
pearson_res <- chi2_full$residuals
cat("Pearson residuals:\n")

Pearson residuals:

print(round(pearson_res, 4))

        Partai
Gender   Democrat Republican Independent
  Female   1.7801    -1.4747     -0.5140
  Male    -1.9834     1.6431      0.5728

# Standardized residual
std_res <- chi2_full$stdres
cat("\nStandardized Residual:\n")


Standardized Residual:

print(round(std_res, 4))

        Partai
Gender   Democrat Republican Independent
  Female   3.2724    -2.4986     -1.0322
  Male    -3.2724     2.4986      1.0322

library(corrplot)

# Heatmap of the standardized residuals
corrplot(
  as.matrix(std_res),
  is.corr  = FALSE,
  method   = "color",
  col      = colorRampPalette(c("black","white","gray30"))(200),
  addCoef.col = "black",
  tl.col   = "black",
  tl.srt   = 0,
  cl.pos   = "r",
  title    = "Standardized Residual",
  mar      = c(0,0,2,0)
)

Standardized Residual per Cell

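Interpretation of residuals:
A standardized residual with \(|d_{ij}| > 2\) flags a cell that departs significantly from independence. The Democrat cells have the largest residuals (±3.27) and the Republican cells also exceed the threshold (±2.50), while the Independent cells do not (±1.03). The Female–Democrat and Male–Democrat cells show the strongest pattern, indicating that women are more likely than men to identify as Democrat.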
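Both residual formulas from Sections 3.4.1 and 3.4.2 can be applied by hand, which shows where the marginal proportions \(p_{i\cdot}\) and \(p_{\cdot j}\) enter. This is a minimal sketch; the object names `tab2`, `pearson`, `std` and `flagged` are illustrative and independent of the objects used earlier.

```r
tab2 <- matrix(
  c(495, 272, 590, 330, 265, 498),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Gender = c("Female", "Male"),
    Partai = c("Democrat", "Republican", "Independent")
  )
)

N     <- sum(tab2)
row_p <- rowSums(tab2) / N   # marginal row proportions p_i.
col_p <- colSums(tab2) / N   # marginal column proportions p_.j

# Expected counts under independence
expected <- N * outer(row_p, col_p)

# Pearson residuals: (O - E) / sqrt(E)
pearson <- (tab2 - expected) / sqrt(expected)

# Standardized residuals: (O - E) / sqrt(E * (1 - p_i.) * (1 - p_.j))
std <- (tab2 - expected) / sqrt(expected * outer(1 - row_p, 1 - col_p))

# Cells whose standardized residual exceeds 2 in absolute value
flagged <- abs(std) > 2

print(round(std, 4))
print(flagged)
```

In a table with only two rows, the standardized residuals in each column are equal in magnitude and opposite in sign, which is why the Female and Male rows mirror each other.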

3.5 Chi-Square Partitioning

3.5.1 Partition 1: Democrat vs Republican

tabel2a <- tabel2[, c("Democrat", "Republican")]
chi2a   <- chisq.test(tabel2a, correct=FALSE)
cat("Chi-Square (Democrat vs Republican):\n")

Chi-Square (Democrat vs Republican):

print(chi2a)


    Pearson's Chi-squared test

data:  tabel2a
X-squared = 11.555, df = 1, p-value = 0.0006758

3.5.2 Partition 2: (Democrat + Republican) vs Independent

tabel2b <- cbind(
  "Dem+Rep"     = tabel2[,"Democrat"] + tabel2[,"Republican"],
  "Independent" = tabel2[,"Independent"]
)
chi2b <- chisq.test(tabel2b, correct=FALSE)
cat("Chi-Square ((Democrat+Republican) vs Independent):\n")

Chi-Square ((Democrat+Republican) vs Independent):

print(chi2b)


    Pearson's Chi-squared test

data:  tabel2b
X-squared = 1.0654, df = 1, p-value = 0.302

3.5.3 Partition Summary

partisi_df <- data.frame(
  "Partition"     = c(
    "Democrat vs Republican",
    "(Democrat+Republican) vs Independent",
    "Overall"
  ),
  "χ²"   = c(
    round(chi2a$statistic, 4),
    round(chi2b$statistic, 4),
    round(chi2_full$statistic, 4)
  ),
  "df"      = c(1, 1, 2),
  "p-value" = c(
    formatC(chi2a$p.value, format="e", digits=3),
    formatC(chi2b$p.value, format="e", digits=3),
    formatC(chi2_full$p.value, format="e", digits=3)
  ),
  "Decision" = c("Reject H₀", "Fail to Reject H₀", "Reject H₀"),
  check.names = FALSE
)

kable(partisi_df, align="c", caption="Summary of Chi-Square Partitions") |>
  kable_styling(bootstrap_options=c("striped","bordered","hover"), full_width=TRUE) |>
  row_spec(0, bold=TRUE, background="#000000", color="white") |>
  row_spec(3, bold=TRUE, background="#d0d0d0")

Summary of Chi-Square Partitions
Partition	χ²	df	p-value	Decision
Democrat vs Republican	11.5545	1	6.758e-04	Reject H₀
(Democrat+Republican) vs Independent	1.0654	1	3.020e-01	Fail to Reject H₀
Overall	12.5693	2	1.865e-03	Reject H₀

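Note on additivity: \(\chi^2_{\text{Partition 1}} + \chi^2_{\text{Partition 2}} = 11.5545 + 1.0654 = 12.6199 \approx \chi^2_{\text{Total}} = 12.5693\), with \(df = 1 + 1 = 2\), consistent with the overall test. The Pearson statistic is only approximately additive across partitions; the likelihood ratio statistic \(G^2\) partitions exactly.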
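The difference between approximate and exact additivity can be checked numerically. The sketch below computes both the Pearson statistic and \(G^2\) for the full table and for the two partitions; the helper names `x2` and `g2` are hypothetical, and `g2` assumes no zero cells.

```r
# Pearson chi-square statistic for a two-way table
x2 <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

# Likelihood ratio statistic G^2 (assumes every cell count is > 0)
g2 <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  2 * sum(tab * log(tab / expected))
}

tab2 <- matrix(
  c(495, 272, 590, 330, 265, 498),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Female", "Male"), c("Democrat", "Republican", "Independent"))
)

# Partition 1: Democrat vs Republican
part1 <- tab2[, c("Democrat", "Republican")]

# Partition 2: (Democrat + Republican) vs Independent
part2 <- cbind("Dem+Rep" = rowSums(part1), "Independent" = tab2[, "Independent"])

cat("Pearson: ", round(x2(part1) + x2(part2), 4), "vs overall", round(x2(tab2), 4), "\n")
cat("G^2:     ", round(g2(part1) + g2(part2), 4), "vs overall", round(g2(tab2), 4), "\n")
```

The two \(G^2\) components should sum to the overall \(G^2\) up to floating-point error, while the Pearson components overshoot the overall statistic by about 0.05 for these data.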

3.6 Visualization

mosaicplot(
  tabel2,
  main   = "Mosaic Plot: Gender and Political Party",
  color  = c("#1a1a1a","#888888","#cccccc"),
  border = "white",
  las    = 1,
  cex.axis = 0.85
)

Mosaic Plot: Gender and Political Party Identification

prop_tabel2 <- prop.table(tabel2, margin=1)

barplot(
  t(prop_tabel2),
  beside  = TRUE,
  col     = c("#1a1a1a","#666666","#cccccc"),
  legend.text = colnames(tabel2),
  args.legend = list(x="topright", bty="n", cex=0.85),
  main    = "Party Identification Proportions by Gender",
  xlab    = "Gender",
  ylab    = "Proportion",
  ylim    = c(0, 0.55),
  las     = 1,
  border  = "white"
)

Party Identification Proportions by Gender

3.7 Cell Contributions to the Association

# Squared Pearson residuals are the per-cell contributions to the overall χ²
kontrib <- round((chi2_full$residuals^2), 4)
# Share of each cell in the overall statistic, in percent
pct     <- round(100 * kontrib / chi2_full$statistic, 2)

df_kontrib <- data.frame(
  Cell         = c("Female–Democrat","Female–Republican","Female–Independent",
                   "Male–Democrat","Male–Republican","Male–Independent"),
  "O"          = as.vector(t(tabel2)),
  "E"          = round(as.vector(t(chi2_full$expected)), 2),
  "Residual²"  = as.vector(t(kontrib)),
  "Contribution (%)" = as.vector(t(pct)),
  check.names  = FALSE
)
# Sort from the largest to the smallest contribution
df_kontrib <- df_kontrib[order(-df_kontrib[["Contribution (%)"]]),]

kable(df_kontrib, align="c", caption="Cell Contributions to the Overall χ²", row.names=FALSE) |>
  kable_styling(bootstrap_options=c("striped","bordered","hover"), full_width=TRUE) |>
  row_spec(0, bold=TRUE, background="#000000", color="white")

Cell Contributions to the Overall χ²
Cell	O	E	Residual²	Contribution (%)
Male–Democrat	330	368.05	3.9339	31.30
Female–Democrat	495	456.95	3.1686	25.21
Male–Republican	265	239.57	2.6999	21.48
Female–Republican	272	297.43	2.1746	17.30
Male–Independent	498	485.38	0.3281	2.61
Female–Independent	590	602.62	0.2642	2.10

3.8 Conclusion for Case 2

The overall chi-square test shows a significant association between gender and political party identification (\(\chi^2\) = 12.5693, \(df\) = 2, p = 0.0019).

The partitions show:

Democrat vs Republican: The difference by gender is significant. Women are more likely than men to identify as Democrat.
(Democrat+Republican) vs Independent: There is no significant difference between genders in choosing a major party versus Independent.

The cells contributing most to the association are Male–Democrat (31.30%) and Female–Democrat (25.21%), as reflected in their standardized residuals, which are the largest in magnitude, and in their shares of the overall \(\chi^2\).

4 Final Summary

Case	Variables	Test	Result
1	Smoking–Cancer	All tests	Significant (p < 0.0001)
2	Gender–Party	Overall chi-square	Significant (p < 0.05)
2	Democrat vs Republican	Partition 1	Significant
2	(Dem+Rep) vs Independent	Partition 2	Not significant

Inference for Two-Way Contingency Tables

Categorical Data Analysis

Rayyan Ilyas Fadani