1 Introduction

This report covers statistical inference for two-way contingency tables built from categorical data. The analysis includes point estimates of proportions, confidence intervals, and several hypothesis tests: the two-proportion test, the chi-square test, the likelihood ratio test (\(G^2\)), and Fisher's exact test.

Two cases are analyzed:

  • Case 1: a 2×2 table relating smoking habit to lung cancer.
  • Case 2: a 2×3 table relating gender to political party identification.

2 Case 1: 2×2 Contingency Table

2.1 Data and Contingency Table

The data describe the relationship between smoking habit and lung cancer diagnosis, with 709 cancer cases and 709 controls.

# Build the contingency table
tabel1 <- matrix(
  c(688, 650, 21, 59),
  nrow  = 2,
  byrow = TRUE,
  dimnames = list(
    "Status Merokok" = c("Smoker", "Non-Smoker"),
    "Diagnosis"      = c("Cancer (+)", "Control (-)")
  )
)

# Display the table with marginal totals
addmargins(tabel1)
              Diagnosis
Status Merokok Cancer (+) Control (-)  Sum
    Smoker            688         650 1338
    Non-Smoker         21          59   80
    Sum               709         709 1418
library(knitr)
library(kableExtra)

df1 <- data.frame(
  "Status Merokok" = c("Smoker", "Non-Smoker", "Total"),
  "Cancer (+)"     = c(688, 21, 709),
  "Control (-)"    = c(650, 59, 709),
  "Total"          = c(1338, 80, 1418),
  check.names = FALSE
)

kable(df1, align = "c", caption = "Tabel Kontingensi 2×2: Merokok dan Kanker Paru") |>
  kable_styling(
    bootstrap_options = c("striped", "bordered", "hover"),
    full_width        = TRUE
  ) |>
  row_spec(0, bold = TRUE, background = "#000000", color = "white") |>
  row_spec(3, bold = TRUE, background = "#d0d0d0")
Tabel Kontingensi 2×2: Merokok dan Kanker Paru
Status Merokok   Cancer (+)   Control (-)   Total
Smoker              688          650         1338
Non-Smoker           21           59           80
Total               709          709         1418

2.2 Point Estimates of Proportions

\[\hat{p}_{\text{Smoker}} = \frac{688}{1338}, \quad \hat{p}_{\text{Non-Smoker}} = \frac{21}{80}\]

n1 <- 1338; x1 <- 688  # Smoker
n2 <- 80;   x2 <- 21   # Non-Smoker

p1 <- x1 / n1
p2 <- x2 / n2

cat("Proporsi Smoker     :", round(p1, 4), "\n")
Proporsi Smoker     : 0.5142 
cat("Proporsi Non-Smoker :", round(p2, 4), "\n")
Proporsi Non-Smoker : 0.2625 

2.3 95% Confidence Intervals

2.3.1 Proportion in Each Group

\[\hat{p} \pm z_{0.025} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

z <- qnorm(0.975)

# CI Smoker
ci1_low <- p1 - z * sqrt(p1*(1-p1)/n1)
ci1_up  <- p1 + z * sqrt(p1*(1-p1)/n1)

# CI Non-Smoker
ci2_low <- p2 - z * sqrt(p2*(1-p2)/n2)
ci2_up  <- p2 + z * sqrt(p2*(1-p2)/n2)

cat("95% CI Smoker     : [", round(ci1_low,4), ",", round(ci1_up,4), "]\n")
95% CI Smoker     : [ 0.4874 , 0.541 ]
cat("95% CI Non-Smoker : [", round(ci2_low,4), ",", round(ci2_up,4), "]\n")
95% CI Non-Smoker : [ 0.1661 , 0.3589 ]

2.3.2 Risk Difference (RD)

\[\text{RD} = \hat{p}_1 - \hat{p}_2, \quad \text{SE}(\text{RD}) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]

RD    <- p1 - p2
SE_RD <- sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
ci_RD <- c(RD - z*SE_RD, RD + z*SE_RD)

cat("Risk Difference (RD) :", round(RD, 4), "\n")
Risk Difference (RD) : 0.2517 
cat("95% CI RD            : [", round(ci_RD[1],4), ",", round(ci_RD[2],4), "]\n")
95% CI RD            : [ 0.1516 , 0.3518 ]

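Interpretation: In this sample, the proportion of lung cancer cases among smokers is 0.2517 higher (about 25 percentage points) than among non-smokers, and the 95% CI [0.1516, 0.3518] excludes 0. Because the column totals are equal (709 cases and 709 controls), the sampling appears to fix the number of cases, so these proportions describe the sample rather than population risk.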

2.3.3 Relative Risk (RR)

\[\text{RR} = \frac{\hat{p}_1}{\hat{p}_2}, \quad \text{SE}(\ln\text{RR}) = \sqrt{\frac{1-\hat{p}_1}{x_1} + \frac{1-\hat{p}_2}{x_2}}\]

RR     <- p1 / p2
lnRR   <- log(RR)
SE_lnRR <- sqrt((1-p1)/x1 + (1-p2)/x2)
ci_RR  <- exp(c(lnRR - z*SE_lnRR, lnRR + z*SE_lnRR))

cat("Relative Risk (RR) :", round(RR, 4), "\n")
Relative Risk (RR) : 1.9589 
cat("95% CI RR          : [", round(ci_RR[1],4), ",", round(ci_RR[2],4), "]\n")
95% CI RR          : [ 1.3517 , 2.8387 ]

Interpretation: The proportion of lung cancer cases among smokers is 1.96 times that among non-smokers. The 95% CI [1.3517, 2.8387] excludes 1, so the association is statistically significant at the 5% level.

2.3.4 Odds Ratio (OR)

\[\text{OR} = \frac{x_1(n_2 - x_2)}{x_2(n_1 - x_1)}, \quad \text{SE}(\ln\text{OR}) = \sqrt{\frac{1}{x_1} + \frac{1}{n_1-x_1} + \frac{1}{x_2} + \frac{1}{n_2-x_2}}\]

OR     <- (x1*(n2-x2)) / (x2*(n1-x1))
lnOR   <- log(OR)
SE_lnOR <- sqrt(1/x1 + 1/(n1-x1) + 1/x2 + 1/(n2-x2))
ci_OR  <- exp(c(lnOR - z*SE_lnOR, lnOR + z*SE_lnOR))

cat("Odds Ratio (OR) :", round(OR, 4), "\n")
Odds Ratio (OR) : 2.9738 
cat("95% CI OR       : [", round(ci_OR[1],4), ",", round(ci_OR[2],4), "]\n")
95% CI OR       : [ 1.7867 , 4.9494 ]

Interpretation: The odds of lung cancer among smokers are 2.97 times the odds among non-smokers. The 95% CI [1.7867, 4.9494] lies entirely above 1, so the association is statistically significant at the 5% level.


2.4 Hypothesis Tests

2.4.1 Two-Proportion Test

\[H_0: p_1 = p_2 \quad \text{vs} \quad H_1: p_1 \neq p_2\]

\[Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}, \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\]

uji_prop <- prop.test(c(x1, x2), c(n1, n2), correct = FALSE)
print(uji_prop)

    2-sample test for equality of proportions without continuity correction

data:  c(x1, x2) out of c(n1, n2)
X-squared = 19.129, df = 1, p-value = 1.222e-05
alternative hypothesis: two.sided
95 percent confidence interval:
 0.1516343 0.3517663
sample estimates:
   prop 1    prop 2 
0.5142003 0.2625000 
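
As a cross-check on the output above, the Z statistic can be computed directly from the pooled-proportion formula. This is a minimal sketch in base R; the names `p_pool`, `z_stat`, and `p_val` are introduced here for illustration and do not appear in the original analysis. Squaring Z should reproduce the X-squared value that `prop.test` reports.

```r
# Counts from the 2x2 table in Section 2.1
x1 <- 688; n1 <- 1338   # smokers: cancer cases, group size
x2 <- 21;  n2 <- 80     # non-smokers: cancer cases, group size

p1 <- x1 / n1
p2 <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)   # pooled proportion under H0 (illustrative name)

# Z statistic with the pooled standard error
z_stat <- (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Two-sided p-value from the standard normal distribution
p_val <- 2 * pnorm(-abs(z_stat))

cat("Z =", round(z_stat, 4), "| Z^2 =", round(z_stat^2, 4), "| p =", signif(p_val, 4), "\n")
```

Because `prop.test` with `correct = FALSE` reports the squared statistic, the sign of Z (positive here, since the smoker proportion is larger) is only visible in the hand computation.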

2.4.2 Chi-Square Test of Independence

\[H_0: \text{smoking and lung cancer are independent}\]

\[\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\]

uji_chi <- chisq.test(tabel1, correct = FALSE)
print(uji_chi)

    Pearson's Chi-squared test

data:  tabel1
X-squared = 19.129, df = 1, p-value = 1.222e-05
cat("Nilai harapan:\n")
Nilai harapan:
print(uji_chi$expected)
              Diagnosis
Status Merokok Cancer (+) Control (-)
    Smoker            669         669
    Non-Smoker         40          40

2.4.3 Likelihood Ratio Test (G²)

\[G^2 = 2\sum O_{ij} \ln\left(\frac{O_{ij}}{E_{ij}}\right)\]

library(DescTools)
G2_stat <- GTest(tabel1)
print(G2_stat)

    Log likelihood ratio (G-test) test of independence without correction

data:  tabel1
G = 19.878, X-squared df = 1, p-value = 8.254e-06
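
The \(G^2\) formula can also be evaluated without `DescTools`. The sketch below uses base R only and assumes every observed count is positive (a zero cell would need special handling because of the logarithm). The names `obs`, `expd`, and `g2` are illustrative.

```r
# Observed counts: rows = Smoker, Non-Smoker; columns = Cancer (+), Control (-)
obs <- matrix(c(688, 650, 21, 59), nrow = 2, byrow = TRUE)

# Expected counts under independence: (row total * column total) / N
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Likelihood ratio statistic G^2 = 2 * sum(O * ln(O / E))
g2 <- 2 * sum(obs * log(obs / expd))

# Degrees of freedom and upper-tail chi-square p-value
df_g2 <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_g2  <- pchisq(g2, df = df_g2, lower.tail = FALSE)

cat("G^2 =", round(g2, 3), "| df =", df_g2, "| p =", signif(p_g2, 4), "\n")
```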

2.4.4 Fisher Exact Test

uji_fisher <- fisher.test(tabel1)
print(uji_fisher)

    Fisher's Exact Test for Count Data

data:  tabel1
p-value = 1.476e-05
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1.755611 5.210711
sample estimates:
odds ratio 
  2.971634 
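
To show where the exact p-value comes from, the sketch below rebuilds it from the hypergeometric distribution of the Smoker and Cancer (+) cell with all margins fixed. It follows the usual two-sided convention of summing the probabilities of all tables that are no more likely than the observed one; the small relative tolerance is an assumption added here to guard against floating-point ties. Variable names are illustrative.

```r
tab <- matrix(c(688, 650, 21, 59), nrow = 2, byrow = TRUE)

m <- sum(tab[1, ])    # 1338 smokers
n <- sum(tab[2, ])    # 80 non-smokers
k <- sum(tab[, 1])    # 709 cancer cases
x_obs <- tab[1, 1]    # 688 smokers among the cases

# All values the top-left cell can take given the fixed margins
support <- max(0, k - n):min(k, m)

# Hypergeometric probability of each possible table under H0
probs <- dhyper(support, m, n, k)
p_obs <- dhyper(x_obs, m, n, k)

# Two-sided p-value: total probability of tables no more likely than the observed one
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

cat("Exact two-sided p-value:", signif(p_two_sided, 4), "\n")
```

Note that the odds ratio printed by `fisher.test` (2.9716) is a conditional maximum likelihood estimate, which is why it differs slightly from the sample cross-product ratio 2.9738 in Section 2.3.4.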

2.5 Comparison of Test Results

compare_df <- data.frame(
  "Metode Uji"    = c("Uji Dua Proporsi", "Chi-Square", "Likelihood Ratio (G²)", "Fisher Exact Test"),
  "Statistik Uji" = c(
    paste0("Z = ", round(sqrt(uji_prop$statistic), 4)),
    paste0("χ² = ", round(uji_chi$statistic, 4)),
    paste0("G² = ", round(G2_stat$statistic, 4)),
    "—"
  ),
  "p-value"   = c(
    formatC(uji_prop$p.value, format="e", digits=3),
    formatC(uji_chi$p.value,  format="e", digits=3),
    formatC(G2_stat$p.value,  format="e", digits=3),
    formatC(uji_fisher$p.value, format="e", digits=3)
  ),
  "df" = c(1, 1, 1, "—"),
  "Keputusan" = rep("Tolak H₀", 4),
  check.names = FALSE
)

kable(compare_df, align = "c", caption = "Perbandingan Hasil Uji Hipotesis (α = 0.05)") |>
  kable_styling(bootstrap_options = c("striped","bordered","hover"), full_width=TRUE) |>
  row_spec(0, bold=TRUE, background="#000000", color="white")
Perbandingan Hasil Uji Hipotesis (α = 0.05)
Metode Uji              Statistik Uji   p-value     df   Keputusan
Uji Dua Proporsi        Z = 4.3737      1.222e-05   1    Tolak H₀
Chi-Square              χ² = 19.1292    1.222e-05   1    Tolak H₀
Likelihood Ratio (G²)   G² = 19.878     8.254e-06   1    Tolak H₀
Fisher Exact Test                       1.476e-05        Tolak H₀

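Comparative interpretation:
All four methods lead to the same decision: reject \(H_0\) at \(\alpha = 0.05\). Every p-value is below 0.0001, which is strong statistical evidence of an association. For a 2×2 table the two-proportion test and the Pearson chi-square test are equivalent, since \(Z^2 = \chi^2\) (here \(4.3737^2 \approx 19.129\)), so their p-values coincide. Fisher's exact test is intended for tables with small cell counts; it is not required here because all expected frequencies are large (the smallest is 40), and it agrees with the other tests.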


2.6 Conclusion for Case 1

There is a statistically significant association between smoking habit and lung cancer (p < 0.0001 in all four tests). Compared with non-smokers, smokers show a relative risk of 1.96 (RR) and odds 2.97 times as large (OR). The risk difference of 0.2517 indicates a substantial absolute difference. All three association measures are statistically significant based on their 95% confidence intervals.


3 Case 2: 2×3 Contingency Table

3.1 Data and Contingency Table

The data describe the relationship between gender and political party identification.

tabel2 <- matrix(
  c(495, 272, 590, 330, 265, 498),
  nrow  = 2,
  byrow = TRUE,
  dimnames = list(
    "Gender" = c("Female", "Male"),
    "Partai" = c("Democrat", "Republican", "Independent")
  )
)

addmargins(tabel2)
        Partai
Gender   Democrat Republican Independent  Sum
  Female      495        272         590 1357
  Male        330        265         498 1093
  Sum         825        537        1088 2450
df2 <- data.frame(
  "Gender"      = c("Female", "Male", "Total"),
  "Democrat"    = c(495, 330, 825),
  "Republican"  = c(272, 265, 537),
  "Independent" = c(590, 498, 1088),
  "Total"       = c(1357, 1093, 2450),
  check.names   = FALSE
)

kable(df2, align="c", caption="Tabel Kontingensi 2×3: Gender dan Identifikasi Partai Politik") |>
  kable_styling(bootstrap_options=c("striped","bordered","hover"), full_width=TRUE) |>
  row_spec(0, bold=TRUE, background="#000000", color="white") |>
  row_spec(3, bold=TRUE, background="#d0d0d0")
Tabel Kontingensi 2×3: Gender dan Identifikasi Partai Politik
Gender   Democrat   Republican   Independent   Total
Female     495         272           590        1357
Male       330         265           498        1093
Total      825         537          1088        2450

3.2 Expected Frequencies

\[E_{ij} = \frac{(\text{row total}_i) \times (\text{column total}_j)}{N}\]

chi2_full <- chisq.test(tabel2, correct=FALSE)
cat("Frekuensi Harapan (E_ij):\n")
Frekuensi Harapan (E_ij):
print(round(chi2_full$expected, 2))
        Partai
Gender   Democrat Republican Independent
  Female   456.95     297.43      602.62
  Male     368.05     239.57      485.38

All expected frequencies are well above 5 (the smallest is 239.57), so the large-sample assumption of the chi-square test is satisfied.
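
The expected-frequency formula is an outer product of the row and column totals divided by the grand total, which can be verified without calling `chisq.test`. A minimal sketch in base R follows; `tab` and `expected` are illustrative names, and the table is re-entered so the block runs on its own.

```r
tab <- matrix(
  c(495, 272, 590, 330, 265, 498),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Gender = c("Female", "Male"),
    Partai = c("Democrat", "Republican", "Independent")
  )
)

# E_ij = (row total_i * column total_j) / N
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(round(expected, 2))

# Rule-of-thumb check for the chi-square approximation
cat("Smallest expected count:", round(min(expected), 2), "\n")
cat("All expected counts > 5:", all(expected > 5), "\n")
```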


3.3 Overall Chi-Square Test

\[H_0: \text{gender and party identification are independent}\]

\[\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r-1)(c-1) = (2-1)(3-1) = 2\]

print(chi2_full)

    Pearson's Chi-squared test

data:  tabel2
X-squared = 12.569, df = 2, p-value = 0.001865
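
The reported p-value is the upper-tail area of a chi-square distribution with 2 degrees of freedom. The sketch below recomputes it from the rounded statistic printed above and compares the statistic with the 5% critical value. For the special case \(df = 2\) the tail area has the closed form \(e^{-\chi^2/2}\); that shortcut does not apply to other degrees of freedom. Names are illustrative.

```r
chi2_obs <- 12.569   # rounded statistic from the output above
df_obs   <- 2

# Upper-tail p-value
p_val <- pchisq(chi2_obs, df = df_obs, lower.tail = FALSE)

# Closed form, valid only when df = 2
p_closed <- exp(-chi2_obs / 2)

# Critical value at alpha = 0.05
crit <- qchisq(0.95, df = df_obs)

cat("p-value:", signif(p_val, 4), "| closed form:", signif(p_closed, 4), "\n")
cat("Critical value:", round(crit, 3), "| reject H0:", chi2_obs > crit, "\n")
```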

3.4 Pearson and Standardized Residuals

3.4.1 Pearson Residual

\[r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}\]

3.4.2 Standardized Residual

\[d_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}(1 - p_{i\cdot})(1 - p_{\cdot j})}}\]

where \(p_{i\cdot}\) and \(p_{\cdot j}\) are the marginal row and column proportions.

# Pearson residuals
pearson_res <- chi2_full$residuals
cat("Residual Pearson:\n")
Residual Pearson:
print(round(pearson_res, 4))
        Partai
Gender   Democrat Republican Independent
  Female   1.7801    -1.4747     -0.5140
  Male    -1.9834     1.6431      0.5728
# Standardized residual
std_res <- chi2_full$stdres
cat("\nStandardized Residual:\n")

Standardized Residual:
print(round(std_res, 4))
        Partai
Gender   Democrat Republican Independent
  Female   3.2724    -2.4986     -1.0322
  Male    -3.2724     2.4986      1.0322
library(corrplot)

# Heatmap of the standardized residuals
corrplot(
  as.matrix(std_res),
  is.corr  = FALSE,
  method   = "color",
  col      = colorRampPalette(c("black","white","gray30"))(200),
  addCoef.col = "black",
  tl.col   = "black",
  tl.srt   = 0,
  cl.pos   = "r",
  title    = "Standardized Residual",
  mar      = c(0,0,2,0)
)
Figure: Standardized residual per cell

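Interpretation of residuals:
A standardized residual with \(|d_{ij}| > 2\) flags a cell that deviates from independence more than chance would explain. The Democrat cells show the largest deviations (3.27 for Female, -3.27 for Male), followed by the Republican cells (-2.50 for Female, 2.50 for Male), while the Independent cells (±1.03) are consistent with independence. Women are therefore more likely than men to identify as Democrat and less likely to identify as Republican.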
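
Both residual formulas from Sections 3.4.1 and 3.4.2 can be applied by hand, which makes the role of the marginal proportions explicit. The following is a sketch in base R under the same data; `row_prop`, `col_prop`, and `flagged` are names introduced for illustration.

```r
tab <- matrix(
  c(495, 272, 590, 330, 265, 498),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Gender = c("Female", "Male"),
    Partai = c("Democrat", "Republican", "Independent")
  )
)

n_total  <- sum(tab)
expected <- outer(rowSums(tab), colSums(tab)) / n_total

# Pearson residual: (O - E) / sqrt(E)
pearson_res <- (tab - expected) / sqrt(expected)

# Standardized residual: divide additionally by sqrt((1 - p_i.)(1 - p_.j))
row_prop <- rowSums(tab) / n_total
col_prop <- colSums(tab) / n_total
std_res  <- (tab - expected) / sqrt(expected * outer(1 - row_prop, 1 - col_prop))

# Cells whose standardized residual exceeds 2 in absolute value
flagged <- abs(std_res) > 2

print(round(pearson_res, 4))
print(round(std_res, 4))
print(flagged)
```

In a table with two rows, the standardized residuals in each column are equal in magnitude and opposite in sign, which is why the Female and Male rows mirror each other.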


3.5 Chi-Square Partitioning

3.5.1 Partition 1: Democrat vs Republican

tabel2a <- tabel2[, c("Democrat", "Republican")]
chi2a   <- chisq.test(tabel2a, correct=FALSE)
cat("Chi-Square (Democrat vs Republican):\n")
Chi-Square (Democrat vs Republican):
print(chi2a)

    Pearson's Chi-squared test

data:  tabel2a
X-squared = 11.555, df = 1, p-value = 0.0006758

3.5.2 Partition 2: (Democrat + Republican) vs Independent

tabel2b <- cbind(
  "Dem+Rep"     = tabel2[,"Democrat"] + tabel2[,"Republican"],
  "Independent" = tabel2[,"Independent"]
)
chi2b <- chisq.test(tabel2b, correct=FALSE)
cat("Chi-Square ((Democrat+Republican) vs Independent):\n")
Chi-Square ((Democrat+Republican) vs Independent):
print(chi2b)

    Pearson's Chi-squared test

data:  tabel2b
X-squared = 1.0654, df = 1, p-value = 0.302

3.5.3 Partition Summary

partisi_df <- data.frame(
  "Partisi"       = c(
    "Democrat vs Republican",
    "(Democrat+Republican) vs Independent",
    "Total Keseluruhan"
  ),
  "χ²"   = c(
    round(chi2a$statistic, 4),
    round(chi2b$statistic, 4),
    round(chi2_full$statistic, 4)
  ),
  "df"      = c(1, 1, 2),
  "p-value" = c(
    formatC(chi2a$p.value, format="e", digits=3),
    formatC(chi2b$p.value, format="e", digits=3),
    formatC(chi2_full$p.value, format="e", digits=3)
  ),
  "Keputusan" = c("Tolak H₀", "Gagal Tolak H₀", "Tolak H₀"),
  check.names = FALSE
)

kable(partisi_df, align="c", caption="Ringkasan Partisi Chi-Square") |>
  kable_styling(bootstrap_options=c("striped","bordered","hover"), full_width=TRUE) |>
  row_spec(0, bold=TRUE, background="#000000", color="white") |>
  row_spec(3, bold=TRUE, background="#d0d0d0")
Ringkasan Partisi Chi-Square
Partisi                                χ²        df   p-value     Keputusan
Democrat vs Republican                 11.5545   1    6.758e-04   Tolak H₀
(Democrat+Republican) vs Independent   1.0654    1    3.020e-01   Gagal Tolak H₀
Total Keseluruhan                      12.5693   2    1.865e-03   Tolak H₀

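Note on additivity: \(\chi^2_{\text{Partition 1}} + \chi^2_{\text{Partition 2}} = 11.5545 + 1.0654 = 12.6199 \approx \chi^2_{\text{Total}} = 12.5693\), with \(df = 1 + 1 = 2\), consistent with the overall test. The agreement is only approximate for the Pearson statistic; the likelihood ratio statistic \(G^2\) partitions exactly.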
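
The additivity property can be checked numerically for both statistics. The sketch below defines two small helper functions, `pearson_x2` and `lr_g2` (hypothetical names, not part of any package), and applies them to the full table and to the two partitions used above. It assumes all counts are positive.

```r
tab <- matrix(
  c(495, 272, 590, 330, 265, 498),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Gender = c("Female", "Male"),
    Partai = c("Democrat", "Republican", "Independent")
  )
)

# Helper: expected counts under independence
expected_counts <- function(m) outer(rowSums(m), colSums(m)) / sum(m)

# Helper: Pearson chi-square statistic
pearson_x2 <- function(m) {
  e <- expected_counts(m)
  sum((m - e)^2 / e)
}

# Helper: likelihood ratio statistic G^2 (requires positive counts)
lr_g2 <- function(m) {
  e <- expected_counts(m)
  2 * sum(m * log(m / e))
}

# Partition 1: Democrat vs Republican
part1 <- tab[, c("Democrat", "Republican")]

# Partition 2: (Democrat + Republican) vs Independent
part2 <- cbind(
  "Dem+Rep"     = tab[, "Democrat"] + tab[, "Republican"],
  "Independent" = tab[, "Independent"]
)

x2_sum <- pearson_x2(part1) + pearson_x2(part2)
g2_sum <- lr_g2(part1) + lr_g2(part2)

cat("Pearson: partitions =", round(x2_sum, 4), "| total =", round(pearson_x2(tab), 4), "\n")
cat("G^2    : partitions =", round(g2_sum, 4), "| total =", round(lr_g2(tab), 4), "\n")
```

The Pearson components overshoot the overall statistic slightly, while the \(G^2\) components sum to the overall \(G^2\) up to floating-point error.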


3.6 Visualization

mosaicplot(
  tabel2,
  main   = "Mosaic Plot: Gender dan Partai Politik",
  color  = c("#1a1a1a","#888888","#cccccc"),
  border = "white",
  las    = 1,
  cex.axis = 0.85
)
Figure: Mosaic plot of gender and political party identification

prop_tabel2 <- prop.table(tabel2, margin=1)

barplot(
  t(prop_tabel2),
  beside  = TRUE,
  col     = c("#1a1a1a","#666666","#cccccc"),
  legend.text = colnames(tabel2),
  args.legend = list(x="topright", bty="n", cex=0.85),
  main    = "Proporsi Identifikasi Partai per Gender",
  xlab    = "Gender",
  ylab    = "Proporsi",
  ylim    = c(0, 0.55),
  las     = 1,
  border  = "white"
)
Figure: Proportion of party identification by gender


3.7 Contribution of Each Category to the Association

kontrib <- round((chi2_full$residuals^2), 4)
pct     <- round(100 * kontrib / chi2_full$statistic, 2)

df_kontrib <- data.frame(
  Sel          = c("Female–Democrat","Female–Republican","Female–Independent",
                   "Male–Democrat","Male–Republican","Male–Independent"),
  "O"          = as.vector(t(tabel2)),
  "E"          = round(as.vector(t(chi2_full$expected)), 2),
  "Residual²"  = as.vector(t(kontrib)),
  "Kontribusi (%)" = as.vector(t(pct)),
  check.names  = FALSE
)
df_kontrib <- df_kontrib[order(-df_kontrib[["Kontribusi (%)"]]),]

kable(df_kontrib, align="c", caption="Kontribusi Sel terhadap χ² Keseluruhan", row.names=FALSE) |>
  kable_styling(bootstrap_options=c("striped","bordered","hover"), full_width=TRUE) |>
  row_spec(0, bold=TRUE, background="#000000", color="white")
Kontribusi Sel terhadap χ² Keseluruhan
Sel                  O     E        Residual²   Kontribusi (%)
Male–Democrat        330   368.05   3.9339      31.30
Female–Democrat      495   456.95   3.1686      25.21
Male–Republican      265   239.57   2.6999      21.48
Female–Republican    272   297.43   2.1746      17.30
Male–Independent     498   485.38   0.3281       2.61
Female–Independent   590   602.62   0.2642       2.10

3.8 Conclusion for Case 2

The overall chi-square test shows a statistically significant association between gender and political party identification (\(\chi^2\) = 12.5693, \(df\) = 2, p = 0.0019).

The partition results show:

  • Democrat vs Republican: a significant difference by gender (\(\chi^2\) = 11.5545, \(df\) = 1, p = 0.0007). Women are more likely than men to identify as Democrat.
  • (Democrat+Republican) vs Independent: no significant difference between genders in choosing a major party versus Independent (\(\chi^2\) = 1.0654, \(df\) = 1, p = 0.302).

The cells contributing most to the association are Male–Democrat and Female–Democrat, which have the largest standardized residuals in absolute value (3.27) and the largest shares of the overall \(\chi^2\) (31.30% and 25.21%).


4 Final Summary

Case   Variables                  Test                 Result
1      Smoking–Cancer             All tests            Significant (p < 0.0001)
2      Gender–Party               Overall chi-square   Significant (p = 0.0019)
2      Democrat vs Republican     Partition 1          Significant (p = 0.0007)
2      (Dem+Rep) vs Independent   Partition 2          Not significant (p = 0.302)