This report covers statistical inference on two-way contingency tables built from categorical data. The analysis includes proportion estimates, confidence intervals, and several hypothesis tests: the two-proportion test, the chi-square test, the likelihood ratio test (\(G^2\)), and Fisher's exact test.
Two cases are analyzed: a 2×2 table relating smoking to lung cancer, and a 2×3 table relating gender to political party identification.
Case 1. The data describe the relationship between smoking habits and lung cancer diagnosis.
# Build the contingency table
tabel1 <- matrix(
c(688, 650, 21, 59),
nrow = 2,
byrow = TRUE,
dimnames = list(
"Status Merokok" = c("Smoker", "Non-Smoker"),
"Diagnosis" = c("Cancer (+)", "Control (-)")
)
)
# Display with marginal totals
addmargins(tabel1)
Diagnosis
Status Merokok Cancer (+) Control (-) Sum
Smoker 688 650 1338
Non-Smoker 21 59 80
Sum 709 709 1418
library(knitr)
library(kableExtra)
df1 <- data.frame(
"Status Merokok" = c("Smoker", "Non-Smoker", "Total"),
"Cancer (+)" = c(688, 21, 709),
"Control (-)" = c(650, 59, 709),
"Total" = c(1338, 80, 1418),
check.names = FALSE
)
kable(df1, align = "c", caption = "2×2 Contingency Table: Smoking and Lung Cancer") |>
  kable_styling(
    bootstrap_options = c("striped", "bordered", "hover"),
    full_width = TRUE
  ) |>
  row_spec(0, bold = TRUE, background = "#000000", color = "white") |>
  row_spec(3, bold = TRUE, background = "#d0d0d0")

| Status Merokok | Cancer (+) | Control (-) | Total |
|---|---|---|---|
| Smoker | 688 | 650 | 1338 |
| Non-Smoker | 21 | 59 | 80 |
| Total | 709 | 709 | 1418 |
\[\hat{p}_{\text{Smoker}} = \frac{688}{1338}, \quad \hat{p}_{\text{Non-Smoker}} = \frac{21}{80}\]
n1 <- 1338; x1 <- 688 # Smoker
n2 <- 80; x2 <- 21 # Non-Smoker
p1 <- x1 / n1
p2 <- x2 / n2
cat("Proportion Smoker :", round(p1, 4), "\n")
cat("Proportion Non-Smoker :", round(p2, 4), "\n")
Proportion Smoker : 0.5142
Proportion Non-Smoker : 0.2625
\[\hat{p} \pm z_{0.025} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
z <- qnorm(0.975)
# CI Smoker
ci1_low <- p1 - z * sqrt(p1*(1-p1)/n1)
ci1_up <- p1 + z * sqrt(p1*(1-p1)/n1)
# CI Non-Smoker
ci2_low <- p2 - z * sqrt(p2*(1-p2)/n2)
ci2_up <- p2 + z * sqrt(p2*(1-p2)/n2)
cat("95% CI Smoker : [", round(ci1_low,4), ",", round(ci1_up,4), "]\n")
cat("95% CI Non-Smoker : [", round(ci2_low,4), ",", round(ci2_up,4), "]\n")
95% CI Smoker : [ 0.4874 , 0.541 ]
95% CI Non-Smoker : [ 0.1661 , 0.3589 ]
\[\text{RD} = \hat{p}_1 - \hat{p}_2, \quad \text{SE}(\text{RD}) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]
RD <- p1 - p2
SE_RD <- sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
ci_RD <- c(RD - z*SE_RD, RD + z*SE_RD)
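cat("Risk Difference (RD) :", round(RD, 4), "\n")
cat("95% CI RD : [", round(ci_RD[1], 4), ",", round(ci_RD[2], 4), "]\n")
Risk Difference (RD) : 0.2517
95% CI RD : [ 0.1516 , 0.3518 ]
Interpretation: The proportion of lung cancer cases among smokers is 0.2517 higher (about 25 percentage points) than among non-smokers. The 95% CI [0.1516, 0.3518] excludes 0, so the difference is statistically significant.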
\[\text{RR} = \frac{\hat{p}_1}{\hat{p}_2}, \quad \text{SE}(\ln\text{RR}) = \sqrt{\frac{1-\hat{p}_1}{x_1} + \frac{1-\hat{p}_2}{x_2}}\]
RR <- p1 / p2
lnRR <- log(RR)
SE_lnRR <- sqrt((1-p1)/x1 + (1-p2)/x2)
ci_RR <- exp(c(lnRR - z*SE_lnRR, lnRR + z*SE_lnRR))
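cat("Relative Risk (RR) :", round(RR, 4), "\n")
cat("95% CI RR : [", round(ci_RR[1], 4), ",", round(ci_RR[2], 4), "]\n")
Relative Risk (RR) : 1.9589
95% CI RR : [ 1.3517 , 2.8387 ]
Interpretation: The proportion of lung cancer cases among smokers is 1.96 times that among non-smokers. The 95% CI [1.3517, 2.8387] excludes 1, so the association is statistically significant. If the table comes from a case-control design, as the Cancer/Control labels and the equal column totals (709 each) suggest, this ratio describes the sample and does not estimate the population relative risk.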
\[\text{OR} = \frac{x_1(n_2 - x_2)}{x_2(n_1 - x_1)}, \quad \text{SE}(\ln\text{OR}) = \sqrt{\frac{1}{x_1} + \frac{1}{n_1-x_1} + \frac{1}{x_2} + \frac{1}{n_2-x_2}}\]
OR <- (x1*(n2-x2)) / (x2*(n1-x1))
lnOR <- log(OR)
SE_lnOR <- sqrt(1/x1 + 1/(n1-x1) + 1/x2 + 1/(n2-x2))
ci_OR <- exp(c(lnOR - z*SE_lnOR, lnOR + z*SE_lnOR))
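cat("Odds Ratio (OR) :", round(OR, 4), "\n")
cat("95% CI OR : [", round(ci_OR[1], 4), ",", round(ci_OR[2], 4), "]\n")
Odds Ratio (OR) : 2.9738
95% CI OR : [ 1.7867 , 4.9494 ]
Interpretation: The odds of lung cancer among smokers are 2.97 times the odds among non-smokers. The 95% CI [1.7867, 4.9494] excludes 1, so the effect is statistically significant.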
\[H_0: p_1 = p_2 \quad \text{vs} \quad H_1: p_1 \neq p_2\]
\[Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}, \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\]
2-sample test for equality of proportions without continuity correction
data: c(x1, x2) out of c(n1, n2)
X-squared = 19.129, df = 1, p-value = 1.222e-05
alternative hypothesis: two.sided
95 percent confidence interval:
0.1516343 0.3517663
sample estimates:
prop 1 prop 2
0.5142003 0.2625000
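The call that produced the output above is not shown in the report. The sketch below assumes it was `prop.test()` with `correct = FALSE` (consistent with the "without continuity correction" header) and that the result was stored as `uji_prop`, the name used later in the comparison table. It also evaluates the Z formula by hand with the pooled proportion, so the two routes can be compared.

```r
# Counts from the 2x2 table: cancer cases and group sizes (Smoker, Non-Smoker)
x <- c(688, 21)
n <- c(1338, 80)

# Z statistic from the formula, using the pooled proportion under H0
p_hat  <- x / n
p_pool <- sum(x) / sum(n)
z_stat <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
p_val  <- 2 * pnorm(-abs(z_stat))   # two-sided p-value

# Assumed built-in equivalent; prop.test reports X-squared = Z^2
uji_prop <- prop.test(x, n, correct = FALSE)

cat("Z (manual)          :", round(z_stat, 4), "\n")
cat("Z^2 (manual)        :", round(z_stat^2, 3), "\n")
cat("X-squared (prop.test):", round(unname(uji_prop$statistic), 3), "\n")
```

Squaring the manual Z should reproduce the X-squared value printed by `prop.test()`, which is why the two-proportion test and the chi-square test give the same p-value on a 2×2 table.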
\[H_0: \text{Smoking and lung cancer are independent}\]
\[\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\]
Pearson's Chi-squared test
data: tabel1
X-squared = 19.129, df = 1, p-value = 1.222e-05
Expected values:
Diagnosis
Status Merokok Cancer (+) Control (-)
Smoker 669 669
Non-Smoker 40 40
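The chi-square output and expected values above appear without their code. A minimal sketch of how they could be reproduced follows; it assumes `chisq.test()` was called with `correct = FALSE` (the statistic matches the uncorrected value) and reuses the name `uji_chi` from the later comparison table. The manual lines apply the \(\chi^2\) formula directly.

```r
# Rebuild the 2x2 table from the report
tabel1 <- matrix(
  c(688, 650, 21, 59),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    "Status Merokok" = c("Smoker", "Non-Smoker"),
    "Diagnosis" = c("Cancer (+)", "Control (-)")
  )
)

# Assumed call: Pearson chi-square without Yates continuity correction
uji_chi <- chisq.test(tabel1, correct = FALSE)

# Manual check: E_ij = (row total * column total) / N
E <- outer(rowSums(tabel1), colSums(tabel1)) / sum(tabel1)
chi_manual <- sum((tabel1 - E)^2 / E)

print(uji_chi$expected)
cat("Chi-square (manual):", round(chi_manual, 4), "\n")
```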
\[G^2 = 2\sum O_{ij} \ln\left(\frac{O_{ij}}{E_{ij}}\right)\]
Log likelihood ratio (G-test) test of independence without correction
data: tabel1
G = 19.878, X-squared df = 1, p-value = 8.254e-06
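The report does not show which function produced the G-test output (later code refers to the result as `G2_stat`). Rather than guess at a package, the sketch below computes \(G^2\) directly from the formula in base R; the variable names `G2` and `p_G2` are illustrative, not taken from the report.

```r
# Observed counts for Case 1
O <- matrix(c(688, 650, 21, 59), nrow = 2, byrow = TRUE)

# Expected counts under independence
E <- outer(rowSums(O), colSums(O)) / sum(O)

# Likelihood ratio statistic and its chi-square p-value, df = (2-1)(2-1) = 1
G2   <- 2 * sum(O * log(O / E))
p_G2 <- pchisq(G2, df = 1, lower.tail = FALSE)

cat("G^2     :", round(G2, 3), "\n")
cat("p-value :", formatC(p_G2, format = "e", digits = 3), "\n")
```

This direct form requires every observed count to be positive, since `log(0)` is undefined; that holds for this table.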
compare_df <- data.frame(
  "Test" = c("Two-proportion test", "Chi-square", "Likelihood ratio (G²)", "Fisher's exact test"),
  "Test statistic" = c(
    paste0("Z = ", round(sqrt(uji_prop$statistic), 4)),
    paste0("χ² = ", round(uji_chi$statistic, 4)),
    paste0("G² = ", round(G2_stat$statistic, 4)),
    "n/a"
  ),
  "p-value" = c(
    formatC(uji_prop$p.value, format = "e", digits = 3),
    formatC(uji_chi$p.value, format = "e", digits = 3),
    formatC(G2_stat$p.value, format = "e", digits = 3),
    formatC(uji_fisher$p.value, format = "e", digits = 3)
  ),
  "df" = c(1, 1, 1, "n/a"),
  "Decision" = rep("Reject H₀", 4),
  check.names = FALSE
)
kable(compare_df, align = "c", caption = "Comparison of Hypothesis Test Results (α = 0.05)") |>
  kable_styling(bootstrap_options = c("striped", "bordered", "hover"), full_width = TRUE) |>
  row_spec(0, bold = TRUE, background = "#000000", color = "white")

| Test | Test statistic | p-value | df | Decision |
|---|---|---|---|---|
| Two-proportion test | Z = 4.3737 | 1.222e-05 | 1 | Reject H₀ |
| Chi-square | χ² = 19.1292 | 1.222e-05 | 1 | Reject H₀ |
| Likelihood ratio (G²) | G² = 19.878 | 8.254e-06 | 1 | Reject H₀ |
| Fisher's exact test | n/a | 1.476e-05 | n/a | Reject H₀ |
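Comparative interpretation:
All four methods lead to the same decision: reject \(H_0\) at \(\alpha = 0.05\). Every p-value is below 0.0001, which is strong statistical evidence of an association. For a 2×2 table the two-proportion test and the Pearson chi-square test are equivalent (\(Z^2 = 4.3737^2 \approx 19.129 = \chi^2\)), so their p-values coincide. Fisher's exact test is intended for tables with small cell counts; here every expected frequency is at least 40, so it agrees with the large-sample tests.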
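The Fisher p-value is reported in the comparison table, but the call itself is not shown. A plausible reconstruction, assuming the default two-sided `fisher.test()` and the object name `uji_fisher` used in the table code, is sketched here.

```r
# Rebuild the 2x2 table from the report
tabel1 <- matrix(
  c(688, 650, 21, 59),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    "Status Merokok" = c("Smoker", "Non-Smoker"),
    "Diagnosis" = c("Cancer (+)", "Control (-)")
  )
)

# Assumed call: two-sided Fisher exact test with default settings
uji_fisher <- fisher.test(tabel1)

cat("Fisher exact p-value  :", formatC(uji_fisher$p.value, format = "e", digits = 3), "\n")
cat("Conditional MLE of OR :", round(unname(uji_fisher$estimate), 4), "\n")
cat("95% CI for OR         : [", round(uji_fisher$conf.int[1], 4), ",",
    round(uji_fisher$conf.int[2], 4), "]\n")
```

Note that `fisher.test()` reports the conditional maximum likelihood estimate of the odds ratio, which generally differs slightly from the sample odds ratio \(x_1(n_2 - x_2) / (x_2(n_1 - x_1))\) computed earlier.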
There is a significant association between smoking and lung cancer (p < 0.0001 in all tests). The lung cancer risk among smokers is 1.96 times that of non-smokers (RR, 95% CI 1.35 to 2.84), and the odds are 2.97 times as large (OR, 95% CI 1.79 to 4.95). The risk difference of 0.2517 (95% CI 0.15 to 0.35) is a substantial absolute gap. All three intervals exclude their null values (1 for RR and OR, 0 for RD), so each measure of association is significant at the 5% level.
Case 2. The data describe the relationship between gender and political party identification.
tabel2 <- matrix(
c(495, 272, 590, 330, 265, 498),
nrow = 2,
byrow = TRUE,
dimnames = list(
"Gender" = c("Female", "Male"),
"Partai" = c("Democrat", "Republican", "Independent")
)
)
addmargins(tabel2)
Partai
Gender Democrat Republican Independent Sum
Female 495 272 590 1357
Male 330 265 498 1093
Sum 825 537 1088 2450
df2 <- data.frame(
"Gender" = c("Female", "Male", "Total"),
"Democrat" = c(495, 330, 825),
"Republican" = c(272, 265, 537),
"Independent" = c(590, 498, 1088),
"Total" = c(1357, 1093, 2450),
check.names = FALSE
)
kable(df2, align = "c", caption = "2×3 Contingency Table: Gender and Political Party Identification") |>
  kable_styling(bootstrap_options = c("striped", "bordered", "hover"), full_width = TRUE) |>
  row_spec(0, bold = TRUE, background = "#000000", color = "white") |>
  row_spec(3, bold = TRUE, background = "#d0d0d0")

| Gender | Democrat | Republican | Independent | Total |
|---|---|---|---|---|
| Female | 495 | 272 | 590 | 1357 |
| Male | 330 | 265 | 498 | 1093 |
| Total | 825 | 537 | 1088 | 2450 |
\[E_{ij} = \frac{(\text{Row total}_i) \times (\text{Column total}_j)}{N}\]
Expected frequencies (E_ij):
Partai
Gender Democrat Republican Independent
Female 456.95 297.43 602.62
Male 368.05 239.57 485.38
All expected frequencies exceed 5 (the smallest is 239.57), so the large-sample condition for the chi-square approximation is met.
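As a quick check of the expected-frequency formula, the sketch below rebuilds the 2×3 table and computes \(E_{ij}\) from the margins with `outer()`. The name `E2` is illustrative and not taken from the report.

```r
# Rebuild the 2x3 table from the report
tabel2 <- matrix(
  c(495, 272, 590, 330, 265, 498),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    "Gender" = c("Female", "Male"),
    "Partai" = c("Democrat", "Republican", "Independent")
  )
)

# E_ij = (row total_i * column total_j) / N
E2 <- outer(rowSums(tabel2), colSums(tabel2)) / sum(tabel2)
print(round(E2, 2))

# Rule-of-thumb check for the chi-square approximation
cat("Smallest expected count:", round(min(E2), 2), "\n")
cat("All expected counts > 5:", all(E2 > 5), "\n")
```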
\[H_0: \text{Gender and party identification are independent}\]
\[\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r-1)(c-1) = (2-1)(3-1) = 2\]
Pearson's Chi-squared test
data: tabel2
X-squared = 12.569, df = 2, p-value = 0.001865
\[r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}\]
\[d_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}(1 - p_{i\cdot})(1 - p_{\cdot j})}}\]
Pearson residuals:
Partai
Gender Democrat Republican Independent
Female 1.7801 -1.4747 -0.5140
Male -1.9834 1.6431 0.5728
Standardized Residual:
Partai
Gender Democrat Republican Independent
Female 3.2724 -2.4986 -1.0322
Male -3.2724 2.4986 1.0322
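The code behind the two residual tables is not shown. One way they could have been obtained, assuming the overall test was stored as `chi2_full` and the standardized residuals as `std_res` (both names appear in later code), is through the `residuals` and `stdres` components returned by `chisq.test()`. The manual line applies the \(d_{ij}\) formula for comparison.

```r
# Rebuild the 2x3 table from the report
tabel2 <- matrix(
  c(495, 272, 590, 330, 265, 498),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    "Gender" = c("Female", "Male"),
    "Partai" = c("Democrat", "Republican", "Independent")
  )
)

# Assumed call for the overall test
chi2_full <- chisq.test(tabel2)

# Pearson residuals r_ij and standardized residuals d_ij
pearson_res <- chi2_full$residuals
std_res     <- chi2_full$stdres

# Manual d_ij = (O - E) / sqrt(E * (1 - p_i.) * (1 - p_.j))
E     <- chi2_full$expected
row_p <- rowSums(tabel2) / sum(tabel2)
col_p <- colSums(tabel2) / sum(tabel2)
std_manual <- (tabel2 - E) / sqrt(E * outer(1 - row_p, 1 - col_p))

print(round(pearson_res, 4))
print(round(std_res, 4))
```

In a table with two rows the standardized residuals in each column are equal in magnitude and opposite in sign, which is why the Female and Male rows mirror each other.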
library(corrplot)
# Heatmap residual
corrplot(
as.matrix(std_res),
is.corr = FALSE,
method = "color",
col = colorRampPalette(c("black","white","gray30"))(200),
addCoef.col = "black",
tl.col = "black",
tl.srt = 0,
cl.pos = "r",
title = "Standardized Residual",
mar = c(0,0,2,0)
)
Figure: Standardized residual per cell
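Interpretation of residuals:
A standardized residual with \(|d_{ij}| > 2\) flags a cell whose count departs from independence by more than chance would typically produce. The Democrat cells have the largest values (3.27 for women, -3.27 for men): women identify as Democrat more often, and men less often, than independence predicts. The Republican cells (-2.50 for women, 2.50 for men) also exceed the threshold in the opposite direction, while the Independent cells (-1.03 and 1.03) do not.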
tabel2a <- tabel2[, c("Democrat", "Republican")]
chi2a <- chisq.test(tabel2a, correct=FALSE)
cat("Chi-Square (Democrat vs Republican):\n")
print(chi2a)
Chi-Square (Democrat vs Republican):
Pearson's Chi-squared test
data: tabel2a
X-squared = 11.555, df = 1, p-value = 0.0006758
tabel2b <- cbind(
"Dem+Rep" = tabel2[,"Democrat"] + tabel2[,"Republican"],
"Independent" = tabel2[,"Independent"]
)
chi2b <- chisq.test(tabel2b, correct=FALSE)
cat("Chi-Square ((Democrat+Republican) vs Independent):\n")
print(chi2b)
Chi-Square ((Democrat+Republican) vs Independent):
Pearson's Chi-squared test
data: tabel2b
X-squared = 1.0654, df = 1, p-value = 0.302
partisi_df <- data.frame(
  "Partition" = c(
    "Democrat vs Republican",
    "(Democrat+Republican) vs Independent",
    "Overall"
  ),
  "χ²" = c(
    round(chi2a$statistic, 4),
    round(chi2b$statistic, 4),
    round(chi2_full$statistic, 4)
  ),
  "df" = c(1, 1, 2),
  "p-value" = c(
    formatC(chi2a$p.value, format = "e", digits = 3),
    formatC(chi2b$p.value, format = "e", digits = 3),
    formatC(chi2_full$p.value, format = "e", digits = 3)
  ),
  "Decision" = c("Reject H₀", "Fail to reject H₀", "Reject H₀"),
  check.names = FALSE
)
kable(partisi_df, align = "c", caption = "Chi-Square Partition Summary") |>
  kable_styling(bootstrap_options = c("striped", "bordered", "hover"), full_width = TRUE) |>
  row_spec(0, bold = TRUE, background = "#000000", color = "white") |>
  row_spec(3, bold = TRUE, background = "#d0d0d0")

| Partition | χ² | df | p-value | Decision |
|---|---|---|---|---|
| Democrat vs Republican | 11.5545 | 1 | 6.758e-04 | Reject H₀ |
| (Democrat+Republican) vs Independent | 1.0654 | 1 | 3.020e-01 | Fail to reject H₀ |
| Overall | 12.5693 | 2 | 1.865e-03 | Reject H₀ |
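Note on additivity: \(\chi^2_{\text{Partition 1}} + \chi^2_{\text{Partition 2}} = 11.5545 + 1.0654 = 12.6199 \approx \chi^2_{\text{Total}} = 12.5693\), with \(df = 1 + 1 = 2\), consistent with the overall test. The match is only approximate because Pearson's statistic is not exactly additive over partitions; the likelihood ratio statistic \(G^2\) is.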
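The additivity note can be checked numerically. The sketch below is not part of the original analysis: it defines a small helper `g2()` (a hypothetical name) that computes the likelihood ratio statistic for any table, then compares the partition sums for Pearson's \(\chi^2\) and for \(G^2\) on the same two sub-tables.

```r
# Rebuild the 2x3 table and the two partition tables from the report
tabel2 <- matrix(
  c(495, 272, 590, 330, 265, 498),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    "Gender" = c("Female", "Male"),
    "Partai" = c("Democrat", "Republican", "Independent")
  )
)
tabel2a <- tabel2[, c("Democrat", "Republican")]
tabel2b <- cbind(
  "Dem+Rep"     = tabel2[, "Democrat"] + tabel2[, "Republican"],
  "Independent" = tabel2[, "Independent"]
)

# Hypothetical helper: G^2 = 2 * sum(O * log(O / E)) for a table of positive counts
g2 <- function(tab) {
  E <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  2 * sum(tab * log(tab / E))
}

# Pearson chi-square: partition sum versus overall statistic
x2_parts <- unname(chisq.test(tabel2a, correct = FALSE)$statistic) +
  unname(chisq.test(tabel2b, correct = FALSE)$statistic)
x2_full  <- unname(chisq.test(tabel2)$statistic)

# Likelihood ratio: partition sum versus overall statistic
g2_parts <- g2(tabel2a) + g2(tabel2b)
g2_full  <- g2(tabel2)

cat("Pearson: parts =", round(x2_parts, 4), " overall =", round(x2_full, 4), "\n")
cat("G^2    : parts =", round(g2_parts, 4), " overall =", round(g2_full, 4), "\n")
```

For this nested partition the two \(G^2\) components should sum to the overall \(G^2\) up to floating-point error, while the Pearson components differ from the overall value by a small amount.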
mosaicplot(
  tabel2,
  main = "Mosaic Plot: Gender and Political Party",
  color = c("#1a1a1a","#888888","#cccccc"),
  border = "white",
  las = 1,
  cex.axis = 0.85
)
Figure: Mosaic plot of gender and political party identification
prop_tabel2 <- prop.table(tabel2, margin=1)
barplot(
t(prop_tabel2),
beside = TRUE,
col = c("#1a1a1a","#666666","#cccccc"),
legend.text = colnames(tabel2),
args.legend = list(x="topright", bty="n", cex=0.85),
main = "Proportion of Party Identification by Gender",
xlab = "Gender",
ylab = "Proportion",
ylim = c(0, 0.55),
las = 1,
border = "white"
)
Figure: Proportion of party identification by gender
kontrib <- round((chi2_full$residuals^2), 4)
pct <- round(100 * kontrib / chi2_full$statistic, 2)
df_kontrib <- data.frame(
  "Cell" = c("Female–Democrat","Female–Republican","Female–Independent",
             "Male–Democrat","Male–Republican","Male–Independent"),
  "O" = as.vector(t(tabel2)),
  "E" = round(as.vector(t(chi2_full$expected)), 2),
  "Residual²" = as.vector(t(kontrib)),
  "Contribution (%)" = as.vector(t(pct)),
  check.names = FALSE
)
# Sort cells by descending contribution to the overall chi-square
df_kontrib <- df_kontrib[order(-df_kontrib[["Contribution (%)"]]),]
kable(df_kontrib, align = "c", caption = "Cell Contributions to the Overall χ²", row.names = FALSE) |>
  kable_styling(bootstrap_options = c("striped", "bordered", "hover"), full_width = TRUE) |>
  row_spec(0, bold = TRUE, background = "#000000", color = "white")

| Cell | O | E | Residual² | Contribution (%) |
|---|---|---|---|---|
| Male–Democrat | 330 | 368.05 | 3.9339 | 31.30 |
| Female–Democrat | 495 | 456.95 | 3.1686 | 25.21 |
| Male–Republican | 265 | 239.57 | 2.6999 | 21.48 |
| Female–Republican | 272 | 297.43 | 2.1746 | 17.30 |
| Male–Independent | 498 | 485.38 | 0.3281 | 2.61 |
| Female–Independent | 590 | 602.62 | 0.2642 | 2.10 |
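The overall chi-square test shows a significant association between gender and political party identification (\(\chi^2\) = 12.5693, \(df\) = 2, p = 0.0019).
The partition results show that the association is concentrated in the Democrat versus Republican comparison (\(\chi^2\) = 11.5545, \(df\) = 1, p = 0.0007), whereas the split between partisans (Democrat + Republican) and Independents does not differ significantly by gender (\(\chi^2\) = 1.0654, \(df\) = 1, p = 0.302).
The cells that contribute most to the association are Male–Democrat and Female–Democrat, which have the largest standardized residuals (-3.27 and 3.27) and the largest shares of \(\chi^2\) (31.30% and 25.21%).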
| Case | Variables | Test | Result |
|---|---|---|---|
| 1 | Smoking–Cancer | All tests | Significant (p < 0.0001) |
| 2 | Gender–Party | Overall chi-square | Significant (p = 0.0019) |
| 2 | Democrat vs Republican | Partition 1 | Significant (p = 0.0007) |
| 2 | (Dem+Rep) vs Independent | Partition 2 | Not significant (p = 0.302) |