Penelitian ini menerapkan Principal Component Analysis (PCA) dan Factor Analysis (FA) pada data indikator kesehatan global tahun 2019 yang bersumber dari World Bank. Tujuannya adalah mereduksi dimensi data multivariat dan mengidentifikasi faktor-faktor laten yang mendasari variasi kondisi kesehatan antar negara.
Dataset yang digunakan adalah World Bank Health Data yang mencakup berbagai indikator kesehatan dari seluruh dunia. Analisis difokuskan pada data tahun 2019 (pra-pandemi COVID-19).
| Simbol | Nama Variabel | Satuan |
|---|---|---|
| X1 | Pengeluaran kesehatan (health_exp) |
% PDB |
| X2 | Angka harapan hidup (life_expect) |
Tahun |
| X3 | Angka kematian ibu (maternal_mortality) |
per 100.000 KH |
| X4 | Angka kematian bayi (infant_mortality) |
per 1.000 KH |
| X5 | Angka kematian neonatal (neonatal_mortality) |
per 1.000 KH |
| X6 | Angka kematian balita (under_5_mortality) |
per 1.000 KH |
| X7 | Prevalensi HIV (prev_hiv) |
% usia 15–49 |
| X8 | Insidensi tuberkulosis (inci_tuberc) |
per 100.000 |
| X9 | Prevalensi kekurangan gizi (prev_undernourishment) |
% |
df <- read.csv("world_health_data.csv")
vars <- c("health_exp", "life_expect", "maternal_mortality", "infant_mortality",
"neonatal_mortality", "under_5_mortality", "prev_hiv", "inci_tuberc",
"prev_undernourishment")
# Kode agregat World Bank (kawasan/kelompok pendapatan), bukan negara individual
aggregates <- c("LAC","LCN","LIC","LMC","LMY","MEA","MIC","MNA","NAC","SAS","SSA","SSF",
"WLD","ARB","CEB","CSS","EAP","EAS","EAR","ECA","ECS","EMU","EUU","FCS",
"HIC","HPC","IBD","IBT","IDA","IDB","IDX","INX","LDC","LTE","OED","OSS",
"PRE","PSS","PST","TEA","TEC","TLA","TMN","TSA","TSS","UMC","AFE","AFW")
data2019 <- df[df$year == 2019, c("country", "country_code", vars)]
data2019 <- data2019[!(data2019$country_code %in% aggregates), ]
data2019 <- data2019[complete.cases(data2019), ]
rownames(data2019) <- data2019$country
X <- data2019[, vars]
cat("Jumlah negara (observasi) :", nrow(X), "\n")## Jumlah negara (observasi) : 135
## Jumlah variabel : 9
desc_stats <- data.frame(
Variabel = colnames(X),
N = nrow(X),
Mean = round(sapply(X, mean), 2),
SD = round(sapply(X, sd), 2),
Min = round(sapply(X, min), 2),
Max = round(sapply(X, max), 2)
)
knitr::kable(desc_stats, caption = "Tabel 1. Statistika Deskriptif Variabel Penelitian",
align = c("l","c","r","r","r","r"))| Variabel | N | Mean | SD | Min | Max | |
|---|---|---|---|---|---|---|
| health_exp | health_exp | 135 | 6.38 | 2.63 | 2.19 | 16.67 |
| life_expect | life_expect | 135 | 72.11 | 7.37 | 53.26 | 83.90 |
| maternal_mortality | maternal_mortality | 135 | 148.97 | 193.71 | 1.00 | 1047.00 |
| infant_mortality | infant_mortality | 135 | 22.15 | 19.30 | 1.80 | 82.60 |
| neonatal_mortality | neonatal_mortality | 135 | 13.47 | 10.49 | 1.00 | 41.70 |
| under_5_mortality | under_5_mortality | 135 | 29.55 | 28.71 | 2.30 | 122.20 |
| prev_hiv | prev_hiv | 135 | 1.58 | 3.89 | 0.10 | 28.40 |
| inci_tuberc | inci_tuberc | 135 | 115.53 | 142.02 | 0.00 | 615.00 |
| prev_undernourishment | prev_undernourishment | 135 | 10.08 | 9.99 | 2.50 | 43.60 |
Xz <- as.data.frame(scale(X))
Xz_long <- reshape2::melt(Xz)
ggplot(Xz_long, aes(x = variable, y = value)) +
geom_boxplot(fill = "steelblue", alpha = 0.6, outlier.colour = "red", outlier.size = 1.5) +
theme_minimal(base_size = 12) +
labs(title = "Distribusi Variabel Indikator Kesehatan (Z-score)",
x = "", y = "Z-score") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))Gambar 1. Distribusi Variabel Indikator Kesehatan (Z-score)
r_mat <- cor(X)
# 1. Uji Bartlett's Test of Sphericity
bartlett_test <- cortest.bartlett(r_mat, n = nrow(X))
cat("--- Uji Bartlett's Test of Sphericity ---\n")## --- Uji Bartlett's Test of Sphericity ---
cat("Chi-square =", round(bartlett_test$chisq, 3),
"| df =", bartlett_test$df,
"| p-value =", format.pval(bartlett_test$p.value), "\n\n")## Chi-square = 1552.087 | df = 36 | p-value = < 2.22e-16
## --- Uji KMO (Kaiser-Meyer-Olkin) ---
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = r_mat)
## Overall MSA = 0.85
## MSA for each item =
## health_exp life_expect maternal_mortality
## 0.70 0.90 0.90
## infant_mortality neonatal_mortality under_5_mortality
## 0.79 0.93 0.81
## prev_hiv inci_tuberc prev_undernourishment
## 0.60 0.85 0.95
##
## Determinant matriks korelasi: 6.630415e-06
Interpretasi:
pca_model <- prcomp(X, scale. = TRUE)
eig <- get_eigenvalue(pca_model)
knitr::kable(round(eig, 3),
caption = "Tabel 2. Eigenvalue dan Proporsi Varians PCA",
col.names = c("Eigenvalue", "% Varians", "% Varians Kumulatif"))| Eigenvalue | % Varians | % Varians Kumulatif | |
|---|---|---|---|
| Dim.1 | 5.914 | 65.716 | 65.716 |
| Dim.2 | 1.100 | 12.218 | 77.934 |
| Dim.3 | 0.832 | 9.248 | 87.182 |
| Dim.4 | 0.437 | 4.855 | 92.037 |
| Dim.5 | 0.408 | 4.534 | 96.571 |
| Dim.6 | 0.159 | 1.772 | 98.343 |
| Dim.7 | 0.078 | 0.870 | 99.213 |
| Dim.8 | 0.062 | 0.688 | 99.901 |
| Dim.9 | 0.009 | 0.099 | 100.000 |
fviz_eig(pca_model, addlabels = TRUE, ylim = c(0, 70)) +
labs(title = "Scree Plot PCA") +
theme_minimal(base_size = 12)Gambar 3. Scree Plot PCA
loadings_pca <- round(pca_model$rotation[, 1:2], 3)
knitr::kable(as.data.frame(loadings_pca),
caption = "Tabel 3. Loading PCA pada PC1 dan PC2")| PC1 | PC2 | |
|---|---|---|
| health_exp | -0.191 | 0.341 |
| life_expect | -0.395 | 0.009 |
| maternal_mortality | 0.364 | -0.172 |
| infant_mortality | 0.397 | -0.111 |
| neonatal_mortality | 0.390 | -0.132 |
| under_5_mortality | 0.390 | -0.139 |
| prev_hiv | 0.168 | 0.801 |
| inci_tuberc | 0.285 | 0.390 |
| prev_undernourishment | 0.321 | 0.111 |
Interpretasi:
infant_mortality,
neonatal_mortality, under_5_mortality,
maternal_mortality) dan negatif pada
life_expect → merepresentasikan Beban Mortalitas
Umum.prev_hiv dan inci_tuberc → merepresentasikan
Beban Penyakit Menular.fviz_pca_biplot(pca_model,
repel = TRUE,
col.var = "red",
col.ind = "grey60",
label = "var") +
labs(title = "Biplot PCA — Indikator Kesehatan Global 2019") +
theme_minimal(base_size = 11)Gambar 4. Biplot PCA
pc_scores <- as.data.frame(pca_model$x[, 1:2])
pc_scores$country <- rownames(pc_scores)
cat("--- 5 Negara dengan skor PC1 TERTINGGI (kondisi kesehatan terburuk) ---\n")## --- 5 Negara dengan skor PC1 TERTINGGI (kondisi kesehatan terburuk) ---
knitr::kable(head(pc_scores[order(-pc_scores$PC1), c("country","PC1","PC2")], 5),
digits = 3, row.names = FALSE)| country | PC1 | PC2 |
|---|---|---|
| Central African Republic | 6.350 | 0.306 |
| Chad | 6.338 | -1.765 |
| Sierra Leone | 4.992 | -0.223 |
| Guinea-Bissau | 4.962 | 0.039 |
| Congo, Dem. Rep. | 4.857 | -0.777 |
## --- 5 Negara dengan skor PC1 TERENDAH (kondisi kesehatan terbaik) ---
knitr::kable(head(pc_scores[order(pc_scores$PC1), c("country","PC1","PC2")], 5),
digits = 3, row.names = FALSE)| country | PC1 | PC2 |
|---|---|---|
| Switzerland | -2.893 | 0.441 |
| United States | -2.889 | 1.148 |
| Germany | -2.838 | 0.516 |
| France | -2.832 | 0.497 |
| Spain | -2.809 | 0.250 |
Gambar 5. Parallel Analysis untuk Penentuan Jumlah Faktor
## Parallel analysis suggests that the number of factors = 2 and the number of components = NA
##
## Jumlah faktor disarankan: 2
fa_model <- fa(X, nfactors = 2, rotate = "varimax", fm = "ml")
cat("--- Loading Factor Analysis (cutoff 0.3) ---\n")## --- Loading Factor Analysis (cutoff 0.3) ---
##
## Loadings:
## ML1 ML2
## health_exp -0.303 -0.317
## life_expect -0.799 -0.517
## maternal_mortality 0.834
## infant_mortality 0.940 0.333
## neonatal_mortality 0.886 0.370
## under_5_mortality 0.954
## prev_hiv 0.596
## inci_tuberc 0.396 0.651
## prev_undernourishment 0.553 0.494
##
## ML1 ML2
## SS loadings 4.478 1.785
## Proportion Var 0.498 0.198
## Cumulative Var 0.498 0.696
communalities <- data.frame(
Variabel = names(fa_model$communality),
Communality = round(fa_model$communality, 3)
)
knitr::kable(communalities,
caption = "Tabel 4. Communalities Factor Analysis",
align = c("l","c"))| Variabel | Communality | |
|---|---|---|
| health_exp | health_exp | 0.192 |
| life_expect | life_expect | 0.905 |
| maternal_mortality | maternal_mortality | 0.771 |
| infant_mortality | infant_mortality | 0.994 |
| neonatal_mortality | neonatal_mortality | 0.922 |
| under_5_mortality | under_5_mortality | 0.982 |
| prev_hiv | prev_hiv | 0.365 |
| inci_tuberc | inci_tuberc | 0.580 |
| prev_undernourishment | prev_undernourishment | 0.550 |
Interpretasi:
under_5_mortality, infant_mortality,
neonatal_mortality, maternal_mortality, dan
negatif pada life_expect.prev_hiv dan inci_tuberc.health_exp mengindikasikan
variabel ini memiliki varians unik yang tidak dijelaskan kedua
faktor.Gambar 6. Diagram Factor Analysis
fa_scores <- as.data.frame(fa_model$scores)
colnames(fa_scores) <- c("Faktor1", "Faktor2")
fa_scores$country <- rownames(X)
knitr::kable(head(fa_scores[, c("country","Faktor1","Faktor2")], 10),
caption = "Tabel 5. Contoh Skor Faktor (10 Negara Pertama)",
digits = 3, row.names = FALSE)| country | Faktor1 | Faktor2 |
|---|---|---|
| Afghanistan | 1.387 | 0.273 |
| Angola | 1.346 | 0.553 |
| Albania | -0.554 | -0.610 |
| United Arab Emirates | -0.720 | -0.617 |
| Argentina | -0.538 | -0.523 |
| Armenia | -0.486 | -0.461 |
| Australia | -0.683 | -1.039 |
| Azerbaijan | -0.192 | -0.154 |
| Belgium | -0.703 | -0.959 |
| Benin | 2.224 | -0.669 |
Kode lengkap tersedia di RPubs. Dataset: World Bank Health Data 2019.