1 Pendahuluan

Penelitian ini menerapkan Principal Component Analysis (PCA) dan Factor Analysis (FA) pada data indikator kesehatan global tahun 2019 yang bersumber dari World Bank. Tujuannya adalah mereduksi dimensi data multivariat dan mengidentifikasi faktor-faktor laten yang mendasari variasi kondisi kesehatan antar negara.


2 Metodologi

2.1 Dataset

Dataset yang digunakan adalah World Bank Health Data yang mencakup berbagai indikator kesehatan dari seluruh dunia. Analisis difokuskan pada data tahun 2019 (pra-pandemi COVID-19).

2.2 Variabel Penelitian

Simbol Nama Variabel Satuan
X1 Pengeluaran kesehatan (health_exp) % PDB
X2 Angka harapan hidup (life_expect) Tahun
X3 Angka kematian ibu (maternal_mortality) per 100.000 KH
X4 Angka kematian bayi (infant_mortality) per 1.000 KH
X5 Angka kematian neonatal (neonatal_mortality) per 1.000 KH
X6 Angka kematian balita (under_5_mortality) per 1.000 KH
X7 Prevalensi HIV (prev_hiv) % usia 15–49
X8 Insidensi tuberkulosis (inci_tuberc) per 100.000
X9 Prevalensi kekurangan gizi (prev_undernourishment) %

3 Persiapan Data

3.1 Load Library

# install.packages(c("psych","corrplot","ggplot2","GPArotation","factoextra","reshape2"))
library(psych)
library(corrplot)
library(ggplot2)
library(GPArotation)
library(factoextra)
library(reshape2)

3.2 Import & Filter Data

df <- read.csv("world_health_data.csv")

vars <- c("health_exp", "life_expect", "maternal_mortality", "infant_mortality",
          "neonatal_mortality", "under_5_mortality", "prev_hiv", "inci_tuberc",
          "prev_undernourishment")

# Kode agregat World Bank (kawasan/kelompok pendapatan), bukan negara individual
aggregates <- c("LAC","LCN","LIC","LMC","LMY","MEA","MIC","MNA","NAC","SAS","SSA","SSF",
                "WLD","ARB","CEB","CSS","EAP","EAS","EAR","ECA","ECS","EMU","EUU","FCS",
                "HIC","HPC","IBD","IBT","IDA","IDB","IDX","INX","LDC","LTE","OED","OSS",
                "PRE","PSS","PST","TEA","TEC","TLA","TMN","TSA","TSS","UMC","AFE","AFW")

data2019 <- df[df$year == 2019, c("country", "country_code", vars)]
data2019 <- data2019[!(data2019$country_code %in% aggregates), ]
data2019 <- data2019[complete.cases(data2019), ]
rownames(data2019) <- data2019$country

X <- data2019[, vars]

cat("Jumlah negara (observasi) :", nrow(X), "\n")
## Jumlah negara (observasi) : 135
cat("Jumlah variabel           :", ncol(X), "\n")
## Jumlah variabel           : 9

4 Hasil dan Pembahasan

4.1 Karakteristik Data

4.1.1 Statistika Deskriptif

desc_stats <- data.frame(
  Variabel = colnames(X),
  N        = nrow(X),
  Mean     = round(sapply(X, mean), 2),
  SD       = round(sapply(X, sd),   2),
  Min      = round(sapply(X, min),  2),
  Max      = round(sapply(X, max),  2)
)

knitr::kable(desc_stats, caption = "Tabel 1. Statistika Deskriptif Variabel Penelitian",
             align = c("l","c","r","r","r","r"))
Tabel 1. Statistika Deskriptif Variabel Penelitian
Variabel N Mean SD Min Max
health_exp health_exp 135 6.38 2.63 2.19 16.67
life_expect life_expect 135 72.11 7.37 53.26 83.90
maternal_mortality maternal_mortality 135 148.97 193.71 1.00 1047.00
infant_mortality infant_mortality 135 22.15 19.30 1.80 82.60
neonatal_mortality neonatal_mortality 135 13.47 10.49 1.00 41.70
under_5_mortality under_5_mortality 135 29.55 28.71 2.30 122.20
prev_hiv prev_hiv 135 1.58 3.89 0.10 28.40
inci_tuberc inci_tuberc 135 115.53 142.02 0.00 615.00
prev_undernourishment prev_undernourishment 135 10.08 9.99 2.50 43.60

4.1.2 Gambar 1. Distribusi Variabel (Z-score)

Xz      <- as.data.frame(scale(X))
Xz_long <- reshape2::melt(Xz)

ggplot(Xz_long, aes(x = variable, y = value)) +
  geom_boxplot(fill = "steelblue", alpha = 0.6, outlier.colour = "red", outlier.size = 1.5) +
  theme_minimal(base_size = 12) +
  labs(title = "Distribusi Variabel Indikator Kesehatan (Z-score)",
       x = "", y = "Z-score") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Gambar 1. Distribusi Variabel Indikator Kesehatan (Z-score)

Gambar 1. Distribusi Variabel Indikator Kesehatan (Z-score)

4.1.3 Gambar 2. Matriks Korelasi

corrplot(cor(X),
         method       = "color",
         type         = "upper",
         addCoef.col  = "black",
         tl.col       = "black",
         tl.srt       = 45,
         number.cex   = 0.65,
         title        = "Matriks Korelasi Antar Variabel",
         mar          = c(0, 0, 2, 0))
Gambar 2. Matriks Korelasi Antar Variabel

Gambar 2. Matriks Korelasi Antar Variabel


4.2 Uji Asumsi

r_mat <- cor(X)

# 1. Uji Bartlett's Test of Sphericity
bartlett_test <- cortest.bartlett(r_mat, n = nrow(X))
cat("--- Uji Bartlett's Test of Sphericity ---\n")
## --- Uji Bartlett's Test of Sphericity ---
cat("Chi-square =", round(bartlett_test$chisq, 3),
    "| df =", bartlett_test$df,
    "| p-value =", format.pval(bartlett_test$p.value), "\n\n")
## Chi-square = 1552.087 | df = 36 | p-value = < 2.22e-16
# 2. Uji KMO
kmo_test <- KMO(r_mat)
cat("--- Uji KMO (Kaiser-Meyer-Olkin) ---\n")
## --- Uji KMO (Kaiser-Meyer-Olkin) ---
print(kmo_test)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = r_mat)
## Overall MSA =  0.85
## MSA for each item = 
##            health_exp           life_expect    maternal_mortality 
##                  0.70                  0.90                  0.90 
##      infant_mortality    neonatal_mortality     under_5_mortality 
##                  0.79                  0.93                  0.81 
##              prev_hiv           inci_tuberc prev_undernourishment 
##                  0.60                  0.85                  0.95
# 3. Determinant matriks korelasi
cat("\nDeterminant matriks korelasi:", det(r_mat), "\n")
## 
## Determinant matriks korelasi: 6.630415e-06

Interpretasi:

  • Bartlett’s Test: p-value < 0,05 → matriks korelasi bukan matriks identitas, analisis faktor layak dilakukan.
  • KMO: Nilai ≥ 0,80 dikategorikan baik (Kaiser, 1974), artinya kecukupan sampling memadai.
  • Determinant: Mendekati nol namun tidak nol → terdapat korelasi yang memadai tanpa singularitas.

4.3 Principal Component Analysis (PCA)

4.3.1 Eigenvalue & Proporsi Varians

pca_model <- prcomp(X, scale. = TRUE)

eig <- get_eigenvalue(pca_model)
knitr::kable(round(eig, 3),
             caption = "Tabel 2. Eigenvalue dan Proporsi Varians PCA",
             col.names = c("Eigenvalue", "% Varians", "% Varians Kumulatif"))
Tabel 2. Eigenvalue dan Proporsi Varians PCA
Eigenvalue % Varians % Varians Kumulatif
Dim.1 5.914 65.716 65.716
Dim.2 1.100 12.218 77.934
Dim.3 0.832 9.248 87.182
Dim.4 0.437 4.855 92.037
Dim.5 0.408 4.534 96.571
Dim.6 0.159 1.772 98.343
Dim.7 0.078 0.870 99.213
Dim.8 0.062 0.688 99.901
Dim.9 0.009 0.099 100.000

4.3.2 Gambar 3. Scree Plot

fviz_eig(pca_model, addlabels = TRUE, ylim = c(0, 70)) +
  labs(title = "Scree Plot PCA") +
  theme_minimal(base_size = 12)
Gambar 3. Scree Plot PCA

Gambar 3. Scree Plot PCA

4.3.3 Loading PC1 & PC2

loadings_pca <- round(pca_model$rotation[, 1:2], 3)
knitr::kable(as.data.frame(loadings_pca),
             caption = "Tabel 3. Loading PCA pada PC1 dan PC2")
Tabel 3. Loading PCA pada PC1 dan PC2
PC1 PC2
health_exp -0.191 0.341
life_expect -0.395 0.009
maternal_mortality 0.364 -0.172
infant_mortality 0.397 -0.111
neonatal_mortality 0.390 -0.132
under_5_mortality 0.390 -0.139
prev_hiv 0.168 0.801
inci_tuberc 0.285 0.390
prev_undernourishment 0.321 0.111

Interpretasi:

  • PC1 (65,72% varians): loading positif besar pada variabel mortalitas (infant_mortality, neonatal_mortality, under_5_mortality, maternal_mortality) dan negatif pada life_expect → merepresentasikan Beban Mortalitas Umum.
  • PC2 (12,22% varians): loading positif dominan pada prev_hiv dan inci_tuberc → merepresentasikan Beban Penyakit Menular.

4.3.4 Gambar 4. Biplot PCA

fviz_pca_biplot(pca_model,
                repel     = TRUE,
                col.var   = "red",
                col.ind   = "grey60",
                label     = "var") +
  labs(title = "Biplot PCA — Indikator Kesehatan Global 2019") +
  theme_minimal(base_size = 11)
Gambar 4. Biplot PCA

Gambar 4. Biplot PCA

4.3.5 Skor PC per Negara

pc_scores <- as.data.frame(pca_model$x[, 1:2])
pc_scores$country <- rownames(pc_scores)

cat("--- 5 Negara dengan skor PC1 TERTINGGI (kondisi kesehatan terburuk) ---\n")
## --- 5 Negara dengan skor PC1 TERTINGGI (kondisi kesehatan terburuk) ---
knitr::kable(head(pc_scores[order(-pc_scores$PC1), c("country","PC1","PC2")], 5),
             digits = 3, row.names = FALSE)
country PC1 PC2
Central African Republic 6.350 0.306
Chad 6.338 -1.765
Sierra Leone 4.992 -0.223
Guinea-Bissau 4.962 0.039
Congo, Dem. Rep. 4.857 -0.777
cat("--- 5 Negara dengan skor PC1 TERENDAH (kondisi kesehatan terbaik) ---\n")
## --- 5 Negara dengan skor PC1 TERENDAH (kondisi kesehatan terbaik) ---
knitr::kable(head(pc_scores[order(pc_scores$PC1), c("country","PC1","PC2")], 5),
             digits = 3, row.names = FALSE)
country PC1 PC2
Switzerland -2.893 0.441
United States -2.889 1.148
Germany -2.838 0.516
France -2.832 0.497
Spain -2.809 0.250

4.4 Factor Analysis (FA)

4.4.1 Gambar 5. Parallel Analysis

pa <- fa.parallel(X, fa = "fa", n.iter = 100,
                  main = "Parallel Analysis — Penentuan Jumlah Faktor")
Gambar 5. Parallel Analysis untuk Penentuan Jumlah Faktor

Gambar 5. Parallel Analysis untuk Penentuan Jumlah Faktor

## Parallel analysis suggests that the number of factors =  2  and the number of components =  NA
cat("\nJumlah faktor disarankan:", pa$nfact, "\n")
## 
## Jumlah faktor disarankan: 2

4.4.2 Ekstraksi Faktor & Loading

fa_model <- fa(X, nfactors = 2, rotate = "varimax", fm = "ml")

cat("--- Loading Factor Analysis (cutoff 0.3) ---\n")
## --- Loading Factor Analysis (cutoff 0.3) ---
print(fa_model$loadings, cutoff = 0.3)
## 
## Loadings:
##                       ML1    ML2   
## health_exp            -0.303 -0.317
## life_expect           -0.799 -0.517
## maternal_mortality     0.834       
## infant_mortality       0.940  0.333
## neonatal_mortality     0.886  0.370
## under_5_mortality      0.954       
## prev_hiv                      0.596
## inci_tuberc            0.396  0.651
## prev_undernourishment  0.553  0.494
## 
##                  ML1   ML2
## SS loadings    4.478 1.785
## Proportion Var 0.498 0.198
## Cumulative Var 0.498 0.696

4.4.3 Communalities

communalities <- data.frame(
  Variabel    = names(fa_model$communality),
  Communality = round(fa_model$communality, 3)
)
knitr::kable(communalities,
             caption = "Tabel 4. Communalities Factor Analysis",
             align   = c("l","c"))
Tabel 4. Communalities Factor Analysis
Variabel Communality
health_exp health_exp 0.192
life_expect life_expect 0.905
maternal_mortality maternal_mortality 0.771
infant_mortality infant_mortality 0.994
neonatal_mortality neonatal_mortality 0.922
under_5_mortality under_5_mortality 0.982
prev_hiv prev_hiv 0.365
inci_tuberc inci_tuberc 0.580
prev_undernourishment prev_undernourishment 0.550

Interpretasi:

  • Faktor 1 (Kondisi Mortalitas Umum): loading tinggi pada under_5_mortality, infant_mortality, neonatal_mortality, maternal_mortality, dan negatif pada life_expect.
  • Faktor 2 (Beban Penyakit Menular): loading dominan pada prev_hiv dan inci_tuberc.
  • Communality rendah pada health_exp mengindikasikan variabel ini memiliki varians unik yang tidak dijelaskan kedua faktor.

4.4.4 Gambar 6. Diagram Factor Analysis

fa.diagram(fa_model, main = "Diagram Factor Analysis — Varimax")
Gambar 6. Diagram Factor Analysis

Gambar 6. Diagram Factor Analysis

4.4.5 Skor Faktor per Negara

fa_scores <- as.data.frame(fa_model$scores)
colnames(fa_scores) <- c("Faktor1", "Faktor2")
fa_scores$country   <- rownames(X)

knitr::kable(head(fa_scores[, c("country","Faktor1","Faktor2")], 10),
             caption = "Tabel 5. Contoh Skor Faktor (10 Negara Pertama)",
             digits  = 3, row.names = FALSE)
Tabel 5. Contoh Skor Faktor (10 Negara Pertama)
country Faktor1 Faktor2
Afghanistan 1.387 0.273
Angola 1.346 0.553
Albania -0.554 -0.610
United Arab Emirates -0.720 -0.617
Argentina -0.538 -0.523
Armenia -0.486 -0.461
Australia -0.683 -1.039
Azerbaijan -0.192 -0.154
Belgium -0.703 -0.959
Benin 2.224 -0.669

5 Kesimpulan

  1. Data indikator kesehatan 135 negara tahun 2019 menunjukkan ketimpangan yang signifikan, terutama pada variabel mortalitas di negara-negara Afrika Sub-Sahara.
  2. Uji asumsi terpenuhi: KMO = 0,851 (baik) dan Bartlett signifikan (p < 0,001).
  3. PCA menghasilkan 2 komponen utama yang menjelaskan 77,93% total varians: PC1 (Beban Mortalitas, 65,72%) dan PC2 (Beban Penyakit Menular, 12,22%).
  4. FA dengan rotasi varimax mengidentifikasi 2 faktor laten yang menjelaskan 69,5% total varians: Faktor 1 (Kondisi Mortalitas Umum) dan Faktor 2 (Penyakit Menular).
  5. Negara Afrika Sub-Sahara (Central African Republic, Chad) memiliki kondisi terburuk; negara Eropa Barat (Swiss, Jerman) terbaik.

6 Daftar Pustaka

  • Gorsuch, R. L. (2014). Factor analysis (2nd ed.). Lawrence Erlbaum Associates.
  • Johnson, R. A., & Wichern, D. W. (2007). Applied multivariate statistical analysis (6th ed.). Pearson.
  • Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39(1), 31–36.
  • UNDP. (2022). Human Development Report 2021–2022. https://hdr.undp.org
  • World Bank. (2023). World Development Indicators. https://databank.worldbank.org

Kode lengkap tersedia di RPubs. Dataset: World Bank Health Data 2019.