Laporan ini menyajikan Principal Component Analysis (PCA) yang komprehensif pada dataset komposisi tubuh dan biomarker pasien gallstone menggunakan metodologi 7-Stage.
Temuan Utama:
Catatan: Nilai spesifik pada Executive Summary akan terisi otomatis setelah analisis dijalankan.
# >> GANTI PATH SESUAI LOKASI FILE <<
data_raw <- read_excel("file_show.xlsx")
cat(sprintf("Dimensi data mentah: %d baris × %d kolom\n",
nrow(data_raw), ncol(data_raw)))## Dimensi data mentah: 319 baris × 39 kolom
data.frame(
Informasi = c("Sumber Dataset","Jumlah Observasi (raw)",
"Jumlah Variabel (raw)","Tujuan Analisis",
"Metode","Software"),
Detail = c(
"Data Klinis - Body Composition & Biomarker Gallstone",
paste0(nrow(data_raw), " pasien"),
paste0(ncol(data_raw), " variabel"),
"Reduksi dimensi & identifikasi struktur laten",
"Principal Component Analysis (PCA) + VARIMAX Rotation",
paste0("R ", R.version$major, ".", R.version$minor)
)
) %>% kb(caption = "Informasi Umum Dataset")| Informasi | Detail |
|---|---|
| Sumber Dataset | Data Klinis - Body Composition & Biomarker Gallstone |
| Jumlah Observasi (raw) | 319 pasien |
| Jumlah Variabel (raw) | 39 variabel |
| Tujuan Analisis | Reduksi dimensi & identifikasi struktur laten |
| Metode | Principal Component Analysis (PCA) + VARIMAX Rotation |
| Software | R 4.5.2 |
categorical_vars <- c(
"Gallstone Status",
"Gender",
"Comorbidity",
"Coronary Artery Disease (CAD)",
"Hypothyroidism",
"Hyperlipidemia",
"Diabetes Mellitus (DM)"
)
data_num <- data_raw %>%
select(-any_of(categorical_vars)) %>%
select(where(is.numeric)) %>%
select(where(~ n_distinct(., na.rm = TRUE) > 2)) %>%
na.omit()
data.frame(
No = seq_along(names(data_num)),
Variabel = names(data_num)
) %>%
kb(caption = "Variabel Numerik Kontinu yang Digunakan dalam PCA") %>%
scroll_box(width = "100%", height = "350px")| No | Variabel |
|---|---|
| 1 | Age |
| 2 | Height |
| 3 | Weight |
| 4 | Body Mass Index (BMI) |
| 5 | Total Body Water (TBW) |
| 6 | Extracellular Water (ECW) |
| 7 | Intracellular Water (ICW) |
| 8 | Extracellular Fluid/Total Body Water (ECF/TBW) |
| 9 | Total Body Fat Ratio (TBFR) (%) |
| 10 | Lean Mass (LM) (%) |
| 11 | Body Protein Content (Protein) (%) |
| 12 | Visceral Fat Rating (VFR) |
| 13 | Bone Mass (BM) |
| 14 | Muscle Mass (MM) |
| 15 | Obesity (%) |
| 16 | Total Fat Content (TFC) |
| 17 | Visceral Fat Area (VFA) |
| 18 | Visceral Muscle Area (VMA) (Kg) |
| 19 | Hepatic Fat Accumulation (HFA) |
| 20 | Glucose |
| 21 | Total Cholesterol (TC) |
| 22 | Low Density Lipoprotein (LDL) |
| 23 | High Density Lipoprotein (HDL) |
| 24 | Triglyceride |
| 25 | Aspartat Aminotransferaz (AST) |
| 26 | Alanin Aminotransferaz (ALT) |
| 27 | Alkaline Phosphatase (ALP) |
| 28 | Creatinine |
| 29 | Glomerular Filtration Rate (GFR) |
| 30 | C-Reactive Protein (CRP) |
| 31 | Hemoglobin (HGB) |
| 32 | Vitamin D |
Variabel yang dikeluarkan (binary/kategorikal — tidak cocok untuk PCA): Gallstone Status, Gender, Comorbidity, Coronary Artery Disease (CAD), Hypothyroidism, Hyperlipidemia, Diabetes Mellitus (DM)
desc_stats <- data.frame(
Variabel = names(data_num),
N = sapply(data_num, function(x) sum(!is.na(x))),
Mean = round(sapply(data_num, mean, na.rm = TRUE), 3),
Median = round(sapply(data_num, median, na.rm = TRUE), 3),
SD = round(sapply(data_num, sd, na.rm = TRUE), 3),
Min = round(sapply(data_num, min, na.rm = TRUE), 3),
Max = round(sapply(data_num, max, na.rm = TRUE), 3),
Skewness = round(sapply(data_num, psych::skew), 3),
Kurtosis = round(sapply(data_num, psych::kurtosi), 3)
)
desc_stats %>%
kb(caption = "Statistik Deskriptif Variabel Numerik") %>%
scroll_box(width = "100%", height = "400px")| Variabel | N | Mean | Median | SD | Min | Max | Skewness | Kurtosis | |
|---|---|---|---|---|---|---|---|---|---|
| Age | Age | 319 | 48.069 | 49.000 | 12.115 | 20.00 | 96.00 | 0.133 | 0.349 |
| Height | Height | 319 | 167.157 | 168.000 | 10.053 | 145.00 | 191.00 | -0.080 | -0.701 |
| Weight | Weight | 319 | 80.565 | 78.800 | 15.709 | 42.90 | 143.50 | 0.428 | 0.315 |
| Body Mass Index (BMI) | Body Mass Index (BMI) | 319 | 28.877 | 28.300 | 5.314 | 17.40 | 49.70 | 0.663 | 1.133 |
| Total Body Water (TBW) | Total Body Water (TBW) | 319 | 40.588 | 39.800 | 7.930 | 13.00 | 66.20 | 0.206 | -0.378 |
| Extracellular Water (ECW) | Extracellular Water (ECW) | 319 | 17.071 | 17.100 | 3.162 | 9.00 | 27.80 | 0.020 | -0.371 |
| Intracellular Water (ICW) | Intracellular Water (ICW) | 319 | 23.634 | 23.000 | 5.349 | 13.80 | 57.10 | 0.943 | 3.468 |
| Extracellular Fluid/Total Body Water (ECF/TBW) | Extracellular Fluid/Total Body Water (ECF/TBW) | 319 | 42.212 | 42.000 | 3.244 | 29.23 | 52.00 | -0.509 | 1.374 |
| Total Body Fat Ratio (TBFR) (%) | Total Body Fat Ratio (TBFR) (%) | 319 | 28.275 | 27.820 | 8.444 | 6.30 | 50.92 | 0.133 | -0.525 |
| Lean Mass (LM) (%) | Lean Mass (LM) (%) | 319 | 71.638 | 72.110 | 8.438 | 48.99 | 93.67 | -0.124 | -0.507 |
| Body Protein Content (Protein) (%) | Body Protein Content (Protein) (%) | 319 | 15.939 | 15.870 | 2.335 | 5.56 | 24.81 | -0.047 | 1.820 |
| Visceral Fat Rating (VFR) | Visceral Fat Rating (VFR) | 319 | 9.078 | 9.000 | 4.333 | 1.00 | 31.00 | 0.790 | 2.134 |
| Bone Mass (BM) | Bone Mass (BM) | 319 | 2.803 | 2.800 | 0.509 | 1.40 | 4.00 | 0.205 | -0.825 |
| Muscle Mass (MM) | Muscle Mass (MM) | 319 | 54.273 | 53.900 | 10.604 | 4.70 | 78.80 | -0.102 | 0.359 |
| Obesity (%) | Obesity (%) | 319 | 35.850 | 25.600 | 109.800 | 0.40 | 1954.00 | 16.715 | 288.973 |
| Total Fat Content (TFC) | Total Fat Content (TFC) | 319 | 23.488 | 22.600 | 9.608 | 3.10 | 62.50 | 0.802 | 1.115 |
| Visceral Fat Area (VFA) | Visceral Fat Area (VFA) | 319 | 12.172 | 11.590 | 5.262 | 0.90 | 41.00 | 1.048 | 2.919 |
| Visceral Muscle Area (VMA) (Kg) | Visceral Muscle Area (VMA) (Kg) | 319 | 30.403 | 30.408 | 4.461 | 18.90 | 41.10 | -0.056 | -0.473 |
| Hepatic Fat Accumulation (HFA) | Hepatic Fat Accumulation (HFA) | 319 | 1.150 | 1.000 | 1.059 | 0.00 | 4.00 | 0.174 | -1.357 |
| Glucose | Glucose | 319 | 108.689 | 98.000 | 44.849 | 69.00 | 575.00 | 5.882 | 45.419 |
| Total Cholesterol (TC) | Total Cholesterol (TC) | 319 | 203.495 | 198.000 | 45.759 | 60.00 | 360.00 | 0.430 | 0.485 |
| Low Density Lipoprotein (LDL) | Low Density Lipoprotein (LDL) | 319 | 126.652 | 122.000 | 38.541 | 11.00 | 293.00 | 0.535 | 1.080 |
| High Density Lipoprotein (HDL) | High Density Lipoprotein (HDL) | 319 | 49.476 | 46.500 | 17.719 | 25.00 | 273.00 | 6.468 | 77.178 |
| Triglyceride | Triglyceride | 319 | 144.502 | 119.000 | 97.904 | 1.39 | 838.00 | 2.760 | 12.649 |
| Aspartat Aminotransferaz (AST) | Aspartat Aminotransferaz (AST) | 319 | 21.685 | 18.000 | 16.698 | 8.00 | 195.00 | 6.929 | 60.062 |
| Alanin Aminotransferaz (ALT) | Alanin Aminotransferaz (ALT) | 319 | 26.856 | 19.000 | 27.884 | 3.00 | 372.00 | 7.209 | 76.924 |
| Alkaline Phosphatase (ALP) | Alkaline Phosphatase (ALP) | 319 | 73.113 | 71.000 | 24.181 | 7.00 | 197.00 | 0.788 | 2.507 |
| Creatinine | Creatinine | 319 | 0.801 | 0.790 | 0.176 | 0.46 | 1.46 | 0.615 | 0.135 |
| Glomerular Filtration Rate (GFR) | Glomerular Filtration Rate (GFR) | 319 | 100.819 | 104.000 | 16.971 | 10.60 | 132.00 | -1.796 | 6.391 |
| C-Reactive Protein (CRP) | C-Reactive Protein (CRP) | 319 | 1.854 | 0.215 | 4.990 | 0.00 | 43.40 | 5.360 | 33.233 |
| Hemoglobin (HGB) | Hemoglobin (HGB) | 319 | 14.418 | 14.400 | 1.776 | 8.50 | 18.80 | -0.381 | 0.133 |
| Vitamin D | Vitamin D | 319 | 21.401 | 22.000 | 9.982 | 3.50 | 53.10 | 0.276 | -0.238 |
Variabel dengan |Skewness| > 1 menunjukkan
distribusi menceng. Karena PCA berbasis matriks korelasi menggunakan
scale = TRUE, data distandardisasi (Z-score) sehingga
perbedaan satuan dan skala tidak mempengaruhi hasil analisis.
Tujuan PCA pada dataset ini:
Data Summarization — Mengidentifikasi struktur laten dari variabel komposisi tubuh dan biomarker. Apakah variabel-variabel ini dapat dikelompokkan menjadi dimensi bermakna seperti “komposisi lemak”, “metabolisme”, atau “komposisi otot/tulang”?
Data Reduction — Mengkompres ruang dimensi tinggi menjadi sejumlah kecil Principal Component yang mempertahankan sebagian besar variansi, memungkinkan analisis lebih efisien dan mudah diinterpretasi.
Syarat desain PCA (Hair et al., 2019):
n_raw_d <- nrow(data_num)
p_raw_d <- ncol(data_num)
data.frame(
Kriteria = c("Semua variabel numerik","Obs (n >= 50)",
"Rasio n/p >= 5:1","Tipe analisis",
"Missing values"),
Nilai = c("Ya — semua kontinu",
as.character(n_raw_d),
sprintf("%.1f : 1", n_raw_d / p_raw_d),
"R-type (korelasi antar variabel)",
"Ditangani dengan na.omit()"),
Syarat = c("Wajib", ">= 50", ">= 5:1", "Standard", "Harus ditangani"),
Status = c("✅ Terpenuhi","✅ Terpenuhi",
ifelse(n_raw_d/p_raw_d >= 5,"✅ Terpenuhi","⚠️ Perhatian"),
"✅ Terpenuhi","✅ Terpenuhi")
) %>%
kb(caption = "Stage 2: Design Checklist") %>%
column_spec(4, bold = TRUE,
color = ifelse(
c(TRUE,TRUE, n_raw_d/p_raw_d>=5, TRUE,TRUE),
"darkgreen","darkorange"))| Kriteria | Nilai | Syarat | Status |
|---|---|---|---|
| Semua variabel numerik | Ya — semua kontinu | Wajib | ✅ Terpenuhi | |
| Obs (n >= 50) | 319 | >= 50 | ✅ Terpenuhi | |
| Rasio n/p >= 5:1 | 10.0 : 1 | >= 5:1 | ✅ Terpenuhi | |
| Tipe analisis | R-type (korelasi antar variabel) | Standard | ✅ Terpenuhi | |
| Missing values | Ditangani dengan na.omit() | Harus ditangani | ✅ Terpenuhi | |
data_pca_work <- data_num
# 2a. Zero variance
var_vals <- apply(data_pca_work, 2, var, na.rm = TRUE)
zero_var <- names(var_vals[var_vals < 1e-10])
if (length(zero_var) > 0) {
cat("Drop (zero variance):", paste(zero_var, collapse=", "), "\n")
data_pca_work <- data_pca_work %>% select(-all_of(zero_var))
} else cat("Tidak ada zero-variance variable.\n")## Tidak ada zero-variance variable.
# 2b. Perfect correlation
cor_tmp <- cor(data_pca_work, use = "complete.obs")
perf_idx <- which(abs(cor_tmp) > 0.9999 & upper.tri(cor_tmp), arr.ind = TRUE)
if (nrow(perf_idx) > 0) {
drop_perf <- unique(colnames(cor_tmp)[perf_idx[,2]])
cat("Drop (perfect corr):", paste(drop_perf, collapse=", "), "\n")
data_pca_work <- data_pca_work %>% select(-all_of(drop_perf))
} else cat("Tidak ada perfect correlation.\n")## Tidak ada perfect correlation.
## Variabel setelah pre-cleaning: 32
Syarat: Minimal 30% pasang variabel harus memiliki korelasi signifikan |r| > 0.3. Jika < 30%, PCA tidak tepat digunakan.
Formula Pearson: \[r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2} \cdot \sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}\]
mat_corr <- round(cor(data_pca_work, use = "complete.obs"), 3)
if (any(is.na(mat_corr))) {
na_vars <- names(which(colSums(is.na(mat_corr)) > 0))
data_pca_work <- data_pca_work %>% select(-all_of(na_vars))
mat_corr <- round(cor(data_pca_work, use = "complete.obs"), 3)
}
n_var_c <- ncol(data_pca_work)
n_pairs <- n_var_c * (n_var_c - 1) / 2
mat_abs <- abs(mat_corr); diag(mat_abs) <- 0
n_sig <- sum(mat_abs > 0.3) / 2
pct_sig <- round(n_sig / n_pairs * 100, 1)
data.frame(
Metrik = c("Total variabel","Total pasangan unik",
"Pasangan |r| > 0.3","Persentase signifikan",
"Syarat minimal","Keputusan"),
Nilai = c(n_var_c, n_pairs, n_sig,
paste0(pct_sig, "%"), "> 30%",
ifelse(pct_sig > 30,
"✅ TERPENUHI — Data eligible untuk PCA",
"❌ TIDAK TERPENUHI"))
) %>%
kb(caption = "Penilaian Matriks Korelasi") %>%
row_spec(6, bold = TRUE, color = ifelse(pct_sig > 30,"darkgreen","red"))| Metrik | Nilai |
|---|---|
| Total variabel | 32 |
| Total pasangan unik | 496 |
| Pasangan |r| > 0.3 | 145 |
| Persentase signifikan | 29.2% |
| Syarat minimal | > 30% |
| Keputusan | ❌ TIDAK TERPENUHI | |
corrplot(mat_corr,
method = "color", type = "upper",
tl.cex = 0.58, tl.col = "black",
col = colorRampPalette(c("#2166AC","white","#D6604D"))(200),
title = "Matriks Korelasi — Body Composition & Biomarker",
mar = c(0,0,2,0))Matriks Korelasi — Merah: korelasi positif, Biru: korelasi negatif
Interpretasi: 29.2% pasang variabel memiliki |r| > 0.3, tidak mencapai ambang batas 30%. Heatmap menunjukkan adanya kluster korelasi antar kelompok variabel, mengkonfirmasi adanya struktur laten yang dapat diekstraksi melalui PCA.
Bartlett’s Test menguji apakah matriks korelasi secara signifikan berbeda dari matriks identitas (semua variabel tidak berkorelasi).
Formula: \[\chi^2 = -\left[(n-1) - \frac{2p+5}{6}\right] \ln|R|\]
n_obs_b <- nrow(data_pca_work)
bart_res <- cortest.bartlett(mat_corr, n = n_obs_b, diag = TRUE)
p_val_b <- bart_res$p.value
p_disp_b <- ifelse(is.na(p_val_b) || p_val_b == 0,
"≈ 0 (< 2.2e-16)", format(p_val_b, scientific = TRUE))
b_pass <- is.na(p_val_b) || p_val_b < 0.05
data.frame(
Statistik = c("Chi-square (χ²)","Degrees of Freedom","p-value",
"Level Signifikansi","Keputusan"),
Nilai = c(
round(bart_res$chisq, 4), bart_res$df, p_disp_b, "α = 0.05",
ifelse(b_pass,
"✅ TOLAK H₀ — Korelasi signifikan, PCA dapat dilanjutkan",
"❌ GAGAL TOLAK H₀ — Data tidak cocok untuk PCA")
)
) %>%
kb(caption = "Bartlett's Test of Sphericity") %>%
row_spec(5, bold = TRUE, color = ifelse(b_pass, "darkgreen","red"))| Statistik | Nilai |
|---|---|
| Chi-square (χ²) | 13198.5697 |
| Degrees of Freedom | 496 |
| p-value | ≈ 0 (< 2.2e-16) |
| Level Signifikansi | α = 0.05 |
| Keputusan | ✅ TOLAK H₀ — Korelasi signifikan, PCA dapat dilanjutkan | |
Interpretasi: χ² = 1.319857^{4} dengan df = 496 dan p ≈ 0 (< 2.2e-16). H₀ ditolak — korelasi antar variabel signifikan secara statistik, PCA justified ✅.
Kaiser-Meyer-Olkin (KMO) mengukur proporsi variansi antar variabel yang disebabkan oleh faktor umum. Nilai berkisar 0–1.
| Nilai KMO | Klasifikasi |
|---|---|
| ≥ 0.90 | Marvelous |
| ≥ 0.80 | Meritorious |
| ≥ 0.70 | Middling |
| ≥ 0.60 | Mediocre |
| ≥ 0.50 | Miserable |
| < 0.50 | Unacceptable ❌ → variabel harus dibuang |
Prosedur: Variabel dengan MSA < 0.50 dibuang satu per satu (terkecil dahulu), KMO dihitung ulang setiap iterasi.
kmo_init <- KMO(mat_corr)
kmo_init_kat <- case_when(
kmo_init$MSA >= 0.90 ~ "Marvelous",
kmo_init$MSA >= 0.80 ~ "Meritorious",
kmo_init$MSA >= 0.70 ~ "Middling",
kmo_init$MSA >= 0.60 ~ "Mediocre",
kmo_init$MSA >= 0.50 ~ "Miserable",
TRUE ~ "Unacceptable"
)
cat(sprintf("KMO Overall AWAL: %.4f (%s)\n", kmo_init$MSA, kmo_init_kat))## KMO Overall AWAL: 0.8025 (Meritorious)
msa_awal_df <- data.frame(
No = seq_along(kmo_init$MSAi),
Variabel = names(kmo_init$MSAi),
MSA = round(kmo_init$MSAi, 4),
Kategori = case_when(
kmo_init$MSAi >= 0.90 ~ "Marvelous",
kmo_init$MSAi >= 0.80 ~ "Meritorious",
kmo_init$MSAi >= 0.70 ~ "Middling",
kmo_init$MSAi >= 0.60 ~ "Mediocre",
kmo_init$MSAi >= 0.50 ~ "Miserable",
TRUE ~ "⚠️ Unacceptable (Drop)"
)
) %>% arrange(MSA)
msa_awal_df %>%
kb(caption = "MSA per Variabel — Awal (urut ascending)") %>%
column_spec(4, bold = TRUE,
color = case_when(
msa_awal_df$MSA >= 0.80 ~ "darkgreen",
msa_awal_df$MSA >= 0.60 ~ "#E67E22",
msa_awal_df$MSA >= 0.50 ~ "#D35400",
TRUE ~ "red"
)) %>%
scroll_box(width = "100%", height = "380px")| No | Variabel | MSA | Kategori | |
|---|---|---|---|---|
| Total Cholesterol (TC) | 21 | Total Cholesterol (TC) | 0.3981 | ⚠️ Unacceptable (Drop) |
| Low Density Lipoprotein (LDL) | 22 | Low Density Lipoprotein (LDL) | 0.4013 | ⚠️ Unacceptable (Drop) |
| Extracellular Fluid/Total Body Water (ECF/TBW) | 8 | Extracellular Fluid/Total Body Water (ECF/TBW) | 0.4840 | ⚠️ Unacceptable (Drop) |
| Alkaline Phosphatase (ALP) | 27 | Alkaline Phosphatase (ALP) | 0.4933 | ⚠️ Unacceptable (Drop) |
| Age | 1 | Age | 0.5574 | Miserable |
| C-Reactive Protein (CRP) | 30 | C-Reactive Protein (CRP) | 0.5675 | Miserable |
| Glomerular Filtration Rate (GFR) | 29 | Glomerular Filtration Rate (GFR) | 0.5882 | Miserable |
| Aspartat Aminotransferaz (AST) | 25 | Aspartat Aminotransferaz (AST) | 0.5963 | Miserable |
| Triglyceride | 24 | Triglyceride | 0.6454 | Mediocre |
| Alanin Aminotransferaz (ALT) | 26 | Alanin Aminotransferaz (ALT) | 0.6602 | Mediocre |
| High Density Lipoprotein (HDL) | 23 | High Density Lipoprotein (HDL) | 0.6700 | Mediocre |
| Vitamin D | 32 | Vitamin D | 0.7438 | Middling |
| Glucose | 20 | Glucose | 0.7488 | Middling |
| Extracellular Water (ECW) | 6 | Extracellular Water (ECW) | 0.7706 | Middling |
| Body Mass Index (BMI) | 4 | Body Mass Index (BMI) | 0.7795 | Middling |
| Height | 2 | Height | 0.7856 | Middling |
| Visceral Fat Rating (VFR) | 12 | Visceral Fat Rating (VFR) | 0.7894 | Middling |
| Weight | 3 | Weight | 0.8067 | Meritorious |
| Total Body Water (TBW) | 5 | Total Body Water (TBW) | 0.8204 | Meritorious |
| Visceral Fat Area (VFA) | 17 | Visceral Fat Area (VFA) | 0.8240 | Meritorious |
| Body Protein Content (Protein) (%) | 11 | Body Protein Content (Protein) (%) | 0.8244 | Meritorious |
| Intracellular Water (ICW) | 7 | Intracellular Water (ICW) | 0.8263 | Meritorious |
| Lean Mass (LM) (%) | 10 | Lean Mass (LM) (%) | 0.8359 | Meritorious |
| Obesity (%) | 15 | Obesity (%) | 0.8483 | Meritorious |
| Total Body Fat Ratio (TBFR) (%) | 9 | Total Body Fat Ratio (TBFR) (%) | 0.8592 | Meritorious |
| Creatinine | 28 | Creatinine | 0.8636 | Meritorious |
| Total Fat Content (TFC) | 16 | Total Fat Content (TFC) | 0.9237 | Marvelous |
| Hepatic Fat Accumulation (HFA) | 19 | Hepatic Fat Accumulation (HFA) | 0.9326 | Marvelous |
| Hemoglobin (HGB) | 31 | Hemoglobin (HGB) | 0.9450 | Marvelous |
| Bone Mass (BM) | 13 | Bone Mass (BM) | 0.9647 | Marvelous |
| Muscle Mass (MM) | 14 | Muscle Mass (MM) | 0.9680 | Marvelous |
| Visceral Muscle Area (VMA) (Kg) | 18 | Visceral Muscle Area (VMA) (Kg) | 0.9813 | Marvelous |
drop_log <- c()
data_ok <- data_pca_work
iter_c <- 0
max_iter <- ncol(data_pca_work) - 5
cat("--- Proses Iteratif KMO ---\n")## --- Proses Iteratif KMO ---
repeat {
iter_c <- iter_c + 1
if (iter_c > max_iter) { cat("Max iterasi tercapai.\n"); break }
mc <- tryCatch(round(cor(data_ok, "complete.obs"), 3), error = function(e) NULL)
if (is.null(mc)) { cat("Matriks singular.\n"); break }
det_v <- tryCatch(det(mc), error = function(e) NA)
if (is.na(det_v) || det_v < 1e-15) { cat("Hampir singular.\n"); break }
kmo_t <- tryCatch(KMO(mc), error = function(e) NULL)
if (is.null(kmo_t)) { cat("KMO gagal.\n"); break }
msa_c <- kmo_t$MSAi[!is.na(kmo_t$MSAi)]
if (!length(msa_c)) break
min_msa <- min(msa_c)
min_var <- names(which.min(msa_c))
if (min_msa >= 0.50) {
cat(sprintf("Iter %d: Semua MSA >= 0.50 → Selesai ✓\n", iter_c))
break
}
cat(sprintf("Iter %d: Drop '%s' (MSA = %.4f)\n", iter_c, min_var, min_msa))
drop_log <- c(drop_log, min_var)
data_ok <- data_ok %>% select(-all_of(min_var))
}## Matriks singular.
data_pca_f <- data_ok
mat_corr_f <- round(cor(data_pca_f, use = "complete.obs"), 3)
kmo_final <- KMO(mat_corr_f)
p_final <- ncol(data_pca_f)
n_final <- nrow(data_pca_f)
kmo_f_kat <- case_when(
kmo_final$MSA >= 0.90 ~ "Marvelous",
kmo_final$MSA >= 0.80 ~ "Meritorious",
kmo_final$MSA >= 0.70 ~ "Middling",
kmo_final$MSA >= 0.60 ~ "Mediocre",
kmo_final$MSA >= 0.50 ~ "Miserable",
TRUE ~ "Unacceptable"
)data.frame(
Metrik = c("Variabel Awal","Variabel Dibuang (MSA < 0.50)",
"Variabel Final","KMO Overall Final",
"Klasifikasi","Keputusan"),
Nilai = c(
ncol(data_pca_work), length(drop_log), p_final,
round(kmo_final$MSA, 4), kmo_f_kat,
"✅ Sampling adequacy terpenuhi — Lanjutkan PCA"
)
) %>%
kb(caption = "Ringkasan KMO Final") %>%
row_spec(6, bold = TRUE, color = "darkgreen")| Metrik | Nilai |
|---|---|
| Variabel Awal | 32 |
| Variabel Dibuang (MSA < 0.50) | 0 |
| Variabel Final | 32 |
| KMO Overall Final | 0.8025 |
| Klasifikasi | Meritorious |
| Keputusan | ✅ Sampling adequacy terpenuhi — Lanjutkan PCA | |
msa_f_df <- data.frame(
Variable = names(kmo_final$MSAi),
MSA = round(kmo_final$MSAi, 4),
Kategori = case_when(
kmo_final$MSAi >= 0.90 ~ "Marvelous",
kmo_final$MSAi >= 0.80 ~ "Meritorious",
kmo_final$MSAi >= 0.70 ~ "Middling",
kmo_final$MSAi >= 0.60 ~ "Mediocre",
kmo_final$MSAi >= 0.50 ~ "Miserable",
TRUE ~ "Unacceptable"
)
) %>% arrange(MSA) %>%
mutate(Short = abbreviate(Variable, minlength = 12),
Tier = factor(Kategori, levels = c("Unacceptable","Miserable","Mediocre",
"Middling","Meritorious","Marvelous")))
ggplot(msa_f_df, aes(x = reorder(Short, MSA), y = MSA, fill = Tier)) +
geom_col(width = 0.75) +
geom_text(aes(label = sprintf("%.3f", MSA)), hjust = -0.12, size = 2.8) +
geom_hline(yintercept = c(0.50,0.70,0.80,0.90),
linetype = "dashed",
color = c("#E74C3C","#E67E22","#27AE60","#2980B9"),
linewidth = 0.7) +
scale_fill_manual(values = c(
"Unacceptable"="#C0392B","Miserable"="#E67E22","Mediocre"="#F1C40F",
"Middling"="#ABD9E9","Meritorious"="#4575B4","Marvelous"="#1A5276"
)) +
coord_flip() + ylim(0, 1.12) +
labs(title = "KMO — MSA per Variabel (Final)",
subtitle = "Garis: 0.50 (merah) | 0.70 (oranye) | 0.80 (hijau) | 0.90 (biru)",
x = NULL, y = "MSA Value", fill = "Kategori KMO") +
theme_minimal(base_size = 11) +
theme(legend.position = "bottom")MSA per variabel — semua harus >= 0.50
Interpretasi: KMO Overall = 0.8025 (Meritorious) ✅. Tidak ada variabel yang perlu dibuang — semua memiliki MSA ≥ 0.50. Hasil ini menunjukkan bahwa data memiliki struktur faktor yang kuat (meritorious/marvelous) untuk dilanjutkan ke PCA.
n_d <- nrow(data_pca_f)
p_d <- ncol(data_pca_f)
data.frame(
Asumsi = c("1. Sample Size (n >= 50)",
"2. Rasio n/p (>= 5:1)",
"3. Korelasi |r|>0.3 (>30%)",
"4. Bartlett's Test (p < 0.05)",
"5. KMO Overall (>= 0.50)"),
Hasil = c(as.character(n_d),
sprintf("%.1f:1", n_d/p_d),
sprintf("%.1f%%", pct_sig),
p_disp_b,
sprintf("%.4f (%s)", kmo_final$MSA, kmo_f_kat)),
Syarat = c(">= 50",">= 5:1","> 30%","p < 0.05",">= 0.50"),
Status = c(
ifelse(n_d >= 50, "✅ Terpenuhi","❌ Tidak"),
ifelse(n_d/p_d >= 5, "✅ Terpenuhi","❌ Tidak"),
ifelse(pct_sig > 30, "✅ Terpenuhi","❌ Tidak"),
ifelse(b_pass, "✅ Terpenuhi","❌ Tidak"),
ifelse(kmo_final$MSA>=0.5, "✅ Terpenuhi","❌ Tidak")
)
) %>%
kb(caption = "Ringkasan Semua Asumsi PCA") %>%
column_spec(4, bold = TRUE,
color = ifelse(
c(n_d>=50, n_d/p_d>=5, pct_sig>30, b_pass, kmo_final$MSA>=0.5),
"darkgreen","red"))| Asumsi | Hasil | Syarat | Status |
|---|---|---|---|
|
319 | >= 50 | ✅ Terpenuhi | |
|
10.0:1 | >= 5:1 | ✅ Terpenuhi | |
|
29.2% | > 30% | ❌ Tidak | |
|
≈ 0 (< 2.2e-16) | p < 0.05 | ✅ Terpenuhi | |
|
0.8025 (Meritorious) | >= 0.50 | ✅ Terpenuhi | |
Kesimpulan: Terdapat asumsi yang tidak terpenuhi ❌ — periksa kembali sebelum melanjutkan.
Formula PCA:
Transformasi linear: Yi = ei1·Z1 + ei2·Z2 + … + eip·Zp
Loading (korelasi variabel-PC): ρ(Yi, Zk) = eik × √λi
di mana λi = eigenvalue ke-i, eik = elemen ke-k dari eigenvector ke-i
n_kaiser <- sum(eigenvalues > 1)
n_var70 <- which(cum_var >= 0.70)[1]
n_var80 <- which(cum_var >= 0.80)[1]
n_comp <- n_kaiser
eig_df <- data.frame(
Komponen = paste0("PC", 1:length(eigenvalues)),
Eigenvalue = round(eigenvalues, 4),
`Variansi(%)` = round(prop_var * 100, 2),
`Kumulatif(%)` = round(cum_var * 100, 2),
Keputusan = ifelse(eigenvalues > 1, "✅ Retain", "❌ Drop"),
check.names = FALSE
)
head(eig_df, 15) %>%
kb(caption = "Tabel Eigenvalue — 15 Komponen Pertama") %>%
column_spec(5, bold = TRUE,
color = ifelse(
head(eig_df,15)$Keputusan == "✅ Retain",
"darkgreen","red"))| Komponen | Eigenvalue | Variansi(%) | Kumulatif(%) | Keputusan |
|---|---|---|---|---|
| PC1 | 8.8113 | 27.54 | 27.54 | ✅ Retain | |
| PC2 | 6.3617 | 19.88 | 47.42 | ✅ Retain | |
| PC3 | 2.3086 | 7.21 | 54.63 | ✅ Retain | |
| PC4 | 1.8937 | 5.92 | 60.55 | ✅ Retain | |
| PC5 | 1.6441 | 5.14 | 65.69 | ✅ Retain | |
| PC6 | 1.4211 | 4.44 | 70.13 | ✅ Retain | |
| PC7 | 1.2277 | 3.84 | 73.96 | ✅ Retain | |
| PC8 | 1.0274 | 3.21 | 77.17 | ✅ Retain | |
| PC9 | 0.9704 | 3.03 | 80.21 | ❌ Drop | |
| PC10 | 0.8870 | 2.77 | 82.98 | ❌ Drop | |
| PC11 | 0.8219 | 2.57 | 85.55 | ❌ Drop | |
| PC12 | 0.7358 | 2.30 | 87.85 | ❌ Drop | |
| PC13 | 0.6433 | 2.01 | 89.86 | ❌ Drop | |
| PC14 | 0.6075 | 1.90 | 91.76 | ❌ Drop | |
| PC15 | 0.4319 | 1.35 | 93.11 | ❌ Drop | |
fviz_eig(pca_result, ncp = p_final, addlabels = TRUE,
barfill = "#3498DB", barcolor = "#2980B9", linecolor = "#E74C3C",
main = "Scree Plot — PCA Gallstone Dataset") +
geom_hline(yintercept = 100/p_final,
linetype = "dashed", color = "#E74C3C", linewidth = 0.9) +
annotate("text",
x = n_comp + 0.7,
y = eigenvalues[n_comp]/sum(eigenvalues)*100 + 1.5,
label = sprintf("Retain PC1–PC%d\n(%.2f%%)", n_comp, cum_var[n_comp]*100),
color = "#1A5276", size = 3.5, hjust = 0) +
theme_minimal(base_size = 13) +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))Scree Plot — Garis putus merah = rata-rata eigenvalue (kriteria Kaiser Rule)
data.frame(
Kriteria = c("Kaiser's Rule (eigenvalue > 1)",
"Variansi Kumulatif >= 70%",
"Variansi Kumulatif >= 80%"),
N_Komponen = c(n_kaiser, n_var70, n_var80),
Kumulatif = c(
sprintf("%.2f%%", cum_var[n_kaiser]*100),
sprintf("%.2f%%", cum_var[n_var70]*100),
sprintf("%.2f%%", cum_var[n_var80]*100)
),
Peran = c("✅ Kriteria UTAMA","— Pendukung","— Pendukung")
) %>%
kb(caption = "Perbandingan Kriteria Retensi Komponen") %>%
row_spec(1, bold = TRUE, background = "#EBF5FB")| Kriteria | N_Komponen | Kumulatif | Peran |
|---|---|---|---|
| Kaiser’s Rule (eigenvalue > 1) | 8 | 77.17% | ✅ Kriteria UTAMA | |
| Variansi Kumulatif >= 70% | 6 | 70.13% | — Pendukung |
| Variansi Kumulatif >= 80% | 9 | 80.21% | — Pendukung |
Keputusan: Berdasarkan Kaiser’s Rule sebagai kriteria utama, 8 principal component dipertahankan (PC1–PC8, semua eigenvalue > 1). Komponen ini secara total menjelaskan 77.17% variansi data.
loadings_mat <- pca_result$rotation[,1:n_comp] %*% diag(sqrt(eigenvalues[1:n_comp]))
colnames(loadings_mat) <- paste0("PC", 1:n_comp)
h2_u <- rowSums(loadings_mat^2)
load_df <- as.data.frame(round(loadings_mat, 3))
load_df$h2 <- round(h2_u, 3)
load_df <- load_df %>% arrange(desc(h2))
low_h2_vars <- rownames(load_df)[load_df$h2 < 0.50]
load_df %>%
kb(caption = "Unrotated Component Loading Matrix + Communality (h²)") %>%
column_spec(n_comp + 2, bold = TRUE,
color = ifelse(load_df$h2 >= 0.70, "darkgreen",
ifelse(load_df$h2 >= 0.50, "#E67E22", "red"))) %>%
scroll_box(width = "100%", height = "450px")| PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | h2 | |
|---|---|---|---|---|---|---|---|---|---|
| Weight | -0.885 | 0.398 | 0.130 | -0.012 | 0.099 | 0.010 | -0.102 | -0.005 | 0.979 |
| Total Cholesterol (TC) | 0.001 | 0.063 | -0.696 | -0.547 | 0.355 | 0.099 | -0.174 | 0.045 | 0.955 |
| Muscle Mass (MM) | -0.938 | -0.215 | 0.078 | 0.022 | 0.065 | 0.034 | -0.075 | 0.015 | 0.943 |
| Lean Mass (LM) (%) | -0.048 | -0.960 | -0.047 | 0.079 | -0.051 | 0.053 | 0.058 | 0.016 | 0.941 |
| Total Body Fat Ratio (TBFR) (%) | 0.044 | 0.959 | 0.057 | -0.072 | 0.048 | -0.047 | -0.060 | -0.007 | 0.938 |
| Extracellular Water (ECW) | -0.914 | -0.003 | 0.001 | 0.015 | -0.052 | 0.277 | 0.115 | -0.049 | 0.931 |
| Total Fat Content (TFC) | -0.360 | 0.878 | 0.122 | -0.041 | 0.071 | -0.033 | -0.090 | -0.009 | 0.930 |
| Alanin Aminotransferaz (ALT) | -0.377 | -0.141 | 0.021 | -0.497 | -0.687 | -0.172 | -0.110 | 0.013 | 0.923 |
| Body Mass Index (BMI) | -0.501 | 0.809 | 0.075 | 0.008 | 0.011 | -0.030 | -0.048 | 0.004 | 0.914 |
| Visceral Fat Area (VFA) | -0.655 | 0.679 | 0.097 | -0.003 | 0.068 | 0.024 | -0.086 | -0.015 | 0.911 |
| Total Body Water (TBW) | -0.928 | -0.170 | 0.061 | 0.030 | 0.027 | 0.092 | -0.020 | -0.013 | 0.905 |
| Aspartat Aminotransferaz (AST) | -0.274 | -0.139 | 0.027 | -0.523 | -0.718 | -0.103 | -0.070 | 0.053 | 0.903 |
| Visceral Fat Rating (VFR) | -0.694 | 0.517 | -0.240 | 0.197 | -0.152 | 0.080 | -0.001 | 0.025 | 0.876 |
| Low Density Lipoprotein (LDL) | -0.019 | -0.014 | -0.644 | -0.560 | 0.300 | 0.194 | -0.135 | -0.032 | 0.875 |
| Intracellular Water (ICW) | -0.871 | -0.288 | 0.132 | -0.009 | 0.088 | 0.001 | -0.068 | 0.015 | 0.871 |
| Height | -0.661 | -0.577 | 0.110 | -0.031 | 0.149 | 0.048 | -0.094 | -0.039 | 0.817 |
| Visceral Muscle Area (VMA) (Kg) | -0.867 | -0.202 | 0.052 | 0.025 | 0.063 | -0.019 | -0.102 | 0.002 | 0.811 |
| Glomerular Filtration Rate (GFR) | 0.079 | -0.127 | 0.555 | -0.567 | 0.256 | 0.103 | 0.254 | -0.052 | 0.794 |
| Bone Mass (BM) | -0.853 | -0.185 | 0.043 | 0.035 | 0.066 | 0.129 | 0.013 | 0.010 | 0.786 |
| Extracellular Fluid/Total Body Water (ECF/TBW) | 0.173 | 0.493 | -0.225 | -0.002 | -0.251 | 0.468 | 0.360 | -0.125 | 0.750 |
| Age | -0.006 | 0.350 | -0.633 | 0.369 | -0.278 | 0.017 | -0.079 | 0.071 | 0.748 |
| Body Protein Content (Protein) (%) | -0.004 | -0.738 | -0.201 | 0.153 | 0.035 | -0.219 | -0.228 | 0.075 | 0.715 |
| Creatinine | -0.458 | -0.430 | -0.383 | 0.344 | -0.122 | 0.001 | -0.097 | 0.046 | 0.687 |
| Obesity (%) | -0.039 | 0.204 | 0.018 | -0.016 | -0.047 | -0.233 | 0.213 | 0.713 | 0.654 |
| Triglyceride | -0.433 | -0.015 | -0.340 | -0.171 | 0.226 | -0.391 | 0.284 | -0.014 | 0.617 |
| Alkaline Phosphatase (ALP) | -0.061 | 0.152 | -0.177 | -0.079 | -0.157 | -0.410 | 0.087 | -0.563 | 0.581 |
| Hemoglobin (HGB) | -0.591 | -0.410 | -0.162 | -0.077 | -0.003 | 0.039 | 0.171 | 0.018 | 0.580 |
| Glucose | -0.198 | 0.093 | -0.320 | 0.097 | 0.103 | -0.413 | 0.483 | -0.029 | 0.575 |
| High Density Lipoprotein (HDL) | 0.415 | 0.200 | -0.078 | -0.090 | -0.027 | 0.182 | -0.419 | 0.231 | 0.489 |
| Hepatic Fat Accumulation (HFA) | -0.514 | 0.400 | -0.043 | 0.008 | -0.018 | -0.079 | 0.099 | 0.137 | 0.461 |
| Vitamin D | 0.010 | -0.125 | -0.135 | 0.139 | -0.181 | 0.533 | 0.182 | -0.148 | 0.425 |
| C-Reactive Protein (CRP) | 0.036 | 0.184 | 0.082 | 0.161 | 0.053 | -0.256 | -0.456 | -0.254 | 0.408 |
heat_df <- load_df %>%
select(-h2) %>%
rownames_to_column("Variable") %>%
pivot_longer(-Variable, names_to = "PC", values_to = "Loading")
ggplot(heat_df, aes(x = PC, y = Variable, fill = Loading)) +
geom_tile(color = "white") +
geom_text(aes(label = sprintf("%.2f", Loading)), size = 2.5) +
scale_fill_gradient2(low = "#2166AC", mid = "white", high = "#D6604D",
midpoint = 0, limits = c(-1,1)) +
labs(title = "Heatmap Component Loadings (Unrotated)",
x = "Principal Component", y = NULL) +
theme_minimal(base_size = 9) +
theme(axis.text.y = element_text(size = 7))Heatmap Loading Unrotated
Communality (h²): Proporsi variansi tiap variabel yang dijelaskan oleh 8 komponen yang dipertahankan.
Rotasi VARIMAX (ortogonal): Memaksimalkan variansi loading per kolom sehingga setiap variabel idealnya hanya memiliki loading tinggi pada satu faktor. Total variansi yang dijelaskan tidak berubah setelah rotasi — hanya redistribusi antar faktor untuk kemudahan interpretasi.
pca_rot <- principal(data_pca_f, nfactors = n_comp,
rotate = "varimax", scores = TRUE)
rot_load <- unclass(pca_rot$loadings)[, 1:n_comp]
colnames(rot_load) <- paste0("RC", 1:n_comp)
h2_rot <- pca_rot$communality
rot_df <- as.data.frame(round(rot_load, 3))
rot_df$h2 <- round(h2_rot, 3)rot_df %>%
kb(caption = "Rotated Component Matrix (VARIMAX) + Communality") %>%
scroll_box(width = "100%", height = "450px")| RC1 | RC2 | RC3 | RC4 | RC5 | RC6 | RC7 | RC8 | h2 | |
|---|---|---|---|---|---|---|---|---|---|
| Age | -0.145 | 0.214 | 0.792 | 0.121 | 0.004 | 0.091 | 0.172 | 0.030 | 0.748 |
| Height | 0.820 | -0.329 | -0.135 | 0.021 | 0.033 | -0.005 | -0.113 | -0.059 | 0.817 |
| Weight | 0.731 | 0.654 | 0.024 | -0.008 | 0.042 | 0.050 | -0.106 | -0.004 | 0.979 |
| Body Mass Index (BMI) | 0.231 | 0.918 | 0.107 | -0.029 | 0.019 | 0.061 | -0.047 | 0.018 | 0.914 |
| Total Body Water (TBW) | 0.936 | 0.115 | 0.024 | -0.028 | 0.097 | 0.070 | 0.011 | -0.004 | 0.905 |
| Extracellular Water (ECW) | 0.879 | 0.260 | 0.043 | -0.003 | 0.109 | 0.073 | 0.269 | -0.001 | 0.931 |
| Intracellular Water (ICW) | 0.916 | 0.000 | -0.061 | -0.034 | 0.091 | 0.057 | -0.122 | 0.010 | 0.871 |
| Extracellular Fluid/Total Body Water (ECF/TBW) | -0.304 | 0.381 | 0.123 | 0.064 | 0.021 | 0.026 | 0.701 | -0.023 | 0.750 |
| Total Body Fat Ratio (TBFR) (%) | -0.322 | 0.908 | 0.035 | 0.043 | -0.054 | -0.013 | -0.053 | 0.000 | 0.938 |
| Lean Mass (LM) (%) | 0.328 | -0.907 | -0.034 | -0.054 | 0.054 | 0.004 | 0.055 | 0.008 | 0.941 |
| Body Protein Content (Protein) (%) | 0.203 | -0.732 | 0.210 | 0.054 | -0.001 | 0.005 | -0.300 | -0.011 | 0.715 |
| Visceral Fat Rating (VFR) | 0.480 | 0.637 | 0.440 | 0.010 | 0.071 | 0.129 | 0.149 | 0.036 | 0.876 |
| Bone Mass (BM) | 0.878 | 0.077 | 0.007 | -0.004 | 0.044 | 0.063 | 0.048 | 0.025 | 0.786 |
| Muscle Mass (MM) | 0.958 | 0.079 | 0.013 | -0.018 | 0.092 | 0.059 | -0.083 | 0.008 | 0.943 |
| Obesity (%) | -0.075 | 0.179 | 0.058 | -0.068 | 0.080 | 0.134 | -0.143 | 0.751 | 0.654 |
| Total Fat Content (TFC) | 0.090 | 0.955 | 0.033 | -0.001 | -0.023 | 0.007 | -0.090 | -0.004 | 0.930 |
| Visceral Fat Area (VFA) | 0.431 | 0.847 | 0.068 | -0.001 | 0.004 | 0.029 | -0.053 | -0.009 | 0.911 |
| Visceral Muscle Area (VMA) (Kg) | 0.880 | 0.068 | 0.039 | -0.007 | 0.092 | 0.071 | -0.128 | -0.012 | 0.811 |
| Hepatic Fat Accumulation (HFA) | 0.336 | 0.510 | 0.122 | 0.000 | 0.070 | 0.191 | -0.009 | 0.178 | 0.461 |
| Glucose | 0.057 | 0.075 | 0.147 | 0.029 | -0.075 | 0.724 | -0.005 | 0.114 | 0.575 |
| Total Cholesterol (TC) | -0.033 | 0.030 | 0.063 | 0.972 | -0.002 | 0.063 | -0.022 | 0.010 | 0.955 |
| Low Density Lipoprotein (LDL) | 0.015 | -0.028 | 0.006 | 0.928 | 0.035 | 0.032 | 0.084 | -0.056 | 0.875 |
| High Density Lipoprotein (HDL) | -0.402 | 0.079 | 0.127 | 0.219 | -0.001 | -0.489 | -0.094 | 0.098 | 0.489 |
| Triglyceride | 0.319 | 0.068 | 0.025 | 0.291 | 0.025 | 0.628 | -0.152 | 0.084 | 0.617 |
| Aspartat Aminotransferaz (AST) | 0.161 | -0.029 | -0.040 | 0.020 | 0.933 | -0.028 | 0.035 | 0.037 | 0.903 |
| Alanin Aminotransferaz (ALT) | 0.255 | -0.003 | -0.011 | 0.016 | 0.925 | 0.015 | -0.042 | -0.010 | 0.923 |
| Alkaline Phosphatase (ALP) | -0.099 | 0.140 | 0.089 | -0.001 | 0.225 | 0.474 | -0.100 | -0.508 | 0.581 |
| Creatinine | 0.539 | -0.353 | 0.512 | 0.018 | 0.017 | 0.093 | 0.024 | 0.000 | 0.687 |
| Glomerular Filtration Rate (GFR) | -0.002 | -0.022 | -0.886 | 0.030 | 0.057 | -0.030 | 0.035 | 0.044 | 0.794 |
| C-Reactive Protein (CRP) | -0.077 | 0.177 | 0.164 | -0.090 | -0.064 | -0.134 | -0.423 | -0.368 | 0.408 |
| Hemoglobin (HGB) | 0.647 | -0.237 | 0.012 | 0.121 | 0.137 | 0.231 | 0.116 | 0.068 | 0.580 |
| Vitamin D | 0.075 | -0.145 | 0.107 | -0.018 | -0.042 | -0.138 | 0.595 | -0.108 | 0.425 |
data.frame(
Unrotated = paste0("PC", 1:n_comp),
Var_Unrot = paste0(round(prop_var[1:n_comp]*100, 2), "%"),
Rotated = paste0("RC", 1:n_comp),
Var_Rot = paste0(round(pca_rot$Vaccounted[2,1:n_comp]*100, 2), "%"),
Kumulatif = paste0(round(pca_rot$Vaccounted[3,1:n_comp]*100, 2), "%")
) %>%
kb(caption = "Perbandingan Variansi Unrotated vs Rotated (Total tidak berubah)")| Unrotated | Var_Unrot | Rotated | Var_Rot | Kumulatif |
|---|---|---|---|---|
| PC1 | 27.54% | RC1 | 25.65% | 25.65% |
| PC2 | 19.88% | RC2 | 20.16% | 45.82% |
| PC3 | 7.21% | RC3 | 6.51% | 52.33% |
| PC4 | 5.92% | RC4 | 6.25% | 58.58% |
| PC5 | 5.14% | RC5 | 5.9% | 64.48% |
| PC6 | 4.44% | RC6 | 5.01% | 69.49% |
| PC7 | 3.84% | RC7 | 4.41% | 73.89% |
| PC8 | 3.21% | RC8 | 3.28% | 77.17% |
rot_long <- rot_df %>%
select(-h2) %>%
rownames_to_column("Variable") %>%
pivot_longer(-Variable, names_to = "RC", values_to = "Loading")
ggplot(rot_long, aes(x = RC, y = Variable, fill = Loading)) +
geom_tile(color = "white") +
geom_text(aes(label = ifelse(abs(Loading) >= 0.30,
sprintf("%.2f", Loading), "")),
size = 2.5) +
scale_fill_gradient2(low = "#2166AC", mid = "white", high = "#D6604D",
midpoint = 0, limits = c(-1,1)) +
labs(title = "Heatmap Rotated Component Loadings (VARIMAX)",
subtitle = "Teks hanya jika |loading| >= 0.30",
x = "Rotated Component", y = NULL) +
theme_minimal(base_size = 9) +
theme(axis.text.y = element_text(size = 7))Heatmap Rotated VARIMAX — Teks ditampilkan jika |loading| >= 0.30
dom_results <- list()
for (k in 1:n_comp) {
ld <- rot_load[, k]
dom <- sort(ld[abs(ld) > 0.40], decreasing = TRUE)
if (length(dom) > 0) {
dom_results[[k]] <- data.frame(
Faktor = paste0("RC", k),
Variabel = names(dom),
Loading = round(dom, 4)
)
}
}
if (length(dom_results) > 0) {
bind_rows(dom_results) %>%
kb(caption = "Variabel Dominan per Faktor (|loading| > 0.40)") %>%
scroll_box(width = "100%", height = "400px")
}| Faktor | Variabel | Loading | |
|---|---|---|---|
| Muscle Mass (MM) | RC1 | Muscle Mass (MM) | 0.9579 |
| Total Body Water (TBW) | RC1 | Total Body Water (TBW) | 0.9356 |
| Intracellular Water (ICW) | RC1 | Intracellular Water (ICW) | 0.9163 |
| Visceral Muscle Area (VMA) (Kg) | RC1 | Visceral Muscle Area (VMA) (Kg) | 0.8803 |
| Extracellular Water (ECW) | RC1 | Extracellular Water (ECW) | 0.8788 |
| Bone Mass (BM) | RC1 | Bone Mass (BM) | 0.8780 |
| Height | RC1 | Height | 0.8204 |
| Weight…8 | RC1 | Weight | 0.7315 |
| Hemoglobin (HGB) | RC1 | Hemoglobin (HGB) | 0.6466 |
| Creatinine…10 | RC1 | Creatinine | 0.5388 |
| Visceral Fat Rating (VFR)…11 | RC1 | Visceral Fat Rating (VFR) | 0.4804 |
| Visceral Fat Area (VFA)…12 | RC1 | Visceral Fat Area (VFA) | 0.4307 |
| High Density Lipoprotein (HDL)…13 | RC1 | High Density Lipoprotein (HDL) | -0.4016 |
| Total Fat Content (TFC) | RC2 | Total Fat Content (TFC) | 0.9553 |
| Body Mass Index (BMI) | RC2 | Body Mass Index (BMI) | 0.9177 |
| Total Body Fat Ratio (TBFR) (%) | RC2 | Total Body Fat Ratio (TBFR) (%) | 0.9082 |
| Visceral Fat Area (VFA)…17 | RC2 | Visceral Fat Area (VFA) | 0.8470 |
| Weight…18 | RC2 | Weight | 0.6540 |
| Visceral Fat Rating (VFR)…19 | RC2 | Visceral Fat Rating (VFR) | 0.6375 |
| Hepatic Fat Accumulation (HFA) | RC2 | Hepatic Fat Accumulation (HFA) | 0.5100 |
| Body Protein Content (Protein) (%) | RC2 | Body Protein Content (Protein) (%) | -0.7324 |
| Lean Mass (LM) (%) | RC2 | Lean Mass (LM) (%) | -0.9073 |
| Age | RC3 | Age | 0.7925 |
| Creatinine…24 | RC3 | Creatinine | 0.5117 |
| Visceral Fat Rating (VFR)…25 | RC3 | Visceral Fat Rating (VFR) | 0.4395 |
| Glomerular Filtration Rate (GFR) | RC3 | Glomerular Filtration Rate (GFR) | -0.8864 |
| Total Cholesterol (TC) | RC4 | Total Cholesterol (TC) | 0.9721 |
| Low Density Lipoprotein (LDL) | RC4 | Low Density Lipoprotein (LDL) | 0.9281 |
| Aspartat Aminotransferaz (AST) | RC5 | Aspartat Aminotransferaz (AST) | 0.9332 |
| Alanin Aminotransferaz (ALT) | RC5 | Alanin Aminotransferaz (ALT) | 0.9250 |
| Glucose | RC6 | Glucose | 0.7242 |
| Triglyceride | RC6 | Triglyceride | 0.6282 |
| Alkaline Phosphatase (ALP)…33 | RC6 | Alkaline Phosphatase (ALP) | 0.4739 |
| High Density Lipoprotein (HDL)…34 | RC6 | High Density Lipoprotein (HDL) | -0.4891 |
| Extracellular Fluid/Total Body Water (ECF/TBW) | RC7 | Extracellular Fluid/Total Body Water (ECF/TBW) | 0.7012 |
| Vitamin D | RC7 | Vitamin D | 0.5949 |
| C-Reactive Protein (CRP) | RC7 | C-Reactive Protein (CRP) | -0.4226 |
| Obesity (%) | RC8 | Obesity (%) | 0.7508 |
| Alkaline Phosphatase (ALP)…39 | RC8 | Alkaline Phosphatase (ALP) | -0.5085 |
fviz_pca_biplot(pca_result,
axes = c(1,2), geom.ind = "point",
col.ind = "grey60", alpha.ind = 0.3,
col.var = "contrib",
gradient.cols = c("#00AFBB","#E7B800","#FC4E07"),
repel = TRUE, label = "var",
arrowsize = 0.8, labelsize = 3.2) +
labs(title = "Biplot PCA: PC1 vs PC2",
subtitle = sprintf("PC1 = %.1f%% | PC2 = %.1f%% | Total = %.1f%%",
prop_var[1]*100, prop_var[2]*100,
(prop_var[1]+prop_var[2])*100),
color = "Kontribusi (%)") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(face="bold", hjust=0.5))Biplot PCA — Arah dan panjang panah menunjukkan korelasi variabel dengan PC
Interpretasi Biplot: - Panjang panah: Semakin panjang, semakin besar kontribusi variabel ke PC yang ditampilkan - Arah panah: Variabel yang berpanah ke arah sama memiliki korelasi positif; berlawanan arah = korelasi negatif - Warna panah: Gradien sesuai kontribusi (hijau=rendah, merah=tinggi) - Titik: Representasi tiap observasi dalam ruang dua dimensi pertama
if (n_comp >= 3) {
fviz_pca_biplot(pca_result, axes = c(1,3),
geom.ind = "point", col.ind = "grey60", alpha.ind = 0.3,
col.var = "contrib",
gradient.cols = c("#00AFBB","#E7B800","#FC4E07"),
repel = TRUE, label = "var",
arrowsize = 0.8, labelsize = 3.2) +
labs(title = "Biplot PCA: PC1 vs PC3",
subtitle = sprintf("PC1 = %.1f%% | PC3 = %.1f%%",
prop_var[1]*100, prop_var[3]*100)) +
theme_minimal(base_size = 12)
}Biplot PCA — PC1 vs PC3
fviz_contrib(pca_result, choice = "var", axes = 1:2, top = 20,
fill = "#3498DB", color = "#2980B9") +
labs(title = "Top 20 Kontribusi Variabel ke PC1 & PC2",
subtitle = "Garis merah = kontribusi rata-rata (jika semua setara)") +
theme_minimal(base_size = 11)Top 20 variabel dengan kontribusi tertinggi ke PC1 dan PC2
fviz_cos2(pca_result, choice = "var", axes = 1:2,
fill = "#ABD9E9", color = "#4575B4") +
labs(title = "Cos² Variabel pada PC1 + PC2",
subtitle = "Cos² tinggi = variabel terwakili baik di dimensi pertama") +
theme_minimal(base_size = 11)Cos² per variabel — semakin tinggi semakin baik representasinya di 2D pertama
Validasi Split-Sample: Data dibagi secara acak (50:50). PCA/FA dijalankan pada masing-masing bagian. Jika selisih variansi per faktor < 5%, hasil dianggap stabil dan konsisten.
set.seed(42)
idx_split <- sample(1:nrow(data_pca_f), nrow(data_pca_f)/2)
data_s1 <- data_pca_f[idx_split, ]
data_s2 <- data_pca_f[-idx_split, ]
fa_s1 <- principal(data_s1, nfactors = n_comp, rotate = "varimax")
fa_s2 <- principal(data_s2, nfactors = n_comp, rotate = "varimax")
n_show <- min(n_comp, 5)
var_s1 <- round(fa_s1$Vaccounted[2,1:n_show]*100, 2)
var_s2 <- round(fa_s2$Vaccounted[2,1:n_show]*100, 2)
sel_s <- round(abs(var_s1-var_s2), 2)
data.frame(
Faktor = paste0("RC", 1:n_show),
`Sampel 1 (n)` = paste0(var_s1, "%"),
`Sampel 2 (n)` = paste0(var_s2, "%"),
Selisih = paste0(sel_s, "%"),
Konsisten = ifelse(sel_s < 5, "✅ Ya","❌ Tidak"),
check.names = FALSE
) %>%
kb(caption = sprintf("Validasi Split-Sample: Sampel 1 (n=%d) vs Sampel 2 (n=%d)",
nrow(data_s1), nrow(data_s2))) %>%
column_spec(5, bold = TRUE,
color = ifelse(sel_s < 5, "darkgreen","red"))| Faktor | Sampel 1 (n) | Sampel 2 (n) | Selisih | Konsisten | |
|---|---|---|---|---|---|
| RC1 | RC1 | 25.92% | 25.82% | 0.1% | ✅ Ya | |
| RC2 | RC2 | 23.37% | 19.42% | 3.95% | ✅ Ya | |
| RC3 | RC3 | 6.81% | 6.87% | 0.06% | ✅ Ya | |
| RC4 | RC4 | 6.43% | 6.23% | 0.2% | ✅ Ya | |
| RC5 | RC5 | 5.85% | 6.08% | 0.23% | ✅ Ya | |
Interpretasi: Selisih terbesar antar sampel = 3.95%. Hasil konsisten antar split-sample (semua selisih < 5%) → solusi PCA stabil ✅.
pc_scores <- as.data.frame(pca_result$x[, 1:n_comp])
colnames(pc_scores) <- paste0("PC", 1:n_comp)
fa_scores <- as.data.frame(pca_rot$scores)
colnames(fa_scores) <- paste0("Factor_", 1:n_comp)
cat("Verifikasi independensi PC Scores (semua harus mendekati 0):\n")## Verifikasi independensi PC Scores (semua harus mendekati 0):
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## PC1 1 0 0 0 0 0 0 0
## PC2 0 1 0 0 0 0 0 0
## PC3 0 0 1 0 0 0 0 0
## PC4 0 0 0 1 0 0 0 0
## PC5 0 0 0 0 1 0 0 0
## PC6 0 0 0 0 0 1 0 0
## PC7 0 0 0 0 0 0 1 0
## PC8 0 0 0 0 0 0 0 1
write.csv(cbind(data_pca_f, pc_scores), "OUTPUT_PCA_scores.csv", row.names=FALSE)
write.csv(cbind(data_pca_f, fa_scores), "OUTPUT_Factor_scores.csv", row.names=FALSE)
cat("\nFile output berhasil disimpan.\n")##
## File output berhasil disimpan.
data.frame(
Metrik = c(
"Dataset","Observasi (n)","Variabel Awal (numerik)",
"Variabel Final","Variabel Dibuang (KMO)",
"Pasangan |r| > 0.3","KMO Overall (Final)",
"Bartlett χ² / p-value",
"Komponen Dipertahankan","Variansi Total Dijelaskan",
"Validasi Split-Sample"
),
Nilai = c(
"Body Composition & Biomarker — Gallstone",
as.character(n_final), as.character(ncol(data_num)),
as.character(p_final),
ifelse(length(drop_log)>0,
paste0(length(drop_log)," var: ",paste(drop_log,collapse=",")),
"Tidak ada (semua MSA ≥ 0.50)"),
sprintf("%d / %d (%.1f%%)", n_sig, n_pairs, pct_sig),
sprintf("%.4f (%s)", kmo_final$MSA, kmo_f_kat),
sprintf("%.2f / p %s", bart_res$chisq, p_disp_b),
sprintf("%d komponen (PC1–PC%d)", n_comp, n_comp),
sprintf("%.2f%%", cum_var[n_comp]*100),
ifelse(max(sel_s)<5, "Konsisten ✅","Ada perbedaan >5% ⚠️")
)
) %>%
kb(caption = "Ringkasan Akhir Hasil Analisis PCA") %>%
column_spec(1, bold = TRUE, width = "22em") %>%
row_spec(c(9,10,11), background = "#EBF5FB", bold = TRUE)| Metrik | Nilai |
|---|---|
| Dataset | Body Composition & Biomarker — Gallstone |
| Observasi (n) | 319 |
| Variabel Awal (numerik) | 32 |
| Variabel Final | 32 |
| Variabel Dibuang (KMO) | Tidak ada (semua MSA ≥ 0.50) |
| Pasangan |r| > 0.3 | 145 / 496 (29.2%) |
| KMO Overall (Final) | 0.8025 (Meritorious) |
| Bartlett χ² / p-value | 13198.57 / p ≈ 0 (< 2.2e-16) |
| Komponen Dipertahankan | 8 komponen (PC1–PC8) |
| Variansi Total Dijelaskan | 77.17% |
| Validasi Split-Sample | Konsisten ✅ | |
data.frame(
Komponen = paste0("PC", 1:n_comp),
Eigenvalue = round(eigenvalues[1:n_comp], 4),
`Var(%)` = round(prop_var[1:n_comp]*100, 2),
`Kumulatif(%)` = round(cum_var[1:n_comp]*100, 2),
check.names = FALSE
) %>%
kb(caption = "Variansi per Komponen yang Dipertahankan")| Komponen | Eigenvalue | Var(%) | Kumulatif(%) |
|---|---|---|---|
| PC1 | 8.8113 | 27.54 | 27.54 |
| PC2 | 6.3617 | 19.88 | 47.42 |
| PC3 | 2.3086 | 7.21 | 54.63 |
| PC4 | 1.8937 | 5.92 | 60.55 |
| PC5 | 1.6441 | 5.14 | 65.69 |
| PC6 | 1.4211 | 4.44 | 70.13 |
| PC7 | 1.2277 | 3.84 | 73.96 |
| PC8 | 1.0274 | 3.21 | 77.17 |
29.2% pasang variabel berkorelasi signifikan (>30% ✅), Bartlett p ≈ 0 (✅), KMO = 0.803 (Meritorious ✅). Data body composition dan biomarker ini memiliki struktur laten yang kuat, artinya terdapat dimensi-dimensi tersembunyi yang mendasari korelasi antar variabel.
8 komponen menjelaskan 77.17% variansi dari 32 variabel asli. Ini berarti kompleksitas data dapat direduksi secara signifikan tanpa kehilangan banyak informasi.
PC1 mendominasi dengan menjelaskan 27.54% variansi, mengindikasikan adanya dimensi “umum” yang membedakan antar observasi secara paling kuat. Komponen selanjutnya menambahkan dimensi ortogonal yang merepresentasikan aspek berbeda dari data.
Validasi split-sample menunjukkan selisih variansi maksimal 3.95% antar dua sub-sampel → hasil stabil dan dapat diandalkan.
psych — KMO(), cortest.bartlett(), principal()factoextra — fviz_eig(), fviz_pca_biplot(),
fviz_contrib()corrplot — corrplot()kableExtra — formatting tabel HTMLTanggal Analisis: 27 February 2026
Analisis Multivariat — PCA Gallstone Dataset