This analysis uses provincial socioeconomic data from BPS (Badan Pusat Statistik). The dataset contains multiple numerical variables representing various social and economic indicators across Indonesian provinces. Non-analytical columns such as province names, year, poverty line values, and poverty headcount figures were excluded from the analysis, as they either serve as identifiers or are directly derived from the target variables. After removing non-analytical variables and performing manual imputation prior to loading, the dataset was ready for multivariate analysis.
df <- read.csv("sps data bps - full imputasi manual - data bps (2).csv",
stringsAsFactors = FALSE, check.names = FALSE)
head(df)
## PROVINSI TAHUN APS 13-15 APS 16-18 APS 19-24 TPT - Februari
## 1 ACEH 2021 98,42 83,28 32,61 6,3
## 2 SUMATERA UTARA 2021 96,99 78,66 27,05 6,01
## 3 SUMATERA BARAT 2021 96,63 84,07 36,41 6,67
## 4 RIAU 2021 95,66 77,81 28,79 4,96
## 5 JAMBI 2021 96,39 72,5 24,14 4,76
## 6 SUMATERA SELATAN 2021 94,85 71,53 18,81 5,17
## TPT - Agustus TPAK - Februari TPAK - Agustus Garis Kemiskinan - Maret (Rp)
## 1 6,3 65,14 63,78 541109
## 2 6,33 69,39 69,1 525756
## 3 6,52 68,41 67,72 568703
## 4 4,42 65,81 65,03 565937
## 5 5,09 67,3 67,17 506355
## 6 4,98 69,95 68,77 457455
## Garis Kemiskinan - September (Rp) Jumlah Penduduk Miskin - Maret (ribu)
## 1 665855 834,24
## 2 648336 1343,86
## 3 714991 370,67
## 4 702620 500,81
## 5 658100 293,86
## 6 564462 1113,76
## Jumlah Penduduk Miskin - September (ribu) Persentase Penduduk Miskin - Maret
## 1 718,96 15,33
## 2 1110,92 9,01
## 3 315,43 6,63
## 4 473,04 7,12
## 5 272,7 8,09
## 6 948,84 12,84
## Persentase Penduduk Miskin - September Indeks Pembangunan Manusia
## 1 12,64 73,48
## 2 7,19 73,84
## 3 5,42 74,56
## 4 6,36 73,89
## 5 7,26 72,62
## 6 10,51 71,83
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi
## 1 9,77
## 2 9,88
## 3 9,46
## 4 9,52
## 5 9,03
## 6 8,78
## Harapan Lama Sekolah (Tahun)\n
## 1 14,36
## 2 13,27
## 3 14,09
## 4 13,28
## 5 13,04
## 6 12,54
summary(df)
## PROVINSI TAHUN APS 13-15 APS 16-18
## Length:152 Min. :2021 Length:152 Length:152
## Class :character 1st Qu.:2022 Class :character Class :character
## Mode :character Median :2022 Mode :character Mode :character
## Mean :2022
## 3rd Qu.:2023
## Max. :2024
## APS 19-24 TPT - Februari TPT - Agustus TPAK - Februari
## Length:152 Length:152 Length:152 Length:152
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## TPAK - Agustus Garis Kemiskinan - Maret (Rp)
## Length:152 Min. : 364251
## Class :character 1st Qu.: 490383
## Mode :character Median : 569536
## Mean : 583579
## 3rd Qu.: 662321
## Max. :1007060
## Garis Kemiskinan - September (Rp) Jumlah Penduduk Miskin - Maret (ribu)
## Min. :460283 Length:152
## 1st Qu.:547751 Class :character
## Median :646222 Mode :character
## Mean :652547
## 3rd Qu.:708811
## Max. :917673
## Jumlah Penduduk Miskin - September (ribu) Persentase Penduduk Miskin - Maret
## Length:152 Length:152
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## Persentase Penduduk Miskin - September Indeks Pembangunan Manusia
## Length:152 Length:152
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi
## Length:152
## Class :character
## Mode :character
##
##
##
## Harapan Lama Sekolah (Tahun)\n
## Length:152
## Class :character
## Mode :character
##
##
##
str(df)
## 'data.frame': 152 obs. of 18 variables:
## $ PROVINSI : chr "ACEH" "SUMATERA UTARA" "SUMATERA BARAT" "RIAU" ...
## $ TAHUN : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
## $ APS 13-15 : chr "98,42" "96,99" "96,63" "95,66" ...
## $ APS 16-18 : chr "83,28" "78,66" "84,07" "77,81" ...
## $ APS 19-24 : chr "32,61" "27,05" "36,41" "28,79" ...
## $ TPT - Februari : chr "6,3" "6,01" "6,67" "4,96" ...
## $ TPT - Agustus : chr "6,3" "6,33" "6,52" "4,42" ...
## $ TPAK - Februari : chr "65,14" "69,39" "68,41" "65,81" ...
## $ TPAK - Agustus : chr "63,78" "69,1" "67,72" "65,03" ...
## $ Garis Kemiskinan - Maret (Rp) : int 541109 525756 568703 565937 506355 457455 548934 471439 752203 642425 ...
## $ Garis Kemiskinan - September (Rp) : int 665855 648336 714991 702620 658100 564462 672816 599018 917673 807602 ...
## $ Jumlah Penduduk Miskin - Maret (ribu) : chr "834,24" "1343,86" "370,67" "500,81" ...
## $ Jumlah Penduduk Miskin - September (ribu) : chr "718,96" "1110,92" "315,43" "473,04" ...
## $ Persentase Penduduk Miskin - Maret : chr "15,33" "9,01" "6,63" "7,12" ...
## $ Persentase Penduduk Miskin - September : chr "12,64" "7,19" "5,42" "6,36" ...
## $ Indeks Pembangunan Manusia : chr "73,48" "73,84" "74,56" "73,89" ...
## $ Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi: chr "9,77" "9,88" "9,46" "9,52" ...
## $ Harapan Lama Sekolah (Tahun)
## : chr "14,36" "13,27" "14,09" "13,28" ...
colnames(df)
## [1] "PROVINSI"
## [2] "TAHUN"
## [3] "APS 13-15"
## [4] "APS 16-18"
## [5] "APS 19-24"
## [6] "TPT - Februari"
## [7] "TPT - Agustus"
## [8] "TPAK - Februari"
## [9] "TPAK - Agustus"
## [10] "Garis Kemiskinan - Maret (Rp)"
## [11] "Garis Kemiskinan - September (Rp)"
## [12] "Jumlah Penduduk Miskin - Maret (ribu)"
## [13] "Jumlah Penduduk Miskin - September (ribu)"
## [14] "Persentase Penduduk Miskin - Maret"
## [15] "Persentase Penduduk Miskin - September"
## [16] "Indeks Pembangunan Manusia"
## [17] "Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi"
## [18] " Harapan Lama Sekolah (Tahun)\n"
for (col in colnames(df)) {
if (is.character(df[[col]]) && col != "PROVINSI") {
df[[col]] <- as.numeric(gsub(",", ".", df[[col]]))
}
}
drop_vars <- c(
"PROVINSI",
"TAHUN",
"Garis Kemiskinan - Maret (Rp)",
"Garis Kemiskinan - September (Rp)",
"Jumlah Penduduk Miskin - Maret (ribu)",
"Jumlah Penduduk Miskin - September (ribu)"
)
data_clean <- df[, !(colnames(df) %in% drop_vars)]
data_clean <- data_clean[, sapply(data_clean, is.numeric)]
data_clean <- na.omit(data_clean)
str(data_clean)
## 'data.frame': 152 obs. of 12 variables:
## $ APS 13-15 : num 98.4 97 96.6 95.7 96.4 ...
## $ APS 16-18 : num 83.3 78.7 84.1 77.8 72.5 ...
## $ APS 19-24 : num 32.6 27.1 36.4 28.8 24.1 ...
## $ TPT - Februari : num 6.3 6.01 6.67 4.96 4.76 ...
## $ TPT - Agustus : num 6.3 6.33 6.52 4.42 5.09 4.98 3.65 4.69 5.03 9.91 ...
## $ TPAK - Februari : num 65.1 69.4 68.4 65.8 67.3 ...
## $ TPAK - Agustus : num 63.8 69.1 67.7 65 67.2 ...
## $ Persentase Penduduk Miskin - Maret : num 15.33 9.01 6.63 7.12 8.09 ...
## $ Persentase Penduduk Miskin - September : num 12.64 7.19 5.42 6.36 7.26 ...
## $ Indeks Pembangunan Manusia : num 73.5 73.8 74.6 73.9 72.6 ...
## $ Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi: num 9.77 9.88 9.46 9.52 9.03 ...
## $ Harapan Lama Sekolah (Tahun)
## : num 14.4 13.3 14.1 13.3 13 ...
dim(data_clean)
## [1] 152 12
colSums(is.na(df[, !(colnames(df) %in% drop_vars)]))
## APS 13-15
## 0
## APS 16-18
## 0
## APS 19-24
## 0
## TPT - Februari
## 0
## TPT - Agustus
## 0
## TPAK - Februari
## 0
## TPAK - Agustus
## 0
## Persentase Penduduk Miskin - Maret
## 0
## Persentase Penduduk Miskin - September
## 0
## Indeks Pembangunan Manusia
## 0
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi
## 0
## Harapan Lama Sekolah (Tahun)\n
## 0
Before proceeding with Factor Analysis, it is important to assess whether the available sample size is sufficient relative to the number of variables. This dataset consists of provincial observations from BPS, which typically yields around 34 to 38 observations (one per province per year). Factor Analysis requires adequate sample size to produce stable and replicable factor solutions. Two commonly applied guidelines are: a minimum ratio of 5 to 10 observations per variable (Hair et al., 2019), and an absolute minimum of n > 100 observations as a traditional rule of thumb. When sample size is small, results should be interpreted with caution, and communalities and factor loadings should be cross-validated if possible.
n_obs <- nrow(data_clean)
n_vars <- ncol(data_clean)
ratio <- n_obs / n_vars
cat("Number of observations (n) :", n_obs, "\n")
## Number of observations (n) : 152
cat("Number of variables (p) :", n_vars, "\n")
## Number of variables (p) : 12
cat("Observation-to-variable ratio:", round(ratio, 2), "\n\n")
## Observation-to-variable ratio: 12.67
if (ratio >= 10) {
cat("Ratio >= 10: Sample size is adequate (Hair et al., 2019).\n")
} else if (ratio >= 5) {
cat("Ratio 5-10: Marginally adequate. Interpret results with caution.\n")
} else {
cat("Ratio < 5: Sample size may be insufficient. Results should be treated as exploratory.\n")
}
## Ratio >= 10: Sample size is adequate (Hair et al., 2019).
if (n_obs >= 100) {
cat("n >= 100: Meets traditional absolute minimum.\n")
} else {
cat("n < 100: Below traditional absolute minimum. FA results may be unstable.\n")
cat("Interpretation should be treated as preliminary and exploratory.\n")
}
## n >= 100: Meets traditional absolute minimum.
The histograms below provide a visual overview of each variable’s distribution. Variables tied to economic conditions such as expenditure or poverty-related indicators typically exhibit right-skewed distributions, while variables related to access and participation rates tend to be more symmetrically distributed.
The Shapiro-Wilk test is included for descriptive purposes only. Because this analysis uses Principal Axis Factoring (PAF) as the extraction method, strict multivariate normality is not required. Normality assessment would only be critical if Maximum Likelihood estimation were used. Results from the Shapiro-Wilk test are therefore interpreted as supplementary distributional information rather than a prerequisite check.
normality_results <- data.frame(
Variable = colnames(data_clean),
Shapiro_W = NA_real_,
p_value = NA_real_,
Skewness = NA_real_,
Kurtosis = NA_real_,
stringsAsFactors = FALSE
)
for (i in seq_along(colnames(data_clean))) {
col <- colnames(data_clean)[i]
x <- data_clean[[col]]
sw <- shapiro.test(x)
normality_results$Shapiro_W[i] <- round(sw$statistic, 4)
normality_results$p_value[i] <- round(sw$p.value, 4)
normality_results$Skewness[i] <- round(psych::skew(x), 4)
normality_results$Kurtosis[i] <- round(psych::kurtosi(x), 4)
}
normality_results$Normal <- ifelse(normality_results$p_value > 0.05, "Yes", "No")
print(normality_results)
## Variable
## 1 APS 13-15
## 2 APS 16-18
## 3 APS 19-24
## 4 TPT - Februari
## 5 TPT - Agustus
## 6 TPAK - Februari
## 7 TPAK - Agustus
## 8 Persentase Penduduk Miskin - Maret
## 9 Persentase Penduduk Miskin - September
## 10 Indeks Pembangunan Manusia
## 11 Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi
## 12 Harapan Lama Sekolah (Tahun)\n
## Shapiro_W p_value Skewness Kurtosis Normal
## 1 0.7139 0.0000 -1.8599 2.4117 No
## 2 0.9729 0.0042 0.2860 -0.5467 No
## 3 0.8821 0.0000 1.5081 3.7814 No
## 4 0.9086 0.0000 0.9788 0.3434 No
## 5 0.9555 0.0001 0.6822 0.2138 No
## 6 0.9662 0.0009 0.5784 -0.0068 No
## 7 0.9428 0.0000 0.5368 -0.4535 No
## 8 0.8742 0.0000 1.0844 0.3242 No
## 9 0.8530 0.0000 1.3093 1.2354 No
## 10 0.9403 0.0000 -0.4440 0.3897 No
## 11 0.9641 0.0005 -0.6898 1.1973 No
## 12 0.9099 0.0000 -0.4487 2.1534 No
n_cols_plot <- 3
n_rows_plot <- ceiling(n_vars / n_cols_plot)
colors <- colorRampPalette(c("lightblue", "steelblue", "lightgreen",
"lightpink", "khaki", "plum"))(n_vars)
par(mfrow = c(n_rows_plot, n_cols_plot), mar = c(4, 4, 3, 1))
for (i in seq_along(colnames(data_clean))) {
col <- colnames(data_clean)[i]
hist(data_clean[[col]],
main = paste("Distribution of", col),
xlab = col,
col = colors[i],
border = "white",
breaks = 15)
}
par(mfrow = c(1, 1))
Outliers were assessed using the Z-score method. Observations with a Z-score greater than plus or minus 3 are considered extreme outliers. Extreme outliers in socioeconomic data often reflect genuine variation between provinces. DKI Jakarta, for example, typically has very different income or expenditure levels compared to other provinces. Rather than removing such observations, they should be noted and interpreted with care in subsequent analyses.
z_scores <- scale(data_clean)
outlier_count <- colSums(abs(z_scores) > 3)
print(data.frame(Variable = names(outlier_count), Outliers_Z3 = outlier_count))
## Variable
## APS 13-15 APS 13-15
## APS 16-18 APS 16-18
## APS 19-24 APS 19-24
## TPT - Februari TPT - Februari
## TPT - Agustus TPT - Agustus
## TPAK - Februari TPAK - Februari
## TPAK - Agustus TPAK - Agustus
## Persentase Penduduk Miskin - Maret Persentase Penduduk Miskin - Maret
## Persentase Penduduk Miskin - September Persentase Penduduk Miskin - September
## Indeks Pembangunan Manusia Indeks Pembangunan Manusia
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi
## Harapan Lama Sekolah (Tahun)\n Harapan Lama Sekolah (Tahun)\n
## Outliers_Z3
## APS 13-15 0
## APS 16-18 0
## APS 19-24 4
## TPT - Februari 1
## TPT - Agustus 2
## TPAK - Februari 0
## TPAK - Agustus 0
## Persentase Penduduk Miskin - Maret 1
## Persentase Penduduk Miskin - September 4
## Indeks Pembangunan Manusia 0
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 1
## Harapan Lama Sekolah (Tahun)\n 1
par(mfrow = c(n_rows_plot, n_cols_plot), mar = c(4, 4, 3, 1))
for (i in seq_along(colnames(data_clean))) {
col <- colnames(data_clean)[i]
boxplot(data_clean[[col]],
main = col,
col = colors[i],
border = "gray30",
horizontal = TRUE)
}
par(mfrow = c(1, 1))
The correlation matrix was computed to examine linear relationships among all variables. For Factor Analysis to be appropriate, there should be sufficient intercorrelation among variables generally, pairs with an absolute correlation of at least 0.30 are desirable. Variables with very low correlations across the board contribute little shared variance and may not be suitable for factor extraction.
The determinant of the correlation matrix is reported as a check for multicollinearity. A determinant close to zero indicates extreme multicollinearity that can cause numerical instability. Variable pairs with absolute correlation exceeding 0.90 are additionally flagged, as extreme correlations may indicate near-redundancy and could destabilise the factor solution.
cor_matrix <- cor(data_clean)
round(cor_matrix, 3)
## APS 13-15
## APS 13-15 1.000
## APS 16-18 0.715
## APS 19-24 0.327
## TPT - Februari 0.443
## TPT - Agustus 0.516
## TPAK - Februari -0.566
## TPAK - Agustus -0.624
## Persentase Penduduk Miskin - Maret -0.657
## Persentase Penduduk Miskin - September -0.652
## Indeks Pembangunan Manusia 0.785
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.732
## Harapan Lama Sekolah (Tahun)\n 0.735
## APS 16-18
## APS 13-15 0.715
## APS 16-18 1.000
## APS 19-24 0.654
## TPT - Februari 0.209
## TPT - Agustus 0.236
## TPAK - Februari -0.218
## TPAK - Agustus -0.229
## Persentase Penduduk Miskin - Maret -0.260
## Persentase Penduduk Miskin - September -0.273
## Indeks Pembangunan Manusia 0.539
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.658
## Harapan Lama Sekolah (Tahun)\n 0.790
## APS 19-24
## APS 13-15 0.327
## APS 16-18 0.654
## APS 19-24 1.000
## TPT - Februari -0.070
## TPT - Agustus -0.044
## TPAK - Februari -0.066
## TPAK - Agustus -0.055
## Persentase Penduduk Miskin - Maret 0.038
## Persentase Penduduk Miskin - September 0.038
## Indeks Pembangunan Manusia 0.265
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.369
## Harapan Lama Sekolah (Tahun)\n 0.728
## TPT - Februari
## APS 13-15 0.443
## APS 16-18 0.209
## APS 19-24 -0.070
## TPT - Februari 1.000
## TPT - Agustus 0.953
## TPAK - Februari -0.611
## TPAK - Agustus -0.622
## Persentase Penduduk Miskin - Maret -0.378
## Persentase Penduduk Miskin - September -0.389
## Indeks Pembangunan Manusia 0.467
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.538
## Harapan Lama Sekolah (Tahun)\n 0.152
## TPT - Agustus
## APS 13-15 0.516
## APS 16-18 0.236
## APS 19-24 -0.044
## TPT - Februari 0.953
## TPT - Agustus 1.000
## TPAK - Februari -0.613
## TPAK - Agustus -0.649
## Persentase Penduduk Miskin - Maret -0.433
## Persentase Penduduk Miskin - September -0.441
## Indeks Pembangunan Manusia 0.502
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.539
## Harapan Lama Sekolah (Tahun)\n 0.193
## TPAK - Februari
## APS 13-15 -0.566
## APS 16-18 -0.218
## APS 19-24 -0.066
## TPT - Februari -0.611
## TPT - Agustus -0.613
## TPAK - Februari 1.000
## TPAK - Agustus 0.916
## Persentase Penduduk Miskin - Maret 0.563
## Persentase Penduduk Miskin - September 0.555
## Indeks Pembangunan Manusia -0.522
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.610
## Harapan Lama Sekolah (Tahun)\n -0.364
## TPAK - Agustus
## APS 13-15 -0.624
## APS 16-18 -0.229
## APS 19-24 -0.055
## TPT - Februari -0.622
## TPT - Agustus -0.649
## TPAK - Februari 0.916
## TPAK - Agustus 1.000
## Persentase Penduduk Miskin - Maret 0.596
## Persentase Penduduk Miskin - September 0.584
## Indeks Pembangunan Manusia -0.552
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.616
## Harapan Lama Sekolah (Tahun)\n -0.378
## Persentase Penduduk Miskin - Maret
## APS 13-15 -0.657
## APS 16-18 -0.260
## APS 19-24 0.038
## TPT - Februari -0.378
## TPT - Agustus -0.433
## TPAK - Februari 0.563
## TPAK - Agustus 0.596
## Persentase Penduduk Miskin - Maret 1.000
## Persentase Penduduk Miskin - September 0.967
## Indeks Pembangunan Manusia -0.812
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.553
## Harapan Lama Sekolah (Tahun)\n -0.410
## Persentase Penduduk Miskin - September
## APS 13-15 -0.652
## APS 16-18 -0.273
## APS 19-24 0.038
## TPT - Februari -0.389
## TPT - Agustus -0.441
## TPAK - Februari 0.555
## TPAK - Agustus 0.584
## Persentase Penduduk Miskin - Maret 0.967
## Persentase Penduduk Miskin - September 1.000
## Indeks Pembangunan Manusia -0.797
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.530
## Harapan Lama Sekolah (Tahun)\n -0.393
## Indeks Pembangunan Manusia
## APS 13-15 0.785
## APS 16-18 0.539
## APS 19-24 0.265
## TPT - Februari 0.467
## TPT - Agustus 0.502
## TPAK - Februari -0.522
## TPAK - Agustus -0.552
## Persentase Penduduk Miskin - Maret -0.812
## Persentase Penduduk Miskin - September -0.797
## Indeks Pembangunan Manusia 1.000
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.700
## Harapan Lama Sekolah (Tahun)\n 0.625
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi
## APS 13-15 0.732
## APS 16-18 0.658
## APS 19-24 0.369
## TPT - Februari 0.538
## TPT - Agustus 0.539
## TPAK - Februari -0.610
## TPAK - Agustus -0.616
## Persentase Penduduk Miskin - Maret -0.553
## Persentase Penduduk Miskin - September -0.530
## Indeks Pembangunan Manusia 0.700
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 1.000
## Harapan Lama Sekolah (Tahun)\n 0.677
## Harapan Lama Sekolah (Tahun)\n
## APS 13-15 0.735
## APS 16-18 0.790
## APS 19-24 0.728
## TPT - Februari 0.152
## TPT - Agustus 0.193
## TPAK - Februari -0.364
## TPAK - Agustus -0.378
## Persentase Penduduk Miskin - Maret -0.410
## Persentase Penduduk Miskin - September -0.393
## Indeks Pembangunan Manusia 0.625
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.677
## Harapan Lama Sekolah (Tahun)\n 1.000
det_val <- det(cor_matrix)
cat("Determinant of Correlation Matrix:", round(det_val, 8), "\n")
## Determinant of Correlation Matrix: 1.9e-07
if (det_val > 0.00001) {
cat("No extreme multicollinearity detected. Matrix is suitable for FA.\n")
} else {
cat("Determinant is very small. Check for multicollinearity.\n")
}
## Determinant is very small. Check for multicollinearity.
cor_tmp <- cor_matrix
diag(cor_tmp) <- NA
high_corr_pairs <- which(abs(cor_tmp) > 0.90, arr.ind = TRUE)
high_corr_pairs <- high_corr_pairs[high_corr_pairs[, 1] < high_corr_pairs[, 2], , drop = FALSE]
if (nrow(high_corr_pairs) > 0) {
cat("\nVariable pairs with |r| > 0.90 (potential near-redundancy):\n")
for (k in seq_len(nrow(high_corr_pairs))) {
r1 <- rownames(cor_matrix)[high_corr_pairs[k, 1]]
r2 <- colnames(cor_matrix)[high_corr_pairs[k, 2]]
cat(sprintf(" %s & %s : r = %.4f\n",
r1, r2, cor_matrix[high_corr_pairs[k, 1], high_corr_pairs[k, 2]]))
}
} else {
cat("\nNo variable pairs exceed |r| = 0.90. No near-redundancy detected.\n")
}
##
## Variable pairs with |r| > 0.90 (potential near-redundancy):
## TPT - Februari & TPT - Agustus : r = 0.9530
## TPAK - Februari & TPAK - Agustus : r = 0.9165
## Persentase Penduduk Miskin - Maret & Persentase Penduduk Miskin - September : r = 0.9667
The correlation heatmap provides a visual representation of the correlation structure. Blue shades indicate positive correlations, while red shades indicate negative correlations. Clusters of correlated variables provide an initial indication of potential latent factor groupings.
corrplot(cor_matrix,
method = "color",
type = "full",
order = "hclust",
hclust.method = "ward.D2",
col = colorRampPalette(c("red", "white", "blue"))(200),
addCoef.col = "black",
number.cex = 0.5,
tl.cex = 0.6,
tl.col = "black",
diag = FALSE,
mar = c(0, 0, 1.5, 0))
corr_tmp2 <- cor_matrix
corr_tmp2[lower.tri(corr_tmp2, diag = TRUE)] <- NA
corr_table <- na.omit(as.data.frame(as.table(corr_tmp2)))
colnames(corr_table) <- c("Variable_1", "Variable_2", "Correlation")
corr_table <- corr_table[order(-abs(corr_table$Correlation)), ]
head(corr_table, 5)
## Variable_1
## 104 Persentase Penduduk Miskin - Maret
## 52 TPT - Februari
## 78 TPAK - Februari
## 116 Persentase Penduduk Miskin - Maret
## 117 Persentase Penduduk Miskin - September
## Variable_2 Correlation
## 104 Persentase Penduduk Miskin - September 0.9667073
## 52 TPT - Agustus 0.9530010
## 78 TPAK - Agustus 0.9164744
## 116 Indeks Pembangunan Manusia -0.8121996
## 117 Indeks Pembangunan Manusia -0.7974913
The KMO measure assesses the proportion of variance among variables that might be common variance, that is, variance attributable to underlying factors. Values above 0.70 are considered acceptable, with higher values indicating greater suitability for Factor Analysis.
| KMO Value | Interpretation |
|---|---|
| >= 0.90 | Marvelous |
| >= 0.80 | Meritorious |
| >= 0.70 | Middling |
| >= 0.60 | Mediocre |
| < 0.50 | Unacceptable |
kmo_result <- KMO(data_clean)
interpret_kmo <- function(val) {
if (val >= 0.90) "Marvelous"
else if (val >= 0.80) "Meritorious"
else if (val >= 0.70) "Middling"
else if (val >= 0.60) "Mediocre"
else "Unacceptable"
}
cat("Overall KMO :", round(kmo_result$MSA, 4), "\n")
## Overall KMO : 0.8192
cat("Interpretation :", interpret_kmo(kmo_result$MSA), "\n\n")
## Interpretation : Meritorious
cat("KMO per Variable:\n")
## KMO per Variable:
print(data.frame(Variable = names(kmo_result$MSAi),
KMO = round(kmo_result$MSAi, 4)))
## Variable
## APS 13-15 APS 13-15
## APS 16-18 APS 16-18
## APS 19-24 APS 19-24
## TPT - Februari TPT - Februari
## TPT - Agustus TPT - Agustus
## TPAK - Februari TPAK - Februari
## TPAK - Agustus TPAK - Agustus
## Persentase Penduduk Miskin - Maret Persentase Penduduk Miskin - Maret
## Persentase Penduduk Miskin - September Persentase Penduduk Miskin - September
## Indeks Pembangunan Manusia Indeks Pembangunan Manusia
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi
## Harapan Lama Sekolah (Tahun)\n Harapan Lama Sekolah (Tahun)\n
## KMO
## APS 13-15 0.8396
## APS 16-18 0.8067
## APS 19-24 0.6362
## TPT - Februari 0.7176
## TPT - Agustus 0.7462
## TPAK - Februari 0.8311
## TPAK - Agustus 0.8252
## Persentase Penduduk Miskin - Maret 0.7843
## Persentase Penduduk Miskin - September 0.8115
## Indeks Pembangunan Manusia 0.9351
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.9307
## Harapan Lama Sekolah (Tahun)\n 0.8480
Bartlett’s test examines whether the correlation matrix is an identity matrix. If the correlation matrix were an identity matrix, all variables would be completely uncorrelated and Factor Analysis would be meaningless. A significant result (p < 0.05) indicates that the variables are sufficiently correlated to justify Factor Analysis.
bartlett_result <- cortest.bartlett(cor_matrix, n = nrow(data_clean))
cat("Bartlett's Test of Sphericity\n")
## Bartlett's Test of Sphericity
cat("Chi-square :", round(bartlett_result$chisq, 4), "\n")
## Chi-square : 2263.651
cat("df :", bartlett_result$df, "\n")
## df : 66
cat("p-value :", format(bartlett_result$p.value, scientific = TRUE), "\n")
## p-value : 0e+00
if (bartlett_result$p.value < 0.05) {
cat("p < 0.05: H0 rejected. Data is suitable for Factor Analysis.\n")
} else {
cat("p >= 0.05: H0 not rejected. Data may not be suitable for Factor Analysis.\n")
}
## p < 0.05: H0 rejected. Data is suitable for Factor Analysis.
Standardization using Z-score transformation was applied after the KMO and Bartlett tests, because both tests are based on the correlation matrix which is scale-invariant. Standardization is essential for PCA, as variables with larger variances would otherwise dominate the principal components. The same standardized data object X_scaled is used as input for both PCA and Factor Analysis to ensure a fair and consistent comparison.
X_scaled <- scale(data_clean)
cat("Mean after scaling (should be ~0):", round(mean(colMeans(X_scaled)), 6), "\n")
## Mean after scaling (should be ~0): 0
cat("SD after scaling (should be ~1):", round(mean(apply(X_scaled, 2, sd)), 6), "\n")
## SD after scaling (should be ~1): 1
PCA was conducted to reduce dimensionality and identify the main components that explain the maximum variance in the socioeconomic dataset. Unlike Factor Analysis, PCA is based on total variance and does not distinguish between common and unique variance. The number of components to retain was determined using three criteria: the Kaiser criterion (eigenvalue > 1), the cumulative variance threshold (>= 60%), and Parallel Analysis. Parallel Analysis is used as the primary criterion, as it is generally considered the most statistically rigorous method (Hayton et al., 2004).
pca_result <- prcomp(X_scaled, center = FALSE, scale. = FALSE)
pca_eigenvalues <- pca_result$sdev^2
pca_var_explained <- pca_eigenvalues / sum(pca_eigenvalues) * 100
pca_cum_var <- cumsum(pca_var_explained)
pca_table <- data.frame(
Component = 1:length(pca_eigenvalues),
Eigenvalue = round(pca_eigenvalues, 4),
Pct_Variance = round(pca_var_explained, 2),
Cumulative_Pct = round(pca_cum_var, 2),
Kaiser = ifelse(pca_eigenvalues > 1, "Yes", "No")
)
print(pca_table)
## Component Eigenvalue Pct_Variance Cumulative_Pct Kaiser
## 1 1 6.6180 55.15 55.15 Yes
## 2 2 2.3112 19.26 74.41 Yes
## 3 3 1.2471 10.39 84.80 Yes
## 4 4 0.6838 5.70 90.50 No
## 5 5 0.3457 2.88 93.38 No
## 6 6 0.2683 2.24 95.62 No
## 7 7 0.1524 1.27 96.89 No
## 8 8 0.1384 1.15 98.04 No
## 9 9 0.1000 0.83 98.87 No
## 10 10 0.0696 0.58 99.45 No
## 11 11 0.0368 0.31 99.76 No
## 12 12 0.0288 0.24 100.00 No
n_kaiser_pca <- sum(pca_eigenvalues > 1)
n_cumvar_pca <- which(pca_cum_var >= 60)[1]
cat("\nKaiser Criterion :", n_kaiser_pca, "components\n")
##
## Kaiser Criterion : 3 components
cat("Cumulative Variance >= 60% :", n_cumvar_pca,
"components (", round(pca_cum_var[n_cumvar_pca], 2), "%)\n")
## Cumulative Variance >= 60% : 2 components ( 74.41 %)
pa_pca <- fa.parallel(X_scaled,
fa = "pc",
n.iter = 100,
main = "Parallel Analysis - PCA",
show.legend = TRUE)
## Parallel analysis suggests that the number of factors = NA and the number of components = 2
n_pa_pca <- pa_pca$ncomp
cat("Parallel Analysis :", n_pa_pca, "components\n\n")
## Parallel Analysis : 2 components
n_components <- n_pa_pca
cat("Final decision (Parallel Analysis as primary criterion):", n_components, "components\n")
## Final decision (Parallel Analysis as primary criterion): 2 components
plot(pca_eigenvalues,
type = "b",
pch = 19,
col = "darkblue",
xlab = "Principal Component",
ylab = "Eigenvalue",
main = "Scree Plot - PCA",
ylim = c(0, max(pca_eigenvalues) * 1.1))
abline(h = 1, col = "red", lty = 2, lwd = 1.5)
abline(v = n_components, col = "darkgreen", lty = 3, lwd = 1.5)
legend("topright",
legend = c("Kaiser Criterion (= 1)",
paste0("Selected n = ", n_components, " (Parallel Analysis)")),
col = c("red", "darkgreen"),
lty = c(2, 3),
bty = "n")
The loadings represent the correlation between original variables and each principal component.
print(round(pca_result$rotation, 4))
## PC1
## APS 13-15 -0.3444
## APS 16-18 -0.2414
## APS 19-24 -0.1183
## TPT - Februari -0.2564
## TPT - Agustus -0.2716
## TPAK - Februari 0.2992
## TPAK - Agustus 0.3097
## Persentase Penduduk Miskin - Maret 0.3054
## Persentase Penduduk Miskin - September 0.3028
## Indeks Pembangunan Manusia -0.3393
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.3317
## Harapan Lama Sekolah (Tahun)\n -0.2721
## PC2
## APS 13-15 -0.1317
## APS 16-18 -0.4317
## APS 19-24 -0.5363
## TPT - Februari 0.3078
## TPT - Agustus 0.2936
## TPAK - Februari -0.2160
## TPAK - Agustus -0.2198
## Persentase Penduduk Miskin - Maret -0.1438
## Persentase Penduduk Miskin - September -0.1465
## Indeks Pembangunan Manusia -0.0457
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.1115
## Harapan Lama Sekolah (Tahun)\n -0.4196
## PC3
## APS 13-15 -0.0619
## APS 16-18 0.1325
## APS 19-24 0.2155
## TPT - Februari 0.4354
## TPT - Agustus 0.3914
## TPAK - Februari -0.1470
## TPAK - Agustus -0.1271
## Persentase Penduduk Miskin - Maret 0.4883
## Persentase Penduduk Miskin - September 0.4817
## Indeks Pembangunan Manusia -0.2533
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.1307
## Harapan Lama Sekolah (Tahun)\n 0.0005
## PC4
## APS 13-15 -0.0860
## APS 16-18 -0.2402
## APS 19-24 0.1771
## TPT - Februari -0.3353
## TPT - Agustus -0.3399
## TPAK - Februari -0.5763
## TPAK - Agustus -0.5192
## Persentase Penduduk Miskin - Maret 0.0489
## Persentase Penduduk Miskin - September 0.0772
## Indeks Pembangunan Manusia -0.2269
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.0150
## Harapan Lama Sekolah (Tahun)\n 0.1175
## PC5
## APS 13-15 -0.4181
## APS 16-18 -0.3790
## APS 19-24 0.6405
## TPT - Februari 0.1801
## TPT - Agustus 0.1867
## TPAK - Februari 0.0319
## TPAK - Agustus 0.0750
## Persentase Penduduk Miskin - Maret -0.1525
## Persentase Penduduk Miskin - September -0.1647
## Indeks Pembangunan Manusia 0.2299
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.2979
## Harapan Lama Sekolah (Tahun)\n 0.0733
## PC6
## APS 13-15 0.4926
## APS 16-18 0.1232
## APS 19-24 0.0135
## TPT - Februari -0.0217
## TPT - Agustus 0.1648
## TPAK - Februari -0.0019
## TPAK - Agustus -0.1514
## Persentase Penduduk Miskin - Maret 0.0674
## Persentase Penduduk Miskin - September -0.0080
## Indeks Pembangunan Manusia -0.0594
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.8221
## Harapan Lama Sekolah (Tahun)\n 0.0858
## PC7
## APS 13-15 -0.2687
## APS 16-18 0.5385
## APS 19-24 0.1605
## TPT - Februari 0.0374
## TPT - Agustus 0.0400
## TPAK - Februari -0.1273
## TPAK - Agustus 0.0392
## Persentase Penduduk Miskin - Maret -0.2074
## Persentase Penduduk Miskin - September -0.4046
## Indeks Pembangunan Manusia -0.5470
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.0805
## Harapan Lama Sekolah (Tahun)\n -0.2810
## PC8
## APS 13-15 0.0661
## APS 16-18 -0.3249
## APS 19-24 -0.1154
## TPT - Februari 0.0502
## TPT - Agustus 0.1811
## TPAK - Februari 0.2268
## TPAK - Agustus 0.0575
## Persentase Penduduk Miskin - Maret -0.1700
## Persentase Penduduk Miskin - September -0.1274
## Indeks Pembangunan Manusia -0.5918
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.1073
## Harapan Lama Sekolah (Tahun)\n 0.6163
## PC9
## APS 13-15 0.3851
## APS 16-18 -0.2245
## APS 19-24 0.3634
## TPT - Februari -0.2088
## TPT - Agustus 0.1431
## TPAK - Februari 0.4237
## TPAK - Agustus -0.3147
## Persentase Penduduk Miskin - Maret -0.0589
## Persentase Penduduk Miskin - September -0.0324
## Indeks Pembangunan Manusia -0.1742
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.2525
## Harapan Lama Sekolah (Tahun)\n -0.4774
## PC10
## APS 13-15 0.4131
## APS 16-18 -0.2461
## APS 19-24 0.1701
## TPT - Februari -0.0266
## TPT - Agustus 0.0363
## TPAK - Februari -0.4974
## TPAK - Agustus 0.6578
## Persentase Penduduk Miskin - Maret 0.0052
## Persentase Penduduk Miskin - September -0.0374
## Indeks Pembangunan Manusia -0.1293
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.0931
## Harapan Lama Sekolah (Tahun)\n -0.1758
## PC11
## APS 13-15 -0.1911
## APS 16-18 0.1076
## APS 19-24 -0.0811
## TPT - Februari -0.6285
## TPT - Agustus 0.6380
## TPAK - Februari -0.1204
## TPAK - Agustus 0.0690
## Persentase Penduduk Miskin - Maret -0.2130
## Persentase Penduduk Miskin - September 0.2672
## Indeks Pembangunan Manusia 0.0774
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.0102
## Harapan Lama Sekolah (Tahun)\n 0.0327
## PC12
## APS 13-15 -0.0581
## APS 16-18 -0.0568
## APS 19-24 -0.0584
## TPT - Februari -0.2476
## TPT - Agustus 0.2146
## TPAK - Februari -0.0049
## TPAK - Agustus -0.0092
## Persentase Penduduk Miskin - Maret 0.7048
## Persentase Penduduk Miskin - September -0.6065
## Indeks Pembangunan Manusia 0.0905
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.0713
## Harapan Lama Sekolah (Tahun)\n 0.0687
Component scores represent the transformed coordinates of provinces in the reduced-dimensional space.
pca_scores <- as.data.frame(pca_result$x)
head(pca_scores)
## PC1 PC2 PC3 PC4 PC5 PC6
## 1 -2.3385944 -0.81753946 1.8216797 0.6196472 -0.2868767 0.52823632
## 2 -1.4780383 0.21602878 0.2746672 -0.8369233 -0.1982245 -0.15525600
## 3 -2.5170324 -0.92239718 0.6337863 -0.5938328 0.8828663 0.38296703
## 4 -1.4767519 0.09712381 -0.4223455 0.9035499 -0.1313916 -0.05285344
## 5 -0.6330726 0.86905284 -0.5752711 0.4319450 -0.2429342 0.29389941
## 6 0.5386700 1.37184033 -0.2174194 -0.2567865 -0.7299706 0.24436089
## PC7 PC8 PC9 PC10 PC11 PC12
## 1 -0.015815183 0.04570412 -0.353356031 -0.46766128 0.05419049 0.18219926
## 2 0.274518462 0.15055580 0.139154946 0.01102609 0.13234260 0.12159473
## 3 0.908669444 0.19548492 -0.136271556 -0.28384336 -0.04055757 -0.11948260
## 4 0.450826559 -0.33586797 0.004319478 -0.27874119 -0.11557566 -0.08304870
## 5 0.003528384 0.11061277 0.091832455 0.06327489 0.14111966 0.05254338
## 6 -0.282485122 -0.04758487 -0.037531699 -0.17180244 -0.05797341 0.15314752
The biplot simultaneously visualizes provinces and variables in the space of the first two principal components.
biplot(pca_result, scale = 0, cex = 0.6)
Three complementary methods were used to determine the appropriate number of factors: the Kaiser criterion (eigenvalue > 1), the Scree Plot, and Parallel Analysis. Parallel Analysis is the primary criterion, as it compares observed eigenvalues against those generated from random data of the same dimensions, providing a statistically principled threshold that avoids over- or under-extraction.
Because X_scaled is the input for Factor Analysis, its covariance matrix is equivalent to the correlation matrix of the original variables. The eigenvalues below are therefore identical to those derived from the PCA step above. They are presented here alongside the FA-specific Parallel Analysis for transparency.
eigen_values <- pca_eigenvalues
cum_var_fa <- cumsum(eigen_values / sum(eigen_values)) * 100
eigen_table <- data.frame(
Component = 1:length(eigen_values),
Eigenvalue = round(eigen_values, 4),
Pct_Variance = round(eigen_values / sum(eigen_values) * 100, 2),
Cumulative_Pct = round(cum_var_fa, 2),
Kaiser = ifelse(eigen_values > 1, "Yes", "No")
)
print(eigen_table)
## Component Eigenvalue Pct_Variance Cumulative_Pct Kaiser
## 1 1 6.6180 55.15 55.15 Yes
## 2 2 2.3112 19.26 74.41 Yes
## 3 3 1.2471 10.39 84.80 Yes
## 4 4 0.6838 5.70 90.50 No
## 5 5 0.3457 2.88 93.38 No
## 6 6 0.2683 2.24 95.62 No
## 7 7 0.1524 1.27 96.89 No
## 8 8 0.1384 1.15 98.04 No
## 9 9 0.1000 0.83 98.87 No
## 10 10 0.0696 0.58 99.45 No
## 11 11 0.0368 0.31 99.76 No
## 12 12 0.0288 0.24 100.00 No
n_kaiser <- sum(eigen_values > 1)
cat("\nFactors with eigenvalue > 1 (Kaiser):", n_kaiser, "\n")
##
## Factors with eigenvalue > 1 (Kaiser): 3
plot(eigen_values,
type = "b",
pch = 19,
col = "steelblue",
xlab = "Component",
ylab = "Eigenvalue",
main = "Scree Plot - Factor Analysis",
ylim = c(0, max(eigen_values) * 1.1))
abline(h = 1, col = "red", lty = 2, lwd = 1.5)
legend("topright", legend = "Kaiser Criterion (= 1)",
col = "red", lty = 2, bty = "n")
The result of Parallel Analysis using fa = “fa” is saved and used as the primary criterion for determining the number of factors to extract. This is consistent with the approach used for PCA, ensuring methodological coherence across both analyses.
pa_fa <- fa.parallel(X_scaled,
fa = "fa",
n.iter = 100,
main = "Parallel Analysis - Factor Analysis",
show.legend = TRUE)
## Parallel analysis suggests that the number of factors = 3 and the number of components = NA
n_pa_fa <- pa_fa$nfact
cat("Kaiser Criterion :", n_kaiser, "factors\n")
## Kaiser Criterion : 3 factors
cat("Parallel Analysis :", n_pa_fa, "factors\n\n")
## Parallel Analysis : 3 factors
n_factors <- n_pa_fa
cat("Final decision (Parallel Analysis as primary criterion):", n_factors, "factors\n")
## Final decision (Parallel Analysis as primary criterion): 3 factors
Based on Parallel Analysis as the primary criterion, n_factors factors were extracted using Principal Axis Factoring (PAF). PAF is preferred over Maximum Likelihood when the normality assumption is not fully met, which is common in real-world socioeconomic data (Fabrigar et al., 1999).
fa_result <- fa(X_scaled,
nfactors = n_factors,
rotate = "none",
fm = "pa")
cat("Factor Variance (Before Rotation):\n")
## Factor Variance (Before Rotation):
print(fa_result$Vaccounted)
## PA1 PA2 PA3
## SS loadings 6.4306748 2.0970008 1.13581570
## Proportion Var 0.5358896 0.1747501 0.09465131
## Cumulative Var 0.5358896 0.7106396 0.80529094
## Proportion Explained 0.6654608 0.2170024 0.11753679
## Cumulative Proportion 0.6654608 0.8824632 1.00000000
Two rotation methods are applied for comparison. Varimax (orthogonal) is used when factors are assumed to be uncorrelated, while Oblimin (oblique) is used when factors are allowed to correlate. In social science research, oblique rotation is often more theoretically realistic because underlying constructs rarely operate entirely independently. The factor correlation matrix Phi from Oblimin is inspected to determine whether orthogonality is a reasonable assumption: if inter-factor correlations are below 0.30, Varimax is appropriate; if they are 0.30 or above, Oblimin is preferred.
fa_varimax <- fa(X_scaled,
nfactors = n_factors,
rotate = "varimax",
fm = "pa")
cat("Factor Loadings (Varimax):\n")
## Factor Loadings (Varimax):
print(fa_varimax$loadings, cutoff = 0.30, sort = TRUE)
##
## Loadings:
## PA1
## TPT - Februari 0.925
## TPT - Agustus 0.903
## TPAK - Februari -0.670
## TPAK - Agustus -0.689
## Persentase Penduduk Miskin - Maret
## Persentase Penduduk Miskin - September
## Indeks Pembangunan Manusia 0.345
## APS 13-15 0.404
## APS 16-18
## APS 19-24
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.506
## Harapan Lama Sekolah (Tahun)\n
## PA3
## TPT - Februari
## TPT - Agustus
## TPAK - Februari 0.414
## TPAK - Agustus 0.445
## Persentase Penduduk Miskin - Maret 0.953
## Persentase Penduduk Miskin - September 0.925
## Indeks Pembangunan Manusia -0.698
## APS 13-15 -0.539
## APS 16-18
## APS 19-24
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.385
## Harapan Lama Sekolah (Tahun)\n -0.311
## PA2
## TPT - Februari
## TPT - Agustus
## TPAK - Februari
## TPAK - Agustus
## Persentase Penduduk Miskin - Maret
## Persentase Penduduk Miskin - September
## Indeks Pembangunan Manusia 0.434
## APS 13-15 0.588
## APS 16-18 0.859
## APS 19-24 0.804
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.575
## Harapan Lama Sekolah (Tahun)\n 0.891
##
## PA1 PA3 PA2
## SS loadings 3.309 3.244 3.110
## Proportion Var 0.276 0.270 0.259
## Cumulative Var 0.276 0.546 0.805
fa_oblimin <- fa(X_scaled,
nfactors = n_factors,
rotate = "oblimin",
fm = "pa")
cat("Pattern Matrix (Oblimin):\n")
## Pattern Matrix (Oblimin):
print(fa_oblimin$loadings, cutoff = 0.30, sort = TRUE)
##
## Loadings:
## PA2
## APS 13-15 0.532
## APS 16-18 0.883
## APS 19-24 0.870
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.528
## Harapan Lama Sekolah (Tahun)\n 0.901
## TPT - Februari
## TPT - Agustus
## TPAK - Februari
## TPAK - Agustus
## Persentase Penduduk Miskin - Maret
## Persentase Penduduk Miskin - September
## Indeks Pembangunan Manusia 0.352
## PA3
## APS 13-15
## APS 16-18
## APS 19-24
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.402
## Harapan Lama Sekolah (Tahun)\n
## TPT - Februari 0.994
## TPT - Agustus 0.949
## TPAK - Februari -0.610
## TPAK - Agustus -0.621
## Persentase Penduduk Miskin - Maret
## Persentase Penduduk Miskin - September
## Indeks Pembangunan Manusia
## PA1
## APS 13-15 -0.394
## APS 16-18
## APS 19-24
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi
## Harapan Lama Sekolah (Tahun)\n
## TPT - Februari
## TPT - Agustus
## TPAK - Februari
## TPAK - Agustus
## Persentase Penduduk Miskin - Maret 1.013
## Persentase Penduduk Miskin - September 0.977
## Indeks Pembangunan Manusia -0.625
##
## PA2 PA3 PA1
## SS loadings 3.048 2.901 2.822
## Proportion Var 0.254 0.242 0.235
## Cumulative Var 0.254 0.496 0.731
cat("\nFactor Correlation Matrix (Phi):\n")
##
## Factor Correlation Matrix (Phi):
print(round(fa_oblimin$Phi, 4))
## PA2 PA3 PA1
## PA2 1.0000 0.2544 -0.3448
## PA3 0.2544 1.0000 -0.5100
## PA1 -0.3448 -0.5100 1.0000
Communalities represent the proportion of each variable’s variance explained by the extracted factors. A communality of 0.50 or above is generally considered acceptable, indicating that the factor model accounts for at least half of the variance in each variable.
comm_df <- data.frame(
Variable = names(fa_varimax$communality),
Communality = round(fa_varimax$communality, 4),
Status = ifelse(fa_varimax$communality >= 0.5, "Adequate", "Low (<0.5)")
)
print(comm_df)
## Variable
## APS 13-15 APS 13-15
## APS 16-18 APS 16-18
## APS 19-24 APS 19-24
## TPT - Februari TPT - Februari
## TPT - Agustus TPT - Agustus
## TPAK - Februari TPAK - Februari
## TPAK - Agustus TPAK - Agustus
## Persentase Penduduk Miskin - Maret Persentase Penduduk Miskin - Maret
## Persentase Penduduk Miskin - September Persentase Penduduk Miskin - September
## Indeks Pembangunan Manusia Indeks Pembangunan Manusia
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi
## Harapan Lama Sekolah (Tahun)\n Harapan Lama Sekolah (Tahun)\n
## Communality
## APS 13-15 0.7998
## APS 16-18 0.7828
## APS 19-24 0.6618
## TPT - Februari 0.8739
## TPT - Agustus 0.8551
## TPAK - Februari 0.6434
## TPAK - Agustus 0.6964
## Persentase Penduduk Miskin - Maret 0.9836
## Persentase Penduduk Miskin - September 0.9352
## Indeks Pembangunan Manusia 0.7945
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.7346
## Harapan Lama Sekolah (Tahun)\n 0.9024
## Status
## APS 13-15 Adequate
## APS 16-18 Adequate
## APS 19-24 Adequate
## TPT - Februari Adequate
## TPT - Agustus Adequate
## TPAK - Februari Adequate
## TPAK - Agustus Adequate
## Persentase Penduduk Miskin - Maret Adequate
## Persentase Penduduk Miskin - September Adequate
## Indeks Pembangunan Manusia Adequate
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi Adequate
## Harapan Lama Sekolah (Tahun)\n Adequate
low_comm <- comm_df$Variable[fa_varimax$communality < 0.5]
if (length(low_comm) > 0) {
cat("\nVariables with low communality:", paste(low_comm, collapse = ", "), "\n")
} else {
cat("\nAll variables have communality >= 0.50.\n")
}
##
## All variables have communality >= 0.50.
The heatmap displays the Varimax factor loading matrix. Darker blue indicates a strong positive loading and darker red indicates a strong negative loading. Variables with a loading magnitude above 0.50 are considered to have a substantial relationship with the corresponding factor.
loadings_matrix <- matrix(fa_varimax$loadings,
nrow = nrow(fa_varimax$loadings),
ncol = ncol(fa_varimax$loadings))
rownames(loadings_matrix) <- rownames(fa_varimax$loadings)
colnames(loadings_matrix) <- paste0("Factor ", 1:n_factors)
corrplot(loadings_matrix,
is.corr = FALSE,
col = colorRampPalette(c("red", "white", "blue"))(200),
tl.cex = 0.7,
tl.col = "black",
cl.lim = c(-1, 1),
addCoef.col = "black",
number.cex = 0.65,
title = "Factor Loadings Heatmap (Varimax)",
mar = c(0, 0, 2, 0))
var_table <- fa_varimax$Vaccounted
print(round(var_table, 4))
## PA1 PA3 PA2
## SS loadings 3.3093 3.2443 3.1099
## Proportion Var 0.2758 0.2704 0.2592
## Cumulative Var 0.2758 0.5461 0.8053
## Proportion Explained 0.3425 0.3357 0.3218
## Cumulative Proportion 0.3425 0.6782 1.0000
total_var <- sum(var_table["Proportion Var", ]) * 100
cat("\nTotal Variance Explained:", round(total_var, 2), "%\n")
##
## Total Variance Explained: 80.53 %
if (total_var >= 60) {
cat("Total variance >= 60%: Model is acceptable (Hair et al., 2019).\n")
} else {
cat("Total variance < 60%: Consider extracting additional factors.\n")
}
## Total variance >= 60%: Model is acceptable (Hair et al., 2019).
prop_var <- var_table["Proportion Var", ] * 100
cum_var_f <- cumsum(prop_var)
factor_labels <- paste0("Factor ", 1:n_factors)
barplot(prop_var,
names.arg = factor_labels,
col = "steelblue",
border = "white",
ylim = c(0, max(cum_var_f) * 1.15),
ylab = "% Variance Explained",
main = "Variance Explained per Factor")
lines(x = seq(0.7, by = 1.2, length.out = n_factors),
y = cum_var_f,
col = "red", type = "b", pch = 19, lwd = 2)
abline(h = 60, col = "darkgreen", lty = 2, lwd = 1.5)
legend("topright",
legend = c("% per Factor", "Cumulative %", "60% Target"),
col = c("steelblue", "red", "darkgreen"),
lty = c(NA, 1, 2), pch = c(15, 19, NA),
bty = "n")
The factor structure was interpreted based on the pattern of high loadings (absolute loading >= 0.40) within each factor. Variables that load strongly on the same factor are assumed to share a common underlying construct. Factor names should be assigned based on the theoretical meaning of the variables grouped together, following Hair et al. (2019) who recommend that factor naming be grounded in substantive theory rather than statistical criteria alone.
loadings_df <- as.data.frame(matrix(fa_varimax$loadings,
nrow = nrow(fa_varimax$loadings),
ncol = ncol(fa_varimax$loadings)))
rownames(loadings_df) <- rownames(fa_varimax$loadings)
colnames(loadings_df) <- paste0("Factor", 1:n_factors)
for (i in 1:n_factors) {
col_name <- paste0("Factor", i)
mask <- abs(loadings_df[[col_name]]) >= 0.40
vars_in_f <- loadings_df[mask, col_name, drop = FALSE]
vars_in_f <- vars_in_f[order(-abs(vars_in_f[[1]])), , drop = FALSE]
cat(paste0("Factor ", i, " (n variables = ", sum(mask), "):\n"))
print(round(vars_in_f, 4))
cat("Factor Name: [Fill in based on substantive theory]\n\n")
}
## Factor 1 (n variables = 6):
## Factor1
## TPT - Februari 0.9250
## TPT - Agustus 0.9031
## TPAK - Agustus -0.6891
## TPAK - Februari -0.6702
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.5057
## APS 13-15 0.4037
## Factor Name: [Fill in based on substantive theory]
##
## Factor 2 (n variables = 6):
## Factor2
## Persentase Penduduk Miskin - Maret 0.9534
## Persentase Penduduk Miskin - September 0.9248
## Indeks Pembangunan Manusia -0.6981
## APS 13-15 -0.5390
## TPAK - Agustus 0.4451
## TPAK - Februari 0.4144
## Factor Name: [Fill in based on substantive theory]
##
## Factor 3 (n variables = 6):
## Factor3
## Harapan Lama Sekolah (Tahun)\n 0.8912
## APS 16-18 0.8589
## APS 19-24 0.8044
## APS 13-15 0.5884
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.5748
## Indeks Pembangunan Manusia 0.4336
## Factor Name: [Fill in based on substantive theory]
This section provides a side-by-side comparison of PCA and Factor Analysis to highlight key conceptual and empirical differences. Both analyses used the same standardized input X_scaled.
pca_var_selected <- pca_var_explained[1:n_components]
pca_total_var <- sum(pca_var_selected)
fa_prop_var <- var_table["Proportion Var", ] * 100
fa_total_var <- sum(fa_prop_var)
cat("PCA - Variance per Component:\n")
## PCA - Variance per Component:
for (i in 1:n_components) {
cat(sprintf(" PC%d: %.2f%%\n", i, pca_var_selected[i]))
}
## PC1: 55.15%
## PC2: 19.26%
cat(sprintf(" Total (%d components): %.2f%%\n\n", n_components, pca_total_var))
## Total (2 components): 74.41%
cat("FA - Variance per Factor (Varimax):\n")
## FA - Variance per Factor (Varimax):
for (i in 1:n_factors) {
cat(sprintf(" Factor %d: %.2f%%\n", i, fa_prop_var[i]))
}
## Factor 1: 27.58%
## Factor 2: 27.04%
## Factor 3: 25.92%
cat(sprintf(" Total (%d factors): %.2f%%\n", n_factors, fa_total_var))
## Total (3 factors): 80.53%
In PCA, the sum of squared loadings per variable across all retained components approximates the proportion of variance reproduced (analogous to R²). In FA, this is the communality, the proportion of variance shared via the common factors. PCA R² computed from a subset of retained components is not guaranteed to exceed FA communality; the comparison below should therefore be read as a variable-level diagnostic rather than as a directional rule.
pca_load_retained <- pca_result$rotation[, 1:n_components, drop = FALSE]
pca_r2 <- rowSums(pca_load_retained^2)
fa_comm <- fa_varimax$communality
comparison_df <- data.frame(
Variable = names(fa_comm),
PCA_R2 = round(pca_r2, 4),
FA_Communality = round(fa_comm, 4),
Difference = round(pca_r2 - fa_comm, 4)
)
print(comparison_df)
## Variable
## APS 13-15 APS 13-15
## APS 16-18 APS 16-18
## APS 19-24 APS 19-24
## TPT - Februari TPT - Februari
## TPT - Agustus TPT - Agustus
## TPAK - Februari TPAK - Februari
## TPAK - Agustus TPAK - Agustus
## Persentase Penduduk Miskin - Maret Persentase Penduduk Miskin - Maret
## Persentase Penduduk Miskin - September Persentase Penduduk Miskin - September
## Indeks Pembangunan Manusia Indeks Pembangunan Manusia
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi
## Harapan Lama Sekolah (Tahun)\n Harapan Lama Sekolah (Tahun)\n
## PCA_R2
## APS 13-15 0.1359
## APS 16-18 0.2446
## APS 19-24 0.3016
## TPT - Februari 0.1605
## TPT - Agustus 0.1600
## TPAK - Februari 0.1362
## TPAK - Agustus 0.1442
## Persentase Penduduk Miskin - Maret 0.1139
## Persentase Penduduk Miskin - September 0.1132
## Indeks Pembangunan Manusia 0.1172
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.1224
## Harapan Lama Sekolah (Tahun)\n 0.2501
## FA_Communality
## APS 13-15 0.7998
## APS 16-18 0.7828
## APS 19-24 0.6618
## TPT - Februari 0.8739
## TPT - Agustus 0.8551
## TPAK - Februari 0.6434
## TPAK - Agustus 0.6964
## Persentase Penduduk Miskin - Maret 0.9836
## Persentase Penduduk Miskin - September 0.9352
## Indeks Pembangunan Manusia 0.7945
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.7346
## Harapan Lama Sekolah (Tahun)\n 0.9024
## Difference
## APS 13-15 -0.6638
## APS 16-18 -0.5382
## APS 19-24 -0.3602
## TPT - Februari -0.7135
## TPT - Agustus -0.6951
## TPAK - Februari -0.5072
## TPAK - Agustus -0.5522
## Persentase Penduduk Miskin - Maret -0.8697
## Persentase Penduduk Miskin - September -0.8220
## Indeks Pembangunan Manusia -0.6772
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.6121
## Harapan Lama Sekolah (Tahun)\n -0.6523
cat("\nMean PCA R2 :", round(mean(pca_r2), 4), "\n")
##
## Mean PCA R2 : 0.1667
cat("Mean FA Communality :", round(mean(fa_comm), 4), "\n")
## Mean FA Communality : 0.8053
cat("\nPCA R2 reflects only", n_components, "retained components.\n")
##
## PCA R2 reflects only 2 retained components.
cat("When PCA R2 < FA communality, FA better captures shared variance\n")
## When PCA R2 < FA communality, FA better captures shared variance
cat("relative to the retained PCA solution.\n")
## relative to the retained PCA solution.
summary_table <- data.frame(
Aspect = c("Method objective",
"Variance modelled",
"Input data",
"Number retained",
"Rotation",
"Total variance explained",
"Interpretability"),
PCA = c("Dimensionality reduction",
"Total variance",
"X_scaled",
paste0(n_components, " components (Parallel Analysis)"),
"None (orthogonal by design)",
paste0(round(pca_total_var, 2), "%"),
"Linear combinations of variables"),
Factor_Analysis = c("Latent construct discovery",
"Common variance only",
"X_scaled",
paste0(n_factors, " factors (Parallel Analysis, fa=fa)"),
"Varimax (orthogonal) & Oblimin (oblique)",
paste0(round(fa_total_var, 2), "%"),
"Underlying latent constructs")
)
print(summary_table, right = FALSE)
## Aspect PCA
## 1 Method objective Dimensionality reduction
## 2 Variance modelled Total variance
## 3 Input data X_scaled
## 4 Number retained 2 components (Parallel Analysis)
## 5 Rotation None (orthogonal by design)
## 6 Total variance explained 74.41%
## 7 Interpretability Linear combinations of variables
## Factor_Analysis
## 1 Latent construct discovery
## 2 Common variance only
## 3 X_scaled
## 4 3 factors (Parallel Analysis, fa=fa)
## 5 Varimax (orthogonal) & Oblimin (oblique)
## 6 80.53%
## 7 Underlying latent constructs
n_show <- min(2, n_components, n_factors)
pca_load_df <- as.data.frame(round(pca_result$rotation[, 1:n_show, drop = FALSE], 4))
colnames(pca_load_df) <- paste0("PC", 1:n_show)
fa_load_mat <- matrix(fa_varimax$loadings,
nrow = nrow(fa_varimax$loadings),
ncol = ncol(fa_varimax$loadings))
rownames(fa_load_mat) <- rownames(fa_varimax$loadings)
fa_load_df <- as.data.frame(round(fa_load_mat[, 1:n_show, drop = FALSE], 4))
colnames(fa_load_df) <- paste0("FA_F", 1:n_show)
print(cbind(pca_load_df, fa_load_df))
## PC1
## APS 13-15 -0.3444
## APS 16-18 -0.2414
## APS 19-24 -0.1183
## TPT - Februari -0.2564
## TPT - Agustus -0.2716
## TPAK - Februari 0.2992
## TPAK - Agustus 0.3097
## Persentase Penduduk Miskin - Maret 0.3054
## Persentase Penduduk Miskin - September 0.3028
## Indeks Pembangunan Manusia -0.3393
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.3317
## Harapan Lama Sekolah (Tahun)\n -0.2721
## PC2
## APS 13-15 -0.1317
## APS 16-18 -0.4317
## APS 19-24 -0.5363
## TPT - Februari 0.3078
## TPT - Agustus 0.2936
## TPAK - Februari -0.2160
## TPAK - Agustus -0.2198
## Persentase Penduduk Miskin - Maret -0.1438
## Persentase Penduduk Miskin - September -0.1465
## Indeks Pembangunan Manusia -0.0457
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.1115
## Harapan Lama Sekolah (Tahun)\n -0.4196
## FA_F1
## APS 13-15 0.4037
## APS 16-18 0.1441
## APS 19-24 -0.0768
## TPT - Februari 0.9250
## TPT - Agustus 0.9031
## TPAK - Februari -0.6702
## TPAK - Agustus -0.6891
## Persentase Penduduk Miskin - Maret -0.2581
## Persentase Penduduk Miskin - September -0.2681
## Indeks Pembangunan Manusia 0.3452
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi 0.5057
## Harapan Lama Sekolah (Tahun)\n 0.1054
## FA_F2
## APS 13-15 -0.5390
## APS 16-18 -0.1561
## APS 19-24 0.0943
## TPT - Februari -0.1323
## TPT - Agustus -0.1903
## TPAK - Februari 0.4144
## TPAK - Agustus 0.4451
## Persentase Penduduk Miskin - Maret 0.9534
## Persentase Penduduk Miskin - September 0.9248
## Indeks Pembangunan Manusia -0.6981
## Rata-Rata Lama Sekolah Penduduk Umur 15 Tahun ke Atas Menurut Provinsi -0.3853
## Harapan Lama Sekolah (Tahun)\n -0.3114
The analysis was conducted through a systematic procedure encompassing data cleaning, sample size assessment, distributional screening, outlier detection, correlation analysis, suitability testing (Bartlett and KMO), standardization, PCA, factor extraction, rotation, and model evaluation.
Parallel Analysis was consistently applied as the primary criterion for determining the number of retained components in PCA and factors in FA, ensuring full methodological coherence across both analyses. The Kaiser criterion and Scree Plot served as supplementary cross-checks. Factor extraction used Principal Axis Factoring, which does not require multivariate normality, making it appropriate for the distributional characteristics of BPS provincial data.
The Bartlett test confirmed sufficient intercorrelation among variables, and the KMO value indicated adequate sampling adequacy for Factor Analysis. Communalities indicate that the factor structure adequately captures shared variance across variables. Oblimin rotation results, particularly the Phi matrix, were used to evaluate whether orthogonal rotation via Varimax is a defensible choice.
Sample size limitations inherent to province-level BPS data were acknowledged. Given that the number of observations is typically around 34 to 38, results should be treated as exploratory and interpreted with appropriate caution, particularly if the observation-to-variable ratio falls below the 5:1 guideline. The resulting factors provide a theoretically grounded and statistically principled basis for further analysis, policy interpretation, or predictive modelling.