Data yang digunakan berasal dari Profil Kesehatan Provinsi 2017. Indikator yang dicatatkan berupa beberapa indikator sosial-ekonomi yaitu: \(X_1\) : Angka Harapan Hidup \(X_2\) : Rasio Puskesmas per 30 Ribu Penduduk \(X_3\) : Rasio Rumah Sakit per 100 Ribu Penduduk \(X_4\) : Persentase Rumah Tangga yang melaksanakan 10 indikator berperilaku hidup bersih dan sehat (PHBS) \(X_5\) : Persentase RT dengan Sanitasi Layak \(X_6\) : Persentase Bayi dengan Berat Badan Lahir Rendah dengan berat badan kurang dari 2,5 kg saat dilahirkan \(X_7\) : Persentase Bayi yang Mendapat ASI Eksklusif sampai bayi berusia 2 tahun \(X_8\) : Angka Kesakitan Diare (per 1000 ribu penduduk)
Principal Component Analysis (PCA) adalah teknik analisis multivariat yang digunakan untuk mengurangi dimensi variabel yang ada dalam sebuah dataset yang kompleks dengan mentransformasikan variabel-variabel tersebut ke dalam bentuk komponen utama yang saling bebas secara linear. PCA mengambil sebuah matriks data yang terdiri dari n observasi dan p variabel, kemudian melakukan transformasi linier pada data tersebut untuk menghasilkan sejumlah komponen utama baru yang dapat menggambarkan sebagian besar variabilitas dalam dataset. PCA dapat digunakan untuk berbagai jenis data, termasuk data numerik, data kategorik, dan data ordinal. PCA juga dapat digunakan untuk mengekstraksi fitur penting dari data dalam aplikasi machine learning dan deep learning.
library(readxl)
DATA_ANMUL_PRAK <- read_excel("E:/KULIAH/SEMESTER 5/Analisis Multivariat 1/DATA ANMUL PRAK.xlsx",
col_types = c("numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric"))
View(DATA_ANMUL_PRAK)
summary(DATA_ANMUL_PRAK)
## AHH Rasio Puskemas Rasio Rumah Sakit Persenatse PHBS
## Min. :66.04 Min. : 0.357 Min. : 0.14 Min. : 24.20
## 1st Qu.:69.84 1st Qu.: 0.684 1st Qu.: 0.45 1st Qu.: 46.67
## Median :71.21 Median : 0.823 Median : 0.67 Median : 51.95
## Mean :70.83 Mean : 1523.084 Mean : 3387.66 Mean : 53.68
## 3rd Qu.:72.23 3rd Qu.: 0.916 3rd Qu.: 0.92 3rd Qu.: 61.48
## Max. :73.71 Max. :13011.000 Max. :42253.00 Max. :100.00
## Persentase Sanitasi Layak Persentase BBLR Persentase Asi Ekslusif
## Min. : 25.20 Min. :1.100 Min. :55.20
## 1st Qu.: 69.50 1st Qu.:3.125 1st Qu.:70.05
## Median : 77.80 Median :4.100 Median :75.20
## Mean : 76.85 Mean :4.170 Mean :74.34
## 3rd Qu.: 88.08 3rd Qu.:4.750 3rd Qu.:80.42
## Max. :100.00 Max. :7.700 Max. :88.20
## Angka Kesakitan Diare
## Min. :2.700e+07
## 1st Qu.:2.700e+09
## Median :2.700e+09
## Mean :2.449e+09
## 3rd Qu.:2.700e+09
## Max. :2.700e+09
Terlihat bahwa setiap variabel memiliki rentang nilai yang sangat jauh, sehingga akan diperlukan standarisasi.
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.2.3
## corrplot 0.92 loaded
DATA_cor <- cor(DATA_ANMUL_PRAK)
corrplot(DATA_cor, method="number")
Terdapat korelasi yang cukup positif pada \(X_1\) dengan \(X_3\) serta \(X_1\) dengan \(X_5\). Terdapat korelasi yang kuat negatif pada \(X_1\) dengan \(X_6\) serta \(X_5\) dengan \(X_6\). # Menampilkan Kumulatif Nilai Eigen dan Vektor Eigen
sc <- scale(DATA_ANMUL_PRAK)
sc
## AHH Rasio Puskemas Rasio Rumah Sakit Persenatse PHBS
## [1,] 0.224838765 2.8976156 -0.3927479 -0.50822914
## [2,] 0.679249742 2.3124383 -0.3927308 0.72094660
## [3,] 1.095793137 -0.3839298 -0.3927940 -0.92856903
## [4,] 1.275664149 -0.3839350 1.0695939 -0.76934937
## [5,] 1.020057974 -0.3840125 -0.3927304 -0.28532162
## [6,] 0.669782846 -0.3839906 -0.3927365 0.05859283
## [7,] 0.608248026 -0.3840554 -0.3927073 -0.13247075
## [8,] -0.631915265 -0.3839875 -0.3927437 -1.39349042
## [9,] -1.086326242 -0.3840143 -0.3927488 0.89927261
## [10,] -0.305307375 -0.3839578 -0.3927096 -0.45091006
## [11,] -2.269688161 -0.3839239 -0.3927655 -1.21516440
## [12,] -1.091059689 -0.3839798 -0.3927251 -1.75651123
## [13,] -2.066149911 -0.3839538 -0.3927505 -1.87751817
## [14,] -0.442577358 -0.3840144 -0.3927674 -0.40632856
## [15,] 1.360866207 -0.3840798 1.0408375 0.81647839
## [16,] 0.598781131 -0.3839841 0.7673033 -0.08788925
## [17,] 0.489911834 -0.3839646 0.8100901 0.31971307
## [18,] 0.130169811 -0.3840256 -0.3927555 -0.41269734
## [19,] -0.030767410 2.5094312 -0.3927596 -0.21526497
## [20,] 0.627181817 2.2640098 -0.3927001 -0.43180370
## [21,] 0.428377015 -0.3839511 -0.3927689 2.95002176
## [22,] -0.002366724 -0.3839509 -0.3927176 0.19870613
## [23,] -0.016567067 -0.3839553 -0.3927609 1.61257666
## [24,] 0.489911834 -0.3839598 -0.3927034 1.27503099
## [25,] 0.721850771 -0.3839815 1.2314649 0.86105989
## [26,] -0.480444939 -0.3839985 -0.3927511 0.55535816
## [27,] -1.498136189 -0.3840041 -0.3927866 0.28786914
## [28,] -1.791609945 -0.3839946 -0.3927302 0.21781249
## [29,] -0.059168096 -0.3839600 -0.3927679 0.30060671
## [30,] 1.351399312 -0.3839301 4.5065696 -0.20252740
## Persentase Sanitasi Layak Persentase BBLR Persentase Asi Ekslusif
## [1,] -0.22453063 1.35930138 -0.37585908
## [2,] 0.61106438 0.30284953 1.05008876
## [3,] -0.34958567 0.02112904 -0.49568663
## [4,] 0.38937591 -0.68317220 -0.86715203
## [5,] 0.64517030 -0.26059146 1.64922650
## [6,] -0.29842679 -0.82403245 0.28319244
## [7,] -0.01421080 -1.03532282 -0.51965214
## [8,] -0.55422118 0.51413990 1.02612325
## [9,] -0.90664901 1.50016163 1.12198529
## [10,] 0.38937591 -0.75360232 0.54681305
## [11,] -1.29886708 1.99317250 -0.45973836
## [12,] -2.23109553 2.48618336 0.66664060
## [13,] -0.50874662 1.00715077 -0.11223847
## [14,] -0.44053479 -0.11973121 -0.72335897
## [15,] 0.61674870 -2.16220479 -1.15473815
## [16,] 0.19610903 -0.11973121 -0.01637643
## [17,] 0.58264278 0.44370978 0.88233019
## [18,] -2.93595118 0.02112904 1.08603702
## [19,] 0.75885669 -0.68317220 0.22327867
## [20,] 1.16244340 0.09155916 0.75051988
## [21,] 0.09379128 -0.11973121 0.33112346
## [22,] 1.15107476 0.23241941 1.66120926
## [23,] 0.01421080 0.30284953 0.54681305
## [24,] 1.31592004 -0.11973121 -0.12422123
## [25,] 1.23065524 -1.03532282 -0.37585908
## [26,] -0.01421080 -1.31704331 -2.29309987
## [27,] -0.11652856 0.72543027 -1.76585865
## [28,] 0.99191381 -0.75360232 -1.46628978
## [29,] -1.24202388 0.02112904 0.34310622
## [30,] 0.98622949 -1.03532282 -1.41835876
## Angka Kesakitan Diare
## [1,] 0.3272483
## [2,] 0.3273397
## [3,] 0.3273965
## [4,] 0.3273187
## [5,] 0.3273965
## [6,] 0.3273952
## [7,] 0.3273558
## [8,] 0.3273326
## [9,] 0.3273607
## [10,] 0.3273426
## [11,] -3.1573655
## [12,] 0.3273641
## [13,] 0.3273855
## [14,] 0.3273370
## [15,] 0.3273357
## [16,] 0.3274069
## [17,] 0.3273494
## [18,] 0.3274135
## [19,] -2.8405724
## [20,] 0.3272686
## [21,] 0.3273175
## [22,] 0.3273120
## [23,] 0.3274077
## [24,] 0.3273707
## [25,] -2.8405778
## [26,] 0.3273419
## [27,] 0.3273313
## [28,] 0.3273441
## [29,] 0.3274198
## [30,] 0.3273232
## attr(,"scaled:center")
## AHH Rasio Puskemas Rasio Rumah Sakit
## 7.083500e+01 1.523084e+03 3.387660e+03
## Persenatse PHBS Persentase Sanitasi Layak Persentase BBLR
## 5.368000e+01 7.685000e+01 4.170000e+00
## Persentase Asi Ekslusif Angka Kesakitan Diare
## 7.433667e+01 2.448894e+09
## attr(,"scaled:scale")
## AHH Rasio Puskemas Rasio Rumah Sakit
## 2.112625e+00 3.964610e+03 8.624152e+03
## Persenatse PHBS Persentase Sanitasi Layak Persentase BBLR
## 1.570158e+01 1.759225e+01 1.419847e+00
## Persentase Asi Ekslusif Angka Kesakitan Diare
## 8.345326e+00 7.670616e+08
s <- cov(sc)
s_eig <- eigen(s)
s_eig
## eigen() decomposition
## $values
## [1] 2.6118147 1.3746642 1.2422858 0.9440815 0.7260987 0.4954067 0.4600467
## [8] 0.1456017
##
## $vectors
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.46239339 -0.286270713 0.19514429 -0.3944093 -0.12860292 0.28104855
## [2,] 0.01138205 -0.602500980 -0.44494381 -0.1720228 0.47554461 -0.07776324
## [3,] -0.38368732 0.313079082 -0.13275094 -0.5108505 -0.28284234 -0.50476545
## [4,] -0.32131255 -0.225033987 0.22202247 0.6714161 -0.22378184 -0.31796991
## [5,] -0.45713872 -0.226684561 -0.22304517 0.1518141 0.12480978 -0.32209118
## [6,] 0.52689600 -0.104144571 -0.02335993 -0.1072818 -0.08812409 -0.62873412
## [7,] 0.21511145 -0.586623100 0.30232411 -0.2020650 -0.51849603 0.02934568
## [8,] -0.05231593 0.004919114 0.74520617 -0.1664165 0.57884285 -0.24323053
## [,7] [,8]
## [1,] 0.27148249 -0.58427020
## [2,] 0.32778548 0.26393764
## [3,] 0.15537574 0.34241052
## [4,] 0.43349730 0.08319441
## [5,] -0.70788443 -0.21545548
## [6,] 0.11025563 -0.53315167
## [7,] -0.29907768 0.34383643
## [8,] -0.06393969 0.12611002
plot(s_eig$values, xlab="Eigenvalue Number", ylab = "Eigenvalue Size",
main = "Scree Plot")
lines(s_eig$values)
Diketahui secara visual, dapat dimanfaatkan sekitar 4 PC.
for (eg in s_eig$values){
print(eg / sum(s_eig$values))
}
## [1] 0.3264768
## [1] 0.171833
## [1] 0.1552857
## [1] 0.1180102
## [1] 0.09076233
## [1] 0.06192583
## [1] 0.05750583
## [1] 0.01820021
Diketahui berdasarkan kumulasi, 4 PC sudah menangkap sekitar 75% keragaman. Sehingga dapat kita susun 4 buah PC
s_eig$vectors[,1:4]
## [,1] [,2] [,3] [,4]
## [1,] -0.46239339 -0.286270713 0.19514429 -0.3944093
## [2,] 0.01138205 -0.602500980 -0.44494381 -0.1720228
## [3,] -0.38368732 0.313079082 -0.13275094 -0.5108505
## [4,] -0.32131255 -0.225033987 0.22202247 0.6714161
## [5,] -0.45713872 -0.226684561 -0.22304517 0.1518141
## [6,] 0.52689600 -0.104144571 -0.02335993 -0.1072818
## [7,] 0.21511145 -0.586623100 0.30232411 -0.2020650
## [8,] -0.05231593 0.004919114 0.74520617 -0.1664165
Hasil di atas dapat dituliskan dalam bentuk persamaan: \[ PC_1 = -0.46X_1 + 0.01X_2 – 0.38X_3 – 0.32X_4 – 0.46X_5 + 0.53X_6 + 0.22X_7 – 0.05X_8 \] \[ PC_2 = -0.29X_1 - 0.60X_2 + 0.31X_3 - 0.225X_4 - 0.226 X_5 - 0.1 X_6 - 0.59 X7 + 0.005 X8 \] \[ PC_3 = 0.19X_1 - 0.44X_2 - 0.13X_3 + 0.222X_4 - 0.223X_5 - 0.02X_6 - 0.3X_7 - 0.75X_8 \] \[ PC_4 = -0.39X_1 - 0.17X_2 - 0.51X_3 + 0.67X_4 + 0.15X_5 - 0.107X_6 - 0.2X_7 - 0.17X_8 \]
kor_eig <- eigen(DATA_cor)
kor_eig
## eigen() decomposition
## $values
## [1] 2.6118147 1.3746642 1.2422858 0.9440815 0.7260987 0.4954067 0.4600467
## [8] 0.1456017
##
## $vectors
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.46239339 -0.286270713 0.19514429 -0.3944093 -0.12860292 -0.28104855
## [2,] 0.01138205 -0.602500980 -0.44494381 -0.1720228 0.47554461 0.07776324
## [3,] -0.38368732 0.313079082 -0.13275094 -0.5108505 -0.28284234 0.50476545
## [4,] -0.32131255 -0.225033987 0.22202247 0.6714161 -0.22378184 0.31796991
## [5,] -0.45713872 -0.226684561 -0.22304517 0.1518141 0.12480978 0.32209118
## [6,] 0.52689600 -0.104144571 -0.02335993 -0.1072818 -0.08812409 0.62873412
## [7,] 0.21511145 -0.586623100 0.30232411 -0.2020650 -0.51849603 -0.02934568
## [8,] -0.05231593 0.004919114 0.74520617 -0.1664165 0.57884285 0.24323053
## [,7] [,8]
## [1,] 0.27148249 0.58427020
## [2,] 0.32778548 -0.26393764
## [3,] 0.15537574 -0.34241052
## [4,] 0.43349730 -0.08319441
## [5,] -0.70788443 0.21545548
## [6,] 0.11025563 0.53315167
## [7,] -0.29907768 -0.34383643
## [8,] -0.06393969 -0.12611002
plot(kor_eig$values, xlab="Eigenvalue Number", ylab="Eigenvalue Size",
main = "Scree Plot")
lines(kor_eig$values)
Proses selanjutnya mengikuti pola yang sama.
for (eg in kor_eig$values){
print(eg / sum(kor_eig$values))
}
## [1] 0.3264768
## [1] 0.171833
## [1] 0.1552857
## [1] 0.1180102
## [1] 0.09076233
## [1] 0.06192583
## [1] 0.05750583
## [1] 0.01820021
Diketahui berdasarkan kumulasi, 4 PC sudah menangkap sekitar 75% keragaman. Sehingga dapat kita susun 4 buah PC
kor_eig$vectors[,1:4]
## [,1] [,2] [,3] [,4]
## [1,] -0.46239339 -0.286270713 0.19514429 -0.3944093
## [2,] 0.01138205 -0.602500980 -0.44494381 -0.1720228
## [3,] -0.38368732 0.313079082 -0.13275094 -0.5108505
## [4,] -0.32131255 -0.225033987 0.22202247 0.6714161
## [5,] -0.45713872 -0.226684561 -0.22304517 0.1518141
## [6,] 0.52689600 -0.104144571 -0.02335993 -0.1072818
## [7,] 0.21511145 -0.586623100 0.30232411 -0.2020650
## [8,] -0.05231593 0.004919114 0.74520617 -0.1664165
Hasil di atas dapat dituliskan dalam bentuk persamaan: \[ PC_1 = -0.46X_1 + 0.01X_2 – 0.38X_3 – 0.32X_4 – 0.46X_5 + 0.53X_6 + 0.22X_7 – 0.05X_8 \] \[ PC_2 = -0.29X_1 - 0.60X_2 + 0.31X_3 - 0.225X_4 - 0.226 X_5 - 0.1 X_6 - 0.59 X7 + 0.005 X8 \] \[ PC_3 = 0.19X_1 - 0.44X_2 - 0.13X_3 + 0.222X_4 - 0.223X_5 - 0.02X_6 - 0.3X_7 - 0.75X_8 \] \[ PC_4 = -0.39X_1 - 0.17X_2 - 0.51X_3 + 0.67X_4 + 0.15X_5 - 0.107X_6 - 0.2X_7 - 0.17X_8 \]
prcompPCA1 <- prcomp(x=DATA_ANMUL_PRAK,scale=T,center=T)
PCA1
## Standard deviations (1, .., p=8):
## [1] 1.6161110 1.1724608 1.1145788 0.9716386 0.8521142 0.7038513 0.6782674
## [8] 0.3815779
##
## Rotation (n x k) = (8 x 8):
## PC1 PC2 PC3 PC4
## AHH -0.46239339 0.286270713 -0.19514429 0.3944093
## Rasio Puskemas 0.01138205 0.602500980 0.44494381 0.1720228
## Rasio Rumah Sakit -0.38368732 -0.313079082 0.13275094 0.5108505
## Persenatse PHBS -0.32131255 0.225033987 -0.22202247 -0.6714161
## Persentase Sanitasi Layak -0.45713872 0.226684561 0.22304517 -0.1518141
## Persentase BBLR 0.52689600 0.104144571 0.02335993 0.1072818
## Persentase Asi Ekslusif 0.21511145 0.586623100 -0.30232411 0.2020650
## Angka Kesakitan Diare -0.05231593 -0.004919114 -0.74520617 0.1664165
## PC5 PC6 PC7 PC8
## AHH -0.12860292 0.28104855 0.27148249 0.58427020
## Rasio Puskemas 0.47554461 -0.07776324 0.32778548 -0.26393764
## Rasio Rumah Sakit -0.28284234 -0.50476545 0.15537574 -0.34241052
## Persenatse PHBS -0.22378184 -0.31796991 0.43349730 -0.08319441
## Persentase Sanitasi Layak 0.12480978 -0.32209118 -0.70788443 0.21545548
## Persentase BBLR -0.08812409 -0.62873412 0.11025563 0.53315167
## Persentase Asi Ekslusif -0.51849603 0.02934568 -0.29907768 -0.34383643
## Angka Kesakitan Diare 0.57884285 -0.24323053 -0.06393969 -0.12611002
print(PCA1$rotation[,1:4],digits=4)
## PC1 PC2 PC3 PC4
## AHH -0.46239 0.286271 -0.19514 0.3944
## Rasio Puskemas 0.01138 0.602501 0.44494 0.1720
## Rasio Rumah Sakit -0.38369 -0.313079 0.13275 0.5109
## Persenatse PHBS -0.32131 0.225034 -0.22202 -0.6714
## Persentase Sanitasi Layak -0.45714 0.226685 0.22305 -0.1518
## Persentase BBLR 0.52690 0.104145 0.02336 0.1073
## Persentase Asi Ekslusif 0.21511 0.586623 -0.30232 0.2021
## Angka Kesakitan Diare -0.05232 -0.004919 -0.74521 0.1664
summary(PCA1)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.6161 1.1725 1.1146 0.9716 0.85211 0.70385 0.67827
## Proportion of Variance 0.3265 0.1718 0.1553 0.1180 0.09076 0.06193 0.05751
## Cumulative Proportion 0.3265 0.4983 0.6536 0.7716 0.86237 0.92429 0.98180
## PC8
## Standard deviation 0.3816
## Proportion of Variance 0.0182
## Cumulative Proportion 1.0000
Hasil di atas dapat dituliskan dalam bentuk persamaan: \[ PC_1 = -0.46X_1 + 0.01X_2 – 0.38X_3 – 0.32X_4 – 0.46X_5 + 0.53X_6 + 0.22X_7 – 0.05X_8 \] \[ PC_2 = -0.29X_1 - 0.60X_2 + 0.31X_3 - 0.225X_4 - 0.226 X_5 - 0.1 X_6 - 0.59 X7 + 0.005 X8 \] \[ PC_3 = 0.19X_1 - 0.44X_2 - 0.13X_3 + 0.222X_4 - 0.223X_5 - 0.02X_6 - 0.3X_7 - 0.75X_8 \] \[ PC_4 = -0.39X_1 - 0.17X_2 - 0.51X_3 + 0.67X_4 + 0.15X_5 - 0.107X_6 - 0.2X_7 - 0.17X_8\] # Kesimpulan
Dengan metode analisis komponen utama, dapat diketahui bahwasanya data kesehatan di Pulau Jawa yang terdiri atas 8 indikator dapat direduksi menjadi 4 indikator namun tetap dapat menggambarkan keragaman dari data awal. Keempat indikator tersebut adalah Kondisi Kesehatan Bayi, Fasilitas Kesehatan, PHBS, dan Kesehatan Masyarakat. Selain itu indikator baru yang terbentuk ini dapat digunakan untuk analisis lanjutan seperti regresi dan klasifikasi dan telah mengatasi masalah multikolinearitas karena memiliki sifat saling bebas satu sama lain.
```