Praktikum 8 APG : AKU
Tahapan Reduksi Data
+ Memeriksa korelasi linier antar peubah asal dan keragaman data.
+ Tentukan basis AKU (matriks ragam peragam atau matriks korelasi).
+ Menghitung akar ciri dan vektor ciri sesuai basis AKU.
+ Membuat kombinasi linier dari peubah asal untuk KU (Y_i)
+ Menghitung keragaman total data
+ Menetapkan jumlah komponen untuk tujuan reduksi
+ Kontribusi variabel terhadap komponen utamanya
Penerapan 1
Library Zone
library(tidyverse)FAKTOR PENENTU POLUSI
Data dikumpulkan untuk menyelidiki faktor penentu polusi. Variabel berikut diperoleh untuk 41 kota AS:
Method AKU
LOAD DATASET
data1<-read.csv(file="air_pollution.csv",sep=",",row.names = 1)
head(data1)## SO2 temp manu popul wind precip predays
## Albany 46 47.6 44 116 8.8 33.36 135
## Albuquerque 11 56.8 46 244 8.9 7.77 58
## Atlanta 24 61.5 368 497 9.1 48.34 115
## Baltimore 47 55.0 625 905 9.6 41.31 111
## Bu\valo 11 47.1 391 463 12.4 36.11 166
## Charleston 31 55.2 35 71 6.5 40.75 148
Melihat Matriks Korelasi
round(cor(data1),digits=3)## SO2 temp manu popul wind precip predays
## SO2 1.000 -0.434 0.645 0.494 0.095 0.054 0.370
## temp -0.434 1.000 -0.190 -0.063 -0.350 0.386 -0.430
## manu 0.645 -0.190 1.000 0.955 0.238 -0.032 0.132
## popul 0.494 -0.063 0.955 1.000 0.213 -0.026 0.042
## wind 0.095 -0.350 0.238 0.213 1.000 -0.013 0.164
## precip 0.054 0.386 -0.032 -0.026 -0.013 1.000 0.496
## predays 0.370 -0.430 0.132 0.042 0.164 0.496 1.000
NOTE Biasanya dalam regresi linear berganda harus dipenuhi asumsi bahwa tidak terdapat multikolinearitas antar variabel. Tetapi dalam hal ini diperoleh $ Cor(popul,manu)=0.955$ sangat tinggi.Hal ini tentu saja menimbulkan masalah bahwa \(Cov(X_1,X_2)\neq=0\)
Kita dapat mengatasi hal ini dengan menggunakan PCA.
Menentukan Komponen Utama
- Misal terdapat matriks covarians \(\Sigma\) dan memiliki nilai eigen \(\lambda_1 >> \lambda_2 >> ... >>\lambda_p\) dan vektor eigen bersesuaian \(e_1,e_2,e_3,...,e_p\) maka komponen utama dapat ditentukan
- \(Y_1=e_1^T X_1\)
- \(Y_2=e_2^T X_2\)
- \(\vdots\)
- \(Y_p=e_p^T X_p\)
# SO2 dihapus karena variabel dependen
data_polusi<-data1[,-1]
head(data_polusi,10)## temp manu popul wind precip predays
## Albany 47.6 44 116 8.8 33.36 135
## Albuquerque 56.8 46 244 8.9 7.77 58
## Atlanta 61.5 368 497 9.1 48.34 115
## Baltimore 55.0 625 905 9.6 41.31 111
## Bu\valo 47.1 391 463 12.4 36.11 166
## Charleston 55.2 35 71 6.5 40.75 148
## Chicago 50.6 3344 3369 10.4 34.44 122
## Cincinnati 54.0 462 453 7.1 39.04 132
## Cleveland 49.7 1007 751 10.9 34.99 155
## Columbus 51.5 266 540 8.6 37.01 134
#Misalkan ada tambahan negatif temperature
# Membuat Variabel negative temperature (mutate()) keperluan penelitian.
data_polusi <- data_polusi%>%mutate(negtemp=-1*temp)
data_polusi## temp manu popul wind precip predays negtemp
## Albany 47.6 44 116 8.8 33.36 135 -47.6
## Albuquerque 56.8 46 244 8.9 7.77 58 -56.8
## Atlanta 61.5 368 497 9.1 48.34 115 -61.5
## Baltimore 55.0 625 905 9.6 41.31 111 -55.0
## Bu\valo 47.1 391 463 12.4 36.11 166 -47.1
## Charleston 55.2 35 71 6.5 40.75 148 -55.2
## Chicago 50.6 3344 3369 10.4 34.44 122 -50.6
## Cincinnati 54.0 462 453 7.1 39.04 132 -54.0
## Cleveland 49.7 1007 751 10.9 34.99 155 -49.7
## Columbus 51.5 266 540 8.6 37.01 134 -51.5
## Dallas 66.2 641 844 10.9 35.94 78 -66.2
## Denver 51.9 454 515 9.0 12.95 86 -51.9
## Des Moines 49.0 104 201 11.2 30.85 103 -49.0
## Detroit 49.9 1064 1513 10.1 30.96 129 -49.9
## Hartford 49.1 412 158 9.0 43.37 127 -49.1
## Houston 68.9 721 1233 10.8 48.19 103 -68.9
## Indianapolis 52.3 361 746 9.7 38.74 121 -52.3
## Jacksonville 68.4 136 529 8.8 54.47 116 -68.4
## Kansas 54.5 381 507 10.0 37.00 99 -54.5
## Little Rock 61.0 91 132 8.2 48.52 100 -61.0
## Louisville 55.6 291 593 8.3 43.11 123 -55.6
## Memphis 61.6 337 624 9.2 49.10 105 -61.6
## Miami 75.5 207 335 9.0 59.80 128 -75.5
## Milwaukee 45.7 569 717 11.8 29.07 123 -45.7
## Minneapolis 43.5 699 744 10.6 25.94 137 -43.5
## Nashville 59.4 275 448 7.9 46.00 119 -59.4
## New Orleans 68.3 204 361 8.4 56.77 113 -68.3
## Norfolk 59.3 96 308 10.6 44.68 116 -59.3
## Omaha 51.5 181 347 10.9 30.18 98 -51.5
## Philadelphia 54.6 1692 1950 9.6 39.93 115 -54.6
## Phoenix 70.3 213 582 6.0 7.05 36 -70.3
## Pittsburgh 50.4 347 520 9.4 36.22 147 -50.4
## Providence 50.0 343 179 10.6 42.75 125 -50.0
## Richmond 57.8 197 299 7.6 42.59 115 -57.8
## Salt Lake City 51.0 137 176 8.7 15.17 89 -51.0
## San Fransisco 56.7 453 716 8.7 20.66 67 -56.7
## Seattle 51.1 379 531 9.4 38.79 164 -51.1
## St. Louis 55.9 775 622 9.5 35.89 105 -55.9
## Washington 57.3 434 757 9.3 38.89 111 -57.3
## Wichita 56.6 125 277 12.7 30.58 82 -56.6
## Wilmington 54.0 80 80 9.0 40.25 114 -54.0
#oleh karena temperature menjadi negatif, maka hapus variabel temperatur
# Pilih variabel kecuali temperature
data_polusi <- select(data_polusi,-temp)
head(data_polusi,10)## manu popul wind precip predays negtemp
## Albany 44 116 8.8 33.36 135 -47.6
## Albuquerque 46 244 8.9 7.77 58 -56.8
## Atlanta 368 497 9.1 48.34 115 -61.5
## Baltimore 625 905 9.6 41.31 111 -55.0
## Bu\valo 391 463 12.4 36.11 166 -47.1
## Charleston 35 71 6.5 40.75 148 -55.2
## Chicago 3344 3369 10.4 34.44 122 -50.6
## Cincinnati 462 453 7.1 39.04 132 -54.0
## Cleveland 1007 751 10.9 34.99 155 -49.7
## Columbus 266 540 8.6 37.01 134 -51.5
Basis
# Membuat Matriks Covarians
matcov1<-cov(scale(data_polusi))
matcov1## manu popul wind precip predays negtemp
## manu 1.00000000 0.95526935 0.23794683 -0.03241688 0.13182930 0.19004216
## popul 0.95526935 1.00000000 0.21264375 -0.02611873 0.04208319 0.06267813
## wind 0.23794683 0.21264375 1.00000000 -0.01299438 0.16410559 0.34973963
## precip -0.03241688 -0.02611873 -0.01299438 1.00000000 0.49609671 -0.38625342
## predays 0.13182930 0.04208319 0.16410559 0.49609671 1.00000000 0.43024212
## negtemp 0.19004216 0.06267813 0.34973963 -0.38625342 0.43024212 1.00000000
# Bandingkan
cor(data_polusi)## manu popul wind precip predays negtemp
## manu 1.00000000 0.95526935 0.23794683 -0.03241688 0.13182930 0.19004216
## popul 0.95526935 1.00000000 0.21264375 -0.02611873 0.04208319 0.06267813
## wind 0.23794683 0.21264375 1.00000000 -0.01299438 0.16410559 0.34973963
## precip -0.03241688 -0.02611873 -0.01299438 1.00000000 0.49609671 -0.38625342
## predays 0.13182930 0.04208319 0.16410559 0.49609671 1.00000000 0.43024212
## negtemp 0.19004216 0.06267813 0.34973963 -0.38625342 0.43024212 1.00000000
#Jadi kesimpulannya : memakai basis matriks korelasi# Nilai eigen matriks covarianns
eig1<-eigen(matcov1)$values
# Nilai vektor eigen matriks covarians
eigvec1<-eigen(matcov1)$vectors
print("Nilai Eigen")## [1] "Nilai Eigen"
print(eig1)## [1] 2.19616264 1.49994343 1.39464912 0.76022689 0.11457065 0.03444727
print("Vektor Eigen Bersesuaian")## [1] "Vektor Eigen Bersesuaian"
print(eigvec1)## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.61154243 0.1680577 -0.27288633 -0.13684076 0.10204211 0.70297051
## [2,] -0.57782195 0.2224533 -0.35037413 -0.07248126 -0.07806551 -0.69464131
## [3,] -0.35383877 -0.1307915 0.29725334 0.86942583 -0.11326688 0.02452501
## [4,] 0.04080701 -0.6228578 -0.50456294 0.17114826 0.56818342 -0.06062222
## [5,] -0.23791593 -0.7077653 0.09308852 -0.31130693 -0.58000387 0.02196062
## [6,] -0.32964613 -0.1275974 0.67168611 -0.30645728 0.55805638 -0.13618780
Menentukan Nilai standar deviasi dari setiap komponen utama
- Standar deviasi dapat ditentukan dari varians komponen utama Yi, \(Var(Y_i)=\lambda_i\) maka stdev nya adalah \(\sqrt{\lambda_i}\)
print("standar deviasi")## [1] "standar deviasi"
sqrt(eig1)## [1] 1.4819456 1.2247218 1.1809526 0.8719099 0.3384829 0.1855998
Menentukan total varians
- Total varians dapat dinyatakan \(Total \space variance = Var(Y_1 )+Var(Y_2) +... +Var(Y_p)=\lambda_1 +\lambda_2 +...+\lambda_p\)
# total varians adalah jumlah eigen
total_var<-sum(eig1)
total_var## [1] 6
Menentukan proporsi varians untuk setiap komponen
- Proporsi varians komponen utama ke i terhadap total varians dapat dituliskan sebagai berikut
- \(proporsi \space var =\frac{\lambda_i}{\lambda_1 +\lambda_2 +...+\lambda_p}\)
# proporsi, nilai var ke i dibagi var total
prop_var<-eig1/total_var
prop_var## [1] 0.366027107 0.249990572 0.232441520 0.126704481 0.019095109 0.005741211
Menentukan cumulative varians
# Cumulative proporportion varians
cumsum(prop_var)## [1] 0.3660271 0.6160177 0.8484592 0.9751637 0.9942588 1.0000000
Menentukan Tabel Loadings
- Ide dasar adalah
- \(Y_1=e_1^T X_1\)
- \(Y_2=e_2^T X_2\)
- \(\vdots\)
- \(Y_p=e_p^T X_p\)
varname <- colnames(data_polusi)
loadings <- data.frame(varname,round(eigvec1,5))
colnames(loadings)<-c("var","comp.1","comp.2","comp.3","comp.4","comp.5","comp.6")
loadings## var comp.1 comp.2 comp.3 comp.4 comp.5 comp.6
## 1 manu -0.61154 0.16806 -0.27289 -0.13684 0.10204 0.70297
## 2 popul -0.57782 0.22245 -0.35037 -0.07248 -0.07807 -0.69464
## 3 wind -0.35384 -0.13079 0.29725 0.86943 -0.11327 0.02453
## 4 precip 0.04081 -0.62286 -0.50456 0.17115 0.56818 -0.06062
## 5 predays -0.23792 -0.70777 0.09309 -0.31131 -0.58000 0.02196
## 6 negtemp -0.32965 -0.12760 0.67169 -0.30646 0.55806 -0.13619
Alternatif Otomatis Menggunakan printcomp
summary(princomp(scale(data_polusi)),loadings = TRUE)## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 1.4637615 1.2096939 1.1664619 0.8612112 0.33432955
## Proportion of Variance 0.3660271 0.2499906 0.2324415 0.1267045 0.01909511
## Cumulative Proportion 0.3660271 0.6160177 0.8484592 0.9751637 0.99425879
## Comp.6
## Standard deviation 0.183322369
## Proportion of Variance 0.005741211
## Cumulative Proportion 1.000000000
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## manu 0.612 0.168 0.273 0.137 0.102 0.703
## popul 0.578 0.222 0.350 -0.695
## wind 0.354 -0.131 -0.297 -0.869 -0.113
## precip -0.623 0.505 -0.171 0.568
## predays 0.238 -0.708 0.311 -0.580
## negtemp 0.330 -0.128 -0.672 0.306 0.558 -0.136
library(FactoMineR)
pca<-PCA(scale(data_polusi))# Menampilkan tabel Komponen Penting
pca$eig## eigenvalue percentage of variance cumulative percentage of variance
## comp 1 2.19616264 36.6027107 36.60271
## comp 2 1.49994343 24.9990572 61.60177
## comp 3 1.39464912 23.2441520 84.84592
## comp 4 0.76022689 12.6704481 97.51637
## comp 5 0.11457065 1.9095109 99.42588
## comp 6 0.03444727 0.5741211 100.00000
# Menampilkan Scree Plot
plot(prop_var,type="o",ylab="Proporsi varians yang dapat dijelaskan")Penerapan 2
Enam variabel hematologi diukur pada 51 pekerja (Royston 1983): y1 = konsentrasi hemoglobin y4 = jumlah limfosit y2 = volume sel yang dikemas y5 = jumlah neutrofil y3 = jumlah sel darah putih y6 = konsentrasi timbal serum
LOAD DATA
library(readxl)
Data <- read_excel("Data.xlsx")
data_aku <- data.frame(Data$y1,Data$y2,Data$y3,Data$y4,Data$y5,Data$y6)
head(data_aku,10)## Data.y1 Data.y2 Data.y3 Data.y4 Data.y5 Data.y6
## 1 13.4 39 4100 14 25 17
## 2 14.6 46 5000 15 30 20
## 3 13.5 42 4500 19 21 18
## 4 15.0 46 4600 23 16 18
## 5 14.6 44 5100 17 31 19
## 6 14.0 44 4900 20 24 19
## 7 16.4 49 4300 21 17 18
## 8 14.8 44 4400 16 26 29
## 9 15.2 46 4100 27 13 27
## 10 15.5 48 8400 34 42 36
dim(Data)## [1] 35 7
####Menghitung korelasi antar variabel
round(cor(data_aku), digits = 3)## Data.y1 Data.y2 Data.y3 Data.y4 Data.y5 Data.y6
## Data.y1 1.000 0.788 0.384 0.398 0.106 -0.103
## Data.y2 0.788 1.000 0.401 0.437 0.095 0.197
## Data.y3 0.384 0.401 1.000 0.750 0.697 0.202
## Data.y4 0.398 0.437 0.750 1.000 0.077 0.283
## Data.y5 0.106 0.095 0.697 0.077 1.000 0.083
## Data.y6 -0.103 0.197 0.202 0.283 0.083 1.000
Nilai dan vektor eigen basis
eigen(cov(scale(data_aku)))## eigen() decomposition
## $values
## [1] 2.7784901 1.3142591 1.0652472 0.6742434 0.1561809 0.0115794
##
## $vectors
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.4282630 0.51189355 -0.22147091 0.1692936 0.690388026 -0.01450898
## [2,] 0.4580183 0.42928877 0.07092817 0.3829351 -0.672686824 0.04186590
## [3,] 0.5247663 -0.35858404 -0.14567324 -0.2331124 -0.064318534 -0.71856724
## [4,] 0.4695834 -0.02377455 0.29289957 -0.6582804 -0.007574298 0.50965244
## [5,] 0.2842325 -0.57762786 -0.47414638 0.3752621 0.017697085 0.46862384
## [6,] 0.1735906 -0.30145391 0.78364162 0.4429496 0.257583734 -0.04841434
####Melakukan analisis komponen utama
AKU <- princomp(data_aku, cor = T)
summary(AKU,loadings = TRUE)## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 1.6668803 1.1464114 1.0321081 0.8211232 0.39519726
## Proportion of Variance 0.4630817 0.2190432 0.1775412 0.1123739 0.02603015
## Cumulative Proportion 0.4630817 0.6821249 0.8596661 0.9720400 0.99807010
## Comp.6
## Standard deviation 0.1076076
## Proportion of Variance 0.0019299
## Cumulative Proportion 1.0000000
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## Data.y1 0.428 0.512 0.221 0.169 0.690
## Data.y2 0.458 0.429 0.383 -0.673
## Data.y3 0.525 -0.359 0.146 -0.233 0.719
## Data.y4 0.470 -0.293 -0.658 -0.510
## Data.y5 0.284 -0.578 0.474 0.375 -0.469
## Data.y6 0.174 -0.301 -0.784 0.443 0.258
####korelasi nilai KU dengan nilai variabel awal
KU <- AKU$scores
KU## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## [1,] -3.11186781 -1.519890576 0.60250842 -1.06198766 0.33170661
## [2,] -0.35665952 -0.265798888 0.54353366 0.66616828 -0.72801362
## [3,] -2.08478412 -0.770105455 -0.02372771 -1.11460739 -0.48391022
## [4,] -0.41894446 1.216209276 -0.21668500 -0.72720878 -0.54268338
## [5,] -0.61196928 -0.698412739 0.77149013 0.08415192 -0.17698531
## [6,] -1.05792974 -0.486440062 0.05939822 -0.58754032 -0.65905453
## [7,] 0.68479171 2.633266994 0.14390813 0.33613844 -0.33594537
## [8,] -0.66511092 -0.662126029 -1.19567479 1.03845519 0.55506756
## [9,] -0.05339881 1.099185569 -2.06200218 -0.23203905 0.12417654
## [10,] 4.03871964 -2.253410597 -1.58650080 1.06046538 0.09390194
## [11,] 1.01633663 0.175272550 -0.20342853 -0.04024993 -0.49709169
## [12,] 2.06655200 2.558603219 -0.77526690 0.34298746 -0.01781005
## [13,] -0.50102452 0.061481666 -0.82426505 -0.55087725 0.18870217
## [14,] 0.90561406 0.711257371 0.49493920 -0.57454777 0.73234826
## [15,] -1.57506972 0.906096753 -0.27701867 -1.43896371 0.10428713
## [16,] -2.69250647 0.510362257 1.43413117 -0.26272307 0.25286361
## [17,] 0.66777252 0.577207418 1.61087956 0.34580216 0.89225288
## [18,] 1.74012353 -1.252315563 0.19106561 -0.12827151 0.33616722
## [19,] -0.50605931 0.201794602 1.02394038 0.15794463 -0.17824784
## [20,] -0.81736811 -0.424153229 -1.25153949 1.33288340 -0.57422851
## [21,] -0.06839551 0.423151678 2.07181865 1.29054512 0.08290469
## [22,] 0.36650054 2.110069638 -0.18186142 -0.64384438 -0.07094278
## [23,] 4.83159653 0.629005826 1.73818228 -0.45762783 -0.08409046
## [24,] -0.62498976 -1.186965735 0.65865777 -0.47362102 -0.13069561
## [25,] -0.87833025 -0.456675453 -1.30135562 1.06289370 0.56130211
## [26,] -1.51543781 -0.420173680 1.48910846 0.64897774 0.30401791
## [27,] 0.01610981 0.007864505 0.78770082 0.54588649 0.27615526
## [28,] -1.53739029 -0.053125970 -1.18592578 -0.03749289 0.40807035
## [29,] 0.22398227 -1.567138288 0.48261117 0.72644434 -0.45026818
## [30,] -0.47472798 -0.383964815 -1.44621437 -0.07788308 0.25443868
## [31,] 2.49011314 -0.604317984 -0.91204277 -2.31896519 0.10923895
## [32,] -1.33241192 0.996035320 -1.18127394 0.32242790 0.07639320
## [33,] 1.43229841 0.120991169 0.57694843 0.40528154 -0.08438277
## [34,] 1.61642875 -2.416516234 0.29498809 -0.84291772 -0.23727983
## [35,] -1.21256318 0.483675485 -0.35102713 1.20391487 -0.43236492
## Comp.6
## [1,] -0.0238401149
## [2,] 0.0203774380
## [3,] 0.0605814558
## [4,] 0.1006476084
## [5,] -0.0888975671
## [6,] 0.0141434382
## [7,] -0.0254807975
## [8,] 0.0008867849
## [9,] -0.1595983329
## [10,] 0.0512421662
## [11,] -0.1840994838
## [12,] -0.0078849245
## [13,] -0.0639425782
## [14,] -0.0406335482
## [15,] -0.0057361488
## [16,] 0.0256491180
## [17,] -0.0589059251
## [18,] -0.0828015815
## [19,] -0.2736857920
## [20,] 0.0248660998
## [21,] 0.3307066235
## [22,] 0.1411742684
## [23,] -0.0600881194
## [24,] -0.0536666012
## [25,] 0.0773086911
## [26,] -0.0563481458
## [27,] 0.0978003203
## [28,] 0.0483125242
## [29,] 0.0623357904
## [30,] -0.0200208681
## [31,] 0.2441167685
## [32,] -0.0210090385
## [33,] -0.0682345021
## [34,] -0.0307637215
## [35,] 0.0254886953
cor(data_aku,KU)## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## Data.y1 0.7138631 0.58684061 0.22858192 0.1390109 0.272839455 0.001561276
## Data.y2 0.7634617 0.49214155 -0.07320553 0.3144369 -0.265843989 -0.004505089
## Data.y3 0.8747226 -0.41108484 0.15035053 -0.1914140 -0.025418508 0.077323304
## Data.y4 0.7827393 -0.02725542 -0.30230403 -0.5405293 -0.002993342 -0.054842482
## Data.y5 0.4737816 -0.66219918 0.48937032 0.3081364 0.006993839 -0.050427492
## Data.y6 0.2893548 -0.34559021 -0.80880288 0.3637162 0.101796385 0.005209752
Menentukan Komponen utama dengan basis matriks Covarians
prcompcov<-function(data){
cov<-cov(data)
eig<-eigen(cov)$values
eigvec<-eigen(cov)$vectors
std<-sqrt(eig)
var_tot<-sum(eig)
prop_var<-eig/var_tot
cum_prob<-cumsum(prop_var)
prcomp_name<-rep(0,ncol(data))
for ( i in 1:ncol(data)){
prcomp_name[i]<-paste("comp",i,sep=".")
}
loadings<-data.frame(colnames(data),round(eigvec,5))
colnames(loadings)<-c("variabel",prcomp_name)
importancevar<-data.frame(rbind(std,prop_var,cum_prob))
rownames(importancevar)<-c("Standard deviation","Proportion of Variance","Cumulative Proportion")
colnames(importancevar)<-prcomp_name
scree<-plot(prop_var,type="o",ylab="proporsi varians yang dapat dijelaskan")
result<-list(importancevar,loadings,scree)
names(result)<-c("Importance of components:","Loadings:","scree plot")
return(result)
}#data hemolitik di skala
prcompcov(scale(data_aku))## $`Importance of components:`
## comp.1 comp.2 comp.3 comp.4 comp.5
## Standard deviation 1.6668803 1.1464114 1.0321081 0.8211232 0.39519726
## Proportion of Variance 0.4630817 0.2190432 0.1775412 0.1123739 0.02603015
## Cumulative Proportion 0.4630817 0.6821249 0.8596661 0.9720400 0.99807010
## comp.6
## Standard deviation 0.1076076
## Proportion of Variance 0.0019299
## Cumulative Proportion 1.0000000
##
## $`Loadings:`
## variabel comp.1 comp.2 comp.3 comp.4 comp.5 comp.6
## 1 Data.y1 0.42826 0.51189 -0.22147 0.16929 0.69039 -0.01451
## 2 Data.y2 0.45802 0.42929 0.07093 0.38294 -0.67269 0.04187
## 3 Data.y3 0.52477 -0.35858 -0.14567 -0.23311 -0.06432 -0.71857
## 4 Data.y4 0.46958 -0.02377 0.29290 -0.65828 -0.00757 0.50965
## 5 Data.y5 0.28423 -0.57763 -0.47415 0.37526 0.01770 0.46862
## 6 Data.y6 0.17359 -0.30145 0.78364 0.44295 0.25758 -0.04841
##
## $`scree plot`
## NULL
Menentukan komponen utama dengan basis matriks korelasi
- \(vartotal = p\)
- \(proporsi\space varians =\frac{\lambda_i}{p}\)
prcompcor<-function(data){
cor<-cor(data)
p<-ncol(data)
eig<-eigen(cor)$values
eigvec<-eigen(cor)$vectors
std<-sqrt(eig)
meanvec<-colMeans(data)
vartot<-p
prop_var<-eig/vartot
cum_prob<-cumsum(prop_var)
prcomp_name<-rep(0,ncol(data))
for ( i in 1:ncol(data)){
prcomp_name[i]<-paste("comp",i,sep=".")
}
loadings<-data.frame(colnames(data),round(eigvec,5))
colnames(loadings)<-c("variabel",prcomp_name)
importancevar<-data.frame(rbind(std,prop_var,cum_prob))
rownames(importancevar)<-c("Standard deviation","Proportion of Variance","Cumulative Proportion")
colnames(importancevar)<-prcomp_name
scree<-plot(prop_var,type="o",ylab="proporsi varians yang dapat dijelaskan")
result<-list(importancevar,loadings,scree)
names(result)<-c("Importance of components:","Loadings: ","scree plot")
return(result)
}prcompcor(data_aku)## $`Importance of components:`
## comp.1 comp.2 comp.3 comp.4 comp.5
## Standard deviation 1.6668803 1.1464114 1.0321081 0.8211232 0.39519726
## Proportion of Variance 0.4630817 0.2190432 0.1775412 0.1123739 0.02603015
## Cumulative Proportion 0.4630817 0.6821249 0.8596661 0.9720400 0.99807010
## comp.6
## Standard deviation 0.1076076
## Proportion of Variance 0.0019299
## Cumulative Proportion 1.0000000
##
## $`Loadings: `
## variabel comp.1 comp.2 comp.3 comp.4 comp.5 comp.6
## 1 Data.y1 -0.42826 0.51189 -0.22147 0.16929 0.69039 -0.01451
## 2 Data.y2 -0.45802 0.42929 0.07093 0.38294 -0.67269 0.04187
## 3 Data.y3 -0.52477 -0.35858 -0.14567 -0.23311 -0.06432 -0.71857
## 4 Data.y4 -0.46958 -0.02377 0.29290 -0.65828 -0.00757 0.50965
## 5 Data.y5 -0.28423 -0.57763 -0.47415 0.37526 0.01770 0.46862
## 6 Data.y6 -0.17359 -0.30145 0.78364 0.44295 0.25758 -0.04841
##
## $`scree plot`
## NULL
- Manakah yang lebih cocok? Keduanya cocok karena menghasilkan hasil yang sama
- Berapa banyak komponen yang akan dipertahankan? Dari importance of component terlihat bahwa hanya Komponen 1,2, dan 3 yang memiliki standar deviasi lebih dari 1, Sehingga kita bisa pertahankan komponen 1,2,3 dan abaikan komponen 4,5,6. Selain itu, karena komponen 1, 2, dan 3 juga telah memiliki proporsi varians kumulatif sebesar 85.967% maka komponen 1,2,3 bisa dipertahankan.
Penerapan 3 Soal UAS
SOAL UAS
Bagian A
matR = matrix(data=c(1,0.9,0.74,0.64,
0.9,1,0.33,0.74,
0.74,0.33,1,0.66,
0.64,0.74,0.66,1),4,4)
matR## [,1] [,2] [,3] [,4]
## [1,] 1.00 0.90 0.74 0.64
## [2,] 0.90 1.00 0.33 0.74
## [3,] 0.74 0.33 1.00 0.66
## [4,] 0.64 0.74 0.66 1.00
#Mendapatkan eigen value dan eigen vactor (loading)
eig = eigen(matR)$values
eig## [1] 3.01878551 0.67793201 0.37519449 -0.07191201
eigvec = eigen(matR)$vectors
eigvec## [,1] [,2] [,3] [,4]
## [1,] -0.5459384 0.08698788 0.56959722 -0.6082297
## [2,] -0.5004768 0.62861750 0.05704317 0.5925446
## [3,] -0.4461757 -0.77203698 0.14729690 0.4280068
## [4,] -0.5023958 -0.03510062 -0.80660319 -0.3094474
Karena menggunakan matriks korelasi sebagai basis maka proporsi total keragaman sampel yang mampu dijelaskan oleh komponen ke –k :
#mencari proporsi varoance
p = 4
prop_var = eig/p
prop_var## [1] 0.75469638 0.16948300 0.09379862 -0.01797800
#proporsi kumulatif
cumprop_var = cumsum(prop_var)
cumprop_var## [1] 0.7546964 0.9241794 1.0179780 1.0000000
#Membuat scree plot
scree<-plot(prop_var,type="o",ylab="proporsi varians yang dapat dijelaskan",main="Scree Plot No 1")
### Bagian B
print("Ancora Imparo")## [1] "Ancora Imparo"
SOAL UAS
Bagian 3
SOAL UAS
#Komponen 1
(r.y1x1 = eigvec[1,1]%*%sqrt(eig[1]))## [,1]
## [1,] -0.9485489
(r.y1x2 = eigvec[2,1]%*%sqrt(eig[1]))## [,1]
## [1,] -0.8695611
(r.y1x3 = eigvec[3,1]%*%sqrt(eig[1]))## [,1]
## [1,] -0.7752147
(r.y1x4 = eigvec[4,1]%*%sqrt(eig[1]))## [,1]
## [1,] -0.8728952
#Komponen 2
(r.y2x1 = eigvec[1,2]%*%sqrt(eig[2]))## [,1]
## [1,] 0.07162288
(r.y2x2 = eigvec[2,2]%*%sqrt(eig[2]))## [,1]
## [1,] 0.5175824
(r.y2x3 = eigvec[3,2]%*%sqrt(eig[2]))## [,1]
## [1,] -0.6356692
(r.y2x4 = eigvec[4,2]%*%sqrt(eig[2]))## [,1]
## [1,] -0.02890067
Penerapan 4 PPT
Soal 1
#X1 = % DI SEKTOR INDUSTRI
#X2 = % DI SEKTOR JASA PERORANGAN
#X3 = % DI SEKTOR KEMASYARAKATAN
#n=26 KOTA
#p=3 variabel
#input nilai :
cov <- matrix(c(49.109, 6.535, 7.379, 6.535, 20.933, 17.879, 7.379, 17.879, 46.643),3,3,byrow = T)
cov## [,1] [,2] [,3]
## [1,] 49.109 6.535 7.379
## [2,] 6.535 20.933 17.879
## [3,] 7.379 17.879 46.643
miu <- c(27.008, 12.958, 20.023)
miu## [1] 27.008 12.958 20.023
eig.r=eigen(cov)
eig.r## eigen() decomposition
## $values
## [1] 62.61670 42.47017 11.59813
##
## $vectors
## [,1] [,2] [,3]
## [1,] -0.5803492 0.8114465 0.06891578
## [2,] -0.3961663 -0.2073738 -0.89445422
## [3,] -0.7115104 -0.5463979 0.44181699
nilai_eig=eigen(cov)$values
sum_eig = sum(nilai_eig)#Proporsi keragaman
prop = nilai_eig/sum_eig
prop## [1] 0.53663024 0.36397281 0.09939695
kumulatif = cumsum(prop)
kumulatif## [1] 0.5366302 0.9006030 1.0000000
#Scree plot
#Cara 1
plot(prop,type="o",ylab="Proporsi varians yang dapat dijelaskan")#Cara 2
library(ggplot2)
peluang=cbind(prop)
qplot(c(1:3),peluang)+
geom_line()+
xlab("Komponen Utama")+
ylab("Nilai Eigen")+
ggtitle("Scree Plot")#Jika mau dirubah ke matriks korelasi (terstandarisasi)
Soal1 <- princomp(cov, cor = T)
summary(Soal1,loadings = TRUE)## Importance of components:
## Comp.1 Comp.2 Comp.3
## Standard deviation 1.581211 0.7069462 0
## Proportion of Variance 0.833409 0.1665910 0
## Cumulative Proportion 0.833409 1.0000000 1
##
## Loadings:
## Comp.1 Comp.2 Comp.3
## [1,] 0.624 0.228 0.747
## [2,] -0.595 -0.482 0.644
## [3,] -0.507 0.846 0.165
#Mencari korelasi dgn variabel awal :
#korelasi komponen 1 untuk matrika R
ry1x1<-eig.r$vectors[1,1]*sqrt(eig.r$values[1])/sqrt(cov[1,1])
ry1x1## [1] -0.6553207
ry1x2<-eig.r$vectors[2,1]*sqrt(eig.r$values[1])/sqrt(cov[2,2])
ry1x2## [1] -0.6851835
ry1x3<-eig.r$vectors[3,1]*sqrt(eig.r$values[1])/sqrt(cov[3,3])
ry1x3## [1] -0.8243908
SOal 2
Cuplikan Data
Data <- read.csv("trackrecord.csv")
head(Data,10)## Country X100m X200m X400m X800m X1500m X3000m Marathon
## 1 ARG 11.57 22.94 52.50 2.05 4.25 9.19 150.3
## 2 AUS 11.12 22.23 48.63 1.98 4.02 8.63 143.5
## 3 AUT 11.15 22.70 50.62 1.94 4.05 8.78 154.3
## 4 BEL 11.14 22.48 51.45 1.97 4.08 8.82 143.1
## 5 BER 11.46 23.05 53.30 2.07 4.29 9.81 174.2
## 6 BRA 11.17 22.60 50.62 1.97 4.17 9.04 147.4
## 7 CAN 10.98 22.62 49.91 1.97 4.00 8.54 148.4
## 8 CHI 11.65 23.84 53.68 2.00 4.22 9.26 152.2
## 9 CHN 10.79 22.01 49.81 1.93 3.84 8.10 139.4
## 10 COL 11.31 22.92 49.64 2.04 4.34 9.37 155.2
Data = Data[,-1]
summary(Data)## X100m X200m X400m X800m
## Min. :10.49 Min. :21.34 Min. :47.60 Min. :1.890
## 1st Qu.:11.12 1st Qu.:22.57 1st Qu.:49.97 1st Qu.:1.970
## Median :11.32 Median :22.98 Median :51.65 Median :2.005
## Mean :11.36 Mean :23.12 Mean :51.99 Mean :2.022
## 3rd Qu.:11.57 3rd Qu.:23.61 3rd Qu.:53.12 3rd Qu.:2.070
## Max. :12.52 Max. :25.91 Max. :61.65 Max. :2.290
## X1500m X3000m Marathon
## Min. :3.840 Min. : 8.100 Min. :135.2
## 1st Qu.:4.003 1st Qu.: 8.543 1st Qu.:143.5
## Median :4.100 Median : 8.845 Median :148.4
## Mean :4.189 Mean : 9.081 Mean :153.6
## 3rd Qu.:4.338 3rd Qu.: 9.325 3rd Qu.:157.7
## Max. :5.420 Max. :13.120 Max. :221.1
#Matriks korelasi
round(cor(Data), digits = 3)## X100m X200m X400m X800m X1500m X3000m Marathon
## X100m 1.000 0.941 0.871 0.809 0.782 0.728 0.669
## X200m 0.941 1.000 0.909 0.820 0.801 0.732 0.680
## X400m 0.871 0.909 1.000 0.806 0.720 0.674 0.677
## X800m 0.809 0.820 0.806 1.000 0.905 0.867 0.854
## X1500m 0.782 0.801 0.720 0.905 1.000 0.973 0.791
## X3000m 0.728 0.732 0.674 0.867 0.973 1.000 0.799
## Marathon 0.669 0.680 0.677 0.854 0.791 0.799 1.000
#nilai eigen dan vektor eigen
eig_no2<-eigen(cor(Data))
eig_no2## eigen() decomposition
## $values
## [1] 5.80759599 0.62893409 0.27917300 0.12451284 0.09096433 0.05452422 0.01429552
##
## $vectors
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.3777548 -0.4072266 -0.1407412 0.58702371 -0.16688471 -0.53970842
## [2,] -0.3832057 -0.4136055 -0.1005216 0.19422826 0.09396401 0.74488045
## [3,] -0.3680269 -0.4592983 0.2374990 -0.64550999 0.32687474 -0.24007665
## [4,] -0.3947868 0.1611649 0.1478173 -0.29460262 -0.81919379 0.01681623
## [5,] -0.3892705 0.3088724 -0.4221927 -0.06705941 0.02592430 0.18908759
## [6,] -0.3761078 0.4229767 -0.4062655 -0.08060950 0.35150932 -0.24060940
## [7,] -0.3551982 0.3896878 0.7406303 0.32126288 0.24731320 0.04797235
## [,7]
## [1,] 0.08903512
## [2,] -0.26566629
## [3,] 0.12660348
## [4,] -0.19536460
## [5,] 0.73068219
## [6,] -0.57151771
## [7,] 0.08227664
#Analisis komponen utama
prcompcor(Data)## $`Importance of components:`
## comp.1 comp.2 comp.3 comp.4 comp.5
## Standard deviation 2.4098954 0.79305365 0.52836824 0.35286378 0.3016029
## Proportion of Variance 0.8296566 0.08984773 0.03988186 0.01778755 0.0129949
## Cumulative Proportion 0.8296566 0.91950430 0.95938615 0.97717370 0.9901686
## comp.6 comp.7
## Standard deviation 0.233504215 0.119563887
## Proportion of Variance 0.007789174 0.002042218
## Cumulative Proportion 0.997957782 1.000000000
##
## $`Loadings: `
## variabel comp.1 comp.2 comp.3 comp.4 comp.5 comp.6 comp.7
## 1 X100m -0.37775 -0.40723 -0.14074 0.58702 -0.16688 -0.53971 0.08904
## 2 X200m -0.38321 -0.41361 -0.10052 0.19423 0.09396 0.74488 -0.26567
## 3 X400m -0.36803 -0.45930 0.23750 -0.64551 0.32687 -0.24008 0.12660
## 4 X800m -0.39479 0.16116 0.14782 -0.29460 -0.81919 0.01682 -0.19536
## 5 X1500m -0.38927 0.30887 -0.42219 -0.06706 0.02592 0.18909 0.73068
## 6 X3000m -0.37611 0.42298 -0.40627 -0.08061 0.35151 -0.24061 -0.57152
## 7 Marathon -0.35520 0.38969 0.74063 0.32126 0.24731 0.04797 0.08228
##
## $`scree plot`
## NULL
#Bandingkan
AKU2 <- princomp(Data, cor = T)
summary(AKU2,loadings = TRUE)## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 2.4098954 0.79305365 0.52836824 0.35286378 0.3016029
## Proportion of Variance 0.8296566 0.08984773 0.03988186 0.01778755 0.0129949
## Cumulative Proportion 0.8296566 0.91950430 0.95938615 0.97717370 0.9901686
## Comp.6 Comp.7
## Standard deviation 0.233504215 0.119563887
## Proportion of Variance 0.007789174 0.002042218
## Cumulative Proportion 0.997957782 1.000000000
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## X100m 0.378 0.407 0.141 0.587 0.167 0.540
## X200m 0.383 0.414 0.101 0.194 -0.745 -0.266
## X400m 0.368 0.459 -0.237 -0.646 -0.327 0.240 0.127
## X800m 0.395 -0.161 -0.148 -0.295 0.819 -0.195
## X1500m 0.389 -0.309 0.422 -0.189 0.731
## X3000m 0.376 -0.423 0.406 -0.352 0.241 -0.572
## Marathon 0.355 -0.390 -0.741 0.321 -0.247