Praktikum 8 APG : AKU

Terkadang Ambisi Tinggi dapat Menjatuhkan ke Jurang Terdalam, Lakukan versi terbaik dan Tetaplah Bersyukur

My Image

Tahapan Reduksi Data

+ Memeriksa korelasi linier antar peubah asal dan keragaman data.
+ Tentukan basis AKU (matriks ragam peragam atau matriks korelasi).
+ Menghitung akar ciri dan vektor ciri sesuai basis AKU.
+ Membuat kombinasi linier dari peubah asal untuk KU (Y_i)
+ Menghitung keragaman total data
+ Menetapkan jumlah komponen untuk tujuan reduksi
+ Kontribusi variabel terhadap komponen utamanya

Penerapan 1

Library Zone

library(tidyverse)

FAKTOR PENENTU POLUSI

Data dikumpulkan untuk menyelidiki faktor penentu polusi. Variabel berikut diperoleh untuk 41 kota AS:

Method AKU

LOAD DATASET

data1<-read.csv(file="air_pollution.csv",sep=",",row.names = 1)
head(data1)
##             SO2 temp manu popul wind precip predays
## Albany       46 47.6   44   116  8.8  33.36     135
## Albuquerque  11 56.8   46   244  8.9   7.77      58
## Atlanta      24 61.5  368   497  9.1  48.34     115
## Baltimore    47 55.0  625   905  9.6  41.31     111
## Bu\valo      11 47.1  391   463 12.4  36.11     166
## Charleston   31 55.2   35    71  6.5  40.75     148

Melihat Matriks Korelasi

round(cor(data1),digits=3)
##            SO2   temp   manu  popul   wind precip predays
## SO2      1.000 -0.434  0.645  0.494  0.095  0.054   0.370
## temp    -0.434  1.000 -0.190 -0.063 -0.350  0.386  -0.430
## manu     0.645 -0.190  1.000  0.955  0.238 -0.032   0.132
## popul    0.494 -0.063  0.955  1.000  0.213 -0.026   0.042
## wind     0.095 -0.350  0.238  0.213  1.000 -0.013   0.164
## precip   0.054  0.386 -0.032 -0.026 -0.013  1.000   0.496
## predays  0.370 -0.430  0.132  0.042  0.164  0.496   1.000
  • NOTE Biasanya dalam regresi linear berganda harus dipenuhi asumsi bahwa tidak terdapat multikolinearitas antar variabel. Tetapi dalam hal ini diperoleh $ Cor(popul,manu)=0.955$ sangat tinggi.Hal ini tentu saja menimbulkan masalah bahwa \(Cov(X_1,X_2)\neq=0\)

  • Kita dapat mengatasi hal ini dengan menggunakan PCA.

Menentukan Komponen Utama

  • Misal terdapat matriks covarians \(\Sigma\) dan memiliki nilai eigen \(\lambda_1 >> \lambda_2 >> ... >>\lambda_p\) dan vektor eigen bersesuaian \(e_1,e_2,e_3,...,e_p\) maka komponen utama dapat ditentukan
  • \(Y_1=e_1^T X_1\)
  • \(Y_2=e_2^T X_2\)
  • \(\vdots\)
  • \(Y_p=e_p^T X_p\)
# SO2 dihapus karena variabel dependen
data_polusi<-data1[,-1]
head(data_polusi,10)
##             temp manu popul wind precip predays
## Albany      47.6   44   116  8.8  33.36     135
## Albuquerque 56.8   46   244  8.9   7.77      58
## Atlanta     61.5  368   497  9.1  48.34     115
## Baltimore   55.0  625   905  9.6  41.31     111
## Bu\valo     47.1  391   463 12.4  36.11     166
## Charleston  55.2   35    71  6.5  40.75     148
## Chicago     50.6 3344  3369 10.4  34.44     122
## Cincinnati  54.0  462   453  7.1  39.04     132
## Cleveland   49.7 1007   751 10.9  34.99     155
## Columbus    51.5  266   540  8.6  37.01     134
#Misalkan ada tambahan negatif temperature
# Membuat Variabel negative temperature (mutate()) keperluan penelitian.
data_polusi <- data_polusi%>%mutate(negtemp=-1*temp)
data_polusi
##                temp manu popul wind precip predays negtemp
## Albany         47.6   44   116  8.8  33.36     135   -47.6
## Albuquerque    56.8   46   244  8.9   7.77      58   -56.8
## Atlanta        61.5  368   497  9.1  48.34     115   -61.5
## Baltimore      55.0  625   905  9.6  41.31     111   -55.0
## Bu\valo        47.1  391   463 12.4  36.11     166   -47.1
## Charleston     55.2   35    71  6.5  40.75     148   -55.2
## Chicago        50.6 3344  3369 10.4  34.44     122   -50.6
## Cincinnati     54.0  462   453  7.1  39.04     132   -54.0
## Cleveland      49.7 1007   751 10.9  34.99     155   -49.7
## Columbus       51.5  266   540  8.6  37.01     134   -51.5
## Dallas         66.2  641   844 10.9  35.94      78   -66.2
## Denver         51.9  454   515  9.0  12.95      86   -51.9
## Des Moines     49.0  104   201 11.2  30.85     103   -49.0
## Detroit        49.9 1064  1513 10.1  30.96     129   -49.9
## Hartford       49.1  412   158  9.0  43.37     127   -49.1
## Houston        68.9  721  1233 10.8  48.19     103   -68.9
## Indianapolis   52.3  361   746  9.7  38.74     121   -52.3
## Jacksonville   68.4  136   529  8.8  54.47     116   -68.4
## Kansas         54.5  381   507 10.0  37.00      99   -54.5
## Little Rock    61.0   91   132  8.2  48.52     100   -61.0
## Louisville     55.6  291   593  8.3  43.11     123   -55.6
## Memphis        61.6  337   624  9.2  49.10     105   -61.6
## Miami          75.5  207   335  9.0  59.80     128   -75.5
## Milwaukee      45.7  569   717 11.8  29.07     123   -45.7
## Minneapolis    43.5  699   744 10.6  25.94     137   -43.5
## Nashville      59.4  275   448  7.9  46.00     119   -59.4
## New Orleans    68.3  204   361  8.4  56.77     113   -68.3
## Norfolk        59.3   96   308 10.6  44.68     116   -59.3
## Omaha          51.5  181   347 10.9  30.18      98   -51.5
## Philadelphia   54.6 1692  1950  9.6  39.93     115   -54.6
## Phoenix        70.3  213   582  6.0   7.05      36   -70.3
## Pittsburgh     50.4  347   520  9.4  36.22     147   -50.4
## Providence     50.0  343   179 10.6  42.75     125   -50.0
## Richmond       57.8  197   299  7.6  42.59     115   -57.8
## Salt Lake City 51.0  137   176  8.7  15.17      89   -51.0
## San Fransisco  56.7  453   716  8.7  20.66      67   -56.7
## Seattle        51.1  379   531  9.4  38.79     164   -51.1
## St. Louis      55.9  775   622  9.5  35.89     105   -55.9
## Washington     57.3  434   757  9.3  38.89     111   -57.3
## Wichita        56.6  125   277 12.7  30.58      82   -56.6
## Wilmington     54.0   80    80  9.0  40.25     114   -54.0
#oleh karena temperature menjadi negatif, maka hapus variabel temperatur
# Pilih variabel kecuali temperature
data_polusi <- select(data_polusi,-temp)
head(data_polusi,10)
##             manu popul wind precip predays negtemp
## Albany        44   116  8.8  33.36     135   -47.6
## Albuquerque   46   244  8.9   7.77      58   -56.8
## Atlanta      368   497  9.1  48.34     115   -61.5
## Baltimore    625   905  9.6  41.31     111   -55.0
## Bu\valo      391   463 12.4  36.11     166   -47.1
## Charleston    35    71  6.5  40.75     148   -55.2
## Chicago     3344  3369 10.4  34.44     122   -50.6
## Cincinnati   462   453  7.1  39.04     132   -54.0
## Cleveland   1007   751 10.9  34.99     155   -49.7
## Columbus     266   540  8.6  37.01     134   -51.5

Basis

# Membuat Matriks Covarians
matcov1<-cov(scale(data_polusi))
matcov1
##                manu       popul        wind      precip    predays     negtemp
## manu     1.00000000  0.95526935  0.23794683 -0.03241688 0.13182930  0.19004216
## popul    0.95526935  1.00000000  0.21264375 -0.02611873 0.04208319  0.06267813
## wind     0.23794683  0.21264375  1.00000000 -0.01299438 0.16410559  0.34973963
## precip  -0.03241688 -0.02611873 -0.01299438  1.00000000 0.49609671 -0.38625342
## predays  0.13182930  0.04208319  0.16410559  0.49609671 1.00000000  0.43024212
## negtemp  0.19004216  0.06267813  0.34973963 -0.38625342 0.43024212  1.00000000
# Bandingkan
cor(data_polusi)
##                manu       popul        wind      precip    predays     negtemp
## manu     1.00000000  0.95526935  0.23794683 -0.03241688 0.13182930  0.19004216
## popul    0.95526935  1.00000000  0.21264375 -0.02611873 0.04208319  0.06267813
## wind     0.23794683  0.21264375  1.00000000 -0.01299438 0.16410559  0.34973963
## precip  -0.03241688 -0.02611873 -0.01299438  1.00000000 0.49609671 -0.38625342
## predays  0.13182930  0.04208319  0.16410559  0.49609671 1.00000000  0.43024212
## negtemp  0.19004216  0.06267813  0.34973963 -0.38625342 0.43024212  1.00000000
#Jadi kesimpulannya : memakai basis matriks korelasi
# Nilai eigen matriks covarianns
eig1<-eigen(matcov1)$values
# Nilai vektor eigen matriks covarians
eigvec1<-eigen(matcov1)$vectors
print("Nilai Eigen")
## [1] "Nilai Eigen"
print(eig1)
## [1] 2.19616264 1.49994343 1.39464912 0.76022689 0.11457065 0.03444727
print("Vektor Eigen Bersesuaian")
## [1] "Vektor Eigen Bersesuaian"
print(eigvec1)
##             [,1]       [,2]        [,3]        [,4]        [,5]        [,6]
## [1,] -0.61154243  0.1680577 -0.27288633 -0.13684076  0.10204211  0.70297051
## [2,] -0.57782195  0.2224533 -0.35037413 -0.07248126 -0.07806551 -0.69464131
## [3,] -0.35383877 -0.1307915  0.29725334  0.86942583 -0.11326688  0.02452501
## [4,]  0.04080701 -0.6228578 -0.50456294  0.17114826  0.56818342 -0.06062222
## [5,] -0.23791593 -0.7077653  0.09308852 -0.31130693 -0.58000387  0.02196062
## [6,] -0.32964613 -0.1275974  0.67168611 -0.30645728  0.55805638 -0.13618780

Menentukan Nilai standar deviasi dari setiap komponen utama

  • Standar deviasi dapat ditentukan dari varians komponen utama Yi, \(Var(Y_i)=\lambda_i\) maka stdev nya adalah \(\sqrt{\lambda_i}\)
print("standar deviasi")
## [1] "standar deviasi"
sqrt(eig1)
## [1] 1.4819456 1.2247218 1.1809526 0.8719099 0.3384829 0.1855998

Menentukan total varians

  • Total varians dapat dinyatakan \(Total \space variance = Var(Y_1 )+Var(Y_2) +... +Var(Y_p)=\lambda_1 +\lambda_2 +...+\lambda_p\)
# total varians adalah jumlah eigen
total_var<-sum(eig1)
total_var
## [1] 6

Menentukan proporsi varians untuk setiap komponen

  • Proporsi varians komponen utama ke i terhadap total varians dapat dituliskan sebagai berikut
  • \(proporsi \space var =\frac{\lambda_i}{\lambda_1 +\lambda_2 +...+\lambda_p}\)
# proporsi, nilai var ke i dibagi var total
prop_var<-eig1/total_var
prop_var
## [1] 0.366027107 0.249990572 0.232441520 0.126704481 0.019095109 0.005741211

Menentukan cumulative varians

# Cumulative proporportion varians
cumsum(prop_var)
## [1] 0.3660271 0.6160177 0.8484592 0.9751637 0.9942588 1.0000000

Menentukan Tabel Loadings

  • Ide dasar adalah
  • \(Y_1=e_1^T X_1\)
  • \(Y_2=e_2^T X_2\)
  • \(\vdots\)
  • \(Y_p=e_p^T X_p\)
varname <- colnames(data_polusi)
loadings <- data.frame(varname,round(eigvec1,5))
colnames(loadings)<-c("var","comp.1","comp.2","comp.3","comp.4","comp.5","comp.6")
loadings
##       var   comp.1   comp.2   comp.3   comp.4   comp.5   comp.6
## 1    manu -0.61154  0.16806 -0.27289 -0.13684  0.10204  0.70297
## 2   popul -0.57782  0.22245 -0.35037 -0.07248 -0.07807 -0.69464
## 3    wind -0.35384 -0.13079  0.29725  0.86943 -0.11327  0.02453
## 4  precip  0.04081 -0.62286 -0.50456  0.17115  0.56818 -0.06062
## 5 predays -0.23792 -0.70777  0.09309 -0.31131 -0.58000  0.02196
## 6 negtemp -0.32965 -0.12760  0.67169 -0.30646  0.55806 -0.13619

Alternatif Otomatis Menggunakan printcomp

summary(princomp(scale(data_polusi)),loadings = TRUE)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3    Comp.4     Comp.5
## Standard deviation     1.4637615 1.2096939 1.1664619 0.8612112 0.33432955
## Proportion of Variance 0.3660271 0.2499906 0.2324415 0.1267045 0.01909511
## Cumulative Proportion  0.3660271 0.6160177 0.8484592 0.9751637 0.99425879
##                             Comp.6
## Standard deviation     0.183322369
## Proportion of Variance 0.005741211
## Cumulative Proportion  1.000000000
## 
## Loadings:
##         Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## manu     0.612  0.168  0.273  0.137  0.102  0.703
## popul    0.578  0.222  0.350               -0.695
## wind     0.354 -0.131 -0.297 -0.869 -0.113       
## precip         -0.623  0.505 -0.171  0.568       
## predays  0.238 -0.708         0.311 -0.580       
## negtemp  0.330 -0.128 -0.672  0.306  0.558 -0.136
library(FactoMineR)
pca<-PCA(scale(data_polusi))

# Menampilkan tabel Komponen Penting
pca$eig
##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1 2.19616264             36.6027107                          36.60271
## comp 2 1.49994343             24.9990572                          61.60177
## comp 3 1.39464912             23.2441520                          84.84592
## comp 4 0.76022689             12.6704481                          97.51637
## comp 5 0.11457065              1.9095109                          99.42588
## comp 6 0.03444727              0.5741211                         100.00000
# Menampilkan Scree Plot
plot(prop_var,type="o",ylab="Proporsi varians yang dapat dijelaskan")

Penerapan 2

Enam variabel hematologi diukur pada 51 pekerja (Royston 1983): y1 = konsentrasi hemoglobin y4 = jumlah limfosit y2 = volume sel yang dikemas y5 = jumlah neutrofil y3 = jumlah sel darah putih y6 = konsentrasi timbal serum

LOAD DATA

library(readxl)
Data <- read_excel("Data.xlsx")
data_aku <- data.frame(Data$y1,Data$y2,Data$y3,Data$y4,Data$y5,Data$y6)
head(data_aku,10)
##    Data.y1 Data.y2 Data.y3 Data.y4 Data.y5 Data.y6
## 1     13.4      39    4100      14      25      17
## 2     14.6      46    5000      15      30      20
## 3     13.5      42    4500      19      21      18
## 4     15.0      46    4600      23      16      18
## 5     14.6      44    5100      17      31      19
## 6     14.0      44    4900      20      24      19
## 7     16.4      49    4300      21      17      18
## 8     14.8      44    4400      16      26      29
## 9     15.2      46    4100      27      13      27
## 10    15.5      48    8400      34      42      36
dim(Data)
## [1] 35  7

####Menghitung korelasi antar variabel

round(cor(data_aku), digits = 3)
##         Data.y1 Data.y2 Data.y3 Data.y4 Data.y5 Data.y6
## Data.y1   1.000   0.788   0.384   0.398   0.106  -0.103
## Data.y2   0.788   1.000   0.401   0.437   0.095   0.197
## Data.y3   0.384   0.401   1.000   0.750   0.697   0.202
## Data.y4   0.398   0.437   0.750   1.000   0.077   0.283
## Data.y5   0.106   0.095   0.697   0.077   1.000   0.083
## Data.y6  -0.103   0.197   0.202   0.283   0.083   1.000

Nilai dan vektor eigen basis

eigen(cov(scale(data_aku)))
## eigen() decomposition
## $values
## [1] 2.7784901 1.3142591 1.0652472 0.6742434 0.1561809 0.0115794
## 
## $vectors
##           [,1]        [,2]        [,3]       [,4]         [,5]        [,6]
## [1,] 0.4282630  0.51189355 -0.22147091  0.1692936  0.690388026 -0.01450898
## [2,] 0.4580183  0.42928877  0.07092817  0.3829351 -0.672686824  0.04186590
## [3,] 0.5247663 -0.35858404 -0.14567324 -0.2331124 -0.064318534 -0.71856724
## [4,] 0.4695834 -0.02377455  0.29289957 -0.6582804 -0.007574298  0.50965244
## [5,] 0.2842325 -0.57762786 -0.47414638  0.3752621  0.017697085  0.46862384
## [6,] 0.1735906 -0.30145391  0.78364162  0.4429496  0.257583734 -0.04841434

####Melakukan analisis komponen utama

AKU <- princomp(data_aku, cor = T)
summary(AKU,loadings = TRUE)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3    Comp.4     Comp.5
## Standard deviation     1.6668803 1.1464114 1.0321081 0.8211232 0.39519726
## Proportion of Variance 0.4630817 0.2190432 0.1775412 0.1123739 0.02603015
## Cumulative Proportion  0.4630817 0.6821249 0.8596661 0.9720400 0.99807010
##                           Comp.6
## Standard deviation     0.1076076
## Proportion of Variance 0.0019299
## Cumulative Proportion  1.0000000
## 
## Loadings:
##         Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## Data.y1  0.428  0.512  0.221  0.169  0.690       
## Data.y2  0.458  0.429         0.383 -0.673       
## Data.y3  0.525 -0.359  0.146 -0.233         0.719
## Data.y4  0.470        -0.293 -0.658        -0.510
## Data.y5  0.284 -0.578  0.474  0.375        -0.469
## Data.y6  0.174 -0.301 -0.784  0.443  0.258

####korelasi nilai KU dengan nilai variabel awal

KU <- AKU$scores
KU
##            Comp.1       Comp.2      Comp.3      Comp.4      Comp.5
##  [1,] -3.11186781 -1.519890576  0.60250842 -1.06198766  0.33170661
##  [2,] -0.35665952 -0.265798888  0.54353366  0.66616828 -0.72801362
##  [3,] -2.08478412 -0.770105455 -0.02372771 -1.11460739 -0.48391022
##  [4,] -0.41894446  1.216209276 -0.21668500 -0.72720878 -0.54268338
##  [5,] -0.61196928 -0.698412739  0.77149013  0.08415192 -0.17698531
##  [6,] -1.05792974 -0.486440062  0.05939822 -0.58754032 -0.65905453
##  [7,]  0.68479171  2.633266994  0.14390813  0.33613844 -0.33594537
##  [8,] -0.66511092 -0.662126029 -1.19567479  1.03845519  0.55506756
##  [9,] -0.05339881  1.099185569 -2.06200218 -0.23203905  0.12417654
## [10,]  4.03871964 -2.253410597 -1.58650080  1.06046538  0.09390194
## [11,]  1.01633663  0.175272550 -0.20342853 -0.04024993 -0.49709169
## [12,]  2.06655200  2.558603219 -0.77526690  0.34298746 -0.01781005
## [13,] -0.50102452  0.061481666 -0.82426505 -0.55087725  0.18870217
## [14,]  0.90561406  0.711257371  0.49493920 -0.57454777  0.73234826
## [15,] -1.57506972  0.906096753 -0.27701867 -1.43896371  0.10428713
## [16,] -2.69250647  0.510362257  1.43413117 -0.26272307  0.25286361
## [17,]  0.66777252  0.577207418  1.61087956  0.34580216  0.89225288
## [18,]  1.74012353 -1.252315563  0.19106561 -0.12827151  0.33616722
## [19,] -0.50605931  0.201794602  1.02394038  0.15794463 -0.17824784
## [20,] -0.81736811 -0.424153229 -1.25153949  1.33288340 -0.57422851
## [21,] -0.06839551  0.423151678  2.07181865  1.29054512  0.08290469
## [22,]  0.36650054  2.110069638 -0.18186142 -0.64384438 -0.07094278
## [23,]  4.83159653  0.629005826  1.73818228 -0.45762783 -0.08409046
## [24,] -0.62498976 -1.186965735  0.65865777 -0.47362102 -0.13069561
## [25,] -0.87833025 -0.456675453 -1.30135562  1.06289370  0.56130211
## [26,] -1.51543781 -0.420173680  1.48910846  0.64897774  0.30401791
## [27,]  0.01610981  0.007864505  0.78770082  0.54588649  0.27615526
## [28,] -1.53739029 -0.053125970 -1.18592578 -0.03749289  0.40807035
## [29,]  0.22398227 -1.567138288  0.48261117  0.72644434 -0.45026818
## [30,] -0.47472798 -0.383964815 -1.44621437 -0.07788308  0.25443868
## [31,]  2.49011314 -0.604317984 -0.91204277 -2.31896519  0.10923895
## [32,] -1.33241192  0.996035320 -1.18127394  0.32242790  0.07639320
## [33,]  1.43229841  0.120991169  0.57694843  0.40528154 -0.08438277
## [34,]  1.61642875 -2.416516234  0.29498809 -0.84291772 -0.23727983
## [35,] -1.21256318  0.483675485 -0.35102713  1.20391487 -0.43236492
##              Comp.6
##  [1,] -0.0238401149
##  [2,]  0.0203774380
##  [3,]  0.0605814558
##  [4,]  0.1006476084
##  [5,] -0.0888975671
##  [6,]  0.0141434382
##  [7,] -0.0254807975
##  [8,]  0.0008867849
##  [9,] -0.1595983329
## [10,]  0.0512421662
## [11,] -0.1840994838
## [12,] -0.0078849245
## [13,] -0.0639425782
## [14,] -0.0406335482
## [15,] -0.0057361488
## [16,]  0.0256491180
## [17,] -0.0589059251
## [18,] -0.0828015815
## [19,] -0.2736857920
## [20,]  0.0248660998
## [21,]  0.3307066235
## [22,]  0.1411742684
## [23,] -0.0600881194
## [24,] -0.0536666012
## [25,]  0.0773086911
## [26,] -0.0563481458
## [27,]  0.0978003203
## [28,]  0.0483125242
## [29,]  0.0623357904
## [30,] -0.0200208681
## [31,]  0.2441167685
## [32,] -0.0210090385
## [33,] -0.0682345021
## [34,] -0.0307637215
## [35,]  0.0254886953
cor(data_aku,KU)
##            Comp.1      Comp.2      Comp.3     Comp.4       Comp.5       Comp.6
## Data.y1 0.7138631  0.58684061  0.22858192  0.1390109  0.272839455  0.001561276
## Data.y2 0.7634617  0.49214155 -0.07320553  0.3144369 -0.265843989 -0.004505089
## Data.y3 0.8747226 -0.41108484  0.15035053 -0.1914140 -0.025418508  0.077323304
## Data.y4 0.7827393 -0.02725542 -0.30230403 -0.5405293 -0.002993342 -0.054842482
## Data.y5 0.4737816 -0.66219918  0.48937032  0.3081364  0.006993839 -0.050427492
## Data.y6 0.2893548 -0.34559021 -0.80880288  0.3637162  0.101796385  0.005209752

Menentukan Komponen utama dengan basis matriks Covarians

prcompcov<-function(data){
  cov<-cov(data)
  eig<-eigen(cov)$values
  eigvec<-eigen(cov)$vectors
  std<-sqrt(eig)
  var_tot<-sum(eig)
  prop_var<-eig/var_tot
  cum_prob<-cumsum(prop_var)
 
  prcomp_name<-rep(0,ncol(data))
  for ( i in 1:ncol(data)){
     prcomp_name[i]<-paste("comp",i,sep=".")
  }
  
  loadings<-data.frame(colnames(data),round(eigvec,5))
  colnames(loadings)<-c("variabel",prcomp_name)
  importancevar<-data.frame(rbind(std,prop_var,cum_prob))
  rownames(importancevar)<-c("Standard deviation","Proportion of Variance","Cumulative Proportion")
  colnames(importancevar)<-prcomp_name
  scree<-plot(prop_var,type="o",ylab="proporsi varians yang dapat dijelaskan")
  result<-list(importancevar,loadings,scree)
  names(result)<-c("Importance of components:","Loadings:","scree plot")
  return(result)
}
#data hemolitik di skala
prcompcov(scale(data_aku))

## $`Importance of components:`
##                           comp.1    comp.2    comp.3    comp.4     comp.5
## Standard deviation     1.6668803 1.1464114 1.0321081 0.8211232 0.39519726
## Proportion of Variance 0.4630817 0.2190432 0.1775412 0.1123739 0.02603015
## Cumulative Proportion  0.4630817 0.6821249 0.8596661 0.9720400 0.99807010
##                           comp.6
## Standard deviation     0.1076076
## Proportion of Variance 0.0019299
## Cumulative Proportion  1.0000000
## 
## $`Loadings:`
##   variabel  comp.1   comp.2   comp.3   comp.4   comp.5   comp.6
## 1  Data.y1 0.42826  0.51189 -0.22147  0.16929  0.69039 -0.01451
## 2  Data.y2 0.45802  0.42929  0.07093  0.38294 -0.67269  0.04187
## 3  Data.y3 0.52477 -0.35858 -0.14567 -0.23311 -0.06432 -0.71857
## 4  Data.y4 0.46958 -0.02377  0.29290 -0.65828 -0.00757  0.50965
## 5  Data.y5 0.28423 -0.57763 -0.47415  0.37526  0.01770  0.46862
## 6  Data.y6 0.17359 -0.30145  0.78364  0.44295  0.25758 -0.04841
## 
## $`scree plot`
## NULL

Menentukan komponen utama dengan basis matriks korelasi

  • \(vartotal = p\)
  • \(proporsi\space varians =\frac{\lambda_i}{p}\)
prcompcor<-function(data){
  cor<-cor(data)
  p<-ncol(data)
  eig<-eigen(cor)$values
  eigvec<-eigen(cor)$vectors
  std<-sqrt(eig)
  meanvec<-colMeans(data)
  vartot<-p
  prop_var<-eig/vartot
  cum_prob<-cumsum(prop_var)
  
  prcomp_name<-rep(0,ncol(data))
    for ( i in 1:ncol(data)){
       prcomp_name[i]<-paste("comp",i,sep=".")
  }
  loadings<-data.frame(colnames(data),round(eigvec,5))
  colnames(loadings)<-c("variabel",prcomp_name)
  importancevar<-data.frame(rbind(std,prop_var,cum_prob))
  rownames(importancevar)<-c("Standard deviation","Proportion of Variance","Cumulative Proportion")
  colnames(importancevar)<-prcomp_name
  scree<-plot(prop_var,type="o",ylab="proporsi varians yang dapat dijelaskan")
  result<-list(importancevar,loadings,scree)
  names(result)<-c("Importance of components:","Loadings: ","scree plot")
  return(result)
}
prcompcor(data_aku)

## $`Importance of components:`
##                           comp.1    comp.2    comp.3    comp.4     comp.5
## Standard deviation     1.6668803 1.1464114 1.0321081 0.8211232 0.39519726
## Proportion of Variance 0.4630817 0.2190432 0.1775412 0.1123739 0.02603015
## Cumulative Proportion  0.4630817 0.6821249 0.8596661 0.9720400 0.99807010
##                           comp.6
## Standard deviation     0.1076076
## Proportion of Variance 0.0019299
## Cumulative Proportion  1.0000000
## 
## $`Loadings: `
##   variabel   comp.1   comp.2   comp.3   comp.4   comp.5   comp.6
## 1  Data.y1 -0.42826  0.51189 -0.22147  0.16929  0.69039 -0.01451
## 2  Data.y2 -0.45802  0.42929  0.07093  0.38294 -0.67269  0.04187
## 3  Data.y3 -0.52477 -0.35858 -0.14567 -0.23311 -0.06432 -0.71857
## 4  Data.y4 -0.46958 -0.02377  0.29290 -0.65828 -0.00757  0.50965
## 5  Data.y5 -0.28423 -0.57763 -0.47415  0.37526  0.01770  0.46862
## 6  Data.y6 -0.17359 -0.30145  0.78364  0.44295  0.25758 -0.04841
## 
## $`scree plot`
## NULL
  • Manakah yang lebih cocok? Keduanya cocok karena menghasilkan hasil yang sama
  • Berapa banyak komponen yang akan dipertahankan? Dari importance of component terlihat bahwa hanya Komponen 1,2, dan 3 yang memiliki standar deviasi lebih dari 1, Sehingga kita bisa pertahankan komponen 1,2,3 dan abaikan komponen 4,5,6. Selain itu, karena komponen 1, 2, dan 3 juga telah memiliki proporsi varians kumulatif sebesar 85.967% maka komponen 1,2,3 bisa dipertahankan.

Penerapan 3 Soal UAS

SOAL UAS

Bagian A

matR = matrix(data=c(1,0.9,0.74,0.64,
                     0.9,1,0.33,0.74,
                     0.74,0.33,1,0.66,
                     0.64,0.74,0.66,1),4,4)
matR
##      [,1] [,2] [,3] [,4]
## [1,] 1.00 0.90 0.74 0.64
## [2,] 0.90 1.00 0.33 0.74
## [3,] 0.74 0.33 1.00 0.66
## [4,] 0.64 0.74 0.66 1.00
#Mendapatkan eigen value dan eigen vactor (loading)
eig = eigen(matR)$values
eig
## [1]  3.01878551  0.67793201  0.37519449 -0.07191201
eigvec = eigen(matR)$vectors
eigvec
##            [,1]        [,2]        [,3]       [,4]
## [1,] -0.5459384  0.08698788  0.56959722 -0.6082297
## [2,] -0.5004768  0.62861750  0.05704317  0.5925446
## [3,] -0.4461757 -0.77203698  0.14729690  0.4280068
## [4,] -0.5023958 -0.03510062 -0.80660319 -0.3094474

Karena menggunakan matriks korelasi sebagai basis maka proporsi total keragaman sampel yang mampu dijelaskan oleh komponen ke –k :

#mencari proporsi varoance
p = 4
prop_var = eig/p
prop_var
## [1]  0.75469638  0.16948300  0.09379862 -0.01797800
#proporsi kumulatif
cumprop_var = cumsum(prop_var)
cumprop_var
## [1] 0.7546964 0.9241794 1.0179780 1.0000000
#Membuat scree plot
scree<-plot(prop_var,type="o",ylab="proporsi varians yang dapat dijelaskan",main="Scree Plot No 1")

### Bagian B

print("Ancora Imparo")
## [1] "Ancora Imparo"

SOAL UAS

Bagian 3

SOAL UAS

#Komponen 1
(r.y1x1 = eigvec[1,1]%*%sqrt(eig[1]))
##            [,1]
## [1,] -0.9485489
(r.y1x2 = eigvec[2,1]%*%sqrt(eig[1]))
##            [,1]
## [1,] -0.8695611
(r.y1x3 = eigvec[3,1]%*%sqrt(eig[1]))
##            [,1]
## [1,] -0.7752147
(r.y1x4 = eigvec[4,1]%*%sqrt(eig[1]))
##            [,1]
## [1,] -0.8728952
#Komponen 2
(r.y2x1 = eigvec[1,2]%*%sqrt(eig[2]))
##            [,1]
## [1,] 0.07162288
(r.y2x2 = eigvec[2,2]%*%sqrt(eig[2]))
##           [,1]
## [1,] 0.5175824
(r.y2x3 = eigvec[3,2]%*%sqrt(eig[2]))
##            [,1]
## [1,] -0.6356692
(r.y2x4 = eigvec[4,2]%*%sqrt(eig[2]))
##             [,1]
## [1,] -0.02890067

Penerapan 4 PPT

Soal 1

#X1 = % DI SEKTOR INDUSTRI
#X2 = % DI SEKTOR JASA PERORANGAN
#X3 = % DI SEKTOR KEMASYARAKATAN
#n=26 KOTA
#p=3 variabel
#input nilai :
cov <- matrix(c(49.109, 6.535, 7.379, 6.535, 20.933, 17.879, 7.379, 17.879, 46.643),3,3,byrow = T)
cov
##        [,1]   [,2]   [,3]
## [1,] 49.109  6.535  7.379
## [2,]  6.535 20.933 17.879
## [3,]  7.379 17.879 46.643
miu <- c(27.008, 12.958, 20.023)
miu
## [1] 27.008 12.958 20.023
eig.r=eigen(cov)
eig.r
## eigen() decomposition
## $values
## [1] 62.61670 42.47017 11.59813
## 
## $vectors
##            [,1]       [,2]        [,3]
## [1,] -0.5803492  0.8114465  0.06891578
## [2,] -0.3961663 -0.2073738 -0.89445422
## [3,] -0.7115104 -0.5463979  0.44181699
nilai_eig=eigen(cov)$values
sum_eig = sum(nilai_eig)
#Proporsi keragaman
prop = nilai_eig/sum_eig
prop
## [1] 0.53663024 0.36397281 0.09939695
kumulatif = cumsum(prop)
kumulatif
## [1] 0.5366302 0.9006030 1.0000000
#Scree plot
#Cara 1
plot(prop,type="o",ylab="Proporsi varians yang dapat dijelaskan")

#Cara 2
library(ggplot2)
peluang=cbind(prop)
qplot(c(1:3),peluang)+
  geom_line()+
  xlab("Komponen Utama")+
  ylab("Nilai Eigen")+
  ggtitle("Scree Plot")

#Jika mau dirubah ke matriks korelasi (terstandarisasi)
Soal1 <- princomp(cov, cor = T)
summary(Soal1,loadings = TRUE)
## Importance of components:
##                          Comp.1    Comp.2 Comp.3
## Standard deviation     1.581211 0.7069462      0
## Proportion of Variance 0.833409 0.1665910      0
## Cumulative Proportion  0.833409 1.0000000      1
## 
## Loadings:
##      Comp.1 Comp.2 Comp.3
## [1,]  0.624  0.228  0.747
## [2,] -0.595 -0.482  0.644
## [3,] -0.507  0.846  0.165
#Mencari korelasi dgn variabel awal :
#korelasi komponen 1 untuk matrika R
ry1x1<-eig.r$vectors[1,1]*sqrt(eig.r$values[1])/sqrt(cov[1,1])
ry1x1
## [1] -0.6553207
ry1x2<-eig.r$vectors[2,1]*sqrt(eig.r$values[1])/sqrt(cov[2,2])
ry1x2
## [1] -0.6851835
ry1x3<-eig.r$vectors[3,1]*sqrt(eig.r$values[1])/sqrt(cov[3,3])
ry1x3
## [1] -0.8243908

SOal 2

Cuplikan Data

Data <- read.csv("trackrecord.csv")
head(Data,10)
##    Country X100m X200m X400m X800m X1500m X3000m Marathon
## 1      ARG 11.57 22.94 52.50  2.05   4.25   9.19    150.3
## 2      AUS 11.12 22.23 48.63  1.98   4.02   8.63    143.5
## 3      AUT 11.15 22.70 50.62  1.94   4.05   8.78    154.3
## 4      BEL 11.14 22.48 51.45  1.97   4.08   8.82    143.1
## 5      BER 11.46 23.05 53.30  2.07   4.29   9.81    174.2
## 6      BRA 11.17 22.60 50.62  1.97   4.17   9.04    147.4
## 7      CAN 10.98 22.62 49.91  1.97   4.00   8.54    148.4
## 8      CHI 11.65 23.84 53.68  2.00   4.22   9.26    152.2
## 9      CHN 10.79 22.01 49.81  1.93   3.84   8.10    139.4
## 10     COL 11.31 22.92 49.64  2.04   4.34   9.37    155.2
Data = Data[,-1]
summary(Data)
##      X100m           X200m           X400m           X800m      
##  Min.   :10.49   Min.   :21.34   Min.   :47.60   Min.   :1.890  
##  1st Qu.:11.12   1st Qu.:22.57   1st Qu.:49.97   1st Qu.:1.970  
##  Median :11.32   Median :22.98   Median :51.65   Median :2.005  
##  Mean   :11.36   Mean   :23.12   Mean   :51.99   Mean   :2.022  
##  3rd Qu.:11.57   3rd Qu.:23.61   3rd Qu.:53.12   3rd Qu.:2.070  
##  Max.   :12.52   Max.   :25.91   Max.   :61.65   Max.   :2.290  
##      X1500m          X3000m          Marathon    
##  Min.   :3.840   Min.   : 8.100   Min.   :135.2  
##  1st Qu.:4.003   1st Qu.: 8.543   1st Qu.:143.5  
##  Median :4.100   Median : 8.845   Median :148.4  
##  Mean   :4.189   Mean   : 9.081   Mean   :153.6  
##  3rd Qu.:4.338   3rd Qu.: 9.325   3rd Qu.:157.7  
##  Max.   :5.420   Max.   :13.120   Max.   :221.1
#Matriks korelasi
round(cor(Data), digits = 3)
##          X100m X200m X400m X800m X1500m X3000m Marathon
## X100m    1.000 0.941 0.871 0.809  0.782  0.728    0.669
## X200m    0.941 1.000 0.909 0.820  0.801  0.732    0.680
## X400m    0.871 0.909 1.000 0.806  0.720  0.674    0.677
## X800m    0.809 0.820 0.806 1.000  0.905  0.867    0.854
## X1500m   0.782 0.801 0.720 0.905  1.000  0.973    0.791
## X3000m   0.728 0.732 0.674 0.867  0.973  1.000    0.799
## Marathon 0.669 0.680 0.677 0.854  0.791  0.799    1.000
#nilai eigen dan vektor eigen
eig_no2<-eigen(cor(Data))
eig_no2
## eigen() decomposition
## $values
## [1] 5.80759599 0.62893409 0.27917300 0.12451284 0.09096433 0.05452422 0.01429552
## 
## $vectors
##            [,1]       [,2]       [,3]        [,4]        [,5]        [,6]
## [1,] -0.3777548 -0.4072266 -0.1407412  0.58702371 -0.16688471 -0.53970842
## [2,] -0.3832057 -0.4136055 -0.1005216  0.19422826  0.09396401  0.74488045
## [3,] -0.3680269 -0.4592983  0.2374990 -0.64550999  0.32687474 -0.24007665
## [4,] -0.3947868  0.1611649  0.1478173 -0.29460262 -0.81919379  0.01681623
## [5,] -0.3892705  0.3088724 -0.4221927 -0.06705941  0.02592430  0.18908759
## [6,] -0.3761078  0.4229767 -0.4062655 -0.08060950  0.35150932 -0.24060940
## [7,] -0.3551982  0.3896878  0.7406303  0.32126288  0.24731320  0.04797235
##             [,7]
## [1,]  0.08903512
## [2,] -0.26566629
## [3,]  0.12660348
## [4,] -0.19536460
## [5,]  0.73068219
## [6,] -0.57151771
## [7,]  0.08227664
#Analisis komponen utama
prcompcor(Data)

## $`Importance of components:`
##                           comp.1     comp.2     comp.3     comp.4    comp.5
## Standard deviation     2.4098954 0.79305365 0.52836824 0.35286378 0.3016029
## Proportion of Variance 0.8296566 0.08984773 0.03988186 0.01778755 0.0129949
## Cumulative Proportion  0.8296566 0.91950430 0.95938615 0.97717370 0.9901686
##                             comp.6      comp.7
## Standard deviation     0.233504215 0.119563887
## Proportion of Variance 0.007789174 0.002042218
## Cumulative Proportion  0.997957782 1.000000000
## 
## $`Loadings: `
##   variabel   comp.1   comp.2   comp.3   comp.4   comp.5   comp.6   comp.7
## 1    X100m -0.37775 -0.40723 -0.14074  0.58702 -0.16688 -0.53971  0.08904
## 2    X200m -0.38321 -0.41361 -0.10052  0.19423  0.09396  0.74488 -0.26567
## 3    X400m -0.36803 -0.45930  0.23750 -0.64551  0.32687 -0.24008  0.12660
## 4    X800m -0.39479  0.16116  0.14782 -0.29460 -0.81919  0.01682 -0.19536
## 5   X1500m -0.38927  0.30887 -0.42219 -0.06706  0.02592  0.18909  0.73068
## 6   X3000m -0.37611  0.42298 -0.40627 -0.08061  0.35151 -0.24061 -0.57152
## 7 Marathon -0.35520  0.38969  0.74063  0.32126  0.24731  0.04797  0.08228
## 
## $`scree plot`
## NULL
#Bandingkan
AKU2 <- princomp(Data, cor = T)
summary(AKU2,loadings = TRUE)
## Importance of components:
##                           Comp.1     Comp.2     Comp.3     Comp.4    Comp.5
## Standard deviation     2.4098954 0.79305365 0.52836824 0.35286378 0.3016029
## Proportion of Variance 0.8296566 0.08984773 0.03988186 0.01778755 0.0129949
## Cumulative Proportion  0.8296566 0.91950430 0.95938615 0.97717370 0.9901686
##                             Comp.6      Comp.7
## Standard deviation     0.233504215 0.119563887
## Proportion of Variance 0.007789174 0.002042218
## Cumulative Proportion  0.997957782 1.000000000
## 
## Loadings:
##          Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## X100m     0.378  0.407  0.141  0.587  0.167  0.540       
## X200m     0.383  0.414  0.101  0.194        -0.745 -0.266
## X400m     0.368  0.459 -0.237 -0.646 -0.327  0.240  0.127
## X800m     0.395 -0.161 -0.148 -0.295  0.819        -0.195
## X1500m    0.389 -0.309  0.422               -0.189  0.731
## X3000m    0.376 -0.423  0.406        -0.352  0.241 -0.572
## Marathon  0.355 -0.390 -0.741  0.321 -0.247