data <- read.csv("https://archive.ics.uci.edu/static/public/863/data.csv")
# keep the first six columns, the numeric variables Age through HeartRate
data_angka <- data[, 1:6]
# 3a. correlation matrix
cor(data_angka)
##                     Age  SystolicBP DiastolicBP         BS    BodyTemp
## Age          1.00000000  0.41604545  0.39802629  0.4732843 -0.25532314
## SystolicBP   0.41604545  1.00000000  0.78700648  0.4251717 -0.28661552
## DiastolicBP  0.39802629  0.78700648  1.00000000  0.4238241 -0.25753832
## BS           0.47328434  0.42517166  0.42382407  1.0000000 -0.10349336
## BodyTemp    -0.25532314 -0.28661552 -0.25753832 -0.1034934  1.00000000
## HeartRate    0.07979763 -0.02310796 -0.04615057  0.1428672  0.09877104
##               HeartRate
## Age          0.07979763
## SystolicBP  -0.02310796
## DiastolicBP -0.04615057
## BS           0.14286723
## BodyTemp     0.09877104
## HeartRate    1.00000000
# 3b. covariance matrix
cov(data_angka)
##                    Age SystolicBP DiastolicBP         BS   BodyTemp HeartRate
## Age         181.559065 103.171539   74.471739 21.0035619 -4.7180044  8.697168
## SystolicBP  103.171539 338.704005  201.121845 25.7712999 -7.2338429 -3.439938
## DiastolicBP  74.471739 201.121845  192.815323 19.3828770 -4.9042413 -5.183543
## BS           21.003562  25.771300   19.382877 10.8473512 -0.4674483  3.806040
## BodyTemp     -4.718004  -7.233843   -4.904241 -0.4674483  1.8806951  1.095640
## HeartRate     8.697168  -3.439938   -5.183543  3.8060397  1.0956395 65.427104
# 3c. eigenvalues and eigenvectors of the correlation matrix
eigen(cor(data_angka))
## eigen() decomposition
## $values
## [1] 2.6078934 1.1443812 0.8370499 0.7063345 0.4925435 0.2117975
##
## $vectors
##            [,1]       [,2]       [,3]       [,4]        [,5]        [,6]
## [1,] -0.4360752  0.1729019  0.2404386 -0.5550931 -0.64321080  0.01685752
## [2,] -0.5296016 -0.1128996 -0.2415842  0.3633726 -0.09386670 -0.71243407
## [3,] -0.5225675 -0.1230416 -0.2990961  0.3544156 -0.07810122  0.70043934
## [4,] -0.4255290  0.3528904 -0.1153507 -0.4254237  0.70709921 -0.01063092
## [5,]  0.2735090  0.4293197 -0.8093439 -0.1305413 -0.26067118 -0.02914362
## [6,] -0.0200401  0.7958469  0.3549994  0.4859993 -0.05856920  0.02399757
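3d. Explanation of the output:
1. correlation matrix:
The strongest relationship is between SystolicBP and DiastolicBP (r = 0.79). The correlation is positive, so when systolic blood pressure rises, diastolic blood pressure tends to rise as well.
2. covariance matrix:
The diagonal of the covariance matrix holds the variance of each variable. SystolicBP has the largest variance (338.70) of the six variables. Because variance depends on each variable's units, this comparison reflects measurement scale as well as spread.
3. eigenvalues and eigenvectors:
The first eigenvalue (2.61) is the largest. The eigenvalues of a correlation matrix sum to the number of variables (6 here), so the first component accounts for about 43% of the total variance (2.61 / 6). That is more than any other single component, though less than half of the total.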
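
The two numeric claims above can be checked with a short sketch. It assumes nothing beyond base R and hard-codes values copied from the printed output, so it runs without downloading the data. The variable names (`eigenvalues`, `prop_var`, `cum_var`, `cov_bp`, `cor_bp`) are illustrative and not part of the original analysis.

```r
# Eigenvalues copied from the eigen(cor(data_angka)) output
eigenvalues <- c(2.6078934, 1.1443812, 0.8370499,
                 0.7063345, 0.4925435, 0.2117975)

# For a correlation matrix, the eigenvalues sum to the number of variables
total <- sum(eigenvalues)

# Share of total variance per component, and the running total
prop_var <- eigenvalues / total
cum_var  <- cumsum(prop_var)

print(round(data.frame(component  = seq_along(eigenvalues),
                       eigenvalue = eigenvalues,
                       proportion = prop_var,
                       cumulative = cum_var), 4))

# Variances and covariance of the two blood pressure variables,
# copied from the cov(data_angka) output
cov_bp <- matrix(c(338.704005, 201.121845,
                   201.121845, 192.815323),
                 nrow = 2,
                 dimnames = list(c("SystolicBP", "DiastolicBP"),
                                 c("SystolicBP", "DiastolicBP")))

# cov2cor() divides each covariance by the product of the two standard
# deviations, which should recover the correlation reported in 3a
cor_bp <- cov2cor(cov_bp)
print(round(cor_bp, 4))
```

The first row of the table gives the share of variance carried by the first component (about 0.43), and the off-diagonal entry of `cor_bp` reproduces the SystolicBP and DiastolicBP correlation from the correlation matrix (about 0.79).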