Operasi Matriks dan Aplikasinya dalam Analisis Data Mulivariat
Matriks adalah susunan bilangan yang ditata dalam bentuk baris dan kolom, membentuk sebuah persegi panjang atau bujur sangkar. Setiap bilangan di dalam matriks disebut elemen matriks,dan posisinya ditentukan oleh indeks baris serta kolom.
Operasi matriks adalah aturan perhitungan yang digunakan untuk mengolahmatriks sehingga menghasilkan matriks baru atau suatu nilai.Operasi dasarnya meliputipenjumlahan,pengurangan, perkalian dengan skalar, dan perkalian antar matriks. Selain itu, ada juga transpose yangmenukar baris menjadi kolom, determinan dan invers yang hanya berlakuuntuk matriks persegi, serta konsep lanjut seperti eigen value dan eigen vector. Operasi-operasi ini menjadi dasar penting dalam statistika,sains, dan komputasi karena banyak digunakan untuk pemodelan dan analisis data.
Sebelum melakukan operasi, kita perlu membuat matriks menggunakan
fungsimatrix()
di R.
X = matrix(c(6.5,8.2,7.9,
5.4,7.0,6.7,
8.1,6.9,9.2), nrow = 3, ncol = 3); X
## [,1] [,2] [,3]
## [1,] 6.5 5.4 8.1
## [2,] 8.2 7.0 6.9
## [3,] 7.9 6.7 9.2
Y = matrix(c(7.3,6.8,8.5,
8.9,7.6,6.1,
9.4,8.0,7.2), nrow = 3, ncol = 3, byrow = TRUE); Y
## [,1] [,2] [,3]
## [1,] 7.3 6.8 8.5
## [2,] 8.9 7.6 6.1
## [3,] 9.4 8.0 7.2
Operasi dasar meliputi penjumlahan, pengurangan, perkalian skalar,perkalian antar matriks,transpose, determinan, dan invers.
# Penjumlahan dan Pengurangan
X + Y
## [,1] [,2] [,3]
## [1,] 13.8 12.2 16.6
## [2,] 17.1 14.6 13.0
## [3,] 17.3 14.7 16.4
X - Y
## [,1] [,2] [,3]
## [1,] -0.8 -1.4 -0.4
## [2,] -0.7 -0.6 0.8
## [3,] -1.5 -1.3 2.0
Y - X
## [,1] [,2] [,3]
## [1,] 0.8 1.4 0.4
## [2,] 0.7 0.6 -0.8
## [3,] 1.5 1.3 -2.0
# Perkalian
X %*% Y
## [,1] [,2] [,3]
## [1,] 171.65 150.04 146.51
## [2,] 187.02 164.16 162.08
## [3,] 203.78 178.24 174.26
Y %*% X
## [,1] [,2] [,3]
## [1,] 170.36 143.97 184.25
## [2,] 168.36 142.13 180.65
## [3,] 183.58 155.00 197.58
# Perkalian antar elemen
X*Y
## [,1] [,2] [,3]
## [1,] 47.45 36.72 68.85
## [2,] 72.98 53.20 42.09
## [3,] 74.26 53.60 66.24
2*X
## [,1] [,2] [,3]
## [1,] 13.0 10.8 16.2
## [2,] 16.4 14.0 13.8
## [3,] 15.8 13.4 18.4
# Transpose
t(X)
## [,1] [,2] [,3]
## [1,] 6.5 8.2 7.9
## [2,] 5.4 7.0 6.7
## [3,] 8.1 6.9 9.2
t(Y)
## [,1] [,2] [,3]
## [1,] 7.3 8.9 9.4
## [2,] 6.8 7.6 8.0
## [3,] 8.5 6.1 7.2
# Invers dan Determinan
solve(X)
## [,1] [,2] [,3]
## [1,] 8.3848639 2.118136 -8.9709275
## [2,] -9.6585141 -1.933549 9.9538533
## [3,] -0.1661283 -0.410706 0.5629903
solve(Y)
## [,1] [,2] [,3]
## [1,] -1.27147766 -4.089347 4.965636
## [2,] 1.44759450 5.871993 -6.683849
## [3,] 0.05154639 -1.185567 1.082474
det(X)
## [1] 2.167
det(Y)
## [1] -4.656
Selain operasi dasar, kita juga bisa membuat matriks dengan nama baris/kolom dan memanggil komponen tertentu.
A <- matrix(21:40, nrow=4, ncol=5) ; A
## [,1] [,2] [,3] [,4] [,5]
## [1,] 21 25 29 33 37
## [2,] 22 26 30 34 38
## [3,] 23 27 31 35 39
## [4,] 24 28 32 36 40
b <- c(4,7,2,8,5,9,6,3,1,10,12,11);b
## [1] 4 7 2 8 5 9 6 3 1 10 12 11
B <- matrix(b, nrow=3, ncol=4, byrow=TRUE) ; B
## [,1] [,2] [,3] [,4]
## [1,] 4 7 2 8
## [2,] 5 9 6 3
## [3,] 1 10 12 11
sel <- c(15,9,27,18)
nama_kolom <- c("C1", "C2")
nama_baris <- c("R1", "R2")
C <- matrix(sel, nrow=2, ncol=2,
byrow=TRUE, dimnames=list(nama_baris, nama_kolom)) ; C
## C1 C2
## R1 15 9
## R2 27 18
# Memanggil Komponen Matriks
A[,2]
## [1] 25 26 27 28
A[3,]
## [1] 23 27 31 35 39
A[3,2]
## [1] 27
A[c(1,3),2]
## [1] 25 27
A[,1:3]
## [,1] [,2] [,3]
## [1,] 21 25 29
## [2,] 22 26 30
## [3,] 23 27 31
## [4,] 24 28 32
A[2:4,]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 22 26 30 34 38
## [2,] 23 27 31 35 39
## [3,] 24 28 32 36 40
Eigen value dan eigen vector digunakan untuk memahami variasi data dan arah komponen, berguna dalam PCA dan analisis multivariat lainnya.
eigX = eigen(X); eigX
## eigen() decomposition
## $values
## [1] 22.0140019 0.4816027 0.2043953
##
## $vectors
## [,1] [,2] [,3]
## [1,] 0.5268942 0.5218595 0.61121520
## [2,] 0.5752916 -0.8360039 -0.78978922
## [3,] 0.6256373 0.1695881 0.05146821
eigY = eigen(Y); eigY
## eigen() decomposition
## $values
## [1] 23.230397 -1.286223 0.155826
##
## $vectors
## [,1] [,2] [,3]
## [1,] -0.5633011 -0.7710948 0.5522886
## [2,] -0.5584126 0.5270180 -0.8126545
## [3,] -0.6089887 0.3573023 0.1859299
eigX$values; eigY$values
## [1] 22.0140019 0.4816027 0.2043953
## [1] 23.230397 -1.286223 0.155826
eigX$vectors; eigY$vectors
## [,1] [,2] [,3]
## [1,] 0.5268942 0.5218595 0.61121520
## [2,] 0.5752916 -0.8360039 -0.78978922
## [3,] 0.6256373 0.1695881 0.05146821
## [,1] [,2] [,3]
## [1,] -0.5633011 -0.7710948 0.5522886
## [2,] -0.5584126 0.5270180 -0.8126545
## [3,] -0.6089887 0.3573023 0.1859299
SVD memecah matriks menjadi tiga bagian (U, D, V) yang sering dipakai dalam kompresi data dan machine learning.
library(MASS)
## Warning: package 'MASS' was built under R version 4.4.3
A <- matrix(c(5,-3,6,2,-4,8,-2,5,-1,7,3,9), 4, 3, byrow=TRUE)
svd_result <- svd(A)
svd_result$d
## [1] 16.07076 7.41936 3.11187
svd_result$u
## [,1] [,2] [,3]
## [1,] -0.5046975 0.2278362 -0.3742460
## [2,] -0.5178195 0.4138180 0.7413297
## [3,] 0.1646416 -0.6063789 0.5337354
## [4,] -0.6708477 -0.6396483 -0.1596770
svd_result$v
## [,1] [,2] [,3]
## [1,] -0.5341591 -0.17494276 -0.8270847
## [2,] 0.1490928 -0.98251336 0.1115295
## [3,] -0.8321330 -0.06373793 0.5509011
Dalam analisis multivariat, konsep jarak digunakan untuk mengukur kedekatan atau kemiripan antar objek. Bab ini membahas berbagai ukuran jarak seperti Euclidean, Chebyshev,Manhattan, Mahalanobis, dan Minkowski.
#install.packages("factoextra")
library(factoextra)
## Warning: package 'factoextra' was built under R version 4.4.3
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
set.seed(321)
ss <- sample(1:50, 15)
df <- USArrests[ss, ]
df.scaled <- scale(df)
Jarak garis lurus antara dua titik, paling sering digunakan dalam clustering dan analisis spasial.
dist.eucl <- dist(df.scaled, method = "euclidean"); dist.eucl
## Wyoming Illinois Mississippi Kansas New York Kentucky
## Illinois 2.4122476
## Mississippi 2.6164146 3.1543527
## Kansas 0.7934567 2.3786048 3.1993198
## New York 2.7921742 0.4095812 3.3878156 2.7128511
## Kentucky 1.0532156 2.9515362 2.3433244 1.2948587 3.2757206
## Oklahoma 0.8659748 1.8685718 2.9986711 0.5547563 2.2043102 1.4993175
## Hawaii 2.2322175 2.7203365 4.4270510 1.4800030 2.9246694 2.5403456
## Missouri 2.0625111 1.4167282 3.0563398 1.8349434 1.5351057 2.3176129
## New Mexico 3.1109091 1.5775154 3.0617092 3.1551035 1.4705638 3.4011133
## Louisiana 2.4137967 1.6360410 1.7133330 2.6879097 1.7776353 2.4609320
## South Dakota 1.5765126 3.9457686 3.4644086 1.7515852 4.3067435 1.5082173
## Iowa 1.7426214 3.9154083 4.0958166 1.6038155 4.2724405 1.9508929
## North Dakota 2.5296038 4.8794481 4.4694938 2.6181473 5.2524274 2.5546862
## Texas 2.4496576 0.8218968 2.9692463 2.3259192 0.8377979 2.6949264
## Oklahoma Hawaii Missouri New Mexico Louisiana South Dakota
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii 1.6491638
## Missouri 1.3724911 2.3123720
## New Mexico 2.6268378 3.7154012 1.4937447
## Louisiana 2.2916633 3.5012381 1.8909275 1.7882330
## South Dakota 2.1588538 2.9115203 3.2767510 4.4281177 3.7902169
## Iowa 2.1130016 2.3395756 3.3845451 4.6758935 4.0922753 0.9964108
## North Dakota 3.0891779 3.4578871 4.3173165 5.5131433 4.8442635 1.1604313
## Texas 1.8768374 2.5920693 1.1756214 1.5867966 1.3643137 3.8935265
## Iowa North Dakota
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii
## Missouri
## New Mexico
## Louisiana
## South Dakota
## Iowa
## North Dakota 1.1298867
## Texas 3.9137858 4.8837032
fviz_dist(dist.eucl)
Jarak berdasarkan selisih terbesar antar dimensi, berguna dalam quality control.
dist.cheb <- dist(df.scaled, method = "maximum"); dist.cheb
## Wyoming Illinois Mississippi Kansas New York Kentucky
## Illinois 1.5939604
## Mississippi 2.0521063 2.7028025
## Kansas 0.5464670 1.5918822 2.2286315
## New York 1.8018683 0.3116811 2.9107104 1.6512808
## Kentucky 0.6399041 2.1483815 1.7819577 0.9702368 2.3562894
## Oklahoma 0.6530462 1.1642124 2.0962376 0.4276699 1.2474473 1.1088421
## Hawaii 1.5939604 2.4115828 2.7028025 1.1781447 2.4709814 2.1483815
## Missouri 1.8700867 0.9009342 1.8018683 1.5138797 1.1088421 1.7661930
## New Mexico 2.4489231 1.2021986 2.2262937 2.0927161 1.1088421 2.3450294
## Louisiana 1.8976467 1.1781447 1.5246578 2.0741719 1.3860526 1.6631605
## South Dakota 1.0395394 2.6334999 2.7140760 1.4553552 2.8414078 1.3018739
## Iowa 1.2473704 2.2927856 3.0671266 0.9944112 2.3521842 1.6549244
## North Dakota 1.3780473 2.7028025 3.3760458 1.5880895 2.9107104 1.9638436
## Texas 1.4693539 0.5702265 2.4948946 1.4783991 0.6296251 1.9404736
## Oklahoma Hawaii Missouri New Mexico Louisiana South Dakota
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii 1.2473704
## Missouri 1.2170406 1.5681228
## New Mexico 1.7958770 2.8392526 1.2711298
## Louisiana 1.9417780 2.4115828 1.4122022 1.4693539
## South Dakota 1.5939604 2.6334999 2.2856616 2.8644979 2.5596164
## Iowa 1.2912504 1.8018683 2.5082909 3.0871273 2.9126670 0.8316315
## North Dakota 1.8849287 2.7028025 3.1019693 3.6808057 3.2215862 0.8163077
## Texas 1.3460052 1.8413563 0.8164294 0.9978963 0.9702368 2.4255920
## Iowa North Dakota
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii
## Missouri
## New Mexico
## Louisiana
## South Dakota
## Iowa
## North Dakota 0.9009342
## Texas 2.3168942 2.7012364
fviz_dist(dist.cheb)
Jarak berbasis grid, dijumlahkan dari perbedaan absolut tiap koordinat,banyak digunakan dalam NLP dan pemodelan gudang/jalan kota.
dist.man <- dist(df.scaled, method = "manhattan"); dist.man
## Wyoming Illinois Mississippi Kansas New York Kentucky
## Illinois 4.6804639
## Mississippi 4.5477901 5.1034373
## Kansas 1.4950151 4.6314334 5.5975464
## New York 5.4139111 0.7334472 5.4091682 5.3648806
## Kentucky 1.9159642 5.1088324 3.8673166 2.1102578 5.8422796
## Oklahoma 1.3703957 3.6359252 5.4729270 0.9955082 4.3693724 2.8409781
## Hawaii 3.9738430 4.1009258 8.0763743 2.4788279 4.8343730 4.4465291
## Missouri 3.2505127 2.6766756 5.9782446 3.2014823 2.7867606 3.9878005
## New Mexico 5.6300548 2.7514592 5.3741207 5.5810243 2.4338278 6.0584233
## Louisiana 4.3384469 2.5485829 2.5548545 4.2894164 2.9731109 4.7668154
## South Dakota 3.0080629 7.6885267 5.4767741 3.0570933 8.4219740 2.5796943
## Iowa 3.1085028 7.7889667 7.2404771 3.1575333 8.5224139 3.3731605
## North Dakota 5.0427114 9.7231753 7.3728174 5.0917419 10.4566225 4.6143429
## Texas 4.6324690 1.5082739 5.1808752 4.5834386 1.4875431 5.0608376
## Oklahoma Hawaii Missouri New Mexico Louisiana
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii 2.6034473
## Missouri 2.2059740 4.4728430
## New Mexico 4.5855161 6.8523850 2.3795420
## Louisiana 3.5711187 6.1151982 3.4233902 3.0568606
## South Dakota 4.0526016 4.5379784 6.2585756 8.6381176 7.3465098
## Iowa 4.1530415 3.9256352 6.3590155 8.7385576 7.4469497
## North Dakota 6.0872501 5.6222495 8.2932241 10.6727662 9.3811583
## Texas 3.5879303 4.4687467 2.1834220 2.9573454 2.6260207
## South Dakota Iowa North Dakota
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii
## Missouri
## New Mexico
## Louisiana
## South Dakota
## Iowa 1.7637030
## North Dakota 2.0346485 1.9342086
## Texas 7.6405319 7.7409718 9.6751804
fviz_dist(dist.man)
Jarak yang memperhitungkan varians dan korelasi antar variabel, dipakai dalam deteksi outlier dan analisis diskriminan.
library(StatMatch)
## Warning: package 'StatMatch' was built under R version 4.4.3
## Loading required package: proxy
## Warning: package 'proxy' was built under R version 4.4.3
##
## Attaching package: 'proxy'
## The following objects are masked from 'package:stats':
##
## as.dist, dist
## The following object is masked from 'package:base':
##
## as.matrix
## Loading required package: survey
## Warning: package 'survey' was built under R version 4.4.3
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
## dotchart
## Loading required package: lpSolve
## Warning: package 'lpSolve' was built under R version 4.4.2
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
dist.mah <- mahalanobis.dist(df.scaled); dist.mah
## Wyoming Illinois Mississippi Kansas New York Kentucky
## Wyoming 0.000000 1.7186109 2.820779 1.4195095 1.8695558 2.867847
## Illinois 1.718611 0.0000000 3.658323 2.2905255 0.4722069 3.878642
## Mississippi 2.820779 3.6583235 0.000000 3.2139075 3.6566922 2.544477
## Kansas 1.419510 2.2905255 3.213907 0.0000000 2.1522535 2.048031
## New York 1.869556 0.4722069 3.656692 2.1522535 0.0000000 3.698342
## Kentucky 2.867847 3.8786421 2.544477 2.0480310 3.6983422 0.000000
## Oklahoma 1.146496 1.8980286 3.237573 0.6499978 1.7772007 2.505941
## Hawaii 3.466671 3.6449604 4.722203 2.2108491 3.3748818 2.753554
## Missouri 3.198071 3.6796400 3.956918 2.2592572 3.3618939 2.642756
## New Mexico 3.281318 3.5101406 4.057258 3.1016653 3.2869855 3.870023
## Louisiana 2.284940 2.5550539 1.688058 2.2700723 2.4136664 2.119635
## South Dakota 1.826205 3.3564158 3.087365 1.6274307 3.3404110 2.261154
## Iowa 1.327907 2.6329606 3.559587 1.1128197 2.6839965 2.621704
## North Dakota 1.582582 3.1919907 3.553572 1.9466491 3.3317039 3.040465
## Texas 2.540604 2.4769381 3.093919 1.7462066 2.1399545 2.108949
## Oklahoma Hawaii Missouri New Mexico Louisiana South Dakota
## Wyoming 1.1464956 3.466671 3.198071 3.281318 2.284940 1.826205
## Illinois 1.8980286 3.644960 3.679640 3.510141 2.555054 3.356416
## Mississippi 3.2375727 4.722203 3.956918 4.057258 1.688058 3.087365
## Kansas 0.6499978 2.210849 2.259257 3.101665 2.270072 1.627431
## New York 1.7772007 3.374882 3.361894 3.286985 2.413666 3.340411
## Kentucky 2.5059414 2.753554 2.642756 3.870023 2.119635 2.261154
## Oklahoma 0.0000000 2.705865 2.203038 2.660216 2.350208 1.672866
## Hawaii 2.7058650 0.000000 3.193764 4.645567 3.383255 3.551072
## Missouri 2.2030382 3.193764 0.000000 1.836797 3.256319 2.505784
## New Mexico 2.6602159 4.645567 1.836797 0.000000 3.676879 3.026024
## Louisiana 2.3502077 3.383255 3.256319 3.676879 0.000000 3.021642
## South Dakota 1.6728664 3.551072 2.505784 3.026024 3.021642 0.000000
## Iowa 1.3299426 2.790197 3.145245 3.792086 2.954252 1.518854
## North Dakota 1.9813596 3.780966 3.590548 3.950259 3.434074 1.304743
## Texas 1.9635201 2.082005 2.576037 3.501666 1.527269 3.090805
## Iowa North Dakota Texas
## Wyoming 1.327907 1.582582 2.540604
## Illinois 2.632961 3.191991 2.476938
## Mississippi 3.559587 3.553572 3.093919
## Kansas 1.112820 1.946649 1.746207
## New York 2.683996 3.331704 2.139954
## Kentucky 2.621704 3.040465 2.108949
## Oklahoma 1.329943 1.981360 1.963520
## Hawaii 2.790197 3.780966 2.082005
## Missouri 3.145245 3.590548 2.576037
## New Mexico 3.792086 3.950259 3.501666
## Louisiana 2.954252 3.434074 1.527269
## South Dakota 1.518854 1.304743 3.090805
## Iowa 0.000000 1.045923 2.734770
## North Dakota 1.045923 0.000000 3.563193
## Texas 2.734770 3.563193 0.000000
as.matrix(dist.mah)
## Wyoming Illinois Mississippi Kansas New York Kentucky
## Wyoming 0.000000 1.7186109 2.820779 1.4195095 1.8695558 2.867847
## Illinois 1.718611 0.0000000 3.658323 2.2905255 0.4722069 3.878642
## Mississippi 2.820779 3.6583235 0.000000 3.2139075 3.6566922 2.544477
## Kansas 1.419510 2.2905255 3.213907 0.0000000 2.1522535 2.048031
## New York 1.869556 0.4722069 3.656692 2.1522535 0.0000000 3.698342
## Kentucky 2.867847 3.8786421 2.544477 2.0480310 3.6983422 0.000000
## Oklahoma 1.146496 1.8980286 3.237573 0.6499978 1.7772007 2.505941
## Hawaii 3.466671 3.6449604 4.722203 2.2108491 3.3748818 2.753554
## Missouri 3.198071 3.6796400 3.956918 2.2592572 3.3618939 2.642756
## New Mexico 3.281318 3.5101406 4.057258 3.1016653 3.2869855 3.870023
## Louisiana 2.284940 2.5550539 1.688058 2.2700723 2.4136664 2.119635
## South Dakota 1.826205 3.3564158 3.087365 1.6274307 3.3404110 2.261154
## Iowa 1.327907 2.6329606 3.559587 1.1128197 2.6839965 2.621704
## North Dakota 1.582582 3.1919907 3.553572 1.9466491 3.3317039 3.040465
## Texas 2.540604 2.4769381 3.093919 1.7462066 2.1399545 2.108949
## Oklahoma Hawaii Missouri New Mexico Louisiana South Dakota
## Wyoming 1.1464956 3.466671 3.198071 3.281318 2.284940 1.826205
## Illinois 1.8980286 3.644960 3.679640 3.510141 2.555054 3.356416
## Mississippi 3.2375727 4.722203 3.956918 4.057258 1.688058 3.087365
## Kansas 0.6499978 2.210849 2.259257 3.101665 2.270072 1.627431
## New York 1.7772007 3.374882 3.361894 3.286985 2.413666 3.340411
## Kentucky 2.5059414 2.753554 2.642756 3.870023 2.119635 2.261154
## Oklahoma 0.0000000 2.705865 2.203038 2.660216 2.350208 1.672866
## Hawaii 2.7058650 0.000000 3.193764 4.645567 3.383255 3.551072
## Missouri 2.2030382 3.193764 0.000000 1.836797 3.256319 2.505784
## New Mexico 2.6602159 4.645567 1.836797 0.000000 3.676879 3.026024
## Louisiana 2.3502077 3.383255 3.256319 3.676879 0.000000 3.021642
## South Dakota 1.6728664 3.551072 2.505784 3.026024 3.021642 0.000000
## Iowa 1.3299426 2.790197 3.145245 3.792086 2.954252 1.518854
## North Dakota 1.9813596 3.780966 3.590548 3.950259 3.434074 1.304743
## Texas 1.9635201 2.082005 2.576037 3.501666 1.527269 3.090805
## Iowa North Dakota Texas
## Wyoming 1.327907 1.582582 2.540604
## Illinois 2.632961 3.191991 2.476938
## Mississippi 3.559587 3.553572 3.093919
## Kansas 1.112820 1.946649 1.746207
## New York 2.683996 3.331704 2.139954
## Kentucky 2.621704 3.040465 2.108949
## Oklahoma 1.329943 1.981360 1.963520
## Hawaii 2.790197 3.780966 2.082005
## Missouri 3.145245 3.590548 2.576037
## New Mexico 3.792086 3.950259 3.501666
## Louisiana 2.954252 3.434074 1.527269
## South Dakota 1.518854 1.304743 3.090805
## Iowa 0.000000 1.045923 2.734770
## North Dakota 1.045923 0.000000 3.563193
## Texas 2.734770 3.563193 0.000000
Merupakan generalisasi jarak Euclidean, Manhattan, dan Chebyshev,bergantung pada nilai parameter p.
set.seed(123)
data <- matrix(runif(15, min = 1, max = 10), nrow = 5, ncol = 3)
colnames(data) <- c("X1", "X2", "X3")
p1 <- data[1, ]; p2 <- data[2, ]
minkowski_distance <- function(x, y, p) {
sum(abs(x - y)^p)^(1/p)
}
minkowski_distance(p1, p2, p = 1)
## [1] 13.38098
minkowski_distance(p1, p2, p = 2)
## [1] 7.726871
minkowski_distance(p1, p2, p = 3)
## [1] 6.435156
max(abs(p1 - p2))
## [1] 4.531493
Vektor rata-rata dalam konteks multivariat digunakan untuk merangkum pusat data dari beberapa variabel sekaligus. Selain rata-rata, juga dihitung kovarians, korelasi, dan matriks standardisasi.
Data dimasukkan dalam bentuk matriks agar bisa dianalisis secara multivariat.
BB = c(6.2,11.5,8.7,10.1,7.8,6.9,12.0,3.1,14.8,9.4)
PM = c(61,73,68,70,64,60,76,49,84,71)
RTB = c(115,138,127,123,131,120,143,95,160,128)
lizard = as.matrix(cbind(BB,PM,RTB)); lizard
## BB PM RTB
## [1,] 6.2 61 115
## [2,] 11.5 73 138
## [3,] 8.7 68 127
## [4,] 10.1 70 123
## [5,] 7.8 64 131
## [6,] 6.9 60 120
## [7,] 12.0 76 143
## [8,] 3.1 49 95
## [9,] 14.8 84 160
## [10,] 9.4 71 128
Digunakan untuk mencari nilai rata-rata tiap variabel dalam bentuk vektor.
vecMeans = as.matrix(colMeans(lizard)); vecMeans
## [,1]
## BB 9.05
## PM 67.60
## RTB 128.00
matrix(c(mean(BB), mean(PM), mean(RTB)), nrow=3, ncol=1)
## [,1]
## [1,] 9.05
## [2,] 67.60
## [3,] 128.00
Kovarians menunjukkan seberapa besar variabel berubah bersama, sedangkan korelasi menunjukkan hubungan linear antar variabel.
cov(lizard)
## BB PM RTB
## BB 10.98056 31.80000 54.96667
## PM 31.80000 94.04444 160.22222
## RTB 54.96667 160.22222 300.66667
cor(lizard)
## BB PM RTB
## BB 1.0000000 0.9895743 0.9566313
## PM 0.9895743 1.0000000 0.9528259
## RTB 0.9566313 0.9528259 1.0000000
Standardisasi mengubah skala variabel agar bisa dibandingkan dengan adil, menghasilkan matriks korelasi R.
n = nrow(lizard)
u = matrix(1,n,1)
xbar = cbind((1/n)*t(u)%*%lizard)
D = lizard - u %*% xbar
S = (1/(n-1))*t(D)%*%D
Ds = diag(sqrt(diag(S))); Ds
## [,1] [,2] [,3]
## [1,] 3.313692 0.000000 0.00000
## [2,] 0.000000 9.697651 0.00000
## [3,] 0.000000 0.000000 17.33974
R = solve(Ds) %*% S %*% solve(Ds); R
## [,1] [,2] [,3]
## [1,] 1.0000000 0.9895743 0.9566313
## [2,] 0.9895743 1.0000000 0.9528259
## [3,] 0.9566313 0.9528259 1.0000000