Matriks adalah susunanbilangan, simbol, atau ekspresi yang ditata dalam bentuk persegi atauapun persegi panjang, terdiri atas baris dan kolom. Setiap angka di dalam matriks disebut elemen matriks. Misalkan sebuah matriks terdiri dari n baris dan p kolom, maka matriks tersebut berordo n × p.
contoh nya sebaggai berikut :
Matriks berikut memiliki ordo 3 x 3
X = matrix(c(6.5,8.2,7.9,
5.4,7.0,6.7,
8.1,6.9,9.2), nrow = 3, ncol = 3); X
## [,1] [,2] [,3]
## [1,] 6.5 5.4 8.1
## [2,] 8.2 7.0 6.9
## [3,] 7.9 6.7 9.2
Y = matrix(c(7.3,6.8,8.5,
8.9,7.6,6.1,
9.4,8.0,7.2), nrow = 3, ncol = 3, byrow = TRUE); Y
## [,1] [,2] [,3]
## [1,] 7.3 6.8 8.5
## [2,] 8.9 7.6 6.1
## [3,] 9.4 8.0 7.2
Operasi matriks adalah operasi suatu bentuk matriks, seperti penjumlahan, pengurangan, dan perkalian.
Penjumlahan Matriks :
X + Y
## [,1] [,2] [,3]
## [1,] 13.8 12.2 16.6
## [2,] 17.1 14.6 13.0
## [3,] 17.3 14.7 16.4
Pengurangan Matriks
X - Y
## [,1] [,2] [,3]
## [1,] -0.8 -1.4 -0.4
## [2,] -0.7 -0.6 0.8
## [3,] -1.5 -1.3 2.0
Y - X
## [,1] [,2] [,3]
## [1,] 0.8 1.4 0.4
## [2,] 0.7 0.6 -0.8
## [3,] 1.5 1.3 -2.0
Perkalian Matriks
X %*% Y
## [,1] [,2] [,3]
## [1,] 171.65 150.04 146.51
## [2,] 187.02 164.16 162.08
## [3,] 203.78 178.24 174.26
Y %*% X
## [,1] [,2] [,3]
## [1,] 170.36 143.97 184.25
## [2,] 168.36 142.13 180.65
## [3,] 183.58 155.00 197.58
Perkalian antar elemen
X*Y
## [,1] [,2] [,3]
## [1,] 47.45 36.72 68.85
## [2,] 72.98 53.20 42.09
## [3,] 74.26 53.60 66.24
2*X
## [,1] [,2] [,3]
## [1,] 13.0 10.8 16.2
## [2,] 16.4 14.0 13.8
## [3,] 15.8 13.4 18.4
Transpose matriks adalah matriks yang dibentuk dengan mempertukarkan elemen-elemen didalam baris dan kolom dari matriks tersebut.
transX = t(X); transX
## [,1] [,2] [,3]
## [1,] 6.5 8.2 7.9
## [2,] 5.4 7.0 6.7
## [3,] 8.1 6.9 9.2
transY = t(Y); transY
## [,1] [,2] [,3]
## [1,] 7.3 8.9 9.4
## [2,] 6.8 7.6 8.0
## [3,] 8.5 6.1 7.2
Invers Matriks adalah matriks kebalikan dari suatu matriks A, dilambangkan \(A^{-1}\).. Syaratnya: matriks harus persegi dan determinannya \(\neq 0\).
inv_X = solve(X); inv_X
## [,1] [,2] [,3]
## [1,] 8.3848639 2.118136 -8.9709275
## [2,] -9.6585141 -1.933549 9.9538533
## [3,] -0.1661283 -0.410706 0.5629903
inv_Y = solve(Y); inv_Y
## [,1] [,2] [,3]
## [1,] -1.27147766 -4.089347 4.965636
## [2,] 1.44759450 5.871993 -6.683849
## [3,] 0.05154639 -1.185567 1.082474
Determinan Matriks adalah suatu nilai berupa bilangan yang diperoleh dari perhitungan pada matriks persegi (jumlah baris sama dengan jumlah kolom) dan digunakan untuk mengetahui sifat matriks, misalnya apakah matriks memiliki invers atau tidak
# determinan
det(X)
## [1] 2.167
det(Y)
## [1] -4.656
A <- matrix(21:40, nrow=4, ncol=5) ; A
## [,1] [,2] [,3] [,4] [,5]
## [1,] 21 25 29 33 37
## [2,] 22 26 30 34 38
## [3,] 23 27 31 35 39
## [4,] 24 28 32 36 40
b <- c(4,7,2,8,5,9,6,3,1,10,12,11);b
## [1] 4 7 2 8 5 9 6 3 1 10 12 11
B <- matrix(b, nrow=3, ncol=4, byrow=TRUE) ; B
## [,1] [,2] [,3] [,4]
## [1,] 4 7 2 8
## [2,] 5 9 6 3
## [3,] 1 10 12 11
sel <- c(15,9,27,18)
nama_kolom <- c("C1", "C2")
nama_baris <- c("R1", "R2")
C <- matrix(sel, nrow=2, ncol=2,
byrow=TRUE, dimnames=list(nama_baris,
nama_kolom)) ; C
## C1 C2
## R1 15 9
## R2 27 18
Akan Dipanggil elemen-elemen yang ada pada sebuah Matriks, cara nya sebagai berikut
Memanggil Komponen matriks
A
## [,1] [,2] [,3] [,4] [,5]
## [1,] 21 25 29 33 37
## [2,] 22 26 30 34 38
## [3,] 23 27 31 35 39
## [4,] 24 28 32 36 40
Memanggil kolom dua Matriks A
A[,2]
## [1] 25 26 27 28
Memanggil Baris 3 Matriks A
A[3,]
## [1] 23 27 31 35 39
Memanggil sel baris ke-3, kolom ke-2 Matriks A
A[3,2]
## [1] 27
Memanggil sel baris ke-1, kolom ke-2 dan sel baris ke-3, kolom ke-2 Matriks A
A[c(1,3),2]
## [1] 25 27
Memanggil kolom ke 1, 2 dan 3 Matriks A
A[,1:3]
## [,1] [,2] [,3]
## [1,] 21 25 29
## [2,] 22 26 30
## [3,] 23 27 31
## [4,] 24 28 32
Memanggil baris ke 2, 3 dan 4 Matriks A
A[2:4,]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 22 26 30 34 38
## [2,] 23 27 31 35 39
## [3,] 24 28 32 36 40
Eigen Value menunjukkan jumlah variasi (informasi) yang dapat dijelaskan oleh satu komponen dan Eigen Vector menunjukkan jumlah variasi (informasi) yang dijelaskan suatu komponen lainnya
eigX = eigen(X); eigX
## eigen() decomposition
## $values
## [1] 22.0140019 0.4816027 0.2043953
##
## $vectors
## [,1] [,2] [,3]
## [1,] 0.5268942 0.5218595 0.61121520
## [2,] 0.5752916 -0.8360039 -0.78978922
## [3,] 0.6256373 0.1695881 0.05146821
eigY = eigen(Y); eigY
## eigen() decomposition
## $values
## [1] 23.230397 -1.286223 0.155826
##
## $vectors
## [,1] [,2] [,3]
## [1,] -0.5633011 -0.7710948 0.5522886
## [2,] -0.5584126 0.5270180 -0.8126545
## [3,] -0.6089887 0.3573023 0.1859299
eigvalX = eigX$values; eigvalX
## [1] 22.0140019 0.4816027 0.2043953
eigvalY = eigY$values; eigvalY
## [1] 23.230397 -1.286223 0.155826
eigvecX = eigX$vectors; eigvecX
## [,1] [,2] [,3]
## [1,] 0.5268942 0.5218595 0.61121520
## [2,] 0.5752916 -0.8360039 -0.78978922
## [3,] 0.6256373 0.1695881 0.05146821
eigvecY = eigY$vectors; eigvecY
## [,1] [,2] [,3]
## [1,] -0.5633011 -0.7710948 0.5522886
## [2,] -0.5584126 0.5270180 -0.8126545
## [3,] -0.6089887 0.3573023 0.1859299
SVD (Singular Value Decomposition) adalah salah satu teknik dalam aljabar linear untuk memecah sebuah matriks \(A\) menjadi perkalian tiga matriks lain, yaitu:
\[A = U \,\Sigma \, V^T \]
library(MASS)
A <- matrix(c(5,-3,6,2,-4,8,-2,5,-1,7,3,9), 4, 3, byrow=TRUE); A
## [,1] [,2] [,3]
## [1,] 5 -3 6
## [2,] 2 -4 8
## [3,] -2 5 -1
## [4,] 7 3 9
svd_result <- svd(A)
singular_value <- svd_result$d ; singular_value
## [1] 16.07076 7.41936 3.11187
U <- svd_result$u ; U
## [,1] [,2] [,3]
## [1,] -0.5046975 0.2278362 -0.3742460
## [2,] -0.5178195 0.4138180 0.7413297
## [3,] 0.1646416 -0.6063789 0.5337354
## [4,] -0.6708477 -0.6396483 -0.1596770
V <- svd_result$v ; V
## [,1] [,2] [,3]
## [1,] -0.5341591 -0.17494276 -0.8270847
## [2,] 0.1490928 -0.98251336 0.1115295
## [3,] -0.8321330 -0.06373793 0.5509011
# MATRIKS JARAK #
library(factoextra)
## Warning: package 'factoextra' was built under R version 4.4.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.4.3
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
set.seed(321)
ss <- sample(1:50, 15)
df <- USArrests[ss, ]
df.scaled <- scale(df); df.scaled
## Murder Assault UrbanPop Rape
## Wyoming -0.3721741 -0.02296746 -0.3418930 -0.62039386
## Illinois 0.4221896 1.02244775 1.2520675 0.62633064
## Mississippi 1.6799322 1.14124493 -1.4507350 -0.39776448
## Kansas -0.5486994 -0.56943449 0.0739228 -0.26418686
## New York 0.5766492 1.08184634 1.4599754 0.93801176
## Kentucky 0.2677300 -0.64071280 -0.8963140 -0.51650015
## Oklahoma -0.4163054 -0.14176464 0.2125281 0.03265231
## Hawaii -0.7031590 -1.38913505 1.2520675 0.06233622
## Missouri 0.1132704 0.17898775 0.3511333 1.24969289
## New Mexico 0.6428462 1.45011760 0.3511333 1.82852926
## Louisiana 1.5254725 1.02244775 0.0739228 0.35917539
## South Dakota -1.0341439 -0.91394632 -1.3814324 -1.03596869
## Iowa -1.3871944 -1.27033787 -0.5498008 -1.25859806
## North Dakota -1.6961136 -1.40101477 -1.4507350 -1.85227639
## Texas 0.9296998 0.45222127 1.0441596 0.84896001
## attr(,"scaled:center")
## Murder Assault UrbanPop Rape
## 8.486667 162.933333 64.933333 19.780000
## attr(,"scaled:scale")
## Murder Assault UrbanPop Rape
## 4.531929 84.177081 14.429467 6.737655
Jarak Euclidean adalah Jarak lurus (garis terpendek) antara dua titik di ruang dimensi [jarak lurus standar].
contoh penggunaan nya adalah menghitung jarak garis lurus antar dua koordinat (GPS)
berikut syntax r nya :
dist.eucl <- dist(df.scaled, method = "euclidean"); dist.eucl
## Wyoming Illinois Mississippi Kansas New York Kentucky
## Illinois 2.4122476
## Mississippi 2.6164146 3.1543527
## Kansas 0.7934567 2.3786048 3.1993198
## New York 2.7921742 0.4095812 3.3878156 2.7128511
## Kentucky 1.0532156 2.9515362 2.3433244 1.2948587 3.2757206
## Oklahoma 0.8659748 1.8685718 2.9986711 0.5547563 2.2043102 1.4993175
## Hawaii 2.2322175 2.7203365 4.4270510 1.4800030 2.9246694 2.5403456
## Missouri 2.0625111 1.4167282 3.0563398 1.8349434 1.5351057 2.3176129
## New Mexico 3.1109091 1.5775154 3.0617092 3.1551035 1.4705638 3.4011133
## Louisiana 2.4137967 1.6360410 1.7133330 2.6879097 1.7776353 2.4609320
## South Dakota 1.5765126 3.9457686 3.4644086 1.7515852 4.3067435 1.5082173
## Iowa 1.7426214 3.9154083 4.0958166 1.6038155 4.2724405 1.9508929
## North Dakota 2.5296038 4.8794481 4.4694938 2.6181473 5.2524274 2.5546862
## Texas 2.4496576 0.8218968 2.9692463 2.3259192 0.8377979 2.6949264
## Oklahoma Hawaii Missouri New Mexico Louisiana South Dakota
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii 1.6491638
## Missouri 1.3724911 2.3123720
## New Mexico 2.6268378 3.7154012 1.4937447
## Louisiana 2.2916633 3.5012381 1.8909275 1.7882330
## South Dakota 2.1588538 2.9115203 3.2767510 4.4281177 3.7902169
## Iowa 2.1130016 2.3395756 3.3845451 4.6758935 4.0922753 0.9964108
## North Dakota 3.0891779 3.4578871 4.3173165 5.5131433 4.8442635 1.1604313
## Texas 1.8768374 2.5920693 1.1756214 1.5867966 1.3643137 3.8935265
## Iowa North Dakota
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii
## Missouri
## New Mexico
## Louisiana
## South Dakota
## Iowa
## North Dakota 1.1298867
## Texas 3.9137858 4.8837032
fviz_dist(dist.eucl)
Jarak Chebyshev adalah jarak ditentukan oleh selisih terbesar, Jarak maksimum di antara perbedaan koordinat. Fokus pada dimensi dengan selisih terbesar.
contoh penggunaanya adalah :Berguna di quality control multivariat, misalnya mengecek dimensi produk (lebar, panjang, tinggi)
dist.cheb <- dist(df.scaled, method = "maximum"); dist.cheb
## Wyoming Illinois Mississippi Kansas New York Kentucky
## Illinois 1.5939604
## Mississippi 2.0521063 2.7028025
## Kansas 0.5464670 1.5918822 2.2286315
## New York 1.8018683 0.3116811 2.9107104 1.6512808
## Kentucky 0.6399041 2.1483815 1.7819577 0.9702368 2.3562894
## Oklahoma 0.6530462 1.1642124 2.0962376 0.4276699 1.2474473 1.1088421
## Hawaii 1.5939604 2.4115828 2.7028025 1.1781447 2.4709814 2.1483815
## Missouri 1.8700867 0.9009342 1.8018683 1.5138797 1.1088421 1.7661930
## New Mexico 2.4489231 1.2021986 2.2262937 2.0927161 1.1088421 2.3450294
## Louisiana 1.8976467 1.1781447 1.5246578 2.0741719 1.3860526 1.6631605
## South Dakota 1.0395394 2.6334999 2.7140760 1.4553552 2.8414078 1.3018739
## Iowa 1.2473704 2.2927856 3.0671266 0.9944112 2.3521842 1.6549244
## North Dakota 1.3780473 2.7028025 3.3760458 1.5880895 2.9107104 1.9638436
## Texas 1.4693539 0.5702265 2.4948946 1.4783991 0.6296251 1.9404736
## Oklahoma Hawaii Missouri New Mexico Louisiana South Dakota
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii 1.2473704
## Missouri 1.2170406 1.5681228
## New Mexico 1.7958770 2.8392526 1.2711298
## Louisiana 1.9417780 2.4115828 1.4122022 1.4693539
## South Dakota 1.5939604 2.6334999 2.2856616 2.8644979 2.5596164
## Iowa 1.2912504 1.8018683 2.5082909 3.0871273 2.9126670 0.8316315
## North Dakota 1.8849287 2.7028025 3.1019693 3.6808057 3.2215862 0.8163077
## Texas 1.3460052 1.8413563 0.8164294 0.9978963 0.9702368 2.4255920
## Iowa North Dakota
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii
## Missouri
## New Mexico
## Louisiana
## South Dakota
## Iowa
## North Dakota 0.9009342
## Texas 2.3168942 2.7012364
fviz_dist(dist.cheb)
Jarak Manhattan adalah Jumlah perbedaan absolut antar koordinat, seperti berjalan di jalan kota berbentuk grid [jarak berbasis grid (jumlah selisih)].
contoh penggunaanya adalah menghitung jarak dalam gudang/grid jalan yang tidak memungkinkan jalur diagonal. dan menghitung jarak antar dokumen berdasarkan frekuensi kata (NLP).
dist.man <- dist(df.scaled, method = "manhattan"); dist.man
## Wyoming Illinois Mississippi Kansas New York Kentucky
## Illinois 4.6804639
## Mississippi 4.5477901 5.1034373
## Kansas 1.4950151 4.6314334 5.5975464
## New York 5.4139111 0.7334472 5.4091682 5.3648806
## Kentucky 1.9159642 5.1088324 3.8673166 2.1102578 5.8422796
## Oklahoma 1.3703957 3.6359252 5.4729270 0.9955082 4.3693724 2.8409781
## Hawaii 3.9738430 4.1009258 8.0763743 2.4788279 4.8343730 4.4465291
## Missouri 3.2505127 2.6766756 5.9782446 3.2014823 2.7867606 3.9878005
## New Mexico 5.6300548 2.7514592 5.3741207 5.5810243 2.4338278 6.0584233
## Louisiana 4.3384469 2.5485829 2.5548545 4.2894164 2.9731109 4.7668154
## South Dakota 3.0080629 7.6885267 5.4767741 3.0570933 8.4219740 2.5796943
## Iowa 3.1085028 7.7889667 7.2404771 3.1575333 8.5224139 3.3731605
## North Dakota 5.0427114 9.7231753 7.3728174 5.0917419 10.4566225 4.6143429
## Texas 4.6324690 1.5082739 5.1808752 4.5834386 1.4875431 5.0608376
## Oklahoma Hawaii Missouri New Mexico Louisiana
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii 2.6034473
## Missouri 2.2059740 4.4728430
## New Mexico 4.5855161 6.8523850 2.3795420
## Louisiana 3.5711187 6.1151982 3.4233902 3.0568606
## South Dakota 4.0526016 4.5379784 6.2585756 8.6381176 7.3465098
## Iowa 4.1530415 3.9256352 6.3590155 8.7385576 7.4469497
## North Dakota 6.0872501 5.6222495 8.2932241 10.6727662 9.3811583
## Texas 3.5879303 4.4687467 2.1834220 2.9573454 2.6260207
## South Dakota Iowa North Dakota
## Illinois
## Mississippi
## Kansas
## New York
## Kentucky
## Oklahoma
## Hawaii
## Missouri
## New Mexico
## Louisiana
## South Dakota
## Iowa 1.7637030
## North Dakota 2.0346485 1.9342086
## Texas 7.6405319 7.7409718 9.6751804
fviz_dist(dist.man)
Mahalanobis adalah Jarak antar titik yang mempertimbangkan skala (varians) dan korelasi antar variabel. Contoh penggunaany adalah mendeteksi transaksi keuangan yang tidak wajar dan memisahkan kelompok dengan varians dan korelasi berbeda (Analisis Diskriminan)
Berikut Syntax r nya:
library(StatMatch)
## Warning: package 'StatMatch' was built under R version 4.4.3
## Loading required package: proxy
## Warning: package 'proxy' was built under R version 4.4.3
##
## Attaching package: 'proxy'
## The following objects are masked from 'package:stats':
##
## as.dist, dist
## The following object is masked from 'package:base':
##
## as.matrix
## Loading required package: survey
## Warning: package 'survey' was built under R version 4.4.3
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
## dotchart
## Loading required package: lpSolve
## Warning: package 'lpSolve' was built under R version 4.4.2
## Loading required package: dplyr
## Warning: package 'dplyr' was built under R version 4.4.3
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
dist.mah <- mahalanobis.dist(df.scaled); dist.mah
## Wyoming Illinois Mississippi Kansas New York Kentucky
## Wyoming 0.000000 1.7186109 2.820779 1.4195095 1.8695558 2.867847
## Illinois 1.718611 0.0000000 3.658323 2.2905255 0.4722069 3.878642
## Mississippi 2.820779 3.6583235 0.000000 3.2139075 3.6566922 2.544477
## Kansas 1.419510 2.2905255 3.213907 0.0000000 2.1522535 2.048031
## New York 1.869556 0.4722069 3.656692 2.1522535 0.0000000 3.698342
## Kentucky 2.867847 3.8786421 2.544477 2.0480310 3.6983422 0.000000
## Oklahoma 1.146496 1.8980286 3.237573 0.6499978 1.7772007 2.505941
## Hawaii 3.466671 3.6449604 4.722203 2.2108491 3.3748818 2.753554
## Missouri 3.198071 3.6796400 3.956918 2.2592572 3.3618939 2.642756
## New Mexico 3.281318 3.5101406 4.057258 3.1016653 3.2869855 3.870023
## Louisiana 2.284940 2.5550539 1.688058 2.2700723 2.4136664 2.119635
## South Dakota 1.826205 3.3564158 3.087365 1.6274307 3.3404110 2.261154
## Iowa 1.327907 2.6329606 3.559587 1.1128197 2.6839965 2.621704
## North Dakota 1.582582 3.1919907 3.553572 1.9466491 3.3317039 3.040465
## Texas 2.540604 2.4769381 3.093919 1.7462066 2.1399545 2.108949
## Oklahoma Hawaii Missouri New Mexico Louisiana South Dakota
## Wyoming 1.1464956 3.466671 3.198071 3.281318 2.284940 1.826205
## Illinois 1.8980286 3.644960 3.679640 3.510141 2.555054 3.356416
## Mississippi 3.2375727 4.722203 3.956918 4.057258 1.688058 3.087365
## Kansas 0.6499978 2.210849 2.259257 3.101665 2.270072 1.627431
## New York 1.7772007 3.374882 3.361894 3.286985 2.413666 3.340411
## Kentucky 2.5059414 2.753554 2.642756 3.870023 2.119635 2.261154
## Oklahoma 0.0000000 2.705865 2.203038 2.660216 2.350208 1.672866
## Hawaii 2.7058650 0.000000 3.193764 4.645567 3.383255 3.551072
## Missouri 2.2030382 3.193764 0.000000 1.836797 3.256319 2.505784
## New Mexico 2.6602159 4.645567 1.836797 0.000000 3.676879 3.026024
## Louisiana 2.3502077 3.383255 3.256319 3.676879 0.000000 3.021642
## South Dakota 1.6728664 3.551072 2.505784 3.026024 3.021642 0.000000
## Iowa 1.3299426 2.790197 3.145245 3.792086 2.954252 1.518854
## North Dakota 1.9813596 3.780966 3.590548 3.950259 3.434074 1.304743
## Texas 1.9635201 2.082005 2.576037 3.501666 1.527269 3.090805
## Iowa North Dakota Texas
## Wyoming 1.327907 1.582582 2.540604
## Illinois 2.632961 3.191991 2.476938
## Mississippi 3.559587 3.553572 3.093919
## Kansas 1.112820 1.946649 1.746207
## New York 2.683996 3.331704 2.139954
## Kentucky 2.621704 3.040465 2.108949
## Oklahoma 1.329943 1.981360 1.963520
## Hawaii 2.790197 3.780966 2.082005
## Missouri 3.145245 3.590548 2.576037
## New Mexico 3.792086 3.950259 3.501666
## Louisiana 2.954252 3.434074 1.527269
## South Dakota 1.518854 1.304743 3.090805
## Iowa 0.000000 1.045923 2.734770
## North Dakota 1.045923 0.000000 3.563193
## Texas 2.734770 3.563193 0.000000
dist.mah_matrix <- as.matrix(dist.mah);dist.mah_matrix
## Wyoming Illinois Mississippi Kansas New York Kentucky
## Wyoming 0.000000 1.7186109 2.820779 1.4195095 1.8695558 2.867847
## Illinois 1.718611 0.0000000 3.658323 2.2905255 0.4722069 3.878642
## Mississippi 2.820779 3.6583235 0.000000 3.2139075 3.6566922 2.544477
## Kansas 1.419510 2.2905255 3.213907 0.0000000 2.1522535 2.048031
## New York 1.869556 0.4722069 3.656692 2.1522535 0.0000000 3.698342
## Kentucky 2.867847 3.8786421 2.544477 2.0480310 3.6983422 0.000000
## Oklahoma 1.146496 1.8980286 3.237573 0.6499978 1.7772007 2.505941
## Hawaii 3.466671 3.6449604 4.722203 2.2108491 3.3748818 2.753554
## Missouri 3.198071 3.6796400 3.956918 2.2592572 3.3618939 2.642756
## New Mexico 3.281318 3.5101406 4.057258 3.1016653 3.2869855 3.870023
## Louisiana 2.284940 2.5550539 1.688058 2.2700723 2.4136664 2.119635
## South Dakota 1.826205 3.3564158 3.087365 1.6274307 3.3404110 2.261154
## Iowa 1.327907 2.6329606 3.559587 1.1128197 2.6839965 2.621704
## North Dakota 1.582582 3.1919907 3.553572 1.9466491 3.3317039 3.040465
## Texas 2.540604 2.4769381 3.093919 1.7462066 2.1399545 2.108949
## Oklahoma Hawaii Missouri New Mexico Louisiana South Dakota
## Wyoming 1.1464956 3.466671 3.198071 3.281318 2.284940 1.826205
## Illinois 1.8980286 3.644960 3.679640 3.510141 2.555054 3.356416
## Mississippi 3.2375727 4.722203 3.956918 4.057258 1.688058 3.087365
## Kansas 0.6499978 2.210849 2.259257 3.101665 2.270072 1.627431
## New York 1.7772007 3.374882 3.361894 3.286985 2.413666 3.340411
## Kentucky 2.5059414 2.753554 2.642756 3.870023 2.119635 2.261154
## Oklahoma 0.0000000 2.705865 2.203038 2.660216 2.350208 1.672866
## Hawaii 2.7058650 0.000000 3.193764 4.645567 3.383255 3.551072
## Missouri 2.2030382 3.193764 0.000000 1.836797 3.256319 2.505784
## New Mexico 2.6602159 4.645567 1.836797 0.000000 3.676879 3.026024
## Louisiana 2.3502077 3.383255 3.256319 3.676879 0.000000 3.021642
## South Dakota 1.6728664 3.551072 2.505784 3.026024 3.021642 0.000000
## Iowa 1.3299426 2.790197 3.145245 3.792086 2.954252 1.518854
## North Dakota 1.9813596 3.780966 3.590548 3.950259 3.434074 1.304743
## Texas 1.9635201 2.082005 2.576037 3.501666 1.527269 3.090805
## Iowa North Dakota Texas
## Wyoming 1.327907 1.582582 2.540604
## Illinois 2.632961 3.191991 2.476938
## Mississippi 3.559587 3.553572 3.093919
## Kansas 1.112820 1.946649 1.746207
## New York 2.683996 3.331704 2.139954
## Kentucky 2.621704 3.040465 2.108949
## Oklahoma 1.329943 1.981360 1.963520
## Hawaii 2.790197 3.780966 2.082005
## Missouri 3.145245 3.590548 2.576037
## New Mexico 3.792086 3.950259 3.501666
## Louisiana 2.954252 3.434074 1.527269
## South Dakota 1.518854 1.304743 3.090805
## Iowa 0.000000 1.045923 2.734770
## North Dakota 1.045923 0.000000 3.563193
## Texas 2.734770 3.563193 0.000000
Jarak Minkowski adalah ukuran jarak antara dua titik dalam ruang vektor yang ditentukan oleh sebuah parameter p untuk mencari jarak umum karena menjadi bentuk dasar yang mencakup berbagai jenis jarak lain p=1 jarak Manhattan, p=2 jarak Euclidean, p->tak hingga jarak chebyshev
set.seed(123)
# Data random (5 observasi dengan 3 variabel)
data <- matrix(runif(15, min = 1, max = 10), nrow = 5, ncol = 3)
colnames(data) <- c("X1", "X2", "X3")
print("Data random:")
## [1] "Data random:"
print(data)
## X1 X2 X3
## [1,] 3.588198 1.410008 9.611500
## [2,] 8.094746 5.752949 5.080007
## [3,] 4.680792 9.031771 7.098136
## [4,] 8.947157 5.962915 6.153701
## [5,] 9.464206 5.109533 1.926322
# Tentukan dua titik yang akan dihitung jaraknya
p1 <- data[1, ];p1
## X1 X2 X3
## 3.588198 1.410008 9.611500
p2 <- data[2, ];p2
## X1 X2 X3
## 8.094746 5.752949 5.080007
# Fungsi jarak Minkowski
minkowski_distance <- function(x, y, p) {
sum(abs(x - y)^p)^(1/p)
}
# Contoh penggunaan dengan p = 1 (Manhattan), p = 2 (Euclidean), p = 3 (Minkowski umum)
dist_p1 <- minkowski_distance(p1, p2, p = 1);dist_p1
## [1] 13.38098
dist_p2 <- minkowski_distance(p1, p2, p = 2);dist_p2
## [1] 7.726871
dist_p3 <- minkowski_distance(p1, p2, p = 3);dist_p3
## [1] 6.435156
dist_inf <- max(abs(p1 - p2));dist_inf
## [1] 4.531493
Vektor rata-rata adalah vektor yang diperoleh dengan menghitung rata-rata dari sekumpulan vektor, di mana setiap komponennya merupakan rata-rata dari komponen-komponen yang bersesuaian pada vektor-vektor tersebut.
Berikut Contoh nya :
BB = c(6.2,11.5,8.7,10.1,7.8,6.9,12.0,3.1,14.8,9.4)
PM = c(61,73,68,70,64,60,76,49,84,71)
RTB = c(115,138,127,123,131,120,143,95,160,128)
lizard = as.matrix(cbind(BB,PM,RTB)); lizard
## BB PM RTB
## [1,] 6.2 61 115
## [2,] 11.5 73 138
## [3,] 8.7 68 127
## [4,] 10.1 70 123
## [5,] 7.8 64 131
## [6,] 6.9 60 120
## [7,] 12.0 76 143
## [8,] 3.1 49 95
## [9,] 14.8 84 160
## [10,] 9.4 71 128
Matriks rata-rata adalah matriks yang elemennya berisi nilai rata-rata dari data yang berbentuk matriks. Misalnya, jika data terdiri dari beberapa vektor atau observasi, maka setiap kolom dirata-ratakan sehingga diperoleh vektor rata-rata, lalu hasilnya bisa ditulis dalam bentuk matriks. Matriks ini dipakai sebagai representasi pusat data
# Matriks Rata-Rata
vecMeans = as.matrix(colMeans(lizard)); vecMeans
## [,1]
## BB 9.05
## PM 67.60
## RTB 128.00
vecRata = matrix(c(mean(BB), mean(PM), mean(RTB)), nrow=3, ncol=1); vecRata
## [,1]
## [1,] 9.05
## [2,] 67.60
## [3,] 128.00
Matriks kovarian adalah matriks yang menunjukkan hubungan antar variabel dalam data. Diagonal utama matriks berisi varians setiap variabel, sedangkan elemen di luardiagonal berisi kovarian (hubungan linier) antara dua variabel
varkov = cov(lizard); varkov
## BB PM RTB
## BB 10.98056 31.80000 54.96667
## PM 31.80000 94.04444 160.22222
## RTB 54.96667 160.22222 300.66667
Matriks korelasi mirip dengan matriks kovarian, tetapi berisi koefisien korelasi (nilai antara –1 sampai 1) antara setiap pasangan variabel. Elemen diagonal selalu 1 karena variabel berkorelasi sempurna dengan dirinya sendiri. Matriks ini berguna untuk melihat seberapa kuat hubungan antar variabel
korel = cor(lizard); korel
## BB PM RTB
## BB 1.0000000 0.9895743 0.9566313
## PM 0.9895743 1.0000000 0.9528259
## RTB 0.9566313 0.9528259 1.0000000
Matriks standarisasi adalah matriks yang sudah distandarisasi, artinya setiap variabel diubah sehingga memiliki rata-rata 0 dan standar deviasi 1 atau berdistribusi normal .
n = nrow(lizard);n
## [1] 10
u = matrix(1,n,1); u
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 1
## [5,] 1
## [6,] 1
## [7,] 1
## [8,] 1
## [9,] 1
## [10,] 1
xbar = cbind((1/n)*t(u)%*%lizard); xbar
## BB PM RTB
## [1,] 9.05 67.6 128
D = lizard - u %*% xbar; D
## BB PM RTB
## [1,] -2.85 -6.6 -13
## [2,] 2.45 5.4 10
## [3,] -0.35 0.4 -1
## [4,] 1.05 2.4 -5
## [5,] -1.25 -3.6 3
## [6,] -2.15 -7.6 -8
## [7,] 2.95 8.4 15
## [8,] -5.95 -18.6 -33
## [9,] 5.75 16.4 32
## [10,] 0.35 3.4 0
S = (1/(n-1))*t(D)%*%D; S
## BB PM RTB
## BB 10.98056 31.80000 54.96667
## PM 31.80000 94.04444 160.22222
## RTB 54.96667 160.22222 300.66667
Ds = diag(sqrt(diag(S))); Ds
## [,1] [,2] [,3]
## [1,] 3.313692 0.000000 0.00000
## [2,] 0.000000 9.697651 0.00000
## [3,] 0.000000 0.000000 17.33974
R = solve(Ds) %*% S %*% solve(Ds); R
## [,1] [,2] [,3]
## [1,] 1.0000000 0.9895743 0.9566313
## [2,] 0.9895743 1.0000000 0.9528259
## [3,] 0.9566313 0.9528259 1.0000000