Multivariate Analysis Assignment 2023F

Group Members

Farah Raina Febiana (23031554132)
Reva Deshinta Isyana (23031554153)
Salsa Rahma Aulia (23031554219)

Assignment Week 1

# Create the credit data set with the required variables
data_credit <- data.frame(
  Age = c(30.83, 58.67, 24.50, 27.83),    
  Debt = c(0.000, 4.460, 0.500, 1.540),  
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75),   
  Income = c(0, 560, 824, 3)  
)

# Display the credit data as a table
library(knitr)
kable(data_credit, caption = "Dataset Credit")

Dataset Credit
Age	Debt	YearsEmployed	Income
30.83	0.00	1.25	0
58.67	4.46	3.04	560
24.50	0.50	1.50	824
27.83	1.54	3.75	3

The Credit dataset has been created. The next steps are to compute the variance-covariance matrix, the eigenvalues and eigenvectors, and the correlation matrix.

Variance-Covariance Matrix

# Compute the variance-covariance matrix
cov_matrix <- cov(data_credit)
print(cov_matrix)

##                      Age       Debt YearsEmployed      Income
## Age            246.15983  28.767550      6.580750   1315.7125
## Debt            28.76755   3.983567      1.526967    220.1150
## YearsEmployed    6.58075   1.526967      1.454567   -119.4483
## Income        1315.71250 220.115000   -119.448333 170547.5833

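The variance-covariance matrix holds the variance of each variable on its diagonal and the covariance of each pair of variables off the diagonal. The sign of a covariance shows the direction of the linear relationship. Its magnitude, however, depends on the units of the variables: Income has by far the largest variance (170547.58) simply because its values are large. How strongly two variables are related is therefore better judged from the correlation matrix.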
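The following sketch makes the diagonal/off-diagonal reading concrete. It rebuilds the four-row data from the table above and recomputes the covariance matrix by hand from mean-centered columns, S = t(Xc) %*% Xc / (n - 1). It then rescales the result to correlations with base R's cov2cor(). The names Xc and S_manual are illustrative only.

```r
# Rebuild the four-row credit data from the table above
data_credit <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83),
  Debt          = c(0.000, 4.460, 0.500, 1.540),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75),
  Income        = c(0, 560, 824, 3)
)
n <- nrow(data_credit)

# Center each column, then S = t(Xc) %*% Xc / (n - 1)
Xc <- scale(data_credit, center = TRUE, scale = FALSE)
S_manual <- crossprod(Xc) / (n - 1)

# Compare with cov(); attributes such as dimnames are not compared
cov_matrix <- cov(data_credit)
print(all.equal(S_manual, cov_matrix, check.attributes = FALSE))

# Dividing each covariance by the two standard deviations gives correlations
print(round(cov2cor(cov_matrix), 4))
```

If all.equal() prints TRUE, the hand computation and cov() agree. The cov2cor() output should match the correlation matrix computed later in this assignment.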

Eigenvalues and Eigenvectors

# Compute the eigenvalues and eigenvectors of the covariance matrix
eig <- eigen(cov_matrix)

# Display the eigenvalues
print(eig$values)

## [1]  1.705581e+05  2.393532e+02  1.712288e+00 -1.632702e-11

# Display the eigenvectors
print(eig$vectors)

##               [,1]         [,2]          [,3]          [,4]
## [1,] -0.0077252614  0.992899630  0.0928018443  0.0740166350
## [2,] -0.0012918442  0.114272856 -0.5786760906 -0.8075110078
## [3,]  0.0007000127  0.032113304 -0.8102601508  0.5851894853
## [4,] -0.9999690803 -0.007795793 -0.0005365682  0.0008810479

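Eigenvalues give the variance captured by each principal component: the larger the eigenvalue, the more of the total variation that component explains. Here the first eigenvalue (1.706e+05) accounts for about 99.86% of the total variance. This is mainly because Income is measured on a much larger scale than the other variables, and a correlation-based analysis would avoid this scale effect. The last eigenvalue (-1.6e-11) is zero up to rounding error: with only four observations, the covariance matrix has rank at most three.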
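The share of variance behind each eigenvalue can be checked directly. This sketch reuses the same four rows. It divides each eigenvalue by their sum, confirms that the sum equals the trace of the covariance matrix, and counts the eigenvalues that are not negligible. The 1e-10 relative cutoff is an assumption chosen only to separate rounding noise from real variance.

```r
data_credit <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83),
  Debt          = c(0.000, 4.460, 0.500, 1.540),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75),
  Income        = c(0, 560, 824, 3)
)
cov_matrix <- cov(data_credit)
eig <- eigen(cov_matrix)

# Proportion and cumulative proportion of total variance per component
prop_var <- eig$values / sum(eig$values)
print(round(prop_var, 5))
print(round(cumsum(prop_var), 5))

# The eigenvalues sum to the trace of S, i.e. the total variance
print(all.equal(sum(eig$values), sum(diag(cov_matrix))))

# Four rows give at most n - 1 = 3 non-zero eigenvalues;
# count those that are not negligible relative to the largest one
print(sum(eig$values > max(eig$values) * 1e-10))
```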

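Eigenvectors give the direction of each component in the space of the original variables, i.e. the weights with which the original variables are combined into each new component. The first eigenvector is almost entirely Income (-0.99997), while the second is dominated by Age (0.993). The overall sign of each eigenvector is arbitrary.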
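The next sketch shows what the eigenvectors do, using the same four-row data. Each column satisfies S v = lambda v, and the columns are orthonormal. Projecting the centered data onto them gives principal component scores whose variances reproduce the eigenvalues. The variable names V and scores are illustrative.

```r
data_credit <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83),
  Debt          = c(0.000, 4.460, 0.500, 1.540),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75),
  Income        = c(0, 560, 824, 3)
)
cov_matrix <- cov(data_credit)
eig <- eigen(cov_matrix)
V <- eig$vectors

# Each column v_k satisfies S %*% v_k = lambda_k * v_k
print(all.equal(cov_matrix %*% V, V %*% diag(eig$values),
                check.attributes = FALSE))

# The eigenvectors are orthonormal: t(V) %*% V is the identity
print(all.equal(crossprod(V), diag(ncol(V))))

# Principal component scores: centered data projected onto the eigenvectors
scores <- scale(data_credit, center = TRUE, scale = FALSE) %*% V

# The variance of each score column reproduces the matching eigenvalue
print(round(apply(scores, 2, var), 4))
```

prcomp(data_credit) should give the same components up to the sign of each column, with sdev^2 equal to these eigenvalues.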

Correlation Matrix

# Compute the correlation matrix
cor_matrix <- cor(data_credit)

# Display the correlation matrix
print(cor_matrix)

##                     Age      Debt YearsEmployed     Income
## Age           1.0000000 0.9186673     0.3477763  0.2030625
## Debt          0.9186673 1.0000000     0.6343467  0.2670489
## YearsEmployed 0.3477763 0.6343467     1.0000000 -0.2398228
## Income        0.2030625 0.2670489    -0.2398228  1.0000000

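The correlation matrix measures the linear relationship between each pair of variables on a scale from -1 to 1. Values close to 1 or -1 indicate a strong linear relationship, while values close to 0 indicate a weak or no linear relationship. Whether a correlation is statistically significant also depends on the sample size. With only four observations, even the Age-Debt correlation of 0.92 should be interpreted with caution.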
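The following sketch illustrates how sample size affects significance. It runs base R's cor.test() on the strongest pair, Age and Debt, and reproduces its t statistic by hand. It is an illustration on the four rows above, not part of the original assignment.

```r
data_credit <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83),
  Debt          = c(0.000, 4.460, 0.500, 1.540),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75),
  Income        = c(0, 560, 824, 3)
)

# Pearson test of H0: rho = 0 for Age and Debt; the t statistic has n - 2 = 2 df
ct <- cor.test(data_credit$Age, data_credit$Debt, method = "pearson")
print(ct)

# The same statistic by hand: t = r * sqrt(n - 2) / sqrt(1 - r^2)
r <- cor(data_credit$Age, data_credit$Debt)
n <- nrow(data_credit)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
print(all.equal(unname(ct$statistic), t_manual))
```

With n - 2 = 2 degrees of freedom, the two-sided p-value reduces to 1 - |r|, about 0.08 here. So even r = 0.92 does not reach the 5% level with four observations.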

Assignment Week 3

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

path <- "D:\\datasetweek3.csv"

mf <- read_csv(path, show_col_types = FALSE)

# Standardize each variable set; scale() returns matrices, not data frames
X <- mf %>% dplyr::select(`Sistolik`, `Diastolik`) %>%
  scale()

Y <- mf %>% dplyr::select(`Tinggi`, `Berat`) %>%
  scale()

cc <- cancor(X,Y)

str(cc)

## List of 5
##  $ cor    : num [1:2] 0.721 0.195
##  $ xcoef  : num [1:2, 1:2] 0.214 -0.595 0.692 -0.412
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "Sistolik" "Diastolik"
##   .. ..$ : NULL
##  $ ycoef  : num [1:2, 1:2] 0.228 -0.648 -1.123 0.945
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "Tinggi" "Berat"
##   .. ..$ : NULL
##  $ xcenter: Named num [1:2] 3.98e-16 1.15e-15
##   ..- attr(*, "names")= chr [1:2] "Sistolik" "Diastolik"
##  $ ycenter: Named num [1:2] -1.11e-15 -4.28e-16
##   ..- attr(*, "names")= chr [1:2] "Tinggi" "Berat"

print(cc)

## $cor
## [1] 0.7206701 0.1953755
## 
## $xcoef
##                 [,1]       [,2]
## Sistolik   0.2135035  0.6918022
## Diastolik -0.5952280 -0.4121620
## 
## $ycoef
##              [,1]       [,2]
## Tinggi  0.2281666 -1.1233476
## Berat  -0.6483502  0.9453105
## 
## $xcenter
##     Sistolik    Diastolik 
## 3.978299e-16 1.147230e-15 
## 
## $ycenter
##        Tinggi         Berat 
## -1.110223e-15 -4.278985e-16

cc$cor

## [1] 0.7206701 0.1953755

Analysis Results

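The first canonical correlation (0.7207) indicates a fairly strong relationship between the linear combinations of the two variable sets. In the first pair, Diastolik (-0.595) has the largest absolute coefficient in the X set, and Berat (-0.648) has the largest in the Y set.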

The second canonical correlation (0.1954) is much weaker, so the relationship captured by the second pair is of little practical importance.
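The canonical correlations can be reproduced from the canonical coefficients. The sketch below assumes that the six observations shown in the glimpse() output further down are the full contents of datasetweek3.csv; the CSV itself is not reproduced here. It uses base R instead of the tidyverse, forms the variates U = X %*% xcoef and V = Y %*% ycoef, and correlates them.

```r
# Six observations reconstructed from the glimpse() output of datasetweek3.csv
mf <- data.frame(
  Sistolik  = c(120, 109, 130, 121, 135, 140),
  Diastolik = c(76, 80, 82, 78, 85, 87),
  Tinggi    = c(165, 180, 170, 185, 180, 187),
  Berat     = c(60, 80, 70, 85, 90, 87)
)
X <- scale(mf[, c("Sistolik", "Diastolik")])
Y <- scale(mf[, c("Tinggi", "Berat")])
cc <- cancor(X, Y)

# Canonical variates: each column is one linear combination of its set
U <- X %*% cc$xcoef
V <- Y %*% cc$ycoef

# Diagonal = canonical correlations; off-diagonal entries are (numerically) zero
print(round(cor(U, V), 4))
print(round(cc$cor, 4))
```

The diagonal of cor(U, V) should equal cc$cor. The off-diagonal entries should be essentially zero, because a variate from one pair is uncorrelated with the variates of the other pairs.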

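# First and second canonical variates of each set (data %*% coefficients);
# each result is a 6 x 1 matrix, which glimpse() shows as <dbl[,1]>
CC1_X <- as.matrix(X) %*% cc$xcoef[, 1]
CC1_Y <- as.matrix(Y) %*% cc$ycoef[, 1]

CC2_X <- as.matrix(X) %*% cc$xcoef[, 2]
CC2_Y <- as.matrix(Y) %*% cc$ycoef[, 2]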

cca_df <- mf %>% 
  mutate(CC1_X=CC1_X,
         CC1_Y=CC1_Y,
         CC2_X=CC2_X,
         CC2_Y=CC2_Y) %>%
  glimpse()

## Rows: 6
## Columns: 9
## $ No        <dbl> 1, 2, 3, 4, 5, 6
## $ Sistolik  <dbl> 120, 109, 130, 121, 135, 140
## $ Diastolik <dbl> 76, 80, 82, 78, 85, 87
## $ Tinggi    <dbl> 165, 180, 170, 185, 180, 187
## $ Berat     <dbl> 60, 80, 70, 85, 90, 87
## $ CC1_X     <dbl[,1]> <matrix[6 x 1]>
## $ CC1_Y     <dbl[,1]> <matrix[6 x 1]>
## $ CC2_X     <dbl[,1]> <matrix[6 x 1]>
## $ CC2_Y     <dbl[,1]> <matrix[6 x 1]>
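The glimpse() output above lists CC1_X to CC2_Y as <dbl[,1]> matrix columns, because X %*% b returns an n x 1 matrix. One way to store them as plain numeric columns is to wrap the product in drop(). The minimal sketch below does this on the same six observations from the glimpse() output. It uses base R's data.frame() in place of dplyr::mutate(); inside mutate(), CC1_X = drop(CC1_X) would do the same.

```r
# Six observations reconstructed from the glimpse() output above
mf <- data.frame(
  No        = 1:6,
  Sistolik  = c(120, 109, 130, 121, 135, 140),
  Diastolik = c(76, 80, 82, 78, 85, 87),
  Tinggi    = c(165, 180, 170, 185, 180, 187),
  Berat     = c(60, 80, 70, 85, 90, 87)
)
X <- scale(mf[, c("Sistolik", "Diastolik")])
Y <- scale(mf[, c("Tinggi", "Berat")])
cc <- cancor(X, Y)

# X %*% b is a 6 x 1 matrix; drop() removes the extra dimension
cca_df <- data.frame(
  mf,
  CC1_X = drop(X %*% cc$xcoef[, 1]),
  CC1_Y = drop(Y %*% cc$ycoef[, 1]),
  CC2_X = drop(X %*% cc$xcoef[, 2]),
  CC2_Y = drop(Y %*% cc$ycoef[, 2])
)
str(cca_df)
```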

cca_df %>% 
  ggplot(aes(x=CC1_X,y=CC1_Y, color=`Tinggi`))+
  geom_point()

Analysis Results

The canonical coefficients show that Sistolik (X1) carries more weight in the second pair (0.692, versus 0.214 in the first). Diastolik (X2) contributes negatively in both pairs, with different magnitudes (-0.595 and -0.412). In the Y set, Tinggi (Y1) dominates the second pair with a negative coefficient (-1.123), whereas Berat (Y2) has a large positive coefficient (0.945) in that pair. The signs of a pair's coefficients are only determined up to a joint flip of both sets, so directions should be read relative to each other within a pair.
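Raw canonical coefficients can be hard to compare when the variables within a set are correlated with each other. A common complement is the structure coefficients: the correlations between each original variable and its canonical variate. This sketch computes them on the six observations from the glimpse() output. It also checks the sign indeterminacy: negating both coefficient vectors of a pair leaves its canonical correlation unchanged, while negating only one flips its sign.

```r
# Six observations reconstructed from the glimpse() output
mf <- data.frame(
  Sistolik  = c(120, 109, 130, 121, 135, 140),
  Diastolik = c(76, 80, 82, 78, 85, 87),
  Tinggi    = c(165, 180, 170, 185, 180, 187),
  Berat     = c(60, 80, 70, 85, 90, 87)
)
X <- scale(mf[, c("Sistolik", "Diastolik")])
Y <- scale(mf[, c("Tinggi", "Berat")])
cc <- cancor(X, Y)
U <- X %*% cc$xcoef
V <- Y %*% cc$ycoef

# Structure coefficients: correlation of each original variable with each variate
print(round(cor(X, U), 3))
print(round(cor(Y, V), 3))

# Negating both coefficient vectors of pair 1 leaves r unchanged
r_both <- cor(X %*% (-cc$xcoef[, 1]), Y %*% (-cc$ycoef[, 1]))
# Negating only one of them flips the sign of r
r_one <- cor(X %*% (-cc$xcoef[, 1]), Y %*% cc$ycoef[, 1])
print(c(both_flipped = drop(r_both), one_flipped = drop(r_one)))
```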

These results show that the first pair captures a much stronger relationship between the two variable sets, with a canonical correlation of 0.7207. The second pair has a far smaller correlation (0.1954), so the relationship it captures is not very meaningful.
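No significance test is run for the canonical correlations in this assignment. As a hedged sketch, Bartlett's chi-square approximation to Wilks' lambda tests, for each k, whether canonical correlations k through m are all zero. The statistic is built from Lambda_k = prod over i >= k of (1 - r_i^2) and chi^2 = -(n - 1 - (p + q + 1)/2) * ln(Lambda_k), with (p - k + 1)(q - k + 1) degrees of freedom. The data are the six observations from the glimpse() output. With n = 6 the large-sample approximation is rough, so the p-values are indicative only.

```r
# Six observations reconstructed from the glimpse() output
mf <- data.frame(
  Sistolik  = c(120, 109, 130, 121, 135, 140),
  Diastolik = c(76, 80, 82, 78, 85, 87),
  Tinggi    = c(165, 180, 170, 185, 180, 187),
  Berat     = c(60, 80, 70, 85, 90, 87)
)
X <- scale(mf[, c("Sistolik", "Diastolik")])
Y <- scale(mf[, c("Tinggi", "Berat")])
cc <- cancor(X, Y)

n <- nrow(X); p <- ncol(X); q <- ncol(Y)
r <- cc$cor
m <- length(r)

# One row per k: H0 says canonical correlations k..m are all zero
bartlett <- data.frame(k = seq_len(m), wilks = NA_real_, chisq = NA_real_,
                       df = NA_real_, p_value = NA_real_)
for (k in seq_len(m)) {
  wilks <- prod(1 - r[k:m]^2)                       # Wilks' lambda for pairs k..m
  chisq <- -(n - 1 - (p + q + 1) / 2) * log(wilks)  # Bartlett's chi-square statistic
  dof   <- (p - k + 1) * (q - k + 1)                # degrees of freedom
  bartlett[k, c("wilks", "chisq", "df", "p_value")] <-
    c(wilks, chisq, dof, pchisq(chisq, dof, lower.tail = FALSE))
}
print(bartlett)
```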

Interpretation

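In the scatter plot, color encodes the value of Tinggi. Under ggplot2's default continuous color scale, darker blue marks smaller values and lighter blue marks larger ones, so lighter points are taller individuals. The points spread around zero on both axes, which is expected because canonical variates are centered (mean zero) by construction. Their correlation is the first canonical correlation (0.72), so the points tend to rise from left to right. With only six observations, any apparent grouping should be interpreted with caution.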
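The following numeric check shows why the points sit near zero. On the same six observations, each canonical variate has mean zero. Given how cancor() builds its coefficients (a QR decomposition followed by an SVD), each variate also has unit sum of squares, so its standard deviation is 1/sqrt(n - 1), about 0.45 for n = 6. The code checks this rather than assuming it. The final rescaling with scale() is an optional step, not something the original analysis does.

```r
# Six observations reconstructed from the glimpse() output
mf <- data.frame(
  Sistolik  = c(120, 109, 130, 121, 135, 140),
  Diastolik = c(76, 80, 82, 78, 85, 87),
  Tinggi    = c(165, 180, 170, 185, 180, 187),
  Berat     = c(60, 80, 70, 85, 90, 87)
)
X <- scale(mf[, c("Sistolik", "Diastolik")])
Y <- scale(mf[, c("Tinggi", "Berat")])
cc <- cancor(X, Y)
U <- X %*% cc$xcoef
V <- Y %*% cc$ycoef
n <- nrow(X)

# Canonical variates are centered: column means are zero up to rounding
print(round(colMeans(U), 10))

# Unit sum of squares per variate, hence SD = 1 / sqrt(n - 1)
print(colSums(U^2))
print(apply(U, 2, sd))
print(1 / sqrt(n - 1))

# Optional: rescale to unit variance before plotting
U_std <- scale(U)
V_std <- scale(V)
```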

#| fig.width: 5.5
#| fig.height: 4
#| 
# First canonical variate of X vs Tinggi.
# aes(group = 1) draws one box over the continuous x axis, and colour is
# mapped only in geom_jitter(), so ggplot2 no longer warns about either.
p1 <- cca_df %>%
  ggplot(aes(x = `Tinggi`, y = CC1_X)) +
  geom_boxplot(aes(group = 1), width = 0.5) +
  geom_jitter(aes(color = `Tinggi`), width = 0.15) +
  theme(legend.position = "none") +
  ggtitle("First Canonical Variate of X vs Tinggi")

# First canonical variate of Y vs Tinggi
p2 <- cca_df %>%
  ggplot(aes(x = `Tinggi`, y = CC1_Y)) +
  geom_boxplot(aes(group = 1), width = 0.5) +
  geom_jitter(aes(color = `Tinggi`), width = 0.15) +
  theme(legend.position = "none") +
  ggtitle("First Canonical Variate of Y vs Tinggi")

# Show both plots side by side
library(patchwork)
p1 + p2

Analysis Results

Median: the medians of CC1_X and CC1_Y lie close to zero. This is expected, because both canonical variates are centered (mean zero) by construction. A median near the mean is consistent with, but does not by itself prove, a symmetric distribution.
Interquartile range (IQR): the range between the first and third quartiles is fairly wide, indicating relatively high variation in both variates.
Outliers: some points lie beyond the whiskers, indicating individuals with extreme values in the distribution.
Data distribution: most points lie near zero without a clear pattern, suggesting no strong linear relationship between height (Tinggi) and the first canonical variate of either X or Y. Note that Tinggi is itself one of the Y variables (coefficient 0.228 in the first pair), so part of its association with CC1_Y is built in.
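The medians, IQRs and outlier counts in the list above can be computed instead of read off the plot. This sketch uses a hypothetical helper, box_summary(), that mirrors the rule ggplot2's stat_boxplot() applies by default: type-7 quantiles, with whiskers at 1.5 x IQR beyond the hinges. It also reports the correlation between Tinggi and each first canonical variate. It assumes the six observations from the glimpse() output.

```r
# Six observations reconstructed from the glimpse() output
mf <- data.frame(
  Sistolik  = c(120, 109, 130, 121, 135, 140),
  Diastolik = c(76, 80, 82, 78, 85, 87),
  Tinggi    = c(165, 180, 170, 185, 180, 187),
  Berat     = c(60, 80, 70, 85, 90, 87)
)
X <- scale(mf[, c("Sistolik", "Diastolik")])
Y <- scale(mf[, c("Tinggi", "Berat")])
cc <- cancor(X, Y)
CC1_X <- drop(X %*% cc$xcoef[, 1])
CC1_Y <- drop(Y %*% cc$ycoef[, 1])

# Hypothetical helper: median, IQR and count of points beyond the whiskers,
# using type-7 quantiles and whiskers at coef * IQR as stat_boxplot() does
box_summary <- function(x, coef = 1.5) {
  qs  <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- qs[3] - qs[1]
  out <- x < qs[1] - coef * iqr | x > qs[3] + coef * iqr
  c(median = qs[2], IQR = iqr, n_outliers = sum(out))
}
stats <- rbind(CC1_X = box_summary(CC1_X), CC1_Y = box_summary(CC1_Y))
print(stats)

# Linear association between height and each first canonical variate
print(round(c(CC1_X = cor(mf$Tinggi, CC1_X), CC1_Y = cor(mf$Tinggi, CC1_Y)), 3))
```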


Group 5

21 February 2025
