Eigen_Covar_Correlation

We will calculate the eigenvalues, eigenvectors, covariance matrix, and correlation matrix for the Multivariate Analysis course assessment. The dataset comes from a credit card approval process.

credit_card_approval_dataset <- read.csv("credit_card_approval_dataset.csv")
credit_card_approval_dataset

##   Gender   Age  Debt Married BankCustomer    Industry YearsEmployed
## 1      1 30.83 0.000       1            1 Industrials          1.25
## 2      0 58.67 4.460       1            1   Materials          3.04
## 3      0 24.50 0.500       0            1   Materials          1.50
## 4      1 27.83 1.540       1            1 Industrials          3.75
## 5      1 20.17 5.625       1            1 Industrials          1.71
##   PriorDefault Employed CreditScore DriversLicense      Citizen ZipCode Income
## 1            1        1           1              0      ByBirth     202      0
## 2            1        1           6              0      ByBirth      43    560
## 3            1        0           0              0      ByBirth     280    824
## 4            1        1           5              1      ByBirth     100      3
## 5            1        0           0              0 ByOtherMeans     120      0
##   Approved
## 1        1
## 2        1
## 3        1
## 4        1
## 5        1

For the credit card approval data, we will use four numeric variables: Age, Debt, YearsEmployed, and Income.

selected_columns <- credit_card_approval_dataset[, c("Age", "Debt", "YearsEmployed", "Income")]
print(selected_columns)

##     Age  Debt YearsEmployed Income
## 1 30.83 0.000          1.25      0
## 2 58.67 4.460          3.04    560
## 3 24.50 0.500          1.50    824
## 4 27.83 1.540          3.75      3
## 5 20.17 5.625          1.71      0

Calculate the eigenvalues of the correlation matrix

eigen_values_vectors <- eigen(cor(selected_columns))
print(eigen_values_vectors$values)

## [1] 1.6257301 1.2116374 0.7801983 0.3824342

eigen_values <- eigen_values_vectors$values

Each eigenvalue corresponds to one principal component (PC). The percentage of the total variance explained by each PC is shown below.

total_variance <- sum(eigen_values)
explained_variance <- (eigen_values / total_variance) * 100

print(explained_variance)

## [1] 40.643254 30.290936 19.504956  9.560854

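Each percentage is the share of the total variance explained by the corresponding PC, not the contribution of an individual variable. PC1 explains about 40.6% of the variance, PC2 about 30.3%, PC3 about 19.5%, and PC4 about 9.6%.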
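To judge how many PCs are worth keeping, the percentages can be accumulated. The sketch below rebuilds `selected_columns` from the five rows printed earlier so that it runs on its own; `cumulative_variance` and `pca` are names introduced here for illustration. It also cross-checks the eigenvalues against `prcomp()` with standardized variables, which should give the same result as an eigendecomposition of the correlation matrix.

```r
# Rows copied from the printed output of selected_columns
selected_columns <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83, 20.17),
  Debt          = c(0.000, 4.460, 0.500, 1.540, 5.625),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75, 1.71),
  Income        = c(0, 560, 824, 3, 0)
)

eigen_values <- eigen(cor(selected_columns))$values

# Share of total variance per PC, then the running total
explained_variance  <- eigen_values / sum(eigen_values) * 100
cumulative_variance <- cumsum(explained_variance)
print(round(cumulative_variance, 2))

# Cross-check: PCA on standardized variables yields the same eigenvalues
pca <- prcomp(selected_columns, scale. = TRUE)
print(all.equal(pca$sdev^2, eigen_values))
```

With these data the first two PCs together account for roughly 70.9% of the variance, and the first three for roughly 90.4%.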

Calculate the eigenvectors

print(eigen_values_vectors$vectors)

##           [,1]       [,2]        [,3]       [,4]
## [1,] 0.6669885 -0.2767578 -0.04870403  0.6900431
## [2,] 0.4325557  0.4214898  0.77292363 -0.1945018
## [3,] 0.5743585  0.2815327 -0.59684810 -0.4843800
## [4,] 0.1952799 -0.8163889  0.20973632 -0.5013838

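Each column is the eigenvector of one PC, and its entries (loadings) show how much each variable contributes to that PC. The rows follow the variable order Age, Debt, YearsEmployed, Income. For example, in row 1, the ‘Age’ variable has large loadings on PC1 (0.67) and PC4 (0.69) but small loadings on PC2 (-0.28) and PC3 (-0.05).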
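The eigenvector matrix is easier to read once its rows and columns are labelled. The following is a minimal sketch, assuming the same five rows printed earlier; `loadings` and `residual` are names introduced here. It also checks the defining property of the decomposition, namely that each column v satisfies R v = λ v for the correlation matrix R, and that each column has unit length.

```r
# Rows copied from the printed output of selected_columns
selected_columns <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83, 20.17),
  Debt          = c(0.000, 4.460, 0.500, 1.540, 5.625),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75, 1.71),
  Income        = c(0, 560, 824, 3, 0)
)

cor_matrix <- cor(selected_columns)
eig <- eigen(cor_matrix)

# Label rows with variable names and columns with PC1..PC4
loadings <- eig$vectors
rownames(loadings) <- colnames(selected_columns)
colnames(loadings) <- paste0("PC", seq_len(ncol(loadings)))
print(round(loadings, 3))

# Check R v = lambda v for every column; should be ~0 up to rounding error
residual <- cor_matrix %*% eig$vectors - eig$vectors %*% diag(eig$values)
print(max(abs(residual)))

# Each eigenvector returned by eigen() is normalized to length 1
print(colSums(loadings^2))
```

The sign of an eigenvector is arbitrary, so a whole column may appear with flipped signs on another platform without changing the interpretation.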

Calculate the covariance matrix

cov_matrix <- cov(selected_columns)
print(cov_matrix)

##                       Age        Debt YearsEmployed      Income
## Age            231.361400    9.345663      6.999375   2046.9725
## Debt             9.345663    6.187675      0.605225   -112.3137
## YearsEmployed    6.999375    0.605225      1.182050    -42.7750
## Income        2046.972500 -112.313750    -42.775000 151957.8000

The diagonal of the covariance matrix holds the variance of each variable, and the off-diagonal entries hold the covariances between pairs of variables. ‘YearsEmployed’ has the smallest variance (about 1.18), indicating that the individuals have a similar number of working years. ‘Debt’ also has a low variance (about 6.19), meaning most individuals have similar debt values. ‘Age’ shows a relatively wide spread (about 231.36), reflecting more variation in the age of individuals. ‘Income’ has by far the largest variance (about 151,957.8), suggesting a wide range of income levels and the potential presence of outliers in the data. Because the four variables are measured in different units, these variances are not directly comparable in magnitude.
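The variances discussed above can be pulled straight from the diagonal of the covariance matrix. The sketch below, again assuming the five rows printed earlier, extracts them, converts them to standard deviations (which are in the original units), and rescales the covariance matrix with `cov2cor()` to show that it reproduces `cor()`. The names `variances` and `std_devs` are introduced here for illustration.

```r
# Rows copied from the printed output of selected_columns
selected_columns <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83, 20.17),
  Debt          = c(0.000, 4.460, 0.500, 1.540, 5.625),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75, 1.71),
  Income        = c(0, 560, 824, 3, 0)
)

cov_matrix <- cov(selected_columns)

# Diagonal entries are the per-variable variances
variances <- diag(cov_matrix)
print(variances)

# Standard deviations are in the same units as the variables
std_devs <- sqrt(variances)
print(std_devs)

# Dividing each covariance by the two standard deviations gives the correlation
print(all.equal(cov2cor(cov_matrix), cor(selected_columns)))
```

This is why the eigendecomposition earlier uses the correlation matrix: on the raw covariance matrix, Income would dominate simply because of its scale.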

Calculate the correlation matrix

cor_matrix <- cor(selected_columns)
print(cor_matrix)

##                     Age       Debt YearsEmployed     Income
## Age           1.0000000  0.2470022     0.4232489  0.3452272
## Debt          0.2470022  1.0000000     0.2237872 -0.1158264
## YearsEmployed 0.4232489  0.2237872     1.0000000 -0.1009278
## Income        0.3452272 -0.1158264    -0.1009278  1.0000000

Correlation values range from -1 to 1, and the absolute value represents the strength of the relationship between two variables. An absolute value below 0.2 indicates a very weak relationship, 0.2 to below 0.4 is weak, 0.4 to below 0.6 is moderate, 0.6 to below 0.8 is strong, and 0.8 to below 1 is very strong. A value of 1 represents a perfect positive correlation. The same scale applies to negative correlations, where -1 indicates a perfect negative correlation and values closer to 0 show weaker relationships.

Age and Debt (0.25): weak positive relationship
Age and YearsEmployed (0.42): moderate positive relationship
Age and Income (0.35): weak positive relationship
Debt and YearsEmployed (0.22): weak positive relationship
Debt and Income (-0.12): very weak negative relationship
YearsEmployed and Income (-0.10): very weak negative relationship
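The labelling above can be automated rather than read off by eye. The sketch below uses a hypothetical helper, `classify_correlation()`, that applies the thresholds of the scale described earlier with `cut()` and adds the direction from the sign. It assumes the five rows printed earlier, and it treats an absolute value of exactly 1 as "very strong" rather than giving it a separate "perfect" label.

```r
# Rows copied from the printed output of selected_columns
selected_columns <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83, 20.17),
  Debt          = c(0.000, 4.460, 0.500, 1.540, 5.625),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75, 1.71),
  Income        = c(0, 560, 824, 3, 0)
)
cor_matrix <- cor(selected_columns)

# Hypothetical helper: map correlations to strength and direction labels
classify_correlation <- function(r) {
  strength <- cut(abs(r),
                  breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
                  labels = c("very weak", "weak", "moderate",
                             "strong", "very strong"),
                  right = FALSE, include.lowest = TRUE)
  direction <- ifelse(r < 0, "negative", "positive")
  paste(strength, direction)
}

# One row per unique pair of variables (upper triangle of the matrix)
idx <- which(upper.tri(cor_matrix), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(cor_matrix)[idx[, 1]],
  var2 = colnames(cor_matrix)[idx[, 2]],
  r    = cor_matrix[idx]
)
pairs$strength <- classify_correlation(pairs$r)
print(pairs, digits = 3)
```

The intervals are closed on the left, so a correlation of exactly 0.2 counts as weak and exactly 0.4 as moderate, matching the "below" wording of the scale.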

Eigen_Covar_Correlation_Kelompok 2023C

Brilliyanda Annisaatulrohmah, Gatiari Dwi Panefi, Metha Nailis Sa’adah, Annisa Khaynun Najwa

2025-02-27