For the Multivariate Analysis course assessment, we calculate the eigenvalues, eigenvectors, covariance matrix, and correlation matrix of a dataset taken from a credit card approval process.

credit_card_approval_dataset <- read.csv("credit_card_approval_dataset.csv")
credit_card_approval_dataset
##   Gender   Age  Debt Married BankCustomer    Industry YearsEmployed
## 1      1 30.83 0.000       1            1 Industrials          1.25
## 2      0 58.67 4.460       1            1   Materials          3.04
## 3      0 24.50 0.500       0            1   Materials          1.50
## 4      1 27.83 1.540       1            1 Industrials          3.75
## 5      1 20.17 5.625       1            1 Industrials          1.71
##   PriorDefault Employed CreditScore DriversLicense      Citizen ZipCode Income
## 1            1        1           1              0      ByBirth     202      0
## 2            1        1           6              0      ByBirth      43    560
## 3            1        0           0              0      ByBirth     280    824
## 4            1        1           5              1      ByBirth     100      3
## 5            1        0           0              0 ByOtherMeans     120      0
##   Approved
## 1        1
## 2        1
## 3        1
## 4        1
## 5        1
  1. From the credit card approval dataset, we use four variables: Age, Debt, YearsEmployed, and Income.
selected_columns <- credit_card_approval_dataset[, c("Age", "Debt", "YearsEmployed", "Income")]
print(selected_columns)
##     Age  Debt YearsEmployed Income
## 1 30.83 0.000          1.25      0
## 2 58.67 4.460          3.04    560
## 3 24.50 0.500          1.50    824
## 4 27.83 1.540          3.75      3
## 5 20.17 5.625          1.71      0
  2. Calculate the eigenvalues of the correlation matrix
eigen_values_vectors <- eigen(cor(selected_columns))
print(eigen_values_vectors$values)
## [1] 1.6257301 1.2116374 0.7801983 0.3824342
eigen_values <- eigen_values_vectors$values

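Each eigenvalue corresponds to one principal component (PC) and equals the variance that component captures. The percentage of the total variance explained by each component is shown below.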

total_variance <- sum(eigen_values)
explained_variance <- (eigen_values / total_variance) * 100

print(explained_variance)
## [1] 40.643254 30.290936 19.504956  9.560854

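Each percentage is the share of the total variance explained by the corresponding principal component: PC1 explains about 40.6%, PC2 about 30.3%, PC3 about 19.5%, and PC4 about 9.6%.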
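The running total of these percentages is often easier to read than the individual values. The following is a minimal sketch, assuming the five rows printed above make up the whole dataset; the data frame is rebuilt by hand so the block runs without the CSV file, and `variance_summary` and `cumulative_variance` are illustrative names that do not appear in the original analysis.

```r
# Rebuild the four selected variables from the five rows printed above
# (assumption: these five rows are the complete dataset).
selected_columns <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83, 20.17),
  Debt          = c(0.000, 4.460, 0.500, 1.540, 5.625),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75, 1.71),
  Income        = c(0, 560, 824, 3, 0)
)

eigen_values <- eigen(cor(selected_columns))$values

# Percentage of total variance per component, and its running total
explained_variance  <- eigen_values / sum(eigen_values) * 100
cumulative_variance <- cumsum(explained_variance)

# Hypothetical summary table, one row per principal component
variance_summary <- data.frame(
  PC         = paste0("PC", seq_along(eigen_values)),
  Eigenvalue = eigen_values,
  Percent    = explained_variance,
  Cumulative = cumulative_variance
)
print(variance_summary)
```

Because the eigenvalues come from a correlation matrix, they sum to the number of variables (4 here). Adding the first two percentages reported above shows that PC1 and PC2 together account for about 70.9% of the total variance.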

  3. Calculate the eigenvectors
print(eigen_values_vectors$vectors)
##           [,1]       [,2]        [,3]       [,4]
## [1,] 0.6669885 -0.2767578 -0.04870403  0.6900431
## [2,] 0.4325557  0.4214898  0.77292363 -0.1945018
## [3,] 0.5743585  0.2815327 -0.59684810 -0.4843800
## [4,] 0.1952799 -0.8163889  0.20973632 -0.5013838

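Each column is the eigenvector of one PC, and its entries show how much each variable (rows, in the order Age, Debt, YearsEmployed, Income) contributes to that PC. For example, row 1 shows that Age contributes strongly to PC1 (0.67) and PC4 (0.69) but only weakly to PC2 (-0.28) and PC3 (-0.05).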
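The eigenvector matrix can be checked against the defining relation R v = λ v, where R is the correlation matrix. The sketch below is one way to do that, again assuming the five printed rows are the full dataset; the names `V`, `lambda`, and `loadings` are introduced only for illustration.

```r
# Rebuild the four selected variables from the five rows printed above
# (assumption: these five rows are the complete dataset).
selected_columns <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83, 20.17),
  Debt          = c(0.000, 4.460, 0.500, 1.540, 5.625),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75, 1.71),
  Income        = c(0, 560, 824, 3, 0)
)

cor_matrix <- cor(selected_columns)
decomp     <- eigen(cor_matrix)
V          <- decomp$vectors   # one eigenvector per column
lambda     <- decomp$values    # matching eigenvalues

# Defining property, checked for all columns at once: R V = V diag(lambda)
lhs <- cor_matrix %*% V
rhs <- V %*% diag(lambda)
print(all.equal(lhs, rhs, check.attributes = FALSE))

# Eigenvectors of a symmetric matrix are orthonormal: t(V) V = I
print(all.equal(t(V) %*% V, diag(4)))

# Label rows and columns so the matrix is easier to read
loadings <- V
dimnames(loadings) <- list(colnames(selected_columns), paste0("PC", 1:4))
print(round(loadings, 3))
```

The sign of an eigenvector is arbitrary, so a column may appear with all signs flipped on another platform or R version without changing the interpretation.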

  4. Calculate the covariance matrix
cov_matrix <- cov(selected_columns)
print(cov_matrix)
##                       Age        Debt YearsEmployed      Income
## Age            231.361400    9.345663      6.999375   2046.9725
## Debt             9.345663    6.187675      0.605225   -112.3137
## YearsEmployed    6.999375    0.605225      1.182050    -42.7750
## Income        2046.972500 -112.313750    -42.775000 151957.8000

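The diagonal entries are the variances of Age, Debt, YearsEmployed, and Income, and the off-diagonal entries are the covariances between pairs of variables. YearsEmployed has the smallest variance (1.18), indicating that individuals have a similar number of working years. Debt also has a low variance (6.19), meaning most individuals have similar debt values. Age has a noticeably larger variance (231.36), reflecting more variation in the age of individuals. Income has by far the largest variance (151957.8), suggesting a wide range of income levels and the potential presence of outliers in the data. Because the four variables are measured in different units, the variances are not directly comparable; the correlation matrix, which the eigenvalues above are based on, removes this dependence on scale.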
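The link between the covariance and correlation matrices can be made explicit: each correlation is the covariance divided by the product of the two standard deviations. The sketch below checks this three ways, assuming the five printed rows are the full dataset; `std_devs` and `cor_from_cov` are illustrative names.

```r
# Rebuild the four selected variables from the five rows printed above
# (assumption: these five rows are the complete dataset).
selected_columns <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83, 20.17),
  Debt          = c(0.000, 4.460, 0.500, 1.540, 5.625),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75, 1.71),
  Income        = c(0, 560, 824, 3, 0)
)

cov_matrix <- cov(selected_columns)
cor_matrix <- cor(selected_columns)

# Standard deviations are the square roots of the diagonal variances
std_devs <- sqrt(diag(cov_matrix))

# r_ij = cov_ij / (sd_i * sd_j), applied to every pair at once
cor_from_cov <- cov_matrix / outer(std_devs, std_devs)
print(all.equal(cor_from_cov, cor_matrix))

# The built-in helper performs the same rescaling
print(all.equal(cov2cor(cov_matrix), cor_matrix))

# Equivalently, the covariance of the standardized data is the correlation
print(all.equal(cov(scale(selected_columns)), cor_matrix))
```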

  5. Calculate the correlation matrix
cor_matrix <- cor(selected_columns)
print(cor_matrix)
##                     Age       Debt YearsEmployed     Income
## Age           1.0000000  0.2470022     0.4232489  0.3452272
## Debt          0.2470022  1.0000000     0.2237872 -0.1158264
## YearsEmployed 0.4232489  0.2237872     1.0000000 -0.1009278
## Income        0.3452272 -0.1158264    -0.1009278  1.0000000

Correlation coefficients range from -1 to 1, and the absolute value measures the strength of the linear relationship between two variables. On the scale used here, an absolute value below 0.2 indicates a very weak relationship, from 0.2 to below 0.4 a weak one, from 0.4 to below 0.6 a moderate one, from 0.6 to below 0.8 a strong one, and from 0.8 up to just under 1 a very strong one. A value of 1 is a perfect positive correlation and -1 is a perfect negative correlation. The sign gives the direction of the relationship, and values closer to 0 indicate weaker relationships.

  1. Age and Debt (0.25): weak positive relationship
  2. Age and YearsEmployed (0.42): moderate positive relationship
  3. Age and Income (0.35): weak positive relationship
  4. Debt and YearsEmployed (0.22): weak positive relationship
  5. Debt and Income (-0.12): very weak negative relationship
  6. YearsEmployed and Income (-0.10): very weak negative relationship
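
The labels in this list can also be assigned programmatically, which avoids misreading a value. The following is a minimal sketch of the scale described above, assuming the five printed rows are the full dataset; `classify_correlation` and `pair_summary` are hypothetical names, not part of the original analysis.

```r
# Rebuild the four selected variables from the five rows printed above
# (assumption: these five rows are the complete dataset).
selected_columns <- data.frame(
  Age           = c(30.83, 58.67, 24.50, 27.83, 20.17),
  Debt          = c(0.000, 4.460, 0.500, 1.540, 5.625),
  YearsEmployed = c(1.25, 3.04, 1.50, 3.75, 1.71),
  Income        = c(0, 560, 824, 3, 0)
)
cor_matrix <- cor(selected_columns)

# Hypothetical helper: map a correlation to a strength and direction label
classify_correlation <- function(r) {
  magnitude <- pmin(abs(r), 1)
  strength <- as.character(cut(
    magnitude,
    breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
    labels = c("very weak", "weak", "moderate", "strong", "very strong"),
    right = FALSE,          # intervals are [lower, upper)
    include.lowest = TRUE   # with right = FALSE this keeps 1 in the last bin
  ))
  strength[magnitude == 1] <- "perfect"
  direction <- ifelse(r < 0, "negative", "positive")
  paste(strength, direction)
}

# One row per unique pair, taken from the upper triangle of the matrix
pair_index <- which(upper.tri(cor_matrix), arr.ind = TRUE)
pair_summary <- data.frame(
  Var1 = rownames(cor_matrix)[pair_index[, "row"]],
  Var2 = colnames(cor_matrix)[pair_index[, "col"]],
  r    = cor_matrix[pair_index],
  row.names = NULL,
  stringsAsFactors = FALSE
)
pair_summary$Label <- classify_correlation(pair_summary$r)
print(pair_summary)
```

The pairs are listed in the column-major order of the upper triangle, so the row order differs from the list above even though the labels agree.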