library(readxl)
library(corrplot)
## corrplot 0.95 loaded
library(knitr)
data <- read_excel("Concrete_Data.xls")
# Drop the last column (the target, concrete compressive strength) and keep the 8 features
fitur <- data[, -ncol(data)]
head(fitur)
## # A tibble: 6 × 8
## Cement (component 1)(kg in a m…¹ Blast Furnace Slag (…² Fly Ash (component 3…³
## <dbl> <dbl> <dbl>
## 1 540 0 0
## 2 540 0 0
## 3 332. 142. 0
## 4 332. 142. 0
## 5 199. 132. 0
## 6 266 114 0
## # ℹ abbreviated names: ¹`Cement (component 1)(kg in a m^3 mixture)`,
## # ²`Blast Furnace Slag (component 2)(kg in a m^3 mixture)`,
## # ³`Fly Ash (component 3)(kg in a m^3 mixture)`
## # ℹ 5 more variables: `Water (component 4)(kg in a m^3 mixture)` <dbl>,
## # `Superplasticizer (component 5)(kg in a m^3 mixture)` <dbl>,
## # `Coarse Aggregate (component 6)(kg in a m^3 mixture)` <dbl>,
## # `Fine Aggregate (component 7)(kg in a m^3 mixture)` <dbl>, …
The data are imported with read_excel() because the file is in Excel format. The dataset consists of numeric variables that describe the composition of the concrete mix (cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, fine aggregate) and the age of the concrete.
Only the feature variables are used in this analysis. The target variable (the last column, removed with data[, -ncol(data)]) is left out, so the correlation, covariance, and eigen analyses describe only the relationships among the features.
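The column names carry the component number and unit, which makes every table below very wide. A minimal sketch, assuming shorter display labels are acceptable: base R's sub() removes the first opening parenthesis, any whitespace before it, and everything after it. The name vector is copied from the output above so the block runs on its own; full_names and short_names are illustrative names only.

```r
# Column names as printed in the head(fitur) output above (copied for a self-contained demo)
full_names <- c(
  "Cement (component 1)(kg in a m^3 mixture)",
  "Blast Furnace Slag (component 2)(kg in a m^3 mixture)",
  "Fly Ash (component 3)(kg in a m^3 mixture)",
  "Water (component 4)(kg in a m^3 mixture)",
  "Superplasticizer (component 5)(kg in a m^3 mixture)",
  "Coarse Aggregate (component 6)(kg in a m^3 mixture)",
  "Fine Aggregate (component 7)(kg in a m^3 mixture)",
  "Age (day)"
)

# Remove the first "(" plus any whitespace before it and everything after it
short_names <- sub("\\s*\\(.*$", "", full_names)
print(short_names)

# Applied to the real feature table this would be (hypothetical step, not in the original):
# names(fitur) <- sub("\\s*\\(.*$", "", names(fitur))
```

The pattern tolerates one or more spaces before the parenthesis, so it still works if the Excel headers contain doubled spaces.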
# Pearson correlation between every pair of features
cor_matrix <- cor(fitur)
kable(cor_matrix)
| | Cement (component 1)(kg in a m^3 mixture) | Blast Furnace Slag (component 2)(kg in a m^3 mixture) | Fly Ash (component 3)(kg in a m^3 mixture) | Water (component 4)(kg in a m^3 mixture) | Superplasticizer (component 5)(kg in a m^3 mixture) | Coarse Aggregate (component 6)(kg in a m^3 mixture) | Fine Aggregate (component 7)(kg in a m^3 mixture) | Age (day) |
|---|---|---|---|---|---|---|---|---|
| Cement (component 1)(kg in a m^3 mixture) | 1.0000000 | -0.2751934 | -0.3974754 | -0.0815436 | 0.0927714 | -0.1093560 | -0.2227202 | 0.0819473 |
| Blast Furnace Slag (component 2)(kg in a m^3 mixture) | -0.2751934 | 1.0000000 | -0.3235695 | 0.1072859 | 0.0433757 | -0.2839982 | -0.2815933 | -0.0442458 |
| Fly Ash (component 3)(kg in a m^3 mixture) | -0.3974754 | -0.3235695 | 1.0000000 | -0.2570440 | 0.3773396 | -0.0099768 | 0.0790764 | -0.1543702 |
| Water (component 4)(kg in a m^3 mixture) | -0.0815436 | 0.1072859 | -0.2570440 | 1.0000000 | -0.6574644 | -0.1823117 | -0.4506350 | 0.2776044 |
| Superplasticizer (component 5)(kg in a m^3 mixture) | 0.0927714 | 0.0433757 | 0.3773396 | -0.6574644 | 1.0000000 | -0.2663028 | 0.2225015 | -0.1927165 |
| Coarse Aggregate (component 6)(kg in a m^3 mixture) | -0.1093560 | -0.2839982 | -0.0099768 | -0.1823117 | -0.2663028 | 1.0000000 | -0.1785058 | -0.0030155 |
| Fine Aggregate (component 7)(kg in a m^3 mixture) | -0.2227202 | -0.2815933 | 0.0790764 | -0.4506350 | 0.2225015 | -0.1785058 | 1.0000000 | -0.1560940 |
| Age (day) | 0.0819473 | -0.0442458 | -0.1543702 | 0.2776044 | -0.1927165 | -0.0030155 | -0.1560940 | 1.0000000 |
The correlation matrix measures the linear relationship between each pair of features. Correlations lie between −1 and 1: values close to 1 indicate a strong positive relationship, values close to −1 a strong negative relationship, and values close to 0 a weak or absent linear relationship.
The matrix gives a first picture of which features are related and where the variables may be redundant. The strongest pair is Water and Superplasticizer (r ≈ −0.66), followed by Water and Fine Aggregate (r ≈ −0.45); every other pair has |r| below 0.40.
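To flag candidate redundant features systematically, the off-diagonal entries can be ranked by absolute value. The sketch below uses a hypothetical helper, rank_pairs(), that is not part of any package. It is demonstrated on a 3 × 3 block whose values are copied from the correlation table above; on the full analysis it would be called as rank_pairs(cor_matrix).

```r
# Hypothetical helper: list each feature pair once, ordered by |r| (largest first)
rank_pairs <- function(r) {
  # Upper triangle only, so each pair appears once and the diagonal (r = 1) is skipped
  idx <- which(upper.tri(r), arr.ind = TRUE)
  pairs <- data.frame(
    feature_1 = rownames(r)[idx[, "row"]],
    feature_2 = colnames(r)[idx[, "col"]],
    r = r[idx]
  )
  pairs[order(-abs(pairs$r)), , drop = FALSE]
}

# Demo block copied from the correlation table above
vars <- c("Water", "Superplasticizer", "Fine Aggregate")
r_demo <- matrix(
  c( 1.0000000, -0.6574644, -0.4506350,
    -0.6574644,  1.0000000,  0.2225015,
    -0.4506350,  0.2225015,  1.0000000),
  nrow = 3, byrow = TRUE, dimnames = list(vars, vars)
)

ranked <- rank_pairs(r_demo)
print(ranked)
```

Since corrplot is already loaded, corrplot(cor_matrix, method = "color", type = "upper") would show the same information as a heat map.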
# Sample variance-covariance matrix (denominator n - 1)
cov_matrix <- cov(fitur)
kable(cov_matrix)
| | Cement (component 1)(kg in a m^3 mixture) | Blast Furnace Slag (component 2)(kg in a m^3 mixture) | Fly Ash (component 3)(kg in a m^3 mixture) | Water (component 4)(kg in a m^3 mixture) | Superplasticizer (component 5)(kg in a m^3 mixture) | Coarse Aggregate (component 6)(kg in a m^3 mixture) | Fine Aggregate (component 7)(kg in a m^3 mixture) | Age (day) |
|---|---|---|---|---|---|---|---|---|
| Cement (component 1)(kg in a m^3 mixture) | 10921.74265 | -2481.35943 | -2658.3508 | -181.98979 | 57.91462 | -888.60851 | -1866.1511 | 540.99182 |
| Blast Furnace Slag (component 2)(kg in a m^3 mixture) | -2481.35943 | 7444.08373 | -1786.6076 | 197.67855 | 22.35531 | -1905.21057 | -1947.9113 | -241.15038 |
| Fly Ash (component 3)(kg in a m^3 mixture) | -2658.35075 | -1786.60759 | 4095.5481 | -351.29712 | 144.25026 | -49.64420 | 405.7364 | -624.06475 |
| Water (component 4)(kg in a m^3 mixture) | -181.98979 | 197.67855 | -351.2971 | 456.06024 | -83.87096 | -302.72431 | -771.5735 | 374.49650 |
| Superplasticizer (component 5)(kg in a m^3 mixture) | 57.91462 | 22.35531 | 144.2503 | -83.87096 | 35.68260 | -123.68745 | 106.5620 | -72.72060 |
| Coarse Aggregate (component 6)(kg in a m^3 mixture) | -888.60851 | -1905.21057 | -49.6442 | -302.72431 | -123.68745 | 6045.65623 | -1112.7952 | -14.81127 |
| Fine Aggregate (component 7)(kg in a m^3 mixture) | -1866.15111 | -1947.91126 | 405.7364 | -771.57347 | 106.56203 | -1112.79516 | 6428.0992 | -790.56558 |
| Age (day) | 540.99182 | -241.15038 | -624.0647 | 374.49650 | -72.72060 | -14.81127 | -790.5656 | 3990.43773 |
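The variance–covariance matrix shows how widely each feature is spread (variance) and in which direction pairs of features vary together (covariance).
The diagonal entries are the variances of the individual features, and the off-diagonal entries are the covariances between pairs of features. A large variance means the feature's values are widely spread; here the variances range from about 35.7 (Superplasticizer) to about 10922 (Cement). A positive covariance means two features tend to increase together, and a negative one means one tends to decrease as the other increases. Because covariances depend on the units and scale of each feature, their magnitudes cannot be compared directly across pairs; the correlation matrix is the scale-free version of the same information.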
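The relationship between the two matrices can be checked directly: the sample covariance is the cross-product of the centered data divided by n − 1, and dividing each covariance by the two standard deviations gives the correlation. A minimal sketch, assuming the built-in mtcars data as a stand-in for fitur so it runs without the Excel file. The last part reuses the Water and Superplasticizer entries of the covariance table above with stats::cov2cor().

```r
# mtcars ships with base R and stands in for `fitur` so the sketch is self-contained
X <- as.matrix(mtcars[, c("mpg", "disp", "hp", "wt")])
n <- nrow(X)

# Center each column, then S = t(Xc) %*% Xc / (n - 1): the sample covariance matrix
Xc <- scale(X, center = TRUE, scale = FALSE)
S_manual <- crossprod(Xc) / (n - 1)

# The diagonal holds the variances; their square roots are the standard deviations
sds <- sqrt(diag(S_manual))

# Covariance divided by the product of the two standard deviations = correlation
R_manual <- S_manual / outer(sds, sds)

# Same idea on the document's own numbers: Water / Superplasticizer block of the
# covariance table (variances on the diagonal, their covariance off it)
S_ws <- matrix(c(456.06024, -83.87096,
                 -83.87096,  35.68260), nrow = 2, byrow = TRUE)
r_ws <- cov2cor(S_ws)[1, 2]  # close to the -0.6574644 in the correlation table
print(r_ws)
```

In the analysis itself, cov2cor(cov_matrix) should reproduce cor_matrix up to rounding.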
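# Eigen decomposition of the covariance matrix; eigenvalues come back in decreasing order
eigen_result <- eigen(cov_matrix)
kable(eigen_result$values, col.names = "Eigenvalue")
| Eigenvalue |
|---|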
| 12840.97152 |
| 9809.73610 |
| 7284.34193 |
| 4243.67465 |
| 3979.16746 |
| 1176.42112 |
| 71.66399 |
| 11.33366 |
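dimnames(eigen_result$vectors) <- list(colnames(cov_matrix), paste0("PC", 1:8))
kable(eigen_result$vectors)
| | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 |
|---|---|---|---|---|---|---|---|---|
| Cement (component 1)(kg in a m^3 mixture) | 0.9056425 | -0.0326386 | 0.1548071 | -0.0082427 | 0.1513774 | -0.3065154 | -0.1943806 | -0.0079102 |
| Blast Furnace Slag (component 2)(kg in a m^3 mixture) | -0.2625398 | -0.7860533 | 0.0729160 | -0.1990583 | 0.1067080 | -0.4534540 | -0.2261846 | -0.0092468 |
| Fly Ash (component 3)(kg in a m^3 mixture) | -0.2386159 | 0.3030150 | -0.0514909 | 0.6872239 | 0.1775836 | -0.5123562 | -0.2867754 | 0.0056077 |
| Water (component 4)(kg in a m^3 mixture) | 0.0055668 | -0.0762636 | -0.0414557 | 0.0755522 | -0.0984242 | 0.4824817 | -0.8246303 | -0.2534467 |
| Superplasticizer (component 5)(kg in a m^3 mixture) | -0.0013062 | 0.0050940 | 0.0240654 | 0.0205136 | 0.0229317 | -0.1044518 | 0.2332325 | -0.9659912 |
| Coarse Aggregate (component 6)(kg in a m^3 mixture) | -0.0091047 | 0.2745743 | -0.7606985 | -0.4800469 | 0.0763613 | -0.2707187 | -0.1859496 | -0.0414960 |
| Fine Aggregate (component 7)(kg in a m^3 mixture) | -0.2101313 | 0.4506929 | 0.6107760 | -0.4851455 | -0.1328356 | -0.2571290 | -0.2445951 | -0.0268318 |
| Age (day) | 0.0983676 | -0.0698540 | -0.1185727 | 0.1268506 | -0.9489325 | -0.2341287 | 0.0003335 | 0.0021084 |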
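The eigenvalues and eigenvectors are obtained from the eigen decomposition of the covariance matrix. Each eigenvalue is the amount of variance captured by one principal component, and the matching eigenvector (one column of the eigenvector matrix) gives the weight, or loading, of each feature in that component. The eigenvalues sum to the total variance of the features (the trace of the covariance matrix), and the component with the largest eigenvalue explains the most variation. Because eigen() is applied to the covariance matrix rather than the correlation matrix, features with large variances dominate the leading components: the first eigenvector loads 0.906 on Cement, the feature with the largest variance.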
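Dividing each eigenvalue by their sum turns the eigenvalue table into the share of total variance explained by each component. A minimal sketch using the eigenvalues copied from the table above; in the live analysis, eigen_result$values would replace the hard-coded vector. With these values the first component explains about 32.6% of the total variance and the first five about 96.8%. The 95% threshold below is an arbitrary example, not a rule from the original analysis.

```r
# Eigenvalues of the covariance matrix, copied from the eigenvalue table above
ev <- c(12840.97152, 9809.73610, 7284.34193, 4243.67465,
        3979.16746, 1176.42112, 71.66399, 11.33366)

# Share of total variance (sum of eigenvalues = trace of the covariance matrix)
prop_var <- ev / sum(ev)
cum_var  <- cumsum(prop_var)

summary_tbl <- data.frame(
  component  = paste0("PC", seq_along(ev)),
  eigenvalue = ev,
  proportion = round(prop_var, 4),
  cumulative = round(cum_var, 4)
)
print(summary_tbl)

# Smallest number of components whose cumulative share reaches the example threshold
k <- which(cum_var >= 0.95)[1]
print(k)
```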
In summary, the correlation and covariance matrices describe how the features relate to one another, while the eigenvalues and eigenvectors describe how the variation in the data is structured. Together they give a first understanding of the features in the concrete dataset before any further modelling.
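Before modelling, the eigenvectors are typically used to project the centered features onto the principal components. A minimal sketch, assuming the built-in mtcars data as a stand-in for fitur: it reproduces the covariance-then-eigen steps used above, computes the component scores, and compares them with stats::prcomp(), which performs the same unscaled PCA through a singular value decomposition.

```r
# mtcars (bundled with R) stands in for `fitur` so the sketch runs on its own
X <- as.matrix(mtcars[, c("mpg", "disp", "hp", "wt")])

# Same steps as the analysis above: covariance matrix, then eigen decomposition
eig <- eigen(cov(X))

# Principal component scores: centered data multiplied by the eigenvector matrix
Xc <- scale(X, center = TRUE, scale = FALSE)
scores <- Xc %*% eig$vectors
colnames(scores) <- paste0("PC", seq_len(ncol(scores)))

# The variance of each score column equals the matching eigenvalue
print(round(apply(scores, 2, var), 4))
print(round(eig$values, 4))

# prcomp() without scaling is the same covariance-based PCA; eigenvector signs are
# arbitrary, so its scores may differ from ours by a sign in each column
pca <- prcomp(X, center = TRUE, scale. = FALSE)
print(head(pca$x))
```

For the concrete data the corresponding projection would be scale(fitur, scale = FALSE) %*% eigen_result$vectors. Because the features have very different variances, prcomp(fitur, scale. = TRUE), which is equivalent to decomposing cor_matrix, is a common alternative when every feature should carry equal weight.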