1 Introduction

This analysis examines the relationships among the variables in the Concrete Compressive Strength Dataset using multivariate methods. Three tools are used: the correlation matrix, which measures the strength of the linear relationship between each pair of variables; the variance-covariance matrix, which describes the spread of each variable and how pairs of variables vary together; and the eigenvalues and eigenvectors of the correlation matrix, which are used to extract principal components. Together these allow a dataset with many variables to be summarized in fewer dimensions while retaining most of its variation, so that the patterns among the variables related to concrete compressive strength can be seen more clearly.


2 Data Import and Exploration

2.1 Load Libraries

library(readxl)
library(corrplot)
library(ggplot2)
library(knitr)
library(kableExtra)
library(dplyr)

2.2 Import Dataset

# Read the workbook, keep only the numeric columns, and drop rows with missing values
data_raw <- read_excel("Concrete_Data.xls")
data <- data_raw %>% select(where(is.numeric))
data <- na.omit(data)

2.3 Preview Data

head(data, 10) %>%
  kable(caption = "First 10 Observations", digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE,
                position = "center")

First 10 Observations
(Column headers abbreviated: Slag = Blast Furnace Slag, Agg. = Aggregate, Strength = concrete compressive strength in MPa. Components are in kg per m^3 of mixture; Age is in days.)

Cement    Slag  Fly Ash  Water  Superplasticizer  Coarse Agg.  Fine Agg.  Age  Strength
 540.0     0.0        0    162               2.5       1040.0      676.0   28     79.99
 540.0     0.0        0    162               2.5       1055.0      676.0   28     61.89
 332.5   142.5        0    228               0.0        932.0      594.0  270     40.27
 332.5   142.5        0    228               0.0        932.0      594.0  365     41.05
 198.6   132.4        0    192               0.0        978.4      825.5  360     44.30
 266.0   114.0        0    228               0.0        932.0      670.0   90     47.03
 380.0    95.0        0    228               0.0        932.0      594.0  365     43.70
 380.0    95.0        0    228               0.0        932.0      594.0   28     36.45
 266.0   114.0        0    228               0.0        932.0      670.0   28     45.85
 475.0     0.0        0    228               0.0        932.0      594.0   28     39.29

2.4 Descriptive Statistics

summary(data) %>%
  kable(caption = "Descriptive Statistics", digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = FALSE,
                position = "center")

Descriptive Statistics
(Column headers abbreviated: Slag = Blast Furnace Slag, Agg. = Aggregate, Strength = concrete compressive strength in MPa. Components are in kg per m^3 of mixture; Age is in days.)

         Cement    Slag  Fly Ash  Water  Superplasticizer  Coarse Agg.  Fine Agg.     Age  Strength
Min.      102.0     0.0     0.00  121.8             0.000        801.0      594.0    1.00     2.332
1st Qu.   192.4     0.0     0.00  164.9             0.000        932.0      731.0    7.00    23.707
Median    272.9    22.0     0.00  185.0             6.350        968.0      779.5   28.00    34.443
Mean      281.2    73.9    54.19  181.6             6.203        972.9      773.6   45.66    35.818
3rd Qu.   350.0   142.9   118.27  192.0            10.160       1029.4      824.0   56.00    46.136
Max.      540.0   359.4   200.10  247.0            32.200       1145.0      992.6  365.00    82.599

Data dimensions: 1,030 observations × 9 variables


3 Correlation Matrix

3.1 Computation

The correlation matrix measures the strength and direction of the linear relationship between each pair of variables, with coefficients ranging from -1 to +1. Values close to +1 indicate a strong positive relationship, values close to -1 a strong negative relationship, and values close to 0 a weak or absent linear relationship. A coefficient near 0 does not rule out a nonlinear relationship.
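
As a minimal sketch of what cor() computes, the Pearson coefficient can be reproduced by hand on toy data. The vectors x and y are illustrative values chosen for this example and are not taken from the concrete dataset.

```r
# Toy vectors (illustrative values, not from the concrete dataset)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Pearson r: sum of cross-products of deviations divided by
# the square root of the product of the sums of squared deviations
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent; cor() uses the Pearson method by default
r_builtin <- cor(x, y)

r_manual                         # 0.8
all.equal(r_manual, r_builtin)   # TRUE
```

Calling cor() on a whole data frame, as in this report, applies the same computation to every pair of columns.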

corr_matrix <- cor(data)

corr_matrix %>%
  kable(caption = "Correlation Matrix", digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                font_size = 11) %>%
  scroll_box(width = "100%", height = "400px")
Correlation Matrix
(Labels abbreviated: Superpl. = Superplasticizer, Coarse/Fine = Coarse/Fine Aggregate, Strength = concrete compressive strength.)

                        Cement      Slag   Fly Ash     Water  Superpl.    Coarse      Fine       Age  Strength
Cement                  1.0000   -0.2752   -0.3975   -0.0815    0.0928   -0.1094   -0.2227    0.0819    0.4978
Blast Furnace Slag     -0.2752    1.0000   -0.3236    0.1073    0.0434   -0.2840   -0.2816   -0.0442    0.1348
Fly Ash                -0.3975   -0.3236    1.0000   -0.2570    0.3773   -0.0100    0.0791   -0.1544   -0.1058
Water                  -0.0815    0.1073   -0.2570    1.0000   -0.6575   -0.1823   -0.4506    0.2776   -0.2896
Superplasticizer        0.0928    0.0434    0.3773   -0.6575    1.0000   -0.2663    0.2225   -0.1927    0.3661
Coarse Aggregate       -0.1094   -0.2840   -0.0100   -0.1823   -0.2663    1.0000   -0.1785   -0.0030   -0.1649
Fine Aggregate         -0.2227   -0.2816    0.0791   -0.4506    0.2225   -0.1785    1.0000   -0.1561   -0.1672
Age                     0.0819   -0.0442   -0.1544    0.2776   -0.1927   -0.0030   -0.1561    1.0000    0.3289
Compressive Strength    0.4978    0.1348   -0.1058   -0.2896    0.3661   -0.1649   -0.1672    0.3289    1.0000

3.2 Visualization

corrplot(corr_matrix, 
         method = "color",
         type = "upper",
         order = "hclust",
         addCoef.col = "black",
         tl.col = "black",
         tl.srt = 45,
         number.cex = 0.6,
         tl.cex = 0.8,
         col = colorRampPalette(c("#6D9EC1", "white", "#E46726"))(200),
         title = "Correlation Matrix - Concrete Data",
         mar = c(0,0,2,0))
Figure: Correlation matrix heatmap

3.3 Interpretation

Correlation strength criteria:

|r|          Interpretation
0.00 - 0.19  Very weak
0.20 - 0.39  Weak
0.40 - 0.59  Moderate
0.60 - 0.79  Strong
0.80 - 1.00  Very strong

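
The criteria can also be applied programmatically. The sketch below uses a hypothetical helper, classify_cor(), which is not part of the original analysis. It assumes each band is a half-open interval such as [0.20, 0.40), which is one way to close the gaps between the two-decimal bounds listed in the table.

```r
# Hypothetical helper: map |r| to the strength labels of the criteria table.
# Assumption: bands are treated as [lower, upper), with |r| = 1 in the last band.
classify_cor <- function(r) {
  cut(abs(r),
      breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
      labels = c("Very weak", "Weak", "Moderate", "Strong", "Very strong"),
      right = FALSE,          # intervals are closed on the left
      include.lowest = TRUE)  # with right = FALSE this keeps |r| = 1 in the top band
}

# Three coefficients taken from the correlation matrix in this report
classify_cor(c(-0.6575, 0.4978, -0.0030))
```

For these three inputs the labels are Strong, Moderate, and Very weak, in that order.
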
# Keep only the upper triangle so that each variable pair appears once
cor_melted <- data.frame(
  Var1 = rownames(corr_matrix)[row(corr_matrix)],
  Var2 = colnames(corr_matrix)[col(corr_matrix)],
  Correlation = c(corr_matrix),
  Upper = c(upper.tri(corr_matrix))
) %>%
  filter(Upper) %>%
  mutate(Abs_Cor = abs(Correlation)) %>%
  arrange(desc(Abs_Cor)) %>%
  head(5)

cor_melted %>%
  select(Var1, Var2, Correlation) %>%
  kable(caption = "Top 5 Correlations by Absolute Value", 
        digits = 4,
        col.names = c("Variable 1", "Variable 2", "Correlation")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                position = "center")

Top 5 Correlations by Absolute Value
Variable 1                                  Variable 2                                           Correlation
Water (component 4)(kg in a m^3 mixture)    Superplasticizer (component 5)(kg in a m^3 mixture)      -0.6575
Cement (component 1)(kg in a m^3 mixture)   Concrete compressive strength(MPa, megapascals)           0.4978
Water (component 4)(kg in a m^3 mixture)    Fine Aggregate (component 7)(kg in a m^3 mixture)        -0.4506
Cement (component 1)(kg in a m^3 mixture)   Fly Ash (component 3)(kg in a m^3 mixture)               -0.3975
Fly Ash (component 3)(kg in a m^3 mixture)  Superplasticizer (component 5)(kg in a m^3 mixture)       0.3773

4 Variance-Covariance Matrix

4.1 Computation

The variance-covariance matrix describes how much each variable varies and how pairs of variables vary together. The diagonal entries are variances, which measure the spread of each variable around its mean: the larger the variance, the more dispersed the values. The off-diagonal entries are covariances, which describe how two variables change jointly. A positive covariance means the two variables tend to increase or decrease together; a negative covariance means they tend to move in opposite directions.
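
The link between the two matrices in this report can be sketched as follows: dividing each covariance by the product of the two standard deviations gives the correlation. The example uses R's built-in mtcars data purely for illustration, since the concrete workbook is not assumed to be available here.

```r
# Built-in dataset used only for illustration (not the concrete data)
toy <- mtcars[, c("mpg", "hp", "wt")]

S <- cov(toy)        # variance-covariance matrix
s <- sqrt(diag(S))   # standard deviations, taken from the diagonal

# Correlation = covariance / (sd_i * sd_j)
R_manual <- S / outer(s, s)

all.equal(R_manual, cor(toy))      # TRUE
all.equal(cov2cor(S), cor(toy))    # TRUE; cov2cor() does the same rescaling
```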

cov_matrix <- cov(data)

cov_matrix %>%
  kable(caption = "Variance-Covariance Matrix", digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                font_size = 11) %>%
  scroll_box(width = "100%", height = "400px")
Variance-Covariance Matrix
(Labels abbreviated: Superpl. = Superplasticizer, Coarse/Fine = Coarse/Fine Aggregate, Strength = concrete compressive strength.)

                        Cement      Slag   Fly Ash     Water  Superpl.    Coarse      Fine       Age  Strength
Cement                10921.74  -2481.36  -2658.35   -181.99     57.91   -888.61  -1866.15    540.99    869.15
Blast Furnace Slag    -2481.36   7444.08  -1786.61    197.68     22.36  -1905.21  -1947.91   -241.15    194.33
Fly Ash               -2658.35  -1786.61   4095.55   -351.30    144.25    -49.64    405.74   -624.06   -113.06
Water                  -181.99    197.68   -351.30    456.06    -83.87   -302.72   -771.57    374.50   -103.32
Superplasticizer         57.91     22.36    144.25    -83.87     35.68   -123.69    106.56    -72.72     36.53
Coarse Aggregate       -888.61  -1905.21    -49.64   -302.72   -123.69   6045.66  -1112.80    -14.81   -214.23
Fine Aggregate        -1866.15  -1947.91    405.74   -771.57    106.56  -1112.80   6428.10   -790.57   -224.01
Age                     540.99   -241.15   -624.06    374.50    -72.72    -14.81   -790.57   3990.44    347.06
Compressive Strength    869.15    194.33   -113.06   -103.32     36.53   -214.23   -224.01    347.06    279.08

4.2 Variance per Variable

variances <- diag(cov_matrix)

variance_table <- data.frame(
  Variabel = names(variances),
  Variance = variances,
  Std_Dev = sqrt(variances)
) %>%
  arrange(desc(Variance))

# row.names = FALSE stops the variable names from being printed twice
variance_table %>%
  kable(caption = "Variance and Standard Deviation", 
        digits = 2,
        row.names = FALSE,
        col.names = c("Variable", "Variance", "Standard Deviation")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                position = "center")

Variance and Standard Deviation
Variable                                               Variance  Standard Deviation
Cement (component 1)(kg in a m^3 mixture)              10921.74              104.51
Blast Furnace Slag (component 2)(kg in a m^3 mixture)   7444.08               86.28
Fine Aggregate (component 7)(kg in a m^3 mixture)       6428.10               80.18
Coarse Aggregate (component 6)(kg in a m^3 mixture)     6045.66               77.75
Fly Ash (component 3)(kg in a m^3 mixture)              4095.55               64.00
Age (day)                                               3990.44               63.17
Water (component 4)(kg in a m^3 mixture)                 456.06               21.36
Concrete compressive strength(MPa, megapascals)          279.08               16.71
Superplasticizer (component 5)(kg in a m^3 mixture)       35.68                5.97

ggplot(variance_table, aes(x = reorder(Variabel, Variance), y = Variance)) +
  geom_col(fill = "#3498db", alpha = 0.8) +
  coord_flip() +
  theme_minimal() +
  labs(title = "Variance per Variable",
       x = "",
       y = "Variance") +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    axis.text = element_text(size = 10)
  )

Figure: Variance per variable

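4.3 Interpretation

  • Positive covariance: the two variables tend to move in the same direction
  • Negative covariance: the two variables tend to move in opposite directions
  • The variable with the largest variance has the widest spread in its own units: Cement (10921.74) is the largest and Superplasticizer (35.68) the smallest. Variances and covariances depend on the units of measurement, so their magnitudes are not directly comparable across variables measured on different scales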
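
The unit dependence of covariance can be illustrated with a short sketch. It uses the built-in mtcars data rather than the concrete dataset, and the factor of 1000 is an arbitrary rescaling chosen for the example.

```r
# Built-in dataset used only for illustration (not the concrete data)
toy <- mtcars[, c("mpg", "wt")]

# Express weight in a unit 1000 times smaller (arbitrary rescaling)
rescaled <- transform(toy, wt = wt * 1000)

cov(toy)["mpg", "wt"]        # covariance in the original units
cov(rescaled)["mpg", "wt"]   # 1000 times larger in magnitude
cor(toy)["mpg", "wt"]        # the correlation ...
cor(rescaled)["mpg", "wt"]   # ... is unchanged by the rescaling
```

This is why the correlation matrix, not the covariance matrix, is the usual basis for comparing the strength of relationships across variables with different units.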

5 Eigenvalue dan Eigenvector

5.1 Eigenvalues

eigen_result <- eigen(corr_matrix)
prop_var <- eigen_result$values / sum(eigen_result$values)
cumsum_var <- cumsum(prop_var)

eigen_table <- data.frame(
  PC = paste0("PC", 1:length(eigen_result$values)),
  Eigenvalue = eigen_result$values,
  Proportion = prop_var * 100,
  Cumulative = cumsum_var * 100
)

eigen_table %>%
  kable(caption = "Eigenvalue Summary", 
        digits = 4,
        col.names = c("PC", "Eigenvalue", "Proportion (%)", "Cumulative (%)")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                position = "center")

Eigenvalue Summary
PC   Eigenvalue  Proportion (%)  Cumulative (%)
PC1      2.2877         25.4190         25.4190
PC2      1.9365         21.5168         46.9359
PC3      1.4089         15.6547         62.5906
PC4      1.0428         11.5865         74.1771
PC5      1.0142         11.2684         85.4455
PC6      0.8474          9.4157         94.8612
PC7      0.2870          3.1884         98.0496
PC8      0.1468          1.6309         99.6805
PC9      0.0288          0.3195        100.0000

5.2 Scree Plot

ggplot(eigen_table, aes(x = PC, y = Eigenvalue, group = 1)) +
  geom_line(color = "#3498db", size = 1.2) +
  geom_point(color = "#e74c3c", size = 3) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "#27ae60", size = 1) +
  annotate("text", x = nrow(eigen_table) * 0.7, y = 1.2, 
           label = "Kaiser Criterion (λ = 1)", color = "#27ae60") +
  theme_minimal() +
  labs(
    title = "Scree Plot",
    subtitle = "Eigenvalues of the Correlation Matrix",
    x = "Principal Component",
    y = "Eigenvalue"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    plot.subtitle = element_text(hjust = 0.5, size = 11),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
Figure: Scree plot of eigenvalue versus principal component

5.3 Eigenvectors

eigen_result$vectors %>%
  as.data.frame() %>%
  setNames(paste0("PC", 1:ncol(.))) %>%
  mutate(Variable = names(data)) %>%
  select(Variable, everything()) %>%
  kable(caption = "Eigenvectors (Loading Matrix)", digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                font_size = 10) %>%
  scroll_box(width = "100%", height = "400px")

Eigenvectors (Loading Matrix)
(Row labels abbreviated; units omitted.)

Variable                   PC1       PC2       PC3       PC4       PC5       PC6       PC7       PC8       PC9
Cement                 -0.0411    0.5365    0.3597   -0.3098   -0.0547   -0.3899   -0.1338   -0.2984   -0.4725
Blast Furnace Slag     -0.1630    0.1363   -0.6990    0.0763   -0.3626    0.2703    0.0048   -0.2288   -0.4512
Fly Ash                 0.3698   -0.2684    0.0198    0.6007    0.2276   -0.3202    0.2472   -0.2553   -0.3865
Water                  -0.5641   -0.1181   -0.1203    0.0469    0.2961   -0.3062   -0.0098    0.5856   -0.3560
Superplasticizer        0.5361    0.2482   -0.1880    0.1659   -0.0370   -0.0828   -0.6139    0.4476   -0.0528
Coarse Aggregate       -0.0605   -0.2248    0.5495    0.2216   -0.5455    0.3476   -0.0598    0.2431   -0.3372
Fine Aggregate          0.3817   -0.1871    0.0012   -0.5278    0.3845    0.4091    0.1747    0.1403   -0.4187
Age                    -0.2619    0.2518    0.1696    0.3595    0.5285    0.5098   -0.3436   -0.2260   -0.0397
Compressive Strength    0.1072    0.6301    0.0335    0.2253    0.0003    0.1540    0.6260    0.3469    0.0606

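5.4 Interpretation

Eigenvalues and eigenvectors summarize the main structure of a dataset with many variables. Each eigenvalue gives the amount of variance captured by its principal component. Because the decomposition is applied to the correlation matrix, the eigenvalues sum to the number of variables (9), and dividing each by 9 gives the proportion of total variance explained. Working from the correlation matrix rather than the covariance matrix also keeps the variables with the largest variances (Cement at 10921.74 versus Superplasticizer at 35.68) from dominating the components. Each eigenvector holds the weights of the linear combination of standardized variables that forms the component, so the variables with the largest absolute weights contribute most to it. For example, PC1 is dominated by Water (-0.5641) and Superplasticizer (0.5361), while PC2 is dominated by compressive strength (0.6301) and Cement (0.5365).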
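
The properties described here can be checked numerically. The following is a minimal sketch on the built-in mtcars data, used only because the concrete workbook is not assumed to be available; the same checks should apply to eigen_result.

```r
# Built-in dataset used only for illustration (not the concrete data)
toy <- mtcars[, c("mpg", "disp", "hp", "wt")]
R <- cor(toy)
e <- eigen(R)

# Eigenvalues of a correlation matrix sum to the number of variables
sum(e$values)

# Defining property for the first eigenpair: R %*% v = lambda * v
v1 <- e$vectors[, 1]
all.equal(as.vector(R %*% v1), e$values[1] * v1)

# prcomp() on standardized data yields the same eigenvalues (sdev^2)
pca <- prcomp(toy, scale. = TRUE)
all.equal(pca$sdev^2, e$values)
```

The eigenvectors from eigen() and the rotation matrix from prcomp() may differ in sign column by column, which does not change the interpretation.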

n_significant <- sum(eigen_result$values > 1)

  • PC1 explains 25.42% of the total variance
  • PC2 explains 21.52% of the total variance
  • PC1 and PC2 together explain 46.94%
  • Under the Kaiser criterion (eigenvalue > 1), 5 PCs are retained
  • These 5 PCs explain 85.45% of the total variance

Under the Kaiser criterion, which retains components with an eigenvalue greater than 1, five principal components are kept. PC4 (1.0428) and PC5 (1.0142) exceed the threshold only narrowly, so the choice of five is sensitive to this cutoff. Together the five components explain 85.45% of the total variance, so the data can be reduced from 9 variables to 5 principal components at the cost of 14.55% of the variance, which makes subsequent analysis simpler.
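
The reduction itself, which the report does not show, can be sketched as a projection of the standardized data onto the retained eigenvectors. The example below uses the built-in mtcars data as a stand-in. The names toy, Z, and scores are illustrative and not part of the original analysis.

```r
# Built-in dataset used only for illustration (not the concrete data)
toy <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
e <- eigen(cor(toy))

# Kaiser criterion: keep components whose eigenvalue exceeds 1
k <- sum(e$values > 1)

# Standardize the variables, then project onto the first k eigenvectors
Z <- scale(toy)
scores <- Z %*% e$vectors[, seq_len(k), drop = FALSE]
colnames(scores) <- paste0("PC", seq_len(k))

dim(scores)             # rows = observations, columns = retained components
round(cov(scores), 4)   # diagonal = retained eigenvalues, off-diagonal = 0
```

For the concrete data, the same steps with eigen_result and scale(data) would give a 1030 × 5 score matrix under the Kaiser criterion.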

6 Conclusion

The analysis of the Concrete Compressive Strength Dataset leads to the following conclusions:

  1. Correlation Matrix. The correlation matrix expresses the relationships between variables on a standardized scale, which makes their strengths directly comparable. It shows which pairs are strongly or weakly, positively or negatively related. Here the strongest relationship is between Water and Superplasticizer (r = -0.6575), and compressive strength is most strongly correlated with Cement (r = 0.4978). This helps clarify how the factors that affect compressive strength relate to one another.

  2. Variance-Covariance Matrix. The variance-covariance matrix shows how much each variable varies and how changes in one variable relate to changes in another. The variances give the spread of each variable, from 35.68 for Superplasticizer to 10921.74 for Cement, and the sign of each covariance shows whether two variables tend to move in the same or in opposite directions. This is useful for understanding the scale and pattern of variation in the data.

  3. Principal Component Analysis (PCA). The PCA results show that the dimensionality of the data can be reduced substantially while retaining most of its variation. The 9 original variables can be summarized by 5 principal components that together retain 85.45% of the total variance, which gives a simpler structure that is easier to analyze.

  4. These results can be used for dimensionality reduction to make models more efficient, for visualizing multidimensional data so that patterns are easier to see, for feature extraction that builds a small set of informative composite features, and for identifying patterns related to concrete compressive strength.