Dokumen ini berisi analisis statistik multivariat terhadap dataset Titanic dengan fokus pada:
Variabel yang dianalisis: Age, SibSp,
Parch, dan Fare.
# Membaca dataset Titanic
titanic <- read.csv("Titanic-Dataset.csv")
# Menampilkan struktur data
str(titanic)## 'data.frame': 891 obs. of 12 variables:
## $ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
## $ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
## $ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
## $ Sex : chr "male" "female" "female" "female" ...
## $ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
## $ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
## $ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
## $ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
## $ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
## $ Cabin : chr "" "C85" "" "C123" ...
## $ Embarked : chr "S" "C" "S" "S" ...
## PassengerId Survived Pclass
## 1 1 0 3
## 2 2 1 1
## 3 3 1 3
## 4 4 1 1
## 5 5 0 3
## 6 6 0 3
## 7 7 0 1
## 8 8 0 3
## 9 9 1 3
## 10 10 1 2
## Name Sex Age SibSp Parch
## 1 Braund, Mr. Owen Harris male 22 1 0
## 2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 0
## 3 Heikkinen, Miss. Laina female 26 0 0
## 4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0
## 5 Allen, Mr. William Henry male 35 0 0
## 6 Moran, Mr. James male NA 0 0
## 7 McCarthy, Mr. Timothy J male 54 0 0
## 8 Palsson, Master. Gosta Leonard male 2 3 1
## 9 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27 0 2
## 10 Nasser, Mrs. Nicholas (Adele Achem) female 14 1 0
## Ticket Fare Cabin Embarked
## 1 A/5 21171 7.2500 S
## 2 PC 17599 71.2833 C85 C
## 3 STON/O2. 3101282 7.9250 S
## 4 113803 53.1000 C123 S
## 5 373450 8.0500 S
## 6 330877 8.4583 Q
## 7 17463 51.8625 E46 S
## 8 349909 21.0750 S
## 9 347742 11.1333 S
## 10 237736 30.0708 C
Memilih 4 kolom yang akan dianalisis: Age,
SibSp, Parch, dan Fare.
# Memilih kolom yang diperlukan
data_selected <- titanic[, c("Age", "SibSp", "Parch", "Fare")]
# Menampilkan ringkasan statistik
summary(data_selected)## Age SibSp Parch Fare
## Min. : 0.42 Min. :0.000 Min. :0.0000 Min. : 0.00
## 1st Qu.:20.12 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.: 7.91
## Median :28.00 Median :0.000 Median :0.0000 Median : 14.45
## Mean :29.70 Mean :0.523 Mean :0.3816 Mean : 32.20
## 3rd Qu.:38.00 3rd Qu.:1.000 3rd Qu.:0.0000 3rd Qu.: 31.00
## Max. :80.00 Max. :8.000 Max. :6.0000 Max. :512.33
## NA's :177
## Jumlah Missing Value:
## Age SibSp Parch Fare
## 177 0 0 0
# Hapus baris dengan missing value
data_clean <- na.omit(data_selected)
# Informasi dimensi data
cat("\n=== DIMENSI DATA ===\n")##
## === DIMENSI DATA ===
## Sebelum pembersihan: 891 baris
## Sesudah pembersihan: 714 baris
## Baris terhapus: 177 baris
## Age SibSp Parch Fare
## Min. : 0.42 Min. :0.0000 Min. :0.0000 Min. : 0.00
## 1st Qu.:20.12 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 8.05
## Median :28.00 Median :0.0000 Median :0.0000 Median : 15.74
## Mean :29.70 Mean :0.5126 Mean :0.4314 Mean : 34.69
## 3rd Qu.:38.00 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.: 33.38
## Max. :80.00 Max. :5.0000 Max. :6.0000 Max. :512.33
## Age SibSp Parch Fare
## Age 1.00000000 -0.3082468 -0.1891193 0.09606669
## SibSp -0.30824676 1.0000000 0.3838199 0.13832879
## Parch -0.18911926 0.3838199 1.0000000 0.20511888
## Fare 0.09606669 0.1383288 0.2051189 1.00000000
# Membuat heatmap matriks korelasi
library(corrplot)
corrplot(correlation_matrix,
method = "color",
type = "upper",
addCoef.col = "black",
tl.col = "black",
tl.srt = 45,
title = "Matriks Korelasi - Dataset Titanic",
mar = c(0,0,2,0))Matriks korelasi menunjukkan hubungan linear antara variabel dengan nilai berkisar dari -1 hingga +1:
## 1. Age vs SibSp: -0.3082
## → Korelasi negatif lemah
## → Penumpang lebih tua cenderung memiliki lebih sedikit saudara/pasangan
## 2. SibSp vs Parch: 0.3838
## → Korelasi positif sedang (TERKUAT)
## → Penumpang dengan saudara/pasangan cenderung juga bersama orang tua/anak
## 3. Parch vs Fare: 0.2051
## → Korelasi positif sedang
## → Penumpang dengan orang tua/anak membayar tarif lebih tinggi
# Menghitung matriks variance-covariance
covariance_matrix <- cov(data_clean)
print(covariance_matrix)## Age SibSp Parch Fare
## Age 211.019125 -4.1633339 -2.3441911 73.849030
## SibSp -4.163334 0.8644973 0.3045128 6.806212
## Parch -2.344191 0.3045128 0.7281027 9.262176
## Fare 73.849030 6.8062117 9.2621760 2800.413100
# Visualisasi varian (diagonal)
variances <- diag(covariance_matrix)
barplot(variances,
names.arg = names(variances),
main = "Varian Setiap Variabel",
ylab = "Varian",
col = "steelblue",
las = 1)Matriks Variance-Covariance menunjukkan:
## 1. Varian Age: 211.02
## → Penyebaran usia penumpang
## 2. Varian SibSp: 0.86
## → Variabilitas jumlah saudara/pasangan
## 3. Varian Parch: 0.73
## → Variabilitas jumlah orang tua/anak
## 4. Varian Fare: 2800.41 ⭐ TERTINGGI
## → Variabilitas tarif sangat tinggi
## → Perbedaan tarif antar penumpang sangat besar
# Menghitung eigenvalue dan eigenvector dari matriks korelasi
eigen_result <- eigen(correlation_matrix)
cat("=== EIGENVALUES ===\n")## === EIGENVALUES ===
## [1] 1.6367503 1.1071770 0.6694052 0.5866676
##
## === EIGENVECTORS ===
## [,1] [,2] [,3] [,4]
## [1,] 0.4388714 -0.5962415 0.56095237 0.37043268
## [2,] -0.6250770 0.0732461 0.05500006 0.77517016
## [3,] -0.5908590 -0.1774532 0.60558695 -0.50265342
## [4,] -0.2599159 -0.7795136 -0.56175785 -0.09607493
# Membuat tabel informatif
eigenvalue_df <- data.frame(
Komponen = paste0("PC", 1:length(eigen_result$values)),
Eigenvalue = round(eigen_result$values, 4),
Proporsi_Varian = round(eigen_result$values / sum(eigen_result$values) * 100, 2),
Kumulatif = round(cumsum(eigen_result$values / sum(eigen_result$values) * 100), 2)
)
knitr::kable(eigenvalue_df,
caption = "Eigenvalue dan Proporsi Varian",
align = "c")| Komponen | Eigenvalue | Proporsi_Varian | Kumulatif |
|---|---|---|---|
| PC1 | 1.6368 | 40.92 | 40.92 |
| PC2 | 1.1072 | 27.68 | 68.60 |
| PC3 | 0.6694 | 16.74 | 85.33 |
| PC4 | 0.5867 | 14.67 | 100.00 |
plot(eigen_result$values,
type = "b",
xlab = "Komponen",
ylab = "Eigenvalue",
main = "Scree Plot - Eigenvalue per Komponen",
col = "blue",
pch = 19,
lwd = 2,
cex = 1.5)
abline(h = 1, col = "red", lty = 2, lwd = 2)
legend("topright",
legend = "Kaiser Criterion (Eigenvalue = 1)",
col = "red",
lty = 2,
lwd = 2)
grid()barplot(eigenvalue_df$Proporsi_Varian,
names.arg = eigenvalue_df$Komponen,
xlab = "Komponen",
ylab = "Proporsi Varian (%)",
main = "Proporsi Varian yang Dijelaskan",
col = c("steelblue", "orange", "green", "purple"),
ylim = c(0, max(eigenvalue_df$Proporsi_Varian) * 1.2))
text(x = 1:4 * 1.2 - 0.5,
y = eigenvalue_df$Proporsi_Varian + 2,
labels = paste0(round(eigenvalue_df$Proporsi_Varian, 1), "%"))plot(eigenvalue_df$Kumulatif,
type = "b",
xlab = "Jumlah Komponen",
ylab = "Varian Kumulatif (%)",
main = "Varian Kumulatif",
col = "darkgreen",
pch = 19,
lwd = 2,
cex = 1.5,
ylim = c(0, 100))
abline(h = 80, col = "red", lty = 2, lwd = 2)
legend("bottomright",
legend = "Target 80% Varian",
col = "red",
lty = 2,
lwd = 2)
grid()# Tabel loading
loadings_df <- data.frame(
Variabel = rownames(correlation_matrix),
PC1 = round(eigen_result$vectors[, 1], 4),
PC2 = round(eigen_result$vectors[, 2], 4),
PC3 = round(eigen_result$vectors[, 3], 4),
PC4 = round(eigen_result$vectors[, 4], 4)
)
knitr::kable(loadings_df,
caption = "Loading Variabel pada Setiap Komponen",
align = "c")| Variabel | PC1 | PC2 | PC3 | PC4 |
|---|---|---|---|---|
| Age | 0.4389 | -0.5962 | 0.5610 | 0.3704 |
| SibSp | -0.6251 | 0.0732 | 0.0550 | 0.7752 |
| Parch | -0.5909 | -0.1775 | 0.6056 | -0.5027 |
| Fare | -0.2599 | -0.7795 | -0.5618 | -0.0961 |
# Menghitung scores untuk PC1 dan PC2
pca_scores <- as.matrix(scale(data_clean)) %*% eigen_result$vectors[, 1:2]
# Membuat biplot
plot(pca_scores[, 1], pca_scores[, 2],
xlab = paste0("PC1 (", round(eigenvalue_df$Proporsi_Varian[1], 1), "%)"),
ylab = paste0("PC2 (", round(eigenvalue_df$Proporsi_Varian[2], 1), "%)"),
main = "Biplot: Sebaran Data pada 2 Komponen Utama",
pch = 20,
col = rgb(0, 0, 1, 0.3),
cex = 0.8)
abline(h = 0, v = 0, col = "gray", lty = 2)
grid()
# Menambahkan vektor loading
arrows(0, 0,
eigen_result$vectors[, 1] * 3,
eigen_result$vectors[, 2] * 3,
col = "red",
lwd = 2,
length = 0.1)
text(eigen_result$vectors[, 1] * 3.5,
eigen_result$vectors[, 2] * 3.5,
labels = rownames(correlation_matrix),
col = "red",
font = 2)1. Eigenvalue:
2. Eigenvector:
## 📊 INTERPRETASI HASIL:
## 1. Komponen Utama 1 (PC1):
## • Eigenvalue: 1.6368
## • Proporsi Varian: 40.92 %
## • Dimensi utama variabilitas data
## 2. Komponen Utama 2 (PC2):
## • Eigenvalue: 1.1072
## • Proporsi Varian: 27.68 %
## 3. Reduksi Dimensi:
## • 2 komponen pertama menjelaskan: 68.6 % varian
## • Efektif: 4 variabel → 2 komponen
## • Kehilangan informasi minimal!
## 4. Berdasarkan Kaiser Criterion (eigenvalue > 1):
significant_components <- sum(eigen_result$values > 1)
cat(" • Jumlah komponen signifikan:", significant_components, "\n")## • Jumlah komponen signifikan: 2
| Age | SibSp | Parch | Fare | |
|---|---|---|---|---|
| Min. : 0.42 | Min. :0.0000 | Min. :0.0000 | Min. : 0.00 | |
| 1st Qu.:20.12 | 1st Qu.:0.0000 | 1st Qu.:0.0000 | 1st Qu.: 8.05 | |
| Median :28.00 | Median :0.0000 | Median :0.0000 | Median : 15.74 | |
| Mean :29.70 | Mean :0.5126 | Mean :0.4314 | Mean : 34.69 | |
| 3rd Qu.:38.00 | 3rd Qu.:1.0000 | 3rd Qu.:1.0000 | 3rd Qu.: 33.38 | |
| Max. :80.00 | Max. :5.0000 | Max. :6.0000 | Max. :512.33 |
Analisis ini menunjukkan bahwa:
## R version 4.5.2 (2025-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=English_Indonesia.utf8 LC_CTYPE=English_Indonesia.utf8
## [3] LC_MONETARY=English_Indonesia.utf8 LC_NUMERIC=C
## [5] LC_TIME=English_Indonesia.utf8
##
## time zone: Asia/Jakarta
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] corrplot_0.95
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.56
## [5] cachem_1.1.0 knitr_1.51 htmltools_0.5.9 rmarkdown_2.30
## [9] lifecycle_1.0.5 cli_3.6.5 sass_0.4.10 jquerylib_0.1.4
## [13] compiler_4.5.2 tools_4.5.2 evaluate_1.0.5 bslib_0.10.0
## [17] yaml_2.3.12 rlang_1.1.7 jsonlite_2.0.0
Analisis Selesai
Dataset: Titanic | Variabel: Age, SibSp, Parch, Fare
Observasi: 714 baris