Titanic.Dataset <- read.csv("Titanic-Dataset.csv")
data <- Titanic.Dataset[, c("Age", "SibSp", "Parch", "Fare")]
data <- na.omit(data)
dim(data)
## [1] 714 4
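Only four numeric variables were selected: Age, SibSp (number of siblings or spouses aboard), Parch (number of parents or children aboard), and Fare (ticket price). Rows containing a missing value in any of these variables were removed with na.omit(). After cleaning, the dataset consisted of 714 observations and 4 variables.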
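As a rough illustration of the cleaning step, the sketch below applies the same select-then-`na.omit()` pattern to a small made-up data frame and counts how many rows are dropped. The `toy` data frame and all of its values are hypothetical and are not taken from the Titanic file; only the column names match.

```r
# Hypothetical toy data (NOT the Titanic data), used only to show the mechanics.
toy <- data.frame(
  Age   = c(22, NA, 26, 35, NA),
  SibSp = c(1, 0, 0, 1, 0),
  Parch = c(0, 0, 0, 0, 2),
  Fare  = c(7.25, 8.46, 7.93, 53.10, 21.08)
)

# Missing values per column before cleaning
na_per_column <- colSums(is.na(toy))
na_per_column

# Listwise deletion: any row with at least one NA is removed
toy_clean <- na.omit(toy)

# Number of rows removed, computed two equivalent ways
n_dropped <- nrow(toy) - nrow(toy_clean)
n_dropped_attr <- length(attr(toy_clean, "na.action"))

dim(toy_clean)
n_dropped
```

Running the same two checks (`colSums(is.na(...))` and the row difference) on the real data would show which variable is responsible for the rows that were removed.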
cor_matrix <- cor(data)
cor_matrix
##               Age      SibSp      Parch       Fare
## Age    1.00000000 -0.3082468 -0.1891193 0.09606669
## SibSp -0.30824676  1.0000000  0.3838199 0.13832879
## Parch -0.18911926  0.3838199  1.0000000 0.20511888
## Fare   0.09606669  0.1383288  0.2051189 1.00000000
The correlation matrix shows the strength and direction of the linear relationship between each pair of variables.
The strongest correlation is between SibSp and Parch (0.384), indicating that passengers who traveled with siblings or spouses also tended to travel with parents or children. Age has a weak-to-moderate negative correlation with SibSp (−0.308) and a weak negative correlation with Parch (−0.189), meaning older passengers tended to travel with fewer family members. The correlation between Age and Fare is very weak (0.096), so age and ticket price are almost linearly unrelated. SibSp–Fare (0.138) and Parch–Fare (0.205) show weak positive correlations: passengers accompanied by more family members tended to pay somewhat higher fares.
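The ranking of pairs described above can also be read off programmatically. The following is a minimal sketch that re-enters the printed correlation matrix by hand (so it runs without the CSV file) and sorts the six distinct pairs by absolute correlation. The names `cor_m` and `pair_tbl` are illustrative and do not appear in the original code.

```r
vars <- c("Age", "SibSp", "Parch", "Fare")

# Correlation matrix copied from the printed output above
cor_m <- matrix(c(
   1.00000000, -0.30824676, -0.18911926, 0.09606669,
  -0.30824676,  1.00000000,  0.38381990, 0.13832879,
  -0.18911926,  0.38381990,  1.00000000, 0.20511888,
   0.09606669,  0.13832879,  0.20511888, 1.00000000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Row/column positions of the upper triangle (each pair counted once)
idx <- which(upper.tri(cor_m), arr.ind = TRUE)

pair_tbl <- data.frame(
  var1 = vars[idx[, "row"]],
  var2 = vars[idx[, "col"]],
  r    = cor_m[idx],
  row.names = NULL
)

# Strongest linear associations first, regardless of sign
pair_tbl <- pair_tbl[order(-abs(pair_tbl$r)), ]
pair_tbl
```

Sorting on the absolute value keeps negative associations such as Age and SibSp from being pushed to the bottom of the table.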
cov_matrix <- cov(data)
cov_matrix
##              Age      SibSp      Parch        Fare
## Age   211.019125 -4.1633339 -2.3441911   73.849030
## SibSp  -4.163334  0.8644973  0.3045128    6.806212
## Parch  -2.344191  0.3045128  0.7281027    9.262176
## Fare   73.849030  6.8062117  9.2621760 2800.413100
The variance–covariance matrix describes how each variable varies on its own (the diagonal entries) and how pairs of variables vary together (the off-diagonal entries).
Fare has the largest variance (2800.41), indicating that ticket prices vary widely among passengers, while Age shows moderate variation (211.02) and SibSp (0.86) and Parch (0.73) have relatively small variances. Because variances and covariances are expressed in the units of the variables, these magnitudes are not directly comparable across variables measured on different scales. The negative covariances between Age and SibSp (−4.16) and between Age and Parch (−2.34) suggest that older passengers tended to travel with fewer family members. The positive covariance between SibSp and Parch (0.30) indicates that passengers traveling with siblings or spouses also tended to travel with parents or children.
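To make the link between the two matrices concrete, the sketch below rescales the printed covariance matrix into a correlation matrix by dividing each covariance by the product of the two standard deviations. The matrix is re-entered by hand so the example runs without the CSV file; `cov_m`, `sds`, and `cor_from_cov` are illustrative names, not part of the original code.

```r
vars <- c("Age", "SibSp", "Parch", "Fare")

# Covariance matrix copied from the printed output above
cov_m <- matrix(c(
  211.019125, -4.1633339, -2.3441911,   73.849030,
   -4.1633339,  0.8644973,  0.3045128,    6.8062117,
   -2.3441911,  0.3045128,  0.7281027,    9.2621760,
   73.849030,   6.8062117,  9.2621760, 2800.413100
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Standard deviations are the square roots of the diagonal
sds <- sqrt(diag(cov_m))
round(sds, 2)

# cor(i, j) = cov(i, j) / (sd_i * sd_j)
cor_from_cov <- cov_m / outer(sds, sds)

# Base R provides the same rescaling as cov2cor()
same_as_builtin <- isTRUE(all.equal(cor_from_cov, cov2cor(cov_m)))
same_as_builtin

round(cor_from_cov, 3)
```

The rescaled values should agree with the correlation matrix shown earlier, which is why the signs of the covariances and correlations always match even though their magnitudes differ.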
eigen_result <- eigen(cov_matrix)
Eigenvalues
eigen_result$values
## [1] 2802.5636587 209.0385659 0.9438783 0.4787214
Eigenvectors
eigen_result$vectors
##             [,1]        [,2]         [,3]          [,4]
## [1,] 0.028477552  0.99929943 -0.024018111  0.0035788596
## [2,] 0.002386349 -0.02093144 -0.773693322  0.6332099362
## [3,] 0.003280818 -0.01253786 -0.633088089 -0.7739712590
## [4,] 0.999586200 -0.02837826  0.004609234  0.0009266652
Eigenvalues and eigenvectors of the covariance matrix summarize the overall variability in the data: each eigenvalue is the variance along one component, and the corresponding eigenvector (a column of the matrix above, with rows in the order Age, SibSp, Parch, Fare) shows how much each variable contributes to that component.
The first eigenvalue (2802.56) is far larger than the others and accounts for about 93% of the total variance (the eigenvalues sum to 3013.02). The second eigenvalue (209.04) accounts for about 6.9%, while the third (0.94) and fourth (0.48) are very small and together contribute less than 0.1% of the total variance.
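The share of variance attributed to each component can be computed directly from the printed eigenvalues. This is a minimal sketch that re-enters those four values by hand; `eig_values`, `prop_var`, and `cum_var` are illustrative names.

```r
# Eigenvalues copied from the printed output above
eig_values <- c(2802.5636587, 209.0385659, 0.9438783, 0.4787214)

# Total variance equals the sum of the eigenvalues
# (and the sum of the diagonal of the covariance matrix)
total_var <- sum(eig_values)

# Proportion and cumulative proportion of variance per component
prop_var <- eig_values / total_var
cum_var  <- cumsum(prop_var)

variance_tbl <- data.frame(
  component  = 1:4,
  eigenvalue = round(eig_values, 2),
  proportion = round(prop_var, 4),
  cumulative = round(cum_var, 4)
)
variance_tbl
```

With the real objects in the session, the same table could be built from `eigen_result$values` instead of the hand-entered vector.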
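The first component is dominated by Fare (loading 0.9996), indicating that ticket price accounts for most of the variability among passengers. The second component is driven almost entirely by Age (0.9993), which explains the second-largest portion of the variation. The third and fourth components are combinations of SibSp and Parch (loadings of about 0.63 to 0.77 in absolute value), so the family-related variables contribute very little to the total variance compared with Fare and Age. Since the decomposition uses the covariance matrix of unstandardized variables, this ordering largely mirrors the variances of the individual variables (Fare 2800.41, Age 211.02).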
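Because the components above follow the raw variances so closely, it can be useful to compare them with a decomposition of the correlation matrix, which puts all four variables on a common scale. The sketch below is an alternative that the original analysis does not use; it re-enters the printed correlation matrix by hand, and the names `cor_m` and `eig_cor` are illustrative.

```r
vars <- c("Age", "SibSp", "Parch", "Fare")

# Correlation matrix copied from the printed output earlier in the report
cor_m <- matrix(c(
   1.00000000, -0.30824676, -0.18911926, 0.09606669,
  -0.30824676,  1.00000000,  0.38381990, 0.13832879,
  -0.18911926,  0.38381990,  1.00000000, 0.20511888,
   0.09606669,  0.13832879,  0.20511888, 1.00000000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Eigen decomposition of the standardized (correlation) structure
eig_cor <- eigen(cor_m, symmetric = TRUE)

# Label the loadings so each row can be read as a variable
rownames(eig_cor$vectors) <- vars
colnames(eig_cor$vectors) <- paste0("PC", 1:4)

# Each standardized variable has variance 1, so the eigenvalues sum to 4
prop_var_cor <- eig_cor$values / sum(eig_cor$values)

round(prop_var_cor, 3)
round(eig_cor$vectors, 3)
```

If the variance shares here are much more even than in the covariance-based result, that would suggest the dominance of Fare and Age is mainly a matter of measurement scale rather than of structure in the data.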