Data Cleaning

Titanic.Dataset <- read.csv("Titanic-Dataset.csv")
data <- Titanic.Dataset[, c("Age", "SibSp", "Parch", "Fare")]
data <- na.omit(data)
dim(data)
## [1] 714   4

Only four numeric variables were selected: Age, SibSp (number of siblings or spouses aboard), Parch (number of parents or children aboard), and Fare. Rows containing missing values were removed with na.omit(). After cleaning, the dataset consists of 714 observations and 4 variables.
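As a minimal sketch of what the cleaning step does, the example below counts missing values per column and then drops incomplete rows. The `toy` data frame and its values are made up for illustration and are not taken from the Titanic file.

```r
# Hypothetical toy data (NOT from Titanic-Dataset.csv), same four columns
toy <- data.frame(
  Age   = c(22, NA, 26, 35),
  SibSp = c(1, 0, 0, 1),
  Parch = c(0, 0, NA, 0),
  Fare  = c(7.25, 8.46, 7.93, 53.10)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# na.omit() drops every row that has at least one NA
cleaned <- na.omit(toy)
print(dim(cleaned))
```

Running `colSums(is.na(data))` on the real data before `na.omit()` would show which of the four variables is responsible for the dropped rows.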

a) Correlation Matrix

cor_matrix <- cor(data)
cor_matrix
##               Age      SibSp      Parch       Fare
## Age    1.00000000 -0.3082468 -0.1891193 0.09606669
## SibSp -0.30824676  1.0000000  0.3838199 0.13832879
## Parch -0.18911926  0.3838199  1.0000000 0.20511888
## Fare   0.09606669  0.1383288  0.2051189 1.00000000

The correlation matrix shows the strength and direction of the linear relationship between each pair of variables.

The strongest correlation is between SibSp and Parch (0.384), indicating that passengers who traveled with siblings or spouses also tended to travel with parents or children. Age has a weak to moderate negative correlation with SibSp (−0.308) and a weak negative correlation with Parch (−0.189), meaning older passengers tended to travel with fewer family members. The correlation between Age and Fare is very weak (0.096), indicating almost no linear association between age and ticket price. SibSp and Fare (0.138) and Parch and Fare (0.205) are weakly positively correlated, so fares tended to be somewhat higher for passengers accompanied by more family members.
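The ranking of pairs described above can be sketched in code. The example below re-enters the printed correlation matrix by hand (so it runs without the CSV file) and sorts the six distinct pairs by absolute correlation; the names `vars`, `pairs`, and `ranked` are introduced here for illustration only.

```r
vars <- c("Age", "SibSp", "Parch", "Fare")

# Correlation matrix copied from the printed output above
cor_matrix <- matrix(
  c( 1.00000000, -0.3082468, -0.1891193, 0.09606669,
    -0.30824676,  1.0000000,  0.3838199, 0.13832879,
    -0.18911926,  0.3838199,  1.0000000, 0.20511888,
     0.09606669,  0.1383288,  0.2051189, 1.00000000),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Use the upper triangle so each pair appears exactly once
pairs <- which(upper.tri(cor_matrix), arr.ind = TRUE)
ranked <- data.frame(
  var1 = vars[pairs[, "row"]],
  var2 = vars[pairs[, "col"]],
  r    = cor_matrix[upper.tri(cor_matrix)]
)

# Strongest linear relationships first, regardless of sign
ranked <- ranked[order(-abs(ranked$r)), ]
print(ranked, row.names = FALSE)
```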

b) Variance-Covariance Matrix

cov_matrix <- cov(data)
cov_matrix
##              Age      SibSp      Parch        Fare
## Age   211.019125 -4.1633339 -2.3441911   73.849030
## SibSp  -4.163334  0.8644973  0.3045128    6.806212
## Parch  -2.344191  0.3045128  0.7281027    9.262176
## Fare   73.849030  6.8062117  9.2621760 2800.413100

The variance-covariance matrix describes how the variables vary individually (variances on the diagonal) and jointly (covariances off the diagonal).

Fare has by far the largest variance (2800.41), indicating that ticket prices vary widely among passengers. Age shows moderate variance (211.02), while SibSp (0.86) and Parch (0.73) have small variances. Because variances and covariances are expressed in the units of the original variables, these magnitudes are not directly comparable across variables. The negative covariances between Age and SibSp (−4.16) and between Age and Parch (−2.34) suggest that older passengers tended to travel with fewer family members. The positive covariance between SibSp and Parch (0.30) indicates that passengers traveling with siblings or spouses also tended to travel with parents or children.
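The link between the two matrices can be checked directly: dividing each covariance by the product of the two standard deviations gives the correlation. The sketch below re-enters the printed covariance matrix by hand (an assumption made so the example runs without the CSV file) and compares the manual calculation with base R's `cov2cor()`.

```r
vars <- c("Age", "SibSp", "Parch", "Fare")

# Covariance matrix copied from the printed output above
cov_matrix <- matrix(
  c(211.019125, -4.1633339, -2.3441911,   73.849030,
     -4.1633339,  0.8644973,  0.3045128,    6.8062117,
     -2.3441911,  0.3045128,  0.7281027,    9.2621760,
     73.849030,   6.8062117,  9.2621760, 2800.413100),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Standard deviations are the square roots of the diagonal
sds <- sqrt(diag(cov_matrix))

# r_ij = cov_ij / (sd_i * sd_j)
cor_from_cov <- cov_matrix / outer(sds, sds)

# cov2cor() performs the same rescaling
print(all.equal(cor_from_cov, cov2cor(cov_matrix)))
print(round(cor_from_cov, 3))
```

Up to rounding in the printed values, the result should agree with the correlation matrix in part a).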

c) Eigenvalues and Eigenvectors

eigen_result <- eigen(cov_matrix)

Eigenvalues

eigen_result$values
## [1] 2802.5636587  209.0385659    0.9438783    0.4787214

Eigenvectors

eigen_result$vectors
##             [,1]        [,2]         [,3]          [,4]
## [1,] 0.028477552  0.99929943 -0.024018111  0.0035788596
## [2,] 0.002386349 -0.02093144 -0.773693322  0.6332099362
## [3,] 0.003280818 -0.01253786 -0.633088089 -0.7739712590
## [4,] 0.999586200 -0.02837826  0.004609234  0.0009266652

The eigenvalues of the covariance matrix measure how much of the total variance lies along each principal direction, and the eigenvectors show which variables contribute most to each direction.

The first eigenvalue (2802.56) is far larger than the others, showing that most of the variability in the data is captured by the first component. The second eigenvalue (209.04) accounts for a moderate share of the remaining variation, while the third (0.94) and fourth (0.48) eigenvalues are very small, indicating that these components contribute very little to the overall variance.
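The share of variance captured by each component can be made explicit by dividing each eigenvalue by the sum of all eigenvalues. A minimal sketch, using the printed eigenvalues entered by hand (the variable names are introduced here for illustration):

```r
# Eigenvalues copied from the printed output above
eigen_values <- c(2802.5636587, 209.0385659, 0.9438783, 0.4787214)

# Proportion of total variance per component, and the running total
prop_var <- eigen_values / sum(eigen_values)
cum_var  <- cumsum(prop_var)

summary_table <- data.frame(
  component  = paste0("PC", seq_along(eigen_values)),
  eigenvalue = eigen_values,
  proportion = round(prop_var, 4),
  cumulative = round(cum_var, 4)
)
print(summary_table, row.names = FALSE)
```

With these values, the first component accounts for roughly 93% of the total variance and the second for about 7%, so the first two together cover nearly all of it.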

The first component is dominated by Fare (loading 0.9996), indicating that ticket price is the main source of variability among passengers. The second component is dominated by Age (0.9993), so age accounts for the second largest portion of the variation. The third and fourth components are mainly combinations of SibSp and Parch, suggesting that the family-related variables contribute relatively little compared with Fare and Age. Because the decomposition is based on the covariance matrix rather than the correlation matrix, this ordering largely reflects the scale of each variable: Fare and Age have much larger variances than SibSp and Parch.
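The reading of the loadings above can be sketched programmatically: for each eigenvector, find the variable with the largest absolute loading. The example re-enters the printed covariance matrix by hand so that it runs without the CSV file; `dominant` and `residual` are names introduced here for illustration.

```r
vars <- c("Age", "SibSp", "Parch", "Fare")

# Covariance matrix copied from the printed output above
cov_matrix <- matrix(
  c(211.019125, -4.1633339, -2.3441911,   73.849030,
     -4.1633339,  0.8644973,  0.3045128,    6.8062117,
     -2.3441911,  0.3045128,  0.7281027,    9.2621760,
     73.849030,   6.8062117,  9.2621760, 2800.413100),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

eig <- eigen(cov_matrix, symmetric = TRUE)
rownames(eig$vectors) <- vars

# Variable with the largest absolute loading in each component
# (absolute values, because the sign of an eigenvector is arbitrary)
dominant <- vars[apply(abs(eig$vectors), 2, which.max)]
print(dominant)

# Sanity check: cov_matrix %*% V should equal V %*% diag(lambda)
residual <- max(abs(cov_matrix %*% eig$vectors -
                    eig$vectors %*% diag(eig$values)))
print(residual)
```

Replacing `cov_matrix` with the correlation matrix in the `eigen()` call would give a standardized version of the analysis in which no variable dominates purely because of its units.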