About Titanic

RMS Titanic was a British passenger liner operated by the White Star Line that sank in the North Atlantic Ocean on 15 April 1912, after striking an iceberg during her maiden voyage from Southampton to New York City. Of the estimated 2,224 passengers and crew aboard, more than 1,500 died, making the sinking at the time one of the deadliest of a single ship and the deadliest peacetime sinking of a superliner or cruise ship to date.

RMS Titanic was the largest ship afloat at the time she entered service and was the second of three Olympic-class ocean liners operated by the White Star Line. She was built by the Harland and Wolff shipyard in Belfast. Thomas Andrews, chief naval architect of the shipyard at the time, died in the disaster.

# Data Input
titanic <- read.csv("data/train.csv")
str(titanic)
## 'data.frame':    891 obs. of  12 variables:
##  $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
##  $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
##  $ Name       : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
##  $ Sex        : chr  "male" "female" "female" "female" ...
##  $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
##  $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
##  $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
##  $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Cabin      : chr  "" "C85" "" "C123" ...
##  $ Embarked   : chr  "S" "C" "S" "S" ...
# Tidying Data
colSums(is.na(titanic))
## PassengerId    Survived      Pclass        Name         Sex         Age 
##           0           0           0           0           0         177 
##       SibSp       Parch      Ticket        Fare       Cabin    Embarked 
##           0           0           0           0           0           0
titanic$Sex <- as.factor(titanic$Sex)
titanic$Survived <- as.factor(titanic$Survived)
titanic$Pclass <- as.factor(titanic$Pclass)
titanic$Embarked <- as.factor(titanic$Embarked)

The Age column contains 177 NA values. They are ignored for now and left for further analysis.
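
Leaving the NA values in place has consequences for later calculations. As a minimal sketch, the block below uses a small made-up vector (not the Titanic data; the name `age` and its values are assumptions for illustration) to show that `mean()` returns NA unless `na.rm = TRUE` is set.

```r
# Toy vector standing in for titanic$Age (values are illustrative only)
age <- c(22, 38, NA, 35, NA, 54)

n_missing <- sum(is.na(age))          # number of missing ages
mean_raw  <- mean(age)                # NA: missing values propagate
mean_obs  <- mean(age, na.rm = TRUE)  # mean of the observed ages only

print(n_missing)
print(mean_raw)
print(mean_obs)
```

The formula interface of `aggregate()` used later in this report drops rows with NA by default, which is why it returns a number without an explicit `na.rm`.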

Data Overview

summary(titanic)
##   PassengerId    Survived Pclass      Name               Sex     
##  Min.   :  1.0   0:549    1:216   Length:891         female:314  
##  1st Qu.:223.5   1:342    2:184   Class :character   male  :577  
##  Median :446.0            3:491   Mode  :character               
##  Mean   :446.0                                                   
##  3rd Qu.:668.5                                                   
##  Max.   :891.0                                                   
##                                                                  
##       Age            SibSp           Parch           Ticket         
##  Min.   : 0.42   Min.   :0.000   Min.   :0.0000   Length:891        
##  1st Qu.:20.12   1st Qu.:0.000   1st Qu.:0.0000   Class :character  
##  Median :28.00   Median :0.000   Median :0.0000   Mode  :character  
##  Mean   :29.70   Mean   :0.523   Mean   :0.3816                     
##  3rd Qu.:38.00   3rd Qu.:1.000   3rd Qu.:0.0000                     
##  Max.   :80.00   Max.   :8.000   Max.   :6.0000                     
##  NA's   :177                                                        
##       Fare           Cabin           Embarked
##  Min.   :  0.00   Length:891          :  2   
##  1st Qu.:  7.91   Class :character   C:168   
##  Median : 14.45   Mode  :character   Q: 77   
##  Mean   : 32.20                      S:644   
##  3rd Qu.: 31.00                              
##  Max.   :512.33                              
## 

Description of the data:

- PassengerId: ID of the passenger
- Survived: Survival
  - 0 = Did not survive
  - 1 = Survived
- Pclass: Ticket class
  - 1 = 1st = Upper
  - 2 = 2nd = Middle
  - 3 = 3rd = Lower
- Name: Passenger name
- Sex: Passenger sex
- Age: Passenger age
- SibSp: No. of siblings or spouses aboard the Titanic
- Parch: No. of parents or children aboard the Titanic
- Ticket: Ticket number
- Fare: Passenger fare
- Cabin: Cabin number
- Embarked: Port of Embarkation
  - C = Cherbourg
  - Q = Queenstown
  - S = Southampton

Survival Rate

How many survived?

#Check Data of Survived Passenger
titanic_survived <- titanic[titanic$Survived == "1",]
nrow(titanic_survived)
## [1] 342

342 of the 891 passengers in the data set survived the Titanic.

Does Sex affect the rate of survival?

graphics::pie(xtabs(~ Sex, titanic_survived), main = "Sex of Survived Passengers")

female <- titanic_survived[titanic_survived$Sex == "female",]
aggregate(Age ~ Sex, female, mean)
##      Sex      Age
## 1 female 28.84772

The data set contains a total of 891 passengers on board the Titanic. When the ship sank, only 342 of them survived; the other 549 did not. Of the 342 survivors, 233 were female (out of 314 women in the data) and 109 were male (out of 577 men), which is consistent with the evacuation procedure of saving women and children first. The average age of the surviving female passengers is about 28.8 years.
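
The pie chart shows the composition of the survivors, while a survival rate also needs the number of people on board in each group. A minimal sketch of that calculation, using only counts reported in this report (314 women and 577 men in total, 233 female survivors out of 342); the object names `counts` and `rates` are chosen here for illustration:

```r
# Counts taken from the outputs in this report
counts <- matrix(
  c(314 - 233, 233,                    # female: did not survive, survived
    577 - (342 - 233), 342 - 233),     # male:   did not survive, survived
  nrow = 2, byrow = TRUE,
  dimnames = list(Sex = c("female", "male"), Survived = c("0", "1"))
)

# Row-wise percentages: share of each sex that did / did not survive
rates <- round(prop.table(counts, margin = 1) * 100, 1)
print(rates)
```

The column labelled 1 gives the survival rate per sex in percent.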

Does ticket class affect the rate of survival?

graphics::barplot(xtabs(~ Pclass, titanic_survived), main = "Ticket Class of Survived Passengers")

#Check ratio of the survived passengers of each class

summary(titanic_survived$Pclass)
##   1   2   3 
## 136  87 119
summary(titanic$Pclass)
##   1   2   3 
## 216 184 491
ratio1 <- 136 / 216 * 100
ratio1 # Upper Class
## [1] 62.96296
ratio2 <- 87 / 184 * 100
ratio2 # Middle Class
## [1] 47.28261
ratio3 <- 119 / 491 * 100
ratio3 # Lower Class
## [1] 24.23625

The data also show that the largest group of survivors came from the Upper Class (136 people). The survival rate per class makes the difference clearer: about 63% of Upper Class passengers survived, against 47% of Middle Class and only 24% of Lower Class passengers.
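
The ratios above are typed in by hand from the two summaries. As a sketch of an alternative, the same kind of rate can be derived directly from a cross-tabulation with `table()` and `prop.table()`. The data frame `toy` below is made up for illustration and only mimics the Pclass and Survived columns:

```r
# Toy data in the same shape as the columns used above (values are made up)
toy <- data.frame(
  Pclass   = factor(c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3)),
  Survived = factor(c(1, 1, 0, 1, 0, 0, 0, 1, 0, 0))
)

# Cross-tabulate class against survival, then take row-wise percentages
tab   <- table(toy$Pclass, toy$Survived, dnn = c("Pclass", "Survived"))
rates <- round(prop.table(tab, margin = 1) * 100, 1)
print(rates)
```

Applied to the real data, `prop.table(table(titanic$Pclass, titanic$Survived), margin = 1)` should give the per-class rates without copying counts by hand.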

Does Age affect the rate of survival?

graphics::barplot(xtabs(~ Age, titanic_survived), main = "Age of Survived Passengers")

The mean age of the survivors is also around 28 years. This may be because most of the survivors were female.
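
A bar plot of `xtabs(~ Age, ...)` draws one bar per distinct age, which can be hard to read. One possible alternative is to group ages into bands with `cut()` before counting. The sketch below uses a made-up vector, and the break points and labels are assumptions chosen for illustration, not part of the original analysis:

```r
# Toy ages (illustrative only); NA stands for a missing age
age <- c(0.42, 5, 16, 17, 28, 28, 35, 54, 80, NA)

# Left-closed bands: [0,17), [17,30), [30,50), [50,Inf)
breaks <- c(0, 17, 30, 50, Inf)
labels <- c("0-16", "17-29", "30-49", "50+")
age_group <- cut(age, breaks = breaks, labels = labels, right = FALSE)

# table() drops NA by default, so missing ages are not counted
counts <- table(age_group)
print(counts)
```

The resulting table can be passed to `graphics::barplot()` in the same way as the `xtabs()` result above.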

Does the Port of Embarkation affect the rate of survival?

summary(titanic$Embarked)
##       C   Q   S 
##   2 168  77 644
summary(titanic_survived$Embarked)
##       C   Q   S 
##   2  93  30 217
# Survival rate by Port of Embarkation
ratioC <- 93 / 168 * 100
ratioC
## [1] 55.35714
ratioQ <- 30 / 77 * 100
ratioQ
## [1] 38.96104
ratioS <- 217 / 644 * 100
ratioS
## [1] 33.69565
# Ticket Class of survivors from each Port of Embarkation
cherbourg <- titanic_survived[titanic_survived$Embarked == "C",]
summary(cherbourg$Pclass)
##  1  2  3 
## 59  9 25
queenstown <- titanic_survived[titanic_survived$Embarked == "Q",]
summary(queenstown$Pclass)
##  1  2  3 
##  1  2 27
southampton <- titanic_survived[titanic_survived$Embarked == "S",]
summary(southampton$Pclass)
##  1  2  3 
## 74 76 67

The calculation shows that passengers who embarked at Cherbourg, France, have the highest survival rate (55%). This could be due to the female-to-male ratio. In addition, most of the Cherbourg survivors were in Upper Class (59 of 93).

female <- titanic_survived[titanic_survived$Sex == "female",]
summary(female$Embarked)
##       C   Q   S 
##   2  64  27 140

Cherbourg

summary(cherbourg$Sex)
## female   male 
##     64     29
summary(cherbourg$Pclass)
##  1  2  3 
## 59  9 25
nrow(cherbourg[cherbourg$Sex == "female" & cherbourg$Pclass == "1",]) # No. of survived female passengers of Cherbourg in Upper Class
## [1] 42

A closer look shows an interesting result. Most female survivors embarked at Southampton (140), yet Cherbourg has the highest survival rate. Of the 93 Cherbourg survivors, 64 were female, and 42 of those were in Upper Class. Because the cherbourg subset holds survivors only, these counts do not by themselves show that every female passenger from Cherbourg survived.
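
Whether all women from a given port survived can only be checked against the full data, because a rate needs everyone who boarded as the denominator, not only the survivors. A minimal sketch with made-up rows (not the Titanic data; `toy`, `women` and `female_rate` are names chosen here for illustration):

```r
# Toy data mimicking the Embarked, Sex and Survived columns (values are made up)
toy <- data.frame(
  Embarked = c("C", "C", "C", "Q", "Q", "S", "S", "S", "S"),
  Sex      = c("female", "female", "male", "female", "male",
               "female", "female", "male", "male"),
  Survived = factor(c(1, 1, 0, 1, 0, 1, 0, 0, 1))
)

# Keep all women, survivors or not, then average the survival flag per port
women <- toy[toy$Sex == "female", ]
female_rate <- tapply(women$Survived == "1", women$Embarked, mean) * 100
print(female_rate)
```

Running the same two lines on `titanic` instead of `toy` would show the share of women from each port who survived.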

Queenstown

summary(queenstown$Sex)
## female   male 
##     27      3
summary(queenstown$Pclass)
##  1  2  3 
##  1  2 27
nrow(queenstown[queenstown$Sex == "female" & queenstown$Pclass == "1",])
## [1] 1

One Upper Class female passenger from Queenstown survived.

Southampton

summary(southampton$Sex)
## female   male 
##    140     77
summary(southampton$Pclass)
##  1  2  3 
## 74 76 67
nrow(southampton[southampton$Sex == "female" & southampton$Pclass == "1",])
## [1] 46

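46 Upper Class female passengers from Southampton survived. Because the southampton subset holds survivors only, it cannot show on its own whether all of them survived.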

Which female passengers survived?

nrow(female[female$Age >= 28,])  # surviving female passengers aged 28 or older (rows with NA Age are counted too)
## [1] 138
nrow(female[female$Age >= 28 & female$Parch >= 1,])  # same, with parents/children on board
## [1] 33
# Ticket Class of survived female passengers

summary(female$Pclass)
##  1  2  3 
## 91 70 72

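Of the 233 female passengers who survived, 138 are reported as aged 28 (about the average) or older, and only 33 of those had parents or children on board. Both counts can include rows where Age is NA, so they may overstate the true numbers. 91 of the female survivors were in Upper Class.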

How many children under 17 years old survived?

nrow(titanic[titanic$Age < 17,]) # No of passengers under 17 y.o.
## [1] 277
nrow(titanic_survived[titanic_survived$Age < 17,]) # No. of survived passengers under 17 y.o.
## [1] 107

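The code reports 107 survivors under 17 years old, out of 277 passengers under 17. These counts are not accurate, because there are many NAs in the Age column: comparing a missing Age with 17 yields NA, and each NA adds a row to the subset that nrow() counts. With 177 missing ages in the full data, only 277 - 177 = 100 passengers are known to be under 17, and 107 is an upper bound on the number of survivors under 17.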
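
The effect of NA on row counts can be reproduced on a small example. The sketch below uses a made-up data frame (`toy` and its values are assumptions for illustration) and compares the subsetting approach with `sum(..., na.rm = TRUE)`, which counts only the rows where the condition is known to be true:

```r
# Toy data: two passengers have no recorded age (values are made up)
toy <- data.frame(
  Age      = c(5, 12, 30, NA, NA, 16, 45),
  Survived = c(1, 0, 1, 1, 0, 1, 0)
)

# Subsetting with a logical vector that contains NA keeps one all-NA row
# per NA, so nrow() counts the missing ages as well
n_rows_naive <- nrow(toy[toy$Age < 17, ])

# Counting TRUE values directly ignores the unknown ages
n_under_17 <- sum(toy$Age < 17, na.rm = TRUE)

print(n_rows_naive)
print(n_under_17)
```

Wrapping the condition in `which()`, as in `toy[which(toy$Age < 17), ]`, is another way to drop the NA cases when the rows themselves are needed.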

Conclusion

From the whole analysis, it can be concluded that women in their twenties had the highest chance of surviving the Titanic. Women and children travelling in Upper and Middle Class were given priority for the lifeboats, and most of them were saved. Upper Class passengers had a survival rate of about 63%. The Port of Embarkation did not make a significant difference to the survival rate; passengers were distinguished mainly by their ticket class.

The low number of survivors in the Titanic tragedy was also due to a lack of emergency preparation. Some specific failures in emergency preparedness before the sinking of the Titanic included:

- Not enough lifeboats for all passengers and crew, perhaps because the builders considered the ship “unsinkable”!
- No lifeboat drills had been conducted, and many people did not know where to go or what to do.
- Many of the first lifeboats to leave the Titanic were not full, and some occupants were reluctant to pull other people from the icy water for fear of capsizing their lifeboat.
- The decision to abandon ship was delayed while the captain and crew assessed the damage. Had the captain started the evacuation earlier, before people began to panic, more lifeboats might have been filled in a more orderly evacuation.

References

https://www.kaggle.com/c/titanic/data?select=train.csv
https://www.shiftcomm.com/insights/never-let-go-titanic-survival-101/#:~:text=Port%20of%20Embarkation&text=The%20three%20ports%20were%20Queesntown,this%20region%20survived%20the%20accident
https://www.aiche.org/sites/default/files/2012-07-Beacon-English.pdf