1 About Titanic

RMS Titanic was a British passenger liner operated by the White Star Line that sank in the North Atlantic Ocean on 15 April 1912, after striking an iceberg during her maiden voyage from Southampton to New York City. Of the estimated 2,224 passengers and crew aboard, more than 1,500 died, making the sinking at the time one of the deadliest of a single ship and the deadliest peacetime sinking of a superliner or cruise ship to date.

RMS Titanic was the largest ship afloat at the time she entered service and was the second of three Olympic-class ocean liners operated by the White Star Line. She was built by the Harland and Wolff shipyard in Belfast. Thomas Andrews, chief naval architect of the shipyard at the time, died in the disaster.

1.1 Data Set Up

#Data Input
titanic<- read.csv("data/train.csv")
str(titanic)
## 'data.frame':    891 obs. of  12 variables:
##  $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
##  $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
##  $ Name       : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
##  $ Sex        : chr  "male" "female" "female" "female" ...
##  $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
##  $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
##  $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
##  $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Cabin      : chr  "" "C85" "" "C123" ...
##  $ Embarked   : chr  "S" "C" "S" "S" ...
#Tidying Data
titanic$Sex <- as.factor(titanic$Sex)
titanic$Pclass <- as.factor(titanic$Pclass)
titanic$Embarked <-  as.factor(titanic$Embarked)
#Checking NA

colSums(is.na(titanic))
## PassengerId    Survived      Pclass        Name         Sex         Age 
##           0           0           0           0           0         177 
##       SibSp       Parch      Ticket        Fare       Cabin    Embarked 
##           0           0           0           0           0           0
# Drop rows with NAs (removes the 177 passengers with a missing Age, leaving 714 rows)
titanic <- na.omit(titanic)
# Change value of Survived

titanic$Survived <- factor(titanic$Survived,
                           levels = c(0, 1),
                           labels = c("Not Survived", "Survived"))

table(titanic$Survived)
## 
## Not Survived     Survived 
##          424          290
titanic$Survived <- as.factor(titanic$Survived)
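The effect of dropping incomplete rows and recoding the outcome can be sketched on a small toy data frame. The name `toy` and all of its values are made up for illustration; they are not rows from train.csv.

```r
# Toy data (made up for illustration; not taken from train.csv)
toy <- data.frame(
  Survived = c(0, 1, 1, 0, 1),
  Age      = c(22, NA, 26, 35, NA)
)

# factor() recodes the 0/1 flag into readable labels in one vectorised step
toy$Survived <- factor(toy$Survived,
                       levels = c(0, 1),
                       labels = c("Not Survived", "Survived"))

# na.omit() drops every row that has an NA in any column
toy_complete <- na.omit(toy)

nrow(toy)                     # rows before dropping
nrow(toy_complete)            # rows after dropping
table(toy$Survived)           # counts on all rows
table(toy_complete$Survived)  # counts on complete rows only
```

The two tables differ, which is why counts taken after `na.omit()` should not be mixed with counts from the full data set.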

1.1.1 Data Overview

summary(titanic)
##   PassengerId            Survived   Pclass      Name               Sex     
##  Min.   :  1.0   Not Survived:424   1:186   Length:714         female:261  
##  1st Qu.:222.2   Survived    :290   2:173   Class :character   male  :453  
##  Median :445.0                      3:355   Mode  :character               
##  Mean   :448.6                                                             
##  3rd Qu.:677.8                                                             
##  Max.   :891.0                                                             
##       Age            SibSp            Parch           Ticket         
##  Min.   : 0.42   Min.   :0.0000   Min.   :0.0000   Length:714        
##  1st Qu.:20.12   1st Qu.:0.0000   1st Qu.:0.0000   Class :character  
##  Median :28.00   Median :0.0000   Median :0.0000   Mode  :character  
##  Mean   :29.70   Mean   :0.5126   Mean   :0.4314                     
##  3rd Qu.:38.00   3rd Qu.:1.0000   3rd Qu.:1.0000                     
##  Max.   :80.00   Max.   :5.0000   Max.   :6.0000                     
##       Fare           Cabin           Embarked
##  Min.   :  0.00   Length:714          :  2   
##  1st Qu.:  8.05   Class :character   C:130   
##  Median : 15.74   Mode  :character   Q: 28   
##  Mean   : 34.69                      S:554   
##  3rd Qu.: 33.38                              
##  Max.   :512.33

Description of the data :

  • PassengerId : ID of the passenger
  • Survived : Survival (0 = Did not survive, 1 = Survived)
  • Pclass : Ticket class (1 = 1st = Upper, 2 = 2nd = Middle, 3 = 3rd = Lower)
  • Name : Passenger name
  • Sex : Passenger sex
  • Age : Passenger age
  • SibSp : Number of siblings or spouses aboard the Titanic
  • Parch : Number of parents or children aboard the Titanic
  • Ticket : Ticket number
  • Fare : Passenger fare
  • Cabin : Cabin number
  • Embarked : Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

1.1.2 Survival Rate

1.1.2.1 How many survived?

#Check Data of Survived Passenger
titanic_survived <- titanic[titanic$Survived == "Survived",]
nrow(titanic_survived)
## [1] 290

Of the 714 passengers with a recorded age, 290 survived. In the full data set of 891 passengers, 342 survived.

1.1.2.2 Does Sex affect the rate of survival?

graphics::pie(xtabs(~ Sex, titanic_survived), main = "Sex of Survived Passengers")

female <- titanic_survived[titanic_survived$Sex == "female",]
aggregate(Age ~ Sex, female, mean)
##      Sex      Age
## 1 female 28.84772

The full data set lists 891 passengers, of whom 342 survived and 549 did not. After the rows with a missing age are dropped, 714 passengers remain and 290 of them survived. Of those 290 survivors, 197 are female, which is consistent with the evacuation procedure of saving women and children first. The average age of the surviving female passengers is about 28.8 years.
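The pie chart shows how the survivors split by sex, but a survival rate also needs the passengers who did not survive. A minimal sketch of that calculation follows, using a made-up toy data frame (`toy` and its values are illustrative assumptions, not rows from train.csv).

```r
# Toy data (made up for illustration): 3 women and 5 men
toy <- data.frame(
  Sex      = c("female", "female", "female",
               "male", "male", "male", "male", "male"),
  Survived = c("Survived", "Survived", "Not Survived",
               "Survived", "Not Survived", "Not Survived",
               "Not Survived", "Not Survived")
)

# Cross-tabulate sex against outcome
counts <- table(toy$Sex, toy$Survived)

# margin = 1 turns each row into proportions, i.e. a rate within each sex
rates <- prop.table(counts, margin = 1) * 100
round(rates, 1)
```

On the real data the same idea would presumably be `prop.table(table(titanic$Sex, titanic$Survived), margin = 1)`.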

1.1.2.3 Does ticket class affect the rate of survival?

graphics::barplot(xtabs(~ Pclass, titanic_survived), main = "Ticket Class of Survived Passengers")

#Check ratio of the survived passengers of each class

summary(titanic_survived$Pclass)
##   1   2   3 
## 122  83  85
summary(titanic$Pclass)
##   1   2   3 
## 186 173 355
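# Survival rate per class; these hardcoded counts come from the full 891-row data set, not the 714 rows kept above
ratio1 <- 136 / 216 * 100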
ratio1 # Upper Class
## [1] 62.96296
ratio2 <- 87 / 184 * 100
ratio2 # Middle Class
## [1] 47.28261
ratio3 <- 119 / 491 * 100
ratio3 # Lower Class
## [1] 24.23625

The bar plot shows that the largest group of survivors comes from the Upper Class (122 of the 290 survivors with a recorded age; 136 in the full data set). The survival rates, computed here from the full 891-passenger counts, show the gap more precisely: about 63% for the Upper Class, 47% for the Middle Class, and only 24% for the Lower Class.
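For comparison, the same rates can be computed for the filtered data directly from the two `summary()` outputs printed above instead of typing the counts into each ratio. This is only a sketch; the vector names `survived`, `total`, and `class_rate` are illustrative.

```r
# Counts copied from the summary() output above (passengers with a recorded age)
survived <- c("1" = 122, "2" = 83, "3" = 85)
total    <- c("1" = 186, "2" = 173, "3" = 355)

# Element-wise division gives one survival rate per ticket class, in percent
class_rate <- round(survived / total * 100, 1)
class_rate
```

This gives roughly 65.6%, 48.0%, and 23.9% for the three classes, close to the full-data rates. With the data frame at hand, `tapply(titanic$Survived == "Survived", titanic$Pclass, mean) * 100` should produce the same figures without any hardcoded counts.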

1.1.2.4 Does Age affect the rate of survival?

graphics::barplot(xtabs(~ Age, titanic_survived), main = "Age of Survived Passengers")

The mean age of the survivors is also around 28 years. This may be partly because most of the survivors are female.
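A bar plot with one bar per age is hard to read, and it again shows survivors only. One possible alternative is to bin ages into groups and compute a survival rate per group. The sketch below uses made-up toy data, and the break points (17 and 40) are assumptions chosen for illustration.

```r
# Toy data (made up for illustration; not taken from train.csv)
toy <- data.frame(
  Age      = c(4, 9, 16, 23, 28, 31, 45, 52, 61, 70),
  Survived = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# cut() assigns each age to a group; right = FALSE makes intervals [lower, upper)
toy$AgeGroup <- cut(toy$Age,
                    breaks = c(0, 17, 40, Inf),
                    labels = c("under 17", "17 to 39", "40 and over"),
                    right  = FALSE)

# The mean of a logical vector is the share of TRUE values
age_rate <- tapply(toy$Survived, toy$AgeGroup, mean) * 100
round(age_rate, 1)
```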

1.1.2.5 Does the Port of Embarkation affect the rate of survival?

summary(titanic$Embarked)
##       C   Q   S 
##   2 130  28 554
summary(titanic_survived$Embarked)
##       C   Q   S 
##   2  79   8 201
# Survival rate by port of embarkation; these hardcoded counts come from the full data set, not the 714 rows kept above
ratioC <- 93 / 168 * 100
ratioC
## [1] 55.35714
ratioQ <- 30 / 77 * 100
ratioQ
## [1] 38.96104
ratioS <- 217 / 644 * 100
ratioS
## [1] 33.69565
# Ticket Class of each Port of Embarkation
cherbourg <- titanic_survived[titanic_survived$Embarked == "C",]
summary(cherbourg$Pclass)
##  1  2  3 
## 53  8 18
queenstown <- titanic_survived[titanic_survived$Embarked == "Q",]
summary(queenstown$Pclass)
## 1 2 3 
## 1 1 6
southampton <- titanic_survived[titanic_survived$Embarked == "S",]
summary(southampton$Pclass)
##  1  2  3 
## 66 74 61

The rates, computed from the full-data counts, show that passengers who embarked at Cherbourg, France, have the highest survival rate (55%), ahead of Queenstown (39%) and Southampton (34%). The female-to-male ratio may be one reason. In addition, most of the survivors from Cherbourg traveled in Upper Class (53 of 79).
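The printed tables are enough to recompute the port rates on the filtered data and to see how the survivors of each port split by ticket class. The sketch below copies the counts from the output above; the object names are illustrative.

```r
# Survivors by port and ticket class, copied from the summary() output above
survivors <- matrix(
  c(53,  8, 18,
     1,  1,  6,
    66, 74, 61),
  nrow = 3, byrow = TRUE,
  dimnames = list(Port = c("C", "Q", "S"), Pclass = c("1", "2", "3"))
)

# Passengers per port with a recorded age, also from the output above
boarded <- c(C = 130, Q = 28, S = 554)

# Survival rate per port on the filtered data, in percent
port_rate <- round(rowSums(survivors) / boarded * 100, 1)

# Share of each ticket class among the survivors of each port, in percent
class_share <- round(prop.table(survivors, margin = 1) * 100, 1)

port_rate
class_share
```

On the filtered data the rates come out near 60.8% for Cherbourg, 28.6% for Queenstown, and 36.3% for Southampton, and about two thirds of the Cherbourg survivors held Upper Class tickets.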

female <- titanic_survived[titanic_survived$Sex == "female",]
summary(female$Embarked)
##       C   Q   S 
##   2  55   7 133
  • Cherbourg
summary(cherbourg$Sex)
## female   male 
##     55     24
summary(cherbourg$Pclass)
##  1  2  3 
## 53  8 18
nrow(cherbourg[cherbourg$Sex == "female" & cherbourg$Pclass == "1",]) # No. of survived female passengers of Cherbourg in Upper Class
## [1] 37

Closer inspection gives an interesting result. Most of the female survivors embarked at Southampton (133), yet Cherbourg has the highest survival rate. Of the 79 Cherbourg survivors with a recorded age, 55 are women, and 37 of those women traveled in Upper Class. Because these tables count survivors only, they do not show what share of Cherbourg's female passengers survived.

  • Queenstown
summary(queenstown$Sex)
## female   male 
##      7      1
summary(queenstown$Pclass)
## 1 2 3 
## 1 1 6
nrow(queenstown[queenstown$Sex == "female" & queenstown$Pclass == "1",])
## [1] 1

One Upper Class woman from Queenstown is among the survivors.

  • Southampton
summary(southampton$Sex)
## female   male 
##    133     68
summary(southampton$Pclass)
##  1  2  3 
## 66 74 61
nrow(southampton[southampton$Sex == "female" & southampton$Pclass == "1",])
## [1] 42

Forty-two Upper Class women from Southampton are among the survivors, but not every Upper Class woman from Southampton survived.

1.1.2.6 Which female passengers survived?

nrow(female[female$Age >= 28,])  # survived female passengers aged 28 or older
## [1] 102
nrow(female[female$Age >= 28 & female$Parch >= 1,])  # survived female passengers aged 28 or older with parents/children on board
## [1] 30
# Ticket Class of survived female passengers

summary(female$Pclass)
##  1  2  3 
## 82 68 47

Of the 197 female survivors with a recorded age, 102 are aged 28 or older, which is roughly the average age. Only 30 of those 102 had parents or children on board. 82 of the 197 are from the Upper Class.

1.1.2.7 How many children under 17 years old survived?

nrow(titanic[titanic$Age < 17,]) # No of passengers under 17 y.o.
## [1] 100
nrow(titanic_survived[titanic_survived$Age < 17,]) # No. of survived passengers under 17 y.o.
## [1] 55

Of the 100 passengers under 17 with a recorded age, 55 survived. These counts are incomplete, because the 177 passengers with a missing Age were dropped and some of them may have been children.
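To put the children's figure in context, it can be compared with the overall survival rate and with the share of rows that were dropped for a missing age. The sketch below only reuses counts printed earlier in this document; the variable names are illustrative.

```r
# Counts copied from the output above
children_total    <- 100   # passengers under 17 with a recorded age
children_survived <- 55
all_total         <- 714   # passengers with a recorded age
all_survived      <- 290
missing_age       <- 177   # rows removed by na.omit()
full_rows         <- 891   # rows in the original data set

child_rate    <- round(children_survived / children_total * 100, 1)
overall_rate  <- round(all_survived / all_total * 100, 1)
share_missing <- round(missing_age / full_rows * 100, 1)

c(child_rate = child_rate,
  overall_rate = overall_rate,
  share_missing = share_missing)
```

Children survived at a higher rate than passengers overall (55% against about 40.6%), but roughly one row in five had no age, so the comparison should be read with caution.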

1.2 Visualization

library(ggplot2)
theme_titanic <- function() {
  
  theme(
      plot.background = element_rect(fill = "snow2"),
      panel.grid.major.x = element_blank(),
      panel.grid.major.y = element_blank(),
      panel.background = element_rect(color = "orange2"),
      axis.ticks = element_blank(),
      legend.background = element_rect(fill = "snow2"),
      plot.title = element_text(size = 18)
  )
}

1.2.1 Who Survived?

ggplot(titanic, aes(x = Survived,
                    fill = Sex)) +
  # geom_bar() counts the rows in each group, so no y aesthetic is needed
  geom_bar(position = "stack") +
  
  facet_grid(~ Pclass) +
  
  labs(title = "Survival based on Gender and Ticket Class",
       x = "",
       y = "Number of passengers") +
  
  theme_titanic()

library(viridis)
## Loading required package: viridisLite
# Based on Age and Pclass

ggplot(titanic_survived, aes(x = Pclass,
                             y = Age)) +
  # Individual survivors, coloured by age
  geom_jitter(aes(col = Age)) +
  
  # Semi-transparent boxplot drawn over the points
  geom_boxplot(alpha = 0.5) +
  
  labs(title = "Survival Based on Ticket Class and Age",
       x = "Ticket Class",
       y = "Age",
       col = "Age") +
  
  theme_titanic()

### How many survived at each Embarkation Port?

library(leaflet)
# install.packages("leaflet.minicharts")
library(leaflet.minicharts)
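# Count survivors per port; the rows come out in the level order "", "C", "Q", "S"
titanic_agg <- aggregate(Survived ~ Embarked, titanic_survived, length)
# Coordinates in the same order: blank (placeholder), Cherbourg, Queenstown, Southampton
titanic_agg$latitude <- c(0, 49.644577, 51.850334, 50.8965)
titanic_agg$longitude <- c(0, -1.605079, -8.294286, -1.3968)
# Drop the row for passengers with no recorded port
titanic_agg <- titanic_agg[-c(1),]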

1.2.2 Survived Passengers based on Embarkation Port

# Survival based on Embarkation Port
map1 <- leaflet(data = titanic_agg)
map1 <- addTiles(map1) 


map1 <- addMarkers(map1,
                   lng = ~longitude,
                   lat = ~latitude)

map1 <- addMinicharts(map = map1,
    lng = titanic_agg$longitude,
    lat = titanic_agg$latitude,
    chartdata = titanic_agg$Survived,
    showLabels = TRUE,
    width = 80
  )
map1
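The coordinates above are attached by position, which only works while the rows stay in the order "", C, Q, S. A possibly more robust variant, sketched here as an assumption rather than the method used above, keeps the coordinates in a lookup table keyed by port code and joins it with `merge()`. The survivor counts are copied from the `summary()` output earlier; `port_counts`, `ports`, and `port_map` are illustrative names.

```r
# Survivor counts per port code, copied from the output above ("" = no port recorded)
port_counts <- data.frame(
  Embarked = c("", "C", "Q", "S"),
  Survived = c(2, 79, 8, 201)
)

# Lookup table: one row per known port, with the coordinates used for the map
ports <- data.frame(
  Embarked  = c("C", "Q", "S"),
  port_name = c("Cherbourg", "Queenstown", "Southampton"),
  latitude  = c(49.644577, 51.850334, 50.8965),
  longitude = c(-1.605079, -8.294286, -1.3968)
)

# merge() does an inner join by default, so the blank port code drops out
port_map <- merge(port_counts, ports, by = "Embarked")
port_map
```

The joined table can then take the place of `titanic_agg` in the map code, and a change in row order no longer misplaces a marker.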

1.2.3 Conclusion

From the analysis, women in their twenties appear to have had the highest chance of surviving the Titanic. Women and children traveling in Upper and Middle Class were given priority for the lifeboats, and most of them were saved. Upper Class passengers had a survival rate of about 63%. The port of embarkation by itself did not make a clear difference to survival; the differences between ports largely follow ticket class.

The low number of survivors was also a result of poor emergency preparation. Specific failures in emergency preparedness before the sinking of the Titanic included:

  • There were not enough lifeboats for all passengers and crew, perhaps because the builders considered the ship “unsinkable”.
  • No lifeboat drills had been conducted, and many people did not know where to go or what to do.
  • Many of the first lifeboats to leave the Titanic were not full, and some occupants were reluctant to pull other people from the icy water for fear of capsizing their lifeboat.
  • The decision to abandon ship was delayed while the captain and crew assessed the damage. Had the captain started the evacuation earlier, before people began to panic, more lifeboats might have been filled in a more orderly evacuation.

1.3 References

https://www.kaggle.com/c/titanic/data?select=train.csv
https://www.shiftcomm.com/insights/never-let-go-titanic-survival-101/#:~:text=Port%20of%20Embarkation&text=The%20three%20ports%20were%20Queesntown,this%20region%20survived%20the%20accident
https://www.aiche.org/sites/default/files/2012-07-Beacon-English.pdf