1 Introduction

1.1 The Origin

RMS Titanic was a British passenger liner operated by the White Star Line that sank in the North Atlantic Ocean in the early morning hours of 15 April 1912, after striking an iceberg during her maiden voyage from Southampton to New York City. Of the estimated 2,224 passengers and crew aboard, more than 1,500 died, making the sinking one of modern history’s deadliest peacetime commercial marine disasters.

The name Titanic derives from the Titans of Greek mythology. Built in Belfast, Ireland, in the United Kingdom of Great Britain and Ireland (as it was then known), the RMS Titanic was the second of the three Olympic-class ocean liners—the first was the RMS Olympic and the third was the HMHS Britannic. Britannic was originally to be called Gigantic and was to be over 1,000 feet (300 m) long. They were by far the largest vessels of the British shipping company White Star Line’s fleet, which comprised 29 steamers and tenders in 1912.[14] The three ships had their genesis in a discussion in mid-1907 between the White Star Line’s chairman, J. Bruce Ismay, and the American financier J. P. Morgan, who controlled the White Star Line’s parent corporation, the International Mercantile Marine Co. (IMM).

1.2 Dimensions & Facilities

Titanic was 882 feet 9 inches (269.06 m) long with a maximum breadth of 92 feet 6 inches (28.19 m). Her total height, measured from the base of the keel to the top of the bridge, was 104 feet (32 m). She measured 46,328 gross register tons and with a draught of 34 feet 7 inches (10.54 m), she displaced 52,310 tons.

The passenger facilities aboard Titanic aimed to meet the highest standards of luxury. According to Titanic’s general arrangement plans, the ship could accommodate 833 First Class Passengers, 614 in Second Class and 1,006 in Third Class, for a total passenger capacity of 2,453. In addition, her capacity for crew members exceeded 900, as most documents of her original configuration have stated that her full carrying capacity for both passengers and crew was approximately 3,547. Her interior design was a departure from that of other passenger liners, which had typically been decorated in the rather heavy style of a manor house or an English country house.

1.3 Disclaimer

This analysis was performed to gain a better understanding of Titanic’s passengers from the data provided. It was written to complete the Learn By Building: Programming for Data Science class.

2 Preparing the Data

The dataset used in this analysis contains information on passengers who were aboard the ship when the tragedy happened.

This dataset can be found & downloaded at https://www.kaggle.com/c/titanic/overview

2.1 Data Inspection

Data can be imported with the read.csv() function.

# Import the data and inspect its size and structure
titanic <- read.csv("Titanic/train.csv")
head(titanic)
dim(titanic)
## [1] 891  12
str(titanic)
## 'data.frame':    891 obs. of  12 variables:
##  $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
##  $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
##  $ Name       : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
##  $ Sex        : chr  "male" "female" "female" "female" ...
##  $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
##  $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
##  $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
##  $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Cabin      : chr  "" "C85" "" "C123" ...
##  $ Embarked   : chr  "S" "C" "S" "S" ...

From the inspection above, we can conclude the following:

  • There are 891 passengers, each described by 12 variables

  • Some variables have missing values

  • Some variables are stored with a type that does not match their content (for example, Survived and Pclass are integers but represent categories)

Given these findings, data cleaning and coercion are needed.

2.2 Data Cleaning & Coercion

From the last section we found that there were missing values from some of the variables, seen as NA.

anyNA(titanic)
## [1] TRUE
colSums(is.na(titanic))
## PassengerId    Survived      Pclass        Name         Sex         Age 
##           0           0           0           0           0         177 
##       SibSp       Parch      Ticket        Fare       Cabin    Embarked 
##           0           0           0           0           0           0

Some values may still be missing without being recorded as NA (for example, the empty strings in Cabin), so we convert blank values into NA.

# Transform Missing Values into NA
titanic[titanic == ""] <- NA
colSums(is.na(titanic))
## PassengerId    Survived      Pclass        Name         Sex         Age 
##           0           0           0           0           0         177 
##       SibSp       Parch      Ticket        Fare       Cabin    Embarked 
##           0           0           0           0         687           2

The output above shows the effect of converting blanks into NA: Cabin now reports 687 missing values and Embarked reports 2, where both previously reported 0.
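
The same conversion can also be requested at import time. The sketch below is a minimal illustration, assuming a small hypothetical CSV written to a temporary file rather than the real train.csv; it relies on the na.strings argument of read.csv, which lists the raw strings to treat as NA.

```r
# Hypothetical three-row sample standing in for train.csv
path <- tempfile(fileext = ".csv")
writeLines(c("PassengerId,Cabin,Embarked",
             "1,,S",
             "2,C85,C",
             "3,,"), path)

# Treat empty strings (and the literal "NA") as missing while reading
sample_df <- read.csv(path, na.strings = c("", "NA"))

# Count missing values per column
colSums(is.na(sample_df))
```

With this approach the blank cells arrive as NA directly, so no separate replacement step is needed afterwards.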

# Drop columns with many missing values, then locate the missing Embarked rows
titanic$Cabin <- NULL
titanic$Age <- NULL
titanic$PassengerId <- NULL
which(is.na(titanic$Embarked))
## [1]  62 830

Since “Age” and “Cabin” had so many missing values, we removed those variables. We also removed “PassengerId” because the row index already numbers the passengers. As “Embarked” had only 2 missing values (rows 62 and 830), we relabel them in the next step.
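
To judge how much of a column is missing, it can help to look at shares rather than counts. By the counts above, Age is missing for about 20% of the 891 rows (177) and Cabin for about 77% (687). The sketch below shows one way to compute such shares and drop columns above a cut-off; the toy data frame and the 0.4 threshold are assumptions for illustration, not values from the analysis.

```r
# Toy data frame (hypothetical values) with gaps in two columns
toy <- data.frame(
  Age   = c(22, NA, 26, NA),
  Cabin = c(NA, "C85", NA, NA),
  Fare  = c(7.25, 71.28, 7.92, 53.10)
)

# The mean of a logical NA mask is the fraction of missing values per column
na_share <- colMeans(is.na(toy))
round(na_share, 2)

# Keep only columns whose missing share is at or below the assumed cut-off
threshold <- 0.4
toy_kept <- toy[, na_share <= threshold, drop = FALSE]
names(toy_kept)
```

Dropping a column is only one option; whether a given share is too high depends on how the variable will be used.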

# Make Values More Understandable
titanic$Survived[titanic$Survived==1]<-"Survived"
titanic$Survived[titanic$Survived==0]<-"Deceased"

titanic$Pclass[titanic$Pclass==1]<-"First Class"
titanic$Pclass[titanic$Pclass==2]<-"Second Class"
titanic$Pclass[titanic$Pclass==3]<-"Third Class"

titanic$Embarked[titanic$Embarked=="Q"]<-"Queenstown"
titanic$Embarked[titanic$Embarked=="C"]<-"Cherbourg"
titanic$Embarked[titanic$Embarked=="S"]<-"Southampton"
titanic$Embarked[is.na(titanic$Embarked)]<-"Unidentified"

titanic$Survived<-as.factor(titanic$Survived)
titanic$Pclass<-as.factor(titanic$Pclass)
titanic$Sex<-as.factor(titanic$Sex)
titanic$Embarked<-as.factor(titanic$Embarked)
titanic$SibSp<-as.factor(titanic$SibSp)
titanic$Parch<-as.factor(titanic$Parch)

After changing the data structure, it is worth rechecking the overall structure and any remaining missing values.

str(titanic)
## 'data.frame':    891 obs. of  9 variables:
##  $ Survived: Factor w/ 2 levels "Deceased","Survived": 1 2 2 2 1 1 1 1 2 2 ...
##  $ Pclass  : Factor w/ 3 levels "First Class",..: 3 1 3 1 3 3 1 3 3 2 ...
##  $ Name    : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
##  $ Sex     : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
##  $ SibSp   : Factor w/ 7 levels "0","1","2","3",..: 2 2 1 2 1 1 1 4 1 2 ...
##  $ Parch   : Factor w/ 7 levels "0","1","2","3",..: 1 1 1 1 1 1 1 2 3 1 ...
##  $ Ticket  : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
##  $ Fare    : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Embarked: Factor w/ 4 levels "Cherbourg","Queenstown",..: 3 1 3 3 3 2 3 3 3 1 ...
anyNA(titanic)
## [1] FALSE

Every variable now has a suitable type and no missing values remain. The data is ready to use.
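
The relabelling and the conversion to factors done above in two passes can also be sketched as a single step per column, using the levels and labels arguments of factor(). This is an alternative sketch on a toy data frame with hypothetical values, not the code used in the analysis; the placeholder code "U" for a missing port is likewise an assumption.

```r
# Toy columns (hypothetical values) coded like the raw dataset
toy <- data.frame(
  Survived = c(0, 1, 1, 0),
  Pclass   = c(3, 1, 2, 3),
  Embarked = c("S", "C", NA, "Q")
)

# Map each raw code to a readable label while creating the factor
toy$Survived <- factor(toy$Survived, levels = c(0, 1),
                       labels = c("Deceased", "Survived"))
toy$Pclass <- factor(toy$Pclass, levels = 1:3,
                     labels = c("First Class", "Second Class", "Third Class"))

# Give missing ports a placeholder code first, then label all four codes
port <- toy$Embarked
port[is.na(port)] <- "U"
toy$Embarked <- factor(port, levels = c("C", "Q", "S", "U"),
                       labels = c("Cherbourg", "Queenstown",
                                  "Southampton", "Unidentified"))
str(toy)
```

Declaring the levels explicitly also fixes their order, which otherwise defaults to alphabetical.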

3 Passengers Aboard

3.1 General Information

summary(titanic)
##      Survived            Pclass        Name               Sex      SibSp  
##  Deceased:549   First Class :216   Length:891         female:314   0:608  
##  Survived:342   Second Class:184   Class :character   male  :577   1:209  
##                 Third Class :491   Mode  :character                2: 28  
##                                                                    3: 16  
##                                                                    4: 18  
##                                                                    5:  5  
##                                                                    8:  7  
##  Parch      Ticket               Fare                Embarked  
##  0:678   Length:891         Min.   :  0.00   Cherbourg   :168  
##  1:118   Class :character   1st Qu.:  7.91   Queenstown  : 77  
##  2: 80   Mode  :character   Median : 14.45   Southampton :644  
##  3:  5                      Mean   : 32.20   Unidentified:  2  
##  4:  4                      3rd Qu.: 31.00                     
##  5:  5                      Max.   :512.33                     
##  6:  1

3.2 Deeper Insight

  1. Survival by Gender, Ticket Class, and Point of Embarkation
surv<-titanic[,c("Survived","Sex","Pclass","Embarked")]
table(surv)
## , , Pclass = First Class, Embarked = Cherbourg
## 
##           Sex
## Survived   female male
##   Deceased      1   25
##   Survived     42   17
## 
## , , Pclass = Second Class, Embarked = Cherbourg
## 
##           Sex
## Survived   female male
##   Deceased      0    8
##   Survived      7    2
## 
## , , Pclass = Third Class, Embarked = Cherbourg
## 
##           Sex
## Survived   female male
##   Deceased      8   33
##   Survived     15   10
## 
## , , Pclass = First Class, Embarked = Queenstown
## 
##           Sex
## Survived   female male
##   Deceased      0    1
##   Survived      1    0
## 
## , , Pclass = Second Class, Embarked = Queenstown
## 
##           Sex
## Survived   female male
##   Deceased      0    1
##   Survived      2    0
## 
## , , Pclass = Third Class, Embarked = Queenstown
## 
##           Sex
## Survived   female male
##   Deceased      9   36
##   Survived     24    3
## 
## , , Pclass = First Class, Embarked = Southampton
## 
##           Sex
## Survived   female male
##   Deceased      2   51
##   Survived     46   28
## 
## , , Pclass = Second Class, Embarked = Southampton
## 
##           Sex
## Survived   female male
##   Deceased      6   82
##   Survived     61   15
## 
## , , Pclass = Third Class, Embarked = Southampton
## 
##           Sex
## Survived   female male
##   Deceased     55  231
##   Survived     33   34
## 
## , , Pclass = First Class, Embarked = Unidentified
## 
##           Sex
## Survived   female male
##   Deceased      0    0
##   Survived      2    0
## 
## , , Pclass = Second Class, Embarked = Unidentified
## 
##           Sex
## Survived   female male
##   Deceased      0    0
##   Survived      0    0
## 
## , , Pclass = Third Class, Embarked = Unidentified
## 
##           Sex
## Survived   female male
##   Deceased      0    0
##   Survived      0    0

The tables above break down the passenger counts by survival, gender, ticket class, and point of embarkation.
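
A multi-way table printed slice by slice is hard to scan for the largest groups. One option is to flatten it into one row per combination and sort by count. The sketch below assumes a toy data frame with hypothetical rows; only the column names follow the dataset.

```r
# Toy data (hypothetical rows) with the same kind of categorical columns
toy <- data.frame(
  Survived = c("Deceased", "Deceased", "Deceased",
               "Survived", "Survived", "Deceased"),
  Sex      = c("male", "male", "male", "female", "female", "female"),
  Pclass   = c("Third Class", "Third Class", "Third Class",
               "First Class", "First Class", "Third Class")
)

# One row per combination, with its count in the Freq column
counts <- as.data.frame(table(toy))

# Largest groups first, hiding combinations that never occur
counts <- counts[order(counts$Freq, decreasing = TRUE), ]
head(counts[counts$Freq > 0, ], 3)
```

For display only, ftable(table(toy)) prints the same counts as a single flat table instead of separate slices.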

  2. Ticket Class
tick_sum<-aggregate(Fare~Pclass,titanic,sum)
tick_sum
tick_ave<-aggregate(Fare~Pclass,titanic,mean)
tick_ave
tick_med<-aggregate(Fare~Pclass,titanic,median)
tick_med
tick_max<-aggregate(Fare~Pclass,titanic,max)
tick_max
tick_min<-aggregate(Fare~Pclass,titanic,min)
tick_min

The output above gives the total, mean, median, maximum, and minimum ticket fare for each class.
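
The five separate aggregate() calls can also be collapsed into one by passing a function that returns several named statistics. This is a sketch on toy fares (hypothetical values), not the analysis output.

```r
# Toy fares (hypothetical values) for two classes
toy <- data.frame(
  Pclass = c("First Class", "First Class",
             "Third Class", "Third Class", "Third Class"),
  Fare   = c(80, 50, 8, 7, 0)
)

# One pass: each class gets a named vector of summary statistics
fare_stats <- aggregate(
  Fare ~ Pclass, data = toy,
  FUN = function(x) c(sum = sum(x), mean = mean(x), median = median(x),
                      min = min(x), max = max(x))
)

# aggregate() stores the statistics as a matrix column; flatten it
fare_stats <- do.call(data.frame, fare_stats)
fare_stats
```

The flattened result has one column per statistic (Fare.sum, Fare.mean, and so on), which makes the classes easier to compare side by side.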

4 Answering Questions from Media

  1. Which gender survived the most?
no1 <- titanic[titanic$Survived=="Survived",]
summary(no1$Sex)
## female   male 
##    233    109

Of the 342 surviving passengers, most were female (233).
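
Raw counts do not account for there being more men than women aboard, so a survival rate per gender is a useful complement. The sketch below reuses the counts already reported in this document (233 and 109 survivors; 314 women and 577 men in total) instead of reading the data again.

```r
# Counts taken from the outputs reported above
survivors  <- c(female = 233, male = 109)
passengers <- c(female = 314, male = 577)

# Share of each gender that survived
round(survivors / passengers, 3)
## female   male 
##  0.742  0.189
```

With the data frame loaded, prop.table(table(titanic$Sex, titanic$Survived), margin = 1) should give the same row-wise shares.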

  2. What was the total fare collected from Titanic’s passengers?
no2 <-sum(titanic$Fare)
no2
## [1] 28693.95

The fares totalled $28,693.95.

  3. Of the passengers who boarded from Cherbourg, how many travelled without siblings/spouses and without parents/children?
no3<-titanic[titanic$Embarked=="Cherbourg" & titanic$SibSp==0 & titanic$Parch==0,]
nrow(no3)
## [1] 85

There were 85 passengers who boarded from Cherbourg without any relatives.

  4. Of the passengers who boarded from Southampton with a Third Class ticket, how many were female?
no4<-titanic[titanic$Embarked=="Southampton" & titanic$Pclass=="Third Class" & titanic$Sex=="female",]
nrow(no4)
## [1] 88

There were 88 female passengers who held Third Class tickets and boarded from Southampton.

  5. Which gender and ticket class combination had the fewest survivors?
no5<-titanic[titanic$Survived=="Survived",]
table_no5 <- table(no5$Sex, no5$Pclass)
table_no5
##         
##          First Class Second Class Third Class
##   female          91           70          72
##   male            45           17          47

Among the 342 survivors, the smallest group was male Second Class passengers (17).

  6. Which ticket class was purchased most often by men?
no6<-titanic[titanic$Sex=="male",]
table(no6$Pclass)
## 
##  First Class Second Class  Third Class 
##          122          108          347

Men most often bought Third Class tickets (347 passengers).

  7. What were the lowest and highest fares paid by male passengers from Southampton in each class?
no7<-titanic[titanic$Sex=="male"&titanic$Embarked=="Southampton",]
no7_min<-aggregate(Fare~Pclass,no7,min)
no7_min
no7_max<-aggregate(Fare~Pclass,no7,max)
no7_max

The lowest fare in every class was zero (free of charge). The highest fare varied by class, as the output above shows.

  8. Of the passengers who boarded with more than 2 parents/children, how many survived?
no8<-titanic[titanic$Survived=="Survived",]
# Tabulate Parch for survivors only (no8), not for all passengers
table_no8<-table(no8$Parch)
table_no8
## 
##   0   1   2   3   4   5   6 
## 233  65  40   3   0   1   0
sum(table_no8[4:7])
## [1] 4

Only 4 of the 15 passengers who travelled with more than 2 parents/children survived.
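
Because Parch was converted to a factor during cleaning, it cannot be compared with a number directly, and positional indexing such as [4:7] depends on knowing which level sits at which position. A more explicit filter converts the factor labels back to numbers first. The sketch below uses toy vectors with hypothetical values to show the pitfall and the filter.

```r
# Toy vectors (hypothetical values)
parch    <- factor(c(0, 2, 3, 5, 0, 6))
survived <- c("Survived", "Deceased", "Survived",
              "Deceased", "Survived", "Deceased")

# Pitfall: as.integer() on a factor returns level codes, not the values
codes <- as.integer(parch)

# Going through the character labels recovers the original numbers
parch_num <- as.integer(as.character(parch))

# Count survivors who travelled with more than 2 parents/children
n_surv <- sum(parch_num > 2 & survived == "Survived")
n_surv
```

Here codes is 1 2 3 4 1 5 while parch_num is 0 2 3 5 0 6, which is why the conversion through as.character() matters.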

  9. For female passengers, what was the average ticket fare in each class?
no9<-titanic[titanic$Sex=="female",]
no9<-aggregate(Fare~Pclass,no9,mean)
no9

The highest average of the ticket fare purchased by female passenger went to First Class ticket, while the lowest was Third Class ticket.

  10. Which male passengers who boarded from Queenstown bought the most expensive Third Class ticket?
no10<-titanic[titanic$Sex=="male"&titanic$Pclass=="Third Class"&titanic$Embarked=="Queenstown",]
no10<-no10[order(no10$Fare, decreasing = TRUE), ]
no10[1:4,c("Name","Fare")]

Four male passengers boarding from Queenstown held the most expensive Third Class ticket:

  • Master Eugene Rice

  • Master Arthur Rice

  • Master Eric Rice

  • Master George Hugh Rice
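
Taking the first four rows after sorting works only when the number of tied passengers is already known. A sketch that keeps every row sharing the maximum fare, whatever the number of ties, is shown below; the names and fares are hypothetical placeholders.

```r
# Toy subset (hypothetical names and fares)
toy <- data.frame(
  Name = c("Passenger A", "Passenger B", "Passenger C", "Passenger D"),
  Fare = c(20, 7.75, 20, 15.5)
)

# Keep every row whose fare equals the maximum, so ties are not cut off
top <- toy[toy$Fare == max(toy$Fare), ]
top$Name
```

Applied to the filtered Queenstown subset, the same comparison would return all passengers tied for the highest fare without hard-coding the row count.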

5 Conclusion

From the data above, we can conclude the following:

  • Fewer than half of the passengers survived (342 of 891).

  • The largest group of survivors was female Second Class passengers who embarked at Southampton (61).

  • The largest group of deceased passengers was male Third Class passengers who embarked at Southampton (231).

  • More than half of the passengers held Third Class tickets (491 of 891).

  • The passengers were mostly male (577 of 891).

  • Most passengers boarded without siblings or spouses (608) and without parents or children (678).

  • The most expensive ticket cost $512.33, while the cheapest was free (no charge).

  • Most passengers boarded at Southampton (644), while the fewest boarded at Queenstown (77).