RMS Titanic was a British passenger liner operated by the White Star Line that sank in the North Atlantic Ocean in the early morning hours of 15 April 1912, after striking an iceberg during her maiden voyage from Southampton to New York City. Of the estimated 2,224 passengers and crew aboard, more than 1,500 died, making the sinking one of modern history’s deadliest peacetime commercial marine disasters.
The name Titanic derives from the Titans of Greek mythology. Built in Belfast, Ireland, in the United Kingdom of Great Britain and Ireland (as it was then known), the RMS Titanic was the second of the three Olympic-class ocean liners—the first was the RMS Olympic and the third was the HMHS Britannic. Britannic was originally to be called Gigantic and was to be over 1,000 feet (300 m) long. They were by far the largest vessels of the British shipping company White Star Line’s fleet, which comprised 29 steamers and tenders in 1912.[14] The three ships had their genesis in a discussion in mid-1907 between the White Star Line’s chairman, J. Bruce Ismay, and the American financier J. P. Morgan, who controlled the White Star Line’s parent corporation, the International Mercantile Marine Co. (IMM).
Titanic was 882 feet 9 inches (269.06 m) long with a maximum breadth of 92 feet 6 inches (28.19 m). Her total height, measured from the base of the keel to the top of the bridge, was 104 feet (32 m). She measured 46,328 gross register tons and with a draught of 34 feet 7 inches (10.54 m), she displaced 52,310 tons.
The passenger facilities aboard Titanic aimed to meet the highest standards of luxury. According to Titanic’s general arrangement plans, the ship could accommodate 833 First Class passengers, 614 in Second Class and 1,006 in Third Class, for a total passenger capacity of 2,453. In addition, her capacity for crew members exceeded 900, as most documents of her original configuration have stated that her full carrying capacity for both passengers and crew was approximately 3,547. Her interior design was a departure from that of other passenger liners, which had typically been decorated in the rather heavy style of a manor house or an English country house.
This analysis was performed to gain a better understanding of Titanic’s passengers using the data provided. It was made to complete the Learn By Building: Programming for Data Science class.
The dataset used in this analysis consists of information about the passengers who were aboard the ship when the tragedy happened.
The dataset can be found and downloaded at https://www.kaggle.com/c/titanic/overview
The data can be imported with the read.csv() function.
## [1] 891 12
## 'data.frame': 891 obs. of 12 variables:
## $ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
## $ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
## $ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
## $ Sex : chr "male" "female" "female" "female" ...
## $ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
## $ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
## $ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
## $ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
## $ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
## $ Cabin : chr "" "C85" "" "C123" ...
## $ Embarked : chr "S" "C" "S" "S" ...
From the data inspection above, we can conclude the following:
There were 891 passengers, with 12 variables (pieces of information) recorded for each.
Some of the variables contained missing values.
Some variables were stored with a structure that did not match the type of data they hold.
Given these findings, data cleansing and coercion are needed.
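The import and inspection steps described above can be sketched as follows. This is a minimal illustration, not the report's original chunk: the names csv_text and titanic_demo are made up here, and the four rows are copied from the str() printout above (fares as rounded there) so that the example runs without the Kaggle download.

```r
# Assumption: a four-row extract typed in by hand. With the real file you
# would pass its path to read.csv() instead of using `text =`.
csv_text <- 'PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.28,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.92,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S'

titanic_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

dim(titanic_demo)   # number of rows, then number of columns
str(titanic_demo)   # type and first values of every column
```

Note that the empty Cabin fields are read as empty strings rather than NA, which is why a blank value is not counted by is.na() until it is converted.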
The previous section showed that some of the variables contain missing values, which appear as NA.
## [1] TRUE
## PassengerId Survived Pclass Name Sex Age
## 0 0 0 0 0 177
## SibSp Parch Ticket Fare Cabin Embarked
## 0 0 0 0 0 0
Some values may still be missing without being recorded as NA, so blank values have to be converted to NA.
## PassengerId Survived Pclass Name Sex Age
## 0 0 0 0 0 177
## SibSp Parch Ticket Fare Cabin Embarked
## 0 0 0 0 687 2
The output above shows the result of converting blank values to NA: compared with the earlier count, “Cabin” now reports 687 missing values and “Embarked” reports 2.
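One way to perform this conversion is sketched below on a small made-up data frame. The names demo and blank are illustrative, and the approach is an assumption about how such a step can be written, not necessarily the code used for the output above.

```r
# Hypothetical mini data frame: one genuine NA and three blank strings.
demo <- data.frame(
  Age      = c(22, NA, 26, 35),
  Cabin    = c("", "C85", "", "C123"),
  Embarked = c("S", "C", "", "S"),
  stringsAsFactors = FALSE
)

colSums(is.na(demo))   # before: only Age is counted

# Flag cells that are present but empty, then overwrite them with NA.
# The !is.na() guard keeps existing NAs out of the logical mask.
blank <- !is.na(demo) & demo == ""
demo[blank] <- NA

colSums(is.na(demo))   # after: Cabin and Embarked are counted as well
```

The same effect can be obtained at import time by passing na.strings = c("", "NA") to read.csv().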
# Remove Missing Values
titanic$Cabin <- NULL
titanic$Age <- NULL
titanic$PassengerId <- NULL
which(is.na(titanic$Embarked))
## [1] 62 830
Because “Age” and “Cabin” had so many missing values, those variables were removed. “PassengerId” was also removed because the row index already numbers the passengers. The “Embarked” variable had only 2 missing values (rows 62 and 830), so they are relabelled in the next section instead.
# Make Values More Understandable
titanic$Survived[titanic$Survived==1]<-"Survived"
titanic$Survived[titanic$Survived==0]<-"Deceased"
titanic$Pclass[titanic$Pclass==1]<-"First Class"
titanic$Pclass[titanic$Pclass==2]<-"Second Class"
titanic$Pclass[titanic$Pclass==3]<-"Third Class"
titanic$Embarked[titanic$Embarked=="Q"]<-"Queenstown"
titanic$Embarked[titanic$Embarked=="C"]<-"Cherbourg"
titanic$Embarked[titanic$Embarked=="S"]<-"Southampton"
titanic$Embarked[is.na(titanic$Embarked)]<-"Unidentified"
titanic$Survived<-as.factor(titanic$Survived)
titanic$Pclass<-as.factor(titanic$Pclass)
titanic$Sex<-as.factor(titanic$Sex)
titanic$Embarked<-as.factor(titanic$Embarked)
titanic$SibSp<-as.factor(titanic$SibSp)
titanic$Parch<-as.factor(titanic$Parch)
After changing the data structure, it is highly recommended to recheck the overall structure and any remaining missing values.
## 'data.frame': 891 obs. of 9 variables:
## $ Survived: Factor w/ 2 levels "Deceased","Survived": 1 2 2 2 1 1 1 1 2 2 ...
## $ Pclass : Factor w/ 3 levels "First Class",..: 3 1 3 1 3 3 1 3 3 2 ...
## $ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
## $ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
## $ SibSp : Factor w/ 7 levels "0","1","2","3",..: 2 2 1 2 1 1 1 4 1 2 ...
## $ Parch : Factor w/ 7 levels "0","1","2","3",..: 1 1 1 1 1 1 1 2 3 1 ...
## $ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
## $ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
## $ Embarked: Factor w/ 4 levels "Cherbourg","Queenstown",..: 3 1 3 3 3 2 3 3 3 1 ...
## [1] FALSE
Every variable now has a structure that matches its data type, and no missing values remain. The data is ready to use.
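As an alternative to relabelling values one at a time, the recoding and the conversion to a factor can be done in a single call to factor() with levels and labels. The sketch below uses made-up vectors (survived_code, pclass_code, embarked_code are hypothetical names), and the placeholder code "U" for a missing port is an assumption introduced only for this example.

```r
# Hypothetical codes in the same encoding as the Kaggle columns.
survived_code <- c(0, 1, 1, 0)
pclass_code   <- c(3, 1, 3, 2)
embarked_code <- c("S", "C", NA, "Q")

survived <- factor(survived_code, levels = c(0, 1),
                   labels = c("Deceased", "Survived"))
pclass   <- factor(pclass_code, levels = 1:3,
                   labels = c("First Class", "Second Class", "Third Class"))

# Give missing ports an explicit code before converting, so no NA remains.
embarked_code[is.na(embarked_code)] <- "U"
embarked <- factor(embarked_code, levels = c("C", "Q", "S", "U"),
                   labels = c("Cherbourg", "Queenstown",
                              "Southampton", "Unidentified"))

table(survived)   # counts per label
anyNA(embarked)   # FALSE once the missing port has its own level
```

Declaring the levels explicitly also fixes their order, which keeps tables and plots consistent even when a level happens to be absent from the data.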
## Survived Pclass Name Sex SibSp
## Deceased:549 First Class :216 Length:891 female:314 0:608
## Survived:342 Second Class:184 Class :character male :577 1:209
## Third Class :491 Mode :character 2: 28
## 3: 16
## 4: 18
## 5: 5
## 8: 7
## Parch Ticket Fare Embarked
## 0:678 Length:891 Min. : 0.00 Cherbourg :168
## 1:118 Class :character 1st Qu.: 7.91 Queenstown : 77
## 2: 80 Mode :character Median : 14.45 Southampton :644
## 3: 5 Mean : 32.20 Unidentified: 2
## 4: 4 3rd Qu.: 31.00
## 5: 5 Max. :512.33
## 6: 1
## , , Pclass = First Class, Embarked = Cherbourg
##
## Sex
## Survived female male
## Deceased 1 25
## Survived 42 17
##
## , , Pclass = Second Class, Embarked = Cherbourg
##
## Sex
## Survived female male
## Deceased 0 8
## Survived 7 2
##
## , , Pclass = Third Class, Embarked = Cherbourg
##
## Sex
## Survived female male
## Deceased 8 33
## Survived 15 10
##
## , , Pclass = First Class, Embarked = Queenstown
##
## Sex
## Survived female male
## Deceased 0 1
## Survived 1 0
##
## , , Pclass = Second Class, Embarked = Queenstown
##
## Sex
## Survived female male
## Deceased 0 1
## Survived 2 0
##
## , , Pclass = Third Class, Embarked = Queenstown
##
## Sex
## Survived female male
## Deceased 9 36
## Survived 24 3
##
## , , Pclass = First Class, Embarked = Southampton
##
## Sex
## Survived female male
## Deceased 2 51
## Survived 46 28
##
## , , Pclass = Second Class, Embarked = Southampton
##
## Sex
## Survived female male
## Deceased 6 82
## Survived 61 15
##
## , , Pclass = Third Class, Embarked = Southampton
##
## Sex
## Survived female male
## Deceased 55 231
## Survived 33 34
##
## , , Pclass = First Class, Embarked = Unidentified
##
## Sex
## Survived female male
## Deceased 0 0
## Survived 2 0
##
## , , Pclass = Second Class, Embarked = Unidentified
##
## Sex
## Survived female male
## Deceased 0 0
## Survived 0 0
##
## , , Pclass = Third Class, Embarked = Unidentified
##
## Sex
## Survived female male
## Deceased 0 0
## Survived 0 0
The table above breaks the passenger counts down by survival, gender, ticket class, and port of embarkation.
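A four-way breakdown of this kind can be produced with table(). The sketch below runs on six made-up passengers (the data frame demo and the object counts are illustrative names), and it adds ftable() as one possible way to print the same counts more compactly.

```r
# Hypothetical rows; labels follow the recoded columns of the report.
demo <- data.frame(
  Survived = c("Survived", "Deceased", "Survived",
               "Deceased", "Deceased", "Survived"),
  Sex      = c("female", "male", "female", "male", "male", "male"),
  Pclass   = c("First Class", "Third Class", "Second Class",
               "Third Class", "First Class", "Third Class"),
  Embarked = c("Cherbourg", "Southampton", "Southampton",
               "Southampton", "Cherbourg", "Queenstown")
)

# One Survived-by-Sex slice per Pclass/Embarked combination.
counts <- table(demo$Survived, demo$Sex, demo$Pclass, demo$Embarked,
                dnn = c("Survived", "Sex", "Pclass", "Embarked"))

# ftable() flattens the same counts into a single two-dimensional layout.
ftable(counts, row.vars = c("Embarked", "Pclass"))

# A single cell can be read by naming one level per dimension.
counts["Deceased", "male", "Third Class", "Southampton"]
```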
The data above show the total and average ticket fare for each class on the Titanic.
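A per-class total and average can be computed with aggregate(). The following is a minimal sketch on made-up fares, with demo, fare_total, fare_mean and fare_by_class as illustrative names. It is not the chunk that produced the report's figures.

```r
# Hypothetical fares for five passengers.
demo <- data.frame(
  Pclass = c("First Class", "First Class", "Second Class",
             "Third Class", "Third Class"),
  Fare   = c(71.28, 53.10, 13.00, 7.25, 8.05)
)

fare_total <- aggregate(Fare ~ Pclass, data = demo, FUN = sum)
fare_mean  <- aggregate(Fare ~ Pclass, data = demo, FUN = mean)

# Join both summaries on the class, renaming the two Fare columns.
fare_by_class <- merge(fare_total, fare_mean, by = "Pclass",
                       suffixes = c("_total", "_mean"))
fare_by_class
```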
## female male
## 233 109
Of the 342 passengers who survived, most were female (233 passengers).
## [1] 28693.95
The fares of all passengers combined came to $28,693.95.
## [1] 85
There were 85 passengers who boarded at Cherbourg and travelled without any relatives.
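A count like this combines several conditions with &. The sketch below uses four made-up passengers (demo and alone_from_cherbourg are hypothetical names) and assumes, as in the cleaned data, that SibSp and Parch are stored as factors.

```r
# Hypothetical rows with SibSp and Parch stored as factors.
demo <- data.frame(
  Embarked = factor(c("Cherbourg", "Cherbourg", "Southampton", "Cherbourg")),
  SibSp    = factor(c(0, 1, 0, 0)),
  Parch    = factor(c(0, 0, 0, 2))
)

# Factor levels are compared as text, so zero is written as "0".
alone_from_cherbourg <- demo$Embarked == "Cherbourg" &
  demo$SibSp == "0" & demo$Parch == "0"

# TRUE counts as 1, so the sum is the number of matching passengers.
sum(alone_from_cherbourg)
```

Only the first made-up passenger satisfies all three conditions here.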
no4<-titanic[titanic$Embarked=="Southampton" & titanic$Pclass=="Third Class" & titanic$Sex=="female",]
nrow(no4)
## [1] 88
There were 88 female passengers who purchased Third Class tickets and boarded at Southampton.
##
## First Class Second Class Third Class
## female 91 70 72
## male 45 17 47
Of the 342 surviving passengers, the smallest group was male passengers who bought Second Class tickets (17 passengers).
##
## First Class Second Class Third Class
## 122 108 347
no7 <- titanic[titanic$Sex == "male" & titanic$Embarked == "Southampton", ]
# Lowest fare paid in each class by male passengers from Southampton
no7_min <- aggregate(Fare ~ Pclass, no7, min)
no7_min
The lowest fare was zero (free of charge) in every class. The highest fare varied by class.
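Since the comparison involves both the lowest and the highest fare, both can be obtained in one aggregate() call by using range() as the summary function. This is a sketch on made-up rows. The names demo, men_soton and fare_range are illustrative and do not appear in the original analysis.

```r
# Hypothetical passengers; only the male Southampton rows are kept.
demo <- data.frame(
  Sex      = c("male", "male", "male", "male", "female"),
  Embarked = c("Southampton", "Southampton", "Southampton",
               "Cherbourg", "Southampton"),
  Pclass   = c("First Class", "First Class", "Third Class",
               "Third Class", "Third Class"),
  Fare     = c(0, 53.10, 8.05, 7.23, 7.92)
)

men_soton <- demo[demo$Sex == "male" & demo$Embarked == "Southampton", ]

# range() returns c(min, max), so the Fare column of the result is a
# two-column matrix: column 1 is the minimum, column 2 the maximum.
fare_range <- aggregate(Fare ~ Pclass, data = men_soton, FUN = range)
fare_range
```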
##
## 0 1 2 3 4 5 6
## 678 118 80 5 4 5 1
## [1] 15
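Only 15 passengers travelled with more than 2 parents or children (5 + 4 + 5 + 1 in the table above). This count covers the whole group regardless of survival, so at most 15 of them were alive after the tragedy.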
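Because Parch was converted to a factor earlier, a numeric comparison such as "more than 2" needs the values converted back first. The sketch below, on five made-up passengers with hypothetical names (demo, parch_n, n_large_family, n_large_family_alive), also keeps the whole-group count separate from the survivor count.

```r
# Hypothetical rows with Parch stored as a factor, as in the cleaned data.
demo <- data.frame(
  Survived = factor(c("Survived", "Deceased", "Survived",
                      "Deceased", "Survived")),
  Parch    = factor(c(0, 3, 5, 4, 2))
)

# as.numeric() on a factor returns level positions, not the labels,
# so the conversion goes through character first.
parch_n <- as.numeric(as.character(demo$Parch))

# Everyone travelling with more than 2 parents or children.
n_large_family <- sum(parch_n > 2)

# The same group restricted to survivors.
n_large_family_alive <- sum(parch_n > 2 & demo$Survived == "Survived")
```

In this made-up example three passengers have more than 2 parents or children, and one of them survived.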
Among female passengers, First Class tickets had the highest average fare, while Third Class tickets had the lowest.
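An average per class for one gender can be computed with tapply() after filtering. The following is a minimal sketch on made-up fares. The names demo, women and mean_fare are illustrative, and the numbers are not the report's results.

```r
# Hypothetical fares; the single male row is dropped by the filter.
demo <- data.frame(
  Sex    = c("female", "female", "female", "male", "female"),
  Pclass = c("First Class", "First Class", "Third Class",
             "First Class", "Second Class"),
  Fare   = c(71.28, 53.10, 7.92, 52.00, 13.00)
)

women <- demo[demo$Sex == "female", ]

# Mean fare within each ticket class, returned as a named vector.
mean_fare <- tapply(women$Fare, women$Pclass, mean)
mean_fare

names(which.max(mean_fare))   # class with the highest average fare
names(which.min(mean_fare))   # class with the lowest average fare
```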
no10 <- titanic[titanic$Sex == "male" & titanic$Pclass == "Third Class" & titanic$Embarked == "Queenstown", ]
# Sort by fare, most expensive first
no10 <- no10[order(no10$Fare, decreasing = TRUE), ]
no10[1:4, c("Name", "Fare")]
Four male passengers bought the most expensive Third Class tickets among those boarding from Queenstown:
Mr. Eugene Rice
Mr. Arthur Rice
Mr. Eric Rice
Mr. George Hugh Rice
From the data above, we can conclude the following:
Fewer than half of the passengers survived.
The largest group of survivors was female passengers who bought Second Class tickets and embarked at Southampton.
Among the deceased, the largest group was male passengers who bought Third Class tickets and embarked at Southampton.
More than half of the passengers bought Third Class tickets.
The passengers were mostly male.
Most of the passengers did not board the ship with siblings, spouses, parents or children.
The most expensive ticket purchased by a passenger cost $512.33, while the cheapest was free (no charge).
Most of the passengers boarded the ship at Southampton, while a small number boarded at Queenstown.