The sinking of the RMS Titanic occurred on the night of 14 April through to the morning of 15 April 1912 in the North Atlantic Ocean, four days into the ship’s maiden voyage from Southampton to New York City. The largest passenger liner in service at the time, Titanic had an estimated 2,224 people on board when she struck an iceberg at around 23:40 (ship’s time) on Sunday, 14 April 1912. Her sinking two hours and forty minutes later at 02:20 (05:18 GMT) on Monday, 15 April resulted in the deaths of more than 1,500 people, which made it one of the deadliest peacetime maritime disasters in history.
This is an R Markdown document highlighting some facts about the Titanic data.
Read the Titanic data set into R. Create a data frame called “titanic.df”.
Hints:
Place the CSV data in a folder of your choice on your computer. Change the working directory to the folder in which the dataset is located. Use the read.csv() function to read the data and store it in a data frame called “titanic.df”. Use the View() function to view the data frame. Confirm that the View() output in R matches the data that you saw in Excel.
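# Point R at the folder that holds the CSV (adjust the path for your machine).
setwd("D:/manipal-year2/internship/IIML_dataAnalytics/Datasets")
# Read the CSV into a data frame, then open it in the data viewer.
titanic.df <- read.csv("Titanic Data.csv")
View(titanic.df)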
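As an alternative to changing the working directory, the file can be read through a full path built with file.path(). The sketch below is only illustrative: it writes a three-row, made-up CSV to a temporary folder so that it runs anywhere, and the names `data.dir`, `csv.path`, and `toy.df` are hypothetical stand-ins rather than part of the original exercise.

```r
# Hypothetical folder and file; a temporary directory keeps the sketch self-contained.
data.dir <- tempdir()
csv.path <- file.path(data.dir, "toy_titanic.csv")

# Write a tiny made-up CSV (3 rows) purely for illustration.
writeLines(c("Survived,Pclass,Sex",
             "0,3,male",
             "1,1,female",
             "1,2,female"),
           csv.path)

# Read it back without calling setwd().
toy.df <- read.csv(csv.path)

# Structural checks that work in scripts as well as interactively.
str(toy.df)
nrow(toy.df)
```

For the real data set, `csv.path` would point at the folder that holds "Titanic Data.csv"; str() is a non-interactive complement to View().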
Use R to count the total number of passengers on board the Titanic (total number of rows in the dataset).
dimensions <- dim(titanic.df)
dimensions[1]
## [1] 889
Use R to count the number of passengers who survived the sinking of the Titanic (the rows that have Survived = 1).
table(titanic.df$Survived)[2]
##   1
## 340
Use R to measure the percentage of passengers who survived the sinking of the Titanic (percentage of people that have Survived = 1).
prop.table(table(titanic.df$Survived))[2]*100
##        1
## 38.24522
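The same percentage can also be obtained without building a table, because the mean of a logical vector is the share of TRUE values. The sketch below assumes only the counts reported above (889 rows, 340 of them with Survived = 1); the vector `survived` is a reconstructed stand-in for `titanic.df$Survived`, not the original column.

```r
# Rebuild a Survived column from the counts reported above:
# 889 rows in total, 340 with Survived = 1, hence 549 with Survived = 0.
survived <- rep(c(0, 1), times = c(549, 340))

# The mean of a logical vector is the proportion of TRUE values.
pct.survived <- mean(survived == 1) * 100
pct.survived
```

With the real data frame, the equivalent call would be `mean(titanic.df$Survived == 1) * 100`.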
Use R to count the number of first-class passengers who survived the sinking of the Titanic (number of people with Survived = 1 and Pclass = 1).
firstclass.survived <- xtabs(~ Pclass+Survived,data=titanic.df)
firstclass.survived
##       Survived
## Pclass   0   1
##      1  80 134
##      2  97  87
##      3 372 119
firstclass.survived[1,2]
## [1] 134
Use R to measure the percentage of first-class passengers who survived the sinking of the Titanic (percentage of passengers with Pclass = 1 who have Survived = 1).
firstclass.sur.prop <- prop.table(xtabs(~ Pclass+Survived,data=titanic.df),1)
firstclass.sur.prop
##       Survived
## Pclass         0         1
##      1 0.3738318 0.6261682
##      2 0.5271739 0.4728261
##      3 0.7576375 0.2423625
firstclass.sur.prop[1,2]*100
## [1] 62.61682
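Positional indexes such as `[1,2]` depend on the order of the rows and columns, so indexing by dimension name can be easier to read and check. The sketch below rebuilds the Pclass-by-Survived counts shown above as a plain matrix; `class.tab` and `class.prop` are hypothetical names used only for this illustration.

```r
# Counts copied from the Pclass-by-Survived table above.
class.tab <- matrix(c( 80, 134,
                       97,  87,
                      372, 119),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(Pclass = c("1", "2", "3"),
                                    Survived = c("0", "1")))

# Index by dimension names: first class (row "1"), survived (column "1").
class.tab["1", "1"]

# Row-wise proportions (margin = 1): each class sums to 1.
class.prop <- prop.table(class.tab, margin = 1)
class.prop["1", "1"] * 100
```

The same name-based indexing works on the xtabs() result, for example `firstclass.survived["1", "1"]`.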
Use R to count the number of first-class females who survived the sinking of the Titanic (number of passengers with Sex = female, Survived = 1 and Pclass = 1).
firstclass.sur.fem <- xtabs(~ Pclass+Sex+Survived,data=titanic.df)
firstclass.sur.fem
## , , Survived = 0
##
##       Sex
## Pclass female male
##      1      3   77
##      2      6   91
##      3     72  300
##
## , , Survived = 1
##
##       Sex
## Pclass female male
##      1     89   45
##      2     70   17
##      3     72   47
firstclass.sur.fem[1,1,2]
## [1] 89
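A count that combines several conditions can also be computed by logical filtering instead of indexing a three-way table. The sketch below is a minimal illustration that rebuilds one row per passenger from the cell counts printed above; `counts`, `passengers`, and `n.first.fem.sur` are hypothetical names, and the reconstructed data frame only stands in for `titanic.df`.

```r
# Cell counts copied from the three-way table above.
# expand.grid() varies Pclass fastest, then Sex, then Survived.
counts <- expand.grid(Pclass = 1:3,
                      Sex = c("female", "male"),
                      Survived = 0:1)
counts$n <- c( 3,  6, 72, 77, 91, 300,   # Survived = 0: female, then male
              89, 70, 72, 45, 17,  47)   # Survived = 1: female, then male

# Expand to one row per passenger (889 rows).
passengers <- counts[rep(seq_len(nrow(counts)), times = counts$n),
                     c("Pclass", "Sex", "Survived")]

# Combine the three conditions with & and count the TRUE values.
is.match <- passengers$Pclass == 1 &
  passengers$Sex == "female" &
  passengers$Survived == 1
n.first.fem.sur <- sum(is.match)
n.first.fem.sur
```

On the real data the same expression would be applied to the columns of `titanic.df` directly.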
Use R to measure the percentage of survivors who were female (percentage of passengers with Survived = 1 who have Sex = female).
sur.fem.perc <- prop.table(xtabs(~ Survived+Sex,data=titanic.df),1)
sur.fem.perc
##         Sex
## Survived    female      male
##        0 0.1475410 0.8524590
##        1 0.6794118 0.3205882
sur.fem.perc[2,1]*100
## [1] 67.94118
Use R to measure the percentage of females on board the Titanic who survived (percentage of passengers with Sex = female who have Survived = 1).
sur.fem.perc <- prop.table(xtabs(~ Survived+Sex,data=titanic.df),2)
sur.fem.perc
##         Sex
## Survived    female      male
##        0 0.2596154 0.8110919
##        1 0.7403846 0.1889081
sur.fem.perc[2,1]*100
## [1] 74.03846
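The last two questions differ only in the margin passed to prop.table(): margin 1 divides each cell by its row total (share of each Survived group that is female or male), while margin 2 divides by the column total (share of each sex that survived). The sketch below makes the contrast explicit on a matrix of counts obtained by summing the three-way table above over Pclass; `sex.tab`, `by.survived`, and `by.sex` are hypothetical names.

```r
# Survived-by-Sex counts, summed over Pclass from the three-way table above:
# Survived = 0: 3 + 6 + 72 = 81 females, 77 + 91 + 300 = 468 males
# Survived = 1: 89 + 70 + 72 = 231 females, 45 + 17 + 47 = 109 males
sex.tab <- matrix(c( 81, 468,
                    231, 109),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Survived = c("0", "1"),
                                  Sex = c("female", "male")))

# margin = 1: proportions within each row (within each Survived group).
by.survived <- prop.table(sex.tab, margin = 1)
by.survived["1", "female"] * 100   # percentage of survivors who were female

# margin = 2: proportions within each column (within each sex).
by.sex <- prop.table(sex.tab, margin = 2)
by.sex["1", "female"] * 100        # percentage of females who survived
```

Storing the two results under different names avoids overwriting the first answer with the second.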
Run a Pearson’s Chi-squared test to test the following hypothesis:
Hypothesis: The proportion of females onboard who survived the sinking of the Titanic was higher than the proportion of males onboard who survived the sinking of the Titanic.
mytable <- xtabs(~ Survived+Sex,data=titanic.df)
chisq.test(mytable)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: mytable
## X-squared = 258.43, df = 1, p-value < 2.2e-16
Pearson’s chi-squared test is a statistical test applied to categorical data to evaluate how likely it is that an observed difference between groups arose by chance. The output shows a statistically significant association between the Sex of a passenger and whether the passenger survived (p < 2.2e-16, well below 0.01), so we reject the null hypothesis that Sex and Survived are independent. The test itself is not directional; the direction comes from the column proportions computed earlier, where 74.04% of females survived compared with 18.89% of males, which is consistent with the hypothesis.
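Because the hypothesis is stated in one direction (females higher than males), a one-sided two-proportion test is one possible complement to the chi-squared test of independence. The sketch below is not part of the original analysis: it rebuilds the Survived-by-Sex counts from the tables above (231 of 312 females and 109 of 577 males survived) and uses prop.test() with alternative = "greater"; `sex.tab`, `chi`, and `one.sided` are hypothetical names.

```r
# Survived-by-Sex counts reconstructed from the tables above.
sex.tab <- matrix(c( 81, 468,
                    231, 109),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Survived = c("0", "1"),
                                  Sex = c("female", "male")))

# Test of independence (Yates' correction is applied by default for 2 x 2 tables).
chi <- chisq.test(sex.tab)
chi$statistic

# One-sided test: is the female survival proportion greater than the male one?
# x = number of survivors per group, n = group size (female first, then male).
one.sided <- prop.test(x = c(231, 109), n = c(312, 577),
                       alternative = "greater")
one.sided$estimate   # sample survival proportions for females and males
one.sided$p.value
```

A small one-sided p-value here supports the directional claim, whereas the chi-squared test alone only indicates that Sex and Survived are not independent.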