Sinking of the RMS Titanic

The sinking of the RMS Titanic occurred on the night of 14 April through to the morning of 15 April 1912 in the North Atlantic Ocean, four days into the ship’s maiden voyage from Southampton to New York City. The largest passenger liner in service at the time, Titanic had an estimated 2,224 people on board when she struck an iceberg at around 23:40 (ship’s time) on Sunday, 14 April 1912. Her sinking two hours and forty minutes later at 02:20 (05:18 GMT) on Monday, 15 April resulted in the deaths of more than 1,500 people, which made it one of the deadliest peacetime maritime disasters in history.

This is an R Markdown document highlighting some facts about the Titanic data set.

Reading the dataset (Task 2(b))

Read the Titanic data set into R. Create a dataframe called “titanic”.

Hints:
Place the CSV data in a folder of your choice on your computer. Change the working directory to the folder in which your dataset is located. Use the read.csv() function in R to read the data and store it in a dataframe called “titanic”. Use the View() function in R to view the dataframe. Confirm that the View() output in R matches the data that you saw in Excel.

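# Set the working directory to the folder that holds the CSV file
setwd("D:/manipal-year2/internship/IIML_dataAnalytics/Datasets")
# Read the CSV file into a dataframe and open it in the data viewer
titanic.df <- read.csv("Titanic Data.csv")
View(titanic.df)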
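
The steps in the hints can also be carried out without changing the working directory. The following is a minimal sketch, not the method used above: it writes a tiny made-up CSV to a temporary folder so that it runs anywhere, and the names data.dir, csv.path and toy.df are hypothetical. With the real file, data.dir would point to the folder that holds Titanic Data.csv.

```r
# Hypothetical folder; replace with the folder that holds the real CSV file
data.dir <- tempdir()
csv.path <- file.path(data.dir, "Titanic Data.csv")

# Made-up rows so that the sketch is self-contained (not the real data)
toy.df <- data.frame(Survived = c(0, 1, 1),
                     Pclass   = c(3, 1, 2),
                     Sex      = c("male", "female", "female"))
write.csv(toy.df, csv.path, row.names = FALSE)

# Fail early with a clear message if the file is not where we expect it
stopifnot(file.exists(csv.path))

# Read the file by its full path instead of calling setwd()
titanic.df <- read.csv(csv.path)

# Quick structural checks that print to the console
str(titanic.df)
head(titanic.df)
```

Reading by full path keeps the script independent of whichever directory R happens to start in, and str() and head() give a console-based check when the View() window is not convenient.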

TASK 3 (a)

Use R to count the total number of passengers on board the Titanic (total number of rows in the dataset).

nrow(titanic.df)

## [1] 889

TASK 3 (b)

Use R to count the number of passengers who survived the sinking of the Titanic (the rows that have Survived = 1).

table(titanic.df$Survived)[2]

##   1 
## 340

TASK 3 (c)

Use R to measure the percentage of passengers who survived the sinking of the Titanic (percentage of people that have Survived = 1).

prop.table(table(titanic.df$Survived))[2]*100

##        1 
## 38.24522
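
The counts and percentage from Tasks 3 (a) to 3 (c) can be cross-checked without table(). The sketch below rebuilds only the Survived column from the counts shown in this document (340 survivors out of 889 rows); the vector name survived is an assumption made for illustration, and with the real data titanic.df$Survived would take its place.

```r
# Rebuild the Survived column from the counts reported above:
# 889 rows in total, of which 340 have Survived = 1
survived <- rep(c(0, 1), times = c(889 - 340, 340))

n.total      <- length(survived)           # number of passengers (rows)
n.survived   <- sum(survived == 1)         # TRUE counts as 1 when summed
pct.survived <- mean(survived == 1) * 100  # share of TRUE values, as a percentage

n.total
n.survived
pct.survived
```

Comparing against the value itself (survived == 1) avoids relying on the position of the "1" entry in the table() output, which is what the [2] index does.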

TASK 3 (d)

Use R to count the number of first-class passengers who survived the sinking of the Titanic (number of people with Survived = 1 and Pclass = 1).

firstclass.survived <- xtabs(~ Pclass+Survived,data=titanic.df)
firstclass.survived

##       Survived
## Pclass   0   1
##      1  80 134
##      2  97  87
##      3 372 119

firstclass.survived[1,2]

## [1] 134

TASK 3 (e)

Use R to measure the percentage of first-class passengers who survived the sinking of the Titanic (among people with Pclass = 1, the percentage with Survived = 1).

firstclass.sur.prop <- prop.table(xtabs(~ Pclass+Survived,data=titanic.df),1)
firstclass.sur.prop

##       Survived
## Pclass         0         1
##      1 0.3738318 0.6261682
##      2 0.5271739 0.4728261
##      3 0.7576375 0.2423625

firstclass.sur.prop[1,2]*100

## [1] 62.61682
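
To see what the second argument of prop.table() does here, the row-wise proportions can be reproduced by hand. This sketch rebuilds the Pclass by Survived table from the counts printed above; the object names class.tab, by.row and manual are hypothetical. It also looks up cells by label rather than by position.

```r
# Counts copied from the Pclass by Survived table shown above
class.tab <- as.table(matrix(c(80, 134,
                               97,  87,
                               372, 119),
                             nrow = 3, byrow = TRUE,
                             dimnames = list(Pclass = c("1", "2", "3"),
                                             Survived = c("0", "1"))))

# margin = 1 divides every cell by its row total, so each row sums to 1
by.row <- prop.table(class.tab, 1)

# The same calculation written out: rowSums() is recycled down the columns
manual <- class.tab / rowSums(class.tab)
all.equal(as.vector(by.row), as.vector(manual))

# Index by dimension labels: Pclass "1", Survived "1"
class.tab["1", "1"]
by.row["1", "1"] * 100
```

Label-based indexing keeps working if the table gains or loses a level, whereas [1,2] silently points at a different cell.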

TASK 3 (f)

Use R to count the number of females from first class who survived the sinking of the Titanic (number of people with Sex = female, Survived = 1 and Pclass = 1).

firstclass.sur.fem <- xtabs(~ Pclass+Sex+Survived,data=titanic.df)
firstclass.sur.fem

## , , Survived = 0
## 
##       Sex
## Pclass female male
##      1      3   77
##      2      6   91
##      3     72  300
## 
## , , Survived = 1
## 
##       Sex
## Pclass female male
##      1     89   45
##      2     70   17
##      3     72   47

firstclass.sur.fem[1,1,2]

## [1] 89
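
The three-way table holds enough information to recover the two-way tables used in the later tasks. As a sketch, the array below is rebuilt from the counts printed above (the names counts and sex.by.survival are hypothetical), and margin.table() sums over Pclass to give a Survived by Sex table.

```r
# Counts copied from the three-way output above; R fills arrays with the
# first dimension (Pclass) varying fastest, then Sex, then Survived
counts <- array(c(3, 6, 72,    77, 91, 300,   # Survived = 0: female, male
                  89, 70, 72,  45, 17, 47),   # Survived = 1: female, male
                dim = c(3, 2, 2),
                dimnames = list(Pclass = c("1", "2", "3"),
                                Sex = c("female", "male"),
                                Survived = c("0", "1")))

# First-class females who survived, looked up by label
counts["1", "female", "1"]

# Sum over Pclass, keeping Survived (dimension 3) as rows and Sex (2) as columns
sex.by.survival <- margin.table(counts, c(3, 2))
sex.by.survival
```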

TASK 3 (g)

Use R to measure the percentage of survivors who were female (among people with Survived = 1, the percentage with Sex = female).

sur.fem.perc <- prop.table(xtabs(~ Survived+Sex,data=titanic.df),1)
sur.fem.perc

##         Sex
## Survived    female      male
##        0 0.1475410 0.8524590
##        1 0.6794118 0.3205882

sur.fem.perc[2,1]*100

## [1] 67.94118

TASK 3 (h)

Use R to measure the percentage of females on board the Titanic who survived (among people with Sex = female, the percentage with Survived = 1).

sur.fem.perc <- prop.table(xtabs(~ Survived+Sex,data=titanic.df),2)
sur.fem.perc

##         Sex
## Survived    female      male
##        0 0.2596154 0.8110919
##        1 0.7403846 0.1889081

sur.fem.perc[2,1]*100

## [1] 74.03846
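
Tasks 3 (g) and 3 (h) use the same table and differ only in the margin passed to prop.table(). The sketch below puts the two side by side, using counts obtained by summing the three-way table from Task 3 (f) over Pclass; the name sex.tab and the two result names are hypothetical.

```r
# Survived by Sex counts, obtained by adding up the Pclass rows of the
# three-way table in Task 3 (f)
sex.tab <- as.table(matrix(c(81, 468,
                             231, 109),
                           nrow = 2, byrow = TRUE,
                           dimnames = list(Survived = c("0", "1"),
                                           Sex = c("female", "male"))))

# margin = 1: each row sums to 1, so this answers
# "of those who survived, what share was female?"
pct.survivors.female <- prop.table(sex.tab, 1)["1", "female"] * 100

# margin = 2: each column sums to 1, so this answers
# "of the females on board, what share survived?"
pct.females.survived <- prop.table(sex.tab, 2)["1", "female"] * 100

pct.survivors.female
pct.females.survived
```

The numerator (231 surviving females) is the same in both cases; only the denominator changes, from all survivors to all females.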

TASK 3 (i)

Run a Pearson’s Chi-squared test to test the following hypothesis:

Hypothesis: The proportion of females onboard who survived the sinking of the Titanic was higher than the proportion of males onboard who survived the sinking of the Titanic.

mytable <- xtabs(~ Survived+Sex,data=titanic.df)
chisq.test(mytable)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  mytable
## X-squared = 258.43, df = 1, p-value < 2.2e-16

Pearson’s chi-squared test is a statistical test applied to categorical data to evaluate how likely it is that an observed difference between groups arose by chance. The output indicates an association between the sex of a passenger and whether the passenger survived (p < 2.2e-16, well below 0.01), so we reject the null hypothesis that Sex and Survived are independent. The test itself does not indicate the direction of the difference; that comes from the proportions computed in Task 3 (h), where 74.0% of females survived compared with 18.9% of males, which is consistent with the hypothesis.
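
Because the hypothesis is directional while the chi-squared test of independence is not, a one-sided comparison of two proportions is one possible complement. The sketch below is an addition for illustration, not part of the original analysis. It uses counts obtained by summing the three-way table from Task 3 (f) over Pclass (231 of 312 females and 109 of 577 males survived); the object names are hypothetical.

```r
# Survived by Sex counts, summed over Pclass from the Task 3 (f) table
sex.tab <- matrix(c(81, 468,
                    231, 109),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Survived = c("0", "1"),
                                  Sex = c("female", "male")))

# chisq.test() also accepts a matrix of counts; for a 2 x 2 table it applies
# Yates' continuity correction by default
chi.res <- chisq.test(sex.tab)
chi.res$statistic

# One-sided test of H1: survival proportion of females > that of males
survivors <- c(female = 231, male = 109)
on.board  <- c(female = 312, male = 577)
prop.res <- prop.test(survivors, on.board, alternative = "greater")
prop.res$estimate   # the two sample proportions
prop.res$p.value
```

With alternative = "greater", prop.test() compares the first group (females) against the second (males), so the order of the two vectors matters.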