Sinking of RMS Titanic

Below is an analysis of the dataset "Titanic Data.csv".

We start by loading the dataset.

titanic.df <- read.csv("Titanic Data.csv")
View(titanic.df)

Number of total passengers:

nrow(titanic.df)

## [1] 889

Number of total passengers: 889

Number of passengers who survived:

sum(titanic.df$Survived)

## [1] 340

Number of passengers who survived: 340

Percentage of passengers who survived:

sum(titanic.df$Survived)/nrow(titanic.df)*100

## [1] 38.24522

Percentage of passengers who survived: 38.25%
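
Because Survived is coded 0/1, the same percentage is simply the mean of that column. A minimal sketch, assuming a stand-in vector `survived` (a hypothetical name, rebuilt here from the counts printed above rather than read from the CSV):

```r
# Stand-in for titanic.df$Survived, rebuilt from the counts above:
# 340 survivors (coded 1) and 889 - 340 = 549 non-survivors (coded 0).
survived <- rep(c(1, 0), times = c(340, 549))

# The mean of a 0/1 vector is the proportion of ones.
pct_survived <- mean(survived) * 100
round(pct_survived, 2)
```

On the real data frame this would be `mean(titanic.df$Survived) * 100`.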

Number of 1st class passengers who survived:

xtabs( ~ Survived + Pclass, data = titanic.df)

##         Pclass
## Survived   1   2   3
##        0  80  97 372
##        1 134  87 119

Number of 1st class passengers who survived: 134

Percentage of 1st class passengers who survived:

prop.table(xtabs( ~ Survived + Pclass, data = titanic.df))*100

##         Pclass
## Survived         1         2         3
##        0  8.998875 10.911136 41.844769
##        1 15.073116  9.786277 13.385827

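1st class survivors as a percentage of all passengers: 15.07%. Each cell in this table is a share of all 889 passengers, not of its own class. Within 1st class, 134 of 214 passengers survived, which is 62.6%.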
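
The table above divides every cell by the grand total. To get the survival rate within each class, the division has to be by column instead. A minimal sketch, assuming the counts from the Survived by Pclass table are typed in by hand (the names `class_tab` and `class_pct` are illustrative, not from the original analysis):

```r
# Counts copied by hand from the Survived x Pclass table above.
# matrix() fills column by column: (0, 1) for class 1, then class 2, then class 3.
class_tab <- matrix(c(80, 134, 97, 87, 372, 119), nrow = 2,
                    dimnames = list(Survived = c("0", "1"),
                                    Pclass = c("1", "2", "3")))

# margin = 2 divides each cell by its column (class) total,
# so every column sums to 100.
class_pct <- prop.table(class_tab, margin = 2) * 100
round(class_pct, 2)
```

With the original data frame, the equivalent call would be `prop.table(xtabs(~ Survived + Pclass, data = titanic.df), margin = 2) * 100`.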

Number of female passengers from 1st class who survived:

xtabs( ~ Survived + Pclass + Sex, data = titanic.df)

## , , Sex = female
## 
##         Pclass
## Survived   1   2   3
##        0   3   6  72
##        1  89  70  72
## 
## , , Sex = male
## 
##         Pclass
## Survived   1   2   3
##        0  77  91 300
##        1  45  17  47

Number of female 1st class passengers who survived: 89.
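
Reading a single cell off a three-way table by eye is error-prone, so it can help to index the cell by name. A minimal sketch, assuming the printed counts are re-entered by hand (`sex_class_tab` is an illustrative name):

```r
# Counts copied by hand from the three-way table above.
# array() fills the first dimension fastest: Survived, then Pclass, then Sex.
sex_class_tab <- array(
  c(3, 89, 6, 70, 72, 72,      # Sex = female
    77, 45, 91, 17, 300, 47),  # Sex = male
  dim = c(2, 3, 2),
  dimnames = list(Survived = c("0", "1"),
                  Pclass = c("1", "2", "3"),
                  Sex = c("female", "male"))
)

# Survivors ("1") in 1st class ("1") who were female.
first_class_female_survivors <- sex_class_tab["1", "1", "female"]
first_class_female_survivors
```

The same indexing works directly on the object returned by `xtabs( ~ Survived + Pclass + Sex, data = titanic.df)`.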

Percentage of survivors who were female, and percentage of females on board who survived:

library(gmodels)

## Warning: package 'gmodels' was built under R version 3.3.3

CrossTable(titanic.df$Survived,titanic.df$Sex)

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  889 
## 
##  
##                     | titanic.df$Sex 
## titanic.df$Survived |    female |      male | Row Total | 
## --------------------|-----------|-----------|-----------|
##                   0 |        81 |       468 |       549 | 
##                     |    64.727 |    35.000 |           | 
##                     |     0.148 |     0.852 |     0.618 | 
##                     |     0.260 |     0.811 |           | 
##                     |     0.091 |     0.526 |           | 
## --------------------|-----------|-----------|-----------|
##                   1 |       231 |       109 |       340 | 
##                     |   104.515 |    56.514 |           | 
##                     |     0.679 |     0.321 |     0.382 | 
##                     |     0.740 |     0.189 |           | 
##                     |     0.260 |     0.123 |           | 
## --------------------|-----------|-----------|-----------|
##        Column Total |       312 |       577 |       889 | 
##                     |     0.351 |     0.649 |           | 
## --------------------|-----------|-----------|-----------|
## 
##

From the table above, the percentage of survivors who were female (N / Row Total): 67.9%

The percentage of females on board who survived (N / Col Total): 74%
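
The two percentages come from different margins of the same table, which is easy to mix up. A minimal sketch in base R, assuming the four counts from the CrossTable output are typed in by hand (`sex_tab`, `by_row` and `by_col` are illustrative names):

```r
# Counts copied by hand from the CrossTable output above.
sex_tab <- matrix(c(81, 231, 468, 109), nrow = 2,
                  dimnames = list(Survived = c("0", "1"),
                                  Sex = c("female", "male")))

by_row <- prop.table(sex_tab, margin = 1)  # each row (outcome) sums to 1
by_col <- prop.table(sex_tab, margin = 2)  # each column (sex) sums to 1

# Share of survivors who were female (row proportion).
pct_survivors_female <- round(by_row["1", "female"] * 100, 1)
# Share of females who survived (column proportion).
pct_females_survived <- round(by_col["1", "female"] * 100, 1)

pct_survivors_female
pct_females_survived
```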

We conduct Pearson’s Chi-squared test to test the following hypothesis.

We use the Chi-squared test because both variables involved (Survived and Sex) are categorical.

Hypothesis: The proportion of females onboard who survived the sinking of the Titanic was higher than the proportion of males onboard who survived the sinking of the Titanic.

Null hypothesis: There is no relation between the two variables (survival is independent of sex).

Alternative hypothesis: There is a relation between the two variables (survival depends on sex).

a <- xtabs(~Survived+Sex, data = titanic.df)
chisq.test(a)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  a
## X-squared = 258.43, df = 1, p-value < 2.2e-16

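As the p-value is very small (p < 2.2e-16, far below 0.05), we reject the null hypothesis: survival and sex are related. The Chi-squared test does not itself give the direction of the relation. The direction comes from the table above, where 74% of females survived against 18.9% of males, which supports our hypothesis.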
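
The hypothesis is directional (females survived at a higher rate than males), whereas the Chi-squared test only checks for an association. A minimal sketch of a one-sided check using base R's `prop.test`, which is a substitute technique and not part of the original analysis; the counts are typed in from the CrossTable output, and the object names are illustrative:

```r
# Successes = survivors, trials = passengers of each sex
# (counts copied by hand from the CrossTable output above).
survivors <- c(female = 231, male = 109)
on_board  <- c(female = 312, male = 577)

# alternative = "greater" tests whether the first group's proportion (female)
# exceeds the second group's (male). The continuity correction is on by default.
one_sided <- prop.test(x = survivors, n = on_board, alternative = "greater")

one_sided$estimate  # sample survival proportions for female and male
one_sided$p.value   # one-sided p-value
```

With two groups, `prop.test` reports the same continuity-corrected X-squared statistic as `chisq.test` on the 2x2 table, so the two results can be cross-checked against each other.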