Story of Titanic:

The sinking of the RMS Titanic occurred on the night of 14 April through to the morning of 15 April 1912 in the North Atlantic Ocean, four days into the ship’s maiden voyage from Southampton to New York City. The largest passenger liner in service at the time, Titanic had an estimated 2,224 people on board when she struck an iceberg at around 23:40 (ship’s time) on Sunday, 14 April 1912. Her sinking two hours and forty minutes later at 02:20 (05:18 GMT) on Monday, 15 April resulted in the deaths of more than 1,500 people, which made it one of the deadliest peacetime maritime disasters in history.

Data Analysis of Titanic Dataset:

1. Task 2b:

Read the data into the titanic data frame.

titanic <- read.csv("Titanic.data.csv")

2. Task 2b:

View the titanic data frame to verify that it matches the original CSV file.

View(titanic) 

3. Task 3a:

Total Number of passengers = 889.

dim(titanic)
## [1] 889   8

4. Task 3a:

Total Number of passengers = 889 (the n column of the summary statistics shows 889 for every variable).

library(psych)
describe(titanic) 
##           vars   n  mean    sd median trimmed   mad min    max  range
## Survived     1 889  0.38  0.49   0.00    0.35  0.00 0.0   1.00   1.00
## Pclass       2 889  2.31  0.83   3.00    2.39  0.00 1.0   3.00   2.00
## Sex*         3 889  1.65  0.48   2.00    1.69  0.00 1.0   2.00   1.00
## Age          4 889 29.65 12.97  29.70   29.22  9.34 0.4  80.00  79.60
## SibSp        5 889  0.52  1.10   0.00    0.27  0.00 0.0   8.00   8.00
## Parch        6 889  0.38  0.81   0.00    0.19  0.00 0.0   6.00   6.00
## Fare         7 889 32.10 49.70  14.45   21.28 10.24 0.0 512.33 512.33
## Embarked*    8 889  2.54  0.79   3.00    2.67  0.00 1.0   3.00   2.00
##            skew kurtosis   se
## Survived   0.48    -1.77 0.02
## Pclass    -0.63    -1.27 0.03
## Sex*      -0.62    -1.61 0.02
## Age        0.43     0.96 0.43
## SibSp      3.68    17.69 0.04
## Parch      2.74     9.66 0.03
## Fare       4.79    33.23 1.67
## Embarked* -1.26    -0.23 0.03

5. Task 3b:

Total Number of passengers who survived sinking = 340

table(titanic$Survived == 1) 
## 
## FALSE  TRUE 
##   549   340

6. Task 3c:

Percentage of passengers who survived = 38.2%

(prop.table( table(titanic$Survived))*100)
## 
##        0        1 
## 61.75478 38.24522
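
The count in Task 3b and the percentage in Task 3c can be cross-checked with `sum()` and `mean()` on a logical vector. The sketch below does not read the CSV; it assumes a stand-in vector named `survived`, rebuilt from the counts reported above (549 non-survivors, 340 survivors). With the real data, `titanic$Survived` would take its place.

```r
# Stand-in for titanic$Survived, rebuilt from the reported counts
# (assumption: 549 zeros and 340 ones, as shown in Task 3b).
survived <- rep(c(0, 1), times = c(549, 340))

# A logical vector sums to the number of TRUE values ...
n_survived <- sum(survived == 1)

# ... and its mean is the proportion of TRUE values.
pct_survived <- mean(survived == 1) * 100

n_survived
round(pct_survived, 1)
```

This gives the same figures as `table()` and `prop.table()` while returning plain numbers that are easy to reuse in later steps.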

7. Task 3d:

Number of first-class survivors = 134

titanic.sink <- xtabs(~Survived + Pclass, data=titanic) 
titanic.sink
##         Pclass
## Survived   1   2   3
##        0  80  97 372
##        1 134  87 119

8. Task 3e:

About 63% of the first-class passengers survived the sinking.

(prop.table(titanic.sink,2))*100
##         Pclass
## Survived        1        2        3
##        0 37.38318 52.71739 75.76375
##        1 62.61682 47.28261 24.23625

9. Task 3f:

89 female first-class passengers survived the sinking.

Female.sink <- xtabs(~Survived + Pclass + Sex, data = titanic)
Female.sink
## , , Sex = female
## 
##         Pclass
## Survived   1   2   3
##        0   3   6  72
##        1  89  70  72
## 
## , , Sex = male
## 
##         Pclass
## Survived   1   2   3
##        0  77  91 300
##        1  45  17  47
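
Rather than reading the answer off the printed table, a single cell of a three-way table can be extracted by its dimension names. The sketch below rebuilds the counts shown above as an array named `counts` (a hypothetical stand-in for `Female.sink`, so the example runs without the CSV). The same indexing should work on the `xtabs` object itself.

```r
# Counts copied from the table above; array() fills Survived first,
# then Pclass, then Sex.
counts <- array(
  c(3, 89, 6, 70, 72, 72,       # female: (0, 1) for Pclass 1, 2, 3
    77, 45, 91, 17, 300, 47),   # male:   (0, 1) for Pclass 1, 2, 3
  dim = c(2, 3, 2),
  dimnames = list(Survived = c("0", "1"),
                  Pclass   = c("1", "2", "3"),
                  Sex      = c("female", "male"))
)

# Survivors (Survived = "1") in first class (Pclass = "1") who were female.
first_class_female_survivors <- counts["1", "1", "female"]
first_class_female_survivors

# Summing over Sex recovers the Survived x Pclass table from Task 3d.
by_class <- apply(counts, c(1, 2), sum)
by_class
```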

10. Task 3g:

About 68% of the survivors were female.

female.Survivor <- prop.table(xtabs(~Survived + Sex, data = titanic), 1)*100
female.Survivor
##         Sex
## Survived   female     male
##        0 14.75410 85.24590
##        1 67.94118 32.05882

11. Task 3h:

About 74% of the females on board the Titanic survived.

female.onboard.survivor <- prop.table(xtabs(~Survived + Sex, data = titanic), 2)*100
female.onboard.survivor
##         Sex
## Survived   female     male
##        0 25.96154 81.10919
##        1 74.03846 18.89081
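
Tasks 3g and 3h use the same table and differ only in the `margin` argument of `prop.table()`: `margin = 1` makes each row sum to 100 (the share of each sex among survivors), while `margin = 2` makes each column sum to 100 (the survival rate within each sex). The sketch below illustrates this with a matrix named `tab`, a stand-in built from the counts in Task 3f summed over class, so it runs without the CSV.

```r
# Survived x Sex counts, summed over Pclass from the Task 3f table
# (assumption: female 81 / 231, male 468 / 109).
tab <- matrix(c(81, 231, 468, 109), nrow = 2,
              dimnames = list(Survived = c("0", "1"),
                              Sex      = c("female", "male")))

# Row percentages: of those who survived, what share was female?
by_survival <- prop.table(tab, margin = 1) * 100

# Column percentages: of the females on board, what share survived?
by_sex <- prop.table(tab, margin = 2) * 100

round(by_survival["1", "female"], 1)
round(by_sex["1", "female"], 1)
```

Mixing up the two margins is an easy mistake, so checking `rowSums()` or `colSums()` of the result is a cheap safeguard.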

12. Task 3i:

Pearson’s Chi-squared test to test the following hypothesis:

Hypothesis: The proportion of females onboard who survived the sinking of the Titanic was higher than the proportion of males onboard who survived the sinking of the Titanic.

library(vcd)
## Warning: package 'vcd' was built under R version 3.4.3
## Loading required package: grid
survivor.table <- xtabs( ~Survived + Sex, data = titanic)
chisq.test(survivor.table)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  survivor.table
## X-squared = 258.43, df = 1, p-value < 2.2e-16

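The p-value is very small (< 2.2e-16), so we can reject the null hypothesis that survival was independent of sex. The chi-squared test itself is not directional, so the direction comes from the proportions in Task 3h: about 74% of the females on board survived, compared with about 19% of the males. Females therefore had a markedly higher survival rate than males.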
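
Because the hypothesis is directional (female survival rate higher than male), a one-sided two-sample test of proportions is one possible complement to the chi-squared test. The sketch below is not part of the original analysis; it uses `prop.test()` from base R with `alternative = "greater"`, and it assumes counts derived from the tables above (231 of 312 females and 109 of 577 males survived).

```r
# Survivors and totals per sex, derived from the Task 3f table
# (assumption: females first, so "greater" tests female rate > male rate).
survived_n <- c(female = 231, male = 109)
total_n    <- c(female = 312, male = 577)

# One-sided two-sample test for equality of proportions.
res <- prop.test(x = survived_n, n = total_n, alternative = "greater")

res$estimate   # sample survival proportions for the two groups
res$p.value    # one-sided p-value
```

The order of the groups matters here: with females listed first, `alternative = "greater"` corresponds to the stated hypothesis.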