First, read a CSV file containing data on passengers of the Titanic. The data set contains 8 variables and has no missing data.
Survived–Survival {0 = No, 1 = Yes}
Pclass–Ticket class {1 = 1st, 2 = 2nd, 3 = 3rd}
Sex–Sex {female, male}
Age–Age in years
SibSp–Number of siblings / spouses aboard the Titanic
Parch–Number of parents / children aboard the Titanic
Fare–Passenger fare
Embarked–Port of embarkation {C = Cherbourg, Q = Queenstown, S = Southampton}
titanic.df <- read.csv("TitanicData.csv")
[TASK 3a]
Total number of passengers on board the Titanic = 889 (the n column in the describe() output below).
library(psych)
describe(titanic.df)
## vars n mean sd median trimmed mad min max range
## Survived 1 889 0.38 0.49 0.00 0.35 0.00 0.0 1.00 1.00
## Pclass 2 889 2.31 0.83 3.00 2.39 0.00 1.0 3.00 2.00
## Sex* 3 889 1.65 0.48 2.00 1.69 0.00 1.0 2.00 1.00
## Age 4 889 29.65 12.97 29.70 29.22 9.34 0.4 80.00 79.60
## SibSp 5 889 0.52 1.10 0.00 0.27 0.00 0.0 8.00 8.00
## Parch 6 889 0.38 0.81 0.00 0.19 0.00 0.0 6.00 6.00
## Fare 7 889 32.10 49.70 14.45 21.28 10.24 0.0 512.33 512.33
## Embarked* 8 889 2.54 0.79 3.00 2.67 0.00 1.0 3.00 2.00
## skew kurtosis se
## Survived 0.48 -1.77 0.02
## Pclass -0.63 -1.27 0.03
## Sex* -0.62 -1.61 0.02
## Age 0.43 0.96 0.43
## SibSp 3.68 17.69 0.04
## Parch 2.74 9.66 0.03
## Fare 4.79 33.23 1.67
## Embarked* -1.26 -0.23 0.03
[TASK 3b]
Number of passengers who survived the sinking of the Titanic = 340
(In the Survived variable, 1 denotes a passenger who survived and 0 one who did not.)
table(titanic.df$Survived)
##
## 0 1
## 549 340
[TASK 3c]
The percentage of passengers who survived the sinking of the Titanic=38%
survived1 <- with(titanic.df, table(Survived))
prop.table(survived1)
## Survived
## 0 1
## 0.6175478 0.3824522
prop.table(survived1)*100
## Survived
## 0 1
## 61.75478 38.24522
[TASK 3d]
The number of first-class passengers who survived the sinking of the Titanic=134.
task3d <- xtabs(~ Survived+Pclass, data=titanic.df)
addmargins(task3d)
## Pclass
## Survived 1 2 3 Sum
## 0 80 97 372 549
## 1 134 87 119 340
## Sum 214 184 491 889
[TASK 3e]
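The percentage of first-class passengers who survived the sinking of the Titanic = 63% (134 of the 214 first-class passengers). Note that prop.table() without a margin divides by the grand total, so the 15% in the table below is the share of all 889 passengers who were first-class survivors, not the survival rate within first class.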
task3e <- with(titanic.df, table(Survived,Pclass))
prop.table(task3e)
## Pclass
## Survived 1 2 3
## 0 0.08998875 0.10911136 0.41844769
## 1 0.15073116 0.09786277 0.13385827
prop.table(task3e)*100
## Pclass
## Survived 1 2 3
## 0 8.998875 10.911136 41.844769
## 1 15.073116 9.786277 13.385827
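The proportions above are shares of all 889 passengers. If the figure of interest is the survival rate within each class, column proportions are one way to obtain it. The following is a minimal sketch, assuming the counts from the Task 3d table are typed in by hand (the name `class_counts` is illustrative and not part of the original analysis) so that it runs without TitanicData.csv:

```r
# Counts copied from the Survived-by-Pclass table in Task 3d.
# The matrix is filled column by column: one column per ticket class.
class_counts <- matrix(
  c(80, 134,    # Pclass 1: did not survive, survived
    97, 87,     # Pclass 2
    372, 119),  # Pclass 3
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Pclass = c("1", "2", "3"))
)

# margin = 2 divides each cell by its column (class) total
within_class <- prop.table(class_counts, margin = 2) * 100
round(within_class, 1)

# Survival rate among first-class passengers (134 of 214)
first_class_rate <- within_class["1", "1"]
first_class_rate
```

With the data frame loaded, the same result should follow from `prop.table(task3e, 2)`.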
[TASK 3f]
The number of first-class females who survived the sinking of the Titanic = 89
task3f <- xtabs(~ Pclass+Sex+Survived, data=titanic.df)
task3f
## , , Survived = 0
##
## Sex
## Pclass female male
## 1 3 77
## 2 6 91
## 3 72 300
##
## , , Survived = 1
##
## Sex
## Pclass female male
## 1 89 45
## 2 70 17
## 3 72 47
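Rather than reading the answer off the printed table, a single cell of a three-way table can be picked out by its dimension names. The following is a sketch under the assumption that the counts above are re-entered by hand (the name `counts3f` is illustrative); with the data frame loaded, the same indexing should work on `task3f` directly:

```r
# Counts copied from the Pclass x Sex x Survived table above.
# array() fills the first dimension fastest: Pclass, then Sex, then Survived.
counts3f <- array(
  c(3, 6, 72,      # Survived = 0, female, Pclass 1-3
    77, 91, 300,   # Survived = 0, male,   Pclass 1-3
    89, 70, 72,    # Survived = 1, female, Pclass 1-3
    45, 17, 47),   # Survived = 1, male,   Pclass 1-3
  dim = c(3, 2, 2),
  dimnames = list(
    Pclass = c("1", "2", "3"),
    Sex = c("female", "male"),
    Survived = c("0", "1")
  )
)

# First-class females who survived
first_class_female_survivors <- counts3f["1", "female", "1"]
first_class_female_survivors

# Collapsing over Sex recovers the Pclass-by-Survived counts from Task 3d
class_by_survival <- margin.table(counts3f, c(1, 3))
class_by_survival
```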
[TASK 3g]
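The percentage of survivors who were female = 68% (231 of the 340 survivors). The 26% in the table below is the share of all 889 passengers who were female survivors, because prop.table() without a margin divides by the grand total.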
task3g <- xtabs(~ Survived+Sex, data=titanic.df)
prop.table(task3g)
## Sex
## Survived female male
## 0 0.09111361 0.52643420
## 1 0.25984252 0.12260967
prop.table(task3g)*100
## Sex
## Survived female male
## 0 9.111361 52.643420
## 1 25.984252 12.260967
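The proportions above are shares of all passengers. To express each cell as a share of its Survived row instead (for example, the share of survivors who were female), row proportions are needed. The following is a minimal sketch, assuming the cell counts implied by the tables in this report are entered by hand (the name `sex_counts` is illustrative):

```r
# Survived-by-Sex counts: 81 and 231 females, 468 and 109 males
sex_counts <- matrix(
  c(81, 231,    # female: did not survive, survived
    468, 109),  # male:   did not survive, survived
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Sex = c("female", "male"))
)

# margin = 1 divides each cell by its row (Survived) total
by_outcome <- prop.table(sex_counts, margin = 1) * 100
round(by_outcome, 1)

# Share of survivors who were female (231 of 340)
female_share_of_survivors <- by_outcome["1", "female"]
female_share_of_survivors
```

With the data frame loaded, `prop.table(task3g, 1)` should give the same proportions.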
[TASK 3h]
The percentage of females on board the Titanic who survived=74%
prop.table(task3g,2)
## Sex
## Survived female male
## 0 0.2596154 0.8110919
## 1 0.7403846 0.1889081
prop.table(task3g,2)*100
## Sex
## Survived female male
## 0 25.96154 81.10919
## 1 74.03846 18.89081
[TASK 3i]
Run a Pearson’s Chi-squared test to test the following hypothesis:
Hypothesis: The proportion of females onboard who survived the sinking of the Titanic was higher than the proportion of males onboard who survived the sinking of the Titanic.
chisq.test(task3g)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: task3g
## X-squared = 258.43, df = 1, p-value < 2.2e-16
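The null hypothesis is that survival is independent of sex. As the p-value is less than 0.01, we reject the null hypothesis. The chi-squared test itself is non-directional; the direction of the difference comes from the observed proportions in Task 3h, where 74% of females survived compared with 19% of males, which supports the stated hypothesis.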
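One way to test the directional claim directly is a one-sided two-sample test of proportions. The sketch below is not part of the original task; it assumes the Survived-by-Sex counts are entered by hand (the names `sex_counts`, `chi` and `one_sided` are illustrative) and uses base R's `prop.test()` with `alternative = "greater"`:

```r
# Survived-by-Sex counts, as in the task3g table
sex_counts <- matrix(
  c(81, 231,    # female: did not survive, survived
    468, 109),  # male:   did not survive, survived
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Sex = c("female", "male"))
)

# Two-sided Pearson test with Yates' correction, as run above
chi <- chisq.test(sex_counts)
chi$statistic

# One-sided test: is the female survival proportion greater than the male one?
survivors <- as.vector(sex_counts["1", ])  # female, male survivors
totals <- as.vector(colSums(sex_counts))   # female, male passengers
one_sided <- prop.test(x = survivors, n = totals, alternative = "greater")
one_sided$estimate  # survival proportions: females first, then males
one_sided$p.value
```

The first group passed to `prop.test()` is the one hypothesised to have the larger proportion, which is why the female counts come first.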