titanic.df <- read.csv("Titanic Data.csv")
View(titanic.df)
nrow(titanic.df)
## [1] 889
Total number of passengers in the dataset: 889
sum(titanic.df$Survived)
## [1] 340
Number of passengers who survived: 340
sum(titanic.df$Survived)/nrow(titanic.df)*100
## [1] 38.24522
Percentage of passengers who survived: 38.25%
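Because Survived is coded 0/1, the same percentage is simply the mean of the column. The sketch below does not read the CSV; it rebuilds a stand-in vector from the two counts reported above (340 survivors out of 889), so `survived` is an illustrative name rather than part of the original analysis.

```r
# Stand-in for titanic.df$Survived, rebuilt from the counts reported above:
# 340 ones (survived) and 889 - 340 = 549 zeros (did not survive).
survived <- rep(c(1, 0), times = c(340, 549))

# The mean of a 0/1 vector is the proportion of ones.
pct_survived <- mean(survived) * 100
round(pct_survived, 2)
```

With the real data frame loaded, `mean(titanic.df$Survived) * 100` should give the same figure.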
xtabs( ~ Survived + Pclass, data = titanic.df)
##         Pclass
## Survived   1   2   3
##        0  80  97 372
##        1 134  87 119
Number of 1st class passengers who survived: 134
prop.table(xtabs( ~ Survived + Pclass, data = titanic.df))*100
##         Pclass
## Survived         1         2         3
##        0  8.998875 10.911136 41.844769
##        1 15.073116  9.786277 13.385827
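1st class survivors as a percentage of all passengers: 15.07%. Note that this table divides every cell by the grand total; among 1st class passengers alone, 134 of 214 (about 62.6%) survived.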
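To get the survival rate within each class rather than each cell's share of all passengers, `prop.table()` takes a `margin` argument. The following is a minimal sketch that rebuilds the Survived by Pclass table from the counts printed above instead of reading the CSV; the names `class_tab` and `by_class` are illustrative.

```r
# Counts copied from the Survived x Pclass table above (filled column by column).
class_tab <- as.table(matrix(c(80, 134, 97, 87, 372, 119), nrow = 2,
                             dimnames = list(Survived = c("0", "1"),
                                             Pclass = c("1", "2", "3"))))

# margin = 2 divides each cell by its column total, i.e. by the class size,
# so each column sums to 100.
by_class <- prop.table(class_tab, margin = 2) * 100
round(by_class, 2)
```

On the real data the equivalent call would be `prop.table(xtabs(~ Survived + Pclass, data = titanic.df), margin = 2) * 100`.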
xtabs( ~ Survived + Pclass + Sex, data = titanic.df)
## , , Sex = female
##
##         Pclass
## Survived   1   2   3
##        0   3   6  72
##        1  89  70  72
##
## , , Sex = male
##
##         Pclass
## Survived   1   2   3
##        0  77  91 300
##        1  45  17  47
Number of 1st class female passengers who survived: 89.
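Reading a single cell off a three-way table by eye is error-prone, so it can help to index the table by its dimension names. This sketch rebuilds the table as an array from the counts printed above; `tab3` is an illustrative name, and with the real data it would be the result of the `xtabs()` call.

```r
# Counts copied from the three-way table above.
# Fill order: Survived varies fastest, then Pclass, then Sex.
tab3 <- array(c(3, 89, 6, 70, 72, 72,      # Sex = female, Pclass 1, 2, 3
                77, 45, 91, 17, 300, 47),  # Sex = male,   Pclass 1, 2, 3
              dim = c(2, 3, 2),
              dimnames = list(Survived = c("0", "1"),
                              Pclass = c("1", "2", "3"),
                              Sex = c("female", "male")))

# Survived = 1, Pclass = 1, Sex = female
first_class_female_survivors <- tab3["1", "1", "female"]
first_class_female_survivors

# Collapsing over Pclass recovers the Survived x Sex table.
by_sex <- apply(tab3, c(1, 3), sum)
by_sex
```

Indexing by name returns 89 for the 1st class female survivors, and the collapsed table gives 231 female and 109 male survivors.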
library(gmodels)
## Warning: package 'gmodels' was built under R version 3.3.3
CrossTable(titanic.df$Survived,titanic.df$Sex)
##
##
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 889
##
##
##                     | titanic.df$Sex
## titanic.df$Survived |    female |      male | Row Total |
## --------------------|-----------|-----------|-----------|
##                   0 |        81 |       468 |       549 |
##                     |    64.727 |    35.000 |           |
##                     |     0.148 |     0.852 |     0.618 |
##                     |     0.260 |     0.811 |           |
##                     |     0.091 |     0.526 |           |
## --------------------|-----------|-----------|-----------|
##                   1 |       231 |       109 |       340 |
##                     |   104.515 |    56.514 |           |
##                     |     0.679 |     0.321 |     0.382 |
##                     |     0.740 |     0.189 |           |
##                     |     0.260 |     0.123 |           |
## --------------------|-----------|-----------|-----------|
##        Column Total |       312 |       577 |       889 |
##                     |     0.351 |     0.649 |           |
## --------------------|-----------|-----------|-----------|
##
##
From the table above:
Percentage of survivors who were female (N / Row Total): 67.9%
Percentage of females on board who survived (N / Col Total): 74.0%
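The two percentages answer different questions, and the difference is only which total each cell is divided by. As a rough sketch, the same figures can be reproduced with base R from the four counts in the table; `sex_tab` is an illustrative name for a table rebuilt from those counts.

```r
# Counts copied from the CrossTable output above.
sex_tab <- as.table(matrix(c(81, 231, 468, 109), nrow = 2,
                           dimnames = list(Survived = c("0", "1"),
                                           Sex = c("female", "male"))))

# Row proportions: of those with a given outcome, what share was each sex?
row_prop <- prop.table(sex_tab, margin = 1)

# Column proportions: within each sex, what share had each outcome?
col_prop <- prop.table(sex_tab, margin = 2)

round(row_prop, 3)
round(col_prop, 3)
```

`row_prop["1", "female"]` corresponds to the 67.9% figure and `col_prop["1", "female"]` to the 74.0% figure.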
We use a chi-squared test of independence because both variables (Survived and Sex) are categorical.
Hypothesis: The proportion of females on board who survived the sinking of the Titanic was higher than the proportion of males on board who survived.
Null hypothesis: Survival is independent of sex (there is no association between the two variables).
Alternative hypothesis: Survival is associated with sex.
a <- xtabs(~Survived+Sex, data = titanic.df)
chisq.test(a)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: a
## X-squared = 258.43, df = 1, p-value < 2.2e-16
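Because the p-value is far below 0.05 (p < 2.2e-16), we reject the null hypothesis: survival and sex are associated. The chi-squared test itself is non-directional; the direction comes from the column proportions in the CrossTable output, where 74.0% of females survived against 18.9% of males, which is consistent with our hypothesis.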
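To make the test less of a black box, the statistic can be rebuilt by hand from the four counts. The sketch below computes the expected counts under independence and the per-cell contributions, then compares the result with `chisq.test()`. It also runs a one-sided `prop.test()`, which is an addition here and not part of the original analysis, as one possible way to test the directional hypothesis directly. All object names are illustrative.

```r
# Counts copied from the Survived x Sex table above.
sex_tab <- as.table(matrix(c(81, 231, 468, 109), nrow = 2,
                           dimnames = list(Survived = c("0", "1"),
                                           Sex = c("female", "male"))))

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(sex_tab), colSums(sex_tab)) / sum(sex_tab)

# Per-cell contributions, (observed - expected)^2 / expected.
# These match the "Chi-square contribution" lines in the CrossTable output.
contrib <- (sex_tab - expected)^2 / expected
x2_plain <- sum(contrib)

# chisq.test() applies Yates' continuity correction to 2x2 tables by default,
# which is why it reports a slightly smaller statistic than x2_plain.
x2_yates <- unname(chisq.test(sex_tab)$statistic)

# One-sided two-sample test: is the female survival proportion greater
# than the male one? x = survivors, n = group sizes (female, male).
one_sided <- prop.test(x = c(231, 109), n = c(312, 577),
                       alternative = "greater")

round(contrib, 3)
c(plain = x2_plain, yates = x2_yates)
one_sided$p.value
```

The uncorrected statistic is the sum of the four contributions (about 260.76), while the Yates-corrected value is the 258.43 reported above; either way the conclusion does not change.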