Ramya
27th September 2017
Q1a. How many passengers were on board the Titanic?
Titanic<-read.csv("Titanic Data.csv")
dim(Titanic)
[1] 889 8
nrow(Titanic)
[1] 889
Q1b. How many passengers survived the sinking of the Titanic?
nrow(subset(Titanic,Titanic$Survived==1))
[1] 340
Q1c. Create a one-way contingency table summarizing the Titanic passengers based on how many survived and how many died.
table(Titanic$Survived,dnn=c("Survival"))
Survival
0 1
549 340
Q1d. What was the percentage of passengers who survived the sinking of the Titanic?
prop.table(table(Titanic$Survived,dnn=c("Survival")))*100
Survival
0 1
61.75478 38.24522
Q2a. Create a two-way contingency table characterising the passengers based on survival and based on the passenger class
xtabs(~Titanic$Pclass+Titanic$Survived)
Titanic$Survived
Titanic$Pclass 0 1
1 80 134
2 97 87
3 372 119
Q2b. Visualize your table using a Bar plot.
barplot(xtabs(~Titanic$Survived+Titanic$Pclass),beside=TRUE,xlab = "Class",ylab = "No. of Passengers",legend.text = c("Died","Survived"),main = "Survival by Passenger Class")
Q2c. How many first-class passengers survived the sinking of the Titanic?
nrow(subset(Titanic,Titanic$Survived=="1" & Titanic$Pclass=="1"))
[1] 134
Q2d. What was the percentage of first-class passengers who survived the sinking of the Titanic?
prop.table(xtabs(~Titanic$Pclass+Titanic$Survived), 1)*100
Titanic$Survived
Titanic$Pclass 0 1
1 37.38318 62.61682
2 52.71739 47.28261
3 75.76375 24.23625
Q3a. Create a three-way contingency table showing the number of passengers based on the passenger class, gender and survival.
xtabs(~Titanic$Pclass+Titanic$Survived+Titanic$Sex)
, , Titanic$Sex = female
Titanic$Survived
Titanic$Pclass 0 1
1 3 89
2 6 70
3 72 72
, , Titanic$Sex = male
Titanic$Survived
Titanic$Pclass 0 1
1 77 45
2 91 17
3 300 47
Q3b. Express Q3a. in percentages, displaying answers up to two decimal places.
round(prop.table(ftable(Titanic$Pclass,Titanic$Sex,Titanic$Survived))*100,digits = 2)
0 1
1 female 0.34 10.01
male 8.66 5.06
2 female 0.67 7.87
male 10.24 1.91
3 female 8.10 8.10
male 33.75 5.29
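The percentages above are shares of all 889 passengers. As a complementary view, the following is a minimal sketch that also computes the survival rate within each class-and-sex group. It rebuilds the counts by hand from the Q3a output rather than reading the CSV, and the names `counts`, `pct_total` and `pct_within` are illustrative, not part of the original analysis.

```r
# Counts copied by hand from the Q3a output; `counts` is an illustrative name.
# Arrays fill column-major: Pclass varies fastest, then Survived, then Sex.
counts <- as.table(array(
  c(3, 6, 72, 89, 70, 72,      # female: died (classes 1-3), survived (classes 1-3)
    77, 91, 300, 45, 17, 47),  # male:   died (classes 1-3), survived (classes 1-3)
  dim = c(3, 2, 2),
  dimnames = list(Pclass = c("1", "2", "3"),
                  Survived = c("0", "1"),
                  Sex = c("female", "male"))
))

# Share of all passengers, as in Q3b
pct_total <- round(prop.table(counts) * 100, 2)

# Survival rate within each class-and-sex group:
# margins 1 (Pclass) and 3 (Sex) are held fixed, so each group sums to 100
pct_within <- round(prop.table(counts, margin = c(1, 3)) * 100, 2)
ftable(pct_within, row.vars = c("Pclass", "Sex"))
```

Which of the two views fits depends on the question being asked: `pct_total` describes the composition of the whole passenger list, while `pct_within` compares survival chances between groups.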
Challenge Question C1: Visualize your table in Q3b, using a bar plot. (Hint: See this)
Titanicf<-subset(Titanic,Titanic$Sex=="female")
Titanicm<-subset(Titanic,Titanic$Sex=="male")
par(mfrow = c(1, 2))
barplot(xtabs(~Titanicm$Survived+Titanicm$Pclass),beside=TRUE,xlab = "Class",ylab = "No. of Passengers",legend.text = c("Died","Survived"),main = "Male Survival by Class")
barplot(xtabs(~Titanicf$Survived+Titanicf$Pclass),beside=TRUE,xlab = "Class",ylab = "No. of Passengers",legend.text = c("Died","Survived"),main = "Female Survival by Class")
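The two plots above show counts. Since the question asks for a plot of the Q3b table, one possible alternative is to plot the percentages directly. This is only a sketch: the values are typed in from the Q3b output, and the matrix name `pct` and the group labels are illustrative choices.

```r
# Percentages copied by hand from the Q3b output (share of all 889 passengers).
# With nrow = 2, each column holds (died, survived) for one class-and-sex group.
pct <- matrix(
  c(0.34, 10.01,   8.66, 5.06,    # class 1: female, male
    0.67, 7.87,   10.24, 1.91,    # class 2: female, male
    8.10, 8.10,   33.75, 5.29),   # class 3: female, male
  nrow = 2,
  dimnames = list(Survival = c("Died", "Survived"),
                  Group = c("1 F", "1 M", "2 F", "2 M", "3 F", "3 M"))
)

par(mfrow = c(1, 1))  # reset any earlier multi-panel layout
barplot(pct, beside = TRUE, legend.text = TRUE,
        xlab = "Class and sex", ylab = "% of all passengers",
        main = "Survival by Class and Sex (% of all passengers)")
```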
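Q3c. How many females traveling in first class survived the sinking of the Titanic?
nrow(subset(Titanic,Titanic$Survived=="1" & Titanic$Pclass=="1"&Titanic$Sex=="female"))
[1] 89
As a percentage of all first-class female passengers:
nrow(subset(Titanic,Titanic$Survived=="1" & Titanic$Pclass=="1"&Titanic$Sex=="female"))*100/nrow(subset(Titanic, Titanic$Pclass=="1"&Titanic$Sex=="female"))
[1] 96.73913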
Q3d. What was the percentage of survivors who were female?
b<-nrow(subset(Titanic,Titanic$Survived=="1" & Titanic$Sex=="female"))*100/nrow(subset(Titanic, Titanic$Survived=="1"))
b
[1] 67.94118
Challenge Question C2: Visualize your answer in Q3d using a Pie-chart.
pie(c(b,100-b),labels = c("Female","Male"),main="% of Female Survivors")
Q3e. What was the percentage of females on board the Titanic who survived?
a<-nrow(subset(Titanic,Titanic$Survived=="1" & Titanic$Sex=="female"))*100/nrow(subset(Titanic, Titanic$Sex=="female"))
a
[1] 74.03846
Challenge Question C3:
Visualize your answer in Q3e using a Pie-chart.
pie(c(a,100-a),labels = c("Survived","Not Survived"),main="% of Females who Survived")
Q4a. Use a Pearson's Chi-squared test to evaluate whether the proportion of females who survived was larger than the proportion of males who survived?
Q4b. What is the p-value of the previous Pearson's Chi-squared test?
surv<-xtabs(~Titanic$Survived+Titanic$Sex)
chisq.test(surv)
Pearson's Chi-squared test with Yates' continuity correction
data: surv
X-squared = 258.43, df = 1, p-value < 2.2e-16
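The chi-squared test above checks for an association between sex and survival but does not by itself state a direction. A minimal sketch of a one-sided comparison of the two proportions with `prop.test` follows. The counts are summed by hand from the Q3a table (231 of 312 females and 109 of 577 males survived), and the names `survived`, `total` and `res` are illustrative.

```r
# Survivors and group sizes by sex, summed from the Q3a output
survived <- c(female = 231, male = 109)
total    <- c(female = 312, male = 577)

# One-sided test: is the female survival proportion greater than the male one?
res <- prop.test(x = survived, n = total, alternative = "greater")

res$estimate  # sample survival proportion in each group
res$p.value   # one-sided p-value
```

With two groups, `prop.test` applies the same continuity correction by default as `chisq.test`; the `alternative` argument is what makes the comparison directional.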
Challenge Question C4: Create a Mosaic Plot of Titanic survivors and nonsurvivors based on gender (male/female), passenger class (First/Second/Third).
mosaicplot(xtabs(~Titanic$Sex+Titanic$Pclass+Titanic$Survived),main="Mosaic Plot of Titanic Passengers",color=TRUE)
Q5a. Create a one-way contingency table showing the average age of the survivors and the average age of those who died
aggregate(Titanic$Age~Titanic$Survived,FUN=mean)
Titanic$Survived Titanic$Age
1 0 30.41530
2 1 28.42382
Q5b. Create two boxplots, placed side-by-side, to visualize the distribution of the age of the survivors and the age of those who died
boxplot(Titanic$Age~Titanic$Survived,horizontal = TRUE, col= c("powder blue", "misty rose"), main="Age of Survivors & Non-Survivors", xlab="Age", ylab="Survival")
Q5c. Run a t-test, comparing the average age of the survivors with the average age of those who died when the Titanic sank. (Hint: See Kabacoff's sample code on running t-tests)
t.test(Titanic$Age~Titanic$Survived)
Welch Two Sample t-test
data: Titanic$Age by Titanic$Survived
t = 2.1816, df = 667.56, p-value = 0.02949
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.1990628 3.7838912
sample estimates:
mean in group 0 mean in group 1
30.41530 28.42382