Elena Giasi
25 Sept 2017
Q1a. How many passengers were on board the Titanic?
Titanic<-read.csv("TitanicData.csv")
View(Titanic)
man=subset(Titanic,Titanic$Survived == 1)
View(man)
dim(Titanic)[1]
[1] 889
nrow(Titanic)
[1] 889
Q1b. How many passengers survived the sinking of the Titanic?
survived=subset(Titanic,Titanic$Survived==1)
dim(survived)
[1] 340 8
Q1c. Create a one-way contingency table summarizing the Titanic passengers based on how many survived and how many died.
Titanic$Survived<-factor(Titanic$Survived,levels = c(0,1),labels =c("No","Yes") )
library(vcd)
mytable<-with(Titanic,table(Survived))
mytable
Survived
 No Yes
549 340
Q1d. What was the percentage of passengers who survived the sinking of the Titanic?
prop.table(mytable)*100
Survived
      No      Yes
61.75478 38.24522
Q2a. Create a two-way contingency table characterising the passengers based on survival and based on the passenger class.
mytable2<-xtabs(~Titanic$Pclass+Titanic$Survived,data=Titanic)
mytable2
              Titanic$Survived
Titanic$Pclass  No Yes
             1  80 134
             2  97  87
             3 372 119
(addmargins(mytable2))
              Titanic$Survived
Titanic$Pclass  No Yes Sum
           1    80 134 214
           2    97  87 184
           3   372 119 491
           Sum 549 340 889
Q2b. Visualize your table using a Bar plot
# Transpose so the bars are grouped by passenger class (x-axis) and split by survival (legend).
barplot(t(mytable2),main = "Survival by Passenger Class", xlab="Passenger class", ylab = "Frequency", col = c("grey","black"),legend=colnames(mytable2),beside = TRUE)
Q2c. How many first-class passengers survived the sinking of the Titanic?
View(Titanic)
qwe= subset(Titanic,Titanic$Survived=="Yes")
View(qwe)
nrow(qwe)
[1] 340
survived1=subset(Titanic, Titanic$Survived=="Yes" & Titanic$Pclass==1)
View(survived1)
dim(survived1)[1]
[1] 134
nrow(survived1)
[1] 134
Q2d. What was the percentage of first-class passengers who survived the sinking of the Titanic?
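prop.table(mytable2)
              Titanic$Survived
Titanic$Pclass         No        Yes
             1 0.08998875 0.15073116
             2 0.10911136 0.09786277
             3 0.41844769 0.13385827
# The table above gives each cell as a share of all 889 passengers. The share of first-class passengers who survived is the row proportion 134/214 = 62.62%, which is row 1 of prop.table(mytable2,1)*100.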
Q3a. Create a three-way contingency table showing the number of passengers based on the passenger's class, gender and survival.
mytable3<-xtabs(~Survived+Pclass+Sex, data = Titanic)
mytable3
, , Sex = female

        Pclass
Survived   1   2   3
     No    3   6  72
     Yes  89  70  72

, , Sex = male

        Pclass
Survived   1   2   3
     No   77  91 300
     Yes  45  17  47
ftable(addmargins(mytable3))
                Sex female male Sum
Survived Pclass
No       1               3   77  80
         2               6   91  97
         3              72  300 372
         Sum            81  468 549
Yes      1              89   45 134
         2              70   17  87
         3              72   47 119
         Sum           231  109 340
Sum      1              92  122 214
         2              76  108 184
         3             144  347 491
         Sum           312  577 889
Q3b. Express Q3a. in percentages, displaying answers up to two decimal places.
# round(...,2) limits the percentages to two decimal places, as the question asks.
ftable(round(addmargins(prop.table(mytable3,c(1,2)),3)*100,2))
                Sex female  male    Sum
Survived Pclass
No       1            3.75 96.25 100.00
         2            6.19 93.81 100.00
         3           19.35 80.65 100.00
Yes      1           66.42 33.58 100.00
         2           80.46 19.54 100.00
         3           60.50 39.50 100.00
Challenge Question C1: Visualize your table in Q3b, using a bar plot. (Hint: See this)
par(mfrow=c(1,2))
# A separate name keeps the class-by-survival table mytable2 from Q2a intact.
maletable<-xtabs(~Survived+Pclass,data=Titanic,subset=Sex=="male")
maletable
        Pclass
Survived   1   2   3
     No   77  91 300
     Yes  45  17  47
barplot(maletable,main = "Males",
xlab = "Passenger class",
ylab = "Nr of passengers",
col = c("gray", "white"),legend=rownames(maletable),beside = TRUE)
mytable4<-xtabs(~Survived+Pclass,data=Titanic,subset=Sex=="female")
mytable4
        Pclass
Survived   1   2   3
     No    3   6  72
     Yes  89  70  72
barplot(mytable4,main = "Females",
xlab = "Passenger class",
ylab = "Nr of passengers",
col = c("gray", "white"),legend=rownames(mytable4),beside = TRUE)
Q3c. How many Females traveling by First-Class survived the sinking of the Titanic?
mytable4
        Pclass
Survived   1   2   3
     No    3   6  72
     Yes  89  70  72
# Row "Yes", column 1: 89 females traveling first class survived.
View(mytable4)
Q3d. What was the percentage of survivors who were female? (Hint: Q3c. and Q3d. are not identical)
Visualize your answer in Q3d using a Pie-chart.
Q3e. What was the percentage of females on board the Titanic who survived?
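Q3d and Q3e are left without code above. A minimal sketch of one way to answer them, assuming only the female/male survival counts already printed in the Q3a table (the object names counts, by_survival and by_sex are made up for this illustration):

```r
# Counts copied from the Q3a output: rows are survival, columns are sex.
counts <- as.table(matrix(c(81, 231, 468, 109), nrow = 2,
                          dimnames = list(Survived = c("No", "Yes"),
                                          Sex = c("female", "male"))))

# Q3d: share of each sex among survivors (row percentages, margin = 1).
by_survival <- prop.table(counts, margin = 1) * 100
pct_survivors_female <- by_survival["Yes", "female"]

# Q3e: share of each outcome among females (column percentages, margin = 2).
by_sex <- prop.table(counts, margin = 2) * 100
pct_females_survived <- by_sex["Yes", "female"]

# 231/340 = 67.94 for Q3d and 231/312 = 74.04 for Q3e.
round(c(Q3d = pct_survivors_female, Q3e = pct_females_survived), 2)

# Pie chart for Q3d: survivors split by sex.
pie(counts["Yes", ],
    labels = paste0(colnames(counts), " ", round(by_survival["Yes", ], 1), "%"),
    main = "Survivors by sex")
```

The two answers differ only in the margin: Q3d divides by the number of survivors (340), Q3e by the number of females (312).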
Q4a. Use a Pearson's Chi-squared test to evaluate whether the proportion of females who survived was larger than the proportion of males who survived.
survivors<-xtabs(~Titanic$Survived+Titanic$Sex,data = Titanic)
survivors
                Titanic$Sex
Titanic$Survived female male
             No      81  468
             Yes    231  109
Q4b. What is the p-value of the previous Pearson's Chi-squared test?
chisq.test(survivors)[3]
$p.value
[1] 3.77991e-58
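chisq.test() checks for any association between sex and survival, whereas Q4a asks about a direction ("larger than"). A hedged sketch of a one-sided alternative using prop.test(), built from the counts in the Q4a table; the vector names survived_n and total_n are assumptions made for this example, not part of the original script:

```r
# Counts copied from the Q4a table.
survived_n <- c(female = 231, male = 109)  # survivors per sex
total_n    <- c(female = 312, male = 577)  # passengers per sex

# H1: the survival proportion of females is greater than that of males.
one_sided <- prop.test(x = survived_n, n = total_n, alternative = "greater")

one_sided$estimate  # sample proportions: female first, then male
one_sided$p.value   # one-sided p-value
```

For a 2x2 table prop.test() applies the same continuity correction as chisq.test() by default, so the two-sided versions of both tests agree.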
Challenge Question C4: Learn about Mosaic Plot by browsing this link. Create a Mosaic Plot of Titanic survivors and nonsurvivors based on gender (male/female), passenger class (First/Second/Third).
sex<-factor(Titanic$Sex,levels = c("female","male"),
labels = c("F","M"))
tab<-xtabs(~Survived+Pclass+Sex, data = Titanic)
mosaicplot(tab, main = "Mosaic Plot Survivors",color = c("green","white","red"))
Q5a. Create a one-way contingency table showing the average age of the survivors and the average age of those who died.
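# Survived has been a factor with levels "No"/"Yes" since Q1c, so filter on "No". The original filter Survived==0 matched no rows, which is why the recorded output below shows NaN for Dead.
dead<-subset(Titanic,Titanic$Survived=="No")
averagedead<-mean(dead$Age)
averagealived<-mean(survived$Age)
# Named agetable so that the base function matrix() is not shadowed.
agetable<-matrix(c(averagedead,averagealived))
colnames(agetable)<- "Average"
row.names(agetable)<-c("Dead","Alived")
agetable
       Average
Dead       NaN
Alived 28.42382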
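The two group means can also be computed in a single call without building separate subsets. A small sketch with tapply(), run on a made-up data frame (toy and its ages are invented for illustration and are not the Titanic data):

```r
# Hypothetical stand-in for the Titanic data frame; the ages are invented.
toy <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 1, 0), levels = c(0, 1),
                    labels = c("No", "Yes")),
  Age      = c(40, 22, 30, NA, 26, 50)
)

# Mean age per survival group; na.rm = TRUE skips missing ages.
avg_age <- tapply(toy$Age, toy$Survived, mean, na.rm = TRUE)
avg_age
```

On the real data the equivalent call would be tapply(Titanic$Age, Titanic$Survived, mean, na.rm = TRUE), assuming Age is numeric.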
Q5b Create two boxplots, placed side-by-side, to visualize the distribution of the age of the survivors and the age of those who died. Hint: Your box plot could be analogous to the following example:
boxplot(Age~Survived,data = Titanic,
main="Age of survivors and non-survivors",
xlab="Survived",
ylab="Age")
Q5c Run a t-test, comparing the average age of the survivors with the average age of those who died when the Titanic sank. (Hint: See Kabacoff's sample code on running t-tests)
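t.test(dead[,4],survived[,4])
Error in t.test.default(dead[, 4], survived[, 4]) :
not enough 'x' observations
# The error arises when dead has no rows, which is what subset(Titanic,Titanic$Survived==0) returns once Survived has been recoded to "No"/"Yes". t.test(Age~Survived,data=Titanic) compares the two groups directly, without the intermediate subsets.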
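A minimal sketch of the formula interface of t.test(), run on simulated ages because the real output is not available here. The data frame toy and every number in it are hypothetical, so the results say nothing about the Titanic passengers:

```r
set.seed(1)

# Simulated stand-in data: 20 ages per survival group (invented values).
toy <- data.frame(
  Survived = factor(rep(c("No", "Yes"), each = 20)),
  Age      = c(rnorm(20, mean = 31, sd = 8), rnorm(20, mean = 28, sd = 8))
)

# Welch two-sample t-test of Age between the two levels of Survived.
result <- t.test(Age ~ Survived, data = toy)
result$estimate  # mean age in each group
result$p.value

# An empty first sample reproduces the error message recorded above.
err <- tryCatch(t.test(numeric(0), toy$Age),
                error = function(e) conditionMessage(e))
err
```

The formula form needs a grouping variable with exactly two levels, which the recoded Survived factor provides.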