Titanic

Ramya
27th September 2017

First Question (1/2)

Q1a. How many passengers were on board the Titanic?

Titanic <- read.csv("Titanic Data.csv")
dim(Titanic)
[1] 889   8
nrow(Titanic)
[1] 889

Q1b. How many passengers survived the sinking of the Titanic?

nrow(subset(Titanic,Titanic$Survived==1))
[1] 340

First Question (2/2)

Q1c. Create a one-way contingency table summarizing the Titanic passengers based on how many survived and how many died.

table(Titanic$Survived,dnn=c("Survival"))
Survival
  0   1 
549 340 

Q1d. What was the percentage of passengers who survived the sinking of the Titanic?

prop.table(table(Titanic$Survived,dnn=c("Survival")))*100
Survival
       0        1 
61.75478 38.24522 

Second Question (1/3)

Q2a. Create a two-way contingency table characterising the passengers based on survival and passenger class.

xtabs(~Titanic$Pclass+Titanic$Survived)   
              Titanic$Survived
Titanic$Pclass   0   1
             1  80 134
             2  97  87
             3 372 119

Second Question (2/3)

Q2b. Visualize your table using a Bar plot.

barplot(xtabs(~Titanic$Survived+Titanic$Pclass),beside=TRUE,xlab = "Class",ylab = "No. of Passengers",legend.text = c("Died","Survived"),main = "Survival by Passenger Class")

[Figure: grouped bar plot, "Survival by Passenger Class"]

Second Question (3/3)

Q2c. How many first-class passengers survived the sinking of the Titanic?

nrow(subset(Titanic,Titanic$Survived=="1" & Titanic$Pclass=="1"))
[1] 134

Q2d. What was the percentage of first-class passengers who survived the sinking of the Titanic?

prop.table(xtabs(~Titanic$Pclass+Titanic$Survived), 1)*100  
              Titanic$Survived
Titanic$Pclass        0        1
             1 37.38318 62.61682
             2 52.71739 47.28261
             3 75.76375 24.23625
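
Q2d asks for a single figure, and the row-percentage table above already contains it. As a minimal sketch, the table can be rebuilt from the counts printed in Q2a (so it runs without the CSV file) and the first-class survival cell picked out by name. The names `counts`, `row_pct` and `first_class_survival` are illustrative and not part of the original analysis.

```r
# Counts copied from the Q2a table (rows: passenger class, columns: survival).
counts <- matrix(c(80, 134,
                   97,  87,
                   372, 119),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Pclass = c("1", "2", "3"),
                                 Survived = c("0", "1")))

# Row percentages: the two cells of each class sum to 100.
row_pct <- prop.table(counts, margin = 1) * 100

# First class (row "1"), survivors (column "1").
first_class_survival <- row_pct["1", "1"]
round(first_class_survival, 2)
```

Indexing by dimension name rather than by position keeps the lookup correct even if the row or column order changes.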

Third Question (1/7)

Q3a. Create a three-way contingency table showing the number of passengers based on the passenger class, gender and survival.

xtabs(~Titanic$Pclass+Titanic$Survived+Titanic$Sex) 
, , Titanic$Sex = female

              Titanic$Survived
Titanic$Pclass   0   1
             1   3  89
             2   6  70
             3  72  72

, , Titanic$Sex = male

              Titanic$Survived
Titanic$Pclass   0   1
             1  77  45
             2  91  17
             3 300  47

Third Question (2/7)

Q3b. Express Q3a in percentages, displaying answers up to two decimal places.

round(prop.table(ftable(Titanic$Pclass,Titanic$Sex,Titanic$Survived))*100,digits = 2)
              0     1

1 female   0.34 10.01
  male     8.66  5.06
2 female   0.67  7.87
  male    10.24  1.91
3 female   8.10  8.10
  male    33.75  5.29
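
The percentages above are shares of all 889 passengers. If survival rates within each class and sex combination are wanted instead, one possible approach is to pass a `margin` to `prop.table`. The sketch below rebuilds the three-way table from the counts printed in Q3a; the names `counts` and `within_group` are illustrative.

```r
# Counts copied from the Q3a output, filled in the order class, survival, sex.
counts <- as.table(array(
  c(3, 6, 72,  89, 70, 72,     # female: died (classes 1-3), survived (classes 1-3)
    77, 91, 300,  45, 17, 47), # male:   died (classes 1-3), survived (classes 1-3)
  dim = c(3, 2, 2),
  dimnames = list(Pclass = c("1", "2", "3"),
                  Survived = c("0", "1"),
                  Sex = c("female", "male"))))

# Percentages within each class/sex combination (margins 1 and 3),
# so the died and survived cells of every combination sum to 100.
within_group <- prop.table(counts, margin = c(1, 3)) * 100

# Flat layout with class and sex as rows and survival as columns, as in Q3b.
ftable(round(within_group, 2), row.vars = c("Pclass", "Sex"))
```

Read this way, each row gives the survival rate for one group, for example first-class females.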

Third Question (3/7)

Challenge Question C1: Visualize your table in Q3b, using a bar plot. (Hint: See this)

Titanicf<-subset(Titanic,Titanic$Sex=="female")
Titanicm<-subset(Titanic,Titanic$Sex=="male")
par(mfrow = c(1, 2))  # two plots side by side
barplot(xtabs(~Titanicm$Survived+Titanicm$Pclass),beside=TRUE,xlab = "Class",ylab = "No. of Passengers",legend.text = c("Died","Survived"),main = "Male Survivors by Class")
barplot(xtabs(~Titanicf$Survived+Titanicf$Pclass),beside=TRUE,xlab = "Class",ylab = "No. of Passengers",legend.text = c("Died","Survived"),main = "Female Survivors by Class")

[Figure: two grouped bar plots side by side, "Male Survivors by Class" and "Female Survivors by Class"]

Third Question (4/7)

Q3c. How many Females traveling by First-Class survived the sinking of the Titanic?

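nrow(subset(Titanic, Survived == 1 & Pclass == 1 & Sex == "female"))  # count asked for in Q3c
[1] 89
nrow(subset(Titanic, Survived == 1 & Pclass == 1 & Sex == "female")) * 100 / nrow(subset(Titanic, Pclass == 1 & Sex == "female"))  # same group as a percentage of first-class females
[1] 96.73913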

Q3d. What was the percentage of survivors who were female?

b<-nrow(subset(Titanic,Titanic$Survived=="1" & Titanic$Sex=="female"))*100/nrow(subset(Titanic, Titanic$Survived=="1"))
b
[1] 67.94118

Third Question (5/7)

Challenge Question C2: Visualize your answer in Q3d using a Pie-chart.

pie(c(b,100-b),labels = c("Female","Male"),main="% of Female Survivors")

[Figure: pie chart, "% of Female Survivors"]

Third Question (6/7)

Q3e. What was the percentage of females on board the Titanic who survived?

a<-nrow(subset(Titanic,Titanic$Survived=="1" & Titanic$Sex=="female"))*100/nrow(subset(Titanic, Titanic$Sex=="female"))
a
[1] 74.03846
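
Q3d and Q3e share the same numerator (females who survived) and differ only in the denominator. A minimal sketch of that distinction, using the sex by survival counts obtained by summing the Q3a table over passenger class; the names `counts`, `pct_of_survivors` and `pct_of_sex` are illustrative.

```r
# Sex x survival counts, summed over passenger class from the Q3a table.
counts <- as.table(matrix(c(81, 231,
                            468, 109),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(Sex = c("female", "male"),
                                          Survived = c("0", "1"))))

# Q3d: share of survivors who were female (column percentages).
pct_of_survivors <- prop.table(counts, margin = 2) * 100
pct_of_survivors["female", "1"]

# Q3e: share of females who survived (row percentages).
pct_of_sex <- prop.table(counts, margin = 1) * 100
pct_of_sex["female", "1"]
```

The `margin` argument selects the denominator: `2` divides by column totals (all survivors), `1` by row totals (all females).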

Third Question (7/7)

Challenge Question C3: Visualize your answer in Q3e using a Pie-chart.

pie(c(a,100-a),labels = c("Survived","Not Survived"),main="% of Females who Survived")

[Figure: pie chart, "% of Females who Survived"]

Fourth Question (1/2)

Q4a. Use a Pearson's Chi-squared test to evaluate whether the proportion of females who survived was larger than the proportion of males who survived.

Q4b. What is the p-value of the previous Pearson's Chi-squared test?

surv<-xtabs(~Titanic$Survived+Titanic$Sex)
chisq.test(surv)

    Pearson's Chi-squared test with Yates' continuity correction

data:  surv
X-squared = 258.43, df = 1, p-value < 2.2e-16
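
Two details of this test may be worth checking. For a 2x2 table, `chisq.test` applies Yates' continuity correction by default, and the Chi-squared test itself is not directional, whereas Q4a asks whether the female proportion was larger. The sketch below rebuilds the table from the counts in Q3a and shows, as one possible approach, the uncorrected statistic and a one-sided `prop.test`. The names `corrected`, `uncorrected` and `one_sided` are illustrative.

```r
# Survival x sex counts, summed over passenger class from the Q3a table.
surv <- as.table(matrix(c(81, 468,
                          231, 109),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(Survived = c("0", "1"),
                                        Sex = c("female", "male"))))

# Default for 2x2 tables: Yates' continuity correction, as in the output above.
corrected <- chisq.test(surv)

# Plain Pearson statistic, without the continuity correction.
uncorrected <- chisq.test(surv, correct = FALSE)

# One-sided test: is the female survival proportion greater than the male one?
# x = survivors per sex, n = passengers per sex (female first, then male).
one_sided <- prop.test(x = c(231, 109), n = c(312, 577),
                       alternative = "greater")

corrected$statistic
uncorrected$statistic
one_sided$p.value
```

With `alternative = "greater"`, `prop.test` tests whether the first group's proportion exceeds the second's, which matches the wording of Q4a.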

Fourth Question (2/2)

Challenge Question C4: Create a Mosaic Plot of Titanic survivors and nonsurvivors based on gender (male/female) and passenger class (First/Second/Third).

mosaicplot(xtabs(~Titanic$Sex+Titanic$Pclass+Titanic$Survived),main="Mosaic Plot of Titanic Passengers",color=TRUE)

[Figure: mosaic plot, "Mosaic Plot of Titanic Passengers"]

Fifth Question (1/3)

Q5a. Create a one-way contingency table showing the average age of the survivors and the average age of those who died.

aggregate(Titanic$Age~Titanic$Survived,FUN=mean)
  Titanic$Survived Titanic$Age
1                0    30.41530
2                1    28.42382
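
If `Age` contains missing values, the formula interface of `aggregate` drops those rows before averaging, while `tapply` needs `na.rm = TRUE` to do the same. The sketch below illustrates this on a small, hypothetical data frame (`toy` is made up for illustration and is not the Titanic file).

```r
# Hypothetical toy data, not the Titanic file: one age is missing.
toy <- data.frame(Survived = c(0, 0, 1, 1, 1),
                  Age      = c(40, 20, 30, NA, 10))

# The formula interface removes rows with a missing Age before averaging.
# Passing data = toy also gives plain column names in the result.
by_formula <- aggregate(Age ~ Survived, data = toy, FUN = mean)

# tapply keeps missing values, so na.rm = TRUE has to be passed explicitly.
by_tapply <- tapply(toy$Age, toy$Survived, mean, na.rm = TRUE)

by_formula
by_tapply
```

Both calls return the same group means; without `na.rm = TRUE`, the `tapply` mean for the group containing the missing age would be `NA`.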

Fifth Question (2/3)

Q5b. Create two boxplots, placed side-by-side, to visualize the distribution of the age of the survivors and the age of those who died.

boxplot(Titanic$Age~Titanic$Survived,horizontal = TRUE, col= c("powder blue", "misty rose"), main="Age of Survivors & Non-Survivors", xlab="Age", ylab="Survival")

[Figure: horizontal boxplots of age by survival]

Fifth Question (3/3)

Q5c. Run a t-test, comparing the average age of the survivors with the average age of those who died when the Titanic sank. (Hint: See Kabacoff's sample code on running t-tests)

t.test(Titanic$Age~Titanic$Survived)

    Welch Two Sample t-test

data:  Titanic$Age by Titanic$Survived
t = 2.1816, df = 667.56, p-value = 0.02949
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1990628 3.7838912
sample estimates:
mean in group 0 mean in group 1 
       30.41530        28.42382