Titanic assignment final

Elena Giasi
25 Sept 2017

Q1a. How many passengers were on board the Titanic?

Titanic <- read.csv("TitanicData.csv")
# Each row is one passenger, so the row count is the passenger count
nrow(Titanic)
[1] 889

Q1b. How many passengers survived the sinking of the Titanic?

survived <- subset(Titanic, Survived == 1)
nrow(survived)  # one row per surviving passenger
[1] 340

Q1c. Create a one-way contingency table summarizing the Titanic passengers based on how many survived and how many died.

Titanic$Survived<-factor(Titanic$Survived,levels = c(0,1),labels =c("No","Yes") )
library(vcd)
mytable<-with(Titanic,table(Survived))
mytable
Survived
 No Yes 
549 340 
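
One consequence of relabelling `Survived` as a factor is that later filters have to use the labels "No" and "Yes" rather than the original codes 0 and 1. The sketch below illustrates this on a small made-up data frame (`toy` is hypothetical and is not the Titanic data).

```r
# Hypothetical toy data standing in for the Survived column
toy <- data.frame(Survived = c(0, 1, 1, 0, 1))
toy$Survived <- factor(toy$Survived, levels = c(0, 1), labels = c("No", "Yes"))

# After relabelling, the values are the labels, so the old codes match nothing
n_old_code <- nrow(subset(toy, Survived == 0))
n_label    <- nrow(subset(toy, Survived == "No"))

n_old_code  # 0 rows
n_label     # 2 rows
```

A subset that comes back empty this way gives `NaN` for a mean and too few observations for a t-test.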


Q1d. What was the percentage of passengers who survived the sinking of the Titanic?

prop.table(mytable)*100
Survived
      No      Yes 
61.75478 38.24522 


Q2a. Create a two-way contingency table characterising the passengers based on survival and based on the passenger class.

mytable2<-xtabs(~Titanic$Pclass+Titanic$Survived,data=Titanic)
mytable2
              Titanic$Survived
Titanic$Pclass  No Yes
             1  80 134
             2  97  87
             3 372 119
(addmargins(mytable2))
              Titanic$Survived
Titanic$Pclass  No Yes Sum
           1    80 134 214
           2    97  87 184
           3   372 119 491
           Sum 549 340 889


Q2b. Visualize your table using a Bar plot

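# Transpose so that each passenger class is one group of bars and the two colours mark No/Yes
barplot(t(mytable2), main = "Survival by Passenger Class",
        xlab = "Passenger class", ylab = "Frequency",
        col = c("grey", "black"), legend.text = colnames(mytable2), beside = TRUE)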

![plot of chunk unnamed-chunk-6](ffff-figure/unnamed-chunk-6-1.png)

Q2c. How many first-class passengers survived the sinking of the Titanic?

survived1 <- subset(Titanic, Survived == "Yes" & Pclass == 1)
nrow(survived1)  # first-class passengers who survived
[1] 134


Q2d. What was the percentage of first-class passengers who survived the sinking of the Titanic?

# margin = 1 gives row percentages: the share of each class that died or survived
prop.table(mytable2, 1) * 100
              Titanic$Survived
Titanic$Pclass       No      Yes
             1 37.38318 62.61682
             2 52.71739 47.28261
             3 75.76375 24.23625


Q3a. Create a three-way contingency table showing the number of passengers based on the passenger's class, gender and survival.

mytable3<-xtabs(~Survived+Pclass+Sex, data = Titanic)
mytable3
, , Sex = female

        Pclass
Survived   1   2   3
     No    3   6  72
     Yes  89  70  72

, , Sex = male

        Pclass
Survived   1   2   3
     No   77  91 300
     Yes  45  17  47
ftable(addmargins(mytable3))
                Sex female male Sum
Survived Pclass                    
No       1               3   77  80
         2               6   91  97
         3              72  300 372
         Sum            81  468 549
Yes      1              89   45 134
         2              70   17  87
         3              72   47 119
         Sum           231  109 340
Sum      1              92  122 214
         2              76  108 184
         3             144  347 491
         Sum           312  577 889


Q3b. Express Q3a. in percentages, displaying answers up to two decimal places.

# round(..., 2) limits the percentages to two decimal places
ftable(round(addmargins(prop.table(mytable3, c(1, 2)), 3) * 100, 2))
                Sex female   male    Sum
Survived Pclass                         
No       1            3.75  96.25 100.00
         2            6.19  93.81 100.00
         3           19.35  80.65 100.00
Yes      1           66.42  33.58 100.00
         2           80.46  19.54 100.00
         3           60.50  39.50 100.00


Challenge Question C1: Visualize your table in Q3b, using a bar plot. (Hint: See this)

par(mfrow = c(1, 2))  # two plots side by side
# Use a new name so the Q2a table in mytable2 is not overwritten
mytable_male <- xtabs(~ Survived + Pclass, data = Titanic, subset = Sex == "male")
mytable_male
        Pclass
Survived   1   2   3
     No   77  91 300
     Yes  45  17  47
barplot(mytable_male, main = "Males",
        ylab = "Nr of passengers",
        col = c("gray", "white"), legend = rownames(mytable_male), beside = TRUE)


mytable4<- xtabs(~Survived+Pclass,data=Titanic,Titanic$Sex=="female")
mytable4
        Pclass
Survived  1  2  3
     No   3  6 72
     Yes 89 70 72
barplot(mytable4,main = "Females",
        ylab = "Nr of passengers",
        col = c("gray", "white"),legend=rownames(mytable4),beside = TRUE)

![plot of chunk unnamed-chunk-11](ffff-figure/unnamed-chunk-11-1.png)

Q3c. How many Females traveling by First-Class survived the sinking of the Titanic?

mytable4
        Pclass
Survived  1  2  3
     No   3  6 72
     Yes 89 70 72
# Row "Yes", column "1": 89 females travelling first class survived


Q3d. What was the percentage of survivors who were female? (Hint: Q3c. and Q3d. are not identical)

Visualize your answer in Q3d using a Pie-chart.

Q3e. What was the percentage of females on board the Titanic who survived?
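
No code is shown for Q3d and Q3e. A minimal sketch of one way to get both percentages, assuming the counts from the survival-by-sex tables in this report (231 of 340 survivors were female; 231 of 312 females survived). The object names `counts`, `pct_by_survival` and `pct_by_sex` are illustrative and not part of the original work.

```r
# Counts copied from the survival-by-sex tables in this report
counts <- matrix(c(81, 231, 468, 109), nrow = 2,
                 dimnames = list(Survived = c("No", "Yes"),
                                 Sex = c("female", "male")))

# Q3d: of all survivors, what share was female? (row percentages)
pct_by_survival <- prop.table(counts, margin = 1) * 100
q3d <- round(pct_by_survival["Yes", "female"], 2)   # 231 / 340
q3d  # 67.94

# Q3e: of all females, what share survived? (column percentages)
pct_by_sex <- prop.table(counts, margin = 2) * 100
q3e <- round(pct_by_sex["Yes", "female"], 2)        # 231 / 312
q3e  # 74.04

# Pie chart of survivors by sex, labelled with the Q3d percentages
pie(counts["Yes", ],
    labels = sprintf("%s (%.1f%%)", colnames(counts), pct_by_survival["Yes", ]),
    main = "Survivors by sex")
```

The two answers differ only in the denominator: all survivors for Q3d, all females for Q3e.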

Q4a. Use a Pearson's Chi-squared test to evaluate whether the proportion of females who survived was larger than the proportion of males who survived?

survivors<-xtabs(~Titanic$Survived+Titanic$Sex,data = Titanic)
survivors
                Titanic$Sex
Titanic$Survived female male
             No      81  468
             Yes    231  109
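
Pearson's chi-squared test on this table checks for an association between sex and survival but does not say which group had the larger proportion. A hedged sketch of a directional check, assuming the counts printed above and using `prop.test` with a one-sided alternative (the names `survived_n`, `total_n` and `res` are illustrative):

```r
# Counts taken from the survival-by-sex table above
survived_n <- c(female = 231, male = 109)   # survivors in each group
total_n    <- c(female = 312, male = 577)   # 81 + 231 females, 468 + 109 males

# H1: the female survival proportion is greater than the male one
res <- prop.test(x = survived_n, n = total_n, alternative = "greater")
res$estimate   # the two sample proportions
res$p.value    # one-sided p-value
```

For a 2x2 table the two-sided version of this test is equivalent to the chi-squared test, so the sketch complements `chisq.test` rather than replacing it.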


Q4b. What is the p-value of the previous Pearson's Chi-squared test?

chisq.test(survivors)$p.value
[1] 3.77991e-58


Challenge Question C4: Learn about Mosaic Plot by browsing this link. Create a Mosaic Plot of Titanic survivors and nonsurvivors based on gender (male/female), passenger class (First/Second/Third).

tab <- xtabs(~ Survived + Pclass + Sex, data = Titanic)
mosaicplot(tab, main = "Mosaic Plot Survivors", color = c("green", "white", "red"))


Q5a. Create a one-way contingency table showing the average age of the survivors and the average age of those who died.

# Survived was relabelled "No"/"Yes" in Q1c, so filter on the label rather than on 0
dead <- subset(Titanic, Survived == "No")
mean(survived$Age)
[1] 28.42382
# Named avg_age so that base R's matrix() is not masked
avg_age <- matrix(c(mean(dead$Age), mean(survived$Age)),
                  dimnames = list(c("Died", "Survived"), "Average"))
avg_age


Q5b. Create two boxplots, placed side-by-side, to visualize the distribution of the age of the survivors and the age of those who died. Hint: Your box plot could be analogous to the following example:

# Age ~ Survived puts the survival groups on the x-axis and age on the y-axis
boxplot(Age ~ Survived, data = Titanic,
        main = "Age of survivors and non-survivors",
        xlab = "Survived",
        ylab = "Age")

![plot of chunk unnamed-chunk-20](ffff-figure/unnamed-chunk-20-1.png)

Q5c. Run a t-test, comparing the average age of the survivors with the average age of those who died when the Titanic sank. (Hint: See Kabacoff's sample code on running t-tests)

Error in t.test.default(dead[, 4], survived[, 4]) : 
  not enough 'x' observations
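
`t.test` raises this error when its first sample has fewer than two values, for instance when `dead` is built by comparing the relabelled `Survived` factor with 0 and so has no rows. A minimal sketch of the test using the formula interface, run on made-up data (`toy` and its ages are hypothetical and are not the Titanic values):

```r
set.seed(1)  # reproducible toy data

# Hypothetical ages; these are NOT the Titanic values
toy <- data.frame(
  Survived = factor(rep(c("No", "Yes"), times = c(30, 20))),
  Age      = c(rnorm(30, mean = 31, sd = 12), rnorm(20, mean = 28, sd = 12))
)

# Check group sizes first: each group needs at least two observations
group_sizes <- table(toy$Survived)
group_sizes

# Welch two-sample t-test of Age between the two survival groups
tt <- t.test(Age ~ Survived, data = toy)
tt$estimate  # mean age in each group
tt$p.value
```

On the real data the same call would be `t.test(Age ~ Survived, data = Titanic)`, assuming `Age` has no missing values and `Survived` has exactly two levels.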