Class - 0 = crew, 1 = first class, 2 = second class, 3 = third class (the passenger classes reflect the quality and type of cabin on the Titanic)
Age - (1 = adult, 0 = child)
Sex - (1 = male, 0 = female)
Survive - (1 = yes, 0 = no)
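Because every variable is stored as an integer code, printed tables show only 0, 1, 2, 3. One way to make output easier to read is to convert the codes to labelled factors. The following is a minimal sketch on a hypothetical five-row data frame (`toy` and `labelled` are illustrative names, not part of the original analysis); the label text follows the coding listed above.

```r
# Hypothetical rows that use the same integer coding as the data set
toy <- data.frame(
  Class   = c(0, 1, 2, 3, 3),
  Age     = c(1, 1, 0, 1, 0),
  Sex     = c(1, 0, 1, 0, 0),
  Survive = c(0, 1, 1, 0, 1)
)

# Map each integer code to a readable label
labelled <- transform(
  toy,
  Class   = factor(Class, levels = 0:3,
                   labels = c("Crew", "First", "Second", "Third")),
  Age     = factor(Age, levels = 0:1, labels = c("Child", "Adult")),
  Sex     = factor(Sex, levels = 0:1, labels = c("Female", "Male")),
  Survive = factor(Survive, levels = 0:1, labels = c("No", "Yes"))
)

# Tables now print labels instead of bare codes
table(labelled$Class, labelled$Survive)
```

The same `factor()` calls could be applied to the full data frame after it is read in, assuming its columns use exactly these codes.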
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
titanic<-read.csv(url("http://www.personal.psu.edu/dlp/w540/datasets/titanicsurvival.csv"))
Titanic<-tbl_df(titanic)
Titanic
## Source: local data frame [2,201 x 4]
##
## Class Age Sex Survive
## 1 1 1 1 1
## 2 1 1 1 1
## 3 1 1 1 1
## 4 1 1 1 1
## 5 1 1 1 1
## 6 1 1 1 1
## 7 1 1 1 1
## 8 1 1 1 1
## 9 1 1 1 1
## 10 1 1 1 1
## .. ... ... ... ...
glimpse(Titanic)
## Observations: 2201
## Variables:
## $ Class (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ Age (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ Sex (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ Survive (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
count(Titanic) # total number of rows, one per person on board
## Source: local data frame [1 x 1]
##
## n
## 1 2201
survivedpassengers<-filter(titanic,Survive==1)
glimpse(survivedpassengers)
## Observations: 711
## Variables:
## $ Class (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ Age (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ Sex (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ Survive (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
prop.table(table(titanic$Survive))
##
## 0 1
## 0.676965 0.323035
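Since survival is coded 0/1, the overall survival rate is also just the mean of that column. The sketch below illustrates this without the download, assuming that the `Titanic` table shipped in R's `datasets` package describes the same 2,201 people; it is addressed as `datasets::Titanic` because the name `Titanic` is reused for the tibble above. The names `counts` and `survive` are illustrative.

```r
# Built-in 4-way contingency table, flattened to one row per cell
counts <- as.data.frame(datasets::Titanic)

# Expand the cell counts into one 0/1 survival indicator per person
survive <- rep(as.integer(counts$Survived == "Yes"), times = counts$Freq)

length(survive)  # number of people
mean(survive)    # share of 1s, i.e. the overall survival rate
```

With the downloaded data the equivalent call would be `mean(titanic$Survive)`.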
Crew: 24.0% (0.2395480)
First class: 62.5% (0.6246154)
Second class: 41.4% (0.4140351)
Third class: 25.2% (0.2521246)
prop.table(table(titanic$Class, titanic$Survive),1)
##
## 0 1
## 0 0.7604520 0.2395480
## 1 0.3753846 0.6246154
## 2 0.5859649 0.4140351
## 3 0.7478754 0.2521246
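The second argument of `prop.table()` decides what the proportions are relative to: `1` divides each cell by its row total, so every row (here, every class) sums to 1 and the `1` column reads as that class's survival rate. A small sketch of this, again assuming the built-in `datasets::Titanic` table matches the downloaded file (note that it labels the classes `1st`, `2nd`, `3rd`, `Crew` rather than 0 to 3):

```r
# Collapse the 4-way table to Class x Survived counts
by_class <- margin.table(datasets::Titanic, c(1, 4))

# Row-wise proportions: each class's row sums to 1
rates <- prop.table(by_class, 1)
rates

rowSums(rates)  # every entry is 1
```

Using `2` instead of `1` would answer a different question: of all survivors, what share came from each class.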
Female: 73.2% (0.7319149)
Male: 21.2% (0.2120162)
Which sex had the highest survival rate? Female
prop.table(table(titanic$Sex, titanic$Survive),1)
##
## 0 1
## 0 0.2680851 0.7319149
## 1 0.7879838 0.2120162
Child: 52.3% (0.5229358)
Adult: 31.3% (0.3126195)
Which age had the lowest survival rate? Adult
prop.table(table(titanic$Age, titanic$Survive),1)
##
## 0 1
## 0 0.4770642 0.5229358
## 1 0.6873805 0.3126195
Adult male: 20.3% (0.2027594)
Child male: 45.3% (0.453125)
Adult female: 74.4% (0.7435294)
Child female: 62.2% (0.6222222)
Which group was most likely to survive? Adult female
Least likely? Adult male
cf<-filter(titanic,Age==0, Sex==0) # Child females
cfs<-filter(titanic,Age==0, Sex==0, Survive==1) # Survived child females
cm<-filter(titanic,Age==0, Sex==1) # Child males
cms<-filter(titanic,Age==0, Sex==1, Survive==1) # Survived child males
am<-filter(titanic,Age==1, Sex==1) # Adult males
ams<-filter(titanic,Age==1, Sex==1, Survive==1) # Survived adult males
af<-filter(titanic,Age==1, Sex==0) # Adult females
afs<-filter(titanic,Age==1, Sex==0, Survive==1) # Survived adult females
nrow(cfs)/nrow(cf)
## [1] 0.6222222
nrow(afs)/nrow(af)
## [1] 0.7435294
nrow(ams)/nrow(am)
## [1] 0.2027594
nrow(cms)/nrow(cm)
## [1] 0.453125
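The eight `filter()` calls and four divisions can be collapsed into a single call that averages the survival indicator within each age/sex cell. This is one possible sketch, not the method used above; it assumes the built-in `datasets::Titanic` table matches the downloaded file, and `counts`, `idx`, `people`, and `rates` are illustrative names.

```r
# One row per person, rebuilt from the built-in contingency table
counts <- as.data.frame(datasets::Titanic)
idx    <- rep(seq_len(nrow(counts)), times = counts$Freq)
people <- counts[idx, c("Class", "Sex", "Age", "Survived")]

# Mean of a logical vector is the share of TRUEs, so this gives
# the survival rate for every Age x Sex combination at once
rates <- tapply(people$Survived == "Yes",
                list(Age = people$Age, Sex = people$Sex),
                mean)
round(rates, 4)
```

The result is a 2 x 2 matrix, so the most and least likely groups can be read off directly instead of comparing four separate outputs.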
Crew, Adult male: 22.3% (0.222738)
Crew, Child male: none (no children among the crew)
Crew, Adult female: 87.0% (0.869565)
Crew, Child female: none (no children among the crew)
First class, Adult male: 32.6% (0.325714)
First class, Child male: 100.0%
First class, Adult female: 97.2% (0.972222)
First class, Child female: 100.0%
Second class, Adult male: 8.3% (0.083333)
Second class, Child male: 100.0%
Second class, Adult female: 86.0% (0.860215)
Second class, Child female: 100.0%
Third class, Adult male: 16.2% (0.162338)
Third class, Child male: 27.1% (0.270833)
Third class, Adult female: 46.1% (0.460606)
Third class, Child female: 45.2% (0.451613)
Which group had the highest mortality in this disaster? Second class, adult males (only 8.3% survived)
Why? I think that the adult men would have sacrificed themselves in order to save the women and children.
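Category <- group_by(titanic, Class, Age, Sex) # one group per class/age/sex combination
Survived <- filter(Category, Survive == 1)     # survivors only, grouping retained
total <- summarise(Category, n = n())          # head count per group
Sur <- summarise(Survived, n = n())            # survivor count per group
# Row-wise division; valid only because every group has at least one survivor,
# so Sur and total list the same 14 groups in the same order.
proportion <- Sur[,4]/total[,4]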
summarise(Category)
## Source: local data frame [14 x 3]
## Groups: Class, Age
##
## Class Age Sex
## 1 0 1 0
## 2 0 1 1
## 3 1 0 0
## 4 1 0 1
## 5 1 1 0
## 6 1 1 1
## 7 2 0 0
## 8 2 0 1
## 9 2 1 0
## 10 2 1 1
## 11 3 0 0
## 12 3 0 1
## 13 3 1 0
## 14 3 1 1
proportion
## n
## 1 0.86956522
## 2 0.22273782
## 3 1.00000000
## 4 1.00000000
## 5 0.97222222
## 6 0.32571429
## 7 1.00000000
## 8 1.00000000
## 9 0.86021505
## 10 0.08333333
## 11 0.45161290
## 12 0.27083333
## 13 0.46060606
## 14 0.16233766
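Dividing one summary table by another works here because both tables happen to contain the same 14 groups in the same order. If some group had no survivors, it would be missing from the survivor counts and the rows would no longer line up. A sketch of an alternative that avoids this by averaging the 0/1 indicator within each group, keeping the group labels next to each rate; it uses base R's `aggregate()` on data rebuilt from the built-in `datasets::Titanic` table (assumed to match the downloaded file), and all object names are illustrative.

```r
# One row per person with a 0/1 survival indicator
counts <- as.data.frame(datasets::Titanic)
idx    <- rep(seq_len(nrow(counts)), times = counts$Freq)
people <- counts[idx, c("Class", "Age", "Sex")]
people$Survive <- as.integer(counts$Survived[idx] == "Yes")

# Survival rate per Class x Age x Sex group; combinations with
# no people (children among the crew) simply do not appear
rates <- aggregate(Survive ~ Class + Age + Sex, data = people, FUN = mean)

# Sort so the group with the highest mortality comes first
rates[order(rates$Survive), ]
```

With dplyr and the `Category` object defined above, the comparable one-liner would be `summarise(Category, rate = mean(Survive))`.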
The results of the analysis show that the overall survival rate in the Titanic disaster was 32.3%. More specifically, the survival rate was higher for females (73.2%) than for males (21.2%), and higher for children (52.3%) than for adults (31.3%). With regard to class, first class had the highest survival rate (62.5%). Within every class, adult males had the lowest survival rate. I think this is because they helped to rescue the women and children first. On the other hand, the survival rate of third class was relatively low (25.2%). Given that the survival rates of women (46.1%) and children (27.1% for boys, 45.2% for girls) in third class were also not very high, I think that the passengers in third class may have been rescued later than those in the other classes.