For this assignment, we loaded a file containing data about Titanic travelers from http://www.personal.psu.edu/dlp/w540/datasets/titanicsurvival.csv into an R dataset and analyzed it.
This dataset contains four variables (Class, Age, Sex, and Survive) and has no missing data.
The total number of passengers was calculated by using a simple “tally” of the observations in the dataset.
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(stats)
titanic <- read.csv(file = "http://www.personal.psu.edu/dlp/w540/datasets/titanicsurvival.csv", header = TRUE, sep = ",")
totalpassengers <- tally(titanic)
totalpassengers
## n
## 1 2201
The result shows us that there were 2,201 observations in the dataset, which, in this case, means 2,201 passenger and crew records. (It is interesting to note that this is lower than the 2,224 passengers and crew reported in Dr. Passmore’s assignment introduction.)
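The row count and the "no missing data" statement can both be checked with base R. The sketch below is only an illustration: `toy` is a made-up five-row data frame that borrows the column names used in this report, not the real data. Run against the real data, the same calls would take `titanic` instead.

```r
# Hypothetical stand-in for `titanic`; these five rows are invented for illustration.
toy <- data.frame(
  Class   = c(0, 1, 2, 3, 3),
  Age     = c(1, 1, 0, 1, 1),
  Sex     = c(1, 0, 1, 1, 0),
  Survive = c(0, 1, 1, 0, 1)
)

# Number of observations: nrow() counts rows, which is what tally() reports
# for an ungrouped data frame.
n_rows <- nrow(toy)

# Missing values per column; all zeros means the data is complete.
missing_per_column <- colSums(is.na(toy))

n_rows
missing_per_column
anyNA(toy)
```

If `anyNA()` returns `FALSE` and every entry of `missing_per_column` is zero, no filtering for missing values is needed before counting.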
To determine the overall survival rate, we must filter the dataset for survivors (Survive == 1), then calculate what percent the survivors represent of the total number of passengers. To do this, I used the following:
# Note: tbl_df() is deprecated in current dplyr releases; as_tibble() is its replacement.
titanic_df <- tbl_df(titanic)
survivors <- filter(titanic_df, Survive == 1)
totalsurvivors <- tally(survivors)
totalsurvivors
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 711
round((totalsurvivors/totalpassengers)*100)
## n
## 1 32
The result shows us that there were 711 survivors, which, when rounded, means that 32% of the ship’s passengers survived. (Again, it is interesting to note that this does not correspond with the information in Dr. Passmore’s assignment introduction, but since the number of observations in this dataset was lower than his reported 2,224, it makes sense that the number of survivors indicated in this dataset would also be slightly lower.)
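As a cross-check on the percentage, the proportion of ones in a 0/1 variable is simply its mean. The sketch below assumes only the two counts reported above (2,201 records and 711 survivors) and rebuilds an indicator vector from them; the names `total_passengers`, `total_survivors`, and `survive` are chosen here for illustration.

```r
# Counts taken from the tallies reported above.
total_passengers <- 2201
total_survivors  <- 711

# Rebuild a 0/1 survival indicator with the same counts (row order is arbitrary).
survive <- rep(c(1, 0), times = c(total_survivors, total_passengers - total_survivors))

# The mean of a 0/1 indicator is the share of ones, i.e. the survival rate.
survival_pct <- round(mean(survive == 1) * 100)
survival_pct
```

On the real data, `mean(titanic$Survive == 1)` would presumably give the same proportion without a separate filter and tally step.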
To determine the survival rate for each class, we must first select a single class of passengers, then calculate how many of that class survived. Finally, we calculate what percent the survivors in that class represent of the total number of passengers in that class. To do this, I used the following:
totalcrew <- tally(filter(titanic_df, Class == 0))
totalcrew
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 885
crewsurvivors <- tally(filter(titanic_df, Survive == 1, Class == 0))
crewsurvivors
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 212
totalfirstclass <- tally(filter(titanic_df, Class == 1))
totalfirstclass
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 325
firstclasssurvivors <- tally(filter(titanic_df, Survive == 1, Class == 1))
firstclasssurvivors
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 203
totalsecondclass <- tally(filter(titanic_df, Class == 2))
totalsecondclass
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 285
secondclasssurvivors <- tally(filter(titanic_df, Survive == 1, Class == 2))
secondclasssurvivors
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 118
totalthirdclass <- tally(filter(titanic_df, Class == 3))
totalthirdclass
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 706
thirdclasssurvivors <- tally(filter(titanic_df, Survive == 1, Class == 3))
thirdclasssurvivors
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 178
round((crewsurvivors/totalcrew)*100)
## n
## 1 24
round((firstclasssurvivors/totalfirstclass)*100)
## n
## 1 62
round((secondclasssurvivors/totalsecondclass)*100)
## n
## 1 41
round((thirdclasssurvivors/totalthirdclass)*100)
## n
## 1 25
The result shows us that 24% of crew, 62% of first-class, 41% of second-class, and 25% of third-class passengers survived.
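The eight filter-and-tally steps above can also be collapsed into one grouped calculation. The following is a base R sketch, not the method used in this report: it rebuilds a one-row-per-person data frame (`by_class`, a name chosen here) from the class counts reported above and uses `tapply()` to take the mean of `Survive` within each class.

```r
# Counts reported above, in the order crew (0), first (1), second (2), third (3).
class_codes     <- c(0, 1, 2, 3)
class_totals    <- c(885, 325, 285, 706)
class_survivors <- c(212, 203, 118, 178)
class_died      <- class_totals - class_survivors

# Rebuild one row per person: survivors of each class first, then non-survivors.
by_class <- data.frame(
  Class   = rep(rep(class_codes, times = 2), times = c(class_survivors, class_died)),
  Survive = rep(c(1, 0), times = c(sum(class_survivors), sum(class_died)))
)

# Mean of the 0/1 Survive column within each class, as a rounded percentage.
class_pct <- round(tapply(by_class$Survive, by_class$Class, mean) * 100)
class_pct
```

With dplyr, which this report already loads, `group_by()` followed by `summarise()` would be the closest equivalent to the `tapply()` call.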
To determine the survival rate for each sex, we must first find how many male and female passengers were aboard by filtering each from the total number of passengers. Then we filter for the male survivors and do the same for the female survivors. Finally, we calculate what percent of each sex survived. To do this, I used the following:
totalmale <- tally(filter(titanic_df, Sex == 1))
totalmale
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 1731
malesurvivors <- tally(filter(titanic_df, Survive == 1, Sex == 1))
malesurvivors
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 367
totalfemale <- tally(filter(titanic_df, Sex == 0))
totalfemale
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 470
femalesurvivors <- tally(filter(titanic_df, Survive == 1, Sex == 0))
femalesurvivors
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 344
round((malesurvivors/totalmale)*100)
## n
## 1 21
round((femalesurvivors/totalfemale)*100)
## n
## 1 73
The result shows us that 21% of men and 73% of women survived the sinking of the Titanic. Women had the higher survival rate.
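Another way to view the same numbers is as a two-way table of sex by survival, converted to row percentages. The sketch below enters the counts reported above by hand into a matrix (`sex_counts`, a name chosen here); the labels "male", "female", "no", and "yes" are added for readability and are not values in the dataset.

```r
# Counts reported above: 1,731 males (367 survived), 470 females (344 survived).
sex_counts <- matrix(
  c(1731 - 367, 367,   # male:   did not survive, survived
    470 - 344,  344),  # female: did not survive, survived
  nrow = 2, byrow = TRUE,
  dimnames = list(Sex = c("male", "female"), Survive = c("no", "yes"))
)

# margin = 1 divides each cell by its row total, so each row sums to 100%.
sex_pct <- round(prop.table(sex_counts, margin = 1) * 100)
sex_pct
```

On the real data, `table(titanic$Sex, titanic$Survive)` should produce the count table directly, with 0/1 codes in place of the labels used here.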
To determine the survival rate for each age category, we must first find how many adult and how many child passengers were aboard by filtering each from the total number of passengers. Then we filter for the adult survivors and do the same for the child survivors. Finally, we calculate what percent of each age category survived. To do this, I used the following:
totaladult <- tally(filter(titanic_df, Age == 1))
totaladult
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 2092
adultsurvivors <- tally(filter(titanic_df, Survive == 1, Age == 1))
adultsurvivors
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 654
totalchild <- tally(filter(titanic_df, Age == 0))
totalchild
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 109
childsurvivors <- tally(filter(titanic_df, Survive == 1, Age == 0))
childsurvivors
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 57
round((adultsurvivors/totaladult)*100)
## n
## 1 31
round((childsurvivors/totalchild)*100)
## n
## 1 52
The result shows us that 31% of adults and 52% of children survived the sinking of the Titanic. Adults had the lower survival rate.
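A quick consistency check is that the two age groups account for every record and every survivor, and that the overall rate is the weighted average of the two group rates. The sketch below uses only the counts reported above; the variable names are chosen here for illustration.

```r
# Counts reported above.
total_records   <- 2201
adult_total     <- 2092
adult_survivors <- 654
child_total     <- 109
child_survivors <- 57

# The two groups should cover every record and every survivor.
stopifnot(adult_total + child_total == total_records)
stopifnot(adult_survivors + child_survivors == 711)

# Each group's share of all records (the weights) and its survival rate.
weights <- c(adult = adult_total, child = child_total) / total_records
rates   <- c(adult = adult_survivors / adult_total,
             child = child_survivors / child_total)

# The weighted average of the group rates reproduces the overall rate.
overall_pct <- round(sum(weights * rates) * 100)
round(weights * 100)
overall_pct
```

Because adults make up about 95% of the records, the overall 32% sits very close to the adult rate of 31%.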
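To compare survival across the combined age and sex categories, we filter on both variables at once, across all classes, then calculate what percent of each category survived. To do this, I used the following:
adultmales_allClasses <- filter(titanic_df, Age == 1, Sex == 1)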
totaladultmales_allClasses <- tally(adultmales_allClasses)
totaladultmales_allClasses
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 1667
adultmalesurvivors_allClasses <- filter(titanic_df, Age == 1, Sex == 1, Survive == 1)
totaladultmalesurvivors_allClasses <- tally(adultmalesurvivors_allClasses)
totaladultmalesurvivors_allClasses
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 338
adultfemales_allClasses <- filter(titanic_df, Age == 1, Sex == 0)
totaladultfemales_allClasses <- tally(adultfemales_allClasses)
totaladultfemales_allClasses
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 425
adultfemalesurvivors_allClasses <- filter(titanic_df, Age == 1, Sex == 0, Survive == 1)
totaladultfemalesurvivors_allClasses <- tally(adultfemalesurvivors_allClasses)
totaladultfemalesurvivors_allClasses
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 316
childmales_allClasses <- filter(titanic_df, Age == 0, Sex == 1)
totalchildmales_allClasses <- tally(childmales_allClasses)
totalchildmales_allClasses
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 64
childmalesurvivors_allClasses <- filter(titanic_df, Age == 0, Sex == 1, Survive == 1)
totalchildmalesurvivors_allClasses <- tally(childmalesurvivors_allClasses)
totalchildmalesurvivors_allClasses
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 29
childfemales_allClasses <- filter(titanic_df, Age == 0, Sex == 0)
totalchildfemales_allClasses <- tally(childfemales_allClasses)
totalchildfemales_allClasses
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 45
childfemalesurvivors_allClasses <- filter(titanic_df, Age == 0, Sex == 0, Survive == 1)
totalchildfemalesurvivors_allClasses <- tally(childfemalesurvivors_allClasses)
totalchildfemalesurvivors_allClasses
## Source: local data frame [1 x 1]
##
## n
## (int)
## 1 28
round((totaladultmalesurvivors_allClasses*100)/totaladultmales_allClasses)
## n
## 1 20
round((totaladultfemalesurvivors_allClasses*100)/totaladultfemales_allClasses)
## n
## 1 74
round((totalchildmalesurvivors_allClasses*100)/totalchildmales_allClasses)
## n
## 1 45
round((totalchildfemalesurvivors_allClasses*100)/totalchildfemales_allClasses)
## n
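## 1 62
The result shows us that 20% of adult males, 74% of adult females, 45% of male children, and 62% of female children survived. Adult females had the highest survival rate and adult males the lowest.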
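The four age-by-sex percentages can be computed and ranked in one step by holding the counts in a small summary table. This is a sketch under the assumption that the counts reported above are entered by hand; `groups`, `total`, `survivors`, and `pct` are names introduced here, not columns of the dataset.

```r
# One row per age-by-sex group, using the counts reported above.
# Age: 1 = adult, 0 = child. Sex: 1 = male, 0 = female.
groups <- data.frame(
  Age       = c(1, 1, 0, 0),
  Sex       = c(1, 0, 1, 0),
  total     = c(1667, 425, 64, 45),
  survivors = c(338, 316, 29, 28)
)

# Vectorized percentage for every group at once.
groups$pct <- round(groups$survivors * 100 / groups$total)

# Sort from highest to lowest survival rate.
ranked <- groups[order(-groups$pct), ]
ranked
```

The group totals sum to 2,201 and the survivors to 711, which matches the overall tallies from the start of the analysis.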