Assignment Guidelines can be found here
The following is an analysis of passenger survival data from the Titanic shipwreck disaster:
I began by creating a new R Markdown project file for this assignment. Assuming that I would need the R packages datasets, ggvis, dplyr and magrittr, I loaded them:
require(datasets)
require(ggvis)
## Loading required package: ggvis
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
##     filter, lag
##
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
require(magrittr)
## Loading required package: magrittr
I also read the .csv file into R from the WFED 540 course website, as prompted in the assignment:
survivaldata <- read.csv(file = "http://www.personal.psu.edu/dlp/w540/datasets/titanicsurvival.csv", header = TRUE, sep = ",")
The R Environment pane tells us that the dataset which I have named ‘survivaldata’ has 2201 observations of 4 variables, and the assignment instructions confirm that there are no missing data in this data set. There are a variety of ways to double-check this number with R, including the ‘tbl_df’ command, which displays the number of observations and the number of variables in the “Source” line at the top.
survivaldata1 <- tbl_df(survivaldata)
survivaldata1
## Source: local data frame [2,201 x 4]
##
## Class Age Sex Survive
## (int) (int) (int) (int)
## 1 1 1 1 1
## 2 1 1 1 1
## 3 1 1 1 1
## 4 1 1 1 1
## 5 1 1 1 1
## 6 1 1 1 1
## 7 1 1 1 1
## 8 1 1 1 1
## 9 1 1 1 1
## 10 1 1 1 1
## .. ... ... ... ...
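The row count and the absence of missing values can also be checked with base R alone. The sketch below is only an illustration: because the course CSV may not be reachable, it assumes the built-in `Titanic` table from the datasets package is an acceptable stand-in and expands it to one row per person. The name `people` is hypothetical, and the built-in data uses labelled factors and a `Survived` column rather than the 0/1 codes in the course file.

```r
# Stand-in for the course CSV (assumption): expand the built-in Titanic
# contingency table into one row per person.
counts <- as.data.frame(datasets::Titanic)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]

dim(people)             # number of rows, then number of columns
anyNA(people)           # TRUE if any value is missing
colSums(is.na(people))  # count of missing values per column
```

With the real file, the same three calls could be applied directly to `survivaldata`.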
To calculate the proportion of survivors, I created a table that counts the passengers at each of the two values of the ‘Survive’ variable: a ‘1’ indicates that the passenger survived, and a ‘0’ indicates that they did not survive.
question3 <- table(survivaldata$Survive)
question3
##
## 0 1
## 1490 711
711/2201
## [1] 0.323035
As the table shows, 1490 of the passengers in the data set did not survive, and 711 did survive. Arithmetic in R calculates the proportion of survivors as 711/2201 = .323.
R also offers a ‘prop.table’ function that will automatically calculate the proportion of survivors, with “TRUE” representing ‘Survive = 1’:
proportions <- table(survivaldata$Survive==1)
proportions
##
## FALSE TRUE
## 1490 711
prop.table(proportions)
##
## FALSE TRUE
## 0.676965 0.323035
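Because ‘Survive’ is coded 0/1, the proportion of survivors is also just the mean of a logical test, which avoids typing the counts by hand. The following is a minimal sketch that rebuilds a 0/1 vector from the counts shown above; the vector name `survive` is only for illustration and stands in for `survivaldata$Survive`.

```r
# Rebuild a 0/1 survival vector from the counts in the table above
# (1490 did not survive, 711 survived).
survive <- rep(c(0, 1), times = c(1490, 711))

# The mean of a logical vector is the proportion of TRUE values.
prop_survived <- mean(survive == 1)
prop_survived

# The same result from the table, rounded to three decimals.
round(prop.table(table(survive)), 3)
```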
To calculate the proportion of each class that survived the disaster, I once again used the ‘prop.table’ function. To use the ‘prop.table’ command, I first created a table indicating the data set to refer to and the variables to analyze. In this table, the rows indicate passenger class and the columns indicate passenger survival:
question4 <- table(survivaldata$Class, survivaldata$Survive)
question4
##
## 0 1
## 0 673 212
## 1 122 203
## 2 167 118
## 3 528 178
prop.table(question4, margin=1)
##
## 0 1
## 0 0.7604520 0.2395480
## 1 0.3753846 0.6246154
## 2 0.5859649 0.4140351
## 3 0.7478754 0.2521246
Note that First Class passengers survived in the greatest proportion at .625, followed by Second Class passengers at .414. Third Class passengers and Crew members had the lowest survival rates at .252 and .240, respectively.
Of the 885 Crew members, 212 survived (212/885), a proportion of .240 (24.0%).
Of the 325 First Class passengers, 203 survived (203/325), a proportion of .625 (62.5%).
Of the 285 Second Class passengers, 118 survived (118/285), a proportion of .414 (41.4%).
Of the 706 Third Class passengers, 178 survived (178/706), a proportion of .252 (25.2%).
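The `margin = 1` argument is what makes these within-class rates: it divides each count by its row total, so every row sums to 1. A small sketch using the counts from the class table illustrates the difference from `margin = 2`; the row labels are my reading of the 0 to 3 codes, and the object names are hypothetical.

```r
# Counts copied from the Class x Survive table above.
class_by_survive <- matrix(
  c(673, 212,
    122, 203,
    167, 118,
    528, 178),
  ncol = 2, byrow = TRUE,
  dimnames = list(Class = c("Crew", "First", "Second", "Third"),
                  Survive = c("0", "1"))
)

# margin = 1: proportions within each class (each row sums to 1).
row_props <- prop.table(class_by_survive, margin = 1)
round(row_props, 3)
rowSums(row_props)

# margin = 2: share of each class among non-survivors and among survivors
# (each column sums to 1), which answers a different question.
col_props <- prop.table(class_by_survive, margin = 2)
round(col_props, 3)
```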
We can calculate the proportion of male versus female passengers who survived in the same manner, using the “table()” and “prop.table()” functions to view the passenger survival data. In this table, the rows indicate passenger gender and the columns indicate passenger survival:
question5 <- table(survivaldata$Sex, survivaldata$Survive)
question5
##
## 0 1
## 0 126 344
## 1 1364 367
prop.table(question5, margin=1)
##
## 0 1
## 0 0.2680851 0.7319149
## 1 0.7879838 0.2120162
Notably, female passengers survived in a much greater proportion than male passengers, at .732 and .212, respectively.
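Tables of 0/1 codes are easy to misread, so one option is to attach labels with `factor()` before tabulating. The sketch below rebuilds the two coded vectors from the counts in the table above; the coding (0 = female, 1 = male) follows the filtering used later in this analysis, and the labels and object names are my own choices rather than part of the data file.

```r
# Rebuild coded Sex and Survive vectors from the four cell counts above.
cell_counts <- c(126, 344, 1364, 367)
sex     <- rep(c(0, 0, 1, 1), times = cell_counts)
survive <- rep(c(0, 1, 0, 1), times = cell_counts)

# Attach readable labels to the numeric codes.
sex_f     <- factor(sex, levels = c(0, 1), labels = c("Female", "Male"))
survive_f <- factor(survive, levels = c(0, 1), labels = c("Died", "Survived"))

labelled <- table(Sex = sex_f, Outcome = survive_f)
labelled
round(prop.table(labelled, margin = 1), 3)
```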
In order to understand the proportion of survivors from each age category, we use a similar procedure, but this time we will create a table showing Age and Survive variables. The rows indicate passenger age and the columns indicate passenger survival:
question6 <- table(survivaldata$Age, survivaldata$Survive)
question6
##
## 0 1
## 0 52 57
## 1 1438 654
prop.table(question6, margin = 1)
##
## 0 1
## 0 0.4770642 0.5229358
## 1 0.6873805 0.3126195
Notably, adult passengers had a much lower survival rate than child passengers, at .313 and .523, respectively.
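Since ‘Survive’ is 0/1, a group's survival rate is simply the mean of ‘Survive’ within that group, so `tapply()` gives the same rates without building a two-way table first. This is a sketch on vectors rebuilt from the counts above (0 = child, 1 = adult); `age` and `survive` stand in for the corresponding columns of `survivaldata`.

```r
# Rebuild coded Age and Survive vectors from the four cell counts above.
cell_counts <- c(52, 57, 1438, 654)
age     <- rep(c(0, 0, 1, 1), times = cell_counts)
survive <- rep(c(0, 1, 0, 1), times = cell_counts)

# Mean of a 0/1 variable within each group = proportion surviving.
rate_by_age <- tapply(survive, age, mean)
round(rate_by_age, 3)
```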
To examine passenger survival data by Sex, Age and Survive variables we will first filter the survivaldata set by sex, then calculate the proportion of each passenger subset that survived. The rows represent passenger age and the columns indicate passenger survival:
# Keep only the male passengers (Sex == 1)
males <- filter(survivaldata, Sex == 1)
malesS <- ftable(males$Age, males$Survive)
prop.table(malesS, margin = 1)
## 0 1
##
## 0 0.5468750 0.4531250
## 1 0.7972406 0.2027594
Here we see that .453 of the male child passengers survived and .203 of the adult male passengers survived.
# Keep only the female passengers (Sex == 0)
females <- filter(survivaldata, Sex == 0)
femalesS <- ftable(females$Age, females$Survive)
prop.table(femalesS, margin = 1)
## 0 1
##
## 0 0.3777778 0.6222222
## 1 0.2564706 0.7435294
From this table we see that .622 of the female child passengers survived and .744 of the adult female passengers survived.
Adult female passengers survived in the highest proportion, followed by female child passengers and male child passengers, respectively. Adult males survived in the lowest proportion.
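The two filtered tables can also be produced in a single step by grouping on both variables at once. The sketch below uses base R's `aggregate()` and, as an assumption, the built-in `Titanic` table expanded to one row per person in place of the course CSV; `people` and `rates` are hypothetical names, and the built-in data labels the groups (Male/Female, Child/Adult) instead of using 0/1 codes.

```r
# Stand-in for the course CSV (assumption): one row per person from the
# built-in Titanic contingency table.
counts <- as.data.frame(datasets::Titanic)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq), ]

# Recode the outcome as 0/1 so that its mean is the survival proportion.
people$Survive <- as.integer(people$Survived == "Yes")

# Survival proportion for every Sex x Age combination in one call.
rates <- aggregate(Survive ~ Sex + Age, data = people, FUN = mean)
rates
```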
The proportion of passengers surviving in each age/sex/class category can be calculated with a similar procedure: create a flat table (‘ftable’) of the ‘survivaldata’ data set cross-classified by the Sex, Age, Class and Survive==1 variables, and then execute the ‘prop.table’ command to calculate the proportion of surviving passengers in each subset. Note that I have also multiplied the prop.table results by 100 so that I can view the results as percentages, which are easier for me to process than proportions. The table displays Sex, Age and Class (in that order) in the rows and Survive in the columns:
question8 <- ftable(survivaldata$Sex, survivaldata$Age, survivaldata$Class, survivaldata$Survive==1)
question8
## FALSE TRUE
##
## 0 0 0 0 0
## 1 0 1
## 2 0 13
## 3 17 14
## 1 0 3 20
## 1 4 140
## 2 13 80
## 3 89 76
## 1 0 0 0 0
## 1 0 5
## 2 0 11
## 3 35 13
## 1 0 670 192
## 1 118 57
## 2 154 14
## 3 387 75
prop.table(question8, margin=1)
## FALSE TRUE
##
## 0 0 0 NaN NaN
## 1 0.00000000 1.00000000
## 2 0.00000000 1.00000000
## 3 0.54838710 0.45161290
## 1 0 0.13043478 0.86956522
## 1 0.02777778 0.97222222
## 2 0.13978495 0.86021505
## 3 0.53939394 0.46060606
## 1 0 0 NaN NaN
## 1 0.00000000 1.00000000
## 2 0.00000000 1.00000000
## 3 0.72916667 0.27083333
## 1 0 0.77726218 0.22273782
## 1 0.67428571 0.32571429
## 2 0.91666667 0.08333333
## 3 0.83766234 0.16233766
prop.table(question8, margin=1)*100
## FALSE TRUE
##
## 0 0 0 NaN NaN
## 1 0.000000 100.000000
## 2 0.000000 100.000000
## 3 54.838710 45.161290
## 1 0 13.043478 86.956522
## 1 2.777778 97.222222
## 2 13.978495 86.021505
## 3 53.939394 46.060606
## 1 0 0 NaN NaN
## 1 0.000000 100.000000
## 2 0.000000 100.000000
## 3 72.916667 27.083333
## 1 0 77.726218 22.273782
## 1 67.428571 32.571429
## 2 91.666667 8.333333
## 3 83.766234 16.233766
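The `NaN` rows appear because those groups are empty (there were no children among the crew), so `prop.table` divides 0 by 0. A short sketch with three of the rows from the table above shows the effect and one way to mark empty groups explicitly; the row labels and object names are only for illustration.

```r
# Three rows copied from the flat table above: an empty group, a group in
# which everyone survived, and a mixed group.
cells <- matrix(
  c( 0,  0,
     0,  1,
    17, 14),
  ncol = 2, byrow = TRUE,
  dimnames = list(Group = c("female child crew",
                            "female child first",
                            "female child third"),
                  Survive = c("FALSE", "TRUE"))
)

pct <- prop.table(cells, margin = 1) * 100
is.nan(pct[, "TRUE"])   # the empty group yields NaN (0 / 0)

# Mark groups with no members as NA so they are not mistaken for a rate.
group_size <- rowSums(cells)
pct[group_size == 0, ] <- NA
round(pct, 1)
```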
The proportions of passengers surviving are detailed in the table. Note that, although Third Class had the highest overall mortality rate of the three passenger classes (as noted in question #4), Second Class adult male passengers actually had the highest mortality rate (.917) when the data is further stratified by age, sex and class. Although I read one theory that Second Class men were the most compelled by duty to see that the ‘women and children first’ rule applied to boarding of lifeboats (“Titanic Casualty Figures”), this is likely a function of the ‘sample proportion’ from the one and only sample of Titanic passengers. If the ship sank again and another sample were taken, it very well may play out differently.
Overall, women and children survived the Titanic at a higher rate than adult males, which suggests that sex and age had the greatest impact on passenger survival. Class was also a factor, with Third Class surviving at the lowest proportion of the passenger classes overall, though Second Class adult male passengers had the highest mortality in the disaster.
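One rough way to gauge how much a ‘sample proportion’ like this could vary is an exact binomial confidence interval. The sketch below applies base R's `binom.test()` to the Second Class adult male counts from the table (14 survived, 154 did not). Treating these passengers as a random sample is an assumption made purely for illustration, and the variable names are hypothetical.

```r
# Second Class adult males, from the flat table: 154 died, 14 survived.
survived <- 14
total    <- 154 + 14

estimate <- survived / total

# Exact (Clopper-Pearson) 95% confidence interval for the survival proportion.
ci <- binom.test(survived, total, conf.level = 0.95)$conf.int

round(c(estimate = estimate, lower = ci[1], upper = ci[2]), 3)
```

A wide interval would support the idea that the ranking of small subgroups could change in another "sample"; a narrow one would suggest the low survival rate is not just noise.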
British Board of Trade Inquiry Report. (1990). Report on the Loss of the ‘Titanic’ (S.S.) [data file]. Gloucester, UK: Allan Sutton Publishing [producer]. Dawson, R. J. M. (1995). The ‘unusual episode’ data revisited. Journal of Statistics Education [on-line], 3(3) [distributor]. Retrieved from http://www.amstat.org/publications/jse/v3n3/datasets.dawson.html
Titanic Casualty Figures. (2015, September 12). Retrieved from http://www.anesi.com/titanic.htm