The following is an analysis of passenger survival data from the Titanic shipwreck disaster:

1. Preparing the Data for Analysis

I first created a new R Markdown project file for this assignment. I assumed that I would need the following R packages: datasets, ggvis, dplyr and magrittr, so I loaded them:

require(datasets)
require(ggvis)
## Loading required package: ggvis
require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require(magrittr)
## Loading required package: magrittr

I also read the .csv file into R from the WFED 540 course website, as prompted in the assignment:

survivaldata <- read.csv(file = "http://www.personal.psu.edu/dlp/w540/datasets/titanicsurvival.csv", header = TRUE, sep = ",")

2. Calculating the Total Number of Passengers in the Dataset

The R Environment pane tells us that the dataset which I have named ‘survivaldata’ has 2201 observations of 4 variables, and the assignment instructions confirm that there are no missing data in this data set. There are a variety of ways to double-check this number with R, including the ‘tbl_df’ command, which displays the number of observations and the number of variables in the “Source” line at the top.

survivaldata1 <- tbl_df(survivaldata)
survivaldata1
## Source: local data frame [2,201 x 4]
## 
##    Class   Age   Sex Survive
##    (int) (int) (int)   (int)
## 1      1     1     1       1
## 2      1     1     1       1
## 3      1     1     1       1
## 4      1     1     1       1
## 5      1     1     1       1
## 6      1     1     1       1
## 7      1     1     1       1
## 8      1     1     1       1
## 9      1     1     1       1
## 10     1     1     1       1
## ..   ...   ...   ...     ...
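
As a cross-check that does not rely on the printed “Source” line, base R can report the same facts directly. The sketch below is only an illustration: so that it runs without the course URL, it rebuilds a stand-in data frame (the hypothetical name ‘rebuilt’) from the cell counts printed in the Section 8 table, assuming the same 0/1 coding. With the real file loaded, the same three calls would be applied to ‘survivaldata’ instead.

```r
# Cell counts copied from the Section 8 table. expand.grid() varies Class fastest,
# then Age, then Sex, which matches the row order of that table.
counts <- expand.grid(Class = 0:3, Age = 0:1, Sex = 0:1)
counts$died     <- c(0, 0, 0, 17,   3, 4, 13, 89,     0, 0, 0, 35,   670, 118, 154, 387)
counts$survived <- c(0, 1, 13, 14,  20, 140, 80, 76,  0, 5, 11, 13,  192, 57, 14, 75)

# Expand the counts back into one row per person ('rebuilt' is a hypothetical name).
idx  <- seq_len(nrow(counts))
keys <- c("Class", "Age", "Sex")
rebuilt <- rbind(
  data.frame(counts[rep(idx, counts$died), keys], Survive = 0L),
  data.frame(counts[rep(idx, counts$survived), keys], Survive = 1L)
)
rownames(rebuilt) <- NULL

nrow(rebuilt)          # number of observations
dim(rebuilt)           # observations, then variables
sum(is.na(rebuilt))    # number of missing values
```

If the counts were copied correctly, the first call should agree with the 2201 observations reported in the Environment pane.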

3. Calculating the Total Proportion of Passengers Surviving

To calculate the proportion of survivors, I have created a table which sorted the data based on the two conditions of the ‘Survive’ variable: a ‘1’ indicates that the passenger survived, and a ‘0’ indicates that they did not survive.

question3 <- table(survivaldata$Survive)
question3
## 
##    0    1 
## 1490  711
711/2201
## [1] 0.323035

As evidenced in the table, 1490 of the passengers in the data set did not survive, and 711 of the passengers did survive. Arithmetic in R calculates the proportion of survivors as 711/2201 = .323.

R also offers a ‘prop.table’ function that will automatically calculate the proportion of survivors, with “TRUE” representing ‘Survive = 1’:

proportions <- table(survivaldata$Survive==1)
proportions
## 
## FALSE  TRUE 
##  1490   711
prop.table(proportions)
## 
##    FALSE     TRUE 
## 0.676965 0.323035
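
Because ‘Survive’ is coded 0/1, the proportion of survivors is also just the mean of that column. The sketch below illustrates this on a stand-in vector (the hypothetical name ‘survive’) rebuilt from the two counts in the table above. With the real data loaded, the same idea would be ‘mean(survivaldata$Survive == 1)’.

```r
# Stand-in for survivaldata$Survive, rebuilt from the counts in the table above.
survive <- rep(c(0, 1), times = c(1490, 711))

# The mean of a TRUE/FALSE (or 0/1) variable is the share of TRUEs (or 1s).
prop_survived <- mean(survive == 1)
round(prop_survived, 3)

# The same figure taken from the table, without typing the totals by hand.
tab <- table(survive)
as.numeric(tab["1"]) / sum(tab)
```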

4. Calculating the Survival Rate for Each Class of Passenger

To calculate the proportion of each class that survived the disaster, I once again used the ‘prop.table’ function. To use the ‘prop.table’ command, I first created a table indicating the data set to refer to and the variables to analyze. In this table, the rows indicate passenger class and the columns indicate passenger survival:

question4 <- table(survivaldata$Class, survivaldata$Survive)
question4
##    
##       0   1
##   0 673 212
##   1 122 203
##   2 167 118
##   3 528 178
prop.table(question4, margin=1)
##    
##             0         1
##   0 0.7604520 0.2395480
##   1 0.3753846 0.6246154
##   2 0.5859649 0.4140351
##   3 0.7478754 0.2521246

Note that the greatest proportion of First Class passengers survived at .624, followed by Second Class passengers at .414. Third Class and Crew Members had the lowest survival rates at .252 and .239, respectively.
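
To make clear what ‘margin=1’ does, the sketch below redoes the calculation by hand on the counts from the table above: each cell is divided by its row total, so every row sums to 1. The row labels (Crew, First, Second, Third for codes 0 to 3) are my reading of the coding from the results and should be treated as an assumption, and the object names are hypothetical.

```r
# Counts copied from the Class-by-Survive table above
# (rows: Class 0-3; columns: Survive 0, then 1).
class_counts <- matrix(
  c(673, 212,
    122, 203,
    167, 118,
    528, 178),
  ncol = 2, byrow = TRUE,
  dimnames = list(Class   = c("Crew", "First", "Second", "Third"),  # assumed meaning of codes 0-3
                  Survive = c("No", "Yes"))
)

# margin = 1 means "divide each cell by its row total".
by_row <- prop.table(class_counts, margin = 1)

# The same calculation written out by hand.
by_hand <- class_counts / rowSums(class_counts)

round(by_row, 3)
```

Using ‘margin=2’ instead would divide by the column totals, which answers a different question: what share of all survivors came from each class.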

5. Calculating the Survival Rate by Gender Category

We can calculate the percentage of male versus female survivors in the same manner, using the “table()” and “prop.table()” functions to view the passenger survival data. In this table, the rows indicate passenger gender and the columns indicate passenger survival:

question5 <- table(survivaldata$Sex, survivaldata$Survive)
question5
##    
##        0    1
##   0  126  344
##   1 1364  367
prop.table(question5, margin=1)
##    
##             0         1
##   0 0.2680851 0.7319149
##   1 0.7879838 0.2120162

Notably, female passengers survived in a much greater proportion than male passengers, at .731 and .212, respectively.

6. Calculating the Proportion of Passengers Surviving from Each Age Category

In order to understand the proportion of survivors from each age category, we use a similar procedure, but this time we will create a table showing Age and Survive variables. The rows indicate passenger age and the columns indicate passenger survival:

question6 <- table(survivaldata$Age, survivaldata$Survive)
question6
##    
##        0    1
##   0   52   57
##   1 1438  654
prop.table(question6, margin = 1)
##    
##             0         1
##   0 0.4770642 0.5229358
##   1 0.6873805 0.3126195

Notably, adult passengers had a much lower survival rate than child passengers, at .312 and .522, respectively.

7. Calculating the Proportion of Passengers Surviving Based on Age and Sex

To examine passenger survival data by Sex, Age and Survive variables we will first filter the survivaldata set by sex, then calculate the proportion of each passenger subset that survived. The rows represent passenger age and the columns indicate passenger survival:

males <- filter(survivaldata, Sex == 1)
malesS <- ftable(males$Age, males$Survive)
prop.table(malesS, margin = 1)
##            0         1
##                       
## 0  0.5468750 0.4531250
## 1  0.7972406 0.2027594

Here we see that .453 of the male child passengers survived and .202 of the adult male passengers survived.

females <- filter(survivaldata, Sex == 0)
femalesS <- ftable(females$Age, females$Survive)
prop.table(femalesS, margin = 1)
##            0         1
##                       
## 0  0.3777778 0.6222222
## 1  0.2564706 0.7435294

From this table we see that .622 of the female child passengers survived and .743 of the adult female passengers survived.

Adult female passengers survived in the highest proportion, followed by female child passengers and male child passengers, respectively. Adult males survived in the lowest proportion.
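
The two filtered tables can also be produced in a single pass. The sketch below is one possible alternative rather than the method used above: it builds a Sex by Age by Survive table with ‘xtabs’ and passes ‘margin = c(1, 2)’ to ‘prop.table’, so that each Sex/Age group is divided by its own total. The counts are summed from the Section 8 table, and the data frame name ‘sex_age’ is hypothetical.

```r
# Counts summed over Class from the Section 8 table
# (Sex: 0 = female, 1 = male; Age: 0 = child, 1 = adult).
sex_age <- data.frame(
  Sex     = c(0, 0, 0, 0, 1, 1, 1, 1),
  Age     = c(0, 0, 1, 1, 0, 0, 1, 1),
  Survive = c(0, 1, 0, 1, 0, 1, 0, 1),
  n       = c(17, 28, 109, 316, 35, 29, 1329, 338)
)

# Build a three-way table from the counts.
tab <- xtabs(n ~ Sex + Age + Survive, data = sex_age)

# margin = c(1, 2) divides by the total of each Sex/Age group,
# so the Survive = 0 and Survive = 1 shares of each group sum to 1.
rates <- prop.table(tab, margin = c(1, 2))

ftable(round(rates, 3))
```

With the full data set loaded, ‘xtabs(~ Sex + Age + Survive, data = survivaldata)’ should give the same three-way table without the hand-entered counts.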

8. Calculating the Proportion of Passengers Surviving Based on Age, Sex and Class

The proportion of passengers surviving from each age/sex/class category can be calculated with a similar procedure: create a flat table (‘ftable’) of the ‘survivaldata’ data set cross-classified by the Sex, Age, Class and Survive==1 variables, and then execute the ‘prop.table’ command to calculate the proportion of surviving passengers in each subset. Note that I have also multiplied the prop.table results by 100 so that I can view the results as percentages, which are easier for me to process than proportions. The table displays Sex, Age and Class in the rows (in that order) and Survive in the columns. The NaN rows are combinations that contain no one (0/0):

question8 <- ftable(survivaldata$Sex, survivaldata$Age, survivaldata$Class, survivaldata$Survive==1)
question8
##        FALSE TRUE
##                  
## 0 0 0      0    0
##     1      0    1
##     2      0   13
##     3     17   14
##   1 0      3   20
##     1      4  140
##     2     13   80
##     3     89   76
## 1 0 0      0    0
##     1      0    5
##     2      0   11
##     3     35   13
##   1 0    670  192
##     1    118   57
##     2    154   14
##     3    387   75
prop.table(question8, margin=1)
##             FALSE       TRUE
##                             
## 0 0 0         NaN        NaN
##     1  0.00000000 1.00000000
##     2  0.00000000 1.00000000
##     3  0.54838710 0.45161290
##   1 0  0.13043478 0.86956522
##     1  0.02777778 0.97222222
##     2  0.13978495 0.86021505
##     3  0.53939394 0.46060606
## 1 0 0         NaN        NaN
##     1  0.00000000 1.00000000
##     2  0.00000000 1.00000000
##     3  0.72916667 0.27083333
##   1 0  0.77726218 0.22273782
##     1  0.67428571 0.32571429
##     2  0.91666667 0.08333333
##     3  0.83766234 0.16233766
prop.table(question8, margin=1)*100
##             FALSE       TRUE
##                             
## 0 0 0         NaN        NaN
##     1    0.000000 100.000000
##     2    0.000000 100.000000
##     3   54.838710  45.161290
##   1 0   13.043478  86.956522
##     1    2.777778  97.222222
##     2   13.978495  86.021505
##     3   53.939394  46.060606
## 1 0 0         NaN        NaN
##     1    0.000000 100.000000
##     2    0.000000 100.000000
##     3   72.916667  27.083333
##   1 0   77.726218  22.273782
##     1   67.428571  32.571429
##     2   91.666667   8.333333
##     3   83.766234  16.233766
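
The NaN rows arise because two combinations (children with Class code 0) contain no one, so R computes 0/0. The sketch below shows one way to drop those empty groups and rank the rest by survival percentage. It works from the cell counts in the table above rather than from the raw file, and the names ‘groups’ and ‘pct_survived’ are hypothetical.

```r
# Cell counts copied from the table above. expand.grid() varies Class fastest,
# then Age, then Sex, which matches the row order of the flat table.
groups <- expand.grid(Class = 0:3, Age = 0:1, Sex = 0:1)
groups$died     <- c(0, 0, 0, 17,   3, 4, 13, 89,     0, 0, 0, 35,   670, 118, 154, 387)
groups$survived <- c(0, 1, 13, 14,  20, 140, 80, 76,  0, 5, 11, 13,  192, 57, 14, 75)
groups$n        <- groups$died + groups$survived

# Empty groups (n = 0) would give 0/0 = NaN, so drop them before computing rates.
groups <- groups[groups$n > 0, ]
groups$pct_survived <- round(100 * groups$survived / groups$n, 1)

# Sort from the lowest to the highest survival percentage.
groups[order(groups$pct_survived), ]
```

Keeping the group size ‘n’ next to each percentage is useful here, because several of the 100% rows rest on very small groups.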

The proportions of passengers surviving are detailed in the table. Note that, although Third Class passengers had the highest overall mortality rate of the three passenger classes (as noted in question #4), Second Class adult male passengers actually had the highest mortality rate (about 91.7%) when the data are further stratified by age, sex and class. Although I read one theory that Second Class men were the most compelled by duty to see that the ‘women and children first’ rule applied to the boarding of lifeboats (“Titanic Casualty Figures”), this is likely a function of the ‘sample proportion’ from that one and only sample of Titanic passenger survivors. If the ship sank again and another sample was taken, it very well might play out differently.
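
One rough way to put a number on how much a sample proportion like this could vary is a confidence interval. The sketch below is an illustration added here, not part of the assignment: it treats the 168 adult Second Class men as independent draws with a common survival probability, which is a strong assumption, and uses the exact binomial interval from base R's ‘binom.test’.

```r
# Counts from the Section 8 table: adult Second Class men,
# 14 survived and 154 did not.
survived <- 14
total    <- 154 + 14

# Exact (Clopper-Pearson) 95% confidence interval for the survival proportion.
ci <- binom.test(survived, total)$conf.int

round(c(estimate = survived / total, lower = ci[1], upper = ci[2]), 3)
```

The width of the interval gives a sense of how different the figure might look in another “sample” under those assumptions.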

Summary of Findings

Overall, women and children survived the Titanic at a higher rate than adult males, suggesting that sex and age had the greatest impact on passenger survival. Class was also a factor, with Third Class passengers surviving at the lowest proportion of the passenger classes overall, though Second Class adult male passengers had the highest mortality in the disaster.



References

British Board of Trade Inquiry Report. (1990). Report on the Loss of the ‘Titanic’ (S.S.) [data file]. Gloucester, UK: Allan Sutton Publishing [producer]. Dawson, R. J. M. (1995). The ‘unusual episode’ data revisited. Journal of Statistics Education [on-line], 3(3) [distributor]. Retrieved from http://www.amstat.org/publications/jse/v3n3/datasets.dawson.html

Titanic Casualty Figures. (2015, September 12). Retrieved from http://www.anesi.com/titanic.htm