Exploring Class’s Impact on Survival Rate on the Titanic

It seems my own discovery of this dataset coincided with a classmate, Henry Vasquez. The questions I posed to the data are as follows:

  1. One would expect that the wealthy passengers of the Titanic were able to make their way to safety at a more successful rate than those of lower classes. Was there indeed a decreasing survival rate by descending cabin class, as one might skeptically suppose?

  2. One would also expect that the crew would be selfless and give passengers priority to disembark into the safety vessels. Did women crew members fare worse than the fare-paying females?

  3. We all know the motto of “Women and Children First,” but did that rule apply evenly across the classes?

  4. Did men in first class actually survive at a higher rate than women in 3rd Class?

library(datasets)
library(dplyr)
library(ggplot2)

Loading and inspecting the structure of the Titanic dataset. Notice that the attributes are already defined as characters, so no need for stringsasFactors = FALSE argument in the dataframe.

data(Titanic)
str(Titanic)
##  'table' num [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
##  - attr(*, "dimnames")=List of 4
##   ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
##   ..$ Sex     : chr [1:2] "Male" "Female"
##   ..$ Age     : chr [1:2] "Child" "Adult"
##   ..$ Survived: chr [1:2] "No" "Yes"
df <- data.frame(Titanic)
tail(df)
##    Class    Sex   Age Survived Freq
## 27   3rd   Male Adult      Yes   75
## 28  Crew   Male Adult      Yes  192
## 29   1st Female Adult      Yes  140
## 30   2nd Female Adult      Yes   80
## 31   3rd Female Adult      Yes   76
## 32  Crew Female Adult      Yes   20

First I calculate the overall Survivorship Rate using sums. A higher-than-expected 32.3% of passengers survived.

no.passengers <- sum(df$Freq)
no.survivors <- sum(df$Freq[df$Survived == "Yes"])
survivor.rate <- no.survivors / no.passengers

no.passengers 
## [1] 2201
no.survivors
## [1] 711
survivor.rate
## [1] 0.323035

73.2% of Women and 52.3% of Children made it off the sinking barge safely.

women.surv.rate <- sum(df$Freq[df$Sex == "Female" & df$Survived == "Yes"]) / sum(df$Freq[df$Sex == "Female"])
child.surv.rate <- sum(df$Freq[df$Age == "Child"& df$Survived == "Yes"]) / sum(df$Freq[df$Age == "Child"])

women.surv.rate
## [1] 0.7319149
child.surv.rate 
## [1] 0.5229358

Now, I begin to look at survival rates by Cabin Class. 1st class passengers survived at 62.5%, nearly double the average, which each class showing a steady drop-off. Indeed the crew, the largest group, were the least likely to survive at around 24.0%.

## # A tibble: 4 x 5
##   Class    No   Yes total surv.rate
##   <fct> <dbl> <dbl> <dbl>     <dbl>
## 1 1st     122   203   325     0.625
## 2 2nd     167   118   285     0.414
## 3 3rd     528   178   706     0.252
## 4 Crew    673   212   885     0.240

Children of the 1st and 2nd classes did entirely get off safely. 3rd Class children only fared slightly better than the overall average. Was this because their cabins were too far beneath the levels of the deck? Crew did not bring children fortunately.

df.child <- subset(df, Age == "Child")

df.child <- df.child %>% 
  group_by(Class) %>% 
  summarise(No = sum(Freq[Survived == "No"]), Yes = sum(Freq[Survived == "Yes"])) %>% 
  mutate(total = No + Yes, surv.rate = Yes/ (No + Yes))

df.child
## # A tibble: 4 x 5
##   Class    No   Yes total surv.rate
##   <fct> <dbl> <dbl> <dbl>     <dbl>
## 1 1st       0     6     6     1    
## 2 2nd       0    24    24     1    
## 3 3rd      52    27    79     0.342
## 4 Crew      0     0     0   NaN

First class ladies almost entirely survived, but for four. 2nd class did marginally worse, at the same level as the Crew Stewardesses. However, more 3rd class women died then lived that April 15, 1912.

df.women <- subset(df, Sex == "Female")

df.women <- df.women %>% 
  group_by(Class) %>% 
  summarise(No = sum(Freq[Survived == "No"]), Yes = sum(Freq[Survived == "Yes"]))  %>% 
  mutate(total = No + Yes, surv.rate = Yes/ (No + Yes))

df.women
## # A tibble: 4 x 5
##   Class    No   Yes total surv.rate
##   <fct> <dbl> <dbl> <dbl>     <dbl>
## 1 1st       4   141   145     0.972
## 2 2nd      13    93   106     0.877
## 3 3rd     106    90   196     0.459
## 4 Crew      3    20    23     0.870

As for men, the pattern nearly holds, except that one discovers that 3rd Class Men did actually better than 2nd Class ones. A case of Nice Guys Finishing Last? The Crew Men seemed to have a secret way off that the non-elite couldn’t quite afford?

df.men <- subset(df, Sex == "Male")

df.men <- df.men %>% 
  group_by(Class) %>% 
  summarise(No = sum(Freq[Survived == "No"]), Yes = sum(Freq[Survived == "Yes"]))  %>% 
  mutate(total = No + Yes, surv.rate = Yes/ (No + Yes))

df.men
## # A tibble: 4 x 5
##   Class    No   Yes total surv.rate
##   <fct> <dbl> <dbl> <dbl>     <dbl>
## 1 1st     118    62   180     0.344
## 2 2nd     154    25   179     0.140
## 3 3rd     422    88   510     0.173
## 4 Crew    670   192   862     0.223

The full picture:

survivor <- data.frame(All = df.all$surv.rate, Men = df.men$surv.rate, Women = df.women$surv.rate, Child = df.child$surv.rate)
labels <- c("1st","2nd","3rd","Crew")
row.names(survivor) <- labels
survivor
##            All       Men     Women     Child
## 1st  0.6246154 0.3444444 0.9724138 1.0000000
## 2nd  0.4140351 0.1396648 0.8773585 1.0000000
## 3rd  0.2521246 0.1725490 0.4591837 0.3417722
## Crew 0.2395480 0.2227378 0.8695652       NaN
colours <- c("red", "orange", "blue", "yellow")

barplot(as.matrix(survivor), main="Survival Rates Aboard the Titanic", ylab = "Percent", cex.lab = 1.5, cex.main = 1.4, beside=TRUE, col=colours)

legend("topleft", labels, cex=1.3, bty="n", fill=colours)