It seems my own discovery of this dataset coincided with a classmate, Henry Vasquez. The questions I posed to the data are as follows:
One would expect that the wealthy passengers of the Titanic were able to make their way to safety at a more successful rate than those of lower classes. Was there indeed a decreasing survival rate by descending cabin class, as one might skeptically suppose?
One would also expect that the crew would be selfless and give passengers priority to disembark into the safety vessels. Did women crew members fare worse than the fare-paying females?
We all know the motto of “Women and Children First,” but did that rule apply evenly across the classes?
Did men in first class actually survive at a higher rate than women in 3rd Class?
library(datasets)
library(dplyr)
library(ggplot2)
Loading and inspecting the structure of the Titanic dataset. Notice that the attributes are already defined as characters, so no need for stringsasFactors = FALSE argument in the dataframe.
data(Titanic)
str(Titanic)
## 'table' num [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
## - attr(*, "dimnames")=List of 4
## ..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
## ..$ Sex : chr [1:2] "Male" "Female"
## ..$ Age : chr [1:2] "Child" "Adult"
## ..$ Survived: chr [1:2] "No" "Yes"
df <- data.frame(Titanic)
tail(df)
## Class Sex Age Survived Freq
## 27 3rd Male Adult Yes 75
## 28 Crew Male Adult Yes 192
## 29 1st Female Adult Yes 140
## 30 2nd Female Adult Yes 80
## 31 3rd Female Adult Yes 76
## 32 Crew Female Adult Yes 20
First I calculate the overall Survivorship Rate using sums. A higher-than-expected 32.3% of passengers survived.
no.passengers <- sum(df$Freq)
no.survivors <- sum(df$Freq[df$Survived == "Yes"])
survivor.rate <- no.survivors / no.passengers
no.passengers
## [1] 2201
no.survivors
## [1] 711
survivor.rate
## [1] 0.323035
73.2% of Women and 52.3% of Children made it off the sinking barge safely.
women.surv.rate <- sum(df$Freq[df$Sex == "Female" & df$Survived == "Yes"]) / sum(df$Freq[df$Sex == "Female"])
child.surv.rate <- sum(df$Freq[df$Age == "Child"& df$Survived == "Yes"]) / sum(df$Freq[df$Age == "Child"])
women.surv.rate
## [1] 0.7319149
child.surv.rate
## [1] 0.5229358
Now, I begin to look at survival rates by Cabin Class. 1st class passengers survived at 62.5%, nearly double the average, which each class showing a steady drop-off. Indeed the crew, the largest group, were the least likely to survive at around 24.0%.
## # A tibble: 4 x 5
## Class No Yes total surv.rate
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 1st 122 203 325 0.625
## 2 2nd 167 118 285 0.414
## 3 3rd 528 178 706 0.252
## 4 Crew 673 212 885 0.240
Children of the 1st and 2nd classes did entirely get off safely. 3rd Class children only fared slightly better than the overall average. Was this because their cabins were too far beneath the levels of the deck? Crew did not bring children fortunately.
df.child <- subset(df, Age == "Child")
df.child <- df.child %>%
group_by(Class) %>%
summarise(No = sum(Freq[Survived == "No"]), Yes = sum(Freq[Survived == "Yes"])) %>%
mutate(total = No + Yes, surv.rate = Yes/ (No + Yes))
df.child
## # A tibble: 4 x 5
## Class No Yes total surv.rate
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 1st 0 6 6 1
## 2 2nd 0 24 24 1
## 3 3rd 52 27 79 0.342
## 4 Crew 0 0 0 NaN
First class ladies almost entirely survived, but for four. 2nd class did marginally worse, at the same level as the Crew Stewardesses. However, more 3rd class women died then lived that April 15, 1912.
df.women <- subset(df, Sex == "Female")
df.women <- df.women %>%
group_by(Class) %>%
summarise(No = sum(Freq[Survived == "No"]), Yes = sum(Freq[Survived == "Yes"])) %>%
mutate(total = No + Yes, surv.rate = Yes/ (No + Yes))
df.women
## # A tibble: 4 x 5
## Class No Yes total surv.rate
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 1st 4 141 145 0.972
## 2 2nd 13 93 106 0.877
## 3 3rd 106 90 196 0.459
## 4 Crew 3 20 23 0.870
As for men, the pattern nearly holds, except that one discovers that 3rd Class Men did actually better than 2nd Class ones. A case of Nice Guys Finishing Last? The Crew Men seemed to have a secret way off that the non-elite couldn’t quite afford?
df.men <- subset(df, Sex == "Male")
df.men <- df.men %>%
group_by(Class) %>%
summarise(No = sum(Freq[Survived == "No"]), Yes = sum(Freq[Survived == "Yes"])) %>%
mutate(total = No + Yes, surv.rate = Yes/ (No + Yes))
df.men
## # A tibble: 4 x 5
## Class No Yes total surv.rate
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 1st 118 62 180 0.344
## 2 2nd 154 25 179 0.140
## 3 3rd 422 88 510 0.173
## 4 Crew 670 192 862 0.223
The full picture:
survivor <- data.frame(All = df.all$surv.rate, Men = df.men$surv.rate, Women = df.women$surv.rate, Child = df.child$surv.rate)
labels <- c("1st","2nd","3rd","Crew")
row.names(survivor) <- labels
survivor
## All Men Women Child
## 1st 0.6246154 0.3444444 0.9724138 1.0000000
## 2nd 0.4140351 0.1396648 0.8773585 1.0000000
## 3rd 0.2521246 0.1725490 0.4591837 0.3417722
## Crew 0.2395480 0.2227378 0.8695652 NaN
colours <- c("red", "orange", "blue", "yellow")
barplot(as.matrix(survivor), main="Survival Rates Aboard the Titanic", ylab = "Percent", cex.lab = 1.5, cex.main = 1.4, beside=TRUE, col=colours)
legend("topleft", labels, cex=1.3, bty="n", fill=colours)