The goal of this work is to explore risk factors and create an analysis for survival likelihood of an individual passenger on the ship Titanic which sunk in 1912. The sinking of the Titanic was one of the deadliest, most infamous shipwrecks in history. The Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. Many factors are involved in understanding who survived this shipwreck and why. In addition to sheer luck, some groups of people such as women, children, and the upper-class were more likely to survive than others. There are many systemic and social factors that contributed to being or becoming “upper-class” such as race and education. The sinking of the Titanic is a prime example of how privilege can compound and continue to give benefit beyond expectation.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(psych)
## Warning: package 'psych' was built under R version 4.4.2
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(RWeka)
## Warning: package 'RWeka' was built under R version 4.4.2
library(dplyr)
library(stringr)
library(caret)
## Warning: package 'caret' was built under R version 4.4.2
## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
library(ggplot2)
library(broom)
The recommended Titanic CSV file does not include any indicators of
race/ethnicity. I believed this be a potential factor in survival so I
found another CSV that contains country to help explore if country of
origin (as a shorthand for race) had an impact. This author acknowledges
that it is an incomplete shorthand, but was the most readily available
information. Race itself may be an understudied component of this data
set.
I hoped to join the country column back into the recommended
‘titanic.csv’ data set.
titanic <- read.csv("titanic.csv", stringsAsFactors = FALSE)
titanicwnation <- read.csv("titanicwnation.csv", stringsAsFactors = FALSE)
# Clean the name columns in both datasets
titanic <- titanic %>%
mutate(Name_clean = str_to_lower(str_trim(Name)))
titanicwnation <- titanicwnation %>%
mutate(name_clean = str_to_lower(str_trim(name)))
# Join using cleaned name columns
titanic <- titanic %>%
left_join(titanicwnation %>% select(name_clean, country), by = c("Name_clean" = "name_clean"))
# Count NAs in titanic$Name
sum(is.na(titanic$country))
## [1] 435
# Count NAs in titanicwnation$name
sum(is.na(titanicwnation$country))
## [1] 81
Because of differences in the formatting of the Names column, the join was lackluster. As seen above, the join resulted in significantly more null values in the ‘country’ column than existed in the ‘titanicwnation’ data set. ‘titanicwnation’ also has more entries. For these reasons I have elected to forgo ‘titanic.csv’ in favor of ‘titanicwnation.csv’. I am changing the name of ‘titanic’ to ‘titanic_old’ and ‘titanicwnation’ to simply ‘titanic’
# Rename 'titanic' to 'titanic_old'
titanic_old <- titanic
# Rename 'titanicwnation' to 'titanic'
titanic <- titanicwnation
VARIABLE DESCRIPTIONS:
survived Survival (Yes,No)
class Passenger Class(1 = 1st; 2 = 2nd; 3 = 3rd) (class is a proxy for socio-economic status (SES)
1st ~ Upper; 2nd ~ Middle; 3rd ~ Lower)(Additionally the addtional dataset selected later includes staff, and where they worked)
name Name
gender Gender
age Age (Age is in Years; Fractional if Age less than One (1) If the Age is Estimated, it is in the form xx.5)
sibsp Number of Siblings/Spouses Aboard
parch Number of Parents/Children Aboard
ticket Ticket Number
fare Passenger Fare
embarked Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
country Country of origin (ADDED VIA JOIN)
titanic %>% str()
## 'data.frame': 2207 obs. of 12 variables:
## $ name : chr "Abbing, Mr. Anthony" "Abbott, Mr. Eugene Joseph" "Abbott, Mr. Rossmore Edward" "Abbott, Mrs. Rhoda Mary 'Rosa'" ...
## $ gender : chr "male" "male" "male" "female" ...
## $ age : num 42 13 16 39 16 25 30 28 27 20 ...
## $ class : chr "3rd" "3rd" "3rd" "3rd" ...
## $ embarked : chr "S" "S" "S" "S" ...
## $ country : chr "United States" "United States" "United States" "England" ...
## $ ticketno : int 5547 2673 2673 2673 348125 348122 3381 3381 2699 3101284 ...
## $ fare : num 7.11 20.05 20.05 20.05 7.13 ...
## $ sibsp : int 0 0 1 1 0 0 1 1 0 0 ...
## $ parch : int 0 2 1 1 0 0 0 0 0 0 ...
## $ survived : chr "no" "no" "no" "yes" ...
## $ name_clean: chr "abbing, mr. anthony" "abbott, mr. eugene joseph" "abbott, mr. rossmore edward" "abbott, mrs. rhoda mary 'rosa'" ...
titanic %>% head()
## name gender age class embarked country
## 1 Abbing, Mr. Anthony male 42 3rd S United States
## 2 Abbott, Mr. Eugene Joseph male 13 3rd S United States
## 3 Abbott, Mr. Rossmore Edward male 16 3rd S United States
## 4 Abbott, Mrs. Rhoda Mary 'Rosa' female 39 3rd S England
## 5 Abelseth, Miss. Karen Marie female 16 3rd S Norway
## 6 Abelseth, Mr. Olaus Jørgensen male 25 3rd S United States
## ticketno fare sibsp parch survived name_clean
## 1 5547 7.11 0 0 no abbing, mr. anthony
## 2 2673 20.05 0 2 no abbott, mr. eugene joseph
## 3 2673 20.05 1 1 no abbott, mr. rossmore edward
## 4 2673 20.05 1 1 yes abbott, mrs. rhoda mary 'rosa'
## 5 348125 7.13 0 0 yes abelseth, miss. karen marie
## 6 348122 7.13 0 0 yes abelseth, mr. olaus jørgensen
titanic %>% tail()
## name gender age class embarked country
## 2202 Wyeth, Mr. James Robert male 26 engineering crew S England
## 2203 Wynn, Mr. Walter male 41 deck crew B England
## 2204 Yearsley, Mr. Harry male 40 victualling crew S England
## 2205 Young, Mr. Francis James male 32 engineering crew S England
## 2206 Zanetti, Sig. Minio male 20 restaurant staff S England
## 2207 Zarracchi, Sig. L. male 26 restaurant staff S England
## ticketno fare sibsp parch survived name_clean
## 2202 NA NA NA NA no wyeth, mr. james robert
## 2203 NA NA NA NA yes wynn, mr. walter
## 2204 NA NA NA NA yes yearsley, mr. harry
## 2205 NA NA NA NA no young, mr. francis james
## 2206 NA NA NA NA no zanetti, sig. minio
## 2207 NA NA NA NA no zarracchi, sig. l.
titanic %>% summary()
## name gender age class
## Length:2207 Length:2207 Min. : 0.1667 Length:2207
## Class :character Class :character 1st Qu.:22.0000 Class :character
## Mode :character Mode :character Median :29.0000 Mode :character
## Mean :30.4444
## 3rd Qu.:38.0000
## Max. :74.0000
## NA's :2
## embarked country ticketno fare
## Length:2207 Length:2207 Min. : 2 Min. : 3.030
## Class :character Class :character 1st Qu.: 14262 1st Qu.: 7.181
## Mode :character Mode :character Median : 111427 Median : 14.090
## Mean : 284216 Mean : 33.405
## 3rd Qu.: 347077 3rd Qu.: 31.061
## Max. :3101317 Max. :512.061
## NA's :891 NA's :916
## sibsp parch survived name_clean
## Min. :0.0000 Min. :0.0000 Length:2207 Length:2207
## 1st Qu.:0.0000 1st Qu.:0.0000 Class :character Class :character
## Median :0.0000 Median :0.0000 Mode :character Mode :character
## Mean :0.4996 Mean :0.3856
## 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :8.0000 Max. :9.0000
## NA's :900 NA's :900
dropped_col_df <- titanic %>% select(name_clean)
titanic <- titanic %>% select(-name_clean)
#Check that column was dropped successfully:
titanic %>% str()
## 'data.frame': 2207 obs. of 11 variables:
## $ name : chr "Abbing, Mr. Anthony" "Abbott, Mr. Eugene Joseph" "Abbott, Mr. Rossmore Edward" "Abbott, Mrs. Rhoda Mary 'Rosa'" ...
## $ gender : chr "male" "male" "male" "female" ...
## $ age : num 42 13 16 39 16 25 30 28 27 20 ...
## $ class : chr "3rd" "3rd" "3rd" "3rd" ...
## $ embarked: chr "S" "S" "S" "S" ...
## $ country : chr "United States" "United States" "United States" "England" ...
## $ ticketno: int 5547 2673 2673 2673 348125 348122 3381 3381 2699 3101284 ...
## $ fare : num 7.11 20.05 20.05 20.05 7.13 ...
## $ sibsp : int 0 0 1 1 0 0 1 1 0 0 ...
## $ parch : int 0 2 1 1 0 0 0 0 0 0 ...
## $ survived: chr "no" "no" "no" "yes" ...
titanic$survived <- factor(titanic$survived)
titanic$gender <- factor(titanic$gender)
titanic$class <- factor(titanic$class)
titanic$embarked <- factor(titanic$embarked)
#Check to make sure factorization was successful
titanic %>% str()
## 'data.frame': 2207 obs. of 11 variables:
## $ name : chr "Abbing, Mr. Anthony" "Abbott, Mr. Eugene Joseph" "Abbott, Mr. Rossmore Edward" "Abbott, Mrs. Rhoda Mary 'Rosa'" ...
## $ gender : Factor w/ 2 levels "female","male": 2 2 2 1 1 2 2 1 2 2 ...
## $ age : num 42 13 16 39 16 25 30 28 27 20 ...
## $ class : Factor w/ 7 levels "1st","2nd","3rd",..: 3 3 3 3 3 3 2 2 3 3 ...
## $ embarked: Factor w/ 4 levels "B","C","Q","S": 4 4 4 4 4 4 2 2 2 4 ...
## $ country : chr "United States" "United States" "United States" "England" ...
## $ ticketno: int 5547 2673 2673 2673 348125 348122 3381 3381 2699 3101284 ...
## $ fare : num 7.11 20.05 20.05 20.05 7.13 ...
## $ sibsp : int 0 0 1 1 0 0 1 1 0 0 ...
## $ parch : int 0 2 1 1 0 0 0 0 0 0 ...
## $ survived: Factor w/ 2 levels "no","yes": 1 1 1 2 2 2 1 2 2 2 ...
titanic %>% summarize(across(everything(), ~ sum(is.na(.))))
## name gender age class embarked country ticketno fare sibsp parch survived
## 1 0 0 2 0 0 81 891 916 900 900 0
dropped_titanic <- titanic %>% drop_na()
nrow(dropped_titanic)
## [1] 1213
#gender, age
titanic %>% select(gender,age) %>% summary()
## gender age
## female: 489 Min. : 0.1667
## male :1718 1st Qu.:22.0000
## Median :29.0000
## Mean :30.4444
## 3rd Qu.:38.0000
## Max. :74.0000
## NA's :2
#gender, age, fare
titanic %>% select(gender,age,fare) %>% summary()
## gender age fare
## female: 489 Min. : 0.1667 Min. : 3.030
## male :1718 1st Qu.:22.0000 1st Qu.: 7.181
## Median :29.0000 Median : 14.090
## Mean :30.4444 Mean : 33.405
## 3rd Qu.:38.0000 3rd Qu.: 31.061
## Max. :74.0000 Max. :512.061
## NA's :2 NA's :916
Insights:
# Quantile ex. fare
titanic %>% pull(fare) %>% quantile(., seq(from = 0, to = 1, by = 0.20), na.rm = TRUE)
## 0% 20% 40% 60% 80% 100%
## 3.0305 7.1706 11.0208 22.0702 42.0800 512.0607
titanic %>% pull(fare) %>% quantile(., seq(from = 0, to = 1, by = 0.10), na.rm = TRUE)
## 0% 10% 20% 30% 40% 50% 60% 70%
## 3.0305 7.1408 7.1706 8.0100 11.0208 14.0902 22.0702 27.1405
## 80% 90% 100%
## 42.0800 78.0504 512.0607
# Quantile ex. age
titanic %>% pull(age) %>% quantile(., seq(from = 0, to = 1, by = 0.20), na.rm = TRUE)
## 0% 20% 40% 60% 80% 100%
## 0.1666667 21.0000000 26.0000000 32.0000000 40.0000000 74.0000000
titanic %>% pull(age) %>% quantile(., seq(from = 0, to = 1, by = 0.10), na.rm = TRUE)
## 0% 10% 20% 30% 40% 50% 60%
## 0.1666667 18.0000000 21.0000000 24.0000000 26.0000000 29.0000000 32.0000000
## 70% 80% 90% 100%
## 36.0000000 40.0000000 46.0000000 74.0000000
Insights:
Quantile analysis of fares indicates that the majority (80%) of fares were $42 or less. The max fare, $512, is somewhat of an outlier.
Quantile analysis of ages indicates that the majority (80%) of ages were 36-40 years or less.
#ggplot
dropped_titanic %>%
ggplot(aes(x= fare)) +
geom_boxplot() +
ggtitle('boxplot of Fare')
# histogram of a numeric variable, fare:
hist(dropped_titanic$fare, main = "Histogram of Fare in the titanic data set",
xlab = "Fare")
Insights:
dropped_titanic %>% ggplot() +
geom_histogram(aes(x=age),binwidth = 20) +
ggtitle('Histogram of Age in the titanic data set')
Insights:
#base R
var(dropped_titanic$fare)
## [1] 2324.664
sd(dropped_titanic$fare)
## [1] 48.21477
Insights:
dropped_titanic %>% pull(age) %>% var()
## [1] 196.6936
dropped_titanic %>% pull(age) %>% sd()
## [1] 14.02475
Insights:
# cor, boxplot, 2D scatter plot - plot, 3D scatter plot
# ggplot
dropped_titanic %>% ggplot() +
geom_point(aes(x=age,y=fare))
cov(dropped_titanic[,c("fare","age")])
## fare age
## fare 2324.6644 126.9543
## age 126.9543 196.6936
var(dropped_titanic[,c("fare","age")])
## fare age
## fare 2324.6644 126.9543
## age 126.9543 196.6936
Insights:
# Generate 2D scatter plots and correlation coefficients
dropped_titanic %>% select(where(is.numeric)) %>% pairs.panels()
Insights: Pairs tells us much we already know-
Fare tends to increase slightly with age, but not strongly.
Passengers with more siblings/spouses (SibSp) tend to have more parents/children (Parch) — makes sense for family groups.
Ticket number has weak or no meaningful correlation with other variables — just a random identifier.
dropped_col_df <- titanic %>% select(ticketno)
titanic <- titanic %>% select(-ticketno)
# Count of survived by country
titanic %>%
group_by(country, survived) %>%
summarise(count = n(), .groups = "drop")
## # A tibble: 79 × 3
## country survived count
## <chr> <fct> <int>
## 1 Argentina no 7
## 2 Australia no 5
## 3 Austria no 2
## 4 Austria yes 1
## 5 Belgium no 17
## 6 Belgium yes 5
## 7 Bosnia no 4
## 8 Bulgaria no 19
## 9 Canada no 19
## 10 Canada yes 15
## # ℹ 69 more rows
titanic %>%
group_by(country) %>%
summarise(
total = n(),
survived = sum(survived == "yes"), # or survived == 1 if it's numeric
survival_ratio = survived / total,
.groups = "drop"
)
## # A tibble: 49 × 4
## country total survived survival_ratio
## <chr> <int> <int> <dbl>
## 1 Argentina 7 0 0
## 2 Australia 5 0 0
## 3 Austria 3 1 0.333
## 4 Belgium 22 5 0.227
## 5 Bosnia 4 0 0
## 6 Bulgaria 19 0 0
## 7 Canada 34 15 0.441
## 8 Channel Islands 17 8 0.471
## 9 China/Hong Kong 1 1 1
## 10 Croatia 11 0 0
## # ℹ 39 more rows
Insights from country are interesting but may not be large enough samples to be significant. For example, Italy is showing a 100% survival rate– because the one Italian person who exists in the data survived. This data also sometimes fails to show a full portrait of an individual’s life. For example, Mr. Joseph Philippe Lemercier Laroche– the only verified Black passenger on the Titanic, is listed as French because he lived in France when the ticket was purchased. Mr. Laroche was born in Haiti, and married a French white woman named Juliette. The two had been living in France with their two daughters and travelling to Haiti so their youngest could be born in his father’s homeland. It is believed he was only granted passage on the Titanic because the family had booked passage with another liner and transferred their tickets to the Titanic. Mr. Laroche’s family survived and he did not. One account tells of Mr.Laroche being barred from attempts to hand his baby daughter to his wife when she was in the escape boat. The account states the the white boatman barring Mr.Laroche didn’t realize the parentage of the child and appeared to be stopping Mr.Laroche because of his Blackness. Mr. Laroche has been mistreated by the boatmen and is now mistreated by the creators of the datasets who either leave out ancestry entirely or who erase the full context of his story.
dropped_titanic <- dropped_titanic %>%
mutate(country = ifelse(name == "Laroche, Mr. Joseph Philippe Lemercier", "Haiti", country))
#tidyveryse plot of class
titanic %>% ggplot() +
geom_bar(aes(x=class)) +
ggtitle("Barplot of class")
#tidyveryse plot of gender
titanic %>% ggplot() +
geom_bar(aes(x=gender)) +
ggtitle("Barplot of gender")
Insights:
There were significantly more men aboard the Titanic than women. This is important to note for later data that shows survival rates of the sexes.
The highest single passenger class is 3rd – this class has the cheapest tickets and some sources indicate that some workers (especially workers of color) had to pay their own 3rd class fare in order to work the ship. This inequity is relevant to the low survival rates for 3rd class passengers.
titanic %>% select(survived,class) %>% table()
## class
## survived 1st 2nd 3rd deck crew engineering crew restaurant staff
## no 123 166 528 23 253 66
## yes 201 118 181 43 71 3
## class
## survived victualling crew
## no 337
## yes 94
titanic %>% select(survived,class) %>% table() %>% prop.table() %>% round(2)
## class
## survived 1st 2nd 3rd deck crew engineering crew restaurant staff
## no 0.06 0.08 0.24 0.01 0.11 0.03
## yes 0.09 0.05 0.08 0.02 0.03 0.00
## class
## survived victualling crew
## no 0.15
## yes 0.04
titanic %>% select(survived,class) %>% table() %>% prop.table() %>% round(2) * 100
## class
## survived 1st 2nd 3rd deck crew engineering crew restaurant staff
## no 6 8 24 1 11 3
## yes 9 5 8 2 3 0
## class
## survived victualling crew
## no 15
## yes 4
Insights:
Number of deaths among 3rd class passengers are especially high, even when taking into account the large size of that group
Number of deaths among engineering crew are also especially high, and ratio of death for restaurant crew is high when compared to other staff groups
1st class has both the highest survival count and ratio compared to all other groups
#tidyveryse barplot for passenger class by survived
titanic %>% ggplot() +
geom_bar(aes(x=class,fill=survived),position="dodge") +
ggtitle("Barplot of Class by Survived")
#tidyverse barplot for gender by survived
titanic %>% ggplot() +
geom_bar(aes(x=gender,fill=survived),position="dodge") +
ggtitle("Barplot of Gender by Survived")
titanic %>% ggplot() +
geom_bar(aes(x=gender,fill= class),position="dodge") +
ggtitle("Barplot of Gender by class")
summary_data <- dropped_titanic %>%
group_by(class, gender) %>%
summarize(survival_rate = mean(survived == "yes"))
## `summarise()` has grouped output by 'class'. You can override using the
## `.groups` argument.
ggplot(summary_data, aes(x = class, y = survival_rate, fill = gender)) +
geom_col(position = "dodge") +
labs(title = "Survival Rates by Class and Gender",
y = "Survival Rate", x = "Passenger Class") +
scale_fill_brewer(palette = "Pastel1", name = "Gender") +
theme_classic()
Insights:
Reinforces that 1st class had high survival
Visually clarifies that victualing crew had high survival among crew members
Shows strong correlation with gender (Female) and survival. Similar numbers of men and women survived (slightly more women) and as mentioned earlier, men significantly outnumbered women
Age was not much of a factor in survival
# survived by age
dropped_titanic %>%
ggplot() +
geom_boxplot(aes(x=survived,y= age)) +
ggtitle('Survived by age')
dropped_titanic %>%
ggplot() +
geom_boxplot(aes(x=survived,y= fare)) +
ggtitle('Survived by Fare')
ggplot(dropped_titanic, aes(x = survived, y = age, fill = gender)) +
geom_boxplot(outlier.shape = 21) +
facet_wrap(~ class) +
labs(title = "Age, Gender, and Survival by Class", y = "Age", x = "Survived") +
scale_fill_manual(values = c("goldenrod", "lightblue")) +
theme_minimal()
Insights:
Survival is somewhat tied to fare but the large number of 3rd class tickets sold and the variance in fare cost even among 1st class tickets muddy the plot slightly. What is clear is that those who paid the very most were likely to survive, and those who paid the least were likely to perish.
Age does not seem to be much of a factor, given that the spread is similar between survival and non-survival.
aggregate(fare~survived, summary, data = dropped_titanic)
## survived fare.Min. fare.1st Qu. fare.Median fare.Mean fare.3rd Qu. fare.Max.
## 1 no 4.00000 7.17080 12.03080 23.74167 26.00000 263.00000
## 2 yes 3.03050 10.33525 26.00000 45.95062 56.18070 512.06070
#tidyverse
dropped_titanic %>%
group_by(survived) %>%
summarize(
min = min(fare),
q1 = quantile(fare, 0.25),
median = median(fare),
mean = mean(fare),
q3 = quantile(fare, 0.75),
max = max(fare)
)
## # A tibble: 2 × 7
## survived min q1 median mean q3 max
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 no 4 7.17 12.0 23.7 26 263
## 2 yes 3.03 10.3 26 46.0 56.2 512.
plot(dropped_titanic$age,dropped_titanic$fare, col=dropped_titanic$survived, pch = as.numeric((dropped_titanic$survived)))
dropped_titanic %>% ggplot() + geom_point(aes(x=age,y=fare,color=survived))
Insights from another viewing style of these factors reinforces:
# Fit a logistic regression model
model <- glm(survived ~ class + gender + age + fare,
data = dropped_titanic,
family = binomial) #logistic regression
# View model summary
summary(model)
##
## Call:
## glm(formula = survived ~ class + gender + age + fare, family = binomial,
## data = dropped_titanic)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.963470 0.373026 10.625 < 2e-16 ***
## class2nd -1.450138 0.249754 -5.806 6.39e-09 ***
## class3rd -2.599696 0.255797 -10.163 < 2e-16 ***
## gendermale -2.587138 0.159321 -16.239 < 2e-16 ***
## age -0.039171 0.006168 -6.351 2.14e-10 ***
## fare -0.002324 0.001893 -1.228 0.22
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1619.6 on 1212 degrees of freedom
## Residual deviance: 1119.3 on 1207 degrees of freedom
## AIC: 1131.3
##
## Number of Fisher Scoring iterations: 4
# Predict probabilities
predicted_probs <- predict(model, type = "response")
# Predict classes
predicted_classes <- ifelse(predicted_probs > 0.5, "yes", "no") %>% factor()
conf_mat <- confusionMatrix(predicted_classes, dropped_titanic$survived)
conf_mat$table
## Reference
## Prediction no yes
## no 628 136
## yes 115 334
tidy(model) %>%
ggplot(aes(x = term, y = estimate)) +
geom_point() +
geom_errorbar(aes(ymin = estimate - std.error, ymax = estimate + std.error), width = 0.2) +
coord_flip() +
ggtitle("Logistic Regression Coefficients")
Insights:
1st class female most likely to survive
Men and 3rd class least
Age and fare have little impact/are too dispersed to impact coefficients
In the above model, the intercept represents the baseline log-odds of survival for a female passenger in 1st class with an average fare and age. All other coefficients reflect how the odds change relative to this reference profile. The model has selected 1st-class female passengers as the most likely to survive. The groups least likely are 3rd class passengers and men. It is important to note that women had higher survival odds, likely because of a “women and children first” mentality. This mentality was likely not applied evenly, as seen in section 24. 3rd class women are still more likely to survive than even 1st class men, but were significantly less likely to survive than 1st class women. These results have been consistent across the analytic methods and have troubling implications about wealth inequity. I believe that some of the causes for a high death rate among 3rd class passengers relates to their location in the ship, as lower class cabins were in less desirable, deeper parts of the Titanic, which would have been difficult to escape. This could also contribute to understanding the high deaths among engineers above all other workers. Additionally, their duties may have required them to stay as long as possible. I believe there could be social factors at play, like a conscious or unconscious bias towards wealthier passengers. That theory is supported by the fact that the only people of color on board (Mr.Laroche, Chinese workmen, and more) were all in 2nd class or lower. This analysis is just a short overview of an emergency situation which had many factors for survival. It would take a more in-depth cultural analysis to confirm theories on why some survived and others didn’t. That said, it is clear to me that wealth, gender, and race had a major role in determining who survived that fateful night the Titanic was overcome.