The sinking of the Titanic is one of the most notorious disasters in recorded history. The Titanic was a British passenger ship that sank after striking an iceberg on its voyage from Southampton, England, to New York City, United States. The data set used in this report includes variables such as passenger class, survival status, passenger name, sex, age, family, fare, port of embarkation, and date. The topic being explored is how sex and passenger class affect survival rates for males and females. Many factors and key issues bear on survival, including the "women and children first" rule and access to lifeboats (only 20 were on the ship).
In our earlier Titanic presentation for this class, my group and I found that women had a higher chance of survival than men because of the drastic differences in how the two sexes were treated. Passengers in first class also had a higher chance of survival because their cabins were closer to the lifeboats. So a woman traveling in first class had the greatest chance of survival of any passenger. According to the peer-reviewed article "Social Class and Survival on the S.S. Titanic," published in Social Science & Medicine, Table 1 on page 2 (page 688 in the journal) shows that 97.3% of first-class women survived, compared with only 32.6% of first-class men. Comparing women across classes, only 42.2% of third-class women survived, versus 97.3% of first-class women.
Source: https://www.sciencedirect.com/science/article/pii/0277953686900419
The main objective of the current study is to answer the research question: how do sex and passenger class affect survival rates for males and females? This matters because it shows the social and socioeconomic expectations placed on people, and their impact, especially in emergencies like this one. Everyone should care about the results because this is one of the deadliest shipwrecks in history, so understanding why and how people survived or did not survive is important.
Sources of Data:
The data set used is the Titanic Dataset from Kaggle, a Google-owned platform for data science and machine learning. Link to data set: https://www.kaggle.com/datasets/sakshisatre/titanic-dataset?resource=download&select=The+Titanic+dataset.csv
One change I had to make was to the column names, because they were originally just numbers. When creating the graph, some parts of the code did not run because of the numeric column names, so the columns were renamed to descriptive names. I will provide the updated data set file in my submission. The sample size of this data set is n = 1300 passengers. The independent variables in this report are Sex and pclass (passenger class). Sex has two categories, male and female, while pclass records the class the passenger traveled in: 1st class, 2nd class, or 3rd class. The dependent variable is Survived, coded 0 for Did Not Survive and 1 for Survived. The data set also includes other variables such as Names, Age, Family, Fare, Embarked, and Date.
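The renaming step can be sketched in base R as follows. This is a minimal sketch that assumes the raw file's columns arrived named "1", "2", "3", and so on; the toy data frame and the order of the new names are hypothetical, so they would need to be matched to the real file's column order before reuse.

```r
# Hypothetical toy data frame whose columns have numeric names, standing in
# for the raw Kaggle file (check.names = FALSE keeps the names "1", "2", "3")
raw <- data.frame(`1` = c(1, 3), `2` = c(1, 0), `3` = c("female", "male"),
                  check.names = FALSE)

# Numeric names are awkward in R code: raw$1 is a syntax error, so every
# reference would need backticks, e.g. raw$`1`
# Replace them with descriptive names (assumed order: pclass, Survived, Sex)
names(raw) <- c("pclass", "Survived", "Sex")

print(names(raw))
```

After renaming, the columns can be referred to directly (raw$pclass, raw$Sex) in dplyr and ggplot2 code.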
Setup Code for Data Wrangling: The data wrangling is done with the tidyverse package, a collection of packages that includes ggplot2 for graphs and other visualizations, dplyr for data manipulation, and many others. The main use of the package here is to filter out rows with missing values for Sex, pclass, or Survived. This matters because leaving missing values in can introduce errors and inaccurate rates into the later analysis.
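The missing-value filter can be illustrated on a tiny made-up data frame. This is a minimal base-R sketch: complete.cases() is used as a stand-in for the dplyr filter(!is.na(...)) call in the report, and the toy values are hypothetical.

```r
# Hypothetical toy data with a few missing values (NA) mixed in
toy <- data.frame(
  pclass   = c(1, 2, NA, 3, 3),
  Sex      = c("female", "male", "male", NA, "female"),
  Survived = c(1, 0, 1, 0, NA)
)

# Keep only rows where all three analysis columns are present;
# this mirrors filter(!is.na(Survived), !is.na(Sex), !is.na(pclass))
clean <- toy[complete.cases(toy[, c("pclass", "Sex", "Survived")]), ]

nrow(toy)    # 5 rows before filtering
nrow(clean)  # 2 rows after filtering
```

Printing the row counts before and after filtering is a cheap way to report how many of the n = 1300 passengers the filter removed.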
Methods:
The research question, how do sex and passenger class affect survival rates for males and females?, is addressed with basic descriptive statistics and a visualization, since the goal is to compare survival rates across groups. This was done in two steps. The first step was calculating the mean of the Survived variable for each group. To do this, the Survived variable was mutated into a numeric variable. Originally, the variable was stored as a categorical variable, where 0 meant Did Not Survive and 1 meant Survived. Because the values are 0 and 1, the mean of the numeric version within a group is simply the proportion of passengers in that group who survived. The data are grouped by passenger class and Sex, and each bar shows the mean of Survived multiplied by 100, so the rate is expressed as a percentage.
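The reasoning behind taking the mean of a 0/1 variable can be checked on a small made-up example. The toy vectors and data frame below are hypothetical, and aggregate() is used only as a base-R stand-in for the dplyr group_by()/summarize() step; the last lines show why converting a factor with as.numeric() alone can silently produce codes of 1 and 2 instead of 0 and 1.

```r
# Hypothetical 0/1 outcomes for one group of four passengers
survived <- c(1, 0, 1, 1)

# The mean of a 0/1 variable is the share of 1s, i.e. the survival rate
rate <- mean(survived) * 100   # 75 (percent)

# Grouped version, analogous to group_by(pclass, Sex) %>% summarize(...)
toy <- data.frame(
  pclass   = c(1, 1, 1, 3, 3, 3),
  Sex      = c("female", "female", "male", "female", "male", "male"),
  Survived = c(1, 1, 0, 1, 0, 1)
)
rates <- aggregate(Survived ~ pclass + Sex, data = toy,
                   FUN = function(x) mean(x) * 100)
print(rates)

# Caution: if Survived were stored as a factor, as.numeric() would return
# the internal level codes (1 and 2) instead of the original 0/1 values
f <- factor(c("1", "0", "1", "1"))
as.numeric(f)                 # 2 1 2 2 -> the mean would be wrong
as.numeric(as.character(f))   # 1 0 1 1 -> the intended 0/1 values
```

Converting through as.character() first is harmless when Survived is already numeric, which is why it is a safe default for this step.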
The visualization was created with the ggplot2 package, which provides components such as aes() for aesthetic mappings, geom_*() functions for geometric objects, and many others. I chose a side-by-side bar chart to show survival rates for males and females within each passenger class. This style of chart suits my research question because it allows a direct comparison between males and females in each class, as well as across classes.
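library(tidyverse) # using the tidyverse package (dplyr, ggplot2, ...)

# Data Wrangling
# assumes the renamed data set has already been read into a data frame called `titanic`
survival_rates <- titanic %>%
  # filtering out missing values and unexpected Sex labels
  filter(!is.na(Survived), !is.na(Sex), Sex %in% c("male", "female"),
         !is.na(pclass)) %>%
  # Data Manipulation: convert Survived to numeric 0/1
  # (as.character() first, so a factor "0"/"1" is not turned into codes 1/2)
  mutate(Survived = as.numeric(as.character(Survived))) %>%
  group_by(pclass, Sex) %>%
  # mean of a 0/1 variable = proportion survived; * 100 gives a percentage
  summarize(survival_rate = mean(Survived, na.rm = TRUE) * 100) %>%
  ungroup()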
Code for Visualization:
# Bar Chart
# factor(pclass) treats class as discrete, so each class gets its own group of bars
ggplot(survival_rates, aes(x = factor(pclass), y = survival_rate, fill = Sex)) +
  geom_col(position = position_dodge(width = 0.9)) + # bars for each Sex side by side
  geom_text(aes(label = paste0(round(survival_rate, 1), "%")), # displays the percent above each bar
            position = position_dodge(width = 0.9),
            vjust = -0.5, size = 3.5) +
  labs(
    title = "Survival Rates by Passenger Class and Sex",
    x = "Passenger Class",
    y = "Survival Rate (%)"
  ) +
  scale_fill_manual(values = c("female" = "#a989de", "male" = "#E89149")) + # bar fill colors by Sex
  scale_y_continuous(breaks = seq(0, 100, by = 10), limits = c(0, 100)) +
  theme_minimal()
This bar chart shows Titanic survival rates, in percent, by passenger class and sex. Among first-class passengers, about 97 percent of women survived, compared with only 34 percent of men. Among second-class passengers, about 89 percent of women survived, while only 15 percent of men did. Lastly, among third-class passengers, where women had their lowest survival rate, 49 percent of women and only 16 percent of men survived. Overall, this bar chart shows that passengers were more likely to survive the Titanic if they traveled in first class, and that sex was an extremely influential factor in survival.
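The size of the sex gap in each class can be computed directly from the rounded percentages quoted above. This sketch uses those rounded figures rather than the underlying data, so the gaps are approximate.

```r
# Rounded survival rates (%) as quoted in the chart description
female <- c(first = 97, second = 89, third = 49)
male   <- c(first = 34, second = 15, third = 16)

# Difference in percentage points between women and men in each class
gap <- female - male
gap   # first 63, second 74, third 33
```

By this measure the gap is largest in second class (about 74 points) and smallest in third class (about 33 points), even though women's survival rate is highest in first class.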
The results of the visualization show how strongly gender norms and socioeconomic status shaped survival during the Titanic disaster. Passengers in first class had the greatest chance of survival because they were close to the lifeboats and crew and faced smaller crowds. Women had higher survival rates in all three classes because of the "women and children first" evacuation rule. These differences suggest that survival depended on who was chosen to be saved, in line with societal expectations, rather than on chance alone.
Lack of access to a lifeboat was not the only reason many passengers did not survive. After doing some research, I found that the Titanic sank in the North Atlantic Ocean, where the water at the time of the sinking was about 28 degrees Fahrenheit (-2 degrees Celsius). According to the University of Sea Kayaking, water this cold is enough for a person to lose dexterity in under 5 minutes and become unconscious within 30 minutes. This below-freezing water could kill a person through hypothermia in under an hour, which is another reason many people did not survive the Titanic.
This analysis is limited by its sample of about 1,300 passengers, when there were about 1,317 passengers on board (2,222 in total including crew members). Many relevant variables are not in the data set, such as each passenger's specific location on the ship when it sank at 2:20 a.m., or whether they had severe health problems. These factors are just as important when analyzing why someone did or did not survive. Overall, this report highlights how societal expectations, passenger class, and gender affected the likelihood of survival on the Titanic.