The tidyverse is a collection of R packages for data science. Loading it with library() attaches its core packages to the search path, so you can call their functions directly rather than explicitly prefixing each one with its package name.
library(tidyverse)
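As a small illustration of what attaching does, the sketch below uses the tools package as a stand-in for a tidyverse package. That choice is an assumption made only so the example runs on a plain R installation, since tools ships with R. It compares a namespace-qualified call with a bare call made after library().

```r
# Before attaching, the function is reached through its namespace.
qualified <- tools::toTitleCase("titanic survival rates")

# library() puts the package on the search path ...
library(tools)
attached <- "package:tools" %in% search()

# ... so the same function can now be called by its bare name.
direct <- toTitleCase("titanic survival rates")

# Both routes call the same function and give the same result.
identical(qualified, direct)
```

The qualified form keeps working after the package is attached, which is useful when two attached packages export a function with the same name.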
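This reads the Titanic Excel file and saves it as the object “titanicdata”. The readxl package is installed with the tidyverse but is not attached by library(tidyverse), so the function is called as readxl::read_excel(). Once the file is loaded into R, we display the first rows of the data frame and check which values the survived column contains.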
titanicdata <- readxl::read_excel("TitanicData.xlsx")
head(titanicdata)
unique(titanicdata$survived)
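Checking the distinct values matters because a later step filters on the exact string "yes". The sketch below shows the idea on a small made-up data frame. The object toy and its four rows are hypothetical and are not taken from the real file.

```r
# Hypothetical stand-in for titanicdata; the real file is not reproduced here.
toy <- data.frame(
  class    = c("1st", "1st", "3rd", "3rd"),
  gender   = c("female", "male", "female", "male"),
  survived = c("yes", "no", "no", "no"),
  stringsAsFactors = FALSE
)

head(toy, 2)                           # first two rows only
labels_found <- unique(toy$survived)   # distinct labels, in order of appearance
labels_found

# Comparing against a label that does not occur silently matches nothing.
wrong_case_rows <- nrow(toy[toy$survived == "Yes", ])
wrong_case_rows
```

Because string comparison in R is case-sensitive, a filter on "Yes" would return an empty result here without any warning.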
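In this section, I group the Titanic data by passenger class and survival status. Then I count how many passengers fall into each group and calculate the percentage of people who survived (or didn’t survive) within each class. Because summarize() drops the last grouping variable (survived), the result is still grouped by class, so sum(n) in the mutate() step is the class total. This helps show whether certain classes had higher survival rates than others.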
pclass_survival <- titanicdata %>%
  group_by(class, survived) %>%
  summarize(n = n()) %>%
  mutate(percent = n / sum(n))
pclass_survival
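To make the within-class arithmetic concrete, here is a minimal sketch on made-up data. The tibble toy and its eight rows are hypothetical. The .groups = "drop_last" argument only spells out what summarize() does by default in this situation.

```r
library(dplyr)

# Hypothetical toy data: 4 first-class and 4 third-class passengers.
toy <- tibble(
  class    = c("1st", "1st", "1st", "1st", "3rd", "3rd", "3rd", "3rd"),
  survived = c("yes", "yes", "yes", "no",  "yes", "no",  "no",  "no")
)

toy_summary <- toy %>%
  group_by(class, survived) %>%
  # Drop only the last grouping level, so the result stays grouped by class.
  summarize(n = n(), .groups = "drop_last") %>%
  # sum(n) is therefore the class total, not the grand total.
  mutate(percent = n / sum(n)) %>%
  ungroup()

toy_summary
```

In this toy data, three of the four first-class passengers survived, so the first-class "yes" row gets 3 / 4 = 0.75 rather than 3 / 8.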
Here, I take the previous analysis a step further by adding gender. I group the data by class, gender, and survival outcome, then calculate a percentage within each combination of class and gender. This lets us compare survival outcomes for men and women in each passenger class and see where the differences were largest.
pclass_sex_survival <- titanicdata %>%
  group_by(class, gender, survived) %>%
  summarize(n = n()) %>%
  mutate(percent = n / sum(n))
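One way to confirm that the percentages are computed within each class and gender combination is to check that they add up to 1 in every group. The sketch below does this on hypothetical data. The objects toy, toy_rates, and check are illustrative names, not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy data: two classes by two genders, two passengers each.
toy <- tibble(
  class    = rep(c("1st", "3rd"), each = 4),
  gender   = rep(c("female", "female", "male", "male"), times = 2),
  survived = c("yes", "yes", "yes", "no", "yes", "no", "no", "no")
)

toy_rates <- toy %>%
  group_by(class, gender, survived) %>%
  summarize(n = n(), .groups = "drop_last") %>%  # still grouped by class, gender
  mutate(percent = n / sum(n))

# Each class and gender combination should account for all of its passengers.
check <- toy_rates %>%
  summarize(total = sum(percent), .groups = "drop")

check
```

Note that a combination with no survivors (third-class men in this toy data) has no "yes" row at all, so it would be absent from a table or plot filtered on survived == "yes" rather than appearing as 0%.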
This section creates a bar graph showing the percentage of survivors in each passenger class. The graph is faceted by gender, which makes the differences between men and women easier to see side by side. Only rows where survived is "yes" are plotted.
pclass_sex_survival_graph <- pclass_sex_survival %>%
  filter(survived == "yes") %>%
  ggplot(mapping = aes(x = class, y = percent, fill = class)) +
  geom_col() +
  facet_grid(~gender)
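The same plot structure can be tried on a small summary table before running it on the real data. The values in toy_rates below are invented for illustration and do not reflect actual Titanic survival rates.

```r
library(ggplot2)

# Hypothetical survival proportions, already limited to survivors.
toy_rates <- data.frame(
  class   = c("1st", "3rd", "1st", "3rd"),
  gender  = c("female", "female", "male", "male"),
  percent = c(0.9, 0.5, 0.3, 0.1)
)

toy_graph <- ggplot(toy_rates, aes(x = class, y = percent, fill = class)) +
  geom_col() +          # bar height is taken directly from percent
  facet_grid(~gender)   # one panel per gender, arranged in columns

toy_graph
```

geom_col() is used rather than geom_bar() because the heights are already computed. geom_bar() would count rows instead.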
Finally, I add a title, subtitle, and caption to the graph and format the y-axis to display percentages. I also apply theme_grey(), which is ggplot2's default theme, and remove the axis titles and the legend, since the passenger class is already labeled on the x-axis.
pclass_sex_survival_graph +
  labs(
    title = "Titanic Survival Rates",
    subtitle = "Percent by Gender and Cabin Class",
    caption = "Source: Encyclopedia Titanica"
  ) +
  scale_y_continuous(labels = scales::percent) +
  theme_grey() +
  theme(
    axis.title = element_blank(),
    legend.position = "none"
  )
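The labels argument of scale_y_continuous() takes a function that turns the numeric axis breaks into text. The short sketch below shows what such a function returns. It uses scales::label_percent() with accuracy = 1, which is an assumption for this example, whereas the code above passes scales::percent with its defaults.

```r
# A labelling function: proportions in, percentage strings out.
fmt <- scales::label_percent(accuracy = 1)   # round to whole percentages

axis_labels <- fmt(c(0, 0.25, 0.5, 1))
axis_labels
```

The underlying data stay on the 0 to 1 scale. Only the text drawn on the axis changes.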