Load the packages and set up the document.
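The setup code itself is not shown here. A minimal sketch of what it might contain, assuming the functions used later come from readxl, dplyr, ggplot2, and scales (the original may load them differently, for example through tidyverse):

```r
# Assumed package set, inferred from the functions called later in this report
library(readxl)   # read_excel()
library(dplyr)    # %>%, group_by(), summarize(), mutate(), filter()
library(ggplot2)  # ggplot(), geom_col(), facet_grid(), labs(), theme()
library(scales)   # label_percent()
```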
This section will read the Excel file that contains the data and store it as an object called titanicdata. Next the script will run a function called head to print the first 6 rows of my data set.
titanicdata <- read_excel("TitanicData.xlsx")
head(titanicdata)
## # A tibble: 6 × 6
##   pclass survived name                                         sex     age  fare
##   <chr>  <chr>    <chr>                                        <chr> <dbl> <dbl>
## 1 First  Survived Allen, Miss. Elisabeth Walton                fema… 29    211.
## 2 First  Survived Allison, Master. Hudson Trevor               male   0.92 152.
## 3 First  Perished Allison, Miss. Helen Loraine                 fema…  2    152.
## 4 First  Perished Allison, Mr. Hudson Joshua Creighton         male  30    152.
## 5 First  Perished Allison, Mrs. Hudson J C (Bessie Waldo Dani… fema… 25    152.
## 6 First  Survived Anderson, Mr. Harry                          male  48     26.6
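The head function returns six rows unless told otherwise, which is why the tibble above has six rows. If a different number is wanted, it can be passed through the n argument. A small sketch using R's built-in mtcars data as a stand-in for the Titanic data:

```r
# mtcars ships with base R; it stands in for titanicdata here
default_rows <- head(mtcars)         # default is n = 6
first_five   <- head(mtcars, n = 5)  # ask for five rows explicitly

nrow(default_rows)  # 6
nrow(first_five)    # 5
```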
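This section will group the Titanic data by passenger class and survival status. Then it will count how many people fall into each group and calculate each group's percent. Because the grouping is dropped before the percent is calculated, each value is a share of all 1,309 passengers, not a rate within a class. Finally it prints the summarized table so I can compare survival across classes.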
pclass_survival <- titanicdata %>%
  group_by(pclass, survived) %>%
  summarize(n = n(), .groups = 'drop') %>%
  # grouping is dropped above, so sum(n) is the total across all passengers
  mutate(percent = n / sum(n))
pclass_survival
## # A tibble: 6 × 4
##   pclass survived     n percent
##   <chr>  <chr>    <int>   <dbl>
## 1 First  Perished   123  0.0940
## 2 First  Survived   200  0.153
## 3 Second Perished   158  0.121
## 4 Second Survived   119  0.0909
## 5 Third  Perished   528  0.403
## 6 Third  Survived   181  0.138
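The percent column in this table is each group's share of all passengers. If survival rates within each class are wanted instead, one possible approach is to group by class before dividing. The sketch below rebuilds the counts from the table by hand so it runs on its own; pclass_counts and pclass_rates are names introduced here for illustration:

```r
library(dplyr)

# Counts copied from the summary table above
pclass_counts <- tibble(
  pclass   = c("First", "First", "Second", "Second", "Third", "Third"),
  survived = c("Perished", "Survived", "Perished", "Survived", "Perished", "Survived"),
  n        = c(123L, 200L, 158L, 119L, 528L, 181L)
)

# Grouping by class before mutate() makes sum(n) the class total,
# so percent becomes the share within each class
pclass_rates <- pclass_counts %>%
  group_by(pclass) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()

pclass_rates
```

On these counts, first class survival works out to 200 / 323, about 62%, against 181 / 709, about 26%, in third class.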
This section will group the data by passenger class, gender, and survival status. Then it will count the number of passengers in each category and calculate the percent within each class and gender group. This creates the summary table that will be used to make the graph.
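pclass_sex_survival <- titanicdata %>%
  group_by(pclass, sex, survived) %>%
  # drop_last keeps the pclass and sex grouping, so sum(n) below is the
  # total for each class and sex group rather than for all passengers
  summarize(n = n(), .groups = "drop_last") %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()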
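One way to sanity-check a summary like this is to confirm that the percents add up to 1 within each class and sex group. A minimal sketch on a small made-up data frame (toy_passengers is hypothetical, not the Titanic file), using dplyr's count as shorthand for group_by followed by summarize:

```r
library(dplyr)

# Hypothetical toy data, not the Titanic file
toy_passengers <- tibble(
  pclass   = c("First", "First", "First", "Third", "Third", "Third", "Third"),
  sex      = c("female", "female", "male", "female", "male", "male", "male"),
  survived = c("Survived", "Perished", "Perished", "Survived",
               "Perished", "Perished", "Survived")
)

toy_summary <- toy_passengers %>%
  count(pclass, sex, survived) %>%   # same counts as group_by() + summarize(n = n())
  group_by(pclass, sex) %>%          # percent is then taken within class and sex
  mutate(percent = n / sum(n)) %>%
  ungroup()

# Each class and sex group should add up to 1
toy_check <- toy_summary %>%
  group_by(pclass, sex) %>%
  summarize(total = sum(percent), .groups = "drop")

toy_check
```

If any total differs from 1, the percent was probably calculated against the wrong denominator.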
This section will filter the summarized table to only include passengers who survived. Then it will make a bar chart showing survival percent by passenger class, and it will split the chart into panels by gender so it is easy to compare male and female survival patterns.
pclass_sex_survival_graph <- pclass_sex_survival %>%
  filter(survived == "Survived") %>%
  ggplot(mapping = aes(x = pclass, y = percent, fill = pclass)) +
  geom_col() +
  facet_grid(~sex)
This section will take the graph from the previous step and format it to look cleaner. It adds a title, subtitle, and caption, converts the y axis into percent format, removes the axis titles, and hides the legend since passenger class is already obvious from the x axis.
pclass_sex_survival_graph +
  labs(
    title = "Titanic Survival Rates",
    subtitle = "Percent by Gender and Cabin Class",
    caption = "Source: Encyclopedia Titanica"
  ) +
  scale_y_continuous(labels = label_percent()) +
  theme_grey() +
  theme(
    axis.title = element_blank(),
    legend.position = "none"
  )
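The y axis formatting relies on label_percent from the scales package, which builds a labelling function rather than formatting values directly. A brief sketch of what that function does to proportions like those in the first summary table; the accuracy argument is an addition here to control rounding, not something the original code sets:

```r
library(scales)

# label_percent() returns a function; calling that function formats proportions as text
whole_percent   <- label_percent(accuracy = 1)    # round to whole percents
one_decimal_pct <- label_percent(accuracy = 0.1)  # keep one decimal place

whole_percent(c(0.153, 0.403))    # "15%" "40%"
one_decimal_pct(c(0.153, 0.403))  # "15.3%" "40.3%"
```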