This section loads the tidyverse, a collection of packages for importing, wrangling, and plotting data. The remaining steps of the analysis depend on it.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
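This section reads the Titanic dataset from the Excel file named
“TitanicData.xlsx” into R. Because readxl is not one of the core packages
attached by library(tidyverse), the function is called with the readxl:: prefix.
The head(titanicdata) function prints the first six rows so
we can check that the data loaded correctly.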
titanicdata <- readxl::read_excel("TitanicData.xlsx")
head(titanicdata)
## # A tibble: 6 × 6
## pclass survived name sex age fare
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 First Survived Allen, Miss. Elisabeth Walton fema… 29 211.
## 2 First Survived Allison, Master. Hudson Trevor male 0.92 152.
## 3 First Perished Allison, Miss. Helen Loraine fema… 2 152.
## 4 First Perished Allison, Mr. Hudson Joshua Creighton male 30 152.
## 5 First Perished Allison, Mrs. Hudson J C (Bessie Waldo Dani… fema… 25 152.
## 6 First Survived Anderson, Mr. Harry male 48 26.6
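This section groups the data by passenger class (pclass)
and survival outcome (survived). It counts how many
passengers are in each group and then calculates the share who
survived or perished within each class. After summarize(), the result is
still grouped by pclass, so sum(n) is the class total rather than the
overall total. The percent column is stored as a proportion between 0 and 1.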
pclass_survival <- titanicdata %>%
  group_by(pclass, survived) %>%
  summarize(n = n()) %>%
  mutate(percent = n / sum(n))
## `summarise()` has grouped output by 'pclass'. You can override using the
## `.groups` argument.
pclass_survival
## # A tibble: 6 × 4
## # Groups: pclass [3]
## pclass survived n percent
## <chr> <chr> <int> <dbl>
## 1 First Perished 123 0.381
## 2 First Survived 200 0.619
## 3 Second Perished 158 0.570
## 4 Second Survived 119 0.430
## 5 Third Perished 528 0.745
## 6 Third Survived 181 0.255
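The grouping behaviour behind the percent column can be seen on a small example. The sketch below uses a made-up four-row table (the `toy` data is hypothetical and not taken from TitanicData.xlsx) and writes out `.groups = "drop_last"`, which is the behaviour the message above describes as the default for this pipeline.

```r
library(dplyr)

# Hypothetical toy data: three first-class passengers and one third-class passenger
toy <- tibble(
  pclass   = c("First", "First", "First", "Third"),
  survived = c("Survived", "Survived", "Perished", "Perished")
)

toy_summary <- toy %>%
  group_by(pclass, survived) %>%
  # "drop_last" removes the survived grouping and keeps the pclass grouping
  summarize(n = n(), .groups = "drop_last") %>%
  # sum(n) is now the total for each class, so percent adds up to 1 per class
  mutate(percent = n / sum(n))

toy_summary
```

Spelling out `.groups` also silences the informational message, which may be preferable in a final report.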
This section groups the data by passenger class, gender, and survival outcome. It counts how many passengers in each group survived or perished, and then calculates the share within each combination of class and gender.
pclass_sex_survival <- titanicdata %>%
  group_by(pclass, sex, survived) %>%
  summarize(n = n()) %>%
  mutate(percent = n / sum(n))
## `summarise()` has grouped output by 'pclass', 'sex'. You can override using the
## `.groups` argument.
pclass_sex_survival
## # A tibble: 12 × 5
## # Groups: pclass, sex [6]
## pclass sex survived n percent
## <chr> <chr> <chr> <int> <dbl>
## 1 First female Perished 5 0.0347
## 2 First female Survived 139 0.965
## 3 First male Perished 118 0.659
## 4 First male Survived 61 0.341
## 5 Second female Perished 12 0.113
## 6 Second female Survived 94 0.887
## 7 Second male Perished 146 0.854
## 8 Second male Survived 25 0.146
## 9 Third female Perished 110 0.509
## 10 Third female Survived 106 0.491
## 11 Third male Perished 418 0.848
## 12 Third male Survived 75 0.152
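As a check on how the percent column is computed, the first two rows of the table can be reproduced by hand. The variable names below are illustrative only; the counts are copied from the output above.

```r
# Counts for first-class female passengers, copied from rows 1 and 2 above
first_female_perished <- 5
first_female_survived <- 139

# Denominator: all first-class female passengers
first_female_total <- first_female_perished + first_female_survived

# Share who survived and share who perished, rounded as in the printed table
share_survived <- round(first_female_survived / first_female_total, 3)
share_perished <- round(first_female_perished / first_female_total, 4)

share_survived  # 0.965
share_perished  # 0.0347
```

The two shares add up to 1 because both use the same class-and-gender total as the denominator.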
This section filters the pclass_sex_survival table from the previous step to keep only the survivors. It then builds a bar graph of the survival rate in each class, with one panel for female passengers and one for male passengers.
pclass_sex_survival_graph <- pclass_sex_survival %>%
  filter(survived == "Survived") %>%
  ggplot(mapping = aes(x = pclass, y = percent, fill = pclass)) +
  geom_col() +
  facet_grid(~ sex)
pclass_sex_survival_graph
This section adds a title, subtitle, and caption to the graph. It also formats the y-axis as percentages, applies the grey theme (the ggplot2 default), and removes the axis titles and the legend.
pclass_sex_survival_graph +
  labs(
    title = "Titanic Survival Rates",
    subtitle = "Percent by Gender and Cabin Class",
    caption = "Source: Encyclopedia Titanica"
  ) +
  scale_y_continuous(labels = scales::percent) +
  theme_grey() +
  theme(
    axis.title = element_blank(),
    legend.position = "none"
  )
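The y-axis formatting can be previewed outside the plot. The sketch below assumes the scales package is installed (it is a dependency of ggplot2) and uses `label_percent()`, which returns a labelling function of the kind that `scale_y_continuous(labels = ...)` accepts. The `survival_share` vector and the `accuracy = 0.1` setting are illustrative choices, not part of the original analysis.

```r
library(scales)

# Proportions copied from the percent column for first-class survivors
survival_share <- c(female = 0.965, male = 0.341)

# label_percent() builds a function that turns proportions into percent labels;
# accuracy = 0.1 asks for one decimal place
to_percent <- label_percent(accuracy = 0.1)

percent_labels <- to_percent(survival_share)
percent_labels
```

Passing `labels = to_percent` to `scale_y_continuous()` would be one way to control the number of decimals shown on the axis.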