1

Load the packages and set up the document.
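
The setup chunk itself is not shown in this document. Judging only from the functions called in the later sections, a minimal version might look like the sketch below; the exact package list is an assumption, and loading the tidyverse bundle plus readxl and scales would work as well.

```r
# Assumed setup chunk: the original is not shown, so this list is inferred
# from the functions used in the later sections.
library(readxl)   # read_excel()
library(dplyr)    # %>%, group_by(), summarize(), mutate(), filter(), n()
library(ggplot2)  # ggplot(), aes(), geom_col(), facet_grid(), labs(), theme()
library(scales)   # label_percent()
```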

2

This section will read the Excel file that contains the data and store it as an object called titanicdata. Next, the script will call the head function to print the first six rows of my data set (six is the default for head).

titanicdata <- read_excel("TitanicData.xlsx")
head(titanicdata)
## # A tibble: 6 × 6
##   pclass survived name                                         sex     age  fare
##   <chr>  <chr>    <chr>                                        <chr> <dbl> <dbl>
## 1 First  Survived Allen, Miss. Elisabeth Walton                fema… 29    211. 
## 2 First  Survived Allison, Master. Hudson Trevor               male   0.92 152. 
## 3 First  Perished Allison, Miss. Helen Loraine                 fema…  2    152. 
## 4 First  Perished Allison, Mr. Hudson Joshua Creighton         male  30    152. 
## 5 First  Perished Allison, Mrs. Hudson J C (Bessie Waldo Dani… fema… 25    152. 
## 6 First  Survived Anderson, Mr. Harry                          male  48     26.6
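
If exactly five rows are wanted, head accepts an n argument. The sketch below uses a small stand-in data frame built from three of the columns printed above; sample_rows and first_five are hypothetical names used only for illustration.

```r
# Stand-in for titanicdata: the six rows printed above, three columns only.
sample_rows <- data.frame(
  pclass   = rep("First", 6),
  survived = c("Survived", "Survived", "Perished",
               "Perished", "Perished", "Survived"),
  age      = c(29, 0.92, 2, 30, 25, 48)
)

# n overrides head's default of 6 rows.
first_five <- head(sample_rows, n = 5)
nrow(first_five)
```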

3

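This section will group the Titanic data by passenger class and survival status. Then it will count how many people fall into each group and calculate each group's share of all passengers: because .groups = 'drop' removes the grouping before mutate runs, sum(n) is the total passenger count rather than the class total. Finally it prints the summarized table. To compare survival rates by class, each count has to be divided by its class total instead (for example, 200 of the 323 first-class passengers survived, about 62%).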

pclass_survival <- titanicdata %>%
  group_by(pclass, survived) %>%
  summarize(n = n(), .groups = 'drop') %>%
  mutate(percent = n / sum(n))

pclass_survival
## # A tibble: 6 × 4
##   pclass survived     n percent
##   <chr>  <chr>    <int>   <dbl>
## 1 First  Perished   123  0.0940
## 2 First  Survived   200  0.153 
## 3 Second Perished   158  0.121 
## 4 Second Survived   119  0.0909
## 5 Third  Perished   528  0.403 
## 6 Third  Survived   181  0.138
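
The percent column above is each group's share of all passengers. One way to get survival rates within each class is to group by pclass before dividing. The sketch below works from the counts printed in the table rather than from the Excel file; class_counts and within_class are hypothetical names, and dplyr is assumed to be installed.

```r
library(dplyr)

# Counts copied from the table above.
class_counts <- tibble(
  pclass   = c("First", "First", "Second", "Second", "Third", "Third"),
  survived = c("Perished", "Survived", "Perished",
               "Survived", "Perished", "Survived"),
  n        = c(123L, 200L, 158L, 119L, 528L, 181L)
)

within_class <- class_counts %>%
  group_by(pclass) %>%              # sum(n) now means the class total
  mutate(percent = n / sum(n)) %>%
  ungroup()

within_class
```

In the original pipeline, the same effect should follow from using .groups = 'drop_last' in summarize, which leaves the result grouped by pclass when mutate runs.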

4

This section will group the data by passenger class, gender, and survival status. Then it will count the number of passengers in each category and calculate the percent within each class and gender group. This creates the summary table that will be used to make the graph.

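pclass_sex_survival <- titanicdata %>%
  group_by(pclass, sex, survived) %>%
  # drop_last keeps pclass and sex grouped, so sum(n) is the class-and-gender total
  summarize(n = n(), .groups = 'drop_last') %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()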
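
A quick way to check that percentages are computed within each class and gender group is to confirm that they sum to 1 in every group. The sketch below does this on a few made-up rows: toy and its values are hypothetical and are not taken from TitanicData.xlsx. It uses .groups = "drop_last" so that sum(n) is taken within each class and gender group, and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy rows, for illustration only.
toy <- tibble(
  pclass   = c("First", "First", "First", "Third", "Third", "Third", "Third"),
  sex      = c("female", "female", "male", "male", "male", "male", "female"),
  survived = c("Survived", "Perished", "Perished", "Perished",
               "Survived", "Perished", "Survived")
)

toy_summary <- toy %>%
  group_by(pclass, sex, survived) %>%
  summarize(n = n(), .groups = "drop_last") %>%  # keep pclass and sex grouped
  mutate(percent = n / sum(n)) %>%
  ungroup()

# Each class-and-gender group should account for 100% of its own passengers.
group_totals <- toy_summary %>%
  group_by(pclass, sex) %>%
  summarize(total = sum(percent), .groups = "drop")

group_totals
```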

5

This section will filter the summarized table to only include passengers who survived. Then it will make a bar chart showing survival percent by passenger class, and it will split the chart into panels by gender so it is easy to compare male and female survival patterns.

pclass_sex_survival_graph <- pclass_sex_survival %>%
  filter(survived == "Survived") %>%
  ggplot(mapping = aes(x = pclass, y = percent, fill = pclass)) +
  geom_col() +
  facet_grid(~sex) 

6

This section will take the graph from the previous step and format it to look cleaner. It adds a title, subtitle, and caption, formats the y-axis labels as percentages, removes the axis titles, and hides the legend since passenger class is already obvious from the x-axis.

pclass_sex_survival_graph + 
  labs(
    title = "Titanic Survival Rates", 
    subtitle = "Percent by Gender and Cabin Class", 
    caption = "Source: Encyclopedia Titanica"
  ) + 
  scale_y_continuous(labels = label_percent()) + 
  theme_grey() + 
  theme(
    axis.title = element_blank(), 
    legend.position = "none"
  )