First R Markdown

Strip 1 – Load required packages

This section loads the tidyverse, a collection of packages for importing, wrangling, and plotting data. The remaining steps of the analysis depend on it.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
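The two conflict lines mean that a bare filter() or lag() call now resolves to the dplyr version rather than the stats version. As a minimal sketch, a function can be called with its package prefix to make the choice explicit. The ages tibble and the cutoff of 18 are made up for illustration and are not part of the analysis.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data for illustration only.
ages <- tibble(age = c(29, 0.92, 2, 30, 25, 48))

# The dplyr:: prefix makes it unambiguous which filter() is meant.
adults <- dplyr::filter(ages, age >= 18)
nrow(adults)
```

The masked function remains available under its full name, stats::filter().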

Strip 2 – Read and preview the Titanic data

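This section reads the Titanic dataset from the Excel file “TitanicData.xlsx” into R. The function is called as readxl::read_excel() because readxl is installed with the tidyverse but is not attached by library(tidyverse).
Calling head(titanicdata) prints the first six rows so we can check that the data loaded correctly.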

titanicdata <- readxl::read_excel("TitanicData.xlsx")
head(titanicdata)

## # A tibble: 6 × 6
##   pclass survived name                                         sex     age  fare
##   <chr>  <chr>    <chr>                                        <chr> <dbl> <dbl>
## 1 First  Survived Allen, Miss. Elisabeth Walton                fema… 29    211. 
## 2 First  Survived Allison, Master. Hudson Trevor               male   0.92 152. 
## 3 First  Perished Allison, Miss. Helen Loraine                 fema…  2    152. 
## 4 First  Perished Allison, Mr. Hudson Joshua Creighton         male  30    152. 
## 5 First  Perished Allison, Mrs. Hudson J C (Bessie Waldo Dani… fema… 25    152. 
## 6 First  Survived Anderson, Mr. Harry                          male  48     26.6

Strip 3 – Summarize survival by passenger class

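This section groups the data by passenger class (pclass) and survival outcome (survived) and counts the passengers in each group. Because summarize() drops the last grouping variable, the result is still grouped by pclass, so sum(n) is the class total and percent is the proportion (between 0 and 1) of each class that survived or perished.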

pclass_survival <- titanicdata %>%
  group_by(pclass, survived) %>%
  summarize(n = n()) %>%
  mutate(percent = n / sum(n))

## `summarise()` has grouped output by 'pclass'. You can override using the
## `.groups` argument.

pclass_survival

## # A tibble: 6 × 4
## # Groups:   pclass [3]
##   pclass survived     n percent
##   <chr>  <chr>    <int>   <dbl>
## 1 First  Perished   123   0.381
## 2 First  Survived   200   0.619
## 3 Second Perished   158   0.570
## 4 Second Survived   119   0.430
## 5 Third  Perished   528   0.745
## 6 Third  Survived   181   0.255
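The grouping behaviour behind the percent column can be checked on a reconstruction of the data. The sketch below assumes one row per passenger and rebuilds such a table from the counts printed above with tidyr::uncount(). The names class_counts, passengers, pclass_survival_check, and check are illustrative and not part of the original analysis. Setting .groups = "drop_last" states the default behaviour explicitly, which also silences the message.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Counts copied from the Strip 3 output; a stand-in for titanicdata.
class_counts <- tibble(
  pclass   = rep(c("First", "Second", "Third"), each = 2),
  survived = rep(c("Perished", "Survived"), times = 3),
  n        = c(123, 200, 158, 119, 528, 181)
)

# Expand to one row per passenger (assumed layout of the original data).
passengers <- uncount(class_counts, n)

pclass_survival_check <- passengers %>%
  group_by(pclass, survived) %>%
  # "drop_last" keeps the pclass grouping, so sum(n) below is the class total.
  summarize(n = n(), .groups = "drop_last") %>%
  mutate(percent = n / sum(n))

# Within each class the proportions should add up to 1.
check <- pclass_survival_check %>%
  summarize(total = sum(percent))
check
```

If the proportions were computed after ungroup() instead, the denominator would be all passengers rather than the class total.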

Strip 4 – Summarize survival by class and sex

This section groups the data by passenger class, sex, and survival outcome. It counts how many passengers in each group survived or perished, and then calculates the proportion within each combination of class and sex.

pclass_sex_survival <- titanicdata %>%
  group_by(pclass, sex, survived) %>%
  summarize(n = n()) %>%
  mutate(percent = n / sum(n))

## `summarise()` has grouped output by 'pclass', 'sex'. You can override using the
## `.groups` argument.

pclass_sex_survival

## # A tibble: 12 × 5
## # Groups:   pclass, sex [6]
##    pclass sex    survived     n percent
##    <chr>  <chr>  <chr>    <int>   <dbl>
##  1 First  female Perished     5  0.0347
##  2 First  female Survived   139  0.965 
##  3 First  male   Perished   118  0.659 
##  4 First  male   Survived    61  0.341 
##  5 Second female Perished    12  0.113 
##  6 Second female Survived    94  0.887 
##  7 Second male   Perished   146  0.854 
##  8 Second male   Survived    25  0.146 
##  9 Third  female Perished   110  0.509 
## 10 Third  female Survived   106  0.491 
## 11 Third  male   Perished   418  0.848 
## 12 Third  male   Survived    75  0.152
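To make the denominator concrete, the sketch below recomputes the proportions from the counts printed above in two ways: within each class-and-sex group, as in this strip, and as a share of all passengers. The names counts, shares, within_group, and of_all are illustrative and not part of the original analysis.

```r
library(dplyr, warn.conflicts = FALSE)

# Counts copied from the Strip 4 output.
counts <- tibble(
  pclass   = rep(c("First", "Second", "Third"), each = 4),
  sex      = rep(rep(c("female", "male"), each = 2), times = 3),
  survived = rep(c("Perished", "Survived"), times = 6),
  n        = c(5, 139, 118, 61, 12, 94, 146, 25, 110, 106, 418, 75)
)

shares <- counts %>%
  group_by(pclass, sex) %>%
  mutate(within_group = n / sum(n)) %>%  # denominator: class-and-sex total
  ungroup() %>%
  mutate(of_all = n / sum(n))            # denominator: all passengers

shares
```

The within_group column should match the percent column in the table above, while of_all answers a different question and sums to 1 across all twelve rows.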

Strip 5 – Create a bar graph of survivor percentages

This section filters Strip 4’s table to keep only the survivors. It then builds a bar graph showing the proportion of survivors in each class, with separate panels for female and male passengers.

pclass_sex_survival_graph <- pclass_sex_survival %>%
  filter(survived == "Survived") %>%
  ggplot(mapping = aes(x = pclass, y = percent, fill = pclass)) +
  geom_col() +
  facet_grid(~ sex)

pclass_sex_survival_graph
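For readers without the Excel file, a rough equivalent of this chart can be drawn from the counts printed in Strip 4. This is a sketch under that assumption. The names survivors, n_survived, n_total, and survivor_plot are illustrative and do not appear in the original analysis.

```r
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

# Survivor counts and group totals derived from the Strip 4 output.
survivors <- tibble(
  pclass     = rep(c("First", "Second", "Third"), each = 2),
  sex        = rep(c("female", "male"), times = 3),
  n_survived = c(139, 61, 94, 25, 106, 75),
  n_total    = c(144, 179, 106, 171, 216, 493)
) %>%
  mutate(percent = n_survived / n_total)

# geom_col() draws the y values as given; facet_grid() makes one panel per sex.
survivor_plot <- ggplot(survivors, aes(x = pclass, y = percent, fill = pclass)) +
  geom_col() +
  facet_grid(~ sex)

survivor_plot
```

geom_col() is used rather than geom_bar() because the heights are already computed; geom_bar() would count rows instead.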

Strip 6 – Add labels and styling to the graph

This section adds a title, subtitle, and caption to the graph. It also formats the y-axis as percentages, applies the grey theme, removes the axis titles, and hides the legend.

pclass_sex_survival_graph +
  labs(
    title    = "Titanic Survival Rates",
    subtitle = "Percent by Gender and Cabin Class",
    caption  = "Source: Encyclopedia Titanica"
  ) +
  scale_y_continuous(labels = scales::percent) +
  theme_grey() +
  theme(
    axis.title      = element_blank(),
    legend.position = "none"
  )
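The labels argument of scale_y_continuous() accepts a function that turns the numeric axis breaks into text. As a small sketch of what such a function does, the example below builds one with scales::label_percent() and applies it to a few proportions. The name to_percent, the choice of accuracy = 1, and the input values are assumptions for illustration.

```r
library(scales)

# Build a labelling function that rounds to whole percentages.
to_percent <- label_percent(accuracy = 1)

# Example proportions, such as the breaks a 0-to-1 axis might use.
to_percent(c(0, 0.25, 0.5, 0.75, 1))
```

A function built this way could be passed as labels = to_percent in place of scales::percent when explicit control over rounding is wanted.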