R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.


1

This loads the tidyverse, a collection of packages (including dplyr and ggplot2) for data manipulation and visualization that will be used in this analysis.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

2

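This reads the Titanic dataset from an Excel file and displays the first few rows so we can see the structure of the data. The function is called as readxl::read_excel() because readxl is not among the packages attached by library(tidyverse).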

titanicdata <- readxl::read_excel("TitanicData.xlsx")
head(titanicdata)
## # A tibble: 6 × 6
##   pclass survived name                                         sex     age  fare
##   <chr>  <chr>    <chr>                                        <chr> <dbl> <dbl>
## 1 First  Survived Allen, Miss. Elisabeth Walton                fema… 29    211. 
## 2 First  Survived Allison, Master. Hudson Trevor               male   0.92 152. 
## 3 First  Perished Allison, Miss. Helen Loraine                 fema…  2    152. 
## 4 First  Perished Allison, Mr. Hudson Joshua Creighton         male  30    152. 
## 5 First  Perished Allison, Mrs. Hudson J C (Bessie Waldo Dani… fema… 25    152. 
## 6 First  Survived Anderson, Mr. Harry                          male  48     26.6

3

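This groups the data by passenger class and survival status and counts the passengers in each group. Because summarize() drops the last grouping variable (survived), the result is still grouped by class, so percent is each group's share of its class, stored as a proportion between 0 and 1.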

pclass_survival <- titanicdata %>%
  group_by(pclass, survived) %>%
  summarize(n = n()) %>%
  mutate(percent = n / sum(n))
## `summarise()` has grouped output by 'pclass'. You can override using the
## `.groups` argument.
pclass_survival
## # A tibble: 6 × 4
## # Groups:   pclass [3]
##   pclass survived     n percent
##   <chr>  <chr>    <int>   <dbl>
## 1 First  Perished   123   0.381
## 2 First  Survived   200   0.619
## 3 Second Perished   158   0.570
## 4 Second Survived   119   0.430
## 5 Third  Perished   528   0.745
## 6 Third  Survived   181   0.255
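
To see why sum(n) in the mutate() step is the class total rather than the grand total, the same pipeline can be run on a tiny made-up table. This is only a sketch: the `toy` data frame and its values are hypothetical, and `.groups = "drop_last"` is written out to make explicit the default behavior that the summarise() message above describes.

```r
library(dplyr)

# Hypothetical toy data, not taken from TitanicData.xlsx
toy <- data.frame(
  pclass   = c("First", "First", "First", "Second", "Second"),
  survived = c("Survived", "Survived", "Perished", "Survived", "Perished")
)

toy_summary <- toy %>%
  group_by(pclass, survived) %>%
  # "drop_last" removes the `survived` grouping; the result stays grouped by pclass
  summarize(n = n(), .groups = "drop_last") %>%
  # sum(n) is therefore computed within each class, not over the whole table
  mutate(percent = n / sum(n))

toy_summary
```

With this setup the proportions within each class add up to 1, which is a quick way to confirm that the denominator is the one intended.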

4

This groups the data by passenger class, sex, and survival status and counts the passengers in each group. The result stays grouped by class and sex, so percent is the proportion of each class and sex combination that perished or survived.

pclass_sex_survival <- titanicdata %>%
  group_by(pclass, sex, survived) %>%
  summarize(n = n()) %>%
  mutate(percent = n / sum(n))
## `summarise()` has grouped output by 'pclass', 'sex'. You can override using the
## `.groups` argument.
pclass_sex_survival
## # A tibble: 12 × 5
## # Groups:   pclass, sex [6]
##    pclass sex    survived     n percent
##    <chr>  <chr>  <chr>    <int>   <dbl>
##  1 First  female Perished     5  0.0347
##  2 First  female Survived   139  0.965 
##  3 First  male   Perished   118  0.659 
##  4 First  male   Survived    61  0.341 
##  5 Second female Perished    12  0.113 
##  6 Second female Survived    94  0.887 
##  7 Second male   Perished   146  0.854 
##  8 Second male   Survived    25  0.146 
##  9 Third  female Perished   110  0.509 
## 10 Third  female Survived   106  0.491 
## 11 Third  male   Perished   418  0.848 
## 12 Third  male   Survived    75  0.152
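
One possible way to make the denominator explicit is to count first and then group only by the variables that define it. The sketch below assumes a hypothetical `toy` data frame with made-up values; count(), group_by(), and mutate() are standard dplyr functions, but this is an alternative formulation, not the code used in the analysis above.

```r
library(dplyr)

# Hypothetical toy data, not taken from TitanicData.xlsx
toy <- data.frame(
  pclass   = c("First", "First", "First", "First", "Third", "Third"),
  sex      = c("female", "female", "male", "male", "female", "male"),
  survived = c("Survived", "Perished", "Survived", "Survived", "Perished", "Perished")
)

toy_rates <- toy %>%
  count(pclass, sex, survived) %>%  # one row per combination, with the count in n
  group_by(pclass, sex) %>%         # denominator: everyone of that class and sex
  mutate(percent = n / sum(n)) %>%
  ungroup()

# Sanity check: the proportions in each class/sex group should add up to 1
check <- toy_rates %>%
  group_by(pclass, sex) %>%
  summarize(total = sum(percent), .groups = "drop")

check
```

Writing the grouping this way avoids relying on which grouping level summarize() drops by default.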

5

This keeps only the "Survived" rows of the summary table and creates a bar chart of survival rates by passenger class, with one panel for each sex.

pclass_sex_survival_graph <- pclass_sex_survival %>%
  filter(survived == "Survived") %>%
  ggplot(mapping = aes(x = pclass, y = percent, fill = pclass)) +
  geom_col() +
  facet_grid(~sex)

pclass_sex_survival_graph


6

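This improves the graph by adding a title, subtitle, and caption, formatting the y-axis as percentages, removing the axis titles, and hiding the legend. theme_grey() is ggplot2's default theme, so it leaves the overall look unchanged.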

pclass_sex_survival_graph +
  labs(title = "Titanic Survival Rates",
       subtitle = "Percent by Gender and Cabin Class",
       caption = "Source: Encyclopedia Titanica") +
  scale_y_continuous(labels = scales::percent) +
  theme_grey() +
  theme(
    axis.title = element_blank(),
    legend.position = "none"
  )
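
The percent column holds proportions between 0 and 1, and the scales package turns them into percent labels on the axis. As a small sketch of what that formatting step does, the snippet below applies a label function directly to three proportions copied from the class and sex table above. The names `rates` and `fmt` are illustrative, and `accuracy = 0.1` is an assumed setting chosen here to keep one decimal place; the plot code itself uses scales::percent with its defaults.

```r
library(scales)

# Proportions copied from the percent column of the class/sex summary
rates <- c(0.965, 0.341, 0.152)

# label_percent() returns a function that formats proportions as percent strings;
# accuracy = 0.1 (an assumption for this sketch) rounds to one decimal place
fmt <- label_percent(accuracy = 0.1)

fmt(rates)
```

A function built this way could be passed to the labels argument of scale_y_continuous() in place of scales::percent if finer control over rounding is wanted.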