Chunk #1

library(tidyverse): This command loads the core tidyverse packages in R, including ggplot2, lubridate, dplyr, and the others listed in the output below. These packages are useful for data visualization and analysis.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
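
The Conflicts section of the output means that, once the tidyverse is loaded, a bare filter() call refers to dplyr's version rather than the one in stats. A minimal sketch of calling each function by its full name follows; the passengers data frame is a made-up example, not part of the Titanic data.

```r
library(dplyr)

# Hypothetical three-row data frame used only for illustration
passengers <- data.frame(
  name = c("A", "B", "C"),
  age  = c(29, 2, 48)
)

# Explicit namespace: always dplyr's row filter, regardless of load order
adults <- dplyr::filter(passengers, age >= 18)

# stats::filter() is still reachable by its full name;
# here it applies a 2-point moving average to the age column
smoothed <- stats::filter(passengers$age, rep(1 / 2, 2))

adults
```

Writing the package prefix is optional here, but it makes clear which filter() is meant.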

Chunk #2

titanicdata <- readxl::read_excel("TitanicData.xlsx"): This line of code reads the Excel file TitanicData.xlsx and assigns the result to a variable called titanicdata. From here on, the dataset can be used through the titanicdata variable, which shows up as a dataset (a tibble) in R.

head(titanicdata): This line of code returns the first 6 rows of the titanicdata dataset.

titanicdata <- readxl::read_excel("TitanicData.xlsx")
head(titanicdata)
## # A tibble: 6 × 6
##   pclass survived name                                         sex     age  fare
##   <chr>  <chr>    <chr>                                        <chr> <dbl> <dbl>
## 1 First  Survived Allen, Miss. Elisabeth Walton                fema… 29    211. 
## 2 First  Survived Allison, Master. Hudson Trevor               male   0.92 152. 
## 3 First  Perished Allison, Miss. Helen Loraine                 fema…  2    152. 
## 4 First  Perished Allison, Mr. Hudson Joshua Creighton         male  30    152. 
## 5 First  Perished Allison, Mrs. Hudson J C (Bessie Waldo Dani… fema… 25    152. 
## 6 First  Survived Anderson, Mr. Harry                          male  48     26.6
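
Six rows is only the default for head(). As a small sketch, the n argument changes how many rows come back. The sample data frame below copies three columns from the six rows printed above, so the Excel file is not needed to run it; the name sample_rows is an assumption made for this example.

```r
# Three columns copied from the head(titanicdata) output above
sample_rows <- data.frame(
  pclass   = rep("First", 6),
  survived = c("Survived", "Survived", "Perished", "Perished", "Perished", "Survived"),
  age      = c(29, 0.92, 2, 30, 25, 48)
)

# Ask for the first 3 rows instead of the default 6
first_three <- head(sample_rows, n = 3)

# tail() works the same way from the other end
last_two <- tail(sample_rows, n = 2)

first_three
```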

Chunk #3

pclass_survival <- titanicdata %>% group_by(pclass, survived) %>% summarize(n = n()) %>% mutate(percent = n / sum(n))

pclass_survival

This code chunk creates a new variable called pclass_survival. It groups the titanicdata dataset by pclass and survived, then summarizes each group by counting its passengers (column n). After that, mutate() adds a new column called percent. Because summarize() drops the last grouping variable (survived), the result is still grouped by pclass, so n / sum(n) is the share of passengers in each class who perished or survived.

The line pclass_survival then displays the new dataset.

## `summarise()` has grouped output by 'pclass'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 4
## # Groups:   pclass [3]
##   pclass survived     n percent
##   <chr>  <chr>    <int>   <dbl>
## 1 First  Perished   123   0.381
## 2 First  Survived   200   0.619
## 3 Second Perished   158   0.570
## 4 Second Survived   119   0.430
## 5 Third  Perished   528   0.745
## 6 Third  Survived   181   0.255
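
To check where the percent column comes from, the sketch below rebuilds it from the counts printed above instead of the Excel file. It assumes only those six counts; the names counts, pclass_check, and class_total are made up for this example.

```r
library(dplyr)
library(tibble)

# Counts copied from the pclass_survival output above
counts <- tibble(
  pclass   = c("First", "First", "Second", "Second", "Third", "Third"),
  survived = c("Perished", "Survived", "Perished", "Survived", "Perished", "Survived"),
  n        = c(123L, 200L, 158L, 119L, 528L, 181L)
)

# Grouping by pclass makes sum(n) the class total,
# which is the grouping that summarize() leaves behind
pclass_check <- counts %>%
  group_by(pclass) %>%
  mutate(class_total = sum(n), percent = n / class_total) %>%
  ungroup()

pclass_check
```

For first class the total is 123 + 200 = 323, so the Survived share is 200 / 323, which rounds to the 0.619 shown in the output.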

Chunk #4

pclass_sex_survival <- titanicdata %>% group_by(pclass, sex, survived) %>% summarize(n = n()) %>% mutate(percent = n / sum(n))

This code chunk creates a new variable called pclass_sex_survival. It groups the titanicdata dataset by pclass, sex, and survived, then summarizes each group by counting its passengers (column n). After that, mutate() adds a new column called percent. Because summarize() drops the last grouping variable (survived), the result is still grouped by pclass and sex, so n / sum(n) is the share of passengers of each class and sex who perished or survived.

## `summarise()` has grouped output by 'pclass', 'sex'. You can override using the
## `.groups` argument.
## # A tibble: 12 × 5
## # Groups:   pclass, sex [6]
##    pclass sex    survived     n percent
##    <chr>  <chr>  <chr>    <int>   <dbl>
##  1 First  female Perished     5  0.0347
##  2 First  female Survived   139  0.965 
##  3 First  male   Perished   118  0.659 
##  4 First  male   Survived    61  0.341 
##  5 Second female Perished    12  0.113 
##  6 Second female Survived    94  0.887 
##  7 Second male   Perished   146  0.854 
##  8 Second male   Survived    25  0.146 
##  9 Third  female Perished   110  0.509 
## 10 Third  female Survived   106  0.491 
## 11 Third  male   Perished   418  0.848 
## 12 Third  male   Survived    75  0.152
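
The message about grouped output can be silenced by stating the grouping choice explicitly. The sketch below passes .groups = "drop_last" to summarize(), which is the same behavior the message describes. The toy data frame is a hypothetical six-passenger sample, not rows from TitanicData.xlsx.

```r
library(dplyr)
library(tibble)

# Hypothetical six-passenger sample, NOT taken from TitanicData.xlsx
toy <- tibble(
  pclass   = c("First", "First", "First", "Third", "Third", "Third"),
  sex      = c("female", "female", "male", "male", "male", "male"),
  survived = c("Survived", "Survived", "Perished", "Perished", "Perished", "Survived")
)

toy_summary <- toy %>%
  group_by(pclass, sex, survived) %>%
  # drop_last keeps the pclass and sex grouping and prints no message
  summarize(n = n(), .groups = "drop_last") %>%
  # sum(n) is therefore the total for each pclass and sex pair
  mutate(percent = n / sum(n)) %>%
  ungroup()

toy_summary
```

Within each pclass and sex pair the percent values add up to 1, the same pattern as in the real output above.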

Chunk #5

pclass_sex_survival_graph <- pclass_sex_survival %>% filter(survived == "Survived") %>% ggplot(mapping = aes(x = pclass, y = percent, fill = pclass)) + geom_col() + facet_grid(~sex)

This code chunk creates a new variable called pclass_sex_survival_graph and uses the previous dataset, pclass_sex_survival, to create a graph. The filter() step keeps only the Survived rows, so the graph does not show the passengers who did not survive. The ggplot() call sets pclass as the x axis and percent as the y axis, and fills each bar with a color based on its class. geom_col() draws the data as a bar chart, and facet_grid(~sex) separates the bars into one panel per sex.
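
The same chart can be sketched without the Excel file by typing in the Survived counts and group totals from the Chunk #4 output. This is an illustration under that assumption, not the original pipeline; the names survived_only and sketch_graph are made up for this example.

```r
library(ggplot2)

# Survived counts and group totals copied from the Chunk #4 output
survived_only <- data.frame(
  pclass   = rep(c("First", "Second", "Third"), each = 2),
  sex      = rep(c("female", "male"), times = 3),
  survived = c(139, 61, 94, 25, 106, 75),
  total    = c(144, 179, 106, 171, 216, 493)
)
survived_only$percent <- survived_only$survived / survived_only$total

# One bar per class, colored by class, with one panel per sex
sketch_graph <- ggplot(survived_only, aes(x = pclass, y = percent, fill = pclass)) +
  geom_col() +
  facet_grid(~sex)

sketch_graph
```

Each panel holds three bars, one for each class, so the two panels can be compared side by side.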

Chunk #6

pclass_sex_survival_graph + labs(title = "Titanic Survival Rates", subtitle = "Percent by Gender and Cabin Class", caption = "Source: Encyclopedia Titanica") + scale_y_continuous(labels = scales::percent) + theme_grey() + theme(axis.title = element_blank(), legend.position = "none")

This code chunk adds more details to the bar chart from the previous chunk. A title, subtitle, and caption are added, and the y-axis labels are converted into percentages. theme_grey() applies ggplot2's default grey theme, and the theme() call removes the axis titles and hides the legend.
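
To see what the percent labels look like on their own, the sketch below formats a single value with scales::label_percent(), which builds a labelling function of the kind scale_y_continuous() accepts. The value is the first-class survival share from the Chunk #3 output; the variable names are made up for this example.

```r
library(scales)

# Share of first-class passengers who survived, from the Chunk #3 counts
first_class_survival <- 200 / 323

# accuracy = 1 rounds to whole percents; accuracy = 0.1 keeps one decimal place
whole_percent <- label_percent(accuracy = 1)(first_class_survival)
one_decimal   <- label_percent(accuracy = 0.1)(first_class_survival)

whole_percent
one_decimal
```

If the axis labels need a fixed number of decimals, labels = scales::label_percent(accuracy = 1) could be passed to scale_y_continuous() in place of scales::percent.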