This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
You can also embed plots, for example:
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
This loads the tidyverse, a collection of packages (including dplyr and ggplot2) for data manipulation and visualization that will be used in this analysis.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
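This reads the Titanic dataset from an Excel file and displays the first few rows so we can see the structure of the data. The readxl package is installed with the tidyverse but is not attached by library(tidyverse), so the function is called with the readxl:: prefix.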
titanicdata <- readxl::read_excel("TitanicData.xlsx")
head(titanicdata)
## # A tibble: 6 × 6
##   pclass survived name                                         sex     age  fare
##   <chr>  <chr>    <chr>                                        <chr> <dbl> <dbl>
## 1 First  Survived Allen, Miss. Elisabeth Walton                fema… 29    211.
## 2 First  Survived Allison, Master. Hudson Trevor               male   0.92 152.
## 3 First  Perished Allison, Miss. Helen Loraine                 fema…  2    152.
## 4 First  Perished Allison, Mr. Hudson Joshua Creighton         male  30    152.
## 5 First  Perished Allison, Mrs. Hudson J C (Bessie Waldo Dani… fema… 25    152.
## 6 First  Survived Anderson, Mr. Harry                          male  48     26.6
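This groups the data by passenger class and survival status, counts the passengers in each group, and calculates each outcome's share of its class. Because summarize() drops only the last grouping variable (survived), the result is still grouped by pclass, so sum(n) inside mutate() is the class total and percent is a proportion between 0 and 1.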
pclass_survival <- titanicdata %>%
  group_by(pclass, survived) %>%
  summarize(n = n()) %>%
  mutate(percent = n / sum(n))
## `summarise()` has grouped output by 'pclass'. You can override using the
## `.groups` argument.
pclass_survival
## # A tibble: 6 × 4
## # Groups:   pclass [3]
##   pclass survived     n percent
##   <chr>  <chr>    <int>   <dbl>
## 1 First  Perished   123   0.381
## 2 First  Survived   200   0.619
## 3 Second Perished   158   0.570
## 4 Second Survived   119   0.430
## 5 Third  Perished   528   0.745
## 6 Third  Survived   181   0.255
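The grouping message printed above appears because summarize() drops only the last grouping variable. As a minimal sketch, the same pipeline can state that behavior explicitly with the `.groups` argument, which silences the message without changing the result. The `toy` tibble below is made up for illustration and is not the Titanic data.

```r
library(dplyr)

# Hypothetical toy data, not the Titanic file
toy <- tibble::tibble(
  pclass   = c("First", "First", "First", "Second", "Second"),
  survived = c("Survived", "Survived", "Perished", "Perished", "Survived")
)

toy_summary <- toy %>%
  group_by(pclass, survived) %>%
  # "drop_last" is what summarize() does by default here;
  # naming it makes the remaining grouping (pclass) explicit
  summarize(n = n(), .groups = "drop_last") %>%
  mutate(percent = n / sum(n))

# The proportions add up to 1 within each class
class_totals <- toy_summary %>%
  summarize(total = sum(percent))
class_totals
```

Using `.groups = "drop"` instead would return an ungrouped table, and sum(n) would then be the total across all classes, which is a different calculation.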
This groups the data by passenger class, sex, and survival status. It counts the passengers in each group and calculates each outcome's share within its class-and-sex group, so the two percent values for, say, First-class women add up to 1.
pclass_sex_survival <- titanicdata %>%
  group_by(pclass, sex, survived) %>%
  summarize(n = n()) %>%
  mutate(percent = n / sum(n))
## `summarise()` has grouped output by 'pclass', 'sex'. You can override using the
## `.groups` argument.
pclass_sex_survival
## # A tibble: 12 × 5
## # Groups:   pclass, sex [6]
##    pclass sex    survived     n percent
##    <chr>  <chr>  <chr>    <int>   <dbl>
##  1 First  female Perished     5  0.0347
##  2 First  female Survived   139  0.965
##  3 First  male   Perished   118  0.659
##  4 First  male   Survived    61  0.341
##  5 Second female Perished    12  0.113
##  6 Second female Survived    94  0.887
##  7 Second male   Perished   146  0.854
##  8 Second male   Survived    25  0.146
##  9 Third  female Perished   110  0.509
## 10 Third  female Survived   106  0.491
## 11 Third  male   Perished   418  0.848
## 12 Third  male   Survived    75  0.152
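As a quick check on how percent is computed, the first two rows of the output can be reproduced by hand. This sketch uses only the counts printed above for First-class women; the variable names are introduced here for illustration.

```r
# Counts copied from rows 1 and 2 of the output above
perished <- 5
survived <- 139

# Each share is the count divided by the group total (5 + 139 = 144)
share_perished <- round(perished / (perished + survived), 4)
share_survived <- round(survived / (perished + survived), 3)

share_perished  # 0.0347
share_survived  # 0.965
```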
This keeps only the "Survived" rows of the summary table and creates a bar chart of survival rates by passenger class, with one panel per sex.
pclass_sex_survival_graph <- pclass_sex_survival %>%
  filter(survived == "Survived") %>%
  ggplot(mapping = aes(x = pclass, y = percent, fill = pclass)) +
  geom_col() +
  facet_grid(~sex)
pclass_sex_survival_graph
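For readers who do not have TitanicData.xlsx, the following sketch rebuilds an equivalent chart from the six "Survived" rows printed in the pclass_sex_survival output. The names `survivors` and `survivors_graph` are introduced here for illustration, and the percent values are the rounded ones from that output, so bar heights agree with the original only to three decimals.

```r
library(ggplot2)

# Survivor rows retyped from the pclass_sex_survival output
survivors <- tibble::tribble(
  ~pclass,  ~sex,     ~n,  ~percent,
  "First",  "female", 139, 0.965,
  "First",  "male",    61, 0.341,
  "Second", "female",  94, 0.887,
  "Second", "male",    25, 0.146,
  "Third",  "female", 106, 0.491,
  "Third",  "male",    75, 0.152
)

# Same layers as the original chart: one bar per class, one panel per sex
survivors_graph <- ggplot(survivors, aes(x = pclass, y = percent, fill = pclass)) +
  geom_col() +
  facet_grid(~sex)

survivors_graph
```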
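This improves the graph by adding a title, subtitle, and caption, formatting the y-axis as percentages, and removing the axis titles and the redundant legend. theme_grey() is ggplot2's default theme, so that call leaves the overall look unchanged.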
pclass_sex_survival_graph +
  labs(title = "Titanic Survival Rates",
       subtitle = "Percent by Gender and Cabin Class",
       caption = "Source: Encyclopedia Titanica") +
  scale_y_continuous(labels = scales::percent) +
  theme_grey() +
  theme(
    axis.title = element_blank(),
    legend.position = "none"
  )
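The labels argument takes a function that converts axis break values into text. As a small illustration of what happens to the proportions on the y-axis, this sketch uses scales::label_percent() with an explicit accuracy. The accuracy setting and the names `to_percent` and `axis_labels` are assumptions added here so the rounding is predictable; the original code passes scales::percent with its defaults.

```r
# label_percent() returns a labelling function;
# accuracy = 1 rounds to whole percentages
to_percent <- scales::label_percent(accuracy = 1)

# Apply it to proportions such as those ggplot2 would place on the axis
axis_labels <- to_percent(c(0, 0.25, 0.5, 0.75, 1))
axis_labels
```

The same function could be passed directly as scale_y_continuous(labels = to_percent).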