library(tidyverse): This command loads the core tidyverse packages in R, including ggplot2, dplyr, lubridate, and the others listed in the startup message below. These packages are used for data import, wrangling, analysis, and visualization.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
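The Conflicts lines above mean that, once dplyr is loaded, a plain call to filter() or lag() runs the dplyr version rather than the stats version. A minimal sketch of how the package::function() form picks one explicitly, using R's built-in mtcars dataset as a stand-in (it is not part of the Titanic analysis):

```r
library(dplyr)

# dplyr::filter() keeps the rows of a data frame that match a condition.
four_cyl <- dplyr::filter(mtcars, cyl == 4)

# stats::filter() is a different function: it applies a linear filter
# (here a 3-point moving average) to a numeric series.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
```

The same package::function() form is what readxl::read_excel() uses below, which is why readxl does not need its own library() call.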
titanicdata <- readxl::read_excel("TitanicData.xlsx"): This line reads the Excel file TitanicData.xlsx with read_excel() from the readxl package and assigns the result to a variable called titanicdata. From here on, the dataset can be used through the titanicdata variable, which R stores as a tibble (a type of data frame).
head(titanicdata): This line of code returns the first 6 rows of the titanicdata dataset.
titanicdata <- readxl::read_excel("TitanicData.xlsx")
head(titanicdata)
## # A tibble: 6 × 6
## pclass survived name sex age fare
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 First Survived Allen, Miss. Elisabeth Walton fema… 29 211.
## 2 First Survived Allison, Master. Hudson Trevor male 0.92 152.
## 3 First Perished Allison, Miss. Helen Loraine fema… 2 152.
## 4 First Perished Allison, Mr. Hudson Joshua Creighton male 30 152.
## 5 First Perished Allison, Mrs. Hudson J C (Bessie Waldo Dani… fema… 25 152.
## 6 First Survived Anderson, Mr. Harry male 48 26.6
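head() returns 6 rows by default, and its n argument changes that number. A short sketch, again using the built-in mtcars dataset as a stand-in for titanicdata:

```r
# mtcars is a dataset that ships with R; it stands in for titanicdata here.
default_rows <- head(mtcars)         # first 6 rows (the default)
first_three  <- head(mtcars, n = 3)  # first 3 rows
last_two     <- tail(mtcars, n = 2)  # tail() works the same way from the end
```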
pclass_survival <- titanicdata %>% group_by(pclass, survived) %>% summarize(n = n()) %>% mutate(percent = n / sum(n))
pclass_survival
This code chunk creates a new variable called pclass_survival from the titanicdata dataset. group_by(pclass, survived) groups the rows by every combination of passenger class and survival status, and summarize(n = n()) counts the passengers in each combination (column n). Because summarize() drops the last grouping variable (survived), the result is still grouped by pclass, so mutate() adds a new column called percent that divides each count by the total for its class. For example, 200 of the 323 first-class passengers survived, which gives 0.619.
The line pclass_survival then displays the new dataset.
## `summarise()` has grouped output by 'pclass'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 4
## # Groups: pclass [3]
## pclass survived n percent
## <chr> <chr> <int> <dbl>
## 1 First Perished 123 0.381
## 2 First Survived 200 0.619
## 3 Second Perished 158 0.570
## 4 Second Survived 119 0.430
## 5 Third Perished 528 0.745
## 6 Third Survived 181 0.255
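The percent column can be checked by hand from the counts printed above. The sketch below rebuilds the table from those counts (so it does not need the Excel file) and groups by pclass only, which is the grouping that is left after summarize(). The names counts, by_class, and class_totals are illustrative only.

```r
library(dplyr)

# Counts copied from the printed pclass_survival table.
counts <- tibble(
  pclass   = rep(c("First", "Second", "Third"), each = 2),
  survived = rep(c("Perished", "Survived"), times = 3),
  n        = c(123L, 200L, 158L, 119L, 528L, 181L)
)

# Within each class, divide each count by the class total.
by_class <- counts %>%
  group_by(pclass) %>%
  mutate(percent = n / sum(n))

# The shares within each class should add up to 1.
class_totals <- by_class %>%
  summarize(total = sum(percent))
```

Since the denominator is the class total, the two percent values for each class add up to 1, rather than all six rows adding up to 1.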
pclass_sex_survival <- titanicdata %>% group_by(pclass, sex, survived) %>% summarize(n = n()) %>% mutate(percent = n / sum(n))
This code chunk creates a new variable called pclass_sex_survival from the titanicdata dataset. group_by(pclass, sex, survived) groups the rows by every combination of passenger class, sex, and survival status, and summarize(n = n()) counts the passengers in each combination (column n). The result stays grouped by pclass and sex, so mutate() adds a new column called percent that divides each count by the total for its class and sex. For example, 139 of the 144 first-class women survived, which gives 0.965.
## `summarise()` has grouped output by 'pclass', 'sex'. You can override using the
## `.groups` argument.
## # A tibble: 12 × 5
## # Groups: pclass, sex [6]
## pclass sex survived n percent
## <chr> <chr> <chr> <int> <dbl>
## 1 First female Perished 5 0.0347
## 2 First female Survived 139 0.965
## 3 First male Perished 118 0.659
## 4 First male Survived 61 0.341
## 5 Second female Perished 12 0.113
## 6 Second female Survived 94 0.887
## 7 Second male Perished 146 0.854
## 8 Second male Survived 25 0.146
## 9 Third female Perished 110 0.509
## 10 Third female Survived 106 0.491
## 11 Third male Perished 418 0.848
## 12 Third male Survived 75 0.152
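The message about the `.groups` argument is informational: summarize() is reporting that it dropped the last grouping variable. One way to make that choice explicit, and silence the message, is to pass .groups = "drop_last". This is a sketch on the built-in mtcars dataset with stand-in columns cyl and am, not the Titanic data:

```r
library(dplyr)

cyl_am <- mtcars %>%
  group_by(cyl, am) %>%
  # Drop only the last grouping variable (am); the result stays grouped by cyl.
  summarize(n = n(), .groups = "drop_last") %>%
  # Share of each am value within its cyl group.
  mutate(percent = n / sum(n))
```

The counts and percentages are the same as without the argument; only the message goes away.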
pclass_sex_survival_graph <- pclass_sex_survival %>% filter(survived == "Survived") %>% ggplot(mapping = aes(x = pclass, y = percent, fill = pclass)) + geom_col() + facet_grid(~sex)
This code chunk creates a new variable called pclass_sex_survival_graph and uses the previous dataset, pclass_sex_survival, to create a graph. filter(survived == "Survived") keeps only the rows for passengers who survived, so the graph does not show the passengers who perished. The ggplot() call sets pclass as the x axis and percent as the y axis, and fills each bar with a color for its class. geom_col() draws the data as a bar chart, and facet_grid(~sex) splits the chart into one panel per sex.
pclass_sex_survival_graph + labs(title = "Titanic Survival Rates", subtitle = "Percent by Gender and Cabin Class", caption = "Source: Encyclopedia Titanica") + scale_y_continuous(labels = scales::percent) + theme_grey() + theme(axis.title = element_blank(), legend.position = "none")
This code chunk builds on the previous bar chart by adding more details. A title, subtitle, and caption are added, and the y-axis labels are formatted as percentages. theme_grey() applies ggplot2's default grey theme, the axis titles are removed, and the legend is hidden.
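For readers without TitanicData.xlsx, the chart can be approximated from the survival percentages printed earlier. The sketch below types in the six "Survived" values from the pclass_sex_survival output (rounded to three decimals, so bar heights are approximate) and applies the same layers. The names survived_share and survival_plot are illustrative only.

```r
library(ggplot2)

# Survival shares copied from the "Survived" rows of the printed table.
survived_share <- data.frame(
  pclass  = rep(c("First", "Second", "Third"), each = 2),
  sex     = rep(c("female", "male"), times = 3),
  percent = c(0.965, 0.341, 0.887, 0.146, 0.491, 0.152)
)

survival_plot <- ggplot(survived_share, aes(x = pclass, y = percent, fill = pclass)) +
  geom_col() +                                    # one bar per class
  facet_grid(~sex) +                              # one panel per sex
  labs(
    title    = "Titanic Survival Rates",
    subtitle = "Percent by Gender and Cabin Class",
    caption  = "Source: Encyclopedia Titanica"
  ) +
  scale_y_continuous(labels = scales::percent) +  # show 0.5 as 50%
  theme(axis.title = element_blank(), legend.position = "none")

# Run print(survival_plot) to draw the chart.
```

The legend is hidden here for the same reason as in the original chunk: the x axis already names each class, so the fill colors need no separate key.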