For this assignment, I used the following libraries:
library(tidyverse)
Download and import data from https://public.tableau.com/en-us/s/resources
url <- "https://public.tableau.com/s/sites/default/files/media/titanic%20passenger%20list.csv"
dest <- "titanic passenger list.csv"
download.file(url, dest)
Import data as titanic
titanic <- read_csv("titanic passenger list.csv", show_col_types = FALSE)
To find the mean age of all passengers, I ignored any missing (NA) values and rounded the result to the nearest tenth.
mean_all <- round(mean(titanic$age, na.rm=TRUE), digits = 1)
count_all <- nrow(titanic)
count_age_only <- NROW(na.omit(titanic$age))
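As a small illustration of why na.rm matters here, the sketch below uses a made-up vector of ages (hypothetical values, not taken from the passenger list). It also counts the known values with sum(!is.na(...)), which is just one alternative way to count non-missing entries.

```r
# Hypothetical ages for illustration only (not from the Titanic data)
ages <- c(22, NA, 38, 26.5)

# Without na.rm, one missing value makes the whole mean NA
mean_with_na <- mean(ages)

# With na.rm = TRUE, the mean uses only 22, 38 and 26.5
mean_known <- round(mean(ages, na.rm = TRUE), digits = 1)

# length() counts every entry, including the NA
count_total <- length(ages)

# sum(!is.na()) counts only the entries that are not missing
count_known <- sum(!is.na(ages))

print(c(mean_known = mean_known, count_total = count_total, count_known = count_known))
```

With these made-up values, the rounded mean should come out at 28.8, based on 3 of the 4 entries.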
To find the mean age and count of all female passengers, I used the filter function from dplyr (part of the tidyverse) to create a new data set called titanic_f.
titanic_f <- filter(titanic, sex=="female")
mean_f <- round(mean(titanic_f$age, na.rm=TRUE), digits = 1)
count_f <- nrow(titanic_f)
count_f_age_only <- NROW(na.omit(titanic_f$age))
I did the same for the male passengers, creating a new data set called titanic_m.
titanic_m <- filter(titanic, sex=="male")
mean_m <- round(mean(titanic_m$age, na.rm=TRUE), digits = 1)
count_m <- nrow(titanic_m)
count_m_age_only <- NROW(na.omit(titanic_m$age))
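The female and male steps repeat the same three calculations, so they could probably also be done in one pass with dplyr's group_by and summarise. The sketch below is only an illustration of that idea: the toy data frame and the names toy and age_by_sex are hypothetical, and it assumes the real data has the same sex and age columns.

```r
library(dplyr)

# Hypothetical stand-in for the titanic data (not the real passenger list)
toy <- data.frame(
  sex = c("female", "female", "male", "male", "male"),
  age = c(29, NA, 40, 22, NA),
  stringsAsFactors = FALSE
)

# One row per sex: total count, count with a known age, and rounded mean age
age_by_sex <- toy %>%
  group_by(sex) %>%
  summarise(
    count          = n(),
    count_age_only = sum(!is.na(age)),
    mean_age       = round(mean(age, na.rm = TRUE), digits = 1)
  )

print(age_by_sex)
```

Swapping toy for titanic should give the same female and male numbers as the separate filtered data sets, without creating titanic_f and titanic_m.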
In total, the data set lists 1,309 passengers aboard the RMS Titanic, but only 1,046 of them have known ages.
Note: All means have been rounded to the nearest tenth.
I wanted to try putting this data into a table format, so I created a new data frame called age_table, and played around with knitr::kable.
all <- c(count_all, count_age_only, mean_all)
females <- c(count_f, count_f_age_only, mean_f)
males <- c(count_m, count_m_age_only, mean_m)
age_table <- data.frame(all, females, males)
colnames(age_table) <- c("All", "Female", "Male")
rownames(age_table) <- c("Total", "Known Ages", "Known Age Mean")
knitr::kable(age_table, "pipe")
|                | All    | Female | Male  |
|:---------------|-------:|-------:|------:|
| Total          | 1309.0 | 466.0  | 843.0 |
| Known Ages     | 1046.0 | 388.0  | 658.0 |
| Known Age Mean | 29.9   | 28.7   | 30.6  |
Note: I attempted to style this table further, but couldn’t install “kableExtra” for some reason. I also couldn’t figure out how to show the means with one decimal place while keeping the counts at zero decimal places.
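One possible workaround for the decimal issue, offered as a sketch rather than the proper knitr solution: because each column mixes counts with a mean, the whole column gets a single number format. Converting the values to text first, with sprintf, lets each row keep its own number of decimals. The helper fmt_col and the name age_table_fmt are hypothetical, and the numbers are copied from the table above so the example runs on its own.

```r
# Values copied from the table above: total, known ages, known age mean
age_table <- data.frame(
  All    = c(1309, 1046, 29.9),
  Female = c(466, 388, 28.7),
  Male   = c(843, 658, 30.6),
  row.names = c("Total", "Known Ages", "Known Age Mean")
)

# Hypothetical helper: rows 1-2 are counts (0 decimals), row 3 is a mean (1 decimal)
fmt_col <- function(col) {
  c(sprintf("%.0f", col[1:2]), sprintf("%.1f", col[3]))
}

# Apply the helper to every column and keep the original row names
age_table_fmt <- data.frame(
  lapply(age_table, fmt_col),
  row.names = rownames(age_table),
  stringsAsFactors = FALSE
)

print(age_table_fmt)
```

Passing the result to something like knitr::kable(age_table_fmt, "pipe", align = "rrr") should then print the counts without decimals and the means with one. The trade-off is that the cells are now text, so this version is only suitable for display.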
I used ggplot to create and format a histogram, first filtering to include only passengers with known ages.
titanic_ages_only <- filter(titanic, !is.na(age))
ggplot(titanic_ages_only, aes(age)) +
  geom_histogram(binwidth = 5, fill = "steelblue", col = "black", alpha = 0.75) +
  labs(
    title = "Histogram of Titanic Passenger Ages",
    subtitle = "Includes the 1,046 passengers with known ages",
    x = "Age",
    y = "No. of Passengers"
  ) +
  theme_bw()
“If you look in your dictionary you will find: Titans – a race of people vainly striving to overcome the forces of nature. Could anything be more unfortunate than such a name, anything more significant?”
-Arthur Rostron