For this assignment, I used the following libraries:

library(tidyverse)  # provides read_csv, filter, and ggplot, which are used below

Acquire data set

Download and import data from https://public.tableau.com/en-us/s/resources

url <- "https://public.tableau.com/s/sites/default/files/media/titanic%20passenger%20list.csv"
dest <- "titanic passenger list.csv"
download.file(url, dest)

Import data as titanic

titanic <- read_csv("titanic passenger list.csv", show_col_types = FALSE)

Passenger count and mean ages

To find the mean age of all passengers, I ignored any missing (NA) values and rounded the result to the nearest tenth.

mean_all <- round(mean(titanic$age, na.rm=TRUE), digits = 1)
count_all <- nrow(titanic)
count_age_only <- NROW(na.omit(titanic$age))
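
To illustrate why na.rm = TRUE matters here, the following is a minimal sketch on a made-up vector (the `ages` values are hypothetical and not taken from the data set). It also shows sum(!is.na(...)) as another way to count known values, which should give the same result as NROW(na.omit(...)).

```r
# Hypothetical ages, one of them missing (not from the Titanic data)
ages <- c(22, NA, 35, 30.5)

# Without na.rm, a single NA makes the whole mean NA
mean_with_na <- mean(ages)

# With na.rm = TRUE, the NA is dropped before averaging
mean_known <- round(mean(ages, na.rm = TRUE), digits = 1)

# Two equivalent ways to count the non-missing values
n_known_a <- NROW(na.omit(ages))
n_known_b <- sum(!is.na(ages))
```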

To find the mean age and count of all female passengers, I used the filter function from tidyverse/dplyr and created a new data set called titanic_f.

titanic_f <- filter(titanic, sex=="female")
mean_f <- round(mean(titanic_f$age, na.rm=TRUE), digits = 1)
count_f <- NROW(na.omit(titanic_f$name))
count_f_age_only <- NROW(na.omit(titanic_f$age))

I did the same for the male passengers, creating a new data set called titanic_m.

titanic_m <- filter(titanic, sex=="male")
mean_m <- round(mean(titanic_m$age, na.rm=TRUE), digits = 1)
count_m <- NROW(na.omit(titanic_m$name))
count_m_age_only <- NROW(na.omit(titanic_m$age))
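
As a possible alternative to filtering each group separately, the same three numbers per sex could be computed in one step with dplyr's group_by and summarise. This is only a sketch on a small made-up data frame (`toy` and its values are hypothetical stand-ins for titanic), assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for the titanic data, with the same column names
toy <- data.frame(
  sex = c("female", "female", "male", "male", "male"),
  age = c(29, NA, 40, 20, NA)
)

# One row per sex: total count, count with known age, and rounded mean age
toy_summary <- toy %>%
  group_by(sex) %>%
  summarise(
    count = n(),
    count_age_only = sum(!is.na(age)),
    mean_age = round(mean(age, na.rm = TRUE), digits = 1)
  )

print(toy_summary)
```

Swapping `toy` for titanic should produce one row per value of sex, which can be compared against the separately computed counts and means.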

Summary of passenger count and mean ages

In total, the data set lists 1309 passengers aboard the RMS Titanic, but only 1046 of them have known ages.

  • All passengers: 1309
    • Passengers with known ages: 1046
      • Mean age: 29.9
  • Female passengers: 466
    • Females with known ages: 388
      • Female mean age: 28.7
  • Male passengers: 843
    • Males with known ages: 658
      • Male mean age: 30.6

Note: All means have been rounded to the nearest tenth.

Table using knitr::kable

I wanted to try putting this data into a table format, so I created a new data frame called age_table, and played around with knitr::kable.

all <- c(count_all, count_age_only, mean_all)
females <- c(count_f, count_f_age_only, mean_f)
males <- c(count_m, count_m_age_only, mean_m)
age_table <- data.frame(all, females, males)
colnames(age_table) <- c("All", "Female", "Male") 
rownames(age_table) <- c("Total", "Known Ages", "Known Age Mean")
knitr::kable(age_table, "pipe")
|               |    All| Female|  Male|
|:--------------|------:|------:|-----:|
|Total          | 1309.0|  466.0| 843.0|
|Known Ages     | 1046.0|  388.0| 658.0|
|Known Age Mean |   29.9|   28.7|  30.6|

Note: I attempted to further style this table, but can’t install “kableExtra” for some reason. I also can’t figure out how to only show the mean numbers with 1 decimal and keep the others at 0.
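
One possible way around the decimal problem is to format each row as text before handing the table to knitr::kable. Each column of a data frame has a single numeric format, which is why kable shows 1309.0. The sketch below hard-codes the numbers from the summary above and uses a hypothetical name, `age_table_fmt`, so it is an illustration rather than a drop-in replacement.

```r
# Values copied from the summary above (hard-coded for illustration)
counts <- c(1309, 466, 843)
known  <- c(1046, 388, 658)
means  <- c(29.9, 28.7, 30.6)

# Format each row separately: whole numbers for counts, one decimal for means
age_table_fmt <- rbind(
  sprintf("%d", as.integer(counts)),
  sprintf("%d", as.integer(known)),
  sprintf("%.1f", means)
)
dimnames(age_table_fmt) <- list(
  c("Total", "Known Ages", "Known Age Mean"),
  c("All", "Female", "Male")
)

# Print with kable if knitr is available; right-align the three text columns
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(age_table_fmt, format = "pipe", align = "rrr"))
}
```

The trade-off is that the cells are now character strings, so this formatting step belongs at the very end, after all calculations are done.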

Histogram of ages

I used ggplot to create and format a histogram, first filtering to only include passengers with known ages.

titanic_ages_only <- filter(titanic, !is.na(age))  # keep only rows where age is known
ggplot(titanic_ages_only, aes(age)) +
  geom_histogram(binwidth = 5, fill = "steelblue", col = "black", alpha = 0.75) +
  labs(
    title = "Histogram of Titanic Passenger Ages",
    subtitle = "Includes the 1,046 passengers with known ages",
    x = "Age",
    y = "No. of Passengers"
  ) +
  theme_bw()

“If you look in your dictionary you will find: Titans – a race of people vainly striving to overcome the forces of nature. Could anything be more unfortunate than such a name, anything more significant?”

-Arthur Rostron