Assignment 2 Solution

suppressPackageStartupMessages(library(tidyverse))

# Increases font size for all ggplot2 plots
theme_set(theme_gray(base_size=18))

# List of colors for customizing plots
colors <- c("#1f77b4","#ff7f0e", "#2ca02c", "#d62728",
            "#9467bd","#8c564b", "#e377c2", "#7f7f7f",
            "#bcbd22", "#17becf")

titanic <- read.csv(file.choose())
head(titanic, 10)

##    survived pclass    sex age sibsp parch    fare embarked  class   who
## 1         0      3   male  22     1     0  7.2500        S  Third   man
## 2         1      1 female  38     1     0 71.2833        C  First woman
## 3         1      3 female  26     0     0  7.9250        S  Third woman
## 4         1      1 female  35     1     0 53.1000        S  First woman
## 5         0      3   male  35     0     0  8.0500        S  Third   man
## 6         0      3   male  NA     0     0  8.4583        Q  Third   man
## 7         0      1   male  54     0     0 51.8625        S  First   man
## 8         0      3   male   2     3     1 21.0750        S  Third child
## 9         1      3 female  27     0     2 11.1333        S  Third woman
## 10        1      2 female  14     1     0 30.0708        C Second child
##    adult_male deck embark_town alive alone
## 1        TRUE      Southampton    no FALSE
## 2       FALSE    C   Cherbourg   yes FALSE
## 3       FALSE      Southampton   yes  TRUE
## 4       FALSE    C Southampton   yes FALSE
## 5        TRUE      Southampton    no  TRUE
## 6        TRUE       Queenstown    no  TRUE
## 7        TRUE    E Southampton    no  TRUE
## 8       FALSE      Southampton    no FALSE
## 9       FALSE      Southampton   yes FALSE
## 10      FALSE        Cherbourg   yes FALSE

# Subset the titanic dataset to include first class passengers who embarked in Southampton


# Using base R
firstSouth <- titanic[titanic$class == "First" & titanic$embarked == "S",]

# Subset the titanic dataset to include second and third class passengers

# Using base R
secondThird <- titanic[titanic$pclass == 2 | titanic$pclass == 3,]
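
The same two subsets can also be written with subset() and %in%. The sketch below runs on a small made-up data frame (the rows in toy are hypothetical, not taken from the Titanic file) so that it works without the CSV. One difference worth knowing: subset() silently drops rows where the condition is NA, while bracket indexing returns a row of NAs for them.

```r
# Toy data (hypothetical rows, invented for illustration only)
toy <- data.frame(
  pclass   = c(1, 1, 2, 3, 3, 1),
  class    = c("First", "First", "Second", "Third", "Third", "First"),
  embarked = c("S", "C", "S", "Q", "S", "S"),
  stringsAsFactors = FALSE
)

# Same condition as the bracket form, written with subset()
first_south <- subset(toy, class == "First" & embarked == "S")

# %in% replaces the chained `== 2 | == 3` comparison and never returns NA
second_third <- toy[toy$pclass %in% c(2, 3), ]

nrow(first_south)
nrow(second_third)
```

Either form should give the same rows as the bracket version when the columns contain no missing values.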

firstSouth %>%
    group_by(class, sex) %>%
    summarize(n=n(), .groups="drop_last") %>%
    spread(sex, n)

## # A tibble: 1 × 3
## # Groups:   class [1]
##   class female  male
##   <chr>  <int> <int>
## 1 First     48    79

secondThird %>%
    group_by(pclass, alive) %>%
    summarize(n=n(), .groups="drop_last") %>%
    spread(alive, n)

## # A tibble: 2 × 3
## # Groups:   pclass [2]
##   pclass    no   yes
##    <int> <int> <int>
## 1      2    97    87
## 2      3   372   119
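
As a quick cross-check, a count table like the one above can also be produced with base R's table(). This is only a sketch on made-up data (the toy data frame below is hypothetical), not part of the original solution.

```r
# Toy data (hypothetical rows, invented for illustration only)
toy <- data.frame(
  pclass = c(2, 2, 3, 3, 3),
  alive  = c("no", "yes", "no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Cross-tabulate class against survival status
tab <- table(pclass = toy$pclass, alive = toy$alive)

# Look up a single cell by row and column name
tab["3", "no"]

# Add row and column totals to the table
addmargins(tab)
```

Running the same call on secondThird$pclass and secondThird$alive should reproduce the counts from the dplyr pipeline, which makes it a convenient sanity check.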

The two pipelines above use the dplyr and tidyr packages (both loaded with tidyverse) to turn each subset into a small table of counts.

Here is a step-by-step breakdown, using the first pipeline as the example:

group_by(class, sex): Groups the rows by two variables, "class" (passenger class) and "sex". Every later step then works on each class/sex combination separately. The second pipeline does the same with "pclass" and "alive".

summarize(n = n(), .groups = "drop_last"): Within each group, counts the rows with n() and stores the result in a new column called "n". The .groups = "drop_last" argument removes only the last grouping variable ("sex"), so the result is still grouped by "class". That is why the output above reports "Groups: class [1]".

spread(sex, n): Reshapes the data into wide format, with the values of "sex" as column names and the counts ("n") as the cell values. After this step each row represents one passenger class rather than one class/sex combination. spread() still works, but tidyr now recommends pivot_wider() for new code.

The result is a table with one row per passenger class and one count column per category: "female" and "male" for firstSouth (48 and 79), and "no" and "yes" for secondThird. This layout makes it easy to compare the gender split among first class Southampton passengers, or the survival split within second and third class.
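
The steps can be traced on a small example. The sketch below uses a made-up data frame (toy is hypothetical, not the Titanic data) and swaps spread() for pivot_wider(), which tidyr recommends for new code. It also checks which grouping remains after summarize().

```r
suppressPackageStartupMessages(library(dplyr))
library(tidyr)

# Toy data (hypothetical rows, invented for illustration only)
toy <- data.frame(
  class = c("First", "First", "First", "Second", "Second", "Third"),
  sex   = c("female", "male", "male", "female", "male", "male"),
  stringsAsFactors = FALSE
)

# Steps 1 and 2: group, then count the rows in each class/sex combination
counts_long <- toy %>%
  group_by(class, sex) %>%
  summarize(n = n(), .groups = "drop_last")

# "drop_last" removes only `sex`; the result is still grouped by `class`
group_vars(counts_long)

# Step 3: one row per class, one column per sex
counts_wide <- counts_long %>%
  ungroup() %>%
  pivot_wider(names_from = sex, values_from = n)

counts_wide
```

Combinations that never occur in the data (here, a female passenger in "Third") come out as NA in the wide table. pivot_wider() accepts values_fill = 0 if zeros are preferred.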

# Create a bar chart for the first class passengers who embarked in Southampton grouped by sex

firstSouth %>% ggplot(aes(x=class)) + 
    geom_bar(aes(fill=sex), position="dodge") +
    scale_fill_manual(values=colors) + 
    labs(x="Class", y="Count", fill="Sex")

# Create a bar chart for the second and third class passengers grouped by survival status

secondThird %>% ggplot(aes(x=class)) + 
    geom_bar(aes(fill=alive)) +
    scale_fill_manual(values=colors) + 
    labs(x="Class", y="Count", fill="Alive")

Jason Pemberton

2023-09-22