December Coding Challenge

Tony Chen

09/12/2021

1.Who paid for the most expensive ticket price (Fare) for the Titanic, and how much was it?

library(tidyverse)
dat <- read.csv("Titanic_train.csv")

print(slice_max(dat, order_by = Fare))
  PassengerId Survived Pclass                               Name    Sex Age
1         259        1      1                   Ward, Miss. Anna female  35
2         680        1      1 Cardeza, Mr. Thomas Drake Martinez   male  36
3         738        1      1             Lesurer, Mr. Gustave J   male  35
  SibSp Parch   Ticket     Fare       Cabin Embarked
1     0     0 PC 17755 512.3292                    C
2     0     1 PC 17755 512.3292 B51 B53 B55        C
3     0     0 PC 17755 512.3292        B101        C

The highest fare paid is tied between Miss Anna Ward, Mr. Thomas Drake Martinez Cardeza, and Mr. Gustave J Lesurer.

2. Which class (Pclass) was the most likely to survive on the titanic?

print(class_surv <- dat %>%
    group_by(Pclass) %>%
    mutate(sum_surv = sum(Survived), count_class = n(), prop_surv = sum_surv/count_class) %>%
    select(Pclass, sum_surv, count_class, prop_surv) %>%
    slice_head())
# A tibble: 3 × 4
# Groups:   Pclass [3]
  Pclass sum_surv count_class prop_surv
   <int>    <int>       <int>     <dbl>
1      1      136         216     0.630
2      2       87         184     0.473
3      3      119         491     0.242

The first class cabin has the highest proportion of survivors.

3. Is there a relationship between passenger age and fare price? How might you assess the significance of this relationship? (HINT: a linear model may be helpful)

library(ggplot2)
library(plotly)

summary(lm(Fare ~ Age, data = dat))

Call:
lm(formula = Fare ~ Age, data = dat)

Residuals:
   Min     1Q Median     3Q    Max 
-42.42 -24.49 -17.60   2.33 475.78 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  24.3009     4.4922   5.410 8.64e-08 ***
Age           0.3500     0.1359   2.575   0.0102 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 52.71 on 712 degrees of freedom
  (177 observations deleted due to missingness)
Multiple R-squared:  0.009229,  Adjusted R-squared:  0.007837 
F-statistic: 6.632 on 1 and 712 DF,  p-value: 0.01022
ggplotly(ggplot(dat, aes(Age, Fare)) + geom_point() + geom_smooth(method = "lm",
    formula = y ~ x))

According to the univariate linear regression model, there is a statistically significant relationship between passenger age and fare price (every increase in age by 1 year leads to a $0.35 increase in fare price). However, there is reason to take this with a grain of salt: the R-squared value indicates that age explains very little variance in fare price (which is supported by a visual inspection).

4. Graphical challenge: Please explore the data and make any graph of your choice, with any of the data you which to use to represent an interesting trend, cool graph, wacky colours

dat1 <- dat %>%
    mutate(survjitter = ifelse(Survived == 1, -0.25, 0.25), child = replace_na(ifelse(Age <
        18, 1, 0), 0), child_opanum = ifelse(child == 1, 1, 0.5), child_siznum = ifelse(child ==
        1, 2, 1), Sex = str_to_title(Sex), Surv = ifelse(Survived == 1, "Survived",
        "Perished")) %>%
    group_by(Pclass, Sex) %>%
    mutate(survcount = cumsum(Survived)) %>%
    group_by(Pclass) %>%
    mutate(survprop = survcount/n() * 100) %>%
    mutate(survprop_maxf = max(ifelse(Sex == "Female", survprop, 0)), survprop_maxm = max(ifelse(Sex ==
        "Male", survprop, 0)), survprop_plusmaxf = ifelse(Sex == "Male", survprop +
        survprop_maxf, 0))

plot_ly(dat1) %>%
    add_trace(dat1, y = ~jitter(Pclass + survjitter, 1.75), x = ~Fare, symbol = ~Surv,
        symbols = c("cross-open-dot", "circle-open-dot"), color = ~Sex, colors = c("red",
            "blue"), type = "scatter", mode = "markers", opacity = ~child_opanum,
        size = ~child_siznum, text = ~Name, meta = ~Sex, customdata = ~Age, hovertemplate = paste0("<b>Name</b>: %{text}
                         <b>Sex</b>: %{meta}
                         <b>Age</b>: %{customdata}
                         <b>Fare</b>: $%{x:.2f}")) %>%
    add_trace(data = dat1 %>%
        filter(Sex == "Female"), x = ~survprop, y = ~Pclass, type = "scatter", mode = "lines",
        name = "Female survival rate", line = list(color = "red"), meta = ~survprop_maxf,
        hovertemplate = paste0("<b>Class: %{y}</b>
                            <b>Cumulative survival</b>: %{x}
                            <b>Female survival rate</b>: %{meta:.2f}"),
        xaxis = "x2") %>%
    add_trace(data = dat1 %>%
        filter(Sex == "Male"), x = ~survprop_plusmaxf, y = ~Pclass, type = "scatter",
        mode = "lines", name = "Male survival rate", line = list(color = "blue"),
        meta = ~survprop_maxm, hovertemplate = paste0("<b>Class: %{y}</b>
                            <b>Cumulative survival</b>: %{x}
                            <b>Male survival rate</b>: %{meta:.2f}"),
        xaxis = "x2") %>%
    add_annotations(x = rep(-30, 6), y = seq(0.75, 3.25, by = 0.5), text = rep(rep(c("Survived",
        "Perished"), length.out = length(c(1, 1))), times = 3), textposition = "left center",
        showarrow = F, font = list(size = 10)) %>%
    add_annotations(x = 900, y = 0, text = "iceberg", textposition = "left center",
        xref = "x", yref = "y", ax = 20, ay = -40) %>%
    layout(yaxis = list(range = c(4, -1), autorange = F, autorange = "reversed",
        autotick = F, tickmode = "array", tickvals = c(1, 2, 3), zeroline = F, title = "Cabin Class"),
        xaxis = list(range = c(-50, 600), zeroline = F, tickprefix = "$"), xaxis2 = list(overlaying = "x",
            side = "top", showticklabels = T, scaleanchor = "x", scaleratio = 8.75,
            ticksuffix = "%", rangemode = "tozero", showgrid = F), shapes = list(list(type = "line",
            x0 = -400, x1 = 400, y0 = 3.5, y1 = 3.5, line = list(color = "black")),
            list(type = "line", x0 = -450, x1 = 600, y0 = 0.5, y1 = 0.5, line = list(color = "black")),
            list(type = "line", x0 = -433.3, x1 = 533, y0 = 1.5, y1 = 1.5, line = list(color = "black")),
            list(type = "line", x0 = -416.6, x1 = 466.6, y0 = 2.5, y1 = 2.5, line = list(color = "black")),
            list(type = "line", x0 = 400, x1 = 600, y0 = 3.5, y1 = 0.5, line = list(color = "black")),
            list(type = "line", x0 = -400, x1 = -450, y0 = 3.5, y1 = 0.5, line = list(color = "black")),
            list(type = "rect", x0 = 300, x1 = 400, y0 = 0.5, y1 = -2, line = list(color = "black"),
                fillcolor = "brown"), list(type = "rect", x0 = 100, x1 = 200, y0 = 0.5,
                y1 = -2, line = list(color = "black"), fillcolor = "brown"), list(type = "rect",
                x0 = -100, x1 = 0, y0 = 0.5, y1 = -2, line = list(color = "black"),
                fillcolor = "brown"), list(type = "rect", x0 = -300, x1 = -200, y0 = 0.5,
                y1 = -2, line = list(color = "black"), fillcolor = "brown"), list(type = "rect",
                x0 = 300, x1 = 400, y0 = -1.5, y1 = -2, line = list(color = "black"),
                fillcolor = "black"), list(type = "rect", x0 = 100, x1 = 200, y0 = -1.5,
                y1 = -2, line = list(color = "black"), fillcolor = "black"), list(type = "rect",
                x0 = -100, x1 = 0, y0 = -1.5, y1 = -2, line = list(color = "black"),
                fillcolor = "black"), list(type = "rect", x0 = -300, x1 = -200, y0 = -1.5,
                y1 = -2, line = list(color = "black"), fillcolor = "black"), list(type = "line",
                x0 = 700, x1 = 900, y0 = 1.5, y1 = 0, line = list(color = "lightblue")),
            list(type = "line", x0 = 900, x1 = 1000, y0 = 0, y1 = 1, line = list(color = "lightblue")),
            list(type = "line", x0 = 1000, x1 = 900, y0 = 1, y1 = 3, line = list(color = "lightblue")),
            list(type = "line", x0 = 900, x1 = 700, y0 = 3, y1 = 1.5, line = list(color = "lightblue")),
            list(type = "line", x0 = 553.34, x1 = 20000, y0 = 1.2, y1 = 1.2, line = list(color = "royalblue",
                size = 20)), list(type = "line", x0 = -438, x1 = -20000, y0 = 1.2,
                y1 = 1.2, line = list(color = "royalblue", size = 20))), title = "Women and children first...?",
        margin = list(t = 50)) %>%
    config(modeBarButtonsToRemove = c("autoScale2d", "lasso2d", "select2d"))

Lots to unpack here. Let’s take a deep dive like those treasure salvagers going down in their submersibles in the movie. First, click zoom out in the top right corner a couple of times to appreciate a visual depiction of how I imagine Jack would’ve painted the scene if he had RStudio instead of paintbrushes.

Next, click the little home button to reset the axes. The seafaring principle of “women and children first” was popularized by this event (and an iconic line from the movie), but is apparently a lie according to a quick Google search (“every man for himself” seems to be the proper guideline to follow).

From the survival rate bars, we see more women survived in every cabin class. Children (those under 18, denoted by larger symbols) clearly fared better in First and Second class (only one child perished in First Class, 2 in Second). Once we get to Third, it’s not so clear that children fared much better (more children may have died than survived). For a closer look, make sure to select zoom, then click and drag your cursor over the cabin/area you want to see.