The Myth of The RMS Titanic


Source: Encyclopaedia Britannica, Titanic Ship (Amy Tikkanen, 2019)

Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: VISUALIZATION — LEARNING FROM DISASTER: TITANIC (Loquarts,2016)


Objective and Target Audience

  • The original objective was to ‘determine if with the other features/information about the passengers it is possible to determine those who are likely to survive’ (Loquarts, 2016).

  • The target audience was not stated, but judging from the work as a whole, it is most likely the general public or anyone interested in the myth of the Titanic.

Main Issues

  • Confusing objective: the original objective is unclear until the reader reaches the conclusion
  • Perceptual issues: 1) inconsistent aesthetic representation; 2) visualisations that are not comprehensive enough to show the relations between variables
  • Inadequate conclusion: the conclusion about ticket class 1 is not entirely correct

After Reconstruction

  • Use multivariate visualisation techniques to show the relations
  • Mark all key aesthetic components to make the visualisation easy to read
  • Illustrate the relations between ticket fare and class (higher fare doesn’t guarantee a higher ticket class)
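
The last point can also be checked numerically by comparing fare ranges across classes. The following is a minimal sketch in base R, not part of the original analysis; the `toy` data frame and all of its values are invented for illustration and are not rows from train.csv.

```r
# Hypothetical toy data: the values are invented for illustration only,
# they are NOT taken from train.csv
toy <- data.frame(
  Pclass = c(1, 1, 1, 2, 2, 3, 3, 3),
  Fare   = c(5.0, 30.0, 80.0, 10.5, 26.0, 7.25, 8.05, 56.5)
)

# Minimum and maximum fare within each ticket class
min_fare <- tapply(toy$Fare, toy$Pclass, min)
max_fare <- tapply(toy$Fare, toy$Pclass, max)
print(min_fare)
print(max_fare)

# TRUE when the most expensive class 3 ticket cost more than the cheapest
# class 1 ticket, i.e. a higher fare does not always mean a higher class
fares_overlap <- max(toy$Fare[toy$Pclass == 3]) > min(toy$Fare[toy$Pclass == 1])
fares_overlap
```

Applied to the real data, the same comparison would use `rmstitanic$Fare` and `rmstitanic$Pclass`, matching the class values to however `Pclass` is coded at that point in the script.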

Reference

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(tidyverse)

# import train.csv
rmstitanic <- read.csv("train.csv")
class(rmstitanic)
## [1] "data.frame"
# remove every row that contains a missing value (NA) in any column
rmstitanic <- na.omit(rmstitanic)


# Recode "Survived" with meaningful labels, then convert it to a factor
rmstitanic$Survived[rmstitanic$Survived == 1] <- "Survived"
rmstitanic$Survived[rmstitanic$Survived == 0] <- "Dead"
rmstitanic$Survived <- as.factor(rmstitanic$Survived)
rmstitanic$Survived %>% head()
## [1] Dead     Survived Survived Survived Dead     Dead    
## Levels: Dead Survived
# Recode "Pclass" with meaningful labels, convert it to a factor and order its levels
rmstitanic$Pclass[rmstitanic$Pclass == 1] <- "Class 1" # top class
rmstitanic$Pclass[rmstitanic$Pclass == 2] <- "Class 2"
rmstitanic$Pclass[rmstitanic$Pclass == 3] <- "Class 3" # lowest class
# factor(levels = ...) reorders the levels; assigning to levels() would rename them instead
rmstitanic$Pclass <- factor(rmstitanic$Pclass, levels = c("Class 3", "Class 2", "Class 1"))
rmstitanic$Pclass %>% head()
## [1] Class 3 Class 1 Class 3 Class 1 Class 3 Class 1
## Levels: Class 3 Class 2 Class 1
# rename "Survived" to "SurvivedOrDead" so that the name reflects the variable's meaning more clearly
names(rmstitanic)[names(rmstitanic) == "Survived"] <- "SurvivedOrDead"

# plot four variables in one visualisation to give a more comprehensive understanding of the relations between passenger ticket fare and survival
p1 <- ggplot(data = rmstitanic, aes(x = Age, y = Fare, colour = SurvivedOrDead)) +
      geom_point() +
      facet_grid(. ~ Sex)+ 
      labs(title = "Relations Between Ticket Fares and Survival Situations By Gender",
           x = "Passenger's Age", y = "Passenger's Ticket Fare")
p1

# plot relations between passenger ticket fare and ticket class
p2 <- ggplot(data = rmstitanic, aes(y = Fare, x = Pclass)) +
      geom_boxplot(outlier.shape = NA) +
      geom_jitter(alpha = 2/5) +
      ggtitle("Relations Between Passenger Ticket Fare and Ticket Class") +
      labs(x = "Ticket Class", y = "Passenger's Ticket Fare") +
      stat_summary(fun = mean, colour = "red", geom = "point", shape = 20) + # "fun" replaces the deprecated "fun.y" (ggplot2 >= 3.3.0)
      theme_minimal()
p2
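
A note on ordering the levels of `Pclass`: in R, assigning to `levels()` relabels the existing levels in place, whereas `factor(x, levels = ...)` keeps the values and only changes the level order. The short sketch below shows the difference; the vector `x` is a made-up example and is not taken from train.csv.

```r
# Made-up example vector, not taken from train.csv
x <- factor(c("Class 1", "Class 3", "Class 2", "Class 1"))
levels(x)  # default order is alphabetical

# Reordering: the values stay the same, only the level order changes
reordered <- factor(x, levels = c("Class 3", "Class 2", "Class 1"))
as.character(reordered)

# Relabelling: the first level ("Class 1") is renamed to "Class 3" and the
# last level ("Class 3") is renamed to "Class 1", so the values themselves change
relabelled <- x
levels(relabelled) <- c("Class 3", "Class 2", "Class 1")
as.character(relabelled)
```

For a boxplot axis that should run from the lowest to the highest class, the reordering form is the one that leaves each passenger in their actual class.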

Data Reference

Reconstruction

The following plot fixes the main issues in the original.