Surviving the Titanic

Luc Frachon
2016-02-17

On April 15, 1912, the largest ocean liner of its time sank in the North Atlantic, killing over two thirds of its 2,213 passengers and crew. Interactively explore how different learning algorithms predict survival of passengers.

Problem Statement

Predict survival (1) or death (0) of a passenger based on their Gender, Age and Passenger Class.

Data used:

  • Titanic dataset provided with R, tidied
  • Predictors: Sex (female / male), Age, Pclass (1 / 2 / 3)
  • Outcome: Survived (0 / 1)
  • 25% retained as validation set

[1] "Summary of the training data"
 Survived Pclass      Sex           Age       
 0:318    1:143   female:187   Min.   : 0.42  
 1:218    2:127   male  :349   1st Qu.:20.88  
          3:266                Median :29.00  
                               Mean   :29.78  
                               3rd Qu.:39.00  
                               Max.   :80.00  
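
The 25% hold-out can be sketched as follows. This is a minimal Python illustration, not the R code behind the slides, and it uses a plain random split; whether the original split was stratified is not stated. The total of 714 rows is inferred from the 536 training rows summarised above plus the 178 validation rows in the confusion matrix under Results. The function name and the seed are hypothetical.

```python
import random

def holdout_split(n_rows, valid_fraction=0.25, seed=0):
    """Return (train_idx, valid_idx) for a simple random hold-out split."""
    rng = random.Random(seed)               # hypothetical seed, for reproducibility only
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_valid = int(n_rows * valid_fraction)  # validation size, rounded down
    return sorted(indices[n_valid:]), sorted(indices[:n_valid])

# 714 rows assumed: 536 training + 178 validation
train_idx, valid_idx = holdout_split(714)
print(len(train_idx), len(valid_idx))  # 536 178
```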

Learning Algorithms

Three classification algorithm families are offered:

  • Decision Trees
  • Generalised Linear Models
  • Boosted Generalised Linear Models with logit link

For each algorithm, one parameter can be tuned using the slider.
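
For reference, a GLM with logit link models the log-odds of survival as a linear function of the predictors. Assuming main effects only and dummy coding for Sex and Pclass (the slides do not state the exact model formula), this could be written as:

\[ \log \frac{p}{1 - p} = \beta_0 + \beta_1 \, \text{Male} + \beta_2 \, \text{Age} + \beta_3 \, \text{Pclass}_2 + \beta_4 \, \text{Pclass}_3, \qquad p = P(\text{Survived} = 1) \]

In the boosted variant, the linear predictor is built up iteratively. In standard gradient boosting, each iteration \( m \) adds a base-learner fit \( \hat{h}^{[m]} \) scaled by the step size \( \nu \):

\[ \hat{f}^{[m]} = \hat{f}^{[m-1]} + \nu \, \hat{h}^{[m]}, \qquad 0 < \nu \le 1 \]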

Predictions

  • Are run on the validation set instantly when an input changes (no “Go” button)
  • Accuracy, confusion matrix and other statistics are provided
  • The plot shows survival against all predictors and marks correct / incorrect predictions

Results

  • Simplest model: Decision Tree with \( \text{depth} = 1 \), equivalent to the rule \( \text{male} \Rightarrow \text{died}, \: \text{female} \Rightarrow \text{survived} \):
    \( 79.78\% \) accuracy, \( 95\% \: \text{CI} = [73.12\%, \: 85.41\%] \)
  • Best result: Boosted Generalised Linear Model with step size \( \nu = 0.019 \):
    \( 83.71\% \) accuracy, \( 95\% \: \text{CI} = [77.45\%, \: 88.81\%] \)
  • Misclassifications: mainly young males in 2nd / 3rd class
  • Findings need to be confirmed against an unseen test set
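
The depth-1 rule above amounts to predicting from Sex alone. A minimal Python sketch (not the R code behind the slides); the function name and the example passengers are made up for illustration:

```python
def stump_predict(sex):
    """Depth-1 rule: predict 1 (survived) for females, 0 (died) for males."""
    return 1 if sex == "female" else 0

# Hypothetical passengers as (Sex, Age, Pclass); only Sex affects the prediction.
passengers = [("female", 29.0, 1), ("male", 4.0, 2), ("male", 39.0, 3)]
predictions = [stump_predict(sex) for sex, _age, _pclass in passengers]
print(predictions)  # [1, 0, 0]
```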

          Reference
Prediction  0  1
         0 90 13
         1 16 59
      Accuracy  AccuracyLower  AccuracyUpper AccuracyPValue 
     8.371e-01      7.745e-01      8.881e-01      2.943e-12
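
The accuracy and its 95% confidence interval can be re-derived from the confusion matrix above. The sketch below is Python, not the R code that produced the output, and it assumes the interval is an exact (Clopper-Pearson) binomial interval, which the slides do not state. The helper names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def find_root(g, lo=0.0, hi=1.0, iters=100):
    """Bisection for an increasing function g with g(lo) < 0 < g(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, level=0.95):
    """Exact binomial confidence interval for k successes out of n trials."""
    alpha = 1 - level
    # Lower bound solves P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else find_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= k | p) = alpha / 2
    upper = 1.0 if k == n else find_root(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

# Confusion matrix from the slide: rows = prediction, columns = reference.
matrix = [[90, 13],
          [16, 59]]
correct = matrix[0][0] + matrix[1][1]        # diagonal = correct predictions
total = sum(sum(row) for row in matrix)
accuracy = correct / total
lower, upper = clopper_pearson(correct, total)
print(f"accuracy = {accuracy:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

If the assumption holds, the printed interval should agree with AccuracyLower and AccuracyUpper above. Calling `clopper_pearson(142, 178)` gives the corresponding interval for the depth-1 model, since 79.78% of 178 validation rows is 142 correct predictions.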