titanic_demo

Chaoran Liu
26 Sept, 2015

Overview

This is a simple survival prediction on the Titanic dataset using a decision tree algorithm.
The dataset comes from the Kaggle Titanic competition.
Response: Survived (0-Perished; 1-Survived)
Features:

                class level complete
PassengerId   integer   891     100%
Survived      integer     2     100%
Pclass        integer     3     100%
Name        character   891     100%
Sex         character     2     100%
Age           numeric    89   80.13%
SibSp         integer     7     100%
Parch         integer     7     100%
Ticket      character   681     100%
Fare          numeric   248     100%
Cabin       character   148    22.9%
Embarked    character     4   99.78%
Cat         character     1     100%
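The level and complete columns above can be reproduced per variable with a few lines of code. The following is a minimal Python sketch, not the code used for this report; the function name `summarize` and the sample values are hypothetical, and it assumes that a missing value counts as its own level.

```python
def summarize(values):
    """Return (number of levels, percent complete) for one column."""
    present = [v for v in values if v is not None]
    # Assumption: a missing value (None) is counted as one extra level.
    levels = len(set(values))
    complete = 100.0 * len(present) / len(values)
    return levels, round(complete, 2)


# Hypothetical Age values; None marks a missing entry.
ages = [22.0, 38.0, None, 35.0, 22.0]
age_levels, age_complete = summarize(ages)
print(age_levels, age_complete)
```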

Data Pre-processing

  1. Handling missing data:
    • Age: predicted from the other variables
    • Fare: NA replaced with the median value
    • Embarked: NA replaced with the most frequent category
  2. Creating features:
    • Title, extracted from Name
    • FamilySize, derived from SibSp and Parch
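The steps above might look roughly like the following Python sketch. It is an illustration under stated assumptions rather than the code behind this report: the sample rows and the helper names `extract_title` and `preprocess` are hypothetical, the title is assumed to sit between the comma and the first period in Name, and FamilySize is assumed to be SibSp + Parch + 1 (counting the passenger). Predicting Age from other variables is left out to keep the example short.

```python
import re
from collections import Counter
from statistics import median

# Hypothetical rows in the Kaggle column layout; None marks a missing value.
passengers = [
    {"Name": "Doe, Mr. John", "SibSp": 1, "Parch": 0, "Fare": 7.25, "Embarked": "S"},
    {"Name": "Roe, Mrs. Jane", "SibSp": 1, "Parch": 2, "Fare": 71.0, "Embarked": "C"},
    {"Name": "Poe, Miss. Ann", "SibSp": 0, "Parch": 0, "Fare": None, "Embarked": "S"},
    {"Name": "Moe, Master. Tim", "SibSp": 3, "Parch": 1, "Fare": 80.0, "Embarked": None},
]


def extract_title(name):
    """Take the text between the comma and the first period, e.g. 'Mr'."""
    match = re.search(r",\s*([^.,]+)\.", name)
    return match.group(1).strip() if match else "Unknown"


def preprocess(rows):
    """Impute Fare and Embarked, then add Title and FamilySize."""
    fare_median = median(r["Fare"] for r in rows if r["Fare"] is not None)
    ports = Counter(r["Embarked"] for r in rows if r["Embarked"])
    top_port = ports.most_common(1)[0][0]

    cleaned = []
    for r in rows:
        row = dict(r)  # copy so the input rows stay unchanged
        if row["Fare"] is None:
            row["Fare"] = fare_median
        if not row["Embarked"]:
            row["Embarked"] = top_port
        row["Title"] = extract_title(row["Name"])
        # Assumption: family size includes the passenger.
        row["FamilySize"] = row["SibSp"] + row["Parch"] + 1
        cleaned.append(row)
    return cleaned


cleaned = preprocess(passengers)
for row in cleaned:
    print(row["Title"], row["FamilySize"], row["Fare"], row["Embarked"])
```

Computing the median and the most frequent category once, before the loop, keeps the imputed values independent of row order.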

Decision Tree Model

Decision Tree Diagram:
[Figure: decision tree diagram]
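For readers unfamiliar with how a decision tree picks its splits, the sketch below shows one common approach: choosing the feature and value that minimise the weighted Gini impurity of the two resulting groups. This report does not state which implementation or split criterion was used, so the Gini criterion, the function names, and the sample rows are all assumptions made for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def split_impurity(rows, feature, value):
    """Weighted impurity after splitting on feature == value."""
    left = [r["Survived"] for r in rows if r[feature] == value]
    right = [r["Survived"] for r in rows if r[feature] != value]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, features):
    """Return (impurity, feature, value) for the lowest-impurity split."""
    candidates = [
        (split_impurity(rows, f, v), f, v)
        for f in features
        for v in sorted({r[f] for r in rows})
    ]
    return min(candidates)


# Hypothetical training rows, not taken from the real dataset.
rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "male", "Pclass": 1, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 2, "Survived": 0},
    {"Sex": "female", "Pclass": 2, "Survived": 1},
]

impurity, feature, value = best_split(rows, ["Sex", "Pclass"])
print(feature, value, impurity)
```

A full tree applies the same search recursively to each resulting group until a stopping rule is met.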

Predict with the Shiny app