Investigation of Key Factors affecting Survival Rate on Titanic

Wilson Yap
6 March, 2016

Introduction

This presentation was produced for the Developing Data Products Course Project. The project has 2 components:

  • Creating a Shiny application for deployment on RStudio's servers
  • Creating a Slidify or RStudio Presenter slide deck for a reproducible pitch presentation about the application

This presentation addresses the 2nd component of the project, while the app created for the 1st component of the project can be found at:

https://drarreg.shinyapps.io/CourseProject/

Titanic Dataset

The app uses the Titanic dataset from R's datasets package. This dataset provides information on the fate of passengers on the fatal maiden voyage of the ocean liner 'Titanic', summarized according to the following 3 factors:

  • Passenger's Class (Representation of Economic Status)
  • Passenger's Sex
  • Passenger's Age

For each combination of the 3 factors stated above, the data provides the number of passengers who survived and the number who did not.
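
To view these counts in a flat form, the table can be converted to a data frame. This is a minimal sketch using base R only; the name titanic_df is illustrative, and this is not necessarily how the app prepares its data.

```r
# Load the dataset that ships with R's datasets package
data(Titanic)

# One row per combination of Class, Sex, Age and Survived,
# with the passenger count stored in the Freq column
titanic_df <- as.data.frame(Titanic)
head(titanic_df)

# Total number of people covered by the table
sum(titanic_df$Freq)
```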

Exploratory Data Analysis

We can investigate the data briefly using the following code:

data(Titanic)
str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"

As we can see, the Class factor has 4 values, while the Sex and Age factors have 2 values each. The fourth dimension, Survived, records whether the passengers survived.
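
The counts can also be collapsed over the Sex and Age dimensions to compare survival across classes. The sketch below uses base R's apply() and prop.table(); the variable names are illustrative and are not taken from the app.

```r
data(Titanic)

# Sum the counts over Sex and Age, keeping Class (dimension 1)
# and Survived (dimension 4)
class_by_survival <- apply(Titanic, c(1, 4), sum)

# Convert each row (one row per class) to percentages that sum to 100
survival_pct <- 100 * prop.table(class_by_survival, margin = 1)
round(survival_pct, 1)
```

The same pattern works for Sex or Age by changing the first index passed to apply().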

Shiny Application - Investigate the Impact of the 3 Factors on Survival Rate %

This application is designed to help you investigate the impact of the 3 factors on the Survival Rate %.

Select at least 1 value for each of the 3 factors. The results table then updates automatically to show the Survival Rate % of the passengers who match your selection.

Note that the application also works if you select multiple values for any of the 3 factors.
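
As a rough illustration of the calculation behind such a results table, the sketch below computes a Survival Rate % for a selection of factor values. The helper survival_rate() is hypothetical and is not taken from the app's source code; it assumes that selecting multiple values pools the matching passengers into a single rate.

```r
data(Titanic)

# Hypothetical helper (an assumption, not the app's actual code):
# returns the Survival Rate % for the selected factor values
survival_rate <- function(classes, sexes, ages) {
  # Subset the 4-dimensional table; drop = FALSE keeps all dimensions
  selected <- Titanic[classes, sexes, ages, , drop = FALSE]
  survived <- sum(selected[, , , "Yes"])
  total <- sum(selected)
  # Some combinations contain no passengers (e.g. children among the crew)
  if (total == 0) return(NA_real_)
  100 * survived / total
}

# A single value per factor
survival_rate("1st", "Female", "Adult")

# Multiple values per factor are pooled into one rate
survival_rate(c("1st", "2nd"), c("Male", "Female"), "Adult")
```

Returning NA for an empty selection avoids a division by zero; a real application might instead show a message to the user.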