Georgios Mintzopoulos
23/09/2015
In the context of the course Developing Data Products, we used the Titanic dataset provided by Kaggle. We use the provided train CSV file to build two models that predict who survived the disaster.
The application, written in Shiny for this project, works as follows:
The initial dataset has 891 observations and 12 variables (the outcome Survived plus 11 possible predictors), with the following structure:
str(train)
'data.frame': 891 obs. of 12 variables:
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Name : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : Factor w/ 147 levels "A10","A14","A16",..: NA 82 NA 56 NA NA 130 NA NA NA ...
$ Embarked : Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
The wrangled dataset has 714 observations and 7 variables (the outcome Survived plus 6 predictors), with the following structure:
str(training)
'data.frame': 714 obs. of 7 variables:
$ Survived: int 0 1 1 0 0 0 1 1 1 1 ...
$ Pclass : int 3 1 1 3 1 3 3 2 3 1 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 1 2 2 2 1 1 1 1 ...
$ Age : num 22 38 35 29 54 2 27 14 4 58 ...
$ SibSp : int 1 1 1 0 0 3 0 1 1 0 ...
$ Parch : int 0 0 0 0 0 1 2 0 1 0 ...
$ Fare : num 7.25 71.28 53.1 8.46 51.86 ...
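The wrangling code itself is not shown, so the following is only a minimal sketch of how a data frame with this shape could be produced. The helper name `wrangle_titanic`, the toy rows, and the median imputation of missing ages are all assumptions made for illustration. They are not necessarily what the application does, and the sketch does not reproduce the reduction to 714 rows.

```r
# Toy stand-in for the Kaggle train data: a few illustrative rows only,
# shaped like the str() output above (not the real file).
train <- data.frame(
  PassengerId = 1:6,
  Survived    = c(0L, 1L, 1L, 1L, 0L, 0L),
  Pclass      = c(3L, 1L, 3L, 1L, 3L, 3L),
  Name        = paste("Passenger", 1:6),
  Sex         = factor(c("male", "female", "female", "female", "male", "male")),
  Age         = c(22, 38, 26, 35, 35, NA),
  SibSp       = c(1L, 1L, 0L, 1L, 0L, 0L),
  Parch       = c(0L, 0L, 0L, 0L, 0L, 0L),
  Fare        = c(7.25, 71.28, 7.92, 53.1, 8.05, 8.46),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the outcome plus the six predictors.
wrangle_titanic <- function(df) {
  keep <- c("Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare")
  out <- df[, keep]
  # Assumption: fill missing ages with the median of the observed ages.
  out$Age[is.na(out$Age)] <- median(out$Age, na.rm = TRUE)
  out
}

training <- wrangle_titanic(train)
str(training)
```

With the real data, `train` would instead come from reading the Kaggle train CSV file, and the same helper could be applied unchanged.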
The models we fit are:
The fit results are shown as a confusion matrix.
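As a rough illustration of what such a confusion matrix contains, the sketch below cross-tabulates predicted against actual outcomes using base R's `table()`. The `predicted` vector is hypothetical and typed in by hand; in the application it would come from one of the fitted models, which are not shown here.

```r
# Actual outcomes (0 = died, 1 = survived) and hypothetical predictions.
# In the app the predictions would come from a fitted model.
actual    <- factor(c(0, 1, 1, 1, 0, 0, 0, 0, 1, 1), levels = c(0, 1))
predicted <- factor(c(0, 1, 0, 1, 0, 0, 1, 0, 1, 1), levels = c(0, 1))

# Rows are predictions, columns are the true outcomes.
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy is the share of cases on the diagonal of the matrix.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

The diagonal cells count correct predictions and the off-diagonal cells count the two kinds of error, so comparing two models comes down to comparing these four counts.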