Titanic competition from Kaggle. Part 2. Create dummy variables for factor variables using the caret package.

Part 1 is https://rpubs.com/Minxing2046/395349

library(tidyverse)
library(DataExplorer)
library(lubridate)
library(pander)
library(data.table)
library(grid)
library(gridExtra)
library(mice)
library(caret)

1 Loading the data

The dataset was generated in Part 1: https://rpubs.com/Minxing2046/395349

## 'data.frame':    1309 obs. of  7 variables:
##  $ PassengerId: int  892 893 894 895 896 897 898 899 900 901 ...
##  $ Pclass     : int  3 3 2 3 3 3 3 2 3 3 ...
##  $ Sex        : Factor w/ 2 levels "female","male": 2 1 2 2 1 2 1 2 1 2 ...
##  $ Age        : num  34.5 47 62 27 22 14 30 26 18 21 ...
##  $ SibSp      : int  0 1 0 0 1 0 0 1 0 2 ...
##  $ Parch      : int  0 0 0 0 1 0 0 1 0 0 ...
##  $ Survived   : int  NA NA NA NA NA NA NA NA NA NA ...

Set Pclass as a factor, so that it is treated as a categorical variable rather than as an integer.
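The conversion might look like the sketch below. It is not the original chunk: the data frame name `titanic` is an assumption, and the toy rows are copied from the first ten observations in the str() output above. The levels are listed explicitly because class 1 does not occur in those ten rows.

```r
# Toy data frame built from the first ten rows of the str() output above.
# The name `titanic` is hypothetical; the original object name is not shown.
titanic <- data.frame(
  PassengerId = 892:901,
  Pclass      = c(3L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 3L, 3L),
  Sex         = factor(c("male", "female", "male", "male", "female",
                         "male", "female", "male", "female", "male"))
)

# Convert the integer class code to a factor. Naming the levels explicitly
# keeps class 1 as a level even though it is absent from this small sample.
titanic$Pclass <- factor(titanic$Pclass, levels = 1:3)

str(titanic$Pclass)
```

On the full 1309-row dataset all three classes are present, so `as.factor(titanic$Pclass)` would presumably give the same levels.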

2 Create Dummies for factor variables.

An introduction to the caret package is here: https://www.machinelearningplus.com/machine-learning/caret-package/

Create dummy variables for the factor variables using the dummyVars function in the caret package.

Using dummyVars takes two steps:

  1. Call the dummyVars function to define the encoding.
  2. Use the predict function to apply the encoding, then convert the result into a data.frame.
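The two steps above can be sketched as follows. This is an illustration under assumptions, not the original chunk: the names `titanic`, `dummy_model` and `titanic_dummies` are hypothetical, the toy rows come from the first ten observations in the earlier str() output, and the Survived column (all NA in those rows) is left out of the toy frame.

```r
library(caret)

# Toy data frame from the first ten rows shown in Section 1 (Survived omitted).
titanic <- data.frame(
  PassengerId = 892:901,
  Pclass      = factor(c(3, 3, 2, 3, 3, 3, 3, 2, 3, 3), levels = 1:3),
  Sex         = factor(c("male", "female", "male", "male", "female",
                         "male", "female", "male", "female", "male")),
  Age         = c(34.5, 47, 62, 27, 22, 14, 30, 26, 18, 21),
  SibSp       = c(0, 1, 0, 0, 1, 0, 0, 1, 0, 2),
  Parch       = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0)
)

# Step 1: define the encoding. `~ .` uses every column; factor columns get
# one indicator column per level, numeric columns pass through unchanged.
dummy_model <- dummyVars(~ ., data = titanic)

# Step 2: apply the encoding with predict(), which returns a matrix,
# then convert that matrix into a data.frame.
titanic_dummies <- data.frame(predict(dummy_model, newdata = titanic))

str(titanic_dummies)
```

By default dummyVars keeps one column per factor level (fullRank = FALSE), which is why both Sex.female and Sex.male appear. On the full dataset, the same two steps give the structure listed next.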
## 'data.frame':    1309 obs. of  10 variables:
##  $ PassengerId: num  892 893 894 895 896 897 898 899 900 901 ...
##  $ Pclass.1   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Pclass.2   : num  0 0 1 0 0 0 0 1 0 0 ...
##  $ Pclass.3   : num  1 1 0 1 1 1 1 0 1 1 ...
##  $ Sex.female : num  0 1 0 0 1 0 1 0 1 0 ...
##  $ Sex.male   : num  1 0 1 1 0 1 0 1 0 1 ...
##  $ Age        : num  34.5 47 62 27 22 14 30 26 18 21 ...
##  $ SibSp      : num  0 1 0 0 1 0 0 1 0 2 ...
##  $ Parch      : num  0 0 0 0 1 0 0 1 0 0 ...
##  $ Survived   : num  NA NA NA NA NA NA NA NA NA NA ...
## [1] 418

Export the dataset.
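One way to export the result is write.csv, sketched below. The output file name and the temporary directory are assumptions, since the original does not show where the file was written; the three sample rows are taken from the dummy-variable output above.

```r
# Small stand-in for the encoded data (first three rows of the output above).
titanic_dummies <- data.frame(
  PassengerId = 892:894,
  Pclass.3    = c(1, 1, 0),
  Sex.male    = c(1, 0, 1)
)

# Hypothetical file name; a temporary directory keeps the sketch side-effect free.
out_file <- file.path(tempdir(), "titanic_dummies.csv")

# row.names = FALSE avoids writing an extra column of row numbers.
write.csv(titanic_dummies, out_file, row.names = FALSE)

# Read the file back to confirm the export round-trips.
check <- read.csv(out_file)
str(check)
```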