[Reference] https://www.kaggle.com/pyy0715/titanic-data-analysis-with-r


Variable descriptions

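  • survival : survival status; the target variable (0 = died, 1 = survived)
  • pclass : ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
  • sex : sex
  • Age : age in years
  • sibsp : total number of siblings and spouses aboard
  • parch : total number of parents and children aboard
  • ticket : ticket number
  • fare : passenger fare
  • cabin : cabin number
  • embarked : port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)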

0.2 import data

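  • train.csv : the training data used to build the prediction model. It gives each passenger's personal details and whether they survived.

  • test.csv : passengers without the survival label. A model is built on the personal details and derived features of the training data, then used to predict survival for the passengers in test.csv.

  • sampleSubmission.csv : the template CSV file used for submitting predictions.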

## # A tibble: 891 x 12
##    PassengerId Survived Pclass Name  Sex     Age SibSp Parch Ticket  Fare
##          <dbl>    <dbl>  <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr>  <dbl>
##  1           1        0      3 Brau~ male     22     1     0 A/5 2~  7.25
##  2           2        1      1 Cumi~ fema~    38     1     0 PC 17~ 71.3 
##  3           3        1      3 Heik~ fema~    26     0     0 STON/~  7.92
##  4           4        1      1 Futr~ fema~    35     1     0 113803 53.1 
##  5           5        0      3 Alle~ male     35     0     0 373450  8.05
##  6           6        0      3 Mora~ male     NA     0     0 330877  8.46
##  7           7        0      1 McCa~ male     54     0     0 17463  51.9 
##  8           8        0      3 Pals~ male      2     3     1 349909 21.1 
##  9           9        1      3 John~ fema~    27     0     2 347742 11.1 
## 10          10        1      2 Nass~ fema~    14     1     0 237736 30.1 
## # ... with 881 more rows, and 2 more variables: Cabin <chr>,
## #   Embarked <chr>
## # A tibble: 418 x 11
##    PassengerId Pclass Name  Sex     Age SibSp Parch Ticket  Fare Cabin
##          <dbl>  <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr>  <dbl> <chr>
##  1         892      3 Kell~ male   34.5     0     0 330911  7.83 <NA> 
##  2         893      3 Wilk~ fema~  47       1     0 363272  7    <NA> 
##  3         894      2 Myle~ male   62       0     0 240276  9.69 <NA> 
##  4         895      3 Wirz~ male   27       0     0 315154  8.66 <NA> 
##  5         896      3 Hirv~ fema~  22       1     1 31012~ 12.3  <NA> 
##  6         897      3 Sven~ male   14       0     0 7538    9.22 <NA> 
##  7         898      3 Conn~ fema~  30       0     0 330972  7.63 <NA> 
##  8         899      2 Cald~ male   26       1     1 248738 29    <NA> 
##  9         900      3 Abra~ fema~  18       0     0 2657    7.23 <NA> 
## 10         901      3 Davi~ male   21       2     0 A/4 4~ 24.2  <NA> 
## # ... with 408 more rows, and 1 more variable: Embarked <chr>
## # A tibble: 1,309 x 12
##    PassengerId Survived Pclass Name  Sex     Age SibSp Parch Ticket  Fare
##          <dbl>    <dbl>  <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr>  <dbl>
##  1           1        0      3 Brau~ male     22     1     0 A/5 2~  7.25
##  2           2        1      1 Cumi~ fema~    38     1     0 PC 17~ 71.3 
##  3           3        1      3 Heik~ fema~    26     0     0 STON/~  7.92
##  4           4        1      1 Futr~ fema~    35     1     0 113803 53.1 
##  5           5        0      3 Alle~ male     35     0     0 373450  8.05
##  6           6        0      3 Mora~ male     NA     0     0 330877  8.46
##  7           7        0      1 McCa~ male     54     0     0 17463  51.9 
##  8           8        0      3 Pals~ male      2     3     1 349909 21.1 
##  9           9        1      3 John~ fema~    27     0     2 347742 11.1 
## 10          10        1      2 Nass~ fema~    14     1     0 237736 30.1 
## # ... with 1,299 more rows, and 2 more variables: Cabin <chr>,
## #   Embarked <chr>
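The three tables above are the training data (891 rows × 12 columns), the test data (418 × 11, no Survived column) and the two stacked into one 1,309-row table, so that cleaning and feature engineering are applied to both at once. The `spec_tbl_df` class in the later `str()` output suggests the original used `readr::read_csv` (and probably `dplyr::bind_rows`). The base-R sketch below shows the same idea on two hypothetical miniature CSV files written to a temporary directory, not the real Kaggle files.

```r
# Hypothetical miniature stand-ins for train.csv and test.csv, written to temp files
train_path <- tempfile(fileext = ".csv")
test_path  <- tempfile(fileext = ".csv")
writeLines(c("PassengerId,Survived,Pclass,Sex,Age",
             "1,0,3,male,22",
             "2,1,1,female,38"), train_path)
writeLines(c("PassengerId,Pclass,Sex,Age",
             "892,3,male,34.5"), test_path)

train <- read.csv(train_path, stringsAsFactors = FALSE)
test  <- read.csv(test_path, stringsAsFactors = FALSE)

# test.csv has no Survived column: add it as NA so both tables have the same columns,
# then stack the rows (test's columns are reordered to match train)
test$Survived <- NA
full <- rbind(train, test[, names(train)])

nrow(full)                  # 3 (= 2 train rows + 1 test row)
sum(is.na(full$Survived))   # 1 (one per test passenger)
```

With the real files, the same steps give 891 + 418 = 1,309 rows, as in the third table above.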

0.3 EDA

Defining variable types and a basic summary

## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1309 obs. of  12 variables:
##  $ PassengerId: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Survived   : Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 2 2 ...
##  $ Pclass     : Ord.factor w/ 3 levels "1"<"2"<"3": 3 1 3 1 3 3 1 3 3 2 ...
##  $ Name       : Factor w/ 1307 levels "Abbing, Mr. Anthony",..: 156 287 531 430 23 826 775 922 613 855 ...
##  $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
##  $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
##  $ SibSp      : num  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch      : num  0 0 0 0 0 0 0 1 2 0 ...
##  $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
##  $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Cabin      : chr  NA "C85" NA "C123" ...
##  $ Embarked   : Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
##   PassengerId   Survived   Pclass                                Name     
##  Min.   :   1   0   :549   1:323   Connolly, Miss. Kate            :   2  
##  1st Qu.: 328   1   :342   2:277   Kelly, Mr. James                :   2  
##  Median : 655   NA's:418   3:709   Abbing, Mr. Anthony             :   1  
##  Mean   : 655                      Abbott, Master. Eugene Joseph   :   1  
##  3rd Qu.: 982                      Abbott, Mr. Rossmore Edward     :   1  
##  Max.   :1309                      Abbott, Mrs. Stanton (Rosa Hunt):   1  
##                                    (Other)                         :1301  
##      Sex           Age            SibSp            Parch      
##  female:466   Min.   : 0.17   Min.   :0.0000   Min.   :0.000  
##  male  :843   1st Qu.:21.00   1st Qu.:0.0000   1st Qu.:0.000  
##               Median :28.00   Median :0.0000   Median :0.000  
##               Mean   :29.88   Mean   :0.4989   Mean   :0.385  
##               3rd Qu.:39.00   3rd Qu.:1.0000   3rd Qu.:0.000  
##               Max.   :80.00   Max.   :8.0000   Max.   :9.000  
##               NA's   :263                                     
##     Ticket               Fare            Cabin           Embarked  
##  Length:1309        Min.   :  0.000   Length:1309        C   :270  
##  Class :character   1st Qu.:  7.896   Class :character   Q   :123  
##  Mode  :character   Median : 14.454   Mode  :character   S   :914  
##                     Mean   : 33.295                      NA's:  2  
##                     3rd Qu.: 31.275                                
##                     Max.   :512.329                                
##                     NA's   :1
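The `str()` output shows Survived, Sex and Embarked stored as factors and Pclass as an ordered factor (`"1"<"2"<"3"`). That is why `summary()` reports level counts for these columns instead of quartiles. A minimal sketch of those conversions on a few hypothetical rows:

```r
# Hypothetical rows; in the analysis this would be the combined 1,309-row table
full <- data.frame(
  Survived = c(0, 1, NA),
  Pclass   = c(3, 1, 3),
  Sex      = c("male", "female", "female"),
  Embarked = c("S", "C", NA),
  stringsAsFactors = FALSE
)

full$Survived <- factor(full$Survived)                              # levels "0", "1"; test rows stay NA
full$Pclass   <- factor(full$Pclass, levels = 1:3, ordered = TRUE)  # "1" < "2" < "3"
full$Sex      <- factor(full$Sex)                                   # levels sorted: "female", "male"
full$Embarked <- factor(full$Embarked)                              # NA stays NA, it is not a level

str(full)
summary(full)  # factor columns now report counts per level (plus NA's)
```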

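Missing-value EDA_1

The first output counts the distinct values per column in the training data (NA counts as a value). The next three give the number and proportion of missing values (NA) per column in the combined data.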

## PassengerId    Survived      Pclass        Name         Sex         Age 
##         891           2           3         891           2          89 
##       SibSp       Parch      Ticket        Fare       Cabin    Embarked 
##           7           7         681         248         148           4
## PassengerId    Survived      Pclass        Name         Sex         Age 
##           0         418           0           0           0         263 
##       SibSp       Parch      Ticket        Fare       Cabin    Embarked 
##           0           0           0           1        1014           2
## # A tibble: 1 x 12
##   PassengerId Survived Pclass  Name   Sex   Age SibSp Parch Ticket    Fare
##         <dbl>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl>
## 1           0    0.319      0     0     0 0.201     0     0      0 7.64e-4
## # ... with 2 more variables: Cabin <dbl>, Embarked <dbl>
## # A tibble: 12 x 2
##    feature     missing_pct
##    <chr>             <dbl>
##  1 PassengerId    0       
##  2 Survived       0.319   
##  3 Pclass         0       
##  4 Name           0       
##  5 Sex            0       
##  6 Age            0.201   
##  7 SibSp          0       
##  8 Parch          0       
##  9 Ticket         0       
## 10 Fare           0.000764
## 11 Cabin          0.775   
## 12 Embarked       0.00153
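The counts and proportions above can be reproduced with base R. The sketch below uses a small hypothetical data frame in place of the combined table. It computes distinct values per column (NA counts as one value, which is why Embarked shows 4 in the training data: C, Q, S and NA), then NA counts and NA proportions, and finally the long `feature` / `missing_pct` layout.

```r
# Toy data standing in for the combined table
full <- data.frame(
  Survived = c(0, 1, NA, NA),
  Age      = c(22, NA, 34.5, NA),
  Cabin    = c(NA, "C85", NA, NA),
  Fare     = c(7.25, 71.28, NA, 8.05),
  stringsAsFactors = FALSE
)

n_unique    <- sapply(full, function(x) length(unique(x)))  # NA counts as one value
n_missing   <- colSums(is.na(full))                          # NA count per column
missing_pct <- colMeans(is.na(full))                         # NA share per column

# Long "feature / missing_pct" layout like the 12 x 2 tibble above
missing_tbl <- data.frame(feature = names(missing_pct),
                          missing_pct = unname(missing_pct))
missing_tbl
```

On the real data, 418/1309 ≈ 0.319, 263/1309 ≈ 0.201 and 1014/1309 ≈ 0.775 match the Survived, Age and Cabin proportions above.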


Missing-value EDA_2


sex EDA

## 
## female   male 
##    466    843
## Warning: Factor `Survived` contains implicit NA, consider using
## `forcats::fct_explicit_na`

## Warning: Factor `Survived` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## # A tibble: 6 x 3
## # Groups:   Survived [3]
##   Survived Sex     freq
##   <fct>    <fct>  <int>
## 1 0        female    81
## 2 0        male     468
## 3 1        female   233
## 4 1        male     109
## 5 <NA>     female   152
## 6 <NA>     male     266
##         
##                  0         1
##   female 0.2579618 0.7420382
##   male   0.8110919 0.1889081
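The warnings about an implicit NA in Survived come from the 418 test rows, whose Survived is NA (152 + 266 = 418 in the tibble). The survival rates use only the 891 training rows. The sketch below rebuilds those rows from the Survived × Sex counts above and computes row-wise proportions with `prop.table(..., margin = 1)`, so each sex's row sums to 1.

```r
# Rebuild the 891 training rows from the Survived x Sex counts shown above
sex      <- rep(c("female", "male", "female", "male"), times = c(81, 468, 233, 109))
survived <- rep(c(0, 0, 1, 1),                         times = c(81, 468, 233, 109))

tab  <- table(Sex = sex, Survived = survived)
rate <- prop.table(tab, margin = 1)   # margin = 1: proportions within each sex
round(rate, 7)
```

This reproduces the proportions above: about 74% of women survived, compared with about 19% of men.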


Pclass (ticket class) EDA

## 
##   1   2   3 
## 323 277 709
##    
##             0         1
##   1 0.3703704 0.6296296
##   2 0.5271739 0.4728261
##   3 0.7576375 0.2423625


fare (passenger fare) EDA

## Warning: Removed 1 rows containing non-finite values (stat_bin).
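The summary in the EDA section (median 14.454, mean 33.295, max 512.329) shows that Fare is strongly right-skewed, and the warning reflects the single missing fare. The Fare values in the prediction section (for example 2.110213 for the first passenger, whose fare is 7.25) are consistent with a log(Fare + 1) transform. Treat that as an inference from the numbers, not from the original code. A minimal sketch:

```r
fare <- c(7.25, 71.2833, 7.925, 53.1, 8.05, 512.329)

# log(1 + x) compresses the long right tail and keeps Fare = 0 finite
# (the minimum fare is 0, and log(0) would be -Inf)
log_fare <- log1p(fare)
round(log_fare, 6)
```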


age EDA

## Warning: Removed 263 rows containing non-finite values (stat_bin).
## Warning: Removed 177 rows containing non-finite values (stat_density).


sibsp (total number of siblings and spouses aboard) EDA

## 
##   0   1   2   3   4   5   8 
## 891 319  42  20  22   6   9
## # A tibble: 12 x 3
## # Groups:   Survived [2]
##    Survived SibSp  freq
##       <dbl> <dbl> <int>
##  1        0     0   398
##  2        0     1    97
##  3        0     2    15
##  4        0     3    12
##  5        0     4    15
##  6        0     5     5
##  7        0     8     7
##  8        1     0   210
##  9        1     1   112
## 10        1     2    13
## 11        1     3     4
## 12        1     4     3
##    
##             0         1
##   0 0.6546053 0.3453947
##   1 0.4641148 0.5358852
##   2 0.5357143 0.4642857
##   3 0.7500000 0.2500000
##   4 0.8333333 0.1666667
##   5 1.0000000 0.0000000
##   8 1.0000000 0.0000000
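The first frequency table in this section (891, 319, ...) counts train and test together, while the 12 × 3 tibble and the rates use only the 891 training rows. The tibble has no rows for survivors with SibSp 5 or 8. `xtabs()` turns such a long count table into a cross-table and fills absent combinations with 0, which produces the 1.0 / 0.0 rows. A sketch using the counts shown above:

```r
# The Survived x SibSp counts from the 12 x 3 tibble above, in long form
counts <- data.frame(
  Survived = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  SibSp    = c(0, 1, 2, 3, 4, 5, 8, 0, 1, 2, 3, 4),
  freq     = c(398, 97, 15, 12, 15, 5, 7, 210, 112, 13, 4, 3)
)

# xtabs() sums freq into a SibSp x Survived table; missing combinations become 0
tab  <- xtabs(freq ~ SibSp + Survived, data = counts)
rate <- prop.table(tab, margin = 1)   # survival rate within each SibSp value
round(rate, 7)
```

The rates for SibSp 5 and 8 rest on only 5 and 7 passengers, so they are much less reliable than the rates for 0 to 2.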

parch (total number of parents and children aboard) EDA

## 
##   0   1   2   3   4   5   6 
## 678 118  80   5   4   5   1
## # A tibble: 12 x 3
## # Groups:   Survived [2]
##    Survived Parch  freq
##       <dbl> <dbl> <int>
##  1        0     0   445
##  2        0     1    53
##  3        0     2    40
##  4        0     3     2
##  5        0     4     4
##  6        0     5     4
##  7        0     6     1
##  8        1     0   233
##  9        1     1    65
## 10        1     2    40
## 11        1     3     3
## 12        1     5     1
##    
##             0         1
##   0 0.6563422 0.3436578
##   1 0.4491525 0.5508475
##   2 0.5000000 0.5000000
##   3 0.4000000 0.6000000
##   4 1.0000000 0.0000000
##   5 0.8000000 0.2000000
##   6 1.0000000 0.0000000

Embarked (port of embarkation) EDA

## 
##   C   Q   S 
## 168  77 644
## # A tibble: 7 x 3
## # Groups:   Survived [2]
##   Survived Embarked  freq
##      <dbl> <chr>    <int>
## 1        0 C           75
## 2        0 Q           47
## 3        0 S          427
## 4        1 C           93
## 5        1 Q           30
## 6        1 S          217
## 7        1 <NA>         2
##    
##             0         1
##   C 0.4464286 0.5535714
##   Q 0.6103896 0.3896104
##   S 0.6630435 0.3369565

0.4 data munging_1

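Handling missing values

Embarked is missing for passengers 62 and 830, two first-class passengers who share ticket 113572 and a fare of 80. Fare is missing for passenger 1044, a third-class passenger, and is filled with 8.05.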

## PassengerId    Survived      Pclass        Name         Sex         Age 
##           0         418           0           0           0         263 
##       SibSp       Parch      Ticket        Fare       Cabin    Embarked 
##           0           0           0           1        1014           2
## # A tibble: 2 x 12
##   PassengerId Survived Pclass Name  Sex     Age SibSp Parch Ticket  Fare
##         <dbl> <fct>    <ord>  <fct> <fct> <dbl> <dbl> <dbl> <chr>  <dbl>
## 1          62 1        1      Icar~ fema~    38     0     0 113572    80
## 2         830 1        1      Ston~ fema~    62     0     0 113572    80
## # ... with 2 more variables: Cabin <chr>, Embarked <fct>
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).

## # A tibble: 2 x 12
##   PassengerId Survived Pclass Name  Sex     Age SibSp Parch Ticket  Fare
##         <dbl> <fct>    <ord>  <fct> <fct> <dbl> <dbl> <dbl> <chr>  <dbl>
## 1          62 1        1      Icar~ fema~    38     0     0 113572    80
## 2         830 1        1      Ston~ fema~    62     0     0 113572    80
## # ... with 2 more variables: Cabin <chr>, Embarked <fct>
## # A tibble: 1 x 12
##   PassengerId Survived Pclass Name  Sex     Age SibSp Parch Ticket  Fare
##         <dbl> <fct>    <ord>  <fct> <fct> <dbl> <dbl> <dbl> <chr>  <dbl>
## 1        1044 <NA>     3      Stor~ male   60.5     0     0 3701      NA
## # ... with 2 more variables: Cabin <chr>, Embarked <fct>
## # A tibble: 1 x 12
##   PassengerId Survived Pclass Name  Sex     Age SibSp Parch Ticket  Fare
##         <dbl> <fct>    <ord>  <fct> <fct> <dbl> <dbl> <dbl> <chr>  <dbl>
## 1        1044 <NA>     3      Stor~ male   60.5     0     0 3701    8.05
## # ... with 2 more variables: Cabin <chr>, Embarked <fct>
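The output shows the fare for passenger 1044 being filled with 8.05, but not how that value was computed. One common approach is to use the median fare of comparable passengers (for example the same Pclass and Embarked). That choice is an assumption here. The helper `impute_group_median` below is hypothetical, and the data are toy values.

```r
# Hypothetical helper: replace NA in `col` with the median of rows in the same group
impute_group_median <- function(df, col, by) {
  key <- interaction(df[by], drop = TRUE)                  # one level per group combination
  group_median <- ave(df[[col]], key,
                      FUN = function(v) median(v, na.rm = TRUE))
  missing <- is.na(df[[col]])
  df[[col]][missing] <- group_median[missing]              # only NA cells are changed
  df
}

# Toy data, not the real passengers
fares <- data.frame(
  PassengerId = 1:6,
  Pclass      = c(3, 3, 3, 1, 1, 3),
  Embarked    = c("S", "S", "S", "C", "C", "Q"),
  Fare        = c(7.25, 8.05, NA, 80, 90, 7.75)
)

fixed <- impute_group_median(fares, "Fare", c("Pclass", "Embarked"))
fixed$Fare[3]   # 7.65, the median of 7.25 and 8.05 (Pclass 3, Embarked S)
```

Grouping by class and port rather than using the overall median keeps the imputed value on the same scale as similar tickets, which matters because Fare is so skewed.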

0.5 data munging_2

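Feature engineering

New features are derived from existing columns: Title from Name, Fsize and Familysize from SibSp and Parch, Deck from Cabin, and TravelGroup and GroupSize from shared tickets. Age is later recoded into the categories adult, child and senior.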

##  [1] "Mr"           "Mrs"          "Miss"         "Master"      
##  [5] "Don"          "Rev"          "Dr"           "Mme"         
##  [9] "Ms"           "Major"        "Lady"         "Sir"         
## [13] "Mlle"         "Col"          "Capt"         "the Countess"
## [17] "Jonkheer"     "Dona"
##    Cell Contents 
## |-------------------------|
## |                       N | 
## |           N / Row Total | 
## |-------------------------|
## 
## |         Capt |          Col |          Don |         Dona |
## |--------------|--------------|--------------|--------------|
## |            1 |            4 |            1 |            1 |
## |        0.001 |        0.003 |        0.001 |        0.001 |
## |--------------|--------------|--------------|--------------|
## 
## |           Dr |     Jonkheer |         Lady |        Major |
## |--------------|--------------|--------------|--------------|
## |            8 |            1 |            1 |            2 |
## |        0.006 |        0.001 |        0.001 |        0.002 |
## |--------------|--------------|--------------|--------------|
## 
## |       Master |         Miss |         Mlle |          Mme |
## |--------------|--------------|--------------|--------------|
## |           61 |          260 |            2 |            1 |
## |        0.047 |        0.199 |        0.002 |        0.001 |
## |--------------|--------------|--------------|--------------|
## 
## |           Mr |          Mrs |           Ms |          Rev |
## |--------------|--------------|--------------|--------------|
## |          757 |          197 |            2 |            8 |
## |        0.578 |        0.150 |        0.002 |        0.006 |
## |--------------|--------------|--------------|--------------|
## 
## |          Sir | the Countess |
## |--------------|--------------|
## |            1 |            1 |
## |        0.001 |        0.001 |
## |--------------|--------------|
##    Cell Contents 
## |-------------------------|
## |                       N | 
## |           N / Row Total | 
## |-------------------------|
## 
## |  Master |    Miss |      Mr |     Mrs | Officer |
## |---------|---------|---------|---------|---------|
## |      61 |     266 |     757 |     198 |      27 |
## |   0.047 |   0.203 |   0.578 |   0.151 |   0.021 |
## |---------|---------|---------|---------|---------|
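The title is the word or words between the surname's comma and the next period, so "Rothes, the Countess. of (...)" yields "the Countess". The second CrossTable groups the 18 raw titles into 5. The output shows only that the Countess became Officer, not the full mapping. The grouping below is one assumption consistent with all five counts (Master 61, Miss 266, Mr 757, Mrs 198, Officer 27). `extract_title` and `group_title` are hypothetical helper names.

```r
# Title = text between the surname's comma and the first following period
extract_title <- function(name) sub("^[^,]*, ([^.]*)\\..*$", "\\1", name)

# One grouping consistent with the counts above (an assumption, not the original code)
group_title <- function(title) {
  title[title %in% c("Mlle", "Ms", "Lady", "Dona")] <- "Miss"
  title[title == "Mme"] <- "Mrs"
  officer <- c("Capt", "Col", "Don", "Dr", "Jonkheer", "Major", "Rev", "Sir", "the Countess")
  title[title %in% officer] <- "Officer"
  title
}

names_example <- c("Braund, Mr. Owen Harris",
                   "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
                   "Heikkinen, Miss. Laina",
                   "Abbott, Master. Eugene Joseph",
                   "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)")
titles <- group_title(extract_title(names_example))
titles   # "Mr" "Mrs" "Miss" "Master" "Officer"

# Check the grouping against the raw title counts in the first CrossTable
raw <- c(Capt = 1, Col = 4, Don = 1, Dona = 1, Dr = 8, Jonkheer = 1, Lady = 1,
         Major = 2, Master = 61, Miss = 260, Mlle = 2, Mme = 1, Mr = 757,
         Mrs = 197, Ms = 2, Rev = 8, Sir = 1, "the Countess" = 1)
grouped <- tapply(raw, group_title(names(raw)), sum)
grouped  # Master 61, Miss 266, Mr 757, Mrs 198, Officer 27
```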

## 
##   1   2   3   4   5   6   7   8  11 
## 790 235 159  43  22  25  16   8  11


## Warning: Unknown or uninitialised column: 'Familysize'.
## 
##  large single  small 
##     82    790    437
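The first ten Fsize values in the later `str()` output equal SibSp + Parch + 1, so the passenger is counted as part of the family. The Familysize counts also pin down the cut-offs: single = 790 is the Fsize 1 count, small = 235 + 159 + 43 = 437 covers sizes 2 to 4, and large = 22 + 25 + 16 + 8 + 11 = 82 covers 5 and above. A sketch under those inferred cut-offs (`size_class` is a hypothetical helper):

```r
# First ten passengers' SibSp and Parch (from the str() output)
sibsp <- c(1, 1, 0, 1, 0, 0, 0, 3, 0, 1)
parch <- c(0, 0, 0, 0, 0, 0, 0, 1, 2, 0)

# Family size counts the passenger as well
fsize <- sibsp + parch + 1

# 1 = single, 2-4 = small, 5+ = large; factor() sorts levels alphabetically
size_class <- function(n) factor(ifelse(n == 1, "single", ifelse(n <= 4, "small", "large")))
familysize <- size_class(fsize)
fsize                    # 2 2 1 2 1 1 1 5 3 2
as.integer(familysize)   # 3 3 2 3 2 2 2 1 3 3, matching the Familysize codes

# Same cut-offs applied to the Fsize frequency table above
all_sizes <- rep(c(1:8, 11), times = c(790, 235, 159, 43, 22, 25, 16, 8, 11))
table(size_class(all_sizes))  # large 82, single 790, small 437
```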

[Figure: x-axis Familysize, y-axis Rate]

##  [1] NA            "C85"         NA            "C123"        NA           
##  [6] NA            "E46"         NA            NA            NA           
## [11] "G6"          "C103"        NA            NA            NA           
## [16] NA            NA            NA            NA            NA           
## [21] NA            "D56"         NA            "A6"          NA           
## [26] NA            NA            "C23 C25 C27"
## [1] "C" "8" "5"
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1309 obs. of  16 variables:
##  $ PassengerId: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Survived   : Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 2 2 ...
##  $ Pclass     : Ord.factor w/ 3 levels "1"<"2"<"3": 3 1 3 1 3 3 1 3 3 2 ...
##  $ Name       : Factor w/ 1307 levels "Abbing, Mr. Anthony",..: 156 287 531 430 23 826 775 922 613 855 ...
##  $ Sex        : Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 2 2 ...
##  $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
##  $ SibSp      : num  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch      : num  0 0 0 0 0 0 0 1 2 0 ...
##  $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
##  $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Cabin      : chr  NA "C85" NA "C123" ...
##  $ Embarked   : Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
##  $ Title      : Factor w/ 5 levels "Master","Miss",..: 3 4 2 4 3 3 3 1 4 4 ...
##  $ Fsize      : num  2 2 1 2 1 1 1 5 3 2 ...
##  $ Familysize : Factor w/ 3 levels "large","single",..: 3 3 2 3 2 2 2 1 3 3 ...
##  $ Deck       : chr  NA "C" NA "C" ...
## # A tibble: 6 x 15
##   PassengerId Survived Pclass Name  Sex     Age SibSp Parch Ticket  Fare
##         <dbl> <fct>    <ord>  <fct> <fct> <dbl> <dbl> <dbl> <chr>  <dbl>
## 1           1 0        3      Brau~ 0        22     1     0 A/5 2~  7.25
## 2           2 1        1      Cumi~ 1        38     1     0 PC 17~ 71.3 
## 3           3 1        3      Heik~ 1        26     0     0 STON/~  7.92
## 4           4 1        1      Futr~ 1        35     1     0 113803 53.1 
## 5           5 0        3      Alle~ 0        35     0     0 373450  8.05
## 6           6 0        3      Mora~ 0        NA     0     0 330877  8.46
## # ... with 5 more variables: Embarked <fct>, Title <fct>, Fsize <dbl>,
## #   Familysize <fct>, Deck <chr>

## # A tibble: 11 x 2
##    Deck      n
##    <chr> <int>
##  1 A        22
##  2 B        65
##  3 C        94
##  4 D        46
##  5 E        41
##  6 F        21
##  7 G         5
##  8 T         1
##  9 X        67
## 10 Y       254
## 11 Z       693
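Deck is the first letter of Cabin (the `strsplit` output "C" "8" "5" splits "C85" into characters). The placeholder decks add up exactly to the missing cabins: X 67 + Y 254 + Z 693 = 1014. In the first ten rows, every third-class passenger without a cabin becomes Z and passenger 10 (second class) becomes Y. This suggests X, Y and Z encode a missing cabin for Pclass 1, 2 and 3. That rule is an inference, and the sketch below assumes it.

```r
# First ten passengers' Cabin and Pclass (from the str() output)
cabin  <- c(NA, "C85", NA, "C123", NA, NA, "E46", NA, NA, NA)
pclass <- c(3, 1, 3, 1, 3, 3, 1, 3, 3, 2)

# Deck = first letter of the cabin ("C23 C25 C27" would also give "C")
deck <- substr(cabin, 1, 1)

# Assumed rule: missing cabins become X, Y or Z for Pclass 1, 2 or 3
missing <- is.na(deck)
deck[missing] <- c("X", "Y", "Z")[pclass[missing]]

deck_levels <- c("A", "B", "C", "D", "E", "F", "G", "T", "X", "Y", "Z")
deck_f <- factor(deck, levels = deck_levels)
as.integer(deck_f)   # 11 3 11 3 11 11 5 11 11 10, as in the prediction section
```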

## Warning: Removed 263 rows containing non-finite values (stat_density).

## Warning: Removed 263 rows containing non-finite values (stat_density).

## Warning: Removed 263 rows containing non-finite values (stat_density).


##     Title  Age.mean    Age.sd Age.median
## 1  Master  5.482642  4.161554          4
## 2    Miss 22.026000 12.300349         22
## 3      Mr 32.252151 12.422089         29
## 4     Mrs 36.918129 12.902087         35
## 5 Officer 45.307692 11.460434         48
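The table gives the mean, standard deviation and median age per Title. The medians differ a lot (4 for Master, 48 for Officer), so a title-wise median is a natural fill value for the 263 missing ages. The output does not show that this was done, so the sketch below is an assumption, shown on hypothetical toy data. The later child/adult/senior cut-offs are not shown and are not sketched.

```r
# Hypothetical mini data set with some ages missing
age   <- c(22, NA, 26, 35, 4, NA, 54, NA, 2)
title <- c("Mr", "Mrs", "Miss", "Mrs", "Master", "Mr", "Mr", "Master", "Master")

# Median age per title, ignoring missing values
med <- tapply(age, title, median, na.rm = TRUE)

# Fill each missing age with the median of its title group
missing <- is.na(age)
age[missing] <- as.vector(med[title[missing]])
age   # 22 35 26 35 4 38 54 3 2
```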


## [1] 929
## [1] "A/5 21171"        "PC 17599"         "STON/O2. 3101282"
## [4] "113803"           "373450"           "330877"
##   PassengerId Survived Pclass
## 1         258        1      1
## 2         505        1      1
## 3         760        1      1
## 4         263        0      1
## 5         559        1      1
## 6         586        1      1
##                                                       Name Sex   Age SibSp
## 1                                     Cherry, Miss. Gladys   1 adult     0
## 2                                    Maioni, Miss. Roberta   1 child     0
## 3 Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)   1 adult     0
## 4                                        Taussig, Mr. Emil   0 adult     1
## 5                   Taussig, Mrs. Emil (Tillie Mandelbaum)   1 adult     1
## 6                                      Taussig, Miss. Ruth   1 child     0
##   Parch Ticket  Fare Embarked   Title Fsize Familysize Deck
## 1     0 110152 86.50        S    Miss     1     single    B
## 2     0 110152 86.50        S    Miss     1     single    B
## 3     0 110152 86.50        S Officer     1     single    B
## 4     1 110413 79.65        S      Mr     3      small    E
## 5     1 110413 79.65        S     Mrs     3      small    E
## 6     2 110413 79.65        S    Miss     3      small    E
##   PassengerId Survived Pclass
## 1           1        0      3
## 2           2        1      1
## 3           3        1      3
## 4           4        1      1
## 5           5        0      3
## 6           6        0      3
##                                                  Name Sex   Age SibSp
## 1                             Braund, Mr. Owen Harris   0 adult     1
## 2 Cumings, Mrs. John Bradley (Florence Briggs Thayer)   1 adult     1
## 3                              Heikkinen, Miss. Laina   1 adult     0
## 4        Futrelle, Mrs. Jacques Heath (Lily May Peel)   1 adult     1
## 5                            Allen, Mr. William Henry   0 adult     0
## 6                                    Moran, Mr. James   0 adult     0
##   Parch           Ticket    Fare Embarked Title Fsize Familysize Deck
## 1     0        A/5 21171  7.2500        S    Mr     2      small    Z
## 2     0         PC 17599 71.2833        C   Mrs     2      small    C
## 3     0 STON/O2. 3101282  7.9250        S  Miss     1     single    Z
## 4     0           113803 53.1000        S   Mrs     2      small    C
## 5     0           373450  8.0500        S    Mr     1     single    Z
## 6     0           330877  8.4583        Q    Mr     1     single    Z
##   TravelGroup
## 1           1
## 2           2
## 3           3
## 4           4
## 5           5
## 6           6
## # A tibble: 6 x 17
##   PassengerId Survived Pclass Name  Sex   Age   SibSp Parch Ticket  Fare
##         <dbl> <fct>    <ord>  <fct> <fct> <chr> <dbl> <dbl> <chr>  <dbl>
## 1         258 1        1      Cher~ 1     adult     0     0 110152  86.5
## 2         505 1        1      Maio~ 1     child     0     0 110152  86.5
## 3         760 1        1      Roth~ 1     adult     0     0 110152  86.5
## 4         263 0        1      Taus~ 0     adult     1     1 110413  79.6
## 5         559 1        1      Taus~ 1     adult     1     1 110413  79.6
## 6         586 1        1      Taus~ 1     child     0     2 110413  79.6
## # ... with 7 more variables: Embarked <fct>, Title <fct>, Fsize <dbl>,
## #   Familysize <fct>, Deck <chr>, TravelGroup <int>, GroupSize <int>
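The `[1] 929` above is the number of distinct tickets. The table sorted by ticket shows passengers who share one (for example 258, 505 and 760 on 110152) even when their Fsize is 1. The first ten rows in the prediction section have TravelGroup 1 to 10 and GroupSize 1 2 1 2 1 1 2 5 3 2. This is consistent with grouping passengers by shared Ticket, but the original may also have used other information. The sketch below assumes ticket-only grouping, and the numbering by first appearance is a hypothetical choice.

```r
ticket <- c("A/5 21171", "PC 17599", "110152", "110152",
            "110413", "110152", "110413", "PC 17599")

length(unique(ticket))                         # number of distinct tickets (929 in the full data)

# Assumed rule: passengers sharing a ticket form one travel group
travel_group <- match(ticket, unique(ticket))  # hypothetical id: order of first appearance
group_size   <- ave(seq_along(ticket), ticket, FUN = length)

travel_group   # 1 2 3 3 4 3 4 2
group_size     # 1 2 3 3 2 3 2 2
```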

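0.6 prediction

The selected features are one-hot encoded into a 33-column matrix. The two wide tables below show its first rows for the training data and for the test data. Two xgboost models (xgb.Booster objects) are then fitted, and the last two outputs show predicted values and predicted survival classes (levels 0 and 1).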

## Classes 'tbl_df', 'tbl' and 'data.frame':    1309 obs. of  17 variables:
##  $ PassengerId: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Survived   : Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 2 2 ...
##  $ Pclass     : Ord.factor w/ 3 levels "1"<"2"<"3": 3 1 3 1 3 3 1 3 3 2 ...
##  $ Name       : Factor w/ 1307 levels "Abbing, Mr. Anthony",..: 156 287 531 430 23 826 775 922 613 855 ...
##  $ Sex        : Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 2 2 ...
##  $ Age        : chr  "adult" "adult" "adult" "adult" ...
##  $ SibSp      : num  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch      : num  0 0 0 0 0 0 0 1 2 0 ...
##  $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
##  $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Embarked   : Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
##  $ Title      : Factor w/ 5 levels "Master","Miss",..: 3 4 2 4 3 3 3 1 4 4 ...
##  $ Fsize      : num  2 2 1 2 1 1 1 5 3 2 ...
##  $ Familysize : Factor w/ 3 levels "large","single",..: 3 3 2 3 2 2 2 1 3 3 ...
##  $ Deck       : chr  "Z" "C" "Z" "C" ...
##  $ TravelGroup: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ GroupSize  : int  1 2 1 2 1 1 2 5 3 2 ...
## Classes 'tbl_df', 'tbl' and 'data.frame':    1309 obs. of  10 variables:
##  $ Survived  : Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 2 2 ...
##  $ Pclass    : Ord.factor w/ 3 levels "1"<"2"<"3": 3 1 3 1 3 3 1 3 3 2 ...
##  $ Sex       : Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 2 2 ...
##  $ Age       : Factor w/ 3 levels "adult","child",..: 1 1 1 1 1 1 1 2 1 2 ...
##  $ Fare      : num  2.11 4.28 2.19 3.99 2.2 ...
##  $ Embarked  : Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
##  $ Title     : Factor w/ 5 levels "Master","Miss",..: 3 4 2 4 3 3 3 1 4 4 ...
##  $ Familysize: Factor w/ 3 levels "large","single",..: 3 3 2 3 2 2 2 1 3 3 ...
##  $ Deck      : Factor w/ 11 levels "A","B","C","D",..: 11 3 11 3 11 11 5 11 11 10 ...
##  $ GroupSize : Factor w/ 9 levels "1","2","3","4",..: 1 2 1 2 1 1 2 5 3 2 ...
##   Pclass1 Pclass2 Pclass3 Sex1 Agechild Agesenior     Fare EmbarkedQ
## 1       0       0       1    0        0         0 2.110213         0
## 2       1       0       0    1        0         0 4.280593         0
## 3       0       0       1    1        0         0 2.188856         0
## 4       1       0       0    1        0         0 3.990834         0
## 5       0       0       1    0        0         0 2.202765         0
## 6       0       0       1    0        0         0 2.246893         1
##   EmbarkedS TitleMiss TitleMr TitleMrs TitleOfficer Familysizesingle
## 1         1         0       1        0            0                0
## 2         0         0       0        1            0                0
## 3         1         1       0        0            0                1
## 4         1         0       0        1            0                0
## 5         1         0       1        0            0                1
## 6         0         0       1        0            0                1
##   Familysizesmall DeckB DeckC DeckD DeckE DeckF DeckG DeckT DeckX DeckY
## 1               1     0     0     0     0     0     0     0     0     0
## 2               1     0     1     0     0     0     0     0     0     0
## 3               0     0     0     0     0     0     0     0     0     0
## 4               1     0     1     0     0     0     0     0     0     0
## 5               0     0     0     0     0     0     0     0     0     0
## 6               0     0     0     0     0     0     0     0     0     0
##   DeckZ GroupSize2 GroupSize3 GroupSize4 GroupSize5 GroupSize6 GroupSize7
## 1     1          0          0          0          0          0          0
## 2     0          1          0          0          0          0          0
## 3     1          0          0          0          0          0          0
## 4     0          1          0          0          0          0          0
## 5     1          0          0          0          0          0          0
## 6     1          0          0          0          0          0          0
##   GroupSize8 GroupSize11
## 1          0           0
## 2          0           0
## 3          0           0
## 4          0           0
## 5          0           0
## 6          0           0
##   Pclass1 Pclass2 Pclass3 Sex1 Agechild Agesenior     Fare EmbarkedQ
## 1       0       0       1    0        0         0 2.178064         1
## 2       0       0       1    1        0         0 2.079442         0
## 3       0       1       0    0        0         0 2.369075         1
## 4       0       0       1    0        0         0 2.268252         0
## 5       0       0       1    1        0         0 2.586824         0
## 6       0       0       1    0        1         0 2.324836         0
##   EmbarkedS TitleMiss TitleMr TitleMrs TitleOfficer Familysizesingle
## 1         0         0       1        0            0                1
## 2         1         0       0        1            0                0
## 3         0         0       1        0            0                1
## 4         1         0       1        0            0                1
## 5         1         0       0        1            0                0
## 6         1         0       1        0            0                1
##   Familysizesmall DeckB DeckC DeckD DeckE DeckF DeckG DeckT DeckX DeckY
## 1               0     0     0     0     0     0     0     0     0     0
## 2               1     0     0     0     0     0     0     0     0     0
## 3               0     0     0     0     0     0     0     0     0     1
## 4               0     0     0     0     0     0     0     0     0     0
## 5               1     0     0     0     0     0     0     0     0     0
## 6               0     0     0     0     0     0     0     0     0     0
##   DeckZ GroupSize2 GroupSize3 GroupSize4 GroupSize5 GroupSize6 GroupSize7
## 1     1          0          0          0          0          0          0
## 2     1          0          0          0          0          0          0
## 3     0          0          0          0          0          0          0
## 4     1          0          0          0          0          0          0
## 5     1          1          0          0          0          0          0
## 6     1          0          0          0          0          0          0
##   GroupSize8 GroupSize11
## 1          0           0
## 2          0           0
## 3          0           0
## 4          0           0
## 5          0           0
## 6          0           0
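The column names above (Pclass1, Pclass2, Pclass3, Sex1, EmbarkedQ, EmbarkedS, ...) follow the pattern R's `model.matrix()` produces for a formula without an intercept. The first factor keeps a column for every level, and every later factor drops its first level as the reference. Counting columns the same way gives 3 + 1 + 2 + 1 + 2 + 4 + 2 + 10 + 8 = 33, matching `feature_names`. The original call is not shown; the sketch below assumes `model.matrix(~ . - 1, ...)` on toy rows.

```r
df <- data.frame(
  Pclass   = factor(c(3, 1, 3, 2)),
  Sex      = factor(c(0, 1, 1, 0)),
  Embarked = factor(c("S", "C", "S", "Q")),
  Fare     = log1p(c(7.25, 71.2833, 7.925, 13))
)

# "- 1" drops the intercept, so the first factor (Pclass) keeps all its levels;
# later factors use treatment coding (the first level gets no column)
X <- model.matrix(~ . - 1, data = df)
colnames(X)   # "Pclass1" "Pclass2" "Pclass3" "Sex1" "EmbarkedQ" "EmbarkedS" "Fare"
```

By default `model.matrix()` drops rows with missing predictors, which is one reason the imputation steps in the data munging sections have to run before encoding.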

##               Length  Class              Mode       
## handle              1 xgb.Booster.handle externalptr
## raw           3882785 -none-             raw        
## niter               1 -none-             numeric    
## call                7 -none-             call       
## params             10 -none-             list       
## callbacks           0 -none-             list       
## feature_names      33 -none-             character  
## nfeatures           1 -none-             numeric
##               Length  Class              Mode       
## handle              1 xgb.Booster.handle externalptr
## raw           3449489 -none-             raw        
## niter               1 -none-             numeric    
## call                7 -none-             call       
## params             10 -none-             list       
## callbacks           0 -none-             list       
## feature_names      33 -none-             character  
## nfeatures           1 -none-             numeric
##  [1] 0.52341890 0.79136348 0.88854092 0.05676261 0.05077031 0.27443644
##  [7] 0.81016070 0.05935763 0.11204775 0.20620725
##  [1] 1 1 1 0 0 0 1 0 1 1
## Levels: 0 1