1. Introduction

Given a dataset of ticket purchases, I have to build a model that predicts whether each client requests extra baggage. I start from an exploratory data analysis (EDA) perspective and iterate on feature engineering, i.e. transforming the current features and extracting new ones from the data.

With the data prepared for modelling, I start with a logistic regression (logit) on all variables and a lasso-shrunk logit. These models are meant more for gaining insight into the data than for prediction. Even so, I produce ROC curves for them and report the F1 score, as requested by the question set.

Next, I move on to a decision tree, a random forest, and gradient boosting. The latter two are known for their predictive power, so I tune them with a grid search over their parameters and look for the values that yield the best F1. The models are finally compared on F1, and the best one is used for prediction on the final dataset.
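For reference, F1 is the harmonic mean of precision and recall, and it is the score used to compare the models. The sketch below computes it in base R from predicted and actual labels. The function name `f1_score` and the `"True"`/`"False"` label coding (chosen to match `EXTRA_BAGGAGE`) are assumptions for illustration. They are not taken from this analysis.

```r
# Hypothetical helper: F1 = 2 * precision * recall / (precision + recall)
f1_score <- function(predicted, actual, positive = "True") {
  tp <- sum(predicted == positive & actual == positive)   # true positives
  fp <- sum(predicted == positive & actual != positive)   # false positives
  fn <- sum(predicted != positive & actual == positive)   # false negatives
  if (tp == 0) return(0)                                  # avoids 0/0 when no positive is found
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

actual    <- c("True", "True", "False", "False", "True")
predicted <- c("True", "False", "False", "True", "True")
f1_score(predicted, actual)   # 2 TP, 1 FP, 1 FN -> precision = recall = 2/3 -> F1 = 0.6666667
```

The caret package, which is loaded below, also reports F1 from `confusionMatrix()` when it is called with `mode = "prec_recall"` or `mode = "everything"`.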

I have tried to make this analysis as visual as possible. It contains more than 30 visualizations, most of them interactive, to help the reader follow the story.

Note 1: The code is written for readability rather than performance. Hence there are many for loops instead of a parallel approach such as mclapply in R. The for loops are slow, but they are easy to read, especially for non-R users. I know how to speed the code up by parallelizing it.
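The sketch below illustrates the trade-off in Note 1 by writing the same computation twice, once as a for loop and once with `parallel::mclapply()`. The function `evaluate_setting()` is a hypothetical stand-in for an expensive tuning step. `mclapply()` relies on forking, so on Windows it can only run with `mc.cores = 1`.

```r
library(parallel)  # ships with base R; provides mclapply()

# Hypothetical stand-in for one expensive tuning step (e.g. fitting one parameter setting)
evaluate_setting <- function(k) {
  sum(sqrt(seq_len(k)))
}

settings <- c(10, 100, 1000)

# Readable sequential version with a for loop
results_loop <- numeric(length(settings))
for (i in seq_along(settings)) {
  results_loop[i] <- evaluate_setting(settings[i])
}

# Parallel version: forking is unavailable on Windows, so fall back to one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
results_par <- unlist(mclapply(settings, evaluate_setting, mc.cores = n_cores))

all.equal(results_loop, results_par)   # TRUE: same results, computed in parallel
```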

Note 2: There are other approaches I would have liked to try, such as SVM and neural networks (NN), but my five-year-old laptop is not powerful enough to tune them all. Tuning each RF or GB model took hours, so unfortunately I could not try more models. However, the code for some of these models is presented in the final section to show my knowledge of them.

2. First evaluation of the data

2.1 Libraries

Various R libraries are used to avoid reinventing the wheel. In some cases, however, I code from scratch even where an R package is available, in order to show my understanding of the underlying logic. For instance, it is possible to fit all the models with a few commands using the caret package and then compare them. I avoid this approach and build and tune the models myself.

library(tidyverse)     # attaches ggplot2, dplyr, tidyr, tibble, readr, purrr, stringr, forcats
library(knitr)
library(lubridate)     # for date and time handling
library(plotly)        # for interactive visualization
library(alluvial)      # for alluvial diagrams
library(ggalluvial)    # for alluvial diagrams with ggplot2
library(tsne)          # for t-SNE visualization
library(caTools)       # for sample.split()
library(caret)         # for confusionMatrix()
library(ROCR)          # for ROC curves
library(Amelia)        # for missing-value visualization
library(glmnet)        # for lasso
library(e1071)         # for SVM
library(kohonen)       # for SOM visualization
library(cluster)       # for daisy()
library(tempR)         # for pretty_palette()
library(rpart)         # for decision trees
library(tree)          # for decision trees
library(xgboost)       # for gradient boosting
library(randomForest)  # for random forests
library(mice)          # for missing-value imputation
library(VIM)           # for visualization of missing values
library(Matrix)        # for sparse model matrices

# macOS-only workaround: load the JVM library explicitly so that rJava can find it
# (the jre/ subdirectory exists only in Java 8; newer JDKs use lib/server/libjvm.dylib)
dyn.load(paste0(system2('/usr/libexec/java_home', stdout = TRUE), '/jre/lib/server/libjvm.dylib'))
library(rJava)

# Packages tried but not used in the final analysis:
# rjson, jsonlite, tidyjson (loading JSON), parallel (parallel computing), smooth, forecast,
# gplots (balloonplot), venneuler, VennDiagram, forcats (collapsing rare factor levels),
# pROC, bestglm (model selection for glm), ranger and Rborist (random forests),
# som, rattle (tree visualization), data.table

2.2 Reading data

Before looking at the data, I would like to speculate a little based on intuition. The goal is stated as “The goal of this task is to predict which new customers are going to purchase additional baggage for their trips”, and common sense suggests some influential variables. For instance, extra luggage is more likely when travelling with children, especially infants. When the length of stay, i.e. the time between departure and arrival, is longer than a short trip, extra luggage is probably needed. I also expect continental travel to be associated with extra luggage, in contrast to domestic flights, although this association may really be driven by the length of stay. In addition, the month of travel may be related to the length of stay: I expect longer stays in August than in November!

Most of the variables are not stored with their correct type, so a little data cleaning and type conversion is needed. (By the way, “type” is the Python term; R speaks of classes.)

# Parse "day/month" strings; the year is missing, so as.Date() fills in the current year
train_data$TIMESTAMP <- as.Date(x = train_data$TIMESTAMP, format = "%d/%b")
train_data$DEPARTURE <- as.Date(x = train_data$DEPARTURE, format = "%d/%b")
train_data$ARRIVAL   <- as.Date(x = train_data$ARRIVAL,   format = "%d/%b")

# Convert the categorical columns to factors
train_data <- 
        train_data %>% 
        mutate(WEBSITE = factor(WEBSITE), 
               TRAIN = factor(TRAIN), 
               DEVICE = factor(DEVICE),
               HAUL_TYPE = factor(HAUL_TYPE),
               TRIP_TYPE = factor(TRIP_TYPE),
               PRODUCT = factor(PRODUCT),
               SMS = factor(SMS),
               EXTRA_BAGGAGE = factor(EXTRA_BAGGAGE)
               )

glimpse(train_data)
## Observations: 50,000
## Variables: 18
## $ ID            <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...
## $ TIMESTAMP     <date> 2018-07-01, 2018-07-01, 2018-07-01, 2018-07-01,...
## $ WEBSITE       <fct> EDES, EDIT, OPUK, OPIT, EDES, EDFR, EDES, EDES, ...
## $ GDS           <int> 1, 0, 2, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, ...
## $ DEPARTURE     <date> 2018-07-22, 2018-07-29, 2018-07-29, 2018-07-24,...
## $ ARRIVAL       <date> 2018-07-25, 2018-07-29, 2018-08-19, 2018-08-04,...
## $ ADULTS        <int> 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, ...
## $ CHILDREN      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ INFANTS       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ TRAIN         <fct> False, False, False, False, False, False, False,...
## $ HAUL_TYPE     <fct> DOMESTIC, CONTINENTAL, CONTINENTAL, DOMESTIC, CO...
## $ DISTANCE      <dbl> 628844, 128143, 173035, 652702, 171785, 106556, ...
## $ DEVICE        <fct> TABLET, SMARTPHONE, TABLET, SMARTPHONE, COMPUTER...
## $ TRIP_TYPE     <fct> ROUND_TRIP, ONE_WAY, ROUND_TRIP, MULTI_DESTINATI...
## $ PRODUCT       <fct> TRIP, TRIP, TRIP, TRIP, TRIP, TRIP, TRIP, TRIP, ...
## $ SMS           <fct> True, False, True, False, False, False, True, Fa...
## $ EXTRA_BAGGAGE <fct> False, False, False, False, False, False, False,...
## $ NO_GDS        <int> 0, 1, 0, 2, 1, 1, 0, 0, 0, 2, 1, 0, 1, 0, 0, 0, ...
# train_data %>%
#         filter(TIMESTAMP > DEPARTURE)

It is usually useful to extract new features from the existing ones. At the very least, doing so engages the mind with the data, so it is part of the “making sense of the data” phase. At best, the new features help distinguish the classes of the response variable, i.e. they improve the predictive power of the model.

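# calendar features (weekday, month, day of month) for departure, arrival and booking
# timestamp, plus the length of stay in days; wday() numbers days from 1 = Sunday by default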
train_data <- 
        train_data %>% 
        mutate(dep_wday = factor(wday(DEPARTURE)), 
               arr_wday = factor(wday(ARRIVAL)), 
               stmp_wday = factor(wday(TIMESTAMP)), 
               dep_month = factor(month(DEPARTURE)), 
               arr_month = factor(month(ARRIVAL)), 
               stmp_month = factor(month(TIMESTAMP)), 
               dep_day = factor(day(DEPARTURE)), 
               arr_day = factor(day(ARRIVAL)), 
               stmp_day = factor(day(TIMESTAMP)), 
               staying_duration= as.integer(ARRIVAL-DEPARTURE) )

# split WEBSITE codes such as "EDES" into website ("ED") and country ("ES")
train_data <- 
        train_data %>% 
                mutate( country = factor(substr(WEBSITE,start = 3 , stop = 10)), 
                        website = factor(substr(WEBSITE , start = 1 , stop = 2)), 
                        WEBSITE = NULL )
        
# number of flights/tickets bought 

train_data <- 
        train_data %>% 
        mutate(num_of_flights = GDS + NO_GDS)


# TODO: split each month into thirds (days 1-10, 11-20, 21-31)

# TODO: a feature for customers travelling alone?
summary(train_data)
##        ID          TIMESTAMP               GDS        
##  Min.   :    0   Min.   :2018-07-01   Min.   :0.0000  
##  1st Qu.:12500   1st Qu.:2018-07-01   1st Qu.:0.0000  
##  Median :25000   Median :2018-07-01   Median :1.0000  
##  Mean   :25000   Mean   :2018-07-01   Mean   :0.6424  
##  3rd Qu.:37499   3rd Qu.:2018-07-01   3rd Qu.:1.0000  
##  Max.   :49999   Max.   :2018-07-02   Max.   :4.0000  
##                                                       
##    DEPARTURE             ARRIVAL               ADULTS     
##  Min.   :2018-01-01   Min.   :2018-01-01   Min.   :0.000  
##  1st Qu.:2018-07-07   1st Qu.:2018-07-10   1st Qu.:1.000  
##  Median :2018-07-20   Median :2018-07-27   Median :1.000  
##  Mean   :2018-07-29   Mean   :2018-08-02   Mean   :1.488  
##  3rd Qu.:2018-08-12   3rd Qu.:2018-08-22   3rd Qu.:2.000  
##  Max.   :2018-12-31   Max.   :2018-12-31   Max.   :9.000  
##                                                           
##     CHILDREN         INFANTS          TRAIN                  HAUL_TYPE    
##  Min.   :0.0000   Min.   :0.00000   False:49731   CONTINENTAL     :25878  
##  1st Qu.:0.0000   1st Qu.:0.00000   True :  269   DOMESTIC        :11053  
##  Median :0.0000   Median :0.00000                 INTERCONTINENTAL:13069  
##  Mean   :0.0991   Mean   :0.01816                                         
##  3rd Qu.:0.0000   3rd Qu.:0.00000                                         
##  Max.   :5.0000   Max.   :2.00000                                         
##                                                                           
##     DISTANCE             DEVICE                  TRIP_TYPE    
##  Min.   :     0   COMPUTER  :34064   MULTI_DESTINATION: 2161  
##  1st Qu.:132962   OTHER     :  942   ONE_WAY          :17335  
##  Median :224002   SMARTPHONE:11709   ROUND_TRIP       :30504  
##  Mean   :367276   TABLET    : 3152                            
##  3rd Qu.:612708   NA's      :  133                            
##  Max.   :999941                                               
##                                                               
##     PRODUCT         SMS        EXTRA_BAGGAGE     NO_GDS       dep_wday
##  DYNPACK:  956   False:25168   False:40201   Min.   :0.0000   1:8763  
##  TRIP   :49044   True :24832   True : 9799   1st Qu.:0.0000   2:8260  
##                                              Median :1.0000   3:7062  
##                                              Mean   :0.5913   4:7364  
##                                              3rd Qu.:1.0000   5:6064  
##                                              Max.   :4.0000   6:5896  
##                                                               7:6591  
##  arr_wday stmp_wday   dep_month       arr_month     stmp_month
##  1:6702   1:37822   7      :31496   7      :26701   7:50000   
##  2:7499   2:12178   8      :10783   8      :13286             
##  3:9422             9      : 3997   9      : 5209             
##  4:8123             10     : 1761   10     : 2212             
##  5:6317             12     :  709   11     :  918             
##  6:6165             11     :  640   12     :  594             
##  7:5772             (Other):  614   (Other): 1080             
##     dep_day         arr_day      stmp_day  staying_duration  
##  2      : 3532   3      : 2555   1:37822   Min.   :-363.000  
##  3      : 3164   2      : 2522   2:12178   1st Qu.:   0.000  
##  4      : 2795   4      : 2314             Median :   3.000  
##  5      : 2652   7      : 2128             Mean   :   4.309  
##  8      : 2387   5      : 2119             3rd Qu.:   8.000  
##  6      : 2335   10     : 2114             Max.   : 179.000  
##  (Other):33135   (Other):36248                               
##     country      website    num_of_flights 
##  FR     :15165   ED:28368   Min.   :1.000  
##  ES     : 7725   GO: 6014   1st Qu.:1.000  
##  DE     : 6349   OP:14715   Median :1.000  
##  UK     : 5486   TL:  903   Mean   :1.234  
##  IT     : 4823              3rd Qu.:1.000  
##  GB     : 2754              Max.   :4.000  
##  (Other): 7698

Something is strange in the summary above: staying_duration has negative values (the minimum is -363 days), and its first quartile is 0. Let’s look at the records with negative values.

train_data %>% 
        filter(staying_duration <0)
## # A tibble: 469 x 30
##       ID TIMESTAMP    GDS DEPARTURE  ARRIVAL    ADULTS CHILDREN INFANTS
##    <int> <date>     <int> <date>     <date>      <int>    <int>   <int>
##  1    15 2018-07-01     1 2018-12-15 2018-01-29      1        0       0
##  2   135 2018-07-01     1 2018-12-26 2018-01-07      4        0       0
##  3   216 2018-07-01     0 2018-12-22 2018-01-02      2        0       0
##  4   489 2018-07-01     1 2018-09-07 2018-02-06      1        0       0
##  5   576 2018-07-01     1 2018-07-04 2018-03-31      1        0       0
##  6   754 2018-07-01     0 2018-12-29 2018-01-28      1        0       0
##  7   968 2018-07-01     1 2018-08-08 2018-03-01      2        0       0
##  8  1000 2018-07-01     0 2018-12-19 2018-01-04      2        1       0
##  9  1010 2018-07-01     1 2018-12-26 2018-01-15      2        1       0
## 10  1039 2018-07-01     1 2018-12-17 2018-01-07      1        0       0
## # ... with 459 more rows, and 22 more variables: TRAIN <fct>,
## #   HAUL_TYPE <fct>, DISTANCE <dbl>, DEVICE <fct>, TRIP_TYPE <fct>,
## #   PRODUCT <fct>, SMS <fct>, EXTRA_BAGGAGE <fct>, NO_GDS <int>,
## #   dep_wday <fct>, arr_wday <fct>, stmp_wday <fct>, dep_month <fct>,
## #   arr_month <fct>, stmp_month <fct>, dep_day <fct>, arr_day <fct>,
## #   stmp_day <fct>, staying_duration <int>, country <fct>, website <fct>,
## #   num_of_flights <int>

As the output shows, assigning 2018 to every date causes a problem. When, for example, the departure is in December and the arrival is in January, the length of stay comes out negative, which is impossible without a time machine. One solution is to add one year to the arrival date of the rows with a negative length of stay. Another is to subtract one year from their departure date. I chose the first option: since all the bookings were made in July 2018, the arrival dates of these trips should be in 2019, not 2017.
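The correction can be illustrated on two rows taken from the output above (IDs 15 and 0). This sketch uses base R rather than the lubridate `%m+%` call used below. The hypothetical helper `add_one_year()` rebuilds each date with its year incremented. It would return `NA` for a 29 February input, but that case cannot occur here because 2018 is not a leap year.

```r
# Two rows from the output above: ID 15 (December -> January) and ID 0 (a normal trip)
departure <- as.Date(c("2018-12-15", "2018-07-22"))
arrival   <- as.Date(c("2018-01-29", "2018-07-25"))

stay <- as.integer(arrival - departure)   # -320 and 3: the first one is impossible

# Hypothetical helper: same month and day, one year later
add_one_year <- function(d) {
  as.Date(paste0(as.integer(format(d, "%Y")) + 1L, format(d, "-%m-%d")))
}

negative <- stay < 0
arrival[negative] <- add_one_year(arrival[negative])
stay_fixed <- as.integer(arrival - departure)   # 45 and 3
```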

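# Move the arrival of the negative-stay rows forward by one year
neg_stay <- train_data$staying_duration < 0
train_data$ARRIVAL[neg_stay] <- train_data$ARRIVAL[neg_stay] %m+% years(1)

# Recompute the length of stay with the corrected arrival dates
# (note: arr_wday was derived before this fix and is not recomputed here)
train_data <- 
        train_data %>% 
        mutate(staying_duration = as.integer(ARRIVAL - DEPARTURE))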

summary(train_data)
##        ID          TIMESTAMP               GDS        
##  Min.   :    0   Min.   :2018-07-01   Min.   :0.0000  
##  1st Qu.:12500   1st Qu.:2018-07-01   1st Qu.:0.0000  
##  Median :25000   Median :2018-07-01   Median :1.0000  
##  Mean   :25000   Mean   :2018-07-01   Mean   :0.6424  
##  3rd Qu.:37499   3rd Qu.:2018-07-01   3rd Qu.:1.0000  
##  Max.   :49999   Max.   :2018-07-02   Max.   :4.0000  
##                                                       
##    DEPARTURE             ARRIVAL               ADULTS     
##  Min.   :2018-01-01   Min.   :2018-01-01   Min.   :0.000  
##  1st Qu.:2018-07-07   1st Qu.:2018-07-10   1st Qu.:1.000  
##  Median :2018-07-20   Median :2018-07-27   Median :1.000  
##  Mean   :2018-07-29   Mean   :2018-08-06   Mean   :1.488  
##  3rd Qu.:2018-08-12   3rd Qu.:2018-08-23   3rd Qu.:2.000  
##  Max.   :2018-12-31   Max.   :2019-06-26   Max.   :9.000  
##                                                           
##     CHILDREN         INFANTS          TRAIN                  HAUL_TYPE    
##  Min.   :0.0000   Min.   :0.00000   False:49731   CONTINENTAL     :25878  
##  1st Qu.:0.0000   1st Qu.:0.00000   True :  269   DOMESTIC        :11053  
##  Median :0.0000   Median :0.00000                 INTERCONTINENTAL:13069  
##  Mean   :0.0991   Mean   :0.01816                                         
##  3rd Qu.:0.0000   3rd Qu.:0.00000                                         
##  Max.   :5.0000   Max.   :2.00000                                         
##                                                                           
##     DISTANCE             DEVICE                  TRIP_TYPE    
##  Min.   :     0   COMPUTER  :34064   MULTI_DESTINATION: 2161  
##  1st Qu.:132962   OTHER     :  942   ONE_WAY          :17335  
##  Median :224002   SMARTPHONE:11709   ROUND_TRIP       :30504  
##  Mean   :367276   TABLET    : 3152                            
##  3rd Qu.:612708   NA's      :  133                            
##  Max.   :999941                                               
##                                                               
##     PRODUCT         SMS        EXTRA_BAGGAGE     NO_GDS       dep_wday
##  DYNPACK:  956   False:25168   False:40201   Min.   :0.0000   1:8763  
##  TRIP   :49044   True :24832   True : 9799   1st Qu.:0.0000   2:8260  
##                                              Median :1.0000   3:7062  
##                                              Mean   :0.5913   4:7364  
##                                              3rd Qu.:1.0000   5:6064  
##                                              Max.   :4.0000   6:5896  
##                                                               7:6591  
##  arr_wday stmp_wday   dep_month       arr_month     stmp_month
##  1:6702   1:37822   7      :31496   7      :26701   7:50000   
##  2:7499   2:12178   8      :10783   8      :13286             
##  3:9422             9      : 3997   9      : 5209             
##  4:8123             10     : 1761   10     : 2212             
##  5:6317             12     :  709   11     :  918             
##  6:6165             11     :  640   12     :  594             
##  7:5772             (Other):  614   (Other): 1080             
##     dep_day         arr_day      stmp_day  staying_duration 
##  2      : 3532   3      : 2555   1:37822   Min.   :  0.000  
##  3      : 3164   2      : 2522   2:12178   1st Qu.:  0.000  
##  4      : 2795   4      : 2314             Median :  3.000  
##  5      : 2652   7      : 2128             Mean   :  7.732  
##  8      : 2387   5      : 2119             3rd Qu.:  9.000  
##  6      : 2335   10     : 2114             Max.   :351.000  
##  (Other):33135   (Other):36248                              
##     country      website    num_of_flights 
##  FR     :15165   ED:28368   Min.   :1.000  
##  ES     : 7725   GO: 6014   1st Qu.:1.000  
##  DE     : 6349   OP:14715   Median :1.000  
##  UK     : 5486   TL:  903   Mean   :1.234  
##  IT     : 4823              3rd Qu.:1.000  
##  GB     : 2754              Max.   :4.000  
##  (Other): 7698

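Because the raw dates contain only a day and a month, as.Date() fills in the current year, which was 2018 when this analysis was run. This causes another problem: some timestamps are later than their departure dates, since all timestamps fall on 1 or 2 July 2018 while departures start as early as January 2018. So I move the timestamps back one year, to 2017. Unfortunately, no documentation is attached to the dataset, so I have to make this assumption and continue.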

# Move every booking timestamp back by one year, to July 2017
train_data <- 
        train_data %>% 
        mutate(TIMESTAMP = TIMESTAMP %m-% years(1))
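The year-filling behaviour behind both corrections can be checked directly. R's `strptime()` documentation states that an unspecified year is taken to be the current one. The sketch uses a numeric month (`%m`) instead of the `%b` month name used above, so the result does not depend on the locale.

```r
# A day/month string with no year: as.Date() fills in the year at run time
d <- as.Date("15/12", format = "%d/%m")
format(d, "%m-%d")                                     # "12-15"
identical(format(d, "%Y"), format(Sys.Date(), "%Y"))   # TRUE: the current year
```

This is why every date parsed in 2018 landed in 2018. It also means that re-running the parsing code in a different year would shift every date to that year.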

Nevertheless, the timestamp does not seem useful here. It takes only two values, 1 and 2 July, and there is no plausible reason for extra luggage to depend on them. Moreover, the test dataset has different timestamp dates, so it is better to remove this variable, together with the features derived from it, from the dataset.

train_data <- 
        train_data %>% 
        dplyr::select(-TIMESTAMP, -stmp_wday, -stmp_day, -stmp_month)

2.3 Duplicates

I noticed that the data contains duplicated records. Ignoring the ID variable, the number of distinct records is:

train_data %>% 
        dplyr::select(-ID ) %>% 
        distinct() %>% 
        nrow()
## [1] 47554

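So 2,446 of the 50,000 rows (about 4.9%) duplicate another row. What can I do? I assume you are aware of this, and I continue with the data as they are. However, the duplicates will affect the prediction models. For example, identical records can end up in both the training and the validation split, which inflates the validation scores.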
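If one decided to drop the duplicates instead, base R's `duplicated()` can check every column except ID. The toy data frame below is made up for illustration. On the real data, the same call would flag the 2,446 surplus rows.

```r
# Hypothetical toy data: rows 1 and 2 differ only in ID
toy <- data.frame(ID = 1:5,
                  ADULTS = c(1, 1, 2, 2, 3),
                  HAUL_TYPE = c("DOMESTIC", "DOMESTIC", "CONTINENTAL", "DOMESTIC", "DOMESTIC"))

# Flag rows that repeat an earlier row once ID is ignored
is_dup <- duplicated(toy[, setdiff(names(toy), "ID")])
sum(is_dup)          # 1

# Keep only the first occurrence of each record
toy_unique <- toy[!is_dup, ]
nrow(toy_unique)     # 4
```

If duplicates are removed, this should happen before the data is split into training and validation sets, so that identical records cannot appear on both sides.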

2.4 Missing Values

Are there missing values in our data? Most real datasets contain missing values, however careful the measurement. We can impute them, or we can simply drop the incomplete cases.

As the summary above shows, 133 records are incomplete: their DEVICE value is missing. They may cause problems in the modelling stage. They are very few relative to the size of the data (133 / 50,000, about 0.27% of the records), so I could safely remove them from the dataset. However, I would then have to decide what to do with the missing values in the test dataset. For now, I leave the missing values in the training dataset as they are.
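The count of incomplete records can be reproduced with `complete.cases()`, and `colSums(is.na())` shows which columns the gaps come from. The toy data below is made up and only mimics a missing DEVICE value.

```r
# Hypothetical toy data with two missing DEVICE values
toy <- data.frame(ADULTS = c(1, 2, 1, 3),
                  DEVICE = factor(c("COMPUTER", NA, "TABLET", NA)))

colSums(is.na(toy))          # missing values per column: ADULTS 0, DEVICE 2
sum(!complete.cases(toy))    # 2 incomplete records
mean(!complete.cases(toy))   # 0.5: share of incomplete records

# Dropping the incomplete cases, if one chose to
toy_complete <- toy[complete.cases(toy), ]
```

On the training data, `mean(!complete.cases(train_data))` would return 133 / 50,000 = 0.00266.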

3. Making Sense of Data - EDA

With the data in the shape I want, I can now start the first analytical stage: exploratory data analysis, to make sense of the data. More specifically, my focus is on visualization. In the following subsections, I start with univariate plots and work up to the big picture through high-dimensional data visualization.

3.1 Univariate

3.1.1 Website

#the websites 
levels(train_data$website)
## [1] "ED" "GO" "OP" "TL"
# number of purchases per website

g_websites <- 
        train_data %>% 
                dplyr::count(website, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(website,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "websites") + 
                ylab(label = "purchases") + 
                ggtitle("Websites: Frequency of Purchase") 

ggplotly(g_websites)

Most of the purchases come from “ED” (28,368), and the fewest come from “TL” (903).

3.1.2 GDS, NO_GDS, and num_of_flights

g_gds <- 
train_data %>% 
        ggplot() + 
        geom_histogram(aes(x = GDS), binwidth = 1, color = "white", fill = "blue") + 
        theme_light() + 
                xlab(label = "Number of Flights") + 
                ylab(label = "Frequency") + 
                ggtitle("Flights# bought through the Global Distribution System")
        
ggplotly(g_gds)
g_nogds <- 
train_data %>% 
        ggplot() + 
        geom_histogram(aes(x = NO_GDS), binwidth = 1, color = "white", fill = "gold") + 
        theme_light() + 
                xlab(label = "Number of Flights") + 
                ylab(label = "Frequency") + 
                ggtitle("Flights# bought NOT through the Global Distribution System")
        
ggplotly(g_nogds)
g_num_of_flights <- 
train_data %>% 
        ggplot() + 
        geom_histogram(aes(x = num_of_flights), binwidth = 1, color = "white", fill = "darkgreen") + 
        theme_light() + 
                xlab(label = "Number of Flights") + 
                ylab(label = "Frequency") + 
                ggtitle("Flights# bought in each transaction")
        
ggplotly(g_num_of_flights)
# subplot(g_gds, g_nogds,g_num_of_flights , nrows = 3)

The GDS histogram does not show much on its own, without the NO_GDS variable. Together, they tell us how many purchases are for one flight, how many for two, and so on. This is why I created the num_of_flights variable.

As the num_of_flights histogram shows, most transactions are for a single flight, followed by two flights, and so on. Let’s look at the exact counts, since the bar for four flights is barely visible at the scale of the plot.

table(train_data$num_of_flights)
## 
##     1     2     3     4 
## 38769 10783   439     9
train_data %>%
        filter(num_of_flights == 1, TRIP_TYPE == "ROUND_TRIP" )
## # A tibble: 21,434 x 26
##       ID   GDS DEPARTURE  ARRIVAL    ADULTS CHILDREN INFANTS TRAIN
##    <int> <int> <date>     <date>      <int>    <int>   <int> <fct>
##  1     0     1 2018-07-22 2018-07-25      1        0       0 False
##  2     7     1 2018-07-18 2018-07-25      2        0       0 False
##  3    11     1 2018-07-27 2018-08-04      1        0       0 False
##  4    14     1 2018-08-04 2018-09-07      1        0       0 False
##  5    15     1 2018-12-15 2019-01-29      1        0       0 False
##  6    17     1 2018-07-07 2018-07-10      1        0       0 False
##  7    21     1 2018-07-16 2018-07-18      2        0       0 False
##  8    23     0 2018-09-16 2018-09-18      9        0       0 False
##  9    24     0 2018-09-19 2018-09-25      2        0       0 False
## 10    26     0 2018-08-11 2018-08-12      1        0       0 False
## # ... with 21,424 more rows, and 18 more variables: HAUL_TYPE <fct>,
## #   DISTANCE <dbl>, DEVICE <fct>, TRIP_TYPE <fct>, PRODUCT <fct>,
## #   SMS <fct>, EXTRA_BAGGAGE <fct>, NO_GDS <int>, dep_wday <fct>,
## #   arr_wday <fct>, dep_month <fct>, arr_month <fct>, dep_day <fct>,
## #   arr_day <fct>, staying_duration <int>, country <fct>, website <fct>,
## #   num_of_flights <int>

These counts sum to 50,000 (38,769 + 10,783 + 439 + 9), the number of records in the dataset. This simply confirms that every transaction includes between one and four flights and that there are no zero-flight transactions. Do I interpret GDS and NO_GDS correctly?

It also seems that many records (21,434) have GDS + NO_GDS = 1 while being round trips. What does this mean? I am still unsure about parts of the data’s context, but I am really enjoying it. This is how one learns.

3.1.3 Adults

g_adults <- 
        train_data %>% 
                dplyr::count(ADULTS, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(ADULTS,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "ADULTS") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of ADULTS numbers for each transaction") 

ggplotly(g_adults)
# percentage
g_adult_perc <- 
        train_data %>% 
                dplyr::count(ADULTS, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(ADULTS,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "ADULTS number") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of ADULTS number") 

ggplotly(g_adult_perc)

Interestingly, some purchases include more than 8 adults (the maximum is 9). People fly in groups, which is possibly the most enjoyable way to travel.

What facilities would appeal to solo travellers?
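The count and percentage plots in this section all repeat the same dplyr/ggplot2 pattern. The sketch below factors that pattern into two helper functions. The names `frequency_table()` and `plot_frequency()` are hypothetical and do not come from the original code. Bars are ordered from most to least frequent, as in the plots here.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: count the levels of one column, optionally as percentages
frequency_table <- function(data, var, percentage = FALSE) {
  data %>%
    dplyr::count({{ var }}, name = "n") %>%
    mutate(level = as.character({{ var }}),
           value = if (percentage) round(n / sum(n), 4) * 100 else n)
}

# Hypothetical helper: bar chart of that table, most frequent level first
plot_frequency <- function(data, var, percentage = FALSE, fill = "dodgerblue") {
  var_label <- rlang::as_label(rlang::enquo(var))   # column name, used as the x-axis label
  ggplot(frequency_table(data, {{ var }}, percentage)) +
    geom_col(aes(x = reorder(level, -value), y = value), fill = fill) +
    theme_light() +
    xlab(var_label) +
    ylab(if (percentage) "Percentage" else "Frequency")
}
```

For example, `plot_frequency(train_data, CHILDREN, percentage = TRUE, fill = "purple")` should draw the same bars as the children percentage plot. The result can still be passed to `ggplotly()`, and a title can be added with `+ ggtitle(...)`.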

3.1.4 Children

g_children <- 
        train_data %>% 
                dplyr::count(CHILDREN, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(CHILDREN,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "CHILDREN") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of CHILDREN numbers for each transaction") 

ggplotly(g_children)
# percentage

g_children_perc <- 
        train_data %>% 
                dplyr::count(CHILDREN, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(CHILDREN,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "CHILDREN number") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of CHILDREN number") 

ggplotly(g_children_perc)

As expected, the number of children travelling with adults is right-skewed, with a very high peak at zero. This is useful for planning in-cabin provisions for children.

3.1.5 Infants

g_infants <- 
        train_data %>% 
                dplyr::count(INFANTS, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(INFANTS,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "INFANTS") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of INFANTS numbers for each transaction") 

ggplotly(g_infants)
# percentage

g_infant_perc <- 
        train_data %>% 
                dplyr::count(INFANTS, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(INFANTS,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "INFANTS number") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of INFANTS number") 

ggplotly(g_infant_perc)

The distribution of infants has a similar shape to that of children, but the range is limited to at most 2 infants. Possibly twins?

This is useful for planning in-cabin provisions for infants.

3.1.6 Train

g_train <- 
        train_data %>% 
                dplyr::count(TRAIN, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(TRAIN,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "Train booked?") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of Train Booking") 

ggplotly(g_train)

Most of our customers did not book a train. The plot above shows the distribution and how lopsided it is, but percentages are easier to read than absolute numbers.

g_train_perc <- 
        train_data %>% 
                dplyr::count(TRAIN, sort = TRUE) %>% 
                mutate(percentage = round(n / sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(TRAIN,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "Train booked?") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of Train Booking") 

ggplotly(g_train_perc)

More than 99% of the customers (49,731 of 50,000, about 99.5%) did not book a train.

3.1.7 Haul Type

g_haul <- 
        train_data %>% 
                dplyr::count(HAUL_TYPE, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(HAUL_TYPE,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "Haul Type") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of Haul Types") 

ggplotly(g_haul)
# percentage

g_haul_perc <- 
        train_data %>% 
                dplyr::count(HAUL_TYPE, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(HAUL_TYPE,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "Haul Type") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of Haul Types") 

ggplotly(g_haul_perc)

So around 51% of flights are continental. Does this mean that people prefer to travel within their own country by train or car, and mostly fly when going abroad?

This plot may help optimize the website and its suggestions: for instance, choosing photos that promote domestic flights, or nudging clients toward continental flights since they are already popular.

3.1.8 Distance

g_distance <- 
        train_data %>% 
        ggplot() + 
        geom_histogram(aes(x = DISTANCE), binwidth = 10000, fill = "dodgerblue", color = "white") + 
        theme_light() + 
                xlab(label = "Distance") + 
                ylab(label = "Frequency") + 
                ggtitle("Distance Distribution") 


ggplotly(g_distance)

The binwidth is set to 10,000. Interestingly, the distribution is not unimodal: to me, the plot looks like a mixture of roughly three distributions. If we split the data by HAUL_TYPE, we may get unimodal distributions. We can investigate this later in the bi-variate plots.
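One quick way to check the mixture idea is to summarise DISTANCE separately within each HAUL_TYPE. The sketch below is minimal and runs on made-up data: the haul-type labels and the distance values are assumptions, not the real dataset. In the document's ggplot2 style, adding `facet_wrap(~ HAUL_TYPE)` to `g_distance` should give the visual counterpart.

```r
# Made-up data standing in for train_data; labels and distances are invented
set.seed(42)
toy <- data.frame(
  HAUL_TYPE = rep(c("DOMESTIC", "CONTINENTAL", "INTERCONTINENTAL"), each = 200),
  DISTANCE  = c(rnorm(200, mean = 500,  sd = 150),
                rnorm(200, mean = 2000, sd = 400),
                rnorm(200, mean = 9000, sd = 1500))
)

# Median distance per haul type: separate groups should sit near separate modes
median_by_haul <- tapply(toy$DISTANCE, toy$HAUL_TYPE, median)
print(round(median_by_haul))

# Bin counts per haul type on one shared grid (plot = FALSE only returns counts)
breaks <- seq(floor(min(toy$DISTANCE)), ceiling(max(toy$DISTANCE)) + 1000, by = 1000)
counts_by_haul <- lapply(split(toy$DISTANCE, toy$HAUL_TYPE),
                         function(d) hist(d, breaks = breaks, plot = FALSE)$counts)
```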

3.1.9 Device

g_device <- 
        train_data %>% 
                dplyr::count(DEVICE, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(DEVICE,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "DEVICE Type") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of DEVICE Types") 

ggplotly(g_device)
# percentage

g_device_perc <- 
        train_data %>% 
                dplyr::count(DEVICE, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(DEVICE,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "DEVICE Type") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of DEVICE Types") 

ggplotly(g_device_perc)

So most of the bookings, around 68%, are made from computers. Interestingly, there is a null category here, i.e. missing data. Could it be a sign of fraud?
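Before reading too much into the null DEVICE category, a first check is whether those rows behave differently on the response. This is a minimal base-R sketch on made-up rows. How the real data encodes the missing value (NA, an empty string, or a literal "null") is an assumption.

```r
# Made-up rows; missing DEVICE is assumed to be NA or an empty string
toy <- data.frame(
  DEVICE        = c("COMPUTER", "COMPUTER", "SMARTPHONE", NA, "", "TABLET", NA, "COMPUTER"),
  EXTRA_BAGGAGE = c("False", "True", "False", "False", "True", "False", "False", "False"),
  stringsAsFactors = FALSE
)

# Treat NA and empty strings alike as "missing"
device_missing <- is.na(toy$DEVICE) | toy$DEVICE == ""

# Share of rows with a missing device
share_missing <- mean(device_missing)

# Extra-baggage rate for rows with (TRUE) and without (FALSE) a missing device
rate_by_missing <- tapply(toy$EXTRA_BAGGAGE == "True", device_missing, mean)
print(rate_by_missing)
```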

3.1.10 Trip Type

g_trip <- 
        train_data %>% 
                dplyr::count(TRIP_TYPE, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(TRIP_TYPE,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "TRIP TYPE ") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of TRIP TYPE") 

ggplotly(g_trip)
# percentage

g_trip_perc <- 
        train_data %>% 
                dplyr::count(TRIP_TYPE, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(TRIP_TYPE,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "TRIP_TYPE") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of TRIP_TYPE") 

ggplotly(g_trip_perc)

61% of the bookings are round trips. This is expected in my experience, as a round trip is often cheaper than two one-way tickets. However, it is interesting that ~4% of bookings are multi-destination.

3.1.11 Product

g_product <- 
        train_data %>% 
                dplyr::count(PRODUCT, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(PRODUCT,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "PRODUCT TYPE ") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of PRODUCT TYPE") 

ggplotly(g_product)
# percentage

g_product_perc <- 
        train_data %>% 
                dplyr::count(PRODUCT, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(PRODUCT,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "PRODUCT") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of PRODUCT") 

ggplotly(g_product_perc)

Dynpack is much less popular than trip: most clients want a plain and simple trip. My guess is that dynpack has a positive effect on buying extra luggage.
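The guess about dynpack can be checked by comparing the extra-baggage rate within each PRODUCT level. This uses the same row-wise `prop.table(margin = 1)` idea the analysis applies to countries later on. A minimal base-R sketch, where the level names and rows are made up:

```r
# Made-up rows; real PRODUCT level names may differ
toy <- data.frame(
  PRODUCT       = c(rep("TRIP", 8), rep("DYNPACK", 2)),
  EXTRA_BAGGAGE = c("False", "False", "False", "False", "False", "False", "True", "False",
                    "True", "False")
)

# Row-wise proportions: share of False/True within each PRODUCT level
rates <- prop.table(table(toy$PRODUCT, toy$EXTRA_BAGGAGE), margin = 1)
print(round(rates, 2))
```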

3.1.12 SMS

g_sms <- 
        train_data %>% 
                dplyr::count(SMS, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(SMS,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "SMS ") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of SMS") 

ggplotly(g_sms)
# percentage

g_sms_perc <- 
        train_data %>% 
                dplyr::count(SMS, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(SMS,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "SMS") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of SMS") 

ggplotly(g_sms_perc)

Very interesting: almost half of the customers selected SMS for confirmation. Personally, I would prefer newer channels over SMS.

3.1.13 Departure Weekday

g_wday <- 
        train_data %>% 
                dplyr::count(dep_wday, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = dep_wday, y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "departure weekday (Sunday = 1)") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of departure weekday") 

ggplotly(g_wday)
# percentage

g_wday_perc <- 
        train_data %>% 
                dplyr::count(dep_wday, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = dep_wday, y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "departure weekday(sunday = 1)") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of departure weekday") 

ggplotly(g_wday_perc)

Very interesting! Sunday is the most common departure day, followed by Monday and then Saturday. Is that because Monday tickets tend to be cheaper?
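The weekday axes use Sunday = 1. The code that derives `dep_wday` and `arr_wday` is not part of this section. As far as I know, `lubridate::wday()` uses this convention with its default `week_start`, and base R gives the same numbering through `POSIXlt`. A small sketch with made-up dates:

```r
# Sunday = 1 ... Saturday = 7.
# POSIXlt$wday counts 0 = Sunday ... 6 = Saturday, so add 1.
wday_sun1 <- function(dates) as.POSIXlt(as.Date(dates))$wday + 1

dep_dates <- c("2018-07-01", "2018-07-02", "2018-07-07")  # made-up dates
print(wday_sun1(dep_dates))
```

The first date is a Sunday, so the three dates map to 1, 2 and 7.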

3.1.14 Arrival Weekday

g_arr_wday <- 
        train_data %>% 
                dplyr::count(arr_wday, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = arr_wday, y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "arrival weekday (Sunday = 1)") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of arrival weekday") 

ggplotly(g_arr_wday)
# percentage

g_arr_wday_perc <- 
        train_data %>% 
                dplyr::count(arr_wday, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = arr_wday, y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "arrival weekday (Sunday = 1)") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of arrival weekday") 

ggplotly(g_arr_wday_perc)

The pattern for arrival is even more counter-intuitive: people tend to come back on Tuesdays!

Are the arrival and departure weekdays helpful for suggesting promotions?

3.1.15 Departure Month

g_dep_month <- 
        train_data %>% 
                dplyr::count(dep_month, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = dep_month, y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "departure month ") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of departure month") 

ggplotly(g_dep_month)
# percentage

g_dep_month_perc <- 
        train_data %>% 
                dplyr::count(dep_month, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = dep_month, y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "departure month") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of departure month") 

ggplotly(g_dep_month_perc)

Departures in the first six months are very rare because the TIME_STAMP, i.e. the time the bookings were made, falls in July. So most of the bookings are for departures in the same month! Very interesting.
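If all bookings were made in July, the gap between TIME_STAMP and the departure date (the booking lead time) should explain this month pattern directly. Departures in January to June would then fall in the following year. A minimal base-R sketch; the column name `DEPARTURE` and all values are hypothetical:

```r
# Hypothetical rows: booking time stamp and departure date
toy <- data.frame(
  TIME_STAMP = c("2018-07-02 10:15:00", "2018-07-05 21:40:00", "2018-07-10 08:00:00"),
  DEPARTURE  = c("2018-07-04", "2018-08-20", "2019-01-15")
)

# Days between booking and departure (as.Date keeps only the date part)
lead_days <- as.numeric(as.Date(toy$DEPARTURE) - as.Date(toy$TIME_STAMP))

# Departures that fall in the year after the booking
next_year <- format(as.Date(toy$DEPARTURE), "%Y") > format(as.Date(toy$TIME_STAMP), "%Y")
print(data.frame(lead_days, next_year))
```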

3.1.16 Arrival Month

g_arr_month <- 
        train_data %>% 
                dplyr::count(arr_month, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = arr_month, y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "arrival month ") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of arrival month") 

ggplotly(g_arr_month)
# percentage

g_arr_month_perc <- 
        train_data %>% 
                dplyr::count(arr_month, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = arr_month, y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "arrival month") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of arrival month") 

ggplotly(g_arr_month_perc)

The arrival-month distribution is broadly similar to the departure-month one. The interesting point is the August column, which holds 26% of arrivals. My guess is that most trips happen within July, and some depart in July and return in August. This can be investigated in the bi-variate section.
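A cross-tabulation of departure month against arrival month would test the July-to-August guess directly. This is a minimal base-R sketch on made-up rows. The column names follow the plots above, but the values are invented.

```r
# Made-up rows
toy <- data.frame(
  dep_month = c(7, 7, 7, 7, 8, 7),
  arr_month = c(7, 7, 8, 8, 8, 7)
)

# Rows = departure month, columns = arrival month
trips <- table(dep = toy$dep_month, arr = toy$arr_month)
print(trips)

# Share of all trips that depart in July and arrive in August
share_jul_aug <- trips["7", "8"] / sum(trips)
```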

3.1.17 Departure Day

g_dep_day <- 
        train_data %>% 
                dplyr::count(dep_day, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = dep_day, y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "departure day") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of departure day") 

ggplotly(g_dep_day)
# percentage

g_dep_day_perc <- 
        train_data %>% 
                dplyr::count(dep_day, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = dep_day, y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "departure day") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of departure day") 

ggplotly(g_dep_day_perc)

Super interesting. Most departures happen at the start of the month rather than at the end. But why? Is this driven by the behaviour of a specific month?
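One way to answer the last question is to compute the early-month share separately for each departure month. If one month drives the pattern, only that month's share should be high. A minimal base-R sketch; the cutoff of day 10 and the rows are assumptions for illustration:

```r
# Made-up rows
toy <- data.frame(
  dep_month = c(7, 7, 7, 7, 8, 8, 8, 9),
  dep_day   = c(1, 3, 5, 28, 2, 20, 25, 4)
)

# Proportion of departures on days 1-10, computed within each month
early_share <- tapply(toy$dep_day <= 10, toy$dep_month, mean)
print(early_share)
```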

3.1.18 Arrival Day

g_arr_day <- 
        train_data %>% 
                dplyr::count(arr_day, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = arr_day, y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "arrival day") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of arrival day") 

ggplotly(g_arr_day)
# percentage

g_arr_day_perc <- 
        train_data %>% 
                dplyr::count(arr_day, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = arr_day, y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "arrival day") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of arrival day") 

ggplotly(g_arr_day_perc)

Even for arrival day, the start of the month is more popular than the end. I would like to go deeper into cohorts to analyze this behaviour better.

3.1.19 Staying Duration

g_staying <- 
        train_data %>% 
        ggplot() + 
        geom_histogram(aes(x = staying_duration), binwidth = 1, fill = "dodgerblue") + 
        theme_light() + 
                xlab(label = "Staying Duration") + 
                ylab(label = "Frequency") + 
                ggtitle("Staying Duration Distribution (binwidth: 1 day)") 

ggplotly(g_staying)

The staying duration is right-skewed, and the most common value is 0 days, i.e. very short trips! Very interesting.
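The derivation of `staying_duration` is not shown in this section. Assuming it is the number of days between the departure date and the arrival (return) date, a minimal base-R sketch with hypothetical `DEPARTURE` and `ARRIVAL` columns looks like this. Under that assumption, 0 means the return falls on the same calendar day as the departure.

```r
# Hypothetical rows: departure and arrival (return) dates
toy <- data.frame(
  DEPARTURE = c("2018-07-06", "2018-07-06", "2018-07-20"),
  ARRIVAL   = c("2018-07-06", "2018-07-09", "2018-08-03")
)

# Whole days between departure and arrival
staying_duration <- as.numeric(difftime(as.Date(toy$ARRIVAL), as.Date(toy$DEPARTURE),
                                        units = "days"))
print(staying_duration)

# How many trips have a staying duration of 0 days
print(table(zero_day = staying_duration == 0))
```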

3.1.20 Country

g_country <- 
        train_data %>% 
                dplyr::count(country, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(country,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "country") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of country") 

ggplotly(g_country)
# percentage

g_country_perc <- 
        train_data %>% 
                dplyr::count(country, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(country,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "country") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of country") 

ggplotly(g_country_perc)

Interestingly, 30% of the clients come from France and 15% from Spain. Country counts alone cannot measure market share, but it would be interesting to estimate it using other variables. Moreover, this plot may help to customize the websites by language, culture, and so on.

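There are too many rare levels! This is not a problem for plotting, but in my experience it causes problems in modelling. For instance, a level may appear in the test data but not in the training data, or be too rare to estimate its effect. Lumping uncommon levels into a single "Others" level is a common remedy.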

There are two alternatives here: combine levels based on their frequency, or based on the response variable. I use the response variable. Countries with a high proportion of EXTRA_BAGGAGE = True (at least 20% in the code below) are combined into "high rate country", and the rest into "low rate country". Hopefully this helps prediction.

# Share of EXTRA_BAGGAGE == "True" within each country; keep the countries
# where that share is at least 20%
countries_with_high_baggage_rate <- 
train_data %>% 
        dplyr::select(country, EXTRA_BAGGAGE) %>% 
        table() %>% 
        prop.table(margin = 1) %>%   # row-wise: proportions within each country
        data.frame()  %>% 
        filter(EXTRA_BAGGAGE == "True") %>% 
        mutate(Freq = round(Freq, 2)) %>% 
        arrange(desc(Freq)) %>% 
        filter(Freq >= 0.2) %>% 
        dplyr::select(country) %>% 
        mutate(country = as.character(country))

# Collapse country into two levels based on that list
train_data <- 
train_data %>% 
        mutate(country = as.character(country)) %>% 
        mutate(country = ifelse(country %in% countries_with_high_baggage_rate$country,
                                "high rate country", "low rate country")) %>% 
        mutate(country = factor(country))
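# Levels covering less than 1% of the rows. Note: after the previous step,
# country has only two levels, so this step only lumps something if one of
# them covers less than 1% of the rows.
rare_levels <- 
train_data %>% 
        dplyr::count(country, sort = TRUE) %>% 
        mutate(perc = n / sum(n)) %>% 
        filter(perc < 0.01) %>% 
        dplyr::select(country) 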



train_data <- 
train_data %>% 
        mutate(country = as.character(country )) %>% 
        mutate(country = ifelse(country %in% rare_levels$country , yes = "Others", no = country)) %>% 
        mutate(country = factor(country)) 
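The frequency-based alternative can also be wrapped in a small reusable function. Below is a minimal base-R sketch. `lump_rare` is a hypothetical helper and the country codes are made up. `forcats::fct_lump_prop()` offers similar lumping, though its edge-case behaviour is worth checking in its documentation.

```r
# Hypothetical helper: levels whose share of rows is below min_prop
# are merged into a single "Others" level
lump_rare <- function(x, min_prop = 0.01, other = "Others") {
  x <- as.character(x)
  props <- prop.table(table(x))          # share of each level (NA excluded)
  rare <- names(props)[props < min_prop]
  factor(ifelse(x %in% rare, other, x))
}

countries <- c(rep("FR", 60), rep("ES", 30), rep("IT", 9), "IS")  # made-up codes
lumped <- lump_rare(countries, min_prop = 0.05)
print(table(lumped))
```

With `min_prop = 0.05`, only "IS" (1% of the rows) falls below the threshold and becomes "Others".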



The country distribution after grouping the levels:

g_country <- 
        train_data %>% 
                dplyr::count(country, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(country,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "country") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of country") 

ggplotly(g_country)
# percentage

g_country_perc <- 
        train_data %>% 
                dplyr::count(country, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(country,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "country") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of country") 

ggplotly(g_country_perc)

3.1.21 Extra Baggage

g_baggage <- 
        train_data %>% 
                dplyr::count(EXTRA_BAGGAGE, sort = TRUE) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(EXTRA_BAGGAGE,-n), y = n), fill = "dodgerblue" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "EXTRA_BAGGAGE") + 
                ylab(label = "Frequency") + 
                ggtitle("Frequency of EXTRA_BAGGAGE") 

ggplotly(g_baggage)
# percentage

g_baggage_perc <- 
        train_data %>% 
                dplyr::count(EXTRA_BAGGAGE, sort = TRUE) %>% 
                mutate(percentage = round(n/sum(n),4)*100) %>% 
                ggplot() + 
                geom_col(aes(x = reorder(EXTRA_BAGGAGE,-percentage), y = percentage), fill = "purple" ) + 
                #coord_flip() + 
                theme_light() + 
                xlab(label = "EXTRA_BAGGAGE") + 
                ylab(label = "Percentage") + 
                ggtitle("Percentage of EXTRA_BAGGAGE") 

ggplotly(g_baggage_perc)

So around 20% of the clients buy extra baggage. This makes our task a mildly imbalanced classification problem for predictive modelling.
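With ~20% positives, accuracy can look good even for a useless model, which is one reason to compare models on F1. A minimal base-R sketch with made-up labels; `f1_score` is a hypothetical helper, not a library function:

```r
# Precision, recall and F1 for the positive class ("True" = extra baggage)
f1_score <- function(actual, predicted, positive = "True") {
  tp <- sum(predicted == positive & actual == positive)
  fp <- sum(predicted == positive & actual != positive)
  fn <- sum(predicted != positive & actual == positive)
  if (tp == 0) return(0)  # no true positives: recall is 0, so report F1 = 0
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

# Made-up labels with a 20% positive rate
actual <- c(rep("True", 20), rep("False", 80))

# A classifier that always predicts "False"
always_false <- rep("False", 100)
mean(always_false == actual)      # accuracy: 0.8
f1_score(actual, always_false)    # F1: 0

# A classifier that finds 15 of the 20 positives with 10 false alarms
some_model <- c(rep("True", 15), rep("False", 5), rep("True", 10), rep("False", 70))
mean(some_model == actual)        # accuracy: 0.85
f1_score(actual, some_model)      # precision 0.6, recall 0.75, F1 about 0.667
```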

3.2 Bi-Variates

I would love to examine every pair of variables visually. However, even for a small dataset this would be overwhelming: with ~22 variables there are choose(22, 2) = 231 pairs.

So what to do? I choose the pairs with the predictive model in mind: the bi-variate plots try to reveal how the response variable EXTRA_BAGGAGE relates to each predictor.

3.2.1 GDS, NO_GDS, Number of Flights ~ Extra Luggage

t_gds_baggage <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, GDS) %>% 
        table() %>% 
        data.frame()

g_gds_baggage <- 
       t_gds_baggage %>% 
        ggplot() +
        geom_col(aes(x = GDS , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "GDS flight numbers") + 
                ylab(label = "Proportion") + 
                ggtitle("gds and extra baggage")

ggplotly(g_gds_baggage)
# 

t_nogds_baggage <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, NO_GDS) %>% 
        table() %>% 
        data.frame()

g_nogds_baggage <- 
       t_nogds_baggage %>% 
        ggplot() +
        geom_col(aes(x = NO_GDS , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "NO GDS flight numbers") + 
                ylab(label = "Proportion") + 
                ggtitle("No_gds and extra baggage")

ggplotly(g_nogds_baggage)
# 

t_numflights_baggage <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, num_of_flights) %>% 
        table() %>% 
        data.frame()

g_numflights_baggage <- 
       t_numflights_baggage %>% 
        ggplot() +
        geom_col(aes(x = num_of_flights , y = Freq , fill= EXTRA_BAGGAGE), position = "fill" ) + 
                theme_light() + 
                xlab(label = "flight numbers") + 
                ylab(label = "Proportion") + 
                ggtitle("Flights# bought and extra baggage")

ggplotly(g_numflights_baggage)

There seems to be a relationship between the number of flights and extra baggage. Hopefully this aggregate variable helps the prediction.
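To check whether the apparent relationship is more than noise, one option is a chi-squared test of independence on the same contingency table. The row proportions are what `position = "fill"` draws for each bar. A minimal base-R sketch with made-up counts. With many rare levels of num_of_flights, some expected counts may be small, and `chisq.test()` then warns that its approximation may be inaccurate.

```r
# Made-up counts: rows = num_of_flights, columns = EXTRA_BAGGAGE
tab <- matrix(c(800, 100,
                500, 200,
                200, 200),
              nrow = 3, byrow = TRUE,
              dimnames = list(num_of_flights = c("1", "2", "4"),
                              EXTRA_BAGGAGE  = c("False", "True")))

# Row proportions: the heights of the stacked segments in a "fill" bar chart
row_props <- prop.table(tab, margin = 1)
print(round(row_props, 3))

# Chi-squared test of independence between the two variables
chi <- chisq.test(tab)
print(chi$p.value)
```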



g_numflights_baggage_heatmap <- 
        t_numflights_baggage %>% 
        ggplot(aes(x = num_of_flights, y = EXTRA_BAGGAGE)) +
          geom_tile(aes(fill = Freq), color = "white") +
          scale_fill_gradient(low = "blue", high = "red") +
          ylab("Extra Baggage") +
          xlab("num_of_flights") +
          labs(fill = "Frequency") + 
          ggtitle("Heatmap of Number of flights ~ Extra Baggage") +
          # complete theme first, so the theme() tweaks below are not overwritten
          theme_light() +
          theme(legend.title = element_text(size = 10),
                legend.text = element_text(size = 12),
                plot.title = element_text(size=16),
                axis.title=element_text(size=14,face="bold"),
                axis.text.x = element_text(angle = 90, hjust = 1))

ggplotly(g_numflights_baggage_heatmap)
##  $ plot.caption         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : 'rel' num 0.9
##   ..$ hjust        : num 1
##   ..$ vjust        : num 1
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 4.95pt 0pt 0pt 0pt
##   .. ..- attr(*, "valid.unit")= int 8
##   .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ plot.margin          : 'margin' num [1:4] 5.5pt 5.5pt 5.5pt 5.5pt
##   ..- attr(*, "valid.unit")= int 8
##   ..- attr(*, "unit")= chr "pt"
##  $ strip.background     :List of 5
##   ..$ fill         : chr "grey70"
##   ..$ colour       : logi NA
##   ..$ size         : NULL
##   ..$ linetype     : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_rect" "element"
##  $ strip.placement      : chr "inside"
##  $ strip.text           :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : chr "white"
##   ..$ size         : 'rel' num 0.8
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ strip.text.x         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 5.5pt 0pt 5.5pt 0pt
##   .. ..- attr(*, "valid.unit")= int 8
##   .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ strip.text.y         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : NULL
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : num -90
##   ..$ lineheight   : NULL
##   ..$ margin       : 'margin' num [1:4] 0pt 5.5pt 0pt 5.5pt
##   .. ..- attr(*, "valid.unit")= int 8
##   .. ..- attr(*, "unit")= chr "pt"
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi TRUE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ strip.switch.pad.grid: 'unit' num 0.1cm
##   ..- attr(*, "valid.unit")= int 1
##   ..- attr(*, "unit")= chr "cm"
##  $ strip.switch.pad.wrap: 'unit' num 0.1cm
##   ..- attr(*, "valid.unit")= int 1
##   ..- attr(*, "unit")= chr "cm"
##  - attr(*, "class")= chr [1:2] "theme" "gg"
##  - attr(*, "complete")= logi TRUE
##  - attr(*, "validate")= logi TRUE
ggplotly(g_numflights_baggage_heatmap)

The heatmap shows the absolute frequency of each combination of extra baggage and number of flights. As the number of flights goes up, the colour contrast between the two extra-baggage cells diminishes, which suggests that this variable can help in prediction. The map also shows that purchases with no extra baggage and number_of_flights = 1 have the highest density of transactions.
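Because the heatmap shows absolute counts, its colours mix two effects: how common each number of flights is, and how strongly it relates to extra baggage. A minimal base-R sketch of the complementary view, the share of extra-baggage purchases within each level. The helper `conditional_share()` is hypothetical, and the data are simulated rather than taken from `train_data`.

```r
# Share of extra-baggage requests within each level of a predictor.
# Rows of the result are predictor levels; each row sums to 1.
conditional_share <- function(predictor, target) {
  counts <- table(predictor, target)
  prop.table(counts, margin = 1)  # margin = 1 normalises each row
}

# Simulated toy data (NOT the report's data): the probability of extra
# baggage rises with the number of flights.
set.seed(1)
n_flights <- sample(1:4, 5000, replace = TRUE)
extra <- runif(5000) < 0.1 + 0.05 * n_flights

shares <- conditional_share(n_flights, extra)
round(shares, 3)
```

On the real data the call would pair the report's number-of-flights column with `train_data$EXTRA_BAGGAGE`; a factor target works as well as a logical one, since only `table()` is used.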

3.2.2 Website ~ Baggage

t_website_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, website) %>% 
        table() %>% 
        data.frame()

g_website_bag <- 
       t_website_bag %>% 
        ggplot() +
        geom_col(aes(x = website , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "Website") + 
                ylab(label = "Proportion") + 
                ggtitle("Website and Extra Baggage relation")

ggplotly(g_website_bag)

There seems to be a relationship between the website and extra baggage: clients of the TL and OP websites buy extra luggage more often than clients of GO and ED. However, the proportions are very close, and the differences may well be insignificant.
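One way to put a number on "possibly insignificant" is a chi-squared test of independence between website and extra baggage. A minimal base-R sketch; `independence_test()` is a hypothetical wrapper around `stats::chisq.test()`, and the website rates below are simulated for illustration only.

```r
# Chi-squared test of independence between a categorical predictor and a
# binary target. A small p-value suggests that the extra-baggage rate
# differs across levels; it says nothing about how large the difference is.
independence_test <- function(predictor, target) {
  chisq.test(table(predictor, target))
}

# Simulated toy data (NOT the report's data): four websites whose
# extra-baggage rates are deliberately very close.
set.seed(42)
site <- sample(c("ED", "GO", "OP", "TL"), 20000, replace = TRUE)
rate <- c(ED = 0.19, GO = 0.19, OP = 0.20, TL = 0.20)[site]
extra <- runif(length(site)) < rate

res <- independence_test(site, extra)
res$p.value
```

With the report's data, `independence_test(train_data$website, train_data$EXTRA_BAGGAGE)` would test the actual table. With around 50,000 purchases even small differences can come out significant, so the p-value is best read next to the proportions themselves.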

g_website_bag_heatmap <- 
        t_website_bag %>% 
        ggplot(aes(x = website, y = EXTRA_BAGGAGE )) +
          geom_tile(aes(fill = Freq) , color = "white") +
          scale_fill_gradient(low = "blue", high = "red") +
          ylab("Extra Baggage") +
          xlab("Websites") +
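          labs(fill = "Frequency") + 
          # theme_light() is a complete theme, so it has to come before theme();
          # added after it, it would silently override these customisations
          theme_light() + 
          theme(legend.title = element_text(size = 10),
                legend.text = element_text(size = 12),
                plot.title = element_text(size = 16),
                axis.title = element_text(size = 14, face = "bold"),
                axis.text.x = element_text(angle = 90, hjust = 1)) +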
        ggtitle("heatmap of websites ~ extra baggage")

ggplotly(g_website_bag_heatmap)

3.2.3 Adults ~ extra baggage

t_adults_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, ADULTS) %>% 
        table() %>% 
        data.frame()



g_adults_bag <- 
       t_adults_bag %>% 
        ggplot() +
        geom_col(aes(x = ADULTS , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "ADULTS") + 
                ylab(label = "Proportion") + 
                ggtitle("ADULTS and Extra Baggage relation")


ggplotly(g_adults_bag)

g_adults_bag_heatmap <- 
        t_adults_bag %>% 
        ggplot(aes(x = ADULTS, y = EXTRA_BAGGAGE )) +
          geom_tile(aes(fill = Freq) , color = "white") +
          scale_fill_gradient(low = "blue", high = "red") +
          ylab("Extra Baggage") +
          xlab("Adults") +
          labs(fill = "Frequency") + 
          # complete theme first, so that theme() below is not overridden
          theme_light() + 
          theme(legend.title = element_text(size = 10),
                legend.text = element_text(size = 12),
                plot.title = element_text(size = 16),
                axis.title = element_text(size = 14, face = "bold"),
                axis.text.x = element_text(angle = 90, hjust = 1)) +
        ggtitle("heatmap of #adults ~ extra baggage")

ggplotly(g_adults_bag_heatmap)

Unsurprisingly, the more adult travellers, the more likely an extra-baggage purchase: more people, possibly a longer trip, more equipment…

3.2.4 Children ~ extra luggage

t_children_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, CHILDREN) %>% 
        table() %>% 
        data.frame()



g_children_bag <- 
       t_children_bag %>% 
        ggplot() +
        geom_col(aes(x = CHILDREN , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "CHILDREN") + 
                ylab(label = "Proportion") + 
                ggtitle("CHILDREN and Extra Baggage relation")


ggplotly(g_children_bag)

g_children_bag_heatmap <- 
        t_children_bag %>% 
        ggplot(aes(x = CHILDREN, y = EXTRA_BAGGAGE )) +
          geom_tile(aes(fill = Freq) , color = "white") +
          scale_fill_gradient(low = "blue", high = "red") +
          ylab("Extra Baggage") +
          xlab("CHILDREN") +
          labs(fill = "Frequency") + 
          # complete theme first, so that theme() below is not overridden
          theme_light() + 
          theme(legend.title = element_text(size = 10),
                legend.text = element_text(size = 12),
                plot.title = element_text(size = 16),
                axis.title = element_text(size = 14, face = "bold"),
                axis.text.x = element_text(angle = 90, hjust = 1)) +
        ggtitle("heatmap of #children ~ extra baggage")

ggplotly(g_children_bag_heatmap)

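The vast majority of purchases involve zero children and no extra baggage. However, the contrast between the top and bottom cells of each column fades as the number of children grows, which suggests that the more children, the higher the probability of ordering extra baggage. Part of that fading comes simply from there being fewer purchases with many children, so the proportion plot above is the more reliable view of this trend.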

3.2.5 infants ~ extra baggage

t_infants_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, INFANTS) %>% 
        table() %>% 
        data.frame()

g_infants_bag <- 
 t_infants_bag %>% 
        ggplot() +
        geom_col(aes(x = INFANTS , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "INFANTS") + 
                ylab(label = "Frequency") + 
                ggtitle("INFANTS and Extra Baggage relation")

ggplotly(g_infants_bag)
g_infants_bag <- 
       t_infants_bag %>% 
        ggplot() +
        geom_col(aes(x = INFANTS , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "INFANTS") + 
                ylab(label = "Proportion") + 
                ggtitle("INFANTS and Extra Baggage relation")


ggplotly(g_infants_bag)

There are some signs that more infants go together with a higher probability of buying extra baggage.

g_infants_bag_heatmap <- 
        t_infants_bag %>% 
        ggplot(aes(x = INFANTS, y = EXTRA_BAGGAGE )) +
          geom_tile(aes(fill = Freq) , color = "white") +
          scale_fill_gradient(low = "blue", high = "red") +
          ylab("Extra Baggage") +
          xlab("INFANTS") +
          labs(fill = "Frequency") + 
          # complete theme first, so that theme() below is not overridden
          theme_light() + 
          theme(legend.title = element_text(size = 10),
                legend.text = element_text(size = 12),
                plot.title = element_text(size = 16),
                axis.title = element_text(size = 14, face = "bold"),
                axis.text.x = element_text(angle = 90, hjust = 1)) +
        ggtitle("heatmap of #INFANTS ~ extra baggage")

ggplotly(g_infants_bag_heatmap)

The pattern is more or less the same as for children and adults.
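Adults, children and infants are counts, so a test for a trend in proportions across increasing counts matches the question "the more X, the more extra baggage?" better than a plain chi-squared test. A sketch using base R's `prop.trend.test()` (a chi-squared test for trend); the wrapper `trend_test()` is hypothetical and the counts are made up for illustration.

```r
# Chi-squared test for a linear trend in proportions across ordered groups.
# successes: purchases WITH extra baggage at each count level
# totals:    all purchases at that count level
# scores:    numeric value of each level (here, the count itself)
trend_test <- function(successes, totals, scores = seq_along(totals)) {
  prop.trend.test(successes, totals, score = scores)
}

# Hypothetical counts for 0, 1 and 2 children (NOT the report's data).
extra_by_children <- c(100, 150, 200)
n_by_children     <- c(1000, 1000, 1000)

res <- trend_test(extra_by_children, n_by_children, scores = 0:2)
res$p.value
```

For the real data, `successes` and `totals` could be read off `table(train_data$CHILDREN, train_data$EXTRA_BAGGAGE)`, with the number of children as scores; the same applies to `ADULTS` and `INFANTS`.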

3.2.6 Train ~ Extra Baggage

t_train_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, TRAIN) %>% 
        table() %>% 
        data.frame()

g_train_bag <- 
 t_train_bag %>% 
        ggplot() +
        geom_col(aes(x = TRAIN , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "TRAIN") + 
                ylab(label = "Frequency") + 
                ggtitle("TRAIN and Extra Baggage relation")

ggplotly(g_train_bag)
g_train_bag <- 
       t_train_bag %>% 
        ggplot() +
        geom_col(aes(x = TRAIN , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "TRAIN") + 
                ylab(label = "Proportion") + 
                ggtitle("TRAIN and Extra Baggage relation")


ggplotly(g_train_bag)

A surprising result: purchases without a train request have a higher proportion of extra-baggage requests!

g_train_bag_heatmap <- 
        t_train_bag %>% 
        ggplot(aes(x = TRAIN, y = EXTRA_BAGGAGE )) +
          geom_tile(aes(fill = Freq) , color = "white") +
          scale_fill_gradient(low = "blue", high = "red") +
          ylab("Extra Baggage") +
          xlab("TRAIN") +
          labs(fill = "Frequency") + 
          # complete theme first, so that theme() below is not overridden
          theme_light() + 
          theme(legend.title = element_text(size = 10),
                legend.text = element_text(size = 12),
                plot.title = element_text(size = 16),
                axis.title = element_text(size = 14, face = "bold"),
                axis.text.x = element_text(angle = 90, hjust = 1)) +
        ggtitle("heatmap of TRAIN ~ extra baggage")

ggplotly(g_train_bag_heatmap)

3.2.7 Haul Type ~ Extra Baggage

t_haul_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, HAUL_TYPE) %>% 
        table() %>% 
        data.frame()

g_haul_bag <- 
 t_haul_bag %>% 
        ggplot() +
        geom_col(aes(x = HAUL_TYPE , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "HAUL_TYPE") + 
                ylab(label = "Frequency") + 
                ggtitle("HAUL_TYPE and Extra Baggage relation")

ggplotly(g_haul_bag)
g_haul_bag <- 
       t_haul_bag %>% 
        ggplot() +
        geom_col(aes(x = HAUL_TYPE , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "Haul Type") + 
                ylab(label = "Proportion") + 
                ggtitle("Haul type and Extra Baggage relation")


ggplotly(g_haul_bag)

haul_type is clearly associated with extra_baggage and could be a good predictor for our model. The surprising point is that domestic flights have a higher proportion of extra baggage than inter-continental ones.

g_haul_bag_heatmap <- 
        t_haul_bag %>% 
        ggplot(aes(x = HAUL_TYPE, y = EXTRA_BAGGAGE )) +
          geom_tile(aes(fill = Freq) , color = "white") +
          scale_fill_gradient(low = "blue", high = "red") +
          ylab("Extra Baggage") +
          xlab("HAUL_TYPE") +
          labs(fill = "Frequency") + 
          # complete theme first, so that theme() below is not overridden
          theme_light() + 
          theme(legend.title = element_text(size = 10),
                legend.text = element_text(size = 12),
                plot.title = element_text(size = 16),
                axis.title = element_text(size = 14, face = "bold"),
                axis.text.x = element_text(angle = 90, hjust = 1)) +
        ggtitle("heatmap of HAUL_TYPE ~ extra baggage")

ggplotly(g_haul_bag_heatmap)

The heatmap shows the same surprising point: most inter-continental travellers do not want extra luggage, maybe because extra baggage is already part of their package? The higher the contrast within a column, the more promising the predictor.

3.2.8 distance ~ Extra baggage

Distance is quantitative and extra baggage is qualitative, so a boxplot is a natural choice here.

g_distance_baggage <- 
train_data %>% 
        ggplot(aes(x = EXTRA_BAGGAGE , y = DISTANCE, fill = EXTRA_BAGGAGE)) + 
        geom_boxplot() + 
        stat_summary(fun.y=mean, geom="point", shape=5, size=4 , color = "black") + 
        theme_light() + 
        ggtitle("Distance ~ Extra Baggage")


ggplotly(g_distance_baggage)
#--- overlapping dist 

g_distance_baggage_dist <- 
train_data %>%  
        ggplot() + 
        geom_density(aes(x = DISTANCE, fill = EXTRA_BAGGAGE), alpha = 0.3) +  # densities take no `bins` argument
        theme_light() + 
        ggtitle("Distance ~ Extra Baggage")

ggplotly(g_distance_baggage_dist)

Interestingly, the median travel distance of travellers with extra baggage is lower than that of travellers without, whereas one would expect the opposite. Maybe the default fare for long distances already includes extra baggage, so no additional baggage is needed!

The overlapping densities show hardly any predictive power for distance: the two distributions have similar shapes and overlap almost completely. Interestingly, then, distance does not look like a good predictor.
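To quantify "hardly any predictive power", one option is the AUC of distance on its own, which equals the normalised Mann-Whitney U statistic. A minimal base-R sketch; `single_var_auc()` is a hypothetical helper, its target must be logical (for example `train_data$EXTRA_BAGGAGE == "True"`, assuming the `False`/`True` levels printed elsewhere in the report), and the data below are simulated.

```r
# AUC of a single numeric predictor from the Mann-Whitney rank statistic:
# the probability that a random positive case scores higher than a random
# negative one (ties count one half). 0.5 means no separation at all.
single_var_auc <- function(x, y) {
  stopifnot(is.logical(y))
  r <- rank(x)                 # average ranks take care of ties
  n_pos <- sum(y)
  n_neg <- sum(!y)
  (sum(r[y]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated toy data (NOT the report's data): both classes share the
# same distance distribution, so the AUC should sit near 0.5.
set.seed(7)
distance <- rlnorm(4000, meanlog = 7, sdlog = 0.8)
extra <- runif(4000) < 0.2
single_var_auc(distance, extra)
```

An AUC well away from 0.5 in either direction would indicate separation; values below 0.5 simply mean the relationship runs the other way.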

3.2.9 device ~ Extra baggage

t_device_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, DEVICE) %>% 
        table() %>% 
        data.frame()

g_device_bag <- 
 t_device_bag %>% 
        ggplot() +
        geom_col(aes(x = DEVICE , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "DEVICE") + 
                ylab(label = "Frequency") + 
                ggtitle("DEVICE and Extra Baggage relation")

ggplotly(g_device_bag)
g_device_bag <- 
       t_device_bag %>% 
        ggplot() +
        geom_col(aes(x = DEVICE , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "DEVICE") + 
                ylab(label = "Proportion") + 
                ggtitle("DEVICE type and Extra Baggage relation")


ggplotly(g_device_bag)

I cannot see much discriminatory power in device type. Smartphones have the lowest rate of extra-baggage requests and the "other" category the highest, but are these differences significant? This could be checked with bootstrapping.
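The bootstrap check suggested above could take the form of a percentile-bootstrap interval for the difference in extra-baggage rates between two device levels. A sketch under assumptions: `boot_diff_ci()` is hypothetical, the target is logical, and the level names "computer" and "smartphone" and their rates are simulated rather than read from `train_data`.

```r
# Percentile bootstrap CI for the difference in the extra-baggage rate
# between two levels of a categorical predictor. Each group is resampled
# with replacement separately; if the interval excludes 0, the difference
# is unlikely to be noise.
boot_diff_ci <- function(target, group, level_a, level_b,
                         n_boot = 2000, conf = 0.95) {
  a <- target[group == level_a]
  b <- target[group == level_b]
  diffs <- replicate(n_boot, {
    mean(a[sample.int(length(a), replace = TRUE)]) -
      mean(b[sample.int(length(b), replace = TRUE)])
  })
  alpha <- (1 - conf) / 2
  quantile(diffs, probs = c(alpha, 1 - alpha))
}

# Simulated toy data (NOT the report's data).
set.seed(3)
device <- sample(c("computer", "smartphone"), 6000, replace = TRUE)
extra <- runif(6000) < ifelse(device == "computer", 0.21, 0.17)
boot_diff_ci(extra, device, "computer", "smartphone")
```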

g_device_bag_heatmap <- 
        t_device_bag %>% 
        ggplot(aes(x = DEVICE, y = EXTRA_BAGGAGE )) +
          geom_tile(aes(fill = Freq) , color = "white") +
          scale_fill_gradient(low = "blue", high = "red") +
          ylab("Extra Baggage") +
          xlab("DEVICE") +
          labs(fill = "Frequency") + 
          # complete theme first, so that theme() below is not overridden
          theme_light() + 
          theme(legend.title = element_text(size = 10),
                legend.text = element_text(size = 12),
                plot.title = element_text(size = 16),
                axis.title = element_text(size = 14, face = "bold"),
                axis.text.x = element_text(angle = 90, hjust = 1)) +
        ggtitle("heatmap of DEVICE ~ extra baggage")

ggplotly(g_device_bag_heatmap)

From the heatmap, computer and smartphone seem to be good predictors, given the high contrast in their columns, so I expect the variable as a whole to be significant.

3.2.10 trip type ~ Extra baggage

t_trip_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, TRIP_TYPE) %>% 
        table() %>% 
        data.frame()

g_trip_bag <- 
 t_trip_bag %>% 
        ggplot() +
        geom_col(aes(x = TRIP_TYPE , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "TRIP_TYPE") + 
                ylab(label = "Frequency") + 
                ggtitle("TRIP_TYPE and Extra Baggage relation")

ggplotly(g_trip_bag)
g_trip_bag <- 
       t_trip_bag %>% 
        ggplot() +
        geom_col(aes(x = TRIP_TYPE , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "TRIP_TYPE") + 
                ylab(label = "Proportion") + 
                ggtitle("TRIP_TYPE type and Extra Baggage relation")


ggplotly(g_trip_bag)

This variable is beautifully associated with extra baggage: multi-destination trips have a higher proportion of extra baggage than round trips, so the variable seems to have predictive power.

g_trip_bag_heatmap <- 
        t_trip_bag %>% 
        ggplot(aes(x = TRIP_TYPE, y = EXTRA_BAGGAGE )) +
          geom_tile(aes(fill = Freq) , color = "white") +
          scale_fill_gradient(low = "blue", high = "red") +
          ylab("Extra Baggage") +
          xlab("TRIP_TYPE") +
          labs(fill = "Frequency") + 
          # complete theme first, so that theme() below is not overridden
          theme_light() + 
          theme(legend.title = element_text(size = 10),
                legend.text = element_text(size = 12),
                plot.title = element_text(size = 16),
                axis.title = element_text(size = 14, face = "bold"),
                axis.text.x = element_text(angle = 90, hjust = 1)) +
        ggtitle("heatmap of TRIP_TYPE ~ extra baggage")

ggplotly(g_trip_bag_heatmap)

The contrasts show that this is a useful variable for prediction.

3.2.11 Product ~ Extra Baggage

t_product_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, PRODUCT) %>% 
        table() %>% 
        data.frame()

g_product_bag <- 
 t_product_bag %>% 
        ggplot() +
        geom_col(aes(x = PRODUCT , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "PRODUCT") + 
                ylab(label = "Frequency") + 
                ggtitle("PRODUCT and Extra Baggage relation")

ggplotly(g_product_bag)
g_product_bag <- 
       t_product_bag %>% 
        ggplot() +
        geom_col(aes(x = PRODUCT , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "PRODUCT") + 
                ylab(label = "Proportion") + 
                ggtitle("PRODUCT type and Extra Baggage relation")


ggplotly(g_product_bag)

There is not much difference between the proportions of the two levels, and I doubt that this variable will be helpful. Still, it is interesting that DYNPACK has a lower proportion.

g_product_bag_heatmap <- 
        t_product_bag %>% 
        ggplot(aes(x = PRODUCT, y = EXTRA_BAGGAGE )) +
          geom_tile(aes(fill = Freq) , color = "white") +
          scale_fill_gradient(low = "blue", high = "red") +
          ylab("Extra Baggage") +
          xlab("PRODUCT") +
          labs(fill = "Frequency") + 
          # complete theme first, so that theme() below is not overridden
          theme_light() + 
          theme(legend.title = element_text(size = 10),
                legend.text = element_text(size = 12),
                plot.title = element_text(size = 16),
                axis.title = element_text(size = 14, face = "bold"),
                axis.text.x = element_text(angle = 90, hjust = 1)) +
        ggtitle("heatmap of PRODUCT ~ extra baggage")

ggplotly(g_product_bag_heatmap)

3.2.12 SMS ~ Extra Baggage

t_sms_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, SMS) %>% 
        table() %>% 
        data.frame()

g_sms_bag <- 
 t_sms_bag %>% 
        ggplot() +
        geom_col(aes(x = SMS , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "SMS") + 
                ylab(label = "Frequency") + 
                ggtitle("SMS and Extra Baggage relation")

ggplotly(g_sms_bag)
g_sms_bag <- 
       t_sms_bag %>% 
        ggplot() +
        geom_col(aes(x = SMS , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "SMS") + 
                ylab(label = "Proportion") + 
                ggtitle("SMS type and Extra Baggage relation")


ggplotly(g_sms_bag)

The proportions for the two SMS levels are almost identical, so knowing whether a client uses SMS gives us no clue about whether he/she wants extra baggage.
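A p-value alone does not say how strong an association is. Cramér's V gives an effect size between 0 and 1, so near-identical SMS proportions should show up as a value close to 0. A base-R sketch; `cramers_v()` is a hypothetical helper and the SMS data are simulated.

```r
# Cramér's V: effect size for the association between two categorical
# variables, from 0 (no association) to 1 (perfect association).
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # correct = FALSE: no Yates correction, so V is exactly 0 or 1 at the extremes
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(dim(tab)) - 1
  unname(sqrt(chi2 / (n * k)))
}

# Simulated toy data (NOT the report's data): SMS unrelated to baggage.
set.seed(11)
sms <- sample(c("yes", "no"), 10000, replace = TRUE)
extra <- runif(10000) < 0.19
cramers_v(sms, extra)
```

Unlike a p-value, V is not driven up by a large sample, which makes it easier to compare variables such as SMS, device and trip type on one scale.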

3.2.13 dep_wday ~ Extra baggage

t_dep_wday_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, dep_wday) %>% 
        table() %>% 
        data.frame()

g_dep_wday_bag <- 
 t_dep_wday_bag %>% 
        ggplot() +
        geom_col(aes(x = dep_wday , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "dep_wday") + 
                ylab(label = "Frequency") + 
                ggtitle("dep_wday and Extra Baggage relation")

ggplotly(g_dep_wday_bag)
g_dep_wday_bag <- 
       t_dep_wday_bag %>% 
        ggplot() +
        geom_col(aes(x = dep_wday , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "dep_wday") + 
                ylab(label = "Proportion") + 
                ggtitle("dep_wday type and Extra Baggage relation")


ggplotly(g_dep_wday_bag)

The proportion of extra baggage differs somewhat between departure weekdays, but I am not sure whether the differences are significant.

3.2.14 arr_wday ~ Extra baggage

t_arr_wday_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, arr_wday) %>% 
        table() %>% 
        data.frame()

g_arr_wday_bag <- 
 t_arr_wday_bag %>% 
        ggplot() +
        geom_col(aes(x = arr_wday , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "arr_wday") + 
                ylab(label = "Frequency") + 
                ggtitle("arr_wday and Extra Baggage relation")

ggplotly(g_arr_wday_bag)
g_arr_wday_bag <- 
       t_arr_wday_bag %>% 
        ggplot() +
        geom_col(aes(x = arr_wday , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "arr_wday") + 
                ylab(label = "Proportion") + 
                ggtitle("arr_wday type and Extra Baggage relation")


ggplotly(g_arr_wday_bag)

The differences between the proportions at the various levels are minimal, but are they significant?
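Such small differences can also be checked without distributional assumptions through a permutation test: shuffle the target many times and count how often the shuffled chi-squared statistic is at least as large as the observed one. A base-R sketch; `perm_test()` is hypothetical and the weekday data are simulated.

```r
# Permutation test of independence between a categorical predictor and the
# target. Shuffling the target destroys any real association, so the
# shuffled statistics show what "no relationship" looks like.
perm_test <- function(predictor, target, n_perm = 999) {
  stat <- function(t) {
    suppressWarnings(chisq.test(table(predictor, t), correct = FALSE)$statistic)
  }
  observed <- stat(target)
  perm <- replicate(n_perm, stat(sample(target)))
  # +1 in numerator and denominator counts the observed arrangement itself
  (sum(perm >= observed) + 1) / (n_perm + 1)
}

# Simulated toy data (NOT the report's data).
set.seed(5)
weekday <- sample(1:7, 5000, replace = TRUE)
extra <- runif(5000) < 0.19
p <- perm_test(weekday, extra, n_perm = 499)
p
```

On the full training set, `perm_test(train_data$arr_wday, train_data$EXTRA_BAGGAGE)` would take noticeably longer, since every permutation rebuilds the table.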

3.2.15 dep_month ~ baggage

t_dep_month_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, dep_month) %>% 
        table() %>% 
        data.frame()

g_dep_month_bag <- 
 t_dep_month_bag %>% 
        ggplot() +
        geom_col(aes(x = dep_month , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "dep_month") + 
                ylab(label = "Frequency") + 
                ggtitle("dep_month and Extra Baggage relation")

ggplotly(g_dep_month_bag)
g_dep_month_bag <- 
       t_dep_month_bag %>% 
        ggplot() +
        geom_col(aes(x = dep_month , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "dep_month") + 
                ylab(label = "Proportion") + 
                ggtitle("dep_month type and Extra Baggage relation")


ggplotly(g_dep_month_bag)

To me, departure month seems to have predictive power for extra-baggage requests: the proportions vary across levels, although I don't know whether the variation is significant.

It seems to me that grouping the months with high proportions into one level, so that the variable ends up with two levels, would serve predictive modelling better.
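The grouping in the next chunk is chosen by eye. One way to make the rule explicit is to put every level whose extra-baggage rate exceeds the overall rate into one group. A base-R sketch of that assumed rule (not necessarily the one used here); `collapse_by_rate()` is hypothetical. It matches levels through `as.character()`, because `as.numeric()` on a factor returns the internal level codes rather than the labels.

```r
# Collapse a categorical predictor into two groups: levels whose rate of
# extra baggage is above the overall rate, and the rest. `target` must be
# logical. The chosen levels are kept as an attribute for inspection.
collapse_by_rate <- function(predictor, target) {
  rates <- tapply(target, predictor, mean)
  high <- names(rates)[rates > mean(target)]
  out <- factor(ifelse(as.character(predictor) %in% high, "high-rate", "rest"))
  attr(out, "high_levels") <- high
  out
}

# Tiny illustrative input (NOT the report's data).
month <- c(1, 1, 2, 2, 3, 3)
extra <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
grouped <- collapse_by_rate(month, extra)
attr(grouped, "high_levels")
```

Picking the groups on the same data later used to evaluate the model tends to overstate the gain, so the rule is safer if derived on a separate split.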

train_data <- 
train_data %>% 
        mutate(dep_month = as.numeric(dep_month)) %>% 
        mutate(dep_month = ifelse(dep_month %in% c(2,6,8,9,10), "2-6-8-9-10", "rest"  )) %>% 
        mutate(dep_month = factor(dep_month))

New visualization

t_dep_month_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, dep_month) %>% 
        table() %>% 
        data.frame()

g_dep_month_bag <- 
 t_dep_month_bag %>% 
        ggplot() +
        geom_col(aes(x = dep_month , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "dep_month") + 
                ylab(label = "Frequency") + 
                ggtitle("dep_month and Extra Baggage relation")

ggplotly(g_dep_month_bag)
g_dep_month_bag <- 
       t_dep_month_bag %>% 
        ggplot() +
        geom_col(aes(x = dep_month , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "dep_month") + 
                ylab(label = "Proportion") + 
                ggtitle("dep_month type and Extra Baggage relation")


ggplotly(g_dep_month_bag)

Now the predictive ability seems higher to me: the proportions of the two levels differ more, so knowing the level of dep_month helps us predict the extra-baggage request.

3.2.16 arr_month ~ Extra baggage

t_arr_month_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, arr_month) %>% 
        table() %>% 
        data.frame()

g_arr_month_bag <- 
 t_arr_month_bag %>% 
        ggplot() +
        geom_col(aes(x = arr_month , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "arr_month") + 
                ylab(label = "Frequency") + 
                ggtitle("arr_month and Extra Baggage relation")

ggplotly(g_arr_month_bag)
g_arr_month_bag <- 
       t_arr_month_bag %>% 
        ggplot() +
        geom_col(aes(x = arr_month , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "arr_month") + 
                ylab(label = "Proportion") + 
                ggtitle("arr_month type and Extra Baggage relation")


ggplotly(g_arr_month_bag)

As with departure month, arrival month seems to have some predictive power. However, the variation is very low and the number of levels is high.

Should I convert this variable to a categorical one with two levels, with months 2, 6, 8, 9 and 10 in one level and the rest in the other?

train_data <- 
train_data %>% 
        mutate(arr_month = as.numeric(arr_month)) %>% 
        mutate(arr_month = ifelse(arr_month %in% c(2,6,8,9,10), yes = "2-6-8-9-10", no = "rest") ) %>% 
        mutate(arr_month = factor(arr_month)) 

New visualization

t_arr_month_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, arr_month) %>% 
        table() %>% 
        data.frame()

g_arr_month_bag <- 
 t_arr_month_bag %>% 
        ggplot() +
        geom_col(aes(x = arr_month , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "arr_month") + 
                ylab(label = "Frequency") + 
                ggtitle("arr_month and Extra Baggage relation")

ggplotly(g_arr_month_bag)
g_arr_month_bag <- 
       t_arr_month_bag %>% 
        ggplot() +
        geom_col(aes(x = arr_month , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "arr_month") + 
                ylab(label = "Proportion") + 
                ggtitle("arr_month type and Extra Baggage relation")


ggplotly(g_arr_month_bag)

3.2.17 dep_day ~ Extra baggage

t_dep_day_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, dep_day) %>% 
        table() %>% 
        data.frame()

g_dep_day_bag <- 
 t_dep_day_bag %>% 
        ggplot() +
        geom_col(aes(x = dep_day , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "dep_day") + 
                ylab(label = "Frequency") + 
                ggtitle("dep_day and Extra Baggage relation")

ggplotly(g_dep_day_bag)
g_dep_day_bag <- 
       t_dep_day_bag %>% 
        ggplot() +
        geom_col(aes(x = dep_day , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "dep_day") + 
                ylab(label = "Proportion") + 
                ggtitle("dep_day type and Extra Baggage relation")


ggplotly(g_dep_day_bag)

Knowing whether a traveller departs at the end of the month rather than at the start can help predict extra-baggage requests. However, the specific day itself seems to carry little additional information: the proportions for individual days are close to each other.

Would it be better to convert this variable into two levels, the first half and the second half of the month?

train_data %>% 
        filter(dep_day %in% c(16:31)) %>% 
        select(EXTRA_BAGGAGE) %>%
        table()
## .
## False  True 
## 13345  3684
train_data %>% 
        filter(!dep_day %in% c(16:31)) %>% 
        select(EXTRA_BAGGAGE) %>%
        table()
## .
## False  True 
## 26856  6115

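Among second-half departures about 21.6% (3684 of 17029) bought extra baggage, against about 18.5% (6115 of 32971) for first-half departures, so it seems this split would help!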
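The difference between the two halves can also be checked formally. A minimal sketch that uses only the counts printed above: it computes the extra-baggage rate in each half and runs Pearson's chi-squared test of independence. A small p-value only says that the two rates differ; it does not measure how much predictive power the split adds.

```r
# Counts copied from the two table() outputs above
# rows: departure day in the second half (16-31) / first half (1-15)
dep_half_tab <- matrix(c(13345, 3684,
                         26856, 6115),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(dep_day = c("second half", "first half"),
                                       EXTRA_BAGGAGE = c("False", "True")))

# Share of extra-baggage buyers within each half of the month
baggage_rate <- prop.table(dep_half_tab, margin = 1)[, "True"]
round(baggage_rate, 3)

# Pearson's chi-squared test (with Yates' continuity correction for 2x2 tables)
dep_half_test <- chisq.test(dep_half_tab)
dep_half_test$p.value
```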

train_data <- 
train_data %>% 
        mutate(dep_day = as.numeric(dep_day)) %>% 
        mutate(dep_day = ifelse(dep_day > 15 , "second half", "first half")) %>% 
        mutate(dep_day = factor(dep_day))

New visualization

t_dep_day_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, dep_day) %>% 
        table() %>% 
        data.frame()

g_dep_day_bag <- 
 t_dep_day_bag %>% 
        ggplot() +
        geom_col(aes(x = dep_day , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "dep_day") + 
                ylab(label = "Frequency") + 
                ggtitle("dep_day and Extra Baggage relation")

ggplotly(g_dep_day_bag)
g_dep_day_bag <- 
       t_dep_day_bag %>% 
        ggplot() +
        geom_col(aes(x = dep_day , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "dep_day") + 
                ylab(label = "Proportion") + 
                ggtitle("dep_day type and Extra Baggage relation")


ggplotly(g_dep_day_bag)

3.2.18 arr_day ~

t_arr_day_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, arr_day) %>% 
        table() %>% 
        data.frame()

g_arr_day_bag <- 
 t_arr_day_bag %>% 
        ggplot() +
        geom_col(aes(x = arr_day , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "arr_day") + 
                ylab(label = "Frequency") + 
                ggtitle("arr_day and Extra Baggage relation")

ggplotly(g_arr_day_bag)
g_arr_day_bag <- 
       t_arr_day_bag %>% 
        ggplot() +
        geom_col(aes(x = arr_day , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "arr_day") + 
                ylab(label = "Proportion") + 
                ggtitle("arr_day type and Extra Baggage relation")


ggplotly(g_arr_day_bag)

The situation is the same as for the departure day: knowing, for instance, whether the arrival day is the 31st or not gives little predictive power for the response variable.

I therefore combine the levels in the same way as for dep_day.
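Since dep_day and arr_day are recoded by the same rule, a small helper keeps the two recodings consistent. A minimal sketch; half_of_month() is a hypothetical name, and the as.character() step is there in case the day is stored as a factor.

```r
# Hypothetical helper (not part of the original code): split a day of the
# month into two halves, so dep_day and arr_day follow the same rule.
half_of_month <- function(day) {
  day <- as.numeric(as.character(day))   # works for factors and numbers
  factor(ifelse(day > 15, "second half", "first half"),
         levels = c("first half", "second half"))
}

half_of_month(c(1, 15, 16, 31))
```

Applied to the raw day columns, this would read `train_data %>% mutate(dep_day = half_of_month(dep_day), arr_day = half_of_month(arr_day))`.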

train_data <- 
train_data %>% 
        mutate(arr_day = as.numeric(arr_day)) %>% 
        mutate(arr_day = ifelse(arr_day>15, "second half", "first half")) %>% 
        mutate(arr_day = factor(arr_day))
t_arr_day_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, arr_day) %>% 
        table() %>% 
        data.frame()

g_arr_day_bag <- 
 t_arr_day_bag %>% 
        ggplot() +
        geom_col(aes(x = arr_day , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "arr_day") + 
                ylab(label = "Frequency") + 
                ggtitle("arr_day and Extra Baggage relation")

ggplotly(g_arr_day_bag)
g_arr_day_bag <- 
       t_arr_day_bag %>% 
        ggplot() +
        geom_col(aes(x = arr_day , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "arr_day") + 
                ylab(label = "Proportion") + 
                ggtitle("arr_day type and Extra Baggage relation")


ggplotly(g_arr_day_bag)

3.2.19 staying_duration ~

g_staying_duration_baggage <- 
train_data %>% 
        ggplot(aes(x = EXTRA_BAGGAGE , y = staying_duration, fill = EXTRA_BAGGAGE)) + 
        geom_boxplot() + 
        stat_summary(fun = mean, geom="point", shape=5, size=4 , color = "black") + 
        theme_light() + 
        ggtitle("staying_duration ~ Extra Baggage")


ggplotly(g_staying_duration_baggage)
#--- overlapping dist 

g_staying_duration_baggage_dist <- 
train_data %>%  
        ggplot() + 
        geom_density(aes(x = staying_duration, fill = EXTRA_BAGGAGE), alpha = 0.3) + 
        theme_light() + 
        ggtitle("staying_duration ~ Extra Baggage")

ggplotly(g_staying_duration_baggage_dist)

Staying duration does not seem to carry much predictive information. One would expect travellers who stay longer to be more likely to buy extra baggage, but the plots show no clear sign of this. My guess is that baggage is already included in some of the purchases.
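A boxplot only compares the two groups' distributions, and staying_duration is heavily right-skewed (median 3, maximum 351 in the summary). Another view is the extra-baggage rate within bins of the duration. A minimal base-R sketch; the helper name and the toy data are illustrative assumptions.

```r
# Hypothetical helper: share of extra-baggage buyers within bins of a numeric
# variable. Bins without observations come back as NA.
rate_by_bin <- function(x, bought, breaks) {
  bins <- cut(x, breaks = breaks)
  tapply(bought, bins, mean)
}

# Toy data for illustration only (not taken from the dataset)
stay  <- c(0, 0, 5, 5, 20, 20)
extra <- c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)
rate_by_bin(stay, extra, breaks = c(-Inf, 0, 7, Inf))
```

On the real data, quartile-based breaks taken from the summary would be a natural start, e.g. `rate_by_bin(train_data$staying_duration, train_data$EXTRA_BAGGAGE == "True", c(-Inf, 0, 3, 9, Inf))`.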

3.2.20 country ~

t_country_bag <- 
train_data %>% 
        dplyr::select(EXTRA_BAGGAGE, country) %>% 
        table() %>% 
        data.frame()

g_country_bag <- 
 t_country_bag %>% 
        ggplot() +
        geom_col(aes(x = country , y = Freq , fill= EXTRA_BAGGAGE)) + 
                theme_light() + 
                xlab(label = "country") + 
                ylab(label = "Frequency") + 
                ggtitle("country and Extra Baggage relation")

ggplotly(g_country_bag)
g_country_bag <- 
       t_country_bag %>% 
        ggplot() +
        geom_col(aes(x = country , y = Freq , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
                xlab(label = "country") + 
                ylab(label = "Proportion") + 
                ggtitle("country type and Extra Baggage relation")


ggplotly(g_country_bag)

3.3 Tri-Variates

Since the number of three-variable combinations in the dataset is very large, it suffices to show a few of them as examples of this kind of visualization. With more time I would go through all of them, but time is short for this report.

3.3.1 Haul type ~ Train ~ Baggage

t <- train_data %>%
        group_by(HAUL_TYPE,TRAIN,EXTRA_BAGGAGE) %>% 
        count() %>% 
        ungroup()

# alluvial() comes from the alluvial package (not ggalluvial)
# install.packages("alluvial")
library(alluvial)

alluvial(t[,1:3], freq = t$n,
         col = ifelse(t$EXTRA_BAGGAGE == "True", "gold", "grey"),
         border = ifelse(t$EXTRA_BAGGAGE == "True", "gold", "grey"),
         cex = 0.7)

summary(train_data)
##        ID             GDS           DEPARTURE         
##  Min.   :    0   Min.   :0.0000   Min.   :2018-01-01  
##  1st Qu.:12500   1st Qu.:0.0000   1st Qu.:2018-07-07  
##  Median :25000   Median :1.0000   Median :2018-07-20  
##  Mean   :25000   Mean   :0.6424   Mean   :2018-07-29  
##  3rd Qu.:37499   3rd Qu.:1.0000   3rd Qu.:2018-08-12  
##  Max.   :49999   Max.   :4.0000   Max.   :2018-12-31  
##                                                       
##     ARRIVAL               ADULTS         CHILDREN         INFANTS       
##  Min.   :2018-01-01   Min.   :0.000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:2018-07-10   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :2018-07-27   Median :1.000   Median :0.0000   Median :0.00000  
##  Mean   :2018-08-06   Mean   :1.488   Mean   :0.0991   Mean   :0.01816  
##  3rd Qu.:2018-08-23   3rd Qu.:2.000   3rd Qu.:0.0000   3rd Qu.:0.00000  
##  Max.   :2019-06-26   Max.   :9.000   Max.   :5.0000   Max.   :2.00000  
##                                                                         
##    TRAIN                  HAUL_TYPE        DISTANCE     
##  False:49731   CONTINENTAL     :25878   Min.   :     0  
##  True :  269   DOMESTIC        :11053   1st Qu.:132962  
##                INTERCONTINENTAL:13069   Median :224002  
##                                         Mean   :367276  
##                                         3rd Qu.:612708  
##                                         Max.   :999941  
##                                                         
##         DEVICE                  TRIP_TYPE        PRODUCT     
##  COMPUTER  :34064   MULTI_DESTINATION: 2161   DYNPACK:  956  
##  OTHER     :  942   ONE_WAY          :17335   TRIP   :49044  
##  SMARTPHONE:11709   ROUND_TRIP       :30504                  
##  TABLET    : 3152                                            
##  NA's      :  133                                            
##                                                              
##                                                              
##     SMS        EXTRA_BAGGAGE     NO_GDS       dep_wday arr_wday
##  False:25168   False:40201   Min.   :0.0000   1:8763   1:6702  
##  True :24832   True : 9799   1st Qu.:0.0000   2:8260   2:7499  
##                              Median :1.0000   3:7062   3:9422  
##                              Mean   :0.5913   4:7364   4:8123  
##                              3rd Qu.:1.0000   5:6064   5:6317  
##                              Max.   :4.0000   6:5896   6:6165  
##                                               7:6591   7:5772  
##       dep_month          arr_month            dep_day     
##  2-6-8-9-10:16708   2-6-8-9-10:20940   first half :32971  
##  rest      :33292   rest      :29060   second half:17029  
##                                                           
##                                                           
##                                                           
##                                                           
##                                                           
##         arr_day      staying_duration               country     
##  first half :29331   Min.   :  0.000   high rate country:21858  
##  second half:20669   1st Qu.:  0.000   low rate country :28142  
##                      Median :  3.000                            
##                      Mean   :  7.732                            
##                      3rd Qu.:  9.000                            
##                      Max.   :351.000                            
##                                                                 
##  website    num_of_flights 
##  ED:28368   Min.   :1.000  
##  GO: 6014   1st Qu.:1.000  
##  OP:14715   Median :1.000  
##  TL:  903   Mean   :1.234  
##             3rd Qu.:1.000  
##             Max.   :4.000  
## 

The type of plot should be chosen according to the goal of the visualization and the type of data. The alluvial diagram above links the levels of three categorical variables. For instance, it shows that the small group of customers who booked a train have intercontinental hauls, and that the largest share of extra-baggage buyers have continental hauls.

3.3.2 Haul Type ~ Trip Type ~ Baggage

train_data %>% 
        count(HAUL_TYPE,TRIP_TYPE,EXTRA_BAGGAGE) %>% 
        ggplot() + 
        geom_col(aes(x = HAUL_TYPE , y = n , fill= EXTRA_BAGGAGE), position = "fill") + 
                theme_light() + 
        facet_grid(TRIP_TYPE ~ . ) +
        ggtitle(" Haul Type ~ Trip Type ~ Baggage")

There is not much of interest here, but a few points stand out. Multi-destination customers have a high rate of buying extra baggage regardless of haul type, which is not the case for one-way and round-trip customers. For one-way and round-trip customers the pattern across haul types is the same: continental trips have the highest proportion of extra baggage. But why not intercontinental trips?
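To put numbers on these facets, one can tabulate the extra-baggage rate for every haul-type and trip-type combination. A minimal base-R sketch; rate_by_groups() is a hypothetical helper and the toy data are for illustration only. Since multi-destination trips are only 2161 of the 50000 rows, it is worth checking the cell sizes as well, e.g. with `table(train_data$HAUL_TYPE, train_data$TRIP_TYPE)`.

```r
# Hypothetical helper: extra-baggage rate for every combination of two
# grouping variables, i.e. the "True" share a position = "fill" bar shows.
rate_by_groups <- function(df, g1, g2) {
  out <- aggregate(df$EXTRA_BAGGAGE == "True",
                   by = list(df[[g1]], df[[g2]]),
                   FUN = mean)
  names(out) <- c(g1, g2, "rate")
  out
}

# Toy data for illustration only (not taken from the dataset)
toy <- data.frame(
  HAUL_TYPE     = c("DOMESTIC", "DOMESTIC", "CONTINENTAL", "CONTINENTAL"),
  TRIP_TYPE     = c("ONE_WAY", "ONE_WAY", "ONE_WAY", "ROUND_TRIP"),
  EXTRA_BAGGAGE = c("True", "False", "True", "False")
)
rate_by_groups(toy, "HAUL_TYPE", "TRIP_TYPE")
```

On the real data the call would be `rate_by_groups(train_data, "HAUL_TYPE", "TRIP_TYPE")`.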

3.3.3 Haul type ~ Distance ~ Extra Baggage

3.4 High-Dimensional Visualization

3.4.1 tsne

A dataset is a system of components (its variables) and their relationships. A complex system cannot be understood by studying its parts in isolation; it demands a holistic approach, and high-dimensional data visualization tries to provide one.

Here I use the t-SNE algorithm to visualize the whole dataset in search of patterns.

set.seed(8)
# index <- which(colnames(imputed_train_data)== "EXTRA_BAGGAGE")

# tsne_train_prox <- daisy(x = imputed_train_data[,-index]  ,
#                     metric = "gower",
#                     stand = TRUE ,
#                     type = list(asymm = 1)) 

train_data_model_mat<- 
model.matrix(data = train_data[,-c(1,3,4)] , object = EXTRA_BAGGAGE ~ . )

train_data_model_mat <- scale(train_data_model_mat[,-1])
train_data_model_mat <- train_data_model_mat[complete.cases(train_data_model_mat),]

# since my laptop cannot handle 50,000 observations in tsne(), use a random sample of 10,000
sample_vec <- sample(x = 1:nrow(train_data_model_mat), size = 10000 , replace = FALSE)

test_dist <- dist(x =train_data_model_mat[sample_vec,] , method = "manhattan" )


tsne_test <- tsne(X = test_dist , k = 2  )

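Again, after an hour of waiting the computation had not finished. My machine is too weak for this implementation, so I leave the code here without output. (The tsne package implements exact t-SNE in pure R, whose cost grows quadratically with the number of points; a Barnes-Hut implementation such as the Rtsne package would likely be much faster.)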

3.4.2 SOM

set.seed(14)

train_data_model_mat<- 
model.matrix(data = train_data[,-c(1,3,4)] , object = EXTRA_BAGGAGE ~ . )

train_data_model_mat <- scale(train_data_model_mat[,-1])
train_data_model_mat <- train_data_model_mat[complete.cases(train_data_model_mat),]

kohonen_model <- som(X =  train_data_model_mat, grid = somgrid(20, 20, "hexagonal"))
# the counts plot shows the number of members in each node; ideally a good map has a fairly uniform distribution of members, but the plot can also reveal outliers
par(mfrow=c(1,2))
plot(kohonen_model, type = "counts", main = "Counts Plot")
# quality is the average distance of the node members from the node's codebook vector, i.e. their deviation from the representative model
plot(kohonen_model, type = "quality", main = "Quality Plot")

par(mfrow=c(1,2))
# training progress, measured as the mean distance of the objects to their closest codebook vector
plot(kohonen_model, type = "changes", main = "Change Plot")

# distances of each node from their neighbour nodes. 
plot(kohonen_model, type = "dist.neighbours", main = "Distance from Neighbor")

 coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
       

# # GDS property map        
# plot(kohonen_model, type = "property", property = kohonen_model$codes[[1]][,1], palette.name = coolBlueHotRed , main = colnames(train_data_model_mat)[1] )
# 
# # ADULTS
# plot(kohonen_model, type = "property", property = kohonen_model$codes[[1]][,2], palette.name = coolBlueHotRed , main = colnames(train_data_model_mat)[2] )
# 
# #Distance
# plot(kohonen_model, type = "property", property = kohonen_model$codes[[1]][,8], palette.name = coolBlueHotRed , main = colnames(train_data_model_mat)[8] )

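# now plotting the extra baggage per node
# model.matrix() dropped the rows with a missing DEVICE, so keep the response
# only for the rows that went into the SOM, in the same order
# (assumes the later complete.cases() step on the scaled matrix drops nothing more)
som_rows <- complete.cases(train_data[,-c(1,3,4)])
extra_baggage_binary <- as.numeric(train_data$EXTRA_BAGGAGE[som_rows])
extra_baggage_binary <- extra_baggage_binary - 1 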

baggage_vector_som_nodes <- vector(length=400)

# 400 is the grid size
for (i in 1:400){
      baggage_vector_som_nodes[i] <- mean(extra_baggage_binary[which(kohonen_model$unit.classif==i)])
}

# plot(kohonen_model, type = "property", property = baggage_vector_som_nodes, palette.name = coolBlueHotRed ,
#      main = "Average Extra Baggage Per node" )

par(mfrow=c(2,2))
# GDS property map        
plot(kohonen_model, type = "property", property = kohonen_model$codes[[1]][,1], palette.name = coolBlueHotRed , main = colnames(train_data_model_mat)[1] )

# ADULTS
plot(kohonen_model, type = "property", property = kohonen_model$codes[[1]][,2], palette.name = coolBlueHotRed , main = colnames(train_data_model_mat)[2] )

#Distance
plot(kohonen_model, type = "property", property = kohonen_model$codes[[1]][,8], palette.name = coolBlueHotRed , main = colnames(train_data_model_mat)[8] )

#Extra Baggage
plot(kohonen_model, type = "property", property = baggage_vector_som_nodes, palette.name = coolBlueHotRed ,
     main = "Average Extra Baggage Per node" )

par(mfrow=c(1,1))
# #clustering 
# clusters_range = 2:12
# kohonen_model_clustering = kmeansruns(data = kohonen_model$codes , krange = clusters_range , criterion = "ch" , iter.max = 100 , runs = 100 )
# print(paste("According to Calinski-Harabasz, The optimum number of clusters is",kohonen_model_clustering$bestk))
# plot(clusters_range, kohonen_model_clustering$crit , type = "b" , main = "ch criterion values" , xlab = "number of clusters" , ylab = "criterion value" )
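The loop over the 400 nodes can be written with a single tapply() call. A minimal sketch, assuming the response vector has the same rows, in the same order, as the matrix passed to som(); node_means() is a hypothetical helper. Unlike mean() over an empty selection, which returns NaN, empty nodes come back as NA here.

```r
# Vectorised alternative to the for loop over the SOM nodes:
# mean of a 0/1 response per node, NA for nodes without members.
node_means <- function(response, unit_classif, n_nodes) {
  out <- rep(NA_real_, n_nodes)
  m <- tapply(response, unit_classif, mean)
  out[as.integer(names(m))] <- m
  out
}

# Toy example: 5 observations mapped to 4 nodes (node 3 stays empty)
node_means(c(1, 0, 1, 1, 0), c(1, 1, 2, 4, 4), n_nodes = 4)
```

With the objects above, the call would be `node_means(extra_baggage_binary, kohonen_model$unit.classif, 400)`.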

While the grids show some weak patterns, there is a caveat that makes the maps above hard to trust with this many variables.

The map was trained with the default distance of the kohonen package, a Euclidean (sum-of-squares) distance, and Euclidean distance behaves strangely in high-dimensional space: the distances between all pairs of points become nearly equal relative to their size, so the "closest" codebook vector carries little information. This distance concentration is part of the curse of dimensionality. (Recent versions of kohonen also accept other distances through the dist.fcts argument of som(), e.g. "manhattan".)
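The distance-concentration effect can be seen in a small simulation. A minimal base-R sketch, assuming uniform random data chosen only for illustration: as the dimension grows, the gap between the nearest and the farthest neighbour of a query point shrinks relative to the nearest distance.

```r
# Relative contrast of Euclidean distances, (max - min) / min, from one query
# point to n random points drawn uniformly from the unit hypercube.
relative_contrast <- function(dim, n = 1000) {
  points <- matrix(runif(n * dim), nrow = n)
  query  <- runif(dim)
  # t(points) is dim x n, so subtracting query works column by column
  d <- sqrt(colSums((t(points) - query)^2))
  (max(d) - min(d)) / min(d)
}

set.seed(1)
contrast <- sapply(c(2, 10, 100, 1000), relative_contrast)
round(contrast, 2)
```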

——————-

4. Predictive Modelling

I predict EXTRA_BAGGAGE as the response variable, using the other variables as predictors. Several algorithms are used, and they are not chosen merely for their predictive power: the logit and the lasso, for instance, are mainly used to gain insight into the data.

Before modelling, I fill in the missing values by imputation. This is necessary for some of the models I use, such as the lasso, because glmnet() does not accept missing values in its predictor matrix.
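Imputation also avoids a silent row-alignment problem: model.matrix(), used above to build the inputs of tsne() and som(), drops incomplete rows by default, so a matrix built from data with missing DEVICE values no longer lines up with the full response vector. A minimal base-R sketch on toy data:

```r
# model.matrix() goes through model.frame(), whose default na.action
# (na.omit) silently drops rows with missing values.
toy <- data.frame(y = c(0, 1, 0, 1),
                  device = c("COMPUTER", NA, "TABLET", "COMPUTER"))
X <- model.matrix(y ~ device, data = toy)

nrow(toy)    # all rows of the data
nrow(X)      # one fewer: the row with a missing device is gone
rownames(X)  # the original row names identify the kept rows
```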

# missing values pattern 
# md.pattern(train_data)
# so the device is missing in 133 observations 
# another visualization of missing values from the package VIM
aggr_plot <- aggr(x =  train_data, col=c('navyblue','red'), numbers=TRUE, sortVars=TRUE, labels=names(train_data), cex.axis=.7, gap=3, ylab=c("Histogram of missing data","Pattern"))

## 
##  Variables sorted by number of missings: 
##          Variable   Count
##            DEVICE 0.00266
##                ID 0.00000
##               GDS 0.00000
##         DEPARTURE 0.00000
##           ARRIVAL 0.00000
##            ADULTS 0.00000
##          CHILDREN 0.00000
##           INFANTS 0.00000
##             TRAIN 0.00000
##         HAUL_TYPE 0.00000
##          DISTANCE 0.00000
##         TRIP_TYPE 0.00000
##           PRODUCT 0.00000
##               SMS 0.00000
##     EXTRA_BAGGAGE 0.00000
##            NO_GDS 0.00000
##          dep_wday 0.00000
##          arr_wday 0.00000
##         dep_month 0.00000
##         arr_month 0.00000
##           dep_day 0.00000
##           arr_day 0.00000
##  staying_duration 0.00000
##           country 0.00000
##           website 0.00000
##    num_of_flights 0.00000
imputed_train_data <- mice(data = train_data[,-c(1,3,4)] , m = 1 , method = "cart" , seed = 78 )
## 
##  iter imp variable
##   1   1  DEVICE
##   2   1  DEVICE
##   3   1  DEVICE
##   4   1  DEVICE
##   5   1  DEVICE
imputed_train_data<- mice::complete(imputed_train_data,1)

# have a look at the missing values of the imputed data
aggr_plot <- aggr(x =  imputed_train_data, col=c('navyblue','red'), numbers=TRUE, sortVars=TRUE, labels=names(imputed_train_data), cex.axis=.7, gap=3, ylab=c("Histogram of missing data","Pattern"))

## 
##  Variables sorted by number of missings: 
##          Variable Count
##               GDS     0
##            ADULTS     0
##          CHILDREN     0
##           INFANTS     0
##             TRAIN     0
##         HAUL_TYPE     0
##          DISTANCE     0
##            DEVICE     0
##         TRIP_TYPE     0
##           PRODUCT     0
##               SMS     0
##     EXTRA_BAGGAGE     0
##            NO_GDS     0
##          dep_wday     0
##          arr_wday     0
##         dep_month     0
##         arr_month     0
##           dep_day     0
##           arr_day     0
##  staying_duration     0
##           country     0
##           website     0
##    num_of_flights     0

4.1 Regression

4.1.1 Logistic Regression

Good old logistic regression is the first algorithm I use. I do not expect it to be an excellent predictor, but the logit can hopefully give good insight into the problem.

# First try with a simple train-test split
set.seed(78)
glm_training_vec <- sample.split(Y = train_data$EXTRA_BAGGAGE , SplitRatio = 0.7 )
glm_training <- train_data[glm_training_vec, ]
glm_test <- train_data[!glm_training_vec, ]

# remove num_of_flights in order to avoid singularity condition 
glm_model_general <- glm(data = glm_training ,
                         formula = EXTRA_BAGGAGE ~ . - ID - DEPARTURE - ARRIVAL - num_of_flights,
                         family = binomial )


summary(glm_model_general)
## 
## Call:
## glm(formula = EXTRA_BAGGAGE ~ . - ID - DEPARTURE - ARRIVAL - 
##     num_of_flights, family = binomial, data = glm_training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7411  -0.7068  -0.5183  -0.3149   2.7804  
## 
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -1.886e+00  1.669e-01 -11.298  < 2e-16 ***
## GDS                       -1.889e-01  4.344e-02  -4.348 1.37e-05 ***
## ADULTS                     2.195e-01  1.583e-02  13.867  < 2e-16 ***
## CHILDREN                   1.843e-01  3.398e-02   5.424 5.84e-08 ***
## INFANTS                    4.336e-01  9.885e-02   4.387 1.15e-05 ***
## TRAINTrue                 -1.324e+01  1.019e+02  -0.130 0.896588    
## HAUL_TYPEDOMESTIC         -3.315e-01  3.583e-02  -9.251  < 2e-16 ***
## HAUL_TYPEINTERCONTINENTAL -1.279e+00  4.645e-02 -27.534  < 2e-16 ***
## DISTANCE                   1.567e-08  4.996e-08   0.314 0.753835    
## DEVICEOTHER               -1.392e+01  2.855e+02  -0.049 0.961110    
## DEVICESMARTPHONE          -2.439e-01  3.565e-02  -6.842 7.80e-12 ***
## DEVICETABLET              -4.274e-02  5.781e-02  -0.739 0.459732    
## TRIP_TYPEONE_WAY           1.815e-01  8.095e-02   2.242 0.024936 *  
## TRIP_TYPEROUND_TRIP       -3.620e-01  7.137e-02  -5.071 3.95e-07 ***
## PRODUCTTRIP                4.359e-01  1.136e-01   3.839 0.000124 ***
## SMSTrue                   -2.946e-02  2.824e-02  -1.043 0.296808    
## NO_GDS                     4.411e-01  3.943e-02  11.186  < 2e-16 ***
## dep_wday2                  6.631e-02  5.712e-02   1.161 0.245631    
## dep_wday3                  3.608e-01  5.469e-02   6.597 4.19e-11 ***
## dep_wday4                  2.886e-01  5.406e-02   5.338 9.40e-08 ***
## dep_wday5                  1.356e-01  5.922e-02   2.290 0.022039 *  
## dep_wday6                  7.576e-02  6.050e-02   1.252 0.210493    
## dep_wday7                  8.440e-02  5.759e-02   1.465 0.142811    
## arr_wday2                  1.037e-01  5.941e-02   1.745 0.081059 .  
## arr_wday3                 -3.111e-01  5.486e-02  -5.670 1.43e-08 ***
## arr_wday4                 -2.370e-01  5.588e-02  -4.241 2.22e-05 ***
## arr_wday5                 -1.344e-01  6.013e-02  -2.236 0.025381 *  
## arr_wday6                 -1.066e-01  6.106e-02  -1.746 0.080835 .  
## arr_wday7                 -1.144e-01  6.170e-02  -1.854 0.063740 .  
## dep_monthrest              1.391e-01  5.570e-02   2.498 0.012493 *  
## arr_monthrest             -4.835e-01  5.619e-02  -8.605  < 2e-16 ***
## dep_daysecond half        -6.795e-02  3.672e-02  -1.850 0.064246 .  
## arr_daysecond half         2.094e-01  3.568e-02   5.868 4.42e-09 ***
## staying_duration           6.103e-03  1.071e-03   5.698 1.21e-08 ***
## countrylow rate country   -9.886e-02  3.056e-02  -3.235 0.001218 ** 
## websiteGO                  4.150e-01  4.927e-02   8.423  < 2e-16 ***
## websiteOP                  2.476e-01  3.327e-02   7.443 9.83e-14 ***
## websiteTL                  1.418e+01  2.855e+02   0.050 0.960399    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 34593  on 34905  degrees of freedom
## Residual deviance: 31519  on 34868  degrees of freedom
##   (94 observations deleted due to missingness)
## AIC: 31595
## 
## Number of Fisher Scoring iterations: 14

Previously I had run this analysis on the raw variables such as arr_month and dep_day. As factors with many levels they were not significant, but after combining their levels into fewer groups, usually two, most of them are significant predictors (dep_day remains borderline, p ≈ 0.064). I am very happy! (The logit on the raw variables is omitted to keep the report short.)

The preliminary results of the first glm model are very interesting. I have not yet checked the associations and correlations between the predictors, and multicollinearity may have affected the p-values, but the pattern of significant variables is still informative.

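Most variables have significant p-values: if their true coefficients were zero, estimates this large would be very unlikely, which suggests that they influence the purchase of extra baggage. The variables with no significant level are TRAIN, DISTANCE and SMS. For TRAIN, the huge standard error (about 102 for an estimate of about -13) together with the 14 Fisher scoring iterations hints at quasi-complete separation rather than a lack of effect. It is strange that staying_duration is significant, given how little the plots suggested.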
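Significance says little about effect size. Logit coefficients are easier to read on the odds scale: since log(p / (1 - p)) = b0 + b1 x1 + ..., exp(bj) is the factor by which the odds change. A minimal sketch using two estimates copied from the summary above; for the full model, `exp(coef(glm_model_general))` gives all of them.

```r
# Coefficients copied from summary(glm_model_general) above
logit_coefs <- c(ADULTS = 2.195e-01,
                 HAUL_TYPEINTERCONTINENTAL = -1.279e+00)

# exp(beta): multiplicative change in the odds of buying extra baggage for a
# one-unit increase (or for this level versus the baseline level),
# holding the other predictors fixed
odds_ratios <- exp(logit_coefs)
round(odds_ratios, 2)
```

Read this way, each extra adult multiplies the odds of buying extra baggage by about 1.25, and an intercontinental haul has roughly 0.28 times the odds of a continental one (the baseline level), other things equal.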

Let’s remove the insignificant variables and build an improved model. One approach is shrinkage with penalties (ridge and lasso, of which only the lasso sets coefficients exactly to zero); another is best-subset selection. I prefer the lasso for selecting the subset, since bestglm and regsubsets did not return results in a reasonable computation time. Nevertheless, the code for these subset-selection functions is given in the following chunks.

# ROC for the general glm model 
final_df <- data.frame(id = 1:50000, prob1 = NA , prob2 = NA , prob3 = NA, prob4 = NA , prob5 = NA  )

for (i in 1:5){
        set.seed(i)
        # This time, create the CV folds with the caret package 
        flds <- createFolds(y = imputed_train_data$EXTRA_BAGGAGE, k = 5, list = TRUE, returnTrain = FALSE)
        
        predictions_df <- data.frame(id = 1:50000, prob1 = NA  )
        cv_pred_df <- data.frame(id = NA, prob = NA )
        
        for(k in 1:5){
                # k is the held-out fold: fit on the other four folds, predict fold k
                glm_general_model <-  glm(data = imputed_train_data[-flds[[k]],-23] ,
                                 formula = EXTRA_BAGGAGE ~ . ,
                                family = binomial )
                
                prediction_glm_model_general<- 
                        predict(object = glm_general_model ,
                                 newdata = imputed_train_data[flds[[k]],-23] ,
                                 type =  "response")
                
                cv_df <- data.frame(id = flds[[k]],prob = prediction_glm_model_general)
                

                predictions_df<- 
                predictions_df %>%
                        left_join(y = cv_df , by = "id")
                
        }
        
        final_df[,(i+1)] <- apply(predictions_df[,-c(1,2)], MARGIN = 1 , FUN = sum , na.rm = TRUE)
        
        
}

#CV-ROC Mean
prob_for_roc <- apply(final_df[,-1] , MARGIN = 1 , mean )


ROCRpred = prediction( prob_for_roc , train_data$EXTRA_BAGGAGE )
ROCRperf = performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at= seq(0,1,0.1), text.adj=c(-0.2,1.7), main = "CV-ROC Mean")

# AUC of CV-ROC Mean
ROCRpredTest = prediction(prob_for_roc , train_data$EXTRA_BAGGAGE)
auc = as.numeric(performance(ROCRpredTest, "auc")@y.values)
auc

#CV-ROC Median
prob_for_roc <- apply(final_df[,-1] , MARGIN = 1 , median )


ROCRpred = prediction( prob_for_roc , train_data$EXTRA_BAGGAGE )
ROCRperf = performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at= seq(0,1,0.1), text.adj=c(-0.2,1.7), main = "CV-ROC Median")

# AUC of CV-ROC Median
ROCRpredTest = prediction(prob_for_roc , train_data$EXTRA_BAGGAGE)
auc = as.numeric(performance(ROCRpredTest, "auc")@y.values)
auc
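As a cross-check on the ROCR numbers, the AUC equals the probability that a randomly chosen buyer gets a higher predicted probability than a randomly chosen non-buyer, which can be computed from ranks (the Mann-Whitney statistic). A minimal base-R sketch; the 0/1 label coding is an assumption, so with this report's objects the call would be `auc_rank(prob_for_roc, as.numeric(train_data$EXTRA_BAGGAGE == "True"))`.

```r
# AUC as the Mann-Whitney statistic: probability that a random positive case
# scores higher than a random negative one (ties count one half via average ranks).
auc_rank <- function(scores, labels) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)   # average ranks for ties
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(c(0.1, 0.4, 0.35, 0.8), c(0, 0, 1, 1))   # 0.75
```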



# ------- Hold-out ROC for the model fitted on the 70% training split

# predictions of glm_model_general on the 30% test split
# (prediction_glm_model_general is overwritten inside the CV loop above)
prediction_glm_test <- predict(object = glm_model_general,
                               newdata = glm_test,
                               type = "response")

# rows with a missing DEVICE get an NA prediction; keep only the scored rows
scored <- !is.na(prediction_glm_test)

ROCRpred = prediction(prediction_glm_test[scored], glm_test$EXTRA_BAGGAGE[scored])
ROCRperf = performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at= seq(0,1,0.1), text.adj=c(-0.2,1.7),
     main = "Hold-out ROC, general logit")

# AUC on the hold-out split
auc = as.numeric(performance(ROCRpred, "auc")@y.values)
auc

4.1.2 bestglm

bestglm is extremely slow on my machine, so I include the code below without running it.

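not_include <- which(colnames(train_data) %in% c("ID","DEPARTURE","ARRIVAL","EXTRA_BAGGAGE","num_of_flights"))
# bestglm() takes the response from the last column of Xy; family = binomial
# is needed for a logistic model (the default family is gaussian)
Xy <- data.frame(train_data[,-not_include], y = as.logical(train_data$EXTRA_BAGGAGE))

bestglm(Xy, family = binomial, IC = "BIC")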

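Backward stepwise selection can be automated with the leaps package. Note that regsubsets() ranks subsets by least-squares fits, so for this binary response it approximates a linear probability model rather than the logit.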

not_include <- which(colnames(train_data) %in% c("ID","DEPARTURE","ARRIVAL","num_of_flights"))

regsub_model <- regsubsets(EXTRA_BAGGAGE ~ website + country + GDS + ADULTS + CHILDREN + 
                                 INFANTS + HAUL_TYPE + DEVICE + TRIP_TYPE + dep_wday + arr_wday + dep_month + 
                                 arr_month + dep_day + arr_day + NO_GDS + PRODUCT + staying_duration
                           , data = train_data[,-not_include],
                           method = "backward" , nvmax = 10)

4.1.3 Lasso

Lasso penalizes the absolute size of the coefficients (an L1 penalty), so the coefficients of predictors that do not contribute enough are shrunk exactly to zero and those predictors drop out of the model automatically. In this way the variable set of the model is reduced.
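Why do coefficients hit exactly zero? For a single standardized predictor in least squares, the lasso estimate is the soft-thresholded unpenalized estimate, and coordinate-descent solvers such as glmnet apply the same operator one coordinate at a time (for the logit, inside weighted least-squares steps, with the threshold scaled accordingly). A minimal base-R sketch of the operator itself, not glmnet's code:

```r
# Soft-thresholding operator S(z, lambda) = sign(z) * max(|z| - lambda, 0).
# Estimates smaller than lambda in absolute value are set exactly to zero;
# larger ones are pulled towards zero by lambda.
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)
}

soft_threshold(c(-3, -0.5, 0.2, 2), lambda = 1)   # -2 0 0 1
```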

matrix_for_lasso <- 
model.matrix( data = imputed_train_data[,-23], ~ TRAIN + HAUL_TYPE + GDS + ADULTS + CHILDREN + INFANTS + 
                      DISTANCE + DEVICE + TRIP_TYPE + PRODUCT + SMS + NO_GDS + dep_wday + arr_wday +
                      staying_duration + country + website+ dep_month + arr_month + dep_day + arr_day )

lasso_model <- glmnet(x = matrix_for_lasso,
                      y = train_data$EXTRA_BAGGAGE, family = "binomial" , alpha = 1  )


plot(lasso_model, xvar = "lambda", label = TRUE)

# plot(lasso_model, xvar = "dev", label = TRUE)

The missing values were imputed before fitting the lasso. The plot above shows how the variables’ coefficients grow from zero as lambda decreases. For instance, variable 4 leaves zero very early, which may be a sign of strong predictive power.

Next comes cross-validation: lambda is tuned by 5-fold CV on F1, and the chosen lambda in turn selects the variables.
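F1 is not symmetric in the two classes, so it matters which class counts as positive. caret's confusionMatrix() uses the first factor level as the positive class unless `positive` is set, which for this response is "False". A minimal base-R sketch that makes the choice explicit; f1_for_class() is a hypothetical helper, and in the loop `confusionMatrix(..., positive = "True")` would have the same effect.

```r
# Precision, recall and F1 for an explicitly chosen positive class.
f1_for_class <- function(predicted, actual, positive = "True") {
  tp <- sum(predicted == positive & actual == positive)
  fp <- sum(predicted == positive & actual != positive)
  fn <- sum(predicted != positive & actual == positive)
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  c(precision = precision, recall = recall,
    f1 = 2 * precision * recall / (precision + recall))
}

pred   <- c("True", "True", "False", "False", "True")
actual <- c("True", "False", "False", "True", "True")
f1_for_class(pred, actual)                      # scores for "True"
f1_for_class(pred, actual, positive = "False")  # scores for "False"
```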

matrix_for_lasso <- 
model.matrix( data = imputed_train_data[,-23],
              ~ TRAIN + HAUL_TYPE + GDS + ADULTS + CHILDREN + INFANTS + 
                      DISTANCE + DEVICE + TRIP_TYPE + PRODUCT + SMS + NO_GDS + dep_wday + arr_wday +
                      staying_duration + country + website+ dep_month + arr_month + dep_day + arr_day )

response_lasso <- train_data$EXTRA_BAGGAGE


flds <- createFolds(y = imputed_train_data$EXTRA_BAGGAGE, k = 5, list = TRUE, returnTrain = FALSE)


# Manual 5-fold assignment (not used below: the loop uses the caret folds in flds)
k_length <- nrow(matrix_for_lasso) %/% 5 
cv_vector <- c( rep(1,k_length),rep(2,k_length), rep(3,k_length),rep(4,k_length),
                rep(5,(nrow(matrix_for_lasso)-(4*k_length))) )

lambda_cv_df <- data.frame(id = NA, lambda = NA, CV_avg_sen = NA , CV_avg_prec = NA , CV_avg_f1 = NA)
id <- 0 
cv_sen_vec <- vector(length = 5)
cv_prec_vec <- vector(length = 5)
cv_f1_vec<- vector(length = 5)

for (lambda in seq(0.005,0.05,0.001)) {
        
        id <- id + 1 
        set.seed(id)
        
        # print(id)
        for (k in 1:5){
                
                # lambda = 0.001
                # k = 1
                
                 lasso_model <- glmnet(x = matrix_for_lasso[-flds[[k]],],
                      y = response_lasso[-flds[[k]]], family = "binomial" ,
                      alpha = 1, lambda = lambda  )
                 
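                 cv_lasso_prediction <-
                         predict(lasso_model,matrix_for_lasso[flds[[k]],],
                                 type = "response" )
                 
                 # classify as "True" when the predicted probability exceeds 0.2;
                 # fixing both levels avoids a one-level factor in extreme folds
                 cv_lasso_prediction <- factor(cv_lasso_prediction > 0.2,
                                               levels = c(FALSE, TRUE),
                                               labels = c("False", "True"))
            
                 # confusionMatrix() takes the first level ("False") as the positive
                 # class by default, so these metrics describe the "False" class;
                 # add positive = "True" to score extra-baggage buyers instead
                 c_mat <- confusionMatrix(data = cv_lasso_prediction ,
                                 reference = response_lasso[flds[[k]]] )
                 
                 cv_sen_vec[k] <- c_mat$byClass["Sensitivity"]
                 cv_prec_vec[k] <- c_mat$byClass["Precision"]
                 cv_f1_vec[k] <-  c_mat$byClass["F1"]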
                
                
        }
        
        lambda_cv_df[id,] <- c(id, lambda , mean(cv_sen_vec), mean(cv_prec_vec), mean(cv_f1_vec) ) 
        
        
        
       
        
        
}

lambda_cv_df %>% 
        arrange(desc(CV_avg_f1)) %>%
        head()
##   id lambda CV_avg_sen CV_avg_prec CV_avg_f1
## 1  1  0.005  0.6201089   0.8871258 0.7299606
## 2  2  0.006  0.6187905   0.8862482 0.7287522
## 3  3  0.007  0.6176213   0.8859401 0.7278263
## 4  4  0.008  0.6132186   0.8858471 0.7247366
## 5  5  0.009  0.6091640   0.8862894 0.7220480
## 6 10  0.014  0.6093129   0.8818660 0.7206195
# 3d visualization of parameters 
lasso_3dplot_param <- plot_ly(lambda_cv_df, x = ~lambda, y = ~CV_avg_sen, z = ~CV_avg_prec,
        marker = list(color = ~CV_avg_f1, colorscale = c('#0000FF', '#683531'), showscale = TRUE)) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Lambda'),
                     yaxis = list(title = 'CV Avg Sensitivity'),
                     zaxis = list(title = 'CV Avg Precision')))

lasso_3dplot_param
lambda_cv_df %>% 
        ggplot() +
        geom_line(aes(x = lambda , y = CV_avg_sen), color = "blue") + 
        geom_line(aes(x = lambda , y = CV_avg_prec), color = "red") + 
        geom_line(aes(x = lambda , y = CV_avg_f1), color = "green") + 
        theme_light() + 
        ylab("Sen(blue) Prec(red) F1(green)") + 
        ggtitle("5-fold CV metrics of the lasso by lambda")


opt_lasso <- glmnet(x = matrix_for_lasso,
                      y = response_lasso, family = "binomial" ,
                      alpha = 1, lambda = 0.015 )

 lambda_cv_df %>% 
        filter(lambda == 0.015) 
##   id lambda CV_avg_sen CV_avg_prec CV_avg_f1
## 1 11  0.015  0.6034175   0.8808095 0.7161429


coef(opt_lasso)
## 39 x 1 sparse Matrix of class "dgCMatrix"
##                           s0
## (Intercept)      -1.45009716
## (Intercept)       .         
## TRAIN2            .         
## HAUL_TYPE2       -0.07173010
## HAUL_TYPE3       -0.74243424
## GDS              -0.09827774
## ADULTS            0.16205003
## CHILDREN          .         
## INFANTS           .         
## DISTANCE          .         
## DEVICEOTHER       .         
## DEVICESMARTPHONE  .         
## DEVICETABLET      .         
## TRIP_TYPE2        .         
## TRIP_TYPE3       -0.18343297
## PRODUCT2          .         
## SMS2              .         
## NO_GDS            0.35101086
## dep_wday2         .         
## dep_wday3         .         
## dep_wday4         .         
## dep_wday5         .         
## dep_wday6         .         
## dep_wday7         .         
## arr_wday2         .         
## arr_wday3         .         
## arr_wday4         .         
## arr_wday5         .         
## arr_wday6         .         
## arr_wday7         .         
## staying_duration  .         
## country2          .         
## website2          .         
## website3          .         
## website4          .         
## dep_month2        .         
## arr_month2       -0.17875005
## dep_day2          .         
## arr_day2          .

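The best lambda seems to me to be 0.015: although 0.005 gives the highest CV F1, 0.015 costs little (0.716 vs 0.730) and the stronger penalty yields a sparser model. At this lambda, the model keeps HAUL_TYPE, GDS, ADULTS, TRIP_TYPE, NO_GDS and arr_month, while CHILDREN and DEVICE are shrunk to zero. So, apart from arr_month, none of my extracted features are kept! Note, however, that these F1 values were computed without positive = "True" in confusionMatrix(), so caret treated the first factor level, "False", as the positive class. The ~0.72 F1 therefore measures how well non-buyers are identified and is not comparable with the buyer-class F1 reported for the later models.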
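Because F1 depends on which class is called "positive", it can help to compute it explicitly. The base-R sketch below mirrors the Sensitivity, Precision and F1 entries that caret reports, for a positive level of your choice; f1_for and the ten-customer toy data are hypothetical.

```r
# Sensitivity, precision and F1 for a chosen positive class
f1_for <- function(pred, truth, positive) {
  tp <- sum(pred == positive & truth == positive)
  fp <- sum(pred == positive & truth != positive)
  fn <- sum(pred != positive & truth == positive)
  sen  <- tp / (tp + fn)
  prec <- tp / (tp + fp)
  c(sensitivity = sen, precision = prec, f1 = 2 * prec * sen / (prec + sen))
}

# Toy example: 10 customers, 2 of whom buy extra baggage
truth <- c(rep("False", 8), rep("True", 2))
pred  <- c(rep("False", 6), "True", "True", "True", "False")

f1_for(pred, truth, positive = "True")   # scores buyers
f1_for(pred, truth, positive = "False")  # scores non-buyers
```

The same predictions give F1 = 0.4 when buyers are the positive class and F1 = 0.8 when non-buyers are, which is why the positive level has to be fixed before F1 values are compared across models.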

Let’s evaluate the lasso at the chosen lambda (0.015) with five repetitions of 5-fold CV, using the ROC curve and AUC of the out-of-fold predicted probabilities.
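The repeated-CV loop below collects out-of-fold probabilities by left_join-ing each fold onto a 50,000-row data frame and then summing across columns. A simpler equivalent is to preallocate one vector and fill it at each fold's row indices. In this sketch, oof_predict is a hypothetical helper, and base_rate is a hypothetical stand-in model (it just predicts the training-set positive rate) so the example runs in base R; in the document, the glmnet fit and predict calls would take its place.

```r
# Collect out-of-fold (OOF) predictions by writing each fold's predictions
# into a preallocated vector at that fold's row indices
oof_predict <- function(n, folds, fit_and_predict) {
  oof <- rep(NA_real_, n)
  for (test_idx in folds) {
    train_idx <- setdiff(seq_len(n), test_idx)
    oof[test_idx] <- fit_and_predict(train_idx, test_idx)
  }
  oof
}

# Toy data and a stand-in "model" that predicts the training positive rate
set.seed(42)
y <- rbinom(100, 1, 0.2)
folds <- split(seq_len(100), rep_len(1:5, 100))
base_rate <- function(train_idx, test_idx) rep(mean(y[train_idx]), length(test_idx))

oof <- oof_predict(length(y), folds, base_rate)
summary(oof)
```

Repeating this with five different fold splits and combining the five vectors row-wise with mean or median would give inputs equivalent to the mean and median CV-ROC columns built from final_df.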

final_df <- data.frame(id = 1:50000, prob1 = NA , prob2 = NA , prob3 = NA, prob4 = NA , prob5 = NA  )

for (i in 1:5){
        set.seed(i)
        # Create stratified 5-fold CV splits with caret's createFolds()
        flds <- createFolds(y = imputed_train_data$EXTRA_BAGGAGE, k = 5, list = TRUE, returnTrain = FALSE)
        
        predictions_df <- data.frame(id = 1:50000, prob1 = NA  )
        cv_pred_df <- data.frame(id = NA, prob = NA )
        
        for(k in 1:5){
                # print(paste0(i,k))
                
                best_lasso <- glmnet(x = matrix_for_lasso[-flds[[k]],],
                      y = response_lasso[-flds[[k]]], family = "binomial" ,
                      alpha = 1, lambda = 0.015  )
                 
                 cv_lasso_prediction <-
                         predict(best_lasso,matrix_for_lasso[flds[[k]],],type = "response" )
                

        
                
                cv_df <- data.frame(id = flds[[k]],prob = cv_lasso_prediction)
                

                predictions_df<- 
                predictions_df %>%
                        left_join(y = cv_df , by = "id")
                
        }
        
        final_df[,(i+1)] <- apply(predictions_df[,-c(1,2)], MARGIN = 1 , FUN = sum , na.rm = TRUE)
        
        
}

#CV-ROC Mean
prob_for_roc <- apply(final_df[,-1] , MARGIN = 1 , mean )


ROCRpred = prediction( prob_for_roc , train_data$EXTRA_BAGGAGE )
ROCRperf = performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at= seq(0,1,0.1), text.adj=c(-0.2,1.7), main = "CV-ROC Mean")

# AUC of CV-ROC Mean
ROCRpredTest = prediction(prob_for_roc , train_data$EXTRA_BAGGAGE)
auc = as.numeric(performance(ROCRpredTest, "auc")@y.values)
auc
## [1] 0.6887411
#CV-ROC Median
prob_for_roc <- apply(final_df[,-1] , MARGIN = 1 , median )


ROCRpred = prediction( prob_for_roc , train_data$EXTRA_BAGGAGE )
ROCRperf = performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at= seq(0,1,0.1), text.adj=c(-0.2,1.7), main = "CV-ROC Median")

# AUC of CV-ROC Median
ROCRpredTest = prediction(prob_for_roc , train_data$EXTRA_BAGGAGE)
auc = as.numeric(performance(ROCRpredTest, "auc")@y.values)
auc
## [1] 0.6886319

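The CV AUC is ~0.69. AUC measures how well the model ranks buyers above non-buyers across all thresholds, whereas F1 is computed at a single threshold (0.2 here), so the two numbers are not directly comparable.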
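AUC equals the probability that a randomly chosen buyer gets a higher score than a randomly chosen non-buyer, with ties counting one half. This Mann-Whitney form gives a quick cross-check of ROCR's value. The sketch assumes labels coded 0/1 (the document's EXTRA_BAGGAGE is a False/True factor, so it would need converting first); auc_rank is a hypothetical helper.

```r
# AUC via the Mann-Whitney U statistic; average ranks count ties as 1/2
auc_rank <- function(score, label) {
  pos <- label == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(score)                      # ties get their average rank
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(c(0.9, 0.8, 0.3, 0.1), c(1, 1, 0, 0))  # perfect ranking
auc_rank(c(0.9, 0.2, 0.6, 0.1), c(1, 1, 0, 0))  # 3 of 4 pairs ordered correctly
```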

4.2 Decision Tree

So far we have worked with the logit and the lasso. A very different class of model that can help us gain insight into the data is the decision tree. While I do not expect strong predictive ability from this model, the insight should be very valuable.

flds <- createFolds(y = imputed_train_data$EXTRA_BAGGAGE, k = 5, list = TRUE, returnTrain = FALSE)


decision_tree_cv_df <- data.frame(id= NA, threshold = NA, sen = NA, prec = NA,  f1 = NA)
id = 0 
cv_f1_vec <- vector(length = 5)

for(threshold in seq(0.1,0.9,0.05)){
        id <- id + 1 
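        for (k in 1:5) { 
                
                # Fit the tree on four folds (the fit does not depend on the threshold,
                # it is recomputed here only to keep the loop simple)
                decision_tree_model <- tree(formula = EXTRA_BAGGAGE ~ . ,
                                            data = imputed_train_data[-flds[[k]],-23])
                
                # Class probabilities for the held-out fold; column 2 is P(True)
                decision_tree_cv_pred <- predict(decision_tree_model,
                                                 newdata =imputed_train_data[flds[[k]],-23],
                                                 type = "vector" )
                
                # Predict "True" when P(True) exceeds the threshold; fixing the levels
                # keeps both classes even if every prediction falls on one side
                decision_tree_cv_pred <- factor(ifelse(decision_tree_cv_pred[,2] > threshold, "True", "False"),
                                                levels = c("False","True"))
                
                c_mat <- confusionMatrix(data = decision_tree_cv_pred ,
                                         reference = imputed_train_data[flds[[k]],"EXTRA_BAGGAGE"  ],
                                         positive = "True" )
                
                cv_sen_vec[k] <- c_mat$byClass[1]
                cv_prec_vec[k] <- c_mat$byClass[5]
                cv_f1_vec[k] <-  c_mat$byClass[7]
        }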
        decision_tree_cv_df[id,] <- c(id, threshold,mean(cv_sen_vec),mean(cv_prec_vec),mean(cv_f1_vec))
        
}


decision_tree_cv_df %>% 
        arrange(desc(f1)) %>% 
        head()
##   id threshold       sen      prec        f1
## 1  3      0.20 0.7109908 0.2740228 0.3955720
## 2  1      0.10 0.9695883 0.2334438 0.3762884
## 3  2      0.15 0.9695883 0.2334438 0.3762884
## 4  4      0.25 0.3713645 0.3547647 0.3628274
## 5  5      0.30 0.3713645 0.3547647 0.3628274
## 6  6      0.35 0.3713645 0.3547647 0.3628274

As we can see, the decision tree reaches its best CV F1, ~0.40, at threshold 0.2. This is a low score, but let’s also check the CV-ROC of the decision tree.
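The threshold search above evaluates F1 on a grid of cut-offs. For a fixed vector of predicted probabilities, the same sweep takes a few lines of base R. The sketch treats the class coded 1 ("True") as positive; f1_at and the eight toy scores are hypothetical.

```r
# F1 of the positive class (coded 1) when predicting 1 for prob > threshold
f1_at <- function(prob, y, threshold) {
  pred <- as.integer(prob > threshold)
  tp <- sum(pred == 1 & y == 1)
  if (tp == 0) return(0)                 # avoid 0/0 when nothing is caught
  prec <- tp / sum(pred == 1)
  sen  <- tp / sum(y == 1)
  2 * prec * sen / (prec + sen)
}

prob <- c(0.05, 0.12, 0.22, 0.28, 0.47, 0.63, 0.18, 0.38)
y    <- c(0,    0,    1,    0,    1,    1,    0,    1)

# Same threshold grid as the decision-tree search
thresholds <- seq(0.1, 0.9, by = 0.05)
f1 <- sapply(thresholds, function(t) f1_at(prob, y, t))
thresholds[which.max(f1)]
```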

final_df <- data.frame(id = 1:50000, prob1 = NA , prob2 = NA , prob3 = NA, prob4 = NA , prob5 = NA  )

for (i in 1:5){
        set.seed(i)
        # Create stratified 5-fold CV splits with caret's createFolds()
        flds <- createFolds(y = imputed_train_data$EXTRA_BAGGAGE, k = 5, list = TRUE, returnTrain = FALSE)
        
        predictions_df <- data.frame(id = 1:50000, prob1 = NA  )
        cv_pred_df <- data.frame(id = NA, prob = NA )
        
        for(k in 1:5){
                 # print(paste0(i,k))
                
           
                
                 
                 decision_tree_model <- tree(formula = EXTRA_BAGGAGE ~ . ,
                                            data = imputed_train_data[-flds[[k]],-23])
                
                decision_tree_cv_pred <- predict(decision_tree_model,
                                                 newdata =imputed_train_data[flds[[k]],-23],
                                                 type = "vector" )[,2]

        
                
                cv_df <- data.frame(id = flds[[k]],prob = decision_tree_cv_pred)
                

                predictions_df<- 
                predictions_df %>%
                        left_join(y = cv_df , by = "id")
                
        }
        
        final_df[,(i+1)] <- apply(predictions_df[,-c(1,2)], MARGIN = 1 , FUN = sum , na.rm = TRUE)
        
        
}

#CV-ROC Mean
prob_for_roc <- apply(final_df[,-1] , MARGIN = 1 , mean )


ROCRpred = prediction( prob_for_roc , train_data$EXTRA_BAGGAGE )
ROCRperf = performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at= seq(0,1,0.1), text.adj=c(-0.2,1.7), main = "CV-ROC Mean")

# AUC of CV-ROC Mean
ROCRpredTest = prediction(prob_for_roc , train_data$EXTRA_BAGGAGE)
auc = as.numeric(performance(ROCRpredTest, "auc")@y.values)
auc

#CV-ROC Median
prob_for_roc <- apply(final_df[,-1] , MARGIN = 1 , median )


ROCRpred = prediction( prob_for_roc , train_data$EXTRA_BAGGAGE )
ROCRperf = performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at= seq(0,1,0.1), text.adj=c(-0.2,1.7), main = "CV-ROC Median")

# AUC of CV-ROC Median
ROCRpredTest = prediction(prob_for_roc , train_data$EXTRA_BAGGAGE)
auc = as.numeric(performance(ROCRpredTest, "auc")@y.values)
auc

The AUC is ~0.67, slightly below the lasso’s ~0.69, so I think we can get better results. The tree could also be pruned, but with the random forest on the horizon, let’s not spend more time on the decision tree.

4.3 Random Forest

Random forests usually have better predictive power. First, I tune the RF model over a range of numbers of trees (ntree) and numbers of variables sampled at each split (mtry), using the out-of-bag (OOB) error, with the chunk of code below. The grid search took ages on my machine, so I saved the results rather than rerunning it.

Note: the code below is my second grid-search attempt. In the first, I searched ntree from 50 to 300 and mtry from 2 to 18. I then identified a smaller region where the OOB error is low, so the second attempt uses narrower, adjusted ranges. Because the algorithm runs very slowly on my laptop, I did not do a more thorough grid search.
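The two-stage search described here (a wide, coarse grid first, then a narrower grid around the best cell) can be sketched in base R. The objective fake_oob is a made-up stand-in for the OOB error surface, not real results; in practice each evaluation would be a randomForest() fit. fake_oob and grid_min are hypothetical names.

```r
# Made-up stand-in for the OOB error surface (NOT real results): a smooth
# bowl with its minimum at ntree = 360, mtry = 7
fake_oob <- function(ntree, mtry) {
  0.18 + 1e-7 * (ntree - 360)^2 + 0.002 * (mtry - 7)^2
}

# Evaluate every (ntree, mtry) cell of a grid and return the best row
grid_min <- function(grid) {
  grid$oob <- fake_oob(grid$ntree, grid$mtry)
  grid[which.min(grid$oob), ]
}

# Stage 1: wide, coarse grid
coarse <- grid_min(expand.grid(ntree = seq(50, 300, 50), mtry = 2:18))

# Stage 2: narrower grid around the stage-1 optimum
fine <- grid_min(expand.grid(
  ntree = seq(coarse$ntree - 100, coarse$ntree + 100, 50),
  mtry  = max(2, coarse$mtry - 2):(coarse$mtry + 2)
))
coarse
fine
```

Here the stage-1 optimum lies on the edge of the coarse grid (ntree = 300), so the finer grid deliberately extends beyond it.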

set.seed(7)
rf_oob_df <- data.frame(id = NA, ntree = NA, mtry = NA, avg_oob = NA)
id = 0 

# expand.grid(ntree = seq(50,300,50), mtry = seq(2,18,1))
for(t in seq(200,400,100)) {
          for (n in 4:9) {
        
                
                id <- id + 1 
                # print(id)
                
                rf_model <- randomForest(EXTRA_BAGGAGE ~ . ,
                                         data = imputed_train_data[,-23] ,mtry = n, ntree = t )
        
                
                rf_oob_df[id,] <- c(id,t,n,(rf_model$err.rate[t,1]))
                
        }
        
}
      
# 3dplot 
rf_3dplot_param <- plot_ly(rf_oob_df, x = ~ntree, y = ~mtry, z = ~avg_oob,
        marker = list(color = ~avg_oob, colorscale = c('#0000FF', '#683531'), showscale = TRUE)) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'number of trees'),
                     yaxis = list(title = 'number of variables'),
                     zaxis = list(title = 'Avg OOB error')))

rf_3dplot_param




write_csv(x = rf_oob_df, "randomforest_cv_oob.csv")

According to the 3D plot, ntree = 350 and mtry = 6 is the best combination, based on the minimum OOB error. Let’s build a model with these values.

# Look at variable importance for the chosen ntree and mtry
rf_model <- randomForest(EXTRA_BAGGAGE ~ . ,
                                 data = imputed_train_data[,-23] ,mtry = 6, ntree = 350 )

varImpPlot(rf_model,type=2)

It is striking that DISTANCE and staying_duration are the most influential variables, and three of the top four variables are features that I derived from the raw variables. In this forest, these are the variables that contribute most to separating extra-baggage buyers from non-buyers. (Gini-based importance tends to favour continuous variables with many possible split points, so the ranking should be read with some caution.)
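varImpPlot(type = 2) ranks variables by mean decrease in Gini impurity: the impurity drop of every split on a variable is credited to that variable and averaged over the trees. The per-split quantity can be sketched in base R as below; randomForest's internal scaling may differ. gini, gini_decrease and the toy node are hypothetical.

```r
# Gini impurity of a set of class labels
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Size-weighted impurity decrease when the parent node y is split by `left`
gini_decrease <- function(y, left) {
  n <- length(y)
  gini(y) - (sum(left) / n) * gini(y[left]) - (sum(!left) / n) * gini(y[!left])
}

# Toy node: 4 buyers and 4 non-buyers, split on a hypothetical distance cut-off
y        <- c(1, 1, 1, 0, 1, 0, 0, 0)
distance <- c(900, 1200, 800, 300, 700, 250, 600, 150)
gini(y)                               # 0.5 for a 50/50 node
gini_decrease(y, distance > 500)
```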

Now let’s find a good threshold for the binary response.
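Each point on the ROC curves in this section is a (false-positive rate, true-positive rate) pair obtained by cutting the predicted probabilities at one threshold, and the printed cut-offs label some of those points. The base-R sketch below computes the pairs at the same cut-offs, seq(0, 1, 0.1), for toy data. It predicts 1 when prob >= cutoff; ROCR's own tie convention is not assumed here. roc_points is a hypothetical helper.

```r
# True- and false-positive rates when predicting 1 for prob >= cutoff
roc_points <- function(prob, y, cutoffs) {
  t(sapply(cutoffs, function(c0) {
    pred <- prob >= c0
    c(cutoff = c0,
      tpr = sum(pred & y == 1) / sum(y == 1),
      fpr = sum(pred & y == 0) / sum(y == 0))
  }))
}

prob <- c(0.05, 0.12, 0.22, 0.28, 0.47, 0.63, 0.18, 0.38)
y    <- c(0,    0,    1,    0,    1,    1,    0,    1)
roc_points(prob, y, cutoffs = seq(0, 1, 0.1))
```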

final_df <- data.frame(id = 1:50000, prob1 = NA , prob2 = NA , prob3 = NA, prob4 = NA , prob5 = NA  )

for (i in 1:5){
        set.seed(i)
        # Create stratified 5-fold CV splits with caret's createFolds()
        flds <- createFolds(y = imputed_train_data$EXTRA_BAGGAGE, k = 5, list = TRUE, returnTrain = FALSE)
        
        predictions_df <- data.frame(id = 1:50000, prob1 = NA  )
        cv_pred_df <- data.frame(id = NA, prob = NA )
        
        for(k in 1:5){
                 # print(paste0(i,k))

                
                 
                
                rf_model <- randomForest(EXTRA_BAGGAGE ~ . ,
                                         data = imputed_train_data[-flds[[k]],-23] ,
                                         mtry = 6, ntree = 350 )
                
                rf_prediction <- predict(rf_model,newdata = imputed_train_data[flds[[k]],-23],
                                         type = "prob")[,2]

        
                
                cv_df <- data.frame(id = flds[[k]],prob = rf_prediction)
                

                predictions_df<- 
                predictions_df %>%
                        left_join(y = cv_df , by = "id")
                
        }
        
        final_df[,(i+1)] <- apply(predictions_df[,-c(1,2)], MARGIN = 1 , FUN = sum , na.rm = TRUE)
        
        
}

#CV-ROC Mean
prob_for_roc <- apply(final_df[,-1] , MARGIN = 1 , mean )


ROCRpred = prediction( prob_for_roc , train_data$EXTRA_BAGGAGE )
ROCRperf = performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at= seq(0,1,0.1), text.adj=c(-0.2,1.7), main = "CV-ROC Mean")

# AUC of CV-ROC Mean
ROCRpredTest = prediction(prob_for_roc , train_data$EXTRA_BAGGAGE)
auc = as.numeric(performance(ROCRpredTest, "auc")@y.values)
auc
## [1] 0.7829982
#CV-ROC Median
prob_for_roc <- apply(final_df[,-1] , MARGIN = 1 , median )


ROCRpred = prediction( prob_for_roc , train_data$EXTRA_BAGGAGE )
ROCRperf = performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at= seq(0,1,0.1), text.adj=c(-0.2,1.7), main = "CV-ROC Median")

# AUC of CV-ROC Median
ROCRpredTest = prediction(prob_for_roc , train_data$EXTRA_BAGGAGE)
auc = as.numeric(performance(ROCRpredTest, "auc")@y.values)
auc
## [1] 0.7825461

This is the highest AUC so far, ~ 0.78.

rf_cv_df_f1 <- data.frame(id = NA ,ntree = NA,
                          mtry = NA, threshold = NA , mean_sen = NA,
                          mean_perc = NA, mean_f1 = NA , sd_f1 = NA)

id <- 0 
set.seed(14)
# Create stratified 5-fold CV splits with caret's createFolds()
flds <- createFolds(y = imputed_train_data$EXTRA_BAGGAGE, k = 5, list = TRUE, returnTrain = FALSE)
cv_sen_vec <- vector(length = 5)
cv_prec_vec <- vector(length = 5)
cv_f1_vec <- vector(length = 5)

# Grid over ntree and mtry, with the classification threshold fixed at 0.2
for(t in seq(450,500,50)) {
        for (n in 4:7) {
                
                th <- 0.2 
                id <- id + 1 
                
                # 5-fold cross-validation: fit on four folds, predict the held-out fold
                for (k in 1:5){
                        
                        rf_model <- randomForest(EXTRA_BAGGAGE ~ . ,
                                                 data = imputed_train_data[-flds[[k]],-23] ,
                                                 mtry = n, ntree = t )
                        
                        rf_prediction <- predict(rf_model,
                                                 newdata = imputed_train_data[flds[[k]],-23],
                                                 type = "prob")
                        
                        # Predict "True" when P(True) exceeds the threshold; fixing the levels
                        # keeps both classes even if every prediction falls on one side
                        rf_prediction <- factor(ifelse(rf_prediction[,2] > th, "True", "False"),
                                                levels = c("False","True"))
                        
                        c_mat <- confusionMatrix(data = rf_prediction ,
                                                 reference = imputed_train_data[flds[[k]],"EXTRA_BAGGAGE"],
                                                 positive = "True" )
                        
                        cv_sen_vec[k] <- c_mat$byClass[1]
                        cv_prec_vec[k] <- c_mat$byClass[5]
                        cv_f1_vec[k] <-  c_mat$byClass[7]
                }
                
                # Mean sensitivity, precision and F1 over the folds, plus the SD of F1
                rf_cv_df_f1[id, ] <-
                        c(id,t,n,th,mean(cv_sen_vec),
                          mean(cv_prec_vec),
                          mean(cv_f1_vec),sd(cv_f1_vec))
        }
}
      
# write_csv(x =rf_cv_df_f1 ,  "randomforest_cv_f1.csv")
# 3dplot 
rf_3dplot_param <- plot_ly(rf_cv_df_f1, x = ~ntree, y = ~mtry, z = ~mean_f1,
        marker = list(color = ~mean_f1, colorscale = c('#0000FF', '#683531'), showscale = TRUE)) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'number of trees'),
                     yaxis = list(title = 'number of variables'),
                     zaxis = list(title = 'Avg CV-F1')))

rf_3dplot_param
#So the optimum parameters that I could find are ntree = 400 and mtry = 5 




final_df <- data.frame(id = 1:50000, prob1 = NA , prob2 = NA , prob3 = NA, prob4 = NA , prob5 = NA  )

for (i in 1:5){
        set.seed(i)
        # Create stratified 5-fold CV splits with caret's createFolds()
        flds <- createFolds(y = imputed_train_data$EXTRA_BAGGAGE, k = 5, list = TRUE, returnTrain = FALSE)
        
        predictions_df <- data.frame(id = 1:50000, prob1 = NA  )
        cv_pred_df <- data.frame(id = NA, prob = NA )
        
        for(k in 1:5){
                # print(paste0(i,k))
                
                best_rf <- randomForest(EXTRA_BAGGAGE ~ . ,
                                                 data = imputed_train_data[-flds[[k]],-23] ,
                                                 mtry = 5, ntree = 400 )
                
                rf_prediction <- predict(best_rf,
                                                 newdata = imputed_train_data[flds[[k]],-23],
                                                 type = "prob")[,2]
                
                cv_df <- data.frame(id = flds[[k]],prob = rf_prediction)
                

                predictions_df<- 
                predictions_df %>%
                        left_join(y = cv_df , by = "id")
                
        }
        
        final_df[,(i+1)] <- apply(predictions_df[,-c(1,2)], MARGIN = 1 , FUN = sum , na.rm = TRUE)
        
        
}

#CV-ROC Mean
prob_for_roc <- apply(final_df[,-1] , MARGIN = 1 , mean )


ROCRpred = prediction( prob_for_roc , train_data$EXTRA_BAGGAGE )
ROCRperf = performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at= seq(0,1,0.1), text.adj=c(-0.2,1.7), main = "CV-ROC Mean")

# AUC of CV-ROC Mean
ROCRpredTest = prediction(prob_for_roc , train_data$EXTRA_BAGGAGE)
auc = as.numeric(performance(ROCRpredTest, "auc")@y.values)
auc

#CV-ROC Median
prob_for_roc <- apply(final_df[,-1] , MARGIN = 1 , median )


ROCRpred = prediction( prob_for_roc , train_data$EXTRA_BAGGAGE )
ROCRperf = performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at= seq(0,1,0.1), text.adj=c(-0.2,1.7), main = "CV-ROC Median")

# AUC of CV-ROC Median
ROCRpredTest = prediction(prob_for_roc , train_data$EXTRA_BAGGAGE)
auc = as.numeric(performance(ROCRpredTest, "auc")@y.values)
auc

The CV AUC is ~0.77, slightly lower than with the previous setting (mtry = 6, ntree = 350), so I go with the former.

4.4 Gradient Boosting

Gradient boosting, and specifically its XGBoost implementation, not only has high predictive power but is also fast, thanks to built-in multi-core computation.

So, hopefully, I won’t run into the problem I had with SVM. Below I run two XGBoost variants, one boosting trees and one with linear boosting, and tune nrounds, max_depth and the prediction threshold by cross-validated F1.
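The three nested tuning loops (nrounds × max_depth × threshold) can be flattened into one parameter table, which also makes the number of CV runs explicit. The base-R sketch below uses the same ranges as the code. It also shows a design point: the threshold is applied after prediction, so a model only needs fitting once per (nrounds, max_depth) pair per fold, and each fit can then be scored at every threshold.

```r
# Full tuning grid used in the XGBoost search
grid <- expand.grid(
  nrounds   = 1:4,
  max_depth = 2:15,
  threshold = seq(0.1, 0.4, 0.05)
)
nrow(grid)         # 4 * 14 * 7 settings, each scored with 5-fold CV

# Fitted models depend only on (nrounds, max_depth); thresholds come afterwards
fits_needed <- unique(grid[, c("nrounds", "max_depth")])
nrow(fits_needed)  # distinct fits needed per fold
```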

# Sparse design matrix (factors one-hot encoded) for xgboost;
# "." already excludes the response EXTRA_BAGGAGE
sparse_train_data <- sparse.model.matrix( EXTRA_BAGGAGE ~ . ,
                                          data = imputed_train_data[,-c(23)] )

# Parameters
# max_depth: maximum depth of each tree (tuned below over 2:15)
# nthread = 3: number of CPU threads xgboost uses
# nrounds: number of boosting rounds; each round adds a tree that corrects the current model (tuned over 1:4)

## xgboost needs numeric labels: 1 for "True" (buyer), 0 for "False"
xgboost_labels <- as.integer(imputed_train_data$EXTRA_BAGGAGE == "True")



# CV preparation 
set.seed(13)
# Create stratified 5-fold CV splits with caret's createFolds()
flds <- createFolds(y = imputed_train_data$EXTRA_BAGGAGE, k = 5, list = TRUE, returnTrain = FALSE)
# Per-fold metric vectors, filled inside the CV loop below
cv_sen_vec <- vector(length = 5)
cv_prec_vec <- vector(length = 5)
cv_f1_vec <- vector(length = 5)
id <- 0 
xgb_cv_df <- data.frame(id = NA , nrounds = NA , max_depnth = NA , 
                        threshod = NA , avg_sen = NA , avg_prec = NA,
                        avg_f1 = NA , sd_f1 = NA)

# Tune nrounds
for(n in 1:4){
# Tune max_depth
for(d in 2:15) {
# Tune the classification threshold
for (t in seq(0.1,0.4,0.05)) {
        id <- id + 1 
        
        # 5-fold CV
        for (k in 1:5){
                
                # verbose = 0 suppresses the per-round "train-error" log
                bstSparse <- xgboost(data = sparse_train_data[-flds[[k]],],
                                     label = xgboost_labels[-flds[[k]]],
                                     max_depth = d, eta = 1,
                                     nthread = 3, nrounds = n,
                                     objective = "binary:logistic",
                                     verbose = 0)
                
                # With objective binary:logistic, predict() returns probabilities
                xgb_prediction <- predict(bstSparse, sparse_train_data[flds[[k]],])
                
                # Predict "True" above the threshold; fixing the levels keeps both
                # classes even if every prediction falls on one side
                xgb_prediction <- factor(ifelse(xgb_prediction > t, "True", "False"),
                                         levels = c("False","True"))
                
                c_mat <- confusionMatrix(data = xgb_prediction ,
                                         reference = imputed_train_data$EXTRA_BAGGAGE[flds[[k]]],
                                         positive = "True" )
                
                cv_sen_vec[k] <- c_mat$byClass[1]
                cv_prec_vec[k] <- c_mat$byClass[5]
                cv_f1_vec[k] <-  c_mat$byClass[7]
        }
        xgb_cv_df[id,] <- c(id , n ,  d , t ,
                            mean(cv_sen_vec) , mean(cv_prec_vec), mean(cv_f1_vec) , sd(cv_f1_vec))
        
}# threshold 
}# depth 
}# nrounds
## [1]  train-error:0.177875 
## [1]  train-error:0.176321 
## [1]  train-error:0.179125 
## [1]  train-error:0.178025 
## [1]  train-error:0.177879 
## [1]  train-error:0.177875 
## [1]  train-error:0.176321 
## [1]  train-error:0.179125 
## [1]  train-error:0.178025 
## [1]  train-error:0.177879 
## [1]  train-error:0.177875 
## [1]  train-error:0.176321 
## [1]  train-error:0.179125 
## [1]  train-error:0.178025 
## [1]  train-error:0.177879 
## [1]  train-error:0.177875 
## [1]  train-error:0.169721 
## [1]  train-error:0.172925 
## [1]  train-error:0.170850 
## [1]  train-error:0.171254 
## [1]  train-error:0.171575 
## [1]  train-error:0.169721 
## [1]  train-error:0.172925 
## [1]  train-error:0.170850 
## [1]  train-error:0.171254 
## [1]  train-error:0.171575 
## [1]  train-error:0.169721 
## [1]  train-error:0.172925 
## [1]  train-error:0.170850 
## [1]  train-error:0.171254 
## [1]  train-error:0.171575 
## [1]  train-error:0.169721 
## [1]  train-error:0.172925 
## [1]  train-error:0.170850 
## [1]  train-error:0.171254 
## [1]  train-error:0.171575 
## [1]  train-error:0.169721 
## [1]  train-error:0.172925 
## [1]  train-error:0.170850 
## [1]  train-error:0.171254 
## [1]  train-error:0.171575 
## [1]  train-error:0.169721 
## [1]  train-error:0.172925 
## [1]  train-error:0.170850 
## [1]  train-error:0.171254 
## [1]  train-error:0.171575 
## [1]  train-error:0.169721 
## [1]  train-error:0.172925 
## [1]  train-error:0.170850 
## [1]  train-error:0.171254 
## [1]  train-error:0.171575 
## [1]  train-error:0.162271 
## [1]  train-error:0.167300 
## [1]  train-error:0.163425 
## [1]  train-error:0.164004 
## [1]  train-error:0.165375 
## [1]  train-error:0.162271 
## [1]  train-error:0.167300 
## [1]  train-error:0.163425 
## [1]  train-error:0.164004 
## [1]  train-error:0.165375 
## [1]  train-error:0.162271 
## [1]  train-error:0.167300 
## [1]  train-error:0.163425 
## [1]  train-error:0.164004 
## [1]  train-error:0.165375 
## [1]  train-error:0.162271 
## [1]  train-error:0.167300 
## [1]  train-error:0.163425 
## [1]  train-error:0.164004 
## [1]  train-error:0.165375 
## [1]  train-error:0.162271 
## [1]  train-error:0.167300 
## [1]  train-error:0.163425 
## [1]  train-error:0.164004 
## [1]  train-error:0.165375 
## [1]  train-error:0.162271 
## [1]  train-error:0.167300 
## [1]  train-error:0.163425 
## [1]  train-error:0.164004 
## [1]  train-error:0.165375 
## [1]  train-error:0.162271 
## [1]  train-error:0.167300 
## [1]  train-error:0.163425 
## [1]  train-error:0.164004 
## [1]  train-error:0.165375 
## [1]  train-error:0.154296 
## [1]  train-error:0.160400 
## [1]  train-error:0.156600 
## [1]  train-error:0.157979 
## [1]  train-error:0.158700 
## [1]  train-error:0.154296 
## [1]  train-error:0.160400 
## [1]  train-error:0.156600 
## [1]  train-error:0.157979 
## [1]  train-error:0.158700 
## [1]  train-error:0.154296 
## [1]  train-error:0.160400 
## [1]  train-error:0.156600 
## [1]  train-error:0.157979 
## [1]  train-error:0.158700 
## [1]  train-error:0.154296 
## [1]  train-error:0.160400 
## [1]  train-error:0.156600 
## [1]  train-error:0.157979 
## [1]  train-error:0.158700 
## [1]  train-error:0.154296 
## [1]  train-error:0.160400 
## [1]  train-error:0.156600 
## [1]  train-error:0.157979 
## [1]  train-error:0.158700 
## [1]  train-error:0.154296 
## [1]  train-error:0.160400 
## [1]  train-error:0.156600 
## [1]  train-error:0.157979 
## [1]  train-error:0.158700 
## [1]  train-error:0.154296 
## [1]  train-error:0.160400 
## [1]  train-error:0.156600 
## [1]  train-error:0.157979 
## [1]  train-error:0.158700 
## [1]  train-error:0.148071 
## [1]  train-error:0.153825 
## [1]  train-error:0.148950 
## [1]  train-error:0.151004 
## [1]  train-error:0.151900 
## [1]  train-error:0.148071 
## [1]  train-error:0.153825 
## [1]  train-error:0.148950 
## [1]  train-error:0.151004 
## [1]  train-error:0.151900 
## [1]  train-error:0.148071 
## [1]  train-error:0.153825 
## [1]  train-error:0.148950 
## [1]  train-error:0.151004 
## [1]  train-error:0.151900 
## [1]  train-error:0.148071 
## [1]  train-error:0.153825 
## [1]  train-error:0.148950 
## [1]  train-error:0.151004 
## [1]  train-error:0.151900 
## [1]  train-error:0.148071 
## [1]  train-error:0.153825 
## [1]  train-error:0.148950 
## [1]  train-error:0.151004 
## [1]  train-error:0.151900 
## [1]  train-error:0.148071 
## [1]  train-error:0.153825 
## [1]  train-error:0.148950 
## [1]  train-error:0.151004 
## [1]  train-error:0.151900 
## [1]  train-error:0.148071 
## [1]  train-error:0.153825 
## [1]  train-error:0.148950 
## [1]  train-error:0.151004 
## [1]  train-error:0.151900 
## [1]  train-error:0.142771 
## [1]  train-error:0.147000 
## [1]  train-error:0.143175 
## [1]  train-error:0.145604 
## [1]  train-error:0.145525 
## [1]  train-error:0.142771 
## [1]  train-error:0.147000 
## [1]  train-error:0.143175 
## [1]  train-error:0.145604 
## [1]  train-error:0.145525 
## [1]  train-error:0.142771 
## [1]  train-error:0.147000 
## [1]  train-error:0.143175 
## [1]  train-error:0.145604 
## [1]  train-error:0.145525 
## [1]  train-error:0.142771 
## [1]  train-error:0.147000 
## [1]  train-error:0.143175 
## [1]  train-error:0.145604 
## [1]  train-error:0.145525 
## [1]  train-error:0.142771 
## [1]  train-error:0.147000 
## [1]  train-error:0.143175 
## [1]  train-error:0.145604 
## [1]  train-error:0.145525 
## [1]  train-error:0.142771 
## [1]  train-error:0.147000 
## [1]  train-error:0.143175 
## [1]  train-error:0.145604 
## [1]  train-error:0.145525 
## [1]  train-error:0.142771 
## [1]  train-error:0.147000 
## [1]  train-error:0.143175 
## [1]  train-error:0.145604 
## [1]  train-error:0.145525 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [3]  train-error:0.143579 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [3]  train-error:0.143579 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [3]  train-error:0.143579 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [3]  train-error:0.143579 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [3]  train-error:0.143579 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [3]  train-error:0.143579 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [3]  train-error:0.143579 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [1]  train-error:0.154296 
## [2]  train-error:0.139122 
## [3]  train-error:0.125797 
## [1]  train-error:0.160400 
## [2]  train-error:0.147925 
## [3]  train-error:0.134475 
## [1]  train-error:0.156600 
## [2]  train-error:0.144975 
## [3]  train-error:0.128100 
## [1]  train-error:0.157979 
## [2]  train-error:0.143904 
## [3]  train-error:0.128178 
## [1]  train-error:0.158700 
## [2]  train-error:0.144000 
## [3]  train-error:0.132325 
## [1]  train-error:0.154296 
## [2]  train-error:0.139122 
## [3]  train-error:0.125797 
## [1]  train-error:0.160400 
## [2]  train-error:0.147925 
## [3]  train-error:0.134475 
## [1]  train-error:0.156600 
## [2]  train-error:0.144975 
## [3]  train-error:0.128100 
## [1]  train-error:0.157979 
## [2]  train-error:0.143904 
## [3]  train-error:0.128178 
## [1]  train-error:0.158700 
## [2]  train-error:0.144000 
## [3]  train-error:0.132325 
## [1]  train-error:0.154296 
## [2]  train-error:0.139122 
## [3]  train-error:0.125797 
## [1]  train-error:0.160400 
## [2]  train-error:0.147925 
## [3]  train-error:0.134475 
## [1]  train-error:0.156600 
## [2]  train-error:0.144975 
## [3]  train-error:0.128100 
## [1]  train-error:0.157979 
## [2]  train-error:0.143904 
## [3]  train-error:0.128178 
## [1]  train-error:0.158700 
## [2]  train-error:0.144000 
## [3]  train-error:0.132325 
## [1]  train-error:0.154296 
## [2]  train-error:0.139122 
## [3]  train-error:0.125797 
## [1]  train-error:0.160400 
## [2]  train-error:0.147925 
## [3]  train-error:0.134475 
## [1]  train-error:0.156600 
## [2]  train-error:0.144975 
## [3]  train-error:0.128100 
## [1]  train-error:0.157979 
## [2]  train-error:0.143904 
## [3]  train-error:0.128178 
## [1]  train-error:0.158700 
## [2]  train-error:0.144000 
## [3]  train-error:0.132325 
## [1]  train-error:0.154296 
## [2]  train-error:0.139122 
## [3]  train-error:0.125797 
## [1]  train-error:0.160400 
## [2]  train-error:0.147925 
## [3]  train-error:0.134475 
## [1]  train-error:0.156600 
## [2]  train-error:0.144975 
## [3]  train-error:0.128100 
## [1]  train-error:0.157979 
## [2]  train-error:0.143904 
## [3]  train-error:0.128178 
## [1]  train-error:0.158700 
## [2]  train-error:0.144000 
## [3]  train-error:0.132325 
## [1]  train-error:0.154296 
## [2]  train-error:0.139122 
## [3]  train-error:0.125797 
## [1]  train-error:0.160400 
## [2]  train-error:0.147925 
## [3]  train-error:0.134475 
## [1]  train-error:0.156600 
## [2]  train-error:0.144975 
## [3]  train-error:0.128100 
## [1]  train-error:0.157979 
## [2]  train-error:0.143904 
## [3]  train-error:0.128178 
## [1]  train-error:0.158700 
## [2]  train-error:0.144000 
## [3]  train-error:0.132325 
## [1]  train-error:0.154296 
## [2]  train-error:0.139122 
## [3]  train-error:0.125797 
## [1]  train-error:0.160400 
## [2]  train-error:0.147925 
## [3]  train-error:0.134475 
## [1]  train-error:0.156600 
## [2]  train-error:0.144975 
## [3]  train-error:0.128100 
## [1]  train-error:0.157979 
## [2]  train-error:0.143904 
## [3]  train-error:0.128178 
## [1]  train-error:0.158700 
## [2]  train-error:0.144000 
## [3]  train-error:0.132325 
## [1]  train-error:0.148071 
## [2]  train-error:0.130072 
## [3]  train-error:0.116297 
## [1]  train-error:0.153825 
## [2]  train-error:0.138425 
## [3]  train-error:0.122450 
## [1]  train-error:0.148950 
## [2]  train-error:0.137450 
## [3]  train-error:0.118675 
## [1]  train-error:0.151004 
## [2]  train-error:0.134628 
## [3]  train-error:0.121928 
## [1]  train-error:0.151900 
## [2]  train-error:0.136175 
## [3]  train-error:0.118325 
## [1]  train-error:0.148071 
## [2]  train-error:0.130072 
## [3]  train-error:0.116297 
## [1]  train-error:0.153825 
## [2]  train-error:0.138425 
## [3]  train-error:0.122450 
## [1]  train-error:0.148950 
## [2]  train-error:0.137450 
## [3]  train-error:0.118675 
## [1]  train-error:0.151004 
## [2]  train-error:0.134628 
## [3]  train-error:0.121928 
## [1]  train-error:0.151900 
## [2]  train-error:0.136175 
## [3]  train-error:0.118325 
## [1]  train-error:0.148071 
## [2]  train-error:0.130072 
## [3]  train-error:0.116297 
## [1]  train-error:0.153825 
## [2]  train-error:0.138425 
## [3]  train-error:0.122450 
## [1]  train-error:0.148950 
## [2]  train-error:0.137450 
## [3]  train-error:0.118675 
## [1]  train-error:0.151004 
## [2]  train-error:0.134628 
## [3]  train-error:0.121928 
## [1]  train-error:0.151900 
## [2]  train-error:0.136175 
## [3]  train-error:0.118325 
## [1]  train-error:0.148071 
## [2]  train-error:0.130072 
## [3]  train-error:0.116297 
## [1]  train-error:0.153825 
## [2]  train-error:0.138425 
## [3]  train-error:0.122450 
## [1]  train-error:0.148950 
## [2]  train-error:0.137450 
## [3]  train-error:0.118675 
## [1]  train-error:0.151004 
## [2]  train-error:0.134628 
## [3]  train-error:0.121928 
## [1]  train-error:0.151900 
## [2]  train-error:0.136175 
## [3]  train-error:0.118325 
## [1]  train-error:0.148071 
## [2]  train-error:0.130072 
## [3]  train-error:0.116297 
## [1]  train-error:0.153825 
## [2]  train-error:0.138425 
## [3]  train-error:0.122450 
## [1]  train-error:0.148950 
## [2]  train-error:0.137450 
## [3]  train-error:0.118675 
## [1]  train-error:0.151004 
## [2]  train-error:0.134628 
## [3]  train-error:0.121928 
## [1]  train-error:0.151900 
## [2]  train-error:0.136175 
## [3]  train-error:0.118325 
## [1]  train-error:0.148071 
## [2]  train-error:0.130072 
## [3]  train-error:0.116297 
## [1]  train-error:0.153825 
## [2]  train-error:0.138425 
## [3]  train-error:0.122450 
## [1]  train-error:0.148950 
## [2]  train-error:0.137450 
## [3]  train-error:0.118675 
## [1]  train-error:0.151004 
## [2]  train-error:0.134628 
## [3]  train-error:0.121928 
## [1]  train-error:0.151900 
## [2]  train-error:0.136175 
## [3]  train-error:0.118325 
## [1]  train-error:0.148071 
## [2]  train-error:0.130072 
## [3]  train-error:0.116297 
## [1]  train-error:0.153825 
## [2]  train-error:0.138425 
## [3]  train-error:0.122450 
## [1]  train-error:0.148950 
## [2]  train-error:0.137450 
## [3]  train-error:0.118675 
## [1]  train-error:0.151004 
## [2]  train-error:0.134628 
## [3]  train-error:0.121928 
## [1]  train-error:0.151900 
## [2]  train-error:0.136175 
## [3]  train-error:0.118325 
## [1]  train-error:0.142771 
## [2]  train-error:0.121797 
## [3]  train-error:0.101772 
## [1]  train-error:0.147000 
## [2]  train-error:0.129475 
## [3]  train-error:0.112050 
## [1]  train-error:0.143175 
## [2]  train-error:0.128700 
## [3]  train-error:0.113650 
## [1]  train-error:0.145604 
## [2]  train-error:0.128078 
## [3]  train-error:0.118003 
## [1]  train-error:0.145525 
## [2]  train-error:0.125150 
## [3]  train-error:0.108700 
## [1]  train-error:0.142771 
## [2]  train-error:0.121797 
## [3]  train-error:0.101772 
## [1]  train-error:0.147000 
## [2]  train-error:0.129475 
## [3]  train-error:0.112050 
## [1]  train-error:0.143175 
## [2]  train-error:0.128700 
## [3]  train-error:0.113650 
## [1]  train-error:0.145604 
## [2]  train-error:0.128078 
## [3]  train-error:0.118003 
## [1]  train-error:0.145525 
## [2]  train-error:0.125150 
## [3]  train-error:0.108700 
## [1]  train-error:0.142771 
## [2]  train-error:0.121797 
## [3]  train-error:0.101772 
## [1]  train-error:0.147000 
## [2]  train-error:0.129475 
## [3]  train-error:0.112050 
## [1]  train-error:0.143175 
## [2]  train-error:0.128700 
## [3]  train-error:0.113650 
## [1]  train-error:0.145604 
## [2]  train-error:0.128078 
## [3]  train-error:0.118003 
## [1]  train-error:0.145525 
## [2]  train-error:0.125150 
## [3]  train-error:0.108700 
## [1]  train-error:0.142771 
## [2]  train-error:0.121797 
## [3]  train-error:0.101772 
## [1]  train-error:0.147000 
## [2]  train-error:0.129475 
## [3]  train-error:0.112050 
## [1]  train-error:0.143175 
## [2]  train-error:0.128700 
## [3]  train-error:0.113650 
## [1]  train-error:0.145604 
## [2]  train-error:0.128078 
## [3]  train-error:0.118003 
## [1]  train-error:0.145525 
## [2]  train-error:0.125150 
## [3]  train-error:0.108700 
## [1]  train-error:0.142771 
## [2]  train-error:0.121797 
## [3]  train-error:0.101772 
## [1]  train-error:0.147000 
## [2]  train-error:0.129475 
## [3]  train-error:0.112050 
## [1]  train-error:0.143175 
## [2]  train-error:0.128700 
## [3]  train-error:0.113650 
## [1]  train-error:0.145604 
## [2]  train-error:0.128078 
## [3]  train-error:0.118003 
## [1]  train-error:0.145525 
## [2]  train-error:0.125150 
## [3]  train-error:0.108700 
## [1]  train-error:0.142771 
## [2]  train-error:0.121797 
## [3]  train-error:0.101772 
## [1]  train-error:0.147000 
## [2]  train-error:0.129475 
## [3]  train-error:0.112050 
## [1]  train-error:0.143175 
## [2]  train-error:0.128700 
## [3]  train-error:0.113650 
## [1]  train-error:0.145604 
## [2]  train-error:0.128078 
## [3]  train-error:0.118003 
## [1]  train-error:0.145525 
## [2]  train-error:0.125150 
## [3]  train-error:0.108700 
## [1]  train-error:0.142771 
## [2]  train-error:0.121797 
## [3]  train-error:0.101772 
## [1]  train-error:0.147000 
## [2]  train-error:0.129475 
## [3]  train-error:0.112050 
## [1]  train-error:0.143175 
## [2]  train-error:0.128700 
## [3]  train-error:0.113650 
## [1]  train-error:0.145604 
## [2]  train-error:0.128078 
## [3]  train-error:0.118003 
## [1]  train-error:0.145525 
## [2]  train-error:0.125150 
## [3]  train-error:0.108700 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [3]  train-error:0.195995 
## [4]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.196225 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [3]  train-error:0.195980 
## [4]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [3]  train-error:0.195995 
## [4]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.196225 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [3]  train-error:0.195980 
## [4]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [3]  train-error:0.195995 
## [4]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.196225 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [3]  train-error:0.195980 
## [4]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [3]  train-error:0.195995 
## [4]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.196225 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [3]  train-error:0.195980 
## [4]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [3]  train-error:0.195995 
## [4]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.196225 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [3]  train-error:0.195980 
## [4]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [3]  train-error:0.195995 
## [4]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.196225 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [3]  train-error:0.195980 
## [4]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.195995 
## [3]  train-error:0.195995 
## [4]  train-error:0.195995 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.196225 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193500 
## [1]  train-error:0.195980 
## [2]  train-error:0.195980 
## [3]  train-error:0.195980 
## [4]  train-error:0.195980 
## [1]  train-error:0.195975 
## [2]  train-error:0.195975 
## [3]  train-error:0.195975 
## [4]  train-error:0.193400 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [3]  train-error:0.191795 
## [4]  train-error:0.192845 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [3]  train-error:0.191975 
## [4]  train-error:0.190925 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [3]  train-error:0.190925 
## [4]  train-error:0.190850 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [3]  train-error:0.193280 
## [4]  train-error:0.192755 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [3]  train-error:0.193400 
## [4]  train-error:0.192350 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [3]  train-error:0.191795 
## [4]  train-error:0.192845 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [3]  train-error:0.191975 
## [4]  train-error:0.190925 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [3]  train-error:0.190925 
## [4]  train-error:0.190850 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [3]  train-error:0.193280 
## [4]  train-error:0.192755 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [3]  train-error:0.193400 
## [4]  train-error:0.192350 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [3]  train-error:0.191795 
## [4]  train-error:0.192845 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [3]  train-error:0.191975 
## [4]  train-error:0.190925 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [3]  train-error:0.190925 
## [4]  train-error:0.190850 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [3]  train-error:0.193280 
## [4]  train-error:0.192755 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [3]  train-error:0.193400 
## [4]  train-error:0.192350 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [3]  train-error:0.191795 
## [4]  train-error:0.192845 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [3]  train-error:0.191975 
## [4]  train-error:0.190925 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [3]  train-error:0.190925 
## [4]  train-error:0.190850 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [3]  train-error:0.193280 
## [4]  train-error:0.192755 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [3]  train-error:0.193400 
## [4]  train-error:0.192350 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [3]  train-error:0.191795 
## [4]  train-error:0.192845 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [3]  train-error:0.191975 
## [4]  train-error:0.190925 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [3]  train-error:0.190925 
## [4]  train-error:0.190850 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [3]  train-error:0.193280 
## [4]  train-error:0.192755 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [3]  train-error:0.193400 
## [4]  train-error:0.192350 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [3]  train-error:0.191795 
## [4]  train-error:0.192845 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [3]  train-error:0.191975 
## [4]  train-error:0.190925 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [3]  train-error:0.190925 
## [4]  train-error:0.190850 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [3]  train-error:0.193280 
## [4]  train-error:0.192755 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [3]  train-error:0.193400 
## [4]  train-error:0.192350 
## [1]  train-error:0.195995 
## [2]  train-error:0.193970 
## [3]  train-error:0.191795 
## [4]  train-error:0.192845 
## [1]  train-error:0.195975 
## [2]  train-error:0.195525 
## [3]  train-error:0.191975 
## [4]  train-error:0.190925 
## [1]  train-error:0.195975 
## [2]  train-error:0.193500 
## [3]  train-error:0.190925 
## [4]  train-error:0.190850 
## [1]  train-error:0.195980 
## [2]  train-error:0.193280 
## [3]  train-error:0.193280 
## [4]  train-error:0.192755 
## [1]  train-error:0.195975 
## [2]  train-error:0.193400 
## [3]  train-error:0.193400 
## [4]  train-error:0.192350 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [3]  train-error:0.193320 
## [4]  train-error:0.190495 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [3]  train-error:0.190650 
## [4]  train-error:0.190175 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [3]  train-error:0.190025 
## [4]  train-error:0.190150 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [3]  train-error:0.190605 
## [4]  train-error:0.191105 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [3]  train-error:0.192800 
## [4]  train-error:0.190700 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [3]  train-error:0.193320 
## [4]  train-error:0.190495 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [3]  train-error:0.190650 
## [4]  train-error:0.190175 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [3]  train-error:0.190025 
## [4]  train-error:0.190150 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [3]  train-error:0.190605 
## [4]  train-error:0.191105 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [3]  train-error:0.192800 
## [4]  train-error:0.190700 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [3]  train-error:0.193320 
## [4]  train-error:0.190495 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [3]  train-error:0.190650 
## [4]  train-error:0.190175 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [3]  train-error:0.190025 
## [4]  train-error:0.190150 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [3]  train-error:0.190605 
## [4]  train-error:0.191105 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [3]  train-error:0.192800 
## [4]  train-error:0.190700 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [3]  train-error:0.193320 
## [4]  train-error:0.190495 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [3]  train-error:0.190650 
## [4]  train-error:0.190175 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [3]  train-error:0.190025 
## [4]  train-error:0.190150 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [3]  train-error:0.190605 
## [4]  train-error:0.191105 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [3]  train-error:0.192800 
## [4]  train-error:0.190700 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [3]  train-error:0.193320 
## [4]  train-error:0.190495 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [3]  train-error:0.190650 
## [4]  train-error:0.190175 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [3]  train-error:0.190025 
## [4]  train-error:0.190150 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [3]  train-error:0.190605 
## [4]  train-error:0.191105 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [3]  train-error:0.192800 
## [4]  train-error:0.190700 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [3]  train-error:0.193320 
## [4]  train-error:0.190495 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [3]  train-error:0.190650 
## [4]  train-error:0.190175 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [3]  train-error:0.190025 
## [4]  train-error:0.190150 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [3]  train-error:0.190605 
## [4]  train-error:0.191105 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [3]  train-error:0.192800 
## [4]  train-error:0.190700 
## [1]  train-error:0.195620 
## [2]  train-error:0.193595 
## [3]  train-error:0.193320 
## [4]  train-error:0.190495 
## [1]  train-error:0.195300 
## [2]  train-error:0.193000 
## [3]  train-error:0.190650 
## [4]  train-error:0.190175 
## [1]  train-error:0.195650 
## [2]  train-error:0.193175 
## [3]  train-error:0.190025 
## [4]  train-error:0.190150 
## [1]  train-error:0.195605 
## [2]  train-error:0.193330 
## [3]  train-error:0.190605 
## [4]  train-error:0.191105 
## [1]  train-error:0.195475 
## [2]  train-error:0.195650 
## [3]  train-error:0.192800 
## [4]  train-error:0.190700 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [3]  train-error:0.192795 
## [4]  train-error:0.190595 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [3]  train-error:0.189325 
## [4]  train-error:0.188100 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [3]  train-error:0.190300 
## [4]  train-error:0.188500 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [3]  train-error:0.189930 
## [4]  train-error:0.189730 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [3]  train-error:0.190875 
## [4]  train-error:0.189250 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [3]  train-error:0.192795 
## [4]  train-error:0.190595 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [3]  train-error:0.189325 
## [4]  train-error:0.188100 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [3]  train-error:0.190300 
## [4]  train-error:0.188500 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [3]  train-error:0.189930 
## [4]  train-error:0.189730 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [3]  train-error:0.190875 
## [4]  train-error:0.189250 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [3]  train-error:0.192795 
## [4]  train-error:0.190595 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [3]  train-error:0.189325 
## [4]  train-error:0.188100 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [3]  train-error:0.190300 
## [4]  train-error:0.188500 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [3]  train-error:0.189930 
## [4]  train-error:0.189730 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [3]  train-error:0.190875 
## [4]  train-error:0.189250 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [3]  train-error:0.192795 
## [4]  train-error:0.190595 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [3]  train-error:0.189325 
## [4]  train-error:0.188100 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [3]  train-error:0.190300 
## [4]  train-error:0.188500 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [3]  train-error:0.189930 
## [4]  train-error:0.189730 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [3]  train-error:0.190875 
## [4]  train-error:0.189250 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [3]  train-error:0.192795 
## [4]  train-error:0.190595 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [3]  train-error:0.189325 
## [4]  train-error:0.188100 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [3]  train-error:0.190300 
## [4]  train-error:0.188500 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [3]  train-error:0.189930 
## [4]  train-error:0.189730 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [3]  train-error:0.190875 
## [4]  train-error:0.189250 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [3]  train-error:0.192795 
## [4]  train-error:0.190595 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [3]  train-error:0.189325 
## [4]  train-error:0.188100 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [3]  train-error:0.190300 
## [4]  train-error:0.188500 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [3]  train-error:0.189930 
## [4]  train-error:0.189730 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [3]  train-error:0.190875 
## [4]  train-error:0.189250 
## [1]  train-error:0.194820 
## [2]  train-error:0.192245 
## [3]  train-error:0.192795 
## [4]  train-error:0.190595 
## [1]  train-error:0.194575 
## [2]  train-error:0.191875 
## [3]  train-error:0.189325 
## [4]  train-error:0.188100 
## [1]  train-error:0.194200 
## [2]  train-error:0.192000 
## [3]  train-error:0.190300 
## [4]  train-error:0.188500 
## [1]  train-error:0.192480 
## [2]  train-error:0.191930 
## [3]  train-error:0.189930 
## [4]  train-error:0.189730 
## [1]  train-error:0.192425 
## [2]  train-error:0.190700 
## [3]  train-error:0.190875 
## [4]  train-error:0.189250 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [3]  train-error:0.187770 
## [4]  train-error:0.185820 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [3]  train-error:0.188850 
## [4]  train-error:0.187300 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [3]  train-error:0.187925 
## [4]  train-error:0.185175 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [3]  train-error:0.188505 
## [4]  train-error:0.187080 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [3]  train-error:0.187425 
## [4]  train-error:0.185850 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [3]  train-error:0.187770 
## [4]  train-error:0.185820 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [3]  train-error:0.188850 
## [4]  train-error:0.187300 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [3]  train-error:0.187925 
## [4]  train-error:0.185175 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [3]  train-error:0.188505 
## [4]  train-error:0.187080 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [3]  train-error:0.187425 
## [4]  train-error:0.185850 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [3]  train-error:0.187770 
## [4]  train-error:0.185820 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [3]  train-error:0.188850 
## [4]  train-error:0.187300 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [3]  train-error:0.187925 
## [4]  train-error:0.185175 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [3]  train-error:0.188505 
## [4]  train-error:0.187080 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [3]  train-error:0.187425 
## [4]  train-error:0.185850 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [3]  train-error:0.187770 
## [4]  train-error:0.185820 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [3]  train-error:0.188850 
## [4]  train-error:0.187300 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [3]  train-error:0.187925 
## [4]  train-error:0.185175 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [3]  train-error:0.188505 
## [4]  train-error:0.187080 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [3]  train-error:0.187425 
## [4]  train-error:0.185850 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [3]  train-error:0.187770 
## [4]  train-error:0.185820 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [3]  train-error:0.188850 
## [4]  train-error:0.187300 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [3]  train-error:0.187925 
## [4]  train-error:0.185175 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [3]  train-error:0.188505 
## [4]  train-error:0.187080 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [3]  train-error:0.187425 
## [4]  train-error:0.185850 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [3]  train-error:0.187770 
## [4]  train-error:0.185820 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [3]  train-error:0.188850 
## [4]  train-error:0.187300 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [3]  train-error:0.187925 
## [4]  train-error:0.185175 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [3]  train-error:0.188505 
## [4]  train-error:0.187080 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [3]  train-error:0.187425 
## [4]  train-error:0.185850 
## [1]  train-error:0.191820 
## [2]  train-error:0.190620 
## [3]  train-error:0.187770 
## [4]  train-error:0.185820 
## [1]  train-error:0.193675 
## [2]  train-error:0.190400 
## [3]  train-error:0.188850 
## [4]  train-error:0.187300 
## [1]  train-error:0.193850 
## [2]  train-error:0.190450 
## [3]  train-error:0.187925 
## [4]  train-error:0.185175 
## [1]  train-error:0.190405 
## [2]  train-error:0.189730 
## [3]  train-error:0.188505 
## [4]  train-error:0.187080 
## [1]  train-error:0.190225 
## [2]  train-error:0.188700 
## [3]  train-error:0.187425 
## [4]  train-error:0.185850 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [3]  train-error:0.185695 
## [4]  train-error:0.181020 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [3]  train-error:0.184925 
## [4]  train-error:0.182700 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [3]  train-error:0.185025 
## [4]  train-error:0.181425 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [3]  train-error:0.184480 
## [4]  train-error:0.181730 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [3]  train-error:0.185225 
## [4]  train-error:0.182150 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [3]  train-error:0.185695 
## [4]  train-error:0.181020 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [3]  train-error:0.184925 
## [4]  train-error:0.182700 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [3]  train-error:0.185025 
## [4]  train-error:0.181425 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [3]  train-error:0.184480 
## [4]  train-error:0.181730 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [3]  train-error:0.185225 
## [4]  train-error:0.182150 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [3]  train-error:0.185695 
## [4]  train-error:0.181020 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [3]  train-error:0.184925 
## [4]  train-error:0.182700 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [3]  train-error:0.185025 
## [4]  train-error:0.181425 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [3]  train-error:0.184480 
## [4]  train-error:0.181730 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [3]  train-error:0.185225 
## [4]  train-error:0.182150 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [3]  train-error:0.185695 
## [4]  train-error:0.181020 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [3]  train-error:0.184925 
## [4]  train-error:0.182700 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [3]  train-error:0.185025 
## [4]  train-error:0.181425 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [3]  train-error:0.184480 
## [4]  train-error:0.181730 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [3]  train-error:0.185225 
## [4]  train-error:0.182150 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [3]  train-error:0.185695 
## [4]  train-error:0.181020 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [3]  train-error:0.184925 
## [4]  train-error:0.182700 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [3]  train-error:0.185025 
## [4]  train-error:0.181425 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [3]  train-error:0.184480 
## [4]  train-error:0.181730 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [3]  train-error:0.185225 
## [4]  train-error:0.182150 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [3]  train-error:0.185695 
## [4]  train-error:0.181020 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [3]  train-error:0.184925 
## [4]  train-error:0.182700 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [3]  train-error:0.185025 
## [4]  train-error:0.181425 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [3]  train-error:0.184480 
## [4]  train-error:0.181730 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [3]  train-error:0.185225 
## [4]  train-error:0.182150 
## [1]  train-error:0.190295 
## [2]  train-error:0.187045 
## [3]  train-error:0.185695 
## [4]  train-error:0.181020 
## [1]  train-error:0.190775 
## [2]  train-error:0.187275 
## [3]  train-error:0.184925 
## [4]  train-error:0.182700 
## [1]  train-error:0.189900 
## [2]  train-error:0.187050 
## [3]  train-error:0.185025 
## [4]  train-error:0.181425 
## [1]  train-error:0.189205 
## [2]  train-error:0.186605 
## [3]  train-error:0.184480 
## [4]  train-error:0.181730 
## [1]  train-error:0.189000 
## [2]  train-error:0.187425 
## [3]  train-error:0.185225 
## [4]  train-error:0.182150 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [3]  train-error:0.180845 
## [4]  train-error:0.176321 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [3]  train-error:0.179375 
## [4]  train-error:0.175275 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [3]  train-error:0.177975 
## [4]  train-error:0.173700 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [3]  train-error:0.179629 
## [4]  train-error:0.176829 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [3]  train-error:0.178850 
## [4]  train-error:0.176250 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [3]  train-error:0.180845 
## [4]  train-error:0.176321 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [3]  train-error:0.179375 
## [4]  train-error:0.175275 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [3]  train-error:0.177975 
## [4]  train-error:0.173700 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [3]  train-error:0.179629 
## [4]  train-error:0.176829 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [3]  train-error:0.178850 
## [4]  train-error:0.176250 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [3]  train-error:0.180845 
## [4]  train-error:0.176321 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [3]  train-error:0.179375 
## [4]  train-error:0.175275 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [3]  train-error:0.177975 
## [4]  train-error:0.173700 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [3]  train-error:0.179629 
## [4]  train-error:0.176829 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [3]  train-error:0.178850 
## [4]  train-error:0.176250 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [3]  train-error:0.180845 
## [4]  train-error:0.176321 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [3]  train-error:0.179375 
## [4]  train-error:0.175275 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [3]  train-error:0.177975 
## [4]  train-error:0.173700 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [3]  train-error:0.179629 
## [4]  train-error:0.176829 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [3]  train-error:0.178850 
## [4]  train-error:0.176250 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [3]  train-error:0.180845 
## [4]  train-error:0.176321 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [3]  train-error:0.179375 
## [4]  train-error:0.175275 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [3]  train-error:0.177975 
## [4]  train-error:0.173700 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [3]  train-error:0.179629 
## [4]  train-error:0.176829 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [3]  train-error:0.178850 
## [4]  train-error:0.176250 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [3]  train-error:0.180845 
## [4]  train-error:0.176321 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [3]  train-error:0.179375 
## [4]  train-error:0.175275 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [3]  train-error:0.177975 
## [4]  train-error:0.173700 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [3]  train-error:0.179629 
## [4]  train-error:0.176829 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [3]  train-error:0.178850 
## [4]  train-error:0.176250 
## [1]  train-error:0.187745 
## [2]  train-error:0.184270 
## [3]  train-error:0.180845 
## [4]  train-error:0.176321 
## [1]  train-error:0.187575 
## [2]  train-error:0.183950 
## [3]  train-error:0.179375 
## [4]  train-error:0.175275 
## [1]  train-error:0.187700 
## [2]  train-error:0.182625 
## [3]  train-error:0.177975 
## [4]  train-error:0.173700 
## [1]  train-error:0.187280 
## [2]  train-error:0.184355 
## [3]  train-error:0.179629 
## [4]  train-error:0.176829 
## [1]  train-error:0.186150 
## [2]  train-error:0.183750 
## [3]  train-error:0.178850 
## [4]  train-error:0.176250 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [3]  train-error:0.172871 
## [4]  train-error:0.167921 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [3]  train-error:0.172850 
## [4]  train-error:0.168625 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [3]  train-error:0.170275 
## [4]  train-error:0.165475 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [3]  train-error:0.171329 
## [4]  train-error:0.167504 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [3]  train-error:0.170475 
## [4]  train-error:0.165000 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [3]  train-error:0.172871 
## [4]  train-error:0.167921 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [3]  train-error:0.172850 
## [4]  train-error:0.168625 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [3]  train-error:0.170275 
## [4]  train-error:0.165475 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [3]  train-error:0.171329 
## [4]  train-error:0.167504 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [3]  train-error:0.170475 
## [4]  train-error:0.165000 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [3]  train-error:0.172871 
## [4]  train-error:0.167921 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [3]  train-error:0.172850 
## [4]  train-error:0.168625 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [3]  train-error:0.170275 
## [4]  train-error:0.165475 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [3]  train-error:0.171329 
## [4]  train-error:0.167504 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [3]  train-error:0.170475 
## [4]  train-error:0.165000 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [3]  train-error:0.172871 
## [4]  train-error:0.167921 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [3]  train-error:0.172850 
## [4]  train-error:0.168625 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [3]  train-error:0.170275 
## [4]  train-error:0.165475 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [3]  train-error:0.171329 
## [4]  train-error:0.167504 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [3]  train-error:0.170475 
## [4]  train-error:0.165000 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [3]  train-error:0.172871 
## [4]  train-error:0.167921 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [3]  train-error:0.172850 
## [4]  train-error:0.168625 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [3]  train-error:0.170275 
## [4]  train-error:0.165475 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [3]  train-error:0.171329 
## [4]  train-error:0.167504 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [3]  train-error:0.170475 
## [4]  train-error:0.165000 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [3]  train-error:0.172871 
## [4]  train-error:0.167921 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [3]  train-error:0.172850 
## [4]  train-error:0.168625 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [3]  train-error:0.170275 
## [4]  train-error:0.165475 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [3]  train-error:0.171329 
## [4]  train-error:0.167504 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [3]  train-error:0.170475 
## [4]  train-error:0.165000 
## [1]  train-error:0.183120 
## [2]  train-error:0.178046 
## [3]  train-error:0.172871 
## [4]  train-error:0.167921 
## [1]  train-error:0.183950 
## [2]  train-error:0.178700 
## [3]  train-error:0.172850 
## [4]  train-error:0.168625 
## [1]  train-error:0.184150 
## [2]  train-error:0.177600 
## [3]  train-error:0.170275 
## [4]  train-error:0.165475 
## [1]  train-error:0.183905 
## [2]  train-error:0.177204 
## [3]  train-error:0.171329 
## [4]  train-error:0.167504 
## [1]  train-error:0.182850 
## [2]  train-error:0.177975 
## [3]  train-error:0.170475 
## [4]  train-error:0.165000 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [3]  train-error:0.163196 
## [4]  train-error:0.156721 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [3]  train-error:0.164950 
## [4]  train-error:0.159600 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [3]  train-error:0.164375 
## [4]  train-error:0.158525 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [3]  train-error:0.166879 
## [4]  train-error:0.161704 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [3]  train-error:0.162000 
## [4]  train-error:0.155850 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [3]  train-error:0.163196 
## [4]  train-error:0.156721 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [3]  train-error:0.164950 
## [4]  train-error:0.159600 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [3]  train-error:0.164375 
## [4]  train-error:0.158525 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [3]  train-error:0.166879 
## [4]  train-error:0.161704 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [3]  train-error:0.162000 
## [4]  train-error:0.155850 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [3]  train-error:0.163196 
## [4]  train-error:0.156721 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [3]  train-error:0.164950 
## [4]  train-error:0.159600 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [3]  train-error:0.164375 
## [4]  train-error:0.158525 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [3]  train-error:0.166879 
## [4]  train-error:0.161704 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [3]  train-error:0.162000 
## [4]  train-error:0.155850 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [3]  train-error:0.163196 
## [4]  train-error:0.156721 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [3]  train-error:0.164950 
## [4]  train-error:0.159600 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [3]  train-error:0.164375 
## [4]  train-error:0.158525 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [3]  train-error:0.166879 
## [4]  train-error:0.161704 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [3]  train-error:0.162000 
## [4]  train-error:0.155850 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [3]  train-error:0.163196 
## [4]  train-error:0.156721 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [3]  train-error:0.164950 
## [4]  train-error:0.159600 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [3]  train-error:0.164375 
## [4]  train-error:0.158525 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [3]  train-error:0.166879 
## [4]  train-error:0.161704 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [3]  train-error:0.162000 
## [4]  train-error:0.155850 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [3]  train-error:0.163196 
## [4]  train-error:0.156721 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [3]  train-error:0.164950 
## [4]  train-error:0.159600 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [3]  train-error:0.164375 
## [4]  train-error:0.158525 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [3]  train-error:0.166879 
## [4]  train-error:0.161704 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [3]  train-error:0.162000 
## [4]  train-error:0.155850 
## [1]  train-error:0.176321 
## [2]  train-error:0.169621 
## [3]  train-error:0.163196 
## [4]  train-error:0.156721 
## [1]  train-error:0.179125 
## [2]  train-error:0.172325 
## [3]  train-error:0.164950 
## [4]  train-error:0.159600 
## [1]  train-error:0.178025 
## [2]  train-error:0.171475 
## [3]  train-error:0.164375 
## [4]  train-error:0.158525 
## [1]  train-error:0.177879 
## [2]  train-error:0.171654 
## [3]  train-error:0.166879 
## [4]  train-error:0.161704 
## [1]  train-error:0.177875 
## [2]  train-error:0.169950 
## [3]  train-error:0.162000 
## [4]  train-error:0.155850 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [4]  train-error:0.143621 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [4]  train-error:0.146975 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [4]  train-error:0.143850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [4]  train-error:0.146879 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [4]  train-error:0.147675 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [4]  train-error:0.143621 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [4]  train-error:0.146975 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [4]  train-error:0.143850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [4]  train-error:0.146879 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [4]  train-error:0.147675 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [4]  train-error:0.143621 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [4]  train-error:0.146975 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [4]  train-error:0.143850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [4]  train-error:0.146879 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [4]  train-error:0.147675 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [4]  train-error:0.143621 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [4]  train-error:0.146975 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [4]  train-error:0.143850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [4]  train-error:0.146879 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [4]  train-error:0.147675 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [4]  train-error:0.143621 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [4]  train-error:0.146975 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [4]  train-error:0.143850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [4]  train-error:0.146879 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [4]  train-error:0.147675 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [4]  train-error:0.143621 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [4]  train-error:0.146975 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [4]  train-error:0.143850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [4]  train-error:0.146879 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [4]  train-error:0.147675 
## [1]  train-error:0.169721 
## [2]  train-error:0.161096 
## [3]  train-error:0.151446 
## [4]  train-error:0.143621 
## [1]  train-error:0.172925 
## [2]  train-error:0.165575 
## [3]  train-error:0.154000 
## [4]  train-error:0.146975 
## [1]  train-error:0.170850 
## [2]  train-error:0.163100 
## [3]  train-error:0.151850 
## [4]  train-error:0.143850 
## [1]  train-error:0.171254 
## [2]  train-error:0.162204 
## [3]  train-error:0.151729 
## [4]  train-error:0.146879 
## [1]  train-error:0.171575 
## [2]  train-error:0.161025 
## [3]  train-error:0.153900 
## [4]  train-error:0.147675 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [4]  train-error:0.135422 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [4]  train-error:0.137050 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [4]  train-error:0.131825 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [3]  train-error:0.143579 
## [4]  train-error:0.133503 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [4]  train-error:0.135800 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [4]  train-error:0.135422 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [4]  train-error:0.137050 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [4]  train-error:0.131825 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [3]  train-error:0.143579 
## [4]  train-error:0.133503 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [4]  train-error:0.135800 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [4]  train-error:0.135422 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [4]  train-error:0.137050 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [4]  train-error:0.131825 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [3]  train-error:0.143579 
## [4]  train-error:0.133503 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [4]  train-error:0.135800 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [4]  train-error:0.135422 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [4]  train-error:0.137050 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [4]  train-error:0.131825 
## [1]  train-error:0.164004 
## [2]  train-error:0.153879 
## [3]  train-error:0.143579 
## [4]  train-error:0.133503 
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [4]  train-error:0.135800 
## [1]  train-error:0.162271 
## [2]  train-error:0.150896 
## [3]  train-error:0.142271 
## [4]  train-error:0.135422 
## [1]  train-error:0.167300 
## [2]  train-error:0.155350 
## [3]  train-error:0.146425 
## [4]  train-error:0.137050 
## [1]  train-error:0.163425 
## [2]  train-error:0.151750 
## [3]  train-error:0.139200 
## [4]  train-error:0.131825 
## [1]  train-error:0.164004 
xgb_cv_df %>% 
        arrange(desc(avg_f1)) %>% 
        head()
##    id nrounds max_depnth threshod   avg_sen  avg_prec    avg_f1       sd_f1
## 1 346       4          9      0.2 0.6863956 0.3440655 0.4583503 0.003508941
## 2 269       3         12      0.2 0.6669034 0.3492233 0.4583456 0.007325933
## 3 255       3         10      0.2 0.6898648 0.3430115 0.4581469 0.004356323
## 4 367       4         12      0.2 0.6501676 0.3528064 0.4573673 0.006197740
## 5 248       3          9      0.2 0.6910896 0.3414836 0.4570883 0.004473675
## 6 332       4          7      0.2 0.6872108 0.3415076 0.4562319 0.006541971
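The table ranks the grid by the average F1 over the five folds. To make explicit how the averaged sensitivity, precision and F1 relate for one fold at one threshold, here is a minimal base-R sketch; `fold_metrics` is a hypothetical helper (not the code that filled `xgb_cv_df`), `prob` stands for predicted probabilities and `y` for 0/1 labels, and the toy values are made up.

```r
# Hypothetical helper: sensitivity, precision and F1 for one fold at threshold t
fold_metrics <- function(prob, y, t = 0.2) {
        pred <- as.integer(prob >= t)          # predict 1 when probability >= t
        tp <- sum(pred == 1 & y == 1)          # true positives
        fp <- sum(pred == 1 & y == 0)          # false positives
        fn <- sum(pred == 0 & y == 1)          # false negatives
        sen  <- tp / (tp + fn)                 # sensitivity (recall)
        prec <- tp / (tp + fp)                 # precision
        f1   <- 2 * prec * sen / (prec + sen)  # harmonic mean of precision and recall
        c(sen = sen, prec = prec, f1 = f1)
}

# Toy fold: 2 true positives, 2 false positives, 1 false negative at t = 0.2
prob <- c(0.9, 0.3, 0.15, 0.6, 0.05, 0.25)
y    <- c(1,   0,   1,    1,   0,    0)
fold_metrics(prob, y, t = 0.2)   # sen = 2/3, prec = 0.5, f1 = 4/7
```

If no row is predicted positive at the chosen threshold, `tp + fp` is zero and precision (and therefore F1) comes out as `NaN`, so such folds need explicit handling before averaging.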
# Sparse model matrix of the predictors (column 23 of imputed_train_data is left out);
# `.` already excludes the response, so no "- EXTRA_BAGGAGE" term is needed
sparse_train_data <- sparse.model.matrix(EXTRA_BAGGAGE ~ .,
                                         data = imputed_train_data[, -c(23)])
# colnames(sparse_train_data)
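`sparse.model.matrix()` from the Matrix package builds essentially the same design matrix as base R's `model.matrix()`, only stored in sparse form: each factor predictor becomes 0/1 dummy columns (its first level is the baseline) and `.` expands to every column except the response. A small illustration with `model.matrix()` on a made-up data frame; the columns `TRIP_TYPE` and `PRICE` are hypothetical and not taken from the real dataset.

```r
# Made-up data: a factor response, one factor predictor and one numeric predictor
toy <- data.frame(
        EXTRA_BAGGAGE = factor(c("no", "yes", "no", "yes")),
        TRIP_TYPE     = factor(c("oneway", "return", "return", "oneway")),
        PRICE         = c(120, 340, 95, 410)
)

# `.` means "all columns except the response"
X <- model.matrix(EXTRA_BAGGAGE ~ ., data = toy)
X
colnames(X)   # "(Intercept)" "TRIP_TYPEreturn" "PRICE"
```

The intercept column is constant, so a tree never splits on it. Note also that `model.matrix()` drops rows containing `NA` under the default `na.action`, which would misalign the matrix rows with the label vector; building the matrix from the imputed data avoids this.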

# Parameters
# max_depth:   maximum depth of each tree; tuned over 2:15 in the loop below
# nrounds:     number of boosting rounds; each extra round adds a tree that further
#              reduces the training error; tuned over 1:4
# eta = 1:     learning rate, i.e. each new tree's contribution is not shrunk
# nthread = 3: number of CPU threads used

## xgboost needs numeric 0/1 labels, so convert the factor response
# table(imputed_train_data$EXTRA_BAGGAGE)
xgboost_labels <- imputed_train_data$EXTRA_BAGGAGE
levels(xgboost_labels) <- c(0,1)
# table(xgboost_labels)
# as.integer() returns the factor codes 1/2 (not the level labels), hence the - 1
xgboost_labels <- as.integer(xgboost_labels)
xgboost_labels <- xgboost_labels - 1
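The two-step label conversion works because `as.integer()` on a factor returns its internal codes (1 for the first level, 2 for the second) regardless of what the levels are called. A toy illustration, assuming the positive class is the second factor level; the vector below is made up.

```r
y_factor <- factor(c("yes", "no", "no", "yes"))
levels(y_factor)                  # "no" "yes": "yes" is the second level

y_copy <- y_factor
levels(y_copy) <- c(0, 1)         # relabel the levels as "0" and "1"
as.integer(y_copy)                # 2 1 1 2: the internal codes, not the labels

y01 <- as.integer(y_factor) - 1L  # codes 1/2 shifted to the 0/1 that xgboost expects
y01                               # 1 0 0 1
```

An equivalent route is `as.numeric(as.character(y_copy))`, which reads the relabelled levels directly; forgetting the `- 1` in the integer route would silently pass 1/2 labels to xgboost.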



# CV preparation
set.seed(13)
# This time the CV folds are made with caret's createFolds(); since y is a factor,
# the folds are stratified by class. Each element of flds holds the row indices
# of one held-out fold.
flds <- createFolds(y = imputed_train_data$EXTRA_BAGGAGE, k = 5, list = TRUE, returnTrain = FALSE)
# names(flds)[1] <- "train"
auc <- numeric(5)   # AUC of each of the 5 folds
id <- 0
xgb_cv_df <- data.frame(id = NA, nrounds = NA, max_depnth = NA,
                        mean_AUC = NA)
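`createFolds()` returns a list of held-out row indices, and for a factor outcome caret samples within each class, so every fold keeps roughly the same share of extra-baggage requests. In the spirit of coding things from scratch, a minimal base-R sketch of that idea; `make_stratified_folds` is hypothetical and is not caret's actual implementation, and the toy labels are made up.

```r
# Hypothetical helper: stratified k-fold split in base R, similar in spirit to
# caret::createFolds(y, k, list = TRUE, returnTrain = FALSE).
# Returns a list of k integer vectors, each holding the held-out rows of one fold.
make_stratified_folds <- function(y, k = 5) {
        y_chr   <- as.character(y)
        fold_id <- integer(length(y_chr))
        for (lev in unique(y_chr)) {
                idx <- which(y_chr == lev)            # rows of this class
                # deal this class's rows out to folds 1..k, in random order
                fold_id[idx] <- sample(rep_len(1:k, length(idx)))
        }
        split(seq_along(y_chr), factor(fold_id, levels = 1:k))
}

set.seed(13)
y <- factor(rep(c("no", "yes"), times = c(80, 20)))
folds <- make_stratified_folds(y, k = 5)
sapply(folds, length)                                 # 20 rows in each fold
sapply(folds, function(f) mean(y[f] == "yes"))        # share of "yes" is 0.2 in each fold
```

When a class size is not a multiple of k, the leftover rows of that class go to the first folds, so fold sizes can differ by one row per class.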

# for tuning nrounds (number of boosting rounds)
for (n in 1:4) {
# for tuning max depth
for (d in 2:15) {

        id <- id + 1
        # print(id)

        # 5-fold CV: train on four folds, compute the AUC on the held-out fold.
        # AUC does not depend on a classification threshold, so none is tuned here.
        for (k in 1:5) {
                # print(paste0("k", k))

                bstSparse <- xgboost(data = sparse_train_data[-flds[[k]], ],
                                     label = xgboost_labels[-flds[[k]]],
                                     max_depth = d, eta = 1,
                                     nthread = 3, nrounds = n,
                                     objective = "binary:logistic",
                                     verbose = 0)  # suppress the per-round train-error log

                # for binary:logistic, predict() already returns probabilities
                xgb_prediction <- predict(bstSparse, sparse_train_data[flds[[k]], ])

                # use the same 0/1 labels and fold indices as in training
                ROCRpredTest <- prediction(xgb_prediction, xgboost_labels[flds[[k]]])
                auc[k] <- as.numeric(performance(ROCRpredTest, "auc")@y.values)
        }
        xgb_cv_df[id, ] <- c(id, n, d, mean(auc))

} # depth
} # nrounds
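ROCR's `performance(pred, "auc")` gives the area under the ROC curve, which summarizes ranking quality over all thresholds at once. The same number equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative (ties count one half), and it can be computed from ranks. A minimal base-R sketch as a cross-check; `auc_rank` is a hypothetical helper and the scores are made up.

```r
# Rank-based AUC (Mann-Whitney formulation)
auc_rank <- function(score, y) {
        r     <- rank(score)     # average ranks, so ties count one half
        n_pos <- sum(y == 1)
        n_neg <- sum(y == 0)
        (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

score <- c(0.9, 0.3, 0.15, 0.6, 0.05, 0.25)
y     <- c(1,   0,   1,    1,   0,    0)
auc_rank(score, y)   # 7 of the 9 positive/negative pairs are ordered correctly: 7/9

# Brute-force check over all positive/negative pairs
pos <- score[y == 1]
neg <- score[y == 0]
mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
```

The rank version needs only one sort, so it scales to large folds, whereas the pairwise check builds an n_pos by n_neg matrix and is only practical for small examples.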
## [1]  train-error:0.165375 
## [2]  train-error:0.153425 
## [3]  train-error:0.142625 
## [4]  train-error:0.135800 
## [1]  train-error:0.154296 
## [2]  train-error:0.139122 
## [3]  train-error:0.125797 
## [4]  train-error:0.115347 
## [1]  train-error:0.160400 
## [2]  train-error:0.147925 
## [3]  train-error:0.134475 
## [4]  train-error:0.127175 
## [1]  train-error:0.156600 
## [2]  train-error:0.144975 
## [3]  train-error:0.128100 
## [4]  train-error:0.115825 
## [1]  train-error:0.157979 
## [2]  train-error:0.143904 
## [3]  train-error:0.128178 
## [4]  train-error:0.119403 
## [1]  train-error:0.158700 
## [2]  train-error:0.144000 
## [3]  train-error:0.132325 
## [4]  train-error:0.124400 
## [1]  train-error:0.148071 
## [2]  train-error:0.130072 
## [3]  train-error:0.116297 
## [4]  train-error:0.106022 
## [1]  train-error:0.153825 
## [2]  train-error:0.138425 
## [3]  train-error:0.122450 
## [4]  train-error:0.114375 
## [1]  train-error:0.148950 
## [2]  train-error:0.137450 
## [3]  train-error:0.118675 
## [4]  train-error:0.103850 
## [1]  train-error:0.151004 
## [2]  train-error:0.134628 
## [3]  train-error:0.121928 
## [4]  train-error:0.111153 
## [1]  train-error:0.151900 
## [2]  train-error:0.136175 
## [3]  train-error:0.118325 
## [4]  train-error:0.104150 
## [1]  train-error:0.142771 
## [2]  train-error:0.121797 
## [3]  train-error:0.101772 
## [4]  train-error:0.090548 
## [1]  train-error:0.147000 
## [2]  train-error:0.129475 
## [3]  train-error:0.112050 
## [4]  train-error:0.097975 
## [1]  train-error:0.143175 
## [2]  train-error:0.128700 
## [3]  train-error:0.113650 
## [4]  train-error:0.097350 
## [1]  train-error:0.145604 
## [2]  train-error:0.128078 
## [3]  train-error:0.118003 
## [4]  train-error:0.104828 
## [1]  train-error:0.145525 
## [2]  train-error:0.125150 
## [3]  train-error:0.108700 
## [4]  train-error:0.094525
xgb_cv_df %>% 
        arrange(desc(mean_AUC)) %>% 
        head()
##   id nrounds max_depnth  mean_AUC
## 1 49       4          8 0.7523877
## 2 50       4          9 0.7521100
## 3 33       3          6 0.7516437
## 4 47       4          6 0.7515910
## 5 48       4          7 0.7514230
## 6 36       3          9 0.7513118

So the best parameters I could find are nrounds = 4, max_depth = 9 and threshold = 0.2, which yield a cross-validated F1 of 0.4583503.

When I tuned the model on CV AUC instead, the best parameters became nrounds = 4 and max_depth = 8 (see the table above), with a best mean CV AUC of only about 0.75.
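The F1 criterion used in the tuning depends on the prediction threshold as well as on nrounds and max_depth. The sketch below shows, in base R, how F1 at a given threshold is computed from predicted probabilities and how a threshold sweep picks the best cut-off. The helper `f1_at_threshold` and the toy data are hypothetical illustrations, not the original tuning code, which evidently averages F1 across CV folds (the `avg_f1` column).

```r
# Hypothetical helper (not from the original analysis): F1 score of
# probability predictions `prob` against 0/1 labels `y` at cut-off `threshold`.
f1_at_threshold <- function(prob, y, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)   # predict 1 when prob >= threshold
  tp <- sum(pred == 1 & y == 1)           # true positives
  fp <- sum(pred == 1 & y == 0)           # false positives
  fn <- sum(pred == 0 & y == 1)           # false negatives
  if (tp == 0) return(0)                  # no true positives: F1 is 0
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

# Toy example (made-up data): 6 observations, 3 positives
prob <- c(0.90, 0.35, 0.15, 0.60, 0.05, 0.25)
y    <- c(1,    1,    0,    0,    0,    1)

# Sweep candidate thresholds and keep the one with the highest F1
thresholds     <- seq(0.1, 0.9, by = 0.1)
f1_values      <- sapply(thresholds, function(t) f1_at_threshold(prob, y, t))
best_threshold <- thresholds[which.max(f1_values)]
```

On this toy data a low threshold (0.2) wins because it recovers all positives at the cost of one false positive, which mirrors why a threshold well below 0.5 can maximise F1 when the positive class is the minority.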

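# Results of the F1 grid search were saved once and are reloaded here
# write_csv(x = xgb_cv_df, "xgb_cv_f1.csv")
xgb_cv_df <- read_csv("xgb_cv_f1.csv")
# 3D scatter of the tuning grid, coloured by mean CV F1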
xgb_3dplot_param <- plot_ly(xgb_cv_df, x = ~nrounds, y = ~max_depnth, z = ~threshod,
        marker = list(color = ~avg_f1, colorscale = c('#0000FF', '#683531'), showscale = TRUE)) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'max number of boosting iterations'),
                     yaxis = list(title = 'maximum depth of tree'),
                     zaxis = list(title = 'Prediction threshold')))

xgb_3dplot_param
# 3D scatter of the CV evaluation metrics: sensitivity, precision and F1

xgb_3dplot_metrics <- plot_ly(xgb_cv_df, x = ~avg_sen, y = ~avg_prec, z = ~avg_f1,
        marker = list(color = ~avg_f1, colorscale = c('#0000FF', '#683531'), showscale = TRUE)) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'CV avg sensitivity'),
                     yaxis = list(title = 'CV avg precision'),
                     zaxis = list(title = 'CV avg F1')))

xgb_3dplot_metrics
# Refit on the full training set with the chosen parameters
# (nrounds = 4, max_depth = 9) and extract feature importance
xgb_best <- xgboost(data = sparse_train_data,
                    label = xgboost_labels,
                    max_depth = 9, eta = 1,
                    nthread = 3, nrounds = 4,
                    objective = "binary:logistic")
## [1]  train-error:0.183540 
## [2]  train-error:0.180020 
## [3]  train-error:0.174940 
## [4]  train-error:0.172180
importance_matrix <- xgb.importance(model = xgb_best)
print(importance_matrix)
##              Feature        Gain        Cover   Frequency
##  1:           NO_GDS 0.182486238 0.0974745488 0.026183283
##  2: staying_duration 0.181222928 0.1963888401 0.121852971
##  3:         DISTANCE 0.156807003 0.1704233508 0.302114804
##  4:       HAUL_TYPE3 0.115334464 0.1093858214 0.007049345
##  5:           ADULTS 0.083313623 0.0619885874 0.046324270
##  6:         country2 0.022626243 0.0133467201 0.032225579
##  7:       TRIP_TYPE3 0.020851480 0.0231521499 0.028197382
##  8:       HAUL_TYPE2 0.018585751 0.0145279171 0.016112790
##  9:         website3 0.016678225 0.0318855321 0.021148036
## 10: DEVICESMARTPHONE 0.016243683 0.0279488458 0.028197382
## 11:         website2 0.016091872 0.0394255482 0.015105740
## 12:       dep_month2 0.014389289 0.0189477874 0.026183283
## 13:         website4 0.013660134 0.0229292646 0.006042296
## 14:              GDS 0.012264100 0.0148545548 0.032225579
## 15:         CHILDREN 0.011570910 0.0137293813 0.025176234
## 16:      DEVICEOTHER 0.010082995 0.0162830805 0.008056395
## 17:         dep_day2 0.009165273 0.0181224513 0.015105740
## 18:         arr_day2 0.009042725 0.0053397473 0.027190332
## 19:        arr_wday3 0.006523652 0.0067251711 0.018126888
## 20:       arr_month2 0.006417592 0.0086691373 0.014098691
## 21:        arr_wday7 0.006392173 0.0077337304 0.016112790
## 22:        arr_wday2 0.006340769 0.0027847844 0.012084592
## 23:        dep_wday6 0.006174384 0.0054222935 0.015105740
## 24:        dep_wday4 0.005927478 0.0028601704 0.017119839
## 25:           TRAIN2 0.005658069 0.0040244545 0.002014099
## 26:        dep_wday2 0.005518894 0.0102895662 0.011077543
## 27:        dep_wday7 0.005353776 0.0088682779 0.014098691
## 28:        dep_wday3 0.005103010 0.0052693321 0.011077543
## 29:        arr_wday6 0.004260805 0.0042513881 0.010070493
## 30:         PRODUCT2 0.004223782 0.0121547103 0.007049345
## 31:     DEVICETABLET 0.003691355 0.0032452732 0.014098691
## 32:             SMS2 0.003623258 0.0009732833 0.016112790
## 33:          INFANTS 0.003398870 0.0065423110 0.006042296
## 34:        arr_wday4 0.003223997 0.0052746761 0.009063444
## 35:        arr_wday5 0.003024451 0.0012865353 0.007049345
## 36:        dep_wday5 0.002918590 0.0024872271 0.009063444
## 37:       TRIP_TYPE2 0.001808161 0.0049835487 0.006042296
##              Feature        Gain        Cover   Frequency
xgb.plot.importance(importance_matrix = importance_matrix)

Interestingly, NO_GDS is the most important feature by gain, closely followed by staying_duration. Seeing them at the top, I am proud of my feature engineering.

Next, I evaluate the XGBoost model with a cross-validated ROC curve. I run 5-fold CV repeated 5 times with different seeds, so each observation gets 5 out-of-fold predicted probabilities. I then aggregate these per observation by their mean and by their median, and draw one ROC curve for each. I include the median because the distribution of the predicted probabilities is unknown: if it is skewed, the mean may not be a representative summary.
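The repeated-CV scheme described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions: the data are simulated, a logistic regression (`glm`) stands in for the XGBoost model, and the folds are plain random splits rather than the stratified folds that `caret::createFolds` builds for a factor outcome.

```r
# Sketch: repeated k-fold CV giving one out-of-fold (OOF) probability per
# observation per repetition, then aggregating by mean and median.
# Assumptions: simulated data; glm() stands in for XGBoost; random
# (non-stratified) folds.
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-1 + 1.2 * x1 - 0.8 * x2))
dat <- data.frame(y, x1, x2)

k    <- 5
reps <- 5
oof  <- matrix(NA_real_, nrow = n, ncol = reps)  # one column per repetition

for (r in seq_len(reps)) {
  set.seed(r)
  fold_id <- sample(rep(seq_len(k), length.out = n))  # random fold labels 1..k
  for (f in seq_len(k)) {
    test_idx <- which(fold_id == f)
    fit <- glm(y ~ x1 + x2, family = binomial, data = dat[-test_idx, ])
    # each observation is predicted exactly once per repetition
    oof[test_idx, r] <- predict(fit, newdata = dat[test_idx, ], type = "response")
  }
}

prob_mean   <- rowMeans(oof)          # mean of the 5 OOF probabilities
prob_median <- apply(oof, 1, median)  # median, less sensitive to skew
```

Storing the OOF predictions by index (`oof[test_idx, r] <- ...`) avoids any joining step: after the inner loop every row of a column has been filled exactly once.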

# Best parameters from the F1 grid search: nrounds = 4, max_depth = 9,
# threshold = 0.2 (CV F1 = 0.4583503)

# One-hot encoded sparse design matrix; column 23 of imputed_train_data is
# excluded, and `.` in the formula already excludes the response
sparse_train_data <- sparse.model.matrix(EXTRA_BAGGAGE ~ .,
                                         data = imputed_train_data[, -c(23)])

# xgboost needs numeric 0/1 labels: the factor codes 1/2 are shifted to 0/1
xgboost_labels <- as.integer(imputed_train_data$EXTRA_BAGGAGE) - 1L



n_obs <- nrow(imputed_train_data)
# One column of out-of-fold (OOF) probabilities per CV repetition
final_df <- data.frame(id = seq_len(n_obs), prob1 = NA, prob2 = NA, prob3 = NA, prob4 = NA, prob5 = NA)

for (i in 1:5) {
        set.seed(i)
        # This time the (stratified) CV folds are built with caret::createFolds
        flds <- createFolds(y = imputed_train_data$EXTRA_BAGGAGE, k = 5, list = TRUE, returnTrain = FALSE)

        oof_prob <- rep(NA_real_, n_obs)

        for (k in 1:5) {
                # Train on the other four folds ...
                best_xgboost <- xgboost(data = sparse_train_data[-flds[[k]], ],
                                        label = xgboost_labels[-flds[[k]]],
                                        max_depth = 9, eta = 1,
                                        nthread = 3, nrounds = 4,
                                        objective = "binary:logistic")

                # ... and predict the held-out fold; with binary:logistic,
                # predict() already returns probabilities
                oof_prob[flds[[k]]] <- predict(best_xgboost, sparse_train_data[flds[[k]], ])
        }

        # Every observation is held out exactly once per repetition
        final_df[, i + 1] <- oof_prob
}
## [1]  train-error:0.180700 
## [2]  train-error:0.173325 
## [3]  train-error:0.169000 
## [4]  train-error:0.166775 
## [1]  train-error:0.182880 
## [2]  train-error:0.178179 
## [3]  train-error:0.171829 
## [4]  train-error:0.167154 
## [1]  train-error:0.183675 
## [2]  train-error:0.177825 
## [3]  train-error:0.172825 
## [4]  train-error:0.166825 
## [1]  train-error:0.182400 
## [2]  train-error:0.177575 
## [3]  train-error:0.172650 
## [4]  train-error:0.167450 
## [1]  train-error:0.183020 
## [2]  train-error:0.179696 
## [3]  train-error:0.173721 
## [4]  train-error:0.169746 
## [1]  train-error:0.183755 
## [2]  train-error:0.179980 
## [3]  train-error:0.172429 
## [4]  train-error:0.166804 
## [1]  train-error:0.183250 
## [2]  train-error:0.177550 
## [3]  train-error:0.173200 
## [4]  train-error:0.170625 
## [1]  train-error:0.181820 
## [2]  train-error:0.175496 
## [3]  train-error:0.170896 
## [4]  train-error:0.166496 
## [1]  train-error:0.184250 
## [2]  train-error:0.177875 
## [3]  train-error:0.171200 
## [4]  train-error:0.166750 
## [1]  train-error:0.184825 
## [2]  train-error:0.178800 
## [3]  train-error:0.172450 
## [4]  train-error:0.166175 
## [1]  train-error:0.182755 
## [2]  train-error:0.178554 
## [3]  train-error:0.172904 
## [4]  train-error:0.164904 
## [1]  train-error:0.182500 
## [2]  train-error:0.176700 
## [3]  train-error:0.171000 
## [4]  train-error:0.166100 
## [1]  train-error:0.182450 
## [2]  train-error:0.177900 
## [3]  train-error:0.172875 
## [4]  train-error:0.168375 
## [1]  train-error:0.181645 
## [2]  train-error:0.176471 
## [3]  train-error:0.172121 
## [4]  train-error:0.167721 
## [1]  train-error:0.182700 
## [2]  train-error:0.178300 
## [3]  train-error:0.172675 
## [4]  train-error:0.169575 
## [1]  train-error:0.183450 
## [2]  train-error:0.177350 
## [3]  train-error:0.174025 
## [4]  train-error:0.167025 
## [1]  train-error:0.182550 
## [2]  train-error:0.176425 
## [3]  train-error:0.171875 
## [4]  train-error:0.167600 
## [1]  train-error:0.183375 
## [2]  train-error:0.179375 
## [3]  train-error:0.175825 
## [4]  train-error:0.171050 
## [1]  train-error:0.182925 
## [2]  train-error:0.178575 
## [3]  train-error:0.172475 
## [4]  train-error:0.167550 
## [1]  train-error:0.183375 
## [2]  train-error:0.177075 
## [3]  train-error:0.170950 
## [4]  train-error:0.167250 
## [1]  train-error:0.182725 
## [2]  train-error:0.178600 
## [3]  train-error:0.174875 
## [4]  train-error:0.170000 
## [1]  train-error:0.183800 
## [2]  train-error:0.177575 
## [3]  train-error:0.172600 
## [4]  train-error:0.168750 
## [1]  train-error:0.182100 
## [2]  train-error:0.176525 
## [3]  train-error:0.172050 
## [4]  train-error:0.168800 
## [1]  train-error:0.183200 
## [2]  train-error:0.178950 
## [3]  train-error:0.171925 
## [4]  train-error:0.167725 
## [1]  train-error:0.183350 
## [2]  train-error:0.176850 
## [3]  train-error:0.171375 
## [4]  train-error:0.167075
# CV-ROC using the mean of the 5 OOF probabilities per observation
prob_for_roc <- apply(final_df[, -1], MARGIN = 1, mean)

ROCRpred <- prediction(prob_for_roc, train_data$EXTRA_BAGGAGE)
ROCRperf <- performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at = seq(0, 1, 0.1), text.adj = c(-0.2, 1.7), main = "CV-ROC Mean")

# AUC of CV-ROC Mean
auc <- as.numeric(performance(ROCRpred, "auc")@y.values)
auc
## [1] 0.7737647
# CV-ROC using the median of the 5 OOF probabilities per observation
prob_for_roc <- apply(final_df[, -1], MARGIN = 1, median)

ROCRpred <- prediction(prob_for_roc, train_data$EXTRA_BAGGAGE)
ROCRperf <- performance(ROCRpred, "tpr", "fpr")
plot(ROCRperf, colorize = TRUE, print.cutoffs.at = seq(0, 1, 0.1), text.adj = c(-0.2, 1.7), main = "CV-ROC Median")

# AUC of CV-ROC Median
auc <- as.numeric(performance(ROCRpred, "auc")@y.values)
auc
## [1] 0.7716794

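A CV AUC of about 0.77 is a good result, and the mean- and median-aggregated probabilities give almost identical AUCs (0.7738 vs 0.7717), so the evaluation is consistent across both aggregations.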
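As a cross-check on the AUC that ROCR reports, the AUC can also be computed directly as the probability that a randomly chosen positive is scored higher than a randomly chosen negative (the Mann-Whitney U statistic). The base-R sketch below is an illustration, not part of the original analysis; `auc_rank` is a hypothetical helper and the toy data are made up. Ties are counted as one half through average ranks, which should match the trapezoidal AUC that ROCR computes.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) formula.
# prob: predicted probabilities; y: 0/1 labels.
auc_rank <- function(prob, y) {
  r     <- rank(prob)   # average ranks, so ties count as 1/2
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  # U statistic of the positives divided by the number of positive/negative pairs
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

prob <- c(0.90, 0.35, 0.15, 0.60, 0.05, 0.25)
y    <- c(1,    1,    0,    0,    0,    1)
auc_rank(prob, y)  # 7 of the 9 positive/negative pairs are ordered correctly
```

Applied to `prob_for_roc` and the 0/1 version of the labels, this should reproduce the AUC values printed above up to floating-point error.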