This document summarizes the solution to the Santander Customer Satisfaction Competition hosted by Kaggle in winter-spring 2016. See the details of the competition and download the files at https://www.kaggle.com/c/santander-customer-satisfaction.
The final solution was based on several xgboost models with some pre-processing of the data.
The data files are quite large, so we will use data.table::fread to read them. We will remove the ID variable and extract the TARGET variable to avoid accidental leakage.
library(xgboost)
library(Matrix)
require(MatrixModels)
## Loading required package: MatrixModels
library(data.table)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
require(rms)
## Loading required package: rms
## Loading required package: Hmisc
## Loading required package: survival
##
## Attaching package: 'survival'
## The following object is masked from 'package:caret':
##
## cluster
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
## Loading required package: SparseM
##
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
##
## backsolve
library(vtreat)
rm(list=ls())
train <- fread("./input/train.csv", integer64 = "numeric", data.table = FALSE)
Read 76020 rows and 371 (of 371) columns from 0.055 GB file in 00:00:05
test <- fread("./input/test.csv", integer64 = "numeric", data.table = FALSE)
Read 75818 rows and 370 (of 370) columns from 0.055 GB file in 00:00:04
#str(train); summary(train)
##### Removing IDs
train$ID <- NULL
test.id <- test$ID
test$ID <- NULL
##### Extracting TARGET
train.y <- train$TARGET
train$TARGET <- NULL
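As a quick sanity check on the step above, the following minimal sketch repeats the same pattern on a toy data frame: drop the identifier, pull out the label, and confirm that the remaining training columns match the test columns. The column names `var_a` and `var_b` and the helper `split_target` are made up for illustration and are not part of the original solution.

```r
# Toy stand-ins for train.csv / test.csv (hypothetical columns and values)
toy_train <- data.frame(ID = 1:4, var_a = c(0, 3, 0, 9), var_b = c(1, 0, 0, 2),
                        TARGET = c(0, 0, 1, 0))
toy_test  <- data.frame(ID = 5:6, var_a = c(6, 0), var_b = c(0, 4))

# Hypothetical helper: separate the id and the label from the predictors
split_target <- function(df, id = "ID", target = "TARGET") {
  list(
    id = df[[id]],
    y  = if (target %in% names(df)) df[[target]] else NULL,
    x  = df[, setdiff(names(df), c(id, target)), drop = FALSE]
  )
}

tr <- split_target(toy_train)
te <- split_target(toy_test)

# Neither the id nor the label should remain among the predictors,
# and train and test should expose the same predictor columns
leak_free <- !any(c("ID", "TARGET") %in% names(tr$x))
same_cols <- identical(names(tr$x), names(te$x))
```

Keeping the label outside the predictor frame until it is explicitly needed makes it harder to feed it into a row-wise feature by accident.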
The data seems to be very sparse. In such cases NAs and zero values may carry additional information, especially since the TARGET variable is related to customer satisfaction. Looking closer at the data, we see that some of the variables look like ordinal values that are multiples of 3. We will count, for each row, how many zero values and how many multiples of 3 it contains.
##### 0 count per line
count0 <- function(x) {
  sum(x == 0, na.rm = TRUE)
}
count3mod <- function(x) {
  sum(x %% 3 == 0, na.rm = TRUE)
}
train$n0 <- apply(train, 1, FUN=count0)
test$n0 <- apply(test, 1, FUN=count0)
train$mod3 <- apply(train, 1, FUN=count3mod)
test$mod3 <- apply(test, 1, FUN=count3mod)
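The row-wise counts can also be computed without `apply`. The sketch below, on made-up toy data, uses `rowSums` on a numeric matrix and checks that it agrees with the `apply`-based versions; it is an alternative formulation, not the code used for the competition.

```r
# Toy data (hypothetical values), including one NA
toy <- data.frame(a = c(0, 3, 5), b = c(0, 0, 6), c = c(1, 9, NA))
m <- as.matrix(toy)

# Vectorised equivalents of count0() and count3mod()
n0_fast   <- unname(rowSums(m == 0, na.rm = TRUE))
mod3_fast <- unname(rowSums(m %% 3 == 0, na.rm = TRUE))

# Reference: the apply()-based versions from the text
n0_slow   <- unname(apply(toy, 1, function(x) sum(x == 0, na.rm = TRUE)))
mod3_slow <- unname(apply(toy, 1, function(x) sum(x %% 3 == 0, na.rm = TRUE)))
```

Two details of the original code are worth keeping in mind: zero is itself a multiple of 3, so every zero is also counted in `mod3`, and because `n0` is added to the data frame before `mod3` is computed, the `n0` column takes part in the `mod3` count.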
A quick look at the histograms shows that some variables contain extreme values (such as 99 or -999999), which are likely “placeholders” for missing values. Both var3 and var36 appear to be categorical, so we will convert them to factors.
Also, var38 has a pronounced mode at 117310.979016494, which looks like a mean or median value filled in for missing values.
Let’s reintroduce the missing values so that we can deal with them in a consistent manner later.
Once the mode value is removed, the histogram of log1p(var38) looks near-perfect. The shape of the distribution suggests that this variable might be net worth, income, or another measure of monetary value.
train[train$var3==-999999, "var3"] <- NA
test[test$var3==-999999, "var3"] <- NA
train[train$var36==99, "var36"] <- NA
test[test$var36==99, "var36"] <- NA
train[train$var38==117310.979016494, "var38"] <- NA
test[test$var38==117310.979016494, "var38"] <- NA
hist(log1p(train$var38), 100)
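The placeholder replacement can be wrapped in a small helper. The sketch below is one possible formulation on made-up vectors; the name `sentinel_to_na` and the `tol` argument are assumptions for illustration. It guards against values that are already NA (a logical row index containing NA can raise an error when assigning into a data frame) and allows an approximate match for floating-point placeholders such as the var38 mode.

```r
# Hypothetical helper: turn a placeholder value into NA.
# `tol` allows an approximate match for floating-point placeholders.
sentinel_to_na <- function(x, sentinel, tol = 0) {
  hit <- !is.na(x) & abs(x - sentinel) <= tol
  x[hit] <- NA
  x
}

# Made-up values that mimic var3 and var38
toy_var3  <- c(2, -999999, 8, NA, 2)
toy_var38 <- c(51003.9, 117310.979016494, 88240.5)

clean_var3  <- sentinel_to_na(toy_var3, -999999)
clean_var38 <- sentinel_to_na(toy_var38, 117310.979016494, tol = 1e-6)
```

Applied to a data frame column, this would read `train$var3 <- sentinel_to_na(train$var3, -999999)`.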
As a result of extensive EDA, the following variable map was built:
The above map illustrates the relationships between the variables (aggregation) and groups the variables by the type of activity they likely describe. Since the dataset is semi-anonymized it is difficult to be certain, but the hypothesis was that the group of variables rolling up to var30 describes cash products, the variables rolling up to var01 relate to card products, and those rolling up to var31 are likely loan products. The nature of the other groups of variables is less clear.
We will now remove constant and identical features to reduce the dimensionality of the dataset before further processing.
##### Removing constant features
cat("\n## Removing the constants features.\n")
##
## ## Removing the constants features.
for (f in names(train)) {
  if (length(unique(train[[f]])) == 1) {
    cat(f, "is constant in train. We delete it.\n")
    train[[f]] <- NULL
    test[[f]] <- NULL
  }
}
## ind_var2_0 is constant in train. We delete it.
## ind_var2 is constant in train. We delete it.
## ind_var27_0 is constant in train. We delete it.
## ind_var28_0 is constant in train. We delete it.
## ind_var28 is constant in train. We delete it.
## ind_var27 is constant in train. We delete it.
## ind_var41 is constant in train. We delete it.
## ind_var46_0 is constant in train. We delete it.
## ind_var46 is constant in train. We delete it.
## num_var27_0 is constant in train. We delete it.
## num_var28_0 is constant in train. We delete it.
## num_var28 is constant in train. We delete it.
## num_var27 is constant in train. We delete it.
## num_var41 is constant in train. We delete it.
## num_var46_0 is constant in train. We delete it.
## num_var46 is constant in train. We delete it.
## saldo_var28 is constant in train. We delete it.
## saldo_var27 is constant in train. We delete it.
## saldo_var41 is constant in train. We delete it.
## saldo_var46 is constant in train. We delete it.
## imp_amort_var18_hace3 is constant in train. We delete it.
## imp_amort_var34_hace3 is constant in train. We delete it.
## imp_reemb_var13_hace3 is constant in train. We delete it.
## imp_reemb_var33_hace3 is constant in train. We delete it.
## imp_trasp_var17_out_hace3 is constant in train. We delete it.
## imp_trasp_var33_out_hace3 is constant in train. We delete it.
## num_var2_0_ult1 is constant in train. We delete it.
## num_var2_ult1 is constant in train. We delete it.
## num_reemb_var13_hace3 is constant in train. We delete it.
## num_reemb_var33_hace3 is constant in train. We delete it.
## num_trasp_var17_out_hace3 is constant in train. We delete it.
## num_trasp_var33_out_hace3 is constant in train. We delete it.
## saldo_var2_ult1 is constant in train. We delete it.
## saldo_medio_var13_medio_hace3 is constant in train. We delete it.
##### Removing identical features
features_pair <- combn(names(train), 2, simplify = FALSE)
toRemove <- c()
for (pair in features_pair) {
  f1 <- pair[1]
  f2 <- pair[2]
  if (!(f1 %in% toRemove) && !(f2 %in% toRemove)) {
    if (all(train[[f1]] == train[[f2]])) {
      cat(f1, "and", f2, "are equals.\n")
      toRemove <- c(toRemove, f2)
    }
  }
}
## ind_var6_0 and ind_var29_0 are equals.
## ind_var6 and ind_var29 are equals.
## ind_var13_medio_0 and ind_var13_medio are equals.
## ind_var18_0 and ind_var18 are equals.
## ind_var26_0 and ind_var26 are equals.
## ind_var25_0 and ind_var25 are equals.
## ind_var32_0 and ind_var32 are equals.
## ind_var34_0 and ind_var34 are equals.
## ind_var37_0 and ind_var37 are equals.
## ind_var40 and ind_var39 are equals.
## num_var6_0 and num_var29_0 are equals.
## num_var6 and num_var29 are equals.
## num_var13_medio_0 and num_var13_medio are equals.
## num_var18_0 and num_var18 are equals.
## num_var26_0 and num_var26 are equals.
## num_var25_0 and num_var25 are equals.
## num_var32_0 and num_var32 are equals.
## num_var34_0 and num_var34 are equals.
## num_var37_0 and num_var37 are equals.
## num_var40 and num_var39 are equals.
## saldo_var6 and saldo_var29 are equals.
## saldo_var13_medio and saldo_medio_var13_medio_ult1 are equals.
## delta_imp_reemb_var13_1y3 and delta_num_reemb_var13_1y3 are equals.
## delta_imp_reemb_var17_1y3 and delta_num_reemb_var17_1y3 are equals.
## delta_imp_reemb_var33_1y3 and delta_num_reemb_var33_1y3 are equals.
## delta_imp_trasp_var17_in_1y3 and delta_num_trasp_var17_in_1y3 are equals.
## delta_imp_trasp_var17_out_1y3 and delta_num_trasp_var17_out_1y3 are equals.
## delta_imp_trasp_var33_in_1y3 and delta_num_trasp_var33_in_1y3 are equals.
## delta_imp_trasp_var33_out_1y3 and delta_num_trasp_var33_out_1y3 are equals.
feature.names <- setdiff(names(train), toRemove)
train <- train[, feature.names]
test <- test[, feature.names]
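The pairwise loop above compares every pair of columns, which grows quadratically with the number of features. A more compact alternative is sketched below on a made-up data frame, assuming all columns are numeric at this stage. Note that `duplicated` compares columns exactly, so it treats NA as equal to NA, which differs slightly from the `==` comparison used in the loop.

```r
# Toy data (hypothetical values)
toy <- data.frame(
  a = c(1, 0, 2, 0),
  b = c(5, 5, 5, 5),   # constant
  c = c(1, 0, 2, 0),   # copy of a
  d = c(0, 1, 0, 1)
)

# Constant columns: a single distinct value
is_const <- vapply(toy, function(col) length(unique(col)) == 1, logical(1))
toy <- toy[, !is_const, drop = FALSE]

# Duplicate columns: duplicated() on the list of columns flags later copies.
# Columns are coerced to double so that integer and double copies compare equal.
is_dup <- duplicated(lapply(toy, as.numeric))
kept <- names(toy)[!is_dup]
```

As in the loop, the first column of each identical group is kept and the later copies are dropped.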
Dealing with missing values can be tricky due to the danger of overfitting. In this competition we will use the excellent vtreat package by Win-Vector LLC to clean and normalize the variables. More information about the package is available in the reference manual and vignettes on CRAN.
First, we will reintroduce the label variable and mark some of the features as categorical so that vtreat applies the proper method of imputation/processing.
train$TARGET <- train.y
# Make some variable categorical
train$var3 <- as.factor(train$var3)
test$var3 <- as.factor(test$var3)
train$var36 <- as.factor(train$var36)
test$var36 <- as.factor(test$var36)
library(parallel)
no_cores <- detectCores()-1 # Calculate the number of cores
cl <- makeCluster(no_cores) # Initiate cluster
##### VTREAT ########
#set.seed(1234)
#treatmentsC <- designTreatmentsC(train,colnames(train),'TARGET',1, verbose=F)
#dTrainCTreated <- prepare(treatmentsC,train,pruneSig=0.5,scale=FALSE)
#dTestCTreated <- prepare(treatmentsC,test,pruneSig=0.5,scale=FALSE)
set.seed(1234)
prep <- vtreat::mkCrossFrameCExperiment(dframe=train, varlist=colnames(train),outcomename = "TARGET",
outcometarget=1, rareCount=2, scale = F,
ncross = 5, parallelCluster=cl)
dTrainCTreated <- prep$crossFrame
treatments <- prep$treatments
treatmentsSF <- treatments$scoreFrame
dTestCTreated <- vtreat::prepare(treatments,test,pruneSig=0.5,scale=F)
## Warning in vtreat::prepare(treatments, test, pruneSig = 0.5, scale = F):
## variable imp_aport_var33_hace3 expected type/class integer integer saw
## double numeric
## Warning in vtreat::prepare(treatments, test, pruneSig = 0.5, scale = F):
## variable imp_aport_var33_ult1 expected type/class integer integer saw
## double numeric
# limit train columns to those found significant by prepare function given threshold + the TARGET variable
dTrainCTreated <- dTrainCTreated[, c(colnames(dTestCTreated), "TARGET")]
train <- dTrainCTreated
test <- dTestCTreated
rm(dTrainCTreated, dTestCTreated, treatments, treatmentsSF)
stopCluster(cl) # release the parallel workers started for vtreat
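The treated numeric variables show up later under names such as var38_clean and var38_isBAD. The base-R sketch below illustrates the general idea behind such a pair (a filled-in copy plus a missingness indicator) on made-up numbers. It is only an illustration of the concept, not vtreat's implementation, and the helper `clean_with_indicator` is hypothetical.

```r
# Hypothetical helper: mean-impute a numeric column and flag what was missing
clean_with_indicator <- function(x, fill = mean(x, na.rm = TRUE)) {
  data.frame(clean = ifelse(is.na(x), fill, x),
             isBAD = as.numeric(is.na(x)))
}

# Made-up values
toy_train <- c(10, NA, 30, 20)
toy_test  <- c(NA, 40)

# Learn the fill value on the training data only, then reuse it on the test data
fill_value <- mean(toy_train, na.rm = TRUE)
tr <- clean_with_indicator(toy_train, fill_value)
te <- clean_with_indicator(toy_test,  fill_value)
```

Reusing the training fill value for the test set mirrors the way the treatment plan above is designed on train and then applied to test with `prepare`.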
We will now build a simple xgboost model that we can cross-validate. The data is quite noisy and the model is difficult to tune. This competition is evaluated using the area under the ROC curve (“auc”), but we will also compute other classification metrics available through the rms package, including the Brier score, to make cross-validation more reliable.
See the rms reference manual on CRAN for details of the val.prob function.
trainLGOCVxgb_C <- function(train, valid){
  train.y <- train[, 'TARGET']
  valid.y <- valid[, 'TARGET']
  # Sparse design matrices without an intercept column
  train <- sparse.model.matrix(TARGET ~ .-1, data = train)
  valid <- sparse.model.matrix(TARGET ~ .-1, data = valid)
  dtrain <- xgb.DMatrix(train, label = train.y)
  dvalid <- xgb.DMatrix(valid, label = valid.y)
  # The validation set comes first so that early stopping monitors it
  watchlist <- list(valid = dvalid, train = dtrain)
  params <- list(booster = "gbtree", objective = "binary:logistic", eval_metric = "auc"
                 , max_depth = 5
                 , eta = 0.04
                 , colsample_bytree = 0.75
                 , subsample = 0.75
  )
  set.seed(1234)
  # early.stop.round and print.every.n are the argument names of the xgboost
  # release used here; later releases call them early_stopping_rounds and print_every_n
  model <- xgb.train(params = params
                     , data = dtrain
                     , nrounds = 560
                     , verbose = 1
                     , early.stop.round = 40
                     , watchlist = watchlist
                     , print.every.n = 20
  )
  pred <- predict(model, dvalid)
  # Discrimination and calibration statistics on the hold-out fold
  model$valprob <- rms::val.prob(pred, valid.y)
  return(model)
}
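To make the metrics reported by val.prob more concrete, the sketch below computes three of them by hand in base R on made-up labels and probabilities: the Brier score, the C index (area under the ROC curve) via the rank-sum formula, and Somers' Dxy. The rank-sum formula as written assumes there are no tied predictions across the two classes.

```r
# Toy labels and predicted probabilities (made-up values)
y <- c(0, 0, 1, 1)
p <- c(0.1, 0.4, 0.35, 0.8)

# Brier score: mean squared difference between probability and outcome
brier <- mean((p - y)^2)

# C index (area under the ROC curve) via the rank-sum formula
n1 <- sum(y == 1)
n0 <- sum(y == 0)
auc <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Somers' Dxy is a rescaling of the C index
dxy <- 2 * (auc - 0.5)
```

The same relationship can be read off the cross-validation summary further down: a mean C (ROC) of 0.8406 corresponds to a mean Dxy of 2 * (0.8406 - 0.5) = 0.6812.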
A similar function is built for making the final predictions. The number of rounds is chosen based on the cross-validated early stopping.
trainPredxgb_C <- function(train){
  train.y <- train[, 'TARGET']
  # Matrix
  train <- sparse.model.matrix(TARGET ~ .-1, data = train)
  dtrain <- xgb.DMatrix(data=train, label = train.y)
  watchlist <- list(train = dtrain)
  params <- list(booster = "gbtree", objective = "binary:logistic", eval_metric = "auc"
                 , max_depth = 5
                 , eta = 0.04
                 , colsample_bytree = 0.75
                 , subsample = 0.75
  )
  set.seed(1234)
  model <- xgb.train(params = params
                     , data = dtrain
                     , nrounds = 300
                     , verbose = 1
                     , watchlist = watchlist
                     , print.every.n = 20
  )
  return(model)
}
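One way to turn early-stopping results into a fixed number of rounds is sketched below, using the best iterations printed in the cross-validation log later in this document. The scaling step is a common heuristic and an assumption on our part; the text does not say how the value of 300 used in the function above was derived.

```r
# Best iterations reported by early stopping in the ten resampling rounds
best_iter <- c(249, 145, 121, 204, 380, 181, 173, 195, 129, 219)

mean_iter   <- mean(best_iter)
median_iter <- median(best_iter)

# Assumed heuristic (not from the original): with k-fold CV each model sees
# (k - 1) / k of the rows, so scale the round count up for the full data set
k <- 10
scaled_iter <- round(mean_iter * k / (k - 1))
```

The spread of the best iterations (from 121 to 380) is wide, which is consistent with the remark that the data is noisy and the model is difficult to tune.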
We will then proceed to modeling the customer satisfaction (TARGET variable) through 10-fold cross-validation. We will collect the classification measures from the folds to trace the mean and SD of scores across the resampling rounds.
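The fold object used below is a list in which each element holds the training row indices of one resampling round, and the hold-out rows are the complement. The base-R sketch below mimics that shape on toy indices. The helper `make_train_folds` is hypothetical and, unlike caret, it does not try to balance the outcome across folds.

```r
# Hypothetical base-R helper mimicking the shape of caret::createMultiFolds
# output: a list whose elements are the TRAINING row indices of each fold.
# (This sketch does not stratify by the outcome.)
make_train_folds <- function(n, k, seed = 120) {
  set.seed(seed)
  fold_id <- sample(rep(seq_len(k), length.out = n))
  lapply(seq_len(k), function(i) which(fold_id != i))
}

n <- 20
toy_folds <- make_train_folds(n, k = 5)

# Hold-out rows are the complement of the training indices
holdout <- lapply(toy_folds, function(idx) setdiff(seq_len(n), idx))
```

Each row appears in exactly one hold-out set, so across the rounds every observation is scored once by a model that did not see it.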
# perform cross-validation
set.seed(120)
k=10 #7
t=1 #2
folds <- createMultiFolds(train[, "TARGET"], k = k, times=t)
r <- k*t
CVpred <-NULL
CVlabels <- NULL
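CVvalprob <- matrix(NA, nrow=r, ncol=17) # one row per resampling round, one column per val.prob statistic
for (i in 1:r) { # start CV
  cat("\nResampling round", i,"\n")
  # folds[[i]] holds the training rows; the remaining rows form the validation set
  m2m <- trainLGOCVxgb_C(train=train[folds[[i]],] , valid=train[-folds[[i]],])
  CVvalprob[i,] <- m2m$valprob
  colnames(CVvalprob) <- names(m2m$valprob)
} # end of CV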
##
## Resampling round 1
## Warning in xgb.train(params = params, data = dtrain, nrounds = 560, verbose
## = 1, : Only the first data set in watchlist is used for early stopping
## process.
## [0] valid-auc:0.798390 train-auc:0.812304
## [20] valid-auc:0.827746 train-auc:0.840313
## [40] valid-auc:0.832511 train-auc:0.846894
## [60] valid-auc:0.838726 train-auc:0.853556
## [80] valid-auc:0.842743 train-auc:0.858556
## [100] valid-auc:0.845698 train-auc:0.864177
## [120] valid-auc:0.846751 train-auc:0.869149
## [140] valid-auc:0.847907 train-auc:0.873286
## [160] valid-auc:0.848563 train-auc:0.876437
## [180] valid-auc:0.849196 train-auc:0.879360
## [200] valid-auc:0.849258 train-auc:0.882344
## [220] valid-auc:0.849378 train-auc:0.885014
## [240] valid-auc:0.849562 train-auc:0.887074
## [260] valid-auc:0.849767 train-auc:0.889517
## [280] valid-auc:0.849651 train-auc:0.891901
## Stopping. Best iteration: 249
##
## Resampling round 2
## Warning in xgb.train(params = params, data = dtrain, nrounds = 560, verbose
## = 1, : Only the first data set in watchlist is used for early stopping
## process.
## [0] valid-auc:0.779119 train-auc:0.815678
## [20] valid-auc:0.810477 train-auc:0.846187
## [40] valid-auc:0.810584 train-auc:0.851360
## [60] valid-auc:0.815591 train-auc:0.857757
## [80] valid-auc:0.818495 train-auc:0.862582
## [100] valid-auc:0.819811 train-auc:0.867476
## [120] valid-auc:0.820694 train-auc:0.871634
## [140] valid-auc:0.821022 train-auc:0.875537
## [160] valid-auc:0.820999 train-auc:0.879030
## [180] valid-auc:0.820733 train-auc:0.882312
## Stopping. Best iteration: 145
##
## Resampling round 3
## Warning in xgb.train(params = params, data = dtrain, nrounds = 560, verbose
## = 1, : Only the first data set in watchlist is used for early stopping
## process.
## [0] valid-auc:0.806939 train-auc:0.815943
## [20] valid-auc:0.835925 train-auc:0.841134
## [40] valid-auc:0.837040 train-auc:0.848050
## [60] valid-auc:0.841376 train-auc:0.853647
## [80] valid-auc:0.843213 train-auc:0.859292
## [100] valid-auc:0.843224 train-auc:0.864974
## [120] valid-auc:0.844314 train-auc:0.869554
## [140] valid-auc:0.843719 train-auc:0.873451
## [160] valid-auc:0.843800 train-auc:0.877286
## Stopping. Best iteration: 121
##
## Resampling round 4
## Warning in xgb.train(params = params, data = dtrain, nrounds = 560, verbose
## = 1, : Only the first data set in watchlist is used for early stopping
## process.
## [0] valid-auc:0.784890 train-auc:0.812542
## [20] valid-auc:0.819146 train-auc:0.842661
## [40] valid-auc:0.818752 train-auc:0.847750
## [60] valid-auc:0.825504 train-auc:0.854882
## [80] valid-auc:0.829247 train-auc:0.860129
## [100] valid-auc:0.832721 train-auc:0.865246
## [120] valid-auc:0.834451 train-auc:0.869740
## [140] valid-auc:0.836093 train-auc:0.874329
## [160] valid-auc:0.837631 train-auc:0.877452
## [180] valid-auc:0.837950 train-auc:0.880554
## [200] valid-auc:0.837873 train-auc:0.883378
## [220] valid-auc:0.837902 train-auc:0.885463
## [240] valid-auc:0.837863 train-auc:0.888463
## Stopping. Best iteration: 204
##
## Resampling round 5
## Warning in xgb.train(params = params, data = dtrain, nrounds = 560, verbose
## = 1, : Only the first data set in watchlist is used for early stopping
## process.
## [0] valid-auc:0.814316 train-auc:0.809990
## [20] valid-auc:0.840854 train-auc:0.839184
## [40] valid-auc:0.847417 train-auc:0.847449
## [60] valid-auc:0.848482 train-auc:0.853322
## [80] valid-auc:0.851602 train-auc:0.858589
## [100] valid-auc:0.852320 train-auc:0.863457
## [120] valid-auc:0.853293 train-auc:0.867730
## [140] valid-auc:0.854684 train-auc:0.871549
## [160] valid-auc:0.855598 train-auc:0.875628
## [180] valid-auc:0.855792 train-auc:0.879188
## [200] valid-auc:0.856465 train-auc:0.882226
## [220] valid-auc:0.857304 train-auc:0.884958
## [240] valid-auc:0.858117 train-auc:0.887320
## [260] valid-auc:0.858596 train-auc:0.889450
## [280] valid-auc:0.858987 train-auc:0.891753
## [300] valid-auc:0.859531 train-auc:0.893647
## [320] valid-auc:0.859490 train-auc:0.895716
## [340] valid-auc:0.859625 train-auc:0.897530
## [360] valid-auc:0.860391 train-auc:0.899594
## [380] valid-auc:0.860532 train-auc:0.901476
## [400] valid-auc:0.860337 train-auc:0.903417
## Stopping. Best iteration: 380
##
## Resampling round 6
## Warning in xgb.train(params = params, data = dtrain, nrounds = 560, verbose
## = 1, : Only the first data set in watchlist is used for early stopping
## process.
## [0] valid-auc:0.800968 train-auc:0.811410
## [20] valid-auc:0.827667 train-auc:0.842045
## [40] valid-auc:0.833039 train-auc:0.849013
## [60] valid-auc:0.839367 train-auc:0.855348
## [80] valid-auc:0.839773 train-auc:0.860723
## [100] valid-auc:0.839713 train-auc:0.865577
## [120] valid-auc:0.840553 train-auc:0.869804
## [140] valid-auc:0.841513 train-auc:0.873576
## [160] valid-auc:0.842456 train-auc:0.877239
## [180] valid-auc:0.842921 train-auc:0.880588
## [200] valid-auc:0.842224 train-auc:0.883314
## [220] valid-auc:0.841925 train-auc:0.886193
## Stopping. Best iteration: 181
##
## Resampling round 7
## Warning in xgb.train(params = params, data = dtrain, nrounds = 560, verbose
## = 1, : Only the first data set in watchlist is used for early stopping
## process.
## [0] valid-auc:0.803594 train-auc:0.807368
## [20] valid-auc:0.831374 train-auc:0.841771
## [40] valid-auc:0.835528 train-auc:0.847556
## [60] valid-auc:0.839605 train-auc:0.853725
## [80] valid-auc:0.842495 train-auc:0.859265
## [100] valid-auc:0.843390 train-auc:0.864893
## [120] valid-auc:0.844821 train-auc:0.869286
## [140] valid-auc:0.845229 train-auc:0.873536
## [160] valid-auc:0.845331 train-auc:0.876599
## [180] valid-auc:0.845947 train-auc:0.880291
## [200] valid-auc:0.846009 train-auc:0.883495
## Stopping. Best iteration: 173
##
## Resampling round 8
## Warning in xgb.train(params = params, data = dtrain, nrounds = 560, verbose
## = 1, : Only the first data set in watchlist is used for early stopping
## process.
## [0] valid-auc:0.816285 train-auc:0.811096
## [20] valid-auc:0.834251 train-auc:0.841090
## [40] valid-auc:0.838270 train-auc:0.847343
## [60] valid-auc:0.841923 train-auc:0.855037
## [80] valid-auc:0.844856 train-auc:0.861027
## [100] valid-auc:0.844697 train-auc:0.866154
## [120] valid-auc:0.844838 train-auc:0.870411
## [140] valid-auc:0.845198 train-auc:0.874319
## [160] valid-auc:0.845333 train-auc:0.877221
## [180] valid-auc:0.846205 train-auc:0.880337
## [200] valid-auc:0.846576 train-auc:0.883735
## [220] valid-auc:0.846268 train-auc:0.886435
## Stopping. Best iteration: 195
##
## Resampling round 9
## Warning in xgb.train(params = params, data = dtrain, nrounds = 560, verbose
## = 1, : Only the first data set in watchlist is used for early stopping
## process.
## [0] valid-auc:0.806238 train-auc:0.817467
## [20] valid-auc:0.825399 train-auc:0.842917
## [40] valid-auc:0.825904 train-auc:0.848433
## [60] valid-auc:0.831145 train-auc:0.855515
## [80] valid-auc:0.833601 train-auc:0.860764
## [100] valid-auc:0.835537 train-auc:0.866775
## [120] valid-auc:0.835919 train-auc:0.870838
## [140] valid-auc:0.835534 train-auc:0.874806
## [160] valid-auc:0.835287 train-auc:0.877899
## Stopping. Best iteration: 129
##
## Resampling round 10
## Warning in xgb.train(params = params, data = dtrain, nrounds = 560, verbose
## = 1, : Only the first data set in watchlist is used for early stopping
## process.
## [0] valid-auc:0.796626 train-auc:0.814269
## [20] valid-auc:0.820444 train-auc:0.844613
## [40] valid-auc:0.822156 train-auc:0.849720
## [60] valid-auc:0.825449 train-auc:0.856312
## [80] valid-auc:0.826399 train-auc:0.861326
## [100] valid-auc:0.827708 train-auc:0.866078
## [120] valid-auc:0.828470 train-auc:0.870403
## [140] valid-auc:0.828681 train-auc:0.874283
## [160] valid-auc:0.829436 train-auc:0.878770
## [180] valid-auc:0.829649 train-auc:0.881454
## [200] valid-auc:0.829624 train-auc:0.884762
## [220] valid-auc:0.829941 train-auc:0.888216
## [240] valid-auc:0.829517 train-auc:0.890189
## Stopping. Best iteration: 219
cat("\n")
cat("CV score: \n")
## CV score:
scores <- rbind(apply(CVvalprob, 2, mean), apply(CVvalprob, 2, sd))
rownames(scores) <- c("mean", "sd")
print(scores)
## Dxy C (ROC) R2 D D:Chi-sq D:p U
## mean 0.68124354 0.8406218 0.2245267 0.06558355 499.56617 NA -0.0000942426
## sd 0.02187359 0.0109368 0.0156069 0.00458612 34.86369 NA 0.0001831644
## U:Chi-sq U:p Q Brier Intercept Slope
## mean 1.283568 0.6191425 0.065677795 0.0344364875 0.02368246 1.00647979
## sd 1.392416 0.2942358 0.004651485 0.0008411937 0.11705484 0.05298757
## Emax S:z S:p Eavg
## mean 0.02837047 0.1816441 0.7194494 0.0029800177
## sd 0.01764127 0.6634078 0.3113167 0.0009152832
# calculate and show feature importance from the last resampling round
f<- xgb.importance(feature_names = colnames(test), model=m2m)
head(f,50)
## Feature Gain Cover Frequence
## 1: var15_clean 0.268463490 0.194272374 0.083877129
## 2: saldo_var30_clean 0.167816916 0.128352380 0.046952093
## 3: var38_clean 0.068617748 0.072706435 0.097564858
## 4: n0_clean 0.042572843 0.020404546 0.024351424
## 5: saldo_medio_var5_hace3_clean 0.029674546 0.032244611 0.037084195
## 6: saldo_medio_var5_ult3_clean 0.029167107 0.026853052 0.043132262
## 7: saldo_medio_var5_hace2_clean 0.025264761 0.025250417 0.038994111
## 8: num_var22_ult3_clean 0.020213546 0.026112114 0.031195289
## 9: num_var45_hace3_clean 0.018668896 0.020087042 0.038994111
## 10: saldo_medio_var5_ult1_clean 0.016606118 0.011543835 0.024669744
## 11: imp_op_var41_efect_ult3_clean 0.012893958 0.014676484 0.017825879
## 12: num_var22_ult1_clean 0.012345215 0.024951802 0.018939997
## 13: mod3_clean 0.011932514 0.016813066 0.021804870
## 14: num_var45_ult1_clean 0.010877274 0.007862311 0.020690753
## 15: saldo_var42_clean 0.010711639 0.011341639 0.015120166
## 16: num_var22_hace3_clean 0.010017284 0.014607306 0.023396467
## 17: imp_op_var39_ult1_clean 0.009700368 0.012425803 0.013846888
## 18: saldo_var5_clean 0.009581209 0.010287524 0.018462518
## 19: saldo_var37_clean 0.009180095 0.010767366 0.015438485
## 20: saldo_var8_clean 0.008848297 0.031999816 0.009867898
## 21: num_var4_clean 0.008805998 0.004269736 0.002864873
## 22: var3_catB 0.008052801 0.008886556 0.013210250
## 23: imp_op_var41_ult1_clean 0.007642434 0.011678609 0.014165208
## 24: imp_trans_var37_ult1_clean 0.007521902 0.010971744 0.015438485
## 25: num_meses_var39_vig_ult3_clean 0.007122657 0.010938460 0.013687729
## 26: var3_catP 0.007077299 0.007937457 0.014642687
## 27: imp_op_var41_efect_ult1_clean 0.007062190 0.016375040 0.011141175
## 28: var36_catB 0.006478173 0.002173671 0.013846888
## 29: num_var30_0_clean 0.006469239 0.007905537 0.004138151
## 30: num_meses_var5_ult3_clean 0.006439109 0.005367138 0.003978991
## 31: num_var35_clean 0.006242707 0.008795362 0.003342352
## 32: imp_op_var39_comer_ult3_clean 0.006096856 0.008039103 0.012573611
## 33: imp_op_var39_comer_ult1_clean 0.005631149 0.009000989 0.011459494
## 34: num_var22_hace2_clean 0.004987933 0.002772061 0.010504536
## 35: imp_op_var41_comer_ult3_clean 0.004605716 0.004951422 0.008753780
## 36: var36_catP 0.004532460 0.002630844 0.012891931
## 37: imp_var43_emit_ult1_clean 0.004521308 0.005760601 0.009549578
## 38: num_ent_var16_ult1_clean 0.004448427 0.012878239 0.009072099
## 39: saldo_medio_var8_hace2_clean 0.004142743 0.008869759 0.008117141
## 40: imp_op_var39_efect_ult3_clean 0.004072686 0.004700536 0.005729747
## 41: imp_op_var41_comer_ult1_clean 0.003464496 0.004382961 0.008753780
## 42: num_op_var41_ult1_clean 0.003401376 0.002079729 0.006048066
## 43: num_med_var22_ult3_clean 0.003344084 0.003591988 0.005729747
## 44: num_op_var41_efect_ult3_clean 0.003300449 0.002933236 0.006048066
## 45: num_op_var41_hace2_clean 0.003223464 0.003387859 0.005570587
## 46: imp_op_var39_efect_ult1_clean 0.002980164 0.005178179 0.004138151
## 47: var38_isBAD 0.002955948 0.002513535 0.006525545
## 48: num_op_var41_ult3_clean 0.002891716 0.001691668 0.005570587
## 49: saldo_medio_var8_ult1_clean 0.002867156 0.005501656 0.005729747
## 50: num_var42_0_clean 0.002699580 0.004705893 0.002705714
## Feature Gain Cover Frequence
This model can now be combined with others in an ensemble for submission to the competition.
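The text does not describe how the ensemble was built, so the following is only a generic sketch of two common blending schemes, using made-up predictions from three hypothetical models: a simple average of probabilities and an average of normalised ranks.

```r
# Made-up predictions from three hypothetical models for five customers
preds <- cbind(
  model_a = c(0.02, 0.10, 0.40, 0.05, 0.90),
  model_b = c(0.01, 0.20, 0.30, 0.04, 0.70),
  model_c = c(0.03, 0.15, 0.55, 0.02, 0.80)
)

# Simple average of the predicted probabilities
blend_mean <- rowMeans(preds)

# Rank average: AUC depends only on the ordering, so averaging normalised
# ranks puts differently calibrated models on a common scale
ranks <- apply(preds, 2, rank) / nrow(preds)
blend_rank <- rowMeans(ranks)
```

Because the competition metric is AUC, only the ordering of the blended scores matters; in this toy example both schemes produce the same ordering.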