Split the dataset into training, validation, and test sets
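A minimal sketch of the split step, assuming an H2O frame named `credit_h2o`; the split ratios and seed shown here are placeholders, not the original values:

```r
library(h2o)
h2o.init()

# Split the H2O frame into three pieces; ratios and seed are illustrative only.
splits <- h2o.splitFrame(credit_h2o, ratios = c(0.6, 0.2), seed = 1234)
train <- splits[[1]]   # training frame
valid <- splits[[2]]   # validation frame
test  <- splits[[3]]   # held-out test frame
```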

Gradient Boosting Machine (GBM) Model Setup
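A sketch of the GBM fit, reconstructed from the model summary below (500 trees, depth 5). The response name `default`, the seed, and any tuning parameters not shown in the summary are assumptions rather than the original settings:

```r
# "default" is an assumed name for the binary response (no/yes).
predictors <- setdiff(names(train), "default")

gbm_fit <- h2o.gbm(
  x = predictors,
  y = "default",
  training_frame   = train,
  validation_frame = valid,
  ntrees    = 500,               # matches number_of_trees in the summary
  max_depth = 5,                 # matches min_depth/max_depth in the summary
  seed      = 1234               # placeholder seed
)

gbm_fit                          # prints the model details and metrics shown below
```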

## Model Details:
## ==============
## 
## H2OBinomialModel: gbm
## Model ID:  GBM_model_R_1558919351183_3389 
## Model Summary: 
##   number_of_trees number_of_internal_trees model_size_in_bytes min_depth
## 1             500                      500              180748         5
##   max_depth mean_depth min_leaves max_leaves mean_leaves
## 1         5    5.00000         17         31    23.96400
## 
## 
## H2OBinomialMetrics: gbm
## ** Reported on training data. **
## 
## MSE:  0.176599
## RMSE:  0.4202368
## LogLoss:  0.5373888
## Mean Per-Class Error:  0.1910963
## AUC:  0.8918551
## pr_auc:  0.8035924
## Gini:  0.7837101
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##         no yes    Error       Rate
## no     584  78 0.117825    =78/662
## yes     92 256 0.264368    =92/348
## Totals 676 334 0.168317  =170/1010
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.354626 0.750733 164
## 2                       max f2  0.290111 0.812562 291
## 3                 max f0point5  0.400812 0.820957  98
## 4                 max accuracy  0.368542 0.838614 142
## 5                max precision  0.598200 1.000000   0
## 6                   max recall  0.264495 1.000000 371
## 7              max specificity  0.598200 1.000000   0
## 8             max absolute_mcc  0.368542 0.633925 142
## 9   max min_per_class_accuracy  0.335600 0.796073 200
## 10 max mean_per_class_accuracy  0.354626 0.808904 164
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
## H2OBinomialMetrics: gbm
## ** Reported on validation data. **
## 
## MSE:  0.1901109
## RMSE:  0.4360171
## LogLoss:  0.5668623
## Mean Per-Class Error:  0.2957044
## AUC:  0.7756624
## pr_auc:  0.6490403
## Gini:  0.5513248
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##         no yes    Error      Rate
## no     156 101 0.392996  =101/257
## yes     25 101 0.198413   =25/126
## Totals 181 202 0.328982  =126/383
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.307786 0.615854 179
## 2                       max f2  0.283216 0.758755 241
## 3                 max f0point5  0.400520 0.679487  49
## 4                 max accuracy  0.400520 0.775457  49
## 5                max precision  0.598200 1.000000   0
## 6                   max recall  0.248641 1.000000 320
## 7              max specificity  0.598200 1.000000   0
## 8             max absolute_mcc  0.400520 0.460375  49
## 9   max min_per_class_accuracy  0.324252 0.690476 147
## 10 max mean_per_class_accuracy  0.307786 0.704296 179
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
## Variable Importances: 
##           variable relative_importance scaled_importance percentage
## 1      months_loan         4882.286133          1.000000   0.235244
## 2           amount         3386.633057          0.693657   0.163178
## 3  savings_balance         2718.146729          0.556736   0.130969
## 4              age         1999.284546          0.409498   0.096332
## 5          housing         1767.279297          0.361978   0.085153
## 6       employment         1442.025513          0.295359   0.069481
## 7    credit_record         1282.387573          0.262661   0.061789
## 8              job         1063.397217          0.217807   0.051238
## 9          purpose          744.287109          0.152446   0.035862
## 10    fixed_assets          668.080750          0.136838   0.032190
## 11    other_credit          353.054474          0.072313   0.017011
## 12     loans_count          272.191284          0.055751   0.013115
## 13      dependents          175.124817          0.035869   0.008438

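The class predictions below come from scoring the held-out test frame. A sketch, reusing the assumed object names from above:

```r
# h2o.predict() returns the predicted class plus the per-class probabilities.
pred <- h2o.predict(gbm_fit, newdata = test)
head(pred)
```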
##   predict        no       yes
## 1     yes 0.4819037 0.5180963
## 2      no 0.7083291 0.2916709
## 3     yes 0.6627646 0.3372354
## 4     yes 0.4709696 0.5290304
## 5     yes 0.6629977 0.3370023
## 6      no 0.7343484 0.2656516
## 
## [607 rows x 3 columns]
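The three confusion matrices that follow are evaluated at each split's F1-optimal threshold, for the training, validation, and test data in that order. One way to produce them, sketched under the same assumed names:

```r
h2o.confusionMatrix(gbm_fit, train)   # training data
h2o.confusionMatrix(gbm_fit, valid)   # validation data
h2o.confusionMatrix(gbm_fit, test)    # test data
```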
## Confusion Matrix (vertical: actual; across: predicted)  for max f1 @ threshold = 0.354625927965481:
##         no yes    Error       Rate
## no     584  78 0.117825    =78/662
## yes     92 256 0.264368    =92/348
## Totals 676 334 0.168317  =170/1010
## Confusion Matrix (vertical: actual; across: predicted)  for max f1 @ threshold = 0.307786176353324:
##         no yes    Error      Rate
## no     156 101 0.392996  =101/257
## yes     25 101 0.198413   =25/126
## Totals 181 202 0.328982  =126/383
## Confusion Matrix (vertical: actual; across: predicted)  for max f1 @ threshold = 0.32610821859585:
##         no yes    Error      Rate
## no     275 113 0.291237  =113/388
## yes     62 157 0.283105   =62/219
## Totals 337 270 0.288303  =175/607

AUC (Area Under the ROC Curve) for the training, validation, and test sets:
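A sketch of one way to obtain these values in H2O, reusing the assumed names from above; the original may have used a different routine, since the training value differs slightly from the AUC reported in the model output:

```r
h2o.auc(h2o.performance(gbm_fit, newdata = train))   # training AUC
h2o.auc(h2o.performance(gbm_fit, newdata = valid))   # validation AUC
h2o.auc(h2o.performance(gbm_fit, newdata = test))    # test AUC
```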

## [1] 0.8918485
## [1] 0.7756624
## [1] 0.7750788