application_train.head(5).T  # transposed so that each column gets its own line
                                                          0                              1                              2                              3                              4
SK_ID_CURR                                           100002                         100003                         100004                         100006                         100007
TARGET                                                    1                              0                              0                              0                              0
NAME_CONTRACT_TYPE                               Cash loans                     Cash loans                Revolving loans                     Cash loans                     Cash loans
CODE_GENDER                                               M                              F                              M                              F                              M
FLAG_OWN_CAR                                              N                              N                              Y                              N                              N
FLAG_OWN_REALTY                                           Y                              N                              Y                              Y                              Y
CNT_CHILDREN                                              0                              0                              0                              0                              0
AMT_INCOME_TOTAL                                   202500.0                       270000.0                        67500.0                       135000.0                       121500.0
AMT_CREDIT                                         406597.5                      1293502.5                       135000.0                       312682.5                       513000.0
AMT_ANNUITY                                         24700.5                        35698.5                         6750.0                        29686.5                        21865.5
AMT_GOODS_PRICE                                    351000.0                      1129500.0                       135000.0                       297000.0                       513000.0
NAME_TYPE_SUITE                               Unaccompanied                         Family                  Unaccompanied                  Unaccompanied                  Unaccompanied
NAME_INCOME_TYPE                                    Working                  State servant                        Working                        Working                        Working
NAME_EDUCATION_TYPE           Secondary / secondary special               Higher education  Secondary / secondary special  Secondary / secondary special  Secondary / secondary special
NAME_FAMILY_STATUS                     Single / not married                        Married           Single / not married                 Civil marriage           Single / not married
NAME_HOUSING_TYPE                         House / apartment              House / apartment              House / apartment              House / apartment              House / apartment
REGION_POPULATION_RELATIVE                         0.018801                       0.003541                       0.010032                       0.008019                       0.028663
DAYS_BIRTH                                            -9461                         -16765                         -19046                         -19005                         -19932
DAYS_EMPLOYED                                          -637                          -1188                           -225                          -3039                          -3038
DAYS_REGISTRATION                                   -3648.0                        -1186.0                        -4260.0                        -9833.0                        -4311.0
DAYS_ID_PUBLISH                                       -2120                           -291                          -2531                          -2437                          -3458
OWN_CAR_AGE                                             NaN                            NaN                           26.0                            NaN                            NaN
FLAG_MOBIL                                                1                              1                              1                              1                              1
FLAG_EMP_PHONE                                            1                              1                              1                              1                              1
FLAG_WORK_PHONE                                           0                              0                              1                              0                              0
FLAG_CONT_MOBILE                                          1                              1                              1                              1                              1
FLAG_PHONE                                                1                              1                              1                              0                              0
FLAG_EMAIL                                                0                              0                              0                              0                              0
OCCUPATION_TYPE                                    Laborers                     Core staff                       Laborers                       Laborers                     Core staff
CNT_FAM_MEMBERS                                         1.0                            2.0                            1.0                            2.0                            1.0
REGION_RATING_CLIENT                                      2                              1                              2                              2                              2
REGION_RATING_CLIENT_W_CITY                               2                              1                              2                              2                              2
WEEKDAY_APPR_PROCESS_START                        WEDNESDAY                         MONDAY                         MONDAY                      WEDNESDAY                       THURSDAY
HOUR_APPR_PROCESS_START                                  10                             11                              9                             17                             11
REG_REGION_NOT_LIVE_REGION                                0                              0                              0                              0                              0
REG_REGION_NOT_WORK_REGION                                0                              0                              0                              0                              0
LIVE_REGION_NOT_WORK_REGION                               0                              0                              0                              0                              0
REG_CITY_NOT_LIVE_CITY                                    0                              0                              0                              0                              0
REG_CITY_NOT_WORK_CITY                                    0                              0                              0                              0                              1
LIVE_CITY_NOT_WORK_CITY                                   0                              0                              0                              0                              1
ORGANIZATION_TYPE                    Business Entity Type 3                         School                     Government         Business Entity Type 3                       Religion
EXT_SOURCE_1                                       0.083037                       0.311267                            NaN                            NaN                            NaN
EXT_SOURCE_2                                       0.262949                       0.622246                       0.555912                       0.650442                       0.322738
EXT_SOURCE_3                                       0.139376                            NaN                       0.729567                            NaN                            NaN
APARTMENTS_AVG                                       0.0247                         0.0959                            NaN                            NaN                            NaN
BASEMENTAREA_AVG                                     0.0369                         0.0529                            NaN                            NaN                            NaN
YEARS_BEGINEXPLUATATION_AVG                          0.9722                         0.9851                            NaN                            NaN                            NaN
YEARS_BUILD_AVG                                      0.6192                         0.7960                            NaN                            NaN                            NaN
COMMONAREA_AVG                                       0.0143                         0.0605                            NaN                            NaN                            NaN
ELEVATORS_AVG                                          0.00                           0.08                            NaN                            NaN                            NaN
ENTRANCES_AVG                                        0.0690                         0.0345                            NaN                            NaN                            NaN
FLOORSMAX_AVG                                        0.0833                         0.2917                            NaN                            NaN                            NaN
FLOORSMIN_AVG                                        0.1250                         0.3333                            NaN                            NaN                            NaN
LANDAREA_AVG                                         0.0369                         0.0130                            NaN                            NaN                            NaN
LIVINGAPARTMENTS_AVG                                 0.0202                         0.0773                            NaN                            NaN                            NaN
LIVINGAREA_AVG                                       0.0190                         0.0549                            NaN                            NaN                            NaN
NONLIVINGAPARTMENTS_AVG                              0.0000                         0.0039                            NaN                            NaN                            NaN
NONLIVINGAREA_AVG                                    0.0000                         0.0098                            NaN                            NaN                            NaN
APARTMENTS_MODE                                      0.0252                         0.0924                            NaN                            NaN                            NaN
BASEMENTAREA_MODE                                    0.0383                         0.0538                            NaN                            NaN                            NaN
YEARS_BEGINEXPLUATATION_MODE                         0.9722                         0.9851                            NaN                            NaN                            NaN
YEARS_BUILD_MODE                                     0.6341                         0.8040                            NaN                            NaN                            NaN
COMMONAREA_MODE                                      0.0144                         0.0497                            NaN                            NaN                            NaN
ELEVATORS_MODE                                       0.0000                         0.0806                            NaN                            NaN                            NaN
ENTRANCES_MODE                                       0.0690                         0.0345                            NaN                            NaN                            NaN
FLOORSMAX_MODE                                       0.0833                         0.2917                            NaN                            NaN                            NaN
FLOORSMIN_MODE                                       0.1250                         0.3333                            NaN                            NaN                            NaN
LANDAREA_MODE                                        0.0377                         0.0128                            NaN                            NaN                            NaN
LIVINGAPARTMENTS_MODE                                 0.022                          0.079                            NaN                            NaN                            NaN
LIVINGAREA_MODE                                      0.0198                         0.0554                            NaN                            NaN                            NaN
NONLIVINGAPARTMENTS_MODE                                0.0                            0.0                            NaN                            NaN                            NaN
NONLIVINGAREA_MODE                                      0.0                            0.0                            NaN                            NaN                            NaN
APARTMENTS_MEDI                                      0.0250                         0.0968                            NaN                            NaN                            NaN
BASEMENTAREA_MEDI                                    0.0369                         0.0529                            NaN                            NaN                            NaN
YEARS_BEGINEXPLUATATION_MEDI                         0.9722                         0.9851                            NaN                            NaN                            NaN
YEARS_BUILD_MEDI                                     0.6243                         0.7987                            NaN                            NaN                            NaN
COMMONAREA_MEDI                                      0.0144                         0.0608                            NaN                            NaN                            NaN
ELEVATORS_MEDI                                         0.00                           0.08                            NaN                            NaN                            NaN
ENTRANCES_MEDI                                       0.0690                         0.0345                            NaN                            NaN                            NaN
FLOORSMAX_MEDI                                       0.0833                         0.2917                            NaN                            NaN                            NaN
FLOORSMIN_MEDI                                       0.1250                         0.3333                            NaN                            NaN                            NaN
LANDAREA_MEDI                                        0.0375                         0.0132                            NaN                            NaN                            NaN
LIVINGAPARTMENTS_MEDI                                0.0205                         0.0787                            NaN                            NaN                            NaN
LIVINGAREA_MEDI                                      0.0193                         0.0558                            NaN                            NaN                            NaN
NONLIVINGAPARTMENTS_MEDI                             0.0000                         0.0039                            NaN                            NaN                            NaN
NONLIVINGAREA_MEDI                                     0.00                           0.01                            NaN                            NaN                            NaN
FONDKAPREMONT_MODE                         reg oper account               reg oper account                            NaN                            NaN                            NaN
HOUSETYPE_MODE                               block of flats                 block of flats                            NaN                            NaN                            NaN
TOTALAREA_MODE                                       0.0149                         0.0714                            NaN                            NaN                            NaN
WALLSMATERIAL_MODE                             Stone, brick                          Block                            NaN                            NaN                            NaN
EMERGENCYSTATE_MODE                                      No                             No                            NaN                            NaN                            NaN
OBS_30_CNT_SOCIAL_CIRCLE                                2.0                            1.0                            0.0                            2.0                            0.0
DEF_30_CNT_SOCIAL_CIRCLE                                2.0                            0.0                            0.0                            0.0                            0.0
OBS_60_CNT_SOCIAL_CIRCLE                                2.0                            1.0                            0.0                            2.0                            0.0
DEF_60_CNT_SOCIAL_CIRCLE                                2.0                            0.0                            0.0                            0.0                            0.0
DAYS_LAST_PHONE_CHANGE                              -1134.0                         -828.0                         -815.0                         -617.0                        -1106.0
FLAG_DOCUMENT_2                                           0                              0                              0                              0                              0
FLAG_DOCUMENT_3                                           1                              1                              0                              1                              0
FLAG_DOCUMENT_4                                           0                              0                              0                              0                              0
FLAG_DOCUMENT_5                                           0                              0                              0                              0                              0
FLAG_DOCUMENT_6                                           0                              0                              0                              0                              0
FLAG_DOCUMENT_7                                           0                              0                              0                              0                              0
FLAG_DOCUMENT_8                                           0                              0                              0                              0                              1
FLAG_DOCUMENT_9                                           0                              0                              0                              0                              0
FLAG_DOCUMENT_10                                          0                              0                              0                              0                              0
FLAG_DOCUMENT_11                                          0                              0                              0                              0                              0
FLAG_DOCUMENT_12                                          0                              0                              0                              0                              0
FLAG_DOCUMENT_13                                          0                              0                              0                              0                              0
FLAG_DOCUMENT_14                                          0                              0                              0                              0                              0
FLAG_DOCUMENT_15                                          0                              0                              0                              0                              0
FLAG_DOCUMENT_16                                          0                              0                              0                              0                              0
FLAG_DOCUMENT_17                                          0                              0                              0                              0                              0
FLAG_DOCUMENT_18                                          0                              0                              0                              0                              0
FLAG_DOCUMENT_19                                          0                              0                              0                              0                              0
FLAG_DOCUMENT_20                                          0                              0                              0                              0                              0
FLAG_DOCUMENT_21                                          0                              0                              0                              0                              0
AMT_REQ_CREDIT_BUREAU_HOUR                              0.0                            0.0                            0.0                            NaN                            0.0
AMT_REQ_CREDIT_BUREAU_DAY                               0.0                            0.0                            0.0                            NaN                            0.0
AMT_REQ_CREDIT_BUREAU_WEEK                              0.0                            0.0                            0.0                            NaN                            0.0
AMT_REQ_CREDIT_BUREAU_MON                               0.0                            0.0                            0.0                            NaN                            0.0
AMT_REQ_CREDIT_BUREAU_QRT                               0.0                            0.0                            0.0                            NaN                            0.0
AMT_REQ_CREDIT_BUREAU_YEAR                              1.0                            0.0                            0.0                            NaN                            0.0
|
application_train.columns.values
array(['SK_ID_CURR', 'TARGET', 'NAME_CONTRACT_TYPE', 'CODE_GENDER',
'FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'CNT_CHILDREN',
'AMT_INCOME_TOTAL', 'AMT_CREDIT', 'AMT_ANNUITY', 'AMT_GOODS_PRICE',
'NAME_TYPE_SUITE', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE',
'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE',
'REGION_POPULATION_RELATIVE', 'DAYS_BIRTH', 'DAYS_EMPLOYED',
'DAYS_REGISTRATION', 'DAYS_ID_PUBLISH', 'OWN_CAR_AGE',
'FLAG_MOBIL', 'FLAG_EMP_PHONE', 'FLAG_WORK_PHONE',
'FLAG_CONT_MOBILE', 'FLAG_PHONE', 'FLAG_EMAIL', 'OCCUPATION_TYPE',
'CNT_FAM_MEMBERS', 'REGION_RATING_CLIENT',
'REGION_RATING_CLIENT_W_CITY', 'WEEKDAY_APPR_PROCESS_START',
'HOUR_APPR_PROCESS_START', 'REG_REGION_NOT_LIVE_REGION',
'REG_REGION_NOT_WORK_REGION', 'LIVE_REGION_NOT_WORK_REGION',
'REG_CITY_NOT_LIVE_CITY', 'REG_CITY_NOT_WORK_CITY',
'LIVE_CITY_NOT_WORK_CITY', 'ORGANIZATION_TYPE', 'EXT_SOURCE_1',
'EXT_SOURCE_2', 'EXT_SOURCE_3', 'APARTMENTS_AVG',
'BASEMENTAREA_AVG', 'YEARS_BEGINEXPLUATATION_AVG',
'YEARS_BUILD_AVG', 'COMMONAREA_AVG', 'ELEVATORS_AVG',
'ENTRANCES_AVG', 'FLOORSMAX_AVG', 'FLOORSMIN_AVG', 'LANDAREA_AVG',
'LIVINGAPARTMENTS_AVG', 'LIVINGAREA_AVG',
'NONLIVINGAPARTMENTS_AVG', 'NONLIVINGAREA_AVG', 'APARTMENTS_MODE',
'BASEMENTAREA_MODE', 'YEARS_BEGINEXPLUATATION_MODE',
'YEARS_BUILD_MODE', 'COMMONAREA_MODE', 'ELEVATORS_MODE',
'ENTRANCES_MODE', 'FLOORSMAX_MODE', 'FLOORSMIN_MODE',
'LANDAREA_MODE', 'LIVINGAPARTMENTS_MODE', 'LIVINGAREA_MODE',
'NONLIVINGAPARTMENTS_MODE', 'NONLIVINGAREA_MODE',
'APARTMENTS_MEDI', 'BASEMENTAREA_MEDI',
'YEARS_BEGINEXPLUATATION_MEDI', 'YEARS_BUILD_MEDI',
'COMMONAREA_MEDI', 'ELEVATORS_MEDI', 'ENTRANCES_MEDI',
'FLOORSMAX_MEDI', 'FLOORSMIN_MEDI', 'LANDAREA_MEDI',
'LIVINGAPARTMENTS_MEDI', 'LIVINGAREA_MEDI',
'NONLIVINGAPARTMENTS_MEDI', 'NONLIVINGAREA_MEDI',
'FONDKAPREMONT_MODE', 'HOUSETYPE_MODE', 'TOTALAREA_MODE',
'WALLSMATERIAL_MODE', 'EMERGENCYSTATE_MODE',
'OBS_30_CNT_SOCIAL_CIRCLE', 'DEF_30_CNT_SOCIAL_CIRCLE',
'OBS_60_CNT_SOCIAL_CIRCLE', 'DEF_60_CNT_SOCIAL_CIRCLE',
'DAYS_LAST_PHONE_CHANGE', 'FLAG_DOCUMENT_2', 'FLAG_DOCUMENT_3',
'FLAG_DOCUMENT_4', 'FLAG_DOCUMENT_5', 'FLAG_DOCUMENT_6',
'FLAG_DOCUMENT_7', 'FLAG_DOCUMENT_8', 'FLAG_DOCUMENT_9',
'FLAG_DOCUMENT_10', 'FLAG_DOCUMENT_11', 'FLAG_DOCUMENT_12',
'FLAG_DOCUMENT_13', 'FLAG_DOCUMENT_14', 'FLAG_DOCUMENT_15',
'FLAG_DOCUMENT_16', 'FLAG_DOCUMENT_17', 'FLAG_DOCUMENT_18',
'FLAG_DOCUMENT_19', 'FLAG_DOCUMENT_20', 'FLAG_DOCUMENT_21',
'AMT_REQ_CREDIT_BUREAU_HOUR', 'AMT_REQ_CREDIT_BUREAU_DAY',
'AMT_REQ_CREDIT_BUREAU_WEEK', 'AMT_REQ_CREDIT_BUREAU_MON',
'AMT_REQ_CREDIT_BUREAU_QRT', 'AMT_REQ_CREDIT_BUREAU_YEAR'],
dtype=object)
import pandas as pd

def find_missing(data):
    # number of missing values in each column
    count_missing = data.isnull().sum().values
    # total number of records
    total = data.shape[0]
    # fraction (0 to 1) of records that are missing in each column
    ratio_missing = count_missing / total
    # return a DataFrame indexed by feature name with the count and the fraction of missing values
    return pd.DataFrame(data={'missing_count': count_missing, 'missing_ratio': ratio_missing},
                        index=data.columns.values)
find_missing(application_train).head(12)
                    missing_count  missing_ratio
SK_ID_CURR                      0       0.000000
TARGET                          0       0.000000
NAME_CONTRACT_TYPE              0       0.000000
CODE_GENDER                     0       0.000000
FLAG_OWN_CAR                    0       0.000000
FLAG_OWN_REALTY                 0       0.000000
CNT_CHILDREN                    0       0.000000
AMT_INCOME_TOTAL                0       0.000000
AMT_CREDIT                      0       0.000000
AMT_ANNUITY                    12       0.000039
AMT_GOODS_PRICE               278       0.000904
NAME_TYPE_SUITE              1292       0.004201
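`head(12)` keeps the original column order, so the most incomplete columns do not necessarily appear in the summary above. Below is a minimal sketch that ranks columns by their missing ratio instead. It assumes only pandas and NumPy. `top_missing` and the toy frame are hypothetical illustrations and are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def top_missing(data: pd.DataFrame, n: int = 12) -> pd.DataFrame:
    """Hypothetical helper: the n columns with the highest share of missing values."""
    is_missing = data.isnull()
    summary = pd.DataFrame({
        'missing_count': is_missing.sum(),   # number of NaN per column
        'missing_ratio': is_missing.mean(),  # fraction of rows that are NaN
    })
    # most incomplete columns first
    return summary.sort_values('missing_ratio', ascending=False).head(n)


# small made-up frame standing in for application_train
toy = pd.DataFrame({
    'SK_ID_CURR': [100002, 100003, 100004, 100006],
    'AMT_ANNUITY': [24700.5, np.nan, 6750.0, 29686.5],
    'OWN_CAR_AGE': [np.nan, np.nan, 26.0, np.nan],
})
print(top_missing(toy, n=2))
```

Applied to `application_train` or to any of the other tables, the helper would list the columns with the largest share of NaN first. That ordering is often a more useful starting point for deciding what to impute or drop.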
POS_CASH_balance.head()
   SK_ID_PREV  SK_ID_CURR  MONTHS_BALANCE  CNT_INSTALMENT  CNT_INSTALMENT_FUTURE  NAME_CONTRACT_STATUS  SK_DPD  SK_DPD_DEF
0     1803195      182943             -31            48.0                   45.0                Active       0           0
1     1715348      367990             -33            36.0                   35.0                Active       0           0
2     1784872      397406             -32            12.0                    9.0                Active       0           0
3     1903291      269225             -35            48.0                   42.0                Active       0           0
4     2341044      334279             -35            36.0                   35.0                Active       0           0
POS_CASH_balance.columns.values
array(['SK_ID_PREV', 'SK_ID_CURR', 'MONTHS_BALANCE', 'CNT_INSTALMENT',
'CNT_INSTALMENT_FUTURE', 'NAME_CONTRACT_STATUS', 'SK_DPD',
'SK_DPD_DEF'], dtype=object)
find_missing(POS_CASH_balance).head(12)
                       missing_count  missing_ratio
SK_ID_PREV                         0       0.000000
SK_ID_CURR                         0       0.000000
MONTHS_BALANCE                     0       0.000000
CNT_INSTALMENT                 26071       0.002607
CNT_INSTALMENT_FUTURE          26087       0.002608
NAME_CONTRACT_STATUS               0       0.000000
SK_DPD                             0       0.000000
SK_DPD_DEF                         0       0.000000
bureau.head().T  # transposed so that each column gets its own line
                                      0                1                2                3                4
SK_ID_CURR                       215354           215354           215354           215354           215354
SK_ID_BUREAU                    5714462          5714463          5714464          5714465          5714466
CREDIT_ACTIVE                    Closed           Active           Active           Active           Active
CREDIT_CURRENCY              currency 1       currency 1       currency 1       currency 1       currency 1
DAYS_CREDIT                        -497             -208             -203             -203             -629
CREDIT_DAY_OVERDUE                    0                0                0                0                0
DAYS_CREDIT_ENDDATE              -153.0           1075.0            528.0              NaN           1197.0
DAYS_ENDDATE_FACT                -153.0              NaN              NaN              NaN              NaN
AMT_CREDIT_MAX_OVERDUE              NaN              NaN              NaN              NaN          77674.5
CNT_CREDIT_PROLONG                    0                0                0                0                0
AMT_CREDIT_SUM                  91323.0         225000.0         464323.5          90000.0        2700000.0
AMT_CREDIT_SUM_DEBT                 0.0         171342.0              NaN              NaN              NaN
AMT_CREDIT_SUM_LIMIT                NaN              NaN              NaN              NaN              NaN
AMT_CREDIT_SUM_OVERDUE              0.0              0.0              0.0              0.0              0.0
CREDIT_TYPE             Consumer credit      Credit card  Consumer credit      Credit card  Consumer credit
DAYS_CREDIT_UPDATE                 -131              -20              -16              -16              -21
AMT_ANNUITY                         NaN              NaN              NaN              NaN              NaN
bureau.columns.values
array(['SK_ID_CURR', 'SK_ID_BUREAU', 'CREDIT_ACTIVE', 'CREDIT_CURRENCY',
'DAYS_CREDIT', 'CREDIT_DAY_OVERDUE', 'DAYS_CREDIT_ENDDATE',
'DAYS_ENDDATE_FACT', 'AMT_CREDIT_MAX_OVERDUE',
'CNT_CREDIT_PROLONG', 'AMT_CREDIT_SUM', 'AMT_CREDIT_SUM_DEBT',
'AMT_CREDIT_SUM_LIMIT', 'AMT_CREDIT_SUM_OVERDUE', 'CREDIT_TYPE',
'DAYS_CREDIT_UPDATE', 'AMT_ANNUITY'], dtype=object)
find_missing(bureau).head(12)
                        missing_count  missing_ratio
SK_ID_CURR                          0       0.000000
SK_ID_BUREAU                        0       0.000000
CREDIT_ACTIVE                       0       0.000000
CREDIT_CURRENCY                     0       0.000000
DAYS_CREDIT                         0       0.000000
CREDIT_DAY_OVERDUE                  0       0.000000
DAYS_CREDIT_ENDDATE            105553       0.061496
DAYS_ENDDATE_FACT              633653       0.369170
AMT_CREDIT_MAX_OVERDUE        1124488       0.655133
CNT_CREDIT_PROLONG                  0       0.000000
AMT_CREDIT_SUM                     13       0.000008
AMT_CREDIT_SUM_DEBT            257669       0.150119
bureau_balance.head()
   SK_ID_BUREAU  MONTHS_BALANCE  STATUS
0       5715448               0       C
1       5715448              -1       C
2       5715448              -2       C
3       5715448              -3       C
4       5715448              -4       C
bureau_balance.columns.values
array(['SK_ID_BUREAU', 'MONTHS_BALANCE', 'STATUS'], dtype=object)
find_missing(bureau_balance).head(12)
                missing_count  missing_ratio
SK_ID_BUREAU                0            0.0
MONTHS_BALANCE              0            0.0
STATUS                      0            0.0
credit_card_balance.head().T  # transposed so that each column gets its own line
                                     0           1           2           3           4
SK_ID_PREV                     2562384     2582071     1740877     1389973     1891521
SK_ID_CURR                      378907      363914      371185      337855      126868
MONTHS_BALANCE                      -6          -1          -7          -4          -1
AMT_BALANCE                     56.970   63975.555   31815.225  236572.110  453919.455
AMT_CREDIT_LIMIT_ACTUAL         135000       45000      450000      225000      450000
AMT_DRAWINGS_ATM_CURRENT           0.0      2250.0         0.0      2250.0         0.0
AMT_DRAWINGS_CURRENT             877.5      2250.0         0.0      2250.0     11547.0
AMT_DRAWINGS_OTHER_CURRENT         0.0         0.0         0.0         0.0         0.0
AMT_DRAWINGS_POS_CURRENT         877.5         0.0         0.0         0.0     11547.0
AMT_INST_MIN_REGULARITY       1700.325    2250.000    2250.000   11795.760   22924.890
AMT_PAYMENT_CURRENT             1800.0      2250.0      2250.0     11925.0     27000.0
AMT_PAYMENT_TOTAL_CURRENT       1800.0      2250.0      2250.0     11925.0     27000.0
AMT_RECEIVABLE_PRINCIPAL         0.000   60175.080   26926.425  224949.285  443044.395
AMT_RECIVABLE                    0.000   64875.555   31460.085  233048.970  453919.455
AMT_TOTAL_RECEIVABLE             0.000   64875.555   31460.085  233048.970  453919.455
CNT_DRAWINGS_ATM_CURRENT           0.0         1.0         0.0         1.0         0.0
CNT_DRAWINGS_CURRENT                 1           1           0           1           1
CNT_DRAWINGS_OTHER_CURRENT         0.0         0.0         0.0         0.0         0.0
CNT_DRAWINGS_POS_CURRENT           1.0         0.0         0.0         0.0         1.0
CNT_INSTALMENT_MATURE_CUM         35.0        69.0        30.0        10.0       101.0
NAME_CONTRACT_STATUS            Active      Active      Active      Active      Active
SK_DPD                               0           0           0           0           0
SK_DPD_DEF                           0           0           0           0           0
credit_card_balance.columns.values
array(['SK_ID_PREV', 'SK_ID_CURR', 'MONTHS_BALANCE', 'AMT_BALANCE',
'AMT_CREDIT_LIMIT_ACTUAL', 'AMT_DRAWINGS_ATM_CURRENT',
'AMT_DRAWINGS_CURRENT', 'AMT_DRAWINGS_OTHER_CURRENT',
'AMT_DRAWINGS_POS_CURRENT', 'AMT_INST_MIN_REGULARITY',
'AMT_PAYMENT_CURRENT', 'AMT_PAYMENT_TOTAL_CURRENT',
'AMT_RECEIVABLE_PRINCIPAL', 'AMT_RECIVABLE',
'AMT_TOTAL_RECEIVABLE', 'CNT_DRAWINGS_ATM_CURRENT',
'CNT_DRAWINGS_CURRENT', 'CNT_DRAWINGS_OTHER_CURRENT',
'CNT_DRAWINGS_POS_CURRENT', 'CNT_INSTALMENT_MATURE_CUM',
'NAME_CONTRACT_STATUS', 'SK_DPD', 'SK_DPD_DEF'], dtype=object)
find_missing(credit_card_balance).head(12)
                            missing_count  missing_ratio
SK_ID_PREV                              0       0.000000
SK_ID_CURR                              0       0.000000
MONTHS_BALANCE                          0       0.000000
AMT_BALANCE                             0       0.000000
AMT_CREDIT_LIMIT_ACTUAL                 0       0.000000
AMT_DRAWINGS_ATM_CURRENT           749816       0.195249
AMT_DRAWINGS_CURRENT                    0       0.000000
AMT_DRAWINGS_OTHER_CURRENT         749816       0.195249
AMT_DRAWINGS_POS_CURRENT           749816       0.195249
AMT_INST_MIN_REGULARITY            305236       0.079482
AMT_PAYMENT_CURRENT                767988       0.199981
AMT_PAYMENT_TOTAL_CURRENT               0       0.000000
previous_application.head().T  # transposed so that each column gets its own line
                                                    0                         1                         2                         3                         4
SK_ID_PREV                                    2030495                   2802425                   2523466                   2819243                   1784265
SK_ID_CURR                                     271877                    108129                    122040                    176158                    202054
NAME_CONTRACT_TYPE                     Consumer loans                Cash loans                Cash loans                Cash loans                Cash loans
AMT_ANNUITY                                  1730.430                 25188.615                 15060.735                 47041.335                 31924.395
AMT_APPLICATION                               17145.0                  607500.0                  112500.0                  450000.0                  337500.0
AMT_CREDIT                                    17145.0                  679671.0                  136444.5                  470790.0                  404055.0
AMT_DOWN_PAYMENT                                  0.0                       NaN                       NaN                       NaN                       NaN
AMT_GOODS_PRICE                               17145.0                  607500.0                  112500.0                  450000.0                  337500.0
WEEKDAY_APPR_PROCESS_START                   SATURDAY                  THURSDAY                   TUESDAY                    MONDAY                  THURSDAY
HOUR_APPR_PROCESS_START                            15                        11                        11                         7                         9
FLAG_LAST_APPL_PER_CONTRACT                         Y                         Y                         Y                         Y                         Y
NFLAG_LAST_APPL_IN_DAY                              1                         1                         1                         1                         1
RATE_DOWN_PAYMENT                                 0.0                       NaN                       NaN                       NaN                       NaN
RATE_INTEREST_PRIMARY                        0.182832                       NaN                       NaN                       NaN                       NaN
RATE_INTEREST_PRIVILEGED                     0.867336                       NaN                       NaN                       NaN                       NaN
NAME_CASH_LOAN_PURPOSE                            XAP                       XNA                       XNA                       XNA                   Repairs
NAME_CONTRACT_STATUS                         Approved                  Approved                  Approved                  Approved                   Refused
DAYS_DECISION                                     -73                      -164                      -301                      -512                      -781
NAME_PAYMENT_TYPE               Cash through the bank                       XNA     Cash through the bank     Cash through the bank     Cash through the bank
CODE_REJECT_REASON                                XAP                       XAP                       XAP                       XAP                        HC
NAME_TYPE_SUITE                                   NaN             Unaccompanied           Spouse, partner                       NaN                       NaN
NAME_CLIENT_TYPE                             Repeater                  Repeater                  Repeater                  Repeater                  Repeater
NAME_GOODS_CATEGORY                            Mobile                       XNA                       XNA                       XNA                       XNA
NAME_PORTFOLIO                                    POS                      Cash                      Cash                      Cash                      Cash
NAME_PRODUCT_TYPE                                 XNA                    x-sell                    x-sell                    x-sell                   walk-in
CHANNEL_TYPE                             Country-wide            Contact center   Credit and cash offices   Credit and cash offices   Credit and cash offices
SELLERPLACE_AREA                                   35                        -1                        -1                        -1                        -1
NAME_SELLER_INDUSTRY                     Connectivity                       XNA                       XNA                       XNA                       XNA
CNT_PAYMENT                                      12.0                      36.0                      12.0                      12.0                      24.0
NAME_YIELD_GROUP                               middle                low_action                      high                    middle                      high
PRODUCT_COMBINATION          POS mobile with interest          Cash X-Sell: low         Cash X-Sell: high       Cash X-Sell: middle         Cash Street: high
DAYS_FIRST_DRAWING                           365243.0                  365243.0                  365243.0                  365243.0                       NaN
DAYS_FIRST_DUE                                  -42.0                    -134.0                    -271.0                    -482.0                       NaN
DAYS_LAST_DUE_1ST_VERSION                       300.0                     916.0                      59.0                    -152.0                       NaN
DAYS_LAST_DUE                                   -42.0                  365243.0                  365243.0                    -182.0                       NaN
DAYS_TERMINATION                                -37.0                  365243.0                  365243.0                    -177.0                       NaN
NFLAG_INSURED_ON_APPROVAL                         0.0                       1.0                       1.0                       1.0                       NaN
previous_application.columns.values
array(['SK_ID_PREV', 'SK_ID_CURR', 'NAME_CONTRACT_TYPE', 'AMT_ANNUITY',
'AMT_APPLICATION', 'AMT_CREDIT', 'AMT_DOWN_PAYMENT',
'AMT_GOODS_PRICE', 'WEEKDAY_APPR_PROCESS_START',
'HOUR_APPR_PROCESS_START', 'FLAG_LAST_APPL_PER_CONTRACT',
'NFLAG_LAST_APPL_IN_DAY', 'RATE_DOWN_PAYMENT',
'RATE_INTEREST_PRIMARY', 'RATE_INTEREST_PRIVILEGED',
'NAME_CASH_LOAN_PURPOSE', 'NAME_CONTRACT_STATUS', 'DAYS_DECISION',
'NAME_PAYMENT_TYPE', 'CODE_REJECT_REASON', 'NAME_TYPE_SUITE',
'NAME_CLIENT_TYPE', 'NAME_GOODS_CATEGORY', 'NAME_PORTFOLIO',
'NAME_PRODUCT_TYPE', 'CHANNEL_TYPE', 'SELLERPLACE_AREA',
'NAME_SELLER_INDUSTRY', 'CNT_PAYMENT', 'NAME_YIELD_GROUP',
'PRODUCT_COMBINATION', 'DAYS_FIRST_DRAWING', 'DAYS_FIRST_DUE',
'DAYS_LAST_DUE_1ST_VERSION', 'DAYS_LAST_DUE', 'DAYS_TERMINATION',
'NFLAG_INSURED_ON_APPROVAL'], dtype=object)
find_missing(previous_application).head(12)
                             missing_count  missing_ratio
SK_ID_PREV                               0   0.000000e+00
SK_ID_CURR                               0   0.000000e+00
NAME_CONTRACT_TYPE                       0   0.000000e+00
AMT_ANNUITY                         372235   2.228667e-01
AMT_APPLICATION                          0   0.000000e+00
AMT_CREDIT                               1   5.987257e-07
AMT_DOWN_PAYMENT                    895844   5.363648e-01
AMT_GOODS_PRICE                     385515   2.308177e-01
WEEKDAY_APPR_PROCESS_START               0   0.000000e+00
HOUR_APPR_PROCESS_START                  0   0.000000e+00
FLAG_LAST_APPL_PER_CONTRACT              0   0.000000e+00
NFLAG_LAST_APPL_IN_DAY                   0   0.000000e+00
installments_payments.head()
   SK_ID_PREV  SK_ID_CURR  NUM_INSTALMENT_VERSION  NUM_INSTALMENT_NUMBER  DAYS_INSTALMENT  DAYS_ENTRY_PAYMENT  AMT_INSTALMENT  AMT_PAYMENT
0     1054186      161674                     1.0                      6          -1180.0             -1187.0        6948.360     6948.360
1     1330831      151639                     0.0                     34          -2156.0             -2156.0        1716.525     1716.525
2     2085231      193053                     2.0                      1            -63.0               -63.0       25425.000    25425.000
3     2452527      199697                     1.0                      3          -2418.0             -2426.0       24350.130    24350.130
4     2714724      167756                     1.0                      2          -1383.0             -1366.0        2165.040     2160.585
installments_payments.columns.values
array(['SK_ID_PREV', 'SK_ID_CURR', 'NUM_INSTALMENT_VERSION',
'NUM_INSTALMENT_NUMBER', 'DAYS_INSTALMENT', 'DAYS_ENTRY_PAYMENT',
'AMT_INSTALMENT', 'AMT_PAYMENT'], dtype=object)
find_missing(installments_payments).head(12)
                        missing_count  missing_ratio
SK_ID_PREV                          0       0.000000
SK_ID_CURR                          0       0.000000
NUM_INSTALMENT_VERSION              0       0.000000
NUM_INSTALMENT_NUMBER               0       0.000000
DAYS_INSTALMENT                     0       0.000000
DAYS_ENTRY_PAYMENT               2905       0.000214
AMT_INSTALMENT                      0       0.000000
AMT_PAYMENT                      2905       0.000214
|
- application: ‘binary’ for binary classification (an alias of objective)
- num_iterations: number of boosting iterations (trees), called n_estimators in the scikit-learn API
- learning_rate: shrinkage rate that scales the contribution of each new tree
- num_leaves: maximum number of leaves in one tree
- feature_fraction: fraction of features randomly selected for each iteration
- bagging_fraction: fraction of the data randomly sampled for each iteration; it only takes effect when bagging_freq is set to a value greater than 0
- lambda_l1/lambda_l2: L1/L2 regularization
- min_split_gain: minimum gain required to perform a split
- early_stopping_round: stop training if the validation metric has not improved for this many rounds
- categorical_feature: LightGBM can handle categorical features natively, but string columns must first be encoded as integers
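The last point can be sketched as follows. This minimal version assumes two things: the categorical columns are the non-numeric ones, and the train and test sets should share one integer code per label. `encode_categoricals` is a hypothetical helper. The notebook's own encoding step is not shown in this section.

```python
import pandas as pd


def encode_categoricals(train: pd.DataFrame, test: pd.DataFrame):
    """Hypothetical helper: map every non-numeric column to integer codes.

    Train and test share one vocabulary, so a label gets the same code in both.
    Returns the encoded copies and the list of encoded column names.
    """
    train, test = train.copy(), test.copy()
    cat_cols = [c for c in train.columns if not pd.api.types.is_numeric_dtype(train[c])]
    for col in cat_cols:
        # categories in order of first appearance across train and test, NaN excluded
        labels = pd.concat([train[col], test[col]]).dropna().unique()
        dtype = pd.CategoricalDtype(categories=labels)
        # .cat.codes returns -1 for NaN; LightGBM treats negative categorical values as missing
        train[col] = train[col].astype(dtype).cat.codes
        test[col] = test[col].astype(dtype).cat.codes
    return train, test, cat_cols


# tiny made-up frames for illustration
train = pd.DataFrame({'CODE_GENDER': ['M', 'F', 'M'], 'AMT_CREDIT': [406597.5, 1293502.5, 135000.0]})
test = pd.DataFrame({'CODE_GENDER': ['F', 'XNA', None], 'AMT_CREDIT': [312682.5, 513000.0, 135000.0]})
train_enc, test_enc, cat_cols = encode_categoricals(train, test)
print(cat_cols)
print(train_enc['CODE_GENDER'].tolist())
print(test_enc['CODE_GENDER'].tolist())
```

The returned `cat_cols` list is the kind of value the training loop passes as `categorical_feats`. Building the vocabulary from both frames means a label that appears only in the test set, such as `'XNA'` here, still receives its own code.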
import time

import lightgbm as lgb
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X = application_train.drop(['SK_ID_CURR', 'TARGET'], axis=1)
y = application_train.TARGET
X_pred = application_test.drop(['SK_ID_CURR'], axis=1)
# categorical_feats (not defined in this cell): names of the categorical columns, already encoded as integers

folds = StratifiedKFold(n_splits=5, shuffle=True)
oof_preds = np.zeros(X.shape[0])       # out-of-fold predictions for every training row
sub_preds = np.zeros(X_pred.shape[0])  # test predictions, averaged over the folds
# note: bagging_fraction has no effect unless bagging_freq > 0, which is not set here
param = {'application': 'binary', 'num_iterations': 4000, 'learning_rate': 0.05, 'num_leaves': 24,
         'feature_fraction': 0.8, 'bagging_fraction': 0.9, 'lambda_l1': 0.1, 'lambda_l2': 0.1,
         'min_split_gain': 0.01, 'early_stopping_round': 100, 'max_depth': 7, 'min_child_weight': 40,
         'metric': 'auc'}

start = time.time()
valid_score = 0
for n_fold, (trn_idx, val_idx) in enumerate(folds.split(X, y)):
    # positional indexing for both features and labels
    trn_x, trn_y = X.iloc[trn_idx], y.iloc[trn_idx]
    val_x, val_y = X.iloc[val_idx], y.iloc[val_idx]
    train_data = lgb.Dataset(data=trn_x, label=trn_y, categorical_feature=categorical_feats)
    # reference=train_data makes the validation set reuse the binning learned on the training set
    valid_data = lgb.Dataset(data=val_x, label=val_y, reference=train_data)
    # log_evaluation(100) replaces verbose_eval=100, which LightGBM 4.0 removed from lgb.train
    lgb_es_model = lgb.train(param, train_data, valid_sets=[train_data, valid_data],
                             callbacks=[lgb.log_evaluation(100)])
    oof_preds[val_idx] = lgb_es_model.predict(val_x, num_iteration=lgb_es_model.best_iteration)
    sub_preds += lgb_es_model.predict(X_pred, num_iteration=lgb_es_model.best_iteration) / folds.n_splits
    fold_auc = roc_auc_score(val_y, oof_preds[val_idx])
    print('Fold %2d AUC : %.6f' % (n_fold + 1, fold_auc))
    valid_score += fold_auc
print('valid score:', str(round(valid_score / folds.n_splits, 4)))
end = time.time()
print('training time:', str(round((end - start) / 60)), 'mins')
[LightGBM] [Info] Number of positive: 19860, number of negative: 226148
[LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.131338 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 13374
[LightGBM] [Info] Number of data points in the train set: 246008, number of used features: 128
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.080729 -> initscore=-2.432482
[LightGBM] [Info] Start training from score -2.432482
Training until validation scores don't improve for 100 rounds
[100] training's auc: 0.776916 valid_1's auc: 0.762705
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[200] training's auc: 0.795638 valid_1's auc: 0.768677
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[300] training's auc: 0.808041 valid_1's auc: 0.770518
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[400] training's auc: 0.818849 valid_1's auc: 0.770738
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[500] training's auc: 0.828199 valid_1's auc: 0.770716
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Early stopping, best iteration is:
[439] training's auc: 0.822862 valid_1's auc: 0.770979
Fold 1 AUC : 0.770979
[LightGBM] [Info] Number of positive: 19860, number of negative: 226149
[LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.110481 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 13369
[LightGBM] [Info] Number of data points in the train set: 246009, number of used features: 127
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.080729 -> initscore=-2.432486
[LightGBM] [Info] Start training from score -2.432486
Training until validation scores don't improve for 100 rounds
[100] training's auc: 0.778854 valid_1's auc: 0.754465
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[200] training's auc: 0.796533 valid_1's auc: 0.760265
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[300] training's auc: 0.808789 valid_1's auc: 0.761858
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[400] training's auc: 0.819811 valid_1's auc: 0.763065
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[500] training's auc: 0.829447 valid_1's auc: 0.762934
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Early stopping, best iteration is:
[444] training's auc: 0.824114 valid_1's auc: 0.763169
Fold 2 AUC : 0.763169
[LightGBM] [Info] Number of positive: 19860, number of negative: 226149
[LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.108031 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 13352
[LightGBM] [Info] Number of data points in the train set: 246009, number of used features: 127
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.080729 -> initscore=-2.432486
[LightGBM] [Info] Start training from score -2.432486
Training until validation scores don't improve for 100 rounds
[100] training's auc: 0.778366 valid_1's auc: 0.75846
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[200] training's auc: 0.796261 valid_1's auc: 0.764351
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[300] training's auc: 0.808812 valid_1's auc: 0.765831
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[400] training's auc: 0.819305 valid_1's auc: 0.765997
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Early stopping, best iteration is:
[361] training's auc: 0.81526 valid_1's auc: 0.766218
Fold 3 AUC : 0.766218
[LightGBM] [Info] Number of positive: 19860, number of negative: 226149
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.360452 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 13445
[LightGBM] [Info] Number of data points in the train set: 246009, number of used features: 128
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.080729 -> initscore=-2.432486
[LightGBM] [Info] Start training from score -2.432486
Training until validation scores don't improve for 100 rounds
[100] training's auc: 0.776951 valid_1's auc: 0.763977
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[200] training's auc: 0.79515 valid_1's auc: 0.768969
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[300] training's auc: 0.807778 valid_1's auc: 0.770126
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[400] training's auc: 0.818119 valid_1's auc: 0.770465
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Early stopping, best iteration is:
[377] training's auc: 0.815888 valid_1's auc: 0.77056
Fold 4 AUC : 0.770560
[LightGBM] [Info] Number of positive: 19860, number of negative: 226149
[LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.298858 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 13349
[LightGBM] [Info] Number of data points in the train set: 246009, number of used features: 128
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.080729 -> initscore=-2.432486
[LightGBM] [Info] Start training from score -2.432486
Training until validation scores don't improve for 100 rounds
[100] training's auc: 0.777749 valid_1's auc: 0.758466
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[200] training's auc: 0.796029 valid_1's auc: 0.764943
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[300] training's auc: 0.809215 valid_1's auc: 0.766598
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[400] training's auc: 0.819756 valid_1's auc: 0.767519
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[500] training's auc: 0.829364 valid_1's auc: 0.767356
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Early stopping, best iteration is:
[457] training's auc: 0.825471 valid_1's auc: 0.767653
Fold 5 AUC : 0.767653
valid score: 0.7677
training time: 5 mins
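Each fold in the log above follows the same pattern. LightGBM prints the training and validation AUC every 100 rounds and announces "Training until validation scores don't improve for 100 rounds". It then reports the round with the highest validation AUC as the best iteration. The sketch below is a minimal pure-Python restatement of that rule. It assumes a higher-is-better metric (AUC) and a patience of 100 rounds, as the log states. `early_stopping` is a hypothetical helper written for illustration; it is not LightGBM's own early-stopping callback.

```python
# Hypothetical sketch of the early-stopping rule described in the log:
# stop once `patience` consecutive rounds pass without a new best score.
from typing import Sequence, Tuple


def early_stopping(valid_scores: Sequence[float], patience: int = 100) -> Tuple[int, float, int]:
    """Return (best_iteration, best_score, stopped_at), 1-indexed like the log.

    Assumes higher scores are better (as with AUC).
    """
    best_iter, best_score = 0, float("-inf")
    for i, score in enumerate(valid_scores, start=1):
        if score > best_score:
            # New best validation score: remember this round.
            best_iter, best_score = i, score
        elif i - best_iter >= patience:
            # No improvement for `patience` rounds: halt training here.
            return best_iter, best_score, i
    # Ran out of rounds before the patience window was exhausted.
    return best_iter, best_score, len(valid_scores)


if __name__ == "__main__":
    # Toy validation curve: improves for 3 rounds, then plateaus.
    print(early_stopping([0.5, 0.6, 0.7, 0.65, 0.65, 0.65, 0.65], patience=3))
```

Under this rule, Fold 2 (best iteration 444) would have stopped at round 544. That is consistent with the log, which prints the round-500 line but no round-600 line before "Early stopping".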