Home Credit strives to broaden financial inclusion for the unbanked population by providing a positive and safe borrowing experience. In order to make sure this underserved population has a positive loan experience, Home Credit makes use of a variety of alternative data–including telco and transactional information–to predict their clients’ repayment abilities. While Home Credit is currently using various statistical and machine learning methods to make these predictions, they’re challenging Kagglers to help them unlock the full potential of their data. Doing so will ensure that clients capable of repayment are not rejected and that loans are given with a principal, maturity, and repayment calendar that will empower their clients to be successful.
In this project, we will explore two models that will be used to predict whether a loan borrower will default on payment. The target variable is binary, i.e., it takes the value 1 or 0. We will begin by importing the data before proceeding with the data exploration.
We will then examine the datasets for any missing data.
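The import step mentioned above does not appear in the output that follows, so here is a minimal sketch of how the three data frames used later (train_data, test_data, prev_app) might be loaded. The directory name, the file names, and the helper load_table are assumptions for illustration, not taken from the original notebook. stringsAsFactors = TRUE is used only because the str() output shows text columns stored as factors.

```r
# Hypothetical helper: read one CSV so that text columns become factors,
# matching the "Factor w/ ..." columns reported by str().
load_table <- function(path) {
  read.csv(path, stringsAsFactors = TRUE)
}

# Assumed location and file names; adjust to wherever the files actually live.
data_dir <- "data"
files <- c(
  train_data = "application_train.csv",
  test_data  = "application_test.csv",
  prev_app   = "previous_application.csv"
)

# Load each file that exists into a variable named after its entry above.
for (name in names(files)) {
  path <- file.path(data_dir, files[[name]])
  if (file.exists(path)) {
    assign(name, load_table(path))
  } else {
    message("File not found, skipping: ", path)
  }
}
```

With the default na.strings = "NA", empty text fields are kept as a blank factor level rather than converted to NA, which is consistent with the "" levels visible in the structure listing below.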
str(train_data)
'data.frame': 307511 obs. of 122 variables:
$ SK_ID_CURR : int 100002 100003 100004 100006 100007 100008 100009 100010 100011 100012 ...
$ TARGET : int 1 0 0 0 0 0 0 0 0 0 ...
$ NAME_CONTRACT_TYPE : Factor w/ 2 levels "Cash loans","Revolving loans": 1 1 2 1 1 1 1 1 1 2 ...
$ CODE_GENDER : Factor w/ 3 levels "F","M","XNA": 2 1 2 1 2 2 1 2 1 2 ...
$ FLAG_OWN_CAR : Factor w/ 2 levels "N","Y": 1 1 2 1 1 1 2 2 1 1 ...
$ FLAG_OWN_REALTY : Factor w/ 2 levels "N","Y": 2 1 2 2 2 2 2 2 2 2 ...
$ CNT_CHILDREN : int 0 0 0 0 0 0 1 0 0 0 ...
$ AMT_INCOME_TOTAL : num 202500 270000 67500 135000 121500 ...
$ AMT_CREDIT : num 406598 1293503 135000 312683 513000 ...
$ AMT_ANNUITY : num 24701 35699 6750 29687 21866 ...
$ AMT_GOODS_PRICE : num 351000 1129500 135000 297000 513000 ...
$ NAME_TYPE_SUITE : Factor w/ 8 levels "","Children",..: 8 3 8 8 8 7 8 8 2 8 ...
$ NAME_INCOME_TYPE : Factor w/ 8 levels "Businessman",..: 8 5 8 8 8 5 2 5 4 8 ...
$ NAME_EDUCATION_TYPE : Factor w/ 5 levels "Academic degree",..: 5 2 5 5 5 5 2 2 5 5 ...
$ NAME_FAMILY_STATUS : Factor w/ 6 levels "Civil marriage",..: 4 2 4 1 4 2 2 2 2 4 ...
$ NAME_HOUSING_TYPE : Factor w/ 6 levels "Co-op apartment",..: 2 2 2 2 2 2 2 2 2 2 ...
$ REGION_POPULATION_RELATIVE : num 0.0188 0.00354 0.01003 0.00802 0.02866 ...
$ DAYS_BIRTH : int -9461 -16765 -19046 -19005 -19932 -16941 -13778 -18850 -20099 -14469 ...
$ DAYS_EMPLOYED : int -637 -1188 -225 -3039 -3038 -1588 -3130 -449 365243 -2019 ...
$ DAYS_REGISTRATION : num -3648 -1186 -4260 -9833 -4311 ...
$ DAYS_ID_PUBLISH : int -2120 -291 -2531 -2437 -3458 -477 -619 -2379 -3514 -3992 ...
$ OWN_CAR_AGE : num NA NA 26 NA NA NA 17 8 NA NA ...
$ FLAG_MOBIL : int 1 1 1 1 1 1 1 1 1 1 ...
$ FLAG_EMP_PHONE : int 1 1 1 1 1 1 1 1 0 1 ...
$ FLAG_WORK_PHONE : int 0 0 1 0 0 1 0 1 0 0 ...
$ FLAG_CONT_MOBILE : int 1 1 1 1 1 1 1 1 1 1 ...
$ FLAG_PHONE : int 1 1 1 0 0 1 1 0 0 0 ...
$ FLAG_EMAIL : int 0 0 0 0 0 0 0 0 0 0 ...
$ OCCUPATION_TYPE : Factor w/ 19 levels "","Accountants",..: 10 5 10 10 5 10 2 12 1 10 ...
$ CNT_FAM_MEMBERS : num 1 2 1 2 1 2 3 2 2 1 ...
$ REGION_RATING_CLIENT : int 2 1 2 2 2 2 2 3 2 2 ...
$ REGION_RATING_CLIENT_W_CITY : int 2 1 2 2 2 2 2 3 2 2 ...
$ WEEKDAY_APPR_PROCESS_START : Factor w/ 7 levels "FRIDAY","MONDAY",..: 7 2 2 7 5 7 4 2 7 5 ...
$ HOUR_APPR_PROCESS_START : int 10 11 9 17 11 16 16 16 14 8 ...
$ REG_REGION_NOT_LIVE_REGION : int 0 0 0 0 0 0 0 0 0 0 ...
$ REG_REGION_NOT_WORK_REGION : int 0 0 0 0 0 0 0 0 0 0 ...
$ LIVE_REGION_NOT_WORK_REGION : int 0 0 0 0 0 0 0 0 0 0 ...
$ REG_CITY_NOT_LIVE_CITY : int 0 0 0 0 0 0 0 0 0 0 ...
$ REG_CITY_NOT_WORK_CITY : int 0 0 0 0 1 0 0 1 0 0 ...
$ LIVE_CITY_NOT_WORK_CITY : int 0 0 0 0 1 0 0 1 0 0 ...
$ ORGANIZATION_TYPE : Factor w/ 58 levels "Advertising",..: 6 40 12 6 38 34 6 34 58 10 ...
$ EXT_SOURCE_1 : num 0.083 0.311 NA NA NA ...
$ EXT_SOURCE_2 : num 0.263 0.622 0.556 0.65 0.323 ...
$ EXT_SOURCE_3 : num 0.139 NA 0.73 NA NA ...
$ APARTMENTS_AVG : num 0.0247 0.0959 NA NA NA NA NA NA NA NA ...
$ BASEMENTAREA_AVG : num 0.0369 0.0529 NA NA NA NA NA NA NA NA ...
$ YEARS_BEGINEXPLUATATION_AVG : num 0.972 0.985 NA NA NA ...
$ YEARS_BUILD_AVG : num 0.619 0.796 NA NA NA ...
$ COMMONAREA_AVG : num 0.0143 0.0605 NA NA NA NA NA NA NA NA ...
$ ELEVATORS_AVG : num 0 0.08 NA NA NA NA NA NA NA NA ...
$ ENTRANCES_AVG : num 0.069 0.0345 NA NA NA NA NA NA NA NA ...
$ FLOORSMAX_AVG : num 0.0833 0.2917 NA NA NA ...
$ FLOORSMIN_AVG : num 0.125 0.333 NA NA NA ...
$ LANDAREA_AVG : num 0.0369 0.013 NA NA NA NA NA NA NA NA ...
$ LIVINGAPARTMENTS_AVG : num 0.0202 0.0773 NA NA NA NA NA NA NA NA ...
$ LIVINGAREA_AVG : num 0.019 0.0549 NA NA NA NA NA NA NA NA ...
$ NONLIVINGAPARTMENTS_AVG : num 0 0.0039 NA NA NA NA NA NA NA NA ...
$ NONLIVINGAREA_AVG : num 0 0.0098 NA NA NA NA NA NA NA NA ...
$ APARTMENTS_MODE : num 0.0252 0.0924 NA NA NA NA NA NA NA NA ...
$ BASEMENTAREA_MODE : num 0.0383 0.0538 NA NA NA NA NA NA NA NA ...
$ YEARS_BEGINEXPLUATATION_MODE: num 0.972 0.985 NA NA NA ...
$ YEARS_BUILD_MODE : num 0.634 0.804 NA NA NA ...
$ COMMONAREA_MODE : num 0.0144 0.0497 NA NA NA NA NA NA NA NA ...
$ ELEVATORS_MODE : num 0 0.0806 NA NA NA NA NA NA NA NA ...
$ ENTRANCES_MODE : num 0.069 0.0345 NA NA NA NA NA NA NA NA ...
$ FLOORSMAX_MODE : num 0.0833 0.2917 NA NA NA ...
$ FLOORSMIN_MODE : num 0.125 0.333 NA NA NA ...
$ LANDAREA_MODE : num 0.0377 0.0128 NA NA NA NA NA NA NA NA ...
$ LIVINGAPARTMENTS_MODE : num 0.022 0.079 NA NA NA NA NA NA NA NA ...
$ LIVINGAREA_MODE : num 0.0198 0.0554 NA NA NA NA NA NA NA NA ...
$ NONLIVINGAPARTMENTS_MODE : num 0 0 NA NA NA NA NA NA NA NA ...
$ NONLIVINGAREA_MODE : num 0 0 NA NA NA NA NA NA NA NA ...
$ APARTMENTS_MEDI : num 0.025 0.0968 NA NA NA NA NA NA NA NA ...
$ BASEMENTAREA_MEDI : num 0.0369 0.0529 NA NA NA NA NA NA NA NA ...
$ YEARS_BEGINEXPLUATATION_MEDI: num 0.972 0.985 NA NA NA ...
$ YEARS_BUILD_MEDI : num 0.624 0.799 NA NA NA ...
$ COMMONAREA_MEDI : num 0.0144 0.0608 NA NA NA NA NA NA NA NA ...
$ ELEVATORS_MEDI : num 0 0.08 NA NA NA NA NA NA NA NA ...
$ ENTRANCES_MEDI : num 0.069 0.0345 NA NA NA NA NA NA NA NA ...
$ FLOORSMAX_MEDI : num 0.0833 0.2917 NA NA NA ...
$ FLOORSMIN_MEDI : num 0.125 0.333 NA NA NA ...
$ LANDAREA_MEDI : num 0.0375 0.0132 NA NA NA NA NA NA NA NA ...
$ LIVINGAPARTMENTS_MEDI : num 0.0205 0.0787 NA NA NA NA NA NA NA NA ...
$ LIVINGAREA_MEDI : num 0.0193 0.0558 NA NA NA NA NA NA NA NA ...
$ NONLIVINGAPARTMENTS_MEDI : num 0 0.0039 NA NA NA NA NA NA NA NA ...
$ NONLIVINGAREA_MEDI : num 0 0.01 NA NA NA NA NA NA NA NA ...
$ FONDKAPREMONT_MODE : Factor w/ 5 levels "","not specified",..: 4 4 1 1 1 1 1 1 1 1 ...
$ HOUSETYPE_MODE : Factor w/ 4 levels "","block of flats",..: 2 2 1 1 1 1 1 1 1 1 ...
$ TOTALAREA_MODE : num 0.0149 0.0714 NA NA NA NA NA NA NA NA ...
$ WALLSMATERIAL_MODE : Factor w/ 8 levels "","Block","Mixed",..: 7 2 1 1 1 1 1 1 1 1 ...
$ EMERGENCYSTATE_MODE : Factor w/ 3 levels "","No","Yes": 2 2 1 1 1 1 1 1 1 1 ...
$ OBS_30_CNT_SOCIAL_CIRCLE : num 2 1 0 2 0 0 1 2 1 2 ...
$ DEF_30_CNT_SOCIAL_CIRCLE : num 2 0 0 0 0 0 0 0 0 0 ...
$ OBS_60_CNT_SOCIAL_CIRCLE : num 2 1 0 2 0 0 1 2 1 2 ...
$ DEF_60_CNT_SOCIAL_CIRCLE : num 2 0 0 0 0 0 0 0 0 0 ...
$ DAYS_LAST_PHONE_CHANGE : num -1134 -828 -815 -617 -1106 ...
$ FLAG_DOCUMENT_2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ FLAG_DOCUMENT_3 : int 1 1 0 1 0 1 0 1 1 0 ...
$ FLAG_DOCUMENT_4 : int 0 0 0 0 0 0 0 0 0 0 ...
[list output truncated]
summary(train_data)
SK_ID_CURR TARGET NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL
Min. :100002 Min. :0.00000 Cash loans :278232 F :202448 N:202924 N: 94199 Min. : 0.0000 Min. : 25650
1st Qu.:189146 1st Qu.:0.00000 Revolving loans: 29279 M :105059 Y:104587 Y:213312 1st Qu.: 0.0000 1st Qu.: 112500
Median :278202 Median :0.00000 XNA: 4 Median : 0.0000 Median : 147150
Mean :278181 Mean :0.08073 Mean : 0.4171 Mean : 168798
3rd Qu.:367143 3rd Qu.:0.00000 3rd Qu.: 1.0000 3rd Qu.: 202500
Max. :456255 Max. :1.00000 Max. :19.0000 Max. :117000000
AMT_CREDIT AMT_ANNUITY AMT_GOODS_PRICE NAME_TYPE_SUITE NAME_INCOME_TYPE
Min. : 45000 Min. : 1616 Min. : 40500 Unaccompanied :248526 Working :158774
1st Qu.: 270000 1st Qu.: 16524 1st Qu.: 238500 Family : 40149 Commercial associate: 71617
Median : 513531 Median : 24903 Median : 450000 Spouse, partner: 11370 Pensioner : 55362
Mean : 599026 Mean : 27109 Mean : 538396 Children : 3267 State servant : 21703
3rd Qu.: 808650 3rd Qu.: 34596 3rd Qu.: 679500 Other_B : 1770 Unemployed : 22
Max. :4050000 Max. :258026 Max. :4050000 : 1292 Student : 18
NA's :12 NA's :278 (Other) : 1137 (Other) : 15
NAME_EDUCATION_TYPE NAME_FAMILY_STATUS NAME_HOUSING_TYPE REGION_POPULATION_RELATIVE DAYS_BIRTH
Academic degree : 164 Civil marriage : 29775 Co-op apartment : 1122 Min. :0.00029 Min. :-25229
Higher education : 74863 Married :196432 House / apartment :272868 1st Qu.:0.01001 1st Qu.:-19682
Incomplete higher : 10277 Separated : 19770 Municipal apartment: 11183 Median :0.01885 Median :-15750
Lower secondary : 3816 Single / not married: 45444 Office apartment : 2617 Mean :0.02087 Mean :-16037
Secondary / secondary special:218391 Unknown : 2 Rented apartment : 4881 3rd Qu.:0.02866 3rd Qu.:-12413
Widow : 16088 With parents : 14840 Max. :0.07251 Max. : -7489
DAYS_EMPLOYED DAYS_REGISTRATION DAYS_ID_PUBLISH OWN_CAR_AGE FLAG_MOBIL FLAG_EMP_PHONE FLAG_WORK_PHONE FLAG_CONT_MOBILE
Min. :-17912 Min. :-24672 Min. :-7197 Min. : 0.00 Min. :0 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.: -2760 1st Qu.: -7480 1st Qu.:-4299 1st Qu.: 5.00 1st Qu.:1 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:1.0000
Median : -1213 Median : -4504 Median :-3254 Median : 9.00 Median :1 Median :1.0000 Median :0.0000 Median :1.0000
Mean : 63815 Mean : -4986 Mean :-2994 Mean :12.06 Mean :1 Mean :0.8199 Mean :0.1994 Mean :0.9981
3rd Qu.: -289 3rd Qu.: -2010 3rd Qu.:-1720 3rd Qu.:15.00 3rd Qu.:1 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000
Max. :365243 Max. : 0 Max. : 0 Max. :91.00 Max. :1 Max. :1.0000 Max. :1.0000 Max. :1.0000
NA's :202929
FLAG_PHONE FLAG_EMAIL OCCUPATION_TYPE CNT_FAM_MEMBERS REGION_RATING_CLIENT REGION_RATING_CLIENT_W_CITY
Min. :0.0000 Min. :0.00000 :96391 Min. : 1.000 Min. :1.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:0.00000 Laborers :55186 1st Qu.: 2.000 1st Qu.:2.000 1st Qu.:2.000
Median :0.0000 Median :0.00000 Sales staff:32102 Median : 2.000 Median :2.000 Median :2.000
Mean :0.2811 Mean :0.05672 Core staff :27570 Mean : 2.153 Mean :2.052 Mean :2.032
3rd Qu.:1.0000 3rd Qu.:0.00000 Managers :21371 3rd Qu.: 3.000 3rd Qu.:2.000 3rd Qu.:2.000
Max. :1.0000 Max. :1.00000 Drivers :18603 Max. :20.000 Max. :3.000 Max. :3.000
(Other) :56288 NA's :2
WEEKDAY_APPR_PROCESS_START HOUR_APPR_PROCESS_START REG_REGION_NOT_LIVE_REGION REG_REGION_NOT_WORK_REGION LIVE_REGION_NOT_WORK_REGION
FRIDAY :50338 Min. : 0.00 Min. :0.00000 Min. :0.00000 Min. :0.00000
MONDAY :50714 1st Qu.:10.00 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
SATURDAY :33852 Median :12.00 Median :0.00000 Median :0.00000 Median :0.00000
SUNDAY :16181 Mean :12.06 Mean :0.01514 Mean :0.05077 Mean :0.04066
THURSDAY :50591 3rd Qu.:14.00 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
TUESDAY :53901 Max. :23.00 Max. :1.00000 Max. :1.00000 Max. :1.00000
WEDNESDAY:51934
REG_CITY_NOT_LIVE_CITY REG_CITY_NOT_WORK_CITY LIVE_CITY_NOT_WORK_CITY ORGANIZATION_TYPE EXT_SOURCE_1 EXT_SOURCE_2
Min. :0.00000 Min. :0.0000 Min. :0.0000 Business Entity Type 3: 67992 Min. :0.01 Min. :0.0000
1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 XNA : 55374 1st Qu.:0.33 1st Qu.:0.3925
Median :0.00000 Median :0.0000 Median :0.0000 Self-employed : 38412 Median :0.51 Median :0.5660
Mean :0.07817 Mean :0.2305 Mean :0.1796 Other : 16683 Mean :0.50 Mean :0.5144
3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.0000 Medicine : 11193 3rd Qu.:0.68 3rd Qu.:0.6636
Max. :1.00000 Max. :1.0000 Max. :1.0000 Business Entity Type 2: 10553 Max. :0.96 Max. :0.8550
(Other) :107304 NA's :173378 NA's :660
EXT_SOURCE_3 APARTMENTS_AVG BASEMENTAREA_AVG YEARS_BEGINEXPLUATATION_AVG YEARS_BUILD_AVG COMMONAREA_AVG ELEVATORS_AVG
Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00
1st Qu.:0.37 1st Qu.:0.06 1st Qu.:0.04 1st Qu.:0.98 1st Qu.:0.69 1st Qu.:0.01 1st Qu.:0.00
Median :0.54 Median :0.09 Median :0.08 Median :0.98 Median :0.76 Median :0.02 Median :0.00
Mean :0.51 Mean :0.12 Mean :0.09 Mean :0.98 Mean :0.75 Mean :0.04 Mean :0.08
3rd Qu.:0.67 3rd Qu.:0.15 3rd Qu.:0.11 3rd Qu.:0.99 3rd Qu.:0.82 3rd Qu.:0.05 3rd Qu.:0.12
Max. :0.90 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00
NA's :60965 NA's :156061 NA's :179943 NA's :150007 NA's :204488 NA's :214865 NA's :163891
ENTRANCES_AVG FLOORSMAX_AVG FLOORSMIN_AVG LANDAREA_AVG LIVINGAPARTMENTS_AVG LIVINGAREA_AVG NONLIVINGAPARTMENTS_AVG
Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00
1st Qu.:0.07 1st Qu.:0.17 1st Qu.:0.08 1st Qu.:0.02 1st Qu.:0.05 1st Qu.:0.05 1st Qu.:0.00
Median :0.14 Median :0.17 Median :0.21 Median :0.05 Median :0.08 Median :0.07 Median :0.00
Mean :0.15 Mean :0.23 Mean :0.23 Mean :0.07 Mean :0.10 Mean :0.11 Mean :0.01
3rd Qu.:0.21 3rd Qu.:0.33 3rd Qu.:0.38 3rd Qu.:0.09 3rd Qu.:0.12 3rd Qu.:0.13 3rd Qu.:0.00
Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00
NA's :154828 NA's :153020 NA's :208642 NA's :182590 NA's :210199 NA's :154350 NA's :213514
NONLIVINGAREA_AVG APARTMENTS_MODE BASEMENTAREA_MODE YEARS_BEGINEXPLUATATION_MODE YEARS_BUILD_MODE COMMONAREA_MODE ELEVATORS_MODE
Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00
1st Qu.:0.00 1st Qu.:0.05 1st Qu.:0.04 1st Qu.:0.98 1st Qu.:0.70 1st Qu.:0.01 1st Qu.:0.00
Median :0.00 Median :0.08 Median :0.07 Median :0.98 Median :0.76 Median :0.02 Median :0.00
Mean :0.03 Mean :0.11 Mean :0.09 Mean :0.98 Mean :0.76 Mean :0.04 Mean :0.07
3rd Qu.:0.03 3rd Qu.:0.14 3rd Qu.:0.11 3rd Qu.:0.99 3rd Qu.:0.82 3rd Qu.:0.05 3rd Qu.:0.12
Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00
NA's :169682 NA's :156061 NA's :179943 NA's :150007 NA's :204488 NA's :214865 NA's :163891
ENTRANCES_MODE FLOORSMAX_MODE FLOORSMIN_MODE LANDAREA_MODE LIVINGAPARTMENTS_MODE LIVINGAREA_MODE NONLIVINGAPARTMENTS_MODE
Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00
1st Qu.:0.07 1st Qu.:0.17 1st Qu.:0.08 1st Qu.:0.02 1st Qu.:0.05 1st Qu.:0.04 1st Qu.:0.00
Median :0.14 Median :0.17 Median :0.21 Median :0.05 Median :0.08 Median :0.07 Median :0.00
Mean :0.15 Mean :0.22 Mean :0.23 Mean :0.06 Mean :0.11 Mean :0.11 Mean :0.01
3rd Qu.:0.21 3rd Qu.:0.33 3rd Qu.:0.38 3rd Qu.:0.08 3rd Qu.:0.13 3rd Qu.:0.13 3rd Qu.:0.00
Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00
NA's :154828 NA's :153020 NA's :208642 NA's :182590 NA's :210199 NA's :154350 NA's :213514
NONLIVINGAREA_MODE APARTMENTS_MEDI BASEMENTAREA_MEDI YEARS_BEGINEXPLUATATION_MEDI YEARS_BUILD_MEDI COMMONAREA_MEDI ELEVATORS_MEDI
Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00
1st Qu.:0.00 1st Qu.:0.06 1st Qu.:0.04 1st Qu.:0.98 1st Qu.:0.69 1st Qu.:0.01 1st Qu.:0.00
Median :0.00 Median :0.09 Median :0.08 Median :0.98 Median :0.76 Median :0.02 Median :0.00
Mean :0.03 Mean :0.12 Mean :0.09 Mean :0.98 Mean :0.76 Mean :0.04 Mean :0.08
3rd Qu.:0.02 3rd Qu.:0.15 3rd Qu.:0.11 3rd Qu.:0.99 3rd Qu.:0.83 3rd Qu.:0.05 3rd Qu.:0.12
Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00
NA's :169682 NA's :156061 NA's :179943 NA's :150007 NA's :204488 NA's :214865 NA's :163891
ENTRANCES_MEDI FLOORSMAX_MEDI FLOORSMIN_MEDI LANDAREA_MEDI LIVINGAPARTMENTS_MEDI LIVINGAREA_MEDI NONLIVINGAPARTMENTS_MEDI
Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00 Min. :0.00
1st Qu.:0.07 1st Qu.:0.17 1st Qu.:0.08 1st Qu.:0.02 1st Qu.:0.05 1st Qu.:0.05 1st Qu.:0.00
Median :0.14 Median :0.17 Median :0.21 Median :0.05 Median :0.08 Median :0.07 Median :0.00
Mean :0.15 Mean :0.23 Mean :0.23 Mean :0.07 Mean :0.10 Mean :0.11 Mean :0.01
3rd Qu.:0.21 3rd Qu.:0.33 3rd Qu.:0.38 3rd Qu.:0.09 3rd Qu.:0.12 3rd Qu.:0.13 3rd Qu.:0.00
Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00 Max. :1.00
NA's :154828 NA's :153020 NA's :208642 NA's :182590 NA's :210199 NA's :154350 NA's :213514
NONLIVINGAREA_MEDI FONDKAPREMONT_MODE HOUSETYPE_MODE TOTALAREA_MODE WALLSMATERIAL_MODE EMERGENCYSTATE_MODE
Min. :0.00 :210295 :154297 Min. :0.00 :156341 :145755
1st Qu.:0.00 not specified : 5687 block of flats :150503 1st Qu.:0.04 Panel : 66040 No :159428
Median :0.00 org spec account : 5619 specific housing: 1499 Median :0.07 Stone, brick: 64815 Yes: 2328
Mean :0.03 reg oper account : 73830 terraced house : 1212 Mean :0.10 Block : 9253
3rd Qu.:0.03 reg oper spec account: 12080 3rd Qu.:0.13 Wooden : 5362
Max. :1.00 Max. :1.00 Mixed : 2296
NA's :169682 NA's :148431 (Other) : 3404
OBS_30_CNT_SOCIAL_CIRCLE DEF_30_CNT_SOCIAL_CIRCLE OBS_60_CNT_SOCIAL_CIRCLE DEF_60_CNT_SOCIAL_CIRCLE DAYS_LAST_PHONE_CHANGE FLAG_DOCUMENT_2
Min. : 0.000 Min. : 0.0000 Min. : 0.000 Min. : 0.0 Min. :-4292.0 Min. :0.00e+00
1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.0 1st Qu.:-1570.0 1st Qu.:0.00e+00
Median : 0.000 Median : 0.0000 Median : 0.000 Median : 0.0 Median : -757.0 Median :0.00e+00
Mean : 1.422 Mean : 0.1434 Mean : 1.405 Mean : 0.1 Mean : -962.9 Mean :4.23e-05
3rd Qu.: 2.000 3rd Qu.: 0.0000 3rd Qu.: 2.000 3rd Qu.: 0.0 3rd Qu.: -274.0 3rd Qu.:0.00e+00
Max. :348.000 Max. :34.0000 Max. :344.000 Max. :24.0 Max. : 0.0 Max. :1.00e+00
NA's :1021 NA's :1021 NA's :1021 NA's :1021 NA's :1
FLAG_DOCUMENT_3 FLAG_DOCUMENT_4 FLAG_DOCUMENT_5 FLAG_DOCUMENT_6 FLAG_DOCUMENT_7 FLAG_DOCUMENT_8 FLAG_DOCUMENT_9
Min. :0.00 Min. :0.00e+00 Min. :0.00000 Min. :0.00000 Min. :0.0000000 Min. :0.00000 Min. :0.000000
1st Qu.:0.00 1st Qu.:0.00e+00 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000000 1st Qu.:0.00000 1st Qu.:0.000000
Median :1.00 Median :0.00e+00 Median :0.00000 Median :0.00000 Median :0.0000000 Median :0.00000 Median :0.000000
Mean :0.71 Mean :8.13e-05 Mean :0.01511 Mean :0.08806 Mean :0.0001919 Mean :0.08138 Mean :0.003896
3rd Qu.:1.00 3rd Qu.:0.00e+00 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000000 3rd Qu.:0.00000 3rd Qu.:0.000000
Max. :1.00 Max. :1.00e+00 Max. :1.00000 Max. :1.00000 Max. :1.0000000 Max. :1.00000 Max. :1.000000
FLAG_DOCUMENT_10 FLAG_DOCUMENT_11 FLAG_DOCUMENT_12 FLAG_DOCUMENT_13 FLAG_DOCUMENT_14 FLAG_DOCUMENT_15 FLAG_DOCUMENT_16
Min. :0.00e+00 Min. :0.000000 Min. :0.0e+00 Min. :0.000000 Min. :0.000000 Min. :0.00000 Min. :0.000000
1st Qu.:0.00e+00 1st Qu.:0.000000 1st Qu.:0.0e+00 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.000000
Median :0.00e+00 Median :0.000000 Median :0.0e+00 Median :0.000000 Median :0.000000 Median :0.00000 Median :0.000000
Mean :2.28e-05 Mean :0.003912 Mean :6.5e-06 Mean :0.003525 Mean :0.002936 Mean :0.00121 Mean :0.009928
3rd Qu.:0.00e+00 3rd Qu.:0.000000 3rd Qu.:0.0e+00 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.000000
Max. :1.00e+00 Max. :1.000000 Max. :1.0e+00 Max. :1.000000 Max. :1.000000 Max. :1.00000 Max. :1.000000
FLAG_DOCUMENT_17 FLAG_DOCUMENT_18 FLAG_DOCUMENT_19 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 AMT_REQ_CREDIT_BUREAU_HOUR
Min. :0.0000000 Min. :0.00000 Min. :0.0000000 Min. :0.0000000 Min. :0.0000000 Min. :0.00
1st Qu.:0.0000000 1st Qu.:0.00000 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.0000000 1st Qu.:0.00
Median :0.0000000 Median :0.00000 Median :0.0000000 Median :0.0000000 Median :0.0000000 Median :0.00
Mean :0.0002667 Mean :0.00813 Mean :0.0005951 Mean :0.0005073 Mean :0.0003349 Mean :0.01
3rd Qu.:0.0000000 3rd Qu.:0.00000 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.0000000 3rd Qu.:0.00
Max. :1.0000000 Max. :1.00000 Max. :1.0000000 Max. :1.0000000 Max. :1.0000000 Max. :4.00
NA's :41519
AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_YEAR
Min. :0.00 Min. :0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0
1st Qu.:0.00 1st Qu.:0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.0
Median :0.00 Median :0.00 Median : 0.00 Median : 0.00 Median : 1.0
Mean :0.01 Mean :0.03 Mean : 0.27 Mean : 0.27 Mean : 1.9
3rd Qu.:0.00 3rd Qu.:0.00 3rd Qu.: 0.00 3rd Qu.: 0.00 3rd Qu.: 3.0
Max. :9.00 Max. :8.00 Max. :27.00 Max. :261.00 Max. :25.0
NA's :41519 NA's :41519 NA's :41519 NA's :41519 NA's :41519
head(train_data)
sum(is.na(train_data))
[1] 8388094
sum(is.na(test_data))
[1] 1285385
sum(is.na(prev_app))
[1] 10288585
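The totals above say how many cells are missing overall but not which columns are responsible. A per-column breakdown can make that clearer. The sketch below uses a hypothetical helper, missing_by_column, and a toy data frame rather than the competition data. Note that is.na() does not count blank strings, so factor columns with a "" level (such as OCCUPATION_TYPE in the summary above) would not show up unless blanks are converted to NA first, for example via the na.strings argument of read.csv.

```r
# Hypothetical helper: count and share of NA values per column,
# keeping only columns with at least one NA, most incomplete first.
missing_by_column <- function(df) {
  n_missing <- colSums(is.na(df))
  out <- data.frame(
    column = names(n_missing),
    n_missing = as.integer(n_missing),
    pct_missing = round(100 * as.numeric(n_missing) / nrow(df), 2),
    stringsAsFactors = FALSE
  )
  out <- out[out$n_missing > 0, ]
  out[order(-out$n_missing), ]
}

# Toy data frame for illustration only (not the competition data).
# Column b contains a blank string, which is.na() does not treat as missing.
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "", "y", "z"),
  c = 1:4
)
missing_by_column(toy)
```

Applied to train_data, a call such as missing_by_column(train_data) would be expected to list columns like OWN_CAR_AGE near the top, since the summary above reports 202929 NA values for it out of 307511 rows (about 66%).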