Home Credit strives to broaden financial inclusion for the unbanked population by providing a positive and safe borrowing experience. In order to make sure this underserved population has a positive loan experience, Home Credit makes use of a variety of alternative data, including telco and transactional information, to predict their clients’ repayment abilities. While Home Credit is currently using various statistical and machine learning methods to make these predictions, they’re challenging Kagglers to help them unlock the full potential of their data. Doing so will ensure that clients capable of repayment are not rejected and that loans are given with a principal, maturity, and repayment calendar that will empower their clients to be successful.

In this project, we will explore two models for predicting whether a loan borrower will default on payment. The target variable is binary, i.e. it takes the value 1 or 0. We will begin by importing the data before proceeding with the data exploration.
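The import step itself is not shown in the rendered output, so the following is only a minimal sketch of how the three tables could be loaded. The file names are those of the Kaggle Home Credit Default Risk download; the `data_dir` folder and the `read_home_credit` helper are hypothetical and introduced here for illustration. `stringsAsFactors = TRUE` is an assumption made so that text columns come back as factors, as in the `str()` output further down.

```r
# Hypothetical location of the Kaggle CSV files; adjust to your own setup.
data_dir <- "data"

# Hypothetical helper: read one CSV from a folder, failing early with a
# clear message if the file is missing.
# stringsAsFactors = TRUE turns text columns into factors; it has not been
# the default since R 4.0.0, so it is set explicitly here.
read_home_credit <- function(file_name, dir = data_dir) {
  path <- file.path(dir, file_name)
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  read.csv(path, stringsAsFactors = TRUE)
}

# Load the three tables only when the training file is present, so the
# sketch can be sourced safely on a machine without the data.
if (file.exists(file.path(data_dir, "application_train.csv"))) {
  train_data <- read_home_credit("application_train.csv")
  test_data  <- read_home_credit("application_test.csv")
  prev_app   <- read_home_credit("previous_application.csv")
}
```

Keeping the folder in a single variable avoids repeating a machine-specific absolute path in every `read.csv()` call.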

We will then examine the datasets for missing data:

str(train_data)
'data.frame':   307511 obs. of  122 variables:
 $ SK_ID_CURR                  : int  100002 100003 100004 100006 100007 100008 100009 100010 100011 100012 ...
 $ TARGET                      : int  1 0 0 0 0 0 0 0 0 0 ...
 $ NAME_CONTRACT_TYPE          : Factor w/ 2 levels "Cash loans","Revolving loans": 1 1 2 1 1 1 1 1 1 2 ...
 $ CODE_GENDER                 : Factor w/ 3 levels "F","M","XNA": 2 1 2 1 2 2 1 2 1 2 ...
 $ FLAG_OWN_CAR                : Factor w/ 2 levels "N","Y": 1 1 2 1 1 1 2 2 1 1 ...
 $ FLAG_OWN_REALTY             : Factor w/ 2 levels "N","Y": 2 1 2 2 2 2 2 2 2 2 ...
 $ CNT_CHILDREN                : int  0 0 0 0 0 0 1 0 0 0 ...
 $ AMT_INCOME_TOTAL            : num  202500 270000 67500 135000 121500 ...
 $ AMT_CREDIT                  : num  406598 1293503 135000 312683 513000 ...
 $ AMT_ANNUITY                 : num  24701 35699 6750 29687 21866 ...
 $ AMT_GOODS_PRICE             : num  351000 1129500 135000 297000 513000 ...
 $ NAME_TYPE_SUITE             : Factor w/ 8 levels "","Children",..: 8 3 8 8 8 7 8 8 2 8 ...
 $ NAME_INCOME_TYPE            : Factor w/ 8 levels "Businessman",..: 8 5 8 8 8 5 2 5 4 8 ...
 $ NAME_EDUCATION_TYPE         : Factor w/ 5 levels "Academic degree",..: 5 2 5 5 5 5 2 2 5 5 ...
 $ NAME_FAMILY_STATUS          : Factor w/ 6 levels "Civil marriage",..: 4 2 4 1 4 2 2 2 2 4 ...
 $ NAME_HOUSING_TYPE           : Factor w/ 6 levels "Co-op apartment",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ REGION_POPULATION_RELATIVE  : num  0.0188 0.00354 0.01003 0.00802 0.02866 ...
 $ DAYS_BIRTH                  : int  -9461 -16765 -19046 -19005 -19932 -16941 -13778 -18850 -20099 -14469 ...
 $ DAYS_EMPLOYED               : int  -637 -1188 -225 -3039 -3038 -1588 -3130 -449 365243 -2019 ...
 $ DAYS_REGISTRATION           : num  -3648 -1186 -4260 -9833 -4311 ...
 $ DAYS_ID_PUBLISH             : int  -2120 -291 -2531 -2437 -3458 -477 -619 -2379 -3514 -3992 ...
 $ OWN_CAR_AGE                 : num  NA NA 26 NA NA NA 17 8 NA NA ...
 $ FLAG_MOBIL                  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ FLAG_EMP_PHONE              : int  1 1 1 1 1 1 1 1 0 1 ...
 $ FLAG_WORK_PHONE             : int  0 0 1 0 0 1 0 1 0 0 ...
 $ FLAG_CONT_MOBILE            : int  1 1 1 1 1 1 1 1 1 1 ...
 $ FLAG_PHONE                  : int  1 1 1 0 0 1 1 0 0 0 ...
 $ FLAG_EMAIL                  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ OCCUPATION_TYPE             : Factor w/ 19 levels "","Accountants",..: 10 5 10 10 5 10 2 12 1 10 ...
 $ CNT_FAM_MEMBERS             : num  1 2 1 2 1 2 3 2 2 1 ...
 $ REGION_RATING_CLIENT        : int  2 1 2 2 2 2 2 3 2 2 ...
 $ REGION_RATING_CLIENT_W_CITY : int  2 1 2 2 2 2 2 3 2 2 ...
 $ WEEKDAY_APPR_PROCESS_START  : Factor w/ 7 levels "FRIDAY","MONDAY",..: 7 2 2 7 5 7 4 2 7 5 ...
 $ HOUR_APPR_PROCESS_START     : int  10 11 9 17 11 16 16 16 14 8 ...
 $ REG_REGION_NOT_LIVE_REGION  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ REG_REGION_NOT_WORK_REGION  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ LIVE_REGION_NOT_WORK_REGION : int  0 0 0 0 0 0 0 0 0 0 ...
 $ REG_CITY_NOT_LIVE_CITY      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ REG_CITY_NOT_WORK_CITY      : int  0 0 0 0 1 0 0 1 0 0 ...
 $ LIVE_CITY_NOT_WORK_CITY     : int  0 0 0 0 1 0 0 1 0 0 ...
 $ ORGANIZATION_TYPE           : Factor w/ 58 levels "Advertising",..: 6 40 12 6 38 34 6 34 58 10 ...
 $ EXT_SOURCE_1                : num  0.083 0.311 NA NA NA ...
 $ EXT_SOURCE_2                : num  0.263 0.622 0.556 0.65 0.323 ...
 $ EXT_SOURCE_3                : num  0.139 NA 0.73 NA NA ...
 $ APARTMENTS_AVG              : num  0.0247 0.0959 NA NA NA NA NA NA NA NA ...
 $ BASEMENTAREA_AVG            : num  0.0369 0.0529 NA NA NA NA NA NA NA NA ...
 $ YEARS_BEGINEXPLUATATION_AVG : num  0.972 0.985 NA NA NA ...
 $ YEARS_BUILD_AVG             : num  0.619 0.796 NA NA NA ...
 $ COMMONAREA_AVG              : num  0.0143 0.0605 NA NA NA NA NA NA NA NA ...
 $ ELEVATORS_AVG               : num  0 0.08 NA NA NA NA NA NA NA NA ...
 $ ENTRANCES_AVG               : num  0.069 0.0345 NA NA NA NA NA NA NA NA ...
 $ FLOORSMAX_AVG               : num  0.0833 0.2917 NA NA NA ...
 $ FLOORSMIN_AVG               : num  0.125 0.333 NA NA NA ...
 $ LANDAREA_AVG                : num  0.0369 0.013 NA NA NA NA NA NA NA NA ...
 $ LIVINGAPARTMENTS_AVG        : num  0.0202 0.0773 NA NA NA NA NA NA NA NA ...
 $ LIVINGAREA_AVG              : num  0.019 0.0549 NA NA NA NA NA NA NA NA ...
 $ NONLIVINGAPARTMENTS_AVG     : num  0 0.0039 NA NA NA NA NA NA NA NA ...
 $ NONLIVINGAREA_AVG           : num  0 0.0098 NA NA NA NA NA NA NA NA ...
 $ APARTMENTS_MODE             : num  0.0252 0.0924 NA NA NA NA NA NA NA NA ...
 $ BASEMENTAREA_MODE           : num  0.0383 0.0538 NA NA NA NA NA NA NA NA ...
 $ YEARS_BEGINEXPLUATATION_MODE: num  0.972 0.985 NA NA NA ...
 $ YEARS_BUILD_MODE            : num  0.634 0.804 NA NA NA ...
 $ COMMONAREA_MODE             : num  0.0144 0.0497 NA NA NA NA NA NA NA NA ...
 $ ELEVATORS_MODE              : num  0 0.0806 NA NA NA NA NA NA NA NA ...
 $ ENTRANCES_MODE              : num  0.069 0.0345 NA NA NA NA NA NA NA NA ...
 $ FLOORSMAX_MODE              : num  0.0833 0.2917 NA NA NA ...
 $ FLOORSMIN_MODE              : num  0.125 0.333 NA NA NA ...
 $ LANDAREA_MODE               : num  0.0377 0.0128 NA NA NA NA NA NA NA NA ...
 $ LIVINGAPARTMENTS_MODE       : num  0.022 0.079 NA NA NA NA NA NA NA NA ...
 $ LIVINGAREA_MODE             : num  0.0198 0.0554 NA NA NA NA NA NA NA NA ...
 $ NONLIVINGAPARTMENTS_MODE    : num  0 0 NA NA NA NA NA NA NA NA ...
 $ NONLIVINGAREA_MODE          : num  0 0 NA NA NA NA NA NA NA NA ...
 $ APARTMENTS_MEDI             : num  0.025 0.0968 NA NA NA NA NA NA NA NA ...
 $ BASEMENTAREA_MEDI           : num  0.0369 0.0529 NA NA NA NA NA NA NA NA ...
 $ YEARS_BEGINEXPLUATATION_MEDI: num  0.972 0.985 NA NA NA ...
 $ YEARS_BUILD_MEDI            : num  0.624 0.799 NA NA NA ...
 $ COMMONAREA_MEDI             : num  0.0144 0.0608 NA NA NA NA NA NA NA NA ...
 $ ELEVATORS_MEDI              : num  0 0.08 NA NA NA NA NA NA NA NA ...
 $ ENTRANCES_MEDI              : num  0.069 0.0345 NA NA NA NA NA NA NA NA ...
 $ FLOORSMAX_MEDI              : num  0.0833 0.2917 NA NA NA ...
 $ FLOORSMIN_MEDI              : num  0.125 0.333 NA NA NA ...
 $ LANDAREA_MEDI               : num  0.0375 0.0132 NA NA NA NA NA NA NA NA ...
 $ LIVINGAPARTMENTS_MEDI       : num  0.0205 0.0787 NA NA NA NA NA NA NA NA ...
 $ LIVINGAREA_MEDI             : num  0.0193 0.0558 NA NA NA NA NA NA NA NA ...
 $ NONLIVINGAPARTMENTS_MEDI    : num  0 0.0039 NA NA NA NA NA NA NA NA ...
 $ NONLIVINGAREA_MEDI          : num  0 0.01 NA NA NA NA NA NA NA NA ...
 $ FONDKAPREMONT_MODE          : Factor w/ 5 levels "","not specified",..: 4 4 1 1 1 1 1 1 1 1 ...
 $ HOUSETYPE_MODE              : Factor w/ 4 levels "","block of flats",..: 2 2 1 1 1 1 1 1 1 1 ...
 $ TOTALAREA_MODE              : num  0.0149 0.0714 NA NA NA NA NA NA NA NA ...
 $ WALLSMATERIAL_MODE          : Factor w/ 8 levels "","Block","Mixed",..: 7 2 1 1 1 1 1 1 1 1 ...
 $ EMERGENCYSTATE_MODE         : Factor w/ 3 levels "","No","Yes": 2 2 1 1 1 1 1 1 1 1 ...
 $ OBS_30_CNT_SOCIAL_CIRCLE    : num  2 1 0 2 0 0 1 2 1 2 ...
 $ DEF_30_CNT_SOCIAL_CIRCLE    : num  2 0 0 0 0 0 0 0 0 0 ...
 $ OBS_60_CNT_SOCIAL_CIRCLE    : num  2 1 0 2 0 0 1 2 1 2 ...
 $ DEF_60_CNT_SOCIAL_CIRCLE    : num  2 0 0 0 0 0 0 0 0 0 ...
 $ DAYS_LAST_PHONE_CHANGE      : num  -1134 -828 -815 -617 -1106 ...
 $ FLAG_DOCUMENT_2             : int  0 0 0 0 0 0 0 0 0 0 ...
 $ FLAG_DOCUMENT_3             : int  1 1 0 1 0 1 0 1 1 0 ...
 $ FLAG_DOCUMENT_4             : int  0 0 0 0 0 0 0 0 0 0 ...
  [list output truncated]
summary(train_data)
   SK_ID_CURR         TARGET              NAME_CONTRACT_TYPE CODE_GENDER  FLAG_OWN_CAR FLAG_OWN_REALTY  CNT_CHILDREN     AMT_INCOME_TOTAL   
 Min.   :100002   Min.   :0.00000   Cash loans     :278232   F  :202448   N:202924     N: 94199        Min.   : 0.0000   Min.   :    25650  
 1st Qu.:189146   1st Qu.:0.00000   Revolving loans: 29279   M  :105059   Y:104587     Y:213312        1st Qu.: 0.0000   1st Qu.:   112500  
 Median :278202   Median :0.00000                            XNA:     4                                Median : 0.0000   Median :   147150  
 Mean   :278181   Mean   :0.08073                                                                      Mean   : 0.4171   Mean   :   168798  
 3rd Qu.:367143   3rd Qu.:0.00000                                                                      3rd Qu.: 1.0000   3rd Qu.:   202500  
 Max.   :456255   Max.   :1.00000                                                                      Max.   :19.0000   Max.   :117000000  
                                                                                                                                            
   AMT_CREDIT       AMT_ANNUITY     AMT_GOODS_PRICE          NAME_TYPE_SUITE               NAME_INCOME_TYPE 
 Min.   :  45000   Min.   :  1616   Min.   :  40500   Unaccompanied  :248526   Working             :158774  
 1st Qu.: 270000   1st Qu.: 16524   1st Qu.: 238500   Family         : 40149   Commercial associate: 71617  
 Median : 513531   Median : 24903   Median : 450000   Spouse, partner: 11370   Pensioner           : 55362  
 Mean   : 599026   Mean   : 27109   Mean   : 538396   Children       :  3267   State servant       : 21703  
 3rd Qu.: 808650   3rd Qu.: 34596   3rd Qu.: 679500   Other_B        :  1770   Unemployed          :    22  
 Max.   :4050000   Max.   :258026   Max.   :4050000                  :  1292   Student             :    18  
                   NA's   :12       NA's   :278       (Other)        :  1137   (Other)             :    15  
                    NAME_EDUCATION_TYPE            NAME_FAMILY_STATUS           NAME_HOUSING_TYPE  REGION_POPULATION_RELATIVE   DAYS_BIRTH    
 Academic degree              :   164   Civil marriage      : 29775   Co-op apartment    :  1122   Min.   :0.00029            Min.   :-25229  
 Higher education             : 74863   Married             :196432   House / apartment  :272868   1st Qu.:0.01001            1st Qu.:-19682  
 Incomplete higher            : 10277   Separated           : 19770   Municipal apartment: 11183   Median :0.01885            Median :-15750  
 Lower secondary              :  3816   Single / not married: 45444   Office apartment   :  2617   Mean   :0.02087            Mean   :-16037  
 Secondary / secondary special:218391   Unknown             :     2   Rented apartment   :  4881   3rd Qu.:0.02866            3rd Qu.:-12413  
                                        Widow               : 16088   With parents       : 14840   Max.   :0.07251            Max.   : -7489  
                                                                                                                                              
 DAYS_EMPLOYED    DAYS_REGISTRATION DAYS_ID_PUBLISH  OWN_CAR_AGE       FLAG_MOBIL FLAG_EMP_PHONE   FLAG_WORK_PHONE  FLAG_CONT_MOBILE
 Min.   :-17912   Min.   :-24672    Min.   :-7197   Min.   : 0.00    Min.   :0    Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.: -2760   1st Qu.: -7480    1st Qu.:-4299   1st Qu.: 5.00    1st Qu.:1    1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:1.0000  
 Median : -1213   Median : -4504    Median :-3254   Median : 9.00    Median :1    Median :1.0000   Median :0.0000   Median :1.0000  
 Mean   : 63815   Mean   : -4986    Mean   :-2994   Mean   :12.06    Mean   :1    Mean   :0.8199   Mean   :0.1994   Mean   :0.9981  
 3rd Qu.:  -289   3rd Qu.: -2010    3rd Qu.:-1720   3rd Qu.:15.00    3rd Qu.:1    3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
 Max.   :365243   Max.   :     0    Max.   :    0   Max.   :91.00    Max.   :1    Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
                                                    NA's   :202929                                                                  
   FLAG_PHONE       FLAG_EMAIL         OCCUPATION_TYPE  CNT_FAM_MEMBERS  REGION_RATING_CLIENT REGION_RATING_CLIENT_W_CITY
 Min.   :0.0000   Min.   :0.00000              :96391   Min.   : 1.000   Min.   :1.000        Min.   :1.000              
 1st Qu.:0.0000   1st Qu.:0.00000   Laborers   :55186   1st Qu.: 2.000   1st Qu.:2.000        1st Qu.:2.000              
 Median :0.0000   Median :0.00000   Sales staff:32102   Median : 2.000   Median :2.000        Median :2.000              
 Mean   :0.2811   Mean   :0.05672   Core staff :27570   Mean   : 2.153   Mean   :2.052        Mean   :2.032              
 3rd Qu.:1.0000   3rd Qu.:0.00000   Managers   :21371   3rd Qu.: 3.000   3rd Qu.:2.000        3rd Qu.:2.000              
 Max.   :1.0000   Max.   :1.00000   Drivers    :18603   Max.   :20.000   Max.   :3.000        Max.   :3.000              
                                    (Other)    :56288   NA's   :2                                                        
 WEEKDAY_APPR_PROCESS_START HOUR_APPR_PROCESS_START REG_REGION_NOT_LIVE_REGION REG_REGION_NOT_WORK_REGION LIVE_REGION_NOT_WORK_REGION
 FRIDAY   :50338            Min.   : 0.00           Min.   :0.00000            Min.   :0.00000            Min.   :0.00000            
 MONDAY   :50714            1st Qu.:10.00           1st Qu.:0.00000            1st Qu.:0.00000            1st Qu.:0.00000            
 SATURDAY :33852            Median :12.00           Median :0.00000            Median :0.00000            Median :0.00000            
 SUNDAY   :16181            Mean   :12.06           Mean   :0.01514            Mean   :0.05077            Mean   :0.04066            
 THURSDAY :50591            3rd Qu.:14.00           3rd Qu.:0.00000            3rd Qu.:0.00000            3rd Qu.:0.00000            
 TUESDAY  :53901            Max.   :23.00           Max.   :1.00000            Max.   :1.00000            Max.   :1.00000            
 WEDNESDAY:51934                                                                                                                     
 REG_CITY_NOT_LIVE_CITY REG_CITY_NOT_WORK_CITY LIVE_CITY_NOT_WORK_CITY              ORGANIZATION_TYPE   EXT_SOURCE_1     EXT_SOURCE_2   
 Min.   :0.00000        Min.   :0.0000         Min.   :0.0000          Business Entity Type 3: 67992   Min.   :0.01     Min.   :0.0000  
 1st Qu.:0.00000        1st Qu.:0.0000         1st Qu.:0.0000          XNA                   : 55374   1st Qu.:0.33     1st Qu.:0.3925  
 Median :0.00000        Median :0.0000         Median :0.0000          Self-employed         : 38412   Median :0.51     Median :0.5660  
 Mean   :0.07817        Mean   :0.2305         Mean   :0.1796          Other                 : 16683   Mean   :0.50     Mean   :0.5144  
 3rd Qu.:0.00000        3rd Qu.:0.0000         3rd Qu.:0.0000          Medicine              : 11193   3rd Qu.:0.68     3rd Qu.:0.6636  
 Max.   :1.00000        Max.   :1.0000         Max.   :1.0000          Business Entity Type 2: 10553   Max.   :0.96     Max.   :0.8550  
                                                                       (Other)               :107304   NA's   :173378   NA's   :660     
  EXT_SOURCE_3   APARTMENTS_AVG   BASEMENTAREA_AVG YEARS_BEGINEXPLUATATION_AVG YEARS_BUILD_AVG  COMMONAREA_AVG   ELEVATORS_AVG   
 Min.   :0.00    Min.   :0.00     Min.   :0.00     Min.   :0.00                Min.   :0.00     Min.   :0.00     Min.   :0.00    
 1st Qu.:0.37    1st Qu.:0.06     1st Qu.:0.04     1st Qu.:0.98                1st Qu.:0.69     1st Qu.:0.01     1st Qu.:0.00    
 Median :0.54    Median :0.09     Median :0.08     Median :0.98                Median :0.76     Median :0.02     Median :0.00    
 Mean   :0.51    Mean   :0.12     Mean   :0.09     Mean   :0.98                Mean   :0.75     Mean   :0.04     Mean   :0.08    
 3rd Qu.:0.67    3rd Qu.:0.15     3rd Qu.:0.11     3rd Qu.:0.99                3rd Qu.:0.82     3rd Qu.:0.05     3rd Qu.:0.12    
 Max.   :0.90    Max.   :1.00     Max.   :1.00     Max.   :1.00                Max.   :1.00     Max.   :1.00     Max.   :1.00    
 NA's   :60965   NA's   :156061   NA's   :179943   NA's   :150007              NA's   :204488   NA's   :214865   NA's   :163891  
 ENTRANCES_AVG    FLOORSMAX_AVG    FLOORSMIN_AVG     LANDAREA_AVG    LIVINGAPARTMENTS_AVG LIVINGAREA_AVG   NONLIVINGAPARTMENTS_AVG
 Min.   :0.00     Min.   :0.00     Min.   :0.00     Min.   :0.00     Min.   :0.00         Min.   :0.00     Min.   :0.00           
 1st Qu.:0.07     1st Qu.:0.17     1st Qu.:0.08     1st Qu.:0.02     1st Qu.:0.05         1st Qu.:0.05     1st Qu.:0.00           
 Median :0.14     Median :0.17     Median :0.21     Median :0.05     Median :0.08         Median :0.07     Median :0.00           
 Mean   :0.15     Mean   :0.23     Mean   :0.23     Mean   :0.07     Mean   :0.10         Mean   :0.11     Mean   :0.01           
 3rd Qu.:0.21     3rd Qu.:0.33     3rd Qu.:0.38     3rd Qu.:0.09     3rd Qu.:0.12         3rd Qu.:0.13     3rd Qu.:0.00           
 Max.   :1.00     Max.   :1.00     Max.   :1.00     Max.   :1.00     Max.   :1.00         Max.   :1.00     Max.   :1.00           
 NA's   :154828   NA's   :153020   NA's   :208642   NA's   :182590   NA's   :210199       NA's   :154350   NA's   :213514         
 NONLIVINGAREA_AVG APARTMENTS_MODE  BASEMENTAREA_MODE YEARS_BEGINEXPLUATATION_MODE YEARS_BUILD_MODE COMMONAREA_MODE  ELEVATORS_MODE  
 Min.   :0.00      Min.   :0.00     Min.   :0.00      Min.   :0.00                 Min.   :0.00     Min.   :0.00     Min.   :0.00    
 1st Qu.:0.00      1st Qu.:0.05     1st Qu.:0.04      1st Qu.:0.98                 1st Qu.:0.70     1st Qu.:0.01     1st Qu.:0.00    
 Median :0.00      Median :0.08     Median :0.07      Median :0.98                 Median :0.76     Median :0.02     Median :0.00    
 Mean   :0.03      Mean   :0.11     Mean   :0.09      Mean   :0.98                 Mean   :0.76     Mean   :0.04     Mean   :0.07    
 3rd Qu.:0.03      3rd Qu.:0.14     3rd Qu.:0.11      3rd Qu.:0.99                 3rd Qu.:0.82     3rd Qu.:0.05     3rd Qu.:0.12    
 Max.   :1.00      Max.   :1.00     Max.   :1.00      Max.   :1.00                 Max.   :1.00     Max.   :1.00     Max.   :1.00    
 NA's   :169682    NA's   :156061   NA's   :179943    NA's   :150007               NA's   :204488   NA's   :214865   NA's   :163891  
 ENTRANCES_MODE   FLOORSMAX_MODE   FLOORSMIN_MODE   LANDAREA_MODE    LIVINGAPARTMENTS_MODE LIVINGAREA_MODE  NONLIVINGAPARTMENTS_MODE
 Min.   :0.00     Min.   :0.00     Min.   :0.00     Min.   :0.00     Min.   :0.00          Min.   :0.00     Min.   :0.00            
 1st Qu.:0.07     1st Qu.:0.17     1st Qu.:0.08     1st Qu.:0.02     1st Qu.:0.05          1st Qu.:0.04     1st Qu.:0.00            
 Median :0.14     Median :0.17     Median :0.21     Median :0.05     Median :0.08          Median :0.07     Median :0.00            
 Mean   :0.15     Mean   :0.22     Mean   :0.23     Mean   :0.06     Mean   :0.11          Mean   :0.11     Mean   :0.01            
 3rd Qu.:0.21     3rd Qu.:0.33     3rd Qu.:0.38     3rd Qu.:0.08     3rd Qu.:0.13          3rd Qu.:0.13     3rd Qu.:0.00            
 Max.   :1.00     Max.   :1.00     Max.   :1.00     Max.   :1.00     Max.   :1.00          Max.   :1.00     Max.   :1.00            
 NA's   :154828   NA's   :153020   NA's   :208642   NA's   :182590   NA's   :210199        NA's   :154350   NA's   :213514          
 NONLIVINGAREA_MODE APARTMENTS_MEDI  BASEMENTAREA_MEDI YEARS_BEGINEXPLUATATION_MEDI YEARS_BUILD_MEDI COMMONAREA_MEDI  ELEVATORS_MEDI  
 Min.   :0.00       Min.   :0.00     Min.   :0.00      Min.   :0.00                 Min.   :0.00     Min.   :0.00     Min.   :0.00    
 1st Qu.:0.00       1st Qu.:0.06     1st Qu.:0.04      1st Qu.:0.98                 1st Qu.:0.69     1st Qu.:0.01     1st Qu.:0.00    
 Median :0.00       Median :0.09     Median :0.08      Median :0.98                 Median :0.76     Median :0.02     Median :0.00    
 Mean   :0.03       Mean   :0.12     Mean   :0.09      Mean   :0.98                 Mean   :0.76     Mean   :0.04     Mean   :0.08    
 3rd Qu.:0.02       3rd Qu.:0.15     3rd Qu.:0.11      3rd Qu.:0.99                 3rd Qu.:0.83     3rd Qu.:0.05     3rd Qu.:0.12    
 Max.   :1.00       Max.   :1.00     Max.   :1.00      Max.   :1.00                 Max.   :1.00     Max.   :1.00     Max.   :1.00    
 NA's   :169682     NA's   :156061   NA's   :179943    NA's   :150007               NA's   :204488   NA's   :214865   NA's   :163891  
 ENTRANCES_MEDI   FLOORSMAX_MEDI   FLOORSMIN_MEDI   LANDAREA_MEDI    LIVINGAPARTMENTS_MEDI LIVINGAREA_MEDI  NONLIVINGAPARTMENTS_MEDI
 Min.   :0.00     Min.   :0.00     Min.   :0.00     Min.   :0.00     Min.   :0.00          Min.   :0.00     Min.   :0.00            
 1st Qu.:0.07     1st Qu.:0.17     1st Qu.:0.08     1st Qu.:0.02     1st Qu.:0.05          1st Qu.:0.05     1st Qu.:0.00            
 Median :0.14     Median :0.17     Median :0.21     Median :0.05     Median :0.08          Median :0.07     Median :0.00            
 Mean   :0.15     Mean   :0.23     Mean   :0.23     Mean   :0.07     Mean   :0.10          Mean   :0.11     Mean   :0.01            
 3rd Qu.:0.21     3rd Qu.:0.33     3rd Qu.:0.38     3rd Qu.:0.09     3rd Qu.:0.12          3rd Qu.:0.13     3rd Qu.:0.00            
 Max.   :1.00     Max.   :1.00     Max.   :1.00     Max.   :1.00     Max.   :1.00          Max.   :1.00     Max.   :1.00            
 NA's   :154828   NA's   :153020   NA's   :208642   NA's   :182590   NA's   :210199        NA's   :154350   NA's   :213514          
 NONLIVINGAREA_MEDI             FONDKAPREMONT_MODE          HOUSETYPE_MODE   TOTALAREA_MODE      WALLSMATERIAL_MODE EMERGENCYSTATE_MODE
 Min.   :0.00                            :210295                   :154297   Min.   :0.00                 :156341      :145755         
 1st Qu.:0.00       not specified        :  5687   block of flats  :150503   1st Qu.:0.04     Panel       : 66040   No :159428         
 Median :0.00       org spec account     :  5619   specific housing:  1499   Median :0.07     Stone, brick: 64815   Yes:  2328         
 Mean   :0.03       reg oper account     : 73830   terraced house  :  1212   Mean   :0.10     Block       :  9253                      
 3rd Qu.:0.03       reg oper spec account: 12080                             3rd Qu.:0.13     Wooden      :  5362                      
 Max.   :1.00                                                                Max.   :1.00     Mixed       :  2296                      
 NA's   :169682                                                              NA's   :148431   (Other)     :  3404                      
 OBS_30_CNT_SOCIAL_CIRCLE DEF_30_CNT_SOCIAL_CIRCLE OBS_60_CNT_SOCIAL_CIRCLE DEF_60_CNT_SOCIAL_CIRCLE DAYS_LAST_PHONE_CHANGE FLAG_DOCUMENT_2   
 Min.   :  0.000          Min.   : 0.0000          Min.   :  0.000          Min.   : 0.0             Min.   :-4292.0        Min.   :0.00e+00  
 1st Qu.:  0.000          1st Qu.: 0.0000          1st Qu.:  0.000          1st Qu.: 0.0             1st Qu.:-1570.0        1st Qu.:0.00e+00  
 Median :  0.000          Median : 0.0000          Median :  0.000          Median : 0.0             Median : -757.0        Median :0.00e+00  
 Mean   :  1.422          Mean   : 0.1434          Mean   :  1.405          Mean   : 0.1             Mean   : -962.9        Mean   :4.23e-05  
 3rd Qu.:  2.000          3rd Qu.: 0.0000          3rd Qu.:  2.000          3rd Qu.: 0.0             3rd Qu.: -274.0        3rd Qu.:0.00e+00  
 Max.   :348.000          Max.   :34.0000          Max.   :344.000          Max.   :24.0             Max.   :    0.0        Max.   :1.00e+00  
 NA's   :1021             NA's   :1021             NA's   :1021             NA's   :1021             NA's   :1                                
 FLAG_DOCUMENT_3 FLAG_DOCUMENT_4    FLAG_DOCUMENT_5   FLAG_DOCUMENT_6   FLAG_DOCUMENT_7     FLAG_DOCUMENT_8   FLAG_DOCUMENT_9   
 Min.   :0.00    Min.   :0.00e+00   Min.   :0.00000   Min.   :0.00000   Min.   :0.0000000   Min.   :0.00000   Min.   :0.000000  
 1st Qu.:0.00    1st Qu.:0.00e+00   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000000   1st Qu.:0.00000   1st Qu.:0.000000  
 Median :1.00    Median :0.00e+00   Median :0.00000   Median :0.00000   Median :0.0000000   Median :0.00000   Median :0.000000  
 Mean   :0.71    Mean   :8.13e-05   Mean   :0.01511   Mean   :0.08806   Mean   :0.0001919   Mean   :0.08138   Mean   :0.003896  
 3rd Qu.:1.00    3rd Qu.:0.00e+00   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.0000000   3rd Qu.:0.00000   3rd Qu.:0.000000  
 Max.   :1.00    Max.   :1.00e+00   Max.   :1.00000   Max.   :1.00000   Max.   :1.0000000   Max.   :1.00000   Max.   :1.000000  
                                                                                                                                
 FLAG_DOCUMENT_10   FLAG_DOCUMENT_11   FLAG_DOCUMENT_12  FLAG_DOCUMENT_13   FLAG_DOCUMENT_14   FLAG_DOCUMENT_15  FLAG_DOCUMENT_16  
 Min.   :0.00e+00   Min.   :0.000000   Min.   :0.0e+00   Min.   :0.000000   Min.   :0.000000   Min.   :0.00000   Min.   :0.000000  
 1st Qu.:0.00e+00   1st Qu.:0.000000   1st Qu.:0.0e+00   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.000000  
 Median :0.00e+00   Median :0.000000   Median :0.0e+00   Median :0.000000   Median :0.000000   Median :0.00000   Median :0.000000  
 Mean   :2.28e-05   Mean   :0.003912   Mean   :6.5e-06   Mean   :0.003525   Mean   :0.002936   Mean   :0.00121   Mean   :0.009928  
 3rd Qu.:0.00e+00   3rd Qu.:0.000000   3rd Qu.:0.0e+00   3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0.000000  
 Max.   :1.00e+00   Max.   :1.000000   Max.   :1.0e+00   Max.   :1.000000   Max.   :1.000000   Max.   :1.00000   Max.   :1.000000  
                                                                                                                                   
 FLAG_DOCUMENT_17    FLAG_DOCUMENT_18  FLAG_DOCUMENT_19    FLAG_DOCUMENT_20    FLAG_DOCUMENT_21    AMT_REQ_CREDIT_BUREAU_HOUR
 Min.   :0.0000000   Min.   :0.00000   Min.   :0.0000000   Min.   :0.0000000   Min.   :0.0000000   Min.   :0.00              
 1st Qu.:0.0000000   1st Qu.:0.00000   1st Qu.:0.0000000   1st Qu.:0.0000000   1st Qu.:0.0000000   1st Qu.:0.00              
 Median :0.0000000   Median :0.00000   Median :0.0000000   Median :0.0000000   Median :0.0000000   Median :0.00              
 Mean   :0.0002667   Mean   :0.00813   Mean   :0.0005951   Mean   :0.0005073   Mean   :0.0003349   Mean   :0.01              
 3rd Qu.:0.0000000   3rd Qu.:0.00000   3rd Qu.:0.0000000   3rd Qu.:0.0000000   3rd Qu.:0.0000000   3rd Qu.:0.00              
 Max.   :1.0000000   Max.   :1.00000   Max.   :1.0000000   Max.   :1.0000000   Max.   :1.0000000   Max.   :4.00              
                                                                                                   NA's   :41519             
 AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_YEAR
 Min.   :0.00              Min.   :0.00               Min.   : 0.00             Min.   :  0.00            Min.   : 0.0              
 1st Qu.:0.00              1st Qu.:0.00               1st Qu.: 0.00             1st Qu.:  0.00            1st Qu.: 0.0              
 Median :0.00              Median :0.00               Median : 0.00             Median :  0.00            Median : 1.0              
 Mean   :0.01              Mean   :0.03               Mean   : 0.27             Mean   :  0.27            Mean   : 1.9              
 3rd Qu.:0.00              3rd Qu.:0.00               3rd Qu.: 0.00             3rd Qu.:  0.00            3rd Qu.: 3.0              
 Max.   :9.00              Max.   :8.00               Max.   :27.00             Max.   :261.00            Max.   :25.0              
 NA's   :41519             NA's   :41519              NA's   :41519             NA's   :41519             NA's   :41519             
head(train_data)
sum(is.na(train_data))
[1] 8388094
sum(is.na(test_data))
[1] 1285385
sum(is.na(prev_app))
[1] 10288585
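`sum(is.na(...))` returns a single total per table, which does not show which columns are affected. It also ignores blank strings: in the summary above, `OCCUPATION_TYPE` has 96391 rows with an empty level that `is.na()` does not count. A per-column breakdown could be sketched as follows. The `missing_summary` helper and the `toy` data frame are hypothetical, and treating blank strings as missing is an assumption, not something the original analysis states.

```r
# Hypothetical helper: per-column count of NA values and of blank strings,
# with the combined share of rows affected, sorted from worst to best.
missing_summary <- function(df) {
  n_na <- colSums(is.na(df))
  # Blank strings only make sense for factor or character columns.
  n_blank <- vapply(df, function(col) {
    if (is.factor(col) || is.character(col)) {
      sum(!is.na(col) & as.character(col) == "")
    } else {
      0L
    }
  }, integer(1))
  out <- data.frame(
    column = names(df),
    n_na = as.integer(n_na),
    n_blank = n_blank,
    pct_missing = round(100 * (n_na + n_blank) / nrow(df), 2),
    row.names = NULL
  )
  out[order(-out$pct_missing), ]
}

# Toy data frame standing in for train_data; the values are made up.
toy <- data.frame(
  OWN_CAR_AGE = c(NA, NA, 26, NA),
  OCCUPATION_TYPE = factor(c("Laborers", "", "Core staff", "")),
  TARGET = c(1, 0, 0, 0)
)
missing_summary(toy)
```

Calling `missing_summary(train_data)` on the real table would list the columns with the highest share of missing values first, which makes it easier to decide which ones to impute or drop.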