Homework3
# na.strings = "" will treat empty string as NA
loan_data <- read.csv('https://raw.githubusercontent.com/metis-macys-66898/data622_fa2021/main/hw3/data/Loan_approval.csv', header = TRUE, na.strings = "")
loan_raw <- read.csv('https://raw.githubusercontent.com/metis-macys-66898/data622_fa2021/main/hw3/data/Loan_approval.csv', header = TRUE)

1. Exploratory Data Analysis
Our loan data has 614 rows and 13 columns: 8 categorical and 5 numerical. The target variable is Loan_Status, which is either Y (yes) or N (no) and tells us whether the applicant's loan was approved. Seven variables have blank values; Credit_History has the most, with 50 blanks.
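As a rough illustration of how per-column blank counts like these can be tallied, the sketch below reads a tiny made-up CSV snippet with the same `na.strings = ""` setting used for `loan_data`. The names `csv_text`, `toy`, `missing_counts`, and `numeric_cols` are hypothetical, and the values are invented for the example rather than taken from the loan file.

```r
# Hypothetical four-row sample; the values are illustrative only.
csv_text <- "Gender,LoanAmount,Credit_History,Loan_Status
Male,,1,Y
,128,,N
Female,66,1,Y
Male,,0,N"

# na.strings = "" turns empty fields into NA, as in the read.csv call for loan_data
toy <- read.csv(text = csv_text, header = TRUE, na.strings = "")

# Number of missing values in each column
missing_counts <- colSums(is.na(toy))

# Keep only the columns that have at least one missing value
print(missing_counts[missing_counts > 0])

# Name of the column with the most missing values
print(names(which.max(missing_counts)))

# Split of numerical versus categorical columns
numeric_cols <- sapply(toy, is.numeric)
print(c(numerical = sum(numeric_cols), categorical = sum(!numeric_cols)))
```

Running the same three summaries on `loan_data` instead of `toy` should give the column counts discussed above, assuming the file loads as described.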
| Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.00 | NA | 360 | 1 | Urban | Y |
| LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.00 | 128 | 360 | 1 | Rural | N |
| LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.00 | 66 | 360 | 1 | Urban | Y |
| LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.00 | 120 | 360 | 1 | Urban | Y |
| LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.00 | 141 | 360 | 1 | Urban | Y |
| LP001011 | Male | Yes | 2 | Graduate | Yes | 5417 | 4196.00 | 267 | 360 | 1 | Urban | Y |
| LP001013 | Male | Yes | 0 | Not Graduate | No | 2333 | 1516.00 | 95 | 360 | 1 | Urban | Y |
| LP001014 | Male | Yes | 3+ | Graduate | No | 3036 | 2504.00 | 158 | 360 | 0 | Semiurban | N |
| LP001018 | Male | Yes | 2 | Graduate | No | 4006 | 1526.00 | 168 | 360 | 1 | Urban | Y |
| LP001020 | Male | Yes | 1 | Graduate | No | 12841 | 10968.00 | 349 | 360 | 1 | Semiurban | N |
| LP001024 | Male | Yes | 2 | Graduate | No | 3200 | 700.00 | 70 | 360 | 1 | Urban | Y |
| LP001027 | Male | Yes | 2 | Graduate | NA | 2500 | 1840.00 | 109 | 360 | 1 | Urban | Y |
| LP001028 | Male | Yes | 2 | Graduate | No | 3073 | 8106.00 | 200 | 360 | 1 | Urban | Y |
| LP001029 | Male | No | 0 | Graduate | No | 1853 | 2840.00 | 114 | 360 | 1 | Rural | N |
| LP001030 | Male | Yes | 2 | Graduate | No | 1299 | 1086.00 | 17 | 120 | 1 | Urban | Y |
| LP001032 | Male | No | 0 | Graduate | No | 4950 | 0.00 | 125 | 360 | 1 | Urban | Y |
| LP001034 | Male | No | 1 | Not Graduate | No | 3596 | 0.00 | 100 | 240 | NA | Urban | Y |
| LP001036 | Female | No | 0 | Graduate | No | 3510 | 0.00 | 76 | 360 | 0 | Urban | N |
| LP001038 | Male | Yes | 0 | Not Graduate | No | 4887 | 0.00 | 133 | 360 | 1 | Rural | N |
| LP001041 | Male | Yes | 0 | Graduate | NA | 2600 | 3500.00 | 115 | NA | 1 | Urban | Y |
| LP001043 | Male | Yes | 0 | Not Graduate | No | 7660 | 0.00 | 104 | 360 | 0 | Urban | N |
| LP001046 | Male | Yes | 1 | Graduate | No | 5955 | 5625.00 | 315 | 360 | 1 | Urban | Y |
| LP001047 | Male | Yes | 0 | Not Graduate | No | 2600 | 1911.00 | 116 | 360 | 0 | Semiurban | N |
| LP001050 | NA | Yes | 2 | Not Graduate | No | 3365 | 1917.00 | 112 | 360 | 0 | Rural | N |
| LP001052 | Male | Yes | 1 | Graduate | NA | 3717 | 2925.00 | 151 | 360 | NA | Semiurban | N |
| LP001066 | Male | Yes | 0 | Graduate | Yes | 9560 | 0.00 | 191 | 360 | 1 | Semiurban | Y |
| LP001068 | Male | Yes | 0 | Graduate | No | 2799 | 2253.00 | 122 | 360 | 1 | Semiurban | Y |
| LP001073 | Male | Yes | 2 | Not Graduate | No | 4226 | 1040.00 | 110 | 360 | 1 | Urban | Y |
| LP001086 | Male | No | 0 | Not Graduate | No | 1442 | 0.00 | 35 | 360 | 1 | Urban | N |
| LP001087 | Female | No | 2 | Graduate | NA | 3750 | 2083.00 | 120 | 360 | 1 | Semiurban | Y |
| LP001091 | Male | Yes | 1 | Graduate | NA | 4166 | 3369.00 | 201 | 360 | NA | Urban | N |
| LP001095 | Male | No | 0 | Graduate | No | 3167 | 0.00 | 74 | 360 | 1 | Urban | N |
| LP001097 | Male | No | 1 | Graduate | Yes | 4692 | 0.00 | 106 | 360 | 1 | Rural | N |
| LP001098 | Male | Yes | 0 | Graduate | No | 3500 | 1667.00 | 114 | 360 | 1 | Semiurban | Y |
| LP001100 | Male | No | 3+ | Graduate | No | 12500 | 3000.00 | 320 | 360 | 1 | Rural | N |
| LP001106 | Male | Yes | 0 | Graduate | No | 2275 | 2067.00 | NA | 360 | 1 | Urban | Y |
| LP001109 | Male | Yes | 0 | Graduate | No | 1828 | 1330.00 | 100 | NA | 0 | Urban | N |
| LP001112 | Female | Yes | 0 | Graduate | No | 3667 | 1459.00 | 144 | 360 | 1 | Semiurban | Y |
| LP001114 | Male | No | 0 | Graduate | No | 4166 | 7210.00 | 184 | 360 | 1 | Urban | Y |
| LP001116 | Male | No | 0 | Not Graduate | No | 3748 | 1668.00 | 110 | 360 | 1 | Semiurban | Y |
| LP001119 | Male | No | 0 | Graduate | No | 3600 | 0.00 | 80 | 360 | 1 | Urban | N |
| LP001120 | Male | No | 0 | Graduate | No | 1800 | 1213.00 | 47 | 360 | 1 | Urban | Y |
| LP001123 | Male | Yes | 0 | Graduate | No | 2400 | 0.00 | 75 | 360 | NA | Urban | Y |
| LP001131 | Male | Yes | 0 | Graduate | No | 3941 | 2336.00 | 134 | 360 | 1 | Semiurban | Y |
| LP001136 | Male | Yes | 0 | Not Graduate | Yes | 4695 | 0.00 | 96 | NA | 1 | Urban | Y |
| LP001137 | Female | No | 0 | Graduate | No | 3410 | 0.00 | 88 | NA | 1 | Urban | Y |
| LP001138 | Male | Yes | 1 | Graduate | No | 5649 | 0.00 | 44 | 360 | 1 | Urban | Y |
| LP001144 | Male | Yes | 0 | Graduate | No | 5821 | 0.00 | 144 | 360 | 1 | Urban | Y |
| LP001146 | Female | Yes | 0 | Graduate | No | 2645 | 3440.00 | 120 | 360 | 0 | Urban | N |
| LP001151 | Female | No | 0 | Graduate | No | 4000 | 2275.00 | 144 | 360 | 1 | Semiurban | Y |
| LP001155 | Female | Yes | 0 | Not Graduate | No | 1928 | 1644.00 | 100 | 360 | 1 | Semiurban | Y |
| LP001157 | Female | No | 0 | Graduate | No | 3086 | 0.00 | 120 | 360 | 1 | Semiurban | Y |
| LP001164 | Female | No | 0 | Graduate | No | 4230 | 0.00 | 112 | 360 | 1 | Semiurban | N |
| LP001179 | Male | Yes | 2 | Graduate | No | 4616 | 0.00 | 134 | 360 | 1 | Urban | N |
| LP001186 | Female | Yes | 1 | Graduate | Yes | 11500 | 0.00 | 286 | 360 | 0 | Urban | N |
| LP001194 | Male | Yes | 2 | Graduate | No | 2708 | 1167.00 | 97 | 360 | 1 | Semiurban | Y |
| LP001195 | Male | Yes | 0 | Graduate | No | 2132 | 1591.00 | 96 | 360 | 1 | Semiurban | Y |
| LP001197 | Male | Yes | 0 | Graduate | No | 3366 | 2200.00 | 135 | 360 | 1 | Rural | N |
| LP001198 | Male | Yes | 1 | Graduate | No | 8080 | 2250.00 | 180 | 360 | 1 | Urban | Y |
| LP001199 | Male | Yes | 2 | Not Graduate | No | 3357 | 2859.00 | 144 | 360 | 1 | Urban | Y |
| LP001205 | Male | Yes | 0 | Graduate | No | 2500 | 3796.00 | 120 | 360 | 1 | Urban | Y |
| LP001206 | Male | Yes | 3+ | Graduate | No | 3029 | 0.00 | 99 | 360 | 1 | Urban | Y |
| LP001207 | Male | Yes | 0 | Not Graduate | Yes | 2609 | 3449.00 | 165 | 180 | 0 | Rural | N |
| LP001213 | Male | Yes | 1 | Graduate | No | 4945 | 0.00 | NA | 360 | 0 | Rural | N |
| LP001222 | Female | No | 0 | Graduate | No | 4166 | 0.00 | 116 | 360 | 0 | Semiurban | N |
| LP001225 | Male | Yes | 0 | Graduate | No | 5726 | 4595.00 | 258 | 360 | 1 | Semiurban | N |
| LP001228 | Male | No | 0 | Not Graduate | No | 3200 | 2254.00 | 126 | 180 | 0 | Urban | N |
| LP001233 | Male | Yes | 1 | Graduate | No | 10750 | 0.00 | 312 | 360 | 1 | Urban | Y |
| LP001238 | Male | Yes | 3+ | Not Graduate | Yes | 7100 | 0.00 | 125 | 60 | 1 | Urban | Y |
| LP001241 | Female | No | 0 | Graduate | No | 4300 | 0.00 | 136 | 360 | 0 | Semiurban | N |
| LP001243 | Male | Yes | 0 | Graduate | No | 3208 | 3066.00 | 172 | 360 | 1 | Urban | Y |
| LP001245 | Male | Yes | 2 | Not Graduate | Yes | 1875 | 1875.00 | 97 | 360 | 1 | Semiurban | Y |
| LP001248 | Male | No | 0 | Graduate | No | 3500 | 0.00 | 81 | 300 | 1 | Semiurban | Y |
| LP001250 | Male | Yes | 3+ | Not Graduate | No | 4755 | 0.00 | 95 | NA | 0 | Semiurban | N |
| LP001253 | Male | Yes | 3+ | Graduate | Yes | 5266 | 1774.00 | 187 | 360 | 1 | Semiurban | Y |
| LP001255 | Male | No | 0 | Graduate | No | 3750 | 0.00 | 113 | 480 | 1 | Urban | N |
| LP001256 | Male | No | 0 | Graduate | No | 3750 | 4750.00 | 176 | 360 | 1 | Urban | N |
| LP001259 | Male | Yes | 1 | Graduate | Yes | 1000 | 3022.00 | 110 | 360 | 1 | Urban | N |
| LP001263 | Male | Yes | 3+ | Graduate | No | 3167 | 4000.00 | 180 | 300 | 0 | Semiurban | N |
| LP001264 | Male | Yes | 3+ | Not Graduate | Yes | 3333 | 2166.00 | 130 | 360 | NA | Semiurban | Y |
| LP001265 | Female | No | 0 | Graduate | No | 3846 | 0.00 | 111 | 360 | 1 | Semiurban | Y |
| LP001266 | Male | Yes | 1 | Graduate | Yes | 2395 | 0.00 | NA | 360 | 1 | Semiurban | Y |
| LP001267 | Female | Yes | 2 | Graduate | No | 1378 | 1881.00 | 167 | 360 | 1 | Urban | N |
| LP001273 | Male | Yes | 0 | Graduate | No | 6000 | 2250.00 | 265 | 360 | NA | Semiurban | N |
| LP001275 | Male | Yes | 1 | Graduate | No | 3988 | 0.00 | 50 | 240 | 1 | Urban | Y |
| LP001279 | Male | No | 0 | Graduate | No | 2366 | 2531.00 | 136 | 360 | 1 | Semiurban | Y |
| LP001280 | Male | Yes | 2 | Not Graduate | No | 3333 | 2000.00 | 99 | 360 | NA | Semiurban | Y |
| LP001282 | Male | Yes | 0 | Graduate | No | 2500 | 2118.00 | 104 | 360 | 1 | Semiurban | Y |
| LP001289 | Male | No | 0 | Graduate | No | 8566 | 0.00 | 210 | 360 | 1 | Urban | Y |
| LP001310 | Male | Yes | 0 | Graduate | No | 5695 | 4167.00 | 175 | 360 | 1 | Semiurban | Y |
| LP001316 | Male | Yes | 0 | Graduate | No | 2958 | 2900.00 | 131 | 360 | 1 | Semiurban | Y |
| LP001318 | Male | Yes | 2 | Graduate | No | 6250 | 5654.00 | 188 | 180 | 1 | Semiurban | Y |
| LP001319 | Male | Yes | 2 | Not Graduate | No | 3273 | 1820.00 | 81 | 360 | 1 | Urban | Y |
| LP001322 | Male | No | 0 | Graduate | No | 4133 | 0.00 | 122 | 360 | 1 | Semiurban | Y |
| LP001325 | Male | No | 0 | Not Graduate | No | 3620 | 0.00 | 25 | 120 | 1 | Semiurban | Y |
| LP001326 | Male | No | 0 | Graduate | NA | 6782 | 0.00 | NA | 360 | NA | Urban | N |
| LP001327 | Female | Yes | 0 | Graduate | No | 2484 | 2302.00 | 137 | 360 | 1 | Semiurban | Y |
| LP001333 | Male | Yes | 0 | Graduate | No | 1977 | 997.00 | 50 | 360 | 1 | Semiurban | Y |
| LP001334 | Male | Yes | 0 | Not Graduate | No | 4188 | 0.00 | 115 | 180 | 1 | Semiurban | Y |
| LP001343 | Male | Yes | 0 | Graduate | No | 1759 | 3541.00 | 131 | 360 | 1 | Semiurban | Y |
| LP001345 | Male | Yes | 2 | Not Graduate | No | 4288 | 3263.00 | 133 | 180 | 1 | Urban | Y |
| LP001349 | Male | No | 0 | Graduate | No | 4843 | 3806.00 | 151 | 360 | 1 | Semiurban | Y |
| LP001350 | Male | Yes | NA | Graduate | No | 13650 | 0.00 | NA | 360 | 1 | Urban | Y |
| LP001356 | Male | Yes | 0 | Graduate | No | 4652 | 3583.00 | NA | 360 | 1 | Semiurban | Y |
| LP001357 | Male | NA | NA | Graduate | No | 3816 | 754.00 | 160 | 360 | 1 | Urban | Y |
| LP001367 | Male | Yes | 1 | Graduate | No | 3052 | 1030.00 | 100 | 360 | 1 | Urban | Y |
| LP001369 | Male | Yes | 2 | Graduate | No | 11417 | 1126.00 | 225 | 360 | 1 | Urban | Y |
| LP001370 | Male | No | 0 | Not Graduate | NA | 7333 | 0.00 | 120 | 360 | 1 | Rural | N |
| LP001379 | Male | Yes | 2 | Graduate | No | 3800 | 3600.00 | 216 | 360 | 0 | Urban | N |
| LP001384 | Male | Yes | 3+ | Not Graduate | No | 2071 | 754.00 | 94 | 480 | 1 | Semiurban | Y |
| LP001385 | Male | No | 0 | Graduate | No | 5316 | 0.00 | 136 | 360 | 1 | Urban | Y |
| LP001387 | Female | Yes | 0 | Graduate | NA | 2929 | 2333.00 | 139 | 360 | 1 | Semiurban | Y |
| LP001391 | Male | Yes | 0 | Not Graduate | No | 3572 | 4114.00 | 152 | NA | 0 | Rural | N |
| LP001392 | Female | No | 1 | Graduate | Yes | 7451 | 0.00 | NA | 360 | 1 | Semiurban | Y |
| LP001398 | Male | No | 0 | Graduate | NA | 5050 | 0.00 | 118 | 360 | 1 | Semiurban | Y |
| LP001401 | Male | Yes | 1 | Graduate | No | 14583 | 0.00 | 185 | 180 | 1 | Rural | Y |
| LP001404 | Female | Yes | 0 | Graduate | No | 3167 | 2283.00 | 154 | 360 | 1 | Semiurban | Y |
| LP001405 | Male | Yes | 1 | Graduate | No | 2214 | 1398.00 | 85 | 360 | NA | Urban | Y |
| LP001421 | Male | Yes | 0 | Graduate | No | 5568 | 2142.00 | 175 | 360 | 1 | Rural | N |
| LP001422 | Female | No | 0 | Graduate | No | 10408 | 0.00 | 259 | 360 | 1 | Urban | Y |
| LP001426 | Male | Yes | NA | Graduate | No | 5667 | 2667.00 | 180 | 360 | 1 | Rural | Y |
| LP001430 | Female | No | 0 | Graduate | No | 4166 | 0.00 | 44 | 360 | 1 | Semiurban | Y |
| LP001431 | Female | No | 0 | Graduate | No | 2137 | 8980.00 | 137 | 360 | 0 | Semiurban | Y |
| LP001432 | Male | Yes | 2 | Graduate | No | 2957 | 0.00 | 81 | 360 | 1 | Semiurban | Y |
| LP001439 | Male | Yes | 0 | Not Graduate | No | 4300 | 2014.00 | 194 | 360 | 1 | Rural | Y |
| LP001443 | Female | No | 0 | Graduate | No | 3692 | 0.00 | 93 | 360 | NA | Rural | Y |
| LP001448 | NA | Yes | 3+ | Graduate | No | 23803 | 0.00 | 370 | 360 | 1 | Rural | Y |
| LP001449 | Male | No | 0 | Graduate | No | 3865 | 1640.00 | NA | 360 | 1 | Rural | Y |
| LP001451 | Male | Yes | 1 | Graduate | Yes | 10513 | 3850.00 | 160 | 180 | 0 | Urban | N |
| LP001465 | Male | Yes | 0 | Graduate | No | 6080 | 2569.00 | 182 | 360 | NA | Rural | N |
| LP001469 | Male | No | 0 | Graduate | Yes | 20166 | 0.00 | 650 | 480 | NA | Urban | Y |
| LP001473 | Male | No | 0 | Graduate | No | 2014 | 1929.00 | 74 | 360 | 1 | Urban | Y |
| LP001478 | Male | No | 0 | Graduate | No | 2718 | 0.00 | 70 | 360 | 1 | Semiurban | Y |
| LP001482 | Male | Yes | 0 | Graduate | Yes | 3459 | 0.00 | 25 | 120 | 1 | Semiurban | Y |
| LP001487 | Male | No | 0 | Graduate | No | 4895 | 0.00 | 102 | 360 | 1 | Semiurban | Y |
| LP001488 | Male | Yes | 3+ | Graduate | No | 4000 | 7750.00 | 290 | 360 | 1 | Semiurban | N |
| LP001489 | Female | Yes | 0 | Graduate | No | 4583 | 0.00 | 84 | 360 | 1 | Rural | N |
| LP001491 | Male | Yes | 2 | Graduate | Yes | 3316 | 3500.00 | 88 | 360 | 1 | Urban | Y |
| LP001492 | Male | No | 0 | Graduate | No | 14999 | 0.00 | 242 | 360 | 0 | Semiurban | N |
| LP001493 | Male | Yes | 2 | Not Graduate | No | 4200 | 1430.00 | 129 | 360 | 1 | Rural | N |
| LP001497 | Male | Yes | 2 | Graduate | No | 5042 | 2083.00 | 185 | 360 | 1 | Rural | N |
| LP001498 | Male | No | 0 | Graduate | No | 5417 | 0.00 | 168 | 360 | 1 | Urban | Y |
| LP001504 | Male | No | 0 | Graduate | Yes | 6950 | 0.00 | 175 | 180 | 1 | Semiurban | Y |
| LP001507 | Male | Yes | 0 | Graduate | No | 2698 | 2034.00 | 122 | 360 | 1 | Semiurban | Y |
| LP001508 | Male | Yes | 2 | Graduate | No | 11757 | 0.00 | 187 | 180 | 1 | Urban | Y |
| LP001514 | Female | Yes | 0 | Graduate | No | 2330 | 4486.00 | 100 | 360 | 1 | Semiurban | Y |
| LP001516 | Female | Yes | 2 | Graduate | No | 14866 | 0.00 | 70 | 360 | 1 | Urban | Y |
| LP001518 | Male | Yes | 1 | Graduate | No | 1538 | 1425.00 | 30 | 360 | 1 | Urban | Y |
| LP001519 | Female | No | 0 | Graduate | No | 10000 | 1666.00 | 225 | 360 | 1 | Rural | N |
| LP001520 | Male | Yes | 0 | Graduate | No | 4860 | 830.00 | 125 | 360 | 1 | Semiurban | Y |
| LP001528 | Male | No | 0 | Graduate | No | 6277 | 0.00 | 118 | 360 | 0 | Rural | N |
| LP001529 | Male | Yes | 0 | Graduate | Yes | 2577 | 3750.00 | 152 | 360 | 1 | Rural | Y |
| LP001531 | Male | No | 0 | Graduate | No | 9166 | 0.00 | 244 | 360 | 1 | Urban | N |
| LP001532 | Male | Yes | 2 | Not Graduate | No | 2281 | 0.00 | 113 | 360 | 1 | Rural | N |
| LP001535 | Male | No | 0 | Graduate | No | 3254 | 0.00 | 50 | 360 | 1 | Urban | Y |
| LP001536 | Male | Yes | 3+ | Graduate | No | 39999 | 0.00 | 600 | 180 | 0 | Semiurban | Y |
| LP001541 | Male | Yes | 1 | Graduate | No | 6000 | 0.00 | 160 | 360 | NA | Rural | Y |
| LP001543 | Male | Yes | 1 | Graduate | No | 9538 | 0.00 | 187 | 360 | 1 | Urban | Y |
| LP001546 | Male | No | 0 | Graduate | NA | 2980 | 2083.00 | 120 | 360 | 1 | Rural | Y |
| LP001552 | Male | Yes | 0 | Graduate | No | 4583 | 5625.00 | 255 | 360 | 1 | Semiurban | Y |
| LP001560 | Male | Yes | 0 | Not Graduate | No | 1863 | 1041.00 | 98 | 360 | 1 | Semiurban | Y |
| LP001562 | Male | Yes | 0 | Graduate | No | 7933 | 0.00 | 275 | 360 | 1 | Urban | N |
| LP001565 | Male | Yes | 1 | Graduate | No | 3089 | 1280.00 | 121 | 360 | 0 | Semiurban | N |
| LP001570 | Male | Yes | 2 | Graduate | No | 4167 | 1447.00 | 158 | 360 | 1 | Rural | Y |
| LP001572 | Male | Yes | 0 | Graduate | No | 9323 | 0.00 | 75 | 180 | 1 | Urban | Y |
| LP001574 | Male | Yes | 0 | Graduate | No | 3707 | 3166.00 | 182 | NA | 1 | Rural | Y |
| LP001577 | Female | Yes | 0 | Graduate | No | 4583 | 0.00 | 112 | 360 | 1 | Rural | N |
| LP001578 | Male | Yes | 0 | Graduate | No | 2439 | 3333.00 | 129 | 360 | 1 | Rural | Y |
| LP001579 | Male | No | 0 | Graduate | No | 2237 | 0.00 | 63 | 480 | 0 | Semiurban | N |
| LP001580 | Male | Yes | 2 | Graduate | No | 8000 | 0.00 | 200 | 360 | 1 | Semiurban | Y |
| LP001581 | Male | Yes | 0 | Not Graduate | NA | 1820 | 1769.00 | 95 | 360 | 1 | Rural | Y |
| LP001585 | NA | Yes | 3+ | Graduate | No | 51763 | 0.00 | 700 | 300 | 1 | Urban | Y |
| LP001586 | Male | Yes | 3+ | Not Graduate | No | 3522 | 0.00 | 81 | 180 | 1 | Rural | N |
| LP001594 | Male | Yes | 0 | Graduate | No | 5708 | 5625.00 | 187 | 360 | 1 | Semiurban | Y |
| LP001603 | Male | Yes | 0 | Not Graduate | Yes | 4344 | 736.00 | 87 | 360 | 1 | Semiurban | N |
| LP001606 | Male | Yes | 0 | Graduate | No | 3497 | 1964.00 | 116 | 360 | 1 | Rural | Y |
| LP001608 | Male | Yes | 2 | Graduate | No | 2045 | 1619.00 | 101 | 360 | 1 | Rural | Y |
| LP001610 | Male | Yes | 3+ | Graduate | No | 5516 | 11300.00 | 495 | 360 | 0 | Semiurban | N |
| LP001616 | Male | Yes | 1 | Graduate | No | 3750 | 0.00 | 116 | 360 | 1 | Semiurban | Y |
| LP001630 | Male | No | 0 | Not Graduate | No | 2333 | 1451.00 | 102 | 480 | 0 | Urban | N |
| LP001633 | Male | Yes | 1 | Graduate | No | 6400 | 7250.00 | 180 | 360 | 0 | Urban | N |
| LP001634 | Male | No | 0 | Graduate | No | 1916 | 5063.00 | 67 | 360 | NA | Rural | N |
| LP001636 | Male | Yes | 0 | Graduate | No | 4600 | 0.00 | 73 | 180 | 1 | Semiurban | Y |
| LP001637 | Male | Yes | 1 | Graduate | No | 33846 | 0.00 | 260 | 360 | 1 | Semiurban | N |
| LP001639 | Female | Yes | 0 | Graduate | No | 3625 | 0.00 | 108 | 360 | 1 | Semiurban | Y |
| LP001640 | Male | Yes | 0 | Graduate | Yes | 39147 | 4750.00 | 120 | 360 | 1 | Semiurban | Y |
| LP001641 | Male | Yes | 1 | Graduate | Yes | 2178 | 0.00 | 66 | 300 | 0 | Rural | N |
| LP001643 | Male | Yes | 0 | Graduate | No | 2383 | 2138.00 | 58 | 360 | NA | Rural | Y |
| LP001644 | NA | Yes | 0 | Graduate | Yes | 674 | 5296.00 | 168 | 360 | 1 | Rural | Y |
| LP001647 | Male | Yes | 0 | Graduate | No | 9328 | 0.00 | 188 | 180 | 1 | Rural | Y |
| LP001653 | Male | No | 0 | Not Graduate | No | 4885 | 0.00 | 48 | 360 | 1 | Rural | Y |
| LP001656 | Male | No | 0 | Graduate | No | 12000 | 0.00 | 164 | 360 | 1 | Semiurban | N |
| LP001657 | Male | Yes | 0 | Not Graduate | No | 6033 | 0.00 | 160 | 360 | 1 | Urban | N |
| LP001658 | Male | No | 0 | Graduate | No | 3858 | 0.00 | 76 | 360 | 1 | Semiurban | Y |
| LP001664 | Male | No | 0 | Graduate | No | 4191 | 0.00 | 120 | 360 | 1 | Rural | Y |
| LP001665 | Male | Yes | 1 | Graduate | No | 3125 | 2583.00 | 170 | 360 | 1 | Semiurban | N |
| LP001666 | Male | No | 0 | Graduate | No | 8333 | 3750.00 | 187 | 360 | 1 | Rural | Y |
| LP001669 | Female | No | 0 | Not Graduate | No | 1907 | 2365.00 | 120 | NA | 1 | Urban | Y |
| LP001671 | Female | Yes | 0 | Graduate | No | 3416 | 2816.00 | 113 | 360 | NA | Semiurban | Y |
| LP001673 | Male | No | 0 | Graduate | Yes | 11000 | 0.00 | 83 | 360 | 1 | Urban | N |
| LP001674 | Male | Yes | 1 | Not Graduate | No | 2600 | 2500.00 | 90 | 360 | 1 | Semiurban | Y |
| LP001677 | Male | No | 2 | Graduate | No | 4923 | 0.00 | 166 | 360 | 0 | Semiurban | Y |
| LP001682 | Male | Yes | 3+ | Not Graduate | No | 3992 | 0.00 | NA | 180 | 1 | Urban | N |
| LP001688 | Male | Yes | 1 | Not Graduate | No | 3500 | 1083.00 | 135 | 360 | 1 | Urban | Y |
| LP001691 | Male | Yes | 2 | Not Graduate | No | 3917 | 0.00 | 124 | 360 | 1 | Semiurban | Y |
| LP001692 | Female | No | 0 | Not Graduate | No | 4408 | 0.00 | 120 | 360 | 1 | Semiurban | Y |
| LP001693 | Female | No | 0 | Graduate | No | 3244 | 0.00 | 80 | 360 | 1 | Urban | Y |
| LP001698 | Male | No | 0 | Not Graduate | No | 3975 | 2531.00 | 55 | 360 | 1 | Rural | Y |
| LP001699 | Male | No | 0 | Graduate | No | 2479 | 0.00 | 59 | 360 | 1 | Urban | Y |
| LP001702 | Male | No | 0 | Graduate | No | 3418 | 0.00 | 127 | 360 | 1 | Semiurban | N |
| LP001708 | Female | No | 0 | Graduate | No | 10000 | 0.00 | 214 | 360 | 1 | Semiurban | N |
| LP001711 | Male | Yes | 3+ | Graduate | No | 3430 | 1250.00 | 128 | 360 | 0 | Semiurban | N |
| LP001713 | Male | Yes | 1 | Graduate | Yes | 7787 | 0.00 | 240 | 360 | 1 | Urban | Y |
| LP001715 | Male | Yes | 3+ | Not Graduate | Yes | 5703 | 0.00 | 130 | 360 | 1 | Rural | Y |
| LP001716 | Male | Yes | 0 | Graduate | No | 3173 | 3021.00 | 137 | 360 | 1 | Urban | Y |
| LP001720 | Male | Yes | 3+ | Not Graduate | No | 3850 | 983.00 | 100 | 360 | 1 | Semiurban | Y |
| LP001722 | Male | Yes | 0 | Graduate | No | 150 | 1800.00 | 135 | 360 | 1 | Rural | N |
| LP001726 | Male | Yes | 0 | Graduate | No | 3727 | 1775.00 | 131 | 360 | 1 | Semiurban | Y |
| LP001732 | Male | Yes | 2 | Graduate | NA | 5000 | 0.00 | 72 | 360 | 0 | Semiurban | N |
| LP001734 | Female | Yes | 2 | Graduate | No | 4283 | 2383.00 | 127 | 360 | NA | Semiurban | Y |
| LP001736 | Male | Yes | 0 | Graduate | No | 2221 | 0.00 | 60 | 360 | 0 | Urban | N |
| LP001743 | Male | Yes | 2 | Graduate | No | 4009 | 1717.00 | 116 | 360 | 1 | Semiurban | Y |
| LP001744 | Male | No | 0 | Graduate | No | 2971 | 2791.00 | 144 | 360 | 1 | Semiurban | Y |
| LP001749 | Male | Yes | 0 | Graduate | No | 7578 | 1010.00 | 175 | NA | 1 | Semiurban | Y |
| LP001750 | Male | Yes | 0 | Graduate | No | 6250 | 0.00 | 128 | 360 | 1 | Semiurban | Y |
| LP001751 | Male | Yes | 0 | Graduate | No | 3250 | 0.00 | 170 | 360 | 1 | Rural | N |
| LP001754 | Male | Yes | NA | Not Graduate | Yes | 4735 | 0.00 | 138 | 360 | 1 | Urban | N |
| LP001758 | Male | Yes | 2 | Graduate | No | 6250 | 1695.00 | 210 | 360 | 1 | Semiurban | Y |
| LP001760 | Male | NA | NA | Graduate | No | 4758 | 0.00 | 158 | 480 | 1 | Semiurban | Y |
| LP001761 | Male | No | 0 | Graduate | Yes | 6400 | 0.00 | 200 | 360 | 1 | Rural | Y |
| LP001765 | Male | Yes | 1 | Graduate | No | 2491 | 2054.00 | 104 | 360 | 1 | Semiurban | Y |
| LP001768 | Male | Yes | 0 | Graduate | NA | 3716 | 0.00 | 42 | 180 | 1 | Rural | Y |
| LP001770 | Male | No | 0 | Not Graduate | No | 3189 | 2598.00 | 120 | NA | 1 | Rural | Y |
| LP001776 | Female | No | 0 | Graduate | No | 8333 | 0.00 | 280 | 360 | 1 | Semiurban | Y |
| LP001778 | Male | Yes | 1 | Graduate | No | 3155 | 1779.00 | 140 | 360 | 1 | Semiurban | Y |
| LP001784 | Male | Yes | 1 | Graduate | No | 5500 | 1260.00 | 170 | 360 | 1 | Rural | Y |
| LP001786 | Male | Yes | 0 | Graduate | NA | 5746 | 0.00 | 255 | 360 | NA | Urban | N |
| LP001788 | Female | No | 0 | Graduate | Yes | 3463 | 0.00 | 122 | 360 | NA | Urban | Y |
| LP001790 | Female | No | 1 | Graduate | No | 3812 | 0.00 | 112 | 360 | 1 | Rural | Y |
| LP001792 | Male | Yes | 1 | Graduate | No | 3315 | 0.00 | 96 | 360 | 1 | Semiurban | Y |
| LP001798 | Male | Yes | 2 | Graduate | No | 5819 | 5000.00 | 120 | 360 | 1 | Rural | Y |
| LP001800 | Male | Yes | 1 | Not Graduate | No | 2510 | 1983.00 | 140 | 180 | 1 | Urban | N |
| LP001806 | Male | No | 0 | Graduate | No | 2965 | 5701.00 | 155 | 60 | 1 | Urban | Y |
| LP001807 | Male | Yes | 2 | Graduate | Yes | 6250 | 1300.00 | 108 | 360 | 1 | Rural | Y |
| LP001811 | Male | Yes | 0 | Not Graduate | No | 3406 | 4417.00 | 123 | 360 | 1 | Semiurban | Y |
| LP001813 | Male | No | 0 | Graduate | Yes | 6050 | 4333.00 | 120 | 180 | 1 | Urban | N |
| LP001814 | Male | Yes | 2 | Graduate | No | 9703 | 0.00 | 112 | 360 | 1 | Urban | Y |
| LP001819 | Male | Yes | 1 | Not Graduate | No | 6608 | 0.00 | 137 | 180 | 1 | Urban | Y |
| LP001824 | Male | Yes | 1 | Graduate | No | 2882 | 1843.00 | 123 | 480 | 1 | Semiurban | Y |
| LP001825 | Male | Yes | 0 | Graduate | No | 1809 | 1868.00 | 90 | 360 | 1 | Urban | Y |
| LP001835 | Male | Yes | 0 | Not Graduate | No | 1668 | 3890.00 | 201 | 360 | 0 | Semiurban | N |
| LP001836 | Female | No | 2 | Graduate | No | 3427 | 0.00 | 138 | 360 | 1 | Urban | N |
| LP001841 | Male | No | 0 | Not Graduate | Yes | 2583 | 2167.00 | 104 | 360 | 1 | Rural | Y |
| LP001843 | Male | Yes | 1 | Not Graduate | No | 2661 | 7101.00 | 279 | 180 | 1 | Semiurban | Y |
| LP001844 | Male | No | 0 | Graduate | Yes | 16250 | 0.00 | 192 | 360 | 0 | Urban | N |
| LP001846 | Female | No | 3+ | Graduate | No | 3083 | 0.00 | 255 | 360 | 1 | Rural | Y |
| LP001849 | Male | No | 0 | Not Graduate | No | 6045 | 0.00 | 115 | 360 | 0 | Rural | N |
| LP001854 | Male | Yes | 3+ | Graduate | No | 5250 | 0.00 | 94 | 360 | 1 | Urban | N |
| LP001859 | Male | Yes | 0 | Graduate | No | 14683 | 2100.00 | 304 | 360 | 1 | Rural | N |
| LP001864 | Male | Yes | 3+ | Not Graduate | No | 4931 | 0.00 | 128 | 360 | NA | Semiurban | N |
| LP001865 | Male | Yes | 1 | Graduate | No | 6083 | 4250.00 | 330 | 360 | NA | Urban | Y |
| LP001868 | Male | No | 0 | Graduate | No | 2060 | 2209.00 | 134 | 360 | 1 | Semiurban | Y |
| LP001870 | Female | No | 1 | Graduate | No | 3481 | 0.00 | 155 | 36 | 1 | Semiurban | N |
| LP001871 | Female | No | 0 | Graduate | No | 7200 | 0.00 | 120 | 360 | 1 | Rural | Y |
| LP001872 | Male | No | 0 | Graduate | Yes | 5166 | 0.00 | 128 | 360 | 1 | Semiurban | Y |
| LP001875 | Male | No | 0 | Graduate | No | 4095 | 3447.00 | 151 | 360 | 1 | Rural | Y |
| LP001877 | Male | Yes | 2 | Graduate | No | 4708 | 1387.00 | 150 | 360 | 1 | Semiurban | Y |
| LP001882 | Male | Yes | 3+ | Graduate | No | 4333 | 1811.00 | 160 | 360 | 0 | Urban | Y |
| LP001883 | Female | No | 0 | Graduate | NA | 3418 | 0.00 | 135 | 360 | 1 | Rural | N |
| LP001884 | Female | No | 1 | Graduate | No | 2876 | 1560.00 | 90 | 360 | 1 | Urban | Y |
| LP001888 | Female | No | 0 | Graduate | No | 3237 | 0.00 | 30 | 360 | 1 | Urban | Y |
| LP001891 | Male | Yes | 0 | Graduate | No | 11146 | 0.00 | 136 | 360 | 1 | Urban | Y |
| LP001892 | Male | No | 0 | Graduate | No | 2833 | 1857.00 | 126 | 360 | 1 | Rural | Y |
| LP001894 | Male | Yes | 0 | Graduate | No | 2620 | 2223.00 | 150 | 360 | 1 | Semiurban | Y |
| LP001896 | Male | Yes | 2 | Graduate | No | 3900 | 0.00 | 90 | 360 | 1 | Semiurban | Y |
| LP001900 | Male | Yes | 1 | Graduate | No | 2750 | 1842.00 | 115 | 360 | 1 | Semiurban | Y |
| LP001903 | Male | Yes | 0 | Graduate | No | 3993 | 3274.00 | 207 | 360 | 1 | Semiurban | Y |
| LP001904 | Male | Yes | 0 | Graduate | No | 3103 | 1300.00 | 80 | 360 | 1 | Urban | Y |
| LP001907 | Male | Yes | 0 | Graduate | No | 14583 | 0.00 | 436 | 360 | 1 | Semiurban | Y |
| LP001908 | Female | Yes | 0 | Not Graduate | No | 4100 | 0.00 | 124 | 360 | NA | Rural | Y |
| LP001910 | Male | No | 1 | Not Graduate | Yes | 4053 | 2426.00 | 158 | 360 | 0 | Urban | N |
| LP001914 | Male | Yes | 0 | Graduate | No | 3927 | 800.00 | 112 | 360 | 1 | Semiurban | Y |
| LP001915 | Male | Yes | 2 | Graduate | No | 2301 | 985.80 | 78 | 180 | 1 | Urban | Y |
| LP001917 | Female | No | 0 | Graduate | No | 1811 | 1666.00 | 54 | 360 | 1 | Urban | Y |
| LP001922 | Male | Yes | 0 | Graduate | No | 20667 | 0.00 | NA | 360 | 1 | Rural | N |
| LP001924 | Male | No | 0 | Graduate | No | 3158 | 3053.00 | 89 | 360 | 1 | Rural | Y |
| LP001925 | Female | No | 0 | Graduate | Yes | 2600 | 1717.00 | 99 | 300 | 1 | Semiurban | N |
| LP001926 | Male | Yes | 0 | Graduate | No | 3704 | 2000.00 | 120 | 360 | 1 | Rural | Y |
| LP001931 | Female | No | 0 | Graduate | No | 4124 | 0.00 | 115 | 360 | 1 | Semiurban | Y |
| LP001935 | Male | No | 0 | Graduate | No | 9508 | 0.00 | 187 | 360 | 1 | Rural | Y |
| LP001936 | Male | Yes | 0 | Graduate | No | 3075 | 2416.00 | 139 | 360 | 1 | Rural | Y |
| LP001938 | Male | Yes | 2 | Graduate | No | 4400 | 0.00 | 127 | 360 | 0 | Semiurban | N |
| LP001940 | Male | Yes | 2 | Graduate | No | 3153 | 1560.00 | 134 | 360 | 1 | Urban | Y |
| LP001945 | Female | No | NA | Graduate | No | 5417 | 0.00 | 143 | 480 | 0 | Urban | N |
| LP001947 | Male | Yes | 0 | Graduate | No | 2383 | 3334.00 | 172 | 360 | 1 | Semiurban | Y |
| LP001949 | Male | Yes | 3+ | Graduate | NA | 4416 | 1250.00 | 110 | 360 | 1 | Urban | Y |
| LP001953 | Male | Yes | 1 | Graduate | No | 6875 | 0.00 | 200 | 360 | 1 | Semiurban | Y |
| LP001954 | Female | Yes | 1 | Graduate | No | 4666 | 0.00 | 135 | 360 | 1 | Urban | Y |
| LP001955 | Female | No | 0 | Graduate | No | 5000 | 2541.00 | 151 | 480 | 1 | Rural | N |
| LP001963 | Male | Yes | 1 | Graduate | No | 2014 | 2925.00 | 113 | 360 | 1 | Urban | N |
| LP001964 | Male | Yes | 0 | Not Graduate | No | 1800 | 2934.00 | 93 | 360 | 0 | Urban | N |
| LP001972 | Male | Yes | NA | Not Graduate | No | 2875 | 1750.00 | 105 | 360 | 1 | Semiurban | Y |
| LP001974 | Female | No | 0 | Graduate | No | 5000 | 0.00 | 132 | 360 | 1 | Rural | Y |
| LP001977 | Male | Yes | 1 | Graduate | No | 1625 | 1803.00 | 96 | 360 | 1 | Urban | Y |
| LP001978 | Male | No | 0 | Graduate | No | 4000 | 2500.00 | 140 | 360 | 1 | Rural | Y |
| LP001990 | Male | No | 0 | Not Graduate | No | 2000 | 0.00 | NA | 360 | 1 | Urban | N |
| LP001993 | Female | No | 0 | Graduate | No | 3762 | 1666.00 | 135 | 360 | 1 | Rural | Y |
| LP001994 | Female | No | 0 | Graduate | No | 2400 | 1863.00 | 104 | 360 | 0 | Urban | N |
| LP001996 | Male | No | 0 | Graduate | No | 20233 | 0.00 | 480 | 360 | 1 | Rural | N |
| LP001998 | Male | Yes | 2 | Not Graduate | No | 7667 | 0.00 | 185 | 360 | NA | Rural | Y |
| LP002002 | Female | No | 0 | Graduate | No | 2917 | 0.00 | 84 | 360 | 1 | Semiurban | Y |
| LP002004 | Male | No | 0 | Not Graduate | No | 2927 | 2405.00 | 111 | 360 | 1 | Semiurban | Y |
| LP002006 | Female | No | 0 | Graduate | No | 2507 | 0.00 | 56 | 360 | 1 | Rural | Y |
| LP002008 | Male | Yes | 2 | Graduate | Yes | 5746 | 0.00 | 144 | 84 | NA | Rural | Y |
| LP002024 | NA | Yes | 0 | Graduate | No | 2473 | 1843.00 | 159 | 360 | 1 | Rural | N |
| LP002031 | Male | Yes | 1 | Not Graduate | No | 3399 | 1640.00 | 111 | 180 | 1 | Urban | Y |
| LP002035 | Male | Yes | 2 | Graduate | No | 3717 | 0.00 | 120 | 360 | 1 | Semiurban | Y |
| LP002036 | Male | Yes | 0 | Graduate | No | 2058 | 2134.00 | 88 | 360 | NA | Urban | Y |
| LP002043 | Female | No | 1 | Graduate | No | 3541 | 0.00 | 112 | 360 | NA | Semiurban | Y |
| LP002050 | Male | Yes | 1 | Graduate | Yes | 10000 | 0.00 | 155 | 360 | 1 | Rural | N |
| LP002051 | Male | Yes | 0 | Graduate | No | 2400 | 2167.00 | 115 | 360 | 1 | Semiurban | Y |
| LP002053 | Male | Yes | 3+ | Graduate | No | 4342 | 189.00 | 124 | 360 | 1 | Semiurban | Y |
| LP002054 | Male | Yes | 2 | Not Graduate | No | 3601 | 1590.00 | NA | 360 | 1 | Rural | Y |
| LP002055 | Female | No | 0 | Graduate | No | 3166 | 2985.00 | 132 | 360 | NA | Rural | Y |
| LP002065 | Male | Yes | 3+ | Graduate | No | 15000 | 0.00 | 300 | 360 | 1 | Rural | Y |
| LP002067 | Male | Yes | 1 | Graduate | Yes | 8666 | 4983.00 | 376 | 360 | 0 | Rural | N |
| LP002068 | Male | No | 0 | Graduate | No | 4917 | 0.00 | 130 | 360 | 0 | Rural | Y |
| LP002082 | Male | Yes | 0 | Graduate | Yes | 5818 | 2160.00 | 184 | 360 | 1 | Semiurban | Y |
| LP002086 | Female | Yes | 0 | Graduate | No | 4333 | 2451.00 | 110 | 360 | 1 | Urban | N |
| LP002087 | Female | No | 0 | Graduate | No | 2500 | 0.00 | 67 | 360 | 1 | Urban | Y |
| LP002097 | Male | No | 1 | Graduate | No | 4384 | 1793.00 | 117 | 360 | 1 | Urban | Y |
| LP002098 | Male | No | 0 | Graduate | No | 2935 | 0.00 | 98 | 360 | 1 | Semiurban | Y |
| LP002100 | Male | No | NA | Graduate | No | 2833 | 0.00 | 71 | 360 | 1 | Urban | Y |
| LP002101 | Male | Yes | 0 | Graduate | NA | 63337 | 0.00 | 490 | 180 | 1 | Urban | Y |
| LP002103 | NA | Yes | 1 | Graduate | Yes | 9833 | 1833.00 | 182 | 180 | 1 | Urban | Y |
| LP002106 | Male | Yes | NA | Graduate | Yes | 5503 | 4490.00 | 70 | NA | 1 | Semiurban | Y |
| LP002110 | Male | Yes | 1 | Graduate | NA | 5250 | 688.00 | 160 | 360 | 1 | Rural | Y |
| LP002112 | Male | Yes | 2 | Graduate | Yes | 2500 | 4600.00 | 176 | 360 | 1 | Rural | Y |
| LP002113 | Female | No | 3+ | Not Graduate | No | 1830 | 0.00 | NA | 360 | 0 | Urban | N |
| LP002114 | Female | No | 0 | Graduate | No | 4160 | 0.00 | 71 | 360 | 1 | Semiurban | Y |
| LP002115 | Male | Yes | 3+ | Not Graduate | No | 2647 | 1587.00 | 173 | 360 | 1 | Rural | N |
| LP002116 | Female | No | 0 | Graduate | No | 2378 | 0.00 | 46 | 360 | 1 | Rural | N |
| LP002119 | Male | Yes | 1 | Not Graduate | No | 4554 | 1229.00 | 158 | 360 | 1 | Urban | Y |
| LP002126 | Male | Yes | 3+ | Not Graduate | No | 3173 | 0.00 | 74 | 360 | 1 | Semiurban | Y |
| LP002128 | Male | Yes | 2 | Graduate | NA | 2583 | 2330.00 | 125 | 360 | 1 | Rural | Y |
| LP002129 | Male | Yes | 0 | Graduate | No | 2499 | 2458.00 | 160 | 360 | 1 | Semiurban | Y |
| LP002130 | Male | Yes | NA | Not Graduate | No | 3523 | 3230.00 | 152 | 360 | 0 | Rural | N |
| LP002131 | Male | Yes | 2 | Not Graduate | No | 3083 | 2168.00 | 126 | 360 | 1 | Urban | Y |
| LP002137 | Male | Yes | 0 | Graduate | No | 6333 | 4583.00 | 259 | 360 | NA | Semiurban | Y |
| LP002138 | Male | Yes | 0 | Graduate | No | 2625 | 6250.00 | 187 | 360 | 1 | Rural | Y |
| LP002139 | Male | Yes | 0 | Graduate | No | 9083 | 0.00 | 228 | 360 | 1 | Semiurban | Y |
| LP002140 | Male | No | 0 | Graduate | No | 8750 | 4167.00 | 308 | 360 | 1 | Rural | N |
| LP002141 | Male | Yes | 3+ | Graduate | No | 2666 | 2083.00 | 95 | 360 | 1 | Rural | Y |
| LP002142 | Female | Yes | 0 | Graduate | Yes | 5500 | 0.00 | 105 | 360 | 0 | Rural | N |
| LP002143 | Female | Yes | 0 | Graduate | No | 2423 | 505.00 | 130 | 360 | 1 | Semiurban | Y |
| LP002144 | Female | No | NA | Graduate | No | 3813 | 0.00 | 116 | 180 | 1 | Urban | Y |
| LP002149 | Male | Yes | 2 | Graduate | No | 8333 | 3167.00 | 165 | 360 | 1 | Rural | Y |
| LP002151 | Male | Yes | 1 | Graduate | No | 3875 | 0.00 | 67 | 360 | 1 | Urban | N |
| LP002158 | Male | Yes | 0 | Not Graduate | No | 3000 | 1666.00 | 100 | 480 | 0 | Urban | N |
| LP002160 | Male | Yes | 3+ | Graduate | No | 5167 | 3167.00 | 200 | 360 | 1 | Semiurban | Y |
| LP002161 | Female | No | 1 | Graduate | No | 4723 | 0.00 | 81 | 360 | 1 | Semiurban | N |
| LP002170 | Male | Yes | 2 | Graduate | No | 5000 | 3667.00 | 236 | 360 | 1 | Semiurban | Y |
| LP002175 | Male | Yes | 0 | Graduate | No | 4750 | 2333.00 | 130 | 360 | 1 | Urban | Y |
| LP002178 | Male | Yes | 0 | Graduate | No | 3013 | 3033.00 | 95 | 300 | NA | Urban | Y |
| LP002180 | Male | No | 0 | Graduate | Yes | 6822 | 0.00 | 141 | 360 | 1 | Rural | Y |
| LP002181 | Male | No | 0 | Not Graduate | No | 6216 | 0.00 | 133 | 360 | 1 | Rural | N |
| LP002187 | Male | No | 0 | Graduate | No | 2500 | 0.00 | 96 | 480 | 1 | Semiurban | N |
| LP002188 | Male | No | 0 | Graduate | No | 5124 | 0.00 | 124 | NA | 0 | Rural | N |
| LP002190 | Male | Yes | 1 | Graduate | No | 6325 | 0.00 | 175 | 360 | 1 | Semiurban | Y |
| LP002191 | Male | Yes | 0 | Graduate | No | 19730 | 5266.00 | 570 | 360 | 1 | Rural | N |
| LP002194 | Female | No | 0 | Graduate | Yes | 15759 | 0.00 | 55 | 360 | 1 | Semiurban | Y |
| LP002197 | Male | Yes | 2 | Graduate | No | 5185 | 0.00 | 155 | 360 | 1 | Semiurban | Y |
| LP002201 | Male | Yes | 2 | Graduate | Yes | 9323 | 7873.00 | 380 | 300 | 1 | Rural | Y |
| LP002205 | Male | No | 1 | Graduate | No | 3062 | 1987.00 | 111 | 180 | 0 | Urban | N |
| LP002209 | Female | No | 0 | Graduate | NA | 2764 | 1459.00 | 110 | 360 | 1 | Urban | Y |
| LP002211 | Male | Yes | 0 | Graduate | No | 4817 | 923.00 | 120 | 180 | 1 | Urban | Y |
| LP002219 | Male | Yes | 3+ | Graduate | No | 8750 | 4996.00 | 130 | 360 | 1 | Rural | Y |
| LP002223 | Male | Yes | 0 | Graduate | No | 4310 | 0.00 | 130 | 360 | NA | Semiurban | Y |
| LP002224 | Male | No | 0 | Graduate | No | 3069 | 0.00 | 71 | 480 | 1 | Urban | N |
| LP002225 | Male | Yes | 2 | Graduate | No | 5391 | 0.00 | 130 | 360 | 1 | Urban | Y |
| LP002226 | Male | Yes | 0 | Graduate | NA | 3333 | 2500.00 | 128 | 360 | 1 | Semiurban | Y |
| LP002229 | Male | No | 0 | Graduate | No | 5941 | 4232.00 | 296 | 360 | 1 | Semiurban | Y |
| LP002231 | Female | No | 0 | Graduate | No | 6000 | 0.00 | 156 | 360 | 1 | Urban | Y |
| LP002234 | Male | No | 0 | Graduate | Yes | 7167 | 0.00 | 128 | 360 | 1 | Urban | Y |
| LP002236 | Male | Yes | 2 | Graduate | No | 4566 | 0.00 | 100 | 360 | 1 | Urban | N |
| LP002237 | Male | No | 1 | Graduate | NA | 3667 | 0.00 | 113 | 180 | 1 | Urban | Y |
| LP002239 | Male | No | 0 | Not Graduate | No | 2346 | 1600.00 | 132 | 360 | 1 | Semiurban | Y |
| LP002243 | Male | Yes | 0 | Not Graduate | No | 3010 | 3136.00 | NA | 360 | 0 | Urban | N |
| LP002244 | Male | Yes | 0 | Graduate | No | 2333 | 2417.00 | 136 | 360 | 1 | Urban | Y |
| LP002250 | Male | Yes | 0 | Graduate | No | 5488 | 0.00 | 125 | 360 | 1 | Rural | Y |
| LP002255 | Male | No | 3+ | Graduate | No | 9167 | 0.00 | 185 | 360 | 1 | Rural | Y |
| LP002262 | Male | Yes | 3+ | Graduate | No | 9504 | 0.00 | 275 | 360 | 1 | Rural | Y |
| LP002263 | Male | Yes | 0 | Graduate | No | 2583 | 2115.00 | 120 | 360 | NA | Urban | Y |
| LP002265 | Male | Yes | 2 | Not Graduate | No | 1993 | 1625.00 | 113 | 180 | 1 | Semiurban | Y |
| LP002266 | Male | Yes | 2 | Graduate | No | 3100 | 1400.00 | 113 | 360 | 1 | Urban | Y |
| LP002272 | Male | Yes | 2 | Graduate | No | 3276 | 484.00 | 135 | 360 | NA | Semiurban | Y |
| LP002277 | Female | No | 0 | Graduate | No | 3180 | 0.00 | 71 | 360 | 0 | Urban | N |
| LP002281 | Male | Yes | 0 | Graduate | No | 3033 | 1459.00 | 95 | 360 | 1 | Urban | Y |
| LP002284 | Male | No | 0 | Not Graduate | No | 3902 | 1666.00 | 109 | 360 | 1 | Rural | Y |
| LP002287 | Female | No | 0 | Graduate | No | 1500 | 1800.00 | 103 | 360 | 0 | Semiurban | N |
| LP002288 | Male | Yes | 2 | Not Graduate | No | 2889 | 0.00 | 45 | 180 | 0 | Urban | N |
| LP002296 | Male | No | 0 | Not Graduate | No | 2755 | 0.00 | 65 | 300 | 1 | Rural | N |
| LP002297 | Male | No | 0 | Graduate | No | 2500 | 20000.00 | 103 | 360 | 1 | Semiurban | Y |
| LP002300 | Female | No | 0 | Not Graduate | No | 1963 | 0.00 | 53 | 360 | 1 | Semiurban | Y |
| LP002301 | Female | No | 0 | Graduate | Yes | 7441 | 0.00 | 194 | 360 | 1 | Rural | N |
| LP002305 | Female | No | 0 | Graduate | No | 4547 | 0.00 | 115 | 360 | 1 | Semiurban | Y |
| LP002308 | Male | Yes | 0 | Not Graduate | No | 2167 | 2400.00 | 115 | 360 | 1 | Urban | Y |
| LP002314 | Female | No | 0 | Not Graduate | No | 2213 | 0.00 | 66 | 360 | 1 | Rural | Y |
| LP002315 | Male | Yes | 1 | Graduate | No | 8300 | 0.00 | 152 | 300 | 0 | Semiurban | N |
| LP002317 | Male | Yes | 3+ | Graduate | No | 81000 | 0.00 | 360 | 360 | 0 | Rural | N |
| LP002318 | Female | No | 1 | Not Graduate | Yes | 3867 | 0.00 | 62 | 360 | 1 | Semiurban | N |
| LP002319 | Male | Yes | 0 | Graduate | NA | 6256 | 0.00 | 160 | 360 | NA | Urban | Y |
| LP002328 | Male | Yes | 0 | Not Graduate | No | 6096 | 0.00 | 218 | 360 | 0 | Rural | N |
| LP002332 | Male | Yes | 0 | Not Graduate | No | 2253 | 2033.00 | 110 | 360 | 1 | Rural | Y |
| LP002335 | Female | Yes | 0 | Not Graduate | No | 2149 | 3237.00 | 178 | 360 | 0 | Semiurban | N |
| LP002337 | Female | No | 0 | Graduate | No | 2995 | 0.00 | 60 | 360 | 1 | Urban | Y |
| LP002341 | Female | No | 1 | Graduate | No | 2600 | 0.00 | 160 | 360 | 1 | Urban | N |
| LP002342 | Male | Yes | 2 | Graduate | Yes | 1600 | 20000.00 | 239 | 360 | 1 | Urban | N |
| LP002345 | Male | Yes | 0 | Graduate | No | 1025 | 2773.00 | 112 | 360 | 1 | Rural | Y |
| LP002347 | Male | Yes | 0 | Graduate | No | 3246 | 1417.00 | 138 | 360 | 1 | Semiurban | Y |
| LP002348 | Male | Yes | 0 | Graduate | No | 5829 | 0.00 | 138 | 360 | 1 | Rural | Y |
| LP002357 | Female | No | 0 | Not Graduate | No | 2720 | 0.00 | 80 | NA | 0 | Urban | N |
| LP002361 | Male | Yes | 0 | Graduate | No | 1820 | 1719.00 | 100 | 360 | 1 | Urban | Y |
| LP002362 | Male | Yes | 1 | Graduate | No | 7250 | 1667.00 | 110 | NA | 0 | Urban | N |
| LP002364 | Male | Yes | 0 | Graduate | No | 14880 | 0.00 | 96 | 360 | 1 | Semiurban | Y |
| LP002366 | Male | Yes | 0 | Graduate | No | 2666 | 4300.00 | 121 | 360 | 1 | Rural | Y |
| LP002367 | Female | No | 1 | Not Graduate | No | 4606 | 0.00 | 81 | 360 | 1 | Rural | N |
| LP002368 | Male | Yes | 2 | Graduate | No | 5935 | 0.00 | 133 | 360 | 1 | Semiurban | Y |
| LP002369 | Male | Yes | 0 | Graduate | No | 2920 | 16.12 | 87 | 360 | 1 | Rural | Y |
| LP002370 | Male | No | 0 | Not Graduate | No | 2717 | 0.00 | 60 | 180 | 1 | Urban | Y |
| LP002377 | Female | No | 1 | Graduate | Yes | 8624 | 0.00 | 150 | 360 | 1 | Semiurban | Y |
| LP002379 | Male | No | 0 | Graduate | No | 6500 | 0.00 | 105 | 360 | 0 | Rural | N |
| LP002386 | Male | No | 0 | Graduate | NA | 12876 | 0.00 | 405 | 360 | 1 | Semiurban | Y |
| LP002387 | Male | Yes | 0 | Graduate | No | 2425 | 2340.00 | 143 | 360 | 1 | Semiurban | Y |
| LP002390 | Male | No | 0 | Graduate | No | 3750 | 0.00 | 100 | 360 | 1 | Urban | Y |
| LP002393 | Female | NA | NA | Graduate | No | 10047 | 0.00 | NA | 240 | 1 | Semiurban | Y |
| LP002398 | Male | No | 0 | Graduate | No | 1926 | 1851.00 | 50 | 360 | 1 | Semiurban | Y |
| LP002401 | Male | Yes | 0 | Graduate | No | 2213 | 1125.00 | NA | 360 | 1 | Urban | Y |
| LP002403 | Male | No | 0 | Graduate | Yes | 10416 | 0.00 | 187 | 360 | 0 | Urban | N |
| LP002407 | Female | Yes | 0 | Not Graduate | Yes | 7142 | 0.00 | 138 | 360 | 1 | Rural | Y |
| LP002408 | Male | No | 0 | Graduate | No | 3660 | 5064.00 | 187 | 360 | 1 | Semiurban | Y |
| LP002409 | Male | Yes | 0 | Graduate | No | 7901 | 1833.00 | 180 | 360 | 1 | Rural | Y |
| LP002418 | Male | No | 3+ | Not Graduate | No | 4707 | 1993.00 | 148 | 360 | 1 | Semiurban | Y |
| LP002422 | Male | No | 1 | Graduate | No | 37719 | 0.00 | 152 | 360 | 1 | Semiurban | Y |
| LP002424 | Male | Yes | 0 | Graduate | No | 7333 | 8333.00 | 175 | 300 | NA | Rural | Y |
| LP002429 | Male | Yes | 1 | Graduate | Yes | 3466 | 1210.00 | 130 | 360 | 1 | Rural | Y |
| LP002434 | Male | Yes | 2 | Not Graduate | No | 4652 | 0.00 | 110 | 360 | 1 | Rural | Y |
| LP002435 | Male | Yes | 0 | Graduate | NA | 3539 | 1376.00 | 55 | 360 | 1 | Rural | N |
| LP002443 | Male | Yes | 2 | Graduate | No | 3340 | 1710.00 | 150 | 360 | 0 | Rural | N |
| LP002444 | Male | No | 1 | Not Graduate | Yes | 2769 | 1542.00 | 190 | 360 | NA | Semiurban | N |
| LP002446 | Male | Yes | 2 | Not Graduate | No | 2309 | 1255.00 | 125 | 360 | 0 | Rural | N |
| LP002447 | Male | Yes | 2 | Not Graduate | No | 1958 | 1456.00 | 60 | 300 | NA | Urban | Y |
| LP002448 | Male | Yes | 0 | Graduate | No | 3948 | 1733.00 | 149 | 360 | 0 | Rural | N |
| LP002449 | Male | Yes | 0 | Graduate | No | 2483 | 2466.00 | 90 | 180 | 0 | Rural | Y |
| LP002453 | Male | No | 0 | Graduate | Yes | 7085 | 0.00 | 84 | 360 | 1 | Semiurban | Y |
| LP002455 | Male | Yes | 2 | Graduate | No | 3859 | 0.00 | 96 | 360 | 1 | Semiurban | Y |
| LP002459 | Male | Yes | 0 | Graduate | No | 4301 | 0.00 | 118 | 360 | 1 | Urban | Y |
| LP002467 | Male | Yes | 0 | Graduate | No | 3708 | 2569.00 | 173 | 360 | 1 | Urban | N |
| LP002472 | Male | No | 2 | Graduate | No | 4354 | 0.00 | 136 | 360 | 1 | Rural | Y |
| LP002473 | Male | Yes | 0 | Graduate | No | 8334 | 0.00 | 160 | 360 | 1 | Semiurban | N |
| LP002478 | NA | Yes | 0 | Graduate | Yes | 2083 | 4083.00 | 160 | 360 | NA | Semiurban | Y |
| LP002484 | Male | Yes | 3+ | Graduate | No | 7740 | 0.00 | 128 | 180 | 1 | Urban | Y |
| LP002487 | Male | Yes | 0 | Graduate | No | 3015 | 2188.00 | 153 | 360 | 1 | Rural | Y |
| LP002489 | Female | No | 1 | Not Graduate | NA | 5191 | 0.00 | 132 | 360 | 1 | Semiurban | Y |
| LP002493 | Male | No | 0 | Graduate | No | 4166 | 0.00 | 98 | 360 | 0 | Semiurban | N |
| LP002494 | Male | No | 0 | Graduate | No | 6000 | 0.00 | 140 | 360 | 1 | Rural | Y |
| LP002500 | Male | Yes | 3+ | Not Graduate | No | 2947 | 1664.00 | 70 | 180 | 0 | Urban | N |
| LP002501 | NA | Yes | 0 | Graduate | No | 16692 | 0.00 | 110 | 360 | 1 | Semiurban | Y |
| LP002502 | Female | Yes | 2 | Not Graduate | NA | 210 | 2917.00 | 98 | 360 | 1 | Semiurban | Y |
| LP002505 | Male | Yes | 0 | Graduate | No | 4333 | 2451.00 | 110 | 360 | 1 | Urban | N |
| LP002515 | Male | Yes | 1 | Graduate | Yes | 3450 | 2079.00 | 162 | 360 | 1 | Semiurban | Y |
| LP002517 | Male | Yes | 1 | Not Graduate | No | 2653 | 1500.00 | 113 | 180 | 0 | Rural | N |
| LP002519 | Male | Yes | 3+ | Graduate | No | 4691 | 0.00 | 100 | 360 | 1 | Semiurban | Y |
| LP002522 | Female | No | 0 | Graduate | Yes | 2500 | 0.00 | 93 | 360 | NA | Urban | Y |
| LP002524 | Male | No | 2 | Graduate | No | 5532 | 4648.00 | 162 | 360 | 1 | Rural | Y |
| LP002527 | Male | Yes | 2 | Graduate | Yes | 16525 | 1014.00 | 150 | 360 | 1 | Rural | Y |
| LP002529 | Male | Yes | 2 | Graduate | No | 6700 | 1750.00 | 230 | 300 | 1 | Semiurban | Y |
| LP002530 | NA | Yes | 2 | Graduate | No | 2873 | 1872.00 | 132 | 360 | 0 | Semiurban | N |
| LP002531 | Male | Yes | 1 | Graduate | Yes | 16667 | 2250.00 | 86 | 360 | 1 | Semiurban | Y |
| LP002533 | Male | Yes | 2 | Graduate | No | 2947 | 1603.00 | NA | 360 | 1 | Urban | N |
| LP002534 | Female | No | 0 | Not Graduate | No | 4350 | 0.00 | 154 | 360 | 1 | Rural | Y |
| LP002536 | Male | Yes | 3+ | Not Graduate | No | 3095 | 0.00 | 113 | 360 | 1 | Rural | Y |
| LP002537 | Male | Yes | 0 | Graduate | No | 2083 | 3150.00 | 128 | 360 | 1 | Semiurban | Y |
| LP002541 | Male | Yes | 0 | Graduate | No | 10833 | 0.00 | 234 | 360 | 1 | Semiurban | Y |
| LP002543 | Male | Yes | 2 | Graduate | No | 8333 | 0.00 | 246 | 360 | 1 | Semiurban | Y |
| LP002544 | Male | Yes | 1 | Not Graduate | No | 1958 | 2436.00 | 131 | 360 | 1 | Rural | Y |
| LP002545 | Male | No | 2 | Graduate | No | 3547 | 0.00 | 80 | 360 | 0 | Rural | N |
| LP002547 | Male | Yes | 1 | Graduate | No | 18333 | 0.00 | 500 | 360 | 1 | Urban | N |
| LP002555 | Male | Yes | 2 | Graduate | Yes | 4583 | 2083.00 | 160 | 360 | 1 | Semiurban | Y |
| LP002556 | Male | No | 0 | Graduate | No | 2435 | 0.00 | 75 | 360 | 1 | Urban | N |
| LP002560 | Male | No | 0 | Not Graduate | No | 2699 | 2785.00 | 96 | 360 | NA | Semiurban | Y |
| LP002562 | Male | Yes | 1 | Not Graduate | No | 5333 | 1131.00 | 186 | 360 | NA | Urban | Y |
| LP002571 | Male | No | 0 | Not Graduate | No | 3691 | 0.00 | 110 | 360 | 1 | Rural | Y |
| LP002582 | Female | No | 0 | Not Graduate | Yes | 17263 | 0.00 | 225 | 360 | 1 | Semiurban | Y |
| LP002585 | Male | Yes | 0 | Graduate | No | 3597 | 2157.00 | 119 | 360 | 0 | Rural | N |
| LP002586 | Female | Yes | 1 | Graduate | No | 3326 | 913.00 | 105 | 84 | 1 | Semiurban | Y |
| LP002587 | Male | Yes | 0 | Not Graduate | No | 2600 | 1700.00 | 107 | 360 | 1 | Rural | Y |
| LP002588 | Male | Yes | 0 | Graduate | No | 4625 | 2857.00 | 111 | 12 | NA | Urban | Y |
| LP002600 | Male | Yes | 1 | Graduate | Yes | 2895 | 0.00 | 95 | 360 | 1 | Semiurban | Y |
| LP002602 | Male | No | 0 | Graduate | No | 6283 | 4416.00 | 209 | 360 | 0 | Rural | N |
| LP002603 | Female | No | 0 | Graduate | No | 645 | 3683.00 | 113 | 480 | 1 | Rural | Y |
| LP002606 | Female | No | 0 | Graduate | No | 3159 | 0.00 | 100 | 360 | 1 | Semiurban | Y |
| LP002615 | Male | Yes | 2 | Graduate | No | 4865 | 5624.00 | 208 | 360 | 1 | Semiurban | Y |
| LP002618 | Male | Yes | 1 | Not Graduate | No | 4050 | 5302.00 | 138 | 360 | NA | Rural | N |
| LP002619 | Male | Yes | 0 | Not Graduate | No | 3814 | 1483.00 | 124 | 300 | 1 | Semiurban | Y |
| LP002622 | Male | Yes | 2 | Graduate | No | 3510 | 4416.00 | 243 | 360 | 1 | Rural | Y |
| LP002624 | Male | Yes | 0 | Graduate | No | 20833 | 6667.00 | 480 | 360 | NA | Urban | Y |
| LP002625 | NA | No | 0 | Graduate | No | 3583 | 0.00 | 96 | 360 | 1 | Urban | N |
| LP002626 | Male | Yes | 0 | Graduate | Yes | 2479 | 3013.00 | 188 | 360 | 1 | Urban | Y |
| LP002634 | Female | No | 1 | Graduate | No | 13262 | 0.00 | 40 | 360 | 1 | Urban | Y |
| LP002637 | Male | No | 0 | Not Graduate | No | 3598 | 1287.00 | 100 | 360 | 1 | Rural | N |
| LP002640 | Male | Yes | 1 | Graduate | No | 6065 | 2004.00 | 250 | 360 | 1 | Semiurban | Y |
| LP002643 | Male | Yes | 2 | Graduate | No | 3283 | 2035.00 | 148 | 360 | 1 | Urban | Y |
| LP002648 | Male | Yes | 0 | Graduate | No | 2130 | 6666.00 | 70 | 180 | 1 | Semiurban | N |
| LP002652 | Male | No | 0 | Graduate | No | 5815 | 3666.00 | 311 | 360 | 1 | Rural | N |
| LP002659 | Male | Yes | 3+ | Graduate | No | 3466 | 3428.00 | 150 | 360 | 1 | Rural | Y |
| LP002670 | Female | Yes | 2 | Graduate | No | 2031 | 1632.00 | 113 | 480 | 1 | Semiurban | Y |
| LP002682 | Male | Yes | NA | Not Graduate | No | 3074 | 1800.00 | 123 | 360 | 0 | Semiurban | N |
| LP002683 | Male | No | 0 | Graduate | No | 4683 | 1915.00 | 185 | 360 | 1 | Semiurban | N |
| LP002684 | Female | No | 0 | Not Graduate | No | 3400 | 0.00 | 95 | 360 | 1 | Rural | N |
| LP002689 | Male | Yes | 2 | Not Graduate | No | 2192 | 1742.00 | 45 | 360 | 1 | Semiurban | Y |
| LP002690 | Male | No | 0 | Graduate | No | 2500 | 0.00 | 55 | 360 | 1 | Semiurban | Y |
| LP002692 | Male | Yes | 3+ | Graduate | Yes | 5677 | 1424.00 | 100 | 360 | 1 | Rural | Y |
| LP002693 | Male | Yes | 2 | Graduate | Yes | 7948 | 7166.00 | 480 | 360 | 1 | Rural | Y |
| LP002697 | Male | No | 0 | Graduate | No | 4680 | 2087.00 | NA | 360 | 1 | Semiurban | N |
| LP002699 | Male | Yes | 2 | Graduate | Yes | 17500 | 0.00 | 400 | 360 | 1 | Rural | Y |
| LP002705 | Male | Yes | 0 | Graduate | No | 3775 | 0.00 | 110 | 360 | 1 | Semiurban | Y |
| LP002706 | Male | Yes | 1 | Not Graduate | No | 5285 | 1430.00 | 161 | 360 | 0 | Semiurban | Y |
| LP002714 | Male | No | 1 | Not Graduate | No | 2679 | 1302.00 | 94 | 360 | 1 | Semiurban | Y |
| LP002716 | Male | No | 0 | Not Graduate | No | 6783 | 0.00 | 130 | 360 | 1 | Semiurban | Y |
| LP002717 | Male | Yes | 0 | Graduate | No | 1025 | 5500.00 | 216 | 360 | NA | Rural | Y |
| LP002720 | Male | Yes | 3+ | Graduate | No | 4281 | 0.00 | 100 | 360 | 1 | Urban | Y |
| LP002723 | Male | No | 2 | Graduate | No | 3588 | 0.00 | 110 | 360 | 0 | Rural | N |
| LP002729 | Male | No | 1 | Graduate | No | 11250 | 0.00 | 196 | 360 | NA | Semiurban | N |
| LP002731 | Female | No | 0 | Not Graduate | Yes | 18165 | 0.00 | 125 | 360 | 1 | Urban | Y |
| LP002732 | Male | No | 0 | Not Graduate | NA | 2550 | 2042.00 | 126 | 360 | 1 | Rural | Y |
| LP002734 | Male | Yes | 0 | Graduate | No | 6133 | 3906.00 | 324 | 360 | 1 | Urban | Y |
| LP002738 | Male | No | 2 | Graduate | No | 3617 | 0.00 | 107 | 360 | 1 | Semiurban | Y |
| LP002739 | Male | Yes | 0 | Not Graduate | No | 2917 | 536.00 | 66 | 360 | 1 | Rural | N |
| LP002740 | Male | Yes | 3+ | Graduate | No | 6417 | 0.00 | 157 | 180 | 1 | Rural | Y |
| LP002741 | Female | Yes | 1 | Graduate | No | 4608 | 2845.00 | 140 | 180 | 1 | Semiurban | Y |
| LP002743 | Female | No | 0 | Graduate | No | 2138 | 0.00 | 99 | 360 | 0 | Semiurban | N |
| LP002753 | Female | No | 1 | Graduate | NA | 3652 | 0.00 | 95 | 360 | 1 | Semiurban | Y |
| LP002755 | Male | Yes | 1 | Not Graduate | No | 2239 | 2524.00 | 128 | 360 | 1 | Urban | Y |
| LP002757 | Female | Yes | 0 | Not Graduate | No | 3017 | 663.00 | 102 | 360 | NA | Semiurban | Y |
| LP002767 | Male | Yes | 0 | Graduate | No | 2768 | 1950.00 | 155 | 360 | 1 | Rural | Y |
| LP002768 | Male | No | 0 | Not Graduate | No | 3358 | 0.00 | 80 | 36 | 1 | Semiurban | N |
| LP002772 | Male | No | 0 | Graduate | No | 2526 | 1783.00 | 145 | 360 | 1 | Rural | Y |
| LP002776 | Female | No | 0 | Graduate | No | 5000 | 0.00 | 103 | 360 | 0 | Semiurban | N |
| LP002777 | Male | Yes | 0 | Graduate | No | 2785 | 2016.00 | 110 | 360 | 1 | Rural | Y |
| LP002778 | Male | Yes | 2 | Graduate | Yes | 6633 | 0.00 | NA | 360 | 0 | Rural | N |
| LP002784 | Male | Yes | 1 | Not Graduate | No | 2492 | 2375.00 | NA | 360 | 1 | Rural | Y |
| LP002785 | Male | Yes | 1 | Graduate | No | 3333 | 3250.00 | 158 | 360 | 1 | Urban | Y |
| LP002788 | Male | Yes | 0 | Not Graduate | No | 2454 | 2333.00 | 181 | 360 | 0 | Urban | N |
| LP002789 | Male | Yes | 0 | Graduate | No | 3593 | 4266.00 | 132 | 180 | 0 | Rural | N |
| LP002792 | Male | Yes | 1 | Graduate | No | 5468 | 1032.00 | 26 | 360 | 1 | Semiurban | Y |
| LP002794 | Female | No | 0 | Graduate | No | 2667 | 1625.00 | 84 | 360 | NA | Urban | Y |
| LP002795 | Male | Yes | 3+ | Graduate | Yes | 10139 | 0.00 | 260 | 360 | 1 | Semiurban | Y |
| LP002798 | Male | Yes | 0 | Graduate | No | 3887 | 2669.00 | 162 | 360 | 1 | Semiurban | Y |
| LP002804 | Female | Yes | 0 | Graduate | No | 4180 | 2306.00 | 182 | 360 | 1 | Semiurban | Y |
| LP002807 | Male | Yes | 2 | Not Graduate | No | 3675 | 242.00 | 108 | 360 | 1 | Semiurban | Y |
| LP002813 | Female | Yes | 1 | Graduate | Yes | 19484 | 0.00 | 600 | 360 | 1 | Semiurban | Y |
| LP002820 | Male | Yes | 0 | Graduate | No | 5923 | 2054.00 | 211 | 360 | 1 | Rural | Y |
| LP002821 | Male | No | 0 | Not Graduate | Yes | 5800 | 0.00 | 132 | 360 | 1 | Semiurban | Y |
| LP002832 | Male | Yes | 2 | Graduate | No | 8799 | 0.00 | 258 | 360 | 0 | Urban | N |
| LP002833 | Male | Yes | 0 | Not Graduate | No | 4467 | 0.00 | 120 | 360 | NA | Rural | Y |
| LP002836 | Male | No | 0 | Graduate | No | 3333 | 0.00 | 70 | 360 | 1 | Urban | Y |
| LP002837 | Male | Yes | 3+ | Graduate | No | 3400 | 2500.00 | 123 | 360 | 0 | Rural | N |
| LP002840 | Female | No | 0 | Graduate | No | 2378 | 0.00 | 9 | 360 | 1 | Urban | N |
| LP002841 | Male | Yes | 0 | Graduate | No | 3166 | 2064.00 | 104 | 360 | 0 | Urban | N |
| LP002842 | Male | Yes | 1 | Graduate | No | 3417 | 1750.00 | 186 | 360 | 1 | Urban | Y |
| LP002847 | Male | Yes | NA | Graduate | No | 5116 | 1451.00 | 165 | 360 | 0 | Urban | N |
| LP002855 | Male | Yes | 2 | Graduate | No | 16666 | 0.00 | 275 | 360 | 1 | Urban | Y |
| LP002862 | Male | Yes | 2 | Not Graduate | No | 6125 | 1625.00 | 187 | 480 | 1 | Semiurban | N |
| LP002863 | Male | Yes | 3+ | Graduate | No | 6406 | 0.00 | 150 | 360 | 1 | Semiurban | N |
| LP002868 | Male | Yes | 2 | Graduate | No | 3159 | 461.00 | 108 | 84 | 1 | Urban | Y |
| LP002872 | NA | Yes | 0 | Graduate | No | 3087 | 2210.00 | 136 | 360 | 0 | Semiurban | N |
| LP002874 | Male | No | 0 | Graduate | No | 3229 | 2739.00 | 110 | 360 | 1 | Urban | Y |
| LP002877 | Male | Yes | 1 | Graduate | No | 1782 | 2232.00 | 107 | 360 | 1 | Rural | Y |
| LP002888 | Male | No | 0 | Graduate | NA | 3182 | 2917.00 | 161 | 360 | 1 | Urban | Y |
| LP002892 | Male | Yes | 2 | Graduate | No | 6540 | 0.00 | 205 | 360 | 1 | Semiurban | Y |
| LP002893 | Male | No | 0 | Graduate | No | 1836 | 33837.00 | 90 | 360 | 1 | Urban | N |
| LP002894 | Female | Yes | 0 | Graduate | No | 3166 | 0.00 | 36 | 360 | 1 | Semiurban | Y |
| LP002898 | Male | Yes | 1 | Graduate | No | 1880 | 0.00 | 61 | 360 | NA | Rural | N |
| LP002911 | Male | Yes | 1 | Graduate | No | 2787 | 1917.00 | 146 | 360 | 0 | Rural | N |
| LP002912 | Male | Yes | 1 | Graduate | No | 4283 | 3000.00 | 172 | 84 | 1 | Rural | N |
| LP002916 | Male | Yes | 0 | Graduate | No | 2297 | 1522.00 | 104 | 360 | 1 | Urban | Y |
| LP002917 | Female | No | 0 | Not Graduate | No | 2165 | 0.00 | 70 | 360 | 1 | Semiurban | Y |
| LP002925 | NA | No | 0 | Graduate | No | 4750 | 0.00 | 94 | 360 | 1 | Semiurban | Y |
| LP002926 | Male | Yes | 2 | Graduate | Yes | 2726 | 0.00 | 106 | 360 | 0 | Semiurban | N |
| LP002928 | Male | Yes | 0 | Graduate | No | 3000 | 3416.00 | 56 | 180 | 1 | Semiurban | Y |
| LP002931 | Male | Yes | 2 | Graduate | Yes | 6000 | 0.00 | 205 | 240 | 1 | Semiurban | N |
| LP002933 | NA | No | 3+ | Graduate | Yes | 9357 | 0.00 | 292 | 360 | 1 | Semiurban | Y |
| LP002936 | Male | Yes | 0 | Graduate | No | 3859 | 3300.00 | 142 | 180 | 1 | Rural | Y |
| LP002938 | Male | Yes | 0 | Graduate | Yes | 16120 | 0.00 | 260 | 360 | 1 | Urban | Y |
| LP002940 | Male | No | 0 | Not Graduate | No | 3833 | 0.00 | 110 | 360 | 1 | Rural | Y |
| LP002941 | Male | Yes | 2 | Not Graduate | Yes | 6383 | 1000.00 | 187 | 360 | 1 | Rural | N |
| LP002943 | Male | No | NA | Graduate | No | 2987 | 0.00 | 88 | 360 | 0 | Semiurban | N |
| LP002945 | Male | Yes | 0 | Graduate | Yes | 9963 | 0.00 | 180 | 360 | 1 | Rural | Y |
| LP002948 | Male | Yes | 2 | Graduate | No | 5780 | 0.00 | 192 | 360 | 1 | Urban | Y |
| LP002949 | Female | No | 3+ | Graduate | NA | 416 | 41667.00 | 350 | 180 | NA | Urban | N |
| LP002950 | Male | Yes | 0 | Not Graduate | NA | 2894 | 2792.00 | 155 | 360 | 1 | Rural | Y |
| LP002953 | Male | Yes | 3+ | Graduate | No | 5703 | 0.00 | 128 | 360 | 1 | Urban | Y |
| LP002958 | Male | No | 0 | Graduate | No | 3676 | 4301.00 | 172 | 360 | 1 | Rural | Y |
| LP002959 | Female | Yes | 1 | Graduate | No | 12000 | 0.00 | 496 | 360 | 1 | Semiurban | Y |
| LP002960 | Male | Yes | 0 | Not Graduate | No | 2400 | 3800.00 | NA | 180 | 1 | Urban | N |
| LP002961 | Male | Yes | 1 | Graduate | No | 3400 | 2500.00 | 173 | 360 | 1 | Semiurban | Y |
| LP002964 | Male | Yes | 2 | Not Graduate | No | 3987 | 1411.00 | 157 | 360 | 1 | Rural | Y |
| LP002974 | Male | Yes | 0 | Graduate | No | 3232 | 1950.00 | 108 | 360 | 1 | Rural | Y |
| LP002978 | Female | No | 0 | Graduate | No | 2900 | 0.00 | 71 | 360 | 1 | Rural | Y |
| LP002979 | Male | Yes | 3+ | Graduate | No | 4106 | 0.00 | 40 | 180 | 1 | Rural | Y |
| LP002983 | Male | Yes | 1 | Graduate | No | 8072 | 240.00 | 253 | 360 | 1 | Urban | Y |
| LP002984 | Male | Yes | 2 | Graduate | No | 7583 | 0.00 | 187 | 360 | 1 | Urban | Y |
| LP002990 | Female | No | 0 | Graduate | Yes | 4583 | 0.00 | 133 | 360 | 0 | Semiurban | N |

| Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Length:614 | Length:614 | Length:614 | Length:614 | Length:614 | Length:614 | Min. : 150 | Min. : 0 | Min. : 9.0 | Min. : 12 | Min. :0.0000 | Length:614 | Length:614 |
| Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | 1st Qu.: 2878 | 1st Qu.: 0 | 1st Qu.:100.0 | 1st Qu.:360 | 1st Qu.:1.0000 | Class :character | Class :character |
| Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Median : 3812 | Median : 1188 | Median :128.0 | Median :360 | Median :1.0000 | Mode :character | Mode :character |
| NA | NA | NA | NA | NA | NA | Mean : 5403 | Mean : 1621 | Mean :146.4 | Mean :342 | Mean :0.8422 | NA | NA |
| NA | NA | NA | NA | NA | NA | 3rd Qu.: 5795 | 3rd Qu.: 2297 | 3rd Qu.:168.0 | 3rd Qu.:360 | 3rd Qu.:1.0000 | NA | NA |
| NA | NA | NA | NA | NA | NA | Max. :81000 | Max. :41667 | Max. :700.0 | Max. :480 | Max. :1.0000 | NA | NA |
| NA | NA | NA | NA | NA | NA | NA | NA | NA’s :22 | NA’s :14 | NA’s :50 | NA | NA |
missing <- loan_data %>% mutate_if(is.character, list(~na_if(., "")))
missing %>%
  summarise_all(list(~sum(is.na(.)))) %>%
  gather(key = "Variable", value = "Number_Missing") %>%
  arrange(desc(Number_Missing)) %>%
  kbl() %>% kable_styling() %>% scroll_box(width = "750px", height = "250px")
| Variable | Number_Missing |
|---|---|
| Credit_History | 50 |
| Self_Employed | 32 |
| LoanAmount | 22 |
| Dependents | 15 |
| Loan_Amount_Term | 14 |
| Gender | 13 |
| Married | 3 |
| Loan_ID | 0 |
| Education | 0 |
| ApplicantIncome | 0 |
| CoapplicantIncome | 0 |
| Property_Area | 0 |
| Loan_Status | 0 |
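The same per-column count can be sketched in base R, without the tidyverse pipeline. This is a minimal illustration rather than the code used in this report: the `toy` data frame holds a few columns from four rows of the data shown earlier (LP002100 to LP002106), and the names `toy`, `missing_counts` and `missing_tbl` are assumptions made for the example.

```r
# Toy subset: four rows (LP002100 to LP002106) with some columns omitted
toy <- data.frame(
  Loan_ID          = c("LP002100", "LP002101", "LP002103", "LP002106"),
  Gender           = c("Male", "Male", NA, "Male"),
  Dependents       = c(NA, "0", "1", NA),
  Self_Employed    = c("No", NA, "Yes", "Yes"),
  Loan_Amount_Term = c(360, 180, 180, NA)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
missing_counts <- sort(colSums(is.na(toy)), decreasing = TRUE)

# Same two-column layout as the table above
missing_tbl <- data.frame(
  Variable       = names(missing_counts),
  Number_Missing = unname(missing_counts)
)
print(missing_tbl)
```

Applied to `loan_data` instead of `toy`, this should reproduce the counts in the table above, since blanks were already read in as NA.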
In the following section, we continue to look at the data in its raw form (loan_raw).
Categorical Variables
Several variables contain blank values (""). Bank customers may have skipped these fields intentionally during data collection, or the values may simply be missing. We will handle them later on.
* Loan_ID: unique identifier
* Gender: either Male or Female or blank
* Married: either No or Yes or blank
* Dependents: how many dependents does someone have? 0, 1, 2, 3+ or blank
* Education: Graduate or Not Graduate
* Self_Employed: No or Yes or blank
* Property_Area: Urban, Rural or Semiurban
* Loan_Status: Y (yes) or N (no)
* Credit_History: does the credit history meet the guidelines? 1 = Yes, 0 = No, or missing (NA)
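The difference between the blank values in loan_raw and the NA values in loan_data comes from the `na.strings` argument of `read.csv()`. The sketch below illustrates this on a three-row inline CSV built from values shown earlier (LP002100, LP002103, LP002113); the names `csv_text`, `raw_df` and `clean_df` are assumptions for the example, not objects from this report.

```r
# Inline CSV: Gender is blank in row 2, LoanAmount is blank in row 3
csv_text <- paste(
  "Loan_ID,Gender,LoanAmount",
  "LP002100,Male,71",
  "LP002103,,182",
  "LP002113,Female,",
  sep = "\n"
)

# Default: a blank in a character column is kept as the empty string ""
raw_df <- read.csv(text = csv_text, header = TRUE)

# na.strings = "": blanks in character columns are read as NA
clean_df <- read.csv(text = csv_text, header = TRUE, na.strings = "")

raw_df$Gender[2]        # ""
clean_df$Gender[2]      # NA

# A blank in a numeric column is read as NA in both cases
raw_df$LoanAmount[3]    # NA
clean_df$LoanAmount[3]  # NA
```

This is why the categorical variables listed above show a blank level in loan_raw, while numeric columns such as LoanAmount already contain NA.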
Married
Married applicants have a higher approval rate than unmarried applicants. It would be useful to check whether this is related to income.
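The percentage tables in this section divide each row of a count table by a hardcoded group size. As a hedged alternative, base R's `prop.table()` with `margin = 1` derives the same row-wise percentages from the counts themselves, so the denominators do not need to be typed in. The helper name `approval_pct` and the toy vectors (rows LP002035 to LP002055 of the data shown earlier) are illustrative only.

```r
# Row-wise percentages of loan status within each group (sketch, base R only)
approval_pct <- function(group, status) {
  counts <- table(group, status)
  # margin = 1 divides every cell by its row total
  round(prop.table(counts, margin = 1) * 100, 2)
}

# Married and Loan_Status for rows LP002035 to LP002055
married     <- c("Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "No")
loan_status <- c("Y", "Y", "Y", "N", "Y", "Y", "Y", "Y")

pct <- approval_pct(married, loan_status)
print(pct)
```

On the full data the call would be `approval_pct(loan_raw$Married, loan_raw$Loan_Status)`, and the same helper would work for Dependents, Education and Property_Area.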
married_loan_status_count <- table(loan_raw$Married,loan_raw$Loan_Status)
married_loan_status_perct <- married_loan_status_count
married_loan_status_perct[1,] <- round(married_loan_status_perct[1,]/3 * 100, 2)
married_loan_status_perct[2,] <- round(married_loan_status_perct[2,]/213 * 100, 2)
married_loan_status_perct[3,] <- round(married_loan_status_perct[3,]/398 * 100, 2)
#set column names for married_loan_status_count
married_loan_status_count <- data.frame(married_loan_status_count)
colnames(married_loan_status_count) <- c('Married','Loan_Status','Count')
#set column names and row names for married_loan_status_perct
rownames(married_loan_status_perct) <- c("Blank", "Not Married", "Married")
colnames(married_loan_status_perct) <- c("% Applications Not Approved", "% Applications Approved")
loan_data_Married <- loan_raw
#relabel only the Married column; indexing whole rows would overwrite every column with "Blank"
loan_data_Married$Married[loan_data_Married$Married == ''] <- "Blank"
t1 <- loan_data_Married %>% group_by(Married) %>% tally
colnames(t1) <- c("Married","Count Loan Applications")
t2 <- married_loan_status_perct
knitr::kable(list(t1, t2))
ggplot(data=married_loan_status_count, aes(x=Married, y=Count, fill=Loan_Status)) + geom_bar(stat="identity",position="dodge")
Dependents
Applicants with 2 dependents appear to have the highest loan approval rate. If we assume that higher income makes approval more likely, it would be interesting to see whether income per dependent has any impact on loan approval.
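One possible way to set up that comparison is sketched below. It is not part of the original analysis and rests on two loudly stated assumptions: "3+" is treated as exactly 3 dependents, and income is divided by dependents plus one (the applicant) so that applicants with 0 dependents do not cause a division by zero. The four rows are taken from the data shown earlier; `dep_num` and `income_per_person` are hypothetical names.

```r
# Four rows from the data above (LP002035, LP002050, LP002051, LP002053)
toy <- data.frame(
  Loan_ID           = c("LP002035", "LP002050", "LP002051", "LP002053"),
  Dependents        = c("2", "1", "0", "3+"),
  ApplicantIncome   = c(3717, 10000, 2400, 4342),
  CoapplicantIncome = c(0, 0, 2167, 189)
)

# Assumption: "3+" counts as 3 dependents; blank values would become NA
dep_num <- as.numeric(sub("+", "", toy$Dependents, fixed = TRUE))

# Assumption: household size = dependents + the applicant
toy$income_per_person <-
  (toy$ApplicantIncome + toy$CoapplicantIncome) / (dep_num + 1)

print(toy[, c("Loan_ID", "Dependents", "income_per_person")])
```

The resulting column could then be compared across Loan_Status groups, for example with a boxplot.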
dep_loan_status_count <- table(loan_raw$Dependents,loan_raw$Loan_Status)
dep_loan_status_perct <- dep_loan_status_count
dep_loan_status_perct[1,] <- round(dep_loan_status_perct[1,]/15 * 100, 2)
dep_loan_status_perct[2,] <- round(dep_loan_status_perct[2,]/345 * 100, 2)
dep_loan_status_perct[3,] <- round(dep_loan_status_perct[3,]/102 * 100, 2)
dep_loan_status_perct[4,] <- round(dep_loan_status_perct[4,]/101 * 100, 2)
dep_loan_status_perct[5,] <- round(dep_loan_status_perct[5,]/51 * 100, 2)
#set column names for dep_loan_status_count
dep_loan_status_count <- data.frame(dep_loan_status_count)
colnames(dep_loan_status_count) <- c('Dependents','Loan_Status','Count')
#set column names and row names for dep_loan_status_perct
rownames(dep_loan_status_perct) <- c("Blank", "0", "1","2","3+")
colnames(dep_loan_status_perct) <- c("% Applications Not Approved", "% Applications Approved")
loan_data_Dep <- loan_raw
#relabel only the Dependents column; indexing whole rows would overwrite every column with "Blank"
loan_data_Dep$Dependents[loan_data_Dep$Dependents == ''] <- "Blank"
t1 <- loan_data_Dep %>% group_by(Dependents) %>% tally
colnames(t1) <- c("Dependents","Count Loan Applications")
t2 <- dep_loan_status_perct
knitr::kable(list(t1, t2))
ggplot(data=dep_loan_status_count, aes(x=Dependents, y=Count, fill=Loan_Status)) + geom_bar(stat="identity",position="dodge")
Education
Applicants with a Graduate education have a higher loan approval rate than applicants without one.
edu_loan_status_count <- table(loan_raw$Education,loan_raw$Loan_Status)
edu_loan_status_perct <- edu_loan_status_count
edu_loan_status_perct[1,] <- round(edu_loan_status_perct[1,]/480 * 100, 2)
edu_loan_status_perct[2,] <- round(edu_loan_status_perct[2,]/134 * 100, 2)
#set column names for edu_loan_status_count
edu_loan_status_count <- data.frame(edu_loan_status_count)
colnames(edu_loan_status_count) <- c('Education','Loan_Status','Count')
#set column names for edu_loan_status_perct
colnames(edu_loan_status_perct) <- c("% Applications Not Approved", "% Applications Approved")
t1 <- loan_raw %>% group_by(Education) %>% tally
colnames(t1) <- c("Education","Count Loan Applications")
t2 <- edu_loan_status_perct
knitr::kable(list(t1, t2))
ggplot(data=edu_loan_status_count, aes(x=Education, y=Count, fill=Loan_Status)) + geom_bar(stat="identity",position="dodge")
Property Area
Semiurban applicants have a higher loan approval rate than both rural and urban applicants.
proparea_loan_status_count <- table(loan_raw$Property_Area,loan_raw$Loan_Status)
proparea_loan_status_perct <- proparea_loan_status_count
proparea_loan_status_perct[1,] <- round(proparea_loan_status_perct[1,]/179 * 100, 2)
proparea_loan_status_perct[2,] <- round(proparea_loan_status_perct[2,]/233 * 100, 2)
proparea_loan_status_perct[3,] <- round(proparea_loan_status_perct[3,]/202 * 100, 2)
#set column names for proparea_loan_status_count
proparea_loan_status_count <- data.frame(proparea_loan_status_count)
colnames(proparea_loan_status_count) <- c('Property_Area','Loan_Status','Count')
#set column names for proparea_loan_status_perct
colnames(proparea_loan_status_perct) <- c("% Applications Not Approved", "% Applications Approved")
t1 <- loan_raw %>% group_by(Property_Area) %>% tally
colnames(t1) <- c("Property_Area","Count Loan Applications")
t2 <- proparea_loan_status_perct
knitr::kable(list(t1, t2))
ggplot(data=proparea_loan_status_count, aes(x=Property_Area, y=Count, fill=Loan_Status)) + geom_bar(stat="identity",position="dodge")
Credit History
Having a credit history that meets the guidelines appears to be extremely important in whether the loan is approved or not.
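The percentage tables in this part of the report divide each row by a hard-coded group size (89 and 475 for Credit_History). As a minimal sketch of an alternative, assuming a toy data frame `toy` that only stands in for `loan_raw` (its values are made up for illustration), base R's `prop.table` with `margin = 1` divides each row of a contingency table by its own total:

```r
# Toy data standing in for loan_raw; the values are illustrative only
toy <- data.frame(
  Credit_History = c(0, 0, 0, 1, 1, 1, 1, 1),
  Loan_Status    = c("N", "N", "Y", "Y", "Y", "Y", "N", "Y")
)

counts <- table(toy$Credit_History, toy$Loan_Status)

# margin = 1 divides every row by its own total, so no group size is typed by hand
row_pct <- round(prop.table(counts, margin = 1) * 100, 2)
colnames(row_pct) <- c("% Applications Not Approved", "% Applications Approved")
print(row_pct)
```

Because the row totals come from the table itself, the percentages stay correct if rows are filtered or the data changes.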
credhist_loan_status_count <- table(loan_raw$Credit_History,loan_raw$Loan_Status)
credhist_loan_status_perct <- credhist_loan_status_count
credhist_loan_status_perct[1,] <- round(credhist_loan_status_perct[1,]/89 * 100, 2)
credhist_loan_status_perct[2,] <- round(credhist_loan_status_perct[2,]/475 * 100, 2)
#set column names for credhist_loan_status_count
credhist_loan_status_count <- data.frame(credhist_loan_status_count)
colnames(credhist_loan_status_count) <- c('Credit_History','Loan_Status','Count')
#set column names for credhist_loan_status_perct
colnames(credhist_loan_status_perct) <- c("% Applications Not Approved", "% Applications Approved")
t1 <- loan_raw %>% group_by(Credit_History) %>% tally
colnames(t1) <- c("Credit_History","Count Loan Applications")
t2 <- credhist_loan_status_perct
knitr::kable(list(t1, t2))
ggplot(data=credhist_loan_status_count, aes(x=Credit_History, y=Count, fill=Loan_Status)) + geom_bar(stat="identity",position="dodge")
Numerical Variables
- ApplicantIncome: how much money does the applicant make?
- CoapplicantIncome: how much money does the coapplicant make? if there is no coapplicant this is 0.
- LoanAmount: how much is the loan worth in thousands?
- Loan_Amount_Term: how many months is the loan?
Now let’s use the pairs.panels function to see a lot of important information related to our numeric data:
- ApplicantIncome and LoanAmount are strongly correlated
- The most common Loan_Amount_Term is 360 months
numeric_loan_data <- dplyr::select(loan_data,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term)
pairs.panels(numeric_loan_data,
method = "pearson", # correlation method
hist.col = "#00AFBB",
density = TRUE, # show density plots
ellipses = TRUE # show correlation ellipses
)
Inspecting ApplicantIncome and Loan Income
Here we can see that ApplicantIncome does not have a huge effect on whether the Loan_Status was approved (Y) or not. The average ApplicantIncome is about the same for both groups. There are a few more high-income outliers in the group where the loan status was approved.
approved <- loan_data[loan_data$Loan_Status == 'Y',]
denied <- loan_data[loan_data$Loan_Status == 'N',]
a <- ggplot(loan_data,aes(x=ApplicantIncome,color=Loan_Status)) + geom_boxplot()
b <- ggplot(approved,aes(x=ApplicantIncome,y=LoanAmount,color=Loan_Status)) + geom_point(color='blue') + xlab('Approved Applicant Income') + scale_x_continuous(limits = c(0, 25000)) + scale_y_continuous(limits = c(0, 650))
grid.arrange(a,b,nrow=2) #,nrow=2,ncol=2,layout_matrix=c(1,1,2,3))
c <- ggplot(denied,aes(x=ApplicantIncome,y=LoanAmount,color=Loan_Status)) + geom_point(color='red') + xlab('Denied Applicant Income') + scale_x_continuous(limits = c(0, 25000)) + scale_y_continuous(limits = c(0, 650))
grid.arrange(c)
In addition, upon investigating the sum of ApplicantIncome and CoapplicantIncome, we observe that it does not appear to have much predictive power for Loan_Status.
ggplot(data = loan_data, aes(x = Loan_Status, y = ApplicantIncome+CoapplicantIncome, fill=Loan_Status)) +
geom_boxplot() +
coord_flip()
LoanAmount Per ApplicantIncome
Now let’s see if the ratio of LoanAmount to ApplicantIncome has any predictive power when trying to determine whether a loan will be approved or not. The idea is that someone requesting a LoanAmount of 5 times their income might not be approved, whereas a request for 3 times their income could be.
Looking at the boxplots below, the average LoanAmtPerSalary is roughly the same for approved and not approved applications, which does not support this theory. The variable might still prove helpful in our modeling, so we will keep it.
loan_data$LoanAmtPerSalary <- loan_data$LoanAmount*100000/loan_data$ApplicantIncome
ggplot(loan_data,aes(x=LoanAmtPerSalary,color=Loan_Status)) + geom_boxplot() + scale_x_continuous(limits = c(0, 30000))
## Warning: Removed 25 rows containing non-finite values (stat_boxplot).
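To make the units of LoanAmtPerSalary concrete, here is a small base R sketch of the same formula. The first three entries mirror rows of the data preview at the top of the report (LP001003, LP001005 and LP001002); the fourth, a zero income, is a hypothetical edge case added for illustration.

```r
# First three entries follow the data preview; the zero income is hypothetical
loan_amount      <- c(128, 66, NA, 100)      # LoanAmount, in thousands
applicant_income <- c(4583, 3000, 5849, 0)   # ApplicantIncome

# Same formula as the report: thousands * 100000 / income,
# i.e. the loan amount expressed as a percentage of ApplicantIncome
loan_amt_per_salary <- loan_amount * 100000 / applicant_income
print(round(loan_amt_per_salary, 1))

# A missing LoanAmount yields NA and a zero income yields Inf
print(is.finite(loan_amt_per_salary))
```

Non-finite values of this kind, along with values outside the axis limits, are what ggplot2 drops from the boxplot with a warning.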
Data Prep for Model-fitting
I explicitly recoded the Y/N values of Loan_Status into 1/0.
Since credit history is a categorical value and fewer than 50 rows are missing, it’s better to delete these data points rather than to try to impute a value for them. For loan amount term and loan amount we will use the mice package to impute a value where it is missing.
Additional Data Processing / Manipulation Steps
First off, I need to convert Credit_History to a factor so that the mice model that I’m going to use can detect that column as a categorical variable.
Next, I combine ApplicantIncome and CoapplicantIncome into a new variable, TotalIncome, and drop the two input columns. Loan_ID doesn’t help with the prediction, so I drop it as well.
loan_knn_pre_imp <- loan_knn
loan_knn_pre_imp$Credit_History <- as.factor(loan_knn_pre_imp$Credit_History)
loan_knn_pre_imp <- loan_knn_pre_imp %>% mutate(TotalIncome = ApplicantIncome + CoapplicantIncome)
loan_knn_pre_imp <- loan_knn_pre_imp %>% dplyr::select(-c('Loan_ID','ApplicantIncome','CoapplicantIncome'))
# loan_knn_pre_imp[loan_knn_pre_imp$Dependents = "3+"] <- "3"
# recode dependents 3+ to 3
loan_knn_pre_imp$Dependents <- revalue(loan_knn_pre_imp$Dependents, c("3+"="3"))
str(loan_knn_pre_imp)
## 'data.frame': 614 obs. of 12 variables:
## $ Gender : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
## $ Married : Factor w/ 2 levels "No","Yes": 1 2 2 2 1 2 2 2 2 2 ...
## $ Dependents : Factor w/ 4 levels "0","1","2","3": 1 2 1 1 1 3 1 4 3 2 ...
## $ Education : Factor w/ 2 levels "Graduate","Not Graduate": 1 1 1 2 1 1 2 1 1 1 ...
## $ Self_Employed : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 2 1 1 1 1 ...
## $ LoanAmount : int NA 128 66 120 141 267 95 158 168 349 ...
## $ Loan_Amount_Term: int 360 360 360 360 360 360 360 360 360 360 ...
## $ Credit_History : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 1 2 2 ...
## $ Property_Area : Factor w/ 3 levels "Rural","Semiurban",..: 3 1 3 3 3 3 3 2 3 2 ...
## $ Loan_Status : num 1 0 1 1 1 1 1 0 1 0 ...
## $ LoanAmtPerSalary: num NA 2793 2200 4646 2350 ...
## $ TotalIncome : num 5849 6091 3000 4941 6000 ...
Status quo of missing data
I’ve set up a method vector (meth) that tells mice which imputation method to use for each column, together with the default predictorMatrix (predM), which controls which columns serve as predictors.
Set seed = 501 and retrieved the results.
# clean_loan_data <- complete(mice(clean_loan_data,m=5,meth='pmm',print=FALSE))
init <- mice(loan_knn_pre_imp, maxit=0)
meth <- init$method
predM <- init$predictorMatrix
meth[c('LoanAmount','Loan_Amount_Term')] <- 'norm'
meth[c('Credit_History','Self_Employed','Gender','Married')] <- 'logreg'
meth[c('Dependents')] <- 'polyreg'
meth[c('Loan_Status','TotalIncome','Property_Area','Education')] = ''
loan_knn_imp1 <- mice(loan_knn_pre_imp, method=meth, predictorMatrix=predM, seed=501)
##
## iter imp variable
## 1 1 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 1 2 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 1 3 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 1 4 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 1 5 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 2 1 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 2 2 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 2 3 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 2 4 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 2 5 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 3 1 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 3 2 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 3 3 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 3 4 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 3 5 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 4 1 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 4 2 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 4 3 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 4 4 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 4 5 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 5 1 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 5 2 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 5 3 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 5 4 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
## 5 5 Gender Married Dependents Self_Employed LoanAmount Loan_Amount_Term Credit_History LoanAmtPerSalary
Manual Examinations
After some manual examination of the different imputed results, I’ve decided to go with imputation #3 (column 3 in the tables below).
## 1 2 3 4 5
## 17 1 1 1 1 1
## 25 0 1 1 0 1
## 31 1 1 1 1 1
## 43 1 1 1 1 1
## 80 1 1 1 1 1
## 84 0 0 1 0 0
## 87 1 1 1 1 1
## 96 0 0 1 0 1
## 118 1 1 1 1 1
## 126 1 1 1 1 1
## 130 1 0 1 0 1
## 131 1 1 1 1 1
## 157 1 1 1 1 1
## 182 0 1 1 0 1
## 188 1 1 1 1 1
## 199 1 1 1 1 1
## 220 1 1 1 1 1
## 237 1 0 0 1 1
## 238 1 1 1 1 1
## 260 0 0 1 1 0
## [ reached 'max' / getOption("max.print") -- omitted 30 rows ]
## Loan_ID Gender Married Dependents Education Self_Employed
## 96 LP001326 Male No 0 Graduate <NA>
## 97 LP001327 Female Yes 0 Graduate No
## 98 LP001333 Male Yes 0 Graduate No
## 99 LP001334 Male Yes 0 Not Graduate No
## 100 LP001343 Male Yes 0 Graduate No
## 101 LP001345 Male Yes 2 Not Graduate No
## 102 LP001349 Male No 0 Graduate No
## ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term
## 96 6782 0 NA 360
## 97 2484 2302 137 360
## 98 1977 997 50 360
## 99 4188 0 115 180
## 100 1759 3541 131 360
## 101 4288 3263 133 180
## 102 4843 3806 151 360
## Credit_History Property_Area Loan_Status LoanAmtPerSalary
## 96 NA Urban 0 NA
## 97 1 Semiurban 1 5515.298
## 98 1 Semiurban 1 2529.084
## 99 1 Semiurban 1 2745.941
## 100 1 Semiurban 1 7447.413
## 101 1 Urban 1 3101.679
## 102 1 Semiurban 1 3117.902
## [ reached 'max' / getOption("max.print") -- omitted 16 rows ]
## 1 2 3 4 5
## 105 Yes Yes Yes No Yes
## 229 No Yes Yes Yes Yes
## 436 Yes Yes No No No
## Loan_ID Gender Married Dependents Education Self_Employed
## 430 LP002370 Male No 0 Not Graduate No
## 431 LP002377 Female No 1 Graduate Yes
## 432 LP002379 Male No 0 Graduate No
## 433 LP002386 Male No 0 Graduate <NA>
## 434 LP002387 Male Yes 0 Graduate No
## 435 LP002390 Male No 0 Graduate No
## 436 LP002393 Female <NA> <NA> Graduate No
## ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term
## 430 2717 0 60 180
## 431 8624 0 150 360
## 432 6500 0 105 360
## 433 12876 0 405 360
## 434 2425 2340 143 360
## 435 3750 0 100 360
## 436 10047 0 NA 240
## Credit_History Property_Area Loan_Status LoanAmtPerSalary
## 430 1 Urban 1 2208.318
## 431 1 Semiurban 1 1739.332
## 432 0 Rural 0 1615.385
## 433 1 Semiurban 1 3145.387
## 434 1 Semiurban 1 5896.907
## 435 1 Urban 1 2666.667
## 436 1 Semiurban 1 NA
## 1 2 3 4 5
## 103 2 1 3 1 0
## 105 0 3 0 2 0
## 121 2 2 3 2 0
## 227 2 2 1 1 1
## 229 0 0 2 1 2
## 294 0 0 3 0 0
## 302 0 0 1 3 1
## 333 0 0 0 0 0
## 336 0 0 1 2 2
## 347 0 0 3 0 1
## 356 1 0 0 0 0
## 436 2 2 0 0 0
## 518 0 2 0 1 2
## 572 0 0 1 1 2
## 598 0 0 0 0 0
## Loan_ID Gender Married Dependents Education Self_Employed
## 227 LP001754 Male Yes <NA> Not Graduate Yes
## 228 LP001758 Male Yes 2 Graduate No
## 229 LP001760 Male <NA> <NA> Graduate No
## ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term
## 227 4735 0 138 360
## 228 6250 1695 210 360
## 229 4758 0 158 480
## Credit_History Property_Area Loan_Status LoanAmtPerSalary
## 227 1 Urban 0 2914.467
## 228 1 Semiurban 1 3360.000
## 229 1 Semiurban 1 3320.723
Decision
We picked imputation #3.
clean_loan_data <- loan_knn2
# have to redo loan_status as loan_knn's loan status had been recoded to numeric on purpose
clean_loan_data$Loan_Status <- as.factor(clean_loan_data$Loan_Status)
str(clean_loan_data)
## 'data.frame': 614 obs. of 12 variables:
## $ Gender : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
## $ Married : Factor w/ 2 levels "No","Yes": 1 2 2 2 1 2 2 2 2 2 ...
## $ Dependents : Factor w/ 4 levels "0","1","2","3": 1 2 1 1 1 3 1 4 3 2 ...
## $ Education : Factor w/ 2 levels "Graduate","Not Graduate": 1 1 1 2 1 1 2 1 1 1 ...
## $ Self_Employed : Factor w/ 2 levels "No","Yes": 1 1 2 1 1 2 1 1 1 1 ...
## $ LoanAmount : num 21.1 128 66 120 141 ...
## $ Loan_Amount_Term: num 360 360 360 360 360 360 360 360 360 360 ...
## $ Credit_History : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 1 2 2 ...
## $ Property_Area : Factor w/ 3 levels "Rural","Semiurban",..: 3 1 3 3 3 3 3 2 3 2 ...
## $ Loan_Status : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 1 2 1 ...
## $ LoanAmtPerSalary: num 403 2793 2200 4646 2350 ...
## $ TotalIncome : num 5849 6091 3000 4941 6000 ...
Imbalanced Dataset
Notice that the response variable is 31/69 split on the binary response, No and Yes, respectively.
imb_dat <- as.data.frame(prop.table(x = table(clean_loan_data$Loan_Status)))
colnames(imb_dat) <- c("Loan Status", "Freq")
imb_dat
## Loan Status Freq
## 1 0 0.3127036
## 2 1 0.6872964
Splitting Data into Training & Testing
Here we are going to use 80% of our data to train the model and reserve 20% to test the model we pick.
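The code for this split is not shown. As a minimal base R sketch of an 80/20 split stratified by Loan_Status (similar in spirit to caret's createDataPartition, which Section 3 mentions), assuming a hypothetical data frame `toy` with the same 31/69 class mix:

```r
set.seed(1)  # arbitrary seed, used only for this illustration

# Hypothetical data with the 31/69 class mix reported above
toy <- data.frame(
  id          = 1:100,
  Loan_Status = factor(rep(c(0, 1), times = c(31, 69)))
)

# Draw 80% of the row indices within each class (a stratified sample)
rows_by_class <- split(seq_len(nrow(toy)), toy$Loan_Status)
train_idx <- unlist(
  lapply(rows_by_class, function(idx) sample(idx, size = round(0.8 * length(idx)))),
  use.names = FALSE
)

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Both partitions keep roughly the original class proportions
print(prop.table(table(toy_train$Loan_Status)))
print(prop.table(table(toy_test$Loan_Status)))
```

Sampling within each class keeps the minority class represented in the test set, which matters when only about 31% of the rows are denials.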
2. Linear Discriminant Analysis
LDA does not seem to be a good approach with this data set, as the points provided by the available data are not linearly separable.
LDA Cross Validation
Predictions with the LDA model are less accurate than if we just used the binary classifier Credit_History to determine whether or not a loan would be approved.
LDA Model Results
lda.fit <- train(Loan_Status ~ TotalIncome + LoanAmount,
data = train,
method = 'lda',
trControl = ctrl
)
test$lda <- predict(lda.fit, test)
confusionMatrix(test$lda, test$Loan_Status)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 0 0
## 1 33 90
##
## Accuracy : 0.7317
## 95% CI : (0.6443, 0.8076)
## No Information Rate : 0.7317
## P-Value [Acc > NIR] : 0.5467
##
## Kappa : 0
##
## Mcnemar's Test P-Value : 2.54e-08
##
## Sensitivity : 0.0000
## Specificity : 1.0000
## Pos Pred Value : NaN
## Neg Pred Value : 0.7317
## Prevalence : 0.2683
## Detection Rate : 0.0000
## Detection Prevalence : 0.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : 0
##
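The confusion matrix above shows the LDA model predicting class 1 for every test row, which is why its accuracy equals the no-information rate and its kappa is 0. A short base R check of that arithmetic, using only the counts printed above:

```r
# Test-set class counts read from the confusion matrix above (column totals)
actual_counts <- c("0" = 33, "1" = 90)

# Accuracy of always predicting the majority class (the no-information rate)
nir <- max(actual_counts) / sum(actual_counts)
print(round(nir, 4))

# Balanced accuracy of that baseline: mean of the per-class recalls
# (no class-0 rows recovered, every class-1 row recovered)
per_class_recall <- c("0" = 0 / 33, "1" = 90 / 90)
baseline_balanced_accuracy <- mean(per_class_recall)
print(baseline_balanced_accuracy)
```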
3. K-nearest Neighbor
First off, set seed = 688.
Create training/test partitions by calling createDataPartition. p is set to .8 to mean 80/20 split for train/test set.
Checking the structure of the train set (knn_train)
Checking the structure of the test set (knn_test)
## 'data.frame': 122 obs. of 12 variables:
## $ Gender : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 1 2 ...
## $ Married : Factor w/ 2 levels "No","Yes": 1 2 2 2 2 1 1 2 2 2 ...
## $ Dependents : Factor w/ 4 levels "0","1","2","3": 1 3 2 3 3 2 4 3 2 1 ...
## $ Education : Factor w/ 2 levels "Graduate","Not Graduate": 1 1 1 2 2 1 1 1 1 1 ...
## $ Self_Employed : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 1 2 1 ...
## $ LoanAmount : num 141 267 349 112 110 106 320 134 286 96 ...
## $ Loan_Amount_Term: num 360 360 360 360 360 360 360 360 360 360 ...
## $ Credit_History : Factor w/ 2 levels "0","1": 2 2 2 1 2 2 2 2 1 2 ...
## $ Property_Area : Factor w/ 3 levels "Rural","Semiurban",..: 3 3 2 1 3 1 1 3 3 2 ...
## $ Loan_Status : Factor w/ 2 levels "0","1": 2 2 1 1 2 1 1 1 1 2 ...
## $ LoanAmtPerSalary: num 2350 4929 2718 3328 2603 ...
## $ TotalIncome : num 6000 9613 23809 5282 5266 ...
Cross Validation
Perform repeated 10-fold cross-validation: repeats = 11 is the number of complete sets of folds to compute, and the number of folds is left at caret's default of 10. For this classification problem, we assigned our fitted model to knn.fit. The resampling scheme is passed to train through trControl.
# cleaning up some parallel computing
# https://stackoverflow.com/questions/25097729/un-register-a-doparallel-cluster
registerDoSEQ()
trControl <- trainControl(method = "repeatedcv",
repeats = 11)
knn.fit <- train(Loan_Status ~ .,
method = "knn",
tuneGrid = expand.grid(k = 1:10),
trControl = trControl,
preProcess = c("center","scale"),
data = knn_train
)
knn.fit
## k-Nearest Neighbors
##
## 492 samples
## 11 predictor
## 2 classes: '0', '1'
##
## Pre-processing: centered (14), scaled (14)
## Resampling: Cross-Validated (10 fold, repeated 11 times)
## Summary of sample sizes: 442, 444, 443, 443, 442, 444, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 1 0.7122390 0.3217246
## 2 0.7089797 0.3061592
## 3 0.7694411 0.4090793
## 4 0.7605727 0.3808163
## 5 0.7838922 0.4188985
## 6 0.7732961 0.3882704
## 7 0.7848086 0.4070143
## 8 0.7832899 0.4023703
## 9 0.7912427 0.4176016
## 10 0.7873503 0.4063482
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 9.
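To make the resampling scheme reported above concrete (10 folds, repeated 11 times), here is a base R sketch of how such fold labels could be generated. It is a simplified stand-in, not caret's implementation: caret's fold assignment also balances the classes, so its training-set sizes differ slightly from the ones below. The names `n_folds` and `fold_ids` are introduced here for illustration.

```r
set.seed(2)  # arbitrary seed for this illustration

n       <- 492  # training rows reported in the model summary
n_folds <- 10   # folds per repeat
repeats <- 11   # complete sets of folds

# One vector of shuffled fold labels per repeat;
# within a repeat, each row is held out exactly once
fold_ids <- lapply(seq_len(repeats),
                   function(r) sample(rep(seq_len(n_folds), length.out = n)))

holdout_sizes <- table(fold_ids[[1]])
print(holdout_sizes)              # rows held out in each fold of the first repeat
print(n - range(holdout_sizes))   # corresponding training-set sizes

n_resamples <- n_folds * repeats
print(n_resamples)                # models fitted per candidate value of k
```

Each accuracy in the table above is therefore an average over 110 held-out folds, which is why neighbouring values of k differ only in the second or third decimal place.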
Model Results
knn_pred <- predict(knn.fit, newdata = knn_test)
# options('max.print' = 100)
# getOption("max.print")
confusionMatrix(knn_pred, knn_test$Loan_Status)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 15 3
## 1 23 81
##
## Accuracy : 0.7869
## 95% CI : (0.7035, 0.8558)
## No Information Rate : 0.6885
## P-Value [Acc > NIR] : 0.0104406
##
## Kappa : 0.4195
##
## Mcnemar's Test P-Value : 0.0001944
##
## Sensitivity : 0.3947
## Specificity : 0.9643
## Pos Pred Value : 0.8333
## Neg Pred Value : 0.7788
## Prevalence : 0.3115
## Detection Rate : 0.1230
## Detection Prevalence : 0.1475
## Balanced Accuracy : 0.6795
##
## 'Positive' Class : 0
##
Accuracy is 78.69% while balanced accuracy is only 67.95%.
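The gap between the two figures comes from how they weight the classes. As a sketch, the statistics can be recomputed in base R from the four counts in the KNN confusion matrix above, keeping class 0 as the positive class:

```r
# Counts copied from the KNN confusion matrix above
# (rows = prediction, columns = reference)
cm <- matrix(c(15, 23, 3, 81), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["0", "0"] / sum(cm[, "0"])  # share of actual 0s predicted as 0
specificity <- cm["1", "1"] / sum(cm[, "1"])  # share of actual 1s predicted as 1
balanced_accuracy <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, balanced_accuracy = balanced_accuracy), 4))
```

Accuracy is dominated by the larger class 1, while balanced accuracy gives the poorly recovered class 0 equal weight, so it comes out lower.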
4. Decision Trees
Now we will use a decision tree to see how well it will perform on our data.
* Our decision tree starts by splitting users based on their Credit_History. This makes sense based on our exploratory data analysis.
* Other variables used in the decision tree include LoanAmount, Property_Area, etc.
loan_tree = tree(Loan_Status ~., train)
plot(loan_tree)
text(loan_tree)
title(main = "Unpruned Decision Tree")
Decision Tree Performance
Training Data
Now we will use our model to see how it performs on the training data. We see that the model predicted Loan_Status with an accuracy of ~83%. 81 instances were incorrectly classified.
pred_tree_train <- predict(loan_tree,train,type="class")
test_table <- table(pred_tree_train,train$Loan_Status) %>% kbl() %>% kable_styling()
test_table
| | 0 | 1 |
|---|---|---|
| 0 | 89 | 11 |
| 1 | 70 | 321 |
## [1] 0.8350305
Cross-validation for better performance
The first version of our model was a full, unpruned tree. Now we are going to prune it back to get the optimal tree using cross validation. We have plotted the number of misclassifications for the different tree sizes. As we can see, the trees with size 2-4 have the fewest misclassifications (97 at size 2 and 101 at size 4). We will choose size 4, whose misclassification count is close to the minimum.
## $size
## [1] 15 10 4 2 1
##
## $dev
## [1] 108 108 101 97 159
##
## $k
## [1] -Inf 0.000000 1.333333 2.500000 65.000000
##
## $method
## [1] "misclass"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
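The cross-validation output above can also be read programmatically. The sketch below copies the printed `$size` and `$dev` values into plain vectors. With the tree package, output of this shape typically comes from `cv.tree(loan_tree, FUN = prune.misclass)`, but that call is not shown in the report, so it is an assumption here.

```r
# Values copied from the $size and $dev components printed above
cv_size <- c(15, 10, 4, 2, 1)
cv_dev  <- c(108, 108, 101, 97, 159)   # cross-validated misclassifications

# Candidate tree sizes ordered from fewest to most misclassifications
cv_summary <- data.frame(size = cv_size, misclassified = cv_dev)
print(cv_summary[order(cv_summary$misclassified), ])

# Size with the fewest cross-validated misclassifications
best_size <- cv_size[which.min(cv_dev)]
print(best_size)
```

Sizes 2 and 4 are only four misclassifications apart, while the single-node tree (size 1) is clearly worse.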
Using a size = 4, our decision tree looks like the following:
Testing Data
Now let’s see how our pruned tree performs on our testing data. The accuracy for our test data was ~83%, which was almost the same as our training data. 21 of the 123 test observations were misclassified.
pred_tree_test <- predict(loan_tree_pruned,test, type="class")
test_table <- table(pred_tree_test,test$Loan_Status) %>% kbl() %>% kable_styling()
test_table
| | 0 | 1 |
|---|---|---|
| 0 | 14 | 2 |
| 1 | 19 | 88 |
## [1] 0.8292683
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 14 2
## 1 19 88
##
## Accuracy : 0.8293
## 95% CI : (0.7509, 0.8911)
## No Information Rate : 0.7317
## P-Value [Acc > NIR] : 0.0075806
##
## Kappa : 0.4804
##
## Mcnemar's Test P-Value : 0.0004803
##
## Sensitivity : 0.4242
## Specificity : 0.9778
## Pos Pred Value : 0.8750
## Neg Pred Value : 0.8224
## Prevalence : 0.2683
## Detection Rate : 0.1138
## Detection Prevalence : 0.1301
## Balanced Accuracy : 0.7010
##
## 'Positive' Class : 0
##
5. Random Forests
Now we will develop a random forest model to see how well it performs with our data. Parameters for a random forest include:
- mtry: Number of variables randomly sampled as candidates at each split. Note that the default values are different for classification (sqrt(p), where p is the number of variables in x) and regression (p/3).
- ntree: Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times.
Our initial model uses mtry = sqrt(ncol(train_rf1)) = sqrt(12) ≈ 3.46, because ncol counts all 12 columns including the response, and the default ntree = 500.
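As a quick check on the mtry arithmetic, here is a base R sketch. It assumes p = 11 predictor columns, the count reported in the model summaries of this report; the model output later in this section reports mtry held constant at 3.464102, which is sqrt(12) rather than a package default.

```r
p <- 11  # predictor columns reported in the model summaries ("11 predictor")

# randomForest's default for classification: floor(sqrt(p)), at least 1
default_mtry_classification <- max(floor(sqrt(p)), 1)

# Default for regression, shown for comparison: floor(p / 3), at least 1
default_mtry_regression <- max(floor(p / 3), 1)

# Value computed by sqrt(ncol(train_rf1)): all 12 columns, response included
report_mtry <- sqrt(p + 1)

print(c(classification = default_mtry_classification,
        regression     = default_mtry_regression,
        report         = round(report_mtry, 6)))
```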
# find out no of cores
no_cores <- detectCores() - 1
cl<-makePSOCKcluster(no_cores)
registerDoParallel(cl)
# start.time<-proc.time()
# model<-train(target~., data=trainingset, method='rf')#drop loan id
train_rf1 <- train
test_rf1 <- test
# Create model with default parameters
control <- trainControl(method="repeatedcv", number=10, repeats=3)
mtry <- sqrt(ncol(train_rf1))
tunegrid <- expand.grid(.mtry=mtry)
rf_default <- train(Loan_Status~., data=train_rf1, method="rf", metric="Accuracy", tuneGrid=tunegrid, trControl=control)
print(rf_default)
## Random Forest
##
## 491 samples
## 11 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 443, 442, 441, 442, 442, 442, ...
## Resampling results:
##
## Accuracy Kappa
## 0.8011678 0.4879166
##
## Tuning parameter 'mtry' was held constant at a value of 3.464102
# stop.time<-proc.time()
#
# run.time<-stop.time -start.time
#
# print(run.time)
#
# stopCluster(cl)
Our initial model has accuracy of about 80%. Let’s see if we can improve accuracy by finding an optimal mtry value. We will test mtry values 1-10 using a grid search. In the results below, accuracy peaks at mtry = 3 (0.8031), with mtry = 2 close behind (0.8003); the steps that follow use mtry = 2.
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")
set.seed(123)
tunegrid <- expand.grid(.mtry=c(1:10))
rf_gridsearch <- train(Loan_Status~., data=train_rf1, method="rf", metric="Accuracy", tuneGrid=tunegrid, trControl=control)
print(rf_gridsearch)
## Random Forest
##
## 491 samples
## 11 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 442, 442, 441, 442, 442, 441, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 1 0.6795697 0.0138783
## 2 0.8003464 0.4745384
## 3 0.8030964 0.4913187
## 4 0.7983611 0.4853485
## 5 0.7895170 0.4665550
## 6 0.7874632 0.4625780
## 7 0.7881434 0.4672603
## 8 0.7847562 0.4591980
## 9 0.7846871 0.4605619
## 10 0.7820074 0.4550405
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 3.
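Next let’s find the optimal value for ntree. Again we’ll search over a grid, this time looping over different ntree values. In the results below, mean accuracy is nearly identical across all five values (0.8011 to 0.8025), with ntree = 500 marginally highest; we proceed with ntree = 1500.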
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")
tunegrid <- expand.grid(.mtry=2)
modellist <- list()
for (ntree in c(500, 1000, 1500, 2000, 2500)) {
set.seed(124)
fit <- train(Loan_Status~., data=train_rf1, method="rf", metric="Accuracy", tuneGrid=tunegrid, trControl=control, ntree=ntree)
key <- toString(ntree)
modellist[[key]] <- fit
}
# compare results
results <- resamples(modellist)
summary(results)
##
## Call:
## summary.resamples(object = results)
##
## Models: 500, 1000, 1500, 2000, 2500
## Number of resamples: 30
##
## Accuracy
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 500 0.7346939 0.7755102 0.7959184 0.8025006 0.8190816 0.8958333 0
## 1000 0.7346939 0.7755102 0.7959184 0.8011400 0.8163265 0.8958333 0
## 1500 0.7346939 0.7755102 0.7959184 0.8018203 0.8163265 0.8979592 0
## 2000 0.7346939 0.7755102 0.7959184 0.8011400 0.8163265 0.8958333 0
## 2500 0.7346939 0.7755102 0.7959184 0.8018203 0.8163265 0.8979592 0
##
## Kappa
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 500 0.2669735 0.4038375 0.4688651 0.4828621 0.545829 0.7435897 0
## 1000 0.2669735 0.4035402 0.4688651 0.4794511 0.541709 0.7435897 0
## 1500 0.2669735 0.4035402 0.4688651 0.4813131 0.541709 0.7476828 0
## 2000 0.2669735 0.4035402 0.4688651 0.4794511 0.541709 0.7435897 0
## 2500 0.2669735 0.4035402 0.4688651 0.4813131 0.541709 0.7476828 0
Our final random forest model will have mtry=2 and ntree=1500.
rf_final <- randomForest(Loan_Status ~ .,
data = train_rf1,
ntree = 1500,
mtry = 2,
importance = TRUE,
proximity = TRUE)
print(rf_final)
##
## Call:
## randomForest(formula = Loan_Status ~ ., data = train_rf1, ntree = 1500, mtry = 2, importance = TRUE, proximity = TRUE)
## Type of random forest: classification
## Number of trees: 1500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 20.16%
## Confusion matrix:
## 0 1 class.error
## 0 73 86 0.54088050
## 1 13 319 0.03915663
## 0 1 MeanDecreaseAccuracy MeanDecreaseGini
## Gender -3.23 7.74 4.91 3.08
## Married -3.14 7.90 5.27 3.78
## Dependents -4.43 10.80 6.82 8.80
## Education 0.92 2.94 3.06 3.91
## Self_Employed -4.98 8.16 4.25 3.06
## LoanAmount -3.72 24.77 20.67 25.34
## Loan_Amount_Term 7.10 9.66 11.98 8.48
## Credit_History 93.68 95.74 104.16 50.51
## Property_Area 1.28 3.23 3.38 8.43
## LoanAmtPerSalary 3.34 17.43 16.68 27.54
## TotalIncome -4.38 23.64 19.72 27.19
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 14 1
## 1 19 89
##
## Accuracy : 0.8374
## 95% CI : (0.7601, 0.8978)
## No Information Rate : 0.7317
## P-Value [Acc > NIR] : 0.0039800
##
## Kappa : 0.4994
##
## Mcnemar's Test P-Value : 0.0001439
##
## Sensitivity : 0.4242
## Specificity : 0.9889
## Pos Pred Value : 0.9333
## Neg Pred Value : 0.8241
## Prevalence : 0.2683
## Detection Rate : 0.1138
## Detection Prevalence : 0.1220
## Balanced Accuracy : 0.7066
##
## 'Positive' Class : 0
##
# stop.time<-proc.time()
# run.time<-stop.time -start.time
# print(run.time)
# Stopping Cluster
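stopCluster(cl)
Accuracy of our final random forest model is about 84% on the test data with 20 instances misclassified. With class 0 (not approved) as the positive class, 19 are false negatives (denied loans predicted as approved) and 1 is a false positive (an approved loan predicted as denied). Credit_History is the most important feature.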
6. Model Performance
| Metric | LDA | K-Nearest Neighbor (KNN) | Decision Trees | Random Forest |
|---|---|---|---|---|
| Accuracy | 0.7317 | 0.7869 | 0.8293 | 0.8374 |
| Balanced Accuracy | 0.5000 | 0.6795 | 0.7010 | 0.7066 |
| Sensitivity | 0 | 0.3947 | 0.4242 | 0.4242 |
Notice that the sensitivity of Decision Trees and Random Forest is the same at 42.42%. Decision Trees and Random Forest ended up having the highest accuracy, while LDA had the lowest. Random Forest also has the highest balanced accuracy, which is usually the go-to metric in an unbalanced dataset where the binary response is not split 50/50. The model we picked is Random Forest.
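The comparison in the table can be reproduced with a few lines of base R. The sketch below copies the table's values into a data frame (the name `perf` is introduced here for illustration) and ranks the models by balanced accuracy:

```r
# Values copied from the model performance table above
perf <- data.frame(
  model             = c("LDA", "KNN", "Decision Trees", "Random Forest"),
  accuracy          = c(0.7317, 0.7869, 0.8293, 0.8374),
  balanced_accuracy = c(0.5000, 0.6795, 0.7010, 0.7066),
  sensitivity       = c(0, 0.3947, 0.4242, 0.4242),
  stringsAsFactors  = FALSE
)

# Rank models from best to worst balanced accuracy
print(perf[order(-perf$balanced_accuracy), ])

best_model <- perf$model[which.max(perf$balanced_accuracy)]
print(best_model)
```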