TMRB
2024-02-18
West Chester University
| Variables | Descriptions |
|---|---|
| Default \((y)\) | 1 = bank loan default, 0 = no default |
| Checking_amount \((x_1)\) | Numeric |
| Term \((x_2)\) | In months (numeric) |
| Credit_score \((x_3)\) | Numeric |
| Gender \((x_4)\) | Categorical |
| Marital_status \((x_5)\) | Categorical |
| Car_loan \((x_6)\) | 1 = has a car loan, 0 = no car loan (numeric) |
| Personal_loan \((x_7)\) | 1 = has a personal loan, 0 = no personal loan (numeric) |
| Home_loan \((x_8)\) | 1 = has a home loan, 0 = no home loan (numeric) |
| Education_loan \((x_9)\) | 1 = has an education loan, 0 = no education loan (numeric) |
| Emp_status \((x_{10})\) | Categorical |
| Amount \((x_{11})\) | Numeric |
| Saving_amount \((x_{12})\) | Numeric |
| Emp_duration \((x_{13})\) | In months (numeric) |
| Age \((x_{14})\) | In years (numeric) |
| No_of_credit_acc \((x_{15})\) | Numeric |
The primary question for this analysis is how well the explored predictive models perform.
'data.frame': 1000 obs. of 16 variables:
$ Default : int 0 0 0 1 1 0 0 0 0 1 ...
$ Checking_amount : int 988 458 158 300 63 1071 -192 172 585 189 ...
$ Term : int 15 15 14 25 24 20 13 16 20 19 ...
$ Credit_score : int 796 813 756 737 662 828 856 763 778 649 ...
$ Gender : chr "Female" "Female" "Female" "Female" ...
$ Marital_status : chr "Single" "Single" "Single" "Single" ...
$ Car_loan : int 1 1 0 0 0 1 1 1 1 1 ...
$ Personal_loan : int 0 0 1 0 0 0 0 0 0 0 ...
$ Home_loan : int 0 0 0 0 0 0 0 0 0 0 ...
$ Education_loan : int 0 0 0 1 1 0 0 0 0 0 ...
$ Emp_status : chr "employed" "employed" "employed" "employed" ...
$ Amount : int 1536 947 1678 1804 1184 475 626 1224 1162 786 ...
$ Saving_amount : int 3455 3600 3093 2449 2867 3282 3398 3022 3475 2711 ...
$ Emp_duration : int 12 25 43 0 4 12 11 12 12 0 ...
$ Age : int 38 36 34 29 30 32 38 36 36 29 ...
$ No_of_credit_acc: int 1 1 1 1 1 2 1 1 1 1 ...
| Variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Default | 0.0 | 0.0 | 0.0 | 0.3 | 1.0 | 1.0 |
| Checking_amount | -665.0 | 164.8 | 351.5 | 362.4 | 553.5 | 1319.0 |
| Term | 9.00 | 16.00 | 18.00 | 17.82 | 20.00 | 27.00 |
| Credit_score | 376.0 | 725.8 | 770.5 | 760.5 | 812.0 | 1029.0 |
| Gender | character (length 1000) | | | | | |
| Marital_status | character (length 1000) | | | | | |
| Car_loan | 0.000 | 0.000 | 0.000 | 0.353 | 1.000 | 1.000 |
| Personal_loan | 0.000 | 0.000 | 0.000 | 0.474 | 1.000 | 1.000 |
| Home_loan | 0.000 | 0.000 | 0.000 | 0.056 | 0.000 | 1.000 |
| Education_loan | 0.000 | 0.000 | 0.000 | 0.112 | 0.000 | 1.000 |
| Emp_status | character (length 1000) | | | | | |
| Amount | 244 | 1016 | 1226 | 1219 | 1420 | 2362 |
| Saving_amount | 2082 | 2951 | 3203 | 3179 | 3402 | 4108 |
| Emp_duration | 0.00 | 15.00 | 41.00 | 49.39 | 85.00 | 120.00 |
| Age | 18.00 | 29.00 | 32.00 | 31.21 | 34.00 | 42.00 |
| No_of_credit_acc | 1.000 | 1.000 | 2.000 | 2.546 | 3.000 | 9.000 |
| No_of_credit_acc | Freq |
|---|---|
| 1 | 308 |
| 2 | 325 |
| 3 | 119 |
| 4 | 105 |
| 5 | 109 |
| 6 | 6 |
| 7 | 8 |
| 8 | 6 |
| 9 | 14 |
| Default | Proportion |
|---|---|
| 0 | 0.7 |
| 1 | 0.3 |
| | Checking_amount | Term | Credit_score | Amount | Saving_amount | Emp_duration | Age |
|---|---|---|---|---|---|---|---|
| Checking_amount | 1.0000000 | -0.1916292 | 0.1892957 | -0.1153301 | 0.2013942 | 0.0698080 | 0.2974109 |
| Term | -0.1916292 | 1.0000000 | -0.1954363 | 0.0540702 | -0.1868427 | -0.0637356 | -0.2443853 |
| Credit_score | 0.1892957 | -0.1954363 | 1.0000000 | -0.0783984 | 0.2138242 | 0.0676228 | 0.3280754 |
| Amount | -0.1153301 | 0.0540702 | -0.0783984 | 1.0000000 | -0.0097196 | 0.0179394 | -0.1077698 |
| Saving_amount | 0.2013942 | -0.1868427 | 0.2138242 | -0.0097196 | 1.0000000 | 0.0909485 | 0.3430830 |
| Emp_duration | 0.0698080 | -0.0637356 | 0.0676228 | 0.0179394 | 0.0909485 | 1.0000000 | 0.0798093 |
| Age | 0.2974109 | -0.2443853 | 0.3280754 | -0.1077698 | 0.3430830 | 0.0798093 | 1.0000000 |
Lastly, the categorical variables (Default, Gender, Marital_status, the four loan indicators, Emp_status, and No_of_credit_acc) were converted to factors and the data set was re-examined. The structure after conversion is printed below.
'data.frame': 1000 obs. of 16 variables:
$ Default : Factor w/ 2 levels "0","1": 1 1 1 2 2 1 1 1 1 2 ...
$ Checking_amount : int 988 458 158 300 63 1071 -192 172 585 189 ...
$ Term : int 15 15 14 25 24 20 13 16 20 19 ...
$ Credit_score : int 796 813 756 737 662 828 856 763 778 649 ...
$ Gender : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 2 2 1 1 2 ...
$ Marital_status : Factor w/ 2 levels "Married","Single": 2 2 2 2 2 1 2 2 2 1 ...
$ Car_loan : Factor w/ 2 levels "0","1": 2 2 1 1 1 2 2 2 2 2 ...
$ Personal_loan : Factor w/ 2 levels "0","1": 1 1 2 1 1 1 1 1 1 1 ...
$ Home_loan : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Education_loan : Factor w/ 2 levels "0","1": 1 1 1 2 2 1 1 1 1 1 ...
$ Emp_status : Factor w/ 2 levels "employed","unemployed": 1 1 1 1 2 1 1 1 2 1 ...
$ Amount : int 1536 947 1678 1804 1184 475 626 1224 1162 786 ...
$ Saving_amount : int 3455 3600 3093 2449 2867 3282 3398 3022 3475 2711 ...
$ Emp_duration : int 12 25 43 0 4 12 11 12 12 0 ...
$ Age : int 38 36 34 29 30 32 38 36 36 29 ...
$ No_of_credit_acc: Factor w/ 9 levels "1","2","3","4",..: 1 1 1 1 1 2 1 1 1 1 ...
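The conversion summarized in the printout above can be sketched in base R as follows. The toy data frame `loans` and the vector `factor_cols` are illustrative assumptions, not the author's code; only the column names are taken from the printout.

```r
# Toy stand-in for the loan data (hypothetical values; only a few columns shown)
loans <- data.frame(
  Default          = c(0L, 1L, 0L, 1L),
  Gender           = c("Female", "Male", "Female", "Male"),
  Car_loan         = c(1L, 0L, 0L, 1L),
  No_of_credit_acc = c(1L, 2L, 1L, 9L),
  Age              = c(38L, 29L, 34L, 30L),
  stringsAsFactors = FALSE
)

# Columns to treat as categorical (assumed list, mirroring the printout)
factor_cols <- c("Default", "Gender", "Car_loan", "No_of_credit_acc")

# Convert each listed column to a factor; the other columns are left unchanged
loans[factor_cols] <- lapply(loans[factor_cols], factor)

str(loans)
```

A factor built from an integer column keeps the original values as level labels, so `as.integer()` on such a factor returns the level codes (1, 2, ...) rather than the original 0/1 values. This is why Default prints as `1 1 1 2 2 ...` after conversion.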
| | GVIF | Df | GVIF^(1/(2*Df)) |
|---|---|---|---|
| Checking_amount | 1.185956 | 1 | 1.089016 |
| Term | 1.045746 | 1 | 1.022617 |
| Credit_score | 1.099802 | 1 | 1.048714 |
| Gender | 2.573097 | 1 | 1.604088 |
| Marital_status | 2.721962 | 1 | 1.649837 |
| Car_loan | 83.354431 | 1 | 9.129865 |
| Personal_loan | 81.896071 | 1 | 9.049645 |
| Home_loan | 14.354499 | 1 | 3.788733 |
| Education_loan | 32.253932 | 1 | 5.679254 |
| Emp_status | 1.143766 | 1 | 1.069470 |
| Amount | 1.076582 | 1 | 1.037585 |
| Saving_amount | 1.228646 | 1 | 1.108443 |
| Emp_duration | 1.166378 | 1 | 1.079990 |
| Age | 1.242704 | 1 | 1.114766 |
| No_of_credit_acc | 1.338116 | 8 | 1.018371 |
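The last column of this table follows from the first two: GVIF^(1/(2*Df)) reduces to the square root of GVIF when Df = 1, and for a multi-level term such as No_of_credit_acc (Df = 8) it puts the value on a scale comparable to the single-degree-of-freedom terms. A minimal check on three rows copied from the table (the data frame name `gvif` is an assumption made for illustration):

```r
# Three rows copied from the GVIF table above
gvif <- data.frame(
  term = c("Car_loan", "Home_loan", "No_of_credit_acc"),
  GVIF = c(83.354431, 14.354499, 1.338116),
  Df   = c(1, 1, 8)
)

# Adjusted value: GVIF raised to the power 1 / (2 * Df)
gvif$adjusted <- gvif$GVIF^(1 / (2 * gvif$Df))

print(gvif)
```

The recomputed values agree with the table's last column to the printed precision.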
| | VIF |
|---|---|
| Checking_amount | 1.115226 |
| Term | 1.015942 |
| Credit_score | 1.047873 |
| Saving_amount | 1.120649 |
| Age | 1.146111 |
| | VIF |
|---|---|
| Checking_amount | 1.177743 |
| Term | 1.020826 |
| Credit_score | 1.072932 |
| Personal_loan | 1.187600 |
| Home_loan | 1.176122 |
| Education_loan | 1.200096 |
| Emp_status | 1.031324 |
| Amount | 1.049029 |
| Saving_amount | 1.204225 |
| Age | 1.169251 |
The following table gives the number of observations in each of the five non-overlapping folds used in the k-fold cross-validation procedure.
| folds | Freq |
|---|---|
| 1 | 195 |
| 2 | 186 |
| 3 | 223 |
| 4 | 199 |
| 5 | 197 |
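One way fold labels like these could be produced is sketched below. The unequal fold sizes in the table are consistent with each observation receiving a label independently at random, but that is an assumption. The seed and the object names `n`, `k`, `folds`, and `folds_balanced` are illustrative, and the sketch is not expected to reproduce the counts above.

```r
set.seed(123)          # arbitrary seed, for repeatability only
n <- 1000              # number of observations in the data set
k <- 5                 # number of folds

# Independent random labels: fold sizes vary around n / k
folds <- sample(seq_len(k), size = n, replace = TRUE)
table(folds)

# Alternative: shuffle a balanced label vector so every fold has exactly n / k rows
folds_balanced <- sample(rep_len(seq_len(k), n))
table(folds_balanced)
```

Either scheme yields non-overlapping partitions. The balanced version keeps every held-out set the same size, which makes a simple average of the per-fold errors equal to the overall misclassification rate.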
| PE1 | PE2 | PE3 |
|---|---|---|
| 0.0741 | 0.0713 | 0.0631 |
| ACC1 | ACC2 | ACC3 |
|---|---|---|
| 0.9259 | 0.9287 | 0.9369 |
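Each accuracy value is one minus the corresponding predictive error (for example, 1 - 0.0741 = 0.9259). The sketch below shows how a k-fold estimate of predictive error might be computed for a logistic regression in base R. The simulated data, the 0.5 classification cutoff, and all object names are assumptions made for illustration. The report does not state the cutoff or show the code that was used.

```r
set.seed(42)
n <- 500
k <- 5

# Simulated stand-in data: one predictor and a binary outcome
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 2 * x))
dat <- data.frame(y = y, x = x)

# Balanced random fold labels
folds <- sample(rep_len(seq_len(k), n))

fold_error <- numeric(k)
for (i in seq_len(k)) {
  train <- dat[folds != i, ]               # fit on the other k - 1 folds
  test  <- dat[folds == i, ]               # evaluate on the held-out fold

  fit  <- glm(y ~ x, data = train, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")
  pred <- as.integer(prob > 0.5)           # assumed cutoff

  fold_error[i] <- mean(pred != test$y)    # misclassification rate on this fold
}

PE  <- mean(fold_error)   # predictive error, averaged over folds
ACC <- 1 - PE             # accuracy
c(PE = PE, ACC = ACC)
```

To compare several candidate models fairly, the same `folds` vector would be reused for each model so that every model is scored on identical held-out sets.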
The models were not used for inference, so no model summaries or coefficient interpretations are reported.
Note that both the automatically selected model and the 2nd model could have differed across iterations of k-fold cross-validation, because each depends on selection criteria applied to a fitted model. That is, had the full model been refitted on the training data of each iteration, the resulting reduced model might have been entirely different from model 2. The same can be said for model 3.
Predictive error differed little among the three models (0.0741, 0.0713, and 0.0631). However, if one were to choose a model for prediction on the basis of lowest predictive error, one might choose the third model. Further research might explore the cause of the high GVIF values in model 1, most notably those for Car_loan (83.35) and Personal_loan (81.90).
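One candidate explanation, offered here as an untested hypothesis rather than a finding of this analysis, is that the four loan indicators are close to mutually exclusive and exhaustive: their sample means (0.353, 0.474, 0.056, 0.112) sum to 0.995, so together they nearly reproduce the intercept. The sketch below builds hypothetical indicator columns with those group sizes, assuming each applicant holds at most one loan type, and computes ordinary variance inflation factors as 1 / (1 - R^2) from linear regressions among the indicators. This is a simplification of the GVIF computed for the fitted logistic model, and the helper `vif_lm` is illustrative.

```r
# Hypothetical loan types with group sizes matching the reported means (n = 1000);
# the 5 remaining applicants are assumed to hold none of the four loans
type <- rep(c("car", "personal", "home", "education", "none"),
            times = c(353, 474, 56, 112, 5))

X <- data.frame(
  Car_loan       = as.integer(type == "car"),
  Personal_loan  = as.integer(type == "personal"),
  Home_loan      = as.integer(type == "home"),
  Education_loan = as.integer(type == "education")
)

# VIF of each column: regress it on the others and take 1 / (1 - R^2)
vif_lm <- function(X) {
  sapply(names(X), function(j) {
    f  <- reformulate(setdiff(names(X), j), response = j)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  })
}

vif_all     <- vif_lm(X)                             # all four indicators
vif_reduced <- vif_lm(X[, names(X) != "Car_loan"])   # Car_loan dropped

round(vif_all, 2)
round(vif_reduced, 2)
```

Under these assumptions the inflation factors are large when all four indicators are present and fall close to 1 once Car_loan is dropped. Whether the real data behave this way would need to be checked directly, for example by tabulating the row sums of the four loan columns.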