1000 clients asking for a mortgage
Original data set:
- 20 attributes
- 7 numerical, 13 categorical
Modified data set:
- 24 attributes
- all numerical
Cost matrix:
| Actual \ Predicted | NO | YES |
|---|---|---|
| NO | 0 | 5 |
| YES | 1 | 0 |
Luigi Ruberto
credit.l <- lda(V25 ~ ., prior = c(1, 1)/2, data = train)
Error matrix:
| Actual \ Predicted | 0 | 1 |
|---|---|---|
| 0 | 39 | 18 |
| 1 | 33 | 110 |
Probability matrix:
| Actual \ Predicted | 0 | 1 |
|---|---|---|
| 0 | 0.195 | 0.09 |
| 1 | 0.165 | 0.55 |
Total cost:
[1] 123
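The probability matrix and the total cost follow mechanically from the error matrix and the cost matrix. A minimal base-R sketch, assuming rows are actual classes and columns are predicted classes (the row sums of 57 and 143 are identical for every method, which is consistent with that reading). The names `cost`, `err`, `prob`, and `total_cost` are illustrative, not the author's:

```r
# Cost matrix from the slides: rows = actual class, columns = predicted class
# (assumed orientation). Predicting 1 for an actual 0 costs 5; the reverse costs 1.
cost <- matrix(c(0, 5,
                 1, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# LDA error matrix reported above (the counts sum to 200 clients).
err <- matrix(c(39,  18,
                33, 110),
              nrow = 2, byrow = TRUE,
              dimnames = dimnames(cost))

# Probability matrix: each count divided by the total number of cases.
prob <- err / sum(err)

# Total cost: element-wise product of counts and costs, summed.
total_cost <- function(err, cost) sum(err * cost)

print(prob)
print(total_cost(err, cost))  # 18 * 5 + 33 * 1 = 123
```

The same two lines reproduce the probability matrix and total cost of every other method when `err` is swapped for that method's error matrix.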
credit.q <- qda(V25 ~ ., prior = c(1, 1)/2, data = train)
Error matrix:
| Actual \ Predicted | 0 | 1 |
|---|---|---|
| 0 | 35 | 22 |
| 1 | 42 | 101 |
Probability matrix:
| Actual \ Predicted | 0 | 1 |
|---|---|---|
| 0 | 0.175 | 0.110 |
| 1 | 0.210 | 0.505 |
Total cost:
[1] 152
credit.g <- glm(V25 ~ ., family = binomial, data = train)
Probability threshold = 0.5
Error matrix:
| Actual \ Predicted | FALSE | TRUE |
|---|---|---|
| 0 | 24 | 33 |
| 1 | 14 | 129 |
Probability matrix:
| Actual \ Predicted | FALSE | TRUE |
|---|---|---|
| 0 | 0.12 | 0.165 |
| 1 | 0.07 | 0.645 |
Total cost:
[1] 179
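The FALSE/TRUE column labels suggest that predictions were formed by comparing fitted probabilities with the threshold. A minimal sketch of that pattern on synthetic data: the data frame `d`, its columns, and the 150/50 split are invented for illustration, and only the `glm(..., family = binomial)` call mirrors the slide.

```r
set.seed(1)

# Synthetic stand-in: two numeric predictors and a 0/1 response.
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(1 + d$x1 - 0.5 * d$x2))

train <- d[1:150, ]
test  <- d[151:200, ]

fit <- glm(y ~ ., family = binomial, data = train)

# Fitted probabilities of class 1 on the held-out rows.
p <- predict(fit, newdata = test, type = "response")

# Comparing with the threshold gives a logical vector, hence FALSE/TRUE columns.
threshold <- 0.5
err <- table(actual = test$y, predicted = p > threshold)
print(err)
print(err / sum(err))  # probability matrix
```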
Probability threshold = 0.8
Error matrix:
| Actual \ Predicted | FALSE | TRUE |
|---|---|---|
| 0 | 46 | 11 |
| 1 | 52 | 91 |
Probability matrix:
| Actual \ Predicted | FALSE | TRUE |
|---|---|---|
| 0 | 0.23 | 0.055 |
| 1 | 0.26 | 0.455 |
Total cost:
[1] 107
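Raising the threshold from 0.5 to 0.8 cuts the total cost from 179 to 107. One way to see why a high threshold helps, assuming the fitted probability p estimates P(class 1) and the cost matrix above applies: predicting 1 has expected cost 5(1 - p), predicting 0 has expected cost p, so predicting 1 is cheaper only when p > 5/6. The slides do not say how 0.8 was chosen; the names below are illustrative.

```r
# Misclassification costs from the cost matrix (assumed orientation:
# rows = actual, columns = predicted).
cost_false_pos <- 5  # actual 0, predicted 1
cost_false_neg <- 1  # actual 1, predicted 0

# Predict 1 only when its expected cost is lower:
#   cost_false_pos * (1 - p) < cost_false_neg * p
# which rearranges to p > cost_false_pos / (cost_false_pos + cost_false_neg).
threshold <- cost_false_pos / (cost_false_pos + cost_false_neg)
print(threshold)  # 5/6, about 0.833

# Total cost for an error matrix given row-wise as c(n00, n01, n10, n11).
total_cost <- function(n) n[2] * cost_false_pos + n[3] * cost_false_neg

cost_thr_05 <- total_cost(c(24, 33, 14, 129))  # 179
cost_thr_08 <- total_cost(c(46, 11, 52, 91))   # 107
print(c(cost_thr_05, cost_thr_08))
```

Under these assumptions the 0.8 threshold sits close to the cost-minimizing value of 5/6, which is consistent with its lower total cost.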
credit.r <- rpart(formula = V25 ~ ., data = train, method = "class", cp = 0.001)
Error matrix:
| Actual \ Predicted | FALSE | TRUE |
|---|---|---|
| 0 | 29 | 28 |
| 1 | 36 | 107 |
Probability matrix:
| Actual \ Predicted | FALSE | TRUE |
|---|---|---|
| 0 | 0.145 | 0.140 |
| 1 | 0.180 | 0.535 |
Total cost:
[1] 176
Cross-validation to find K that minimizes the cost
[1] "K = " "2"
credit.K <- knn(train[, -25], test[, -25], cl, k = K)
Error matrix:
| Actual \ Predicted | 0 | 1 |
|---|---|---|
| 0 | 26 | 31 |
| 1 | 32 | 111 |
Probability matrix:
| Actual \ Predicted | 0 | 1 |
|---|---|---|
| 0 | 0.13 | 0.155 |
| 1 | 0.16 | 0.555 |
Total cost:
[1] 187
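The slides do not show how K was selected. A minimal sketch of one way to do it, assuming leave-one-out cross-validation via `class::knn.cv` and the 5:1 cost matrix. The synthetic data, the candidate range `1:15`, and the helper `cv_cost` are illustrative stand-ins, not the author's setup:

```r
library(class)  # provides knn() and knn.cv()

set.seed(1)

# Synthetic stand-in for the training data: 200 cases, 4 numeric predictors,
# with the predictors shifted upward for class 1.
n  <- 200
cl <- factor(rbinom(n, 1, 0.7), levels = c(0, 1))
x  <- matrix(rnorm(n * 4), ncol = 4) + as.integer(cl == "1") * 0.8

# Predicting 1 for an actual 0 costs 5; the reverse costs 1.
cost <- matrix(c(0, 5, 1, 0), nrow = 2, byrow = TRUE)

# Leave-one-out cross-validated total cost for a given k.
cv_cost <- function(k) {
  pred <- knn.cv(x, cl, k = k)
  # Both arguments are factors with levels 0 and 1, so the table is 2 x 2.
  err <- table(actual = cl, predicted = pred)
  sum(err * cost)
}

ks    <- 1:15
costs <- sapply(ks, cv_cost)
K     <- ks[which.min(costs)]
print(c("K = ", K))
```

Selecting K by cost rather than by error rate matters here because the two kinds of mistake are priced differently.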
Cross-validation to find C and gamma that minimize the cost
| C \ gamma | 2^-13 | 2^-11 | 2^-9 | 2^-7 | 2^-5 | 2^-3 |
|---|---|---|---|---|---|---|
| 2^5 | 94.6 | 70.6 | 68.1 | 65.9 | 64.9 | 80.7 |
| 2^7 | 69.5 | 68.2 | 65.2 | 70.6 | 68.5 | 80.7 |
| 2^9 | 68.4 | 66.0 | 67.2 | 68.0 | 72.6 | 80.7 |
| 2^11 | 65.5 | 65.2 | 71.1 | 69.8 | 72.6 | 80.7 |
| 2^13 | 65.1 | 68.6 | 68.6 | 73.2 | 72.6 | 80.7 |
| 2^15 | 65.8 | 71.9 | 69.2 | 72.4 | 72.6 | 80.7 |
[1] "C = " "32"
[1] "gamma = " "0.03125"
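The selected pair can be read off the grid: it is the cell with the smallest cross-validated value. A small base-R sketch that re-enters the table above and locates its minimum, assuming rows index C and columns index gamma (which matches the reported C = 32 = 2^5 and gamma = 0.03125 = 2^-5). The object names `C_grid`, `gamma_grid`, `cv`, and `best` are illustrative:

```r
C_grid     <- 2^seq(5, 15, by = 2)     # 2^5, 2^7, ..., 2^15
gamma_grid <- 2^seq(-13, -3, by = 2)   # 2^-13, 2^-11, ..., 2^-3

# Cross-validation results copied from the table (rows = C, columns = gamma).
cv <- matrix(c(94.6, 70.6, 68.1, 65.9, 64.9, 80.7,
               69.5, 68.2, 65.2, 70.6, 68.5, 80.7,
               68.4, 66.0, 67.2, 68.0, 72.6, 80.7,
               65.5, 65.2, 71.1, 69.8, 72.6, 80.7,
               65.1, 68.6, 68.6, 73.2, 72.6, 80.7,
               65.8, 71.9, 69.2, 72.4, 72.6, 80.7),
             nrow = 6, byrow = TRUE)

# Row and column index of the smallest entry.
best   <- which(cv == min(cv), arr.ind = TRUE)
C.     <- C_grid[best[1, "row"]]
gamma. <- gamma_grid[best[1, "col"]]

print(c("C = ", C.))          # "C = " "32"
print(c("gamma = ", gamma.))  # "gamma = " "0.03125"
```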
credit.S <- svm(formula = V25 ~ ., data = train, type = "C-classification",
                C = C., gamma = gamma.)
Error matrix:
| Actual \ Predicted | 0 | 1 |
|---|---|---|
| 0 | 23 | 34 |
| 1 | 12 | 131 |
Probability matrix:
| Actual \ Predicted | 0 | 1 |
|---|---|---|
| 0 | 0.115 | 0.170 |
| 1 | 0.060 | 0.655 |
Total cost:
[1] 182
Best method (lowest total cost: logistic regression at threshold 0.8, cost 107):
[1] "LR"
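The final choice is the method with the smallest total cost. A short sketch that collects the totals reported above; the labels other than "LR" are assumed abbreviations, and the LR entry uses the result at threshold 0.8:

```r
# Total costs reported in the sections above.
total_costs <- c(LDA = 123, QDA = 152, LR = 107, Tree = 176, KNN = 187, SVM = 182)

# Name of the method with the smallest total cost.
best <- names(which.min(total_costs))
print(best)  # "LR"
```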