Learn-By-Building

Use any of the three classification algorithms you’ve learned to predict the risk status of a bank loan. The variable default in the dataset indicates whether the applicant defaulted on the loan issued by the bank. Start by reading in the loan.csv dataset, which originally comes from Professor Dr. Hans Hofmann:

library(tm)
## Loading required package: NLP
library(mlr)
## Loading required package: ParamHelpers
library(e1071)
## 
## Attaching package: 'e1071'
## The following object is masked from 'package:mlr':
## 
##     impute
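# stringsAsFactors = TRUE keeps the categorical columns as factors (R >= 4.0 reads them as character by default)
loans <- read.csv("loan.csv", stringsAsFactors = TRUE)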

Use an R Markdown document to lay out your process, and explain the methodology in 1 or 2 brief paragraphs. The student should be awarded the full (3) points when:
- The preprocessing steps are done, and the student shows an understanding of holding out a test / cross-validation set to estimate the model’s performance on unseen data
- The model’s performance is sufficiently explained (accuracy may not be the most helpful metric here! Recall what you’ve learned about specificity and sensitivity)
- The student demonstrates extra effort in evaluating his/her model, and proposes ways to improve the accuracy obtained from the initial model
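
To see why accuracy alone can mislead on a dataset where about 70% of applicants do not default, here is a minimal sketch in base R. The labels are made up, and `classification_metrics` is a hypothetical helper written for this illustration, not a function from any package. It scores a baseline that always predicts "no":

```r
# Toy labels mirroring a 70/30 class split: 7 non-defaults, 3 defaults (made-up data)
actual <- factor(c(rep("no", 7), rep("yes", 3)), levels = c("no", "yes"))
# A baseline that always predicts the majority class
predicted <- factor(rep("no", 10), levels = c("no", "yes"))

# Hypothetical helper: rows of the table are predictions, columns are true labels
classification_metrics <- function(predicted, actual, positive = "yes") {
  cm <- table(predicted, actual)
  negative <- setdiff(levels(actual), positive)
  c(
    accuracy    = sum(diag(cm)) / sum(cm),
    sensitivity = cm[positive, positive] / sum(cm[, positive]),
    specificity = cm[negative, negative] / sum(cm[, negative])
  )
}

metrics <- classification_metrics(predicted, actual)
print(metrics)
```

On this toy data the baseline reaches an accuracy of 0.7 and a specificity of 1, yet its sensitivity is 0: it never flags a single defaulter, which is the case the bank cares about most.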

Processing Data

  1. Splitting the data into train and test sets
# Split the data into training (80%) and test (20%) sets.
split_80 <- sample(nrow(loans), nrow(loans)*0.80)
loans.train <- loans[split_80, ]
loans.test <- loans[-split_80, ]

# proportion of each default class
prop.table(table(loans$default)) 
## 
##  no yes 
## 0.7 0.3
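
Because the classes are imbalanced, a purely random split can leave the train and test sets with slightly different default rates. One option is a stratified split, sketched below in base R on made-up stand-in data. The data frame `toy`, the seed value, and all variable names are assumptions for illustration and are not part of the original analysis.

```r
set.seed(100)  # arbitrary seed so the split is reproducible

# Made-up stand-in for the loans data: 70 "no" rows and 30 "yes" rows
toy <- data.frame(
  amount  = round(runif(100, 500, 10000)),
  default = factor(c(rep("no", 70), rep("yes", 30)))
)

# Sample 80% of the row indices within each class, then combine them
train_idx <- unlist(lapply(
  split(seq_len(nrow(toy)), toy$default),
  function(rows) rows[sample.int(length(rows), floor(0.8 * length(rows)))]
))

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Both subsets keep the original class proportions
prop.table(table(toy_train$default))
prop.table(table(toy_test$default))
```

Sampling within each class keeps the 70/30 ratio in both subsets, so metrics computed on the test set are not skewed by an unlucky draw.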
#Create a classification task for learning on loans Dataset and specify the target feature
task <- makeClassifTask(data = loans.train, target = "default")

#Initialize the Naive Bayes classifier
selected_model <- makeLearner("classif.naiveBayes")

#Train the model
NB_mlr <- train(selected_model, task)

NB_mlr$learner.model
## 
## Naive Bayes Classifier for Discrete Predictors
## 
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
## 
## A-priori probabilities:
## Y
##     no    yes 
## 0.6925 0.3075 
## 
## Conditional probabilities:
##      checking_balance
## Y         < 0 DM   > 200 DM 1 - 200 DM    unknown
##   no  0.20036101 0.06137184 0.24187726 0.49638989
##   yes 0.43495935 0.04878049 0.35365854 0.16260163
## 
##      months_loan_duration
## Y         [,1]     [,2]
##   no  19.56859 11.29674
##   yes 24.75203 13.40421
## 
##      credit_history
## Y       critical       good    perfect       poor  very good
##   no  0.35198556 0.50361011 0.01805054 0.09566787 0.03068592
##   yes 0.15853659 0.56504065 0.09349593 0.09349593 0.08943089
## 
##      purpose
## Y       business        car       car0  education furniture/appliances
##   no  0.08844765 0.32851986 0.01263538 0.04512635           0.50541516
##   yes 0.12601626 0.36585366 0.02032520 0.08130081           0.37804878
##      purpose
## Y     renovations
##   no   0.01985560
##   yes  0.02845528
## 
##      amount
## Y         [,1]     [,2]
##   no  3023.919 2477.857
##   yes 3872.890 3499.636
## 
##      savings_balance
## Y       < 100 DM  > 1000 DM 100 - 500 DM 500 - 1000 DM    unknown
##   no  0.54151625 0.05956679   0.10108303    0.07220217 0.22563177
##   yes 0.73983740 0.02032520   0.10975610    0.02845528 0.10162602
## 
##      employment_duration
## Y       < 1 year  > 7 years 1 - 4 years 4 - 7 years unemployed
##   no  0.13898917 0.27075812  0.33754513  0.19675090 0.05595668
##   yes 0.23170732 0.23170732  0.34552846  0.13008130 0.06097561
## 
##      percent_of_income
## Y         [,1]     [,2]
##   no  2.949458 1.117496
##   yes 3.032520 1.083720
## 
##      years_at_residence
## Y         [,1]     [,2]
##   no  2.828520 1.111106
##   yes 2.849593 1.105524
## 
##      age
## Y         [,1]     [,2]
##   no  36.30325 11.18390
##   yes 34.49187 11.23506
## 
##      other_credit
## Y           bank       none      store
##   no  0.12454874 0.82851986 0.04693141
##   yes 0.18699187 0.75203252 0.06097561
## 
##      housing
## Y          other        own       rent
##   no  0.09747292 0.75992780 0.14259928
##   yes 0.15447154 0.60975610 0.23577236
## 
##      existing_loans_count
## Y         [,1]      [,2]
##   no  1.422383 0.5848640
##   yes 1.394309 0.5811742
## 
##      job
## Y     management    skilled unemployed  unskilled
##   no  0.13898917 0.62996390 0.02527076 0.20577617
##   yes 0.17073171 0.63414634 0.02032520 0.17479675
## 
##      dependents
## Y         [,1]      [,2]
##   no  1.160650 0.3675395
##   yes 1.158537 0.3659880
## 
##      phone
## Y            no       yes
##   no  0.5884477 0.4115523
##   yes 0.6016260 0.3983740
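
A resampled estimate is one way to gauge how a Naive Bayes learner behaves on unseen data. The sketch below uses mlr's `resample()` with stratified 5-fold cross-validation on a stand-in binary dataset (two iris species), since loan.csv is not bundled here. The object names, the seed, and the choice of five folds are assumptions for illustration.

```r
library(mlr)
library(e1071)  # provides the naiveBayes implementation behind "classif.naiveBayes"

set.seed(123)  # arbitrary seed for reproducible folds

# Stand-in binary data: keep two of the three iris species
toy <- droplevels(iris[iris$Species != "setosa", ])

# Declare the positive class explicitly so tpr/tnr refer to the intended label
task_cv    <- makeClassifTask(data = toy, target = "Species", positive = "virginica")
learner_cv <- makeLearner("classif.naiveBayes")

# Stratified 5-fold cross-validation
cv_desc <- makeResampleDesc("CV", iters = 5, stratify = TRUE)

# Report accuracy, sensitivity (tpr) and specificity (tnr), averaged over the folds
cv_res <- resample(learner_cv, task_cv, cv_desc, measures = list(acc, tpr, tnr))
cv_res$aggr
```

For the loans task, the analogous call would pass a task built with target = "default" and positive = "yes", so that `tpr` measures how many defaulters are caught.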
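# Predict on the training rows (only the first three columns are passed as newdata),
# so the table below reflects training performance, not performance on unseen data
predictions_mlr <- as.data.frame(predict(NB_mlr, newdata = loans.train[,1:3]))
 
# Confusion matrix to check accuracy (rows: predicted, columns: actual)
table(predictions_mlr[,1],loans.train$default)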
##      
##        no yes
##   no  507 159
##   yes  47  87
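# Derive recall and specificity from the confusion matrix above instead of hard-coding counts
# (rows: predicted, columns: actual; "yes" is the positive class)
conf_mat <- table(predictions_mlr[,1], loans.train$default)
reca <- round(conf_mat["yes", "yes"] / sum(conf_mat[, "yes"]), 2)
spec <- round(conf_mat["no", "no"] / sum(conf_mat[, "no"]), 2)

paste("Recall:", reca)
## [1] "Recall: 0.35"
paste("Specificity:", spec)
## [1] "Specificity: 0.92"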
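
With recall this low, one possible improvement is to lower the probability cutoff at which an applicant is classified as a defaulter. The sketch below shows the trade-off in base R on made-up probabilities and labels. `metrics_at` is a hypothetical helper, and the cutoffs 0.5 and 0.3 are arbitrary choices for illustration.

```r
# Made-up predicted probabilities of default and the corresponding true labels
prob_yes <- c(0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.55, 0.60, 0.70, 0.90)
actual   <- factor(c("no", "no", "no", "yes", "no", "yes", "no", "yes", "yes", "yes"),
                   levels = c("no", "yes"))

# Hypothetical helper: classify as "yes" when the probability reaches the cutoff
metrics_at <- function(cutoff) {
  predicted <- factor(ifelse(prob_yes >= cutoff, "yes", "no"), levels = c("no", "yes"))
  cm <- table(predicted, actual)
  c(cutoff      = cutoff,
    sensitivity = cm["yes", "yes"] / sum(cm[, "yes"]),
    specificity = cm["no", "no"] / sum(cm[, "no"]))
}

# Compare the default cutoff with a more lenient one
threshold_table <- t(sapply(c(0.5, 0.3), metrics_at))
print(threshold_table)
```

On this toy data, moving the cutoff from 0.5 to 0.3 raises sensitivity from 0.6 to 1 while specificity drops from 0.8 to 0.4. Whether that trade is worthwhile depends on how costly a missed default is compared with a rejected good applicant.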

Decision Tree

library(partykit)
## Loading required package: grid
## Loading required package: libcoin
## Loading required package: mvtnorm
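# Fit a conditional inference tree; the object gets its own name so it is not confused with the default column
loans_tree <- ctree(default ~ ., loans.train)
plot(loans_tree, type="simple")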
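
A fitted tree can be scored on held-out rows in the same way as the Naive Bayes model. The sketch below uses the built-in iris data as a stand-in, since loan.csv is not bundled here. The seed and the object names are assumptions for illustration.

```r
library(partykit)

set.seed(42)  # arbitrary seed for a reproducible split

# Stand-in data: 80/20 split of the built-in iris dataset
idx        <- sample(nrow(iris), 0.8 * nrow(iris))
iris_train <- iris[idx, ]
iris_test  <- iris[-idx, ]

# Fit a conditional inference tree on the training rows only
iris_tree <- ctree(Species ~ ., data = iris_train)

# Predict class labels for the held-out rows
tree_pred <- predict(iris_tree, newdata = iris_test, type = "response")

# Confusion matrix on unseen data (rows: predicted, columns: actual)
table(predicted = tree_pred, actual = iris_test$Species)
```

For the loans data, the same pattern would pass loans.test as newdata and compare the predictions with loans.test$default.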