Objective

Prescriptive analytics uses predictions to drive strategic changes, for example refocusing or enhancing a business model. This laboratory looks specifically at credit risk: the risk of default on a debt because a borrower fails to make the required payments on time. The bank, as the financial institution, analyzes customer data to predict which customers might be credit risks. These predictions then feed into risk management.

This laboratory focuses on Bayesian methods, specifically Naïve Bayes classifiers. We take the brute-force approach: techniques for improving performance are applied to the original algorithm manually, one at a time.

Method 1

This lab uses the Credit Data set. The data was imported and explored. It was checked for missing values and has none.

library(readxl)
creditData <- read_excel("/Users/Rodda Ouma/Documents/Harrisburg/Machine Learning/creditData.xlsx")

##converting excel spreadsheet to dataframe

creditData<-as.data.frame(creditData)
str(creditData)
## 'data.frame':    1000 obs. of  21 variables:
##  $ Creditability                    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Account Balance                  : num  1 1 2 1 1 1 1 1 4 2 ...
##  $ Duration of Credit (month)       : num  18 9 12 12 12 10 8 6 18 24 ...
##  $ Payment Status of Previous Credit: num  4 4 2 4 4 4 4 4 4 2 ...
##  $ Purpose                          : num  2 0 9 0 0 0 0 0 3 3 ...
##  $ Credit Amount                    : num  1049 2799 841 2122 2171 ...
##  $ Value Savings/Stocks             : num  1 1 2 1 1 1 1 1 1 3 ...
##  $ Length of current employment     : num  2 3 4 3 3 2 4 2 1 1 ...
##  $ Instalment per cent              : num  4 2 2 3 4 1 1 2 4 1 ...
##  $ Sex & Marital Status             : num  2 3 2 3 3 3 3 3 2 2 ...
##  $ Guarantors                       : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Duration in Current address      : num  4 2 4 2 4 3 4 4 4 4 ...
##  $ Most valuable available asset    : num  2 1 1 1 2 1 1 1 3 4 ...
##  $ Age (years)                      : num  21 36 23 39 38 48 39 40 65 23 ...
##  $ Concurrent Credits               : num  3 3 3 3 1 3 3 3 3 3 ...
##  $ Type of apartment                : num  1 1 1 1 2 1 2 2 2 1 ...
##  $ No of Credits at this Bank       : num  1 2 1 2 2 2 2 1 2 1 ...
##  $ Occupation                       : num  3 3 2 2 2 2 2 2 1 1 ...
##  $ No of dependents                 : num  1 2 1 2 1 2 1 2 1 1 ...
##  $ Telephone                        : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Foreign Worker                   : num  1 1 1 2 2 2 2 2 1 1 ...
##Exploring the data by checking for missing data
sum(is.na(creditData))
## [1] 0
summary(creditData)
##  Creditability Account Balance Duration of Credit (month)
##  Min.   :0.0   Min.   :1.000   Min.   : 4.0              
##  1st Qu.:0.0   1st Qu.:1.000   1st Qu.:12.0              
##  Median :1.0   Median :2.000   Median :18.0              
##  Mean   :0.7   Mean   :2.577   Mean   :20.9              
##  3rd Qu.:1.0   3rd Qu.:4.000   3rd Qu.:24.0              
##  Max.   :1.0   Max.   :4.000   Max.   :72.0              
##  Payment Status of Previous Credit    Purpose       Credit Amount  
##  Min.   :0.000                     Min.   : 0.000   Min.   :  250  
##  1st Qu.:2.000                     1st Qu.: 1.000   1st Qu.: 1366  
##  Median :2.000                     Median : 2.000   Median : 2320  
##  Mean   :2.545                     Mean   : 2.828   Mean   : 3271  
##  3rd Qu.:4.000                     3rd Qu.: 3.000   3rd Qu.: 3972  
##  Max.   :4.000                     Max.   :10.000   Max.   :18424  
##  Value Savings/Stocks Length of current employment Instalment per cent
##  Min.   :1.000        Min.   :1.000                Min.   :1.000      
##  1st Qu.:1.000        1st Qu.:3.000                1st Qu.:2.000      
##  Median :1.000        Median :3.000                Median :3.000      
##  Mean   :2.105        Mean   :3.384                Mean   :2.973      
##  3rd Qu.:3.000        3rd Qu.:5.000                3rd Qu.:4.000      
##  Max.   :5.000        Max.   :5.000                Max.   :4.000      
##  Sex & Marital Status   Guarantors    Duration in Current address
##  Min.   :1.000        Min.   :1.000   Min.   :1.000              
##  1st Qu.:2.000        1st Qu.:1.000   1st Qu.:2.000              
##  Median :3.000        Median :1.000   Median :3.000              
##  Mean   :2.682        Mean   :1.145   Mean   :2.845              
##  3rd Qu.:3.000        3rd Qu.:1.000   3rd Qu.:4.000              
##  Max.   :4.000        Max.   :3.000   Max.   :4.000              
##  Most valuable available asset  Age (years)    Concurrent Credits
##  Min.   :1.000                 Min.   :19.00   Min.   :1.000     
##  1st Qu.:1.000                 1st Qu.:27.00   1st Qu.:3.000     
##  Median :2.000                 Median :33.00   Median :3.000     
##  Mean   :2.358                 Mean   :35.54   Mean   :2.675     
##  3rd Qu.:3.000                 3rd Qu.:42.00   3rd Qu.:3.000     
##  Max.   :4.000                 Max.   :75.00   Max.   :3.000     
##  Type of apartment No of Credits at this Bank   Occupation   
##  Min.   :1.000     Min.   :1.000              Min.   :1.000  
##  1st Qu.:2.000     1st Qu.:1.000              1st Qu.:3.000  
##  Median :2.000     Median :1.000              Median :3.000  
##  Mean   :1.928     Mean   :1.407              Mean   :2.904  
##  3rd Qu.:2.000     3rd Qu.:2.000              3rd Qu.:3.000  
##  Max.   :3.000     Max.   :4.000              Max.   :4.000  
##  No of dependents   Telephone     Foreign Worker 
##  Min.   :1.000    Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000    1st Qu.:1.000   1st Qu.:1.000  
##  Median :1.000    Median :1.000   Median :1.000  
##  Mean   :1.155    Mean   :1.404   Mean   :1.037  
##  3rd Qu.:1.000    3rd Qu.:2.000   3rd Qu.:1.000  
##  Max.   :2.000    Max.   :2.000   Max.   :2.000
## Creditability is converted to a factor because the Naive Bayes classifier needs a categorical class variable in order to run.
creditData$Creditability <-as.factor(creditData$Creditability)

summary(creditData$Creditability)
##   0   1 
## 300 700
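The two preparation steps above (checking for missing values and converting the class to a factor) can be sketched on a toy data frame. `sum(is.na(...))` returns one total, whereas a per-column count shows where any gaps sit. The data frame `toy` and the level labels are hypothetical; reading 1 as "creditworthy" is an assumption that follows how this report interprets the class.

```r
# Toy data, not the real credit data: one deliberately missing Age value.
toy <- data.frame(
  Creditability = c(1, 0, 1, 1),
  Age           = c(21, 36, NA, 39)
)

# Missing values counted per column rather than as a single total.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Convert the class to a factor with readable labels.
# Assumption: 0 = not creditworthy, 1 = creditworthy.
toy$Creditability <- factor(
  toy$Creditability,
  levels = c(0, 1),
  labels = c("not_creditworthy", "creditworthy")
)
print(table(toy$Creditability))
```

Labelled levels make later confusion matrices easier to read, at the cost of differing from the 0/1 output shown in this report.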

Laboratory 2: Naive Bayes Classifiers, Part 1

Training a Model on the Data

We use a 75%/25% split for training and test data, i.e. 75% of the records go to the training set and 25% to the test set.

set.seed(12345)

credit_rand <-creditData[order(runif(1000)),]
credit_train <- credit_rand [1:750,]
credit_test <- credit_rand [751:1000,]



prop.table(table(credit_train$Creditability))
## 
##         0         1 
## 0.3146667 0.6853333
prop.table(table(credit_test$Creditability))
## 
##     0     1 
## 0.256 0.744
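The random split above leaves the two sets with different class mixes (about 31.5% class 0 in training versus 25.6% in test). A stratified split, which samples 75% within each class, is one way to keep the proportions aligned. The sketch below is an alternative, not the method used in this lab, and it runs on a hypothetical toy data frame `toy` with the same 300/700 class counts.

```r
set.seed(12345)

# Toy stand-in for the credit data: 300 records of class 0, 700 of class 1.
toy <- data.frame(
  y = factor(rep(c(0, 1), times = c(300, 700))),
  x = rnorm(1000)
)

# Sample 75% of the row indices separately within each class.
rows_by_class <- split(seq_len(nrow(toy)), toy$y)
train_idx <- unlist(
  lapply(rows_by_class, function(rows) sample(rows, size = round(0.75 * length(rows)))),
  use.names = FALSE
)

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Both sets now carry the same class proportions as the full data.
print(prop.table(table(toy_train$y)))
print(prop.table(table(toy_test$y)))
```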

We build the Naive Bayes classification model with naive_bayes() from the naivebayes package.

library(naivebayes)
naive_model<-naive_bayes(Creditability ~ ., data = credit_train)

naive_model
## ===================== Naive Bayes ===================== 
## Call: 
## naive_bayes.formula(formula = Creditability ~ ., data = credit_train)
## 
## A priori probabilities: 
## 
##         0         1 
## 0.3146667 0.6853333 
## 
## Tables: 
##                
## Account Balance        0        1
##            mean 1.923729 2.793774
##            sd   1.036826 1.252008
## 
##                           
## Duration of Credit (month)        0        1
##                       mean 24.46610 19.20039
##                       sd   13.82208 11.13433
## 
##                                  
## Payment Status of Previous Credit        0        1
##                              mean 2.161017 2.665370
##                              sd   1.071649 1.045219
## 
##        
## Purpose        0        1
##    mean 2.927966 2.803502
##    sd   2.944722 2.633253
## 
##              
## Credit Amount        0        1
##          mean 3964.195 2984.177
##          sd   3597.093 2379.685
## 
## # ... and 15 more tables

From the results above, 68.5% of the customers in the training set are creditworthy (the a priori probability of class 1). We can then evaluate the model by looking at its accuracy on the test set.
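To illustrate how the printed priors and per-class tables combine, the sketch below computes a posterior from a single predictor, Account Balance, using the means and standard deviations shown in the model output. It assumes a Gaussian likelihood for numeric predictors, which is what the mean/sd tables suggest. The full model multiplies the likelihoods of all 20 predictors, so these one-feature numbers are for illustration only; `posterior_one_feature` is a hypothetical helper.

```r
# Values copied from the printed model output.
prior   <- c("0" = 0.3146667, "1" = 0.6853333)  # a priori probabilities
ab_mean <- c("0" = 1.923729,  "1" = 2.793774)   # Account Balance mean per class
ab_sd   <- c("0" = 1.036826,  "1" = 1.252008)   # Account Balance sd per class

# Bayes' rule with one Gaussian feature:
# posterior(class | x) is proportional to prior(class) * dnorm(x, mean_class, sd_class)
posterior_one_feature <- function(x) {
  unnormalised <- prior * dnorm(x, mean = ab_mean, sd = ab_sd)
  unnormalised / sum(unnormalised)
}

# A customer with Account Balance code 4 leans strongly towards class 1.
p_high <- posterior_one_feature(4)
print(round(p_high, 3))
```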

Model Evaluation
##Model Evaluation
conf_nat <-table(predict(naive_model,credit_test),credit_test$Creditability)
conf_nat
##    
##       0   1
##   0  42  35
##   1  22 151
(Accuracy<-sum(diag(conf_nat))/sum(conf_nat)*100)
## [1] 77.2

From the results above, the model is 77.2% accurate on the test set. Next, we see whether feature selection can improve the performance of the model.

Method 2: Using Feature Selection.

#we'll manually work to improve the performance of the Naïve Bayes classifier

##We first randomize the data
credit_rand2<- creditData[order(runif(1000)), ]
str(credit_rand2)
## 'data.frame':    1000 obs. of  21 variables:
##  $ Creditability                    : Factor w/ 2 levels "0","1": 2 2 1 2 2 1 1 2 2 2 ...
##  $ Account Balance                  : num  2 2 1 4 2 4 2 3 1 4 ...
##  $ Duration of Credit (month)       : num  42 24 24 18 12 24 18 18 18 12 ...
##  $ Payment Status of Previous Credit: num  4 4 2 2 2 2 2 2 2 4 ...
##  $ Purpose                          : num  9 1 0 9 9 3 0 2 0 6 ...
##  $ Credit Amount                    : num  5954 7758 1371 1950 841 ...
##  $ Value Savings/Stocks             : num  1 4 5 1 2 3 5 1 2 1 ...
##  $ Length of current employment     : num  4 5 3 4 4 5 3 2 4 4 ...
##  $ Instalment per cent              : num  2 2 4 4 2 3 4 1 4 2 ...
##  $ Sex & Marital Status             : num  2 2 2 3 2 3 2 2 3 3 ...
##  $ Guarantors                       : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Duration in Current address      : num  1 4 4 1 4 2 2 1 3 3 ...
##  $ Most valuable available asset    : num  1 4 1 3 1 3 2 2 3 1 ...
##  $ Age (years)                      : num  41 29 25 34 23 35 33 45 30 49 ...
##  $ Concurrent Credits               : num  1 3 3 2 3 1 3 2 3 3 ...
##  $ Type of apartment                : num  2 1 1 2 1 2 2 2 2 2 ...
##  $ No of Credits at this Bank       : num  2 1 1 2 1 2 1 1 1 1 ...
##  $ Occupation                       : num  2 3 3 3 2 3 3 2 4 2 ...
##  $ No of dependents                 : num  1 1 1 1 1 1 1 1 2 2 ...
##  $ Telephone                        : num  1 1 1 2 1 2 1 1 2 1 ...
##  $ Foreign Worker                   : num  1 1 1 1 1 1 1 1 1 1 ...

The next step is to scale the data, but we first drop the class variable (Creditability) in the first column.

## we scale the data
creditDataScaled <- scale(credit_rand2[,2:ncol(credit_rand2)], center=TRUE, scale = TRUE)
View(creditDataScaled)

We then compute the correlation matrix of the predictors with cor() and pass it to the findCorrelation() function from the caret package. A cutoff of 0.3 is used to flag the highly correlated variables, which are then removed from the data frame.

## We use the correlation matrix to perform feature (i.e. variable) selection.
## Compute the correlation matrix.
## Note that this does not include the class variable.

m <- cor(creditDataScaled)

m
##                                   Account Balance
## Account Balance                       1.000000000
## Duration of Credit (month)           -0.072013088
## Payment Status of Previous Credit     0.192190688
## Purpose                               0.028782569
## Credit Amount                        -0.042695127
## Value Savings/Stocks                  0.222866860
## Length of current employment          0.106338752
## Instalment per cent                  -0.005279856
## Sex & Marital Status                  0.043261280
## Guarantors                           -0.127736563
## Duration in Current address          -0.042233689
## Most valuable available asset        -0.032260126
## Age (years)                           0.058630740
## Concurrent Credits                    0.068273870
## Type of apartment                     0.023335309
## No of Credits at this Bank            0.076005137
## Occupation                            0.040663061
## No of dependents                     -0.014145427
## Telephone                             0.066295834
## Foreign Worker                       -0.035186993
##                                   Duration of Credit (month)
## Account Balance                                  -0.07201309
## Duration of Credit (month)                        1.00000000
## Payment Status of Previous Credit                -0.07718647
## Purpose                                           0.14749187
## Credit Amount                                     0.62498846
## Value Savings/Stocks                              0.04766092
## Length of current employment                      0.05738103
## Instalment per cent                               0.07474882
## Sex & Marital Status                              0.01478933
## Guarantors                                       -0.02448995
## Duration in Current address                       0.03406720
## Most valuable available asset                     0.30397125
## Age (years)                                      -0.03754986
## Concurrent Credits                               -0.06288379
## Type of apartment                                 0.15312556
## No of Credits at this Bank                       -0.01128360
## Occupation                                        0.21090973
## No of dependents                                 -0.02383448
## Telephone                                         0.16471821
## Foreign Worker                                   -0.13467996
##                                   Payment Status of Previous Credit
## Account Balance                                          0.19219069
## Duration of Credit (month)                              -0.07718647
## Payment Status of Previous Credit                        1.00000000
## Purpose                                                 -0.09033589
## Credit Amount                                           -0.05991485
## Value Savings/Stocks                                     0.03905788
## Length of current employment                             0.13822522
## Instalment per cent                                      0.04437459
## Sex & Marital Status                                     0.04217088
## Guarantors                                              -0.04067553
## Duration in Current address                              0.06319797
## Most valuable available asset                           -0.05377676
## Age (years)                                              0.14633747
## Concurrent Credits                                       0.15995707
## Type of apartment                                        0.06142792
## No of Credits at this Bank                               0.43706577
## Occupation                                               0.01035018
## No of dependents                                         0.01154955
## Telephone                                                0.05237019
## Foreign Worker                                           0.02855405
##                                         Purpose Credit Amount
## Account Balance                    0.0287825694  -0.042695127
## Duration of Credit (month)         0.1474918712   0.624988461
## Payment Status of Previous Credit -0.0903358941  -0.059914852
## Purpose                            1.0000000000   0.068480054
## Credit Amount                      0.0684800535   1.000000000
## Value Savings/Stocks              -0.0186844687   0.064632168
## Length of current employment       0.0160130053  -0.008376109
## Instalment per cent                0.0483689475  -0.271322281
## Sex & Marital Status               0.0001565929  -0.016094338
## Guarantors                        -0.0176067538  -0.027830917
## Duration in Current address       -0.0382213445   0.028916676
## Most valuable available asset      0.0109663534   0.311602093
## Age (years)                       -0.0008923856   0.032272677
## Concurrent Credits                -0.1002303932  -0.069392010
## Type of apartment                  0.0134946967   0.133023634
## No of Credits at this Bank         0.0549353555   0.020785277
## Occupation                         0.0080847757   0.285393073
## No of dependents                  -0.0325768744   0.017143582
## Telephone                          0.0783705414   0.277000181
## Foreign Worker                    -0.1132436689  -0.030661601
##                                   Value Savings/Stocks
## Account Balance                            0.222866860
## Duration of Credit (month)                 0.047660924
## Payment Status of Previous Credit          0.039057881
## Purpose                                   -0.018684469
## Credit Amount                              0.064632168
## Value Savings/Stocks                       1.000000000
## Length of current employment               0.120949514
## Instalment per cent                        0.021992529
## Sex & Marital Status                       0.017348689
## Guarantors                                -0.105068513
## Duration in Current address                0.091424109
## Most valuable available asset              0.018948001
## Age (years)                                0.083433512
## Concurrent Credits                         0.001907967
## Type of apartment                          0.006643819
## No of Credits at this Bank                -0.021644133
## Occupation                                 0.011708920
## No of dependents                           0.027513789
## Telephone                                  0.087208402
## Foreign Worker                             0.010449560
##                                   Length of current employment
## Account Balance                                    0.106338752
## Duration of Credit (month)                         0.057381027
## Payment Status of Previous Credit                  0.138225216
## Purpose                                            0.016013005
## Credit Amount                                     -0.008376109
## Value Savings/Stocks                               0.120949514
## Length of current employment                       1.000000000
## Instalment per cent                                0.126161307
## Sex & Marital Status                               0.111278288
## Guarantors                                        -0.008116008
## Duration in Current address                        0.245080745
## Most valuable available asset                      0.087187468
## Age (years)                                        0.259116153
## Concurrent Credits                                -0.007279305
## Type of apartment                                  0.115077459
## No of Credits at this Bank                         0.125790651
## Occupation                                         0.101224870
## No of dependents                                   0.097192004
## Telephone                                          0.060518081
## Foreign Worker                                    -0.022845318
##                                   Instalment per cent Sex & Marital Status
## Account Balance                          -0.005279856         0.0432612798
## Duration of Credit (month)                0.074748816         0.0147893320
## Payment Status of Previous Credit         0.044374587         0.0421708809
## Purpose                                   0.048368947         0.0001565929
## Credit Amount                            -0.271322281        -0.0160943379
## Value Savings/Stocks                      0.021992529         0.0173486885
## Length of current employment              0.126161307         0.1112782879
## Instalment per cent                       1.000000000         0.1193079016
## Sex & Marital Status                      0.119307902         1.0000000000
## Guarantors                               -0.011397639         0.0506338891
## Duration in Current address               0.049302371        -0.0272690320
## Most valuable available asset             0.053391413        -0.0069404770
## Age (years)                               0.057270750         0.0051498271
## Concurrent Credits                        0.007893967        -0.0267469446
## Type of apartment                         0.091228577         0.0989338012
## No of Credits at this Bank                0.021668743         0.0646718729
## Occupation                                0.097755393        -0.0119563566
## No of dependents                         -0.071206943         0.1221648450
## Telephone                                 0.014412880         0.0272748748
## Foreign Worker                           -0.094762307         0.0731034045
##                                     Guarantors Duration in Current address
## Account Balance                   -0.127736563                -0.042233689
## Duration of Credit (month)        -0.024489950                 0.034067202
## Payment Status of Previous Credit -0.040675530                 0.063197969
## Purpose                           -0.017606754                -0.038221345
## Credit Amount                     -0.027830917                 0.028916676
## Value Savings/Stocks              -0.105068513                 0.091424109
## Length of current employment      -0.008116008                 0.245080745
## Instalment per cent               -0.011397639                 0.049302371
## Sex & Marital Status               0.050633889                -0.027269032
## Guarantors                         1.000000000                -0.025677506
## Duration in Current address       -0.025677506                 1.000000000
## Most valuable available asset     -0.155450138                 0.147231116
## Age (years)                       -0.029825663                 0.265626478
## Concurrent Credits                -0.038235049                 0.022654074
## Type of apartment                 -0.065449419                 0.009989899
## No of Credits at this Bank        -0.025446800                 0.089625233
## Occupation                        -0.057962986                 0.012654644
## No of dependents                   0.020399584                 0.042643426
## Telephone                         -0.075034578                 0.095359367
## Foreign Worker                     0.140190191                -0.039690633
##                                   Most valuable available asset
## Account Balance                                    -0.032260126
## Duration of Credit (month)                          0.303971245
## Payment Status of Previous Credit                  -0.053776760
## Purpose                                             0.010966353
## Credit Amount                                       0.311602093
## Value Savings/Stocks                                0.018948001
## Length of current employment                        0.087187468
## Instalment per cent                                 0.053391413
## Sex & Marital Status                               -0.006940477
## Guarantors                                         -0.155450138
## Duration in Current address                         0.147231116
## Most valuable available asset                       1.000000000
## Age (years)                                         0.074551454
## Concurrent Credits                                 -0.107593324
## Type of apartment                                   0.342968580
## No of Credits at this Bank                         -0.007765020
## Occupation                                          0.276149365
## No of dependents                                    0.011871999
## Telephone                                           0.196801583
## Foreign Worker                                     -0.132461796
##                                     Age (years) Concurrent Credits
## Account Balance                    0.0586307400        0.068273870
## Duration of Credit (month)        -0.0375498629       -0.062883787
## Payment Status of Previous Credit  0.1463374687        0.159957065
## Purpose                           -0.0008923856       -0.100230393
## Credit Amount                      0.0322726775       -0.069392010
## Value Savings/Stocks               0.0834335122        0.001907967
## Length of current employment       0.2591161527       -0.007279305
## Instalment per cent                0.0572707503        0.007893967
## Sex & Marital Status               0.0051498271       -0.026746945
## Guarantors                        -0.0298256629       -0.038235049
## Duration in Current address        0.2656264783        0.022654074
## Most valuable available asset      0.0745514538       -0.107593324
## Age (years)                        1.0000000000       -0.030471934
## Concurrent Credits                -0.0304719341        1.000000000
## Type of apartment                  0.3033464109       -0.097397651
## No of Credits at this Bank         0.1507176513       -0.055809873
## Occupation                         0.0153830309        0.006077318
## No of dependents                   0.1185891829       -0.076890642
## Telephone                          0.1435058472       -0.025139895
## Foreign Worker                     0.0139811872        0.007699595
##                                   Type of apartment
## Account Balance                         0.023335309
## Duration of Credit (month)              0.153125556
## Payment Status of Previous Credit       0.061427919
## Purpose                                 0.013494697
## Credit Amount                           0.133023634
## Value Savings/Stocks                    0.006643819
## Length of current employment            0.115077459
## Instalment per cent                     0.091228577
## Sex & Marital Status                    0.098933801
## Guarantors                             -0.065449419
## Duration in Current address             0.009989899
## Most valuable available asset           0.342968580
## Age (years)                             0.303346411
## Concurrent Credits                     -0.097397651
## Type of apartment                       1.000000000
## No of Credits at this Bank              0.050019938
## Occupation                              0.104243222
## No of dependents                        0.115548584
## Telephone                               0.100326589
## Foreign Worker                         -0.083336024
##                                   No of Credits at this Bank   Occupation
## Account Balance                                   0.07600514  0.040663061
## Duration of Credit (month)                       -0.01128360  0.210909735
## Payment Status of Previous Credit                 0.43706577  0.010350179
## Purpose                                           0.05493536  0.008084776
## Credit Amount                                     0.02078528  0.285393073
## Value Savings/Stocks                             -0.02164413  0.011708920
## Length of current employment                      0.12579065  0.101224870
## Instalment per cent                               0.02166874  0.097755393
## Sex & Marital Status                              0.06467187 -0.011956357
## Guarantors                                       -0.02544680 -0.057962986
## Duration in Current address                       0.08962523  0.012654644
## Most valuable available asset                    -0.00776502  0.276149365
## Age (years)                                       0.15071765  0.015383031
## Concurrent Credits                               -0.05580987  0.006077318
## Type of apartment                                 0.05001994  0.104243222
## No of Credits at this Bank                        1.00000000 -0.026321269
## Occupation                                       -0.02632127  1.000000000
## No of dependents                                  0.10966670 -0.093559276
## Telephone                                         0.06555321  0.383022159
## Foreign Worker                                   -0.01889259 -0.092834959
##                                   No of dependents   Telephone
## Account Balance                        -0.01414543  0.06629583
## Duration of Credit (month)             -0.02383448  0.16471821
## Payment Status of Previous Credit       0.01154955  0.05237019
## Purpose                                -0.03257687  0.07837054
## Credit Amount                           0.01714358  0.27700018
## Value Savings/Stocks                    0.02751379  0.08720840
## Length of current employment            0.09719200  0.06051808
## Instalment per cent                    -0.07120694  0.01441288
## Sex & Marital Status                    0.12216485  0.02727487
## Guarantors                              0.02039958 -0.07503458
## Duration in Current address             0.04264343  0.09535937
## Most valuable available asset           0.01187200  0.19680158
## Age (years)                             0.11858918  0.14350585
## Concurrent Credits                     -0.07689064 -0.02513990
## Type of apartment                       0.11554858  0.10032659
## No of Credits at this Bank              0.10966670  0.06555321
## Occupation                             -0.09355928  0.38302216
## No of dependents                        1.00000000 -0.01475344
## Telephone                              -0.01475344  1.00000000
## Foreign Worker                          0.07707085 -0.07501222
##                                   Foreign Worker
## Account Balance                     -0.035186993
## Duration of Credit (month)          -0.134679963
## Payment Status of Previous Credit    0.028554048
## Purpose                             -0.113243669
## Credit Amount                       -0.030661601
## Value Savings/Stocks                 0.010449560
## Length of current employment        -0.022845318
## Instalment per cent                 -0.094762307
## Sex & Marital Status                 0.073103405
## Guarantors                           0.140190191
## Duration in Current address         -0.039690633
## Most valuable available asset       -0.132461796
## Age (years)                          0.013981187
## Concurrent Credits                   0.007699595
## Type of apartment                   -0.083336024
## No of Credits at this Bank          -0.018892588
## Occupation                          -0.092834959
## No of dependents                     0.077070853
## Telephone                           -0.075012215
## Foreign Worker                       1.000000000
## We want to find the variables with an absolute pairwise correlation coefficient above 0.3
library(caret)
highlycor <- findCorrelation(m, 0.30)
highlycor
## [1]  5 12 19 15  3
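To see which pairs actually exceed the cutoff, the upper triangle of the correlation matrix can be scanned directly in base R. The sketch below uses a small hypothetical matrix `m_small` built from four of the variables, with coefficients rounded from the matrix printed above and shortened variable names.

```r
vars <- c("Duration", "CreditAmount", "Asset", "Purpose")

# Coefficients rounded from the printed correlation matrix.
m_small <- matrix(
  c(1.000, 0.625, 0.304, 0.147,
    0.625, 1.000, 0.312, 0.068,
    0.304, 0.312, 1.000, 0.011,
    0.147, 0.068, 0.011, 1.000),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Positions in the upper triangle whose absolute correlation exceeds the cutoff.
idx <- which(abs(m_small) > 0.30 & upper.tri(m_small), arr.ind = TRUE)

pairs_above <- data.frame(
  var1 = rownames(m_small)[idx[, "row"]],
  var2 = colnames(m_small)[idx[, "col"]],
  r    = m_small[idx]
)
print(pairs_above)
```

Listing the pairs makes it easier to judge whether a 0.3 cutoff is too aggressive, since several of the flagged coefficients sit only just above it.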
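## Remove the flagged columns, then split into training and test sets.
## Note: highlycor holds column positions in m, which excludes Creditability (column 1 of credit_rand2),
## so the positions dropped from credit_rand2 here sit one column to the left of the flagged variables.
filteredData <- credit_rand2[, -(highlycor)]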
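findCorrelation() returns column positions in the matrix it was given. Because m was computed without the class column, those positions are offset by one relative to credit_rand2. Dropping columns by name rather than by position avoids depending on that offset. The sketch below shows the idea on a hypothetical toy data frame, with a hand-written `flagged` vector standing in for the findCorrelation() result.

```r
# Toy data: class label in column 1, predictors after it (hypothetical names).
toy <- data.frame(
  label = factor(c(0, 1, 0, 1)),
  a = c(1, 2, 3, 4),
  b = c(2, 4, 6, 8.5),
  c = c(4, 1, 3, 2)
)

predictors <- toy[, -1]

# Stand-in for findCorrelation(): position 2 of the predictor matrix, i.e. "b".
flagged <- c(2)

# Translate positions to names in the predictor matrix, then drop by name.
flagged_names <- colnames(predictors)[flagged]
filtered <- toy[, !(names(toy) %in% flagged_names)]
print(names(filtered))

# Dropping by position on the full data frame removes a different column.
by_position <- toy[, -flagged]
print(names(by_position))
```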
Model Training
#Model split between train and test
str(filteredData)
## 'data.frame':    1000 obs. of  16 variables:
##  $ Creditability                    : Factor w/ 2 levels "0","1": 2 2 1 2 2 1 1 2 2 2 ...
##  $ Account Balance                  : num  2 2 1 4 2 4 2 3 1 4 ...
##  $ Payment Status of Previous Credit: num  4 4 2 2 2 2 2 2 2 4 ...
##  $ Credit Amount                    : num  5954 7758 1371 1950 841 ...
##  $ Value Savings/Stocks             : num  1 4 5 1 2 3 5 1 2 1 ...
##  $ Length of current employment     : num  4 5 3 4 4 5 3 2 4 4 ...
##  $ Instalment per cent              : num  2 2 4 4 2 3 4 1 4 2 ...
##  $ Sex & Marital Status             : num  2 2 2 3 2 3 2 2 3 3 ...
##  $ Guarantors                       : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Most valuable available asset    : num  1 4 1 3 1 3 2 2 3 1 ...
##  $ Age (years)                      : num  41 29 25 34 23 35 33 45 30 49 ...
##  $ Type of apartment                : num  2 1 1 2 1 2 2 2 2 2 ...
##  $ No of Credits at this Bank       : num  2 1 1 2 1 2 1 1 1 1 ...
##  $ Occupation                       : num  2 3 3 3 2 3 3 2 4 2 ...
##  $ Telephone                        : num  1 1 1 2 1 2 1 1 2 1 ...
##  $ Foreign Worker                   : num  1 1 1 1 1 1 1 1 1 1 ...
filteredTraining <- filteredData[1:750, ]
filteredTest <- filteredData[751:1000, ]

## Train the model
library(naivebayes)
nb_model <- naive_bayes(Creditability ~ ., data=filteredTraining)

## Evaluate the model
filteredTestPred <- predict(nb_model, newdata = filteredTest)

table(filteredTestPred, filteredTest$Creditability)
##                 
## filteredTestPred   0   1
##                0  52  47
##                1  18 133
Model Evaluation
(conf_nat <- table(filteredTestPred, filteredTest$Creditability))
##                 
## filteredTestPred   0   1
##                0  52  47
##                1  18 133
(Accuracy <- sum(diag(conf_nat))/sum(conf_nat)*100)
## [1] 74

From the results above, the accuracy of the model dropped to 74%. This shows that the feature selection did not improve the performance of the model.

Conclusion:

  • After re-randomizing the data and applying feature selection, we got a slightly lower accuracy of 74% (down from 77.2%).

  • Taking class 1 (creditworthy) as the positive class, the second model has 133 True Positives, 52 True Negatives, 47 False Negatives and 18 False Positives.

  • The performance did not improve after randomization and feature selection, possibly because the data set is small (1,000 records); the approach itself may work better on a larger data set.

  • To improve accuracy, False Negatives and False Positives should be as few as possible. For example, if both are zero or close to zero, accuracy will be 100% or close to 100%.

  • In other words, True Positives and True Negatives should significantly outweigh False Positives and False Negatives to reach a higher accuracy level.
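The relationship between the four confusion-matrix cells and accuracy can be made concrete with the two matrices printed in this report. The sketch below also derives sensitivity and specificity, which show where each model makes its errors. `cm_metrics` is a hypothetical helper, and treating class 1 as the positive class is an assumption.

```r
# Confusion matrices copied from the printed output
# (rows = predicted class, columns = actual class; matrix() fills column by column).
labels <- list(predicted = c("0", "1"), actual = c("0", "1"))
cm_all      <- matrix(c(42, 22, 35, 151), nrow = 2, dimnames = labels)
cm_filtered <- matrix(c(52, 18, 47, 133), nrow = 2, dimnames = labels)

# Assumption: class "1" is the positive class.
cm_metrics <- function(cm, positive = "1") {
  negative <- setdiff(colnames(cm), positive)
  tp <- cm[positive, positive]  # predicted positive, actually positive
  tn <- cm[negative, negative]  # predicted negative, actually negative
  fp <- cm[positive, negative]  # predicted positive, actually negative
  fn <- cm[negative, positive]  # predicted negative, actually positive
  c(accuracy    = (tp + tn) / sum(cm),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

results <- rbind(
  all_features = cm_metrics(cm_all),
  filtered     = cm_metrics(cm_filtered)
)
print(round(results * 100, 1))
```

Accuracy alone hides the trade-off between the two error types, which matters for a bank because approving a bad credit and rejecting a good one usually carry different costs.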