Determining Creditworthiness for Loan Applications Using C5.0 Decision Trees

Introduction to Decision Trees

In this tutorial we will cover two machine learning methods that apply a similar strategy of dividing data into smaller and smaller portions to identify patterns that can be ultimately used for prediction. The knowledge gained is then presented in the form of logical structures that can be understood without any statistical knowledge. There are numerous decision trees by one can choose to follow, such as the C5.0, 1R, and RIPPER agorithms.

In this brief example, we will try to determine how likely a consumer is goinng to default based on numerous factors.

1. Underansing Decision Trees

Decusuib tree learners construct a model in the form of a tree structure. The model itself comprises a serious of logical decisions, similar to a flowerchart, with decision nodes tgat ubducate a decusuibnt i be nade ib ab attribute. These split into branches that indicate the decision’s choices. The tree is terminated by leaf nodes (aka terminal nodes) that denote the result of following a combination of decisions.

Data that is long to be classified being at the root node where it is passed through the various decisions in the tree according to the values of its features. The path the data takes funnels each record into a leaf node, which assigns it a predicted class.

Data that is to be classified begin in the root node where it is passed throug hthe various decisions in the tree according to the values of its features. The path that the various decisions in the tree according to the value of its features.

Have you ever stopped to ownder why pollution (save for other countries)

2. Divide and Conquer

Decision trees are built using a heuristic called recursive partitioning. This approach is more commonly known as divide and conquer because it uses the feature values to split the data into smaller and smaller subsets of similar classes.

To elaborate on this concept, let us suppose we begin at the root node, which represents our entire dataset. The algorithm will chose a feature that is the mnost predictive of the target class. The examples are then partitioned into groups of distinct values of this feature; this decision forms the first set of tree branches. The agorithm continues to divide-and-conquer the nodes, choosing the best candidate eature each time until a stopping criterion is reached. This could occur if

All, or nearly all, of the ecamples at the node have the same class.
There are no more remaining features to distinguish among examples.
The tree has grown to a predefined size limit.

Since real-world data contains more than two features, decision trees quickly become ar more complex than textbook examples, with many more nodes, branches, and leaves.

3. The C5.0 Decision Tree Algorithm

While there are numerous implementations of decision trees, one of the most well-known is the C5.0 algorithm. The C5.0 algorithm has become the industry standatd for producing decision trees, because it does well fo rmost types of problems directly out of the box. Compared to more advanced and sophisticated machine learning mnodels (e.g. Neural Networks and Support Vector Machines), the decision trees under the C5.0 algorithm generally perform nearly as well but are much easier to understan and deploy.

4. Choosing the Best Split

The first main challenge that a decision tree will face is to identify which feature to split upon. If the segements of the data contain only a single class, tehy are considered pure.

C5.0 uses the concept of entropy for measuring purity. The entropy of a sample of data indicates how mixed the class values are; the minimum value of \(0\) indicates that the sample is completely homogenous, while \(1\) indicates the maximum amount of disorder. The defintion of entropy can be specified as

\[ \begin{aligned} \text{Entropy}(S) = \sum_{i=1}^{c} -p_{i}\log_{2}(p_{i}) &&&& (1) \end{aligned} \] In equation \(1\), for a given segment of data \((S)\), the term \(c\) refers to the number of differenct class levels, and \(p_{i}\) refers to the proportion of values falling into the cass level \(i\). For example, if we suppose we have a partiition of data with two classes: red (60 percent) and white (40 percent), we can calculate its entropy as

\[ \begin{aligned} -0.6 \times \log_{2}0.6 - 0.4 \times \log_{2}0.4 = 0.9709506 \end{aligned} \]

We can also examine the entropy for all possible two-class arrangements. If we know the proportion of examples in one class is \(x\), then the proportion in the other class is \(1 - x\). Below is an illustration of this arrangement

As illustrated above, the peak entropy is at \(x = 0.5\), a 50-50 split results in the maximum entropy.

Given the measure of purity, the algorithm must still decide which feature to split upon. In order to do so, the algorithm uses entropy to calculate the change in honmogeneity resulting from a split on each possible feature. This calculation is referred to as information gain. The information gain for a feature \(F\) is calculated as *the differences between the entropy in the segment before the split \((S_{1})\), and the partitions resulting from the split \((S_{2})\). That is,

\[ \begin{aligned} \text{InfoGain}(F) = \text{Entropy}(S_{1}) - \text{Entropy}(S_{2}) &&&& (2) \end{aligned} \]

One complication is that after a split, the data is divided into more than one partition. Therefore, the function to calculate \(\text{Entropy}(S_{2})\) needs to consider the total entropy across all of the partitions. In accomplishes this by weighing each partition’s entropy by the proportion of records falling into that partition, which can be expressed by the following formula

\[ \begin{aligned} \text{Entropy}(S) = \sum_{i=1}^{n}w_{i}\text{Entropy}(P_{i}) &&&& (3) \end{aligned} \]

The higher the information gain, the better a feature is at creating homogenous groups after a split on that feature.

5. Pruning the Decision Tree

Decision trees can continue to grow indefinitely by choosing splitting features and dividing them into smaller and smaller partitions until each example is perfectly classified, or the algorithm runs out of features to split on. However, if the tree grows overly large, many of the decisions it makes will be overly specific and the model will overfit the training data. To overcome this, we can prune a decision tree which involves a process of reducing its size such that it generalizes better to unobserved data.

One solution to this problem is to stop the tree from growing once it reaches a certain number of decisions, or if the decision node contains only a small number of examples. This method is known as pre-pruning, or early stopping the decision tree. As the tree avoids doing unnecessary work, this can quite often be an appealing strategy.

The C5.0 Algorithm

One of the benefits of the C5.0 algorithm is that it is opinionated about pruning; it takes care of many of the decisions automatically using fairly reasonable defaults. Its overall strategy is to postprune the tree. It does this by first growing a large tree that overfits the training data. Aftewrwards, nodes and branches that have little effect on the classification errors are removed.

6. Exampple: Identifying Risky Loans Using the C5.0 Decision Tree

In this example, we will be using the C5.0 algorithm to determine the creditworthiness of a loan applicant.

Loading and Inspecting the Data

library(kableExtra)
library(knitr)
credit.df <- read.csv('/Users/czar.yobero/Documents/credit.csv')
credit.df$default <- factor(credit.df$default, levels = c(1, 2), labels = c('Yes', 'No'))
str(credit.df)

## 'data.frame':    1000 obs. of  21 variables:
##  $ checking_balance    : Factor w/ 4 levels "< 0 DM","> 200 DM",..: 1 3 4 1 1 4 4 3 4 3 ...
##  $ months_loan_duration: int  6 48 12 42 24 36 24 36 12 30 ...
##  $ credit_history      : Factor w/ 5 levels "critical","delayed",..: 1 5 1 5 2 5 5 5 5 1 ...
##  $ purpose             : Factor w/ 10 levels "business","car (new)",..: 8 8 5 6 2 5 6 3 8 2 ...
##  $ amount              : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
##  $ savings_balance     : Factor w/ 5 levels "< 100 DM","> 1000 DM",..: 5 1 1 1 1 5 4 1 2 1 ...
##  $ employment_length   : Factor w/ 5 levels "> 7 yrs","0 - 1 yrs",..: 1 3 4 4 3 3 1 3 4 5 ...
##  $ installment_rate    : int  4 2 2 2 3 2 3 2 2 4 ...
##  $ personal_status     : Factor w/ 4 levels "divorced male",..: 4 2 4 4 4 4 4 4 1 3 ...
##  $ other_debtors       : Factor w/ 3 levels "co-applicant",..: 3 3 3 2 3 3 3 3 3 3 ...
##  $ residence_history   : int  4 2 3 4 4 4 4 2 4 2 ...
##  $ property            : Factor w/ 4 levels "building society savings",..: 3 3 3 1 4 4 1 2 3 2 ...
##  $ age                 : int  67 22 49 45 53 35 53 35 61 28 ...
##  $ installment_plan    : Factor w/ 3 levels "bank","none",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ housing             : Factor w/ 3 levels "for free","own",..: 2 2 2 1 1 1 2 3 2 2 ...
##  $ existing_credits    : int  2 1 1 1 2 1 1 1 1 2 ...
##  $ default             : Factor w/ 2 levels "Yes","No": 1 2 1 1 2 1 1 1 1 2 ...
##  $ dependents          : int  1 1 2 2 2 2 1 1 1 1 ...
##  $ telephone           : Factor w/ 2 levels "none","yes": 2 1 1 1 1 2 1 2 1 1 ...
##  $ foreign_worker      : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ job                 : Factor w/ 4 levels "mangement self-employed",..: 2 2 4 2 2 4 2 1 4 1 ...

summary(credit.df)

##    checking_balance months_loan_duration                credit_history
##  < 0 DM    :274     Min.   : 4.0         critical              :293   
##  > 200 DM  : 63     1st Qu.:12.0         delayed               : 88   
##  1 - 200 DM:269     Median :18.0         fully repaid          : 40   
##  unknown   :394     Mean   :20.9         fully repaid this bank: 49   
##                     3rd Qu.:24.0         repaid                :530   
##                     Max.   :72.0                                      
##                                                                       
##        purpose        amount           savings_balance  employment_length
##  radio/tv  :280   Min.   :  250   < 100 DM     :603    > 7 yrs   :253    
##  car (new) :234   1st Qu.: 1366   > 1000 DM    : 48    0 - 1 yrs :172    
##  furniture :181   Median : 2320   101 - 500 DM :103    1 - 4 yrs :339    
##  car (used):103   Mean   : 3271   501 - 1000 DM: 63    4 - 7 yrs :174    
##  business  : 97   3rd Qu.: 3972   unknown      :183    unemployed: 62    
##  education : 50   Max.   :18424                                          
##  (Other)   : 55                                                          
##  installment_rate      personal_status      other_debtors
##  Min.   :1.000    divorced male: 50    co-applicant: 41  
##  1st Qu.:2.000    female       :310    guarantor   : 52  
##  Median :3.000    married male : 92    none        :907  
##  Mean   :2.973    single male  :548                      
##  3rd Qu.:4.000                                           
##  Max.   :4.000                                           
##                                                          
##  residence_history                     property        age       
##  Min.   :1.000     building society savings:232   Min.   :19.00  
##  1st Qu.:2.000     other                   :332   1st Qu.:27.00  
##  Median :3.000     real estate             :282   Median :33.00  
##  Mean   :2.845     unknown/none            :154   Mean   :35.55  
##  3rd Qu.:4.000                                    3rd Qu.:42.00  
##  Max.   :4.000                                    Max.   :75.00  
##                                                                  
##  installment_plan     housing    existing_credits default  
##  bank  :139       for free:108   Min.   :1.000    Yes:700  
##  none  :814       own     :713   1st Qu.:1.000    No :300  
##  stores: 47       rent    :179   Median :1.000             
##                                  Mean   :1.407             
##                                  3rd Qu.:2.000             
##                                  Max.   :4.000             
##                                                            
##    dependents    telephone  foreign_worker                      job     
##  Min.   :1.000   none:596   no : 37        mangement self-employed:148  
##  1st Qu.:1.000   yes :404   yes:963        skilled employee       :630  
##  Median :1.000                             unemployed non-resident: 22  
##  Mean   :1.155                             unskilled resident     :200  
##  3rd Qu.:1.000                                                          
##  Max.   :2.000                                                          
##

kable(head(credit.df, n=15), format="html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

checking_balance	months_loan_duration	credit_history	purpose	amount	savings_balance	employment_length	installment_rate	personal_status	other_debtors	residence_history	property	age	installment_plan	housing	existing_credits	default	dependents	telephone	foreign_worker	job
< 0 DM	6	critical	radio/tv	1169	unknown	> 7 yrs	4	single male	none	4	real estate	67	none	own	2	Yes	1	yes	yes	skilled employee
1 - 200 DM	48	repaid	radio/tv	5951	< 100 DM	1 - 4 yrs	2	female	none	2	real estate	22	none	own	1	No	1	none	yes	skilled employee
unknown	12	critical	education	2096	< 100 DM	4 - 7 yrs	2	single male	none	3	real estate	49	none	own	1	Yes	2	none	yes	unskilled resident
< 0 DM	42	repaid	furniture	7882	< 100 DM	4 - 7 yrs	2	single male	guarantor	4	building society savings	45	none	for free	1	Yes	2	none	yes	skilled employee
< 0 DM	24	delayed	car (new)	4870	< 100 DM	1 - 4 yrs	3	single male	none	4	unknown/none	53	none	for free	2	No	2	none	yes	skilled employee
unknown	36	repaid	education	9055	unknown	1 - 4 yrs	2	single male	none	4	unknown/none	35	none	for free	1	Yes	2	yes	yes	unskilled resident
unknown	24	repaid	furniture	2835	501 - 1000 DM	> 7 yrs	3	single male	none	4	building society savings	53	none	own	1	Yes	1	none	yes	skilled employee
1 - 200 DM	36	repaid	car (used)	6948	< 100 DM	1 - 4 yrs	2	single male	none	2	other	35	none	rent	1	Yes	1	yes	yes	mangement self-employed
unknown	12	repaid	radio/tv	3059	> 1000 DM	4 - 7 yrs	2	divorced male	none	4	real estate	61	none	own	1	Yes	1	none	yes	unskilled resident
1 - 200 DM	30	critical	car (new)	5234	< 100 DM	unemployed	4	married male	none	2	other	28	none	own	2	No	1	none	yes	mangement self-employed
1 - 200 DM	12	repaid	car (new)	1295	< 100 DM	0 - 1 yrs	3	female	none	1	other	25	none	rent	1	No	1	none	yes	skilled employee
< 0 DM	48	repaid	business	4308	< 100 DM	0 - 1 yrs	3	female	none	4	building society savings	24	none	rent	1	No	1	none	yes	skilled employee
1 - 200 DM	12	repaid	radio/tv	1567	< 100 DM	1 - 4 yrs	1	female	none	1	other	22	none	own	1	Yes	1	yes	yes	skilled employee
< 0 DM	24	critical	car (new)	1199	< 100 DM	> 7 yrs	4	single male	none	4	other	60	none	own	2	No	1	none	yes	unskilled resident
< 0 DM	15	repaid	car (new)	1403	< 100 DM	1 - 4 yrs	2	female	none	4	other	28	none	rent	1	Yes	1	none	yes	skilled employee

Data Preparation: Dividing Data into Training and Test Sets and Randomizing

As is common when using modeling techniques to predict the values given by the model, we will divide our data into training and test sets, using 90% of the data as our training set and the rest of the 10% as our test. But before we divide the data set, we must first randomize our data set since this particular data set isn’t random and could lead to misleading results.

Let us first randomize our data set.

set.seed(1978)
credit.random.df <- credit.df[order(runif(1000)), ]

Now that we have randomized our data set, we still must confirm that we have the same data frame sorted differently. We shall compare values of the amount feature across the two data frames.

summary(credit.df$amount)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     250    1366    2320    3271    3972   18424

summary(credit.random.df$amount)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     250    1366    2320    3271    3972   18424

Since the summary statistics are identical, this suggests that our random shuffle worked.

Splitting Data Set Into Training and Test Sets

credit.train <- credit.random.df[1:900, ]
credit.test <- credit.random.df[901:1000, ]

If all went well, we should have about \(30\%\) of the defaulted loans in each data set.

prop.table(table(credit.train$default))

## 
##       Yes        No 
## 0.6888889 0.3111111

prop.table(table(credit.test$default))

## 
## Yes  No 
## 0.8 0.2

This seems to be equally split.

7. Modeling Our Decision Tree

library(C50)
credit.fit <- C5.0(credit.train[-17], credit.train$default)
credit.fit

## 
## Call:
## C5.0.default(x = credit.train[-17], y = credit.train$default)
## 
## Classification Tree
## Number of samples: 900 
## Number of predictors: 20 
## 
## Tree size: 74 
## 
## Non-standard options: attempt to group attributes

summary(credit.fit)

## 
## Call:
## C5.0.default(x = credit.train[-17], y = credit.train$default)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Thu Jul 12 11:56:05 2018
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 900 cases (21 attributes) from undefined.data
## 
## Decision tree:
## 
## checking_balance = unknown: Yes (352/40)
## checking_balance in {< 0 DM,> 200 DM,1 - 200 DM}:
## :...credit_history in {fully repaid,fully repaid this bank}:
##     :...housing in {for free,rent}: No (32/4)
##     :   housing = own:
##     :   :...purpose in {car (used),domestic appliances,radio/tv,
##     :       :           retraining}: Yes (8/2)
##     :       purpose in {education,others,repairs}: No (3)
##     :       purpose = car (new):
##     :       :...months_loan_duration <= 22: No (7)
##     :       :   months_loan_duration > 22: Yes (2)
##     :       purpose = business:
##     :       :...installment_plan = bank: Yes (3)
##     :       :   installment_plan = stores: No (1)
##     :       :   installment_plan = none:
##     :       :   :...age <= 27: No (2)
##     :       :       age > 27: Yes (3)
##     :       purpose = furniture:
##     :       :...installment_plan in {none,stores}: Yes (3)
##     :           installment_plan = bank:
##     :           :...checking_balance in {< 0 DM,> 200 DM}: No (4)
##     :               checking_balance = 1 - 200 DM: Yes (1)
##     credit_history in {critical,delayed,repaid}:
##     :...months_loan_duration > 22:
##         :...checking_balance = > 200 DM:
##         :   :...dependents > 1: No (2)
##         :   :   dependents <= 1:
##         :   :   :...job = mangement self-employed: No (1)
##         :   :       job in {skilled employee,unemployed non-resident,
##         :   :               unskilled resident}: Yes (15/1)
##         :   checking_balance in {< 0 DM,1 - 200 DM}:
##         :   :...savings_balance = > 1000 DM: Yes (3)
##         :       savings_balance = 501 - 1000 DM: No (4/1)
##         :       savings_balance = 101 - 500 DM:
##         :       :...employment_length in {0 - 1 yrs,1 - 4 yrs}: No (13/2)
##         :       :   employment_length in {4 - 7 yrs,unemployed}: Yes (9/1)
##         :       :   employment_length = > 7 yrs:
##         :       :   :...job = mangement self-employed: No (1)
##         :       :       job in {skilled employee,unemployed non-resident,
##         :       :               unskilled resident}: Yes (3)
##         :       savings_balance = unknown:
##         :       :...checking_balance = 1 - 200 DM: Yes (17/1)
##         :       :   checking_balance = < 0 DM:
##         :       :   :...credit_history = critical: Yes (1)
##         :       :       credit_history in {delayed,repaid}: No (12/3)
##         :       savings_balance = < 100 DM:
##         :       :...dependents > 1:
##         :           :...other_debtors in {co-applicant,
##         :           :   :                 guarantor}: Yes (3)
##         :           :   other_debtors = none:
##         :           :   :...credit_history = critical: Yes (3)
##         :           :       credit_history in {delayed,repaid}: No (15/5)
##         :           dependents <= 1:
##         :           :...purpose in {business,car (new),domestic appliances,
##         :               :           repairs,retraining}: No (32/4)
##         :               purpose = others: Yes (3/1)
##         :               purpose = education:
##         :               :...employment_length in {> 7 yrs,1 - 4 yrs,
##         :               :   :                     unemployed}: No (5)
##         :               :   employment_length in {0 - 1 yrs,
##         :               :                         4 - 7 yrs}: Yes (2)
##         :               purpose = radio/tv:
##         :               :...job = mangement self-employed: Yes (2)
##         :               :   job in {skilled employee,unemployed non-resident,
##         :               :           unskilled resident}: No (19/5)
##         :               purpose = car (used):
##         :               :...residence_history <= 3:
##         :               :   :...personal_status = female: No (1)
##         :               :   :   personal_status in {divorced male,married male,
##         :               :   :                       single male}: Yes (4)
##         :               :   residence_history > 3:
##         :               :   :...amount <= 5965: Yes (2)
##         :               :       amount > 5965: No (5)
##         :               purpose = furniture:
##         :               :...checking_balance = 1 - 200 DM: No (5)
##         :                   checking_balance = < 0 DM:
##         :                   :...residence_history > 3: No (4)
##         :                       residence_history <= 3: [S1]
##         months_loan_duration <= 22:
##         :...months_loan_duration <= 8:
##             :...age <= 25: No (4/1)
##             :   age > 25:
##             :   :...amount <= 3001: Yes (37)
##             :       amount > 3001:
##             :       :...residence_history <= 3: Yes (2)
##             :           residence_history > 3: No (2)
##             months_loan_duration > 8:
##             :...other_debtors = co-applicant:
##                 :...installment_rate <= 2: Yes (4)
##                 :   installment_rate > 2:
##                 :   :...amount <= 1569: Yes (2)
##                 :       amount > 1569: No (3)
##                 other_debtors = guarantor:
##                 :...installment_plan in {none,stores}: Yes (15)
##                 :   installment_plan = bank:
##                 :   :...purpose = car (new): No (3)
##                 :       purpose in {business,car (used),domestic appliances,
##                 :                   education,furniture,others,radio/tv,
##                 :                   repairs,retraining}: Yes (3)
##                 other_debtors = none:
##                 :...credit_history in {critical,delayed}: Yes (66/14)
##                     credit_history = repaid:
##                     :...savings_balance = > 1000 DM: Yes (8)
##                         savings_balance in {< 100 DM,101 - 500 DM,
##                         :                   501 - 1000 DM,unknown}:
##                         :...checking_balance = < 0 DM:
##                             :...residence_history <= 2:
##                             :   :...installment_plan = bank: Yes (1)
##                             :   :   installment_plan in {none,
##                             :   :                        stores}: No (20/4)
##                             :   residence_history > 2:
##                             :   :...amount <= 1386: No (18/5)
##                             :       amount > 1386: Yes (19/6)
##                             checking_balance = > 200 DM: [S2]
##                             checking_balance = 1 - 200 DM:
##                             :...installment_plan = stores: No (1)
##                                 installment_plan = bank:
##                                 :...months_loan_duration <= 11: Yes (2)
##                                 :   months_loan_duration > 11: No (6)
##                                 installment_plan = none: [S3]
## 
## SubTree [S1]
## 
## employment_length in {> 7 yrs,4 - 7 yrs}: No (2)
## employment_length in {0 - 1 yrs,1 - 4 yrs,unemployed}: Yes (7)
## 
## SubTree [S2]
## 
## property in {building society savings,other}: Yes (11)
## property = unknown/none: No (1)
## property = real estate:
## :...job in {mangement self-employed,skilled employee,
##     :       unemployed non-resident}: Yes (3)
##     job = unskilled resident: No (3)
## 
## SubTree [S3]
## 
## employment_length = 4 - 7 yrs: Yes (9)
## employment_length in {> 7 yrs,0 - 1 yrs,1 - 4 yrs,unemployed}:
## :...job in {unemployed non-resident,unskilled resident}: Yes (12/2)
##     job = mangement self-employed:
##     :...property in {building society savings,other,real estate}: Yes (2)
##     :   property = unknown/none: No (2)
##     job = skilled employee:
##     :...savings_balance in {101 - 500 DM,501 - 1000 DM}: Yes (6/1)
##         savings_balance = < 100 DM:
##         :...telephone = none: No (9/1)
##         :   telephone = yes: Yes (3/1)
##         savings_balance = unknown:
##         :...personal_status in {female,married male}: No (3)
##             personal_status in {divorced male,single male}: Yes (4)
## 
## 
## Evaluation on training data (900 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##      74  105(11.7%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##     585    35    (a): class Yes
##      70   210    (b): class No
## 
## 
##  Attribute usage:
## 
##  100.00% checking_balance
##   60.89% credit_history
##   54.22% months_loan_duration
##   35.56% savings_balance
##   28.89% other_debtors
##   15.11% purpose
##   14.67% dependents
##   13.11% installment_plan
##   10.22% employment_length
##   10.00% amount
##    9.78% job
##    9.67% residence_history
##    7.67% housing
##    5.56% age
##    2.44% property
##    1.33% personal_status
##    1.33% telephone
##    1.00% installment_rate
## 
## 
## Time: 0.0 secs

The above output is the decision tree to determine whether or not a loan applicant is worthy of receiving a loan. It also shows which features are most important in determining which factors are most important when considering loan applications.