Introduction
This analysis uses loan data from https://www.kaggle.com/itssuru/loan-data, which includes variables such as FICO score, interest rate, installment amount, loan purpose, and several more.
It applies a variety of exploratory data analysis methods to understand how the FICO score relates to the other factors in the data set.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
df <- read.csv("loan_data.csv")
The mean of the FICO score can be calculated to see the average score in our data set.
mean(df$fico)
## [1] 710.8463
The standard deviation will be calculated to see how spread out the scores are.
sd(df$fico)
## [1] 37.97054
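As a quick follow-up check (not part of the original write-up), we can ask what share of borrowers fall within one standard deviation of the mean; under a roughly normal distribution this would be about 68%.
# Share of FICO scores within one standard deviation of the mean
mean(abs(df$fico - mean(df$fico)) <= sd(df$fico))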
The five-number summary (plus the mean) will be calculated below to quickly explore our data set.
summary(df$fico)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 612.0 682.0 707.0 710.8 737.0 827.0
Now the data will be given a graphical representation that organizes the data points into user-specified ranges. A histogram condenses the data so it can be interpreted visually with convenience.
hist(df$fico, breaks=10, main="Histogram of FICO score", xlab="FICO score", xlim=c(600,860), col="purple", probability = TRUE)
lines(density(df$fico), col = "Black")
Although it may seem primitive, the box plot below gives a good indication of how spread out the values are, without using as much space as the histogram.
boxplot(df$fico, main ="Box Plot Representing FICO scores", xlab = "Quartiles", ylab = "FICO scores")
The Q-Q plot below provides insight into the distribution of the FICO scores of more than 9,000 individuals at a glance, as illustrated below.
qqnorm(df$fico)
qqline(df$fico, col = "purple")
Yes, there are outliers that do not fall within the overall distribution, which might affect the accuracy slightly. The distribution is roughly symmetric over the range of about 650 to 800, but it also shows slight positive skewness.
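To make the outlier claim concrete, here is a minimal sketch (not in the original analysis) using the conventional 1.5 × IQR fences, the same rule boxplot() applies by default.
# Count FICO scores outside the 1.5 * IQR fences
q <- quantile(df$fico, c(0.25, 0.75))
fence <- 1.5 * IQR(df$fico)
sum(df$fico < q[1] - fence | df$fico > q[2] + fence)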
plot(df$fico, df$int.rate)
cor(df$fico, df$int.rate)
## [1] -0.7148208
The negative coefficient indicates a negative relationship between these two variables, meaning they move in opposite directions: the higher the credit score, the lower the interest rate.
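To put a number on that slope, a simple linear model can be fitted; this is an illustrative sketch (the name fit is mine, not from the original analysis).
# Simple linear regression of interest rate on FICO score
fit <- lm(int.rate ~ fico, data = df)
coef(fit)  # the fico coefficient should be negative, matching the correlation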
plot(df$int.rate, df$installment)
cor(df$int.rate, df$installment)
## [1] 0.2761402
Here, the positive coefficient indicates a positive relationship between the two variables: bigger installment amounts correspond to higher interest rates.
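An optional ggplot version of the same relationship, as a sketch: geom_smooth(method = "lm") overlays a least-squares trend line with a confidence band.
# Scatter plot of installment vs. interest rate with a linear trend line
ggplot(df, aes(x = int.rate, y = installment)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm")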
table(df$purpose)
##
## all_other credit_card debt_consolidation educational
## 2331 1262 3957 343
## home_improvement major_purchase small_business
## 629 437 619
The frequency table above shows the number of loans provided for various purposes: credit card, debt consolidation, educational, home improvement, major purchase, small business, and all others.
table(df$purpose)/length(df$purpose)
##
## all_other credit_card debt_consolidation educational
## 0.24337022 0.13176028 0.41313427 0.03581123
## home_improvement major_purchase small_business
## 0.06567133 0.04562539 0.06462727
The table above shows the relative frequency of each purpose; debt consolidation alone accounts for about 41% of all loans.
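The same relative frequencies can be computed with prop.table(), which is the more idiomatic base-R form.
# Relative frequency of each loan purpose
prop.table(table(df$purpose))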
two_way_table <- table(df$purpose, df$not.fully.paid)
two_way_table
##
## 0 1
## all_other 1944 387
## credit_card 1116 146
## debt_consolidation 3354 603
## educational 274 69
## home_improvement 522 107
## major_purchase 388 49
## small_business 447 172
Here, 1 means the loan was not fully paid, whereas 0 means it was fully paid. Most loans were given out for debt consolidation, which also had the largest number of unpaid loans. At the same time, it had the largest number of fully paid loans, making it the best revenue-generating source for the lenders. Very few educational loans were given out.
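Because the purposes differ greatly in volume, row proportions are more informative than raw counts; a sketch using prop.table(). From the counts above, for example, small_business has the highest unpaid share (172/619, roughly 28%).
# Share of fully paid (0) vs. not fully paid (1) loans within each purpose
prop.table(two_way_table, margin = 1)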
boxplot(df$fico ~ df$purpose, col = "orange", main = "Distribution of FICO score with loan purposes", xlab = "Purpose of loans", ylab = "FICO Score")
The side-by-side box plot above shows FICO scores on the y-axis and the purpose of the loan on the x-axis. It shows clearly that, on average, most loans were granted to borrowers with FICO scores between roughly 650 and 750.
barplot(df$int.rate, main = "Bar Chart", xlab ="Interest Rate")
The bar plot above gives a rough picture of the interest rates, a number of which reach above 0.15; note that the rates are stored as decimals, so 0.15 corresponds to 15%. I was fascinated to see how much importance the FICO score carries: it can significantly lower the interest rate. I realized that having a good FICO score allows people to live in a financially sound way.
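Since int.rate is continuous, a histogram is arguably a clearer choice than a bar for every raw value; a minimal alternative sketch:
# Histogram of interest rates (rates are stored as decimals, e.g. 0.15 = 15%)
hist(df$int.rate, main = "Histogram of Interest Rates",
     xlab = "Interest rate", col = "orange")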
Decision Trees
Decision tree analysis is the process of drawing a decision tree, a graphic representation of the alternative solutions available to solve a given problem, in order to determine the most effective course of action. Decision trees consist of nodes and branches: nodes represent a test on an attribute, and branches represent the potential alternative outcomes.
library(rpart)
library(rpart.plot)
# Regression tree for the numeric interest rate
tree <- rpart(int.rate ~ ., data = df)
# First classification attempt on the still-numeric rate (overwritten below)
tree1 <- rpart(int.rate ~ ., data = df, method = 'class')
library(dplyr)
# Recode int.rate as a factor; note that int.rate == TRUE is FALSE for every
# numeric rate, so every row receives the single auto-numbered label 'high,low2'
df <- df %>%
  mutate(int.rate = factor(int.rate == TRUE, levels = c(TRUE, FALSE),
                           labels = c('high,low')))
tree1 <- rpart(int.rate ~ ., data = df, method = 'class')
tree1
## n= 9578
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 9578 0 high,low2 (0.0000000 1.0000000) *
rpart.plot(tree)
The regression tree above shows that the FICO score, together with the installment amount, drives the predicted interest rate; the split at fico >= 710 is visible at a glance and helps us understand our data set. Reading the plot, 27% of people with a FICO score of 735 or higher had an installment amount below $324, while 31% with a FICO score between 680 and 710 had an installment of about $511 a month.
pred <- predict(tree1,df,type="class")
head(pred)
## [1] high,low2 high,low2 high,low2 high,low2 high,low2 high,low2
## Levels: high,low1 high,low2
We can see that every observation is predicted as high,low2. This is not a FICO effect: the recoding int.rate == TRUE is FALSE for every numeric rate, so the classification tree collapses to a single class.
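What the recoding above appears to intend is a genuine high/low split. A hypothetical fix (the names df2 and tree2 are mine, and the data must be re-read because int.rate has already been overwritten) would split at the median rate:
# Re-read the data and dichotomize int.rate at its median before refitting
df2 <- read.csv("loan_data.csv")
df2 <- df2 %>%
  mutate(int.rate = factor(int.rate > median(int.rate),
                           levels = c(TRUE, FALSE),
                           labels = c("high", "low")))
tree2 <- rpart(int.rate ~ ., data = df2, method = "class")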
predict(tree,df) %>%
head()
## 1 2 3 4 5 6
## 0.1198941 0.1278030 0.1278030 0.1071067 0.1451888 0.1071067
These are the regression tree's predicted interest rates for the first six observations, showing how the tree divides the data.
Confusion Table
confusion_table <- with(df, table(int.rate, pred))
confusion_table
## pred
## int.rate high,low1 high,low2
## high,low1 0 0
## high,low2 0 9578
The confusion table confirms that all 9578 observations in our dataset fall into the single class high,low2.
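For a non-degenerate model, overall accuracy can be read off the confusion table; a sketch (here it is trivially 1, since only one class is present):
# Overall accuracy: correct predictions on the diagonal over all predictions
sum(diag(confusion_table)) / sum(confusion_table)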
Now we will slice the data into training and test sets.
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
inTrain <- createDataPartition(y = df$int.rate, p = .66, list = FALSE)
## Warning in createDataPartition(y = df$int.rate, p = 0.66, list = FALSE): Some
## classes have no records ( high,low1 ) and these will be ignored
df_train <- df %>% slice(inTrain)
df_test <- df %>% slice(-inTrain)
dim(df_train)
## [1] 6322 14
dim(df_test)
## [1] 3256 14
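With the split in place, the usual next step is to fit on the training slice and score the held-out slice. A sketch (the names tree_train and pred_test are mine; with the degenerate int.rate factor the resulting accuracy is not meaningful here, but the pattern is the standard one):
# Fit on the training set, predict on the test set, and compute accuracy
tree_train <- rpart(int.rate ~ ., data = df_train, method = "class")
pred_test <- predict(tree_train, df_test, type = "class")
mean(pred_test == df_test$int.rate)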
Here we observe how important each variable in the data set is for the decisions the tree makes.
imp <- varImp(tree)
head(imp)
## Overall
## credit.policy 0.08648839
## days.with.cr.line 0.03684838
## dti 0.11105553
## fico 1.33329254
## inq.last.6mths 0.20246152
## installment 0.94392436
imp %>% ggplot(aes(x = row.names(imp), weight = Overall)) +
geom_bar()
Here, the FICO score is the most important variable in our data set for making a prediction, followed by the installment amount, as illustrated in the bar plot above.
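A sketch of the same plot with the bars sorted by importance, which reads more easily (rownames_to_column() comes from tibble, already loaded with the tidyverse):
# Variable importance, sorted and flipped for readability
imp %>%
  rownames_to_column("variable") %>%
  ggplot(aes(x = reorder(variable, Overall), weight = Overall)) +
  geom_bar() +
  coord_flip() +
  labs(x = NULL, y = "Importance")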
tree1 <- rpart(purpose ~ int.rate + fico, data = df, method = "class")
tree1
## n= 9578
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 9578 5621 debt_consolidation (0.24 0.13 0.41 0.036 0.066 0.046 0.065)
## 2) fico< 749.5 7908 4391 debt_consolidation (0.23 0.13 0.44 0.036 0.054 0.039 0.06) *
## 3) fico>=749.5 1670 1165 all_other (0.3 0.12 0.26 0.034 0.12 0.077 0.085) *
The tree above shows that people with FICO scores below about 750 mostly had debt consolidation loans, whereas those at 750 or above mostly had other types of loans.
rpart.plot(tree1, extra = 2)
In the plot above, the single split is on the FICO score: below a score of about 750, debt consolidation is the most common purpose, while all_other dominates at 750 and above.
tail(df)
## credit.policy purpose int.rate installment log.annual.inc
## 9573 0 debt_consolidation high,low2 69.98 10.11047
## 9574 0 all_other high,low2 344.76 12.18075
## 9575 0 all_other high,low2 257.70 11.14186
## 9576 0 debt_consolidation high,low2 97.81 10.59663
## 9577 0 home_improvement high,low2 351.58 10.81978
## 9578 0 debt_consolidation high,low2 853.43 11.26446
## dti fico days.with.cr.line revol.bal revol.util inq.last.6mths
## 9573 7.02 662 8190.042 2999 39.5 6
## 9574 10.39 672 10474.000 215372 82.1 2
## 9575 0.21 722 4380.000 184 1.1 5
## 9576 13.09 687 3450.042 10036 82.9 8
## 9577 19.18 692 1800.000 0 3.2 5
## 9578 16.28 732 4740.000 37879 57.0 6
## delinq.2yrs pub.rec not.fully.paid
## 9573 0 0 1
## 9574 0 0 1
## 9575 0 0 1
## 9576 0 0 1
## 9577 0 0 1
## 9578 0 0 1
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
##
## recode
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
We loaded the libraries needed to pursue the association rule analysis.
transactions(df)
## Warning: Column(s) 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 not logical or
## factor. Applying default discretization (see '? discretizeDF').
## Warning in discretize(x = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, : The calculated breaks are: 0, 1, 1, 1
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 2L, : The calculated breaks are: 0, 0, 2, 33
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, : The calculated breaks are: 0, 0, 0, 13
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, : The calculated breaks are: 0, 0, 0, 5
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, : The calculated breaks are: 0, 0, 0, 1
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## transactions in sparse format with
## 9578 transactions (rows) and
## 36 items (columns)
The conversion produced 9578 transactions (rows) and 36 items (columns).
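Rather than relying on the default discretization, the binning can be controlled explicitly; a sketch using arules' discretizeDF() with three equal-frequency bins per numeric column (the name df_disc is mine):
# Discretize all numeric columns into three equal-frequency intervals
df_disc <- discretizeDF(df, default = list(method = "frequency", breaks = 3))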
colnames(df)[c(2,4,5,6,7)]
## [1] "purpose" "installment" "log.annual.inc" "dti"
## [5] "fico"
Some of the column names in our data set (df) are shown above.
df <- df %>% mutate(
  installment = (installment > 0),
  log.annual.inc = (log.annual.inc > 0)
)
as(df,"transactions")
## Warning: Column(s) 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14 not logical or factor.
## Applying default discretization (see '? discretizeDF').
## Warning in discretize(x = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, : The calculated breaks are: 0, 1, 1, 1
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 2L, : The calculated breaks are: 0, 0, 2, 33
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, : The calculated breaks are: 0, 0, 0, 13
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, : The calculated breaks are: 0, 0, 0, 5
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, : The calculated breaks are: 0, 0, 0, 1
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## transactions in sparse format with
## 9578 transactions (rows) and
## 32 items (columns)
After recoding installment and log.annual.inc as logicals, the conversion yields 9578 transactions and 32 items.
trans <- transactions(df)
## Warning: Column(s) 1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 14 not logical or factor.
## Applying default discretization (see '? discretizeDF').
## Warning in discretize(x = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, : The calculated breaks are: 0, 1, 1, 1
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 2L, : The calculated breaks are: 0, 0, 2, 33
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, : The calculated breaks are: 0, 0, 0, 13
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, : The calculated breaks are: 0, 0, 0, 5
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
## Warning in discretize(x = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, : The calculated breaks are: 0, 0, 0, 1
## Only unique breaks are used reducing the number of intervals. Look at ? discretize for details.
summary(trans)
## transactions as itemMatrix in sparse format with
## 9578 rows (elements/itemsets/transactions) and
## 32 columns (items) and a density of 0.4375
##
## most frequent items:
## credit.policy=[0,1] int.rate=high,low2 installment log.annual.inc
## 9578 9578 9578 9578
## delinq.2yrs=[0,13] (Other)
## 9578 86202
##
## element (itemset/transaction) length distribution:
## sizes
## 14
## 9578
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14 14 14 14 14 14
##
## includes extended item information - examples:
## labels variables levels
## 1 credit.policy=[0,1] credit.policy [0,1]
## 2 purpose=all_other purpose all_other
## 3 purpose=credit_card purpose credit_card
##
## includes extended transaction information - examples:
## transactionID
## 1 1
## 2 2
## 3 3
The summary above captures the essential structure of the transactions: the most frequent items and the itemset length distribution (every transaction contains exactly 14 items, so the minimum, median, and maximum all equal 14).
colnames(trans)
## [1] "credit.policy=[0,1]"
## [2] "purpose=all_other"
## [3] "purpose=credit_card"
## [4] "purpose=debt_consolidation"
## [5] "purpose=educational"
## [6] "purpose=home_improvement"
## [7] "purpose=major_purchase"
## [8] "purpose=small_business"
## [9] "int.rate=high,low1"
## [10] "int.rate=high,low2"
## [11] "installment"
## [12] "log.annual.inc"
## [13] "dti=[0,9.13)"
## [14] "dti=[9.13,16.1)"
## [15] "dti=[16.1,30]"
## [16] "fico=[612,687)"
## [17] "fico=[687,727)"
## [18] "fico=[727,827]"
## [19] "days.with.cr.line=[179,3.3e+03)"
## [20] "days.with.cr.line=[3.3e+03,5.16e+03)"
## [21] "days.with.cr.line=[5.16e+03,1.76e+04]"
## [22] "revol.bal=[0,4.74e+03)"
## [23] "revol.bal=[4.74e+03,1.42e+04)"
## [24] "revol.bal=[1.42e+04,1.21e+06]"
## [25] "revol.util=[0,30.8)"
## [26] "revol.util=[30.8,62.2)"
## [27] "revol.util=[62.2,119]"
## [28] "inq.last.6mths=[0,2)"
## [29] "inq.last.6mths=[2,33]"
## [30] "delinq.2yrs=[0,13]"
## [31] "pub.rec=[0,5]"
## [32] "not.fully.paid=[0,1]"
All of the item labels are listed above, giving a better sense of the categories we are dealing with without having to scroll through the original data set file.
inspect(trans[1:3])
## items transactionID
## [1] {credit.policy=[0,1],
## purpose=debt_consolidation,
## int.rate=high,low2,
## installment,
## log.annual.inc,
## dti=[16.1,30],
## fico=[727,827],
## days.with.cr.line=[5.16e+03,1.76e+04],
## revol.bal=[1.42e+04,1.21e+06],
## revol.util=[30.8,62.2),
## inq.last.6mths=[0,2),
## delinq.2yrs=[0,13],
## pub.rec=[0,5],
## not.fully.paid=[0,1]} 1
## [2] {credit.policy=[0,1],
## purpose=credit_card,
## int.rate=high,low2,
## installment,
## log.annual.inc,
## dti=[9.13,16.1),
## fico=[687,727),
## days.with.cr.line=[179,3.3e+03),
## revol.bal=[1.42e+04,1.21e+06],
## revol.util=[62.2,119],
## inq.last.6mths=[0,2),
## delinq.2yrs=[0,13],
## pub.rec=[0,5],
## not.fully.paid=[0,1]} 2
## [3] {credit.policy=[0,1],
## purpose=debt_consolidation,
## int.rate=high,low2,
## installment,
## log.annual.inc,
## dti=[9.13,16.1),
## fico=[612,687),
## days.with.cr.line=[3.3e+03,5.16e+03),
## revol.bal=[0,4.74e+03),
## revol.util=[0,30.8),
## inq.last.6mths=[0,2),
## delinq.2yrs=[0,13],
## pub.rec=[0,5],
## not.fully.paid=[0,1]} 3
Inspecting the first three transactions above shows exactly which items each one contains.
image(trans)
The image plots items (columns) against transactions (rows), giving a quick visual check of the sparse matrix.
itemFrequencyPlot(trans,topN = 20)
The plot above shows the relative frequency (support) of the 20 most frequent items in the data set.
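The same information is available numerically; a sketch listing the ten items with the highest support:
# Support of the ten most frequent items
head(sort(itemFrequency(trans), decreasing = TRUE), 10)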
vertical <- as(trans, "tidLists")
as(vertical, "matrix")[1:10, 1:5]
## 1 2 3 4 5
## credit.policy=[0,1] TRUE TRUE TRUE TRUE TRUE
## purpose=all_other FALSE FALSE FALSE FALSE FALSE
## purpose=credit_card FALSE TRUE FALSE FALSE TRUE
## purpose=debt_consolidation TRUE FALSE TRUE TRUE FALSE
## purpose=educational FALSE FALSE FALSE FALSE FALSE
## purpose=home_improvement FALSE FALSE FALSE FALSE FALSE
## purpose=major_purchase FALSE FALSE FALSE FALSE FALSE
## purpose=small_business FALSE FALSE FALSE FALSE FALSE
## int.rate=high,low1 FALSE FALSE FALSE FALSE FALSE
## int.rate=high,low2 TRUE TRUE TRUE TRUE TRUE
The matrix above is the vertical (transaction-ID list) representation: for the first ten items and the first five transactions, it shows which items occur in which transactions.
trans
## transactions in sparse format with
## 9578 transactions (rows) and
## 32 items (columns)
its <- apriori(trans, parameter=list(target = "frequent"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## NA 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 frequent itemsets TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 957
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[31 item(s), 9578 transaction(s)] done [0.01s].
## sorting and recoding items ... [27 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## Warning in apriori(trans, parameter = list(target = "frequent")): Mining stopped
## (maxlen reached). Only patterns up to a length of 10 returned!
## done [0.07s].
## sorting transactions ... done [0.00s].
## writing ... [18943 set(s)] done [0.00s].
## creating S4 object ... done [0.00s].
its
## set of 18943 itemsets
inspect(head(its, n = 10))
## items support transIdenticalToItemsets count
## [1] {purpose=credit_card} 0.1317603 0 1262
## [2] {purpose=all_other} 0.2433702 0 2331
## [3] {fico=[612,687)} 0.2817916 0 2699
## [4] {days.with.cr.line=[179,3.3e+03)} 0.3319065 0 3179
## [5] {revol.util=[0,30.8)} 0.3330549 0 3190
## [6] {dti=[0,9.13)} 0.3331593 0 3191
## [7] {dti=[9.13,16.1)} 0.3332637 0 3192
## [8] {revol.bal=[0,4.74e+03)} 0.3332637 0 3192
## [9] {revol.bal=[4.74e+03,1.42e+04)} 0.3333681 0 3193
## [10] {revol.bal=[1.42e+04,1.21e+06]} 0.3333681 0 3193
Above we see the support for the first ten frequent itemsets.
ggplot(tibble(`Itemset Size` = factor(size(its))), aes(`Itemset Size`)) + geom_bar()
The distribution of itemset sizes is roughly bell-shaped, making the typical itemset size easy to see.
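To see the strongest patterns directly, the itemsets can be sorted by support before inspection; a sketch:
# The five most frequent itemsets
inspect(head(sort(its, by = "support"), n = 5))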
rules <- apriori(trans, parameter = list(support = 0.05, confidence = 0.9))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 478
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[31 item(s), 9578 transaction(s)] done [0.01s].
## sorting and recoding items ... [29 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## Warning in apriori(trans, parameter = list(support = 0.05, confidence = 0.9)):
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
## done [0.15s].
## writing ... [190010 rule(s)] done [0.02s].
## creating S4 object ... done [0.04s].
inspect(head(rules))
## lhs rhs support confidence coverage lift count
## [1] {} => {credit.policy=[0,1]} 1 1 1 1 9578
## [2] {} => {not.fully.paid=[0,1]} 1 1 1 1 9578
## [3] {} => {pub.rec=[0,5]} 1 1 1 1 9578
## [4] {} => {delinq.2yrs=[0,13]} 1 1 1 1 9578
## [5] {} => {log.annual.inc} 1 1 1 1 9578
## [6] {} => {installment} 1 1 1 1 9578
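The first rules listed are trivial (they have empty left-hand sides). Sorting by lift surfaces the more interesting associations; a sketch:
# The five rules with the highest lift
inspect(head(sort(rules, by = "lift"), n = 5))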
plot(rules,jitter = 1)
The scatter plot above positions each rule by its support and confidence.
plot(rules, shading = "order")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
Here the rules are plotted again, this time shaded by rule order (the number of items in a rule).
plot(head(rules, n = 100), method = "graph")
We conclude the project with a graph-based plot of the first 100 rules of our data set.