Interest Rates and FICO Scores

Introduction:

In the lending industry, loans provided to potential borrowers rely heavily on the creditworthiness of the borrowers. Unsecured loans such as credit cards and student loans carry no collateral and can carry higher default risk than car or mortgage loans. Default risk is defined as the possibility that a borrower will not be able to pay back the principal or interest associated with a loan [1].

Lenders normally collect data on the loan amount, place of employment, annual income, length of employment, type of home ownership, or anything else that can demonstrate the consumer's ability to manage and repay their obligations. They also check the borrower's credit report as represented by the FICO score. The FICO (Fair Isaac Corporation) score reflects the information that each credit bureau collects on the borrower, such as payment history, amounts owed, length of credit history, new credit, and types of credit used [2]. FICO scores range from 300 to 850, representing the worst to the best creditworthiness.

Lenders use various predictive models that, together with the FICO score, determine whether to approve a loan application and, if approval is merited, the interest rate charged to the borrower. This analysis explores the relationship between the interest rate of a loan and the borrower's FICO score.

Data:

The data used in this inferential analysis consist of a sample of 2,500 peer-to-peer loans issued through the Lending Club, an online financial community whose loans are made by WebBank (a Utah-chartered Industrial Bank) (https://www.lendingclub.com/home.action). The file was downloaded from https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv on March 8, 2014.

The data were collected from borrowers/loan applicants (the cases of the study) who voluntarily provided the information. The following variables (with data type and explanation of meaning) make up the data set:

. Amount.Requested: (numeric) The amount (in dollars) requested in the loan application.

. Amount.Funded.By.Investors: (numeric) The amount (in dollars) loaned to the individual.

. Interest.Rate: (character) The interest rate of the loan, as a percentage; once converted to numeric, a rate of 11% appears as 11 (the str() output below shows values such as 8.9 and 12.12). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.

. Loan.Length: (character) The length of time (in months) of the loan.

. Loan.Purpose: (categorical variable) The purpose of the loan i.e., “credit_card”, “debt_consolidation”, “educational”, “major_purchase”, “small_business”, etc.

. Debt.To.Income.Ratio: (character) The debt-to-income ratio of the borrower (amount of debt divided by annual income).

. State: (character) The abbreviation for the U.S. state of residence of the loan applicant.

. Home.Ownership: (character) Types of home ownership, i.e., “MORTGAGE”, “OWN”, “RENT”, “OTHER”, “NONE”.

. Monthly.Income: (numeric) The monthly income of the applicant.

. FICO.Range: (categorical) A range indicating the applicant's FICO score, expressed as a string label, e.g. “650-654”.

. Open.CREDIT.Lines: (numeric) The number of open lines of credit at the time of application.

. Revolving.CREDIT.Balance: (numeric) The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).

. Inquiries.in.the.Last.6.Months: (numeric) The borrower's number of inquiries by creditors in the last 6 months.

. Employment.Length: (character) The borrower's length of employment at current job.
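
Several of these fields arrive as text rather than numbers; the factor levels of Debt.To.Income.Ratio in the str() output below ("0%", "0.04%", ...) carry a trailing percent sign. A minimal sketch of the conversion follows. It assumes that Interest.Rate uses the same "8.90%" format in the raw file (the raw file is not shown here), and the helper name percent_to_numeric and the toy rows are hypothetical.

```r
# Hypothetical helper: strip a trailing "%" and convert the text to a number.
percent_to_numeric <- function(x) {
  as.numeric(sub("%$", "", as.character(x)))
}

# Toy rows standing in for the raw file (illustrative values only).
raw <- data.frame(
  Interest.Rate        = c("8.90%", "12.12%", "21.98%"),
  Debt.To.Income.Ratio = c("14.90%", "28.36%", "0%"),
  stringsAsFactors     = TRUE  # mimic the factor columns seen in str()
)

raw$Interest.Rate        <- percent_to_numeric(raw$Interest.Rate)
raw$Debt.To.Income.Ratio <- percent_to_numeric(raw$Debt.To.Income.Ratio)
str(raw)
```

Going through as.character() first matters for factor columns, because as.numeric() on a factor would return the internal level codes instead of the printed values.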

Exploratory data analysis:

Exploratory analysis began with data quality checks such as: a) conversion of character variable interest rates to numeric and b) inspection of data for missing values. Results are presented below:

str(loans)
## 'data.frame':    2500 obs. of  14 variables:
##  $ Amount.Requested              : int  20000 19200 35000 10000 12000 6000 10000 33500 14675 7000 ...
##  $ Amount.Funded.By.Investors    : num  20000 19200 35000 9975 12000 ...
##  $ Interest.Rate                 : num  8.9 12.12 21.98 9.99 11.71 ...
##  $ Loan.Length                   : Factor w/ 2 levels "36 months","60 months": 1 1 2 1 1 1 1 2 1 1 ...
##  $ Loan.Purpose                  : Factor w/ 14 levels "car","credit_card",..: 3 3 3 3 2 10 3 2 2 2 ...
##  $ Debt.To.Income.Ratio          : Factor w/ 1669 levels "0%","0.04%","0.17%",..: 390 1178 1000 346 657 775 1102 374 1129 1488 ...
##  $ State                         : Factor w/ 46 levels "AK","AL","AR",..: 37 39 5 16 28 7 19 18 5 5 ...
##  $ Home.Ownership                : Factor w/ 5 levels "MORTGAGE","NONE",..: 1 1 1 1 5 4 5 1 5 5 ...
##  $ Monthly.Income                : num  6542 4583 11500 3833 3195 ...
##  $ FICO.Range                    : Factor w/ 38 levels "640-644","645-649",..: 20 16 11 12 12 7 17 14 10 16 ...
##  $ Open.CREDIT.Lines             : int  14 12 14 10 11 17 10 12 9 8 ...
##  $ Revolving.CREDIT.Balance      : int  14272 11140 21977 9346 14469 10391 15957 27874 7246 7612 ...
##  $ Inquiries.in.the.Last.6.Months: int  2 1 1 0 0 2 0 0 1 0 ...
##  $ Employment.Length             : Factor w/ 12 levels "< 1 year","1 year",..: 1 4 4 7 11 5 3 3 10 5 ...
summary(loans$Interest.Rate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.42   10.20   13.10   13.10   15.80   24.90
summary(loans$FICO.Range)
## 640-644 645-649 650-654 655-659 660-664 665-669 670-674 675-679 680-684 
##       5       3       1       4     125     145     171     166     157 
## 685-689 690-694 695-699 700-704 705-709 710-714 715-719 720-724 725-729 
##     138     140     153     131     134     112      93     114      94 
## 730-734 735-739 740-744 745-749 750-754 755-759 760-764 765-769 770-774 
##      94      65      53      54      61      46      46      36      17 
## 775-779 780-784 785-789 790-794 795-799 800-804 805-809 810-814 815-819 
##      22      28      19      20      13      13      11       8       6 
## 820-824 830-834 
##       1       1

There were no missing values in either interest rate (a continuous numerical variable) or FICO range (an ordinal categorical variable). However, both the summary above and the boxplot shown below reveal FICO ranges with fewer than 30 observations: 640-644, 645-649, 650-654, 655-659, and every range from 770-774 upward. It is best to remove these observations if we are to compare the means of the FICO ranges.
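
The two checks described here, counting missing values and finding sparsely populated FICO ranges, could be sketched as follows. The data frame loans_demo and the threshold min_n are illustrative stand-ins, not part of the original analysis; the report itself uses a cutoff of 30.

```r
# Toy data standing in for the loans data frame (illustrative values only).
loans_demo <- data.frame(
  Interest.Rate = c(8.9, 12.12, NA, 9.99, 11.71, 15.3),
  FICO.Range    = factor(c("735-739", "715-719", "735-739",
                           "640-644", "715-719", "735-739"))
)

# Count missing values in every column.
missing_counts <- colSums(is.na(loans_demo))
print(missing_counts)

# Count observations per FICO range and flag the sparse ones.
min_n <- 3  # the report uses 30; 3 suits the six toy rows
range_counts  <- table(loans_demo$FICO.Range)
sparse_ranges <- names(range_counts)[range_counts < min_n]
print(sparse_ranges)
```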

boxplot(loans$Interest.Rate ~ loans$FICO.Range, xlab = "FICO Range", ylab = "Interest Rate")

[Figure: boxplot of interest rate by FICO range, full data set]

LendingClub = loans  # keep an untouched copy of the full data set
# FICO ranges with fewer than 30 observations (see the summary above)
sparse_ranges = c("640-644", "645-649", "650-654", "655-659", "770-774", "775-779", 
    "780-784", "785-789", "790-794", "795-799", "800-804", "805-809", "810-814", 
    "815-819", "820-824", "830-834")
loans = subset(loans, !(FICO.Range %in% sparse_ranges))
nrow(loans)
## [1] 2328
boxplot(loans$Interest.Rate ~ loans$FICO.Range, xlab = "FICO Range", ylab = "Interest Rate")

[Figure: boxplot of interest rate by FICO range, after removal of the sparse ranges]

The boxplot above shows the data after removal of the FICO ranges that have fewer than 30 observations.

This is an observational study for two reasons: 1) there is no treatment or randomization, and 2) the data were volunteered by the borrowers.
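
Note that subset() removes rows but keeps the now-empty factor levels, which is probably why later summaries still list the removed ranges with NA statistics. A sketch of filtering by group size and then dropping the unused levels is shown below on toy data; droplevels() is not used in the original analysis, and the cutoff of 2 is chosen only to fit the toy rows.

```r
# Toy data standing in for the loans data frame (illustrative values only).
loans_demo <- data.frame(
  Interest.Rate = c(18.5, 17.4, 16.2, 9.9, 8.5),
  FICO.Range    = factor(c("660-664", "660-664", "665-669",
                           "830-834", "660-664"))
)

# Keep only the ranges with at least 2 observations (30 in the report).
range_counts <- table(loans_demo$FICO.Range)
keep         <- names(range_counts)[range_counts >= 2]
filtered     <- subset(loans_demo, FICO.Range %in% keep)

levels_before <- nlevels(filtered$FICO.Range)  # empty levels are still present
filtered$FICO.Range <- droplevels(filtered$FICO.Range)
levels_after  <- nlevels(filtered$FICO.Range)  # only populated levels remain

print(c(before = levels_before, after = levels_after))
```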

Inference:

Before any analysis, several conditions have to be met. The first is independence: it is reasonable to assume that the observations are independent and represent less than 10% of the population. The second is approximate normality within each group, which is checked below for a few of the FICO ranges. The Q-Q plots for “660-664” to “695-699” appear to be nearly normal.


FICO662 = subset(loans, loans$FICO.Range == "660-664")
FICO667 = subset(loans, loans$FICO.Range == "665-669")
FICO672 = subset(loans, loans$FICO.Range == "670-674")
FICO677 = subset(loans, loans$FICO.Range == "675-679")
FICO682 = subset(loans, loans$FICO.Range == "680-684")
FICO687 = subset(loans, loans$FICO.Range == "685-689")
FICO692 = subset(loans, loans$FICO.Range == "690-694")
FICO697 = subset(loans, loans$FICO.Range == "695-699")
FICO702 = subset(loans, loans$FICO.Range == "700-704")
FICO707 = subset(loans, loans$FICO.Range == "705-709")
FICO712 = subset(loans, loans$FICO.Range == "710-714")
FICO717 = subset(loans, loans$FICO.Range == "715-719")
FICO722 = subset(loans, loans$FICO.Range == "720-724")
FICO727 = subset(loans, loans$FICO.Range == "725-729")
FICO732 = subset(loans, loans$FICO.Range == "730-734")
FICO737 = subset(loans, loans$FICO.Range == "735-739")

par(mfrow = c(4, 4))
# qqnorm(loans$Interest.Rate); hist(loans$Interest.Rate)
qqnorm(FICO662$Interest.Rate)
hist(FICO662$Interest.Rate)
qqnorm(FICO667$Interest.Rate)
hist(FICO667$Interest.Rate)
qqnorm(FICO672$Interest.Rate)
hist(FICO672$Interest.Rate)
qqnorm(FICO677$Interest.Rate)
hist(FICO677$Interest.Rate)
qqnorm(FICO682$Interest.Rate)
hist(FICO682$Interest.Rate)
qqnorm(FICO687$Interest.Rate)
hist(FICO687$Interest.Rate)
qqnorm(FICO692$Interest.Rate)
hist(FICO692$Interest.Rate)
qqnorm(FICO697$Interest.Rate)
hist(FICO697$Interest.Rate)

[Figure: normal Q-Q plots and histograms of interest rate for FICO ranges 660-664 through 695-699]
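
The same grid of diagnostic plots could be produced without one variable per FICO range by splitting the interest rates by range and looping. This is a sketch on simulated data (the means and standard deviations are made up for illustration), and the added qqline() reference line is not part of the original plots.

```r
set.seed(1)
# Simulated stand-in for the loans data frame (illustrative values only).
loans_demo <- data.frame(
  Interest.Rate = c(rnorm(40, mean = 18.5, sd = 2.9),
                    rnorm(40, mean = 17.4, sd = 3.0)),
  FICO.Range    = factor(rep(c("660-664", "665-669"), each = 40))
)

# One numeric vector of interest rates per FICO range.
rates_by_range <- split(loans_demo$Interest.Rate, loans_demo$FICO.Range)

# One row per range: Q-Q plot on the left, histogram on the right.
old_par <- par(mfrow = c(length(rates_by_range), 2))
for (range_name in names(rates_by_range)) {
  rates <- rates_by_range[[range_name]]
  qqnorm(rates, main = paste("Q-Q plot:", range_name))
  qqline(rates)  # reference line for a normal distribution
  hist(rates, main = paste("Histogram:", range_name), xlab = "Interest Rate")
}
par(old_par)  # restore the previous plotting layout
```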

# qqnorm(FICO702$Interest.Rate);hist(FICO702$Interest.Rate)
# qqnorm(FICO707$Interest.Rate);hist(FICO707$Interest.Rate)

# sort(table(by(loans$Interest.Rate, loans$FICO.Range, var))
loans = subset(loans, loans$FICO.Range != "765-769")
loans = na.omit(loans)

From the boxplots presented earlier, it can be assumed that the FICO ranges have roughly equal variability, with about 8-10 exceptions; for example, the variance of “720-724” is 10.4, compared with 8-9 for most of the FICO ranges. FICO.Range “765-769” has a variance of 2.735 with 36 observations, so this range was also removed from the analysis. The R statement used to calculate the variances is by(loans$Interest.Rate, loans$FICO.Range, var).
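
The equal-variability check can also be summarized numerically. The sketch below uses tapply(), which returns a plain named vector that is easier to sort than the output of by(); the data are simulated, and the ratio of largest to smallest variance is an illustrative summary that the original analysis does not report.

```r
set.seed(42)
# Simulated stand-in for the loans data frame (illustrative values only).
loans_demo <- data.frame(
  Interest.Rate = c(rnorm(50, mean = 11.0, sd = 3.2),
                    rnorm(50, mean = 10.0, sd = 2.8),
                    rnorm(50, mean = 8.6,  sd = 1.6)),
  FICO.Range    = factor(rep(c("720-724", "730-734", "765-769"), each = 50))
)

# Variance and group size of the interest rate within each FICO range.
group_var <- tapply(loans_demo$Interest.Rate, loans_demo$FICO.Range, var)
group_n   <- tapply(loans_demo$Interest.Rate, loans_demo$FICO.Range, length)
print(round(sort(group_var), 2))

# A ratio far above 1 signals a range whose spread is out of line.
var_ratio <- max(group_var) / min(group_var)
print(var_ratio)
```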

source("http://bit.ly/dasi_inference")
inference(y = loans$Interest.Rate, x = loans$FICO.Range, est = "mean", type = "ht", 
    null = 0, alternative = "greater", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Warning: Ignoring null value since it's undefined for ANOVA.
## ANOVA
## Summary statistics:
## n_640-644 = NA, mean_640-644 = NA, sd_640-644 = NA
## n_645-649 = NA, mean_645-649 = NA, sd_645-649 = NA
## n_650-654 = NA, mean_650-654 = NA, sd_650-654 = NA
## n_655-659 = NA, mean_655-659 = NA, sd_655-659 = NA
## n_660-664 = 125, mean_660-664 = 18.49, sd_660-664 = 2.896
## n_665-669 = 145, mean_665-669 = 17.45, sd_665-669 = 3.03
## n_670-674 = 171, mean_670-674 = 16.25, sd_670-674 = 2.844
## n_675-679 = 166, mean_675-679 = 15.85, sd_675-679 = 2.844
## n_680-684 = 157, mean_680-684 = 15.13, sd_680-684 = 2.634
## n_685-689 = 137, mean_685-689 = 14.69, sd_685-689 = 3.025
## n_690-694 = 140, mean_690-694 = 14.73, sd_690-694 = 3.153
## n_695-699 = 153, mean_695-699 = 14.15, sd_695-699 = 3.074
## n_700-704 = 131, mean_700-704 = 13.36, sd_700-704 = 2.948
## n_705-709 = 134, mean_705-709 = 12.66, sd_705-709 = 3.092
## n_710-714 = 112, mean_710-714 = 12.43, sd_710-714 = 3.225
## n_715-719 = 93, mean_715-719 = 11.18, sd_715-719 = 2.826
## n_720-724 = 114, mean_720-724 = 11.04, sd_720-724 = 2.761
## n_725-729 = 94, mean_725-729 = 10.65, sd_725-729 = 3.222
## n_730-734 = 94, mean_730-734 = 9.956, sd_730-734 = 2.569
## n_735-739 = 65, mean_735-739 = 9.623, sd_735-739 = 2.874
## n_740-744 = 53, mean_740-744 = 9.592, sd_740-744 = 2.535
## n_745-749 = 54, mean_745-749 = 9.902, sd_745-749 = 2.845
## n_750-754 = 61, mean_750-754 = 8.468, sd_750-754 = 2.447
## n_755-759 = 46, mean_755-759 = 8.996, sd_755-759 = 2.427
## n_760-764 = 46, mean_760-764 = 8.628, sd_760-764 = 2.522
## n_765-769 = NA, mean_765-769 = NA, sd_765-769 = NA
## n_770-774 = NA, mean_770-774 = NA, sd_770-774 = NA
## n_775-779 = NA, mean_775-779 = NA, sd_775-779 = NA
## n_780-784 = NA, mean_780-784 = NA, sd_780-784 = NA
## n_785-789 = NA, mean_785-789 = NA, sd_785-789 = NA
## n_790-794 = NA, mean_790-794 = NA, sd_790-794 = NA
## n_795-799 = NA, mean_795-799 = NA, sd_795-799 = NA
## n_800-804 = NA, mean_800-804 = NA, sd_800-804 = NA
## n_805-809 = NA, mean_805-809 = NA, sd_805-809 = NA
## n_810-814 = NA, mean_810-814 = NA, sd_810-814 = NA
## n_815-819 = NA, mean_815-819 = NA, sd_815-819 = NA
## n_820-824 = NA, mean_820-824 = NA, sd_820-824 = NA
## n_830-834 = NA, mean_830-834 = NA, sd_830-834 = NA
## H_0: All means are equal.
## H_A: At least one mean is different.
## Analysis of Variance Table
## 
## Response: y
##             Df Sum Sq Mean Sq F value Pr(>F)
## x           20  17887     894     106 <2e-16
## Residuals 2270  19150       8               
## 
## Pairwise tests: t tests with pooled SD 
##         660-664 665-669 670-674 675-679 680-684 685-689 690-694 695-699
## 665-669  0.0032      NA      NA      NA      NA      NA      NA      NA
## 670-674  0.0000   3e-04      NA      NA      NA      NA      NA      NA
## 675-679  0.0000   0e+00  0.2135      NA      NA      NA      NA      NA
## 680-684  0.0000   0e+00  0.0005  0.0244      NA      NA      NA      NA
## 685-689  0.0000   0e+00  0.0000  0.0005  0.1953      NA      NA      NA
## 690-694  0.0000   0e+00  0.0000  0.0008  0.2426  0.8968      NA      NA
## 695-699  0.0000   0e+00  0.0000  0.0000  0.0030  0.1141  0.0851      NA
## 700-704  0.0000   0e+00  0.0000  0.0000  0.0000  0.0002  0.0001  0.0224
## 705-709  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 710-714  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 715-719  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 720-724  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 725-729  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 730-734  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 735-739  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 740-744  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 745-749  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 750-754  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 755-759  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 760-764  0.0000   0e+00  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
##         700-704 705-709 710-714 715-719 720-724 725-729 730-734 735-739
## 665-669      NA      NA      NA      NA      NA      NA      NA      NA
## 670-674      NA      NA      NA      NA      NA      NA      NA      NA
## 675-679      NA      NA      NA      NA      NA      NA      NA      NA
## 680-684      NA      NA      NA      NA      NA      NA      NA      NA
## 685-689      NA      NA      NA      NA      NA      NA      NA      NA
## 690-694      NA      NA      NA      NA      NA      NA      NA      NA
## 695-699      NA      NA      NA      NA      NA      NA      NA      NA
## 700-704      NA      NA      NA      NA      NA      NA      NA      NA
## 705-709  0.0509      NA      NA      NA      NA      NA      NA      NA
## 710-714  0.0135  0.5421      NA      NA      NA      NA      NA      NA
## 715-719  0.0000  0.0002  0.0022      NA      NA      NA      NA      NA
## 720-724  0.0000  0.0000  0.0003  0.7188      NA      NA      NA      NA
## 725-729  0.0000  0.0000  0.0000  0.2132  0.3442      NA      NA      NA
## 730-734  0.0000  0.0000  0.0000  0.0039  0.0077  0.1002      NA      NA
## 735-739  0.0000  0.0000  0.0000  0.0009  0.0018  0.0281  0.4778      NA
## 740-744  0.0000  0.0000  0.0000  0.0015  0.0028  0.0336  0.4658  0.9536
## 745-749  0.0000  0.0000  0.0000  0.0100  0.0182  0.1299  0.9125  0.6031
## 750-754  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0019  0.0257
## 755-759  0.0000  0.0000  0.0000  0.0000  0.0001  0.0015  0.0663  0.2623
## 760-764  0.0000  0.0000  0.0000  0.0000  0.0000  0.0001  0.0111  0.0753
##         740-744 745-749 750-754 755-759
## 665-669      NA      NA      NA      NA
## 670-674      NA      NA      NA      NA
## 675-679      NA      NA      NA      NA
## 680-684      NA      NA      NA      NA
## 685-689      NA      NA      NA      NA
## 690-694      NA      NA      NA      NA
## 695-699      NA      NA      NA      NA
## 700-704      NA      NA      NA      NA
## 705-709      NA      NA      NA      NA
## 710-714      NA      NA      NA      NA
## 715-719      NA      NA      NA      NA
## 720-724      NA      NA      NA      NA
## 725-729      NA      NA      NA      NA
## 730-734      NA      NA      NA      NA
## 735-739      NA      NA      NA      NA
## 740-744      NA      NA      NA      NA
## 745-749  0.5817      NA      NA      NA
## 750-754  0.0393  0.0083      NA      NA
## 755-759  0.3085  0.1203  0.3516      NA
## 760-764  0.0995  0.0289  0.7780   0.543

[Figure: plot produced by the inference() call]
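
The ANOVA table and pairwise tests above come from the course-provided inference() function. A rough base-R equivalent might look like the sketch below; it runs on simulated data, and the choice of p.adjust.method = "none" is an assumption made to mirror the "t tests with pooled SD" output rather than a confirmed description of what inference() does internally.

```r
set.seed(7)
# Simulated stand-in for the loans data frame (illustrative values only).
loans_demo <- data.frame(
  Interest.Rate = c(rnorm(60, mean = 18.5, sd = 2.9),
                    rnorm(60, mean = 14.7, sd = 3.0),
                    rnorm(60, mean = 9.6,  sd = 2.7)),
  FICO.Range    = factor(rep(c("660-664", "685-689", "735-739"), each = 60))
)

# One-way ANOVA: does the mean interest rate differ across FICO ranges?
fit         <- aov(Interest.Rate ~ FICO.Range, data = loans_demo)
anova_table <- summary(fit)[[1]]
print(anova_table)

# Pairwise t tests using the pooled standard deviation, unadjusted p-values.
pairwise <- pairwise.t.test(loans_demo$Interest.Rate, loans_demo$FICO.Range,
                            p.adjust.method = "none", pool.sd = TRUE)
print(pairwise$p.value)
```

With 21 FICO ranges there are 210 pairwise comparisons, so a reader may prefer p.adjust.method = "bonferroni" to control the overall error rate.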

It can be concluded from the ANOVA that there is convincing evidence that at least one mean interest rate differs across the FICO ranges (F = 106, p < 2e-16), and thus the null hypothesis that all means are equal is rejected. Although the FICO score is perhaps the primary factor in setting the interest rate, it should be noted that other factors also influence it. This is evident from the right-skewed shape of the histograms and the spread of interest rates within each FICO range. This variation likely reflects the other metrics that the Lending Club uses to match the interest rate to the perceived default risk of the borrower.
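
As a check on the arithmetic, the F statistic can be recomputed from the sums of squares and degrees of freedom printed in the ANOVA table. The variable names below are illustrative.

```r
# Values copied from the ANOVA table in the inference() output.
ss_group <- 17887; df_group <- 20    # between-group (FICO range) row
ss_resid <- 19150; df_resid <- 2270  # residual (within-group) row

ms_group <- ss_group / df_group  # mean square between groups
ms_resid <- ss_resid / df_resid  # mean square within groups
f_stat   <- ms_group / ms_resid  # F = MSG / MSE

# Upper-tail probability of the F distribution.
p_value <- pf(f_stat, df_group, df_resid, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```

The recomputed ratio rounds to the reported F value of 106, and the tail probability falls below the 2e-16 floor shown in the table.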

References:

  1. wiseGEEk. What is a Default Risk? URL: http://www.wisegeek.com/what-is-a-default-risk.htm Accessed 04/04/2013.

  2. myFICO. About FICO Score URL: http://www.myfico.com/Downloads/Files/myFICO_UYFS_Booklet.pdf. Accessed 04/05/2014.

  3. https://www.lendingclub.com/public/how-we-set-interest-rates.action Accessed 03/08/2014.