In October 1974, the federal government enacted the Equal Credit Opportunity Act (ECOA), which aims to ensure that all applicants have an equal chance to obtain a loan from banks, credit unions, or other financial firms. The act prohibits discrimination against applicants for reasons unrelated to their creditworthiness. In other words, a person's race, sex, marital status, or religion cannot be used as a factor in deciding whether that person receives or is denied a loan. Financial institutions therefore have to assess the risk of approving a loan using factors such as a person's credit score, income, debt, employment history, and the purpose of the loan.
One fairly common type of loan is a home equity loan or home equity line of credit (HELOC), which is in a sense a second mortgage that gives individuals access to additional funds for purposes such as debt consolidation and home improvement. An individual seeking to buy a home usually takes out a mortgage loan to secure the purchase. As the owner makes monthly mortgage payments, they build up equity, the portion of the property that they actually own. A HELOC uses the person's equity in their home as collateral. Therefore, someone who does not pay off their home equity loan can have their house taken.
It is of great interest to the lending institution to be able to predict which customers are more likely to default on their loans and which are more likely to pay them back. The purpose of this analysis is to fit a simple logistic regression to a dataset containing information about past loan recipients, and to use it to predict the probability that someone defaults on their home equity loan.
\[ P(Y_i = 1|x_i) = \frac{e^{\beta_0+\beta_1 x_i}}{1+e^{\beta_0 + \beta_1 x_i}} = \pi_i \]
where
Variable | Explanation |
---|---|
\(Y_i =1\) | Applicant defaulted on loan or was seriously delinquent |
\(Y_i =0\) | Applicant paid off loan |
\(x_i\) | Applicant's debt-to-income ratio, expressed as a percentage (values above 100 are possible) |
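Rewriting the model on the log-odds scale may make the role of \(\beta_1\) easier to see. This is only an algebraic rearrangement of the equation above, not an additional assumption:

\[ \log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_i \]

In this form, a one-unit increase in \(x_i\) changes the log-odds of default by \(\beta_1\), which is the same as multiplying the odds \(\pi_i/(1-\pi_i)\) by \(e^{\beta_1}\). If \(\beta_1 = 0\), the debt-to-income ratio has no effect on the odds of default.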
The hypotheses for \(\beta_1\), the coefficient on the debt-to-income ratio, are as follows. The test is evaluated at a significance level of \(\alpha = 0.05\).
\[ \begin{aligned} H_0&: \beta_1 = 0 \\ H_a&: \beta_1 \neq 0 \end{aligned} \]
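library(DT)      # provides datatable()
library(pander)  # provides pander()
datatable(BankLoan, options = list(lengthMenu = c(10, 50)), style = "default")  # browse the raw data
bank.glm <- glm(BAD > 0 ~ DEBTINC, data = BankLoan, family = binomial)  # logistic regression of default on debt-to-income ratio
pander(summary(bank.glm))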
Term | Estimate | Std. Error | z value | Pr(>\|z\|) |
---|---|---|---|---|
(Intercept) | -5.415 | 0.2886 | -18.76 | 1.647e-78 |
DEBTINC | 0.08549 | 0.007667 | 11.15 | 7.117e-29 |
(Dispersion parameter for binomial family taken to be 1 )
Null deviance: | 2749 on 4692 degrees of freedom |
Residual deviance: | 2581 on 4691 degrees of freedom |
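As a rough check on the output above, the z values and the drop in deviance can be recomputed by hand. The sketch below copies the rounded numbers from the tables rather than refitting the model, so small rounding differences are expected; the variable names are illustrative only.

```r
# Values copied from the summary output above; nothing is refit here.
est <- c(intercept = -5.415, debtinc = 0.08549)
se  <- c(intercept = 0.2886, debtinc = 0.007667)

# Wald z statistics and two-sided p-values (estimate divided by standard error)
z      <- est / se
p_wald <- 2 * pnorm(-abs(z))

# Likelihood-ratio (deviance) test for adding the single predictor DEBTINC
lr_stat <- 2749 - 2581   # null deviance minus residual deviance
lr_df   <- 4692 - 4691   # difference in degrees of freedom
p_lr    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(round(z, 2))
print(p_wald)
print(c(lr_stat = lr_stat, lr_df = lr_df, p_lr = p_lr))
```

Both the Wald test and the deviance comparison point the same way here: the debt-to-income ratio carries information about default beyond an intercept-only model.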
The results of the logistic model show that both the intercept \(\beta_0\) and the slope \(\beta_1\) on the debt-to-income ratio are significant. In particular, the p-value for \(\beta_1\) is far below \(\alpha = 0.05\), so the null hypothesis \(\beta_1 = 0\) is rejected. The estimated model is therefore
\[ P(Y_i = 1|x_i) \approx \frac{e^{-5.415+0.08549 x_i}}{1+e^{-5.415 + 0.08549 x_i}} = \hat{\pi}_i \]
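The estimated model can be evaluated directly for a few debt-to-income ratios. This is a minimal sketch that hard-codes the rounded coefficients reported above; `default_prob` is a hypothetical helper name, and with the fitted object available `predict(bank.glm, newdata, type = "response")` would give the same kind of result.

```r
# Rounded coefficients copied from the estimated model above (assumption: not refit).
b0 <- -5.415
b1 <- 0.08549

# Hypothetical helper: predicted probability of default at a given debt-to-income ratio.
# plogis(q) computes exp(q) / (1 + exp(q)).
default_prob <- function(debtinc) plogis(b0 + b1 * debtinc)

ratios <- c(0, 30, 60, 90, 120)
print(data.frame(DEBTINC = ratios, prob = round(default_prob(ratios), 4)))
```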
plot(BAD ~ DEBTINC, data = BankLoan, main = "Loan Probabilities", ylab = "Probability of Defaulting on Loan",
     xlab = "Debt-to-Income Ratio", pch = 16)  # observed BAD values against debt-to-income ratio
curve(plogis(coef(bank.glm)[1] + coef(bank.glm)[2] * x), add = TRUE)  # fitted logistic curve
As shown above, the debt-to-income coefficient is significant. However, the model must also fit the data well in order to properly predict the probability that an individual will default on their HELOC. Below, the Hosmer-Lemeshow goodness-of-fit test is performed on the model. The null hypothesis is that the model is a good fit, and the alternative is that it is not. The test is performed at a significance level of \(\alpha = 0.05\) and gives a p-value well below that level. Therefore, there is sufficient evidence to conclude that the model is not a good fit.
library(ResourceSelection)  # provides hoslem.test()
pander(hoslem.test(bank.glm$y, fitted(bank.glm), g = 10))
Test statistic | df | P value |
---|---|---|
108.8 | 8 | < 0.001 *** |
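The p-value in the table prints as zero only because of rounding. A quick sketch, using the reported statistic and degrees of freedom (with `g = 10` groups the test uses 10 - 2 = 8 degrees of freedom), shows how small it actually is:

```r
# Test statistic and degrees of freedom copied from the table above.
hl_stat <- 108.8
hl_df   <- 8

# Upper-tail chi-square probability: the Hosmer-Lemeshow p-value
hl_p <- pchisq(hl_stat, df = hl_df, lower.tail = FALSE)

print(hl_p)
print(hl_p < 0.05)  # TRUE means the null hypothesis of good fit is rejected
```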
To evaluate the effect that a person's debt-to-income ratio has on defaulting on a loan, the estimate of \(\beta_1\) is exponentiated: \(e^{\beta_1} = e^{0.085487} \approx 1.0892\). This shows that the odds that a person will default on their loan increase by a factor of 1.0892 (about 8.9%) for every one-percentage-point increase in their debt-to-income ratio.

Looking at the plot of the regression, once a person's debt-to-income ratio is just above 60%, the probability that they will default on their loan is about 50%. A person reaches nearly a 100% chance of defaulting when their debt-to-income ratio is about 120%, meaning that their debt obligations are 20% higher than their income. While this model is not the best fit, it does make economic sense that someone whose debt exceeds their income will most likely default on their loan.

It is also interesting to look at the other end of the spectrum, at people who have no debt. For them, the model predicts
\[ P(Y_i = 1|x_i = 0) \approx \frac{e^{-5.415+0.08549 \cdot 0}}{1+e^{-5.415 + 0.08549 \cdot 0}} = 0.00443, \text{ or } 0.443 \text{ percent.} \]

In the end, this model needs to be improved by adding other predictor variables, such as the amount of the loan, the person's credit score, and their work history, among others.
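The interpretation above can be reproduced numerically. The sketch below again hard-codes the reported estimates; the Wald-type 95% interval is an addition of this example rather than something computed in the original analysis, and `confint(bank.glm)` on the fitted object would give a profile-likelihood interval that may differ slightly.

```r
# Estimates and standard error copied from the summary table (assumption: not refit).
b0    <- -5.415
b1    <- 0.08549
se_b1 <- 0.007667

# Multiplicative change in the odds of default per one-point increase in DEBTINC
odds_ratio <- exp(b1)

# Approximate Wald 95% confidence interval for that odds ratio
ci <- exp(b1 + c(-1, 1) * qnorm(0.975) * se_b1)

# Odds multiplier for a ten-point increase in DEBTINC
odds_ratio_10 <- exp(10 * b1)

# Debt-to-income ratio at which the predicted probability equals 0.5
half_point <- -b0 / b1

print(round(c(odds_ratio = odds_ratio, lower = ci[1], upper = ci[2]), 4))
print(round(c(odds_ratio_10 = odds_ratio_10, half_point = half_point), 2))
```

The 50% crossing point follows from setting the log-odds \(\beta_0 + \beta_1 x\) to zero, which is why it depends only on the ratio of the two coefficients.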