ANALYSIS: Effect of Credit Limit and Education on Credit Card Default (in Taiwan) --- (PART 2: LOGISTIC REGRESSION MODEL #1)

Sameer Mathur

YOUR BOSS WANTS TO KNOW:

How does the PROBABILITY of credit card DEFAULT depend on the level of EDUCATION of the customer and the CREDIT LIMIT offered to the customer?

(A) How much does the PROBABILITY of credit card default change, if the CREDIT LIMIT of customers without college education (i.e. Education = '1'), is raised by $10,000?

(B) How much does the PROBABILITY of credit card default change, if the CREDIT LIMIT of customers having a bachelor's degree (i.e. Education = '2'), is raised by $10,000?

Part 3: Logistic Regression Model #1

Model 1 -- Framework

$ Default = \beta_0 + \beta_1*CreditLimit + \beta_2*Education $

Model 1 -- Logistic Regression

$ log\frac{p}{1-p} = \beta_0 + \beta_1*CreditLimit + \beta_2*Education2 + \beta_3*Education3 + \beta_4*Education4 $ ….(1)
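Equation (1) relies on two conventions that are easy to miss. The sketch below assumes that Education is stored as a factor with levels "1" to "4" and that R applies its default treatment coding, which is consistent with the coefficient names Education2, Education3 and Education4 in the model summary. Under that assumption, Education = "1" is the baseline level, and the probability is recovered from the log-odds by inverting the logit.

```latex
Education_k = \begin{cases} 1 & \text{if } Education = k \\ 0 & \text{otherwise} \end{cases}, \quad k = 2, 3, 4

p = \frac{exp(\beta_0 + \beta_1*CreditLimit + \beta_2*Education2 + \beta_3*Education3 + \beta_4*Education4)}{1 + exp(\beta_0 + \beta_1*CreditLimit + \beta_2*Education2 + \beta_3*Education3 + \beta_4*Education4)}
```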

# Model 1 -- R Implementation

# fitting logistic regression model 
Model1 <- glm(Default ~ CreditLimit
                      + Education,
                       data = CCdefault.dt, 
                       family = binomial())
# summary of the model
summary(Model1)


Call:
glm(formula = Default ~ CreditLimit + Education, family = binomial(), 
    data = CCdefault.dt)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.5123   0.4299   0.6503   0.7812   0.8824  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  8.089e-01  3.421e-02  23.646  < 2e-16 ***
CreditLimit  3.199e-06  1.307e-07  24.482  < 2e-16 ***
Education2  -7.351e-02  3.284e-02  -2.239 0.025164 *  
Education3  -9.840e-02  4.266e-02  -2.307 0.021059 *  
Education4   1.342e+00  3.909e-01   3.432 0.000599 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 31427  on 29600  degrees of freedom
Residual deviance: 30629  on 29596  degrees of freedom
AIC: 30639

Number of Fisher Scoring iterations: 5

Model 1 -- beta coefficients

$ log\frac{p}{1-p} = \beta_0 + \beta_1*CreditLimit + \beta_2*Education2 + \beta_3*Education3 + \beta_4*Education4 $ ….(1)

$ \beta_0 = 0.8089 $

$ \beta_1 = 0.000003199 $

$ \beta_2 = -0.07351 $

$ \beta_3 = -0.09840 $

$ \beta_4 = 1.342 $

Solution 1

Probability of Default of Consumers, having Education = “1” & Credit Limit = $100,000

$ log\frac{p}{1-p} = \beta_0 + \beta_1*CreditLimit + \beta_2*Education2 + \beta_3*Education3 + \beta_4*Education4 $ ….(1)

$ \beta_0 = 0.8089 $, $ \beta_1 = 0.000003199 $

$ CreditLimit = 100000 $, $ Education = 1 $

$ log\frac{p}{1-p} = 0.8089 + 0.000003199*100000 $

$ log\frac{p}{1-p} = 1.1288 $

$ p = \frac{exp(1.1288)}{1+exp(1.1288)} $

$ p = 0.7556174 $

75.56% Probability of Default among Consumers having Credit Limit = $100,000 & Education = “1”

Solution 1 (contd.)

Probability of Default of Consumers, having Education = “2” & Credit Limit = $100,000

$ log\frac{p}{1-p} = \beta_0 + \beta_1*CreditLimit + \beta_2*Education2 + \beta_3*Education3 + \beta_4*Education4 $

$ \beta_0 = 0.8089 $, $ \beta_1 = 0.000003199 $

$ \beta_2 = -0.07351 $

$ CreditLimit = 100,000 $, $ Education = "2" $

$ log\frac{p}{1-p} = 0.8089 + 0.000003199*100000 - 0.07351*1 $

$ log\frac{p}{1-p} = 1.05529 $

$ p = \frac{exp(1.05529)}{1+exp(1.05529)} $

$ p = 0.7417894 $

74.18% Probability of Default among Consumers having Education = “2” & Credit Limit = $100,000.
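The two hand calculations above can be reproduced in a few lines of R. This is a minimal sketch that hard-codes the rounded coefficients from the model summary; the helper name `default_prob` is hypothetical and not part of the original analysis. `plogis()` is base R's inverse logit, exp(x) / (1 + exp(x)).

```r
# Rounded coefficients copied from the summary(Model1) output
b0 <- 0.8089       # intercept
b1 <- 3.199e-06    # CreditLimit

# Education effects; level "1" is the baseline, so its effect is 0
edu_effect <- c("1" = 0, "2" = -0.07351, "3" = -0.09840, "4" = 1.342)

# Hypothetical helper: predicted probability from the rounded coefficients
default_prob <- function(credit_limit, education) {
  logit <- b0 + b1 * credit_limit + edu_effect[[as.character(education)]]
  plogis(logit)
}

default_prob(100000, "1")
default_prob(100000, "2")
```

Because the coefficients are rounded to four significant digits, results from this helper can differ from `predict()` on the fitted model in the sixth decimal place.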

# creating single value dataframe
Education2 <- data.frame(CreditLimit = 100000, Education = "2")

# predicting probability
probability2 <- predict(Model1, Education2, type= "response")
probability2

        1 
0.7417904

74.18% : Probability of Default if Education = “2” & Credit Limit = $100,000.

# creating single value dataframe
Education1 <- data.frame(CreditLimit = 100000, Education = "1")

# predicting probability
probability1 <- predict(Model1, Education1, type= "response")
probability1

       1 
0.755619

75.56% : Probability of Default if Education = “1” & Credit Limit = $100,000.

Recall.. YOUR BOSS WANTS TO KNOW:

(A) How much does the PROBABILITY of credit card default change, if the CREDIT LIMIT of customers without college education (i.e. Education = '1'), is raised by $10,000?

(B) How much does the PROBABILITY of credit card default change, if the CREDIT LIMIT of customers having a bachelor's degree (i.e. Education = '2'), is raised by $10,000?

Summary (so far)

74.18% : Probability of Default if Education = “2” & Credit = $100,000.

75.56% : Probability of Default if Education = “1” & Credit = $100,000.

What happens if we raise the credit limit by $10,000 to $110,000?

Education11 <- data.frame(CreditLimit = 110000, Education = "1")
probability11 <- predict(Model1, Education11, type= "response")
probability11

        1 
0.7614777

76.15% : Probability of Default if Education = “1” & Credit Limit = $110,000.

Education12 <- data.frame(CreditLimit = 110000, Education = "2")
probability12 <- predict(Model1, Education12, type= "response")
probability12

        1 
0.7478701

74.78% : Probability of Default if Education = “2” & Credit Limit = $110,000.

What happens if we raise the credit limit by $10,000 to $110,000?

EDUCATION = '1' (No College Education)

Probability of Default rises from 75.56% to 76.15%

EDUCATION = '2' (College Graduates)

Probability of Default rises from 74.18% to 74.78%
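The size of each rise can be read off directly from the four predicted probabilities reported earlier. The sketch below only restates those numbers; the variable names are illustrative.

```r
# Predicted probabilities reported by predict() in this document
p_100k <- c(Education1 = 0.755619,  Education2 = 0.7417904)
p_110k <- c(Education1 = 0.7614777, Education2 = 0.7478701)

# Change in probability, expressed in percentage points
change_pp <- 100 * (p_110k - p_100k)
round(change_pp, 2)
```

The rise is about 0.59 percentage points for Education = "1" and about 0.61 percentage points for Education = "2".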

What is the problem with this approach?

The rise in the probability of default from raising the credit limit by $10,000 depends on the initial credit limit, because the logistic curve is not a straight line

=> No definitive conclusion
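To see how much the answer moves, the same $10,000 increase can be applied at several starting credit limits. This is a sketch for the baseline group (Education = "1") using the rounded coefficients from the model summary; the starting limits are arbitrary illustrative values, not taken from the data.

```r
# Rounded coefficients copied from the summary(Model1) output
b0 <- 0.8089       # intercept (Education = "1" is the baseline)
b1 <- 3.199e-06    # CreditLimit

# Illustrative starting credit limits (assumed values)
start_limit <- c(50000, 100000, 300000, 500000)

p_before <- plogis(b0 + b1 * start_limit)
p_after  <- plogis(b0 + b1 * (start_limit + 10000))

# Change in probability, in percentage points, for the same $10,000 increase
change_pp <- 100 * (p_after - p_before)
data.frame(start_limit, p_before, p_after, change_pp)
```

With these coefficients every predicted probability is above 0.5, so the increase shrinks as the starting limit grows.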

Could some simple Calculus help?

We want to measure the rate of change of the log-odds of default with respect to the credit limit

Recall Model 1

\[ log\frac{p}{1-p} = \beta_0 + \beta_1*CreditLimit + \beta_2*Education2 + \beta_3*Education3 + \beta_4*Education4 \]

The Boss wants us to measure..

\[ \frac{\partial}{\partial (CreditLimit)}log\frac{p}{1-p} \]

Recall that the boss wanted to know by how much the probability of default will change if the Credit Limit is raised by $10,000

Notice..

\[ \frac{\partial}{\partial (CreditLimit)}log\frac{p}{1-p} = \beta_1 \]

In the model

\[ log\frac{p}{1-p} = \beta_0 + \beta_1*CreditLimit + \beta_2*Education2 + \beta_3*Education3 + \beta_4*Education4 \]

All the other terms are independent of CreditLimit

Taking the derivative is easy

On substitution..

\[ \frac{\partial}{\partial (CreditLimit)}log\frac{p}{1-p} = \beta_1 \]

\[ \partial log\frac{p}{1-p} = \beta_1* \partial (CreditLimit) \]

Now, we can substitute..

\[ \partial log\frac{p}{1-p} = 0.000003199*10000 \]

\[ \partial log\frac{p}{1-p} = 0.03199 \]

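\[ \frac{odds_{new}}{odds_{old}} = exp(0.03199) = 1.0325 \]

The odds of default are multiplied by 1.0325 (a rise of about 3.25%), if the credit limit rises by $10,000

The matching change in probability is approximately $ \beta_1*p*(1-p)*10000 $, so it depends on the starting probability $ p $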
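A change in log-odds of 0.03199 can be translated into odds and probability terms as follows. This is a sketch, not the author's method: it uses the rounded CreditLimit coefficient and the standard first-order approximation dp = beta1 * p * (1 - p) * dCreditLimit, evaluated at the two starting probabilities reported by `predict()` at a $100,000 limit.

```r
beta1       <- 3.199e-06   # CreditLimit coefficient from the model summary
delta_limit <- 10000       # proposed increase in the credit limit

# Change in log-odds and the implied multiplier on the odds of default
delta_logit     <- beta1 * delta_limit
odds_multiplier <- exp(delta_logit)

# Starting probabilities at CreditLimit = 100000, as reported by predict()
p_start <- c(Education1 = 0.755619, Education2 = 0.7417904)

# First-order approximation of the change in probability
approx_change <- beta1 * p_start * (1 - p_start) * delta_limit

odds_multiplier
round(100 * approx_change, 2)   # in percentage points
```

The approximation gives roughly 0.59 and 0.61 percentage points, in line with the differences between the predicted probabilities at $100,000 and $110,000.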

What is the problem with this approach?

The estimated effect of Credit Limit on the log-odds of Default, $ \beta_1 $, is INDEPENDENT of the Education Level

The answer is the SAME regardless of whether the consumer does not have a college degree (Education = '1') or has a bachelor's degree (Education = '2')
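This can be checked against the predicted probabilities reported earlier: converting them back to log-odds should give the same change for both education levels. The sketch below uses base R's `qlogis()`, which computes log(p / (1 - p)); the variable names are illustrative.

```r
# Predicted probabilities reported by predict() in this document
p_edu1 <- c(limit_100k = 0.755619,  limit_110k = 0.7614777)
p_edu2 <- c(limit_100k = 0.7417904, limit_110k = 0.7478701)

# Change in log-odds when the limit goes from $100,000 to $110,000
logit_change_edu1 <- diff(qlogis(p_edu1))
logit_change_edu2 <- diff(qlogis(p_edu2))

unname(c(logit_change_edu1, logit_change_edu2))
```

Both values should be close to 0.03199 (beta_1 times 10000), even though the two groups start from different probabilities. Model 1 has no CreditLimit-by-Education interaction term, so it cannot produce a different log-odds effect for each education level.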