Which Factors Influence the Credit Card Balance of a Cardholder?

Introduction

Credit cards offer you a line of credit that can be used for purchases, balance transfers, and/or cash advances, and they require that you pay back the borrowed amount in the future. When using a credit card, you need to make at least the minimum payment on the balance by the due date every month. If the full balance for purchases is not paid off, interest charges are applied. For balance transfers and cash advances, interest is charged from the date of the transaction (U.S. Bank).

About the Dataset

This dataset is the Credit dataset that accompanies “An Introduction to Statistical Learning with Applications in R”, available at http://www-bcf.usc.edu/~gareth/ISL/index.html. The dataset consists of 400 observations and 12 variables.

Method of Analysis

We will use the “Zelig” package to fit linear models (Zelig's "normal" model) and to simulate from them. The estimated coefficients and the simulations help us interpret which factors are associated with credit card balance.
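As a rough illustration of what the "normal" model amounts to, the sketch below fits the same kind of model with base R only. It assumes that Zelig's "normal" model is a Gaussian GLM, which the "gaussian family" line in the summaries further down suggests. The data frame `toy` and its variables are synthetic stand-ins, not the Credit data.

```r
set.seed(42)

# Synthetic stand-in data (hypothetical; not the Credit dataset).
n <- 100
limit <- runif(n, 1000, 10000)
balance <- -300 + 0.17 * limit + rnorm(n, sd = 200)
toy <- data.frame(limit, balance)

# Ordinary least squares and a Gaussian GLM with the default identity link.
fit_lm  <- lm(balance ~ limit, data = toy)
fit_glm <- glm(balance ~ limit, family = gaussian(), data = toy)

# The two fits should agree up to numerical precision.
coef_gap <- max(abs(coef(fit_lm) - coef(fit_glm)))
print(coef_gap)
```

Under that assumption, the coefficients reported in this analysis can be read the same way as ordinary linear regression coefficients.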

library (Zelig)
library (texreg)
library(ISLR)
library(dplyr)
data(Credit)
head(Credit)
str(Credit)
## 'data.frame':    400 obs. of  12 variables:
##  $ ID       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Income   : num  14.9 106 104.6 148.9 55.9 ...
##  $ Limit    : int  3606 6645 7075 9504 4897 8047 3388 7114 3300 6819 ...
##  $ Rating   : int  283 483 514 681 357 569 259 512 266 491 ...
##  $ Cards    : int  2 3 4 3 2 4 2 2 5 3 ...
##  $ Age      : int  34 82 71 36 68 77 37 87 66 41 ...
##  $ Education: int  11 15 11 11 16 10 12 9 13 19 ...
##  $ Gender   : Factor w/ 2 levels " Male","Female": 1 2 1 2 1 1 2 1 2 2 ...
##  $ Student  : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 1 2 ...
##  $ Married  : Factor w/ 2 levels "No","Yes": 2 2 1 1 2 1 1 1 1 2 ...
##  $ Ethnicity: Factor w/ 3 levels "African American",..: 3 2 2 2 3 3 1 2 3 1 ...
##  $ Balance  : int  333 903 580 964 331 1151 203 872 279 1350 ...
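# Recode marital status as a 0/1 variable.
# as.integer(Married) gives No = 1 and Yes = 2; the rule "2=0; 1=1" then maps
# Yes -> 0 and No -> 1, so Marrital_status = 1 means NOT married.
Credit <- Credit %>%
  mutate(Marry = as.integer(Married)) %>%
  mutate(Marrital_status = sjmisc::rec(Marry, rec = "2=0; 1=1")) %>%
  select(ID, Income, Married, Marrital_status, everything())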
head(Credit)
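The direction of the recode matters for everything that follows, so it is worth checking on a small example. The sketch below reproduces the same mapping in base R on a hypothetical four-element vector; `ifelse()` is used here as a stand-in for the `sjmisc::rec()` rule and is not the call used in the analysis.

```r
# Married has levels "No" and "Yes", so as.integer() yields No = 1, Yes = 2.
married <- factor(c("Yes", "Yes", "No", "No"), levels = c("No", "Yes"))
marry_int <- as.integer(married)

# Base-R equivalent of the rule "2=0; 1=1": Yes (2) -> 0, No (1) -> 1.
marital_status <- ifelse(marry_int == 2, 0L, 1L)

# Side-by-side view of the original factor and the recoded values.
check <- data.frame(married, marry_int, marital_status)
print(check)
```

With this coding, a value of 1 corresponds to Married = "No" and a value of 0 to Married = "Yes".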
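Simple Linear Model

We first model the average credit card balance as a function of the credit limit alone. Our dependent variable is Balance and our independent variable is Limit. The effect of Limit is highly significant (p < 2e-16): each one-unit increase in the credit limit is associated with an increase of about 0.17 in the average balance. The intercept of -292.8 is the extrapolated balance at a limit of zero and has no practical interpretation.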

C1 <- zelig(Balance ~ Limit, model = "normal", data = Credit, cite = F)
summary(C1)
## Model: 
## 
## Call:
## z5$zelig(formula = Balance ~ Limit, data = Credit)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -676.95  -141.87   -11.55   134.11   776.44  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.928e+02  2.668e+01  -10.97   <2e-16
## Limit        1.716e-01  5.066e-03   33.88   <2e-16
## 
## (Dispersion parameter for gaussian family taken to be 54561.95)
## 
##     Null deviance: 84339912  on 399  degrees of freedom
## Residual deviance: 21715657  on 398  degrees of freedom
## AIC: 5502
## 
## Number of Fisher Scoring iterations: 2
## 
## Next step: Use 'setx' method
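To make the coefficients concrete, the short calculation below plugs a few credit limits into the fitted line. The coefficients are copied from the summary(C1) output; the limit values are illustrative choices, not values taken from the analysis.

```r
# Coefficients copied from the summary(C1) output.
intercept <- -292.8
slope_limit <- 0.1716

# Expected balance at a few illustrative credit limits (hypothetical values).
limits <- c(2000, 5000, 10000)
expected_balance <- intercept + slope_limit * limits
print(data.frame(limits, expected_balance))

# Change in expected balance for a 1,000-unit increase in the limit.
change_per_1000 <- slope_limit * 1000
print(change_per_1000)
```

A cardholder with a limit of 5,000 is therefore expected to carry a balance of about 565, and every additional 1,000 of limit adds about 172 to the expected balance.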
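Multiple Linear Model

We then ran the analysis with more independent variables. Rating and Income are highly significant (p < 0.001), while Limit (p = 0.067) and Age (p = 0.074) are significant only at the 10% level. Holding the other variables constant, a one-unit increase in Income is associated with a decrease of about 7.6 in the average balance.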

C2 <- zelig (Balance ~ Limit + Rating + Income + Age, model = "normal", data = Credit, cite = F)
summary(C2)
## Model: 
## 
## Call:
## z5$zelig(formula = Balance ~ Limit + Rating + Income + Age, data = Credit)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -249.62  -110.89   -39.98    51.87   546.52  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -445.10477   40.57635 -10.970  < 2e-16
## Limit          0.08183    0.04461   1.834   0.0674
## Rating         2.73142    0.66435   4.111 4.79e-05
## Income        -7.61268    0.38169 -19.945  < 2e-16
## Age           -0.85612    0.47841  -1.789   0.0743
## 
## (Dispersion parameter for gaussian family taken to be 26212.8)
## 
##     Null deviance: 84339912  on 399  degrees of freedom
## Residual deviance: 10354055  on 395  degrees of freedom
## AIC: 5211.7
## 
## Number of Fisher Scoring iterations: 2
## 
## Next step: Use 'setx' method
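The two summaries report deviances rather than R-squared. For a Gaussian model with an identity link the residual deviance equals the residual sum of squares, so R-squared can be recovered as 1 minus the ratio of residual to null deviance. The numbers below are copied from the two outputs; the variable names are only for illustration.

```r
# Deviances copied from the summary(C1) and summary(C2) outputs.
null_dev <- 84339912
resid_dev <- c(C1 = 21715657, C2 = 10354055)

# For a Gaussian model, 1 - residual/null deviance is the usual R-squared.
r_squared <- 1 - resid_dev / null_dev
print(round(r_squared, 3))

# Drop in AIC when moving from C1 to C2 (values copied from the outputs).
aic_drop <- 5502 - 5211.7
print(aic_drop)
```

Adding Rating, Income, and Age raises the share of explained variance from about 74% to about 88%, and the lower AIC points the same way.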
Age effect

From the graph below, the expected credit card balance is highest for the youngest cardholders, at around age 20, and declines steadily as age increases. The effect is small (about 0.86 lower per additional year of age) and significant only at the 10% level in this model. One possible explanation is that people aged roughly 40 and below have family and other expenses, so they are more likely to rely on their credit cards, but this dataset does not let us test that.

A.range = min(Credit$Age):max(Credit$Age)
x <- setx(C2, Age = A.range)
s <- sim(C2, x = x)
plot(s)
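For readers who want to see roughly what this kind of simulation involves, here is a simplified base-R sketch. It assumes the procedure amounts to drawing coefficient vectors from a multivariate normal centered on the estimates and evaluating them over a range of ages while other covariates are held at their means; Zelig's internals may differ in detail. The data frame `dat` and all variable names are synthetic and hypothetical.

```r
set.seed(1)

# Synthetic stand-in data (hypothetical; not the Credit dataset).
n <- 200
dat <- data.frame(age = sample(20:90, n, replace = TRUE),
                  income = runif(n, 10, 150))
dat$balance <- 500 - 1 * dat$age + 2 * dat$income + rnorm(n, sd = 50)

fit <- lm(balance ~ age + income, data = dat)

# Draw coefficient vectors from N(beta_hat, V) using a Cholesky factor of V.
n_sims <- 1000
beta_hat <- coef(fit)
chol_v <- chol(vcov(fit))                       # upper triangular R, t(R) %*% R = V
z <- matrix(rnorm(n_sims * length(beta_hat)), nrow = n_sims)
draws <- sweep(z %*% chol_v, 2, beta_hat, "+")  # one simulated coefficient vector per row

# Expected balance over an age range, holding income at its mean.
ages <- 20:90
x_mat <- cbind(1, ages, mean(dat$income))
ev <- draws %*% t(x_mat)                        # n_sims rows, one column per age

# Mean and 95% interval of the expected value at each age.
ev_summary <- data.frame(age = ages,
                         mean = colMeans(ev),
                         lower = apply(ev, 2, quantile, 0.025),
                         upper = apply(ev, 2, quantile, 0.975))
print(head(ev_summary))
```

Plotting `mean`, `lower`, and `upper` against `age` gives a band of the same kind as the one in the Zelig plot.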

Interaction Model

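We add an interaction between Rating and Income, because the relationship between credit rating and balance may depend on income: the higher the income, the more economically stable a cardholder is likely to be. We also add Cards and Marrital_status as predictors.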

C3 <- zelig (Balance ~ Limit + Rating*Income + Cards + Age + Marrital_status, model = "normal", data = Credit, cite = F)
summary(C3)
## Model: 
## 
## Call:
## z5$zelig(formula = Balance ~ Limit + Rating * Income + Cards + 
##     Age + Marrital_status, data = Credit)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -220.14  -104.22   -40.07    49.84   531.93  
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -3.770e+02  4.871e+01  -7.740 8.55e-14
## Limit            1.321e-01  5.252e-02   2.515  0.01230
## Rating           1.806e+00  7.923e-01   2.279  0.02322
## Income          -9.638e+00  7.684e-01 -12.543  < 2e-16
## Cards            1.116e+01  6.975e+00   1.599  0.11057
## Age             -9.622e-01  4.727e-01  -2.036  0.04244
## Marrital_status  3.163e+01  1.648e+01   1.919  0.05571
## Rating:Income    3.670e-03  1.174e-03   3.125  0.00191
## 
## (Dispersion parameter for gaussian family taken to be 25375.9)
## 
##     Null deviance: 84339912  on 399  degrees of freedom
## Residual deviance:  9947353  on 392  degrees of freedom
## AIC: 5201.7
## 
## Number of Fisher Scoring iterations: 2
## 
## Next step: Use 'setx' method
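With an interaction in the model, the effect of Rating is no longer a single number: it equals the Rating coefficient plus the interaction coefficient times Income, and the same holds for Income with the roles swapped. The calculation below uses the coefficients from the summary(C3) output; the Income and Rating values are illustrative choices in the dataset's own units.

```r
# Coefficients copied from the summary(C3) output.
b_rating <- 1.806
b_income <- -9.638
b_interaction <- 0.00367

# Effect of a one-unit increase in Rating at illustrative Income values.
income_levels <- c(20, 50, 100)
rating_effect <- b_rating + b_interaction * income_levels
print(data.frame(income_levels, rating_effect))

# Effect of a one-unit increase in Income at illustrative Rating values.
rating_levels <- c(200, 400, 600)
income_effect <- b_income + b_interaction * rating_levels
print(data.frame(rating_levels, income_effect))
```

The positive interaction means that Rating has a somewhat stronger positive effect at higher incomes, and that the negative effect of Income weakens as Rating increases.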
Rating Effect

From the graph below, we can see that the expected credit card balance rises as the credit rating increases. This may be because a bank is likely to raise the credit limit of a customer who has a good credit rating.

R.range = min(Credit$Rating):max(Credit$Rating)
x <- setx(C3, Rating= R.range)
s <- sim(C3, x = x)
plot(s)

Marital Difference

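The simulated first difference (fd) shows that the expected balance at Marrital_status = 1 is about 31.3 higher than at Marrital_status = 0, with a 95% interval from roughly 1.2 to 64.5. Because the recoding maps Married = "No" to 1 and Married = "Yes" to 0, this means that unmarried cardholders are expected to carry the higher balance.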

x <- setx(C3, Marrital_status = 0 )
x1 <- setx(C3, Marrital_status = 1)
s <- sim(C3, x = x, x1 = x1)
summary(s)
## 
##  sim x :
##  -----
## ev
##          mean       sd      50%     2.5%    97.5%
## [1,] 492.7186 11.38166 492.6557 471.8985 514.9955
## pv
##         mean       sd      50%     2.5%    97.5%
## [1,] 484.038 160.6703 477.9754 175.2448 810.8603
## 
##  sim x1 :
##  -----
## ev
##          mean       sd      50%     2.5%    97.5%
## [1,] 524.0239 14.39466 523.6531 497.3755 552.9161
## pv
##          mean       sd      50%     2.5%    97.5%
## [1,] 521.8455 162.4627 531.1532 206.8242 838.7846
## fd
##          mean       sd      50%     2.5%    97.5%
## [1,] 31.30525 16.56112 30.79226 1.168131 64.45938
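The first difference is simply the gap between the two expected values, which can be checked by hand from the numbers in the simulation summary. The values below are copied from that output; the variable names are only for illustration.

```r
# Mean expected values copied from the simulation summary.
ev_x  <- 492.7186   # Marrital_status = 0
ev_x1 <- 524.0239   # Marrital_status = 1

# The first difference is the gap between the two expected values.
first_difference <- ev_x1 - ev_x
print(first_difference)

# 95% simulation interval for the first difference, copied from the fd row.
fd_interval <- c(lower = 1.168131, upper = 64.45938)
excludes_zero <- fd_interval[["lower"]] > 0
print(excludes_zero)
```

The result agrees with the reported fd mean of about 31.3, and the interval lies just above zero.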
First Difference

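Here we extract the simulated first differences and summarize their distribution; the mean of 31.31 matches the fd row in the simulation summary. From the graphs we can see that the expected and predicted values for the two groups overlap considerably, so the difference is small relative to the spread of individual balances.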

fd <- s$get_qi(xvalue="x1", qi="fd")
summary(fd)
##        V1        
##  Min.   :-23.69  
##  1st Qu.: 19.63  
##  Median : 30.79  
##  Mean   : 31.31  
##  3rd Qu.: 42.55  
##  Max.   : 77.87
plot(s)
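The simulated first difference can be set against the regression table by recomputing the p-value for Marrital_status from its estimate and standard error. The inputs below are copied from the summary(C3) output; the sketch assumes the reported p-value comes from a two-sided t-test on the residual degrees of freedom.

```r
# Estimate, standard error, and residual degrees of freedom from summary(C3).
estimate <- 31.63
std_error <- 16.48
df_resid <- 392

# Two-sided t-test of the coefficient against zero.
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)
print(c(t_value = t_value, p_value = p_value))
```

Both views tell a similar story: the marital-status effect is borderline, close to the conventional 5% threshold.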

Conclusion

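The models show that several factors are significantly associated with credit card balance. In the final model, Income, Rating, and their interaction are highly significant, Limit and Age are significant at the 5% level, marital status is borderline (p = 0.056), and the number of cards is not significant (p = 0.11). The simulations show how the expected balance changes with each factor. Further analysis needs to be conducted.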