Credit cards offer a line of credit that can be used for purchases, balance transfers and/or cash advances, and they require you to pay back the borrowed amount in the future. When using a credit card, you must make at least the minimum payment on the balance by the due date every month. If the full purchase balance is not paid off, interest charges are applied. For balance transfers and cash advances, interest is charged from the date of the transaction (U.S. Bank).
The Credit dataset used here is part of "An Introduction to Statistical Learning with Applications in R", available at http://www-bcf.usc.edu/~gareth/ISL/index.html, and is loaded below from the ISLR package. The dataset consists of 400 observations of 12 variables.
We will use the Zelig package to fit linear regression models (Zelig's "normal" model) and to simulate quantities of interest from them. The coefficients and simulations help us interpret which factors influence credit card balance.
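The model summaries printed later in this analysis report a "dispersion parameter for gaussian family" and "Fisher Scoring iterations". This indicates that Zelig's normal model is fitted as a Gaussian GLM, whose point estimates coincide with ordinary least squares. The sketch below illustrates that equivalence on hypothetical toy data, not on the Credit dataset, using only base R `lm()` and `glm()`. It is not Zelig's internal code.

```r
# Hypothetical toy data (NOT the Credit dataset): a balance-like outcome
# that grows roughly linearly with a limit-like predictor.
set.seed(42)
limit   <- runif(200, min = 1000, max = 10000)
balance <- -300 + 0.17 * limit + rnorm(200, mean = 0, sd = 230)

fit_lm  <- lm(balance ~ limit)                        # ordinary least squares
fit_glm <- glm(balance ~ limit, family = gaussian())  # Gaussian GLM, identity link

# The two fits give the same coefficient estimates ...
all.equal(coef(fit_lm), coef(fit_glm))
# ... and the GLM dispersion parameter equals the residual variance sigma^2
all.equal(summary(fit_glm)$dispersion, summary(fit_lm)$sigma^2)
```

This is also why the "dispersion parameter" in the Zelig output can be read as the estimated residual variance of the linear model.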
library(Zelig)
library(texreg)
library(ISLR)
library(dplyr)
data(Credit)
head(Credit)
str(Credit)
## 'data.frame': 400 obs. of 12 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Income : num 14.9 106 104.6 148.9 55.9 ...
## $ Limit : int 3606 6645 7075 9504 4897 8047 3388 7114 3300 6819 ...
## $ Rating : int 283 483 514 681 357 569 259 512 266 491 ...
## $ Cards : int 2 3 4 3 2 4 2 2 5 3 ...
## $ Age : int 34 82 71 36 68 77 37 87 66 41 ...
## $ Education: int 11 15 11 11 16 10 12 9 13 19 ...
## $ Gender : Factor w/ 2 levels " Male","Female": 1 2 1 2 1 1 2 1 2 2 ...
## $ Student : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 1 2 ...
## $ Married : Factor w/ 2 levels "No","Yes": 2 2 1 1 2 1 1 1 1 2 ...
## $ Ethnicity: Factor w/ 3 levels "African American",..: 3 2 2 2 3 3 1 2 3 1 ...
## $ Balance : int 333 903 580 964 331 1151 203 872 279 1350 ...
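# Recode Married (factor with levels "No", "Yes") as a 0/1 integer.
# as.integer() gives No = 1 and Yes = 2; rec() then maps 2 -> 0 and 1 -> 1,
# so Marrital_status = 1 for unmarried and 0 for married cardholders.
# sjmisc must be installed; it is called via sjmisc:: without library().
Credit <- Credit %>%
  mutate(Marry = as.integer(Married)) %>%
  mutate(Marrital_status = sjmisc::rec(Marry, rec = "2=0; 1=1")) %>%
  select(ID, Income, Married, Marrital_status, everything())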
head(Credit)
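The direction of this recoding matters when reading the Marrital_status coefficient later. The base-R sketch below applies the same mapping to a hypothetical four-element toy vector, standing in for `Credit$Married`. It shows that `as.integer()` returns factor level indices, and that the rule "2=0; 1=1" turns "Yes" (married) into 0 and "No" (unmarried) into 1. The `ifelse()` line is an assumed equivalent of the `sjmisc::rec()` call, written without sjmisc.

```r
# Hypothetical toy vector standing in for Credit$Married (levels "No", "Yes")
married <- factor(c("Yes", "No", "No", "Yes"), levels = c("No", "Yes"))

# as.integer() on a factor returns the level index, not a 0/1 dummy
marry <- as.integer(married)   # "No" -> 1, "Yes" -> 2

# Base-R equivalent of sjmisc::rec(marry, rec = "2=0; 1=1")
marital_status <- ifelse(marry == 2L, 0L, 1L)

data.frame(married, marry, marital_status)
```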
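We first predict the average credit card balance from the credit limit alone. The dependent variable is Balance and the independent variable is Limit. The output below shows that Limit is highly significant (t = 33.88, p < 2e-16). Each additional dollar of credit limit is associated with an increase of about $0.17 in average balance, or roughly $17 per $100 of limit. The intercept (-292.8) is the predicted balance at a limit of zero. That lies outside the range of observed limits, so it has no practical meaning.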
C1 <- zelig(Balance ~ Limit, model = "normal", data = Credit, cite = FALSE)
summary(C1)
## Model:
##
## Call:
## z5$zelig(formula = Balance ~ Limit, data = Credit)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -676.95 -141.87 -11.55 134.11 776.44
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.928e+02 2.668e+01 -10.97 <2e-16
## Limit 1.716e-01 5.066e-03 33.88 <2e-16
##
## (Dispersion parameter for gaussian family taken to be 54561.95)
##
## Null deviance: 84339912 on 399 degrees of freedom
## Residual deviance: 21715657 on 398 degrees of freedom
## AIC: 5502
##
## Number of Fisher Scoring iterations: 2
##
## Next step: Use 'setx' method
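To make the coefficients concrete, the sketch below plugs two illustrative limits ($5,000 and $6,000, chosen here as examples) into the fitted line. It uses the rounded estimates copied from the `summary(C1)` output. The predicted balance at a $5,000 limit is 565.2, and the extra $1,000 of limit adds 171.6 to the expected balance.

```r
# Rounded coefficients copied from the summary(C1) output above
b0 <- -292.8   # (Intercept)
b1 <- 0.1716   # Limit

# Expected balance implied by the simple linear model
predicted_balance <- function(limit) b0 + b1 * limit

predicted_balance(5000)                            # expected balance at a $5,000 limit
predicted_balance(6000) - predicted_balance(5000)  # change for +$1,000 of limit
```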
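Next, we fit a multiple regression with more independent variables: Limit, Rating, Income and Age. In the output below, Rating (p = 4.79e-05) and Income (p < 2e-16) are highly significant. Limit (p = 0.067) and Age (p = 0.074) are not significant at the 5% level. Limit loses most of its significance once Rating is added, likely because the two variables are very strongly correlated in this dataset. That correlation inflates Limit's standard error, which is 0.045 here versus 0.005 in the first model. Holding the other variables fixed, higher Income is associated with a lower balance (-7.61 per unit of Income). The AIC also drops from 5502 to 5211.7, indicating a better fit than the single-predictor model.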
C2 <- zelig(Balance ~ Limit + Rating + Income + Age, model = "normal", data = Credit, cite = FALSE)
summary(C2)
## Model:
##
## Call:
## z5$zelig(formula = Balance ~ Limit + Rating + Income + Age, data = Credit)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -249.62 -110.89 -39.98 51.87 546.52
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -445.10477 40.57635 -10.970 < 2e-16
## Limit 0.08183 0.04461 1.834 0.0674
## Rating 2.73142 0.66435 4.111 4.79e-05
## Income -7.61268 0.38169 -19.945 < 2e-16
## Age -0.85612 0.47841 -1.789 0.0743
##
## (Dispersion parameter for gaussian family taken to be 26212.8)
##
## Null deviance: 84339912 on 399 degrees of freedom
## Residual deviance: 10354055 on 395 degrees of freedom
## AIC: 5211.7
##
## Number of Fisher Scoring iterations: 2
##
## Next step: Use 'setx' method
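For a Gaussian GLM, the null deviance is the total sum of squares and the residual deviance is the residual sum of squares. The usual R² can therefore be recovered from the printed output as 1 - residual deviance / null deviance. The sketch below applies this to the deviances copied from `summary(C1)` and `summary(C2)`. It gives roughly 0.74 for C1 and 0.88 for C2, consistent with the drop in AIC.

```r
# Deviances copied from the summary(C1) and summary(C2) outputs above.
# For a Gaussian GLM: null deviance = total SS, residual deviance = residual SS.
null_dev  <- 84339912
resid_dev <- c(C1 = 21715657, C2 = 10354055)

# Proportion of variance in Balance explained by each model
r_squared <- 1 - resid_dev / null_dev
round(r_squared, 3)
```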
The graph below shows simulated expected balances across the observed range of Age, with the other covariates in C2 held at their default values. Expected balance declines slowly as age increases, by about $0.86 per additional year according to the Age coefficient, although this effect is not significant at the 5% level (p = 0.074). One possible explanation is that people aged 40 and below are more likely to have families and other expenses, so they may rely more on their credit cards.
A.range <- min(Credit$Age):max(Credit$Age)
x <- setx(C2, Age = A.range)
s <- sim(C2, x = x)
plot(s)
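The `setx()`/`sim()` step draws many plausible coefficient vectors and computes an expected balance for each scenario. The sketch below is a simplified illustration of that parameter-simulation idea, not Zelig's internal code. It uses hypothetical toy data with a single Age-like predictor and draws coefficients from a multivariate normal with mean `coef(fit)` and covariance `vcov(fit)`, built with a Cholesky factor in base R. It then summarizes the expected value at each age with a mean and a 95% interval.

```r
# Hypothetical toy data (NOT the Credit dataset): balance declining slightly with age
set.seed(1)
age     <- sample(25:85, 300, replace = TRUE)
balance <- 600 - 0.9 * age + rnorm(300, sd = 160)
fit <- lm(balance ~ age)

# 1. Draw coefficient vectors from N(beta_hat, V) via a Cholesky factor
n_sims   <- 2000
beta_hat <- coef(fit)
V        <- vcov(fit)
z <- matrix(rnorm(n_sims * length(beta_hat)), nrow = n_sims)
beta_draws <- sweep(z %*% chol(V), 2, beta_hat, "+")   # n_sims x 2

# 2. Scenario matrix: one row per age, with a column of 1s for the intercept
ages <- 25:85
X <- cbind(1, ages)

# 3. Expected value for every draw and scenario, summarized per age
ev <- beta_draws %*% t(X)                              # n_sims x length(ages)
ev_summary <- data.frame(
  age   = ages,
  mean  = colMeans(ev),
  lower = apply(ev, 2, quantile, probs = 0.025),
  upper = apply(ev, 2, quantile, probs = 0.975)
)
head(ev_summary)
```

Plotting `mean`, `lower` and `upper` against `age` gives the same kind of line-with-band picture as `plot(s)`.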
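We next add an interaction between Rating and Income. The reasoning is that the effect of a credit rating on balance may depend on income: the higher the income, the more economically stable a cardholder is likely to be. This model also adds Cards and Marrital_status.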
C3 <- zelig(Balance ~ Limit + Rating*Income + Cards + Age + Marrital_status, model = "normal", data = Credit, cite = FALSE)
summary(C3)
## Model:
##
## Call:
## z5$zelig(formula = Balance ~ Limit + Rating * Income + Cards +
## Age + Marrital_status, data = Credit)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -220.14 -104.22 -40.07 49.84 531.93
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.770e+02 4.871e+01 -7.740 8.55e-14
## Limit 1.321e-01 5.252e-02 2.515 0.01230
## Rating 1.806e+00 7.923e-01 2.279 0.02322
## Income -9.638e+00 7.684e-01 -12.543 < 2e-16
## Cards 1.116e+01 6.975e+00 1.599 0.11057
## Age -9.622e-01 4.727e-01 -2.036 0.04244
## Marrital_status 3.163e+01 1.648e+01 1.919 0.05571
## Rating:Income 3.670e-03 1.174e-03 3.125 0.00191
##
## (Dispersion parameter for gaussian family taken to be 25375.9)
##
## Null deviance: 84339912 on 399 degrees of freedom
## Residual deviance: 9947353 on 392 degrees of freedom
## AIC: 5201.7
##
## Number of Fisher Scoring iterations: 2
##
## Next step: Use 'setx' method
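With the Rating:Income interaction, the slope of Balance with respect to Rating is no longer a single number. It is 1.806 + 0.00367 × Income, and the slope with respect to Income is -9.638 + 0.00367 × Rating. The sketch below evaluates both at a few illustrative values chosen here, using the rounded coefficients copied from `summary(C3)`. The Rating slope grows from about 1.88 to 2.17 as Income goes from 20 to 100. The negative Income slope shrinks in magnitude as Rating rises.

```r
# Rounded coefficients copied from the summary(C3) output above
b_rating   <- 1.806     # Rating
b_income   <- -9.638    # Income
b_interact <- 0.00367   # Rating:Income

# Marginal effect of Rating depends on Income, and vice versa
rating_slope <- function(income) b_rating + b_interact * income
income_slope <- function(rating) b_income + b_interact * rating

rating_slope(c(20, 50, 100))   # illustrative Income values
income_slope(c(300, 600))      # illustrative Rating values
```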
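In this model, Limit, Rating, Income, Age and the Rating:Income interaction are significant at the 5% level. Marrital_status is borderline (p = 0.056) and Cards is not significant (p = 0.11). The AIC falls further, to 5201.7. The graph below shows simulated expected balances across the observed range of Rating: as credit rating increases, so does the expected balance. Because of the interaction, the effect of Rating is slightly stronger for higher-income cardholders. One possible explanation is that banks are likely to offer higher credit limits to people with good credit ratings.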
R.range <- min(Credit$Rating):max(Credit$Rating)
x <- setx(C3, Rating = R.range)
s <- sim(C3, x = x)
plot(s)
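Next, we compare expected balances for the two values of Marrital_status, holding the other covariates at their defaults. Given the recoding above, x (Marrital_status = 0) represents a married cardholder and x1 (Marrital_status = 1) an unmarried one. The first difference (fd) in the output below shows that unmarried cardholders have an expected balance about $31.31 higher than married ones. The 95% simulation interval runs from about 1.17 to 64.46.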
x <- setx(C3, Marrital_status = 0)
x1 <- setx(C3, Marrital_status = 1)
s <- sim(C3, x = x, x1 = x1)
summary(s)
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 492.7186 11.38166 492.6557 471.8985 514.9955
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 484.038 160.6703 477.9754 175.2448 810.8603
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 524.0239 14.39466 523.6531 497.3755 552.9161
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 521.8455 162.4627 531.1532 206.8242 838.7846
## fd
## mean sd 50% 2.5% 97.5%
## [1,] 31.30525 16.56112 30.79226 1.168131 64.45938
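As a sanity check on the simulation, the mean first difference should match the gap between the two expected values. Because Marrital_status enters C3 linearly and without interactions, that gap should also match its coefficient. The sketch below uses numbers copied from the outputs above. It computes 524.0239 - 492.7186 = 31.3053 and a normal-approximation 95% interval of coefficient ± 1.96 × SE, which is about -0.67 to 63.93. This analytic interval includes zero, consistent with p = 0.056. The simulated interval (1.17 to 64.46) only just excludes zero, a gap that plausibly reflects simulation noise from a finite number of draws.

```r
# Expected values copied from the summary(s) output above
ev_married   <- 492.7186   # x:  Marrital_status = 0
ev_unmarried <- 524.0239   # x1: Marrital_status = 1
fd_point <- ev_unmarried - ev_married   # close to the simulated fd mean of 31.31
fd_point

# Coefficient and standard error copied from summary(C3); with no interaction
# involving Marrital_status, the first difference equals this coefficient.
b_marital  <- 31.63
se_marital <- 16.48
ci <- b_marital + c(-1, 1) * qnorm(0.975) * se_marital   # normal-approximation 95% CI
ci
```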
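Below, we extract the simulated first differences with get_qi() and summarize them. Most draws are positive, although a few are negative (the minimum is -23.69). Plotting the simulation shows that the expected values for the two scenarios overlap only partially, whereas the predicted values overlap heavily. Predicted values also include residual variation, with a standard deviation of roughly 159 (the square root of the dispersion parameter, 25375.9).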
fd <- s$get_qi(xvalue="x1", qi="fd")
summary(fd)
## V1
## Min. :-23.69
## 1st Qu.: 19.63
## Median : 30.79
## Mean : 31.31
## 3rd Qu.: 42.55
## Max. : 77.87
plot(s)
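Overall, Limit, Rating, Income and the Rating × Income interaction have statistically significant associations with average credit card balance in the final model. Age is only marginally significant (p = 0.042), Marrital_status is borderline (p = 0.056) and Cards is not significant. The simulations translate these coefficients into expected differences in balance between scenarios, such as married versus unmarried cardholders. Further analysis is still needed.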