1 Intro

Here is my attempt at using machine-learning algorithms such as linear regression and decision tree to predict credit limit from the user’s information. I also try to utlize cross-valdiation, one hot encoding to try to improve the model and it’s prediction (Spoiler: It does not go well..)

1.1 Importing The Data

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift
ddf <- read.csv("C:\\Users\\Al Haque\\OneDrive\\Desktop\\Data 622\\100 CC Records.csv")

1.2 Data Exploration

Let’s take a look at what our dataset contains

We have 100 observations and 11 features the credit limit is the dependent variable and the other features are all features of the dataset..

## Exploratory data analysis..

str(ddf)
## 'data.frame':    100 obs. of  11 variables:
##  $ Card.Type.Code     : chr  "DS" "DC" "DC" "AX" ...
##  $ Card.Type.Full.Name: chr  "Discover" "Diners Club International" "Diners Club International" "American Express" ...
##  $ Issuing.Bank       : chr  "Discover" "Diners Club" "Diners Club" "Bank of America" ...
##  $ Card.Number        : num  6.48e+15 3.03e+13 3.01e+13 3.45e+14 3.58e+15 ...
##  $ Card.Holder.s.Name : chr  "Brenda D Peterson" "Dawn U Reese" "Helen P Perry" "Christine E Kim" ...
##  $ CVV.CVV2           : int  689 70 709 3819 913 217 3555 241 82 509 ...
##  $ Issue.Date         : chr  "01/2017" "12/2015" "02/2007" "08/2011" ...
##  $ Expiry.Date        : chr  "01/2022" "12/2016" "02/2020" "08/2013" ...
##  $ Billing.Date       : int  4 11 13 4 22 12 25 15 26 12 ...
##  $ Card.PIN           : int  1998 3915 2319 9017 66 2867 8493 3907 7665 5766 ...
##  $ Credit.Limit       : int  22700 12700 81900 77700 38300 158000 68000 41500 47800 93300 ...
summary(ddf)
##  Card.Type.Code     Card.Type.Full.Name Issuing.Bank        Card.Number       
##  Length:100         Length:100          Length:100         Min.   :3.002e+13  
##  Class :character   Class :character    Class :character   1st Qu.:3.466e+14  
##  Mode  :character   Mode  :character    Mode  :character   Median :4.354e+15  
##                                                            Mean   :3.282e+15  
##                                                            3rd Qu.:5.265e+15  
##                                                            Max.   :6.504e+15  
##  Card.Holder.s.Name    CVV.CVV2       Issue.Date        Expiry.Date       
##  Length:100         Min.   :   0.0   Length:100         Length:100        
##  Class :character   1st Qu.: 336.8   Class :character   Class :character  
##  Mode  :character   Median : 584.0   Mode  :character   Mode  :character  
##                     Mean   :1149.3                                        
##                     3rd Qu.: 867.5                                        
##                     Max.   :9594.0                                        
##   Billing.Date      Card.PIN     Credit.Limit   
##  Min.   : 1.00   Min.   :  23   Min.   : 10700  
##  1st Qu.: 8.00   1st Qu.:2434   1st Qu.: 38600  
##  Median :16.00   Median :3911   Median : 82000  
##  Mean   :15.47   Mean   :4443   Mean   : 92831  
##  3rd Qu.:24.00   3rd Qu.:6901   3rd Qu.:146350  
##  Max.   :28.00   Max.   :9815   Max.   :199000

Each observations contains information about a customer and their credit card information i suspect that their personal information wouldn’t be that much helpful in predicting credit limit.

head(ddf)
##   Card.Type.Code       Card.Type.Full.Name    Issuing.Bank  Card.Number
## 1             DS                  Discover        Discover 6.480195e+15
## 2             DC Diners Club International     Diners Club 3.029520e+13
## 3             DC Diners Club International     Diners Club 3.008249e+13
## 4             AX          American Express Bank of America 3.448500e+14
## 5             JC       Japan Credit Bureau             JCB 3.583388e+15
## 6             JC       Japan Credit Bureau             JCB 3.587385e+15
##   Card.Holder.s.Name CVV.CVV2 Issue.Date Expiry.Date Billing.Date Card.PIN
## 1  Brenda D Peterson      689    01/2017     01/2022            4     1998
## 2       Dawn U Reese       70    12/2015     12/2016           11     3915
## 3      Helen P Perry      709    02/2007     02/2020           13     2319
## 4    Christine E Kim     3819    08/2011     08/2013            4     9017
## 5  Elizabeth Z Lopez      913    11/2017     11/2026           22       66
## 6     Nancy S Robles      217    11/2015     11/2029           12     2867
##   Credit.Limit
## 1        22700
## 2        12700
## 3        81900
## 4        77700
## 5        38300
## 6       158000

In this data set, we see that there are a lot of Master Card Holders followed by Diners Club International and then American Express.

ddf %>%
  select(Card.Type.Full.Name) %>%
  table() %>%
  prop.table()
## Card.Type.Full.Name
##          American Express Diners Club International                  Discover 
##                      0.16                      0.18                      0.11 
##       Japan Credit Bureau               Master Card                      Visa 
##                      0.15                      0.26                      0.14

Chase, JCB and American Express seems to account for the majority of issuing credit card in this dataset..

ddf %>%
  select(Issuing.Bank) %>%
  table() %>%
  prop.table()
## Issuing.Bank
## American Express  Cabela\x92s WFB  Bank of America         Barclays 
##             0.14             0.01             0.05             0.05 
##      Capital One            Chase         Citibank      Diners Club 
##             0.05             0.15             0.05             0.18 
##         Discover       GE Capital              JCB     U.S. Bancorp 
##             0.11             0.02             0.15             0.01 
##             USAA      Wells Fargo 
##             0.01             0.02

1.3 Data visualization

## get rid of the special characters before visualizing the data..
ddf$Issuing.Bank <- iconv(ddf$Issuing.Bank, "ASCII", "UTF-8", sub="")

ggplot(ddf,aes(x=as.factor(Issuing.Bank))) +
  geom_bar(stat = "count") + coord_flip() + 
  geom_text(stat = "count",aes(label = after_stat(count)),vjust = 0.7,size = 4,hjust = -0.3) + 
  ggtitle("Count of Bank Issuers")

ggplot(ddf,aes(x = Card.Type.Full.Name)) + 
  geom_bar(stat = "count") + coord_flip()+ 
  geom_text(stat = "count",aes(label = after_stat(count)),vjust = 0.7,size = 4,hjust = -0.3) + 
  ggtitle("Count of Bank Type")

Finally, I will look at the distribution of the credit limit, the dependent variable of our dataset.., the distribution is not normal but it doesn’t look skewed so we have a variety of values here.

ggplot(ddf,aes(x=Credit.Limit)) +
  geom_histogram(bins = 30) +
  ggtitle("Credit Limit Distribution") +
  xlab("Credit Limit") +
  ylab("Count") + theme_minimal()


1.4 Checking Correlation

I used a corrplot to check for correlation between the numeric predictors to see which predictors can predict credit limit.

library(corrplot)
## corrplot 0.92 loaded
# Assuming 'df' is your data frame with numeric predictors
# You can select only the numeric columns from your data frame
# addCoef.col adds the correlation number on your scatterplot matrix.. 
numeric_predictors <- ddf[, sapply(ddf, is.numeric)]

# Calculate the correlation matrix
correlation_matrix <- cor(numeric_predictors)

# Create a scatterplot matrix with corrplot
corrplot(correlation_matrix, method = "circle",number.cex = 0.7,addCoef.col = "black",type = "lower")


1.5 Creating a Linear Regression Model..

Creating a linear regression model with all of the predictors to see if there are any significant predictors that can help explain the Credit.Limit it seems like the linear model is unable to find such anything. I will attempt to remove any predictors that I feel are unimportant..

## Yikes this is not a good model to use to predict credit limit oh my... 
linearmod <- lm(Credit.Limit~.,data = ddf)

summary(linearmod)
## 
## Call:
## lm(formula = Credit.Limit ~ ., data = ddf)
## 
## Residuals:
## ALL 100 residuals are 0: no residual degrees of freedom!
## 
## Coefficients: (179 not defined because of singularities)
##                                                Estimate Std. Error t value
## (Intercept)                                   2.026e+05        NaN     NaN
## Card.Type.CodeDC                             -1.348e+05        NaN     NaN
## Card.Type.CodeDS                              8.293e+05        NaN     NaN
## Card.Type.CodeJC                              4.386e+05        NaN     NaN
## Card.Type.CodeMC                              8.778e+05        NaN     NaN
## Card.Type.CodeVI                              7.308e+05        NaN     NaN
## Card.Type.Full.NameDiners Club International         NA         NA      NA
## Card.Type.Full.NameDiscover                          NA         NA      NA
## Card.Type.Full.NameJapan Credit Bureau               NA         NA      NA
## Card.Type.Full.NameMaster Card                       NA         NA      NA
## Card.Type.Full.NameVisa                              NA         NA      NA
## Issuing.BankBank of America                  -6.722e+04        NaN     NaN
## Issuing.BankBarclays                         -1.198e+05        NaN     NaN
## Issuing.BankCabela\022s WFB                   1.803e+02        NaN     NaN
## Issuing.BankCapital One                      -1.501e+05        NaN     NaN
## Issuing.BankChase                            -1.151e+05        NaN     NaN
## Issuing.BankCitibank                         -6.950e+04        NaN     NaN
## Issuing.BankDiners Club                              NA         NA      NA
## Issuing.BankDiscover                                 NA         NA      NA
## Issuing.BankGE Capital                       -4.578e+04        NaN     NaN
## Issuing.BankJCB                                      NA         NA      NA
## Issuing.BankU.S. Bancorp                     -6.150e+04        NaN     NaN
## Issuing.BankUSAA                             -1.341e+05        NaN     NaN
## Issuing.BankWells Fargo                      -1.693e+05        NaN     NaN
## Card.Number                                  -1.674e-10        NaN     NaN
## Card.Holder.s.NameAlice G Mitchell            5.225e+03        NaN     NaN
## Card.Holder.s.NameAlicia S Crawford          -2.400e+04        NaN     NaN
## Card.Holder.s.NameAllen Rodgers              -8.558e+04        NaN     NaN
## Card.Holder.s.NameAmanda L Soto               8.725e+04        NaN     NaN
## Card.Holder.s.NameAnita O Brennan             2.675e+02        NaN     NaN
## Card.Holder.s.NameAnnie R House               6.958e+04        NaN     NaN
## Card.Holder.s.NameAnthony C Sanchez          -3.670e+04        NaN     NaN
## Card.Holder.s.NameArthur Hines                9.956e+04        NaN     NaN
## Card.Holder.s.NameBilly O Bell               -1.297e+04        NaN     NaN
## Card.Holder.s.NameBobby F Walker              1.900e+04        NaN     NaN
## Card.Holder.s.NameBrandon B Boone             1.462e+05        NaN     NaN
## Card.Holder.s.NameBrenda D Peterson           7.547e+04        NaN     NaN
## Card.Holder.s.NameCharles R Brooks           -1.820e+04        NaN     NaN
## Card.Holder.s.NameCharles R Farrell          -6.867e+04        NaN     NaN
## Card.Holder.s.NameChristina G Francis        -1.326e+05        NaN     NaN
## Card.Holder.s.NameChristine E Kim                    NA         NA      NA
## Card.Holder.s.NameCraig Gonzalez             -1.098e+05        NaN     NaN
## Card.Holder.s.NameCraig X Keith               1.316e+05        NaN     NaN
## Card.Holder.s.NameCynthia Evans              -8.336e+04        NaN     NaN
## Card.Holder.s.NameDavid Z Hood               -3.368e+02        NaN     NaN
## Card.Holder.s.NameDawn U Reese               -5.010e+04        NaN     NaN
## Card.Holder.s.NameDiane Carroll               1.531e+05        NaN     NaN
## Card.Holder.s.NameDiane R Ross                1.144e+05        NaN     NaN
## Card.Holder.s.NameDon P Wooten                1.374e+05        NaN     NaN
## Card.Holder.s.NameDonna A Rodriguez           1.636e+05        NaN     NaN
## Card.Holder.s.NameDorothy H Le                7.290e+04        NaN     NaN
## Card.Holder.s.NameEdward Z Owen              -6.175e+04        NaN     NaN
## Card.Holder.s.NameEleanor Callahan            1.297e+05        NaN     NaN
## Card.Holder.s.NameEleanor Crane               2.548e+04        NaN     NaN
## Card.Holder.s.NameElizabeth Z Lopez          -3.148e+03        NaN     NaN
## Card.Holder.s.NameFrances S Henderson        -3.508e+04        NaN     NaN
## Card.Holder.s.NameFrancis Koch               -6.799e+04        NaN     NaN
## Card.Holder.s.NameGerald O Lee               -1.171e+05        NaN     NaN
## Card.Holder.s.NameGrace D Sparks              4.023e+04        NaN     NaN
## Card.Holder.s.NameHeather W Acosta           -3.259e+04        NaN     NaN
## Card.Holder.s.NameHeather Wong                       NA         NA      NA
## Card.Holder.s.NameHelen P Perry               1.906e+04        NaN     NaN
## Card.Holder.s.NameHelen R Lucas              -1.054e+05        NaN     NaN
## Card.Holder.s.NameHenry T Miller              4.370e+04        NaN     NaN
## Card.Holder.s.NameHenry W Ballard            -1.868e+04        NaN     NaN
## Card.Holder.s.NameHenry W Hewitt             -6.609e+04        NaN     NaN
## Card.Holder.s.NameHoward C Dunlap             4.775e+04        NaN     NaN
## Card.Holder.s.NameHoward R Bernard            4.861e+04        NaN     NaN
## Card.Holder.s.NameHoward X Nelson             3.779e+03        NaN     NaN
## Card.Holder.s.NameJack Foster                -2.594e+04        NaN     NaN
## Card.Holder.s.NameJanet Alexander             7.088e+04        NaN     NaN
## Card.Holder.s.NameJeffrey J Barnes            1.149e+05        NaN     NaN
## Card.Holder.s.NameJennifer K Potts            2.737e+04        NaN     NaN
## Card.Holder.s.NameJennifer T Turner           1.496e+05        NaN     NaN
## Card.Holder.s.NameJeremy N White              5.832e+04        NaN     NaN
## Card.Holder.s.NameJeremy O Wilson             3.726e+03        NaN     NaN
## Card.Holder.s.NameJeremy P Price             -1.503e+04        NaN     NaN
## Card.Holder.s.NameJesse J Perez              -7.258e+04        NaN     NaN
## Card.Holder.s.NameJim X Ballard               1.723e+04        NaN     NaN
## Card.Holder.s.NameJimmy V Baker               1.110e+05        NaN     NaN
## Card.Holder.s.NameJoe N Mccarty               3.049e+04        NaN     NaN
## Card.Holder.s.NameJon O Reese                        NA         NA      NA
## Card.Holder.s.NameJoyce B Warren              2.053e+05        NaN     NaN
## Card.Holder.s.NameJudith I Rollins            2.248e+04        NaN     NaN
## Card.Holder.s.NameJudith Y Ross               9.944e+04        NaN     NaN
## Card.Holder.s.NameJustin Y Coleman            8.220e+04        NaN     NaN
## Card.Holder.s.NameKathy Robinson             -1.122e+05        NaN     NaN
## Card.Holder.s.NameKeith Z Morris             -5.136e+04        NaN     NaN
## Card.Holder.s.NameKristen Frank                      NA         NA      NA
## Card.Holder.s.NameLaura N Jackson             2.110e+04        NaN     NaN
## Card.Holder.s.NameLaura W Richardson          6.391e+04        NaN     NaN
## Card.Holder.s.NameLeonard K Browning          1.032e+05        NaN     NaN
## Card.Holder.s.NameLillian M Green            -4.293e+04        NaN     NaN
## Card.Holder.s.NameLinda T Brown               9.176e+04        NaN     NaN
## Card.Holder.s.NameLouis Martinez             -4.334e+04        NaN     NaN
## Card.Holder.s.NameMarcus D Ashley            -1.162e+05        NaN     NaN
## Card.Holder.s.NameMargaret T Ramirez          9.777e+04        NaN     NaN
## Card.Holder.s.NameMarilyn E Coleman           1.626e+05        NaN     NaN
## Card.Holder.s.NameMario D Blanchard           6.764e+04        NaN     NaN
## Card.Holder.s.NameMark Y Sheppard                    NA         NA      NA
## Card.Holder.s.NameNancy C Cox                 1.439e+05        NaN     NaN
## Card.Holder.s.NameNancy S Robles              1.172e+05        NaN     NaN
## Card.Holder.s.NamePatricia Williams          -2.982e+04        NaN     NaN
## Card.Holder.s.NamePaula E Head                7.239e+04        NaN     NaN
## Card.Holder.s.NamePaula R Bryan                      NA         NA      NA
## Card.Holder.s.NamePhyllis H Johnson                  NA         NA      NA
## Card.Holder.s.NameRachel Turner               4.126e+04        NaN     NaN
## Card.Holder.s.NameRachel V Jenkins                   NA         NA      NA
## Card.Holder.s.NameRandall Z Lott             -1.001e+05        NaN     NaN
## Card.Holder.s.NameRebecca Jenkins            -7.971e+04        NaN     NaN
## Card.Holder.s.NameRobert A Long               4.717e+04        NaN     NaN
## Card.Holder.s.NameRonald R Stewart            8.950e+04        NaN     NaN
## Card.Holder.s.NameRose Jenkins                1.230e+05        NaN     NaN
## Card.Holder.s.NameSandra H Adams                     NA         NA      NA
## Card.Holder.s.NameSandra N Parrish            1.734e+04        NaN     NaN
## Card.Holder.s.NameSean J Young                2.086e+04        NaN     NaN
## Card.Holder.s.NameStanley K Hopper           -7.047e+04        NaN     NaN
## Card.Holder.s.NameStephen H Ramirez                  NA         NA      NA
## Card.Holder.s.NameVictor J Mercado                   NA         NA      NA
## Card.Holder.s.NameVincent M Fulton                   NA         NA      NA
## Card.Holder.s.NameVincent U Jordan                   NA         NA      NA
## Card.Holder.s.NameWalter B Lopez                     NA         NA      NA
## Card.Holder.s.NameWilliam N French                   NA         NA      NA
## Card.Holder.s.NameWillie O King                      NA         NA      NA
## CVV.CVV2                                             NA         NA      NA
## Issue.Date01/2009                                    NA         NA      NA
## Issue.Date01/2010                                    NA         NA      NA
## Issue.Date01/2011                                    NA         NA      NA
## Issue.Date01/2013                                    NA         NA      NA
## Issue.Date01/2016                                    NA         NA      NA
## Issue.Date01/2017                                    NA         NA      NA
## Issue.Date02/2007                                    NA         NA      NA
## Issue.Date02/2010                                    NA         NA      NA
## Issue.Date02/2012                                    NA         NA      NA
## Issue.Date02/2013                                    NA         NA      NA
## Issue.Date02/2014                                    NA         NA      NA
## Issue.Date02/2015                                    NA         NA      NA
## Issue.Date02/2016                                    NA         NA      NA
## Issue.Date03/2011                                    NA         NA      NA
## Issue.Date03/2012                                    NA         NA      NA
## Issue.Date03/2013                                    NA         NA      NA
## Issue.Date03/2015                                    NA         NA      NA
## Issue.Date03/2016                                    NA         NA      NA
## Issue.Date04/2007                                    NA         NA      NA
## Issue.Date04/2009                                    NA         NA      NA
## Issue.Date04/2011                                    NA         NA      NA
## Issue.Date04/2013                                    NA         NA      NA
## Issue.Date04/2015                                    NA         NA      NA
## Issue.Date04/2017                                    NA         NA      NA
## Issue.Date05/2010                                    NA         NA      NA
## Issue.Date05/2011                                    NA         NA      NA
## Issue.Date05/2015                                    NA         NA      NA
## Issue.Date05/2016                                    NA         NA      NA
## Issue.Date06/2007                                    NA         NA      NA
## Issue.Date06/2008                                    NA         NA      NA
## Issue.Date06/2011                                    NA         NA      NA
## Issue.Date06/2012                                    NA         NA      NA
## Issue.Date06/2014                                    NA         NA      NA
## Issue.Date06/2015                                    NA         NA      NA
## Issue.Date07/2007                                    NA         NA      NA
## Issue.Date07/2008                                    NA         NA      NA
## Issue.Date07/2009                                    NA         NA      NA
## Issue.Date07/2010                                    NA         NA      NA
## Issue.Date07/2015                                    NA         NA      NA
## Issue.Date07/2017                                    NA         NA      NA
## Issue.Date08/2007                                    NA         NA      NA
## Issue.Date08/2008                                    NA         NA      NA
## Issue.Date08/2010                                    NA         NA      NA
## Issue.Date08/2011                                    NA         NA      NA
## Issue.Date08/2013                                    NA         NA      NA
## Issue.Date08/2014                                    NA         NA      NA
## Issue.Date08/2015                                    NA         NA      NA
## Issue.Date08/2016                                    NA         NA      NA
## Issue.Date08/2017                                    NA         NA      NA
## Issue.Date09/2009                                    NA         NA      NA
## Issue.Date09/2010                                    NA         NA      NA
## Issue.Date09/2011                                    NA         NA      NA
## Issue.Date09/2012                                    NA         NA      NA
## Issue.Date09/2013                                    NA         NA      NA
## Issue.Date09/2017                                    NA         NA      NA
## Issue.Date10/2012                                    NA         NA      NA
## Issue.Date10/2013                                    NA         NA      NA
## Issue.Date10/2014                                    NA         NA      NA
## Issue.Date10/2015                                    NA         NA      NA
## Issue.Date10/2016                                    NA         NA      NA
## Issue.Date11/2009                                    NA         NA      NA
## Issue.Date11/2013                                    NA         NA      NA
## Issue.Date11/2015                                    NA         NA      NA
## Issue.Date11/2016                                    NA         NA      NA
## Issue.Date11/2017                                    NA         NA      NA
## Issue.Date12/2007                                    NA         NA      NA
## Issue.Date12/2015                                    NA         NA      NA
## Issue.Date12/2017                                    NA         NA      NA
## Expiry.Date01/2015                                   NA         NA      NA
## Expiry.Date01/2017                                   NA         NA      NA
## Expiry.Date01/2018                                   NA         NA      NA
## Expiry.Date01/2020                                   NA         NA      NA
## Expiry.Date01/2022                                   NA         NA      NA
## Expiry.Date01/2027                                   NA         NA      NA
## Expiry.Date01/2033                                   NA         NA      NA
## Expiry.Date02/2012                                   NA         NA      NA
## Expiry.Date02/2019                                   NA         NA      NA
## Expiry.Date02/2020                                   NA         NA      NA
## Expiry.Date02/2021                                   NA         NA      NA
## Expiry.Date02/2028                                   NA         NA      NA
## Expiry.Date02/2030                                   NA         NA      NA
## Expiry.Date02/2031                                   NA         NA      NA
## Expiry.Date03/2019                                   NA         NA      NA
## Expiry.Date03/2021                                   NA         NA      NA
## Expiry.Date03/2022                                   NA         NA      NA
## Expiry.Date03/2024                                   NA         NA      NA
## Expiry.Date03/2028                                   NA         NA      NA
## Expiry.Date04/2017                                   NA         NA      NA
## Expiry.Date04/2020                                   NA         NA      NA
## Expiry.Date04/2021                                   NA         NA      NA
## Expiry.Date04/2024                                   NA         NA      NA
## Expiry.Date04/2025                                   NA         NA      NA
## Expiry.Date04/2027                                   NA         NA      NA
## Expiry.Date04/2035                                   NA         NA      NA
## Expiry.Date05/2015                                   NA         NA      NA
## Expiry.Date05/2016                                   NA         NA      NA
## Expiry.Date05/2018                                   NA         NA      NA
## Expiry.Date05/2023                                   NA         NA      NA
## Expiry.Date05/2024                                   NA         NA      NA
## Expiry.Date06/2008                                   NA         NA      NA
## Expiry.Date06/2011                                   NA         NA      NA
## Expiry.Date06/2016                                   NA         NA      NA
## Expiry.Date06/2019                                   NA         NA      NA
## Expiry.Date06/2020                                   NA         NA      NA
## Expiry.Date06/2021                                   NA         NA      NA
## Expiry.Date06/2022                                   NA         NA      NA
## Expiry.Date06/2026                                   NA         NA      NA
## Expiry.Date06/2029                                   NA         NA      NA
## Expiry.Date06/2032                                   NA         NA      NA
## Expiry.Date06/2033                                   NA         NA      NA
## Expiry.Date07/2009                                   NA         NA      NA
## Expiry.Date07/2016                                   NA         NA      NA
## Expiry.Date07/2021                                   NA         NA      NA
## Expiry.Date07/2026                                   NA         NA      NA
## Expiry.Date07/2028                                   NA         NA      NA
## Expiry.Date08/2012                                   NA         NA      NA
## Expiry.Date08/2013                                   NA         NA      NA
## Expiry.Date08/2015                                   NA         NA      NA
## Expiry.Date08/2022                                   NA         NA      NA
## Expiry.Date08/2023                                   NA         NA      NA
## Expiry.Date08/2025                                   NA         NA      NA
## Expiry.Date08/2026                                   NA         NA      NA
## Expiry.Date08/2028                                   NA         NA      NA
## Expiry.Date08/2031                                   NA         NA      NA
## Expiry.Date08/2034                                   NA         NA      NA
## Expiry.Date09/2012                                   NA         NA      NA
## Expiry.Date09/2018                                   NA         NA      NA
## Expiry.Date09/2020                                   NA         NA      NA
## Expiry.Date09/2023                                   NA         NA      NA
## Expiry.Date09/2026                                   NA         NA      NA
## Expiry.Date09/2029                                   NA         NA      NA
## Expiry.Date10/2017                                   NA         NA      NA
## Expiry.Date10/2020                                   NA         NA      NA
## Expiry.Date10/2022                                   NA         NA      NA
## Expiry.Date10/2023                                   NA         NA      NA
## Expiry.Date10/2024                                   NA         NA      NA
## Expiry.Date10/2026                                   NA         NA      NA
## Expiry.Date10/2027                                   NA         NA      NA
## Expiry.Date10/2030                                   NA         NA      NA
## Expiry.Date11/2014                                   NA         NA      NA
## Expiry.Date11/2016                                   NA         NA      NA
## Expiry.Date11/2021                                   NA         NA      NA
## Expiry.Date11/2024                                   NA         NA      NA
## Expiry.Date11/2025                                   NA         NA      NA
## Expiry.Date11/2026                                   NA         NA      NA
## Expiry.Date11/2029                                   NA         NA      NA
## Expiry.Date11/2037                                   NA         NA      NA
## Expiry.Date12/2008                                   NA         NA      NA
## Expiry.Date12/2014                                   NA         NA      NA
## Expiry.Date12/2016                                   NA         NA      NA
## Expiry.Date12/2024                                   NA         NA      NA
## Expiry.Date12/2027                                   NA         NA      NA
## Billing.Date                                         NA         NA      NA
## Card.PIN                                             NA         NA      NA
##                                              Pr(>|t|)
## (Intercept)                                       NaN
## Card.Type.CodeDC                                  NaN
## Card.Type.CodeDS                                  NaN
## Card.Type.CodeJC                                  NaN
## Card.Type.CodeMC                                  NaN
## Card.Type.CodeVI                                  NaN
## Card.Type.Full.NameDiners Club International       NA
## Card.Type.Full.NameDiscover                        NA
## Card.Type.Full.NameJapan Credit Bureau             NA
## Card.Type.Full.NameMaster Card                     NA
## Card.Type.Full.NameVisa                            NA
## Issuing.BankBank of America                       NaN
## Issuing.BankBarclays                              NaN
## Issuing.BankCabela\022s WFB                       NaN
## Issuing.BankCapital One                           NaN
## Issuing.BankChase                                 NaN
## Issuing.BankCitibank                              NaN
## Issuing.BankDiners Club                            NA
## Issuing.BankDiscover                               NA
## Issuing.BankGE Capital                            NaN
## Issuing.BankJCB                                    NA
## Issuing.BankU.S. Bancorp                          NaN
## Issuing.BankUSAA                                  NaN
## Issuing.BankWells Fargo                           NaN
## Card.Number                                       NaN
## Card.Holder.s.NameAlice G Mitchell                NaN
## Card.Holder.s.NameAlicia S Crawford               NaN
## Card.Holder.s.NameAllen Rodgers                   NaN
## Card.Holder.s.NameAmanda L Soto                   NaN
## Card.Holder.s.NameAnita O Brennan                 NaN
## Card.Holder.s.NameAnnie R House                   NaN
## Card.Holder.s.NameAnthony C Sanchez               NaN
## Card.Holder.s.NameArthur Hines                    NaN
## Card.Holder.s.NameBilly O Bell                    NaN
## Card.Holder.s.NameBobby F Walker                  NaN
## Card.Holder.s.NameBrandon B Boone                 NaN
## Card.Holder.s.NameBrenda D Peterson               NaN
## Card.Holder.s.NameCharles R Brooks                NaN
## Card.Holder.s.NameCharles R Farrell               NaN
## Card.Holder.s.NameChristina G Francis             NaN
## Card.Holder.s.NameChristine E Kim                  NA
## Card.Holder.s.NameCraig Gonzalez                  NaN
## Card.Holder.s.NameCraig X Keith                   NaN
## Card.Holder.s.NameCynthia Evans                   NaN
## Card.Holder.s.NameDavid Z Hood                    NaN
## Card.Holder.s.NameDawn U Reese                    NaN
## Card.Holder.s.NameDiane Carroll                   NaN
## Card.Holder.s.NameDiane R Ross                    NaN
## Card.Holder.s.NameDon P Wooten                    NaN
## Card.Holder.s.NameDonna A Rodriguez               NaN
## Card.Holder.s.NameDorothy H Le                    NaN
## Card.Holder.s.NameEdward Z Owen                   NaN
## Card.Holder.s.NameEleanor Callahan                NaN
## Card.Holder.s.NameEleanor Crane                   NaN
## Card.Holder.s.NameElizabeth Z Lopez               NaN
## Card.Holder.s.NameFrances S Henderson             NaN
## Card.Holder.s.NameFrancis Koch                    NaN
## Card.Holder.s.NameGerald O Lee                    NaN
## Card.Holder.s.NameGrace D Sparks                  NaN
## Card.Holder.s.NameHeather W Acosta                NaN
## Card.Holder.s.NameHeather Wong                     NA
## Card.Holder.s.NameHelen P Perry                   NaN
## Card.Holder.s.NameHelen R Lucas                   NaN
## Card.Holder.s.NameHenry T Miller                  NaN
## Card.Holder.s.NameHenry W Ballard                 NaN
## Card.Holder.s.NameHenry W Hewitt                  NaN
## Card.Holder.s.NameHoward C Dunlap                 NaN
## Card.Holder.s.NameHoward R Bernard                NaN
## Card.Holder.s.NameHoward X Nelson                 NaN
## Card.Holder.s.NameJack Foster                     NaN
## Card.Holder.s.NameJanet Alexander                 NaN
## Card.Holder.s.NameJeffrey J Barnes                NaN
## Card.Holder.s.NameJennifer K Potts                NaN
## Card.Holder.s.NameJennifer T Turner               NaN
## Card.Holder.s.NameJeremy N White                  NaN
## Card.Holder.s.NameJeremy O Wilson                 NaN
## Card.Holder.s.NameJeremy P Price                  NaN
## Card.Holder.s.NameJesse J Perez                   NaN
## Card.Holder.s.NameJim X Ballard                   NaN
## Card.Holder.s.NameJimmy V Baker                   NaN
## Card.Holder.s.NameJoe N Mccarty                   NaN
## Card.Holder.s.NameJon O Reese                      NA
## Card.Holder.s.NameJoyce B Warren                  NaN
## Card.Holder.s.NameJudith I Rollins                NaN
## Card.Holder.s.NameJudith Y Ross                   NaN
## Card.Holder.s.NameJustin Y Coleman                NaN
## Card.Holder.s.NameKathy Robinson                  NaN
## Card.Holder.s.NameKeith Z Morris                  NaN
## Card.Holder.s.NameKristen Frank                    NA
## Card.Holder.s.NameLaura N Jackson                 NaN
## Card.Holder.s.NameLaura W Richardson              NaN
## Card.Holder.s.NameLeonard K Browning              NaN
## Card.Holder.s.NameLillian M Green                 NaN
## Card.Holder.s.NameLinda T Brown                   NaN
## Card.Holder.s.NameLouis Martinez                  NaN
## Card.Holder.s.NameMarcus D Ashley                 NaN
## Card.Holder.s.NameMargaret T Ramirez              NaN
## Card.Holder.s.NameMarilyn E Coleman               NaN
## Card.Holder.s.NameMario D Blanchard               NaN
## Card.Holder.s.NameMark Y Sheppard                  NA
## Card.Holder.s.NameNancy C Cox                     NaN
## Card.Holder.s.NameNancy S Robles                  NaN
## Card.Holder.s.NamePatricia Williams               NaN
## Card.Holder.s.NamePaula E Head                    NaN
## Card.Holder.s.NamePaula R Bryan                    NA
## Card.Holder.s.NamePhyllis H Johnson                NA
## Card.Holder.s.NameRachel Turner                   NaN
## Card.Holder.s.NameRachel V Jenkins                 NA
## Card.Holder.s.NameRandall Z Lott                  NaN
## Card.Holder.s.NameRebecca Jenkins                 NaN
## Card.Holder.s.NameRobert A Long                   NaN
## Card.Holder.s.NameRonald R Stewart                NaN
## Card.Holder.s.NameRose Jenkins                    NaN
## Card.Holder.s.NameSandra H Adams                   NA
## Card.Holder.s.NameSandra N Parrish                NaN
## Card.Holder.s.NameSean J Young                    NaN
## Card.Holder.s.NameStanley K Hopper                NaN
## Card.Holder.s.NameStephen H Ramirez                NA
## Card.Holder.s.NameVictor J Mercado                 NA
## Card.Holder.s.NameVincent M Fulton                 NA
## Card.Holder.s.NameVincent U Jordan                 NA
## Card.Holder.s.NameWalter B Lopez                   NA
## Card.Holder.s.NameWilliam N French                 NA
## Card.Holder.s.NameWillie O King                    NA
## CVV.CVV2                                           NA
## Issue.Date01/2009                                  NA
## Issue.Date01/2010                                  NA
## Issue.Date01/2011                                  NA
## Issue.Date01/2013                                  NA
## Issue.Date01/2016                                  NA
## Issue.Date01/2017                                  NA
## Issue.Date02/2007                                  NA
## Issue.Date02/2010                                  NA
## Issue.Date02/2012                                  NA
## Issue.Date02/2013                                  NA
## Issue.Date02/2014                                  NA
## Issue.Date02/2015                                  NA
## Issue.Date02/2016                                  NA
## Issue.Date03/2011                                  NA
## Issue.Date03/2012                                  NA
## Issue.Date03/2013                                  NA
## Issue.Date03/2015                                  NA
## Issue.Date03/2016                                  NA
## Issue.Date04/2007                                  NA
## Issue.Date04/2009                                  NA
## Issue.Date04/2011                                  NA
## Issue.Date04/2013                                  NA
## Issue.Date04/2015                                  NA
## Issue.Date04/2017                                  NA
## Issue.Date05/2010                                  NA
## Issue.Date05/2011                                  NA
## Issue.Date05/2015                                  NA
## Issue.Date05/2016                                  NA
## Issue.Date06/2007                                  NA
## Issue.Date06/2008                                  NA
## Issue.Date06/2011                                  NA
## Issue.Date06/2012                                  NA
## Issue.Date06/2014                                  NA
## Issue.Date06/2015                                  NA
## Issue.Date07/2007                                  NA
## Issue.Date07/2008                                  NA
## Issue.Date07/2009                                  NA
## Issue.Date07/2010                                  NA
## Issue.Date07/2015                                  NA
## Issue.Date07/2017                                  NA
## Issue.Date08/2007                                  NA
## Issue.Date08/2008                                  NA
## Issue.Date08/2010                                  NA
## Issue.Date08/2011                                  NA
## Issue.Date08/2013                                  NA
## Issue.Date08/2014                                  NA
## Issue.Date08/2015                                  NA
## Issue.Date08/2016                                  NA
## Issue.Date08/2017                                  NA
## Issue.Date09/2009                                  NA
## Issue.Date09/2010                                  NA
## Issue.Date09/2011                                  NA
## Issue.Date09/2012                                  NA
## Issue.Date09/2013                                  NA
## Issue.Date09/2017                                  NA
## Issue.Date10/2012                                  NA
## Issue.Date10/2013                                  NA
## Issue.Date10/2014                                  NA
## Issue.Date10/2015                                  NA
## Issue.Date10/2016                                  NA
## Issue.Date11/2009                                  NA
## Issue.Date11/2013                                  NA
## Issue.Date11/2015                                  NA
## Issue.Date11/2016                                  NA
## Issue.Date11/2017                                  NA
## Issue.Date12/2007                                  NA
## Issue.Date12/2015                                  NA
## Issue.Date12/2017                                  NA
## Expiry.Date01/2015                                 NA
## Expiry.Date01/2017                                 NA
## Expiry.Date01/2018                                 NA
## Expiry.Date01/2020                                 NA
## Expiry.Date01/2022                                 NA
## Expiry.Date01/2027                                 NA
## Expiry.Date01/2033                                 NA
## Expiry.Date02/2012                                 NA
## Expiry.Date02/2019                                 NA
## Expiry.Date02/2020                                 NA
## Expiry.Date02/2021                                 NA
## Expiry.Date02/2028                                 NA
## Expiry.Date02/2030                                 NA
## Expiry.Date02/2031                                 NA
## Expiry.Date03/2019                                 NA
## Expiry.Date03/2021                                 NA
## Expiry.Date03/2022                                 NA
## Expiry.Date03/2024                                 NA
## Expiry.Date03/2028                                 NA
## Expiry.Date04/2017                                 NA
## Expiry.Date04/2020                                 NA
## Expiry.Date04/2021                                 NA
## Expiry.Date04/2024                                 NA
## Expiry.Date04/2025                                 NA
## Expiry.Date04/2027                                 NA
## Expiry.Date04/2035                                 NA
## Expiry.Date05/2015                                 NA
## Expiry.Date05/2016                                 NA
## Expiry.Date05/2018                                 NA
## Expiry.Date05/2023                                 NA
## Expiry.Date05/2024                                 NA
## Expiry.Date06/2008                                 NA
## Expiry.Date06/2011                                 NA
## Expiry.Date06/2016                                 NA
## Expiry.Date06/2019                                 NA
## Expiry.Date06/2020                                 NA
## Expiry.Date06/2021                                 NA
## Expiry.Date06/2022                                 NA
## Expiry.Date06/2026                                 NA
## Expiry.Date06/2029                                 NA
## Expiry.Date06/2032                                 NA
## Expiry.Date06/2033                                 NA
## Expiry.Date07/2009                                 NA
## Expiry.Date07/2016                                 NA
## Expiry.Date07/2021                                 NA
## Expiry.Date07/2026                                 NA
## Expiry.Date07/2028                                 NA
## Expiry.Date08/2012                                 NA
## Expiry.Date08/2013                                 NA
## Expiry.Date08/2015                                 NA
## Expiry.Date08/2022                                 NA
## Expiry.Date08/2023                                 NA
## Expiry.Date08/2025                                 NA
## Expiry.Date08/2026                                 NA
## Expiry.Date08/2028                                 NA
## Expiry.Date08/2031                                 NA
## Expiry.Date08/2034                                 NA
## Expiry.Date09/2012                                 NA
## Expiry.Date09/2018                                 NA
## Expiry.Date09/2020                                 NA
## Expiry.Date09/2023                                 NA
## Expiry.Date09/2026                                 NA
## Expiry.Date09/2029                                 NA
## Expiry.Date10/2017                                 NA
## Expiry.Date10/2020                                 NA
## Expiry.Date10/2022                                 NA
## Expiry.Date10/2023                                 NA
## Expiry.Date10/2024                                 NA
## Expiry.Date10/2026                                 NA
## Expiry.Date10/2027                                 NA
## Expiry.Date10/2030                                 NA
## Expiry.Date11/2014                                 NA
## Expiry.Date11/2016                                 NA
## Expiry.Date11/2021                                 NA
## Expiry.Date11/2024                                 NA
## Expiry.Date11/2025                                 NA
## Expiry.Date11/2026                                 NA
## Expiry.Date11/2029                                 NA
## Expiry.Date11/2037                                 NA
## Expiry.Date12/2008                                 NA
## Expiry.Date12/2014                                 NA
## Expiry.Date12/2016                                 NA
## Expiry.Date12/2024                                 NA
## Expiry.Date12/2027                                 NA
## Billing.Date                                       NA
## Card.PIN                                           NA
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 99 and 0 DF,  p-value: NA

1.6 Data Cleaning

Now that we have look at the data and visualizes some of the columns that I believe are relevant, it is time to remove irrelevant columns and clean the data.

## Let's remove columns that would not be useful for credit limit estimation.. such as card number and card holder's name cvv and card pin since these are security features and doesn't contain any useful info..

ddf <- ddf %>%
  select(-Card.Number,-Card.Holder.s.Name,-CVV.CVV2,-Card.PIN)

Create proper date-time columns.. we are going to transform the issue dat and expiry dates into numerical features to help our model runs better. We are going to calculate the card age by subtracting the issue date from the e

library(lubridate)
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
ddf$Issue.Date <- as.Date(paste("01/", ddf$Issue.Date, sep=""), format="%d/%m/%Y")
ddf$Expiry.Date <- as.Date(paste("01/", ddf$Expiry.Date, sep=""), format="%d/%m/%Y")
head(ddf)
##   Card.Type.Code       Card.Type.Full.Name    Issuing.Bank Issue.Date
## 1             DS                  Discover        Discover 2017-01-01
## 2             DC Diners Club International     Diners Club 2015-12-01
## 3             DC Diners Club International     Diners Club 2007-02-01
## 4             AX          American Express Bank of America 2011-08-01
## 5             JC       Japan Credit Bureau             JCB 2017-11-01
## 6             JC       Japan Credit Bureau             JCB 2015-11-01
##   Expiry.Date Billing.Date Credit.Limit
## 1  2022-01-01            4        22700
## 2  2016-12-01           11        12700
## 3  2020-02-01           13        81900
## 4  2013-08-01            4        77700
## 5  2026-11-01           22        38300
## 6  2029-11-01           12       158000
## I calculated the card age by subtracting the expiry date from the issue date and counted the numbers of days.. between each card duration 
ddf$Card_Age <- as.numeric(difftime(ddf$Expiry.Date, ddf$Issue.Date, units = "days"))

head(ddf)
##   Card.Type.Code       Card.Type.Full.Name    Issuing.Bank Issue.Date
## 1             DS                  Discover        Discover 2017-01-01
## 2             DC Diners Club International     Diners Club 2015-12-01
## 3             DC Diners Club International     Diners Club 2007-02-01
## 4             AX          American Express Bank of America 2011-08-01
## 5             JC       Japan Credit Bureau             JCB 2017-11-01
## 6             JC       Japan Credit Bureau             JCB 2015-11-01
##   Expiry.Date Billing.Date Credit.Limit Card_Age
## 1  2022-01-01            4        22700     1826
## 2  2016-12-01           11        12700      366
## 3  2020-02-01           13        81900     4748
## 4  2013-08-01            4        77700      731
## 5  2026-11-01           22        38300     3287
## 6  2029-11-01           12       158000     5114
## FInally, I will remove the Card Type Code Since it just a code for the card type and The Card Type Full Name since it is just the full written name for the bank card.

Final_df <- ddf %>%
  select(-Card.Type.Code,-Card.Type.Full.Name)

head(Final_df)
##      Issuing.Bank Issue.Date Expiry.Date Billing.Date Credit.Limit Card_Age
## 1        Discover 2017-01-01  2022-01-01            4        22700     1826
## 2     Diners Club 2015-12-01  2016-12-01           11        12700      366
## 3     Diners Club 2007-02-01  2020-02-01           13        81900     4748
## 4 Bank of America 2011-08-01  2013-08-01            4        77700      731
## 5             JCB 2017-11-01  2026-11-01           22        38300     3287
## 6             JCB 2015-11-01  2029-11-01           12       158000     5114

1.7 Linear Regression

I will now use linear regression on the cleaned-up data, I will remove the card_age from the dataset since it is not useful at all in creating the linear regression model.

linearmod2 <- lm(Credit.Limit~.,data = Final_df)

summary(linearmod2)
## 
## Call:
## lm(formula = Credit.Limit ~ ., data = Final_df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -90124 -50042  -7724  42308 122986 
## 
## Coefficients: (1 not defined because of singularities)
##                               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                  2.014e+05  9.682e+04   2.080   0.0406 *
## Issuing.BankBank of America  2.797e+04  3.310e+04   0.845   0.4005  
## Issuing.BankBarclays         1.002e+04  3.317e+04   0.302   0.7633  
## Issuing.BankCabela\022s WFB  5.306e+04  6.542e+04   0.811   0.4197  
## Issuing.BankCapital One      1.227e+03  3.225e+04   0.038   0.9697  
## Issuing.BankChase           -4.236e+03  2.368e+04  -0.179   0.8585  
## Issuing.BankCitibank        -1.209e+03  3.221e+04  -0.038   0.9701  
## Issuing.BankDiners Club     -6.857e+03  2.246e+04  -0.305   0.7609  
## Issuing.BankDiscover         1.148e+04  2.501e+04   0.459   0.6476  
## Issuing.BankGE Capital       1.558e+04  4.725e+04   0.330   0.7424  
## Issuing.BankJCB              1.643e+04  2.304e+04   0.713   0.4779  
## Issuing.BankU.S. Bancorp     2.645e+04  6.508e+04   0.406   0.6855  
## Issuing.BankUSAA            -7.998e+04  6.464e+04  -1.237   0.2195  
## Issuing.BankWells Fargo     -6.760e+04  4.841e+04  -1.397   0.1663  
## Issue.Date                  -6.814e+00  6.555e+00  -1.040   0.3016  
## Expiry.Date                 -4.092e-01  3.535e+00  -0.116   0.9081  
## Billing.Date                 2.468e+02  7.757e+02   0.318   0.7512  
## Card_Age                            NA         NA      NA       NA  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61740 on 83 degrees of freedom
## Multiple R-squared:    0.1,  Adjusted R-squared:  -0.07349 
## F-statistic: 0.5764 on 16 and 83 DF,  p-value: 0.8931
## Remove this column from the predictor .
Final_df2 <- Final_df %>%
  select(-Card_Age)

This dataset seems to be pretty good atleast we have values for each of the predictors, though the linear regression model is doing a poor job at explaining this data.

linearmod2 <- lm(Credit.Limit~.,data = Final_df2)

summary(linearmod2)
## 
## Call:
## lm(formula = Credit.Limit ~ ., data = Final_df2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -90124 -50042  -7724  42308 122986 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                  2.014e+05  9.682e+04   2.080   0.0406 *
## Issuing.BankBank of America  2.797e+04  3.310e+04   0.845   0.4005  
## Issuing.BankBarclays         1.002e+04  3.317e+04   0.302   0.7633  
## Issuing.BankCabela\022s WFB  5.306e+04  6.542e+04   0.811   0.4197  
## Issuing.BankCapital One      1.227e+03  3.225e+04   0.038   0.9697  
## Issuing.BankChase           -4.236e+03  2.368e+04  -0.179   0.8585  
## Issuing.BankCitibank        -1.209e+03  3.221e+04  -0.038   0.9701  
## Issuing.BankDiners Club     -6.857e+03  2.246e+04  -0.305   0.7609  
## Issuing.BankDiscover         1.148e+04  2.501e+04   0.459   0.6476  
## Issuing.BankGE Capital       1.558e+04  4.725e+04   0.330   0.7424  
## Issuing.BankJCB              1.643e+04  2.304e+04   0.713   0.4779  
## Issuing.BankU.S. Bancorp     2.645e+04  6.508e+04   0.406   0.6855  
## Issuing.BankUSAA            -7.998e+04  6.464e+04  -1.237   0.2195  
## Issuing.BankWells Fargo     -6.760e+04  4.841e+04  -1.397   0.1663  
## Issue.Date                  -6.814e+00  6.555e+00  -1.040   0.3016  
## Expiry.Date                 -4.092e-01  3.535e+00  -0.116   0.9081  
## Billing.Date                 2.468e+02  7.757e+02   0.318   0.7512  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61740 on 83 degrees of freedom
## Multiple R-squared:    0.1,  Adjusted R-squared:  -0.07349 
## F-statistic: 0.5764 on 16 and 83 DF,  p-value: 0.8931

1.8 Creating dummy variables for Issung.Bank

I was curious to see if one-hot encoding would do anything but alas it seems there were NA values so it seems I will stick with the final_df2 and leave it with that..

library(caret)

encoded_data <- model.matrix(~Issuing.Bank - 1,data = Final_df2)

head(encoded_data)
##   Issuing.BankAmerican Express Issuing.BankBank of America Issuing.BankBarclays
## 1                            0                           0                    0
## 2                            0                           0                    0
## 3                            0                           0                    0
## 4                            0                           1                    0
## 5                            0                           0                    0
## 6                            0                           0                    0
##   Issuing.BankCabela\022s WFB Issuing.BankCapital One Issuing.BankChase
## 1                           0                       0                 0
## 2                           0                       0                 0
## 3                           0                       0                 0
## 4                           0                       0                 0
## 5                           0                       0                 0
## 6                           0                       0                 0
##   Issuing.BankCitibank Issuing.BankDiners Club Issuing.BankDiscover
## 1                    0                       0                    1
## 2                    0                       1                    0
## 3                    0                       1                    0
## 4                    0                       0                    0
## 5                    0                       0                    0
## 6                    0                       0                    0
##   Issuing.BankGE Capital Issuing.BankJCB Issuing.BankU.S. Bancorp
## 1                      0               0                        0
## 2                      0               0                        0
## 3                      0               0                        0
## 4                      0               0                        0
## 5                      0               1                        0
## 6                      0               1                        0
##   Issuing.BankUSAA Issuing.BankWells Fargo
## 1                0                       0
## 2                0                       0
## 3                0                       0
## 4                0                       0
## 5                0                       0
## 6                0                       0
Final_df3 <- cbind(encoded_data,Final_df2)
linearmod3 <- lm(Credit.Limit~.,data = Final_df3)

summary(linearmod3)
## 
## Call:
## lm(formula = Credit.Limit ~ ., data = Final_df3)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -90124 -50042  -7724  42308 122986 
## 
## Coefficients: (14 not defined because of singularities)
##                                  Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                     1.338e+05  9.511e+04   1.407   0.1632  
## `Issuing.BankAmerican Express`  6.760e+04  4.841e+04   1.397   0.1663  
## `Issuing.BankBank of America`   9.557e+04  5.223e+04   1.830   0.0709 .
## Issuing.BankBarclays            7.762e+04  5.268e+04   1.474   0.1444  
## `Issuing.BankCabela\\022s WFB`  1.207e+05  7.636e+04   1.580   0.1179  
## `Issuing.BankCapital One`       6.883e+04  5.326e+04   1.292   0.1998  
## Issuing.BankChase               6.337e+04  4.710e+04   1.345   0.1822  
## Issuing.BankCitibank            6.639e+04  5.313e+04   1.250   0.2149  
## `Issuing.BankDiners Club`       6.075e+04  4.712e+04   1.289   0.2009  
## Issuing.BankDiscover            7.908e+04  4.903e+04   1.613   0.1105  
## `Issuing.BankGE Capital`        8.318e+04  6.234e+04   1.334   0.1858  
## Issuing.BankJCB                 8.403e+04  4.823e+04   1.742   0.0852 .
## `Issuing.BankU.S. Bancorp`      9.405e+04  7.562e+04   1.244   0.2171  
## Issuing.BankUSAA               -1.238e+04  7.597e+04  -0.163   0.8710  
## `Issuing.BankWells Fargo`              NA         NA      NA       NA  
## Issuing.BankBank of America            NA         NA      NA       NA  
## Issuing.BankBarclays                   NA         NA      NA       NA  
## Issuing.BankCabela\022s WFB            NA         NA      NA       NA  
## Issuing.BankCapital One                NA         NA      NA       NA  
## Issuing.BankChase                      NA         NA      NA       NA  
## Issuing.BankCitibank                   NA         NA      NA       NA  
## Issuing.BankDiners Club                NA         NA      NA       NA  
## Issuing.BankDiscover                   NA         NA      NA       NA  
## Issuing.BankGE Capital                 NA         NA      NA       NA  
## Issuing.BankJCB                        NA         NA      NA       NA  
## Issuing.BankU.S. Bancorp               NA         NA      NA       NA  
## Issuing.BankUSAA                       NA         NA      NA       NA  
## Issuing.BankWells Fargo                NA         NA      NA       NA  
## Issue.Date                     -6.814e+00  6.555e+00  -1.040   0.3016  
## Expiry.Date                    -4.092e-01  3.535e+00  -0.116   0.9081  
## Billing.Date                    2.468e+02  7.757e+02   0.318   0.7512  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61740 on 83 degrees of freedom
## Multiple R-squared:    0.1,  Adjusted R-squared:  -0.07349 
## F-statistic: 0.5764 on 16 and 83 DF,  p-value: 0.8931

1.9 Utlizing Cross_validation

I will now utilize cross validation in at attempt to improve my linear regression model and use RMSe as my metric to evaluate my model.

I will make a testing and training dataset.. (I had to stratify the data on Issuing.Bank because the dataset gets split in scenarios where the unknown labels are either in the training dataset or testset)

set.seed(123)
split_index <- createDataPartition(Final_df2$Issuing.Bank,p=0.7,list = FALSE)
## Warning in createDataPartition(Final_df2$Issuing.Bank, p = 0.7, list = FALSE):
## Some classes have a single record ( Cabelas WFB, U.S. Bancorp, USAA ) and these
## will be selected for the sample
training_data <- Final_df2[split_index,]
testing_data <- Final_df2[-split_index,]
## Set a 5 fold cross validation
set.seed(5)
control <- trainControl(method = "cv",number = 5)
metric = "rmse"
## Fit a Linear Algorithm.. with the cross validation..
set.seed(1234)
fit.lm <- train(Credit.Limit~.,data = training_data,method = "lm",trControl = control)
## Warning in predict.lm(modelFit, newdata): prediction from rank-deficient fit;
## attr(*, "non-estim") has doubtful cases

## Warning in predict.lm(modelFit, newdata): prediction from rank-deficient fit;
## attr(*, "non-estim") has doubtful cases

Here we see the performance result for each iteration we refer to the rsample object of the model

fit.lm$resample %>%
  arrange(Resample)
##       RMSE     Rsquared      MAE Resample
## 1 76504.87 3.686655e-02 66719.57    Fold1
## 2 73557.54 9.463288e-02 66697.10    Fold2
## 3 64955.93 1.561662e-02 53366.86    Fold3
## 4 64067.66 1.038633e-06 55772.15    Fold4
## 5 75021.51 9.062165e-03 62802.01    Fold5

The estimated accuracy of the model is the average of the five iterations so average RMSE is around 65877

fit.lm$resample %>%
  arrange(Resample) %>%
  summarise(AVGRMSE  = mean(RMSE))
##   AVGRMSE
## 1 70821.5

Meanwhile the average Rsquared of the model is around 3% percent using 5-fold cross validation

fit.lm$resample %>%
  arrange(Resample) %>%
  summarise(AvgRsquared  = mean(Rsquared))
##   AvgRsquared
## 1  0.03123585

Here is all the evaluation metric displayed (R-squared,RMSE,MAE) the model is pretty terrible at predicting the credit limit.

print(fit.lm)
## Linear Regression 
## 
## 76 samples
##  4 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 60, 63, 60, 61, 60 
## Resampling results:
## 
##   RMSE     Rsquared    MAE     
##   70821.5  0.03123585  61071.54
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE
predictions <- predict(fit.lm,newdata = testing_data)
head(predictions)
##        10        34        38        43        51        52 
##  58484.33 118943.20  80597.11 105214.91  86101.50  90847.18

1.10 Decision-Tree

I will now attempt to use decision tree in order to improve my predictions. I will use the Caret package and the classwork from Week # 5 to help guide me with the predictions

## Use the textbook example..

library(rpart)
set.seed(3456)
fit <- rpart(Credit.Limit~.,,method = 'anova',data = training_data)

The default k-fold cross validation is 10 so we are using 10-folds cross-validation..

## plot the cv error curve for the tree.. and visualizes the cross-validation results

plotcp(fit)

printcp(fit)
## 
## Regression tree:
## rpart(formula = Credit.Limit ~ ., data = training_data, method = "anova")
## 
## Variables actually used in tree construction:
## [1] Billing.Date Issuing.Bank
## 
## Root node error: 2.7825e+11/76 = 3661182291
## 
## n= 76 
## 
##         CP nsplit rel error xerror     xstd
## 1 0.104690      0   1.00000 1.0211 0.093379
## 2 0.047020      2   0.79062 1.1893 0.137519
## 3 0.017008      5   0.64956 1.2033 0.160569
## 4 0.010000      6   0.63255 1.1871 0.162612
summary(fit)
## Call:
## rpart(formula = Credit.Limit ~ ., data = training_data, method = "anova")
##   n= 76 
## 
##           CP nsplit rel error   xerror      xstd
## 1 0.10469046      0 1.0000000 1.021110 0.0933792
## 2 0.04701965      2 0.7906191 1.189330 0.1375194
## 3 0.01700815      5 0.6495601 1.203257 0.1605691
## 4 0.01000000      6 0.6325520 1.187051 0.1626117
## 
## Variable importance
## Billing.Date Issuing.Bank  Expiry.Date   Issue.Date 
##           48           33           12            8 
## 
## Node number 1: 76 observations,    complexity param=0.1046905
##   mean=92885.53, MSE=3.661182e+09 
##   left son=2 (52 obs) right son=3 (24 obs)
##   Primary splits:
##       Issuing.Bank splits as  LRRRRLLLRRLRLL, improve=0.07126884, (0 missing)
##       Billing.Date < 9.5     to the left,  improve=0.05548784, (0 missing)
##       Issue.Date   < 14579.5 to the right, improve=0.04089661, (0 missing)
##       Expiry.Date  < 21381.5 to the right, improve=0.01537840, (0 missing)
##   Surrogate splits:
##       Expiry.Date < 14776   to the right, agree=0.697, adj=0.042, (0 split)
## 
## Node number 2: 52 observations,    complexity param=0.04701965
##   mean=81911.54, MSE=3.540029e+09 
##   left son=4 (16 obs) right son=5 (36 obs)
##   Primary splits:
##       Billing.Date < 21      to the right, improve=0.04930842, (0 missing)
##       Issuing.Bank splits as  R----RLR--R-LL, improve=0.04543175, (0 missing)
##       Expiry.Date  < 19235.5 to the left,  improve=0.04107124, (0 missing)
##       Issue.Date   < 14518   to the right, improve=0.01963911, (0 missing)
##   Surrogate splits:
##       Expiry.Date < 22340   to the right, agree=0.75, adj=0.188, (0 split)
## 
## Node number 3: 24 observations,    complexity param=0.1046905
##   mean=116662.5, MSE=3.097409e+09 
##   left son=6 (15 obs) right son=7 (9 obs)
##   Primary splits:
##       Billing.Date < 20.5    to the left,  improve=0.51695990, (0 missing)
##       Issue.Date   < 15095.5 to the right, improve=0.09541808, (0 missing)
##       Expiry.Date  < 18566   to the right, improve=0.06577621, (0 missing)
##       Issuing.Bank splits as  -RLRL---RL-R--, improve=0.02043071, (0 missing)
##   Surrogate splits:
##       Expiry.Date  < 15735.5 to the right, agree=0.750, adj=0.333, (0 split)
##       Issue.Date   < 14442   to the right, agree=0.708, adj=0.222, (0 split)
##       Issuing.Bank splits as  -LLRL---LL-L--, agree=0.667, adj=0.111, (0 split)
## 
## Node number 4: 16 observations
##   mean=62093.75, MSE=1.663733e+09 
## 
## Node number 5: 36 observations,    complexity param=0.04701965
##   mean=90719.44, MSE=4.121805e+09 
##   left son=10 (7 obs) right son=11 (29 obs)
##   Primary splits:
##       Issuing.Bank splits as  R----RLR--R-LL, improve=0.09826102, (0 missing)
##       Billing.Date < 14.5    to the left,  improve=0.07934006, (0 missing)
##       Expiry.Date  < 20499   to the left,  improve=0.06797713, (0 missing)
##       Issue.Date   < 14518   to the right, improve=0.02139059, (0 missing)
## 
## Node number 6: 15 observations
##   mean=85666.67, MSE=2.207621e+09 
## 
## Node number 7: 9 observations
##   mean=168322.2, MSE=3.104262e+08 
## 
## Node number 10: 7 observations
##   mean=49757.14, MSE=2.097682e+09 
## 
## Node number 11: 29 observations,    complexity param=0.04701965
##   mean=100606.9, MSE=4.107612e+09 
##   left son=22 (22 obs) right son=23 (7 obs)
##   Primary splits:
##       Billing.Date < 14.5    to the left,  improve=0.13089580, (0 missing)
##       Issuing.Bank splits as  R----L-L--R---, improve=0.02664292, (0 missing)
##       Expiry.Date  < 20499   to the left,  improve=0.02592121, (0 missing)
##       Issue.Date   < 14518   to the right, improve=0.02488519, (0 missing)
## 
## Node number 22: 22 observations,    complexity param=0.01700815
##   mean=87527.27, MSE=4.376385e+09 
##   left son=44 (9 obs) right son=45 (13 obs)
##   Primary splits:
##       Issuing.Bank splits as  L----R-L--R---, improve=0.04915344, (0 missing)
##       Expiry.Date  < 18915.5 to the left,  improve=0.02988246, (0 missing)
##       Billing.Date < 9.5     to the left,  improve=0.02918640, (0 missing)
##       Issue.Date   < 15066.5 to the right, improve=0.01159310, (0 missing)
##   Surrogate splits:
##       Issue.Date   < 13848   to the left,  agree=0.727, adj=0.333, (0 split)
##       Billing.Date < 1.5     to the left,  agree=0.682, adj=0.222, (0 split)
##       Expiry.Date  < 18673   to the left,  agree=0.636, adj=0.111, (0 split)
## 
## Node number 23: 7 observations
##   mean=141714.3, MSE=1.035413e+09 
## 
## Node number 44: 9 observations
##   mean=69900, MSE=3.379296e+09 
## 
## Node number 45: 13 observations
##   mean=99730.77, MSE=4.702638e+09
par(mfrow=c(1,2)) # two plots on one page
rsq.rpart(fit) 
## 
## Regression tree:
## rpart(formula = Credit.Limit ~ ., data = training_data, method = "anova")
## 
## Variables actually used in tree construction:
## [1] Billing.Date Issuing.Bank
## 
## Root node error: 2.7825e+11/76 = 3661182291
## 
## n= 76 
## 
##         CP nsplit rel error xerror     xstd
## 1 0.104690      0   1.00000 1.0211 0.093379
## 2 0.047020      2   0.79062 1.1893 0.137519
## 3 0.017008      5   0.64956 1.2033 0.160569
## 4 0.010000      6   0.63255 1.1871 0.162612

# create attractive postcript plot of tree
plot(fit, uniform=TRUE,
   main="Regression Tree for Mileage ")
text(fit, use.n=TRUE, all=TRUE, cex=.8)

post(fit, title = "Regression Tree for Credit Limit ")
## Warning in text.default(rightptx, rightpty - 0.52 * cxy[2L],
## rows[right.child[!is.na(right.child)]], : font width unknown for character 0x12
## Warning in text.default(rightptx, rightpty - 0.52 * cxy[2L],
## rows[right.child[!is.na(right.child)]], : font metrics unknown for character
## 0x12
# Create predictions using the decision tree..

predictions1 <- predict(fit,newdata = testing_data)

head(predictions1)
##        10        34        38        43        51        52 
##  69900.00 168322.22  99730.77  99730.77  62093.75  62093.75
RMSE(predictions1,testing_data$Credit.Limit)
## [1] 71028.21

1.11 Choosing THe Minimum Cost Penalty

I was curious why the tree didn’t choouse the minimum cp since it seems like the obvious choice however after looking at the predictions I can see why.. Interesting enough we get the same predictions everytime since there were no splits being made at the tree.. The RMSE was also reduced from the second predictions appromixately a difference of 20k..

## Finds the best value of cp..

min_cp <- fit$cptable[which.min(fit$cptable[,"xerror"]),"CP"]

min_cp
## [1] 0.1046905

Using the minimal cost penalty doesn’t help with the predictions..

min_cp_optimal <- prune(fit,cp = min_cp)
predictions3 <- predict(min_cp_optimal,newdata = testing_data)
head(predictions3)
## [1] 92885.53 92885.53 92885.53 92885.53 92885.53 92885.53
RMSE(predictions3,testing_data$Credit.Limit)
## [1] 55259.12

1.12 Thoughts

From the way I have cleaned up the data, and attempted to use K-folds cross-validation in order to improve the data the quality of the data was terrible, Regardless of how I cleaned it, it seems to always give terrible predictions no matter how much I tried removing predictors or use techniques like cross-validation. This data set contained a lot of inconsistent data such as personal information of the users and their credit card information which I felt would not be helpful at all for the predictions. It also may be the case that there was simply not a lot of observations of the data.