Here I am reviewing why CredX, a leading credit card provider, has been experienced an increased in credit loss. I am responsible for mitigating the risk to “acquire the right customers” for CredX. The data provided from CredX contains inforamtion of 71,295 customers of recent. There are two data sets; Credit Bureau and Demographics data.
credit <- read.csv("C:/Users/khend/Documents/R/Credit_Bureau.csv")
demo <- read.csv("C:/Users/khend/Documents/R/Demogs.csv")
Z <- merge(credit, demo,"Application.ID")
Provided below is a cleaner joint data frame of the Credit Bureau Data and Demographics Data.To begin the research it is required that we join the data to review it more properly.
Cred_Demo <- data.frame(Z[1:10,1:30])
Cred_Demo
## Application.ID No.of.times.90.DPD.or.worse.in.last.6.months
## 1 100450 0
## 2 128993 1
## 3 142768 0
## 4 176721 0
## 5 197956 0
## 6 203973 0
## 7 207075 1
## 8 210394 0
## 9 223052 1
## 10 237197 0
## No.of.times.60.DPD.or.worse.in.last.6.months
## 1 0
## 2 1
## 3 1
## 4 0
## 5 0
## 6 0
## 7 2
## 8 0
## 9 1
## 10 0
## No.of.times.30.DPD.or.worse.in.last.6.months
## 1 0
## 2 2
## 3 1
## 4 0
## 5 0
## 6 0
## 7 2
## 8 0
## 9 1
## 10 0
## No.of.times.90.DPD.or.worse.in.last.12.months
## 1 0
## 2 1
## 3 1
## 4 0
## 5 0
## 6 0
## 7 2
## 8 0
## 9 1
## 10 0
## No.of.times.60.DPD.or.worse.in.last.12.months
## 1 0
## 2 1
## 3 1
## 4 0
## 5 0
## 6 0
## 7 2
## 8 0
## 9 1
## 10 0
## No.of.times.30.DPD.or.worse.in.last.12.months
## 1 0
## 2 2
## 3 1
## 4 0
## 5 0
## 6 0
## 7 2
## 8 0
## 9 1
## 10 0
## Avgas.CC.Utilization.in.last.12.months No.of.trades.opened.in.last.6.months
## 1 113 2
## 2 9 1
## 3 34 2
## 4 18 0
## 5 3 7
## 6 6 1
## 7 70 4
## 8 0 1
## 9 48 2
## 10 13 1
## No.of.trades.opened.in.last.12.months
## 1 8
## 2 6
## 3 7
## 4 1
## 5 13
## 6 2
## 7 11
## 8 2
## 9 5
## 10 3
## No.of.PL.trades.opened.in.last.6.months
## 1 2
## 2 1
## 3 2
## 4 0
## 5 5
## 6 0
## 7 3
## 8 0
## 9 2
## 10 0
## No.of.PL.trades.opened.in.last.12.months
## 1 5
## 2 4
## 3 4
## 4 1
## 5 5
## 6 0
## 7 6
## 8 1
## 9 3
## 10 1
## No.of.Inquiries.in.last.6.months..excluding.home...auto.loans.
## 1 1
## 2 4
## 3 1
## 4 2
## 5 6
## 6 0
## 7 3
## 8 1
## 9 2
## 10 2
## No.of.Inquiries.in.last.12.months..excluding.home...auto.loans.
## 1 3
## 2 7
## 3 2
## 4 3
## 5 11
## 6 0
## 7 4
## 8 3
## 9 5
## 10 4
## Presence.of.open.home.loan Outstanding.Balance Total.No.of.Trades
## 1 1 3903438 9
## 2 0 741058 8
## 3 0 815325 9
## 4 0 209593 3
## 5 0 992024 25
## 6 0 556 5
## 7 0 1072760 11
## 8 0 202816 4
## 9 0 575772 6
## 10 0 204444 5
## Presence.of.open.auto.loan Performance.Tag.x Age Gender
## 1 0 0 52 M
## 2 0 1 36 M
## 3 0 0 55 M
## 4 0 0 55 M
## 5 0 0 28 F
## 6 0 0 43 F
## 7 0 NA 46 M
## 8 0 0 42 M
## 9 0 0 51 M
## 10 0 0 44 M
## Marital.Status..at.the.time.of.application. No.of.dependents Income
## 1 Married 4 32.0
## 2 Married 4 13.0
## 3 Married 1 29.0
## 4 Married 3 53.0
## 5 Married 3 35.0
## 6 Single 1 35.0
## 7 Married 3 6.0
## 8 Married 2 43.0
## 9 Married 3 4.5
## 10 Married 4 5.0
## Education Profession Type.of.residence No.of.months.in.current.residence
## 1 Professional SE_PROF Rented 79
## 2 Professional SAL Rented 6
## 3 Phd SAL Rented 46
## 4 Professional SE_PROF Rented 6
## 5 Masters SAL Rented 6
## 6 Professional SAL Rented 6
## 7 Bachelor SAL Rented 6
## 8 Bachelor SE_PROF Rented 6
## 9 Masters SAL Rented 83
## 10 Professional SAL Rented 6
## No.of.months.in.current.company Performance.Tag.y
## 1 3 0
## 2 21 1
## 3 3 0
## 4 27 0
## 5 43 0
## 6 52 0
## 7 19 NA
## 8 3 0
## 9 48 0
## 10 38 0
Within the data provided, we identified several “Application IDs” that were duplicated. You can see that there are three segments of “Application IDs” that are repeated each three times. This means that they should be removed completely from the data as we do not know which one is a reliable measurement.
Dup_Val <- Z[duplicated(Z$Application.ID),]
Dup_Val
## Application.ID No.of.times.90.DPD.or.worse.in.last.6.months
## 46614 653287861 0
## 46615 653287861 1
## 46616 653287861 1
## 47932 671989187 1
## 47933 671989187 0
## 47934 671989187 0
## 54596 765011468 0
## 54597 765011468 0
## 54598 765011468 0
## No.of.times.60.DPD.or.worse.in.last.6.months
## 46614 0
## 46615 1
## 46616 1
## 47932 2
## 47933 1
## 47934 1
## 54596 0
## 54597 0
## 54598 0
## No.of.times.30.DPD.or.worse.in.last.6.months
## 46614 0
## 46615 1
## 46616 1
## 47932 3
## 47933 2
## 47934 2
## 54596 0
## 54597 0
## 54598 0
## No.of.times.90.DPD.or.worse.in.last.12.months
## 46614 0
## 46615 2
## 46616 2
## 47932 2
## 47933 0
## 47934 0
## 54596 0
## 54597 0
## 54598 0
## No.of.times.60.DPD.or.worse.in.last.12.months
## 46614 0
## 46615 2
## 46616 2
## 47932 2
## 47933 2
## 47934 2
## 54596 0
## 54597 0
## 54598 0
## No.of.times.30.DPD.or.worse.in.last.12.months
## 46614 0
## 46615 2
## 46616 2
## 47932 3
## 47933 3
## 47934 3
## 54596 0
## 54597 0
## 54598 0
## Avgas.CC.Utilization.in.last.12.months
## 46614 15
## 46615 113
## 46616 113
## 47932 63
## 47933 76
## 47934 76
## 54596 11
## 54597 11
## 54598 11
## No.of.trades.opened.in.last.6.months
## 46614 0
## 46615 2
## 46616 2
## 47932 2
## 47933 3
## 47934 3
## 54596 1
## 54597 1
## 54598 1
## No.of.trades.opened.in.last.12.months
## 46614 0
## 46615 5
## 46616 5
## 47932 6
## 47933 7
## 47934 7
## 54596 2
## 54597 3
## 54598 3
## No.of.PL.trades.opened.in.last.6.months
## 46614 0
## 46615 1
## 46616 1
## 47932 1
## 47933 1
## 47934 1
## 54596 0
## 54597 0
## 54598 0
## No.of.PL.trades.opened.in.last.12.months
## 46614 0
## 46615 3
## 46616 3
## 47932 4
## 47933 4
## 47934 4
## 54596 1
## 54597 0
## 54598 0
## No.of.Inquiries.in.last.6.months..excluding.home...auto.loans.
## 46614 0
## 46615 1
## 46616 1
## 47932 3
## 47933 2
## 47934 2
## 54596 2
## 54597 1
## 54598 1
## No.of.Inquiries.in.last.12.months..excluding.home...auto.loans.
## 46614 0
## 46615 3
## 46616 3
## 47932 5
## 47933 6
## 47934 6
## 54596 2
## 54597 3
## 54598 3
## Presence.of.open.home.loan Outstanding.Balance Total.No.of.Trades
## 46614 0 10494 2
## 46615 0 628075 6
## 46616 0 628075 6
## 47932 0 806473 7
## 47933 0 822298 7
## 47934 0 822298 7
## 54596 0 212656 3
## 54597 0 29817 6
## 54598 0 29817 6
## Presence.of.open.auto.loan Performance.Tag.x Age Gender
## 46614 0 0 40 M
## 46615 0 1 26 M
## 46616 0 1 40 M
## 47932 1 0 57 M
## 47933 0 0 27 M
## 47934 0 0 57 M
## 54596 0 0 38 M
## 54597 0 0 57 M
## 54598 0 0 38 M
## Marital.Status..at.the.time.of.application. No.of.dependents Income
## 46614 Married 5 32.0
## 46615 Married 3 25.0
## 46616 Married 5 32.0
## 47932 Married 4 7.0
## 47933 Married 2 35.0
## 47934 Married 4 7.0
## 54596 Married 4 4.5
## 54597 Single 4 4.5
## 54598 Married 4 4.5
## Education Profession Type.of.residence
## 46614 Phd SE Rented
## 46615 Bachelor SAL Rented
## 46616 Phd SE Rented
## 47932 Professional SE Rented
## 47933 Professional SAL Living with Parents
## 47934 Professional SE Rented
## 54596 Professional SAL Rented
## 54597 Bachelor SAL Company provided
## 54598 Professional SAL Rented
## No.of.months.in.current.residence No.of.months.in.current.company
## 46614 45 46
## 46615 112 58
## 46616 45 46
## 47932 42 3
## 47933 108 7
## 47934 42 3
## 54596 6 72
## 54597 6 56
## 54598 6 72
## Performance.Tag.y
## 46614 1
## 46615 0
## 46616 1
## 47932 0
## 47933 0
## 47934 0
## 54596 0
## 54597 0
## 54598 0
With research within both the Credit Bureau Data and the Demographics we identified numerous variables that has missing values. These missing values are vital to understand, being that we are forced to remove them from the following data visualizations. The missing values will create false data or incorrect data for the reserach. As you can see below there are 272 missing values of “outstanding balance” and "presence of open home loan each one of these observations or people should be removed from the data to ensure correctiveness.
Missing_Var <-colSums(is.na(Z))
Missing_Var
## Application.ID
## 0
## No.of.times.90.DPD.or.worse.in.last.6.months
## 0
## No.of.times.60.DPD.or.worse.in.last.6.months
## 0
## No.of.times.30.DPD.or.worse.in.last.6.months
## 0
## No.of.times.90.DPD.or.worse.in.last.12.months
## 0
## No.of.times.60.DPD.or.worse.in.last.12.months
## 0
## No.of.times.30.DPD.or.worse.in.last.12.months
## 0
## Avgas.CC.Utilization.in.last.12.months
## 1058
## No.of.trades.opened.in.last.6.months
## 1
## No.of.trades.opened.in.last.12.months
## 0
## No.of.PL.trades.opened.in.last.6.months
## 0
## No.of.PL.trades.opened.in.last.12.months
## 0
## No.of.Inquiries.in.last.6.months..excluding.home...auto.loans.
## 0
## No.of.Inquiries.in.last.12.months..excluding.home...auto.loans.
## 0
## Presence.of.open.home.loan
## 272
## Outstanding.Balance
## 272
## Total.No.of.Trades
## 0
## Presence.of.open.auto.loan
## 0
## Performance.Tag.x
## 1425
## Age
## 0
## Gender
## 0
## Marital.Status..at.the.time.of.application.
## 0
## No.of.dependents
## 3
## Income
## 0
## Education
## 0
## Profession
## 0
## Type.of.residence
## 0
## No.of.months.in.current.residence
## 0
## No.of.months.in.current.company
## 0
## Performance.Tag.y
## 1425
I was curious to see if the outstanding balances that CredX provided would vary that much due to the education brackets that each indivial pertains to. The educations are broken into five different segments: Bachelor, Masters, Professional, PHD, and Other.It turns out that education background has not made much of a difference between the CredX users outstanding balances, as they all have similar averages of out standing balances(hence the middle line within the box). One thing to recognize, is the large amount of outliers within the “Other” category of education. That being said, I would advise CredX to try and market to more individuals that have a higher education.
## Warning: package 'dplyr' was built under R version 3.6.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Warning: package 'ggplot2' was built under R version 3.6.3
Below we are reviewing the frequency of each Age group based on the Demographic data provided by CredX. Here you can see that the most common users of CredX are between their late 30’s to their late 50’s. However, if you want to dive deeper into the data the most common ages are late 30’s into early 40’s. CredX should remove trying to target adults younger than 35 and older than 55 to maximize the credit users and return.
Throughout this examination I was hoping to clarify and identify the “perfect customer” for CredX, but it seems I would have to do more research within the data or examine different variables. For example when researching the outstanding balances segmented by eahc education bracket of the demogrpahics I was hoping to discover something more significant. To be totally honest, this project was very difficult even though my data visualizations tables and graphs are not very developed or in depth. I was only able to do the bare minimum. The Data Camp exercises are great but they do not prepare us for this type of project, unless we have had prior coding experience.