Credit Card Customer Dataset

Introduction

This data set show that 10,000 customer mentioning their age, salary, marital_status, credit card limit, credit card category, etc.

We will done exploratory analysis on this data to understand customer demography, customer credit card usage and loyal customer from this bank dataset. This dataset is download from Kaggle

data <- read.csv("C:/Users/ANDREW/OneDrive/Desktop/UM/Course/WQD 7004 - Programming for Data Science/2. Individual Assignment 04.22/archive/BankChurners.csv")
# To show the first 10 row of the dataset
head(data, n=10)
##    CLIENTNUM    Attrition_Flag Customer_Age Gender Dependent_count
## 1  768805383 Existing Customer           45      M               3
## 2  818770008 Existing Customer           49      F               5
## 3  713982108 Existing Customer           51      M               3
## 4  769911858 Existing Customer           40      F               4
## 5  709106358 Existing Customer           40      M               3
## 6  713061558 Existing Customer           44      M               2
## 7  810347208 Existing Customer           51      M               4
## 8  818906208 Existing Customer           32      M               0
## 9  710930508 Existing Customer           37      M               3
## 10 719661558 Existing Customer           48      M               2
##    Education_Level Marital_Status Income_Category Card_Category Months_on_book
## 1      High School        Married     $60K - $80K          Blue             39
## 2         Graduate         Single  Less than $40K          Blue             44
## 3         Graduate        Married    $80K - $120K          Blue             36
## 4      High School        Unknown  Less than $40K          Blue             34
## 5       Uneducated        Married     $60K - $80K          Blue             21
## 6         Graduate        Married     $40K - $60K          Blue             36
## 7          Unknown        Married         $120K +          Gold             46
## 8      High School        Unknown     $60K - $80K        Silver             27
## 9       Uneducated         Single     $60K - $80K          Blue             36
## 10        Graduate         Single    $80K - $120K          Blue             36
##    Total_Relationship_Count Months_Inactive_12_mon Contacts_Count_12_mon
## 1                         5                      1                     3
## 2                         6                      1                     2
## 3                         4                      1                     0
## 4                         3                      4                     1
## 5                         5                      1                     0
## 6                         3                      1                     2
## 7                         6                      1                     3
## 8                         2                      2                     2
## 9                         5                      2                     0
## 10                        6                      3                     3
##    Credit_Limit Total_Revolving_Bal Avg_Open_To_Buy Total_Amt_Chng_Q4_Q1
## 1         12691                 777           11914                1.335
## 2          8256                 864            7392                1.541
## 3          3418                   0            3418                2.594
## 4          3313                2517             796                1.405
## 5          4716                   0            4716                2.175
## 6          4010                1247            2763                1.376
## 7         34516                2264           32252                1.975
## 8         29081                1396           27685                2.204
## 9         22352                2517           19835                3.355
## 10        11656                1677            9979                1.524
##    Total_Trans_Amt Total_Trans_Ct Total_Ct_Chng_Q4_Q1 Avg_Utilization_Ratio
## 1             1144             42               1.625                 0.061
## 2             1291             33               3.714                 0.105
## 3             1887             20               2.333                 0.000
## 4             1171             20               2.333                 0.760
## 5              816             28               2.500                 0.000
## 6             1088             24               0.846                 0.311
## 7             1330             31               0.722                 0.066
## 8             1538             36               0.714                 0.048
## 9             1350             24               1.182                 0.113
## 10            1441             32               0.882                 0.144

Packages info

  1. library(knitr)

  2. library(dplyr)

  3. library(ggplot2) - used to plot data for data visual

  4. library(data.table) - used to categorize age into groups defined by interval

  5. library(viridis) - used for color scale to make plot

  6. library(janitor) - used of tabyl( function to count combination of one, two or three variables and show the percentage.

Data Preparation

  1. CLIENTNUM - Client number. Unique identifier for the customer holding the account

  2. Attrition_Flag - Internal event (customer activity) variable - if the account is closed then 1 else 0

  3. Customer_Age - Demographic variable - Customer’s Age in Years

  4. Gender - Demographic variable - M=Male, F=Female

  5. Dependent_count - Demographic variable - Number of dependents

  6. Education_Level - Demographic variable - Educational Qualification of the account holder (example: high school, college graduate, etc.)

  7. Marital_Status - Demographic variable - Married, Single, Divorced, Unknown

  8. Income_Category - Demographic variable - Annual Income Category of the account holder (< $40K, $40K - 60K, $60K - $80K, $80K-$120K, >$120K)

  9. Card_Category - Product Variable - Type of Card (Blue, Silver, Gold, Platinum)

  10. Months_on_book - Period of relationship with bank

  11. Total_Relationship_Count - Total no. of products held by the customer

  12. Months_Inactive_12_mon - No. of months inactive in the last 12 months

  13. Contacts_Count_12_mon - No. of Contacts in the last 12 months

  14. Credit_Limit - Credit Limit on the Credit Card

  15. Total_Revolving_Bal - Total Revolving Balance on the Credit Card

  16. Avg_Open_To_Buy - Open to Buy Credit Line (Average of last 12 months)

  17. Total_Amt_Chng_Q4_Q1 - Change in Transaction Amount (Q4 over Q1)

  18. Total_Trans_Amt - Total Transaction Amount (Last 12 months)

  19. Total_Trans_Ct - Total Transaction Count (Last 12 months)

  20. Total_Ct_Chng_Q4_Q1 - Change in Transaction Count (Q4 over Q1)

  21. Avg_Utilization_Ratio - Average Card Utilization Ratio

# Set "df" as data frame from original data
df <- data

# Change all variable name to lowercase
names(df) <- tolower(names(df))

# Rename 'customer_age' column to 'age'
names(df)[3] <- "age"

# Add new column agegroup by categorize age
setDT(df)[age >25 & age <= 35, agegroup := "25-35 Years Old"]
df[age >35 & age <= 45, agegroup := "35-45 Years Old"]
df[age >45 & age <= 55, agegroup := "45-55 Years Old"]
df[age >55 & age <= 65, agegroup := "55-65 Years Old"]
df[age >65 & age <= 75, agegroup := "65-75 Years Old"]

# Add new column total_trans_cat (total transaction category) by categorize total transaction amount
setDT(df)[total_trans_amt <= 2000, total_trans_cat := "$0 - $2000"]
df[total_trans_amt >= 2001 & total_trans_amt <= 4000, total_trans_cat := "$2001 - $4000"]
df[total_trans_amt >= 4001 & total_trans_amt <= 6000, total_trans_cat := "$4001 - $6000"]
df[total_trans_amt >= 6001 & total_trans_amt <= 8000, total_trans_cat := "$6001 - $8000"]
df[total_trans_amt >= 8001 & total_trans_amt <= 10000, total_trans_cat := "$8001 - $10000"]
df[total_trans_amt >= 10001 & total_trans_amt <= 12000, total_trans_cat := "$10001 - $12000"]
df[total_trans_amt >= 12001 & total_trans_amt <= 14000, total_trans_cat := "$12001 - $14000"]
df[total_trans_amt >= 14001 & total_trans_amt <= 16000, total_trans_cat := "$14001 - $16000"]

# Add new column years_on_book from months_on_book
df$years_on_book <- as.numeric(as.character(df$months_on_book)) / 12

# Remove unknown in income_category
df1 <- df[!(df$income_category=="Unknown")]

Data Analysis

ggplot(data = df1,
       mapping = aes(x = age, fill = gender)) +
      geom_histogram(alpha = 5, bins = 50) +
      labs(fill = "Gender",  x = "Customer Age", y = "Count",
      title = "Customer Demography") +
      theme_bw() +
      scale_x_continuous(breaks = round(seq(min(df1$age), max(df1$age), 
                                            by = 2),1))

# Show the Male and Female in total and percentage
tabyl(df1, gender)
##  gender    n  percent
##       F 4298 0.476761
##       M 4717 0.523239
# Calculate the Majority percentage of credit card customer from age 36 - 58
major <- sum(df1$age[between(df1$age,36,58)]) / sum(df1$age) *100
print(major)
## [1] 85.28081
# Calculate the least percentage of credit card customer from age 26 - 35 and age 59 - 73
minor <- 100 - major
print(minor)
## [1] 14.71919

Summary from customer demography table

  • Customer demography table show the distribution of customer group from dataset. In total there are 4,298 Female customers (47.7%) and 4,717 Male customers (52.3%).

  • Majority of customer is between 36 years old and 58 years old which is 85.3%.

  • Customer above 65 years old and below 28 years old is least group size which is 14.7%.


ggplot(data = df1,
       mapping = aes(x = age, fill = card_category)) +
      geom_histogram(alpha = 5, bins = 50) +
      labs(fill = "Card Category" ,x = "Credit Card Customer By Category", 
           y = "Count", title = "Customer Demography by Card Category") +
      facet_grid(. ~ gender) +
      theme_bw() +
     scale_x_continuous(breaks = round(seq(min(df1$age), max(df1$age), by = 5),1))

# Show the customer demography by card category 
tabyl(df1, card_category)
##  card_category    n     percent
##           Blue 8391 0.930782030
##           Gold  107 0.011869107
##       Platinum   15 0.001663894
##         Silver  502 0.055684969
# Show the customer demography by card category by gender
tabyl(df1, card_category, gender)
##  card_category    F    M
##           Blue 4106 4285
##           Gold   29   78
##       Platinum    4   11
##         Silver  159  343

Summary from customer demography by card category

  • Table above show that there was 8,391(93.1%) customers use Blue card in which 4,106 Female customers and 4,285 Male customers.

  • 502 (5.6%) customers use Silver card in which 159 Female customers and 343 Male customers.

  • 107 (1.2%) customers use Gold card in which 29 Female customers and 78 Male customes.

  • 15 (0.1%) customers use Platinum card in which 4 Female customers and 11 Male customers.


ggplot(data = df1, mapping = aes(x = total_trans_ct, y = total_trans_amt)) +
  geom_point(aes(color = gender)) +
  facet_wrap(~ agegroup, nrow=2) +
  theme_bw() +
  ggtitle("Customer Credit Card Usage") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Total Transaction Count") +
  ylab("Total Transaction Amount") +
  labs(color = "Gender") +
  scale_color_manual(labels = c("Female", "Male"), values = c("pink", "blue")) +
  scale_x_continuous(breaks = round(seq(min(df1$total_trans_ct), max(df1$total_trans_ct), by = 20),1)) +
  scale_y_continuous(breaks = round(seq(min(df1$total_trans_amt), max(df1$total_trans_amt), by = 1500),1))

# Show the customer spending behavior by total transaction amount category
tabyl(df1, total_trans_cat)
##  total_trans_cat    n     percent valid_percent
##       $0 - $2000 2034 0.225623960   0.227669577
##  $10001 - $12000   10 0.001109262   0.001119319
##  $12001 - $14000  173 0.019190238   0.019364227
##  $14001 - $16000  433 0.048031059   0.048466532
##    $2001 - $4000 2735 0.303383250   0.306133871
##    $4001 - $6000 2908 0.322573489   0.325498097
##    $6001 - $8000  334 0.037049362   0.037385270
##   $8001 - $10000  307 0.034054354   0.034363107
##             <NA>   81 0.008985025            NA
# Show the customer spending behavior by total transaction amount category by age group
tabyl(df1, total_trans_cat, agegroup) %>%
  adorn_totals("row") %>%
  adorn_percentages("all") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns %>%
  adorn_title
##                         agegroup                                
##  total_trans_cat 25-35 Years Old 35-45 Years Old 45-55 Years Old
##       $0 - $2000      1.5% (138)     7.1%  (644)     9.1%  (818)
##  $10001 - $12000      0.0%   (2)     0.0%    (4)     0.0%    (3)
##  $12001 - $14000      0.4%  (34)     0.7%   (64)     0.6%   (56)
##  $14001 - $16000      0.6%  (52)     1.8%  (161)     1.9%  (168)
##    $2001 - $4000      4.2% (380)    11.9% (1069)    11.3% (1023)
##    $4001 - $6000      1.5% (133)    12.6% (1140)    14.4% (1297)
##    $6001 - $8000      0.1%  (13)     1.5%  (138)     1.7%  (154)
##   $8001 - $10000      0.3%  (24)     1.4%  (128)     1.4%  (129)
##             <NA>      0.1%  (12)     0.3%   (24)     0.4%   (36)
##            Total      8.7% (788)    37.4% (3372)    40.9% (3684)
##                                 
##  55-65 Years Old 65-75 Years Old
##      4.7%  (428)        0.1% (6)
##      0.0%    (1)        0.0% (0)
##      0.2%   (19)        0.0% (0)
##      0.6%   (52)        0.0% (0)
##      2.9%  (262)        0.0% (1)
##      3.7%  (338)        0.0% (0)
##      0.3%   (29)        0.0% (0)
##      0.3%   (26)        0.0% (0)
##      0.1%    (9)        0.0% (0)
##     12.9% (1164)        0.1% (7)

Summary from Customer spending behavior table

  • According to the table above, the highest category of customer spending on credit card is $4001 - $6000 (32%), follow by $2001 - $4000 (30%) and $0 - $2000 (22%).

  • The least category of customer spending on credit card is $10001 - $12000 (0.1%), $12001 - $14000 (2%) and $8001 - $10000 (3%)

  • The highest category of customer spending on credit card based on age group is customer age group 45-55 years old who spend $4001 - $6000 (14.4%), customer age group 35-45 years old who spend $4001 - $6000 (12.6%) and customer age group 35-45 years old who spend $2001 - $4000 (11.9%)


ggplot(data = df1,
       mapping = aes(x = months_on_book, fill = attrition_flag)) +
      geom_histogram(alpha = 0.5 , bins = 20) +
      labs(fill = "Attrition Customer",  x = "Customer Months On Book", y = "Count",
      title = "Credit Card Loyal Customer") +
      theme_bw() +
      scale_x_continuous(breaks = round(seq(min(df1$months_on_book), max(df1$months_on_book), by = 2),1))

# To show loyal customer based on months_on_book in bank
tabyl(df1, months_on_book, attrition_flag) %>%
  adorn_totals("row") %>%
  adorn_percentages("all") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns %>%
  adorn_title
##                    attrition_flag                  
##  months_on_book Attrited Customer Existing Customer
##              13       0.1%    (6)       0.5%   (44)
##              14       0.0%    (1)       0.2%   (14)
##              15       0.1%    (7)       0.2%   (18)
##              16       0.0%    (3)       0.2%   (22)
##              17       0.0%    (4)       0.4%   (33)
##              18       0.1%   (11)       0.4%   (39)
##              19       0.1%    (6)       0.6%   (54)
##              20       0.1%   (12)       0.6%   (58)
##              21       0.1%    (9)       0.7%   (61)
##              22       0.2%   (16)       0.9%   (77)
##              23       0.1%   (12)       1.0%   (91)
##              24       0.2%   (21)       1.3%  (118)
##              25       0.3%   (26)       1.3%  (120)
##              26       0.2%   (20)       1.7%  (149)
##              27       0.2%   (21)       1.7%  (156)
##              28       0.5%   (41)       2.3%  (208)
##              29       0.4%   (32)       2.1%  (186)
##              30       0.6%   (53)       2.4%  (220)
##              31       0.3%   (30)       2.8%  (252)
##              32       0.4%   (40)       2.5%  (222)
##              33       0.5%   (44)       2.6%  (236)
##              34       0.6%   (52)       2.9%  (262)
##              35       0.4%   (40)       2.7%  (247)
##              36       4.2%  (383)      20.0% (1804)
##              37       0.6%   (55)       2.9%  (265)
##              38       0.6%   (52)       2.9%  (258)
##              39       0.6%   (55)       2.8%  (248)
##              40       0.4%   (39)       2.7%  (247)
##              41       0.5%   (44)       2.4%  (217)
##              42       0.3%   (28)       2.3%  (208)
##              43       0.5%   (41)       2.3%  (211)
##              44       0.4%   (36)       1.9%  (171)
##              45       0.3%   (30)       1.9%  (174)
##              46       0.4%   (34)       1.6%  (146)
##              47       0.2%   (22)       1.5%  (133)
##              48       0.3%   (25)       1.4%  (123)
##              49       0.2%   (16)       1.2%  (109)
##              50       0.2%   (21)       0.7%   (63)
##              51       0.1%   (13)       0.6%   (54)
##              52       0.1%   (11)       0.5%   (43)
##              53       0.1%    (7)       0.7%   (66)
##              54       0.1%    (5)       0.5%   (43)
##              55       0.0%    (2)       0.3%   (30)
##              56       0.2%   (14)       0.8%   (75)
##           Total      16.0% (1440)      84.0% (7575)
# To show loyal customer group by agegroup and gender
df1 %>%
  tabyl(attrition_flag, agegroup, gender, show_missing_levels = FALSE) %>%
  adorn_totals("row") %>%
  adorn_percentages("all") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns %>%
  adorn_title
## $F
##                           agegroup                                
##     attrition_flag 25-35 Years Old 35-45 Years Old 45-55 Years Old
##  Attrited Customer      1.2%  (52)     6.8%  (294)     7.4%  (316)
##  Existing Customer      7.1% (306)    30.7% (1320)    33.4% (1435)
##              Total      8.3% (358)    37.6% (1614)    40.7% (1751)
##                                 
##  55-65 Years Old 65-75 Years Old
##       2.0%  (86)        0.0% (0)
##      11.3% (487)        0.0% (2)
##      13.3% (573)        0.0% (2)
## 
## $M
##                           agegroup                                
##     attrition_flag 25-35 Years Old 35-45 Years Old 45-55 Years Old
##  Attrited Customer      1.2%  (56)     5.4%  (255)     6.0%  (282)
##  Existing Customer      7.9% (374)    31.9% (1503)    35.0% (1651)
##              Total      9.1% (430)    37.3% (1758)    41.0% (1933)
##                                 
##  55-65 Years Old 65-75 Years Old
##       2.1%  (98)        0.0% (1)
##      10.5% (493)        0.1% (4)
##      12.5% (591)        0.1% (5)

Summary from Credit Card Loyal Customer table

  • Data above show that there was 7,575 existing customer (84%) and 1,440 attrited_customer (16%) and majority of customer already stay at this bank for 36 months/ 3 years which is 1,804 customer (20%).

  • Majority of Female existing customer was age group 45-55 years old 1,435 customers (33.4%) and attrited customer - 316 customers (7.4%).

  • Majority of Male existing customer was age group 45-55 years old - 1,651 customers (35%) and attrited customer - 282 customers (6%).