Objective: Predict customer attrition in the CreditCardData set using a classification tree, and test the model's accuracy.

Data Prep: Created a new variable named attrited, replacing Attrition_Flag with a numeric value: 0 for existing customers, 1 otherwise (attrited).

Split the data into training and validation datasets (approximately 70%/30%).

The Tree:

library(dplyr)
library(rpart)
library(rpart.plot)



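# Load the data (local path; adjust to where CreditCardData.csv is stored)
CreditCardData <- read.csv("C:/Users/12033/Downloads/CreditCardData.csv")

# Fix the seed so the random split is reproducible
set.seed(123)

# Recode Attrition_Flag as 0 (existing customer) / 1 (attrited) and
# draw one uniform random number per row to drive the 70/30 split
attrited <- CreditCardData %>% 
  mutate(attrited = ifelse(Attrition_Flag == "Existing Customer", 0, 1),
         random = runif(n())) %>% 
  select(-Attrition_Flag)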


# Splitting data into training and validation
train_tree <- attrited %>% 
  filter(random < .7) %>% 
  select(-random)

validation_tree <- attrited %>% 
  filter(random >= .7) %>% 
  select(-random)

# Fit a classification tree on all remaining columns as predictors
ct1 <- rpart(attrited ~ ., data = train_tree, method = "class")

# Plot the fitted tree
rpart.plot(ct1)

Tree Plot

Two interpretations:

When Total_Trans_Amt < 5423 over the last 12 months, the customer attrition rate is 66%, suggesting that people tend to stop using their credit cards when funds are low.

Customer attrition is higher among customers above age 38.
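
Readings like the two above can be checked against the fitted tree in text form. With rpart.plot's default labelling for a two-class tree, each node shows the predicted class, the fitted probability of the second class, and the percentage of observations in the node, so it is worth confirming which of those a figure such as 66% refers to. The sketch below is a minimal illustration on the kyphosis data that ships with rpart, used here only as a stand-in because CreditCardData.csv is not bundled; the same calls should apply to ct1.

```r
library(rpart)
library(rpart.plot)

# Stand-in data: kyphosis ships with rpart (this is NOT the CreditCardData set)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# One row per leaf: fitted probability of the second class, the split
# conditions that lead to the leaf, and (cover = TRUE) the share of rows in it
rules <- rpart.rules(fit, cover = TRUE)
print(rules)

# Text listing of every node: split, n, loss, predicted class, class probabilities
print(fit)
```

Comparing the printed rules with the plot makes it easier to tell an attrition probability apart from a node's share of the data.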

Variable Importance: how much each variable contributes to predicting attrition

Variable importance for the classification tree:

Variable Importance
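
The code behind the importance plot is not shown in this write-up. One common way to obtain such values is the variable.importance component of a fitted rpart object. The sketch below is a minimal illustration on the kyphosis data that ships with rpart, as a stand-in for CreditCardData; on the model above the same lines would start from ct1.

```r
library(rpart)

# Stand-in data: kyphosis ships with rpart (this is NOT the CreditCardData set)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Named numeric vector of importance scores, sorted explicitly for display
importance <- sort(fit$variable.importance, decreasing = TRUE)

# Rescale so the scores sum to 100, which makes them easier to compare
importance_pct <- 100 * importance / sum(importance)
print(round(importance_pct, 1))

# Simple bar chart of the rescaled scores
barplot(importance_pct, las = 2, ylab = "Relative importance (%)")
```

The scores are relative, so they rank the variables within this tree rather than measure an effect size.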

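Model Accuracy: On the validation set of 2,990 customers, 16.1% of whom actually attrited, the tree classifies 93.0% correctly ((2418 + 364) / 2990) and misclassifies roughly 7.0% ((90 + 118) / 2990). It correctly flags 364 of the 482 attrited customers (75.5%). In each cell of the table below, the rows give the count, the chi-square contribution, and the count as a share of the row, column, and table totals.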
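
The scoring step that produced the cross table below is not shown. A minimal sketch of how it might look, assuming the validation rows are scored with predict() and tabulated against the actual labels, follows. It uses the kyphosis data from rpart as a stand-in, and the names is_train, train, validation and predicted are illustrative only.

```r
library(rpart)

set.seed(123)

# Stand-in data: kyphosis ships with rpart (this is NOT the CreditCardData set)
is_train <- runif(nrow(kyphosis)) < 0.7
train <- kyphosis[is_train, ]
validation <- kyphosis[!is_train, ]

fit <- rpart(Kyphosis ~ ., data = train, method = "class")

# type = "class" returns the predicted label rather than class probabilities
validation$predicted <- predict(fit, newdata = validation, type = "class")

# Rows are the actual classes, columns are the predicted classes
confusion <- table(actual = validation$Kyphosis, predicted = validation$predicted)
print(confusion)

# If the gmodels package is installed, a call such as
# gmodels::CrossTable(validation$Kyphosis, validation$predicted, chisq = TRUE)
# prints the same counts with row, column and table proportions.
```

The key point is that the tree is fitted on the training rows only and evaluated on rows it has not seen.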

                         | validation_tree$attrited_predicted 
validation_tree$attrited |         0 |         1 | Row Total | 
-------------------------|-----------|-----------|-----------|
                       0 |      2418 |        90 |      2508 | 
                         |    39.758 |   222.084 |           | 
                         |     0.964 |     0.036 |     0.839 | 
                         |     0.953 |     0.198 |           | 
                         |     0.809 |     0.030 |           | 
-------------------------|-----------|-----------|-----------|
                       1 |       118 |       364 |       482 | 
                         |   206.873 |  1155.572 |           | 
                         |     0.245 |     0.755 |     0.161 | 
                         |     0.047 |     0.802 |           | 
                         |     0.039 |     0.122 |           | 
-------------------------|-----------|-----------|-----------|
            Column Total |      2536 |       454 |      2990 | 
                         |     0.848 |     0.152 |           | 
-------------------------|-----------|-----------|-----------|

 
Statistics for All Table Factors


Pearson's Chi-squared test 
------------------------------------------------------------
Chi^2 =  1624.287     d.f. =  1     p =  0 

Pearson's Chi-squared test with Yates' continuity correction 
------------------------------------------------------------
Chi^2 =  1618.706     d.f. =  1     p =  0
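
The accuracy figures can be recomputed directly from the four counts in the cross table above. The sketch below copies those counts into a matrix and derives the usual summary rates. The variable names are illustrative and not part of the original analysis.

```r
# Counts copied from the cross table: rows = actual, columns = predicted
confusion <- matrix(c(2418,  90,
                       118, 364),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(actual = c("0", "1"),
                                    predicted = c("0", "1")))

n <- sum(confusion)                            # 2990 validation customers

accuracy    <- sum(diag(confusion)) / n        # share classified correctly
error_rate  <- 1 - accuracy                    # share misclassified
sensitivity <- confusion["1", "1"] / sum(confusion["1", ])  # attrited customers caught
specificity <- confusion["0", "0"] / sum(confusion["0", ])  # existing customers kept
precision   <- confusion["1", "1"] / sum(confusion[, "1"])  # predicted attritions that were real

print(round(c(accuracy = accuracy, error_rate = error_rate,
              sensitivity = sensitivity, specificity = specificity,
              precision = precision), 3))
```

Rounded to three decimals this gives an accuracy of 0.930, an error rate of 0.070, a sensitivity of 0.755, a specificity of 0.964 and a precision of 0.802, in line with the row and column proportions printed in the table.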