Objective: Predict customer attrition from the CreditCardData set using a classification tree, and test the model's accuracy.
Data Prep: Created a new variable named attrited, replacing Attrition_Flag with a numeric value of either 0 (existing customer) or 1 (attrited).
Split the data into training and validation datasets (70%/30%).
The Tree:
library(dplyr)
library(rpart)
library(rpart.plot)

CreditCardData <- read.csv("C:/Users/12033/Downloads/CreditCardData.csv")

set.seed(123)  # make the random split reproducible

# Recode Attrition_Flag as 0 (existing customer) or 1 (attrited) and
# attach one uniform random draw per row (10127 rows in this file)
attrited <- CreditCardData %>%
  mutate(attrited = ifelse(Attrition_Flag == "Existing Customer", 0, 1),
         random = runif(n())) %>%
  select(-Attrition_Flag)

# Splitting data into training and validation
train_tree <- attrited %>%
  filter(random < .7) %>%
  select(-random)
validation_tree <- attrited %>%
  filter(random >= .7) %>%
  select(-random)

# Fit and plot the classification tree
ct1 <- rpart(attrited ~ ., data = train_tree, method = 'class')
rpart.plot(ct1)
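Because the split draws one uniform random number per row, the proportions come out only approximately 70/30 (the validation set in this report has 2990 of 10127 rows, about 29.5%). If an exact split is wanted, one alternative is to sample row indices. The sketch below is not the method used in the report; the data frame df is a hypothetical stand-in so the code runs on its own.

```r
set.seed(123)

# Hypothetical stand-in data; in the report this would be the attrited data frame
df <- data.frame(id = 1:1000, x = rnorm(1000))

# Draw exactly 70% of the row indices without replacement
n_train   <- floor(0.7 * nrow(df))
train_idx <- sample(nrow(df), size = n_train)

train <- df[train_idx, ]   # sampled rows
valid <- df[-train_idx, ]  # all remaining rows

c(train = nrow(train), valid = nrow(valid))
```

Either approach is reasonable; the index-based version simply fixes the set sizes in advance.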
Two interpretations:
When Total_Trans_Amt (total transaction amount within the last 12 months) is below 5423, the customer attrition rate is 66%, suggesting that when funds are low people tend to stop using credit cards.
Customer attrition is higher when customers are above age 38.
Variable Importance: how important each variable is to attrition
Variable Importance with classification:
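The report does not show how the importance scores were extracted. A fitted rpart object stores them in its variable.importance component, so one possible sketch is below. It fits a stand-in tree on the kyphosis data that ships with rpart so that it runs on its own; with the report's model, fit would be ct1. Rescaling to percentages is an illustrative choice, not something the report states.

```r
library(rpart)

# Stand-in model so the sketch is self-contained; in the report the tree is ct1
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Named numeric vector of importance scores, sorted largest first
importance <- sort(fit$variable.importance, decreasing = TRUE)

# Rescale so the scores sum to 100, which makes them easier to compare
importance_pct <- 100 * importance / sum(importance)
print(round(importance_pct, 1))

# Bar chart of the scaled scores
barplot(importance_pct, las = 2, ylab = "Relative importance (%)")
```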
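Model Accuracy: On the validation set the tree classifies 93.0% of customers correctly (2782 of 2990) and misclassifies roughly 7.0% (208). Attrited customers make up 16.1% of the validation set, and the tree correctly identifies 75.5% of them (364 of 482).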
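The cross-tabulation in this section relies on a column, attrited_predicted, that the code earlier in the report never creates. A minimal sketch of the missing prediction step follows. It uses the kyphosis data that ships with rpart as a stand-in so that it runs on its own; the names train, valid and fit are hypothetical and correspond to train_tree, validation_tree and ct1 in the report.

```r
library(rpart)

# Stand-in data and split so the sketch is self-contained
set.seed(123)
in_train <- runif(nrow(kyphosis)) < 0.7
train <- kyphosis[in_train, ]
valid <- kyphosis[!in_train, ]

fit <- rpart(Kyphosis ~ ., data = train, method = "class")

# type = "class" returns the predicted label rather than class probabilities
valid$predicted <- predict(fit, newdata = valid, type = "class")

# Cross-tabulate actual against predicted labels
conf <- table(actual = valid$Kyphosis, predicted = valid$predicted)
print(conf)
```

With the report's objects, the equivalent line would be validation_tree$attrited_predicted <- predict(ct1, validation_tree, type = "class"). The layout of the output suggests it was printed with gmodels::CrossTable(validation_tree$attrited, validation_tree$attrited_predicted, chisq = TRUE), although the report does not say which function was used.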
                         | validation_tree$attrited_predicted
validation_tree$attrited |         0 |         1 | Row Total |
-------------------------|-----------|-----------|-----------|
                       0 |      2418 |        90 |      2508 |
                         |    39.758 |   222.084 |           |
                         |     0.964 |     0.036 |     0.839 |
                         |     0.953 |     0.198 |           |
                         |     0.809 |     0.030 |           |
-------------------------|-----------|-----------|-----------|
                       1 |       118 |       364 |       482 |
                         |   206.873 |  1155.572 |           |
                         |     0.245 |     0.755 |     0.161 |
                         |     0.047 |     0.802 |           |
                         |     0.039 |     0.122 |           |
-------------------------|-----------|-----------|-----------|
            Column Total |      2536 |       454 |      2990 |
                         |     0.848 |     0.152 |           |
-------------------------|-----------|-----------|-----------|
Statistics for All Table Factors
Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 1624.287 d.f. = 1 p = 0
Pearson's Chi-squared test with Yates' continuity correction
------------------------------------------------------------
Chi^2 = 1618.706 d.f. = 1 p = 0
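The summary rates can be recomputed directly from the four cell counts in the table above. The sketch below rebuilds the confusion matrix by hand from those counts, so it needs neither the data file nor the model; the metric names are standard terms rather than labels used in the report.

```r
# Cell counts copied from the cross-tabulation (rows = actual, columns = predicted)
conf <- matrix(c(2418, 118,   # predicted 0: actual 0, actual 1
                 90,   364),  # predicted 1: actual 0, actual 1
               nrow = 2,
               dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

accuracy    <- sum(diag(conf)) / sum(conf)         # (2418 + 364) / 2990
error_rate  <- 1 - accuracy                        # (90 + 118) / 2990
sensitivity <- conf["1", "1"] / sum(conf["1", ])   # attrited customers caught
specificity <- conf["0", "0"] / sum(conf["0", ])   # existing customers kept as 0
precision   <- conf["1", "1"] / sum(conf[, "1"])   # predicted attrition that is real

# Share of the majority class: accuracy of always predicting "0"
baseline <- sum(conf["0", ]) / sum(conf)

round(c(accuracy = accuracy, error_rate = error_rate,
        sensitivity = sensitivity, specificity = specificity,
        precision = precision, baseline = baseline), 3)
```

Comparing accuracy with the baseline is useful here because about 84% of validation customers did not attrite, so a model that always predicts 0 would already be right about 84% of the time.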