Purpose

Use four independent variables (client sales in the last year, client spending with Conv company, number of years as a customer, and the Normalized Fortune Reputation Index) to predict each customer's account category (A, B, or C, where A is the highest-valued category).

Analysis steps

Step 1: Load the data and split it into two subsets, roughly 70% for training and 30% for testing

setwd("C:\\Users\\Yang\\Desktop\\Business Data mining\\R\\Case\\Conv")
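Conv <- read.csv("Conv.csv")
set.seed(1234)  # fix the random seed so the split is reproducible
# Assign each row to group 1 (training, p = 0.7) or group 2 (test, p = 0.3)
SampleID <- sample(2, nrow(Conv), replace = TRUE, prob = c(0.7, 0.3))
trainData <- Conv[SampleID == 1, ]
testData <- Conv[SampleID == 2, ]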
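
Because sample() with prob assigns each row independently, the 70/30 proportion is only approximate. The sketch below shows one way to check the realized split, and an alternative that draws exactly 70% of the row indices. It uses a hypothetical 200-row data frame named demo as a stand-in, since the Conv data is not reproduced here; the names demo, split_share, and train_idx are assumptions for illustration.

```r
# Hypothetical stand-in for Conv; the real data is not reproduced here.
set.seed(1234)
demo <- data.frame(id = 1:200, x = rnorm(200))

# Same approach as above: each row is independently assigned to group 1 or 2.
SampleID <- sample(2, nrow(demo), replace = TRUE, prob = c(0.7, 0.3))
split_share <- prop.table(table(SampleID))
print(round(split_share, 3))  # realized shares, close to but not exactly 0.7 / 0.3

# Alternative: draw exactly 70% of the row indices for training.
train_idx <- sample(nrow(demo), size = floor(0.7 * nrow(demo)))
demo_train <- demo[train_idx, ]
demo_test <- demo[-train_idx, ]
print(c(train = nrow(demo_train), test = nrow(demo_test)))
```

The probabilistic assignment is simpler to write, while the index-based draw guarantees the subset sizes; either is reasonable for a data set of this size.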

Step 2: Build the decision tree and plot the tree

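library(party)
# Note: the tree is fit on the full Conv data set, not only trainData.
# Use data = trainData to keep the test rows unseen during fitting.
Conv_ctree <- ctree(Account.Category ~ Sales.2006 + Spending.with.Convs.in.2006 + Normalized.Fortune.Reputation.Index, data = Conv)
plot(Conv_ctree)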

Conclusion: Based on the plotted tree, we may conclude that: 1) if the client's Fortune reputation index is higher than 5.8, the client is highly likely to be in category A or B; 2) if the client's Fortune reputation index is less than or equal to 5.8, its spending with Conv company is less than 11.9 million dollars, and its total sales are less than 12120.54 million dollars, the tree classifies it as category C.
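
The two branches described in this conclusion can be written out as a small rule function, which may make the thresholds easier to check against individual clients. This is only a sketch: classify_branch and its argument names are hypothetical, it encodes just the two branches stated in the text (not the full fitted tree), and it returns NA for any client that falls on a branch the text does not describe.

```r
# Hypothetical helper encoding only the two branches described above.
# reputation: Normalized Fortune Reputation Index
# spending, sales: in million dollars, as in the conclusion
classify_branch <- function(reputation, spending, sales) {
  if (reputation > 5.8) {
    "A or B"
  } else if (spending < 11.9 && sales < 12120.54) {
    "C"
  } else {
    NA_character_  # branch not described in the text
  }
}

print(classify_branch(reputation = 6.2, spending = 20, sales = 30000))
print(classify_branch(reputation = 4.0, spending = 5, sales = 8000))
```

For actual predictions, predict() on the fitted Conv_ctree object remains the authoritative source; the function above only restates the reading of the plot.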

Step 3: Predict the test data

testPred <- predict(Conv_ctree, newdata = testData)
table(testPred, testData$Account.Category)
##         
## testPred  A  B  C
##        A  2  0  0
##        B  0 15  0
##        C  0  1 76

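Conclusion: According to the prediction table, the predictions are highly accurate: 93 of the 94 test cases are classified correctly, and the only error is one category B client predicted as C. Because the tree in Step 2 was fit on the full Conv data set, which includes the test rows, this accuracy is likely optimistic.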
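
The accuracy can be computed directly from the table rather than read off by eye. The sketch below re-enters the counts printed in Step 3 as a matrix (rows are predicted categories, columns are actual categories) and derives overall accuracy and per-category recall; the names conf_mat, accuracy, and recall are assumptions for illustration.

```r
# Confusion matrix copied from the Step 3 output: rows = predicted, columns = actual.
conf_mat <- matrix(c(2,  0,  0,
                     0, 15,  0,
                     0,  1, 76),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(predicted = c("A", "B", "C"),
                                   actual = c("A", "B", "C")))

# Overall accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-category recall: correct predictions over the actual count in each category.
recall <- diag(conf_mat) / colSums(conf_mat)

print(round(accuracy, 4))
print(round(recall, 4))
```

With the live objects from Step 3, the same calculation applies to table(testPred, testData$Account.Category) in place of the hand-entered matrix.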