Notes on compiling this document:
In the code chunk above (entitled “setup”) echo is set to TRUE. This means that the code in your chunks will be displayed, along with the results, in your compiled document.
Below is code to clean and prepare the data set for modeling. Before running that code, follow these preparatory steps:
1. Download the RMarkdown template and the data sets for the assignment from Canvas.
2. Copy or move these files from your downloads folder to a folder dedicated to this class (say, MKTG-6487).
3. Define this folder as your “working directory.” To do so, navigate to that folder using the Files tab in the lower right quadrant of RStudio. (You should see the files you moved into this folder in the previous step.) Click the “More” button in the menu under the Files tab and select “Set As Working Directory.”
Once the files are in the right location on your computer, run this code to clean and format the data:
# You must run this code to format the data set properly!
advise_invest <- read_csv("~/MBA/MKTG 6487/Project/adviseinvest (1).csv", show_col_types = FALSE) |> # Read the data from disk
  select(-product) |>                                     # Remove the product column
  filter(income > 0,                                      # Filter out mistaken data
         num_accts < 5) |>
  mutate(answered = factor(ifelse(answered == 0, "no", "yes"), # Turn answered into yes/no factor
                           levels = c("yes", "no")),
         female = factor(female),                         # Make categorical variables into factors
         job = factor(job),
         rent = factor(rent),
         own_res = factor(own_res),
         new_car = factor(new_car),
         mobile = factor(mobile),
         chk_acct = factor(chk_acct),
         sav_acct = factor(sav_acct))
And here is code to load the data set of prospective customers from your working directory. Note that in order to use this data set for prediction, the variables need to be formatted exactly the same as in the data used to fit the model. It does not include a target variable because the event of answering or not answering has not happened yet for scheduled customers.
prospective <- read_csv("~/MBA/MKTG 6487/Project/adviseinvest_new_customer.csv", show_col_types = FALSE) |>
  mutate(female = factor(female),
         job = factor(job),
         rent = factor(rent),
         own_res = factor(own_res),
         new_car = factor(new_car),
         mobile = factor(mobile),
         chk_acct = factor(chk_acct),
         sav_acct = factor(sav_acct))
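The requirement that the prospective data be formatted like the training data can be spot-checked before predicting. The following is a minimal sketch, not part of the assignment template: `mismatched_levels()` is a hypothetical helper, and the two toy data frames only stand in for `advise_invest` and `prospective`. A reported column is not necessarily an error, but it is worth a closer look.

```r
# Hypothetical helper (an assumption, not from the template): return the names
# of factor columns whose levels differ between training and scoring data.
mismatched_levels <- function(train, new) {
  shared <- intersect(names(train), names(new))
  factor_cols <- shared[vapply(train[shared], is.factor, logical(1))]
  bad <- vapply(factor_cols, function(col) {
    !is.factor(new[[col]]) ||
      !identical(levels(train[[col]]), levels(new[[col]]))
  }, logical(1))
  factor_cols[bad]
}

# Toy data (made up for illustration, not the project data)
train_toy <- data.frame(job  = factor(c(0, 1, 2, 3)),
                        rent = factor(c(0, 1, 0, 1)))
new_toy   <- data.frame(job  = factor(c(0, 1, 1)),
                        rent = factor(c(1, 0, 1)))

flagged <- mismatched_levels(train_toy, new_toy)
print(flagged)
```

If a column is flagged, one option is to rebuild it with the training levels, for example `factor(new_toy$job, levels = levels(train_toy$job))`.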
Read the instructions for this phase of the project on Canvas.
# Fit the tree model using all available predictors
tree_model <- rpart(answered ~ ., data = advise_invest, method = "class")
# Predict the class labels using the default 0.5 probability threshold
predicted_classes <- predict(tree_model, newdata = advise_invest, type = "class")
# Create a confusion matrix comparing predicted classes to actual labels
confusion_matrix <- table(Predicted = predicted_classes, Actual = advise_invest$answered)
print("Confusion Matrix:")
## [1] "Confusion Matrix:"
print(confusion_matrix)
##          Actual
## Predicted   yes    no
##       yes 13820  3008
##       no   2304 10367
# Compute counts from the confusion matrix
# True Positives (TP): predicted "yes" and observed "yes"
TP <- confusion_matrix["yes", "yes"]
# False Positives (FP): predicted "yes" but observed "no"
FP <- confusion_matrix["yes", "no"]
# True Negatives (TN): predicted "no" and observed "no"
TN <- confusion_matrix["no", "no"]
# False Negatives (FN): predicted "no" but observed "yes"
FN <- confusion_matrix["no", "yes"]
# Calculate expected profit using the cost-benefit matrix:
# Benefit: $75 for a true positive (purchase: $100 benefit minus $25 agent cost)
# Cost: -$25 for a false positive (agent cost incurred when customer does not answer)
# Predicted "no" customers are not called, so TN and FN contribute $0
profit <- (TP * 75) + (FP * (-25))
# Ensure that profit is not negative (if negative, set to 0)
profit <- ifelse(profit < 0, 0, profit)
cat("Expected Profit: $", profit, "\n")
## Expected Profit: $ 961300
# Predict probabilities for prospective customers using the tree model
predicted_probs <- predict(tree_model, newdata = prospective, type = "prob")[, "yes"]
# Assign a predicted class label of "yes" if the probability is >= 0.3
prospective <- prospective %>%
  mutate(prob_answer = predicted_probs,
         pred_answer = if_else(prob_answer >= 0.3, "yes", "no"))
# Create contact list: filter only those predicted "yes"
contact_list <- prospective %>%
  filter(pred_answer == "yes")
# Number of customers on the contact list
num_contacts <- nrow(contact_list)
cat("Number of Prospective Customers on the Contact List:", num_contacts, "\n")
## Number of Prospective Customers on the Contact List: 624
The analysis suggests calling only prospective customers whose model-estimated probability of answering is 30% or more. Under the cost-benefit matrix, an answered call yields a net profit of $75 ($100 in benefit minus $25 of agent time), while a call that goes unanswered costs $25. Applied to the prospective customers, this threshold produces a contact list of 624 people.
Based on the cost-benefit analysis, restricting the contact list to customers with at least a 30% chance of answering reduces spending on calls that are unlikely to be answered. Sales representatives are then assigned only to prospective customers with a higher likelihood of engagement, which makes better use of staff time and reduces idle time.
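As a rough check on the threshold, the expected value of a single call can be written in terms of the probability p that the customer answers. This sketch assumes only the two payoffs used in the profit calculation ($75 for an answered call, -$25 for an unanswered one) and that not calling is worth $0:

```latex
\mathbb{E}[\text{profit per call}] = 75\,p - 25\,(1 - p) = 100\,p - 25,
\qquad
100\,p - 25 > 0 \iff p > 0.25
```

Under these assumptions a call is expected to pay for itself once p exceeds 0.25, so the 30% cutoff sits just above the break-even point.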
Operationalizing this approach means integrating the model into the customer scheduling process. For every prospective customer, compute the probability of an answered call, and have the system automatically flag those with a predicted probability of 30% or more for a call. Model performance should also be monitored, and the threshold recalibrated periodically as more data becomes available, so that the sales team stays focused on high-probability engagements, which supports overall profitability.
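One way to approach the periodic recalibration is to recompute profit at several candidate thresholds and compare. The sketch below is illustrative only: `profit_at_threshold()` is a hypothetical helper, the payoffs are the $75 and -$25 values from the cost-benefit matrix, and the probabilities and outcomes are made-up toy values rather than project data.

```r
# Hypothetical helper (an assumption, not from the assignment): total profit
# if every customer with probability >= threshold is called.
profit_at_threshold <- function(probs, actual, threshold,
                                benefit_tp = 75, cost_fp = -25) {
  called <- probs >= threshold
  tp <- sum(called & actual == "yes")  # called and answered
  fp <- sum(called & actual == "no")   # called but did not answer
  tp * benefit_tp + fp * cost_fp
}

# Toy probabilities and outcomes (made up for illustration)
toy_probs  <- c(0.9, 0.8, 0.6, 0.4, 0.35, 0.2, 0.1)
toy_actual <- c("yes", "yes", "no", "yes", "no", "no", "no")

# Compare profit across a few candidate thresholds
thresholds <- c(0.3, 0.5, 0.7)
profits <- sapply(thresholds, profit_at_threshold,
                  probs = toy_probs, actual = toy_actual)
print(data.frame(threshold = thresholds, profit = profits))
```

With the real data, the same helper could presumably be fed `predict(tree_model, newdata = advise_invest, type = "prob")[, "yes"]` and `advise_invest$answered`, ideally on data held out from model fitting.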