LAB 3 — Classification, Model Comparison & Ensembling Weight: (as per syllabus) Tier: Core Modeling Dataset: week2_churn_data.csv Purpose This lab develops your ability to build, evaluate, compare, and combine classification models for a realistic business problem. You will move from individual models to model selection and ensembling, emphasizing performance tradeoffs and business interpretation. ________________________________________ Learning Goals By completing this lab, you will be able to: • Build and evaluate multiple classification models • Compare models using appropriate performance metrics • Select predictors intentionally and justify modeling choices • Apply simple ensemble techniques to improve performance • Translate model results into business recommendations ________________________________________ Tasks Task 1 — Load Data & Define Outcome • Load week2_churn_data.csv
#setwd("C:/Users/kelle/OneDrive/Documents/MSBA/TBANLT560W26/R working directory")
library(readr)
churnL3 <- read_csv("week2_churn_data.csv")
#View(week2_churn_data)
• Identify the business outcome variable
dim(df)
## NULL
str(df)
## function (x, df1, df2, ncp, log = FALSE)
head(df, 3)
##
## 1 function (x, df1, df2, ncp, log = FALSE)
## 2 {
## 3 if (missing(ncp))
• Confirm outcome class balance (churn rate)
churnL3$churn <- factor(churnL3$churn, levels = c("No","Yes"))
table(churnL3$churn)
##
## No Yes
## 290 110
Expected output: • Code showing dataset structure • 2–3 sentences describing the outcome and churn rate This mean the data set is imbalance and customers that churn are observed more rarely.If we want to capture more instances where customers are likely to churn we may want to oversample records of customers that churn for the model to increase sensitivity. We may also want address the issue by setting a lower threshold for our main class.
Checkpoint (write 2–4 sentences): Your
answer: - Business question: [WRITE
HERE]
- What is the business question?
Can business decision makers use a ML model to determine if a customer
will churn based on attributes like monthy_charges or tenure_months.
Task 2 — Predictor Selection
You may not automatically use all variables.
Choose 4–8 predictors that make business sense for
churn.
You may include both numeric and categorical predictors. • Review
available predictors
Write your predictor list here (exact column names):
• Select a subset of predictors for modeling
# TODO: Replace the vector below with your chosen predictor column names
churnL3predictor_cols <- c("tenure_months", "monthly_charges","online_security", "tech_support")
# "tenure_months",
# "monthly_charges",
# "contract_type"
# Keep outcome + predictors only
churnL3keep_cols <- c("churn", churnL3predictor_cols)
churnL3_df2 <- churnL3[ , churnL3keep_cols ]
• Justify inclusion/exclusion decisions (e.g., relevance, redundancy, interpretability)
# Confirm structure
str(churnL3_df2)
## tibble [400 × 5] (S3: tbl_df/tbl/data.frame)
## $ churn : Factor w/ 2 levels "No","Yes": 2 2 1 1 1 1 1 1 1 1 ...
## $ tenure_months : num [1:400] 49 29 35 1 23 31 24 9 3 11 ...
## $ monthly_charges: num [1:400] 68.6 77.4 80.2 58.4 50.8 ...
## $ online_security: chr [1:400] "Yes" "Yes" "Yes" "Yes" ...
## $ tech_support : chr [1:400] "Yes" "Yes" "No" "Yes" ...
Expected output: • Short written justification (≈1 paragraph) These variables were selected because they showed some correlation to the outcome variable while total_chares was excluded because of high correlation with tenture_months
Task 3 — Train/Test Split • Create a 70/30 train–test split
n <- nrow(churnL3_df2)
train_size <- floor(0.70 * n)
train_idx <- sample(1:n, size = train_size, replace = FALSE)
train <- churnL3_df2[train_idx, ]
test <- churnL3_df2[-train_idx, ]
• Verify that outcome proportions are similar across splits
c(nrow(train), nrow(test))
## [1] 280 120
prop.table(table(train$churn))
##
## No Yes
## 0.7321429 0.2678571
prop.table(table(test$churn))
##
## No Yes
## 0.7083333 0.2916667
Expected output: Your answer: [WRITE HERE] Checkpoint (1–2 sentences): - Are the churn rates similar in train vs test? The results show that the churn rates are similar but they also seem to include a significant amount of variance. This could mean the training data did not include observations with very similar predictor values.
• Code showing split
calc_metrics <- function(actual, pred_class, positive = "Yes") {
actual <- factor(actual, levels = c("No","Yes"))
pred_class <- factor(pred_class, levels = c("No","Yes"))
cm <- table(Predicted = pred_class, Actual = actual)
TP <- cm[positive, positive]
TN <- cm[setdiff(levels(actual), positive), setdiff(levels(actual), positive)]
FP <- cm[positive, setdiff(levels(actual), positive)]
FN <- cm[setdiff(levels(actual), positive), positive]
accuracy <- (TP + TN) / sum(cm)
precision <- if ((TP + FP) == 0) NA else TP / (TP + FP)
recall <- if ((TP + FN) == 0) NA else TP / (TP + FN)
f1 <- if (is.na(precision) | is.na(recall) | (precision + recall) == 0) NA else 2 * precision * recall / (precision + recall)
list(cm = cm, accuracy = accuracy, precision = precision, recall = recall, f1 = f1)
}
make_roc_auc <- function(actual, prob_yes) {
roc_obj <- roc(response = actual, predictor = prob_yes, levels = c("No","Yes"), direction = "<")
auc_val <- as.numeric(auc(roc_obj))
list(roc = roc_obj, auc = auc_val)
}
• Confirmation of sample sizes and churn rates
Task 4 — Exploratory Model Diagnostics Create at least
two plots that help you understand churn risk.
Each plot must include 1–3 sentences explaining what it
suggests.
• Produce at least two student-chosen plots that help you understand: o Predictor–outcome relationships, or
# TODO: Choose one predictor and plot churn rate by group (if categorical)
# Example ideas:
# - barplot of churn proportion by contract type
# - boxplot of numeric predictor split by churn
#
# PLOT 1 CODE HERE
Interpretation (1–3 sentences): [WRITE HERE]
o Potential modeling challenges (e.g., separation, scale)
# TODO: A second plot, different from Plot 1
# PLOT 2 CODE HERE
Interpretation (1–3 sentences): [WRITE HERE]
Expected output: • Plots generated by you
• Brief interpretation for each plot
Task 5 — Logistic Regression (Baseline Model)
• Fit a logistic regression using selected predictors
# Logistic regression (uses selected predictors only)
logit_fit <- glm(churn ~ ., data = train, family = binomial)
summary(logit_fit)
##
## Call:
## glm(formula = churn ~ ., family = binomial, data = train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.345371 0.569427 -2.363 0.0181 *
## tenure_months 0.010169 0.008218 1.237 0.2159
## monthly_charges -0.002821 0.006407 -0.440 0.6598
## online_securityYes 0.299793 0.272327 1.101 0.2710
## tech_supportYes 0.164144 0.275591 0.596 0.5514
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 325.42 on 279 degrees of freedom
## Residual deviance: 322.22 on 275 degrees of freedom
## AIC: 332.22
##
## Number of Fisher Scoring iterations: 4
• Generate predictions on the test set
logit_prob <- predict(logit_fit, newdata = test, type = "response")
• Compute: o Confusion matrix
# TODO: Choose a threshold (start with 0.50). You may adjust later.
threshold <- 0.50
logit_pred <- ifelse(logit_prob >= threshold, "Yes", "No")
o Accuracy, Precision, Recall, F1
logit_m <- calc_metrics(test$churn, logit_pred)
logit_m$cm
## Actual
## Predicted No Yes
## No 85 35
## Yes 0 0
logit_m$accuracy; logit_m$precision; logit_m$recall; logit_m$f1
## [1] 0.7083333
## [1] NA
## [1] 0
## [1] NA
o ROC curve and AUC
logit_ra <- make_roc_auc(test$churn, logit_prob)
logit_ra$auc
## [1] 0.5842017
Expected output: Checkpoint (2–4 sentences): - Which predictors look influential (direction/sign), and does that match your EDA?
Your answer: [WRITE HERE] • Metrics and ROC plot
• Short interpretation of coefficients and performance
Task 6 — k-Nearest Neighbors (kNN) kNN needs: - numeric design matrix (dummy variables for categorical predictors) - scaling using train statistics only
• Prepare data appropriately (dummy variables + scaling)
• Fit a kNN model (choose and report k)
x_train <- model.matrix(churn ~ . - 1, data = train)
x_test <- model.matrix(churn ~ . - 1, data = test)
y_train <- train$churn
y_test <- test$churn
# Scale using TRAIN stats
train_center <- apply(x_train, 2, mean)
train_scale <- apply(x_train, 2, sd)
train_scale[train_scale == 0] <- 1
x_train_s <- scale(x_train, center = train_center, scale = train_scale)
x_test_s <- scale(x_test, center = train_center, scale = train_scale)
• Evaluate using the same metrics as Task 5
# TODO: Choose k (try at least two values and record what happens)
k <- 15
knn_pred <- knn(train = x_train_s, test = x_test_s, cl = y_train, k = k, prob = TRUE)
# Convert "winning prob" into prob(Yes) for ROC
winprob <- attr(knn_pred, "prob")
knn_prob_yes <- ifelse(knn_pred == "Yes", winprob, 1 - winprob)
knn_m <- calc_metrics(y_test, knn_pred)
knn_m$cm
## Actual
## Predicted No Yes
## No 79 34
## Yes 6 1
knn_m$accuracy; knn_m$precision; knn_m$recall; knn_m$f1
## [1] 0.6666667
## [1] 0.1428571
## [1] 0.02857143
## [1] 0.04761905
knn_ra <- make_roc_auc(y_test, knn_prob_yes)
knn_ra$auc
## [1] 0.4905882
Expected output: Checkpoint (1–3 sentences): - What k did you choose and why? What changed when you tried another k?
Your answer: [WRITE HERE] • Metrics and ROC curve
• Comparison to logistic regression
Task 7 — Naive Bayes • Fit a Naive Bayes classifier
# NOTE: e1071::naiveBayes can handle factors directly.
nb_fit <- naiveBayes(churn ~ ., data = train)
nb_fit
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
##
## A-priori probabilities:
## Y
## No Yes
## 0.7321429 0.2678571
##
## Conditional probabilities:
## tenure_months
## Y [,1] [,2]
## No 29.12195 16.64302
## Yes 31.77333 17.47166
##
## monthly_charges
## Y [,1] [,2]
## No 69.16922 20.97912
## Yes 67.75907 22.62381
##
## online_security
## Y No Yes
## No 0.5707317 0.4292683
## Yes 0.4933333 0.5066667
##
## tech_support
## Y No Yes
## No 0.5073171 0.4926829
## Yes 0.4800000 0.5200000
• Evaluate using the same metrics
# Predicted class
nb_pred <- predict(nb_fit, newdata = test, type = "class")
# Predicted probability of "Yes" for ROC
nb_prob <- predict(nb_fit, newdata = test, type = "raw")[, "Yes"]
nb_m <- calc_metrics(test$churn, nb_pred)
nb_m$cm
## Actual
## Predicted No Yes
## No 85 35
## Yes 0 0
nb_m$accuracy; nb_m$precision; nb_m$recall; nb_m$f1
## [1] 0.7083333
## [1] NA
## [1] 0
## [1] NA
nb_ra <- make_roc_auc(test$churn, nb_prob)
nb_ra$auc
## [1] 0.6070588
• Discuss when Naive Bayes may outperform other models
Expected output: Checkpoint (1–3 sentences): - When might Naive Bayes be a good choice, even if AUC is not the highest?
Your answer: [WRITE HERE]
• Metrics and ROC curve
• Short interpretive discussion
Task 8 — Classification Tree • Fit a classification tree
tree_fit <- rpart(churn ~ ., data = train, method = "class")
print(tree_fit)
## n= 280
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 280 75 No (0.73214286 0.26785714)
## 2) monthly_charges>=29.45 271 69 No (0.74538745 0.25461255)
## 4) tenure_months< 48.5 218 48 No (0.77981651 0.22018349)
## 8) tenure_months>=41.5 33 3 No (0.90909091 0.09090909) *
## 9) tenure_months< 41.5 185 45 No (0.75675676 0.24324324)
## 18) online_security=No 100 19 No (0.81000000 0.19000000)
## 36) monthly_charges< 77.98 70 8 No (0.88571429 0.11428571) *
## 37) monthly_charges>=77.98 30 11 No (0.63333333 0.36666667)
## 74) monthly_charges>=89.105 18 3 No (0.83333333 0.16666667) *
## 75) monthly_charges< 89.105 12 4 Yes (0.33333333 0.66666667) *
## 19) online_security=Yes 85 26 No (0.69411765 0.30588235)
## 38) monthly_charges>=78.435 31 5 No (0.83870968 0.16129032) *
## 39) monthly_charges< 78.435 54 21 No (0.61111111 0.38888889)
## 78) monthly_charges< 49.315 11 2 No (0.81818182 0.18181818) *
## 79) monthly_charges>=49.315 43 19 No (0.55813953 0.44186047)
## 158) monthly_charges>=56.03 34 13 No (0.61764706 0.38235294)
## 316) tenure_months>=27.5 11 2 No (0.81818182 0.18181818) *
## 317) tenure_months< 27.5 23 11 No (0.52173913 0.47826087)
## 634) tenure_months< 12.5 13 4 No (0.69230769 0.30769231) *
## 635) tenure_months>=12.5 10 3 Yes (0.30000000 0.70000000) *
## 159) monthly_charges< 56.03 9 3 Yes (0.33333333 0.66666667) *
## 5) tenure_months>=48.5 53 21 No (0.60377358 0.39622642)
## 10) monthly_charges< 60.79 19 3 No (0.84210526 0.15789474) *
## 11) monthly_charges>=60.79 34 16 Yes (0.47058824 0.52941176)
## 22) tenure_months>=50.5 25 11 No (0.56000000 0.44000000)
## 44) online_security=Yes 9 2 No (0.77777778 0.22222222) *
## 45) online_security=No 16 7 Yes (0.43750000 0.56250000) *
## 23) tenure_months< 50.5 9 2 Yes (0.22222222 0.77777778) *
## 3) monthly_charges< 29.45 9 3 Yes (0.33333333 0.66666667) *
• Evaluate using the same metrics
tree_prob <- predict(tree_fit, newdata = test, type = "prob")[, "Yes"]
tree_pred <- ifelse(tree_prob >= 0.50, "Yes", "No")
tree_m <- calc_metrics(test$churn, tree_pred)
tree_m$cm
## Actual
## Predicted No Yes
## No 60 26
## Yes 25 9
tree_m$accuracy; tree_m$precision; tree_m$recall; tree_m$f1
## [1] 0.575
## [1] 0.2647059
## [1] 0.2571429
## [1] 0.2608696
tree_ra <- make_roc_auc(test$churn, tree_prob)
tree_ra$auc
## [1] 0.5368067
# Optional (requires extra package)
# install.packages("rpart.plot")
# library(rpart.plot)
# rpart.plot(tree_fit)
• Comment on interpretability vs performance
Expected output: Checkpoint (1–3 sentences): - What do you gain/lose with a tree vs logistic regression?
Your answer: [WRITE HERE]
• Metrics (tree visualization optional)
• Interpretation of strengths/weaknesses
Task 9 — Model Comparison • Create a single comparison table including: o Accuracy, Precision, Recall, F1, AUC for all models
results <- data.frame(
Model = c("Logistic", paste0("kNN (k=", k, ")"), "Naive Bayes", "CART Tree"),
Accuracy = c(logit_m$accuracy, knn_m$accuracy, nb_m$accuracy, tree_m$accuracy),
Precision = c(logit_m$precision, knn_m$precision, nb_m$precision, tree_m$precision),
Recall = c(logit_m$recall, knn_m$recall, nb_m$recall, tree_m$recall),
F1 = c(logit_m$f1, knn_m$f1, nb_m$f1, tree_m$f1),
AUC = c(logit_ra$auc, knn_ra$auc, nb_ra$auc, tree_ra$auc)
)
results
## Model Accuracy Precision Recall F1 AUC
## 1 Logistic 0.7083333 NA 0.00000000 NA 0.5842017
## 2 kNN (k=15) 0.6666667 0.1428571 0.02857143 0.04761905 0.4905882
## 3 Naive Bayes 0.7083333 NA 0.00000000 NA 0.6070588
## 4 CART Tree 0.5750000 0.2647059 0.25714286 0.26086957 0.5368067
• Plot all ROC curves on one chart
plot(logit_ra$roc, main = "ROC Curves (Test Set)", legacy.axes = TRUE)
plot(knn_ra$roc, add = TRUE)
plot(nb_ra$roc, add = TRUE)
plot(tree_ra$roc, add = TRUE)
legend(
"bottomright",
legend = c(
paste0("Logit AUC=", round(logit_ra$auc, 3)),
paste0("kNN AUC=", round(knn_ra$auc, 3)),
paste0("NB AUC=", round(nb_ra$auc, 3)),
paste0("Tree AUC=", round(tree_ra$auc, 3))
),
lty = 1,
bty = "n"
)
Expected output:
• Comparison table
• Combined ROC plot ________________________________________ Task 10 — Model Selection & Business Scenario Assume false negatives are more costly than false positives (missing likely churners).
Select one model as “best” based on the business goal. • Identify the best-performing model under this assumption
Write 1 short paragraph: - Which metric did you prioritize and why? (e.g., Recall if missing churners is costly) - Which model did you choose?
• Justify your choice using metrics and business reasoning Expected output: Your answer: [WRITE HERE] • 1–2 paragraphs written explanation ________________________________________ Task 11 — Ensemble Extension Using your best individual model: Pick one improvement path:
Option A (recommended): Bagging
- If your best model is a tree, bagging is a natural choice. - If your
best model is not a tree, you can still bag trees as a “strong
alternative” and compare.
Option B (optional): Boosting (advanced)
Option C (optional/bonus): Stacking (advanced) • Apply
one ensembling approach (e.g., bagging, boosting, or stacking) `
• Compare ensemble performance to the single model
• State whether ensembling improved results and why
Expected output: Write 3–6 sentences: - What ensemble method did you use? - Did it improve your chosen metric vs the single best model? - Why might it help (or not) on this dataset?
Your answer: [WRITE HERE] • Updated metrics • Brief interpretation of ensemble effectiveness ________________________________________ # Final task — Recommendation memo (≤ 1 page)
Write a memo to a non-technical manager: