We fit the three logistic-regression models for the Default data that correspond to ISLR Tables 4.1, 4.2 and 4.3, and draw their ROC curves on one plot. This is done in base R only — no extra packages required.

  • Table 4.1: default ~ balance
  • Table 4.2: default ~ student
  • Table 4.3: default ~ balance + income + student

1 Load the data

Put Default.csv in the same folder as this .Rmd before knitting. (If you have the ISLR2 package installed, the code falls back to its built-in Default.)

if (file.exists("Default.csv")) {
  Default <- read.csv("Default.csv")
} else if (requireNamespace("ISLR2", quietly = TRUE)) {
  data(Default, package = "ISLR2")
} else {
  stop("Please place Default.csv next to this .Rmd, or install.packages('ISLR2').")
}

# make sure default and student are factors
Default$default <- factor(Default$default, levels = c("No", "Yes"))
Default$student <- factor(Default$student, levels = c("No", "Yes"))

# numeric response: 1 = defaulted ("Yes"), 0 = did not
y <- as.integer(Default$default == "Yes")

str(Default[, c("default", "student", "balance", "income")])
## 'data.frame':    10000 obs. of  4 variables:
##  $ default: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ student: Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 2 1 1 ...
##  $ balance: num  730 817 1074 529 786 ...
##  $ income : num  44362 12106 31767 35704 38463 ...

2 Fit the three models

# Table 4.1: balance only
fit_balance <- glm(default ~ balance, data = Default, family = binomial)

# Table 4.2: student only
fit_student <- glm(default ~ student, data = Default, family = binomial)

# Table 4.3: balance + income + student
fit_full    <- glm(default ~ balance + income + student,
                   data = Default, family = binomial)

The estimated coefficients reproduce the textbook tables:

round(coef(fit_balance), 4)   # Table 4.1
## (Intercept)     balance 
##    -10.6513      0.0055
round(coef(fit_student), 4)   # Table 4.2
## (Intercept)  studentYes 
##     -3.5041      0.4049
round(coef(fit_full),    4)   # Table 4.3
## (Intercept)     balance      income  studentYes 
##    -10.8690      0.0057      0.0000     -0.6468

3 Predicted probabilities

For each model we compute the predicted probability of defaulting.

p_balance <- predict(fit_balance, type = "response")
p_student <- predict(fit_student, type = "response")
p_full    <- predict(fit_full,    type = "response")

4 ROC helper functions (base R)

An ROC curve plots the true-positive rate (sensitivity) against the false-positive rate (1 − specificity) as the classification threshold is varied. The two small functions below compute the curve and its area (AUC) directly.

# Build an ROC curve: sort scores high -> low, accumulate TP and FP rates.
roc_curve <- function(score, y) {
  ord <- order(score, decreasing = TRUE)
  yy  <- y[ord]
  P   <- sum(yy == 1)
  N   <- sum(yy == 0)
  tpr <- c(0, cumsum(yy == 1) / P)   # true-positive rate
  fpr <- c(0, cumsum(yy == 0) / N)   # false-positive rate
  list(fpr = fpr, tpr = tpr)
}

# Area under the curve via the trapezoidal rule.
auc_value <- function(r) {
  sum(diff(r$fpr) * (head(r$tpr, -1) + tail(r$tpr, -1)) / 2)
}
roc_balance <- roc_curve(p_balance, y)
roc_student <- roc_curve(p_student, y)
roc_full    <- roc_curve(p_full,    y)

auc_balance <- auc_value(roc_balance)
auc_student <- auc_value(roc_student)
auc_full    <- auc_value(roc_full)

round(c(balance = auc_balance,
        student = auc_student,
        full    = auc_full), 4)
## balance student    full 
##  0.9480  0.5473  0.9496

5 ROC plot

plot(roc_full$fpr, roc_full$tpr, type = "l", lwd = 2, col = "firebrick",
     xlab = "False positive rate (1 - specificity)",
     ylab = "True positive rate (sensitivity)",
     main = "ROC curves for the Default models (Tables 4.1-4.3)")
lines(roc_balance$fpr, roc_balance$tpr, lwd = 2, col = "steelblue")
lines(roc_student$fpr, roc_student$tpr, lwd = 2, col = "darkorange")
abline(0, 1, lty = 2, col = "grey50")   # chance line

legend("bottomright", lwd = 2, bty = "n",
       col = c("firebrick", "steelblue", "darkorange"),
       legend = c(
         sprintf("Table 4.3  balance+income+student (AUC = %.3f)", auc_full),
         sprintf("Table 4.1  balance (AUC = %.3f)",                auc_balance),
         sprintf("Table 4.2  student (AUC = %.3f)",                auc_student)
       ))

6 Interpretation

  • balance (Table 4.1) is a very strong predictor of default: its ROC curve bows sharply toward the top-left corner with an AUC of about 0.95.
  • student (Table 4.2) alone is almost useless. Because the model produces only two distinct probabilities (student vs. non-student), its ROC is a coarse curve lying close to the diagonal “chance” line, with an AUC of about 0.55 — barely better than random guessing.
  • The full model (Table 4.3) has the highest AUC (about 0.95), but only a whisker above the balance-only model. Almost all of the predictive power comes from balance; adding income and student contributes very little.

A larger area under the curve (AUC closer to 1) indicates a better classifier, so the ranking is: full ≈ balance ≫ student.