Conformal Prediction Activity

Author

Dillon Murphy

Conformal prediction for classification produces prediction sets rather than single labels. For a target coverage level \((1 - \alpha)\), we build sets \(C(x)\) such that P(Y∈C(X))≥1−α.

Activity - Try For Yourself

We’ll classify land-cover types from satellite imagery using the Satellite dataset (6,435 samples).

Each observation contains spectral reflectance features from 4 spectral bands and belongs to one of 6 terrain classes.

For this activity:

  • Use the first 10 test samples for manual computation.

  • Then we’ll evaluate the full test set for empirical coverage.

Target coverage: \((1 - \alpha) = 0.95\).

Step 1 - Train the Model

Code
data(Satellite)
sat <- as_tibble(Satellite)

# Randomize and split into train/cal/test

idx <- sample(1:nrow(sat), nrow(sat))
train <- sat[idx[1:4000], ]
cal <- sat[idx[4001:5200], ]
test <- sat[idx[5201:6435], ]

# Fit random forest

rf <- randomForest(classes ~ ., data = train, ntree = 200)

# Predict class probabilities for calibration

cal_probs <- predict(rf, cal, type = "prob")
cal_df <- as_tibble(cal_probs)
cal_df$True <- cal$classes

Step 2 – Compute Calibration Nonconformity Scores

Here are the first 10 calibration examples, try calculating the scores \(s_i = 1 - p_{y_i}(x_i)\) manually for these before revealing the answer.

Code
head(cal_df, 10) %>%
  kable()
red soil cotton crop grey soil damp grey soil vegetation stubble very damp grey soil True
0.020 0.000 0.000 0.000 0.970 0.010 vegetation stubble
0.085 0.000 0.745 0.045 0.010 0.115 grey soil
0.005 0.000 0.685 0.235 0.000 0.075 very damp grey soil
0.740 0.000 0.000 0.010 0.195 0.055 red soil
0.070 0.405 0.010 0.045 0.415 0.055 cotton crop
0.935 0.000 0.035 0.010 0.020 0.000 red soil
0.090 0.615 0.125 0.045 0.080 0.045 cotton crop
0.000 0.000 0.925 0.070 0.000 0.005 grey soil
0.000 0.000 0.000 0.830 0.000 0.170 damp grey soil
0.000 0.000 0.010 0.085 0.080 0.825 very damp grey soil
Code
cal_scores <- 1 - cal_probs[cbind(1:nrow(cal), match(cal$classes, colnames(cal_probs)))]

first10_scores <- tibble(
Index = 1:10,
True = cal$classes[1:10],
Score = round(cal_scores[1:10], 4)
)

first10_scores %>%
kable(col.names = c("Row", "True Class", "Nonconformity Score"))
Click to view results
Row True Class Nonconformity Score
1 vegetation stubble 0.030
2 grey soil 0.255
3 very damp grey soil 0.925
4 red soil 0.260
5 cotton crop 0.595
6 red soil 0.065
7 cotton crop 0.385
8 grey soil 0.075
9 damp grey soil 0.170
10 very damp grey soil 0.175

Step 2.1 – Find the calibration threshold

Sort the scores in ascending order and find the 80th percentile (use \(\alpha = 0.2\)). That value is the empirical conformal quantile \(q_{0.8}\).

Note: \(q_{1−α}=s_{[ciel((1−α)(n+1))]}\) - finite-sample correction

Code
alpha <- 0.2
n_cal <- length(first10_scores$Score)

k <- ceiling((1 - alpha) * (n_cal + 1))

q_hat <- sort(first10_scores$Score)[k]
q_hat
Click to view example 10 results
[1] 0.595

For the actual dataset we will find the 95th percentile.

Code
alpha <- 0.05
n_cal <- length(cal_scores)

k <- ceiling((1 - alpha) * (n_cal + 1))

q_95 <- sort(cal_scores)[k]
q_95
Click to view full data results
[1] 0.715

Step 3 – Build the conformal sets

Code
test_probs <- predict(rf, test, type = "prob")
test_df <- as_tibble(test_probs)
test_df$True <- test$classes
head(test_df, 10) %>%
  kable()
red soil cotton crop grey soil damp grey soil vegetation stubble very damp grey soil True
0.040 0.315 0.000 0.325 0.250 0.070 cotton crop
0.025 0.000 0.355 0.170 0.015 0.435 very damp grey soil
0.855 0.000 0.030 0.025 0.090 0.000 red soil
0.000 0.000 0.000 0.040 0.020 0.940 very damp grey soil
0.990 0.005 0.000 0.000 0.005 0.000 red soil
0.005 0.410 0.000 0.030 0.445 0.110 vegetation stubble
0.020 0.025 0.060 0.060 0.445 0.390 grey soil
0.995 0.000 0.000 0.000 0.005 0.000 red soil
0.975 0.000 0.005 0.000 0.020 0.000 red soil
0.015 0.000 0.935 0.050 0.000 0.000 grey soil

For each \(x∈X_{test}\) include all labels \(y\) whose score satisfies \(1-p_{y_i}(x_i)\leq q_{1-\alpha}\)

Code
first10 <- test_df %>%
slice(1:10)

first10$Pred <- colnames(first10)[apply(first10, 1, which.max)]

for (cls in colnames(test_probs)) {
first10[[paste0("set_", cls)]] <- 1 - first10[[cls]] <= q_95
}

first10$CP_set <- apply(first10[, grepl("^set_", names(first10))], 1, function(row) {
paste(colnames(test_probs)[row], collapse = ", ")
})

head(first10, 10) %>% select(True, Pred, CP_set) %>%
  kable()
Show first-10 sets
True Pred CP_set
cotton crop damp grey soil cotton crop, damp grey soil
very damp grey soil very damp grey soil grey soil, very damp grey soil
red soil red soil red soil
very damp grey soil very damp grey soil very damp grey soil
red soil red soil red soil
vegetation stubble vegetation stubble cotton crop, vegetation stubble
grey soil vegetation stubble vegetation stubble, very damp grey soil
red soil red soil red soil
red soil red soil red soil
grey soil grey soil grey soil
Code
test_df$Pred <- colnames(test_probs)[apply(test_probs, 1, which.max)]

for (cls in colnames(test_probs)) {
test_df[[paste0("set_", cls)]] <- 1 - test_df[[cls]] <= q_95
}

test_df$CP_set <- apply(test_df[, grepl("^set_", names(test_df))], 1, function(row) {
paste(colnames(test_probs)[row], collapse = ", ")
})

head(test_df, 10) %>% select(True, Pred, CP_set) %>%
  kable()
Click to view full data conformal sets
True Pred CP_set
cotton crop damp grey soil cotton crop, damp grey soil
very damp grey soil very damp grey soil grey soil, very damp grey soil
red soil red soil red soil
very damp grey soil very damp grey soil very damp grey soil
red soil red soil red soil
vegetation stubble vegetation stubble cotton crop, vegetation stubble
grey soil vegetation stubble vegetation stubble, very damp grey soil
red soil red soil red soil
red soil red soil red soil
grey soil grey soil grey soil

Task 4 – Check coverage

What fraction of true labels fall inside their conformal sets? Is it close to the target \(1 - \alpha = 0.95\)?

Code
covered <- mapply(function(truth, set) grepl(truth, set),
first10$True, first10$CP_set)
coverage <- mean(covered)

coverage
Click to view first-10 coverage
[1] 1
Code
covered <- mapply(function(truth, set) grepl(truth, set),
test_df$True, test_df$CP_set)
coverage <- mean(covered)
pred_class <- colnames(test_probs)[apply(test_probs, 1, which.max)]
accuracy <- mean(pred_class == test_df$True)

set_size <- rowSums(test_df[, grepl("^set_", names(test_df))])
avg_set_size <- mean(set_size)

cat("Test accuracy:", round(accuracy, 3),
    "\nConformal coverage:", round(coverage, 3),
    "\nAverage set size:", round(avg_set_size, 3))
Click to view full data coverage
Test accuracy: 0.925 
Conformal coverage: 0.97 
Average set size: 1.125

Check Marginal Coverage

Code
test_df %>%
  mutate(Covered = covered) %>%
  ggplot(aes(x = True, fill = Covered)) +
  geom_bar() +
  labs(
    title = "Conformal Prediction Coverage by Class",
    subtitle = paste0("Empirical coverage ≈ ", round(coverage, 3)),
    x = "True Land-Cover Class",
    y = "Count"
  ) +
  scale_fill_manual(
    values = c("firebrick", "seagreen3"),
    name = "Covered",
    labels = c("No", "Yes")
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))