Conformal Prediction Activity

Author

Dillon Murphy

Conformal prediction for classification produces prediction sets rather than single labels. For a target coverage level \((1 - \alpha)\), we build sets \(C(x)\) such that P(Y∈C(X))≥1−α.

Activity - Try For Yourself

We’ll classify land-cover types from satellite imagery using the Satellite dataset (6,435 samples).

Each observation contains spectral reflectance features from 4 spectral bands and belongs to one of 6 terrain classes.

For this activity:

Use the first 10 test samples for manual computation.
Then we’ll evaluate the full test set for empirical coverage.

Target coverage: \((1 - \alpha) = 0.95\).

Step 1 - Train the Model

Code

data(Satellite)
sat <- as_tibble(Satellite)

# Randomize and split into train/cal/test

idx <- sample(1:nrow(sat), nrow(sat))
train <- sat[idx[1:4000], ]
cal <- sat[idx[4001:5200], ]
test <- sat[idx[5201:6435], ]

# Fit random forest

rf <- randomForest(classes ~ ., data = train, ntree = 200)

# Predict class probabilities for calibration

cal_probs <- predict(rf, cal, type = "prob")
cal_df <- as_tibble(cal_probs)
cal_df$True <- cal$classes

Step 2 – Compute Calibration Nonconformity Scores

Here are the first 10 calibration examples, try calculating the scores \(s_i = 1 - p_{y_i}(x_i)\) manually for these before revealing the answer.

Code

head(cal_df, 10) %>%
  kable()

red soil	cotton crop	grey soil	damp grey soil	vegetation stubble	very damp grey soil	True
0.020	0.000	0.000	0.000	0.970	0.010	vegetation stubble
0.085	0.000	0.745	0.045	0.010	0.115	grey soil
0.005	0.000	0.685	0.235	0.000	0.075	very damp grey soil
0.740	0.000	0.000	0.010	0.195	0.055	red soil
0.070	0.405	0.010	0.045	0.415	0.055	cotton crop
0.935	0.000	0.035	0.010	0.020	0.000	red soil
0.090	0.615	0.125	0.045	0.080	0.045	cotton crop
0.000	0.000	0.925	0.070	0.000	0.005	grey soil
0.000	0.000	0.000	0.830	0.000	0.170	damp grey soil
0.000	0.000	0.010	0.085	0.080	0.825	very damp grey soil

Code

cal_scores <- 1 - cal_probs[cbind(1:nrow(cal), match(cal$classes, colnames(cal_probs)))]

first10_scores <- tibble(
Index = 1:10,
True = cal$classes[1:10],
Score = round(cal_scores[1:10], 4)
)

first10_scores %>%
kable(col.names = c("Row", "True Class", "Nonconformity Score"))

Click to view results

Row	True Class	Nonconformity Score
1	vegetation stubble	0.030
2	grey soil	0.255
3	very damp grey soil	0.925
4	red soil	0.260
5	cotton crop	0.595
6	red soil	0.065
7	cotton crop	0.385
8	grey soil	0.075
9	damp grey soil	0.170
10	very damp grey soil	0.175

Step 2.1 – Find the calibration threshold

Sort the scores in ascending order and find the 80th percentile (use \(\alpha = 0.2\)). That value is the empirical conformal quantile \(q_{0.8}\).

Note: \(q_{1−α}=s_{[ciel((1−α)(n+1))]}\) - finite-sample correction

Code

alpha <- 0.2
n_cal <- length(first10_scores$Score)

k <- ceiling((1 - alpha) * (n_cal + 1))

q_hat <- sort(first10_scores$Score)[k]
q_hat

Click to view example 10 results

[1] 0.595

For the actual dataset we will find the 95th percentile.

Code

alpha <- 0.05
n_cal <- length(cal_scores)

k <- ceiling((1 - alpha) * (n_cal + 1))

q_95 <- sort(cal_scores)[k]
q_95

Click to view full data results

[1] 0.715

Step 3 – Build the conformal sets

Code

test_probs <- predict(rf, test, type = "prob")
test_df <- as_tibble(test_probs)
test_df$True <- test$classes
head(test_df, 10) %>%
  kable()

red soil	cotton crop	grey soil	damp grey soil	vegetation stubble	very damp grey soil	True
0.040	0.315	0.000	0.325	0.250	0.070	cotton crop
0.025	0.000	0.355	0.170	0.015	0.435	very damp grey soil
0.855	0.000	0.030	0.025	0.090	0.000	red soil
0.000	0.000	0.000	0.040	0.020	0.940	very damp grey soil
0.990	0.005	0.000	0.000	0.005	0.000	red soil
0.005	0.410	0.000	0.030	0.445	0.110	vegetation stubble
0.020	0.025	0.060	0.060	0.445	0.390	grey soil
0.995	0.000	0.000	0.000	0.005	0.000	red soil
0.975	0.000	0.005	0.000	0.020	0.000	red soil
0.015	0.000	0.935	0.050	0.000	0.000	grey soil

For each \(x∈X_{test}\) include all labels \(y\) whose score satisfies \(1-p_{y_i}(x_i)\leq q_{1-\alpha}\)

Code

first10 <- test_df %>%
slice(1:10)

first10$Pred <- colnames(first10)[apply(first10, 1, which.max)]

for (cls in colnames(test_probs)) {
first10[[paste0("set_", cls)]] <- 1 - first10[[cls]] <= q_95
}

first10$CP_set <- apply(first10[, grepl("^set_", names(first10))], 1, function(row) {
paste(colnames(test_probs)[row], collapse = ", ")
})

head(first10, 10) %>% select(True, Pred, CP_set) %>%
  kable()

Show first-10 sets

True	Pred	CP_set
cotton crop	damp grey soil	cotton crop, damp grey soil
very damp grey soil	very damp grey soil	grey soil, very damp grey soil
red soil	red soil	red soil
very damp grey soil	very damp grey soil	very damp grey soil
red soil	red soil	red soil
vegetation stubble	vegetation stubble	cotton crop, vegetation stubble
grey soil	vegetation stubble	vegetation stubble, very damp grey soil
red soil	red soil	red soil
red soil	red soil	red soil
grey soil	grey soil	grey soil

Code

test_df$Pred <- colnames(test_probs)[apply(test_probs, 1, which.max)]

for (cls in colnames(test_probs)) {
test_df[[paste0("set_", cls)]] <- 1 - test_df[[cls]] <= q_95
}

test_df$CP_set <- apply(test_df[, grepl("^set_", names(test_df))], 1, function(row) {
paste(colnames(test_probs)[row], collapse = ", ")
})

head(test_df, 10) %>% select(True, Pred, CP_set) %>%
  kable()

Click to view full data conformal sets

True	Pred	CP_set
cotton crop	damp grey soil	cotton crop, damp grey soil
very damp grey soil	very damp grey soil	grey soil, very damp grey soil
red soil	red soil	red soil
very damp grey soil	very damp grey soil	very damp grey soil
red soil	red soil	red soil
vegetation stubble	vegetation stubble	cotton crop, vegetation stubble
grey soil	vegetation stubble	vegetation stubble, very damp grey soil
red soil	red soil	red soil
red soil	red soil	red soil
grey soil	grey soil	grey soil

Task 4 – Check coverage

What fraction of true labels fall inside their conformal sets? Is it close to the target \(1 - \alpha = 0.95\)?

Code

covered <- mapply(function(truth, set) grepl(truth, set),
first10$True, first10$CP_set)
coverage <- mean(covered)

coverage

Click to view first-10 coverage

[1] 1

Code

covered <- mapply(function(truth, set) grepl(truth, set),
test_df$True, test_df$CP_set)
coverage <- mean(covered)
pred_class <- colnames(test_probs)[apply(test_probs, 1, which.max)]
accuracy <- mean(pred_class == test_df$True)

set_size <- rowSums(test_df[, grepl("^set_", names(test_df))])
avg_set_size <- mean(set_size)

cat("Test accuracy:", round(accuracy, 3),
    "\nConformal coverage:", round(coverage, 3),
    "\nAverage set size:", round(avg_set_size, 3))

Click to view full data coverage

Test accuracy: 0.925 
Conformal coverage: 0.97 
Average set size: 1.125

Check Marginal Coverage

Code

test_df %>%
  mutate(Covered = covered) %>%
  ggplot(aes(x = True, fill = Covered)) +
  geom_bar() +
  labs(
    title = "Conformal Prediction Coverage by Class",
    subtitle = paste0("Empirical coverage ≈ ", round(coverage, 3)),
    x = "True Land-Cover Class",
    y = "Count"
  ) +
  scale_fill_manual(
    values = c("firebrick", "seagreen3"),
    name = "Covered",
    labels = c("No", "Yes")
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))