Conformal prediction for classification produces prediction sets rather than single labels. For a target coverage level \((1 - \alpha)\), we build sets \(C(x)\) such that \(P(Y \in C(X)) \ge 1 - \alpha\).
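To make the idea of a prediction set concrete, here is a minimal sketch of how a set might be built for one new observation. It assumes the split-conformal recipe with the score "one minus the predicted probability of a class"; all numbers and the names `cal_scores`, `new_probs`, and `class_a` to `class_d` are made up for illustration and are not taken from the Satellite data.

```r
# Toy illustration only: every number below is hypothetical.
alpha <- 0.05

# Hypothetical calibration scores (1 - predicted probability of the true class)
cal_scores <- c(0.02, 0.10, 0.05, 0.62, 0.15, 0.08, 0.30, 0.01, 0.22, 0.12,
                0.03, 0.18, 0.07, 0.25, 0.09, 0.11, 0.04, 0.35, 0.06, 0.14)
n <- length(cal_scores)

# Conformal threshold: the k-th smallest score, with k = ceiling((n + 1) * (1 - alpha)).
# If k exceeds n, no finite threshold exists and every class is included.
k <- ceiling((n + 1) * (1 - alpha))
q_hat <- if (k > n) Inf else sort(cal_scores)[k]

# Hypothetical predicted class probabilities for one new observation
new_probs <- c(class_a = 0.50, class_b = 0.40, class_c = 0.07, class_d = 0.03)

# Prediction set: every class whose score 1 - p does not exceed the threshold
pred_set <- names(new_probs)[1 - new_probs <= q_hat]
print(pred_set)
```

With only 20 calibration scores and \(\alpha = 0.05\), the threshold is simply the largest calibration score, so the set keeps any class whose predicted probability is high enough to beat it. A larger calibration set gives a less extreme threshold.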
Activity - Try For Yourself
We’ll classify land-cover types from satellite imagery using the Satellite dataset (6,435 samples).
Each observation contains 36 spectral reflectance features (4 spectral bands for each pixel in a 3×3 neighborhood) and belongs to one of 6 terrain classes.
For this activity:
Use the first 10 test samples for manual computation.
Then we’ll evaluate the full test set for empirical coverage.
Target coverage: \((1 - \alpha) = 0.95\).
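Empirical coverage is the fraction of test points whose true label falls inside its prediction set. The sketch below shows one way this could be computed; `pred_sets` and `truth` are hypothetical toy objects, not output from the Satellite model.

```r
# Hypothetical prediction sets for five test points, plus their true labels
pred_sets <- list(c("a"), c("a", "b"), c("c"), c("b", "c"), c("a"))
truth <- c("a", "b", "b", "c", "a")

# A point is covered when its true label is a member of its prediction set
covered <- mapply(function(set, y) y %in% set, pred_sets, truth)

# Empirical coverage: the fraction of covered points
empirical_coverage <- mean(covered)
print(empirical_coverage)
```

On the real test set, this number would be compared against the target of 0.95.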
Step 1 - Train the Model
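# Packages assumed by this code (the original setup chunk is not shown)
library(mlbench)       # provides the Satellite dataset
library(randomForest)  # randomForest() and its predict() method
library(tibble)        # as_tibble()

data(Satellite)
sat <- as_tibble(Satellite)

# Randomize and split into train/cal/test
idx <- sample(1:nrow(sat), nrow(sat))
train <- sat[idx[1:4000], ]
cal <- sat[idx[4001:5200], ]
test <- sat[idx[5201:6435], ]

# Fit random forest
rf <- randomForest(classes ~ ., data = train, ntree = 200)

# Predict class probabilities for calibration
cal_probs <- predict(rf, cal, type = "prob")
cal_df <- as_tibble(cal_probs)
cal_df$True <- cal$classes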
Step 2 - Compute Calibration Nonconformity Scores
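Here are the first 10 calibration examples. Try calculating the scores \(s_i = 1 - p_{y_i}(x_i)\) manually before revealing the answer, where \(p_{y_i}(x_i)\) is the predicted probability that the model assigns to the true class \(y_i\) of example \(x_i\).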
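To check a manual calculation, the score formula can be sketched in a few lines of R. The matrix `probs` and the vector `true_class` below are small hypothetical stand-ins for a probability matrix and its true labels; they are not the actual calibration values.

```r
# Hypothetical probability matrix: rows are examples, columns are classes
probs <- matrix(
  c(0.90, 0.05, 0.05,
    0.20, 0.70, 0.10,
    0.30, 0.30, 0.40),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)
true_class <- c("a", "b", "a")

# Pick out p_{y_i}(x_i): the entry in row i, in the column of the true label
col_idx <- match(true_class, colnames(probs))
p_true <- probs[cbind(seq_len(nrow(probs)), col_idx)]

# Nonconformity score: one minus the probability of the true class
scores <- 1 - p_true
print(round(scores, 2))
```

The third example gets the largest score because the model gave its true class only 0.30 probability. The same indexing should carry over to the Step 1 objects by using `cal_probs` as the matrix and `as.character(cal$classes)` as the labels, assuming the column names of `cal_probs` match the class labels.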