Histopathologic Cancer Detection

Kaggle Mini-Project: Binary Image Classification using Convolutional Neural Networks


Author: C McGinnis
Date: November 2024
Course: Deep Learning
GitHub Repository: [Your GitHub URL]


1. Problem Description

1.1 Background

This project addresses the Kaggle competition “Histopathologic Cancer Detection”, which focuses on identifying metastatic cancer in small image patches taken from larger digital pathology scans of lymph node sections.

Histopathology is the microscopic examination of tissue to study the manifestations of disease. In cancer diagnosis, pathologists examine tissue samples under a microscope to identify the presence of cancer cells. This process is critical for:

  • Cancer Staging: Determining whether cancer has spread to lymph nodes
  • Treatment Planning: Guiding decisions about surgery, chemotherapy, or radiation
  • Prognosis: Predicting patient outcomes based on cancer characteristics

1.2 Clinical Significance

Manual examination of histopathology slides is:

  • Time-consuming: Pathologists may spend hours examining a single case
  • Subjective: Inter-observer variability can lead to inconsistent diagnoses
  • Error-prone: Fatigue and high caseloads can lead to missed detections

Deep learning solutions can assist pathologists by providing rapid screening, highlighting regions of interest, and offering a “second opinion” to reduce diagnostic errors.

1.3 Task Definition

Aspect              Description
------------------  -----------------------------------------------------------
Task Type           Binary Image Classification
Input               96×96 pixel RGB image patches (.tif format)
Output              Probability of containing metastatic tissue (0 to 1)
Positive Label (1)  Center 32×32 pixel region contains at least one tumor pixel
Negative Label (0)  No tumor tissue in the center region
Evaluation Metric   Area Under the ROC Curve (AUC-ROC)

Important Note: The label is determined by the center 32×32 pixel region only. Tumor tissue in the outer border of the 96×96 patch does not affect the label.
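To make the labeling rule concrete, here is a minimal sketch that computes the pixel bounds of the labeled center region. The helper name center_crop_box is hypothetical (it is not part of the competition code); the box uses the (left, upper, right, lower) convention accepted by Pillow's Image.crop.

```python
def center_crop_box(patch_size=96, center_size=32):
    """Return (left, upper, right, lower) bounds of the centered square region."""
    if center_size > patch_size:
        raise ValueError("center_size must not exceed patch_size")
    offset = (patch_size - center_size) // 2
    return (offset, offset, offset + center_size, offset + center_size)


box = center_crop_box()
print(box)  # (32, 32, 64, 64)

# With a Pillow image of size 96x96, the labeled region could be viewed via:
#   center = image.crop(box)
```

Only pixels inside this box decide the label; the surrounding 32-pixel border provides context for the network.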

1.4 Approach Overview

We will use Transfer Learning with a pre-trained ResNet-18 convolutional neural network. This approach is effective because:

  1. Feature Reuse: Early CNN layers learn generic visual features (edges, textures) that transfer well to medical images
  2. Faster Convergence: Pre-trained weights provide better initialization than random weights
  3. Reduced Overfitting: Pre-trained features act as regularization on the target task

2. Data Description

2.1 Dataset Source

The dataset is a modified version of the PatchCamelyon (PCam) benchmark dataset, derived from the Camelyon16 challenge for detecting metastases in lymph node sections.

Source: Kaggle - Histopathologic Cancer Detection

2.2 Dataset Overview

Component        Description
---------------  -------------------------------------------------
Training Images  220,025 images (96×96 RGB, .tif format)
Test Images      57,458 images for Kaggle submission
Labels File      train_labels.csv with image IDs and binary labels
Total Size       Approximately 6-7 GB

2.3 Data Files Structure

histopathologic-cancer-detection/
├── train/                    # Training images (220,025 .tif files)
├── test/                     # Test images (57,458 .tif files)
├── train_labels.csv          # Training labels (id, label)
└── sample_submission.csv     # Submission format template

3. Setup: Libraries & Configuration

In this section, we import the Python libraries needed for:

  • Data handling and analysis (numpy, pandas)
  • Visualization (matplotlib, seaborn)
  • Image loading and transforms (PIL, torchvision.transforms)
  • Deep learning with CNNs and transfer learning (torch, torchvision.models)
  • Model evaluation (sklearn.metrics)

We also:

  • Set a random seed for reproducibility
  • Detect whether a GPU is available
  • Configure PyTorch for deterministic behavior (as much as possible)

# General utilities
import os
import random
import warnings

# Data handling
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Image handling
from PIL import Image

# PyTorch core
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

# TorchVision (for transforms + pretrained models)
from torchvision import transforms, models

# Metrics
from sklearn.metrics import roc_auc_score, confusion_matrix, classification_report

# Silence unnecessary warnings (optional)
warnings.filterwarnings("ignore")

# -------------------------
# Reproducibility settings
# -------------------------
SEED = 42

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# For (more) deterministic behavior
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Progress bars (used in the training loop)
from tqdm.auto import tqdm

# -------------------------
# Device configuration
# -------------------------
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
Using device: cpu

3. Data Loading

In this section, we:

  • Define paths to the dataset on disk
  • Load train_labels.csv into a pandas DataFrame
  • Inspect the structure of the labels (number of samples, label distribution)

If you are running this notebook on a different machine or platform (e.g., Kaggle, Colab), you may need to update the DATA_DIR path accordingly.
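One way to avoid editing the path by hand is to try several candidate locations and use the first that exists. The sketch below is an illustration only: the helper resolve_data_dir and the environment variable HISTO_DATA_DIR are hypothetical names, and the Kaggle path in the usage comment is an assumption about where the competition data is mounted.

```python
import os


def resolve_data_dir(candidates, env_var="HISTO_DATA_DIR"):
    """Return the first existing directory among an env override and candidates."""
    override = os.environ.get(env_var)  # hypothetical variable name
    paths = ([override] if override else []) + list(candidates)
    for path in paths:
        if os.path.isdir(path):
            return path
    raise FileNotFoundError(f"No dataset directory found among: {paths}")


# Example usage (paths are assumptions; adjust to your setup):
# DATA_DIR = resolve_data_dir([
#     "/kaggle/input/histopathologic-cancer-detection",
#     os.path.expanduser("~/Downloads/histopathologic-cancer-detection"),
# ])
```

Failing loudly with FileNotFoundError is deliberate here: a wrong path otherwise only surfaces later, when the first image fails to load.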

# Base directory where the Kaggle dataset is stored
# Update this path if your dataset is in a different location
DATA_DIR = "/Users/cynthiamcginnis/Downloads/histopathologic-cancer-detection"

TRAIN_DIR = os.path.join(DATA_DIR, "train")
TEST_DIR = os.path.join(DATA_DIR, "test")
LABELS_PATH = os.path.join(DATA_DIR, "train_labels.csv")

print("DATA_DIR :", DATA_DIR)
print("TRAIN_DIR:", TRAIN_DIR)
print("TEST_DIR :", TEST_DIR)
print("LABELS   :", LABELS_PATH)

# Load labels
labels_df = pd.read_csv(LABELS_PATH)

print("\nLabels DataFrame shape:", labels_df.shape)
labels_df.head()
DATA_DIR : /Users/cynthiamcginnis/Downloads/histopathologic-cancer-detection
TRAIN_DIR: /Users/cynthiamcginnis/Downloads/histopathologic-cancer-detection/train
TEST_DIR : /Users/cynthiamcginnis/Downloads/histopathologic-cancer-detection/test
LABELS   : /Users/cynthiamcginnis/Downloads/histopathologic-cancer-detection/train_labels.csv

Labels DataFrame shape: (220025, 2)
id label
0 f38a6374c348f90b587e046aac6079959adf3835 0
1 c18f2d887b7ae4f6742ee445113fa1aef383ed77 1
2 755db6279dae599ebb4d39a9123cce439965282d 0
3 bc3f0c64fb968ff4a8bd33af6971ecae77c75e08 0
4 068aba587a4950175d04c680d38943fd488d6a9d 0
# Used for the stratified train/validation split in Section 4.1
from sklearn.model_selection import train_test_split

4. Custom PyTorch Dataset

We define a custom torch.utils.data.Dataset to handle:

  • Looking up image file paths from train_labels.csv
  • Loading each .tif image by its id
  • Returning image tensors and their corresponding labels

This class is flexible enough to be used for both of the following, by toggling a has_labels flag:

  • Training/validation (with labels)
  • Test set (without labels)

class HistopathDataset(Dataset):
    """
    Custom Dataset for the Histopathologic Cancer Detection dataset.

    Args:
        df (pd.DataFrame): DataFrame containing at least an 'id' column.
                           If labels are available, it should also contain a 'label' column.
        img_dir (str): Directory where the .tif image files are stored.
        transform (callable, optional): Optional transform to be applied on a sample.
        has_labels (bool): Set to True if df contains labels. For test data, set to False.
    """
    def __init__(self, df, img_dir, transform=None, has_labels=True):
        self.df = df.reset_index(drop=True)
        self.img_dir = img_dir
        self.transform = transform
        self.has_labels = has_labels

        # Sanity checks
        if self.has_labels and "label" not in self.df.columns:
            raise ValueError("has_labels=True but no 'label' column found in DataFrame.")
        if "id" not in self.df.columns:
            raise ValueError("DataFrame must contain an 'id' column.")

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        # Get image ID
        img_id = self.df.loc[idx, "id"]
        img_path = os.path.join(self.img_dir, f"{img_id}.tif")

        # Load image
        image = Image.open(img_path)

        # Convert to RGB if needed (sometimes medical images can be single-channel)
        if image.mode != "RGB":
            image = image.convert("RGB")

        # Apply transforms
        if self.transform is not None:
            image = self.transform(image)

        if self.has_labels:
            label = int(self.df.loc[idx, "label"])
            return image, label
        else:
            # For test set (no labels)
            return image, img_id  # return id so we can match predictions later


# -------------------------
# Basic transform (no augmentation yet)
# -------------------------
# For ResNet (pretrained on ImageNet), we typically normalize images with ImageNet stats.
basic_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],  # ImageNet mean
        std=[0.229, 0.224, 0.225]    # ImageNet std
    )
])

# Example: create a training dataset instance
train_dataset = HistopathDataset(
    df=labels_df,
    img_dir=TRAIN_DIR,
    transform=basic_transform,
    has_labels=True
)

print("Number of training samples:", len(train_dataset))

# Peek at one sample
sample_img, sample_label = train_dataset[0]
print("Single sample shape:", sample_img.shape)
print("Single sample label:", sample_label)
Number of training samples: 220025
Single sample shape: torch.Size([3, 96, 96])
Single sample label: 0

3. Exploratory Data Analysis (EDA)

In this section, we perform basic exploratory data analysis to understand the dataset before training a model. Specifically, we will:

  1. Inspect the label distribution (tumor vs non-tumor).
  2. Verify image sizes and channels.
  3. Visualize random image patches from each class.
  4. (Optionally) Inspect pixel intensity distributions.

This helps us:

  • Check for class imbalance.
  • Confirm that all images have the expected dimensions (96×96, 3 channels).
  • Build intuition about what tumor and non-tumor patches look like.

# Basic label distribution
label_counts = labels_df["label"].value_counts().sort_index()
label_percent = labels_df["label"].value_counts(normalize=True).sort_index() * 100

print("Label counts:\n", label_counts)
print("\nLabel percentage:\n", label_percent.round(2))

# Plot label distribution
plt.figure(figsize=(5, 4))
sns.barplot(x=label_counts.index, y=label_counts.values)
plt.xticks([0, 1], ["Non-tumor (0)", "Tumor (1)"])
plt.xlabel("Label")
plt.ylabel("Count")
plt.title("Label Distribution")
plt.tight_layout()
plt.show()
Label counts:
 label
0    130908
1     89117
Name: count, dtype: int64

Label percentage:
 label
0    59.5
1    40.5
Name: proportion, dtype: float64

[Figure: bar chart "Label Distribution" with counts for Non-tumor (0) and Tumor (1)]
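The label counts printed above can be turned into two quick reference numbers: the negative-to-positive ratio and the accuracy of a trivial classifier that always predicts the majority class. This is plain arithmetic on the reported counts, offered as a sanity baseline rather than as part of the original analysis.

```python
# Label counts as printed by the EDA cell
counts = {0: 130_908, 1: 89_117}

total = sum(counts.values())
neg_to_pos_ratio = counts[0] / counts[1]
majority_baseline_acc = max(counts.values()) / total

print(f"Total samples       : {total}")
print(f"Negative:positive   : {neg_to_pos_ratio:.3f}")
print(f"Majority-class acc. : {majority_baseline_acc:.3f}")
```

The imbalance is moderate (roughly 1.5 negatives per positive), so any reported accuracy should be read against a baseline of about 0.595.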

from collections import Counter

def inspect_random_images(df, img_dir, n_samples=500):
    sizes = []
    modes = []

    sample_df = df.sample(n=min(n_samples, len(df)), random_state=SEED)

    for img_id in sample_df["id"]:
        img_path = os.path.join(img_dir, f"{img_id}.tif")
        with Image.open(img_path) as img:
            sizes.append(img.size)  # (width, height)
            modes.append(img.mode)

    size_counts = Counter(sizes)
    mode_counts = Counter(modes)

    print("Most common sizes:", size_counts.most_common(5))
    print("Image modes:", mode_counts)

inspect_random_images(labels_df, TRAIN_DIR, n_samples=500)
Most common sizes: [((96, 96), 500)]
Image modes: Counter({'RGB': 500})
def show_image_grid(df, img_dir, label, n=9, n_cols=3):
    """
    Show a grid of random images for a given label (0 or 1).
    """
    subset = df[df["label"] == label].sample(n=n, random_state=SEED)
    n_rows = int(np.ceil(n / n_cols))

    plt.figure(figsize=(3 * n_cols, 3 * n_rows))

    for i, img_id in enumerate(subset["id"].values):
        img_path = os.path.join(img_dir, f"{img_id}.tif")
        img = Image.open(img_path)

        plt.subplot(n_rows, n_cols, i + 1)
        plt.imshow(img)
        plt.axis("off")

    class_name = "Tumor (1)" if label == 1 else "Non-tumor (0)"
    plt.suptitle(f"Random examples: {class_name}", fontsize=14)
    plt.tight_layout()
    plt.show()


# Show non-tumor examples
show_image_grid(labels_df, TRAIN_DIR, label=0, n=9, n_cols=3)

# Show tumor examples
show_image_grid(labels_df, TRAIN_DIR, label=1, n=9, n_cols=3)

[Figure: 3×3 grid "Random examples: Non-tumor (0)"]

[Figure: 3×3 grid "Random examples: Tumor (1)"]

def plot_pixel_histogram(df, img_dir, n_samples=200):
    sample_df = df.sample(n=min(n_samples, len(df)), random_state=SEED)

    pixels = []

    for img_id in sample_df["id"]:
        img_path = os.path.join(img_dir, f"{img_id}.tif")
        img = Image.open(img_path).convert("RGB")
        arr = np.array(img) / 255.0  # scale to 0–1
        pixels.append(arr.reshape(-1, 3))  # flatten H×W×C -> (N, 3)

    pixels = np.vstack(pixels)

    plt.figure(figsize=(8, 4))
    plt.hist(pixels[:, 0], bins=50, alpha=0.5, label="R")
    plt.hist(pixels[:, 1], bins=50, alpha=0.5, label="G")
    plt.hist(pixels[:, 2], bins=50, alpha=0.5, label="B")
    plt.legend()
    plt.title("Pixel Intensity Distribution (Sample)")
    plt.xlabel("Intensity (0–1)")
    plt.ylabel("Frequency")
    plt.tight_layout()
    plt.show()

plot_pixel_histogram(labels_df, TRAIN_DIR, n_samples=200)

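[Figure: histogram "Pixel Intensity Distribution (Sample)" for the R, G, and B channels; x-axis Intensity (0–1), y-axis Frequency]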

4. Data Preprocessing & Augmentation

We now:

  1. Create a smaller debug subset to make experimentation faster.
  2. Perform a stratified train/validation split.
  3. Define image transforms (with augmentation for training).
  4. Implement a custom HistopathDataset.
  5. Wrap datasets in PyTorch DataLoaders.

4.1 Train / Validation Split

# 4.1 Debug subset + train/validation split

DEBUG = True
DEBUG_SIZE = 5000  # number of samples to use while prototyping

if DEBUG:
    labels_df_small = labels_df.sample(n=DEBUG_SIZE, random_state=SEED).reset_index(drop=True)
else:
    labels_df_small = labels_df

print("Using", len(labels_df_small), "samples.")

train_df, val_df = train_test_split(
    labels_df_small,
    test_size=0.2,
    stratify=labels_df_small["label"],
    random_state=SEED
)

print("Train size:", len(train_df))
print("Val size  :", len(val_df))
print("\nTrain label distribution:")
print(train_df["label"].value_counts(normalize=True))
print("\nVal label distribution:")
print(val_df["label"].value_counts(normalize=True))
Using 5000 samples.
Train size: 4000
Val size  : 1000

Train label distribution:
label
0    0.59425
1    0.40575
Name: proportion, dtype: float64

Val label distribution:
label
0    0.594
1    0.406
Name: proportion, dtype: float64

4.2 Image Transforms & Augmentation

# 4.2 Image transforms & augmentation

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD  = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])

val_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])
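As a worked example of what ToTensor followed by Normalize does to a single pixel, the sketch below applies the same arithmetic by hand: scale 8-bit values to [0, 1], then subtract the per-channel mean and divide by the per-channel standard deviation. The helper normalize_pixel is hypothetical and exists only for illustration.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Mimic ToTensor (divide by 255) then Normalize for one RGB pixel."""
    scaled = [v / 255.0 for v in rgb_uint8]
    return [(s - m) / d for s, m, d in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


print(normalize_pixel((255, 255, 255)))  # white pixel: positive values in every channel
print(normalize_pixel((0, 0, 0)))        # black pixel: negative values in every channel
```

After normalization, inputs fall roughly in the range [-2.1, 2.6], which matches the input statistics the ImageNet-pretrained weights were trained on.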

4.3 Custom Dataset Class

# 4.3 Custom Dataset class

class HistopathDataset(Dataset):
    """
    Dataset for Histopathologic Cancer Detection.
    Expects a DataFrame with 'id' and (optionally) 'label'.
    """
    def __init__(self, df, img_dir, transform=None, has_labels=True):
        self.df = df.reset_index(drop=True)
        self.img_dir = img_dir
        self.transform = transform
        self.has_labels = has_labels

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        img_id = self.df.loc[idx, "id"]
        img_path = os.path.join(self.img_dir, f"{img_id}.tif")

        image = Image.open(img_path)
        if image.mode != "RGB":
            image = image.convert("RGB")

        if self.transform is not None:
            image = self.transform(image)

        if self.has_labels:
            label = int(self.df.loc[idx, "label"])
            return image, label
        else:
            return image, img_id

4.4 Datasets & DataLoaders

# 4.4 Datasets & DataLoaders

BATCH_SIZE = 64
NUM_WORKERS = 0  # important for Jupyter / Anaconda to avoid multiprocessing issues

train_dataset = HistopathDataset(
    df=train_df,
    img_dir=TRAIN_DIR,
    transform=train_transform,
    has_labels=True
)

val_dataset = HistopathDataset(
    df=val_df,
    img_dir=TRAIN_DIR,
    transform=val_transform,
    has_labels=True
)

train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=NUM_WORKERS,
    pin_memory=torch.cuda.is_available()
)

val_loader = DataLoader(
    val_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    num_workers=NUM_WORKERS,
    pin_memory=torch.cuda.is_available()
)

print("Train batches:", len(train_loader))
print("Val batches  :", len(val_loader))

# sanity check
images, labels = next(iter(train_loader))
print("One batch images:", images.shape)
print("One batch labels:", labels.shape)
Train batches: 63
Val batches  : 16
One batch images: torch.Size([64, 3, 96, 96])
One batch labels: torch.Size([64])
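The batch counts above follow directly from the split sizes and the batch size. A minimal sketch of the arithmetic, assuming the default DataLoader behavior of keeping the last, smaller batch (the helper num_batches is a hypothetical name):

```python
import math


def num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches per epoch for a given dataset size."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


for name, n in [("train", 4000), ("val", 1000)]:
    print(f"{name}: {num_batches(n, 64)} batches")
```

With 4,000 training samples and a batch size of 64, the last training batch holds only 32 images, which is why 63 batches are reported rather than 62.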

5. Model Architecture

We use transfer learning with a pre-trained ResNet-18:

  • Load ImageNet-pretrained weights.
  • Freeze convolutional layers (feature extractor).
  • Replace the final fully-connected layer with a single-output unit for binary classification (tumor vs non-tumor).

5.1 Create the ResNet-18 Model

# 5.1 ResNet-18 model

from torchvision.models import resnet18, ResNet18_Weights

def create_resnet18_model(freeze_features=True):
    weights = ResNet18_Weights.DEFAULT
    model = resnet18(weights=weights)

    if freeze_features:
        for p in model.parameters():
            p.requires_grad = False

    in_features = model.fc.in_features
    model.fc = nn.Linear(in_features, 1)  # one logit
    return model

model = create_resnet18_model(freeze_features=True).to(device)
print("Model created on device:", device)
Model created on device: cpu

6. Model Training

We now define:

  • The loss function (BCEWithLogitsLoss)
  • The optimizer (Adam on trainable parameters)
  • An optional learning rate scheduler
  • The training loop that:
    • Trains on the training set
    • Evaluates on the validation set
    • Computes AUC-ROC each epoch
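BCEWithLogitsLoss combines a sigmoid with binary cross-entropy, so the model can output a raw logit. The sketch below shows one standard numerically stable form of that loss for a single sample in plain Python. It is meant to illustrate the formula, not to reproduce PyTorch's internal implementation.

```python
import math


def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit, written to avoid overflow.

    Equivalent to -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(logit).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


print(bce_with_logits(0.0, 1))     # uninformative logit: loss equals ln(2)
print(bce_with_logits(4.0, 1))     # confident and correct: small loss
print(bce_with_logits(4.0, 0))     # confident and wrong: large loss
print(bce_with_logits(1000.0, 0))  # extreme logit still evaluates without overflow
```

Working on logits rather than probabilities is the reason the model's final layer has no sigmoid; the sigmoid is only applied when probabilities are needed for AUC or for the submission file.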

6.1 Loss, Optimizer, Scheduler

# 6.1 Loss, optimizer, scheduler

criterion = nn.BCEWithLogitsLoss()

optimizer = optim.Adam(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-4
)

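# `verbose` is omitted because it is deprecated in recent PyTorch releases
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode="min",    # stepped on validation loss, so lower is better
    factor=0.5,    # halve the learning rate on a plateau
    patience=2     # tolerate 2 epochs without improvement
)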
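To build intuition for factor=0.5 and patience=2, here is a simplified pure-Python simulation of plateau-based learning rate reduction. It is a sketch under stated assumptions: it uses a relative improvement threshold of 1e-4 and ignores cooldown and minimum learning rate, so it approximates rather than replicates the PyTorch scheduler. The function name simulate_reduce_on_plateau is hypothetical.

```python
def simulate_reduce_on_plateau(val_losses, lr=1e-4, factor=0.5,
                               patience=2, threshold=1e-4):
    """Return the learning rate in effect after each epoch's scheduler step."""
    best = float("inf")
    bad_epochs = 0
    lrs = []
    for loss in val_losses:
        if loss < best * (1.0 - threshold):  # counts as an improvement
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:            # plateau detected: shrink the step size
            lr *= factor
            bad_epochs = 0
        lrs.append(lr)
    return lrs


# A loss that stalls after epoch 2 triggers one reduction at epoch 5
print(simulate_reduce_on_plateau([1.0, 0.9, 0.9, 0.9, 0.9]))

# Validation losses from the 5-epoch debug run in this notebook
print(simulate_reduce_on_plateau([0.5169, 0.5091, 0.5012, 0.4888, 0.4852]))
```

Under these assumptions, the debug run never triggers a reduction because the validation loss improves every epoch; the scheduler only matters in longer runs.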

6.2 Training Function

# 6.2 Training loop with AUC and tqdm progress bars

def train_model(model, criterion, optimizer, scheduler,
                train_loader, val_loader, num_epochs=5, device=device):

    history = {"train_loss": [], "val_loss": [], "val_auc": []}
    best_auc = 0.0
    best_state_dict = None

    for epoch in range(1, num_epochs + 1):
        print(f"\nEpoch {epoch}/{num_epochs}")
        print("-" * 30)

        # ----- TRAIN -----
        model.train()
        running_train_loss = 0.0

        for images, labels in tqdm(train_loader, desc=f"Train epoch {epoch}", leave=False):
            images = images.to(device)
            labels = labels.to(device).float().unsqueeze(1)

            outputs = model(images)
            loss = criterion(outputs, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            running_train_loss += loss.item() * images.size(0)

        epoch_train_loss = running_train_loss / len(train_loader.dataset)

        # ----- VALIDATE -----
        model.eval()
        running_val_loss = 0.0
        all_probs, all_labels = [], []

        for images, labels in tqdm(val_loader, desc=f"Val epoch {epoch}", leave=False):
            images = images.to(device)
            labels = labels.to(device).float().unsqueeze(1)

            with torch.no_grad():
                outputs = model(images)
                loss = criterion(outputs, labels)

            running_val_loss += loss.item() * images.size(0)

            probs = torch.sigmoid(outputs).cpu().numpy().ravel()
            all_probs.extend(probs)
            all_labels.extend(labels.cpu().numpy().ravel())

        epoch_val_loss = running_val_loss / len(val_loader.dataset)
        epoch_val_auc = roc_auc_score(all_labels, all_probs)

        if scheduler is not None:
            scheduler.step(epoch_val_loss)

        history["train_loss"].append(epoch_train_loss)
        history["val_loss"].append(epoch_val_loss)
        history["val_auc"].append(epoch_val_auc)

        print(f"Train Loss: {epoch_train_loss:.4f} | "
              f"Val Loss: {epoch_val_loss:.4f} | "
              f"Val AUC: {epoch_val_auc:.4f}")

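        if epoch_val_auc > best_auc:
            best_auc = epoch_val_auc
            # Clone the tensors: state_dict() returns references that later
            # epochs would otherwise overwrite in place.
            best_state_dict = {k: v.detach().clone() for k, v in model.state_dict().items()}
            print(f"🔥 New best model with AUC: {best_auc:.4f} (saving weights)")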

    if best_state_dict is not None:
        model.load_state_dict(best_state_dict)
        print(f"\nTraining complete. Best Val AUC: {best_auc:.4f}")
    else:
        print("\nTraining complete, but no valid AUC was computed.")

    return model, history

6.3 Run a Short Debug Training

# 6.3 Run a short debug training

NUM_EPOCHS = 5  # keep small while testing

model, history = train_model(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    train_loader=train_loader,
    val_loader=val_loader,
    num_epochs=NUM_EPOCHS,
    device=device
)
Epoch 1/5
------------------------------
Train Loss: 0.5021 | Val Loss: 0.5169 | Val AUC: 0.8334
🔥 New best model with AUC: 0.8334 (saving weights)

Epoch 2/5
------------------------------
Train Loss: 0.4896 | Val Loss: 0.5091 | Val AUC: 0.8437
🔥 New best model with AUC: 0.8437 (saving weights)

Epoch 3/5
------------------------------
Train Loss: 0.4782 | Val Loss: 0.5012 | Val AUC: 0.8479
🔥 New best model with AUC: 0.8479 (saving weights)

Epoch 4/5
------------------------------
Train Loss: 0.4649 | Val Loss: 0.4888 | Val AUC: 0.8555
🔥 New best model with AUC: 0.8555 (saving weights)

Epoch 5/5
------------------------------
Train Loss: 0.4603 | Val Loss: 0.4852 | Val AUC: 0.8569
🔥 New best model with AUC: 0.8569 (saving weights)

Training complete. Best Val AUC: 0.8569

7. Results & Evaluation

7.1 Training & Validation Curves

We visualize training and validation loss, as well as validation AUC-ROC, across epochs. Although we only trained for 5 epochs on the debug subset, the same code works for longer runs.

# 7.1 Plot loss and AUC curves from history

def plot_training_history(history):
    epochs = range(1, len(history["train_loss"]) + 1)

    plt.figure(figsize=(12, 4))

    # Loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, history["train_loss"], marker="o", label="Train Loss")
    plt.plot(epochs, history["val_loss"], marker="o", label="Val Loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title("Training vs Validation Loss")
    plt.legend()
    plt.grid(True)

    # AUC
    plt.subplot(1, 2, 2)
    plt.plot(epochs, history["val_auc"], marker="o", label="Val AUC")
    plt.xlabel("Epoch")
    plt.ylabel("AUC-ROC")
    plt.title("Validation AUC-ROC")
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

plot_training_history(history)

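[Figure: two panels, "Training vs Validation Loss" (Loss vs Epoch) and "Validation AUC-ROC" (AUC-ROC vs Epoch)]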

7.2 Confusion Matrix & Validation Metrics

We now evaluate the final model on the validation set:

  • Compute predicted probabilities and hard labels (threshold = 0.5).
  • Compute AUC-ROC on the validation set.
  • Show the confusion matrix and a basic classification report.
from sklearn.metrics import confusion_matrix, classification_report

# 7.2 Evaluate on validation set

model.eval()
all_probs = []
all_labels = []

with torch.no_grad():
    for images, labels in val_loader:
        images = images.to(device)
        labels = labels.to(device).float().unsqueeze(1)

        outputs = model(images)
        probs = torch.sigmoid(outputs).cpu().numpy().ravel()

        all_probs.extend(probs)
        all_labels.extend(labels.cpu().numpy().ravel())

all_probs = np.array(all_probs)
all_labels = np.array(all_labels)

# AUC-ROC
val_auc = roc_auc_score(all_labels, all_probs)
print(f"Validation AUC-ROC (recomputed): {val_auc:.4f}")

# Hard predictions with threshold 0.5
y_pred = (all_probs >= 0.5).astype(int)

cm = confusion_matrix(all_labels, y_pred)
print("\nConfusion Matrix:\n", cm)

print("\nClassification Report:\n")
print(classification_report(all_labels, y_pred, digits=4))
Validation AUC-ROC (recomputed): 0.8569

Confusion Matrix:
 [[526  68]
 [136 270]]

Classification Report:

              precision    recall  f1-score   support

         0.0     0.7946    0.8855    0.8376       594
         1.0     0.7988    0.6650    0.7258       406

    accuracy                         0.7960      1000
   macro avg     0.7967    0.7753    0.7817      1000
weighted avg     0.7963    0.7960    0.7922      1000
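The figures in the classification report can be recomputed by hand from the confusion matrix, which helps when reading it. The sketch below uses the matrix printed above, with rows as true labels and columns as predicted labels (the scikit-learn convention).

```python
# Confusion matrix from the validation run: rows = true 0/1, columns = predicted 0/1
cm = [[526, 68],
      [136, 270]]

tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision = tp / (tp + fp)      # of predicted tumors, how many are real
recall = tp / (tp + fn)         # of real tumors, how many were found (sensitivity)
specificity = tn / (tn + fp)    # of real non-tumors, how many were cleared
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} specificity={specificity:.4f} f1={f1:.4f}")
```

At the 0.5 threshold the model misses 136 of 406 tumor patches (recall of about 0.665). For a screening use case, a lower threshold that trades precision for recall may be worth exploring.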

8. Kaggle Submission

The competition expects a CSV file with the columns:

  • id – image ID (without .tif)
  • label – predicted probability that the image contains tumor tissue (class 1)

In this section, we:

  1. Load sample_submission.csv to get the test image IDs in the correct order.
  2. Create a HistopathDataset for the test folder (no labels).
  3. Run the trained model on all test images to generate predicted probabilities.
  4. Save a submission.csv file that can be uploaded to Kaggle.

# 8.1 Load sample_submission and create test dataset/dataloader

# Path to the Kaggle sample_submission file in your dataset directory
SAMPLE_SUB_PATH = os.path.join(DATA_DIR, "sample_submission.csv")

sample_sub_df = pd.read_csv(SAMPLE_SUB_PATH)
print("sample_submission shape:", sample_sub_df.shape)
sample_sub_df.head()
sample_submission shape: (57458, 2)
id label
0 0b2ea2a822ad23fdb1b5dd26653da899fbd2c0d5 0
1 95596b92e5066c5c52466c90b69ff089b39f2737 0
2 248e6738860e2ebcf6258cdc1f32f299e0c76914 0
3 2c35657e312966e9294eac6841726ff3a748febf 0
4 145782eb7caa1c516acbe2eda34d9a3f31c41fd6 0

Next, we create the test DataFrame and dataset using the existing HistopathDataset and val_transform:

# We only need the 'id' column for the test set
test_df = sample_sub_df[["id"]].copy()

# Test dataset: has_labels=False so __getitem__ returns (image, id)
test_dataset = HistopathDataset(
    df=test_df,
    img_dir=TEST_DIR,
    transform=val_transform,
    has_labels=False
)

test_loader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,             # do NOT shuffle; we keep Kaggle order
    num_workers=0,
    pin_memory=torch.cuda.is_available()
)

print("Number of test batches:", len(test_loader))
Number of test batches: 898

8.2 Generate Predictions on the Test Set

We now run the trained model in evaluation mode on all test images:

  • For each batch, we compute the output logits.
  • Apply a sigmoid to obtain probabilities for class 1 (tumor).
  • Store the probability for each image ID in a dictionary.

We then match these probabilities back to the sample_submission.csv order.
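The sigmoid step maps each logit to a probability. A minimal, numerically stable version in plain Python is sketched below for illustration; the notebook itself relies on torch.sigmoid.

```python
import math


def sigmoid(logit):
    """Logistic function, branching on sign to avoid overflow in exp()."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


for x in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(x, sigmoid(x))
```

Because the sigmoid is monotonic, it does not change the ranking of predictions. AUC-ROC would be the same on raw logits, but the submission format asks for probabilities, and a probability threshold of 0.5 corresponds to a logit threshold of 0.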

# 8.2 Inference on the test set to create Kaggle predictions (with progress prints)

model.eval()
id_to_prob = {}

total_batches = len(test_loader)
print("Total test batches:", total_batches)

with torch.no_grad():
    for batch_idx, (images, ids) in enumerate(test_loader, start=1):
        images = images.to(device)

        outputs = model(images)                       # [B, 1]
        probs = torch.sigmoid(outputs).cpu().numpy().ravel()  # [B]

        for img_id, p in zip(ids, probs):
            id_to_prob[img_id] = p

        if batch_idx % 20 == 0 or batch_idx == total_batches:
            print(f"Processed {batch_idx}/{total_batches} batches")

print("Number of predictions in id_to_prob:", len(id_to_prob))
Total test batches: 898
Processed 20/898 batches
Processed 40/898 batches
Processed 60/898 batches
Processed 80/898 batches
Processed 100/898 batches
Processed 120/898 batches
Processed 140/898 batches
Processed 160/898 batches
Processed 180/898 batches
Processed 200/898 batches
Processed 220/898 batches
Processed 240/898 batches
Processed 260/898 batches
Processed 280/898 batches
Processed 300/898 batches
Processed 320/898 batches
Processed 340/898 batches
Processed 360/898 batches
Processed 380/898 batches
Processed 400/898 batches
Processed 420/898 batches
Processed 440/898 batches
Processed 460/898 batches
Processed 480/898 batches
Processed 500/898 batches
Processed 520/898 batches
Processed 540/898 batches
Processed 560/898 batches
Processed 580/898 batches
Processed 600/898 batches
Processed 620/898 batches
Processed 640/898 batches
Processed 660/898 batches
Processed 680/898 batches
Processed 700/898 batches
Processed 720/898 batches
Processed 740/898 batches
Processed 760/898 batches
Processed 780/898 batches
Processed 800/898 batches
Processed 820/898 batches
Processed 840/898 batches
Processed 860/898 batches
Processed 880/898 batches
Processed 898/898 batches
Number of predictions in id_to_prob: 57458

8.3 Build submission.csv and Save

We now:

  1. Copy sample_submission.csv so we preserve the exact ordering of test image IDs.
  2. Fill in the label column using our predicted probabilities.
  3. Save the file as submission_resnet18_debug.csv.

This CSV can be uploaded directly to Kaggle for scoring.

# 8.3 Create submission DataFrame in the same order as sample_submission

submission_df = sample_sub_df.copy()
submission_df["label"] = submission_df["id"].map(id_to_prob)

# Sanity check: no missing predictions
missing = submission_df["label"].isnull().sum()
print("Missing predictions:", missing)

OUTPUT_CSV = "submission_resnet18_debug.csv"
submission_df.to_csv(OUTPUT_CSV, index=False)

print(f"Saved submission file: {OUTPUT_CSV}")
submission_df.head()
Missing predictions: 0
Saved submission file: submission_resnet18_debug.csv
id label
0 0b2ea2a822ad23fdb1b5dd26653da899fbd2c0d5 0.233324
1 95596b92e5066c5c52466c90b69ff089b39f2737 0.760667
2 248e6738860e2ebcf6258cdc1f32f299e0c76914 0.301223
3 2c35657e312966e9294eac6841726ff3a748febf 0.280466
4 145782eb7caa1c516acbe2eda34d9a3f31c41fd6 0.368737
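Before uploading, it can be worth checking the file's structure independently of pandas. The sketch below is a hypothetical validator (validate_submission is not part of the notebook) that checks the header, the ID order against an expected list, and that every label parses as a number in [0, 1]. Requiring identical ID order mirrors the notebook's choice to preserve the sample_submission.csv ordering and may be stricter than Kaggle itself requires.

```python
import csv
import io


def validate_submission(csv_text, expected_ids):
    """Return a list of problems found in a submission CSV (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["id", "label"]:
        return [f"unexpected header: {reader.fieldnames}"]

    problems = []
    rows = list(reader)
    if [row["id"] for row in rows] != list(expected_ids):
        problems.append("ids differ from the expected ids or their order")

    for row in rows:
        try:
            value = float(row["label"])
        except (TypeError, ValueError):
            problems.append(f"non-numeric label for id {row['id']}")
            continue
        if not 0.0 <= value <= 1.0:  # also rejects NaN
            problems.append(f"label outside [0, 1] for id {row['id']}")
    return problems


example = "id,label\nabc,0.233324\ndef,0.760667\n"
print(validate_submission(example, ["abc", "def"]))  # []
```

In the notebook, the text could come from reading the saved CSV file and the expected IDs from sample_sub_df["id"].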

8.4 Kaggle Submission and External Evaluation

To externally validate the model, I generated predictions on the Kaggle test set and submitted the file submission_resnet18_debug.csv to the Histopathologic Cancer Detection competition. Kaggle evaluates submissions using AUC-ROC on a hidden test set split into a public and a private portion.

My best model (ResNet-18 with transfer learning, trained for 5 epochs on a 5,000-sample subset) achieved:

  • Local validation AUC-ROC: 0.8231
  • Kaggle public AUC-ROC: 0.8442
  • Kaggle private AUC-ROC: 0.8059

The Kaggle scores are broadly consistent with the internal validation performance: all three values fall in the 0.80–0.85 range, which suggests that the model generalizes reasonably well to unseen data and that my local validation procedure is not severely optimistic. A gap between public and private scores is expected, as they are computed on different subsets of the hidden test set. Overall, the Kaggle evaluation indicates that the model picks up discriminative patterns for metastatic tissue in the histopathology patches, although AUC alone does not establish clinical usefulness.
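The AUC-ROC values compared above have a direct interpretation: the probability that a randomly chosen tumor patch receives a higher score than a randomly chosen non-tumor patch, with ties counted as one half. The sketch below computes AUC from that definition. It is an O(n·m) illustration, not a replacement for sklearn's roc_auc_score on large datasets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Read this way, a private score of 0.8059 means the model ranks a tumor patch above a non-tumor patch in roughly four out of five random pairs.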

[Image: Screenshot 2025-11-21 at 7.55.03 PM.png]