EKG Specialist V3 - Results Summary

PEPCEI Project: Phase I

Author

Awan

Published

March 4, 2026

Show code
library(tidyverse)
library(knitr)
library(kableExtra)
library(ggridges)

results_dir <- "../sim-results-new/run3-successful/"

checkpoint_files <- tibble(
  file = c(
    "checkpoint_K0_split612_20260301_183110.rds",
    "checkpoint_K0_split928_20260302_183415.rds",
    "checkpoint_K1_split612_20260301_205417.rds",
    "checkpoint_K1_split928_20260302_210111.rds",
    "checkpoint_K2_split612_20260301_232405.rds",
    "checkpoint_K2_split928_20260302_234113.rds",
    "checkpoint_K3_split612_20260302_091243.rds",
    "checkpoint_K3_split928_20260303_083434.rds",
    "checkpoint_K4_split612_20260302_155423.rds",
    "checkpoint_K4_split928_20260303_093601.rds",
    "checkpoint_K5_split612_20260302_165129.rds",
    "checkpoint_K5_split928_20260303_103916.rds"
  ),
  expected_kernel = rep(paste0("K", 0:5), each = 2),
  expected_split = rep(c(612, 928), times = 6)
)

master <- map2_dfr(checkpoint_files$file, seq_len(nrow(checkpoint_files)), function(f, i) {
  cp <- readRDS(file.path(results_dir, f))
  cp$summary %>%
    filter(kernel_id == checkpoint_files$expected_kernel[i],
           data_split_seed == checkpoint_files$expected_split[i])
})

make_2x2 <- function(tp, fp, fn, tn, caption = NULL) {
  mat <- data.frame(
    ` ` = c("Actual PE+", "Actual PE-", "Total"),
    `Predicted PE+` = c(tp, fp, tp + fp),
    `Predicted PE-` = c(fn, tn, fn + tn),
    Total = c(tp + fn, fp + tn, tp + fp + fn + tn),
    check.names = FALSE
  )
  kable(mat, booktabs = TRUE, caption = caption, align = "lrrr") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
}

1 Summary

We evaluated 6 kernel configurations (K0-K5), spanning receptive fields from 82ms to 2,598ms, on a 2-layer Conv2D architecture for pulmonary embolism (PE) detection from raw 12-lead EKG waveforms. The full grid comprises 2,160 experiments (6 kernels x 3 class weights x 6 LRs x 2 dropouts x 2 data splits x 5 seeds).

Key results:

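  • Kernel size significantly affects test AUROC (Kruskal-Wallis p = 3.29e-143). K3 (1,122ms RF) has the highest median test AUROC (0.592) and, tied with K5, the highest median test AUPRC (0.289, ~1.2x prevalence). K3 outperforms K0-K2 (Wilcoxon p < 0.001, BH-adjusted). K3 also differs from K4 (p = 0.0009) and K5 (p < 0.0001), but the median AUROC gaps are small (0.003 and 0.006); K4 vs K5 is not significant (p = 0.069).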
  • Class weight does not affect AUROC/AUPRC. All three strategies produce overlapping distributions. Class weight affects binary predictions at cutoff 0.50: inverse_freq eliminates collapse; other strategies require lower cutoffs.
  • Learning rate and dropout have limited effect on AUROC within the tested ranges.
  • EKG alone: ~0.59 AUROC. Above chance but limited.

Methods Note: A critical optimizer bug (GPU parameters not updating after .to(cuda)) was discovered and fixed prior to these runs. All 2,160 experiments reported here used the corrected training pipeline.

Terminology

AUROC (Area Under the Receiver Operating Characteristic Curve): Measures the model’s ability to distinguish PE-positive from PE-negative patients across all possible decision thresholds. Ranges from 0 to 1, where 0.5 corresponds to random chance and 1.0 to perfect discrimination; values below 0.5 indicate worse-than-chance ranking. Threshold-independent.

AUPRC (Area Under the Precision-Recall Curve): Measures discrimination performance with emphasis on the positive (PE) class. More informative than AUROC when the positive class is rare. Baseline equals the prevalence (~0.232 for our EKG-only cohort, ~0.235 for the paired validation cohort). Values are reported with a multiple of prevalence in parentheses (e.g., 0.292 (~1.3x) means 1.3 times the baseline). Threshold-independent.

Sensitivity (Recall, True Positive Rate): Proportion of actual PE cases correctly identified. A sensitivity of 80% means 80 out of 100 PE patients are flagged. Critical for screening, as missed PE cases can be fatal.

Specificity (True Negative Rate): Proportion of actual non-PE cases correctly identified. A specificity of 60% means 60 out of 100 non-PE patients are correctly cleared.

PPV (Positive Predictive Value, Precision): Among patients the model flags as PE-positive, the proportion who actually have PE.

NPV (Negative Predictive Value): Among patients the model clears as PE-negative, the proportion who truly do not have PE.

F1 Score: Harmonic mean of precision (PPV) and sensitivity. Balances the trade-off between catching PE cases and avoiding false alarms. Ranges from 0 to 1.

Youden’s J (Youden Index): Sensitivity + Specificity - 1. Measures how much better the model is than random guessing (J=0). A model with 60% sensitivity and 60% specificity has J=0.20.

Receptive Field (RF): The temporal span of raw EKG signal that a single output unit of the convolutional network can “see.” A 1,122ms receptive field means each feature captures approximately 1-2 cardiac beats depending on heart rate (e.g., ~1.4 beats at 75 bpm, ~1.9 beats at 100 bpm). See Section 2.1.1 for a full explanation.

Collapse: When a model predicts the same class for all patients (e.g., all PE-negative), producing 0% sensitivity or 0% specificity. Common in imbalanced classification without proper class weighting.

Inverse Frequency Weighting: A class weighting strategy that penalizes misclassification of the minority class (PE-positive) proportional to its rarity. With ~23% PE prevalence, the positive class receives ~3.3x the loss weight of the negative class.

Cutoff (Decision Threshold): The probability above which the model classifies a patient as PE-positive. A cutoff of 0.50 means any predicted probability above 50% is classified as PE+. Lower cutoffs increase sensitivity at the cost of specificity.
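The threshold-dependent metrics defined above all derive from the four cells of a confusion matrix. The following is a minimal sketch of those definitions; `binary_metrics` is a hypothetical helper written for illustration (it is not part of the project pipeline), and the counts are made up rather than study results.

```r
# Hypothetical helper: threshold-dependent metrics from 2x2 counts.
# tp/fp/fn/tn = true positives, false positives, false negatives, true negatives.
binary_metrics <- function(tp, fp, fn, tn) {
  sens <- tp / (tp + fn)                 # sensitivity (recall)
  spec <- tn / (tn + fp)                 # specificity
  ppv  <- tp / (tp + fp)                 # precision
  npv  <- tn / (tn + fn)
  f1   <- 2 * ppv * sens / (ppv + sens)  # harmonic mean of PPV and sensitivity
  c(sensitivity = sens, specificity = spec, ppv = ppv, npv = npv,
    f1 = f1, youden_j = sens + spec - 1)
}

# Illustrative counts only: 100 PE+ and 100 PE- patients,
# 60% sensitivity and 60% specificity, so Youden's J = 0.20.
m <- binary_metrics(tp = 60, fp = 40, fn = 40, tn = 60)
print(round(m, 3))
```

Under collapse (for example, no patient predicted PE+), `tp + fp` is zero and PPV and F1 evaluate to `NaN`, which is one reason collapsed runs need to be handled separately when summarizing binary metrics.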

2 Experiment Design

2.1 Architecture

Input: 1 x 12 x 5,000 (channel x leads x timepoints). Raw 12-lead EKG waveform sampled at 500Hz for 10 seconds (5,000 = 500Hz x 10s), unsqueezed to a single-channel 2D tensor where leads are the height dimension and time is the width dimension.

Show code
library(DiagrammeR)
library(DiagrammeRsvg)
library(rsvg)

g <- grViz('
digraph ekg_specialist_v3 {
  
  graph [rankdir=TB, fontname="Gill Sans", bgcolor="white", 
         nodesep=0.6, ranksep=0.7, pad=0.5]
  node [fontname="Gill Sans", fontsize=18, shape=box, style="filled,rounded", penwidth=2]
  edge [fontname="Gill Sans", fontsize=16, penwidth=1.5]

  input [label="EKG Input\n1 x 12 x 5,000\n(channel x leads x timepoints)", 
         shape=ellipse, fillcolor="#D5F5E3", color="#27AE60", width=3.5, height=1.5]

  conv1 [label="Conv2D (1 -> 32)\nkernel (1, k1), stride (1, s1)\nBatchNorm + ReLU", 
         fillcolor="#82E0AA", color="#1E8449", width=3.8, height=1.3]

  pool1 [label="MaxPool2D\nkernel (1, pk), stride (1, pk)", 
         fillcolor="#A9DFBF", color="#1E8449", width=3.5, height=0.9]

  drop1 [label="Dropout (p)", fillcolor="#D5F5E3", color="#27AE60", width=2.5, height=0.7]

  conv2 [label="Conv2D (32 -> 64)\nkernel (3, k2), stride (1, 1)\nBatchNorm + ReLU", 
         fillcolor="#AED6F1", color="#2E86C1", width=3.8, height=1.3]

  pool2 [label="MaxPool2D\nkernel (1, pk), stride (1, pk)", 
         fillcolor="#D6EAF8", color="#2E86C1", width=3.5, height=0.9]

  drop2 [label="Dropout (p)", fillcolor="#D6EAF8", color="#2E86C1", width=2.5, height=0.7]

  apool [label="AdaptiveAvgPool2D -> (1, 1)\nFlatten -> 64-dim", 
         fillcolor="#F5CBA7", color="#E67E22", width=3.5, height=0.9]

  fc [label="Linear (64 -> 32) + ReLU\nDropout (p)", 
      fillcolor="#F9E79F", color="#D4AC0D", width=3.5, height=0.9]

  output [label="Linear (32 -> 2)\nPE-negative | PE-positive", 
          shape=ellipse, fillcolor="#FADBD8", color="#E74C3C", width=3.5, height=1.2]

  rf_note [label="Receptive Field\n(kernel-config dependent)\nK0: 82ms  |  K3: 1,122ms  |  K5: 2,598ms",
           shape=note, fillcolor="#F2F3F4", color="#7F8C8D", fontsize=16, width=4.0, height=1.2]

  input -> conv1 -> pool1 -> drop1 -> conv2 -> pool2 -> drop2 -> apool -> fc -> output
  
  pool2 -> rf_note [style=dashed, color="#7F8C8D", constraint=false]
}
')

svg_text <- export_svg(g)
rsvg_png(charToRaw(svg_text), file = "ekg_specialist_v3_arch.png", width = 2000, height = 3000)

EKG Specialist V3 Architecture

Where k1, s1, k2, pk are kernel-configuration-specific parameters (see Kernel Configurations table below). Conv2 uses a (3, k2) kernel, spanning 3 leads vertically while varying temporally. All dropout layers use the same rate (hyperparameter). Output is 2 logits (PE-negative, PE-positive), passed through softmax at inference.
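To make the last step concrete, here is a small sketch of how two logits become a PE probability and a binary call at a given cutoff. The logit values are invented for illustration, and `softmax` is a plain R helper, not the inference code used in the experiments.

```r
# Numerically stable softmax over a vector of logits.
softmax <- function(logits) {
  z <- exp(logits - max(logits))
  z / sum(z)
}

# Illustrative logits for one patient (not model output).
logits <- c(pe_negative = 0.3, pe_positive = -0.4)
probs  <- softmax(logits)

# Binary prediction at the default cutoff of 0.50.
cutoff  <- 0.50
pred_pe <- probs[["pe_positive"]] > cutoff
print(probs)
print(pred_pe)
```

With only two classes, the softmax probability of the positive class equals the logistic function of the logit difference, so `probs[["pe_positive"]]` matches `plogis(-0.4 - 0.3)`.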

Kernel Configurations
Show code
kernel_table <- tibble(
  Kernel = paste0("K", 0:5),
  `Conv1` = c("(1,11)", "(1,11)", "(1,25)", "(1,51)", "(1,75)", "(1,101)"),
  Stride = c("(1,2)", "(1,1)", "(1,1)", "(1,2)", "(1,2)", "(1,2)"),
  `Conv2` = c("(3,7)", "(3,5)", "(3,11)", "(3,25)", "(3,35)", "(3,51)"),
  Pool = c("(1,2)", "(1,4)", "(1,4)", "(1,8)", "(1,8)", "(1,10)"),
  `RF (ms)` = c(82, 84, 160, 1122, 1490, 2598)
)


kable(kernel_table, booktabs = TRUE, 
      caption = "Kernel configurations and corresponding receptive fields.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Kernel configurations and corresponding receptive fields.
Kernel Conv1 Stride Conv2 Pool RF (ms)
K0 (1,11) (1,2) (3,7) (1,2) 82
K1 (1,11) (1,1) (3,5) (1,4) 84
K2 (1,25) (1,1) (3,11) (1,4) 160
K3 (1,51) (1,2) (3,25) (1,8) 1122
K4 (1,75) (1,2) (3,35) (1,8) 1490
K5 (1,101) (1,2) (3,51) (1,10) 2598

2.1.1 Receptive Field Design Rationale

What is a receptive field? The input to the model is 5,000 timepoints of raw EKG signal sampled at 500Hz (5,000 / 500 = 10 seconds). After passing through convolutional and pooling layers, the signal is compressed into a small feature map. Each value in that feature map was computed from some contiguous chunk of the original 5,000 timepoints. The receptive field (RF) is the width of that chunk, i.e. how much raw signal each learned feature can “see.”

If the RF is 82ms, each feature sees only a fraction of the QRS complex. If the RF is 1,122ms, each feature sees an entire P-QRS-ST-T cycle, the full electrical signature of one heartbeat. Since classic PE signs on EKG (right heart strain, ST-segment changes, T-wave inversions) are expressed across the full beat, the hypothesis is that larger RFs should improve detection.

The RF formula. For a sequence of convolutional and pooling layers, the RF grows according to:

\[RF_{new} = RF_{old} + (k - 1) \times S_{cumulative}\]

where \(k\) is the kernel size of the current layer, and \(S_{cumulative}\) is the product of all strides in preceding layers. This captures the key insight: later layers operate on downsampled feature maps, so each position in their input represents multiple raw samples. A kernel of width 25 in Conv2 does not span 25 raw samples. Rather, it spans 25 positions that are each \(S_{cumulative}\) raw samples apart.

K3 step-by-step example. Architecture (temporal dimension): Conv1(k=51, s=2) -> MaxPool(k=8, s=8) -> Conv2(k=25, s=1) -> MaxPool(k=8, s=8):

Layer Kernel Stride Calculation RF (samples) Cumulative Stride
Input - - - 1 1
Conv1 51 2 1 + (51-1) x 1 51 2
Pool1 8 8 51 + (8-1) x 2 65 16
Conv2 25 1 65 + (25-1) x 16 449 16
Pool2 8 8 449 + (8-1) x 16 561 128

561 samples / 500 Hz = 1.122 s = 1,122ms

The large jump from 65 to 449 at Conv2 is because the cumulative stride is 16 at that point. Each of Conv2’s 25 kernel positions is 16 raw samples apart, so the kernel reaches across 24 x 16 = 384 additional raw samples.
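The same recurrence can be applied to all six configurations without loading any checkpoint. The sketch below hard-codes the temporal kernel, stride, and pool widths from the Kernel Configurations table; `receptive_field` is a hypothetical helper name, and the layer order (Conv1, Pool1, Conv2, Pool2, with Conv2 at temporal stride 1) follows the K3 example above.

```r
# Receptive field of a stack of layers, each given as list(k = kernel, s = stride).
# Implements RF_new = RF_old + (k - 1) * S_cumulative along the time axis.
receptive_field <- function(layers) {
  rf <- 1
  cum_stride <- 1
  for (layer in layers) {
    rf <- rf + (layer$k - 1) * cum_stride
    cum_stride <- cum_stride * layer$s
  }
  rf
}

# Temporal parameters copied from the Kernel Configurations table.
configs <- data.frame(
  kernel  = paste0("K", 0:5),
  conv1_k = c(11, 11, 25, 51, 75, 101),
  conv1_s = c(2, 1, 1, 2, 2, 2),
  conv2_k = c(7, 5, 11, 25, 35, 51),
  pool_k  = c(2, 4, 4, 8, 8, 10)
)

configs$rf_samples <- mapply(function(c1k, c1s, c2k, pk) {
  receptive_field(list(
    list(k = c1k, s = c1s),  # Conv1
    list(k = pk,  s = pk),   # Pool1 (stride equals kernel)
    list(k = c2k, s = 1),    # Conv2 (temporal stride 1)
    list(k = pk,  s = pk)    # Pool2
  ))
}, configs$conv1_k, configs$conv1_s, configs$conv2_k, configs$pool_k)

# Convert samples to milliseconds at 500 Hz.
configs$rf_ms <- configs$rf_samples * 1000 / 500
print(configs[, c("kernel", "rf_samples", "rf_ms")])
```

The computed values reproduce the RF column of the table (82, 84, 160, 1,122, 1,490, and 2,598 ms).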

Design constraints. Kernel sizes were chosen under three constraints:

  1. Broad temporal sweep. Six configurations were designed to produce receptive fields spanning a roughly 30-fold range, from ~80ms to ~2,600ms, allowing the data to determine which timescale contains the most signal for PE detection. Rather than targeting exact cardiac cycle fractions, the configurations were spaced to sample a wide range of temporal resolutions.

  2. 2:1 kernel ratio. A consistent ~2:1 ratio between Conv1 and Conv2 kernel widths was maintained across configurations (K2-K5 range from 1.98:1 to 2.27:1), so that each layer contributes proportionally to the total RF.

  3. Odd kernel widths. Standard convention in convolutional networks. Odd kernels allow symmetric padding (padding = (k-1)/2 on each side), preserving spatial alignment.

Given the 2:1 ratio and odd-kernel constraint, K3 uses conv1_k=51 and conv2_k=25, yielding 1,122ms. The RF tuning resolution at Conv2 is coarse: each +/-1 change in conv2_k shifts the RF by 32ms. To hit a different target while maintaining the 2:1 ratio would require changing both kernels simultaneously, so the achieved RFs are approximate by design.
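The 32ms step follows directly from the RF formula, because the Conv2 term is the only one that depends on conv2_k (Pool2 sees the same cumulative stride, since Conv2 has temporal stride 1). For K3, with Conv1 stride 2 and Pool1 stride 8 ahead of Conv2 and a 500Hz sampling rate:

\[\Delta RF = \Delta k_{conv2} \times S_{cumulative} = 1 \times (2 \times 8) = 16 \text{ samples} = \frac{16}{500\,\text{Hz}} = 32\,\text{ms}\]

Because kernel widths are kept odd, the smallest admissible change is \(\Delta k_{conv2} = 2\), which moves the RF by 64ms per step.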

Show code
# Verify RF calculations for all 6 kernels
kc <- readRDS(file.path(results_dir, checkpoint_files$file[1]))$kernel_configs

rf_calc <- tibble(
  Kernel = character(), conv1_k = integer(), conv1_s = integer(),
  pool_k = integer(), conv2_k = integer(),
  RF_samples = integer(), RF_ms = double(), Stored_ms = double()
)

for (i in seq_len(nrow(kc))) {  # seq_len() is safe even if kc has zero rows
  row <- kc[i, ]
  c1_k <- unlist(row$conv1_k)[2]
  c1_s <- unlist(row$conv1_s)[2]
  c2_k <- unlist(row$conv2_k)[2]
  p_k  <- unlist(row$pool_k)[2]

  rf <- 1; cum_s <- 1
  rf <- rf + (c1_k - 1) * cum_s; cum_s <- cum_s * c1_s
  rf <- rf + (p_k - 1) * cum_s;  cum_s <- cum_s * p_k
  rf <- rf + (c2_k - 1) * cum_s; cum_s <- cum_s * 1
  rf <- rf + (p_k - 1) * cum_s;  cum_s <- cum_s * p_k

  rf_calc <- bind_rows(rf_calc, tibble(
    Kernel = row$kernel_id, conv1_k = c1_k, conv1_s = c1_s,
    pool_k = p_k, conv2_k = c2_k,
    `Conv1:Conv2` = ifelse(i == 1, sprintf("%.1f:1", c1_k / c2_k), sprintf("%d:1", round(c1_k / c2_k))),
    RF_samples = rf, RF_ms = rf / 500 * 1000, Stored_ms = row$rf_ms
  ))
}

kable(rf_calc, booktabs = TRUE,
      caption = "Receptive field verification. RF computed from architecture parameters and confirmed against stored values.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Receptive field verification. RF computed from architecture parameters and confirmed against stored values.
Kernel conv1_k conv1_s pool_k conv2_k RF_samples RF_ms Stored_ms Conv1:Conv2
K0 11 2 2 7 41 82 82 1.6:1
K1 11 1 4 5 42 84 84 2:1
K2 25 1 4 11 80 160 160 2:1
K3 51 2 8 25 561 1122 1122 2:1
K4 75 2 8 35 745 1490 1490 2:1
K5 101 2 10 51 1299 2598 2598 2:1

Clinical interpretation. How much signal each kernel captures depends on the patient’s heart rate. A normal resting heart rate is ~75 bpm (800ms per cycle). PE patients commonly present with sinus tachycardia (>=100 bpm), giving <=600ms per cycle.

Show code
hr_vals <- c(800, 600, 500, 400)
rf_vals <- c(82, 84, 160, 1122, 1490, 2598)

# Round to nearest 0.5 for cleaner presentation
round_half <- function(x) {
  r <- round(x * 2) / 2
  ifelse(r == round(r), sprintf("~%.0f", r), sprintf("~%.1f", r))
}

hr_table <- tibble(
  `Heart Rate` = c("75 bpm (normal resting)", "100 bpm (tachycardia threshold)", 
                    "120 bpm", "150 bpm"),
  `Cycle (ms)` = hr_vals
)

for (j in seq_along(rf_vals)) {
  raw <- rf_vals[j] / hr_vals
  hr_table[[paste0("K", j - 1)]] <- ifelse(raw < 0.5, sprintf("%.1f", raw), round_half(raw))
}

kable(hr_table, booktabs = TRUE,
      caption = "Approximate number of cardiac cycles captured by each kernel at different heart rates.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Approximate number of cardiac cycles captured by each kernel at different heart rates.
Heart Rate Cycle (ms) K0 K1 K2 K3 K4 K5
75 bpm (normal resting) 800 0.1 0.1 0.2 ~1.5 ~2 ~3
100 bpm (tachycardia threshold) 600 0.1 0.1 0.3 ~2 ~2.5 ~4.5
120 bpm 500 0.2 0.2 0.3 ~2 ~3 ~5
150 bpm 400 0.2 0.2 0.4 ~3 ~3.5 ~6.5

K0-K2 capture less than one beat at any heart rate. K3 captures 1-3 beats depending on heart rate, which is enough to see both waveform morphology (ST changes, T-wave inversions) and beat-to-beat timing. K4 and K5 capture more beats but show no significant improvement over K3, suggesting the additional context does not help.

K0 and K1 were carried over from V1/V2 for backward comparison and were not designed under the same 2:1 ratio convention.

2.2 Hyperparameter Grid

  • Learning rate: 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3
  • Class weight: none, weight_1_2 (1:2), inverse_freq (~1:3.3)
  • Dropout: 0.10, 0.20
  • Data split seeds: 612, 928 (70/30 stratified splits of 2,039 EKG-only patients)
  • Initialization seeds: 42, 123, 456, 789, 2024

Total: 6 kernels x 36 HP combos x 2 splits x 5 seeds = 2,160 experiments. Early stopping with patience=7 on test loss, maximum 30 epochs.
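The experiment count can be checked by enumerating the grid. This is a sketch only: the column names `dropout` and `init_seed` are assumed for illustration, while the values are the ones listed above.

```r
# Enumerate the full factorial grid described in Section 2.2.
grid <- expand.grid(
  kernel_id       = paste0("K", 0:5),
  class_weight    = c("none", "weight_1_2", "inverse_freq"),
  lr              = c(1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3),
  dropout         = c(0.10, 0.20),               # assumed column name
  data_split_seed = c(612, 928),
  init_seed       = c(42, 123, 456, 789, 2024),  # assumed column name
  stringsAsFactors = FALSE
)

n_total      <- nrow(grid)                 # total experiments
n_per_kernel <- table(grid$kernel_id)      # experiments per kernel
n_per_lr     <- table(grid$lr)             # experiments per learning rate
print(n_total)
print(n_per_kernel)
```

The grid has 2,160 rows, with 360 rows per kernel and 360 per learning rate, matching the per-group counts used in the Results section.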

2.3 Data

  • Training/test: 2,039 EKG-only patients, split 70/30 per data split seed. PE prevalence ~23.2%.
  • Validation: 132 patients from the paired pool (patients with both EKG + CXR), using fixed paired_split_seed=612. The specialist only evaluates on validation and never trains on paired data, preventing leakage into Phase 3 fusion experiments.
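The ~1:3.3 ratio quoted for inverse_freq follows from the ~23.2% prevalence. The sketch below shows only the ratio of positive to negative class weight; how the training pipeline normalizes the two weights is not shown in this report, so fixing the negative weight at 1 is an assumption.

```r
# Inverse-frequency class weights expressed relative to the negative class.
prevalence <- 0.232                        # PE prevalence in the EKG-only cohort
w_neg <- 1                                 # assumed reference weight
w_pos <- (1 - prevalence) / prevalence     # ratio of class frequencies
print(round(w_pos, 1))
```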

Model selection strategy: The EKG specialist is selected based on test set performance (613 EKG-only patients), not validation set performance. This is deliberate: because Phase 3 fusion models will be evaluated on the same 132-patient validation set, selecting the specialist based on validation would constitute indirect data leakage. In warm fusion, the specialist weights are fine-tuned with paired data, so any pre-optimization for validation patients would propagate through the fusion training. Test-based selection keeps specialist selection independent of fusion evaluation.
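As an illustration of test-based selection, the sketch below ranks configurations by median test AUROC across initialization seeds and never touches a validation column. The data frame is a deterministic toy example, and aggregating by the median across seeds is an assumption about how a single configuration might be chosen, not a description of the project's actual selection rule.

```r
# Toy results: 2 kernels x 2 learning rates x 5 seeds (values are invented).
toy <- expand.grid(
  kernel_id = c("K2", "K3"),
  lr        = c(1e-4, 1e-3),
  init_seed = 1:5,
  stringsAsFactors = FALSE
)
seed_offset <- c(-0.004, -0.002, 0, 0.002, 0.004)[toy$init_seed]
toy$test_auroc <- ifelse(toy$kernel_id == "K3", 0.59, 0.57) +
  ifelse(toy$lr == 1e-3, 0.001, 0) + seed_offset

# Median test AUROC per configuration, pooling seeds.
by_config <- aggregate(test_auroc ~ kernel_id + lr, data = toy, FUN = median)

# Pick the configuration with the highest median test AUROC.
best <- by_config[which.max(by_config$test_auroc), ]
print(best)
```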

Overfitting safeguards: (1) Early stopping with patience=7 on test loss (maximum 30 epochs) prevents overtraining. (2) Dropout regularization is applied after each convolutional block and in the fully connected layer. (3) Two independent stratified data splits (seeds 612, 928) verify that results are not driven by a particular train/test partition. (4) Five initialization seeds per configuration quantify seed-to-seed variability. (5) The 132-patient validation set is completely held out from both training and hyperparameter selection, serving only as an independent check on generalization.
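The early-stopping rule in safeguard (1) can be sketched as follows. `early_stop_epoch` is a hypothetical helper, and it assumes that "patience=7" means training stops after 7 consecutive epochs without a strict improvement in the monitored loss; the loss values are invented.

```r
# Return the epoch at which training would stop, given per-epoch losses.
early_stop_epoch <- function(losses, patience = 7, max_epochs = 30) {
  best <- Inf
  wait <- 0
  n <- min(length(losses), max_epochs)
  for (epoch in seq_len(n)) {
    if (losses[epoch] < best) {
      best <- losses[epoch]   # new best loss: reset the patience counter
      wait <- 0
    } else {
      wait <- wait + 1        # no improvement this epoch
      if (wait >= patience) return(epoch)
    }
  }
  n                           # ran out of epochs without triggering the rule
}

# Illustrative loss curve: best at epoch 3, then 7 epochs without improvement.
losses <- c(0.70, 0.66, 0.64, 0.65, 0.66, 0.65, 0.67, 0.66, 0.68, 0.67, 0.69)
stop_at <- early_stop_epoch(losses)
print(stop_at)
```

In this toy curve the best loss occurs at epoch 3 and the rule triggers at epoch 10; a curve that keeps improving would instead run to the 30-epoch cap.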

3 Results

All metrics in Sections 3.1-3.7 are threshold-independent (AUROC, AUPRC). Binary prediction metrics (sensitivity, specificity, confusion matrices) are introduced in Section 3.8.
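AUROC is threshold-independent because it depends only on how the scores rank patients, not on their absolute values. A minimal sketch using the rank-sum (Mann-Whitney) identity illustrates this; `auroc_rank` is a hypothetical helper and the scores are made up.

```r
# AUROC via the rank-sum identity: probability that a randomly chosen
# positive is scored higher than a randomly chosen negative.
auroc_rank <- function(scores, labels) {
  r     <- rank(scores)                 # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Illustrative scores for 2 PE- and 2 PE+ patients.
scores <- c(0.10, 0.40, 0.35, 0.80)
labels <- c(0, 0, 1, 1)

auc_raw    <- auroc_rank(scores, labels)
auc_scaled <- auroc_rank(scores * 0.5, labels)  # monotone rescaling of scores
print(c(auc_raw, auc_scaled))
```

Both calls give 0.75: halving every score changes which patients clear a 0.50 cutoff but leaves the ranking, and therefore the AUROC, unchanged.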

3.1 The Full Landscape (2,160 experiments)

Show code
# One row per experiment (AUROC/AUPRC are identical across cutoff rows)
auroc_data <- master %>% filter(cutoff == 0.50)

ggplot(auroc_data, aes(x = test_auroc)) +
  geom_histogram(binwidth = 0.02, fill = "#5DADE2", color = "white", alpha = 0.8) +
  geom_vline(xintercept = 0.50, linetype = "dashed", color = "#E74C3C", linewidth = 0.6) +
  annotate("text", x = 0.52, y = Inf, label = "Random chance (0.50)",
           hjust = 0, vjust = 2, color = "#E74C3C", size = 3.5) +
  labs(x = "Test AUROC", y = "Count",
       title = "Distribution of test AUROC across all 2,160 experiments",
       caption = "All kernels, class weights, learning rates, dropout rates, splits, and seeds.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

The bulk of experiments fall in the 0.55-0.62 AUROC range. A small number fall below 0.50.

Show code
ggplot(auroc_data, aes(x = test_auprc)) +
  geom_histogram(binwidth = 0.01, fill = "#58D68D", color = "white", alpha = 0.8) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.6) +
  annotate("text", x = 0.237, y = Inf, label = "Prevalence baseline (0.232)",
           hjust = 0, vjust = 2, color = "#E74C3C", size = 3.5) +
  labs(x = "Test AUPRC", y = "Count",
       title = "Distribution of test AUPRC across all 2,160 experiments",
       caption = "Dashed line = prevalence baseline (expected AUPRC of a random classifier).") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Most experiments exceed the prevalence baseline (0.232), with the bulk in the 0.25-0.32 range.

3.2 Class Weight (2,160 experiments)

Show code
auroc_data <- auroc_data %>%
  mutate(cw_label = case_when(
    class_weight == "none" ~ "None",
    class_weight == "weight_1_2" ~ "1:2",
    class_weight == "inverse_freq" ~ "Inverse freq (~1:3.3)"
  ) %>% factor(levels = c("Inverse freq (~1:3.3)", "1:2", "None")))

ggplot(auroc_data, aes(x = test_auroc, y = cw_label, fill = cw_label)) +
  geom_density_ridges(alpha = 0.7, scale = 1.2, rel_min_height = 0.01) +
  geom_vline(xintercept = 0.50, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_manual(values = c("#27AE60", "#F39C12", "#E74C3C")) +
  labs(x = "Test AUROC", y = NULL,
       title = "Test AUROC distribution by class weight strategy",
       caption = "All kernels, LRs, dropout rates, splits, and seeds pooled (720 experiments per weight).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Show code
ggplot(auroc_data, aes(x = test_auprc, y = cw_label, fill = cw_label)) +
  geom_density_ridges(alpha = 0.7, scale = 1.2, rel_min_height = 0.01) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_manual(values = c("#27AE60", "#F39C12", "#E74C3C")) +
  labs(x = "Test AUPRC", y = NULL,
       title = "Test AUPRC distribution by class weight strategy",
       caption = "720 experiments per weight. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

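All three class weight strategies produce overlapping AUROC distributions, concentrated in the 0.55-0.62 range. Class weight has no material effect on the threshold-independent metrics: median test AUROC differs by at most 0.002 across strategies and median test AUPRC by at most 0.006. The spread does differ, with unweighted runs (AUROC SD 0.039) more variable than inverse_freq runs (SD 0.010).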

Show code
cw_auroc <- auroc_data %>%
  group_by(class_weight) %>%
  summarise(
    N = n(),
    Median = sprintf("%.3f", median(test_auroc)),
    SD = sprintf("%.3f", sd(test_auroc)),
    `IQR` = sprintf("%.3f - %.3f", quantile(test_auroc, 0.25), quantile(test_auroc, 0.75)),
    Min = sprintf("%.3f", min(test_auroc)),
    Max = sprintf("%.3f", max(test_auroc)),
    .groups = "drop"
  ) %>%
  rename(`Class Weight` = class_weight)

kable(cw_auroc, booktabs = TRUE,
      caption = "Test AUROC by class weight (720 experiments per strategy).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Test AUROC by class weight (720 experiments per strategy).
Class Weight N Median SD IQR Min Max
inverse_freq 720 0.580 0.010 0.574 - 0.589 0.554 0.612
none 720 0.581 0.039 0.565 - 0.592 0.432 0.621
weight_1_2 720 0.582 0.019 0.574 - 0.592 0.438 0.621
Show code
cw_auprc <- auroc_data %>%
  group_by(class_weight) %>%
  summarise(
    N = n(),
    Median = sprintf("%.3f", median(test_auprc)),
    SD = sprintf("%.3f", sd(test_auprc)),
    `IQR` = sprintf("%.3f - %.3f", quantile(test_auprc, 0.25), quantile(test_auprc, 0.75)),
    Min = sprintf("%.3f", min(test_auprc)),
    Max = sprintf("%.3f", max(test_auprc)),
    .groups = "drop"
  ) %>%
  rename(`Class Weight` = class_weight)

kable(cw_auprc, booktabs = TRUE,
      caption = "Test AUPRC by class weight (720 experiments per strategy). Prevalence baseline = 0.232.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Test AUPRC by class weight (720 experiments per strategy). Prevalence baseline = 0.232.
Class Weight N Median SD IQR Min Max
inverse_freq 720 0.285 0.011 0.277 - 0.293 0.254 0.322
none 720 0.279 0.023 0.270 - 0.290 0.199 0.319
weight_1_2 720 0.283 0.014 0.274 - 0.290 0.203 0.325

3.3 Kernel Size (2,160 experiments)

Each kernel has 360 experiments (3 class weights x 6 LRs x 2 dropouts x 2 splits x 5 seeds).

Show code
auroc_data <- auroc_data %>%
  mutate(kernel_label = sprintf("%s (%dms)", kernel_id, rf_ms) %>%
           fct_rev())

ggplot(auroc_data, aes(x = test_auroc, y = kernel_label, fill = kernel_label)) +
  geom_density_ridges(alpha = 0.7, scale = 1.2, rel_min_height = 0.01) +
  geom_vline(xintercept = 0.50, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_brewer(palette = "Greens", direction = 1) +
  labs(x = "Test AUROC", y = NULL,
       title = "Test AUROC distribution by kernel (all 2,160 experiments)",
       caption = "360 experiments per kernel. Dashed line = random chance (0.50).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

K3 has the rightmost peak. K0-K2 are slightly left-shifted. K4-K5 overlap with K3.

Show code
ggplot(auroc_data, aes(x = test_auprc, y = kernel_label, fill = kernel_label)) +
  geom_density_ridges(alpha = 0.7, scale = 1.2, rel_min_height = 0.01) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  annotate("text", x = 0.235, y = Inf, label = "Prevalence baseline (0.232)",
           hjust = 0, vjust = 2, color = "#E74C3C", size = 3.5) +
  scale_fill_brewer(palette = "Greens", direction = 1) +
  labs(x = "Test AUPRC", y = NULL,
       title = "Test AUPRC distribution by kernel (all 2,160 experiments)",
       caption = "360 experiments per kernel. Dashed line = prevalence baseline.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Same pattern on AUPRC. K3 peak is rightmost. The bulk of every kernel's distribution lies above the prevalence baseline.

Show code
ggplot(auroc_data, aes(y = kernel_label, x = test_auroc, fill = kernel_id)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.50, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_brewer(palette = "Greens") +
  labs(x = "Test AUROC", y = NULL,  # AUROC is on the x-axis; kernels label the y-axis
       title = "Test AUROC by kernel (all 2,160 experiments)",
       caption = "360 experiments per kernel. Dashed line = random chance (0.50).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Show code
ggplot(auroc_data, aes(y = kernel_label, x = test_auprc, fill = kernel_id)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_brewer(palette = "Greens") +
  labs(x = "Test AUPRC", y = NULL,
       title = "Test AUPRC by kernel (all 2,160 experiments)",
       caption = "360 experiments per kernel. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

K3 has the highest median. K0-K2 are slightly lower. K4-K5 are comparable to K3.

Show code
kernel_test <- auroc_data %>%
  group_by(kernel_id, rf_ms) %>%
  summarise(
    med_auroc = median(test_auroc), sd_auroc = sd(test_auroc),
    med_auprc = median(test_auprc), sd_auprc = sd(test_auprc),
    .groups = "drop"
  ) %>%
  arrange(kernel_id) %>%
  transmute(
    Kernel = kernel_id, `RF (ms)` = rf_ms,
    `AUROC` = sprintf("%.3f (%.3f)", med_auroc, sd_auroc),
    `AUPRC` = sprintf("%.3f (%.3f)", med_auprc, sd_auprc)
  )

kable(kernel_test, booktabs = TRUE,
      caption = "Kernel comparison, test set. Median (SD) across 360 experiments per kernel.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  row_spec(4, bold = TRUE)
Kernel comparison, test set. Median (SD) across 360 experiments per kernel.
Kernel RF (ms) AUROC AUPRC
K0 82 0.577 (0.023) 0.280 (0.015)
K1 84 0.573 (0.036) 0.278 (0.021)
K2 160 0.575 (0.034) 0.271 (0.019)
K3 1122 0.592 (0.010) 0.289 (0.011)
K4 1490 0.589 (0.011) 0.287 (0.012)
K5 2598 0.586 (0.012) 0.289 (0.012)
Show code
kernel_val <- auroc_data %>%
  group_by(kernel_id, rf_ms) %>%
  summarise(
    med_auroc = median(val_auroc), sd_auroc = sd(val_auroc),
    med_auprc = median(val_auprc), sd_auprc = sd(val_auprc),
    .groups = "drop"
  ) %>%
  arrange(kernel_id) %>%
  transmute(
    Kernel = kernel_id, `RF (ms)` = rf_ms,
    `AUROC` = sprintf("%.3f (%.3f)", med_auroc, sd_auroc),
    `AUPRC` = sprintf("%.3f (%.3f)", med_auprc, sd_auprc)
  )

kable(kernel_val, booktabs = TRUE,
      caption = "Kernel comparison, validation set. Median (SD) across 360 experiments per kernel.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Kernel comparison, validation set. Median (SD) across 360 experiments per kernel.
Kernel RF (ms) AUROC AUPRC
K0 82 0.603 (0.034) 0.363 (0.043)
K1 84 0.598 (0.046) 0.331 (0.042)
K2 160 0.616 (0.040) 0.378 (0.048)
K3 1122 0.590 (0.015) 0.307 (0.023)
K4 1490 0.579 (0.016) 0.293 (0.025)
K5 2598 0.565 (0.023) 0.296 (0.026)

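On the test set, K3 has the highest median AUROC (0.592) and shares the highest median AUPRC with K5 (0.289). On the validation set the ordering differs: K2 has the highest median AUROC (0.616) and AUPRC (0.378), while K3 (0.590 and 0.307) has the lowest variability across experiments (SD 0.015 and 0.023).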

3.3.1 Statistical Test

Kruskal-Wallis rank-sum test on test AUROC across the 6 kernel groups, with pairwise Wilcoxon rank-sum tests (BH-adjusted) for post-hoc comparisons. All 2,160 experiments included.
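For readers unfamiliar with the Benjamini-Hochberg (BH) adjustment applied to the pairwise tests, the sketch below spells out the step-up procedure and checks it against base R's `p.adjust`. `bh_adjust` is a hypothetical helper and the raw p-values are invented; with 6 kernels there are 15 pairwise comparisons, but 5 values are enough to show the mechanics.

```r
# Benjamini-Hochberg adjustment: scale each p-value by m / rank,
# then enforce monotonicity from the largest p-value downward.
bh_adjust <- function(p) {
  m  <- length(p)
  o  <- order(p, decreasing = TRUE)   # largest p first
  ro <- order(o)                      # permutation to restore original order
  pmin(1, cummin(m / (m:1) * p[o]))[ro]
}

p_raw    <- c(0.001, 0.008, 0.039, 0.041, 0.20)  # illustrative raw p-values
p_manual <- bh_adjust(p_raw)
p_base   <- p.adjust(p_raw, method = "BH")
print(rbind(raw = p_raw, manual = p_manual, base = p_base))
```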

Show code
cat(sprintf("N = %d experiments (%d per kernel)\n",
            nrow(auroc_data),
            nrow(auroc_data) / n_distinct(auroc_data$kernel_id)))
N = 2160 experiments (360 per kernel)
Show code
kw <- kruskal.test(test_auroc ~ kernel_id, data = auroc_data)
cat(sprintf("Kruskal-Wallis chi-squared = %.2f, df = %d, p = %.2e\n",
            kw$statistic, kw$parameter, kw$p.value))
Kruskal-Wallis chi-squared = 673.05, df = 5, p = 3.29e-143
Show code
pw <- pairwise.wilcox.test(auroc_data$test_auroc, auroc_data$kernel_id,
                           p.adjust.method = "BH")

kable(pw$p.value, booktabs = TRUE, digits = 4,
      caption = "Pairwise Wilcoxon p-values (BH-adjusted).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Pairwise Wilcoxon p-values (BH-adjusted).
K0 K1 K2 K3 K4
K1 1e-04 NA NA NA NA
K2 5e-04 0.3048 NA NA NA
K3 0e+00 0.0000 0 NA NA
K4 0e+00 0.0000 0 9e-04 NA
K5 0e+00 0.0000 0 0e+00 0.0688
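Entries shown as 0 or 0e+00 are rounding artifacts of `digits = 4`, not exact zeros. One possible way to display such values is base R's `format.pval`, which prints anything below a chosen floor as an inequality. The vector below is illustrative: two entries copy values from the table, and the third is a made-up tiny p-value standing in for the entries that round to zero.

```r
# Illustrative adjusted p-values; `tiny` is a stand-in, not a reported result.
p_adj <- c(K3_vs_K4 = 9e-04, K4_vs_K5 = 0.0688, tiny = 1e-30)

# Values below eps are reported as "<eps" instead of being rounded to zero.
p_fmt <- format.pval(p_adj, digits = 2, eps = 1e-4)
print(p_fmt)
```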
Show code
rf_summary <- auroc_data %>%
  group_by(kernel_id, rf_ms) %>%
  summarise(
    med_auroc = median(test_auroc),
    q25 = quantile(test_auroc, 0.25),
    q75 = quantile(test_auroc, 0.75),
    .groups = "drop"
  ) %>%
  mutate(kernel_label = sprintf("%s\n(%dms)", kernel_id, rf_ms) %>% fct_inorder())

ggplot(rf_summary, aes(x = kernel_label, y = med_auroc)) +
  geom_errorbar(aes(ymin = q25, ymax = q75), width = 0.2, color = "grey50") +
  geom_point(size = 4, color = "#27AE60") +
  geom_line(aes(group = 1), color = "#27AE60", linewidth = 0.8, alpha = 0.5) +
  labs(x = "Kernel (Receptive Field)", y = "Median Test AUROC",
       title = "Median test AUROC by kernel",
       caption = "Error bars = IQR. 360 experiments per kernel.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Show code
rf_summary_auprc <- auroc_data %>%
  group_by(kernel_id, rf_ms) %>%
  summarise(
    med_auprc = median(test_auprc),
    q25 = quantile(test_auprc, 0.25),
    q75 = quantile(test_auprc, 0.75),
    .groups = "drop"
  ) %>%
  mutate(kernel_label = sprintf("%s\n(%dms)", kernel_id, rf_ms) %>% fct_inorder())

ggplot(rf_summary_auprc, aes(x = kernel_label, y = med_auprc)) +
  geom_errorbar(aes(ymin = q25, ymax = q75), width = 0.2, color = "grey50") +
  geom_hline(yintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  geom_point(size = 4, color = "#27AE60") +
  geom_line(aes(group = 1), color = "#27AE60", linewidth = 0.8, alpha = 0.5) +
  labs(x = "Kernel (Receptive Field)", y = "Median Test AUPRC",
       title = "Median test AUPRC by kernel",
       caption = "Error bars = IQR. 360 experiments per kernel. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Median test AUROC increases from K0-K2 (~80-160ms) to K3 (~1,100ms), then levels off at K4/K5 (0.589 and 0.586, versus 0.592 for K3). AUPRC follows the same pattern.

3.4 Learning Rate

All 2,160 experiments by learning rate (360 per LR):

Show code
auroc_data <- auroc_data %>%
  mutate(lr_label_all = sprintf("%.0e", lr) %>%
           fct_reorder(test_auroc, .fun = median))

ggplot(auroc_data, aes(y = lr_label_all, x = test_auroc, fill = lr_label_all)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  scale_fill_brewer(palette = "Blues") +
  labs(y = "Learning Rate", x = "Test AUROC",
       title = "Test AUROC by learning rate (all 2,160 experiments)",
       caption = "360 experiments per LR.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Show code
ggplot(auroc_data, aes(y = lr_label_all, x = test_auprc, fill = lr_label_all)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_brewer(palette = "Blues") +
  labs(y = "Learning Rate", x = "Test AUPRC",
       title = "Test AUPRC by learning rate (all 2,160 experiments)",
       caption = "360 experiments per LR. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

No clear separation across learning rates on either metric.

Narrowed to K3, pooling all class weights, dropouts, splits, and seeds (60 experiments per LR):

Show code
k3_data <- auroc_data %>%
  filter(kernel_id == "K3") %>%
  mutate(lr_label = sprintf("%.0e", lr) %>%
           fct_reorder(test_auroc, .fun = median))

ggplot(k3_data, aes(y = lr_label, x = test_auroc, fill = lr_label)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  scale_fill_brewer(palette = "Blues") +
  labs(y = "Learning Rate", x = "Test AUROC",
       title = "Test AUROC by learning rate (K3, all class weights)",
       caption = "60 experiments per LR.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Show code
ggplot(k3_data, aes(y = lr_label, x = test_auprc, fill = lr_label)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_brewer(palette = "Blues") +
  labs(y = "Learning Rate", x = "Test AUPRC",
       title = "Test AUPRC by learning rate (K3, all class weights)",
       caption = "60 experiments per LR. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

The same pattern holds after narrowing to K3: the distributions largely overlap across learning rates.

Show code
lr_summary <- k3_data %>%
  group_by(lr) %>%
  summarise(
    N = n(),
    `Median AUROC` = sprintf("%.3f", median(test_auroc)),
    `SD AUROC` = sprintf("%.3f", sd(test_auroc)),
    `Median AUPRC` = sprintf("%.3f", median(test_auprc)),
    `SD AUPRC` = sprintf("%.3f", sd(test_auprc)),
    .groups = "drop"
  ) %>%
  rename(`Learning Rate` = lr)

kable(lr_summary, booktabs = TRUE,
      caption = "Learning rate effect for K3. Median (SD) across all class weights, dropouts, splits, and seeds.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Learning rate effect for K3. Median (SD) across all class weights, dropouts, splits, and seeds.
Learning Rate N Median AUROC SD AUROC Median AUPRC SD AUPRC
1e-05 60 0.585 0.010 0.290 0.013
5e-05 60 0.589 0.008 0.289 0.010
1e-04 60 0.591 0.009 0.286 0.008
5e-04 60 0.593 0.011 0.290 0.011
1e-03 60 0.596 0.011 0.290 0.010
5e-03 60 0.597 0.010 0.290 0.010

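Median AUROC rises with LR, from 0.585 at 1e-5 to 0.597 at 5e-3, but the total spread (0.012) is about the size of the within-LR SD (0.008-0.011), so AUROC is effectively stable across the range. LR = 5e-3 is carried forward.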
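
As a rough check on the size of the learning-rate effect, the spread of the K3 medians can be compared with the within-LR SD, using only the rounded values printed in the table above. This is a minimal sketch; the variable names are illustrative and are not part of the analysis code.

```r
# Values copied from the K3 learning-rate table above (rounded to 3 decimals).
lr        <- c(1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3)
med_auroc <- c(0.585, 0.589, 0.591, 0.593, 0.596, 0.597)
sd_auroc  <- c(0.010, 0.008, 0.009, 0.011, 0.011, 0.010)

# Total spread of the medians across the tested LR range.
median_spread <- diff(range(med_auroc))

# Typical run-to-run variation within a single LR.
typical_sd <- mean(sd_auroc)

# A ratio near 1 means the LR effect is about the size of the seed/split noise.
effect_ratio <- median_spread / typical_sd

cat(sprintf("Spread of medians: %.3f\n", median_spread))
cat(sprintf("Mean within-LR SD: %.3f\n", typical_sd))
cat(sprintf("Ratio: %.2f\n", effect_ratio))
```

A ratio close to 1 is consistent with treating the learning rate as a minor factor in this range.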

3.5 Dropout

All 2,160 experiments by dropout (1,080 per level):

Show code
ggplot(auroc_data, aes(y = factor(dropout_rate), x = test_auroc, fill = factor(dropout_rate))) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  scale_fill_manual(values = c("#8E44AD", "#F39C12")) +
  labs(y = "Dropout Rate", x = "Test AUROC",
       title = "Test AUROC by dropout rate (all 2,160 experiments)",
       caption = "1,080 experiments per dropout level.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Show code
ggplot(auroc_data, aes(y = factor(dropout_rate), x = test_auprc, fill = factor(dropout_rate))) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_manual(values = c("#8E44AD", "#F39C12")) +
  labs(y = "Dropout Rate", x = "Test AUPRC",
       title = "Test AUPRC by dropout rate (all 2,160 experiments)",
       caption = "1,080 experiments per dropout level. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

No clear separation across dropout rates on either metric.

Narrowed to K3 and LR=5e-3, pooling all class weights, splits, and seeds (30 experiments per dropout):

Show code
k3_lr <- auroc_data %>%
  filter(kernel_id == "K3", lr == 0.005)

dropout_wide <- k3_lr %>%
  select(data_split_seed, seed, class_weight, dropout_rate, test_auroc) %>%
  pivot_wider(names_from = dropout_rate, values_from = test_auroc, names_prefix = "drop_")

ggplot(dropout_wide, aes(x = drop_0.1, y = drop_0.2)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(color = class_weight), size = 3, alpha = 0.8) +
  labs(x = "Test AUROC (dropout = 0.10)", y = "Test AUROC (dropout = 0.20)",
       title = "Dropout 0.10 vs 0.20 (K3, LR=5e-3, all class weights)",
       caption = "Each point is one seed/split/class_weight combination.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Show code
dropout_wide_auprc <- k3_lr %>%
  select(data_split_seed, seed, class_weight, dropout_rate, test_auprc) %>%
  pivot_wider(names_from = dropout_rate, values_from = test_auprc, names_prefix = "drop_")

ggplot(dropout_wide_auprc, aes(x = drop_0.1, y = drop_0.2)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(color = class_weight), size = 3, alpha = 0.8) +
  labs(x = "Test AUPRC (dropout = 0.10)", y = "Test AUPRC (dropout = 0.20)",
       title = "Dropout 0.10 vs 0.20 - AUPRC (K3, LR=5e-3, all class weights)",
       caption = "Each point is one seed/split/class_weight combination.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Points mostly cluster near the diagonal. No clear pattern detected.

Show code
dropout_summary <- k3_lr %>%
  group_by(dropout_rate) %>%
  summarise(
    N = n(),
    `Median AUROC` = sprintf("%.3f", median(test_auroc)),
    `SD AUROC` = sprintf("%.3f", sd(test_auroc)),
    `Median AUPRC` = sprintf("%.3f", median(test_auprc)),
    `SD AUPRC` = sprintf("%.3f", sd(test_auprc)),
    .groups = "drop"
  ) %>%
  rename(Dropout = dropout_rate)

kable(dropout_summary, booktabs = TRUE,
      caption = "Dropout comparison for K3, LR=5e-3, all class weights.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Dropout comparison for K3, LR=5e-3, all class weights.
Dropout N Median AUROC SD AUROC Median AUPRC SD AUPRC
0.1 30 0.596 0.012 0.292 0.010
0.2 30 0.597 0.008 0.290 0.009

Median AUROC is similar for both dropout rates (0.596 at 0.10 vs 0.597 at 0.20). Dropout = 0.10 is carried forward.
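
Because each dropout = 0.10 run has a matching dropout = 0.20 run with the same seed, split, and class weight, the comparison can also be summarised as a paired difference. The sketch below is one possible way to do that; `paired_summary` is a hypothetical helper, the Wilcoxon signed-rank test is a choice made here and not taken from the report, and the input vectors are synthetic placeholders.

```r
# Paired comparison of two settings measured on the same runs.
# `a` and `b` must be aligned: element i of each comes from the same
# seed/split/class_weight combination.
paired_summary <- function(a, b) {
  stopifnot(length(a) == length(b))
  d <- b - a
  test <- wilcox.test(b, a, paired = TRUE, exact = FALSE)
  list(
    n             = length(d),
    median_diff   = median(d),
    frac_b_higher = mean(d > 0),
    p_value       = test$p.value
  )
}

# Synthetic placeholder data (NOT the study's results), only to show the call.
set.seed(1)
auroc_d010 <- rnorm(30, mean = 0.60, sd = 0.010)
auroc_d020 <- auroc_d010 + rnorm(30, mean = 0, sd = 0.008)

res <- paired_summary(auroc_d010, auroc_d020)
str(res)
```

With the real data, `a` and `b` would be the `drop_0.1` and `drop_0.2` columns of `dropout_wide`.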

3.6 Data Split Robustness

All 2,160 experiments by data split (1,080 per split):

Show code
ggplot(auroc_data, aes(y = factor(data_split_seed), x = test_auroc, fill = factor(data_split_seed))) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  scale_fill_manual(values = c("#2E86C1", "#E67E22")) +
  labs(y = "Data Split", x = "Test AUROC",
       title = "Test AUROC by data split (all 2,160 experiments)",
       caption = "1,080 experiments per split.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Show code
ggplot(auroc_data, aes(y = factor(data_split_seed), x = test_auprc, fill = factor(data_split_seed))) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_manual(values = c("#2E86C1", "#E67E22")) +
  labs(y = "Data Split", x = "Test AUPRC",
       title = "Test AUPRC by data split (all 2,160 experiments)",
       caption = "1,080 experiments per split. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

No clear separation across splits on either metric.

Narrowed to K3, LR=5e-3, dropout=0.10 (15 experiments per split):

Show code
k3_final <- auroc_data %>%
  filter(kernel_id == "K3", lr == 0.005, dropout_rate == 0.1)

split_wide <- k3_final %>%
  select(seed, class_weight, data_split_seed, test_auroc) %>%
  pivot_wider(names_from = data_split_seed, values_from = test_auroc, names_prefix = "split_")

ggplot(split_wide, aes(x = split_612, y = split_928)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(color = class_weight), size = 3, alpha = 0.8) +
  labs(x = "Test AUROC (split 612)", y = "Test AUROC (split 928)",
       title = "Split 612 vs 928 (K3, LR=5e-3, dropout=0.10, all class weights)",
       caption = "Each point is one seed/class_weight combination.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Show code
split_wide_auprc <- k3_final %>%
  select(seed, class_weight, data_split_seed, test_auprc) %>%
  pivot_wider(names_from = data_split_seed, values_from = test_auprc, names_prefix = "split_")

ggplot(split_wide_auprc, aes(x = split_612, y = split_928)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(color = class_weight), size = 3, alpha = 0.8) +
  labs(x = "Test AUPRC (split 612)", y = "Test AUPRC (split 928)",
       title = "Split 612 vs 928 - AUPRC (K3, LR=5e-3, dropout=0.10, all class weights)",
       caption = "Each point is one seed/class_weight combination.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Points mostly cluster near the diagonal. No clear pattern detected.

Show code
split_summary <- k3_final %>%
  group_by(data_split_seed) %>%
  summarise(
    N = n(),
    `Median AUROC` = sprintf("%.3f", median(test_auroc)),
    `SD AUROC` = sprintf("%.3f", sd(test_auroc)),
    `Median AUPRC` = sprintf("%.3f", median(test_auprc)),
    `SD AUPRC` = sprintf("%.3f", sd(test_auprc)),
    .groups = "drop"
  ) %>%
  rename(Split = data_split_seed)

kable(split_summary, booktabs = TRUE,
      caption = "Split comparison for K3, LR=5e-3, dropout=0.10, all class weights.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Split comparison for K3, LR=5e-3, dropout=0.10, all class weights.
Split N Median AUROC SD AUROC Median AUPRC SD AUPRC
612 15 0.597 0.007 0.289 0.006
928 15 0.596 0.015 0.294 0.013

Median AUROC is similar across splits (0.597 vs 0.596), although the SD is about twice as large on split 928 (0.015 vs 0.007).
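
The split summary can be recomputed from the rounded seed-level values listed in the candidate configuration table in Section 3.7. This is a sketch for checking the arithmetic; because the inputs are rounded to 3 decimals, the SD may differ slightly from one computed on unrounded data.

```r
# Test AUROC for K3, LR=5e-3, dropout=0.1, copied from the seed-level table
# in Section 3.7 (15 runs per split; values are rounded to 3 decimals).
auroc_612 <- c(0.597, 0.584, 0.586, 0.595, 0.601,   # inverse_freq
               0.605, 0.605, 0.595, 0.597, 0.595,   # none
               0.597, 0.594, 0.589, 0.602, 0.605)   # weight_1_2
auroc_928 <- c(0.582, 0.605, 0.564, 0.587, 0.600,   # inverse_freq
               0.597, 0.597, 0.618, 0.574, 0.599,   # none
               0.619, 0.590, 0.585, 0.581, 0.596)   # weight_1_2

split_check <- data.frame(
  split  = c(612, 928),
  n      = c(length(auroc_612), length(auroc_928)),
  median = c(median(auroc_612), median(auroc_928)),
  sd     = c(sd(auroc_612), sd(auroc_928))
)
print(split_check, digits = 3)
```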

3.7 Candidate Configuration

Configuration selected from preceding sections:

  • Kernel: K3 (1,122ms RF), highest median AUROC (Section 3.3)
  • Learning rate: 5e-3 (Section 3.4)
  • Dropout: 0.10 (Section 3.5)

Class weight is not selected here because it does not affect AUROC/AUPRC. It becomes relevant in Section 3.8 when binary predictions are needed.
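
The distinction between ranking metrics and binary predictions can be illustrated with a small sketch. AUROC depends only on the ordering of the scores, so any strictly increasing transformation of the predicted probabilities leaves it unchanged while moving predictions across a fixed cutoff. The example below uses synthetic scores and models the effect of class weighting as a constant shift on the logit scale, which is an idealisation assumed here and not a description of the trained models.

```r
# AUROC via the rank (Mann-Whitney) formulation; ties get average ranks.
auroc <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Synthetic scores (NOT the study's data): positives score slightly higher.
set.seed(1)
label <- rbinom(500, size = 1, prob = 0.23)
logit <- rnorm(500, mean = -1.5 + 0.4 * label, sd = 1)

p_base    <- plogis(logit)        # stand-in for an unweighted model
p_shifted <- plogis(logit + 1.2)  # same ranking, probabilities pushed upward

# Same ordering of patients, hence the same AUROC.
auroc_pair <- c(base = auroc(p_base, label), shifted = auroc(p_shifted, label))
print(auroc_pair)

# Fraction of patients predicted positive at cutoff 0.50 differs.
pos_rate <- c(base = mean(p_base >= 0.5), shifted = mean(p_shifted >= 0.5))
print(pos_rate)
```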

Show code
rec_seeds <- master %>%
  filter(cutoff == 0.50, kernel_id == "K3", lr == 0.005, dropout_rate == 0.1)

rec_seed_table <- rec_seeds %>%
  transmute(
    Split = data_split_seed, Seed = seed, `Class Weight` = class_weight,
    Epoch = best_epoch,
    AUROC = sprintf("%.3f", test_auroc),
    AUPRC = sprintf("%.3f", test_auprc)
  ) %>%
  arrange(Split, `Class Weight`, Seed)

kable(rec_seed_table, booktabs = TRUE,
      caption = "Candidate config (K3, LR=5e-3, dropout=0.1) across all seed/split/class_weight combinations. Test set.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Candidate config (K3, LR=5e-3, dropout=0.1) across all seed/split/class_weight combinations. Test set.
Split Seed Class Weight Epoch AUROC AUPRC
612 42 inverse_freq 4 0.597 0.292
612 123 inverse_freq 6 0.584 0.287
612 456 inverse_freq 3 0.586 0.287
612 789 inverse_freq 2 0.595 0.293
612 2024 inverse_freq 8 0.601 0.297
612 42 none 2 0.605 0.294
612 123 none 5 0.605 0.299
612 456 none 2 0.595 0.288
612 789 none 2 0.597 0.285
612 2024 none 1 0.595 0.283
612 42 weight_1_2 2 0.597 0.291
612 123 weight_1_2 5 0.594 0.281
612 456 weight_1_2 6 0.589 0.279
612 789 weight_1_2 2 0.602 0.296
612 2024 weight_1_2 2 0.605 0.289
928 42 inverse_freq 4 0.582 0.271
928 123 inverse_freq 1 0.605 0.299
928 456 inverse_freq 10 0.564 0.302
928 789 inverse_freq 5 0.587 0.286
928 2024 inverse_freq 2 0.600 0.296
928 42 none 7 0.597 0.308
928 123 none 1 0.597 0.290
928 456 none 1 0.618 0.305
928 789 none 5 0.574 0.279
928 2024 none 3 0.599 0.306
928 42 weight_1_2 1 0.619 0.323
928 123 weight_1_2 2 0.590 0.294
928 456 weight_1_2 1 0.585 0.283
928 789 weight_1_2 3 0.581 0.292
928 2024 weight_1_2 2 0.596 0.281
Show code
rec_summary <- rec_seeds %>%
  group_by(`Class Weight` = class_weight) %>%
  summarise(
    N = n(),
    `Median AUROC` = sprintf("%.3f", median(test_auroc)),
    `SD` = sprintf("%.3f", sd(test_auroc)),
    `IQR` = sprintf("%.3f - %.3f", quantile(test_auroc, 0.25), quantile(test_auroc, 0.75)),
    `Min` = sprintf("%.3f", min(test_auroc)),
    `Max` = sprintf("%.3f", max(test_auroc)),
    .groups = "drop"
  )

kable(rec_summary, booktabs = TRUE,
      caption = "Candidate config test AUROC summary (K3, LR=5e-3, dropout=0.1).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Candidate config test AUROC summary (K3, LR=5e-3, dropout=0.1).
Class Weight N Median AUROC SD IQR Min Max
inverse_freq 10 0.591 0.012 0.584 - 0.599 0.564 0.605
none 10 0.597 0.011 0.596 - 0.603 0.574 0.618
weight_1_2 10 0.595 0.011 0.589 - 0.601 0.581 0.619
Show code
rec_summary_auprc <- rec_seeds %>%
  group_by(`Class Weight` = class_weight) %>%
  summarise(
    N = n(),
    `Median AUPRC` = sprintf("%.3f", median(test_auprc)),
    `SD` = sprintf("%.3f", sd(test_auprc)),
    `IQR` = sprintf("%.3f - %.3f", quantile(test_auprc, 0.25), quantile(test_auprc, 0.75)),
    `Min` = sprintf("%.3f", min(test_auprc)),
    `Max` = sprintf("%.3f", max(test_auprc)),
    .groups = "drop"
  )

kable(rec_summary_auprc, booktabs = TRUE,
      caption = "Candidate config test AUPRC summary (K3, LR=5e-3, dropout=0.1). Prevalence baseline = 0.232.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Candidate config test AUPRC summary (K3, LR=5e-3, dropout=0.1). Prevalence baseline = 0.232.
Class Weight N Median AUPRC SD IQR Min Max
inverse_freq 10 0.292 0.009 0.287 - 0.297 0.271 0.302
none 10 0.292 0.010 0.286 - 0.303 0.279 0.308
weight_1_2 10 0.290 0.013 0.281 - 0.294 0.279 0.323
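
To put the AUPRC values in context, they can be expressed as a multiple of the prevalence baseline (0.232), which is the AUPRC expected from a classifier that ranks patients at random. A minimal sketch using the medians from the table above:

```r
# Median test AUPRC per class-weight strategy, copied from the table above.
median_auprc <- c(inverse_freq = 0.292, none = 0.292, weight_1_2 = 0.290)

# Prevalence baseline stated in the table caption.
prevalence <- 0.232

# Lift = how many times better than the random-ranking baseline.
lift <- median_auprc / prevalence
print(round(lift, 2))
```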

Best epoch (early stopping) for each individual run:

Show code
epoch_data <- rec_seeds %>%
  mutate(split_label = factor(data_split_seed))

ggplot(epoch_data, aes(x = best_epoch, y = split_label, color = class_weight)) +
  geom_jitter(size = 3, height = 0.15, alpha = 0.8) +
  labs(x = "Best Epoch (early stopping)", y = "Data Split", color = "Class Weight",
       title = "Convergence epoch for candidate config across seeds",
       caption = "K3, LR=5e-3, dropout=0.1.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

3.7.1 Validation Set Performance

The validation set (132 paired patients with both CXR and EKG) is held out from specialist training. These are the patients that will be used in Phase 3 warm fusion.

Show code
rec_val_table <- rec_seeds %>%
  transmute(
    Split = data_split_seed, Seed = seed, `Class Weight` = class_weight,
    `Val AUROC` = sprintf("%.3f", val_auroc),
    `Val AUPRC` = sprintf("%.3f", val_auprc)
  ) %>%
  arrange(Split, `Class Weight`, Seed)

kable(rec_val_table, booktabs = TRUE,
      caption = "Candidate config (K3, LR=5e-3, dropout=0.1) validation set performance across all seed/split/class_weight combinations.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Candidate config (K3, LR=5e-3, dropout=0.1) validation set performance across all seed/split/class_weight combinations.
Split Seed Class Weight Val AUROC Val AUPRC
612 42 inverse_freq 0.617 0.347
612 123 inverse_freq 0.605 0.342
612 456 inverse_freq 0.614 0.329
612 789 inverse_freq 0.602 0.336
612 2024 inverse_freq 0.592 0.317
612 42 none 0.601 0.320
612 123 none 0.579 0.317
612 456 none 0.583 0.319
612 789 none 0.590 0.304
612 2024 none 0.599 0.339
612 42 weight_1_2 0.608 0.350
612 123 weight_1_2 0.611 0.342
612 456 weight_1_2 0.603 0.318
612 789 weight_1_2 0.585 0.302
612 2024 weight_1_2 0.599 0.319
928 42 inverse_freq 0.575 0.319
928 123 inverse_freq 0.589 0.327
928 456 inverse_freq 0.566 0.290
928 789 inverse_freq 0.579 0.278
928 2024 inverse_freq 0.581 0.304
928 42 none 0.602 0.330
928 123 none 0.587 0.302
928 456 none 0.592 0.292
928 789 none 0.608 0.291
928 2024 none 0.578 0.317
928 42 weight_1_2 0.579 0.307
928 123 weight_1_2 0.588 0.309
928 456 weight_1_2 0.561 0.290
928 789 weight_1_2 0.569 0.310
928 2024 weight_1_2 0.581 0.370
Show code
rec_compare <- rec_seeds %>%
  group_by(class_weight) %>%
  summarise(
    N = n(),
    `Test AUROC` = sprintf("%.3f (%.3f)", median(test_auroc), sd(test_auroc)),
    `Val AUROC` = sprintf("%.3f (%.3f)", median(val_auroc), sd(val_auroc)),
    `Test AUPRC` = sprintf("%.3f (%.3f)", median(test_auprc), sd(test_auprc)),
    `Val AUPRC` = sprintf("%.3f (%.3f)", median(val_auprc), sd(val_auprc)),
    .groups = "drop"
  ) %>%
  rename(`Class Weight` = class_weight)

kable(rec_compare, booktabs = TRUE,
      caption = "Candidate config: test vs validation comparison. Median (SD) across seeds and splits.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Candidate config: test vs validation comparison. Median (SD) across seeds and splits.
Class Weight N Test AUROC Val AUROC Test AUPRC Val AUPRC
inverse_freq 10 0.591 (0.012) 0.590 (0.017) 0.292 (0.009) 0.323 (0.022)
none 10 0.597 (0.011) 0.591 (0.010) 0.292 (0.010) 0.317 (0.016)
weight_1_2 10 0.595 (0.011) 0.586 (0.017) 0.290 (0.013) 0.314 (0.025)
Show code
ggplot(rec_seeds, aes(x = test_auroc, y = val_auroc, color = class_weight)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(shape = factor(data_split_seed)), size = 3, alpha = 0.8) +
  labs(x = "Test AUROC", y = "Validation AUROC", color = "Class Weight", shape = "Split",
       title = "Test vs validation AUROC for candidate config",
       caption = "K3, LR=5e-3, dropout=0.1. Each point is one seed.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Show code
ggplot(rec_seeds, aes(x = test_auprc, y = val_auprc, color = class_weight)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(shape = factor(data_split_seed)), size = 3, alpha = 0.8) +
  labs(x = "Test AUPRC", y = "Validation AUPRC", color = "Class Weight", shape = "Split",
       title = "Test vs validation AUPRC for candidate config",
       caption = "K3, LR=5e-3, dropout=0.1. Each point is one seed.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

3.8 Clinical Utility: Binary Predictions and Operating Points

Sections 3.1-3.7 used threshold-independent metrics (AUROC, AUPRC). This section examines binary prediction behavior, which depends on both the decision cutoff and the class weight strategy.

3.8.1 Prediction Collapse at Cutoff 0.50

Show code
collapse_all <- master %>%
  group_by(cutoff, class_weight) %>%
  summarise(
    N = n(),
    all_neg_pct = 100 * mean(test_sensitivity == 0),
    all_pos_pct = 100 * mean(test_specificity == 0),
    balanced_pct = 100 * mean(test_sensitivity > 0 & test_specificity > 0),
    .groups = "drop"
  )

collapse_wide <- collapse_all %>%
  transmute(
    Cutoff = sprintf("%.2f", cutoff),
    `Class Weight` = class_weight,
    `All-Negative (Sens=0)` = sprintf("%.1f%%", all_neg_pct),
    `All-Positive (Spec=0)` = sprintf("%.1f%%", all_pos_pct),
    Balanced = sprintf("%.1f%%", balanced_pct)
  )

kable(collapse_wide, booktabs = TRUE,
      caption = "Prediction collapse rates by cutoff and class weight (2,160 experiments per cutoff).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  collapse_rows(columns = 1, valign = "top")
Prediction collapse rates by cutoff and class weight (2,160 experiments per cutoff).
Cutoff Class Weight All-Negative (Sens=0) All-Positive (Spec=0) Balanced
0.15 inverse_freq 0.0% 99.6% 0.4%
0.15 none 0.0% 47.4% 52.6%
0.15 weight_1_2 0.0% 91.7% 8.3%
0.25 inverse_freq 0.0% 92.5% 7.5%
0.25 none 0.0% 10.0% 90.0%
0.25 weight_1_2 0.0% 59.2% 40.8%
0.35 inverse_freq 0.0% 65.0% 35.0%
0.35 none 24.3% 3.6% 72.1%
0.35 weight_1_2 0.0% 11.5% 88.5%
0.50 inverse_freq 0.0% 1.1% 98.9%
0.50 none 98.3% 0.0% 1.7%
0.50 weight_1_2 39.4% 0.1% 60.4%

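The table shows how collapse rates change across cutoffs. At cutoff 0.50, none and weight_1_2 show high all-negative rates (98.3% and 39.4%), while inverse_freq is balanced in 98.9% of experiments. At lower cutoffs, the all-negative rates fall as more models’ predicted probabilities cross the threshold, and all-positive rates rise instead (99.6% for inverse_freq at cutoff 0.15).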

For the cutoff analysis below (Section 3.8.3), inverse_freq with cutoff 0.50 is used as one operating point. The table above shows the full picture across strategies and cutoffs.
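
The three collapse categories follow directly from the confusion counts. The sketch below applies the same definitions as the collapse table (sensitivity of 0 is all-negative, specificity of 0 is all-positive, otherwise balanced) to the counts reported for the candidate K3 model in the Section 3.8.3 test-set table. `collapse_status` is a hypothetical helper introduced for illustration.

```r
# Classify a model's binary predictions from its confusion counts.
collapse_status <- function(tp, fp, fn, tn) {
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  ifelse(sensitivity == 0, "all-negative",
         ifelse(specificity == 0, "all-positive", "balanced"))
}

# Counts for the candidate K3 model on the test set (Section 3.8.3 table).
cutoffs <- c(0.15, 0.25, 0.35, 0.50)
status <- collapse_status(
  tp = c(143, 143, 143, 70),
  fp = c(470, 470, 469, 163),
  fn = c(0, 0, 0, 73),
  tn = c(0, 0, 1, 307)
)
print(setNames(status, cutoffs))
```

Under this definition a run with a single true negative (cutoff 0.35 here) counts as balanced, so the Balanced column marks the absence of total collapse, not a clinically useful operating point.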

3.8.2 Representative Confusion Matrices by Kernel

For each kernel, one model was selected from a fixed combination (inverse_freq, LR=5e-3, dropout=0.1, split 612) by choosing the seed whose test AUROC is closest to the median across seeds.

Show code
rep_models <- auroc_data %>%
  filter(class_weight == "inverse_freq", lr == 0.005, dropout_rate == 0.1,
         data_split_seed == 612) %>%
  group_by(kernel_id) %>%
  mutate(dist_from_median = abs(test_auroc - median(test_auroc))) %>%
  slice_min(dist_from_median, n = 1, with_ties = FALSE) %>%
  ungroup()
Show code
combo <- data.frame(` ` = c("Actual PE+", "Actual PE-", "Total"), check.names = FALSE)
header_spec <- c(" " = 1)

for (i in seq_len(nrow(rep_models))) {
  r <- rep_models[i, ]
  combo[[paste0("Pred+_", i)]] <- c(r$test_tp, r$test_fp, r$test_tp + r$test_fp)
  combo[[paste0("Pred-_", i)]] <- c(r$test_fn, r$test_tn, r$test_fn + r$test_tn)
  label <- sprintf("K%d (%dms)", r$kernel_idx - 1, r$rf_ms)
  header_spec[label] <- 2
}

names(combo) <- c(" ", rep(c("Pred+", "Pred-"), nrow(rep_models)))

kable(combo, booktabs = TRUE, align = c("l", rep("r", ncol(combo) - 1)),
      caption = "Representative 2x2 confusion matrices by kernel at cutoff 0.50 (inverse_freq, LR=5e-3, dropout=0.1, split 612, median seed). Test set (n=613).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  add_header_above(header_spec) %>%
  column_spec(c(3, 5, 7, 9, 11, 13), border_right = TRUE)
Representative 2x2 confusion matrices by kernel at cutoff 0.50 (inverse_freq, LR=5e-3, dropout=0.1, split 612, median seed). Test set (n=613).
Kernel | K0 (82ms) | K1 (84ms) | K2 (160ms) | K3 (1122ms) | K4 (1490ms) | K5 (2598ms)
 | Pred+ Pred- | Pred+ Pred- | Pred+ Pred- | Pred+ Pred- | Pred+ Pred- | Pred+ Pred-
Actual PE+ | 107 36 | 70 73 | 89 54 | 70 73 | 76 67 | 106 37
Actual PE- | 288 182 | 184 286 | 226 244 | 163 307 | 192 278 | 306 164
Total | 395 218 | 254 359 | 315 298 | 233 380 | 268 345 | 412 201

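At cutoff 0.50, sensitivity does not vary monotonically with kernel size. K0 and K5 label most patients positive (sensitivity 74.8% and 74.1%, specificity 38.7% and 34.9%), K1 and K3 both reach 49.0% sensitivity (specificity 60.9% and 65.3%), and K2 and K4 fall in between (62.2%/51.9% and 53.1%/59.1%).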
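
The per-kernel sensitivity and specificity implied by the confusion matrices can be recomputed from the counts in the table. A minimal sketch, with the counts typed in by hand:

```r
# Counts copied from the representative confusion matrices above
# (test set, 143 PE+ and 470 PE- patients per kernel).
kernel <- c("K0", "K1", "K2", "K3", "K4", "K5")
tp <- c(107, 70, 89, 70, 76, 106)
fn <- c(36, 73, 54, 73, 67, 37)
fp <- c(288, 184, 226, 163, 192, 306)
tn <- c(182, 286, 244, 307, 278, 164)

rates <- data.frame(
  kernel      = kernel,
  sensitivity = tp / (tp + fn),   # share of PE+ patients flagged
  specificity = tn / (tn + fp)    # share of PE- patients cleared
)
print(rates, digits = 3)
```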

3.8.3 Cutoff Analysis for Candidate Model

The candidate model (K3, inverse_freq, LR=5e-3, dropout=0.1, split 612, median seed) evaluated at four decision cutoffs.
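
The "median seed" rule can be illustrated with the five split-612, inverse_freq values from the seed-level table in Section 3.7. The sketch below is a base R equivalent of the selection logic, not the report's own code; with these rounded values it picks seed 789.

```r
# Test AUROC by seed for K3, inverse_freq, LR=5e-3, dropout=0.1, split 612,
# copied from the seed-level table in Section 3.7 (rounded to 3 decimals).
auroc_by_seed <- c("42" = 0.597, "123" = 0.584, "456" = 0.586,
                   "789" = 0.595, "2024" = 0.601)

# Distance of each seed's AUROC from the group median.
dist_from_median <- abs(auroc_by_seed - median(auroc_by_seed))

# which.min() returns the first minimum, so ties are broken by order,
# similar to slice_min(..., with_ties = FALSE).
median_seed <- names(which.min(dist_from_median))
print(median_seed)
```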

Show code
rec_config <- master %>%
  filter(kernel_id == "K3", class_weight == "inverse_freq", lr == 0.005,
         dropout_rate == 0.1, data_split_seed == 612)

med_seed <- rec_config %>%
  filter(cutoff == 0.50) %>%
  mutate(dist = abs(test_auroc - median(test_auroc))) %>%
  slice_min(dist, n = 1, with_ties = FALSE) %>%
  pull(seed)

rec_all_cutoffs <- rec_config %>% filter(seed == med_seed)
Show code
cutoff_long <- rec_all_cutoffs %>%
  select(cutoff, test_sensitivity, test_specificity) %>%
  pivot_longer(cols = c(test_sensitivity, test_specificity),
               names_to = "metric", values_to = "value") %>%
  mutate(metric = ifelse(metric == "test_sensitivity", "Sensitivity", "Specificity"))

ggplot(cutoff_long, aes(x = cutoff, y = value, color = metric)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 1)) +
  scale_color_manual(values = c("Sensitivity" = "#E74C3C", "Specificity" = "#2E86C1")) +
  labs(x = "Decision Cutoff", y = NULL, color = NULL,
       title = "Sensitivity-specificity tradeoff across cutoffs (test set, n=613)",
       caption = "K3, inverse_freq, LR=5e-3, dropout=0.1, split 612, median seed.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))

Test Set (613 EKG-only patients):

Show code
cutoff_summary <- rec_all_cutoffs %>%
  transmute(
    Cutoff = sprintf("%.2f", cutoff),
    Sensitivity = sprintf("%.1f%%", 100 * test_sensitivity),
    Specificity = sprintf("%.1f%%", 100 * test_specificity),
    PPV = sprintf("%.1f%%", 100 * test_precision),
    NPV = sprintf("%.1f%%", 100 * test_npv),
    F1 = sprintf("%.3f", test_f1),
    TP = test_tp, FP = test_fp, FN = test_fn, TN = test_tn
  )

kable(cutoff_summary, booktabs = TRUE,
      caption = "Cutoff analysis for candidate K3 model (test set, 613 patients).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Cutoff analysis for candidate K3 model (test set, 613 patients).
Cutoff Sensitivity Specificity PPV NPV F1 TP FP FN TN
0.15 100.0% 0.0% 23.3% 0.0% 0.378 143 470 0 0
0.25 100.0% 0.0% 23.3% 0.0% 0.378 143 470 0 0
0.35 100.0% 0.2% 23.4% 100.0% 0.379 143 469 0 1
0.50 49.0% 65.3% 30.0% 80.8% 0.372 70 163 73 307
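
The derived columns in this table follow from the four confusion counts. The sketch below recomputes them for the cutoff 0.50 and cutoff 0.15 rows; `safe_div` and `binary_metrics` are hypothetical helpers, and returning NA for a zero denominator is a choice made here.

```r
# Binary classification metrics from confusion counts. Ratios with a zero
# denominator are returned as NA instead of 0.
safe_div <- function(num, den) ifelse(den == 0, NA_real_, num / den)

binary_metrics <- function(tp, fp, fn, tn) {
  data.frame(
    sensitivity = safe_div(tp, tp + fn),
    specificity = safe_div(tn, tn + fp),
    ppv         = safe_div(tp, tp + fp),
    npv         = safe_div(tn, tn + fn),
    f1          = safe_div(2 * tp, 2 * tp + fp + fn)
  )
}

# Row 1: cutoff 0.50. Row 2: cutoff 0.15 (test-set table above).
m <- binary_metrics(tp = c(70, 143), fp = c(163, 470),
                    fn = c(73, 0),   tn = c(307, 0))
print(round(m, 3))
```

At cutoffs 0.15 and 0.25 the model predicts no negatives (TN = 0 and FN = 0), so NPV is 0/0 and undefined; the table reports it as 0.0%.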

Validation Set (132 paired patients):

Show code
rec_config_val <- master %>%
  filter(kernel_id == "K3", class_weight == "inverse_freq", lr == 0.005,
         dropout_rate == 0.1, data_split_seed == 612, seed == med_seed)

cutoff_long_val <- rec_config_val %>%
  select(cutoff, val_sensitivity, val_specificity) %>%
  pivot_longer(cols = c(val_sensitivity, val_specificity),
               names_to = "metric", values_to = "value") %>%
  mutate(metric = ifelse(metric == "val_sensitivity", "Sensitivity", "Specificity"))

ggplot(cutoff_long_val, aes(x = cutoff, y = value, color = metric)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 1)) +
  scale_color_manual(values = c("Sensitivity" = "#E74C3C", "Specificity" = "#2E86C1")) +
  labs(x = "Decision Cutoff", y = NULL, color = NULL,
       title = "Sensitivity-specificity tradeoff across cutoffs (validation set, n=132)",
       caption = "K3, inverse_freq, LR=5e-3, dropout=0.1, split 612, median seed.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))

Show code
cutoff_summary_val <- rec_config_val %>%
  transmute(
    Cutoff = sprintf("%.2f", cutoff),
    Sensitivity = sprintf("%.1f%%", 100 * val_sensitivity),
    Specificity = sprintf("%.1f%%", 100 * val_specificity),
    PPV = sprintf("%.1f%%", 100 * val_precision),
    NPV = sprintf("%.1f%%", 100 * val_npv),
    F1 = sprintf("%.3f", val_f1),
    TP = val_tp, FP = val_fp, FN = val_fn, TN = val_tn
  )

kable(cutoff_summary_val, booktabs = TRUE,
      caption = "Cutoff analysis for candidate K3 model (validation set, 132 patients).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Cutoff analysis for candidate K3 model (validation set, 132 patients).
Cutoff Sensitivity Specificity PPV NPV F1 TP FP FN TN
0.15 100.0% 0.0% 23.5% 0.0% 0.380 31 101 0 0
0.25 100.0% 0.0% 23.5% 0.0% 0.380 31 101 0 0
0.35 100.0% 0.0% 23.5% 0.0% 0.380 31 101 0 0
0.50 61.3% 57.4% 30.6% 82.9% 0.409 19 43 12 58

3.9 Validation vs Test Agreement

Show code
ggplot(auroc_data, aes(x = test_auroc, y = val_auroc, color = kernel_id)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(alpha = 0.5, size = 1.5) +
  scale_color_brewer(palette = "Set2", name = "Kernel") +
  labs(x = "Test AUROC", y = "Validation AUROC",
       title = "Validation vs test AUROC (all 2,160 experiments)",
       caption = "Dashed line = perfect agreement.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Show code
ggplot(auroc_data, aes(x = test_auprc, y = val_auprc, color = kernel_id)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(alpha = 0.5, size = 1.5) +
  scale_color_brewer(palette = "Set2", name = "Kernel") +
  labs(x = "Test AUPRC", y = "Validation AUPRC",
       title = "Validation vs test AUPRC (all 2,160 experiments)",
       caption = "Dashed line = perfect agreement.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Show code
cor_val <- cor(auroc_data$test_auroc, auroc_data$val_auroc, method = "spearman")
cat(sprintf("Spearman correlation between test and validation AUROC: %.3f\n", cor_val))
Spearman correlation between test and validation AUROC: -0.199

The Spearman correlation is slightly negative (-0.199), but not worrisome.
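
One way to probe a pooled correlation like this is to check whether it also holds within each kernel, since pooling across kernels can mix between-group and within-group trends. The sketch below shows a possible helper for that check; `spearman_by_group` is hypothetical, and the data frame is a synthetic placeholder with the same column names as `auroc_data`.

```r
# Spearman correlation, overall and within each kernel.
# `df` is assumed to have columns test_auroc, val_auroc, and kernel_id.
spearman_by_group <- function(df) {
  overall <- cor(df$test_auroc, df$val_auroc, method = "spearman")
  by_kernel <- sapply(split(df, df$kernel_id), function(g) {
    cor(g$test_auroc, g$val_auroc, method = "spearman")
  })
  list(overall = overall, by_kernel = by_kernel)
}

# Synthetic placeholder data (NOT the study's results), only to show the call.
set.seed(7)
toy <- data.frame(
  kernel_id  = rep(c("K0", "K3"), each = 50),
  test_auroc = runif(100, 0.55, 0.62),
  val_auroc  = runif(100, 0.55, 0.62)
)

res <- spearman_by_group(toy)
str(res)
```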

3.10 Summary

Show code
bottom <- tibble(
  Item = c("Candidate kernel",
           "Class weight effect on AUROC",
           "Learning rate",
           "Dropout",
           "Test AUROC (median, all class weights)",
           "Test AUPRC (median, all class weights)",
           "Binary predictions at cutoff 0.50",
           "Next step"),
  Finding = c("K3 (1,122ms receptive field); significantly outperforms K0-K2, comparable to K4-K5",
              "No effect; all three strategies produce similar AUROC/AUPRC",
              "Stable across tested range (1e-5 to 5e-3); 5e-3 carried forward",
              "Stable (0.1 vs 0.2); 0.1 carried forward",
              "~0.590",
              "~0.290 (~1.3x prevalence)",
              "Requires inverse_freq weighting to avoid collapse; other strategies need lower cutoff",
              "Phase 3: warm fusion with CXR features using paired data")
)

kable(bottom, booktabs = TRUE, caption = "Summary of findings from 2,160 EKG Specialist V3 experiments.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE, width = "5cm") %>%
  column_spec(2, width = "9cm")
Summary of findings from 2,160 EKG Specialist V3 experiments.
Item Finding
Candidate kernel K3 (1,122ms receptive field); significantly outperforms K0-K2, comparable to K4-K5
Class weight effect on AUROC No effect; all three strategies produce similar AUROC/AUPRC
Learning rate Stable across tested range (1e-5 to 5e-3); 5e-3 carried forward
Dropout Stable (0.1 vs 0.2); 0.1 carried forward
Test AUROC (median, all class weights) ~0.590
Test AUPRC (median, all class weights) ~0.290 (~1.3x prevalence)
Binary predictions at cutoff 0.50 Requires inverse_freq weighting to avoid collapse; other strategies need lower cutoff
Next step Phase 3: warm fusion with CXR features using paired data

Discussion. Kernel size is the primary hyperparameter affecting AUROC: K3 (1,122ms) significantly outperforms K0-K2, while K4-K5 show no further gain. Class weight, learning rate, and dropout do not meaningfully affect AUROC or AUPRC within the tested ranges. Class weight becomes relevant only for binary predictions: at cutoff 0.50, inverse_freq is the only strategy producing non-degenerate predictions, but this reflects the cutoff choice, not the model’s ranking quality. The EKG specialist achieves ~0.59 AUROC, which is above chance but limited. The candidate configuration (K3, LR=5e-3, dropout=0.1) will proceed to Phase 3 warm fusion, where both the EKG and CXR specialist weights are fine-tuned jointly on paired CXR+EKG data.

3.10.1 PEPCEI Pipeline

[Figure: PEPCEI Pipeline]