EKG Specialist V3 - Results Summary

PEPCEI Project: Phase I

Author

Awan

Published

March 4, 2026


library(tidyverse)
library(knitr)
library(kableExtra)
library(ggridges)

results_dir <- "../sim-results-new/run3-successful/"

checkpoint_files <- tibble(
  file = c(
    "checkpoint_K0_split612_20260301_183110.rds",
    "checkpoint_K0_split928_20260302_183415.rds",
    "checkpoint_K1_split612_20260301_205417.rds",
    "checkpoint_K1_split928_20260302_210111.rds",
    "checkpoint_K2_split612_20260301_232405.rds",
    "checkpoint_K2_split928_20260302_234113.rds",
    "checkpoint_K3_split612_20260302_091243.rds",
    "checkpoint_K3_split928_20260303_083434.rds",
    "checkpoint_K4_split612_20260302_155423.rds",
    "checkpoint_K4_split928_20260303_093601.rds",
    "checkpoint_K5_split612_20260302_165129.rds",
    "checkpoint_K5_split928_20260303_103916.rds"
  ),
  expected_kernel = rep(paste0("K", 0:5), each = 2),
  expected_split = rep(c(612, 928), times = 6)
)

master <- map2_dfr(checkpoint_files$file, seq_len(nrow(checkpoint_files)), function(f, i) {
  cp <- readRDS(file.path(results_dir, f))
  cp$summary %>%
    filter(kernel_id == checkpoint_files$expected_kernel[i],
           data_split_seed == checkpoint_files$expected_split[i])
})

make_2x2 <- function(tp, fp, fn, tn, caption = NULL) {
  mat <- data.frame(
    ` ` = c("Actual PE+", "Actual PE-", "Total"),
    `Predicted PE+` = c(tp, fp, tp + fp),
    `Predicted PE-` = c(fn, tn, fn + tn),
    Total = c(tp + fn, fp + tn, tp + fp + fn + tn),
    check.names = FALSE
  )
  kable(mat, booktabs = TRUE, caption = caption, align = "lrrr") %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
}

1 Summary

We evaluated 6 kernel configurations (K0-K5), with receptive fields from 82ms to 2,598ms, on a 2-layer Conv2D architecture for pulmonary embolism (PE) detection from raw 12-lead EKG waveforms. This gave a total of 2,160 experiments (6 kernels x 3 class weights x 6 LRs x 2 dropouts x 2 data splits x 5 seeds).

Key results:

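Kernel size significantly affects test AUROC (Kruskal-Wallis p = 3.3e-143). K3 (1,122ms RF) has the highest median test AUROC (0.592) and ties K5 for the highest median test AUPRC (0.289, ~1.2x prevalence). K3 outperforms K0-K2 (pairwise Wilcoxon, BH-adjusted p < 0.001). K3 also differs significantly from K4 and K5 (p < 0.001), but the median differences are small (0.003 and 0.006 AUROC); K4 vs K5 is not significant (p = 0.069). On the 132-patient validation set, K3 does not lead: K0-K2 have higher median AUROC and AUPRC.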
Class weight does not materially affect AUROC/AUPRC: all three strategies produce overlapping distributions with near-identical medians. Class weight does affect binary predictions at cutoff 0.50: inverse_freq eliminates collapse; other strategies require lower cutoffs.
Learning rate and dropout have limited effect on AUROC within the tested ranges.
EKG alone: ~0.59 AUROC. Above chance but limited.

Methods Note: A critical optimizer bug (GPU parameters not updating after .to(cuda)) was discovered and fixed prior to these runs. All 2,160 experiments reported here used the corrected training pipeline.

Terminology

AUROC (Area Under the Receiver Operating Characteristic Curve): Measures the model’s ability to distinguish PE-positive from PE-negative patients across all possible decision thresholds. A value of 0.5 corresponds to random chance and 1.0 to perfect discrimination; values below 0.5 mean the model ranks patients worse than chance. Threshold-independent.

AUPRC (Area Under the Precision-Recall Curve): Measures discrimination performance with emphasis on the positive (PE) class. More informative than AUROC when the positive class is rare. Baseline equals the prevalence (~0.232 for our EKG-only cohort, ~0.235 for the paired validation cohort). Values are reported with a multiple of prevalence in parentheses (e.g., 0.292 (~1.3x) means 1.3 times the baseline). Threshold-independent.

Sensitivity (Recall, True Positive Rate): Proportion of actual PE cases correctly identified. A sensitivity of 80% means 80 out of 100 PE patients are flagged. Critical for screening, as missed PE cases can be fatal.

Specificity (True Negative Rate): Proportion of actual non-PE cases correctly identified. A specificity of 60% means 60 out of 100 non-PE patients are correctly cleared.

PPV (Positive Predictive Value, Precision): Among patients the model flags as PE-positive, the proportion who actually have PE.

NPV (Negative Predictive Value): Among patients the model clears as PE-negative, the proportion who truly do not have PE.

F1 Score: Harmonic mean of precision (PPV) and sensitivity. Balances the trade-off between catching PE cases and avoiding false alarms. Ranges from 0 to 1.

Youden’s J (Youden Index): Sensitivity + Specificity - 1. Measures how much better the model is than random guessing (J=0). A model with 60% sensitivity and 60% specificity has J=0.20.

Receptive Field (RF): The temporal span of raw EKG signal that a single output unit of the convolutional network can “see.” A 1,122ms receptive field means each feature captures approximately 1-2 cardiac beats depending on heart rate (e.g., ~1.4 beats at 75 bpm, ~1.9 beats at 100 bpm). [See full explanation in the experiment design section below]

Collapse: When a model predicts the same class for all patients (e.g., all PE-negative), producing 0% sensitivity or 0% specificity. Common in imbalanced classification without proper class weighting.

Inverse Frequency Weighting: A class weighting strategy that scales each class's loss weight inversely to its frequency, so misclassifying the minority class (PE-positive) costs more. With ~23% PE prevalence, the positive class receives ~3.3x the loss weight of the negative class (0.768 / 0.232 = ~3.3).

Cutoff (Decision Threshold): The probability above which the model classifies a patient as PE-positive. A cutoff of 0.50 means any predicted probability above 50% is classified as PE+. Lower cutoffs increase sensitivity at the cost of specificity.
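The threshold-dependent definitions above can be made concrete with a small R sketch. It is illustrative only: `confusion_metrics()` and `auroc_rank()` are hypothetical helpers, not functions from the project pipeline, and the toy probabilities are made up. A patient is called PE+ when the predicted probability is strictly above the cutoff, matching the Cutoff definition; AUROC uses the rank (Mann-Whitney) formulation, i.e. the probability that a random PE+ patient scores higher than a random PE- patient.

```r
# Hypothetical helper: threshold-dependent metrics from predicted probabilities.
confusion_metrics <- function(prob, truth, cutoff = 0.50) {
  # prob: predicted P(PE+); truth: 1 = PE+, 0 = PE-
  pred <- as.integer(prob > cutoff)      # strictly above the cutoff -> PE+
  tp <- sum(pred == 1 & truth == 1)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  tn <- sum(pred == 0 & truth == 0)
  sens <- tp / (tp + fn)                 # sensitivity / recall / TPR
  spec <- tn / (tn + fp)                 # specificity / TNR
  ppv  <- tp / (tp + fp)                 # NaN if no patient is flagged PE+ (collapse)
  npv  <- tn / (tn + fn)                 # NaN if no patient is cleared PE-
  f1   <- 2 * ppv * sens / (ppv + sens)  # harmonic mean of PPV and sensitivity
  c(tp = tp, fp = fp, fn = fn, tn = tn,
    sensitivity = sens, specificity = spec, ppv = ppv, npv = npv,
    f1 = f1, youden_j = sens + spec - 1)
}

# Hypothetical helper: AUROC as P(score of random PE+ > score of random PE-),
# counting ties as 1/2. No cutoff is involved.
auroc_rank <- function(prob, truth) {
  pos <- prob[truth == 1]
  neg <- prob[truth == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Toy example (made-up values)
prob  <- c(0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.55)
truth <- c(1,   1,   1,   0,   0,   0,   0,   0)

m <- confusion_metrics(prob, truth, cutoff = 0.50)
round(m, 3)               # tp = 2, fp = 2, fn = 1, tn = 3; sensitivity 0.667, specificity 0.600
auroc_rank(prob, truth)   # 13 of 15 PE+/PE- pairs ranked correctly
```

Lowering `cutoff` never lowers sensitivity and never raises specificity, while `auroc_rank()` ignores the cutoff entirely, which is why AUROC and AUPRC are described as threshold-independent.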

2 Experiment Design

2.1 Architecture

Input: 1 x 12 x 5,000 (channel x leads x timepoints). Raw 12-lead EKG waveform sampled at 500Hz for 10 seconds (5,000 = 500Hz x 10s), unsqueezed to a single-channel 2D tensor where leads are the height dimension and time is the width dimension.


library(DiagrammeR)
library(DiagrammeRsvg)
library(rsvg)

g <- grViz('
digraph ekg_specialist_v3 {
  
  graph [rankdir=TB, fontname="Gill Sans", bgcolor="white", 
         nodesep=0.6, ranksep=0.7, pad=0.5]
  node [fontname="Gill Sans", fontsize=18, shape=box, style="filled,rounded", penwidth=2]
  edge [fontname="Gill Sans", fontsize=16, penwidth=1.5]

  input [label="EKG Input\n1 x 12 x 5,000\n(channel x leads x timepoints)", 
         shape=ellipse, fillcolor="#D5F5E3", color="#27AE60", width=3.5, height=1.5]

  conv1 [label="Conv2D (1 -> 32)\nkernel (1, k1), stride (1, s1)\nBatchNorm + ReLU", 
         fillcolor="#82E0AA", color="#1E8449", width=3.8, height=1.3]

  pool1 [label="MaxPool2D\nkernel (1, pk), stride (1, pk)", 
         fillcolor="#A9DFBF", color="#1E8449", width=3.5, height=0.9]

  drop1 [label="Dropout (p)", fillcolor="#D5F5E3", color="#27AE60", width=2.5, height=0.7]

  conv2 [label="Conv2D (32 -> 64)\nkernel (3, k2), stride (1, 1)\nBatchNorm + ReLU", 
         fillcolor="#AED6F1", color="#2E86C1", width=3.8, height=1.3]

  pool2 [label="MaxPool2D\nkernel (1, pk), stride (1, pk)", 
         fillcolor="#D6EAF8", color="#2E86C1", width=3.5, height=0.9]

  drop2 [label="Dropout (p)", fillcolor="#D6EAF8", color="#2E86C1", width=2.5, height=0.7]

  apool [label="AdaptiveAvgPool2D -> (1, 1)\nFlatten -> 64-dim", 
         fillcolor="#F5CBA7", color="#E67E22", width=3.5, height=0.9]

  fc [label="Linear (64 -> 32) + ReLU\nDropout (p)", 
      fillcolor="#F9E79F", color="#D4AC0D", width=3.5, height=0.9]

  output [label="Linear (32 -> 2)\nPE-negative | PE-positive", 
          shape=ellipse, fillcolor="#FADBD8", color="#E74C3C", width=3.5, height=1.2]

  rf_note [label="Receptive Field\n(kernel-config dependent)\nK0: 82ms  |  K3: 1,122ms  |  K5: 2,598ms",
           shape=note, fillcolor="#F2F3F4", color="#7F8C8D", fontsize=16, width=4.0, height=1.2]

  input -> conv1 -> pool1 -> drop1 -> conv2 -> pool2 -> drop2 -> apool -> fc -> output
  
  pool2 -> rf_note [style=dashed, color="#7F8C8D", constraint=false]
}
')

svg_text <- export_svg(g)
rsvg_png(charToRaw(svg_text), file = "ekg_specialist_v3_arch.png", width = 2000, height = 3000)

Where k1, s1, k2, and pk are kernel-configuration-specific parameters (see the Kernel Configurations table below). Conv2 uses a (3, k2) kernel that spans 3 adjacent leads vertically, with a temporal width k2 that varies by configuration. All dropout layers share the same rate (a hyperparameter). The output is 2 logits (PE-negative, PE-positive), passed through softmax at inference.
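To see how the 5,000-sample input shrinks before the adaptive average pool, the temporal width of each feature map can be traced with the standard output-length formula floor((L + 2p - k) / s) + 1. This sketch assumes symmetric "same" padding p = (k - 1) / 2 for both convolutions (consistent with the odd-kernel convention described in the Receptive Field Design Rationale, but not confirmed by this report) and unpadded max-pooling with floor rounding, which is PyTorch's default for MaxPool2d. `conv_out_len()` and `feature_widths()` are hypothetical helpers, and the lead (height) dimension is not tracked.

```r
# Output length of a conv/pool layer along one dimension (floor rounding).
conv_out_len <- function(len, k, s = 1, pad = 0) {
  floor((len + 2 * pad - k) / s) + 1
}

# Hypothetical helper. ASSUMPTIONS: "same" padding (k - 1) / 2 on both convs,
# no padding on the pools, pool stride = pool kernel, 5,000 input samples.
feature_widths <- function(conv1_k, conv1_s, pool_k, conv2_k, input_len = 5000) {
  w1 <- conv_out_len(input_len, conv1_k, conv1_s, pad = (conv1_k - 1) / 2)  # Conv1
  w2 <- conv_out_len(w1, pool_k, pool_k)                                     # Pool1
  w3 <- conv_out_len(w2, conv2_k, 1, pad = (conv2_k - 1) / 2)                # Conv2
  w4 <- conv_out_len(w3, pool_k, pool_k)                                     # Pool2
  c(conv1 = w1, pool1 = w2, conv2 = w3, pool2 = w4)
}

# K3: conv1 2500, pool1 312, conv2 312, pool2 39
feature_widths(conv1_k = 51, conv1_s = 2, pool_k = 8, conv2_k = 25)
```

Under these assumptions K3 leaves 39 temporal positions after Pool2, which AdaptiveAvgPool2D then averages into a single 64-dim vector.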

Kernel Configurations


kernel_table <- tibble(
  Kernel = paste0("K", 0:5),
  `Conv1` = c("(1,11)", "(1,11)", "(1,25)", "(1,51)", "(1,75)", "(1,101)"),
  Stride = c("(1,2)", "(1,1)", "(1,1)", "(1,2)", "(1,2)", "(1,2)"),
  `Conv2` = c("(3,7)", "(3,5)", "(3,11)", "(3,25)", "(3,35)", "(3,51)"),
  Pool = c("(1,2)", "(1,4)", "(1,4)", "(1,8)", "(1,8)", "(1,10)"),
  `RF (ms)` = c(82, 84, 160, 1122, 1490, 2598)
)


kable(kernel_table, booktabs = TRUE, 
      caption = "Kernel configurations and corresponding receptive fields.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Kernel configurations and corresponding receptive fields.
Kernel	Conv1	Stride	Conv2	Pool	RF (ms)
K0	(1,11)	(1,2)	(3,7)	(1,2)	82
K1	(1,11)	(1,1)	(3,5)	(1,4)	84
K2	(1,25)	(1,1)	(3,11)	(1,4)	160
K3	(1,51)	(1,2)	(3,25)	(1,8)	1122
K4	(1,75)	(1,2)	(3,35)	(1,8)	1490
K5	(1,101)	(1,2)	(3,51)	(1,10)	2598


2.1.1 Receptive Field Design Rationale

What is a receptive field? The input to the model is 5,000 timepoints of raw EKG signal sampled at 500Hz (5,000 / 500 = 10 seconds). After passing through convolutional and pooling layers, the signal is compressed into a small feature map. Each value in that feature map was computed from some contiguous chunk of the original 5,000 timepoints. The receptive field (RF) is the width of that chunk, i.e. how much raw signal each learned feature can “see.”

If the RF is 82ms, each feature sees only a fraction of the QRS complex. If the RF is 1,122ms, each feature sees an entire P-QRS-ST-T cycle, the full electrical signature of one heartbeat. Since classic PE signs on EKG (right heart strain, ST-segment changes, T-wave inversions) are expressed across the full beat, the hypothesis is that larger RFs should improve detection.

The RF formula. For a sequence of convolutional and pooling layers, the RF grows according to:

\[RF_{new} = RF_{old} + (k - 1) \times S_{cumulative}\]

where \(k\) is the kernel size of the current layer, and \(S_{cumulative}\) is the product of all strides in preceding layers. This captures the key insight: later layers operate on downsampled feature maps, so each position in their input represents multiple raw samples. A kernel of width 25 in Conv2 does not span 25 raw samples. Rather, it spans 25 positions that are each \(S_{cumulative}\) raw samples apart.

K3 step-by-step example. Architecture (temporal dimension): Conv1(k=51, s=2) -> MaxPool(k=8, s=8) -> Conv2(k=25, s=1) -> MaxPool(k=8, s=8):

Layer	Kernel	Stride	Calculation	RF (samples)	Cumulative Stride
Input	—	—	—	1	1
Conv1	51	2	1 + (51-1) x 1	51	2
Pool1	8	8	51 + (8-1) x 2	65	16
Conv2	25	1	65 + (25-1) x 16	449	16
Pool2	8	8	449 + (8-1) x 16	561	128

561 samples / 500 Hz = 1,122ms

The large jump from 65 to 449 at Conv2 is because the cumulative stride is 16 at that point. Each of Conv2’s 25 kernel positions is 16 raw samples apart, so the kernel reaches across 24 x 16 = 384 additional raw samples.

Design constraints. Kernel sizes were chosen under three constraints:

Broad temporal sweep. Six configurations were designed to produce receptive fields spanning roughly a 30-fold range, from ~80ms to ~2,600ms, allowing the data to determine which timescale contains the most signal for PE detection. Rather than targeting exact cardiac cycle fractions, the configurations were spaced to sample a wide range of temporal resolutions.
2:1 kernel ratio. A consistent ~2:1 ratio between Conv1 and Conv2 kernel widths was maintained across configurations (K2-K5 range from 1.98:1 to 2.27:1), so that each layer contributes proportionally to the total RF.
Odd kernel widths. Standard convention in convolutional networks. Odd kernels allow symmetric padding (padding = (k-1)/2 on each side), preserving spatial alignment.

Given the 2:1 ratio and odd-kernel constraint, K3 uses conv1_k=51 and conv2_k=25, yielding 1,122ms. The RF tuning resolution at Conv2 is coarse: the cumulative stride entering Conv2 is 16 samples (32ms at 500Hz), so each +/-1 change in conv2_k shifts the RF by 32ms. Hitting a different target while maintaining the 2:1 ratio would require changing both kernels simultaneously, so the achieved RFs are approximate by design.
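The RF recurrence can be written as a small R function that works for any stack of temporal layers, which also makes the tuning resolution at Conv2 easy to check. `receptive_field()` and `k3_rf()` are hypothetical helpers; the layer order (Conv1, Pool1, Conv2 with stride 1, Pool2, with pool stride equal to pool kernel) follows the K3 step-by-step example above.

```r
# RF of a stack of temporal layers: RF_new = RF_old + (k - 1) * S_cumulative,
# where S_cumulative is the product of the strides of all preceding layers.
receptive_field <- function(kernels, strides, fs = 500) {
  stopifnot(length(kernels) == length(strides))
  rf <- 1      # a single input sample
  cum_s <- 1   # no downsampling before the first layer
  for (i in seq_along(kernels)) {
    rf <- rf + (kernels[i] - 1) * cum_s
    cum_s <- cum_s * strides[i]
  }
  c(samples = rf, ms = rf * 1000 / fs)   # samples -> milliseconds at fs Hz
}

# Hypothetical helper: K3 with a variable Conv2 width.
# Layers: Conv1 (k=51, s=2), Pool1 (8, 8), Conv2 (conv2_k, 1), Pool2 (8, 8)
k3_rf <- function(conv2_k) {
  receptive_field(kernels = c(51, 8, conv2_k, 8), strides = c(2, 8, 1, 8))
}

k3_rf(25)                               # 561 samples, 1122 ms
k3_rf(26)[["ms"]] - k3_rf(25)[["ms"]]   # 32 ms per unit of conv2_k
```

Because kernel widths must be odd, the smallest admissible change to conv2_k is actually +/-2, i.e. 64ms for K3.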


# Verify RF calculations for all 6 kernels
kc <- readRDS(file.path(results_dir, checkpoint_files$file[1]))$kernel_configs

rf_calc <- tibble(
  Kernel = character(), conv1_k = integer(), conv1_s = integer(),
  pool_k = integer(), conv2_k = integer(),
  RF_samples = integer(), RF_ms = double(), Stored_ms = double()
)

for (i in 1:nrow(kc)) {
  row <- kc[i, ]
  c1_k <- unlist(row$conv1_k)[2]
  c1_s <- unlist(row$conv1_s)[2]
  c2_k <- unlist(row$conv2_k)[2]
  p_k  <- unlist(row$pool_k)[2]

  rf <- 1; cum_s <- 1
  rf <- rf + (c1_k - 1) * cum_s; cum_s <- cum_s * c1_s
  rf <- rf + (p_k - 1) * cum_s;  cum_s <- cum_s * p_k
  rf <- rf + (c2_k - 1) * cum_s; cum_s <- cum_s * 1
  rf <- rf + (p_k - 1) * cum_s;  cum_s <- cum_s * p_k

  rf_calc <- bind_rows(rf_calc, tibble(
    Kernel = row$kernel_id, conv1_k = c1_k, conv1_s = c1_s,
    pool_k = p_k, conv2_k = c2_k,
    `Conv1:Conv2` = ifelse(i == 1, sprintf("%.1f:1", c1_k / c2_k), sprintf("%d:1", round(c1_k / c2_k))),
    RF_samples = rf, RF_ms = rf / 500 * 1000, Stored_ms = row$rf_ms
  ))
}

kable(rf_calc, booktabs = TRUE,
      caption = "Receptive field verification. RF computed from architecture parameters and confirmed against stored values.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Receptive field verification. RF computed from architecture parameters and confirmed against stored values.
Kernel	conv1_k	conv1_s	pool_k	conv2_k	RF_samples	RF_ms	Stored_ms	Conv1:Conv2
K0	11	2	2	7	41	82	82	1.6:1
K1	11	1	4	5	42	84	84	2:1
K2	25	1	4	11	80	160	160	2:1
K3	51	2	8	25	561	1122	1122	2:1
K4	75	2	8	35	745	1490	1490	2:1
K5	101	2	10	51	1299	2598	2598	2:1

Clinical interpretation. How much signal each kernel captures depends on the patient’s heart rate. A normal resting heart rate is ~75 bpm (800ms per cycle). PE patients commonly present with sinus tachycardia (>=100 bpm), giving <=600ms per cycle.


hr_vals <- c(800, 600, 500, 400)
rf_vals <- c(82, 84, 160, 1122, 1490, 2598)

# Round to nearest 0.5 for cleaner presentation
round_half <- function(x) {
  r <- round(x * 2) / 2
  ifelse(r == round(r), sprintf("~%.0f", r), sprintf("~%.1f", r))
}

hr_table <- tibble(
  `Heart Rate` = c("75 bpm (normal resting)", "100 bpm (tachycardia threshold)", 
                    "120 bpm", "150 bpm"),
  `Cycle (ms)` = hr_vals
)

for (j in seq_along(rf_vals)) {
  raw <- rf_vals[j] / hr_vals
  hr_table[[paste0("K", j - 1)]] <- ifelse(raw < 0.5, sprintf("%.1f", raw), round_half(raw))
}

kable(hr_table, booktabs = TRUE,
      caption = "Approximate number of cardiac cycles captured by each kernel at different heart rates.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Approximate number of cardiac cycles captured by each kernel at different heart rates.
Heart Rate	Cycle (ms)	K0	K1	K2	K3	K4	K5
75 bpm (normal resting)	800	0.1	0.1	0.2	~1.5	~2	~3
100 bpm (tachycardia threshold)	600	0.1	0.1	0.3	~2	~2.5	~4.5
120 bpm	500	0.2	0.2	0.3	~2	~3	~5
150 bpm	400	0.2	0.2	0.4	~3	~3.5	~6.5

K0-K2 capture less than one beat at any heart rate. K3 captures 1-3 beats depending on heart rate, which is enough to see both waveform morphology (ST changes, T-wave inversions) and beat-to-beat timing. K4 and K5 capture more beats but do not improve on K3 (their median test AUROCs are slightly lower), suggesting the additional context does not help.

K0 and K1 were carried over from V1/V2 for backward comparison and were not designed under the same 2:1 ratio convention.

2.2 Hyperparameter Grid

Learning rate: 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3
Class weight: none, weight_1_2 (1:2), inverse_freq (~1:3.3)
Dropout: 0.10, 0.20
Data split seeds: 612, 928 (70/30 stratified splits of 2,039 EKG-only patients)
Initialization seeds: 42, 123, 456, 789, 2024

Total: 6 kernels x 36 HP combos (6 LRs x 3 class weights x 2 dropouts) x 2 splits x 5 seeds = 2,160 experiments. Early stopping with patience=7 on test loss, maximum 30 epochs.
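The full factorial design can be enumerated to confirm the experiment counts. This is a sketch using `tidyr::expand_grid()`; the column names are illustrative and need not match the fields stored in the checkpoint files.

```r
library(tidyr)  # expand_grid()

# One row per experiment in the full factorial grid
grid <- expand_grid(
  kernel       = paste0("K", 0:5),
  class_weight = c("none", "weight_1_2", "inverse_freq"),
  lr           = c(1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3),
  dropout      = c(0.10, 0.20),
  split_seed   = c(612, 928),
  init_seed    = c(42, 123, 456, 789, 2024)
)

nrow(grid)                                               # 2160 experiments
nrow(unique(grid[, c("class_weight", "lr", "dropout")])) # 36 HP combos
table(grid$kernel)                                       # 360 per kernel
```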

2.3 Data

Training/test: 2,039 EKG-only patients, split 70/30 per data split seed. PE prevalence ~23.2%.
Validation: 132 patients from the paired pool (patients with both EKG + CXR), using fixed paired_split_seed=612. The specialist is only evaluated on the validation set and is never trained on paired data, preventing leakage into Phase 3 fusion experiments.
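A minimal sketch of a 70/30 stratified split followed by inverse-frequency class weights, under stated assumptions: the labels are a hypothetical vector with 473 PE+ among 2,039 patients (~23.2%), the per-class training count is rounded with `round()`, and the weights use the common n / (n_classes x n_c) convention. `stratified_split()` and `inverse_freq_weights()` are hypothetical helpers, not the project's code, so the resulting set sizes (1,427 / 612) need not match the reported 613-patient test set.

```r
# Hypothetical labels for illustration only: 473 PE+ of 2,039 (~23.2%)
y <- c(rep(1L, 473), rep(0L, 2039 - 473))

# Sample train_frac of each class separately so both sets keep the prevalence.
# Note: set.seed() here changes the global RNG state.
stratified_split <- function(y, train_frac = 0.7, seed = 612) {
  set.seed(seed)
  idx_by_class <- split(seq_along(y), y)
  train <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), round(train_frac * length(idx)))]
  }), use.names = FALSE)
  list(train = sort(train), test = setdiff(seq_along(y), train))
}

# Weight for class c = n / (n_classes * n_c): rarer classes get larger weights
inverse_freq_weights <- function(y) {
  counts <- table(factor(y, levels = c(0, 1)))
  w <- length(y) / (length(counts) * counts)
  setNames(as.numeric(w), c("PE-", "PE+"))
}

s <- stratified_split(y, seed = 612)
mean(y[s$test])                   # test prevalence stays close to 0.232
w <- inverse_freq_weights(y[s$train])
w[["PE+"]] / w[["PE-"]]           # equals n_neg / n_pos in the training set
```

The PE+ : PE- weight ratio is simply the inverse of the class-count ratio (1,096 / 331, ~3.3 here), matching the ~1:3.3 inverse_freq setting in the grid.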

Model selection strategy: The EKG specialist is selected based on test set performance (613 EKG-only patients), not validation set performance. This is deliberate: because Phase 3 fusion models will be evaluated on the same 132-patient validation set, selecting the specialist based on validation would constitute indirect data leakage. In warm fusion, the specialist weights are fine-tuned with paired data, so any pre-optimization for validation patients would propagate through the fusion training. Test-based selection keeps specialist selection independent of fusion evaluation.

Overfitting safeguards: (1) Early stopping with patience=7 on test loss (maximum 30 epochs) prevents overtraining. (2) Dropout regularization is applied after each convolutional block and in the fully connected layer. (3) Two independent stratified data splits (seeds 612, 928) verify that results are not driven by a particular train/test partition. (4) Five initialization seeds per configuration quantify seed-to-seed variability. (5) The 132-patient validation set is completely held out from both training and hyperparameter selection, serving only as an independent check on generalization.

3 Results

All metrics in Sections 3.1-3.7 are threshold-independent (AUROC, AUPRC). Binary prediction metrics (sensitivity, specificity, confusion matrices) are introduced in Section 3.8.

3.1 The Full Landscape (2,160 experiments)


# One row per experiment (AUROC/AUPRC are identical across cutoff rows)
auroc_data <- master %>% filter(cutoff == 0.50)

ggplot(auroc_data, aes(x = test_auroc)) +
  geom_histogram(binwidth = 0.02, fill = "#5DADE2", color = "white", alpha = 0.8) +
  geom_vline(xintercept = 0.50, linetype = "dashed", color = "#E74C3C", linewidth = 0.6) +
  annotate("text", x = 0.52, y = Inf, label = "Random chance (0.50)",
           hjust = 0, vjust = 2, color = "#E74C3C", size = 3.5) +
  labs(x = "Test AUROC", y = "Count",
       title = "Distribution of test AUROC across all 2,160 experiments",
       caption = "All kernels, class weights, learning rates, dropout rates, splits, and seeds.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

The bulk of experiments fall in the 0.55-0.62 AUROC range. A small number fall below 0.50.


ggplot(auroc_data, aes(x = test_auprc)) +
  geom_histogram(binwidth = 0.01, fill = "#58D68D", color = "white", alpha = 0.8) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.6) +
  annotate("text", x = 0.237, y = Inf, label = "Prevalence baseline (0.232)",
           hjust = 0, vjust = 2, color = "#E74C3C", size = 3.5) +
  labs(x = "Test AUPRC", y = "Count",
       title = "Distribution of test AUPRC across all 2,160 experiments",
       caption = "Dashed line = prevalence baseline (expected AUPRC of a random classifier).") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Most experiments exceed the prevalence baseline (0.232), with the bulk in the 0.25-0.32 range.

3.2 Class Weight (2,160 experiments)


auroc_data <- auroc_data %>%
  mutate(cw_label = case_when(
    class_weight == "none" ~ "None",
    class_weight == "weight_1_2" ~ "1:2",
    class_weight == "inverse_freq" ~ "Inverse freq (~1:3.3)"
  ) %>% factor(levels = c("Inverse freq (~1:3.3)", "1:2", "None")))

ggplot(auroc_data, aes(x = test_auroc, y = cw_label, fill = cw_label)) +
  geom_density_ridges(alpha = 0.7, scale = 1.2, rel_min_height = 0.01) +
  geom_vline(xintercept = 0.50, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_manual(values = c("#27AE60", "#F39C12", "#E74C3C")) +
  labs(x = "Test AUROC", y = NULL,
       title = "Test AUROC distribution by class weight strategy",
       caption = "All kernels, LRs, dropout rates, splits, and seeds pooled (720 experiments per weight).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))


ggplot(auroc_data, aes(x = test_auprc, y = cw_label, fill = cw_label)) +
  geom_density_ridges(alpha = 0.7, scale = 1.2, rel_min_height = 0.01) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_manual(values = c("#27AE60", "#F39C12", "#E74C3C")) +
  labs(x = "Test AUPRC", y = NULL,
       title = "Test AUPRC distribution by class weight strategy",
       caption = "720 experiments per weight. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

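All three class weight strategies produce overlapping AUROC distributions concentrated in the 0.55-0.62 range, with nearly identical medians (AUROC 0.580-0.582, AUPRC 0.279-0.285). Class weight therefore has no meaningful effect on these threshold-independent metrics. It does affect spread: inverse_freq has the smallest SD and no runs below 0.55 AUROC, whereas none and weight_1_2 include a low tail (minimum 0.432 and 0.438).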


cw_auroc <- auroc_data %>%
  group_by(class_weight) %>%
  summarise(
    N = n(),
    Median = sprintf("%.3f", median(test_auroc)),
    SD = sprintf("%.3f", sd(test_auroc)),
    `IQR` = sprintf("%.3f - %.3f", quantile(test_auroc, 0.25), quantile(test_auroc, 0.75)),
    Min = sprintf("%.3f", min(test_auroc)),
    Max = sprintf("%.3f", max(test_auroc)),
    .groups = "drop"
  ) %>%
  rename(`Class Weight` = class_weight)

kable(cw_auroc, booktabs = TRUE,
      caption = "Test AUROC by class weight (720 experiments per strategy).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Test AUROC by class weight (720 experiments per strategy).
Class Weight	N	Median	SD	IQR	Min	Max
inverse_freq	720	0.580	0.010	0.574 - 0.589	0.554	0.612
none	720	0.581	0.039	0.565 - 0.592	0.432	0.621
weight_1_2	720	0.582	0.019	0.574 - 0.592	0.438	0.621


cw_auprc <- auroc_data %>%
  group_by(class_weight) %>%
  summarise(
    N = n(),
    Median = sprintf("%.3f", median(test_auprc)),
    SD = sprintf("%.3f", sd(test_auprc)),
    `IQR` = sprintf("%.3f - %.3f", quantile(test_auprc, 0.25), quantile(test_auprc, 0.75)),
    Min = sprintf("%.3f", min(test_auprc)),
    Max = sprintf("%.3f", max(test_auprc)),
    .groups = "drop"
  ) %>%
  rename(`Class Weight` = class_weight)

kable(cw_auprc, booktabs = TRUE,
      caption = "Test AUPRC by class weight (720 experiments per strategy). Prevalence baseline = 0.232.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Test AUPRC by class weight (720 experiments per strategy). Prevalence baseline = 0.232.
Class Weight	N	Median	SD	IQR	Min	Max
inverse_freq	720	0.285	0.011	0.277 - 0.293	0.254	0.322
none	720	0.279	0.023	0.270 - 0.290	0.199	0.319
weight_1_2	720	0.283	0.014	0.274 - 0.290	0.203	0.325

3.3 Kernel Size (2,160 experiments)

Each kernel has 360 experiments (3 class weights x 6 LRs x 2 dropouts x 2 splits x 5 seeds).


auroc_data <- auroc_data %>%
  mutate(kernel_label = sprintf("%s (%dms)", kernel_id, rf_ms) %>%
           fct_rev())

ggplot(auroc_data, aes(x = test_auroc, y = kernel_label, fill = kernel_label)) +
  geom_density_ridges(alpha = 0.7, scale = 1.2, rel_min_height = 0.01) +
  geom_vline(xintercept = 0.50, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_brewer(palette = "Greens", direction = 1) +
  labs(x = "Test AUROC", y = NULL,
       title = "Test AUROC distribution by kernel (all 2,160 experiments)",
       caption = "360 experiments per kernel. Dashed line = random chance (0.50).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

K3 has the rightmost peak. K0-K2 are slightly left-shifted. K4-K5 overlap with K3.


ggplot(auroc_data, aes(x = test_auprc, y = kernel_label, fill = kernel_label)) +
  geom_density_ridges(alpha = 0.7, scale = 1.2, rel_min_height = 0.01) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  annotate("text", x = 0.235, y = Inf, label = "Prevalence baseline (0.232)",
           hjust = 0, vjust = 2, color = "#E74C3C", size = 3.5) +
  scale_fill_brewer(palette = "Greens", direction = 1) +
  labs(x = "Test AUPRC", y = NULL,
       title = "Test AUPRC distribution by kernel (all 2,160 experiments)",
       caption = "360 experiments per kernel. Dashed line = prevalence baseline.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Same pattern on AUPRC. K3 peak is rightmost. All kernels exceed the prevalence baseline.


ggplot(auroc_data, aes(y = kernel_label, x = test_auroc, fill = kernel_id)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.50, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_brewer(palette = "Greens") +
  labs(x = "Test AUROC", y = NULL,
       title = "Test AUROC by kernel (all 2,160 experiments)",
       caption = "360 experiments per kernel. Dashed line = random chance (0.50).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))


ggplot(auroc_data, aes(y = kernel_label, x = test_auprc, fill = kernel_id)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_brewer(palette = "Greens") +
  labs(x = "Test AUPRC", y = NULL,
       title = "Test AUPRC by kernel (all 2,160 experiments)",
       caption = "360 experiments per kernel. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

K3 has the highest median AUROC and ties K5 for the highest median AUPRC. K0-K2 are slightly lower. K4-K5 are comparable to K3.


kernel_test <- auroc_data %>%
  group_by(kernel_id, rf_ms) %>%
  summarise(
    med_auroc = median(test_auroc), sd_auroc = sd(test_auroc),
    med_auprc = median(test_auprc), sd_auprc = sd(test_auprc),
    .groups = "drop"
  ) %>%
  arrange(kernel_id) %>%
  transmute(
    Kernel = kernel_id, `RF (ms)` = rf_ms,
    `AUROC` = sprintf("%.3f (%.3f)", med_auroc, sd_auroc),
    `AUPRC` = sprintf("%.3f (%.3f)", med_auprc, sd_auprc)
  )

kable(kernel_test, booktabs = TRUE,
      caption = "Kernel comparison, test set. Median (SD) across 360 experiments per kernel.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  row_spec(4, bold = TRUE)

Kernel comparison, test set. Median (SD) across 360 experiments per kernel.
Kernel	RF (ms)	AUROC	AUPRC
K0	82	0.577 (0.023)	0.280 (0.015)
K1	84	0.573 (0.036)	0.278 (0.021)
K2	160	0.575 (0.034)	0.271 (0.019)
K3	1122	0.592 (0.010)	0.289 (0.011)
K4	1490	0.589 (0.011)	0.287 (0.012)
K5	2598	0.586 (0.012)	0.289 (0.012)


kernel_val <- auroc_data %>%
  group_by(kernel_id, rf_ms) %>%
  summarise(
    med_auroc = median(val_auroc), sd_auroc = sd(val_auroc),
    med_auprc = median(val_auprc), sd_auprc = sd(val_auprc),
    .groups = "drop"
  ) %>%
  arrange(kernel_id) %>%
  transmute(
    Kernel = kernel_id, `RF (ms)` = rf_ms,
    `AUROC` = sprintf("%.3f (%.3f)", med_auroc, sd_auroc),
    `AUPRC` = sprintf("%.3f (%.3f)", med_auprc, sd_auprc)
  )

kable(kernel_val, booktabs = TRUE,
      caption = "Kernel comparison, validation set. Median (SD) across 360 experiments per kernel.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Kernel comparison, validation set. Median (SD) across 360 experiments per kernel.
Kernel	RF (ms)	AUROC	AUPRC
K0	82	0.603 (0.034)	0.363 (0.043)
K1	84	0.598 (0.046)	0.331 (0.042)
K2	160	0.616 (0.040)	0.378 (0.048)
K3	1122	0.590 (0.015)	0.307 (0.023)
K4	1490	0.579 (0.016)	0.293 (0.025)
K5	2598	0.565 (0.023)	0.296 (0.026)

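On the test set, K3 has the highest median AUROC (0.592) and ties K5 for the highest median AUPRC (0.289). The validation set (n = 132) does not reproduce this ordering: K2 has the highest median validation AUROC (0.616) and AUPRC (0.378), and K3 (0.590 / 0.307) ranks below K0-K2, although K3-K5 show a smaller spread across experiments (AUROC SD 0.015-0.023 vs 0.034-0.046 for K0-K2).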

3.3.1 Statistical Test

Kruskal-Wallis rank-sum test on test AUROC across the 6 kernel groups, with pairwise Wilcoxon rank-sum tests (BH-adjusted) for post-hoc comparisons. All 2,160 experiments included.


cat(sprintf("N = %d experiments (%d per kernel)\n",
            nrow(auroc_data),
            nrow(auroc_data) / n_distinct(auroc_data$kernel_id)))

N = 2160 experiments (360 per kernel)


kw <- kruskal.test(test_auroc ~ kernel_id, data = auroc_data)
cat(sprintf("Kruskal-Wallis chi-squared = %.2f, df = %d, p = %.2e\n",
            kw$statistic, kw$parameter, kw$p.value))

Kruskal-Wallis chi-squared = 673.05, df = 5, p = 3.29e-143


pw <- pairwise.wilcox.test(auroc_data$test_auroc, auroc_data$kernel_id,
                           p.adjust.method = "BH")

kable(pw$p.value, booktabs = TRUE, digits = 4,
      caption = "Pairwise Wilcoxon p-values (BH-adjusted).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Pairwise Wilcoxon p-values (BH-adjusted).
	K0	K1	K2	K3	K4
K1	1e-04	NA	NA	NA	NA
K2	5e-04	0.3048	NA	NA	NA
K3	0e+00	0.0000	0	NA	NA
K4	0e+00	0.0000	0	9e-04	NA
K5	0e+00	0.0000	0	0e+00	0.0688
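`pairwise.wilcox.test(..., p.adjust.method = "BH")` computes the 15 pairwise p-values and adjusts them jointly with `stats::p.adjust()`. The Benjamini-Hochberg step can be reproduced by hand to show what it does: each p-value is scaled by m / rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone. `bh_adjust()` is a hypothetical helper written for illustration; in practice `p.adjust()` should be used.

```r
# Benjamini-Hochberg adjustment by hand (illustration; use p.adjust() in practice)
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)           # largest p-value first
  ranks <- m:1                               # ascending rank of each sorted p-value
  adj <- cummin(pmin(1, p[o] * m / ranks))   # scale by m / rank, cap at 1, enforce monotonicity
  adj[order(o)]                              # restore the original order
}

p <- c(0.01, 0.04, 0.03, 0.20, 0.005)
cbind(raw = p, by_hand = bh_adjust(p), p_adjust = p.adjust(p, method = "BH"))
all.equal(bh_adjust(p), p.adjust(p, method = "BH"))
```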


rf_summary <- auroc_data %>%
  group_by(kernel_id, rf_ms) %>%
  summarise(
    med_auroc = median(test_auroc),
    q25 = quantile(test_auroc, 0.25),
    q75 = quantile(test_auroc, 0.75),
    .groups = "drop"
  ) %>%
  mutate(kernel_label = sprintf("%s\n(%dms)", kernel_id, rf_ms) %>% fct_inorder())

ggplot(rf_summary, aes(x = kernel_label, y = med_auroc)) +
  geom_errorbar(aes(ymin = q25, ymax = q75), width = 0.2, color = "grey50") +
  geom_point(size = 4, color = "#27AE60") +
  geom_line(aes(group = 1), color = "#27AE60", linewidth = 0.8, alpha = 0.5) +
  labs(x = "Kernel (Receptive Field)", y = "Median Test AUROC",
       title = "Median test AUROC by kernel",
       caption = "Error bars = IQR. 360 experiments per kernel.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))


rf_summary_auprc <- auroc_data %>%
  group_by(kernel_id, rf_ms) %>%
  summarise(
    med_auprc = median(test_auprc),
    q25 = quantile(test_auprc, 0.25),
    q75 = quantile(test_auprc, 0.75),
    .groups = "drop"
  ) %>%
  mutate(kernel_label = sprintf("%s\n(%dms)", kernel_id, rf_ms) %>% fct_inorder())

ggplot(rf_summary_auprc, aes(x = kernel_label, y = med_auprc)) +
  geom_errorbar(aes(ymin = q25, ymax = q75), width = 0.2, color = "grey50") +
  geom_hline(yintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  geom_point(size = 4, color = "#27AE60") +
  geom_line(aes(group = 1), color = "#27AE60", linewidth = 0.8, alpha = 0.5) +
  labs(x = "Kernel (Receptive Field)", y = "Median Test AUPRC",
       title = "Median test AUPRC by kernel",
       caption = "Error bars = IQR. 360 experiments per kernel. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

AUROC rises from K0-K2 (~80-160ms) to K3 (~1,100ms), then plateaus, with a slight decline, at K4/K5. AUPRC follows the same pattern.

3.4 Learning Rate

All 2,160 experiments by learning rate (360 per LR):


auroc_data <- auroc_data %>%
  mutate(lr_label_all = sprintf("%.0e", lr) %>%
           fct_reorder(test_auroc, .fun = median))

ggplot(auroc_data, aes(y = lr_label_all, x = test_auroc, fill = lr_label_all)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  scale_fill_brewer(palette = "Blues") +
  labs(y = "Learning Rate", x = "Test AUROC",
       title = "Test AUROC by learning rate (all 2,160 experiments)",
       caption = "360 experiments per LR.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Show code

ggplot(auroc_data, aes(y = lr_label_all, x = test_auprc, fill = lr_label_all)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_brewer(palette = "Blues") +
  labs(y = "Learning Rate", x = "Test AUPRC",
       title = "Test AUPRC by learning rate (all 2,160 experiments)",
       caption = "360 experiments per LR. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

No clear separation across learning rates on either metric.

Narrowed to K3, pooling all class weights, dropouts, splits, and seeds (60 experiments per LR):

Show code

k3_data <- auroc_data %>%
  filter(kernel_id == "K3") %>%
  mutate(lr_label = sprintf("%.0e", lr) %>%
           fct_reorder(test_auroc, .fun = median))

ggplot(k3_data, aes(y = lr_label, x = test_auroc, fill = lr_label)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  scale_fill_brewer(palette = "Blues") +
  labs(y = "Learning Rate", x = "Test AUROC",
       title = "Test AUROC by learning rate (K3, all class weights)",
       caption = "60 experiments per LR.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Show code

ggplot(k3_data, aes(y = lr_label, x = test_auprc, fill = lr_label)) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_brewer(palette = "Blues") +
  labs(y = "Learning Rate", x = "Test AUPRC",
       title = "Test AUPRC by learning rate (K3, all class weights)",
       caption = "60 experiments per LR. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Same pattern after narrowing to K3: the distributions overlap heavily across learning rates on both metrics.

Show code

lr_summary <- k3_data %>%
  group_by(lr) %>%
  summarise(
    N = n(),
    `Median AUROC` = sprintf("%.3f", median(test_auroc)),
    `SD AUROC` = sprintf("%.3f", sd(test_auroc)),
    `Median AUPRC` = sprintf("%.3f", median(test_auprc)),
    `SD AUPRC` = sprintf("%.3f", sd(test_auprc)),
    .groups = "drop"
  ) %>%
  rename(`Learning Rate` = lr)

kable(lr_summary, booktabs = TRUE,
      caption = "Learning rate effect for K3. Median (SD) across all class weights, dropouts, splits, and seeds.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Learning rate effect for K3. Median (SD) across all class weights, dropouts, splits, and seeds.
Learning Rate	N	Median AUROC	SD AUROC	Median AUPRC	SD AUPRC
1e-05	60	0.585	0.010	0.290	0.013
5e-05	60	0.589	0.008	0.289	0.010
1e-04	60	0.591	0.009	0.286	0.008
5e-04	60	0.593	0.011	0.290	0.011
1e-03	60	0.596	0.011	0.290	0.010
5e-03	60	0.597	0.010	0.290	0.010

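Median AUROC rises only slightly with learning rate (0.585 at 1e-5 to 0.597 at 5e-3), a spread comparable to the run-to-run SD (0.008-0.011), and median AUPRC is flat (0.286-0.290). LR = 5e-3, which has the highest median AUROC, is carried forward.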
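To put the size of the LR effect in perspective, the medians and SDs in the table above can be compared directly. A minimal sketch using only the tabulated values; the ratio and slope are descriptive summaries, not a formal test:

```r
# Median and SD of K3 test AUROC by learning rate, copied from the table above
lr_tab <- data.frame(
  lr        = c(1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3),
  med_auroc = c(0.585, 0.589, 0.591, 0.593, 0.596, 0.597),
  sd_auroc  = c(0.010, 0.008, 0.009, 0.011, 0.011, 0.010)
)

# Spread of the medians across the whole LR grid
spread <- max(lr_tab$med_auroc) - min(lr_tab$med_auroc)

# Typical within-LR variability (mean SD across class weights, dropouts, splits, seeds)
typical_sd <- mean(lr_tab$sd_auroc)

# A ratio near 1 means the LR effect is about the size of run-to-run noise
ratio <- spread / typical_sd

# Linear trend of median AUROC per tenfold increase in learning rate
trend <- coef(lm(med_auroc ~ log10(lr), data = lr_tab))[["log10(lr)"]]

cat(sprintf("Spread: %.3f | mean SD: %.4f | ratio: %.2f | slope per decade: %.4f\n",
            spread, typical_sd, ratio, trend))
```

Over nearly three decades of learning rate the median moves by 0.012, about 1.2 times the average SD, with a slope of about 0.0045 AUROC per tenfold increase in LR.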

3.5 Dropout

All 2,160 experiments by dropout (1,080 per level):

Show code

ggplot(auroc_data, aes(y = factor(dropout_rate), x = test_auroc, fill = factor(dropout_rate))) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  scale_fill_manual(values = c("#8E44AD", "#F39C12")) +
  labs(y = "Dropout Rate", x = "Test AUROC",
       title = "Test AUROC by dropout rate (all 2,160 experiments)",
       caption = "1,080 experiments per dropout level.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Show code

ggplot(auroc_data, aes(y = factor(dropout_rate), x = test_auprc, fill = factor(dropout_rate))) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_manual(values = c("#8E44AD", "#F39C12")) +
  labs(y = "Dropout Rate", x = "Test AUPRC",
       title = "Test AUPRC by dropout rate (all 2,160 experiments)",
       caption = "1,080 experiments per dropout level. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

No clear separation across dropout rates on either metric.

Narrowed to K3 and LR=5e-3, pooling all class weights, splits, and seeds (30 experiments per dropout):

Show code

k3_lr <- auroc_data %>%
  filter(kernel_id == "K3", lr == 0.005)

dropout_wide <- k3_lr %>%
  select(data_split_seed, seed, class_weight, dropout_rate, test_auroc) %>%
  pivot_wider(names_from = dropout_rate, values_from = test_auroc, names_prefix = "drop_")

ggplot(dropout_wide, aes(x = drop_0.1, y = drop_0.2)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(color = class_weight), size = 3, alpha = 0.8) +
  labs(x = "Test AUROC (dropout = 0.10)", y = "Test AUROC (dropout = 0.20)",
       title = "Dropout 0.10 vs 0.20 (K3, LR=5e-3, all class weights)",
       caption = "Each point is one seed/split/class_weight combination.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Show code

dropout_wide_auprc <- k3_lr %>%
  select(data_split_seed, seed, class_weight, dropout_rate, test_auprc) %>%
  pivot_wider(names_from = dropout_rate, values_from = test_auprc, names_prefix = "drop_")

ggplot(dropout_wide_auprc, aes(x = drop_0.1, y = drop_0.2)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(color = class_weight), size = 3, alpha = 0.8) +
  labs(x = "Test AUPRC (dropout = 0.10)", y = "Test AUPRC (dropout = 0.20)",
       title = "Dropout 0.10 vs 0.20 - AUPRC (K3, LR=5e-3, all class weights)",
       caption = "Each point is one seed/split/class_weight combination.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Points cluster near the diagonal on both metrics, with no systematic advantage for either dropout rate.

Show code

dropout_summary <- k3_lr %>%
  group_by(dropout_rate) %>%
  summarise(
    N = n(),
    `Median AUROC` = sprintf("%.3f", median(test_auroc)),
    `SD AUROC` = sprintf("%.3f", sd(test_auroc)),
    `Median AUPRC` = sprintf("%.3f", median(test_auprc)),
    `SD AUPRC` = sprintf("%.3f", sd(test_auprc)),
    .groups = "drop"
  ) %>%
  rename(Dropout = dropout_rate)

kable(dropout_summary, booktabs = TRUE,
      caption = "Dropout comparison for K3, LR=5e-3, all class weights.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Dropout comparison for K3, LR=5e-3, all class weights.
Dropout	N	Median AUROC	SD AUROC	Median AUPRC	SD AUPRC
0.1	30	0.596	0.012	0.292	0.010
0.2	30	0.597	0.008	0.290	0.009

Median AUROC (0.596 vs 0.597) and median AUPRC (0.292 vs 0.290) are nearly identical for the two dropout rates. Dropout = 0.10 is carried forward.

3.6 Data Split Robustness

All 2,160 experiments by data split (1,080 per split):

Show code

ggplot(auroc_data, aes(y = factor(data_split_seed), x = test_auroc, fill = factor(data_split_seed))) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  scale_fill_manual(values = c("#2E86C1", "#E67E22")) +
  labs(y = "Data Split", x = "Test AUROC",
       title = "Test AUROC by data split (all 2,160 experiments)",
       caption = "1,080 experiments per split.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

Show code

ggplot(auroc_data, aes(y = factor(data_split_seed), x = test_auprc, fill = factor(data_split_seed))) +
  geom_boxplot(alpha = 0.7, outlier.size = 1) +
  geom_vline(xintercept = 0.232, linetype = "dashed", color = "#E74C3C", linewidth = 0.4) +
  scale_fill_manual(values = c("#2E86C1", "#E67E22")) +
  labs(y = "Data Split", x = "Test AUPRC",
       title = "Test AUPRC by data split (all 2,160 experiments)",
       caption = "1,080 experiments per split. Dashed line = prevalence baseline (0.232).") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

No clear separation across splits on either metric.

Narrowed to K3, LR=5e-3, dropout=0.10 (15 experiments per split):

Show code

k3_final <- auroc_data %>%
  filter(kernel_id == "K3", lr == 0.005, dropout_rate == 0.1)

split_wide <- k3_final %>%
  select(seed, class_weight, data_split_seed, test_auroc) %>%
  pivot_wider(names_from = data_split_seed, values_from = test_auroc, names_prefix = "split_")

ggplot(split_wide, aes(x = split_612, y = split_928)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(color = class_weight), size = 3, alpha = 0.8) +
  labs(x = "Test AUROC (split 612)", y = "Test AUROC (split 928)",
       title = "Split 612 vs 928 (K3, LR=5e-3, dropout=0.10, all class weights)",
       caption = "Each point is one seed/class_weight combination.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Show code

split_wide_auprc <- k3_final %>%
  select(seed, class_weight, data_split_seed, test_auprc) %>%
  pivot_wider(names_from = data_split_seed, values_from = test_auprc, names_prefix = "split_")

ggplot(split_wide_auprc, aes(x = split_612, y = split_928)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(color = class_weight), size = 3, alpha = 0.8) +
  labs(x = "Test AUPRC (split 612)", y = "Test AUPRC (split 928)",
       title = "Split 612 vs 928 - AUPRC (K3, LR=5e-3, dropout=0.10, all class weights)",
       caption = "Each point is one seed/class_weight combination.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Points cluster near the diagonal on both metrics, with no systematic advantage for either split.

Show code

split_summary <- k3_final %>%
  group_by(data_split_seed) %>%
  summarise(
    N = n(),
    `Median AUROC` = sprintf("%.3f", median(test_auroc)),
    `SD AUROC` = sprintf("%.3f", sd(test_auroc)),
    `Median AUPRC` = sprintf("%.3f", median(test_auprc)),
    `SD AUPRC` = sprintf("%.3f", sd(test_auprc)),
    .groups = "drop"
  ) %>%
  rename(Split = data_split_seed)

kable(split_summary, booktabs = TRUE,
      caption = "Split comparison for K3, LR=5e-3, dropout=0.10, all class weights.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Split comparison for K3, LR=5e-3, dropout=0.10, all class weights.
Split	N	Median AUROC	SD AUROC	Median AUPRC	SD AUPRC
612	15	0.597	0.007	0.289	0.006
928	15	0.596	0.015	0.294	0.013

Median AUROC is similar across splits (0.597 vs 0.596), although split 928 shows about twice the run-to-run SD (0.015 vs 0.007).
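The split scatter compares matched seed/class_weight pairs, so a paired test is a natural summary. The sketch below copies the 15 pairs from the seed-level table in Section 3.7, assuming those rows are the same runs as `k3_final` (the split medians, 0.597 and 0.596, match), and applies a Wilcoxon signed-rank test. Pairing on seed number is the same assumption the scatter plot makes.

```r
# Test AUROC for K3, LR=5e-3, dropout=0.1, copied from the Section 3.7 seed-level table.
# Row i of each column is one matched seed/class_weight pair across the two splits.
pairs <- data.frame(
  class_weight = rep(c("inverse_freq", "none", "weight_1_2"), each = 5),
  seed         = rep(c(42, 123, 456, 789, 2024), times = 3),
  auroc_612    = c(0.597, 0.584, 0.586, 0.595, 0.601,
                   0.605, 0.605, 0.595, 0.597, 0.595,
                   0.597, 0.594, 0.589, 0.602, 0.605),
  auroc_928    = c(0.582, 0.605, 0.564, 0.587, 0.600,
                   0.597, 0.597, 0.618, 0.574, 0.599,
                   0.619, 0.590, 0.585, 0.581, 0.596)
)

# Positive difference means split 612 scored higher for that seed/class_weight
pairs$diff <- pairs$auroc_612 - pairs$auroc_928

# Wilcoxon signed-rank test; exact = FALSE because some |differences| are tied
split_test <- wilcox.test(pairs$auroc_612, pairs$auroc_928,
                          paired = TRUE, exact = FALSE)

cat(sprintf("Median paired difference: %+.3f | 612 higher in %d of %d pairs | p = %.2f\n",
            median(pairs$diff), sum(pairs$diff > 0), nrow(pairs), split_test$p.value))
```

Split 612 scores higher in 11 of 15 pairs, but the median paired difference is only about 0.008 and the test does not reach conventional significance. With 15 pairs the test has limited power, so this supports "similar" rather than demonstrating equivalence.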

3.7 Candidate Configuration

Configuration selected from preceding sections:

Kernel: K3 (1,122ms RF), highest median AUROC (Section 3.3)
Learning rate: 5e-3 (Section 3.4)
Dropout: 0.10 (Section 3.5)

Class weight is not selected here because it has no meaningful effect on AUROC or AUPRC (Section 3.2). It becomes relevant in Section 3.8, where binary predictions are needed.
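Within the candidate configuration, this can be checked with a rank-based test on the 30 runs listed in the seed-level table of this section. A sketch that treats the 10 runs per class weight (5 seeds x 2 splits) as independent samples, which is only approximately true because they share splits and seeds:

```r
# Test AUROC of the candidate config (K3, LR=5e-3, dropout=0.1) by class weight,
# copied from the seed-level table in this section (10 runs each: 5 seeds x 2 splits)
auroc_by_cw <- list(
  inverse_freq = c(0.597, 0.584, 0.586, 0.595, 0.601, 0.582, 0.605, 0.564, 0.587, 0.600),
  none         = c(0.605, 0.605, 0.595, 0.597, 0.595, 0.597, 0.597, 0.618, 0.574, 0.599),
  weight_1_2   = c(0.597, 0.594, 0.589, 0.602, 0.605, 0.619, 0.590, 0.585, 0.581, 0.596)
)

# Group medians; these should match the candidate summary table (0.591, 0.597, 0.595)
meds <- sapply(auroc_by_cw, median)
print(meds)

# Kruskal-Wallis rank-sum test: do the three strategies differ in location?
kw <- kruskal.test(auroc_by_cw)
print(kw$p.value)
```

The three medians differ by at most 0.006 and the test does not detect a difference; with 10 runs per group it can only rule out large effects.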

Show code

rec_seeds <- master %>%
  filter(cutoff == 0.50, kernel_id == "K3", lr == 0.005, dropout_rate == 0.1)

rec_seed_table <- rec_seeds %>%
  transmute(
    Split = data_split_seed, Seed = seed, `Class Weight` = class_weight,
    Epoch = best_epoch,
    AUROC = sprintf("%.3f", test_auroc),
    AUPRC = sprintf("%.3f", test_auprc)
  ) %>%
  arrange(Split, `Class Weight`, Seed)

kable(rec_seed_table, booktabs = TRUE,
      caption = "Candidate config (K3, LR=5e-3, dropout=0.1) across all seed/split/class_weight combinations. Test set.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Candidate config (K3, LR=5e-3, dropout=0.1) across all seed/split/class_weight combinations. Test set.
Split	Seed	Class Weight	Epoch	AUROC	AUPRC
612	42	inverse_freq	4	0.597	0.292
612	123	inverse_freq	6	0.584	0.287
612	456	inverse_freq	3	0.586	0.287
612	789	inverse_freq	2	0.595	0.293
612	2024	inverse_freq	8	0.601	0.297
612	42	none	2	0.605	0.294
612	123	none	5	0.605	0.299
612	456	none	2	0.595	0.288
612	789	none	2	0.597	0.285
612	2024	none	1	0.595	0.283
612	42	weight_1_2	2	0.597	0.291
612	123	weight_1_2	5	0.594	0.281
612	456	weight_1_2	6	0.589	0.279
612	789	weight_1_2	2	0.602	0.296
612	2024	weight_1_2	2	0.605	0.289
928	42	inverse_freq	4	0.582	0.271
928	123	inverse_freq	1	0.605	0.299
928	456	inverse_freq	10	0.564	0.302
928	789	inverse_freq	5	0.587	0.286
928	2024	inverse_freq	2	0.600	0.296
928	42	none	7	0.597	0.308
928	123	none	1	0.597	0.290
928	456	none	1	0.618	0.305
928	789	none	5	0.574	0.279
928	2024	none	3	0.599	0.306
928	42	weight_1_2	1	0.619	0.323
928	123	weight_1_2	2	0.590	0.294
928	456	weight_1_2	1	0.585	0.283
928	789	weight_1_2	3	0.581	0.292
928	2024	weight_1_2	2	0.596	0.281

Show code

rec_summary <- rec_seeds %>%
  group_by(`Class Weight` = class_weight) %>%
  summarise(
    N = n(),
    `Median AUROC` = sprintf("%.3f", median(test_auroc)),
    `SD` = sprintf("%.3f", sd(test_auroc)),
    `IQR` = sprintf("%.3f - %.3f", quantile(test_auroc, 0.25), quantile(test_auroc, 0.75)),
    `Min` = sprintf("%.3f", min(test_auroc)),
    `Max` = sprintf("%.3f", max(test_auroc)),
    .groups = "drop"
  )

kable(rec_summary, booktabs = TRUE,
      caption = "Candidate config test AUROC summary (K3, LR=5e-3, dropout=0.1).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Candidate config test AUROC summary (K3, LR=5e-3, dropout=0.1).
Class Weight	N	Median AUROC	SD	IQR	Min	Max
inverse_freq	10	0.591	0.012	0.584 - 0.599	0.564	0.605
none	10	0.597	0.011	0.596 - 0.603	0.574	0.618
weight_1_2	10	0.595	0.011	0.589 - 0.601	0.581	0.619

Show code

rec_summary_auprc <- rec_seeds %>%
  group_by(`Class Weight` = class_weight) %>%
  summarise(
    N = n(),
    `Median AUPRC` = sprintf("%.3f", median(test_auprc)),
    `SD` = sprintf("%.3f", sd(test_auprc)),
    `IQR` = sprintf("%.3f - %.3f", quantile(test_auprc, 0.25), quantile(test_auprc, 0.75)),
    `Min` = sprintf("%.3f", min(test_auprc)),
    `Max` = sprintf("%.3f", max(test_auprc)),
    .groups = "drop"
  )

kable(rec_summary_auprc, booktabs = TRUE,
      caption = "Candidate config test AUPRC summary (K3, LR=5e-3, dropout=0.1). Prevalence baseline = 0.232.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Candidate config test AUPRC summary (K3, LR=5e-3, dropout=0.1). Prevalence baseline = 0.232.
Class Weight	N	Median AUPRC	SD	IQR	Min	Max
inverse_freq	10	0.292	0.009	0.287 - 0.297	0.271	0.302
none	10	0.292	0.010	0.286 - 0.303	0.279	0.308
weight_1_2	10	0.290	0.013	0.281 - 0.294	0.279	0.323

Best (early-stopping) epoch for each individual seed-level run:

Show code

epoch_data <- rec_seeds %>%
  mutate(split_label = factor(data_split_seed))

ggplot(epoch_data, aes(x = best_epoch, y = split_label, color = class_weight)) +
  geom_jitter(size = 3, height = 0.15, alpha = 0.8) +
  labs(x = "Best Epoch (early stopping)", y = "Data Split", color = "Class Weight",
       title = "Convergence epoch for candidate config across seeds",
       caption = "K3, LR=5e-3, dropout=0.1.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

3.7.1 Validation Set Performance

The validation set (132 paired patients with both CXR and EKG) is held out from specialist training. These are the patients that will be used in Phase 3 warm fusion.

Show code

rec_val_table <- rec_seeds %>%
  transmute(
    Split = data_split_seed, Seed = seed, `Class Weight` = class_weight,
    `Val AUROC` = sprintf("%.3f", val_auroc),
    `Val AUPRC` = sprintf("%.3f", val_auprc)
  ) %>%
  arrange(Split, `Class Weight`, Seed)

kable(rec_val_table, booktabs = TRUE,
      caption = "Candidate config (K3, LR=5e-3, dropout=0.1) validation set performance across all seed/split/class_weight combinations.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Candidate config (K3, LR=5e-3, dropout=0.1) validation set performance across all seed/split/class_weight combinations.
Split	Seed	Class Weight	Val AUROC	Val AUPRC
612	42	inverse_freq	0.617	0.347
612	123	inverse_freq	0.605	0.342
612	456	inverse_freq	0.614	0.329
612	789	inverse_freq	0.602	0.336
612	2024	inverse_freq	0.592	0.317
612	42	none	0.601	0.320
612	123	none	0.579	0.317
612	456	none	0.583	0.319
612	789	none	0.590	0.304
612	2024	none	0.599	0.339
612	42	weight_1_2	0.608	0.350
612	123	weight_1_2	0.611	0.342
612	456	weight_1_2	0.603	0.318
612	789	weight_1_2	0.585	0.302
612	2024	weight_1_2	0.599	0.319
928	42	inverse_freq	0.575	0.319
928	123	inverse_freq	0.589	0.327
928	456	inverse_freq	0.566	0.290
928	789	inverse_freq	0.579	0.278
928	2024	inverse_freq	0.581	0.304
928	42	none	0.602	0.330
928	123	none	0.587	0.302
928	456	none	0.592	0.292
928	789	none	0.608	0.291
928	2024	none	0.578	0.317
928	42	weight_1_2	0.579	0.307
928	123	weight_1_2	0.588	0.309
928	456	weight_1_2	0.561	0.290
928	789	weight_1_2	0.569	0.310
928	2024	weight_1_2	0.581	0.370

Show code

rec_compare <- rec_seeds %>%
  group_by(class_weight) %>%
  summarise(
    N = n(),
    `Test AUROC` = sprintf("%.3f (%.3f)", median(test_auroc), sd(test_auroc)),
    `Val AUROC` = sprintf("%.3f (%.3f)", median(val_auroc), sd(val_auroc)),
    `Test AUPRC` = sprintf("%.3f (%.3f)", median(test_auprc), sd(test_auprc)),
    `Val AUPRC` = sprintf("%.3f (%.3f)", median(val_auprc), sd(val_auprc)),
    .groups = "drop"
  ) %>%
  rename(`Class Weight` = class_weight)

kable(rec_compare, booktabs = TRUE,
      caption = "Candidate config: test vs validation comparison. Median (SD) across seeds and splits.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Candidate config: test vs validation comparison. Median (SD) across seeds and splits.
Class Weight	N	Test AUROC	Val AUROC	Test AUPRC	Val AUPRC
inverse_freq	10	0.591 (0.012)	0.590 (0.017)	0.292 (0.009)	0.323 (0.022)
none	10	0.597 (0.011)	0.591 (0.010)	0.292 (0.010)	0.317 (0.016)
weight_1_2	10	0.595 (0.011)	0.586 (0.017)	0.290 (0.013)	0.314 (0.025)

Show code

ggplot(rec_seeds, aes(x = test_auroc, y = val_auroc, color = class_weight)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(shape = factor(data_split_seed)), size = 3, alpha = 0.8) +
  labs(x = "Test AUROC", y = "Validation AUROC", color = "Class Weight", shape = "Split",
       title = "Test vs validation AUROC for candidate config",
       caption = "K3, LR=5e-3, dropout=0.1. Each point is one seed.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Show code

ggplot(rec_seeds, aes(x = test_auprc, y = val_auprc, color = class_weight)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(aes(shape = factor(data_split_seed)), size = 3, alpha = 0.8) +
  labs(x = "Test AUPRC", y = "Validation AUPRC", color = "Class Weight", shape = "Split",
       title = "Test vs validation AUPRC for candidate config",
       caption = "K3, LR=5e-3, dropout=0.1. Each point is one seed.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

3.8 Clinical Utility: Binary Predictions and Operating Points

Sections 3.1-3.7 used threshold-independent metrics (AUROC, AUPRC). This section examines binary prediction behavior, which depends on both the decision cutoff and the class weight strategy.
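AUROC and AUPRC depend only on how the scores rank patients, so any strictly increasing transformation of the scores leaves them unchanged, while the fraction of patients above a fixed cutoff does change. A small illustration with hypothetical scores (not model outputs); the shift by log(3) on the log-odds scale is an assumed stand-in for the effect of upweighting the positive class:

```r
# Rank-based AUROC: Mann-Whitney U / (n_pos * n_neg); tied scores get half credit via average ranks
auroc <- function(scores, labels) {
  r     <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical toy scores and labels, for illustration only
labels <- c(0, 0, 1, 1, 0, 1)
scores <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.70)

# Strictly increasing shift on the log-odds scale (assumed stand-in for class weighting)
shifted <- plogis(qlogis(scores) + log(3))

print(auroc(scores, labels))   # ranking unchanged by the shift...
print(auroc(shifted, labels))
print(sum(scores >= 0.5))      # ...but binary calls at cutoff 0.50 change
print(sum(shifted >= 0.5))
```

Both AUROC values are 8/9, but the shift doubles the number of scores at or above 0.50, from 2 to 4.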

3.8.1 Prediction Collapse at Cutoff 0.50

Show code

collapse_all <- master %>%
  group_by(cutoff, class_weight) %>%
  summarise(
    N = n(),
    all_neg_pct = 100 * mean(test_sensitivity == 0),
    all_pos_pct = 100 * mean(test_specificity == 0),
    balanced_pct = 100 * mean(test_sensitivity > 0 & test_specificity > 0),
    .groups = "drop"
  )

collapse_wide <- collapse_all %>%
  transmute(
    Cutoff = sprintf("%.2f", cutoff),
    `Class Weight` = class_weight,
    `All-Negative (Sens=0)` = sprintf("%.1f%%", all_neg_pct),
    `All-Positive (Spec=0)` = sprintf("%.1f%%", all_pos_pct),
    Balanced = sprintf("%.1f%%", balanced_pct)
  )

kable(collapse_wide, booktabs = TRUE,
      caption = "Prediction collapse rates by cutoff and class weight (2,160 experiments per cutoff).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  collapse_rows(columns = 1, valign = "top")

Prediction collapse rates by cutoff and class weight (2,160 experiments per cutoff).
Cutoff	Class Weight	All-Negative (Sens=0)	All-Positive (Spec=0)	Balanced
0.15	inverse_freq	0.0%	99.6%	0.4%
	none	0.0%	47.4%	52.6%
	weight_1_2	0.0%	91.7%	8.3%
0.25	inverse_freq	0.0%	92.5%	7.5%
	none	0.0%	10.0%	90.0%
	weight_1_2	0.0%	59.2%	40.8%
0.35	inverse_freq	0.0%	65.0%	35.0%
	none	24.3%	3.6%	72.1%
	weight_1_2	0.0%	11.5%	88.5%
0.50	inverse_freq	0.0%	1.1%	98.9%
	none	98.3%	0.0%	1.7%
	weight_1_2	39.4%	0.1%	60.4%

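Collapse depends on both the cutoff and the class weight. At cutoff 0.50, 98.3% of none runs and 39.4% of weight_1_2 runs predict every patient negative, while inverse_freq yields balanced predictions in 98.9% of runs. Lowering the cutoff removes all-negative collapse but introduces the opposite failure: at 0.15, 99.6% of inverse_freq runs and 91.7% of weight_1_2 runs predict every patient positive. Each strategy therefore has its own usable cutoff range, and upweighting the positive class moves that range upward.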
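A minimal sketch of why the usable cutoff moves with the class weight. It assumes each strategy acts as a positive-class weight w in binary cross-entropy, that inverse_freq corresponds to w = (1 - prevalence) / prevalence, and that weight_1_2 means w = 2; these definitions are assumptions, not taken from the training code. Under those assumptions, the loss-minimizing output for a patient with true risk p satisfies logit(q) = logit(p) + log(w):

```r
# Loss-minimizing output of weighted binary cross-entropy (positive weight w,
# negative weight 1) for a patient whose true PE probability is p
weighted_prob <- function(p, w) plogis(qlogis(p) + log(w))

prev <- 143 / 613  # test-set PE prevalence (about 0.233)

# Assumed positive:negative weight ratios for the three strategies
w <- c(none         = 1,
       weight_1_2   = 2,                 # assumption: positives weighted 2x
       inverse_freq = (1 - prev) / prev) # assumption: balances the two classes (about 3.3x)

# Idealized output for a patient whose risk equals the base rate
centers <- setNames(weighted_prob(prev, w), names(w))
print(round(centers, 3))
```

For a patient at base-rate risk, the idealized outputs are about 0.23 (none), 0.38 (weight_1_2), and 0.50 (inverse_freq), which lines up with the cutoffs at which each strategy is most often balanced in the table above (0.25, 0.35, and 0.50). Trained networks are not at this population optimum, so treat this as intuition rather than an explanation of the exact rates.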

The cutoff analysis in Section 3.8.3 uses the inverse_freq model, for which cutoff 0.50 gives balanced predictions in 98.9% of runs, as its main operating point; the table above shows the full picture across strategies and cutoffs.

3.8.2 Representative Confusion Matrices by Kernel

For each kernel, one model was selected from a fixed combination (inverse_freq, LR=5e-3, dropout=0.1, split 612) by choosing the seed with median test AUROC.

Show code

rep_models <- auroc_data %>%
  filter(class_weight == "inverse_freq", lr == 0.005, dropout_rate == 0.1,
         data_split_seed == 612) %>%
  group_by(kernel_id) %>%
  mutate(dist_from_median = abs(test_auroc - median(test_auroc))) %>%
  slice_min(dist_from_median, n = 1, with_ties = FALSE) %>%
  ungroup()

Show code

combo <- data.frame(` ` = c("Actual PE+", "Actual PE-", "Total"), check.names = FALSE)
header_spec <- c(" " = 1)

for (i in 1:nrow(rep_models)) {
  r <- rep_models[i, ]
  combo[[paste0("Pred+_", i)]] <- c(r$test_tp, r$test_fp, r$test_tp + r$test_fp)
  combo[[paste0("Pred-_", i)]] <- c(r$test_fn, r$test_tn, r$test_fn + r$test_tn)
  label <- sprintf("K%d (%dms)", r$kernel_idx - 1, r$rf_ms)
  header_spec[label] <- 2
}

names(combo) <- c(" ", rep(c("Pred+", "Pred-"), nrow(rep_models)))

kable(combo, booktabs = TRUE, align = c("l", rep("r", ncol(combo) - 1)),
      caption = "Representative 2x2 confusion matrices by kernel at cutoff 0.50 (inverse_freq, LR=5e-3, dropout=0.1, split 612, median seed). Test set (n=613).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  add_header_above(header_spec) %>%
  column_spec(c(3, 5, 7, 9, 11, 13), border_right = TRUE)

Representative 2x2 confusion matrices by kernel at cutoff 0.50 (inverse_freq, LR=5e-3, dropout=0.1, split 612, median seed). Test set (n=613).
	K0 (82ms)		K1 (84ms)		K2 (160ms)		K3 (1122ms)		K4 (1490ms)		K5 (2598ms)
	Pred+	Pred-	Pred+	Pred-	Pred+	Pred-	Pred+	Pred-	Pred+	Pred-	Pred+	Pred-
Actual PE+	107	36	70	73	89	54	70	73	76	67	106	37
Actual PE-	288	182	184	286	226	244	163	307	192	278	306	164
Total	395	218	254	359	315	298	233	380	268	345	412	201

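At cutoff 0.50, sensitivity and specificity vary widely across kernels and do not follow the AUROC ranking. K0 and K5 favor sensitivity (74.8% and 74.1%) at the cost of specificity (38.7% and 34.9%), while K1 and K3 favor specificity (60.9% and 65.3%) with 49.0% sensitivity; K2 and K4 fall in between. Because each matrix is a single run at a fixed cutoff, these differences mostly reflect where each model's predicted probabilities fall relative to 0.50 rather than differences in ranking quality.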
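These operating characteristics follow directly from the counts in the table above. A sketch that recomputes them; returning NA for an empty denominator is a convention chosen here for illustration, not necessarily what the metrics pipeline does:

```r
# Counts from the representative confusion matrices above (test set, n = 613)
cm <- data.frame(
  kernel = c("K0", "K1", "K2", "K3", "K4", "K5"),
  tp = c(107,  70,  89,  70,  76, 106),
  fn = c( 36,  73,  54,  73,  67,  37),
  fp = c(288, 184, 226, 163, 192, 306),
  tn = c(182, 286, 244, 307, 278, 164)
)

# Ratio that returns NA instead of a misleading value when the denominator is zero
safe_ratio <- function(num, den) ifelse(den == 0, NA_real_, num / den)

cm$sensitivity <- safe_ratio(cm$tp, cm$tp + cm$fn)
cm$specificity <- safe_ratio(cm$tn, cm$tn + cm$fp)
cm$ppv         <- safe_ratio(cm$tp, cm$tp + cm$fp)
cm$npv         <- safe_ratio(cm$tn, cm$tn + cm$fn)

# Sanity check: every matrix should cover all 613 test patients
stopifnot(all(cm$tp + cm$fn + cm$fp + cm$tn == 613))

print(cm[, c("kernel", "sensitivity", "specificity", "ppv", "npv")], digits = 3)
```

For K3 this reproduces the cutoff-0.50 row of the candidate model's test-set table in Section 3.8.3 (49.0% sensitivity, 65.3% specificity, 30.0% PPV, 80.8% NPV).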

3.8.3 Cutoff Analysis for Candidate Model

The candidate model (K3, inverse_freq, LR=5e-3, dropout=0.1, split 612, median seed) is evaluated at four decision cutoffs.

Show code

rec_config <- master %>%
  filter(kernel_id == "K3", class_weight == "inverse_freq", lr == 0.005,
         dropout_rate == 0.1, data_split_seed == 612)

med_seed <- rec_config %>%
  filter(cutoff == 0.50) %>%
  mutate(dist = abs(test_auroc - median(test_auroc))) %>%
  slice_min(dist, n = 1, with_ties = FALSE) %>%
  pull(seed)

rec_all_cutoffs <- rec_config %>% filter(seed == med_seed)

Show code

cutoff_long <- rec_all_cutoffs %>%
  select(cutoff, test_sensitivity, test_specificity) %>%
  pivot_longer(cols = c(test_sensitivity, test_specificity),
               names_to = "metric", values_to = "value") %>%
  mutate(metric = ifelse(metric == "test_sensitivity", "Sensitivity", "Specificity"))

ggplot(cutoff_long, aes(x = cutoff, y = value, color = metric)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 1)) +
  scale_color_manual(values = c("Sensitivity" = "#E74C3C", "Specificity" = "#2E86C1")) +
  labs(x = "Decision Cutoff", y = NULL, color = NULL,
       title = "Sensitivity-specificity tradeoff across cutoffs (test set, n=613)",
       caption = "K3, inverse_freq, LR=5e-3, dropout=0.1, split 612, median seed.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))

Test Set (613 EKG-only patients):

Show code

cutoff_summary <- rec_all_cutoffs %>%
  transmute(
    Cutoff = sprintf("%.2f", cutoff),
    Sensitivity = sprintf("%.1f%%", 100 * test_sensitivity),
    Specificity = sprintf("%.1f%%", 100 * test_specificity),
    PPV = sprintf("%.1f%%", 100 * test_precision),
    NPV = sprintf("%.1f%%", 100 * test_npv),
    F1 = sprintf("%.3f", test_f1),
    TP = test_tp, FP = test_fp, FN = test_fn, TN = test_tn
  )

kable(cutoff_summary, booktabs = TRUE,
      caption = "Cutoff analysis for candidate K3 model (test set, 613 patients).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Cutoff analysis for candidate K3 model (test set, 613 patients).
Cutoff	Sensitivity	Specificity	PPV	NPV	F1	TP	FP	FN	TN
0.15	100.0%	0.0%	23.3%	0.0%	0.378	143	470	0	0
0.25	100.0%	0.0%	23.3%	0.0%	0.378	143	470	0	0
0.35	100.0%	0.2%	23.4%	100.0%	0.379	143	469	0	1
0.50	49.0%	65.3%	30.0%	80.8%	0.372	70	163	73	307

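At cutoffs 0.15 and 0.25 the model labels every test patient positive (FN = TN = 0), so NPV is undefined and the 0.0% shown is a placeholder rather than a measured value; the 100.0% NPV at 0.35 rests on a single true negative. Only cutoff 0.50 gives a usable operating point for this model.

Validation Set (132 paired patients):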

Show code

rec_config_val <- master %>%
  filter(kernel_id == "K3", class_weight == "inverse_freq", lr == 0.005,
         dropout_rate == 0.1, data_split_seed == 612, seed == med_seed)

cutoff_long_val <- rec_config_val %>%
  select(cutoff, val_sensitivity, val_specificity) %>%
  pivot_longer(cols = c(val_sensitivity, val_specificity),
               names_to = "metric", values_to = "value") %>%
  mutate(metric = ifelse(metric == "val_sensitivity", "Sensitivity", "Specificity"))

ggplot(cutoff_long_val, aes(x = cutoff, y = value, color = metric)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 1)) +
  scale_color_manual(values = c("Sensitivity" = "#E74C3C", "Specificity" = "#2E86C1")) +
  labs(x = "Decision Cutoff", y = NULL, color = NULL,
       title = "Sensitivity-specificity tradeoff across cutoffs (validation set, n=132)",
       caption = "K3, inverse_freq, LR=5e-3, dropout=0.1, split 612, median seed.") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))

Show code

cutoff_summary_val <- rec_config_val %>%
  transmute(
    Cutoff = sprintf("%.2f", cutoff),
    Sensitivity = sprintf("%.1f%%", 100 * val_sensitivity),
    Specificity = sprintf("%.1f%%", 100 * val_specificity),
    PPV = sprintf("%.1f%%", 100 * val_precision),
    NPV = sprintf("%.1f%%", 100 * val_npv),
    F1 = sprintf("%.3f", val_f1),
    TP = val_tp, FP = val_fp, FN = val_fn, TN = val_tn
  )

kable(cutoff_summary_val, booktabs = TRUE,
      caption = "Cutoff analysis for candidate K3 model (validation set, 132 patients).") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Cutoff analysis for candidate K3 model (validation set, 132 patients).
Cutoff	Sensitivity	Specificity	PPV	NPV	F1	TP	FP	FN	TN
0.15	100.0%	0.0%	23.5%	0.0%	0.380	31	101	0	0
0.25	100.0%	0.0%	23.5%	0.0%	0.380	31	101	0	0
0.35	100.0%	0.0%	23.5%	0.0%	0.380	31	101	0	0
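0.50	61.3%	57.4%	30.6%	82.9%	0.409	19	43	12	58

The validation set shows the same pattern: every patient is labeled positive at cutoffs 0.15-0.35 (NPV undefined), and only cutoff 0.50 separates the classes, with 61.3% sensitivity and 57.4% specificity versus 49.0% and 65.3% on the test set.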
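Because the validation set has only 31 PE+ and 101 PE- patients, its sensitivity and specificity are much less precise than the test-set values. A sketch using exact (Clopper-Pearson) intervals from base R's `binom.test` on the cutoff-0.50 counts from the test and validation tables; it treats patients as independent and ignores seed-to-seed variation:

```r
# Exact 95% binomial CIs at cutoff 0.50 for the candidate model
sens_test <- binom.test(70, 70 + 73)     # TP out of TP + FN (test)
sens_val  <- binom.test(19, 19 + 12)     # TP out of TP + FN (validation)
spec_test <- binom.test(307, 307 + 163)  # TN out of TN + FP (test)
spec_val  <- binom.test(58, 58 + 43)     # TN out of TN + FP (validation)

ci <- rbind(
  sens_test = sens_test$conf.int, sens_val = sens_val$conf.int,
  spec_test = spec_test$conf.int, spec_val = spec_val$conf.int
)
colnames(ci) <- c("lower", "upper")
print(round(ci, 3))
```

The validation sensitivity interval spans more than 30 percentage points and contains the test-set estimate of 49.0%, so the gap between 61.3% and 49.0% sensitivity is consistent with sampling noise for 31 positive cases.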

3.9 Validation vs Test Agreement

Show code

ggplot(auroc_data, aes(x = test_auroc, y = val_auroc, color = kernel_id)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(alpha = 0.5, size = 1.5) +
  scale_color_brewer(palette = "Set2", name = "Kernel") +
  labs(x = "Test AUROC", y = "Validation AUROC",
       title = "Validation vs test AUROC (all 2,160 experiments)",
       caption = "Dashed line = perfect agreement.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Show code

ggplot(auroc_data, aes(x = test_auprc, y = val_auprc, color = kernel_id)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(alpha = 0.5, size = 1.5) +
  scale_color_brewer(palette = "Set2", name = "Kernel") +
  labs(x = "Test AUPRC", y = "Validation AUPRC",
       title = "Validation vs test AUPRC (all 2,160 experiments)",
       caption = "Dashed line = perfect agreement.") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

Show code

cor_val <- cor(auroc_data$test_auroc, auroc_data$val_auroc, method = "spearman")
cat(sprintf("Spearman correlation between test and validation AUROC: %.3f\n", cor_val))

Spearman correlation between test and validation AUROC: -0.199

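The Spearman correlation is slightly negative (ρ = -0.199). This is not a major concern: the validation set is small (132 patients, 31 PE+), so each model's validation AUROC is noisy relative to the narrow spread of AUROC across configurations. It does mean that validation AUROC alone should not be used to rank configurations.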
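To gauge how noisy a single validation AUROC is, the Hanley-McNeil approximation gives a standard error from the AUROC and the class counts. A sketch at AUROC = 0.59, using the class counts from the candidate model's tables (31 PE+ / 101 PE- on validation, 143 / 470 on test); the formula rests on a distributional approximation and is a rough guide only:

```r
# Approximate standard error of an AUROC estimate (Hanley & McNeil, 1982)
auroc_se <- function(auc, n_pos, n_neg) {
  q1 <- auc / (2 - auc)
  q2 <- 2 * auc^2 / (1 + auc)
  sqrt((auc * (1 - auc) +
        (n_pos - 1) * (q1 - auc^2) +
        (n_neg - 1) * (q2 - auc^2)) / (n_pos * n_neg))
}

# Class counts taken from the candidate model's validation and test tables
se_val  <- auroc_se(0.59, n_pos = 31,  n_neg = 101)
se_test <- auroc_se(0.59, n_pos = 143, n_neg = 470)

print(round(c(validation = se_val, test = se_test), 3))
```

The validation standard error (about 0.06) is roughly twice the test-set value (about 0.028) and several times larger than the differences between configurations discussed above, such as the 0.012 spread across learning rates.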

3.10 Summary

Show code

bottom <- tibble(
  Item = c("Candidate kernel",
           "Class weight effect on AUROC",
           "Learning rate",
           "Dropout",
           "Test AUROC (median, all class weights)",
           "Test AUPRC (median, all class weights)",
           "Binary predictions at cutoff 0.50",
           "Next step"),
  Finding = c("K3 (1,122ms receptive field); significantly outperforms K0-K2, comparable to K4-K5",
              "No effect; all three strategies produce similar AUROC/AUPRC",
              "Stable across tested range (1e-5 to 5e-3); 5e-3 carried forward",
              "Stable (0.1 vs 0.2); 0.1 carried forward",
              "~0.590",
              "~0.290 (~1.3x prevalence)",
              "Requires inverse_freq weighting to avoid collapse; other strategies need lower cutoff",
              "Phase 3: warm fusion with CXR features using paired data")
)

kable(bottom, booktabs = TRUE, caption = "Summary of findings from 2,160 EKG Specialist V3 experiments.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE, width = "5cm") %>%
  column_spec(2, width = "9cm")

Summary of findings from 2,160 EKG Specialist V3 experiments.
Item	Finding
Candidate kernel	K3 (1,122ms receptive field); significantly outperforms K0-K2, comparable to K4-K5
Class weight effect on AUROC	No effect; all three strategies produce similar AUROC/AUPRC
Learning rate	Stable across tested range (1e-5 to 5e-3); 5e-3 carried forward
Dropout	Stable (0.1 vs 0.2); 0.1 carried forward
Test AUROC (median, all class weights)	~0.590
Test AUPRC (median, all class weights)	~0.290 (~1.3x prevalence)
Binary predictions at cutoff 0.50	Requires inverse_freq weighting to avoid collapse; other strategies need lower cutoff
Next step	Phase 3: warm fusion with CXR features using paired data

Discussion. Kernel size is the primary hyperparameter affecting AUROC: K3 (1,122ms) significantly outperforms K0-K2, while K4-K5 show no further gain. Class weight, learning rate, and dropout do not meaningfully affect AUROC or AUPRC within the tested ranges. Class weight matters only for binary predictions: at cutoff 0.50, inverse_freq is the only strategy that reliably produces non-degenerate predictions (98.9% of runs, versus 60.4% for weight_1_2 and 1.7% for none), but this reflects the cutoff choice, not the model's ranking quality. The EKG specialist achieves ~0.59 AUROC, which is above chance but limited. The candidate configuration (K3, LR=5e-3, dropout=0.1) will proceed to Phase 3 warm fusion, where the EKG and CXR specialist weights are fine-tuned jointly on paired CXR+EKG data.
