Notation: \(n_i, i = 1, ..., 4\) = the number of total samples that have stage \(i\) in the training data
\(y_i, i = 1, ..., 4\) = the number of positive calls among patients in the stage \(i\) category
\(w_i, i = 1, ..., 4\) = the stage-weighted multiplier for stage \(i\); i.e. \((w_1, w_2, w_3, w_4) = (0.547, 0.075, 0.218, 0.16)\)
Per training donor, with \(n = \sum_{i = 1}^{4} n_i\) the total number of training samples, define
\(c_j, j = 1, ..., n\) = the call for sample \(j\) (1 if positive, 0 otherwise)
\(s_j, j = 1, ..., n\) = the stage of sample \(j\)
Then, \(y_i = \sum_{j = 1}^n c_j \boldsymbol{1}\left(s_j = i \right)\)
Blended sensitivity can be expressed as:
\[ \sum_{i = 1}^{4} \left(\frac{w_i y_i}{n_i} \right) \]
Weighted sensitivity can be expressed as:
\[ \sum_{j = 1}^{n} \left(\frac{w_{s_j}}{\sum_{k = 1}^{n} w_{s_k}} \right) c_j \\ = \sum_{i = 1}^{4} \left(\frac{w_{i}}{\sum_{k = 1}^{n} w_{s_k}} \right) \sum_{j = 1}^{n}c_j \boldsymbol{1}\left(s_j = i \right) \\ = \sum_{i = 1}^{4} \left(\frac{w_{i} y_i}{\sum_{k = 1}^{n} w_{s_k}} \right) \] The difference between the two estimates lies in the denominator within the sum: \(n_i\) vs \(\sum_{k = 1}^{n} w_{s_k} = \sum_{i = 1}^{4} n_i w_i\).
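As a minimal sketch of the two definitions, the base R snippet below evaluates both formulas on a small made-up data set. The vectors `s` and `calls` are hypothetical and only serve to illustrate the notation.

```r
# Hypothetical toy data: stage (1-4) and call (1 = positive) for 10 training samples
s     <- c(1, 1, 1, 1, 1, 2, 3, 3, 4, 4)
calls <- c(1, 0, 1, 0, 1, 1, 1, 1, 1, 1)
w     <- c(0.547, 0.075, 0.218, 0.16)

n_i <- tabulate(s, nbins = 4)              # samples per stage
y_i <- tabulate(s[calls == 1], nbins = 4)  # positive calls per stage

# Blended: each stage-wise sensitivity y_i / n_i is weighted by w_i
blended <- sum(w * y_i / n_i)

# Weighted: every sample carries the weight of its stage, normalised over all samples
weighted <- sum(w * y_i) / sum(w[s])

c(blended = blended, weighted = weighted)
```

The weighted estimate is the same quantity that `stats::weighted.mean(calls, w[s])` returns, which is how it is computed later in this document.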
We can define the ‘true’ sensitivity as what we would observe in an infinitely large validation set with stage-wise proportion of samples equal to the stage-weights \(w_i\): \[ w_i = \lim_{n \to \infty}\frac{n_i}{n} \\ = \lim_{n \to \infty}\frac{ \sum_{j = 1}^{n} \boldsymbol{1}\left(s_j = i \right)}{n} \\ = \lim_{n \to \infty}\frac{ \sum_{j = 1}^{n} \boldsymbol{1}\left(s_j = i \right)}{\sum_{i = 1}^{4} \sum_{j = 1}^{n}\boldsymbol{1}\left(s_j = i \right)} \text{ for } i= 1,...,4 \] We can show that true sensitivity = \(\lim_{n \to \infty} \frac{\sum_{j = 1}^n c_j}{n}\)
Let \(p_i, i = 1, ..., 4\) = the true stage-wise sensitivities in this infinitely large validation set: \[ p_i = \lim_{n \to \infty} \frac{\sum_{j = 1}^{n}c_j \boldsymbol{1}\left(s_j = i \right)}{\sum_{j = 1}^{n} \boldsymbol{1}\left(s_j = i \right)} \\ = \lim_{n \to \infty} \frac{y_i}{n_i} \text{ for } i= 1,...,4 \] Then, the true validation sensitivity is equal to \(\sum_{i = 1}^4 p_i w_i\), exactly how we calculate blended sensitivity. To see that in detail: \[ \text{True sensitivity }= \lim_{n \to \infty} \frac{\sum_{j = 1}^n c_j}{n} \\ = \lim_{n \to \infty} \frac{\sum_{i = 1}^{4} \sum_{j = 1}^{n}c_j \boldsymbol{1}\left(s_j = i \right)}{\sum_{i = 1}^{4} \sum_{j = 1}^{n}\boldsymbol{1}\left(s_j = i \right)} \\ = \lim_{n \to \infty} \sum_{i = 1}^{4} \left( \frac{\sum_{j = 1}^{n}c_j \boldsymbol{1}\left(s_j = i \right)}{\sum_{j = 1}^{n} \boldsymbol{1}\left(s_j = i \right)}\right) \left(\frac{ \sum_{j = 1}^{n} \boldsymbol{1}\left(s_j = i \right)}{\sum_{i = 1}^{4} \sum_{j = 1}^{n}\boldsymbol{1}\left(s_j = i \right)}\right) \\ = \sum_{i = 1}^{4} p_i \times w_i \]
It therefore seems advantageous to always use blended sensitivity in cross-validation, given the assumption that the training and test data are drawn from a common distribution: in expectation, blended sensitivity equals true validation sensitivity.
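We can also compare weighted and blended sensitivity term by term. Write \(D = \sum_{j = 1}^{n} w_{s_j}\) for the denominator of weighted sensitivity. The stage \(i^{'}\) term of weighted sensitivity is less than or equal to the stage \(i^{'}\) term of blended sensitivity iff:
\[ \sum_{j = 1}^{n} w_{s_j} \geq n_{i^{'}} \\ \therefore \sum_{i = 1}^{4} \sum_{j = 1}^{n} w_{s_j} \boldsymbol{1}\left(s_j = i \right) \geq n_{i^{'}} \\ \therefore \sum_{i = 1}^{4} n_i w_i \geq n_{i^{'}} \\ \therefore \sum_{i = 1}^{4} \left(\frac{n_i}{n}\right) w_i \geq \frac{n_{i^{'}}}{n} \] The left-hand side is a \(w\)-weighted average of the stage proportions \(\frac{n_i}{n}\), so it lies between the smallest and the largest of them. The inequality can therefore hold for every \(i^{'}\) at once only when \(\frac{n_i}{n} = 1/4\) for all \(i = 1, ..., 4\), and then it holds with equality, so weighted sensitivity = blended sensitivity. Otherwise, weighted sensitivity places weight \(\frac{w_i n_i}{D}\) instead of \(w_i\) on the stage-wise sensitivity \(\frac{y_i}{n_i}\), so stages that are over-represented in the training data (\(n_i > D\)) are over-weighted and the rest are under-weighted. When the over-weighted stages are the ones with the lowest sensitivity, as for Stage I in the simulations below, weighted sensitivity underestimates the true sensitivity given our assumptions. Given the above notation: blended sensitivity - weighted sensitivity = \[ \sum_{i = 1}^{4} \left(\frac{w_i y_i}{n_i} \right) \left(\frac{\sum_{j = 1}^{n} w_{s_j} - n_i}{\sum_{j = 1}^{n} w_{s_j}} \right) \] The negative of this difference also gives the expected bias of weighted sensitivity compared to true validation sensitivity.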
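The comparison of denominators can be checked numerically. The sketch below assumes hypothetical stage counts `n_i` and positive calls `y_i` for a training set of 1000 samples (they are not taken from the simulations in this document) and shows the shared denominator, the weight each estimator effectively puts on the stage-wise sensitivities, and the equal-count case.

```r
w <- c(0.547, 0.075, 0.218, 0.16)

# Hypothetical stage counts and positive calls (n = 1000)
n_i <- c(550, 70, 220, 160)
y_i <- c(385, 56, 187, 152)

D <- sum(n_i * w)            # denominator of weighted sensitivity
implied_w <- w * n_i / D     # weight that weighted sensitivity puts on y_i / n_i

blended  <- sum(w * y_i / n_i)
weighted <- sum(w * y_i) / D

# Per-stage view: stages with n_i > D are over-weighted by weighted sensitivity
data.frame(stage = 1:4, n_i = n_i, D = D,
           blended_weight = w, weighted_weight = implied_w)
c(blended = blended, weighted = weighted)

# Equal stage counts: D equals every n_i, so the two estimates coincide
n_eq <- rep(250, 4)
y_eq <- c(175, 195, 212, 237)
same <- isTRUE(all.equal(sum(w * y_eq / n_eq), sum(w * y_eq) / sum(n_eq * w)))
same
```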
We can simulate finite-sample estimates of blended and weighted sensitivity and see how they compare with true sensitivity as follows:
Simulate training donor stage assignments and calls according to these distributions: \[ p_i, i = 1, ...,4 = \text{ the true sensitivities per stage}\\ \text{stage}_j \sim multinomial(w_i) \\ \text{call}_j \sim bernoulli(p_{\text{stage}_j}) \\ \text{for } j = 1,...,n_{training} \]
set.seed(777)
# p = true stage-wise sensitivities in validation set
simulate_calls <- function(n = 1000,
                           p = c(.7, .78, .85, .95),
                           w = c(0.547, 0.075, 0.218, 0.16),
                           seed = 777) {
  set.seed(seed)
  # draw one stage per sample: each column of rmultinom() is a one-hot vector
  stage_sim <- apply(stats::rmultinom(n = n, size = 1, prob = w), 2, which.max)
  # draw one Bernoulli call per sample, with the sensitivity of that sample's stage
  call_sim <- sapply(seq_along(stage_sim), \(i) stats::rbinom(n = 1, size = 1, prob = p[stage_sim[i]]))
  stage_names <- c("Stage I", "Stage II", "Stage III", "Stage IV")
  sim_data <- data.frame(stage = stage_names[stage_sim], calls = call_sim)
  return(sim_data)
}
# testing function
call_sim <- simulate_calls()
#check that the proportions of simulated stages in the training set are close to w:
table(call_sim$stage)/length(call_sim$stage)
##
##   Stage I  Stage II Stage III  Stage IV
##     0.551     0.065     0.241     0.143
#check that the simulated stage-wise sensitivities are close to p:
call_sim |>
dplyr::group_by(stage) |>
dplyr::summarise(sensitivity = mean(calls))
## # A tibble: 4 × 2
##   stage     sensitivity
##   <chr>           <dbl>
## 1 Stage I         0.706
## 2 Stage II        0.831
## 3 Stage III       0.859
## 4 Stage IV        0.965
Function to calculate weighted sensitivity:
calculate_weighted_sensitivity <- function(df,
                                           w = c(0.547, 0.075, 0.218, 0.16)) {
  # per-stage weights, normalised to sum to 1
  weighted_df <-
    data.frame(
      stage = c("Stage I", "Stage II", "Stage III", "Stage IV"),
      weights = w
    ) |>
    dplyr::mutate(weights = weights / sum(weights))
  # attach each sample's stage weight, then take the weighted mean of the calls
  df |>
    dplyr::left_join(weighted_df, by = "stage") |>
    dplyr::summarise(sensitivity = stats::weighted.mean(x = calls, w = weights)) |>
    dplyr::pull(sensitivity)
}
# testing function
calculate_weighted_sensitivity(call_sim)
## [1] 0.7441615
Function to calculate blended sensitivity:
calculate_blended_sensitivity <- function(df,
                                          w = c(0.547, 0.075, 0.218, 0.16)) {
  df |>
    dplyr::group_by(stage) |>
    dplyr::summarise(sensitivity = mean(calls)) |>
    dplyr::left_join(
      data.frame(
        stage = c("Stage I", "Stage II", "Stage III", "Stage IV"),
        weights = w
      ),
      by = "stage"
    ) |>
    dplyr::mutate(w_sensitivity = sensitivity * weights) |>
    dplyr::summarise(
      sensitivity = sum(w_sensitivity)
    ) |>
    dplyr::pull(sensitivity)
}
# testing function
calculate_blended_sensitivity(call_sim)
## [1] 0.7901341
Calculate true sensitivity:
# stage-wise sensitivity:
p <- c(.7, .78, .85, .95)
# stage-weights
w <- c(0.547, 0.075, 0.218, 0.16)
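true_sens <- sum(p * w)
# compare the single simulated data set above with the true value
print(paste0("true sensitivity: ", round(true_sens, 4)))
print(paste0("blended sensitivity - true sensitivity: ", round(calculate_blended_sensitivity(call_sim) - true_sens, 4)))
print(paste0("weighted sensitivity - true sensitivity: ", round(calculate_weighted_sensitivity(call_sim) - true_sens, 4)))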
## [1] "true sensitivity: 0.7787"
## [1] "blended sensitivity - true sensitivity: 0.0114"
## [1] "weighted sensitivity - true sensitivity: -0.0345"
Function to repeat the simulation over many replicates:
repeat_sim <- function(reps = 1000,
                       n = 1000,
                       p = c(.7, .78, .85, .95),
                       w = c(0.547, 0.075, 0.218, 0.16),
                       w_train = c(0.547, 0.075, 0.218, 0.16)) {
  # w = stage-weights (validation proportions); w_train = stage proportions in the training set
  furrr::future_map(seq_len(reps), \(i) {
    # each replicate uses its own seed, so results are reproducible
    call_sim <- simulate_calls(
      n = n,
      p = p,
      w = w_train,
      seed = i
    )
    weighted_sens <- calculate_weighted_sensitivity(call_sim, w = w)
    blended_sens <- calculate_blended_sensitivity(call_sim, w = w)
    return(data.frame(weighted_sens = weighted_sens, blended_sens = blended_sens))
  },
  .options = furrr::furrr_options(seed = TRUE)
  ) |>
    dplyr::bind_rows() |>
    dplyr::mutate(true_sens = sum(w * p))
}
Evaluate the simulation with different numbers of training samples:
sim_n10e3 <- repeat_sim(n = 1000) |> dplyr::mutate(n = 1000)
sim_n10e4 <- repeat_sim(n = 10000) |> dplyr::mutate(n = 10000)
sim_n10e5 <- repeat_sim(n = 50000) |> dplyr::mutate(n = 50000)
sim_concat <- dplyr::bind_rows(sim_n10e3, sim_n10e4, sim_n10e5) |>
dplyr::mutate(n = paste0("n = ", n))
Plotting function:
plotting_fun <- function(sim_data) {
  # long format: one row per (n, estimator, estimate)
  reshape2::melt(sim_data |> dplyr::select(-true_sens), id.vars = "n") |>
    ggplot2::ggplot(ggplot2::aes(x = variable, y = value)) +
    ggplot2::geom_boxplot() +
    ggplot2::facet_wrap(~n) +
    # dashed red line marks the true sensitivity
    ggplot2::geom_hline(
      yintercept = sim_data$true_sens[1],
      linetype = "dashed", colour = "red"
    ) +
    ggplot2::theme_minimal() +
    ggplot2::xlab("") +
    ggplot2::ylab("Estimated sensitivity")
}
plotting_fun(sim_concat)
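The boxplots can be complemented with a numerical summary of bias and root mean squared error. The sketch below is a self-contained base R re-implementation rather than the functions defined above: the helper `one_rep` is hypothetical, it uses `sample()` instead of `rmultinom()`, and it runs only 200 replicates to stay fast, so its numbers will not match the plotted simulation exactly.

```r
set.seed(1)
p <- c(.7, .78, .85, .95)
w <- c(0.547, 0.075, 0.218, 0.16)

# One simulated training set of size n, returning both estimates.
# Assumes every stage is observed at least once (otherwise y_i / n_i is NaN).
one_rep <- function(n, p, w, w_train = w) {
  s <- sample(1:4, size = n, replace = TRUE, prob = w_train)
  calls <- rbinom(n, size = 1, prob = p[s])
  n_i <- tabulate(s, nbins = 4)
  y_i <- tabulate(s[calls == 1], nbins = 4)
  c(weighted = sum(w * y_i) / sum(w * n_i),
    blended = sum(w * y_i / n_i))
}

# 200 replicates in rows, one column per estimator
est <- t(replicate(200, one_rep(n = 1000, p = p, w = w)))
true_sens <- sum(w * p)

bias <- colMeans(est) - true_sens
rmse <- sqrt(colMeans((est - true_sens)^2))
rbind(bias, rmse)
```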
Evaluate results with higher true stage-wise sensitivities, and with stage-wise proportions in the training set that differ from the validation stage-wise proportions (which are used as the stage-weights):
new_p <- c(.8, .86, .91, .98)
w_train = c(.4, .2, .15,.25)
sim_n10e3_new <- repeat_sim(n = 1000, p = new_p, w_train = w_train) |> dplyr::mutate(n = 1000)
sim_n10e4_new <- repeat_sim(n = 10000, p = new_p, w_train = w_train) |> dplyr::mutate(n = 10000)
sim_n10e5_new <- repeat_sim(n = 50000, p = new_p, w_train = w_train ) |> dplyr::mutate(n = 50000)
sim_concat_new <- dplyr::bind_rows(sim_n10e3_new, sim_n10e4_new, sim_n10e5_new) |>
dplyr::mutate(n = paste0("n = ", n))
plotting_fun(sim_concat_new)
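For this scenario, the values the two estimators converge to as \(n\) grows can be computed directly, which gives a reference point for reading the boxplots. The sketch below assumes the training stage proportions converge to `w_train`; the helper name `estimator_limits` is hypothetical.

```r
# Large-sample limits of both estimators, assuming n_i / n -> w_train
estimator_limits <- function(p, w, w_train) {
  c(true = sum(w * p),
    blended = sum(w * p),  # y_i / n_i -> p_i, so blended converges to the true value
    weighted = sum(w * w_train * p) / sum(w * w_train))
}

lims <- estimator_limits(
  p = c(.8, .86, .91, .98),
  w = c(0.547, 0.075, 0.218, 0.16),
  w_train = c(.4, .2, .15, .25)
)
round(lims, 4)
```

The gap between the `weighted` and `true` entries is the bias that does not shrink as the number of training samples increases.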
In general, the two estimates coincide only when the stage-wise proportions in the training set are all equal (i.e. every component of w_train equals 1/4, the smallest value its largest component can take). If the validation set has those same equal stage-wise proportions, the same result holds, but this is not necessary for equivalence: in the simulation below the stage-weights keep their default, unequal values.
new_p <- c(.8, .86, .91, .98)
w_train = c(.25, .25, .25, .25)
sim_n10e3_new <- repeat_sim(n = 1000, p = new_p, w_train = w_train) |> dplyr::mutate(n = 1000)
sim_n10e4_new <- repeat_sim(n = 10000, p = new_p, w_train = w_train) |> dplyr::mutate(n = 10000)
sim_n10e5_new <- repeat_sim(n = 50000, p = new_p, w_train = w_train ) |> dplyr::mutate(n = 50000)
sim_concat_new <- dplyr::bind_rows(sim_n10e3_new, sim_n10e4_new, sim_n10e5_new) |>
dplyr::mutate(n = paste0("n = ", n))
plotting_fun(sim_concat_new)
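The reason the estimates agree in this scenario can be written out in one line. Assuming the stage counts are exactly equal, \(n_i = n/4\), and the stage-weights sum to 1, the denominator of weighted sensitivity reduces to the denominator of blended sensitivity: \[ \sum_{j = 1}^{n} w_{s_j} = \sum_{i = 1}^{4} n_i w_i = \frac{n}{4} \sum_{i = 1}^{4} w_i = \frac{n}{4} = n_i \] In the simulation the counts are only approximately equal, since the stages are drawn at random with probability 1/4 each, so the two estimates agree closely rather than exactly.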
However, equal validation stage-wise proportions (used as stage-weights, all equal to .25) are not sufficient: when the stage-wise proportions in the training set are unequal, only blended sensitivity accurately estimates true sensitivity, whereas weighted sensitivity does not.
new_p <- c(.8, .86, .91, .98)
new_w = c(.25, .25, .25, .25)
sim_n10e3_equal <- repeat_sim(n = 1000, p = new_p, w = new_w) |> dplyr::mutate(n = 1000)
sim_n10e4_equal <- repeat_sim(n = 10000, p = new_p, w = new_w) |> dplyr::mutate(n = 10000)
sim_n10e5_equal <- repeat_sim(n = 50000, p = new_p, w = new_w) |> dplyr::mutate(n = 50000)
sim_concat_equal <- dplyr::bind_rows(sim_n10e3_equal, sim_n10e4_equal, sim_n10e5_equal) |>
dplyr::mutate(n = paste0("n = ", n))
plotting_fun(sim_concat_equal)
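The size of the gap in this last scenario can be worked out by hand, assuming the training stage proportions converge to the default values \((0.547, 0.075, 0.218, 0.16)\) used by the simulation. With equal stage-weights, weighted sensitivity reduces to the plain mean of the calls, which weights each stage by its share of the training set: \[ \text{weighted sensitivity} = \frac{\sum_{j = 1}^{n} \frac{1}{4} c_j}{\sum_{j = 1}^{n} \frac{1}{4}} = \frac{1}{n} \sum_{j = 1}^{n} c_j \to 0.547(0.8) + 0.075(0.86) + 0.218(0.91) + 0.16(0.98) = 0.85728 \\ \text{true sensitivity} = \frac{1}{4} \left(0.8 + 0.86 + 0.91 + 0.98 \right) = 0.8875 \] Blended sensitivity converges to the second value, so weighted sensitivity sits about 0.03 below it however many training samples are used.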