My scripts:

  • rendered html versions: Rpubs/thomas-weissensteiner
  • .rmd files with executable code chunks: www.github.com/thomas-weissensteiner/portfolio/tree/main/

About myself: www.linkedin.com/in/ThomasWs-Mopfair



1. Background and motivation


Current dosing practices are based on average responses from limited clinical trials. However, overlooking variations such as to sex, age, and genetics can lead to adverse events in certain patient groups. Conversely, a new drug which might have benefit for a sub-group of patients might fail to get approval because the pivotal trial showed insufficient general efficacy in the general population.

The scripts were written for the VIS-SIG Wonderful Wednesdays challenge 10 Oct 24 (link to recording). “Wonderful Wednesdays” are a monthly webinar organised by the Visualisation Special Interest Group of Statisticians in the Pharmaceutical Industry (PSI).

The task consisted of identifying patient subgroups with dose responses that deviate from the remaining study participants, and who therefore might benefit from personalized dosing.

2. R packages and data set


# - set-up                                                     - #

library(dplyr)        # version 1.1.4  # general grammar
library(purrr)        # version 1.0.2  # set_names, list_rbind
library(tidyr)        # version 1.3.1  # pivot_longer, pivot_wider
library(ggpubr)       # version , includes ggplot2 3.4.0 # stat_compare_means, ggtexttable, ggarrange
library(kableExtra)   # rendering tables in HTML format

exampleData <- read.csv("https://raw.githubusercontent.com/VIS-SIG/Wonderful-Wednesdays/refs/heads/master/data/2024/2024-09-11/WWW54ExampleData.csv")

The data were a simulated dose response set with the following structure:

dose: dose levels (0, 100, 200 mg)
target: continuous response variable (higher is better)

subgroup variables:
bmi (body mass index: ‘low BMI’ or ‘high BMI’)
age (‘<40 years’ or ‘≥40 years’)
race (‘Black’, ‘Asian’, ‘White’)
sex (‘Female’, ‘Male’)
type (type of disease: ‘Acute disease’ or ‘Chronic disease’)

3. Target vs. dose in all patients

# - Chunk_1                                              - #
# - Requires R-libraries and object exampleData (set-up) - #

## Plot distribution of target values at different doses in the total population

sign_test <- 
  exampleData %>% 
  compare_means(
  data = .,
  target ~ dose,
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme(
      "light", 
      base_size = 10
      ),
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(
  
exampleData %>% 
  select (dose, target) %>% 
  
  ggplot (
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200"
        )
      ), 
      y = target)
    ) +
  geom_hline(
    yintercept = 0, 
    color = "darkgrey",
    linewidth = 1.1
    ) +
  geom_violin(
    color = "lightgrey",
    fill  = "lightgrey", 
    width = 0.45
    ) +
  geom_jitter(
    height = 0, width = 0.1, 
    size = 2, 
    alpha = 0.2
    ) +
  geom_boxplot(
    width = 0.15 , 
    outlier.shape = NA,
    color = "salmon",
    lwd = 0.7,
    alpha = 0.5
    ) +
  theme_light() +
  theme(
    text=element_text(size = 13), 
    axis.text = element_text(
      size = 12),
    ) +
  scale_y_continuous(
    breaks = c(-2: 5),
    minor_breaks = NULL
    ) +
  labs(
    x = "Dose", 
    y = "Target", 
    size = 12, hjust = -15,
    title = "Target vs. dose levels in the total patient population\n"
    ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1)
)   



The reference for the Wilcoxon test is the group of untreated patients (dose = 0). Differences in target levels in treated patients are significant. However, the distribution of target values at the three dose levels suggests that at least two subgroups might exists that could respond differently.

4. Best response in patient subgroups


In order to identify these potential groups, I stratified patients by one or two of the clinical and demographic characteristics that were provided. For each group, I then calculated the median response for doses 100 and 200 (difference between median target values in the treated subgroups and patients receiving a dose 0).

# - Chunk_2                                              - #
# - Requires R-libraries and object exampleData (set-up) - #

## Table of responses in patient subgroups characterised by one or two variables

# Generate all pairwise combinations of patient characteristics
pVars <- 
  exampleData %>% 
  names() %>% .[!. %in% c("dose","target")]

pVar_pairs <- 
  pVars %>% 
  { c(
    lapply(., function(x) c(x, x)),       # duplicate each characteristic
    combn(.,2, simplify = F)              # all combinations of 2 characteristics
  ) }
 
# ---
pChar_pairs <- 
  lapply(
    seq_along(pVar_pairs), 
    function (x) 
      exampleData %>% 
      select(pVar_pairs[[x]] ) %>% 
      unique(.)
    )
# ---

# Calculate median responses for the patient subgroups 
response_tbl <- 
  lapply(
    pVar_pairs, 
    function(x) 
      exampleData %>% 
        group_by(
          across(all_of(x) ), 
          dose
          ) %>% 
      summarise(
        median(target),
        N = n()
      )
  ) %>%
  # ensure all dataframes have the same number of columns 
    lapply(function(df)
      if(
        ncol(df) < length(pVars)) {
        cbind(df[, 1, drop = FALSE], df)    
        }else{
          df} 
      ) %>% 
  lapply(function(df)
    pivot_wider(
      df, 
      names_from = "dose",
      values_from = c("median(target)", "N")
      ) %>%
    set_names(
     "Var1_value", "Var2_value", 
     "median_0", "median_100", "median_200", 
      "N_0", "N_100", "N_200"
      )
    ) %>% 
  list_rbind() %>%
  mutate(
    response_100 = median_100 - median_0,
    response_200 = median_200 - median_0,
    max_response = max(response_100, response_200) ,
    max_resp_dose = which.max(c(response_100, response_200))*100,
    N_max = c(N_100, N_200)[max_resp_dose/100]
    )


## Table of best median responses stratified by patient subgroups

# Re-order columns, sort by maximum response (highest first)
response_tbl %>%   
  select( 
    "Var1_value", "Var2_value", 
    "N_max", "max_response",
    "N_0", "N_100", "N_200",
    "median_0", "median_100", "median_200"
    ) %>% 
  arrange( desc(max_response)) %>% 
  kable(., "html") %>% 
  scroll_box(width = "100%", height = "200px") %>% 
  kable_styling( 
    bootstrap_options = c(
      "striped", "hover", "responsive", full_width = F) 
    ) 


# Optional: rearrange so that diagonal of the plot shows responses in order of magnitude
response_tbl <- 
  response_tbl %>%
  mutate(
    .before = Var1,
    Var1 = case_when (
      Var1_value %in% c("high BMI", "low BMI") ~ "BMI",
      Var1_value %in% c(">=40 years", "<40 years") ~ "Age",
      Var1_value %in% c("Asian", "Black", "White") ~ "Ethnic",
      Var1_value %in% c("Chronic disease", "Acute disease") ~ "Type",
      Var1_value %in% c("Female", "Male") ~ "Sex"
      ),
    Var2 = case_when (
      Var2_value %in% c("high BMI", "low BMI") ~ "BMI",
      Var2_value %in% c(">=40 years", "<40 years") ~ "Age",
      Var2_value %in% c("Asian", "Black", "White") ~ "Ethnic",
      Var2_value %in% c("Chronic disease", "Acute disease") ~ "Type",
      Var2_value %in% c("Female", "Male") ~ "Sex"
      )
    ) %>% 
    mutate(
      pVar_equal = (Var1 == Var2)
      ) %>%  
    { rbind( 
        filter(., pVar_equal) %>% 
          group_by(Var1) %>%
          arrange(desc(max_response), .by_group = TRUE) %>%
          mutate(max_value = max(max_response, na.rm = TRUE)) %>%
          arrange(desc(max_value)) %>% 
          select (!max_value),
        filter(., !pVar_equal) 
        ) 
      } %>% 
# Add column for joining with Wilcox and Mood test results (chunks 3b and 3c)
  unite(
    Vars1_2,
    c(Var1_value, Var2_value), 
    sep = "_", 
    remove = F
    ) 


4.1. Best responses in patients stratified by demographic and clinical parameters - showing patient numbers in treated and matched untreated subgroups

# - Chunk_3a                                              - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object response_tbl (chunk_2)               - #


# Select and reshape  

response_tbl %>% 
  select(
    Var1,
    Var1_value, Var2_value, 
    N_max, N_0, 
    max_resp_dose, max_response
    ) %>% 
  pivot_wider(
    names_from = "Var2_value",
    values_from = c(
      "max_resp_dose", "max_response", "N_max", "N_0"), 
      names_sep = "-"
      ) %>% 
  pivot_longer(
    cols = !c("Var1", "Var1_value"),
    names_to = c("resp", "Var2_value"),
    values_to = c("max_response"), names_sep = "-"
    ) %>% 
  pivot_wider(
    names_from = "resp", 
    values_from = "max_response") %>% 
  mutate(
    Var1 = as.factor(Var1)
    ) %>% 

# Plot data
  ggplot(
    aes(
      x = Var1_value %>% 
        factor(., levels = unique(.) %>% rev ), 
      y = Var2_value %>% 
        factor(., levels = unique(.) %>% rev ),
      col   = max_response,
      label = max_response %>% round(3) 
      )
    ) +
    geom_tile(
      aes(fill= factor(max_resp_dose)),
      col="lightgrey"
      ) +
    scale_fill_manual(
      values = c("100" = "#BBE0CF", "200" = "#D5F9E8"),
      na.translate = F
      ) +
    geom_text(
      col = "black", 
      vjust = 2.5, hjust = 0.5, 
      size = 10/.pt
      ) +
    geom_point(
      aes( size = N_0), 
      shape = 15, 
      color = "white",
      position = position_nudge(y = 0.1)
      ) +
    geom_point(
      aes( size = N_max), 
      shape = 19,
      position = position_nudge(y = 0.1)
      ) +
    geom_point(
      aes( size = N_0), 
      shape = 22, 
      color = "darkblue",
      position = position_nudge(y = 0.1)
      ) +
    scale_size(range = c(1, 15), name = "N") +
    labs(
      x = NULL, y = NULL, 
      col = "Maximum\nresponse",
      fill = "Dose",
      title = "Best median response, stratified by one or two patient characteristics") +
    scale_color_gradient2(
      low = "white", high="darkblue", 
      limits = c(-0.25, 1.5)
      ) +
    scale_x_discrete(
      position = "top"
      ) +
    scale_y_discrete() +
    theme_light(base_size = 15) + 
    theme(
      axis.text.x = element_text(
        angle = 90, 
        vjust = 0.5,
        hjust = 0
        ),
      panel.grid = element_blank() 
    )

response_tbl %>% 
  select(
    Var1, Var1_value, 
    Var2, Var2_value, 
    max_response, max_resp_dose, N_max, 
    N_0, median_0, 
    N_100, median_100,
    N_200, median_200
    ) %>%
  arrange(
    desc(max_response) ) %>% 
  kable(., "html",
         caption = "Maximum response in each patient group, in descending order") %>% 
    scroll_box(width = "120%", height = "150px") %>% 
    kable_styling( 
      font_size = 12,
      bootstrap_options = "striped"
      )
Maximum response in each patient group, in descending order
Var1 Var1_value Var2 Var2_value max_response max_resp_dose N_max N_0 median_0 N_100 median_100 N_200 median_200
Age <40 years Sex Female 1.4738253 200 77 80 -0.0936159 85 0.9365259 77 1.3802094
Ethnic White Type Acute disease 1.4209837 200 52 59 0.1040044 68 0.6494141 52 1.5249881
Ethnic White Sex Female 1.4080069 200 74 62 0.0783431 72 0.9754775 74 1.4863500
Sex Female Type Acute disease 1.4073091 200 59 70 0.0783431 62 1.1145576 59 1.4856522
Ethnic Black Type Acute disease 1.3886972 100 19 26 -0.4210677 19 0.9676295 22 0.4197576
BMI low BMI Sex Female 1.3772039 200 69 85 0.0179963 78 0.8670915 69 1.3952003
Age <40 years Ethnic White 1.3769607 200 67 66 -0.1072127 76 0.6428964 67 1.2697480
Ethnic Asian Sex Female 1.3629210 200 45 60 -0.1107637 50 0.9448312 45 1.2521573
Age >=40 years Type Acute disease 1.3518500 200 55 68 -0.0903008 60 0.7337059 55 1.2615492
Type Acute disease Type Acute disease 1.3149731 200 120 135 -0.0743407 129 0.7299124 120 1.2406324
BMI high BMI Sex Female 1.3101590 200 81 65 -0.0404110 72 1.1110143 81 1.2697480
Sex Female Sex Female 1.2942516 200 150 150 0.0092135 150 0.9423150 150 1.3034651
Sex Female Type Chronic disease 1.2642623 200 91 80 -0.0349792 88 0.8593007 91 1.2292830
Age <40 years Type Acute disease 1.2573735 200 65 67 -0.0743407 69 0.7294408 65 1.1830328
BMI low BMI Type Acute disease 1.2362381 200 57 68 0.0509698 71 0.6381219 57 1.2872080
BMI low BMI Ethnic White 1.2238384 200 74 73 0.0403501 77 0.5240786 74 1.2641885
Age >=40 years Sex Female 1.1970881 200 73 70 0.0430077 65 0.9744453 73 1.2400958
BMI high BMI Type Acute disease 1.1837232 100 58 67 -0.2336337 58 0.9500895 63 0.9350078
Ethnic White Ethnic White 1.1166348 200 144 119 0.0403501 144 0.5907656 144 1.1569848
Age >=40 years Ethnic Black 1.1131057 100 19 33 0.0271723 19 1.1402780 25 0.7689209
BMI low BMI Age <40 years 1.0661942 200 73 82 -0.1320552 84 0.5855097 73 0.9341390
BMI low BMI Ethnic Asian 1.0381097 200 50 53 -0.1972625 49 0.6504704 50 0.8408472
Age <40 years Age <40 years 1.0314618 200 143 144 -0.1032044 164 0.6061683 143 0.9282574
Ethnic Asian Type Acute disease 1.0228892 200 46 50 -0.1666440 42 0.8153932 46 0.8562452
Age <40 years Type Chronic disease 1.0150956 200 78 77 -0.1406907 95 0.5859389 78 0.8744048
BMI low BMI BMI low BMI 1.0041402 200 146 157 -0.0589245 151 0.5962553 146 0.9452158
Sex Male Type Acute disease 1.0002103 200 61 65 -0.1684336 67 0.4530151 61 0.8317767
BMI low BMI Type Chronic disease 0.9663556 200 89 89 -0.0895662 80 0.5915215 89 0.8767894
Ethnic White Type Chronic disease 0.9598003 200 92 60 -0.0663296 76 0.5824246 92 0.8934708
BMI high BMI Age <40 years 0.9355423 200 70 62 -0.0187840 80 0.6653357 70 0.9167583
BMI low BMI Age >=40 years 0.9289176 200 73 75 0.0271723 67 0.6504704 73 0.9560899
BMI high BMI Ethnic White 0.8957185 200 70 46 0.0231029 67 0.6272804 70 0.9188214
Ethnic White Sex Male 0.8795431 200 70 57 -0.0397959 72 0.2292620 70 0.8397472
Age <40 years Ethnic Asian 0.8610026 200 53 56 -0.1666440 59 0.5917255 53 0.6943586
BMI high BMI BMI high BMI 0.8563161 200 154 123 -0.0397959 149 0.6116301 154 0.8165202
Age >=40 years Ethnic White 0.8418724 200 77 53 0.1142174 68 0.5675774 77 0.9560899
BMI low BMI Sex Male 0.8092369 200 77 72 -0.1148783 73 0.2511608 77 0.6943586
Ethnic Asian Ethnic Asian 0.8071959 200 108 106 -0.1371630 108 0.6662038 108 0.6700329
Age >=40 years Age >=40 years 0.8054468 200 157 136 0.0225843 136 0.5995700 157 0.8280312
Age >=40 years Ethnic Asian 0.7938925 100 49 50 -0.1107637 49 0.6831288 55 0.6657136
BMI high BMI Ethnic Asian 0.7914879 100 59 53 -0.1083591 59 0.6831288 58 0.6097214
Ethnic Asian Sex Male 0.7794252 200 63 46 -0.2523170 58 0.4049125 63 0.5271082
BMI high BMI Age >=40 years 0.7624675 200 84 61 -0.0404110 69 0.5920707 84 0.7220564
Type Chronic disease Type Chronic disease 0.7574756 200 180 145 0.0164722 171 0.5721217 180 0.7739478
BMI low BMI Ethnic Black 0.7561552 100 25 31 0.1919490 25 0.9481041 22 0.4964860
Ethnic Black Sex Female 0.7044310 200 31 28 0.2305768 28 0.7605207 31 0.9350078
Age >=40 years Type Chronic disease 0.6906872 200 102 68 0.0495966 76 0.5632136 102 0.7402838
Sex Male Sex Male 0.6787458 200 150 130 -0.0819534 150 0.3836353 150 0.5967923
Ethnic Asian Type Chronic disease 0.6786501 100 66 56 -0.1107637 66 0.5678864 62 0.5051804
Age >=40 years Sex Male 0.6711926 200 84 66 -0.0212261 71 0.4517713 84 0.6499665
Age <40 years Sex Male 0.5360498 200 66 64 -0.1184169 79 0.2307690 66 0.4176329
Ethnic Black Ethnic Black 0.4771506 200 48 55 0.2162899 48 0.5984809 48 0.6934405
BMI high BMI Sex Male 0.4547091 200 73 58 -0.0187840 77 0.3924904 73 0.4359251
Ethnic Black Sex Male 0.3691546 100 20 27 0.2162899 20 0.5854445 17 0.2796167
Age <40 years Ethnic Black 0.3291949 200 23 22 0.3089670 29 0.5962553 23 0.6381619
Sex Male Type Chronic disease 0.3281991 200 89 65 0.0403501 83 0.1721311 89 0.3685491
Ethnic Black Type Chronic disease 0.3114366 200 26 29 0.4957048 29 0.5701826 26 0.8071414
BMI high BMI Ethnic Black 0.2817528 200 26 24 0.4770672 23 0.4165172 26 0.7588200
BMI high BMI Type Chronic disease 0.2493957 200 91 56 0.4476301 91 0.5636511 91 0.6970257


Tiles in the plot correspond to patient groups characterised by the row and column labels. Magnitude of the maximum median response in each patient group is shown by the colour of the dots and the numerical values below them. The numbers (N) of patients in the different groups is indicated by the size of the dots (treated), and white squares with dark blue border (untreated). The dose that achieved the maximum median response is shown by the colour of the tiles (dark green: 100, lighter green: 200).

The diagonal shows groups stratified by a single characteristic, arranged so that the magnitude of the responses increases from bottom left to top right. The acute disease type was associated with the highest response, followed by female sex, and White ethnic. Conversely, patients with chronic disease, or were male or Black, had the lowest responses. BMI and age had relatively smaller effects as a single stratifier.

When two characteristics were applied, females <40 years had the highest response, followed by Whites with acute disease, White females, and females or Blacks with acute disease. The group wit the lowest median response were patients with chronic disease and high BMI. Low responses were also observed in patients who were Black and male, <40 years, with chronic disease, or high BMI. This is in contrast to the Blacks with acute disease having among the highest responses. However, numbers in these groups were relatively small and the result might need confirmation in a larger sample.

In most groups, the higher dose (200) was associated with the greater response, but Asians with chronic disease or high BMI, or patients with high BMI and acute disease might benefit more from the lower dose.

4.2. Best response in patients stratified by demographic and clinical parameters - showing Wilcoxon / Mann-Whitney test p-values for difference between treated and matched untreated subroups

# - Chunk__3b:                                      - #

## Calculate Wilcoxon test p-values for all comparisons

pTest_label <- "Wilcoxon test"

pValue.test<- function(x, y) {
  test <- wilcox.test(x, y)
  return(test$p.value)
}

# - Chunk__pTests: common code for p-value calculations in chunks 3b and c - #
# - Requires R-libraries and object exampleData (set-up)                   - #
# - Requires objects pChar_pairs and response_tbl (chunk_2)                - #


pValue.test_results <- list ()

for (i in 1: length(pChar_pairs)) {
  pValue.test_results_j <- list()

  for (j in 1: nrow(pChar_pairs[[i]])) {
    x <- pChar_pairs[[i]] %>% names
    y <- pChar_pairs[[i]] %>% .[j, ]%>% set_names(x)
# Ensure names are preserved when cbinding as transformed dataframe below
    if (length(x) == 1) {
      y <- as.data.frame (t(y))
      }

  pValue.test_results_j[[j]] <- 
    exampleData %>% 
      {if (length(x) == 1) {
        filter(.,
          get(x[1]) == paste(y[1])  
          ) 
        } else {
          filter(.,
            get(x[1]) == paste(y[1]) & 
            get(x[2]) == paste(y[2])
            ) 
          } 
        } %>% 
      summarise(
        p_value_0_vs_100 = pValue.test(target[dose == "0"], target[dose == "100"] ),
        p_value_0_vs_200 = pValue.test(target[dose == "0"], target[dose == "200"] )
        ) %>% 
      cbind(y, .)
    }
    pValue.test_results[[i]] <- pValue.test_results_j

  }

pValue.test_results <- 
pValue.test_results %>% 
  flatten() %>% 
  lapply(function(df)
      if(
        ncol(df) < 4) {
        cbind(df[, 1, drop = FALSE], df)    
        }else{
          df} 
      ) %>% 
    lapply(function(df)
      set_names(df,
        "Var1_value", "Var2_value", 
        "p_value_0_vs_100", "p_value_0_vs_200"
        ) 
      ) %>% 
    list_rbind() %>% 
    mutate(
      log.p_100 = log10(p_value_0_vs_100) %>% round(digits = 2),
      log.p_200 = log10(p_value_0_vs_200) %>% round(digits = 2)
    ) %>% 
    select(
      !c(p_value_0_vs_100, p_value_0_vs_200)
    ) %>% 
# Add column for joining with response_tbl and join
    unite(
      Vars1_2,
      c(Var1_value, Var2_value), 
      sep = "_", 
      remove = F
      ) %>% 
  left_join(
    response_tbl, .) %>% 
  mutate(
     log.p_max.resp = case_when(
      max_resp_dose == "100" ~ log.p_100, 
      max_resp_dose == "200" ~ log.p_200
     )
    )
 

 ## Generate new plot of best responses, showing Wilcoxon p-values

pValue.test_results %>% 
# Select and reshape  
  select(
    Var1_value, Var2_value, 
    log.p_max.resp, 
    max_resp_dose, max_response
    ) %>% 
  pivot_wider(
    names_from = "Var2_value",
    values_from = c(
      "max_resp_dose", "max_response", "log.p_max.resp"), 
      names_sep = "-"
      ) %>% 
  pivot_longer(
    cols = !c("Var1", "Var1_value"),
    names_to = c("resp", "Var2_value"),
    values_to = c("max_response"), names_sep = "-"
    ) %>% 
  pivot_wider(
    names_from = "resp", 
    values_from = "max_response") %>% 
  mutate(
    Var1_value = as.factor(Var1_value)
    ) %>% 

# Plot data
  ggplot(
    aes(
      x = Var1_value %>% 
        factor(., 
          levels = unique(.) %>% rev()
          ), 
      y = Var2_value %>% 
        factor(., 
          levels = unique(.) %>% rev()
          ),
      col   = max_response,
      label = max_response %>% round(3) 
      )
    ) +
    geom_tile(
      aes(fill= factor(max_resp_dose)),
      col="lightgrey"
      ) +
    scale_fill_manual(
      values = c("100" = "#BBE0CF", "200" = "#D5F9E8"),
      na.translate = F
      ) +
    geom_text(
      col = "black", 
      vjust = 3.5, hjust = 0.5, 
      size = 10/.pt
      ) +
    geom_point(
      aes( size = log.p_max.resp), 
      shape = 19,
      position = position_nudge(y = 0.1)
      ) +
    scale_size(
      range = c(18, 1), 
      name = "log.p_max.resp") +
    labs(
      x = NULL, y = NULL, 
      col = "Maximum\nresponse",
      fill = "Dose",
      title = "Best median response, stratified by one or two patient characteristics",
      subtitle = paste( "\nlog.p: ", pTest_label)
      ) +
    scale_color_gradient2(
      low = "white", high="darkblue", 
      limits = c(-0.25, 1.5)
      ) +
    scale_x_discrete(
      position = "top"
      ) +
    scale_y_discrete() +
    theme_light(base_size = 15) + 
    theme(
      axis.text.x = element_text(
        angle = 90, 
        vjust = 0.5,
        hjust = 0
        ),
      panel.grid = element_blank() 
      )

# Display, sorted by significance of response at dose 200

pValue.test_results %>% 
  select(
    Var1, Var1_value, 
    Var2, Var2_value, 
    log.p_max.resp, 
    max_response, max_resp_dose,
    N_100, median_100,
    N_200, median_200
    ) %>%
  arrange(log.p_max.resp) %>% 
  kable(., "html",
        caption = "Maximum responses, sorted by significance") %>% 
    scroll_box(width = "120%", height = "150px") %>% 
    kable_styling( 
      font_size = 12,
      bootstrap_options = "striped"
      )
Maximum responses, sorted by significance
Var1 Var1_value Var2 Var2_value log.p_max.resp max_response max_resp_dose N_100 median_100 N_200 median_200
Sex Female Sex Female -21.33 1.2942516 200 150 0.9423150 150 1.3034651
Type Acute disease Type Acute disease -15.94 1.3149731 200 129 0.7299124 120 1.2406324
BMI low BMI BMI low BMI -14.33 1.0041402 200 151 0.5962553 146 0.9452158
Age <40 years Sex Female -13.00 1.4738253 200 85 0.9365259 77 1.3802094
BMI low BMI Sex Female -12.67 1.3772039 200 78 0.8670915 69 1.3952003
Age <40 years Age <40 years -12.35 1.0314618 200 164 0.6061683 143 0.9282574
Ethnic White Ethnic White -12.00 1.1166348 200 144 0.5907656 144 1.1569848
Ethnic White Sex Female -11.93 1.4080069 200 72 0.9754775 74 1.4863500
Sex Female Type Acute disease -11.41 1.4073091 200 62 1.1145576 59 1.4856522
Sex Female Type Chronic disease -10.93 1.2642623 200 88 0.8593007 91 1.2292830
Ethnic White Type Acute disease -9.57 1.4209837 200 68 0.6494141 52 1.5249881
BMI high BMI Sex Female -9.39 1.3101590 200 72 1.1110143 81 1.2697480
BMI low BMI Type Acute disease -9.37 1.2362381 200 71 0.6381219 57 1.2872080
Age <40 years Type Acute disease -9.30 1.2573735 200 69 0.7294408 65 1.1830328
Age >=40 years Age >=40 years -9.08 0.8054468 200 136 0.5995700 157 0.8280312
Age >=40 years Sex Female -8.89 1.1970881 200 65 0.9744453 73 1.2400958
BMI low BMI Age <40 years -8.82 1.0661942 200 84 0.5855097 73 0.9341390
BMI low BMI Ethnic White -8.82 1.2238384 200 77 0.5240786 74 1.2641885
Ethnic Asian Ethnic Asian -7.93 0.8071959 200 108 0.6662038 108 0.6700329
BMI high BMI BMI high BMI -7.75 0.8563161 200 149 0.6116301 154 0.8165202
Age <40 years Ethnic White -7.66 1.3769607 200 76 0.6428964 67 1.2697480
Type Chronic disease Type Chronic disease -7.50 0.7574756 200 171 0.5721217 180 0.7739478
Age >=40 years Type Acute disease -7.36 1.3518500 200 60 0.7337059 55 1.2615492
Ethnic Asian Sex Female -7.11 1.3629210 200 50 0.9448312 45 1.2521573
BMI low BMI Type Chronic disease -6.64 0.9663556 200 80 0.5915215 89 0.8767894
Ethnic Asian Type Acute disease -6.50 1.0228892 200 42 0.8153932 46 0.8562452
BMI high BMI Type Acute disease -6.36 1.1837232 100 58 0.9500895 63 0.9350078
Sex Male Type Acute disease -6.14 1.0002103 200 67 0.4530151 61 0.8317767
BMI low BMI Age >=40 years -6.05 0.9289176 200 67 0.6504704 73 0.9560899
BMI low BMI Ethnic Asian -5.39 1.0381097 200 49 0.6504704 50 0.8408472
Ethnic White Type Chronic disease -5.02 0.9598003 200 76 0.5824246 92 0.8934708
Age <40 years Ethnic Asian -4.94 0.8610026 200 59 0.5917255 53 0.6943586
Age <40 years Type Chronic disease -4.66 1.0150956 200 95 0.5859389 78 0.8744048
Age >=40 years Ethnic White -4.56 0.8418724 200 68 0.5675774 77 0.9560899
Sex Male Sex Male -4.45 0.6787458 200 150 0.3836353 150 0.5967923
BMI low BMI Sex Male -4.34 0.8092369 200 73 0.2511608 77 0.6943586
BMI high BMI Age <40 years -4.31 0.9355423 200 80 0.6653357 70 0.9167583
Age >=40 years Ethnic Asian -4.25 0.7938925 100 49 0.6831288 55 0.6657136
BMI high BMI Ethnic White -4.16 0.8957185 200 67 0.6272804 70 0.9188214
BMI high BMI Age >=40 years -4.12 0.7624675 200 69 0.5920707 84 0.7220564
Ethnic Asian Type Chronic disease -3.63 0.6786501 100 66 0.5678864 62 0.5051804
BMI high BMI Ethnic Asian -3.61 0.7914879 100 59 0.6831288 58 0.6097214
Ethnic Black Sex Female -3.08 0.7044310 200 28 0.7605207 31 0.9350078
Age >=40 years Type Chronic disease -2.98 0.6906872 200 76 0.5632136 102 0.7402838
Ethnic White Sex Male -2.64 0.8795431 200 72 0.2292620 70 0.8397472
Ethnic Asian Sex Male -2.50 0.7794252 200 58 0.4049125 63 0.5271082
Age >=40 years Sex Male -2.42 0.6711926 200 71 0.4517713 84 0.6499665
Ethnic Black Ethnic Black -2.39 0.4771506 200 48 0.5984809 48 0.6934405
Ethnic Black Type Acute disease -2.35 1.3886972 100 19 0.9676295 22 0.4197576
Age <40 years Sex Male -2.29 0.5360498 200 79 0.2307690 66 0.4176329
Age >=40 years Ethnic Black -1.98 1.1131057 100 19 1.1402780 25 0.7689209
BMI low BMI Ethnic Black -1.76 0.7561552 100 25 0.9481041 22 0.4964860
BMI high BMI Type Chronic disease -1.69 0.2493957 200 91 0.5636511 91 0.6970257
BMI high BMI Ethnic Black -1.26 0.2817528 200 23 0.4165172 26 0.7588200
Age <40 years Ethnic Black -1.26 0.3291949 200 29 0.5962553 23 0.6381619
Ethnic Black Sex Male -1.12 0.3691546 100 20 0.5854445 17 0.2796167
BMI high BMI Sex Male -1.09 0.4547091 200 77 0.3924904 73 0.4359251
Ethnic Black Type Chronic disease -0.75 0.3114366 200 29 0.5701826 26 0.8071414
Sex Male Type Chronic disease -0.62 0.3281991 200 83 0.1721311 89 0.3685491



4.3. Best response in patients stratified by demographic and clinical parameters - showing Mood test p-values for difference between treated and matched untreated subroups

# - Chunk__3c:                                      - #

## Calculate Mood test p-values for all comparisons

pTest_label <- "Mood test"

pValue.test<- function(x, y) {
  test <- mood.test(x, y)
  return(test$p.value)
}

# - Chunk__pTests: common code for p-value calculations in chunks 3b and c - #
# - Requires R-libraries and object exampleData (set-up)                   - #
# - Requires objects pChar_pairs and response_tbl (chunk_2)                - #


pValue.test_results <- list ()

for (i in 1: length(pChar_pairs)) {
  pValue.test_results_j <- list()

  for (j in 1: nrow(pChar_pairs[[i]])) {
    x <- pChar_pairs[[i]] %>% names
    y <- pChar_pairs[[i]] %>% .[j, ]%>% set_names(x)
# Ensure names are preserved when cbinding as transformed dataframe below
    if (length(x) == 1) {
      y <- as.data.frame (t(y))
      }

  pValue.test_results_j[[j]] <- 
    exampleData %>% 
      {if (length(x) == 1) {
        filter(.,
          get(x[1]) == paste(y[1])  
          ) 
        } else {
          filter(.,
            get(x[1]) == paste(y[1]) & 
            get(x[2]) == paste(y[2])
            ) 
          } 
        } %>% 
      summarise(
        p_value_0_vs_100 = pValue.test(target[dose == "0"], target[dose == "100"] ),
        p_value_0_vs_200 = pValue.test(target[dose == "0"], target[dose == "200"] )
        ) %>% 
      cbind(y, .)
    }
    pValue.test_results[[i]] <- pValue.test_results_j

  }

pValue.test_results <- 
pValue.test_results %>% 
  flatten() %>% 
  lapply(function(df)
      if(
        ncol(df) < 4) {
        cbind(df[, 1, drop = FALSE], df)    
        }else{
          df} 
      ) %>% 
    lapply(function(df)
      set_names(df,
        "Var1_value", "Var2_value", 
        "p_value_0_vs_100", "p_value_0_vs_200"
        ) 
      ) %>% 
    list_rbind() %>% 
    mutate(
      log.p_100 = log10(p_value_0_vs_100) %>% round(digits = 2),
      log.p_200 = log10(p_value_0_vs_200) %>% round(digits = 2)
    ) %>% 
    select(
      !c(p_value_0_vs_100, p_value_0_vs_200)
    ) %>% 
# Add column for joining with response_tbl and join
    unite(
      Vars1_2,
      c(Var1_value, Var2_value), 
      sep = "_", 
      remove = F
      ) %>% 
  left_join(
    response_tbl, .) %>% 
  mutate(
     log.p_max.resp = case_when(
      max_resp_dose == "100" ~ log.p_100, 
      max_resp_dose == "200" ~ log.p_200
     )
    )
 

 ## Generate new plot of best responses, showing Wilcoxon p-values

pValue.test_results %>% 
# Select and reshape  
  select(
    Var1_value, Var2_value, 
    log.p_max.resp, 
    max_resp_dose, max_response
    ) %>% 
  pivot_wider(
    names_from = "Var2_value",
    values_from = c(
      "max_resp_dose", "max_response", "log.p_max.resp"), 
      names_sep = "-"
      ) %>% 
  pivot_longer(
    cols = !c("Var1", "Var1_value"),
    names_to = c("resp", "Var2_value"),
    values_to = c("max_response"), names_sep = "-"
    ) %>% 
  pivot_wider(
    names_from = "resp", 
    values_from = "max_response") %>% 
  mutate(
    Var1_value = as.factor(Var1_value)
    ) %>% 

# Plot data
  ggplot(
    aes(
      x = Var1_value %>% 
        factor(., 
          levels = unique(.) %>% rev()
          ), 
      y = Var2_value %>% 
        factor(., 
          levels = unique(.) %>% rev()
          ),
      col   = max_response,
      label = max_response %>% round(3) 
      )
    ) +
    geom_tile(
      aes(fill= factor(max_resp_dose)),
      col="lightgrey"
      ) +
    scale_fill_manual(
      values = c("100" = "#BBE0CF", "200" = "#D5F9E8"),
      na.translate = F
      ) +
    geom_text(
      col = "black", 
      vjust = 3.5, hjust = 0.5, 
      size = 10/.pt
      ) +
    geom_point(
      aes( size = log.p_max.resp), 
      shape = 19,
      position = position_nudge(y = 0.1)
      ) +
    scale_size(
      range = c(18, 1), 
      name = "log.p_max.resp") +
    labs(
      x = NULL, y = NULL, 
      col = "Maximum\nresponse",
      fill = "Dose",
      title = "Best median response, stratified by one or two patient characteristics",
      subtitle = paste( "\nlog.p: ", pTest_label)
      ) +
    scale_color_gradient2(
      low = "white", high="darkblue", 
      limits = c(-0.25, 1.5)
      ) +
    scale_x_discrete(
      position = "top"
      ) +
    scale_y_discrete() +
    theme_light(base_size = 15) + 
    theme(
      axis.text.x = element_text(
        angle = 90, 
        vjust = 0.5,
        hjust = 0
        ),
      panel.grid = element_blank() 
      )

# Display, sorted by significance of response at dose 200

pValue.test_results %>% 
  select(
    Var1, Var1_value, 
    Var2, Var2_value, 
    log.p_max.resp, 
    max_response, max_resp_dose,
    N_100, median_100,
    N_200, median_200
    ) %>%
  arrange(log.p_max.resp) %>% 
  kable(., "html",
        caption = "Maximum responses, sorted by significance") %>% 
    scroll_box(width = "120%", height = "150px") %>% 
    kable_styling( 
      font_size = 12,
      bootstrap_options = "striped"
      )
Maximum responses, sorted by significance
Var1 Var1_value Var2 Var2_value log.p_max.resp max_response max_resp_dose N_100 median_100 N_200 median_200
BMI low BMI Ethnic Black -1.79 0.7561552 100 25 0.9481041 22 0.4964860
BMI low BMI BMI low BMI -1.20 1.0041402 200 151 0.5962553 146 0.9452158
Age <40 years Ethnic White -1.06 1.3769607 200 76 0.6428964 67 1.2697480
BMI high BMI Sex Male -0.95 0.4547091 200 77 0.3924904 73 0.4359251
BMI low BMI Sex Female -0.90 1.3772039 200 78 0.8670915 69 1.3952003
Age <40 years Type Chronic disease -0.90 1.0150956 200 95 0.5859389 78 0.8744048
BMI high BMI Ethnic Black -0.88 0.2817528 200 23 0.4165172 26 0.7588200
Age <40 years Age <40 years -0.81 1.0314618 200 164 0.6061683 143 0.9282574
BMI low BMI Age <40 years -0.80 1.0661942 200 84 0.5855097 73 0.9341390
Ethnic Asian Sex Female -0.77 1.3629210 200 50 0.9448312 45 1.2521573
Ethnic Asian Sex Male -0.75 0.7794252 200 58 0.4049125 63 0.5271082
BMI low BMI Type Acute disease -0.70 1.2362381 200 71 0.6381219 57 1.2872080
Type Chronic disease Type Chronic disease -0.67 0.7574756 200 171 0.5721217 180 0.7739478
BMI low BMI Age >=40 years -0.67 0.9289176 200 67 0.6504704 73 0.9560899
BMI high BMI Type Chronic disease -0.65 0.2493957 200 91 0.5636511 91 0.6970257
BMI high BMI Age >=40 years -0.64 0.7624675 200 69 0.5920707 84 0.7220564
Ethnic White Ethnic White -0.55 1.1166348 200 144 0.5907656 144 1.1569848
BMI low BMI Ethnic White -0.55 1.2238384 200 77 0.5240786 74 1.2641885
Sex Male Type Acute disease -0.52 1.0002103 200 67 0.4530151 61 0.8317767
Ethnic Asian Type Chronic disease -0.50 0.6786501 100 66 0.5678864 62 0.5051804
BMI low BMI Ethnic Asian -0.46 1.0381097 200 49 0.6504704 50 0.8408472
BMI low BMI Type Chronic disease -0.42 0.9663556 200 80 0.5915215 89 0.8767894
Age >=40 years Sex Male -0.42 0.6711926 200 71 0.4517713 84 0.6499665
Age <40 years Sex Female -0.41 1.4738253 200 85 0.9365259 77 1.3802094
Sex Female Sex Female -0.39 1.2942516 200 150 0.9423150 150 1.3034651
Ethnic White Sex Male -0.33 0.8795431 200 72 0.2292620 70 0.8397472
BMI high BMI Age <40 years -0.29 0.9355423 200 80 0.6653357 70 0.9167583
Sex Female Type Acute disease -0.29 1.4073091 200 62 1.1145576 59 1.4856522
Ethnic Black Type Acute disease -0.26 1.3886972 100 19 0.9676295 22 0.4197576
Age >=40 years Ethnic Asian -0.25 0.7938925 100 49 0.6831288 55 0.6657136
BMI low BMI Sex Male -0.24 0.8092369 200 73 0.2511608 77 0.6943586
Ethnic Black Type Chronic disease -0.23 0.3114366 200 29 0.5701826 26 0.8071414
Sex Female Type Chronic disease -0.22 1.2642623 200 88 0.8593007 91 1.2292830
Sex Male Sex Male -0.21 0.6787458 200 150 0.3836353 150 0.5967923
Ethnic Black Sex Female -0.21 0.7044310 200 28 0.7605207 31 0.9350078
Ethnic White Type Acute disease -0.21 1.4209837 200 68 0.6494141 52 1.5249881
Ethnic Black Ethnic Black -0.20 0.4771506 200 48 0.5984809 48 0.6934405
Ethnic Asian Ethnic Asian -0.19 0.8071959 200 108 0.6662038 108 0.6700329
BMI high BMI Ethnic White -0.17 0.8957185 200 67 0.6272804 70 0.9188214
Ethnic White Type Chronic disease -0.17 0.9598003 200 76 0.5824246 92 0.8934708
Age >=40 years Type Chronic disease -0.16 0.6906872 200 76 0.5632136 102 0.7402838
BMI high BMI BMI high BMI -0.15 0.8563161 200 149 0.6116301 154 0.8165202
BMI high BMI Sex Female -0.14 1.3101590 200 72 1.1110143 81 1.2697480
Age >=40 years Sex Female -0.13 1.1970881 200 65 0.9744453 73 1.2400958
BMI high BMI Ethnic Asian -0.12 0.7914879 100 59 0.6831288 58 0.6097214
Ethnic Asian Type Acute disease -0.12 1.0228892 200 42 0.8153932 46 0.8562452
Age >=40 years Ethnic Black -0.11 1.1131057 100 19 1.1402780 25 0.7689209
Age <40 years Sex Male -0.11 0.5360498 200 79 0.2307690 66 0.4176329
Ethnic White Sex Female -0.09 1.4080069 200 72 0.9754775 74 1.4863500
Age <40 years Ethnic Black -0.07 0.3291949 200 29 0.5962553 23 0.6381619
Sex Male Type Chronic disease -0.07 0.3281991 200 83 0.1721311 89 0.3685491
Age >=40 years Age >=40 years -0.04 0.8054468 200 136 0.5995700 157 0.8280312
Ethnic Black Sex Male -0.04 0.3691546 100 20 0.5854445 17 0.2796167
BMI high BMI Type Acute disease -0.03 1.1837232 100 58 0.9500895 63 0.9350078
Age >=40 years Type Acute disease -0.03 1.3518500 200 60 0.7337059 55 1.2615492
Type Acute disease Type Acute disease -0.02 1.3149731 200 129 0.7299124 120 1.2406324
Age <40 years Ethnic Asian -0.02 0.8610026 200 59 0.5917255 53 0.6943586
Age >=40 years Ethnic White -0.02 0.8418724 200 68 0.5675774 77 0.9560899
Age <40 years Type Acute disease -0.01 1.2573735 200 69 0.7294408 65 1.1830328



5. Distribution of target values in patient subgroups treated with different doses


# - splitViolins                                                                      - #
# - From: https://stackoverflow.com/questions/35717353/split-violin-plot-with-ggplot2 - #

## Custom ggproto object for generating split violin plots

GeomSplitViolin <- ggplot2::ggproto(
    "GeomSplitViolin",
    ggplot2::GeomViolin,
    draw_group = function(self,
                          data,
                          ...,
                          # add the nudge here
                          nudge = 0,
                          draw_quantiles = NULL) {
        data <- transform(data,
                          xminv = x - violinwidth * (x - xmin),
                          xmaxv = x + violinwidth * (xmax - x))
        grp <- data[1, "group"]
        newdata <- plyr::arrange(transform(data,
                                           x = if (grp %% 2 == 1) xminv else xmaxv),
                                 if (grp %% 2 == 1) y else -y)
        newdata <- rbind(newdata[1, ],
                         newdata,
                         newdata[nrow(newdata), ],
                         newdata[1, ])
        newdata[c(1, nrow(newdata)-1, nrow(newdata)), "x"] <- round(newdata[1, "x"])

        # now nudge them apart
        newdata$x <- ifelse(newdata$group %% 2 == 1,
                            newdata$x - nudge,
                            newdata$x + nudge)

        if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {

            stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))

            quantiles <- ggplot2:::create_quantile_segment_frame(data,
                                                             draw_quantiles)
            aesthetics <- data[rep(1, nrow(quantiles)),
                               setdiff(names(data), c("x", "y")),
                               drop = FALSE]
            aesthetics$alpha <- rep(1, nrow(quantiles))
            both <- cbind(quantiles, aesthetics)
            quantile_grob <- ggplot2::GeomPath$draw_panel(both, ...)
            ggplot2:::ggname("geom_split_violin",
                             grid::grobTree(ggplot2::GeomPolygon$draw_panel(newdata, ...),
                                            quantile_grob))
        }
    else {
            ggplot2:::ggname("geom_split_violin",
                             ggplot2::GeomPolygon$draw_panel(newdata, ...))
        }
    }
)

geom_split_violin <- function(mapping = NULL,
                              data = NULL,
                              stat = "ydensity",
                              position = "identity",
                              # nudge param here
                              nudge = 0,
                              ...,
                              draw_quantiles = NULL,
                              trim = TRUE,
                              scale = "area",
                              na.rm = FALSE,
                              show.legend = NA,
                              inherit.aes = TRUE) {

    ggplot2::layer(data = data,
                   mapping = mapping,
                   stat = stat,
                   geom = GeomSplitViolin,
                   position = position,
                   show.legend = show.legend,
                   inherit.aes = inherit.aes,
                   params = list(trim = trim,
                                 scale = scale,
                                 # don't forget the nudge
                                 nudge = nudge,
                                 draw_quantiles = draw_quantiles,
                                 na.rm = na.rm,
                                 ...))
}

I used split violin box plots for the pairwise comparison of target medians and distributions in low and high responder groups. Low responders are shown in green on the left side, high responder groups in red on the right. Likewise, facets for the second group variable are arranged from lowest (left) to highest responders (right).



5.1. Dose vs. target values in patients stratified by sex and age


(see also 3.6.: female patients stratified by sex and age)

# - Chunk_4a                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- compare_means(
  data = exampleData,
  target ~ dose,
  group.by = c("sex", "age"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("sex", "age", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Sex", "Age", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select( dose, target) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200") 
        ), 
      y = target,
# Order low > high responders
    fill = 
      factor(
        sex, 
        levels = c("Male", "Female") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue( direction = -1) +    
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
# Order facets low > high responders
    facet_grid(. 
      ~ factor(
        age, 
        levels = c(">=40 years", "<40 years")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Sex",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 2),
widths = c(0.2, 1, 1),
align = "h"
)    



5.2. Dose vs. target values in patients stratified by sex and type of disease

# - Chunk_4b                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- compare_means(
  data = exampleData,
  target ~ dose,
  group.by = c("sex", "type"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("sex", "type", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Sex", "Type", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select( dose, target) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200") 
        ), 
      y = target,
# Order low > high responders
    fill = 
      factor(
        sex, 
        levels = c("Male", "Female") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue( direction = -1) +    
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
 # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        type, 
        levels = c("Chronic disease", "Acute disease")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Sex",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1.5),
widths = c(0.2, 1, 1),
align = "h"
)    



5.3. Dose vs. target values in patients stratified by ethnic and type of disease


(see also 3.5.: female patients stratified by ethnic and type of disease)

# - Chunk_4c                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #


# Generate a table of significant differences (Wilcoxon test) 

sign_test <- compare_means(
  data = exampleData,
  target ~ dose,
  group.by = c("type", "race"),
  ref.group = "0",
 ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select(!".y." & !"group1" & !"method")  %>% 
  set_names(
    "Sex", "Ethnic", "Dose", "p", "p.adj", "p.format", "p.signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by( bmi, age, race, sex, type) %>% 
  select( dose, target) %>% 
  ggplot(
    aes(
      x= factor(dose, levels = c("0", "100", "200") ), 
      y = target,
# Order low > high responders
    fill = factor (type, levels = c("Chronic disease", "Acute disease") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue (direction = -1) +
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
 # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        race, 
        levels = c("Black", "Asian", "White")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Type",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 3),
widths = c(0.2, 1, 1),
align = "h"
)    



Interestingly, Blacks with chronic disease had high target levels, even when untreated,

5.4. Dose vs. target values in patients stratified by race and sex

# - Chunk_4d                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- compare_means(
  data = exampleData,
  target ~ dose,
  group.by = c("sex", "race"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select(!".y." & !"group1" & !"method")  %>% 
  set_names(
    "Sex", "Ethnic", "Dose", "p", "p.adj", "p.format", "p.signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select (dose, target) %>% 
  ggplot(
    aes(
      x= factor(dose, levels = c("0", "100", "200") ), 
      y = target,
# Order low > high responders
    fill = 
      factor(
        sex, 
        levels = c("Male", "Female") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue( direction = -1) +    
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
  # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        race, 
        levels = c("Black", "Asian", "White")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Sex",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 2),
widths = c(0.2, 1, 1),
align = "h"
)    



5.5. Dose vs. target values in female patients stratified by ethnic and disease type

# - Chunk_4e                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- 
  exampleData %>% 
  filter(sex == "Female") %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = c("race", "type"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("race", "type", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Ethnic", "Type", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select (dose, target) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200") 
        ), 
         y = target,
# Order low > high responders
    fill = factor (type, levels = c("Chronic disease", "Acute disease") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue (direction = -1) +       geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
 # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        race, 
        levels = c("Black", "Asian", "White")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Type",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 3),
widths = c(0.2, 1, 1),
align = "h"
)    



5.6. Dose vs. target values in female patients stratified by ethnic and age

# - Chunk_4f                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- 
  exampleData %>% 
  filter(sex == "Female") %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = c("race", "age"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("race", "age", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Ethnic", "Age", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select (dose, target) %>% 
  ggplot (
    aes (x= factor(dose, levels = c("0", "100", "200") ), 
         y = target,
# Order low > high responders
        fill = factor (age, levels = c(">=40 years", "<40 years") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue (direction = -1) +
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
 # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        race, 
        levels = c("Black", "Asian", "White")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target",
      fill = "Age",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 2.5),
widths = c(0.2, 1, 1),
align = "h"
)    



Younger white females appear to respond better than older, especially at the higher dose, whereas the opposite appears to be true for Asian females (not sure about the significance of that…).

5.7. Dose vs. target values in White patients stratified by sex and disease type

# - Chunk_4g                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- 
  exampleData %>% 
  filter(race == "White") %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = c("sex", "type"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("sex", "type", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Sex", "Type", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select (dose, target) %>% 
  ggplot (
    aes (x= factor(dose, levels = c("0", "100", "200") ), 
         y = target,
# Order low > high responders
    fill = 
      factor(
        sex, 
        levels = c("Male", "Female") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue( direction = -1) +    
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
 # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        type, 
        levels = c("Chronic disease", "Acute disease")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Sex",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1.5),
widths = c(0.2, 1, 1),
align = "h"
)    



5.8. Dose vs. target values in White patients stratified by sex and age

# - Chunk_4h                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- 
  exampleData %>% 
  filter(race == "White") %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = c("sex", "age"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("sex", "age", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Sex", "Age", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select (dose, target) %>% 
  ggplot(
    aes(
      x = factor(
        dose, 
        levels = c("0", "100", "200") 
        ), 
      y = target,
# Order low > high responders
    fill = 
      factor(
        sex, 
        levels = c("Male", "Female") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue( direction = -1) +    
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
     geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
# Order facets low > high responders
    facet_grid(. 
      ~ factor(
        age, 
        levels = c(">=40 years", "<40 years")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Sex",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1.5),
widths = c(0.2, 1, 1),
align = "h"
)    



5.9. Dose vs. target values in patients stratified by race and BMI


(The difference between median targets in the Black/ low BMI/ dose 100 group had the highest significance in the Mood test)

# - Chunk_4i                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- 
  exampleData %>% 
  filter(race == "White") %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = c("race", "bmi"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("race", "bmi", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Ethnic", "BMI", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select(dose, target) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200")
        ), 
      y = target,
      fill = bmi
      ) 
    ) +
# Color red for patient variable value with higher response 
    scale_fill_hue (direction = -1) +
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
     geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
  # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        race, 
        levels = c("Black", "Asian", "White")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "BMI",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1),
widths = c(0.2, 1, 1),
align = "h"
)    



6. Groups with the highest and lowest responses


It appears that the groups the received the most significant benefit were women who were either <40 years (p.adj = 7.9 * 10-13, median difference 1.474), White (p.adj = 1.2 * 10-12, median difference 1.406), or with acute disease (p.adj = 3.9 * 10-12, median difference 1.407). Whites with acute disease or <40 years also showed significant benefit (p.adj = 2.7 * 10-10, median difference 1.421; median difference 1.474, 3.3 * 10-9, median difference 1.377), but this could be mainly because of the female sub-population of these groups.

6.1. White females <40 years respond better than the remaining patients


# - Chunk_5a                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

exampleData <- 
  exampleData %>% 
    mutate(
      Group = 
        case_when(
          race == "White" &
          #type == "Acute disease" &
          age == "<40 years" &
          sex == "Female"
             ~ "White / F / <40 yrs",
        TRUE ~ "Remaining"
        )
      )

sign_test <- 
  exampleData %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = "Group",
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("Group", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Group", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme(
      "light", 
      base_size = 10
      ),
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  select (dose, target, type, Group) %>% 
  group_by(Group) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200")
        ), 
      y = target,
      fill = Group 
      ) 
    ) +
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    fill_palette(
      palette = c("lightgrey","salmon")
      ) +
    geom_split_violin( 
      width = 1,
      nudge = 0.01,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
      width = 0.2, 
      outlier.shape = NA) +
    geom_point(
      pch = 21,
      size = 3,
      position = 
        position_jitterdodge(
          dodge.width = 0.7, 
          jitter.width = 0.1
          ),
      color = "grey",
      alpha = 0.5
      ) +
    theme_light() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) +    
    labs(
      x = "Dose", 
      y = "Target", 
      size = 13, hjust = -15,
      title = "Target vs. dose levels in the group with highest responses\n"
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1),
widths = c(0.2, 1, 1),
align = "h"
)    



6.2. Black males with chronic disease show no or negative response


However, this might need confirmation in a larger sample of this subgroup.

# - Chunk_5b                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

exampleData <- 
  exampleData %>% 
    mutate(
      Group = 
        case_when(
          race == "Black" &
          # bmi == "high BMI" &
           type == "Chronic disease" &
          # age == ">=40 years" &
          sex == "Male"
             ~ "Black/ M / Chronic disease",
        TRUE ~ "Remaining"
        )
      )

sign_test <- 
  exampleData %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = "Group",
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("Group", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Group", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme(
      "light", 
      base_size = 10
      ),
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  select (dose, target, type, Group) %>% 
  group_by(Group) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200")
        ), 
      y = target,
      fill = Group 
      ) 
    ) +
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    fill_palette(
      palette = c("turquoise", "lightgrey")
      ) +
    geom_split_violin( 
      width = 1,
      nudge = 0.01,
      alpha = 0.3,
      color = "grey") +
  geom_boxplot(
      width = 0.2, 
      outlier.shape = NA) +
  geom_point(
    pch = 21,
    size = 3,
    position = 
      position_jitterdodge(
        dodge.width = 0.7, 
        jitter.width = 0.1
        ),
    color = "grey",
    alpha = 0.5
    ) +
  theme_light() +
  theme(
    text=element_text( size = 13), 
    axis.text = element_text( size = 12),
    ) +
  scale_y_continuous(
    breaks = c(-2: 5),
    minor_breaks = NULL
    ) +    labs(
      x = "Dose", 
      y = "Target", 
      size = 13, hjust = -15,
      title = "Target vs. dose levels in the group with lowest responses\n"
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1),
widths = c(0.2, 1, 1),
align = "h"
)    



7. Questions and alternative visualisations


The distribution of target in the total population looks similar for all 3 doses, and suggests that at least two major subgroups exist that might respond differently.

In the following 3 figures I tried to produce an overview of best responses in subgroups characterised by 1-2 of the patient variables. Because there was no guarantee for a linear relationship between dose and target, I decided to compare the dose 100 and dose 200 groups individually with the dose 0 (untreated) group. Because in many subgroups the target values did not seem to be normally distributed, I choose to compare the differences between the median target values. I also tried to assess the significance of these differences. It struck me that the p-values of the Wilcoxon test seemed extremely significant in all groups, even after correction. By contrast, the Mood test produced no significant results.
In contrast to the Wilcoxon test, the Mood test does not require the distributions of the compared groups to be similar. However, its power to detect inter-group differences is much lower. My question to the statistician would be whether any of these tests should be applied to all subgroups characterised by one or two of the variables. Certainly, visually inspecting the target distributions for each seems subjective and cumbersome.

A better approach could be the modified Forest plot suggested by Ballarine et al., and implemented in the “UpSet plot” function from the R package “SubgrPlots” ^1, 2^.

Forest/UpSet plot


I found two other alternatives interesting, although perhaps more as tools for subgroup analysis than visualisation. The webinar discussed the shiny app DoRiS (Dose Response in Subgroup Analysis) which aims to facilitate data exploration and hypothesis generation in data sets with a very large numbers of subgroups. Rather than comparing the responses themselves, it employed a user-determined response threshold which could be, e.g., the clinically meaningful effect size. Either manual choice or automatic recognition of subgroup patterns was possible, but at the time only for a single patient variable. Favoring an exploratory approach, p-values and confidence intervals were not calculated to leave room for interdisciplinary discussions of plausible mechanisms and clinical relevance. I could not find the link to the shiny app on github, but it appeared to be very similar to the package “subscreen” (SAS® version 9.4 and R) 3.

Finally, “personalized” seems to be a very comprehensive analysis package in R that unifies a variety of methods for subgroup identification and treatment evaluation into one general, consistent framework 4.


Data exploration with Subscreen



8. Citation


Please quote as: Weissensteiner, Thomas. Visualising patient groups who might benefit from personalised dosing. RPubs, 10 Aug. 2024. https://rpubs.com/thomas-weissensteiner/1231597.

9. References


  1. A critical review of graphics for subgroup analyses in clinical trials. Ballarini NM, Chiu YD, König F, Posch M, Jaki T. Pharm Stat 2020, 19(5): 541-560. https://doi.org/10.1002/pst.2012

  2. A critical review of graphics for subgroup analyses in clinical trials. Ballarini NM, Chiu YD, König F, Posch M, Jaki T. Pharm Stat. 2020 Sep;19(5):541-560. https://doi.org/10.1002/pst.2012
    SubgrPlots: Graphical Displays for Subgroup Analysis in Clinical Trials. R package version 0.1.0. Ballarini N, Chiu YD.; 2018

  3. A novel concept of screening for subgrouping factors for the association between socioeconomic status and respiratory allergies. Muysers, C., Messina, F., Keil, T. et al.  J Expo Sci Environ Epidemiol 2022, 32: 295–302. https://doi.org/10.1038/s41370-021-00365-x

  4. Subgroup Identification Using the personalized Package. Huling, JD, & Yu, M. Journal of Statistical Software 2021, 98(5): 1–60. https://doi.org/10.18637/jss.v098.i05