Visualising patient groups who might benefit from personalised dosing

My scripts:

rendered html versions: Rpubs/thomas-weissensteiner
.rmd files with executable code chunks: www.github.com/thomas-weissensteiner/portfolio/tree/main/

About myself: www.linkedin.com/in/ThomasWs-Mopfair

1. Background and motivation

Current dosing practices are based on average responses from limited clinical trials. However, overlooking variations such as to sex, age, and genetics can lead to adverse events in certain patient groups. Conversely, a new drug which might have benefit for a sub-group of patients might fail to get approval because the pivotal trial showed insufficient general efficacy in the general population.

The scripts were written for the VIS-SIG Wonderful Wednesdays challenge 10 Oct 24 (link to recording). “Wonderful Wednesdays” are a monthly webinar organised by the Visualisation Special Interest Group of Statisticians in the Pharmaceutical Industry (PSI).

The task consisted of identifying patient subgroups with dose responses that deviate from the remaining study participants, and who therefore might benefit from personalized dosing.

2. R packages and data set

# - set-up                                                     - #

library(dplyr)        # version 1.1.4  # general grammar
library(purrr)        # version 1.0.2  # set_names, list_rbind
library(tidyr)        # version 1.3.1  # pivot_longer, pivot_wider
library(ggpubr)       # version , includes ggplot2 3.4.0 # stat_compare_means, ggtexttable, ggarrange
library(kableExtra)   # rendering tables in HTML format

exampleData <- read.csv("https://raw.githubusercontent.com/VIS-SIG/Wonderful-Wednesdays/refs/heads/master/data/2024/2024-09-11/WWW54ExampleData.csv")

The data were a simulated dose response set with the following structure:

dose: dose levels (0, 100, 200 mg)
target: continuous response variable (higher is better)

subgroup variables:
bmi (body mass index: ‘low BMI’ or ‘high BMI’)
age (‘<40 years’ or ‘≥40 years’)
race (‘Black’, ‘Asian’, ‘White’)
sex (‘Female’, ‘Male’)
type (type of disease: ‘Acute disease’ or ‘Chronic disease’)

3. Target vs. dose in all patients

# - Chunk_1                                              - #
# - Requires R-libraries and object exampleData (set-up) - #

## Plot distribution of target values at different doses in the total population

sign_test <- 
  exampleData %>% 
  compare_means(
  data = .,
  target ~ dose,
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme(
      "light", 
      base_size = 10
      ),
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(
  
exampleData %>% 
  select (dose, target) %>% 
  
  ggplot (
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200"
        )
      ), 
      y = target)
    ) +
  geom_hline(
    yintercept = 0, 
    color = "darkgrey",
    linewidth = 1.1
    ) +
  geom_violin(
    color = "lightgrey",
    fill  = "lightgrey", 
    width = 0.45
    ) +
  geom_jitter(
    height = 0, width = 0.1, 
    size = 2, 
    alpha = 0.2
    ) +
  geom_boxplot(
    width = 0.15 , 
    outlier.shape = NA,
    color = "salmon",
    lwd = 0.7,
    alpha = 0.5
    ) +
  theme_light() +
  theme(
    text=element_text(size = 13), 
    axis.text = element_text(
      size = 12),
    ) +
  scale_y_continuous(
    breaks = c(-2: 5),
    minor_breaks = NULL
    ) +
  labs(
    x = "Dose", 
    y = "Target", 
    size = 12, hjust = -15,
    title = "Target vs. dose levels in the total patient population\n"
    ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1)
)

The reference for the Wilcoxon test is the group of untreated patients (dose = 0). Differences in target levels in treated patients are significant. However, the distribution of target values at the three dose levels suggests that at least two subgroups might exists that could respond differently.

4. Best response in patient subgroups

In order to identify these potential groups, I stratified patients by one or two of the clinical and demographic characteristics that were provided. For each group, I then calculated the median response for doses 100 and 200 (difference between median target values in the treated subgroups and patients receiving a dose 0).

# - Chunk_2                                              - #
# - Requires R-libraries and object exampleData (set-up) - #

## Table of responses in patient subgroups characterised by one or two variables

# Generate all pairwise combinations of patient characteristics
pVars <- 
  exampleData %>% 
  names() %>% .[!. %in% c("dose","target")]

pVar_pairs <- 
  pVars %>% 
  { c(
    lapply(., function(x) c(x, x)),       # duplicate each characteristic
    combn(.,2, simplify = F)              # all combinations of 2 characteristics
  ) }
 
# ---
pChar_pairs <- 
  lapply(
    seq_along(pVar_pairs), 
    function (x) 
      exampleData %>% 
      select(pVar_pairs[[x]] ) %>% 
      unique(.)
    )
# ---

# Calculate median responses for the patient subgroups 
response_tbl <- 
  lapply(
    pVar_pairs, 
    function(x) 
      exampleData %>% 
        group_by(
          across(all_of(x) ), 
          dose
          ) %>% 
      summarise(
        median(target),
        N = n()
      )
  ) %>%
  # ensure all dataframes have the same number of columns 
    lapply(function(df)
      if(
        ncol(df) < length(pVars)) {
        cbind(df[, 1, drop = FALSE], df)    
        }else{
          df} 
      ) %>% 
  lapply(function(df)
    pivot_wider(
      df, 
      names_from = "dose",
      values_from = c("median(target)", "N")
      ) %>%
    set_names(
     "Var1_value", "Var2_value", 
     "median_0", "median_100", "median_200", 
      "N_0", "N_100", "N_200"
      )
    ) %>% 
  list_rbind() %>%
  mutate(
    response_100 = median_100 - median_0,
    response_200 = median_200 - median_0,
    max_response = max(response_100, response_200) ,
    max_resp_dose = which.max(c(response_100, response_200))*100,
    N_max = c(N_100, N_200)[max_resp_dose/100]
    )


## Table of best median responses stratified by patient subgroups

# Re-order columns, sort by maximum response (highest first)
response_tbl %>%   
  select( 
    "Var1_value", "Var2_value", 
    "N_max", "max_response",
    "N_0", "N_100", "N_200",
    "median_0", "median_100", "median_200"
    ) %>% 
  arrange( desc(max_response)) %>% 
  kable(., "html") %>% 
  scroll_box(width = "100%", height = "200px") %>% 
  kable_styling( 
    bootstrap_options = c(
      "striped", "hover", "responsive", full_width = F) 
    ) 


# Optional: rearrange so that diagonal of the plot shows responses in order of magnitude
response_tbl <- 
  response_tbl %>%
  mutate(
    .before = Var1,
    Var1 = case_when (
      Var1_value %in% c("high BMI", "low BMI") ~ "BMI",
      Var1_value %in% c(">=40 years", "<40 years") ~ "Age",
      Var1_value %in% c("Asian", "Black", "White") ~ "Ethnic",
      Var1_value %in% c("Chronic disease", "Acute disease") ~ "Type",
      Var1_value %in% c("Female", "Male") ~ "Sex"
      ),
    Var2 = case_when (
      Var2_value %in% c("high BMI", "low BMI") ~ "BMI",
      Var2_value %in% c(">=40 years", "<40 years") ~ "Age",
      Var2_value %in% c("Asian", "Black", "White") ~ "Ethnic",
      Var2_value %in% c("Chronic disease", "Acute disease") ~ "Type",
      Var2_value %in% c("Female", "Male") ~ "Sex"
      )
    ) %>% 
    mutate(
      pVar_equal = (Var1 == Var2)
      ) %>%  
    { rbind( 
        filter(., pVar_equal) %>% 
          group_by(Var1) %>%
          arrange(desc(max_response), .by_group = TRUE) %>%
          mutate(max_value = max(max_response, na.rm = TRUE)) %>%
          arrange(desc(max_value)) %>% 
          select (!max_value),
        filter(., !pVar_equal) 
        ) 
      } %>% 
# Add column for joining with Wilcox and Mood test results (chunks 3b and 3c)
  unite(
    Vars1_2,
    c(Var1_value, Var2_value), 
    sep = "_", 
    remove = F
    )

4.1. Best responses in patients stratified by demographic and clinical parameters - showing patient numbers in treated and matched untreated subgroups

# - Chunk_3a                                              - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object response_tbl (chunk_2)               - #


# Select and reshape  

response_tbl %>% 
  select(
    Var1,
    Var1_value, Var2_value, 
    N_max, N_0, 
    max_resp_dose, max_response
    ) %>% 
  pivot_wider(
    names_from = "Var2_value",
    values_from = c(
      "max_resp_dose", "max_response", "N_max", "N_0"), 
      names_sep = "-"
      ) %>% 
  pivot_longer(
    cols = !c("Var1", "Var1_value"),
    names_to = c("resp", "Var2_value"),
    values_to = c("max_response"), names_sep = "-"
    ) %>% 
  pivot_wider(
    names_from = "resp", 
    values_from = "max_response") %>% 
  mutate(
    Var1 = as.factor(Var1)
    ) %>% 

# Plot data
  ggplot(
    aes(
      x = Var1_value %>% 
        factor(., levels = unique(.) %>% rev ), 
      y = Var2_value %>% 
        factor(., levels = unique(.) %>% rev ),
      col   = max_response,
      label = max_response %>% round(3) 
      )
    ) +
    geom_tile(
      aes(fill= factor(max_resp_dose)),
      col="lightgrey"
      ) +
    scale_fill_manual(
      values = c("100" = "#BBE0CF", "200" = "#D5F9E8"),
      na.translate = F
      ) +
    geom_text(
      col = "black", 
      vjust = 2.5, hjust = 0.5, 
      size = 10/.pt
      ) +
    geom_point(
      aes( size = N_0), 
      shape = 15, 
      color = "white",
      position = position_nudge(y = 0.1)
      ) +
    geom_point(
      aes( size = N_max), 
      shape = 19,
      position = position_nudge(y = 0.1)
      ) +
    geom_point(
      aes( size = N_0), 
      shape = 22, 
      color = "darkblue",
      position = position_nudge(y = 0.1)
      ) +
    scale_size(range = c(1, 15), name = "N") +
    labs(
      x = NULL, y = NULL, 
      col = "Maximum\nresponse",
      fill = "Dose",
      title = "Best median response, stratified by one or two patient characteristics") +
    scale_color_gradient2(
      low = "white", high="darkblue", 
      limits = c(-0.25, 1.5)
      ) +
    scale_x_discrete(
      position = "top"
      ) +
    scale_y_discrete() +
    theme_light(base_size = 15) + 
    theme(
      axis.text.x = element_text(
        angle = 90, 
        vjust = 0.5,
        hjust = 0
        ),
      panel.grid = element_blank() 
    )

response_tbl %>% 
  select(
    Var1, Var1_value, 
    Var2, Var2_value, 
    max_response, max_resp_dose, N_max, 
    N_0, median_0, 
    N_100, median_100,
    N_200, median_200
    ) %>%
  arrange(
    desc(max_response) ) %>% 
  kable(., "html",
         caption = "Maximum response in each patient group, in descending order") %>% 
    scroll_box(width = "120%", height = "150px") %>% 
    kable_styling( 
      font_size = 12,
      bootstrap_options = "striped"
      )

Maximum response in each patient group, in descending order
Var1	Var1_value	Var2	Var2_value	max_response	max_resp_dose	N_max	N_0	median_0	N_100	median_100	N_200	median_200
Age	<40 years	Sex	Female	1.4738253	200	77	80	-0.0936159	85	0.9365259	77	1.3802094
Ethnic	White	Type	Acute disease	1.4209837	200	52	59	0.1040044	68	0.6494141	52	1.5249881
Ethnic	White	Sex	Female	1.4080069	200	74	62	0.0783431	72	0.9754775	74	1.4863500
Sex	Female	Type	Acute disease	1.4073091	200	59	70	0.0783431	62	1.1145576	59	1.4856522
Ethnic	Black	Type	Acute disease	1.3886972	100	19	26	-0.4210677	19	0.9676295	22	0.4197576
BMI	low BMI	Sex	Female	1.3772039	200	69	85	0.0179963	78	0.8670915	69	1.3952003
Age	<40 years	Ethnic	White	1.3769607	200	67	66	-0.1072127	76	0.6428964	67	1.2697480
Ethnic	Asian	Sex	Female	1.3629210	200	45	60	-0.1107637	50	0.9448312	45	1.2521573
Age	>=40 years	Type	Acute disease	1.3518500	200	55	68	-0.0903008	60	0.7337059	55	1.2615492
Type	Acute disease	Type	Acute disease	1.3149731	200	120	135	-0.0743407	129	0.7299124	120	1.2406324
BMI	high BMI	Sex	Female	1.3101590	200	81	65	-0.0404110	72	1.1110143	81	1.2697480
Sex	Female	Sex	Female	1.2942516	200	150	150	0.0092135	150	0.9423150	150	1.3034651
Sex	Female	Type	Chronic disease	1.2642623	200	91	80	-0.0349792	88	0.8593007	91	1.2292830
Age	<40 years	Type	Acute disease	1.2573735	200	65	67	-0.0743407	69	0.7294408	65	1.1830328
BMI	low BMI	Type	Acute disease	1.2362381	200	57	68	0.0509698	71	0.6381219	57	1.2872080
BMI	low BMI	Ethnic	White	1.2238384	200	74	73	0.0403501	77	0.5240786	74	1.2641885
Age	>=40 years	Sex	Female	1.1970881	200	73	70	0.0430077	65	0.9744453	73	1.2400958
BMI	high BMI	Type	Acute disease	1.1837232	100	58	67	-0.2336337	58	0.9500895	63	0.9350078
Ethnic	White	Ethnic	White	1.1166348	200	144	119	0.0403501	144	0.5907656	144	1.1569848
Age	>=40 years	Ethnic	Black	1.1131057	100	19	33	0.0271723	19	1.1402780	25	0.7689209
BMI	low BMI	Age	<40 years	1.0661942	200	73	82	-0.1320552	84	0.5855097	73	0.9341390
BMI	low BMI	Ethnic	Asian	1.0381097	200	50	53	-0.1972625	49	0.6504704	50	0.8408472
Age	<40 years	Age	<40 years	1.0314618	200	143	144	-0.1032044	164	0.6061683	143	0.9282574
Ethnic	Asian	Type	Acute disease	1.0228892	200	46	50	-0.1666440	42	0.8153932	46	0.8562452
Age	<40 years	Type	Chronic disease	1.0150956	200	78	77	-0.1406907	95	0.5859389	78	0.8744048
BMI	low BMI	BMI	low BMI	1.0041402	200	146	157	-0.0589245	151	0.5962553	146	0.9452158
Sex	Male	Type	Acute disease	1.0002103	200	61	65	-0.1684336	67	0.4530151	61	0.8317767
BMI	low BMI	Type	Chronic disease	0.9663556	200	89	89	-0.0895662	80	0.5915215	89	0.8767894
Ethnic	White	Type	Chronic disease	0.9598003	200	92	60	-0.0663296	76	0.5824246	92	0.8934708
BMI	high BMI	Age	<40 years	0.9355423	200	70	62	-0.0187840	80	0.6653357	70	0.9167583
BMI	low BMI	Age	>=40 years	0.9289176	200	73	75	0.0271723	67	0.6504704	73	0.9560899
BMI	high BMI	Ethnic	White	0.8957185	200	70	46	0.0231029	67	0.6272804	70	0.9188214
Ethnic	White	Sex	Male	0.8795431	200	70	57	-0.0397959	72	0.2292620	70	0.8397472
Age	<40 years	Ethnic	Asian	0.8610026	200	53	56	-0.1666440	59	0.5917255	53	0.6943586
BMI	high BMI	BMI	high BMI	0.8563161	200	154	123	-0.0397959	149	0.6116301	154	0.8165202
Age	>=40 years	Ethnic	White	0.8418724	200	77	53	0.1142174	68	0.5675774	77	0.9560899
BMI	low BMI	Sex	Male	0.8092369	200	77	72	-0.1148783	73	0.2511608	77	0.6943586
Ethnic	Asian	Ethnic	Asian	0.8071959	200	108	106	-0.1371630	108	0.6662038	108	0.6700329
Age	>=40 years	Age	>=40 years	0.8054468	200	157	136	0.0225843	136	0.5995700	157	0.8280312
Age	>=40 years	Ethnic	Asian	0.7938925	100	49	50	-0.1107637	49	0.6831288	55	0.6657136
BMI	high BMI	Ethnic	Asian	0.7914879	100	59	53	-0.1083591	59	0.6831288	58	0.6097214
Ethnic	Asian	Sex	Male	0.7794252	200	63	46	-0.2523170	58	0.4049125	63	0.5271082
BMI	high BMI	Age	>=40 years	0.7624675	200	84	61	-0.0404110	69	0.5920707	84	0.7220564
Type	Chronic disease	Type	Chronic disease	0.7574756	200	180	145	0.0164722	171	0.5721217	180	0.7739478
BMI	low BMI	Ethnic	Black	0.7561552	100	25	31	0.1919490	25	0.9481041	22	0.4964860
Ethnic	Black	Sex	Female	0.7044310	200	31	28	0.2305768	28	0.7605207	31	0.9350078
Age	>=40 years	Type	Chronic disease	0.6906872	200	102	68	0.0495966	76	0.5632136	102	0.7402838
Sex	Male	Sex	Male	0.6787458	200	150	130	-0.0819534	150	0.3836353	150	0.5967923
Ethnic	Asian	Type	Chronic disease	0.6786501	100	66	56	-0.1107637	66	0.5678864	62	0.5051804
Age	>=40 years	Sex	Male	0.6711926	200	84	66	-0.0212261	71	0.4517713	84	0.6499665
Age	<40 years	Sex	Male	0.5360498	200	66	64	-0.1184169	79	0.2307690	66	0.4176329
Ethnic	Black	Ethnic	Black	0.4771506	200	48	55	0.2162899	48	0.5984809	48	0.6934405
BMI	high BMI	Sex	Male	0.4547091	200	73	58	-0.0187840	77	0.3924904	73	0.4359251
Ethnic	Black	Sex	Male	0.3691546	100	20	27	0.2162899	20	0.5854445	17	0.2796167
Age	<40 years	Ethnic	Black	0.3291949	200	23	22	0.3089670	29	0.5962553	23	0.6381619
Sex	Male	Type	Chronic disease	0.3281991	200	89	65	0.0403501	83	0.1721311	89	0.3685491
Ethnic	Black	Type	Chronic disease	0.3114366	200	26	29	0.4957048	29	0.5701826	26	0.8071414
BMI	high BMI	Ethnic	Black	0.2817528	200	26	24	0.4770672	23	0.4165172	26	0.7588200
BMI	high BMI	Type	Chronic disease	0.2493957	200	91	56	0.4476301	91	0.5636511	91	0.6970257

Tiles in the plot correspond to patient groups characterised by the row and column labels. Magnitude of the maximum median response in each patient group is shown by the colour of the dots and the numerical values below them. The numbers (N) of patients in the different groups is indicated by the size of the dots (treated), and white squares with dark blue border (untreated). The dose that achieved the maximum median response is shown by the colour of the tiles (dark green: 100, lighter green: 200).

The diagonal shows groups stratified by a single characteristic, arranged so that the magnitude of the responses increases from bottom left to top right. The acute disease type was associated with the highest response, followed by female sex, and White ethnic. Conversely, patients with chronic disease, or were male or Black, had the lowest responses. BMI and age had relatively smaller effects as a single stratifier.

When two characteristics were applied, females <40 years had the highest response, followed by Whites with acute disease, White females, and females or Blacks with acute disease. The group wit the lowest median response were patients with chronic disease and high BMI. Low responses were also observed in patients who were Black and male, <40 years, with chronic disease, or high BMI. This is in contrast to the Blacks with acute disease having among the highest responses. However, numbers in these groups were relatively small and the result might need confirmation in a larger sample.

In most groups, the higher dose (200) was associated with the greater response, but Asians with chronic disease or high BMI, or patients with high BMI and acute disease might benefit more from the lower dose.

4.2. Best response in patients stratified by demographic and clinical parameters - showing Wilcoxon / Mann-Whitney test p-values for difference between treated and matched untreated subroups

# - Chunk__3b:                                      - #

## Calculate Wilcoxon test p-values for all comparisons

pTest_label <- "Wilcoxon test"

pValue.test<- function(x, y) {
  test <- wilcox.test(x, y)
  return(test$p.value)
}

# - Chunk__pTests: common code for p-value calculations in chunks 3b and c - #
# - Requires R-libraries and object exampleData (set-up)                   - #
# - Requires objects pChar_pairs and response_tbl (chunk_2)                - #


pValue.test_results <- list ()

for (i in 1: length(pChar_pairs)) {
  pValue.test_results_j <- list()

  for (j in 1: nrow(pChar_pairs[[i]])) {
    x <- pChar_pairs[[i]] %>% names
    y <- pChar_pairs[[i]] %>% .[j, ]%>% set_names(x)
# Ensure names are preserved when cbinding as transformed dataframe below
    if (length(x) == 1) {
      y <- as.data.frame (t(y))
      }

  pValue.test_results_j[[j]] <- 
    exampleData %>% 
      {if (length(x) == 1) {
        filter(.,
          get(x[1]) == paste(y[1])  
          ) 
        } else {
          filter(.,
            get(x[1]) == paste(y[1]) & 
            get(x[2]) == paste(y[2])
            ) 
          } 
        } %>% 
      summarise(
        p_value_0_vs_100 = pValue.test(target[dose == "0"], target[dose == "100"] ),
        p_value_0_vs_200 = pValue.test(target[dose == "0"], target[dose == "200"] )
        ) %>% 
      cbind(y, .)
    }
    pValue.test_results[[i]] <- pValue.test_results_j

  }

pValue.test_results <- 
pValue.test_results %>% 
  flatten() %>% 
  lapply(function(df)
      if(
        ncol(df) < 4) {
        cbind(df[, 1, drop = FALSE], df)    
        }else{
          df} 
      ) %>% 
    lapply(function(df)
      set_names(df,
        "Var1_value", "Var2_value", 
        "p_value_0_vs_100", "p_value_0_vs_200"
        ) 
      ) %>% 
    list_rbind() %>% 
    mutate(
      log.p_100 = log10(p_value_0_vs_100) %>% round(digits = 2),
      log.p_200 = log10(p_value_0_vs_200) %>% round(digits = 2)
    ) %>% 
    select(
      !c(p_value_0_vs_100, p_value_0_vs_200)
    ) %>% 
# Add column for joining with response_tbl and join
    unite(
      Vars1_2,
      c(Var1_value, Var2_value), 
      sep = "_", 
      remove = F
      ) %>% 
  left_join(
    response_tbl, .) %>% 
  mutate(
     log.p_max.resp = case_when(
      max_resp_dose == "100" ~ log.p_100, 
      max_resp_dose == "200" ~ log.p_200
     )
    )
 

 ## Generate new plot of best responses, showing Wilcoxon p-values

pValue.test_results %>% 
# Select and reshape  
  select(
    Var1_value, Var2_value, 
    log.p_max.resp, 
    max_resp_dose, max_response
    ) %>% 
  pivot_wider(
    names_from = "Var2_value",
    values_from = c(
      "max_resp_dose", "max_response", "log.p_max.resp"), 
      names_sep = "-"
      ) %>% 
  pivot_longer(
    cols = !c("Var1", "Var1_value"),
    names_to = c("resp", "Var2_value"),
    values_to = c("max_response"), names_sep = "-"
    ) %>% 
  pivot_wider(
    names_from = "resp", 
    values_from = "max_response") %>% 
  mutate(
    Var1_value = as.factor(Var1_value)
    ) %>% 

# Plot data
  ggplot(
    aes(
      x = Var1_value %>% 
        factor(., 
          levels = unique(.) %>% rev()
          ), 
      y = Var2_value %>% 
        factor(., 
          levels = unique(.) %>% rev()
          ),
      col   = max_response,
      label = max_response %>% round(3) 
      )
    ) +
    geom_tile(
      aes(fill= factor(max_resp_dose)),
      col="lightgrey"
      ) +
    scale_fill_manual(
      values = c("100" = "#BBE0CF", "200" = "#D5F9E8"),
      na.translate = F
      ) +
    geom_text(
      col = "black", 
      vjust = 3.5, hjust = 0.5, 
      size = 10/.pt
      ) +
    geom_point(
      aes( size = log.p_max.resp), 
      shape = 19,
      position = position_nudge(y = 0.1)
      ) +
    scale_size(
      range = c(18, 1), 
      name = "log.p_max.resp") +
    labs(
      x = NULL, y = NULL, 
      col = "Maximum\nresponse",
      fill = "Dose",
      title = "Best median response, stratified by one or two patient characteristics",
      subtitle = paste( "\nlog.p: ", pTest_label)
      ) +
    scale_color_gradient2(
      low = "white", high="darkblue", 
      limits = c(-0.25, 1.5)
      ) +
    scale_x_discrete(
      position = "top"
      ) +
    scale_y_discrete() +
    theme_light(base_size = 15) + 
    theme(
      axis.text.x = element_text(
        angle = 90, 
        vjust = 0.5,
        hjust = 0
        ),
      panel.grid = element_blank() 
      )

# Display, sorted by significance of response at dose 200

pValue.test_results %>% 
  select(
    Var1, Var1_value, 
    Var2, Var2_value, 
    log.p_max.resp, 
    max_response, max_resp_dose,
    N_100, median_100,
    N_200, median_200
    ) %>%
  arrange(log.p_max.resp) %>% 
  kable(., "html",
        caption = "Maximum responses, sorted by significance") %>% 
    scroll_box(width = "120%", height = "150px") %>% 
    kable_styling( 
      font_size = 12,
      bootstrap_options = "striped"
      )

Maximum responses, sorted by significance
Var1	Var1_value	Var2	Var2_value	log.p_max.resp	max_response	max_resp_dose	N_100	median_100	N_200	median_200
Sex	Female	Sex	Female	-21.33	1.2942516	200	150	0.9423150	150	1.3034651
Type	Acute disease	Type	Acute disease	-15.94	1.3149731	200	129	0.7299124	120	1.2406324
BMI	low BMI	BMI	low BMI	-14.33	1.0041402	200	151	0.5962553	146	0.9452158
Age	<40 years	Sex	Female	-13.00	1.4738253	200	85	0.9365259	77	1.3802094
BMI	low BMI	Sex	Female	-12.67	1.3772039	200	78	0.8670915	69	1.3952003
Age	<40 years	Age	<40 years	-12.35	1.0314618	200	164	0.6061683	143	0.9282574
Ethnic	White	Ethnic	White	-12.00	1.1166348	200	144	0.5907656	144	1.1569848
Ethnic	White	Sex	Female	-11.93	1.4080069	200	72	0.9754775	74	1.4863500
Sex	Female	Type	Acute disease	-11.41	1.4073091	200	62	1.1145576	59	1.4856522
Sex	Female	Type	Chronic disease	-10.93	1.2642623	200	88	0.8593007	91	1.2292830
Ethnic	White	Type	Acute disease	-9.57	1.4209837	200	68	0.6494141	52	1.5249881
BMI	high BMI	Sex	Female	-9.39	1.3101590	200	72	1.1110143	81	1.2697480
BMI	low BMI	Type	Acute disease	-9.37	1.2362381	200	71	0.6381219	57	1.2872080
Age	<40 years	Type	Acute disease	-9.30	1.2573735	200	69	0.7294408	65	1.1830328
Age	>=40 years	Age	>=40 years	-9.08	0.8054468	200	136	0.5995700	157	0.8280312
Age	>=40 years	Sex	Female	-8.89	1.1970881	200	65	0.9744453	73	1.2400958
BMI	low BMI	Age	<40 years	-8.82	1.0661942	200	84	0.5855097	73	0.9341390
BMI	low BMI	Ethnic	White	-8.82	1.2238384	200	77	0.5240786	74	1.2641885
Ethnic	Asian	Ethnic	Asian	-7.93	0.8071959	200	108	0.6662038	108	0.6700329
BMI	high BMI	BMI	high BMI	-7.75	0.8563161	200	149	0.6116301	154	0.8165202
Age	<40 years	Ethnic	White	-7.66	1.3769607	200	76	0.6428964	67	1.2697480
Type	Chronic disease	Type	Chronic disease	-7.50	0.7574756	200	171	0.5721217	180	0.7739478
Age	>=40 years	Type	Acute disease	-7.36	1.3518500	200	60	0.7337059	55	1.2615492
Ethnic	Asian	Sex	Female	-7.11	1.3629210	200	50	0.9448312	45	1.2521573
BMI	low BMI	Type	Chronic disease	-6.64	0.9663556	200	80	0.5915215	89	0.8767894
Ethnic	Asian	Type	Acute disease	-6.50	1.0228892	200	42	0.8153932	46	0.8562452
BMI	high BMI	Type	Acute disease	-6.36	1.1837232	100	58	0.9500895	63	0.9350078
Sex	Male	Type	Acute disease	-6.14	1.0002103	200	67	0.4530151	61	0.8317767
BMI	low BMI	Age	>=40 years	-6.05	0.9289176	200	67	0.6504704	73	0.9560899
BMI	low BMI	Ethnic	Asian	-5.39	1.0381097	200	49	0.6504704	50	0.8408472
Ethnic	White	Type	Chronic disease	-5.02	0.9598003	200	76	0.5824246	92	0.8934708
Age	<40 years	Ethnic	Asian	-4.94	0.8610026	200	59	0.5917255	53	0.6943586
Age	<40 years	Type	Chronic disease	-4.66	1.0150956	200	95	0.5859389	78	0.8744048
Age	>=40 years	Ethnic	White	-4.56	0.8418724	200	68	0.5675774	77	0.9560899
Sex	Male	Sex	Male	-4.45	0.6787458	200	150	0.3836353	150	0.5967923
BMI	low BMI	Sex	Male	-4.34	0.8092369	200	73	0.2511608	77	0.6943586
BMI	high BMI	Age	<40 years	-4.31	0.9355423	200	80	0.6653357	70	0.9167583
Age	>=40 years	Ethnic	Asian	-4.25	0.7938925	100	49	0.6831288	55	0.6657136
BMI	high BMI	Ethnic	White	-4.16	0.8957185	200	67	0.6272804	70	0.9188214
BMI	high BMI	Age	>=40 years	-4.12	0.7624675	200	69	0.5920707	84	0.7220564
Ethnic	Asian	Type	Chronic disease	-3.63	0.6786501	100	66	0.5678864	62	0.5051804
BMI	high BMI	Ethnic	Asian	-3.61	0.7914879	100	59	0.6831288	58	0.6097214
Ethnic	Black	Sex	Female	-3.08	0.7044310	200	28	0.7605207	31	0.9350078
Age	>=40 years	Type	Chronic disease	-2.98	0.6906872	200	76	0.5632136	102	0.7402838
Ethnic	White	Sex	Male	-2.64	0.8795431	200	72	0.2292620	70	0.8397472
Ethnic	Asian	Sex	Male	-2.50	0.7794252	200	58	0.4049125	63	0.5271082
Age	>=40 years	Sex	Male	-2.42	0.6711926	200	71	0.4517713	84	0.6499665
Ethnic	Black	Ethnic	Black	-2.39	0.4771506	200	48	0.5984809	48	0.6934405
Ethnic	Black	Type	Acute disease	-2.35	1.3886972	100	19	0.9676295	22	0.4197576
Age	<40 years	Sex	Male	-2.29	0.5360498	200	79	0.2307690	66	0.4176329
Age	>=40 years	Ethnic	Black	-1.98	1.1131057	100	19	1.1402780	25	0.7689209
BMI	low BMI	Ethnic	Black	-1.76	0.7561552	100	25	0.9481041	22	0.4964860
BMI	high BMI	Type	Chronic disease	-1.69	0.2493957	200	91	0.5636511	91	0.6970257
BMI	high BMI	Ethnic	Black	-1.26	0.2817528	200	23	0.4165172	26	0.7588200
Age	<40 years	Ethnic	Black	-1.26	0.3291949	200	29	0.5962553	23	0.6381619
Ethnic	Black	Sex	Male	-1.12	0.3691546	100	20	0.5854445	17	0.2796167
BMI	high BMI	Sex	Male	-1.09	0.4547091	200	77	0.3924904	73	0.4359251
Ethnic	Black	Type	Chronic disease	-0.75	0.3114366	200	29	0.5701826	26	0.8071414
Sex	Male	Type	Chronic disease	-0.62	0.3281991	200	83	0.1721311	89	0.3685491

4.3. Best response in patients stratified by demographic and clinical parameters - showing Mood test p-values for difference between treated and matched untreated subroups

# - Chunk__3c:                                      - #

## Calculate Mood test p-values for all comparisons

pTest_label <- "Mood test"

pValue.test<- function(x, y) {
  test <- mood.test(x, y)
  return(test$p.value)
}

# - Chunk__pTests: common code for p-value calculations in chunks 3b and c - #
# - Requires R-libraries and object exampleData (set-up)                   - #
# - Requires objects pChar_pairs and response_tbl (chunk_2)                - #


pValue.test_results <- list ()

for (i in 1: length(pChar_pairs)) {
  pValue.test_results_j <- list()

  for (j in 1: nrow(pChar_pairs[[i]])) {
    x <- pChar_pairs[[i]] %>% names
    y <- pChar_pairs[[i]] %>% .[j, ]%>% set_names(x)
# Ensure names are preserved when cbinding as transformed dataframe below
    if (length(x) == 1) {
      y <- as.data.frame (t(y))
      }

  pValue.test_results_j[[j]] <- 
    exampleData %>% 
      {if (length(x) == 1) {
        filter(.,
          get(x[1]) == paste(y[1])  
          ) 
        } else {
          filter(.,
            get(x[1]) == paste(y[1]) & 
            get(x[2]) == paste(y[2])
            ) 
          } 
        } %>% 
      summarise(
        p_value_0_vs_100 = pValue.test(target[dose == "0"], target[dose == "100"] ),
        p_value_0_vs_200 = pValue.test(target[dose == "0"], target[dose == "200"] )
        ) %>% 
      cbind(y, .)
    }
    pValue.test_results[[i]] <- pValue.test_results_j

  }

pValue.test_results <- 
pValue.test_results %>% 
  flatten() %>% 
  lapply(function(df)
      if(
        ncol(df) < 4) {
        cbind(df[, 1, drop = FALSE], df)    
        }else{
          df} 
      ) %>% 
    lapply(function(df)
      set_names(df,
        "Var1_value", "Var2_value", 
        "p_value_0_vs_100", "p_value_0_vs_200"
        ) 
      ) %>% 
    list_rbind() %>% 
    mutate(
      log.p_100 = log10(p_value_0_vs_100) %>% round(digits = 2),
      log.p_200 = log10(p_value_0_vs_200) %>% round(digits = 2)
    ) %>% 
    select(
      !c(p_value_0_vs_100, p_value_0_vs_200)
    ) %>% 
# Add column for joining with response_tbl and join
    unite(
      Vars1_2,
      c(Var1_value, Var2_value), 
      sep = "_", 
      remove = F
      ) %>% 
  left_join(
    response_tbl, .) %>% 
  mutate(
     log.p_max.resp = case_when(
      max_resp_dose == "100" ~ log.p_100, 
      max_resp_dose == "200" ~ log.p_200
     )
    )
 

 ## Generate new plot of best responses, showing Wilcoxon p-values

pValue.test_results %>% 
# Select and reshape  
  select(
    Var1_value, Var2_value, 
    log.p_max.resp, 
    max_resp_dose, max_response
    ) %>% 
  pivot_wider(
    names_from = "Var2_value",
    values_from = c(
      "max_resp_dose", "max_response", "log.p_max.resp"), 
      names_sep = "-"
      ) %>% 
  pivot_longer(
    cols = !c("Var1", "Var1_value"),
    names_to = c("resp", "Var2_value"),
    values_to = c("max_response"), names_sep = "-"
    ) %>% 
  pivot_wider(
    names_from = "resp", 
    values_from = "max_response") %>% 
  mutate(
    Var1_value = as.factor(Var1_value)
    ) %>% 

# Plot data
  ggplot(
    aes(
      x = Var1_value %>% 
        factor(., 
          levels = unique(.) %>% rev()
          ), 
      y = Var2_value %>% 
        factor(., 
          levels = unique(.) %>% rev()
          ),
      col   = max_response,
      label = max_response %>% round(3) 
      )
    ) +
    geom_tile(
      aes(fill= factor(max_resp_dose)),
      col="lightgrey"
      ) +
    scale_fill_manual(
      values = c("100" = "#BBE0CF", "200" = "#D5F9E8"),
      na.translate = F
      ) +
    geom_text(
      col = "black", 
      vjust = 3.5, hjust = 0.5, 
      size = 10/.pt
      ) +
    geom_point(
      aes( size = log.p_max.resp), 
      shape = 19,
      position = position_nudge(y = 0.1)
      ) +
    scale_size(
      range = c(18, 1), 
      name = "log.p_max.resp") +
    labs(
      x = NULL, y = NULL, 
      col = "Maximum\nresponse",
      fill = "Dose",
      title = "Best median response, stratified by one or two patient characteristics",
      subtitle = paste( "\nlog.p: ", pTest_label)
      ) +
    scale_color_gradient2(
      low = "white", high="darkblue", 
      limits = c(-0.25, 1.5)
      ) +
    scale_x_discrete(
      position = "top"
      ) +
    scale_y_discrete() +
    theme_light(base_size = 15) + 
    theme(
      axis.text.x = element_text(
        angle = 90, 
        vjust = 0.5,
        hjust = 0
        ),
      panel.grid = element_blank() 
      )

# Display, sorted by significance of response at dose 200

pValue.test_results %>% 
  select(
    Var1, Var1_value, 
    Var2, Var2_value, 
    log.p_max.resp, 
    max_response, max_resp_dose,
    N_100, median_100,
    N_200, median_200
    ) %>%
  arrange(log.p_max.resp) %>% 
  kable(., "html",
        caption = "Maximum responses, sorted by significance") %>% 
    scroll_box(width = "120%", height = "150px") %>% 
    kable_styling( 
      font_size = 12,
      bootstrap_options = "striped"
      )

Maximum responses, sorted by significance
Var1	Var1_value	Var2	Var2_value	log.p_max.resp	max_response	max_resp_dose	N_100	median_100	N_200	median_200
BMI	low BMI	Ethnic	Black	-1.79	0.7561552	100	25	0.9481041	22	0.4964860
BMI	low BMI	BMI	low BMI	-1.20	1.0041402	200	151	0.5962553	146	0.9452158
Age	<40 years	Ethnic	White	-1.06	1.3769607	200	76	0.6428964	67	1.2697480
BMI	high BMI	Sex	Male	-0.95	0.4547091	200	77	0.3924904	73	0.4359251
BMI	low BMI	Sex	Female	-0.90	1.3772039	200	78	0.8670915	69	1.3952003
Age	<40 years	Type	Chronic disease	-0.90	1.0150956	200	95	0.5859389	78	0.8744048
BMI	high BMI	Ethnic	Black	-0.88	0.2817528	200	23	0.4165172	26	0.7588200
Age	<40 years	Age	<40 years	-0.81	1.0314618	200	164	0.6061683	143	0.9282574
BMI	low BMI	Age	<40 years	-0.80	1.0661942	200	84	0.5855097	73	0.9341390
Ethnic	Asian	Sex	Female	-0.77	1.3629210	200	50	0.9448312	45	1.2521573
Ethnic	Asian	Sex	Male	-0.75	0.7794252	200	58	0.4049125	63	0.5271082
BMI	low BMI	Type	Acute disease	-0.70	1.2362381	200	71	0.6381219	57	1.2872080
Type	Chronic disease	Type	Chronic disease	-0.67	0.7574756	200	171	0.5721217	180	0.7739478
BMI	low BMI	Age	>=40 years	-0.67	0.9289176	200	67	0.6504704	73	0.9560899
BMI	high BMI	Type	Chronic disease	-0.65	0.2493957	200	91	0.5636511	91	0.6970257
BMI	high BMI	Age	>=40 years	-0.64	0.7624675	200	69	0.5920707	84	0.7220564
Ethnic	White	Ethnic	White	-0.55	1.1166348	200	144	0.5907656	144	1.1569848
BMI	low BMI	Ethnic	White	-0.55	1.2238384	200	77	0.5240786	74	1.2641885
Sex	Male	Type	Acute disease	-0.52	1.0002103	200	67	0.4530151	61	0.8317767
Ethnic	Asian	Type	Chronic disease	-0.50	0.6786501	100	66	0.5678864	62	0.5051804
BMI	low BMI	Ethnic	Asian	-0.46	1.0381097	200	49	0.6504704	50	0.8408472
BMI	low BMI	Type	Chronic disease	-0.42	0.9663556	200	80	0.5915215	89	0.8767894
Age	>=40 years	Sex	Male	-0.42	0.6711926	200	71	0.4517713	84	0.6499665
Age	<40 years	Sex	Female	-0.41	1.4738253	200	85	0.9365259	77	1.3802094
Sex	Female	Sex	Female	-0.39	1.2942516	200	150	0.9423150	150	1.3034651
Ethnic	White	Sex	Male	-0.33	0.8795431	200	72	0.2292620	70	0.8397472
BMI	high BMI	Age	<40 years	-0.29	0.9355423	200	80	0.6653357	70	0.9167583
Sex	Female	Type	Acute disease	-0.29	1.4073091	200	62	1.1145576	59	1.4856522
Ethnic	Black	Type	Acute disease	-0.26	1.3886972	100	19	0.9676295	22	0.4197576
Age	>=40 years	Ethnic	Asian	-0.25	0.7938925	100	49	0.6831288	55	0.6657136
BMI	low BMI	Sex	Male	-0.24	0.8092369	200	73	0.2511608	77	0.6943586
Ethnic	Black	Type	Chronic disease	-0.23	0.3114366	200	29	0.5701826	26	0.8071414
Sex	Female	Type	Chronic disease	-0.22	1.2642623	200	88	0.8593007	91	1.2292830
Sex	Male	Sex	Male	-0.21	0.6787458	200	150	0.3836353	150	0.5967923
Ethnic	Black	Sex	Female	-0.21	0.7044310	200	28	0.7605207	31	0.9350078
Ethnic	White	Type	Acute disease	-0.21	1.4209837	200	68	0.6494141	52	1.5249881
Ethnic	Black	Ethnic	Black	-0.20	0.4771506	200	48	0.5984809	48	0.6934405
Ethnic	Asian	Ethnic	Asian	-0.19	0.8071959	200	108	0.6662038	108	0.6700329
BMI	high BMI	Ethnic	White	-0.17	0.8957185	200	67	0.6272804	70	0.9188214
Ethnic	White	Type	Chronic disease	-0.17	0.9598003	200	76	0.5824246	92	0.8934708
Age	>=40 years	Type	Chronic disease	-0.16	0.6906872	200	76	0.5632136	102	0.7402838
BMI	high BMI	BMI	high BMI	-0.15	0.8563161	200	149	0.6116301	154	0.8165202
BMI	high BMI	Sex	Female	-0.14	1.3101590	200	72	1.1110143	81	1.2697480
Age	>=40 years	Sex	Female	-0.13	1.1970881	200	65	0.9744453	73	1.2400958
BMI	high BMI	Ethnic	Asian	-0.12	0.7914879	100	59	0.6831288	58	0.6097214
Ethnic	Asian	Type	Acute disease	-0.12	1.0228892	200	42	0.8153932	46	0.8562452
Age	>=40 years	Ethnic	Black	-0.11	1.1131057	100	19	1.1402780	25	0.7689209
Age	<40 years	Sex	Male	-0.11	0.5360498	200	79	0.2307690	66	0.4176329
Ethnic	White	Sex	Female	-0.09	1.4080069	200	72	0.9754775	74	1.4863500
Age	<40 years	Ethnic	Black	-0.07	0.3291949	200	29	0.5962553	23	0.6381619
Sex	Male	Type	Chronic disease	-0.07	0.3281991	200	83	0.1721311	89	0.3685491
Age	>=40 years	Age	>=40 years	-0.04	0.8054468	200	136	0.5995700	157	0.8280312
Ethnic	Black	Sex	Male	-0.04	0.3691546	100	20	0.5854445	17	0.2796167
BMI	high BMI	Type	Acute disease	-0.03	1.1837232	100	58	0.9500895	63	0.9350078
Age	>=40 years	Type	Acute disease	-0.03	1.3518500	200	60	0.7337059	55	1.2615492
Type	Acute disease	Type	Acute disease	-0.02	1.3149731	200	129	0.7299124	120	1.2406324
Age	<40 years	Ethnic	Asian	-0.02	0.8610026	200	59	0.5917255	53	0.6943586
Age	>=40 years	Ethnic	White	-0.02	0.8418724	200	68	0.5675774	77	0.9560899
Age	<40 years	Type	Acute disease	-0.01	1.2573735	200	69	0.7294408	65	1.1830328

5. Distribution of target values in patient subgroups treated with different doses

# - splitViolins                                                                      - #
# - From: https://stackoverflow.com/questions/35717353/split-violin-plot-with-ggplot2 - #

## Custom ggproto object for generating split violin plots

GeomSplitViolin <- ggplot2::ggproto(
    "GeomSplitViolin",
    ggplot2::GeomViolin,
    draw_group = function(self,
                          data,
                          ...,
                          # add the nudge here
                          nudge = 0,
                          draw_quantiles = NULL) {
        data <- transform(data,
                          xminv = x - violinwidth * (x - xmin),
                          xmaxv = x + violinwidth * (xmax - x))
        grp <- data[1, "group"]
        newdata <- plyr::arrange(transform(data,
                                           x = if (grp %% 2 == 1) xminv else xmaxv),
                                 if (grp %% 2 == 1) y else -y)
        newdata <- rbind(newdata[1, ],
                         newdata,
                         newdata[nrow(newdata), ],
                         newdata[1, ])
        newdata[c(1, nrow(newdata)-1, nrow(newdata)), "x"] <- round(newdata[1, "x"])

        # now nudge them apart
        newdata$x <- ifelse(newdata$group %% 2 == 1,
                            newdata$x - nudge,
                            newdata$x + nudge)

        if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {

            stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))

            quantiles <- ggplot2:::create_quantile_segment_frame(data,
                                                             draw_quantiles)
            aesthetics <- data[rep(1, nrow(quantiles)),
                               setdiff(names(data), c("x", "y")),
                               drop = FALSE]
            aesthetics$alpha <- rep(1, nrow(quantiles))
            both <- cbind(quantiles, aesthetics)
            quantile_grob <- ggplot2::GeomPath$draw_panel(both, ...)
            ggplot2:::ggname("geom_split_violin",
                             grid::grobTree(ggplot2::GeomPolygon$draw_panel(newdata, ...),
                                            quantile_grob))
        }
    else {
            ggplot2:::ggname("geom_split_violin",
                             ggplot2::GeomPolygon$draw_panel(newdata, ...))
        }
    }
)

geom_split_violin <- function(mapping = NULL,
                              data = NULL,
                              stat = "ydensity",
                              position = "identity",
                              # nudge param here
                              nudge = 0,
                              ...,
                              draw_quantiles = NULL,
                              trim = TRUE,
                              scale = "area",
                              na.rm = FALSE,
                              show.legend = NA,
                              inherit.aes = TRUE) {

    ggplot2::layer(data = data,
                   mapping = mapping,
                   stat = stat,
                   geom = GeomSplitViolin,
                   position = position,
                   show.legend = show.legend,
                   inherit.aes = inherit.aes,
                   params = list(trim = trim,
                                 scale = scale,
                                 # don't forget the nudge
                                 nudge = nudge,
                                 draw_quantiles = draw_quantiles,
                                 na.rm = na.rm,
                                 ...))
}

I used split violin box plots for the pairwise comparison of target medians and distributions in low and high responder groups. Low responders are shown in green on the left side, high responder groups in red on the right. Likewise, facets for the second group variable are arranged from lowest (left) to highest responders (right).

5.1. Dose vs. target values in patients stratified by sex and age

(see also 3.6.: female patients stratified by sex and age)

# - Chunk_4a                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- compare_means(
  data = exampleData,
  target ~ dose,
  group.by = c("sex", "age"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("sex", "age", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Sex", "Age", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select( dose, target) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200") 
        ), 
      y = target,
# Order low > high responders
    fill = 
      factor(
        sex, 
        levels = c("Male", "Female") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue( direction = -1) +    
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
# Order facets low > high responders
    facet_grid(. 
      ~ factor(
        age, 
        levels = c(">=40 years", "<40 years")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Sex",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 2),
widths = c(0.2, 1, 1),
align = "h"
)

5.2. Dose vs. target values in patients stratified by sex and type of disease

# - Chunk_4b                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- compare_means(
  data = exampleData,
  target ~ dose,
  group.by = c("sex", "type"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("sex", "type", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Sex", "Type", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select( dose, target) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200") 
        ), 
      y = target,
# Order low > high responders
    fill = 
      factor(
        sex, 
        levels = c("Male", "Female") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue( direction = -1) +    
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
 # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        type, 
        levels = c("Chronic disease", "Acute disease")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Sex",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1.5),
widths = c(0.2, 1, 1),
align = "h"
)

5.3. Dose vs. target values in patients stratified by ethnic and type of disease

(see also 3.5.: female patients stratified by ethnic and type of disease)

# - Chunk_4c                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #


# Generate a table of significant differences (Wilcoxon test) 

sign_test <- compare_means(
  data = exampleData,
  target ~ dose,
  group.by = c("type", "race"),
  ref.group = "0",
 ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select(!".y." & !"group1" & !"method")  %>% 
  set_names(
    "Sex", "Ethnic", "Dose", "p", "p.adj", "p.format", "p.signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by( bmi, age, race, sex, type) %>% 
  select( dose, target) %>% 
  ggplot(
    aes(
      x= factor(dose, levels = c("0", "100", "200") ), 
      y = target,
# Order low > high responders
    fill = factor (type, levels = c("Chronic disease", "Acute disease") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue (direction = -1) +
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
 # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        race, 
        levels = c("Black", "Asian", "White")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Type",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 3),
widths = c(0.2, 1, 1),
align = "h"
)

Interestingly, Blacks with chronic disease had high target levels, even when untreated,

5.4. Dose vs. target values in patients stratified by race and sex

# - Chunk_4d                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- compare_means(
  data = exampleData,
  target ~ dose,
  group.by = c("sex", "race"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select(!".y." & !"group1" & !"method")  %>% 
  set_names(
    "Sex", "Ethnic", "Dose", "p", "p.adj", "p.format", "p.signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select (dose, target) %>% 
  ggplot(
    aes(
      x= factor(dose, levels = c("0", "100", "200") ), 
      y = target,
# Order low > high responders
    fill = 
      factor(
        sex, 
        levels = c("Male", "Female") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue( direction = -1) +    
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
  # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        race, 
        levels = c("Black", "Asian", "White")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Sex",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 2),
widths = c(0.2, 1, 1),
align = "h"
)

5.5. Dose vs. target values in female patients stratified by ethnic and disease type

# - Chunk_4e                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- 
  exampleData %>% 
  filter(sex == "Female") %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = c("race", "type"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("race", "type", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Ethnic", "Type", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select (dose, target) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200") 
        ), 
         y = target,
# Order low > high responders
    fill = factor (type, levels = c("Chronic disease", "Acute disease") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue (direction = -1) +       geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
 # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        race, 
        levels = c("Black", "Asian", "White")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Type",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 3),
widths = c(0.2, 1, 1),
align = "h"
)

5.6. Dose vs. target values in female patients stratified by ethnic and age

# - Chunk_4f                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- 
  exampleData %>% 
  filter(sex == "Female") %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = c("race", "age"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("race", "age", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Ethnic", "Age", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select (dose, target) %>% 
  ggplot (
    aes (x= factor(dose, levels = c("0", "100", "200") ), 
         y = target,
# Order low > high responders
        fill = factor (age, levels = c(">=40 years", "<40 years") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue (direction = -1) +
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
 # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        race, 
        levels = c("Black", "Asian", "White")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target",
      fill = "Age",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 2.5),
widths = c(0.2, 1, 1),
align = "h"
)

Younger white females appear to respond better than older, especially at the higher dose, whereas the opposite appears to be true for Asian females (not sure about the significance of that…).

5.7. Dose vs. target values in White patients stratified by sex and disease type

# - Chunk_4g                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- 
  exampleData %>% 
  filter(race == "White") %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = c("sex", "type"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("sex", "type", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Sex", "Type", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select (dose, target) %>% 
  ggplot (
    aes (x= factor(dose, levels = c("0", "100", "200") ), 
         y = target,
# Order low > high responders
    fill = 
      factor(
        sex, 
        levels = c("Male", "Female") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue( direction = -1) +    
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
 # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        type, 
        levels = c("Chronic disease", "Acute disease")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Sex",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1.5),
widths = c(0.2, 1, 1),
align = "h"
)

5.8. Dose vs. target values in White patients stratified by sex and age

# - Chunk_4h                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- 
  exampleData %>% 
  filter(race == "White") %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = c("sex", "age"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("sex", "age", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Sex", "Age", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select (dose, target) %>% 
  ggplot(
    aes(
      x = factor(
        dose, 
        levels = c("0", "100", "200") 
        ), 
      y = target,
# Order low > high responders
    fill = 
      factor(
        sex, 
        levels = c("Male", "Female") )
        ) 
      ) +
# Color red for patient variable value with higher response 
    scale_fill_hue( direction = -1) +    
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
     geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
# Order facets low > high responders
    facet_grid(. 
      ~ factor(
        age, 
        levels = c(">=40 years", "<40 years")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "Sex",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1.5),
widths = c(0.2, 1, 1),
align = "h"
)

5.9. Dose vs. target values in patients stratified by race and BMI

(The difference between median targets in the Black/ low BMI/ dose 100 group had the highest significance in the Mood test)

# - Chunk_4i                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

sign_test <- 
  exampleData %>% 
  filter(race == "White") %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = c("race", "bmi"),
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("race", "bmi", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Ethnic", "BMI", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme("light")
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 
    

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  group_by(bmi, age, race, sex, type) %>% 
  select(dose, target) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200")
        ), 
      y = target,
      fill = bmi
      ) 
    ) +
# Color red for patient variable value with higher response 
    scale_fill_hue (direction = -1) +
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
     geom_split_violin( 
      width = 1,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
        width = 0.3, 
        outlier.shape = NA) +
  # Order facets low > high responders
    facet_grid(. 
      ~ factor(
        race, 
        levels = c("Black", "Asian", "White")
        ) 
      ) +
    theme_bw() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) + 
    labs(
      x = "Dose", 
      y = "Target", 
      fill = "BMI",
      size = 12, 
      hjust = -15
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1),
widths = c(0.2, 1, 1),
align = "h"
)

6. Groups with the highest and lowest responses

It appears that the groups the received the most significant benefit were women who were either <40 years (p.adj = 7.9 * 10^-13, median difference 1.474), White (p.adj = 1.2 * 10^-12, median difference 1.406), or with acute disease (p.adj = 3.9 * 10^-12, median difference 1.407). Whites with acute disease or <40 years also showed significant benefit (p.adj = 2.7 * 10^-10, median difference 1.421; median difference 1.474, 3.3 * 10^-9, median difference 1.377), but this could be mainly because of the female sub-population of these groups.

6.1. White females <40 years respond better than the remaining patients

# - Chunk_5a                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

exampleData <- 
  exampleData %>% 
    mutate(
      Group = 
        case_when(
          race == "White" &
          #type == "Acute disease" &
          age == "<40 years" &
          sex == "Female"
             ~ "White / F / <40 yrs",
        TRUE ~ "Remaining"
        )
      )

sign_test <- 
  exampleData %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = "Group",
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("Group", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Group", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme(
      "light", 
      base_size = 10
      ),
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  select (dose, target, type, Group) %>% 
  group_by(Group) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200")
        ), 
      y = target,
      fill = Group 
      ) 
    ) +
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    fill_palette(
      palette = c("lightgrey","salmon")
      ) +
    geom_split_violin( 
      width = 1,
      nudge = 0.01,
      alpha = 0.3,
      color = "grey") +
    geom_boxplot(
      width = 0.2, 
      outlier.shape = NA) +
    geom_point(
      pch = 21,
      size = 3,
      position = 
        position_jitterdodge(
          dodge.width = 0.7, 
          jitter.width = 0.1
          ),
      color = "grey",
      alpha = 0.5
      ) +
    theme_light() +
    theme(
      text=element_text( size = 13), 
      axis.text = element_text( size = 12),
      ) +
    scale_y_continuous(
      breaks = c(-2: 5),
      minor_breaks = NULL
      ) +    
    labs(
      x = "Dose", 
      y = "Target", 
      size = 13, hjust = -15,
      title = "Target vs. dose levels in the group with highest responses\n"
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1),
widths = c(0.2, 1, 1),
align = "h"
)

6.2. Black males with chronic disease show no or negative response

However, this might need confirmation in a larger sample of this subgroup.

# - Chunk_5b                                                 - #
# - Requires R-libraries and object exampleData (set-up)     - #
# - Requires object geom_split_violin (chunk: splitViolins)  - #

# Generate a table of significant differences (Wilcoxon test) 

exampleData <- 
  exampleData %>% 
    mutate(
      Group = 
        case_when(
          race == "Black" &
          # bmi == "high BMI" &
           type == "Chronic disease" &
          # age == ">=40 years" &
          sex == "Male"
             ~ "Black/ M / Chronic disease",
        TRUE ~ "Remaining"
        )
      )

sign_test <- 
  exampleData %>% 
  compare_means(
  data = .,
  target ~ dose,
  group.by = "Group",
  ref.group = "0",
  ) %>%
  filter (
    !group1 == "100" &
    !p.signif == "ns"  ) %>% 
  select("Group", "group2", "p.format", "p.adj", "p.signif")  %>% 
  set_names(
    "Group", "Dose", "p", "p.adj", "signif") %>% 
  ggtexttable(
    rows = NULL,
    theme = ttheme(
      "light", 
      base_size = 10
      ),
    ) %>% 
  tab_add_title(
    text = "Wilcoxon test (ref. dose = 0)", 
    face = "bold", size = 12,
    ) 

# Generate split violin plot and insert the Wilcoxon test table 

ggarrange(

exampleData %>% 
  select (dose, target, type, Group) %>% 
  group_by(Group) %>% 
  ggplot(
    aes(
      x= factor(
        dose, 
        levels = c("0", "100", "200")
        ), 
      y = target,
      fill = Group 
      ) 
    ) +
    geom_hline(
      yintercept = 0, 
      color = "darkgrey",
      linewidth = 1.1
      ) +
    fill_palette(
      palette = c("turquoise", "lightgrey")
      ) +
    geom_split_violin( 
      width = 1,
      nudge = 0.01,
      alpha = 0.3,
      color = "grey") +
  geom_boxplot(
      width = 0.2, 
      outlier.shape = NA) +
  geom_point(
    pch = 21,
    size = 3,
    position = 
      position_jitterdodge(
        dodge.width = 0.7, 
        jitter.width = 0.1
        ),
    color = "grey",
    alpha = 0.5
    ) +
  theme_light() +
  theme(
    text=element_text( size = 13), 
    axis.text = element_text( size = 12),
    ) +
  scale_y_continuous(
    breaks = c(-2: 5),
    minor_breaks = NULL
    ) +    labs(
      x = "Dose", 
      y = "Target", 
      size = 13, hjust = -15,
      title = "Target vs. dose levels in the group with lowest responses\n"
      ),

# Add space between plot and table
NULL,

sign_test,

ncol = 1, nrow = 3,
heights = c(2, 0.1, 1),
widths = c(0.2, 1, 1),
align = "h"
)

7. Questions and alternative visualisations

The distribution of target in the total population looks similar for all 3 doses, and suggests that at least two major subgroups exist that might respond differently.

In the following 3 figures I tried to produce an overview of best responses in subgroups characterised by 1-2 of the patient variables. Because there was no guarantee for a linear relationship between dose and target, I decided to compare the dose 100 and dose 200 groups individually with the dose 0 (untreated) group. Because in many subgroups the target values did not seem to be normally distributed, I choose to compare the differences between the median target values. I also tried to assess the significance of these differences. It struck me that the p-values of the Wilcoxon test seemed extremely significant in all groups, even after correction. By contrast, the Mood test produced no significant results.
In contrast to the Wilcoxon test, the Mood test does not require the distributions of the compared groups to be similar. However, its power to detect inter-group differences is much lower. My question to the statistician would be whether any of these tests should be applied to all subgroups characterised by one or two of the variables. Certainly, visually inspecting the target distributions for each seems subjective and cumbersome.

A better approach could be the modified Forest plot suggested by Ballarine et al., and implemented in the “UpSet plot” function from the R package “SubgrPlots” ^1, 2^.

Forest/UpSet plot

I found two other alternatives interesting, although perhaps more as tools for subgroup analysis than visualisation. The webinar discussed the shiny app DoRiS (Dose Response in Subgroup Analysis) which aims to facilitate data exploration and hypothesis generation in data sets with a very large numbers of subgroups. Rather than comparing the responses themselves, it employed a user-determined response threshold which could be, e.g., the clinically meaningful effect size. Either manual choice or automatic recognition of subgroup patterns was possible, but at the time only for a single patient variable. Favoring an exploratory approach, p-values and confidence intervals were not calculated to leave room for interdisciplinary discussions of plausible mechanisms and clinical relevance. I could not find the link to the shiny app on github, but it appeared to be very similar to the package “subscreen” (SAS® version 9.4 and R) ³.

Finally, “personalized” seems to be a very comprehensive analysis package in R that unifies a variety of methods for subgroup identification and treatment evaluation into one general, consistent framework ⁴.

Data exploration with Subscreen

8. Citation

Please quote as: Weissensteiner, Thomas. Visualising patient groups who might benefit from personalised dosing. RPubs, 10 Aug. 2024. https://rpubs.com/thomas-weissensteiner/1231597.

9. References

A critical review of graphics for subgroup analyses in clinical trials. Ballarini NM, Chiu YD, König F, Posch M, Jaki T. Pharm Stat 2020, 19(5): 541-560. https://doi.org/10.1002/pst.2012
A critical review of graphics for subgroup analyses in clinical trials. Ballarini NM, Chiu YD, König F, Posch M, Jaki T. Pharm Stat. 2020 Sep;19(5):541-560. https://doi.org/10.1002/pst.2012
SubgrPlots: Graphical Displays for Subgroup Analysis in Clinical Trials. R package version 0.1.0. Ballarini N, Chiu YD.; 2018
A novel concept of screening for subgrouping factors for the association between socioeconomic status and respiratory allergies. Muysers, C., Messina, F., Keil, T. et al. J Expo Sci Environ Epidemiol 2022, 32: 295–302. https://doi.org/10.1038/s41370-021-00365-x
Subgroup Identification Using the personalized Package. Huling, JD, & Yu, M. Journal of Statistical Software 2021, 98(5): 1–60. https://doi.org/10.18637/jss.v098.i05