My scripts:
About myself: www.linkedin.com/in/ThomasWs-Mopfair
Current dosing practices are based on average responses from limited clinical trials. However, overlooking variations such as to sex, age, and genetics can lead to adverse events in certain patient groups. Conversely, a new drug which might have benefit for a sub-group of patients might fail to get approval because the pivotal trial showed insufficient general efficacy in the general population.
The scripts were written for the VIS-SIG Wonderful Wednesdays challenge 10 Oct 24 (link to recording). “Wonderful Wednesdays” are a monthly webinar organised by the Visualisation Special Interest Group of Statisticians in the Pharmaceutical Industry (PSI).
The task consisted of identifying patient subgroups with dose
responses that deviate from the remaining study participants, and who
therefore might benefit from personalized dosing.
# - set-up - #
library(dplyr) # version 1.1.4 # general grammar
library(purrr) # version 1.0.2 # set_names, list_rbind
library(tidyr) # version 1.3.1 # pivot_longer, pivot_wider
library(ggpubr) # version , includes ggplot2 3.4.0 # stat_compare_means, ggtexttable, ggarrange
library(kableExtra) # rendering tables in HTML format
exampleData <- read.csv("https://raw.githubusercontent.com/VIS-SIG/Wonderful-Wednesdays/refs/heads/master/data/2024/2024-09-11/WWW54ExampleData.csv")
The data were a simulated dose response set with the following structure:
dose: dose levels (0, 100, 200 mg)
target: continuous response variable (higher is better)
subgroup variables:
bmi (body mass index: ‘low BMI’ or ‘high BMI’)
age (‘<40 years’ or ‘≥40 years’)
race (‘Black’, ‘Asian’, ‘White’)
sex (‘Female’, ‘Male’)
type (type of disease: ‘Acute disease’ or ‘Chronic disease’)
# - Chunk_1 - #
# - Requires R-libraries and object exampleData (set-up) - #
## Plot distribution of target values at different doses in the total population
sign_test <-
exampleData %>%
compare_means(
data = .,
target ~ dose,
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select("group2", "p.format", "p.adj", "p.signif") %>%
set_names(
"Dose", "p", "p.adj", "signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme(
"light",
base_size = 10
),
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
select (dose, target) %>%
ggplot (
aes(
x= factor(
dose,
levels = c("0", "100", "200"
)
),
y = target)
) +
geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
geom_violin(
color = "lightgrey",
fill = "lightgrey",
width = 0.45
) +
geom_jitter(
height = 0, width = 0.1,
size = 2,
alpha = 0.2
) +
geom_boxplot(
width = 0.15 ,
outlier.shape = NA,
color = "salmon",
lwd = 0.7,
alpha = 0.5
) +
theme_light() +
theme(
text=element_text(size = 13),
axis.text = element_text(
size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) +
labs(
x = "Dose",
y = "Target",
size = 12, hjust = -15,
title = "Target vs. dose levels in the total patient population\n"
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 1)
)
The reference for the Wilcoxon test is the group of untreated
patients (dose = 0). Differences in target levels in treated patients
are significant. However, the distribution of target values at the three
dose levels suggests that at least two subgroups might exists that could
respond differently.
In order to identify these potential groups, I stratified patients by one or two of the clinical and demographic characteristics that were provided. For each group, I then calculated the median response for doses 100 and 200 (difference between median target values in the treated subgroups and patients receiving a dose 0).
# - Chunk_2 - #
# - Requires R-libraries and object exampleData (set-up) - #
## Table of responses in patient subgroups characterised by one or two variables
# Generate all pairwise combinations of patient characteristics
pVars <-
exampleData %>%
names() %>% .[!. %in% c("dose","target")]
pVar_pairs <-
pVars %>%
{ c(
lapply(., function(x) c(x, x)), # duplicate each characteristic
combn(.,2, simplify = F) # all combinations of 2 characteristics
) }
# ---
pChar_pairs <-
lapply(
seq_along(pVar_pairs),
function (x)
exampleData %>%
select(pVar_pairs[[x]] ) %>%
unique(.)
)
# ---
# Calculate median responses for the patient subgroups
response_tbl <-
lapply(
pVar_pairs,
function(x)
exampleData %>%
group_by(
across(all_of(x) ),
dose
) %>%
summarise(
median(target),
N = n()
)
) %>%
# ensure all dataframes have the same number of columns
lapply(function(df)
if(
ncol(df) < length(pVars)) {
cbind(df[, 1, drop = FALSE], df)
}else{
df}
) %>%
lapply(function(df)
pivot_wider(
df,
names_from = "dose",
values_from = c("median(target)", "N")
) %>%
set_names(
"Var1_value", "Var2_value",
"median_0", "median_100", "median_200",
"N_0", "N_100", "N_200"
)
) %>%
list_rbind() %>%
mutate(
response_100 = median_100 - median_0,
response_200 = median_200 - median_0,
max_response = max(response_100, response_200) ,
max_resp_dose = which.max(c(response_100, response_200))*100,
N_max = c(N_100, N_200)[max_resp_dose/100]
)
## Table of best median responses stratified by patient subgroups
# Re-order columns, sort by maximum response (highest first)
response_tbl %>%
select(
"Var1_value", "Var2_value",
"N_max", "max_response",
"N_0", "N_100", "N_200",
"median_0", "median_100", "median_200"
) %>%
arrange( desc(max_response)) %>%
kable(., "html") %>%
scroll_box(width = "100%", height = "200px") %>%
kable_styling(
bootstrap_options = c(
"striped", "hover", "responsive", full_width = F)
)
# Optional: rearrange so that diagonal of the plot shows responses in order of magnitude
response_tbl <-
response_tbl %>%
mutate(
.before = Var1,
Var1 = case_when (
Var1_value %in% c("high BMI", "low BMI") ~ "BMI",
Var1_value %in% c(">=40 years", "<40 years") ~ "Age",
Var1_value %in% c("Asian", "Black", "White") ~ "Ethnic",
Var1_value %in% c("Chronic disease", "Acute disease") ~ "Type",
Var1_value %in% c("Female", "Male") ~ "Sex"
),
Var2 = case_when (
Var2_value %in% c("high BMI", "low BMI") ~ "BMI",
Var2_value %in% c(">=40 years", "<40 years") ~ "Age",
Var2_value %in% c("Asian", "Black", "White") ~ "Ethnic",
Var2_value %in% c("Chronic disease", "Acute disease") ~ "Type",
Var2_value %in% c("Female", "Male") ~ "Sex"
)
) %>%
mutate(
pVar_equal = (Var1 == Var2)
) %>%
{ rbind(
filter(., pVar_equal) %>%
group_by(Var1) %>%
arrange(desc(max_response), .by_group = TRUE) %>%
mutate(max_value = max(max_response, na.rm = TRUE)) %>%
arrange(desc(max_value)) %>%
select (!max_value),
filter(., !pVar_equal)
)
} %>%
# Add column for joining with Wilcox and Mood test results (chunks 3b and 3c)
unite(
Vars1_2,
c(Var1_value, Var2_value),
sep = "_",
remove = F
)
# - Chunk_3a - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object response_tbl (chunk_2) - #
# Select and reshape
response_tbl %>%
select(
Var1,
Var1_value, Var2_value,
N_max, N_0,
max_resp_dose, max_response
) %>%
pivot_wider(
names_from = "Var2_value",
values_from = c(
"max_resp_dose", "max_response", "N_max", "N_0"),
names_sep = "-"
) %>%
pivot_longer(
cols = !c("Var1", "Var1_value"),
names_to = c("resp", "Var2_value"),
values_to = c("max_response"), names_sep = "-"
) %>%
pivot_wider(
names_from = "resp",
values_from = "max_response") %>%
mutate(
Var1 = as.factor(Var1)
) %>%
# Plot data
ggplot(
aes(
x = Var1_value %>%
factor(., levels = unique(.) %>% rev ),
y = Var2_value %>%
factor(., levels = unique(.) %>% rev ),
col = max_response,
label = max_response %>% round(3)
)
) +
geom_tile(
aes(fill= factor(max_resp_dose)),
col="lightgrey"
) +
scale_fill_manual(
values = c("100" = "#BBE0CF", "200" = "#D5F9E8"),
na.translate = F
) +
geom_text(
col = "black",
vjust = 2.5, hjust = 0.5,
size = 10/.pt
) +
geom_point(
aes( size = N_0),
shape = 15,
color = "white",
position = position_nudge(y = 0.1)
) +
geom_point(
aes( size = N_max),
shape = 19,
position = position_nudge(y = 0.1)
) +
geom_point(
aes( size = N_0),
shape = 22,
color = "darkblue",
position = position_nudge(y = 0.1)
) +
scale_size(range = c(1, 15), name = "N") +
labs(
x = NULL, y = NULL,
col = "Maximum\nresponse",
fill = "Dose",
title = "Best median response, stratified by one or two patient characteristics") +
scale_color_gradient2(
low = "white", high="darkblue",
limits = c(-0.25, 1.5)
) +
scale_x_discrete(
position = "top"
) +
scale_y_discrete() +
theme_light(base_size = 15) +
theme(
axis.text.x = element_text(
angle = 90,
vjust = 0.5,
hjust = 0
),
panel.grid = element_blank()
)
response_tbl %>%
select(
Var1, Var1_value,
Var2, Var2_value,
max_response, max_resp_dose, N_max,
N_0, median_0,
N_100, median_100,
N_200, median_200
) %>%
arrange(
desc(max_response) ) %>%
kable(., "html",
caption = "Maximum response in each patient group, in descending order") %>%
scroll_box(width = "120%", height = "150px") %>%
kable_styling(
font_size = 12,
bootstrap_options = "striped"
)
Var1 | Var1_value | Var2 | Var2_value | max_response | max_resp_dose | N_max | N_0 | median_0 | N_100 | median_100 | N_200 | median_200 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Age | <40 years | Sex | Female | 1.4738253 | 200 | 77 | 80 | -0.0936159 | 85 | 0.9365259 | 77 | 1.3802094 |
Ethnic | White | Type | Acute disease | 1.4209837 | 200 | 52 | 59 | 0.1040044 | 68 | 0.6494141 | 52 | 1.5249881 |
Ethnic | White | Sex | Female | 1.4080069 | 200 | 74 | 62 | 0.0783431 | 72 | 0.9754775 | 74 | 1.4863500 |
Sex | Female | Type | Acute disease | 1.4073091 | 200 | 59 | 70 | 0.0783431 | 62 | 1.1145576 | 59 | 1.4856522 |
Ethnic | Black | Type | Acute disease | 1.3886972 | 100 | 19 | 26 | -0.4210677 | 19 | 0.9676295 | 22 | 0.4197576 |
BMI | low BMI | Sex | Female | 1.3772039 | 200 | 69 | 85 | 0.0179963 | 78 | 0.8670915 | 69 | 1.3952003 |
Age | <40 years | Ethnic | White | 1.3769607 | 200 | 67 | 66 | -0.1072127 | 76 | 0.6428964 | 67 | 1.2697480 |
Ethnic | Asian | Sex | Female | 1.3629210 | 200 | 45 | 60 | -0.1107637 | 50 | 0.9448312 | 45 | 1.2521573 |
Age | >=40 years | Type | Acute disease | 1.3518500 | 200 | 55 | 68 | -0.0903008 | 60 | 0.7337059 | 55 | 1.2615492 |
Type | Acute disease | Type | Acute disease | 1.3149731 | 200 | 120 | 135 | -0.0743407 | 129 | 0.7299124 | 120 | 1.2406324 |
BMI | high BMI | Sex | Female | 1.3101590 | 200 | 81 | 65 | -0.0404110 | 72 | 1.1110143 | 81 | 1.2697480 |
Sex | Female | Sex | Female | 1.2942516 | 200 | 150 | 150 | 0.0092135 | 150 | 0.9423150 | 150 | 1.3034651 |
Sex | Female | Type | Chronic disease | 1.2642623 | 200 | 91 | 80 | -0.0349792 | 88 | 0.8593007 | 91 | 1.2292830 |
Age | <40 years | Type | Acute disease | 1.2573735 | 200 | 65 | 67 | -0.0743407 | 69 | 0.7294408 | 65 | 1.1830328 |
BMI | low BMI | Type | Acute disease | 1.2362381 | 200 | 57 | 68 | 0.0509698 | 71 | 0.6381219 | 57 | 1.2872080 |
BMI | low BMI | Ethnic | White | 1.2238384 | 200 | 74 | 73 | 0.0403501 | 77 | 0.5240786 | 74 | 1.2641885 |
Age | >=40 years | Sex | Female | 1.1970881 | 200 | 73 | 70 | 0.0430077 | 65 | 0.9744453 | 73 | 1.2400958 |
BMI | high BMI | Type | Acute disease | 1.1837232 | 100 | 58 | 67 | -0.2336337 | 58 | 0.9500895 | 63 | 0.9350078 |
Ethnic | White | Ethnic | White | 1.1166348 | 200 | 144 | 119 | 0.0403501 | 144 | 0.5907656 | 144 | 1.1569848 |
Age | >=40 years | Ethnic | Black | 1.1131057 | 100 | 19 | 33 | 0.0271723 | 19 | 1.1402780 | 25 | 0.7689209 |
BMI | low BMI | Age | <40 years | 1.0661942 | 200 | 73 | 82 | -0.1320552 | 84 | 0.5855097 | 73 | 0.9341390 |
BMI | low BMI | Ethnic | Asian | 1.0381097 | 200 | 50 | 53 | -0.1972625 | 49 | 0.6504704 | 50 | 0.8408472 |
Age | <40 years | Age | <40 years | 1.0314618 | 200 | 143 | 144 | -0.1032044 | 164 | 0.6061683 | 143 | 0.9282574 |
Ethnic | Asian | Type | Acute disease | 1.0228892 | 200 | 46 | 50 | -0.1666440 | 42 | 0.8153932 | 46 | 0.8562452 |
Age | <40 years | Type | Chronic disease | 1.0150956 | 200 | 78 | 77 | -0.1406907 | 95 | 0.5859389 | 78 | 0.8744048 |
BMI | low BMI | BMI | low BMI | 1.0041402 | 200 | 146 | 157 | -0.0589245 | 151 | 0.5962553 | 146 | 0.9452158 |
Sex | Male | Type | Acute disease | 1.0002103 | 200 | 61 | 65 | -0.1684336 | 67 | 0.4530151 | 61 | 0.8317767 |
BMI | low BMI | Type | Chronic disease | 0.9663556 | 200 | 89 | 89 | -0.0895662 | 80 | 0.5915215 | 89 | 0.8767894 |
Ethnic | White | Type | Chronic disease | 0.9598003 | 200 | 92 | 60 | -0.0663296 | 76 | 0.5824246 | 92 | 0.8934708 |
BMI | high BMI | Age | <40 years | 0.9355423 | 200 | 70 | 62 | -0.0187840 | 80 | 0.6653357 | 70 | 0.9167583 |
BMI | low BMI | Age | >=40 years | 0.9289176 | 200 | 73 | 75 | 0.0271723 | 67 | 0.6504704 | 73 | 0.9560899 |
BMI | high BMI | Ethnic | White | 0.8957185 | 200 | 70 | 46 | 0.0231029 | 67 | 0.6272804 | 70 | 0.9188214 |
Ethnic | White | Sex | Male | 0.8795431 | 200 | 70 | 57 | -0.0397959 | 72 | 0.2292620 | 70 | 0.8397472 |
Age | <40 years | Ethnic | Asian | 0.8610026 | 200 | 53 | 56 | -0.1666440 | 59 | 0.5917255 | 53 | 0.6943586 |
BMI | high BMI | BMI | high BMI | 0.8563161 | 200 | 154 | 123 | -0.0397959 | 149 | 0.6116301 | 154 | 0.8165202 |
Age | >=40 years | Ethnic | White | 0.8418724 | 200 | 77 | 53 | 0.1142174 | 68 | 0.5675774 | 77 | 0.9560899 |
BMI | low BMI | Sex | Male | 0.8092369 | 200 | 77 | 72 | -0.1148783 | 73 | 0.2511608 | 77 | 0.6943586 |
Ethnic | Asian | Ethnic | Asian | 0.8071959 | 200 | 108 | 106 | -0.1371630 | 108 | 0.6662038 | 108 | 0.6700329 |
Age | >=40 years | Age | >=40 years | 0.8054468 | 200 | 157 | 136 | 0.0225843 | 136 | 0.5995700 | 157 | 0.8280312 |
Age | >=40 years | Ethnic | Asian | 0.7938925 | 100 | 49 | 50 | -0.1107637 | 49 | 0.6831288 | 55 | 0.6657136 |
BMI | high BMI | Ethnic | Asian | 0.7914879 | 100 | 59 | 53 | -0.1083591 | 59 | 0.6831288 | 58 | 0.6097214 |
Ethnic | Asian | Sex | Male | 0.7794252 | 200 | 63 | 46 | -0.2523170 | 58 | 0.4049125 | 63 | 0.5271082 |
BMI | high BMI | Age | >=40 years | 0.7624675 | 200 | 84 | 61 | -0.0404110 | 69 | 0.5920707 | 84 | 0.7220564 |
Type | Chronic disease | Type | Chronic disease | 0.7574756 | 200 | 180 | 145 | 0.0164722 | 171 | 0.5721217 | 180 | 0.7739478 |
BMI | low BMI | Ethnic | Black | 0.7561552 | 100 | 25 | 31 | 0.1919490 | 25 | 0.9481041 | 22 | 0.4964860 |
Ethnic | Black | Sex | Female | 0.7044310 | 200 | 31 | 28 | 0.2305768 | 28 | 0.7605207 | 31 | 0.9350078 |
Age | >=40 years | Type | Chronic disease | 0.6906872 | 200 | 102 | 68 | 0.0495966 | 76 | 0.5632136 | 102 | 0.7402838 |
Sex | Male | Sex | Male | 0.6787458 | 200 | 150 | 130 | -0.0819534 | 150 | 0.3836353 | 150 | 0.5967923 |
Ethnic | Asian | Type | Chronic disease | 0.6786501 | 100 | 66 | 56 | -0.1107637 | 66 | 0.5678864 | 62 | 0.5051804 |
Age | >=40 years | Sex | Male | 0.6711926 | 200 | 84 | 66 | -0.0212261 | 71 | 0.4517713 | 84 | 0.6499665 |
Age | <40 years | Sex | Male | 0.5360498 | 200 | 66 | 64 | -0.1184169 | 79 | 0.2307690 | 66 | 0.4176329 |
Ethnic | Black | Ethnic | Black | 0.4771506 | 200 | 48 | 55 | 0.2162899 | 48 | 0.5984809 | 48 | 0.6934405 |
BMI | high BMI | Sex | Male | 0.4547091 | 200 | 73 | 58 | -0.0187840 | 77 | 0.3924904 | 73 | 0.4359251 |
Ethnic | Black | Sex | Male | 0.3691546 | 100 | 20 | 27 | 0.2162899 | 20 | 0.5854445 | 17 | 0.2796167 |
Age | <40 years | Ethnic | Black | 0.3291949 | 200 | 23 | 22 | 0.3089670 | 29 | 0.5962553 | 23 | 0.6381619 |
Sex | Male | Type | Chronic disease | 0.3281991 | 200 | 89 | 65 | 0.0403501 | 83 | 0.1721311 | 89 | 0.3685491 |
Ethnic | Black | Type | Chronic disease | 0.3114366 | 200 | 26 | 29 | 0.4957048 | 29 | 0.5701826 | 26 | 0.8071414 |
BMI | high BMI | Ethnic | Black | 0.2817528 | 200 | 26 | 24 | 0.4770672 | 23 | 0.4165172 | 26 | 0.7588200 |
BMI | high BMI | Type | Chronic disease | 0.2493957 | 200 | 91 | 56 | 0.4476301 | 91 | 0.5636511 | 91 | 0.6970257 |
Tiles in the plot correspond to patient groups characterised by the
row and column labels. Magnitude of the maximum median response in each
patient group is shown by the colour of the dots and the numerical
values below them. The numbers (N) of patients in the different groups
is indicated by the size of the dots (treated), and white squares with
dark blue border (untreated). The dose that achieved the maximum median
response is shown by the colour of the tiles (dark green: 100, lighter
green: 200).
The diagonal shows groups stratified by a single characteristic, arranged so that the magnitude of the responses increases from bottom left to top right. The acute disease type was associated with the highest response, followed by female sex, and White ethnic. Conversely, patients with chronic disease, or were male or Black, had the lowest responses. BMI and age had relatively smaller effects as a single stratifier.
When two characteristics were applied, females <40 years had the highest response, followed by Whites with acute disease, White females, and females or Blacks with acute disease. The group wit the lowest median response were patients with chronic disease and high BMI. Low responses were also observed in patients who were Black and male, <40 years, with chronic disease, or high BMI. This is in contrast to the Blacks with acute disease having among the highest responses. However, numbers in these groups were relatively small and the result might need confirmation in a larger sample.
In most groups, the higher dose (200) was associated with the greater
response, but Asians with chronic disease or high BMI, or patients with
high BMI and acute disease might benefit more from the lower dose.
# - Chunk__3b: - #
## Calculate Wilcoxon test p-values for all comparisons
pTest_label <- "Wilcoxon test"
pValue.test<- function(x, y) {
test <- wilcox.test(x, y)
return(test$p.value)
}
# - Chunk__pTests: common code for p-value calculations in chunks 3b and c - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires objects pChar_pairs and response_tbl (chunk_2) - #
pValue.test_results <- list ()
for (i in 1: length(pChar_pairs)) {
pValue.test_results_j <- list()
for (j in 1: nrow(pChar_pairs[[i]])) {
x <- pChar_pairs[[i]] %>% names
y <- pChar_pairs[[i]] %>% .[j, ]%>% set_names(x)
# Ensure names are preserved when cbinding as transformed dataframe below
if (length(x) == 1) {
y <- as.data.frame (t(y))
}
pValue.test_results_j[[j]] <-
exampleData %>%
{if (length(x) == 1) {
filter(.,
get(x[1]) == paste(y[1])
)
} else {
filter(.,
get(x[1]) == paste(y[1]) &
get(x[2]) == paste(y[2])
)
}
} %>%
summarise(
p_value_0_vs_100 = pValue.test(target[dose == "0"], target[dose == "100"] ),
p_value_0_vs_200 = pValue.test(target[dose == "0"], target[dose == "200"] )
) %>%
cbind(y, .)
}
pValue.test_results[[i]] <- pValue.test_results_j
}
pValue.test_results <-
pValue.test_results %>%
flatten() %>%
lapply(function(df)
if(
ncol(df) < 4) {
cbind(df[, 1, drop = FALSE], df)
}else{
df}
) %>%
lapply(function(df)
set_names(df,
"Var1_value", "Var2_value",
"p_value_0_vs_100", "p_value_0_vs_200"
)
) %>%
list_rbind() %>%
mutate(
log.p_100 = log10(p_value_0_vs_100) %>% round(digits = 2),
log.p_200 = log10(p_value_0_vs_200) %>% round(digits = 2)
) %>%
select(
!c(p_value_0_vs_100, p_value_0_vs_200)
) %>%
# Add column for joining with response_tbl and join
unite(
Vars1_2,
c(Var1_value, Var2_value),
sep = "_",
remove = F
) %>%
left_join(
response_tbl, .) %>%
mutate(
log.p_max.resp = case_when(
max_resp_dose == "100" ~ log.p_100,
max_resp_dose == "200" ~ log.p_200
)
)
## Generate new plot of best responses, showing Wilcoxon p-values
pValue.test_results %>%
# Select and reshape
select(
Var1_value, Var2_value,
log.p_max.resp,
max_resp_dose, max_response
) %>%
pivot_wider(
names_from = "Var2_value",
values_from = c(
"max_resp_dose", "max_response", "log.p_max.resp"),
names_sep = "-"
) %>%
pivot_longer(
cols = !c("Var1", "Var1_value"),
names_to = c("resp", "Var2_value"),
values_to = c("max_response"), names_sep = "-"
) %>%
pivot_wider(
names_from = "resp",
values_from = "max_response") %>%
mutate(
Var1_value = as.factor(Var1_value)
) %>%
# Plot data
ggplot(
aes(
x = Var1_value %>%
factor(.,
levels = unique(.) %>% rev()
),
y = Var2_value %>%
factor(.,
levels = unique(.) %>% rev()
),
col = max_response,
label = max_response %>% round(3)
)
) +
geom_tile(
aes(fill= factor(max_resp_dose)),
col="lightgrey"
) +
scale_fill_manual(
values = c("100" = "#BBE0CF", "200" = "#D5F9E8"),
na.translate = F
) +
geom_text(
col = "black",
vjust = 3.5, hjust = 0.5,
size = 10/.pt
) +
geom_point(
aes( size = log.p_max.resp),
shape = 19,
position = position_nudge(y = 0.1)
) +
scale_size(
range = c(18, 1),
name = "log.p_max.resp") +
labs(
x = NULL, y = NULL,
col = "Maximum\nresponse",
fill = "Dose",
title = "Best median response, stratified by one or two patient characteristics",
subtitle = paste( "\nlog.p: ", pTest_label)
) +
scale_color_gradient2(
low = "white", high="darkblue",
limits = c(-0.25, 1.5)
) +
scale_x_discrete(
position = "top"
) +
scale_y_discrete() +
theme_light(base_size = 15) +
theme(
axis.text.x = element_text(
angle = 90,
vjust = 0.5,
hjust = 0
),
panel.grid = element_blank()
)
# Display, sorted by significance of response at dose 200
pValue.test_results %>%
select(
Var1, Var1_value,
Var2, Var2_value,
log.p_max.resp,
max_response, max_resp_dose,
N_100, median_100,
N_200, median_200
) %>%
arrange(log.p_max.resp) %>%
kable(., "html",
caption = "Maximum responses, sorted by significance") %>%
scroll_box(width = "120%", height = "150px") %>%
kable_styling(
font_size = 12,
bootstrap_options = "striped"
)
Var1 | Var1_value | Var2 | Var2_value | log.p_max.resp | max_response | max_resp_dose | N_100 | median_100 | N_200 | median_200 |
---|---|---|---|---|---|---|---|---|---|---|
Sex | Female | Sex | Female | -21.33 | 1.2942516 | 200 | 150 | 0.9423150 | 150 | 1.3034651 |
Type | Acute disease | Type | Acute disease | -15.94 | 1.3149731 | 200 | 129 | 0.7299124 | 120 | 1.2406324 |
BMI | low BMI | BMI | low BMI | -14.33 | 1.0041402 | 200 | 151 | 0.5962553 | 146 | 0.9452158 |
Age | <40 years | Sex | Female | -13.00 | 1.4738253 | 200 | 85 | 0.9365259 | 77 | 1.3802094 |
BMI | low BMI | Sex | Female | -12.67 | 1.3772039 | 200 | 78 | 0.8670915 | 69 | 1.3952003 |
Age | <40 years | Age | <40 years | -12.35 | 1.0314618 | 200 | 164 | 0.6061683 | 143 | 0.9282574 |
Ethnic | White | Ethnic | White | -12.00 | 1.1166348 | 200 | 144 | 0.5907656 | 144 | 1.1569848 |
Ethnic | White | Sex | Female | -11.93 | 1.4080069 | 200 | 72 | 0.9754775 | 74 | 1.4863500 |
Sex | Female | Type | Acute disease | -11.41 | 1.4073091 | 200 | 62 | 1.1145576 | 59 | 1.4856522 |
Sex | Female | Type | Chronic disease | -10.93 | 1.2642623 | 200 | 88 | 0.8593007 | 91 | 1.2292830 |
Ethnic | White | Type | Acute disease | -9.57 | 1.4209837 | 200 | 68 | 0.6494141 | 52 | 1.5249881 |
BMI | high BMI | Sex | Female | -9.39 | 1.3101590 | 200 | 72 | 1.1110143 | 81 | 1.2697480 |
BMI | low BMI | Type | Acute disease | -9.37 | 1.2362381 | 200 | 71 | 0.6381219 | 57 | 1.2872080 |
Age | <40 years | Type | Acute disease | -9.30 | 1.2573735 | 200 | 69 | 0.7294408 | 65 | 1.1830328 |
Age | >=40 years | Age | >=40 years | -9.08 | 0.8054468 | 200 | 136 | 0.5995700 | 157 | 0.8280312 |
Age | >=40 years | Sex | Female | -8.89 | 1.1970881 | 200 | 65 | 0.9744453 | 73 | 1.2400958 |
BMI | low BMI | Age | <40 years | -8.82 | 1.0661942 | 200 | 84 | 0.5855097 | 73 | 0.9341390 |
BMI | low BMI | Ethnic | White | -8.82 | 1.2238384 | 200 | 77 | 0.5240786 | 74 | 1.2641885 |
Ethnic | Asian | Ethnic | Asian | -7.93 | 0.8071959 | 200 | 108 | 0.6662038 | 108 | 0.6700329 |
BMI | high BMI | BMI | high BMI | -7.75 | 0.8563161 | 200 | 149 | 0.6116301 | 154 | 0.8165202 |
Age | <40 years | Ethnic | White | -7.66 | 1.3769607 | 200 | 76 | 0.6428964 | 67 | 1.2697480 |
Type | Chronic disease | Type | Chronic disease | -7.50 | 0.7574756 | 200 | 171 | 0.5721217 | 180 | 0.7739478 |
Age | >=40 years | Type | Acute disease | -7.36 | 1.3518500 | 200 | 60 | 0.7337059 | 55 | 1.2615492 |
Ethnic | Asian | Sex | Female | -7.11 | 1.3629210 | 200 | 50 | 0.9448312 | 45 | 1.2521573 |
BMI | low BMI | Type | Chronic disease | -6.64 | 0.9663556 | 200 | 80 | 0.5915215 | 89 | 0.8767894 |
Ethnic | Asian | Type | Acute disease | -6.50 | 1.0228892 | 200 | 42 | 0.8153932 | 46 | 0.8562452 |
BMI | high BMI | Type | Acute disease | -6.36 | 1.1837232 | 100 | 58 | 0.9500895 | 63 | 0.9350078 |
Sex | Male | Type | Acute disease | -6.14 | 1.0002103 | 200 | 67 | 0.4530151 | 61 | 0.8317767 |
BMI | low BMI | Age | >=40 years | -6.05 | 0.9289176 | 200 | 67 | 0.6504704 | 73 | 0.9560899 |
BMI | low BMI | Ethnic | Asian | -5.39 | 1.0381097 | 200 | 49 | 0.6504704 | 50 | 0.8408472 |
Ethnic | White | Type | Chronic disease | -5.02 | 0.9598003 | 200 | 76 | 0.5824246 | 92 | 0.8934708 |
Age | <40 years | Ethnic | Asian | -4.94 | 0.8610026 | 200 | 59 | 0.5917255 | 53 | 0.6943586 |
Age | <40 years | Type | Chronic disease | -4.66 | 1.0150956 | 200 | 95 | 0.5859389 | 78 | 0.8744048 |
Age | >=40 years | Ethnic | White | -4.56 | 0.8418724 | 200 | 68 | 0.5675774 | 77 | 0.9560899 |
Sex | Male | Sex | Male | -4.45 | 0.6787458 | 200 | 150 | 0.3836353 | 150 | 0.5967923 |
BMI | low BMI | Sex | Male | -4.34 | 0.8092369 | 200 | 73 | 0.2511608 | 77 | 0.6943586 |
BMI | high BMI | Age | <40 years | -4.31 | 0.9355423 | 200 | 80 | 0.6653357 | 70 | 0.9167583 |
Age | >=40 years | Ethnic | Asian | -4.25 | 0.7938925 | 100 | 49 | 0.6831288 | 55 | 0.6657136 |
BMI | high BMI | Ethnic | White | -4.16 | 0.8957185 | 200 | 67 | 0.6272804 | 70 | 0.9188214 |
BMI | high BMI | Age | >=40 years | -4.12 | 0.7624675 | 200 | 69 | 0.5920707 | 84 | 0.7220564 |
Ethnic | Asian | Type | Chronic disease | -3.63 | 0.6786501 | 100 | 66 | 0.5678864 | 62 | 0.5051804 |
BMI | high BMI | Ethnic | Asian | -3.61 | 0.7914879 | 100 | 59 | 0.6831288 | 58 | 0.6097214 |
Ethnic | Black | Sex | Female | -3.08 | 0.7044310 | 200 | 28 | 0.7605207 | 31 | 0.9350078 |
Age | >=40 years | Type | Chronic disease | -2.98 | 0.6906872 | 200 | 76 | 0.5632136 | 102 | 0.7402838 |
Ethnic | White | Sex | Male | -2.64 | 0.8795431 | 200 | 72 | 0.2292620 | 70 | 0.8397472 |
Ethnic | Asian | Sex | Male | -2.50 | 0.7794252 | 200 | 58 | 0.4049125 | 63 | 0.5271082 |
Age | >=40 years | Sex | Male | -2.42 | 0.6711926 | 200 | 71 | 0.4517713 | 84 | 0.6499665 |
Ethnic | Black | Ethnic | Black | -2.39 | 0.4771506 | 200 | 48 | 0.5984809 | 48 | 0.6934405 |
Ethnic | Black | Type | Acute disease | -2.35 | 1.3886972 | 100 | 19 | 0.9676295 | 22 | 0.4197576 |
Age | <40 years | Sex | Male | -2.29 | 0.5360498 | 200 | 79 | 0.2307690 | 66 | 0.4176329 |
Age | >=40 years | Ethnic | Black | -1.98 | 1.1131057 | 100 | 19 | 1.1402780 | 25 | 0.7689209 |
BMI | low BMI | Ethnic | Black | -1.76 | 0.7561552 | 100 | 25 | 0.9481041 | 22 | 0.4964860 |
BMI | high BMI | Type | Chronic disease | -1.69 | 0.2493957 | 200 | 91 | 0.5636511 | 91 | 0.6970257 |
BMI | high BMI | Ethnic | Black | -1.26 | 0.2817528 | 200 | 23 | 0.4165172 | 26 | 0.7588200 |
Age | <40 years | Ethnic | Black | -1.26 | 0.3291949 | 200 | 29 | 0.5962553 | 23 | 0.6381619 |
Ethnic | Black | Sex | Male | -1.12 | 0.3691546 | 100 | 20 | 0.5854445 | 17 | 0.2796167 |
BMI | high BMI | Sex | Male | -1.09 | 0.4547091 | 200 | 77 | 0.3924904 | 73 | 0.4359251 |
Ethnic | Black | Type | Chronic disease | -0.75 | 0.3114366 | 200 | 29 | 0.5701826 | 26 | 0.8071414 |
Sex | Male | Type | Chronic disease | -0.62 | 0.3281991 | 200 | 83 | 0.1721311 | 89 | 0.3685491 |
# - Chunk__3c: - #
## Calculate Mood test p-values for all comparisons
pTest_label <- "Mood test"
pValue.test<- function(x, y) {
test <- mood.test(x, y)
return(test$p.value)
}
# - Chunk__pTests: common code for p-value calculations in chunks 3b and c - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires objects pChar_pairs and response_tbl (chunk_2) - #
pValue.test_results <- list ()
for (i in 1: length(pChar_pairs)) {
pValue.test_results_j <- list()
for (j in 1: nrow(pChar_pairs[[i]])) {
x <- pChar_pairs[[i]] %>% names
y <- pChar_pairs[[i]] %>% .[j, ]%>% set_names(x)
# Ensure names are preserved when cbinding as transformed dataframe below
if (length(x) == 1) {
y <- as.data.frame (t(y))
}
pValue.test_results_j[[j]] <-
exampleData %>%
{if (length(x) == 1) {
filter(.,
get(x[1]) == paste(y[1])
)
} else {
filter(.,
get(x[1]) == paste(y[1]) &
get(x[2]) == paste(y[2])
)
}
} %>%
summarise(
p_value_0_vs_100 = pValue.test(target[dose == "0"], target[dose == "100"] ),
p_value_0_vs_200 = pValue.test(target[dose == "0"], target[dose == "200"] )
) %>%
cbind(y, .)
}
pValue.test_results[[i]] <- pValue.test_results_j
}
pValue.test_results <-
pValue.test_results %>%
flatten() %>%
lapply(function(df)
if(
ncol(df) < 4) {
cbind(df[, 1, drop = FALSE], df)
}else{
df}
) %>%
lapply(function(df)
set_names(df,
"Var1_value", "Var2_value",
"p_value_0_vs_100", "p_value_0_vs_200"
)
) %>%
list_rbind() %>%
mutate(
log.p_100 = log10(p_value_0_vs_100) %>% round(digits = 2),
log.p_200 = log10(p_value_0_vs_200) %>% round(digits = 2)
) %>%
select(
!c(p_value_0_vs_100, p_value_0_vs_200)
) %>%
# Add column for joining with response_tbl and join
unite(
Vars1_2,
c(Var1_value, Var2_value),
sep = "_",
remove = F
) %>%
left_join(
response_tbl, .) %>%
mutate(
log.p_max.resp = case_when(
max_resp_dose == "100" ~ log.p_100,
max_resp_dose == "200" ~ log.p_200
)
)
## Generate new plot of best responses, showing Wilcoxon p-values
pValue.test_results %>%
# Select and reshape
select(
Var1_value, Var2_value,
log.p_max.resp,
max_resp_dose, max_response
) %>%
pivot_wider(
names_from = "Var2_value",
values_from = c(
"max_resp_dose", "max_response", "log.p_max.resp"),
names_sep = "-"
) %>%
pivot_longer(
cols = !c("Var1", "Var1_value"),
names_to = c("resp", "Var2_value"),
values_to = c("max_response"), names_sep = "-"
) %>%
pivot_wider(
names_from = "resp",
values_from = "max_response") %>%
mutate(
Var1_value = as.factor(Var1_value)
) %>%
# Plot data
ggplot(
aes(
x = Var1_value %>%
factor(.,
levels = unique(.) %>% rev()
),
y = Var2_value %>%
factor(.,
levels = unique(.) %>% rev()
),
col = max_response,
label = max_response %>% round(3)
)
) +
geom_tile(
aes(fill= factor(max_resp_dose)),
col="lightgrey"
) +
scale_fill_manual(
values = c("100" = "#BBE0CF", "200" = "#D5F9E8"),
na.translate = F
) +
geom_text(
col = "black",
vjust = 3.5, hjust = 0.5,
size = 10/.pt
) +
geom_point(
aes( size = log.p_max.resp),
shape = 19,
position = position_nudge(y = 0.1)
) +
scale_size(
range = c(18, 1),
name = "log.p_max.resp") +
labs(
x = NULL, y = NULL,
col = "Maximum\nresponse",
fill = "Dose",
title = "Best median response, stratified by one or two patient characteristics",
subtitle = paste( "\nlog.p: ", pTest_label)
) +
scale_color_gradient2(
low = "white", high="darkblue",
limits = c(-0.25, 1.5)
) +
scale_x_discrete(
position = "top"
) +
scale_y_discrete() +
theme_light(base_size = 15) +
theme(
axis.text.x = element_text(
angle = 90,
vjust = 0.5,
hjust = 0
),
panel.grid = element_blank()
)
# Display, sorted by significance of response at dose 200
pValue.test_results %>%
select(
Var1, Var1_value,
Var2, Var2_value,
log.p_max.resp,
max_response, max_resp_dose,
N_100, median_100,
N_200, median_200
) %>%
arrange(log.p_max.resp) %>%
kable(., "html",
caption = "Maximum responses, sorted by significance") %>%
scroll_box(width = "120%", height = "150px") %>%
kable_styling(
font_size = 12,
bootstrap_options = "striped"
)
Var1 | Var1_value | Var2 | Var2_value | log.p_max.resp | max_response | max_resp_dose | N_100 | median_100 | N_200 | median_200 |
---|---|---|---|---|---|---|---|---|---|---|
BMI | low BMI | Ethnic | Black | -1.79 | 0.7561552 | 100 | 25 | 0.9481041 | 22 | 0.4964860 |
BMI | low BMI | BMI | low BMI | -1.20 | 1.0041402 | 200 | 151 | 0.5962553 | 146 | 0.9452158 |
Age | <40 years | Ethnic | White | -1.06 | 1.3769607 | 200 | 76 | 0.6428964 | 67 | 1.2697480 |
BMI | high BMI | Sex | Male | -0.95 | 0.4547091 | 200 | 77 | 0.3924904 | 73 | 0.4359251 |
BMI | low BMI | Sex | Female | -0.90 | 1.3772039 | 200 | 78 | 0.8670915 | 69 | 1.3952003 |
Age | <40 years | Type | Chronic disease | -0.90 | 1.0150956 | 200 | 95 | 0.5859389 | 78 | 0.8744048 |
BMI | high BMI | Ethnic | Black | -0.88 | 0.2817528 | 200 | 23 | 0.4165172 | 26 | 0.7588200 |
Age | <40 years | Age | <40 years | -0.81 | 1.0314618 | 200 | 164 | 0.6061683 | 143 | 0.9282574 |
BMI | low BMI | Age | <40 years | -0.80 | 1.0661942 | 200 | 84 | 0.5855097 | 73 | 0.9341390 |
Ethnic | Asian | Sex | Female | -0.77 | 1.3629210 | 200 | 50 | 0.9448312 | 45 | 1.2521573 |
Ethnic | Asian | Sex | Male | -0.75 | 0.7794252 | 200 | 58 | 0.4049125 | 63 | 0.5271082 |
BMI | low BMI | Type | Acute disease | -0.70 | 1.2362381 | 200 | 71 | 0.6381219 | 57 | 1.2872080 |
Type | Chronic disease | Type | Chronic disease | -0.67 | 0.7574756 | 200 | 171 | 0.5721217 | 180 | 0.7739478 |
BMI | low BMI | Age | >=40 years | -0.67 | 0.9289176 | 200 | 67 | 0.6504704 | 73 | 0.9560899 |
BMI | high BMI | Type | Chronic disease | -0.65 | 0.2493957 | 200 | 91 | 0.5636511 | 91 | 0.6970257 |
BMI | high BMI | Age | >=40 years | -0.64 | 0.7624675 | 200 | 69 | 0.5920707 | 84 | 0.7220564 |
Ethnic | White | Ethnic | White | -0.55 | 1.1166348 | 200 | 144 | 0.5907656 | 144 | 1.1569848 |
BMI | low BMI | Ethnic | White | -0.55 | 1.2238384 | 200 | 77 | 0.5240786 | 74 | 1.2641885 |
Sex | Male | Type | Acute disease | -0.52 | 1.0002103 | 200 | 67 | 0.4530151 | 61 | 0.8317767 |
Ethnic | Asian | Type | Chronic disease | -0.50 | 0.6786501 | 100 | 66 | 0.5678864 | 62 | 0.5051804 |
BMI | low BMI | Ethnic | Asian | -0.46 | 1.0381097 | 200 | 49 | 0.6504704 | 50 | 0.8408472 |
BMI | low BMI | Type | Chronic disease | -0.42 | 0.9663556 | 200 | 80 | 0.5915215 | 89 | 0.8767894 |
Age | >=40 years | Sex | Male | -0.42 | 0.6711926 | 200 | 71 | 0.4517713 | 84 | 0.6499665 |
Age | <40 years | Sex | Female | -0.41 | 1.4738253 | 200 | 85 | 0.9365259 | 77 | 1.3802094 |
Sex | Female | Sex | Female | -0.39 | 1.2942516 | 200 | 150 | 0.9423150 | 150 | 1.3034651 |
Ethnic | White | Sex | Male | -0.33 | 0.8795431 | 200 | 72 | 0.2292620 | 70 | 0.8397472 |
BMI | high BMI | Age | <40 years | -0.29 | 0.9355423 | 200 | 80 | 0.6653357 | 70 | 0.9167583 |
Sex | Female | Type | Acute disease | -0.29 | 1.4073091 | 200 | 62 | 1.1145576 | 59 | 1.4856522 |
Ethnic | Black | Type | Acute disease | -0.26 | 1.3886972 | 100 | 19 | 0.9676295 | 22 | 0.4197576 |
Age | >=40 years | Ethnic | Asian | -0.25 | 0.7938925 | 100 | 49 | 0.6831288 | 55 | 0.6657136 |
BMI | low BMI | Sex | Male | -0.24 | 0.8092369 | 200 | 73 | 0.2511608 | 77 | 0.6943586 |
Ethnic | Black | Type | Chronic disease | -0.23 | 0.3114366 | 200 | 29 | 0.5701826 | 26 | 0.8071414 |
Sex | Female | Type | Chronic disease | -0.22 | 1.2642623 | 200 | 88 | 0.8593007 | 91 | 1.2292830 |
Sex | Male | Sex | Male | -0.21 | 0.6787458 | 200 | 150 | 0.3836353 | 150 | 0.5967923 |
Ethnic | Black | Sex | Female | -0.21 | 0.7044310 | 200 | 28 | 0.7605207 | 31 | 0.9350078 |
Ethnic | White | Type | Acute disease | -0.21 | 1.4209837 | 200 | 68 | 0.6494141 | 52 | 1.5249881 |
Ethnic | Black | Ethnic | Black | -0.20 | 0.4771506 | 200 | 48 | 0.5984809 | 48 | 0.6934405 |
Ethnic | Asian | Ethnic | Asian | -0.19 | 0.8071959 | 200 | 108 | 0.6662038 | 108 | 0.6700329 |
BMI | high BMI | Ethnic | White | -0.17 | 0.8957185 | 200 | 67 | 0.6272804 | 70 | 0.9188214 |
Ethnic | White | Type | Chronic disease | -0.17 | 0.9598003 | 200 | 76 | 0.5824246 | 92 | 0.8934708 |
Age | >=40 years | Type | Chronic disease | -0.16 | 0.6906872 | 200 | 76 | 0.5632136 | 102 | 0.7402838 |
BMI | high BMI | BMI | high BMI | -0.15 | 0.8563161 | 200 | 149 | 0.6116301 | 154 | 0.8165202 |
BMI | high BMI | Sex | Female | -0.14 | 1.3101590 | 200 | 72 | 1.1110143 | 81 | 1.2697480 |
Age | >=40 years | Sex | Female | -0.13 | 1.1970881 | 200 | 65 | 0.9744453 | 73 | 1.2400958 |
BMI | high BMI | Ethnic | Asian | -0.12 | 0.7914879 | 100 | 59 | 0.6831288 | 58 | 0.6097214 |
Ethnic | Asian | Type | Acute disease | -0.12 | 1.0228892 | 200 | 42 | 0.8153932 | 46 | 0.8562452 |
Age | >=40 years | Ethnic | Black | -0.11 | 1.1131057 | 100 | 19 | 1.1402780 | 25 | 0.7689209 |
Age | <40 years | Sex | Male | -0.11 | 0.5360498 | 200 | 79 | 0.2307690 | 66 | 0.4176329 |
Ethnic | White | Sex | Female | -0.09 | 1.4080069 | 200 | 72 | 0.9754775 | 74 | 1.4863500 |
Age | <40 years | Ethnic | Black | -0.07 | 0.3291949 | 200 | 29 | 0.5962553 | 23 | 0.6381619 |
Sex | Male | Type | Chronic disease | -0.07 | 0.3281991 | 200 | 83 | 0.1721311 | 89 | 0.3685491 |
Age | >=40 years | Age | >=40 years | -0.04 | 0.8054468 | 200 | 136 | 0.5995700 | 157 | 0.8280312 |
Ethnic | Black | Sex | Male | -0.04 | 0.3691546 | 100 | 20 | 0.5854445 | 17 | 0.2796167 |
BMI | high BMI | Type | Acute disease | -0.03 | 1.1837232 | 100 | 58 | 0.9500895 | 63 | 0.9350078 |
Age | >=40 years | Type | Acute disease | -0.03 | 1.3518500 | 200 | 60 | 0.7337059 | 55 | 1.2615492 |
Type | Acute disease | Type | Acute disease | -0.02 | 1.3149731 | 200 | 129 | 0.7299124 | 120 | 1.2406324 |
Age | <40 years | Ethnic | Asian | -0.02 | 0.8610026 | 200 | 59 | 0.5917255 | 53 | 0.6943586 |
Age | >=40 years | Ethnic | White | -0.02 | 0.8418724 | 200 | 68 | 0.5675774 | 77 | 0.9560899 |
Age | <40 years | Type | Acute disease | -0.01 | 1.2573735 | 200 | 69 | 0.7294408 | 65 | 1.1830328 |
# - splitViolins - #
# - From: https://stackoverflow.com/questions/35717353/split-violin-plot-with-ggplot2 - #
## Custom ggproto object for generating split violin plots
GeomSplitViolin <- ggplot2::ggproto(
"GeomSplitViolin",
ggplot2::GeomViolin,
draw_group = function(self,
data,
...,
# add the nudge here
nudge = 0,
draw_quantiles = NULL) {
data <- transform(data,
xminv = x - violinwidth * (x - xmin),
xmaxv = x + violinwidth * (xmax - x))
grp <- data[1, "group"]
newdata <- plyr::arrange(transform(data,
x = if (grp %% 2 == 1) xminv else xmaxv),
if (grp %% 2 == 1) y else -y)
newdata <- rbind(newdata[1, ],
newdata,
newdata[nrow(newdata), ],
newdata[1, ])
newdata[c(1, nrow(newdata)-1, nrow(newdata)), "x"] <- round(newdata[1, "x"])
# now nudge them apart
newdata$x <- ifelse(newdata$group %% 2 == 1,
newdata$x - nudge,
newdata$x + nudge)
if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))
quantiles <- ggplot2:::create_quantile_segment_frame(data,
draw_quantiles)
aesthetics <- data[rep(1, nrow(quantiles)),
setdiff(names(data), c("x", "y")),
drop = FALSE]
aesthetics$alpha <- rep(1, nrow(quantiles))
both <- cbind(quantiles, aesthetics)
quantile_grob <- ggplot2::GeomPath$draw_panel(both, ...)
ggplot2:::ggname("geom_split_violin",
grid::grobTree(ggplot2::GeomPolygon$draw_panel(newdata, ...),
quantile_grob))
}
else {
ggplot2:::ggname("geom_split_violin",
ggplot2::GeomPolygon$draw_panel(newdata, ...))
}
}
)
geom_split_violin <- function(mapping = NULL,
data = NULL,
stat = "ydensity",
position = "identity",
# nudge param here
nudge = 0,
...,
draw_quantiles = NULL,
trim = TRUE,
scale = "area",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
ggplot2::layer(data = data,
mapping = mapping,
stat = stat,
geom = GeomSplitViolin,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(trim = trim,
scale = scale,
# don't forget the nudge
nudge = nudge,
draw_quantiles = draw_quantiles,
na.rm = na.rm,
...))
}
I used split violin box plots for the pairwise comparison of target medians and distributions in low and high responder groups. Low responders are shown in green on the left side, high responder groups in red on the right. Likewise, facets for the second group variable are arranged from lowest (left) to highest responders (right).
(see also 3.6.: female patients stratified by sex and age)
# - Chunk_4a - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object geom_split_violin (chunk: splitViolins) - #
# Generate a table of significant differences (Wilcoxon test)
sign_test <- compare_means(
data = exampleData,
target ~ dose,
group.by = c("sex", "age"),
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select("sex", "age", "group2", "p.format", "p.adj", "p.signif") %>%
set_names(
"Sex", "Age", "Dose", "p", "p.adj", "signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme("light")
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
group_by(bmi, age, race, sex, type) %>%
select( dose, target) %>%
ggplot(
aes(
x= factor(
dose,
levels = c("0", "100", "200")
),
y = target,
# Order low > high responders
fill =
factor(
sex,
levels = c("Male", "Female") )
)
) +
# Color red for patient variable value with higher response
scale_fill_hue( direction = -1) +
geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
geom_split_violin(
width = 1,
alpha = 0.3,
color = "grey") +
geom_boxplot(
width = 0.3,
outlier.shape = NA) +
# Order facets low > high responders
facet_grid(.
~ factor(
age,
levels = c(">=40 years", "<40 years")
)
) +
theme_bw() +
theme(
text=element_text( size = 13),
axis.text = element_text( size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) +
labs(
x = "Dose",
y = "Target",
fill = "Sex",
size = 12,
hjust = -15
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 2),
widths = c(0.2, 1, 1),
align = "h"
)
# - Chunk_4b - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object geom_split_violin (chunk: splitViolins) - #
# Generate a table of significant differences (Wilcoxon test)
sign_test <- compare_means(
data = exampleData,
target ~ dose,
group.by = c("sex", "type"),
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select("sex", "type", "group2", "p.format", "p.adj", "p.signif") %>%
set_names(
"Sex", "Type", "Dose", "p", "p.adj", "signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme("light")
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
group_by(bmi, age, race, sex, type) %>%
select( dose, target) %>%
ggplot(
aes(
x= factor(
dose,
levels = c("0", "100", "200")
),
y = target,
# Order low > high responders
fill =
factor(
sex,
levels = c("Male", "Female") )
)
) +
# Color red for patient variable value with higher response
scale_fill_hue( direction = -1) +
geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
geom_split_violin(
width = 1,
alpha = 0.3,
color = "grey") +
geom_boxplot(
width = 0.3,
outlier.shape = NA) +
# Order facets low > high responders
facet_grid(.
~ factor(
type,
levels = c("Chronic disease", "Acute disease")
)
) +
theme_bw() +
theme(
text=element_text( size = 13),
axis.text = element_text( size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) +
labs(
x = "Dose",
y = "Target",
fill = "Sex",
size = 12,
hjust = -15
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 1.5),
widths = c(0.2, 1, 1),
align = "h"
)
(see also 3.5.: female patients stratified by ethnic and type of
disease)
# - Chunk_4c - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object geom_split_violin (chunk: splitViolins) - #
# Generate a table of significant differences (Wilcoxon test)
sign_test <- compare_means(
data = exampleData,
target ~ dose,
group.by = c("type", "race"),
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select(!".y." & !"group1" & !"method") %>%
set_names(
"Sex", "Ethnic", "Dose", "p", "p.adj", "p.format", "p.signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme("light")
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
group_by( bmi, age, race, sex, type) %>%
select( dose, target) %>%
ggplot(
aes(
x= factor(dose, levels = c("0", "100", "200") ),
y = target,
# Order low > high responders
fill = factor (type, levels = c("Chronic disease", "Acute disease") )
)
) +
# Color red for patient variable value with higher response
scale_fill_hue (direction = -1) +
geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
geom_split_violin(
width = 1,
alpha = 0.3,
color = "grey") +
geom_boxplot(
width = 0.3,
outlier.shape = NA) +
# Order facets low > high responders
facet_grid(.
~ factor(
race,
levels = c("Black", "Asian", "White")
)
) +
theme_bw() +
theme(
text=element_text( size = 13),
axis.text = element_text( size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) +
labs(
x = "Dose",
y = "Target",
fill = "Type",
size = 12,
hjust = -15
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 3),
widths = c(0.2, 1, 1),
align = "h"
)
Interestingly, Blacks with chronic disease had high target
levels, even when untreated,
# - Chunk_4d - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object geom_split_violin (chunk: splitViolins) - #
# Generate a table of significant differences (Wilcoxon test)
sign_test <- compare_means(
data = exampleData,
target ~ dose,
group.by = c("sex", "race"),
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select(!".y." & !"group1" & !"method") %>%
set_names(
"Sex", "Ethnic", "Dose", "p", "p.adj", "p.format", "p.signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme("light")
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
group_by(bmi, age, race, sex, type) %>%
select (dose, target) %>%
ggplot(
aes(
x= factor(dose, levels = c("0", "100", "200") ),
y = target,
# Order low > high responders
fill =
factor(
sex,
levels = c("Male", "Female") )
)
) +
# Color red for patient variable value with higher response
scale_fill_hue( direction = -1) +
geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
geom_split_violin(
width = 1,
alpha = 0.3,
color = "grey") +
geom_boxplot(
width = 0.3,
outlier.shape = NA) +
# Order facets low > high responders
facet_grid(.
~ factor(
race,
levels = c("Black", "Asian", "White")
)
) +
theme_bw() +
theme(
text=element_text( size = 13),
axis.text = element_text( size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) +
labs(
x = "Dose",
y = "Target",
fill = "Sex",
size = 12,
hjust = -15
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 2),
widths = c(0.2, 1, 1),
align = "h"
)
# - Chunk_4e - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object geom_split_violin (chunk: splitViolins) - #
# Generate a table of significant differences (Wilcoxon test)
sign_test <-
exampleData %>%
filter(sex == "Female") %>%
compare_means(
data = .,
target ~ dose,
group.by = c("race", "type"),
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select("race", "type", "group2", "p.format", "p.adj", "p.signif") %>%
set_names(
"Ethnic", "Type", "Dose", "p", "p.adj", "signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme("light")
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
group_by(bmi, age, race, sex, type) %>%
select (dose, target) %>%
ggplot(
aes(
x= factor(
dose,
levels = c("0", "100", "200")
),
y = target,
# Order low > high responders
fill = factor (type, levels = c("Chronic disease", "Acute disease") )
)
) +
# Color red for patient variable value with higher response
scale_fill_hue (direction = -1) + geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
geom_split_violin(
width = 1,
alpha = 0.3,
color = "grey") +
geom_boxplot(
width = 0.3,
outlier.shape = NA) +
# Order facets low > high responders
facet_grid(.
~ factor(
race,
levels = c("Black", "Asian", "White")
)
) +
theme_bw() +
theme(
text=element_text( size = 13),
axis.text = element_text( size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) +
labs(
x = "Dose",
y = "Target",
fill = "Type",
size = 12,
hjust = -15
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 3),
widths = c(0.2, 1, 1),
align = "h"
)
# - Chunk_4f - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object geom_split_violin (chunk: splitViolins) - #
# Generate a table of significant differences (Wilcoxon test)
sign_test <-
exampleData %>%
filter(sex == "Female") %>%
compare_means(
data = .,
target ~ dose,
group.by = c("race", "age"),
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select("race", "age", "group2", "p.format", "p.adj", "p.signif") %>%
set_names(
"Ethnic", "Age", "Dose", "p", "p.adj", "signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme("light")
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
group_by(bmi, age, race, sex, type) %>%
select (dose, target) %>%
ggplot (
aes (x= factor(dose, levels = c("0", "100", "200") ),
y = target,
# Order low > high responders
fill = factor (age, levels = c(">=40 years", "<40 years") )
)
) +
# Color red for patient variable value with higher response
scale_fill_hue (direction = -1) +
geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
geom_split_violin(
width = 1,
alpha = 0.3,
color = "grey") +
geom_boxplot(
width = 0.3,
outlier.shape = NA) +
# Order facets low > high responders
facet_grid(.
~ factor(
race,
levels = c("Black", "Asian", "White")
)
) +
theme_bw() +
theme(
text=element_text( size = 13),
axis.text = element_text( size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) +
labs(
x = "Dose",
y = "Target",
fill = "Age",
size = 12,
hjust = -15
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 2.5),
widths = c(0.2, 1, 1),
align = "h"
)
Younger white females appear to respond better than older, especially
at the higher dose, whereas the opposite appears to be true for Asian
females (not sure about the significance of that…).
# - Chunk_4g - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object geom_split_violin (chunk: splitViolins) - #
# Generate a table of significant differences (Wilcoxon test)
sign_test <-
exampleData %>%
filter(race == "White") %>%
compare_means(
data = .,
target ~ dose,
group.by = c("sex", "type"),
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select("sex", "type", "group2", "p.format", "p.adj", "p.signif") %>%
set_names(
"Sex", "Type", "Dose", "p", "p.adj", "signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme("light")
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
group_by(bmi, age, race, sex, type) %>%
select (dose, target) %>%
ggplot (
aes (x= factor(dose, levels = c("0", "100", "200") ),
y = target,
# Order low > high responders
fill =
factor(
sex,
levels = c("Male", "Female") )
)
) +
# Color red for patient variable value with higher response
scale_fill_hue( direction = -1) +
geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
geom_split_violin(
width = 1,
alpha = 0.3,
color = "grey") +
geom_boxplot(
width = 0.3,
outlier.shape = NA) +
# Order facets low > high responders
facet_grid(.
~ factor(
type,
levels = c("Chronic disease", "Acute disease")
)
) +
theme_bw() +
theme(
text=element_text( size = 13),
axis.text = element_text( size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) +
labs(
x = "Dose",
y = "Target",
fill = "Sex",
size = 12,
hjust = -15
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 1.5),
widths = c(0.2, 1, 1),
align = "h"
)
# - Chunk_4h - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object geom_split_violin (chunk: splitViolins) - #
# Generate a table of significant differences (Wilcoxon test)
sign_test <-
exampleData %>%
filter(race == "White") %>%
compare_means(
data = .,
target ~ dose,
group.by = c("sex", "age"),
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select("sex", "age", "group2", "p.format", "p.adj", "p.signif") %>%
set_names(
"Sex", "Age", "Dose", "p", "p.adj", "signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme("light")
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
group_by(bmi, age, race, sex, type) %>%
select (dose, target) %>%
ggplot(
aes(
x = factor(
dose,
levels = c("0", "100", "200")
),
y = target,
# Order low > high responders
fill =
factor(
sex,
levels = c("Male", "Female") )
)
) +
# Color red for patient variable value with higher response
scale_fill_hue( direction = -1) +
geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
geom_split_violin(
width = 1,
alpha = 0.3,
color = "grey") +
geom_boxplot(
width = 0.3,
outlier.shape = NA) +
# Order facets low > high responders
facet_grid(.
~ factor(
age,
levels = c(">=40 years", "<40 years")
)
) +
theme_bw() +
theme(
text=element_text( size = 13),
axis.text = element_text( size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) +
labs(
x = "Dose",
y = "Target",
fill = "Sex",
size = 12,
hjust = -15
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 1.5),
widths = c(0.2, 1, 1),
align = "h"
)
(The difference between median targets in the Black/ low BMI/
dose 100 group had the highest significance in the Mood test)
# - Chunk_4i - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object geom_split_violin (chunk: splitViolins) - #
# Generate a table of significant differences (Wilcoxon test)
sign_test <-
exampleData %>%
filter(race == "White") %>%
compare_means(
data = .,
target ~ dose,
group.by = c("race", "bmi"),
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select("race", "bmi", "group2", "p.format", "p.adj", "p.signif") %>%
set_names(
"Ethnic", "BMI", "Dose", "p", "p.adj", "signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme("light")
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
group_by(bmi, age, race, sex, type) %>%
select(dose, target) %>%
ggplot(
aes(
x= factor(
dose,
levels = c("0", "100", "200")
),
y = target,
fill = bmi
)
) +
# Color red for patient variable value with higher response
scale_fill_hue (direction = -1) +
geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
geom_split_violin(
width = 1,
alpha = 0.3,
color = "grey") +
geom_boxplot(
width = 0.3,
outlier.shape = NA) +
# Order facets low > high responders
facet_grid(.
~ factor(
race,
levels = c("Black", "Asian", "White")
)
) +
theme_bw() +
theme(
text=element_text( size = 13),
axis.text = element_text( size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) +
labs(
x = "Dose",
y = "Target",
fill = "BMI",
size = 12,
hjust = -15
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 1),
widths = c(0.2, 1, 1),
align = "h"
)
It appears that the groups the received the most significant benefit
were women who were either <40 years (p.adj = 7.9 * 10-13,
median difference 1.474), White (p.adj = 1.2 * 10-12, median
difference 1.406), or with acute disease (p.adj = 3.9 *
10-12, median difference 1.407). Whites with acute disease or
<40 years also showed significant benefit (p.adj = 2.7 *
10-10, median difference 1.421; median difference 1.474, 3.3
* 10-9, median difference 1.377), but this could be mainly
because of the female sub-population of these groups.
# - Chunk_5a - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object geom_split_violin (chunk: splitViolins) - #
# Generate a table of significant differences (Wilcoxon test)
exampleData <-
exampleData %>%
mutate(
Group =
case_when(
race == "White" &
#type == "Acute disease" &
age == "<40 years" &
sex == "Female"
~ "White / F / <40 yrs",
TRUE ~ "Remaining"
)
)
sign_test <-
exampleData %>%
compare_means(
data = .,
target ~ dose,
group.by = "Group",
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select("Group", "group2", "p.format", "p.adj", "p.signif") %>%
set_names(
"Group", "Dose", "p", "p.adj", "signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme(
"light",
base_size = 10
),
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
select (dose, target, type, Group) %>%
group_by(Group) %>%
ggplot(
aes(
x= factor(
dose,
levels = c("0", "100", "200")
),
y = target,
fill = Group
)
) +
geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
fill_palette(
palette = c("lightgrey","salmon")
) +
geom_split_violin(
width = 1,
nudge = 0.01,
alpha = 0.3,
color = "grey") +
geom_boxplot(
width = 0.2,
outlier.shape = NA) +
geom_point(
pch = 21,
size = 3,
position =
position_jitterdodge(
dodge.width = 0.7,
jitter.width = 0.1
),
color = "grey",
alpha = 0.5
) +
theme_light() +
theme(
text=element_text( size = 13),
axis.text = element_text( size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) +
labs(
x = "Dose",
y = "Target",
size = 13, hjust = -15,
title = "Target vs. dose levels in the group with highest responses\n"
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 1),
widths = c(0.2, 1, 1),
align = "h"
)
However, this might need confirmation in a larger sample of this
subgroup.
# - Chunk_5b - #
# - Requires R-libraries and object exampleData (set-up) - #
# - Requires object geom_split_violin (chunk: splitViolins) - #
# Generate a table of significant differences (Wilcoxon test)
exampleData <-
exampleData %>%
mutate(
Group =
case_when(
race == "Black" &
# bmi == "high BMI" &
type == "Chronic disease" &
# age == ">=40 years" &
sex == "Male"
~ "Black/ M / Chronic disease",
TRUE ~ "Remaining"
)
)
sign_test <-
exampleData %>%
compare_means(
data = .,
target ~ dose,
group.by = "Group",
ref.group = "0",
) %>%
filter (
!group1 == "100" &
!p.signif == "ns" ) %>%
select("Group", "group2", "p.format", "p.adj", "p.signif") %>%
set_names(
"Group", "Dose", "p", "p.adj", "signif") %>%
ggtexttable(
rows = NULL,
theme = ttheme(
"light",
base_size = 10
),
) %>%
tab_add_title(
text = "Wilcoxon test (ref. dose = 0)",
face = "bold", size = 12,
)
# Generate split violin plot and insert the Wilcoxon test table
ggarrange(
exampleData %>%
select (dose, target, type, Group) %>%
group_by(Group) %>%
ggplot(
aes(
x= factor(
dose,
levels = c("0", "100", "200")
),
y = target,
fill = Group
)
) +
geom_hline(
yintercept = 0,
color = "darkgrey",
linewidth = 1.1
) +
fill_palette(
palette = c("turquoise", "lightgrey")
) +
geom_split_violin(
width = 1,
nudge = 0.01,
alpha = 0.3,
color = "grey") +
geom_boxplot(
width = 0.2,
outlier.shape = NA) +
geom_point(
pch = 21,
size = 3,
position =
position_jitterdodge(
dodge.width = 0.7,
jitter.width = 0.1
),
color = "grey",
alpha = 0.5
) +
theme_light() +
theme(
text=element_text( size = 13),
axis.text = element_text( size = 12),
) +
scale_y_continuous(
breaks = c(-2: 5),
minor_breaks = NULL
) + labs(
x = "Dose",
y = "Target",
size = 13, hjust = -15,
title = "Target vs. dose levels in the group with lowest responses\n"
),
# Add space between plot and table
NULL,
sign_test,
ncol = 1, nrow = 3,
heights = c(2, 0.1, 1),
widths = c(0.2, 1, 1),
align = "h"
)
The distribution of target in the total
population looks similar for all 3 doses, and suggests that at least
two major subgroups exist that might respond differently.
In the following 3 figures I tried to produce an overview of best responses in subgroups characterised by 1-2 of the
patient variables. Because there was no guarantee for a linear
relationship between dose and target, I decided to compare the dose 100
and dose 200 groups individually with the dose 0 (untreated) group.
Because in many subgroups the target values did not seem to be normally
distributed, I choose to compare the differences between the median
target values. I also tried to assess the significance of these
differences. It struck me that the p-values of the Wilcoxon test seemed extremely significant in all
groups, even after correction. By contrast, the Mood
test produced no significant results.
In contrast to the Wilcoxon test, the Mood test does not require the
distributions of the compared groups to be similar. However, its power
to detect inter-group differences is much lower. My question to the
statistician would be whether any of these tests should be applied to
all subgroups characterised by one or two of the variables. Certainly,
visually inspecting the target distributions for each seems subjective
and cumbersome.
A better approach could be the modified Forest plot suggested by
Ballarine et al., and implemented in the “UpSet plot” function from the
R package “SubgrPlots” ^1, 2^.
Forest/UpSet plot
I found two other alternatives interesting, although perhaps more as tools for subgroup analysis than visualisation. The webinar discussed the shiny app DoRiS (Dose Response in Subgroup Analysis) which aims to facilitate data exploration and hypothesis generation in data sets with a very large numbers of subgroups. Rather than comparing the responses themselves, it employed a user-determined response threshold which could be, e.g., the clinically meaningful effect size. Either manual choice or automatic recognition of subgroup patterns was possible, but at the time only for a single patient variable. Favoring an exploratory approach, p-values and confidence intervals were not calculated to leave room for interdisciplinary discussions of plausible mechanisms and clinical relevance. I could not find the link to the shiny app on github, but it appeared to be very similar to the package “subscreen” (SAS® version 9.4 and R) 3.
Finally, “personalized” seems to be a very comprehensive analysis package in R that unifies a variety of methods for subgroup identification and treatment evaluation into one general, consistent framework 4.
Data exploration with Subscreen
Please quote as: Weissensteiner, Thomas. Visualising patient
groups who might benefit from personalised dosing. RPubs, 10 Aug. 2024.
https://rpubs.com/thomas-weissensteiner/1231597.
A critical review of graphics for subgroup analyses in clinical trials. Ballarini NM, Chiu YD, König F, Posch M, Jaki T. Pharm Stat 2020, 19(5): 541-560. https://doi.org/10.1002/pst.2012
A critical review of graphics for subgroup analyses in clinical
trials. Ballarini NM, Chiu YD, König F, Posch M, Jaki T. Pharm Stat.
2020 Sep;19(5):541-560. https://doi.org/10.1002/pst.2012
SubgrPlots: Graphical Displays for Subgroup Analysis in Clinical Trials.
R package version 0.1.0. Ballarini N, Chiu YD.; 2018
A novel concept of screening for subgrouping factors for the association between socioeconomic status and respiratory allergies. Muysers, C., Messina, F., Keil, T. et al. J Expo Sci Environ Epidemiol 2022, 32: 295–302. https://doi.org/10.1038/s41370-021-00365-x
Subgroup Identification Using the personalized Package. Huling, JD, & Yu, M. Journal of Statistical Software 2021, 98(5): 1–60. https://doi.org/10.18637/jss.v098.i05