My scripts:
About myself: www.linkedin.com/in/ThomasWs-Mopfair
“Wonderful Wednesdays” are a monthly webinar organised by the Visualisation Special Interest Group of Statisticians in the Pharmaceutical Industry (PSI). Principles of effective data visualisation, based on physiological and psychological insights, have been the topic of a previous PSI VisSIG Webinar 1.
This script was written for the VIS-SIG Wonderful Wednesdays challenge 13 Nov 24 (link to video recording). The task involved summary data of a randomized study, and consisted of suggesting a better visualisation than the example chart.
# - r libraries and data - #
library(dplyr) # version 1.1.4 # general grammar
library(purrr) # version 1.0.2 # set_names, list_rbind
library(tidyr) # version 1.3.1 # pivot_longer, pivot_wider
library(ggpubr) # version 0.6.0, includes ggplot2 3.4.0 # ggarrange
library(ggrepel) # version 0.9.6 # geom_text_repel
library(RColorBrewer) # version 1.1-3 # scale_color_brewer (optional)
# for interactive multiplot:
library(ggiraph) # version 0.8.10 # geom_ ...interactive, girafe
library(patchwork) # version 1.3.0 # assembly of interactive plots, synchronous response
exampleData <- read.csv("https://raw.githubusercontent.com/VIS-SIG/Wonderful-Wednesdays/refs/heads/master/data/2024/2024-10-09/WW_Nov2024.csv")
exampleData
## TREATMENT X1 X2 X3 X4 X5 X6 X7
## 1 1 6.2 5.7 5.2 4.8 4.1 6.0 5.2
## 2 2 2.2 3.8 4.0 4.2 4.7 2.3 2.8
## 3 3 5.5 4.6 4.2 5.3 5.4 6.7 4.3
## 4 4 6.1 6.4 7.3 7.2 8.0 5.8 5.1
## 5 5 7.0 7.2 8.3 8.5 9.3 8.1 6.4
## 6 6 8.5 8.1 9.0 8.3 7.4 6.3 5.6
## 7 7 2.5 2.1 1.8 2.3 4.1 9.4 10.0
## 8 8 7.6 8.1 7.4 7.9 6.1 5.5 5.7
Additional information provided for the challenge:
# - Chunk_1 - #
# - Requires R-libraries and object exampleData (r libraries and data) - #
exampleData_int <-
exampleData %>%
set_names(
"Treatment",
paste(c(1:7), sep = ", " )
) %>%
# Add a column with differences between the medians of disease activity at visits before and after the intervention
mutate(
Change_intervention =
apply(., 1, function(row) {
median(row[c("6", "7")]) -
median(row[c("1", "2", "3", "4", "5")])
}
)
) %>%
# Optional: replace treatment numbers with letters, to differentiate from the index for visits
mutate(
Treatment = LETTERS[1:8]
) %>%
# Re-arrange dataframe
arrange (Change_intervention) %>%
pivot_longer(
cols = paste(1:7),
names_to = "Visit",
values_to = "Mean_Disease_Index"
) %>%
mutate(.after = Visit,
Visit = as.numeric(Visit),
Intervention = ordered(
rep(
c(rep("before", 5), rep("after", 2)),
8),
levels = c("before", "after")
),
Treatment = factor(
Treatment,
levels = unique(Treatment)
)
)
# Plot
p_visits <-
exampleData_int %>%
mutate(facet_strip = "") %>% # Empty facet label for aligning the y-axis scales of the two subplots
ggplot(
aes(
x = Visit,
y = Mean_Disease_Index,
color = Treatment
)
) +
scale_color_brewer(palette="Dark2") +
geom_text_repel(
data = subset(
exampleData_int,
Visit == "7" & (Treatment %in% LETTERS[1:8] )
),
aes(label = Treatment),
nudge_x = 0.5,
color = "black",
size = 5
) +
facet_grid(.~ facet_strip) +
geom_line(
lwd = 1.5,
alpha = 0.6
) +
scale_x_continuous(
breaks = 1:7,
minor_breaks = NULL,
expand = c(0, 0), limits = c(0.5, 7.5),
labels = c(paste(1:6), "7\n")
) +
scale_y_continuous(
breaks = seq(0, 10, 2),
minor_breaks = NULL,
expand = c(0, 0), limits = c(0, 10.5)
) +
geom_vline(
aes( xintercept = 5.5),
linetype = "dashed",
col = "blue"
) +
annotate(
"rect",
xmin = 5.4, xmax = 5.6,
ymin = -Inf, ymax=Inf,
fill="white", alpha=0.6) +
annotate(
"text",
x = 5.5, y = 0.5,
label = "Intervention",
size = 4,
color = "blue"
) +
theme_light() +
theme(
legend.position = "none",
strip.background = element_blank(),
strip.text.x=element_text(
colour="black",
size = 12
),
) +
labs(
y = "Mean Disease Index Score",
title = "Disease Activity vs. Visit",
vjust = -0.1
)
p_intervention <-
exampleData_int %>%
ggplot(
aes(
x = Intervention,
y = Mean_Disease_Index,
color = Treatment,
)
) +
scale_color_brewer(palette="Dark2") +
scale_y_continuous( # same y-scale for both subplots
breaks = seq(0, 10, 2),
minor_breaks = NULL,
expand = c(0, 0),
limits = c(0, 10.5)
) +
scale_x_discrete(
labels = c("before", "\nafter")
)+
facet_grid(.
~ Treatment
) +
geom_point(
size = 3
) +
stat_summary(
fun = median,
shape = 5,
size = 0.8,
color = "darkgrey"
) +
stat_summary(
fun = median,
group = 1,
geom = "path",
color = "darkgrey", alpha = 0.5,
lwd = 1.3
) +
theme_light() +
theme(
axis.title.y = element_blank(),
axis.text.y = element_blank(),
strip.background = element_blank(),
strip.text.x=element_text(
colour="black",
size = 12
),
legend.position = "none"
) +
labs(
title = "Change following Intervention",
vjust = -0.1
)
ggarrange(
p_visits,
p_intervention,
ncol = 2,
widths = c(0.6, 1)
)
The original data used numbers for indexing visits (X1-X7) as well as treatment groups (1-8). For better distinction, I changed the names of the treatment groups to upper case letters (A = 1, B = 2, … H = 8).
“Mean Disease Index vs Visit” (left subplot)
Disease activity varied widely between groups and visits, in the periods
before and after the intervention. There was a sharp rise in disease
activity in group “G” after the intervention, and a notable drop in
group “F”. However, both of these trends seemed to have started already
at visits 4 and 5.
“Change following Intervention” (right subplot)
Nevertheless, I added a summary of mean disease index scores before and
after intervention. My assumption was that visits 1-5 were meant to
establish a baseline value, to which levels at visits 6 and 7 were to be
compared. Median values of the mean disease indices before and after
intervention are shown as grey open diamonds, their changes are
highlighted by grey lines. Treatment groups are shown left to right in
order of greatest decrease (group “F”, left) to greatest the increase
(group “G”, right).
# - Chunk_2 - #
# - Requires R-libraries (r libraries and data) and object exampleData_int (chunk_1) - #
## Additional parameters, not supplied with the challenge
# Intervals at which visits 1-7 took place (e.g. weeks)
visitIntervals <- c(0, 2, 4, 8, 12, 16, 24)
interventionTime <- mean (c(visitIntervals[5], visitIntervals[6]))
# Clinically meaningful target for mean disease index score
diseaseScore_target <- 3.5
## Plot
exampleData_int <-
exampleData_int %>%
mutate(
Time_of_visits = rep(visitIntervals, 8)
)
p_visits <-
exampleData_int %>%
mutate(facet_strip = "") %>% # Empty facet label for aligning the y-axis scales of the two subplots
ggplot(
aes(
x = Time_of_visits,
y = Mean_Disease_Index,
color = Treatment,
data_id = Treatment
)
) +
geom_line( # geom_line_interactive mouse over bug: https://github.com/davidgohel/ggiraph/issues/299
lwd = 1,
alpha = 0.3
) +
geom_point_interactive( # as substitute for geom_line_interactive
size = 2
) +
geom_text_interactive(
data = subset(
exampleData_int,
Time_of_visits ==
max(visitIntervals)
),
aes(label = Treatment),
nudge_x = 1,
color = "black",
size = 3
) +
scale_color_brewer(palette="Dark2") +
annotate(
"rect",
xmin = -Inf, xmax = Inf,
ymin = diseaseScore_target, ymax=Inf,
fill="lightgrey", alpha=0.2) +
facet_grid(.
~ facet_strip
) +
scale_x_continuous(
breaks = visitIntervals,
minor_breaks = NULL,
expand = c(0, 0),
limits =
range(visitIntervals) +
max(visitIntervals)/14 * c(-1, 1),
labels = c(
visitIntervals[1:6], paste0(visitIntervals[7],"\n") )
) +
scale_y_continuous(
breaks = seq(0, 10, 2),
minor_breaks = NULL,
expand = c(0, 0), limits = c(0, 10.5)
) +
geom_vline(
aes(xintercept = interventionTime ),
linetype = "dashed",
col = "blue"
) +
annotate(
"text",
x = 0.67 - max(visitIntervals)/14,
y = 1.8,
label = "Target Range",
size = 3,
angle = 90,
color = "blue"
) +
annotate(
"rect",
xmin = interventionTime - 0.1,
xmax = interventionTime + 0.1,
ymin = -Inf, ymax=Inf,
fill="white", alpha=0.6) +
annotate(
"text",
x = interventionTime,
y = 0.5,
label = "Intervention",
size = 3,
color = "blue"
) +
theme_light() +
theme(
legend.position = "none",
strip.background = element_blank(),
strip.text.x=element_text(
colour="black",
size = 12
),
) +
labs(
y = "Mean Disease Index Score",
x = "Time of Visit [week]",
title = "Disease Activity vs. Time of Visit",
vjust = -0.1
)
p_intervention <-
exampleData_int %>%
ggplot(
aes(
x = Intervention,
y = Mean_Disease_Index,
color = Treatment
)
) +
scale_color_brewer(palette="Dark2") +
annotate(
"rect",
xmin = -Inf, xmax = Inf,
ymin = 3.5, ymax=Inf,
fill="lightgrey", alpha=0.2) +
scale_y_continuous( # same y-scale for both subplots
breaks = seq(0, 10, 2),
minor_breaks = NULL,
expand = c(0, 0),
limits = c(0, 10.5)
) +
scale_x_discrete(
labels = c("before", "\nafter")
)+
facet_wrap_interactive(
~Treatment,
interactive_on = "both",
ncol = 8,
labeller = labeller_interactive( # facet_wrap_interactive requires labeller_interactive,
aes(data_id = Treatment) # and data_id most be defined as local aesthetic
)
) +
geom_point_interactive(
aes(data_id = Treatment),
size = 2
) +
stat_summary(
fun = median,
shape = 5,
size = 0.8,
color = "darkgrey"
) +
stat_summary(
fun = median,
group = 1,
geom = "path",
color = "darkgrey", alpha = 0.5,
lwd = 1.3
) +
theme_light() +
theme(
axis.title.y = element_blank(),
axis.text.y = element_blank(),
strip.background = element_blank(),
strip.text.x = element_text_interactive(
colour="black",
size = 10
),
legend.position = "none"
) +
labs(
title = "Change following Intervention",
vjust = -0.1
)
girafe (
ggobj =
p_visits +
plot_spacer() +
p_intervention +
plot_layout(widths = c(0.6, 0.01, 0.9)
),
options = list(
opts_hover(css = ''), # CSS code of selected line
opts_hover_inv(css = "opacity:0.05;"), # CSS code of all other lines
opts_sizing(rescale = FALSE)
)
)
For this version of the figure, I added two parameters that were
not supplied with the example data but which could improve the
visualisation:
The first was a hypothetical target range for the disease index. The aim of treatment/intervention in this scenario would have been to keep index scores below the values in the shaded areas of the plots. For the target shown here, the intervention achieved a meaningful benefit only in group “B”, whereas group “G” became significantly worse.
My second additional parameter were the intervals (in weeks) at which the visits were scheduled. This should give a better impression of the variability of the disease index over time, in particular if the visits were not evenly spaced.
Following a suggestion by Steve Mallet at the PSI webinar, the figure
now also has an interactive feature to help identifying corresponding
line and facet dot plots for individual treatment groups: Hovering the
mouse over a point or label highlights an individual treatment group in
both subplots. Interactivity was implemented with the packages ggiraph
and patchwork. The treatment labels in the line plot overlap, but using
“geom_text_repel_interactive” generated non-fading connections between
line and letter. More importantly, a bug in ggiraph meant that lines did
not respond properly on mouse-over 2. Therefore, I had to add
interactive dots to the line plot instead.
My difficulty with this task was understanding the data, and what messages the figure should convey. It was not clear whether the treatment and/or intervention had a meaningful effect, except for the worsening of disease in group “G”.
Was the “intervention” given between visits 5 and 6 the initiation of treatments A-H? Or were these treatments started before the first visit, followed by an additional intervention that was identical for all groups? If treatments were initiated between visits 5 and 6, and mean disease indices at prior visits represented the baselines for supposedly randomised groups, their large variability should be a concern. Alternatively, the study could have been undertaken to assess the effect of a common intervention in patients on different treatments. However, there were no suggestions of this in the supplied data and “bad” figure.
Redrawing the figure was further limited by having only basic summary
data as a source, i.e., mean scores without even margins of error. I
therefore compared the medians of those means before and after
“intervention” although this is probably an inappropriate method.
Far better would have been an analysis of covariance 3.
However, this would have required patient-level data.
1. PSI
VisSIG Webinar: Rapid Insights to Data
2. https://github.com/davidgohel/ggiraph/issues/299
3. Statistics notes:
Analysing controlled trials with baseline and follow up
measurements. Vickers AJ, Altman DG. BMJ. 2001 Nov
10;323(7321):1123-4.
… more bad chards