My scripts:
About myself: www.linkedin.com/in/ThomasWs-Mopfair
## - chunk libraries_and_data
## Load R libraries and the data file - #
library(dplyr) # version 1.1.4 # general R grammar
library(tidyr) # version 1.3.1 # pivot_longer, pivot_wider, nest
library(purrr) # version 1.0.2 # map
library(ggplot2) # version 3.4.0 # plots
HyKal_simData <-
read.csv("https://raw.githubusercontent.com/VIS-SIG/Wonderful-Wednesdays/refs/heads/master/data/2025/2025-06-11/serum_potassium_study_data.csv") %>%
# Avoid confusion with the function "group"
rename(., Treatment = Group)
The concentration of potassium ions (K+) in human serum is maintained
at 4.0–4.9 mEq/L under normal conditions.1
Serum or plasma potassium levels above the normal range, usually greater
than 5.0 mEq/L to 5.5 mEq/L, are defined as hyperkalemia. Mild
hyperkalemia is usually asymptomatic, but high potassium levels may
cause life-threatening cardiac arrhythmias, muscle weakness, or
paralysis. Symptoms usually develop at higher levels, 6.5 mEq/L to 7
mEq/L, but the rate of change is more important than the numerical
value. Chronic hyperkalemia may be asymptomatic, whereas acute
concentration shifts may cause severe symptoms, even at lower K+
levels.2
Major risk factors for hyperkalemia among chronic kidney disease
patients include lower estimated glomerular filtration rate (eGFR), use
of renin–angiotensin–aldosterone system inhibitors (RAASis), diabetes,
older age and male gender.3
A recently published trial compared sodium zirconium cyclosilicate
versus sodium polystyrene sulfonate for treatment of hyperkalemia in
hemodialysis patients.1
The primary results were presented as a single figure showing mean serum
potassium concentrations versus time in a separate panel for each
treatment group. Error bars probably represent the standard deviation
(see Methods/Statistical Analysis, no figure caption):
Mean serum K⁺ in patients receiving SPS and SZC vs. week of treatment
Published summary results: a) SZC (orange line), b) SPS(blue line)
Means +/- 1 standard deviation
The preceding PSI
“Wonderful Wednesday” challenge invited improvements for this
figure. In this follow-up, the task was to generate a plot that made use
of patient level information mentioned in the table of baseline
characteristics. Because the publication reported only summary data,
“Wonderful Wednesday” provided a simulated data-set. Plotting mean serum
K[+] versus time with these simulated data shows a general agreement
with the published figure. I used colors similar to the original and
included the standard deviations as lighter shaded bands for comparison:
# Normal serum K[+] as defined in reference 2
normokalemia <- c(4, 4.9)
HyKal_simData %>%
select(c(Treatment, Week, Serum_Potassium_mEq_L)) %>%
group_by(Treatment, Week) %>%
summarise(., n = n(), mean = mean(Serum_Potassium_mEq_L), sd = sd(Serum_Potassium_mEq_L)) %>%
mutate(
Treatment = factor(Treatment),
SDhigh = mean -sd,
SDlow = mean +sd,
CI = 1.96*sd/sqrt(n),
CIlow = mean - CI,
CIhigh = mean + CI) %>%
ggplot(
aes(
x = Week,
y = mean,
colour = Treatment,
fill = Treatment
)
) +
geom_ribbon(
inherit.aes = FALSE,
aes(
x = Week,
ymin = min(normokalemia),
ymax = max(normokalemia)),
fill = "lightgrey",
alpha = 0.5
) +
geom_line(
size = 1
) +
geom_ribbon(
aes(
ymin = SDlow,
ymax = SDhigh),
linetype = 0,
alpha = 0.3
) +
geom_ribbon(
aes(
ymin = CIlow,
ymax = CIhigh),
linetype = 0,
alpha = 0.4
) +
annotate(
"text",
label = "Normal range",
size = 3,
x = 0.3,
y = range(normokalemia) %>% sum()/2,
angle = 90
) +
geom_label(
inherit.aes = FALSE,
aes(
x = 7.5,
y = 6.3 - as.numeric(Treatment),
label = Treatment,
color = Treatment
),
fontface = "bold",
size = 3,
label.padding = unit(0.25, "lines"),
label.size = NA,
fill = "white"
) +
scale_color_manual(values = c("#27C8F5", "#F58E27") ) +
scale_fill_manual(values = c("#27C8F5", "#F58E27") ) +
scale_y_continuous(
breaks = c(3:6),
limits = c(3.5, 6.5),
minor_breaks = NULL,
) +
labs(
title = "Mean serum K[+] in patients receiving SPS and SZC \nvs. week of treatment",
subtitle = "Simulated summary results",
y = "Mean serum K[+] (mEq/L)",
caption = "Dark colored bands: 95% confidence intervals. \nLight colored bands: standard deviation."
) +
theme_light() +
theme(
plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5),
plot.caption = element_text(hjust = 0.5, size = 8),
legend.position = "none"
)
The published study showed that the average serum K[+] response in
patients on SZC was greater than in patients treated with SPS. My aim
for this challenge was to visualise patient subgroups in which the
efficacy of SPS and SZC differed most, based on the simulated data.
## - Chunk 1
# Required objects and functions: HyKal_simData (chunk libraries_and_data)
## Individual patient responses vs. week of treatment
# Normal serum K[+] as defined in reference 2
normokalemia <- c(4, 4.9)
# K_threshold is set based on results in chunk 3
K_threshold <- 5.2
# Number of patients in each treatment group
N_group <-
HyKal_simData %>%
group_by(Treatment) %>%
summarize(., N = unique(Patient_ID) %>% length )
HyKal_simData %>%
# Add a column showing whether baseline levels were below or above K_threshold
mutate(
K_base = case_when(
Serum_Potassium_mEq_L < K_threshold & Time_Point == "Baseline"
~ paste0(Treatment, ".K_low"),
Serum_Potassium_mEq_L >= K_threshold & Time_Point == "Baseline"
~ paste0(Treatment, ".K_high"),
.default = NA_character_
) %>% factor()
) %>%
group_by(Patient_ID) %>%
fill(K_base, .direction = "downup") %>%
# Add patient numbers in each treatment group, and a column showing whether serum K[+] levels were normal at any time during treatment
left_join(., N_group) %>%
mutate(
hyKal = ifelse(
Serum_Potassium_mEq_L <= max(normokalemia),
Week, NA),
hyKal = if (all(is.na(hyKal[-1]))) "Yes" else "No",
color = paste(K_base, hyKal, sep = "_") %>% factor(),
Treatment = paste0(Treatment, " (N = ", N, ")")
) %>%
ungroup() %>%
ggplot(
aes(
x = Week,
y = Serum_Potassium_mEq_L,
group = Patient_ID,
color = color,
size = K_base
)
) +
geom_ribbon(
inherit.aes = FALSE,
aes(
x = Week,
ymin = min(normokalemia),
ymax = max(normokalemia)),
fill = "lightgrey",
alpha = 0.7
) +
geom_line() +
annotate(
"text",
label = "Normal range",
size = 3,
x = 0.3,
y = range(normokalemia) %>% sum()/2,
angle = 90
) +
scale_color_manual(
values = c(
SPS.K_low_No = "#27C8F5", SPS.K_high_No = "#4795A8",
SPS.K_low_Yes = "#551182", SPS.K_high_Yes = "#CC2DB7",
SZC.K_high_No = "#F58E27", SZC.K_low_No = "#A34018"
# Levels that do not exist in this data set:
# SZC.K_low_Yes = "#551182", SZC.K_high_Yes = "#CC2DB7",
)
) +
scale_size_manual(values = c(0.7 ,1.4 , 0.7, 1.4) ) +
scale_y_continuous(
breaks = c(4:6, K_threshold),
minor_breaks = NULL,
) +
facet_wrap(.~Treatment) +
labs(
title = "Serum potassium levels vs. week of treatment in individual patients receiving SPS or SZC",
y = "Serum K[+] (mEq/L)",
caption = paste("Darker and thicker lines represent patients with a baseline serum K[+] below the", K_threshold, "mEq/L threshold (purple y-axis tick label). \nPurple lines indicate patients who never achieved normokalemia. See the following figures for an explanation.")
) +
theme_light() +
theme(
plot.title = element_text(hjust = 0.5),
axis.text.y = element_text(
face = c("plain", "plain", "plain", "bold"),
color = c("black", "black", "black", "#CC2DB7")),
strip.background = element_blank(),
strip.text.x = element_text(
color = "black",
size = 11
),
plot.caption = element_text(hjust = 0, size = 8),
legend.position = "none"
)
The raw data already show a more consistent decline of serum
K[+] levels, with lower inter- and intra-individual variability in SZC-
compared to SPS-treated patients. In the SPS group only, several patient
progressed directly between hyper- and hypokalemia at consecutive time
points, while eight patients never achieved normal levels throughout the
treatment period (purple lines).
At the previous PSI-webinar, it was noted that reaching normokalemia
took on average about five weeks in patients treated with SPS, compared
to 1.5 weeks in patients receiving SZC. Marking these time points was a
suggested improvement of the figure. Accordingly, I choose to estimate
the times at which normal serum K[+] were first achieved in individual
patients, and compare the means between treatment groups.
However, the first crossing point into the normal range is determined by
the patients’ treatment response as well as their baseline serum K[+]
levels. Moreover, baseline levels of serum K[+] might also be covariate
associated with the type and/or strength of the treatment response.
Therefore, I decided to compare treatment response as a separate
outcome, modelling it as a linear function of serum K[+] vs. time.
Change from baseline could have been another option, but with only a
single baseline value for each patient “rate of change” seemed a better
metric for the efficacy of each drug for lowering serum K[+]
concentrations.
K[+] measurements in individual patients fluctuated considerably over
time, especially in the SPS group. This means that estimates of the
response rate or the time of reaching normokalemia are uncertain, and
might even be inappropriate for comparing the two treatment groups. A
better and clinically more meaningful outcome might therefore be the
total time during the trial when patients achieved normokalemia.
## - quantBase_dotPlot
# Function for plotting outcomes vs. continuous baseline variables
# Setting argument "lambda" to N (number of patients in each treatment group) adds median regression lines, smaller values will result in median smoothing similar to the panels for individual groups in a STEPP plot
# Moving the facet strip text to the bottom is a workaround for generating individual x-axis titles for each facet
quantBase_dotPlot <- function(data, outcome, label_nudge, label_offset, lambda = 1) {
outcome <- sym(outcome)
# Reformat input data-frame
data_long <-
data %>%
rename(
"Age \n(years)" = Age,
"Baseline eGFR \n(ml/min/1.73m^2)" = Baseline_eGFR,
"Baseline serum K[+] \n(mEq/L)" = Serum_Potassium_mEq_L) %>%
pivot_longer(
cols = !c(Treatment, Gender, Diabetes, Hypertension, !!outcome),
names_to = "variables"
) %>%
mutate(., variables =
factor(variables,
levels = c(
"Baseline serum K[+] \n(mEq/L)",
"Baseline eGFR \n(ml/min/1.73m^2)",
"Age \n(years)"
)
)
)
# Labelled horizontal lines for SPS and SZC group medians
median_labels <-
left_join(
# Group median values
data_long %>%
group_by(variables, Treatment) %>%
summarise(median_outcome = median(!!outcome), .groups = "drop"),
# x-limits for label positioning
data_long %>%
group_by(variables) %>%
summarise(x_pos = max(value, na.rm = TRUE), .groups = "drop")
) %>%
# y-values for label positioning, with +/- nudging from median line
mutate(
y_pos = median_outcome +
((as.factor(median_outcome) %>% as.numeric)*label_nudge - label_offset),
Label = Treatment
)
# Show treatment group labels only in last facet
median_labels[1:(nrow(median_labels)-2), "Label"] <- ""
data_long %>%
ggplot(
aes(
x = value,
y = !!outcome,
color = Treatment
)
) +
geom_point() +
geom_hline(
data = median_labels,
aes(yintercept = median_outcome, color = Treatment),
linetype = "dotted",
linewidth = 0.8
) +
geom_quantile(
method = "rqss", quantiles = 0.5, lambda = lambda, size = 2, alpha = 0.3 ) +
geom_text(
data = median_labels,
aes(
x = x_pos, y = y_pos,
label = paste(Label),
color = Treatment
),
fontface = "bold",
hjust = 0.65,
size = 4
) +
scale_color_manual(values = c("#27C8F5", "#F58E27") ) +
facet_wrap(
.~variables,
scales = "free_x",
strip.position = "bottom",
ncol = 3
) +
theme_minimal() +
theme(
strip.placement = "outside",
strip.text = element_text(size = 11),
axis.title.x = element_blank(),
plot.title = element_text(hjust = 0.5),
plot.caption = element_text(hjust = 0, size = 8),
legend.position = "none",
)
}
## - geom_split_violin
# Custom ggproto object for generating split violin plots (source: https://stackoverflow.com/questions/35717353/split-violin-plot-with-ggplot2)
GeomSplitViolin <- ggplot2::ggproto(
"GeomSplitViolin",
ggplot2::GeomViolin,
draw_group = function(self,
data,
...,
# add the nudge here
nudge = 0,
draw_quantiles = NULL) {
data <- transform(data,
xminv = x - violinwidth * (x - xmin),
xmaxv = x + violinwidth * (xmax - x))
grp <- data[1, "group"]
newdata <- plyr::arrange(transform(data,
x = if (grp %% 2 == 1) xminv else xmaxv),
if (grp %% 2 == 1) y else -y)
newdata <- rbind(newdata[1, ],
newdata,
newdata[nrow(newdata), ],
newdata[1, ])
newdata[c(1, nrow(newdata)-1, nrow(newdata)), "x"] <- round(newdata[1, "x"])
# now nudge them apart
newdata$x <- ifelse(newdata$group %% 2 == 1,
newdata$x - nudge,
newdata$x + nudge)
if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))
quantiles <- ggplot2:::create_quantile_segment_frame(data,
draw_quantiles)
aesthetics <- data[rep(1, nrow(quantiles)),
setdiff(names(data), c("x", "y")),
drop = FALSE]
aesthetics$alpha <- rep(1, nrow(quantiles))
both <- cbind(quantiles, aesthetics)
quantile_grob <- ggplot2::GeomPath$draw_panel(both, ...)
ggplot2:::ggname("geom_split_violin",
grid::grobTree(ggplot2::GeomPolygon$draw_panel(newdata, ...),
quantile_grob))
}
else {
ggplot2:::ggname("geom_split_violin",
ggplot2::GeomPolygon$draw_panel(newdata, ...))
}
}
)
geom_split_violin <- function(mapping = NULL,
data = NULL,
stat = "ydensity",
position = "identity",
# nudge param here
nudge = 0,
...,
draw_quantiles = NULL,
trim = TRUE,
scale = "area",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
ggplot2::layer(data = data,
mapping = mapping,
stat = stat,
geom = GeomSplitViolin,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(trim = trim,
scale = scale,
# don't forget the nudge
nudge = nudge,
draw_quantiles = draw_quantiles,
na.rm = na.rm,
...))
}
## - geom_split_violin_labels
# Custom geom for split violin labels
# Default "colour = NULL" in geom_split_violin_labels inherits color aesthetics from geom_split_violin, aesthetics cannot be defined in this geom
splitViolinLabels <-
ggproto(
"splitViolinLabels", Stat,
required_aes = c("x", "y", "fill"
),
setup_params = function(data, params) {
params
},
# Compute panel for same y-coordinates
compute_panel = function(data, scales, nudge_amount = 0.4, x_offset = 0,
y_offset = 0, y_position_pct = 0.9, label_colour = NULL) {
# Calculate y position based on OVERALL data range (not per group)
y_max <- max(data$y)
y_min <- min(data$y)
y_pos <- y_min + (y_max - y_min) * y_position_pct + y_offset
# Process each group separately but use the same y_pos for all
result_list <- split(data, data$group)
results <- lapply(result_list, function(group_data) {
# Get the group information
group_val <- unique(group_data$fill)
x_val <- unique(group_data$x)
group_num <- unique(group_data$group)
# Calculate x position based on ggplot2's internal group number
x_pos <- if(group_num %% 2 == 1) {
x_val - nudge_amount/2 + x_offset # Odd groups (left)
} else {
x_val + nudge_amount/2 + x_offset # Even groups (right)
}
# Create result data-frame
result <- data.frame(
x = x_pos,
y = y_pos,
label = as.character(group_val),
fill = group_val,
group = group_num,
stringsAsFactors = FALSE
)
# Handle colour aesthetic
if ("colour" %in% names(group_data)) {
# Use mapped colour aesthetic
result$colour <- unique(group_data$colour)
} else if (!is.null(label_colour)) {
# Use parameter-specified colour
result$colour <- label_colour
}
# If neither, let default geom aesthetic handle it
return(result)
})
# Combine all results
do.call(rbind, results)
}
)
# Create custom Geom for the labels
GeomSplitViolinLabels <- ggproto("GeomSplitViolinLabels", GeomText,
default_aes = aes(colour = "black", size = 4, angle = 0, hjust = 0.5,
vjust = 0.5, alpha = 1, family = "", fontface = "bold",
lineheight = 1.2)
)
geom_split_violin_labels <-
function(
mapping = NULL, data = NULL,
nudge_amount = 0.4, x_offset = 0, y_offset = 0,
y_position_pct = 0.9, size = 4, fontface = "bold",
colour = NULL, color = NULL, ..., na.rm = FALSE,
show.legend = FALSE, inherit.aes = TRUE) {
# Handle color/colour argument
if (!is.null(color)) colour <- color
# Create the layer parameters
params <- list(
nudge_amount = nudge_amount,
x_offset = x_offset,
y_offset = y_offset,
y_position_pct = y_position_pct,
size = size,
fontface = fontface,
na.rm = na.rm,
label_colour = colour, # Pass colour to stat like original
...
)
# Don't remove colour from params if using default
if (!is.null(colour) && is.null(mapping)) {
# Only remove colour from params if we have aesthetic mapping
# Otherwise keep it for the geom
params$colour <- colour
params <- params[names(params) != "label_colour"]
}
layer(
data = data,
mapping = mapping,
stat = splitViolinLabels,
geom = GeomSplitViolinLabels,
position = "identity",
show.legend = show.legend,
inherit.aes = inherit.aes,
params = params
)
}
## - split_violin_plots_and_annotations
# Functions for generating faceted split violin plots comparing outcomes across treatments and patient subgroups
# Required objects and functions: geom_split_violin, geom_split_violin_labels
## Function for thresholding continuous variables and transforming the input dataframes to long format
# Input: outcome and comparison tables derived from Hykal_simData ("data", "data_p"), thresholds for categorising continuous baseline characteristic variables ("Age_threshold", "eGFR_threshold", "K_threshold")
reformat_data <- function(data, Age_threshold, eGFR_threshold, K_threshold) {
data %>%
mutate(
Age_level = ifelse(
Age < Age_threshold,
"Yes", "No"
),
eGFR_level = ifelse(
Baseline_eGFR < eGFR_threshold,
"Yes", "No"
),
K_level = ifelse(
Serum_Potassium_mEq_L < K_threshold,
"Yes", "No"
)
) %>%
pivot_longer(
cols = c(Age_level, eGFR_level, K_level, Gender, Diabetes, Hypertension),
names_to = "pChar",
values_to = "value"
) %>%
mutate(
value = factor(value, levels = c("Yes", "No", "Male", "Female") ),
pChar = case_when(
pChar == "Age_level" ~ paste("Age <", Age_threshold, "years"),
pChar == "eGFR_level" ~ paste("Baseline eGFR <", eGFR_threshold, "ml/min/1.73m^2"),
pChar == "K_level" ~ paste("Baseline K[+] <", K_threshold, "mEq/mL"),
.default = pChar
),
Treatment = factor(Treatment)
)
}
## Standalone functions for treatment comparisons (SPS vs SZC), for generating p-values for plot annotations
# Input: outcome table derived from Hykal_simData ("data"), name of outcome variable ("outcome")
treatment_comparisons <- function(data, outcome) {
data %>%
group_by(pChar, value) %>%
summarise(
p_value_groups = tryCatch({
# Use string column name to extract values
sps_values <- .data[[outcome]][Treatment == "SPS"]
szc_values <- .data[[outcome]][Treatment == "SZC"]
wilcox.test(sps_values, szc_values)$p.value
},
error = function(e) NA_real_
),
delta_group = tryCatch({
# Use string column name to extract values
sps_values <- .data[[outcome]][Treatment == "SPS"]
szc_values <- .data[[outcome]][Treatment == "SZC"]
median(sps_values) - median(szc_values)
},
error = function(e) NA_real_
),
n_SPS = sum(Treatment == "SPS"),
n_SZC = sum(Treatment == "SZC"),
.groups = "drop"
) %>%
group_by(pChar) %>%
mutate(
min.p_value = min(p_value_groups, na.rm = TRUE),
max_delta = abs(delta_group) %>% max(., na.rm = TRUE)) %>%
ungroup() %>%
# Clean and format
filter(!is.na(p_value_groups)) %>%
mutate(
# Order patient subgroups by differences of their medians or greatest significance
pChar = factor(
pChar,
levels = unique(
pChar[order(max_delta) %>% rev]
# pChar[order(min.p_value)]
)
)
)
}
# Standalone functions for level comparisons (factor levels within groups)
level_comparisons <- function(data, outcome, pChar_order = NULL) {
# Helper function for safe t-tests
perform_comparisons <- function(rates, values, outcome_col) {
val_numeric <- as.numeric(values)
safe_ttest <- function(group1_idx, group2_idx) {
rates1 <- rates[val_numeric == group1_idx]
rates2 <- rates[val_numeric == group2_idx]
if(length(rates1) > 0 & length(rates2) > 0) {
tryCatch(
wilcox.test(rates1, rates2)$p.value,
error = function(e) NA_real_
)
} else {
NA_real_
}
}
tibble(
comparison = c(
paste(levels(values)[1], "vs", levels(values)[2]),
paste(levels(values)[3], "vs", levels(values)[4])
),
p_value_levels = c(
safe_ttest(1, 2),
safe_ttest(3, 4)
),
n_group1 = c(
sum(val_numeric == 1),
sum(val_numeric == 3)
),
n_group2 = c(
sum(val_numeric == 2),
sum(val_numeric == 4)
)
)
}
result <- data %>%
group_by(pChar, Treatment) %>%
group_modify(~ perform_comparisons(.x[[outcome]], .x$value, outcome)) %>%
# Clean and format
filter(!is.na(p_value_levels))
# Apply pChar ordering if provided
if (!is.null(pChar_order)) {
result <- result %>%
mutate(pChar = factor(pChar, levels = pChar_order))
}
return(result)
}
## Combine the two comparison functions
all_comparisons <- function(data, outcome, K_threshold) {
# Run treatment comparison
treatment_results <- treatment_comparisons(
data,
outcome
)
# Run level comparisons (with same pChar ordering as treatment_results)
level_results <- level_comparisons(
data,
outcome,
pChar_order = levels(treatment_results$pChar)
)
# Join results and re-format data
left_join(
treatment_results,
level_results,
relationship = "many-to-many"
) %>%
mutate(
max_delta_value = ifelse(max_delta == abs(delta_group), 0, 1),
Treatment = factor(Treatment)
)
}
## Function for generating the annotated split-violin plots
# Input: outcome and comparison tables derived from Hykal_simData ("data", "data_p"), name of outcome variable ("outcome")
halfviolin_facetPlot <- function(data, outcome, data_p) {
outcome <- sym(outcome)
labels_yOffset <- min(data[[outcome]]) %>% abs()
data_p <- data_p %>%
mutate(., labels_yOffset = labels_yOffset)
data %>%
# If facets are to be arranged in order of greatest difference between the group medians or decreasing significance of the differences
# (see options for function reformat_data above)
mutate(
pChar = factor(pChar, levels = levels(data_p$pChar) )
)%>%
ggplot(
aes(
x = value,
y = !!outcome,
color = Treatment,
fill = Treatment,
label = Treatment
)
) +
geom_tile(
inherit.aes = FALSE,
data = data_p,
aes(
x = value,
y = 0,
height = max_delta_value*Inf,
width = 1
),
fill = "lightgrey",
alpha = 0.15
) +
geom_split_violin(
width = 0.9,
nudge = 0.01,
alpha = 0.3,
color = "grey") +
geom_point(
pch = 1,
size = 2,
position =
position_jitterdodge(
dodge.width = 0.5,
jitter.width = 0.35
),
stroke = 1
) +
geom_boxplot(
color = "black",
alpha = 0,
width = 0.3,
outlier.shape = NA) +
# p-value annotation for treatment comparisons
geom_text(
data = data_p,
inherit.aes = FALSE,
aes(
alpha = 1 - max_delta_value/1.5,
label =
ifelse (
p_value_groups > 0.05, "ns",
paste0(
"p == ",
signif(p_value_groups, digits = 2)
)
),
x = value,
y = -labels_yOffset - 1
),
parse = TRUE
) +
# Brackets and p-value annotation for level comparisons
geom_errorbarh(
inherit.aes = FALSE,
aes(
color = Treatment,
y = - as.numeric(Treatment) * 2 - labels_yOffset -1,
xmin = as.numeric(Treatment)/5.75 + 0.733,
xmax = as.numeric(Treatment)/5.75 + 1.733
)
) +
geom_text(
data = data_p,
inherit.aes = FALSE,
aes(
label =
ifelse (
p_value_levels > 0.05, "ns",
paste0(
"p == ",
signif(p_value_levels, digits = 2)
)
),
y = -as.numeric(Treatment) * 2 - labels_yOffset,
color = Treatment
),
x = 1.5,
parse = TRUE
) +
geom_split_violin_labels() +
# Prevent automatic re-scaling of alpha by ggplot2 to ensures alpha is controlled by max_delta_value in geom_text
scale_alpha_identity() +
scale_color_manual(values = c("#27C8F5", "#F58E27") ) +
scale_fill_manual(values = c("#27C8F5", "#F58E27") ) +
scale_x_discrete(expand = c(0,0)) +
facet_wrap(
.~pChar,
scales = "free_x") +
theme_light () +
theme(
strip.background = element_rect(fill = "#f2f3f4", color = "grey"),
strip.text = element_text(color = "black"),
text = element_text( size = 13),
plot.title = element_text(
size = 14,
hjust = 0.5,
vjust = 5
),
axis.text = element_text(size = 12),
plot.caption = element_text(
size = 8,
hjust = 0,
vjust = -5
),
plot.margin = unit(c(0.5,0.1,0.5,0), "cm"),
legend.position = "none"
)
}
# Required objects and functions: HyKal_simData_rate (chunk 2), HyKal_simData_normalFirst (chunk 3), HyKal_simData_normalPeriod (chunk 4), reformat_data (split_violin_plots_and_annotations)
optimize_thresholds <- function(data,
Age_threshold,
eGFR_threshold,
K_threshold,
analysis_type = "Rate",
plot_results = TRUE) {
# Function to optimize a single parameter
optimize_single <- function(param_name, param_values, fixed_age, fixed_eGFR, fixed_K) {
# Define the pattern to filter by based on optimization parameter
filter_pattern <- switch(param_name,
"Age" = "^Age",
"eGFR" = "^Baseline eGFR",
"K" = "^Baseline K")
# Perform optimisation for this parameter
results <- lapply(param_values, function(x) {
# Set the threshold values based on which parameter we're optimizing
current_Age <- ifelse(param_name == "Age", x, fixed_age)
current_eGFR <- ifelse(param_name == "eGFR", x, fixed_eGFR)
current_K <- ifelse(param_name == "K", x, fixed_K)
# Run the analysis pipeline
reformat_data(data, current_Age, current_eGFR, current_K) %>%
treatment_comparisons(., analysis_type) %>%
filter(grepl(filter_pattern, pChar)) %>%
select(max_delta) %>%
unique()
})
# Combine results
results_vector <- as.numeric(list_cbind(results))
return(list(
parameter = param_name,
threshold_values = param_values,
max_delta_values = results_vector
))
}
# Use first values as fixed values for optimization
fixed_age <- Age_threshold[1]
fixed_eGFR <- eGFR_threshold[1]
fixed_K <- K_threshold[1]
# Initialize results list
optimization_results <- list()
# Optimize Age threshold (if vector provided)
if (length(Age_threshold) > 1) {
optimization_results$Age <- optimize_single("Age", Age_threshold, fixed_age, fixed_eGFR, fixed_K)
if (plot_results) {
plot(optimization_results$Age$threshold_values,
optimization_results$Age$max_delta_values,
xlab = "Age threshold",
ylab = "max_delta",
main = paste("Age Threshold Optimization -", analysis_type),
type = "b")
}
}
# Optimize eGFR threshold (if vector provided)
if (length(eGFR_threshold) > 1) {
optimization_results$eGFR <- optimize_single("eGFR", eGFR_threshold, fixed_age, fixed_eGFR, fixed_K)
if (plot_results) {
plot(optimization_results$eGFR$threshold_values,
optimization_results$eGFR$max_delta_values,
xlab = "eGFR threshold",
ylab = "max_delta",
main = paste("eGFR Threshold Optimization -", analysis_type),
type = "b")
}
}
# Optimize K threshold (if vector provided)
if (length(K_threshold) > 1) {
optimization_results$K <- optimize_single("K", K_threshold, fixed_age, fixed_eGFR, fixed_K)
if (plot_results) {
plot(optimization_results$K$threshold_values,
optimization_results$K$max_delta_values,
xlab = "K threshold",
ylab = "max_delta",
main = paste("K Threshold Optimization -", analysis_type),
type = "b")
}
}
return(optimization_results)
}
# Only run after HyKal_simData_rate has been created (chunk 2)
optimize_thresholds(
data = HyKal_simData_rate,
Age_threshold = seq(50, 80, 1),
eGFR_threshold = seq(0, 60, 1),
K_threshold = seq(4, 7, 0.1),
analysis_type = "Rate"
)
# Only run after HyKal_simData_normalFirst has been created (chunk 3)
optimize_thresholds(
data = HyKal_simData_normalFirst,
Age_threshold = seq(50, 80, 1),
eGFR_threshold = seq(0, 60, 1),
K_threshold = seq(4, 7, 0.1),
analysis_type = "First_Week"
)
# Only run after HyKal_simData_normalPeriod has been created (chunk 4)
optimize_thresholds(
data = HyKal_simData_normalPeriod,
Age_threshold = seq(50, 80, 1),
eGFR_threshold = seq(0, 60, 1),
K_threshold = seq(4, 7, 0.1),
analysis_type = "Period"
)
## - Chunk 2
# Required objects and functions: HyKal_simData (chunk libraries_and_data), quantBase_dotPlot
## Rate of serum K[+] change vs. continuous baseline characteristic variables
# Calculate treatments response rates (change in serum K[+] per week, fitted as a linear model)
HyKal_simData_rate <-
HyKal_simData %>%
group_by(Patient_ID) %>%
summarise(
Rate = coef(lm(Week ~ Serum_Potassium_mEq_L))[2],
.groups = "drop"
) %>%
right_join(HyKal_simData, by = "Patient_ID") %>%
filter(Time_Point == "Baseline") %>%
select(
Treatment,
Age, Gender,
Baseline_eGFR, Serum_Potassium_mEq_L,
Diabetes, Hypertension,
Rate
)
# Plot rate of K[+] level change vs. continuous baseline variables
quantBase_dotPlot(HyKal_simData_rate, outcome = "Rate", label_nudge = 6, label_offset = 9, lambda = 20) +
labs(
title = "Rate of serum K[+] decline vs. continuous baseline characteristic variables",
x ="",
y = bquote("Serum K[+] change (" ~ Delta ~ "mEq /L /Week)"),
caption = "Thick solid lines: smoothed medians of treatment response rates. Thin dotted lines: overall median."
)
For age and baseline eGFR, smoothed medians of the treatment response
rate (thick solid lines) diverged minimally from the overall median
(thin dotted lines), showing no evidence of an association. Baseline
serum K[+] levels appeared to be negatively associated with response
rate variance. Weak trends existed for declining response rates with
increasing K[+] levels in SPS-treated patients, and increasing rates in
patients treated with SZC.
## - Chunk 3
# Required objects and functions: HyKal_simData (chunk libraries_and_data), normokalemia (chunk 1), quantBase_dotPlot
## Time of first normal serum K[+] vs. continuous baseline characteristic variables
# Interpolate time points at which serum K[+] levels move in and out of the normal range
HyKal_simData_normal <-
HyKal_simData %>%
select(Patient_ID, Week, Serum_Potassium_mEq_L) %>%
nest(response = !Patient_ID) %>%
mutate(response_diff = map(response, ~ {
x <- .
Week1 <- x$Week[-length(x$Week)]
K1 <- x$Serum_Potassium_mEq_L[-length(x$Serum_Potassium_mEq_L)]
Delta_Week <- diff(x$Week)
Delta_K <- diff(x$Serum_Potassium_mEq_L)
m <- Delta_K / Delta_Week
# Estimate crossing points in and out of normal K[+] range
hyper_normokal <- (max(normokalemia) - K1) / m
hyper_normokal <- ifelse(hyper_normokal < 0 | hyper_normokal > Delta_Week, NA, hyper_normokal + Week1)
hypo_normokal <- (min(normokalemia) - K1) / m
hypo_normokal <- ifelse(hypo_normokal < 0 | hypo_normokal > Delta_Week, NA, hypo_normokal + Week1)
tibble(
Week1 = Week1,
K1 = K1,
Delta_Week = Delta_Week,
Delta_K = Delta_K,
m = m,
hyper_normokal = hyper_normokal,
hypo_normokal = hypo_normokal
)
})) %>%
select(Patient_ID, response_diff) %>%
unnest(cols = c(response_diff))
# Select first crossing point from elevated to normal serum K[+] levels (all patients start as hyperkalemic)
HyKal_simData_normalFirst <-
HyKal_simData_normal %>%
# Remove duplicate when a crossing point coincides with a measured time-point
filter(hyper_normokal > Week1 | hypo_normokal > Week1) %>%
select("Patient_ID", "hyper_normokal") %>%
group_by(Patient_ID) %>%
slice_min(., hyper_normokal) %>%
ungroup() %>%
na.omit %>%
left_join(
.,
HyKal_simData %>% filter(Time_Point == "Baseline") %>%
select(!c(Week, Time_Point) )
) %>%
rename("First_Week" = "hyper_normokal") %>%
select(!Patient_ID)
# Plot first week within normal range vs. continuous baseline characteristic variables
HyKal_simData_normalFirst %>%
quantBase_dotPlot(., outcome = "First_Week", label_nudge = 0.1, label_offset = -0.9, lambda = 20) +
labs(
title = "Time of first normal serum K[+] vs. continuous baseline characteristic variables",
x ="",
y = bquote("Time of first normal serum K[+] (Week)"),
caption = "Thick solid lines: smoothed medians of treatment response rates. Thin dotted lines: overall median."
)
Time of first normal serum K[+] does not appear to correlate with
age, baseline eGFR, or baseline serum K[+] levels.
## - Chunk 4
# Required objects and functions: HyKal_simData (chunk libraries_and_data), HyKal_simData_normal (chunk 3), quantBase_dotPlot
## Length of time normal serum K[+] levels were observed vs. continuous baseline characteristic variables
# Calculate time intervals in the normal range and their sums
HyKal_simData_normalPeriod <-
HyKal_simData_normal %>%
select(Patient_ID, hyper_normokal, hypo_normokal) %>%
nest(normal_periods = !Patient_ID) %>%
mutate(
Period = map(
normal_periods, ~ {
x <- .
unlist(x) %>%
# Times of all crossing points, in and out of normokalemia, plus time of last observation (right censored)
{c(.[!is.na(.)], max(HyKal_simData$Week) )} %>%
# Calculate time intervals between consecutive crossing points
sort() %>% diff() %>%
# Keep only the intervals when K[+] is in the normal range (first crossing is into the normal range because all patients start hyperkalemic)
.[seq(1,max(HyKal_simData_normal$Week1),2)] %>%
sum(na.rm = TRUE)
}
)
) %>%
select(Patient_ID, Period) %>%
unnest(cols = c(Period) ) %>%
left_join(
.,
HyKal_simData %>% filter(Time_Point == "Baseline")
) %>%
select(., !c(Patient_ID, Time_Point, Week))
# Plot first week within normal range vs. continuous baseline variables
HyKal_simData_normalPeriod %>%
quantBase_dotPlot(., outcome = "Period", label_nudge = 1.7, label_offset = 2.5, lambda = 20) +
labs(
title = "Length of time normal serum K[+] levels were observed vs. continuous baseline characteristic variables",
x ="",
y = bquote("Total time at normal serum K[+] levels (Weeks)"),
caption = "Thick solid lines: smoothed medians of treatment response rates. Thin dotted lines: overall median."
)
The total length of time normal serum K[+] levels were observed
correlates with age, baseline eGFR, and baseline serum K[+] levels in
SZC-treated patients.
Continuous baseline characteristic variables were categorised
with thresholds that resulted in the greatest differences between median
outcomes in SPS and SZC-treated patients. I decided to show the
distribution of outcome values for all patient subgroups. The R script
dynamically shades the comparison with the smaller difference for each
baseline category value pair, and arranges the baseline variables from
left to right in order of decreasing maximum difference.
## - Chunk 5
# Required objects and functions: HyKal_simData_rate (chunk 2), reformat_data, all_comparisons, halfviolin_facetPlot
## Plot distribution of treatment response rates and their differences between treatment groups
# Thresholds that resulted in the greatest or most significant differences between serum K[+] response rates in the two treatment groups
Age_threshold <- 61 # greatest significance: 60
eGFR_threshold <- 28 # greatest significance: 41
K_threshold <- 5.3 # greatest significance: 5.4
# Generate a data-frame with test statistics for annotating the plot and sorting comparisons in stratified groups by significance
HyKal_simData_rate_p <-
HyKal_simData_rate %>%
reformat_data(. , Age_threshold, eGFR_threshold, K_threshold) %>%
all_comparisons(., "Rate")
# Plot first week within normal range vs. categorical and categorised baseline variables
HyKal_simData_rate %>%
reformat_data(. , Age_threshold, eGFR_threshold, K_threshold) %>%
halfviolin_facetPlot(., "Rate", HyKal_simData_rate_p) +
labs(
title = paste(
"Rate of serum K[+] change in patients treated with SPS or SZC, \nstratified by categorical or categorised baseline patient characteristics"),
x = NULL,
y = bquote("Rate of serum K[+] " ~ "change (" ~ Delta ~ "mEq /L /Week)"),
caption = "Facets with baseline characteristics are arranged in order of decreasing maximum difference between the median responses in SPS and SZC groups (shaded background: \nbaseline characteristic value associated with the smaller difference ). \nP-values: two-sided Wilcox test, no adjustment for multiple comparisons."
)
## - Chunk 6
# Required objects and functions: HyKal_simData_normalFirst (chunk 4), reformat_data, all_comparisons, halfviolin_facetPlot
## Plot distribution of times to first normal serum K[+] and their differences between treatment groups
# Thresholds that resulted in the greatest or most significant differences between the times of first normokalemia in the two treatment groups
Age_threshold <- 65 # greatest significance: 60
eGFR_threshold <- 23.5 # greatest significance: 41
K_threshold <- 5.2 # greatest significance: 5.2
# Generate a data-frame with test statistics for annotating the plot and sorting comparisons in stratified groups by significance
HyKal_simData_normalFirst_p <-
HyKal_simData_normalFirst %>%
reformat_data(. , Age_threshold, eGFR_threshold, K_threshold) %>%
all_comparisons(., "First_Week")
# Plot time to first observation of normal serum K[+] vs. categorical and categorised baseline variables
HyKal_simData_normalFirst %>%
reformat_data(. , Age_threshold, eGFR_threshold, K_threshold) %>%
halfviolin_facetPlot(., "First_Week", HyKal_simData_normalFirst_p) +
# Do not show negative scale for First_Week
scale_y_continuous(breaks = seq(0, 8, 2)) +
labs(
title = paste(
"Time of first normal serum K[+] in patients treated with SPS or SZC \nand stratified by categorical or categorised baseline patient characteristics"),
x = NULL,
caption = "Facets with baseline characteristics are arranged in order of decreasing maximum difference between the median responses in SPS and SZC groups (shaded background: \nbaseline characteristic value associated with the smaller difference ). \nP-values: two-sided Wilcox test, no adjustment for multiple comparisons."
)
## - Chunk 7
# Required objects and functions: HyKal_simData_normalPeriod (chunk 6), reformat_data, all_comparisons, halfviolin_facetPlot
## Plot distribution of total lengths of time with normal serum K[+], and their differences between treatment groups
# Thresholds that resulted in the greatest or most significant differences between total times of normokalemia in the two treatment groups
Age_threshold <- 69 # greatest significance: 60
eGFR_threshold <- 28 # greatest significance: 41
K_threshold <- 5.2 # greatest significance: 5.2
# Generate a data-frame with test statistics for annotating the plot and sorting comparisons in stratified groups by significance
HyKal_simData_normalPeriod_p <-
HyKal_simData_normalPeriod %>%
reformat_data(. , Age_threshold, eGFR_threshold, K_threshold) %>%
all_comparisons(., "Period")
# Plot total lengths of time when serum K[+] was within the normal range vs. categorical and categorised baseline variables
HyKal_simData_normalPeriod %>%
reformat_data(. , Age_threshold, eGFR_threshold, K_threshold) %>%
halfviolin_facetPlot(., "Period", HyKal_simData_normalPeriod_p) +
# Do not show negative scale for First_Week
scale_y_continuous(breaks = seq(0, 8, 2)) +
labs(
title = paste(
"Total time serum K[+] levels were within the normal range in patients treated with SPS or SZC \nand stratified by categorical or categorise baseline patient characteristics"),
x = NULL,
y = "Total lenght of time (Weeks)",
caption = "Facets with baseline characteristics are arranged in order of decreasing maximum difference between the median responses in SPS and SZC groups (shaded background: \nbaseline characteristic value associated with the smaller difference ). \nP-values: two-sided Wilcox test, no adjustment for multiple comparisons."
)
The largest difference between the two treatments existed in patients
with low baseline serum K[+], for all three outcomes. However, the
significance of these difference was low (insignificant after correction
for multible comparisons) because the patient numbers in this subgroup
are small.
The next-largest more significant differences were in diabetics (rate),
patients aged 65 years or older (time to first normal K[+]), and
baseline eGFR (total time of normokalemia). Total length of time with
normokalemia was the outcome that showed the most significant
differences between SPS and SZC across patient subgroups.
[Note: Because I “optimised” the threshold for categorising the
continuous variables, Wilcox test p-values are not really appropriate
here, and just serve as illustration for how a visualisation of the data
might work.]
First week of normal serum K[+] levels differed from the other
two analyses by not showing patients in whom normokalemia was never
observed at any time. Only eight of these non-responders existed, but
for completeness compared their baseline characteristics with those of
the responders in both treatment groups.
# Required objects and functions: R libraries, HyKal_simData (chunk libraries_and_data)
## Association of baseline variables with achieving normokalemia in at least one week of treatment
# Identify patients whose serum K[+] was outside the normal range throughout treatment
HyKal_simData_normoKal <-
HyKal_simData %>%
# Add a column showing whether baseline levels were below or above K_threshold
mutate(
K_base = case_when(
Serum_Potassium_mEq_L < K_threshold & Time_Point == "Baseline"
~ paste0(Treatment, ".K_low"),
Serum_Potassium_mEq_L > K_threshold & Time_Point == "Baseline"
~ paste0(Treatment, ".K_high"),
.default = NA_character_
) %>% factor()
) %>%
group_by(Patient_ID) %>%
fill(K_base, .direction = "downup") %>%
# Add a column showing whether serum K[+] levels were normal at any time during treatment
mutate(
normoKal = ifelse(
Serum_Potassium_mEq_L < max(normokalemia),
Week, NA),
normoKal = if (all(is.na(normoKal[-1]))) "Never" else "At least once"
) %>%
ungroup()
# Plot distribution of continuous baseline variable values for patients who achieved normokalemia and those who did not
HyKal_simData_normoKal %>%
filter(Time_Point == "Baseline") %>%
mutate(
normoKal = paste0(normoKal, "\n(", Treatment, ")"),
normoKal = factor(normoKal, levels = rev(unique(normoKal)))
) %>%
select(Treatment, Age, Baseline_eGFR, Serum_Potassium_mEq_L, normoKal) %>%
rename(
"Age \n(years)" = Age,
"Baseline eGFR \n(ml/min/1.73m^2)" = Baseline_eGFR,
"Baseline serum K[+] \n(mEq/L)" = Serum_Potassium_mEq_L) %>%
pivot_longer(
cols = !c(Treatment, normoKal),
names_to = "pChar"
) %>%
ggplot(
aes(
x = normoKal,
y = value,
fill = Treatment,
color = Treatment
)
) +
geom_violin(
width = 0.5,
alpha = 0.4,
color = "grey") +
geom_point(
pch = 1,
size = 2,
position =
position_jitterdodge(
dodge.width = 0.6,
jitter.width = 0.2
),
stroke = 1,
alpha = 0.7
) +
geom_boxplot(
color = "black",
alpha = 0,
width = 0.3,
outlier.shape = NA) +
scale_color_manual(values = c("#27C8F5", "#F58E27") ) +
scale_fill_manual(values = c("#27C8F5", "#F58E27") ) +
facet_grid(
pChar ~ Treatment,
switch = "y",
scales = "free", space = "free_x"
) +
labs(
x = "Number of times normal serum K[+] levels were observed",
y = "",
title = "Association of continuous baseline characteristic variables \nwith achievement of normokalemia at any timepoint during treatment",
caption = "All patients treated with SZC had normal serum K[+] levels in at least one week of treatment. \nHigh eGFR and low K[+] at baseline could be risk factors for not achieving normokalemia at any time in patients treated with SPS."
) +
theme_light () +
theme(
strip.background = element_blank(),
strip.text.x = element_text(
color = "black",
vjust = 3,
size = 12,
),
strip.text.y = element_text(
color = "black",
size = 12
),
strip.placement = "outside",
plot.title = element_text(
size = 14,
hjust = 0.5,
vjust = 5
),
axis.text = element_text(size = 11),
plot.caption = element_text(
size = 8,
hjust = 0,
vjust = -5
),
plot.margin = unit(c(0.5,0,0.5,0), "cm"),
legend.position = "none"
)
# Required objects and functions: - R libraries (chunk libraries_and_data)
# - HyKal_simData_normoKal (chunk 8)
## Plot fraction of categorical baseline characteristics among who did and did not achieve normokalemia
HyKal_simData_normoKal %>%
filter(Time_Point == "Baseline") %>%
mutate(normoKal = paste0(normoKal, "\n(", Treatment, ")")) %>%
select(Treatment, Gender, Diabetes, Hypertension, normoKal) %>%
pivot_longer(
cols = !c(Treatment, normoKal),
names_to = "pChar",
values_to = "pChar_value"
) %>%
group_by(pChar, normoKal) %>%
summarise(
"Yes" = mean(pChar_value == "Yes") * 100,
"No" = mean(pChar_value == "No") * 100,
"Male" = mean(pChar_value == "Male") * 100,
"Female" = mean(pChar_value == "Female") * 100,
n = n()
) %>% ungroup() %>%
pivot_longer(
cols = c(Yes, No, Male, Female),
names_to = "pChar_value",
values_to = "pChar_value_perc"
) %>%
mutate(
Treatment = ifelse(grepl("SZC", normoKal), "SZC", "SPS"),
Treatment_value = paste0(Treatment, "_", pChar_value),
normoKal = paste0(normoKal, "\nn = ", n),
normoKal = factor(normoKal, levels = rev(unique(normoKal)))
) %>%
filter(!pChar_value_perc == 0) %>%
ggplot(
aes(
x = normoKal,
y = pChar_value_perc,
fill = Treatment_value,
group = Treatment_value
)
) +
geom_col(
width = 0.5
) +
geom_text(
aes(label = pChar_value),
color = "white",
fontface = "bold",
size = 3.5,
position = position_stack(vjust = 0.5)
) +
scale_fill_manual(
values = c(
"#27C8F5", "#4795A8", "#27C8F5", "#4795A8",
"#F58E27", "#A34018", "#F58E27", "#A34018"
)
) +
facet_grid(
pChar ~ Treatment,
switch = "y",
scales = "free_x", space = "free_x"
) +
labs(
x = "Number of times normal serum K[+] levels were observed",
y = "Fraction (%)",
title = "Association of categorical baseline characteristic variables \nwith achievement of normokalemia at any timepoint during treatment"
) +
theme_light () +
theme(
strip.background = element_blank(),
strip.text.x = element_text(
color = "black",
vjust = 3,
size = 12,
),
strip.text.y = element_text(
color = "black",
size = 12
),
strip.placement = "outside",
text = element_text( size = 13),
plot.title = element_text(
size = 14,
hjust = 0.5,
vjust = 5
),
axis.text = element_text(size = 11),
plot.caption = element_text(
hjust = 0,
vjust = -5
),
plot.margin = unit(c(0.5,0,0.5,0), "cm"),
legend.position = "none"
)
I am grateful to the panelists of the PSI VIS webinar for suggestions
for improving the figures. One that I did not implement is adding a
Subpopulation Treatment Effect Pattern Plot (STEPP). STEPP plots were
originally developed for the analysis of time-to-event (i.e., survival),
or generic data where the outcome has an exponential distribution
5. They commonly consist of three panels: the first is
similar to the smoothed median lines in the quantitative baseline
characteristic variable plots of section 4.1. of this document. The
second and third panel show difference and ratio, respectively, of the
smoothed medians of two groups with confidence intervals. Visual
inspection may help to identify covariate value ranges for which the
treatment effect might differ. However, STEPP is intended as an
exploratory tool rather than for determining specific cut-points. An R
package “stepp” is available at CRAN 6.
Chronic Hyperkaliemia in Chronic Kidney Disease: An Old Concern with New Answers. Borrelli S, Matarazzo I, Lembo E, Peccarino L, Annoiato C, Scognamiglio MR, Foderini A, Ruotolo C, Franculli A, Capozzi F, Yavorskiy P, Merheb F, Provenzano M, La Manna G, De Nicola L, Minutolo R, Garofalo C. Int J Mol Sci. 2022 Jun 7;23(12):6378 https://doi.org/10.3390/ijms23126378
Stephen L Seliger, Hyperkalemia in patients with chronic renal failure, Nephrology Dialysis Transplantation, Volume 34, Issue Supplement_3, December 2019, Pages iii12–iii18, https://doi.org/10.1093/ndt/gfz231
Sodium zirconium cyclosilicate versus sodium polystyrene sulfonate for treatment of hyperkalemia in hemodialysis patients: a randomized clinical trial. Elsayed MM, Abdelrahman MA, Sorour AM, Rizk IG, Hassab MAA. BMC Nephrol. 2025 May 6;26(1):227 https://doi.org/10.1186/s12882-025-04129-9
Sergio Venturini, Marco Bonetti, Ann A. Lazar, Bernard F. Cole, Xin Victoria Wang, Richard D. Gelber, Wai-Ki Yip, Subpopulation Treatment Effect Pattern Plot (STEPP) Methods with R and Stata, J. data sci. 21(2022), no. 1, 106-126 https://doi.org/10.6339/22-JDS1060