Implementing the method proposed by Xing et al.
Source: Xing Xing, Aiwen Xing, Kannan Natarajan, Haitao Chu, Lifeng Lin, Jiayi Tong, An alternative method for assessing the fragility of survival analysis results: a proof-of-concept study based on the log-rank test, American Journal of Epidemiology, Volume 195, Issue 4, April 2026, Pages 1175–1181, https://doi.org/10.1093/aje/kwaf229
Case Study 1 (Demonstrating Modifying One Group)
Dataset: AML Maintenance study
Does extending the standard course of chemotherapy with additional maintenance cycles improve survival of patients with acute myelogenous leukemia (AML)?
The treatment group (“Maintained”) had 11 participants, and the control group (“Nonmaintained”) had 12.
Source: page 43: Rupert G. Miller (1997), Survival Analysis. John Wiley & Sons. ISBN: 0-471-25218-2
Step 1: Conduct Log Rank Test on Original Data
library(survival)
original_p <- survdiff(Surv(time, status) ~ x, data = aml)$pvalue
cat("Original p-value =", round(original_p, 4), "\n")
## Original p-value = 0.0653
Since p > 0.05, the result is not statistically significant. To assess its fragility, we count how many outcome changes it takes to reach significance. Because the maintained group is the smaller one, that is the group we modify. The maintained group already has higher survival than the control group, so we change Event statuses to Censored in that group, which raises its estimated survival and moves its curve farther from the control curve.
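One way to check which group has the survival advantage before choosing the direction of the status changes is to compare the observed and expected event counts from the log-rank test. The sketch below assumes the `aml` data shipped with the `survival` package; `direction_check` is only an illustrative name.

```r
library(survival)

# Log-rank test on the original AML data
fit <- survdiff(Surv(time, status) ~ x, data = aml)

# Observed vs. expected events per group: fewer events than expected
# means the group did better than the pooled data would predict
direction_check <- data.frame(
  group    = names(fit$n),
  n        = as.vector(fit$n),
  observed = fit$obs,
  expected = round(fit$exp, 2)
)
print(direction_check)
```

If the maintained group shows fewer observed than expected events, censoring more of its events pushes the two curves farther apart, which is the direction needed to reach significance.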
Steps 2 and 3: Make Change and Recalculate P-value
#Make new data set to modify (mod) and order it by time to find earliest time
mod <- aml[order(aml$time), ]
#Start counter for FIS value (#iters)
FIS <- 0
#Go through sorted data until you find "Maintained" (trt group) and status == 1 (Event)
#and change to 0 (Censored), then recalculate p
for (i in 1:nrow(mod)) {
  if (mod$x[i] == "Maintained" && mod$status[i] == 1) {
    mod$status[i] <- 0
    pval <- survdiff(Surv(time, status) ~ x, data = mod)$pvalue
    FIS <- FIS + 1 #increase count by 1
    if (pval < 0.05) break #stop when significance flips
  }
}
cat("FIS = ", FIS, "\n")
## FIS = 1
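To see which subject the loop above changed, the search for the first eligible subject can be repeated on its own. This is a small sketch, not part of the original analysis; `sorted`, `first_hit`, and `flipped` are illustrative names.

```r
library(survival)

# Sort by time, as in the loop above
sorted <- aml[order(aml$time), ]

# Row position of the earliest Maintained subject with an event
first_hit <- which(sorted$x == "Maintained" & sorted$status == 1)[1]
print(sorted[first_hit, ])

# Flip that one status from Event (1) to Censored (0) and rerun the test
flipped <- sorted
flipped$status[first_hit] <- 0
p_before <- survdiff(Surv(time, status) ~ x, data = sorted)$pvalue
p_after  <- survdiff(Surv(time, status) ~ x, data = flipped)$pvalue
print(c(before = p_before, after = p_after))
```

Comparing the two p-values shows how much a single status change in the smaller group moves the log-rank test.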
Plot
library(dplyr)
library(survminer)
library(ggplot2)
#Rename the modified group to Maintained_modified to emphasize it in the graph
mod$x <- ifelse(mod$x == "Maintained", "Maintained_modified", as.character(mod$x))
#Combine original data and modified data (Nonmaintained group didn't change, so only the original is needed)
plot_df <- bind_rows(
  aml[aml$x == "Nonmaintained", ],
  aml[aml$x == "Maintained", ],
  mod[mod$x == "Maintained_modified", ]
)
plot_df$x <- factor(plot_df$x,
  levels = c("Nonmaintained", "Maintained", "Maintained_modified"),
  labels = c("No Maintenance", "Maintenance", "Maintenance Modified"))
surv_obj <- survfit(Surv(time, status) ~ x, data = plot_df)
p <- ggsurvplot(
  surv_obj,
  data = plot_df,
  palette = c("skyblue", "red", "red"),
  linetype = c("solid", "solid", "twodash"),
  censor.shape = NA,
  xlab = "Time (weeks)",
  legend.title = "",
  legend.labs = c("Control", "Maintenance", "Maintenance Modified"),
  ggtheme = theme_minimal(base_size = 14),
  legend = c(0.87, 0.9)
)
p$plot +
  annotate("text", x = 79, y = 0.75, label = paste0("FIS = ", FIS), fontface = "bold") +
  annotate("text", x = 74, y = 0.65,
    label = paste0("Maintenance vs. Control: P = ", round(original_p, 4),
                   "\nModified Maintenance vs. Control: P = ", round(pval, 4)),
    hjust = 0, size = 4, fontface = "bold")
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_point()`).
With an FIS of 1, this result is very fragile: changing a single subject's status is enough to make it significant. This is not surprising, since the sample was small and the original p-value (0.0653) was already close to the 0.05 threshold.
Case Study 2 (Demonstrating FIS With Changing Both Groups)
Dataset: Data from a trial of ursodeoxycholic acid (UDCA) in patients with primary biliary cirrhosis (PBC).
Does UDCA improve survival in PBC patients?
The treatment group had 84 participants, and the placebo group had 86.
Source: T. M. Therneau and P. M. Grambsch, Modeling survival data: extending the Cox model. Springer, 2000.
Step 1: Conduct Log Rank Test on Original Data
original_p <- survdiff(Surv(futime, status) ~ trt, data = udca1)$pvalue
cat("Original p-value =", round(original_p, 4), "\n")
## Original p-value = 3e-04
Since the original result is statistically significant, to assess its fragility we count how many changes it takes to make it nonsignificant (p >= 0.05). Because the UDCA group had higher survival, we change statuses from Censored to Event in the UDCA group to lower its survival, and from Event to Censored in the placebo group to raise its survival. Both changes bring the survival curves closer together.
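Before making any changes, it can help to count how many subjects in each group are eligible for a status change under this rule, since that bounds how many changes are possible. The sketch below assumes the coding used in the code of this case study (`trt`: 0 = placebo, 1 = UDCA; `status`: 0 = censored, 1 = event); the variable names are illustrative.

```r
library(survival)

# Cross-tabulate treatment arm by status
counts <- table(trt = udca1$trt, status = udca1$status)
print(counts)

# Subjects eligible for a status change under the rule described above
eligible_placebo <- counts["0", "1"]  # placebo events that could become censored
eligible_udca    <- counts["1", "0"]  # UDCA censored subjects that could become events
print(c(placebo = eligible_placebo, udca = eligible_udca))
```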
Steps 2 and 3: Make Change and Recalculate P-value
#Make new data set to modify (mod) and order it by time to find earliest time
mod <- udca1[order(udca1$futime), ]
#Start counter for FIS value (#iters)
FIS <- 0
#Variables to store the most recent index where a modification occurred for each group
prev_index_p <- 0
prev_index_t <- 0
#First iteration to make the first modification (earliest eligible regardless of group)
for (i in 1:nrow(mod)) {
  #if trt = placebo and status = Event
  if (mod$trt[i] == 0 && mod$status[i] == 1) {
    #modify status to Censored
    mod$status[i] <- 0
    FIS <- FIS + 1 #increase counter
    prev_index_p <- i #store index
    prev_group <- 0 #store group we just changed
  #if trt = treatment and status = Censored
  } else if (mod$trt[i] == 1 && mod$status[i] == 0) {
    #modify status to Event
    mod$status[i] <- 1
    FIS <- FIS + 1 #increase counter
    prev_index_t <- i #store index
    prev_group <- 1 #store group we just changed
  } else {
    next #skip to next row; no need to recalculate p if nothing changed
  }
  #recalculate p-value
  pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
  break #stop loop after first change
}
#Now alternate groups until the result is no longer significant
while (pval < 0.05) {
  changed <- FALSE #tracks whether this pass found an eligible subject
  #if last change was in treatment group, change the placebo group next
  if (prev_group == 1 && prev_index_p < nrow(mod)) {
    #search from the row after the last placebo change (earlier rows were already checked)
    for (i in (prev_index_p + 1):nrow(mod)) {
      #look for placebo group and Event status
      if (mod$trt[i] == 0 && mod$status[i] == 1) {
        #modify status to Censored
        mod$status[i] <- 0
        FIS <- FIS + 1 #increase counter
        prev_group <- 0 #set previous group to placebo
        prev_index_p <- i #store previous index for placebo group
        changed <- TRUE
        #recalculate p-value
        pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
        break
      }
    }
  #else if last change was in placebo group, change the treatment group next
  } else if (prev_group == 0 && prev_index_t < nrow(mod)) {
    #search from the row after the last treatment change (earlier rows were already checked)
    for (i in (prev_index_t + 1):nrow(mod)) {
      #look for treatment group and Censored status
      if (mod$trt[i] == 1 && mod$status[i] == 0) {
        #modify status to Event
        mod$status[i] <- 1
        FIS <- FIS + 1 #increase counter
        prev_group <- 1 #set previous group to treatment
        prev_index_t <- i #store previous index for treatment group
        changed <- TRUE
        #recalculate p-value
        pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
        break
      }
    }
  }
  #stop if no eligible subject is left in the group whose turn it is (avoids an infinite loop)
  if (!changed) break
}
cat("FIS = ", FIS)
## FIS = 16
Plot
#Mark as modified to see on graph
mod$trt <- paste0(mod$trt, "_modified")
#Combine original and modified data to show on graph
plot_df <- rbind(udca1, mod)
p <- ggsurvplot(
  survfit(Surv(futime, status) ~ trt, data = plot_df),
  data = plot_df,
  risk.table = FALSE,
  palette = c("skyblue", "skyblue", "red", "red"),
  linetype = c("solid", "dashed", "solid", "dashed"),
  censor.shape = NA,
  legend.title = "",
  legend.labs = c("Placebo", "Placebo Modified", "UDCA", "UDCA Modified"),
  ggtheme = theme_minimal(base_size = 14),
  legend = c(0.9, 0.9)
)
p$plot +
  xlab("Time (days)") +
  ylab("Survival Probability") +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(face = "bold")
  ) +
  annotate("text", x = 200, y = 0.25,
    label = paste0("FIS = ", FIS),
    fontface = "bold", size = 4) +
  annotate("text", x = 100, y = 0.13,
    label = paste0("UDCA vs. Placebo: P-value < 0.001\nModified UDCA vs. Placebo: P-value = ", round(pval, 4)),
    hjust = 0, fontface = "bold", size = 4)
## Warning: Removed 192 rows containing missing values or values outside the scale range
## (`geom_point()`).
In this case the FIS is higher (16), so this result is much less fragile than the one in Case Study 1.
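The alternating procedure can also be written more compactly by listing the eligible rows of each group once, in time order, and then taking turns between the two lists. This is a sketch under the assumption that both groups contain at least one eligible subject and that the original result is significant; `queue`, `pos`, and `new_status` are illustrative names, and the logic is intended to follow the same rule as the loop in this case study rather than replace it.

```r
library(survival)

# Sort by follow-up time so that earlier subjects are changed first
mod <- udca1[order(udca1$futime), ]

# Eligible rows per group, already in time order
queue <- list(
  "0" = which(mod$trt == 0 & mod$status == 1),  # placebo events -> censored
  "1" = which(mod$trt == 1 & mod$status == 0)   # UDCA censored -> events
)
new_status <- c("0" = 0, "1" = 1)

# Start with whichever group holds the earliest eligible subject
group <- if (queue[["0"]][1] < queue[["1"]][1]) "0" else "1"
pos   <- c("0" = 0, "1" = 0)  # how many rows of each queue have been used
FIS   <- 0
pval  <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue

while (pval < 0.05 && pos[[group]] < length(queue[[group]])) {
  pos[[group]] <- pos[[group]] + 1
  row <- queue[[group]][pos[[group]]]
  mod$status[row] <- new_status[[group]]   # flip the status of this subject
  FIS  <- FIS + 1
  pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
  group <- if (group == "0") "1" else "0"  # alternate groups
}
cat("FIS =", FIS, " final p =", round(pval, 4), "\n")
```

Precomputing the lists works here because changing one subject's status never makes another subject eligible or ineligible under this rule.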
What if we always change the earliest time and don’t alternate groups?
#Make new data set to modify (mod) and order it by time to find earliest time
mod <- udca1[order(udca1$futime), ]
#Start counter for FIS value (#iters)
FIS <- 0
for (i in 1:nrow(mod)) {
  #if trt = placebo and status = Event
  if (mod$trt[i] == 0 && mod$status[i] == 1) {
    #modify status to Censored
    mod$status[i] <- 0
    FIS <- FIS + 1
  #if trt = treatment and status = Censored
  } else if (mod$trt[i] == 1 && mod$status[i] == 0) {
    #modify status to Event
    mod$status[i] <- 1
    FIS <- FIS + 1
  } else {
    next #skip to next row; no need to recalculate p if nothing changed
  }
  #recalculate p-value
  pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
  if (pval >= 0.05) break #stop if significance flipped
}
cat("FIS = ", FIS)
## FIS = 15
Changing the outcome of the earliest eligible subject regardless of group did result in a slightly lower FIS (15 instead of 16).
What if we only changed the treatment group?
#Make new data set to modify (mod) and order it by time to find earliest time
mod <- udca1[order(udca1$futime), ]
#Start counter for FIS value (#iters)
FIS <- 0
for (i in 1:nrow(mod)) {
  if (mod$trt[i] == 1 && mod$status[i] == 0) {
    mod$status[i] <- 1
    pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
    FIS <- FIS + 1 #increase count by 1
    if (pval >= 0.05) break #stop when significance flips
  }
}
cat("FIS = ", FIS, "\n")
## FIS = 17
Changing only the treatment group resulted in a higher FIS (17) than changing both groups.
What if we only changed the placebo group?
#Make new data set to modify (mod) and order it by time to find earliest time
mod <- udca1[order(udca1$futime), ]
#Start counter for FIS value (#iters)
FIS <- 0
for (i in 1:nrow(mod)) {
  if (mod$trt[i] == 0 && mod$status[i] == 1) {
    mod$status[i] <- 0
    pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
    FIS <- FIS + 1 #increase count by 1
    if (pval >= 0.05) break #stop when significance flips
  }
}
cat("FIS = ", FIS, "\n")
## FIS = 15
Changing only the placebo group resulted in a lower FIS (15) than changing only the treatment group (17). It was the same as changing both groups without alternating.
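The three non-alternating variants differ only in which rows count as eligible, so they can be compared side by side with one helper. The function below is a hypothetical sketch (the name `fis_earliest_first` and its arguments are not from Xing et al.); it assumes the `udca1` column names used above and a significant original result.

```r
library(survival)

# Hypothetical helper: flip eligible rows in time order until p >= alpha
fis_earliest_first <- function(data, eligible, alpha = 0.05) {
  mod  <- data[order(data$futime), ]
  rows <- which(eligible(mod))          # eligible rows, earliest first
  FIS  <- 0
  pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
  for (i in rows) {
    if (pval >= alpha) break            # significance already flipped
    mod$status[i] <- 1 - mod$status[i]  # Event <-> Censored
    FIS  <- FIS + 1
    pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
  }
  c(FIS = FIS, p = pval)
}

# Eligibility rules used in this case study
placebo_rule <- function(d) d$trt == 0 & d$status == 1  # placebo events
udca_rule    <- function(d) d$trt == 1 & d$status == 0  # UDCA censored

results <- rbind(
  both_groups  = fis_earliest_first(udca1, function(d) placebo_rule(d) | udca_rule(d)),
  udca_only    = fis_earliest_first(udca1, udca_rule),
  placebo_only = fis_earliest_first(udca1, placebo_rule)
)
print(results)
```

Passing the eligibility rule as a function keeps the loop itself identical across strategies, so any difference in FIS comes from the choice of rows alone.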