FIS Demo

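Implementing the method proposed by Xing et al. for assessing the fragility of survival analysis results based on the log-rank test. In the examples below, the FIS is the number of event/censoring status changes needed to flip the statistical significance (at the 0.05 level) of the log-rank test.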

Source: Xing Xing, Aiwen Xing, Kannan Natarajan, Haitao Chu, Lifeng Lin, Jiayi Tong, An alternative method for assessing the fragility of survival analysis results: a proof-of-concept study based on the log-rank test, American Journal of Epidemiology, Volume 195, Issue 4, April 2026, Pages 1175–1181, https://doi.org/10.1093/aje/kwaf229

Case Study 1 (Demonstrating Modifying One Group)

Dataset: AML Maintenance study

Does extending the standard course of chemotherapy with additional maintenance cycles improve survival of patients with acute myelogenous leukemia (AML)?

The treatment group (“Maintained”) had 11 participants, and the control group (“Nonmaintained”) had 12.

Source: Rupert G. Miller (1997), Survival Analysis, page 43. John Wiley & Sons. ISBN: 0-471-25218-2

Step 1: Conduct Log-Rank Test on Original Data

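library(survival)  # survdiff(), Surv(), survfit(); also provides the aml and udca1 datasets
library(survminer) # ggsurvplot()
original_p <- survdiff(Surv(time, status) ~ x, data = aml)$pvalue
cat("Original p-value =", round(original_p, 4), "\n")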

## Original p-value = 0.0653
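The `$pvalue` component used above is returned by recent versions of the survival package. If your installed version does not provide it (an assumption about older releases), the same p-value can be recovered from the chi-squared statistic that `survdiff()` returns, using the number of groups minus one as the degrees of freedom. A minimal sketch; the helper name `logrank_p()` is hypothetical and chosen here for illustration:

```r
library(survival)

# Hypothetical helper: log-rank p-value computed from the chi-squared statistic,
# useful if survdiff() in your survival version lacks a $pvalue component
logrank_p <- function(formula, data) {
  fit <- survdiff(formula, data = data)
  df <- length(fit$n) - 1  # number of groups minus one
  pchisq(fit$chisq, df = df, lower.tail = FALSE)
}

p_manual <- logrank_p(Surv(time, status) ~ x, data = aml)
round(p_manual, 4)
```

For two groups this is a chi-squared test with one degree of freedom, so it should agree with `original_p`.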

ggsurvplot(survfit(Surv(time, status) ~ x, data = aml))

Since p > 0.05, the result is not statistically significant. To assess its fragility, we count how many changes it takes to make the result significant (p < 0.05). Because the Maintained group is the smaller of the two, it is the one we modify. The Maintained group already had higher survival than the control group, so we change Event status to Censored, starting from the earliest time; this raises the Maintained survival curve and moves it further from the control curve.

Steps 2 and 3: Make Change and Recalculate P-value

#Make new data set to modify (mod) and order it by time to find earliest time
mod <- aml[order(aml$time), ]
#Start counter for FIS value (#iters)
FIS = 0
#Go through sorted data until you find "Maintained" (trt group) and status ==1 (Event) 
#and change to 0 (Censored) then recalculate p
for (i in 1:nrow(mod)) {
  if (mod$x[i] == "Maintained" && mod$status[i] == 1) {
    mod$status[i] <- 0
    pval <- survdiff(Surv(time, status) ~ x, data = mod)$pvalue
    FIS = FIS + 1 #increase count by 1
    if (pval < 0.05) break #stop when significance flips
  }
}
cat("FIS = ", FIS, "\n")

## FIS =  1
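The loop above can be wrapped in a reusable function so the same one-group procedure applies to other data sets and to either direction of the significance flip. This is a sketch under stated assumptions, not code from Xing et al.: `fis_one_group()` and its arguments are hypothetical names, status is assumed to be coded 1 = event and 0 = censored, and rows are flipped in time order exactly as in the loop above.

```r
library(survival)

# Hypothetical helper: flip the status of eligible rows in one group, in time
# order, until the log-rank result crosses alpha in the requested direction.
fis_one_group <- function(data, time, status, group, target_group, from,
                          to_significant, alpha = 0.05) {
  mod <- data[order(data[[time]]), ]
  f <- as.formula(paste0("Surv(", time, ", ", status, ") ~ ", group))
  # eligible rows: the chosen group with the status we want to flip
  rows <- which(mod[[group]] == target_group & mod[[status]] == from)
  p <- survdiff(f, data = mod)$pvalue
  fis <- 0
  for (r in rows) {
    mod[[status]][r] <- 1 - from  # Event <-> Censored
    fis <- fis + 1
    p <- survdiff(f, data = mod)$pvalue
    flipped <- if (to_significant) p < alpha else p >= alpha
    if (flipped) return(list(fis = fis, p = p))
  }
  list(fis = NA_integer_, p = p)  # ran out of eligible rows without flipping
}

# Case Study 1: censor Maintained events until the result becomes significant
res <- fis_one_group(aml, "time", "status", "x", target_group = "Maintained",
                     from = 1, to_significant = TRUE)
res$fis
```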

Plot

library(dplyr) # bind_rows()

#call it Maintained_modified to emphasize in graph
mod$x <- ifelse(mod$x == "Maintained", "Maintained_modified", as.character(mod$x))

#combining original data and modified data (nonmaintained group didn't change so only need original)
plot_df <- bind_rows(
  aml[aml$x == "Nonmaintained", ],
  aml[aml$x == "Maintained", ],
  mod[mod$x == "Maintained_modified", ]
)

plot_df$x <- factor(plot_df$x,
                        levels = c("Nonmaintained", "Maintained", "Maintained_modified"),
                        labels = c("No Maintenance", "Maintenance", "Maintenance Modified"))
surv_obj <- survfit(Surv(time, status) ~ x, data = plot_df)
p <- ggsurvplot(
  surv_obj,
  data = plot_df,
  palette = c("skyblue", "red", "red"),
  linetype = c("solid", "solid", "twodash"),
  censor.shape = NA,
  xlab = "Time (weeks)",
  legend.title = "",
  legend.labs = c("Control", "Maintenance", "Maintenance Modified"),
  ggtheme = theme_minimal(base_size = 14),
  legend = c(0.87, 0.9)
)
p$plot +
  annotate("text", x = 79, y = 0.75, label = paste0("FIS = ", FIS), fontface = "bold") +
  annotate("text", x = 74, y = 0.65,
           label = paste0("Maintenance vs. Control: P = ", round(original_p, 4), "\nModified Maintenance vs. Control: P = ", round(pval, 4)),
           hjust = 0, size = 4, fontface = "bold")

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_point()`).

With an FIS of 1, this result is very fragile: censoring a single event in the Maintained group makes it significant. This is not surprising, since the sample was small and the original p-value (0.0653) was already close to the 0.05 threshold.

Case Study 2 (Demonstrating FIS by Changing Both Groups)

Dataset: Data from a trial of ursodeoxycholic acid (UDCA) in patients with primary biliary cirrhosis (PBC).

Does UDCA improve survival in PBC patients?

The treatment (UDCA) group had 84 participants, and the placebo group had 86.

Source: T. M. Therneau and P. M. Grambsch, Modeling survival data: extending the Cox model. Springer, 2000.

Step 1: Conduct Log-Rank Test on Original Data

original_p <- survdiff(Surv(futime, status) ~ trt, data = udca1)$pvalue
cat("Original p-value =", round(original_p, 4), "\n")

## Original p-value = 3e-04

ggsurvplot(survfit(Surv(futime, status) ~ trt, data = udca1), legend.labs = c("Placebo", "UDCA"))

Since the original result is statistically significant, to assess its fragility we count how many changes it takes to make it non-significant (p >= 0.05). Because the UDCA group had higher survival, we change status from Censored to Event in the UDCA group (lowering its survival) and from Event to Censored in the Placebo group (raising its survival), alternating between the groups and starting from the earliest eligible time. This brings the two survival curves closer together.

Steps 2 and 3: Make Change and Recalculate P-value

#Make new data set to modify (mod) and order it by time to find earliest time
mod <- udca1[order(udca1$futime), ]
#Start counter for FIS value (#iters)
FIS = 0
#create variables to store the most recent index where a modification occurred for each group
prev_index_p = 0
prev_index_t = 0
#first iteration to make the first modification (earliest eligible regardless of group)
for(i in 1:nrow(mod)) {
  # if trt = control and status = Event
  if(mod$trt[i] == 0 && mod$status[i] == 1) {
    #modify status to Censored
    mod$status[i] = 0
    FIS = FIS +1 #increase counter
    prev_index_p = i #store index
    prev_group = 0 #store group we just changed
    #if trt = treatment and status = Censored
  } else if(mod$trt[i] == 1 && mod$status[i] == 0) {
    #modify status to Event
    mod$status[i] = 1 
    FIS = FIS +1 #increase counter
    prev_index_t = i#store index
    prev_group = 1#store group we just changed
  } else {
    next #skips to next iteration bc don't need to recalculate p if no change
  }
  #recalculate p-value
  pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
  break #stop loop after first change
}

#now we have to alternate groups
while (pval < .05) {
  changed = FALSE #track whether this pass modified a row
  #if last change was in treatment group
  if (prev_group == 1) {
    #search rows after the previous placebo change (earlier indices were already checked)
    for (i in which(seq_len(nrow(mod)) > prev_index_p)) {
      #look for placebo group and Event Status
      if(mod$trt[i] == 0 && mod$status[i] == 1) {
        mod$status[i] = 0 #modify status to Censored
        FIS = FIS +1 #increase counter
        prev_group = 0 #set previous group to control
        prev_index_p = i #store previous index for control group
        changed = TRUE
        break #only one change per pass
      }
    }
    #else if previous group was placebo
  } else if (prev_group == 0) {
    #search rows after the previous treatment change (earlier indices were already checked)
    for (i in which(seq_len(nrow(mod)) > prev_index_t)) {
      #look for treatment group and Censored Status
      if(mod$trt[i] == 1 && mod$status[i] == 0) {
        mod$status[i] = 1 #modify status to Event
        FIS = FIS +1 #increase counter
        prev_group = 1 #set previous group to treatment
        prev_index_t = i #store previous index for treatment group
        changed = TRUE
        break #only one change per pass
      }
    }
  }
  #stop if the group whose turn it is has no eligible rows left (avoids an infinite loop)
  if (!changed) {
    warning("No eligible rows left; significance did not flip")
    break
  }
  #recalculate p-value
  pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
}

cat("FIS = ", FIS)

## FIS =  16
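Because flipping one row never changes whether another row is eligible, the alternating search above can also be written with the eligible rows of each group listed up front and consumed in time order. The sketch below is a hypothetical reformulation (the name `fis_alternating()` is illustrative), assuming `trt` = 0 is placebo, `trt` = 1 is UDCA, and `status` is 1 = event, 0 = censored. It also records the p-value after each change, which shows how gradually the result loses significance; if either group runs out of eligible rows, it stops and reports the number of changes made.

```r
library(survival)

# Hypothetical helper: alternating-groups FIS using precomputed candidate rows
fis_alternating <- function(data, alpha = 0.05) {
  mod <- data[order(data$futime), ]
  cand_p <- which(mod$trt == 0 & mod$status == 1)  # placebo events -> Censored
  cand_t <- which(mod$trt == 1 & mod$status == 0)  # UDCA censored -> Event
  # start with whichever group has the earliest eligible row
  group <- if (length(cand_t) > 0 &&
               (length(cand_p) == 0 || cand_t[1] < cand_p[1])) 1 else 0
  ip <- 0
  it <- 0
  pvals <- numeric(0)  # p-value after each change
  repeat {
    if (group == 0) {
      if (ip == length(cand_p)) break  # no placebo rows left
      ip <- ip + 1
      mod$status[cand_p[ip]] <- 0
    } else {
      if (it == length(cand_t)) break  # no UDCA rows left
      it <- it + 1
      mod$status[cand_t[it]] <- 1
    }
    pvals <- c(pvals, survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue)
    if (pvals[length(pvals)] >= alpha) break  # significance flipped
    group <- 1 - group  # switch groups for the next change
  }
  list(fis = length(pvals), pvals = pvals)
}

res <- fis_alternating(udca1)
res$fis
round(res$pvals, 4)  # p-value after each successive change
```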

Plot

#mark as modified to see on graph
mod$trt <- paste0(mod$trt, "_modified")
#combine original and modified data to show on graph
plot_df <- rbind(udca1, mod)

p <- ggsurvplot(
  survfit(Surv(futime, status) ~ trt, data = plot_df),
  data = plot_df,
  risk.table = FALSE, 
  palette = c("skyblue", "skyblue","red", "red"),
  linetype = c("solid", "dashed", "solid", "dashed"),
  censor.shape = NA,
  legend.title = "",
  legend.labs = c("Placebo", "Placebo Modified", "UDCA", "UDCA Modified"),
  ggtheme = theme_minimal(base_size = 14),
  legend = c(0.9, 0.9)
)

p$plot +
  xlab("Time (days)") +
  ylab("Survival Probability") +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(face = "bold")
  ) +
  annotate("text", x = 200, y = 0.25,
           label = paste0("FIS = ", FIS),
           fontface = "bold", size = 4) +
  annotate("text", x = 100, y = 0.13,
           label = paste0("UDCA vs. Placebo: P-value < 0.001\nModified UDCA vs. Modified Placebo: P-value = ", round(pval, 4)),
           hjust = 0, fontface = "bold", size = 4)

## Warning: Removed 192 rows containing missing values or values outside the scale range
## (`geom_point()`).

With an FIS of 16, this result is far less fragile than the one in Case Study 1: 16 status changes were needed before the p-value rose to 0.05 or above.

What if we always change the earliest eligible time and don’t alternate groups?

#Make new data set to modify (mod) and order it by time to find earliest time
mod <- udca1[order(udca1$futime), ]
#Start counter for FIS value (#iters)
FIS = 0
for(i in 1:nrow(mod)) {
  # if trt = control and status = Event
  if(mod$trt[i] == 0 && mod$status[i] == 1) {
    #modify status to Censored
    mod$status[i] = 0
    FIS = FIS +1
    #if trt = treatment and status = Censored
  } else if(mod$trt[i] == 1 && mod$status[i] == 0) {
    #modify status to Event
    mod$status[i] = 1 
    FIS = FIS +1
  } else {
    next #skips to next iteration bc don't need to recalculate p if no change
  }
  #recalculate p-value
  pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
  if(pval >= 0.05) break #stop if significance flipped
}

cat("FIS = ", FIS)

## FIS =  15

Changing the outcome of the earliest eligible subject regardless of group resulted in a slightly lower FIS (15 vs. 16 when alternating).

What if we change only the treatment group?

#Make new data set to modify (mod) and order it by time to find earliest time
mod <- udca1[order(udca1$futime), ]
#Start counter for FIS value (#iters)
FIS = 0
for (i in 1:nrow(mod)) {
  if (mod$trt[i] == 1 && mod$status[i] == 0) {
    mod$status[i] <- 1
    pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
    FIS = FIS + 1 #increase count by 1
    if (pval >= 0.05) break #stop when significance flips
  }
}
cat("FIS = ", FIS, "\n")

## FIS =  17

Changing only the treatment group resulted in a higher FIS (17) than changing both groups (16 when alternating, 15 without alternating).

What if we change only the placebo group?

#Make new data set to modify (mod) and order it by time to find earliest time
mod <- udca1[order(udca1$futime), ]
#Start counter for FIS value (#iters)
FIS = 0
for (i in 1:nrow(mod)) {
  if (mod$trt[i] == 0 && mod$status[i] == 1) {
    mod$status[i] <- 0
    pval <- survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue
    FIS = FIS + 1 #increase count by 1
    if (pval >= 0.05) break #stop when significance flips
  }
}
cat("FIS = ", FIS, "\n")

## FIS =  15

Changing only the placebo (control) group resulted in a lower FIS (15) than changing only the treatment group (17). It was the same as changing both groups without alternating.
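The four strategies above differ only in the order in which eligible rows are flipped, so they can be compared side by side with one function. This is a sketch under stated assumptions, not the method as specified by Xing et al.: `fis_by_strategy()` and the strategy names are hypothetical, the coding `trt` = 0 placebo / 1 UDCA and `status` = 1 event / 0 censored is assumed, and the function returns `NA` if the available rows run out before significance flips (for `"alternate"`, that includes either group running out).

```r
library(survival)

# Hypothetical helper: FIS for a given row-flipping order (strategy)
fis_by_strategy <- function(data, strategy, alpha = 0.05) {
  mod <- data[order(data$futime), ]
  cand_p <- which(mod$trt == 0 & mod$status == 1)  # placebo events
  cand_t <- which(mod$trt == 1 & mod$status == 0)  # UDCA censored rows
  rows <- switch(strategy,
    earliest  = sort(c(cand_p, cand_t)),  # earliest eligible row, either group
    treatment = cand_t,
    placebo   = cand_p,
    alternate = {
      t_first <- length(cand_t) > 0 &&
        (length(cand_p) == 0 || cand_t[1] < cand_p[1])
      a <- if (t_first) cand_t else cand_p
      b <- if (t_first) cand_p else cand_t
      n <- min(length(a), length(b))
      as.vector(rbind(a[seq_len(n)], b[seq_len(n)]))  # a1, b1, a2, b2, ...
    },
    stop("unknown strategy: ", strategy)
  )
  fis <- 0
  for (r in rows) {
    mod$status[r] <- 1 - mod$status[r]  # Event <-> Censored
    fis <- fis + 1
    if (survdiff(Surv(futime, status) ~ trt, data = mod)$pvalue >= alpha) {
      return(fis)
    }
  }
  NA_integer_  # significance never flipped with the available rows
}

strategies <- c("alternate", "earliest", "treatment", "placebo")
fis_table <- sapply(strategies, function(s) fis_by_strategy(udca1, s))
fis_table
```

Expressing each strategy as a fixed row order makes the comparison explicit: the strategies share the same stopping rule and differ only in which eligible rows are used up first.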

Brandon Pate

2026-06-11