Phase 3 PFS GSD — NPH Sensitivity Simulation

Skill: clinical-trial-simulation v0.1.0 (TrialSimulator backend)
Replicates per scenario: 1000

1. Why this design

Implementation-mode simulation. The user posed a Phase 3 1:1 trial with co-primary PFS (TTE) and ORR (binary), GSD on PFS at IF 0.49/0.75/1.00 with Lan-DeMets OBF spending and non-binding futility, alpha_PFS = 0.024 and alpha_ORR = 0.001. The question for this analysis is: how does PFS power degrade under a delayed treatment effect (immune-activation lag)? Because ORR power is independent of the PFS hazard structure, ORR is intentionally omitted from the simulated data; ORR power is reported in the SOC sensitivity run.

2. Confirmed parameters

| Parameter | Value |
|---|---|
| N | 500 (1:1) |
| Accrual | uniform 20/mo (StaggeredRecruiter, accrual_rate = data.frame(end_time=Inf, piecewise_rate=20)) |
| Dropout | exponential, rate = -log(0.85)/50 = 0.003250 (15% by month 50) |
| Trial duration | 200 mo (backstop; events drive timing) |
| PFS, control | exponential, rate = log(2)/20, median 20 mo |
| PFS, treatment (per scenario) | piecewise-constant exponential, PiecewiseConstantExponentialRNG with finite tail_end = 1000 |
| GSD on PFS | kMax=3, IFs = (0.49, 0.75, 1.00), asOF, alpha = 0.024 (one-sided) |
| Futility | non-binding, bsHSD, gamma = -4 |
| Power assumption (anchors D_total) | 90% under HR = 0.6667 |
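
The derived quantities in the table can be checked with a few lines of base R. This is a minimal sketch rather than part of the simulation scripts. The variable names are illustrative, and the accrual duration assumes all 500 patients enroll at the constant rate of 20 per month.

```r
# Illustrative check of the derived design parameters (names are hypothetical).
n_total      <- 500
accrual_rate <- 20                 # patients per month
dropout_rate <- -log(0.85) / 50    # exponential dropout hazard per month
ctrl_hazard  <- log(2) / 20        # control PFS hazard per month

accrual_months <- n_total / accrual_rate        # time to enroll everyone
p_drop_50      <- 1 - exp(-dropout_rate * 50)   # cumulative dropout by month 50
ctrl_median    <- log(2) / ctrl_hazard          # median PFS implied by the hazard

print(c(dropout_rate = dropout_rate, accrual_months = accrual_months,
        p_drop_50 = p_drop_50, ctrl_median = ctrl_median))
```

Under these assumptions enrollment completes well before the later event-driven analyses, so the final analysis timing is governed by event accumulation rather than accrual.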

2.5 Boundary computation

Computed once via rpact in scripts/boundaries.R. Verbatim call:

design <- getDesignGroupSequential(
  kMax              = 3,
  informationRates  = c(0.49, 0.75, 1.00),
  alpha             = 0.024,  beta = 0.10,  sided = 1,
  typeOfDesign      = "asOF",
  typeBetaSpending  = "bsHSD", gammaB = -4,
  bindingFutility   = FALSE
)

Hardcoded literals used in scripts/actions.R:

EFFICACY_BOUNDS  <- c(IA1 = 3.0204, IA2 = 2.3762, FA = 2.0303)
FUTILITY_BOUNDS  <- c(IA1 = 0.0490, IA2 = 1.0217)   # non-binding
D_IA1 <- 132 ; D_IA2 <- 202 ; D_TOTAL <- 269
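
As a sanity check on the hardcoded literals, the Lan-DeMets O'Brien-Fleming-type spending function can be evaluated in base R. The sketch below assumes the usual one-sided form alpha(t) = 2 - 2 * Phi(z_{1 - alpha/2} / sqrt(t)). It reproduces only the first efficacy bound, because the later bounds depend on the joint distribution of the test statistics, which rpact handles numerically. The function name obf_spend is illustrative and is not part of rpact.

```r
# Cumulative alpha spent at information fraction t (assumed one-sided
# Lan-DeMets O'Brien-Fleming-type form; obf_spend is a hypothetical helper).
obf_spend <- function(t, alpha) {
  2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t))
}

alpha <- 0.024
info  <- c(0.49, 0.75, 1.00)
spent <- obf_spend(info, alpha)

# At the first look nothing has been spent before, so the bound is simply
# the normal quantile of the alpha spent at that look.
bound_ia1 <- qnorm(1 - spent[1])

print(round(spent, 5))
print(round(bound_ia1, 4))
```

The first-look value should land close to the IA1 literal above; agreement there is a quick guard against transcription errors in scripts/actions.R.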

3. Arms (with endpoints)

Per scenario, both arms are reconstructed inside run_scenario(...). The control arm is invariant; the treatment arm’s PFS is rebuilt from the scenario’s (delay, post_hr) tuple.

ep_pfs_ctrl <- endpoint(name = "pfs", type = "tte",
                        generator = rexp, rate = log(2) / 20)
ep_pfs_trt  <- make_trt_pfs_endpoint(delay = delay, post_hr = post_hr)

ctrl <- arm(name = "control");      ctrl$add_endpoints(ep_pfs_ctrl)
exp1 <- arm(name = "experimental"); exp1$add_endpoints(ep_pfs_trt)

make_trt_pfs_endpoint() (in helpers.R) builds a piecewise-constant hazard: lambda_ctrl during [0, delay) and post_hr * lambda_ctrl thereafter. tail_end = 1000 is required because PiecewiseConstantExponentialRNG returns Inf if the trailing end_time is Inf — a non-obvious gotcha worth recording.
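
The delayed-effect hazard can be illustrated without TrialSimulator by inverting the cumulative hazard directly. The sketch below is a base-R stand-in, not the make_trt_pfs_endpoint() helper itself, and both function names are hypothetical. Because the last piece is inverted in closed form, no finite tail_end is needed in this version.

```r
# Map a unit-exponential draw e (a cumulative hazard value) to an event time
# under hazard lambda on [0, delay) and post_hr * lambda afterwards.
inv_cumhaz_delayed <- function(e, lambda, delay, post_hr) {
  h_delay <- lambda * delay   # cumulative hazard reached at the change point
  ifelse(e <= h_delay,
         e / lambda,
         delay + (e - h_delay) / (post_hr * lambda))
}

# Draw n delayed-effect PFS times by inverse-transform sampling.
r_delayed_pfs <- function(n, lambda, delay, post_hr) {
  inv_cumhaz_delayed(rexp(n), lambda, delay, post_hr)
}

lambda_ctrl <- log(2) / 20
set.seed(1)
x <- r_delayed_pfs(1e5, lambda_ctrl, delay = 6, post_hr = 0.55)

# Theoretical median: the time at which the cumulative hazard equals log(2).
med_theory <- inv_cumhaz_delayed(log(2), lambda_ctrl, delay = 6, post_hr = 0.55)
print(c(simulated = median(x), theory = med_theory))
```

Comparing the simulated and theoretical medians is a cheap check that a generator implements the intended hazard pattern before it is wired into an endpoint.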

4. Trial setup

tr <- trial(
  name         = paste0("nph_", sc_id),
  n_patients   = 500,
  duration     = 200,
  enroller     = StaggeredRecruiter,
  accrual_rate = data.frame(end_time = Inf, piecewise_rate = 20),
  dropout      = rexp,
  rate         = -log(0.85) / 50,
  silent       = TRUE
)
tr$add_arms(sample_ratio = c(1, 1), ctrl, exp1)

Duration is generous so the event-driven FA milestone reliably fires. An earlier attempt that set duration = 96 and added a calendarTime(96) backstop (eventNumber(...) | calendarTime(96)) failed because TrialSimulator (TS) pre-validates that eventNumber(...) is reachable.

5. Milestones

m_ia1   <- milestone(name = "ia1",
                      when = eventNumber(endpoint = "pfs", n = 132),
                      action = action_ia1)
m_ia2   <- milestone(name = "ia2",
                      when = eventNumber(endpoint = "pfs", n = 202),
                      action = action_ia2)
m_final <- milestone(name = "final",
                      when = eventNumber(endpoint = "pfs", n = 269),
                      action = action_final)

Each fires when the cumulative number of PFS events crosses its threshold. 132/202/269 implement IF = 0.49/0.75/1.00 of the rpact D_total = 269.
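
The event thresholds follow from the information fractions by simple arithmetic. A minimal sketch, assuming the targets are rounded up to the next whole event (an assumption that is consistent with the 132/202/269 values used above); the variable names are illustrative.

```r
# Event targets per analysis from information fractions (rounding up is assumed).
d_total   <- 269
info_frac <- c(ia1 = 0.49, ia2 = 0.75, final = 1.00)

d_targets <- ceiling(info_frac * d_total)

print(d_targets)
# Realized information fractions after rounding
print(round(d_targets / d_total, 4))
```

The realized fractions differ slightly from the planned ones because of rounding; the hardcoded bounds were computed at the planned fractions.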

6. Action functions

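# Fit the log-rank test and the Cox model on the data locked at a milestone.
analyze_pfs <- function(trial, milestone_name) {
  data <- trial$get_locked_data(milestone_name = milestone_name)
  # One-sided log-rank test; z < 0 when the treatment hazard is lower
  lr <- fitLogrank(formula = Surv(pfs, pfs_event) ~ arm,
                   placebo = "control", data = data,
                   alternative = "less")
  # Cox model on the log-HR scale, used for the observed-effect summary
  cph <- fitCoxph(formula = Surv(pfs, pfs_event) ~ arm,
                  placebo = "control", data = data,
                  alternative = "less", scale = "log hazard ratio")
  # Negate z so that larger values favor treatment, matching the rpact bounds
  list(z = -lr$z, log_hr = cph$estimate)
}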

action_ia1 <- function(trial, ...) {
  r <- analyze_pfs(trial, "ia1")
  trial$save(value = as.integer(r$z > EFFICACY_BOUNDS["IA1"]),
             name = "reject_pfs_ia1")
  trial$save(value = as.integer(r$z < FUTILITY_BOUNDS["IA1"]),
             name = "futility_ia1")
  trial$save(value = r$z,      name = "z_pfs_ia1")
  trial$save(value = r$log_hr, name = "loghr_pfs_ia1")
}
# action_ia2 / action_final follow the same shape with their own bounds

Sign convention. fitLogrank(alternative = "less") returns z < 0 when the treatment hazard is lower; rpact’s bounds are positive (treatment-better-is-positive). The action negates z so the comparison z > EFFICACY_BOUNDS matches the rpact convention directly.

Per replicate per analysis the action saves: efficacy rejection flag, futility crossing flag, the z statistic, and the Cox log-HR. Power and stop-stage are derived in post-processing in main.R.
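
The post-processing step can be sketched as follows. This is an illustration on made-up data, not the code in main.R: the ia2 and final column names are assumed by analogy with the ia1 names saved above, and stop_stage is a hypothetical helper. It assumes every milestone fires in every replicate, so that futility can be applied or ignored after the fact.

```r
# Toy per-replicate output; values are invented for illustration only.
res <- data.frame(
  reject_pfs_ia1   = c(1, 0, 0, 0, 0),
  futility_ia1     = c(0, 0, 1, 0, 0),
  reject_pfs_ia2   = c(1, 1, 0, 0, 0),
  futility_ia2     = c(0, 0, 0, 1, 0),
  reject_pfs_final = c(1, 1, 1, 0, 0)
)

# Stop stage for one replicate: stop at the first efficacy crossing; a
# futility crossing stops the trial only when futility is treated as binding.
stop_stage <- function(r, binding) {
  if (r$reject_pfs_ia1 == 1) return("ia1_eff")
  if (binding && r$futility_ia1 == 1) return("ia1_fut")
  if (r$reject_pfs_ia2 == 1) return("ia2_eff")
  if (binding && r$futility_ia2 == 1) return("ia2_fut")
  if (r$reject_pfs_final == 1) "final_eff" else "final_no_rej"
}

rows <- seq_len(nrow(res))
stage_binding    <- sapply(rows, function(i) stop_stage(res[i, ], binding = TRUE))
stage_nonbinding <- sapply(rows, function(i) stop_stage(res[i, ], binding = FALSE))

# Power with non-binding futility: any efficacy crossing, futility ignored.
power_nonbinding <- mean(grepl("_eff$", stage_nonbinding))

print(table(stage_binding))
print(power_nonbinding)
```

In this toy data the third replicate crosses futility at IA1 but would have rejected at the final analysis, which shows how power with non-binding futility can exceed the sum of the efficacy stop probabilities computed with futility stops applied.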

7. Operating characteristics

PFS power, P(stop), median observed HR, expected duration — by scenario:

| Scenario | Delay (mo) | Post HR | Power | P(stop IA1 eff) | P(stop IA1 fut) | P(stop IA2 eff) | P(stop IA2 fut) | P(stop FA eff) | P(stop FA no-rej) | Median obs HR | E[dur] non-bind (mo) | E[dur] binding (mo) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S0_PH: PH baseline (HR 0.667) | 0 | 0.6667 | 91.0% | 22.4% | 1.4% | 46.3% | 2.9% | 21.4% | 5.6% | 0.667 | 42.24 | 32.91 |
| S1_d3_h60: 3-mo delay, post HR 0.60 | 3 | 0.6 | 92.7% | 12.8% | 3.2% | 51.2% | 2.8% | 26.2% | 3.8% | 0.660 | 42.65 | 33.90 |
| S2_d6_h55: 6-mo delay, post HR 0.55 | 6 | 0.55 | 86.8% | 5.9% | 11.4% | 39.9% | 4.4% | 35.1% | 3.3% | 0.676 | 42.50 | 34.36 |
| S3_d6_h62: 6-mo delay, post HR 0.62 (matched to PH) | 6 | 0.62 | 70.8% | 3.3% | 14.4% | 26.2% | 9.2% | 36.2% | 10.7% | 0.735 | 41.28 | 34.46 |
| S4_d9_h50: 9-mo delay, post HR 0.50 | 9 | 0.5 | 76.6% | 1.5% | 18.9% | 22.8% | 10.3% | 41.8% | 4.7% | 0.707 | 42.17 | 34.58 |

Monte Carlo precision (1-sigma):
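
A minimal sketch of one way to obtain a 1-sigma Monte Carlo standard error for each simulated power, assuming the binomial formula sqrt(p(1 - p)/n) with n = 1000 replicates per scenario. The power values are copied from the table above; the variable names are illustrative.

```r
# Binomial Monte Carlo standard error of a simulated proportion.
n_rep <- 1000
power <- c(S0_PH = 0.910, S1_d3_h60 = 0.927, S2_d6_h55 = 0.868,
           S3_d6_h62 = 0.708, S4_d9_h50 = 0.766)

mc_se <- sqrt(power * (1 - power) / n_rep)

# Report in percentage points
print(round(100 * mc_se, 2))
```

With 1000 replicates the standard error of any proportion is bounded by sqrt(0.25/1000), about 1.6 percentage points, so between-scenario power differences of a few points or less should be read with that noise level in mind.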

Reading the table.

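Median observed HR ranges from 0.660 to 0.735 across scenarios. In delayed-effect scenarios the median observed HR is biased upward (closer to 1) relative to the late-period HR, because a single Cox HR fitted over the whole follow-up averages the no-effect window into the estimate. For example, S4 has a post-delay HR of 0.50 but a median observed HR of 0.707.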

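Expected duration is reported two ways: with futility treated as non-binding (E[dur] non-bind) and with futility treated as binding (E[dur] binding).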

8. Caveats and limitations

Milestone times — PH baseline