Phase 3 PFS GSD — NPH Sensitivity Simulation

Skill: clinical-trial-simulation v0.1.0 (TrialSimulator backend)
Replicates per scenario: 1000

1. Why this design

Implementation-mode simulation. The user posed a Phase 3 1:1 trial with co-primary PFS (TTE) and ORR (binary), GSD on PFS at IF 0.49/0.75/1.00 with Lan-DeMets OBF spending and non-binding futility, alpha_PFS = 0.024 and alpha_ORR = 0.001. The question for this analysis is: how does PFS power degrade under a delayed treatment effect (immune-activation lag)? Because ORR power is independent of the PFS hazard structure, ORR is intentionally omitted from the simulated data; ORR power is reported in the SOC sensitivity run.

2. Confirmed parameters

Parameter	Value
N	500 (1:1)
Accrual	uniform 20/mo (`StaggeredRecruiter`, `accrual_rate = data.frame(end_time=Inf, piecewise_rate=20)`)
Dropout	exponential, `rate = -log(0.85)/50 = 0.003250` (15% by month 50)
Trial duration	200 mo (backstop; events drive timing)
PFS — control	exponential, `rate = log(2)/20`, median 20 mo
PFS — treatment (per scenario)	piecewise-constant exp, `PiecewiseConstantExponentialRNG` with finite `tail_end = 1000`
GSD on PFS	`kMax=3`, IFs = (0.49, 0.75, 1.00), `asOF`, alpha = 0.024 (one-sided)
Futility	non-binding, `bsHSD`, gamma = -4
Power assumption (anchors D_total)	90% under HR = 0.6667
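
The rate entries in the table can be checked with a few lines of base R. This is a small sanity check and not part of the original scripts; the variable names are illustrative.

```r
# Sanity check of the rates in the parameter table (base R only).
# Variable names are illustrative and are not taken from the original scripts.
dropout_rate <- -log(0.85) / 50   # exponential dropout hazard, per month
ctrl_rate    <- log(2) / 20       # control-arm PFS hazard, per month

# Probability of having dropped out by month 50 under the exponential model
p_dropout_50 <- pexp(50, rate = dropout_rate)

# Median of the control-arm PFS distribution, in months
ctrl_median <- qexp(0.5, rate = ctrl_rate)

# Months needed to enroll all 500 patients at 20 per month
accrual_months <- 500 / 20
```

With these inputs, enrollment completes well before the 200-month backstop, so the backstop only matters if events accrue far more slowly than planned.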

2.5 Boundary computation

Computed once via rpact in scripts/boundaries.R. Verbatim call:

design <- getDesignGroupSequential(
  kMax              = 3,
  informationRates  = c(0.49, 0.75, 1.00),
  alpha             = 0.024,  beta = 0.10,  sided = 1,
  typeOfDesign      = "asOF",
  typeBetaSpending  = "bsHSD", gammaB = -4,
  bindingFutility   = FALSE
)

Hardcoded literals used in scripts/actions.R:

EFFICACY_BOUNDS  <- c(IA1 = 3.0204, IA2 = 2.3762, FA = 2.0303)
FUTILITY_BOUNDS  <- c(IA1 = 0.0490, IA2 = 1.0217)   # non-binding
D_IA1 <- 132 ; D_IA2 <- 202 ; D_TOTAL <- 269
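
As a rough plausibility check on D_TOTAL, the fixed-design event count from Schoenfeld's approximation can be compared with the hardcoded value. This is an assumed cross-check, not a description of how rpact derives the number; the group-sequential total should sit a few percent above the fixed-design count.

```r
# Fixed-design (single-analysis) event count from Schoenfeld's approximation
# for 1:1 allocation: d = 4 * (z_alpha + z_beta)^2 / log(HR)^2.
# This is an independent cross-check, not the rpact calculation.
alpha <- 0.024   # one-sided alpha allocated to PFS
beta  <- 0.10    # 1 - target power
hr    <- 0.6667  # design hazard ratio

d_fixed <- 4 * (qnorm(1 - alpha) + qnorm(1 - beta))^2 / log(hr)^2

D_TOTAL   <- 269                 # value hardcoded in scripts/actions.R
inflation <- D_TOTAL / d_fixed   # implied group-sequential inflation factor
```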

3. Arms (with endpoints)

Per scenario, both arms are reconstructed inside run_scenario(...). The control arm is invariant; the treatment arm’s PFS is rebuilt from the scenario’s (delay, post_hr) tuple.

ep_pfs_ctrl <- endpoint(name = "pfs", type = "tte",
                        generator = rexp, rate = log(2) / 20)
ep_pfs_trt  <- make_trt_pfs_endpoint(delay = delay, post_hr = post_hr)

ctrl <- arm(name = "control");      ctrl$add_endpoints(ep_pfs_ctrl)
exp1 <- arm(name = "experimental"); exp1$add_endpoints(ep_pfs_trt)

make_trt_pfs_endpoint() (in helpers.R) builds a piecewise-constant hazard: lambda_ctrl during [0, delay) and post_hr * lambda_ctrl thereafter. tail_end = 1000 is required because PiecewiseConstantExponentialRNG returns Inf if the trailing end_time is Inf — a non-obvious gotcha worth recording.
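
For readers who want to see the hazard structure without the package, the same piecewise-constant model can be sampled in base R by inverting the cumulative hazard. The function below is a hypothetical stand-in (its name and arguments are not from helpers.R) and does not use PiecewiseConstantExponentialRNG, so the tail_end issue does not arise.

```r
# Hypothetical stand-in for the delayed-effect generator; NOT the code in helpers.R.
# Hazard is lambda_ctrl on [0, delay) and post_hr * lambda_ctrl afterwards.
r_delayed_exp <- function(n, delay, post_hr, lambda_ctrl = log(2) / 20) {
  e       <- rexp(n)              # unit-exponential draws = target cumulative hazard
  h_delay <- lambda_ctrl * delay  # cumulative hazard accumulated by the end of the delay
  ifelse(e < h_delay,
         e / lambda_ctrl,                                  # event before the delay ends
         delay + (e - h_delay) / (post_hr * lambda_ctrl))  # event in the late period
}

set.seed(1)
x <- r_delayed_exp(1e5, delay = 6, post_hr = 0.62)  # S3-like treatment arm
```

Every draw is finite because the late-period hazard extends indefinitely. Under this model the fraction of events before month 6 is the same as in the control arm, which is the defining feature of a delayed effect.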

4. Trial setup

tr <- trial(
  name         = paste0("nph_", sc_id),
  n_patients   = 500,
  duration     = 200,
  enroller     = StaggeredRecruiter,
  accrual_rate = data.frame(end_time = Inf, piecewise_rate = 20),
  dropout      = rexp,
  rate         = -log(0.85) / 50,
  silent       = TRUE
)
tr$add_arms(sample_ratio = c(1, 1), ctrl, exp1)

Duration is generous so the event-driven FA milestone reliably fires. An earlier attempt with duration = 96 and a `| calendarTime(96)` backstop on the milestone condition failed because TrialSimulator (TS) pre-validates that eventNumber(...) is reachable.

5. Milestones

m_ia1   <- milestone(name = "ia1",
                      when = eventNumber(endpoint = "pfs", n = 132),
                      action = action_ia1)
m_ia2   <- milestone(name = "ia2",
                      when = eventNumber(endpoint = "pfs", n = 202),
                      action = action_ia2)
m_final <- milestone(name = "final",
                      when = eventNumber(endpoint = "pfs", n = 269),
                      action = action_final)

Each fires when the cumulative number of PFS events crosses its threshold. 132/202/269 implement IF = 0.49/0.75/1.00 of the rpact D_total = 269.
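
The thresholds can be reproduced from the information fractions. The sketch below assumes the counts were rounded up to the next whole event, which matches 132/202/269 but is not stated in the source.

```r
# Event thresholds from information fractions, assuming rounding up.
info_fractions <- c(ia1 = 0.49, ia2 = 0.75, final = 1.00)
d_total        <- 269

events_at_analysis <- ceiling(info_fractions * d_total)

# Realized information fractions after rounding; slightly above nominal
realized_if <- events_at_analysis / d_total
```

Because of the rounding, the realized fractions at the interim analyses are marginally above 0.49 and 0.75; the hardcoded boundaries correspond to the nominal fractions.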

6. Action functions

analyze_pfs <- function(trial, milestone_name) {
  data <- trial$get_locked_data(milestone_name = milestone_name)
  lr <- fitLogrank(formula = Surv(pfs, pfs_event) ~ arm,
                   placebo = "control", data = data,
                   alternative = "less")
  cph <- fitCoxph(formula = Surv(pfs, pfs_event) ~ arm,
                  placebo = "control", data = data,
                  alternative = "less", scale = "log hazard ratio")
  list(z = -lr$z, log_hr = cph$estimate)
}

action_ia1 <- function(trial, ...) {
  r <- analyze_pfs(trial, "ia1")
  trial$save(value = as.integer(r$z > EFFICACY_BOUNDS["IA1"]),
             name = "reject_pfs_ia1")
  trial$save(value = as.integer(r$z < FUTILITY_BOUNDS["IA1"]),
             name = "futility_ia1")
  trial$save(value = r$z,      name = "z_pfs_ia1")
  trial$save(value = r$log_hr, name = "loghr_pfs_ia1")
}
# action_ia2 / action_final follow the same shape with their own bounds

Sign convention. fitLogrank(alternative = "less") returns z<0 when the treatment hazard is lower; rpact’s bounds are positive (treatment-better-is-positive). The action negates z so the comparison z > EFFICACY_BOUNDS matches the rpact convention directly.

Per replicate per analysis the action saves: efficacy rejection flag, futility crossing flag, the z statistic, and the Cox log-HR. Power and stop-stage are derived in post-processing in main.R.
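
A minimal sketch of that post-processing follows, on toy data. The ia2/final column names are assumed to mirror the ia1 names saved above, and the definitions used here (power ignores futility; stop stage is the first crossing with futility enforced) are one reading that is consistent with the operating-characteristics table, not a copy of main.R.

```r
# Toy per-replicate flags; ia2/final column names are assumed, not confirmed.
res <- data.frame(
  reject_pfs_ia1   = c(1, 0, 0, 0),
  futility_ia1     = c(0, 0, 1, 0),
  reject_pfs_ia2   = c(1, 1, 0, 0),
  futility_ia2     = c(0, 0, 0, 0),
  reject_pfs_final = c(1, 1, 1, 0)
)

# Non-binding power: an efficacy boundary is crossed at any analysis,
# regardless of futility crossings.
power_nonbinding <- mean(with(res,
  reject_pfs_ia1 == 1 | reject_pfs_ia2 == 1 | reject_pfs_final == 1))

# Stop stage if futility were enforced: the first boundary crossed, in order.
stop_stage <- with(res,
  ifelse(reject_pfs_ia1 == 1,   "IA1 eff",
  ifelse(futility_ia1 == 1,     "IA1 fut",
  ifelse(reject_pfs_ia2 == 1,   "IA2 eff",
  ifelse(futility_ia2 == 1,     "IA2 fut",
  ifelse(reject_pfs_final == 1, "FA eff", "FA no-rej"))))))

stop_stage_dist <- prop.table(table(stop_stage))
```

The third toy replicate crosses futility at IA1 but would have rejected at the final analysis. It counts toward non-binding power but not toward any efficacy stop, which is why power can exceed the sum of the efficacy-stop columns.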

7. Operating characteristics

PFS power, P(stop), median observed HR, expected duration — by scenario:

Scenario	Delay (mo)	Post HR	Power	P(stop IA1 eff)	P(stop IA1 fut)	P(stop IA2 eff)	P(stop IA2 fut)	P(stop FA eff)	P(stop FA no-rej)	Median obs HR	E[dur] non-bind (mo)	E[dur] binding (mo)
S0_PH — PH baseline (HR 0.667)	0	0.6667	91.0%	22.4%	1.4%	46.3%	2.9%	21.4%	5.6%	0.667	42.24	32.91
S1_d3_h60 — 3-mo delay, post HR 0.60	3	0.6	92.7%	12.8%	3.2%	51.2%	2.8%	26.2%	3.8%	0.660	42.65	33.90
S2_d6_h55 — 6-mo delay, post HR 0.55	6	0.55	86.8%	5.9%	11.4%	39.9%	4.4%	35.1%	3.3%	0.676	42.50	34.36
S3_d6_h62 — 6-mo delay, post HR 0.62 (matched to PH)	6	0.62	70.8%	3.3%	14.4%	26.2%	9.2%	36.2%	10.7%	0.735	41.28	34.46
S4_d9_h50 — 9-mo delay, post HR 0.50	9	0.5	76.6%	1.5%	18.9%	22.8%	10.3%	41.8%	4.7%	0.707	42.17	34.58

Monte Carlo precision (1-sigma):

S0_PH: power 91.0% ± 0.9 pp (Monte Carlo SE)
S1_d3_h60: power 92.7% ± 0.8 pp (Monte Carlo SE)
S2_d6_h55: power 86.8% ± 1.1 pp (Monte Carlo SE)
S3_d6_h62: power 70.8% ± 1.4 pp (Monte Carlo SE)
S4_d9_h50: power 76.6% ± 1.3 pp (Monte Carlo SE)
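
These standard errors follow from the binomial formula with 1000 replicates per scenario. A short check, with the power values copied from the table:

```r
# Binomial Monte Carlo standard error of a simulated power estimate.
power <- c(S0_PH = 0.910, S1_d3_h60 = 0.927, S2_d6_h55 = 0.868,
           S3_d6_h62 = 0.708, S4_d9_h50 = 0.766)
n_reps <- 1000

mc_se_pp <- 100 * sqrt(power * (1 - power) / n_reps)  # in percentage points
mc_se_pp_rounded <- round(mc_se_pp, 1)
```

Halving these standard errors would take roughly four times as many replicates, since the SE scales with 1/sqrt(n_reps).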

Reading the table.

PH baseline (S0) lands at 91.0%, within about one Monte Carlo SE (0.9 pp) of the rpact 90% target, which indicates the boundaries and the simulation are aligned.
3-mo delay with HR 0.60 (S1) comes in at 92.7%, nominally above S0: the deeper late-period HR compensates for the short lag. The 1.7 pp gap is comparable to Monte Carlo noise (SEs of 0.8 to 0.9 pp per scenario), so read this as no loss rather than a clear gain.
6-mo delay with HR 0.55 (S2) drops to 86.8%, about 4 pp below S0. The deeper late HR (0.55 vs 0.667 under PH) buys back most, but not all, of what the 6-month lag costs.
6-mo delay with HR 0.62, matched to PH (S3), is the worst at 70.8%. This is the scenario the user should worry about: a 6-month delay with a post-delay HR only slightly deeper than the PH design value (0.62 vs 0.667) costs ~20 pp of power.
9-mo delay with HR 0.50 (S4) reaches 76.6%, still about 14 pp below S0: even a very deep late HR does not fully offset a 9-month lag.

Median observed HR ranges from 0.660 to 0.735 across scenarios. In delayed-effect scenarios the median observed HR is biased upward (closer to 1) relative to the late-period HR, because the Cox model fits a single HR over the whole follow-up and so averages the no-effect window into the estimate.

Expected duration is reported two ways:

non-binding: mean of milestone_time_<final> (every replicate runs to FA in the simulation, per the TrialSimulator principle that trials don’t actually stop early)
binding: mean of the calendar time at the first crossing (efficacy or futility), reflecting the duration if the futility rule were enforced
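
The two definitions can be sketched on toy data as follows. The column names and the first_crossing vector are hypothetical; a replicate that crosses no boundary before the final analysis is assigned the final milestone time in both versions.

```r
# Toy milestone times in months; column names are hypothetical.
times <- data.frame(t_ia1   = c(24, 25, 23),
                    t_ia2   = c(32, 33, 31),
                    t_final = c(42, 43, 41))

# Analysis at which each replicate first crosses a boundary
# ("final" also covers replicates that never cross).
first_crossing <- c("ia1", "final", "ia2")

# Non-binding: every replicate runs to the final analysis
dur_nonbinding <- mean(times$t_final)

# Binding: each replicate stops at its first crossing
col_idx     <- match(paste0("t_", first_crossing), names(times))
stop_time   <- as.matrix(times)[cbind(seq_len(nrow(times)), col_idx)]
dur_binding <- mean(stop_time)
```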

8. Caveats and limitations

NPH scenarios are illustrative, chosen to span 0/3/6/9-month delays with varying post-delay HRs. Production scenarios should reflect the prior knowledge specific to the agent’s mechanism.
ORR is omitted from this simulation by design — it does not depend on the PFS NPH structure. ORR power is in the SOC run.
Boundaries assume PH at the design stage. Under genuine NPH, OBF boundaries are not optimal — alternative spending or weighted log-rank statistics could regain power. Outside scope here.
PiecewiseConstantExponentialRNG requires a finite trailing end_time — passing Inf produces all-Inf samples (no events). Bug-fix recorded in helpers.R.
fitLogrank(alternative='less') returns z on the raw log-rank scale (z<0 = treatment better). The action functions negate z to match rpact’s positive boundary convention.

Milestone times — PH baseline