The user provided a fixed Phase 3 oncology spec (implementation mode): a two-arm trial with 1:1 randomization, exponential PFS with a median of 60 months in the placebo arm and HR = 0.74 for treatment, 1200 patients, and group-sequential testing with one interim and one final analysis at information fractions 0.66 and 1.00, using O’Brien-Fleming alpha spending, one-sided α = 0.025, and an 80% power target. The simulation answers four questions: power at the interim, overall power, expected trial duration accounting for the binding efficacy stop, and the calendar-time distribution of when the interim and final milestones fire.
Adaptive sample-size reassessment was raised by the user as a possible follow-up but is explicitly out of scope for this run.
| Item | Value | Notes |
|---|---|---|
| Endpoint | PFS, TTE, exponential | single primary endpoint |
| Placebo hazard | log(2) / 60 ≈ 0.01155 / month | median PFS = 60 mo |
| Treatment hazard | log(2) / 60 * 0.74 ≈ 0.00855 / month | HR = 0.74 |
| Sample size | 1200 patients | 1:1 randomization |
| Trial duration cap | 96 months | generous upper bound; final fires on event count, not calendar |
| Accrual schedule | 24/mo for months 0–6, then 42/mo | linear-midpoint approximation of 6→42 ramp |
| Dropout | exponential, rate = -log(1 - 0.025) / 12 ≈ 0.00211 / month | 2.5% by month 12, both arms |
| Stratification factors | none | |
| Seed | NULL (auto per replicate) | |
| Interim trigger | 233 PFS events | IF = 0.66 |
| Final trigger | 353 PFS events | IF = 1.00 |
| Interim z-bound | 2.524 (one-sided upper) | from gsDesign::gsSurv with sfLDOF |
| Final z-bound | 1.992 (one-sided upper) | from gsDesign::gsSurv with sfLDOF |
| Replicates | 1000 | |
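The rate entries in the table can be cross-checked with base R alone. The following is a minimal sketch, not part of the production scripts; the variable names are illustrative. It uses `qexp` and `pexp` to confirm that the stated rates reproduce the placebo median, the treatment median implied by HR = 0.74, and the 2.5% dropout landmark at month 12.

```r
# Rates exactly as written in the spec table
lambda_placebo   <- log(2) / 60
lambda_treatment <- log(2) / 60 * 0.74
lambda_dropout   <- -log(1 - 0.025) / 12

# Medians implied by the exponential PFS hazards (months)
median_placebo   <- qexp(0.5, rate = lambda_placebo)
median_treatment <- qexp(0.5, rate = lambda_treatment)

# Cumulative dropout probability by month 12
dropout_by_12 <- pexp(12, rate = lambda_dropout)

print(round(c(lambda_placebo, lambda_treatment, lambda_dropout), 5))
print(round(c(median_placebo, median_treatment, dropout_by_12), 3))
```

The treatment median is simply 60 / 0.74, since dividing the hazard ratio into the placebo median is exact under exponential survival.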
Boundaries and required event counts come from
scripts/boundaries.R. The information fractions are
pre-specified in the protocol and the event counts deterministically fix
them, so this calculation is constant across replicates — it is run once
and the literals are hardcoded into actions.R.
library(gsDesign)

gs <- gsSurv(
  k         = 2,
  test.type = 1,
  alpha     = 0.025,
  beta      = 0.20,
  timing    = c(0.66, 1.00),
  sfu       = sfLDOF,
  lambdaC   = log(2) / 60,
  hr        = 0.74,
  ratio     = 1
)
--- Critical z-values (one-sided upper) ---
[1] 2.524189 1.991501
--- Cumulative alpha spent ---
[1] 0.005798279 0.025000000
--- Target events per stage (cumulative) ---
[1] 233 353
gsSurv also reports its own implied N (3120) and study
duration (18 months) under its own assumed accrual model. These are
not used in the simulation — the simulation uses the
user-specified 1200 patients and the accrual schedule in §4. Only the
event counts (233 / 353) and z-bounds (2.524 / 1.992) are taken from
this calculation.
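As an independent sanity check on the hardcoded literals, the spending and bounds can be approximately reproduced in base R without gsDesign. This is a sketch under stated assumptions, not the code in scripts/boundaries.R: it uses the Lan-DeMets O'Brien-Fleming-type spending function α(t) = 2 − 2Φ(z_{α/2} / √t), treats the two log-rank z-statistics as bivariate normal with correlation √0.66 under H₀, and adds a Schoenfeld fixed-design event count for comparison with the 353-event target. All helper names (`obf_spend`, `cross_final`) are illustrative.

```r
alpha     <- 0.025
t_interim <- 0.66

# Lan-DeMets O'Brien-Fleming-type cumulative alpha spending
obf_spend <- function(t, alpha) {
  2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t))
}

alpha_1 <- obf_spend(t_interim, alpha)  # cumulative alpha spent at interim
c1      <- qnorm(1 - alpha_1)           # interim z-bound (one-sided upper)

# Final bound c2 solves P(Z1 < c1, Z2 >= c2) = alpha - alpha_1 under H0,
# with corr(Z1, Z2) = sqrt(t_interim).
rho <- sqrt(t_interim)
cross_final <- function(c2) {
  integrand <- function(z1) {
    dnorm(z1) * pnorm((c2 - rho * z1) / sqrt(1 - rho^2), lower.tail = FALSE)
  }
  integrate(integrand, lower = -Inf, upper = c1)$value
}
c2 <- uniroot(function(x) cross_final(x) - (alpha - alpha_1),
              interval = c(1, 4), tol = 1e-8)$root

# Schoenfeld events for a fixed (no-interim) design, 1:1 allocation
d_fixed <- 4 * (qnorm(1 - alpha) + qnorm(0.80))^2 / log(0.74)^2

print(c(alpha_1 = alpha_1, c1 = c1, c2 = c2, d_fixed = d_fixed))
```

The fixed-design count should come out slightly below the group-sequential target, which is expected because the interim look inflates the required events a little. gsSurv uses its own sample-size method, so exact agreement on event counts is not expected.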
Both arms share the same endpoint structure — a single TTE PFS
endpoint generated by rexp — with only the rate parameter
differing. Inlining log(2) / 60 and
log(2) / 60 * 0.74 at each call site keeps the placebo
median and the hazard ratio visible to a reviewer.
ep_pfs_placebo <- endpoint(
  name      = "pfs",
  type      = "tte",
  generator = rexp,
  rate      = log(2) / 60
)
placebo <- arm(name = "placebo")
placebo$add_endpoints(ep_pfs_placebo)
Median PFS = 60 months under exponential survival.
ep_pfs_treatment <- endpoint(
  name      = "pfs",
  type      = "tte",
  generator = rexp,
  rate      = log(2) / 60 * 0.74
)
treatment <- arm(name = "treatment")
treatment$add_endpoints(ep_pfs_treatment)
HR = 0.74 implies median PFS = 60 / 0.74 ≈ 81.1 months under exponential survival. Because both arms are exponential, the hazard ratio is constant over time, so Cox PH and log-rank are both valid; we use log-rank per spec.
accrual_rate <- data.frame(
  end_time       = c(6, Inf),
  piecewise_rate = c(24, 42)
)

tr <- trial(
  name         = "phase3_pfs_obf",
  n_patients   = 1200,
  duration     = 96,
  enroller     = StaggeredRecruiter,
  accrual_rate = accrual_rate,
  dropout      = rexp,
  rate         = -log(1 - 0.025) / 12
)
tr$add_arms(sample_ratio = c(1, 1), placebo, treatment)
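The accrual schedule above also fixes when enrollment should finish, which is useful context for the milestone times reported later. The sketch below is plain arithmetic on the `accrual_rate` table and assumes patients arrive at exactly the stated piecewise rates; the names `n_first_phase`, `months_second_phase`, and `enrollment_end` are illustrative and not part of TrialSimulator.

```r
accrual_rate <- data.frame(
  end_time       = c(6, Inf),
  piecewise_rate = c(24, 42)
)
n_patients <- 1200

# Patients enrolled during the first phase (months 0 to 6 at 24/mo)
n_first_phase <- accrual_rate$piecewise_rate[1] * accrual_rate$end_time[1]

# Months needed at the second-phase rate to enroll everyone else
months_second_phase <- (n_patients - n_first_phase) / accrual_rate$piecewise_rate[2]

# Calendar month at which the 1200th patient is enrolled
enrollment_end <- accrual_rate$end_time[1] + months_second_phase

print(c(n_first_phase = n_first_phase, enrollment_end = enrollment_end))
```

Under that assumption enrollment closes a little after month 31, well inside the 96-month duration cap.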
Dropout uses rate = -log(1 - 0.025) / 12, which matches 2.5% dropout by month 12 in both arms (single-landmark exponential default; no Weibull solver needed).
Two milestones, in chronological order. Both are event-driven on PFS so their information fractions are deterministic across replicates.
m_interim <- milestone(
  name   = "interim",
  when   = eventNumber(endpoint = "pfs", n = 233),
  action = action_interim
)

m_final <- milestone(
  name   = "final",
  when   = eventNumber(endpoint = "pfs", n = 353),
  action = action_final
)
The TrialSimulator engine never actually stops a trial early — every
replicate runs through both milestones, and the binding efficacy stop is
applied post-hoc in §7 from the saved reject_interim
flag.
Both actions follow the same shape: lock data, run a one-sided
log-rank against placebo, save the test stat and decision flag. The OBF
z-bounds are hardcoded from boundaries.R so no optimizer
runs per replicate.
action_interim <- function(trial, ...) {
  data <- trial$get_locked_data(milestone_name = "interim")

  fit <- fitLogrank(
    formula     = Surv(pfs, pfs_event) ~ arm,
    placebo     = "placebo",
    data        = data,
    alternative = "less"
  )

  # OBF interim z-bound = 2.524 (one-sided upper) from scripts/boundaries.R.
  # fitLogrank with alternative = "less" returns z < 0 when treatment
  # hazard is lower, so the rejection rule is z <= -2.524.
  trial$save(value = fit$z, name = "z_interim")
  trial$save(value = fit$p, name = "p_interim")
  trial$save(value = as.integer(fit$z <= -2.524), name = "reject_interim")
}
The test is fitLogrank with alternative = "less" because the treatment is expected to have the lower PFS hazard; this returns z < 0 under the alternative.
The action makes no trial$set_duration / $resize / $remove_arms calls.
Saved values: z_interim, p_interim, reject_interim (1 if z ≤ −2.524). reject_interim feeds power-at-interim and the expected-duration calculation; z_interim and p_interim let a reviewer cross-check the decision flag against the test statistic.
action_final <- function(trial, ...) {
  data <- trial$get_locked_data(milestone_name = "final")

  fit <- fitLogrank(
    formula     = Surv(pfs, pfs_event) ~ arm,
    placebo     = "placebo",
    data        = data,
    alternative = "less"
  )

  # OBF final z-bound = 1.992 (one-sided upper) from scripts/boundaries.R.
  trial$save(value = fit$z, name = "z_final")
  trial$save(value = fit$p, name = "p_final")
  trial$save(value = as.integer(fit$z <= -1.992), name = "reject_final")
}
Saved values: z_final, p_final, reject_final (1 if z ≤ −1.992). reject_final combines with reject_interim to give overall power.
All values below come from 1000 replicates of the production run (scripts/main.R).
“What is the probability of crossing the OBF interim bound under the assumed HR = 0.74?”
mean(out$reject_interim)
# 0.415
gsDesign predicted ~0.407 for marginal interim crossing under HR = 0.74; the simulation’s 0.415 differs by 0.008, which is within one MCSE (≈ √(0.415·0.585/1000) ≈ 0.016).
“What is the probability of rejecting H₀ at either the interim or the final?”
mean(out$reject_interim | out$reject_final)
# 0.805
Matches the 80% design target. MCSE ≈ √(0.805·0.195/1000) ≈ 0.013, so the 95% confidence band is roughly 0.78–0.83 — comfortably consistent with 0.80.
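The Monte Carlo standard errors quoted for both power estimates follow from the binomial formula √(p(1 − p)/n). A minimal base-R sketch, using the reported estimates as literals and an illustrative helper named `mcse`, reproduces them along with the normal-approximation 95% band for overall power.

```r
# Binomial Monte Carlo standard error of a simulated proportion
mcse <- function(p, n) sqrt(p * (1 - p) / n)

n_rep     <- 1000
p_interim <- 0.415   # simulated power at the interim
p_overall <- 0.805   # simulated overall power

mcse_interim <- mcse(p_interim, n_rep)
mcse_overall <- mcse(p_overall, n_rep)

# Gap between simulation and the gsDesign prediction, in MCSE units
gap_in_mcse <- abs(p_interim - 0.407) / mcse_interim

# Normal-approximation 95% band for overall power
ci_overall <- p_overall + c(-1, 1) * qnorm(0.975) * mcse_overall

print(round(c(mcse_interim, mcse_overall, gap_in_mcse), 4))
print(round(ci_overall, 3))
```

Halving the MCSE would require roughly four times as many replicates, which is the main lever if a tighter band around the 80% target is needed.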
“How long does the trial run on average, accounting for binding efficacy stop at the interim?”
stop_time <- ifelse(out$reject_interim == 1,
                    out[["milestone_time_<interim>"]],
                    out[["milestone_time_<final>"]])
mean(stop_time)
# 47.46 months
Interpretation: with binding efficacy stopping, ~41.5% of replicates end at the interim trigger (mean 39.1 mo) and the rest at the final (mean 53.4 mo); the weighted average is ~47.5 months. Without the binding stop (every trial runs to final), expected duration would be the unconditional mean final time, 53.4 mo.
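The weighted average in this interpretation can be checked by hand from the reported summary numbers. The sketch below plugs in the interim rejection rate and the mean milestone times from the milestone-time summary as literals. It assumes the stopping decision is roughly independent of milestone timing, so it approximates `mean(stop_time)` rather than replacing it; the variable names are illustrative.

```r
p_stop_interim <- 0.415  # share of replicates rejecting at the interim
t_interim      <- 39.11  # mean interim milestone time (months)
t_final        <- 53.38  # mean final milestone time (months)

# Mixture of the two stopping times, weighted by the stopping probability
expected_duration <- p_stop_interim * t_interim +
  (1 - p_stop_interim) * t_final

# Average calendar time saved relative to always running to the final
months_saved <- t_final - expected_duration

print(round(c(expected_duration = expected_duration,
              months_saved = months_saved), 2))
```

The result agrees with the simulated 47.46 months to within rounding, which suggests the independence approximation is harmless here.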
summarizeMilestoneTime(out)
| Milestone | Mean (mo) | Median (mo) | SD (mo) | n |
|---|---|---|---|---|
| interim | 39.11 | 39.09 | 1.49 | 1000 |
| final | 53.38 | 53.31 | 2.02 | 1000 |
The SDs are small relative to the means — accrual is large enough that event-count timing is concentrated. The 96-month duration cap is far above any observed final-trigger time, so no replicate is censored by the calendar limit.
Without the binding efficacy stop, the expected duration would be mean(out[["milestone_time_<final>"]]) ≈ 53.4 months instead.