Simulation prompt — HIMALAYA (NCT03298451)
Simulate the operating characteristics of the HIMALAYA Phase 3 trial design. Source: AstraZeneca SAP D419CC00002, Edition 4.0, 30-JUL-2021 (kept locally as source_sap.pdf).
Trial
First-line (1L) unresectable advanced hepatocellular carcinoma; open-label, three-arm RCT (after Amendment 4 closed the original Arm B):
- Arm A — durvalumab monotherapy
- Arm C — STRIDE: single priming dose of tremelimumab + durvalumab
- Arm D — sorafenib (active control)
Allocation: 1:1:1 (A:C:D). Total N = 1,324 (~441 per arm). [Curator-supplied; redacted in SAP, taken from public results.]
Primary endpoint: overall survival (OS), randomization to death from any cause.
Hypothesis structure and α budget
Familywise α = 5% two-sided, strongly controlled.
| Tag | Comparison | Test | α budget | Gating |
| --- | --- | --- | --- | --- |
| ORR-IA1 | A & C | ORR / DoR at IA1 | 0.001 | n/a (out of scope) |
| H1 | C vs D | OS superiority | 0.049 | Primary |
| H2 | A vs D | OS non-inferiority (margin 1.08) | recycled from H1 | After H1 success |
| H3 | A vs D | OS superiority | recycled from H2 | After H2 NI achieved |
| OS36 | C vs D | 3-yr OS rate | recycled from H3 | After H1+H2+H3 all positive |
Spending function: Lan-DeMets approximation of O’Brien-Fleming across IA2 + FA. If H1 fails at IA2 but succeeds at FA, H2 is tested only at FA.
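The gating column and the IA2/FA rule above amount to a fixed-sequence check repeated at each look. A minimal sketch, assuming each hypothesis has already been reduced to a pass/fail flag against its own nominal boundary at that look; the function name and input layout are hypothetical, and OS36 is left out:

```python
def apply_gating(ia2, fa):
    """Fixed-sequence gating over two looks.

    ia2, fa: dicts mapping "H1", "H2", "H3" to True when that test crosses
    its nominal boundary at that look (hypothetical inputs).
    Returns the look at which each hypothesis is rejected, or None.
    """
    order = ["H1", "H2", "H3"]
    rejected_at = {h: None for h in order}
    for look_name, crossed in (("IA2", ia2), ("FA", fa)):
        for h in order:
            if rejected_at[h] is not None:
                continue              # already rejected at an earlier look
            if crossed[h]:
                rejected_at[h] = look_name
            else:
                break                 # gate closed: nothing downstream is tested at this look
    return rejected_at


# H1 misses at IA2 and crosses at FA, so H2 is first examined at FA.
result = apply_gating(
    ia2={"H1": False, "H2": True, "H3": True},
    fa={"H1": True, "H2": True, "H3": False},
)
print(result)
```

Because the inner loop breaks at the first failure, an H2 flag that happens to be True at IA2 is ignored whenever H1 has not yet been rejected.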
Survival assumptions
- Arm D (sorafenib): exponential, median OS = 11.5 months.
- Arm C (STRIDE) vs D: average HR = 0.70 with a 2-month delay in separation. Translate this into a hazard specification consistent with the SAP wording.
- Arm A (durvalumab mono) vs D: HR = 0.84, proportional hazards (no delay specified in SAP).
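One possible translation of these assumptions into a sampler is a piecewise-exponential model: the control hazard applies to every arm during the delay, and a constant hazard ratio applies afterwards. The sketch below inverts the cumulative hazard to draw OS times. `HR_POST = 0.65` is a placeholder chosen only for illustration, not a SAP value; picking it so that the average HR comes out at 0.70 is the translation step left open above.

```python
import math
import random
import statistics

LAMBDA_D = math.log(2) / 11.5   # sorafenib hazard per month (median OS 11.5)
DELAY = 2.0                     # months with no separation for Arm C (HR = 1)
HR_POST = 0.65                  # ASSUMPTION: placeholder post-delay HR, not from the SAP
HR_A = 0.84                     # Arm A vs D, proportional hazards


def sample_os(rng, hr_post=1.0, delay=0.0, base=LAMBDA_D):
    """Draw one OS time (months) with hazard `base` before `delay` and
    `base * hr_post` afterwards, by inverting the cumulative hazard."""
    e = rng.expovariate(1.0)          # target cumulative hazard
    h_delay = base * delay            # cumulative hazard accrued during the delay
    if e <= h_delay:
        return e / base
    return delay + (e - h_delay) / (base * hr_post)


rng = random.Random(2021)
os_d = [sample_os(rng) for _ in range(20000)]
os_c = [sample_os(rng, HR_POST, DELAY) for _ in range(20000)]
os_a = [sample_os(rng, HR_A) for _ in range(20000)]

med_d = statistics.median(os_d)
med_c = statistics.median(os_c)
med_a = statistics.median(os_a)
print(round(med_d, 1), round(med_c, 1), round(med_a, 1))
```

The sample medians are a quick sanity check: Arm D should sit near 11.5 months and Arm A near 11.5 / 0.84 months.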
Trial conduct
- Accrual: non-uniform over 22 months, total enrollment 1,324.
- Follow-up after accrual ends: 15.5 months. Total study horizon: 37.5 months from FSR.
- Dropout: none modeled (matches SAP). Censoring only at administrative cutoffs.
- Stratification: SAP stratifies by etiology (HBV / HCV / other), ECOG (0 / 1), macrovascular invasion (Y / N). This pilot uses an unstratified analysis.
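The conduct assumptions above can be sketched as three small helpers: draw randomization dates, find the calendar month at which an event-count trigger is reached, and apply administrative censoring at that cutoff. The accrual shape is an assumption: the SAP wording here only says "non-uniform", so the power-law ramp `RAMP = 1.5` and all function names are hypothetical.

```python
import math
import random

ACCRUAL_MONTHS = 22.0
RAMP = 1.5   # ASSUMPTION: enrollment CDF is (t / 22) ** RAMP, a slow-start ramp


def sample_entry(rng, n):
    """Randomization dates in months from FSR, by inverse-CDF sampling."""
    return [ACCRUAL_MONTHS * rng.random() ** (1.0 / RAMP) for _ in range(n)]


def cutoff_for_events(entry, os_time, target):
    """Calendar month (from FSR) at which the `target`-th death occurs."""
    death_dates = sorted(e + t for e, t in zip(entry, os_time))
    return death_dates[target - 1]


def censor_at(entry, os_time, cutoff):
    """Administrative censoring only: (follow-up time, event flag) for
    every patient randomized on or before `cutoff`."""
    out = []
    for e, t in zip(entry, os_time):
        if e > cutoff:
            continue
        out.append((min(t, cutoff - e), e + t <= cutoff))
    return out


# Illustration only: 882 patients (two arms of 441) all on the control hazard.
rng = random.Random(7)
entry = sample_entry(rng, 882)
os_time = [rng.expovariate(math.log(2) / 11.5) for _ in range(882)]
cutoff = cutoff_for_events(entry, os_time, 404)
observed = censor_at(entry, os_time, cutoff)
n_events = sum(flag for _, flag in observed)
print(round(cutoff, 1), n_events)
```

Running the same cutoff function across replications gives the distribution of calendar time to each look.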
| Look | Trigger | C+D events | A+D events | Approx calendar | SAP-stated 2-sided α |
| --- | --- | --- | --- | --- | --- |
| IA2 | ~404 events in C+D | 404 | ~453 | ~30 mo | H1: 0.0222 · H2: 0.0248 |
| FA | ~515 events in C+D | 515 | ~560 | ~37.5 mo | H1: 0.0425 · H2: 0.0418 |
Boundaries should be derived from observed event counts at each look using Lan-DeMets / OF; the SAP α values are cross-checks.
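One way to derive the nominal levels from observed event counts is sketched below. It assumes two looks per hypothesis, an information fraction equal to interim events over final events, and that the SAP's two-sided levels are twice a one-sided Lan-DeMets O'Brien-Fleming-type boundary. That last convention is an assumption about the SAP, not something stated in this prompt, and the helper names are hypothetical. The final boundary is found by numerically integrating the bivariate normal and bisecting.

```python
from statistics import NormalDist

N = NormalDist()


def ld_of_spend(t, alpha_one_sided):
    """Lan-DeMets O'Brien-Fleming-type cumulative spend at information fraction t."""
    z = N.inv_cdf(1 - alpha_one_sided / 2)
    return 2 * (1 - N.cdf(z / t ** 0.5))


def cross_at_final(c1, c2, t1, steps=2000):
    """P(Z1 < c1, Z2 >= c2) with corr(Z1, Z2) = sqrt(t1), by the midpoint rule."""
    rho = t1 ** 0.5
    s = (1 - t1) ** 0.5
    lo = -8.0
    h = (c1 - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += N.pdf(z) * (1 - N.cdf((c2 - rho * z) / s)) * h
    return total


def two_look_nominal(events_ia, events_fa, alpha_two_sided):
    """Two-sided nominal levels at (interim, final) from observed event counts."""
    a = alpha_two_sided / 2                 # one-sided budget
    t1 = events_ia / events_fa
    a1 = ld_of_spend(t1, a)                 # spent at the interim
    c1 = N.inv_cdf(1 - a1)
    lo, hi = 0.0, 6.0
    for _ in range(50):                     # bisection on the final boundary
        mid = (lo + hi) / 2
        if cross_at_final(c1, mid, t1) > a - a1:
            lo = mid                        # crossing too likely: raise the boundary
        else:
            hi = mid
    c2 = (lo + hi) / 2
    return 2 * a1, 2 * (1 - N.cdf(c2))


h1_ia2, h1_fa = two_look_nominal(404, 515, 0.049)
print(f"H1 nominal two-sided: IA2 {h1_ia2:.4f}, FA {h1_fa:.4f}")
```

Under these assumptions the H1 levels for 404 and 515 events should land close to the SAP's 0.0222 and 0.0425; a material gap would point to a different spending convention or information-fraction definition. In a simulation, the planned counts are replaced by the events actually observed at each look.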
Analysis methods
- OS: log-rank test (unstratified for this pilot); HR via Cox PH.
- H2 NI: reject H0 if upper limit of the 2-sided α-adjusted CI for HR(A/D) is below 1.08.
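The two analysis rules can be sketched in plain Python. The log-rank function below is an unstratified implementation with the usual hypergeometric variance. The NI check takes a log hazard ratio and its standard error from whichever Cox routine is used (the Cox fit itself is not shown), and in the usage lines the standard error is approximated as 2 / sqrt(events), which assumes 1:1 allocation. Function names and the example estimates are hypothetical.

```python
import math
from statistics import NormalDist


def logrank_z(times, events, arm):
    """Unstratified log-rank Z for arm == 1 vs arm == 0.
    Negative values mean fewer deaths than expected in arm 1."""
    data = sorted(zip(times, events, arm))
    n = len(data)                       # at risk overall
    n1 = sum(a for _, _, a in data)     # at risk in arm 1
    o_minus_e = 0.0
    var = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = m = m1 = 0
        while i < len(data) and data[i][0] == t:   # gather ties at time t
            _, ev, a = data[i]
            d += ev
            d1 += ev * a
            m += 1
            m1 += a
            i += 1
        if d > 0 and n > 1:
            o_minus_e += d1 - d * n1 / n
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        n -= m
        n1 -= m1
    return o_minus_e / math.sqrt(var)


def ni_rejects(log_hr, se, nominal_two_sided, margin=1.08):
    """True when the upper limit of the alpha-adjusted CI for HR is below the margin."""
    z = NormalDist().inv_cdf(1 - nominal_two_sided / 2)
    return math.exp(log_hr + z * se) < margin


z_demo = logrank_z([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
se_fa = 2 / math.sqrt(560)              # approximation under 1:1 allocation
ni_a = ni_rejects(math.log(0.86), se_fa, 0.0418)   # hypothetical estimate
ni_b = ni_rejects(math.log(0.95), se_fa, 0.0418)   # hypothetical estimate
print(round(z_demo, 3), ni_a, ni_b)
```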
Operating characteristics to compute
Use 1,000 replications per scenario.
- Planning alternative (HR(C/D) = 0.70 average with 2-month delay; HR(A/D) = 0.84):
  - Empirical power, H1 at IA2 and at FA, plus cumulative.
  - Empirical power, H2 NI at FA.
  - Empirical power, H3 superiority at FA.
  - Empirical probability of early stop at IA2 for H1.
- Global null (HR = 1 for both C vs D and A vs D): empirical FWER under the alpha-recycling rule.
- Boundary verification: OF-derived boundaries vs SAP’s 0.0222 / 0.0425 (H1) and 0.0248 / 0.0418 (H2/H3).
- Calendar timing: distribution of months from FSR to IA2 and to FA. Compare to sponsor’s 30 / 37.5.
- NPH-translation sensitivity for H1: rerun with average HR fixed at 0.70 but the post-delay slope varied to bracket the translation choice.
- (Optional) MaxCombo sensitivity for H1: max{logrank, FH(0,1), FH(1,1)} per Karrison 2016 / He-Koch-Kurland 2021.
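With 1,000 replications, every empirical rate in this list carries Monte Carlo error, which matters when comparing against sponsor values. A small sketch of the binomial standard error, using a normal approximation for the 95% half-width (the function name is hypothetical):

```python
def mc_standard_error(p, n_reps=1000):
    """Binomial standard error of an empirical proportion from n_reps replications."""
    return (p * (1 - p) / n_reps) ** 0.5


for label, p in [("H1 power at IA2", 0.85), ("H2 NI power at FA", 0.84), ("FWER", 0.05)]:
    half_width = 1.96 * mc_standard_error(p)
    print(f"{label}: p = {p:.2f}, 95% half-width = {half_width:.3f}")
```

This gives roughly ±2 percentage points for a power near 85% and about ±1.4 points for an FWER near 5%, so differences smaller than that should not be read as disagreement with the SAP.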
| Quantity | Sponsor value |
| --- | --- |
| Power H1 at IA2 | ≥ 85% |
| Power H1 at FA | ≥ 97% |
| Power H2 NI at FA | ~ 84% |
| Time to IA2 | ~ 30 months |
| Time to FA | ~ 37.5 months |
| Smallest detectable average HR (H1, FA) | 0.84 |
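The smallest detectable HR in the last row can be cross-checked with the Schoenfeld approximation, under which the log HR estimate has standard error 1 / sqrt(d · p · (1 − p)) for d events and allocation fraction p. This is a sketch under that approximation with 1:1 allocation between C and D; the function name is hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def min_detectable_hr(events, nominal_two_sided, alloc=0.5):
    """Largest observed HR that still crosses the boundary (Schoenfeld approximation)."""
    z = NormalDist().inv_cdf(1 - nominal_two_sided / 2)
    se = 1 / sqrt(events * alloc * (1 - alloc))
    return exp(-z * se)


mdhr = min_detectable_hr(515, 0.0425)
print(round(mdhr, 2))
```

With 515 events and the SAP's FA level of 0.0425, this rounds to the sponsor's 0.84.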