S1801 Reconstructed Data: Hypothesis Testing Analysis

Methods Identified from the Paper

Based on the manuscript “A comparative study of two-sample hypothesis tests in the presence of long-term survivors” (full_3__3__3.tex), the following methods were identified and implemented:

  1. Log-rank test (standard) - Conventional non-parametric test (ρ=0, γ=0)
  2. Early weighted log-rank test - Fleming-Harrington (ρ=1, γ=0)
  3. Late weighted log-rank test - Fleming-Harrington (ρ=0, γ=1)
  4. Optimal weighted log-rank test - Fleming-Harrington weights (ρ=-1, γ=0), used as an approximation to the optimal weight
  5. Yang-Prentice (YP) test - Adaptive weighted log-rank test
  6. Two-Stage (TS) test - Qiu and Sheng (2008)
  7. Mixture Cure Model LRT - Parametric estimation with Weibull uncured distribution
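Methods 1 to 4 differ only in the Fleming-Harrington weight w(t) = S(t-)^ρ (1 - S(t-))^γ applied to the observed-minus-expected events at each event time, where S is the pooled Kaplan-Meier estimate. The following is a minimal pure-Python sketch of that statistic, not the code used for this report: the function name, the argument layout, and the four-subject toy data set are assumptions made for illustration.

```python
import math

def fh_weighted_logrank(times, events, groups, rho=0.0, gamma=0.0):
    """Two-sample Fleming-Harrington G(rho, gamma) test (illustrative sketch).

    times  : follow-up times
    events : 1 = event observed, 0 = censored
    groups : arm label, 0 or 1
    Returns (chi-square statistic on 1 df, p-value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    s_prev = 1.0            # pooled Kaplan-Meier estimate just before t
    num, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)                      # at risk, pooled
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e == 1 and g == 1)
        w = (s_prev ** rho) * ((1.0 - s_prev) ** gamma)          # FH weight
        num += w * (d1 - d * n1 / n)                             # observed - expected
        if n > 1:                                                # hypergeometric variance
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        s_prev *= 1.0 - d / n                                    # update Kaplan-Meier
    stat = num * num / var
    p_value = math.erfc(math.sqrt(stat / 2.0))                   # chi-square(1) upper tail
    return stat, p_value

# Hypothetical toy data: four subjects, all events observed.
toy_times, toy_events, toy_groups = [1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 1, 1]
standard_stat, standard_p = fh_weighted_logrank(toy_times, toy_events, toy_groups)
late_stat, late_p = fh_weighted_logrank(toy_times, toy_events, toy_groups,
                                        rho=0.0, gamma=1.0)
print(standard_stat, late_stat)
```

Setting ρ=0, γ=0 gives the standard log-rank test; ρ=1, γ=0 weights early differences; ρ=0, γ=1 weights late differences. Implementations differ in details such as using S(t) rather than S(t-), so small numerical differences from other software are to be expected.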

Data Summary

Group                 n    Events  Censored  Median Time  Max Time
Neoadjuvant-adjuvant  155  42      113       10.90        36.00
Adjuvant-only         159  75      84        8.39         36.00
Total                 314  117     197       -            -

Hypothesis Test Results (P-Values)

Method                                  Test Statistic  DF  P-Value   Significant (p < 0.01)
Log-rank test (standard)                9.6455          1   0.001898  Yes
Early weighted log-rank (ρ=1, γ=0)      7.1775          1   0.007382  Yes
Late weighted log-rank (ρ=0, γ=1)       15.5901         1   0.000079  Yes
Optimal weighted log-rank (ρ=-1, γ=0)   12.0471         1   0.000519  Yes
Yang-Prentice (YP) test                 15.7487         2   0.000380  Yes
Two-Stage (TS) test                     9.6455          1   0.001898  Yes
Mixture Cure Model LRT (Weibull)        18.5452         2   0.000094  Yes

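Note: The original S1801 trial reported p = 0.004 from the planned log-rank test. The reconstructed data give p = 0.0019 for the same test. The two values are not identical, as expected when working from reconstructed rather than original patient-level data, but both are significant at the 0.01 level and favor the same arm.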
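The p-values in the table can be checked directly from the test statistics and degrees of freedom, since the chi-square upper tail has a closed form for 1 and 2 degrees of freedom. The short sketch below does this with the standard library only; the helper name chi2_upper_tail is an assumption made for illustration, and the numbers are copied from the table above.

```python
import math

def chi2_upper_tail(stat, df):
    """Upper-tail probability of a chi-square variable for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))   # equals 2 * (1 - Phi(sqrt(stat)))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

# (method, test statistic, DF, reported p-value), as listed in the results table
REPORTED = [
    ("Log-rank test (standard)", 9.6455, 1, 0.001898),
    ("Early weighted log-rank", 7.1775, 1, 0.007382),
    ("Late weighted log-rank", 15.5901, 1, 0.000079),
    ("Optimal weighted log-rank", 12.0471, 1, 0.000519),
    ("Yang-Prentice (YP) test", 15.7487, 2, 0.000380),
    ("Two-Stage (TS) test", 9.6455, 1, 0.001898),
    ("Mixture Cure Model LRT (Weibull)", 18.5452, 2, 0.000094),
]

for method, stat, df, reported_p in REPORTED:
    recomputed = chi2_upper_tail(stat, df)
    print(f"{method:34s} reported={reported_p:.6f} recomputed={recomputed:.6f}")
```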


Cure Model Parameter Estimates (Weibull)

Parameter                             Value
Cure fraction (Adjuvant-only)         42.0%
Cure fraction (Neoadjuvant-adjuvant)  68.2%
Weibull shape parameter               1.4445
Weibull scale (Adjuvant-only)         9.14
Weibull scale (Neoadjuvant-adjuvant)  6.06

Interpretation: The neoadjuvant-adjuvant group shows a substantially higher cure fraction (68.2%) compared to the adjuvant-only group (42.0%), suggesting that pre-surgical treatment may improve long-term outcomes.
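To make the estimates concrete, the population survival under a mixture cure model is S(t) = π + (1 - π) S_u(t), where π is the cure fraction and S_u is the survival function of the uncured. The sketch below evaluates this with the Weibull estimates from the table. It assumes the parameterization S_u(t) = exp(-(t / scale)^shape) with a shape shared by both arms; the report does not state which parameterization was used, so the printed curves are illustrative only.

```python
import math

def mixture_cure_survival(t, cure_fraction, shape, scale):
    """Population survival: cured subjects never fail, uncured follow a Weibull.

    Assumed parameterization: S_u(t) = exp(-(t / scale) ** shape).
    """
    uncured_survival = math.exp(-((t / scale) ** shape))
    return cure_fraction + (1.0 - cure_fraction) * uncured_survival

SHAPE = 1.4445                      # shared Weibull shape from the table
ARMS = {                            # arm -> (cure fraction, Weibull scale)
    "Adjuvant-only": (0.420, 9.14),
    "Neoadjuvant-adjuvant": (0.682, 6.06),
}

for arm, (cure_fraction, scale) in ARMS.items():
    curve = [round(mixture_cure_survival(t, cure_fraction, SHAPE, scale), 3)
             for t in (0, 12, 24, 36)]
    print(arm, curve)
```

Under this model each curve starts at 1 and levels off at the arm's cure fraction, which is the plateau that the late-weighted tests are sensitive to.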


Model Comparison (AIC)

Cure Models

Model                     Log-Likelihood  Parameters  AIC      Cure Fraction (Adj)  Cure Fraction (Neo)
Cure Model (Log-normal)   -481.77         5           973.54   42.1%                67.8%
Cure Model (Weibull)      -494.81         5           999.61   42.0%                68.2%
Cure Model (Exponential)  -504.48         4           1016.96  37.0%                66.9%

Standard (Non-Cure) Models

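Model         Log-Likelihood  Parameters  AIC
Log-normal    -509.20         3           1024.41
Log-logistic  -515.47         3           1036.94
Weibull AFT   -521.77         3           1049.54
Exponential   -524.05         2           1052.11
Cox PH        -617.20         1           1236.40

Note: The Cox PH log-likelihood is a partial likelihood, so its AIC is not on the same scale as the AIC of the fully parametric models. It is listed for reference only and should not be read as evidence that the Cox model fits worse.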

Combined Ranking by AIC

Rank  Model                     Type            AIC      ΔAIC
1     Cure Model (Log-normal)   Cure Model      973.54   0.00
2     Cure Model (Weibull)      Cure Model      999.61   26.07
3     Cure Model (Exponential)  Cure Model      1016.96  43.42
4     Log-normal                Standard Model  1024.41  50.86
5     Log-logistic              Standard Model  1036.94  63.39
6     Weibull AFT               Standard Model  1049.54  76.00
7     Exponential               Standard Model  1052.11  78.56
8     Cox PH                    Standard Model  1236.40  262.86
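The AIC and ΔAIC columns follow from AIC = 2k - 2 log L, where k is the number of parameters and log L the maximized log-likelihood. The sketch below recomputes the ranking from the log-likelihoods and parameter counts reported above; the variable names are assumptions made for illustration.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# (model, type, log-likelihood, number of parameters), copied from the tables above
MODELS = [
    ("Cure Model (Log-normal)", "Cure Model", -481.77, 5),
    ("Cure Model (Weibull)", "Cure Model", -494.81, 5),
    ("Cure Model (Exponential)", "Cure Model", -504.48, 4),
    ("Log-normal", "Standard Model", -509.20, 3),
    ("Log-logistic", "Standard Model", -515.47, 3),
    ("Weibull AFT", "Standard Model", -521.77, 3),
    ("Exponential", "Standard Model", -524.05, 2),
    ("Cox PH", "Standard Model", -617.20, 1),
]

# Sort by AIC and report each model's distance from the best one.
ranked = sorted(((name, kind, aic(ll, k)) for name, kind, ll, k in MODELS),
                key=lambda row: row[2])
best_aic = ranked[0][2]
for rank, (name, kind, value) in enumerate(ranked, start=1):
    print(f"{rank}  {name:26s} {kind:15s} {value:8.2f} {value - best_aic:7.2f}")
```

Because the log-likelihoods in the tables are rounded to two decimals, values recomputed this way can differ from the tabulated AIC and ΔAIC by about 0.01.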

Conclusions

  1. All hypothesis tests are statistically significant (p < 0.01), providing strong evidence for a treatment effect favoring the neoadjuvant-adjuvant arm.

  2. The late weighted log-rank test gives the smallest p-value of the seven tests (p = 0.000079), suggesting the treatment effect is most pronounced at later follow-up times, consistent with a cure model framework.

  3. Cure models substantially outperform standard survival models based on AIC, with the log-normal cure model providing the best fit (AIC = 973.54 vs. 1024.41 for the best standard model).

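  4. Under the Weibull cure model, the estimated cure fraction is approximately 26 percentage points higher in the neoadjuvant-adjuvant arm (68.2% vs. 42.0%), indicating a meaningful increase in long-term survivors. The log-normal cure model gives a similar gap (67.8% vs. 42.1%).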

  5. The best-fitting model is the log-normal cure model, which accounts for the presence of long-term survivors and allows the treatment to affect both the cure fraction and the survival distribution of the uncured.


Files Generated

  • hypothesis_test_results.csv - P-values and test statistics for all methods
  • model_aic_comparison.csv - AIC comparison for cure and standard models