Win-based methods—such as generalized pairwise comparisons (GPC), the win ratio (WR), win odds (WO), and net benefit / “proportion in favor of treatment”—summarize treatment effects by comparing outcomes for pairs of individuals across treatment arms using a pre-specified clinical priority rule.citeturn0search1turn0search0turn8search3turn8search2 These methods are increasingly used for hierarchical composite endpoints where traditional “time-to-first event” analyses may underrepresent clinically important outcomes.citeturn0search0turn0search1turn8search3
For patient-centered CER and PCORI-style decision support, the central HTE question becomes: how do win-based treatment effects vary across baseline covariates and patient profiles? This requires (i) formal definitions of conditional win estimands—especially conditional win probability \(p_W(x)\) and conditional net benefit \(\Delta(x)\)—and (ii) methods that estimate these targets under realistic complexities (censoring, missing data, clustering, recurrent/terminal events, and confounding).citeturn1search0turn1search1turn4search17turn6search0
A key conceptual distinction is between win-probability-style estimands (Mann–Whitney / “probability of superiority”), which compare independent draws from the treated and control outcome distributions, and the (generally non-identifiable) “probability an individual benefits,” which concerns both potential outcomes of the same person. Paradoxes can arise if one conflates these.citeturn3search3turn3search7
Methodologically, the strongest current toolset for win-based HTE comprises: stratified/subgroup WR with homogeneity testing; probabilistic index models (PIMs) for pairwise regression; semiparametric proportional win-fractions regression; and causal weighting (IPTW) or censoring adjustment (IPCW/CovIPCW) for observational CER and informative censoring.citeturn13search0turn9search0turn3search0turn14search1turn14search0turn2search2 Emerging directions include nearest-neighbor pairing causal frameworks, win-fraction “pseudo-outcomes” to enable off-the-shelf ML HTE tools, and nonparametric MLE approaches for censoring + missingness.citeturn7search0turn5search1turn2search6
Let \(A\in\{0,1\}\) denote treatment assignment (1=treatment, 0=control), \(X\) baseline covariates, and \(Y\) the (possibly multicomponent) outcome used for win comparisons.
Win methods define a pairwise comparison function (a “win kernel”) \[ W(Y_i, Y_j)\in\{-1,0,+1\}, \] where \(W(Y_i,Y_j)=+1\) means “\(i\) wins over \(j\),” \(-1\) means “\(i\) loses,” and \(0\) indicates a tie. For hierarchical composites, \(W(\cdot,\cdot)\) is determined by a priority order of components and possibly thresholds that define clinically negligible differences as ties.citeturn0search1turn0search0turn8search3
A common estimand is the probability that a random treated individual has a more favorable outcome than a random control individual under the win rule: \[ p_W \;=\; \Pr\{W(Y^{(1)},Y^{(0)\,*})=+1\}, \] where \(Y^{(1)}\) is drawn from the marginal distribution under treatment and \(Y^{(0)\,*}\) is an independent draw from the marginal control distribution. The corresponding loss probability is \[ p_L \;=\; \Pr\{W(Y^{(1)},Y^{(0)\,*})=-1\}, \] and \(p_T=1-p_W-p_L\).
From these, standard summaries include:
- Net benefit / “proportion in favor” (difference):
  \[
  \Delta \;=\; p_W - p_L.
  \] citeturn0search1turn8search3turn8search2
- Win ratio:
  \[
  \mathrm{WR} \;=\; \frac{p_W}{p_L},
  \]
  in which ties enter neither the numerator nor the denominator; implementations differ in how they treat ties otherwise.citeturn0search0turn8search2
- Win odds (incorporating ties):
  \[
  \mathrm{WO} \;=\; \frac{p_W + \tfrac12 p_T}{p_L + \tfrac12 p_T}.
  \] citeturn2search1turn2search18
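To make these definitions concrete, here is a minimal sketch that computes all-pairs estimates of \(p_W\), \(p_L\), \(p_T\), \(\Delta\), WR, and WO. The two-level hierarchy (death status first, then a score in which differences below a hypothetical 5-point margin count as ties), the toy data, and the names `hierarchical_win` and `win_statistics` are illustrative assumptions, not any package's API; censoring is ignored.

```python
import numpy as np

def hierarchical_win(yi, yj, score_margin=5.0):
    """Hypothetical win kernel: +1 if subject i wins, -1 if i loses, 0 for a tie.

    Each outcome is a tuple (died, score). Priority 1: survival status
    (not dying beats dying). Priority 2: score, higher is better, with
    differences below `score_margin` treated as clinically negligible (tie).
    """
    died_i, score_i = yi
    died_j, score_j = yj
    if died_i != died_j:                 # decided on the first priority
        return 1 if died_j else -1
    diff = score_i - score_j             # fall through to the second priority
    if abs(diff) < score_margin:
        return 0
    return 1 if diff > 0 else -1

def win_statistics(treated, control, kernel=hierarchical_win):
    """All-pairs (treated x control) estimates of the marginal win summaries."""
    results = np.array([kernel(t, c) for t in treated for c in control])
    p_w = np.mean(results == 1)
    p_l = np.mean(results == -1)
    p_t = np.mean(results == 0)
    return {
        "p_W": p_w, "p_L": p_l, "p_T": p_t,
        "Delta": p_w - p_l,
        "WR": p_w / p_l,                              # ties ignored
        "WO": (p_w + 0.5 * p_t) / (p_l + 0.5 * p_t),  # ties split evenly
    }

# Toy data: (died, score) for three patients per arm
treated = [(False, 60.0), (False, 52.0), (True, 70.0)]
control = [(False, 50.0), (True, 40.0), (False, 61.0)]
stats = win_statistics(treated, control)
print(stats)
```

For these toy data the 9 pairs give 4 wins, 3 losses, and 2 ties, so \(\Delta = 1/9\), WR \(= 4/3\), and WO \(= 5/4\): WR discards the two ties, whereas WO credits half of each tie to both arms.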
Large-sample inference for \(\Delta\) and WR can be developed using multivariate multi-sample \(U\)-statistic theory.citeturn8search10turn8search2
To define HTE, introduce covariate-dependent (conditional) win probabilities. A useful general object is the two-profile conditional win probability \[ p_W(x,x') \;=\; \Pr\{W(Y^{(1)},Y^{(0)\,*})=+1 \mid X=x,\;X^{*}=x'\}, \] where \(Y^{(1)}\) is the treated potential outcome of an individual with covariates \(x\) and \(Y^{(0)\,*}\) is the control potential outcome of an independent individual with covariates \(x'\). PIMs are precisely regression models for such probabilistic-index parameters (probability of superiority) as functions of the covariates of both individuals compared.citeturn9search0turn9search14turn1search6
Two specializations are central for HTE reporting:
Same-profile conditional win probability \[ p_W(x) \equiv p_W(x,x), \] which compares two hypothetical draws from the treatment and control outcome distributions at the same baseline profile \(x\).
Conditional net benefit (win difference) \[ \Delta(x) \;=\; p_W(x) - p_L(x), \] with \(p_L(x)=\Pr\{W(\cdot)=-1\mid X=x\}\) defined analogously.
These define win-based analogues of a CATE: rather than contrasting means, risks, or hazards, they contrast probabilities of being better under the win rule. The “effect modification” question becomes whether \(\Delta(x)\) or \(\log\{\mathrm{WR}(x)\}\) varies with \(x\).citeturn9search0turn3search0
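As an illustration of the same-profile estimand, the sketch below approximates \(\Delta(x)\) by Monte Carlo under a purely hypothetical outcome model, \(Y^{(a)} \mid X=x \sim \mathcal{N}(0.5\,a\,x,\,1)\) with higher values better and no tie margin. Under that assumption \(Y^{(1)}-Y^{(0)\,*}\sim\mathcal{N}(0.5x,2)\), so \(\Delta(x)=2\Phi(0.5x/\sqrt2)-1=\operatorname{erf}(x/4)\), which the code prints alongside the simulation. The function name is illustrative.

```python
import numpy as np
from math import erf

def conditional_net_benefit(x, n_draws=200_000, margin=0.0, seed=0):
    """Monte Carlo estimate of Delta(x) = p_W(x) - p_L(x).

    Hypothetical model (illustration only): Y^(a) | X = x ~ Normal(0.5 * a * x, 1).
    Treated and control draws at the same profile x are independent,
    matching the definition of the same-profile win probability.
    """
    rng = np.random.default_rng(seed)
    y1 = rng.normal(0.5 * x, 1.0, n_draws)   # draws from the treated distribution at x
    y0 = rng.normal(0.0, 1.0, n_draws)       # independent draws from the control distribution at x
    diff = y1 - y0
    p_w = np.mean(diff > margin)
    p_l = np.mean(diff < -margin)
    return p_w - p_l

for x in (-2.0, 0.0, 2.0):
    # simulated value next to the closed form erf(x / 4)
    print(x, round(conditional_net_benefit(x), 3), round(erf(x / 4), 3))
```

Because the hypothetical treatment effect grows with \(x\), \(\Delta(x)\) moves from about \(-0.52\) at \(x=-2\) to about \(0.52\) at \(x=2\), which is the kind of effect modification a win-based HTE analysis is meant to detect.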
A critical and often under-specified part of win-based HTE is the pairing operator: how treated and control individuals are paired for comparison.
In an RCT, randomization yields identification of marginal win estimands because treatment assignment is independent of potential outcomes, enabling \(p_W\) and \(\Delta\) to be interpreted causally as population-level contrasts.citeturn1search1turn16search8turn3search3
In observational CER, causal interpretations require standard assumptions such as consistency/SUTVA, conditional exchangeability (unconfoundedness), and positivity.citeturn16search0turn12search10turn12search1 Under these, one can define causal win estimands that contrast the distributions of potential outcomes under \(A=1\) vs \(A=0\) using the win rule, and then estimate them via weighting or doubly robust methods.citeturn2search2turn15search3turn7search0
A critical caution is that the marginal “probability of superiority” estimand is not equal to the generally non-identifiable causal probability \(\Pr\{Y^{(1)} > Y^{(0)}\}\) for the same individual, and paradoxical reversals can occur if interpreted as such.citeturn3search7turn3search3
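A small worked example shows why the two quantities can diverge. Below, three hypothetical individuals have control outcomes 1, 2, 3, and two different couplings assign the same set of treated outcomes {1.5, 2.5, 3.5}. The marginal win probability depends only on the marginals, while the probability of individual benefit depends on the (unobservable) joint distribution.

```python
from itertools import product

y0 = [1.0, 2.0, 3.0]            # control potential outcomes of three individuals
coupling_a = [1.5, 2.5, 3.5]    # Y(1) = Y(0) + 0.5 for everyone
coupling_b = [3.5, 1.5, 2.5]    # same treated marginal, different pairing with Y(0)

def marginal_pw(y1, y0):
    """Pr{Y(1) > Y(0)*}: a random treated outcome vs an independent random control outcome."""
    pairs = list(product(y1, y0))
    return sum(a > b for a, b in pairs) / len(pairs)

def individual_benefit(y1, y0):
    """Pr{Y(1) > Y(0)} within the same individual; requires the joint law."""
    return sum(a > b for a, b in zip(y1, y0)) / len(y0)

for name, y1 in [("a", coupling_a), ("b", coupling_b)]:
    print(name, marginal_pw(y1, y0), individual_benefit(y1, y0))
```

Both couplings give \(p_W = 2/3\), yet the probability of individual benefit is 1 under coupling (a) and 1/3 under coupling (b), so data on the marginal distributions alone cannot distinguish them.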
```mermaid
flowchart TB
A["Define clinical priority rule<br/>W(·,·): win/loss/tie"] --> B[Choose estimand class]
B --> B1[Marginal: random treated vs random control<br/>p_W, p_L, Δ, WR, WO]
B --> B2["Conditional: profile-specific<br/>p_W(x), Δ(x), WR(x)"]
B --> B3[Pairing-sensitive causal targets<br/>stratified/matched/NN pairing]
B1 --> C[Method choices]
B2 --> C
B3 --> C
C --> C1[Design-based: stratified WR, subgroup tests]
C --> C2[Regression: PIM, PW regression, semiparametric U-statistic]
C --> C3[Causal: IPTW, IPCW/CovIPCW, DR/TMLE, cross-fitting]
C --> C4[ML HTE: win-fraction pseudo-outcomes, causal trees/forests]
C --> D[Inference layer]
D --> D1[U-statistic asymptotics / sandwich]
D --> D2[EIF / one-step / TMLE + cross-fitting]
D --> D3[Bootstrap / permutation]
D --> D4[Cluster-robust / cluster bootstrap]
```
This section profiles each method in turn: targeted estimand, assumptions, implementation steps, inference, software, pros/cons, and open gaps.
Targeted estimand. Stratum-specific marginal win estimands \(p_W(s), \Delta(s), \mathrm{WR}(s)\) for strata \(S(X)=s\), and (optionally) a Mantel–Haenszel-type combined estimand across strata.citeturn13search0turn13search1
Assumptions.
- Subgroups/strata are well-defined and not overly
sparse.citeturn13search0
- In observational settings, within-stratum exchangeability may still
fail unless covariates are adequately
controlled.citeturn12search10
Implementation steps.
1) Pre-specify strata and hierarchy. PCORI emphasizes transparent
reporting and discourages claims of differential effects based on
“significant in one subgroup but not
another.”citeturn1search0turn1search4
2) Compute stratum-specific WR, WO, and net benefit
(NB).citeturn13search0
3) Combine via stratified weighting (analogous to Mantel–Haenszel) if a
single summary is desired.citeturn13search1
4) Test heterogeneity using a Cochran-style homogeneity test adapted to
the stratified WR framework.citeturn13search1turn13search0
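Steps 3 and 4 can be sketched as follows, assuming stratum-specific log win ratios and standard errors are already available (for example from \(U\)-statistic variance formulas). This sketch uses inverse-variance weighting on the log scale as a simple stand-in for the Mantel–Haenszel-type weights of the cited framework; the function name and the two-stratum numbers are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def pooled_log_wr_and_q(log_wr, se):
    """Inverse-variance pooled log WR and Cochran's Q across strata.

    Under homogeneity, Q is approximately chi-square with
    (number of strata - 1) degrees of freedom.
    """
    log_wr = np.asarray(log_wr, dtype=float)
    w = 1.0 / np.asarray(se, dtype=float) ** 2        # inverse-variance weights
    pooled = np.sum(w * log_wr) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    q = np.sum(w * (log_wr - pooled) ** 2)            # Cochran's homogeneity statistic
    return pooled, pooled_se, q, len(log_wr) - 1

# Hypothetical two-stratum example
pooled, pooled_se, q, df = pooled_log_wr_and_q([0.2, 0.4], [0.1, 0.1])
p_value = erfc(sqrt(q / 2))   # chi-square upper tail; this closed form is valid only for df == 1
print(pooled, pooled_se, q, df, p_value)
```

Here the pooled log WR is 0.3 and \(Q = 2\) on 1 degree of freedom (\(p \approx 0.16\)); with more strata, a general chi-square tail function (e.g. `scipy.stats.chi2.sf`) would replace the `erfc` shortcut.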
Inference. Plug-in variance estimators (built from win-count covariance estimators) and large-sample normal approximations; can be complemented with bootstrap for finite-sample robustness.citeturn13search1turn8search10
Software. Win-statistic packages commonly support
stratification workflows; for example, the WR R package supports
stratified analyses of prioritized survival composites.citeturn3search5turn3search1
Pros/cons.
- Pros: highly interpretable for stakeholders; naturally aligned with
PCORI HTE reporting expectations.citeturn1search0turn1search4
- Cons: limited to low-dimensional HTE; multiplicity and low power
remain; sparse strata can destabilize estimates.citeturn1search0
Open gaps. Multiplicity-aware stratified win HTE procedures and principled integration with high-dimensional adjustment.
Targeted estimand. A conditional probabilistic index (pairwise win probability) of the form \[ \Pr\{Y \preceq Y^* \mid X, X^*\} \quad \text{or} \quad \Pr\{W(Y,Y^*)=+1\mid X,X^*\}, \] linked to a regression predictor via a link function (often logit/probit).citeturn9search0turn9search14turn1search6
Assumptions.
- Correct specification of the PIM link-linear structure (or use as a
working model).citeturn9search0
- Pairwise pseudo-observations create dependence; robust variance or
design-restricted comparisons are needed for clustered
designs.citeturn5search2turn9search2
Implementation steps.
1) Define an ordering or win rule; define pairwise response (e.g.,
“wins” indicator).citeturn9search0
2) Construct pairwise covariates (often using contrasts \(X-X^*\)).citeturn9search0
3) Fit PIM with treatment and treatment×covariate interactions to
estimate HTE in \(p_W(x,x')\) or in
\(p_W(x,x)\).citeturn9search0turn9search22
4) Address computational scaling (\(O(n^2)\) pseudo-observations) using
scalable algorithms if needed.citeturn9search3
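A minimal sketch of steps 1 to 3, assuming a scalar outcome (higher is better), one covariate, and the logit link. Each unordered pair \(i<j\) contributes a pseudo-observation scored 1 for a win of \(i\), 0.5 for a tie, and 0 for a loss, with covariate contrasts \(Z_i - Z_j\) for \(Z=(A, X, A\cdot X)\) and no intercept. The coefficients are fitted by Newton steps on the quasi-binomial score. The names `fit_pim` and `same_profile_pi` and the simulated data are hypothetical. The sketch gives point estimates only: pseudo-observations sharing a subject are dependent, so naive logistic standard errors are invalid and a sandwich-type variance is required.

```python
import numpy as np

def fit_pim(y, a, x, n_iter=25):
    """Point estimates for a simple PIM:
    Pr(Y_i > Y_j) + 0.5 Pr(Y_i = Y_j) = expit{(Z_i - Z_j)' beta}, Z = (A, X, A*X).
    """
    z = np.column_stack([a, x, a * x]).astype(float)
    i, j = np.triu_indices(len(y), k=1)            # all pairs i < j: O(n^2) rows
    d = z[i] - z[j]                                # pairwise covariate contrasts
    resp = (y[i] > y[j]) + 0.5 * (y[i] == y[j])    # win = 1, tie = 0.5, loss = 0
    beta = np.zeros(z.shape[1])
    for _ in range(n_iter):                        # Newton-Raphson on the quasi-binomial score
        p = 1.0 / (1.0 + np.exp(-d @ beta))
        grad = d.T @ (resp - p)
        hess = d.T @ (d * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def same_profile_pi(beta, x0):
    """Fitted Pr(treated beats control) + 0.5 Pr(tie) at a shared profile x0:
    the contrast Z_i - Z_j reduces to (1, 0, x0)."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[2] * x0)))

rng = np.random.default_rng(1)
n = 300
a = rng.integers(0, 2, n)
x = rng.normal(size=n)
y = x + a * (0.5 + 1.0 * x) + rng.normal(size=n)   # hypothetical data: effect grows with x
beta = fit_pim(y, a, x)
print(beta, same_profile_pi(beta, -1.0), same_profile_pi(beta, 1.0))
```

The treatment-by-covariate coefficient `beta[2]` is the HTE parameter on the logit probabilistic-index scale; `same_profile_pi` maps it back to a patient-facing probability at a chosen profile.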
Inference. Semiparametric theory provides asymptotic normality and consistent covariance estimators; robust/sandwich methods are typical.citeturn9search0
Software. pim (CRAN) implements PIMs
and provides practical guidance.citeturn1search7turn1search3
Pros/cons.
- Pros: continuous covariates + interactions for HTE; direct modeling of
conditional win probability.citeturn9search0
- Cons: computational burden and dependence; interpretation requires
mapping regression output to patient-facing
probabilities.citeturn9search3turn9search0
Open gaps. Robust, scalable PIMs for large pragmatic trials; principled handling of censoring in hierarchical comparisons; and standardized patient-facing summaries of \(\Delta(x)\).
Targeted estimand. A semiparametric regression model for covariate-dependent win fractions whose special cases include the two-sample WR; coefficients can be interpreted as log win-ratio-type effects under a proportionality structure.citeturn3search0turn3search4
Assumptions.
- A “proportional win-fractions” structure (analogous in spirit to
proportional hazards) underpinning the regression parameter
interpretation.citeturn3search0
- Censoring assumptions relevant to the chosen win estimand; diagnostics
are needed when follow-up-time dependence threatens
interpretability.citeturn4search17turn4search1
Implementation steps.
1) Specify \(W(\cdot,\cdot)\) for the
prioritized endpoint.citeturn3search0
2) Fit the proportional win-fractions regression; include
treatment×covariate interactions for
HTE.citeturn3search0turn3search4
3) Use model diagnostics proposed for the PW model (score-process-based
checks).citeturn3search0
4) Translate fitted parameters into predicted \(p_W(x)\) or \(\Delta(x)\) for representative
profiles.citeturn3search0
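Step 4 needs a mapping from a (predicted) win ratio to patient-facing quantities. Because \(p_W + p_L = 1 - p_T\) and \(\mathrm{WR} = p_W/p_L\), it follows that \(p_L = (1-p_T)/(1+\mathrm{WR})\), \(p_W = \mathrm{WR}\cdot p_L\), and \(\Delta = (1-p_T)(\mathrm{WR}-1)/(\mathrm{WR}+1)\). The sketch below applies this to hypothetical coefficients of the form \(\log\mathrm{WR}(x) = b_{\text{trt}} + b_{\text{int}}\,x\); holding the tie probability fixed across profiles is an additional simplifying assumption, since \(p_T\) generally varies with \(x\).

```python
import math

def net_benefit_from_wr(log_wr, p_tie):
    """Convert a log win ratio and an assumed tie probability into (p_W, p_L, Delta).

    Uses p_W + p_L = 1 - p_T and WR = p_W / p_L.
    """
    wr = math.exp(log_wr)
    p_l = (1.0 - p_tie) / (1.0 + wr)
    p_w = wr * p_l
    return p_w, p_l, p_w - p_l

# Hypothetical fitted coefficients: log WR(x) = b_trt + b_int * x
b_trt, b_int = 0.25, -0.10
for x in (0.0, 1.0, 2.5):
    print(x, net_benefit_from_wr(b_trt + b_int * x, p_tie=0.2))
```

At \(x = 2.5\) the hypothetical log WR is zero, so the predicted net benefit is exactly zero; for WR \(=3\) and \(p_T = 0.2\), the mapping gives \(p_W = 0.6\), \(p_L = 0.2\), and \(\Delta = 0.4\).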
Inference. Semiparametric \(U\)-process theory; model-based standard errors and diagnostics are provided in the methodological framework.citeturn3search0
Software. The WR R package implements PW
regression and provides vignettes illustrating its
use.citeturn3search5turn3search1turn3search16
Pros/cons.
- Pros: regression-based HTE within a WR-aligned model family; bridges
WR and Cox-type thinking.citeturn3search0turn3search4
- Cons: proportionality may fail; interpretability can be follow-up
dependent if estimands are not
time-restricted.citeturn4search17turn4search1
Open gaps. Extensions to doubly robust / ML nuisance fitting and to complex missingness.
Targeted estimand. A marginal win-odds estimand (linked to the marginal probabilistic index) with covariate adjustment for precision gains, rather than a conditional PIM coefficient.citeturn4search2turn4search10
Assumptions.
- Correct specification of the adjustment approach (as developed via the
connection to the marginal probabilistic
index).citeturn4search2
- Type I error may be inflated in small samples, as reported in the
authors' simulations.citeturn4search2turn4search10
Implementation steps.
1) Define WO (and tie rule).citeturn2search1turn2search4
2) Implement covariate adjustment using the marginal-PI
connection.citeturn4search2
3) Produce adjusted WO and CIs; evaluate small-sample performance if
sample sizes are modest.citeturn4search2
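The link between WO and the marginal probabilistic index (PI) used in step 2 is simple algebra: with \(\mathrm{PI} = p_W + \tfrac12 p_T\) and \(p_W+p_L+p_T=1\), the WO denominator equals \(1-\mathrm{PI}\), so \(\mathrm{WO} = \mathrm{PI}/(1-\mathrm{PI})\) and \(\log\mathrm{WO} = \operatorname{logit}(\mathrm{PI})\). The sketch below converts a covariate-adjusted PI estimate and its standard error (hypothetical inputs, from whatever adjusted marginal method is used) into WO with a delta-method confidence interval on the log scale.

```python
import math

def win_odds_ci_from_pi(pi_hat, se_pi, z=1.96):
    """Map a probabilistic index PI = p_W + p_T / 2 to WO = PI / (1 - PI),
    with a delta-method CI computed on the log-odds scale."""
    log_wo = math.log(pi_hat / (1.0 - pi_hat))       # log WO = logit(PI)
    se_log_wo = se_pi / (pi_hat * (1.0 - pi_hat))    # d logit(p)/dp = 1 / {p (1 - p)}
    lo, hi = log_wo - z * se_log_wo, log_wo + z * se_log_wo
    return math.exp(log_wo), math.exp(lo), math.exp(hi)

# Hypothetical adjusted estimate: PI = 0.60 with SE 0.03
print(win_odds_ci_from_pi(0.60, 0.03))
```

For PI \(=0.60\) this gives WO \(=1.5\) with an approximate 95% interval of about 1.17 to 1.92; computing the interval on the log scale keeps it positive and asymmetric, as WO requires.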
Inference. Derived within the marginal PI framework; simulation-based operating characteristics reported by the authors.citeturn4search2
Software. Research code availability varies; the method is presented as an arXiv preprint with accompanying theory and simulation examples.citeturn4search2turn4search6
Pros/cons.
- Pros: retains marginal estimand interpretation (useful for
CER/policy); can increase power when covariates are
prognostic.citeturn4search2
- Cons: methodology is newer; careful attention to small-sample
calibration is needed.citeturn4search2
Open gaps. Generalization to hierarchical time-to-event + non-survival mixtures with censoring and missing endpoints.
Targeted estimand. A marginal causal win estimand (ATE/ATT-type) depending on the weighting scheme; addresses baseline imbalance/confounding via inverse probability of treatment weighting.citeturn2search2turn14search14
Assumptions.
- Observational CER: conditional exchangeability, positivity, and a
sufficiently accurate propensity
model.citeturn12search10turn12search6
- Weight stability (diagnostics and truncation often
needed).citeturn12search2turn12search10
Implementation steps.
1) Estimate propensity scores \(e(X)=\Pr(A=1\mid
X)\).citeturn2search2turn12search10
2) Construct IPTW weights (ATE, stabilized ATE, or
ATT).citeturn2search2
3) Compute weighted win counts and derive WR (and/or \(\Delta\),
WO).citeturn2search2turn7search5
4) Check balance and overlap; consider
truncation.citeturn12search2turn12search10
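The steps above can be sketched for a scalar outcome (higher is better), assuming a logistic propensity model and ATE-type weights. Each treated-control pair is weighted by \(w_i w_j\) with \(w = 1/e(X)\) for treated and \(w = 1/\{1-e(X)\}\) for controls, and normalizing by the total pair weight gives Hájek-type weighted win probabilities. The function names, the truncation bounds, and the simulated null scenario (confounded assignment, no treatment effect) are hypothetical; variance estimation is omitted.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Propensity model: logistic regression by Newton-Raphson (intercept added)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        beta += np.linalg.solve(X1.T @ (X1 * (p * (1 - p))[:, None]), X1.T @ (y - p))
    return 1.0 / (1.0 + np.exp(-X1 @ beta))

def iptw_win_stats(y, a, X, trunc=(0.01, 0.99)):
    """IPTW (ATE) weighted p_W, p_L, Delta, and WR for a scalar outcome."""
    e = np.clip(fit_logistic(X, a), *trunc)         # truncated propensity scores
    t, c = a == 1, a == 0
    w1, w0 = 1.0 / e[t], 1.0 / (1.0 - e[c])
    pair_w = np.outer(w1, w0)                       # weight for each treated-control pair
    diff = y[t][:, None] - y[c][None, :]
    p_w = np.sum(pair_w * (diff > 0)) / pair_w.sum()
    p_l = np.sum(pair_w * (diff < 0)) / pair_w.sum()
    return p_w, p_l, p_w - p_l, p_w / p_l

rng = np.random.default_rng(2)
n = 3000
X = rng.normal(size=n)
a = (rng.random(n) < 1 / (1 + np.exp(-X))).astype(int)   # confounded treatment assignment
y = X + rng.normal(size=n)                                # hypothetical null: no treatment effect
naive = np.mean(np.sign(y[a == 1][:, None] - y[a == 0][None, :]))
p_w, p_l, delta, wr = iptw_win_stats(y, a, X)
print(round(naive, 3), round(delta, 3))
```

In this null scenario the unweighted net benefit is clearly positive because treated patients have higher \(X\), while the weighted estimate should be close to zero.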
Inference. As developed in the IPTW-adjusted WR framework with derived variance estimators and simulation assessments.citeturn2search2turn14search17
Software. The WINS R package documents IPTW-adjusted
WR and related win-statistic
tooling.citeturn7search1turn7search5
Pros/cons.
- Pros: directly aligned with marginal causal estimands for CER;
compatible with subgroup HTE by stratification or interaction
modeling.citeturn2search2turn1search0
- Cons: sensitive to poor overlap; does not automatically handle
censoring/missingness unless combined with IPCW/DR
layers.citeturn12search2turn14search1
Open gaps. Unified weighting for simultaneous confounding + informative censoring + missing endpoints in win settings.
Targeted estimand. A win estimand corrected for censoring-induced bias (independent censoring via IPCW; dependent censoring via CovIPCW using baseline and/or time-dependent predictors).citeturn14search1turn14search0
Assumptions.
- IPCW validity requires correct modeling of the censoring mechanism (or
at least correct weights for the relevant comparison
contributions).citeturn14search1turn14search0
- For CovIPCW, dependent censoring is assumed predictable by included
covariates/time-dependent
covariates.citeturn14search0turn14search8
Implementation steps.
1) Model censoring (e.g., estimate \(\Pr(C\ge
t\mid \text{covariates})\)).citeturn14search1
2) Construct IPCW (or CovIPCW) weights and apply to pairwise win
contributions.citeturn14search1turn14search8
3) Compute adjusted WR/WO/NB; assess sensitivity to censoring model
choices.citeturn14search1turn14search0
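A simplified sketch of steps 1 to 3 for a single survival endpoint, not the cited multi-component IPCW or CovIPCW estimators. The target is \(\Pr(T_0 < T_1,\ T_0 \le \tau)\), the probability that a treated patient outlives a control patient whose death occurs by a horizon \(\tau\). A pair is determinable when the control death is observed and the treated patient is still under follow-up at that time; under independent censoring this happens with probability \(G_0(T_0^-)\,G_1(T_0)\), where \(G_a\) is the arm-specific censoring survival function, so weighting by its inverse removes censoring bias. \(G_a\) is estimated by Kaplan–Meier here (CovIPCW would instead model it given covariates); distinct event times, the function names, and the simulated scenario are assumptions.

```python
import numpy as np

def km_censoring_survival(time, event):
    """Kaplan-Meier estimate of G(t) = Pr(C > t), treating censorings (event == 0)
    as the events of interest. Assumes distinct times; returns a step function."""
    order = np.argsort(time)
    t, cens = time[order], event[order] == 0
    at_risk = len(t) - np.arange(len(t))
    surv = np.cumprod(np.where(cens, 1.0 - 1.0 / at_risk, 1.0))
    def G(s):
        idx = np.searchsorted(t, s, side="right") - 1   # index of last time <= s
        return np.where(idx >= 0, surv[np.maximum(idx, 0)], 1.0)
    return G

def ipcw_win_probability(time1, event1, time0, event0, tau):
    """IPCW estimate of Pr(T0 < T1, T0 <= tau) from right-censored data in two arms."""
    G1 = km_censoring_survival(time1, event1)
    G0 = km_censoring_survival(time0, event0)
    obs = (event0 == 1) & (time0 <= tau)                 # observed control deaths by tau
    t0 = time0[obs]
    weights = 1.0 / (G0(t0) * G1(t0))                    # inverse probability the pair is determinable
    outlives = time1[:, None] > t0[None, :]              # treated still followed and alive at t0
    return np.sum(outlives * weights[None, :]) / (len(time1) * len(time0))

rng = np.random.default_rng(3)
n = 2000
t1, t0 = rng.exponential(2.0, n), rng.exponential(1.0, n)       # death hazards 0.5 and 1.0
c1, c0 = rng.exponential(1 / 0.3, n), rng.exponential(1 / 0.3, n)
time1, event1 = np.minimum(t1, c1), (t1 <= c1).astype(int)
time0, event0 = np.minimum(t0, c0), (t0 <= c0).astype(int)
est = ipcw_win_probability(time1, event1, time0, event0, tau=2.0)
truth = (1 - np.exp(-3.0)) / 1.5    # integral of exp(-t) * exp(-0.5 t) over [0, 2]
print(round(est, 3), round(truth, 3))
```

The loss probability follows by swapping the arms; \(\tau\) must be chosen so that both censoring survival estimates stay bounded away from zero, otherwise the weights explode.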
Inference. Asymptotic variance formulas and simulations are provided for IPCW and extensions for dependent censoring.citeturn14search1turn14search0
Software. WINS supports IPCW-style
adjusted win estimation (and documents CovIPCW
concepts).citeturn7search1turn14search3
Pros/cons.
- Pros: addresses a major threat to interpretable win estimands:
censoring and follow-up-time
dependence.citeturn14search1turn4search17
- Cons: reliant on censoring model quality; can be complex with multiple
endpoints and time-dependent
covariates.citeturn14search0turn14search8
Open gaps. Cross-fitted IPCW for flexible ML censoring models while maintaining valid inference in win settings.
Targeted estimand. Generally, a causal \(U\)-statistic estimand defined by a contrast kernel \(W(\cdot,\cdot)\) averaged over the marginal distributions of potential outcomes; this includes Mann–Whitney-type causal effects and is directly relevant to win-based kernels.citeturn15search3turn15search0turn12search1
Assumptions.
- Causal identification in observational settings: unconfoundedness and
positivity.citeturn12search1turn12search10
- Double robustness typically requires at least one of the nuisance
components (e.g., propensity or outcome/distribution model) to be
correctly specified under the method’s
assumptions.citeturn15search3turn15search5
- Cross-fitting requires sample-splitting and appropriate rate
conditions to avoid overfitting
bias.citeturn12search0turn12search12
Implementation steps (blueprint).
1) Specify the win kernel \(W\) and
define the causal target (marginal \(\Delta\) vs conditional \(\Delta(x)\)).citeturn15search3turn7search0
2) Estimate nuisance functions: propensity \(e(X)\) and outcome/distributional
components as required for the DR
score.citeturn12search1turn15search3
3) Construct an orthogonal / EIF-based one-step or AIPW estimator; for
flexible ML nuisances, use \(K\)-fold
cross-fitting.citeturn12search0turn12search12
4) For TMLE-style targeting, use the TMLE framework (when an EIF is
available) to update nuisance fits toward the
estimand.citeturn7search3turn12search3
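The blueprint is easiest to see in the special case of a single binary endpoint (\(Y=1\) favorable). There the kernel factorizes: for independent draws, \(p_W = \mu_1(1-\mu_0)\) and \(p_L = \mu_0(1-\mu_1)\) with \(\mu_a = E[Y^{(a)}]\), so \(\Delta = \mu_1 - \mu_0\) and standard cross-fitted AIPW applies to each \(\mu_a\). The sketch below uses \(K\)-fold cross-fitting with logistic nuisance models; it does not cover hierarchical or censored kernels, which require the more complex EIF derivations noted below. Function names, truncation bounds, and the simulated data are hypothetical.

```python
import numpy as np

def _logistic_fit_predict(X_train, y_train, X_test, n_iter=25):
    """Logistic regression via Newton-Raphson (intercept added); returns test-set probabilities."""
    Xtr = np.column_stack([np.ones(len(X_train)), X_train])
    Xte = np.column_stack([np.ones(len(X_test)), X_test])
    beta = np.zeros(Xtr.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xtr @ beta))
        beta += np.linalg.solve(Xtr.T @ (Xtr * (p * (1 - p))[:, None]), Xtr.T @ (y_train - p))
    return 1.0 / (1.0 + np.exp(-Xte @ beta))

def crossfit_aipw_net_benefit(y, a, X, k=5, seed=0):
    """Cross-fitted AIPW estimate of causal (p_W, p_L, Delta) for a binary endpoint."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % k
    psi1, psi0 = np.empty(n), np.empty(n)
    for f in range(k):
        tr, te = folds != f, folds == f              # nuisances fitted off-fold, evaluated on-fold
        e = np.clip(_logistic_fit_predict(X[tr], a[tr], X[te]), 0.01, 0.99)
        m1 = _logistic_fit_predict(X[tr & (a == 1)], y[tr & (a == 1)], X[te])
        m0 = _logistic_fit_predict(X[tr & (a == 0)], y[tr & (a == 0)], X[te])
        psi1[te] = m1 + a[te] * (y[te] - m1) / e                  # AIPW score for mu1
        psi0[te] = m0 + (1 - a[te]) * (y[te] - m0) / (1 - e)      # AIPW score for mu0
    mu1, mu0 = psi1.mean(), psi0.mean()
    se = (psi1 - psi0).std(ddof=1) / np.sqrt(n)                   # influence-function SE for Delta
    return {"p_W": mu1 * (1 - mu0), "p_L": mu0 * (1 - mu1), "Delta": mu1 - mu0, "se": se}

rng = np.random.default_rng(4)
n = 10_000
X = rng.normal(size=n)
a = (rng.random(n) < 1 / (1 + np.exp(-X))).astype(int)
y = (rng.random(n) < 1 / (1 + np.exp(-(-0.2 + 0.8 * X + 0.5 * a)))).astype(int)
res = crossfit_aipw_net_benefit(y, a, X)
print(res)
```

The returned `p_W - p_L` equals `Delta` by construction, which makes the binary case a convenient sanity check before moving to kernels that do not factorize.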
Inference.
- EIF-based variance estimation (sandwich) plus asymptotic normality
where established.citeturn15search3turn12search0
- Bootstrap may be used but can be computationally heavy for \(U\)-statistic-like win kernels; this is
discussed as a limitation in the DR Mann–Whitney
literature.citeturn15search5turn9search3
Software. No dominant “TMLE-for-win-ratio” package is established; building blocks exist in general TMLE and DML ecosystems and win-statistic packages.citeturn7search3turn12search0turn7search2
Pros/cons.
- Pros: strongest route to causal win-based HTE with high-dimensional
confounding and ML-based adjustment; principled inference via orthogonal
scores.citeturn12search0turn15search3
- Cons: EIF derivations for general hierarchical, censored,
mixed-type win rules are complex and remain a key research
frontier.citeturn4search17turn2search6
Open gaps. Full EIF/TMLE development for prioritized survival + PRO mixtures, with censoring and missing endpoints, and scalable computation.
Targeted estimand. A pairing-sensitive causal estimand for hierarchical outcomes that better approximates an individual-level causal notion by using nearest-neighbor pairing; shown to differ from the historical all-pairs WR/NB estimand and to avoid “reversed recommendations” under heterogeneity in constructed examples.citeturn7search0turn7search4
Assumptions.
- Identification assumptions (randomization or unconfoundedness) and
adequate covariate support for meaningful nearest-neighbor
pairing.citeturn7search0
- Curse-of-dimensionality concerns motivate additional modeling
(distributional regression) in observational
settings.citeturn7search0
Implementation steps.
1) Define the causal estimand explicitly (all-pairs vs
NN-paired).citeturn7search0
2) Form NN pairs between treated and controls in covariate
space.citeturn7search0
3) Compute win statistics on the NN-paired sample; extend via propensity
weighting and augmented estimation as proposed.citeturn7search0
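A minimal sketch of steps 2 and 3, assuming a scalar outcome (higher is better), nearest-neighbor matching with replacement on standardized Euclidean distance, and no propensity weighting or augmentation. The toy data are constructed so that the NN-paired and all-pairs estimands visibly differ; function names and data are hypothetical.

```python
import numpy as np

def nn_paired_win_stats(y, a, X):
    """Pair each treated subject with its nearest control (with replacement)
    and compute (p_W, p_L, Delta) on those pairs only."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize covariates
    t, c = np.flatnonzero(a == 1), np.flatnonzero(a == 0)
    d2 = ((Z[t][:, None, :] - Z[c][None, :, :]) ** 2).sum(axis=2)
    match = c[np.argmin(d2, axis=1)]                  # index of nearest control for each treated
    diff = y[t] - y[match]
    p_w, p_l = np.mean(diff > 0), np.mean(diff < 0)
    return p_w, p_l, p_w - p_l

def all_pairs_win_stats(y, a):
    """Conventional all-pairs (treated x control) win statistics."""
    diff = y[a == 1][:, None] - y[a == 0][None, :]
    p_w, p_l = np.mean(diff > 0), np.mean(diff < 0)
    return p_w, p_l, p_w - p_l

X = np.array([[0.0], [10.0], [0.1], [10.1]])
a = np.array([1, 1, 0, 0])
y = np.array([1.0, 20.0, 0.0, 19.0])
print(nn_paired_win_stats(y, a, X), all_pairs_win_stats(y, a))
```

Each treated patient beats a covariate-similar control, so the NN-paired \(\Delta\) is 1, whereas the all-pairs \(\Delta\) is 0.5 because it also compares patients at very different profiles: the pairing rule changes the estimand.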
Inference. Provided in the proposed framework with consistency results and DR claims; more work is needed for broad, routine inferential tooling.citeturn7search0
Software. Research-stage; described as straightforward to implement, but not yet standardized.citeturn7search0
Pros/cons.
- Pros: directly addresses the “pairing defines the estimand” issue;
offers a path to individual-centric, HTE-aware win
estimands.citeturn7search0turn3search7
- Cons: sensitive to high-dimensional covariates; needs robust inference
and extensions for
censoring/missingness.citeturn7search0turn4search17
Open gaps. Integration with IPCW/missing-data methods and development of inferential theory under complex sampling and clustering.
Targeted estimand. Individual-level win fraction summaries, used as pseudo-outcomes whose averages correspond to global win probability parameters; these enable regression and mixed models directly on win fractions.citeturn5search1turn5search9
Assumptions.
- The win fraction construction is rank-based and relies on the chosen
endpoint ordering and tie handling; inference in clustered designs
assumes a workable mixed-model framework for the
pseudo-outcomes.citeturn5search1turn5search9
Implementation steps.
1) Convert each endpoint to ranks and compute per-subject win fractions
against the opposite arm.citeturn5search9turn5search1
2) Aggregate across endpoints into a single “global win
fraction.”citeturn5search9
3) Fit regression or linear mixed models; for HTE, include interactions
or use ML regressors on the
pseudo-outcome.citeturn5search9turn5search1
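A minimal sketch of steps 1 and 3 for a single scalar endpoint (higher is better): each subject's win fraction is its share of wins plus half its ties against every subject in the opposite arm. The treated-arm mean of these fractions equals \(\mathrm{PI} = p_W + \tfrac12 p_T\), and the control-arm mean equals \(1-\mathrm{PI}\). Multi-endpoint aggregation (step 2) is omitted, and the function names are hypothetical. Note that a treated subject's win fraction compares that subject with the whole control arm, so regressing it on \(X\) targets "treated patient at profile \(x\) versus a random control" rather than the same-profile \(p_W(x)\).

```python
import numpy as np

def win_fractions(y, a):
    """Per-subject win fraction against the opposite arm (wins + 0.5 * ties)."""
    t, c = a == 1, a == 0
    diff = y[t][:, None] - y[c][None, :]          # treated rows, control columns
    score = (diff > 0) + 0.5 * (diff == 0)        # treated view: 1 win, 0.5 tie, 0 loss
    wf = np.empty(len(y))
    wf[t] = score.mean(axis=1)                    # each treated subject vs all controls
    wf[c] = (1.0 - score).mean(axis=0)            # each control subject vs all treated
    return wf

def fit_wf_regression(wf, a, x):
    """OLS of win fractions on (1, A, X, A*X); point estimates only, because
    win fractions are mutually dependent (use mixed or robust methods for SEs)."""
    design = np.column_stack([np.ones(len(wf)), a, x, a * x])
    coef, *_ = np.linalg.lstsq(design, wf, rcond=None)
    return coef

y = np.array([2.0, 1.0, 2.0, 0.0])
a = np.array([1, 1, 0, 0])
wf = win_fractions(y, a)
print(wf, wf[a == 1].mean())   # treated mean = p_W + p_T / 2
```

For these toy data the four pairs give 2 wins, 1 loss, and 1 tie, so the treated mean win fraction is \(0.5 + 0.125 = 0.625\).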
Inference. Mixed-model-based interval estimation for cluster trials is demonstrated with simulation support for coverage and type I error.citeturn5search1turn5search9
Software. The method is designed to be implementable using standard tools, with R/SAS code provided in the work.citeturn5search9turn5search1
Pros/cons.
- Pros: makes win-based outcomes compatible with standard regression/ML
pipelines; attractive for pragmatic cluster
trials.citeturn5search9turn5search2
- Cons: pseudo-outcomes may obscure component-level interpretation
unless decomposed; theoretical links to causal win-CATE need
strengthening.citeturn5search9turn7search0
Open gaps. EIF-based pseudo-outcomes for \(\Delta(x)\) with honest ML inference under pairwise dependence.
Targeted estimand. Win-CATE analogues such as \(\tau(x)=\Delta(x)\), learned flexibly via ML HTE methods once an appropriate outcome or pseudo-outcome is defined. Causal trees/forests and meta-learners provide frameworks for subgroup discovery and individualized effect estimation under unconfoundedness.citeturn11search1turn11search0turn11search2
Assumptions.
- Unconfoundedness and positivity for causal CATE learning in
observational data; randomization in
RCTs.citeturn11search0turn11search2
- Valid inference requires honest splitting or asymptotic theory for
forest estimators.citeturn11search1turn11search0
Implementation steps (win-adaptation pattern).
1) Choose a win-based target (e.g., \(\Delta(x)\)) and define a pseudo-outcome
with \(\mathbb{E}[\text{pseudo}\mid
X=x]=\Delta(x)\). EIF-based pseudo-outcomes are principled when
available.citeturn15search3turn12search0
2) Fit causal forests (or meta-learners) using nuisance models for
propensity and outcome
components.citeturn11search0turn11search2
3) Use honest sub-sampling / cross-fitting to control
overfitting.citeturn11search1turn12search12
4) Validate subgroups with pre-specification and careful reporting per
PCORI standards.citeturn1search0turn1search4
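Step 1 is the crux. One case where a valid pseudo-outcome is easy to write down is an RCT with known randomization probability \(\pi\) and a single binary endpoint: \(Z = AY/\pi - (1-A)Y/(1-\pi)\) satisfies \(E[Z\mid X=x] = \mu_1(x)-\mu_0(x)\), which equals \(\Delta(x)\) because the same-profile kernel factorizes for binary outcomes. The sketch below uses a least-squares fit of \(\Delta(x)=b_0+b_1x\) as a stand-in for a causal forest or other regressor; observational data would instead need a doubly robust pseudo-outcome with cross-fitted nuisances. Names and simulated data are hypothetical.

```python
import numpy as np

def ipw_pseudo_outcome(y, a, pi):
    """Z = A Y / pi - (1 - A) Y / (1 - pi); with known randomization probability pi,
    E[Z | X = x] = mu1(x) - mu0(x) = Delta(x) for a single binary endpoint."""
    return a * y / pi - (1 - a) * y / (1 - pi)

def fit_delta_x_linear(x, z):
    """Stand-in learner: least-squares fit of Delta(x) = b0 + b1 * x."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coef

rng = np.random.default_rng(6)
n = 20_000
x = rng.uniform(-1, 1, n)
a = rng.integers(0, 2, n)                                       # 1:1 randomization, pi = 0.5
y = (rng.random(n) < np.where(a == 1, 0.5 + 0.2 * x, 0.4)).astype(float)
coef = fit_delta_x_linear(x, ipw_pseudo_outcome(y, a, 0.5))
print(coef)   # true Delta(x) = 0.1 + 0.2 x in this simulation
```

Any regression or ML learner can replace `fit_delta_x_linear`; valid intervals then require honest splitting or the learner's own inferential theory.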
Inference.
- Causal forests: asymptotic normality and interval methods are provided
in the causal forest theory and
implementations.citeturn11search0turn11search3
- Recursive partitioning: uses “honest” trees and built-in testing
strategies for heterogeneity.citeturn11search1
Software. grf provides causal forests
and related estimators, with support for HTE inference and some
censoring/missingness
options.citeturn11search3turn11search11turn11search23
Pros/cons.
- Pros: scalable high-dimensional HTE discovery; strong inferential
foundation for forests and honest
trees.citeturn11search0turn11search1
- Cons: win outcomes are pairwise/structured; defining correct
pseudo-outcomes and preserving component interpretability requires
methodological care.citeturn15search3turn5search9
Open gaps. A “standard recipe” for win-based causal forests with validated pseudo-outcomes and component-level explainability.
Targeted estimand. Cluster- or period-adjusted win estimands in clustered and stepped-wedge designs, often combining GPC with mixed models for cluster-period summaries or using within-cluster PIMs.citeturn5search2turn9search2
Assumptions.
- Correct handling of time trends and clustering is essential in
stepped-wedge designs; failure can inflate type I
error.citeturn5search2turn5search10
- Mixed-effects modeling assumptions for random effects; design-specific
restrictions (e.g., within-cluster comparisons) may be
required.citeturn5search2turn5search9
Implementation steps.
1) For stepped-wedge CRTs, compute cluster-period win odds (or related
summaries) and fit hierarchical mixed-effects models; alternatively,
apply a cluster-restricted PIM using within-cluster
comparisons.citeturn5search2turn9search2
2) Include fixed effects for time/sequence and random effects for
clusters (and possibly random
slopes).citeturn5search2turn5search10
3) For HTE: include patient- and cluster-level interactions (e.g.,
treatment×baseline risk, treatment×cluster
characteristics).citeturn1search0turn5search2
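The within-cluster restriction in step 1 can be sketched as follows, assuming a scalar outcome (higher is better) and that each observation is labeled with its cluster and whether it was measured under intervention. Only treated-versus-control pairs from the same cluster are compared, so large between-cluster differences cannot masquerade as treatment effects. This sketch does not model secular time trends or fit the PIM or mixed model itself; the function name and toy data are hypothetical.

```python
import numpy as np

def within_cluster_win_prob(y, treated, cluster):
    """Pool pairwise comparisons restricted to treated-vs-control pairs within
    the same cluster; returns (p_W, p_L, Delta). Time trends are NOT handled."""
    wins = losses = pairs = 0
    for k in np.unique(cluster):
        in_k = cluster == k
        yt = y[in_k & (treated == 1)]
        yc = y[in_k & (treated == 0)]
        diff = yt[:, None] - yc[None, :]          # all within-cluster treated-control pairs
        wins += np.sum(diff > 0)
        losses += np.sum(diff < 0)
        pairs += diff.size
    return wins / pairs, losses / pairs, (wins - losses) / pairs

# Two clusters with very different baselines; treatment adds about +1 within each cluster
y = np.array([0.0, 0.5, 1.0, 1.5, 100.0, 100.5, 101.0])
treated = np.array([0, 0, 1, 1, 0, 0, 1])
cluster = np.array([0, 0, 0, 0, 1, 1, 1])
print(within_cluster_win_prob(y, treated, cluster))
```

All six within-cluster pairs are treated wins, so \(\Delta = 1\); an unrestricted all-pairs comparison would also count cluster 0 intervention periods against cluster 1 control periods and record spurious losses.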
Inference. Simulation evidence shows that mixed-effects and cluster-restricted PIM approaches can maintain nominal type I error where naive methods fail in stepped-wedge settings.citeturn5search2turn5search6
Software. Standard mixed-model software + PIM tools; the win-fraction mixed-model approach emphasizes implementability in standard packages.citeturn5search9turn5search1
Pros/cons.
- Pros: directly addresses pragmatic trial designs common in PCORI
portfolios; supports cluster-level
HTE.citeturn5search10turn5search2
- Cons: win-based mixed modeling remains less standardized;
methodological guidance is still
emerging.citeturn5search2turn5search6
Open gaps. Formal semiparametric theory for hierarchical PIMs; small-sample corrections and robust variance tools for few clusters.
Targeted estimand. Many win estimators are (multivariate) \(U\)-statistics; semiparametric theory provides large-sample inference for WR and \(\Delta\) and supports stratified and multi-group extensions.citeturn8search10turn8search2turn0search1
Assumptions.
- Regularity conditions for \(U\)-statistic asymptotics; independence
assumptions may be violated under clustering or repeated measures
without correction.citeturn8search10turn5search2
Implementation steps.
1) Write the estimator as a \(U\)-statistic or sum of pairwise
contributions.citeturn8search10turn0search1
2) Use derived asymptotic variance formulas; optionally use resampling
(bootstrap/permutation) for
calibration.citeturn8search10turn7search10turn2search18
Inference. Large-sample normal approximations for WR and “proportion in favor of treatment” are derived from multivariate multi-sample \(U\)-statistic limits.citeturn8search10turn8search2
Software. BuyseTest explicitly supports
inference via asymptotic \(U\)-statistic theory or resampling
methods.citeturn7search10turn7search6
Pros/cons.
- Pros: principled inference for classical win estimands; strong
foundation for extensions.citeturn8search10
- Cons: scaling and dependence issues in complex designs; extensions to
censored/missing hierarchical mixtures require additional
work.citeturn4search17turn2search6
Open gaps. Efficient computation and EIF-based extensions for modern causal/ML HTE targets.
Targeted estimand. Win ratio for hierarchical endpoints when data include right-censoring and missing endpoints; a nonparametric MLE uses all observed information and yields closed-form asymptotic variance.citeturn2search6turn14search7
Assumptions.
- As specified in the NPMLE framework: censoring and missingness
mechanisms compatible with the modeling/identification assumptions
(e.g., missing-at-random-type structures, depending on the exact
setup).citeturn2search6turn14search7
Implementation steps.
1) Specify the two-level hierarchical endpoint structure and missingness
indicators.citeturn2search6
2) Fit the NPMLE for the joint structure and derive
WR.citeturn2search6turn14search7
3) Report asymptotic variance-based CIs; validate via simulation if
possible.citeturn2search6turn14search7
Inference. Closed-form asymptotic variance estimator is provided.citeturn2search6turn14search7
Software. Research-stage; distributed with the preprint and emerging tooling.citeturn2search6
Pros/cons.
- Pros: directly addresses a common PCORI pain point (censoring +
missing patient-centered outcomes) without heavy parametric
assumptions.citeturn2search6turn10search17
- Cons: currently specialized (two hierarchical endpoints);
generalization to richer hierarchies and ML-based missingness modeling
remains open.citeturn2search6turn5search4
Open gaps. Extending NPMLE concepts to multi-endpoint hierarchies and integrating with causal DR methods.
U-statistic asymptotics. The large-sample inference framework for WR and the “proportion in favor” parameter uses multivariate \(U\)-statistics to derive tests and confidence intervals, and it extends to stratified settings.citeturn8search10turn8search2
Influence functions and local efficiency. A general causal \(U\)-statistic framework provides naive IPW and locally efficient doubly robust estimators for contrast-function estimands—which directly applies to win kernels—and supports the path toward EIF-based win estimators.citeturn15search3turn15search0
Bootstrap and permutation. Win-odds inference work explicitly discusses exact permutation and bootstrap variance estimators and regression extensions via probabilistic index ideas.citeturn2search18turn2search16 Win-statistic software notes both resampling and asymptotic approaches.citeturn7search10turn7search6
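A minimal permutation test for the net benefit re-randomizes arm labels and recomputes \(\hat\Delta\); it tests the strong null that the outcome distributions in the two arms are identical, not merely \(\Delta = 0\). The scalar outcome, the function name, and the toy data are illustrative assumptions.

```python
import numpy as np

def permutation_test_delta(y, a, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the all-pairs net benefit."""
    def delta(labels):
        diff = y[labels == 1][:, None] - y[labels == 0][None, :]
        return np.mean(np.sign(diff))
    rng = np.random.default_rng(seed)
    obs = delta(a)
    perm = np.array([delta(rng.permutation(a)) for _ in range(n_perm)])
    p = (1 + np.sum(np.abs(perm) >= abs(obs))) / (n_perm + 1)   # add-one correction
    return obs, p

y = np.concatenate([np.arange(10) + 100.0, np.arange(10.0)])   # treated values all higher
a = np.array([1] * 10 + [0] * 10)
print(permutation_test_delta(y, a))
```

Here every treated value exceeds every control value, so \(\hat\Delta = 1\) and essentially no relabeling reaches it, giving a p-value near the minimum \(1/(n_{\text{perm}}+1)\).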
Cluster-robust inference. In cluster and stepped-wedge designs, ignoring clustering/time can compromise type I error, while mixed-effects and cluster-restricted PIM approaches can restore calibration.citeturn5search2turn5search6
A recurring challenge is that some “traditional” WR estimands can be influenced by trial-specific censoring patterns if the estimand does not explicitly define a comparison horizon; this threatens interpretability and cross-study transportability.citeturn4search17turn4search1
IPCW/CovIPCW. IPCW-adjusted win ratio estimators are developed to remove bias under right censoring, with extensions (CovIPCW) for dependent censoring predictable by covariates/time-dependent covariates.citeturn14search1turn14search0turn14search8
Censoring-aware GPC. GPC has been extended to right-censoring by defining pairwise contributions based on estimated survival functions.citeturn8search3turn8search7
Time-restricted estimands. A principled response is to define restricted-time win ratio estimands at a pre-specified horizon and provide corresponding estimation approaches, improving interpretability across studies.citeturn6search2turn4search5
RMT-IF (restricted mean time in favor). RMT-IF is defined as the net average time treated individuals spend in more favorable states than controls over a pre-specified time window, generalizing RMST to multistate settings and providing a patient-friendly time-based effect size.citeturn4search4turn4search16turn4search0
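In the simplest two-state (alive/dead) case without censoring, the time a treated patient spends in a better state than a control patient over \([0,\tau]\) is \((\min(T_1,\tau)-\min(T_0,\tau))_+\). Averaging over pairs, the net time in favor then equals the difference in restricted mean survival times, consistent with RMT-IF generalizing RMST. The function name and data are illustrative; the multistate and censored versions require the estimators in the cited work.

```python
import numpy as np

def restricted_time_in_favor(t1, t0, tau):
    """Two-state illustration (no censoring): average time in favor of treatment,
    time against, and their difference over [0, tau]."""
    r1, r0 = np.minimum(t1, tau), np.minimum(t0, tau)
    gap = r1[:, None] - r0[None, :]              # all treated-control pairs
    in_favor = np.clip(gap, 0, None).mean()      # treated alive while control dead
    against = np.clip(-gap, 0, None).mean()      # control alive while treated dead
    return in_favor, against, in_favor - against

t1, t0 = np.array([2.0, 6.0]), np.array([1.0, 3.0])
print(restricted_time_in_favor(t1, t0, tau=5.0))
```

With \(\tau = 5\) the restricted times are (2, 5) and (1, 3), giving 1.75 time units in favor, 0.25 against, and a net of 1.5, which matches the RMST difference \(3.5 - 2.0\).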
Win methods are naturally suited to settings where mortality competes with recurrent nonfatal events (hospitalizations). Variants such as “last-event-assisted” WR use more recurrent-event information than the standard WR.citeturn6search0turn6search16
For semi-competing risks (terminal and non-terminal events), event-specific win ratios and global tests based on them have been proposed to enhance power and interpretability when effects differ by event type. This is an explicit form of component-level heterogeneity that can interact with patient-level HTE.citeturn5search3turn5search7
Inference for win-loss parameters with right-censored death and recurrent events has been developed, addressing a gap in censored multiple-event win inference.citeturn6search1turn6search5
PCORI and ICH emphasize clarity about estimands and sensitivity analyses for missing data.citeturn1search1turn1search0turn10search1
General missing data guidance. Missing data can undermine trial validity and should be addressed via design and principled analysis; major guidance recommends sensitivity analyses for plausible MNAR mechanisms.citeturn10search17turn10search13turn10search0
Win-specific missingness methods.
- Global win probability methods explicitly accommodate missing data and baseline adjustment across endpoints.citeturn5search4turn5search0
- NPMLE approaches explicitly address hierarchical endpoints with censoring and missing data.citeturn2search6turn14search7
MNAR sensitivity. Practical frameworks exist for MNAR sensitivity analysis and are recommended in applied settings when MNAR is plausible.citeturn10search0turn10search13
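One common way to run such an analysis is a delta-adjusted tipping-point scan. It is swapped in here for illustration and is not taken from the cited frameworks. The scan imputes missing treated-arm outcomes, shifts the imputed values by \(\delta\), and recomputes the net benefit across a grid of \(\delta\) values. The sketch makes several simplifying assumptions:

- a single continuous outcome where higher is better;
- missing values coded as NaN in the treated arm only;
- hot-deck imputation from observed treated outcomes, without covariates.

A real analysis would use covariate-based multiple imputation and proper variance combination. The function names are hypothetical.

```python
import numpy as np


def net_benefit(y_trt, y_ctl):
    """All-pairs net benefit p_W - p_L for a higher-is-better outcome."""
    diff = np.subtract.outer(np.asarray(y_trt, float), np.asarray(y_ctl, float))
    return np.mean(diff > 0) - np.mean(diff < 0)


def delta_adjusted_net_benefit(y_trt, y_ctl, deltas, n_imp=50, seed=0):
    """Average net benefit over n_imp hot-deck imputations of missing (NaN) treated outcomes,
    with the imputed values shifted by each delta.

    delta = 0 imputes from observed treated outcomes (missingness treated as ignorable within
    arm, no covariates); delta < 0 assumes treated dropouts fare worse by |delta|.
    The control arm is assumed complete.
    """
    rng = np.random.default_rng(seed)
    y_trt = np.asarray(y_trt, float)
    missing = np.isnan(y_trt)
    observed = y_trt[~missing]
    results = {}
    for delta in deltas:
        estimates = []
        for _ in range(n_imp):
            filled = y_trt.copy()
            filled[missing] = rng.choice(observed, size=int(missing.sum())) + delta
            estimates.append(net_benefit(filled, y_ctl))
        results[delta] = float(np.mean(estimates))
    return results
```

The tipping point is the least extreme \(\delta\) at which the averaged net benefit reaches zero. Stakeholders can then judge whether a shift that large is clinically plausible.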
Joint models. When missingness is driven by informative dropout linked to longitudinal and survival processes, joint models are widely used to model the outcome and dropout/time-to-event jointly; these can be used as sensitivity/robustness tools around win analyses (particularly when PROs are missing due to deteriorating health).citeturn10search10turn10search18turn10search2
| Method | Estimand targeted | Causal vs associational | Handles censoring | Handles missing data | Covariate adjustment | ML nuisance estimation | Inference | Software | PCORI-style CER suitability |
|---|---|---|---|---|---|---|---|---|---|
| All-pairs WR / \(\Delta\) / WO | Marginal \(p_W,p_L,p_T\), \(\Delta\), WR/WO | Causal in RCT; associational in observational data | Partially; estimand may depend on follow-up | Limited | Ad hoc only | No | \(U\)-statistic asymptotics; bootstrap/permutation | BuyseTest, WINSciteturn7search2turn7search1turn8search10 | Strong for transparent primary analyses; limited for confounding/HTE |
| Stratified WR | Stratum-specific marginal effects; MH-type combined | Causal in RCT; causal in observational data only if strata adequately control confounding | If combined with IPCW or restricted time | Limited | Low-dimensional, via strata | No | Plug-in variance + homogeneity test | Implementable via win packages | Strong for pre-specified subgroups and PCORI reportingciteturn13search0turn1search0 |
| PIM / pairwise regression | Conditional \(p_W(x,x')\) (and interactions) | Conditional; causal in RCT; causal in observational data under assumptions | Not native; needs extension | Limited | Yes (interactions) | Possible (research) | Sandwich/semiparametric | pimciteturn9search0turn1search7 | Strong for modeling HTE; needs careful interpretation |
| Proportional win-fractions regression | Conditional win-fractions regression parameters | Conditional; causal in RCT | Requires estimand clarity; can combine with IPCW/time restriction | Not primary | Yes | Not standard | Semiparametric theory + diagnostics | WRciteturn3search5turn3search0 | Good for trials/pragmatic studies; needs robustness extensions |
| Marginal covariate-adjusted WO | Marginal WO via marginal probabilistic index | Causal in RCT | As per WO; extensions needed | Limited | Yes (precision) | Not emphasized | Theory + simulations | Research-stage | Promising for trial efficiency; still emergingciteturn4search2 |
| IPTW-adjusted WR | Marginal causal WR (ATE/ATT) | Causal under unconfoundedness | Not automatic; can combine with IPCW | Not automatic | Yes (via IPTW) | Yes (propensity score with ML + cross-fitting) | Derived variance + diagnostics | WINS + custom code | High relevance for observational CERciteturn2search2turn12search10 |
| IPCW / CovIPCW | Censoring-robust marginal WR/WO/NB | Causal in RCT, under censoring-model assumptions | Yes | No | Yes (censoring model) | Yes (ML censoring model) | Theory + simulations | WINS | Essential when censoring differs by covariatesciteturn14search1turn14search0 |
| DR / AIPW / TMLE + cross-fitting | Causal win estimands defined by the win kernel | Causal under unconfoundedness | In principle (needs EIF) | In principle (needs EIF) | Yes | Yes (core strength) | EIF-based + cross-fitting | General TMLE/DML toolchain | High potential; major open developmentciteturn15search3turn12search0turn7search3 |
| NN pairing causal win estimands | Pairing-sensitive causal estimands | Causal (under assumptions) | Not central (extendable) | Not central (extendable) | Yes (pairing/PS) | Suggested via distributional regression | Research-stage | Research-stage | High potential for patient-centric HTE; needs broader toolingciteturn7search0 |
| Win-fraction pseudo-outcomes + ML | \(\Delta(x)\) analogues via pseudo-outcomes | Causal if the pseudo-outcome identifies the CATE | Needs integration | Possible (depends) | Yes | Yes | ML inference + resampling/forest theory | Standard ML + mixed models | Promising for scalable HTE + communication; needs validationciteturn5search9turn11search0 |
| Cluster/stepped-wedge GPC (mixed effects / cluster PIM) | Cluster/time-adjusted win estimands | Causal under design assumptions | Depends | Depends | Yes | Possible | Simulation-supported calibration | Standard mixed model + PIM | Highly relevant for pragmatic PCORI designsciteturn5search2turn5search10 |
| NPMLE (censoring + missingness) | WR for hierarchical endpoints | Causal in RCT | Yes | Yes | Limited (via modeling) | Not emphasized | Closed-form asymptotic variance | Research-stage | Highly relevant for patient-centered endpoints with missingnessciteturn2search6turn14search7 |
A PCORI-aligned workflow emphasizes transparency, stakeholder relevance, and reproducibility.citeturn1search0turn1search4turn1search1
Define the clinical priority rule \(W(\cdot,\cdot)\) with stakeholders. Document priorities, tie thresholds, and the rationale for patient-centered relevance.citeturn0search0turn0search1
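As a sketch of what a documented rule can look like in code, the hypothetical three-tier rule below compares death first, then hospitalization count, then a quality-of-life score with a tie margin. The tiers, the `Patient` fields, and the 5-point margin are illustrative assumptions. The rule also assumes complete, common follow-up, so there is no censoring.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    death_time: Optional[float]  # None if alive at the end of common follow-up
    n_hosp: int                  # nonfatal hospitalizations during follow-up
    qol: float                   # patient-reported score, higher is better


def win_kernel(a: Patient, b: Patient, qol_margin: float = 5.0) -> int:
    """Hypothetical priority rule W(a, b): +1 if a wins, -1 if b wins, 0 if tied on every tier."""
    # Tier 1: survival. Surviving beats dying; among deaths, a later death is better.
    if a.death_time is None and b.death_time is not None:
        return 1
    if b.death_time is None and a.death_time is not None:
        return -1
    if a.death_time is not None and a.death_time != b.death_time:
        return 1 if a.death_time > b.death_time else -1
    # Tier 2 (both alive, or deaths at the same time): fewer hospitalizations is better.
    if a.n_hosp != b.n_hosp:
        return 1 if a.n_hosp < b.n_hosp else -1
    # Tier 3: QoL; differences within the margin count as ties.
    diff = a.qol - b.qol
    if abs(diff) > qol_margin:
        return 1 if diff > 0 else -1
    return 0
```

A rule like this should be antisymmetric, \(W(a,b) = -W(b,a)\). Checking that property on a few hand-built pairs is a cheap way to catch mistakes before the rule feeds any estimator.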
Specify the estimand and pairing choice. Explicitly state whether the target is marginal (\(p_W\), \(\Delta\)) or conditional (\(p_W(x)\), \(\Delta(x)\)), and whether all-pairs, stratified/matched, or NN pairing is used.citeturn4search17turn7search0turn1search1
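A small illustration of why the pairing choice must be stated. It assumes one higher-is-better outcome and one baseline covariate, and shows that the same data give different net benefits under three pairing schemes:

- all-pairs comparison;
- within-stratum pairs (exact covariate match);
- 1:1 nearest-neighbour pairing of each treated individual to the closest control, with replacement.

The nearest-neighbour rule here is a generic stand-in, not the cited NN-pairing framework, and the function name is hypothetical.

```python
import numpy as np


def net_benefit_by_pairing(y_trt, x_trt, y_ctl, x_ctl, pairing="all"):
    """Net benefit over the pairs selected by a pairing operator.

    pairing: "all" (every treated-control pair), "stratum" (pairs with equal x),
    or "nn" (each treated paired with its nearest control in x, with replacement).
    """
    y_trt, x_trt, y_ctl, x_ctl = (np.asarray(v, float) for v in (y_trt, x_trt, y_ctl, x_ctl))
    k = np.sign(np.subtract.outer(y_trt, y_ctl))  # +1 treated wins, -1 loses, 0 tie
    if pairing == "all":
        keep = np.ones(k.shape, dtype=bool)
    elif pairing == "stratum":
        keep = np.equal.outer(x_trt, x_ctl)
    elif pairing == "nn":
        keep = np.zeros(k.shape, dtype=bool)
        nearest = np.abs(np.subtract.outer(x_trt, x_ctl)).argmin(axis=1)
        keep[np.arange(y_trt.size), nearest] = True
    else:
        raise ValueError(f"unknown pairing: {pairing}")
    return float(k[keep].mean())
```

With treated outcomes `[1, 11]` and control outcomes `[0, 10]` at covariate values `[0, 1]` in both arms, all-pairs gives 0.5 while within-stratum and nearest-neighbour pairing give 1.0. When the covariate is strongly prognostic, comparisons across covariate levels dilute the within-profile advantage, which is why the pairing operator is part of the estimand.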
Primary overall analysis. Estimate marginal WR/WO/NB with \(U\)-statistic-based inference or resampling.citeturn8search10turn2search18turn7search10
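A minimal sketch of the primary analysis for a single numeric outcome where higher is better. It returns \(p_W, p_L, p_T\), \(\Delta\), WR, and WO, plus a standard error for \(\Delta\) from the usual two-sample \(U\)-statistic (DeLong-type) variance based on each subject's average pairwise score. This plug-in variance is a stand-in and may differ from the estimators in the cited work. Because \(p_W + p_L + p_T = 1\), the win odds and net benefit are linked by \(\mathrm{WO} = (1+\Delta)/(1-\Delta)\), which the output makes easy to check.

```python
import numpy as np


def win_statistics(y_trt, y_ctl):
    """All-pairs win statistics for a higher-is-better outcome (needs >= 2 subjects per arm).

    The sign() kernel can be replaced by a hierarchical priority rule returning +1/0/-1.
    """
    k = np.sign(np.subtract.outer(np.asarray(y_trt, float), np.asarray(y_ctl, float)))
    n1, n0 = k.shape
    p_w, p_l = float(np.mean(k > 0)), float(np.mean(k < 0))
    p_t = 1.0 - p_w - p_l
    nb = p_w - p_l
    # Hajek projections: each treated subject's mean score vs all controls, and vice versa.
    row, col = k.mean(axis=1), k.mean(axis=0)
    se_nb = float(np.sqrt(row.var(ddof=1) / n1 + col.var(ddof=1) / n0))
    wo_den = p_l + 0.5 * p_t
    return {
        "p_W": p_w, "p_L": p_l, "p_T": p_t, "NB": nb,
        "WR": p_w / p_l if p_l > 0 else float("inf"),
        "WO": (p_w + 0.5 * p_t) / wo_den if wo_den > 0 else float("inf"),
        "SE_NB": se_nb,
    }
```

A Wald-type 95% interval for the net benefit is then `NB ± 1.96 * SE_NB`. Intervals for WR or WO are usually formed on the log scale instead.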
Pre-specified subgroup HTE. Use stratified WR and homogeneity tests; report effect sizes and uncertainty within each subgroup and avoid interpretive fallacies highlighted by PCORI.citeturn13search0turn1search0turn1search4
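The subgroup step can be sketched on the net-benefit scale, assuming a single higher-is-better outcome per stratum. The sketch computes each stratum's \(\Delta\) and \(U\)-statistic standard error, pools them with inverse-variance weights, and forms Cochran's \(Q\) homogeneity statistic. Inverse-variance pooling is swapped in here for simplicity; the stratified WR cited above uses its own (Mantel–Haenszel-type) weighting on the ratio scale. The function names are hypothetical.

```python
import numpy as np


def nb_and_se(y_trt, y_ctl):
    """All-pairs net benefit and its two-sample U-statistic SE (needs >= 2 subjects per arm)."""
    k = np.sign(np.subtract.outer(np.asarray(y_trt, float), np.asarray(y_ctl, float)))
    se = np.sqrt(k.mean(axis=1).var(ddof=1) / k.shape[0] + k.mean(axis=0).var(ddof=1) / k.shape[1])
    return float(k.mean()), float(se)  # mean of +1/0/-1 scores equals p_W - p_L


def stratified_net_benefit(strata):
    """strata: {name: (y_trt, y_ctl)}. Returns per-stratum (NB, SE), pooled NB, pooled SE, Q, df.

    Compare Q with a chi-square distribution on df = K - 1 degrees of freedom. Every stratum
    needs a positive SE (e.g. arms not perfectly separated) for the weights to be finite.
    """
    per_stratum = {name: nb_and_se(t, c) for name, (t, c) in strata.items()}
    est = np.array([nb for nb, _ in per_stratum.values()])
    w = 1.0 / np.array([se for _, se in per_stratum.values()]) ** 2
    pooled = float(np.sum(w * est) / np.sum(w))
    pooled_se = float(np.sqrt(1.0 / np.sum(w)))
    q = float(np.sum(w * (est - pooled) ** 2))
    return per_stratum, pooled, pooled_se, q, len(est) - 1
```

Reporting the per-stratum estimates and their intervals alongside \(Q\) keeps the focus on effect sizes. A non-significant homogeneity test on its own is weak evidence of no heterogeneity.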
Model-based HTE.
Observational CER adjustment. Use IPTW-adjusted WR; check balance/overlap and consider doubly robust extensions where feasible.citeturn2search2turn12search10turn15search3
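One natural IPTW construction can be sketched minimally, assuming a single higher-is-better outcome. It also assumes propensity scores \(P(A=1\mid X)\) have already been estimated, for example by a logistic model or an ML learner with cross-fitting. Each treated–comparator pair is weighted by the product of its two members' ATE weights. The cited IPTW-adjusted WR may differ in details such as weight normalization, stabilization, or ATT weighting, and the function name is hypothetical.

```python
import numpy as np


def iptw_win_probs(y, a, ps):
    """IPTW-weighted all-pairs (p_W, p_L, p_T) targeting an ATE-type marginal contrast.

    y: outcomes (higher is better); a: 1 = treated, 0 = comparator;
    ps: estimated propensity scores P(A = 1 | X), assumed to lie strictly in (0, 1).
    """
    y, a, ps = (np.asarray(v, float) for v in (y, a, ps))
    trt, ctl = a == 1, a == 0
    w1 = 1.0 / ps[trt]             # ATE weights for treated individuals
    w0 = 1.0 / (1.0 - ps[ctl])     # ATE weights for comparators
    k = np.sign(np.subtract.outer(y[trt], y[ctl]))  # +1 treated wins, -1 loses, 0 tie
    pair_w = np.outer(w1, w0)      # each pair weighted by the product of its members' weights
    total = pair_w.sum()
    p_w = float(pair_w[k > 0].sum() / total)
    p_l = float(pair_w[k < 0].sum() / total)
    return p_w, p_l, 1.0 - p_w - p_l
```

Setting every propensity score to the same value recovers the unweighted all-pairs estimates, which is a quick sanity check. Checking overlap matters here because product weights amplify the influence of scores near 0 or 1.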
Censoring and missing data robustness.
Win-based HTE is often easier to communicate on absolute scales (e.g., \(\Delta(x)\)) than on ratio scales (WR).citeturn2search18turn0search1 Suggested templates:
The aims below are concrete, PCORI-relevant, and aligned with gaps identified in the current win-statistics and HTE literatures.
Create a formal taxonomy linking (i) marginal vs conditional win estimands, (ii) pairing operators (all-pairs, stratified, NN), and (iii) causal interpretations under randomization and unconfoundedness, explicitly addressing paradoxes in “probability of benefit” interpretations.citeturn7search0turn3search7turn4search17turn1search1
Using the causal \(U\)-statistic framework for contrast kernels, derive EIFs for key win estimands with censoring and missing endpoints, enabling AIPW/TMLE estimators with cross-fitting and ML nuisance estimation (propensity, censoring, distributional regression).citeturn15search3turn12search0turn14search1turn2search6
Develop win-fraction/EIF-based pseudo-outcomes satisfying \(\mathbb{E}[\text{pseudo}\mid X]=\Delta(x)\), then adapt causal forests/meta-learners to estimate \(\Delta(x)\) with valid intervals and component-level explanation (attribution) aligned with the hierarchy.citeturn11search0turn11search2turn5search9turn0search1
Combine IPTW with IPCW/CovIPCW and extend to doubly robust estimators, with clear guidance on diagnostics (overlap, censoring-model fit, missingness sensitivity) and simulation-backed recommendations.citeturn2search2turn14search0turn10search0turn10search13
Extend and unify cluster-restricted PIMs, mixed-effects GPC summaries, and win-fraction mixed models, adding small-sample corrections and standardized reporting for subgroup and individualized heterogeneity (including site-level effect modification).citeturn5search2turn5search9turn5search10
```mermaid
timeline
    title Research trajectory and near-term opportunities for win-based HTE
    2010 : GPC and net benefit framework (pairwise comparisons)
    2012 : Win ratio popularization for hierarchical composites
    2012 : Probabilistic Index Models (pairwise regression for superiority probabilities)
    2016 : Large-sample U-statistic inference for WR and net benefit
    2018 : Stratified win ratio + homogeneity testing
    2020 : IPCW-adjusted WR for censoring bias
    2021 : Proportional win-fractions regression (semiparametric WR regression)
    2021 : Win odds + inference accounting for ties
    2024 : Estimand clarity for WR; global win probability with missing endpoints
    2025 : IPTW-adjusted WR; covariate-adjusted win odds; causal NN pairing framework
    2026 : NPMLE for censoring + missingness; stepped-wedge GPC/PIM simulation guidance
    2026+ : (Opportunity) DR/TMLE win-CATE, win-based causal forests, unified CER toolkits
```
The chart below is illustrative (conceptual) and is intended as a proposal-friendly depiction of expected tradeoffs; it is not based on a specific simulation run.
```mermaid
xychart-beta
    title "Illustrative robustness to CER complications (higher is better; conceptual)"
    x-axis ["Unadj WR","Strat WR","PIM","PW reg","IPTW WR","IPCW WR","DR/cross-fit win","NPMLE miss+cens","NN pairing causal","Win-frac + ML"]
    y-axis "Conceptual robustness score" 0 --> 10
    bar [2,4,5,5,7,7,9,8,8,8]
```
Below are prioritized primary/official/original sources (≥25) with DOIs/URLs when available.
- doi:10.1093/eurheartj/ehr352. citeturn0search0turn0search4
- doi:10.1002/sim.3923. citeturn0search1turn0search5
- doi:10.1093/biostatistics/kxv032. citeturn8search10turn8search2
- doi:10.1177/0962280216658320. citeturn8search3turn8search7
- doi:10.1080/10543406.2017.1397007. citeturn13search0turn13search1
- doi:10.1111/biom.13382. citeturn3search0turn3search4
- WR R package (PW regression and win methodology). https://cran.r-project.org/package=WR. citeturn3search5turn3search1
- doi:10.1002/pst.2086. citeturn14search0turn14search4
- WINS R package documentation. https://cran.r-project.org/web/packages/WINS/. citeturn7search1turn7search5
- doi:10.1002/sim.8967. citeturn2search1turn2search4
- https://arxiv.org/abs/2511.14292. citeturn4search2turn4search6
- doi:10.1111/j.1467-9868.2011.01020.x. citeturn9search0turn9search14
- pim R package. https://cran.r-project.org/package=pim. citeturn1search7turn1search3
- doi:10.1002/sim.7799. citeturn3search3turn3search7
- doi:10.1002/sim.6026. citeturn12search1turn12search5
- doi:10.1093/biomet/asx071. citeturn15search0turn15search3
- doi:10.1111/insr.12326. citeturn15search1turn15search14
- doi:10.1111/ectj.12097. citeturn12search0turn12search4turn12search12
- doi:10.1007/978-1-4419-9782-1. citeturn7search3turn12search3
- https://arxiv.org/abs/2501.16933. citeturn7search0turn7search4
- doi:10.1073/pnas.1510489113. citeturn11search1
- https://arxiv.org/abs/1510.04342. citeturn11search0
- doi:10.1073/pnas.1804597116. citeturn11search2turn11search6
- grf package for causal forests and HTE inference. https://cran.r-project.org/package=grf. citeturn11search3turn11search11turn11search23
- doi:10.1002/sim.9937. citeturn6search1turn6search5
- doi:10.1177/1740774520972408. citeturn5search3turn5search7
- doi:10.1080/19466315.2024.2332675. citeturn6search2turn6search10
- doi:10.1111/biom.13570. citeturn4search4turn4search0turn4search16
- https://arxiv.org/abs/2603.02003. citeturn5search2turn9search2
- https://arxiv.org/abs/2602.13533. citeturn2search3turn2search6
- https://www.pcori.org/.../page-5. citeturn1search0turn1search4
- doi:10.17226/12955. citeturn10search13turn10search5
- doi:10.1056/NEJMsr1203730. citeturn10search17
- https://miguelhernan.org/whatifbook. citeturn16search8turn16search0