Executive summary

This document presents a structured application of the Causal Roadmap to evaluate the comparative effectiveness of HIV treatment regimens under imperfect adherence using real-world data. Standard survival analyses, particularly Cox regression with time-varying covariates, are not well suited to this question: hazard ratios are difficult to interpret causally due to selection bias, non-collapsibility, and dependence on the censoring distribution. Moreover, such approaches do not directly target clinically interpretable estimands such as risk differences or cumulative incidence under adherence strategies.

To address these limitations, we employ Targeted Learning (TL) methods, specifically longitudinal TMLE and LMTP, that align with the ICH E9(R1) estimand framework. These methods enable estimation of regimen-specific cumulative incidences accounting for competing risks (e.g., discontinuation, switching), with contrasts expressed as marginal risk differences and risk ratios at clinically relevant time horizons i.e., 12, and 36 months). By explicitly incorporating adherence patterns, intercurrent events, and censoring mechanisms into the estimand definition and estimation strategy, the roadmap provides a principled path from causal question to statistical inference.

We supplement this roadmap with simulation studies to verify that LMTP estimators recover the true causal effect in longitudinal data generating processes with treatment–confounder feedback. Simulations demonstrate that naïve Cox models, even with time-varying covariates, can yield biased or misleading results, while TL-based estimators remain consistent under realistic assumptions.

The roadmap and simulation together provide a transparent, reproducible template for regulatory-grade RWE analyses. They illustrate how causal inference and TL methodology can generate interpretable and unbiased evidence for comparative effectiveness of HIV medications under real-world adherence, supporting policy and regulatory decision-making.

NOTE: This is a living draft document that will evolve as the simulation is refined to better realistically emulate the real data and as the analysis plan is edited by collaborators.

1 Step 1a. Define the Causal Question and Causal Estimand

Purpose of the step. To clearly articulate the scientific question of interest and the corresponding causal estimand, defined in terms of potential outcomes.

1.0.1 Scientific Question

Among adults living with HIV initiating antiretroviral therapy (ART), what is the causal effect of initiating and remaining on Regimen A versus Regimen B on the risk of virologic failure within 12 and 36 months, if adherence were standardized according to a realistic adherence policy?

This question acknowledges that differences in observed outcomes may arise both from intrinsic differences in regimen efficacy under imperfect adherence and from variation in adherence. By applying a longitudinal modified treatment policy (LMTP) intervention on adherence, we ask how outcomes would differ if both regimens were subject to the same adherence process, thus isolating regimen performance from adherence patterns.

1.0.2 Causal Estimand

Let:

\(A_0 \in \{A,B\}\): indicator of initiating Regimen A or Regimen B at baseline.
\(Z_t \in [0,1]\): adherence in block \(t\) (measured by proportion of days covered, PDC).
\(Y_\tau(a_0, g^Z)\): potential outcome, equal to 1 if virologic failure has occurred by time \(\tau\), under baseline regimen \(a_0\) and adherence modified by policy \(g^Z\).
Policies of interest include:
- Observed adherence: \(g^{obs}\), the natural adherence distribution.
- Improved adherence: e.g., capping refill gaps at ≤14 days.
- Perfect adherence: \(Z_t = 1 \ \forall t\) (sensitivity only, may violate positivity).

The primary causal estimand is the risk difference at time \(\tau\):

\[ \Psi_{\text{RD},\tau} = \Pr\{Y_\tau(A_0=A, g^Z)\} - \Pr\{Y_\tau(A_0=B, g^Z)\}, \quad \tau \in \{12, 36\} \text{ months}. \]

We will report both the risk difference and risk ratio under the prespecified adherence policy \(g^Z\). Secondary estimands include contrasts under alternative adherence policies (e.g., observed adherence vs capped gaps, “forgiveness curves” increasing minimum adherence thresholds).

1.1 Step 1b. Specify the Causal Model

Purpose. To describe the data-generating process, including variables, temporal ordering, and causal pathways, highlighting potential confounders and intercurrent events.

1.1.1 Summary of Key Analytic Concerns

Confounding: Baseline disease severity (VL, CD4), longitudinal comorbidities and co-medications, and predictors of adherence may influence both regimen assignment and outcomes.
Intercurrent events: Nonadherence, regimen switching, loss to follow-up, and death. Strategies for handling each (e.g., censoring, LMTP interventions) are prespecified later.
Positivity concerns: Perfect adherence or no-switch regimes may lack empirical support. LMTP interventions are designed to improve positivity by defining feasible adherence modifications, and simulation test the practical effects of positivity issues.

1.1.2 Longitudinal Data Structure

Follow-up is partitioned into discrete 90-day intervals (with sensitivity analyses at 30 and 180 days). For each patient:

Baseline covariates (\(W\)): demographics, region, calendar time, baseline HIV RNA and CD4, comorbidities, behavioral/mental health, care utilization, payer stability.
Baseline treatment (\(A_0\)): initial ART regimen (A vs B).
Time-varying covariates (\(L_t\)): interim diagnoses, comedications, prior VL status, utilization, comorbidities, and CD4 count.
Adherence (\(Z_t\)): proportion of days covered (PDC) in block \(t\).
Monitoring (\(M_t\)): indicator of viral load measured in block \(t\).
Censoring (\(C_t\)): disenrollment or loss to follow-up.
Switching (\(S_t\)): switch off initial ART regimen to alternative (A vs B).
Death (\(D_t\)): all-cause mortality.
Outcome (\(Y_t\)): confirmed virologic failure by end of block \(t\), defined as ≥2 consecutive HIV RNA ≥200 copies/mL or ≥200 followed by regimen change.

1.1.3 Structural Equations

With exogenous errors \(U=(U_W, U_A, U_L, U_Z, U_M, U_C, U_S, U_D, U_Y)\), our structural equation models are:

\[ \begin{aligned} W &= f_W(U_W) \\ A_0 &= f_A(W, U_A) \\ L_t &= f_{L_t}(W, A_0, \bar{L}_{t-1}, \bar{Z}_{t-1}, \bar{M}_{t-1}, U_{L_t}) \\ Z_t &= f_{Z_t}(W, A_0, \bar{L}_t, \bar{Z}_{t-1}, \bar{M}_{t-1}, U_{Z_t}) \\ M_t &= f_{M_t}(W, A_0, \bar{L}_t, \bar{Z}_t, \bar{M}_{t-1}, U_{M_t}) \\ C_t &= f_{C_t}(W, A_0, \bar{L}_t, \bar{Z}_t, \bar{M}_t, U_{C_t}) \\ S_t &= f_{S_t}(W, A_0, \bar{L}_t, \bar{Z}_t, \bar{M}_t, U_{S_t}) \\ D_t &= f_{D_t}(W, A_0, \bar{L}_t, \bar{Z}_t, \bar{M}_t, U_{D_t}) \\ Y_t &= f_{Y_t}(W, A_0, \bar{L}_t, \bar{Z}_t, \bar{M}_t, \bar{D}_t, U_{Y_t}) \end{aligned} \]

This structure encodes that adherence (\(Z_t\)) is influenced by prior health status and prior adherence, and that both adherence and monitoring affect the probability of outcome ascertainment. This reflects that adherence is history-dependent and that monitoring and censoring processes are informative for the virologic outcome and its ascertainment.

1.1.4 Directed Acyclic Graph

The DAG below encodes a single time-lag approximation (nodes \(L_{t-1}, Z_{t-1}, M_{t-1}\)) with contemporaneous nodes in block \(t\). Arrows follow the structural equations above. Note for simplicity, we show a single prior timepoint, but are adjusting for the full history in the estimation step unless otherwise noted.

1.1.5 Interpretation of the DAG

\(W\) confounds baseline treatment and downstream processes.
Time-varying confounding arises via \(L_{t-1}\), \(Z_{t-1}\), and \(M_{t-1}\), which affect current adherence \(Z_t\), monitoring \(M_t\), and ultimately \(Y_t\).
\(M_t\) is shown pointing to \(Y_t\) to reflect ascertainment (whether a VL is measured determines whether failure is observed), which is handled via monitoring models (IPCW) in estimation.
\(C_t, S_t, D_t\) are intercurrent events (loss, switching, death) that compete with or modify failure processes; primary analyses use hypothetical interventions on censoring/monitoring and treat death as a competing risk.

1.1.6 Adjustment Sets

Why a static back-door set is insufficient.
Because adherence and clinical state evolve over time and are themselves affected by prior treatment, a single baseline adjustment set (e.g., \(\{W\}\)) does not block all confounding paths for the effect of interest at time \(\tau\). In particular, time-varying confounders (e.g., \(L_{t-1}\)) are (i) predictors of the current adherence \(Z_t\) and outcome \(Y_t\), and (ii) are themselves affected by prior adherence and treatment, creating treatment-confounder feedback. This is precisely the setting where longitudinal methods like TMLE and IPTW are required.

Baseline covariates.
For the direct effect of baseline regimen \(A_0\) on \(Y_t\) ignoring dynamics, a minimal back-door set is: - \(\{ W \}\).
However, this is not sufficient for causal identification in the presence of time-varying confounding.

Sequential adjustment sets for longitudinal identification.
Under standard conditions (consistency, sequential exchangeability, positivity), the longitudinal g-formula identifies the target parameter when models are conditioned on the full observed history up to each time: - At each block \(t\): condition on
\[ H_t \equiv \{ W,\ A_0,\ \bar L_{t-1},\ \bar Z_{t-1},\ \bar M_{t-1} \}, \] when modeling \(Z_t\), \(M_t\), and the failure hazard for \(Y_t\).
- For censoring/monitoring adjustments (IPCW), model \(C_t\) and \(M_t\) given \(H_t\) as above.
- For switching (if treated as censoring in a per-protocol analysis), include \(S_t\) in the censoring model given \(H_t\).
- For LMTP, the policy \(g^Z\) modifies \(Z_t\); the identification still relies on correctly modeling the observed-data mechanisms conditional on \(H_t\).

Practical implementation: what we adjust for in models.
1. Outcome mechanism: model \(P(Y_t=1 \mid H_t, Z_t, M_t, \text{intercurrent events})\).
2. Treatment/adherence mechanism: model \(P(Z_t \mid H_t)\) (and apply the LMTP policy \(g^Z\) when computing counterfactual risks).
3. Monitoring/censoring mechanism: model \(P(M_t \mid H_t, Z_t)\) and \(P(C_t \mid H_t, Z_t, M_t)\); include \(S_t\) and \(D_t\) as competing risks.

2 Step 2. Observed Data

Purpose of the step. Make the analysis-ready data structure explicit and testable by (i) formalizing the observed-data vector and factorization, (ii) stating time indexing and block aggregation rules, (iii) operationalizing each variable (including monitoring and censoring), and (iv) mapping nodes to the estimand and estimator. This aligns Step 2 with the Causal Roadmap and ICH E9(R1) so the SAP, simulation, and {lmtp} implementation are coherent.

Note: this is currently using simulated data for illustration without having explored the HealthVerity data. The specific covariates included, dataset size, and realism of the simulation will be updated after exploring the HealthVerity EHR data avalability (while remaining blinded to the outcome beyond broad descriptive statistics).

2.1 Data sources and provenance

The observed data is linked claims–pharmacy–lab RWD from HealthVerity, with pharmacy fills with days’ supply and payers (for percent-days covered (PDC) calculations), and linked quantitative HIV RNA Viral Load (VL) from national laboratories.

To align the analysis with a target-trial emulation, we define the time zero as the first new-use ART fill after ≥12 months washout, the eligibility as adults (≥18), we follow-up at fixed horizons (12 and 36 months), and discritize the data into 90 blocks to match expected refill behavior; sensitivity

Discretize to K blocks (primary: 90-day; sensitivity: 30/180-day) .

2.2 Time scale and discretization

2.2.1 Defining time nodes

\(t = 0,\dots,K-1\). Primary discretization: 90-day blocks to match expected refill behavior, with a sensitivity analysis using 60 or120-day blocks to assess discretization bias on adherence, monitoring, and risk. We wide-reshape the data with one row per subject with block-level columns (e.g., PDC_0..PDC_{K-1}, VL_imp_0.., M_0.., C_0..). This is the format used by the {lmtp} package as well.

2.3 2.3 Observed-data vector and factorization

We analyze longitudinal blocks with the observed datum \[ O=\Big(W,\ A_0,\ \{L_t,\ Z_t,\ M_t,\ C_t,\ S_t\}_{t=0}^{K-1},\ Y\Big), \] where:

\(W\): baseline covariates (demography, region, calendar time; baseline VL/CD4 if available; comorbidity/utilization indices).
\(A_0\): baseline regimen (Drug A vs B) or block‐specific regimen A_blk_t when the estimand intervenes on switching each block.
\(Z_t\): adherence in block \(t\) (PDC\(_t\in[0,1]\)).
\(L_t\): time-varying confounders (e.g., VL_imp_t, CD4_t, engagement/visits, utilization proxies).
\(M_t\): monitoring/measurement indicator (any VL in block \(t\)).
\(C_t\): censoring/observability (coverage/eligibility present for the block; administrative censoring at horizon).
\(S_t\): regimen switching indicator (away from \(A_0\) index drug).
\(Y_t\): survival outcome of any failure by time t, evaluated at the pre-specified horizon (e.g., failed suppression by 36 months).

The observed-data likelihood factorizes as \[ P(O)=P(W)\,P(A_0\!\mid W)\,\prod_{t=0}^{K-1}\!\!\Big\{ P(L_t\!\mid \text{history})\, P(Z_t\!\mid \text{history})\, P(M_t\!\mid \text{history})\, P(C_t\!\mid \text{history})\, P(S_t\!\mid \text{history}) \Big\}\, P(Y\!\mid \text{history}), \] which is the statistical basis for the longitudinal g-formula and SDR/TMLE estimators used in our SAP.

2.4 Intercurrent-event strategies

2.4.1 Primary estimand (fixed horizon \(T\); risk, RD, RR, RMST)

We target the risk (and RD/RR; RMST as secondary) of virologic failure by \(T\) under real-world adherence and care. Because Cox HRs can depend on the censoring distribution under non-proportional hazards, we prefer risk-based summaries at fixed \(T\).

2.4.2 Strategy for each intercurrent event

Death (\(F_t\)) : Competing risk
Count death before first suppression as failure.
Regimen switching (\(S_t\)) : Treatment-policy (primary)
Continue follow-up after switch; include \(S_t\) in the time-varying history \(L_t\) to adjust for informative switching in SDR/TMLE.
Sensitivity: While-on-treatment (censor at \(S_t=1\)); Hypothetical (as if no switching) via censor-at-switch with identification via sequential independent censoring given \(\overline L_t\).
Treatment discontinuation (\(D_t\)) : Treatment-policy (primary)
Continue follow-up after discontinuation; include \(D_t\) (start-of-gap, see below) in \(L_t\).
Sensitivity: Composite (count discontinuation as failure); While-on-treatment (censor at \(D_t=1\)); Hypothetical (as if no discontinuation) via censor-at-\(D_t\) with appropriate modeling.
Monitoring/assessment process (\(M_t\)) : Treatment-policy (estimation-level handling)
Do not intervene on monitoring; include \(M_t\) in \(L_t\) so TMLE accounts for outcome-dependent/visit-dependent measurement. Use imputed working variable VL_imp_t only to avoid missingness in learners; keep \(M_t\) explicit so robustness relies on missingness-at-random given \(\overline L_t,\overline M_t\).
Observability/coverage (\(C_t\): disenrollment/LTFU/admin) : Censoring mechanism
Reserve \(C_t\) for data observability only. Model it as the censoring process in SDR/TMLE. Administrative censoring at \(T\) is ignorable by design; other censoring relies on (sequential) independent censoring given \(\overline L_t,\overline M_t\).

2.5 Node-mapping table

Block \(t\)	Treatment nodes (shifted/intervened)	Time-varying \(L\)-nodes (enter estimation)	Monitoring / Censoring nodes	Notes
\(t=0\)	`A` (or `A_blk_0` for static-regimen estimands); optionally `PDC_0` if policy targets baseline adherence	`VL_imp_0`, `CD4_0`, `ENG_0`; initialize `S_0=0`, `D_0=0`	`M_0`, `C_0`	Baseline covariates belong in the baseline set; include here only if time-indexed.
\(t=1..K-1\)	`PDC_t` (add `A_blk_t` when intervening on regimen each block)	`VL_imp_t`, `CD4_t`, `ENG_t`, `S_t` (switch), `D_t` (start-of-gap / latent stop)	`M_t`, `C_t`, and `C_disc_t` when discontinuation is the endpoint	Primary strategy is treatment-policy (keep after `S_t`/`D_t` and adjust via \(L_t\)); sensitivities: while-on-treatment (censor at `S_t`/`D_t`), hypothetical (censor + model).

Conventions / implementation notes:

Discontinuation detection lag (\(L=90\) days).
When discontinuation is the outcome, record events at detection and truncate the risk set with C_disc_t = 1{ t > C_disc_month } (remove last \(L\) days). When discontinuation is an intercurrent event (not the endpoint), encode D_t at the start of the qualifying gap (latent stop) for strategy logic.
Switching (S_t) and discontinuation (D_t).
As intercurrent events, include both in \(L_t\) under the treatment-policy primary estimand. For while-on-treatment or hypothetical sensitivities, censor at S_t and/or D_t (handled via the censoring mechanism).
Monitoring (M_t) and observability (C_t).
Do not intervene on monitoring; include M_t in \(L_t\) so SDR/TMLE accounts for outcome-dependent visits. Reserve C_t for data observability (coverage/disenrollment/admin). Administrative censoring at the analysis horizon \(T\) is ignorable by design.
LMTP/TMLE coding (wide, block-level).
- trt = list(PDC_0, PDC_1, ..., PDC_{K-1}) (and include A_blk_t if intervening on regimen).
- time_vary = list(L_t ∪ {M_t, C_t} ∪ {C_disc_t when disc is endpoint}).
- No missingness in the working variables (VL_imp_t); keep M_t explicit for robustness.

2.6 Implementing different statistical estimands

Perfect adherence. Set PDC_t := 1 ∀ \(t\); contrast Drug A vs Drug B (baseline A=1 vs A=0 or static A_blk_t). Outcome = risk of failure by 36 months (composite). Implement with {lmtp} MTP (mtp=TRUE).
Imperfect adherence (observed). Leave PDC_t as observed, compare Drug A vs Drug B. For static regimen estimands, intervene on all A_blk_t; for baseline assignment estimands, intervene only at A and let switching follow the natural course, distinct estimands.
Forgiveness (constant-PDC) curve. For \(\alpha\in[\alpha_{\min},1]\), set PDC_t := α ∀ \(t\) and compute risk curves by drug; report min \(\alpha\) with risk ≤ target (e.g., 50%). Same nodes; set mtp=TRUE8

2.7 Simulate observed longitudinal data

rm(list=ls())
library(simstudy)   
library(data.table) 
library(here)

set.seed(123456)

# -----------------------------
# Global parameters
# -----------------------------
N_patients   <- 5000L
months_fu    <- 48                  # max follow-up months
month_len    <- 30                  # days per "month"
admin_end    <- months_fu            # admin censoring at 48 mo
p_instI      <- 0.55                 # marginal drug_A vs drug_B
rho_adh      <- 0.65                 # adherence persistence (AR1-like)
sigma_adh    <- 0.35                 # shock scale on adherence logit
base_supp_h  <- 0.015                # base monthly hazard of suppression
base_death_h <- 0.002                # base monthly hazard of death
lTFU_base_h  <- 0.01                 # base monthly hazard of loss to follow-up

# effect sizes (log-odds / log-hazard approximations)
beta_regimen_supp  <- 0.01     # drug_A small advantage on suppression
beta_regimen_gap   <- 0.60     # drug_A advantage scales with (1 - adherence); tune 0.4-0.8
beta_vl_gap <- -0.40   # negative => lowers VL for A only when adherence < 1
beta_vl_gap <- 0   # negative => lowers VL for A only when adherence < 1


beta_adh_supp      <-  1.20   # adherence → suppression
beta_vl_supp       <- -0.50   # higher baseline VL reduces suppression
beta_cd4_supp      <-  0.25   # higher baseline CD4 increases suppression

beta_adh_death     <- -0.40   # higher adherence reduces death hazard
beta_age_death     <-  0.015  # older → higher death hazards
beta_comorb_death  <-  0.20   # higher comorbidity → higher death hazards

beta_adh_lTFU      <- -0.50   # higher adherence reduces LTFU
beta_engage_lTFU   <- -0.40   # care engagement reduces LTFU

# pharmacy fill control
p_any_fill_month   <- 0.80     # prob(≥1 fill given adherence>0)
dispersion_days    <- 5        # variability around days-supply

2.7.1 Baseline cohort (members)

Define baseline distributions

def <- defData(varname = "sexM",     dist = "binary",  formula = 0.6)
def <- defData(def, "age",           dist = "normal",  formula = 44, variance = 10)
def <- defData(def, "riskMSM",       dist = "binary",  formula = 0.35)
def <- defData(def, "ins_cat", dist = "categorical", formula = ".55;.30;.15")
# simple comorbidity score
def <- defData(def, "cci",           dist = "poisson", formula = 1.2)
# baseline CD4, log-normal-ish via normal then transform
def <- defData(def, "cd4_raw",       dist = "normal",  formula = 450, variance = 120)
def <- defData(def, "cd4",           dist = "nonrandom", formula = "pmax(20, cd4_raw)")
# baseline log10 VL (copies/mL)
def <- defData(def, "log10vl_raw",   dist = "normal",  formula = 4.2, variance = 0.4)
def <- defData(def, "log10vl",       dist = "nonrandom", formula = "pmax(1.3, log10vl_raw)")

members <- genData(N_patients, def)
setDT(members)[, id := .I]

assign regimen (drug_A vs drug_B) with covariate-dependent probability

members[, linp_reg := qlogis(p_instI) +
          0.2*sexM - 0.1*riskMSM - 0.15*cci - 0.05*(age-44)/10 ]
members[, p_instI_i := plogis(linp_reg)]
members[, regimen := rbinom(.N, 1L, p_instI_i)]          # 1=drug_A, 0=drug_B
members[, regimen_lbl := fifelse(regimen==1, "drug_A", "drug_B")]

Enrollment start/end (claims-like)

members[, enroll_start := as.Date("2018-01-01") + sample(0:365, .N, TRUE)]
members[, enroll_end   := enroll_start + months_fu*30]

Examine patient data:

head(members)

2.7.2 Person-month panel

create person-month panel (0..months_fu-1)

panel <- members[, .(id, sexM, age, riskMSM, ins_cat, cci, cd4, log10vl, regimen, regimen_lbl,
                     enroll_start, enroll_end)]
panel <- addPeriods(panel, nPeriods = months_fu, idvars = "id", perName = "month")
setDT(panel)
panel[, month_date := enroll_start + month*month_len]

# Initialize time-varying states
panel[, `:=`(alive=1L, ltfu=0L, cens_admin = 0L,
             engaged=0L, dep_dx=0L, sub_use=0L,
             suppress=0L, death=0L)]

# adherence latent logit model parameters per person (random intercept)
members[, adh_intercept := rnorm(.N, qlogis(0.75), 0.7)]
members[, adh_slope     := rnorm(.N, 0.02, 0.05)]  # slow drift per month

# join into panel
panel <- members[panel, on = "id"]

# container for adherence [0,1]
panel[, adher := NA_real_]

2.7.3 Simulate longitudinal processes

For now, use simple Autoregressive 1-like process for adherence, this will be updated to use the full history

setorder(panel, id, month)

# helper: clamp to [0.001, 0.999]
clamp01 <- function(x) pmin(pmax(x, 0.001), 0.999)

# initialize month 0 adherence
panel[month==0,
      adher := plogis(adh_intercept + adh_slope*0 + rnorm(.N, 0, sigma_adh))]

# iterate months for adherence and confounders; then hazards for outcomes
for (m in 1:(months_fu-1)) {
  # previous adherence
  panel_m1 <- panel[month==m-1, .(id, adher_prev = adher, engaged_prev = engaged,
                                  dep_prev = dep_dx, sub_prev = sub_use,
                                  alive_prev = alive, ltfu_prev = ltfu, suppress_prev = suppress)]

  # merge
  setkey(panel, id, month)
  setkey(panel_m1, id)
  panel[month==m, c("adher_prev","engaged_prev","dep_prev","sub_prev","alive_prev","ltfu_prev","suppress_prev") :=
          panel_m1[.(id), .(adher_prev, engaged_prev, dep_prev, sub_prev, alive_prev, ltfu_prev, suppress_prev)]]

  # if already dead or ltfu, keep states and skip updates
  panel[month==m & (alive_prev==0 | ltfu_prev==1), `:=`(alive=alive_prev, ltfu=ltfu_prev,
                                                        adher=adher_prev, engaged=engaged_prev,
                                                        dep_dx=dep_prev, sub_use=sub_prev,
                                                        suppress=suppress_prev)]
  # update only for at-risk
  idx <- panel[month==m & alive_prev==1 & ltfu_prev==0, which=TRUE]
  if (length(idx)>0) {
    # adherence: AR1 on logit scale with regimen effect (say, drug_A regimens show slightly better adherence)
    lp_adh <- with(panel[idx],
                   qlogis( clamp01(adher_prev) ) * rho_adh +
                     0.15*regimen + adh_intercept + adh_slope*m + rnorm(length(idx), 0, sigma_adh))
    panel[idx, adher := clamp01(plogis(lp_adh))]

    # care engagement (monthly visit): logit with dependence on prior engage and adherence
    lp_eng <- with(panel[idx], -0.3 + 1.2*adher + 0.6*engaged_prev - 0.2*dep_prev - 0.15*sub_prev)
    panel[idx, engaged := rbinom(.N, 1L, plogis(lp_eng))]

    # depression dx (claims): persistence + small adherence effect
    lp_dep <- with(panel[idx], -2.0 + 0.6*dep_prev - 0.25*adher + 0.10*cci)
    panel[idx, dep_dx := rbinom(.N, 1L, plogis(lp_dep))]

    # substance use claims: persistence + weaker adherence effect
    lp_sub <- with(panel[idx], -1.6 + 0.7*sub_prev - 0.20*adher + 0.05*cci)
    panel[idx, sub_use := rbinom(.N, 1L, plogis(lp_sub))]

    # monthly hazards
    # suppression (virologic): baseline * exp(linear)
    # approximate hazard via complementary log-log or keep small and use Bernoulli
    lin_supp <- with(panel[idx],
                     log(base_supp_h) +
                       beta_regimen_supp*regimen +
                       beta_regimen_gap * regimen * (1 - adher) +    # A advantage larger when adherence < 1
                       beta_adh_supp * adher +
                       beta_vl_supp * (log10vl - 4.0) +
                       beta_cd4_supp * ((cd4 - 450)/200)
    )

    p_supp <- pmin(pmax(exp(lin_supp), 0), 0.5)  # keep monthly prob under control
    # if already suppressed, remain suppressed (carry forward)
    panel[idx, suppress := fifelse(suppress_prev==1, 1L, rbinom(.N, 1L, p_supp))]

    # death hazard
    lin_death <- with(panel[idx],
                      log(base_death_h) +
                        beta_adh_death*adher + beta_age_death*(age + m/12 - 44) +
                        beta_comorb_death*cci)
    p_death <- pmin(pmax(exp(lin_death), 0), 0.3)
    died <- rbinom(length(idx), 1L, p_death)
    panel[idx, death := fifelse(died==1, 1L, 0L)]
    # update alive
    panel[idx, alive := fifelse(death==1, 0L, 1L)]

    # LTFU hazard (if alive)
    lin_ltfu <- with(panel[idx],
                     log(lTFU_base_h) +
                       beta_adh_lTFU*adher + beta_engage_lTFU*engaged)
    p_ltfu <- pmin(pmax(exp(lin_ltfu), 0), 0.5)
    went_ltfu <- rbinom(length(idx), 1L, p_ltfu)
    panel[idx, ltfu := fifelse(alive==1 & went_ltfu==1, 1L, 0L)]
  }
}

# Administrative censoring (simple indicator)
panel[, cens_admin := as.integer(month >= admin_end)]
head(panel)

2.7.4 Derive pharmacy “fill” claims from adherence

We’ll synthesize monthly ART fills: if adher>0 and alive & not ltfu, then create one row with an approximate days_supply ~ adherence*30 (+ noise)

pharm <- panel[alive==1 & ltfu==0 & adher > 0 & runif(.N) < p_any_fill_month,
               .(id, month, month_date, regimen_lbl, adher)]

# days_supply with noise; clamp to [0, 30]
pharm[, days_supply := pmax(1L, pmin(30L, round(rnorm(.N, adher*month_len, dispersion_days))))]
# quantity is arbitrary here
pharm[, quantity := pmax(1L, round(days_supply / 1.0))]
# fake NDC by regimen
pharm[, ndc := fifelse(regimen_lbl=="drug_A", "00000-1111-22", "00000-9999-88")]
pharm[, drug_name := paste0("ART-", regimen_lbl)]
pharm[, paid_amount := round(50 + 3*days_supply + rnorm(.N, 0, 10), 2)]
setorder(pharm, id, month)

head(pharm)

2.7.5 Medical & lab claims (engagement, depression, substance use, VL/CD4)

Medical visits billed if engaged==1

med <- panel[alive==1 & ltfu==0 & engaged==1,
             .(id, svc_date = month_date, icd10 = "Z51.81", cpt="99213", pos="11")]  # simple placeholders

# Add depression/substance use dx claims occasionally
dep_claims <- panel[alive==1 & ltfu==0 & dep_dx==1 & runif(.N)<0.6,
                    .(id, svc_date = month_date, icd10 = "F32.9", cpt="99213", pos="11")]
sub_claims <- panel[alive==1 & ltfu==0 & sub_use==1 & runif(.N)<0.6,
                    .(id, svc_date = month_date, icd10 = "F19.10", cpt="99213", pos="11")]
med <- rbindlist(list(med, dep_claims, sub_claims), use.names = TRUE, fill = TRUE)
setorder(med, id, svc_date)
head(med)

Labs: viral load & CD4 quarterly if engaged; suppression reflected by lower VL

labs <- panel[alive==1 & ltfu==0 & engaged==1 & (month %% 3L == 0L),
              .(id, lab_date = month_date,
                loinc = c("HIVRNA","CD4")), by=.(id, month, month_date)]
setorder(labs, id, lab_date)
# Simulated values
# VL copies/mL (if suppress==1 → lower)
panel_key <- panel[, .(id, month, suppress, cd4)]

# Make names unique (id, id.1, ...)
data.table::setnames(labs, make.unique(names(labs)))

# Drop the extra id (keep the first one)
labs[, id.1 := NULL]

labs <- merge(labs, panel_key, by=c("id","month"), all.x=TRUE)
labs[, value := fifelse(loinc=="HIVRNA",
                        pmax(20, round(ifelse(suppress==1, rlnorm(.N, log(100), 0.4), rlnorm(.N, log(50000), 0.7)))),
                        pmax(10, round(pmax(10, rnorm(.N, cd4, 80)))))]
labs[, reference := fifelse(loinc=="HIVRNA", "<200 c/mL", "500-1500 cells/µL")]
labs[, loinc := fifelse(loinc=="HIVRNA", "LAB-HIVRNA", "LAB-CD4")]
labs <- labs[, .(id, lab_date, loinc, value, reference)]
setorder(labs, id, lab_date)

2.7.6 Create “claims-like” simulated dataset

members_tbl <- members[, .(id, sex = ifelse(sexM==1,"M","F"), age, riskMSM, ins_cat, cci,
                           cd4, log10vl, regimen_lbl, enroll_start, enroll_end)]

pharm_tbl   <- pharm[, .(id, fill_date = month_date, ndc, drug_name, days_supply, quantity, paid_amount)]
med_tbl     <- med[,   .(id, svc_date, icd10, cpt, pos)]
labs_tbl    <- labs

# core longitudinal analysis table (person-month panel)
panel_tbl   <- panel[, .(id, month, month_date, alive, ltfu, suppress, death,
                         adher, engaged, dep_dx, sub_use, regimen, regimen_lbl)]
head(panel_tbl)

2.7.7 Add additional baseline & longitudinal variables to match planned analysis

baseline_tbl <- members[, .(
  id,
  sex    = ifelse(sexM==1,"M","F"),
  age,
  riskMSM,
  ins_cat,
  cci_base = cci,
  cd4_base = cd4,
  vl_log10_base = log10vl,
  vl_copies_base = as.integer(pmax(20, round(10^log10vl))),
  regimen_base = regimen,
  regimen_lbl_base = regimen_lbl,
  enroll_start,
  enroll_end
)]

Time-updated CD4 & VL latent trajectories; regimen switching & discontinuation; disenrollment censoring

# Initialize monthly latent states at baseline (month 0)
panel[month==0, `:=`(
  cd4_t        = cd4,
  vl_copies_t  = pmax(20, round(10^log10vl)),
  vl_log10_t   = log10(pmax(20, round(10^log10vl))),
  reg_switched = 0L,
  discont      = 0L,
  suppress_prev = 0
)]


# Loop to update monthly latent states and events (building on the existing loop)
for (m in 1:(months_fu-1)) {

  # pull previous-month states (including latent labs & suppression) for all ids
  prev <- panel[month==m-1,
                .(id,
                  cd4_prev        = cd4_t,
                  vl_prev         = vl_copies_t,
                  vl_log10_prev   = vl_log10_t,
                  suppress_prev   = suppress,
                  engaged_prev    = engaged,
                  dep_prev        = dep_dx,
                  sub_prev        = sub_use,
                  alive_prev      = alive,
                  ltfu_prev       = ltfu,
                  regimen_prev    = regimen)]
  setkey(prev, id)
  setkey(panel, id, month)

  # join prev into current month
  panel[month==m, c("cd4_prev", "vl_prev", "vl_log10_prev",
                    "suppress_prev2","engaged_prev2","dep_prev2","sub_prev2",
                    "alive_prev2","ltfu_prev2","regimen_prev2") :=
          prev[.(id), .(cd4_prev, vl_prev, vl_log10_prev,
                        suppress_prev, engaged_prev, dep_prev, sub_prev,
                        alive_prev, ltfu_prev, regimen_prev)]]

  # only update at-risk (alive & not ltfu as of previous month)
  idx <- panel[month==m & alive_prev2==1 & ltfu_prev2==0, which=TRUE]
  if (length(idx)>0) {

    # ---- regimen switch hazard (switch more likely if not suppressed previously, with MH comorbidity dx)
    lp_switch <- with(panel[idx],
                      -3.0 + 0.8*(1 - suppress_prev2) + 0.20*dep_prev2 + 0.10*sub_prev2)
    switch_now <- rbinom(length(idx), 1L, plogis(lp_switch))
    panel[idx, reg_switched := switch_now]
    # flip regimen when switch
    panel[idx & reg_switched==1, regimen := 1L - regimen]
    panel[idx & reg_switched==1, regimen_lbl := fifelse(regimen==1,"drug_A","drug_B")]

    # ---- discontinuation hazard (if disengaged or substance use this month)
    lp_disc <- with(panel[idx],
                    -3.0 + 0.5*(1 - engaged) + 0.30*sub_use)
    disc_now <- rbinom(length(idx), 1L, plogis(lp_disc))
    panel[idx, discont := pmax(discont, disc_now)]
    # if discontinued this month, cut adherence & engagement
    panel[idx & discont==1, `:=`(adher = pmax(0, adher*0.2), engaged = 0L)]

    # ---- update CD4: rises if previously suppressed & adherent; small penalty otherwise
    cd4_mu <- with(panel[idx],  20*(suppress_prev2) + 10*adher - 5*(1 - suppress_prev2) - 2*(cci)/5 )
    cd4_sd <- 30
    cd4_new <- pmax(10, round(panel[idx]$cd4_prev + rnorm(length(idx), cd4_mu, cd4_sd)))
    panel[idx, cd4_t := cd4_new]

    # ---- update VL: declines if suppressed/adherent/drug_A; floor at 20 c/mL
    # work on log-scale for stability
    vl_mu_log <- with(panel[idx],
                      log(pmax(20, vl_prev)) +
                        (-0.40*suppress_prev) +
                        # regimen benefit only when adherence is imperfect:
                        (beta_vl_gap * regimen * (1 - adher)) +
                        (-0.20*adher) +
                        (0.25*(1 - suppress_prev))
    )
    vl_sd_log <- 0.40
    vl_log_new <- rnorm(length(idx), mean = vl_mu_log, sd = vl_sd_log)
    vl_copies_new <- pmax(20, round(exp(vl_log_new)))
    panel[idx, `:=`(vl_copies_t = vl_copies_new,
                    vl_log10_t  = log10(vl_copies_new))]
  }

  # ---- disenrollment censoring indicator (admin coverage end)
  panel[month==m, cens_disenroll := as.integer(month_date > enroll_end)]
  # treat disenrollment as censoring (mark LTFU if occurs while alive)
  panel[month==m & alive_prev2==1 & ltfu_prev2==0 & cens_disenroll==1, ltfu := 1L]

  # ---- optional competing-cause labeling for deaths that occurred this month
  # if 'death' already set in the earlier loop, label likely cause
  if ("death" %in% names(panel)) {
    idxd <- panel[month==m & death==1, which=TRUE]
    if (length(idxd)>0) {
      pr_other <- with(panel[idxd], plogis(-1 + 0.02*(age-44) + 0.30*cci + 0.10*sub_prev2))
      cause_other <- rbinom(length(idxd), 1L, pr_other)
      panel[idxd, death_cause := fifelse(cause_other==1, "Other", "HIV-related")]
    }
  }
}

head(panel)

# Ensure missing death_cause is NA where no death occurred
if (!"death_cause" %in% names(panel)) panel[, death_cause := NA_character_]
panel[death==0, death_cause := NA_character_]

# ---- 9c. Improved lab draws tied to latent monthly states (quarterly if engaged)
labs2 <- panel[alive==1 & ltfu==0 & engaged==1 & (month %% 3L == 0L),
               .(id, month, lab_date = month_date,
                 loinc = c("LAB-HIVRNA","LAB-CD4")),
               by=.(id, month, month_date)]
setorder(labs2, id, lab_date)

# Make names unique (id, id.1, ...)
data.table::setnames(labs2, make.unique(names(labs2)))

# Drop the extra id (keep the first one)
labs2[, id.1 := NULL]

# attach latent monthly states
labs2 <- merge(labs2,
               panel[, .(id, month, vl_copies_t, cd4_t, suppress)],
               by = c("id","month"), all.x = TRUE)

# produce observed values around latent with measurement noise
labs2[loinc=="LAB-HIVRNA",
      value := pmax(20, round(rlnorm(.N, log(vl_copies_t), 0.25)))]
labs2[loinc=="LAB-CD4",
      value := pmax(10, round(rnorm(.N, cd4_t, 50)))]

labs2[, reference := fifelse(loinc=="LAB-HIVRNA","<200 c/mL","500-1500 cells/µL")]
labs2 <- labs2[, .(id, lab_date, loinc, value, reference)]
setorder(labs2, id, lab_date)

# ---- 9d. Pharmacy-derived PDC (monthly & rolling 3-month)
# monthly PDC proxy = min(total days_supply in month / 30, 1)
pdc_month <- pharm[, .(pdc_m = pmin(1, sum(days_supply)/month_len)),
                   by = .(id, month)]
# join back to panel and fill 0 if no fill in month
panel <- merge(panel, pdc_month, by=c("id","month"), all.x=TRUE)
panel[is.na(pdc_m), pdc_m := 0]

# rolling 3-month PDC aligned to current month (requires data.table >= 1.12)
panel[, pdc3m := frollmean(pdc_m, n = 3L, align = "right", na.rm = TRUE), by = id]
panel[is.na(pdc3m), pdc3m := pdc_m]  # fallback early in follow-up

head(panel)

2.7.8 Save final data

# Enhanced longitudinal analysis table with latent labs & policy flags
panel_tbl_ext <- panel[, .(
  id, month, month_date,
  alive, ltfu, cens_admin, cens_disenroll,
  suppress, death, death_cause,
  adher, pdc_m, pdc3m,
  engaged, dep_dx, sub_use,
  regimen, regimen_lbl, reg_switched, discont,
  cd4_t, vl_copies_t, vl_log10_t
)]

# Replace labs_tbl with improved draws (keep old as labs_tbl if you need both)
labs_tbl_ext <- labs2


saveRDS(members, here("data/members.RDS"))
saveRDS(baseline_tbl, here("data/baseline_tbl.RDS"))
saveRDS(labs_tbl_ext, here("data/labs_tbl_ext.RDS"))
saveRDS(panel_tbl_ext, here("data/panel_tbl_ext.RDS"))

3 Step 3. Identifiability

Identifiability refers to whether the causal estimand of interest can be determined from the observed data under a set of assumptions. We assess the plausibility of these assumptions in the context of HIV adherence and virologic outcomes.

3.0.1 Consistency (SUTVA)

Stable Unit Treatment Value Assumption (SUTVA) requires that (i) each individual’s potential outcome depends only on their own treatment or adherence path (no interference), and (ii) different “versions” of the same treatment or adherence category are assumed to have the same effect on the outcome (treatment version irrelevance). Aka the potential outcome under a treatment/adherence/monitoring/censoring regime equals the observed outcome when the individual’s actual trajectory follows that regime.

No interference: One person’s regimen or adherence should not affect another person’s virologic outcome.
- Potential threats in HIV research: clinic-level practices (provider culture), peer effects (support groups), or household spillovers.
- Assessment: Because HealthVerity data do not include explicit networks, we assume minimal interference at the individual level, but we will conduct sensitivity analyses for clustering at the provider/region level if diagnostics suggest heterogeneity.
Treatment version irrelevance: All versions of a baseline regimen (e.g., Drug A) are assumed exchangeable.
- Threats: heterogeneity across providers, dose/formulation differences.
- Plan: Regimen definitions (Table 2) harmonize NDC/HCPCS codes to reduce version heterogeneity.

3.0.2 Exchangeability (No Unmeasured Confounding)

Conditional on measured baseline and time-varying history \(H_t\), assignment of baseline regimen \(A_0\), adherence \(Z_t\), monitoring \(M_t\), censoring \(C_t\), and switching \(S_t\) is independent of counterfactual outcomes.

Threats in HIV adherence data:
- Individual-level: motivation, cognitive function, underreported substance use, social support.
- Provider-level: provider expertise, clinic resources, patient mix.
- System-level: insurance access, neighborhood context.
Assessment strategies:
- Prespecify rich covariate adjustment (See observed data).
- DAG review to confirm sufficient adjustment sets.
- Sensitivity analyses:
  - E-values to quantify the strength of unmeasured confounding needed to explain observed effects.
  - Negative control outcomes (e.g., dental visits, unrelated preventive services) to detect residual bias.

3.0.3 Positivity (Data support)

For every covariate history \(H_t\) with positive density, there must be nonzero probability of both regimen initiation and adherence levels required by the policy \(g\), as well as monitoring and non-censoring.

Threats:
- Deterministic rules (contraindications, strict guidelines).
- Sparse data in subgroups (rare combinations of comorbidities, extreme adherence).
- Provider preferences producing near-deterministic prescribing.
Diagnostics:
- Propensity score distributions and overlap plots.
- Effective sample size after weighting.
- Clever covariate range diagnostics.
- Pre-specified truncation (e.g., 0.01/0.99).
Fallback: If mapping to the A-style adherence policy is infeasible for one arm, the prespecified symmetric gap-cap policy will be used.

3.0.4 Sequential Exchangeability for Time-Varying Exposures

Because adherence and monitoring vary over time, we assume sequential exchangeability: \[ Y^{\bar{a},g} \perp A_t, Z_t, M_t, C_t, S_t \mid H_t \] This assumption is addressed via LMTP-TMLE, which appropriately models longitudinal confounding.

3.0.5 Summary

Under consistency, exchangeability, positivity, and coarsening at random, the longitudinal g-formula identifies the causal risks of virologic failure under standardized adherence with enforced monitoring, no censoring, and no switching:

\[ \Psi_{\text{RD},\tau} = \mathbb{E}[Y_\tau \mid A_0=a_0, g, c=0, m=1, s=0] \]

Planned diagnostics include:
- Overlap plots and weight diagnostics for positivity.
- Negative control outcomes and E-values for exchangeability.
- Provider/region clustering checks for interference.

If assumptions are implausible, we will report estimates as associations, accompanied by sensitivity analyses (Step 6).

3.0.6 Note on observed data and coarsening due to the detection lag

Discontinuation is only detectable after a \(\ge L\) day no-fill gap (here \(L=90\)). Let \(T^{\text{disc}}\) be the latent stop (gap start) and \(T^{\text{disc,obs}}=T^{\text{disc}}+L\) the detection time. Define monthly counting and at-risk processes:

Switch: \(N_k^{\text{sw}}=\mathbb{1}\{T^{\text{sw}}\le k\}\), \(R_k^{\text{sw}}=\mathbb{1}\{k<C,\ k<T^{\text{disc}}\}\).
Discontinuation (as detected): \(N_k^{\text{disc}}=\mathbb{1}\{T^{\text{disc,obs}}\le k\}\), \(R_k^{\text{disc}}=\mathbb{1}\{k<C-L,\ k<T^{\text{sw}}\}\).

Thus the last \(L\) days of follow-up are removed from the discontinuation risk set (administrative truncation) and discontinuation events are coded at detection. For intercurrent-event strategies, also encode \(D_k=\mathbb{1}\{k=T^{\text{disc}}\}\) (gap start) for strategy logic, distinct from censoring \(C_k\) (observability).

4 Step 4. Statistical Estimand

Having established identifiability in Step 3 (including the discontinuation detection-lag mapping and risk-set truncation), we now define the statistical estimands, functionals of the observed data distribution corresponding to our causal questions.

4.0.1 Identification formula

Under consistency, sequential exchangeability, and positivity, the target risk or CIF at horizon \(\tau\) is a functional of the observed data distribution. For baseline contrasts (Drug A vs B) one may write, abstractly, \[ \Psi_{\text{RD},\tau} \;=\; \int E\!\left[Y_\tau \mid A_0=1, X=x\right]\,dP(x) \;-\; \int E\!\left[Y_\tau \mid A_0=0, X=x\right]\,dP(x), \] with the understanding that, in our longitudinal setting, \(X\) summarizes baseline and time-varying histories and that discontinuation events are recorded at detection while discontinuation risk is truncated in the last \(L=90\) days.

For competing-risk endpoints, the statistical estimand for cause \(k\) is the marginal CIF \[ F_r^{(k)}(t) \;=\; \mathbb{E}\!\left[ \sum_{u \le t} \lambda_k(u \mid H_{u-1}) \prod_{s<u} \{1-\lambda_{\text{disc}}(s \mid H_{s-1}) - \lambda_{\text{switch}}(s \mid H_{s-1})\} \right], \] with the discontinuation risk-set indicator \(R_u^{\text{disc}}=\mathbb{1}\{u < C-L,\ u<T^{\text{switch}}\}\) implicit in the hazards.

4.0.2 4.3 Primary and secondary estimands

Primary (virologic failure by \(T\)).
Risk difference (and RR) at \(T\in\{12,36\}\) for Drug A vs Drug B under the prespecified adherence policy (e.g., A-style or gap-cap fallback), with treatment-policy handling of switching/discontinuation in the primary analysis and immortal-time trimming applied wherever discontinuation is an endpoint.
Secondary (forgiveness).
\(\psi_r(t;\mathcal I)\) under LMTP policies \(\mathcal I\) (dynamic thresholds or stochastic shifts of adherence), contrasting regimens under the same \(\mathcal I\).

4.0.3 4.4 Binary outcome metrics

Primary contrast: risk difference; secondaries: risk ratio and RMST up to \(T\). (Cox HRs reported only as exploratory due to NPH/censoring dependence.)

4.0.4 4.5 Nuisance functions and learning

Outcome/cause-specific hazards, adherence/treatment, monitoring, and censoring mechanisms are estimated with SuperLearner using full covariate history \(\bar L_t\). For simulation speed, the SuperLearner library is a small set including simple means, regressions, penalized regressions, and random forest models.

5 Step 5. Statistical Model and Estimator

5.1 Statistical Model

We consider the observed longitudinal data
\(O = (W, A_0, \bar{L}_t, \bar{Z}_t, \bar{M}_t, \bar{C}_t, \bar{S}_t, \bar{D}_t, Y_\tau)\),
with temporal ordering defined in Step 1. The statistical model is semi-parametric:

Constraints: \(Z_t \in [0,1]\); \(M_t, C_t, S_t, D_t, Y_t \in \{0,1\}\).
Support: all nodes bounded within their natural domains.
Time: discrete blocks of 90 days (primary), with sensitivity analyses at 30 and 180 days.
Administrative censoring: truncated at 36 months.

This model makes no additional parametric assumptions; flexible machine learning methods will be used for nuisance estimation.

5.2 Primary Estimator

The primary estimator will be the LMTP-TMLE implemented via the lmtp package:

TMLE chosen as the primary because it is a substitution estimator that respects bounds (probabilities in [0,1]).
Sequentially doubly robust: consistent if either outcome regression or treatment/censoring mechanism is correctly specified.

5.3 Nuisance Estimation

We will estimate:

Outcome regression: \(\Pr(Y_t=1 \mid H_t)\).
Adherence mechanism under \(g\): \(\Pr(Z_t \mid H_t)\).
Monitoring and censoring: \(\Pr(M_t=1 \mid H_t)\), \(\Pr(C_t=0 \mid H_t)\).
Switching process: \(\Pr(S_t=0 \mid H_t)\) when applicable.

All nuisance functions will be estimated using Super Learner with 10-fold cross-validation.
Libraries will include: GLMs, penalized regression (glmnet), and random forests.

5.4 Comparator Estimators

For benchmarking:

LMTP-SDR (sequentially doubly robust, not a substitution estimator).
IPTW (stabilized, truncated).
Parametric g-computation (GLM-based).
Cox proportional hazards regression

Note: IPTW and g-computation not yet implemented in the simulation study

5.5 LMTP Settings and History Depth

Control settings: lmtp_control(.trim = 0.999, .bound = 1e5, .learners_outcome_folds = 10, .learners_trt_folds = 10).
History depth (k): default is infinite (full history), but as a comparison in the low-adherence analysis, we will set k=1, a finite Markov process (e.g., current status only affected by past 1 time node).

5.6 Table 3. Statistical Estimators

Estimator	Role in Analysis	Required Nuisance Inputs	Strengths	Limitations
LMTP-TMLE	Primary estimator for causal risk differences under standardized adherence policies	Outcome regression, treatment/adherence mechanism (with `mtp=TRUE`), censoring, monitoring, switching models	• Substitution estimator (respects bounds) • Sequentially doubly robust • Compatible with machine learning via Super Learner • Handles dynamic/stochastic policies	• Computationally intensive • Requires careful weight trimming and control tuning
LMTP-SDR	Comparator estimator; sensitivity to robustness across approaches	Same as LMTP-TMLE	• Sequentially doubly robust • More efficient in some settings	• Not a substitution estimator • May yield predictions outside [0,1]
IPTW for MSMs	Benchmark estimator for transparency and comparison	Treatment/adherence mechanism, censoring/monitoring models	• Simple to implement and interpret • Useful as baseline comparison	• Sensitive to model misspecification • Prone to extreme weights, especially with time-varying confounding
Parametric g-computation	Benchmark estimator for comparison	Outcome regression models	• Intuitive standardization approach • Straightforward to implement	• Relies heavily on correct parametric specification • Poor performance if model misspecified
Cox proportional hazards regression	Descriptive only (non-causal benchmark)	None beyond standard covariates	• Familiar to clinicians and regulators • Provides hazard ratios for context	• Not a causal estimator • HR lacks clear causal interpretation under non-proportional hazards

5.7 Estimator Summary

LMTP-TMLE is prespecified as the primary estimator because it aligns most closely with the causal model, supports flexible machine learning for nuisance adjustment, and respects parameter bounds. LMTP-SDR, IPTW, and parametric g-computation will serve as comparator estimators to benchmark robustness and performance across methods, while Cox regression is included only for descriptive purposes given its widespread familiarity but lack of causal interpretability. Together, this suite of estimators ensures transparency, facilitates diagnostic checks, and allows clear communication of causal results alongside more traditional approaches.

6 Step 6: Estimation

Load simulated data:

6.1 Define time nodes

Now, mark which block each observation is in:

K <- 6                                # number of blocks
followup <- 36                         # length of followup
block_len <- ceiling(36 / K)          # months per block (36 months -> 6 blocks of 4)
stopifnot(block_len * (K-1) < 36)     # basic sanity
blocks <- 0:(K - 1)


# Start from monthly panel

pm <- copy(panel_tbl_ext)   # must include: id, month, pdc_m, vl_log10_t, cd4_t, engaged, ltfu, cens_disenroll, suppress
pm[, block := pmin(month %/% block_len, K-1)]
head(pm)

6.1.1 Define censoring/monitoring indicator

per block: Define C = 1 if ‘observed/covered’ in block; 0 otherwise C=1 if observed/covered in the block (no LTFU and not disenrolled at any month in that block), else 0. {lmtp} expects censoring nodes coded 0/1 with 1 = “observed next time”; once 0, it cannot revert to 1. Here, the logic is observed=1 if any month in block has ltfu==0 AND (is.na(cens_disenroll) or cens_disenroll==0). Also define death=1 if died in block (ltfu==0 & death==1 in any month of block); else 0.

blkC <- pm[, .(
  C = as.integer(any(ltfu == 0 & (is.na(cens_disenroll) | cens_disenroll == 0))),
  death = as.integer(any(ltfu == 0 &  death == 1))
), by = .(id, block)]
head(blkC)

6.1.2 Code exposure and time-varying covariate nodes

Here, we code our exposure variable PDC_b := mean monthly Percent Days Covered in block, bounded between 0 and 1.

blk_exp <- pm[, .(PDC = pmin(1, mean(pdc_m, na.rm = TRUE))),
              by = .(id, block)]

head(blk_exp)

We also code our L nodes: * VL_log10 := mean latent VL * CD4 := mean latent CD4 * ENG := mean engagement with care (proportion of months engaged in block), a measure of the monitoring process

blk_L <- pm[, .(
  VL_log10 = mean(vl_log10_t, na.rm = TRUE),
  CD4      = mean(cd4_t,       na.rm = TRUE),
  ENG      = mean(engaged,     na.rm = TRUE)
), by = .(id, block)]

head(blk_L)

6.2 Assemble wide dataset

Now we reshape the data to wide format (one row per patient) for {lmtp} format: spread each block’s measures to columns such as PDC_0..PDC_5, VL_0..VL_5, CD4_0..CD4_5, ENG_0..ENG_5, C_0..C_5, and final outcome Y.

blk_all <- Reduce(function(x,y) merge(x,y, by=c("id","block"), all=TRUE),
                  list(blk_exp, blk_L, blkC))

blk_all <- merge(
  CJ(id = unique(pm$id), block = 0:(K-1)),  # full grid
  blk_all, by = c("id","block"), all.x = TRUE
)

head(blk_all)

6.2.1 Missing data imputation

Fill NA’: - PDC: if truly missing in block -> set to 0 - VL_log10, CD4: carry-forward/back-fill within id; fallback to baseline - ENG: carry-forward/back-fill; fallback to 0

blk_all[is.na(PDC), PDC := 0]

# Pull baseline fallbacks
fallback <- baseline_tbl[, .(id, cd4_base, vl_log10_base = vl_log10_base)]
blk_all <- merge(blk_all, fallback, by = "id", all.x = TRUE)

# carry-forward/back-fill within id
cols_L <- c("VL_log10","CD4","ENG")
setorder(blk_all, id, block)

for (v in cols_L) {
  blk_all[, (v) := na.locf(get(v), na.rm = FALSE), by = id]
  blk_all[, (v) := na.locf(get(v), fromLast = TRUE, na.rm = FALSE), by = id]
}

# fallback for any remaining NA
blk_all[is.na(VL_log10), VL_log10 := vl_log10_base]
blk_all[is.na(CD4),      CD4      := cd4_base]
blk_all[is.na(ENG),      ENG      := 0]

blk_all[, c("cd4_base","vl_log10_base") := NULL]

# Also ensure C has no NA: if still NA (no info), set C := 0 (not observed)
blk_all[is.na(C), C := 0L]

6.2.2 Reshape to wide

Wlong <- dcast(blk_all, id ~ block,
               value.var = c("PDC","VL_log10","CD4","ENG","C", "death"))

# Baseline: pick covariates + baseline drug
base <- baseline_tbl[, .(id, age, sex = as.integer(sex=="M"), cd4_base, A = regimen_base)]

# Outcome Y: failed suppression at end of follow-up (last month)
last_supp <- pm[, .SD[which.max(month)], by = id][, .(id, suppress)]
Y_dt <- last_supp[, .(id, Y = as.integer(suppress == 0L))]

wide <- Reduce(function(x, y) merge(x, y, by="id", all=FALSE),
               list(base, Wlong, Y_dt))

6.2.3 Build longitudinal outcome

\((Y_0..Y\_{K-1})\) and monitoring \((M_0..M\_{K-1})\)

Y per block: any monthly suppression in the block (suppress == 1).

Make Y the first-event with event_locf().

M per block: any VL measured in the block (!is.na(vl_log10_t)).

# block-level "any suppression" in the block
y_blk <- pm[, .(Y = as.integer(any(suppress == 1, na.rm = TRUE))),
            by = .(id, block)]

# block-level monitoring (1 if any VL was recorded in block)
mon_blk <- pm[, .(M = as.integer(any(!is.na(vl_log10_t), na.rm = TRUE))),
              by = .(id, block)]

# wide Y
y_wide_raw <- dcast(y_blk, id ~ block, value.var = "Y", fill = 0)
setnames(y_wide_raw, as.character(blocks), paste0("Y_", blocks))

# wide M
m_wide <- dcast(mon_blk, id ~ block, value.var = "M", fill = 0)
setnames(m_wide, as.character(blocks), paste0("M_", blocks))

# Make Y monotone (first suppression carried forward)
Y_mat <- as.matrix(y_wide_raw[, -1, with = FALSE])  # drop id
Y_mon <- lmtp::event_locf(data=Y_mat, outcomes = colnames(Y_mat))
y_wide <- cbind(id = y_wide_raw$id, as.data.table(Y_mon))

6.2.4 Set baseline covariates and merge all wide pieces

Replace single final Y with longitudinal \(Y_0..Y\_{K-1}\).

Include monitoring \(M_0..M\_{K-1}\).

Keep censoring \(C_0..C\_{K-1}\) and L’s from Wlong.

# Baseline covariates + baseline regimen
base <- baseline_tbl[, .(id, age, sex = as.integer(sex == "M"), cd4_base, A = regimen_base)]

# Merge into final wide dataset for survival analysis
wide <- Reduce(function(x, y) merge(x, y, by = "id", all = FALSE),
               list(base, Wlong, y_wide, m_wide))

# Optional sanity checks
stopifnot(all(paste0("Y_", blocks) %in% names(wide)))
stopifnot(all(paste0("M_", blocks) %in% names(wide)))
stopifnot(all(paste0("C_", blocks) %in% names(wide)))

# lmtp requires data.frame
wide <- as.data.frame(wide)

#save clean data
saveRDS(wide, here("data/lmtp_ready_wide_data.RDS"))

6.3 Setup LMTP analysis

A_is_char <- is.character(wide$A) || is.factor(wide$A)


## Treatment nodes (A at time 0 + PDC_0...PDC_K)
P_cols <- grep("^PDC_\\d+$", names(wide), value = TRUE)
Kmax <- max(as.integer(sub("^PDC_", "", P_cols)), na.rm = TRUE)
trt_nodes <- as.list(c("A", paste0("PDC_", 0:Kmax)))
trt_nodes[[1]] <- c("A","PDC_0")   # time-0 joint: baseline drug A plus PDC_0

## Time-varying L nodes *before* each block’s exposure (no C_* here)
## Use VL_k, CD4_k, and M_k (monitoring)
Lnodes <- lapply(0:Kmax, function(k) c(paste0("VL_log10_", k), paste0("CD4_", k), paste0("M_", k)))

## Lightweight SL for demo
sl_lib <- "SL.glm"

## choose the horizon you want; here, last block
Tidx <- max(as.integer(sub("^Y_", "", grep("^Y_\\d+$", names(wide), value = TRUE))))


## If Kmax == 5 (i.e., 0..5 present)
trt_nodes <- c(list(c("A","PDC_0")), as.list(paste0("PDC_", 1:5)))

colnames(wide)

##  [1] "id"         "age"        "sex"        "cd4_base"   "A"         
##  [6] "PDC_0"      "PDC_1"      "PDC_2"      "PDC_3"      "PDC_4"     
## [11] "PDC_5"      "VL_log10_0" "VL_log10_1" "VL_log10_2" "VL_log10_3"
## [16] "VL_log10_4" "VL_log10_5" "CD4_0"      "CD4_1"      "CD4_2"     
## [21] "CD4_3"      "CD4_4"      "CD4_5"      "ENG_0"      "ENG_1"     
## [26] "ENG_2"      "ENG_3"      "ENG_4"      "ENG_5"      "C_0"       
## [31] "C_1"        "C_2"        "C_3"        "C_4"        "C_5"       
## [36] "death_0"    "death_1"    "death_2"    "death_3"    "death_4"   
## [41] "death_5"    "Y_0"        "Y_1"        "Y_2"        "Y_3"       
## [46] "Y_4"        "Y_5"        "M_0"        "M_1"        "M_2"       
## [51] "M_3"        "M_4"        "M_5"

Y_nodes <- paste0("Y_", 0:Tidx)
C_nodes <- paste0("C_", 0:Tidx)
death_nodes <- paste0("death_", 0:Tidx)


## helper: returns a named list as lmtp expects (one entry per 'trt' element)
set_cols <- function(data, trt, new_A = NULL, new_PDC = NULL){
  out <- vector("list", length(trt)); names(out) <- trt
  for(nm in trt){
    if(nm == "A" && !is.null(new_A))                out[[nm]] <- rep(new_A, nrow(data))
    else if(grepl("^PDC", nm) && !is.null(new_PDC)) out[[nm]] <- new_PDC[[nm]]
    else                                            out[[nm]] <- data[[nm]]
  }
  out
}



gap_vec <- sapply(0:Kmax, function(k){
  pdc_col <- paste0("PDC_", k)
  mean(wide[[pdc_col]][wide$A == 1], na.rm = TRUE) -
    mean(wide[[pdc_col]][wide$A == 0], na.rm = TRUE)
})
names(gap_vec) <- paste0("PDC_", 0:Kmax)

## block-wise adherence remap to Drug-A style (cap to [0,1])
Astyle_PDC <- function(data, trt){
  L <- lapply(trt[grepl("^PDC", trt)], function(col){
    pmin(1, pmax(0, data[[col]] + gap_vec[col]))
  })
  names(L) <- trt[grepl("^PDC", trt)]
  L
}

## shift functions: set Drug to A (1) or B (0), and apply A-style PDC
A_Astyle <- function(data, trt){
  new_A   <- if (A_is_char) 1 else 1   # ensure we set numeric 1; see note below
  set_cols(data, trt, new_A = new_A, new_PDC = Astyle_PDC(data, trt))
}
B_Astyle <- function(data, trt){
  new_A   <- if (A_is_char) 0 else 0   # numeric 0 for B
  set_cols(data, trt, new_A = new_A, new_PDC = Astyle_PDC(data, trt))
}

6.4 Run longitudinal TMLE analysis with LMTP

#survival outcome - tmle
fit_A_Astyle <- lmtp_tmle(
  data             = as.data.frame(wide),
  trt              = trt_nodes,
  outcome          = paste0("Y_", 0:Tidx),
  baseline         = c("age","sex","cd4_base"),
  time_vary        = Lnodes,
  cens             = C_nodes,
  compete          = death_nodes,
  shift            = A_Astyle,                   # force index drug A + A-style adherence
  mtp              = TRUE,                       # continuous PDC
  outcome_type     = "survival",
  folds            = 2,
  learners_trt     = sl_lib,
  learners_outcome = sl_lib
)

fit_B_Astyle <- lmtp_tmle(
  data             = as.data.frame(wide),
  trt              = trt_nodes,
  outcome          = paste0("Y_", 0:Tidx),
  baseline         = c("age","sex","cd4_base"),
  time_vary        = Lnodes,
  cens             = C_nodes,
  compete          = death_nodes,
  shift            = B_Astyle,                   # force index drug B + A-style adherence
  mtp              = TRUE,
  outcome_type     = "survival",
  folds            = 2,
  learners_trt     = sl_lib,
  learners_outcome = sl_lib
)

6.5 Interpret results

cat("\nDrug A vs Drug B    *  Drug-A adherence pattern (imperfect policy)\n")

## 
## Drug A vs Drug B    *  Drug-A adherence pattern (imperfect policy)

print(lmtp_contrast(fit_A_Astyle, ref = fit_B_Astyle, type = "additive"))

## 
##   shift   ref estimate std.error conf.low conf.high p.value
## 1 0.173 0.194  -0.0215    0.0103  -0.0418  -0.00122  0.0377

print(lmtp_contrast(fit_A_Astyle, ref = fit_B_Astyle, type = "rr"))

## 
##   shift   ref estimate std.error conf.low conf.high p.value
## 1 0.173 0.194    0.889    0.0501    0.791     0.988  <0.001

We see a significant absolute (RD = -0.022) and relative (RR = 0.89) reduction in 36-month virologic failure risk for Drug A vs Drug B under the imperfect adherence policy that remaps each arm to Drug-A style adherence to control for adherence differences.

6.6 Simulation study results

6.6.1 Question 1: Effect of low versus high prior adherence

sim_res_rr = sim_res_rr %>% mutate(estimator = case_when(
  estimator == "Cox" ~ "Cox-HR",
  estimator == "TMLE" ~ "TMLE-Full History",
  estimator == "TMLE-markov" ~ "TMLE-Markov process"
),estimator=factor(estimator, levels=c("Cox-HR","TMLE-Markov process","TMLE-Full History")))


ggplot(sim_res_rr, aes(x=estimator, y=log(estimate)-log(1/1.651221), color=estimator)) +
  geom_boxplot() +
  geom_jitter(height=0, width=0.3) +
  coord_cartesian(ylim=c(-0.6,0.8)) +
  geom_hline(yintercept =   0, linetype="dashed", color="red") +
  labs(title="Simulation results: bias in relative risk\nfor low vs. high adherence in Drug-A initiators", y="log-Bias (Relative risk)", x="Estimation Method") +
  theme_minimal() +
  theme(legend.position = "none")

6.6.2 Question 2: Effect of low versus high prior adherence

#bias plot

ggplot(sim_res_rr, aes(x=estimator, y=log(estimate)-log(0.8664666), color=estimator)) +
  geom_boxplot() +
  geom_jitter() +
  #scale_y_log10() +
  geom_hline(yintercept =   0, linetype="dashed", color="red") +
  labs(title="Simulation results: 36 Month Failed Suppression\nUnder Observed Adherence of Drug-A", y="log-Bias (Relative risk)", x="Estimation Method") +
  theme_minimal() +
  theme(legend.position = "none")

6.6.3 Question 3: Forgiveness curve estimation

The goal of the forgiveness curve is to estimate an adherence-response curve and identify if there is a level of suboptimal adherence above which the failure risk of viral load suppression is sufficiently low. These forgiveness curves are estimated via longitudinal modified treatment policies that shift adherence toward a target level (e.g., 50% PDC) with a specified strength (e.g., alpha=0.5). Here, we illustrate the setup for such an analysis and plot initial results. The modified treatment policies help avoid estimation issues due to data sparsity, but methods are still in refinement to better estimate the curves when there are realistic positivity issues across the range of observed adherence.

wide=readRDS(here("data/lmtp_ready_wide_data.RDS"))

#temp subset for speed
#wide <- wide[1:500, ]

A_is_char <- is.character(wide$A) || is.factor(wide$A)


## Treatment nodes (A at time 0 + PDC_0...PDC_K)
P_cols <- grep("^PDC_\\d+$", names(wide), value = TRUE)
Kmax <- max(as.integer(sub("^PDC_", "", P_cols)), na.rm = TRUE)
trt_nodes <- as.list(c("A", paste0("PDC_", 0:Kmax)))
trt_nodes[[1]] <- c("A","PDC_0")   # time-0 joint: baseline drug A plus PDC_0

## Time-varying L nodes *before* each block’s exposure (no C_* here)
## Use VL_k, CD4_k, and M_k (monitoring)
Lnodes <- lapply(0:Kmax, function(k) c(paste0("VL_log10_", k), paste0("CD4_", k), paste0("M_", k)))

## Lightweight SL for demo
sl_lib <- "SL.glm"

## choose the horizon you want; here, last block
Tidx <- max(as.integer(sub("^Y_", "", grep("^Y_\\d+$", names(wide), value = TRUE))))


## If Kmax == 5 (i.e., 0..5 present)
trt_nodes <- c(list(c("A","PDC_0")), as.list(paste0("PDC_", 1:5)))

colnames(wide)





Y_nodes <- paste0("Y_", 0:Tidx)
C_nodes <- paste0("C_", 0:Tidx)
death_nodes <- paste0("death_", 0:Tidx)




## helper: returns a named list as lmtp expects (one entry per 'trt' element)
set_cols <- function(data, trt, new_A = NULL, new_PDC = NULL){
  out <- vector("list", length(trt)); names(out) <- trt
  for(nm in trt){
    if(nm == "A" && !is.null(new_A))                out[[nm]] <- rep(new_A, nrow(data))
    else if(grepl("^PDC", nm) && !is.null(new_PDC)) out[[nm]] <- new_PDC[[nm]]
    else                                            out[[nm]] <- data[[nm]]
  }
  out
}



gap_vec <- sapply(0:Kmax, function(k){
  pdc_col <- paste0("PDC_", k)
  mean(wide[[pdc_col]][wide$A == 1], na.rm = TRUE) -
    mean(wide[[pdc_col]][wide$A == 0], na.rm = TRUE)
})
names(gap_vec) <- paste0("PDC_", 0:Kmax)

## block-wise adherence remap to Drug-A style (cap to [0,1])
Astyle_PDC <- function(data, trt){
  L <- lapply(trt[grepl("^PDC", trt)], function(col){
    pmin(1, pmax(0, data[[col]] + gap_vec[col]))
  })
  names(L) <- trt[grepl("^PDC", trt)]
  L
}




#Shrink PDC to set amount

point_shrink <- function(data, trt, alpha = 0.0, PDC=0.5) {
  # a* = (1 - alpha)*a + alpha*0.5 ; alpha in (0,1) pulls toward 0.5 but preserves support
  n <- nrow(data)
  L <- lapply(trt[grepl("^PDC", trt)], function(col) {
    a <- data[[col]]
    pmin(1, pmax(0, (1 - alpha) * a + alpha * PDC))
  })
  names(L) <- trt[grepl("^PDC", trt)]
  L
}
A_point_shrink <- function(data, trt) set_cols(data, trt, new_A = 1, new_PDC = point_shrink(data, trt, alpha = 0.8))
B_point_shrink <- function(data, trt) set_cols(data, trt, new_A = 0, new_PDC = point_shrink(data, trt, alpha = 0.8))



## --- MTP shift toward a chosen target tau with alpha=0.8 ---
point_toward <- function(data, trt, tau, alpha = 0.8) {
  L <- lapply(trt[grepl("^PDC", trt)], function(col) {
    a <- data[[col]]
    pmin(1, pmax(0, (1 - alpha) * a + alpha * tau))
  })
  names(L) <- trt[grepl("^PDC", trt)]
  L
}

## Factory to build the LMTP 'shift' function for a given tau and drug (A=1 or 0)
make_mtp_toward <- function(tau, alpha = 0.8, drug = 1L) {
  force(tau); force(alpha); force(drug)
  function(data, trt) {
    new_PDC <- point_toward(data, trt, tau = tau, alpha = alpha)
    new_A   <- if (A_is_char) drug else drug
    set_cols(data, trt, new_A = new_A, new_PDC = new_PDC)
  }
}


library(data.table)
library(ggplot2)

## Grid of target adherences (adjust as desired)
targets <- seq(0.10, 0.90, by = 0.05)

## Wrapper to fit LMTP-SDR risk under a given (tau, drug)
fit_under_target <- function(tau, drug) {
  shift_fun <- make_mtp_toward(tau = tau, alpha = 0.5, drug = drug)
  lmtp_sdr(
    data             = as.data.frame(wide),
    trt              = trt_nodes,
    outcome          = paste0("Y_", 0:Tidx),   # final risk at horizon Tidx
    baseline         = c("age","sex","cd4_base"),
    time_vary        = Lnodes,
    shift            = shift_fun,
    mtp              = TRUE,                 # continuous PDC intervention
    outcome_type     = "survival",
    folds            = 2,
    learners_trt     = sl_lib,
    learners_outcome = sl_lib
  )
}

summary(wide$PDC_0)
summary(wide$PDC_5)


## Compute risks across the grid for Drug A and Drug B
targets <- c(0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75,  0.8, 0.85,0.875, 0.9,0.925, 0.95, 0.975, 1)

fits_B=fits_A=list()
for(i in 1:length(targets)){
  cat(i,"\n")
  resA=NULL
  resB=NULL
  resA=try(fit_under_target(targets[i], drug = 1))
  resB=try(fit_under_target(targets[i], drug = 0))
  fits_A[[i]] =resA
  fits_B[[i]] =resB
}

library(data.table)
library(zoo)
library(here)
library(lmtp)

## Compute risks across the grid for Drug A and Drug B
targets <- c(0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75,  0.8, 0.85,0.875, 0.9,0.925, 0.95, 0.975, 1)

#load results for speed
fits_A= readRDS(file=here("results/forgiveness_mtp_05_A.RDS"))
fits_B = readRDS(file=here("results/forgiveness_mtp_05_B.RDS"))



## Tidy results
res_A=NULL
for(i in 1:length(fits_A)){
  res=NULL
  if(fits_A[[i]] %>% class() == "try-error"){
    res=data.frame(estimate=NA, std.error=NA, conf.low=NA, conf.high=NA, alpha=targets[i], drug="A")
  }else{
    res=tidy(fits_A[[i]]) %>% mutate( alpha=targets[i], drug="A")

  }
  res_A=bind_rows(res_A, res)
}


res_B=NULL
for(i in 1:length(fits_B)){
  res=NULL
  if(fits_B[[i]] %>% class() == "try-error"){
    res=data.frame(estimate=NA, std.error=NA, conf.low=NA, conf.high=NA, alpha=targets[i], drug="A")
  }else{
    res=tidy(fits_B[[i]]) %>% mutate( alpha=targets[i], drug="B")

  }
  res_B=bind_rows(res_B, res)
}



## Combined
res_all <- rbind(res_A, res_B) %>%
  rename(theta = estimate,
         lcl = conf.low,
         ucl = conf.high,
         target = alpha) %>%
  as.data.table()


ggplot(res_all, aes(x = target, y = theta, color = drug, fill = drug, group=drug)) +
  geom_smooth(size = 2, se=F) +
  geom_ribbon(aes(ymin = lcl, ymax = ucl), alpha = 0.15, color = NA) +
  coord_cartesian(xlim=c(0.4,1), ylim=c(0,0.95), expand = FALSE) +
  labs(title = "Forgiveness curves under an MTP(α=0.5) shift toward target PDC",
       subtitle = "Each curve shows the cumulative incidence after 36 months under Drug A (red)\nor Drug B (blue) with all PDC_t shifted 50% toward target.") +
  xlab("Target Percent Days Covered") + ylab("Cumulative incidence") +
  theme_bw() +
  theme(legend.position = "none",
        margin = margin(0, 0, 0, 0),
        plot.title = element_text(face = "bold"))

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

HIV Adherence Causal Roadmap Analysis Plan, LMTP tutorial, and Simulation results

Andrew Mertens