Overview of study

The outcome is a binary variable: viral load suppression (VLS) or engagement in care (EIC). Each treated patient is matched to a control patient by propensity score. We have two observations for each patient: one year prior to treatment and one year post-treatment. For control patients, the two observations are taken relative to an “anchor” date, randomly selected from the period when the control patient was eligible for CCP treatment, with selection probabilities chosen so that the distribution of anchor dates in the control group is similar to the distribution of actual enrollment dates in the treatment group.
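The anchor-date selection can be sketched as follows. This is only an illustration under assumptions not stated in this document: dates are binned by month, and each eligible month is weighted by the number of treatment-group enrollments in that month. The function name and the `(year, month)` representation are hypothetical.

```python
import random
from collections import Counter

def sample_anchor_month(enrollment_months, eligible_months, rng):
    """Pick one anchor month for a single control patient.

    enrollment_months: (year, month) tuples, one per treated patient's enrollment.
    eligible_months:   (year, month) tuples during which this control was eligible.
    Assumption: weights are proportional to treated enrollments in each month.
    """
    counts = Counter(enrollment_months)
    weights = [counts[m] for m in eligible_months]  # 0 for months with no enrollments
    if sum(weights) == 0:
        return None  # no overlap between eligibility and enrollment months
    return rng.choices(eligible_months, weights=weights, k=1)[0]

# Hypothetical data: three enrollments in Jan 2015, one in Feb 2015.
rng = random.Random(0)
enrollments = [(2015, 1)] * 3 + [(2015, 2)]
anchor = sample_anchor_month(enrollments, [(2015, 1), (2015, 3)], rng)
```

With weights restricted to each control's eligibility window, the resulting anchor distribution only approximates the enrollment distribution; how close it is depends on how wide the eligibility windows are.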

Approach 1: conditional logistic regression without baseline matching

We can use a conditional logistic regression to account for differing probabilities of the outcome between patients:

\[ \operatorname{logit} [ P(Y_{i, post, trt} = 1) ] = \alpha_i + \beta_{secular} I(post=1) + \beta_{trt} I(trt=1) \]

where: \[ \begin{align} i &= \textrm{match identifier} \\ post &= \textrm{time variable, 0 for pre-treatment and 1 for post-treatment} \\ trt &= \textrm{treatment variable, 0 for control patients and 1 for treatment patients} \\ \beta_{secular} &= \textrm{coefficient for the "secular" trend, the change in outcome attributable to time} \\ \beta_{trt} &= \textrm{coefficient for treatment effectiveness,} \\ & \textrm{interpretable as the log odds ratio of a good outcome for treated patients relative to control patients} \\ I() &= \textrm{indicator function (1 if its argument is true, 0 otherwise)} \end{align} \]

The coefficient of interest here is \(\beta_{trt}\).
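As a minimal numeric sketch of how the linear predictor above maps to probabilities, the snippet below evaluates the model for a treated and a control patient at the post time point. The coefficient values and function names are hypothetical, chosen only for illustration; they are not estimates from this study.

```python
import math

def outcome_probability(alpha_i, beta_secular, beta_trt, post, trt):
    """P(Y = 1) under the Approach 1 linear predictor (post and trt are 0 or 1)."""
    linear_predictor = alpha_i + beta_secular * post + beta_trt * trt
    return 1.0 / (1.0 + math.exp(-linear_predictor))  # inverse logit

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Hypothetical coefficient values, for illustration only.
alpha_i, beta_secular, beta_trt = -0.5, 0.2, 0.7
p_control = outcome_probability(alpha_i, beta_secular, beta_trt, post=1, trt=0)
p_treated = outcome_probability(alpha_i, beta_secular, beta_trt, post=1, trt=1)

# The odds ratio between the two equals exp(beta_trt), up to floating-point rounding.
odds_ratio = odds(p_treated) / odds(p_control)
```

The intercept and the secular term cancel in the odds ratio, which is why the treatment comparison does not depend on the value of \(\alpha_i\).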

We will have two count tables as follows, one for the control group:

\[ \begin{array}{l|cc} & Y=1 & Y=0 \\ \hline \textrm{pre} & n_{11} & n_{12} \\ \textrm{post} & n_{21} & n_{22} \end{array} \]

and one for the treatment group:

\[ \begin{array}{l|cc} & Y=1 & Y=0 \\ \hline \textrm{pre} & m_{11} & m_{12} \\ \textrm{post} & m_{21} & m_{22} \end{array} \]

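A random-intercept version of the above model could be fit using PROC GLIMMIX, with a random intercept for matching ID; the conditional likelihood itself can be fit with the STRATA statement of PROC LOGISTIC. I believe the maximum likelihood estimate of \(\exp(\beta_{trt})\) in the conditional logistic regression equation above will be: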

\[ \frac{ \left( \frac{ m_{21} }{ m_{12} } \right) }{ \left( \frac{ n_{21} }{ n_{12} } \right) } \]

i.e., only the individuals who changed outcome between the two time points will enter into the point estimate of the odds ratio.
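The arithmetic of the expression above can be sketched as a small helper. The function name and the counts in the example are hypothetical, and the code only evaluates the ratio as written; it does not fit the model or check that the ratio is the maximum likelihood estimate.

```python
import math

def ratio_of_ratios(m21, m12, n21, n12):
    """Evaluate (m21 / m12) / (n21 / n12) from the two count tables."""
    if m12 <= 0 or n21 <= 0 or n12 <= 0:
        raise ValueError("m12, n21 and n12 must be positive to avoid division by zero")
    return (m21 / m12) / (n21 / n12)

# Hypothetical counts, for illustration only.
or_hat = ratio_of_ratios(m21=30, m12=10, n21=20, n12=20)  # (30/10) / (20/20) = 3.0
log_or_hat = math.log(or_hat)  # on the scale of beta_trt
```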

Approach 2: conditional logistic regression with matching on baseline and effect of baseline value

Short and Pasta (link) analyzed a similar dataset and settled on a logistic regression with a random intercept for matching ID.

This model requires that matched patients have the same status at the pre-treatment time point, and can be written as:

\[ \operatorname{logit} [ P(Y_{i, initialY, trt} = 1) ] = \alpha_i + \beta_{initialY} I(initialY=1) + \beta_{trt} I(trt=1) + \beta_{initialY:trt} I(initialY=1) I(trt=1) \]

Definitions are as above, with the additional terms: \[ \begin{align} initialY &= \textrm{patient's status (e.g. VLS or EIC) at the pre- time point} \\ \beta_{initialY:trt} &= \textrm{coefficient for interaction between treatment effectiveness and initial status of the patient.} \\ \end{align} \]

This model differs in that it allows estimation of a different treatment effect depending on patients’ initial status. It does not estimate the secular trend, the change in outcome over time.
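To make the dependence on initial status concrete, the treatment odds ratio within each baseline stratum follows directly from the coefficients of the Approach 2 model. The \(OR\) symbols below are notation introduced here for illustration only:

\[ \begin{align} OR_{trt \mid initialY=0} &= \exp(\beta_{trt}) \\ OR_{trt \mid initialY=1} &= \exp(\beta_{trt} + \beta_{initialY:trt}) \end{align} \]

If \(\beta_{initialY:trt} = 0\), the two odds ratios coincide and the treatment effect does not depend on initial status.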