April 29, 2016

Background

staph

  • Staphylococcus aureus (S. aureus) is ubiquitous and can be commensal or pathogenic.
  • S. aureus has variable antibiotic resistance patterns, but resistance to methicillin is important.
  • Methicillin is rarely used in treatment, but resistance to it is associated with resistance to other commonly used antibiotics.

Background

  • In particular, S. aureus bacteremia (SAB) is common (3.6-6.0 cases per 100,000 person-years)
    - Community and hospital settings
  • SAB in general has very high mortality and morbidity (~20-40% per episode)

AIM: Understand the relationship between methicillin-resistance and mortality in patients with SAB

Data

  • Serial enrollment of every patient with positive blood culture for S. aureus in SFGH lab from Jan 2008 - July 2013
  • Database managed by infection control, but all patients had mandatory ID service consultation and recommendations
  • Followed prospectively
  • Outcome = death within 90 days (ascertained redundantly: manual chart review, EMR query, online search, SSA Death Master File, CDC NDI database through 2011)

Specify the causal question

What is the causal effect of the presence versus absence of methicillin-resistance in the context of a first presentation of Staphylococcus aureus bloodstream infection on 90-day mortality among patients presenting to San Francisco General Hospital (SFGH)?

Specify the causal model \(M^F\)

  • Endogenous nodes: \(X = (W, A, Z, Y)\),
    where \(W\) is the vector of baseline covariates
  • \(W_1\) is race (caucasian yes/no)
  • \(W_2\) is sex
  • \(W_3\) is age
  • \(W_4\) is duration of bacteremia
  • \(W_5\) is presumed source of bacteremia
  • \(W_6\) is hospital onset of bacteremia
  • \(W_7\) is homelessness
  • \(W_8\) is IV drug use
  • \(W_9\) is sepsis criteria on admit

Specify the causal model \(M^F\)

  • \(W_{10}\) is prior hospitalization in the last year
  • \(W_{11}\) is severity of comorbidities (by Charlson index)
  • \(W_{12}\) is HIV
  • \(W_{13}\) is cirrhosis
  • \(W_{14}\) is immunosuppressant use

\(A\) is absence or presence of methicillin-resistance
\(Z\) is a set of mediators denoting severity of illness presentation
- \(Z_1\) intensive care unit (ICU)
- \(Z_2\) length of hospital stay
\(Y\) is mortality

Specify the causal model \(M^F\)

  • Exogenous nodes: \(U = (U_W, U_A, U_Z, U_Y) \sim P_U\). There are no independence assumptions.
  • Structural equations F:
  • \(W = f_W (U_W)\)
  • \(A = f_A(W, U_A)\)
  • \(Z = f_Z(W, A, U_Z)\)
  • \(Y = f_Y(W, A, Z, U_Y)\)
  • Exclusion restrictions: We are assuming that the baseline covariates do not affect each other directly. We have not specified any functional forms.

Specify observed data and its link to SCM

Specify the causal parameter

  • Causal risk difference in death (within 90 days) comparing presence versus absence of methicillin-resistance in the context of a first presentation of Staphylococcus aureus bloodstream infection

\[\Psi_F(P_{U,X}) = P_{U,X}(Y_1 = 1) - P_{U,X}(Y_0 = 1)\] \[\Psi_F(P_{U,X}) = E_{U,X}(Y_1) - E_{U,X}(Y_0)\] where \(Y_a\) is the counterfactual outcome (90-day mortality) the patient would have had if methicillin-resistance status had been set to \(A = a\).

Identifiability

DAG1

  • We use \(M^{F*}\) to denote the original SCM augmented by the assumptions needed for identifiability. Under \(M^{F*}\), the backdoor criterion holds conditional on \(W\).

Identifiability

DAG2

  • We use \(M^{F*}\) to denote the original SCM augmented by the assumptions needed for identifiability. Under \(M^{F*}\), the backdoor criterion holds conditional on \(W\).

Positivity Assumption

  • For identifiability, we also need the positivity assumption to hold: \(\min_{a \in \{0,1\}} P_0(A=a|W=w) > 0\) for all \(w\) for which \(P_0(W = w) > 0\).
  • There must be a positive probability of both presence and absence of methicillin-resistance within every covariate stratum; \(W\) denotes the set of covariates that satisfies the backdoor criterion under the working \(M^{F*}\).
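
A practical check of the positivity assumption can be sketched as follows. This is a minimal illustration on simulated data, not the study analysis: the covariates `W1` and `W2`, the sample size, and the use of plain logistic regression for \(P_0(A=1|W)\) are all assumptions made for the example.

```r
# Simulated stand-in data; W1, W2 and A are hypothetical, not the study variables.
set.seed(1)
n  <- 500
W1 <- rbinom(n, 1, 0.4)          # a binary baseline covariate
W2 <- rnorm(n)                   # a continuous baseline covariate
A  <- rbinom(n, 1, plogis(-0.3 + 0.8 * W1 + 0.5 * W2))

# Crude check within strata of a discrete covariate: both exposure levels
# should appear in every stratum.
strata_counts <- table(W1 = W1, A = A)
print(strata_counts)

# Model-based check: estimated P(A = 1 | W) should stay away from 0 and 1.
g_fit  <- glm(A ~ W1 + W2, family = binomial)
gHat1W <- predict(g_fit, type = "response")
gHat0W <- 1 - gHat1W
print(summary(gHat1W))

# Smallest estimated probability of the exposure level actually received;
# values near 0 would signal a practical positivity violation.
min_g <- min(ifelse(A == 1, gHat1W, gHat0W))
print(min_g)
```

Empty cells in the stratum table, or estimated probabilities very close to 0 or 1, would suggest that some covariate strata carry little or no information about one of the exposure levels.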

Ignore Z

  • The intervention to set A=a changes the value of Z and changes the value of Y. The effect of methicillin resistance on mortality has two pathways: directly (e.g., biological mechanism) and by changing severity of illness presentation. \[W = f_W(U_W)\] \[A = a\] \[Z_a = f_Z(W, a, U_Z)\] \[Y_a = f_Y(W, a, Z_a, U_Y)\]

Ignore Z

  • For the analysis, we ignore the presence of Z and target the total effect of A on Y. \[W = f_W(U_W)\] \[A = a\] \[Y_a = f_Y(W, a, U_Y)\]

Specify the statistical estimand

  • Under the working SCM \(M^{F*}\) and with the positivity assumption, the average treatment effect \(\Psi_F(P_{U,X})\) is identified using the G-Computation formula: \[\Psi(P_0) = E_0[E_0(Y|A=1,W) - E_0(Y|A=0,W)]\]
    \[= \sum_w [E_0(Y|A=1,W=w) - E_0(Y|A=0,W=w)] P_0(W=w)\] where \(W\) is the vector of baseline covariates.
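
As a worked illustration of the G-Computation formula, consider a single binary covariate with made-up values. The probabilities below are hypothetical numbers chosen for the arithmetic and are not study results.

```r
# Hypothetical numbers for a single binary covariate W (not study results).
p_W   <- c("0" = 0.7, "1" = 0.3)     # P_0(W = w)
EY_A1 <- c("0" = 0.20, "1" = 0.45)   # E_0(Y | A = 1, W = w)
EY_A0 <- c("0" = 0.15, "1" = 0.35)   # E_0(Y | A = 0, W = w)

# G-computation: stratum-specific risk differences averaged over P_0(W = w).
psi <- sum((EY_A1 - EY_A0) * p_W)
print(psi)
```

Here the stratum-specific risk differences are 0.05 and 0.10, so the weighted average is 0.7 × 0.05 + 0.3 × 0.10 = 0.065.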

Estimators

  • Simple substitution estimator based on the G-Computation formula

\[\Psi_{SS}(P_n)= \frac{1}{n} \sum_{i=1}^{n} [\overline{\mbox{Q}}_n(1,W_i)-\overline{\mbox{Q}}_n(0,W_i)]\] where \(P_n\) is the empirical distribution and \(\overline{\mbox{Q}}_n(A,W)\) is the estimate of the conditional mean outcome given methicillin-resistance and baseline covariates \(E_0(Y|A,W)\).

Estimators

  • Inverse Probability of Treatment Weighting (IPTW) \[\Psi_{IPTW}(P_n)= \frac{1}{n} \sum_{i=1}^{n} \frac{I(A_i=1)}{g_n(A_i|W_i)}Y_i - \frac{1}{n} \sum_{i=1}^{n} \frac{I(A_i=0)}{g_n(A_i|W_i)} Y_i \]
    where \(g_n(A_i|W_i)\) is an estimate of the conditional probability of methicillin-resistance given the baseline characteristics \(P_0(A|W)\). This conditional distribution is often referred to as the exposure or treatment mechanism.

Estimators

  • Targeted Maximum Likelihood Estimation

\[\Psi_{TMLE}(P_n)= \frac{1}{n} \sum_{i=1}^{n} [\overline{\mbox{Q}}^*_n(1,W_i)-\overline{\mbox{Q}}^*_n(0,W_i)] \] where \(\overline{\mbox{Q}}^*_n(A,W)\) is the updated estimate of the conditional mean outcome given methicillin-resistance and baseline covariates \(E_0(Y|A,W)\).

SuperLearner Library

  • Main Effects Logistic Regression, all covariates
  • Logistic Regression \[\mbox{logit}\,P(Death = 1) = B_0 + B_1MRSA + B_2Age + B_3Sex + B_4Caucasian + B_5Sepsis + B_6(MRSA\times Age)\]
  • Logistic Regression \[\mbox{logit}\,P(Death = 1) = B_0 + B_1MRSA + B_2CharlsonIndex + B_3Sepsis + B_4(MRSA\times CharlsonIndex) + B_5(MRSA\times Sepsis)\]
  • LASSO Regression
  • Generalized additive model
  • Multivariate adaptive polynomial spline regression
  • Random Forest
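
A user-specified logistic regression can be added to a SuperLearner library through a small wrapper function. The sketch below is an assumption about how such a wrapper might look, not the wrapper used in this analysis: the column names (`A`, `age`, `sex`, `caucasian`, `sepsis`) and the simulated data are hypothetical, and the pairing of the name `SL.glm.EstA` with the first interaction model is a guess.

```r
# Simulated stand-in data; column names are hypothetical labels for
# MRSA (A), Age, Sex, Caucasian and Sepsis.
set.seed(2)
n <- 300
X <- data.frame(
  A         = rbinom(n, 1, 0.5),
  age       = round(runif(n, 20, 90)),
  sex       = rbinom(n, 1, 0.5),
  caucasian = rbinom(n, 1, 0.4),
  sepsis    = rbinom(n, 1, 0.3)
)
Y <- rbinom(n, 1, plogis(-3 + 0.02 * X$age + 0.8 * X$sepsis))

# Wrapper in the form SuperLearner expects: arguments (Y, X, newX, family, ...)
# and a return value list(pred = ..., fit = ...).
SL.glm.EstA <- function(Y, X, newX, family, ...) {
  fit.glm <- glm(Y ~ A + age + sex + caucasian + sepsis + A:age,
                 data = X, family = family)
  pred <- predict(fit.glm, newdata = newX, type = "response")
  fit <- list(object = fit.glm)
  class(fit) <- "SL.glm"   # reuses SuperLearner's predict method for SL.glm
  list(pred = pred, fit = fit)
}

# Called directly here so the sketch runs without the SuperLearner package.
out <- SL.glm.EstA(Y = Y, X = X, newX = X, family = binomial())
print(head(out$pred))

# The library passed to SuperLearner() would then be a character vector such as
SL.library <- c("SL.glm", "SL.glm.EstA", "SL.glmnet", "SL.gam",
                "SL.polymars", "SL.randomForest")
```

A second wrapper with the Charlson index and sepsis interactions would be written the same way and added to the vector by name.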

Simple Substitution

#SuperLearner model of Q(Y|A,W)
Qinit90D
## 
## Call:  
## SuperLearner(Y = ObsData$Y, X = X, newX = newdata, family = "binomial",  
##     SL.library = SL.library, method = "method.AUC", cvControl = list(V = 10L,  
##         stratifyCV = FALSE, shuffle = TRUE, validRows = NULL)) 
## 
## 
##                          Risk        Coef
## SL.glm_All          0.2599549  0.21119738
## SL.glm.EstA_All     0.3111445  0.11547734
## SL.glm.EstB_All     0.4250188 -0.08488417
## SL.glmnet_All       0.2896068  0.12512740
## SL.gam_All          0.2591034  0.25119178
## SL.polymars_All     0.3393188  0.20174807
## SL.randomForest_All 0.2822690  0.17152182

Simple Substitution

# predicted prob of death within 90 days given A,W (rows: observed A, then A=1, then A=0)
QbarAW <- Qinit90D$SL.predict[1:n]
Qbar1W <- Qinit90D$SL.predict[(n+1): (2*n)]
Qbar0W <- Qinit90D$SL.predict[(2*n+1): (3*n)]
PsiHat.SS<-mean(Qbar1W - Qbar0W) 
PsiHat.SS
## [1] 0.01131206
#Effect on 90-day mortality, in percentage points:
PsiHat.SS*100
## [1] 1.131206
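
The indexing above relies on the prediction data set stacking three copies of the predictors: as observed, with A set to 1, and with A set to 0. A minimal sketch of that construction is shown here on simulated data, with a plain logistic regression standing in for the SuperLearner fit; the covariates `W1` and `W2` and the simulated `ObsData` are assumptions for illustration only.

```r
# Simulated stand-in for ObsData; W1 and W2 are hypothetical covariates.
set.seed(3)
n <- 400
ObsData <- data.frame(W1 = rbinom(n, 1, 0.4), W2 = rnorm(n))
ObsData$A <- rbinom(n, 1, plogis(-0.2 + 0.6 * ObsData$W1))
ObsData$Y <- rbinom(n, 1, plogis(-1.5 + 0.3 * ObsData$A + 0.7 * ObsData$W1 +
                                   0.4 * ObsData$W2))

# Predictors as observed, with A set to 1, and with A set to 0.
X  <- subset(ObsData, select = -Y)
X1 <- X; X1$A <- 1
X0 <- X; X0$A <- 0
newdata <- rbind(X, X1, X0)   # 3n rows: observed, A = 1, A = 0

# Plain logistic regression as a placeholder for the SuperLearner fit.
Q_fit  <- glm(Y ~ A + W1 + W2, data = ObsData, family = binomial)
Q_pred <- predict(Q_fit, newdata = newdata, type = "response")

QbarAW <- Q_pred[1:n]
Qbar1W <- Q_pred[(n + 1):(2 * n)]
Qbar0W <- Q_pred[(2 * n + 1):(3 * n)]
PsiHat.SS <- mean(Qbar1W - Qbar0W)
print(PsiHat.SS)
```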

IPTW

#SuperLearner model of g(A|W)
gHatSL90D
## 
## Call:  
## SuperLearner(Y = ObsData$A, X = W, family = "binomial", SL.library = SL.library,  
##     method = "method.AUC", verbose = TRUE, cvControl = list(V = 10L,  
##         stratifyCV = FALSE, shuffle = TRUE, validRows = NULL)) 
## 
## 
##                          Risk      Coef
## SL.glm_All          0.4402636 0.1717388
## SL.glmnet_All       0.4830365 0.2041316
## SL.gam_All          0.4351526 0.2018086
## SL.polymars_All     0.4572965 0.2455639
## SL.randomForest_All 0.4461725 0.2184342

IPTW

# generate the predicted prob of MRSA given baseline cov 
gHat1W<- gHatSL90D$SL.predict 
# generate the predicted prob of no MRSA, given baseline cov 
gHat0W<- 1- gHat1W
#IPTW Estimator:
PsiHat.IPTW1<- mean(as.numeric(ObsData$A==1)*ObsData$Y/gHat1W)
PsiHat.IPTW2<- mean(as.numeric(ObsData$A==0)*ObsData$Y/gHat0W) 
PsiHat.IPTW1 - PsiHat.IPTW2
## [1] -0.008696856
#Effect on 90-day mortality, in percentage points:
(PsiHat.IPTW1 - PsiHat.IPTW2)*100
## [1] -0.8696856

TMLE

# 3. Use the estimated exposure mechanism to create the clever covariate H_n^*(A,W) for each subject
H.AW<- ObsData$A/gHat1W - (1-ObsData$A)/gHat0W
# also want to evaluate the clever covariate at A=1 and A=0
H.1W<- 1/gHat1W
H.0W<- -1/gHat0W
# 4. Update the initial estimate of Qbar_0(A,W) 
logitUpdate<- glm(ObsData$Y ~ -1 +offset(qlogis(QbarAW)) + H.AW, family=binomial)
eps<- logitUpdate$coef
# calc the predicted values for each subj under each txt 
QbarAW.star<- plogis(qlogis(QbarAW)+ eps*H.AW) 
Qbar1W.star<- plogis(qlogis(Qbar1W)+ eps*H.1W) 
Qbar0W.star<- plogis(qlogis(Qbar0W)+ eps*H.0W) 
# 5. Estimate Psi(P_0) as the emp mean of the difference in the pred 
# outcomes under A=1 and A=0
PsiHat.TMLE<- mean(Qbar1W.star - Qbar0W.star, na.rm=TRUE) 
PsiHat.TMLE
## [1] 0.01789815
#Effect on 90-day mortality, in percentage points:
PsiHat.TMLE*100
## [1] 1.789815

TMLE Influence Curve

# evaluate the influence curve for all observations 
IC <- H.AW*(ObsData$Y - QbarAW.star) + Qbar1W.star - Qbar0W.star - PsiHat.TMLE 
summary(IC)
##        V1          
##  Min.   :-1.67606  
##  1st Qu.:-0.15624  
##  Median : 0.01898  
##  Mean   : 0.00000  
##  3rd Qu.: 0.13974  
##  Max.   : 1.82458
varHat.IC<-var(IC, na.rm=TRUE)/(n)
varHat.IC
##              [,1]
## [1,] 0.0005547578

TMLE Influence Curve

# obtain confidence intervals
c(PsiHat.TMLE -1.96*sqrt(varHat.IC), PsiHat.TMLE +1.96*sqrt(varHat.IC))
## [1] -0.02826631  0.06406261
# calculate the pvalue 
2* pnorm( abs(PsiHat.TMLE / sqrt(varHat.IC)), lower.tail=F )
##           [,1]
## [1,] 0.4473144

Bootstrapped Estimates

  • 400 Bootstrapped Samples
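
The bootstrap loop that fills an `estimates` matrix might look like the sketch below. It is an assumption about the procedure rather than the code used for these slides: the data are simulated, `W1` is a hypothetical covariate, the helper `estimate_once` is invented for the example, plain logistic regressions replace the SuperLearner fits, and the TMLE column is left out to keep the example short.

```r
# Simulated stand-in for ObsData; W1 is a hypothetical covariate.
set.seed(4)
n <- 300
ObsData <- data.frame(W1 = rnorm(n))
ObsData$A <- rbinom(n, 1, plogis(0.5 * ObsData$W1))
ObsData$Y <- rbinom(n, 1, plogis(-1 + 0.3 * ObsData$A + 0.6 * ObsData$W1))

# Both estimators on one data set, with glm as a placeholder for SuperLearner.
estimate_once <- function(d) {
  Q_fit  <- glm(Y ~ A + W1, data = d, family = binomial)
  Qbar1W <- predict(Q_fit, newdata = transform(d, A = 1), type = "response")
  Qbar0W <- predict(Q_fit, newdata = transform(d, A = 0), type = "response")
  g_fit  <- glm(A ~ W1, data = d, family = binomial)
  gHat1W <- predict(g_fit, type = "response")
  c(SimpSubs = mean(Qbar1W - Qbar0W),
    IPTW     = mean((d$A == 1) * d$Y / gHat1W) -
               mean((d$A == 0) * d$Y / (1 - gHat1W)))
}

B <- 400  # number of bootstrap samples, as on the slide
estimates <- matrix(NA_real_, nrow = B, ncol = 2,
                    dimnames = list(NULL, c("SimpSubs", "IPTW")))
for (b in seq_len(B)) {
  idx <- sample(n, size = n, replace = TRUE)   # resample rows with replacement
  estimates[b, ] <- estimate_once(ObsData[idx, ])
}

# Percentile intervals from the 2.5% and 97.5% quantiles of each column.
ci <- apply(estimates, 2, quantile, probs = c(0.025, 0.975))
print(ci)
```

Refitting every nuisance model inside each bootstrap sample, as done here, lets the interval reflect the variability of the model fits as well as the sampling variability of the data.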

Bootstrapped 95% Confidence Intervals

# CI simple substitution estimator  using quantiles
quantile(estimates[,"SimpSubs"], prob=c(0.025,0.975))
##        2.5%       97.5% 
## -0.03527925  0.04048608
# CI for IPTW  using quantiles 
quantile(estimates[,"IPTW"], prob=c(0.025,0.975))
##        2.5%       97.5% 
## -0.04371289  0.08244644
# CI for TMLE using quantiles
quantile(estimates[,"TMLE"], prob=c(0.025,0.975))
##       2.5%      97.5% 
## -0.1806116  0.2601936

Results Comparison

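Estimator   ATE       95% CI
SS          0.011     (-0.035, 0.041)
IPTW        -0.0087   (-0.044, 0.082)
TMLE        0.0179    (-0.028, 0.064)

The SS and IPTW intervals are bootstrap percentile intervals; the TMLE interval is based on the influence curve (the bootstrap percentile interval for TMLE was (-0.181, 0.260)).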

Conclusion

  • Methicillin-resistant Staphylococcus aureus (MRSA), compared with methicillin-susceptible infection, did not significantly affect 90-day mortality among patients at San Francisco General Hospital with a first S. aureus bloodstream infection (TMLE ATE 0.018, 95% CI -0.028 to 0.064, p = 0.45).
  • Limitations
  • Next steps?