Observational Studies as a Service

The gold standard for establishing causality are randomized experiments or as they are called in medical terms, clinical trials. From the point of view of statistics, randomized experiments ensure that an observed outcome is produced from or caused by an intervention.

With OSaaS, our goal is to provide clinical researchers with a tool to conduct efficient fast observational studies integrating methods from health sciences, econometrics, and machine learning.

Ann Harbor RCTs, Causality, Ionnides, Observational Studies, teaches a course Most patients are treated like as in RCTs by their doctors, because there is no evidence-based practice. Doctors dont want to recognize this Joanna Mazel Univ. Michigan Ann Harbor

For example, lets say we are trying to understand the effect of administering diuretics to sepsis patients. In the ideal world, we would have a randomized control trial (clinical trial) that would provide us with the evidence of the effectiveness of diuretics vs. other alternative treatments or even vs. not administering diuretics.

Let’s take some data of sepsis patients from the MIMIC II database and use it to demonstrate how a randomized experiment would work, and how assessments of treatment effectiveness actually occur in the medical realm.

Some variables describe the condition of a patient during his or her ICU stay. A brief explanation of these variables follows.

Randomized Control Trials

Now lets say we want to run a randomized control trial to test the effectiveness of administering diuretics to sepsis patients. To do this we would randomly assign our 1522 patients into two groups, one that receives diuretics (the treatment group), and one that does not receive diuretics (the control group). Let us assume that both groups receive identical treatment for sepsis except for the application of diuretics in the treatment group. This randomization could be done through a simple random variable generator, such as follows:

# We first create a new dummy variable called "treatment", which will be 1 if
# patient will receive diuretics, and 0 otherwise
sepsis$treatment = 0
# We then randomly assign half of patients to receiving diuretics using a seed for replicability
set.seed(123)
indices = sample(nrow(sepsis), nrow(sepsis) / 2)
sepsis$treatment[indices] = 1

# We now check to see how many patients are in the Treatment and Control Groups
table(sepsis$treatment)

## 
##   0   1 
## 761 761

mean(sepsis$treatment)

## [1] 0.5

Lets say we have two outcomes, mortality and length of ICU stay post diuretics, we can check what is the effectiveness of diuretics on these outcomes

# We check balance on a few variables
# Saps variable
sepsis$mortality = mort.y
sepsis$stay = stay.y
# Mortality
by(sepsis$mortality, sepsis$diuretics, mean)

## sepsis$diuretics: 0
## [1] 0.3780945
## -------------------------------------------------------- 
## sepsis$diuretics: 1
## [1] 0.3280423

# Effect 
mean(sepsis$mortality[sepsis$diuretics==1])-mean(sepsis$mortality[sepsis$diuretics==0])

## [1] -0.0500522

# Length of Stay
by(sepsis$stay, sepsis$diuretics, mean)

## sepsis$diuretics: 0
## [1] 6.301575
## -------------------------------------------------------- 
## sepsis$diuretics: 1
## [1] 15.17989

# Effect 
mean(sepsis$stay[sepsis$diuretics==1])-mean(sepsis$stay[sepsis$diuretics==0])

## [1] 8.878319

This causal mechanism is identified thanks to the random assignment of the units of analysis into treatment and control groups, which creates two undistinguishable and (ideally) identical groups on both the observed and non-observed covariate space. A large enough sample is a necessary condition for causal inference for two reasons: 1. To achieve balance on the covariate space (making both groups similar). 2. To obtain statistical power (1 - beta) in order to detect the effect that the intervention brings upon the treatment group.

By randomly separating the study group in two large enough groups, we can ensure statistically that any difference in both treatment and control groups are due to the interventions and not to other factors. By making the study group large enough, we can ensure that both treatment and control groups are very similar in their covariate space (i.e. the variables that describe them).

Observational Studies

However, clincal trials are not always available for all interventions. In these cases, one must perform an observational study (or quasi-experimental study as called by economists) where one tries to build a control group (comparisson / counterfactual group) that most resembles the group that received the intervention (diuretics).

We now show the statistics by Treatment or Control groups (wheter they received Diureticts or not) The Mortality rates (Probability) (Did NOT receive diuretics (0) and received diuretics (1))

by(mort$mortality, mort$diuretics, mean)

## mort$diuretics: 0
## [1] 0.3780945
## -------------------------------------------------------- 
## mort$diuretics: 1
## [1] 0.3280423

So naive (incorrect) effect size is

mean(data$mortality[data$diuretics==1]) - mean(data$mortality[data$diuretics==0])

## [1] -0.0500522

which means that receiving diuretics reduced mortality by 5 percentage points in sepsis patients however, this does not control for differences in covariates or confounding variables, so we should Match the treated group to a set of non treated patients that are ideally identical to them.

The Length of Stay (# of days) (Did NOT receive diuretics (0) and received diuretics (1))

by(stay$lengthOfStay, stay$diuretics, mean)

## stay$diuretics: 0
## [1] 6.301575
## -------------------------------------------------------- 
## stay$diuretics: 1
## [1] 15.17989

So naive (incorrect) effect size is

mean(data$lengthOfStay[data$diuretics==1]) - mean(data$lengthOfStay[data$diuretics==0])

## [1] 8.878319

Matching

To do the comparisson, we find a group of patients who did not receive diuretics that is as similar as possible to the group who did receive diuretics.

results = Match(mort.y, features$diuretics, features[-1], estimand = "ATT", M=1)
summary(results)

## 
## Estimate...  -0.11111 
## AI SE......  0.049385 
## T-stat.....  -2.2499 
## p.val......  0.024455 
## 
## Original number of observations..............  1522 
## Original number of treated obs...............  189 
## Matched number of observations...............  189 
## Matched number of observations  (unweighted).  189

# Therefore, controlling for counfounding variables, receiving Diuretics reduces the probability
# of mortality in 11.1% and this result is statistically significant (p_value < 0.05)
results$est

##            [,1]
## [1,] -0.1111111

Now, this is using multivariate matching (and using ALL covariates, which is not efficient) Next step is figure out what covariates are most influential, and use those to match. Traditinoally in econometrics, the covariates or features that go into the model are chosen based on a theory of how the phenomena of interest works. In contrast, in Machine Learning there are many instances where a theory is not yet been developed, or the causal mechanisms are not yet established, thus algorithmic approaches that explore the impact of each feature on the outcome are employed (like best subset selection, forward subset, backward subset, etc.).

The algorithm we use to select the best features (those that provide a higher control for counfoundness) explores the combinatorial space of logistic regression models and selects the model based on AIC.

null = glm(diuretics ~ 1, data = features, family = "binomial")
full = glm(diuretics ~ ., data = features, family = "binomial")

Here we use an iterative stepwise logistic regression to get at the best model

summary(best)

## 
## Call:
## glm(formula = diuretics ~ mechVentilation + fluidsBalanceSum + 
##     fluidsBalanceFirst + sofa1 + elixahuserB1 + bpMean2 + elixahuserB2 + 
##     vasopressorsUse + bpAvg + sofaFirst + fluidsBalance1 + fluidsInputs2 + 
##     fluidsInputsSum + fluidsInputsFirst + fluidsOutputs1 + creatinineFirst + 
##     creatinine2 + creatinine1, family = "binomial", data = features)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6715  -0.4385  -0.2564  -0.1260   4.1617  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -3.622e+00  9.494e-01  -3.816 0.000136 ***
## mechVentilation     1.358e+00  3.635e-01   3.734 0.000188 ***
## fluidsBalanceSum   -6.150e-04  2.122e-04  -2.899 0.003745 ** 
## fluidsBalanceFirst  2.815e-04  8.044e-05   3.499 0.000467 ***
## sofa1               1.320e-01  2.979e-02   4.430 9.43e-06 ***
## elixahuserB1        4.874e-01  2.080e-01   2.343 0.019113 *  
## bpMean2            -3.074e-02  1.118e-02  -2.749 0.005981 ** 
## elixahuserB2        4.992e-01  2.186e-01   2.283 0.022414 *  
## vasopressorsUse     7.479e-01  2.775e-01   2.695 0.007032 ** 
## bpAvg               1.484e-02  7.886e-03   1.882 0.059837 .  
## sofaFirst          -6.811e-02  3.222e-02  -2.114 0.034540 *  
## fluidsBalance1      5.342e-04  1.394e-04   3.832 0.000127 ***
## fluidsInputs2       9.974e-04  1.402e-04   7.112 1.15e-12 ***
## fluidsInputsSum    -2.831e-03  3.754e-04  -7.541 4.65e-14 ***
## fluidsInputsFirst   7.967e-04  1.331e-04   5.987 2.14e-09 ***
## fluidsOutputs1      6.396e-04  1.723e-04   3.712 0.000206 ***
## creatinineFirst    -6.443e-01  2.415e-01  -2.667 0.007646 ** 
## creatinine2         9.044e-01  3.432e-01   2.635 0.008411 ** 
## creatinine1        -4.270e-01  2.161e-01  -1.976 0.048141 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1142.0  on 1521  degrees of freedom
## Residual deviance:  752.6  on 1503  degrees of freedom
## AIC: 790.6
## 
## Number of Fisher Scoring iterations: 7

We Print out name of variables that create the best model

names(best$model)[-1]

##  [1] "mechVentilation"    "fluidsBalanceSum"   "fluidsBalanceFirst"
##  [4] "sofa1"              "elixahuserB1"       "bpMean2"           
##  [7] "elixahuserB2"       "vasopressorsUse"    "bpAvg"             
## [10] "sofaFirst"          "fluidsBalance1"     "fluidsInputs2"     
## [13] "fluidsInputsSum"    "fluidsInputsFirst"  "fluidsOutputs1"    
## [16] "creatinineFirst"    "creatinine2"        "creatinine1"

selected.features = names(best$model)[-1]

Now we use these features to create a better matched control group

summary(results.improved)

## 
## Estimate...  -0.12169 
## AI SE......  0.049949 
## T-stat.....  -2.4364 
## p.val......  0.014836 
## 
## Original number of observations..............  1522 
## Original number of treated obs...............  189 
## Matched number of observations...............  189 
## Matched number of observations  (unweighted).  189

results.improved$est

##            [,1]
## [1,] -0.1216931

As can be seen, our mortality is reduced even more in the Treatment group (12 percentage points), one percentage point higher than when using all features.

An alternative, is to use the propsensity scores from the best model to compe up with the ATT.

summary(results.PSM)

## 
## Estimate...  0.010295 
## AI SE......  0.1171 
## T-stat.....  0.087913 
## p.val......  0.92995 
## 
## Original number of observations..............  1522 
## Original number of treated obs...............  189 
## Matched number of observations...............  189 
## Matched number of observations  (unweighted).  723

results.PSM$est

##            [,1]
## [1,] 0.01029488

While our effect size is a reduction of 10 percentage points in mortality, it is not statistically significant so matching on the propensity score does not allow for a credible estimation of the effects of diuretics.

Now we do a Mahalanobis distance on these feature space

summary(results.improved.Mahalanobis)

## 
## Estimate...  -0.1164 
## AI SE......  0.052854 
## T-stat.....  -2.2023 
## p.val......  0.027641 
## 
## Original number of observations..............  1522 
## Original number of treated obs...............  189 
## Matched number of observations...............  189 
## Matched number of observations  (unweighted).  189

results.improved.Mahalanobis$est

##            [,1]
## [1,] -0.1164021

With the Mahalanobis distance our mortality is reduced consistent with the previous metrics (11.6 percentage points), 0.5 percentage point higher than when using all features.

Now we do Genetic Matching on these features

summary(results.Genetic)

## 
## Estimate...  -0.084656 
## AI SE......  0.048654 
## T-stat.....  -1.7399 
## p.val......  0.081869 
## 
## Original number of observations..............  1522 
## Original number of treated obs...............  189 
## Matched number of observations...............  189 
## Matched number of observations  (unweighted).  189

results.Genetic$est

##             [,1]
## [1,] -0.08465608

Therefore, controlling for counfounding variables through genetic matching reduces the probability in 8 percentage points, however the results are not statistically significant.

Observational Studies as a Service

Miguel Paredes

2016-06-02

Randomized Control Trials

Observational Studies

Matching