The gold standard for establishing causality are randomized experiments or as they are called in medical terms, clinical trials. From the point of view of statistics, randomized experiments ensure that an observed outcome is produced from or caused by an intervention.
With OSaaS, our goal is to provide clinical researchers with a tool to conduct efficient fast observational studies integrating methods from health sciences, econometrics, and machine learning.
Ann Harbor RCTs, Causality, Ionnides, Observational Studies, teaches a course Most patients are treated like as in RCTs by their doctors, because there is no evidence-based practice. Doctors dont want to recognize this Joanna Mazel Univ. Michigan Ann Harbor
For example, lets say we are trying to understand the effect of administering diuretics to sepsis patients. In the ideal world, we would have a randomized control trial (clinical trial) that would provide us with the evidence of the effectiveness of diuretics vs. other alternative treatments or even vs. not administering diuretics.
Let’s take some data of sepsis patients from the MIMIC II database and use it to demonstrate how a randomized experiment would work, and how assessments of treatment effectiveness actually occur in the medical realm.
library(foreign)
library(base)
library(rpart)
library(e1071)
library(dplyr)
library(leaps)
library(calibrate)
library(stats)
library(rpart.plot)
library(rattle)
library(caTools)
library(ROCR)
library(caret)
library(Matching)
library(mlbench)
library(caret)
library(leaps)
# We first set the working directory
setwd("C:/Miguel/PhD/diuretics/Miguel/OSaaS/OSaaS")
# Importing the csv file into an R dataframe, which is a data structure
data = read.csv("C:/Miguel/PhD/diuretics/Miguel/OSaaS/OSaaS/data.csv")
# Data prepration
mort = data[,-58]
stay = data[,-57]
sepsis = data[,-c(57:58)]
features = data[,-c(57:58)]
outcomes = data[,c(57:58)]
mort.y = outcomes[,1]
stay.y = outcomes[,2]
Let’s explore the data in R
# Number of cases or observations
nrow(sepsis)
## [1] 1522
# Number of features
ncol(sepsis)
## [1] 56
#Feature names
names(sepsis)
## [1] "diuretics" "age" "gender"
## [4] "race" "sapsAvg" "sapsFirst"
## [7] "saps1" "saps2" "saps3"
## [10] "sofaAvg" "sofaFirst" "sofa1"
## [13] "sofa2" "sofa3" "elixahuserNumber"
## [16] "elixahuserB1" "elixahuserB2" "elixahuserB3"
## [19] "elixahuserB4" "elixahuserB5" "elixahuserB6"
## [22] "elixahuserB7" "elixahuserB8" "elixahuserB9"
## [25] "creatinineAvg" "creatinineFirst" "creatinine1"
## [28] "creatinine2" "creatinine3" "fluidsInputsSum"
## [31] "fluidsInputsFirst" "fluidsInputs1" "fluidsInputs2"
## [34] "fluidsInputs3" "fluidsOutputsSum" "fluidsOutputsFirst"
## [37] "fluidsOutputs1" "fluidsOutputs2" "fluidsOutputs3"
## [40] "fluidsBalanceSum" "fluidsBalanceFirst" "fluidsBalance1"
## [43] "fluidsBalance2" "fluidsBalance3" "vasopressorsUse"
## [46] "mechVentilation" "bpAvg" "bpFirst"
## [49] "bp1" "bp2" "bp3"
## [52] "bpMeanAvg" "bpMeanFirst" "bpMean1"
## [55] "bpMean2" "bpMean3"
Some variables describe the condition of a patient during his or her ICU stay. A brief explanation of these variables follows.
SAPS II score: This point score is based upon a severity of disease classification system. It is calculated from 12 routine physiological measurements during the first 24 hours, information about previous health status, and information obtained upon hospital admission.
SOFA score: This point score is based on a scoring system to determine the extent of a person’s organ function or rate of failure. The score is based on six different scores, one each for the respiratory, cardiovascular, hepatic, coagulation, renal, and neurological systems.
Elixhauser score: This score integrates a list of 30 comorbidities relying on the ICD-9 CM coding manual. The comorbidities were not simplified as an index because each comorbidity affected outcomes (length of hospital stay, hospital changes, and mortality) differently among different patients groups. The comorbidities identified by the Elixhauser comorbidity measure are significantly associated with in-hospital mortality and include both acute and chronic conditions. Walraven et al. has derived and validated an Elixhauser comorbidity index that summarizes disease burden and can discriminate for in-hospital mortality.
Creatinine: A byproduct of creatine phosphate metabolism in muscle. It is generally produced at a constant rate by the body, depending on muscle mass. In our study, this factor is included because it can be indicative of kidney disease.
Vasopressors: Administration of vasopressors indicates whether the patient was administered any sort of vaso-suppressor. Vasopressors are drugs that constrict the blood vessels and thereby elevate blood pressure. Usually, before a patient is considered for discharge from the ICU, vasopressors are suspended. The administration of vasopressors has been included as a factor given the correlation with health status.
Mechanical ventilation: This boolean variable indicates whether or not the patient was mechanically ventilated. Mechanical ventilation assists or replaces spontaneous breathing. Ventilation may involve a machine called a ventilator or breathing may be assisted by a physician, respiratory therapist, or other suitable person compressing a bag or set of bellows.
Arterial blood pressure: The pressure exerted by circulating blood upon the walls of blood vessels and is one of the principal vital signs. The blood pressure in the circulation is principally due to the pumping action of the heart. In the study, the value for arterial blood pressure refers to systolic pressure. During each heartbeat, blood pressure varies between a maximum (systolic) and a minimum (diastolic) pressure. Systolic blood pressure is a measure of blood pressure while the heart is beating, while diastolic pressure is a measure of blood pressure while the heart is relaxed.
Mean arterial blood pressure: This quantity is defined as arterial pressure during a single cardiac cycle. Is is calculated as \(\frac{2 \cdot diastolic + systolic}{3}\).
Now lets say we want to run a randomized control trial to test the effectiveness of administering diuretics to sepsis patients. To do this we would randomly assign our 1522 patients into two groups, one that receives diuretics (the treatment group), and one that does not receive diuretics (the control group). Let us assume that both groups receive identical treatment for sepsis except for the application of diuretics in the treatment group. This randomization could be done through a simple random variable generator, such as follows:
# We first create a new dummy variable called "treatment", which will be 1 if
# patient will receive diuretics, and 0 otherwise
sepsis$treatment = 0
# We then randomly assign half of patients to receiving diuretics using a seed for replicability
set.seed(123)
indices = sample(nrow(sepsis), nrow(sepsis) / 2)
sepsis$treatment[indices] = 1
# We now check to see how many patients are in the Treatment and Control Groups
table(sepsis$treatment)
##
## 0 1
## 761 761
mean(sepsis$treatment)
## [1] 0.5
Lets say we have two outcomes, mortality and length of ICU stay post diuretics, we can check what is the effectiveness of diuretics on these outcomes
# We check balance on a few variables
# Saps variable
sepsis$mortality = mort.y
sepsis$stay = stay.y
# Mortality
by(sepsis$mortality, sepsis$diuretics, mean)
## sepsis$diuretics: 0
## [1] 0.3780945
## --------------------------------------------------------
## sepsis$diuretics: 1
## [1] 0.3280423
# Effect
mean(sepsis$mortality[sepsis$diuretics==1])-mean(sepsis$mortality[sepsis$diuretics==0])
## [1] -0.0500522
# Length of Stay
by(sepsis$stay, sepsis$diuretics, mean)
## sepsis$diuretics: 0
## [1] 6.301575
## --------------------------------------------------------
## sepsis$diuretics: 1
## [1] 15.17989
# Effect
mean(sepsis$stay[sepsis$diuretics==1])-mean(sepsis$stay[sepsis$diuretics==0])
## [1] 8.878319
This causal mechanism is identified thanks to the random assignment of the units of analysis into treatment and control groups, which creates two undistinguishable and (ideally) identical groups on both the observed and non-observed covariate space. A large enough sample is a necessary condition for causal inference for two reasons: 1. To achieve balance on the covariate space (making both groups similar). 2. To obtain statistical power (1 - beta) in order to detect the effect that the intervention brings upon the treatment group.
By randomly separating the study group in two large enough groups, we can ensure statistically that any difference in both treatment and control groups are due to the interventions and not to other factors. By making the study group large enough, we can ensure that both treatment and control groups are very similar in their covariate space (i.e. the variables that describe them).
However, clincal trials are not always available for all interventions. In these cases, one must perform an observational study (or quasi-experimental study as called by economists) where one tries to build a control group (comparisson / counterfactual group) that most resembles the group that received the intervention (diuretics).
We now show the statistics by Treatment or Control groups (wheter they received Diureticts or not) The Mortality rates (Probability) (Did NOT receive diuretics (0) and received diuretics (1))
by(mort$mortality, mort$diuretics, mean)
## mort$diuretics: 0
## [1] 0.3780945
## --------------------------------------------------------
## mort$diuretics: 1
## [1] 0.3280423
So naive (incorrect) effect size is
mean(data$mortality[data$diuretics==1]) - mean(data$mortality[data$diuretics==0])
## [1] -0.0500522
which means that receiving diuretics reduced mortality by 5 percentage points in sepsis patients however, this does not control for differences in covariates or confounding variables, so we should Match the treated group to a set of non treated patients that are ideally identical to them.
The Length of Stay (# of days) (Did NOT receive diuretics (0) and received diuretics (1))
by(stay$lengthOfStay, stay$diuretics, mean)
## stay$diuretics: 0
## [1] 6.301575
## --------------------------------------------------------
## stay$diuretics: 1
## [1] 15.17989
So naive (incorrect) effect size is
mean(data$lengthOfStay[data$diuretics==1]) - mean(data$lengthOfStay[data$diuretics==0])
## [1] 8.878319
which means that receiving diuretics reduced mortality by 5 percentage points in sepsis patients however, this does not control for differences in covariates or confounding variables, so we should Match the treated group to a set of non treated patients that are ideally identical to them.
To do the comparisson, we find a group of patients who did not receive diuretics that is as similar as possible to the group who did receive diuretics.
results = Match(mort.y, features$diuretics, features[-1], estimand = "ATT", M=1)
summary(results)
##
## Estimate... -0.11111
## AI SE...... 0.049385
## T-stat..... -2.2499
## p.val...... 0.024455
##
## Original number of observations.............. 1522
## Original number of treated obs............... 189
## Matched number of observations............... 189
## Matched number of observations (unweighted). 189
# Therefore, controlling for counfounding variables, receiving Diuretics reduces the probability
# of mortality in 11.1% and this result is statistically significant (p_value < 0.05)
results$est
## [,1]
## [1,] -0.1111111
Now, this is using multivariate matching (and using ALL covariates, which is not efficient) Next step is figure out what covariates are most influential, and use those to match. Traditinoally in econometrics, the covariates or features that go into the model are chosen based on a theory of how the phenomena of interest works. In contrast, in Machine Learning there are many instances where a theory is not yet been developed, or the causal mechanisms are not yet established, thus algorithmic approaches that explore the impact of each feature on the outcome are employed (like best subset selection, forward subset, backward subset, etc.).
The algorithm we use to select the best features (those that provide a higher control for counfoundness) explores the combinatorial space of logistic regression models and selects the model based on AIC.
null = glm(diuretics ~ 1, data = features, family = "binomial")
full = glm(diuretics ~ ., data = features, family = "binomial")
Here we use an iterative stepwise logistic regression to get at the best model
summary(best)
##
## Call:
## glm(formula = diuretics ~ mechVentilation + fluidsBalanceSum +
## fluidsBalanceFirst + sofa1 + elixahuserB1 + bpMean2 + elixahuserB2 +
## vasopressorsUse + bpAvg + sofaFirst + fluidsBalance1 + fluidsInputs2 +
## fluidsInputsSum + fluidsInputsFirst + fluidsOutputs1 + creatinineFirst +
## creatinine2 + creatinine1, family = "binomial", data = features)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6715 -0.4385 -0.2564 -0.1260 4.1617
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.622e+00 9.494e-01 -3.816 0.000136 ***
## mechVentilation 1.358e+00 3.635e-01 3.734 0.000188 ***
## fluidsBalanceSum -6.150e-04 2.122e-04 -2.899 0.003745 **
## fluidsBalanceFirst 2.815e-04 8.044e-05 3.499 0.000467 ***
## sofa1 1.320e-01 2.979e-02 4.430 9.43e-06 ***
## elixahuserB1 4.874e-01 2.080e-01 2.343 0.019113 *
## bpMean2 -3.074e-02 1.118e-02 -2.749 0.005981 **
## elixahuserB2 4.992e-01 2.186e-01 2.283 0.022414 *
## vasopressorsUse 7.479e-01 2.775e-01 2.695 0.007032 **
## bpAvg 1.484e-02 7.886e-03 1.882 0.059837 .
## sofaFirst -6.811e-02 3.222e-02 -2.114 0.034540 *
## fluidsBalance1 5.342e-04 1.394e-04 3.832 0.000127 ***
## fluidsInputs2 9.974e-04 1.402e-04 7.112 1.15e-12 ***
## fluidsInputsSum -2.831e-03 3.754e-04 -7.541 4.65e-14 ***
## fluidsInputsFirst 7.967e-04 1.331e-04 5.987 2.14e-09 ***
## fluidsOutputs1 6.396e-04 1.723e-04 3.712 0.000206 ***
## creatinineFirst -6.443e-01 2.415e-01 -2.667 0.007646 **
## creatinine2 9.044e-01 3.432e-01 2.635 0.008411 **
## creatinine1 -4.270e-01 2.161e-01 -1.976 0.048141 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1142.0 on 1521 degrees of freedom
## Residual deviance: 752.6 on 1503 degrees of freedom
## AIC: 790.6
##
## Number of Fisher Scoring iterations: 7
We Print out name of variables that create the best model
names(best$model)[-1]
## [1] "mechVentilation" "fluidsBalanceSum" "fluidsBalanceFirst"
## [4] "sofa1" "elixahuserB1" "bpMean2"
## [7] "elixahuserB2" "vasopressorsUse" "bpAvg"
## [10] "sofaFirst" "fluidsBalance1" "fluidsInputs2"
## [13] "fluidsInputsSum" "fluidsInputsFirst" "fluidsOutputs1"
## [16] "creatinineFirst" "creatinine2" "creatinine1"
selected.features = names(best$model)[-1]
Now we use these features to create a better matched control group
summary(results.improved)
##
## Estimate... -0.12169
## AI SE...... 0.049949
## T-stat..... -2.4364
## p.val...... 0.014836
##
## Original number of observations.............. 1522
## Original number of treated obs............... 189
## Matched number of observations............... 189
## Matched number of observations (unweighted). 189
results.improved$est
## [,1]
## [1,] -0.1216931
As can be seen, our mortality is reduced even more in the Treatment group (12 percentage points), one percentage point higher than when using all features.
An alternative, is to use the propsensity scores from the best model to compe up with the ATT.
summary(results.PSM)
##
## Estimate... 0.010295
## AI SE...... 0.1171
## T-stat..... 0.087913
## p.val...... 0.92995
##
## Original number of observations.............. 1522
## Original number of treated obs............... 189
## Matched number of observations............... 189
## Matched number of observations (unweighted). 723
results.PSM$est
## [,1]
## [1,] 0.01029488
While our effect size is a reduction of 10 percentage points in mortality, it is not statistically significant so matching on the propensity score does not allow for a credible estimation of the effects of diuretics.
Now we do a Mahalanobis distance on these feature space
summary(results.improved.Mahalanobis)
##
## Estimate... -0.1164
## AI SE...... 0.052854
## T-stat..... -2.2023
## p.val...... 0.027641
##
## Original number of observations.............. 1522
## Original number of treated obs............... 189
## Matched number of observations............... 189
## Matched number of observations (unweighted). 189
results.improved.Mahalanobis$est
## [,1]
## [1,] -0.1164021
With the Mahalanobis distance our mortality is reduced consistent with the previous metrics (11.6 percentage points), 0.5 percentage point higher than when using all features.
Now we do Genetic Matching on these features
summary(results.Genetic)
##
## Estimate... -0.084656
## AI SE...... 0.048654
## T-stat..... -1.7399
## p.val...... 0.081869
##
## Original number of observations.............. 1522
## Original number of treated obs............... 189
## Matched number of observations............... 189
## Matched number of observations (unweighted). 189
results.Genetic$est
## [,1]
## [1,] -0.08465608
Therefore, controlling for counfounding variables through genetic matching reduces the probability in 8 percentage points, however the results are not statistically significant.