1. Introduction of Intensive Care Unit data.

The ICU data set consists of a sample of 200 subjects who were part of a much larger study on survival of patients following admission to an adult intensive care unit (ICU). The major goal of this study was to develop a logistic regression model to predict the probability of survival to hospital discharge of these patients and to study the risk factors associated with ICU mortality. A number of publications have appeared which have focused on various facets of the problem. The reader wishing to learn more about the clinical aspects of this study should start with Lemeshow, Teres, Avrunin, and Pastides (1988).(Hosmer Jr and Sturdivant. 2013)

2. The variables of Intensive Care Unit data.

There are 200 observations of 21 variables.
ID : Identification Number
STA : Vital Status(0=Lived, 1=Died)
AGE : Years
GENDER: 0=Male, 1=Female
RACE: 1=White, 2=Black, 3=Other
SER : Service at ICU Admission(0=Medical, 1=Surgical)
CAN : Cancer Part of Present Problem (0=No, 1=Yes)
CRN : History of Chronic Renal Failure(0=No, 1=Yes)
INF : Infection Probable at ICU Admission(0=No, 1=Yes)
CPR : CPR Prior to ICU Admission (0=No, 1=Yes)
SYS : Systolic Blood Pressure(mm Hg)
HRA : Heart Rate at ICU Admission(Beats/min)
PRE : Previous Admission to an ICU within 6 Months(0=No, 1=Yes)
TYP : Type of Admission(0=Elective, 1=Emergency)
FRA : Long Bone, Multiple, Neck, Single Area, or Hip Fracture(0=No, 1=Yes)
PO2 : PO2 from Initial Blood Gases( 0 = >60, 1 = <60)
PH : PH from Initial Blood Gases(0 =>7.25, 1 =< 7.25)
PCO2: PCO2 from initial Blood(0 =<45, 1 = > 45)
BIC : Bicarbonate from Initial Blood Gases(0=>18, 1 =< 18)
CRE : Creatinine from Initial Blood(0 =<2.0, 1 => 2.0)
LOC : Level of Consciousness at ICU Admission (0=No Coma, or Stupor, 1=Deep Stupor, 2= Coma )

3. Estimation of Logistic Regression Models using “Zelig” Package

library(stargazer)
library(Zelig)

icu <- read.table("icu.txt", header=TRUE)

icu.z1 <- zelig(STA ~ AGE+GENDER+RACE+SER+CAN+CRN+INF+CPR+SYS+HRA+PRE+TYP+FRA+PO2+PH+PCO+BIC+CRE+LOC, data=icu, model="logit", x=TRUE)

How to cite this model in Zelig: Kosuke Imai, Gary King, and Olivia Lau. 2015. “logit: Logistic Regression for Dichotomous Dependent Variables” in Kosuke Imai, Gary King, and Olivia Lau, “Zelig: Everyone’s Statistical Software,” http://gking.harvard.edu/zelig

icu.z2 <- zelig(STA ~ AGE+TYP, data=icu, model="logit", x=TRUE)

How to cite this model in Zelig: Kosuke Imai, Gary King, and Olivia Lau. 2015. “logit: Logistic Regression for Dichotomous Dependent Variables” in Kosuke Imai, Gary King, and Olivia Lau, “Zelig: Everyone’s Statistical Software,” http://gking.harvard.edu/zelig

icu.z3 <- zelig(STA ~ AGE+TYP + AGE:TYP, data=icu, model="logit", x=TRUE)

How to cite this model in Zelig: Kosuke Imai, Gary King, and Olivia Lau. 2015. “logit: Logistic Regression for Dichotomous Dependent Variables” in Kosuke Imai, Gary King, and Olivia Lau, “Zelig: Everyone’s Statistical Software,” http://gking.harvard.edu/zelig

stargazer(icu.z1, icu.z2, icu.z3, type="html")
Dependent variable:
STA
(1) (2) (3)
AGE 0.052*** 0.034*** 0.074
(0.017) (0.011) (0.088)
GENDER -0.555
(0.501)
RACE -0.005
(0.528)
SER -0.545
(0.575)
CAN 2.758***
(0.980)
CRN -0.102
(0.762)
INF -0.056
(0.535)
CPR 0.978
(0.983)
SYS -0.012
(0.008)
HRA -0.004
(0.009)
PRE 0.929
(0.629)
TYP 2.744*** 2.454*** 5.308
(0.995) (0.753) (6.413)
FRA 1.152
(0.999)
PO2 0.387
(0.851)
PH 2.415**
(1.231)
PCO -3.173**
(1.386)
BIC -0.794
(0.916)
CRE 0.233
(1.075)
LOC 2.706***
(0.752)
AGE:TYP -0.040
(0.088)
Constant -5.414** -5.509*** -8.315
(2.122) (1.034) (6.375)
Observations 200 200 200
Log Likelihood -64.800 -86.538 -86.416
Akaike Inf. Crit. 169.600 179.076 180.832
Note: p<0.1; p<0.05; p<0.01

“Zelig” package is used to estimate logistic regresssion models of patient’s Vital Status(0=Lived, 1=Died) at ICU(Intensive Care Unit) as a function of explanatory variables.(Kosuke Imai and Lau 2015)
The first model uses all the variables in the dataset and shows only 6 variables are significant among 19 explanatory variables.
For the second model, only 2 explanatory variables: AGE and TYP(Type of Admission, 0=Elective, 1=Emergency) are included to explain the relationships of Patient’s age and type of admission on the response variable vital status.
The third model includes AGE, TYP and their interaction term AGE:TYP to measure the interaction effect of AGE and TYP.
Then 3 logit models are compared using “stargazer” package.(Hlavac 2014)
The result shows that the interaction term of AGE and TYP is not statistically significant.

4. Simulating the logistic models using the Zelig package

The next Simulation-based approach produces more accurate results than conventional mathematical technique for many reasons.(King, Tomz, and Wittenberg 2000) (Zelner 2009) The first simulation table shows that when type of admission to the ICU is elective(0) the ICU patient’s log odds of dying(Vital Status =1) increases 0.035 times higher for additonal age increment.
Exponentiating the logistic regression model removes the log for the odds and changes it from a model that is additive in the log-odds scale to one that is multiplicative in the odds scale.
Therefore, we also can interpret that when type of admission to the ICU is elective(TYP=0) the ICU patient’s odds of dying(Vital Status =1) increases 1.0356(=e^0.035) times higher for additonal age increment.

Similary, the second simulation table shows that when type of admission to the ICU is emergency(TYP=1) the ICU patient’s log odds of dying(Vital Status =1) increases 0.254 times higher for additonal age increment . Which means that when type of admission to the ICU is emergency(TYP=1) the ICU patient’s odds of dying(Vital Status =1) increases 1.289(=e^0.254) times higher for additonal age increment.

Then the graphs visualizes the relationship of expected value(Vital Status) and age. The second graph shows that probabilty of dying increases much steeper than the first graph since it is the case in which type of admission is emergency.

library(stargazer)
set.seed(11111)

x1 <- setx(icu.z2, TYP = 0)
s1 <- sim(icu.z2, x=x1)
summary(s1)
## 
## Model:  logit 
## Number of simulations:  1000 
## 
## Values of X
##   (Intercept)    AGE TYP
## 1           1 57.545   0
## attr(,"assign")
## [1] 0 1 2
## 
## Expected Values: E(Y|X) 
##   mean    sd   50%  2.5% 97.5%
##  0.035 0.028 0.026 0.008  0.11
## 
## Predicted Values: Y|X 
##      0     1
##  0.968 0.032
x2 <- setx(icu.z2, TYP = 1)
s2 <- sim(icu.z2, x=x2)
summary(s2) 
## 
## Model:  logit 
## Number of simulations:  1000 
## 
## Values of X
##   (Intercept)    AGE TYP
## 1           1 57.545   1
## attr(,"assign")
## [1] 0 1 2
## 
## Expected Values: E(Y|X) 
##   mean    sd   50%  2.5% 97.5%
##  0.254 0.038 0.251 0.188 0.331
## 
## Predicted Values: Y|X 
##      0     1
##  0.754 0.246
AGE.range <- min(icu$AGE):max(icu$AGE)
x1 <- setx(icu.z2, TYP = 0, AGE=AGE.range)
s1 <- sim(icu.z2, x=x1)
plot.ci(s1)

AGE.range <- min(icu$AGE):max(icu$AGE)
x2 <- setx(icu.z2, TYP = 1, AGE=AGE.range)
s2 <- sim(icu.z2, x=x2)
plot.ci(s2)

5.Statistical significance test of the interaction effect

The third logistic model icu.z3 includes the interaction term of AGE and STA. To test the significance of the interaction effect, the following “difference in differences” method is conducted. However, the distributions of simulated expected values : “zh1\(qi\)ev, zl1\(qi\)ev, zh0\(qi\)ev, zl0\(qi\)ev” make null values. The further investigation is needed for this result.

library(mvtnorm)
set.seed(12345)

xh1 <- setx(icu.z3, AGE = mean(icu$AGE)+sd(icu$AGE), TYP=1)
xl1 <- setx(icu.z3, AGE = mean(icu$AGE), TYP=1)
xh0 <- setx(icu.z3, AGE = mean(icu$AGE)+sd(icu$AGE), TYP =0)
xl0 <- setx(icu.z3, AGE = mean(icu$AGE), TYP=0)

zh1 <- sim(icu.z3, x=xh1)
zl1 <- sim(icu.z3, x=xl1)
zh0 <- sim(icu.z3, x=xh0)
zl0 <- sim(icu.z3, x=xl0)

eff <- (zh1$qi$ev - zl1$qi$ev) -(zh0$qi$ev - zl0$qi$ev)
summary(zh1$qi$ev)
## Length  Class   Mode 
##      0   NULL   NULL
quantile(eff, c(.025, .975))
##  2.5% 97.5% 
##    NA    NA

So the first difference of 2 distributions(one with the effect of TYP and the other without the effect of TYP) is simulated instead of using “difference in differences” method.

set.seed(1)
x.high1 <- setx(icu.z3, TYP=1)
x.low1 <- setx(icu.z3, TYP=0)
z.diff1 <-sim(icu.z3, x=x.low1, x1=x.high1)
summary(z.diff1)
## 
## Model:  logit 
## Number of simulations:  1000 
## 
## Values of X
##   (Intercept)    AGE TYP AGE:TYP
## 1           1 57.545   0       0
## attr(,"assign")
## [1] 0 1 2 3
## 
## Values of X1
##   (Intercept)    AGE TYP AGE:TYP
## 1           1 57.545   1  57.545
## attr(,"assign")
## [1] 0 1 2 3
## 
## Expected Values: E(Y|X) 
##   mean    sd   50%  2.5% 97.5%
##  0.041 0.069 0.016 0.001 0.237
## 
## Expected Values: E(Y|X1) 
##   mean    sd   50%  2.5% 97.5%
##  0.254 0.039 0.251 0.184 0.335
## 
## Predicted Values: Y|X 
##      0     1
##  0.966 0.034
## 
## Predicted Values: Y|X1 
##      0     1
##  0.745 0.255
## 
## First Differences: E(Y|X1) - E(Y|X) 
##   mean   sd   50%   2.5% 97.5%
##  0.213 0.08 0.226 -0.007 0.319

As a result, 95% Confidence interval(-0.007, 0.319) of the distribution for the interaction effect includes 0. Which implies the interaction term AGE:TYP has no significant effect on the vital status of patient.This result is the same as the logistic regression using “Zelig” package in (3) above.

References

Hlavac, Marek(2014). 2014. Stargazer:LaTex Code and ASCII Text for Well-Formatted Regression and Summary Staistics Tables. http://CRAN.R-project.org/package=stargazer.

Hosmer Jr, Stanley Lemeshow, David W., and Rodney X. Sturdivant. 2013. Applied Logistic Regression. John Wiley & Sons.

King, G., M. Tomz, and J. Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 341–55.

Kosuke Imai, Gary King, and Olivia Lau. 2015. Logit: Logistic Regression for Dichotomous Dependent Variables. http://gking.harvard.edu/zelig.

Zelner, Bennet A. 2009. “Using Simulation to Interpret Results from Logit, Probit, and Other Nonlinear Models.” Strategic Management Journal.