Biostatistics 213: Homework 8

## Note: On Mac OS X we strongly recommend using iplots from within JGR.
## Proceed at your own risk as iplots cannot resolve potential ev.loop deadlocks.
## 'Yes' is assumed for all dialogs as they cannot be shown without a deadlock,
## also ievent.wait() is disabled.

Loading data

Introduction

To evaluate the association between ICU mortality (outcome) and various baseline characteristics at baseline (predictors), analysis of data from the icu dataset was conducted.

library(gdata)
icu <- read.xls("~/statistics/bio213/icu.xls")

## Correct variable names: po2 for pO2, pco2 for pCO2
names(icu) <- c("id","status","age","sex","race","service","cancer","renal","inf","cpr","sys","heart","prevad","type","frac","po2","ph","pco2","bic","cre","loc")

## Same data available online:
## Hosmer and Lemeshow (2000) Applied Logistic Regression
## http://www.umass.edu/statdata/statdata/data/index.html
## description: http://www.umass.edu/statdata/statdata/data/icu.txt
## text format: http://www.umass.edu/statdata/statdata/data/icu.dat
## Excel format: http://www.umass.edu/statdata/statdata/data/icu.xls

## icu <- read.table("http://www.umass.edu/statdata/statdata/data/icu.dat",
##                   skip = 8, col.names = c("id","status","age","sex","race","service","cancer","renal","inf","cpr","sys","heart","prevad","type","frac","po2","ph","pco2","bic","cre","loc")
##                   )
## OR
## icu <- read.xls("http://www.umass.edu/statdata/statdata/data/icu.xls")

Exploring data

No missing data are present. There are few minority patients, cancer patients, chronic renal failure patients, patients who underwent CPR, long bone fracture patients, patients who had abnormal values in the ABG testing, and patients who had abnormal consciousness levels.

## Number of subjects
nrow(icu)

## Check categorical and continuous variables (No need after categorical conversion)
lapply(icu, function(x) head(summary.factor(x), 20))  # First 20 levels

## Check number of unique values
n.unique.values <- sapply(icu, function(x) nlevels(factor(x)))
binary.variables <- names(icu)[n.unique.values == 2]
dput(binary.variables)

## summary
summary(icu)

## Number of complete cases: All 200 are complete cases
sum(complete.cases(icu))

## Frequencies
library(Deducer)
frequencies(icu[, !names(icu) %in% c("id","age","sys","heart")])

Recoding data

## Recode variables
icu <- within(icu, {
    status  <- factor(status,  levels = 0:1, labels = c("Lived","Died"))
    sex     <- factor(sex,     levels = 0:1, labels = c("Male","Female"))
    race    <- factor(race,    levels = 1:3, labels = c("White","Black","Other"))
    service <- factor(service, levels = 0:1, labels = c("Medical","Surgical"))
    type    <- factor(type,    levels = 0:1, labels = c("Elective","Emergency"))
    po2     <- factor(po2,     levels = 0:1, labels = c(">60","<=60"))
    ph      <- factor(ph,      levels = 0:1, labels = c(">=7.25","<7.25"))
    pco2    <- factor(pco2,    levels = 0:1, labels = c("<=45",">45"))
    bic     <- factor(bic,     levels = 0:1, labels = c(">=18","<18"))
    cre     <- factor(cre,     levels = 0:1, labels = c("<=2.0",">2.0"))
    loc     <- factor(loc,     levels = 0:2, labels = c("No Coma or Deep Stupor","Deep Stupor","Coma"))
})

## Recode No-Yes variables
no.yes.variables <- c("cancer","renal","inf","cpr","prevad","frac")
icu[,no.yes.variables] <-
    lapply(icu[,no.yes.variables],
           function(var) {
               var <- factor(var, levels = 0:1, labels = c("No","Yes"))
           })

summary(icu)
       id        status         age           sex         race         service    cancer    renal      inf     
 Min.   :  4   Lived:160   Min.   :16.0   Male  :124   White:175   Medical : 93   No :180   No :181   No :116  
 1st Qu.:210   Died : 40   1st Qu.:46.8   Female: 76   Black: 15   Surgical:107   Yes: 20   Yes: 19   Yes: 84  
 Median :412               Median :63.0                Other: 10                                               
 Mean   :445               Mean   :57.5                                                                        
 3rd Qu.:672               3rd Qu.:72.0                                                                        
 Max.   :929               Max.   :92.0                                                                        
  cpr           sys          heart       prevad           type      frac       po2           ph        pco2    
 No :187   Min.   : 36   Min.   : 39.0   No :170   Elective : 53   No :185   >60 :184   >=7.25:187   <=45:180  
 Yes: 13   1st Qu.:110   1st Qu.: 80.0   Yes: 30   Emergency:147   Yes: 15   <=60: 16   <7.25 : 13   >45 : 20  
           Median :130   Median : 96.0                                                                         
           Mean   :132   Mean   : 98.9                                                                         
           3rd Qu.:150   3rd Qu.:118.2                                                                         
           Max.   :256   Max.   :192.0                                                                         
   bic         cre                          loc     
 >=18:185   <=2.0:190   No Coma or Deep Stupor:185  
 <18 : 15   >2.0 : 10   Deep Stupor           :  5  
                        Coma                  : 10  



Pairs relationship

library(gpairs)
gpairs(icu)

plot of chunk unnamed-chunk-5

Univariable analysis

In the univariable analysis (homework 7), higher age, lower systolic blood pressure, medical service (vs surgical service), chronic renal failure, infection, cardiopulmonary resuscitation prior to ICU admission, emergency ICU admission, creatinine > 2.0 mg/dl were significantly associated with poor outcome. Low bicarbonate had a univariable p-value less than 0.2 for association with high mortality.

Variable manipulation

As they conveyed biologically similar information, pO2, pH, pCO2, and bicarbonate variables were combined as arterial blood gas (abg) variable, indicating abnormality in any of these four variables. Deep stupor and coma were considered similar, thus these were combined to form a dichotomous variable deep stupor or coma.

icu <- within(icu, {
    abg <- NA
    abg[ (po2 == "<=60" | ph == "<7.25" | pco2 == ">45" | bic == "<18")] <- "abnormal"
    abg[!(po2 == "<=60" | ph == "<7.25" | pco2 == ">45" | bic == "<18")] <- "normal"
    abg <- factor(abg, levels = c("normal", "abnormal"))

    stupor.coma <- NA
    stupor.coma[ loc %in% c("Coma","Deep Stupor")] <- "Yes"
    stupor.coma[!loc %in% c("Coma","Deep Stupor")] <- "No"
    stupor.coma <- factor(stupor.coma, levels = c("No","Yes"))
})


if (FALSE) {
    ## filter test
    grid.list <- list(po2 = c(">60","<=60"),
                      ph = c(">=7.25","<7.25"),
                      pco2 = c("<=45",">45"),
                      bic = c(">=18","<18")
                      )

    grid.out <- expand.grid(grid.list)

    grid.out <- within(grid.out, {
        abg <- NA
        abg[ (po2 == "<=60" | ph == "<7.25" | pco2 == ">45" | bic == "<18")] <- "abnormal"
        abg[!(po2 == "<=60" | ph == "<7.25" | pco2 == ">45" | bic == "<18")] <- "normal"
        abg <- factor(abg, levels = c("normal", "abnormal"))
    })
}

Initial model

An initial model was constructed with variables that were significant in the univariable analysis (homework 7): age, systolic pressure, service (medical vs surgical), chronic renal failure, admission for infection, CPR before admission, and emergency admission. The recoded variables (abg and stupor or coma) were also included.

Age, systolic blood pressure, emergency admission, and stupor or coma were significant in the multivariable regression model.

model.full <- glm(status ~ age + sys + service + renal + inf + cpr + type + abg + stupor.coma, family = binomial, data = icu)

summary(model.full)

Call:
glm(formula = status ~ age + sys + service + renal + inf + cpr + 
    type + abg + stupor.coma, family = binomial, data = icu)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.536  -0.600  -0.344  -0.196   2.563  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -3.88315    1.64074   -2.37  0.01795 *  
age              0.02984    0.01274    2.34  0.01919 *  
sys             -0.01163    0.00725   -1.60  0.10868    
serviceSurgical -0.08872    0.53242   -0.17  0.86765    
renalYes         0.50921    0.67080    0.76  0.44779    
infYes           0.08179    0.48069    0.17  0.86488    
cprYes           0.55797    0.82620    0.68  0.49946    
typeEmergency    1.95773    0.88212    2.22  0.02646 *  
abgabnormal      0.35352    0.53509    0.66  0.50883    
stupor.comaYes   3.46283    0.91258    3.79  0.00015 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 200.16  on 199  degrees of freedom
Residual deviance: 139.69  on 190  degrees of freedom
AIC: 159.7

Number of Fisher Scoring iterations: 6

##epicalc::logistic.display(model.full)
..glm.ratio(model.full)
                   OR 2.5 % 97.5 %
(Intercept)      0.02  0.00   0.46
age              1.03  1.01   1.06
sys              0.99  0.97   1.00
serviceSurgical  0.92  0.32   2.60
renalYes         1.66  0.42   6.11
infYes           1.09  0.42   2.78
cprYes           1.75  0.30   8.42
typeEmergency    7.08  1.51  55.59
abgabnormal      1.42  0.49   4.06
stupor.comaYes  31.91  6.52 265.41

Model selection

Backward selection of variables with Akaike information criteria (AIC) was used to choose a best model, and resulted in a model with age, systolic blood pressure, emergency admission, and stupor or coma remained in the chosen model and were significant except for systolic blood pressure (p = 0.073). As there was only one non-significant variable, assessment for collinearity was not considered useful, thus was omitted.

backward.model <- step(model.full, direction = "backward", trace = 0)
summary(backward.model)

Call:
glm(formula = status ~ age + sys + type + stupor.coma, family = binomial, 
    data = icu)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.499  -0.597  -0.364  -0.186   2.515  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.04069    1.46218   -2.76   0.0057 ** 
age             0.03300    0.01206    2.74   0.0062 ** 
sys            -0.01194    0.00666   -1.79   0.0728 .  
typeEmergency   2.17646    0.80253    2.71   0.0067 ** 
stupor.comaYes  3.56868    0.88957    4.01  0.00006 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 200.16  on 199  degrees of freedom
Residual deviance: 141.94  on 195  degrees of freedom
AIC: 151.9

Number of Fisher Scoring iterations: 6

..glm.ratio(backward.model)
                  OR 2.5 % 97.5 %
(Intercept)     0.02  0.00   0.27
age             1.03  1.01   1.06
sys             0.99  0.97   1.00
typeEmergency   8.82  2.28  61.09
stupor.comaYes 35.47  7.64 283.29

Further assessment for confounding

This model was then further assessed for confounding by including other variables one by one and looking for 20% change in coefficients of age, systolic pressure, emergency admission, and stupor or coma variables. The coefficient of emergency admission increased by 36% after adding the cancer variable, thus the cancer variable was added to the model as a confounder.

Change in coefficents (rows) by adding a new variable (columns) shown as ratios

coef.after.add1 <-
    sapply(c("sex","race","service","cancer","renal","inf","cpr","heart","prevad","frac","abg","cre"),
           function(var) {
               res.model <- update(backward.model, formula = paste("~ . +", var))
               coef(res.model)[2:5]
           })

coef.before.add1        <- coef(backward.model)[2:5]
coef.ratio.before.after <- sweep(coef.after.add1, MARGIN = 1, STATS = coef.before.add1, FUN = "/")
coef.ratio.before.after
                  sex   race service cancer  renal    inf    cpr  heart prevad   frac    abg    cre
age            1.0306 0.9368  0.9996  1.108 0.9367 0.9587 1.0118 1.0023 0.9885 1.0431 0.9351 0.9702
sys            0.9617 0.8943  0.9963  1.042 1.0975 0.9276 0.9957 1.0053 0.9859 0.9848 0.9048 0.9621
typeEmergency  1.0256 1.0155  0.9045  1.362 0.9580 0.9684 0.9744 1.0023 1.0069 0.9951 0.9912 0.9724
stupor.comaYes 1.0057 1.0306  1.0086  1.079 1.0004 0.9928 0.9779 0.9991 1.0153 1.0044 0.9897 0.9979
change.ge.20percent     <- coef.ratio.before.after >= 1.2 | coef.ratio.before.after <= 0.8
change.ge.20percent
                 sex  race service cancer renal   inf   cpr heart prevad  frac   abg   cre
age            FALSE FALSE   FALSE  FALSE FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE
sys            FALSE FALSE   FALSE  FALSE FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE
typeEmergency  FALSE FALSE   FALSE   TRUE FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE
stupor.comaYes FALSE FALSE   FALSE  FALSE FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE

Cancer is associated with the predictor (admission type) in the study population, and is associated with higher mortality among the unexposed (elective admission). Biologically, cancer cannot be a downstream consequence of either of admission type or mortality.

xtabs(~ cancer + type, data = icu)
      type
cancer Elective Emergency
   No        38       142
   Yes       15         5
xtabs(~ cancer + status + type, data = icu)[,,1, drop = F]
, , type = Elective

      status
cancer Lived Died
   No     37    1
   Yes    14    1

Further assessment for changes in the coefficients of aforementioned four variables (age, systolic blood pressure, emergency admission, and stupor or coma variables) did not yield a 20% change, thus no other variables were added as confounders.

Change in coefficents (rows) by adding a second variable (columns) shown as ratios

backward.model.cancer <- update(backward.model, formula = ~ . + cancer)

coef.after.add1 <-
    sapply(c("sex","race","service","renal","inf","cpr","heart","prevad","frac","abg","cre"),
           function(var) {
               res.model <- update(backward.model.cancer, formula = paste("~ . +", var))
               coef(res.model)[2:5]
           })

coef.before.add1        <- coef(backward.model.cancer)[2:5]
coef.ratio.before.after <- sweep(coef.after.add1, MARGIN = 1, STATS = coef.before.add1, FUN = "/")
coef.ratio.before.after
                  sex   race service renal    inf    cpr  heart prevad   frac    abg    cre
age            1.0631 0.9522  1.0044 0.957 0.9658 1.0161 1.0018 0.9967 1.0626 0.9524 0.9782
sys            0.9385 0.9103  1.0031 1.084 0.9267 0.9993 1.0053 0.9880 0.9798 0.9243 0.9732
typeEmergency  1.0701 0.9980  0.9248 0.962 0.9823 0.9737 1.0007 1.0097 1.0021 0.9971 0.9805
stupor.comaYes 1.0215 1.0278  1.0116 1.001 0.9931 0.9825 0.9991 1.0257 1.0095 0.9914 0.9961
change.ge.20percent     <- coef.ratio.before.after >= 1.2 | coef.ratio.before.after <= 0.8
change.ge.20percent
                 sex  race service renal   inf   cpr heart prevad  frac   abg   cre
age            FALSE FALSE   FALSE FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE
sys            FALSE FALSE   FALSE FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE
typeEmergency  FALSE FALSE   FALSE FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE
stupor.comaYes FALSE FALSE   FALSE FALSE FALSE FALSE FALSE  FALSE FALSE FALSE FALSE

Final model

In the final model, age was associated with an odds ratio (OR) of 1.04 (95% confidence interval (CI) 1.01-1.07, p = 0.004) for each 1 unit increase, systolic blood pressure with an OR of 0.99 (95% CI 0.97-1.00, p = 0.067) for each unit increase, emergency admission with an OR 19.37 (95% CI 4.00-163.94, p = 0.001), stupor or coma with an OR of 47.08 (95% CI 9.24-434.19, p < 0.001), and cancer with an OR of 8.49 (95% CI 1.63-50.09, p = 0.011). Therefore, higher age, emergency admission, abnormal consciousness defined as deep stupor or coma, and having cancer were significant predictors of increased mortality in ICU.

Results from the final model

summary(backward.model.cancer)

Call:
glm(formula = status ~ age + sys + type + stupor.coma + cancer, 
    family = binomial, data = icu)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.569  -0.571  -0.339  -0.135   2.568  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -5.11198    1.58266   -3.23   0.0012 ** 
age             0.03657    0.01274    2.87   0.0041 ** 
sys            -0.01245    0.00679   -1.83   0.0668 .  
typeEmergency   2.96365    0.92230    3.21   0.0013 ** 
stupor.comaYes  3.85180    0.95283    4.04 0.000053 ***
cancerYes       2.13892    0.84412    2.53   0.0113 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 200.16  on 199  degrees of freedom
Residual deviance: 135.61  on 194  degrees of freedom
AIC: 147.6

Number of Fisher Scoring iterations: 6

Odds ratios and 95 percent confidence intervals

..glm.ratio(backward.model.cancer)
                  OR 2.5 % 97.5 %
(Intercept)     0.01  0.00   0.11
age             1.04  1.01   1.07
sys             0.99  0.97   1.00
typeEmergency  19.37  4.00 163.94
stupor.comaYes 47.08  9.24 434.19
cancerYes       8.49  1.63  50.09