Biostatistics 213: Homework 8
## Note: On Mac OS X we strongly recommend using iplots from within JGR.
## Proceed at your own risk as iplots cannot resolve potential ev.loop deadlocks.
## 'Yes' is assumed for all dialogs as they cannot be shown without a deadlock,
## also ievent.wait() is disabled.
Introduction
To evaluate the association between ICU mortality (outcome) and various baseline characteristics at baseline (predictors), analysis of data from the icu dataset was conducted.
library(gdata)
icu <- read.xls("~/statistics/bio213/icu.xls")
## Correct variable names: po2 for pO2, pco2 for pCO2
names(icu) <- c("id","status","age","sex","race","service","cancer","renal","inf","cpr","sys","heart","prevad","type","frac","po2","ph","pco2","bic","cre","loc")
## Same data available online:
## Hosmer and Lemeshow (2000) Applied Logistic Regression
## http://www.umass.edu/statdata/statdata/data/index.html
## description: http://www.umass.edu/statdata/statdata/data/icu.txt
## text format: http://www.umass.edu/statdata/statdata/data/icu.dat
## Excel format: http://www.umass.edu/statdata/statdata/data/icu.xls
## icu <- read.table("http://www.umass.edu/statdata/statdata/data/icu.dat",
## skip = 8, col.names = c("id","status","age","sex","race","service","cancer","renal","inf","cpr","sys","heart","prevad","type","frac","po2","ph","pco2","bic","cre","loc")
## )
## OR
## icu <- read.xls("http://www.umass.edu/statdata/statdata/data/icu.xls")
No missing data are present. There are few minority patients, cancer patients, chronic renal failure patients, patients who underwent CPR, long bone fracture patients, patients who had abnormal values in the ABG testing, and patients who had abnormal consciousness levels.
## Number of subjects
nrow(icu)
## Check categorical and continuous variables (No need after categorical conversion)
lapply(icu, function(x) head(summary.factor(x), 20)) # First 20 levels
## Check number of unique values
n.unique.values <- sapply(icu, function(x) nlevels(factor(x)))
binary.variables <- names(icu)[n.unique.values == 2]
dput(binary.variables)
## summary
summary(icu)
## Number of complete cases: All 200 are complete cases
sum(complete.cases(icu))
## Frequencies
library(Deducer)
frequencies(icu[, !names(icu) %in% c("id","age","sys","heart")])
## Recode variables
icu <- within(icu, {
status <- factor(status, levels = 0:1, labels = c("Lived","Died"))
sex <- factor(sex, levels = 0:1, labels = c("Male","Female"))
race <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
service <- factor(service, levels = 0:1, labels = c("Medical","Surgical"))
type <- factor(type, levels = 0:1, labels = c("Elective","Emergency"))
po2 <- factor(po2, levels = 0:1, labels = c(">60","<=60"))
ph <- factor(ph, levels = 0:1, labels = c(">=7.25","<7.25"))
pco2 <- factor(pco2, levels = 0:1, labels = c("<=45",">45"))
bic <- factor(bic, levels = 0:1, labels = c(">=18","<18"))
cre <- factor(cre, levels = 0:1, labels = c("<=2.0",">2.0"))
loc <- factor(loc, levels = 0:2, labels = c("No Coma or Deep Stupor","Deep Stupor","Coma"))
})
## Recode No-Yes variables
no.yes.variables <- c("cancer","renal","inf","cpr","prevad","frac")
icu[,no.yes.variables] <-
lapply(icu[,no.yes.variables],
function(var) {
var <- factor(var, levels = 0:1, labels = c("No","Yes"))
})
summary(icu)
id status age sex race service cancer renal inf
Min. : 4 Lived:160 Min. :16.0 Male :124 White:175 Medical : 93 No :180 No :181 No :116
1st Qu.:210 Died : 40 1st Qu.:46.8 Female: 76 Black: 15 Surgical:107 Yes: 20 Yes: 19 Yes: 84
Median :412 Median :63.0 Other: 10
Mean :445 Mean :57.5
3rd Qu.:672 3rd Qu.:72.0
Max. :929 Max. :92.0
cpr sys heart prevad type frac po2 ph pco2
No :187 Min. : 36 Min. : 39.0 No :170 Elective : 53 No :185 >60 :184 >=7.25:187 <=45:180
Yes: 13 1st Qu.:110 1st Qu.: 80.0 Yes: 30 Emergency:147 Yes: 15 <=60: 16 <7.25 : 13 >45 : 20
Median :130 Median : 96.0
Mean :132 Mean : 98.9
3rd Qu.:150 3rd Qu.:118.2
Max. :256 Max. :192.0
bic cre loc
>=18:185 <=2.0:190 No Coma or Deep Stupor:185
<18 : 15 >2.0 : 10 Deep Stupor : 5
Coma : 10
library(gpairs)
gpairs(icu)
Univariable analysis
In the univariable analysis (homework 7), higher age, lower systolic blood pressure, medical service (vs surgical service), chronic renal failure, infection, cardiopulmonary resuscitation prior to ICU admission, emergency ICU admission, creatinine > 2.0 mg/dl were significantly associated with poor outcome. Low bicarbonate had a univariable p-value less than 0.2 for association with high mortality.
Variable manipulation
As they conveyed biologically similar information, pO2, pH, pCO2, and bicarbonate variables were combined as arterial blood gas (abg) variable, indicating abnormality in any of these four variables. Deep stupor and coma were considered similar, thus these were combined to form a dichotomous variable deep stupor or coma.
icu <- within(icu, {
abg <- NA
abg[ (po2 == "<=60" | ph == "<7.25" | pco2 == ">45" | bic == "<18")] <- "abnormal"
abg[!(po2 == "<=60" | ph == "<7.25" | pco2 == ">45" | bic == "<18")] <- "normal"
abg <- factor(abg, levels = c("normal", "abnormal"))
stupor.coma <- NA
stupor.coma[ loc %in% c("Coma","Deep Stupor")] <- "Yes"
stupor.coma[!loc %in% c("Coma","Deep Stupor")] <- "No"
stupor.coma <- factor(stupor.coma, levels = c("No","Yes"))
})
if (FALSE) {
## filter test
grid.list <- list(po2 = c(">60","<=60"),
ph = c(">=7.25","<7.25"),
pco2 = c("<=45",">45"),
bic = c(">=18","<18")
)
grid.out <- expand.grid(grid.list)
grid.out <- within(grid.out, {
abg <- NA
abg[ (po2 == "<=60" | ph == "<7.25" | pco2 == ">45" | bic == "<18")] <- "abnormal"
abg[!(po2 == "<=60" | ph == "<7.25" | pco2 == ">45" | bic == "<18")] <- "normal"
abg <- factor(abg, levels = c("normal", "abnormal"))
})
}
Initial model
An initial model was constructed with variables that were significant in the univariable analysis (homework 7): age, systolic pressure, service (medical vs surgical), chronic renal failure, admission for infection, CPR before admission, and emergency admission. The recoded variables (abg and stupor or coma) were also included.
Age, systolic blood pressure, emergency admission, and stupor or coma were significant in the multivariable regression model.
model.full <- glm(status ~ age + sys + service + renal + inf + cpr + type + abg + stupor.coma, family = binomial, data = icu)
summary(model.full)
Call:
glm(formula = status ~ age + sys + service + renal + inf + cpr +
type + abg + stupor.coma, family = binomial, data = icu)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.536 -0.600 -0.344 -0.196 2.563
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.88315 1.64074 -2.37 0.01795 *
age 0.02984 0.01274 2.34 0.01919 *
sys -0.01163 0.00725 -1.60 0.10868
serviceSurgical -0.08872 0.53242 -0.17 0.86765
renalYes 0.50921 0.67080 0.76 0.44779
infYes 0.08179 0.48069 0.17 0.86488
cprYes 0.55797 0.82620 0.68 0.49946
typeEmergency 1.95773 0.88212 2.22 0.02646 *
abgabnormal 0.35352 0.53509 0.66 0.50883
stupor.comaYes 3.46283 0.91258 3.79 0.00015 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 200.16 on 199 degrees of freedom
Residual deviance: 139.69 on 190 degrees of freedom
AIC: 159.7
Number of Fisher Scoring iterations: 6
##epicalc::logistic.display(model.full)
..glm.ratio(model.full)
OR 2.5 % 97.5 %
(Intercept) 0.02 0.00 0.46
age 1.03 1.01 1.06
sys 0.99 0.97 1.00
serviceSurgical 0.92 0.32 2.60
renalYes 1.66 0.42 6.11
infYes 1.09 0.42 2.78
cprYes 1.75 0.30 8.42
typeEmergency 7.08 1.51 55.59
abgabnormal 1.42 0.49 4.06
stupor.comaYes 31.91 6.52 265.41
Model selection
Backward selection of variables with Akaike information criteria (AIC) was used to choose a best model, and resulted in a model with age, systolic blood pressure, emergency admission, and stupor or coma remained in the chosen model and were significant except for systolic blood pressure (p = 0.073). As there was only one non-significant variable, assessment for collinearity was not considered useful, thus was omitted.
backward.model <- step(model.full, direction = "backward", trace = 0)
summary(backward.model)
Call:
glm(formula = status ~ age + sys + type + stupor.coma, family = binomial,
data = icu)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.499 -0.597 -0.364 -0.186 2.515
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.04069 1.46218 -2.76 0.0057 **
age 0.03300 0.01206 2.74 0.0062 **
sys -0.01194 0.00666 -1.79 0.0728 .
typeEmergency 2.17646 0.80253 2.71 0.0067 **
stupor.comaYes 3.56868 0.88957 4.01 0.00006 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 200.16 on 199 degrees of freedom
Residual deviance: 141.94 on 195 degrees of freedom
AIC: 151.9
Number of Fisher Scoring iterations: 6
..glm.ratio(backward.model)
OR 2.5 % 97.5 %
(Intercept) 0.02 0.00 0.27
age 1.03 1.01 1.06
sys 0.99 0.97 1.00
typeEmergency 8.82 2.28 61.09
stupor.comaYes 35.47 7.64 283.29
Further assessment for confounding
This model was then further assessed for confounding by including other variables one by one and looking for 20% change in coefficients of age, systolic pressure, emergency admission, and stupor or coma variables. The coefficient of emergency admission increased by 36% after adding the cancer variable, thus the cancer variable was added to the model as a confounder.
Change in coefficents (rows) by adding a new variable (columns) shown as ratios
coef.after.add1 <-
sapply(c("sex","race","service","cancer","renal","inf","cpr","heart","prevad","frac","abg","cre"),
function(var) {
res.model <- update(backward.model, formula = paste("~ . +", var))
coef(res.model)[2:5]
})
coef.before.add1 <- coef(backward.model)[2:5]
coef.ratio.before.after <- sweep(coef.after.add1, MARGIN = 1, STATS = coef.before.add1, FUN = "/")
coef.ratio.before.after
sex race service cancer renal inf cpr heart prevad frac abg cre
age 1.0306 0.9368 0.9996 1.108 0.9367 0.9587 1.0118 1.0023 0.9885 1.0431 0.9351 0.9702
sys 0.9617 0.8943 0.9963 1.042 1.0975 0.9276 0.9957 1.0053 0.9859 0.9848 0.9048 0.9621
typeEmergency 1.0256 1.0155 0.9045 1.362 0.9580 0.9684 0.9744 1.0023 1.0069 0.9951 0.9912 0.9724
stupor.comaYes 1.0057 1.0306 1.0086 1.079 1.0004 0.9928 0.9779 0.9991 1.0153 1.0044 0.9897 0.9979
change.ge.20percent <- coef.ratio.before.after >= 1.2 | coef.ratio.before.after <= 0.8
change.ge.20percent
sex race service cancer renal inf cpr heart prevad frac abg cre
age FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
sys FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
typeEmergency FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
stupor.comaYes FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Cancer is associated with the predictor (admission type) in the study population, and is associated with higher mortality among the unexposed (elective admission). Biologically, cancer cannot be a downstream consequence of either of admission type or mortality.
xtabs(~ cancer + type, data = icu)
type
cancer Elective Emergency
No 38 142
Yes 15 5
xtabs(~ cancer + status + type, data = icu)[,,1, drop = F]
, , type = Elective
status
cancer Lived Died
No 37 1
Yes 14 1
Further assessment for changes in the coefficients of aforementioned four variables (age, systolic blood pressure, emergency admission, and stupor or coma variables) did not yield a 20% change, thus no other variables were added as confounders.
Change in coefficents (rows) by adding a second variable (columns) shown as ratios
backward.model.cancer <- update(backward.model, formula = ~ . + cancer)
coef.after.add1 <-
sapply(c("sex","race","service","renal","inf","cpr","heart","prevad","frac","abg","cre"),
function(var) {
res.model <- update(backward.model.cancer, formula = paste("~ . +", var))
coef(res.model)[2:5]
})
coef.before.add1 <- coef(backward.model.cancer)[2:5]
coef.ratio.before.after <- sweep(coef.after.add1, MARGIN = 1, STATS = coef.before.add1, FUN = "/")
coef.ratio.before.after
sex race service renal inf cpr heart prevad frac abg cre
age 1.0631 0.9522 1.0044 0.957 0.9658 1.0161 1.0018 0.9967 1.0626 0.9524 0.9782
sys 0.9385 0.9103 1.0031 1.084 0.9267 0.9993 1.0053 0.9880 0.9798 0.9243 0.9732
typeEmergency 1.0701 0.9980 0.9248 0.962 0.9823 0.9737 1.0007 1.0097 1.0021 0.9971 0.9805
stupor.comaYes 1.0215 1.0278 1.0116 1.001 0.9931 0.9825 0.9991 1.0257 1.0095 0.9914 0.9961
change.ge.20percent <- coef.ratio.before.after >= 1.2 | coef.ratio.before.after <= 0.8
change.ge.20percent
sex race service renal inf cpr heart prevad frac abg cre
age FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
sys FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
typeEmergency FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
stupor.comaYes FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Final model
In the final model, age was associated with an odds ratio (OR) of 1.04 (95% confidence interval (CI) 1.01-1.07, p = 0.004) for each 1 unit increase, systolic blood pressure with an OR of 0.99 (95% CI 0.97-1.00, p = 0.067) for each unit increase, emergency admission with an OR 19.37 (95% CI 4.00-163.94, p = 0.001), stupor or coma with an OR of 47.08 (95% CI 9.24-434.19, p < 0.001), and cancer with an OR of 8.49 (95% CI 1.63-50.09, p = 0.011). Therefore, higher age, emergency admission, abnormal consciousness defined as deep stupor or coma, and having cancer were significant predictors of increased mortality in ICU.
Results from the final model
summary(backward.model.cancer)
Call:
glm(formula = status ~ age + sys + type + stupor.coma + cancer,
family = binomial, data = icu)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.569 -0.571 -0.339 -0.135 2.568
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.11198 1.58266 -3.23 0.0012 **
age 0.03657 0.01274 2.87 0.0041 **
sys -0.01245 0.00679 -1.83 0.0668 .
typeEmergency 2.96365 0.92230 3.21 0.0013 **
stupor.comaYes 3.85180 0.95283 4.04 0.000053 ***
cancerYes 2.13892 0.84412 2.53 0.0113 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 200.16 on 199 degrees of freedom
Residual deviance: 135.61 on 194 degrees of freedom
AIC: 147.6
Number of Fisher Scoring iterations: 6
Odds ratios and 95 percent confidence intervals
..glm.ratio(backward.model.cancer)
OR 2.5 % 97.5 %
(Intercept) 0.01 0.00 0.11
age 1.04 1.01 1.07
sys 0.99 0.97 1.00
typeEmergency 19.37 4.00 163.94
stupor.comaYes 47.08 9.24 434.19
cancerYes 8.49 1.63 50.09