Data in ICU (in Stat2Data) for a sample of 200 patients.

Predict Survival (1=Live, 0=Dead) based on three variables:Age, systolic blood pressure (SysBP), and Pulse

data(ICU)

Boxplots for Each Predictor

ggplot(ICU, aes(x=factor(Survive) , y=Age))+
  geom_violin(trim=FALSE)+ 
  geom_boxplot(aes(fill=factor(Survive)), width=0.2) +
  xlab("Survival Status")

ggplot(ICU, aes(x = factor(Survive), y = SysBP)) +
geom_violin(trim = FALSE) +
geom_boxplot(aes(fill = factor(Survive)),width = 0.2)+
theme_bw() +
ggtitle("Predicting Survival in ICU") + ylab("SysBP") + xlab("Survival Status (1 = alive, 0 = dead)")

ggplot(ICU, aes(x = factor(Survive), y = Pulse)) +
geom_violin(trim = FALSE) +
geom_boxplot(aes(fill = factor(Survive)),width = 0.2)+
theme_bw() +
ggtitle("Predicting Survival in ICU") + ylab("Pulse") + xlab("Survival Status (1 = alive, 0 = dead)")

Including a Categorical Predictor- Different Slopes

lmodBPInf2 <- glm(Survive~SysBP+ SysBP:factor(Infection),family = binomial,data=ICU)

tidy(lmodBPInf2,exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 3 x 7
##   term                   estimate std.error statistic p.value conf.low conf.high
##   <chr>                     <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)               0.547   0.772      -0.780 0.435      0.117      2.47
## 2 SysBP                     1.02    0.00614     2.89  0.00385    1.01       1.03
## 3 SysBP:factor(Infectio…    0.995   0.00291    -1.57  0.115      0.990      1.00
lmodBPInf2_pred <- predict(lmodBPInf2, type="response", se=TRUE) 

ICU$fit2 <- lmodBPInf2_pred$fit

ggplot(ICU, aes(x = SysBP, y = jitter(Survive, 0.25),colour=factor(Infection))) +
geom_point() +
geom_line(data=ICU, aes(x = SysBP, y = fit2, col=factor(Infection))) + theme_bw()

Two Different Curves

Fit the model in which logit(Survive) depends on SysBP and on Infection, allowing for different coefficients of SysBP for Infection=1 and for Infection=0

The P-value for testing the interaction is 0.15, so the interaction term can be dropped from the model. The effect of SysBP on Survive appears to be the same for patients with and without infection.

lmodBPInf3 <- glm(Survive~SysBP + factor(Infection) + SysBP*factor(Infection), family=binomial, data=ICU )
summary(lmodBPInf3)
## 
## Call:
## glm(formula = Survive ~ SysBP + factor(Infection) + SysBP * factor(Infection), 
##     family = binomial, data = ICU)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2513   0.4757   0.5511   0.6018   1.4916  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)  
## (Intercept)               1.123221   1.195223   0.940   0.3473  
## SysBP                     0.005189   0.008638   0.601   0.5480  
## factor(Infection)1       -2.933825   1.589470  -1.846   0.0649 .
## SysBP:factor(Infection)1  0.017660   0.012294   1.436   0.1509  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 200.16  on 199  degrees of freedom
## Residual deviance: 185.41  on 196  degrees of freedom
## AIC: 193.41
## 
## Number of Fisher Scoring iterations: 4
#Find odds ratio , so use tidy
tidy(lmodBPInf3, exponentiate=TRUE, conf.int=TRUE )
## # A tibble: 4 x 7
##   term                   estimate std.error statistic p.value conf.low conf.high
##   <chr>                     <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)              3.07     1.20        0.940  0.347   0.287       32.2 
## 2 SysBP                    1.01     0.00864     0.601  0.548   0.989        1.02
## 3 factor(Infection)1       0.0532   1.59       -1.85   0.0649  0.00223      1.17
## 4 SysBP:factor(Infectio…   1.02     0.0123      1.44   0.151   0.994        1.04
lmodBPInf3_pred <- predict(lmodBPInf3, type="response", se=TRUE) 

ICU$fit3 <- lmodBPInf3_pred$fit

ggplot(ICU, aes(x = SysBP, y = jitter(Survive, 0.25),colour=factor(Infection))) +
geom_point() +
geom_line(data=ICU, aes(x = SysBP, y = fit3, col=factor(Infection))) + theme_bw()

Prediction

Consider a multiple logistic regression model for Survive using the three quantitative predictors in the dataset Age, SysBP, and Pulse.

lmodASP <- glm(Survive~ Age + SysBP + Pulse,family = binomial,data=ICU)

tidy(lmodASP,exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 4 x 7
##   term        estimate std.error statistic p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)    2.89    1.23        0.864 0.388      0.260    33.3  
## 2 Age            0.972   0.0108     -2.63  0.00850    0.950     0.992
## 3 SysBP          1.02    0.00587     2.85  0.00432    1.01      1.03 
## 4 Pulse          0.999   0.00670    -0.139 0.890      0.986     1.01

After Age and SysBP explain variability in Survive, Pulse is not helpful for explaining any remaining unexplained variation in survival status at discharge (P-value = 0.89). Here is some output when Pulse is dropped from the model.

lmodAS <- glm(Survive~ Age + SysBP,family = binomial,data=ICU)

tidy(lmodAS,exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 3 x 7
##   term        estimate std.error statistic p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)    2.62    1.00        0.962 0.336      0.374    19.4  
## 2 Age            0.972   0.0108     -2.64  0.00838    0.950     0.992
## 3 SysBP          1.02    0.00586     2.87  0.00407    1.01      1.03

The predicted log(odds) of survival for an 87-year-old patient with a systolic blood pressur of 80 is:

new <- data.frame(Age=87, SysBP=80)
predict(lmodAS, newdata=new)
##          1 
## -0.1624203

The estimated odds of survival are:

exp(predict(lmodAS, newdata=new))
##         1 
## 0.8500838

estimated probability that this person survives his visit is:

#We use type=response to find probability. Otherwise, R will provide log odds. 
predict(lmodAS, type="response", newdata=new)
##        1 
## 0.459484