Data in ICU (in Stat2Data) for a sample of 200 patients.
Predict Survival (1=Live, 0=Dead) based on three variables:Age, systolic blood pressure (SysBP), and Pulse
data(ICU)
ggplot(ICU, aes(x=factor(Survive) , y=Age))+
geom_violin(trim=FALSE)+
geom_boxplot(aes(fill=factor(Survive)), width=0.2) +
xlab("Survival Status")
ggplot(ICU, aes(x = factor(Survive), y = SysBP)) +
geom_violin(trim = FALSE) +
geom_boxplot(aes(fill = factor(Survive)),width = 0.2)+
theme_bw() +
ggtitle("Predicting Survival in ICU") + ylab("SysBP") + xlab("Survival Status (1 = alive, 0 = dead)")
ggplot(ICU, aes(x = factor(Survive), y = Pulse)) +
geom_violin(trim = FALSE) +
geom_boxplot(aes(fill = factor(Survive)),width = 0.2)+
theme_bw() +
ggtitle("Predicting Survival in ICU") + ylab("Pulse") + xlab("Survival Status (1 = alive, 0 = dead)")
lmodBPInf2 <- glm(Survive~SysBP+ SysBP:factor(Infection),family = binomial,data=ICU)
tidy(lmodBPInf2,exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 3 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.547 0.772 -0.780 0.435 0.117 2.47
## 2 SysBP 1.02 0.00614 2.89 0.00385 1.01 1.03
## 3 SysBP:factor(Infectio… 0.995 0.00291 -1.57 0.115 0.990 1.00
lmodBPInf2_pred <- predict(lmodBPInf2, type="response", se=TRUE)
ICU$fit2 <- lmodBPInf2_pred$fit
ggplot(ICU, aes(x = SysBP, y = jitter(Survive, 0.25),colour=factor(Infection))) +
geom_point() +
geom_line(data=ICU, aes(x = SysBP, y = fit2, col=factor(Infection))) + theme_bw()
Fit the model in which logit(Survive) depends on SysBP and on Infection, allowing for different coefficients of SysBP for Infection=1 and for Infection=0
The P-value for testing the interaction is 0.15, so the interaction term can be dropped from the model. The effect of SysBP on Survive appears to be the same for patients with and without infection.
lmodBPInf3 <- glm(Survive~SysBP + factor(Infection) + SysBP*factor(Infection), family=binomial, data=ICU )
summary(lmodBPInf3)
##
## Call:
## glm(formula = Survive ~ SysBP + factor(Infection) + SysBP * factor(Infection),
## family = binomial, data = ICU)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2513 0.4757 0.5511 0.6018 1.4916
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.123221 1.195223 0.940 0.3473
## SysBP 0.005189 0.008638 0.601 0.5480
## factor(Infection)1 -2.933825 1.589470 -1.846 0.0649 .
## SysBP:factor(Infection)1 0.017660 0.012294 1.436 0.1509
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 200.16 on 199 degrees of freedom
## Residual deviance: 185.41 on 196 degrees of freedom
## AIC: 193.41
##
## Number of Fisher Scoring iterations: 4
#Find odds ratio , so use tidy
tidy(lmodBPInf3, exponentiate=TRUE, conf.int=TRUE )
## # A tibble: 4 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 3.07 1.20 0.940 0.347 0.287 32.2
## 2 SysBP 1.01 0.00864 0.601 0.548 0.989 1.02
## 3 factor(Infection)1 0.0532 1.59 -1.85 0.0649 0.00223 1.17
## 4 SysBP:factor(Infectio… 1.02 0.0123 1.44 0.151 0.994 1.04
lmodBPInf3_pred <- predict(lmodBPInf3, type="response", se=TRUE)
ICU$fit3 <- lmodBPInf3_pred$fit
ggplot(ICU, aes(x = SysBP, y = jitter(Survive, 0.25),colour=factor(Infection))) +
geom_point() +
geom_line(data=ICU, aes(x = SysBP, y = fit3, col=factor(Infection))) + theme_bw()
Consider a multiple logistic regression model for Survive using the three quantitative predictors in the dataset Age, SysBP, and Pulse.
lmodASP <- glm(Survive~ Age + SysBP + Pulse,family = binomial,data=ICU)
tidy(lmodASP,exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 4 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 2.89 1.23 0.864 0.388 0.260 33.3
## 2 Age 0.972 0.0108 -2.63 0.00850 0.950 0.992
## 3 SysBP 1.02 0.00587 2.85 0.00432 1.01 1.03
## 4 Pulse 0.999 0.00670 -0.139 0.890 0.986 1.01
After Age and SysBP explain variability in Survive, Pulse is not helpful for explaining any remaining unexplained variation in survival status at discharge (P-value = 0.89). Here is some output when Pulse is dropped from the model.
lmodAS <- glm(Survive~ Age + SysBP,family = binomial,data=ICU)
tidy(lmodAS,exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 3 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 2.62 1.00 0.962 0.336 0.374 19.4
## 2 Age 0.972 0.0108 -2.64 0.00838 0.950 0.992
## 3 SysBP 1.02 0.00586 2.87 0.00407 1.01 1.03
The predicted log(odds) of survival for an 87-year-old patient with a systolic blood pressur of 80 is:
new <- data.frame(Age=87, SysBP=80)
predict(lmodAS, newdata=new)
## 1
## -0.1624203
The estimated odds of survival are:
exp(predict(lmodAS, newdata=new))
## 1
## 0.8500838
estimated probability that this person survives his visit is:
#We use type=response to find probability. Otherwise, R will provide log odds.
predict(lmodAS, type="response", newdata=new)
## 1
## 0.459484