RMD 5 will focus on an introduction to logistic regression. Main takeaways should be when and how to use logistic regression.

To start, the formula for logistic regression is: log(p/1-p) = β0 + β1(var1)

library(faraway)
data(orings)

Plots

This is a cool plot from Dr. Knudson’s Logistic Intro.

plot(damage/6 ~ temp, data = orings, ylim = c(0,1), xlab = "Temperature at Launch", ylab = "Probability of O-ring damage")

The plot tells us a few things. First, as the temperature rises O-ring damage decreases and secondly, we should not launch below 55 degrees.

Model Building

shuttlemod <- glm(cbind(damage,6-damage) ~ temp, family = binomial, data = orings)
summary(shuttlemod)
## 
## Call:
## glm(formula = cbind(damage, 6 - damage) ~ temp, family = binomial, 
##     data = orings)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9529  -0.7345  -0.4393  -0.2079   1.9565  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 11.66299    3.29626   3.538 0.000403 ***
## temp        -0.21623    0.05318  -4.066 4.78e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 38.898  on 22  degrees of freedom
## Residual deviance: 16.912  on 21  degrees of freedom
## AIC: 33.675
## 
## Number of Fisher Scoring iterations: 6

log(p/1-p) = 11.663 + -.2162*temp

plot((damage)/6 ~ jitter(temp), data = orings, ylim = c(0,1), xlim= c(25,85), xlab = "Temperature at launch", ylab = "Probability of O-Ring Damage")
xs<- seq(25,85,1)
ys <- ilogit(11.663-.2162*xs)
lines(xs,ys)

This plots our model for our data. We can see that the lower the temp, the higher chance of O-ring damage.

Point Estimation

We can interpret regression equations by talking about the odds p/1-p. The odds of o-ring failure change by a factor of exp(.2162) for every degree the temp drops.

Confidence Intervals

Just like Poisson regression we want to interpret the exponentiated coefficents. So, we want to calculate the confidence interval for a coefficent and the exponentiate the endpoints

forbeta <- coef(shuttlemod)[2] + 1.96*c(-1,1)*.05318
exp(forbeta)#first way
## [1] 0.7258104 0.8940435
CI <- confint(shuttlemod)
## Waiting for profiling to be done...
exp(CI)
##                   2.5 %       97.5 %
## (Intercept) 263.8010254 1.372889e+08
## temp          0.7170161 8.867617e-01

Lack of Fit

df = n - # of betas

pchisq(deviance(shuttlemod), lower.tail = FALSE, df =1)
## [1] 3.914754e-05

How do we interpret this: H0: The model does not exhibit a lack of fit

So, in this example we can say that the model does exhibit a lack of fit and we can reject the null hypothesis

Drop in Deviance

mod2 <- glm(cbind(damage, 6-damage) ~1, data = orings, family = binomial)
dsmall <- deviance(mod2)
dbig <- deviance(shuttlemod)
teststat <- dsmall-dbig
pchisq(teststat, df = 1, lower.tail = FALSE)
## [1] 2.747351e-06

```