library(MASS)
library(dplyr)
Consider the space shuttle data (?shuttle) in the MASS library, and model use of the autolander (variable use) as the outcome. Fit a logistic regression model with autolander use (level "auto", coded 1) versus not (level "noauto", coded 0) as predicted by wind sign (variable wind). Give the estimated odds ratio for autolander use comparing head winds (level "head" of wind, numerator) to tail winds (level "tail", denominator).
data("shuttle")
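# make "noauto" the reference level so that "auto" is coded 1 below
shuttle <- mutate(shuttle, use = relevel(use, ref = "noauto"))
shuttle$use.bin <- as.integer(shuttle$use) - 1   # 1 = "auto", 0 = "noauto"
# "- 1" drops the intercept, so each wind level gets its own log odds
mdl <- glm(use.bin ~ wind - 1, family = "binomial", data = shuttle)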
summary(mdl)
##
## Call:
## glm(formula = use.bin ~ wind - 1, family = "binomial", data = shuttle)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.30 -1.29 1.06 1.07 1.07
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## windhead 0.251 0.178 1.41 0.16
## windtail 0.283 0.179 1.59 0.11
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 354.89 on 256 degrees of freedom
## Residual deviance: 350.35 on 254 degrees of freedom
## AIC: 354.3
##
## Number of Fisher Scoring iterations: 4
exp(coef(mdl))
## windhead windtail
## 1.29 1.33
exp(coef(mdl)[[1]])/exp(coef(mdl)[[2]])
## [1] 0.969
The odds ratio is 0.969.
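As a cross-check on the 0.969 above, the sketch below recovers the same head-versus-tail odds ratio in two other ways, assuming only base R and MASS: from a model with an intercept (where "head" is the reference level, so the windtail coefficient is the log odds ratio for tail versus head) and directly from the wind-by-use table of counts. With a single binary factor the fitted odds in each group equal the empirical odds, so the two numbers should agree. The names use_bin, fit_ref and tab are illustrative, not from the original.

```r
library(MASS)
data("shuttle", package = "MASS")

# 1 = autolander used ("auto"), 0 = not used ("noauto")
shuttle$use_bin <- as.integer(shuttle$use == "auto")

# With an intercept, "head" is the reference level of wind, so the
# windtail coefficient is log(odds_tail / odds_head)
fit_ref <- glm(use_bin ~ wind, family = binomial, data = shuttle)
or_model <- exp(-coef(fit_ref)[["windtail"]])  # head (numerator) / tail (denominator)

# Same quantity from the raw counts: rows = wind, columns = use
tab <- table(shuttle$wind, shuttle$use)
odds_head <- tab["head", "auto"] / tab["head", "noauto"]
odds_tail <- tab["tail", "auto"] / tab["tail", "noauto"]
or_table <- odds_head / odds_tail

c(model = or_model, table = or_table)
```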
Consider the previous problem. Give the estimated odds ratio for autolander use comparing head winds (numerator) to tail winds (denominator) adjusting for wind strength from the variable magn.
mdl2 <- glm(use.bin ~ wind + magn - 1, family = "binomial", data = shuttle)
summary(mdl2)
##
## Call:
## glm(formula = use.bin ~ wind + magn - 1, family = "binomial",
## data = shuttle)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.35 -1.32 1.01 1.04 1.18
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## windhead 3.64e-01 2.84e-01 1.28 0.20
## windtail 3.96e-01 2.84e-01 1.39 0.16
## magnMedium -1.01e-15 3.60e-01 0.00 1.00
## magnOut -3.80e-01 3.57e-01 -1.06 0.29
## magnStrong -6.44e-02 3.59e-01 -0.18 0.86
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 354.89 on 256 degrees of freedom
## Residual deviance: 348.78 on 251 degrees of freedom
## AIC: 358.8
##
## Number of Fisher Scoring iterations: 4
exp(coef(mdl2))
## windhead windtail magnMedium magnOut magnStrong
## 1.438 1.485 1.000 0.684 0.938
exp(coef(mdl2))[[1]]/exp(coef(mdl2))[[2]]
## [1] 0.968
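The odds ratio is 0.968 after adjusting for wind strength (magn), essentially the same as the unadjusted 0.969. Adjusting for wind magnitude therefore has almost no effect on the estimated head-versus-tail odds ratio, and an odds ratio this close to 1 suggests little association between wind direction and autolander use.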
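One way to back up the comparison of 0.969 and 0.968 is to compute both odds ratios from intercept parameterizations and attach a Wald interval to the adjusted one. This is a sketch: confint.default gives Wald intervals on the coefficient (log odds) scale, and negating and exponentiating the windtail interval turns it into an interval for the head/tail odds ratio. Object names are illustrative.

```r
library(MASS)
data("shuttle", package = "MASS")
shuttle$use_bin <- as.integer(shuttle$use == "auto")

fit_unadj <- glm(use_bin ~ wind, family = binomial, data = shuttle)
fit_adj   <- glm(use_bin ~ wind + magn, family = binomial, data = shuttle)

# Head (numerator) vs tail (denominator) odds ratio = exp(-beta_windtail)
or_unadj <- exp(-coef(fit_unadj)[["windtail"]])
or_adj   <- exp(-coef(fit_adj)[["windtail"]])

# Wald 95% interval for the adjusted head/tail odds ratio;
# negating the coefficient swaps the endpoints, so sort them again
ci_tail <- confint.default(fit_adj)["windtail", ]
ci_adj  <- sort(unname(exp(-ci_tail)))

round(c(unadjusted = or_unadj, adjusted = or_adj), 3)
round(ci_adj, 3)
```

If the interval straddles 1, the data are consistent with no difference in the odds of autolander use between head and tail winds once wind strength is accounted for.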
If you fit a logistic regression model to a binary variable, for example use of the autolander, then fit a logistic regression model for one minus the outcome (not using the autolander) what happens to the coefficients?
mdl3 <- glm(1 - use.bin ~ wind - 1, family = "binomial", data = shuttle)
summary(mdl)$coef
## Estimate Std. Error z value Pr(>|z|)
## windhead 0.251 0.178 1.41 0.158
## windtail 0.283 0.179 1.59 0.113
summary(mdl3)$coef
## Estimate Std. Error z value Pr(>|z|)
## windhead -0.251 0.178 -1.41 0.158
## windtail -0.283 0.179 -1.59 0.113
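The coefficients reverse their signs; the standard errors and p-values are unchanged, and the z values flip sign along with the estimates.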
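The sign flip follows from the logit link: logit(1 - p) = log((1 - p)/p) = -logit(p), so the model for not using the autolander is the same model with every coefficient negated. The sketch below checks this with an intercept parameterization (different from the no-intercept mdl and mdl3 above, chosen to show the result does not depend on the parameterization); object names are illustrative.

```r
library(MASS)
data("shuttle", package = "MASS")
shuttle$use_bin <- as.integer(shuttle$use == "auto")

fit_use   <- glm(use_bin ~ wind, family = binomial, data = shuttle)
fit_nouse <- glm(I(1 - use_bin) ~ wind, family = binomial, data = shuttle)

# logit(1 - p) = -logit(p): every estimate flips sign,
# while the standard errors (and hence the p-values) stay the same
est_use   <- coef(summary(fit_use))
est_nouse <- coef(summary(fit_nouse))
cbind(use = est_use[, "Estimate"], no_use = est_nouse[, "Estimate"])
```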
Consider the insect spray data InsectSprays. Fit a Poisson model using spray as a factor level. Report the estimated relative rate comparing spray A (numerator) to spray B (denominator).
data("InsectSprays")
mdl4 <- glm(count ~ spray -1, family = "poisson", data = InsectSprays)
summary(mdl4)$coef
## Estimate Std. Error z value Pr(>|z|)
## sprayA 2.674 0.0758 35.27 1.45e-272
## sprayB 2.730 0.0737 37.03 3.51e-300
## sprayC 0.734 0.2000 3.67 2.43e-04
## sprayD 1.593 0.1302 12.23 2.07e-34
## sprayE 1.253 0.1543 8.12 4.71e-16
## sprayF 2.813 0.0707 39.79 0.00e+00
coefs <- exp(coef(mdl4))
coefs
## sprayA sprayB sprayC sprayD sprayE sprayF
## 14.50 15.33 2.08 4.92 3.50 16.67
coefs[[1]]/coefs[[2]]
## [1] 0.946
The relative rate of spray A to spray B is 0.946.
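The same relative rate can be read off a single coefficient by making spray B the reference level, and it can be checked against the group means, since in a Poisson model with one factor the fitted rate for each level is that level's mean count. A sketch in base R, with illustrative object names:

```r
data("InsectSprays")

# Make spray B the reference level: the sprayA coefficient is then log(rate_A / rate_B)
sprays <- transform(InsectSprays, spray = relevel(spray, ref = "B"))
fit_rr <- glm(count ~ spray, family = poisson, data = sprays)
rr_model <- exp(coef(fit_rr)[["sprayA"]])

# With a single factor, the Poisson MLE of each rate is that group's mean count
group_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)
rr_means <- group_means[["A"]] / group_means[["B"]]

c(model = rr_model, means = rr_means)
```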
Consider a Poisson glm with an offset, t. For example, a model of the form glm(count ~ x + offset(t), family = poisson) where x is a factor variable comparing a treatment (1) to a control (0) and t is the natural log of a monitoring time. What is the impact on the coefficient for x if we fit the model glm(count ~ x + offset(t2), family = poisson) where t2 <- log(10) + t? In other words, what happens to the coefficients if we change the units of the offset variable? (Note that adding log(10) on the log scale is multiplying by 10 on the original scale.)
mdl5.1 <- glm(count ~ spray, offset = log(count+1), family = poisson, data = InsectSprays)
mdl5.2 <- glm(count ~ spray, offset = log(count+1)+log(10), family = poisson, data = InsectSprays)
summary(mdl5.1)
##
## Call:
## glm(formula = count ~ spray, family = poisson, data = InsectSprays,
## offset = log(count + 1))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1625 -0.0909 -0.0190 0.0674 0.6557
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.06669 0.07581 -0.88 0.38
## sprayB 0.00351 0.10574 0.03 0.97
## sprayC -0.32535 0.21389 -1.52 0.13
## sprayD -0.11845 0.15065 -0.79 0.43
## sprayE -0.18462 0.17192 -1.07 0.28
## sprayF 0.00842 0.10367 0.08 0.94
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 9.5211 on 71 degrees of freedom
## Residual deviance: 4.9323 on 66 degrees of freedom
## AIC: 283.2
##
## Number of Fisher Scoring iterations: 4
summary(mdl5.2)
##
## Call:
## glm(formula = count ~ spray, family = poisson, data = InsectSprays,
## offset = log(count + 1) + log(10))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1625 -0.0909 -0.0190 0.0674 0.6557
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.36928 0.07581 -31.25 <2e-16 ***
## sprayB 0.00351 0.10574 0.03 0.97
## sprayC -0.32535 0.21389 -1.52 0.13
## sprayD -0.11845 0.15065 -0.79 0.43
## sprayE -0.18462 0.17192 -1.07 0.28
## sprayF 0.00842 0.10367 0.08 0.94
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 9.5211 on 71 degrees of freedom
## Residual deviance: 4.9323 on 66 degrees of freedom
## AIC: 283.2
##
## Number of Fisher Scoring iterations: 4
rbind(coef(mdl5.1),coef(mdl5.2))
## (Intercept) sprayB sprayC sprayD sprayE sprayF
## [1,] -0.0667 0.00351 -0.325 -0.118 -0.185 0.00842
## [2,] -2.3693 0.00351 -0.325 -0.118 -0.185 0.00842
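The intercept decreases by log(10) ≈ 2.303 (from -0.0667 to -2.3693), while the spray coefficients, their standard errors and the deviance are unchanged. (The offset log(count + 1) used here is built from the response itself and serves only to illustrate the mechanics; in a real analysis the offset would be the log of an exposure such as monitoring time.)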
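The sketch below repeats the comparison with simulated data that follow the question's setup more closely: a hypothetical treatment/control factor x and a hypothetical monitoring time whose log is the offset. The seed, sample size and true rates are assumptions made purely for the demonstration. Algebraically, log E[count] = b0 + b1 x + t2 = (b0 + log(10)) + b1 x + t, so switching the offset from t to t2 should shift the intercept down by log(10) and leave the x coefficient alone.

```r
set.seed(42)

# Hypothetical data: a two-level treatment factor and a monitoring time per unit
n <- 50
x <- factor(rep(c("control", "treatment"), length.out = n),
            levels = c("control", "treatment"))
mon_time <- runif(n, min = 1, max = 10)   # assumed monitoring time, arbitrary units
count <- rpois(n, lambda = mon_time * exp(0.3 + 0.5 * (x == "treatment")))

t1 <- log(mon_time)   # the offset "t" from the question
t2 <- log(10) + t1    # equivalent to log(10 * mon_time)

fit_t1 <- glm(count ~ x + offset(t1), family = poisson)
fit_t2 <- glm(count ~ x + offset(t2), family = poisson)

# Intercept drops by log(10); the treatment coefficient is identical
rbind(t = coef(fit_t1), t2 = coef(fit_t2))
```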
Consider the data
x <- -5:5
y <- c(5.12, 3.93, 2.67, 1.87, 0.52, 0.08, 0.93, 2.05, 2.54, 3.87, 4.97)
Using a knot point at 0, fit a linear model that looks like a hockey stick with two lines meeting at x=0. Include an intercept term, x and the knot point term. What is the estimated slope of the line after 0?
x <- -5:5
y <- c(5.12, 3.93, 2.67, 1.87, 0.52, 0.08, 0.93, 2.05, 2.54, 3.87, 4.97)
plot(x, y, pch = 21, cex = 2, col="grey20", bg="cadetblue2")
knots <- 0
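# hinge term: (x - knot) for x > knot, 0 otherwise
splineTerms <- sapply(knots, function(knot) (x > knot) * (x - knot))
# design matrix: intercept, x, hinge; "- 1" stops lm() adding a second intercept
xmat <- cbind(1, x, splineTerms)
mdl6 <- lm(y ~ xmat - 1)
yhat <- predict(mdl6)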
lines(x, yhat, col = "red", lwd = 2)
summary(mdl6)
##
## Call:
## lm(formula = y ~ xmat - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3216 -0.1098 0.0159 0.1407 0.2626
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## xmat -0.1826 0.1356 -1.35 0.21
## xmatx -1.0242 0.0481 -21.31 2.5e-08 ***
## xmat 2.0372 0.0857 23.76 1.0e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.228 on 8 degrees of freedom
## Multiple R-squared: 0.996, Adjusted R-squared: 0.995
## F-statistic: 665 on 3 and 8 DF, p-value: 6.25e-10
sum(coef(mdl6)[2:3])
## [1] 1.01
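The slope of the line after 0 is the sum of the x coefficient and the knot-term coefficient, -1.0242 + 2.0372 ≈ 1.013; the slope before 0 is just the x coefficient, -1.024.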
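An equivalent way to get the slope after the knot without adding coefficients is to reparameterize with one hinge term per side: because pmin(x, 0) + pmax(x, 0) equals x, the model y ~ pmin(x, 0) + pmax(x, 0) spans the same column space as the intercept, x and (x > 0) * x basis used above, and its last coefficient is the slope after 0 directly. A sketch in base R:

```r
x <- -5:5
y <- c(5.12, 3.93, 2.67, 1.87, 0.52, 0.08, 0.93, 2.05, 2.54, 3.87, 4.97)

# Original basis: intercept, x, and the hinge (x > 0) * x = pmax(x, 0)
fit_hinge <- lm(y ~ x + pmax(x, 0))
slope_after <- sum(coef(fit_hinge)[2:3])

# Equivalent basis: separate slopes on each side of the knot at 0
fit_sides <- lm(y ~ pmin(x, 0) + pmax(x, 0))
coef(fit_sides)   # intercept, slope before 0, slope after 0
```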