suppressMessages(library(ggplot2))
suppressMessages(library(stats))
suppressMessages(library(dplyr))
suppressMessages(library(MASS))

Question - 1

Consider the space shuttle data ?𝚜𝚑𝚞𝚝𝚝𝚕𝚎 in the 𝙼𝙰𝚂𝚂 library. Consider modeling the use of the autolander as the outcome (variable name 𝚞𝚜𝚎). Fit a logistic regression model with autolander (variable auto) use (labeled as “auto” 1) versus not (0) as predicted by wind sign (variable wind). Give the estimated odds ratio for autolander use comparing head winds, labeled as “head” in the variable headwind (numerator) to tail winds (denominator).

Answer:

data("shuttle")
# Method 1
shuttle$use2<-ifelse(shuttle$use=="auto", 1, 0)
shuttle$headwind<-ifelse(shuttle$wind=="head", 1, 0)
mdl1<-glm(formula = use2~headwind, family = binomial, data = shuttle)
exp(coef(mdl1))
## (Intercept)    headwind 
##   1.3272727   0.9686888
# Method 2
mdl1<- glm(relevel(use, "noauto") ~ relevel(wind, "tail"), data = shuttle, family = binomial)
exp(coef(mdl1))
##               (Intercept) relevel(wind, "tail")head 
##                 1.3272727                 0.9686888

Question - 2

Consider the previous problem. Give the estimated odds ratio for autolander use comparing head winds (numerator) to tail winds (denominator) adjusting for wind strength from the variable magn.

Answer:

# The estimate doesn't change with the inclusion of wind strength
shuttle$use2<-ifelse(shuttle$use=="auto", 1, 0)
shuttle$wind2<-ifelse(shuttle$wind=="head", 1, 0)
mdl2<-glm(use2~ wind2 + magn, "binomial", data = shuttle)
exp(coef(mdl2))
## (Intercept)       wind2  magnMedium     magnOut  magnStrong 
##   1.4851533   0.9684981   1.0000000   0.6841941   0.9376181

Question - 3

If you fit a logistic regression model to a binary variable, for example use of the autolander, then fit a logistic regression model for one minus the outcome (not using the autolander) what happens to the coefficients?

Answer:

mdl1<-glm(formula = use2~wind, family = binomial, data = shuttle)
summary(mdl1)$coef
##               Estimate Std. Error   z value  Pr(>|z|)
## (Intercept) 0.25131443  0.1781742 1.4104987 0.1583925
## windtail    0.03181183  0.2522429 0.1261159 0.8996402
mdl2<-glm(formula = 1-use2~wind, family = binomial, data = shuttle)
summary(mdl2)$coef
##                Estimate Std. Error    z value  Pr(>|z|)
## (Intercept) -0.25131443  0.1781742 -1.4104987 0.1583925
## windtail    -0.03181183  0.2522429 -0.1261159 0.8996402

As seen from above simulation, the coefficients reverse their signs. Remember that the coefficients are on the log scale. So changing the sign changes the numerator and denominator for the exponent.

Question - 4

Consider the insect spray data 𝙸𝚗𝚜𝚎𝚌𝚝𝚂𝚙𝚛𝚊𝚢𝚜. Fit a Poisson model using spray as a factor level. Report the estimated relative rate comapring spray A (numerator) to spray B (denominator).

Answer:

data("InsectSprays")
head(InsectSprays)
##   count spray
## 1    10     A
## 2     7     A
## 3    20     A
## 4    14     A
## 5    14     A
## 6    12     A
str(InsectSprays)
## 'data.frame':    72 obs. of  2 variables:
##  $ count: num  10 7 20 14 14 12 10 23 17 20 ...
##  $ spray: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
InsectSprays$spray2<-relevel(InsectSprays$spray, "B")
mdl3<-glm(count~spray2, poisson, data = InsectSprays)
summary(mdl3)
## 
## Call:
## glm(formula = count ~ spray2, family = poisson, data = InsectSprays)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3852  -0.8876  -0.1482   0.6063   2.6922  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.73003    0.07372  37.032  < 2e-16 ***
## spray2A     -0.05588    0.10574  -0.528    0.597    
## spray2C     -1.99606    0.21315  -9.364  < 2e-16 ***
## spray2D     -1.13740    0.14961  -7.602 2.91e-14 ***
## spray2E     -1.47727    0.17101  -8.638  < 2e-16 ***
## spray2F      0.08338    0.10215   0.816    0.414    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 409.041  on 71  degrees of freedom
## Residual deviance:  98.329  on 66  degrees of freedom
## AIC: 376.59
## 
## Number of Fisher Scoring iterations: 5
exp(summary(mdl3)$coef[2,1])
## [1] 0.9456522

Question - 5

Consider a Poisson glm with an offset, t. So, for example, a model of the form 𝚐𝚕𝚖(𝚌𝚘𝚞𝚗𝚝 ~ 𝚡 + 𝚘𝚏𝚏𝚜𝚎𝚝(𝚝), 𝚏𝚊𝚖𝚒𝚕𝚢 = 𝚙𝚘𝚒𝚜𝚜𝚘𝚗) where 𝚡 is a factor variable comparing a treatment (1) to a control (0) and 𝚝 is the natural log of a monitoring time. What is impact of the coefficient for 𝚡 if we fit the model 𝚐𝚕𝚖(𝚌𝚘𝚞𝚗𝚝 ~ 𝚡 + 𝚘𝚏𝚏𝚜𝚎𝚝(𝚝𝟸), 𝚏𝚊𝚖𝚒𝚕𝚢 = 𝚙𝚘𝚒𝚜𝚜𝚘𝚗) where 𝟸 <- 𝚕𝚘𝚐(𝟷𝟶) + 𝚝? In other words, what happens to the coefficients if we change the units of the offset variable. (Note, adding log(10) on the log scale is multiplying by 10 on the original scale.)

Answer:

t=c(1:72)
mdl4<-glm(count~spray+offset(log(t)), poisson, data = InsectSprays)
summary(mdl4)$coef
##               Estimate Std. Error    z value     Pr(>|z|)
## (Intercept)  0.8023465  0.0758098  10.583677 3.547573e-26
## sprayB      -0.9900881  0.1057445  -9.363018 7.749091e-21
## sprayC      -3.4861040  0.2138856 -16.298916 1.004583e-59
## sprayD      -2.9592198  0.1506528 -19.642641 6.683014e-86
## sprayE      -3.5477842  0.1719205 -20.636193 1.298990e-94
## sprayF      -2.1861377  0.1036683 -21.087803 1.029321e-98
t2=t*(log(10)+t)
mdl5<-glm(count~spray+offset(log(t2)), poisson, data = InsectSprays)
summary(mdl4)$coef
##               Estimate Std. Error    z value     Pr(>|z|)
## (Intercept)  0.8023465  0.0758098  10.583677 3.547573e-26
## sprayB      -0.9900881  0.1057445  -9.363018 7.749091e-21
## sprayC      -3.4861040  0.2138856 -16.298916 1.004583e-59
## sprayD      -2.9592198  0.1506528 -19.642641 6.683014e-86
## sprayE      -3.5477842  0.1719205 -20.636193 1.298990e-94
## sprayF      -2.1861377  0.1036683 -21.087803 1.029321e-98

Note, the coefficients are unchanged. Recall that, except the intercept, all of the coefficients are interpretted as log relative rates when holding the other variables or offset constant. Thus, a unit change in the offset would cancel out. This is not true of the intercept, which is interperted as the log rate (not relative rate) with all of the covariates set to 0.

Question - 6

Consider the data

x <- -5:5
y <- c(5.12, 3.93, 2.67, 1.87, 0.52, 0.08, 0.93, 2.05, 2.54, 3.87, 4.97)

Using a knot point at 0, fit a linear model that looks like a hockey stick with two lines meeting at x=0. Include an intercept term, x and the knot point term. What is the estimated slope of the line after 0?

Answer:

# Method 1
knots<-c(0)
splineTerms <- sapply(knots, function(knot) (x > knot) * (x - knot))
xMat <- cbind(1, x, splineTerms)
summary(lm(y ~ xMat))$coef
##               Estimate Std. Error    t value     Pr(>|t|)
## (Intercept) -0.1825806 0.13557812  -1.346682 2.149877e-01
## xMatx       -1.0241584 0.04805280 -21.313188 2.470198e-08
## xMat         2.0372258 0.08574713  23.758531 1.048711e-08
yhat<-predict(lm(y~xMat))
yhat
##          1          2          3          4          5          6 
##  4.9382111  3.9140528  2.8898944  1.8657361  0.8415777 -0.1825806 
##          7          8          9         10         11 
##  0.8304868  1.8435543  2.8566217  3.8696891  4.8827566
(yhat[11]-yhat[6])/5
##       11 
## 1.013067
plot(x,y)
lines(x,yhat, col="red")

# Method 2
z<-(x>0)*x
fit<-lm(y~x+z)
sum(coef(fit)[2:3])
## [1] 1.013067