Question 1 - Consider the space shuttle data ?shuttle in the MASS library. Consider modeling the use of the autolander as the outcome (variable name use). Fit a logistic regression model with autolander (variable auto) use (labeled as “auto” 1) versus not (0) as predicted by wind sign (variable wind). Give the estimated odds ratio for autolander use comparing head winds, labeled as “head” in the variable headwind (numerator) to tail winds (denominator).

library(MASS)
data(shuttle)
head(shuttle)
##   stability error sign wind   magn vis  use
## 1     xstab    LX   pp head  Light  no auto
## 2     xstab    LX   pp head Medium  no auto
## 3     xstab    LX   pp head Strong  no auto
## 4     xstab    LX   pp tail  Light  no auto
## 5     xstab    LX   pp tail Medium  no auto
## 6     xstab    LX   pp tail Strong  no auto
unique (shuttle$use)
## [1] auto   noauto
## Levels: auto noauto
unique(shuttle$wind)
## [1] head tail
## Levels: head tail
# Auto lander predicted by wind sign
fitAutoLander<-glm(shuttle$use~shuttle$wind, family="binomial")
summary(fitAutoLander)
## 
## Call:
## glm(formula = shuttle$use ~ shuttle$wind, family = "binomial")
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.073  -1.073  -1.060   1.286   1.300  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)      -0.25131    0.17817  -1.410    0.158
## shuttle$windtail -0.03181    0.25224  -0.126    0.900
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 350.36  on 255  degrees of freedom
## Residual deviance: 350.35  on 254  degrees of freedom
## AIC: 354.35
## 
## Number of Fisher Scoring iterations: 4
# Coefficient 2 is the windtail refereced to windhead
# coef[2] is the ration in the log scale so we must exponentiate it to go back to the original scale
exp(summary(fitAutoLander$coef[2]))
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.9686906 0.9686906 0.9686906 0.9686906 0.9686906 0.9686906
# 0.9686906 is the ratio of the odds for auto lander when we have tail wninds comparing to head winds

Answer:

** 0.031

**-0.031

**1.327

**0.969

Question 2 - Consider the previous problem. Give the estimated odds ratio for autolander use comparing head winds (numerator) to tail winds (denominator) adjusting for wind strength from the variable magn.

fitAutoLanderMag<-glm(shuttle$use~shuttle$wind+shuttle$magn, family="binomial")
summary(fitAutoLanderMag)
## 
## Call:
## glm(formula = shuttle$use ~ shuttle$wind + shuttle$magn, family = "binomial")
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.184  -1.040  -1.015   1.321   1.349  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)
## (Intercept)        -3.635e-01  2.841e-01  -1.280    0.201
## shuttle$windtail   -3.201e-02  2.530e-01  -0.127    0.899
## shuttle$magnMedium  2.442e-16  3.599e-01   0.000    1.000
## shuttle$magnOut     3.795e-01  3.568e-01   1.064    0.287
## shuttle$magnStrong  6.441e-02  3.590e-01   0.179    0.858
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 350.36  on 255  degrees of freedom
## Residual deviance: 348.78  on 251  degrees of freedom
## AIC: 358.78
## 
## Number of Fisher Scoring iterations: 4
exp(summary(fitAutoLanderMag$coef[2]))
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.9684969 0.9684969 0.9684969 0.9684969 0.9684969 0.9684969

Answer:

**1.485

**1.00

**0.969

**0.684

Question 3 - If you fit a logistic regression model to a binary variable, for example use of the autolander, then fit a logistic regression model for one minus the outcome (not using the autolander) what happens to the coefficients?

# Let's use relevel to noauto as the reference value
fitNoAutoLander<-glm(relevel((shuttle$use),ref="noauto")~shuttle$wind, family="binomial")
summary(fitNoAutoLander)
## 
## Call:
## glm(formula = relevel((shuttle$use), ref = "noauto") ~ shuttle$wind, 
##     family = "binomial")
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.300  -1.286   1.060   1.073   1.073  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)       0.25131    0.17817   1.410    0.158
## shuttle$windtail  0.03181    0.25224   0.126    0.900
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 350.36  on 255  degrees of freedom
## Residual deviance: 350.35  on 254  degrees of freedom
## AIC: 354.35
## 
## Number of Fisher Scoring iterations: 4
# Coefficient 2 is the windtail refereced to windhead

Answer:

**The coefficients get inverted (one over their previous value).

**The intercept changes sign, but the other coefficients don’t.

**The coefficients reverse their signs.

**The coefficients change in a non-linear fashion.

Question 4 - Consider the insect spray data InsectSprays. Fit a Poisson model using spray as a factor level. Report the estimated relative rate comapring spray A (numerator) to spray B (denominator).

data("InsectSprays")
head(InsectSprays)
##   count spray
## 1    10     A
## 2     7     A
## 3    20     A
## 4    14     A
## 5    14     A
## 6    12     A
#using relevel we can obtais the coefficients relatively to B
fitArelB<-glm(InsectSprays$count~relevel(InsectSprays$spray,ref="B"),family="poisson")
summary(fitArelB)
## 
## Call:
## glm(formula = InsectSprays$count ~ relevel(InsectSprays$spray, 
##     ref = "B"), family = "poisson")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3852  -0.8876  -0.1482   0.6063   2.6922  
## 
## Coefficients:
##                                         Estimate Std. Error z value
## (Intercept)                              2.73003    0.07372  37.032
## relevel(InsectSprays$spray, ref = "B")A -0.05588    0.10574  -0.528
## relevel(InsectSprays$spray, ref = "B")C -1.99606    0.21315  -9.364
## relevel(InsectSprays$spray, ref = "B")D -1.13740    0.14961  -7.602
## relevel(InsectSprays$spray, ref = "B")E -1.47727    0.17101  -8.638
## relevel(InsectSprays$spray, ref = "B")F  0.08338    0.10215   0.816
##                                         Pr(>|z|)    
## (Intercept)                              < 2e-16 ***
## relevel(InsectSprays$spray, ref = "B")A    0.597    
## relevel(InsectSprays$spray, ref = "B")C  < 2e-16 ***
## relevel(InsectSprays$spray, ref = "B")D 2.91e-14 ***
## relevel(InsectSprays$spray, ref = "B")E  < 2e-16 ***
## relevel(InsectSprays$spray, ref = "B")F    0.414    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 409.041  on 71  degrees of freedom
## Residual deviance:  98.329  on 66  degrees of freedom
## AIC: 376.59
## 
## Number of Fisher Scoring iterations: 5
#the outcome is the result of e exponentiated with coef[2], A relative to B
exp(summary(fitArelB)$coef[2])
## [1] 0.9456522

Answer:

**-0.056

0.9457

**0.321

**0.136

Question 5 - Consider a Poisson GLM with an offset, t. So, for example, a model of the form glm(count ~ x + offset(t), family = poisson) where x is a factor variable comparing a treatment (1) to a control (0) and t is the natural log of a monitoring time. What is impact of the coefficient for x if we fit the model glm(count ~ x + offset(t2), family = poisson) where 2 <- log(10) + t? In other words, what happens to the coefficients if we change the units of the offset variable. (Note, adding log(10) on the log scale is multiplying by 10 on the original scale.)

#The coefficient estimate don't changes because the offset only translates the curve mantaining the shape. The coefficients are related to the shape.
#Class Poisson Regression II (9:30)

Answer:

**The coefficient is subtracted by log(10).

The coefficient estimate is unchanged

**The coefficient estimate is divided by 10.

**The coefficient estimate is multiplied by 10.

Question 6 - Consider the data

x <- -5:5
y <- c(5.12, 3.93, 2.67, 1.87, 0.52, 0.08, 0.93, 2.05, 2.54, 3.87, 4.97)

Using a knot point at 0, fit a linear model that looks like a hockey stick with two lines meeting at x=0. Include an intercept term, x and the knot point term. What is the estimated slope of the line after 0?

#knot point at o

knots<-0

splineTerms<-sapply(knots, function(knot)(x>knot)*(x-knot))
(xMat<-cbind(1,x,splineTerms))
##          x  
##  [1,] 1 -5 0
##  [2,] 1 -4 0
##  [3,] 1 -3 0
##  [4,] 1 -2 0
##  [5,] 1 -1 0
##  [6,] 1  0 0
##  [7,] 1  1 1
##  [8,] 1  2 2
##  [9,] 1  3 3
## [10,] 1  4 4
## [11,] 1  5 5
fit6<-lm(y~ xMat-1)
yhat<-predict(fit6)
plot(x,y,frame=FALSE, pch=21, bg="lightblue", cex=2)
lines(x,yhat,col="red", lwd=2)

#the slope is calculated as the difference of yhat at 11. position and 6. position divided by the difference of the x valuas art same positions 

yhat
##          1          2          3          4          5          6 
##  4.9382111  3.9140528  2.8898944  1.8657361  0.8415777 -0.1825806 
##          7          8          9         10         11 
##  0.8304868  1.8435543  2.8566217  3.8696891  4.8827566
slope = (yhat[11]-yhat[6])/(x[11]-x[6])
slope
##       11 
## 1.013067

Answer:

**-1.024

**2.037

**-0.183

**1.013