R Markdown

1. Consider the space shuttle data ?shuttle in the MASS library. Consider modeling the use of the autolander as the outcome (variable name use). Fit a logistic regression model with autolander (variable auto) use (labeled as “auto” 1) versus not (0) as predicted by wind sign (variable wind). Give the estimated odds ratio for autolander use comparing head winds, labeled as “head” in the variable headwind (numerator) to tail winds (denominator).

library(MASS)
fit1 <- glm(use~wind, family="binomial", data=shuttle)
exp(coef(fit1))[2]
##  windtail 
## 0.9686888

In this case, as Head is considered 1 the interpretation of data is direct –> odds ratio = p/(1-p) –> so what we have already calculate is odds ratio for head winds.

2. Consider the previous problem. Give the estimated odds ratio for autolander use comparing head winds (numerator) to tail winds (denominator) adjusting for wind strength from the variable magn.

fit2 <- glm(use~wind+magn, family="binomial", data=shuttle)
summary(fit2)
## 
## Call:
## glm(formula = use ~ wind + magn, family = "binomial", data = shuttle)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.184  -1.040  -1.015   1.321   1.349  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.635e-01  2.841e-01  -1.280    0.201
## windtail    -3.201e-02  2.530e-01  -0.127    0.899
## magnMedium   2.442e-16  3.599e-01   0.000    1.000
## magnOut      3.795e-01  3.568e-01   1.064    0.287
## magnStrong   6.441e-02  3.590e-01   0.179    0.858
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 350.36  on 255  degrees of freedom
## Residual deviance: 348.78  on 251  degrees of freedom
## AIC: 358.78
## 
## Number of Fisher Scoring iterations: 4
exp(coef(fit2))[2]
##  windtail 
## 0.9684981

3. If you fit a logistic regression model to a binary variable, for example use of the autolander, then fit a logistic regression model for one minus the outcome (not using the autolander) what happens to the coefficients?

fit3 <- glm(use~wind, family=binomial, data=shuttle)
summary(fit3)
## 
## Call:
## glm(formula = use ~ wind, family = binomial, data = shuttle)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.073  -1.073  -1.060   1.286   1.300  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.25131    0.17817  -1.410    0.158
## windtail    -0.03181    0.25224  -0.126    0.900
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 350.36  on 255  degrees of freedom
## Residual deviance: 350.35  on 254  degrees of freedom
## AIC: 354.35
## 
## Number of Fisher Scoring iterations: 4
shuttle2 <- shuttle
shuttle2$use.bin <- as.integer(shuttle$use=="auto")
fit4 <- glm(1-use.bin~wind, family=binomial, data=shuttle2)
summary(fit4)
## 
## Call:
## glm(formula = 1 - use.bin ~ wind, family = binomial, data = shuttle2)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.073  -1.073  -1.060   1.286   1.300  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.25131    0.17817  -1.410    0.158
## windtail    -0.03181    0.25224  -0.126    0.900
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 350.36  on 255  degrees of freedom
## Residual deviance: 350.35  on 254  degrees of freedom
## AIC: 354.35
## 
## Number of Fisher Scoring iterations: 4

4. Consider the insect spray data InsectSprays. Fit a Poisson model using spray as a factor level. Report the estimated relative rate comapring spray A (numerator) to spray B (denominator).

View(InsectSprays)
#fit a model to get the coefficients independent from each other (add -1)
fit5 <- glm(count~.-1, family=poisson, data=InsectSprays)
summary(fit5)
## 
## Call:
## glm(formula = count ~ . - 1, family = poisson, data = InsectSprays)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3852  -0.8876  -0.1482   0.6063   2.6922  
## 
## Coefficients:
##        Estimate Std. Error z value Pr(>|z|)    
## sprayA  2.67415    0.07581  35.274  < 2e-16 ***
## sprayB  2.73003    0.07372  37.032  < 2e-16 ***
## sprayC  0.73397    0.20000   3.670 0.000243 ***
## sprayD  1.59263    0.13019  12.233  < 2e-16 ***
## sprayE  1.25276    0.15430   8.119 4.71e-16 ***
## sprayF  2.81341    0.07071  39.788  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 2264.808  on 72  degrees of freedom
## Residual deviance:   98.329  on 66  degrees of freedom
## AIC: 376.59
## 
## Number of Fisher Scoring iterations: 5
#divide coef 1 (spray A) between coef 2 (spray B) to get the estimated relative rate
exp(coef(fit5)[1])/exp(coef(fit5)[2])
##    sprayA 
## 0.9456522

5. Consider a Poisson glm with an offset, t. So, for example, a model of the form: glm(countx+offset(t),family=poisson), …where x is a factor variable comparing a treatment(1) to a control(0) and t is the natural log of a monitoring time. What is impact of the coefficients for x if we fit the model: glm(countx+offset(t2), family=poisson

…where: t2 <- log(10)+t In other words, what happens to the coefficients if we change the units of the offset variable. (Note, adding log(10) on the log scale is multiplying by 10 on the original scale).

The coefficient estimate is unchanged

6. Consider the data:

x <- -5:5
y <- c(5.12, 3.93, 2.67, 1.87, 0.52, 0.08, 0.93, 2.05, 2.54, 3.87, 4.97)

Using a knot point at 0, fit a linear model that looks like a hockey stick with two lines meeting at x=0. Include an intercept term, x and the knot point term. What is the estimated slope of the line after 0?

knots <- 0
splineTerms <- sapply (knots, function(knot)(x>knot)*(x-knot))
splineTerms
##       [,1]
##  [1,]    0
##  [2,]    0
##  [3,]    0
##  [4,]    0
##  [5,]    0
##  [6,]    0
##  [7,]    1
##  [8,]    2
##  [9,]    3
## [10,]    4
## [11,]    5
xMat <- cbind (1, x, splineTerms)
fit6 <- lm(y~xMat-1)
yHat <- predict(fit6)
plot(x, y, frame = FALSE, pch = 21, bg = "lightblue", cex = 2)
lines(x,yHat,col="red",lwd=2)

sum(coef(fit6)[2:3])
## [1] 1.013067