Quiz 4 - Regression Models

Q1

Consider the space shuttle data ?shuttle in the MASS library. Consider modeling the use of the autolander as the outcome (variable name use). Fit a logistic regression model with autolander (variable auto) use (labeled as “auto” 1) versus not (0) as predicted by wind sign (variable wind). Give the estimated odds ratio for autolander use comparing head winds, labeled as “head” in the variable headwind (numerator) to tail winds (denominator).

library(MASS)
data("shuttle")
summary(shuttle)

##  stability   error   sign       wind         magn     vis     
##  stab :128   LX:64   nn:128   head:128   Light :64   no :128  
##  xstab:128   MM:64   pp:128   tail:128   Medium:64   yes:128  
##              SS:64                       Out   :64            
##              XL:64                       Strong:64            
##      use     
##  auto  :145  
##  noauto:111  
##              
##

head(shuttle)

##   stability error sign wind   magn vis  use
## 1     xstab    LX   pp head  Light  no auto
## 2     xstab    LX   pp head Medium  no auto
## 3     xstab    LX   pp head Strong  no auto
## 4     xstab    LX   pp tail  Light  no auto
## 5     xstab    LX   pp tail Medium  no auto
## 6     xstab    LX   pp tail Strong  no auto

shuttle$use2<-as.integer(shuttle$use=="auto")
shuttle$wind2<-as.integer(shuttle$wind=="head")
logRegShuttle<-glm(use2~wind2, family=binomial, shuttle)

plot(x=shuttle$wind2, logRegShuttle$fitted, pch=19, col="blue", xlab="Wind", ylab="Use (auto)")

summary(logRegShuttle)$coef

##                Estimate Std. Error    z value  Pr(>|z|)
## (Intercept)  0.28312626  0.1785510  1.5856887 0.1128099
## wind2       -0.03181183  0.2522429 -0.1261159 0.8996402

exp(coef(logRegShuttle))

## (Intercept)       wind2 
##   1.3272727   0.9686888

Q2

Consider the previous problem. Give the estimated odds ratio for autolander use comparing head winds (numerator) to tail winds (denominator) adjusting for wind strength from the variable magn.

logRegShuttle2<-glm(use2~wind2+magn, family=binomial, shuttle)
exp(coef(logRegShuttle2)) #odds

## (Intercept)       wind2  magnMedium     magnOut  magnStrong 
##   1.4851533   0.9684981   1.0000000   0.6841941   0.9376181

#The estimate doesn't change with the inclusion of wind strength

Q3

If you fit a logistic regression model to a binary variable, for example use of the autolander, then fit a logistic regression model for one minus the outcome (not using the autolander) what happens to the coefficients?

logRegShuttle3<-glm(use2~wind, family=binomial, shuttle)
summary(logRegShuttle3)$coef

##               Estimate Std. Error   z value  Pr(>|z|)
## (Intercept) 0.25131443  0.1781742 1.4104987 0.1583925
## windtail    0.03181183  0.2522429 0.1261159 0.8996402

logRegShuttle32<-glm(1-use2~wind, family=binomial, shuttle)
summary(logRegShuttle32)$coef

##                Estimate Std. Error    z value  Pr(>|z|)
## (Intercept) -0.25131443  0.1781742 -1.4104987 0.1583925
## windtail    -0.03181183  0.2522429 -0.1261159 0.8996402

Sol: The coefficients reverse their signs.

Q4

Consider the insect spray data InsectSprays. Fit a Poisson model using spray as a factor level. Report the estimated relative rate comapring spray A (numerator) to spray B (denominator).

data("InsectSprays")
head(InsectSprays)

##   count spray
## 1    10     A
## 2     7     A
## 3    20     A
## 4    14     A
## 5    14     A
## 6    12     A

logRegInsect<-glm(count~factor(spray)-1, family="poisson", InsectSprays)
summary(logRegInsect)$coef

##                 Estimate Std. Error   z value      Pr(>|z|)
## factor(spray)A 2.6741486 0.07580980 35.274443 1.448048e-272
## factor(spray)B 2.7300291 0.07372098 37.031917 3.510670e-300
## factor(spray)C 0.7339692 0.19999987  3.669848  2.426946e-04
## factor(spray)D 1.5926308 0.13018891 12.233229  2.065604e-34
## factor(spray)E 1.2527630 0.15430335  8.118832  4.706917e-16
## factor(spray)F 2.8134107 0.07071068 39.787636  0.000000e+00

exp(coef(logRegInsect))[1]/exp(coef(logRegInsect))[2]

## factor(spray)A 
##      0.9456522

#otra forma
fit <- glm(count ~ relevel(spray, "B"), data = InsectSprays, family = poisson)
exp(coef(fit))[2]

## relevel(spray, "B")A 
##            0.9456522

Q5

Consider a Poisson glm with an offset, t. So, for example, a model of the form glm(count ~ x + offset(t), family = poisson) where x is a factor variable comparing a treatment (1) to a control (0) and t is the natural log of a monitoring time. What is impact of the coefficient for x if we fit the model glm(count ~ x + offset(t2), family = poisson) where 2 <- log(10) + t? In other words, what happens to the coefficients if we change the units of the offset variable. (Note, adding log(10) on the log scale is multiplying by 10 on the original scale.)

logRegInsect51<-glm(count~factor(spray)+offset(log(count+1)), family="poisson", InsectSprays)
summary(logRegInsect51)$coef

##                    Estimate Std. Error     z value  Pr(>|z|)
## (Intercept)    -0.066691374  0.0758098 -0.87971965 0.3790112
## factor(spray)B  0.003512473  0.1057445  0.03321659 0.9735019
## factor(spray)C -0.325350713  0.2138857 -1.52114274 0.1282240
## factor(spray)D -0.118451059  0.1506528 -0.78625173 0.4317200
## factor(spray)E -0.184623054  0.1719205 -1.07388635 0.2828736
## factor(spray)F  0.008422466  0.1036683  0.08124434 0.9352476

logRegInsect52<-glm(count~factor(spray)+offset(log(count+1)+log(10)), family="poisson", InsectSprays)
summary(logRegInsect52)$coef

##                    Estimate Std. Error      z value      Pr(>|z|)
## (Intercept)    -2.369276467  0.0758098 -31.25290307 2.038695e-214
## factor(spray)B  0.003512473  0.1057445   0.03321659  9.735019e-01
## factor(spray)C -0.325350713  0.2138857  -1.52114274  1.282240e-01
## factor(spray)D -0.118451059  0.1506528  -0.78625173  4.317200e-01
## factor(spray)E -0.184623054  0.1719205  -1.07388635  2.828736e-01
## factor(spray)F  0.008422466  0.1036683   0.08124434  9.352476e-01

Sol: The coefficient (b1) estimate is unchanged

Q6

Using a knot point at 0, fit a linear model that looks like a hockey stick with two lines meeting at x=0. Include an intercept term, x and the knot point term. What is the estimated slope of the line after 0?

#Consider the data
x <- -5:5
y <- c(5.12, 3.93, 2.67, 1.87, 0.52, 0.08, 0.93, 2.05, 2.54, 3.87, 4.97)

plot(x, y, frame=FALSE, pch=21, bg="lightblue", cex=2)

l1<-lm(y[1:6]~ x[1:6])$coef
l2<-lm(y[6:11]~ x[6:11])$coef

abline(l1, col="red", lwd=2)
abline(l2, col="red", lwd=2)

knots<-c(0)
splineTerms<-sapply(knots,function(knot) (x>knot)*(x-knot))
xMat<-cbind(1,x,x^2,splineTerms)
xMat2<-cbind(1,x,splineTerms)

yhat<-predict(lm(y~xMat-1))
yhat2<-predict(lm(y~xMat2-1))

lines(x,yhat, col="blue", lwd=2)
lines(x,yhat2, col="green", lwd=2)

legend(x=-1,y=5, c("With lm","With X","With X+X^2"),lty=c(1,1), lwd=c(2.5,2.5),col=c("red", "green","blue"))

(yhat2[11]-yhat2[7])/(x[11]-x[7])

##       11 
## 1.013067

Otra forma

z <- (x > 0) * x
fit <- lm(y ~ x + z)
sum(coef(fit)[2:3])

## [1] 1.013067