Coursera Regression Quiz 4
Q1. Consider the space shuttle data (?𝚜𝚑𝚞𝚝𝚝𝚕𝚎 in the 𝙼𝙰𝚂𝚂 library). Consider modeling the use of the autolander as the outcome (variable name 𝚞𝚜𝚎). Fit a logistic regression model with autolander use (variable auto, coded 1 for "auto" and 0 otherwise) as predicted by wind sign (variable wind). Give the estimated odds ratio for autolander use comparing head winds, labeled "head" in the variable wind (numerator), to tail winds (denominator).
log(p/(1-p)) = b1*windhead + b2*windtail, where windhead and windtail are 0/1 indicators, so the head-versus-tail odds ratio is exp(b1)/exp(b2) = exp(b1 - b2)
library(MASS)
head(shuttle)
##   stability error sign wind   magn vis  use
## 1     xstab    LX   pp head  Light  no auto
## 2     xstab    LX   pp head Medium  no auto
## 3     xstab    LX   pp head Strong  no auto
## 4     xstab    LX   pp tail  Light  no auto
## 5     xstab    LX   pp tail Medium  no auto
## 6     xstab    LX   pp tail Strong  no auto
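# Code the outcome as 1 when the autolander was used, 0 otherwise
shuttle$auto <- as.integer(shuttle$use == "auto")
# Drop the intercept so each wind level gets its own log-odds coefficient
mdl <- glm(factor(auto) ~ wind - 1, family = "binomial", data = shuttle)
# Odds ratio: head (coefficient 1) over tail (coefficient 2)
exp(mdl$coefficients[1]) / exp(mdl$coefficients[2])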
## windhead
## 0.9686888
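As a cross-check, the same odds ratio can be read off a model that keeps the intercept. This is a minimal sketch, assuming the MASS package is installed; the names `fit_int` and `or_head_vs_tail` are illustrative and not part of the original notes. With "head" as the reference level, the `windtail` coefficient is the log odds ratio of tail versus head, so negating it gives head versus tail.

```r
library(MASS)

# Outcome coded 1 for autolander use, 0 otherwise (same coding as above)
shuttle$auto <- as.integer(shuttle$use == "auto")

# Keep the intercept: "head" is the reference level of wind,
# so the "windtail" coefficient is log-odds(tail) - log-odds(head)
fit_int <- glm(auto ~ wind, family = binomial, data = shuttle)

# Odds ratio for head (numerator) versus tail (denominator)
or_head_vs_tail <- exp(-coef(fit_int)[["windtail"]])
or_head_vs_tail
```

The value should agree with the no-intercept calculation above, since both parameterizations describe the same fitted probabilities.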
Q2. Consider the previous problem. Give the estimated odds ratio for autolander use comparing head winds (numerator) to tail winds (denominator) adjusting for wind strength from the variable magn.
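# Same model, now adjusting for wind strength (magn)
mdl <- glm(factor(auto) ~ wind + magn - 1, family = "binomial", data = shuttle)
# Coefficients 1 and 2 are still windhead and windtail
exp(mdl$coefficients[1]) / exp(mdl$coefficients[2])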
## windhead
## 0.9684981
Q3. If you fit a logistic regression model to a binary variable, for example use of the autolander, then fit a logistic regression model for one minus the outcome (not using the autolander) what happens to the coefficients?
# Model the complement of the outcome (1 = autolander not used)
glm(1 - auto ~ wind, family = "binomial", data = shuttle)
##
## Call: glm(formula = 1 - auto ~ wind, family = "binomial", data = shuttle)
##
## Coefficients:
## (Intercept) windtail
## -0.25131 -0.03181
##
## Degrees of Freedom: 255 Total (i.e. Null); 254 Residual
## Null Deviance: 350.4
## Residual Deviance: 350.3 AIC: 354.3
The coefficients reverse their signs.
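The sign reversal follows from the odds: replacing p with 1 - p turns log(p/(1-p)) into log((1-p)/p), which is the same quantity negated. The sketch below fits both models and compares them directly; it assumes the MASS package is installed, and the names `fit_use` and `fit_nonuse` are illustrative.

```r
library(MASS)

# Outcome coded 1 for autolander use, 0 otherwise
shuttle$auto <- as.integer(shuttle$use == "auto")

# Model for using the autolander, and for not using it
fit_use    <- glm(auto ~ wind,     family = binomial, data = shuttle)
fit_nonuse <- glm(1 - auto ~ wind, family = binomial, data = shuttle)

# Side-by-side coefficients: each row should sum to (numerically) zero
cbind(use = coef(fit_use), nonuse = coef(fit_nonuse))

# TRUE when the second fit is the first with every sign flipped
signs_flip <- isTRUE(all.equal(unname(coef(fit_use)),
                               -unname(coef(fit_nonuse)),
                               tolerance = 1e-6))
signs_flip
```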
Q4. Consider the insect spray data 𝙸𝚗𝚜𝚎𝚌𝚝𝚂𝚙𝚛𝚊𝚢𝚜. Fit a Poisson model using spray as a factor level. Report the estimated relative rate comparing spray A (numerator) to spray B (denominator).
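data(InsectSprays)
# No intercept, so each spray gets its own log-rate coefficient
mdl <- glm(count ~ factor(spray) - 1, family = poisson, data = InsectSprays)
# Relative rate: spray A (coefficient 1) over spray B (coefficient 2)
exp(mdl$coefficients[1]) / exp(mdl$coefficients[2])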
## factor(spray)A
## 0.9456522
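When the only predictor in a Poisson glm with log link is a factor, the fitted rate for each level equals that level's sample mean, so the relative rate can be checked without fitting a model at all. A small sketch; `spray_means` and `rate_ratio` are illustrative names.

```r
data(InsectSprays)

# Mean insect count for each spray
spray_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)

# Relative rate of spray A (numerator) to spray B (denominator)
rate_ratio <- spray_means[["A"]] / spray_means[["B"]]
rate_ratio
```

This ratio of group means should match the ratio of exponentiated coefficients reported above.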
Q5. Consider a Poisson glm with an offset, t. So, for example, a model of the form 𝚐𝚕𝚖(𝚌𝚘𝚞𝚗𝚝 ~ 𝚡 + 𝚘𝚏𝚏𝚜𝚎𝚝(𝚝), 𝚏𝚊𝚖𝚒𝚕𝚢 = 𝚙𝚘𝚒𝚜𝚜𝚘𝚗) where 𝚡 is a factor variable comparing a treatment (1) to a control (0) and 𝚝 is the natural log of a monitoring time. What is the impact on the coefficient for 𝚡 if we fit the model 𝚐𝚕𝚖(𝚌𝚘𝚞𝚗𝚝 ~ 𝚡 + 𝚘𝚏𝚏𝚜𝚎𝚝(𝚝𝟸), 𝚏𝚊𝚖𝚒𝚕𝚢 = 𝚙𝚘𝚒𝚜𝚜𝚘𝚗) where 𝚝𝟸 <- 𝚕𝚘𝚐(𝟷𝟶) + 𝚝? In other words, what happens to the coefficients if we change the units of the offset variable? (Note, adding log(10) on the log scale is multiplying by 10 on the original scale.)
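# Simulated log monitoring times, one per row; no seed is set, so values differ between runs
t <- rnorm(72)
# Multiply the monitoring time by 10, i.e. add log(10) on the log scale
t2 <- log(10) + t
glm(count ~ factor(spray) + offset(t), family = "poisson", data = InsectSprays)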
##
## Call: glm(formula = count ~ factor(spray) + offset(t), family = "poisson",
## data = InsectSprays)
##
## Coefficients:
## (Intercept) factor(spray)B factor(spray)C factor(spray)D
## 2.1491 0.7229 -1.9054 -1.4646
## factor(spray)E factor(spray)F
## -1.4458 0.0885
##
## Degrees of Freedom: 71 Total (i.e. Null); 66 Residual
## Null Deviance: 1267
## Residual Deviance: 771.2 AIC: 1049
glm(count ~ factor(spray) + offset(t2), family = poisson, data = InsectSprays)
##
## Call: glm(formula = count ~ factor(spray) + offset(t2), family = poisson,
## data = InsectSprays)
##
## Coefficients:
## (Intercept) factor(spray)B factor(spray)C factor(spray)D
## -0.1535 0.7229 -1.9054 -1.4646
## factor(spray)E factor(spray)F
## -1.4458 0.0885
##
## Degrees of Freedom: 71 Total (i.e. Null); 66 Residual
## Null Deviance: 1267
## Residual Deviance: 771.2 AIC: 1049
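In the two fits above, the spray coefficients are identical and only the intercept moves, from 2.1491 to -0.1535, a shift of log(10) ≈ 2.3026. The constant added to the offset is absorbed entirely by the intercept. The sketch below repeats the comparison with a fixed seed so it is reproducible; the seed value, the data frame `dat`, and the column names `log_t` and `log_t2` are arbitrary choices for illustration, and the fitted numbers will not match the unseeded output above.

```r
data(InsectSprays)

set.seed(1)                          # arbitrary seed, for reproducibility only
dat <- InsectSprays
dat$log_t  <- rnorm(nrow(dat))       # stand-in for log monitoring time
dat$log_t2 <- log(10) + dat$log_t    # same offset shifted by log(10)

fit1 <- glm(count ~ spray + offset(log_t),  family = poisson, data = dat)
fit2 <- glm(count ~ spray + offset(log_t2), family = poisson, data = dat)

# Change in each coefficient when the offset is shifted:
# the intercept should drop by log(10), the spray terms should not move
coef_shift <- coef(fit2) - coef(fit1)
round(coef_shift, 6)
```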
Q6. Consider the data x and y defined in the code below.
Using a knot point at 0, fit a linear model that looks like a hockey stick with two lines meeting at x=0. Include an intercept term, x and the knot point term. What is the estimated slope of the line after 0?
x <- -5:5
y <- c(5.12, 3.93, 2.67, 1.87, 0.52, 0.08, 0.93, 2.05, 2.54, 3.87, 4.97)
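knots <- c(0)
# Hinge term for each knot: 0 up to the knot, (x - knot) beyond it
splineTerms <- sapply(knots, function(knot) (x > knot) * (x - knot))
# Design matrix: intercept, x, and the knot term
xmat <- cbind(1, x, splineTerms)
fit <- lm(y ~ xmat - 1)
yhat <- predict(fit)
summary(fit)$coef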
##         Estimate Std. Error    t value     Pr(>|t|)
## xmat  -0.1825806 0.13557812  -1.346682 2.149877e-01
## xmatx -1.0241584 0.04805280 -21.313188 2.470198e-08
## xmat   2.0372258 0.08574713  23.758531 1.048711e-08
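# Slope after 0: rise in fitted values from x = 0 (index 6) to x = 4 (index 10), over a run of 4
(yhat[10] - yhat[6]) / 4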
## 10
## 1.013067
plot(x,y)
lines(x,yhat,col="red")
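The slope after the knot can also be read directly from the coefficient table: to the right of 0 the fitted line is intercept + (b_x + b_knot) * x, so the slope is the sum of the x coefficient and the knot-term coefficient, here -1.0241584 + 2.0372258. A minimal sketch of that calculation; `hinge`, `fit_hinge`, and `slope_after` are illustrative names, and `pmax(x, 0)` is used as an equivalent way to build the knot term for a knot at 0.

```r
x <- -5:5
y <- c(5.12, 3.93, 2.67, 1.87, 0.52, 0.08, 0.93, 2.05, 2.54, 3.87, 4.97)

# Hinge term: 0 for x <= 0, equal to x for x > 0
hinge <- pmax(x, 0)

# Intercept, x, and the hinge term
fit_hinge <- lm(y ~ x + hinge)

# Slope before the knot is the "x" coefficient;
# slope after the knot is "x" plus "hinge"
slope_after <- sum(coef(fit_hinge)[c("x", "hinge")])
slope_after
```

This should agree with the value obtained above from the fitted values.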