Question 1

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.

cyl <- as.factor(mtcars$cyl)
fit <- lm(mtcars$mpg~cyl + mtcars$wt)
summary(fit)$coeff[3, 1]
## [1] -6.07086
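
The same coefficient can be pulled out by name instead of by row position, which is less fragile if the order of terms changes. A minimal sketch, assuming the `data = mtcars` formula style; the names `fit_named` and `effect_8_vs_4` are introduced here for illustration only:

```r
# Fit the same model through the data argument so coefficient names stay clean.
fit_named <- lm(mpg ~ factor(cyl) + wt, data = mtcars)

# Adjusted 8-versus-4 cylinder effect, selected by coefficient name.
effect_8_vs_4 <- coef(fit_named)["factor(cyl)8"]
print(effect_8_vs_4)
```

Because 4 cylinders is the reference level of the factor, the `factor(cyl)8` coefficient is directly the 8-versus-4 comparison at a fixed weight.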

Question 2

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg for the models adjusted and unadjusted for weight. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?

fit2 <- lm(mtcars$mpg~cyl)
print(paste0('effect of 8 versus 4 cylinders on mpg weight regarded = ', summary(fit)$coeff[3, 1]))
## [1] "effect of 8 versus 4 cylinders on mpg weight regarded = -6.07085968049088"
print(paste0('effect of 8 versus 4 cylinders on mpg weight disregarded  = ', summary(fit2)$coeff[3, 1]))
## [1] "effect of 8 versus 4 cylinders on mpg weight disregarded  = -11.5636363636364"

Answer:
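Holding weight constant, the number of cylinders appears to have less of an impact on mpg than when weight is disregarded: the estimated 8-versus-4 difference shrinks in magnitude from about -11.56 mpg to about -6.07 mpg.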
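
In the unadjusted model, the coefficient for 8 cylinders is simply the difference between the mean mpg of the 8-cylinder and 4-cylinder cars. The sketch below checks this; the names `group_means`, `unadjusted`, `fit_unadj` and `from_model` are illustrative and not part of the original solution:

```r
# Mean mpg within each cylinder group.
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Unadjusted effect as a plain difference of group means (8 minus 4).
unadjusted <- unname(group_means["8"] - group_means["4"])

# The same number taken from the regression on the cylinder factor alone.
fit_unadj <- lm(mpg ~ factor(cyl), data = mtcars)
from_model <- unname(coef(fit_unadj)["factor(cyl)8"])

print(c(group_means_diff = unadjusted, model_coef = from_model))
```

Once weight enters the model, part of that raw difference is attributed to the heavier weight of 8-cylinder cars, which is why the adjusted estimate is smaller in magnitude.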

Question 3

Consider the mtcars data set. Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder. Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.

fit1 <- lm(mtcars$mpg~cyl + mtcars$wt)
fit2 <- lm(mtcars$mpg~cyl * mtcars$wt)  # cyl * wt expands to both main effects plus their interaction
a <- anova(fit1, fit2)
print(paste0('P - value = ', a[2, 6]))
## [1] "P - value = 0.12385702605218"

Answer:
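The P-value (about 0.124) is larger than 0.05. So, according to our criterion, we fail to reject the null hypothesis that the interaction coefficients are zero, which suggests that the interaction terms may not be necessary and the simpler model without them is preferred.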
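
Note that `anova()` on two `lm` fits reports an F test by default. A likelihood ratio statistic can also be computed directly from the two log-likelihoods and compared with a chi-squared distribution. The sketch below shows both side by side; the names `fit_add`, `fit_int`, `p_f`, `lr_stat`, `df_diff` and `p_lrt` are illustrative, and the chi-squared reference distribution is only an asymptotic approximation, so the two P-values need not agree exactly:

```r
# Additive model and interaction model, fitted with the data argument.
fit_add <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
fit_int <- lm(mpg ~ factor(cyl) * wt, data = mtcars)

# F test, as reported by anova() for nested lm fits.
p_f <- anova(fit_add, fit_int)[2, "Pr(>F)"]

# Likelihood ratio statistic: twice the gain in maximized log-likelihood.
lr_stat <- 2 * (as.numeric(logLik(fit_int)) - as.numeric(logLik(fit_add)))

# Degrees of freedom: number of extra parameters in the larger model.
df_diff <- attr(logLik(fit_int), "df") - attr(logLik(fit_add), "df")

# Upper-tail chi-squared probability for the statistic.
p_lrt <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

print(c(F_test = p_f, LRT = p_lrt))
```

The interaction model adds two parameters (one extra weight slope each for 6 and 8 cylinders), so the comparison has two degrees of freedom.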

Question 4

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as

lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
## 
## Call:
## lm(formula = mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
## 
## Coefficients:
##  (Intercept)   I(wt * 0.5)  factor(cyl)6  factor(cyl)8  
##       33.991        -6.411        -4.256        -6.071

How is the wt coefficient interpreted?

fit<-lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
summary(fit)$coefficients
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  33.990794   1.887793 18.005569 6.257246e-17
## I(wt * 0.5)  -6.411227   1.507791 -4.252065 2.130435e-04
## factor(cyl)6 -4.255582   1.386073 -3.070244 4.717834e-03
## factor(cyl)8 -6.070860   1.652288 -3.674214 9.991893e-04

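Answer: In mtcars, wt is recorded in units of 1000 lbs, so wt * 0.5 is weight in units of 2000 lbs, and 2000 lbs is one short ton. The correct answer is therefore:
The estimated expected change in MPG per one ton increase in weight, holding the number of cylinders (4, 6, 8) fixed.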
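
The rescaling can be checked numerically: halving the predictor doubles its coefficient and leaves the rest of the fit unchanged. A small sketch, with illustrative names `fit_raw`, `fit_half`, `b_per_1000lb` and `b_per_ton`:

```r
# Weight in its native unit (1000 lbs) versus weight in units of 2000 lbs.
fit_raw  <- lm(mpg ~ wt + factor(cyl), data = mtcars)
fit_half <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)

# Slope per 1000 lbs and slope per 2000 lbs (one short ton).
b_per_1000lb <- unname(coef(fit_raw)["wt"])
b_per_ton    <- unname(coef(fit_half)["I(wt * 0.5)"])

print(c(per_1000_lbs = b_per_1000lb, per_2000_lbs = b_per_ton))
```

The per-ton slope is exactly twice the per-1000-lbs slope, because a one-unit step in `wt * 0.5` corresponds to a two-unit step in `wt`.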

Question 5

Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the hat diagonal for the most influential point

fit <- lm(y~x)
max(hatvalues(fit))
## [1] 0.9945734
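
For simple linear regression with an intercept, the hat diagonal has the closed form h_i = 1/n + (x_i - mean(x))^2 / sum((x_j - mean(x))^2), so it depends only on how far each x value lies from the mean. The sketch below compares that formula with `hatvalues()`; the names `h_manual` and `h_builtin` are illustrative:

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

# Closed-form leverage for simple linear regression with an intercept.
n <- length(x)
h_manual <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)

# Leverage as computed by R from the fitted model.
h_builtin <- hatvalues(lm(y ~ x))

print(round(h_manual, 4))
```

The fifth point sits far from the other x values, which is why its leverage is close to the maximum possible value of 1.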

Question 6

Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the slope dfbeta for the point with the highest hat value.

fit6 <- lm(y~x)
hv <- hatvalues(fit6)
wm <- which.max(hv)
dfbetas(fit6)[wm, 2]
## [1] -133.8226
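
Note that `dfbetas()` returns the standardized version of `dfbeta()`. The raw quantity is the change in the slope when the point is deleted, and the standardized one divides that change by the leave-one-out residual standard deviation times the square root of the matching diagonal entry of (X'X)^-1. The sketch below reproduces both by refitting without the point; the names `fit_drop`, `raw_change`, `s_drop`, `xtx_inv` and `std_change` are illustrative:

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

fit_full <- lm(y ~ x)
i <- which.max(hatvalues(fit_full))   # index of the highest-leverage point

# Refit without point i and take the change in slope (full minus reduced).
fit_drop <- lm(y[-i] ~ x[-i])
raw_change <- unname(coef(fit_full)[2] - coef(fit_drop)[2])

# Standardize using the leave-one-out sigma and the unscaled covariance (X'X)^-1.
s_drop <- summary(fit_drop)$sigma
xtx_inv <- summary(fit_full)$cov.unscaled
std_change <- raw_change / (s_drop * sqrt(xtx_inv[2, 2]))

print(c(dfbeta = raw_change, dfbetas = std_change))
```

The standardized value is very large in magnitude because the four remaining points lie almost exactly on a line, so the leave-one-out residual standard deviation in the denominator is tiny.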

Question 7

Consider a regression relationship between Y and X with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X with and without adjustment for Z?

Answer:
It is possible for the coefficient to reverse sign after adjustment. For example, it can be strongly significant and positive before adjustment and strongly significant and negative after adjustment.
Explanation:
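This is an example of Simpson's paradox and shows the importance of model selection: the estimated relationship between Y and X depends on which other variables are included in the model.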
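
A sign reversal of this kind is easy to construct. The sketch below uses a small, fully made-up data set (not from the quiz) in which Y falls with X inside each level of Z, while higher levels of Z shift both X and Y upward; all variable names are illustrative:

```r
# Hypothetical data: three groups defined by z, three points per group.
z <- rep(0:2, each = 3)
u <- rep(c(-0.5, 0, 0.5), times = 3)   # within-group variation in x
x <- 2 * z + u
y <- 4 * z - x                         # within a group, y decreases as x increases

# Slope of y on x ignoring z, and slope of y on x adjusting for z.
unadjusted_slope <- unname(coef(lm(y ~ x))["x"])
adjusted_slope   <- unname(coef(lm(y ~ x + z))["x"])

print(c(unadjusted = unadjusted_slope, adjusted = adjusted_slope))
```

Ignoring z, the slope on x is positive because the group shifts dominate; adjusting for z recovers the negative within-group slope.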