Question 1

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.

Answer. Let us first load the data set and have a look at it.

data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
factor(mtcars$cyl)
##  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
## Levels: 4 6 8

Let us now fit a linear model and check the coefficients.

fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)
summary(fit)$coefficients
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  33.990794  1.8877934 18.005569 6.257246e-17
## wt           -3.205613  0.7538957 -4.252065 2.130435e-04
## factor(cyl)6 -4.255582  1.3860728 -3.070244 4.717834e-03
## factor(cyl)8 -6.070860  1.6522878 -3.674214 9.991893e-04

Hence, the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4 is -6.071.

Question 2

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg for the adjusted and unadjusted by weight model. Here, adjusted means including the weight variable as a term in the regression model and unasjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?

** Answer.** The adjusted model is fit. Let us also work out the unadjusted model, which shall be called fit0.

fit0 <- lm(mpg ~factor(cyl), data = mtcars)
summary(fit)$coefficients
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  33.990794  1.8877934 18.005569 6.257246e-17
## wt           -3.205613  0.7538957 -4.252065 2.130435e-04
## factor(cyl)6 -4.255582  1.3860728 -3.070244 4.717834e-03
## factor(cyl)8 -6.070860  1.6522878 -3.674214 9.991893e-04
summary(fit0)$coefficients
##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)   26.663636  0.9718008 27.437347 2.688358e-22
## factor(cyl)6  -6.920779  1.5583482 -4.441099 1.194696e-04
## factor(cyl)8 -11.563636  1.2986235 -8.904534 8.568209e-10

Taking a look at the coefficients, for the comparison of 8 cylinder to 4, we have -6.0708597 and -11.5636364, respectively. Hence

Question 3

Consider the mtcars data set. Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as confounder. Now fit a second model with mpg as the outcome model that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.

Answer. Again, the first model is fit, so let us only work out the second model.

fit1 <- lm(mpg ~ wt * factor(cyl), data = mtcars)

We can now use the ANOVA method to find the requested p-value.

anova(fit, fit1)
## Analysis of Variance Table
## 
## Model 1: mpg ~ wt + factor(cyl)
## Model 2: mpg ~ wt * factor(cyl)
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     28 183.06                           
## 2     26 155.89  2     27.17 2.2658 0.1239

This shows that the p-value is 0.123857 which is larger that the threshold 0.05. Thus,

Question 4

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight inlcuded in the model as

lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)

How is the wt coefficient interpreted?

Answer. Let us start by generating this new fit:

newfit <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
summary(fit)$coefficients[2,]
##      Estimate    Std. Error       t value      Pr(>|t|) 
## -3.2056132562  0.7538956550 -4.2520649046  0.0002130435
summary(newfit)$coefficients[2,]
##      Estimate    Std. Error       t value      Pr(>|t|) 
## -6.4112265124  1.5077913099 -4.2520649046  0.0002130435

Thus, looking at the numbers -3.2056133 and -6.4112265 for the original fit and the new fit, respectively, it turns out that

Question 5

Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the hat diagonal for the most influential point.

Answer.

xyfit <- lm(y ~ x)
max(round(hatvalues(xyfit), 4))
## [1] 0.9946

So the answer is 0.9946.

Question 6

Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the slope dfbeta for the point with the highest hat value.

Answer.

signif(dfbetas(xyfit)[hatvalues(xyfit) == max(hatvalues(xyfit)),2], 3)
## [1] -134

So the answer is -134.

Question 7

Consider a regression relationship between \(Y\) and \(X\) with and without adjustment for a third variable \(Z\). Which of the following is true about comparing the regression coefficient between \(Y\) and \(X\) with and without adjustment for \(Z\).

Answer.