1

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.

unique(mtcars$cyl)
## [1] 6 4 8
cylinders <- relevel(factor(mtcars$cyl), "4")
fit <- lm(mpg ~ cylinders + wt, data = mtcars)
fit
## 
## Call:
## lm(formula = mpg ~ cylinders + wt, data = mtcars)
## 
## Coefficients:
## (Intercept)   cylinders6   cylinders8           wt  
##      33.991       -4.256       -6.071       -3.206
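
With 4 cylinders as the reference level, the cylinders8 coefficient is the adjusted 8-versus-4 difference: about -6.071 mpg at a fixed weight. A minimal sketch of pulling that number out directly instead of reading it off the printout; the names fit_adj and adj_8_vs_4 are illustrative and not part of the original answer.

```r
# Same model as above, with 4 cylinders as the reference level.
cylinders <- relevel(factor(mtcars$cyl), "4")
fit_adj <- lm(mpg ~ cylinders + wt, data = mtcars)  # fit_adj: illustrative name

# Adjusted 8-vs-4 difference in expected mpg, holding weight fixed.
adj_8_vs_4 <- unname(coef(fit_adj)["cylinders8"])
round(adj_8_vs_4, 3)
## [1] -6.071
```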

2

Consider the mtcars data set.

Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable.

Compare the effect of 8 versus 4 cylinders on mpg in the weight-adjusted and unadjusted models. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included.

What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?

cylinders <- relevel(factor(mtcars$cyl), "4")
fit <- lm(mpg ~ cylinders + wt, data = mtcars)
fit
## 
## Call:
## lm(formula = mpg ~ cylinders + wt, data = mtcars)
## 
## Coefficients:
## (Intercept)   cylinders6   cylinders8           wt  
##      33.991       -4.256       -6.071       -3.206
fit <- lm(mpg ~ cylinders, data = mtcars)
fit
## 
## Call:
## lm(formula = mpg ~ cylinders, data = mtcars)
## 
## Coefficients:
## (Intercept)   cylinders6   cylinders8  
##      26.664       -6.921      -11.564
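
The 8-versus-4 coefficient shrinks from about -11.564 without weight to about -6.071 with it. Holding weight constant, the number of cylinders therefore appears to have less of an impact on mpg than when weight is disregarded. A small sketch that puts the two estimates side by side; fit_adj, fit_unadj and effects are illustrative names chosen here so that neither fit overwrites the other.

```r
cylinders <- relevel(factor(mtcars$cyl), "4")

# Adjusted model includes weight; unadjusted model leaves it out.
fit_adj   <- lm(mpg ~ cylinders + wt, data = mtcars)
fit_unadj <- lm(mpg ~ cylinders, data = mtcars)

# 8-vs-4 cylinder effect from each model.
effects <- c(unadjusted = unname(coef(fit_unadj)["cylinders8"]),
             adjusted   = unname(coef(fit_adj)["cylinders8"]))
round(effects, 3)
```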

3

Consider the mtcars data set.

Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder.

Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight.

Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.

cylinders <- factor(mtcars$cyl)

fit <- lm(mpg ~ cylinders + wt, data = mtcars)
fit1 <- lm(mpg ~ cylinders + wt + cylinders * wt, data = mtcars)

anova(fit, fit1, test="LRT")
## Analysis of Variance Table
## 
## Model 1: mpg ~ cylinders + wt
## Model 2: mpg ~ cylinders + wt + cylinders * wt
##   Res.Df    RSS Df Sum of Sq Pr(>Chi)
## 1     28 183.06                      
## 2     26 155.89  2     27.17   0.1038
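
The P-value of 0.1038 is larger than 0.05, so the test fails to reject the simpler model and the interaction terms may not be necessary. A sketch that extracts the P-value and applies the benchmark. Note that cylinders * wt already expands to both main effects plus their interaction, so the shorter formula below fits the same model; fit_add, fit_int, alpha and choice are illustrative names.

```r
cylinders <- factor(mtcars$cyl)

fit_add <- lm(mpg ~ cylinders + wt, data = mtcars)  # additive model
fit_int <- lm(mpg ~ cylinders * wt, data = mtcars)  # main effects + interaction

# Likelihood ratio test of the additive model against the interaction model.
lrt <- anova(fit_add, fit_int, test = "LRT")
p_value <- lrt[2, "Pr(>Chi)"]

alpha <- 0.05  # type I error rate benchmark from the question
choice <- if (p_value < alpha) "interaction model" else "additive model"
choice
```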

4

Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)

fit <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
summary(fit)$coefficients
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  33.990794   1.887793 18.005569 6.257246e-17
## I(wt * 0.5)  -6.411227   1.507791 -4.252065 2.130435e-04
## factor(cyl)6 -4.255582   1.386073 -3.070244 4.717834e-03
## factor(cyl)8 -6.070860   1.652288 -3.674214 9.991893e-04
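
Because wt in mtcars is recorded in units of 1000 lbs, wt * 0.5 is weight in units of 2000 lbs (one short ton). The I(wt * 0.5) coefficient is therefore the estimated expected change in mpg per 2000 lb increase in weight at a fixed number of cylinders, and it is twice the per-1000-lb coefficient from the earlier fit (-6.411 versus -3.206). A sketch checking that relationship; fit_half, fit_raw, b_half and b_raw are illustrative names.

```r
# Weight rescaled to units of 2000 lbs versus the original 1000 lb units.
fit_half <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
fit_raw  <- lm(mpg ~ wt + factor(cyl), data = mtcars)

b_half <- unname(coef(fit_half)["I(wt * 0.5)"])
b_raw  <- unname(coef(fit_raw)["wt"])

# Halving the predictor doubles its coefficient; the fit itself is unchanged.
all.equal(b_half, 2 * b_raw)
```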

5

Give the hat diagonal for the most influential point.

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)
z <- max(influence(fit)$hat)
z
## [1] 0.9945734
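
For a simple linear regression with an intercept, the hat diagonal of point i is 1/n + (x_i - mean(x))^2 / sum((x_j - mean(x))^2), which is why the point at x = 11.72, far from the other x values, has leverage close to 1. A sketch that computes the hat values by hand and compares them with hatvalues(); h_manual and h are illustrative names.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)

# Leverage from the closed-form expression for simple linear regression.
h_manual <- 1 / length(x) + (x - mean(x))^2 / sum((x - mean(x))^2)

# Leverage as reported by R; same values as influence(fit)$hat.
h <- hatvalues(fit)

all.equal(unname(h), h_manual)
which.max(h)  # index of the highest-leverage point
```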

6

Give the slope dfbeta for the point with the highest hat value.

influence.measures(fit) 
## Influence measures of
##   lm(formula = y ~ x) :
## 
##    dfb.1_     dfb.x     dffit cov.r   cook.d   hat inf
## 1  1.0621 -3.78e-01    1.0679 0.341 2.93e-01 0.229   *
## 2  0.0675 -2.86e-02    0.0675 2.934 3.39e-03 0.244    
## 3 -0.0174  7.92e-03   -0.0174 3.007 2.26e-04 0.253   *
## 4 -1.2496  6.73e-01   -1.2557 0.342 3.91e-01 0.280   *
## 5  0.2043 -1.34e+02 -149.7204 0.107 2.70e+02 0.995   *
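
Point 5 has the highest hat value (0.995), and its entry in the dfb.x column is -1.34e+02, so the slope dfbeta is about -134. The dfb columns printed by influence.measures() are the standardized values returned by dfbetas(). A sketch that selects the value programmatically instead of reading it from the table; i and slope_dfbetas are illustrative names.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)

# Row index of the point with the highest hat value.
i <- which.max(hatvalues(fit))

# Standardized slope dfbeta for that point (the dfb.x column above).
slope_dfbetas <- dfbetas(fit)[i, "x"]
slope_dfbetas
```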