Quiz - Week 3 - Regression Models

Question 1

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.

fit<-lm(mpg ~ factor(cyl) + wt, data=mtcars)

summary(fit)$coefficients

##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  33.990794  1.8877934 18.005569 6.257246e-17
## factor(cyl)6 -4.255582  1.3860728 -3.070244 4.717834e-03
## factor(cyl)8 -6.070860  1.6522878 -3.674214 9.991893e-04
## wt           -3.205613  0.7538957 -4.252065 2.130435e-04

b <- coef(fit)
mycol <- rainbow(8)
plot(mtcars$wt, mtcars$mpg, pch = 19, col = mycol[mtcars$cyl])
# Parallel fitted lines (common wt slope), one intercept per cylinder group
abline(coef = c(b[1], b[4]), col = "red", lwd = 3)          # 4 cylinders (reference)
abline(coef = c(b[1] + b[2], b[4]), col = "blue", lwd = 3)  # 6 cylinders
abline(coef = c(b[1] + b[3], b[4]), col = "black", lwd = 3) # 8 cylinders

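Solution: factor(cyl)8 = -6.071. The reference level of factor(cyl) is 4 cylinders (absorbed into the intercept), so this coefficient is the expected difference in mpg between 8 and 4 cylinders at the same weight.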
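
One way to check this reading of the coefficient is to predict mpg for cars that differ only in cylinder count. The sketch below assumes an arbitrary weight of wt = 3 (any fixed weight gives the same difference, because the model has a common wt slope); the names newcars and diff_8_vs_4 are made up for the illustration.

```r
fit <- lm(mpg ~ factor(cyl) + wt, data = mtcars)

# Three hypothetical cars with identical weight (wt = 3 is an arbitrary choice)
newcars <- data.frame(cyl = c(4, 6, 8), wt = 3)
pred <- predict(fit, newdata = newcars)

# Predicted mpg for 8 cylinders minus predicted mpg for 4 cylinders
diff_8_vs_4 <- unname(pred[3] - pred[1])
diff_8_vs_4

# Should agree with the coefficient reported by the model
coef(fit)["factor(cyl)8"]
```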

Question 2

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg for the adjusted and unadjusted by weight models. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?

fit_adj<-lm(mpg ~ factor(cyl) + wt, data=mtcars)
fit_unadj<-lm(mpg ~ factor(cyl), data=mtcars)

summary(fit_adj)$coef[3]

## [1] -6.07086

summary(fit_unadj)$coef[3]

## [1] -11.56364

summary(fit_adj)$cov

##              (Intercept) factor(cyl)6 factor(cyl)8          wt
## (Intercept)   0.54510067   0.07429958    0.2495740 -0.19870769
## factor(cyl)6  0.07429958   0.29385961    0.2147572 -0.07227838
## factor(cyl)8  0.24957395   0.21475716    0.4175795 -0.14896049
## wt           -0.19870769  -0.07227838   -0.1489605  0.08693412

Solution: The unadjusted estimate (weight not in the model) is -11.56, while the adjusted estimate (weight included) is -6.07. The adjusted coefficient is smaller in magnitude, so once weight is held constant the estimated effect of 8 versus 4 cylinders on mpg is roughly halved.

Holding weight constant, cylinder appears to have less of an impact on mpg than if weight is disregarded.

It is both true and sensible that including weight would attenuate the effect of number of cylinders on mpg.
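
A minimal sketch of the same comparison that pulls the coefficient by name rather than by position, which is less fragile than indexing the coefficient matrix with [3]. The vector name effects_8_vs_4 is just an illustrative choice.

```r
fit_adj <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
fit_unadj <- lm(mpg ~ factor(cyl), data = mtcars)

# 8-versus-4 cylinder effect with and without adjusting for weight
effects_8_vs_4 <- c(
  unadjusted = unname(coef(fit_unadj)["factor(cyl)8"]),
  adjusted   = unname(coef(fit_adj)["factor(cyl)8"])
)
effects_8_vs_4

# Fraction of the unadjusted effect that remains after adjustment
unname(effects_8_vs_4["adjusted"] / effects_8_vs_4["unadjusted"])
```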

Question 3

Consider the mtcars data set. Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder. Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.

fit_confounder<-lm(mpg ~ factor(cyl) + wt, data=mtcars)
fit_interaction<-lm(mpg ~ factor(cyl) * wt, data=mtcars)
anova(fit_confounder, fit_interaction)

## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(cyl) + wt
## Model 2: mpg ~ factor(cyl) * wt
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     28 183.06                           
## 2     26 155.89  2     27.17 2.2658 0.1239

anova(fit_confounder, fit_interaction)[2,6]

## [1] 0.123857

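Solution: P = 0.1239. The interaction terms do not significantly improve the fit, so the simpler model without interaction is suggested.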

The P-value is larger than 0.05. So, according to our criterion, we would fail to reject, which suggests that the interaction terms may not be necessary.

(Here anova() compares two nested linear models with an F test. The null hypothesis is that the extra interaction coefficients are all zero. Since we cannot reject it at the 0.05 level, there is no evidence that the interaction model fits better. This is not the same as showing that the two models are identical.)
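
To make the test less of a black box, the F statistic can be rebuilt from the residual sums of squares and residual degrees of freedom of the two fits. This is a sketch of the standard nested-model F test; the variable names (rss_small, f_stat, and so on) are chosen here for illustration.

```r
fit_confounder <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
fit_interaction <- lm(mpg ~ factor(cyl) * wt, data = mtcars)

# Residual sum of squares and residual degrees of freedom for each model
rss_small <- deviance(fit_confounder)
rss_big <- deviance(fit_interaction)
df_small <- df.residual(fit_confounder)
df_big <- df.residual(fit_interaction)

# F = (drop in RSS per extra parameter) / (residual variance of the larger model)
f_stat <- ((rss_small - rss_big) / (df_small - df_big)) / (rss_big / df_big)
p_value <- pf(f_stat, df_small - df_big, df_big, lower.tail = FALSE)
c(F = f_stat, p = p_value)
```

The resulting values should match the F and Pr(>F) columns of the anova() table above.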

Question 4

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as

lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)

How is the wt coefficient interpreted?

lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)

## 
## Call:
## lm(formula = mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
## 
## Coefficients:
##  (Intercept)   I(wt * 0.5)  factor(cyl)6  factor(cyl)8  
##       33.991        -6.411        -4.256        -6.071

lm(mpg ~ I(wt * 1) + factor(cyl), data = mtcars)

## 
## Call:
## lm(formula = mpg ~ I(wt * 1) + factor(cyl), data = mtcars)
## 
## Coefficients:
##  (Intercept)     I(wt * 1)  factor(cyl)6  factor(cyl)8  
##       33.991        -3.206        -4.256        -6.071

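Solution: The estimated expected change in MPG per one-ton increase in weight for a specific number of cylinders (4, 6, 8). In mtcars, wt is recorded in units of 1000 lbs, so wt * 0.5 is weight in units of 2000 lbs (one short ton). This is why the coefficient doubles from -3.206 to -6.411.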
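
The rescaling argument can be checked directly: halving the regressor should exactly double its coefficient and leave the other estimates unchanged. A small sketch, with fit_lbs and fit_ton as illustrative names:

```r
fit_lbs <- lm(mpg ~ wt + factor(cyl), data = mtcars)           # wt in 1000 lbs
fit_ton <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)  # wt in 2000 lbs

slope_lbs <- unname(coef(fit_lbs)["wt"])
slope_ton <- unname(coef(fit_ton)["I(wt * 0.5)"])

# The per-ton slope should be twice the per-1000-lbs slope
c(per_1000_lbs = slope_lbs, per_ton = slope_ton, ratio = slope_ton / slope_lbs)
```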

Question 5

Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the hat diagonal for the most influential point.

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

fit_hat <- lm(y ~ x)

plot(x, y)
abline(fit_hat,col="red",lwd=3)

hatvalues(fit_hat)

##         1         2         3         4         5 
## 0.2286650 0.2438146 0.2525027 0.2804443 0.9945734

plot(hatvalues(fit_hat), type = "h", col="red")

influence(lm(y ~ x))$hat #other way

##         1         2         3         4         5 
## 0.2286650 0.2438146 0.2525027 0.2804443 0.9945734

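Solution: hatvalues(fit_hat)[5] = 0.9946. Point 5 (x = 11.72) lies far from the other x values, so it has by far the largest leverage.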
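
For simple linear regression with an intercept, the hat diagonal has the closed form h_i = 1/n + (x_i - mean(x))^2 / sum((x - mean(x))^2), which shows why a point far from the mean of x gets high leverage. The sketch below recomputes the values by hand; h_manual is an illustrative name.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit_hat <- lm(y ~ x)

# Leverage depends only on x: distance of each x_i from the mean of x
n <- length(x)
h_manual <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)
round(h_manual, 4)

# Index of the point with the largest leverage
which.max(h_manual)
```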

Question 6

Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the slope dfbeta for the point with the highest hat value.

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

fit_beta <- lm(y ~ x)

plot(x, y)
abline(fit_beta,col="red",lwd=3)

dfbetas(fit_beta)[,2]

##             1             2             3             4             5 
##   -0.37811633   -0.02861769    0.00791512    0.67253246 -133.82261293

influence.measures(lm(y ~ x)) #other way

## Influence measures of
##   lm(formula = y ~ x) :
## 
##    dfb.1_     dfb.x     dffit cov.r   cook.d   hat inf
## 1  1.0621 -3.78e-01    1.0679 0.341 2.93e-01 0.229   *
## 2  0.0675 -2.86e-02    0.0675 2.934 3.39e-03 0.244    
## 3 -0.0174  7.92e-03   -0.0174 3.007 2.26e-04 0.253   *
## 4 -1.2496  6.73e-01   -1.2557 0.342 3.91e-01 0.280   *
## 5  0.2043 -1.34e+02 -149.7204 0.107 2.70e+02 0.995   *

Solution: -134. This is the standardized slope dfbetas for point 5, the point with the highest hat value (0.995).
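
The value can be reproduced by refitting without point 5. The sketch below assumes the usual definitions: dfbeta is the full-data slope minus the leave-one-out slope, and dfbetas divides that by the leave-one-out residual standard deviation times the square root of the matching diagonal entry of the unscaled covariance matrix. The names d, fit_full and fit_loo are illustrative.

```r
d <- data.frame(
  x = c(0.586, 0.166, -0.042, -0.614, 11.72),
  y = c(0.549, -0.026, -0.127, -0.751, 1.344)
)
fit_full <- lm(y ~ x, data = d)
fit_loo <- lm(y ~ x, data = d[-5, ])  # refit with point 5 removed

# Raw change in the slope caused by including point 5
dfbeta_manual <- unname(coef(fit_full)["x"] - coef(fit_loo)["x"])

# Standardize by the leave-one-out sigma and the unscaled slope variance
scale_x <- summary(fit_loo)$sigma * sqrt(summary(fit_full)$cov.unscaled["x", "x"])
dfbetas_manual <- dfbeta_manual / scale_x

c(dfbeta = dfbeta_manual, dfbetas = dfbetas_manual)
```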

Question 7

Solution: It is possible for the coefficient to reverse sign after adjustment. For example, it can be strongly significant and positive before adjustment and strongly significant and negative after adjustment.