Consider the mtcars data set. Fit a model with mpg as
the outcome that includes number of cylinders as a factor variable and
weight as a confounder. Give the adjusted estimate for the expected change
in mpg comparing 8 cylinders to 4.
Answer. Let us first load the data set and have a look at it.
data(mtcars)
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
factor(mtcars$cyl)
## [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
## Levels: 4 6 8
Let us now fit a linear model and check the coefficients.
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)
summary(fit)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.990794 1.8877934 18.005569 6.257246e-17
## wt -3.205613 0.7538957 -4.252065 2.130435e-04
## factor(cyl)6 -4.255582 1.3860728 -3.070244 4.717834e-03
## factor(cyl)8 -6.070860 1.6522878 -3.674214 9.991893e-04
Hence, the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4 is -6.071.
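If one prefers to pull this number out of the fitted object programmatically rather than reading it off the table, the coefficient can be extracted by name; this is an optional check using only base R functions.
# Adjusted 8-vs-4-cylinder estimate, extracted directly by coefficient name.
coef(fit)["factor(cyl)8"]
# An accompanying 95% confidence interval, if desired.
confint(fit, "factor(cyl)8")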
Consider the mtcars data set. Fit a model with mpg as
the outcome that includes number of cylinders as a factor variable and
weight as a possible confounding variable. Compare the effect of 8
versus 4 cylinders on mpg for the models adjusted and unadjusted by weight.
Here, adjusted means including the weight variable as a term in
the regression model, and unadjusted means the model without weight
included. What can be said about the effect comparing 8 and 4 cylinders
after looking at models with and without weight included?
Answer. The adjusted model, fit, was already obtained above. Let us also work
out the unadjusted model, which we shall call fit0.
fit0 <- lm(mpg ~ factor(cyl), data = mtcars)
summary(fit)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.990794 1.8877934 18.005569 6.257246e-17
## wt -3.205613 0.7538957 -4.252065 2.130435e-04
## factor(cyl)6 -4.255582 1.3860728 -3.070244 4.717834e-03
## factor(cyl)8 -6.070860 1.6522878 -3.674214 9.991893e-04
summary(fit0)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.663636 0.9718008 27.437347 2.688358e-22
## factor(cyl)6 -6.920779 1.5583482 -4.441099 1.194696e-04
## factor(cyl)8 -11.563636 1.2986235 -8.904534 8.568209e-10
Taking a look at the coefficients for the comparison of 8 cylinders to 4, we have -6.0708597 (adjusted) and -11.5636364 (unadjusted), respectively. Hence, holding weight constant, the number of cylinders appears to have less of an impact on mpg than when weight is disregarded: adjusting for weight attenuates the estimated effect from about -11.56 to -6.07.
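To see the two estimates side by side, one optional sketch (using only the objects defined above) is to bind the 8-cylinder coefficients from both fits into a small table.
# 8-vs-4-cylinder estimate with and without adjustment for weight.
rbind(adjusted = coef(fit)["factor(cyl)8"],
      unadjusted = coef(fit0)["factor(cyl)8"])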
Consider the mtcars data set. Fit a model with mpg as
the outcome that considers number of cylinders as a factor variable and
weight as a confounder. Now fit a second model with mpg as the outcome
that considers the interaction between number of cylinders (as a
factor variable) and weight. Give the P-value for the likelihood ratio
test comparing the two models and suggest a model using 0.05 as a type I
error rate significance benchmark.
Answer. Again, the first model has already been fit above as fit,
so let us only work out the second model.
fit1 <- lm(mpg ~ wt * factor(cyl), data = mtcars)
We can now use the anova function to compare the two nested models and find the requested p-value.
anova(fit, fit1)
## Analysis of Variance Table
##
## Model 1: mpg ~ wt + factor(cyl)
## Model 2: mpg ~ wt * factor(cyl)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 183.06
## 2 26 155.89 2 27.17 2.2658 0.1239
This shows that the p-value is 0.123857, which is larger than the threshold 0.05. Thus, at a 0.05 type I error rate we fail to reject the smaller model, and we suggest the model without the interaction term, fit.
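Note that anova carries out an F test for these nested Gaussian models. If a genuine likelihood ratio (chi-squared) test is preferred, it can be computed from the log-likelihoods; this is an optional cross-check, and its p-value should also land above the 0.05 benchmark.
# Likelihood ratio statistic; the interaction adds two parameters, so df = 2.
lrstat <- 2 * (logLik(fit1) - logLik(fit))
pchisq(as.numeric(lrstat), df = 2, lower.tail = FALSE)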
Consider the mtcars data set. Fit a model with mpg as
the outcome that includes number of cylinders as a factor variable and
weight included in the model as
lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
How is the wt coefficient interpreted?
Answer. Let us start by generating this new fit:
newfit <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
summary(fit)$coefficients[2,]
## Estimate Std. Error t value Pr(>|t|)
## -3.2056132562 0.7538956550 -4.2520649046 0.0002130435
summary(newfit)$coefficients[2,]
## Estimate Std. Error t value Pr(>|t|)
## -6.4112265124 1.5077913099 -4.2520649046 0.0002130435
Thus, looking at the numbers -3.2056133 and -6.4112265 for the original fit and the new fit, respectively, it turns out that the new wt coefficient is exactly twice the original one. Since wt is recorded in units of 1,000 lb, a one-unit increase in I(wt * 0.5) corresponds to a 2,000-lb (one short ton) increase in weight, so the new coefficient is interpreted as the expected change in mpg per one-ton increase in weight, holding the number of cylinders constant.
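A quick optional check of this doubling relationship, using only the two fits defined above:
# The wt coefficient in newfit should be exactly twice the one in fit,
# since I(wt * 0.5) is half of the original regressor.
coef(newfit)[2] / coef(fit)[2]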
Consider the following data set
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
Give the hat diagonal for the most influential point.
Answer.
xyfit <- lm(y ~ x)
max(round(hatvalues(xyfit), 4))
## [1] 0.9946
So the answer is 0.9946.
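To confirm which observation that hat value belongs to, an optional follow-up using base R only:
# Index of the largest hat diagonal; the high-leverage point is the one
# with x = 11.72, far from the other x values.
which.max(hatvalues(xyfit))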
Consider the following data set
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
Give the slope dfbeta for the point with the highest hat value.
Answer.
signif(dfbetas(xyfit)[hatvalues(xyfit) == max(hatvalues(xyfit)),2], 3)
## [1] -134
So the answer is -134.
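For context, one can optionally look at the full set of influence measures for the coefficients: dfbetas divides the change in each coefficient caused by deleting a point by an estimate of its standard error, while dfbeta reports the raw, unscaled change.
# Standardized and raw coefficient changes from deleting each observation.
round(dfbetas(xyfit), 3)
round(dfbeta(xyfit), 3)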
Consider a regression relationship between \(Y\) and \(X\) with and without adjustment for a third variable \(Z\). Which of the following is true about comparing the regression coefficient between \(Y\) and \(X\) with and without adjustment for \(Z\)?
Answer. Adjusting for \(Z\) can change the coefficient between \(Y\) and \(X\) in essentially arbitrary ways: it can grow, shrink, or even reverse sign (Simpson's paradox). In particular, it is possible for the coefficient to be strongly significant and positive before adjustment and strongly significant and negative after adjustment.
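A small simulation (a sketch with arbitrarily chosen coefficients, not part of the original question) illustrates how adjustment can flip the sign:
set.seed(1)
n <- 100
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.3)            # x is strongly related to z
y <- -x + 3 * z + rnorm(n, sd = 0.3)   # the true coefficient on x is -1
coef(lm(y ~ x))["x"]       # unadjusted: clearly positive
coef(lm(y ~ x + z))["x"]   # adjusted for z: close to -1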