Quiz 3 Regression Models
Consider the mtcars data set. Fit a model with mpg as
the outcome that includes number of cylinders as a factor variable and
weight as a confounder. Give the adjusted estimate for the expected change
in mpg comparing 8 cylinders to 4.
Answer
# Fitting a model using wt and cyl.
fit_q1 <- lm(data = mtcars, formula = mpg ~ wt + factor(cyl))
# Printing the coefficients.
summary(fit_q1)$coeff
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  33.990794  1.8877934 18.005569 6.257246e-17
## wt           -3.205613  0.7538957 -4.252065 2.130435e-04
## factor(cyl)6 -4.255582  1.3860728 -3.070244 4.717834e-03
## factor(cyl)8 -6.070860  1.6522878 -3.674214 9.991893e-04
The cyl coefficients are interpreted relative to the baseline level,
4 cylinders. Holding weight constant, an 8-cylinder car is therefore expected
to get about 6.07 mpg less than a 4-cylinder car.
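One way to check this reading of the coefficient is to predict mpg for two cars that differ only in cylinder count and take the difference. This is a minimal sketch: the data frame `new_cars` and the weight value 3 (in units of 1000 lb) are arbitrary choices made for illustration, not part of the quiz.

```r
# Refit the model from the answer above.
fit_q1 <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Two hypothetical cars with the same (arbitrary) weight but different cyl.
new_cars <- data.frame(wt = c(3, 3), cyl = c(4, 8))
pred <- predict(fit_q1, newdata = new_cars)

# Difference in predicted mpg: 8 cylinders minus 4 cylinders.
diff_8_vs_4 <- unname(pred[2] - pred[1])
diff_8_vs_4

# The same number appears as the factor(cyl)8 coefficient.
coef(fit_q1)["factor(cyl)8"]
```

Because the model has no interaction term, the difference does not depend on the weight chosen for the two cars.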
Consider the mtcars data set. Fit a model with mpg as
the outcome that includes number of cylinders as a factor variable and
weight as a possible confounding variable. Compare the effect of 8
versus 4 cylinders on mpg for the adjusted and unadjusted by weight
models. Here, adjusted means including the weight variable as a term in
the regression model and unadjusted means the model without weight
included. What can be said about the effect comparing 8 and 4 cylinders
after looking at models with and without weight included?
Answer
# Printing the coefficients.
summary(lm(data = mtcars, formula = mpg ~ factor(cyl)))$coeff
##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)   26.663636  0.9718008 27.437347 2.688358e-22
## factor(cyl)6  -6.920779  1.5583482 -4.441099 1.194696e-04
## factor(cyl)8 -11.563636  1.2986235 -8.904534 8.568209e-10
summary(lm(data = mtcars, formula = mpg ~ wt + factor(cyl)))$coeff
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  33.990794  1.8877934 18.005569 6.257246e-17
## wt           -3.205613  0.7538957 -4.252065 2.130435e-04
## factor(cyl)6 -4.255582  1.3860728 -3.070244 4.717834e-03
## factor(cyl)8 -6.070860  1.6522878 -3.674214 9.991893e-04
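Comparing the two fitted models, the estimated effect of 8 versus 4 cylinders is larger in magnitude when wt is disregarded (-11.56) than when it is included (-6.07). Holding weight constant, cylinder count appears to have less of an impact on mpg.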
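To put the two estimates side by side, the 8-versus-4 coefficient can be pulled out of each fit. This is only a sketch; the object names `fit_unadj`, `fit_adj` and `effect_8_vs_4` are introduced here for illustration.

```r
# Fit the unadjusted and the weight-adjusted models.
fit_unadj <- lm(mpg ~ factor(cyl), data = mtcars)
fit_adj   <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Collect the 8-versus-4 cylinder coefficient from each model.
effect_8_vs_4 <- c(
  unadjusted = unname(coef(fit_unadj)["factor(cyl)8"]),
  adjusted   = unname(coef(fit_adj)["factor(cyl)8"])
)
effect_8_vs_4
```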
Consider the mtcars data set. Fit a model with mpg as
the outcome that considers number of cylinders as a factor variable and
weight as a confounder. Now fit a second model with mpg as the outcome
that considers the interaction between number of cylinders (as a
factor variable) and weight. Give the P-value for the likelihood ratio
test comparing the two models and suggest a model using 0.05 as a type I
error rate significance benchmark.
Answer
fit_q3_1 <- lm(data = mtcars, formula = mpg ~ factor(cyl) + wt)
fit_q3_2 <- lm(data = mtcars, formula = mpg ~ factor(cyl) * wt)
round(summary(fit_q3_1)$coeff,4); round(summary(fit_q3_2)$coeff,4)
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   33.9908     1.8878 18.0056   0.0000
## factor(cyl)6  -4.2556     1.3861 -3.0702   0.0047
## factor(cyl)8  -6.0709     1.6523 -3.6742   0.0010
## wt            -3.2056     0.7539 -4.2521   0.0002
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)      39.5712     3.1939 12.3895   0.0000
## factor(cyl)6    -11.1624     9.3553 -1.1932   0.2436
## factor(cyl)8    -15.7032     4.8395 -3.2448   0.0032
## wt               -5.6470     1.3595 -4.1538   0.0003
## factor(cyl)6:wt   2.8669     3.1173  0.9197   0.3662
## factor(cyl)8:wt   3.4546     1.6273  2.1229   0.0434
anova(fit_q3_1, fit_q3_2)
## Analysis of Variance Table
##
## Model 1: mpg ~ factor(cyl) + wt
## Model 2: mpg ~ factor(cyl) * wt
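##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     28 183.06
## 2     26 155.89  2     27.17 2.2658 0.1239
The P-value is 0.1239, which is larger than 0.05, so we fail to reject the
simpler model: the interaction terms may not be necessary.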
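Note that anova() on two lm fits reports an F test. As a cross-check, a likelihood ratio statistic can also be computed by hand from logLik() and compared with a chi-squared distribution. This is a sketch of that asymptotic approximation, not necessarily the calculation the quiz intends; the names `lr_stat`, `lr_df` and `lr_p` are introduced here for illustration.

```r
# Refit the two nested models from the answer above.
fit_q3_1 <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
fit_q3_2 <- lm(mpg ~ factor(cyl) * wt, data = mtcars)

# Likelihood ratio statistic: twice the gain in maximized log-likelihood.
lr_stat <- as.numeric(2 * (logLik(fit_q3_2) - logLik(fit_q3_1)))

# Degrees of freedom: number of extra coefficients in the interaction model.
lr_df <- length(coef(fit_q3_2)) - length(coef(fit_q3_1))

# Asymptotic chi-squared P-value.
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p.value = lr_p)
```

With only 32 observations the chi-squared approximation is rough, but it leads to the same decision at the 0.05 level as the F test.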
Consider the mtcars data set. Fit a model with mpg as
the outcome that includes number of cylinders as a factor variable and
weight included in the model as
lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
How is the wt coefficient interpreted?
Answer
summary(lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars))$coeff
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  33.990794   1.887793 18.005569 6.257246e-17
## I(wt * 0.5)  -6.411227   1.507791 -4.252065 2.130435e-04
## factor(cyl)6 -4.255582   1.386073 -3.070244 4.717834e-03
## factor(cyl)8 -6.070860   1.652288 -3.674214 9.991893e-04
summary(lm(mpg ~ wt + factor(cyl), data = mtcars))$coeff
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  33.990794  1.8877934 18.005569 6.257246e-17
## wt           -3.205613  0.7538957 -4.252065 2.130435e-04
## factor(cyl)6 -4.255582  1.3860728 -3.070244 4.717834e-03
## factor(cyl)8 -6.070860  1.6522878 -3.674214 9.991893e-04
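Multiplying wt by 0.5 does not change the fitted model, but it does change
the units of the weight coefficient. In mtcars, wt is recorded in units of
1000 lb, so I(wt * 0.5) is weight in units of 2000 lb (one short ton). The
coefficient -6.41 is therefore the expected change in mpg per one-ton increase
in weight, holding the number of cylinders fixed, and it is exactly twice the
wt coefficient (-3.206).
\[mpg = \beta_0 + \beta_1 \cdot wt + \beta_2 \cdot cyl_6 + \beta_3 \cdot cyl_8\]
For model 1:
\[mpg = 33.99 - 6.41 \cdot (0.5 \cdot wt) - 4.26 \cdot cyl_6 - 6.07 \cdot cyl_8\]
For model 2:
\[mpg = 33.99 - 3.206 \cdot wt - 4.26 \cdot cyl_6 - 6.07 \cdot cyl_8\]
In the end, both models give the same fitted equation.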
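A quick check of how the two fits relate is to compare the weight coefficients and the fitted values directly. This is a small sketch; the names `fit_half`, `fit_orig`, `coef_ratio` and `same_fit` are introduced here for illustration.

```r
# Fit the rescaled and the original parameterizations.
fit_half <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
fit_orig <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Halving the predictor doubles its coefficient.
coef_ratio <- unname(coef(fit_half)["I(wt * 0.5)"] / coef(fit_orig)["wt"])
coef_ratio

# The fitted values are the same under both parameterizations.
same_fit <- isTRUE(all.equal(fitted(fit_half), fitted(fit_orig)))
same_fit
```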
Consider the following data set
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
Give the hat diagonal for the most influential point.
Answer
max(influence(lm(y ~ x))$hat)
## [1] 0.9945734
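For a simple linear regression with an intercept, the hat diagonal has the closed form h_i = 1/n + (x_i - mean(x))^2 / sum((x_j - mean(x))^2), so it depends only on x and measures how far each point sits from the centre of the predictor values. The sketch below recomputes it by hand; `hat_manual` is a name introduced here for illustration.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

# Closed-form leverage for simple linear regression with an intercept.
hat_manual <- 1 / length(x) + (x - mean(x))^2 / sum((x - mean(x))^2)

# Compare with the values reported by hatvalues().
fit <- lm(y ~ x)
all.equal(unname(hatvalues(fit)), hat_manual)

# Index of the observation with the largest hat value.
which.max(hat_manual)
```

The fifth observation, with x = 11.72, sits far from the other four x values, which is why its hat value is close to 1.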
Consider the following data set
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
Give the slope dfbeta for the point with the highest hat value.
Answer
influence.measures(lm(y ~ x))
## Influence measures of
## lm(formula = y ~ x) :
##
##    dfb.1_     dfb.x     dffit cov.r   cook.d   hat inf
## 1  1.0621 -3.78e-01    1.0679 0.341 2.93e-01 0.229   *
## 2  0.0675 -2.86e-02    0.0675 2.934 3.39e-03 0.244
## 3 -0.0174  7.92e-03   -0.0174 3.007 2.26e-04 0.253   *
## 4 -1.2496  6.73e-01   -1.2557 0.342 3.91e-01 0.280   *
## 5  0.2043 -1.34e+02 -149.7204 0.107 2.70e+02 0.995   *
The point with the highest hat value is observation 5 (hat = 0.995), and its
slope dfbeta (column dfb.x) is about -134.
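Rather than reading the value off the table, it can be extracted directly. The dfb.* columns printed by influence.measures() are the standardized values returned by dfbetas(). This is a minimal sketch; `idx` and `slope_dfbeta` are names introduced here for illustration.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)

# Locate the observation with the highest hat value.
idx <- unname(which.max(hatvalues(fit)))

# Standardized dfbeta for the slope at that observation.
slope_dfbeta <- dfbetas(fit)[idx, "x"]
slope_dfbeta
```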
Consider a regression relationship between Y and X with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X with and without adjustment for Z?