Q1

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.

data("mtcars")
names(mtcars)
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"
fit1 <- lm(mpg~wt+I(factor(cyl)), data = mtcars)
summary(fit1)$coef
##                  Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)     33.990794  1.8877934 18.005569 6.257246e-17
## wt              -3.205613  0.7538957 -4.252065 2.130435e-04
## I(factor(cyl))6 -4.255582  1.3860728 -3.070244 4.717834e-03
## I(factor(cyl))8 -6.070860  1.6522878 -3.674214 9.991893e-04
# Plotting for direct visualization
par(mfrow = c(2,2))
plot(fit1)
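
The requested estimate is the coefficient on the 8-cylinder level, because 4 cylinders is the reference level of the factor. A minimal sketch of pulling it out directly; the names `fit_adj` and `est_8_vs_4` are illustrative, and `factor(cyl)` is used without the `I()` wrapper, which only changes the coefficient labels:

```r
data("mtcars")

# Same model as fit1; dropping I() only changes the coefficient names.
fit_adj <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# 4 cylinders is the reference level, so this coefficient is the
# weight-adjusted difference in expected mpg, 8 versus 4 cylinders.
est_8_vs_4 <- unname(coef(fit_adj)["factor(cyl)8"])
round(est_8_vs_4, 3)
```

This matches the `I(factor(cyl))8` row in the coefficient table above (about -6.071).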

Q2

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg in the weight-adjusted and unadjusted models. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?

fit1 <- lm(mpg~wt+I(factor(cyl)), data = mtcars)
fit0 <- lm(mpg~I(factor(cyl)), data = mtcars)
# Unadjusted model: weight is disregarded
summary(fit0)$coef
##                   Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)      26.663636  0.9718008 27.437347 2.688358e-22
## I(factor(cyl))6  -6.920779  1.5583482 -4.441099 1.194696e-04
## I(factor(cyl))8 -11.563636  1.2986235 -8.904534 8.568209e-10
# Adjusted model: holding weight constant
summary(fit1)$coef
##                  Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)     33.990794  1.8877934 18.005569 6.257246e-17
## wt              -3.205613  0.7538957 -4.252065 2.130435e-04
## I(factor(cyl))6 -4.255582  1.3860728 -3.070244 4.717834e-03
## I(factor(cyl))8 -6.070860  1.6522878 -3.674214 9.991893e-04

- Answer: Holding weight constant, the number of cylinders appears to have less of an impact on mpg than if weight is disregarded. The 8 versus 4 cylinder estimate is -6.07 in the adjusted model and -11.56 in the unadjusted model.
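
The comparison can also be made side by side in one vector. This is a sketch with illustrative names (`fit_unadj`, `fit_adj`, `effects_8_vs_4`), again using `factor(cyl)` without `I()`:

```r
data("mtcars")

fit_unadj <- lm(mpg ~ factor(cyl), data = mtcars)       # weight disregarded
fit_adj   <- lm(mpg ~ wt + factor(cyl), data = mtcars)  # weight held constant

# 8 versus 4 cylinder effect on mpg under each model
effects_8_vs_4 <- c(
  unadjusted = unname(coef(fit_unadj)["factor(cyl)8"]),
  adjusted   = unname(coef(fit_adj)["factor(cyl)8"])
)
round(effects_8_vs_4, 2)
```

The adjusted effect is smaller in magnitude, which is what the answer above states.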

Q3

Consider the mtcars data set. Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder. Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.

fit3 <- lm(mpg~ factor(cyl)*wt, data = mtcars)
summary(fit3)$coef
##                   Estimate Std. Error    t value     Pr(>|t|)
## (Intercept)      39.571196   3.193940 12.3894599 2.058359e-12
## factor(cyl)6    -11.162351   9.355346 -1.1931522 2.435843e-01
## factor(cyl)8    -15.703167   4.839464 -3.2448150 3.223216e-03
## wt               -5.647025   1.359498 -4.1537586 3.127578e-04
## factor(cyl)6:wt   2.866919   3.117330  0.9196716 3.661987e-01
## factor(cyl)8:wt   3.454587   1.627261  2.1229458 4.344037e-02
anova(fit1, fit3, test = "Chisq")
## Analysis of Variance Table
## 
## Model 1: mpg ~ wt + I(factor(cyl))
## Model 2: mpg ~ factor(cyl) * wt
##   Res.Df    RSS Df Sum of Sq Pr(>Chi)
## 1     28 183.06                      
## 2     26 155.89  2     27.17   0.1038

- Answer: The P-value is 0.1038, which is larger than 0.05. So, according to our criterion, we fail to reject the null hypothesis that the interaction coefficients are zero, which suggests that the interaction terms may not be necessary.
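
The P-value can be read off the `anova` table programmatically rather than by eye. A minimal sketch, with illustrative names (`fit_add`, `fit_int`, `p_value`, `alpha`, `keep_interaction`):

```r
data("mtcars")

fit_add <- lm(mpg ~ wt + factor(cyl), data = mtcars)   # no interaction
fit_int <- lm(mpg ~ factor(cyl) * wt, data = mtcars)   # with interaction

cmp <- anova(fit_add, fit_int, test = "Chisq")

# Row 2 holds the comparison of the larger model against the smaller one.
p_value <- cmp[2, "Pr(>Chi)"]

alpha <- 0.05
keep_interaction <- p_value < alpha   # TRUE would favor the interaction model
round(p_value, 4)
```

With `keep_interaction` being `FALSE`, the simpler additive model is the suggested one.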

Q4

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as:

lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
## 
## Call:
## lm(formula = mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
## 
## Coefficients:
##  (Intercept)   I(wt * 0.5)  factor(cyl)6  factor(cyl)8  
##       33.991        -6.411        -4.256        -6.071

- Answer: wt is measured in units of 1000 lb, and one (short) ton is 2000 lb, so \(wt \times 0.5\) is the weight in tons. The coefficient is therefore the estimated expected change in mpg per one-ton increase in weight, holding the number of cylinders (4, 6, or 8) fixed.
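
Rescaling a regressor by 0.5 doubles its coefficient, so the per-ton slope is twice the per-1000-lb slope. A small check, with illustrative names (`fit_klb`, `fit_ton`, `slope_klb`, `slope_ton`):

```r
data("mtcars")

fit_klb <- lm(mpg ~ wt + factor(cyl), data = mtcars)           # wt in 1000 lb
fit_ton <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)  # wt in tons

slope_klb <- unname(coef(fit_klb)["wt"])
slope_ton <- unname(coef(fit_ton)["I(wt * 0.5)"])

# The per-ton slope should equal twice the per-1000-lb slope.
c(per_1000_lb = slope_klb, per_ton = slope_ton, ratio = slope_ton / slope_klb)
```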

Q5

Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the hat diagonal for the most influential point

fit5 <- lm(y~x)


# Plotting for direct visualization
par(mfrow = c(2,2))
plot(fit5)
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

# The plot clearly shows that the 5th point is the most influential for the fit

# The hat diagonal for the most influential point
hatvalues(fit5)
##         1         2         3         4         5 
## 0.2286650 0.2438146 0.2525027 0.2804443 0.9945734
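
Hat values measure leverage, and for simple linear regression they have the closed form \(h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2\). The sketch below picks out the largest one and checks it against that formula; the names `fit`, `h`, `h_manual`, and `idx` are illustrative:

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)

h <- hatvalues(fit)

# Closed-form leverage for simple linear regression with an intercept
h_manual <- 1 / length(x) + (x - mean(x))^2 / sum((x - mean(x))^2)

idx <- unname(which.max(h))  # index of the highest-leverage point
h[idx]
```

The 5th point sits far from the other x values, which is why its hat value is close to 1.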

Q6

Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

# This is the same data set as in Q5; the plot and the hat values there show that the 5th point is the most influential point for the fit

Give the slope dfbeta for the point with the highest hat value.

fit6 <- lm(y~x)
dfbetas(fit6)
##   (Intercept)             x
## 1  1.06212391   -0.37811633
## 2  0.06748037   -0.02861769
## 3 -0.01735756    0.00791512
## 4 -1.24958248    0.67253246
## 5  0.20432010 -133.82261293
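
To pull out just the requested number, index the `dfbetas` matrix by the row with the highest hat value and the slope column. Note that `dfbetas()` reports the standardized change in each coefficient when a point is dropped, while `dfbeta()` reports the raw change; the output above uses the standardized version. The names `fit`, `idx`, and `slope_dfbetas` are illustrative:

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)

idx <- unname(which.max(hatvalues(fit)))  # point with the highest hat value

# Standardized change in the slope when that point is left out
slope_dfbetas <- dfbetas(fit)[idx, "x"]
slope_dfbetas
```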

Q7

Consider a regression relationship between Y and X with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X with and without adjustment for Z.

- Answer: It is possible for the coefficient to reverse sign after adjustment. For example, it can be strongly significant and positive before adjustment and strongly significant and negative after adjustment.
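
A small simulation illustrates the sign reversal. Everything here is made up for illustration (the seed, the group structure in `z`, and the true coefficients); it is a sketch of one scenario in which the reversal happens, not a general proof:

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# z drives both x and y; within a fixed z, y decreases in x.
z <- rep(c(0, 10, 20), each = 20)
x <- z + runif(length(z), min = 0, max = 5)
y <- 3 * z - x + rnorm(length(z), sd = 0.5)

b_unadj <- unname(coef(lm(y ~ x))["x"])      # ignores z
b_adj   <- unname(coef(lm(y ~ x + z))["x"])  # adjusts for z

c(unadjusted = b_unadj, adjusted = b_adj)
```

Without adjustment, x mostly acts as a proxy for z, so its slope comes out positive; once z is in the model, the slope on x reflects the negative within-group relationship.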