Consider the mtcars data set. Fit a model with mpg as the outcome (Y) that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.
data(mtcars)
fit <- lm(mtcars$mpg ~ factor(mtcars$cyl) + mtcars$wt)
summary(fit)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.990794 1.8877934 18.005569 6.257246e-17
## factor(mtcars$cyl)6 -4.255582 1.3860728 -3.070244 4.717834e-03
## factor(mtcars$cyl)8 -6.070860 1.6522878 -3.674214 9.991893e-04
## mtcars$wt -3.205613 0.7538957 -4.252065 2.130435e-04
Notice that 4 cylinders is not displayed among the coefficients. First, review the data in mtcars. lm treats the first level of the factor (4 cylinders) as the reference level, so the remaining coefficient values are estimated relative to it.
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Therefore the adjusted 8-versus-4-cylinder estimate is the third row of the Estimate column.
summary(fit)$coefficients[3,1]
## [1] -6.07086
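To double-check the reference level, relevel can make 8 cylinders the baseline instead; a minimal sketch (the name fit8 is just illustrative):
# Refit with 8 cylinders as the reference level; the coefficient labeled "4"
# is then +6.07086, i.e., the same 8-versus-4 contrast with the sign flipped
fit8 <- lm(mpg ~ relevel(factor(cyl), ref = "8") + wt, data = mtcars)
coef(fit8)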
Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg for the adjusted and unadjusted by weight models. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?
data(mtcars)
fit <- lm(mtcars$mpg ~ factor(mtcars$cyl))
summary(fit)$coefficients[3,1]
## [1] -11.56364
fit2 <- lm(mtcars$mpg ~ factor(mtcars$cyl) + mtcars$wt)
summary(fit2)$coefficients[3,1]
## [1] -6.07086
The unadjusted coefficient (-11.56) is nearly twice as large in magnitude as the adjusted one (-6.07). Holding weight constant, cylinder appears to have less of an impact on mpg than if weight is disregarded; in other words, weight confounds the relationship between cylinders and mpg.
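A quick sketch to quantify the attenuation, reusing the two fits above:
b_unadj <- coef(fit)["factor(mtcars$cyl)8"]   # -11.56 from the unadjusted model
b_adj   <- coef(fit2)["factor(mtcars$cyl)8"]  # -6.07 from the weight-adjusted model
round(100 * (1 - b_adj / b_unadj), 1)         # adjusting for weight shrinks the effect by ~47.5%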
Consider the mtcars data set. Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder. Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.
data(mtcars)
fit <- lm(mtcars$mpg ~ factor(mtcars$cyl) + mtcars$wt)
fit1 <- lm(mtcars$mpg ~ factor(mtcars$cyl) * mtcars$wt)
anova(fit, fit1)
## Analysis of Variance Table
##
## Model 1: mtcars$mpg ~ factor(mtcars$cyl) + mtcars$wt
## Model 2: mtcars$mpg ~ factor(mtcars$cyl) * mtcars$wt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 183.06
## 2 26 155.89 2 27.17 2.2658 0.1239
The P-value (0.1239) is larger than 0.05, so we fail to reject the null hypothesis: the interaction terms do not significantly improve the fit, and the simpler model without the interaction is preferred.
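Strictly speaking, anova performs an F test for these nested models; the likelihood ratio test itself reaches the same conclusion and can be run with the lmtest package (an assumption: lmtest is installed):
library(lmtest)
lrtest(fit, fit1)  # chi-squared version of the comparison; P > 0.05, same conclusion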
Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as
lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
How is the wt coefficient interpreted?
fit <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
fit
##
## Call:
## lm(formula = mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
##
## Coefficients:
## (Intercept) I(wt * 0.5) factor(cyl)6 factor(cyl)8
## 33.991 -6.411 -4.256 -6.071
Note: according to the mtcars help page, wt is in units of 1000 lbs. A one-unit increase in I(wt * 0.5) therefore corresponds to a 2000 lb (one ton) increase in weight, so the coefficient (-6.411) is the estimated expected change in mpg per one-ton increase in weight, holding the number of cylinders constant.
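As a sanity check (a sketch): rescaling a regressor by 0.5 doubles its coefficient, so doubling the wt estimate from the unrescaled model should reproduce -6.411.
fit_orig <- lm(mpg ~ wt + factor(cyl), data = mtcars)
2 * coef(fit_orig)["wt"]  # -6.411, matching the I(wt * 0.5) coefficient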
Consider the following data set
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
Give the hat diagonal for the most influential point
fit <- lm(y ~ x)
plot(x, y, frame = FALSE, cex = 2, pch = 21, bg = "lightblue", col = "black")
abline(fit)
Now look for the outlier and determine which point has the most potential for influence.
round(dfbetas(fit)[1 : 5, 2], 4)
## 1 2 3 4 5
## -0.3781 -0.0286 0.0079 0.6725 -133.8226
round(hatvalues(fit)[1:5], 4)
## 1 2 3 4 5
## 0.2287 0.2438 0.2525 0.2804 0.9946
dfit <- dffits(fit)
dfit
## 1 2 3 4 5
## 1.06794603 0.06750799 -0.01735800 -1.25570867 -149.72037760
max_dffits <- max(abs(dffits(fit)))
round(hatvalues(fit)[which(abs(dfit) == max_dffits)], 4)
## 5
## 0.9946
Clearly the 5th point has the largest slope dfbeta and the largest hat value. Having both high leverage (hat diagonal 0.9946) and a large dfbeta, that point has the highest potential for influence.
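Base R can also collect all of these diagnostics at once via influence.measures, which flags influential observations; a minimal sketch:
infl <- influence.measures(fit)  # dfbetas, dffits, covratio, Cook's distance, hat values
summary(infl)                    # observation 5 is flagged as influential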
Consider the following data set
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
Give the slope dfbeta for the point with the highest hat value.
Using the results from above
max_hat <- max(hatvalues(fit))
round(dfbetas(fit)[which(hatvalues(fit) == max_hat), 2], 4)
## [1] -133.8226
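To see what that dfbeta is capturing, refit without the 5th point; a quick sketch (the slope changes drastically once the high-leverage point is dropped):
fit_drop5 <- lm(y[-5] ~ x[-5])
coef(fit)[2]        # slope using all five points
coef(fit_drop5)[2]  # slope without point 5: drastically different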
Consider a regression relationship between Y and X, with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X, with and without adjustment for Z?
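The key fact is that adjustment can change the X coefficient by an arbitrary amount, and can even reverse its sign. A small simulation sketch (simulated data, illustrative only):
# Z drives both X and Y, so omitting Z flips the sign of the X coefficient
set.seed(1)
n <- 100
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.3)
y <- -x + 2 * z + rnorm(n, sd = 0.3)
coef(lm(y ~ x))["x"]      # unadjusted: positive
coef(lm(y ~ x + z))["x"]  # adjusted for z: close to -1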