Consider the mtcars data set. Fit a model with mpg as the outcome (Y) that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.
data(mtcars)
fit <- lm(mtcars$mpg ~ factor(mtcars$cyl) + mtcars$wt)
summary(fit)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.990794 1.8877934 18.005569 6.257246e-17
## factor(mtcars$cyl)6 -4.255582 1.3860728 -3.070244 4.717834e-03
## factor(mtcars$cyl)8 -6.070860 1.6522878 -3.674214 9.991893e-04
## mtcars$wt -3.205613 0.7538957 -4.252065 2.130435e-04
Notice that 4 cylinders is not displayed among the coefficients. First, review the data in mtcars. lm treats the first level of the factor (4 cylinders) as the reference level, so the remaining coefficient values are estimated relative to it.
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Therefore the adjusted 8-versus-4-cylinder estimate is the third row of the Estimate column.
summary(fit)$coefficients[3,1]
## [1] -6.07086
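To double-check the reference level, relevel can make 8 cylinders the baseline instead; a minimal sketch (the name fit8 is just illustrative):
# Refit with 8 cylinders as the reference level; the coefficient labeled "4"
# is then +6.07086, i.e., the same 8-versus-4 contrast with the sign flipped
fit8 <- lm(mpg ~ relevel(factor(cyl), ref = "8") + wt, data = mtcars)
coef(fit8)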
Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg for the adjusted and unadjusted by weight models. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?
data(mtcars)
fit <- lm(mtcars$mpg ~ factor(mtcars$cyl))
summary(fit)$coefficients[3,1]
## [1] -11.56364
fit2 <- lm(mtcars$mpg ~ factor(mtcars$cyl) + mtcars$wt)
summary(fit2)$coefficients[3,1]
## [1] -6.07086
The unadjusted coefficient (-11.56) is nearly twice as large in magnitude as the adjusted one (-6.07). Holding weight constant, cylinder appears to have less of an impact on mpg than if weight is disregarded; in other words, weight confounds the relationship between cylinders and mpg.
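A quick sketch to quantify the attenuation, reusing the two fits above:
b_unadj <- coef(fit)["factor(mtcars$cyl)8"]   # -11.56 from the unadjusted model
b_adj   <- coef(fit2)["factor(mtcars$cyl)8"]  # -6.07 from the weight-adjusted model
round(100 * (1 - b_adj / b_unadj), 1)         # adjusting for weight shrinks the effect by ~47.5%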
Consider the mtcars data set. Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder. Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.
data(mtcars)
fit <- lm(mtcars$mpg ~ factor(mtcars$cyl) + mtcars$wt)
fit1 <- lm(mtcars$mpg ~ factor(mtcars$cyl) * mtcars$wt)
anova(fit, fit1)
## Analysis of Variance Table
##
## Model 1: mtcars$mpg ~ factor(mtcars$cyl) + mtcars$wt
## Model 2: mtcars$mpg ~ factor(mtcars$cyl) * mtcars$wt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 183.06
## 2 26 155.89 2 27.17 2.2658 0.1239
The P-value (0.1239) is larger than 0.05, so we fail to reject the null hypothesis: the interaction terms do not significantly improve the fit, and the simpler model without the interaction is preferred.
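Strictly speaking, anova performs an F test for these nested models; the likelihood ratio test itself reaches the same conclusion and can be run with the lmtest package (an assumption: lmtest is installed):
library(lmtest)
lrtest(fit, fit1)  # chi-squared version of the comparison; P > 0.05, same conclusion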
Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as
lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
How is the wt coefficient interpreted?
fit <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
fit
##
## Call:
## lm(formula = mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
##
## Coefficients:
## (Intercept) I(wt * 0.5) factor(cyl)6 factor(cyl)8
## 33.991 -6.411 -4.256 -6.071
Note: according to the mtcars help page, wt is in units of 1000 lbs. A one-unit increase in I(wt * 0.5) therefore corresponds to a 2000 lb (one ton) increase in weight, so the coefficient (-6.411) is the estimated expected change in mpg per one-ton increase in weight, holding the number of cylinders constant.
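As a sanity check (a sketch): rescaling a regressor by 0.5 doubles its coefficient, so doubling the wt estimate from the unrescaled model should reproduce -6.411.
fit_orig <- lm(mpg ~ wt + factor(cyl), data = mtcars)
2 * coef(fit_orig)["wt"]  # -6.411, matching the I(wt * 0.5) coefficient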
Consider the following data set
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
Give the hat diagonal for the most influential point
fit <- lm(y ~ x)
plot(x, y, frame = FALSE, cex = 2, pch = 21, bg = "lightblue", col = "black")
abline(fit)
Now look for the outlier and determine which point has the most potential for influence.
round(dfbetas(fit)[1 : 5, 2], 4)
## 1 2 3 4 5
## -0.3781 -0.0286 0.0079 0.6725 -133.8226
round(hatvalues(fit)[1:5], 4)
## 1 2 3 4 5
## 0.2287 0.2438 0.2525 0.2804 0.9946
dfit <- dffits(fit)
dfit
## 1 2 3 4 5
## 1.06794603 0.06750799 -0.01735800 -1.25570867 -149.72037760
max_dffits <- max(abs(dffits(fit)))
round(hatvalues(fit)[which(abs(dfit) == max_dffits)], 4)
## 5
## 0.9946
Clearly the 5th point has the largest slope dfbeta and the largest hat value. Having both high leverage (hat diagonal 0.9946) and a large dfbeta, that point has the highest potential for influence.
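Base R can also collect all of these diagnostics at once via influence.measures, which flags influential observations; a minimal sketch:
infl <- influence.measures(fit)  # dfbetas, dffits, covratio, Cook's distance, hat values
summary(infl)                    # observation 5 is flagged as influential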
Consider the following data set
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
Give the slope dfbeta for the point with the highest hat value.
Using the results from above
max_hat <- max(hatvalues(fit))
round(dfbetas(fit)[which(hatvalues(fit) == max_hat), 2], 4)
## [1] -133.8226
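To see what that dfbeta is capturing, refit without the 5th point; a quick sketch (the slope changes drastically once the high-leverage point is dropped):
fit_drop5 <- lm(y[-5] ~ x[-5])
coef(fit)[2]        # slope using all five points
coef(fit_drop5)[2]  # slope without point 5: drastically different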
Consider a regression relationship between Y and X, with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X, with and without adjustment for Z?
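The key fact is that adjustment can change the X coefficient by an arbitrary amount, and can even reverse its sign. A small simulation sketch (simulated data, illustrative only):
# Z drives both X and Y, so omitting Z flips the sign of the X coefficient
set.seed(1)
n <- 100
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.3)
y <- -x + 2 * z + rnorm(n, sd = 0.3)
coef(lm(y ~ x))["x"]      # unadjusted: positive
coef(lm(y ~ x + z))["x"]  # adjusted for z: close to -1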