Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4 cylinders.
mtcars$cyl <- factor(mtcars$cyl)
fit <- lm(mpg ~ cyl + wt, data = mtcars)
summary(fit)$coef[3, 1]
## [1] -6.07086
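Indexing the coefficient matrix by position depends on the order of the factor levels. A minimal sketch of the same lookup by name, assuming the default treatment contrasts with 4 cylinders as the reference level; `cars` and `cyl8_adj` are hypothetical names introduced here so that the built-in mtcars is not modified:

```r
# Work on a copy so the built-in mtcars data frame stays untouched
cars <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ cyl + wt, data = cars)

# "cyl8" is the 8-versus-4 cylinder contrast, adjusted for weight
cyl8_adj <- coef(fit)["cyl8"]
cyl8_adj
```

This should reproduce the value printed above, since only the way the coefficient is selected has changed.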
Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg in the models adjusted and unadjusted for weight. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?
mtcars$cyl <- factor(mtcars$cyl)
fit1 <- lm(mpg ~ cyl + wt, data = mtcars)
fit2 <- lm(mpg ~ cyl, data = mtcars)
summary(fit1)
##
## Call:
## lm(formula = mpg ~ cyl + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5890 -1.2357 -0.5159 1.3845 5.7915
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.9908 1.8878 18.006 < 2e-16 ***
## cyl6 -4.2556 1.3861 -3.070 0.004718 **
## cyl8 -6.0709 1.6523 -3.674 0.000999 ***
## wt -3.2056 0.7539 -4.252 0.000213 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.557 on 28 degrees of freedom
## Multiple R-squared: 0.8374, Adjusted R-squared: 0.82
## F-statistic: 48.08 on 3 and 28 DF, p-value: 3.594e-11
summary(fit2)
##
## Call:
## lm(formula = mpg ~ cyl, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2636 -1.8357 0.0286 1.3893 7.2364
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.6636 0.9718 27.437 < 2e-16 ***
## cyl6 -6.9208 1.5583 -4.441 0.000119 ***
## cyl8 -11.5636 1.2986 -8.905 8.57e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.223 on 29 degrees of freedom
## Multiple R-squared: 0.7325, Adjusted R-squared: 0.714
## F-statistic: 39.7 on 2 and 29 DF, p-value: 4.979e-09
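The two summaries give a cyl8 estimate of -6.0709 with weight in the model and -11.5636 without it, so holding weight constant, the number of cylinders appears to have less of an impact on mpg than when weight is disregarded. A short sketch that puts the two estimates side by side; `cars`, `fit_adj`, `fit_unadj` and `cyl8_effect` are hypothetical names used only for this illustration:

```r
# Copy of mtcars with cyl as a factor (4 cylinders is the reference level)
cars <- transform(mtcars, cyl = factor(cyl))

fit_adj <- lm(mpg ~ cyl + wt, data = cars)  # adjusted for weight
fit_unadj <- lm(mpg ~ cyl, data = cars)     # weight left out

# 8-versus-4 cylinder contrast under each model
cyl8_effect <- c(
  adjusted = unname(coef(fit_adj)["cyl8"]),
  unadjusted = unname(coef(fit_unadj)["cyl8"])
)
round(cyl8_effect, 4)
```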
Consider the mtcars data set. Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder. Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as the type I error rate significance benchmark.
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
mtcars$cyl <- factor(mtcars$cyl)
fit1 <- lm(mpg ~ cyl + wt, data = mtcars)
fit2 <- lm(mpg ~ cyl + wt + cyl:wt, data = mtcars)
lrtest(fit1, fit2)
## Likelihood ratio test
##
## Model 1: mpg ~ cyl + wt
## Model 2: mpg ~ cyl + wt + cyl:wt
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 5 -73.311
## 2 7 -70.741 2 5.1412 0.07649 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# The P-value (0.07649) is larger than 0.05, so by our criterion we fail to reject the null hypothesis, which suggests that the interaction terms may not be necessary.
# The anova function leads to the same conclusion.
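The likelihood ratio statistic can also be computed by hand from the two log-likelihoods, which avoids the lmtest dependency. The sketch below does that and, for comparison, pulls the P-value from the nested-model F test that `anova` reports for `lm` fits. The F test is a different test, so its P-value need not equal the likelihood ratio one. `cars`, `lr_stat`, `lr_p` and `f_p` are hypothetical names:

```r
cars <- transform(mtcars, cyl = factor(cyl))
fit1 <- lm(mpg ~ cyl + wt, data = cars)
fit2 <- lm(mpg ~ cyl * wt, data = cars)  # expands to cyl + wt + cyl:wt

# Likelihood ratio statistic: twice the gain in log-likelihood
lr_stat <- as.numeric(2 * (logLik(fit2) - logLik(fit1)))

# fit2 has two extra parameters (cyl6:wt and cyl8:wt), hence df = 2
lr_p <- pchisq(lr_stat, df = 2, lower.tail = FALSE)

# P-value of the F test comparing the same two nested models
f_p <- anova(fit1, fit2)[2, "Pr(>F)"]

c(lr_stat = lr_stat, lr_p = lr_p, f_p = f_p)
```

Both P-values can then be compared against the 0.05 benchmark.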
Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars). How is the wt coefficient interpreted?
fit <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5890 -1.2357 -0.5159 1.3845 5.7915
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.991 1.888 18.006 < 2e-16 ***
## I(wt * 0.5) -6.411 1.508 -4.252 0.000213 ***
## factor(cyl)6 -4.256 1.386 -3.070 0.004718 **
## factor(cyl)8 -6.071 1.652 -3.674 0.000999 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.557 on 28 degrees of freedom
## Multiple R-squared: 0.8374, Adjusted R-squared: 0.82
## F-statistic: 48.08 on 3 and 28 DF, p-value: 3.594e-11
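In mtcars, wt is recorded in units of 1000 lbs, so wt * 0.5 measures weight in units of 2000 lbs (one short ton). The coefficient -6.411 is therefore the estimated expected change in mpg per one-ton increase in weight for a fixed number of cylinders, and it is twice the per-1000-lb coefficient (-3.2056) from the earlier fit. A small check of that scaling, with `fit_half`, `fit_wt`, `per_ton` and `per_1000lb` as hypothetical names:

```r
# Weight rescaled to tons versus weight in its original 1000 lb units
fit_half <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
fit_wt <- lm(mpg ~ wt + factor(cyl), data = mtcars)

per_ton <- unname(coef(fit_half)["I(wt * 0.5)"])
per_1000lb <- unname(coef(fit_wt)["wt"])

# Halving the regressor doubles its coefficient
c(per_ton = per_ton, twice_per_1000lb = 2 * per_1000lb)
```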
Consider the following data set: x <- c(0.586, 0.166, -0.042, -0.614, 11.72); y <- c(0.549, -0.026, -0.127, -0.751, 1.344). Give the hat diagonal for the most influential point.
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)
#Approach 1
predict(fit)
## 1 2 3 4 5
## -0.03120904 -0.08533002 -0.11213279 -0.18584041 1.40351226
hat(x, intercept = TRUE)
## [1] 0.2286650 0.2438146 0.2525027 0.2804443 0.9945734
#Approach 2
influence.measures(fit)
## Influence measures of
## lm(formula = y ~ x) :
##
## dfb.1_ dfb.x dffit cov.r cook.d hat inf
## 1 1.0621 -3.78e-01 1.0679 0.341 2.93e-01 0.229 *
## 2 0.0675 -2.86e-02 0.0675 2.934 3.39e-03 0.244
## 3 -0.0174 7.92e-03 -0.0174 3.007 2.26e-04 0.253 *
## 4 -1.2496 6.73e-01 -1.2557 0.342 3.91e-01 0.280 *
## 5 0.2043 -1.34e+02 -149.7204 0.107 2.70e+02 0.995 *
#Approach 3
max(hatvalues(fit))
## [1] 0.9945734
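For a simple linear regression with an intercept, the hat diagonal has the closed form h_i = 1/n + (x_i - mean(x))^2 / sum((x_j - mean(x))^2), which makes it clear why the fifth point, far from the other x values, has leverage close to 1. A sketch that checks this formula against `hatvalues`; `h_manual` and `h_builtin` are hypothetical names:

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)

# Closed-form leverage for simple linear regression with an intercept
n <- length(x)
h_manual <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)

# Leverage as reported by R for the fitted model
h_builtin <- unname(hatvalues(fit))

# Index of the highest-leverage point and its hat value
which.max(h_manual)
max(h_manual)
```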
Consider the following data set: x <- c(0.586, 0.166, -0.042, -0.614, 11.72); y <- c(0.549, -0.026, -0.127, -0.751, 1.344). Give the slope dfbeta for the point with the highest hat value.
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)
influence.measures(fit)$infmat[5, "dfb.x"]
## [1] -133.8226
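The dfb.x column of `influence.measures` holds the standardized values that `dfbetas` returns, not the raw coefficient changes from `dfbeta`. The sketch below locates the highest-leverage point programmatically instead of hard-coding row 5 and extracts both versions; `i`, `slope_dfbetas` and `slope_dfbeta_raw` are hypothetical names:

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)

# Row index of the point with the largest hat value
i <- unname(which.max(hatvalues(fit)))

# Standardized change in the slope when point i is deleted
slope_dfbetas <- dfbetas(fit)[i, "x"]

# Unstandardized change in the slope, on the scale of the coefficient itself
slope_dfbeta_raw <- dfbeta(fit)[i, "x"]

slope_dfbetas
```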
Consider a regression relationship between Y and X with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X with and without adjustment for Z?
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
z <- c(0.5, 0.4, 0.6, 0.7, 0.2)
fit <- lm(y ~ x)
fit_adj <- lm(y ~ x + z)
summary(fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## 1 2 3 4 5
## 0.58021 0.05933 -0.01487 -0.56516 -0.05951
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1067 0.2354 -0.453 0.6811
## x 0.1289 0.0448 2.877 0.0637 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4702 on 3 degrees of freedom
## Multiple R-squared: 0.7339, Adjusted R-squared: 0.6452
## F-statistic: 8.275 on 1 and 3 DF, p-value: 0.06371
summary(fit_adj)
##
## Call:
## lm(formula = y ~ x + z)
##
## Residuals:
## 1 2 3 4 5
## 0.49345 -0.30595 0.09667 -0.25103 -0.03314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.25676 1.24949 1.006 0.420
## x 0.05232 0.08137 0.643 0.586
## z -2.46372 2.22023 -1.110 0.383
##
## Residual standard error: 0.4531 on 2 degrees of freedom
## Multiple R-squared: 0.8353, Adjusted R-squared: 0.6706
## F-statistic: 5.072 on 2 and 2 DF, p-value: 0.1647
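In this example the coefficient on x drops from 0.1289 to 0.05232 once z enters the model, which illustrates that adjusting for a third variable can change the estimated relationship between Y and X considerably. With other data the change could go in either direction. A sketch that extracts the two slopes directly, using the same illustrative x, y and z values as above; `x_coef` is a hypothetical name:

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
z <- c(0.5, 0.4, 0.6, 0.7, 0.2)

fit <- lm(y ~ x)          # unadjusted
fit_adj <- lm(y ~ x + z)  # adjusted for z

# Slope on x with and without z in the model
x_coef <- c(
  unadjusted = unname(coef(fit)["x"]),
  adjusted = unname(coef(fit_adj)["x"])
)
x_coef
```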