Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)
fit$coef[4]
## factor(cyl)8
## -6.07086
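Indexing by position (fit$coef[4]) depends on the order of the terms in the formula. As a small sketch, the same estimate can be pulled out by coefficient name, together with a 95% confidence interval. The object names fit_by_name, est_8_vs_4 and ci_8_vs_4 are illustrative and not part of the original answer.

```r
# Same adjusted model as above; fit_by_name is an illustrative name
fit_by_name <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Extract the 8-versus-4 cylinder contrast by coefficient name
# (4 cylinders is the reference level of factor(cyl))
est_8_vs_4 <- coef(fit_by_name)["factor(cyl)8"]
est_8_vs_4

# 95% confidence interval for the same coefficient
ci_8_vs_4 <- confint(fit_by_name, "factor(cyl)8", level = 0.95)
ci_8_vs_4
```

Selecting by name keeps the code correct even if the terms in the formula are reordered.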
library(ggplot2)
# Refer to columns by bare name inside aes(); they are looked up in the data argument
g = ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl)))
g = g + geom_point(size = 6, colour = "black") + geom_point(size = 4)
fit = lm(mpg ~ wt * factor(cyl), data = mtcars)
g1 = g
g1 +
geom_abline(intercept = coef(fit)[1], slope = coef(fit)[2], size = 2) +
geom_abline(intercept = coef(fit)[1] + coef(fit)[3], slope = coef(fit)[2] + coef(fit)[5], size = 2) +
geom_abline(intercept = coef(fit)[1] + coef(fit)[4], slope = coef(fit)[2] + coef(fit)[6], size = 2) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))+
ggtitle("Estimate for the expected change in mpg comparing 8, 6, 4 cylinders") +
labs(x = "Weight", y = "MPG") +
# To show that the built-in function can also be used directly, I've overlaid geom_smooth() on the trend lines
# that I built manually from the coefficients returned by lm()
geom_smooth(method = "lm")
Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg for the models adjusted and unadjusted for weight. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?
fitAdj <- lm(mpg ~ wt + factor(cyl), data = mtcars)
fitUnadj <- lm(mpg ~ factor(cyl), data = mtcars)
fitAdj$coef[4]
## factor(cyl)8
## -6.07086
fitUnadj$coef[3]
## factor(cyl)8
## -11.56364
c(fitAdj$coef[4],fitUnadj$coef[3])
## factor(cyl)8 factor(cyl)8
## -6.07086 -11.56364
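# Holding weight constant, the number of cylinders appears to have less of an impact on mpg than when weight is disregarded:
# the estimated 8-versus-4 difference shrinks from about -11.6 mpg to about -6.1 mpg.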
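One way to see why the estimate shrinks, sketched here as an illustration rather than as part of the original answer: cars with more cylinders are also heavier on average, and heavier cars get fewer miles per gallon, so the unadjusted comparison mixes the cylinder difference with the weight difference. The names mean_wt_by_cyl and wt_slope are illustrative.

```r
# Average weight (in units of 1000 lb) within each cylinder group
mean_wt_by_cyl <- tapply(mtcars$wt, mtcars$cyl, mean)
mean_wt_by_cyl

# Weight slope in the adjusted model: change in mpg per 1000 lb,
# holding the number of cylinders fixed
wt_slope <- coef(lm(mpg ~ wt + factor(cyl), data = mtcars))["wt"]
wt_slope
```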
Consider the mtcars data set. Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder. Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.
library(lmtest)
# Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder.
fitConfounder <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
# Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight.
fitInteraction <- lm(mpg ~ factor(cyl) * wt, data = mtcars)
lrtest(fitConfounder, fitInteraction)
## Likelihood ratio test
##
## Model 1: mpg ~ factor(cyl) + wt
## Model 2: mpg ~ factor(cyl) * wt
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 5 -73.311
## 2 7 -70.741 2 5.1412 0.07649 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
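# The P-value (0.07649) is larger than 0.05. So, according to our criterion, we fail to reject the null hypothesis that the
# interaction coefficients are zero, which suggests that the interaction terms may not be necessary and the model without interaction is preferred.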
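As a cross-check that does not need the lmtest package, the likelihood ratio statistic can be computed by hand from the two log-likelihoods and compared with a chi-squared distribution. This is a minimal sketch; the names fit_additive, fit_interact, lr_stat, df_diff and p_value are illustrative.

```r
# Refit the two nested models (illustrative object names)
fit_additive <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
fit_interact <- lm(mpg ~ factor(cyl) * wt, data = mtcars)

# Likelihood ratio statistic: twice the gain in log-likelihood
lr_stat <- 2 * (as.numeric(logLik(fit_interact)) - as.numeric(logLik(fit_additive)))

# Degrees of freedom: difference in the number of estimated parameters
df_diff <- attr(logLik(fit_interact), "df") - attr(logLik(fit_additive), "df")

# Upper-tail chi-squared probability gives the P-value
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
c(statistic = lr_stat, df = df_diff, p.value = p_value)
```

The statistic, degrees of freedom and P-value should agree with the Chisq, Df and Pr(>Chisq) columns printed by lrtest().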
Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as
lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
How is the wt coefficient interpreted?
# The estimated expected change in MPG per one ton increase in weight for a specific number of cylinders (4, 6, 8).
# wt is recorded in units of 1000 lb, so wt * 0.5 is the weight in (short) tons of 2000 lb.
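The effect of the rescaling can be checked numerically. In this sketch (object names are illustrative), halving the predictor doubles its coefficient, while the cylinder coefficients stay the same.

```r
# Weight in 1000 lb versus weight in tons (wt * 0.5)
fit_1000lb <- lm(mpg ~ wt + factor(cyl), data = mtcars)
fit_ton <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)

# The weight term is the second coefficient in both models
slope_1000lb <- unname(coef(fit_1000lb)[2])
slope_ton <- unname(coef(fit_ton)[2])
c(per_1000lb = slope_1000lb, per_ton = slope_ton)

# Cylinder coefficients are unaffected by the change of units
cyl_diff <- unname(coef(fit_ton)[3:4] - coef(fit_1000lb)[3:4])
cyl_diff
```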
Consider the following data set. Give the hat diagonal for the most influential point.
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)
max(influence(fit)$hat)
## [1] 0.9945734
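For simple linear regression the hat diagonal has a closed form, h_i = 1/n + (x_i - mean(x))^2 / sum((x - mean(x))^2), which makes it clear that the value depends only on how far x_i lies from the other x values. The sketch below recomputes it by hand and compares it with hatvalues(); the names x_hat, y_hat, hat_manual and hat_builtin are illustrative.

```r
# Same data as above, under illustrative names
x_hat <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y_hat <- c(0.549, -0.026, -0.127, -0.751, 1.344)

# Closed-form hat diagonal for a model with an intercept and one predictor
n <- length(x_hat)
hat_manual <- 1 / n + (x_hat - mean(x_hat))^2 / sum((x_hat - mean(x_hat))^2)

# Built-in equivalent
hat_builtin <- hatvalues(lm(y_hat ~ x_hat))

# Index of the point with the largest leverage, and its hat value
which.max(hat_manual)
max(hat_manual)
```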
Consider the following data set. Give the slope dfbeta for the point with the highest hat value.
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
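fit <- lm(y ~ x)
# dfbetas() reports the standardized change in each coefficient when a point is deleted;
# point 5 has the highest hat value, and its slope value is about -134
dfbetas(fit)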
## (Intercept) x
## 1 1.06212391 -0.37811633
## 2 0.06748037 -0.02861769
## 3 -0.01735756 0.00791512
## 4 -1.24958248 0.67253246
## 5 0.20432010 -133.82261293
# or, to see all influence measures at once
influence.measures(fit)
## Influence measures of
## lm(formula = y ~ x) :
##
## dfb.1_ dfb.x dffit cov.r cook.d hat inf
## 1 1.0621 -3.78e-01 1.0679 0.341 2.93e-01 0.229 *
## 2 0.0675 -2.86e-02 0.0675 2.934 3.39e-03 0.244
## 3 -0.0174 7.92e-03 -0.0174 3.007 2.26e-04 0.253 *
## 4 -1.2496 6.73e-01 -1.2557 0.342 3.91e-01 0.280 *
## 5 0.2043 -1.34e+02 -149.7204 0.107 2.70e+02 0.995 *
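The unstandardized version, dfbeta(), has a direct interpretation: it is the coefficient from the full fit minus the coefficient obtained after deleting the point. The sketch below checks this by refitting without the highest-leverage point; all object names are illustrative.

```r
# Same data as above, under illustrative names
x_inf <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y_inf <- c(0.549, -0.026, -0.127, -0.751, 1.344)

fit_all <- lm(y_inf ~ x_inf)

# Point with the highest hat value
i_max <- which.max(hatvalues(fit_all))

# Refit with that point left out
fit_drop <- lm(y_inf[-i_max] ~ x_inf[-i_max])

# Change in the slope: full-data estimate minus leave-one-out estimate
slope_change <- unname(coef(fit_all)[2] - coef(fit_drop)[2])
slope_change

# The same quantity from the built-in function
slope_dfbeta <- unname(dfbeta(fit_all)[i_max, 2])
slope_dfbeta
```

dfbetas() divides this change by an estimate of the coefficient's standard error computed with the point removed, which is why its value for point 5 is so much larger in magnitude.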
Consider a regression relationship between Y and X with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X with and without adjustment for Z?
#It is possible for the coefficient to reverse sign after adjustment.
#For example, it can be strongly significant and positive before adjustment and strongly significant and negative after adjustment.
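A small simulation can illustrate the sign reversal. The data below are made up for this sketch (they are not from the quiz): X increases with Z, and Y decreases with X once Z is held fixed but increases strongly with Z.

```r
set.seed(1)

# Hypothetical data: z_sim shifts both the predictor and the outcome
z_sim <- rep(0:2, each = 100)
u_sim <- rep(seq(-0.5, 0.5, length.out = 100), times = 3)
x_sim <- 2 * z_sim + u_sim
y_sim <- -x_sim + 5 * z_sim + rnorm(length(x_sim), sd = 0.5)

# Coefficient table row for x_sim without and with adjustment for z_sim
coef_unadj <- coef(summary(lm(y_sim ~ x_sim)))["x_sim", ]
coef_adj <- coef(summary(lm(y_sim ~ x_sim + z_sim)))["x_sim", ]
rbind(unadjusted = coef_unadj, adjusted = coef_adj)
```

In this construction the unadjusted slope is positive and the adjusted slope is negative, and both are strongly significant.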