Question 1

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.

fit<-lm(mpg ~ wt + factor(cyl), data = mtcars)
fit$coef[4]
## factor(cyl)8 
##     -6.07086
library(ggplot2)
g = ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl)))
g = g + geom_point(size = 6, colour = "black") + geom_point(size = 4)
fit = lm(mpg ~ wt * factor(cyl), data = mtcars)
g1 = g
g1 + 
geom_abline(intercept = coef(fit)[1], slope = coef(fit)[2], size = 2) + 
geom_abline(intercept = coef(fit)[1] + coef(fit)[3], slope = coef(fit)[2] + coef(fit)[5], size = 2) +
geom_abline(intercept = coef(fit)[1] + coef(fit)[4], slope = coef(fit)[2] + coef(fit)[6], size = 2) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))+
ggtitle("Estimate for the expected change in mpg comparing 8, 6, and 4 cylinders")+
labs(x = "Weight", y ="MPG") +
#Just to show that we can also use the R built-in function directly, I've overlaid geom_smooth on the trend lines
#that I built manually from the coefficients given by the lm function
 geom_smooth(method = "lm")

Question 2

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg for the adjusted and unadjusted by weight models. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?

fitAdj  <-lm(mpg ~ wt + factor(cyl) , data=mtcars)
fitUnadj<-lm(mpg ~ factor(cyl), data=mtcars)

fitAdj$coef[4]
## factor(cyl)8 
##     -6.07086
fitUnadj$coef[3]
## factor(cyl)8 
##    -11.56364
c(fitAdj$coef[4],fitUnadj$coef[3])
## factor(cyl)8 factor(cyl)8 
##     -6.07086    -11.56364
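#Holding weight constant, the estimated 8-versus-4 cylinder difference in mpg (-6.07) is roughly half the unadjusted difference (-11.56), so cylinder count appears to have less of an impact on mpg than if weight is disregarded.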
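
The comparison above can also be made by coefficient name rather than by position, which is less fragile if the formula changes. The following is a minimal sketch; the object name `effects` is introduced here for illustration only, and the confidence intervals are an optional extra not asked for by the question.

```r
# Refit both models so this block is self-contained.
fitAdj   <- lm(mpg ~ wt + factor(cyl), data = mtcars)
fitUnadj <- lm(mpg ~ factor(cyl), data = mtcars)

# Pull the 8-vs-4 cylinder coefficient by name from each model.
effects <- c(adjusted   = unname(coef(fitAdj)["factor(cyl)8"]),
             unadjusted = unname(coef(fitUnadj)["factor(cyl)8"]))
effects

# 95% confidence intervals for the same coefficient in each model.
confint(fitAdj)["factor(cyl)8", ]
confint(fitUnadj)["factor(cyl)8", ]
```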

Question 3

Consider the mtcars data set. Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder. Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.

library(lmtest)
#Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder.
fitConfounder <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
#Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight.
fitInteraction <- lm(mpg ~ factor(cyl) * wt, data = mtcars)
lrtest(fitConfounder, fitInteraction)
## Likelihood ratio test
## 
## Model 1: mpg ~ factor(cyl) + wt
## Model 2: mpg ~ factor(cyl) * wt
##   #Df  LogLik Df  Chisq Pr(>Chisq)  
## 1   5 -73.311                       
## 2   7 -70.741  2 5.1412    0.07649 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#The P-value (0.07649) is larger than 0.05. So, according to our criterion, we would fail to reject, which suggests that the interaction terms may not be necessary and the simpler model without them is preferred.
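
To make the test statistic less of a black box, the likelihood ratio test can be reproduced by hand from the two log-likelihoods using base R only. This is a sketch under the usual assumption that twice the log-likelihood difference is approximately chi-squared with degrees of freedom equal to the number of extra parameters; the names `lr_stat`, `lr_df` and `p_value` are introduced here for illustration.

```r
# Refit both models so this block is self-contained.
fitConfounder  <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
fitInteraction <- lm(mpg ~ factor(cyl) * wt, data = mtcars)

ll0 <- logLik(fitConfounder)    # smaller (null) model
ll1 <- logLik(fitInteraction)   # larger model with interaction terms

# Test statistic: twice the gain in log-likelihood.
lr_stat <- 2 * (as.numeric(ll1) - as.numeric(ll0))
# Degrees of freedom: difference in the number of estimated parameters.
lr_df <- attr(ll1, "df") - attr(ll0, "df")
# Upper-tail chi-squared probability.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p.value = p_value)
```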

Question 4

Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as

lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)

How is the wt coefficient interpreted?

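#The estimated expected change in MPG per one ton increase in weight for a specific number of cylinders (4, 6, 8).
#In mtcars, wt is recorded in units of 1000 lbs, so wt * 0.5 is weight in units of 2000 lbs, i.e. one (short) ton.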
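
A quick way to check this interpretation is to compare the rescaled fit with the ordinary one: halving the predictor should exactly double its coefficient. The sketch below assumes nothing beyond base R and mtcars; `fitLb` and `fitTon` are names made up for this illustration.

```r
# Weight in its original units (1000 lbs).
fitLb  <- lm(mpg ~ wt + factor(cyl), data = mtcars)
# Weight rescaled to units of 2000 lbs (one short ton).
fitTon <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)

# The weight term is the second coefficient in both models.
slope_lb  <- unname(coef(fitLb)[2])    # change in mpg per 1000 lbs
slope_ton <- unname(coef(fitTon)[2])   # change in mpg per 2000 lbs

# The per-ton slope is twice the per-1000-lbs slope.
all.equal(slope_ton, 2 * slope_lb)
```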

Question 5

Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the hat diagonal for the most influential point.

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit<-lm(y~x)
max(influence(fit)$hat)
## [1] 0.9945734
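
For simple linear regression the hat diagonals have a closed form, h_i = 1/n + (x_i - mean(x))^2 / sum((x - mean(x))^2), which shows why the fifth point, far from the other x values, dominates. The following sketch checks that formula against `hatvalues()`; `h_manual` is a name introduced here.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)

# Leverage from the closed-form expression for simple linear regression.
h_manual <- 1 / length(x) + (x - mean(x))^2 / sum((x - mean(x))^2)

# Agrees with the hat diagonals R computes from the fitted model.
all.equal(unname(hatvalues(fit)), h_manual)

# Index of the highest-leverage observation.
which.max(h_manual)
```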

Question 6

Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the slope dfbeta for the point with the highest hat value.

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit<-lm(y~x)
dfbetas(fit)
##   (Intercept)             x
## 1  1.06212391   -0.37811633
## 2  0.06748037   -0.02861769
## 3 -0.01735756    0.00791512
## 4 -1.24958248    0.67253246
## 5  0.20432010 -133.82261293
#or
influence.measures(fit)
## Influence measures of
##   lm(formula = y ~ x) :
## 
##    dfb.1_     dfb.x     dffit cov.r   cook.d   hat inf
## 1  1.0621 -3.78e-01    1.0679 0.341 2.93e-01 0.229   *
## 2  0.0675 -2.86e-02    0.0675 2.934 3.39e-03 0.244    
## 3 -0.0174  7.92e-03   -0.0174 3.007 2.26e-04 0.253   *
## 4 -1.2496  6.73e-01   -1.2557 0.342 3.91e-01 0.280   *
## 5  0.2043 -1.34e+02 -149.7204 0.107 2.70e+02 0.995   *
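
Rather than reading the answer off the table, the value can be extracted directly for the highest-leverage point. Note that `dfbetas()` (used above) reports a standardized change, while `dfbeta()` reports the raw change in the coefficient; the sketch below shows both and checks the raw version against an explicit leave-one-out refit. The names `i`, `slope_dfbetas` and `raw_change` are introduced here for illustration.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)

# Observation with the largest hat value.
i <- which.max(hatvalues(fit))

# Standardized change in the slope when observation i is dropped.
slope_dfbetas <- dfbetas(fit)[i, "x"]
slope_dfbetas

# Raw change: slope from all points minus slope refit without observation i.
raw_change <- unname(coef(fit)["x"]) - unname(coef(lm(y[-i] ~ x[-i]))[2])
all.equal(raw_change, unname(dfbeta(fit)[i, "x"]))
```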

Question 7

Consider a regression relationship between Y and X with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X with and without adjustment for Z?

#It is possible for the coefficient to reverse sign after adjustment. 
#For example, it can be strongly significant and positive before adjustment and strongly significant and negative after adjustment.
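
A small simulation illustrates the sign reversal. This is a made-up example, not taken from the quiz: the data-generating values (coefficients -1 and 3, noise sd 0.3, n = 200) and all variable names are assumptions chosen so that X and Z are strongly correlated while Z has a large positive effect on Y.

```r
set.seed(1)
n <- 200
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.3)            # x is strongly correlated with z
y <- -x + 3 * z + rnorm(n, sd = 0.3)   # conditional on z, the slope of x is -1

fit_unadj <- lm(y ~ x)        # ignores z
fit_adj   <- lm(y ~ x + z)    # adjusts for z

# Slope of x and its p-value in each model.
unadj <- summary(fit_unadj)$coefficients["x", c("Estimate", "Pr(>|t|)")]
adj   <- summary(fit_adj)$coefficients["x", c("Estimate", "Pr(>|t|)")]
rbind(unadjusted = unadj, adjusted = adj)
```

Under these assumptions the unadjusted slope is positive and the adjusted slope is negative, and both are clearly distinguishable from zero.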