Coursera Regression Models Quiz 3

Prepare the environment for the analysis

library(UsingR)

1. Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.

data(mtcars)
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

factor(mtcars$cyl)

##  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
## Levels: 4 6 8

# Since 4 is the first (reference) level of the factor, there is no need to relevel
fit <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
summary(fit)

## 
## Call:
## lm(formula = mpg ~ factor(cyl) + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5890 -1.2357 -0.5159  1.3845  5.7915 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   33.9908     1.8878  18.006  < 2e-16 ***
## factor(cyl)6  -4.2556     1.3861  -3.070 0.004718 ** 
## factor(cyl)8  -6.0709     1.6523  -3.674 0.000999 ***
## wt            -3.2056     0.7539  -4.252 0.000213 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.557 on 28 degrees of freedom
## Multiple R-squared:  0.8374, Adjusted R-squared:   0.82 
## F-statistic: 48.08 on 3 and 28 DF,  p-value: 3.594e-11

# So the factor(cyl)8 coefficient, -6.0709, is the weight-adjusted change in mpg for 8 versus 4 cylinders

Answer:

-6.071

33.991

-4.256

-3.206
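As a cross-check on the summary table, the coefficient can also be pulled out by name. This is a minimal sketch that refits the same model; the names fit_q1 and est_8_vs_4 are only illustrative.

```r
# Refit the question 1 model (mtcars ships with base R)
fit_q1 <- lm(mpg ~ factor(cyl) + wt, data = mtcars)

# With 4 cylinders as the reference level, this coefficient is the
# weight-adjusted difference in expected mpg, 8 versus 4 cylinders
est_8_vs_4 <- coef(fit_q1)["factor(cyl)8"]
round(est_8_vs_4, 3)
```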

2. Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg for the models adjusted and unadjusted by weight. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?

# Using the results from question 1, let's fit a model without adjusting for wt
fitnowt <- lm(mpg ~ factor(cyl), data = mtcars)
summary(fitnowt)

## 
## Call:
## lm(formula = mpg ~ factor(cyl), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2636 -1.8357  0.0286  1.3893  7.2364 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   26.6636     0.9718  27.437  < 2e-16 ***
## factor(cyl)6  -6.9208     1.5583  -4.441 0.000119 ***
## factor(cyl)8 -11.5636     1.2986  -8.905 8.57e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.223 on 29 degrees of freedom
## Multiple R-squared:  0.7325, Adjusted R-squared:  0.714 
## F-statistic:  39.7 on 2 and 29 DF,  p-value: 4.979e-09

# Here factor(cyl)8 is -11.5636, versus -6.0709 when adjusting for wt, so we can say that:

Answer:

** Including or excluding weight does not appear to change anything regarding the estimated impact of number of cylinders on mpg.

** Within a given weight, 8 cylinder vehicles have an expected 12 mpg drop in fuel efficiency.

** Holding weight constant, cylinder appears to have less of an impact on mpg than if weight is disregarded.

** Holding weight constant, cylinder appears to have more of an impact on mpg than if weight is disregarded.
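To see the two sets of estimates side by side, the cylinder coefficients from both models can be collected into one small table. A sketch only; fit_adj, fit_unadj and comparison are illustrative names, not objects from the analysis above.

```r
# Adjusted (with wt) and unadjusted (without wt) models
fit_adj   <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
fit_unadj <- lm(mpg ~ factor(cyl), data = mtcars)

# Keep only the cylinder terms, which exist in both models
cyl_terms <- c("factor(cyl)6", "factor(cyl)8")
comparison <- cbind(unadjusted = coef(fit_unadj)[cyl_terms],
                    adjusted   = coef(fit_adj)[cyl_terms])
round(comparison, 3)
```

Both cylinder coefficients are smaller in magnitude in the adjusted column, which is what the chosen answer describes.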

3. Consider the mtcars data set. Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder. Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.

# The model with cylinders as a factor and weight as a confounder is the one from question 1 (wt added to factor(cyl)), so we can reuse fit
# For the interaction between number of cylinders (as a factor) and weight, we cross factor(cyl) with wt in fit3
fit3 <- lm(mpg ~ factor(cyl) * wt, data = mtcars)
# Now we can use the analysis of variance table to look at the P-value
anova(fit, fit3)

## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(cyl) + wt
## Model 2: mpg ~ factor(cyl) * wt
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     28 183.06                           
## 2     26 155.89  2     27.17 2.2658 0.1239

#P-value is 0.1239, larger than 0.05.

Answer:

** The P-value is small (less than 0.05). So, according to our criterion, we reject, which suggests that the interaction term is necessary

** The P-value is small (less than 0.05). Thus it is surely true that there is no interaction term in the true model.

** The P-value is small (less than 0.05). Thus it is surely true that there is an interaction term in the true model.

** The P-value is small (less than 0.05). So, according to our criterion, we reject, which suggests that the interaction term is not necessary.

** The P-value is larger than 0.05. So, according to our criterion, we would fail to reject, which suggests that the interaction terms may not be necessary.

** The P-value is larger than 0.05. So, according to our criterion, we would fail to reject, which suggests that the interaction terms is necessary.
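The P-value can also be read off programmatically instead of from the printed table. The sketch below refits both models and, as an extra that is not part of the analysis above, computes an asymptotic chi-squared likelihood ratio test from the log-likelihoods for comparison with the F test that anova() reports. All object names are illustrative.

```r
fit_add <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
fit_int <- lm(mpg ~ factor(cyl) * wt, data = mtcars)

# P-value of the F test reported by anova()
p_f <- anova(fit_add, fit_int)[2, "Pr(>F)"]

# Asymptotic likelihood ratio test: twice the log-likelihood gain,
# compared with a chi-squared on the number of extra parameters
lr_stat  <- 2 * (as.numeric(logLik(fit_int)) - as.numeric(logLik(fit_add)))
extra_df <- attr(logLik(fit_int), "df") - attr(logLik(fit_add), "df")
p_lrt    <- pchisq(lr_stat, df = extra_df, lower.tail = FALSE)

c(F_test = p_f, LRT = p_lrt)
```

Both P-values are above 0.05 on this data, so either version leads to the same choice of the model without interaction terms.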

4. Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as

fit4 <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
summary(fit4)

## 
## Call:
## lm(formula = mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5890 -1.2357 -0.5159  1.3845  5.7915 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    33.991      1.888  18.006  < 2e-16 ***
## I(wt * 0.5)    -6.411      1.508  -4.252 0.000213 ***
## factor(cyl)6   -4.256      1.386  -3.070 0.004718 ** 
## factor(cyl)8   -6.071      1.652  -3.674 0.000999 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.557 on 28 degrees of freedom
## Multiple R-squared:  0.8374, Adjusted R-squared:   0.82 
## F-statistic: 48.08 on 3 and 28 DF,  p-value: 3.594e-11

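# Remember that wt is expressed in 1000 lbs and a ton is 2000 lbs, so wt * 0.5 is the weight in tons and its coefficient is the change in mpg per one-ton increase, holding the number of cylinders fixed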

How is the wt coefficient interpreted?

Answer:

** The estimated expected change in MPG per half ton increase in weight for the average number of cylinders.

** The estimated expected change in MPG per half ton increase in weight.

** The estimated expected change in MPG per half ton increase in weight for a specific number of cylinders (4, 6, 8).

** The estimated expected change in MPG per one ton increase in weight.

** The estimated expected change in MPG per one ton increase in weight for a specific number of cylinders (4, 6, 8).
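The effect of rescaling can be checked directly: halving the regressor should double its coefficient and leave the cylinder terms alone. A small sketch with illustrative names (fit_klb, fit_ton):

```r
fit_klb <- lm(mpg ~ wt + factor(cyl), data = mtcars)           # wt in 1000 lbs
fit_ton <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)  # wt in tons

b_klb <- unname(coef(fit_klb)["wt"])
b_ton <- unname(coef(fit_ton)["I(wt * 0.5)"])

# The per-ton coefficient should be twice the per-1000-lbs coefficient
c(per_1000_lbs = b_klb, per_ton = b_ton, ratio = b_ton / b_klb)
```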

5. Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the hat diagonal for the most influential point

fit5<-lm(y~x)
rstudent(fit5)

##            1            2            3            4            5 
##   1.96142171   0.11888841  -0.02986561  -2.01139691 -11.05925885

# The most influential point is the fifth: its studentized residual is by far the largest in magnitude
hatvalues(fit5)

##         1         2         3         4         5 
## 0.2286650 0.2438146 0.2525027 0.2804443 0.9945734

# So the fifth hat value is 0.9946

Answer:

** 0.2025

** 0.9946

** 0.2287

** 0.2804
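For reference, the hat values are the diagonal of the projection matrix H = X (X'X)^{-1} X'. The sketch below computes that diagonal by hand and compares it with hatvalues(); X, H, manual_hat and idx are illustrative names.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit_hat <- lm(y ~ x)

# Design matrix with an intercept column, then H = X (X'X)^{-1} X'
X <- cbind(1, x)
H <- X %*% solve(t(X) %*% X) %*% t(X)
manual_hat <- diag(H)

# Index of the largest leverage and its value
idx <- unname(which.max(hatvalues(fit_hat)))
round(manual_hat[idx], 4)
```

The fifth point has an x value far from the other four, which is why its leverage is close to 1.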

6. Consider the following data set

x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)

Give the slope dfbeta for the point with the highest hat value.

fit6<-lm(y~x)
round(hatvalues(fit6)[1:5], 3)

##     1     2     3     4     5 
## 0.229 0.244 0.253 0.280 0.995

# We can see that the highest hat value is the fifth
# dfbetas() returns the standardized dfbeta; column 2 is the slope
round(dfbetas(fit6)[1:5, 2], 3)

##        1        2        3        4        5 
##   -0.378   -0.029    0.008    0.673 -133.823

# So the slope dfbeta for the fifth point is -133.823, about -134

Answer:

** -0.378

** 0.673

** -.00134

** -134
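R offers both dfbeta() (change in the coefficient in its original units) and dfbetas() (the same change, standardized), and the -134 above comes from the standardized version. The sketch below shows both for the highest-leverage point and checks the raw value against an explicit leave-one-out refit. Names such as d, fit_all and fit_loo are illustrative.

```r
d <- data.frame(x = c(0.586, 0.166, -0.042, -0.614, 11.72),
                y = c(0.549, -0.026, -0.127, -0.751, 1.344))
fit_all <- lm(y ~ x, data = d)
i <- unname(which.max(hatvalues(fit_all)))

raw_dfbeta <- dfbeta(fit_all)[i, "x"]   # change in slope, original units
std_dfbeta <- dfbetas(fit_all)[i, "x"]  # same change, standardized

# Leave-one-out check: slope with all points minus slope without point i
fit_loo  <- lm(y ~ x, data = d[-i, ])
loo_diff <- unname(coef(fit_all)["x"] - coef(fit_loo)["x"])

c(raw = raw_dfbeta, leave_one_out = loo_diff, standardized = std_dfbeta)
```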

7. Consider a regression relationship between Y and X with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X with and without adjustment for Z.

Answer:

** The coefficient can’t change sign after adjustment, except for slight numerical pathological cases.

** For the coefficient to change sign, there must be a significant interaction term.

** Adjusting for another variable can only attenuate the coefficient toward zero. It can’t materially change sign.

** It is possible for the coefficient to reverse sign after adjustment. For example, it can be strongly significant and positive before adjustment and strongly significant and negative after adjustment.
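A sign reversal is easy to produce with simulated data. The example below is purely illustrative and not taken from the quiz: the seed, sample size and coefficients are arbitrary assumptions, chosen so that X is strongly correlated with Z while the effect of X on Y given Z is negative.

```r
set.seed(1)
n <- 1000
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.3)               # x is strongly correlated with z
y <- -1 * x + 3 * z + rnorm(n, sd = 0.5)  # effect of x given z is -1

b_unadj <- coef(lm(y ~ x))["x"]       # slope ignoring z
b_adj   <- coef(lm(y ~ x + z))["x"]   # slope adjusting for z

c(unadjusted = unname(b_unadj), adjusted = unname(b_adj))
```

Without Z in the model, X picks up the positive effect of Z and its slope comes out positive; once Z is included, the slope on X turns negative.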

cwerneck - Claudia Werneck

27 November 2017

1. Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.

-6.071

** Holding weight constant, cylinder appears to have less of an impact on mpg than if weight is disregarded.

** The P-value is larger than 0.05. So, according to our criterion, we would fail to reject, which suggests that the interaction terms may not be necessary.

4. Consider the mtcars data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as

How is the wt coefficient interpreted?

** The estimated expected change in MPG per one ton increase in weight for a specific number of cylinders (4, 6, 8).

5. Consider the following data set

Give the hat diagonal for the most influential point

** 0.9946

6. Consider the following data set

Give the slope dfbeta for the point with the highest hat value.

** -134

7. Consider a regression relationship between Y and X with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X with and without adjustment for Z.

** It is possible for the coefficient to reverse sign after adjustment. For example, it can be strongly significant and positive before adjustment and strongly significant and negative after adjustment.