library(UsingR)
data(mtcars)
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
factor(mtcars$cyl)
## [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
## Levels: 4 6 8
# Since 4 is the first level, 4 cylinders is already the reference category, so there is no need to relevel
fit <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ factor(cyl) + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5890 -1.2357 -0.5159 1.3845 5.7915
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.9908 1.8878 18.006 < 2e-16 ***
## factor(cyl)6 -4.2556 1.3861 -3.070 0.004718 **
## factor(cyl)8 -6.0709 1.6523 -3.674 0.000999 ***
## wt -3.2056 0.7539 -4.252 0.000213 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.557 on 28 degrees of freedom
## Multiple R-squared: 0.8374, Adjusted R-squared: 0.82
## F-statistic: 48.08 on 3 and 28 DF, p-value: 3.594e-11
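# So the adjusted estimate for 8 vs 4 cylinders (the factor(cyl)8 coefficient) is -6.0709
Answer:
-6.071
Other options:
33.991 (the intercept)
-4.256 (the 6 vs 4 cylinder estimate)
-3.206 (the wt slope)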
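As a cross-check (a sketch, not part of the original solution), the 8 vs 4 estimate can be read straight from `coef()` instead of the printed summary. The second half confirms that relevelling is unnecessary: refitting with 8 cylinders as the reference level gives the same contrast with the sign flipped. The data frame `d`, the column `cyl_f` and the other variable names are introduced here for illustration only.

```r
# Same model as question 1: cylinders as a factor (reference level 4), adjusted for weight
fit <- lm(mpg ~ factor(cyl) + wt, data = mtcars)

# Adjusted expected change in mpg, 8 vs 4 cylinders
est_8_vs_4 <- unname(coef(fit)["factor(cyl)8"])
round(est_8_vs_4, 3)  # -6.071

# Illustrative refit with 8 cylinders as the reference level instead
d <- mtcars
d$cyl_f <- relevel(factor(d$cyl), ref = "8")
fit_ref8 <- lm(mpg ~ cyl_f + wt, data = d)

# The 4 vs 8 contrast is the same estimate with the opposite sign
est_4_vs_8 <- unname(coef(fit_ref8)["cyl_f4"])
c(eight_vs_four = est_8_vs_4, four_vs_eight = est_4_vs_8)
```

Relevelling only reparametrizes the model, so fitted values and residuals are identical; only the labelling of the contrasts changes.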
# Using the results from question 1, fit the same model without adjusting for wt
fitnowt <- lm(mpg ~ factor(cyl), data = mtcars)
summary(fitnowt)
##
## Call:
## lm(formula = mpg ~ factor(cyl), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2636 -1.8357 0.0286 1.3893 7.2364
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.6636 0.9718 27.437 < 2e-16 ***
## factor(cyl)6 -6.9208 1.5583 -4.441 0.000119 ***
## factor(cyl)8 -11.5636 1.2986 -8.905 8.57e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.223 on 29 degrees of freedom
## Multiple R-squared: 0.7325, Adjusted R-squared: 0.714
## F-statistic: 39.7 on 2 and 29 DF, p-value: 4.979e-09
# Here the factor(cyl)8 coefficient is -11.5636, compared with -6.0709 after adjusting for weight, so:
Answer:
** Holding weight constant, cylinder appears to have less of an impact on mpg than if weight is disregarded.
Other options:
** Including or excluding weight does not appear to change anything regarding the estimated impact of number of cylinders on mpg.
** Within a given weight, 8 cylinder vehicles have an expected 12 mpg drop in fuel efficiency.
** Holding weight constant, cylinder appears to have more of an impact on mpg than if weight is disregarded.
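The comparison can also be made numerically. Without covariates, the factor(cyl)8 coefficient is simply the difference between the mean mpg of 8- and 4-cylinder cars; adjusting for weight shrinks it toward zero. A sketch with illustrative variable names:

```r
# Weight-adjusted and unadjusted models for mpg by number of cylinders
fit_adj   <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
fit_unadj <- lm(mpg ~ factor(cyl), data = mtcars)

adj   <- unname(coef(fit_adj)["factor(cyl)8"])
unadj <- unname(coef(fit_unadj)["factor(cyl)8"])

# With no covariates, a factor coefficient is a difference in group means
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
diff_means  <- as.numeric(group_means["8"] - group_means["4"])

c(adjusted = adj, unadjusted = unadj, mean_difference = diff_means)
```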
# The model with number of cylinders as a factor variable and weight as a confounder is the question 1 model (wt is added to factor(cyl)), so we can reuse fit
# To include the interaction between number of cylinders (as a factor variable) and weight, use factor(cyl) * wt, which expands to factor(cyl) + wt + factor(cyl):wt
fit3 <- lm(mpg ~ factor(cyl) * wt, data = mtcars)
# Now use the analysis of variance table (an F test comparing the two nested models) to look at the p-value
anova(fit, fit3)
## Analysis of Variance Table
##
## Model 1: mpg ~ factor(cyl) + wt
## Model 2: mpg ~ factor(cyl) * wt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 183.06
## 2 26 155.89 2 27.17 2.2658 0.1239
# The p-value is 0.1239, larger than 0.05.
Answer:
** The P-value is larger than 0.05. So, according to our criterion, we would fail to reject, which suggests that the interaction terms may not be necessary.
Other options:
** The P-value is small (less than 0.05). So, according to our criterion, we reject, which suggests that the interaction term is necessary.
** The P-value is small (less than 0.05). Thus it is surely true that there is no interaction term in the true model.
** The P-value is small (less than 0.05). Thus it is surely true that there is an interaction term in the true model.
** The P-value is small (less than 0.05). So, according to our criterion, we reject, which suggests that the interaction term is not necessary.
** The P-value is larger than 0.05. So, according to our criterion, we would fail to reject, which suggests that the interaction terms are necessary.
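The `anova()` comparison is a partial F test for nested models. A minimal sketch that recomputes it from the residual sums of squares, assuming the same two formulas (names such as `rss0` and `f_stat` are illustrative):

```r
fit  <- lm(mpg ~ factor(cyl) + wt, data = mtcars)  # no interaction
fit3 <- lm(mpg ~ factor(cyl) * wt, data = mtcars)  # with cylinder-by-weight interaction

rss0 <- deviance(fit)      # residual sum of squares, smaller model
rss1 <- deviance(fit3)     # residual sum of squares, larger model
df0  <- df.residual(fit)   # 28 residual degrees of freedom
df1  <- df.residual(fit3)  # 26 residual degrees of freedom

# F statistic for the (df0 - df1) extra interaction coefficients
f_stat  <- ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)
p_value <- pf(f_stat, df0 - df1, df1, lower.tail = FALSE)

# Should agree with the F and Pr(>F) columns of anova(fit, fit3)
round(c(F = f_stat, p = p_value), 4)
```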
fit4 <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
summary(fit4)
##
## Call:
## lm(formula = mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5890 -1.2357 -0.5159 1.3845 5.7915
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.991 1.888 18.006 < 2e-16 ***
## I(wt * 0.5) -6.411 1.508 -4.252 0.000213 ***
## factor(cyl)6 -4.256 1.386 -3.070 0.004718 **
## factor(cyl)8 -6.071 1.652 -3.674 0.000999 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.557 on 28 degrees of freedom
## Multiple R-squared: 0.8374, Adjusted R-squared: 0.82
## F-statistic: 48.08 on 3 and 28 DF, p-value: 3.594e-11
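# Remember that wt is expressed in 1000 lbs and a ton is 2000 lbs, so wt * 0.5 is the weight in tons: its coefficient (-6.411) is twice the wt coefficient from question 1 (-3.2056) and is estimated with the cylinder factor held fixed
Answer:
** The estimated expected change in MPG per one ton increase in weight for a specific number of cylinders (4, 6, 8).
Other options:
** The estimated expected change in MPG per half ton increase in weight for the average number of cylinders.
** The estimated expected change in MPG per half ton increase in weight.
** The estimated expected change in MPG per half ton increase in weight for a specific number of cylinders (4, 6, 8).
** The estimated expected change in MPG per one ton increase in weight.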
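To see why the coefficient is per ton, note that multiplying a predictor by 0.5 multiplies its coefficient by 2 and leaves the other coefficients unchanged. A small check, with illustrative variable names:

```r
fit  <- lm(mpg ~ factor(cyl) + wt, data = mtcars)           # wt in 1000 lbs
fit4 <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)  # wt * 0.5 is weight in tons

slope_per_1000lb <- unname(coef(fit)["wt"])
slope_per_ton    <- unname(coef(fit4)["I(wt * 0.5)"])

# Halving the predictor doubles its slope; the cylinder terms do not change
c(per_1000_lb = slope_per_1000lb,
  per_ton     = slope_per_ton,
  ratio       = slope_per_ton / slope_per_1000lb)
```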
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit5 <- lm(y ~ x)
rstudent(fit5)
## 1 2 3 4 5
## 1.96142171 0.11888841 -0.02986561 -2.01139691 -11.05925885
# The fifth point stands out (studentized residual of about -11.06) and is the most influential point
hatvalues(fit5)
## 1 2 3 4 5
## 0.2286650 0.2438146 0.2525027 0.2804443 0.9945734
# So the hat value of the fifth (most influential) point is 0.9946
Answer:
** 0.9946
Other options:
** 0.2025
** 0.2287
** 0.2804
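For a simple linear regression the hat values have a closed form, h_i = 1/n + (x_i - mean(x))^2 / sum((x - mean(x))^2), which shows why the point at x = 11.72, far from the other four, has leverage close to 1. A sketch comparing that formula with `hatvalues()` (the name `h_manual` is illustrative):

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit5 <- lm(y ~ x)

# Closed-form leverage for a model with an intercept and one predictor
n <- length(x)
h_manual <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)

round(h_manual, 4)
which.max(h_manual)  # 5
```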
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit6 <- lm(y ~ x)
round(hatvalues(fit6), 3)
## 1 2 3 4 5
## 0.229 0.244 0.253 0.280 0.995
# The highest hat value belongs to the fifth point
round(dfbetas(fit6)[1:5,2],3)
## 1 2 3 4 5
## -0.378 -0.029 0.008 0.673 -133.823
# So the slope dfbetas of the fifth point is -133.823, i.e. about -134
Answer:
** -134
Other options:
** -0.378
** 0.673
** -0.00134
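A slope dfbetas value measures how far the slope moves when one observation is dropped, scaled by a standard error that uses the residual standard error computed without that observation. The sketch below recomputes it by refitting without each point in turn; it assumes the scaling used by R's `dfbetas()` for `lm` fits (the full-data (X'X)^-1 diagonal and the leave-one-out residual standard error), and the names `c_slope` and `dfbetas_manual` are illustrative.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit6 <- lm(y ~ x)

# Diagonal element of (X'X)^-1 for the slope, from the full design matrix
X <- model.matrix(fit6)
c_slope <- solve(crossprod(X))[2, 2]

# dfbetas for the slope: (b - b_(i)) / (s_(i) * sqrt(c_slope)),
# where b_(i) and s_(i) come from refitting without observation i
dfbetas_manual <- sapply(seq_along(x), function(i) {
  fit_i <- lm(y[-i] ~ x[-i])
  (coef(fit6)[[2]] - coef(fit_i)[[2]]) / (summary(fit_i)$sigma * sqrt(c_slope))
})

round(dfbetas_manual, 3)
```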
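Answer:
** It is possible for the coefficient to reverse sign after adjustment. For example, it can be strongly significant and positive before adjustment and strongly significant and negative after adjustment.
Other options:
** The coefficient can’t change sign after adjustment, except for slight numerical pathological cases.
** For the coefficient to change sign, there must be a significant interaction term.
** Adjusting for another variable can only attenuate the coefficient toward zero. It can’t materially change sign.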
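A sign reversal is easy to produce when the third variable is associated with both X and Y (Simpson's paradox). The sketch uses hypothetical, noise-free toy data invented for illustration, so only the signs of the two slopes matter here, not their significance:

```r
# Hypothetical toy data: within each level of z, y falls by 1 per unit of x,
# but the z = 1 group has both larger x values and a higher baseline y
z <- rep(c(0, 1), each = 5)
x <- c(1:5, 6:10)
y <- -x + 10 * z

b_unadjusted <- unname(coef(lm(y ~ x))["x"])      # positive (about 0.515)
b_adjusted   <- unname(coef(lm(y ~ x + z))["x"])  # -1

c(unadjusted = b_unadjusted, adjusted = b_adjusted)
```

Because z is correlated with x, leaving it out lets the between-group difference masquerade as a positive x effect; adjusting for z recovers the negative within-group slope.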