We need to convert the cylinder column (cyl) to a factor, so that 4, 6 and 8 cylinders are treated as categories rather than as a numeric predictor.
data("mtcars")
fit1 <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
coefficients(fit1)
## (Intercept) factor(cyl)6 factor(cyl)8 wt
## 33.990794 -4.255582 -6.070860 -3.205613
The adjusted estimate for the expected change in mpg comparing 8 cylinders to 4, holding weight constant, is -6.0708597.
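Under R's default treatment contrasts, the lowest factor level (4 cylinders) is the reference. That is why the factor(cyl)8 coefficient is the 8-versus-4 comparison. The sketch below checks the level order, pulls the coefficient out by name, and adds a 95% confidence interval with confint(). The interval is not part of the original answer and is included only as an illustration.

```r
data("mtcars")
fit1 <- lm(mpg ~ factor(cyl) + wt, data = mtcars)

# Factor levels are sorted, so "4" is the reference level under treatment contrasts
cyl_levels <- levels(factor(mtcars$cyl))
print(cyl_levels)

# Adjusted 8-vs-4 difference in expected mpg, holding weight fixed
est_8_vs_4 <- coef(fit1)[["factor(cyl)8"]]
print(est_8_vs_4)

# 95% confidence interval for the same coefficient
print(confint(fit1)["factor(cyl)8", ])
```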
data("mtcars")
fit2 <- lm(mpg ~ factor(cyl), data = mtcars)
coefficients(fit2)
## (Intercept) factor(cyl)6 factor(cyl)8
## 26.663636 -6.920779 -11.563636
The adjusted estimate from the first question is -6.0708597. It comes from the fitted coefficients (Intercept) 33.990794, factor(cyl)6 -4.2555824, factor(cyl)8 -6.0708597 and wt -3.2056133.
The unadjusted estimate for the expected change in mpg comparing 8 cylinders to 4 is -11.5636364. It comes from the fitted coefficients (Intercept) 26.6636364, factor(cyl)6 -6.9207792 and factor(cyl)8 -11.5636364.
Holding weight constant, the number of cylinders appears to have less of an impact on mpg than when weight is ignored: -6.07 versus -11.56 for 8 cylinders compared to 4.
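The gap between the two estimates can be traced to weight. The intercept and cylinder dummies give each cylinder group its own intercept, so the least-squares residuals sum to zero within each group. It follows that the unadjusted difference in group means equals the adjusted difference plus the wt slope times the difference in mean weight between 8- and 4-cylinder cars.

The sketch below checks that identity numerically. The variable names are illustrative only.

```r
data("mtcars")
fit1 <- lm(mpg ~ factor(cyl) + wt, data = mtcars)  # adjusted for weight
fit2 <- lm(mpg ~ factor(cyl), data = mtcars)       # unadjusted

adjusted   <- coef(fit1)[["factor(cyl)8"]]
unadjusted <- coef(fit2)[["factor(cyl)8"]]

# Mean weight (in 1000 lb) per cylinder group
print(tapply(mtcars$wt, mtcars$cyl, mean))

# Difference in mean weight between 8- and 4-cylinder cars
wt_diff <- mean(mtcars$wt[mtcars$cyl == 8]) - mean(mtcars$wt[mtcars$cyl == 4])

# Residuals sum to zero within each group, hence:
# unadjusted = adjusted + slope_wt * (mean wt of 8-cyl - mean wt of 4-cyl)
reconstructed <- adjusted + coef(fit1)[["wt"]] * wt_diff
print(c(unadjusted = unadjusted, reconstructed = reconstructed))
```

Because 8-cylinder cars are heavier on average and the wt slope is negative, part of the unadjusted cylinder effect is really a weight effect.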
data("mtcars")
fit3 <- lm(mpg ~ factor(cyl)*wt, data = mtcars)
coefficients(fit3)
## (Intercept) factor(cyl)6 factor(cyl)8 wt
## 39.571196 -11.162351 -15.703167 -5.647025
## factor(cyl)6:wt factor(cyl)8:wt
## 2.866919 3.454587
summary(fit1)
##
## Call:
## lm(formula = mpg ~ factor(cyl) + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5890 -1.2357 -0.5159 1.3845 5.7915
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.9908 1.8878 18.006 < 2e-16 ***
## factor(cyl)6 -4.2556 1.3861 -3.070 0.004718 **
## factor(cyl)8 -6.0709 1.6523 -3.674 0.000999 ***
## wt -3.2056 0.7539 -4.252 0.000213 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.557 on 28 degrees of freedom
## Multiple R-squared: 0.8374, Adjusted R-squared: 0.82
## F-statistic: 48.08 on 3 and 28 DF, p-value: 3.594e-11
summary(fit3)
##
## Call:
## lm(formula = mpg ~ factor(cyl) * wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1513 -1.3798 -0.6389 1.4938 5.2523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.571 3.194 12.389 2.06e-12 ***
## factor(cyl)6 -11.162 9.355 -1.193 0.243584
## factor(cyl)8 -15.703 4.839 -3.245 0.003223 **
## wt -5.647 1.359 -4.154 0.000313 ***
## factor(cyl)6:wt 2.867 3.117 0.920 0.366199
## factor(cyl)8:wt 3.455 1.627 2.123 0.043440 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.449 on 26 degrees of freedom
## Multiple R-squared: 0.8616, Adjusted R-squared: 0.8349
## F-statistic: 32.36 on 5 and 26 DF, p-value: 2.258e-10
anova(fit1, fit3)
## Analysis of Variance Table
##
## Model 1: mpg ~ factor(cyl) + wt
## Model 2: mpg ~ factor(cyl) * wt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 183.06
## 2 26 155.89 2 27.17 2.2658 0.1239
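The anova F test comparing the two models gives a p-value of 0.1239, which is higher than 0.05. So we fail to reject the null hypothesis that the interaction terms are unnecessary: the simpler model without the interaction is adequate.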
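The F statistic in the anova table can be reproduced from the residual sums of squares of the two nested models. The sketch below uses only base R (deviance, df.residual and pf) to show where the 0.1239 p-value comes from.

```r
data("mtcars")
fit1 <- lm(mpg ~ factor(cyl) + wt, data = mtcars)  # without interaction
fit3 <- lm(mpg ~ factor(cyl) * wt, data = mtcars)  # with interaction

rss_small <- deviance(fit1)   # residual sum of squares of the smaller model
rss_big   <- deviance(fit3)   # residual sum of squares of the larger model
df_extra  <- df.residual(fit1) - df.residual(fit3)  # number of interaction terms
df_big    <- df.residual(fit3)

# F = (drop in RSS per extra parameter) / (residual mean square of the larger model)
F_stat  <- ((rss_small - rss_big) / df_extra) / (rss_big / df_big)
p_value <- pf(F_stat, df_extra, df_big, lower.tail = FALSE)
print(c(F = F_stat, p = p_value))
```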
This is the estimated expected change in mpg per half-ton (1000 lb, one unit of wt) increase in weight for a specific number of cylinders (4, 6 or 8).
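mtcars records wt in units of 1000 lb, which is half a short ton. As a hedged illustration, rescaling weight to tons with a hypothetical variable wt_tons (equivalent to I(wt * 0.5) in a formula) doubles the slope. The rescaled slope is then the expected change per one-ton increase, while the cylinder coefficients stay the same.

```r
data("mtcars")
fit1 <- lm(mpg ~ factor(cyl) + wt, data = mtcars)

# Hypothetical rescaled predictor: weight in short tons (2000 lb)
mtcars_tons <- transform(mtcars, wt_tons = wt / 2)
fit_tons <- lm(mpg ~ factor(cyl) + wt_tons, data = mtcars_tons)

# Slope per half ton (one unit of wt) versus slope per ton
print(c(per_half_ton = coef(fit1)[["wt"]], per_ton = coef(fit_tons)[["wt_tons"]]))
```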
The point with the most leverage is x = 11.72, which lies far from the other x values (all between -0.614 and 0.586). We compute the hat value of that point.
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit5 <- lm(y~x)
influence(fit5)$hat
## 1 2 3 4 5
## 0.2286650 0.2438146 0.2525027 0.2804443 0.9945734
The hat value of the fifth point (x = 11.72) is 0.9946, about 0.995.
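For simple linear regression, the hat value has a closed form: h_i = 1/n + (x_i - mean(x))^2 / sum((x - mean(x))^2). The sketch below computes it by hand for the same five points and compares it with hatvalues(). It shows that the large leverage of x = 11.72 comes entirely from its distance to the mean of x.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit5 <- lm(y ~ x)

# Leverage in simple linear regression: 1/n + (x_i - mean(x))^2 / Sxx
n   <- length(x)
sxx <- sum((x - mean(x))^2)
h_manual <- 1 / n + (x - mean(x))^2 / sxx
print(round(h_manual, 4))

# The leverages always sum to the number of coefficients (here 2)
print(sum(h_manual))
```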
The slope dfbeta is the second column (dfb.x) of the influence.measures() output. We need its value for the point with the highest hat value.
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit6 <- lm(y ~ x)
influence.measures(fit6)
## Influence measures of
## lm(formula = y ~ x) :
##
## dfb.1_ dfb.x dffit cov.r cook.d hat inf
## 1 1.0621 -3.78e-01 1.0679 0.341 2.93e-01 0.229 *
## 2 0.0675 -2.86e-02 0.0675 2.934 3.39e-03 0.244
## 3 -0.0174 7.92e-03 -0.0174 3.007 2.26e-04 0.253 *
## 4 -1.2496 6.73e-01 -1.2557 0.342 3.91e-01 0.280 *
## 5 0.2043 -1.34e+02 -149.7204 0.107 2.70e+02 0.995 *
The slope dfbeta for the fifth point is -1.34e+02, i.e. about -134.
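The dfb.x value reported by influence.measures() is the scaled version (DFBETAS). It is the change in the slope when a point is deleted, divided by the leave-one-out residual standard deviation times the square root of the matching diagonal entry of (X'X)^-1.

The sketch below refits without point 5 to reproduce both the raw change (dfbeta) and the scaled value. It uses only base R functions; the data frame d is introduced here for convenience.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
d <- data.frame(x = x, y = y)

fit_all  <- lm(y ~ x, data = d)
fit_drop <- lm(y ~ x, data = d[-5, ])   # refit without the fifth point

# Raw change in the slope when point 5 is removed (what dfbeta() reports)
dfbeta_slope <- coef(fit_all)[["x"]] - coef(fit_drop)[["x"]]

# Scale by the leave-one-out residual SD and sqrt of the (X'X)^-1 diagonal entry for x
xtx_inv <- summary(fit_all)$cov.unscaled
dfbetas_slope <- dfbeta_slope / (sigma(fit_drop) * sqrt(xtx_inv["x", "x"]))
print(c(raw = dfbeta_slope, scaled = dfbetas_slope))
```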
It is possible for the coefficient to reverse sign after adjustment. For example, it can be strongly significant and positive before adjustment and strongly significant and negative after adjustment.
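A small simulation can make this concrete. The data below are made up for illustration: x1 is strongly correlated with x2, and y depends negatively on x1 but positively on x2. Regressing y on x1 alone gives a positive, highly significant slope. Adjusting for x2 flips it to a negative, highly significant slope close to the true value of -1.

```r
set.seed(1)
n  <- 100
x2 <- rnorm(n)
x1 <- x2 + rnorm(n, sd = 0.3)            # x1 strongly correlated with x2
y  <- -x1 + 2 * x2 + rnorm(n, sd = 0.3)  # true adjusted effect of x1 is -1

# Estimate, standard error, t value and p-value for x1 in each model
unadjusted <- summary(lm(y ~ x1))$coefficients["x1", ]
adjusted   <- summary(lm(y ~ x1 + x2))$coefficients["x1", ]
print(rbind(unadjusted, adjusted))
```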