You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG) (outcome). They are particularly interested in the following two questions:
mtcars is a data frame with 32 observations on 11 (numeric) variables.
t.test(mtcars[mtcars$am == 0,]$mpg, mtcars[mtcars$am == 1,]$mpg)
##
## Welch Two Sample t-test
##
## data: mtcars[mtcars$am == 0, ]$mpg and mtcars[mtcars$am == 1, ]$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
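Cars with manual transmission (am = 1) average 24.39 mpg, against 17.15 mpg for cars with automatic transmission (am = 0). The Welch t-test finds this difference of about 7.24 mpg significant (p = 0.0014, 95% confidence interval 3.21 to 11.28 mpg).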
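The same comparison can be sketched with the formula interface of `t.test`, which keeps the group labels attached to the estimates. This is only an illustration; the names `tt`, `mean_diff` and `ci_diff` are introduced here and are not part of the original analysis.

```r
# Welch two-sample t-test via the formula interface (same test as above).
tt <- t.test(mpg ~ am, data = mtcars)

# Group means: am = 0 is automatic, am = 1 is manual.
group_means <- tt$estimate

# Express the difference as manual minus automatic; the interval reported by
# t.test is automatic minus manual, so it is negated and reversed.
mean_diff <- unname(group_means[2] - group_means[1])
ci_diff   <- rev(-tt$conf.int)

round(c(difference = mean_diff, lower = ci_diff[1], upper = ci_diff[2]), 2)
```

Reporting the difference as manual minus automatic makes it directly comparable with the `am` coefficient in the regression models that follow.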
mdl <- lm(mpg ~ ., mtcars); summary(mdl)
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4506 -1.6044 -0.1196 1.2193 4.6271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337 18.71788 0.657 0.5181
## cyl -0.11144 1.04502 -0.107 0.9161
## disp 0.01334 0.01786 0.747 0.4635
## hp -0.02148 0.02177 -0.987 0.3350
## drat 0.78711 1.63537 0.481 0.6353
## wt -3.71530 1.89441 -1.961 0.0633 .
## qsec 0.82104 0.73084 1.123 0.2739
## vs 0.31776 2.10451 0.151 0.8814
## am 2.52023 2.05665 1.225 0.2340
## gear 0.65541 1.49326 0.439 0.6652
## carb -0.19942 0.82875 -0.241 0.8122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
## F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
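In the full model, wt, am, drat, qsec and gear have the largest coefficient estimates in absolute value, wt most of all, although no coefficient is significant at the 0.05 level (wt comes closest, with p = 0.063). Hence I am going to compare several smaller models.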
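Coefficient size depends on the units of each variable, so a complementary check is the plain correlation of each variable with mpg. The following is a minimal sketch of that check, not a step from the original analysis; `cors` and `cors_sorted` are names assumed for illustration.

```r
# Pearson correlation of every variable with mpg, dropping mpg itself.
cors <- cor(mtcars)["mpg", ]
cors <- cors[names(cors) != "mpg"]

# Order the variables by the absolute strength of the correlation.
cors_sorted <- cors[order(abs(cors), decreasing = TRUE)]
round(cors_sorted, 2)
```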
anova(lm(mpg ~ am, mtcars), #1
      lm(mpg ~ am + wt, mtcars), #2
      lm(mpg ~ am + wt + drat, mtcars), #3
      lm(mpg ~ am + wt + drat + qsec, mtcars), #4
      lm(mpg ~ am + wt + drat + qsec + gear, mtcars), #5
      lm(mpg ~ wt, mtcars), #6
      lm(mpg ~ wt + drat, mtcars), #7
      lm(mpg ~ wt + drat + qsec, mtcars), #8
      lm(mpg ~ wt + drat + qsec + gear, mtcars)) #9
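`anova` compares each model with the one listed before it, so the rows are easiest to read when every model adds terms to its predecessor. Models #1 to #5 form such a nested sequence, while #6 drops terms relative to #5. As a sketch under that reading, the first sequence could be tested on its own; the names `m1` to `m5` and `nested` are assumptions made for this example.

```r
# Build the nested sequence #1 to #5 by adding one regressor at a time.
m1 <- lm(mpg ~ am, mtcars)
m2 <- update(m1, . ~ . + wt)
m3 <- update(m2, . ~ . + drat)
m4 <- update(m3, . ~ . + qsec)
m5 <- update(m4, . ~ . + gear)

# Each row tests whether the newly added regressor reduces the residual
# sum of squares significantly, relative to the previous model.
nested <- anova(m1, m2, m3, m4, m5)
nested
```

The same pattern, starting from `lm(mpg ~ wt, mtcars)`, would cover models #6 to #9.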
From the plots above we can see that cars with automatic transmission also tend to weigh more than cars with manual transmission.
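The weight difference can also be checked numerically. The sketch below compares mean weight by transmission type and draws a boxplot; it is an illustration rather than the original plotting code, and `wt_by_am` is a name introduced here.

```r
# Mean weight (in 1000 lbs) by transmission type; am = 0 is automatic.
wt_by_am <- tapply(mtcars$wt, mtcars$am, mean)
names(wt_by_am) <- c("automatic", "manual")
wt_by_am

# Boxplot of weight by transmission type.
boxplot(wt ~ am, data = mtcars,
        names = c("automatic", "manual"),
        xlab = "Transmission", ylab = "Weight (1000 lbs)")
```

Because weight is tied to both transmission type and mpg, it acts as a confounder, which is why the am coefficient changes once wt enters the model.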
coef(lm(mpg ~ am, mtcars))
## (Intercept) am
## 17.147368 7.244939
coef(lm(mpg ~ am + wt, mtcars))
## (Intercept) am wt
## 37.32155131 -0.02361522 -5.35281145
coef(lm(mpg ~ am + wt + qsec, mtcars))
## (Intercept) am wt qsec
## 9.617781 2.935837 -3.916504 1.225886
mdl <- lm(mpg ~ am + wt + qsec + drat, mtcars); summary(mdl)
##
## Call:
## lm(formula = mpg ~ am + wt + qsec + drat, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3046 -1.6260 -0.6634 1.2097 4.6626
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6277 8.2103 0.929 0.361095
## am 2.5729 1.6225 1.586 0.124446
## wt -3.8040 0.7592 -5.010 2.96e-05 ***
## qsec 1.1958 0.2995 3.992 0.000452 ***
## drat 0.6429 1.3551 0.474 0.639003
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.494 on 27 degrees of freedom
## Multiple R-squared: 0.8509, Adjusted R-squared: 0.8288
## F-statistic: 38.52 on 4 and 27 DF, p-value: 8.673e-11
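To quantify the adjusted transmission effect, a confidence interval for the am coefficient is more informative than the point estimate alone. A minimal sketch, refitting the same model under the assumed name `fit` so that `mdl` is left untouched:

```r
# Refit the model summarised above.
fit <- lm(mpg ~ am + wt + qsec + drat, mtcars)

# 95% confidence interval for the am coefficient (manual vs automatic,
# holding wt, qsec and drat fixed).
ci_am <- confint(fit, "am", level = 0.95)
ci_am
```

With an estimate of 2.57 and a p-value of 0.124 in the summary above, this interval includes zero, so the adjusted difference is not distinguishable from no effect at the 5% level.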
par(mfrow = c(2,2)); plot(mdl)
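The coefficient for am depends on which other regressors are in the linear model: it is 7.24 with am alone, -0.02 once wt is added, 2.94 with wt and qsec, and 2.57 with wt, qsec and drat, where it is no longer significant (p = 0.124).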
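The am coefficients from the models fitted above can be collected side by side. This is a small sketch; `formulas` and `am_coef` are names assumed for the example.

```r
# The four model specifications fitted above.
formulas <- list(
  mpg ~ am,
  mpg ~ am + wt,
  mpg ~ am + wt + qsec,
  mpg ~ am + wt + qsec + drat
)

# Fit each model and extract the coefficient for am.
am_coef <- sapply(formulas, function(f) coef(lm(f, data = mtcars))[["am"]])
names(am_coef) <- sapply(formulas, deparse)

round(am_coef, 3)
```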