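You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome. They are particularly interested in two questions, which the analysis below addresses: whether an automatic or manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.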
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
library(datasets)
data(mtcars)
It consists of 32 observations on 11 variables.
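As a quick check of the description above, the following sketch inspects the dimensions of the data set and counts the cars in each transmission group. The `cars_labelled` data frame and its `transmission` column are hypothetical helpers added here for readability; they are not part of the original analysis. The coding 0 = automatic, 1 = manual follows the `mtcars` help page.

```r
library(datasets)
data(mtcars)

# Rows are cars, columns are variables: expect 32 and 11
dim(mtcars)

# Count cars per transmission type (am: 0 = automatic, 1 = manual)
table(mtcars$am)

# Hypothetical helper: a labelled copy of am, stored in a separate
# data frame so the original mtcars is left untouched
cars_labelled <- mtcars
cars_labelled$transmission <- factor(cars_labelled$am,
                                     levels = c(0, 1),
                                     labels = c("Automatic", "Manual"))
table(cars_labelled$transmission)
```

The two groups are unequal in size (19 automatic, 13 manual), which is worth keeping in mind when comparing their means.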
For automatic transmissions (am = 0):
summary(mtcars[mtcars$am==0,])
## mpg cyl disp hp drat
## Min. :10.4 Min. :4.00 Min. :120 Min. : 62 Min. :2.76
## 1st Qu.:14.9 1st Qu.:6.00 1st Qu.:196 1st Qu.:116 1st Qu.:3.07
## Median :17.3 Median :8.00 Median :276 Median :175 Median :3.15
## Mean :17.1 Mean :6.95 Mean :290 Mean :160 Mean :3.29
## 3rd Qu.:19.2 3rd Qu.:8.00 3rd Qu.:360 3rd Qu.:192 3rd Qu.:3.69
## Max. :24.4 Max. :8.00 Max. :472 Max. :245 Max. :3.92
## wt qsec vs am gear
## Min. :2.46 Min. :15.4 Min. :0.000 Min. :0 Min. :3.00
## 1st Qu.:3.44 1st Qu.:17.2 1st Qu.:0.000 1st Qu.:0 1st Qu.:3.00
## Median :3.52 Median :17.8 Median :0.000 Median :0 Median :3.00
## Mean :3.77 Mean :18.2 Mean :0.368 Mean :0 Mean :3.21
## 3rd Qu.:3.84 3rd Qu.:19.2 3rd Qu.:1.000 3rd Qu.:0 3rd Qu.:3.00
## Max. :5.42 Max. :22.9 Max. :1.000 Max. :0 Max. :4.00
## carb
## Min. :1.00
## 1st Qu.:2.00
## Median :3.00
## Mean :2.74
## 3rd Qu.:4.00
## Max. :4.00
For manual transmissions (am = 1):
summary(mtcars[mtcars$am==1,])
## mpg cyl disp hp
## Min. :15.0 Min. :4.00 Min. : 71.1 Min. : 52
## 1st Qu.:21.0 1st Qu.:4.00 1st Qu.: 79.0 1st Qu.: 66
## Median :22.8 Median :4.00 Median :120.3 Median :109
## Mean :24.4 Mean :5.08 Mean :143.5 Mean :127
## 3rd Qu.:30.4 3rd Qu.:6.00 3rd Qu.:160.0 3rd Qu.:113
## Max. :33.9 Max. :8.00 Max. :351.0 Max. :335
## drat wt qsec vs am
## Min. :3.54 Min. :1.51 Min. :14.5 Min. :0.000 Min. :1
## 1st Qu.:3.85 1st Qu.:1.94 1st Qu.:16.5 1st Qu.:0.000 1st Qu.:1
## Median :4.08 Median :2.32 Median :17.0 Median :1.000 Median :1
## Mean :4.05 Mean :2.41 Mean :17.4 Mean :0.538 Mean :1
## 3rd Qu.:4.22 3rd Qu.:2.78 3rd Qu.:18.6 3rd Qu.:1.000 3rd Qu.:1
## Max. :4.93 Max. :3.57 Max. :19.9 Max. :1.000 Max. :1
## gear carb
## Min. :4.00 Min. :1.00
## 1st Qu.:4.00 1st Qu.:1.00
## Median :4.00 Median :2.00
## Mean :4.38 Mean :2.92
## 3rd Qu.:5.00 3rd Qu.:4.00
## Max. :5.00 Max. :8.00
Hence, the mean of mpg is greater for manual (at 24.4) than automatic (at 17.1).
Investigating further:
boxplot(mpg ~ am, data = mtcars, xlab = "Transmission", ylab = "Miles per gallon", main="Miles per gallon by Transmission Type")
Manual (represented by 1) has a higher median mpg than automatic (represented by 0).
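The centre line of each box in a boxplot marks the group median, so the comparison can be read off numerically as well. A minimal sketch, where `group_medians` is a name introduced only for this illustration:

```r
data(mtcars)

# Median mpg per transmission group (0 = automatic, 1 = manual)
group_medians <- tapply(mtcars$mpg, mtcars$am, median)
group_medians
```

These values should agree with the medians reported in the two `summary()` outputs earlier (17.3 for automatic, 22.8 for manual).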
aggregate(mpg~am, data = mtcars, mean)
## am mpg
## 1 0 17.15
## 2 1 24.39
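The mean mpg for manual transmissions is 7.24 mpg higher than for automatic. To test whether this difference is statistically significant, let alpha = 0.05, with the null hypothesis that the two group means are equal.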
auto <- mtcars[mtcars$am == 0,]
manual <- mtcars[mtcars$am == 1,]
t.test(auto$mpg, manual$mpg)
##
## Welch Two Sample t-test
##
## data: auto$mpg and manual$mpg
## t = -3.767, df = 18.33, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.28 -3.21
## sample estimates:
## mean of x mean of y
## 17.15 24.39
Since the p-value (0.001374) is less than 0.05, we reject the null hypothesis. There is a statistically significant difference between the mean mpg of manual and automatic transmissions.
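The decision rule can also be written out in code rather than read from the printed output. The sketch below uses the formula interface of `t.test`, which runs the same Welch test; the names `alpha` and `tt` are introduced here for illustration.

```r
data(mtcars)

alpha <- 0.05  # significance level, matching the 95 percent confidence interval

# Formula interface: groups are ordered 0 (automatic) then 1 (manual),
# so the estimated difference is automatic minus manual
tt <- t.test(mpg ~ am, data = mtcars)

tt$p.value          # p-value of the Welch two-sample t-test
tt$conf.int         # confidence interval for the difference in means
tt$p.value < alpha  # TRUE means the null hypothesis is rejected at level alpha
```

Because the whole confidence interval lies below zero, it leads to the same conclusion as the p-value: automatic cars have a lower mean mpg than manual cars in this sample.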
m <- lm(mpg ~ am, data = mtcars)
summary(m)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.392 -3.092 -0.297 3.244 9.508
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.15 1.12 15.25 1.1e-15 ***
## am 7.24 1.76 4.11 0.00029 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.9 on 30 degrees of freedom
## Multiple R-squared: 0.36, Adjusted R-squared: 0.338
## F-statistic: 16.9 on 1 and 30 DF, p-value: 0.000285
From the above, we may conclude that automatic cars average 17.15 mpg (the intercept), while manual cars average 7.24 mpg more (the am coefficient).
Also, R^2 is 0.36, hence the model accounts for only 36% of the variance in mpg.
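With a single 0/1 predictor, the regression simply reproduces the two group means: the intercept is the mean of the group coded 0, and the intercept plus the slope is the mean of the group coded 1. A short sketch to confirm this, with variable names chosen here for illustration:

```r
data(mtcars)
m <- lm(mpg ~ am, data = mtcars)

coefs <- coef(m)
auto_mean   <- coefs[["(Intercept)"]]                # mean mpg when am = 0
manual_mean <- coefs[["(Intercept)"]] + coefs[["am"]] # mean mpg when am = 1

# Group means computed directly from the data, as a plain named vector
group_means <- sapply(split(mtcars$mpg, mtcars$am), mean)

all.equal(auto_mean, group_means[["0"]])
all.equal(manual_mean, group_means[["1"]])

# Proportion of variance in mpg explained by transmission type alone
summary(m)$r.squared
```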
Performing multiple linear regression, adding weight (wt), horsepower (hp) and number of cylinders (cyl) as predictors:
model <- lm(mpg ~ am + wt + hp + cyl, data = mtcars)
anova(m, model)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt + hp + cyl
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 721
## 2 27 170 3 551 29.2 1.3e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
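The F statistic in this table compares the drop in residual sum of squares against the residual variance of the larger model. As a sketch of that arithmetic, the code below recomputes it by hand; all variable names other than `m` and `model` are introduced here for illustration.

```r
data(mtcars)
m     <- lm(mpg ~ am, data = mtcars)
model <- lm(mpg ~ am + wt + hp + cyl, data = mtcars)

rss_small <- sum(residuals(m)^2)      # RSS of the model with am only
rss_large <- sum(residuals(model)^2)  # RSS of the model with four predictors

df_diff  <- df.residual(m) - df.residual(model)  # number of added predictors
df_large <- df.residual(model)                   # residual df of larger model

# F = (reduction in RSS per added predictor) / (residual variance of larger model)
f_stat <- ((rss_small - rss_large) / df_diff) / (rss_large / df_large)
p_val  <- pf(f_stat, df_diff, df_large, lower.tail = FALSE)

f_stat
p_val
```

The result should match the `F` and `Pr(>F)` columns printed by `anova(m, model)`.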
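The F-test is highly significant (p = 1.3e-08), so adding wt, hp and cyl improves the fit considerably over the model with am alone. The final model is below: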
summary(model)
##
## Call:
## lm(formula = mpg ~ am + wt + hp + cyl, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.476 -1.847 -0.554 1.276 5.661
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.1465 3.1048 11.64 4.9e-12 ***
## am 1.4780 1.4411 1.03 0.3142
## wt -2.6065 0.9198 -2.83 0.0086 **
## hp -0.0250 0.0136 -1.83 0.0786 .
## cyl -0.7452 0.5828 -1.28 0.2119
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.51 on 27 degrees of freedom
## Multiple R-squared: 0.849, Adjusted R-squared: 0.827
## F-statistic: 38 on 4 and 27 DF, p-value: 1.02e-10
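This model explains 84.9% of the variance. Holding weight, horsepower and number of cylinders constant, manual transmissions are estimated to have 1.478 more mpg than automatic on average, but this difference is not statistically significant (p = 0.3142).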
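A confidence interval for the am coefficient makes the uncertainty of this estimate explicit. A minimal sketch, where `ci_am` and `includes_zero` are names introduced for illustration:

```r
data(mtcars)
model <- lm(mpg ~ am + wt + hp + cyl, data = mtcars)

# 95 percent confidence interval for the adjusted transmission effect
ci_am <- confint(model, "am", level = 0.95)
ci_am

# If the interval contains zero, the data are compatible with
# no mpg difference between transmissions once wt, hp and cyl are held fixed
includes_zero <- ci_am[1, 1] < 0 && ci_am[1, 2] > 0
includes_zero
```

An interval that spans zero is consistent with the large p-value on `am` in the summary above.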
plot(model)
The diagnostic plots suggest that the residuals are approximately normally distributed and homoskedastic.
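Visual inspection can be complemented with formal checks. The sketch below is not part of the original analysis: it applies a Shapiro-Wilk test for normality and a studentized Breusch-Pagan statistic computed by hand with base R, as an assumed way of testing homoskedasticity. With only 32 observations the chi-squared approximation is rough, so the results are best read as supporting evidence rather than proof.

```r
data(mtcars)
model <- lm(mpg ~ am + wt + hp + cyl, data = mtcars)
res <- residuals(model)

# Normality: Shapiro-Wilk test (null hypothesis: residuals are normal)
sw <- shapiro.test(res)
sw$p.value

# Homoskedasticity: regress the squared residuals on the same predictors.
# Under constant variance, n * R^2 of this auxiliary regression is
# approximately chi-squared with one degree of freedom per predictor.
sq_res <- res^2
aux <- lm(sq_res ~ am + wt + hp + cyl, data = mtcars)

n <- nrow(mtcars)
k <- length(coef(aux)) - 1  # number of predictors, excluding the intercept
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = k, lower.tail = FALSE)
bp_p
```

Large p-values in both tests would be consistent with the conclusion drawn from the plots; small ones would call for a closer look at the model.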