As an employee of Motor Trend, a magazine about the automobile industry, I explore the relationship between transmission type (manual or automatic) and fuel economy in miles per gallon (MPG).
The analysis asks whether one transmission type is better for MPG and, if so, by how much. It uses the mtcars data set.
A t-test between automatic and manual transmission vehicles shows that manual transmission vehicles average about 7.2 MPG more than automatic transmission vehicles. After fitting multiple linear regressions, however, the estimated effect of manual transmission shrinks to 1.81 MPG and is not statistically significant at the 0.05 level. Weight, horsepower, and number of cylinders explain more of the variation in MPG.
library(ggplot2)
data(mtcars)
Taking a look at the data and converting categorical variables into factors:
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
dim(mtcars)
## [1] 32 11
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- factor(mtcars$am)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
Reference Appendix Fig. 1
The box plot compares MPG for automatic and manual transmissions. It shows that vehicles with manual transmission tend to have noticeably higher MPG than vehicles with automatic transmission.
The following t-test compares mean MPG between the two transmission types:
ttestRes <- t.test(mpg ~ am)
ttestRes$p.value
## [1] 0.001373638
With a p-value of about 0.0014, well below 0.05, the t-test rejects the null hypothesis that the difference in mean MPG between transmission types is 0.
ttestRes$estimate
## mean in group 0 mean in group 1
##        17.14737        24.39231
The estimated difference between the two group means is 7.24494 MPG: on average, manual transmission cars (group 1) get 7.24494 MPG more than automatic transmission cars (group 0). This comparison does not adjust for any other variable.
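To make the test above more concrete, the following sketch recomputes it by hand. It assumes that `t.test()` ran with its default settings, which give the Welch (unequal variance) test. The names `cars`, `auto`, `manual`, and the other intermediate variables are introduced here for illustration only, and the code works on a fresh copy of the data so that it does not depend on the attached `mtcars`.

```r
# Illustrative sketch: Welch two-sample t-test computed step by step.
cars   <- datasets::mtcars            # fresh copy, am is still numeric 0/1
auto   <- cars$mpg[cars$am == 0]      # automatic transmission
manual <- cars$mpg[cars$am == 1]      # manual transmission

# Difference in means and its standard error (variances not pooled)
diff_means <- mean(manual) - mean(auto)
v_auto     <- var(auto) / length(auto)
v_manual   <- var(manual) / length(manual)
se         <- sqrt(v_auto + v_manual)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v_auto + v_manual)^2 /
  (v_auto^2 / (length(auto) - 1) + v_manual^2 / (length(manual) - 1))

t_stat  <- diff_means / se
p_value <- 2 * pt(-abs(t_stat), df)                   # two-sided p-value
ci      <- diff_means + c(-1, 1) * qt(0.975, df) * se # 95% interval

print(c(difference = diff_means, t = t_stat, df = df, p = p_value))
print(ci)
```

The interval `ci` gives a range for the unadjusted difference, which is often more informative to report than the p-value alone.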
Fitting a linear model of MPG on all other variables:
fit <- lm(mpg ~ ., data = mtcars)
summary(fit)$coef
##                Estimate  Std. Error     t value   Pr(>|t|)
## (Intercept) 23.87913244 20.06582026  1.19004018 0.25252548
## cyl6        -2.64869528  3.04089041 -0.87102622 0.39746642
## cyl8        -0.33616298  7.15953951 -0.04695316 0.96317000
## disp         0.03554632  0.03189920  1.11433290 0.28267339
## hp          -0.07050683  0.03942556 -1.78835344 0.09393155
## drat         1.18283018  2.48348458  0.47627845 0.64073922
## wt          -4.52977584  2.53874584 -1.78425732 0.09461859
## qsec         0.36784482  0.93539569  0.39325050 0.69966720
## vs1          1.93085054  2.87125777  0.67247551 0.51150791
## am1          1.21211570  3.21354514  0.37718957 0.71131573
## gear4        1.11435494  3.79951726  0.29328856 0.77332027
## gear5        2.52839599  3.73635801  0.67670068 0.50889747
## carb2       -0.97935432  2.31797446 -0.42250436 0.67865093
## carb3        2.99963875  4.29354611  0.69863900 0.49546781
## carb4        1.09142288  4.44961992  0.24528452 0.80956031
## carb6        4.47756921  6.38406242  0.70136677 0.49381268
## carb8        7.25041126  8.36056638  0.86721532 0.39948495
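None of the coefficients has a p-value below 0.05. With 17 estimated coefficients, only 32 observations, and predictors that are correlated with one another, the full model cannot separate the individual effects, so it does not tell us which variables matter most.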
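One plausible reason for the uniformly large p-values is that several predictors carry overlapping information. The sketch below is an illustrative check rather than part of the original analysis: it prints pairwise correlations for four of the numeric predictors, using a fresh copy of the data named `cars` (a name introduced here) in which `cyl` is still numeric.

```r
# Illustrative sketch: pairwise correlations among some predictors.
cars <- datasets::mtcars                    # fresh copy, all columns numeric
predictors <- c("cyl", "disp", "hp", "wt")  # subset chosen for illustration

cor_matrix <- cor(cars[, predictors])
print(round(cor_matrix, 2))

# Pairs with absolute correlation above 0.8 (threshold is an arbitrary choice)
high <- which(abs(cor_matrix) > 0.8 & upper.tri(cor_matrix), arr.ind = TRUE)
print(data.frame(var1 = rownames(cor_matrix)[high[, "row"]],
                 var2 = colnames(cor_matrix)[high[, "col"]]))
```

Strongly correlated predictors inflate the standard errors of each other's coefficients, which is consistent with the wide standard errors in the table above.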
Backward stepwise selection by AIC, which is what step() does by default when no scope is given, can reduce the model to a smaller set of predictors:
stepf <- step(fit)
## Start: AIC=76.4
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
##
## Df Sum of Sq RSS AIC
## - carb 5 13.5989 134.00 69.828
## - gear 2 3.9729 124.38 73.442
## - am 1 1.1420 121.55 74.705
## - qsec 1 1.2413 121.64 74.732
## - drat 1 1.8208 122.22 74.884
## - cyl 2 10.9314 131.33 75.184
## - vs 1 3.6299 124.03 75.354
## <none> 120.40 76.403
## - disp 1 9.9672 130.37 76.948
## - wt 1 25.5541 145.96 80.562
## - hp 1 25.6715 146.07 80.588
##
## Step: AIC=69.83
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear
##
## Df Sum of Sq RSS AIC
## - gear 2 5.0215 139.02 67.005
## - disp 1 0.9934 135.00 68.064
## - drat 1 1.1854 135.19 68.110
## - vs 1 3.6763 137.68 68.694
## - cyl 2 12.5642 146.57 68.696
## - qsec 1 5.2634 139.26 69.061
## <none> 134.00 69.828
## - am 1 11.9255 145.93 70.556
## - wt 1 19.7963 153.80 72.237
## - hp 1 22.7935 156.79 72.855
##
## Step: AIC=67
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am
##
## Df Sum of Sq RSS AIC
## - drat 1 0.9672 139.99 65.227
## - cyl 2 10.4247 149.45 65.319
## - disp 1 1.5483 140.57 65.359
## - vs 1 2.1829 141.21 65.503
## - qsec 1 3.6324 142.66 65.830
## <none> 139.02 67.005
## - am 1 16.5665 155.59 68.608
## - hp 1 18.1768 157.20 68.937
## - wt 1 31.1896 170.21 71.482
##
## Step: AIC=65.23
## mpg ~ cyl + disp + hp + wt + qsec + vs + am
##
## Df Sum of Sq RSS AIC
## - disp 1 1.2474 141.24 63.511
## - vs 1 2.3403 142.33 63.757
## - cyl 2 12.3267 152.32 63.927
## - qsec 1 3.1000 143.09 63.928
## <none> 139.99 65.227
## - hp 1 17.7382 157.73 67.044
## - am 1 19.4660 159.46 67.393
## - wt 1 30.7151 170.71 69.574
##
## Step: AIC=63.51
## mpg ~ cyl + hp + wt + qsec + vs + am
##
## Df Sum of Sq RSS AIC
## - qsec 1 2.442 143.68 62.059
## - vs 1 2.744 143.98 62.126
## - cyl 2 18.580 159.82 63.466
## <none> 141.24 63.511
## - hp 1 18.184 159.42 65.386
## - am 1 18.885 160.12 65.527
## - wt 1 39.645 180.88 69.428
##
## Step: AIC=62.06
## mpg ~ cyl + hp + wt + vs + am
##
## Df Sum of Sq RSS AIC
## - vs 1 7.346 151.03 61.655
## <none> 143.68 62.059
## - cyl 2 25.284 168.96 63.246
## - am 1 16.443 160.12 63.527
## - hp 1 36.344 180.02 67.275
## - wt 1 41.088 184.77 68.108
##
## Step: AIC=61.65
## mpg ~ cyl + hp + wt + am
##
## Df Sum of Sq RSS AIC
## <none> 151.03 61.655
## - am 1 9.752 160.78 61.657
## - cyl 2 29.265 180.29 63.323
## - hp 1 31.943 182.97 65.794
## - wt 1 46.173 197.20 68.191
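The AIC values in the trace can be checked by hand. The sketch below assumes that, for a linear model, the quantity reported by `step()` is n * log(RSS / n) + 2 * k, where n is the number of observations and k the number of estimated coefficients; this is what `extractAIC()` returns for an `lm` fit. The names `cars`, `final`, and `aic_by_hand` are introduced here for illustration, and the final model is refitted directly rather than taken from `stepf`.

```r
# Illustrative sketch: reproduce the AIC of the final model in the trace.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am)

final <- lm(mpg ~ cyl + hp + wt + am, data = cars)

n   <- nrow(cars)                 # number of observations
k   <- length(coef(final))        # number of estimated coefficients
rss <- sum(resid(final)^2)        # residual sum of squares

aic_by_hand <- n * log(rss / n) + 2 * k

print(c(RSS = rss, AIC_by_hand = aic_by_hand,
        AIC_extractAIC = extractAIC(final)[2]))
```

Note that this value differs from the one returned by `AIC()`, which includes additive constants; the differences between candidate models are the same under either definition.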
summary(stepf)$coef
##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 33.70832390 2.60488618 12.940421 7.733392e-13
## cyl6        -3.03134449 1.40728351 -2.154040 4.068272e-02
## cyl8        -2.16367532 2.28425172 -0.947214 3.522509e-01
## hp          -0.03210943 0.01369257 -2.345025 2.693461e-02
## wt          -2.49682942 0.88558779 -2.819404 9.081408e-03
## am1          1.80921138 1.39630450  1.295714 2.064597e-01
This model has 4 predictors: number of cylinders, horsepower, weight, and transmission type. The R-squared value (0.8659) indicates that the model explains about 87% of the variance in MPG. The coefficients for 6 cylinders, horsepower, and weight are significant at the 0.05 level; those for 8 cylinders (p = 0.35) and manual transmission (p = 0.21) are not. Holding the other variables constant, a car with 6 cylinders is estimated to get 3.03 MPG less than a car with 4 cylinders.
A car with 8 cylinders is estimated to get 2.16 MPG less, again relative to 4 cylinders. Each additional 100 HP is associated with a decrease of 3.21 MPG, and each additional 1000 lbs of weight with a decrease of 2.5 MPG. Manual transmission is associated with an increase of 1.81 MPG over automatic, but this estimate is not statistically distinguishable from zero.
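To quantify the uncertainty around the transmission effect, the sketch below computes a 95% confidence interval for the `am1` coefficient and compares the selected model against the same model without `am` using a nested F-test. This is an illustrative addition, not part of the original analysis; the names `cars`, `with_am`, and `without_am` are introduced here, and both models are refitted on a fresh copy of the data.

```r
# Illustrative sketch: how much does transmission add once cyl, hp, wt are in?
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am)

without_am <- lm(mpg ~ cyl + hp + wt, data = cars)
with_am    <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# 95% confidence interval for the manual-transmission coefficient
am_ci <- confint(with_am)["am1", ]
print(am_ci)

# Nested-model F-test: does adding am significantly reduce the residual error?
comparison <- anova(without_am, with_am)
print(comparison)
am_p_value <- comparison[["Pr(>F)"]][2]
```

Because `am` adds a single coefficient, the F-test p-value equals the p-value of the `am1` t-test in the coefficient table.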
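Reference Appendix Fig. 2
The residual diagnostic plots are shown there. As a further check on influential observations, the following counts how many dfbetas values exceed 1 in absolute value: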
sum((abs(dfbetas(stepf)))>1)
## [1] 0
No dfbetas value exceeds 1 in absolute value, so no single observation has an outsized influence on any coefficient of the selected model.
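The dfbetas check covers only one aspect of model adequacy. As a hedged sketch of what further checks might look like, the code below refits the selected model on a fresh copy of the data and looks at leverage, Cook's distance, and a Shapiro-Wilk test on the residuals. The choice of these three checks, and the names `cars`, `model`, `leverage`, and `cooks`, are assumptions made here for illustration.

```r
# Illustrative sketch: additional diagnostics for the selected model.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am)
model <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# Count of dfbetas entries larger than 1 in absolute value (as in the text)
n_large_dfbetas <- sum(abs(dfbetas(model)) > 1)

# Leverage: hat values sum to the number of estimated coefficients
leverage <- hatvalues(model)
print(head(sort(leverage, decreasing = TRUE), 3))

# Cook's distance combines leverage and residual size per observation
cooks <- cooks.distance(model)
print(head(sort(cooks, decreasing = TRUE), 3))

# Shapiro-Wilk test of the residuals; a small p-value would suggest
# departure from normality
print(shapiro.test(resid(model)))
```

These numbers complement the plots in Fig. 2 rather than replace them; the plots remain the primary way to judge the residual patterns.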
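Manual transmission is associated with a slight increase in MPG (an estimated 1.81 MPG), but once weight, horsepower, and number of cylinders are accounted for, this effect is not statistically significant. Weight, horsepower, and number of cylinders matter more in determining MPG.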
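# Fig. 1: box plot of MPG by transmission type
boxplot(mpg ~ am,
        xlab = "Transmission Type (0 = Automatic, 1 = Manual)",
        ylab = "MPG",
        main = "MPG by Transmission Type")
# Fig. 2: residual diagnostic plots for the selected model
par(mfrow = c(2, 2))
plot(stepf)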