Motor Trend is a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and fuel efficiency (the outcome), measured here in kilometers per liter (KPL) rather than miles per gallon (MPG).
They are particularly interested in the following two questions:
* Is an automatic or manual transmission better for KPL?
* Quantify the KPL difference between automatic and manual transmissions.
Using hypothesis testing and simple linear regression, we find a significant difference between the mean KPL of automatic and manual transmission cars, which suggests that “manual transmission is better than automatic transmission for KPL”.
To confirm this conclusion and to adjust for confounding variables such as the weight and quarter mile time (acceleration) of the car, a multiple regression analysis was run to understand the impact of transmission type on KPL.
The best-fit model indicates that weight and quarter mile time (acceleration) have a significant impact on KPL, and that once they are accounted for, manual transmission cars keep a smaller but still significant advantage over automatic transmission cars.
The data were obtained from the mtcars data set that ships with R (CRAN); its documentation can be found at http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
A data frame with 32 observations on 11 variables
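The mtcars.RData file loaded in this report is not shown, but judging from the output it appears to be R's built-in mtcars with descriptive column names and fuel use converted from MPG to KPL (for example, 21 mpg for the Mazda RX4 becomes 8.928 KPL). A minimal sketch of how such a table could be rebuilt from datasets::mtcars, assuming the standard conversion 1 US mpg = 1.609344 / 3.785411784 km/L (about 0.425) and the documented codings vs (0 = V-shaped, 1 = straight) and am (0 = automatic, 1 = manual). The helper make_kpl_cars and the constant kpl_per_mpg are hypothetical names, not part of the original analysis:

```r
# Assumed conversion: 1 US mpg = 1.609344 km / 3.785411784 L (about 0.425 km/L)
kpl_per_mpg <- 1.609344 / 3.785411784

# Hypothetical helper: rebuild a KPL version of the built-in mtcars data
make_kpl_cars <- function(df = datasets::mtcars) {
  data.frame(
    Cylinders          = df$cyl,
    Displacement       = df$disp,
    Gross.Horsepower   = df$hp,
    Rear.Axle.Ratio    = df$drat,
    Weight             = df$wt,
    Quarter.Mile.Time  = df$qsec,
    # vs: 0 = V-shaped engine, 1 = straight engine
    Engine.Type        = factor(df$vs, levels = c(0, 1),
                                labels = c("V.Engine", "Straight.Engine")),
    # am: 0 = automatic, 1 = manual
    Transmission       = factor(df$am, levels = c(0, 1),
                                labels = c("Automatic", "Manual")),
    Gears.Number       = factor(df$gear),
    Carburetors.Number = factor(df$carb),
    Consumtion.Kpl     = df$mpg * kpl_per_mpg,
    row.names          = rownames(df)
  )
}

cars_sketch <- make_kpl_cars()
str(cars_sketch)
```

The column name Consumtion.Kpl keeps the spelling used in the original data so the sketch lines up with the code below.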
Before running this Rmd file, set the working directory to your repository:
setwd("path/to/your/repository")  # replace with your local path
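# packages used for the density plot (sm) and the correlation plot (corrplot) below
library(sm)
library(corrplot)
# load data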
load("mtcars.RData")
cars <- mtcars
head(cars)
## Cylinders Displacement Gross.Horsepower Rear.Axle.Ratio
## Mazda RX4 6 160 110 3.90
## Mazda RX4 Wag 6 160 110 3.90
## Datsun 710 4 108 93 3.85
## Hornet 4 Drive 6 258 110 3.08
## Hornet Sportabout 8 360 175 3.15
## Valiant 6 225 105 2.76
## Weight Quarter.Mile.Time Engine.Type Transmission
## Mazda RX4 2.620 16.46 V.Engine Manual
## Mazda RX4 Wag 2.875 17.02 V.Engine Manual
## Datsun 710 2.320 18.61 Straight.Engine Manual
## Hornet 4 Drive 3.215 19.44 Straight.Engine Automatic
## Hornet Sportabout 3.440 17.02 V.Engine Automatic
## Valiant 3.460 20.22 Straight.Engine Automatic
## Gears.Number Carburetors.Number Consumtion.Kpl
## Mazda RX4 4 4 8.928000
## Mazda RX4 Wag 4 4 8.928000
## Datsun 710 4 1 9.693257
## Hornet 4 Drive 3 1 9.098057
## Hornet Sportabout 3 2 7.950171
## Valiant 3 1 7.695086
str(cars)
## 'data.frame': 32 obs. of 11 variables:
## $ Cylinders : num 6 6 4 6 8 6 8 4 4 6 ...
## $ Displacement : num 160 160 108 258 360 ...
## $ Gross.Horsepower : num 110 110 93 110 175 105 245 62 95 123 ...
## $ Rear.Axle.Ratio : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ Weight : num 2.62 2.88 2.32 3.21 3.44 ...
## $ Quarter.Mile.Time : num 16.5 17 18.6 19.4 17 ...
## $ Engine.Type : Factor w/ 2 levels "V.Engine","Straight.Engine": 1 1 2 2 1 2 1 2 2 2 ...
## $ Transmission : Factor w/ 2 levels "Automatic","Manual": 2 2 2 1 1 1 1 1 1 1 ...
## $ Gears.Number : Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
## $ Carburetors.Number: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
## $ Consumtion.Kpl : num 8.93 8.93 9.69 9.1 7.95 ...
names(cars)
## [1] "Cylinders" "Displacement" "Gross.Horsepower"
## [4] "Rear.Axle.Ratio" "Weight" "Quarter.Mile.Time"
## [7] "Engine.Type" "Transmission" "Gears.Number"
## [10] "Carburetors.Number" "Consumtion.Kpl"
summary(cars)
## Cylinders Displacement Gross.Horsepower Rear.Axle.Ratio
## Min. :4.000 Min. : 71.1 Min. : 52.0 Min. :2.760
## 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080
## Median :6.000 Median :196.3 Median :123.0 Median :3.695
## Mean :6.188 Mean :230.7 Mean :146.7 Mean :3.597
## 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920
## Max. :8.000 Max. :472.0 Max. :335.0 Max. :4.930
## Weight Quarter.Mile.Time Engine.Type Transmission
## Min. :1.513 Min. :14.50 V.Engine :18 Automatic:19
## 1st Qu.:2.581 1st Qu.:16.89 Straight.Engine:14 Manual :13
## Median :3.325 Median :17.71
## Mean :3.217 Mean :17.85
## 3rd Qu.:3.610 3rd Qu.:18.90
## Max. :5.424 Max. :22.90
## Gears.Number Carburetors.Number Consumtion.Kpl
## 3:15 1: 7 Min. : 4.421
## 4:12 2:10 1st Qu.: 6.558
## 5: 5 3: 3 Median : 8.163
## 4:10 Mean : 8.541
## 6: 1 3rd Qu.: 9.693
## 8: 1 Max. :14.412
table.kpl.means <- tapply(cars$Consumtion.Kpl, cars$Transmission, mean)
table.kpl.means
## Automatic Manual
## 7.290081 10.370215
barplot(
  height = table.kpl.means,
  ylab = "KPL",
  main = "Average fuel consumption (KPL) by transmission")
boxplot(Consumtion.Kpl ~ Transmission, data = cars,
  col = c("dark grey", "light grey"),
  xlab = "Transmission",
  ylab = "KPL",
  main = "Fuel consumption (KPL) by transmission")
# plot KPL densities for each transmission type (sm package)
sm.density.compare(cars$Consumtion.Kpl, cars$Transmission, xlab = "KPL")
title(main = "Fuel consumption (KPL) by transmission")
Observation
From the above graphs, the basic assumptions of the two-sample t-test appear to be met.
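The original does not list which assumptions it checked. One common way to back up the visual check numerically, shown here as our own addition, is a Shapiro-Wilk normality test within each transmission group and an F test of equal variances (Welch's t-test, used below, does not require equal variances). This sketch rebuilds KPL from the built-in mtcars under the assumed conversion factor; kpl_per_mpg, normality_p and variance_p are hypothetical names:

```r
# Assumed conversion from US mpg to km/L
kpl_per_mpg <- 1.609344 / 3.785411784
kpl <- datasets::mtcars$mpg * kpl_per_mpg
transmission <- factor(datasets::mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

# Shapiro-Wilk p-value within each group (small values suggest non-normality)
normality_p <- tapply(kpl, transmission, function(x) shapiro.test(x)$p.value)
print(normality_p)

# F test comparing the two group variances
variance_p <- var.test(kpl[transmission == "Automatic"],
                       kpl[transmission == "Manual"])$p.value
print(variance_p)
```

With only 19 and 13 cars per group these tests have limited power, so they complement rather than replace the plots.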
auto.rows <- cars[cars$Transmission == "Automatic",]
manual.rows <- cars[cars$Transmission == "Manual",]
ttest <- t.test(auto.rows$Consumtion.Kpl, manual.rows$Consumtion.Kpl)
ttest
##
## Welch Two Sample t-test
##
## data: auto.rows$Consumtion.Kpl and manual.rows$Consumtion.Kpl
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.795694 -1.364574
## sample estimates:
## mean of x mean of y
## 7.290081 10.370215
Observation
The p-value is 0.0014, which is below 0.05, so we reject the null hypothesis of equal means and conclude that automatic cars have lower KPL than manual cars. This confirms the observation from the boxplot above. However, this conclusion would be incomplete without further investigation, so the relationship should be explored further using multiple linear regression.
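To make the t-test output above less of a black box, the sketch below recomputes the Welch statistic by hand: the difference in group means divided by the unpooled standard error, with the Welch-Satterthwaite approximation for the degrees of freedom. It rebuilds KPL from the built-in mtcars under an assumed conversion factor (the t statistic does not depend on that scale), and the variable names are hypothetical:

```r
# Assumed conversion from US mpg to km/L
kpl_per_mpg <- 1.609344 / 3.785411784
kpl <- datasets::mtcars$mpg * kpl_per_mpg
auto   <- kpl[datasets::mtcars$am == 0]
manual <- kpl[datasets::mtcars$am == 1]

# squared standard error of each group mean (variances are not pooled)
se2_auto   <- var(auto) / length(auto)
se2_manual <- var(manual) / length(manual)

# Welch t statistic
t_stat <- (mean(auto) - mean(manual)) / sqrt(se2_auto + se2_manual)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_auto + se2_manual)^2 /
  (se2_auto^2 / (length(auto) - 1) + se2_manual^2 / (length(manual) - 1))

# two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

builtin <- t.test(auto, manual)  # Welch is the default (var.equal = FALSE)
print(c(t = t_stat, df = df_welch, p = p_value))
```

The hand-computed values should agree with t.test() to floating-point precision.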
Next, we run a simple linear regression of KPL on transmission type.
model.kpl.to.transmission <- lm(Consumtion.Kpl~Transmission, data=cars)
model.summary <- summary(model.kpl.to.transmission)
model.summary
##
## Call:
## lm(formula = Consumtion.Kpl ~ Transmission, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9931 -1.3147 -0.1264 1.3791 4.0421
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.2901 0.4781 15.247 1.13e-15 ***
## TransmissionManual 3.0801 0.7501 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.084 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
Observation
Interpreting the coefficients, the intercept (7.29) is the mean KPL of automatic transmission cars, and the TransmissionManual coefficient means that, on average, manual transmission cars get 3.08 KPL more than automatic transmission cars. However, the R^2 value is 0.3598, so this model explains only about 36% of the variance in KPL. Transmission type alone leaves most of the variation unexplained, so other variables need to be considered.
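The interpretation above relies on the fact that regressing on a single two-level factor just reproduces the group means: the intercept is the mean of the reference level (Automatic) and the slope is the difference between the two means. A small sketch checking this, and recomputing R^2 as one minus the ratio of residual to total sum of squares, with KPL rebuilt from the built-in mtcars under an assumed conversion factor (names are hypothetical):

```r
# Assumed conversion from US mpg to km/L
kpl_per_mpg <- 1.609344 / 3.785411784
kpl <- datasets::mtcars$mpg * kpl_per_mpg
transmission <- factor(datasets::mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

fit <- lm(kpl ~ transmission)
group_means <- tapply(kpl, transmission, mean)

# intercept = automatic mean; slope = manual mean minus automatic mean
print(coef(fit))
print(group_means)

# R^2 = 1 - residual sum of squares / total sum of squares
r2 <- 1 - sum(residuals(fit)^2) / sum((kpl - mean(kpl))^2)
print(r2)
```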
pairs(cars)
correlations <- cor(cars[,c(1,2,3,4,5,6,11)])
# create correlation plot
corrplot(correlations, method="circle")
sort(correlations["Consumtion.Kpl", ])
## Weight Cylinders Displacement Gross.Horsepower
## -0.8676594 -0.8521620 -0.8475514 -0.7761684
## Quarter.Mile.Time Rear.Axle.Ratio Consumtion.Kpl
## 0.4186840 0.6811719 1.0000000
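sort() orders the correlations by signed value, so positively correlated variables land at the end regardless of how strong they are. To rank the candidate predictors by strength of association alone, one option is to order by absolute value. This sketch rebuilds the same seven numeric columns from the built-in mtcars under an assumed MPG-to-KPL conversion; the names vars and ranked are hypothetical:

```r
m <- datasets::mtcars
# Assumed conversion from US mpg to km/L
kpl_per_mpg <- 1.609344 / 3.785411784

vars <- data.frame(
  Cylinders = m$cyl, Displacement = m$disp, Gross.Horsepower = m$hp,
  Rear.Axle.Ratio = m$drat, Weight = m$wt, Quarter.Mile.Time = m$qsec,
  Consumtion.Kpl = m$mpg * kpl_per_mpg
)

# correlation of each predictor with KPL, excluding KPL itself
r <- cor(vars)[, "Consumtion.Kpl"]
r <- r[names(r) != "Consumtion.Kpl"]

# strongest association first, ignoring the sign
ranked <- r[order(abs(r), decreasing = TRUE)]
print(round(ranked, 4))
```

Several of the strongest predictors (weight, cylinders, displacement) are also strongly correlated with one another, which is worth keeping in mind when choosing which ones to put in the same model.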
final.model <- lm(Consumtion.Kpl ~ Weight + Quarter.Mile.Time + Transmission, data = cars)
summary(final.model)
##
## Call:
## lm(formula = Consumtion.Kpl ~ Weight + Quarter.Mile.Time + Transmission,
## data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4800 -0.6613 -0.3085 0.5999 1.9816
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0889 2.9588 1.382 0.177915
## Weight -1.6651 0.3024 -5.507 6.95e-06 ***
## Quarter.Mile.Time 0.5212 0.1227 4.247 0.000216 ***
## TransmissionManual 1.2482 0.5998 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.045 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
Observation
This model explains 84.97% of the total variance in KPL (adjusted R-squared 83.36%). The overall F-test p-value is 1.21e-11, so we can reject the null hypothesis that none of the predictors is related to KPL. Compared with the simple linear regression model (R-squared 0.36), the multiple regression model describes KPL far better.
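The overall F-test in the summary compares the model against an intercept-only model, not against the transmission-only model. To test directly whether adding weight and quarter mile time improves on the simple model, one option (our addition, not part of the original analysis) is a nested-model F-test with anova(). The sketch rebuilds the variables from the built-in mtcars under an assumed MPG-to-KPL conversion; the data frame d and the model names are hypothetical:

```r
m <- datasets::mtcars
# Assumed conversion from US mpg to km/L
kpl_per_mpg <- 1.609344 / 3.785411784

d <- data.frame(
  kpl = m$mpg * kpl_per_mpg,
  Weight = m$wt,
  Quarter.Mile.Time = m$qsec,
  Transmission = factor(m$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
)

simple <- lm(kpl ~ Transmission, data = d)
full   <- lm(kpl ~ Weight + Quarter.Mile.Time + Transmission, data = d)

# F-test of the two extra terms: does the larger model reduce the RSS significantly?
cmp <- anova(simple, full)
print(cmp)
```

Because the simple model is nested in the full one, the F statistic here measures only the contribution of the two added predictors.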
We see that weight and quarter mile time both affect the relationship between transmission type and KPL (weight most strongly).
Therefore, given the above analysis, the question “Is an automatic or manual transmission better for KPL?” cannot be answered without considering weight and quarter mile time.
To answer the question “Quantify the KPL difference between automatic and manual transmissions”, we refer to the coefficient for manual transmission in the final model: holding weight and quarter mile time constant, manual transmission cars get on average 1.25 KPL (about 2.94 mpg) more than automatic transmission cars (p = 0.047).
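The manual-transmission effect can also be reported with its uncertainty and in the original MPG units. A minimal sketch, assuming the same MPG-to-KPL conversion as above: since KPL is MPG times a constant, dividing the KPL coefficient by that constant gives the MPG coefficient, and confint() gives a 95% confidence interval. Variable names are hypothetical:

```r
m <- datasets::mtcars
# Assumed conversion from US mpg to km/L
kpl_per_mpg <- 1.609344 / 3.785411784

d <- data.frame(
  kpl = m$mpg * kpl_per_mpg,
  Weight = m$wt,
  Quarter.Mile.Time = m$qsec,
  Transmission = factor(m$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
)
full <- lm(kpl ~ Weight + Quarter.Mile.Time + Transmission, data = d)

# effect of manual vs automatic, holding weight and quarter mile time fixed
effect_kpl <- unname(coef(full)["TransmissionManual"])
ci_kpl <- confint(full)["TransmissionManual", ]  # 95% interval by default

# a linear rescaling of the outcome rescales the coefficient by the same factor
effect_mpg <- effect_kpl / kpl_per_mpg
print(c(kpl = effect_kpl, mpg = effect_mpg))
print(ci_kpl)
```

The interval is wide relative to the estimate, which matches the borderline p-value of 0.047.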