The mtcars data were extracted from the 1974 Motor Trend US magazine and cover fuel consumption and 10 aspects of design and performance for 32 automobiles. This study compares the effect of automatic and manual transmission on miles per gallon (MPG) and quantifies the MPG difference between the two. The results show that manual transmission gives higher MPG than automatic: about 7.2 MPG more on average in the unadjusted comparison, and about 2.9 MPG more after adjusting for weight and quarter-mile time.
mtcars has 32 observations on 11 variables:
1 mpg: Miles per gallon
2 cyl: Number of cylinders
3 disp: Displacement
4 hp: Gross horsepower
5 drat: Rear axle ratio
6 wt: Weight (1000 lbs)
7 qsec: 1/4 mile time
8 vs: Engine shape (0 = V-shaped, 1 = straight)
9 am: Transmission (0 = automatic, 1 = manual)
10 gear: Number of forward gears
11 carb: Number of carburetors
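As a quick check of the description above, the following sketch confirms the dimensions of the dataset and counts the cars in each transmission group. The factor labels and the name `transmission` are illustrative assumptions, not part of the original analysis.

```r
data(mtcars)

# Confirm the dataset has 32 rows (cars) and 11 columns (variables)
dim(mtcars)

# Label the 0/1 transmission code for readability (labels are illustrative)
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

# Number of cars in each transmission group
table(transmission)
```

The two groups are unequal in size, which is one reason a Welch test (no equal-variance assumption) is a reasonable default for comparing them.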
Setting Directory
setwd("C:/Users/FARZAD/Desktop/Data Science/Course 7/Project")
getwd()
[1] "C:/Users/FARZAD/Desktop/Data Science/Course 7/Project"
Getting Data & summary
data(mtcars)
summary(mtcars)
      mpg             cyl             disp             hp             drat             wt
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :2.760   Min.   :1.513
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080   1st Qu.:2.581
 Median :19.20   Median :6.000   Median :196.3   Median :123.0   Median :3.695   Median :3.325
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7   Mean   :3.597   Mean   :3.217
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920   3rd Qu.:3.610
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0   Max.   :4.930   Max.   :5.424
      qsec             vs               am              gear            carb
 Min.   :14.50   Min.   :0.0000   Min.   :0.0000   Min.   :3.000   Min.   :1.000
 1st Qu.:16.89   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
 Median :17.71   Median :0.0000   Median :0.0000   Median :4.000   Median :2.000
 Mean   :17.85   Mean   :0.4375   Mean   :0.4062   Mean   :3.688   Mean   :2.812
 3rd Qu.:18.90   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
 Max.   :22.90   Max.   :1.0000   Max.   :1.0000   Max.   :5.000   Max.   :8.000
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Evaluation of MPG according to Transmission
boxplot(mpg ~ am, data = mtcars,
        col = c("green", "pink"),
        names = c("Automatic", "Manual"),
        xlab = "Transmission Type", ylab = "Miles / Gallon",
        main = "MPG by Transmission Type",
        horizontal = FALSE)
Based on the boxplot above, manual transmission appears to give higher MPG than automatic, but an evidence-based conclusion requires a hypothesis test.
H0: Mean MPG for automatic = mean MPG for manual
H1: Mean MPG for automatic ≠ mean MPG for manual
auto <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
t.test(auto, manual)
Welch Two Sample t-test
data: auto and manual
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -11.280194 -3.209684
sample estimates:
mean of x(Automatic) mean of y(Manual)
17.14737 24.39231
Manual transmission shows a higher mean MPG than automatic (24.39 vs. 17.15), so manual cars travel farther on a given amount of fuel. Because the p-value (0.001374) is below 0.05 and the 95% confidence interval for the difference excludes 0, the null hypothesis is rejected.
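To make the test statistic less of a black box, here is a minimal sketch that recomputes the Welch t statistic, its approximate degrees of freedom, and the two-sided p-value by hand. The variable names (`se2_auto`, `df_welch`, and so on) are illustrative and not part of the original analysis.

```r
data(mtcars)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Squared standard error of each group mean
se2_auto   <- var(auto) / length(auto)
se2_manual <- var(manual) / length(manual)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(auto) - mean(manual)) / sqrt(se2_auto + se2_manual)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_auto + se2_manual)^2 /
  (se2_auto^2 / (length(auto) - 1) + se2_manual^2 / (length(manual) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

c(t = t_stat, df = df_welch, p = p_value)
```

These values should agree with the t, df, and p-value reported by t.test() above.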
For the regression analysis, “mpg” is the dependent variable and “am” is the independent variable.
reg_Mod <- lm(mpg ~ am, data = mtcars)
summary(reg_Mod)
Call:
lm(formula = mpg ~ am, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-9.3923 -3.0923 -0.2974 3.2439 9.5077
Coefficients:
Estimate Std.Error t value Pr(>|t|)
(Intercept) 17.147 1.125 15.247 1.13e-15 ***
am(Manual) 7.245 1.764 4.106 0.000285 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.902 on 30 degrees of freedom
Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
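This regression estimates that manual cars average 7.245 MPG more than automatic cars (the intercept, 17.147, is the automatic mean). R-squared is 0.36, so transmission type alone explains about 36% of the variance in MPG.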
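Because “am” is a 0/1 variable, the simple regression is another way of writing the comparison of group means. The sketch below illustrates this; `group_means` is an illustrative name, not part of the original analysis.

```r
data(mtcars)
reg_Mod <- lm(mpg ~ am, data = mtcars)

# Group means of mpg: names "0" (automatic) and "1" (manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# Intercept equals the automatic mean; slope equals manual minus automatic
coef(reg_Mod)
group_means[["1"]] - group_means[["0"]]

# With one predictor, R-squared is the squared correlation of mpg and am
cor(mtcars$mpg, mtcars$am)^2
```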
To evaluate the effect of the other variables on MPG, all of them are added to the model:
reg_total <- lm(mpg ~ ., data = mtcars)
summary(reg_total)
Call:
lm(formula = mpg ~ ., data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.4506 -1.6044 -0.1196 1.2193 4.6271
Coefficients:
Estimate Std.Error t value Pr(>|t|)
(Intercept) 12.30337 18.71788 0.657 0.5181
cyl -0.11144 1.04502 -0.107 0.9161
disp 0.01334 0.01786 0.747 0.4635
hp -0.02148 0.02177 -0.987 0.3350
drat 0.78711 1.63537 0.481 0.6353
wt -3.71530 1.89441 -1.961 0.0633 .
qsec 0.82104 0.73084 1.123 0.2739
vs 0.31776 2.10451 0.151 0.8814
am 2.52023 2.05665 1.225 0.2340
gear 0.65541 1.49326 0.439 0.6652
carb -0.19942 0.82875 -0.241 0.8122
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.65 on 21 degrees of freedom
Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
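With all other variables included, manual still has higher MPG, but the estimated difference drops to 2.52 MPG. The model explains 86.9% of the variance (R-squared = 0.869), yet no individual coefficient is significant at the 5% level (“wt” comes closest, p = 0.0633). A likely reason is that many of the predictors are strongly correlated with one another.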
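One possible explanation for a high R-squared with no individually significant coefficient is collinearity among the predictors. As a hedged sketch (not part of the original analysis), variance inflation factors (VIFs) can be computed in base R from the predictor correlation matrix; values above roughly 5 to 10 are often read as a sign of strong collinearity. The names `predictors`, `vif`, and `r2_disp` are illustrative.

```r
data(mtcars)

# Predictors only (drop the response, mpg)
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]

# VIFs are the diagonal of the inverse of the predictor correlation matrix
vif <- diag(solve(cor(predictors)))
round(sort(vif, decreasing = TRUE), 2)

# Equivalent definition for one predictor: 1 / (1 - R^2) from regressing
# that predictor on all the others
r2_disp <- summary(lm(disp ~ . - mpg, data = mtcars))$r.squared
1 / (1 - r2_disp)
```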
Stepwise regression is then used to select the best subset of variables.
reg_stepwise <- step(reg_total, trace = 0)
summary(reg_stepwise)
Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.4811 -1.5555 -0.7257 1.4110 4.6610
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.6178 6.9596 1.382 0.177915
wt -3.9165 0.7112 -5.507 6.95e-06 ***
qsec 1.2259 0.2887 4.247 0.000216 ***
am 2.9358 1.4109 2.081 0.046716 *
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
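Stepwise regression (step() selects by AIC) retains “wt”, “qsec” and “am” as the variables that best explain MPG. This model explains 84.9% of the variance, and all three coefficients are significant at the 5% level. “wt” and “qsec” show the strongest evidence, while “am” is the weakest of the three (p = 0.0467): holding weight and quarter-mile time fixed, manual cars average about 2.94 MPG more than automatic cars.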
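To see what the stepwise selection optimised and how precisely each effect is estimated, the following sketch compares the AIC of the full and selected models and prints 95% confidence intervals for the selected coefficients. It refits the models so that it runs on its own; nothing here goes beyond base R.

```r
data(mtcars)
reg_total    <- lm(mpg ~ ., data = mtcars)
reg_stepwise <- step(reg_total, trace = 0)

# step() keeps dropping terms while doing so lowers AIC,
# so the selected model should have the smaller AIC
AIC(reg_total)
AIC(reg_stepwise)

# 95% confidence intervals for the coefficients of the selected model
confint(reg_stepwise)
```

An interval for “am” whose lower limit sits close to 0 is consistent with its p-value being only just below 0.05.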
anova(reg_Mod,reg_stepwise,reg_total)
Analysis of Variance Table
Model 1: mpg ~ am
Model 2: mpg ~ wt + qsec + am
Model 3: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
Res.Df RSS Df Sum of Sq F Pr(>F)
1 30 720.90
2 28 169.29 2 551.61 39.2687 8.025e-08 ***
3 21 147.49 7 21.79 0.4432 0.8636
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion:
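The ANOVA shows that Model 2 (“wt”, “qsec”, “am”) fits significantly better than Model 1 (p = 8.025e-08), while adding the remaining seven variables in Model 3 gives no significant further improvement (p = 0.8636). Model 2 is therefore the best choice for explaining MPG.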
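The F statistics in the table can be reproduced from the residual sums of squares. In this sketch the three-variable model is fitted directly from its formula rather than through step(), and `rss`, `rdf`, `ms_resid`, `f_stat`, and `p_val` are illustrative names.

```r
data(mtcars)
reg_Mod      <- lm(mpg ~ am, data = mtcars)
reg_stepwise <- lm(mpg ~ wt + qsec + am, data = mtcars)
reg_total    <- lm(mpg ~ ., data = mtcars)

# Residual sums of squares and residual degrees of freedom
rss <- c(deviance(reg_Mod), deviance(reg_stepwise), deviance(reg_total))
rdf <- c(df.residual(reg_Mod), df.residual(reg_stepwise), df.residual(reg_total))

# anova() scales each comparison by the residual mean square of the largest model
ms_resid <- rss[3] / rdf[3]

# F = (drop in RSS / number of added terms) / residual mean square
f_stat <- (-diff(rss) / -diff(rdf)) / ms_resid
p_val  <- pf(f_stat, df1 = -diff(rdf), df2 = rdf[3], lower.tail = FALSE)

rbind(F = f_stat, p = p_val)
```

The first column corresponds to Model 2 versus Model 1 and the second to Model 3 versus Model 2.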
To evaluate the residuals, the residuals-versus-fitted plot of the best model (“wt”, “qsec”, “am”) is drawn:
plot(reg_stepwise, which = 1)
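As a numerical companion to the residual plot, this sketch checks two properties that least-squares residuals have by construction and lists the cars with the largest absolute residuals. The model is refitted from its formula so the block runs on its own; `res` is an illustrative name.

```r
data(mtcars)
reg_stepwise <- lm(mpg ~ wt + qsec + am, data = mtcars)
res <- residuals(reg_stepwise)

# Least-squares residuals average to zero and are uncorrelated
# with the fitted values (up to floating-point error)
mean(res)
cor(fitted(reg_stepwise), res)

# The three cars with the largest absolute residuals
head(sort(abs(res), decreasing = TRUE), 3)
```

Since the linear trend is removed by construction, what matters in the plot is any remaining curvature or change in spread across the fitted values.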
cor(mtcars)[1,]
       mpg        cyl       disp         hp       drat         wt       qsec         vs         am
 1.0000000 -0.8521620 -0.8475514 -0.7761684  0.6811719 -0.8676594  0.4186840  0.6640389  0.5998324
      gear       carb
 0.4802848 -0.5509251
res_all <- lm(mpg ~ wt+hp+disp+cyl+am, data = mtcars)
par(mfrow = c(1, 1))
plot(res_all)
pairs(mtcars)