setup
## Warning: package 'ggplot2' was built under R version 3.3.3
## Warning: package 'dplyr' was built under R version 3.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
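The code of the setup chunk is not shown in this output, so the definitions of df2 and df3 have to be inferred. A minimal sketch, assuming df2 holds the automatic-transmission cars (am == 0) and df3 the manual-transmission cars (am == 1), which is consistent with the plot legends and with the 19 and 13 observations implied by the regression summaries at the end of this report. Base R `subset()` is used here to keep the sketch dependency-free; the original chunk may well have used dplyr instead.

```r
# Assumption: df2 = automatic cars, df3 = manual cars (not confirmed by the source)
df2 <- subset(mtcars, am == 0)  # automatic transmission
df3 <- subset(mtcars, am == 1)  # manual transmission

# Group sizes, for comparison with the degrees of freedom reported later
nrow(df2)
nrow(df3)
```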
Automatic transmissions are often standard on large, heavy (wt), powerful (hp) cars with big engines (disp), while manual transmissions are more common on small, basic, underpowered cars, so a direct comparison of cars with automatic and manual transmissions could be misleading. I therefore adjusted for differences in horsepower, engine displacement and overall car weight. The assumption that mpg is affected by horsepower, engine displacement and weight is tested and confirmed below. First, let’s look at a simple boxplot of mpg vs. transmission type, with no adjustment for other factors.
boxplot(mpg ~ am, data = mtcars, col = (c("magenta","green")), ylab = "mpg", xlab = "Type of Transmission")

The boxplot above shows that, among the cars tested, those with manual transmissions (am = 1) have markedly higher mpg ratings than those with automatic transmissions (am = 0).
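A boxplot alone cannot show whether the gap is statistically significant. One way to check it, offered here only as a sketch and not as part of the original analysis, is a Welch two-sample t-test on the unadjusted mpg values:

```r
# Welch two-sample t-test of mpg by transmission type
# (am: 0 = automatic, 1 = manual in mtcars)
tt <- t.test(mpg ~ am, data = mtcars)

tt$estimate  # mean mpg in each group
tt$p.value   # p-value for the difference in means
```

Like the boxplot, this test ignores horsepower, displacement and weight, so it says nothing about the adjusted comparison that follows.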
Next, let’s look at scatterplots of mpg against hp, disp and wt, each with linear regression lines fitted separately for automatic transmissions (data frame df2) and manual transmissions (data frame df3).
# Scatterplot of mpg vs. hp with a separate linear fit for each transmission type
# (df2 drawn in red, df3 drawn in green)
g1 <- ggplot() +
  geom_point(data = df2, aes(x = hp, y = mpg), color = "red", alpha = 0.6) +
  geom_smooth(data = df2, aes(x = hp, y = mpg), method = "lm", color = "darkred", se = FALSE) +
  geom_point(data = df3, aes(x = hp, y = mpg), color = "darkgreen", alpha = 0.8) +
  geom_smooth(data = df3, aes(x = hp, y = mpg), method = "lm", color = "darkgreen", se = FALSE) +
  labs(title = "Effect of Transmission Type & Horsepower on MPG",
       subtitle = "Red = Automatic Transmission; Green = Manual Transmission",
       x = "Horsepower", y = "Miles per Gallon")
g1

# Scatterplot of mpg vs. disp with a separate linear fit for each transmission type
# (df2 drawn in red, df3 drawn in green)
g2 <- ggplot() +
  geom_point(data = df2, aes(x = disp, y = mpg), color = "red", alpha = 0.6) +
  geom_smooth(data = df2, aes(x = disp, y = mpg), method = "lm", color = "darkred", se = FALSE) +
  geom_point(data = df3, aes(x = disp, y = mpg), color = "darkgreen", alpha = 0.8) +
  geom_smooth(data = df3, aes(x = disp, y = mpg), method = "lm", color = "darkgreen", se = FALSE) +
  labs(title = "Effect of Transmission Type & Engine Displacement on MPG",
       subtitle = "Red = Automatic Transmission; Green = Manual Transmission",
       x = "Engine Displacement", y = "Miles per Gallon")
g2

# Scatterplot of mpg vs. wt with a separate linear fit for each transmission type
# (df2 drawn in red, df3 drawn in green)
g3 <- ggplot() +
  geom_point(data = df2, aes(x = wt, y = mpg), color = "red", alpha = 0.6) +
  geom_smooth(data = df2, aes(x = wt, y = mpg), method = "lm", color = "darkred", se = FALSE) +
  geom_point(data = df3, aes(x = wt, y = mpg), color = "darkgreen", alpha = 0.8) +
  geom_smooth(data = df3, aes(x = wt, y = mpg), method = "lm", color = "darkgreen", se = FALSE) +
  labs(title = "Effect of Transmission Type & Weight of Car on MPG",
       subtitle = "Red = Automatic Transmission; Green = Manual Transmission",
       x = "Weight of Car", y = "Miles per Gallon")
g3

The plots above appear to show that manual transmissions get better mileage than automatic transmissions, even when accounting for differences in horsepower, displacement and weight. Let’s look further and try to quantify the differences.
Assuming that mpg is affected by horsepower, engine displacement and weight, let’s compare average mpg for automatic vs. manual transmissions, scaled in turn by average hp, disp and wt. Each ratio below is ten times the group’s mean mpg divided by the group’s mean of the adjusting variable.
avgauto<-round((mean(df2$mpg)*10)/mean(df2$hp),5)
avgman<-round((mean(df3$mpg)*10)/mean(df3$hp),5)
a1<-c("Auto mpg/hp",avgauto,"vrs","Man mpg/hp",avgman)
avgauto2<-round((mean(df2$mpg)*10)/mean(df2$disp),5)
avgman2<-round((mean(df3$mpg)*10)/mean(df3$disp),5)
a2<-c("Auto mpg/disp",avgauto2,"vrs","Man mpg/disp",avgman2)
avgauto3<-round((mean(df2$mpg)*10)/mean(df2$wt),5)
avgman3<-round((mean(df3$mpg)*10)/mean(df3$wt),5)
a3<-c("Auto mpg/wt",avgauto3,"vrs","Man mpg/wt",avgman3)
a1
## [1] "Auto mpg/hp" "1.06995" "vrs" "Man mpg/hp" "1.92298"
a2
## [1] "Auto mpg/disp" "0.59052" "vrs" "Man mpg/disp"
## [5] "1.69945"
a3
## [1] "Auto mpg/wt" "45.49707" "vrs" "Man mpg/wt" "101.17092"
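The three pairs of ratios above all follow one pattern, which can be written once and applied to each variable. A minimal sketch, in which `scaled_ratio`, `auto_cars` and `manual_cars` are hypothetical names introduced for illustration, and the two subsets are rebuilt from mtcars on the assumption that df2 is the automatic group and df3 the manual group:

```r
auto_cars   <- subset(mtcars, am == 0)  # assumed equivalent to df2
manual_cars <- subset(mtcars, am == 1)  # assumed equivalent to df3

# Hypothetical helper: 10 * mean(mpg) / mean(variable), rounded to 5 decimals
scaled_ratio <- function(df, var) {
  round(mean(df$mpg) * 10 / mean(df[[var]]), 5)
}

vars <- c("hp", "disp", "wt")
ratios <- data.frame(
  variable = vars,
  auto     = vapply(vars, function(v) scaled_ratio(auto_cars, v), numeric(1), USE.NAMES = FALSE),
  manual   = vapply(vars, function(v) scaled_ratio(manual_cars, v), numeric(1), USE.NAMES = FALSE)
)
ratios
```

Keeping the values numeric in a data frame, instead of mixing them with labels in a character vector, makes them directly usable in later calculations.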
The above ratios of mpg/hp, mpg/disp and mpg/wt indicate that the cars tested with manual transmissions get substantially better mpg, relative to their horsepower, displacement and weight, than the cars tested with automatic transmissions. The code below computes the percentage difference (how much higher the manual ratio is than the automatic ratio), with the following results:
1) Relative to horsepower, mpg for manual transmissions is 79.7% higher than for automatic transmissions.
2) Relative to engine displacement, mpg for manual transmissions is 187.8% higher than for automatic transmissions.
3) Relative to weight, mpg for manual transmissions is 122.4% higher than for automatic transmissions.
mpg_hp<-round((avgman-avgauto)/avgauto,3)
mpg_disp<-round((avgman2-avgauto2)/avgauto2,3)
mpg_wt<-round((avgman3-avgauto3)/avgauto3,3)
c("Percentage higher avg mpg for manual vrs auto adjusted for hp",mpg_hp*100)
## [1] "Percentage higher avg mpg for manual vrs auto adjusted for hp"
## [2] "79.7"
c("Percentage higher avg mpg for manual vrs auto adjusted for disp",mpg_disp*100)
## [1] "Percentage higher avg mpg for manual vrs auto adjusted for disp"
## [2] "187.8"
c("Percentage higher avg mpg for manual vrs auto adjusted for wt",mpg_wt*100)
## [1] "Percentage higher avg mpg for manual vrs auto adjusted for wt"
## [2] "122.4"
The assignment made the following statements:
1) When measuring MPG, manual transmissions perform better than automatic transmissions by 7.25 MPG; however, this single factor only accounts for 36% of the explanation.
2) When measuring MPG, manual transmissions provide an additional 1.48 MPG of performance over automatic transmissions when taking into account three additional explanatory variables (cylinders, horsepower & weight); these additional factors account for 85% of the explanation.
Let’s look at the above statements:
statement1<-mean(df3$mpg)-mean(df2$mpg)
statement1
## [1] 7.244939
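The difference in means covers only the first half of statement 1. The 36% figure can be checked with a sketch like the following, assuming that "accounts for 36% of the explanation" refers to the R-squared of a regression of mpg on transmission type alone:

```r
# am is coded 0 = automatic, 1 = manual in mtcars, so the slope on am
# equals the manual-minus-automatic difference in mean mpg
fit_am <- lm(mpg ~ am, data = mtcars)

coef(fit_am)["am"]         # difference in mean mpg
summary(fit_am)$r.squared  # share of mpg variance explained by am alone
```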
man<-lm(mpg~cyl+hp+wt,df3)
auto<-lm(mpg~cyl+hp+wt,df2)
summary(man)
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt, data = df3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0259 -1.2918 -0.9867 0.2744 5.5044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.200980 4.607628 9.810 4.2e-06 ***
## cyl -0.484090 1.366919 -0.354 0.7314
## hp -0.007558 0.023131 -0.327 0.7513
## wt -7.213709 2.579060 -2.797 0.0208 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.855 on 9 degrees of freedom
## Multiple R-squared: 0.8392, Adjusted R-squared: 0.7856
## F-statistic: 15.65 on 3 and 9 DF, p-value: 0.0006465
summary(auto)
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt, data = df2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2693 -1.6016 -0.3866 1.3423 3.3022
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.73699 2.73120 11.986 4.39e-09 ***
## cyl -0.71841 0.54972 -1.307 0.2109
## hp -0.02429 0.01712 -1.419 0.1764
## wt -1.77914 0.79525 -2.237 0.0409 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.918 on 15 degrees of freedom
## Multiple R-squared: 0.7913, Adjusted R-squared: 0.7496
## F-statistic: 18.96 on 3 and 15 DF, p-value: 2.303e-05
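Separate models per transmission type do not by themselves yield a single manual-versus-automatic effect. One way to get a figure comparable to statement 2 is a single model on the full data with am as an additional predictor. This is only a sketch, assuming the statement refers to such a pooled model with cyl treated as a numeric variable:

```r
# Pooled model: transmission type plus the three explanatory variables
# named in statement 2 (cylinders, horsepower, weight)
fit_all <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)

coef(fit_all)["am"]         # adjusted manual-minus-automatic difference in mpg
summary(fit_all)$r.squared  # share of mpg variance explained by the model
```

The am coefficient and the R-squared from this model can be compared directly with the 1.48 MPG and 85% quoted in the assignment.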
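The statements appear to be true: the difference in mean mpg is 7.24, and the separate models with cylinders, horsepower and weight explain about 84% (manual) and 79% (automatic) of the variance in mpg.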