In this project I analyze fuel economy data for 38 popular car models, comparing the 1999 and 2008 model years.
The questions I try to answer:
1. Best Cars in 1999 & 2008
2. Mean city vs. highway mileage
3. 1999 cars cylinder-wise & drive-wise mean mileage for city
4. 1999 cars cylinder-wise & drive-wise mean mileage for highway
5. 2008 cars cylinder-wise & drive-wise mean mileage for city
6. 2008 cars cylinder-wise & drive-wise mean mileage for highway
This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.
mpg is a data frame with 234 rows and 11 variables.
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.6.3
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
View(mpg)
str(mpg)
## tibble [234 x 11] (S3: tbl_df/tbl/data.frame)
## $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
## $ model : chr [1:234] "a4" "a4" "a4" "a4" ...
## $ displ : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr [1:234] "f" "f" "f" "f" ...
## $ cty : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr [1:234] "p" "p" "p" "p" ...
## $ class : chr [1:234] "compact" "compact" "compact" "compact" ...
dat99<-mpg[mpg$year==1999,]
View(dat99)
dat08<-mpg[mpg$year==2008,]
View(dat08)
f99<-dat99[dat99$drv=="f" & dat99$cyl==4,]
f08<-dat08[dat08$drv=="f" & dat08$cyl==4,]
print("Mean-Mileage in city of a '99 fwd car having 4 cylinders: ")
## [1] "Mean-Mileage in city of a '99 fwd car having 4 cylinders: "
mean(f99$cty)
## [1] 22.06061
print("Mean-Mileage in city of a '08 fwd car having 4 cylinders: ")
## [1] "Mean-Mileage in city of a '08 fwd car having 4 cylinders: "
mean(f08$cty)
## [1] 22.08
print("Mean-Improvement: ")
## [1] "Mean-Improvement: "
mean(f08$cty)-mean(f99$cty)
## [1] 0.01939394
print("Mean-Mileage on highway of a '99 fwd car having 4 cylinders: ")
## [1] "Mean-Mileage on highway of a '99 fwd car having 4 cylinders: "
mean(f99$hwy)
## [1] 30.09091
print("Mean-Mileage on highway of a '08 fwd car having 4 cylinders: ")
## [1] "Mean-Mileage on highway of a '08 fwd car having 4 cylinders: "
mean(f08$hwy)
## [1] 30.96
print("Mean-Improvement: ")
## [1] "Mean-Improvement: "
mean(f08$hwy)-mean(f99$hwy)
## [1] 0.8690909
From the above we can conclude that the mean mileage of 4-cylinder fwd cars barely changed in the city (about +0.02 mpg) and improved slightly on the highway (about +0.87 mpg). Similar comparisons can be made for 6- and 8-cylinder cars.
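The same comparison can be run for every cylinder and drive-train combination at once instead of subsetting by hand. The following is a minimal sketch using dplyr's group_by() and summarise(); the names mileage_by_group, n_cars, mean_cty and mean_hwy are illustrative and not part of the original analysis.

```r
library(dplyr)
library(ggplot2)  # only needed here for the mpg data frame

# Mean city and highway mileage for each year / cylinder / drive combination
mileage_by_group <- mpg %>%
  group_by(year, cyl, drv) %>%
  summarise(n_cars = n(), mean_cty = mean(cty), mean_hwy = mean(hwy)) %>%
  ungroup()

# Rows for 4-cylinder fwd cars, one per model year
filter(mileage_by_group, cyl == 4, drv == "f")
```

The n_cars column is worth checking before reading too much into a mean, since some combinations contain only a handful of cars.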
print("Mean-Mileage in city of a '99 car:")
## [1] "Mean-Mileage in city of a '99 car:"
mean(dat99$cty)
## [1] 17.01709
print("Mean-Mileage on highway of a '99 car")
## [1] "Mean-Mileage on highway of a '99 car"
mean(dat99$hwy)
## [1] 23.42735
dat99<-mutate(dat99,cty_diff=dat99$cty-mean(dat99$cty),hwy_diff=dat99$hwy-mean(dat99$hwy))
print("Best Cars in '99: ")
## [1] "Best Cars in '99: "
head(dat99[order(dat99$cty_diff,decreasing=T),],n=5)
## # A tibble: 5 x 13
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 volkswagen new ~ 1.9 1999 4 manu~ f 35 44 d subc~
## 2 volkswagen jetta 1.9 1999 4 manu~ f 33 44 d comp~
## 3 volkswagen new ~ 1.9 1999 4 auto~ f 29 41 d subc~
## 4 honda civic 1.6 1999 4 manu~ f 28 33 r subc~
## 5 toyota coro~ 1.8 1999 4 manu~ f 26 35 r comp~
## # ... with 2 more variables: cty_diff <dbl>, hwy_diff <dbl>
head(dat99[order(dat99$hwy_diff,decreasing=T),],n=5)
## # A tibble: 5 x 13
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 volkswagen jetta 1.9 1999 4 manu~ f 33 44 d comp~
## 2 volkswagen new ~ 1.9 1999 4 manu~ f 35 44 d subc~
## 3 volkswagen new ~ 1.9 1999 4 auto~ f 29 41 d subc~
## 4 toyota coro~ 1.8 1999 4 manu~ f 26 35 r comp~
## 5 honda civic 1.6 1999 4 manu~ f 28 33 r subc~
## # ... with 2 more variables: cty_diff <dbl>, hwy_diff <dbl>
From the above, Volkswagen’s New Beetle and Jetta (both diesel, fl = d) have the best city and highway mileage among the 1999 cars.
print("Mean-Mileage in city of a '08 car:")
## [1] "Mean-Mileage in city of a '08 car:"
mean(dat08$cty)
## [1] 16.70085
print("Mean-Mileage on highway of a '08 car")
## [1] "Mean-Mileage on highway of a '08 car"
mean(dat08$hwy)
## [1] 23.45299
dat08<-mutate(dat08,cty_diff=dat08$cty-mean(dat08$cty),hwy_diff=dat08$hwy-mean(dat08$hwy))
print("Best Cars in '08: ")
## [1] "Best Cars in '08: "
head(dat08[order(dat08$cty_diff,decreasing=T),],n=5)
## # A tibble: 5 x 13
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 toyota coro~ 1.8 2008 4 manu~ f 28 37 r comp~
## 2 honda civic 1.8 2008 4 manu~ f 26 34 r subc~
## 3 toyota coro~ 1.8 2008 4 auto~ f 26 35 r comp~
## 4 honda civic 1.8 2008 4 auto~ f 25 36 r subc~
## 5 honda civic 1.8 2008 4 auto~ f 24 36 c subc~
## # ... with 2 more variables: cty_diff <dbl>, hwy_diff <dbl>
head(dat08[order(dat08$hwy_diff,decreasing=T),],n=5)
## # A tibble: 5 x 13
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 toyota coro~ 1.8 2008 4 manu~ f 28 37 r comp~
## 2 honda civic 1.8 2008 4 auto~ f 25 36 r subc~
## 3 honda civic 1.8 2008 4 auto~ f 24 36 c subc~
## 4 toyota coro~ 1.8 2008 4 auto~ f 26 35 r comp~
## 5 honda civic 1.8 2008 4 manu~ f 26 34 r subc~
## # ... with 2 more variables: cty_diff <dbl>, hwy_diff <dbl>
From the above, Toyota’s Corolla and Honda’s Civic have the best city and highway mileage among the 2008 cars.
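Because cty_diff and hwy_diff only subtract a constant (the yearly mean), sorting by them gives the same order as sorting by cty or hwy directly. A small helper can therefore rank the cars of any year without the extra columns. This is a sketch only; top_mileage and its arguments are hypothetical names introduced here for illustration.

```r
library(ggplot2)  # provides the mpg data frame

# Return the n cars with the highest value of `column` ("cty" or "hwy") in year `yr`
top_mileage <- function(data, yr, column, n = 5) {
  one_year <- data[data$year == yr, ]
  ranked <- one_year[order(one_year[[column]], decreasing = TRUE), ]
  head(ranked[, c("manufacturer", "model", "trans", "cty", "hwy", "fl")], n)
}

top_mileage(mpg, 1999, "cty")
top_mileage(mpg, 2008, "hwy")
```

Selecting only a few columns also keeps the printed tibble from truncating the model and transmission names.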
cyl_cty<-aggregate(x=mpg$cty,by=list(mpg$cyl),FUN=mean)
cyl_cty
## Group.1 x
## 1 4 21.01235
## 2 5 20.50000
## 3 6 16.21519
## 4 8 12.57143
p1<-ggplot(data=cyl_cty, aes(x=Group.1, y=x,fill=Group.1)) +
geom_bar(stat="identity")+
geom_text(aes(label=signif(x,digits=4)), vjust=-0.2, size=3.5)+
xlab("Cylinders in Car")+ylab("Mean-Mileage in City")
p1
cyl_hwy<-aggregate(x=mpg$hwy,by=list(mpg$cyl),FUN=mean)
cyl_hwy
## Group.1 x
## 1 4 28.80247
## 2 5 28.75000
## 3 6 22.82278
## 4 8 17.62857
p2<-ggplot(data=cyl_hwy, aes(x=Group.1, y=x,fill=Group.1)) +
geom_bar(stat="identity")+
geom_text(aes(label=signif(x,digits=4)), vjust=-0.2, size=3.5)+
xlab("Cylinders in Car")+ylab("Mean-Mileage on Highway")
p2
grid.arrange(p1,p2)
From the above, 4- and 5-cylinder cars have higher mean mileage than 6- and 8-cylinder cars, both in the city and on the highway.
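Both per-cylinder tables can also be produced with a single call by using the formula interface of aggregate(). The sketch below assumes nothing beyond base R and the mpg data; cyl_means and hwy_minus_cty are illustrative names.

```r
library(ggplot2)  # provides the mpg data frame

# Mean city and highway mileage per cylinder count, in one table
cyl_means <- aggregate(cbind(cty, hwy) ~ cyl, data = mpg, FUN = mean)

# Highway advantage over city driving for each cylinder count
cyl_means$hwy_minus_cty <- cyl_means$hwy - cyl_means$cty
cyl_means
```

The resulting columns are named cyl, cty and hwy rather than Group.1 and x, which makes later plotting code easier to read.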
cyl_4<-dat99[dat99$cyl==4,]
cyl_4_drv<-aggregate(x=cyl_4$cty,by=list(cyl_4$drv),FUN=mean)
cyl_6<-dat99[dat99$cyl==6,]
cyl_6_drv<-aggregate(x=cyl_6$cty,by=list(cyl_6$drv),FUN=mean)
cyl_8<-dat99[dat99$cyl==8,]
cyl_8_drv<-aggregate(x=cyl_8$cty,by=list(cyl_8$drv),FUN=mean)
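# 1999 has no 4-cylinder rwd cars: append a zero row in the rwd slot
cyl_4_drv[3,]<-0
# 1999 has no 8-cylinder fwd cars: insert the zero row in the fwd slot so the rwd mean is not labelled fwd
cyl_8_drv<-rbind(cyl_8_drv[1,],data.frame(Group.1="f",x=0,stringsAsFactors=FALSE),cyl_8_drv[2,])
data1<-data.frame(Cyl=rep(c(4,6,8),each=3),Drv=rep(c("4wd","fwd","rwd"),times=3),Mil=c(cyl_4_drv$x,cyl_6_drv$x,cyl_8_drv$x))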
p3<-ggplot(data=data1, aes(x=Cyl, y=Mil, fill=Drv)) +
geom_bar(stat="identity", position=position_dodge())+
xlab("Cylinders in Car")+ylab("Mean-Mileage in City")+
ggtitle("1999 Cars Mean-City Mileage")
p3
From the above, for 1999 city mileage: among 4-cylinder cars, front-wheel drive does best; among 6-cylinder cars, rear-wheel drive does best; and among 8-cylinder cars, rear-wheel drive does best (the 1999 data has no 8-cylinder front-wheel-drive cars).
cyl_4<-dat99[dat99$cyl==4,]
cyl_4_drv<-aggregate(x=cyl_4$hwy,by=list(cyl_4$drv),FUN=mean)
cyl_6<-dat99[dat99$cyl==6,]
cyl_6_drv<-aggregate(x=cyl_6$hwy,by=list(cyl_6$drv),FUN=mean)
cyl_8<-dat99[dat99$cyl==8,]
cyl_8_drv<-aggregate(x=cyl_8$hwy,by=list(cyl_8$drv),FUN=mean)
cyl_4_drv
## Group.1 x
## 1 4 23.66667
## 2 f 30.09091
cyl_6_drv
## Group.1 x
## 1 4 18.68421
## 2 f 24.91667
## 3 r 25.50000
cyl_8_drv
## Group.1 x
## 1 4 15.77778
## 2 r 19.55556
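# 1999 has no 4-cylinder rwd cars: append a zero row in the rwd slot
cyl_4_drv[3,]<-0
# 1999 has no 8-cylinder fwd cars: insert the zero row in the fwd slot so the rwd mean is not labelled fwd
cyl_8_drv<-rbind(cyl_8_drv[1,],data.frame(Group.1="f",x=0,stringsAsFactors=FALSE),cyl_8_drv[2,])
data2<-data.frame(Cyl=rep(c(4,6,8),each=3),Drv=rep(c("4wd","fwd","rwd"),times=3),Mil=c(cyl_4_drv$x,cyl_6_drv$x,cyl_8_drv$x))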
p4<-ggplot(data=data2, aes(x=Cyl, y=Mil, fill=Drv)) +
geom_bar(stat="identity", position=position_dodge())+
xlab("Cylinders in Car")+ylab("Mean-Mileage on Highway")+
ggtitle("1999 Cars Mean-Highway Mileage")
p4
grid.arrange(p3,p4)
From the above, the 1999 highway picture matches the city one: front-wheel drive is best among 4-cylinder cars (30.09 vs 23.67 for 4wd), rear-wheel drive among 6-cylinder cars (25.50 vs 24.92 for fwd), and rear-wheel drive among 8-cylinder cars (19.56 vs 15.78 for 4wd; there are no 8-cylinder fwd cars in the 1999 data).
Also, a 4-cylinder fwd car gives far better mileage than any 6- or 8-cylinder configuration. This is expected, since engines with more cylinders generally burn more fuel.
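Padding the missing cylinder and drive combinations by hand makes it easy to put a mean in the wrong drive slot. One way to avoid that is to build the full grid of combinations first and merge the group means onto it. The helper below is a sketch under that assumption; mean_by_cyl_drv and the mileage column are hypothetical names, not part of the original code.

```r
library(ggplot2)  # provides mpg and the plotting functions

# Mean of `column` ("cty" or "hwy") for every cyl x drv pair in year `yr`;
# combinations with no cars get 0, matching the zero-filled bars used above
mean_by_cyl_drv <- function(data, yr, column) {
  d <- data[data$year == yr, ]
  means <- aggregate(d[[column]], by = list(cyl = d$cyl, drv = d$drv), FUN = mean)
  names(means)[3] <- "mileage"
  grid <- expand.grid(cyl = sort(unique(d$cyl)), drv = c("4", "f", "r"),
                      stringsAsFactors = FALSE)
  out <- merge(grid, means, by = c("cyl", "drv"), all.x = TRUE)
  out$mileage[is.na(out$mileage)] <- 0
  out
}

hwy99 <- mean_by_cyl_drv(mpg, 1999, "hwy")
ggplot(hwy99, aes(x = factor(cyl), y = mileage, fill = drv)) +
  geom_col(position = "dodge") +
  xlab("Cylinders in Car") + ylab("Mean-Mileage on Highway")
```

Because each mean stays attached to its own drv value throughout, the bar labels cannot drift out of step with the data.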
cyl_4<-dat08[dat08$cyl==4,]
cyl_4_drv<-aggregate(x=cyl_4$cty,by=list(cyl_4$drv),FUN=mean)
cyl_4_drv
## Group.1 x
## 1 4 19.27273
## 2 f 22.08000
cyl_4_drv[3,]<-0
cyl_5<-dat08[dat08$cyl==5,]
cyl_5_drv<-aggregate(x=cyl_5$cty,by=list(cyl_5$drv),FUN=mean)
cyl_5_drv
## Group.1 x
## 1 f 20.5
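# All 2008 5-cylinder cars are fwd: put the mean in the fwd slot, zeros for 4wd and rwd
cyl_5_drv<-data.frame(Group.1=c("4","f","r"),x=c(0,cyl_5_drv$x,0))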
cyl_6<-dat08[dat08$cyl==6,]
cyl_6_drv<-aggregate(x=cyl_6$cty,by=list(cyl_6$drv),FUN=mean)
cyl_6_drv
## Group.1 x
## 1 4 15.15385
## 2 f 17.26316
## 3 r 16.50000
# Subset the 2008 cars by their own cyl column (a dat99$cyl mask would pick the wrong rows)
cyl_8<-dat08[dat08$cyl==8,]
cyl_8_drv<-aggregate(x=cyl_8$cty,by=list(cyl_8$drv),FUN=mean)
cyl_8_drv
data3<-data.frame(Cyl=rep(c(4,5,6,8),each=3),Drv=rep(c("4wd","fwd","rwd"),times=4),Mil=c(cyl_4_drv$x,cyl_5_drv$x,cyl_6_drv$x,cyl_8_drv$x))
p5<-ggplot(data=data3, aes(x=Cyl, y=Mil, fill=Drv)) +
geom_bar(stat="identity", position=position_dodge())+
xlab("Cylinders in Car")+ylab("Mean-Mileage in City")+
ggtitle("2008 Cars Mean-City Mileage")
p5
From the above, for 2008 city mileage: front-wheel drive does best among 4-cylinder cars (22.08 vs 19.27 for 4wd) and among 6-cylinder cars (17.26 vs 16.50 for rwd and 15.15 for 4wd). All 5-cylinder cars in the 2008 data are front-wheel drive (mean 20.5), so there is no drive-train choice to make there. Among 8-cylinder cars, front-wheel drive also comes out on top.
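The best drive train for each cylinder count can also be read off programmatically rather than from the bar chart. The following sketch picks, for each cylinder count in the 2008 data, the drive train with the highest mean city mileage; means08 and best_drv are illustrative names.

```r
library(ggplot2)  # provides the mpg data frame

dat08 <- mpg[mpg$year == 2008, ]

# Mean city mileage for every cyl x drv combination that actually occurs
means08 <- aggregate(cty ~ cyl + drv, data = dat08, FUN = mean)

# For each cylinder count, keep the row with the highest mean
best_drv <- do.call(rbind, lapply(split(means08, means08$cyl), function(g) {
  g[which.max(g$cty), ]
}))
best_drv
```

Only combinations present in the data take part in the comparison, so no zero-filled placeholder rows are needed.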
cyl_4<-dat08[dat08$cyl==4,]
cyl_4_drv<-aggregate(x=cyl_4$hwy,by=list(cyl_4$drv),FUN=mean)
cyl_4_drv
## Group.1 x
## 1 4 25.63636
## 2 f 30.96000
cyl_4_drv[3,]<-0
cyl_5<-dat08[dat08$cyl==5,]
cyl_5_drv<-aggregate(x=cyl_5$hwy,by=list(cyl_5$drv),FUN=mean)
cyl_5_drv
## Group.1 x
## 1 f 28.75
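# All 2008 5-cylinder cars are fwd: put the mean in the fwd slot, zeros for 4wd and rwd
cyl_5_drv<-data.frame(Group.1=c("4","f","r"),x=c(0,cyl_5_drv$x,0))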
cyl_6<-dat08[dat08$cyl==6,]
cyl_6_drv<-aggregate(x=cyl_6$hwy,by=list(cyl_6$drv),FUN=mean)
cyl_6_drv
## Group.1 x
## 1 4 20.69231
## 2 f 25.26316
## 3 r 25.00000
# Subset the 2008 cars by their own cyl column (a dat99$cyl mask would pick the wrong rows)
cyl_8<-dat08[dat08$cyl==8,]
cyl_8_drv<-aggregate(x=cyl_8$hwy,by=list(cyl_8$drv),FUN=mean)
cyl_8_drv
data4<-data.frame(Cyl=rep(c(4,5,6,8),each=3),Drv=rep(c("4wd","fwd","rwd"),times=4),Mil=c(cyl_4_drv$x,cyl_5_drv$x,cyl_6_drv$x,cyl_8_drv$x))
p6<-ggplot(data=data4, aes(x=Cyl, y=Mil, fill=Drv)) +
geom_bar(stat="identity", position=position_dodge())+
xlab("Cylinders in Car")+ylab("Mean-Mileage on Highway")+
ggtitle("2008 Cars Mean-Highway Mileage")
p6
grid.arrange(p5,p6)
From the above, for 2008 highway mileage: front-wheel drive does best among 4-cylinder cars (30.96 vs 25.64 for 4wd); 5-cylinder cars are only available with front-wheel drive (mean 28.75); among 6-cylinder cars, front-wheel drive (25.26) and rear-wheel drive (25.00) are nearly tied; and among 8-cylinder cars, front-wheel drive comes out on top.
As in 1999, a 4-cylinder fwd car gives far better mileage than any 6- or 8-cylinder configuration, which is expected since engines with more cylinders generally burn more fuel.
The dataset used in this project is the mpg data frame, which ships with the ‘ggplot2’ package.