Project Details

In this project, I analyze fuel economy data from 1999 to 2008 for 38 popular car models.
Questions I have tried to answer through this project:
1. Best cars in 1999 & 2008
2. Mean city vs. highway mileage
3. 1999 cars cylinder-wise & drive-wise mean mileage for city
4. 1999 cars cylinder-wise & drive-wise mean mileage for highway
5. 2008 cars cylinder-wise & drive-wise mean mileage for city
6. 2008 cars cylinder-wise & drive-wise mean mileage for highway

Dataset Description

This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.

Usage

mpg

Format

A data frame with 234 rows and 11 variables:

  • manufacturer: manufacturer name
  • model: model name
  • displ: engine displacement, in litres
  • year: year of manufacture
  • cyl: number of cylinders
  • trans: type of transmission
  • drv: type of drive train, where f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
  • cty: city miles per gallon
  • hwy: highway miles per gallon
  • fl: fuel type
  • class: “type” of car

Loading Required Packages for Analysis

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.6.3
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

Basic information

View(mpg)
str(mpg)
## tibble [234 x 11] (S3: tbl_df/tbl/data.frame)
##  $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
##  $ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
##  $ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr [1:234] "f" "f" "f" "f" ...
##  $ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr [1:234] "p" "p" "p" "p" ...
##  $ class       : chr [1:234] "compact" "compact" "compact" "compact" ...
dat99<-mpg[mpg$year==1999,]
View(dat99)
dat08<-mpg[mpg$year==2008,]
View(dat08)

Comparing city & highway mileage of 4-cylinder fwd cars

f99<-dat99[dat99$drv=="f" & dat99$cyl==4,]
f08<-dat08[dat08$drv=="f" & dat08$cyl==4,]

print("Mean-Mileage in city of a '99 fwd car having 4 cylinders: ")
## [1] "Mean-Mileage in city of a '99 fwd car having 4 cylinders: "
mean(f99$cty)
## [1] 22.06061
print("Mean-Mileage in city of a '08 fwd car having 4 cylinders: ")
## [1] "Mean-Mileage in city of a '08 fwd car having 4 cylinders: "
mean(f08$cty)
## [1] 22.08
print("Mean-Improvement: ")
## [1] "Mean-Improvement: "
mean(f08$cty)-mean(f99$cty)
## [1] 0.01939394
print("Mean-Mileage on highway of a '99 fwd car having 4 cylinders: ")
## [1] "Mean-Mileage on highway of a '99 fwd car having 4 cylinders: "
mean(f99$hwy)
## [1] 30.09091
print("Mean-Mileage on highway of a '08 fwd car having 4 cylinders: ")
## [1] "Mean-Mileage on highway of a '08 fwd car having 4 cylinders: "
mean(f08$hwy)
## [1] 30.96
print("Mean-Improvement: ")
## [1] "Mean-Improvement: "
mean(f08$hwy)-mean(f99$hwy)
## [1] 0.8690909

From the above, 4-cylinder fwd cars improved only slightly between 1999 and 2008: mean city mileage is essentially unchanged (about +0.02 mpg), while mean highway mileage rose by about 0.87 mpg. Similar comparisons can be made for 6- and 8-cylinder cars.
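The same comparison can be extended to every cylinder and drive-train combination in one pass. The following is a minimal sketch, not part of the original analysis; the names `by_group`, `both`, `cty_gain` and `hwy_gain` are illustrative. It relies on base R's `aggregate()` and `merge()`, and only combinations present in both years appear in the result.

```r
library(ggplot2)  # provides the mpg dataset

# Mean city and highway mileage for every year / cylinder / drive-train
# combination that actually occurs in the data.
by_group <- aggregate(cbind(cty, hwy) ~ year + cyl + drv, data = mpg, FUN = mean)

# Put the 1999 and 2008 means side by side for each (cyl, drv) pair.
y99 <- by_group[by_group$year == 1999, c("cyl", "drv", "cty", "hwy")]
y08 <- by_group[by_group$year == 2008, c("cyl", "drv", "cty", "hwy")]
both <- merge(y99, y08, by = c("cyl", "drv"), suffixes = c("_99", "_08"))

# Improvement = 2008 mean minus 1999 mean (illustrative column names).
both$cty_gain <- both$cty_08 - both$cty_99
both$hwy_gain <- both$hwy_08 - both$hwy_99
both
```

The row with `cyl == 4` and `drv == "f"` should reproduce the two "Mean-Improvement" figures printed above.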

Best cars in ’99 & ’08

print("Mean-Mileage in city of a '99 car:")
## [1] "Mean-Mileage in city of a '99 car:"
mean(dat99$cty)
## [1] 17.01709
print("Mean-Mileage on highway of a '99 car")
## [1] "Mean-Mileage on highway of a '99 car"
mean(dat99$hwy)
## [1] 23.42735
dat99<-mutate(dat99,cty_diff=dat99$cty-mean(dat99$cty),hwy_diff=dat99$hwy-mean(dat99$hwy))
print("Best Cars in '99: ")
## [1] "Best Cars in '99: "
head(dat99[order(dat99$cty_diff,decreasing=T),],n=5)
## # A tibble: 5 x 13
##   manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 volkswagen   new ~   1.9  1999     4 manu~ f        35    44 d     subc~
## 2 volkswagen   jetta   1.9  1999     4 manu~ f        33    44 d     comp~
## 3 volkswagen   new ~   1.9  1999     4 auto~ f        29    41 d     subc~
## 4 honda        civic   1.6  1999     4 manu~ f        28    33 r     subc~
## 5 toyota       coro~   1.8  1999     4 manu~ f        26    35 r     comp~
## # ... with 2 more variables: cty_diff <dbl>, hwy_diff <dbl>
head(dat99[order(dat99$hwy_diff,decreasing=T),],n=5)
## # A tibble: 5 x 13
##   manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 volkswagen   jetta   1.9  1999     4 manu~ f        33    44 d     comp~
## 2 volkswagen   new ~   1.9  1999     4 manu~ f        35    44 d     subc~
## 3 volkswagen   new ~   1.9  1999     4 auto~ f        29    41 d     subc~
## 4 toyota       coro~   1.8  1999     4 manu~ f        26    35 r     comp~
## 5 honda        civic   1.6  1999     4 manu~ f        28    33 r     subc~
## # ... with 2 more variables: cty_diff <dbl>, hwy_diff <dbl>

From the above, Volkswagen’s New Beetle & Jetta (both diesel, fl = d) were the best 1999 cars in terms of mileage, leading both the city and the highway rankings.

print("Mean-Mileage in city of a '08 car:")
## [1] "Mean-Mileage in city of a '08 car:"
mean(dat08$cty)
## [1] 16.70085
print("Mean-Mileage on highway of a '08 car")
## [1] "Mean-Mileage on highway of a '08 car"
mean(dat08$hwy)
## [1] 23.45299
dat08<-mutate(dat08,cty_diff=dat08$cty-mean(dat08$cty),hwy_diff=dat08$hwy-mean(dat08$hwy))
print("Best Cars in '08: ")
## [1] "Best Cars in '08: "
head(dat08[order(dat08$cty_diff,decreasing=T),],n=5)
## # A tibble: 5 x 13
##   manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 toyota       coro~   1.8  2008     4 manu~ f        28    37 r     comp~
## 2 honda        civic   1.8  2008     4 manu~ f        26    34 r     subc~
## 3 toyota       coro~   1.8  2008     4 auto~ f        26    35 r     comp~
## 4 honda        civic   1.8  2008     4 auto~ f        25    36 r     subc~
## 5 honda        civic   1.8  2008     4 auto~ f        24    36 c     subc~
## # ... with 2 more variables: cty_diff <dbl>, hwy_diff <dbl>
head(dat08[order(dat08$hwy_diff,decreasing=T),],n=5)
## # A tibble: 5 x 13
##   manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 toyota       coro~   1.8  2008     4 manu~ f        28    37 r     comp~
## 2 honda        civic   1.8  2008     4 auto~ f        25    36 r     subc~
## 3 honda        civic   1.8  2008     4 auto~ f        24    36 c     subc~
## 4 toyota       coro~   1.8  2008     4 auto~ f        26    35 r     comp~
## 5 honda        civic   1.8  2008     4 manu~ f        26    34 r     subc~
## # ... with 2 more variables: cty_diff <dbl>, hwy_diff <dbl>

From the above, Toyota’s Corolla & Honda’s Civic were the best 2008 cars in terms of mileage, filling the top five places in both the city and the highway rankings.
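The two per-year rankings above can also be produced in a single grouped pipeline. This is a hedged sketch using dplyr verbs (`group_by`, `mutate`, `arrange`, `slice`); the name `best_by_year` is an assumption for illustration, and ties are kept in whatever order `arrange()` leaves them.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

best_by_year <- mpg %>%
  group_by(year) %>%
  # Difference from that year's mean, as in the analysis above.
  mutate(cty_diff = cty - mean(cty), hwy_diff = hwy - mean(hwy)) %>%
  # Sort within each year by city mileage, best first.
  arrange(desc(cty), .by_group = TRUE) %>%
  # Keep the first five rows of each year.
  slice(1:5) %>%
  ungroup() %>%
  select(year, manufacturer, model, trans, cty, hwy, cty_diff, hwy_diff)

best_by_year
```

Sorting by `desc(hwy)` instead gives the highway ranking.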

Comparison of city & highway mean mileage

cyl_cty<-aggregate(x=mpg$cty,by=list(mpg$cyl),FUN=mean)
cyl_cty
##   Group.1        x
## 1       4 21.01235
## 2       5 20.50000
## 3       6 16.21519
## 4       8 12.57143
p1<-ggplot(data=cyl_cty, aes(x=Group.1, y=x,fill=Group.1)) +
  geom_bar(stat="identity")+
  geom_text(aes(label=signif(x,digits=4)), vjust=-0.2, size=3.5)+
  xlab("Cylinders in Car")+ylab("Mean-Mileage in City")
p1

cyl_hwy<-aggregate(x=mpg$hwy,by=list(mpg$cyl),FUN=mean)
cyl_hwy
##   Group.1        x
## 1       4 28.80247
## 2       5 28.75000
## 3       6 22.82278
## 4       8 17.62857
p2<-ggplot(data=cyl_hwy, aes(x=Group.1, y=x,fill=Group.1)) +
  geom_bar(stat="identity")+
  geom_text(aes(label=signif(x,digits=4)), vjust=-0.2, size=3.5)+
  xlab("Cylinders in Car")+ylab("Mean-Mileage on Highway")
p2

grid.arrange(p1,p2)

From the above, 4- and 5-cylinder cars have clearly higher mean mileage than 6- and 8-cylinder cars, both in the city and on the highway.
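When comparing group means it helps to see how many cars stand behind each one. A small sketch, assuming dplyr is loaded as above; `cyl_summary` is an illustrative name. It returns both means and the group size in one table.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

cyl_summary <- mpg %>%
  group_by(cyl) %>%
  summarise(
    n        = n(),        # number of rows (cars) in the group
    mean_cty = mean(cty),  # mean city mileage
    mean_hwy = mean(hwy)   # mean highway mileage
  )

cyl_summary
```

The `n` column shows that the 5-cylinder mean rests on far fewer cars than the other groups, so it should be read with more caution.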

1999 cars mean-city mileage

cyl_4<-dat99[dat99$cyl==4,]
cyl_4_drv<-aggregate(x=cyl_4$cty,by=list(cyl_4$drv),FUN=mean)
cyl_6<-dat99[dat99$cyl==6,]
cyl_6_drv<-aggregate(x=cyl_6$cty,by=list(cyl_6$drv),FUN=mean)
cyl_8<-dat99[dat99$cyl==8,]
cyl_8_drv<-aggregate(x=cyl_8$cty,by=list(cyl_8$drv),FUN=mean)
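cyl_4_drv[3,]<-0
# 1999 has no 8-cylinder fwd cars: pad the fwd (middle) slot so the rwd mean keeps its own label
cyl_8_drv<-rbind(cyl_8_drv[1,],data.frame(Group.1="f",x=0),cyl_8_drv[2,])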

data1<-data.frame(Cyl=rep(c(4,6,8),each=3),Drv=rep(c("4wd","fwd","rwd"),times=3),Mil=c(cyl_4_drv$x,cyl_6_drv$x,cyl_8_drv$x))

p3<-ggplot(data=data1, aes(x=Cyl, y=Mil, fill=Drv)) +
  geom_bar(stat="identity", position=position_dodge())+
  xlab("Cylinders in Car")+ylab("Mean-Mileage in City")+
  ggtitle("1999 Cars Mean-City Mileage")
p3

From the above, for city driving a 4-cylinder car is best chosen with front-wheel drive and a 6-cylinder car with rear-wheel drive. The 1999 data contains no 8-cylinder front-wheel drive cars (only 4wd and rwd), so for an 8-cylinder car the better choice is rear-wheel drive.
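The plotting data in this section is assembled by position, so a missing drive type has to be padded in exactly the right slot or a mean ends up under the wrong label. Below is a sketch of a key-based alternative, under the assumption that a full cylinder by drive grid is wanted; `means99`, `grid99` and `plot99` are illustrative names. Missing combinations become `NA` rather than a padded zero.

```r
library(ggplot2)  # provides the mpg dataset and the plotting functions

dat99 <- mpg[mpg$year == 1999, ]

# Means for the cylinder / drive combinations that exist in 1999.
means99 <- aggregate(cty ~ cyl + drv, data = dat99, FUN = mean)

# Every cylinder / drive pair that should get a slot, present or not.
grid99 <- expand.grid(cyl = sort(unique(dat99$cyl)),
                      drv = c("4", "f", "r"),
                      stringsAsFactors = FALSE)

# Left join by key: absent combinations get NA instead of borrowing
# a neighbouring group's value.
plot99 <- merge(grid99, means99, by = c("cyl", "drv"), all.x = TRUE)

p <- ggplot(plot99, aes(x = factor(cyl), y = cty, fill = drv)) +
  geom_col(position = position_dodge(), na.rm = TRUE) +
  xlab("Cylinders in Car") + ylab("Mean-Mileage in City") +
  ggtitle("1999 Cars Mean-City Mileage")
p
```

Because each mean is matched on `cyl` and `drv`, the legend labels cannot drift out of step with the values.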

1999 cars mean-highway mileage

cyl_4<-dat99[dat99$cyl==4,]
cyl_4_drv<-aggregate(x=cyl_4$hwy,by=list(cyl_4$drv),FUN=mean)
cyl_6<-dat99[dat99$cyl==6,]
cyl_6_drv<-aggregate(x=cyl_6$hwy,by=list(cyl_6$drv),FUN=mean)
cyl_8<-dat99[dat99$cyl==8,]
cyl_8_drv<-aggregate(x=cyl_8$hwy,by=list(cyl_8$drv),FUN=mean)
cyl_4_drv
##   Group.1        x
## 1       4 23.66667
## 2       f 30.09091
cyl_6_drv
##   Group.1        x
## 1       4 18.68421
## 2       f 24.91667
## 3       r 25.50000
cyl_8_drv
##   Group.1        x
## 1       4 15.77778
## 2       r 19.55556
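cyl_4_drv[3,]<-0
# 1999 has no 8-cylinder fwd cars: pad the fwd (middle) slot so the rwd mean keeps its own label
cyl_8_drv<-rbind(cyl_8_drv[1,],data.frame(Group.1="f",x=0),cyl_8_drv[2,])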

data2<-data.frame(Cyl=rep(c(4,6,8),each=3),Drv=rep(c("4wd","fwd","rwd"),times=3),Mil=c(cyl_4_drv$x,cyl_6_drv$x,cyl_8_drv$x))

p4<-ggplot(data=data2, aes(x=Cyl, y=Mil, fill=Drv)) +
  geom_bar(stat="identity", position=position_dodge())+
  xlab("Cylinders in Car")+ylab("Mean-Mileage on Highway")+
  ggtitle("1999 Cars Mean-Highway Mileage")
p4

grid.arrange(p3,p4)

From the above, for highway driving a 4-cylinder car is best chosen with front-wheel drive (30.09 vs. 23.67 mpg for 4wd), and a 6-cylinder car with rear-wheel drive (25.50 mpg), which is narrowly ahead of front-wheel drive (24.92 mpg). The 1999 data contains no 8-cylinder front-wheel drive cars, so for an 8-cylinder car the better choice is rear-wheel drive (19.56 vs. 15.78 mpg for 4wd).

Also, a 4-cylinder fwd car gives far better mean mileage than any 6- or 8-cylinder configuration. This is expected, since engines with more cylinders generally have a larger displacement and burn more fuel.

2008 cars mean-city mileage

cyl_4<-dat08[dat08$cyl==4,]
cyl_4_drv<-aggregate(x=cyl_4$cty,by=list(cyl_4$drv),FUN=mean)
cyl_4_drv
##   Group.1        x
## 1       4 19.27273
## 2       f 22.08000
cyl_4_drv[3,]<-0
cyl_5<-dat08[dat08$cyl==5,]
cyl_5_drv<-aggregate(x=cyl_5$cty,by=list(cyl_5$drv),FUN=mean)
cyl_5_drv
##   Group.1    x
## 1       f 20.5
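# only fwd 5-cylinder cars exist, so keep their mean in the fwd (middle) slot
cyl_5_drv<-data.frame(Group.1=c("4","f","r"),x=c(0,cyl_5_drv$x,0))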
cyl_6<-dat08[dat08$cyl==6,]
cyl_6_drv<-aggregate(x=cyl_6$cty,by=list(cyl_6$drv),FUN=mean)
cyl_6_drv
##   Group.1        x
## 1       4 15.15385
## 2       f 17.26316
## 3       r 16.50000
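# the row mask must come from dat08 itself; the means printed below came from a dat99$cyl mask, so rerun to refresh them
cyl_8<-dat08[dat08$cyl==8,]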
cyl_8_drv<-aggregate(x=cyl_8$cty,by=list(cyl_8$drv),FUN=mean)
cyl_8_drv
##   Group.1        x
## 1       4 12.46154
## 2       f 17.62500
## 3       r 13.16667
data3<-data.frame(Cyl=rep(c(4,5,6,8),each=3),Drv=rep(c("4wd","fwd","rwd"),times=4),Mil=c(cyl_4_drv$x,cyl_5_drv$x,cyl_6_drv$x,cyl_8_drv$x))

p5<-ggplot(data=data3, aes(x=Cyl, y=Mil, fill=Drv)) +
  geom_bar(stat="identity", position=position_dodge())+
  xlab("Cylinders in Car")+ylab("Mean-Mileage in City")+
  ggtitle("2008 Cars Mean-City Mileage")
p5

From the above, for city driving a 4-cylinder car is best chosen with front-wheel drive (22.08 vs. 19.27 mpg for 4wd). 5-cylinder cars exist only with front-wheel drive in this data (20.5 mpg), and for a 6-cylinder car front-wheel drive is also best (17.26 mpg). For 8 cylinders the printed table favours front-wheel drive as well, but those means were computed with a row mask built from dat99$cyl rather than dat08$cyl, so that result needs re-checking.

2008 cars mean-highway mileage

cyl_4<-dat08[dat08$cyl==4,]
cyl_4_drv<-aggregate(x=cyl_4$hwy,by=list(cyl_4$drv),FUN=mean)
cyl_4_drv
##   Group.1        x
## 1       4 25.63636
## 2       f 30.96000
cyl_4_drv[3,]<-0
cyl_5<-dat08[dat08$cyl==5,]
cyl_5_drv<-aggregate(x=cyl_5$hwy,by=list(cyl_5$drv),FUN=mean)
cyl_5_drv
##   Group.1     x
## 1       f 28.75
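# only fwd 5-cylinder cars exist, so keep their mean in the fwd (middle) slot
cyl_5_drv<-data.frame(Group.1=c("4","f","r"),x=c(0,cyl_5_drv$x,0))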
cyl_6<-dat08[dat08$cyl==6,]
cyl_6_drv<-aggregate(x=cyl_6$hwy,by=list(cyl_6$drv),FUN=mean)
cyl_6_drv
##   Group.1        x
## 1       4 20.69231
## 2       f 25.26316
## 3       r 25.00000
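# the row mask must come from dat08 itself; the means printed below came from a dat99$cyl mask, so rerun to refresh them
cyl_8<-dat08[dat08$cyl==8,]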
cyl_8_drv<-aggregate(x=cyl_8$hwy,by=list(cyl_8$drv),FUN=mean)
cyl_8_drv
##   Group.1        x
## 1       4 16.38462
## 2       f 25.12500
## 3       r 19.33333
data4<-data.frame(Cyl=rep(c(4,5,6,8),each=3),Drv=rep(c("4wd","fwd","rwd"),times=4),Mil=c(cyl_4_drv$x,cyl_5_drv$x,cyl_6_drv$x,cyl_8_drv$x))

p6<-ggplot(data=data4, aes(x=Cyl, y=Mil, fill=Drv)) +
  geom_bar(stat="identity", position=position_dodge())+
  xlab("Cylinders in Car")+ylab("Mean-Mileage on Highway")+
  ggtitle("2008 Cars Mean-Highway Mileage")
p6

grid.arrange(p5,p6)

From the above, for highway driving a 4-cylinder car is best chosen with front-wheel drive (30.96 vs. 25.64 mpg for 4wd). 5-cylinder cars exist only with front-wheel drive in this data (28.75 mpg), and for a 6-cylinder car front-wheel drive (25.26 mpg) and rear-wheel drive (25.00 mpg) are nearly tied. For 8 cylinders the printed table favours front-wheel drive, but those means were computed with a row mask built from dat99$cyl rather than dat08$cyl, so that result needs re-checking.

As in 1999, a 4-cylinder fwd car gives far better mean mileage in 2008 than any 6- or 8-cylinder configuration. This is expected, since engines with more cylinders generally have a larger displacement and burn more fuel.
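The four cylinder-wise and drive-wise charts can also be drawn from one data frame with facets, which avoids building each panel by hand. The following is a rough sketch rather than the method used above; `means`, `long`, `mean_mpg` and `p_all` are illustrative names. Only combinations that exist in the data get a bar.

```r
library(ggplot2)  # provides the mpg dataset and the plotting functions

# Mean city and highway mileage per year / cylinder / drive train.
means <- aggregate(cbind(cty, hwy) ~ year + cyl + drv, data = mpg, FUN = mean)

# Stack city and highway means into one long data frame.
keys <- means[c("year", "cyl", "drv")]
long <- rbind(
  data.frame(keys, road = "city",    mean_mpg = means$cty),
  data.frame(keys, road = "highway", mean_mpg = means$hwy)
)

# One panel per road type and year; bars keep a fixed width even when
# a drive type is missing for some cylinder count.
p_all <- ggplot(long, aes(x = factor(cyl), y = mean_mpg, fill = drv)) +
  geom_col(position = position_dodge(preserve = "single")) +
  facet_grid(road ~ year) +
  xlab("Cylinders in Car") + ylab("Mean mileage (mpg)")
p_all
```

Since every bar is keyed by its own `drv` value, a cylinder count with a single drive type (such as the 5-cylinder cars) shows exactly one correctly labelled bar.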

Sources

The dataset used in this project, mpg, ships with the ‘ggplot2’ package.