library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Abstract

In this project, I examined the mpg dataset, focusing on how highway fuel efficiency relates to the other variables. This information might influence a future automobile purchase. I found that diesel-powered subcompact cars were the most fuel-efficient highway vehicles. I also found that 2-seater cars were particularly efficient among cars with large engines, which may be of interest to consumers who are considering a car with a large engine.

Introduction

In this project, I examined the EPA mpg dataset. This dataset is included in the ggplot2 package, which is part of the R tidyverse, and the underlying data is available from https://fueleconomy.gov.

The dataset is somewhat outdated, as it only contains data from 1999 to 2008. It contains 234 observations, which is not an exhaustive list even for that period: it is limited to models which had a new release in each of those years. It may still be useful as a comparison point for the specifications of a used car someone is considering, but a consumer shopping for a car today should factor newer data into their decision as well.

Description of the Data

The mpg dataset contains fuel economy data for 38 models of car, recorded in 234 observations, which were released between 1999 and 2008.

It consists of 11 variables:

manufacturer: manufacturer name

model: model name

displ: engine displacement, in litres

year: year of manufacture

cyl: number of cylinders

trans: type of transmission

drv: the type of drive train, where f = front-wheel drive, r = rear-wheel drive, 4 = 4wd

cty: city miles per gallon

hwy: highway miles per gallon

fl: fuel type

class: “type” of car
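The size and structure described above can be checked directly in R. This is a minimal sketch, assuming only that the tidyverse is installed; it uses the `mpg` data frame as shipped with ggplot2.

```r
library(tidyverse)

# Number of rows (observations) and columns (variables)
dim(mpg)

# Number of distinct model names
n_distinct(mpg$model)

# Column names, types, and the first few values of each variable
glimpse(mpg)
```

The first two calls should agree with the counts given in this section (234 rows, 11 variables, 38 models).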

Data Analysis

Plot a scatter plot of ‘hwy’ vs ‘displ’

mpg %>%
  ggplot() + 
  geom_point(mapping = aes(x = displ, y = hwy))

Highway mileage generally falls as engine displacement increases. A few points lie well above this roughly linear trend at the high end of the displacement axis. There are also a couple of points at the low end of the displacement axis with unusually high highway mileage.
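One way to make "well above the linear trend" concrete is to fit a straight line and rank cars by how far they sit above it. The following is a sketch rather than part of the original analysis: the use of `lm()` and the names `fit` and `mpg_resid` are assumptions introduced for illustration.

```r
library(tidyverse)

# Fit a straight-line trend of highway mileage on engine displacement
fit <- lm(hwy ~ displ, data = mpg)

# Attach each car's residual (observed minus fitted hwy),
# then sort so the cars furthest above the line come first
mpg_resid <- mpg %>%
  mutate(resid = unname(residuals(fit))) %>%
  arrange(desc(resid))

# Show the ten cars with the largest positive residuals
mpg_resid %>%
  select(manufacturer, model, displ, hwy, resid) %>%
  head(10)
```

Cars near the top of this table are the ones that stand out visually in the scatter plot.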

Color the plot by ‘class’

mpg %>%
  ggplot() + 
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

We see that the outliers with large engine displacement are 2-seaters.
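The same comparison can be made numerically by summarising highway mileage per class among large-engine cars. This is a sketch under a stated assumption: the report does not define "large", so the 5-litre cutoff and the name `large_engines` are chosen here only for illustration.

```r
library(tidyverse)

# Assumption: "large engine" means a displacement of at least 5 litres
large_engines <- mpg %>%
  filter(displ >= 5) %>%
  group_by(class) %>%
  summarise(n = n(), mean_hwy = mean(hwy), .groups = "drop") %>%
  arrange(desc(mean_hwy))

# One row per class: number of cars and their average highway mileage
large_engines
```

Changing the cutoff shifts which cars are included, so it is worth trying a few values before drawing conclusions from the table.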

Color the plot by fuel type (‘fl’)

mpg %>%
  ggplot() + 
  geom_point(mapping = aes(x = displ, y = hwy, color = fl))

Here we can see that the outlying vehicles with smaller engines are diesel-powered. Comparing with the previous plot, they are subcompacts, so it makes sense that they combine a small engine with good highway mileage. The other diesel engines on the plot are SUVs, so they are not quite so efficient.
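Because the fuel-type plot does not show vehicle class, it can help to list the diesel cars directly and read off their class and mileage. This is a small sketch; the name `diesels` is introduced here for illustration.

```r
library(tidyverse)

# All diesel-fuelled cars, with the highest highway mileage first
diesels <- mpg %>%
  filter(fl == "d") %>%
  select(manufacturer, model, displ, year, class, hwy) %>%
  arrange(desc(hwy))

diesels
```

The `class` column of this table is the place to confirm which diesel cars are subcompacts and which are SUVs.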

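# Subsets to highlight: 2-seaters, and diesel cars with engines under 2 litres
mpg2 <- filter(mpg, class == "2seater")
mpg3 <- filter(mpg, fl == "d", displ < 2)

# Plot every car in black, then overlay the two highlighted groups in red
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_point(data = mpg2, color = "red", size = 2) +
  geom_point(data = mpg3, color = "red", size = 2)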

Conclusion

We have shown that the cars in this dataset with the best highway fuel efficiency were subcompact cars with diesel engines. We also saw that 2-seater cars can accommodate a large engine without sacrificing fuel efficiency to the same degree as other classes of vehicle. Again, the dataset is somewhat out of date. Consumers who primarily care about fuel efficiency should consider newer data, and perhaps look into hybrid or electric cars, for which technology and price have improved substantially since this dataset was published.

References

R tidyverse mpg dataset - https://ggplot2.tidyverse.org/reference/mpg.html

RStudio Cloud Data Visualization Primer - https://rstudio.cloud/learn/primers/3