library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
In this project, I examined the mpg dataset, looking in particular at how highway fuel efficiency relates to other variables. This information might inform a future automobile purchase. I found that diesel-powered subcompact cars were the most fuel-efficient highway vehicles. I also found that 2-seater cars were particularly efficient among cars with large engines, which may interest consumers who are considering a car with a large engine.
The mpg dataset is a subset of the fuel economy data published by the EPA. It is included in the ggplot2 package, which is loaded as part of the tidyverse, and the full EPA data are available from https://fueleconomy.gov.
The dataset is somewhat outdated: its observations come from only two model years, 1999 and 2008. It is also likely not an exhaustive list even for that period, since it includes only models that had a new release every year between 1999 and 2008. It may still be useful as a comparison point for the specifications of a used car someone is considering, but a consumer shopping for a car today should factor newer data into their decision as well.
In total, the mpg dataset contains fuel economy data for 234 cars, covering 38 models from 15 manufacturers.
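These counts can be checked directly in R. This is a minimal sketch that assumes the tidyverse (which attaches ggplot2 and its `mpg` data) is installed.

```r
library(tidyverse)

# Number of cars (rows) in the dataset
n_cars <- nrow(mpg)

# Distinct models and manufacturers represented
n_models <- n_distinct(mpg$model)
n_makers <- n_distinct(mpg$manufacturer)

# Model years actually present in the data
years <- sort(unique(mpg$year))

n_cars
n_models
n_makers
years
```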
It consists of 11 variables:
manufacturer: manufacturer name
model: model name
displ: engine displacement, in litres
year: year of manufacture
cyl: number of cylinders
trans: type of transmission
drv: type of drive train, where f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
cty: city miles per gallon
hwy: highway miles per gallon
fl: fuel type
class: “type” of car
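To see these variables with their types, and the codes used in `drv` and `fl`, a quick sketch using dplyr's `glimpse()` and `count()` (the later plots use `fl == "d"` for diesel):

```r
library(tidyverse)

# One line per variable: name, type, and first few values
glimpse(mpg)

# Drive train codes and how many cars use each
drv_counts <- count(mpg, drv)

# Fuel type codes and how many cars use each
fl_counts <- count(mpg, fl)

drv_counts
fl_counts
```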
Make a scatter plot of ‘hwy’ against ‘displ’
mpg %>%
  ggplot() +
  geom_point(mapping = aes(x = displ, y = hwy))
A few points lie well above the linear trend at the high end of the displacement axis. A couple of points at the low end of the displacement axis also have unusually high highway mileage, even compared with other small-engine cars.
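One way to make "well above the linear trend" concrete is to fit a straight line of `hwy` on `displ` and list the cars with large positive residuals. This is a sketch: the cutoff of two residual standard errors is an arbitrary choice for illustration, not part of the original analysis.

```r
library(tidyverse)

# Least-squares line of highway mpg against engine displacement
fit <- lm(hwy ~ displ, data = mpg)

# Keep cars more than 2 residual standard errors above the line
# (the factor of 2 is an arbitrary, illustrative cutoff)
above_trend <- mpg %>%
  mutate(resid = residuals(fit)) %>%
  filter(resid > 2 * sigma(fit)) %>%
  select(manufacturer, model, displ, hwy, fl, class, resid) %>%
  arrange(desc(resid))

above_trend

# Draw the fitted line on the scatter plot for reference
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

Both groups of outliers discussed below show up in this table.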
Color the plot by ‘class’
mpg %>%
  ggplot() +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
We see that the outliers with large engine displacement are 2-seaters.
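To confirm which cars these are, we can list every 2-seater in the data. A minimal sketch:

```r
library(tidyverse)

# Every car whose class is "2seater", with its engine size and highway mileage
two_seaters <- mpg %>%
  filter(class == "2seater") %>%
  select(manufacturer, model, year, displ, hwy) %>%
  arrange(displ)

two_seaters
```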
Color the plot by fuel type (‘fl’)
mpg %>%
  ggplot() +
  geom_point(mapping = aes(x = displ, y = hwy, color = fl))
Here we can see that the outlying vehicles with small engines are diesel-powered subcompacts. It makes sense that a car with a small diesel engine gets good highway mileage. Comparing with the plot colored by class, the other diesel vehicles are SUVs, so they are much less efficient.
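The diesel vehicles can also be listed directly to compare their engine sizes, classes, and highway mileage. A minimal sketch:

```r
library(tidyverse)

# All diesel cars (fl == "d"), best highway mileage first
diesels <- mpg %>%
  filter(fl == "d") %>%
  select(manufacturer, model, displ, class, fl, hwy) %>%
  arrange(desc(hwy))

diesels
```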
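Highlight the 2-seaters and the small diesel cars in red
# Cars to highlight: all 2-seaters, and diesel cars with engines under 2 litres
mpg2 <- filter(mpg, class == "2seater")
mpg3 <- filter(mpg, fl == "d", displ < 2)
# Plot every car in black, then overlay the highlighted groups as larger red points
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_point(data = mpg2, color = "red", size = 2) +
  geom_point(data = mpg3, color = "red", size = 2)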
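Labelling the highlighted points with their model names makes the plot readable without cross-referencing the colored versions. This is a sketch using `geom_text()`; the combined `highlight` data frame, the label offset, and the text size are illustrative choices, not part of the original analysis.

```r
library(tidyverse)

# Same two groups: all 2-seaters, plus diesel cars with engines under 2 litres
highlight <- mpg %>%
  filter(class == "2seater" | (fl == "d" & displ < 2))

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_point(data = highlight, color = "red", size = 2) +
  # Label each highlighted point with its model name, nudged upward
  geom_text(data = highlight, aes(label = model), nudge_y = 1.5, size = 3)

p
```

Overlapping points share a location, so some labels may stack; `position_jitter()` or the ggrepel package are options if that becomes a problem.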
We have shown that the cars in this dataset with the best highway fuel efficiency were subcompact cars with diesel engines. We also saw that 2-seater cars can accommodate a large engine without sacrificing fuel efficiency to the same degree as other classes of vehicle. Again, the dataset is somewhat out of date. Consumers who primarily care about fuel efficiency should consider newer data, and perhaps look into hybrid or electric cars, whose technology and prices have advanced substantially since this dataset was published.
ggplot2 (tidyverse) mpg dataset documentation - https://ggplot2.tidyverse.org/reference/mpg.html
RStudio Cloud Data Visualization Primer - https://rstudio.cloud/learn/primers/3