The Environmental Protection Agency (EPA) offers a public-facing dataset of fuel efficiency estimates for a range of cars to help the public when purchasing a car. The subset of data used in this analysis covers 38 vehicle models, with data from model years 1999 and 2008. I analyzed fuel efficiency by creating scatter plots of highway MPG against engine displacement for these 38 models. The scatterplot shows a few outliers, which raised questions about the relationship between the two variables.
The MPG data used in this analysis was collected by the United States Department of Energy and posted on its website, https://fueleconomy.gov/. One potential problem is that the data is outdated, so it may not reflect the fuel efficiency of current vehicles. Additionally, the sample is limited to model years 1999 and 2008, so the results may not generalize to other years. The MPG data reports from 1999 and 2008 are still accessible today (2021).
The mpg dataset contains only models that had a new release every year between 1999 and 2008.
“manufacturer” refers to the vehicle manufacturer name.
“model” refers to the vehicle model name.
“displ” refers to the engine displacement in liters.
“year” refers to the year the vehicle was manufactured.
“cyl” refers to the number of cylinders in the engine.
“trans” refers to the type of transmission in the vehicles.
“drv” refers to the type of drive train (f = front-wheel drive, r = rear-wheel drive, 4 = four-wheel drive).
“cty” refers to the miles per gallon in the city.
“hwy” refers to the miles per gallon on the highway.
“fl” refers to the fuel type used by the vehicle.
“class” refers to the type of car.
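The dataset description above (38 models, model years 1999 and 2008) can be checked directly in R. This is a minimal sketch that assumes the `mpg` data frame shipped with ggplot2 (loaded as part of the tidyverse in this analysis); the names `n_models` and `years` are illustrative only.

```r
library(dplyr)    # n_distinct()
library(ggplot2)  # provides the mpg dataset

# Number of distinct vehicle models in the dataset
n_models <- n_distinct(mpg$model)

# Model years covered, in ascending order
years <- sort(unique(mpg$year))

print(n_models)
print(years)
print(dim(mpg))  # rows and columns
```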
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
head(mpg)
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
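# Flag points that depart from the overall trend:
#   large engines (> 5 L) with unusually high highway MPG (> 20),
#   small engines (< 5 L) with unusually high highway MPG (> 40),
#   small engines (< 5 L) with unusually low highway MPG (< 15).
Outlier <- (mpg$displ > 5 & mpg$hwy > 20) |
  (mpg$displ < 5 & mpg$hwy > 40) |
  (mpg$displ < 5 & mpg$hwy < 15)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = Outlier)) +
  ggtitle("Highway MPG vs. Engine Displacement and Outliers") +
  xlab("Engine Displacement in Liters (displ)") +
  ylab("Highway Miles Per Gallon (hwy)") +
  geom_point() +
  scale_colour_manual(values = c("black", "red"))  # FALSE = black, TRUE = red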
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class)) +
  ggtitle("Highway MPG vs. Engine Displacement by Vehicle Class") +
  xlab("Engine Displacement in Liters (displ)") +
  ylab("Highway Miles Per Gallon (hwy)")
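The class-colored plot can be complemented by a per-class summary table. The sketch below is one possible way to do this with dplyr (assuming dplyr 1.0 or later for the `.groups` argument); `class_summary` and its column names are illustrative. It averages engine displacement and highway MPG within each vehicle class and sorts the classes from highest to lowest average highway MPG.

```r
library(dplyr)    # group_by(), summarise(), arrange()
library(ggplot2)  # provides the mpg dataset

# Count, average displacement, and average highway MPG for each class,
# sorted from the most to the least fuel-efficient class on the highway
class_summary <- mpg %>%
  group_by(class) %>%
  summarise(
    n = n(),
    mean_displ = mean(displ),
    mean_hwy = mean(hwy),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_hwy))

print(class_summary)
```

Reading `mean_displ` alongside `mean_hwy` shows whether the classes with larger average engines are also the ones with lower average highway MPG.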
According to the scatterplot, there is a strong negative linear correlation between highway miles per gallon (hwy) and engine displacement (displ) across the different car classes: cars with a higher engine displacement tend to have a lower highway MPG. There are a few outliers, highlighted in red, that depart from this trend.
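The strength of the correlation can be quantified rather than judged by eye. This is a minimal sketch using the Pearson correlation coefficient (`cor()`) and a simple least-squares line (`lm()`), both from base R's stats package; the names `r` and `fit` are illustrative. A value of `r` near -1 would indicate a strong negative linear relationship, and the `displ` coefficient estimates the change in highway MPG per additional liter of displacement.

```r
library(ggplot2)  # provides the mpg dataset

# Pearson correlation between engine displacement and highway MPG
r <- cor(mpg$displ, mpg$hwy)

# Simple linear regression of highway MPG on engine displacement
fit <- lm(hwy ~ displ, data = mpg)

print(round(r, 2))
print(coef(fit))  # intercept and slope (MPG per liter)
```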
The first group of outliers is in the top-left of the graph, with highway MPG above 40 and engine displacement below 2 liters. These are the 1999 Volkswagen New Beetle, classified as a subcompact, and the 1999 Volkswagen Jetta, classified as a compact. Small cars like these usually have a low engine displacement, which helps explain their highway MPG above 40.
The second group of outliers is on the right side of the graph, with highway MPG above 20 and engine displacement above 5 liters. These are the 1999 Chevrolet Corvette, the 2008 Chevrolet Corvette, and the 2008 Pontiac Grand Prix. The Corvettes are classified as 2seater and the Grand Prix as midsize. The Corvette is a sports car with a high engine displacement, but because two-seaters are relatively small, it achieves higher highway MPG than larger vehicles with a similar displacement. The 2008 Pontiac Grand Prix uses premium fuel, which may contribute to its higher highway MPG.
The third group of outliers is near the bottom center of the graph, with highway MPG below 15 and engine displacement between 4 and 5 liters. This point includes the 2008 Jeep Grand Cherokee 4wd, classified as an SUV. SUVs are considered large vehicles, so given the size of the car we would expect a higher engine displacement and, in turn, a low highway MPG.
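Because several rows in `mpg` can share the same (displ, hwy) coordinates, a single red point in the plot may represent more than one vehicle. The sketch below lists every row flagged by the same outlier rule used to color the plot, so each group described above can be checked against the data. It assumes dplyr and the ggplot2 `mpg` dataset; `outliers` is an illustrative name, and the selected columns (including fuel type `fl`) are just one reasonable choice.

```r
library(dplyr)    # filter(), select(), arrange(), %>%
library(ggplot2)  # provides the mpg dataset

# Same three outlier conditions as the plot's color rule
outliers <- mpg %>%
  filter((displ > 5 & hwy > 20) |
         (displ < 5 & hwy > 40) |
         (displ < 5 & hwy < 15)) %>%
  select(manufacturer, model, year, displ, hwy, fl, class) %>%
  arrange(displ, hwy)

# Print every flagged row, not just the first few
print(outliers, n = Inf)
```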
By examining the relationship between highway miles per gallon (hwy) and engine displacement (displ) in the EPA fuel efficiency data, we find a clear negative linear correlation between fuel efficiency and engine displacement. The data supports our hypothesis that smaller cars tend to have better fuel efficiency than larger cars, suggesting that manufacturers could improve fuel economy by reducing engine displacement. However, more data should be gathered to confirm these findings, as this dataset covers only certain vehicle classes and models from 1999 and 2008. Even so, the data gives the public a basic foundation for choosing among car models from 1999 and 2008.
https://www.fueleconomy.gov/feg/how_tested.shtml
https://www.fueleconomy.gov/feg/epadata/99feg.pdf
https://www.fueleconomy.gov/feg/pdfs/guides/FEG2008.pdf
https://www.epa.gov/compliance-and-fuel-economy-data/data-cars-used-testing-fuel-economy
https://www.epa.gov/greenvehicles/testing-national-vehicle-and-fuel-emissions-laboratory
https://www.nhtsa.gov/laws-regulations/corporate-average-fuel-economy