The Department of Transportation oversees the fuel economy standards for vehicles built by manufacturers. The Environmental Protection Agency tests those vehicles, and the Department of Energy maintains a public database of the Environmental Protection Agency's findings. The subset of data used in this analysis covers 38 vehicle models that had a new release every year between 1999 and 2008. We analyzed the fuel efficiency of these models by examining the effects of engine displacement and fuel type.
The Corporate Average Fuel Economy (CAFE) standards regulate how far a vehicle must travel on a gallon of fuel. They were originally established in 1975. The standards are currently set by the National Highway Traffic Safety Administration, which falls under the United States Department of Transportation. The fuel economy tests that determine whether these standards are met are conducted under controlled conditions at the National Vehicle and Fuel Emissions Laboratory in Ann Arbor, Michigan. The laboratory is operated by the United States Environmental Protection Agency (EPA). Car manufacturers must submit their own fuel efficiency data to the EPA. The EPA then reviews the results and conducts its own tests to confirm the reports submitted by the manufacturers. The EPA test results are provided to the Department of Energy, the Department of Transportation, and the Internal Revenue Service. The Department of Energy maintains public records of the fuel economy tests conducted every year.
The mpg dataset used in this analysis is a subset of the information posted by the Department of Energy. The criterion for including a model in the mpg dataset was that it had a new release every year between 1999 and 2008. Thirty-eight models met this criterion, and the dataset records their 1999 and 2008 releases. The aim of this analysis is to examine the effects of fuel type and engine displacement on highway performance, using miles per gallon as the measure of that performance.
The mpg dataset contains only models that had a new release every year between 1999 and 2008. Eleven variables were observed in the dataset.
“manufacturer” refers to the vehicle manufacturer name. The dataset has fifteen distinct manufacturers.
“model” refers to the vehicle model name. The dataset has thirty-eight distinct models.
“displ” refers to the engine displacement in liters. Engine displacement is the combined volume swept by all of the pistons in the engine's cylinders. The engine displacement values in the dataset range from 1.6 to 7.0 liters.
“year” refers to the year the vehicle was manufactured. The vehicles in the data were manufactured in either 1999 or 2008.
“cyl” refers to the number of cylinders in the engine. The vehicles in the dataset have four-, five-, six-, or eight-cylinder engines.
“trans” refers to the type of transmission in the vehicles. The dataset covers ten different types of transmissions that are either automatic or manual.
“drv” refers to the type of drive train, where “f” is front-wheel drive, “r” is rear-wheel drive, and “4” is four-wheel drive.
“cty” refers to the miles per gallon in the city. City miles in the dataset range from 9 to 35 miles per gallon.
“hwy” refers to the miles per gallon on the highway. Highway miles in the dataset range from 12 to 44 miles per gallon.
“fl” refers to the type of fuel used by the vehicle. The five fuel types are “e” for ethanol, “d” for diesel, “r” for regular, “p” for premium, and “c” for natural gas.
“class” refers to the type of car. There are 7 distinct types of cars in the dataset.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.5 v dplyr 1.0.3
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
head(mpg)
## # A tibble: 6 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa~
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa~
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p compa~
## 4 audi a4 2 2008 4 auto(av) f 21 30 p compa~
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa~
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa~
str(mpg)
## tibble [234 x 11] (S3: tbl_df/tbl/data.frame)
## $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
## $ model : chr [1:234] "a4" "a4" "a4" "a4" ...
## $ displ : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr [1:234] "f" "f" "f" "f" ...
## $ cty : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr [1:234] "p" "p" "p" "p" ...
## $ class : chr [1:234] "compact" "compact" "compact" "compact" ...
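The counts and ranges quoted in the variable descriptions can be checked directly against the data. The following is a minimal sketch, assuming the mpg dataset shipped with ggplot2; the object names `distinct_counts` and `value_ranges` are introduced here for illustration only.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# Number of distinct values for the categorical variables described above
distinct_counts <- mpg %>%
  summarise(
    manufacturers = n_distinct(manufacturer),
    models        = n_distinct(model),
    transmissions = n_distinct(trans),
    fuel_types    = n_distinct(fl),
    classes       = n_distinct(class)
  )
distinct_counts

# Minimum (row 1) and maximum (row 2) of the numeric variables
value_ranges <- sapply(mpg[c("displ", "cty", "hwy")], range)
value_ranges

# Model years and cylinder counts present in the data
sort(unique(mpg$year))
sort(unique(mpg$cyl))
```

Running a check like this before writing up the variable descriptions helps keep the prose and the data in agreement.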
# Flag the three groups of points that sit away from the main trend
Outlier <- (mpg$displ > 5 & mpg$hwy > 20) |
  (mpg$displ < 5 & mpg$hwy > 40) |
  (mpg$displ < 5 & mpg$hwy < 15)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = Outlier, shape = fl)) +
  ggtitle("Effect of Engine Displacement on Highway Mileage") +
  xlab("Engine Displacement (L)") +
  ylab("Highway Miles Per Gallon") +
  geom_point(size = 2) +  # a constant size belongs outside aes()
  scale_colour_manual(values = c("FALSE" = "green", "TRUE" = "red"))
The graph displays a negative correlation between engine displacement and highway mileage. Two main factors affecting highway mileage are the weight of a vehicle and its engine displacement. Heavy vehicles like pickup trucks and SUVs tend to have higher engine displacement because of the size of their engines. This is consistent with heavier vehicles having lower highway and city miles per gallon than smaller vehicles. A third factor affecting mpg is the type of fuel used by the vehicle. Fuel type accounts for some of the outliers in the graph. The graph displays three groups of outliers, which are marked in red.
The first group of outliers is located on the far left of the graph, with highway mileage above 40 miles per gallon. These vehicles are the 1999 Volkswagen New Beetle and the 1999 Volkswagen Jetta, which run on diesel fuel. Diesel fuel has been shown to produce better fuel efficiency than other fuels. The use of diesel fuel is a factor in their highway mileage being higher than that of all the other vehicles. These vehicles' classes are subcompact and compact, which tend to have lower engine displacement and therefore better highway mpg.
The second group of outliers is located at the bottom center of the graph, with highway mileage below 15 miles per gallon. These vehicles are the 2008 Dodge Ram 1500 Pickup 4wd, 2008 Jeep Grand Cherokee 4wd, 2008 Dodge Durango 4wd, and 2008 Dodge Dakota Pickup 4wd. The vehicles' classes are pickup and SUV. Because these are larger vehicles, we would expect them to have higher engine displacement and low highway mpg. Another factor in their poor highway mileage is the type of fuel. These vehicles run on ethanol, and the graph shows that engines running on ethanol have lower highway mileage than other vehicles.
The third group of outliers is located on the far right of the graph, with highway mileage above 20 miles per gallon. These vehicles are the 1999 Chevrolet Corvette, 2008 Chevrolet Corvette, and 2008 Pontiac Grand Prix. The vehicles' classes are 2seater and midsize. Most of the other vehicles with high displacement are pickups and SUVs, so the smaller size of these vehicles seems to improve their highway mileage even though their engines have high displacement.
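The vehicles named in the three outlier groups can be listed by reapplying the thresholds used for the Outlier flag in the plot code. This is a minimal sketch under that assumption; the `outliers` table and the `outlier_group` labels are illustrative names, not part of the original analysis.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

outliers <- mpg %>%
  mutate(
    # Label each row with the threshold pair it satisfies; other rows get NA
    outlier_group = case_when(
      displ < 5 & hwy > 40 ~ "displ < 5, hwy > 40",
      displ < 5 & hwy < 15 ~ "displ < 5, hwy < 15",
      displ > 5 & hwy > 20 ~ "displ > 5, hwy > 20"
    )
  ) %>%
  filter(!is.na(outlier_group)) %>%
  select(outlier_group, year, manufacturer, model, displ, hwy, fl, class) %>%
  arrange(outlier_group, manufacturer, model, year)

# Show every flagged row rather than the default first ten
print(outliers, n = Inf)
```

Listing the rows this way makes it easy to confirm the fuel type and class of each flagged vehicle instead of reading them off the plot.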
ggplot(data = mpg, mapping = aes(x = fl, y = cty, fill = fl)) +
  ggtitle("Fuel Distribution") +
  xlab("Fuel Type") +
  ylab("City Miles Per Gallon") +
  geom_boxplot() +
  expand_limits(y = 0)
The graph above displays the distribution of city miles per gallon for each fuel type. The distribution for ethanol is lower than for the other fuel types, so we would expect cars running on ethanol to perform poorly in fuel economy. The distributions for premium and regular are similar. Based on the medians, we would expect cars running on premium to perform slightly better than cars running on regular gas. Regular gas has some outliers in its distribution. Natural gas had only one data point, so every box plot statistic takes the same value and the box collapses to a single line. Diesel had the widest spread. We would expect cars running on diesel to perform better than cars running on other fuels.
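The box plot comparison can be backed by a numeric summary per fuel type. The sketch below assumes the mpg dataset from ggplot2; `fuel_summary` is an illustrative name. The `n` column also shows how unevenly the fuel types are represented.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

fuel_summary <- mpg %>%
  group_by(fl) %>%
  summarise(
    n          = n(),          # number of observations for this fuel type
    median_cty = median(cty),  # center line of the box
    iqr_cty    = IQR(cty),     # height of the box
    min_cty    = min(cty),
    max_cty    = max(cty),
    .groups    = "drop"
  ) %>%
  arrange(desc(median_cty))

fuel_summary
```

A fuel type with very few rows, such as natural gas, should be read with caution, since its summary statistics rest on almost no data.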
The dataset revealed a negative correlation between engine displacement and highway mileage. Manufacturers will need to continue focusing on decreasing engine displacement to improve vehicle fuel economy. A review of data for the models after 2008 would help to confirm whether the manufacturers were able to reduce their vehicles' engine displacement. Fuel type has also been shown to have an impact on fuel efficiency. Vehicles running on ethanol had lower fuel efficiency than vehicles running on the other fuel types. Diesel fuel seems to provide the best performance. Since the dataset did not have an equal representation of all the fuel types, more data would be required to determine which fuel type is the most efficient.
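The negative relationship between engine displacement and highway mileage summarized above can also be quantified. The following is a rough sketch, assuming the mpg dataset from ggplot2. The straight-line fit is a simplifying assumption added here for illustration and is not part of the original analysis; `displ_hwy_cor` and `displ_hwy_fit` are illustrative names.

```r
library(ggplot2)  # provides the mpg dataset

# Pearson correlation between engine displacement and highway mpg;
# a negative value supports the trend seen in the scatter plot
displ_hwy_cor <- cor(mpg$displ, mpg$hwy)
displ_hwy_cor

# Simple linear fit: the displ coefficient estimates the change in
# highway mpg associated with one additional liter of displacement
displ_hwy_fit <- lm(hwy ~ displ, data = mpg)
coef(displ_hwy_fit)
```

Because the fit ignores fuel type and vehicle class, it describes only the overall trend and does not account for the outlier groups discussed earlier.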
https://www.epa.gov/compliance-and-fuel-economy-data/data-cars-used-testing-fuel-economy
https://www.epa.gov/greenvehicles/testing-national-vehicle-and-fuel-emissions-laboratory
https://www.nhtsa.gov/laws-regulations/corporate-average-fuel-economy
https://www.fueleconomy.gov/feg/how_tested.shtml
https://driving-tests.org/beginner-drivers/types-and-grades-of-fuel/
https://www.rdocumentation.org/packages/ggplot2/versions/3.3.3/topics/mpg