Abstract:
The scatter plot of government-sourced car data demonstrates that engine displacement and highway mileage have a strong negative correlation. The relationship arises because higher-displacement vehicles use more fuel, which lowers fuel efficiency. There are three outliers, all made by Volkswagen in 1999. These cars share key characteristics: all run on diesel, two of the three have manual transmissions, and all have a compact or subcompact body. These characteristics allow them to greatly surpass the average highway mileage of the cars in the data set. In addition, the relationship between car manufacturer and class was investigated. Each car manufacturer has a niche in the market. For example, small car manufacturers such as Land Rover and Mercury solely produced SUVs. Similarly, Dodge is the sole producer of minivans and Chevrolet the sole producer of 2seater cars.
Introduction:
The car data was gathered by the U.S. Department of Energy and includes data from 1999-2008. The car data reports are still accessible individually by year, in data file and PDF format. Because parts of the data go back to 1999, some values may be inconsistent with today's methodology for calculating fuel efficiency. Furthermore, because the cars are older, the data may not be as applicable to today's car usage. Lastly, the data was collected by the U.S. DOE and therefore covers only cars sold in the United States, so it may not be applicable to other countries.
Data Description:
The data set has a sample size of 234 cars and 11 variables. The variables include manufacturer, model, year made, class, displacement, and highway mileage. The scatter plot below represents the relationship between x = displ and y = hwy. The variable displ is the engine displacement in liters, meaning the cylinder volume swept by all of the engine's pistons. The variable hwy is the car's highway mileage in miles per gallon. The only bar graph presented uses the same data set to compare manufacturer and class (vehicle size).
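The size and variables described above can be checked directly. This is a minimal sketch, assuming the data set is the mpg table that ships with ggplot2 (the same one used in the plotting code at the end of this report); the name mpg_dims is introduced here only for illustration.

```r
library(ggplot2)

# Number of rows (cars) and columns (variables) in the data set
mpg_dims <- dim(mpg)
print(mpg_dims)

# Names of all variables, followed by a summary of the two plotted ones
print(names(mpg))
print(summary(mpg[, c("displ", "hwy")]))
```

Running this before any columns are added should report the 234 cars and 11 variables mentioned above.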
Data Analysis:
The scatter plot demonstrates a strong negative correlation between engine displacement and highway mileage across the different car classes. A higher engine displacement gives more engine power, which causes more fuel consumption and thus lower highway mileage. This relationship is evident in the graph: cars with a displacement of 4 or higher tend to average around 16-20 mpg on the highway, while cars with a displacement lower than 4 average around 25-30 mpg. Another important observation is that bigger cars such as SUVs and pickups have higher displacements and are less fuel efficient.
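The strength of the relationship and the two group averages can also be computed rather than read off the graph. The sketch below assumes the ggplot2 mpg data and uses the cutoff of 4 from the paragraph above; the names displ_hwy_cor, displ_group, and group_means are hypothetical and not part of the original analysis.

```r
library(ggplot2)
library(dplyr)

# Pearson correlation between displacement and highway mileage
displ_hwy_cor <- cor(mpg$displ, mpg$hwy)
print(displ_hwy_cor)

# Number of cars and mean highway mileage on each side of the cutoff of 4
group_means <- mpg %>%
  mutate(displ_group = ifelse(displ >= 4, "displ >= 4", "displ < 4")) %>%
  group_by(displ_group) %>%
  summarise(n = n(), mean_hwy = mean(hwy))
print(group_means)
```

A clearly negative correlation and a lower mean for the high-displacement group would support the reading of the scatter plot given here.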
As can be seen in the graph, there are three red dots (two of them overlap) that represent outliers. These cars share key characteristics: all run on diesel, two of the Volkswagen models have manual transmissions, and all belong to a similar class, either compact or subcompact. According to the U.S. Department of Energy, diesel cars can travel 20-35% farther on a gallon of fuel than regular cars. In addition, a manual transmission improves gas mileage by 2-5 mpg compared to an automatic transmission. Both effects are visible in the outliers. For example, a 1999 Volkswagen New Beetle 1.9L diesel with a manual transmission has a highway mileage of 44 mpg. In contrast, the automatic 1999 Volkswagen New Beetle diesel has a highway mileage of 41 mpg. The 3 mpg difference is due to the transmission type. Moreover, the outliers stand out further because the majority of the cars in the data set have a displacement greater than 2.0 and use regular fuel.
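The outliers discussed above can be listed directly from the data. This is a small sketch, assuming the ggplot2 mpg data and the same hwy > 40 threshold used in the plotting code; the name outliers is chosen here for illustration.

```r
library(ggplot2)
library(dplyr)

# Keep only the cars above 40 highway mpg and show the columns discussed in the text
outliers <- mpg %>%
  filter(hwy > 40) %>%
  select(manufacturer, model, year, displ, trans, fl, class, hwy)
print(outliers)
```

In this table fl is the fuel type (d for diesel) and trans is the transmission type, so the shared characteristics can be read row by row.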
The bar graph demonstrates the relationship between car manufacturer and class (car size). The bar chart highlights which companies heavily produce certain car sizes. According to the data set and the graph, smaller luxury car companies like Land Rover, Mercury, Lincoln, and Jeep produce only SUVs. In addition, the larger car manufacturing companies like Ford, Dodge, and Toyota are the only producers of pickup trucks. Furthermore, Dodge is the only company that manufactures minivans and has the largest number of cars in the data set. Lastly, Honda only produces subcompact cars, Chevrolet is the sole producer of 2seater cars, and compact cars are produced by European manufacturers such as Volkswagen and Audi. Another important observation is that larger manufacturers like Ford and Dodge account for more of the car models represented. In sum, the data set is representative of market forces, as each car manufacturing brand has a niche in certain car classes.
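The "sole producer" observations above can be cross-checked with a count table. The sketch assumes the ggplot2 mpg data; the names counts and makers_by_class are hypothetical helpers, not part of the original analysis.

```r
library(ggplot2)

# Cross-tabulate manufacturer against class; a zero means no cars of that class
counts <- table(mpg$manufacturer, mpg$class)
print(counts)

# For each class, list the manufacturers that have at least one car in it
makers_by_class <- lapply(split(mpg$manufacturer, mpg$class), unique)
print(makers_by_class[c("minivan", "2seater", "pickup")])
```

A class with a single manufacturer listed corresponds to a bar segment that appears for only one company in the bar graph.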
Conclusion:
The scatter plot demonstrated a clear relationship between engine displacement and highway mileage, which are strongly negatively correlated. This is because higher displacement leads to lower fuel efficiency. A sizable share of the cars were in the higher displacement range of 4 or above and averaged roughly 20 mpg. Furthermore, the three outliers marked in red occurred because the cars have low displacement and use diesel fuel, and two of them have manual transmissions. The bar graph represents the relationship between car manufacturer and class. As can be seen on the graph, certain car manufacturers are the sole producers of particular vehicle classes. This is representative of the car marketplace, where certain car manufacturers control the production of a specific car size.
References:
https://www.fueleconomy.gov/feg/bymodel/1999_Volkswagen_Jetta.shtml
https://www.fueleconomy.gov/feg/download.shtml
https://www.fueleconomy.gov/feg/di_diesels.shtml
https://www.consumerreports.org/cro/2012/01/save-gas-and-money-with-a-manual-transmission/index.htm
http://www.sthda.com/english/wiki/ggplot2-colors-how-to-change-colors-automatically-and-manually
Scatter plot of displ vs hwy:
library(ggplot2)
ggplot(data=mpg)+
geom_point(mapping = aes(x = displ, y = hwy, color=class))
Scatter plot:
Used dplyr's mutate function to add a ptColor column that marks each car as an outlier or non-outlier, then mapped ptColor to the color aesthetic.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
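# Label every car "black", then relabel the high-mileage outliers (hwy > 40) as "red"
mpg <- mutate(mpg, ptColor = "black")
mpg[mpg$hwy > 40, "ptColor"] <- "red"
# ptColor is mapped as a category, so ggplot2 uses its default palette, not these literal colors
g <- ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = ptColor))
print(g)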
Scatter plot: Car displ vs hwy with outliers highlighted in red:
Used the previous geom function together with scale_color_manual to manually establish the colors for the data set, which are "black" for normal data and "red" for outliers.
g=ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=hwy,color=ptColor))+scale_color_manual(values=c("black","red"))
print(g)
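A possible variation, not part of the original analysis: the unnamed vector c("black","red") is matched to the ptColor values in sorted order, which happens to line up here. Naming the values makes the pairing explicit. This sketch assumes the ggplot2 mpg data and the same hwy > 40 threshold; mpg_colored and g_named are hypothetical names.

```r
library(ggplot2)
library(dplyr)

# Rebuild the color label in one step: "red" for outliers, "black" otherwise
mpg_colored <- mutate(mpg, ptColor = ifelse(hwy > 40, "red", "black"))

# Named values tie each label to its color regardless of level order
g_named <- ggplot(data = mpg_colored) +
  geom_point(mapping = aes(x = displ, y = hwy, color = ptColor)) +
  scale_color_manual(values = c(black = "black", red = "red"))
print(g_named)
```

The resulting plot should look the same as the one above; the difference is only that the color assignment no longer depends on the order of the labels.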
Bar graph: Manufacturer vs class
Used geom_bar to make a bar graph and mapped class to the fill aesthetic to differentiate the car classes.
ggplot(data=mpg) +
geom_bar(mapping=aes(x=manufacturer,fill=class))