This is my analysis of a set of auto data that is part of the ISLR library, which accompanies the book “An Introduction to Statistical Learning, with Applications in R” (2013) by G. James, D. Witten, T. Hastie and R. Tibshirani. Its variables include MPG, cylinders, displacement, horsepower, weight, and acceleration.
library(ggplot2)  # provides ggplot() and geom_point() used in the plots below
auto <- read.csv("Auto-Data-Set.csv")
As a bit of a car person (mostly high-performance), I’ve generally kept track of the various factors that make for fast cars. The main ones are the number of cylinders and the overall horsepower. In simple terms, the more cylinders an engine has, the more horsepower it can generate. It seems logical that the more power an engine generates, the more fuel it requires, which should show up as a lower miles per gallon (MPG) rating for the vehicle. In this data set, MPG ratings range from 9 to 46.6 (see graph below).
hist(auto$mpg)
To look at what I believed to be the key factors affecting MPG, I plotted the number of cylinders against the MPG for each vehicle (see graph below). The plot makes it very clear that as the number of cylinders increases, MPG tends to decrease. This is what I expected, and it makes sense, as more cylinders require more fuel.
ggplot(auto, aes(x=cylinders, y=mpg)) + geom_point()
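The downward trend in the scatter plot can also be summarised numerically. The following is a minimal sketch, not part of the original analysis: `mean_mpg_by` is a hypothetical helper name, and it assumes only that the data frame has an `mpg` column and the grouping column named in the call.

```r
# Hypothetical helper (an illustration, not part of the original analysis):
# mean MPG and number of vehicles for each distinct value of a grouping column.
mean_mpg_by <- function(df, group_col) {
  # Keep only rows where both mpg and the grouping column are present
  keep <- !is.na(df$mpg) & !is.na(df[[group_col]])
  df <- df[keep, ]

  # tapply() splits mpg by group and applies the function to each piece
  means  <- tapply(df$mpg, df[[group_col]], mean)
  counts <- tapply(df$mpg, df[[group_col]], length)

  data.frame(
    group    = names(means),       # group labels, as character strings
    n        = as.vector(counts),  # vehicles in each group
    mean_mpg = as.vector(means),   # average MPG in each group
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}
```

With the data loaded as above, `mean_mpg_by(auto, "cylinders")` would return one row per cylinder count. If the visual impression holds, the `mean_mpg` column should fall as the cylinder count rises.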
Similarly, the greater the horsepower, the lower the MPG. This too makes sense, as it takes more fuel to generate more horsepower.
ggplot(auto, aes(x=horsepower, y=mpg)) + geom_point()
## Warning: Removed 5 rows containing missing values (geom_point).
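The warning means that some rows were left out of the plot because a value was missing. One way to put a number on the relationship while handling those rows explicitly is a correlation over complete rows only. This is a rough sketch rather than part of the original analysis: `mpg_correlation` is a hypothetical helper, and it assumes the column passed in is numeric with missing entries stored as `NA`.

```r
# Hypothetical helper (an illustration, not part of the original analysis):
# Pearson correlation between mpg and another numeric column, using only
# rows where both values are present.
mpg_correlation <- function(df, col) {
  x <- df[[col]]

  # TRUE for rows that have both an mpg value and a value in `col`
  complete <- !is.na(df$mpg) & !is.na(x)

  list(
    n_dropped = sum(!complete),               # rows excluded as incomplete
    r = cor(df$mpg[complete], x[complete])    # Pearson correlation coefficient
  )
}
```

Calling `mpg_correlation(auto, "horsepower")` would report how many rows were dropped alongside the coefficient. A negative `r` would agree with the pattern in the plot, although a correlation only measures how linear the relationship is.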
Overall, MPG is a function of the energy required to move the automobile. That energy comes from fuel, so the more components that consume fuel (cylinders) or the higher the performance you want (horsepower), the lower the MPG rating will be.