Abstract
I compared the highway miles per gallon to the engine displacement of a set of cars from 1999 to 2008 and found a negative correlation. Similarly, I compared the number of cylinders to the highway miles per gallon and found a negative correlation.
Introduction
I used the fuel economy data of cars from 1999 to 2008 provided by the Environmental Protection Agency (EPA). While the data is likely credible because it comes from a reliable source, it is outdated: even the most recent cars in the data set are over 10 years old. Additionally, the data set appears to include only the years 1999 and 2008, not the years in between, so it may not represent the whole time period accurately.
Description of the data
The data set that I used provides 11 variables for 234 cars. The variables that I used in this analysis are “hwy”, the number of miles per gallon the car gets on the highway; “displ”, the engine displacement in liters; “class”, the type of car, for example minivan or SUV; “model”, the name of the car model; and “cyl”, the number of cylinders in the car's engine.
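The description above can be checked directly against the data. The following is a minimal sketch, assuming the mpg data set that ships with ggplot2; the name mpgUsed is a hypothetical helper introduced only for this illustration.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)    # provides select() and the %>% pipe

# Number of cars (rows) and number of variables (columns)
dim(mpg)

# Keep only the five variables used in this report
# (mpgUsed is a hypothetical name, not part of the original analysis)
mpgUsed <- mpg %>%
  select(model, displ, cyl, hwy, class)

# Preview the first rows and summarize each column
head(mpgUsed)
summary(mpgUsed)

# Which model years actually appear in the data
sort(unique(mpg$year))
```

The last line is a quick way to see which years the data set covers.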
Data Analysis
This is the library that I used to load the data and make the graphs.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
help(mpg)
This graph compares the highway miles per gallon and the engine displacement of the cars in the mpg data set. It shows a negative correlation between engine displacement and highway miles per gallon.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
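The negative relationship in the scatter plot can also be expressed as a number. The sketch below is one way to do that, assuming a Pearson correlation and a straight-line fit are reasonable summaries here; neither is part of the original analysis, and displHwyCor is a hypothetical name.

```r
library(ggplot2)  # provides the mpg data set and the plotting functions

# Pearson correlation between engine displacement and highway mpg;
# a value below 0 indicates a negative linear relationship
displHwyCor <- cor(mpg$displ, mpg$hwy)
displHwyCor

# Same scatter plot with a fitted straight line added (lm = linear model)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

A straight line is only a rough summary: if the points curve, the correlation understates how closely the two variables are related.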

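This creates an object that contains the cars I treated as outliers: cars that get more than 40 highway miles per gallon, all 2-seaters, and the Grand Prix with the 5.3-liter engine.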
mpgOutliers <- mpg %>%
  filter(hwy > 40 | class == "2seater" | (model == "grand prix" & displ == 5.3))
This graphs the mpg data set in the same way as the previous graph but highlights the outliers in red.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_point(data = mpgOutliers, mapping = aes(x = displ, y = hwy), color = "red")
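To see which cars the filter actually flags, and how much they affect the overall trend, something like the following could be used. This is a sketch only: it repeats the filter so that it runs on its own, and mpgNoOutliers is a hypothetical name for the remaining cars.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)    # provides filter(), select(), arrange()

# Same filter as in the report, repeated so this block is self-contained
mpgOutliers <- mpg %>%
  filter(hwy > 40 | class == "2seater" | (model == "grand prix" & displ == 5.3))

# List the flagged cars with the columns that explain why they were flagged
mpgOutliers %>%
  select(manufacturer, model, year, displ, hwy, class) %>%
  arrange(desc(hwy))

# Everything that was NOT flagged (the negated filter condition)
mpgNoOutliers <- mpg %>%
  filter(!(hwy > 40 | class == "2seater" | (model == "grand prix" & displ == 5.3)))

# Correlation between displacement and highway mpg, with and without outliers
cor(mpg$displ, mpg$hwy)
cor(mpgNoOutliers$displ, mpgNoOutliers$hwy)
```

Comparing the two correlations shows whether the flagged cars weaken or strengthen the trend.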

This graph compares the number of cylinders in the different cars to the miles per gallon on the highway. It shows that the more cylinders a car has, the fewer miles per gallon it tends to get on the highway. I chose a scatter plot for this comparison because it illustrates the variation in miles per gallon: while there is a general negative correlation, some cars with 8 cylinders still get better miles per gallon than some cars with 4 cylinders.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy))
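Because many points overlap in this plot, a small summary table per cylinder count can make the same comparison easier to read. The sketch below is an illustrative addition; hwyByCyl and its column names are hypothetical.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)    # provides group_by() and summarise()

# Minimum, mean, and maximum highway mpg for each cylinder count
hwyByCyl <- mpg %>%
  group_by(cyl) %>%
  summarise(
    cars     = n(),        # number of cars with this cylinder count
    min_hwy  = min(hwy),
    mean_hwy = mean(hwy),
    max_hwy  = max(hwy),
    .groups  = "drop"
  )
hwyByCyl
```

Where the max_hwy of one group exceeds the min_hwy of a group with fewer cylinders, the groups overlap, which is the variation described above.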

Conclusion
In conclusion, I found that cars with more cylinders tend to get fewer highway miles per gallon, which means that someone who wants the best highway mpg should look at cars with fewer cylinders. Engine displacement shows a similar relationship, so they should also consider a car with a smaller engine displacement.
References
This is where the data set came from.