This data set reports fuel consumption (highway, city, and combined) and CO2 emissions, along with fuel type, transmission type, number of cylinders, engine size, and vehicle class, for specific car makes and models offered for retail sale in Canada from 2000 to 2022.
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.60 9.10 10.60 11.03 12.70 26.10
Min. 1st Qu. Median Mean 3rd Qu. Max.
83.0 209.0 243.0 250.1 288.0 608.0
Question: Is there a relationship between emissions, fuel consumption, and engine size for cars with more or fewer than 5 cylinders?
We predicted that fuel consumption and emissions would have a strong positive relationship: as the amount of fuel a car consumes increases, its emissions increase proportionally. While the relationship between the two seems intuitive, we wanted to explore how the size of the car would affect it, and hypothesized that the relationship between fuel consumption and emissions might be steeper for vehicles with larger engines.
As expected, there was a strong positive relationship between fuel consumption and emissions in 2022, and the relationship stayed consistent across engine sizes and cylinder counts (with larger engines and higher cylinder counts consuming more fuel and producing higher emissions). While the relationship between the two variables was very stable, a couple of outlying points did not fall on the line of points, most visibly where the engine size was between 4 and 6. When we looked into these outlying points, we found that they were all pick-up truck FFVs (flexible fuel vehicles), meaning that they have an internal combustion engine and can operate on gasoline or any blend of gasoline and ethanol up to 83%. Many FFVs run on ethanol, which is sustainably produced from ingredients such as cane sugar and corn and burns cleaner than regular gasoline. As a result, FFVs generally emit fewer greenhouse gases than vehicles that run on traditional gasoline, making them a more environmentally friendly alternative.
We decided to create a scatter plot to answer this first question, as it is best suited to exploring the relationship between two variables of interest (fuel consumption and emissions). The scatter plot also allowed us to use facets to split the large plot into categories based on engine size, which gave a closer look at the relationship between fuel consumption and emissions within each engine size grouping. Because we were also interested in the number of cylinders, we colored each vehicle's point by cylinder group so that those relationships could be explored in the same plot.
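The design described above could be sketched in ggplot2 roughly as follows. This is a minimal sketch under assumptions, not the code behind the report's figure: the toy data, the column names (`engine_size`, `cylinders`, `fuel_comb`, `co2`), and the engine-size bin edges are all hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: values and column names are made up for illustration.
vehicles <- tibble(
  engine_size = c(1.5, 2.0, 2.5, 3.5, 4.0, 5.3, 5.3, 6.2),
  cylinders   = c(4, 4, 4, 6, 6, 8, 8, 8),
  fuel_comb   = c(6.5, 7.8, 8.4, 10.9, 11.8, 13.9, 18.6, 15.1),
  co2         = c(152, 182, 197, 255, 276, 325, 310, 354)
)

vehicles_binned <- vehicles %>%
  mutate(
    # Assumed bin edges; cut() uses right-closed intervals, so 2.0 falls in "0-2".
    engine_bin = cut(engine_size, breaks = c(0, 2, 4, 6, 8),
                     labels = c("0-2", "2-4", "4-6", "6-8")),
    # Split at 5 cylinders, matching the research question.
    cyl_group = if_else(cylinders > 5, "More than 5", "5 or fewer")
  )

# One panel per engine-size bin, points colored by cylinder group.
p <- ggplot(vehicles_binned, aes(x = fuel_comb, y = co2, colour = cyl_group)) +
  geom_point() +
  facet_wrap(~ engine_bin) +
  labs(x = "Fuel consumption", y = "CO2 emissions", colour = "Cylinders")
```

Calling `print(p)` draws the faceted scatter plot. Binning engine size before faceting keeps the number of panels small; the bin edges are a judgment call and would need to match the groupings used in the actual figure.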
### Discussion
As seen in the main graphs and in this line graph, CO2 emissions and fuel consumption mirror each other. This graph shows CO2 emissions and fuel consumption for Toyota Prius models over time (2001-2022). Over the first decade, Prius emissions and fuel consumption trended steadily downward. Around 2014 to 2015, however, there was a dramatic spike in both. Even after 2017, although both CO2 emissions and fuel consumption decreased somewhat, both remained well above their best (lowest) values, which were recorded from the early 2000s to the early 2010s.
A possible explanation for the sudden spike is faulty evaporative emission control units. According to a 2016 Toyota recall, various models from 2006 to 2015 (including the Prius) had evaporative emission control systems that were prone to cracking. Such cracks result in fuel leaks (an increase in fuel consumption), and problems with the emission control units lead to high gas fume leakage (an increase in CO2 emissions). The explanation is plausible because the model years covered by the recall overlap with the years of the spike in the graph.
A line graph is the best option for visualizing these data because they are measured over time (around two decades). The line graph clearly marks the changes in CO2 emissions and fuel consumption through time. A strength of this chart is that it is simple to understand, and the trends in the data provide a springboard for more nuanced analysis. A weakness is that it is a sizable generalization, since it averages all Toyota Prius models from 2001 to 2022.
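The averaging step behind a chart like this could look roughly like the sketch below. It assumes the averaging is done per model year, which the text does not state explicitly; the toy values and the column names (`year`, `fuel_comb`, `co2`) are hypothetical and do not reproduce the report's data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical rows: two Prius variants per model year, invented for illustration.
prius <- tibble(
  year      = c(2001, 2001, 2010, 2010, 2015, 2015, 2022, 2022),
  fuel_comb = c(5.0, 5.2, 4.0, 4.2, 5.6, 5.8, 4.6, 4.8),
  co2       = c(115, 120, 92, 97, 129, 133, 106, 110)
)

prius_yearly <- prius %>%
  group_by(year) %>%
  # Average all variants within a model year (assumed aggregation).
  summarise(fuel_comb = mean(fuel_comb), co2 = mean(co2), .groups = "drop") %>%
  # Long format: one row per year and measure, so each measure gets its own panel.
  pivot_longer(c(fuel_comb, co2), names_to = "measure", values_to = "value")

p <- ggplot(prius_yearly, aes(x = year, y = value)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ measure, ncol = 1, scales = "free_y") +
  labs(x = "Model year", y = NULL)
```

Stacking the two measures in separate panels with free y-axes lets their shapes be compared even though they are on different scales. Because each point is an average, differences between individual Prius variants within a year are hidden, which is the weakness noted above.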
What is the density distribution of the class of fuel that is consumed the most and is there a clear correlation between that and the carbon emissions emitted?
Since we assume that most cars use regular gas, we predict that Fuel Type X (regular gas) will account for the greatest number of cars producing the highest emissions. This is simply a matter of numbers, because we expect there to be more cars using regular gas. We also predict that as fuel efficiency (mileage) increases, emissions will decrease.
It appears that, across all fuel types, the better a car's fuel efficiency, the fewer emissions it generates. We can reasonably conclude that this trend exists because highways allow cars to be more fuel efficient, since there are fewer stoppages, whereas city driving requires waiting at red lights, for example.
In the violin plot, the regular and premium fuel types are fairly evenly distributed, whereas ethanol and diesel are more densely concentrated around their respective median values of roughly 28 and 16. One feature of the violin plot is that it shows the maximum and minimum fuel consumption values for each fuel type. Moreover, the peaks in fuel consumption for regular and premium gas are higher than those for diesel and ethanol. This may be due to small sample sizes, but diesel and ethanol do not seem to vary much in their fuel consumption. Although the width of the violin plot was adjusted effectively, it is important to note that the width is more sensitive to individual observations when there are fewer data points. For example, the diesel violin is relatively wide in the region surrounding the median, but there are not many vehicles to smooth out this spread, so the width is overemphasized. The same observation can be made about ethanol, which is more concentrated below the median. Because the other fuel types have larger samples, their violins are less prone to being excessively wide in one area.
In the jittered boxplot, we noticed that there were far more observations for vehicles that use regular and premium gas than for those that use diesel and ethanol. This could be viewed as a limitation of our analysis, since two of the fuel types are sampled less heavily. However, we found the sampling sufficient to determine the amount of emissions produced. Additionally, the plot shows the median fuel consumption to be very similar for all fuel types except ethanol. Because the ethanol sample is smaller, this difference carries a level of uncertainty that needs to be explored further. Since there are inherently more regular and premium vehicles, we predict that a greater number of regular and premium cars will produce the highest emissions. We can infer that many of the regular and premium vehicles that fall below the median represent fuel consumed in cities. This supports our prediction that a greater number of emitting vehicles would fall in the regular and premium categories. The larger number of regular and premium vehicles also allows for more outliers in the data. The boxplot does a great job of capturing these outliers, as the difference between the minimum and maximum is substantial.
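The sample-size imbalance discussed above can be checked with a small grouped summary. The sketch below assumes hypothetical column names (`fuel_type`, `mileage`) and uses invented toy values, so its numbers say nothing about the real data set.

```r
library(dplyr)

# Hypothetical toy data; values and column names are invented for illustration.
vehicles <- tibble(
  fuel_type = c("Regular", "Regular", "Regular", "Regular",
                "Premium", "Premium", "Premium",
                "Diesel", "Ethanol"),
  mileage   = c(30, 27, 33, 25, 24, 22, 26, 29, 17)
)

fuel_summary <- vehicles %>%
  group_by(fuel_type) %>%
  summarise(
    n              = n(),             # number of vehicles in each fuel type
    median_mileage = median(mileage), # centre line of each box in a boxplot
    .groups = "drop"
  ) %>%
  arrange(desc(n))                    # most heavily sampled fuel types first
```

Printing `fuel_summary` next to the plot makes it easy to see which medians rest on only a handful of vehicles and therefore deserve more caution.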
What is the density distribution of the class of fuel that is consumed the most and is there a clear correlation between that and the carbon emissions emitted per class?
At first glance, the jittered boxplot depicts the density, interquartile range, median, minimum/maximum, and relationship between the fuel classes for each vehicle in our data set. For this reason, we found it effective in illustrating where the majority of the vehicles lie and how they compare with respect to the median, which is a measure of central tendency that is robust to outliers. Furthermore, the boxplot marks the quartiles of the data, allowing the reader to see the statistical spread of the vehicles per class. In addition, the color scaling of the emissions added an element to the chart that visibly displays the changes per fuel type.
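A jittered boxplot of this kind could be sketched in ggplot2 as follows. This is an illustration under assumptions rather than the report's own code: the toy data and the column names (`fuel_type`, `mileage`, `co2`) are hypothetical, and the viridis color scale is only a guess at the palette used.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; values and column names are invented for illustration.
vehicles <- tibble(
  fuel_type = rep(c("Regular", "Premium", "Diesel", "Ethanol"), times = c(4, 4, 2, 2)),
  mileage   = c(30, 27, 33, 25, 24, 22, 26, 20, 29, 31, 17, 15),
  co2       = c(180, 200, 165, 215, 225, 245, 210, 270, 190, 178, 260, 290)
)

p <- ggplot(vehicles, aes(x = fuel_type, y = mileage)) +
  # The box shows the median and quartiles; outlier glyphs are hidden
  # so that extreme points are not drawn twice once the raw points are added.
  geom_boxplot(outlier.shape = NA) +
  # Raw points, spread horizontally only, colored by emissions.
  geom_jitter(aes(colour = co2), width = 0.2, height = 0, alpha = 0.7) +
  scale_colour_viridis_c(name = "CO2 emissions") +
  labs(x = "Fuel type", y = "Fuel efficiency (mileage)")
```

Setting `height = 0` keeps the jitter from altering the plotted values on the measurement axis, so each point's vertical position still reflects its actual value.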
To avoid overplotting, we opted to create two graphs to display density rather than one. We could have created a half violin, half scatter plot but that would not have been able to fully capture the summary statistics that we wanted to show. Breaking the plots into two allows us to convey more information without overwhelming the audience.
Additionally, we found our complementary descriptive graph, the boxplot within the violin plot, to be the best display of density. The width of the violin plot was adjusted effectively to show the concentration of the data points with respect to the median. Since the boxplot sits within the violin plot, it puts the maximum and minimum of the data in perspective, allowing both plots to work jointly.
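A boxplot nested inside a violin plot could be built along these lines. The sketch is hedged: the toy data and column names (`fuel_type`, `mileage`) are hypothetical, and `scale = "width"` is only one possible reading of how the violin widths were adjusted.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; values and column names are invented for illustration.
vehicles <- tibble(
  fuel_type = rep(c("Regular", "Premium", "Diesel", "Ethanol"), times = c(5, 5, 3, 3)),
  mileage   = c(30, 27, 33, 25, 29, 24, 22, 26, 20, 23, 29, 31, 28, 17, 15, 16)
)

p <- ggplot(vehicles, aes(x = fuel_type, y = mileage, fill = fuel_type)) +
  # scale = "width" gives every violin the same maximum width (assumed setting).
  geom_violin(scale = "width") +
  # A narrow boxplot drawn on top marks the median and quartiles.
  geom_boxplot(width = 0.1, fill = "white") +
  labs(x = "Fuel type", y = "Fuel efficiency (mileage)") +
  theme(legend.position = "none")
```

With `scale = "width"`, a fuel type with only a few vehicles gets a violin as wide as one with many, which is one way the small-sample shapes described earlier can look overemphasized. Using `scale = "count"` instead scales each violin's area by its number of observations.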
Question 1:
While there is a very apparent trend between fuel consumption and emissions among vehicles, there were also, surprisingly, a couple of outlying points that did not follow this direct relationship. Looking into these outlying vehicles, we noticed that they were all pickup truck FFVs (flexible fuel vehicles). In recent years, new technology has allowed trucks to reduce fuel consumption and improve efficiency without losing performance or horsepower. Since FFVs have lower emissions than other vehicles for the same amount of fuel consumption, it may be useful to research how other vehicles could switch to ethanol or diesel, as it is cleaner for the environment, sustainably produced, and can even improve performance/horsepower.
While exploring the trend between fuel consumption and carbon dioxide emissions, we focused on a case study of the Toyota Prius, chosen specifically for its fuel efficiency. Looking at the data, there were surprising trends that contradicted our initial prediction (that CO2 emissions and fuel consumption should generally decrease over the two-decade span). In fact, a drastic spike occurs midway through the time span. After researching Toyota recalls, we found that a faulty emission control system seems to help explain the spike. Seeing that even renowned fuel-efficient cars can have faulty systems, we conclude that improving present designs is as important as researching innovative eco-friendly car ideas.
Question 2:
The clearest trend shown by the boxplot graph is that, for all fuel classes, the higher a car's mileage, the fewer emissions it creates. This trend seems intuitive, as highway driving allows cars to be more fuel efficient than driving in cities or the areas immediately surrounding them. Referring to our original prediction that the regular gas fuel class would have the most cars producing the highest emissions, we can say from our sample that the premium fuel class contains more cars that produce the most emissions, although not by a large margin.
We found vehicles that operate on regular and premium gas to be the most reliable basis for our analysis, simply because of their large sample size. A larger sample size allows for more certainty, and it will always inherently favor regular and premium gas vehicles since there are more of those on the road. Overall, our plots accurately convey the density of our data points while displaying the inverse relationship between a car's fuel efficiency (mileage) and the emissions it gives off.