Roughly 100 million barrels of crude oil are consumed worldwide each day. Of those 100 million barrels, around 37 to 40 million are refined into gasoline and diesel for vehicles. The U.S. is one of the largest consumers of gasoline in the global market, burning an estimated 375 to 391 million gallons of gasoline per day. This makes up around 35% of the world’s daily gasoline consumption.
During the analysis of vehicles.csv, I examined two research questions. First, I used an interactive boxplot in R to analyze whether the number of cylinders in an engine affects highway miles per gallon (MPG). Second, I created an interactive line graph in Tableau to examine whether average highway MPG over a 42-year timeframe is affected by fuel type. The fuel types analyzed are premium gasoline, regular gasoline, diesel, and electricity.
In this analysis, I evaluate the dataset “vehicles.csv,” which was collected from FuelEconomy.gov. The data were gathered through laboratory testing by the U.S. Environmental Protection Agency (EPA) and various vehicle manufacturers. The dataset records 84 vehicle characteristics in total, including model year, transmission type, number of cylinders, vehicle class, city08 (city MPG), and highway08 (highway MPG).
I chose this dataset because Americans spend an average of 27.2 minutes on a one-way work commute. At a steady highway speed of 65 mph, that commute would cover 29.5 miles. In busy city traffic at 35 mph, it would cover only 15.9 miles. My research question is: Does the number of cylinders affect highway miles per gallon (MPG)? If so, by how much? Through this analysis, I will explore…
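The commute distances above follow from distance = speed × time, with the 27.2 minutes converted to hours. Here is a quick check in R that uses only the numbers from the text:

```r
# One-way commute time from the text, converted from minutes to hours
commute_hours <- 27.2 / 60

# Distance covered = speed (mph) x time (hours)
highway_miles <- 65 * commute_hours  # steady highway speed
city_miles    <- 35 * commute_hours  # slower city/traffic speed

round(c(highway = highway_miles, city = city_miles), 1)
# highway = 29.5, city = 15.9
```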
# library(readr) loads readr, which reads rectangular (tabular) data such as CSV files quickly
# library(ggplot2) loads ggplot2, which is used to build graphs
# library(dplyr) loads dplyr, which provides data manipulation tools
# library(plotly) loads plotly, which turns graphs into interactive charts
library(readr)
library(ggplot2)
library(dplyr)
library(plotly)
# Data cleaning and filtering starts where filtered_vehicles is created and ends at the second mutate()
# read_csv() is part of the readr package; do not confuse it with read.csv(), base R's built-in CSV reader
# %>% (the pipe) passes the result on its left into the function on its right
# mutate() creates, modifies, or deletes columns in a data frame
vehicles <- read_csv("vehicles.csv")
Rows: 49995 Columns: 84
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (23): drive, eng_dscr, fuelType, fuelType1, make, model, mpgData, trany,...
dbl (59): barrels08, barrelsA08, charge120, charge240, city08, city08U, city...
lgl (2): phevBlended, tCharger
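The code that creates filtered_vehicles and cylinders_factor is not included in this section, even though the plotting and modeling code below relies on both. What follows is a hypothetical reconstruction, not the author's original pipeline.

It assumes the cleaning step does two things:

- Drops rows with missing cylinders, displ, or highway08. This would presumably exclude electric vehicles, since they are listed without a cylinder count.
- Turns cylinders into a factor ordered from the highest median highway MPG to the lowest, as the boxplot subtitle describes.

It uses base R instead of dplyr's filter() and mutate() so it runs without extra packages. The helper name clean_vehicles is made up. Whether it reproduces the exact rows used in the regression has not been checked.

```r
# Hypothetical helper (not the author's original code): keeps rows that have
# cylinders, displ, and highway08, then adds cylinders_factor with levels
# ordered from the highest median highway MPG to the lowest.
clean_vehicles <- function(vehicles) {
  keep <- !is.na(vehicles$cylinders) & !is.na(vehicles$displ) &
    !is.na(vehicles$highway08)
  out <- vehicles[keep, , drop = FALSE]

  # Median highway MPG for each cylinder count (names are the cylinder counts)
  medians <- tapply(out$highway08, out$cylinders, median)

  # Level order: descending median, so the most efficient group plots first
  level_order <- names(medians)[order(medians, decreasing = TRUE)]
  out$cylinders_factor <- factor(out$cylinders, levels = level_order)
  out
}

# Usage with the real data (assumes vehicles was read as above):
# filtered_vehicles <- clean_vehicles(vehicles)
```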
# library(plotly) loads plotly (already loaded above; repeating it is harmless)
# ggplot() starts a blank plot from the filtered_vehicles data
# aes() (short for aesthetics) maps variables to the x axis, y axis, and fill color
# geom_boxplot() draws one boxplot per cylinder count
# scale_x_discrete() sets up the discrete x axis; with no arguments it keeps the factor's category order
# scale_fill_manual() sets a custom fill palette; rep("pink", 9) gives all nine cylinder groups the same color
# labs() sets the title, subtitle, axis labels, caption, and legend title
# theme_minimal() applies a clean, minimal theme
# ggplotly() converts the ggplot object into an interactive plotly chart
library(plotly)
my_plot <- ggplot(filtered_vehicles,
                  aes(x = cylinders_factor, y = highway08, fill = cylinders_factor)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.4) +
  scale_x_discrete() +
  scale_fill_manual(values = rep("pink", 9)) +
  labs(
    title = "The number of car cylinders impacting highway efficiency",
    subtitle = "Boxplots ordered from highest median highway MPG to lowest",
    x = "Number of Cylinders (Ordered by Descending Efficiency)",
    y = "Highway Mileage (MPG)",
    caption = "Source: U.S. Department of Energy Fuel Economy Data",
    fill = "Cylinder Count"
  ) +
  theme_minimal()
ggplotly(my_plot)
The interactive boxplot shows that highway miles per gallon (MPG) is strongly affected by the number of cylinders in a vehicle’s engine. Vehicles with 3-cylinder engines have a median highway rating of 36 MPG, while 4-cylinder vehicles have a median of about 29 MPG. Apart from that, the drop in MPG is relatively small until vehicles reach 12 cylinders, which have a median of only 10 MPG.
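A boxplot's center line marks the median, not the mean. The per-cylinder figures can also be computed directly as a cross-check. Below is a minimal base-R sketch that reports both statistics side by side. The function name hwy_by_cylinders and the demo rows are illustrative only; with the real data you would call it on filtered_vehicles.

```r
# Per-cylinder median and mean of highway MPG
hwy_by_cylinders <- function(df) {
  med <- aggregate(highway08 ~ cylinders, data = df, FUN = median)
  avg <- aggregate(highway08 ~ cylinders, data = df, FUN = mean)
  # merge() appends the suffixes to the two highway08 columns
  merge(med, avg, by = "cylinders", suffixes = c("_median", "_mean"))
}

# Made-up rows, only to show the shape of the result
demo <- data.frame(
  cylinders = c(3, 3, 3, 4, 4, 4),
  highway08 = c(35, 36, 40, 28, 29, 33)
)
hwy_by_cylinders(demo)
```

On skewed groups the median and mean can differ noticeably, so it is worth stating which one a quoted figure refers to.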
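# lm() fits a linear model: highway08 predicted from cylinders and displ (engine displacement in liters)
# summary() summarizes the fitted model (coefficients, standard errors, R-squared)
# print() displays the summary
model <- lm(highway08 ~ cylinders + displ, data = filtered_vehicles)
model_summary <- summary(model)
print(model_summary)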
Call:
lm(formula = highway08 ~ cylinders + displ, data = filtered_vehicles)
Residuals:
Min 1Q Median 3Q Max
-15.477 -2.810 -0.329 2.522 30.153
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 35.32831 0.07019 503.323 < 2e-16 ***
cylinders -0.18579 0.02628 -7.069 1.59e-12 ***
displ -2.96152 0.03456 -85.696 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.353 on 48476 degrees of freedom
Multiple R-squared: 0.4936, Adjusted R-squared: 0.4936
F-statistic: 2.363e+04 on 2 and 48476 DF, p-value: < 2.2e-16
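Read as an equation, the fitted model is highway08 ≈ 35.33 − 0.186 × cylinders − 2.96 × displ.

- **Cylinders:** holding displacement fixed, each additional cylinder is associated with about 0.19 MPG lower highway fuel economy.
- **Displacement:** each additional liter of displacement, at a fixed cylinder count, is associated with about 3 MPG lower.

Both coefficients are statistically significant. Cylinder count and displacement tend to rise together, so the cylinder coefficient here measures the difference at a fixed displacement. That is likely why it is much smaller than the gaps between the boxplots.

The sketch below plugs the printed coefficients into the equation for two illustrative engines, which are not rows from the dataset. With the fitted object, predict(model, newdata = ...) gives the same values without retyping coefficients.

```r
# Coefficients copied from the summary(model) output above
b0      <- 35.32831  # intercept
b_cyl   <- -0.18579  # per additional cylinder, displacement held fixed
b_displ <- -2.96152  # per additional liter of displacement, cylinders held fixed

predict_hwy <- function(cylinders, displ) {
  b0 + b_cyl * cylinders + b_displ * displ
}

# Two illustrative engines: 4 cylinders / 2.0 L and 8 cylinders / 5.0 L
round(predict_hwy(cylinders = c(4, 8), displ = c(2.0, 5.0)), 1)
# returns 28.7 19.0
```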
# model_summary$adj.r.squared extracts the adjusted R-squared from the model summary
# round(..., 4) rounds to four decimal places
# print() displays the result
adj_r_squared <- model_summary$adj.r.squared
print(paste("Adjusted R-Squared:", round(adj_r_squared, 4)))
[1] "Adjusted R-Squared: 0.4936"
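Adjusted R² penalizes R² for the number of predictors p: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1).

The residual degrees of freedom in the summary (48476) equal n − p − 1. With p = 2, the model was therefore fit on n = 48,479 rows. That is 1,516 fewer than the 49,995 read from the CSV, removed either by the cleaning step or by lm() dropping missing values.

With that many rows the penalty is negligible, which is why multiple and adjusted R² both print as 0.4936. In plain terms, cylinders and displacement together account for roughly half of the variation in highway MPG. A small check:

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

p <- 2              # predictors: cylinders and displ
n <- 48476 + p + 1  # residual df + p + 1 = rows used in the fit (48479)
round(adjusted_r2(0.4936, n, p), 4)
# returns 0.4936
```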
# par(mfrow = c(2, 2)) arranges the next plots in a 2 x 2 grid
# plot() on an lm model draws four diagnostic plots: residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(model)
The interactive line graph created in Tableau uses three variables. Two are quantitative: year and highway08 (average highway fuel economy in MPG). The third, fuel type, is qualitative and covers the four most commonly used vehicle fuels: premium gasoline, regular gasoline, diesel, and electricity. The graph shows a visible rise in average highway MPG for electric vehicles over time. It also shows that vehicles running on regular gasoline average higher MPG than those running on premium gasoline. In 2026, regular-gasoline vehicles average 29.15 MPG, compared with only 25.79 MPG for premium-gasoline vehicles.
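For each model year, the Tableau line graph plots the mean of highway08 within each fuel type. The same aggregation can be sketched in base R to cross-check the values read from Tableau.

This sketch assumes the fuel type comes from the fuelType1 column and that its labels include "Regular Gasoline", "Premium Gasoline", "Diesel", and "Electricity". Check unique(vehicles$fuelType1) before relying on them. The function name and demo rows are hypothetical.

```r
# Mean highway MPG per model year and fuel type.
# fuel_types holds the assumed fuelType1 labels for the four fuels in the graph.
yearly_hwy_by_fuel <- function(df,
                               fuel_types = c("Regular Gasoline", "Premium Gasoline",
                                              "Diesel", "Electricity")) {
  df <- df[df$fuelType1 %in% fuel_types, , drop = FALSE]
  aggregate(highway08 ~ year + fuelType1, data = df, FUN = mean)
}

# Made-up rows, only to show the shape of the result
demo <- data.frame(
  year = c(2025, 2025, 2026, 2026, 2026),
  fuelType1 = c("Regular Gasoline", "Regular Gasoline", "Regular Gasoline",
                "Premium Gasoline", "Premium Gasoline"),
  highway08 = c(30, 32, 29, 25, 27)
)
yearly_hwy_by_fuel(demo)

# With the real data: yearly_hwy_by_fuel(vehicles)
```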
Conclusion
The analysis in R shows that vehicles with fewer cylinders, on average, achieve higher highway miles per gallon (MPG). However, four-cylinder engines come fairly close to three-cylinder engines. As the boxplot shows, the data collected from 1984 to 2026 contain a large number of outliers. Additionally, the Tableau line graph illustrates the growth in electric vehicle efficiency. In 2026, the average highway MPG for an electric vehicle is 87.66 MPG, far above premium gasoline at 25.79 MPG and diesel at 24.86 MPG.
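To put the electric-vehicle gap in numbers, the 2026 averages quoted above can be compared as ratios. For electric vehicles, fueleconomy.gov reports efficiency as miles per gallon of gasoline-equivalent (MPGe). The electric figure is therefore an energy-equivalent rating rather than a literal gallons-of-fuel measure.

```r
# 2026 average highway ratings quoted in the conclusion
hwy_2026 <- c(electric = 87.66, premium = 25.79, diesel = 24.86)

# How many times higher the electric average is than each fuel
ratios <- hwy_2026[["electric"]] / hwy_2026[c("premium", "diesel")]
round(ratios, 2)
# premium: 3.40, diesel: 3.53
```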
If I had additional time, I would create a graph examining whether transmission type is correlated with miles per gallon (MPG). I would focus on specific transmissions such as 6-speed manuals, 10-speed automatics, and continuously variable transmissions (CVTs). I would also like to examine whether different fuels, such as E85 and 87, 89, 91, and 93 octane gasoline, have any beneficial or detrimental effect on fuel injectors.
Overall, the analysis and evaluation in R and Tableau were effective in answering both research questions.