This project examines how highway fuel efficiency compares across fuel types.
The data come from the vehicles dataset in the
fueleconomy package for R. The data were published by the EPA and
cover model years 1985 to 2015.
# Load the EPA vehicle data, then keep only fuel type and highway MPG
vehicles <- fueleconomy::vehicles
highway <- vehicles[, c("fuel", "hwy")]
# View the structure of dataset
str(highway)
## Classes 'tbl_df', 'tbl' and 'data.frame': 33442 obs. of 2 variables:
## $ fuel: chr "Regular" "Regular" "Regular" "Regular" ...
## $ hwy : num 26 28 26 27 29 26 27 29 26 23 ...
# Preview of first few lines of data
head(highway)
## fuel hwy
## 1 Regular 26
## 2 Regular 28
## 3 Regular 26
## 4 Regular 27
## 5 Regular 29
## 6 Regular 26
The full vehicles dataset contains 33,442 observations of cars and 12 variables. This analysis keeps only two of them: fuel type and highway MPG. Neither of these two variables has missing values.
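The size of the dataset and the missing-data claim can be checked directly by counting NA values per column. This is a minimal sketch that assumes the fueleconomy package is installed. It reloads the data the same way as above, then counts NA values for the two analysis columns and, for comparison, for every column of the full table.

```r
# Reload the EPA data as in the analysis above
vehicles <- fueleconomy::vehicles
highway <- vehicles[, c("fuel", "hwy")]

# Rows and columns of the full dataset
dim(vehicles)

# Number of NA values in each analysis column
colSums(is.na(highway))

# Number of NA values in every column of the full dataset
colSums(is.na(vehicles))
```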
# Summary of data
summary(highway)
## fuel hwy
## Length:33442 Min. : 9.00
## Class :character 1st Qu.: 19.00
## Mode :character Median : 23.00
## Mean : 23.55
## 3rd Qu.: 27.00
## Max. :109.00
table(highway$fuel)
##
## CNG Diesel
## 58 874
## Electricity Gasoline or E85
## 55 1043
## Gasoline or natural gas Gasoline or propane
## 18 8
## Midgrade Premium
## 43 8617
## Premium and Electricity Premium Gas or Electricity
## 1 7
## Premium or E85 Regular
## 88 22622
## Regular Gas and Electricity
## 8
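Highway fuel economy ranges from 9 MPG to 109 MPG, with a median of 23 MPG and a mean of 23.55 MPG. The fuel variable has 13 distinct labels, including Regular, Midgrade, Premium, Diesel, CNG, Electricity, E85 flex-fuel, and several gasoline/electricity combinations. Regular (22,622 cars) and Premium (8,617 cars) account for most of the observations.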
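The range and the number of distinct fuel labels quoted above can be reproduced with a few base R calls. A short sketch, assuming the fueleconomy package is installed:

```r
# Reload the analysis subset
vehicles <- fueleconomy::vehicles
highway <- vehicles[, c("fuel", "hwy")]

# Minimum and maximum highway MPG
range(highway$hwy)

# Number of distinct fuel labels
length(unique(highway$fuel))

# Fuel labels ordered from most to least common
sort(table(highway$fuel), decreasing = TRUE)
```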
# Cleaning the data: collapse the 13 fuel labels into five groups
# Every label not reassigned below stays "Gasoline"
highway$fuel_group <- "Gasoline"
highway$fuel_group[highway$fuel == "Diesel"] <- "Diesel"
highway$fuel_group[highway$fuel == "Electricity"] <- "Electric"
highway$fuel_group[highway$fuel %in% c("Regular Gas and Electricity",
                                        "Premium and Electricity",
                                        "Premium Gas or Electricity")] <- "Hybrid"
highway$fuel_group[highway$fuel == "CNG"] <- "CNG"
table(highway$fuel_group)
##
## CNG Diesel Electric Gasoline Hybrid
## 58 874 55 32439 16
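Because the dataset has 13 fuel labels, several with very few cars, I grouped similar types into five categories (Gasoline, Diesel, CNG, Hybrid, and Electric) to simplify later analysis, at the risk of losing specific detail. Every gasoline-based label, including Midgrade, Premium, E85 flex-fuel, and the gasoline/natural gas and gasoline/propane dual-fuel labels, was counted as Gasoline, and the three gasoline-and-electricity labels were counted as Hybrid.
To see how highway MPG is distributed within each fuel group, we can look at a boxplot of the data.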
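A cross-tabulation of the original labels against the new groups is one way to confirm that every label landed in the intended group. The sketch below rebuilds the same five groups and assumes the fueleconomy package is installed; the `hybrid_labels` and `mapping` names are my own illustrative shorthand, not part of the original code.

```r
vehicles <- fueleconomy::vehicles
highway <- vehicles[, c("fuel", "hwy")]

# Same five-group mapping as above; hybrid_labels is a helper vector
hybrid_labels <- c("Regular Gas and Electricity",
                   "Premium and Electricity",
                   "Premium Gas or Electricity")
highway$fuel_group <- "Gasoline"
highway$fuel_group[highway$fuel == "Diesel"] <- "Diesel"
highway$fuel_group[highway$fuel == "Electricity"] <- "Electric"
highway$fuel_group[highway$fuel %in% hybrid_labels] <- "Hybrid"
highway$fuel_group[highway$fuel == "CNG"] <- "CNG"

# Rows are original fuel labels, columns are assigned groups;
# each row should have exactly one nonzero cell
mapping <- table(highway$fuel, highway$fuel_group)
mapping
```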
# Exploring data with a boxplot
library(ggplot2)
ggplot(data = highway, aes(x = fuel_group, y = hwy)) +
  geom_boxplot() +
  labs(x = "Fuel Group", y = "Highway Fuel Economy (MPG)",
       title = "Highway Fuel Efficiency by Fuel Group")
Based on the boxplot, electric vehicles appear to have the highest highway fuel economy and the widest variation. Hybrid vehicles appear to have higher highway fuel efficiency than diesel, gasoline, and CNG vehicles. Outliers in the gasoline and hybrid groups indicate that some vehicles in these categories have MPG far from the typical values for their group. Overall, the plot suggests that fuel type is associated with fuel efficiency; whether the differences are statistically significant is tested with ANOVA below.
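Since the boxplot reading is visual, it can help to print the numbers behind it. This sketch assumes the fueleconomy package is installed, rebuilds the five groups, and reports each group's size, median, and mean highway MPG; object names such as `group_stats` are illustrative.

```r
vehicles <- fueleconomy::vehicles
highway <- vehicles[, c("fuel", "hwy")]

# Five-group mapping used in this project
hybrid_labels <- c("Regular Gas and Electricity",
                   "Premium and Electricity",
                   "Premium Gas or Electricity")
highway$fuel_group <- "Gasoline"
highway$fuel_group[highway$fuel == "Diesel"] <- "Diesel"
highway$fuel_group[highway$fuel == "Electricity"] <- "Electric"
highway$fuel_group[highway$fuel %in% hybrid_labels] <- "Hybrid"
highway$fuel_group[highway$fuel == "CNG"] <- "CNG"

# Size, median, and mean highway MPG for each fuel group
group_n      <- tapply(highway$hwy, highway$fuel_group, length)
group_median <- tapply(highway$hwy, highway$fuel_group, median)
group_mean   <- tapply(highway$hwy, highway$fuel_group, mean)
group_stats  <- cbind(n = group_n, median = group_median,
                      mean = round(group_mean, 2))
group_stats
```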
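$H_0: \mu_G = \mu_D = \mu_{CNG} = \mu_H = \mu_E$, where G, D, CNG, H, and E denote the Gasoline, Diesel, CNG, Hybrid, and Electric groups
$H_A$: at least one group's mean highway MPG differs from the others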
A one-way ANOVA will be used to compare mean highway MPG across the five fuel groups, at a significance level of α = 0.05.
#### ANOVA
# ANOVA testing: highway MPG by fuel group
anovatest <- aov(hwy ~ fuel_group, data = highway)
# ANOVA table for the fitted linear model
summary(anovatest)
## Df Sum Sq Mean Sq F value Pr(>F)
## fuel_group 4 180715 45179 1362 <2e-16 ***
## Residuals 33437 1109495 33
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
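The F value in this table is the ratio of the between-group mean square to the residual mean square. Working it through with the printed sums of squares, where k = 5 groups and N = 33,442 observations give 4 and 33,437 degrees of freedom:

```latex
F = \frac{SS_{\text{group}} / (k - 1)}{SS_{\text{resid}} / (N - k)}
  = \frac{180715 / 4}{1109495 / 33437}
  \approx \frac{45179}{33.18}
  \approx 1362
```

The residual mean square is printed as 33 only because of rounding; using 33.18 reproduces the reported F value of 1362.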
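Since the p-value (< 2e-16) is far below 0.05, we reject the null hypothesis (F = 1362 with 4 and 33,437 degrees of freedom). There is sufficient evidence that mean highway fuel efficiency differs among fuel types. ANOVA only shows that at least one group mean differs; it does not identify which groups differ, so statements about specific groups rest on the boxplot or on a post-hoc test such as Tukey's HSD. Generally, the results make sense given the differences between engine types and fuel systems. For example, electric and hybrid vehicles tend to have higher MPG values than traditional gasoline vehicles because electric motors convert more of their stored energy into motion than combustion engines do, and hybrids recapture some energy through regenerative braking.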
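As one possible post-hoc follow-up (not part of the original analysis), base R's `TukeyHSD()` compares every pair of group means while controlling the family-wise error rate. A minimal sketch, assuming the fueleconomy package is installed; `fuel_group` is converted to a factor here because `TukeyHSD()` works from the model's factor terms and may not pick up a plain character column.

```r
vehicles <- fueleconomy::vehicles
highway <- vehicles[, c("fuel", "hwy")]

# Five-group mapping, stored as a factor for TukeyHSD()
hybrid_labels <- c("Regular Gas and Electricity",
                   "Premium and Electricity",
                   "Premium Gas or Electricity")
highway$fuel_group <- "Gasoline"
highway$fuel_group[highway$fuel == "Diesel"] <- "Diesel"
highway$fuel_group[highway$fuel == "Electricity"] <- "Electric"
highway$fuel_group[highway$fuel %in% hybrid_labels] <- "Hybrid"
highway$fuel_group[highway$fuel == "CNG"] <- "CNG"
highway$fuel_group <- factor(highway$fuel_group)

# Refit the one-way ANOVA
anovatest <- aov(hwy ~ fuel_group, data = highway)

# All 10 pairwise differences in mean highway MPG, with
# family-wise 95% confidence intervals and adjusted p-values
tukey <- TukeyHSD(anovatest, conf.level = 0.95)
tukey
```

A pair whose interval excludes 0 (equivalently, whose `p adj` is below 0.05) would support a specific claim such as electric vehicles averaging higher highway MPG than gasoline vehicles.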
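Potential limitations include the data covering only 1985 to 2015 and the failure to account for other influences on MPG. The groups are also highly unbalanced: Gasoline has 32,439 observations, while Electric has only 55 and Hybrid only 16. The small Hybrid group gives the test little power to detect differences involving that group, increasing the likelihood of a Type II error. Finally, the boxplot shows much wider variation in some groups (especially Electric) than in others, which strains the equal-variance assumption behind the standard ANOVA F test.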
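To gauge how serious the unequal-spread concern is, one can compare group standard deviations and rerun the test without the equal-variance assumption. The sketch below swaps in Welch's one-way ANOVA through base R's `oneway.test()` with `var.equal = FALSE`; it is an alternative check, not a reproduction of the analysis above, and it assumes the fueleconomy package is installed.

```r
vehicles <- fueleconomy::vehicles
highway <- vehicles[, c("fuel", "hwy")]

# Five-group mapping used in this project
hybrid_labels <- c("Regular Gas and Electricity",
                   "Premium and Electricity",
                   "Premium Gas or Electricity")
highway$fuel_group <- "Gasoline"
highway$fuel_group[highway$fuel == "Diesel"] <- "Diesel"
highway$fuel_group[highway$fuel == "Electricity"] <- "Electric"
highway$fuel_group[highway$fuel %in% hybrid_labels] <- "Hybrid"
highway$fuel_group[highway$fuel == "CNG"] <- "CNG"

# Standard deviation of highway MPG within each fuel group
group_sd <- tapply(highway$hwy, highway$fuel_group, sd)
round(group_sd, 2)

# Welch's ANOVA: compares group means without assuming equal variances
welch <- oneway.test(hwy ~ fuel_group, data = highway, var.equal = FALSE)
welch
```

If Welch's test also gives a p-value below 0.05, the overall conclusion does not hinge on the equal-variance assumption.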
Future research could include additional variables that may affect fuel efficiency, such as vehicle weight, engine size, or make and model. If the study were repeated with data from newer vehicles, I would expect the same overall conclusion: mean highway MPG differs among fuel types. However, I would expect the gaps in fuel efficiency between fuel types to widen because of major advances in electric and hybrid technology.
This document was produced as a final project for MAT 143H: Introduction to Statistics (Honors) at North Shore Community College.
The course was led by Professor Billy Jackson.
Student Name: Jada Perez
Semester: Spring 2026