Overview

This project examines whether mean highway fuel economy (MPG) differs across vehicle fuel types.

Introduction

The data for this project comes from the vehicles dataset in the fueleconomy package for R, which contains EPA fuel economy data for vehicles from 1985 to 2015.

Exploring the Data

# Store data in environment
vehicles <- fueleconomy::vehicles
highway <- vehicles[, c("fuel", "hwy")]

# View the structure of dataset
str(highway)
## Classes 'tbl_df', 'tbl' and 'data.frame':    33442 obs. of  2 variables:
##  $ fuel: chr  "Regular" "Regular" "Regular" "Regular" ...
##  $ hwy : num  26 28 26 27 29 26 27 29 26 23 ...

# Preview of first few lines of data
head(highway)
##      fuel hwy
## 1 Regular  26
## 2 Regular  28
## 3 Regular  26
## 4 Regular  27
## 5 Regular  29
## 6 Regular  26

The full vehicles dataset contains 33,442 observations of 12 variables. This analysis keeps only two of them: fuel type (fuel) and highway fuel economy in MPG (hwy). Neither of these two columns has missing values.
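The missing-value claim can be checked directly. This is a minimal sketch that assumes the fueleconomy package is installed. It counts NA values in every column of the full vehicles data and in the two-column subset.

```r
# Load the EPA data (requires the fueleconomy package to be installed)
vehicles <- fueleconomy::vehicles
highway <- vehicles[, c("fuel", "hwy")]

# Number of missing (NA) values per column
colSums(is.na(vehicles))  # all 12 variables in the full dataset
colSums(is.na(highway))   # only the two variables used in this analysis
```

Missing values in other columns would not affect this analysis, as long as the counts for fuel and hwy are zero.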

# Summary of data
summary(highway)
##      fuel                hwy        
##  Length:33442       Min.   :  9.00  
##  Class :character   1st Qu.: 19.00  
##  Mode  :character   Median : 23.00  
##                     Mean   : 23.55  
##                     3rd Qu.: 27.00  
##                     Max.   :109.00
table(highway$fuel)
## 
##                         CNG                      Diesel 
##                          58                         874 
##                 Electricity             Gasoline or E85 
##                          55                        1043 
##     Gasoline or natural gas         Gasoline or propane 
##                          18                           8 
##                    Midgrade                     Premium 
##                          43                        8617 
##     Premium and Electricity  Premium Gas or Electricity 
##                           1                           7 
##              Premium or E85                     Regular 
##                          88                       22622 
## Regular Gas and Electricity 
##                           8

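Highway fuel economy ranges from 9 MPG to 109 MPG, with a median of 23 MPG and a mean of 23.55 MPG. The data records 13 fuel categories. These include Regular, Midgrade, and Premium gasoline, flex-fuel (E85) and bi-fuel gasoline, Diesel, CNG, Electricity, and several gasoline-electric hybrid types. Regular (22,622 vehicles) and Premium (8,617) together account for about 93% of the data.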
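To see which fuel categories account for the extremes of the 9 to 109 MPG range, the minimum and maximum can be computed for each category. This is a minimal sketch that assumes the fueleconomy package is installed.

```r
highway <- fueleconomy::vehicles[, c("fuel", "hwy")]

# Minimum and maximum highway MPG within each EPA fuel category.
# FUN = range returns two values, so the hwy column becomes a 2-column matrix.
mpg_range <- aggregate(hwy ~ fuel, data = highway, FUN = range)
mpg_range

# Row of the vehicle with the highest highway MPG in the dataset
highway[which.max(highway$hwy), ]
```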

# Clean the data: collapse the 13 EPA fuel categories into five groups.
# Start by labeling every vehicle "Gasoline" (this covers Regular, Midgrade,
# Premium, and the flex-fuel and bi-fuel gasoline categories), then overwrite
# the non-gasoline categories.
highway$fuel_group <- "Gasoline"
highway$fuel_group[highway$fuel == "Diesel"] <- "Diesel"
highway$fuel_group[highway$fuel == "Electricity"] <- "Electric"
highway$fuel_group[highway$fuel %in% c("Regular Gas and Electricity",
                                       "Premium and Electricity",
                                       "Premium Gas or Electricity")] <- "Hybrid"
highway$fuel_group[highway$fuel == "CNG"] <- "CNG"

table(highway$fuel_group)
## 
##      CNG   Diesel Electric Gasoline   Hybrid 
##       58      874       55    32439       16

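Because the dataset has 13 fuel categories, several of them very small, I've grouped similar types into five fuel groups (Gasoline, Diesel, CNG, Hybrid, and Electric). This simplifies later analysis at the cost of some detail. For example, flex-fuel (E85) and bi-fuel vehicles are counted as Gasoline.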
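Because the default assignment labels every unlisted category as Gasoline, it is worth confirming that each original category landed in the intended group. This is a minimal sketch that assumes the fueleconomy package is installed. `fuel_lookup` is a hypothetical name. The named-vector lookup is an alternative to the repeated assignments above and should produce the same five groups.

```r
highway <- fueleconomy::vehicles[, c("fuel", "hwy")]

# Hypothetical lookup table: EPA fuel category -> fuel group.
# Categories not listed here fall back to "Gasoline", matching the rules above.
fuel_lookup <- c(
  "Diesel"                      = "Diesel",
  "Electricity"                 = "Electric",
  "Regular Gas and Electricity" = "Hybrid",
  "Premium and Electricity"     = "Hybrid",
  "Premium Gas or Electricity"  = "Hybrid",
  "CNG"                         = "CNG"
)
# Indexing a named vector by names returns NA for categories not in the table
highway$fuel_group <- unname(fuel_lookup[highway$fuel])
highway$fuel_group[is.na(highway$fuel_group)] <- "Gasoline"

# Cross-tabulate original categories against groups to confirm the mapping
mapping <- table(highway$fuel, highway$fuel_group)
mapping
```

Each row of `mapping` should have a single nonzero cell, in the column for the group that category was meant to join.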

To compare the distribution of highway MPG across the fuel groups, we can look at a side-by-side boxplot.

# Exploring data with a boxplot
library(ggplot2)
ggplot(data = highway, aes(x = fuel_group, y = hwy)) +
  geom_boxplot() +  
  labs(x= "Fuel Group", y= "Highway Fuel Economy (MPG)", title= "Highway Fuel Efficiency by Fuel Group")

Based on the boxplot, electric vehicles appear to have the highest highway fuel economy and the widest spread. Hybrid vehicles appear to have higher highway fuel economy than diesel, gasoline, and CNG vehicles. Outliers in the gasoline and hybrid groups suggest that some vehicles in these categories have MPG values far from the rest of their group. Overall, the plot suggests that highway fuel economy differs by fuel group. The ANOVA below tests whether these differences are statistically significant.
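The visual comparison can be backed with numbers. This is a minimal sketch that assumes the fueleconomy package is installed. It rebuilds the five fuel groups, using the same rules as the cleaning step, so it runs on its own. It then reports each group's size, mean, median, and standard deviation of highway MPG.

```r
highway <- fueleconomy::vehicles[, c("fuel", "hwy")]

# Rebuild the five fuel groups (same rules as the cleaning step)
hybrid_types <- c("Regular Gas and Electricity", "Premium and Electricity",
                  "Premium Gas or Electricity")
highway$fuel_group <- ifelse(highway$fuel %in% hybrid_types, "Hybrid",
                      ifelse(highway$fuel == "Electricity", "Electric",
                      ifelse(highway$fuel %in% c("Diesel", "CNG"), highway$fuel,
                             "Gasoline")))

# Sample size, mean, median, and standard deviation of highway MPG by group
group_stats <- aggregate(
  hwy ~ fuel_group, data = highway,
  FUN = function(x) c(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
)
group_stats
```

Medians are less affected than means by the outliers visible in the plot. Comparing the two columns therefore shows how much the outliers pull each group's average.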

Analysis

Hypothesis Testing

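$H_0$: $\mu_G = \mu_D = \mu_{CNG} = \mu_H = \mu_E$, where each $\mu$ is the mean highway MPG of the Gasoline, Diesel, CNG, Hybrid, and Electric groups, respectively

$H_A$: at least one fuel group's mean highway MPG differs from the others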

ANOVA will be used to compare mean highway MPG across the five fuel groups, using a significance level of $\alpha = 0.05$.

ANOVA

# ANOVA testing
anovatest <- aov(hwy~fuel_group, data=highway)

ANOVA Results

# ANOVA table: degrees of freedom, sums of squares, F statistic, and p-value
summary(anovatest)
##                Df  Sum Sq Mean Sq F value Pr(>F)    
## fuel_group      4  180715   45179    1362 <2e-16 ***
## Residuals   33437 1109495      33                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
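As a check on the table, the F value is the ratio of the between-group mean square to the residual (within-group) mean square. With $k = 5$ fuel groups and $N = 33{,}442$ vehicles, the degrees of freedom are $k - 1 = 4$ and $N - k = 33{,}437$, which match the Df column. Using the printed (rounded) sums of squares:

$$
F = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}
  = \frac{180715/4}{1109495/33437}
  \approx \frac{45178.75}{33.18}
  \approx 1362
$$

Under $H_0$ this statistic follows an $F(4, 33437)$ distribution. A value this large corresponds to the p-value below 2e-16 shown in the table.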

Conclusions

Since the p-value (< 2e-16) is far below 0.05, we reject the null hypothesis. There is sufficient evidence that mean highway fuel efficiency differs for at least one fuel group. The ANOVA itself does not identify which groups differ, but the boxplot suggests the largest differences involve electric and hybrid vehicles. These results make sense given the differences in engine types and fuel systems. Electric and hybrid vehicles tend to have higher MPG values than traditional gasoline vehicles because electric drivetrains convert stored energy into motion more efficiently than combustion engines.
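The F test only says that at least one group mean differs; it does not say which. A post-hoc comparison such as Tukey's Honest Significant Difference (not part of the original analysis) can check the claim that electric and hybrid vehicles outperform the other groups. This is a minimal sketch that assumes the fueleconomy package is installed. It rebuilds fuel_group as a factor so that it runs on its own.

```r
highway <- fueleconomy::vehicles[, c("fuel", "hwy")]

# Rebuild the five fuel groups as a factor (TukeyHSD works on factor terms)
hybrid_types <- c("Regular Gas and Electricity", "Premium and Electricity",
                  "Premium Gas or Electricity")
highway$fuel_group <- factor(
  ifelse(highway$fuel %in% hybrid_types, "Hybrid",
  ifelse(highway$fuel == "Electricity", "Electric",
  ifelse(highway$fuel %in% c("Diesel", "CNG"), highway$fuel, "Gasoline")))
)

anovatest <- aov(hwy ~ fuel_group, data = highway)

# Tukey's HSD: every pairwise difference in group means, with intervals
# and p-values adjusted for multiple comparisons
tukey <- TukeyHSD(anovatest, "fuel_group", conf.level = 0.95)
tukey
```

A pair differs significantly when its adjusted p-value (p adj) is below 0.05, which is equivalent to its interval (lwr to upr) excluding zero. With only 16 hybrids, the intervals for comparisons involving Hybrid will be wider than the others.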

Limitations and Future Considerations

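Potential limitations include the dataset covering only model years 1985 to 2015 and the analysis not accounting for other influences on MPG. The groups are also very unbalanced: there are only 16 hybrids and 55 electric vehicles, compared with 32,439 gasoline vehicles. Because we rejected the null hypothesis, a Type II error is not a concern for the overall test. However, the small Hybrid group makes its estimated mean imprecise. In addition, unequal group sizes make ANOVA more sensitive to differences in variance between groups.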
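Group sizes here are very unequal (16 hybrids versus 32,439 gasoline vehicles), and standard ANOVA is least robust to unequal variances when group sizes differ. One optional check, not part of the original analysis, is Welch's one-way test, which base R provides as `oneway.test()`. This is a minimal sketch that assumes the fueleconomy package is installed.

```r
highway <- fueleconomy::vehicles[, c("fuel", "hwy")]

# Rebuild the five fuel groups (same rules as the cleaning step)
hybrid_types <- c("Regular Gas and Electricity", "Premium and Electricity",
                  "Premium Gas or Electricity")
highway$fuel_group <- factor(
  ifelse(highway$fuel %in% hybrid_types, "Hybrid",
  ifelse(highway$fuel == "Electricity", "Electric",
  ifelse(highway$fuel %in% c("Diesel", "CNG"), highway$fuel, "Gasoline")))
)

# Welch's one-way test compares group means without assuming equal variances
welch <- oneway.test(hwy ~ fuel_group, data = highway, var.equal = FALSE)
welch
```

If Welch's test also gives p < 0.05, the conclusion does not depend on the equal-variance assumption.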

In future research, it may be useful to include additional variables that affect fuel efficiency, such as vehicle weight, engine size, or make and model. If the study were repeated with data from newer vehicles, I'd expect the overall conclusion to hold (mean highway MPG would still differ by fuel type). However, I'd expect the gaps in fuel efficiency between fuel types to grow because of advances in electric and hybrid technology.
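Extra variables enter the model as additional terms in the formula. As an illustration only, the sketch below adds model year (year), a column already in the vehicles data. This choice is an assumption; engine size (displ) or make could be added the same way. This is a minimal sketch that assumes the fueleconomy package is installed.

```r
highway <- fueleconomy::vehicles[, c("fuel", "hwy", "year")]

# Rebuild the five fuel groups (same rules as the cleaning step)
hybrid_types <- c("Regular Gas and Electricity", "Premium and Electricity",
                  "Premium Gas or Electricity")
highway$fuel_group <- factor(
  ifelse(highway$fuel %in% hybrid_types, "Hybrid",
  ifelse(highway$fuel == "Electricity", "Electric",
  ifelse(highway$fuel %in% c("Diesel", "CNG"), highway$fuel, "Gasoline")))
)

# summary() on aov uses sequential (Type I) sums of squares: year enters first,
# so the fuel_group row tests fuel type after accounting for model year
adjusted <- aov(hwy ~ year + fuel_group, data = highway)
summary(adjusted)
```

Treating year as a straight-line trend is itself an assumption. Using `factor(year)` instead would allow a separate mean for each model year.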


This document was produced as a final project for MAT 143H - Introduction to Statistics (Honors) at North Shore Community College.
The course was led by Professor Billy Jackson.
Student Name: Jada Perez
Semester: Spring 2026