Fuel Economy Analysis: Engine Size and MPG in Modern Vehicles

Author

CJ Schmidt

Introduction

For this project, I selected data from the EPA Fuel Economy dataset, which contains detailed information on vehicle characteristics and fuel efficiency. To focus on vehicles that are currently on the road, I restricted the data to model years 2013 and newer, a cutoff that roughly matches the current average age of vehicles in the United States.

The goal of this project is to explore the relationship between engine characteristics and fuel efficiency. The research question guiding this analysis is:

How do engine displacement and the number of cylinders affect a vehicle’s MPG?

Specifically, I want to understand how engine displacement (the total volume swept by the engine's pistons, in liters) and the number of cylinders affect the miles per gallon a vehicle achieves. To make these patterns easy to see, I will create two clear, easy-to-read graphs.

Load Packages

library(tidyverse)
library(MetBrewer)

Load and Prepare the Data

# Load Dataset
cars <- read_csv("data/vehicles.csv")

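# Keep model years 2013+, rename the EPA columns (comb08 = combined MPG,
# displ = displacement in liters), and drop rows with missing values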
cars_recent <- cars |>
  filter(year >= 2013) |>
  select(
    mpg = comb08,
    cylinders,
    displacement = displ,
    year
  ) |>
  drop_na()

Visualization 1: Displacement vs. MPG

ggplot(cars_recent,
       aes(displacement,
           mpg
           )
       ) +
  geom_point(alpha = 0.4,
             size = 2,
             stroke = 0,
             color = MetBrewer::met.brewer("Hokusai3")[4]
             ) +
  geom_smooth(method = "lm",
              formula = y ~ x,
              se = FALSE,
              color = MetBrewer::met.brewer("Hokusai3")[1],
              linewidth = 1
              ) +
  labs(title = "Engine Displacement vs. Fuel Efficiency (MPG)",
       x = "Engine Displacement (Liters)",
       y = "MPG"
       ) +
  theme_minimal(base_size = 14) +
  theme(
    panel.background = element_rect(fill = "grey98", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  )

Visualization 2: MPG by Number of Cylinders

ggplot(cars_recent,
       aes(x = factor(cylinders),
           y = mpg
           )
       ) +
  geom_boxplot(fill = "#3498DB",
               alpha = 0.7
               ) +
  labs(title = "Fuel Efficiency Across Cylinder Counts",
       x = "Number of Cylinders",
       y = "MPG"
       ) +
  theme_minimal(base_size = 14) +
  theme(
    panel.background = element_rect(fill = "grey98", color = NA),
    plot.background = element_rect(fill = "white", color = NA)
  )

Interpretation

The scatterplot of displacement versus MPG shows a clear negative relationship: vehicles with larger engines tend to have lower fuel efficiency. The fitted linear trend line slopes downward, reinforcing this pattern and suggesting that displacement on its own is a useful predictor of MPG.
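One way to put a number on "useful predictor" is to fit the same straight-line model that `geom_smooth(method = "lm")` draws and read off its slope and R². The sketch below is illustrative only: it uses base R's `lm()` so it runs without extra packages, and `toy_cars` is a hypothetical data frame with made-up values, not EPA data. On the real data, `cars_recent` would take its place.

```r
# Hypothetical toy data standing in for cars_recent.
# Values are invented for illustration and are NOT from the EPA dataset.
toy_cars <- data.frame(
  displacement = c(1.5, 2.0, 2.5, 3.5, 5.0, 6.2),
  mpg          = c(36, 31, 28, 22, 17, 15)
)

# Fit mpg = intercept + slope * displacement,
# the same model the trend line in Visualization 1 represents
fit <- lm(mpg ~ displacement, data = toy_cars)

# Estimated change in MPG for each additional liter of displacement
slope <- unname(coef(fit)["displacement"])

# Share of the variation in MPG explained by displacement alone
r_squared <- summary(fit)$r.squared

print(slope)
print(r_squared)
```

A negative slope matches the downward trend line, and R² indicates how tightly the points cluster around it. Neither number establishes that displacement causes lower MPG.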

The boxplot comparing MPG across cylinder groups shows a similar pattern. Vehicles with fewer cylinders generally achieve higher MPG, while those with six or more cylinders show noticeably lower fuel efficiency. This supports the idea that engine size and configuration play a major role in determining fuel economy.
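The center line of each box is the group median, so the boxplot's pattern can also be summarized as a small table. The sketch below is illustrative only: it uses base R's `aggregate()` on a hypothetical `toy_cars` data frame with made-up values (not EPA data). On the real data, `cars_recent` would be passed in instead.

```r
# Hypothetical toy data standing in for cars_recent.
# Values are invented for illustration and are NOT from the EPA dataset.
toy_cars <- data.frame(
  cylinders = c(4, 4, 4, 6, 6, 8, 8),
  mpg       = c(33, 30, 28, 23, 21, 17, 16)
)

# Median MPG per cylinder count, the statistic each box's center line shows.
# aggregate() returns one row per cylinder count, sorted ascending.
median_by_cyl <- aggregate(mpg ~ cylinders, data = toy_cars, FUN = median)

print(median_by_cyl)
```

Reading the medians in order of cylinder count shows directly whether MPG falls as cylinders increase, which is the comparison the boxplot makes visually.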