Eagels Emily Johnson, Michael Davis, Sophia Martinez
Knowing the factors that affect Miles Per Gallon (MPG) is important because it helps consumers lower fuel expenses, supports automakers in improving efficiency, and contributes to reducing environmental impacts such as carbon emissions.
It also allows drivers to make smarter vehicle purchasing and maintenance decisions.
Studying MPG helps identify factors that improve vehicle efficiency.
The goal of this project is to explore the factors that could impact MPG.
We obtained the mpg dataset from R (it ships with ggplot2, which is loaded as part of the tidyverse). It contains 234 records and 11 variables describing the fuel economy and characteristics of various vehicles. The following code lists the variables (columns) in mpg; hwy and cty are the highway and city MPG measures.
library(tidyverse)
library(knitr)
data.frame(Variable_Names = names(mpg)) %>%
  knitr::kable(
    caption = "Variable Names in mpg Dataset"
  )

| Variable_Names |
|---|
| manufacturer |
| model |
| displ |
| year |
| cyl |
| trans |
| drv |
| cty |
| hwy |
| fl |
| class |
There are two MPG-related variables in this dataset: highway MPG (hwy) and city MPG (cty). To provide a more comprehensive measure of fuel efficiency, we create a new variable by combining these two values.
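As a small worked example of the combination described above, the sketch below averages city and highway MPG for two made-up vehicles. The data frame `toy` and the column name `combined` are illustrative assumptions, not part of the project.

```r
# Two hypothetical vehicles with city and highway MPG values.
toy <- data.frame(cty = c(18, 21), hwy = c(29, 29))

# Simple (unweighted) average of the two measures, row by row.
toy$combined <- (toy$cty + toy$hwy) / 2

toy$combined
# 23.5 25.0
```

The average always lies between the city and highway values, so it is easy to check by eye.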
mpg <- mpg %>%
  mutate(mpg = (cty + hwy) / 2)

You can interact with the data using the search box, such as limiting results to Audi or a chosen year.
library(DT)
datatable(mpg)

This project analyzes the data across three levels:
MPG analysis
Relationships between MPG and numeric variables
A deeper exploration of MPG with additional variables to highlight drivetrain types.
To better understand the MPG variables, we summarize the dataset by reporting the total number of records and variables, along with the average, minimum, maximum, and median values for overall, highway, and city MPG. The variable count is 12 rather than 11 because it includes the new mpg column. We also examine their distributions. The results show that the average overall, highway, and city MPG are approximately 20, 23, and 17, respectively, so highway MPG is on average about 6.6 MPG higher than city MPG.
library(tidyverse)
library(knitr)
Row1 <- mpg %>%
summarise(
Records = n(),
Variables = ncol(.),
Avg = mean(mpg),
Min = min(mpg),
Max = max(mpg),
Med = median(mpg)
)
Row2 <- mpg %>%
summarise(
Records = n(),
Variables = ncol(.),
Avg = mean(hwy),
Min = min(hwy),
Max = max(hwy),
Med = median(hwy)
)
Row3 <- mpg %>%
summarise(
Records = n(),
Variables = ncol(.),
Avg = mean(cty),
Min = min(cty),
Max = max(cty),
Med = median(cty)
)
bind_data_rows <- data.frame(rbind(Row1,Row2,Row3))
rownames(bind_data_rows) <- c("Overall MPG", "Highway MPG","City MPG")
bind_data_rows %>%
  knitr::kable(
    caption = "Summary of Overall MPG Dataset",
    digits = 2
  )

| | Records | Variables | Avg | Min | Max | Med |
|---|---|---|---|---|---|---|
| Overall MPG | 234 | 12 | 20.15 | 10.5 | 39.5 | 20.5 |
| Highway MPG | 234 | 12 | 23.44 | 12.0 | 44.0 | 24.0 |
| City MPG | 234 | 12 | 16.86 | 9.0 | 35.0 | 17.0 |
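The three near-identical summarise() calls above could also be collapsed into one pipeline. The following is a minimal sketch of that idea, not the authors' method: it reshapes the three MPG measures into long form and summarises them by group. The names `vehicles`, `overall`, and `mpg_summary` are assumptions introduced here, and the sketch reads the packaged data directly so it does not depend on earlier steps.

```r
library(dplyr)
library(tidyr)

# Start from the packaged dataset and add the same simple average used in the text.
vehicles <- ggplot2::mpg %>%
  mutate(overall = (cty + hwy) / 2)

# Stack the three MPG measures into one column, then summarise each measure.
mpg_summary <- vehicles %>%
  select(overall, hwy, cty) %>%
  pivot_longer(everything(), names_to = "measure", values_to = "value") %>%
  group_by(measure) %>%
  summarise(
    Records = n(),
    Avg = mean(value),
    Min = min(value),
    Max = max(value),
    Med = median(value),
    .groups = "drop"
  )

mpg_summary
```

The result should have one row per measure and can be passed to knitr::kable() in the same way as the table above.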
library(plotly)
library(ggplot2)
plot_ly(
  data = mpg,
  x = ~mpg,
  type = "histogram",
  nbinsx = 10
) %>%
  layout(
    xaxis = list(title = "Overall MPG"),
    yaxis = list(title = "Frequency")
  )

plot_ly(
  data = mpg,
  x = ~hwy,
  type = "histogram",
  nbinsx = 10
) %>%
  layout(
    xaxis = list(title = "Highway MPG"),
    yaxis = list(title = "Frequency")
  )

plot_ly(
  data = mpg,
  x = ~cty,
  type = "histogram",
  nbinsx = 10
) %>%
  layout(
    xaxis = list(title = "City MPG"),
    yaxis = list(title = "Frequency")
  )

This analysis investigates the relationship between MPG and two numeric variables: engine displacement and the number of cylinders. In all cases, a clear negative relationship is observed, indicating that vehicles with larger engines and more cylinders tend to have lower fuel efficiency.
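One way to put numbers on these relationships is the Pearson correlation between each MPG measure and each engine variable. The sketch below is an illustrative check rather than part of the original analysis; `vehicles`, `overall`, and `cor_matrix` are names assumed here. A negative entry corresponds to the downward trend described above.

```r
library(dplyr)

# Packaged dataset plus the simple city/highway average.
vehicles <- ggplot2::mpg %>%
  mutate(overall = (cty + hwy) / 2)

# Rows: MPG measures; columns: engine variables.
cor_matrix <- cor(
  vehicles[, c("overall", "hwy", "cty")],
  vehicles[, c("displ", "cyl")]
)

round(cor_matrix, 2)
```

Correlation only captures the linear part of the relationship, so the smoothed curves in the plots remain the better guide to its shape.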
library(patchwork)
p1 <- ggplot(mpg, aes(x = displ, y = mpg)) +
geom_point(color = "steelblue", size = 2) +
geom_smooth(color = "red", se = FALSE) +
labs(
title = "Average MPG vs Engine Displacement",
x = "Engine Displacement",
y = "Average MPG"
) +
theme_minimal()
p2 <- ggplot(mpg, aes(x = cyl, y = mpg)) +
geom_point(color = "darkgreen", size = 2) +
geom_smooth(color = "red", se = FALSE) +
labs(
title = "Average MPG vs Cylinders",
x = "Cylinders (cyl)",
y = "MPG"
) +
theme_minimal()
p3 <- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(color = "steelblue", size = 2) +
geom_smooth(color = "red", se = FALSE) +
labs(
title = "Highway MPG vs Engine Displacement",
x = "Engine Displacement",
y = "Highway MPG"
) +
theme_minimal()
p4 <- ggplot(mpg, aes(x = cyl, y = hwy)) +
geom_point(color = "darkgreen", size = 2) +
geom_smooth(color = "red", se = FALSE) +
labs(
title = "Highway MPG vs Cylinders",
x = "Cylinders",
y = "Highway MPG"
) +
theme_minimal()
p5 <- ggplot(mpg, aes(x = displ, y = cty)) +
geom_point(color = "steelblue", size = 2) +
geom_smooth(color = "red", se = FALSE) +
labs(
title = "City MPG vs Engine Displacement",
x = "Engine Displacement",
y = "City MPG"
) +
theme_minimal()
p6 <- ggplot(mpg, aes(x = cyl, y = cty)) +
geom_point(color = "darkgreen", size = 2) +
geom_smooth(color = "red", se = FALSE) +
labs(
title = "City MPG vs Cylinders",
x = "Cylinders",
y = "City MPG"
) +
theme_minimal()
(p1 + p2)/(p3 + p4)/(p5 + p6)

This analysis shows how the relationship between MPG and engine displacement differs by drivetrain type. The results show that:
Front-wheel drive and 4-wheel drive: MPG first decreases as engine displacement increases, then follows a more stable pattern at higher displacement levels.
Rear-wheel drive: MPG first decreases as engine displacement increases, then increases at higher displacement levels.
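The patterns above can also be inspected numerically by averaging highway MPG within displacement bands for each drivetrain. This is a rough sketch under stated assumptions: the band edges (1, 3, 5, 7) are chosen here for illustration only, and `vehicles` and `by_band` are hypothetical names.

```r
library(dplyr)

vehicles <- ggplot2::mpg

# Bin displacement into three illustrative bands, then average highway MPG
# for every drivetrain/band combination that occurs in the data.
by_band <- vehicles %>%
  mutate(displ_band = cut(displ, breaks = c(1, 3, 5, 7), include.lowest = TRUE)) %>%
  group_by(drv, displ_band) %>%
  summarise(n = n(), mean_hwy = mean(hwy), .groups = "drop")

by_band
```

The `n` column matters when reading the result: a band with only a handful of vehicles can make the smoothed curve bend in ways that reflect a few models rather than a general trend.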
label_info <- mpg |>
group_by(drv) |>
arrange(desc(displ)) |>
slice_head(n = 1) |>
mutate(
drive_type = case_when(
drv == "f" ~ "front-wheel drive",
drv == "r" ~ "rear-wheel drive",
drv == "4" ~ "4-wheel drive"
)
) |>
select(displ, hwy, drv, drive_type)
p1 <- mpg |>
ggplot(aes(x = displ, y = hwy, color = drv)) +
geom_point(alpha = 0.3) +
geom_smooth(se = FALSE) +
labs(
title = "Highway MPG vs Engine Displacement",
x = "Engine Displacement",
y = "Highway MPG",
color = "Type of Drivetrain"
)+
theme(legend.position = "none")
p2 <- mpg |>
ggplot(aes(x = displ, y = cty, color = drv)) +
geom_point(alpha = 0.3) +
geom_smooth(se = FALSE) +
labs(
title = "City MPG vs Engine Displacement",
x = "Engine Displacement",
y = "City MPG",
color = "Type of Drivetrain"
)+
theme(legend.position = "none")
p3 <- mpg |>
ggplot(aes(x = displ, y = mpg, color = drv)) +
geom_point(alpha = 0.3) +
geom_smooth(se = FALSE) +
labs(
title = "MPG vs Engine Displacement",
x = "Engine Displacement",
y = "MPG",
color = "Type of Drivetrain"
)+scale_color_discrete(
labels = c(
"4" = "4-Wheel Drive",
"f" = "Front-Wheel Drive",
"r" = "Rear-Wheel Drive"
)
)+
theme(legend.position = "bottom")
(p1+p2)/p3

This analysis examines the drivetrain categories of outliers based on MPG and engine displacement. The first step in identifying outliers is determining threshold values. The boxplots suggest approximate cutoffs of 40 for highway MPG, 30 for city MPG, and 35 for overall MPG. However, no clear outliers are observed for engine displacement.
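Boxplots in ggplot2 flag points lying more than 1.5 times the interquartile range beyond the quartiles by default, so the upper fence can be computed directly instead of being read off the plot. The sketch below does this for each variable; `upper_fence`, `vehicles`, `overall`, and `fences` are names introduced here for illustration. The computed fences need not match the rounded cutoffs used in the text, which are approximate.

```r
library(dplyr)

vehicles <- ggplot2::mpg %>%
  mutate(overall = (cty + hwy) / 2)

# Upper boxplot fence: third quartile plus 1.5 times the interquartile range.
upper_fence <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  q[2] + 1.5 * (q[2] - q[1])
}

fences <- vehicles %>%
  summarise(across(c(overall, hwy, cty, displ), upper_fence))

fences
```

A variable whose maximum falls below its fence has no high-side outliers under this rule, which is the situation the text describes for engine displacement.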
library(patchwork)
p1 <- ggplot(mpg, aes(x = mpg)) +
geom_boxplot(fill = "steelblue", color = "black") +
labs(
title = "Boxplot of Overall MPG",
x = ""
)
p2<- ggplot(mpg, aes(x = hwy)) +
geom_boxplot(fill = "steelblue", color = "black") +
labs(
title = "Boxplot of Highway MPG",
x = ""
) +
scale_x_continuous(limits = c(0, 60))
p3<- ggplot(mpg, aes(x = cty)) +
geom_boxplot(fill = "steelblue", color = "black") +
labs(
title = "Boxplot of City MPG",
x = ""
) +
scale_x_continuous(limits = c(0, 60))
p4 <- ggplot(mpg, aes(x = displ)) +
geom_boxplot(fill = "steelblue", color = "black") +
labs(
title = "Boxplot of Engine Displacement",
x = ""
)
p1/p2/p3/p4

The final analysis highlights outliers across different MPG measures. The plots show that these outliers are concentrated in the front-wheel drive category, suggesting that vehicles in this group tend to achieve unusually high fuel efficiency compared to others.
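The concentration of outliers in one drivetrain category can be checked with a simple count. The sketch below applies the approximate cutoffs from the text (35 overall, 40 highway, 30 city) and tabulates the flagged vehicles by drivetrain and model. It is an illustrative check, and `vehicles`, `overall`, and `outlier_counts` are assumed names.

```r
library(dplyr)

vehicles <- ggplot2::mpg %>%
  mutate(overall = (cty + hwy) / 2)

# Keep any vehicle that exceeds at least one of the three cutoffs,
# then count the flagged rows per drivetrain and model.
outlier_counts <- vehicles %>%
  filter(overall >= 35 | hwy >= 40 | cty >= 30) %>%
  count(drv, model)

outlier_counts
```

Because only a few vehicles are flagged, the result is best read as a description of these specific models rather than of front-wheel drive vehicles in general.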
library(ggrepel)
potential_outliers <- mpg |>
filter(mpg >= 35)
ggplot(mpg, aes(x = displ, y = mpg)) +
geom_point(color = "black") +
geom_point(
data = potential_outliers,
aes(color = drv),
size = 3
) +
scale_y_continuous(limits = c(0, 50))+
geom_text_repel(
data = potential_outliers,
aes(label = model, color = drv),
show.legend = FALSE
) +
labs(
title = "Outliers by Drivetrain",
x = "Engine Displacement",
y = "Overall MPG",
color = "Type of Drivetrain"
) +
scale_color_discrete(
labels = c(
"4" = "4-Wheel Drive",
"f" = "Front-Wheel Drive",
"r" = "Rear-Wheel Drive"
)
) +
theme_minimal() +
theme(legend.position = "bottom")

library(ggrepel)
potential_outliers <- mpg |>
filter(hwy >= 40)
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(color = "black") +
geom_point(
data = potential_outliers,
aes(color = drv),
size = 3
) +
scale_y_continuous(limits = c(0, 50))+
geom_text_repel(
data = potential_outliers,
aes(label = model, color = drv),
show.legend = FALSE
) +
labs(
title = "Outliers by Drivetrain",
x = "Engine Displacement",
y = "Highway MPG",
color = "Type of Drivetrain"
) +
scale_color_discrete(
labels = c(
"4" = "4-Wheel Drive",
"f" = "Front-Wheel Drive",
"r" = "Rear-Wheel Drive"
)
) +
theme_minimal() +
theme(legend.position = "bottom")

library(ggrepel)
potential_outliers <- mpg |>
filter(cty >= 30)
ggplot(mpg, aes(x = displ, y = cty)) +
geom_point(color = "black") +
geom_point(
data = potential_outliers,
aes(color = drv),
size = 3
) +
scale_y_continuous(limits = c(0, 50))+
geom_text_repel(
data = potential_outliers,
aes(label = model, color = drv),
show.legend = FALSE
) +
labs(
title = "Outliers by Drivetrain",
x = "Engine Displacement",
y = "City MPG",
color = "Type of Drivetrain"
) +
scale_color_discrete(
labels = c(
"4" = "4-Wheel Drive",
"f" = "Front-Wheel Drive",
"r" = "Rear-Wheel Drive"
)
) +
theme_minimal() +
theme(legend.position = "bottom")

Thanks for visiting our page!
emily.johnson@kennesaw.edu
michael.davis@kennesaw.edu
sophia.martinez@kennesaw.edu