I will walk through a small data exploration project using the built-in mpg dataset from ggplot2.

I will create a simple derived variable for average MPG, summarize MPG by car class with dplyr, and make basic plots with ggplot2.

First, load the tidyverse:

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

The mpg dataset is included with ggplot2, so it is available as soon as the tidyverse is loaded:

mpg %>%
  glimpse()
## Rows: 234
## Columns: 11
## $ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "…
## $ model        <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "…
## $ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.…
## $ year         <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 200…
## $ cyl          <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, …
## $ trans        <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto…
## $ drv          <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4…
## $ cty          <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 1…
## $ hwy          <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 2…
## $ fl           <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p…
## $ class        <chr> "compact", "compact", "compact", "compact", "compact", "c…

The key variables we will use are class (the type of car: compact, suv, etc.), cty (city miles per gallon), and hwy (highway miles per gallon).
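To get a feel for these three columns before transforming anything, a minimal sketch along the following lines could be used. The object names key_vars and class_counts are illustrative and not part of the walkthrough itself.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Keep only the three columns discussed above (key_vars is an illustrative name)
key_vars <- mpg %>%
  select(class, cty, hwy)

# Count how many cars fall into each class, largest class first
class_counts <- key_vars %>%
  count(class, sort = TRUE)

class_counts
```

Sorting the counts makes it easy to spot which classes have few cars, which is worth keeping in mind when comparing class averages later.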

cars <- mpg

head(cars, 5)
## # A tibble: 5 × 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…

Combine city and highway mileage into a single simple measure, avg_mpg, the mean of cty and hwy:

cars <- cars %>%
  mutate(
    avg_mpg = (cty + hwy) / 2
  )

head(cars, 5)
## # A tibble: 5 × 12
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## # ℹ 1 more variable: avg_mpg <dbl>

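Now each row has an avg_mpg value. The column is listed under "1 more variable" because the printed tibble is too wide to show it.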
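Because the new column does not fit in the printed output, one way to inspect it is to select only the relevant columns. This is a small sketch rather than part of the original analysis; cars_check is a hypothetical name, and the derived variable is recomputed here so the snippet runs on its own.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Recompute the derived variable and keep only the columns needed to verify it
# (cars_check is an illustrative name)
cars_check <- mpg %>%
  mutate(avg_mpg = (cty + hwy) / 2) %>%
  select(class, cty, hwy, avg_mpg)

# With fewer columns, avg_mpg is visible next to its inputs
head(cars_check, 5)
```

Placing avg_mpg beside cty and hwy makes it straightforward to confirm by eye that each value is the midpoint of the two.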

Next, we calculate average MPG for each car class.

mpg_by_class <- cars %>%
  group_by(class) %>%
  summarise(
    mean_avg_mpg = mean(avg_mpg, na.rm = TRUE),
    mean_hwy_mpg = mean(hwy, na.rm = TRUE),
    n = n()
  ) %>%
  arrange(desc(mean_avg_mpg))

mpg_by_class
## # A tibble: 7 × 4
##   class      mean_avg_mpg mean_hwy_mpg     n
##   <chr>             <dbl>        <dbl> <int>
## 1 subcompact         24.3         28.1    35
## 2 compact            24.2         28.3    47
## 3 midsize            23.0         27.3    41
## 4 2seater            20.1         24.8     5
## 5 minivan            19.1         22.4    11
## 6 suv                15.8         18.1    62
## 7 pickup             14.9         16.9    33
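A grouped summary like the one above is easy to sanity-check. The sketch below is one possible approach, not something the original analysis does: it confirms that the group sizes add up to the full dataset and checks whether the MPG columns contain missing values at all. The names class_summary, sizes_match, and has_missing are illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Rebuild a reduced version of the summary so the snippet is self-contained
class_summary <- mpg %>%
  mutate(avg_mpg = (cty + hwy) / 2) %>%
  group_by(class) %>%
  summarise(
    mean_avg_mpg = mean(avg_mpg, na.rm = TRUE),
    n = n()
  )

# The per-class counts should sum to the number of rows in the data
sizes_match <- sum(class_summary$n) == nrow(mpg)

# Check whether cty or hwy contain missing values;
# if they do not, na.rm = TRUE acts only as a safeguard
has_missing <- anyNA(mpg$cty) || anyNA(mpg$hwy)

sizes_match
has_missing
```

Checks like these cost little and catch common grouping mistakes, such as rows dropped by an earlier filter.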

We can turn the summary table into a bar chart:

ggplot(mpg_by_class, aes(x = reorder(class, mean_avg_mpg), y = mean_avg_mpg)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Average Fuel Efficiency by Car Class",
    x = "Car Class",
    y = "Mean Average MPG"
  )

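Each bar is a car class, and its length is that class's mean avg_mpg. reorder() sorts the bars by that value, so the classes appear in order of efficiency.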

In addition, boxplots show the spread of highway MPG within each class:

ggplot(cars, aes(x = class, y = hwy)) +
  geom_boxplot() +
  labs(
    title = "Highway MPG Distribution by Car Class",
    x = "Car Class",
    y = "Highway MPG"
  )

This lets us see how variable highway MPG is within each class.
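The variability that the boxplots show visually can also be put into numbers. The following is a minimal sketch, assuming the interquartile range and standard deviation are acceptable measures of spread for this purpose; hwy_spread and its column names are illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Summarise the spread of highway MPG within each class
# (hwy_spread and the column names are illustrative)
hwy_spread <- mpg %>%
  group_by(class) %>%
  summarise(
    median_hwy = median(hwy),  # the center line of each box
    iqr_hwy = IQR(hwy),        # the height of each box
    sd_hwy = sd(hwy)
  ) %>%
  arrange(desc(iqr_hwy))

hwy_spread
```

The median and IQR correspond directly to the line and box height in each boxplot, so this table can be read side by side with the figure.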