What is a Hull Plot?

The Hull Plot is a visualization that draws shaded areas around clusters (groups) within our data. It gets its name from the convex hull, the smallest convex shape that encloses a set of points. It’s a great way to show customer segments, group membership, and clusters on a Scatter Plot. We’ll go through a short tutorial to get you up and running with ggforce to make a hull plot.
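To make the convex hull idea concrete, here is a minimal sketch using base R’s chull() function. The five toy points are made up for illustration and are not part of the tutorial’s data; ggforce does its own hull computation, so this only shows the concept.

```r
# Five toy points: the four corners of a square plus one point in the middle.
pts <- data.frame(
    x = c(0, 2, 2, 0, 1),
    y = c(0, 0, 2, 2, 1)
)

# chull() returns the row indices of the points that lie on the convex hull.
hull_idx <- grDevices::chull(pts$x, pts$y)

# Keep only the hull vertices; the interior point (row 5) is not among them.
hull_pts <- pts[hull_idx, ]
hull_pts
```

The shaded region in a hull plot plays the same role as the polygon through these vertices: it wraps each group of points.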

Hull plots with ggforce

This tutorial showcases the awesome power of ggforce for visualizing clusters. It wouldn’t be possible without the excellent work of Thomas Lin Pedersen, creator of ggforce. Check out the ggforce package here.

Load the Libraries and Data

Run the following code to:

  1. Load Libraries: Load tidyverse, tidyquant, ggforce, and ggthemes.
  2. Import Data: We’re using the mpg dataset that comes with ggplot2.
#--- LIBRARIES ---

library(tidyverse)
library(tidyquant)
library(ggforce)
library(ggthemes)

#--- DATA ---

mpg
## # A tibble: 234 x 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto~ f        18    29 p     comp~
##  2 audi         a4           1.8  1999     4 manu~ f        21    29 p     comp~
##  3 audi         a4           2    2008     4 manu~ f        20    31 p     comp~
##  4 audi         a4           2    2008     4 auto~ f        21    30 p     comp~
##  5 audi         a4           2.8  1999     6 auto~ f        16    26 p     comp~
##  6 audi         a4           2.8  1999     6 manu~ f        18    26 p     comp~
##  7 audi         a4           3.1  2008     6 auto~ f        18    27 p     comp~
##  8 audi         a4 quattro   1.8  1999     4 manu~ 4        18    26 p     comp~
##  9 audi         a4 quattro   1.8  1999     4 auto~ 4        16    25 p     comp~
## 10 audi         a4 quattro   2    2008     4 manu~ 4        20    28 p     comp~
## # ... with 224 more rows

Here’s the mpg dataset. We’ll focus on “hwy” (highway fuel economy in miles per gallon), “displ” (engine displacement in liters), and “cyl” (number of engine cylinders).
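Before plotting, it can help to see how the three columns relate. The sketch below counts vehicles per cylinder group and averages the other two columns; the column names n_vehicles, mean_displ, and mean_hwy are illustrative choices, not part of the dataset.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# One row per cylinder count, with group size and average values.
cyl_summary <- mpg %>%
    group_by(cyl) %>%
    summarise(
        n_vehicles = n(),
        mean_displ = mean(displ),
        mean_hwy   = mean(hwy)
    )

cyl_summary
```

Each row of this summary corresponds to one of the groups that will get its own hull later on.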

Hull plot: Using ggplot

Next, we’ll make a hull plot that shows how Vehicle Fuel Economy (MPG) varies with Engine Size (Number of Cylinders and Engine Displacement). It helps if you have ggplot2 visualization experience.

Step 1: Make the Base Scatter Plot

The first step is to make the scatter plot using ggplot2. We:

  1. Prep the Data: Using mutate(), we add a descriptive Engine Size column that will display the Number of Cylinders.

  2. Map the columns: Using ggplot(), we map the displ and hwy columns to the x and y aesthetics.

  3. Make the scatter points: Using geom_point(), we add scatter plot points to our base plot. Refer to the Ultimate R Cheat Sheet and ggplot2 “CS” for more geoms.

mpg %>% 
    mutate(engine_size = str_c("Cylinder: ", cyl)) %>% 
    # making base plot
    ggplot(
        aes(
            x = displ,
            y = hwy
        )
    ) +
    geom_point()

This produces our base plot, which is a scatter plot of displacement vs highway fuel economy.
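The engine_size column is not used by the base plot yet, so it is worth checking what mutate() actually created. This is a small sketch; engine_labels is just an illustrative name.

```r
library(dplyr)
library(stringr)
library(ggplot2)  # provides the mpg dataset

# Build the label column, then keep one row per distinct cylinder count.
engine_labels <- mpg %>%
    mutate(engine_size = str_c("Cylinder: ", cyl)) %>%
    distinct(cyl, engine_size) %>%
    arrange(cyl)

engine_labels
```

Because engine_size is a character column rather than an integer, ggplot2 treats it as a discrete variable, which is what we want when each value should become its own group and color.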

Step 2: Add the Hull Plot with geom_mark_hull()

Next, we add our hull plot geometry layer using ggforce::geom_mark_hull(). This produces the shaded hull regions that indicate the groups. We map the descriptive engine size column to the fill and label aesthetics. We also adjust the concavity argument, which controls how tightly each hull wraps around its points: lower values hug the points, while higher values give a smoother, more convex outline.

mpg %>% 
    mutate(engine_size = str_c("Cylinder: ", cyl)) %>% 
    # making base plot
    ggplot(
        aes(
            x = displ,
            y = hwy
        )
    ) +
    geom_point() +
    # plotting clusters
    geom_mark_hull(
        aes(
            fill = engine_size,
            label = engine_size
        ),
        concavity = 2.5
        # if you get an error, install the "concaveman" package
    )

And here’s the output. We can see that the hull plot shows the cylinder class membership of the vehicle scatter points. You may get a warning saying, “concaveman package is required for geom_mark_hull()”. If so, install the concaveman package.
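Two things from this step can be sketched in code: checking whether concaveman is available, and seeing what the concavity argument does. The helper plot_hull() below is hypothetical (it is not part of ggforce), and the values 1 and 10 are arbitrary picks for comparison.

```r
library(dplyr)
library(stringr)
library(ggplot2)
library(ggforce)

# geom_mark_hull() needs concaveman when the plot is drawn; check for it first.
if (!requireNamespace("concaveman", quietly = TRUE)) {
    message('Run install.packages("concaveman") before drawing the plots.')
}

# Hypothetical helper: the same hull plot as above, with a variable concavity.
plot_hull <- function(concavity) {
    mpg %>%
        mutate(engine_size = str_c("Cylinder: ", cyl)) %>%
        ggplot(aes(x = displ, y = hwy)) +
        geom_point() +
        geom_mark_hull(
            aes(fill = engine_size, label = engine_size),
            concavity = concavity
        ) +
        labs(subtitle = str_c("concavity = ", concavity))
}

tight_hull  <- plot_hull(1)   # low value: hull hugs the points closely
smooth_hull <- plot_hull(10)  # high value: hull approaches a convex shape
```

Print tight_hull and smooth_hull one after the other to compare the outlines. The value 2.5 used in this tutorial sits between the two.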

Step 3: Make the plot look professional

It’s a good idea to spruce up our plot, especially if we are going to present to business stakeholders in a presentation or report. We’ll leverage tidyquant (theme_tq()), ggthemes (scale_fill_calc()), and ggplot2 for theme customization.

mpg %>% 
    mutate(engine_size = str_c("Cylinder: ", cyl)) %>% 
    # making base plot
    ggplot(
        aes(
            x = displ,
            y = hwy
        )
    ) +
    geom_point() +
    # plotting clusters
    geom_mark_hull(
        aes(
            fill = engine_size,
            label = engine_size
        ),
        concavity = 2.5
        # if you get an error, install the "concaveman" package
    ) +
    geom_smooth(se = FALSE, span = 1.0) +
    expand_limits(y = c(5, 50), x = c(1, 8)) +
    scale_fill_calc() +
    theme_tq() +
    labs(
        title = "Fuel Economy (mpg) Trends by Engine Size and Displacement",
        x = "Engine Displacement Volume (Liters)",
        y = "Highway Fuel Economy (MPG)",
        fill = "Engine Size",
        caption = "Engine size has a negative relationship to fuel economy."
    )

And here’s the output. We have our final plot that tells the story of how highway fuel economy varies with the vehicle’s number of cylinders and engine displacement volume.

Summary

We learned how to make hull plots with ggforce. But there’s a lot more to visualization.

It’s critical to learn how to visualize with ggplot2, which is the premier framework for data visualization in R.

If you’d like to learn ggplot2, data visualization, and data science for business with R, stay tuned with Analyticsfy.