The Grammar of Graphics

A Systemic Guide to ggplot2

Author

Abdullah Al Shamim

Published

January 22, 2026

Introduction

In this tutorial, we will learn how to transform raw data into meaningful visual stories using ggplot2, a widely used visualization package for R. Instead of nesting function calls inside one another, we will use the Pipe Operator (%>%, from the magrittr package that the tidyverse loads) to build graphs step by step.

Our Goal: To understand every layer of the “Grammar of Graphics” and progress from a raw dataset to a publication-quality visualization.

By the end of this guide, you will be able to:
* Understand and analyze data structures.
* Build graphs layer by layer.
* Use colors and facets to reveal complex relationships.
* Export professional-level visualizations.


The Concept: Grammar of Graphics

The Grammar of Graphics is a theoretical framework for data visualization. Just as a sentence is composed of a subject, verb, and object, a graph is composed of specific “grammatical” layers.

  1. Data: The foundation. Usually a “Tidy” data frame where every column is a variable and every row is an observation.
  2. Aesthetics (aes): Mapping data variables to visual properties (X-axis, Y-axis, Color, Size).
  3. Geometries (geom): The visual shape of the data (points, bars, lines).
  4. Facets: Splitting data into sub-plots to compare groups.
  5. Statistics (stat): Mathematical transformations (e.g., calculating a mean or a regression line).
  6. Coordinates (coord): The 2D space of the plot (usually Cartesian).
  7. Themes: The non-data ink (fonts, background colors, legend positions).
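The later sections demonstrate data, aesthetics, geometries, facets and themes, but not statistics or coordinates. The sketch below, which is not part of the original tutorial, touches all seven components in a single plot. It assumes only the tidyverse and the mpg dataset bundled with ggplot2; geom_smooth(), facet_wrap(~ drv), coord_cartesian() and theme_minimal() are illustrative choices for their slots, not the only options.

```r
library(tidyverse)

grammar_demo <- mpg %>%                                    # 1. Data
  ggplot(aes(x = displ, y = hwy)) +                        # 2. Aesthetics
  geom_point() +                                           # 3. Geometry: one point per car
  facet_wrap(~ drv) +                                      # 4. Facets: one panel per drive type
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # 5. Statistics: linear fit per panel (stat_smooth)
  coord_cartesian(xlim = c(1, 7)) +                        # 6. Coordinates: zoom the x-axis without dropping data
  theme_minimal()                                          # 7. Theme: non-data styling

grammar_demo
```

Each `+` adds one component, and their order matters only for drawing (later geoms are drawn on top) and for themes (a complete theme overrides earlier theme settings).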

1. Environment Setup

First, we must load the tidyverse suite, which includes ggplot2 and dplyr.

Code
# install.packages("tidyverse")  # uncomment and run once if tidyverse is not installed
library(tidyverse)

2. Data Inspection (The Inquiry Phase)

Before visualizing, we perform a “Structural Audit” of the data. We will use the mpg dataset, which ships with ggplot2: fuel economy data from the US EPA for 234 cars from model years 1999 and 2008.

Code
# Viewing the first 6 rows
mpg %>% head()
# A tibble: 6 × 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
Code
# Checking data types and structure
mpg %>% glimpse()
Rows: 234
Columns: 11
$ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi", "…
$ model        <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro", "…
$ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0, 2.…
$ year         <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, 200…
$ cyl          <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, …
$ trans        <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "auto…
$ drv          <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4", "4…
$ cty          <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17, 1…
$ hwy          <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25, 2…
$ fl           <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p…
$ class        <chr> "compact", "compact", "compact", "compact", "compact", "c…
Code
# Statistical snapshot
mpg %>% summary()
 manufacturer          model               displ            year     
 Length:234         Length:234         Min.   :1.600   Min.   :1999  
 Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999  
 Mode  :character   Mode  :character   Median :3.300   Median :2004  
                                       Mean   :3.472   Mean   :2004  
                                       3rd Qu.:4.600   3rd Qu.:2008  
                                       Max.   :7.000   Max.   :2008  
      cyl           trans               drv                 cty       
 Min.   :4.000   Length:234         Length:234         Min.   : 9.00  
 1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00  
 Median :6.000   Mode  :character   Mode  :character   Median :17.00  
 Mean   :5.889                                         Mean   :16.86  
 3rd Qu.:8.000                                         3rd Qu.:19.00  
 Max.   :8.000                                         Max.   :35.00  
      hwy             fl               class          
 Min.   :12.00   Length:234         Length:234        
 1st Qu.:18.00   Class :character   Class :character  
 Median :24.00   Mode  :character   Mode  :character  
 Mean   :23.44                                        
 3rd Qu.:27.00                                        
 Max.   :44.00                                        
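A quick count of the categorical columns rounds out the audit. The sketch below (an addition, not part of the original workflow) uses dplyr's count() on the two columns the later plots rely on: class, which drives the facets, and cyl, which is mapped to color. The object names class_counts and cyl_counts are illustrative.

```r
library(tidyverse)

# One row per car class with its number of cars; facet_wrap(~ class) later draws one panel per row
class_counts <- mpg %>% count(class)
class_counts

# cyl is stored as an integer, but it takes only a handful of distinct values
cyl_counts <- mpg %>% count(cyl)
cyl_counts
```

The second count shows that cyl has only four distinct values (4, 5, 6 and 8), so it behaves like a category. This is why the later sections wrap it in factor() before mapping it to color.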

3. Layers 1 & 2: Data and Aesthetics

The first two layers initialize the canvas: the pipe passes the data into ggplot(), and aes() maps variables to the axes.

Code
# This creates a blank canvas with axes, but no data points yet
mpg %>%
  ggplot(aes(x = displ, y = hwy))
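Because a ggplot object is an ordinary R value, the canvas can also be stored in a variable and extended later. The sketch below illustrates that pattern; the names canvas and scatter are hypothetical.

```r
library(tidyverse)

# Store the canvas (data + aesthetics); no points are drawn yet
canvas <- mpg %>%
  ggplot(aes(x = displ, y = hwy))

# Add a geometry layer to the stored object to get a new plot
scatter <- canvas + geom_point()
scatter
```

Keeping a base canvas around is convenient when trying several geometries or themes on the same data and aesthetics.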

4. Layer 3: Geometry (Adding Shapes)

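We add the geom_point() layer to create a scatter plot, with one point per row of the data. Note that layers are joined with +, not %>%: the pipe only passes the data into ggplot().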

Code
mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point()
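Because hwy is recorded in whole MPG and displ to one decimal place, many cars share identical coordinates and their points sit exactly on top of each other. One option, shown here as a sketch, is to swap the geometry for geom_jitter(), which nudges each point by a small random offset. The width and height values are illustrative assumptions, not tuned settings; note that jittering slightly distorts the exact positions.

```r
library(tidyverse)

set.seed(42)  # jitter is random; fixing the seed makes the plot reproducible

# Same data and aesthetics as before; only the geometry layer changes
jittered <- mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_jitter(width = 0.05, height = 0.3)

jittered
```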

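5. Refining Aesthetics: Color Mapping, Size, and Transparency

To reveal more detail, we map the number of engine cylinders (cyl) to color, then enlarge the points and make them semi-transparent. Wrapping cyl in factor() makes ggplot2 treat it as a category, so each cylinder count gets a distinct color instead of a shade on a continuous gradient. Size and alpha are set outside aes(), so they apply one fixed value to every point; alpha = 0.4 makes points 40% opaque, so overlapping points show up as darker areas.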

Code
mpg %>%
  ggplot(aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point()

Code
mpg %>%
  ggplot(aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point(size = 3)

Code
mpg %>%
  ggplot(aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point(size = 3, alpha = 0.4)
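The difference between mapping (inside aes()) and setting (outside aes()) is a common stumbling block. The sketch below contrasts the two with a fixed color; the object names set_blue and mapped_blue are illustrative.

```r
library(tidyverse)

# Setting: a constant outside aes() paints every point blue, with no legend
set_blue <- mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

# Mapping a constant: inside aes(), "blue" is treated as a data value,
# so ggplot2 builds a one-entry legend and colors the points with its default palette
mapped_blue <- mpg %>%
  ggplot(aes(x = displ, y = hwy, color = "blue")) +
  geom_point()

set_blue
mapped_blue
```

As a rule of thumb: put a variable inside aes() when its values should drive the visual property, and put a constant outside aes() when every point should look the same.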

6. Layer 4: Facets (Sub-plots)

If the data is too crowded, we can use facet_wrap(~ class) to draw a separate panel for each car class (seven panels for mpg).

Code
mpg %>%
  ggplot(aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point(size = 3, alpha = 0.6) +
  facet_wrap(~ class)
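When two categorical variables matter, facet_grid() is an alternative to facet_wrap(): it arranges panels in a fixed rows-by-columns matrix, so every combination gets a cell, including empty ones. The sketch below, which is not from the original tutorial, crosses drive type (drv, 3 values) with cylinder count (cyl, 4 values), producing a 3-by-4 grid of 12 panels. The name grid_plot is illustrative.

```r
library(tidyverse)

# Rows of panels by drive type, columns by number of cylinders
grid_plot <- mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  facet_grid(drv ~ cyl)

grid_plot
```

facet_wrap() suits a single variable with many levels, while facet_grid() makes it easier to compare along two dimensions at once.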

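7. Labels and Layer 7: Themes (Final Polish)

To make the graph professional, we add descriptive labels with labs() and a clean theme. Custom theme() tweaks go after theme_bw(), because a complete theme such as theme_bw() overrides every theme setting added before it.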

Code
final_plot <- mpg %>%
  ggplot(aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point(size = 3, alpha = 0.7) +
  facet_wrap(~ class) +  # one panel per car class, as the subtitle states
  labs(
    title = "Engine Size vs. Highway Mileage",
    subtitle = "Color-coded by number of cylinders and faceted by car class",
    x = "Displacement (Liters)",
    y = "Highway MPG",
    color = "Cylinders",
    caption = "Source: mpg dataset | Tutorial by Shamim"
  ) +
  theme_bw() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    legend.position = "bottom"
  )

final_plot

8. Saving the Output

Use ggsave() to export your work as a high-resolution image for reports or presentations.

Code
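# Save as PNG: 8 x 6 inches at 300 dots per inch
ggsave("fuel_efficiency_plot.png", plot = final_plot, width = 8, height = 6, dpi = 300)
# Save as PDF: vector output, so dpi matters little; set the size explicitly,
# otherwise ggsave() uses the size of the current graphics device
ggsave("fuel_efficiency_plot.pdf", plot = final_plot, width = 8, height = 6)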
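ggsave() measures width and height in inches unless the units argument says otherwise, and it picks the output format from the file extension. The sketch below uses a hypothetical stand-in plot (demo_plot) so it runs on its own, writes to R's session temporary directory instead of the working directory, sets the size in centimeters, and then checks that the file was written.

```r
library(tidyverse)

# A small stand-in plot so this block is self-contained
demo_plot <- mpg %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point()

# Write into the session's temporary directory
pdf_path <- file.path(tempdir(), "demo_plot.pdf")

# Size given in centimeters; the .pdf extension selects the PDF device
ggsave(pdf_path, plot = demo_plot, width = 20, height = 15, units = "cm")

# Confirm the file exists
file.exists(pdf_path)
```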

Systemic Summary Checklist

| Component  | Function     | Systemic Purpose                  |
|------------|--------------|-----------------------------------|
| Data       | mpg %>%      | The source of truth.              |
| Aesthetics | aes(x, y)    | Mapping variables to visual axes. |
| Geometries | geom_point() | Choosing the visual representation. |
| Facets     | facet_wrap() | Creating sub-plots for comparison. |
| Labels     | labs()       | Adding context and titles.        |
| Themes     | theme_bw()   | Cleaning up the appearance.       |

Summary Code Block

For beginners, this “Universal Formula” covers most common plots in a single, systemic pipeline:

The Master Structure

Data %>%
  ggplot(aes(x, y)) +
  geom_type() +
  facet_type() +
  labs() +
  theme()
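To show how the formula transfers to other chart types, here is a sketch that fills each slot with different choices: a box plot of highway mileage for each car class, faceted by model year. The specific geom, facet, labels and theme tweak are illustrative, not part of the original walkthrough, and the name class_boxplot is hypothetical.

```r
library(tidyverse)

class_boxplot <- mpg %>%                           # Data
  ggplot(aes(x = class, y = hwy)) +                # Aesthetics: class on x, mileage on y
  geom_boxplot() +                                 # Geometry: one box per class
  facet_wrap(~ year) +                             # Facets: 1999 models vs. 2008 models
  labs(
    title = "Highway Mileage by Car Class",
    x = "Class",
    y = "Highway MPG"
  ) +
  theme_bw() +                                     # Theme: complete theme first...
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # ...then tweaks: tilt long class names

class_boxplot
```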

Conclusion: From Coding to Storytelling

By completing this tutorial, you have moved beyond simply “making charts.” You now know how to transform raw data into a compelling visual story.

You are no longer just typing lines of code; you are architecting a visualization by stacking the Grammar of Graphics, layer by layer, to reveal the hidden insights within your data.