Mastering Scatterplots in ggplot2: The Complete Guide

Mastering Correlation and Bivariate Analysis

Author

Abdullah Al Shamim

Published

February 9, 2026

Introduction

Scatterplots are the foundational tool for visualizing the relationship between two continuous variables. By observing the distribution of points, we can identify correlations, detect outliers, and uncover hidden clusters within our data.


1. Environment Setup

We load the tidyverse, a collection of packages that includes ggplot2 for plotting and dplyr for data manipulation.

Code
library(tidyverse)
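Before plotting, it can help to confirm what the data looks like. The following is a minimal sketch that uses only base R to inspect the built-in iris dataset used throughout this guide. The variable name numeric_cols is introduced here purely for illustration.

```r
# iris ships with base R (package 'datasets'), so no download is needed.
str(iris)        # 150 observations of 5 variables
summary(iris)    # ranges of the four measurements, counts per species

# The four measurement columns are numeric; Species is a factor.
numeric_cols <- names(iris)[sapply(iris, is.numeric)]
numeric_cols
levels(iris$Species)
```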

2. Basic to Customized Scatterplots

The Core Geometry

A basic scatterplot maps two continuous variables. In this example, we examine the relationship between Sepal Length and Sepal Width from the iris dataset.

Code
iris %>% 
  ggplot(aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  labs(title = "Basic Scatterplot",
       x = "Sepal Length (cm)",
       y = "Sepal Width (cm)")
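A scatterplot shows the relationship visually, and a correlation coefficient can put a number on it. The sketch below assumes Pearson correlation is an appropriate summary for these two variables, and the variable name r is illustrative. For the pooled iris data the coefficient comes out weakly negative, which matches the loose, slightly downward cloud of points in the plot.

```r
# Pearson correlation between the two variables shown in the basic scatterplot.
r <- cor(iris$Sepal.Length, iris$Sepal.Width)
round(r, 3)

# cor.test() reports a confidence interval and p-value for the same estimate.
cor.test(iris$Sepal.Length, iris$Sepal.Width)
```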

Aesthetic Customization

To make points more distinct, we can use shapes 21 to 25, which support both color (outline) and fill (interior). For the other point shapes, fill has no effect and color controls the whole symbol.

Code
iris %>% 
  ggplot(aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 3, 
             shape = 21, 
             color = "black", 
             fill = "salmon", 
             alpha = 0.7) + 
  labs(title = "Customized Scatterplot: Fill, Color, and Alpha")
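To see which shapes accept a fill, the sketch below draws shapes 21 to 25 side by side with the same outline and fill settings as the plot above. The data frame shapes and the object name p_shapes are made up for this illustration. Only ggplot2 is loaded so the block runs on its own.

```r
library(ggplot2)

# One row per fillable shape code (21 to 25).
shapes <- data.frame(code = 21:25, x = 1:5, y = 1)

p_shapes <- ggplot(shapes, aes(x = x, y = y)) +
  geom_point(aes(shape = code),
             size = 6, color = "black", fill = "salmon", stroke = 1) +
  geom_text(aes(label = code), vjust = 3) +  # print the shape code under each point
  scale_shape_identity() +                   # use the numeric codes directly as shapes
  theme_void()
p_shapes
```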


3. Encoding Information (Color & Size)

Scatterplots can represent more than just two variables by mapping additional data to Color and Size.

Mapping color to a third, categorical variable shows how the different groups cluster.

Code
iris %>% 
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  scale_color_brewer(palette = "Set1") + 
  labs(title = "Clustering by Species")
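Color can also encode a continuous variable, in which case ggplot2 draws a gradient instead of discrete hues. As a sketch, the example below maps Petal.Length to color and assumes the viridis palette is a reasonable choice. The object name p_cont and the plot title are illustrative.

```r
library(ggplot2)

p_cont <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Petal.Length)) +
  geom_point(size = 3) +
  scale_color_viridis_c() +   # continuous palette from the viridis family
  labs(title = "Color Encodes Petal Length",
       color = "Petal Length (cm)")
p_cont
```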

By mapping size to a third, numerical variable, you create a “Bubble Chart” that shows three variables at once.

Code
iris %>%
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, size = Petal.Length)) +
  geom_point(alpha = 0.6) +
  labs(title = "Size Encodes Petal Length")
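The default size scale keeps even the smallest value visible, so bubble areas are not strictly proportional to the data. If proportionality matters, one option is scale_size_area(), which maps a value of 0 to a size of 0. The sketch below assumes a maximum point size of 8 looks reasonable for this data. The object name p_bubble is illustrative.

```r
library(ggplot2)

p_bubble <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, size = Petal.Length)) +
  geom_point(alpha = 0.6) +
  scale_size_area(max_size = 8) +   # point area proportional to the value
  labs(title = "Bubble Chart with Area-Proportional Sizes",
       size = "Petal Length (cm)")
p_bubble
```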


4. Advanced Publication-Ready Plot

This final plot combines the elements covered so far with a manual color scale, a centered bold title, and a linear trend line fitted to all observations.

Code
iris %>%
  ggplot(aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species, shape = Species), size = 3, alpha = 0.8) +
  geom_smooth(method = "lm", color = "black", linetype = "dashed", se = FALSE) +
  scale_color_manual(values = c("setosa" = "#A88EF2", 
                                "versicolor" = "#FBB222", 
                                "virginica" = "#4CAF50")) +
  labs(title = "Advanced Scatterplot Analysis",
       subtitle = "Sepal Dimensions with Linear Trends across Species",
       x = "Sepal Length (cm)",
       y = "Sepal Width (cm)",
       caption = "Dataset: Iris | Prepared by Abdullah Al Shamim") +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = 14, hjust = 0.5))
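In the plot above, color is mapped inside geom_point() only, so geom_smooth() fits a single line to the pooled data. If a separate trend per species is wanted, one approach is to move the color mapping into ggplot() so that every layer inherits the grouping. The following is a sketch of that variant. The object name p_trend and the title are illustrative.

```r
library(ggplot2)

p_trend <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species), size = 3, alpha = 0.8) +
  # color is inherited from ggplot(), so a linear model is fit per species
  geom_smooth(method = "lm", formula = y ~ x, linetype = "dashed", se = FALSE) +
  scale_color_manual(values = c("setosa" = "#A88EF2",
                                "versicolor" = "#FBB222",
                                "virginica" = "#4CAF50")) +
  labs(title = "Linear Trend per Species",
       x = "Sepal Length (cm)",
       y = "Sepal Width (cm)") +
  theme(legend.position = "bottom")
p_trend
```

Comparing the pooled line with the per-species lines is a quick check on whether the overall trend reflects the pattern within each group.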


Systemic Summary Checklist

| Aesthetic Mapping | Usage | Variable Type |
|---|---|---|
| Color | Grouping or Intensity | Categorical or Continuous |
| Size | Weight or Magnitude | Continuous |
| Shape | Differentiation | Categorical |
| Alpha | Managing Overplotting | Numerical (0-1) |
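Overplotting is relevant even in a small dataset like iris, because the measurements are recorded to one decimal place and several flowers share identical coordinates. The sketch below counts those repeats and then uses a low alpha so that stacked points appear darker. The names n_dup and p_alpha and the alpha value of 0.3 are illustrative choices.

```r
library(ggplot2)

# Count rows that repeat an earlier (Sepal.Length, Sepal.Width) pair.
n_dup <- sum(duplicated(iris[, c("Sepal.Length", "Sepal.Width")]))
n_dup

# With a low alpha, stacked points appear darker than single points.
p_alpha <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 3, alpha = 0.3) +
  labs(title = "Alpha Reveals Overlapping Points")
p_alpha
```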