Mastering Histograms: From Base R to Professional ggplot2

A Systemic Journey Through the Grammar of Graphics

Author

Abdullah Al Shamim

Published

February 7, 2026

Introduction

Visualizing the distribution of data is a foundational step in Exploratory Data Analysis (EDA). In this guide, we compare the traditional Base R approach with the modern, layered architecture of ggplot2 and turn a simple plot into a publication-ready visualization.


1. Basic Histogram (Base R Graphics)

Base R provides a quick, “no-frills” way to generate plots without installing any additional packages. This is excellent for rapid data inspection during the initial phase of analysis.

# Default histogram using base R
hist(iris$Sepal.Length, 
     breaks = 20, 
     col = "purple", 
     xlab = "Sepal Length", 
     ylab = "Frequency", 
     main = "Histogram of Sepal Length (Base R)")

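Explanation: In Base R, a single call to hist() computes the bins and draws the plot. This is efficient, but it is less flexible for layering and customization than ggplot2. Note that breaks = 20 is only a suggestion: R picks “pretty” break points close to that number, so the plot may not contain exactly 20 bins.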
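
To see what hist() computes before it draws anything, the sketch below calls it with plot = FALSE and inspects the returned object. The variable name h is ours and is not part of the original example.

```r
# Compute the histogram without drawing it
h <- hist(iris$Sepal.Length, breaks = 20, plot = FALSE)

h$breaks       # bin edges R actually chose
h$counts       # number of observations in each bin
sum(h$counts)  # every flower falls into exactly one bin
```

Comparing length(h$counts) with the requested breaks = 20 is a quick way to check how many bins R really used.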


2. The ggplot2 Way: A Step-by-Step Transformation

The power of ggplot2 lies in its layered logic, known as the Grammar of Graphics. We build our final plot by stacking independent layers on top of each other.
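
The layering idea can be sketched by storing the plot in a variable and adding one component at a time. The variable name p and the title text are illustrative assumptions.

```r
library(ggplot2)

# Data and aesthetic mapping only; printing this would show empty axes
p <- ggplot(iris, aes(x = Sepal.Length))

# Add the histogram geometry as the first layer
p <- p + geom_histogram(binwidth = 0.2)

# Labels and themes are added the same way, with +
p <- p + labs(title = "Built one layer at a time")

p
```

Because each + returns a new plot object, intermediate versions can be printed and compared while the plot is being built.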

Step 1: Foundation & Aesthetics

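We start by defining the data and mapping Sepal.Length to the x-axis inside aes(). The interior color (fill), border color (color), and transparency (alpha) are set to fixed values inside geom_histogram() rather than mapped to data. Because no binwidth is given yet, ggplot2 falls back to 30 bins and prints a message suggesting that you pick a better value.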

library(tidyverse)

iris %>% 
  ggplot(aes(x = Sepal.Length)) +
  geom_histogram(fill = "purple", alpha = 0.3, color = "black")
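
Step 1 sets fill to a constant. As a hedged illustration of the difference between setting a value and mapping it to data, the sketch below builds both variants. The names p_set and p_map are ours, and bins = 30 is passed explicitly only to silence the default-bins message.

```r
library(ggplot2)

# Setting: every bar gets the same fixed fill
p_set <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(fill = "purple", alpha = 0.3, color = "black", bins = 30)

# Mapping: fill follows the Species column, so ggplot2 adds a legend
p_map <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(alpha = 0.3, color = "black", bins = 30)

p_set
p_map
```

In the mapped version the bars for the three species are stacked by default, which is worth keeping in mind when reading the heights.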

Step 2: Binwidth and Labeling

Choosing the binwidth is the most important decision in a histogram: it sets the width of each bar in data units (here 0.2 cm) and therefore controls how coarse or fine the distribution appears. We also add descriptive labels using labs().

iris %>% 
  ggplot(aes(x = Sepal.Length)) +
  geom_histogram(fill = "purple", alpha = 0.2, color = "black", binwidth = 0.2) +
  labs(title = "Histogram of Sepal Length", 
       x = "Sepal Length (cm)", 
       y = "Frequency")
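
To make the effect of binwidth concrete, the sketch below counts how many bins ggplot2 computes for a few candidate widths. The helper count_bins() is hypothetical and written only for this illustration; layer_data() is the ggplot2 function that returns the computed data behind a layer.

```r
library(ggplot2)

# Hypothetical helper: number of bins ggplot2 computes for a given binwidth
count_bins <- function(bw) {
  p <- ggplot(iris, aes(x = Sepal.Length)) +
    geom_histogram(binwidth = bw)
  nrow(layer_data(p))  # one row per bin in the computed layer data
}

bin_counts <- sapply(c(0.1, 0.2, 0.5), count_bins)
bin_counts
```

Smaller widths give more, narrower bars and a noisier shape; larger widths give fewer bars and a smoother but less detailed picture.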

Step 3: Data Filtering and Theming

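In this step, we use filter() to keep only the rows with Sepal.Length greater than 5, and the pipe operator (%>%) passes the filtered data on to ggplot(). We also apply theme_test(), which replaces the default gray background and grid lines with a plain white panel for a cleaner look.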

iris %>% 
  filter(Sepal.Length > 5) %>% 
  ggplot(aes(x = Sepal.Length)) +
  geom_histogram(fill = "purple", alpha = 0.2, color = "black", binwidth = 0.2) +
  theme_test() +
  labs(title = "Filtered Histogram (> 5 cm)", 
       x = "Sepal Length (cm)", 
       y = "Frequency")
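
Before plotting, it can help to check what the filter actually removes. The sketch below applies the same condition and compares row counts; the variable name filtered is an assumption made for this example.

```r
library(dplyr)

# Same condition as in the plot pipeline
filtered <- iris %>%
  filter(Sepal.Length > 5)

nrow(iris)                  # rows before filtering
nrow(filtered)              # rows that reach ggplot()
min(filtered$Sepal.Length)  # smallest remaining value
```

Note that the comparison is strict, so flowers with a sepal length of exactly 5 are dropped as well.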


3. The Final Professional Plot

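To make our graph “Publication Ready,” we center the title and subtitle and bold the title and axis labels using the theme() function. The theme() call comes after theme_test() on purpose: a complete theme replaces all earlier theme settings, so custom tweaks placed before it would be overwritten.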

iris %>% 
  filter(Sepal.Length > 5) %>% 
  ggplot(aes(x = Sepal.Length)) +
  geom_histogram(fill = "purple", alpha = 0.2, color = "black", binwidth = 0.2) +
  labs(title = "Distribution of Sepal Length", 
       subtitle = "Cleaned and Formatted Histogram",
       x = "Sepal Length (cm)", 
       y = "Frequency",
       caption = "Dataset: Iris | Prepared by Abdullah Al Shamim") +
  theme_test() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    plot.subtitle = element_text(hjust = 0.5),
    axis.title = element_text(face = "bold")
  )
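
A publication-ready figure usually needs to be exported at a fixed size and resolution. The sketch below shows one way to do that with ggsave(). The object name final_plot, the file name, and the 6 x 4 inch size at 300 dpi are assumptions for illustration, not requirements from the original workflow.

```r
library(ggplot2)

# A shortened stand-in for the final plot above
final_plot <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(fill = "purple", alpha = 0.2, color = "black", binwidth = 0.2) +
  theme_test()

# Hypothetical output path; tempdir() keeps the example out of the project folder
out_file <- file.path(tempdir(), "sepal_length_histogram.png")

# Width and height are in inches by default
ggsave(out_file, plot = final_plot, width = 6, height = 4, dpi = 300)
```

Passing the plot explicitly avoids relying on whichever plot happened to be displayed last.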


Systemic Cheat Sheet

| Parameter | Function | Systemic Purpose |
|---|---|---|
| fill | Interior Color | Defines the visual theme of the bars. |
| color | Border Color | Distinguishes individual bins. |
| alpha | Transparency | Softens the visual impact (range 0-1). |
| binwidth | Data Grouping | Controls the statistical resolution. |
| hjust = 0.5 | Centering | Ensures professional alignment. |

Congratulations! You have successfully transitioned from a basic R plot to a layered, professional ggplot2 visualization.
