Visualizing the distribution of data is the foundational step in Exploratory Data Analysis (EDA). In this guide, we will compare the traditional Base R approach with the modern, layered architecture of ggplot2 to see how we can transform a simple plot into a publication-ready visualization.
Base R provides a quick, “no-frills” way to generate plots without additional libraries. This is excellent for rapid data inspection during the initial phase of analysis.
```r
# Default histogram using base R
hist(iris$Sepal.Length,
     breaks = 20,
     col = "purple",
     xlab = "Sepal Length",
     ylab = "Frequency",
     main = "Histogram of Sepal Length (Base R)")
```

Explanation: In Base R, we use a single function hist(). While efficient, it is less flexible for complex layering and customization compared to modern frameworks.
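Beyond drawing, hist() also returns the bin data it computed. The sketch below calls it with plot = FALSE to inspect that object; it is only an illustration, and the variable name h is an arbitrary choice. It also shows that a single number passed to breaks is treated as a suggestion: R rounds the breakpoints to "pretty" values, so the number of bars may not be exactly 20.

```r
# Compute the histogram without drawing it, so the bins can be inspected.
h <- hist(iris$Sepal.Length, breaks = 20, plot = FALSE)

h$breaks          # breakpoints R actually chose
length(h$counts)  # number of bins, which may differ from the requested 20
sum(h$counts)     # every observation lands in exactly one bin
```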
The power of ggplot2 lies in its layered logic, known as the Grammar of Graphics. We build our final plot by stacking independent layers on top of each other.
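To make the idea of stacking concrete, here is a minimal sketch that builds a plot one layer at a time. The object names base_plot and with_bars are illustrative, and binwidth = 0.2 is simply the value used later in this guide.

```r
library(ggplot2)

# Step 1: data and aesthetic mapping only. Printing this shows empty axes.
base_plot <- ggplot(iris, aes(x = Sepal.Length))

# Step 2: add a geometry layer with `+`. The original object is unchanged.
with_bars <- base_plot + geom_histogram(binwidth = 0.2)

length(base_plot$layers)  # layers before adding the geometry
length(with_bars$layers)  # layers after adding the geometry

with_bars
```

Because each step returns a new plot object, a base plot can be reused with different geometries, labels, or themes.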
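We start by defining the data and the x-axis mapping, then set the interior color (fill), border color (color), and transparency (alpha) of the bars as fixed values inside geom_histogram(). Because no bin width is given yet, ggplot2 falls back to 30 bins and prints a message suggesting that you pick a better value.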
```r
library(tidyverse)

iris %>%
  ggplot(aes(x = Sepal.Length)) +
  geom_histogram(fill = "purple", alpha = 0.3, color = "black")
```

The binwidth is the most consequential setting in a histogram: it fixes the width of each bar in data units (here, centimeters) and therefore the granularity of the distribution you see. We also add descriptive labels using labs().
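Before settling on a value, it can help to see how strongly binwidth changes the result. The following sketch counts the bars produced by three candidate widths. Here bin_count is a hypothetical helper written for this illustration, and layer_data() is the ggplot2 function that returns the data computed for a layer.

```r
library(ggplot2)

# Hypothetical helper: how many bins does a given binwidth produce?
bin_count <- function(bw) {
  p <- ggplot(iris, aes(x = Sepal.Length)) +
    geom_histogram(binwidth = bw)
  nrow(layer_data(p))  # one row per bin
}

widths <- c(0.1, 0.2, 0.5)
counts <- sapply(widths, bin_count)
data.frame(binwidth = widths, bins = counts)
```

Narrower bins give more bars and expose more detail, but also more noise; wider bins smooth the shape and can hide features. The plot in this step uses 0.2.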
```r
iris %>%
  ggplot(aes(x = Sepal.Length)) +
  geom_histogram(fill = "purple", alpha = 0.2, color = "black", binwidth = 0.2) +
  labs(title = "Histogram of Sepal Length",
       x = "Sepal Length (cm)",
       y = "Frequency")
```

In this step, we use the pipe operator (%>%) to filter the data before it reaches the plot. We also apply theme_test() to remove the gray background for a cleaner look.
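Since the filter changes what the histogram summarizes, a quick check of what it keeps can be useful before plotting. This sketch assumes only dplyr, which is part of the tidyverse; the name long_sepals is illustrative.

```r
library(dplyr)

long_sepals <- iris %>%
  filter(Sepal.Length > 5)

nrow(iris)         # rows before filtering
nrow(long_sepals)  # rows that reach the plot

# The comparison is strict, so flowers measuring exactly 5.0 are dropped.
any(long_sepals$Sepal.Length == 5)

# The pipe passes its left-hand side as the first argument,
# so this nested call gives the same result.
identical(long_sepals, filter(iris, Sepal.Length > 5))
```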
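```r
iris %>%
  filter(Sepal.Length > 5) %>%
  ggplot(aes(x = Sepal.Length)) +
  geom_histogram(fill = "purple", alpha = 0.2, color = "black", binwidth = 0.2) +
  theme_test() +
  labs(title = "Filtered Histogram (> 5cm)",
       x = "Sepal Length",
       y = "Frequency")
```

To make the graph publication-ready, we center the title and subtitle and set the title and axis titles in bold using the theme() function. The theme() call comes after theme_test() because a complete theme such as theme_test() added afterwards would overwrite those adjustments.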
```r
iris %>%
  filter(Sepal.Length > 5) %>%
  ggplot(aes(x = Sepal.Length)) +
  geom_histogram(fill = "purple", alpha = 0.2, color = "black", binwidth = 0.2) +
  labs(title = "Distribution of Sepal Length",
       subtitle = "Cleaned and Formatted Histogram",
       x = "Sepal Length",
       y = "Frequency",
       caption = "Dataset: Iris | Prepared by Abdullah Al Shamim") +
  theme_test() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    plot.subtitle = element_text(hjust = 0.5),
    axis.title = element_text(face = "bold")
  )
```

| Parameter | Function | Purpose |
|---|---|---|
| fill | Interior color | Defines the visual theme of the bars. |
| color | Border color | Distinguishes individual bins. |
| alpha | Transparency | Softens the visual impact (range 0-1, where 0 is fully transparent). |
| binwidth | Data grouping | Controls the statistical resolution. |
| hjust = 0.5 | Centering | Centers the title and subtitle horizontally for a professional alignment. |
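The parameters in the table can be gathered into one reusable function. This is a sketch rather than part of the workflow above: the function name styled_histogram and its argument names are hypothetical, and .data is the pronoun that lets aes() refer to a column by its name as a string.

```r
library(ggplot2)

# Hypothetical helper: histogram of one numeric column with this guide's styling.
styled_histogram <- function(data, column, binwidth, title) {
  ggplot(data, aes(x = .data[[column]])) +
    geom_histogram(fill = "purple", color = "black",
                   alpha = 0.2, binwidth = binwidth) +
    labs(title = title, x = column, y = "Frequency") +
    theme_test() +
    theme(plot.title = element_text(hjust = 0.5, face = "bold"))
}

# Example call on a different iris column.
p <- styled_histogram(iris, "Petal.Length", binwidth = 0.25,
                      title = "Distribution of Petal Length")
p
```

Keeping binwidth as a required argument is a deliberate choice in this sketch, since a width that suits one variable rarely suits another.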
Congratulations! You have successfully transitioned from a basic R plot to a layered, professional ggplot2 visualization.