Program 5
Implement an R program to create a histogram illustrating the distribution of a continuous variable, with overlays of density curves for each group, using ggplot2.
Overview of Steps
In this program, we will follow these steps:
- Load the required library
- Explore the dataset
- Identify the continuous and grouping variables
- Initialize the plot with aesthetic mappings
- Add the histogram layer
- Add group-wise density curves
- Add labels and theme
- Display and interpret the final plot
Step 1: Load Required Library
We first load the ggplot2 package, which is used for data visualization in R.
# Load ggplot2 package for visualization
library(ggplot2)
Step 2: Explore the Inbuilt Dataset
We use the built-in iris dataset. In this dataset:
- Petal.Length is the continuous variable
- Species is the categorical grouping variable
Before creating a graph, we inspect the structure and the first few rows of the dataset.
# Use the built-in 'iris' dataset
# 'Petal.Length' is a continuous variable
# 'Species' is a categorical grouping variable
str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
Explanation
- str(iris) shows the structure of the dataset and the data type of each variable
- head(iris) displays the first six rows of the dataset
- This step is important because it confirms that Petal.Length is numeric (num) and Species is categorical (a factor with three levels)
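You can also make this confirmation programmatically instead of reading the printed output. The short check below is a sketch that is not part of the original program. It uses only base R functions: is.numeric(), is.factor(), levels() and table().

```r
# Confirm the variable types before plotting
stopifnot(is.numeric(iris$Petal.Length))  # continuous variable
stopifnot(is.factor(iris$Species))        # categorical grouping variable

# List the groups and count the observations in each
levels(iris$Species)   # "setosa" "versicolor" "virginica"
table(iris$Species)    # 50 observations per species
```

If either stopifnot() call fails, R stops with an error. This catches a wrongly typed column before any plotting code runs.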
Step 3: Create Histogram with Group-wise Density Curves
Now we begin building the visualization step by step.
Step 3.1: Initialize the ggplot with Aesthetic Mappings
We start by creating a basic ggplot object and mapping the variables.
# Start ggplot with iris dataset
# Map Petal.Length to x-axis and fill by Species
p <- ggplot(data = iris, aes(x = Petal.Length, fill = Species))
p
Explanation
This step initializes the plot and tells ggplot2 how to use the variables:
- Petal.Length is mapped to the x-axis because it is the continuous variable we want to study
- Species is mapped to the fill aesthetic so that different groups can be visually distinguished
This does not yet create the complete graph, but it sets up the plotting framework.
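Mapping an aesthetic inside aes() is different from setting it to a constant outside aes(). The sketch below builds both versions and uses ggplot2's layer_data() to inspect the fill colours ggplot2 computes for the bars. The object names p_mapped and p_set are illustrative only.

```r
library(ggplot2)

# Mapped: fill varies with Species, so ggplot2 assigns one colour per group
p_mapped <- ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_histogram(bins = 30)

# Set: fill is a fixed value given outside aes(), so every bar gets the same colour
p_set <- ggplot(iris, aes(x = Petal.Length)) +
  geom_histogram(bins = 30, fill = "steelblue")

# Inspect the computed fill values of the first (and only) layer
unique(layer_data(p_mapped, 1)$fill)  # three colours, one per species
unique(layer_data(p_set, 1)$fill)     # a single colour
```

The same rule explains the later steps. Parameters such as alpha = 0.4, written outside aes(), apply one fixed value to the whole layer.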
Step 3.2: Add Histogram Layer
Next, we add a histogram layer to show the distribution of the continuous variable.
# Add histogram with density scaling
p <- p + geom_histogram(aes(y = after_stat(density)),
                        alpha = 0.4,
                        position = "identity",
                        bins = 30)
p
Explanation
- aes(y = after_stat(density)) scales the histogram so that the y-axis shows density instead of raw counts. This puts the bars on the same scale as the density curves added in the next step. Older code writes this as aes(y = ..density..), but the dot-dot notation was deprecated in ggplot2 3.4.0 and now produces a warning recommending after_stat(density)
- alpha = 0.4 makes the bars semi-transparent so overlapping groups can still be seen
- position = "identity" draws each group's histogram at its true height so the groups overlap. The default, position = "stack", would instead pile each group's bars on top of the previous group's
- bins = 30 sets how many intervals the range of Petal.Length is divided into
This layer helps us understand how the values of Petal.Length are distributed for different species.
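The sketch below shows what density scaling means in numbers, using base R's hist() for one species. Each bar's density is its count divided by (number of observations times bin width), so the bar areas of a group add up to 1. The 30 equal-width breaks over the full Petal.Length range are an assumption chosen for illustration; geom_histogram() places its bin edges somewhat differently.

```r
# Hypothetical illustration: 30 equal-width bins over the full Petal.Length range
breaks <- seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 31)

# Histogram of one group (setosa), computed without drawing it
setosa <- iris$Petal.Length[iris$Species == "setosa"]
h <- hist(setosa, breaks = breaks, plot = FALSE)

# density = count / (n * bin width) for every bar
widths <- diff(h$breaks)
all.equal(h$density, h$counts / (length(setosa) * widths))  # TRUE

# Total bar area (density times width) is 1
sum(h$density * widths)
```

ggplot2 normalises each group separately in the same way. Groups of different sizes can therefore still be compared on one density axis.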
Step 3.3: Add Density Curve Layer
To make the distribution smoother and easier to interpret, we overlay density curves for each group.
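# Overlay density curves for each group
p <- p +
  geom_density(aes(color = Species),
               fill = NA,
               linewidth = 1.2)
p
Explanation
This step adds smooth density curves on top of the histogram.
- aes(color = Species) assigns a different line color to each species
- fill = NA leaves the area under each curve unfilled. Without it, the density layer inherits fill = Species from ggplot() and draws opaque filled areas that hide the histogram bars
- linewidth = 1.2 controls the thickness of the density curves. Older code uses size = 1.2, but using size for lines was deprecated in ggplot2 3.4.0 in favor of linewidth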
The density curves help students compare the shape, spread, and concentration of the distributions across groups more clearly than the histogram alone.
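The curves drawn by geom_density() are kernel density estimates. By default, ggplot2 uses a Gaussian kernel and the "nrd0" bandwidth rule, which are also the defaults of base R's density(). The sketch below computes the curve for one species by hand. The result is the same estimate, but ggplot2 evaluates it over a different range of x values, so the individual points do not match one for one.

```r
# Kernel density estimate of Petal.Length for one species
setosa <- iris$Petal.Length[iris$Species == "setosa"]
d <- density(setosa, kernel = "gaussian", bw = "nrd0")  # these are the defaults

d$bw                  # bandwidth chosen by the "nrd0" rule
d$x[which.max(d$y)]   # location of the peak of the curve

# The area under the curve is approximately 1, like the density-scaled histogram
dx <- diff(d$x)
area <- sum(dx * (head(d$y, -1) + tail(d$y, -1)) / 2)  # trapezoidal rule
area
```

Both geom_density() and density() accept an adjust argument that multiplies this bandwidth. Values below 1 give a more detailed curve, and values above 1 give a smoother one.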
Step 3.4: Add Labels and Theme
Now we improve the appearance of the graph by adding a title, axis labels, and a clean theme.
# Add title and axis labels, and apply clean theme
p <- p + labs(
title = "Distribution of Petal Length with Group-wise Density Curves",
x = "Petal Length",
y = "Density") +
theme_minimal()
p
Explanation
- labs() adds a meaningful title and axis labels:
  - The title explains what the graph is showing
  - The x-axis label identifies the continuous variable
  - The y-axis label indicates that the graph is scaled by density
- theme_minimal() gives the plot a simple and clean appearance
This step improves readability and presentation quality.
Step 3.5: Display the Plot
Finally, we render the complete plot.
# Finally, render the plot
p
Interpretation
After generating the plot, we study the distribution of Petal.Length for each species.
The histogram shows how the values are spread across intervals, while the density curves provide a smooth summary of the distribution for each group.
This makes it easier to compare:
- where most values are concentrated
- how wide or narrow each distribution is
- whether the groups overlap or are clearly separated
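A few numbers can support the visual comparison. The sketch below is an addition to the original program, meant only to help check the reading of the plot. It uses base R's tapply(), split() and sapply() to compute the centre, spread and range of Petal.Length for each species.

```r
# Centre and spread of Petal.Length within each species
tapply(iris$Petal.Length, iris$Species, mean)
tapply(iris$Petal.Length, iris$Species, sd)

# Minimum and maximum per species (one column per species)
ranges <- sapply(split(iris$Petal.Length, iris$Species), range)
rownames(ranges) <- c("min", "max")
ranges

# setosa lies entirely below versicolor; versicolor and virginica overlap
ranges["max", "setosa"] < ranges["min", "versicolor"]      # TRUE
ranges["max", "versicolor"] > ranges["min", "virginica"]   # TRUE
```

These checks match what the plot shows. The setosa distribution is clearly separated from the other two species, while the versicolor and virginica distributions share part of their range.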
Summary
In this program, we:
- Used the built-in iris dataset
- Selected Petal.Length as the continuous variable
- Used Species as the grouping variable
- Created a histogram to visualize the distribution
- Added density curves for each group
- Improved the graph using labels and a minimal theme
Discussion Points
- Why is Petal.Length considered a continuous variable?
- Why is Species used as a grouping variable?
- What is the purpose of scaling the histogram's y-axis to density instead of counts?
- Why is transparency useful when histograms overlap?
- How do density curves improve the interpretation of the graph?
- What can be learned by comparing the distributions of the three species?
Follow-up Questions
- Change the number of bins and observe how the histogram changes
- Create the same plot using Sepal.Length instead of Petal.Length
- Use a different theme such as theme_bw()
- Try changing the transparency level using alpha
- Remove the density curves and compare the graph with the original
- Create separate histograms for each species using faceting
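For the faceting question, one possible approach is facet_wrap(), which splits the data by Species and draws one panel per group. The sketch below is one option, not the only one. It assumes ggplot2 3.4.0 or later (for linewidth), and the object name p_facet is illustrative.

```r
library(ggplot2)

# One histogram (with its density curve) per species, in separate panels
p_facet <- ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, alpha = 0.4) +
  geom_density(aes(color = Species), fill = NA, linewidth = 1.2) +
  facet_wrap(~ Species, ncol = 1) +   # panels stacked vertically on a shared x-axis
  labs(x = "Petal Length", y = "Density") +
  theme_minimal()
p_facet
```

Each group now has its own panel, so position = "identity" is no longer needed to handle overlap. The common x-axis keeps the panels directly comparable.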
Conclusion
In this exercise, we learned how to:
- explore a dataset before visualization
- identify continuous and categorical variables
- create a histogram for a continuous variable
- overlay group-wise density curves
- interpret grouped distributions effectively using ggplot2