PROGRAM 9

Author

Manjunath B R

Objective

Create multiple histograms using ggplot2::facet_wrap() to visualize how a variable (e.g., Sepal.Length) is distributed across different groups (e.g., Species) in a built-in R dataset.

Requirements

Before proceeding, make sure the ggplot2 package is installed. You can install it with:

install.packages("ggplot2")

Then load the package:

library(ggplot2)
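If you rerun the script often, you may prefer to install the package only when it is missing. The following is a minimal sketch of that pattern using base R's requireNamespace(); it assumes a CRAN mirror is configured if an installation is actually needed.

```r
# Install ggplot2 only when it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package for this session
library(ggplot2)

# Show which version is in use
packageVersion("ggplot2")
```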

Step 1: Load and Explore the Dataset.

We will use the built-in iris dataset. This dataset contains:

  • 150 rows (observations)

  • 4 numeric columns: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width

  • 1 categorical column: Species (setosa, versicolor, virginica)

Let’s view the first few rows.

# Load the iris dataset
data(iris)

# View the first few rows of the dataset
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
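To check the structure described above, a few base R calls are enough. This is a small sketch; the variable names species_counts and sepal_range are chosen here for illustration only.

```r
data(iris)

# Number of rows and columns
dim(iris)

# Column names and types (four numeric columns, one factor)
str(iris)

# Number of observations in each species
species_counts <- table(iris$Species)
species_counts

# Smallest and largest Sepal.Length, useful when choosing a bin width
sepal_range <- range(iris$Sepal.Length)
sepal_range
```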

Step 2: Create Grouped Histograms Using facet_wrap.

Let’s now create histograms of Sepal.Length for each Species using ggplot2 and facet_wrap().

# Create histograms using facet_wrap for grouped data
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "skyblue", color = "black") +
  facet_wrap(~ Species) +
  labs(title = "Distribution of Sepal Length by Species",
       x = "Sepal Length (cm)",
       y = "Frequency") +
  theme_minimal()

Step 3: Explanation of Each Line.

Code Line                              Description
ggplot(iris, aes(x = Sepal.Length))    Initializes a plot using the iris dataset and maps Sepal.Length to the x-axis.
geom_histogram(binwidth = 0.3, ...)    Adds a histogram layer with a bin width of 0.3.
fill = "skyblue"                       Sets the fill color of the bars.
color = "black"                        Sets the border color of the bars.
facet_wrap(~ Species)                  Creates a separate histogram for each species, arranged in a grid of panels.
labs(...)                              Adds a title and axis labels.
theme_minimal()                        Applies a minimal theme with no background annotations.
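To see what the histogram layer actually computes for each panel, you can inspect the binned data with ggplot2's layer_data(). The sketch below rebuilds the same plot without labels; the names p, bins, and panel_totals are illustrative.

```r
library(ggplot2)

# Same histogram as in Step 2, stored in a variable instead of printed
p <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "skyblue", color = "black") +
  facet_wrap(~ Species)

# layer_data() returns the data computed for the first layer:
# one row per bin per panel
bins <- layer_data(p, 1)
head(bins[, c("PANEL", "xmin", "xmax", "count")])

# The counts in each panel should add up to the number of flowers
# of that species
panel_totals <- tapply(bins$count, bins$PANEL, sum)
panel_totals
```

Each panel total should match the number of observations per species, which is a quick way to confirm that no rows were dropped when faceting.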

Output Description

The output will be three side-by-side histograms, each showing the distribution of Sepal Length for one of the following species:

  • setosa

  • versicolor

  • virginica

Because facet_wrap() uses the same x- and y-axis scales in every panel by default, the histograms can be compared directly across species.
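If shifts along the x-axis are hard to judge side by side, one option is to stack the panels in a single column with the ncol argument of facet_wrap(). The sketch below also computes the mean Sepal.Length per species as a numeric check on what the plot shows; p_stacked and species_means are illustrative names.

```r
library(ggplot2)

# Stack the three panels vertically so the shared x-axis lines up
p_stacked <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "skyblue", color = "black") +
  facet_wrap(~ Species, ncol = 1) +
  labs(title = "Distribution of Sepal Length by Species",
       x = "Sepal Length (cm)",
       y = "Frequency") +
  theme_minimal()
p_stacked

# Mean Sepal.Length for each species, to back up the visual comparison
species_means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
species_means
```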

Bonus Tip: Try with Different Variables

You can replace Sepal.Length in the aes(x = ...) part with:

  • Sepal.Width

  • Petal.Length

  • Petal.Width

This lets you explore how other features vary across species! Remember to update the title and x-axis label in labs() to match the new variable.
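To avoid copying the plotting code for each variable, you could wrap it in a small function. The helper below, plot_by_species(), is hypothetical and not part of ggplot2; it is a sketch that assumes the column name is passed as a string and uses the .data pronoun to look it up inside aes().

```r
library(ggplot2)

# Hypothetical helper: faceted histogram of any numeric iris column
plot_by_species <- function(var, binwidth = 0.3) {
  # Guard against misspelled or non-numeric column names
  stopifnot(var %in% names(iris), is.numeric(iris[[var]]))

  ggplot(iris, aes(x = .data[[var]])) +
    geom_histogram(binwidth = binwidth, fill = "skyblue", color = "black") +
    facet_wrap(~ Species) +
    labs(title = paste("Distribution of", var, "by Species"),
         x = paste(var, "(cm)"),
         y = "Frequency") +
    theme_minimal()
}

# Usage: one call per variable, with an optional bin width
plot_by_species("Petal.Length")
plot_by_species("Sepal.Width", binwidth = 0.2)
```

A narrower bin width may suit variables with a smaller range, such as Sepal.Width; the value 0.2 here is only a starting point to experiment with.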

Summary

This exercise demonstrates:

  • How to create grouped visualizations using facet_wrap().

  • How to analyze and compare distributions across categories using histograms.

  • How to use ggplot2, one of the most widely used R packages for data visualization.