Program 9

Author

Manoj

Objective

Create multiple histograms using ggplot2::facet_wrap() to visualize how a variable (e.g., Sepal.Length) is distributed across different groups (e.g., Species) in a built-in R dataset.


Requirements

Before proceeding, make sure the ggplot2 package is installed. If it is not, install it with:

install.packages("ggplot2")

Then load the package:

# Load the ggplot2 package
library(ggplot2)
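
If the script may run on a machine where ggplot2 is missing, a guarded install avoids reinstalling the package on every run. The sketch below is one way to do that; the CRAN mirror URL is an assumption and can be swapped for any mirror you prefer.

```r
# Install ggplot2 only when it is not already available.
# Assumption: the cloud.r-project.org mirror is reachable from this machine.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Attach the package so ggplot(), aes(), facet_wrap(), etc. are available.
library(ggplot2)
```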

Step 1: Load and Explore the Dataset

We’ll use the built-in iris dataset. This dataset contains:

  • 150 rows (observations)
  • 4 numeric columns: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
  • 1 categorical column: Species (setosa, versicolor, virginica)

Let’s view the first few rows.

# Load the iris dataset
data(iris)

# View the first few rows of the dataset
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
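
The structure described above can also be checked directly rather than read off the first few rows. The following is a minimal sketch that uses only base R functions:

```r
# iris ships with base R (datasets package), so no extra packages are needed.
data(iris)

dim(iris)             # number of rows and columns
sapply(iris, class)   # class of each column (numeric or factor)
levels(iris$Species)  # group labels that facet_wrap() will use as panel titles
table(iris$Species)   # number of observations per species
```

The levels of Species are what determine how many panels facet_wrap() draws in the next step.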

Step 2: Create Grouped Histograms Using facet_wrap

Let’s now create histograms of Sepal.Length for each Species using ggplot2 and facet_wrap().

# Create histograms using facet_wrap for grouped data
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "skyblue", color = "black") +
  facet_wrap(~ Species) +
  labs(title = "Distribution of Sepal Length by Species",
       x = "Sepal Length (cm)",
       y = "Frequency") +
  theme_minimal()


Step 3: Explanation of Each Line

Code Line                              Description
ggplot(iris, aes(x = Sepal.Length))    Initializes a plot using the iris dataset and maps Sepal.Length to the x-axis.
geom_histogram(binwidth = 0.3, ...)    Adds a histogram layer with a bin width of 0.3.
fill = "skyblue"                       Argument of geom_histogram(); sets the fill color of the bars.
color = "black"                        Argument of geom_histogram(); sets the border color of the bars.
facet_wrap(~ Species)                  Creates a separate histogram panel for each species, arranged in a grid.
labs(...)                              Adds a title and axis labels.
theme_minimal()                        Applies a minimal theme with no background annotations.
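
facet_wrap() also accepts arguments that control the panel layout and axis scales. The sketch below shows two variations on the Step 2 plot; the object names base_plot, p_stacked, and p_free_y are chosen here for illustration and are not part of the original exercise.

```r
library(ggplot2)

# Base plot with the same layers as Step 2, minus the faceting.
base_plot <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Sepal Length by Species",
       x = "Sepal Length (cm)",
       y = "Frequency") +
  theme_minimal()

# Variation 1: stack the panels in one column so the x-axes line up vertically.
p_stacked <- base_plot + facet_wrap(~ Species, ncol = 1)

# Variation 2: let each panel pick its own y-axis range.
p_free_y <- base_plot + facet_wrap(~ Species, scales = "free_y")

print(p_stacked)
print(p_free_y)
```

Stacking the panels tends to make shifts along the x-axis easier to see, while free y-axes help when one group has much taller bars than the others, at the cost of direct height comparisons.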

Output Description

The output will be three side-by-side histograms, each showing the distribution of Sepal Length for one of the following species:

  • setosa
  • versicolor
  • virginica

Because facet_wrap() uses the same x- and y-axis scales in every panel by default, the histograms can be compared directly across species.
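
A numeric summary can back up what the histograms suggest visually. This is a small sketch using base R only; the object names sepal_summary and sepal_stats are illustrative.

```r
# Per-species summary of Sepal.Length (minimum, quartiles, mean, maximum).
sepal_summary <- tapply(iris$Sepal.Length, iris$Species, summary)
print(sepal_summary)

# Per-species mean and standard deviation in a single table.
sepal_stats <- aggregate(Sepal.Length ~ Species, data = iris,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(sepal_stats)
```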


Bonus Tip: Try with Different Variables

You can replace Sepal.Length in the aes(x = ...) part with:

  • Sepal.Width
  • Petal.Length
  • Petal.Width

This lets you explore how other features vary across species. Remember to update the title and x-axis label in labs() to match the new variable.
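
To avoid editing the plot code by hand for each variable, the column name can be passed as a string. The sketch below wraps the Step 2 plot in a hypothetical helper, plot_hist_by_species(), which is not part of ggplot2; it relies on the .data pronoun inside aes() to look up the column by name.

```r
library(ggplot2)

# Hypothetical helper: histogram of one numeric iris column, faceted by Species.
# var      - column name as a string, e.g. "Petal.Length"
# binwidth - bin width in cm; 0.3 matches Step 2 but may need tuning per variable
plot_hist_by_species <- function(var, binwidth = 0.3) {
  # Fail early if the column does not exist or is not numeric.
  stopifnot(var %in% names(iris), is.numeric(iris[[var]]))

  ggplot(iris, aes(x = .data[[var]])) +
    geom_histogram(binwidth = binwidth, fill = "skyblue", color = "black") +
    facet_wrap(~ Species) +
    labs(title = paste("Distribution of", var, "by Species"),
         x = paste(var, "(cm)"),
         y = "Frequency") +
    theme_minimal()
}

# Example: petal length instead of sepal length.
p_petal <- plot_hist_by_species("Petal.Length")
print(p_petal)
```

For narrow-ranged variables such as Petal.Width, a smaller value like binwidth = 0.1 may show more detail.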


Summary

This exercise demonstrates:

  • How to create grouped visualizations using facet_wrap().
  • How to analyze and compare distributions across categories using histograms.
  • Use of ggplot2, a widely used R package for data visualization.