PROGRAM 5

Author

PRACHETAN MS

Implement an R program to create a histogram illustrating the distribution of a continuous variable, with overlays of density curves for each group, using ggplot2.

Step 1: Load Required Library:

Loads the ggplot2 package used for data visualization. Required for creating histogram and density plots.

#Load ggplot2 package for visualization 
library(ggplot2)
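If the package may not be installed yet, the loading step can be guarded. The following is a minimal sketch, assuming a CRAN mirror is already configured for `install.packages()`; the variable name `has_recent_ggplot2` is only illustrative. Version 3.4.0 is checked because that is the release in which `linewidth` replaced `size` for lines and the `..density..` notation was deprecated.

```r
# Install ggplot2 only if it is missing, then load it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

# TRUE when the installed release is 3.4.0 or newer
has_recent_ggplot2 <- packageVersion("ggplot2") >= "3.4.0"
print(has_recent_ggplot2)
```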

Step 2: Explore the Dataset

str(iris) → Shows the structure (data types, variables)
head() → Displays the first rows
tail() → Displays the last rows

These help identify:
Petal.Length → Continuous variable
Species → Grouping variable (categorical)

# Use the built-in 'iris' dataset
# 'Petal.Length' is a continuous variable
# 'Species' is a categorical grouping variable
str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
head(iris,n=2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
tail(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
tail(iris,n=2)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
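Since the plots that follow are grouped by Species, a short per-group summary can confirm that the groups are balanced and that Petal.Length differs between them. This is a minimal sketch using base R only; the names `group_counts` and `group_summary` are illustrative and not part of the original program.

```r
# Number of observations in each species group
group_counts <- table(iris$Species)
print(group_counts)

# Mean and standard deviation of Petal.Length within each species
group_summary <- aggregate(
  Petal.Length ~ Species,
  data = iris,
  FUN = function(v) c(mean = mean(v), sd = sd(v))
)
print(group_summary)
```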

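Creates the base plot object p. Printing p at this stage shows only an empty panel with axes, because no geom layer has been added yet.
x = Petal.Length → X-axis shows the continuous values
fill = Species → Different fill color for each species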

# Start ggplot with iris dataset
# Map Petal.Length to x-axis and fill by Species

p <- ggplot(data = iris, aes(x = Petal.Length, fill = Species))
p

geom_histogram() → Creates the histogram
y = after_stat(density) → Converts counts to density so the bars share a scale with the density curves (replaces the deprecated ..density.. notation)
alpha = 0.4 → Makes the bars semi-transparent
position = "identity" → Overlaps the histograms instead of stacking them
bins = 30 → Number of intervals

# Add histogram with density scaling

p <- p + geom_histogram(aes(y = after_stat(density)),
                        alpha = 0.4,
                        position = "identity",
                        bins = 30)
p
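To make the idea of density scaling concrete, the sketch below reproduces it by hand with base R's `hist()`: each bar height is the count divided by (number of observations × bin width), so the bar areas sum to 1. It assumes setosa as the example group and `breaks = 10`, both chosen only for illustration; `hist()` picks its own break points, so the bins will not match those ggplot2 draws.

```r
# Petal.Length values for one species (setosa chosen only for illustration)
x <- iris$Petal.Length[iris$Species == "setosa"]

# Bin the values without drawing anything
h <- hist(x, breaks = 10, plot = FALSE)

# Density = count / (n * bin width)
bin_width <- diff(h$breaks)
manual_density <- h$counts / (length(x) * bin_width)

# The manual calculation agrees with the density hist() reports
print(all.equal(manual_density, h$density))

# Bar areas (height * width) add up to 1
total_area <- sum(h$density * bin_width)
print(total_area)
```

Because each group's histogram is scaled this way separately, groups of different sizes can be compared on the same y-axis.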

Step 3: Overlay Density Curves

geom_density() → Adds smooth density curves
color = Species → Different line color per group
linewidth = 1.2 → Thickness of the lines (replaces the deprecated size argument for lines)

# Overlay density curves for each group

p <- p +
  geom_density(aes(color = Species),
               linewidth = 1.2)
p
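The curves drawn by geom_density() are kernel density estimates computed separately for each group. As a rough sketch of what happens underneath, base R's `density()` can be applied per species with its defaults (Gaussian kernel, `bw = "nrd0"`), which, to my understanding, are also the defaults geom_density() uses. The names `by_species`, `kde_list`, and `kde_area` are illustrative.

```r
# Split Petal.Length into one numeric vector per species
by_species <- split(iris$Petal.Length, iris$Species)

# Kernel density estimate for each group using density() defaults
kde_list <- lapply(by_species, density)

# Approximate the area under each curve with a simple Riemann sum;
# each value should be close to 1
kde_area <- sapply(kde_list, function(d) sum(d$y) * mean(diff(d$x)))
print(round(kde_area, 3))

# Bandwidth selected for each species
print(sapply(kde_list, function(d) d$bw))
```

Each curve integrates to roughly 1, which is why the histogram had to be put on the density scale before the curves were overlaid.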

Step 4: Add Labels and Theme

labs() → Adds the title and axis labels
theme_minimal() → Clean and simple design

# Add title and axis labels, and apply clean theme

p <- p +
  labs(
    title = "Distribution of Petal Length with Group-wise Density Curves",
    x = "Petal Length",
    y = "Density"
  ) +
  theme_minimal()
p

theme(legend.position = "top") → Moves the legend to the top, which improves the readability of the graph

# Move the legend to the top of the plot
# (the labels and theme_minimal() were already added to p in the previous chunk)

p <- p + theme(legend.position = "top")
p
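For reference, the whole plot can also be built in a single chain and written to a file. This is a sketch under stated assumptions, not the program's required form: `final_plot` and `out_file` are illustrative names, the PNG is saved to a temporary directory, and `alpha = 0.2` on geom_density() is an addition made here because `fill = Species` is inherited from ggplot(), so the area under each curve is filled as well and could otherwise cover the bars.

```r
library(ggplot2)

final_plot <- ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  # Histogram on the density scale, one overlapping layer per species
  geom_histogram(aes(y = after_stat(density)),
                 alpha = 0.4, position = "identity", bins = 30) +
  # alpha = 0.2 is an assumption added in this sketch so the filled
  # density areas stay see-through
  geom_density(aes(color = Species), linewidth = 1.2, alpha = 0.2) +
  labs(
    title = "Distribution of Petal Length with Group-wise Density Curves",
    x = "Petal Length",
    y = "Density"
  ) +
  theme_minimal() +
  theme(legend.position = "top")

# Save the figure; the path and dimensions are illustrative
out_file <- file.path(tempdir(), "petal_length_density.png")
ggsave(out_file, plot = final_plot, width = 8, height = 5, dpi = 150)
```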

Conclusion:

Used the iris dataset to analyze continuous data
Created a histogram of Petal Length
Overlaid density curves for each species
Improved the visualization using themes and labels