Program 10

Author

Manoj

Program

  1. Develop an R function to draw a density curve representing the probability density function of a continuous variable, with separate curves for each group, using ggplot2.

Step 1: Load Required Library

We need the ggplot2 package to create density plots.

# Load ggplot2 for plotting
library(ggplot2)

Step 2: Define the Function

We will create a function called plot_density_by_group() that:

  • Accepts a data frame, the name of a continuous variable, and the name of a grouping variable
  • Draws one density curve per group
  • Optionally applies a custom color scheme

plot_density_by_group <- function(data, continuous_var, group_var, fill_colors = NULL) {
  # Check if the specified columns exist
  if (!(continuous_var %in% names(data)) || !(group_var %in% names(data))) {
    stop("Invalid column names. Make sure both variables exist in the dataset.")
  }

  # Create the ggplot object
  p <- ggplot(data, aes_string(x = continuous_var, color = group_var, fill = group_var)) +
    geom_density(alpha = 0.4) +
    labs(title = paste("Density Plot of", continuous_var, "by", group_var),
         x = continuous_var,
         y = "Density") +
    theme_minimal()

  # Apply custom fill colors if provided
  if (!is.null(fill_colors)) {
    p <- p + scale_fill_manual(values = fill_colors) +
             scale_color_manual(values = fill_colors)
  }

  # Return the plot
  return(p)
}

Step 3: Explanation of Function Components

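Component                   Description
data                        The dataset (e.g., iris)
continuous_var              Name of the continuous variable, as a string (e.g., "Sepal.Length")
group_var                   Name of the grouping variable, as a string (e.g., "Species")
fill_colors                 Optional vector of colors, ideally named by group level; NULL keeps the ggplot2 defaults
aes_string()                Maps variables given as string names (deprecated in ggplot2 3.0.0 in favor of aes())
geom_density(alpha = 0.4)   Draws a smoothed (kernel) density curve per group with a semi-transparent fill
facet_wrap(~ group_var)     Not used here; the curves are overlaid in a single plot instead
theme_minimal()             Minimal theme with no background panel or border
scale_fill_manual()         Applies the custom fill colors if provided
scale_color_manual()        Applies the same colors to the curve outlines if provided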
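
To make the geom_density() entry above more concrete, here is a minimal sketch of roughly what that layer computes: one kernel density estimate per group. It uses base R's density() with its default settings, which is assumed to be close enough to the ggplot2 behavior for illustration (the plotted curves may cover a slightly different x range). The names values_by_group, group_densities, and approx_area are hypothetical.

```r
# Split the continuous variable by group (iris ships with base R)
values_by_group <- split(iris$Sepal.Length, iris$Species)

# One kernel density estimate per group, using density() defaults
group_densities <- lapply(values_by_group, density)

# Approximate the area under each curve with a Riemann sum on the
# evenly spaced grid that density() returns; each should be close to 1
approx_area <- sapply(group_densities, function(d) sum(d$y) * diff(d$x[1:2]))
print(round(approx_area, 2))
```

Each area being close to 1 is what makes the Y-axis a density rather than a count, so groups of different sizes can be compared on the same scale.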

Step 4: Example with Built-in iris Dataset

Let’s draw density plots for Sepal.Length across different Species in the iris dataset.

# Basic usage
plot_density_by_group(iris, "Sepal.Length", "Species")
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
i Please use tidy evaluation ideoms with `aes()`
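
The deprecation warning above is triggered by aes_string(). One way to avoid it, sketched here under the assumption that ggplot2 3.0.0 or later is installed, is to index the .data pronoun inside aes(). The function name plot_density_by_group_tidy is hypothetical and is not part of the original program. The legend titles are set explicitly because the automatically derived label may not be the column name.

```r
library(ggplot2)

# Hypothetical variant of plot_density_by_group() without aes_string()
plot_density_by_group_tidy <- function(data, continuous_var, group_var, fill_colors = NULL) {
  # Check that both columns exist
  if (!all(c(continuous_var, group_var) %in% names(data))) {
    stop("Invalid column names. Make sure both variables exist in the dataset.")
  }

  # .data[[name]] looks a column up by its string name inside aes()
  p <- ggplot(data, aes(x = .data[[continuous_var]],
                        color = .data[[group_var]],
                        fill = .data[[group_var]])) +
    geom_density(alpha = 0.4) +
    labs(title = paste("Density Plot of", continuous_var, "by", group_var),
         x = continuous_var, y = "Density",
         color = group_var, fill = group_var) +
    theme_minimal()

  # Apply custom colors to both fills and outlines if provided
  if (!is.null(fill_colors)) {
    p <- p + scale_fill_manual(values = fill_colors) +
      scale_color_manual(values = fill_colors)
  }
  p
}

# Same call pattern as the original function
p_tidy <- plot_density_by_group_tidy(iris, "Sepal.Length", "Species")
```

Printing p_tidy should draw the same kind of overlaid density plot without the aes_string() warning.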


Step 5: Example with Custom Colors

You can pass a vector of colors, named by group level, to improve visual appeal or match your theme.

# Define custom colors
custom_colors <- c("setosa" = "steelblue",
                   "versicolor" = "forestgreen",
                   "virginica" = "darkorange")

# Plot with custom colors
plot_density_by_group(iris, "Petal.Length", "Species", fill_colors = custom_colors)
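
Because scale_fill_manual() and scale_color_manual() match a named vector to the group levels by name, a misspelled name leaves that group without a custom color. The following sketch, which uses only base R, checks the names before plotting. The variable name missing_levels is hypothetical.

```r
custom_colors <- c("setosa" = "steelblue",
                   "versicolor" = "forestgreen",
                   "virginica" = "darkorange")

# Group levels that have no entry in the color vector
missing_levels <- setdiff(levels(iris$Species), names(custom_colors))

# Warn early instead of discovering the gap in the finished plot
if (length(missing_levels) > 0) {
  warning("No color supplied for: ", paste(missing_levels, collapse = ", "))
}
```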


Step 6: Output Description

  • The X-axis shows the continuous variable (e.g., Sepal.Length)
  • The Y-axis shows the estimated probability density, so the area under each curve is 1
  • Each group (e.g., Species) is represented by a separate curve
  • The alpha = 0.4 setting makes the fills semi-transparent, so overlapping curves remain visible

Summary

This function is:

  • Reusable: Works for any dataset with a numeric and a categorical variable
  • Customizable: Supports custom color schemes
  • Effective: Helps visualize distribution patterns across groups

Use it in exploratory data analysis to compare how a continuous measurement is distributed across categories.
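
As an illustration of reuse on another built-in dataset, the sketch below prepares mtcars, where the grouping column cyl is stored as a number. Converting it to a factor first matters because ggplot2 treats a numeric color or fill mapping as continuous and only splits curves on discrete variables. The name mtcars_grouped is hypothetical, and the final call only runs if plot_density_by_group() from Step 2 has already been defined in the session.

```r
# cyl is numeric in mtcars; convert it so ggplot2 treats it as categorical
mtcars_grouped <- transform(mtcars, cyl = factor(cyl))

# Quick check of the group sizes before plotting
print(table(mtcars_grouped$cyl))

# Draw the plot only when the Step 2 function is available in this session
if (exists("plot_density_by_group")) {
  print(plot_density_by_group(mtcars_grouped, "mpg", "cyl"))
}
```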