Program 1
Develop an R program to quickly explore a given data set, including categorical analysis using the group_by command, and visualize the findings using ggplot2 features.
Step 1: Load the necessary libraries
library(tidyverse)
library(dplyr)
library(ggplot2)
Step 2: Load the dataset
# Load dataset
data <- mtcars

# Convert 'cyl' to a factor for categorical analysis
data$cyl <- as.factor(data$cyl)
Step 3: Group by categorical variables
# Summarize average mpg by cylinder category
summary_data <- data %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop")

# Display summary
print(summary_data)
Step 4: Visualize the findings
# Create a bar plot using ggplot2
ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_bar(stat = "identity") +
  labs(title = "Average MPG by Cylinder Count",
       x = "Number of Cylinders",
       y = "Average MPG") +
  theme_minimal()
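As a possible extension of the grouped summary in Step 3, the sketch below computes several statistics per cylinder group in a single summarise() call. The column names n_cars, median_mpg, and sd_mpg are illustrative choices and not part of the original program.

```r
library(dplyr)

# Assumption: mtcars is used as in Step 2, with 'cyl' converted to a factor
cyl_summary <- mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  group_by(cyl) %>%
  summarise(
    n_cars     = n(),          # number of cars in each group
    avg_mpg    = mean(mpg),    # same statistic as in Step 3
    median_mpg = median(mpg),  # less sensitive to extreme values than the mean
    sd_mpg     = sd(mpg),      # spread of mpg within the group
    .groups    = "drop"
  )

print(cyl_summary)
```

Reporting the group size next to the mean helps judge how much weight each average deserves.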
Program 2
Write an R script to create a scatter plot, incorporating categorical analysis through color-coded data points representing different groups, using ggplot2.
# Load ggplot2 and use the built-in iris dataset
library(ggplot2)
data <- iris

# Create a scatter plot using ggplot2
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +   # Increase point size and transparency
  labs(title = "Scatter Plot of Sepal Dimensions",
       x = "Sepal Length",
       y = "Sepal Width",
       color = "Species") +             # Legend title
  theme_minimal() +                     # Clean layout
  theme(legend.position = "top")        # Move legend to the top
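Because color = Species also defines the groups, any additional layer is computed per species. The sketch below, which is an optional extension and not part of the original program, adds one least-squares trend line per group with geom_smooth(). The object name p_trend is hypothetical.

```r
library(ggplot2)

# Assumption: the built-in iris dataset, as in the scatter plot above
p_trend <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  # One straight-line fit per Species, because color defines the groups
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Sepal Dimensions with Per-Species Trend Lines",
       x = "Sepal Length",
       y = "Sepal Width",
       color = "Species") +
  theme_minimal() +
  theme(legend.position = "top")

p_trend
```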
Program 3
Implement an R function to generate a line graph depicting the trend of a time-series dataset, with separate lines for each group, utilizing ggplot2’s group aesthetic.
Introduction
This document demonstrates how to create a time-series line graph using the built-in AirPassengers dataset in R.
The dataset contains monthly airline passenger counts from 1949 to 1960. We will use ggplot2 to visualize trends, with separate lines for each year.
Step 1: Load the necessary libraries
library(ggplot2)
library(dplyr)
library(tidyr)
Step 2: Load the built-in AirPassengers Dataset
The AirPassengers dataset is a time series (ts) object in R.
We first convert it into a data frame so that it can be used with ggplot2. The data frame has three columns:
Date: The first day of each month, from January 1949 to December 1960.
Passengers: Monthly airline passenger counts.
Year: The year extracted from the Date column, used to group the data.
# Convert time-series data to a data frame
dates <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))
data <- data.frame(
  Date = dates,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(dates, "%Y"))
)

# Display the first few rows
head(data, n = 20)
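Before plotting, it can be worth confirming that the converted data frame lines up with the original ts object. The following is a minimal sanity check under the same assumptions as the conversion above. The name check_df is hypothetical and used only to avoid overwriting data.

```r
# Properties of the built-in ts object
start(AirPassengers)      # first observation (year, period)
end(AirPassengers)        # last observation (year, period)
frequency(AirPassengers)  # observations per year

# Rebuild the data frame the same way as in the conversion step
dates <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))
check_df <- data.frame(
  Date = dates,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(dates, "%Y"))
)

# Every year should contribute exactly 12 monthly rows
rows_per_year <- table(check_df$Year)
print(rows_per_year)
stopifnot(all(rows_per_year == 12))
```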
Step 3: Define a Function for Time-Series Line Graph
We define a function to create a time-series line graph where:
The x-axis represents time (Date).
The y-axis represents the number of passengers (Passengers).
Each year has a separate line to compare trends.
Function Inputs
data – The dataset containing time-series data.
x_col – The column representing time (Date).
y_col – The column representing values (Passengers).
group_col – The categorical variable for grouping (Year).
title – Custom plot title.
Features of the Line Graph
Group-based visualization:
Each year has a distinct line color.
The group aesthetic ensures that a separate line is drawn for each year.
geom_line() with a line width of 1.2:
Connects the observations within each group to show the trend.
geom_point(size = 2):
Highlights individual data points.
theme_minimal() and theme(legend.position = "top"):
Give the plot a clean layout and move the legend to the top.
# Function to plot a time-series trend
# Column names are passed as strings and looked up with the .data pronoun
plot_time_series <- function(data, x_col, y_col, group_col, title = "Air Passenger Trends") {
  ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]],
                   color = .data[[group_col]], group = .data[[group_col]])) +
    geom_line(linewidth = 1.2) +   # Line graph
    geom_point(size = 2) +         # Add points for clarity
    labs(title = title,
         x = "Year",
         y = "Number of Passengers",
         color = "Year") +         # Legend title
    theme_minimal() +
    theme(legend.position = "top")
}

# Call the function
plot_time_series(data, "Date", "Passengers", "Year", "Trend of Airline Passengers Over Time")
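With Date on the x-axis, the yearly lines follow one another along the axis. To compare seasonal patterns across years, one option is to put the month on the x-axis so that the yearly lines overlap. The sketch below is an illustrative variant, not part of the original program. The names monthly and p_seasonal are hypothetical, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

dates <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))

monthly <- data.frame(
  # month.abb is a base R constant; indexing it avoids locale-dependent month names
  Month = factor(month.abb[as.integer(format(dates, "%m"))], levels = month.abb),
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(dates, "%Y"))
)

# group = Year draws one line per year across the 12 months
p_seasonal <- ggplot(monthly, aes(x = Month, y = Passengers, color = Year, group = Year)) +
  geom_line(linewidth = 1) +
  labs(title = "Monthly Airline Passengers by Year",
       x = "Month",
       y = "Number of Passengers",
       color = "Year") +
  theme_minimal() +
  theme(legend.position = "top")

p_seasonal
```

Here the group aesthetic is essential: Month is a factor, so without group = Year ggplot2 would not know which points belong on the same line.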
Program 4
Develop a script in R to produce a bar graph displaying the frequency distribution of categorical data in a given dataset, grouped by a specific variable, using ggplot2.
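# Load ggplot2 and prepare the built-in mtcars dataset
library(ggplot2)
data <- mtcars
data$cyl <- as.factor(data$cyl)     # Cylinders as a categorical variable
data$gear <- as.factor(data$gear)   # Gears as a categorical variable

# Create a bar graph
ggplot(data, aes(x = cyl, fill = gear)) +
  geom_bar(position = "dodge") +    # Grouped bar chart
  labs(title = "Frequency of Cylinders Grouped by Gear Type",
       x = "Number of Cylinders",
       y = "Count",
       fill = "Gears") +            # Legend title
  theme_minimal()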
Explanation of the Plot
X-Axis (cyl): Displays the cylinder categories (4, 6, and 8 cylinders).
Y-Axis (Frequency Count): Represents the number of cars in each category.
Color Fill (gear): Differentiates cars by number of gears (3, 4, or 5 gears).
Grouped Bars (position = "dodge"): Places bars side by side instead of stacking them.
Minimal Theme (theme_minimal()): Provides a clean and readable layout.
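The counts behind the bars can be checked with a cross-tabulation. The sketch below also shows one optional refinement: mtcars has no 8-cylinder car with 4 gears, so that cylinder group has fewer bars, and position_dodge(preserve = "single") keeps all bars the same width in that case. The names car_data, counts, and p_bars are hypothetical.

```r
library(ggplot2)

# Assumption: mtcars with 'cyl' and 'gear' treated as categorical variables
car_data <- mtcars
car_data$cyl  <- as.factor(car_data$cyl)
car_data$gear <- as.factor(car_data$gear)

# Counts behind the bars: rows are cylinders, columns are gears
counts <- table(cyl = car_data$cyl, gear = car_data$gear)
print(counts)

# preserve = "single" keeps each bar the width of a single dodged bar,
# even when a cylinder group is missing one of the gear categories
p_bars <- ggplot(car_data, aes(x = cyl, fill = gear)) +
  geom_bar(position = position_dodge(preserve = "single")) +
  labs(title = "Frequency of Cylinders Grouped by Gear Type",
       x = "Number of Cylinders",
       y = "Count",
       fill = "Gears") +
  theme_minimal()

p_bars
```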
Program 5
Implement an R program to create a histogram illustrating the distribution of a continuous variable, with overlays of density curves for each group, using ggplot2.
Step 1: Load Required Library
# Load ggplot2 package for visualization
library(ggplot2)
Step 2: Explore the Inbuilt Dataset
# Use the built-in 'iris' dataset
# 'Petal.Length' is a continuous variable
# 'Species' is a categorical grouping variable
str(iris)  # Shows the structure of the dataset
Step 3: Create Histogram with Group-wise Density Curves
Step 3.1: Initialize the ggplot with aesthetic mappings
# Start ggplot with the iris dataset
# Map Petal.Length to the x-axis and fill by Species (grouping variable)
p <- ggplot(data = iris, aes(x = Petal.Length, fill = Species))
p
Explanation:
This initializes the plot and tells ggplot to map:
Petal.Length (continuous variable) to the x-axis
Species (categorical) to the fill aesthetic, to distinguish groups
Step 3.2: Add Histogram Layer
# Add histogram with density scaling
p <- p +
  geom_histogram(aes(y = after_stat(density)),
                 alpha = 0.4,            # Set transparency
                 position = "identity",  # Overlap histograms
                 bins = 30)              # Number of bins
p
Explanation:
aes(y = after_stat(density)) scales each group's histogram to a density, so its bars have a total area of 1
alpha = 0.4 makes bars semi-transparent so overlaps are visible
position = "identity" draws the group histograms over one another instead of stacking them
bins = 30 controls histogram resolution
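To see what density scaling does, the computed bar heights can be inspected with layer_data(). The sketch below is an illustrative check, assuming ggplot2 3.3.0 or later for after_stat(); the names p_hist, bars, and area_by_group are hypothetical. It sums height times width for each group, which should come out as 1.

```r
library(ggplot2)

p_hist <- ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)),
                 alpha = 0.4,
                 position = "identity",
                 bins = 30)

# layer_data() returns the values ggplot2 computed for the first layer
bars <- layer_data(p_hist, 1)

# Total bar area per group: sum of height (density) times bin width
area_by_group <- tapply(bars$density * (bars$xmax - bars$xmin), bars$group, sum)
print(area_by_group)
```

Because each species is normalized separately, the three histograms are comparable in shape even if the groups had different sizes.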
Step 3.3: Add Density Curve Layer
# Overlay density curves for each group
p <- p +
  geom_density(aes(color = Species),  # Line color by group
               linewidth = 1.2)       # Line thickness
p
Explanation: This overlays smooth density curves for each species using color. The aes(color = Species) ensures each curve is colored by group.
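Two optional arguments of geom_density() are worth knowing here. Since fill = Species is inherited from ggplot(), the area under each curve is filled as well, and alpha controls how strongly that fill covers the histogram. The adjust argument multiplies the default bandwidth. The sketch below is a variant with arbitrary illustrative values (0.2 and 1.5), not the original program's settings, and p_density is a hypothetical name.

```r
library(ggplot2)

p_density <- ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)),
                 alpha = 0.4,
                 position = "identity",
                 bins = 30) +
  geom_density(aes(color = Species),
               alpha = 0.2,    # transparency of the filled area under each curve
               adjust = 1.5)   # bandwidth multiplier: > 1 smoother, < 1 more detailed

p_density
```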
Step 3.4: Add Labels and Theme
# Add title and axis labels, and apply a clean theme
p <- p +
  labs(title = "Distribution of Petal Length with Group-wise Density Curves",
       x = "Petal Length",
       y = "Density") +
  theme_minimal()
Explanation:
labs() adds a title and axis labels
theme_minimal() applies a clean, modern plot style
Step 3.5: Display the Plot
# Finally, render the plot
p
Program 6
Write an R script to construct a box plot showcasing the distribution of a continuous variable, grouped by a categorical variable, using ggplot2’s fill aesthetic.
Step 1: Load the required library
# Load ggplot2 package for visualization
library(ggplot2)
Step 2: Explore the inbuilt dataset
# Use the built-in 'iris' dataset
# 'Petal.Width' is a continuous variable
# 'Species' is a categorical grouping variable
str(iris)  # View structure of the dataset
Step 3.1: Initialize the ggplot with aesthetic mappings
# Initialize ggplot with data and aesthetic mappings
p <- ggplot(data = iris, aes(x = Species, y = Petal.Width, fill = Species))
Explanation:
x = Species: Grouping variable (categorical)
y = Petal.Width: Continuous variable to show distribution
fill = Species: Fill box colors by species
Step 3.2: Add box plot layer
# Add the box plot layer
p <- p +
  geom_boxplot()
Explanation:
geom_boxplot() creates box plots for each group
Automatically shows median, quartiles, and outliers
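The statistics a box plot displays can also be computed directly. The sketch below is a minimal illustration using dplyr, which is an added dependency that this program does not otherwise load. It assumes the default geom_boxplot() rule that points farther than 1.5 times the interquartile range from the box are drawn as outliers. The name box_summary and its column names are hypothetical.

```r
library(dplyr)

box_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    q1     = quantile(Petal.Width, 0.25, names = FALSE),  # lower edge of the box
    median = median(Petal.Width),                         # line inside the box
    q3     = quantile(Petal.Width, 0.75, names = FALSE),  # upper edge of the box
    iqr    = IQR(Petal.Width),                            # height of the box
    # Count values beyond 1.5 * IQR from the box edges
    n_outliers = sum(Petal.Width < q1 - 1.5 * iqr | Petal.Width > q3 + 1.5 * iqr),
    .groups = "drop"
  )

print(box_summary)
```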
Step 3.3: Add labels and theme
# Add title and labels and use a minimal theme
p <- p +
  labs(title = "Box Plot of Petal Width by Species",
       x = "Species",
       y = "Petal Width") +
  theme_minimal()
Explanation:
labs() adds a descriptive title and axis labels
theme_minimal() gives a clean, modern look
Step 3.4: Display the plot
# Render the final plot
p
Summary
Used the iris dataset
Visualized Petal.Width as a box plot
Grouped by Species
Used fill = Species for colorful grouping
Each box represents the distribution of values for one species
Program 7
Develop a function in R to plot a function curve based on a mathematical equation provided as input, with different curve styles for each group, using ggplot2.
Step 1: Load the required library
library(ggplot2)
Step 2: Create data for the functions
# Create a sequence of x values ranging from -2*pi to 2*pi
x <- seq(-2 * pi, 2 * pi, length.out = 500)

# Evaluate sin(x) and cos(x) over the x range
y1 <- sin(x)
y2 <- cos(x)

# Combine data into one data frame
df <- data.frame(
  x = rep(x, 2),                                        # Repeat x values for both functions
  y = c(y1, y2),                                        # Combine y values: first sin(x), then cos(x)
  group = rep(c("sin(x)", "cos(x)"), each = length(x))  # Label each row by function
)
Step 3: Plot the Function Curves
Step 3.1: Initialize the ggplot Object
# Start building the ggplot using the data frame and aesthetics
p <- ggplot(df, aes(x = x, y = y, color = group, linetype = group))
Step 3.2: Add the Line Geometry
# Add a line for each function curve
p <- p +
  geom_line(linewidth = 1.2)
Step 3.3: Add Plot Labels
# Add title, axis labels, and legend titles
# Using the same title for color and linetype keeps them in a single legend
p <- p +
  labs(title = "Function Curves: sin(x) and cos(x)",
       x = "x",
       y = "y = f(x)",
       color = "Function",
       linetype = "Function")
p
Step 3.4: Apply a Clean Theme
# Use a clean and simple background theme
p <- p +
  theme_minimal()
p
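The steps above can be wrapped into a reusable function that takes the equations as input. The following is a minimal sketch under stated assumptions: plot_function_curves() is a hypothetical helper, funs is assumed to be a named list of one-argument R functions, and the third curve (x^2 / 10) is added only to show that arbitrary equations can be supplied.

```r
library(ggplot2)

# Hypothetical helper: 'funs' is a named list of one-argument functions
plot_function_curves <- function(funs, from = -2 * pi, to = 2 * pi, n = 500,
                                 title = "Function Curves") {
  x <- seq(from, to, length.out = n)

  # Evaluate every function on the same grid and stack the results
  df <- do.call(rbind, lapply(names(funs), function(nm) {
    data.frame(x = x, y = funs[[nm]](x), group = nm)
  }))

  # color and linetype both map to 'group', giving each curve its own style
  ggplot(df, aes(x = x, y = y, color = group, linetype = group)) +
    geom_line() +
    labs(title = title,
         x = "x",
         y = "y = f(x)",
         color = "Function",
         linetype = "Function") +
    theme_minimal()
}

# Usage: the list names become the legend labels
p_curves <- plot_function_curves(
  list("sin(x)" = sin,
       "cos(x)" = cos,
       "x^2 / 10" = function(x) x^2 / 10)
)
p_curves
```

Passing functions rather than precomputed vectors keeps the x grid in one place, so changing the range or resolution requires editing only the from, to, or n arguments.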