Program 1:
Develop an R program to quickly explore a given dataset, including categorical analysis using group_by(), and visualize the findings using ggplot2.
Step 1: Load the necessary libraries
library(ggplot2)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ lubridate 1.9.4 ✔ tibble 3.2.1
✔ purrr 1.0.4 ✔ tidyr 1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Step 2: Load the dataset
# Load dataset
data <- mtcars
# Convert 'cyl' to a factor for categorical analysis
data$cyl <- as.factor(data$cyl)
Step 3: Group by categorical variable
# Summarize average mpg by cylinder category
summary_data <- data %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop")
# Display summary
print(summary_data)
# Create a bar plot using ggplot2
ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_bar(stat = "identity") +
  labs(title = "Average MPG by Cylinder Count",
       x = "Number of Cylinders",
       y = "Average MPG") +
  theme_minimal()
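Before grouping, a quick structural overview of the data is often useful. The following is a minimal sketch of that exploration step, assuming the same built-in mtcars data; the names explore_data and group_stats are illustrative and do not appear in the original program.

```r
library(dplyr)

# Illustrative copy of mtcars with 'cyl' as a factor (same setup as above)
explore_data <- mtcars
explore_data$cyl <- as.factor(explore_data$cyl)

# Column types and summary statistics for every column
str(explore_data)
summary(explore_data)

# Group-wise count, mean, and standard deviation of mpg
group_stats <- explore_data %>%
  group_by(cyl) %>%
  summarise(
    n = n(),
    avg_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    .groups = "drop"
  )
print(group_stats)
```

Reporting the group size next to the mean makes it easier to judge how much weight each bar in the plot deserves.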
Program 2:
Write an R script to create a scatter plot, incorporating categorical analysis through color-coded data points representing different groups, using ggplot2.
# Sepal.Length, Sepal.Width, and Species are columns of the built-in 'iris' dataset
data <- iris
# Create a scatter plot using ggplot2
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +  # Increase point size and transparency
  labs(title = "Scatter Plot of Sepal Dimensions",
       x = "Sepal Length",
       y = "Sepal Width",
       color = "Species") +            # Legend title
  theme_minimal() +                    # Clean layout
  theme(legend.position = "top")       # Move legend to the top
Program 3:
Implement an R function to generate a line graph depicting the trend of a time-series dataset, with separate lines for each group, utilizing ggplot2’s group aesthetic.
Step 1: Load necessary libraries
library(ggplot2)
library(dplyr)
library(tidyr)
Step 2: Load the built-in AirPassengers dataset
# Convert time-series data to a data frame
data <- data.frame(
  Date = seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers)),
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers)), "%Y"))
)
# Display first few rows
head(data, n = 20)
Step 3: Define a function for the time-series line graph
# Function to plot time-series trend
plot_time_series <- function(data, x_col, y_col, group_col, title = "Air Passenger Trends") {
  ggplot(data, aes_string(x = x_col, y = y_col, color = group_col, group = group_col)) +
    geom_line(size = 1.2) +      # Line graph
    geom_point(size = 2) +       # Add points for clarity
    labs(title = title,
         x = "Year",
         y = "Number of Passengers",
         color = "Year") +       # Legend title
    theme_minimal() +
    theme(legend.position = "top")
}
# Call the function
plot_time_series(data, "Date", "Passengers", "Year", "Trend of Airline Passengers Over Time")
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
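Both warnings can be avoided by following their suggestions. The sketch below is one possible variant, not the original function: it swaps aes_string() for aes() with the .data pronoun and replaces size with linewidth for the lines. The names plot_time_series_v2, ts_data, and p_ts are hypothetical, and the code assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Same data frame as in Step 2, rebuilt here so the sketch is self-contained
dates <- seq(as.Date("1949-01-01"), by = "month",
             length.out = length(AirPassengers))
ts_data <- data.frame(
  Date = dates,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(dates, "%Y"))
)

# Hypothetical variant of plot_time_series() without deprecated calls
plot_time_series_v2 <- function(data, x_col, y_col, group_col,
                                title = "Air Passenger Trends") {
  # .data[[name]] looks up a column by the string stored in 'name'
  ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]],
                   color = .data[[group_col]],
                   group = .data[[group_col]])) +
    geom_line(linewidth = 1.2) +   # linewidth replaces size for lines
    geom_point(size = 2) +         # size is still correct for points
    labs(title = title, x = "Year", y = "Number of Passengers",
         color = "Year") +
    theme_minimal() +
    theme(legend.position = "top")
}

p_ts <- plot_time_series_v2(ts_data, "Date", "Passengers", "Year",
                            "Trend of Airline Passengers Over Time")
p_ts
```

The column names are still passed as strings, so existing calls only need the function name changed.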
Program 4:
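Develop a script in R to produce a bar graph displaying the frequency distribution of categorical data in a given dataset, grouped by a specific variable, using ggplot2.
# 'cyl' and 'gear' are columns of the built-in 'mtcars' dataset
data <- mtcars
# Convert both variables to factors so they are treated as categories
data$cyl <- as.factor(data$cyl)
data$gear <- as.factor(data$gear)
# Create a bar graph
ggplot(data, aes(x = cyl, fill = gear)) +
  geom_bar(position = "dodge") +   # Grouped bar chart
  labs(title = "Frequency of Cylinders Grouped by Gear Type",
       x = "Number of Cylinders",
       y = "Count",
       fill = "Gears") +           # Legend title
  theme_minimal()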
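The bar heights in a frequency chart like this are simple counts of each cylinder and gear combination. As a rough cross-check, the counts can be tabulated directly with dplyr; bar_data and freq_table are illustrative names, and the sketch assumes the mtcars data with cyl and gear treated as factors.

```r
library(dplyr)

# Illustrative copy of mtcars with both grouping variables as factors
bar_data <- mtcars %>%
  mutate(cyl = as.factor(cyl), gear = as.factor(gear))

# Frequency of each cylinder/gear combination (the heights of the bars)
freq_table <- bar_data %>%
  count(cyl, gear, name = "count")
print(freq_table)
```

Combinations that never occur in the data are absent from the table, which is why some cylinder groups show fewer bars than others.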
Program 5:
Implement an R program to create a histogram illustrating the distribution of a continuous variable, with overlays of density curves for each group, using ggplot2.
Step 1: Load necessary libraries
# Load ggplot2 package for visualization
library(ggplot2)
Step 2: Explore the Inbuilt Dataset
# Use the built-in 'iris' dataset
# 'Petal.Length' is a continuous variable
# 'Species' is a categorical grouping variable
str(iris)  # Shows the structure of the dataset
Step 3: Creating a Histogram with Group-wise Density Curves
Step 3.1: Initialize the ggplot with aesthetic mappings
# Start ggplot with iris dataset
# Map Petal.Length to x-axis and fill by Species (grouping variable)
p <- ggplot(data = iris, aes(x = Petal.Length, fill = Species))
p
Step 3.2: Add Histogram Layer
# Add histogram with density scaling
p <- p + geom_histogram(
  aes(y = ..density..),
  alpha = 0.4,            # Transparency
  position = "identity",  # Overlap histograms
  bins = 30               # Number of bins
)
p
Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.
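Following the warning's suggestion, the histogram layer can be written with after_stat(density) instead of the dot-dot notation. This is a small standalone sketch, assuming ggplot2 3.3.0 or later (where after_stat() is available); p_hist is an illustrative name chosen so the sketch does not overwrite p.

```r
library(ggplot2)

# Histogram on the density scale using after_stat()
p_hist <- ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_histogram(
    aes(y = after_stat(density)),  # computed variable from the binning stat
    alpha = 0.4,
    position = "identity",
    bins = 30
  )
p_hist
```

The resulting plot should match the one above; only the way the computed density variable is referenced changes.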
Step 3.3: Add Density Curve Layer
# Overlay density curves for each group
p <- p + geom_density(
  aes(color = Species),  # Line color by group
  linewidth = 1.2        # Line thickness (linewidth replaces size in ggplot2 3.4.0+)
)
p
Step 3.4: Add Labels and Theme
# Add title and axis labels, and apply a clean theme
p <- p +
  labs(title = "Distribution of Petal Length with Group-wise Density Curves",
       x = "Petal Length",
       y = "Density") +
  theme_minimal()
Step 3.5: Display the plot
p
Program 6:
Write an R script to construct a box plot showcasing the distribution of a continuous variable, grouped by a categorical variable, using ggplot2’s fill aesthetic.
Step 1: Load Required Library
# Load ggplot2 package for visualization
library(ggplot2)
Step 2: Explore the Inbuilt Dataset
# Use the built-in 'iris' dataset
# 'Petal.Width' is a continuous variable
# 'Species' is a categorical grouping variable
str(iris)  # View structure of the dataset
Step 3: Create the Box Plot
Step 3.1: Initialize ggplot with Aesthetic Mappings
# Initialize ggplot with data and aesthetic mappings
p <- ggplot(data = iris, aes(x = Species, y = Petal.Width, fill = Species))
Step 3.2: Add Box Plot Layer
# Add the box plot layer
p <- p + geom_boxplot()
Step 3.3: Add Labels and Theme
# Add title and labels and use a minimal theme
p <- p +
  labs(title = "Box Plot of Petal Width by Species",
       x = "Species",
       y = "Petal Width") +
  theme_minimal()
Step 3.4: Display the Plot
p
Program 7:
Develop a function in R to plot a function curve based on a mathematical equation provided as input, with different curve styles for each group, using ggplot2.
Step 1: Load the required library
library(ggplot2)
Step 2: Create data for the functions
# Create a sequence of x values ranging from -2*pi to 2*pi
x <- seq(-2 * pi, 2 * pi, length.out = 500)
# Evaluate sin(x) and cos(x) over the x range
y1 <- sin(x)
y2 <- cos(x)
# Combine data into one data frame
df <- data.frame(
  x = rep(x, 2),                                       # Repeat x values for both functions
  y = c(y1, y2),                                       # Combine y values: first sin(x), then cos(x)
  group = rep(c("sin(x)", "cos(x)"), each = length(x)) # Label each row by function
)
Step 3: Plot the Function Curves
Step 3.1: Initialize the ggplot Object
# Start building the ggplot using the data frame and aesthetics
p <- ggplot(df, aes(x = x, y = y, color = group, linetype = group))
Step 3.2: Add the Line Geometry
# Add smooth lines to represent each function curve
# (linewidth replaces size for lines in ggplot2 3.4.0+)
p <- p + geom_line(linewidth = 1.2)
Step 3.3: Add Plot Labels
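# Add title, axis labels, and legend titles
# Assign back to lowercase 'p' so the labels are kept in later steps;
# using the same title for color and linetype merges them into one legend
p <- p +
  labs(title = "Function Curves: sin(x) and cos(x)",
       x = "x",
       y = "y = f(x)",
       color = "Function",
       linetype = "Function")
p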
Step 3.4: Apply a Clean Theme
# Use a clean and simple background theme
p <- p + theme_minimal()
p
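The task statement asks for a function that accepts the equations as input, whereas the steps above hard-code sin(x) and cos(x). One way to wrap the same steps into a reusable function is sketched below. The name plot_function_curves and its arguments (funs, from, to, n, title) are hypothetical, and the sketch assumes ggplot2 3.4.0 or later for linewidth.

```r
library(ggplot2)

# Hypothetical helper: 'funs' is a named list of one-argument functions
plot_function_curves <- function(funs, from, to, n = 500,
                                 title = "Function Curves") {
  x <- seq(from, to, length.out = n)
  # Evaluate every function on the same grid and stack the results
  curves <- do.call(rbind, lapply(names(funs), function(name) {
    data.frame(x = x, y = funs[[name]](x), group = name)
  }))
  # One color and one line style per function
  ggplot(curves, aes(x = x, y = y, color = group, linetype = group)) +
    geom_line(linewidth = 1.2) +
    labs(title = title, x = "x", y = "y = f(x)",
         color = "Function", linetype = "Function") +
    theme_minimal()
}

# Usage: the names of the list become the legend labels
p_curves <- plot_function_curves(
  list("sin(x)" = sin, "cos(x)" = cos, "x^2 / 10" = function(x) x^2 / 10),
  from = -2 * pi, to = 2 * pi
)
p_curves
```

Passing the functions as a named list keeps the plotting code unchanged when curves are added or removed.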