Develop an R program to compile all the programmes from 1-7.
1. Develop an R program to quickly explore a given dataset, including categorical analysis using group_by command, and visualize the findings using ggplot2 features.
Step 1: Load necessary libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load dataset
data <- mtcars
# Convert 'cyl' to a factor for categorical analysis
data$cyl <- as.factor(data$cyl)
Step 3: Group by categorical variables
# Summarize average mpg by cylinder category
summary_data <- data %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), .groups = 'drop')
# Display summary
print(summary_data)
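The same group_by() pattern extends to several statistics at once. The following is a minimal sketch, assuming the same setup as above (mtcars with cyl as a factor); the column names n_cars, sd_mpg and avg_hp are illustrative choices and not part of the original program.

```r
library(dplyr)

# Assumption: same setup as above (mtcars with 'cyl' as a factor)
data <- mtcars
data$cyl <- as.factor(data$cyl)

# Several summary statistics per cylinder group in one pass
group_stats <- data %>%
  group_by(cyl) %>%
  summarise(
    n_cars  = n(),        # number of cars in the group
    avg_mpg = mean(mpg),  # mean miles per gallon
    sd_mpg  = sd(mpg),    # spread of mpg within the group
    avg_hp  = mean(hp),   # mean horsepower
    .groups = "drop"
  )

print(group_stats)
```

Having the group sizes next to the means helps judge how much weight each average deserves before plotting it.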
# Create a bar plot using ggplot2
ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_bar(stat = "identity") +
  labs(title = "Average MPG by Cylinder Count",
       x = "Number of cylinders", y = "Average MPG") +
  theme_minimal()
2. Write an R script to create a scatter plot, incorporating categorical analysis through color-coded data points representing different groups, using ggplot2.
Step-1:Load Necessary Libraries
# Load the necessary libraries
library(ggplot2)
library(dplyr)
Step-2: Load the Dataset
Explanation:
The iris dataset contains 150 samples of iris flowers categorized into 3 species.
Each sample has sepal and petal measurements (length and width).
head(data, n = 10) displays the first 10 rows.
# Load the iris dataset
data <- iris
# Display the first few rows
head(data, n = 10)
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(title = "Scatter Plot of Sepal Dimensions",
       x = "Sepal Length",
       y = "Sepal Width",
       color = "Species") +            # legend title
  theme_minimal() +                    # clean layout
  theme(legend.position = "top")       # move legend to top
3. Implement an R function to generate a line graph depicting the trend of a time-series dataset, with separate lines for each group, utilizing ggplot2’s group aesthetic.
Introduction:
This document demonstrates how to create a time-series line graph using the built-in AirPassengers dataset in R.
The dataset contains monthly airline passenger counts from 1949 to 1960. We will use ggplot2 to visualize trends, with separate lines for each year.
Step 1: Load necessary libraries.
library(tidyr)
library(dplyr)
library(ggplot2)
Step 2: Load the Built-in AirPassengers Dataset
The AirPassengers dataset is a time series object in R.
We first convert it into a dataframe to use it with ggplot2.
# Convert time-series data to a data frame
AirPassengers
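The plotting call later in this program expects a data frame named data with columns Date, Passengers and year, but the conversion itself is not shown above. The following is only one plausible sketch: the column names are taken from that call, while the way each column is built (monthly dates starting in January 1949, year taken from the date) is an assumption.

```r
# Assumption: column names match the later call
# plot_time_series(data, "Date", "Passengers", "year", ...)

# One date per month, starting at the first observation (January 1949)
dates <- seq(as.Date("1949-01-01"), by = "month",
             length.out = length(AirPassengers))

data <- data.frame(
  Date = dates,
  # Drop the ts attributes and keep the plain monthly counts
  Passengers = as.numeric(AirPassengers),
  # Calendar year as a factor, so ggplot2 can draw one line per year
  year = factor(format(dates, "%Y"))
)

head(data)
```

Deriving year from the dates rather than from time(AirPassengers) avoids rounding a fractional time index.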
plot_time_series <- function(data, x_col, y_col, group_col, title = "Air Passenger Trends") {
  ggplot(data, aes_string(x = x_col, y = y_col, color = group_col, group = group_col)) +
    geom_line(size = 1.2) +
    geom_point(size = 2) +
    labs(title = title, x = "year", y = "number of passengers", color = "year") +
    theme_minimal() +
    theme(legend.position = "top")
}
# Call the function
plot_time_series(data, "Date", "Passengers", "year", "Trend of airline passengers over time")
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
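The two warnings point at replacements: map string column names through the .data pronoun inside aes(), and set line thickness with linewidth. Below is a sketch of the same function written that way. The name plot_time_series_v2 is hypothetical, and the small demo data frame is built only so the block runs on its own (its construction is an assumption, not the original conversion step).

```r
library(ggplot2)

# Hypothetical variant of plot_time_series() without deprecated calls
plot_time_series_v2 <- function(data, x_col, y_col, group_col,
                                title = "Air Passenger Trends") {
  ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]],
                   color = .data[[group_col]], group = .data[[group_col]])) +
    geom_line(linewidth = 1.2) +   # 'linewidth' replaces 'size' for lines
    geom_point(size = 2) +         # 'size' is still correct for points
    labs(title = title, x = "year", y = "number of passengers", color = "year") +
    theme_minimal() +
    theme(legend.position = "top")
}

# Assumed demo data so the sketch is self-contained
dates <- seq(as.Date("1949-01-01"), by = "month",
             length.out = length(AirPassengers))
demo <- data.frame(Date = dates,
                   Passengers = as.numeric(AirPassengers),
                   year = factor(format(dates, "%Y")))

p <- plot_time_series_v2(demo, "Date", "Passengers", "year",
                         "Trend of airline passengers over time")
p
```

The linewidth argument requires ggplot2 3.4.0 or later, which matches the version attached earlier in this document.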
4. Develop a script in R to produce a bar graph displaying the frequency distribution of categorical data in a given dataset, grouped by specific variable, using ggplot2.
# Load necessary library
library(ggplot2)
Step 1 : Load the dataset
We use the built-in mtcars dataset, which contains information about different car models.
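The loading code for this program is not shown above. A minimal sketch, assuming the plot code that follows reads from an object named data and that cyl and gear are meant to be treated as categories rather than numbers:

```r
# Assumption: the plot below reads from an object called 'data'
data <- mtcars

# Treat cylinder and gear counts as categories, not continuous numbers,
# so geom_bar() can draw one dodged bar per gear value
data$cyl  <- as.factor(data$cyl)
data$gear <- as.factor(data$gear)

# Frequency table of the two categorical variables
table(cyl = data$cyl, gear = data$gear)
```

The table holds the same counts that the grouped bar graph displays.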
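# factor() makes cyl and gear categorical so each gear gets its own bar
ggplot(data, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "dodge") +
  labs(title = "Frequency of cylinders grouped by gear type",
       x = "No. of cylinders", y = "Count", fill = "Gears") +
  theme_minimal()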
5. Implement an R program to create a histogram illustrating the distribution of a continuous variable, with overlays of density curves for each group, using ggplot2.
Step 3: Create Histogram with Group-wise Density Curves
Step 3.1: Initialize the ggplot with aesthetic mappings
# Start ggplot with iris dataset
# Map Petal.Length to x-axis and fill by Species (grouping variable)
p <- ggplot(data = iris, aes(x = Petal.Length, fill = Species))
p
Step 3.2: Add Histogram Layer
p <- ggplot(data = iris, aes(x = Petal.Length, fill = Species))
p <- p +
  geom_histogram(aes(y = ..density..), alpha = 0.4, position = "identity", bins = 30)
Step 3.3: Add Density Curve Layer
# Overlay density curves for each group
p <- p +
  geom_density(aes(color = Species),  # Line color by group
               size = 1.2)            # Line thickness
p
Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.
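As the warning says, after_stat(density) is the current way to put density on the y-axis. Here is a sketch of the histogram and density layers written without the deprecated notation. Using linewidth in place of size for the curve is a substitution on my part, following the ggplot2 3.4.0 deprecation of size for lines.

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Petal.Length, fill = Species)) +
  # after_stat(density) rescales bar heights so each group's bars have total area 1
  geom_histogram(aes(y = after_stat(density)),
                 alpha = 0.4, position = "identity", bins = 30) +
  # One density curve per species, colored by group
  geom_density(aes(color = Species), linewidth = 1.2)
p
```

Because both layers are on the density scale, the curves and the bars share one y-axis and can be compared directly.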
Step 3.4: Add Labels and Theme
# Add title and axis labels, and apply clean theme
p <- p +
  labs(title = "Distribution of Petal Length with Group-wise Density Curves",
       x = "Petal Length", y = "Density") +
  theme_minimal()
p
Step 3.5: Display the Plot
p
6. Write an R script to construct a box plot showcasing the distribution of a continuous variable, grouped by a categorical variable, using ggplot2’s fill aesthetic.
Step1: Load Required Library
# Load ggplot2 package for visualization
library(ggplot2)
Step2: Explore the Inbuilt Dataset
# Use the built-in 'iris' dataset
# 'Petal.Width' is a continuous variable
# 'Species' is a categorical grouping variable
str(iris)  # View structure of the dataset
Step 3.1: Initialize ggplot with Aesthetic Mappings
Explanation:
x = Species: Grouping variable (categorical)
y = Petal.Width: Continuous variable to show distribution
fill = Species: Fill box colors by species
# Initialize ggplot with data and aesthetic mappings
p <- ggplot(data = iris, aes(x = Species, y = Petal.Width, fill = Species))
Step 3.2: Add Box Plot Layer
# Add the box plot layer
p <- p + geom_boxplot()
Explanation:
geom_boxplot() creates box plots for each group.
Automatically shows median, quartiles, and outliers.
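To see the numbers behind each box, the same statistics can be computed directly. This is a sketch, assuming geom_boxplot()'s default settings, under which points farther than 1.5 times the interquartile range from the box edges are drawn as outliers; the column names are illustrative.

```r
library(dplyr)

box_stats <- iris %>%
  group_by(Species) %>%
  summarise(
    q1     = quantile(Petal.Width, 0.25, names = FALSE),  # lower edge of the box
    median = median(Petal.Width),                         # line inside the box
    q3     = quantile(Petal.Width, 0.75, names = FALSE),  # upper edge of the box
    # Count points beyond 1.5 * IQR from the box edges (default outlier rule)
    n_outliers = sum(Petal.Width < q1 - 1.5 * (q3 - q1) |
                     Petal.Width > q3 + 1.5 * (q3 - q1)),
    .groups = "drop"
  )

print(box_stats)
```

Each row corresponds to one box in the plot, which makes it easy to check a value read off the chart.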
Step 3.3: Add Labels and Theme
# Add title and labels and use a minimal theme
p <- p +
  labs(title = "Box Plot of Petal Width by Species",
       x = "Species", y = "Petal Width") +
  theme_minimal()
Explanation:
labs() adds a descriptive title and axis labels.
theme_minimal() gives a clean, modern look.
Step 3.4: Display the Plot
# Render the final plot
p
7. Develop a function in R to plot a function curve based on a mathematical equation provided as input, with different curve styles for each group, using ggplot2.
Step 1 : Load the library
# Load ggplot2 package for advanced plotting
library(ggplot2)
Step 2 : Create data for the functions
# Create a sequence of x values ranging from -2*pi to 2*pi
x <- seq(-2 * pi, 2 * pi, length.out = 500)
# Evaluate sin(x) and cos(x) over the x range
y1 <- sin(x)
y2 <- cos(x)
# Combine data into one data frame
df <- data.frame(x = rep(x, 2),
                 y = c(y1, y2),
                 group = rep(c("sin(x)", "cos(x)"), each = length(x)))
df