Develop an R program to quickly explore a given data set, including categorical analysis using the group_by command, and visualize the findings using ggplot2 features.
Step 1: Load necessary libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_bar(stat = "identity") +
  labs(title = "Average MPG by cylinder count", x = "Number of Cylinders", y = "Average MPG") +
  theme_minimal()
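The plot call above relies on `summary_data`, which this section does not define. A minimal sketch of how such a table could be built with `group_by()` and `summarise()`, assuming the built-in `mtcars` data set (an assumption suggested by the `cyl` and `avg_mpg` column names, not stated in the source):

```r
library(dplyr)

# Assumption: the data set is mtcars; the source does not name it.
summary_data <- mtcars %>%
  mutate(cyl = factor(cyl)) %>%   # treat cylinder count as a category
  group_by(cyl) %>%               # one group per cylinder count
  summarise(
    avg_mpg = mean(mpg),          # mean miles per gallon in each group
    n = n(),                      # number of cars in each group
    .groups = "drop"
  )

print(summary_data)
```

Converting `cyl` to a factor before grouping means `fill = cyl` is mapped to discrete colours rather than a continuous gradient.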
Write an R script to create a scatter plot, incorporating categorical analysis through color-coded data points representing different groups, using ggplot2.
Step 1: Load necessary libraries
library(ggplot2)
library(dplyr)
Step 2: Load the Data set
data <- iris
head(data, n = 10) # first 10 rows (head() shows six by default)
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(title = "Scatter Plot of Sepal Dimensions", x = "Sepal Length", y = "Sepal Width",
       color = "Species") +       # Legend title
  theme_minimal() +               # Clean theme
  theme(legend.position = "top")  # Move legend to the top
Implement an R function to generate a line graph depicting the trend of a time-series data set, with separate lines for each group, utilizing ggplot2’s group aesthetic.
Step 1: Load the necessary libraries
library(ggplot2)
library(dplyr)
Step 2: Load the built-in AirPassengers data set
data <- data.frame(
  Date = seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers)),
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers)), "%Y"))
)
# Display first few rows
head(data, n = 20)
Step 3: Define a Function for Time-Series Line Graph
plot_time_series <- function(data, x_col, y_col, group_col, title = "Air Passenger Trends") {
  ggplot(data, aes_string(x = x_col, y = y_col, color = group_col, group = group_col)) +
    geom_line(size = 1.2) +   # Line graph
    geom_point(size = 2) +    # Add points for clarity
    labs(title = title,
         x = "Year",
         y = "Number of Passengers",
         color = "Year") +    # Legend title
    theme_minimal() +
    theme(legend.position = "top")
}
plot_time_series(data, "Date", "Passengers", "Year", "Trend of Airline Passengers Over Time")
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
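The two warnings above come from `aes_string()` and from the `size` argument of `geom_line()`. A possible rewrite that avoids both is sketched below, assuming column names are still passed as strings. The function name `plot_time_series_tidy` is hypothetical and chosen only to avoid clashing with the original.

```r
library(ggplot2)

# Hypothetical variant of plot_time_series(); not part of the original script.
plot_time_series_tidy <- function(data, x_col, y_col, group_col,
                                  title = "Air Passenger Trends") {
  # .data[[name]] looks up a column by its string name inside aes()
  ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]],
                   color = .data[[group_col]], group = .data[[group_col]])) +
    geom_line(linewidth = 1.2) +  # linewidth replaces size for lines
    geom_point(size = 2) +        # size is still the right argument for points
    labs(title = title, x = "Year", y = "Number of Passengers", color = "Year") +
    theme_minimal() +
    theme(legend.position = "top")
}

# Rebuild the data frame so the sketch runs on its own
dates <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))
data <- data.frame(Date = dates,
                   Passengers = as.numeric(AirPassengers),
                   Year = as.factor(format(dates, "%Y")))

p <- plot_time_series_tidy(data, "Date", "Passengers", "Year",
                           "Trend of Airline Passengers Over Time")
p
```

The `.data` pronoun is the tidy evaluation idiom the first warning refers to; it keeps the string-based interface of the original function.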
Develop a script in R to produce a bar graph displaying the frequency distribution of categorical data in a given dataset, grouped by a specific variable, using ggplot2.
# Create a bar graph
ggplot(data, aes(x = cyl, fill = gear)) +
  geom_bar(position = "dodge") +  # Grouped bar chart
  labs(title = "Frequency of Cylinders Grouped by Gear Type",
       x = "Number of Cylinders",
       y = "Count",
       fill = "Gears") +          # Legend title
  theme_minimal()
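The bar graph above expects a data frame named `data` in which `cyl` and `gear` are categorical. The source does not show that step. One way it might look, assuming the built-in `mtcars` data set (inferred from the column names, not confirmed by the text):

```r
library(dplyr)

# Assumption: mtcars is the intended data set.
data <- mtcars %>%
  mutate(
    cyl  = factor(cyl),   # discrete x axis
    gear = factor(gear)   # discrete fill colours for the dodged bars
  )

# Frequency table behind the grouped bars
freq <- count(data, cyl, gear)
print(freq)
```

Without the conversion to factors, `fill = gear` would be treated as a continuous variable and `position = "dodge"` would not split the bars by gear.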
Implement an R program to create a histogram illustrating the distribution of a continuous variable, with overlays of density curves for each group, using ggplot2.
Step 1: Load Required Library
# Load ggplot2 package for visualization
library(ggplot2)
Step 3: Create Histogram with Group-wise Density Curves
Step 3.1: Initialize the ggplot with aesthetic mappings
p <- ggplot(data = iris, aes(x = Petal.Length, fill = Species))
p
Step 3.2: Add Histogram Layer
p <- p +
  geom_histogram(aes(y = ..density..),
                 alpha = 0.4,            # Set transparency
                 position = "identity",  # Overlap histograms
                 bins = 30)              # Number of bins
p
Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.
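The warning above points to `after_stat()`. A sketch of the same histogram layer written that way; the object name `p_hist` is hypothetical and used only so the example does not overwrite `p`:

```r
library(ggplot2)

# Same mappings as Step 3.1, with the histogram scaled to density
p_hist <- ggplot(data = iris, aes(x = Petal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)),  # replaces ..density..
                 alpha = 0.4,                   # transparency
                 position = "identity",         # overlap the groups
                 bins = 30)                     # number of bins
p_hist
```

`after_stat(density)` refers to the `density` variable computed by the binning statistic, which puts the bars on the same scale as the density curves added in the next step.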
Step 3.3: Add Density Curve Layer
p <- p +
  geom_density(aes(color = Species),  # Line color by group
               size = 1.2)            # Line thickness
p
Step 3.4: Add Labels and Theme
# Add title and axis labels, and apply clean theme
p <- p +
  labs(title = "Distribution of Petal Length with Group-wise Density Curves",
       x = "Petal Length",
       y = "Density") +
  theme_minimal()
p
Step 3.5: Display the Plot
p
Write an R script to construct a box plot showcasing the distribution of a continuous variable, grouped by a categorical variable, using ggplot2’s fill aesthetic.
Step 3.1: Initialize ggplot with Aesthetic Mappings
p <-ggplot(data = iris, aes(x = Species, y = Petal.Width, fill = Species))
Step 3.2: Add Box Plot Layer
p <- p +geom_boxplot()
Step 3.3: Add Labels and Theme
p <- p +
  labs(title = "Box Plot of Petal Width by Species",
       x = "Species",
       y = "Petal Width") +
  theme_minimal()
Step 3.4: Display the Plot
p
Develop a function in R to plot a function curve based on a mathematical equation provided as input, with different curve styles for each group, using ggplot2.
Step 1: Load the library
# Load ggplot2 package for advanced plotting
library(ggplot2)
Step 2: Create data for the functions
# Create a sequence of x values ranging from -2pi to 2pi
x <- seq(-2*pi, 2*pi, length.out = 500)
# Evaluate sin(x) and cos(x) over the x range
y1 <- sin(x)
y2 <- cos(x)
# Combine data into one data frame
df <- data.frame(x = rep(x, 2),
                 y = c(y1, y2),
                 group = rep(c("sin(x)", "cos(x)"), each = length(x)))
df
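The section stops after building `df`. A possible final step is sketched below as an assumption about where the exercise is heading: a small helper (hypothetical name `plot_function_curves`) that takes a named list of functions, evaluates them on a common grid, and draws each one with its own line type and colour.

```r
library(ggplot2)

# Hypothetical helper: funs is a named list of functions of x.
plot_function_curves <- function(funs, from = -2 * pi, to = 2 * pi, n = 500) {
  x <- seq(from, to, length.out = n)
  # Evaluate every function on the same grid and stack the results
  df <- do.call(rbind, lapply(names(funs), function(nm) {
    data.frame(x = x, y = funs[[nm]](x), group = nm)
  }))
  ggplot(df, aes(x = x, y = y, color = group, linetype = group)) +
    geom_line(linewidth = 1) +  # one curve per group, styled by linetype
    labs(title = "Function Curves", x = "x", y = "f(x)",
         color = "Function", linetype = "Function") +
    theme_minimal()
}

p <- plot_function_curves(list("sin(x)" = sin, "cos(x)" = cos))
p
```

Mapping both `color` and `linetype` to `group`, and giving them the same legend title, lets ggplot2 merge them into a single legend.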