1. Develop an R program to quickly explore a given dataset, including categorical analysis using the group_by command, and visualize the findings using ggplot2 features
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
# Load dataset
data <- mtcars
# Convert 'cyl' to a factor for categorical analysis
data$cyl <- as.factor(data$cyl)
# Summarize average mpg by cylinder category
summary_data <- data %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop")
# Display summary
print(summary_data)
# Create a bar plot using ggplot2
ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_bar(stat = "identity") +
  labs(title = "Average MPG by Cylinder Count",
       x = "Number of Cylinders",
       y = "Average MPG") +
  theme_minimal()
2. Write an R script to create a scatter plot, incorporating categorical analysis through color-coded data points representing different groups, using ggplot2
# Load the iris dataset
data <- iris
# Display first few rows
head(data)
# Create a scatter plot using ggplot2
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +  # Increase point size & transparency
  labs(title = "Scatter Plot of Sepal Dimensions",
       x = "Sepal Length",
       y = "Sepal Width",
       color = "Species") +            # Legend title
  theme_minimal() +                    # Clean layout
  theme(legend.position = "top")       # Move legend to the top
3. Implement an R function to generate a line graph depicting the trend of a time-series dataset, with separate lines for each group, utilizing ggplot2's group aesthetic
library(tidyr)
# Convert time-series data to a data frame
data <- data.frame(
  Date = seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers)),
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers)), "%Y"))
)
# Display first few rows
head(data, n = 20)
# Function to plot time-series trend
# Column names are passed as strings and looked up with the .data pronoun,
# which replaces the deprecated aes_string()
plot_time_series <- function(data, x_col, y_col, group_col, title = "Air Passenger Trends") {
  ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]],
                   color = .data[[group_col]], group = .data[[group_col]])) +
    geom_line(linewidth = 1.2) +     # Line graph (linewidth replaces the deprecated size for lines)
    geom_point(size = 2) +           # Add points for clarity
    labs(title = title,
         x = "Year",
         y = "Number of Passengers",
         color = "Year") +           # Legend title
    theme_minimal() +
    theme(legend.position = "top")
}
# Call the function
plot_time_series(data, "Date", "Passengers", "Year", "Trend of Airline Passengers Over Time")
4. Develop a script in R to produce a bar graph displaying the frequency distribution of categorical data in a given dataset, grouped by a specific variable, using ggplot2
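# Reload mtcars (data was reassigned in the earlier exercises)
# and treat cyl and gear as categorical variables
data <- mtcars
data$cyl <- as.factor(data$cyl)
data$gear <- as.factor(data$gear)
# Dodged bar chart: one bar per gear count within each cylinder group
ggplot(data, aes(x = cyl, fill = gear)) +
  geom_bar(position = "dodge") +
  labs(title = "Frequency of Cylinders Grouped by Gear Type",
       x = "Number of Cylinders",
       y = "Count",
       fill = "Gears") +
  theme_minimal()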
5. Implement an R program to create a histogram illustrating the distribution of a continuous variable, with overlays of density curves for each group, using ggplot2
p <- ggplot(data = iris, aes(x = Petal.Length, fill = Species))
p
# Scale the histogram to density so it is comparable with the density curves;
# after_stat(density) replaces the deprecated ..density.. notation
p <- p + geom_histogram(aes(y = after_stat(density)),
                        alpha = 0.4,
                        position = "identity",
                        bins = 30)
p
p <- p + geom_density(aes(colour = Species), linewidth = 1.2)
p
p <- p +
  labs(title = "Distribution of Petal Length with Group-wise Density Curves",
       x = "Petal Length",
       y = "Density") +
  theme_minimal()
p
6. Write an R script to construct a box plot showcasing the distribution of a continuous variable, grouped by a categorical variable, using ggplot2's fill aesthetic
# Initialize ggplot with data and aesthetic mappings
p <- ggplot(data = iris, aes(x = Species, y = Petal.Width, fill = Species))
# Add the box plot layer
p <- p + geom_boxplot()
# Add title and labels and use a minimal theme
p <- p +
  labs(title = "Box Plot of Petal Width by Species",
       x = "Species",
       y = "Petal Width") +
  theme_minimal()
# Render the final plot
p
7. Develop a function in R to plot a function curve based on a mathematical equation provided as input, with different curve styles for each group, using ggplot2
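No solution is given for this exercise, so the following is only one possible sketch. It assumes the "equation provided as input" is supplied as a named list of R functions, and it uses a hypothetical helper name, plot_function_curves, that does not come from the original material. Each function is evaluated on a common grid and drawn with its own colour and linetype.

```r
library(ggplot2)

# Hypothetical helper (not from the original exercise sheet):
# evaluates each supplied function on a grid of x values and draws
# one curve per function, distinguished by colour and linetype.
plot_function_curves <- function(funs, x_min = -pi, x_max = pi, n = 200,
                                 title = "Function Curves") {
  x <- seq(x_min, x_max, length.out = n)

  # Build a long data frame with one block of rows per function
  curve_data <- do.call(rbind, lapply(names(funs), function(nm) {
    data.frame(x = x, y = funs[[nm]](x), curve = nm)
  }))
  # Keep the legend in the order the functions were supplied
  curve_data$curve <- factor(curve_data$curve, levels = names(funs))

  ggplot(curve_data, aes(x = x, y = y, colour = curve, linetype = curve)) +
    geom_line(linewidth = 1) +
    labs(title = title,
         x = "x",
         y = "f(x)",
         colour = "Function",
         linetype = "Function") +
    theme_minimal()
}

# Example call with three illustrative equations
p <- plot_function_curves(
  list(sin = sin, cos = cos, quadratic = function(x) x^2 / 5)
)
p
```

Mapping both colour and linetype to the same variable, with matching legend titles, should give a single merged legend. An alternative under the same assumptions would be to add one stat_function() layer per equation instead of precomputing the data frame.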