PRG3

Author

kusu

Develop an R program to quickly explore a given dataset, including categorical analysis with dplyr's group_by() function, and visualize the findings with ggplot2.

Step 1: Load necessary libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)  # already attached by tidyverse; listed explicitly for clarity

Step 2: Load the dataset

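# Load the built-in mtcars dataset
data <- mtcars
# Treat cylinder count as a categorical variable (factor) for grouping and plotting
data$cyl <- as.factor(data$cyl)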
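
Before grouping, a quick look at the structure of the data can confirm the column types and reveal any missing values. The following is a minimal sketch of such a check and is not part of the original program; the variable name missing_per_column is introduced here only for illustration.

```r
library(dplyr)

# Same setup as in Step 2
data <- mtcars
data$cyl <- as.factor(data$cyl)

# Compact overview: one line per column with its type and first values
glimpse(data)

# Basic descriptive statistics for every column
print(summary(data))

# Number of missing values in each column (illustrative variable name)
missing_per_column <- colSums(is.na(data))
print(missing_per_column)
```

If any column showed a non-zero count here, the mean() call in the next step would likely need na.rm = TRUE.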

Step 3: Group by a categorical variable

# Summarize average mpg by cylinder category
summary_data <- data %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop")
# Display the summary
print(summary_data)
# A tibble: 3 × 2
  cyl   avg_mpg
  <fct>   <dbl>
1 4        26.7
2 6        19.7
3 8        15.1
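
The same group_by() and summarise() pattern can return several statistics per category at once, which gives more context than the mean alone. This is a sketch under the same setup as above; the name group_stats and the choice of statistics (count, median, standard deviation) are assumptions made for illustration.

```r
library(dplyr)

# Same setup as in Step 2
data <- mtcars
data$cyl <- as.factor(data$cyl)

# One row per cylinder category, several summary columns
group_stats <- data %>%
  group_by(cyl) %>%
  summarise(
    n          = n(),          # number of cars in the group
    avg_mpg    = mean(mpg),    # mean fuel economy
    median_mpg = median(mpg),  # median fuel economy
    sd_mpg     = sd(mpg),      # spread within the group
    .groups    = "drop"
  )

print(group_stats)
```

Including the group size n alongside the mean makes it easier to judge how much weight each average deserves.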

Step 4: Visualize the findings

# Create a bar plot using ggplot2
# geom_col() draws bars whose heights are the y values (equivalent to geom_bar(stat = "identity"))
ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_col() +
  labs(title = "Average MPG by cylinder count",
       x = "Number of cylinders",
       y = "Average MPG") +
  theme_minimal()
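
A bar plot of averages hides the spread within each category. As a complementary view, the raw mpg values can be drawn as a box plot per cylinder group. This is a minimal sketch rather than part of the original program; the object name mpg_boxplot and the plot title are assumptions introduced for illustration.

```r
library(ggplot2)

# Same setup as in Step 2
data <- mtcars
data$cyl <- as.factor(data$cyl)

# Box plot of the raw mpg values, one box per cylinder category
mpg_boxplot <- ggplot(data, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  labs(title = "MPG distribution by cylinder count",
       x = "Number of cylinders",
       y = "MPG") +
  theme_minimal()

# Render the plot
print(mpg_boxplot)
```

Storing the plot in a variable before printing it makes it possible to reuse or save the same object later.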