P1 - Class Demo

Author

Manoj

Develop an R program that quickly explores a given dataset, performs a categorical analysis with dplyr's group_by() function, and visualizes the findings with ggplot2.

Step 1: Load necessary libraries

library(tidyverse)
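# tidyverse already attaches dplyr and ggplot2; the explicit calls below are redundant but harmless
library(dplyr)
library(ggplot2)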

Step 2: Load the dataset

# Load dataset
data <- mtcars

# Convert 'cyl' to a factor for categorical analysis
data$cyl <- as.factor(data$cyl)
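Before grouping, it can help to take a quick look at the data itself. The following is a minimal sketch of that exploration step using only base R. The name cyl_counts is introduced here for illustration and is not part of the original demo.

```r
# Rebuild the demo's data frame so this block runs on its own
data <- mtcars
data$cyl <- as.factor(data$cyl)

# Column types and a preview of values; cyl should now appear as a factor
str(data)

# Minimum, quartiles, mean, and maximum of the response variable
summary(data$mpg)

# Number of cars in each cylinder category (cyl_counts is an illustrative name)
cyl_counts <- table(data$cyl)
print(cyl_counts)

# Count of missing values per column
colSums(is.na(data))
```

Checking the category counts first shows whether the groups are balanced enough for the per-group averages in the next step to be comparable.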

Step 3: Group by categorical variable

# Summarize average mpg by cylinder category
summary_data <- data %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), .groups = 'drop')

# Display summary
print(summary_data)
# A tibble: 3 x 2
  cyl   avg_mpg
  <fct>   <dbl>
1 4        26.7
2 6        19.7
3 8        15.1
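A group mean on its own hides how many cars are in each group and how much they vary. As a possible extension of the summary above, the sketch below adds a count, a median, and a standard deviation per cylinder category. The name detailed_summary and the extra column names are assumptions made for this example.

```r
library(dplyr)

# Rebuild the demo's data frame so this block runs on its own
data <- mtcars
data$cyl <- as.factor(data$cyl)

# Several statistics per cylinder category in one summarise() call
# (detailed_summary and its column names are illustrative)
detailed_summary <- data %>%
  group_by(cyl) %>%
  summarise(
    n_cars     = n(),          # number of cars in the group
    avg_mpg    = mean(mpg),    # same statistic as in the demo
    median_mpg = median(mpg),  # less sensitive to outliers than the mean
    sd_mpg     = sd(mpg),      # spread of mpg within the group
    .groups    = "drop"
  )

print(detailed_summary)
```

Reporting n_cars next to avg_mpg makes it clear that the three averages are based on groups of different sizes.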

Step 4: Visualize the findings

# Create a bar plot using ggplot2
ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_col() +
  labs(title = "Average MPG by Cylinder Count",
       x = "Number of Cylinders",
       y = "Average MPG") +
  theme_minimal()
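The bar plot shows only the group means. As a complementary view, a box plot of the raw data shows the spread of mpg within each cylinder category. This is a sketch under the same setup as the demo; the object name mpg_boxplot is introduced here for illustration.

```r
library(ggplot2)

# Rebuild the demo's data frame so this block runs on its own
data <- mtcars
data$cyl <- as.factor(data$cyl)

# Box plot of the unsummarized data: one box per cylinder category
# (mpg_boxplot is an illustrative name)
mpg_boxplot <- ggplot(data, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  labs(title = "MPG Distribution by Cylinder Count",
       x = "Number of Cylinders",
       y = "MPG") +
  theme_minimal()

print(mpg_boxplot)
```

Unlike the bar plot, this plot uses data rather than summary_data, because geom_boxplot() computes the quartiles itself from the individual observations.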