---
title: "Understanding group_by() and %>% in R"
author: "HIrwa Fabrice"
date: "2026-05-24"
output: html_document
---
In R, the dplyr package makes data analysis easier. Two of its most important tools are:
- group_by()
- %>% (pipe operator)

These tools help organize, transform, and summarize data efficiently.
library(dplyr)
The pipe operator %>% passes the result of one function to the next function.
It means:
Take the output from the left-hand side and use it as the first argument of the function on the right-hand side.
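To make the rule concrete, here is a minimal sketch on a small numeric vector. The names `x`, `piped`, and `nested` are illustrative only and do not come from the original text.

```r
library(dplyr)  # dplyr re-exports %>% from the magrittr package

x <- c(4, 9, 16)

# Piped form: x becomes the first argument of sqrt(),
# and that result becomes the first argument of sum()
piped <- x %>% sqrt() %>% sum()

# Equivalent nested form, read from the inside out
nested <- sum(sqrt(x))

identical(piped, nested)  # TRUE

# Extra arguments stay in the call; the piped value fills the first slot
x %>% round(digits = 1)   # same as round(x, digits = 1)
```

Both forms compute the same value; the piped form simply lists the steps in the order they run.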
Example with the built-in mtcars dataset, using the pipe:
mtcars %>%
  select(mpg, hp) %>%
  filter(mpg > 20)
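Without the pipe, the same steps must be written as nested calls:
filter(select(mtcars, mpg, hp), mpg > 20)
The nested form is harder to read and understand because the steps run from the inside out.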
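The piped and nested forms are two spellings of the same computation. A quick check, assuming dplyr is installed and using the illustrative names `piped` and `nested`, could look like this:

```r
library(dplyr)

# Piped form: select two columns, then keep rows with mpg above 20
piped <- mtcars %>%
  select(mpg, hp) %>%
  filter(mpg > 20)

# Nested form: the same two steps, written inside out
nested <- filter(select(mtcars, mpg, hp), mpg > 20)

identical(piped, nested)  # TRUE: both forms return the same data frame
```

The choice between the two is therefore about readability, not about the result.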
The group_by() function divides a data frame into groups based on the values of one or more variables.
After grouping, we can apply summary functions such as mean(), sum(), or n() (which counts rows) to each group.
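mtcars %>%
  group_by(cyl)
This groups the cars by number of cylinders (4, 6, or 8). On its own, group_by() only marks the groups; a following step such as summarise() then works on each group:
mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg))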
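To see what grouping actually does, the sketch below inspects a grouped copy of mtcars and then computes several summaries in one call. The object names `by_cyl` and `cyl_summary` and the columns `total_hp` and `n_cars` are illustrative assumptions, not part of the original example.

```r
library(dplyr)

by_cyl <- mtcars %>% group_by(cyl)

# Grouping adds metadata; it does not add or remove rows
nrow(by_cyl) == nrow(mtcars)  # TRUE
group_vars(by_cyl)            # "cyl"
n_groups(by_cyl)              # 3 (cars with 4, 6, and 8 cylinders)

# Several summary functions can be applied per group in one summarise()
cyl_summary <- by_cyl %>%
  summarise(
    avg_mpg  = mean(mpg),  # average miles per gallon in the group
    total_hp = sum(hp),    # total horsepower in the group
    n_cars   = n()         # number of cars in the group
  )

cyl_summary
```

The result has one row per group, so here it has three rows.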
A lecturer wants to analyze student marks by department to understand performance differences.
We have a dataset called students:
students <- data.frame(
  student_id = 1:10,
  name = c("John", "Alice", "David", "Grace", "Eric",
           "Sarah", "Michael", "Linda", "James", "Emma"),
  department = c("IT", "Nursing", "Education", "IT", "Business",
                 "Nursing", "Education", "IT", "Business", "Nursing"),
  marks = c(78, 85, 67, 90, 74, 88, 69, 95, 72, 81)
)
students
students %>%
  group_by(department) %>%
  summarise(
    average_marks = mean(marks),
    total_students = n()
  )
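The data is split into four groups by department: IT, Nursing, Education, and Business.
For each department, summarise() computes the average mark and the number of students: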
| department | average_marks | total_students |
|---|---|---|
| IT | 87.67 (high) | 3 |
| Nursing | 84.67 (high) | 3 |
| Education | 68.00 (lower) | 2 |
| Business | 73.00 (medium) | 2 |
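This analysis helps the lecturer compare performance across departments and see which ones have higher or lower average marks.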
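To make the comparison easier to read, the summary can be sorted from the highest to the lowest average. The sketch below rebuilds the students data so that it runs on its own; the `arrange(desc(...))` step and the name `dept_summary` are additions for illustration.

```r
library(dplyr)

students <- data.frame(
  student_id = 1:10,
  name = c("John", "Alice", "David", "Grace", "Eric",
           "Sarah", "Michael", "Linda", "James", "Emma"),
  department = c("IT", "Nursing", "Education", "IT", "Business",
                 "Nursing", "Education", "IT", "Business", "Nursing"),
  marks = c(78, 85, 67, 90, 74, 88, 69, 95, 72, 81)
)

dept_summary <- students %>%
  group_by(department) %>%
  summarise(
    average_marks = mean(marks),
    total_students = n()
  ) %>%
  arrange(desc(average_marks))  # highest average first

dept_summary
```

With this data, IT comes first and Education last, which matches the high, medium, and lower labels used for the departments.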
The same pattern works on mtcars, combining filter(), group_by(), and summarise() in one pipeline:
mtcars %>%
  filter(mpg > 15) %>%
  group_by(cyl) %>%
  summarise(
    avg_mpg = mean(mpg),
    total_cars = n()
  )
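Step-by-step:
1. filter(mpg > 15) keeps only the cars with more than 15 miles per gallon.
2. group_by(cyl) groups the remaining cars by number of cylinders.
3. summarise() computes the average mpg (avg_mpg) and the number of cars (total_cars) in each group.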
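One way to follow the pipeline is to run each step separately and store the intermediate results. This is only a sketch: the names `step1`, `step2`, `step3`, and `piped` are hypothetical and serve to show what each stage produces.

```r
library(dplyr)

# Step 1: keep only cars with mpg above 15
step1 <- filter(mtcars, mpg > 15)

# Step 2: mark groups by number of cylinders (rows are unchanged)
step2 <- group_by(step1, cyl)

# Step 3: one summary row per group
step3 <- summarise(step2, avg_mpg = mean(mpg), total_cars = n())

# The piped version performs the same three steps without intermediate names
piped <- mtcars %>%
  filter(mpg > 15) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), total_cars = n())

isTRUE(all.equal(step3, piped))  # TRUE: both approaches give the same table
```

Intermediate objects are handy while debugging; the piped form is usually preferred once the steps are known to work.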
These functions are widely used in data analysis because they help to organize, transform, and summarize data in a few readable steps:
| Function | Purpose |
|---|---|
| %>% | Connect steps in a workflow |
| group_by() | Split data into groups |
| summarise() | Compute summary statistics |
The combination of group_by() and %>% allows data analysts to write clean, readable, and efficient code. These tools are essential for modern data analysis, including work with large datasets.