In spreadsheet software like Excel, a Pivot Table is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table.
In R, we achieve “Pivot Table” functionality using two primary
packages from the Tidyverse: 1.
dplyr: For summarizing and aggregating
data (The “Summary” part). 2. tidyr: For
reshaping data between “long” and “wide” formats (The “Layout”
part).
We will use the palmerpenguins dataset, which contains
observations of three penguin species in Antarctica.
| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|---|
| Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
| Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
| Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
| Adelie | Torgersen | NA | NA | NA | NA | NA | 2007 |
| Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
| Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
The most common use of a pivot table is to group data and calculate a metric (Sum, Mean, Count).
In Excel, you would drag “Species” to Rows and “Body Mass” to Values. In R:
pivot_summary <- penguins %>%
group_by(species) %>%
summarize(
avg_mass_g = mean(body_mass_g, na.rm = TRUE),
count = n()
)
pivot_summary %>% kable()| species | avg_mass_g | count |
|---|---|---|
| Adelie | 3700.662 | 152 |
| Chinstrap | 3733.088 | 68 |
| Gentoo | 5076.016 | 124 |
To see the average mass across both species and the island they inhabit:
multi_pivot <- penguins %>%
group_by(species, island) %>%
summarize(
avg_mass = mean(body_mass_g, na.rm = TRUE)
) %>%
ungroup()
multi_pivot %>% kable()| species | island | avg_mass |
|---|---|---|
| Adelie | Biscoe | 3709.659 |
| Adelie | Dream | 3688.393 |
| Adelie | Torgersen | 3706.373 |
| Chinstrap | Dream | 3733.088 |
| Gentoo | Biscoe | 5076.016 |
pivot_wider)The output above is in Long Format. However, a
traditional Pivot Table often puts one variable in the columns. We use
pivot_wider() to achieve this.
Let’s turn the islands into columns to compare species mass side-by-side.
wide_pivot <- multi_pivot %>%
pivot_wider(
names_from = island,
values_from = avg_mass
)
wide_pivot %>% kable(caption = "Average Body Mass (g) by Island")| species | Biscoe | Dream | Torgersen |
|---|---|---|---|
| Adelie | 3709.659 | 3688.393 | 3706.373 |
| Chinstrap | NA | 3733.088 | NA |
| Gentoo | 5076.016 | NA | NA |
names_from: The column whose values
will become the new column headers.values_from: The column whose values
will fill the cells.pivot_longer)Often, data arrives in a “Wide” format (like an Excel sheet) that is
hard for R to plot. We use pivot_longer() to make it
“Tidy.”
Imagine we have a dataset of yearly sales:
sales_data <- data.frame(
Region = c("North", "South"),
Year_2021 = c(100, 150),
Year_2022 = c(120, 180),
Year_2023 = c(130, 200)
)
sales_data %>% kable()| Region | Year_2021 | Year_2022 | Year_2023 |
|---|---|---|---|
| North | 100 | 120 | 130 |
| South | 150 | 180 | 200 |
To analyze this properly, we want “Year” in one column and “Revenue” in another.
long_sales <- sales_data %>%
pivot_longer(
cols = starts_with("Year"),
names_to = "Year",
values_to = "Revenue",
names_prefix = "Year_"
)
long_sales %>% kable()| Region | Year | Revenue |
|---|---|---|
| North | 2021 | 100 |
| North | 2022 | 120 |
| North | 2023 | 130 |
| South | 2021 | 150 |
| South | 2022 | 180 |
| South | 2023 | 200 |
A visual pivot table is often represented as a Heatmap or a Grouped Bar Chart.
Using our multi_pivot data (Species vs. Island):
ggplot(multi_pivot, aes(x = island, y = species, fill = avg_mass)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(title = "Heatmap of Penguin Body Mass",
subtitle = "Visualizing the aggregated pivot table",
x = "Island",
y = "Species",
fill = "Avg Mass (g)") +
theme_minimal()| Excel Action | R Function (Tidyverse) |
|---|---|
| Rows / Columns | group_by() |
| Values (Sum, Avg, Count) | summarize() |
| Move Field to Columns | pivot_wider() |
| Clean messy wide data | pivot_longer() |
| Filters | filter() |
Task: Using the penguins dataset: 1.
Filter the data for female penguins only. 2. Create a pivot table
showing the maximum flipper_length_mm
grouped by species and sex. 3. Use
pivot_wider to make the sex the columns.
# Try it yourself!
exercise_result <- penguins %>%
filter(sex == "female") %>%
group_by(species, island) %>%
summarize(max_flipper = max(flipper_length_mm, na.rm = TRUE)) %>%
pivot_wider(names_from = island, values_from = max_flipper)
exercise_result %>% kable()| species | Biscoe | Dream | Torgersen |
|---|---|---|---|
| Adelie | 199 | 202 | 196 |
| Chinstrap | NA | 202 | NA |
| Gentoo | 222 | NA | NA |
```
File -> New File -> R Markdown.....Rmd.palmerpenguins library, which is the modern standard for
data science teaching.pivot_longer
and pivot_wider (the modern replacements for
melt and cast).ggplot2 heatmap
to help students bridge the gap between tables and charts.