---
title: "Mastering Pivot Tables in R"
subtitle: "Data Transformation with the Tidyverse"
author: "Data Science Department"
date: "2026-04-16"
output:
html_document:
toc: true
toc_float: true
theme: cosmo
highlight: tango
---
# 1. Introduction
In Excel, a **Pivot Table** is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table. In R, we achieve the same (and much more) using the `tidyverse` suite of packages, specifically `dplyr` for aggregation and `tidyr` for reshaping.
### Key Concepts
1. **Grouping**: Which categories do we want to see in rows?
2. **Summarizing**: What math are we doing (Sum, Mean, Count)?
3. **Pivoting**: How do we want to orient the columns and rows?
---
# 2. The Data
We will use a simulated **Retail Sales Dataset**. This contains transaction records across different regions, product categories, and time.
``` r
# Create a realistic sales dataset
set.seed(42)
sales_data <- data.frame(
Date = rep(seq(as.Date("2023-01-01"), as.Date("2023-12-01"), by="month"), each=4),
Region = rep(c("North", "South", "East", "West"), 12),
Category = sample(c("Electronics", "Furniture", "Clothing"), 48, replace = TRUE),
Sales = round(runif(48, 1000, 5000), 0),
Quantity = sample(1:10, 48, replace = TRUE)
)
# View the first few rows
kable(head(sales_data)) %>% kable_styling(bootstrap_options = c("striped", "hover"))
| Date | Region | Category | Sales | Quantity |
|---|---|---|---|---|
| 2023-01-01 | North | Electronics | 1758 | 7 |
| 2023-01-01 | South | Electronics | 2085 | 10 |
| 2023-01-01 | East | Electronics | 4313 | 2 |
| 2023-01-01 | West | Electronics | 3773 | 5 |
| 2023-02-01 | North | Furniture | 1962 | 6 |
| 2023-02-01 | South | Furniture | 1172 | 3 |
The most basic pivot table calculates a total based on one variable.
In R, we use group_by() and summarize().
pivot_1 <- sales_data %>%
group_by(Region) %>%
summarize(
Total_Sales = sum(Sales),
Avg_Qty = mean(Quantity),
Transactions = n()
)
kable(pivot_1) %>% kable_styling(full_width = F)
| Region | Total_Sales | Avg_Qty | Transactions |
|---|---|---|---|
| East | 34951 | 6.166667 | 12 |
| North | 31768 | 5.083333 | 12 |
| South | 34086 | 7.166667 | 12 |
| West | 35032 | 5.250000 | 12 |
To create a “Cross-tab” (like a standard Excel Pivot Table), we use
pivot_wider().
pivot_2 <- sales_data %>%
group_by(Region, Category) %>%
summarize(Total_Sales = sum(Sales)) %>%
pivot_wider(names_from = Category, values_from = Total_Sales)
kable(pivot_2) %>% kable_styling(full_width = F)
| Region | Clothing | Electronics | Furniture |
|---|---|---|---|
| East | 3936 | 17066 | 13949 |
| North | 9400 | 6882 | 15486 |
| South | 11354 | 6387 | 16345 |
| West | 9761 | 12990 | 12281 |
Often, some combinations (e.g., North Region + Furniture) might have
no data. We can fill those NAs with zeros.
pivot_3 <- sales_data %>%
group_by(Region, Category) %>%
summarize(Total_Sales = sum(Sales)) %>%
pivot_wider(
names_from = Category,
values_from = Total_Sales,
values_fill = 0 # Replace NAs with 0
) %>%
mutate(Grand_Total = Clothing + Electronics + Furniture) # Adding a total column
kable(pivot_3) %>% kable_styling(full_width = F)
| Region | Clothing | Electronics | Furniture | Grand_Total |
|---|---|---|---|---|
| East | 3936 | 17066 | 13949 | 34951 |
| North | 9400 | 6882 | 15486 | 31768 |
| South | 11354 | 6387 | 16345 | 34086 |
| West | 9761 | 12990 | 12281 | 35032 |
A pivot table is often best understood as a Heatmap.
# Preparing data for heatmap
plot_data <- sales_data %>%
group_by(Region, Category) %>%
summarize(Total_Sales = sum(Sales))
ggplot(plot_data, aes(x = Category, y = Region, fill = Total_Sales)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "#e3f2fd", high = "#0d47a1") +
geom_text(aes(label = scales::dollar(Total_Sales)), color = "black") +
theme_minimal() +
labs(title = "Sales Heatmap by Region and Category",
subtitle = "Visual representation of a Pivot Table",
fill = "Revenue")
Sometimes you receive data that is already “Pivoted” (Wide format), but R needs it in “Long” format for analysis or plotting.
wide_data <- pivot_2 # From example 4
print(wide_data)
## # A tibble: 4 × 4
## # Groups: Region [4]
## Region Clothing Electronics Furniture
## <chr> <dbl> <dbl> <dbl>
## 1 East 3936 17066 13949
## 2 North 9400 6882 15486
## 3 South 11354 6387 16345
## 4 West 9761 12990 12281
long_data <- wide_data %>%
pivot_longer(
cols = c(Clothing, Electronics, Furniture),
names_to = "Product_Category",
values_to = "Sales_Amount"
)
kable(head(long_data)) %>% kable_styling(full_width = F)
| Region | Product_Category | Sales_Amount |
|---|---|---|
| East | Clothing | 3936 |
| East | Electronics | 17066 |
| East | Furniture | 13949 |
| North | Clothing | 9400 |
| North | Electronics | 6882 |
| North | Furniture | 15486 |
If you miss the “Drag and Drop” feel of Excel, you can use the
rpivotTable package.
Note: This requires the rpivotTable library
installed.
# Install if needed
# install.packages("rpivotTable")
library(rpivotTable)
rpivotTable(sales_data, rows="Region", cols="Category", aggregatorName="Sum", vals="Sales")
| Excel Action | R Tidyverse Command |
|---|---|
| Select Data | df %>% |
| Rows / Columns | group_by(Variable) |
| Values (Summarize) | summarize(Result = sum(Value)) |
| Change Layout | pivot_wider() or
pivot_longer() |
| Filter | filter(Variable == "Criteria") |
Using the built-in diamonds dataset: 1. Create a pivot
table showing the average price of diamonds grouped by
cut (rows) and color (columns). 2.
Filter the data to only include diamonds with a carat
greater than 1.0. 3. Plot the result as a bar chart.
# Sample Solution
diamonds %>%
filter(carat > 1.0) %>%
group_by(cut, color) %>%
summarize(mean_price = mean(price)) %>%
pivot_wider(names_from = color, values_from = mean_price) %>%
kable(caption = "Exercise: Avg Price by Cut and Color") %>%
kable_styling()
| cut | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|
| Fair | 7213.741 | 6620.571 | 6998.541 | 7383.731 | 7545.500 | 6824.217 | 7188.179 |
| Good | 7146.239 | 7581.790 | 7166.548 | 7940.004 | 8027.114 | 8909.765 | 7025.711 |
| Very Good | 8481.019 | 8286.706 | 8544.886 | 8368.524 | 8145.186 | 8712.360 | 7746.332 |
| Premium | 8403.586 | 8715.706 | 8412.197 | 8634.356 | 8112.005 | 8815.368 | 8471.312 |
| Ideal | 9075.383 | 8657.260 | 9227.478 | 9016.481 | 8045.969 | 8619.710 | 7993.977 |
```
File -> New File -> R Markdown...pivot_lecture.Rmd.sales_data object so students can see exactly how the data
structure changes.pivot_wider and pivot_longer, which are the
modern industry standards replacing the older reshape2 or
spread/gather functions.ggplot2 heatmap
example, bridging the gap between tabular data and data
visualization.kableExtra for clean, readable tables.