---
title: "Mastering Pivot Tables in R"
subtitle: "Data Transformation with the Tidyverse"
author: "Data Science Department"
date: "2026-04-16"
output:
  html_document:
    toc: true
    toc_float: true
    theme: cosmo
    highlight: tango
---



# 1. Introduction
In Excel, a **Pivot Table** is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table. In R, we achieve the same (and much more) using the `tidyverse` suite of packages, specifically `dplyr` for aggregation and `tidyr` for reshaping.

### Key Concepts
1.  **Grouping**: Which categories do we want to see in rows?
2.  **Summarizing**: What math are we doing (Sum, Mean, Count)?
3.  **Pivoting**: How do we want to orient the columns and rows?

---

# 2. The Data
We will use a simulated **Retail Sales Dataset**. This contains transaction records across different regions, product categories, and time.


``` r
# Create a realistic sales dataset
set.seed(42)
sales_data <- data.frame(
  Date = rep(seq(as.Date("2023-01-01"), as.Date("2023-12-01"), by="month"), each=4),
  Region = rep(c("North", "South", "East", "West"), 12),
  Category = sample(c("Electronics", "Furniture", "Clothing"), 48, replace = TRUE),
  Sales = round(runif(48, 1000, 5000), 0),
  Quantity = sample(1:10, 48, replace = TRUE)
)

# View the first few rows
kable(head(sales_data)) %>% kable_styling(bootstrap_options = c("striped", "hover"))
Date Region Category Sales Quantity
2023-01-01 North Electronics 1758 7
2023-01-01 South Electronics 2085 10
2023-01-01 East Electronics 4313 2
2023-01-01 West Electronics 3773 5
2023-02-01 North Furniture 1962 6
2023-02-01 South Furniture 1172 3

3. Level 1: Basic Aggregation (The “Row” Summary)

The most basic pivot table calculates a total based on one variable. In R, we use group_by() and summarize().

Total Sales by Region

pivot_1 <- sales_data %>%
  group_by(Region) %>%
  summarize(
    Total_Sales = sum(Sales),
    Avg_Qty = mean(Quantity),
    Transactions = n()
  )

kable(pivot_1) %>% kable_styling(full_width = F)
Region Total_Sales Avg_Qty Transactions
East 34951 6.166667 12
North 31768 5.083333 12
South 34086 7.166667 12
West 35032 5.250000 12

4. Level 2: Two-Dimensional Pivot (Rows and Columns)

To create a “Cross-tab” (like a standard Excel Pivot Table), we use pivot_wider().

Goal: Rows = Region, Columns = Category, Values = Sum of Sales

pivot_2 <- sales_data %>%
  group_by(Region, Category) %>%
  summarize(Total_Sales = sum(Sales)) %>%
  pivot_wider(names_from = Category, values_from = Total_Sales)

kable(pivot_2) %>% kable_styling(full_width = F)
Region Clothing Electronics Furniture
East 3936 17066 13949
North 9400 6882 15486
South 11354 6387 16345
West 9761 12990 12281

5. Level 3: Handling Missing Values and Totals

Often, some combinations (e.g., North Region + Furniture) might have no data. We can fill those NAs with zeros.

pivot_3 <- sales_data %>%
  group_by(Region, Category) %>%
  summarize(Total_Sales = sum(Sales)) %>%
  pivot_wider(
    names_from = Category, 
    values_from = Total_Sales,
    values_fill = 0  # Replace NAs with 0
  ) %>%
  mutate(Grand_Total = Clothing + Electronics + Furniture) # Adding a total column

kable(pivot_3) %>% kable_styling(full_width = F)
Region Clothing Electronics Furniture Grand_Total
East 3936 17066 13949 34951
North 9400 6882 15486 31768
South 11354 6387 16345 34086
West 9761 12990 12281 35032

6. Visualizing the Pivot Table

A pivot table is often best understood as a Heatmap.

# Preparing data for heatmap
plot_data <- sales_data %>%
  group_by(Region, Category) %>%
  summarize(Total_Sales = sum(Sales))

ggplot(plot_data, aes(x = Category, y = Region, fill = Total_Sales)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "#e3f2fd", high = "#0d47a1") +
  geom_text(aes(label = scales::dollar(Total_Sales)), color = "black") +
  theme_minimal() +
  labs(title = "Sales Heatmap by Region and Category",
       subtitle = "Visual representation of a Pivot Table",
       fill = "Revenue")


7. Pivoting Longer (Un-pivoting)

Sometimes you receive data that is already “Pivoted” (Wide format), but R needs it in “Long” format for analysis or plotting.

The “Wide” Data we have:

wide_data <- pivot_2 # From example 4
print(wide_data)
## # A tibble: 4 × 4
## # Groups:   Region [4]
##   Region Clothing Electronics Furniture
##   <chr>     <dbl>       <dbl>     <dbl>
## 1 East       3936       17066     13949
## 2 North      9400        6882     15486
## 3 South     11354        6387     16345
## 4 West       9761       12990     12281

Transforming it back to “Long”:

long_data <- wide_data %>%
  pivot_longer(
    cols = c(Clothing, Electronics, Furniture), 
    names_to = "Product_Category", 
    values_to = "Sales_Amount"
  )

kable(head(long_data)) %>% kable_styling(full_width = F)
Region Product_Category Sales_Amount
East Clothing 3936
East Electronics 17066
East Furniture 13949
North Clothing 9400
North Electronics 6882
North Furniture 15486

8. Interactive Pivot Tables in R

If you miss the “Drag and Drop” feel of Excel, you can use the rpivotTable package.

Note: This requires the rpivotTable library installed.

# Install if needed
# install.packages("rpivotTable")
library(rpivotTable)

rpivotTable(sales_data, rows="Region", cols="Category", aggregatorName="Sum", vals="Sales")

9. Summary Checklist

Excel Action R Tidyverse Command
Select Data df %>%
Rows / Columns group_by(Variable)
Values (Summarize) summarize(Result = sum(Value))
Change Layout pivot_wider() or pivot_longer()
Filter filter(Variable == "Criteria")

10. Exercise for Students

Using the built-in diamonds dataset: 1. Create a pivot table showing the average price of diamonds grouped by cut (rows) and color (columns). 2. Filter the data to only include diamonds with a carat greater than 1.0. 3. Plot the result as a bar chart.

# Sample Solution
diamonds %>%
  filter(carat > 1.0) %>%
  group_by(cut, color) %>%
  summarize(mean_price = mean(price)) %>%
  pivot_wider(names_from = color, values_from = mean_price) %>%
  kable(caption = "Exercise: Avg Price by Cut and Color") %>%
  kable_styling()
Exercise: Avg Price by Cut and Color
cut D E F G H I J
Fair 7213.741 6620.571 6998.541 7383.731 7545.500 6824.217 7188.179
Good 7146.239 7581.790 7166.548 7940.004 8027.114 8909.765 7025.711
Very Good 8481.019 8286.706 8544.886 8368.524 8145.186 8712.360 7746.332
Premium 8403.586 8715.706 8412.197 8634.356 8112.005 8815.368 8471.312
Ideal 9075.383 8657.260 9227.478 9016.481 8045.969 8619.710 7993.977

```

How to use this:

  1. Open RStudio.
  2. Go to File -> New File -> R Markdown...
  3. Delete all the default text in the new file.
  4. Paste the code block above.
  5. Save it as pivot_lecture.Rmd.
  6. Click the Knit button (the yarn icon at the top).

Key Features of this Lecture Note:

  • Real Data Simulation: It creates a sales_data object so students can see exactly how the data structure changes.
  • Tidyverse Workflow: It teaches pivot_wider and pivot_longer, which are the modern industry standards replacing the older reshape2 or spread/gather functions.
  • Visuals: It includes a ggplot2 heatmap example, bridging the gap between tabular data and data visualization.
  • Professional Formatting: Uses kableExtra for clean, readable tables.