---
title: "Mastering Pivot Tables in R"
subtitle: "Data Transformation with the Tidyverse"
author: "Data Science Department"
date: "2026-04-16"
output:
  html_document:
    toc: true
    toc_float: true
    theme: cosmo
    highlight: tango
---



# 1. Introduction
In Excel, a **Pivot Table** is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table. In R, we achieve the same (and much more) using the `tidyverse` suite of packages, specifically `dplyr` for aggregation and `tidyr` for reshaping.

### Key Concepts
1.  **Grouping**: Which categories do we want to see in rows?
2.  **Summarizing**: What math are we doing (Sum, Mean, Count)?
3.  **Pivoting**: How do we want to orient the columns and rows?

---

# 2. The Data
We will use a simulated **Retail Sales Dataset**. This contains transaction records across different regions, product categories, and time.


``` r
# Create a realistic sales dataset
set.seed(42)
sales_data <- data.frame(
  Date = rep(seq(as.Date("2023-01-01"), as.Date("2023-12-01"), by="month"), each=4),
  Region = rep(c("North", "South", "East", "West"), 12),
  Category = sample(c("Electronics", "Furniture", "Clothing"), 48, replace = TRUE),
  Sales = round(runif(48, 1000, 5000), 0),
  Quantity = sample(1:10, 48, replace = TRUE)
)

# View the first few rows
kable(head(sales_data)) %>% kable_styling(bootstrap_options = c("striped", "hover"))

Date	Region	Category	Sales	Quantity
2023-01-01	North	Electronics	1758	7
2023-01-01	South	Electronics	2085	10
2023-01-01	East	Electronics	4313	2
2023-01-01	West	Electronics	3773	5
2023-02-01	North	Furniture	1962	6
2023-02-01	South	Furniture	1172	3

3. Level 1: Basic Aggregation (The “Row” Summary)

The most basic pivot table calculates a total based on one variable. In R, we use group_by() and summarize().

Total Sales by Region

pivot_1 <- sales_data %>%
  group_by(Region) %>%
  summarize(
    Total_Sales = sum(Sales),
    Avg_Qty = mean(Quantity),
    Transactions = n()
  )

kable(pivot_1) %>% kable_styling(full_width = F)

Region	Total_Sales	Avg_Qty	Transactions
East	34951	6.166667	12
North	31768	5.083333	12
South	34086	7.166667	12
West	35032	5.250000	12

4. Level 2: Two-Dimensional Pivot (Rows and Columns)

To create a “Cross-tab” (like a standard Excel Pivot Table), we use pivot_wider().

Goal: Rows = Region, Columns = Category, Values = Sum of Sales

pivot_2 <- sales_data %>%
  group_by(Region, Category) %>%
  summarize(Total_Sales = sum(Sales)) %>%
  pivot_wider(names_from = Category, values_from = Total_Sales)

kable(pivot_2) %>% kable_styling(full_width = F)

Region	Clothing	Electronics	Furniture
East	3936	17066	13949
North	9400	6882	15486
South	11354	6387	16345
West	9761	12990	12281

5. Level 3: Handling Missing Values and Totals

Often, some combinations (e.g., North Region + Furniture) might have no data. We can fill those NAs with zeros.

pivot_3 <- sales_data %>%
  group_by(Region, Category) %>%
  summarize(Total_Sales = sum(Sales)) %>%
  pivot_wider(
    names_from = Category, 
    values_from = Total_Sales,
    values_fill = 0  # Replace NAs with 0
  ) %>%
  mutate(Grand_Total = Clothing + Electronics + Furniture) # Adding a total column

kable(pivot_3) %>% kable_styling(full_width = F)

Region	Clothing	Electronics	Furniture	Grand_Total
East	3936	17066	13949	34951
North	9400	6882	15486	31768
South	11354	6387	16345	34086
West	9761	12990	12281	35032

6. Visualizing the Pivot Table

A pivot table is often best understood as a Heatmap.

# Preparing data for heatmap
plot_data <- sales_data %>%
  group_by(Region, Category) %>%
  summarize(Total_Sales = sum(Sales))

ggplot(plot_data, aes(x = Category, y = Region, fill = Total_Sales)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "#e3f2fd", high = "#0d47a1") +
  geom_text(aes(label = scales::dollar(Total_Sales)), color = "black") +
  theme_minimal() +
  labs(title = "Sales Heatmap by Region and Category",
       subtitle = "Visual representation of a Pivot Table",
       fill = "Revenue")

7. Pivoting Longer (Un-pivoting)

Sometimes you receive data that is already “Pivoted” (Wide format), but R needs it in “Long” format for analysis or plotting.

The “Wide” Data we have:

wide_data <- pivot_2 # From example 4
print(wide_data)

## # A tibble: 4 × 4
## # Groups:   Region [4]
##   Region Clothing Electronics Furniture
##   <chr>     <dbl>       <dbl>     <dbl>
## 1 East       3936       17066     13949
## 2 North      9400        6882     15486
## 3 South     11354        6387     16345
## 4 West       9761       12990     12281

Transforming it back to “Long”:

long_data <- wide_data %>%
  pivot_longer(
    cols = c(Clothing, Electronics, Furniture), 
    names_to = "Product_Category", 
    values_to = "Sales_Amount"
  )

kable(head(long_data)) %>% kable_styling(full_width = F)

Region	Product_Category	Sales_Amount
East	Clothing	3936
East	Electronics	17066
East	Furniture	13949
North	Clothing	9400
North	Electronics	6882
North	Furniture	15486

8. Interactive Pivot Tables in R

If you miss the “Drag and Drop” feel of Excel, you can use the rpivotTable package.

Note: This requires the rpivotTable library installed.

# Install if needed
# install.packages("rpivotTable")
library(rpivotTable)

rpivotTable(sales_data, rows="Region", cols="Category", aggregatorName="Sum", vals="Sales")

9. Summary Checklist

Excel Action	R Tidyverse Command
Select Data	`df %>%`
Rows / Columns	`group_by(Variable)`
Values (Summarize)	`summarize(Result = sum(Value))`
Change Layout	`pivot_wider()` or `pivot_longer()`
Filter	`filter(Variable == "Criteria")`

10. Exercise for Students

Using the built-in diamonds dataset: 1. Create a pivot table showing the average price of diamonds grouped by cut (rows) and color (columns). 2. Filter the data to only include diamonds with a carat greater than 1.0. 3. Plot the result as a bar chart.

# Sample Solution
diamonds %>%
  filter(carat > 1.0) %>%
  group_by(cut, color) %>%
  summarize(mean_price = mean(price)) %>%
  pivot_wider(names_from = color, values_from = mean_price) %>%
  kable(caption = "Exercise: Avg Price by Cut and Color") %>%
  kable_styling()

Exercise: Avg Price by Cut and Color
cut	D	E	F	G	H	I	J
Fair	7213.741	6620.571	6998.541	7383.731	7545.500	6824.217	7188.179
Good	7146.239	7581.790	7166.548	7940.004	8027.114	8909.765	7025.711
Very Good	8481.019	8286.706	8544.886	8368.524	8145.186	8712.360	7746.332
Premium	8403.586	8715.706	8412.197	8634.356	8112.005	8815.368	8471.312
Ideal	9075.383	8657.260	9227.478	9016.481	8045.969	8619.710	7993.977

```

How to use this:

Open RStudio.
Go to File -> New File -> R Markdown...
Delete all the default text in the new file.
Paste the code block above.
Save it as pivot_lecture.Rmd.
Click the Knit button (the yarn icon at the top).

Key Features of this Lecture Note:

Real Data Simulation: It creates a sales_data object so students can see exactly how the data structure changes.
Tidyverse Workflow: It teaches pivot_wider and pivot_longer, which are the modern industry standards replacing the older reshape2 or spread/gather functions.
Visuals: It includes a ggplot2 heatmap example, bridging the gap between tabular data and data visualization.
Professional Formatting: Uses kableExtra for clean, readable tables.

pivot table

sundus khadar cabdi

2026-04-16