1. Introduction

In spreadsheet software like Excel, a Pivot Table is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table.

In R, we achieve “Pivot Table” functionality using two primary packages from the Tidyverse: 1. dplyr: For summarizing and aggregating data (The “Summary” part). 2. tidyr: For reshaping data between “long” and “wide” formats (The “Layout” part).


2. The Dataset

We will use the palmerpenguins dataset, which contains observations of three penguin species in Antarctica.

# Load data
data("penguins")

# View the first few rows
head(penguins) %>% kable()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Adelie Torgersen NA NA NA NA NA 2007
Adelie Torgersen 36.7 19.3 193 3450 female 2007
Adelie Torgersen 39.3 20.6 190 3650 male 2007

3. Part 1: Aggregation (Summarizing Data)

The most common use of a pivot table is to group data and calculate a metric (Sum, Mean, Count).

3.1 Basic Pivot: Mean Body Mass by Species

In Excel, you would drag “Species” to Rows and “Body Mass” to Values. In R:

pivot_summary <- penguins %>%
  group_by(species) %>%
  summarize(
    avg_mass_g = mean(body_mass_g, na.rm = TRUE),
    count = n()
  )

pivot_summary %>% kable()
species avg_mass_g count
Adelie 3700.662 152
Chinstrap 3733.088 68
Gentoo 5076.016 124

3.2 Multi-Level Pivot: Species and Island

To see the average mass across both species and the island they inhabit:

multi_pivot <- penguins %>%
  group_by(species, island) %>%
  summarize(
    avg_mass = mean(body_mass_g, na.rm = TRUE)
  ) %>%
  ungroup()

multi_pivot %>% kable()
species island avg_mass
Adelie Biscoe 3709.659
Adelie Dream 3688.393
Adelie Torgersen 3706.373
Chinstrap Dream 3733.088
Gentoo Biscoe 5076.016

4. Part 2: Reshaping Data (pivot_wider)

The output above is in Long Format. However, a traditional Pivot Table often puts one variable in the columns. We use pivot_wider() to achieve this.

4.1 Creating a Cross-Tabulation

Let’s turn the islands into columns to compare species mass side-by-side.

wide_pivot <- multi_pivot %>%
  pivot_wider(
    names_from = island, 
    values_from = avg_mass
  )

wide_pivot %>% kable(caption = "Average Body Mass (g) by Island")
Average Body Mass (g) by Island
species Biscoe Dream Torgersen
Adelie 3709.659 3688.393 3706.373
Chinstrap NA 3733.088 NA
Gentoo 5076.016 NA NA
  • names_from: The column whose values will become the new column headers.
  • values_from: The column whose values will fill the cells.

5. Part 3: Reshaping Data (pivot_longer)

Often, data arrives in a “Wide” format (like an Excel sheet) that is hard for R to plot. We use pivot_longer() to make it “Tidy.”

5.1 The Scenario

Imagine we have a dataset of yearly sales:

sales_data <- data.frame(
  Region = c("North", "South"),
  Year_2021 = c(100, 150),
  Year_2022 = c(120, 180),
  Year_2023 = c(130, 200)
)

sales_data %>% kable()
Region Year_2021 Year_2022 Year_2023
North 100 120 130
South 150 180 200

5.2 Converting to Long Format

To analyze this properly, we want “Year” in one column and “Revenue” in another.

long_sales <- sales_data %>%
  pivot_longer(
    cols = starts_with("Year"), 
    names_to = "Year", 
    values_to = "Revenue",
    names_prefix = "Year_"
  )

long_sales %>% kable()
Region Year Revenue
North 2021 100
North 2022 120
North 2023 130
South 2021 150
South 2022 180
South 2023 200

6. Visualizing Pivot Tables

A visual pivot table is often represented as a Heatmap or a Grouped Bar Chart.

6.1 Heatmap (The Visual Pivot Table)

Using our multi_pivot data (Species vs. Island):

ggplot(multi_pivot, aes(x = island, y = species, fill = avg_mass)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Heatmap of Penguin Body Mass",
       subtitle = "Visualizing the aggregated pivot table",
       x = "Island",
       y = "Species",
       fill = "Avg Mass (g)") +
  theme_minimal()


7. Summary Table: Excel vs. R

Excel Action R Function (Tidyverse)
Rows / Columns group_by()
Values (Sum, Avg, Count) summarize()
Move Field to Columns pivot_wider()
Clean messy wide data pivot_longer()
Filters filter()

8. Lab Exercise

Task: Using the penguins dataset: 1. Filter the data for female penguins only. 2. Create a pivot table showing the maximum flipper_length_mm grouped by species and sex. 3. Use pivot_wider to make the sex the columns.

# Try it yourself!
exercise_result <- penguins %>%
  filter(sex == "female") %>%
  group_by(species, island) %>%
  summarize(max_flipper = max(flipper_length_mm, na.rm = TRUE)) %>%
  pivot_wider(names_from = island, values_from = max_flipper)

exercise_result %>% kable()
species Biscoe Dream Torgersen
Adelie 199 202 196
Chinstrap NA 202 NA
Gentoo 222 NA NA

```

How to use this:

  1. Open RStudio.
  2. Go to File -> New File -> R Markdown....
  3. Clear all the default text.
  4. Paste the code above.
  5. Save the file as .Rmd.
  6. Click the Knit button (top of the editor).

Key Features of this Note:

  • Dynamic Data: It uses the palmerpenguins library, which is the modern standard for data science teaching.
  • Tidyverse Syntax: It uses pivot_longer and pivot_wider (the modern replacements for melt and cast).
  • Visuals: It includes a ggplot2 heatmap to help students bridge the gap between tables and charts.
  • Comparison: A handy table helps Excel users translate their knowledge to R code.