1. Introduction

In spreadsheet software like Excel, Pivot Tables are the go-to tool for summarizing large datasets. In R, we achieve the same (and much more) using the tidyverse ecosystem, specifically the dplyr package for aggregation and the tidyr package for reshaping.

Why Pivot?

  1. Summarization: Turning thousands of rows into a meaningful summary table (e.g., Average sales per region).
  2. Tidying Data: Converting data into a format suitable for analysis or plotting.
  3. Reporting: Creating wide-format tables that are easier for humans to read.

2. The Dataset: Palmer Penguins

We will use the penguins dataset. It contains measurements for penguin species, islands in the Palmer Archipelago, size (flipper length, body mass), and sex.

# Load and view the first few rows
data("penguins")
head(penguins) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Adelie Torgersen NA NA NA NA NA 2007
Adelie Torgersen 36.7 19.3 193 3450 female 2007
Adelie Torgersen 39.3 20.6 190 3650 male 2007

3. Part A: Summarizing Data (The “Excel” Pivot)

In R, a pivot table summary is created by combining group_by() and summarize().

Example 1: Basic Summary

Let’s find the average body mass and count of penguins per species.

pivot_summary <- penguins %>%
  filter(!is.na(sex)) %>% # Remove rows with missing sex
  group_by(species, island) %>%
  summarize(
    avg_mass_g = mean(body_mass_g, na.rm = TRUE),
    count = n()
  ) %>%
  ungroup()

pivot_summary %>% 
  kable(caption = "Summary Pivot Table") %>% 
  kable_styling(full_width = F)
Summary Pivot Table
species island avg_mass_g count
Adelie Biscoe 3709.659 44
Adelie Dream 3701.364 55
Adelie Torgersen 3708.511 47
Chinstrap Dream 3733.088 68
Gentoo Biscoe 5092.437 119

Visualizing the Summary

Pivot tables are often precursors to visualizations.

ggplot(pivot_summary, aes(x = species, y = avg_mass_g, fill = island)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(title = "Average Penguin Body Mass", y = "Mass (g)", x = "Species")
Figure 1: Average Body Mass by Species and Island

Figure 1: Average Body Mass by Species and Island


4. Part B: Reshaping Data

Often, “pivoting” refers to changing the layout of the data from Long to Wide or vice-versa.

4.1 pivot_wider(): Making it Readable

Use pivot_wider() when you want to transform categories into columns. This is common for creating comparison reports.

Task: Create a table where species are rows, islands are columns, and the values are the average body mass.

wide_table <- pivot_summary %>%
  select(species, island, avg_mass_g) %>%
  pivot_wider(
    names_from = island, 
    values_from = avg_mass_g
  )

wide_table %>% 
  kable(caption = "Wide Format Table (Human Readable)") %>% 
  kable_styling(full_width = F)
Wide Format Table (Human Readable)
species Biscoe Dream Torgersen
Adelie 3709.659 3701.364 3708.511
Chinstrap NA 3733.088 NA
Gentoo 5092.437 NA NA

4.2 pivot_longer(): Making it Analysis-Ready

Most R functions (like ggplot2) require Long Format data. Let’s assume we have data where “Year” is spread across columns and we need to collapse it.

Example:

# Creating a dummy 'wide' dataset
untidy_data <- data.frame(
  Country = c("USA", "Canada", "Mexico"),
  `2020` = c(100, 150, 120),
  `2021` = c(120, 160, 130),
  check.names = FALSE
)

# Pivoting Longer
tidy_data <- untidy_data %>%
  pivot_longer(
    cols = c(`2020`, `2021`), 
    names_to = "Year", 
    values_to = "GDP_Growth"
  )

tidy_data %>% kable() %>% kable_styling(full_width = F)
Country Year GDP_Growth
USA 2020 100
USA 2021 120
Canada 2020 150
Canada 2021 160
Mexico 2020 120
Mexico 2021 130

5. Advanced Pivot: Multiple Values

You can pivot multiple value columns at once. Let’s get both the mean and the max body mass into a wide format.

complex_pivot <- penguins %>%
  filter(!is.na(sex)) %>%
  group_by(species, sex) %>%
  summarize(
    avg_mass = mean(body_mass_g, na.rm = TRUE),
    max_mass = max(body_mass_g, na.rm = TRUE)
  ) %>%
  pivot_wider(
    names_from = sex,
    values_from = c(avg_mass, max_mass)
  )

complex_pivot %>% kable() %>% kable_styling(bootstrap_options = "condensed")
species avg_mass_female avg_mass_male max_mass_female max_mass_male
Adelie 3368.836 4043.493 3900 4775
Chinstrap 3527.206 3938.971 4150 4800
Gentoo 4679.741 5484.836 5200 6300

6. Summary Cheat Sheet

Goal Function
Group Data group_by(column_name)
Calculate Stats summarize(new_var = mean(value))
Rows to Columns pivot_wider(names_from = ..., values_from = ...)
Columns to Rows pivot_longer(cols = ..., names_to = ..., values_to = ...)

7. Exercise for Students

  1. Using the penguins dataset, create a pivot table showing the median flipper length for each species and sex.
  2. Filter out NA values.
  3. Pivot the result so that sex (male/female) are the column headers.

Solution Template

# Try this yourself!
penguins %>%
  filter(!is.na(sex)) %>%
  group_by(species, sex) %>%
  summarize(med_flipper = median(flipper_length_mm)) %>%
  pivot_wider(names_from = sex, values_from = med_flipper)

End of Lecture Notes. ```

How to use this in RStudio:

  1. Open RStudio.
  2. Go to File > New File > R Markdown…
  3. Delete all the default text in the new window.
  4. Paste the code provided above.
  5. Save the file as Pivot_Table_Lecture.Rmd.
  6. Click the Knit button (the ball of yarn icon) at the top of the editor.

Key Features of this Script:

  • Real Data: It uses the palmerpenguins dataset, which is a modern replacement for the Iris dataset.
  • Visuals: It includes a ggplot2 bar chart to show how pivot results are used in plotting.
  • Formatting: It uses kable and kableExtra to make the tables look like professional lecture handouts.
  • Comprehensive: It covers the two meanings of “pivoting” (aggregation and reshaping).