1. Introduction

In Excel, a Pivot Table is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table. In R, we achieve “Pivot Table” functionality primarily through the tidyverse suite of packages, specifically:

  • dplyr: For grouping and summarizing data.
  • tidyr: For reshaping data (changing the layout from long to wide and vice versa).

Why R for Pivot Tables?

  1. Reproducibility: You can re-run the script on new data without re-clicking buttons.
  2. Scalability: R can handle millions of rows that might crash Excel.
  3. Flexibility: You can integrate complex statistical functions easily.

2. The Dataset: Palmer Penguins

We will use the palmerpenguins dataset, which contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.

# Inspect the data
data("penguins")
head(penguins) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Adelie Torgersen NA NA NA NA NA 2007
Adelie Torgersen 36.7 19.3 193 3450 female 2007
Adelie Torgersen 39.3 20.6 190 3650 male 2007

3. The Basic Summary (The “Pivot” Logic)

The most common pivot operation is calculating a summary statistic (like average or count) across categories.

Example: Average Body Mass by Species and Sex

In Excel, you would put species in Rows, sex in Columns, and body_mass_g in Values. In R:

pivot_summary <- penguins %>%
  filter(!is.na(sex)) %>% # Remove missing values
  group_by(species, sex) %>%
  summarize(mean_mass = mean(body_mass_g, na.rm = TRUE))

pivot_summary %>% 
  kable() %>% 
  kable_styling(full_width = F)
species sex mean_mass
Adelie female 3368.836
Adelie male 4043.493
Chinstrap female 3527.206
Chinstrap male 3938.971
Gentoo female 4679.741
Gentoo male 5484.836

4. Reshaping Data: pivot_wider()

The table above is in Long Format. To make it look like a traditional Excel Pivot Table, we use pivot_wider(). This spreads a column’s values into multiple new columns.

wide_pivot <- pivot_summary %>%
  pivot_wider(
    names_from = sex, 
    values_from = mean_mass
  )

wide_pivot %>% 
  kable(caption = "Traditional Pivot Table View (Wide Format)") %>% 
  kable_styling(full_width = F)
Traditional Pivot Table View (Wide Format)
species female male
Adelie 3368.836 4043.493
Chinstrap 3527.206 3938.971
Gentoo 4679.741 5484.836

Visualizing the Transformation

  • names_from: Which column’s values become the new headers? (e.g., female, male).
  • values_from: Which column contains the numbers to fill the cells?

5. Reshaping Data: pivot_longer()

Sometimes data comes in a wide format that is hard to analyze or plot. We use pivot_longer() to “un-pivot” it back to a tidy format.

Imagine we have a dataset of penguin measurements in a wide format:

# Creating a wide sample
wide_data <- penguins %>%
  slice(1:5) %>%
  select(species, bill_length_mm, bill_depth_mm)

wide_data %>% kable()
species bill_length_mm bill_depth_mm
Adelie 39.1 18.7
Adelie 39.5 17.4
Adelie 40.3 18.0
Adelie NA NA
Adelie 36.7 19.3
# Pivoting it Longer
long_data <- wide_data %>%
  pivot_longer(
    cols = starts_with("bill"), 
    names_to = "measurement_type", 
    values_to = "value"
  )

long_data %>% kable()
species measurement_type value
Adelie bill_length_mm 39.1
Adelie bill_depth_mm 18.7
Adelie bill_length_mm 39.5
Adelie bill_depth_mm 17.4
Adelie bill_length_mm 40.3
Adelie bill_depth_mm 18.0
Adelie bill_length_mm NA
Adelie bill_depth_mm NA
Adelie bill_length_mm 36.7
Adelie bill_depth_mm 19.3

6. Real-World Application: Multi-Metric Pivot

Let’s create a complex summary table that shows both the count of penguins and their average flipper length across islands and species.

complex_pivot <- penguins %>%
  group_by(species, island) %>%
  summarize(
    n = n(),
    avg_flipper = mean(flipper_length_mm, na.rm = TRUE)
  ) %>%
  pivot_wider(
    names_from = island,
    values_from = c(n, avg_flipper),
    values_fill = 0 # Fill NAs with 0
  )

complex_pivot %>% 
  kable() %>% 
  kable_styling(bootstrap_options = "bordered")
species n_Biscoe n_Dream n_Torgersen avg_flipper_Biscoe avg_flipper_Dream avg_flipper_Torgersen
Adelie 44 56 52 188.7955 189.7321 191.1961
Chinstrap 0 68 0 0.0000 195.8235 0.0000
Gentoo 124 0 0 217.1870 0.0000 0.0000

7. Visualizing Pivoted Data

One of the main reasons we pivot data in R is to prepare it for ggplot2.

Figure 1: Visualizing Grouped Summaries

penguins %>%
  filter(!is.na(sex)) %>%
  group_by(species, island, sex) %>%
  summarize(avg_mass = mean(body_mass_g, na.rm = TRUE)) %>%
  ggplot(aes(x = species, y = avg_mass, fill = sex)) +
  geom_col(position = "dodge") +
  facet_wrap(~island) +
  theme_minimal() +
  labs(title = "Average Body Mass by Species, Island, and Sex",
       y = "Mean Body Mass (g)",
       x = "Species") +
  scale_fill_brewer(palette = "Set1")


8. Summary Table Table Cheat Sheet

Excel Term R Equivalent Description
Rows/Columns group_by() Defines the categories.
Values summarize() Defines the math (Sum, Mean, Count).
Pivot Layout pivot_wider() Moves data from rows to columns.
Un-pivot pivot_longer() Moves data from columns to rows.
Filters filter() Subsets the data before or after pivoting.

9. Practice Exercise

Task: Using the penguins dataset: 1. Filter for only the “Dream” island. 2. Pivot the data to show the maximum bill_length_mm for each species and sex. 3. Make the final table wide, so that sex is the column headers.

# Solution
exercise_result <- penguins %>%
  filter(island == "Dream", !is.na(sex)) %>%
  group_by(species, sex) %>%
  summarize(max_bill = max(bill_length_mm, na.rm = TRUE)) %>%
  pivot_wider(names_from = sex, values_from = max_bill)

exercise_result %>% kable()
species female male
Adelie 42.2 44.1
Chinstrap 58.0 55.8

End of Lecture Note ```

How to use this:

  1. Open RStudio.
  2. Go to File -> New File -> R Markdown....
  3. Give it a title and click OK.
  4. Delete all the default text in the editor.
  5. Copy and paste the code block above into the editor.
  6. Ensure you have the required packages installed by running this in your console: install.packages(c("tidyverse", "palmerpenguins", "knitr", "kableExtra"))
  7. Click the Knit button at the top of the editor.

Key Features of this Note:

  • Real Data: Uses palmerpenguins, which is the industry standard for teaching R data manipulation.
  • Comparison: Explicitly maps Excel terminology to R functions.
  • Visuals: Includes a ggplot2 chart to show why reshaping data is important.
  • Formatting: Uses kableExtra to make the tables look professional and readable in the output.

library(palmerpenguins) library(kableExtra)