Data Transformation: Mastering Pivot Tables in R

1. Introduction

In Excel, a Pivot Table is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table. In R, we achieve “Pivot Table” functionality primarily through the tidyverse suite of packages, specifically:

dplyr: For grouping and summarizing data.
tidyr: For reshaping data (changing the layout from long to wide and vice versa).

Why R for Pivot Tables?

Reproducibility: You can re-run the script on new data without re-clicking buttons.
Scalability: R can handle millions of rows that might crash Excel.
Flexibility: You can integrate complex statistical functions easily.

2. The Dataset: Palmer Penguins

We will use the palmerpenguins dataset, which contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.

# Inspect the data
data("penguins")
head(penguins) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))

species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
Adelie	Torgersen	39.1	18.7	181	3750	male	2007
Adelie	Torgersen	39.5	17.4	186	3800	female	2007
Adelie	Torgersen	40.3	18.0	195	3250	female	2007
Adelie	Torgersen	NA	NA	NA	NA	NA	2007
Adelie	Torgersen	36.7	19.3	193	3450	female	2007
Adelie	Torgersen	39.3	20.6	190	3650	male	2007

3. The Basic Summary (The “Pivot” Logic)

The most common pivot operation is calculating a summary statistic (like average or count) across categories.

Example: Average Body Mass by Species and Sex

In Excel, you would put species in Rows, sex in Columns, and body_mass_g in Values. In R:

pivot_summary <- penguins %>%
  filter(!is.na(sex)) %>% # Remove missing values
  group_by(species, sex) %>%
  summarize(mean_mass = mean(body_mass_g, na.rm = TRUE))

pivot_summary %>% 
  kable() %>% 
  kable_styling(full_width = F)

species	sex	mean_mass
Adelie	female	3368.836
Adelie	male	4043.493
Chinstrap	female	3527.206
Chinstrap	male	3938.971
Gentoo	female	4679.741
Gentoo	male	5484.836

4. Reshaping Data: `pivot_wider()`

The table above is in Long Format. To make it look like a traditional Excel Pivot Table, we use pivot_wider(). This spreads a column’s values into multiple new columns.

wide_pivot <- pivot_summary %>%
  pivot_wider(
    names_from = sex, 
    values_from = mean_mass
  )

wide_pivot %>% 
  kable(caption = "Traditional Pivot Table View (Wide Format)") %>% 
  kable_styling(full_width = F)

Traditional Pivot Table View (Wide Format)
species	female	male
Adelie	3368.836	4043.493
Chinstrap	3527.206	3938.971
Gentoo	4679.741	5484.836

Visualizing the Transformation

names_from: Which column’s values become the new headers? (e.g., female, male).
values_from: Which column contains the numbers to fill the cells?

5. Reshaping Data: `pivot_longer()`

Sometimes data comes in a wide format that is hard to analyze or plot. We use pivot_longer() to “un-pivot” it back to a tidy format.

Imagine we have a dataset of penguin measurements in a wide format:

# Creating a wide sample
wide_data <- penguins %>%
  slice(1:5) %>%
  select(species, bill_length_mm, bill_depth_mm)

wide_data %>% kable()

species	bill_length_mm	bill_depth_mm
Adelie	39.1	18.7
Adelie	39.5	17.4
Adelie	40.3	18.0
Adelie	NA	NA
Adelie	36.7	19.3

# Pivoting it Longer
long_data <- wide_data %>%
  pivot_longer(
    cols = starts_with("bill"), 
    names_to = "measurement_type", 
    values_to = "value"
  )

long_data %>% kable()

species	measurement_type	value
Adelie	bill_length_mm	39.1
Adelie	bill_depth_mm	18.7
Adelie	bill_length_mm	39.5
Adelie	bill_depth_mm	17.4
Adelie	bill_length_mm	40.3
Adelie	bill_depth_mm	18.0
Adelie	bill_length_mm	NA
Adelie	bill_depth_mm	NA
Adelie	bill_length_mm	36.7
Adelie	bill_depth_mm	19.3

6. Real-World Application: Multi-Metric Pivot

Let’s create a complex summary table that shows both the count of penguins and their average flipper length across islands and species.

complex_pivot <- penguins %>%
  group_by(species, island) %>%
  summarize(
    n = n(),
    avg_flipper = mean(flipper_length_mm, na.rm = TRUE)
  ) %>%
  pivot_wider(
    names_from = island,
    values_from = c(n, avg_flipper),
    values_fill = 0 # Fill NAs with 0
  )

complex_pivot %>% 
  kable() %>% 
  kable_styling(bootstrap_options = "bordered")

species	n_Biscoe	n_Dream	n_Torgersen	avg_flipper_Biscoe	avg_flipper_Dream	avg_flipper_Torgersen
Adelie	44	56	52	188.7955	189.7321	191.1961
Chinstrap	0	68	0	0.0000	195.8235	0.0000
Gentoo	124	0	0	217.1870	0.0000	0.0000

7. Visualizing Pivoted Data

One of the main reasons we pivot data in R is to prepare it for ggplot2.

Figure 1: Visualizing Grouped Summaries

penguins %>%
  filter(!is.na(sex)) %>%
  group_by(species, island, sex) %>%
  summarize(avg_mass = mean(body_mass_g, na.rm = TRUE)) %>%
  ggplot(aes(x = species, y = avg_mass, fill = sex)) +
  geom_col(position = "dodge") +
  facet_wrap(~island) +
  theme_minimal() +
  labs(title = "Average Body Mass by Species, Island, and Sex",
       y = "Mean Body Mass (g)",
       x = "Species") +
  scale_fill_brewer(palette = "Set1")

8. Summary Table Table Cheat Sheet

Excel Term	R Equivalent	Description
Rows/Columns	`group_by()`	Defines the categories.
Values	`summarize()`	Defines the math (Sum, Mean, Count).
Pivot Layout	`pivot_wider()`	Moves data from rows to columns.
Un-pivot	`pivot_longer()`	Moves data from columns to rows.
Filters	`filter()`	Subsets the data before or after pivoting.

9. Practice Exercise

Task: Using the penguins dataset: 1. Filter for only the “Dream” island. 2. Pivot the data to show the maximum bill_length_mm for each species and sex. 3. Make the final table wide, so that sex is the column headers.

# Solution
exercise_result <- penguins %>%
  filter(island == "Dream", !is.na(sex)) %>%
  group_by(species, sex) %>%
  summarize(max_bill = max(bill_length_mm, na.rm = TRUE)) %>%
  pivot_wider(names_from = sex, values_from = max_bill)

exercise_result %>% kable()

species	female	male
Adelie	42.2	44.1
Chinstrap	58.0	55.8

End of Lecture Note ```

How to use this:

Open RStudio.
Go to File -> New File -> R Markdown....
Give it a title and click OK.
Delete all the default text in the editor.
Copy and paste the code block above into the editor.
Ensure you have the required packages installed by running this in your console: install.packages(c("tidyverse", "palmerpenguins", "knitr", "kableExtra"))
Click the Knit button at the top of the editor.

Key Features of this Note:

Real Data: Uses palmerpenguins, which is the industry standard for teaching R data manipulation.
Comparison: Explicitly maps Excel terminology to R functions.
Visuals: Includes a ggplot2 chart to show why reshaping data is important.
Formatting: Uses kableExtra to make the tables look professional and readable in the output.

library(palmerpenguins) library(kableExtra)

Data Transformation: Mastering Pivot Tables in R

Using tidyverse, dplyr, and tidyr

Naema Yousuf Ali

2026-04-16

1. Introduction

Why R for Pivot Tables?

2. The Dataset: Palmer Penguins

3. The Basic Summary (The “Pivot” Logic)

Example: Average Body Mass by Species and Sex

4. Reshaping Data: `pivot_wider()`

Visualizing the Transformation

5. Reshaping Data: `pivot_longer()`

6. Real-World Application: Multi-Metric Pivot

7. Visualizing Pivoted Data

Figure 1: Visualizing Grouped Summaries

8. Summary Table Table Cheat Sheet

9. Practice Exercise

How to use this:

Key Features of this Note:

Data Transformation: Mastering Pivot Tables in R

Using tidyverse, dplyr, and tidyr

Naema Yousuf Ali

2026-04-16

1. Introduction

Why R for Pivot Tables?

2. The Dataset: Palmer Penguins

3. The Basic Summary (The “Pivot” Logic)

Example: Average Body Mass by Species and Sex

4. Reshaping Data: pivot_wider()

Visualizing the Transformation

5. Reshaping Data: pivot_longer()

6. Real-World Application: Multi-Metric Pivot

7. Visualizing Pivoted Data

Figure 1: Visualizing Grouped Summaries

8. Summary Table Table Cheat Sheet

9. Practice Exercise

How to use this:

Key Features of this Note:

4. Reshaping Data: `pivot_wider()`

5. Reshaping Data: `pivot_longer()`