Data Manipulation: Mastering Pivot Tables in R

1. Introduction

In Excel, a Pivot Table is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table. In R, we achieve this using two primary concepts:

Reshaping: Converting data between “Wide” and “Long” formats using tidyr.
Aggregation: Summarizing data using group_by() and summarize() from dplyr.

2. The Dataset: Palmer Penguins

We will use the penguins dataset, which contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.

# Inspect the data
head(penguins)

## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           NA            NA                  NA          NA
## 5 Adelie  Torgersen           36.7          19.3               193        3450
## 6 Adelie  Torgersen           39.3          20.6               190        3650
## # ℹ 2 more variables: sex <fct>, year <int>

3. The “Excel-Style” Pivot Table

In Excel, you drag “Species” to Rows and “Island” to Columns, then “Body Mass” to Values. In R, we use a combination of group_by, summarize, and pivot_wider.

Step 1: Create the Summary (Aggregation)

First, we calculate the average body mass for each species on each island.

summary_df <- penguins %>%
  filter(!is.na(body_mass_g)) %>% # Remove missing values
  group_by(species, island) %>%
  summarize(avg_mass = mean(body_mass_g))

kable(summary_df)

species	island	avg_mass
Adelie	Biscoe	3709.659
Adelie	Dream	3688.393
Adelie	Torgersen	3706.373
Chinstrap	Dream	3733.088
Gentoo	Biscoe	5076.016

Step 2: Pivot to Wide Format

To make it look like a classic 2D spreadsheet pivot table, we use pivot_wider().

pivot_table_wide <- summary_df %>%
  pivot_wider(names_from = island, values_from = avg_mass)

kable(pivot_table_wide, caption = "Wide Format Pivot Table (Average Body Mass)")

Wide Format Pivot Table (Average Body Mass)
species	Biscoe	Dream	Torgersen
Adelie	3709.659	3688.393	3706.373
Chinstrap	NA	3733.088	NA
Gentoo	5076.016	NA	NA

4. Understanding `pivot_longer()`

Sometimes data comes in a “Wide” format (like the table above), but R’s plotting libraries (like ggplot2) prefer “Long” format.

Scenario: Suppose we have a dataset where years are columns, and we want them in a single “Year” column.

# Let's create a dummy wide dataset
wide_data <- data.frame(
  Country = c("USA", "Canada"),
  Year_2020 = c(100, 150),
  Year_2021 = c(120, 180)
)

# Convert to long format
long_data <- wide_data %>%
  pivot_longer(
    cols = starts_with("Year"), 
    names_to = "Year", 
    values_to = "GDP"
  )

kable(long_data)

Country	Year	GDP
USA	Year_2020	100
USA	Year_2021	120
Canada	Year_2020	150
Canada	Year_2021	180

5. Visualizing the Pivot Table

Pivot tables are often precursors to visualization. Let’s visualize the distribution of flipper lengths across species and sex.

penguins %>%
  filter(!is.na(sex)) %>%
  group_by(species, sex) %>%
  summarize(mean_flipper = mean(flipper_length_mm)) %>%
  ggplot(aes(x = species, y = mean_flipper, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(title = "Aggregated Penguin Data",
       y = "Mean Flipper Length (mm)",
       x = "Species")

Figure 1: Mean Flipper Length by Species and Sex

6. Advanced Pivoting: Multiple Values

You can pivot multiple value columns at once. For example, let’s find the mean AND max body mass for each species.

advanced_pivot <- penguins %>%
  filter(!is.na(body_mass_g)) %>%
  group_by(species) %>%
  summarize(
    Average = mean(body_mass_g),
    Maximum = max(body_mass_g)
  ) %>%
  pivot_longer(cols = c(Average, Maximum), 
               names_to = "Statistic", 
               values_to = "Value")

kable(advanced_pivot)

species	Statistic	Value
Adelie	Average	3700.662
Adelie	Maximum	4775.000
Chinstrap	Average	3733.088
Chinstrap	Maximum	4800.000
Gentoo	Average	5076.016
Gentoo	Maximum	6300.000

7. Summary Cheat Sheet

Action	R Function
Group Data	`group_by()`
Calculate Values	`summarize()`
Rows to Columns	`pivot_wider(names_from = ..., values_from = ...)`
Columns to Rows	`pivot_longer(cols = ..., names_to = ..., values_to = ...)`
Count Rows	`count()` or `n()`

8. Exercise

Task: Using the penguins dataset: 1. Filter out rows where sex is NA. 2. Create a pivot table showing the count of penguins for each island (rows) and species (columns).

# Solution
exercise_result <- penguins %>%
  filter(!is.na(sex)) %>%
  group_by(island, species) %>%
  tally() %>%
  pivot_wider(names_from = species, values_from = n, values_fill = 0)

kable(exercise_result)

island	Adelie	Gentoo	Chinstrap
Biscoe	44	119	0
Dream	55	0	68
Torgersen	47	0	0

```

How to use this:

Open RStudio.
Go to File -> New File -> R Markdown....
Delete all the default text in the new file.
Copy and paste the code block above into the file.
Install the required packages if you haven’t already by running this in your console: install.packages(c("tidyverse", "palmerpenguins"))
Click the Knit button at the top of RStudio to generate the HTML lecture notes.

Key Concepts included:

Real Data: Uses palmerpenguins.
Aggregating: Shows how group_by and summarize act as the “engine” of a pivot table.
Reshaping: Shows pivot_wider (to make it look like Excel) and pivot_longer (to make it ready for plotting).
Visuals: Includes a ggplot2 bar chart.

library(palmerpenguins)

Data Manipulation: Mastering Pivot Tables in R

Reshaping and Summarizing Data with tidyr and dplyr

Hannan Hassan Nouh

2026-04-16

1. Introduction

2. The Dataset: Palmer Penguins

3. The “Excel-Style” Pivot Table

Step 1: Create the Summary (Aggregation)

Step 2: Pivot to Wide Format

4. Understanding `pivot_longer()`

5. Visualizing the Pivot Table

6. Advanced Pivoting: Multiple Values

7. Summary Cheat Sheet

8. Exercise

How to use this:

Key Concepts included:

Data Manipulation: Mastering Pivot Tables in R

Reshaping and Summarizing Data with tidyr and dplyr

Hannan Hassan Nouh

2026-04-16

1. Introduction

2. The Dataset: Palmer Penguins

3. The “Excel-Style” Pivot Table

Step 1: Create the Summary (Aggregation)

Step 2: Pivot to Wide Format

4. Understanding pivot_longer()

5. Visualizing the Pivot Table

6. Advanced Pivoting: Multiple Values

7. Summary Cheat Sheet

8. Exercise

How to use this:

Key Concepts included:

4. Understanding `pivot_longer()`