1. Introduction

In Excel, a Pivot Table is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table. In R, we achieve this using two primary concepts:

  1. Reshaping: Converting data between “Wide” and “Long” formats using tidyr.
  2. Aggregation: Summarizing data using group_by() and summarize() from dplyr.

2. The Dataset: Palmer Penguins

We will use the penguins dataset, which contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.

# Inspect the data
head(penguins)
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           NA            NA                  NA          NA
## 5 Adelie  Torgersen           36.7          19.3               193        3450
## 6 Adelie  Torgersen           39.3          20.6               190        3650
## # ℹ 2 more variables: sex <fct>, year <int>

3. The “Excel-Style” Pivot Table

In Excel, you drag “Species” to Rows and “Island” to Columns, then “Body Mass” to Values. In R, we use a combination of group_by, summarize, and pivot_wider.

Step 1: Create the Summary (Aggregation)

First, we calculate the average body mass for each species on each island.

summary_df <- penguins %>%
  filter(!is.na(body_mass_g)) %>% # Remove missing values
  group_by(species, island) %>%
  summarize(avg_mass = mean(body_mass_g))

kable(summary_df)
species island avg_mass
Adelie Biscoe 3709.659
Adelie Dream 3688.393
Adelie Torgersen 3706.373
Chinstrap Dream 3733.088
Gentoo Biscoe 5076.016

Step 2: Pivot to Wide Format

To make it look like a classic 2D spreadsheet pivot table, we use pivot_wider().

pivot_table_wide <- summary_df %>%
  pivot_wider(names_from = island, values_from = avg_mass)

kable(pivot_table_wide, caption = "Wide Format Pivot Table (Average Body Mass)")
Wide Format Pivot Table (Average Body Mass)
species Biscoe Dream Torgersen
Adelie 3709.659 3688.393 3706.373
Chinstrap NA 3733.088 NA
Gentoo 5076.016 NA NA

4. Understanding pivot_longer()

Sometimes data comes in a “Wide” format (like the table above), but R’s plotting libraries (like ggplot2) prefer “Long” format.

Scenario: Suppose we have a dataset where years are columns, and we want them in a single “Year” column.

# Let's create a dummy wide dataset
wide_data <- data.frame(
  Country = c("USA", "Canada"),
  Year_2020 = c(100, 150),
  Year_2021 = c(120, 180)
)

# Convert to long format
long_data <- wide_data %>%
  pivot_longer(
    cols = starts_with("Year"), 
    names_to = "Year", 
    values_to = "GDP"
  )

kable(long_data)
Country Year GDP
USA Year_2020 100
USA Year_2021 120
Canada Year_2020 150
Canada Year_2021 180

5. Visualizing the Pivot Table

Pivot tables are often precursors to visualization. Let’s visualize the distribution of flipper lengths across species and sex.

penguins %>%
  filter(!is.na(sex)) %>%
  group_by(species, sex) %>%
  summarize(mean_flipper = mean(flipper_length_mm)) %>%
  ggplot(aes(x = species, y = mean_flipper, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(title = "Aggregated Penguin Data",
       y = "Mean Flipper Length (mm)",
       x = "Species")
Figure 1: Mean Flipper Length by Species and Sex

Figure 1: Mean Flipper Length by Species and Sex


6. Advanced Pivoting: Multiple Values

You can pivot multiple value columns at once. For example, let’s find the mean AND max body mass for each species.

advanced_pivot <- penguins %>%
  filter(!is.na(body_mass_g)) %>%
  group_by(species) %>%
  summarize(
    Average = mean(body_mass_g),
    Maximum = max(body_mass_g)
  ) %>%
  pivot_longer(cols = c(Average, Maximum), 
               names_to = "Statistic", 
               values_to = "Value")

kable(advanced_pivot)
species Statistic Value
Adelie Average 3700.662
Adelie Maximum 4775.000
Chinstrap Average 3733.088
Chinstrap Maximum 4800.000
Gentoo Average 5076.016
Gentoo Maximum 6300.000

7. Summary Cheat Sheet

Action R Function
Group Data group_by()
Calculate Values summarize()
Rows to Columns pivot_wider(names_from = ..., values_from = ...)
Columns to Rows pivot_longer(cols = ..., names_to = ..., values_to = ...)
Count Rows count() or n()

8. Exercise

Task: Using the penguins dataset: 1. Filter out rows where sex is NA. 2. Create a pivot table showing the count of penguins for each island (rows) and species (columns).

# Solution
exercise_result <- penguins %>%
  filter(!is.na(sex)) %>%
  group_by(island, species) %>%
  tally() %>%
  pivot_wider(names_from = species, values_from = n, values_fill = 0)

kable(exercise_result)
island Adelie Gentoo Chinstrap
Biscoe 44 119 0
Dream 55 0 68
Torgersen 47 0 0

```


How to use this:

  1. Open RStudio.
  2. Go to File -> New File -> R Markdown....
  3. Delete all the default text in the new file.
  4. Copy and paste the code block above into the file.
  5. Install the required packages if you haven’t already by running this in your console: install.packages(c("tidyverse", "palmerpenguins"))
  6. Click the Knit button at the top of RStudio to generate the HTML lecture notes.

Key Concepts included:

  • Real Data: Uses palmerpenguins.
  • Aggregating: Shows how group_by and summarize act as the “engine” of a pivot table.
  • Reshaping: Shows pivot_wider (to make it look like Excel) and pivot_longer (to make it ready for plotting).
  • Visuals: Includes a ggplot2 bar chart.

library(palmerpenguins)