1. Introduction

In spreadsheet software like Excel, a Pivot Table is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table.

In R, “pivoting” refers to two distinct but related tasks: 1. Reshaping: Moving data between “Wide” and “Long” formats. 2. Aggregating: Summarizing data (Calculating means, sums, etc.) across categories.

We primarily use the tidyr and dplyr packages (part of the tidyverse).


2. Reshaping Data: Wide vs. Long

Before we summarize data, we must understand the shape of it.

Wide Format

Each subject has one row, and variables are spread across multiple columns. (Great for data entry).

Long Format

Each row is a single observation. (Great for analysis and plotting).


3. Pivot Longer (pivot_longer)

Use this when you have column headers that are actually values, not variable names.

Example: Billboard Rankings

We will use a built-in dataset of song rankings.

# Load sample data
data("billboard")

# Look at the wide data
head(billboard[, 1:10])
## # A tibble: 6 × 10
##   artist       track      date.entered   wk1   wk2   wk3   wk4   wk5   wk6   wk7
##   <chr>        <chr>      <date>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 Pac        Baby Don'… 2000-02-26      87    82    72    77    87    94    99
## 2 2Ge+her      The Harde… 2000-09-02      91    87    92    NA    NA    NA    NA
## 3 3 Doors Down Kryptonite 2000-04-08      81    70    68    67    66    57    54
## 4 3 Doors Down Loser      2000-10-21      76    76    72    69    67    65    55
## 5 504 Boyz     Wobble Wo… 2000-04-15      57    34    25    17    17    31    36
## 6 98^0         Give Me J… 2000-08-19      51    39    34    26    26    19     2

The columns wk1, wk2, etc., are “values” of time. To make this “Tidy”, we pivot:

billboard_long <- billboard %>%
  pivot_longer(
    cols = starts_with("wk"), 
    names_to = "week", 
    values_to = "rank",
    values_drop_na = TRUE
  )

head(billboard_long)
## # A tibble: 6 × 5
##   artist track                   date.entered week   rank
##   <chr>  <chr>                   <date>       <chr> <dbl>
## 1 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk1      87
## 2 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk2      82
## 3 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk3      72
## 4 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk4      77
## 5 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk5      87
## 6 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk6      94

4. Pivot Wider (pivot_wider)

Use this to create a summary table or a “report-style” view where categories become column headers.

# Let's take the long data and turn it back to wide
billboard_wide_again <- billboard_long %>%
  pivot_wider(
    names_from = week,
    values_from = rank
  )

head(billboard_wide_again[, 1:7])
## # A tibble: 6 × 7
##   artist       track                   date.entered   wk1   wk2   wk3   wk4
##   <chr>        <chr>                   <date>       <dbl> <dbl> <dbl> <dbl>
## 1 2 Pac        Baby Don't Cry (Keep... 2000-02-26      87    82    72    77
## 2 2Ge+her      The Hardest Part Of ... 2000-09-02      91    87    92    NA
## 3 3 Doors Down Kryptonite              2000-04-08      81    70    68    67
## 4 3 Doors Down Loser                   2000-10-21      76    76    72    69
## 5 504 Boyz     Wobble Wobble           2000-04-15      57    34    25    17
## 6 98^0         Give Me Just One Nig... 2000-08-19      51    39    34    26

5. The “Real” Pivot Table: Aggregation

In Excel, you drag “Region” to Rows and “Sales” to Values. In R, we use group_by() and summarize().

Real-World Data: Gapminder

The gapminder dataset contains life expectancy, GDP per capita, and population by country over time.

data("gapminder")
head(gapminder)
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

Task: Average Life Expectancy by Continent and Year

This is exactly what an Excel Pivot Table does.

gapminder_pivot <- gapminder %>%
  group_by(continent, year) %>%
  summarize(avg_lifeExp = mean(lifeExp)) %>%
  pivot_wider(names_from = year, values_from = avg_lifeExp)

knitr::kable(gapminder_pivot, caption = "Pivot Table: Avg Life Expectancy by Continent")
Pivot Table: Avg Life Expectancy by Continent
continent 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007
Africa 39.13550 41.26635 43.31944 45.33454 47.45094 49.58042 51.59287 53.34479 53.62958 53.59827 53.32523 54.80604
Americas 53.27984 55.96028 58.39876 60.41092 62.39492 64.39156 66.22884 68.09072 69.56836 71.15048 72.42204 73.60812
Asia 46.31439 49.31854 51.56322 54.66364 57.31927 59.61056 62.61794 64.85118 66.53721 68.02052 69.23388 70.72848
Europe 64.40850 66.70307 68.53923 69.73760 70.77503 71.93777 72.80640 73.64217 74.44010 75.50517 76.70060 77.64860
Oceania 69.25500 70.29500 71.08500 71.31000 71.91000 72.85500 74.29000 75.32000 76.94500 78.19000 79.74000 80.71950

6. Visualizing Pivot Results

The reason we prefer Long Format in R is because ggplot2 requires it to map aesthetics (colors, facets) to variables.

# We use the LONG version of our summary for plotting
gapminder %>%
  group_by(continent, year) %>%
  summarize(avg_lifeExp = mean(lifeExp)) %>%
  ggplot(aes(x = year, y = avg_lifeExp, color = continent)) +
  geom_line(size = 1) +
  geom_point() +
  labs(title = "Life Expectancy over Time",
       subtitle = "Summarized using pivot-style grouping",
       y = "Average Life Expectancy",
       x = "Year") +
  theme_minimal()


7. Complex Pivoting: Multiple Values

What if you want to calculate the Mean AND the Median of life expectancy?

complex_pivot <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarize(
    Mean_LifeExp = mean(lifeExp),
    Max_LifeExp = max(lifeExp)
  )

complex_pivot
## # A tibble: 5 × 3
##   continent Mean_LifeExp Max_LifeExp
##   <fct>            <dbl>       <dbl>
## 1 Africa            54.8        76.4
## 2 Americas          73.6        80.7
## 3 Asia              70.7        82.6
## 4 Europe            77.6        81.8
## 5 Oceania           80.7        81.2

8. Summary Checklist

Feature Excel Pivot R (Tidyverse)
Rows Rows Field group_by()
Values Values Field (Sum, Avg) summarize(mean(...))
Columns Columns Field pivot_wider()
Filter Filters filter()

Key Functions to Remember:

  1. pivot_longer(): Make data “tall” (Wide \(\rightarrow\) Long).
  2. pivot_wider(): Make data “wide” (Long \(\rightarrow\) Wide).
  3. group_by() %>% summarize(): The engine that calculates the numbers.

Practice Exercise

Dataset: msleep (Built-in R dataset about mammal sleep patterns).

  1. Calculate the average sleep_total for each vore (carnivore, herbivore, etc.).
  2. Calculate the count of animals in each category.
  3. Pivot the results so that the vore types are column headers.
# Try it yourself!
msleep %>%
  drop_na(vore) %>%
  group_by(vore) %>%
  summarize(avg_sleep = mean(sleep_total)) %>%
  pivot_wider(names_from = vore, values_from = avg_sleep)
## # A tibble: 1 × 4
##   carni herbi insecti  omni
##   <dbl> <dbl>   <dbl> <dbl>
## 1  10.4  9.51    14.9  10.9

```

How to use this in RStudio:

  1. Install necessary packages: Run install.packages(c("tidyverse", "gapminder")) in your console.
  2. Create New File: File -> New File -> R Markdown.
  3. Paste & Knit: Delete the default content, paste the code above, and click the Knit button (the ball of yarn icon) to generate a beautiful HTML report.

Why this structure works:

  • Logical Flow: It explains what a pivot is before showing how to do it.
  • Gapminder Data: Using gapminder is a standard in data science education because it’s clean but reflects real-world trends (increasing life expectancy).
  • Side-by-Side Comparison: The table in Section 8 helps Excel users translate their existing knowledge into code.