1. Introduction

In spreadsheet software like Excel, a Pivot Table is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table.

In R, “pivoting” refers to two distinct but related tasks: 1. Reshaping: Moving data between “Wide” and “Long” formats. 2. Aggregating: Summarizing data (Calculating means, sums, etc.) across categories.

We primarily use the tidyr and dplyr packages (part of the tidyverse).

2. Reshaping Data: Wide vs. Long

Before we summarize data, we must understand the shape of it.

Wide Format

Each subject has one row, and variables are spread across multiple columns. (Great for data entry).

Long Format

Each row is a single observation. (Great for analysis and plotting).

3. Pivot Longer (`pivot_longer`)

Use this when you have column headers that are actually values, not variable names.

Example: Billboard Rankings

We will use a built-in dataset of song rankings.

# Load sample data
data("billboard")

# Look at the wide data
head(billboard[, 1:10])

## # A tibble: 6 × 10
##   artist       track      date.entered   wk1   wk2   wk3   wk4   wk5   wk6   wk7
##   <chr>        <chr>      <date>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 Pac        Baby Don'… 2000-02-26      87    82    72    77    87    94    99
## 2 2Ge+her      The Harde… 2000-09-02      91    87    92    NA    NA    NA    NA
## 3 3 Doors Down Kryptonite 2000-04-08      81    70    68    67    66    57    54
## 4 3 Doors Down Loser      2000-10-21      76    76    72    69    67    65    55
## 5 504 Boyz     Wobble Wo… 2000-04-15      57    34    25    17    17    31    36
## 6 98^0         Give Me J… 2000-08-19      51    39    34    26    26    19     2

The columns wk1, wk2, etc., are “values” of time. To make this “Tidy”, we pivot:

billboard_long <- billboard %>%
  pivot_longer(
    cols = starts_with("wk"), 
    names_to = "week", 
    values_to = "rank",
    values_drop_na = TRUE
  )

head(billboard_long)

## # A tibble: 6 × 5
##   artist track                   date.entered week   rank
##   <chr>  <chr>                   <date>       <chr> <dbl>
## 1 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk1      87
## 2 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk2      82
## 3 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk3      72
## 4 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk4      77
## 5 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk5      87
## 6 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk6      94

4. Pivot Wider (`pivot_wider`)

Use this to create a summary table or a “report-style” view where categories become column headers.

# Let's take the long data and turn it back to wide
billboard_wide_again <- billboard_long %>%
  pivot_wider(
    names_from = week,
    values_from = rank
  )

head(billboard_wide_again[, 1:7])

## # A tibble: 6 × 7
##   artist       track                   date.entered   wk1   wk2   wk3   wk4
##   <chr>        <chr>                   <date>       <dbl> <dbl> <dbl> <dbl>
## 1 2 Pac        Baby Don't Cry (Keep... 2000-02-26      87    82    72    77
## 2 2Ge+her      The Hardest Part Of ... 2000-09-02      91    87    92    NA
## 3 3 Doors Down Kryptonite              2000-04-08      81    70    68    67
## 4 3 Doors Down Loser                   2000-10-21      76    76    72    69
## 5 504 Boyz     Wobble Wobble           2000-04-15      57    34    25    17
## 6 98^0         Give Me Just One Nig... 2000-08-19      51    39    34    26

5. The “Real” Pivot Table: Aggregation

In Excel, you drag “Region” to Rows and “Sales” to Values. In R, we use group_by() and summarize().

Real-World Data: Gapminder

The gapminder dataset contains life expectancy, GDP per capita, and population by country over time.

data("gapminder")
head(gapminder)

## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

Task: Average Life Expectancy by Continent and Year

This is exactly what an Excel Pivot Table does.

gapminder_pivot <- gapminder %>%
  group_by(continent, year) %>%
  summarize(avg_lifeExp = mean(lifeExp)) %>%
  pivot_wider(names_from = year, values_from = avg_lifeExp)

knitr::kable(gapminder_pivot, caption = "Pivot Table: Avg Life Expectancy by Continent")

Pivot Table: Avg Life Expectancy by Continent
continent	1952	1957	1962	1967	1972	1977	1982	1987	1992	1997	2002	2007
Africa	39.13550	41.26635	43.31944	45.33454	47.45094	49.58042	51.59287	53.34479	53.62958	53.59827	53.32523	54.80604
Americas	53.27984	55.96028	58.39876	60.41092	62.39492	64.39156	66.22884	68.09072	69.56836	71.15048	72.42204	73.60812
Asia	46.31439	49.31854	51.56322	54.66364	57.31927	59.61056	62.61794	64.85118	66.53721	68.02052	69.23388	70.72848
Europe	64.40850	66.70307	68.53923	69.73760	70.77503	71.93777	72.80640	73.64217	74.44010	75.50517	76.70060	77.64860
Oceania	69.25500	70.29500	71.08500	71.31000	71.91000	72.85500	74.29000	75.32000	76.94500	78.19000	79.74000	80.71950

6. Visualizing Pivot Results

The reason we prefer Long Format in R is because ggplot2 requires it to map aesthetics (colors, facets) to variables.

# We use the LONG version of our summary for plotting
gapminder %>%
  group_by(continent, year) %>%
  summarize(avg_lifeExp = mean(lifeExp)) %>%
  ggplot(aes(x = year, y = avg_lifeExp, color = continent)) +
  geom_line(size = 1) +
  geom_point() +
  labs(title = "Life Expectancy over Time",
       subtitle = "Summarized using pivot-style grouping",
       y = "Average Life Expectancy",
       x = "Year") +
  theme_minimal()

7. Complex Pivoting: Multiple Values

What if you want to calculate the Mean AND the Median of life expectancy?

complex_pivot <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarize(
    Mean_LifeExp = mean(lifeExp),
    Max_LifeExp = max(lifeExp)
  )

complex_pivot

## # A tibble: 5 × 3
##   continent Mean_LifeExp Max_LifeExp
##   <fct>            <dbl>       <dbl>
## 1 Africa            54.8        76.4
## 2 Americas          73.6        80.7
## 3 Asia              70.7        82.6
## 4 Europe            77.6        81.8
## 5 Oceania           80.7        81.2

8. Summary Checklist

Feature	Excel Pivot	R (Tidyverse)
Rows	Rows Field	`group_by()`
Values	Values Field (Sum, Avg)	`summarize(mean(...))`
Columns	Columns Field	`pivot_wider()`
Filter	Filters	`filter()`

Key Functions to Remember:

pivot_longer(): Make data “tall” (Wide \(\rightarrow\) Long).
pivot_wider(): Make data “wide” (Long \(\rightarrow\) Wide).
group_by() %>% summarize(): The engine that calculates the numbers.

Practice Exercise

Dataset: msleep (Built-in R dataset about mammal sleep patterns).

Calculate the average sleep_total for each vore (carnivore, herbivore, etc.).
Calculate the count of animals in each category.
Pivot the results so that the vore types are column headers.

# Try it yourself!
msleep %>%
  drop_na(vore) %>%
  group_by(vore) %>%
  summarize(avg_sleep = mean(sleep_total)) %>%
  pivot_wider(names_from = vore, values_from = avg_sleep)

## # A tibble: 1 × 4
##   carni herbi insecti  omni
##   <dbl> <dbl>   <dbl> <dbl>
## 1  10.4  9.51    14.9  10.9

```

How to use this in RStudio:

Install necessary packages: Run install.packages(c("tidyverse", "gapminder")) in your console.
Create New File: File -> New File -> R Markdown.
Paste & Knit: Delete the default content, paste the code above, and click the Knit button (the ball of yarn icon) to generate a beautiful HTML report.

Why this structure works:

Logical Flow: It explains what a pivot is before showing how to do it.
Gapminder Data: Using gapminder is a standard in data science education because it’s clean but reflects real-world trends (increasing life expectancy).
Side-by-Side Comparison: The table in Section 8 helps Excel users translate their existing knowledge into code.

Lecture: Data Reshaping and Pivot Tables in R

Balkisa Ibrahim Abdirahman

2026-04-16

1. Introduction

2. Reshaping Data: Wide vs. Long

Wide Format

Long Format

3. Pivot Longer (`pivot_longer`)

Example: Billboard Rankings

4. Pivot Wider (`pivot_wider`)

5. The “Real” Pivot Table: Aggregation

Real-World Data: Gapminder

Task: Average Life Expectancy by Continent and Year

6. Visualizing Pivot Results

7. Complex Pivoting: Multiple Values

8. Summary Checklist

Key Functions to Remember:

Practice Exercise

How to use this in RStudio:

Why this structure works:

Lecture: Data Reshaping and Pivot Tables in R

Balkisa Ibrahim Abdirahman

2026-04-16

1. Introduction

2. Reshaping Data: Wide vs. Long

Wide Format

Long Format

3. Pivot Longer (pivot_longer)

Example: Billboard Rankings

4. Pivot Wider (pivot_wider)

5. The “Real” Pivot Table: Aggregation

Real-World Data: Gapminder

Task: Average Life Expectancy by Continent and Year

6. Visualizing Pivot Results

7. Complex Pivoting: Multiple Values

8. Summary Checklist

Key Functions to Remember:

Practice Exercise

How to use this in RStudio:

Why this structure works:

3. Pivot Longer (`pivot_longer`)

4. Pivot Wider (`pivot_wider`)