In Excel, a Pivot Table is a tool used to summarize, sort, reorganize, group, count, total, or average data stored in a table. In R, we achieve this using two primary concepts:
tidyr.group_by() and summarize() from
dplyr.We will use the penguins dataset, which contains size
measurements for three penguin species observed on three islands in the
Palmer Archipelago, Antarctica.
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## # ℹ 2 more variables: sex <fct>, year <int>
In Excel, you drag “Species” to Rows and “Island” to Columns, then
“Body Mass” to Values. In R, we use a combination of
group_by, summarize, and
pivot_wider.
First, we calculate the average body mass for each species on each island.
summary_df <- penguins %>%
filter(!is.na(body_mass_g)) %>% # Remove missing values
group_by(species, island) %>%
summarize(avg_mass = mean(body_mass_g))
kable(summary_df)| species | island | avg_mass |
|---|---|---|
| Adelie | Biscoe | 3709.659 |
| Adelie | Dream | 3688.393 |
| Adelie | Torgersen | 3706.373 |
| Chinstrap | Dream | 3733.088 |
| Gentoo | Biscoe | 5076.016 |
To make it look like a classic 2D spreadsheet pivot table, we use
pivot_wider().
pivot_table_wide <- summary_df %>%
pivot_wider(names_from = island, values_from = avg_mass)
kable(pivot_table_wide, caption = "Wide Format Pivot Table (Average Body Mass)")| species | Biscoe | Dream | Torgersen |
|---|---|---|---|
| Adelie | 3709.659 | 3688.393 | 3706.373 |
| Chinstrap | NA | 3733.088 | NA |
| Gentoo | 5076.016 | NA | NA |
pivot_longer()Sometimes data comes in a “Wide” format (like the table above), but
R’s plotting libraries (like ggplot2) prefer “Long”
format.
Scenario: Suppose we have a dataset where years are columns, and we want them in a single “Year” column.
# Let's create a dummy wide dataset
wide_data <- data.frame(
Country = c("USA", "Canada"),
Year_2020 = c(100, 150),
Year_2021 = c(120, 180)
)
# Convert to long format
long_data <- wide_data %>%
pivot_longer(
cols = starts_with("Year"),
names_to = "Year",
values_to = "GDP"
)
kable(long_data)| Country | Year | GDP |
|---|---|---|
| USA | Year_2020 | 100 |
| USA | Year_2021 | 120 |
| Canada | Year_2020 | 150 |
| Canada | Year_2021 | 180 |
Pivot tables are often precursors to visualization. Let’s visualize the distribution of flipper lengths across species and sex.
penguins %>%
filter(!is.na(sex)) %>%
group_by(species, sex) %>%
summarize(mean_flipper = mean(flipper_length_mm)) %>%
ggplot(aes(x = species, y = mean_flipper, fill = sex)) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
labs(title = "Aggregated Penguin Data",
y = "Mean Flipper Length (mm)",
x = "Species")Figure 1: Mean Flipper Length by Species and Sex
You can pivot multiple value columns at once. For example, let’s find
the mean AND max body mass for each
species.
advanced_pivot <- penguins %>%
filter(!is.na(body_mass_g)) %>%
group_by(species) %>%
summarize(
Average = mean(body_mass_g),
Maximum = max(body_mass_g)
) %>%
pivot_longer(cols = c(Average, Maximum),
names_to = "Statistic",
values_to = "Value")
kable(advanced_pivot)| species | Statistic | Value |
|---|---|---|
| Adelie | Average | 3700.662 |
| Adelie | Maximum | 4775.000 |
| Chinstrap | Average | 3733.088 |
| Chinstrap | Maximum | 4800.000 |
| Gentoo | Average | 5076.016 |
| Gentoo | Maximum | 6300.000 |
| Action | R Function |
|---|---|
| Group Data | group_by() |
| Calculate Values | summarize() |
| Rows to Columns | pivot_wider(names_from = ..., values_from = ...) |
| Columns to Rows | pivot_longer(cols = ..., names_to = ..., values_to = ...) |
| Count Rows | count() or n() |
Task: Using the penguins dataset: 1.
Filter out rows where sex is NA. 2. Create a
pivot table showing the count of penguins for each
island (rows) and species (columns).
# Solution
exercise_result <- penguins %>%
filter(!is.na(sex)) %>%
group_by(island, species) %>%
tally() %>%
pivot_wider(names_from = species, values_from = n, values_fill = 0)
kable(exercise_result)| island | Adelie | Gentoo | Chinstrap |
|---|---|---|---|
| Biscoe | 44 | 119 | 0 |
| Dream | 55 | 0 | 68 |
| Torgersen | 47 | 0 | 0 |
```
File -> New File ->
R Markdown....install.packages(c("tidyverse", "palmerpenguins"))palmerpenguins.group_by and
summarize act as the “engine” of a pivot table.pivot_wider (to make
it look like Excel) and pivot_longer (to make it ready for
plotting).ggplot2 bar
chart.library(palmerpenguins)