In Excel, a Pivot Table is a tool used to summarize,
sort, reorganize, group, count, total, or average data stored in a
table. In R, we achieve “Pivot Table” functionality primarily through
the tidyverse suite of packages, specifically:
dplyr: For grouping and summarizing
data.tidyr: For reshaping data (changing
the layout from long to wide and vice versa).We will use the palmerpenguins dataset, which contains
size measurements for three penguin species observed on three islands in
the Palmer Archipelago, Antarctica.
# Inspect the data
data("penguins")
head(penguins) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"))| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|---|
| Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
| Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
| Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
| Adelie | Torgersen | NA | NA | NA | NA | NA | 2007 |
| Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
| Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
The most common pivot operation is calculating a summary statistic (like average or count) across categories.
In Excel, you would put species in Rows,
sex in Columns, and body_mass_g in Values. In
R:
pivot_summary <- penguins %>%
filter(!is.na(sex)) %>% # Remove missing values
group_by(species, sex) %>%
summarize(mean_mass = mean(body_mass_g, na.rm = TRUE))
pivot_summary %>%
kable() %>%
kable_styling(full_width = F)| species | sex | mean_mass |
|---|---|---|
| Adelie | female | 3368.836 |
| Adelie | male | 4043.493 |
| Chinstrap | female | 3527.206 |
| Chinstrap | male | 3938.971 |
| Gentoo | female | 4679.741 |
| Gentoo | male | 5484.836 |
pivot_wider()The table above is in Long Format. To make it look
like a traditional Excel Pivot Table, we use pivot_wider().
This spreads a column’s values into multiple new columns.
wide_pivot <- pivot_summary %>%
pivot_wider(
names_from = sex,
values_from = mean_mass
)
wide_pivot %>%
kable(caption = "Traditional Pivot Table View (Wide Format)") %>%
kable_styling(full_width = F)| species | female | male |
|---|---|---|
| Adelie | 3368.836 | 4043.493 |
| Chinstrap | 3527.206 | 3938.971 |
| Gentoo | 4679.741 | 5484.836 |
names_from: Which column’s values
become the new headers? (e.g., female, male).values_from: Which column contains the
numbers to fill the cells?pivot_longer()Sometimes data comes in a wide format that is hard to analyze or
plot. We use pivot_longer() to “un-pivot” it back to a tidy
format.
Imagine we have a dataset of penguin measurements in a wide format:
# Creating a wide sample
wide_data <- penguins %>%
slice(1:5) %>%
select(species, bill_length_mm, bill_depth_mm)
wide_data %>% kable()| species | bill_length_mm | bill_depth_mm |
|---|---|---|
| Adelie | 39.1 | 18.7 |
| Adelie | 39.5 | 17.4 |
| Adelie | 40.3 | 18.0 |
| Adelie | NA | NA |
| Adelie | 36.7 | 19.3 |
# Pivoting it Longer
long_data <- wide_data %>%
pivot_longer(
cols = starts_with("bill"),
names_to = "measurement_type",
values_to = "value"
)
long_data %>% kable()| species | measurement_type | value |
|---|---|---|
| Adelie | bill_length_mm | 39.1 |
| Adelie | bill_depth_mm | 18.7 |
| Adelie | bill_length_mm | 39.5 |
| Adelie | bill_depth_mm | 17.4 |
| Adelie | bill_length_mm | 40.3 |
| Adelie | bill_depth_mm | 18.0 |
| Adelie | bill_length_mm | NA |
| Adelie | bill_depth_mm | NA |
| Adelie | bill_length_mm | 36.7 |
| Adelie | bill_depth_mm | 19.3 |
Let’s create a complex summary table that shows both the count of penguins and their average flipper length across islands and species.
complex_pivot <- penguins %>%
group_by(species, island) %>%
summarize(
n = n(),
avg_flipper = mean(flipper_length_mm, na.rm = TRUE)
) %>%
pivot_wider(
names_from = island,
values_from = c(n, avg_flipper),
values_fill = 0 # Fill NAs with 0
)
complex_pivot %>%
kable() %>%
kable_styling(bootstrap_options = "bordered")| species | n_Biscoe | n_Dream | n_Torgersen | avg_flipper_Biscoe | avg_flipper_Dream | avg_flipper_Torgersen |
|---|---|---|---|---|---|---|
| Adelie | 44 | 56 | 52 | 188.7955 | 189.7321 | 191.1961 |
| Chinstrap | 0 | 68 | 0 | 0.0000 | 195.8235 | 0.0000 |
| Gentoo | 124 | 0 | 0 | 217.1870 | 0.0000 | 0.0000 |
One of the main reasons we pivot data in R is to prepare it for
ggplot2.
penguins %>%
filter(!is.na(sex)) %>%
group_by(species, island, sex) %>%
summarize(avg_mass = mean(body_mass_g, na.rm = TRUE)) %>%
ggplot(aes(x = species, y = avg_mass, fill = sex)) +
geom_col(position = "dodge") +
facet_wrap(~island) +
theme_minimal() +
labs(title = "Average Body Mass by Species, Island, and Sex",
y = "Mean Body Mass (g)",
x = "Species") +
scale_fill_brewer(palette = "Set1")| Excel Term | R Equivalent | Description |
|---|---|---|
| Rows/Columns | group_by() |
Defines the categories. |
| Values | summarize() |
Defines the math (Sum, Mean, Count). |
| Pivot Layout | pivot_wider() |
Moves data from rows to columns. |
| Un-pivot | pivot_longer() |
Moves data from columns to rows. |
| Filters | filter() |
Subsets the data before or after pivoting. |
Task: Using the penguins dataset: 1.
Filter for only the “Dream” island. 2. Pivot the data to show the
maximum bill_length_mm for each species
and sex. 3. Make the final table wide, so that sex is the
column headers.
# Solution
exercise_result <- penguins %>%
filter(island == "Dream", !is.na(sex)) %>%
group_by(species, sex) %>%
summarize(max_bill = max(bill_length_mm, na.rm = TRUE)) %>%
pivot_wider(names_from = sex, values_from = max_bill)
exercise_result %>% kable()| species | female | male |
|---|---|---|
| Adelie | 42.2 | 44.1 |
| Chinstrap | 58.0 | 55.8 |
End of Lecture Note ```
File -> New File ->
R Markdown....install.packages(c("tidyverse", "palmerpenguins", "knitr", "kableExtra"))palmerpenguins, which
is the industry standard for teaching R data manipulation.ggplot2 chart to
show why reshaping data is important.kableExtra to make
the tables look professional and readable in the output.library(palmerpenguins) library(kableExtra)