Beyond Tidyverse 101

Intro

Priyanka Gagneja

Data Analytics Consultant @ OnPoint Insights
Data Analytics Freelancer
Twitter: priyankaigit
Linkedin: priyanka-gagneja

Basics

select()
filter()

arrange()
mutate()

summarise()
group_by()

What did I miss

Important

The details are in the documentation!

Show me some action

Repeat an action across multiple columns at once

Include every level of a grouping variable, even levels that do not appear in the data

Conditional action

Change position of a variable

Sample data

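library(dplyr)  # tibble(), mutate(), across(), group_by(), summarise(), ...
library(tidyr)  # complete()
library(gt)     # gt() renders the tables shown on the slides

food <- tibble(
    food = c('Banana', 'Apple', 'Lemon','Potato', 'Tomato', 'Mango', 'Carrot'),
    type = c('fruit','fruit','vegetable','vegetable','vegetable','fruit','vegetable'),
    px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
    px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
    px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
) %>% 
  # 'staple' is declared as a level but never used, so empty groups can be demonstrated later
  mutate(type = factor(type, levels = c('fruit', 'vegetable','staple')))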

food %>% 
  gt()

food	type	px_2000_usd	px_2010_usd	px_2020_usd
Banana	fruit	5	7	8
Apple	fruit	10	9	9
Lemon	vegetable	5	7	8
Potato	vegetable	8	8	10
Tomato	vegetable	3	5	6
Mango	fruit	9	10	13
Carrot	vegetable	12	13	14

Repeat an action

food %>% 
  mutate(across(where(is.character), 
                stringr::str_to_lower)) %>% 
  head(3) %>% 
  gt()

food	type	px_2000_usd	px_2010_usd	px_2020_usd
banana	fruit	5	7	8
apple	fruit	10	9	9
lemon	vegetable	5	7	8

food %>% 
  group_by(type) %>% 
  summarise(across(where(is.numeric), 
                   list(mean = ~mean(.x, na.rm = TRUE)))) %>% 
  gt()

type	px_2000_usd_mean	px_2010_usd_mean	px_2020_usd_mean
fruit	8	8.666667	10.0
vegetable	7	8.250000	9.5

food %>% 
  select(-food) %>% 
  group_by(type) %>% 
  summarise_all(mean) %>% 
  gt()

type	px_2000_usd	px_2010_usd	px_2020_usd
fruit	8	8.666667	10.0
vegetable	7	8.250000	9.5
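
The across() call shown earlier takes a named list of functions, so several summaries can be computed in one pass. The sketch below is one possible extension, not part of the original slides: it rebuilds the sample data so it runs on its own, and the choice of mean and max plus the "{.fn}_{.col}" naming pattern are illustrative assumptions.

```r
library(dplyr)

# Same sample data as the slides, rebuilt so this block is self-contained
food <- tibble(
  food = c("Banana", "Apple", "Lemon", "Potato", "Tomato", "Mango", "Carrot"),
  type = factor(
    c("fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit", "vegetable"),
    levels = c("fruit", "vegetable", "staple")
  ),
  px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
  px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
  px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
)

# Apply two functions to every numeric column; .names controls the output names
price_summary <- food %>%
  group_by(type) %>%
  summarise(
    across(
      where(is.numeric),
      list(mean = mean, max = max),
      .names = "{.fn}_{.col}"
    )
  )

price_summary
```

Each price column yields two output columns here (for example mean_px_2000_usd and max_px_2000_usd), one per function in the list.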

Conditional action

_if, _at variants

select_if
summarise_at

food %>% 
  select_if(is.numeric,  list(~ paste0("numeric_", .))) %>% 
  gt()

numeric_px_2000_usd	numeric_px_2010_usd	numeric_px_2020_usd
5	7	8
10	9	9
5	7	8
8	8	10
3	5	6
9	10	13
12	13	14

food %>% 
  summarise_at(vars(matches("px")), mean) %>% 
  gt()

px_2000_usd	px_2010_usd	px_2020_usd
7.428571	8.428571	9.714286
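
The _if and _at variants still work, but since dplyr 1.0.0 they are marked as superseded in favour of across() and the where() selection helper. A minimal sketch of how the two examples above might be written in that style, assuming dplyr 1.0.0 or later; the object names renamed and px_means are illustrative only.

```r
library(dplyr)

# Same sample data as the slides, rebuilt so this block is self-contained
food <- tibble(
  food = c("Banana", "Apple", "Lemon", "Potato", "Tomato", "Mango", "Carrot"),
  type = factor(
    c("fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit", "vegetable"),
    levels = c("fruit", "vegetable", "staple")
  ),
  px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
  px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
  px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
)

# select_if(is.numeric, <rename function>) becomes select(where()) + rename_with()
renamed <- food %>%
  select(where(is.numeric)) %>%
  rename_with(~ paste0("numeric_", .x))

# summarise_at(vars(matches("px")), mean) becomes summarise(across())
px_means <- food %>%
  summarise(across(matches("px"), mean))

names(renamed)
px_means
```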

View all the grouping variables

complete
group_by

food %>% 
  count(type) %>% 
  complete(type) %>% 
  gt()

type	n
fruit	3
vegetable	4
staple	NA

Use the fill argument of complete() to replace NA with another value, such as 0 or 9999.
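
A minimal sketch of the fill argument, rebuilding the sample data so it runs on its own. fill takes a named list with one replacement value per column; 0L is used here on the assumption that a missing level should count as zero.

```r
library(dplyr)
library(tidyr)

# Same sample data as the slides, rebuilt so this block is self-contained
food <- tibble(
  food = c("Banana", "Apple", "Lemon", "Potato", "Tomato", "Mango", "Carrot"),
  type = factor(
    c("fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit", "vegetable"),
    levels = c("fruit", "vegetable", "staple")
  ),
  px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
  px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
  px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
)

# complete() adds the unused 'staple' level; fill replaces its NA count with 0
type_counts <- food %>%
  count(type) %>%
  complete(type, fill = list(n = 0L))

type_counts
```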

To keep zero-length groups in the output, set .drop = FALSE in group_by()

food %>% 
  group_by(type) %>% 
  summarise(avg_px_2020 = mean(px_2020_usd)) %>% 
  gt()

type	avg_px_2020
fruit	10.0
vegetable	9.5

food %>% 
  group_by(type, .drop = FALSE) %>% 
  summarise(avg_px_2020 = mean(px_2020_usd)) %>% 
  gt()

type	avg_px_2020
fruit	10.0
vegetable	9.5
staple	NaN
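
The same .drop argument is also available in count(), which gives a zero count for the empty level without a separate complete() step. The sketch below rebuilds the sample data so it runs on its own; the column name n_items is an illustrative choice. It also shows why the mean for staple is NaN: the group has no rows, so mean() is taken over an empty vector.

```r
library(dplyr)

# Same sample data as the slides, rebuilt so this block is self-contained
food <- tibble(
  food = c("Banana", "Apple", "Lemon", "Potato", "Tomato", "Mango", "Carrot"),
  type = factor(
    c("fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit", "vegetable"),
    levels = c("fruit", "vegetable", "staple")
  ),
  px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
  px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
  px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
)

# count() keeps the unused factor level when .drop = FALSE
type_counts <- food %>%
  count(type, .drop = FALSE)

# Group size next to the mean: the empty group has 0 rows, hence NaN
type_summary <- food %>%
  group_by(type, .drop = FALSE) %>%
  summarise(n_items = n(), avg_px_2020 = mean(px_2020_usd))

type_counts
type_summary
```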

Change position of a variable

relocate
mutate

food %>%
  relocate(food, .after=type) %>% 
  gt()

type	food	px_2000_usd	px_2010_usd	px_2020_usd
fruit	Banana	5	7	8
fruit	Apple	10	9	9
vegetable	Lemon	5	7	8
vegetable	Potato	8	8	10
vegetable	Tomato	3	5	6
fruit	Mango	9	10	13
vegetable	Carrot	12	13	14

food %>%
  mutate(food_upper = stringr::str_to_upper(food), .after=type) %>% 
  gt()

food	type	food_upper	px_2000_usd	px_2010_usd	px_2020_usd
Banana	fruit	BANANA	5	7	8
Apple	fruit	APPLE	10	9	9
Lemon	vegetable	LEMON	5	7	8
Potato	vegetable	POTATO	8	8	10
Tomato	vegetable	TOMATO	3	5	6
Mango	fruit	MANGO	9	10	13
Carrot	vegetable	CARROT	12	13	14
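
relocate() also accepts .before and selection helpers, so several columns can be moved at once. A minimal sketch, rebuilding the sample data so it runs on its own; the object names prices_first and food_last are illustrative only.

```r
library(dplyr)

# Same sample data as the slides, rebuilt so this block is self-contained
food <- tibble(
  food = c("Banana", "Apple", "Lemon", "Potato", "Tomato", "Mango", "Carrot"),
  type = factor(
    c("fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit", "vegetable"),
    levels = c("fruit", "vegetable", "staple")
  ),
  px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
  px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
  px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
)

# Move every numeric column in front of the food column
prices_first <- food %>%
  relocate(where(is.numeric), .before = food)

# Move a single column to the end of the table
food_last <- food %>%
  relocate(food, .after = last_col())

names(prices_first)
names(food_last)
```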

Thank You