Beyond Tidyverse 101

Aha moments while wrangling data in the Tidyverse ecosystem

Priyanka Gagneja

Intro

Priyanka Gagneja

Data Analytics Consultant @ OnPoint Insights
Data Analytics Freelancer
Twitter: priyankaigit
Linkedin: priyanka-gagneja

Basics

  • select()

  • filter()

  • arrange()

  • mutate()

  • summarise()

  • group_by()

What did I miss?

Important

Details in the documentation!!

Show me some action

  • Repeat an action across multiple columns at once
  • Include every level of a grouping variable, even levels that never appear in the data
  • Conditional action
  • Change position of a variable

Sample data

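library(tidyverse)  # tibble(), dplyr verbs, tidyr::complete(), stringr
library(gt)         # gt() renders a tibble as a display table

food <- tibble(
    food = c('Banana', 'Apple', 'Lemon','Potato', 'Tomato', 'Mango', 'Carrot'),
    type = c('fruit','fruit','vegetable','vegetable','vegetable','fruit','vegetable'),
    px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
    px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
    px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
) %>% 
  # 'staple' is declared as a level but has no rows, which sets up the empty-group examples
  mutate(type = factor(type, levels = c('fruit', 'vegetable','staple')))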

food %>% 
  gt()

food    type       px_2000_usd  px_2010_usd  px_2020_usd
Banana  fruit      5            7            8
Apple   fruit      10           9            9
Lemon   vegetable  5            7            8
Potato  vegetable  8            8            10
Tomato  vegetable  3            5            6
Mango   fruit      9            10           13
Carrot  vegetable  12           13           14

Repeat an action

food %>% 
  mutate(across(where(is.character), 
                stringr::str_to_lower)) %>% 
  head(3) %>% 
  gt()

food    type       px_2000_usd  px_2010_usd  px_2020_usd
banana  fruit      5            7            8
apple   fruit      10           9            9
lemon   vegetable  5            7            8

food %>% 
  group_by(type) %>% 
   summarize(across(where(is.numeric), 
                   list(mean = ~mean(.x, na.rm = TRUE)))) %>% 
  gt()

type       px_2000_usd_mean  px_2010_usd_mean  px_2020_usd_mean
fruit      8                 8.666667          10.0
vegetable  7                 8.250000          9.5

food %>% 
  select(-food) %>% 
  group_by(type) %>% 
  summarise_all(mean) %>% 
  gt()

type       px_2000_usd  px_2010_usd  px_2020_usd
fruit      8            8.666667     10.0
vegetable  7            8.250000     9.5
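
The summarise_all() call above can also be written with across(), which removes the need to drop the food column first. The sketch below is illustrative only: the second summary function (max), the .names pattern, and the object name price_summary are assumptions that do not come from the slides.

```r
library(dplyr)

# Same sample data as in the slides
food <- tibble(
  food = c("Banana", "Apple", "Lemon", "Potato", "Tomato", "Mango", "Carrot"),
  type = factor(
    c("fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit", "vegetable"),
    levels = c("fruit", "vegetable", "staple")
  ),
  px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
  px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
  px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
)

# where(is.numeric) picks only the price columns, so `food` does not
# have to be removed before summarising
price_summary <- food %>%
  group_by(type) %>%
  summarise(
    across(
      where(is.numeric),
      list(mean = mean, max = max),  # several functions in one pass (illustrative)
      .names = "{.fn}_{.col}"        # e.g. mean_px_2000_usd, max_px_2000_usd
    )
  )

price_summary
```

Passing a named list of functions produces one output column per input column and function, and .names controls how those columns are labelled.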

Conditional action

_if and _at variants (superseded by across() since dplyr 1.0.0, but still available)

food %>% 
  select_if(is.numeric,  list(~ paste0("numeric_", .))) %>% 
  gt()

numeric_px_2000_usd  numeric_px_2010_usd  numeric_px_2020_usd
5                    7                    8
10                   9                    9
5                    7                    8
8                    8                    10
3                    5                    6
9                    10                   13
12                   13                   14

food %>% 
  summarise_at(vars(matches("px")), mean) %>% 
  gt()

px_2000_usd  px_2010_usd  px_2020_usd
7.428571     8.428571     9.714286
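
For comparison, here is a rough sketch of how the two scoped-verb calls above might look with where(), rename_with() and across(). The object names numeric_cols and px_means are placeholders introduced for this example.

```r
library(dplyr)

# Same sample data as in the slides
food <- tibble(
  food = c("Banana", "Apple", "Lemon", "Potato", "Tomato", "Mango", "Carrot"),
  type = factor(
    c("fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit", "vegetable"),
    levels = c("fruit", "vegetable", "staple")
  ),
  px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
  px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
  px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
)

# select_if(is.numeric, <renaming function>) split into two steps:
# keep the numeric columns, then prefix every remaining column name
numeric_cols <- food %>%
  select(where(is.numeric)) %>%
  rename_with(~ paste0("numeric_", .x))

# summarise_at(vars(matches("px")), mean) expressed with across()
px_means <- food %>%
  summarise(across(matches("px"), mean))

names(numeric_cols)
px_means
```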

View all levels of a grouping variable

food %>% 
  count(type) %>% 
  complete(type) %>% 
  gt()

type       n
fruit      3
vegetable  4
staple     NA

Use the fill argument of complete() to replace NA with another value, such as 0 or 9999.
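
A minimal sketch of the fill argument, assuming a count of 0 is the right replacement for a level with no rows. The object names counts_filled and counts_direct are made up for this example, and the count(.drop = FALSE) variant is an alternative that the slides do not show.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the slides
food <- tibble(
  food = c("Banana", "Apple", "Lemon", "Potato", "Tomato", "Mango", "Carrot"),
  type = factor(
    c("fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit", "vegetable"),
    levels = c("fruit", "vegetable", "staple")
  ),
  px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
  px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
  px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
)

# fill takes a named list: one replacement value per column to fill.
# 0L keeps n an integer column.
counts_filled <- food %>%
  count(type) %>%
  complete(type, fill = list(n = 0L))

# Alternative: ask count() to keep empty factor levels itself
counts_direct <- food %>%
  count(type, .drop = FALSE)

counts_filled
counts_direct
```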

Keep zero-length groups in the output

food %>% 
  group_by(type) %>% 
  summarise(avg_px_2020 = mean(px_2020_usd)) %>% 
  gt()

type       avg_px_2020
fruit      10.0
vegetable  9.5

food %>% 
  group_by(type, .drop = FALSE) %>% 
  summarise(avg_px_2020 = mean(px_2020_usd)) %>% 
  gt()

type       avg_px_2020
fruit      10.0
vegetable  9.5
staple     NaN
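
The NaN for staple appears because mean() of a zero-length vector is NaN. One possible way to make the empty group explicit is sketched below. The n_items column, the choice of NA as the placeholder, and the object name group_summary are assumptions made for illustration.

```r
library(dplyr)

# Same sample data as in the slides
food <- tibble(
  food = c("Banana", "Apple", "Lemon", "Potato", "Tomato", "Mango", "Carrot"),
  type = factor(
    c("fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit", "vegetable"),
    levels = c("fruit", "vegetable", "staple")
  ),
  px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
  px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
  px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
)

group_summary <- food %>%
  group_by(type, .drop = FALSE) %>%
  summarise(
    n_items = n(),  # 0 for the empty 'staple' group
    # only average when the group has rows; otherwise report a missing value
    avg_px_2020 = if (n() > 0) mean(px_2020_usd) else NA_real_
  )

group_summary
```

Reporting the group size next to the average makes it clear that the missing value comes from an empty group rather than from missing prices.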

Change position of a variable

food %>%
  relocate(food, .after=type) %>% 
  gt()

type       food    px_2000_usd  px_2010_usd  px_2020_usd
fruit      Banana  5            7            8
fruit      Apple   10           9            9
vegetable  Lemon   5            7            8
vegetable  Potato  8            8            10
vegetable  Tomato  3            5            6
fruit      Mango   9            10           13
vegetable  Carrot  12           13           14

food %>%
  mutate(food_upper = stringr::str_to_upper(food), .after=type) %>% 
  gt()

food    type       food_upper  px_2000_usd  px_2010_usd  px_2020_usd
Banana  fruit      BANANA      5            7            8
Apple   fruit      APPLE       10           9            9
Lemon   vegetable  LEMON       5            7            8
Potato  vegetable  POTATO      8            8            10
Tomato  vegetable  TOMATO      3            5            6
Mango   fruit      MANGO       9            10           13
Carrot  vegetable  CARROT      12           13           14
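
relocate() also accepts selection helpers and a .before argument, so several columns can be moved in one call. The sketch below is a hedged illustration: the particular moves and the object names prices_first and type_last are invented for this example.

```r
library(dplyr)

# Same sample data as in the slides
food <- tibble(
  food = c("Banana", "Apple", "Lemon", "Potato", "Tomato", "Mango", "Carrot"),
  type = factor(
    c("fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit", "vegetable"),
    levels = c("fruit", "vegetable", "staple")
  ),
  px_2000_usd = c(5, 10, 5, 8, 3, 9, 12),
  px_2010_usd = c(7, 9, 7, 8, 5, 10, 13),
  px_2020_usd = c(8, 9, 8, 10, 6, 13, 14)
)

# Move every price column in front of the `food` column
prices_first <- food %>%
  relocate(starts_with("px"), .before = food)

# Move `type` to the very end
type_last <- food %>%
  relocate(type, .after = last_col())

names(prices_first)
names(type_last)
```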

Thank You