Categorical Variables within Tidyverse

Libraries
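
The examples in this post rely on a handful of packages. A plausible setup is sketched here, assuming that kable comes from knitr, kable_styling from kableExtra, and ggarrange from ggpubr; these sources are assumptions inferred from the function names used later on.

```r
library(tidyverse)   # dplyr, ggplot2, forcats, and the starwars data set
library(gapminder)   # gapminder data set
library(knitr)       # kable() (assumed source)
library(kableExtra)  # kable_styling() (assumed source)
library(ggpubr)      # ggarrange() (assumed source)
```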

Overview

R was the first scripting language I was introduced to, and it was love at first line of code. When working in R, I always find myself using the tidyverse package. It amplifies R's abilities in data manipulation, exploration, and visualization. The packages encompassed in the tidyverse library are intended to increase productivity as a whole while using R.

Here are some of the ways to deal with categorical variables using the forcats package within tidyverse.

Rearrange factor by frequency values

Star Wars Data

First, we reorder a categorical variable by the frequency of its levels. The fct_infreq and fct_lump functions make this extremely easy.

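starwars %>% 
  # drop characters with no recorded hair color
  filter(!is.na(hair_color)) %>% 
  # fct_infreq orders the bars from most to least common
  ggplot(aes(x = fct_infreq(hair_color))) +
  geom_bar() +
  labs(title = "Most Common Hair Color in Star Wars", x = "Types of Hair Color") +
  # tilt the axis labels so they do not overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1))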
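
To see what fct_infreq does to the underlying levels, here is a minimal sketch on a small made-up vector (the values in colors are hypothetical, not taken from starwars):

```r
library(forcats)

# Hypothetical toy data, used only to illustrate the level ordering
colors <- factor(c("brown", "black", "brown", "none", "brown", "black"))

levels(colors)              # default alphabetical order: "black" "brown" "none"
levels(fct_infreq(colors))  # most frequent first:        "brown" "black" "none"
```

Because ggplot2 draws a discrete axis in level order, changing the levels is all it takes to sort the bars.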

Ordering Multiple Factors at once

starwars %>% 
  # keep the 5 most common skin colors and lump the rest into "Other"
  mutate(skin_color = fct_lump(skin_color, n = 5)) %>% 
  count(skin_color, sort = TRUE) %>% 
  kable() %>% 
  kable_styling(full_width = FALSE)
skin_color    n
Other        41
fair         17
light        11
dark          6
green         6
grey          6
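
fct_lump keeps the n most common levels and collapses everything else into a single "Other" level. A small sketch with made-up values (the skin vector is hypothetical):

```r
library(forcats)

# Hypothetical toy data
skin <- factor(c("fair", "fair", "fair", "light", "light", "green", "blue"))

# Keep the 2 most frequent levels; the remaining ones become "Other"
lumped <- fct_lump(skin, n = 2)

levels(lumped)  # "fair" "light" "Other"
table(lumped)   # counts per level after lumping
```

The "Other" level is placed last, which is usually what you want in a table or plot.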

Rearrange factor by another factor

Gapminder Data

Next, we will take a look at the fct_reorder function.

Here we reorder the levels of country by lifeExp, which makes the plot much easier to read. Plot a uses the default alphabetical order, while plot b sorts the countries by life expectancy.

# a: countries in their default (alphabetical) level order
a <- gapminder %>% 
  filter(year == 2002, continent == "Asia") %>% 
  ggplot(aes(x = lifeExp, y = country)) +
  geom_point()
# b: countries reordered by life expectancy
b <- gapminder %>% 
  filter(year == 2002, continent == "Asia") %>% 
  ggplot(aes(x = lifeExp, y = fct_reorder(country, lifeExp))) +
  geom_point()
# place the two plots side by side
ggarrange(a, b, ncol = 2, nrow = 1)
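
fct_reorder sorts the levels of a factor by a summary (the median by default) of a second variable. A minimal sketch with hypothetical values, not taken from gapminder:

```r
library(forcats)

# Hypothetical toy data: three countries with a made-up life expectancy each
country  <- factor(c("A", "B", "C"))
life_exp <- c(70, 55, 80)

levels(country)                         # "A" "B" "C"
levels(fct_reorder(country, life_exp))  # "B" "A" "C" (ascending life_exp)
```

With one row per country, as in the 2002 Asia subset, the median is just that row's value, so the levels end up in ascending order of life expectancy.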