R was the first scripting language I was introduced to, and it was love at first line of code. When working in R, I always find myself using the tidyverse package. It amplifies R's abilities in data manipulation, exploration and visualization. The packages bundled in the tidyverse are intended to increase overall productivity while using R.
Here are some of the ways to deal with categorical variables using the forcats package within the tidyverse.
First, reordering categorical variables by their frequency. The fct_infreq and fct_lump functions make this extremely easy.
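library(tidyverse)   # dplyr, ggplot2, forcats and the starwars data
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

# Bar chart of hair colors, ordered from most to least common
starwars %>%
  filter(!is.na(hair_color)) %>%
  ggplot(aes(x = fct_infreq(hair_color))) +
  geom_bar() +
  labs(title = "Most Common Hair Color in Star Wars", x = "Types of Hair Color") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Keep the 5 most common skin colors and lump the rest into "Other"
starwars %>%
  mutate(skin_color = fct_lump(skin_color, n = 5)) %>%
  count(skin_color, sort = TRUE) %>%
  kable() %>%
  kable_styling(full_width = FALSE)

| skin_color | n |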
|---|---|
| Other | 41 |
| fair | 17 |
| light | 11 |
| dark | 6 |
| green | 6 |
| grey | 6 |
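To see what these two functions do to the factor itself, rather than to a plot or a table, here is a minimal sketch on a small made-up vector. The `colors` values are hypothetical and are not taken from starwars.

```r
library(forcats)

# Hypothetical toy data, for illustration only
colors <- factor(c("brown", "black", "brown", "blond", "brown", "black", "red"))

# By default, factor levels are sorted alphabetically
levels(colors)

# fct_infreq() reorders the levels from most to least frequent
by_freq <- fct_infreq(colors)
levels(by_freq)

# fct_lump() keeps the n most common levels and collapses the rest into "Other"
lumped <- fct_lump(colors, n = 2)
table(lumped)
```

Here "brown" and "black" are the two most common values, so they should come first after fct_infreq, and "blond" and "red" should end up together under "Other" after fct_lump.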
Next, we will take a look at the fct_reorder function.
Here we can reorder the levels of a factor by the values of another variable, which makes plots of this data much easier to read.
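library(tidyverse)  # dplyr, ggplot2 and forcats
library(gapminder)  # gapminder dataset

# Countries in their default (alphabetical) order
a <- gapminder %>%
  filter(year == 2002, continent == "Asia") %>%
  ggplot(aes(x = lifeExp, y = country)) +
  geom_point()

# Countries reordered by life expectancy
b <- gapminder %>%
  filter(year == 2002, continent == "Asia") %>%
  ggplot(aes(x = lifeExp, y = fct_reorder(country, lifeExp))) +
  geom_point()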
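The effect of fct_reorder is easier to see on the levels themselves. The following is a minimal sketch, assuming a tiny made-up dataset. The `country` and `life_exp` values are hypothetical and are not from gapminder.

```r
library(forcats)

# Hypothetical toy data, for illustration only
country  <- factor(c("A", "B", "C"))
life_exp <- c(70, 55, 80)

# Default levels are alphabetical
levels(country)      # "A" "B" "C"

# fct_reorder() sorts the levels by a summary (median by default) of life_exp
reordered <- fct_reorder(country, life_exp)
levels(reordered)    # "B" "A" "C"

# .desc = TRUE reverses the order
levels(fct_reorder(country, life_exp, .desc = TRUE))  # "C" "A" "B"
```

Since ggplot2 draws discrete axes in level order, this is why the second plot above lists countries from lowest to highest life expectancy instead of alphabetically.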