Factors are a very useful type of variable in R, but they can also drive you nuts, especially the “stealth factor” that you think of as character.
Can we soften some of their sharp edges?
fbind()
Binding two factors via fbind():
library(foofactors)
a <- factor(c("character", "hits", "your", "eyeballs"))
b <- factor(c("but", "integer", "where it", "counts"))
Simply catenating two factors with c(a, b) leads to a result that most don’t expect:
## [1] 1 3 4 2 1 3 4 2
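The numbers above are the integer codes that R stores behind each factor, with the labels dropped. The sketch below uses only base R to reproduce them explicitly. It is an illustration, not part of foofactors. Note that from R 4.1.0 onward, c() applied to factors returns a factor, so the output above reflects older versions of R.

```r
a <- factor(c("character", "hits", "your", "eyeballs"))
b <- factor(c("but", "integer", "where it", "counts"))

# Each factor stores integer codes that index into its own sorted levels.
levels(a)
as.integer(a)

# Combining the raw codes discards the labels, which is the behavior that
# older versions of R showed for c(a, b).
codes <- c(as.integer(a), as.integer(b))
codes
```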
The fbind() function glues two factors together and returns a factor.
fbind(a, b)
## [1] character hits your eyeballs but integer where it
## [8] counts
## Levels: but character counts eyeballs hits integer where it your
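One plausible way to get this behavior is to convert both factors to character before combining them. The helper below is a hypothetical stand-in named fbind_sketch(), written in base R under that assumption; the actual fbind() may be implemented differently.

```r
# Hypothetical stand-in for fbind(); the real implementation may differ.
fbind_sketch <- function(a, b) {
  stopifnot(is.factor(a), is.factor(b))
  # Convert to character first so that labels, not integer codes, are combined.
  factor(c(as.character(a), as.character(b)))
}

a <- factor(c("character", "hits", "your", "eyeballs"))
b <- factor(c("but", "integer", "where it", "counts"))

ab <- fbind_sketch(a, b)
ab
```

The result is a factor whose levels are the sorted union of the labels from both inputs.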
Often we want a table of frequencies for the levels of a factor. The base table() function returns an object of class table, which can be inconvenient for downstream work. Processing with as.data.frame() can be helpful, but it’s a bit clunky.
## x
## a b c d e
## 25 26 17 17 15
## x Freq
## 1 a 25
## 2 b 26
## 3 c 17
## 4 d 17
## 5 e 15
freq_out()
The freq_out() function returns a frequency table as a well-named tbl_df:
## # A tibble: 5 x 2
## x n
## <fct> <int>
## 1 a 25
## 2 b 26
## 3 c 17
## 4 d 17
## 5 e 15
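For readers who want to see the idea without the package, here is a rough base R approximation. The name freq_out_sketch() is hypothetical, and the function returns a plain data.frame rather than a tbl_df, so treat it as a sketch of the column naming only.

```r
# Hypothetical approximation of freq_out(); returns a data.frame, not a tibble.
freq_out_sketch <- function(x) {
  stopifnot(is.factor(x))
  # Naming the table dimension "x" and the count column "n" mirrors the
  # column names shown in the output above.
  as.data.frame(table(x = x), responseName = "n")
}

f <- factor(c("a", "b", "a", "c", "a", "b"))
counts <- freq_out_sketch(f)
counts
```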
first_upper()
The first_upper() function returns the same factor but with the first letter of each character value capitalized.
months <- factor(c("january", "february", "march", "april", "may", "june", "july",
                   "august", "september", "october", "november", "december"))
first_upper(months)
## [1] January February March April May June July
## [8] August September October November December
## 12 Levels: April August December February January July June March ... September
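A minimal sketch of how such a function could work is to rewrite the levels rather than the values, which leaves the underlying integer codes untouched. The name first_upper_sketch() is hypothetical and the package's own code may differ.

```r
# Hypothetical stand-in for first_upper(); the real implementation may differ.
first_upper_sketch <- function(x) {
  stopifnot(is.factor(x))
  lv <- levels(x)
  # Upper-case the first character of each level and keep the rest as is.
  levels(x) <- paste0(toupper(substring(lv, 1, 1)), substring(lv, 2))
  x
}

m <- factor(c("january", "february", "march"))
m_upper <- first_upper_sketch(m)
m_upper
```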
subfactor()
The subfactor() function returns a factor with just the first three letters of each element.
library(tidyverse)
library(scales)
library(gapminder)
Euro <- gapminder %>%
  filter(year == 2007, continent == "Europe") %>%
  head(20)
Euro %>%
  ggplot(aes(x = country, y = lifeExp)) +
  geom_point(shape = 23, fill = 'cornflowerblue', size = 3) +
  guides(fill = "none") +
  labs(x = "Country", y = "Life Expectancy (years)") +
  theme_bw()
And now let’s see the same plot after applying the subfactor() function:
Euro %>%
  ggplot(aes(x = subfactor(country), y = lifeExp)) +
  geom_point(shape = 23, fill = 'cornflowerblue', size = 3) +
  labs(x = "Country", y = "Life Expectancy (years)") +
  theme_bw()
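To make the abbreviation step concrete, the sketch below truncates the levels of a factor in base R. The name subfactor_sketch() and its n argument are hypothetical; the package function may handle edge cases differently.

```r
# Hypothetical stand-in for subfactor(); `n` is an illustrative parameter.
subfactor_sketch <- function(x, n = 3) {
  stopifnot(is.factor(x))
  # Truncating the levels (not the values) keeps the level order; levels that
  # become identical after truncation are merged by `levels<-`.
  levels(x) <- substr(levels(x), 1, n)
  x
}

countries <- factor(c("Albania", "Austria", "Belgium"))
short <- subfactor_sketch(countries)
short
```

One consequence of this approach is that two countries sharing the same first three letters would collapse into a single level, which matters when the abbreviations are used as axis labels.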
factor_asis()
The factor_asis() function encodes a vector as a factor with levels in the order they first appear in the data, rather than the alphabetical order that factor() uses by default:
## [1] "Africa" "Americas" "Asia" "Europe" "Oceania"
continents_asis <- factor_asis(c("Asia", "Europe", "Africa", "Americas", "Oceania"))
levels(continents_asis)
## [1] "Asia" "Europe" "Africa" "Americas" "Oceania"
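The same ordering can be obtained in base R by passing the unique values, in order of first appearance, as the levels. The helper name factor_asis_sketch() is hypothetical and shown only to illustrate the idea.

```r
# Hypothetical stand-in for factor_asis(); the real implementation may differ.
factor_asis_sketch <- function(x) {
  x_chr <- as.character(x)
  # unique() preserves the order of first appearance.
  factor(x_chr, levels = unique(x_chr))
}

continents <- c("Asia", "Europe", "Africa", "Americas", "Oceania", "Asia")
default_levels <- levels(factor(continents))
asis_levels <- levels(factor_asis_sketch(continents))
default_levels
asis_levels
```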
factor_asis() has the same effect when plotting. To demonstrate this, let’s look at a plot of the number of countries in each continent from the gapminder dataset without modification.
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = continent, fill = continent)) +
  scale_fill_brewer(palette = "Pastel2") +
  geom_bar() +
  guides(fill = "none") +
  labs(x = "Continent", y = "Number of countries") +
  theme_bw()
And now the same plot using factor_asis().
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = factor_asis(continent), fill = continent)) +
  scale_fill_brewer(palette = "Pastel2") +
  geom_bar() +
  guides(fill = "none") +
  labs(x = "Continent", y = "Number of countries") +
  theme_bw()
As noted, the order of the continents corresponds to the order of appearance in the dataset.
reorder_desc()
The reorder_desc() function reorders the levels of a factor in descending order.
## [1] Africa Americas Asia Europe Oceania
## Levels: Africa Americas Asia Europe Oceania
## [1] Africa Americas Asia Europe Oceania
## attr(,"scores")
## Africa Americas Asia Europe Oceania
## -1 -2 -3 -4 -5
## Levels: Oceania Europe Asia Americas Africa
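The scores attribute in the output above suggests that the levels are reordered by negated scores. The sketch below reproduces that pattern with stats::reorder() and the negated integer codes. This is a guess at the mechanism, and reorder_desc_sketch() is a hypothetical name, not the package's code.

```r
# Hypothetical stand-in for reorder_desc(); the real implementation may differ.
reorder_desc_sketch <- function(x) {
  stopifnot(is.factor(x))
  # reorder() sorts levels by ascending score, so negating the integer codes
  # reverses the current level order.
  stats::reorder(x, -as.integer(x))
}

continents <- factor(c("Africa", "Americas", "Asia", "Europe", "Oceania"))
desc <- reorder_desc_sketch(continents)
levels(desc)
```

If the scores attribute is not needed, factor(x, levels = rev(levels(x))) gives the same level order with less machinery.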
The descending order of the levels carries over to plots:
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = reorder_desc(continent), fill = continent)) +
  scale_fill_brewer(palette = "Pastel2") +
  geom_bar() +
  guides(fill = "none") +
  labs(x = "Continent", y = "Number of countries") +
  theme_bw()
```