stringsAsFactors = HELLNO

Alejandra Urcelay

2018-11-15

Factors are a very useful type of variable in R, but they can also drive you nuts, especially the “stealth factor” that you think of as a character vector.

Can we soften some of their sharp edges?

fbind()

Binding two factors via fbind():

library(foofactors)
a <- factor(c("character", "hits", "your", "eyeballs"))
b <- factor(c("but", "integer", "where it", "counts"))

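Simply catenating two factors with c() leads to a result that most people don’t expect: the factors are reduced to their underlying integer codes. (The output below comes from R before version 4.1.0; from 4.1.0 onward, c() on factors returns a factor whose levels are the union of the input levels.)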

c(a, b)
## [1] 1 3 4 2 1 3 4 2

The fbind() function glues two factors together and returns a factor.

fbind(a, b)
## [1] character hits      your      eyeballs  but       integer   where it 
## [8] counts   
## Levels: but character counts eyeballs hits integer where it your
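
One way such a function could work is to convert both inputs to character before combining them. The following is a minimal sketch under that assumption; fbind_sketch() is a hypothetical stand-in written in base R, and the packaged fbind() may be implemented differently.

```r
# Hypothetical stand-in for fbind(); the packaged function may differ.
fbind_sketch <- function(a, b) {
  # Work with the character labels, not the integer codes,
  # then rebuild a factor from the combined labels.
  factor(c(as.character(a), as.character(b)))
}

a <- factor(c("character", "hits", "your", "eyeballs"))
b <- factor(c("but", "integer", "where it", "counts"))

ab <- fbind_sketch(a, b)
is.factor(ab)     # the result is still a factor
as.character(ab)  # values appear in their original order
```

Because the labels are combined before the factor is rebuilt, the result does not depend on how each input happened to number its levels.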

Often we want a table of frequencies for the levels of a factor. The base table() function returns an object of class table, which can be inconvenient for downstream work. Processing it with as.data.frame() helps, but it’s a bit clunky.

set.seed(1234)
x <- factor(sample(letters[1:5], size = 100, replace = TRUE))
table(x)
## x
##  a  b  c  d  e 
## 25 26 17 17 15
as.data.frame(table(x))
##   x Freq
## 1 a   25
## 2 b   26
## 3 c   17
## 4 d   17
## 5 e   15

freq_out()

The freq_out() function returns a frequency table as a well-named tbl_df:

freq_out(x)
## # A tibble: 5 x 2
##   x         n
##   <fct> <int>
## 1 a        25
## 2 b        26
## 3 c        17
## 4 d        17
## 5 e        15
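
For readers without the package installed, a rough base R approximation is sketched below. freq_out_sketch() is a hypothetical name, and it returns a plain data.frame instead of a tibble, so it only imitates the shape of the output shown above (columns x and n).

```r
# Hypothetical base R approximation of freq_out(); returns a data.frame,
# not a tibble, and always names the first column "x".
freq_out_sketch <- function(x) {
  counts <- table(x)
  data.frame(
    x = factor(names(counts), levels = names(counts)),  # one row per level
    n = as.integer(counts)                              # counts as plain integers
  )
}

y <- factor(c("a", "b", "a", "c", "a", "b"))
res <- freq_out_sketch(y)
res
```

Storing the counts in an integer column named n avoids the Freq column and the table class that as.data.frame(table(x)) leaves behind.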

first_upper()

The first_upper() function returns the same factor but with the first letter of each character value capitalized.

months <- factor(c("january", "february", "march", "april", "may", "june",
                   "july", "august", "september", "october", "november",
                   "december"))
first_upper(months)
##  [1] January   February  March     April     May       June      July     
##  [8] August    September October   November  December 
## 12 Levels: April August December February January July June March ... September
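
The idea can be sketched in base R by rewriting the levels of the factor and leaving the underlying codes alone. first_upper_sketch() below is a hypothetical illustration, not necessarily how the packaged function is written.

```r
# Hypothetical illustration of first_upper(); the packaged function may differ.
first_upper_sketch <- function(x) {
  x <- as.factor(x)
  lv <- levels(x)
  # Upper-case the first character of each level and keep the rest unchanged.
  levels(x) <- paste0(toupper(substr(lv, 1, 1)), substr(lv, 2, nchar(lv)))
  x
}

m <- factor(c("january", "february", "march"))
m_upper <- first_upper_sketch(m)
as.character(m_upper)
```

Editing the levels means each distinct value is capitalized once, no matter how many times it occurs in the data.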

subfactor()

The subfactor() function returns a factor with just the first three letters of each element. This is handy for shortening long axis labels, as in the following plot of European countries.

library(tidyverse)
library(scales)
library(gapminder)
Euro <- gapminder %>%
  filter(year == 2007, continent == "Europe") %>%
  head(20)

Euro %>%
  ggplot(aes(x = country, y = lifeExp)) +
  geom_point(shape = 23, fill = 'cornflowerblue', size = 3) +
  labs(x = "Country", y = "Life Expectancy (years)") +
  theme_bw()

Now let’s see the same plot after applying the subfactor() function:

Euro %>%
  ggplot(aes(x = subfactor(country), y = lifeExp)) +
  geom_point(shape = 23, fill = 'cornflowerblue', size = 3) +
  labs(x = "Country", y = "Life Expectancy (years)") +
  theme_bw()
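
The truncation itself can be sketched in a few lines of base R. subfactor_sketch() and its n argument are hypothetical and introduced only for illustration; the packaged subfactor() may have a different signature.

```r
# Hypothetical illustration of subfactor(); `n` is an assumed argument.
subfactor_sketch <- function(x, n = 3) {
  # Keep the first n characters of each value and rebuild the factor.
  factor(substr(as.character(x), 1, n))
}

countries <- factor(c("Albania", "Austria", "Belgium"))
short <- subfactor_sketch(countries)
as.character(short)

# Values sharing a prefix collapse into a single level.
collapsed <- subfactor_sketch(factor(c("Austria", "Australia")))
nlevels(collapsed)
```

With an approach like this, abbreviations are only safe when the prefixes are unique, so it is worth checking the number of levels before and after.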

factor_asis()

The factor_asis() function encodes a vector as a factor whose levels follow the order in which values first appear in the data, instead of alphabetical order.

continents <- factor(c("Asia", "Europe", "Africa", "Americas", "Oceania"))
levels(continents)
## [1] "Africa"   "Americas" "Asia"     "Europe"   "Oceania"
continents_asis <- factor_asis(c("Asia", "Europe", "Africa", "Americas", "Oceania"))
levels(continents_asis)
## [1] "Asia"     "Europe"   "Africa"   "Americas" "Oceania"
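
In base R, the same ordering can be obtained by passing the unique values as the levels. The function below is a hypothetical sketch of that idea, not necessarily the packaged implementation.

```r
# Hypothetical illustration of factor_asis(); the packaged function may differ.
factor_asis_sketch <- function(x) {
  x_chr <- as.character(x)
  # unique() keeps values in order of first appearance.
  factor(x_chr, levels = unique(x_chr))
}

conts <- factor_asis_sketch(c("Asia", "Europe", "Africa", "Americas", "Oceania", "Asia"))
levels(conts)
```

Converting to character first means the sketch behaves the same for character vectors and for existing factors.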

The level order carries over to plots. To demonstrate this, let’s first look at a plot of the number of countries in each continent from the gapminder dataset without modification.

gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = continent, fill = continent)) +
  scale_fill_brewer(palette = "Pastel2") +
  geom_bar() +
  guides(fill = "none") +
  labs(x = "Continent", y = "Number of countries") +
  theme_bw()

And now the same plot using factor_asis().

gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = factor_asis(continent), fill = continent)) +
  scale_fill_brewer(palette = "Pastel2") +
  geom_bar() +
  guides(fill = "none") +
  labs(x = "Continent", y = "Number of countries") +
  theme_bw()

As the plot shows, the order of the continents now corresponds to their order of appearance in the dataset.

reorder_desc()

The reorder_desc() function reorders the levels of a factor in descending order.

alpha_continents <- factor(c("Africa", "Americas", "Asia", "Europe", "Oceania"))
alpha_continents
## [1] Africa   Americas Asia     Europe   Oceania 
## Levels: Africa Americas Asia Europe Oceania
reorder_desc(alpha_continents)
## [1] Africa   Americas Asia     Europe   Oceania 
## attr(,"scores")
##   Africa Americas     Asia   Europe  Oceania 
##       -1       -2       -3       -4       -5 
## Levels: Oceania Europe Asia Americas Africa
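
The scores attribute in the output above suggests that stats::reorder() is doing the work. A minimal base R sketch under that assumption follows; reorder_desc_sketch() is a hypothetical name.

```r
# Hypothetical illustration of reorder_desc(); assumes stats::reorder() underneath.
reorder_desc_sketch <- function(x) {
  x <- as.factor(x)
  # Score each level by the negative of its integer code, so the
  # level that currently sorts last gets the lowest score and comes first.
  stats::reorder(x, -as.integer(x))
}

alpha <- factor(c("Africa", "Americas", "Asia"),
                levels = c("Africa", "Americas", "Asia"))
desc <- reorder_desc_sketch(alpha)
levels(desc)
```

If the scores attribute is not needed, factor(x, levels = rev(levels(x))) produces the same level order.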

This function maintains levels in descending order even when plotting:

gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = reorder_desc(continent), fill = continent)) +
  scale_fill_brewer(palette = "Pastel2") +
  geom_bar() +
  guides(fill = "none") +
  labs(x = "Continent", y = "Number of countries") +
  theme_bw()
