Creating factors

In R, factors are used to work with categorical variables, that is, variables that have a fixed and known set of possible values.

To work with factors, we use the forcats package, which provides helper functions (prefixed with fct_) on top of base R's factor().

To create a factor, we start by creating a character vector of the valid levels.

tweet_types <- c(
"News", "Fun", "Quotation", "Commentary", "Document"
)

Then we can create the factor from an ordinary character vector, passing the valid levels:

type <- c("Fun","Commentary","Quotation","Quotation","News","Quotation","Document","Commentary")

tweet_type <- factor(type, levels = tweet_types) 
tweet_type
## [1] Fun        Commentary Quotation  Quotation  News       Quotation  Document  
## [8] Commentary
## Levels: News Fun Quotation Commentary Document

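Levels play an important role. If you omit levels, they are taken from the data in alphabetical order. To have them follow the order of first appearance instead, pass unique(type). To access the set of valid levels directly, use levels():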

(tweet_type <- factor(type))
## [1] Fun        Commentary Quotation  Quotation  News       Quotation  Document  
## [8] Commentary
## Levels: Commentary Document Fun News Quotation
(tweet_type <- factor(type, levels = unique(type)))
## [1] Fun        Commentary Quotation  Quotation  News       Quotation  Document  
## [8] Commentary
## Levels: Fun Commentary Quotation News Document
levels(tweet_type)
## [1] "Fun"        "Commentary" "Quotation"  "News"       "Document"

So a factor can be created either with pre-defined levels or with levels derived from the data itself.
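Because the levels are a fixed set, a value that is not among them cannot be represented. The following minimal sketch uses a deliberately misspelled, made-up value ("Fnu") to show what base R's factor() does in that case: the value is silently converted to NA. The names type_typo and tweet_type_typo are illustrative only.

```r
# Valid levels, as defined earlier in this section
tweet_types <- c("News", "Fun", "Quotation", "Commentary", "Document")

# Hypothetical input containing a typo ("Fnu" is not a valid level)
type_typo <- c("Fun", "Fnu", "News")

tweet_type_typo <- factor(type_typo, levels = tweet_types)
tweet_type_typo

# The misspelled entry is stored as NA rather than raising an error
is.na(tweet_type_typo)
```

Checking for unexpected NA values after creating a factor is therefore a cheap way to catch typos in the input.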

Exploring levels

When factors are stored in a tibble, you can’t see their levels so easily. One way to see them is with count():

library(tidyverse)  # loads dplyr, ggplot2 and forcats (which provides gss_cat)

gss_cat %>%
  count(race)
## # A tibble: 3 x 2
##   race      n
##   <fct> <int>
## 1 Other  1959
## 2 Black  3129
## 3 White 16395

Or with a bar chart:

ggplot(gss_cat, aes(race)) +
  geom_bar()

By default, ggplot2 will drop levels that don’t have any values. You can force them to display with:

ggplot(gss_cat, aes(race)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)
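
Dropped levels are easier to understand on a small example. This sketch uses a made-up factor (race_small, not part of gss_cat) with one level that never occurs in the data. levels() still lists that level, and base R's table() reports it with a count of zero.

```r
# Hypothetical factor: "Not applicable" is a valid level but never observed
race_small <- factor(
  c("White", "Black", "White", "Other"),
  levels = c("Other", "Black", "White", "Not applicable")
)

# All four levels are listed, whether they are used or not
levels(race_small)

# The unused level appears in the table with a count of 0
counts <- table(race_small)
counts
```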

Reordering and releveling factors

When we plot a factor, its levels often appear in an order that makes the chart hard to read. To fix this, we can reorder the levels with fct_reorder().

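fct_reorder() takes three arguments: f, the factor whose levels you want to modify; x, a numeric vector that you want to use to reorder the levels; and, optionally, fun, a function that is used if there are multiple values of x for each value of f (the default is median). In current forcats releases these arguments are named .f, .x and .fun.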
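
As a minimal sketch of how fct_reorder() behaves, the example below reorders a small made-up factor by the median of an accompanying numeric vector. The names f, x and f_reordered are illustrative, and the forcats package is assumed to be installed.

```r
library(forcats)

# Hypothetical data: each level of f has two values in x
f <- factor(c("a", "b", "c", "a", "b", "c"))
x <- c(5, 1, 3, 7, 2, 4)

# Without explicit levels, they are alphabetical: "a" "b" "c"
levels(f)

# Reorder the levels by the median of x within each level
# (medians: a = 6, b = 1.5, c = 3.5), in ascending order
f_reordered <- fct_reorder(f, x)
levels(f_reordered)  # "b" "c" "a"
```

Only the order of the levels changes; the values stored in the factor stay the same.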

In other cases we want to force some levels to come first, or change the order of the levels by hand. For this, we use fct_relevel(). It takes a factor, f, and then any number of levels that you want to move to the front of the line.

fct_relevel() also accepts a function, such as sort, which is applied to the current levels to produce the new order (rev and sample can be used in the same way).
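
Both uses of fct_relevel() can be sketched on a small made-up factor. The variable names here are illustrative, and the forcats package is assumed to be installed.

```r
library(forcats)

# Hypothetical factor with explicitly ordered levels
f <- factor(
  c("Fun", "News", "Quotation", "Fun"),
  levels = c("News", "Fun", "Quotation")
)

# Move one named level to the front
by_name <- fct_relevel(f, "Quotation")
levels(by_name)  # "Quotation" "News" "Fun"

# Pass a function that receives the current levels and returns the new order
sorted <- fct_relevel(f, sort)
levels(sorted)   # "Fun" "News" "Quotation"

reversed <- fct_relevel(f, rev)
levels(reversed) # "Quotation" "Fun" "News"
```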

Recoding the factor’s labels

We can also change the labels of the levels with fct_recode(). It allows you to recode, or change, the value of each level.

fct_recode(f, "new label" = "old label")
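
A small sketch of this call on a made-up factor follows. The new label "Entertainment" is purely illustrative, and the forcats package is assumed to be installed.

```r
library(forcats)

# Hypothetical factor; levels default to alphabetical: "Fun" "News"
f <- factor(c("Fun", "News", "Fun"))

# The new label goes on the left, the existing label on the right
f_recoded <- fct_recode(f, "Entertainment" = "Fun")

levels(f_recoded)        # "Entertainment" "News"
as.character(f_recoded)  # "Entertainment" "News" "Entertainment"
```

Levels that are not mentioned in the call, such as "News" here, are left unchanged.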