Working with factors in R is not always intuitive. In particular, when we convert a character variable into a factor, the levels are sorted alphabetically by default. The problem arises when we want to rearrange the factor labels in a visualisation. This is straightforward for an ordered factor (ordinal variable), but less so for an unordered one (nominal variable). Below, I walk through a short example showing how to rearrange the factor labels of a nominal variable.
The first part of this example is to create a data set with two variables, as follows:
personId: A unique numeric ID for each study individual
sportsName: Name of the sport in which she/he wants to build a career. This is a character variable
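# Fix the random seed so that the sampled sports are reproducible
set.seed(12345)
exampleDF <- data.frame(
  personId = 1:150,
  # Draw 150 sports names, with replacement, from the five options
  sportsName = sample(
    c("Soccer", "Cricket", "Basketball", "Rugby", "Football"),
    size = 150,
    replace = TRUE
  )
)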
In the code block above, the set.seed() function is used so
that the results are reproducible. The data contains 150 rows, with
sequential numbers as the unique IDs. The sports names were created
by taking a sample of size 150 with replacement from five different
sports options:
c("Soccer", "Cricket", "Basketball", "Rugby", "Football").
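Before moving on, it can help to take a quick look at what was generated. The sketch below rebuilds the same data so that it runs on its own, then uses base R's str() and table() to show the structure and the number of individuals per sport. It assumes R 4.0.0 or later, where data.frame() keeps sportsName as a character column.

```r
# Rebuild the example data so that this block runs on its own
set.seed(12345)
exampleDF <- data.frame(
  personId = 1:150,
  sportsName = sample(
    c("Soccer", "Cricket", "Basketball", "Rugby", "Football"),
    size = 150,
    replace = TRUE
  )
)

# Structure of the data: 150 rows, an integer ID and the sports column
# (a character column assumes R >= 4.0.0; older versions create a factor)
str(exampleDF)

# Number of individuals assigned to each sport
table(exampleDF$sportsName)
```

The counts differ between sports because the names were sampled at random, but they always add up to 150.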
Now the data is ready. We want to create a factor variable from
sportsName and then draw a bar chart using
ggplot2. Here I will use the dplyr library for
data processing.
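library(dplyr)

# Convert the character column into a factor (levels sorted alphabetically)
exampleDF <- exampleDF %>%
  mutate(
    sportsFct = as.factor(sportsName)
  )

library(ggplot2)

# Bar chart of the number of individuals per sport
ggplot(data = exampleDF) +
  geom_bar(
    aes(
      x = sportsFct,
      y = after_stat(count)
    )
  )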
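The order of the bars on the x-axis comes from the factor levels, which can be checked directly with base R's levels(). The following is a minimal sketch on a small stand-alone vector (the names sports and sportsLevels are only for illustration):

```r
# A small character vector holding the five sports, in arbitrary order
sports <- c("Soccer", "Cricket", "Basketball", "Rugby", "Football")

# as.factor() sorts the unique values to build the levels
sportsLevels <- levels(as.factor(sports))
print(sportsLevels)
# Levels, in order: Basketball, Cricket, Football, Rugby, Soccer
```

Running levels(exampleDF$sportsFct) on the data above should show the same ordering.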
We can see that the factor levels are arranged in alphabetical
order. We want to change this order to:
Soccer, Football, Rugby,
Basketball, and Cricket. To achieve this goal,
we will use another library called forcats.
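library(forcats)

# Reorder the factor levels into the desired display order
exampleDF <- exampleDF %>%
  mutate(
    sportsFct = fct_relevel(
      sportsFct,
      c("Soccer", "Football", "Rugby", "Basketball", "Cricket")
    )
  )

# Redraw the bar chart; the bars now follow the new level order
ggplot(data = exampleDF) +
  geom_bar(
    aes(
      x = sportsFct,
      y = after_stat(count)
    )
  )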
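If adding forcats as a dependency is not desirable, the same ordering can also be sketched in base R by passing the levels explicitly to factor(). This is an alternative to the forcats approach used in this post, shown on a small stand-alone vector (sports, desiredOrder and sportsBase are illustrative names):

```r
# Character values and the order in which we want the levels to appear
sports <- c("Soccer", "Cricket", "Basketball", "Rugby", "Football")
desiredOrder <- c("Soccer", "Football", "Rugby", "Basketball", "Cricket")

# Passing levels explicitly overrides the default alphabetical sort
sportsBase <- factor(sports, levels = desiredOrder)
levels(sportsBase)
```

One thing to keep in mind with this approach: any value that is not listed in levels becomes NA, so the vector of levels needs to cover every sport in the data.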
Now, if we want Cricket to be displayed as the first
element, then:
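# Reorder the levels again, this time with Cricket first
exampleDF <- exampleDF %>%
  mutate(
    sportsFct = fct_relevel(
      sportsFct,
      c("Cricket", "Soccer", "Football", "Rugby", "Basketball")
    )
  )

# Redraw the bar chart with Cricket as the first bar
ggplot(data = exampleDF) +
  geom_bar(
    aes(
      x = sportsFct,
      y = after_stat(count)
    )
  )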
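When only one level needs to move, listing all five again is not strictly necessary. By default, fct_relevel() moves the levels it is given to the front and leaves the remaining levels in their existing order. A minimal sketch on a stand-alone factor (cricketFirst is an illustrative name, and the starting level order is set by hand to match the previous step):

```r
library(forcats)

# A factor whose levels are already in the order chosen earlier
sportsFct <- factor(
  c("Soccer", "Cricket", "Basketball", "Rugby", "Football"),
  levels = c("Soccer", "Football", "Rugby", "Basketball", "Cricket")
)

# Move only Cricket to the front; the other levels keep their order
cricketFirst <- fct_relevel(sportsFct, "Cricket")
levels(cricketFirst)
# Levels, in order: Cricket, Soccer, Football, Rugby, Basketball
```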
Please leave a comment or question if you have any. Also, if you want more customisation of this plot, feel free to ask.