The purpose of this write-up is to address a question about reclassifying a numeric vector of ages into a character or factor vector of age classes. I simulate some ages and walk through a few possible approaches. These approaches rely on the case_when() function from the dplyr package and the fct_relevel() function from the forcats package, with some additional base R thrown in for the optional approaches. The approaches covered are: the dplyr::case_when() approach (first as character, then as factor with ordered levels), the forcats package, and the Tidy approach.

The final approach would be my preferred approach as it is concise and does not lose any readability.

Generate some age data

# load the packages used below
library(dplyr)
library(forcats)
library(ggplot2)
# set seed for random age generation
set.seed(717)
# make a vector of random but plausible ages
sim_ages <- abs(floor(rnorm(100, 30, 30)))
# look at distribution
print(sim_ages)
  [1] 71 49 26 23 36 38 50 66 53  7 19 31  6 29 32 77  1 25 32 26  4 19 23
 [24] 26  9 63 35 15 71 13 40 38 53 88 31 71 26 14 37 23  2 35 22 48  0 30
 [47] 22 78  7 41 73 22 96  3 17 35 51 21 59 71 67 54 16 40 56 21 24 36  1
 [70] 51  8 79 18 37 26 41 54 42  4 38 55 10 18 54 42 28 13 12 22 36 48 41
 [93] 19 28  8 20  1 65 34 25
hist(sim_ages)

dplyr::case_when() Approach

The dplyr::case_when() function is a vectorized set of if ... else if statements. We’ll use it to sort the ages into age categories (note: I am in the “Old’ish” category, so I can be mean like that).
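To make the "vectorized if / else if" idea concrete, here is a minimal base R sketch of the same kind of logic written as nested ifelse() calls. The toy_ages vector and the three classes are made up for this illustration and are not part of the simulated data.

```r
# toy input, hypothetical values chosen only for this illustration
toy_ages <- c(2, 10, 40)

# each ifelse() handles one condition and hands the rest to the next call,
# which is the chain that case_when() lets you write as a flat list
toy_class <- ifelse(toy_ages <= 3, "Baby",
               ifelse(toy_ages <= 12, "Kid", "Adult"))

print(toy_class)
```

With three classes the nesting is still readable, but with eight it gets hard to follow, which is the main reason to reach for case_when().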

Note the use of > and <= to make each age range open at the lower end and closed at the upper end, so no age can fall into two classes (the first range also includes 0). Also note that any age less than 0 or greater than 125 (the oldest person ever was 122.5 y.o.) will return NA. You could (should) set up catches for out-of-bounds data like that somewhere in your analysis.
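One possible catch for out-of-bounds data is sketched below in base R. The function name check_age_bounds() and its lower and upper arguments are hypothetical, not something from dplyr or any other package, and the test values are made up.

```r
# hypothetical helper, not part of any package
check_age_bounds <- function(ages, lower = 0, upper = 125) {
  # TRUE for values that are missing or fall outside the allowed range
  bad <- is.na(ages) | ages < lower | ages > upper
  if (any(bad)) {
    warning(sum(bad), " age value(s) missing or outside [",
            lower, ", ", upper, "]")
  }
  # return a logical vector marking the values that are safe to classify
  !bad
}

# made-up values covering both boundaries and a missing value
test_ages <- c(-2, 0, 45, 125, 130, NA)
ok <- suppressWarnings(check_age_bounds(test_ages))
print(ok)
```

Running a check like this before the classification step means an NA in the result can be traced back to a bad input rather than to a gap in the conditions.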

As Character

First, I treat the output as a character vector, since that seemed to be part of your question. We will see in the plot, though, that there are problems with this.

new_ages <- case_when(
  sim_ages >=  0 & sim_ages <= 3   ~ "Baby",
  sim_ages >  3 & sim_ages <= 5    ~ "Toddler",
  sim_ages >  5 & sim_ages <= 12   ~ "Kid",
  sim_ages >  12 & sim_ages <= 19  ~ "Teenager",
  sim_ages >  19 & sim_ages <= 30  ~ "Twenty something",
  sim_ages >  30 & sim_ages <= 65  ~ "Old'ish",
  sim_ages >  65 & sim_ages <= 90  ~ "Senior",
  sim_ages >  90 & sim_ages <= 125 ~ "Super Hero"
)
print(new_ages)
  [1] "Senior"           "Old'ish"          "Twenty something"
  [4] "Twenty something" "Old'ish"          "Old'ish"         
  [7] "Old'ish"          "Senior"           "Old'ish"         
 [10] "Kid"              "Teenager"         "Old'ish"         
 [13] "Kid"              "Twenty something" "Old'ish"         
 [16] "Senior"           "Baby"             "Twenty something"
 [19] "Old'ish"          "Twenty something" "Toddler"         
 [22] "Teenager"         "Twenty something" "Twenty something"
 [25] "Kid"              "Old'ish"          "Old'ish"         
 [28] "Teenager"         "Senior"           "Teenager"        
 [31] "Old'ish"          "Old'ish"          "Old'ish"         
 [34] "Senior"           "Old'ish"          "Senior"          
 [37] "Twenty something" "Teenager"         "Old'ish"         
 [40] "Twenty something" "Baby"             "Old'ish"         
 [43] "Twenty something" "Old'ish"          "Baby"            
 [46] "Twenty something" "Twenty something" "Senior"          
 [49] "Kid"              "Old'ish"          "Senior"          
 [52] "Twenty something" "Super Hero"       "Baby"            
 [55] "Teenager"         "Old'ish"          "Old'ish"         
 [58] "Twenty something" "Old'ish"          "Senior"          
 [61] "Senior"           "Old'ish"          "Teenager"        
 [64] "Old'ish"          "Old'ish"          "Twenty something"
 [67] "Twenty something" "Old'ish"          "Baby"            
 [70] "Old'ish"          "Kid"              "Senior"          
 [73] "Teenager"         "Old'ish"          "Twenty something"
 [76] "Old'ish"          "Old'ish"          "Old'ish"         
 [79] "Toddler"          "Old'ish"          "Old'ish"         
 [82] "Kid"              "Teenager"         "Old'ish"         
 [85] "Old'ish"          "Twenty something" "Teenager"        
 [88] "Kid"              "Twenty something" "Old'ish"         
 [91] "Old'ish"          "Old'ish"          "Teenager"        
 [94] "Twenty something" "Kid"              "Twenty something"
 [97] "Baby"             "Old'ish"          "Old'ish"         
[100] "Twenty something"

Preserve as character

Convert the character vector to a data.frame for plotting. I use stringsAsFactors = FALSE to preserve the character class of our data (as opposed to automatically converting to factor, which was the default before R 4.0.0).

plot_ages_char <- data.frame(age_class = new_ages, stringsAsFactors = FALSE)

Plot

Notice that the bars are ordered alphabetically, since the age classes are characters. We could do some work in ggplot to correct this, but that only affects the plot and not the data. If you want the correct order later, then a factor is probably what you want (as shown below).

ggplot(plot_ages_char, aes(x = age_class)) +
  geom_bar() +
  theme_bw()

As Factor with ordered levels

First we need to establish what the correct order is, since R has no way of inferring it from the labels.

ord_ages_class <- c("Baby", "Toddler", "Kid", "Teenager", 
                    "Twenty something", "Old'ish", 
                    "Senior", "Super Hero")

Convert the character vector to a data.frame. stringsAsFactors is still FALSE because the mutate() step takes care of the conversion with factor(), its levels argument, and the ord_ages_class object we just made above.

plot_ages_fctr <- data.frame(age_class = new_ages, 
                             stringsAsFactors = FALSE) %>%
  mutate(age_class = factor(age_class, 
                            levels = ord_ages_class))

It is important to note that we rearrange the levels, but do not create an explicitly ordered factor. That is, we are not telling R that “Baby” ranks below “Toddler”. So we get:

# the column is a factor, so this returns TRUE
is.factor(plot_ages_fctr$age_class)
[1] TRUE
# but it is not an ordered factor, so this returns FALSE
is.ordered(plot_ages_fctr$age_class)
[1] FALSE
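If the ranking itself matters, for example to compare classes, base R's factor() takes an ordered = TRUE argument. The sketch below uses a small made-up vector (x and lvls are hypothetical names) rather than the simulated ages.

```r
# hypothetical small example, separate from the simulated ages
lvls <- c("Baby", "Toddler", "Kid")
x <- c("Kid", "Baby", "Toddler", "Baby")

# ordered = TRUE records that the levels have a rank, not just a display order
x_ord <- factor(x, levels = lvls, ordered = TRUE)

is.ordered(x_ord)
# comparisons against a level are now meaningful
x_ord < "Kid"
# and so are range summaries
max(x_ord)
```

For plotting alone the unordered factor is enough, so this is only worth doing when later steps need comparisons such as "younger than Kid".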

Plot

Note that the bar plot now has the correct ordering of age classes.

ggplot(plot_ages_fctr, aes(x = age_class)) +
  geom_bar() +
  theme_bw()
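As an optional base R alternative (a different technique from case_when(), not a replacement for it), cut() can bin the ages and return a factor with the levels already in order. This is a minimal sketch: the break points and labels are copied from the code above, while age_breaks, age_labels and the toy_ages values are names and data I made up for the example.

```r
# break points and labels copied from the case_when() call in this post
age_breaks <- c(0, 3, 5, 12, 19, 30, 65, 90, 125)
age_labels <- c("Baby", "Toddler", "Kid", "Teenager",
                "Twenty something", "Old'ish",
                "Senior", "Super Hero")

# toy input, hypothetical values that sit on or near the boundaries
toy_ages <- c(0, 3, 4, 19, 20, 66, 126, -1)

# right = TRUE closes each interval on the right, e.g. (3, 5];
# include.lowest = TRUE also closes the first interval on the left, [0, 3]
toy_class <- cut(toy_ages, breaks = age_breaks, labels = age_labels,
                 right = TRUE, include.lowest = TRUE)

# values outside 0 to 125 come back as NA, as with case_when()
as.character(toy_class)
levels(toy_class)
```

The trade-off is that cut() only handles contiguous numeric ranges, whereas case_when() accepts any logical condition.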

forcats package

I just discovered this, but it is really nice. The forcats package is a new part of the tidyverse for dealing with categorical variables. Here we use the fct_relevel() function to do the relevel. The second line converts the result to a data.frame for plotting.

plot_ages_forcats <- fct_relevel(new_ages, ord_ages_class) %>%
  data.frame(age_class = .)

Plot

The order is correct again.

ggplot(plot_ages_forcats, aes(x = age_class)) +
  geom_bar() +
  theme_bw()

Tidy Approach

Putting the different approaches above into one concise stream, we can make this a pretty Tidy approach and string it all together: tibble() to get our data into a format that dplyr likes, mutate() to hold our transformations, case_when() to do the recode, and fct_relevel() to reorder the factor levels.

Note

The vector of age classes (ord_ages_class) we made earlier is copied here to show the full code. The case_when() call was altered to set each class to an element of the ord_ages_class vector, so you are only typing the age classes in one place, which helps avoid typos. Also, this uses .$age_class where you might hope to use simply age_class. case_when() was new’ish when this was written and the issue had been raised; more recent dplyr releases accept the bare column name inside mutate().

ord_ages_class <- c("Baby", "Toddler", "Kid", "Teenager", 
                    "Twenty something", "Old'ish", 
                    "Senior", "Super Hero")

new_ages2 <- tibble(age_class = sim_ages) %>%
  mutate(age_class = case_when(
    .$age_class >=  0 & .$age_class <= 3   ~ ord_ages_class[1],
    .$age_class >  3 & .$age_class <= 5    ~ ord_ages_class[2],
    .$age_class >  5 & .$age_class <= 12   ~ ord_ages_class[3],
    .$age_class >  12 & .$age_class <= 19  ~ ord_ages_class[4],
    .$age_class >  19 & .$age_class <= 30  ~ ord_ages_class[5],
    .$age_class >  30 & .$age_class <= 65  ~ ord_ages_class[6],
    .$age_class >  65 & .$age_class <= 90  ~ ord_ages_class[7],
    .$age_class >  90 & .$age_class <= 125 ~ ord_ages_class[8]),
    age_class = fct_relevel(age_class, ord_ages_class)
  ) 

Plot

ggplot(new_ages2, aes(x = age_class)) +
  geom_bar() +
  theme_bw()
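For comparison, here is a minimal sketch of the same pattern with bare column names, assuming a recent dplyr release where column names can be used directly inside mutate(). The toy tibble, its age column and the three classes are made up for illustration.

```r
library(dplyr)

# toy input, hypothetical values used only for this sketch
toy <- tibble(age = c(2, 10, 40, 200))

toy <- toy %>%
  mutate(age_class = case_when(
    age >= 0 & age <= 3   ~ "Baby",
    age >  3 & age <= 12  ~ "Kid",
    age > 12 & age <= 125 ~ "Adult",
    # anything that matched no condition is marked missing explicitly
    TRUE                  ~ NA_character_
  ))

print(toy$age_class)
```

The final TRUE line does not change the result, since unmatched rows are NA anyway, but it makes the out-of-bounds handling visible to whoever reads the code next.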