This post addresses a question about reclassifying a numeric vector of ages into a character vector or factor of age classes. I simulate some ages and go through a few possible approaches. They rely on the `case_when()` function from the dplyr package and the `fct_relevel()` function from the forcats package, with some additional base R thrown in for the optional approaches. The approaches are:

• As a character vector
• As a factor using `factor(., levels=c(...))`
• As a factor using `fct_relevel(., ...)`
• Put together as a single dplyr sequence with `tibble()`, `case_when()` and `fct_relevel()`

The final approach would be my preferred approach as it is concise and does not lose any readability.

### Generate some age data

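``````# packages used throughout this post
library(dplyr)    # case_when(), mutate(), tibble(), %>%
library(forcats)  # fct_relevel()
library(ggplot2)  # ggplot(), geom_bar(), theme_bw()
# set seed for random age generation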
set.seed(717)
# make a vector of random but likely ages
sim_ages <- abs(floor(rnorm(100,30,30)))
# look at distribution
print(sim_ages)``````
``````  [1] 71 49 26 23 36 38 50 66 53  7 19 31  6 29 32 77  1 25 32 26  4 19 23
[24] 26  9 63 35 15 71 13 40 38 53 88 31 71 26 14 37 23  2 35 22 48  0 30
[47] 22 78  7 41 73 22 96  3 17 35 51 21 59 71 67 54 16 40 56 21 24 36  1
[70] 51  8 79 18 37 26 41 54 42  4 38 55 10 18 54 42 28 13 12 22 36 48 41
[93] 19 28  8 20  1 65 34 25``````
``hist(sim_ages)``

## dplyr::case_when() Approach

The `dplyr::case_when()` function is a vectorized set of `if ... else if` statements. We’ll use it to sort the ages into age categories. (Note: I am in the “Old’ish” category, so I can be mean like that.)
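To make the "vectorized `if ... else if`" idea concrete, here is a minimal sketch comparing a nested base R `ifelse()` with the equivalent `case_when()`. The vector `x` and the three shortened classes are made up for illustration and are not the classes used in the rest of the post.

```r
library(dplyr)

x <- c(2, 15, 40)  # made-up ages, illustration only

# nested ifelse(): every extra class adds another level of nesting
nested <- ifelse(x <= 12, "Kid",
          ifelse(x <= 19, "Teenager", "Adult"))

# case_when(): conditions are tried top to bottom and the first TRUE wins
flat <- case_when(
  x <= 12 ~ "Kid",
  x <= 19 ~ "Teenager",
  TRUE    ~ "Adult"   # catch-all for everything not matched above
)

identical(nested, flat)
```

Because the first matching condition wins, the order of the conditions matters as soon as the ranges overlap.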

Note the `>` and `<=`, which make each age range open at the lower end and closed at the upper end. Also note that any age less than 0 or greater than 125 (the oldest person ever was 122.5 y.o.) will return `NA`, because it matches none of the conditions. You could (should) set up catches for out-of-bounds data like that somewhere in your analysis.
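One possible way to catch out-of-bounds values is to give them their own label inside `case_when()` so they are not silently mixed in with genuinely missing ages. This is only a sketch: the `ages` vector, the `"Out of range"` label and the two shortened classes are assumptions for illustration.

```r
library(dplyr)

ages <- c(2, 45, 130, -4, NA)  # made-up values, two of them out of bounds

classed <- case_when(
  is.na(ages)           ~ NA_character_,   # keep true missing values as NA
  ages < 0 | ages > 125 ~ "Out of range",  # flag impossible ages explicitly
  ages <= 19            ~ "Young",
  TRUE                  ~ "Adult"
)

# count how many records need a second look
sum(classed == "Out of range", na.rm = TRUE)
```

Flagging the bad records keeps them visible in later counts and plots, which makes them harder to overlook than an `NA`.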

### As Character

First, I treat the output as a character vector, since that seemed to be part of your question. But we will see in the plot that there are problems with this.

``````new_ages <- case_when(
sim_ages >=  0 & sim_ages <= 3   ~ "Baby",
sim_ages >  3 & sim_ages <= 5    ~ "Toddler",
sim_ages >  5 & sim_ages <= 12   ~ "Kid",
sim_ages >  12 & sim_ages <= 19  ~ "Teenager",
sim_ages >  19 & sim_ages <= 30  ~ "Twenty something",
sim_ages >  30 & sim_ages <= 65  ~ "Old'ish",
sim_ages >  65 & sim_ages <= 90  ~ "Senior",
sim_ages >  90 & sim_ages <= 125 ~ "Super Hero"
)
print(new_ages)``````
``````  [1] "Senior"           "Old'ish"          "Twenty something"
[4] "Twenty something" "Old'ish"          "Old'ish"
[7] "Old'ish"          "Senior"           "Old'ish"
[10] "Kid"              "Teenager"         "Old'ish"
[13] "Kid"              "Twenty something" "Old'ish"
[16] "Senior"           "Baby"             "Twenty something"
[19] "Old'ish"          "Twenty something" "Toddler"
[22] "Teenager"         "Twenty something" "Twenty something"
[25] "Kid"              "Old'ish"          "Old'ish"
[28] "Teenager"         "Senior"           "Teenager"
[31] "Old'ish"          "Old'ish"          "Old'ish"
[34] "Senior"           "Old'ish"          "Senior"
[37] "Twenty something" "Teenager"         "Old'ish"
[40] "Twenty something" "Baby"             "Old'ish"
[43] "Twenty something" "Old'ish"          "Baby"
[46] "Twenty something" "Twenty something" "Senior"
[49] "Kid"              "Old'ish"          "Senior"
[52] "Twenty something" "Super Hero"       "Baby"
[55] "Teenager"         "Old'ish"          "Old'ish"
[58] "Twenty something" "Old'ish"          "Senior"
[61] "Senior"           "Old'ish"          "Teenager"
[64] "Old'ish"          "Old'ish"          "Twenty something"
[67] "Twenty something" "Old'ish"          "Baby"
[70] "Old'ish"          "Kid"              "Senior"
[73] "Teenager"         "Old'ish"          "Twenty something"
[76] "Old'ish"          "Old'ish"          "Old'ish"
[79] "Toddler"          "Old'ish"          "Old'ish"
[82] "Kid"              "Teenager"         "Old'ish"
[85] "Old'ish"          "Twenty something" "Teenager"
[88] "Kid"              "Twenty something" "Old'ish"
[91] "Old'ish"          "Old'ish"          "Teenager"
[94] "Twenty something" "Kid"              "Twenty something"
[97] "Baby"             "Old'ish"          "Old'ish"
[100] "Twenty something"``````

#### Preserve as character

Convert the character vector to a data.frame for plotting. I use `stringsAsFactors = FALSE` to preserve the character class of our data (as opposed to automatically converting to factor, which was the default before R 4.0.0).

``plot_ages_char <- data.frame(age_class = new_ages, stringsAsFactors = FALSE)``

#### Plot

Notice that the bars are ordered alphabetically because the classes are characters. We could do some work in ggplot to correct this, but that only affects the plot and not the data. If you want the correct order later, then factors are probably what you want (as shown below).

``````ggplot(plot_ages_char, aes(x = age_class)) +
geom_bar() +
theme_bw()``````
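For completeness, the plot-only fix mentioned above could look something like the following sketch, which sets the axis order with `scale_x_discrete(limits = ...)`. The small `demo_ages` data frame and `demo_order` vector are made up so the example runs on its own.

```r
library(ggplot2)

# small made-up data set so the sketch is self-contained
demo_ages <- data.frame(
  age_class = c("Kid", "Baby", "Teenager", "Kid", "Baby", "Kid"),
  stringsAsFactors = FALSE
)
demo_order <- c("Baby", "Kid", "Teenager")  # desired left-to-right order

p <- ggplot(demo_ages, aes(x = age_class)) +
  geom_bar() +
  scale_x_discrete(limits = demo_order) +  # reorders the axis only
  theme_bw()

# the underlying column is still a plain character vector
class(demo_ages$age_class)
```

The data itself is unchanged, so any later table or model would fall back to alphabetical order.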

### As Factor with ordered levels

First we need to establish what the correct order is, since R has no way to infer it from the labels.

``````ord_ages_class <- c("Baby", "Toddler", "Kid", "Teenager",
"Twenty something", "Old'ish",
"Senior", "Super Hero")``````

Convert the character vector to a data.frame. `stringsAsFactors` is still `FALSE` because the `mutate()` step builds the factor itself, using `factor()` with the `levels` argument set to the `ord_ages_class` object we just made above.

``````plot_ages_fctr <- data.frame(age_class = new_ages,
stringsAsFactors = FALSE) %>%
mutate(age_class = factor(age_class,
levels = ord_ages_class))``````

It is important to note that we rearrange the levels, but do not make this an explicitly ordered factor. That is, we are not telling R that “Baby” is younger than “Toddler”. So we get:

``````# TRUE
is.factor(plot_ages_fctr$age_class)``````
``[1] TRUE``
``````#FALSE
is.ordered(plot_ages_fctr$age_class)``````
``[1] FALSE``
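If you do want R to know that the classes have a rank, base R's `factor()` takes `ordered = TRUE`. A minimal sketch, using a shortened made-up set of classes rather than the full `ord_ages_class` vector:

```r
classes <- c("Baby", "Toddler", "Kid")      # youngest to oldest
x <- c("Kid", "Baby", "Toddler", "Baby")    # made-up data

unordered <- factor(x, levels = classes)
ordered_x <- factor(x, levels = classes, ordered = TRUE)

is.ordered(unordered)   # FALSE
is.ordered(ordered_x)   # TRUE

# comparisons against a level only work for the ordered version
ordered_x < "Kid"
```

For a bar plot the two behave the same. The ordered version matters mainly when you want comparisons such as `<` or `max()`, or a model that treats the classes as ranked.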

#### Plot

Note that the bar plot has the correct ordering of age classes

``````ggplot(plot_ages_fctr, aes(x = age_class)) +
geom_bar() +
theme_bw()``````

### forcats package

I just discovered this, but it is really nice. The `forcats` package is a new part of the tidyverse for dealing with categorical variables. Here we use the `fct_relevel()` function to do the relevel. The second line converts the result to a data.frame for plotting.

``````plot_ages_forcats <- fct_relevel(new_ages, ord_ages_class) %>%
data.frame(age_class = .)``````

#### Plot

Correct order again

``````ggplot(plot_ages_forcats, aes(x = age_class)) +
geom_bar() +
theme_bw()``````

## Tidy Approach

Putting the different approaches above into a nice concise stream, we can make this a pretty tidy approach and string it all together: `tibble()` to get our data into a format that dplyr likes, `mutate()` to hold our transformations, `case_when()` to do the recode, and `fct_relevel()` to reorder the factor levels.

### Note

The vector of age classes (`ord_ages_class`) we made earlier is copied here to show the full code. The `case_when()` call was altered to set each class to an element of the `ord_ages_class` vector, so you only type the age classes in one place and avoid typos. Also, this uses `.$age_class` where you might hope to use simply `age_class`, but `case_when()` is new’ish and does not yet pick up bare column names inside `mutate()`; the issue has been raised.

``````ord_ages_class <- c("Baby", "Toddler", "Kid", "Teenager",
"Twenty something", "Old'ish",
"Senior", "Super Hero")

new_ages2 <- tibble(age_class = sim_ages) %>%
mutate(age_class = case_when(
.$age_class >=  0 & .$age_class <= 3   ~ ord_ages_class[1],
.$age_class >  3 & .$age_class <= 5    ~ ord_ages_class[2],
.$age_class >  5 & .$age_class <= 12   ~ ord_ages_class[3],
.$age_class >  12 & .$age_class <= 19  ~ ord_ages_class[4],
.$age_class >  19 & .$age_class <= 30  ~ ord_ages_class[5],
.$age_class >  30 & .$age_class <= 65  ~ ord_ages_class[6],
.$age_class >  65 & .$age_class <= 90  ~ ord_ages_class[7],
.$age_class >  90 & .$age_class <= 125 ~ ord_ages_class[8]),
age_class = fct_relevel(age_class, ord_ages_class)
) ``````
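In more recent dplyr releases (to the best of my knowledge, 0.7.0 onward), `case_when()` should resolve bare column names inside `mutate()`, so the `.$` prefix can be dropped. The sketch below assumes such a version and uses a shortened, hypothetical class list (`demo_classes`) and made-up ages (`demo_raw`) to stay self-contained.

```r
library(dplyr)
library(forcats)

demo_classes <- c("Baby", "Kid", "Adult")  # shortened, hypothetical class list
demo_raw <- c(1, 8, 40, 2, 11)             # made-up ages

demo_ages <- tibble(age = demo_raw) %>%
  mutate(
    # bare column name `age` instead of `.$age`
    age_class = case_when(
      age <= 3  ~ demo_classes[1],
      age <= 12 ~ demo_classes[2],
      TRUE      ~ demo_classes[3]
    ),
    # put the factor levels in the intended order
    age_class = fct_relevel(age_class, demo_classes)
  )

levels(demo_ages$age_class)
```

If your installed dplyr errors on the bare name, the `.$` form shown earlier remains a safe fallback.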

#### Plot

``````ggplot(new_ages2, aes(x = age_class)) +
geom_bar() +
theme_bw()``````