Top_n values, with the rest as "Others"

Author

P K Parida

An example of how to keep the top n values and combine the rest into a single "Others" category.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     

Bring the data into R

data <- data.frame(group = c("A", "B", "A", "A", "C", "C", "D", "E", "F", "D", "G"))
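Before choosing a cutoff, it helps to look at the counts per group. A minimal sketch using dplyr's `count()` with `sort = TRUE`; the names `counts` and `cutoff` are illustrative, and taking the third-largest count as the cutoff assumes we want exactly 3 groups.

```r
library(dplyr)

data <- data.frame(group = c("A", "B", "A", "A", "C", "C", "D", "E", "F", "D", "G"))

# Count rows per group, largest count first (tied groups keep alphabetical order)
counts <- data %>% count(group, sort = TRUE)
print(counts)

# The count of the 3rd-ranked group is the smallest count we keep
cutoff <- counts$n[3]
print(cutoff)
```

A count-based cutoff keeps every group tied at that count, so with ties at the boundary it can keep more than 3 groups.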

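Code to keep the top 3 groups by count and lump the rest into "Others". The cutoff is hardcoded as n >= 2, which keeps exactly the top 3 groups for this data (A has 3 rows, C and D have 2 each, and B, E, F and G have 1 each).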

data %>%
  count(group) %>% 
  arrange(desc(n)) %>% # sort by count to see where the cutoff should fall
  mutate(id = if_else(n >= 2, 1L, 2L)) %>% # id 1 = groups to keep, id 2 = groups to lump
  group_by(id) %>%
  arrange(id, -n) %>% 
  mutate(group = if_else(id == 2L, "Others", group),
         number = if_else(group == "Others", sum(n), n)) %>% # sum(n) is computed within each id
  ungroup() %>% # drop the grouping so select() does not add `id` back
  select(group, number) %>% 
  distinct()
# A tibble: 4 × 2
  group  number
  <chr>   <int>
1 A           3
2 C           2
3 D           2
4 Others      4
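The `n >= 2` threshold only yields the top 3 because of how this particular data is distributed. A sketch that picks the kept groups by rank instead, assuming a hypothetical `top_k` parameter and using dplyr's `min_rank()`. Tied counts share a rank, so a tie at the boundary can still keep more than `top_k` groups.

```r
library(dplyr)

data <- data.frame(group = c("A", "B", "A", "A", "C", "C", "D", "E", "F", "D", "G"))

top_k <- 3  # hypothetical parameter: number of groups to keep

result <- data %>%
  count(group) %>%
  # min_rank(desc(n)) gives rank 1 to the largest count; tied counts share a rank
  mutate(group = if_else(min_rank(desc(n)) <= top_k, group, "Others")) %>%
  group_by(group) %>%
  summarise(number = sum(n)) %>%
  # kept groups first (largest count first), "Others" last
  arrange(group == "Others", desc(number))

print(result)
```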

Another way of doing the same thing: relabel the low-count groups as "Others" first, then sum the counts for each label.

data %>%
  count(group) %>% 
  arrange(desc(n)) %>% # sort by count to see where the cutoff should fall
  mutate(group1 = if_else(n >= 2, group, "Others")) %>% # relabel groups below the cutoff
  select(group1, n) %>% 
  group_by(group1) %>% 
  summarize(sum = sum(n)) # add up the counts for each label
# A tibble: 4 × 2
  group1   sum
  <chr>  <int>
1 A          3
2 C          2
3 D          2
4 Others     4
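The forcats package, which tidyverse attaches, can also do the lumping directly on the grouping column. A sketch using `fct_lump_n()` (keep the 3 most frequent levels) and `fct_lump_min()` (keep levels that appear at least 2 times, matching the n >= 2 cutoff above); `other_level` sets the label of the lumped category, and forcats always places it last among the levels. This assumes forcats 0.5.0 or later, where these functions were introduced.

```r
library(dplyr)
library(forcats)

data <- data.frame(group = c("A", "B", "A", "A", "C", "C", "D", "E", "F", "D", "G"))

# Keep the 3 most frequent levels; by default (ties.method = "min") ties at the boundary are all kept
by_rank <- data %>%
  mutate(group = fct_lump_n(group, n = 3, other_level = "Others")) %>%
  count(group)

# Keep levels that appear at least twice
by_min <- data %>%
  mutate(group = fct_lump_min(group, min = 2, other_level = "Others")) %>%
  count(group)

print(by_rank)
print(by_min)
```

Because `group` becomes a factor, `count()` returns the rows in level order, so "Others" comes out last without an extra `arrange()`.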