Checking rows with a specific pattern

library(readr)
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------ tidyverse 1.3.0 --
## v ggplot2 3.3.0     v dplyr   0.8.4
## v tibble  2.1.3     v stringr 1.4.0
## v tidyr   1.0.2     v forcats 0.4.0
## v purrr   0.3.3
## Warning: package 'ggplot2' was built under R version 3.6.3
## -- Conflicts --------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
incomeUS <- read_csv("adultincome.csv")
## Parsed with column specification:
## cols(
##   age = col_double(),
##   workclass = col_character(),
##   fnlwgt = col_double(),
##   education = col_character(),
##   education.num = col_double(),
##   marital.status = col_character(),
##   occupation = col_character(),
##   relationship = col_character(),
##   race = col_character(),
##   sex = col_character(),
##   capital.gain = col_double(),
##   capital.loss = col_double(),
##   hours.per.week = col_double(),
##   native.country = col_character(),
##   income = col_character()
## )
incomeUS %>% 
  group_by(marital.status) %>% 
  summarise(counts = n()) %>% 
  arrange(desc(counts))
## # A tibble: 7 x 2
##   marital.status        counts
##   <chr>                  <int>
## 1 Married-civ-spouse     14976
## 2 Never-married          10683
## 3 Divorced                4443
## 4 Separated               1025
## 5 Widowed                  993
## 6 Married-spouse-absent    418
## 7 Married-AF-spouse         23

We can put all the married people in the same group.

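This variable has 7 labels, three of which describe married people: “Married-AF-spouse”, “Married-civ-spouse”, and “Married-spouse-absent”.

All three levels can be grouped into a single group, “Married”.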

Replace every value of marital.status that matches “Married-AF-spouse”, “Married-civ-spouse”, or “Married-spouse-absent” with “Married”.

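# One regular expression; "|" means "or", so any of the three labels matches
patterns <- "Married-AF-spouse|Married-civ-spouse|Married-spouse-absent"
incomeUS <- incomeUS %>% 
  # Store the result in a new column, leaving marital.status unchanged
  mutate(marital = stringr::str_replace_all(marital.status, patterns, "Married"))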
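
The same replacement can be tried on a small vector before running it on the full data. The sketch below is only an illustration: the `status` vector is made up and not taken from adultincome.csv, and the `^...$` anchors are an optional extra (an assumption, not part of the code above) that restricts the match to whole labels.

```r
library(stringr)

# Hypothetical toy vector, not taken from adultincome.csv
status <- c("Married-civ-spouse", "Divorced", "Married-AF-spouse",
            "Never-married", "Married-spouse-absent")

# Anchored alternation: a value is replaced only if the whole label matches
pattern <- "^(Married-AF-spouse|Married-civ-spouse|Married-spouse-absent)$"

collapsed <- str_replace_all(status, pattern, "Married")
collapsed
```

Note that “Never-married” is left alone: the pattern is case-sensitive and none of the three alternatives occurs in that label.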

To check the resulting column, count the rows in each group again:

incomeUS %>% 
  group_by(marital) %>% 
  summarise(counts = n()) %>% 
  arrange(desc(counts))
## # A tibble: 5 x 2
##   marital       counts
##   <chr>          <int>
## 1 Married        15417
## 2 Never-married  10683
## 3 Divorced        4443
## 4 Separated       1025
## 5 Widowed          993
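
As a quick sanity check, the new “Married” count should equal the sum of the three original married groups. The sketch below simply copies the counts from the first summary table into a named vector (`married_counts` is a name introduced here for illustration) and adds them up.

```r
# Counts copied from the first summary table above
married_counts <- c(`Married-civ-spouse`    = 14976,
                    `Married-spouse-absent` = 418,
                    `Married-AF-spouse`     = 23)

# Total number of rows that should now carry the label "Married"
married_total <- sum(married_counts)
married_total
```

The total, 15417, matches the “Married” row in the second table, and the number of groups drops from 7 to 5 because three labels were merged into one.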