suppressPackageStartupMessages(library("tidyverse"))
package 'tidyverse' was built under R version 3.6.3
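# A number word (one to ten) at a word boundary, one or more spaces, then the next word.
numword <- "\\b(one|two|three|four|five|six|seven|eight|nine|ten) +(\\w+)"
# Keep only the sentences that contain a match, then extract the matched text.
sentences[str_detect(sentences, numword)] %>%
  str_extract(numword)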
[1] "seven books" "two met" "two factors" "three lists" "seven is"
[6] "two when" "ten inches" "one war" "one button" "six minutes"
[11] "ten years" "two shares" "two distinct" "five cents" "two pins"
[16] "five robins" "four kinds" "three story" "three inches" "six comes"
[21] "three batches" "two leaves"
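The pattern above already contains two capture groups, so the number and the following word can also be pulled out separately. The following is a minimal sketch, assuming `str_match()` from stringr and a small made-up input vector `x` (hypothetical, not taken from `sentences`):

```r
library(stringr)

# Same pattern as above: group 1 is the number word, group 2 is the word after it.
numword <- "\\b(one|two|three|four|five|six|seven|eight|nine|ten) +(\\w+)"

# Hypothetical sample input, used only for illustration.
x <- c("He ate seven books.", "No numbers here.", "It took ten years.")

# str_match() returns a character matrix: the full match in column 1,
# then one column per capture group. Non-matching strings give NA rows.
m <- str_match(x, numword)

# Drop the rows for strings that did not match.
m <- m[!is.na(m[, 1]), , drop = FALSE]
m
```

Column 2 of `m` then holds the number words and column 3 the words that follow them. Note that `str_match()` reports only the first match in each string; `str_match_all()` would be needed to catch several number words in one sentence.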
This is done in two steps. First, extract the contractions from the sentences that contain one. Second, split each contraction on the apostrophe.
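# Letters, an apostrophe, then more letters (for example "don't").
contraction <- "([A-Za-z]+)'([A-Za-z]+)"
# Keep sentences with a contraction, extract it, then split on the apostrophe.
sentences[str_detect(sentences, contraction)] %>%
  str_extract(contraction) %>%
  str_split("'")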
[[1]]
[1] "It" "s"
[[2]]
[1] "man" "s"
[[3]]
[1] "don" "t"
[[4]]
[1] "store" "s"
[[5]]
[1] "workmen" "s"
[[6]]
[1] "Let" "s"
[[7]]
[1] "sun" "s"
[[8]]
[1] "child" "s"
[[9]]
[1] "king" "s"
[[10]]
[1] "It" "s"
[[11]]
[1] "don" "t"
[[12]]
[1] "queen" "s"
[[13]]
[1] "don" "t"
[[14]]
[1] "pirate" "s"
[[15]]
[1] "neighbor" "s"
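As an alternative to extracting and then splitting, the two capture groups in the pattern can return both pieces in a single step. This is only a sketch, assuming `str_match()` from stringr and a made-up input vector `x` (hypothetical, not taken from `sentences`):

```r
library(stringr)

# Same pattern as above: group 1 is the text before the apostrophe, group 2 the text after.
contraction <- "([A-Za-z]+)'([A-Za-z]+)"

# Hypothetical sample input, used only for illustration.
x <- c("It's a fine day.", "They don't know.", "Nothing to split.")

# Keep only the strings that contain a contraction.
has_contraction <- str_detect(x, contraction)

# Column 1 is the full match; columns 2 and 3 are the two capture groups.
parts <- str_match(x[has_contraction], contraction)
parts[, 2:3, drop = FALSE]
```

The result is a matrix with one row per contraction rather than a list, which can be easier to convert to a data frame. Like `str_extract()`, `str_match()` returns only the first contraction in each string.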