suppressPackageStartupMessages(library("tidyverse"))
package ‘tidyverse’ was built under R version 3.6.3
1. Words starting with vowels
str_subset(stringr::words, "^[aeiou]")
[1] "a" "able" "about" "absolute" "accept" "account"
[7] "achieve" "across" "act" "active" "actual" "add"
[13] "address" "admit" "advertise" "affect" "afford" "after"
[19] "afternoon" "again" "against" "age" "agent" "ago"
[25] "agree" "air" "all" "allow" "almost" "along"
[31] "already" "alright" "also" "although" "always" "america"
[37] "amount" "and" "another" "answer" "any" "apart"
[43] "apparent" "appear" "apply" "appoint" "approach" "appropriate"
[49] "area" "argue" "arm" "around" "arrange" "art"
[55] "as" "ask" "associate" "assume" "at" "attend"
[61] "authority" "available" "aware" "away" "awful" "each"
[67] "early" "east" "easy" "eat" "economy" "educate"
[73] "effect" "egg" "eight" "either" "elect" "electric"
[79] "eleven" "else" "employ" "encourage" "end" "engine"
[85] "english" "enjoy" "enough" "enter" "environment" "equal"
[91] "especial" "europe" "even" "evening" "ever" "every"
[97] "evidence" "exact" "example" "except" "excuse" "exercise"
[103] "exist" "expect" "expense" "experience" "explain" "express"
[109] "extra" "eye" "idea" "identify" "if" "imagine"
[115] "important" "improve" "in" "include" "income" "increase"
[121] "indeed" "individual" "industry" "inform" "inside" "instead"
[127] "insure" "interest" "into" "introduce" "invest" "involve"
[133] "issue" "it" "item" "obvious" "occasion" "odd"
[139] "of" "off" "offer" "office" "often" "okay"
[145] "old" "on" "once" "one" "only" "open"
[151] "operate" "opportunity" "oppose" "or" "order" "organize"
[157] "original" "other" "otherwise" "ought" "out" "over"
[163] "own" "under" "understand" "union" "unit" "unite"
[169] "university" "unless" "until" "up" "upon" "use"
[175] "usual"
2. Words that contain only consonants
str_subset(stringr::words, "^[^aeiou]+$")
[1] "by" "dry" "fly" "mrs" "try" "why"
This seems to require using the + pattern introduced later, unless one wants to be very verbose and specify words of certain lengths.
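To make the trade-off concrete, here is a minimal sketch on a hypothetical five-word vector (not stringr::words). It compares the + pattern with the verbose alternation, written out here only for word lengths one to three, and with a third option that negates str_detect(). The sample vector and variable names are assumptions for illustration.

```r
library(stringr)

# Hypothetical sample; the exercise itself uses stringr::words
x <- c("by", "dry", "apple", "mrs", "tree")

# One or more non-vowels spanning the whole string
with_plus <- str_subset(x, "^[^aeiou]+$")

# Verbose version without +: one alternative per word length (1 to 3 only)
verbose <- str_subset(
  x,
  "^[^aeiou]$|^[^aeiou][^aeiou]$|^[^aeiou][^aeiou][^aeiou]$"
)

# Alternative: keep the strings in which no vowel is detected
no_vowel <- x[!str_detect(x, "[aeiou]")]

# All three approaches keep "by", "dry" and "mrs"
with_plus
```

The verbose form needs one more alternative for every extra word length, which is why the + version is preferable once it is available.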
3. Words that end with “-ed” but not with “-eed”.
str_subset(stringr::words, "[^e]ed$")
[1] "bed" "hundred" "red"
The pattern above will not match the word “ed”. If we wanted to include that, we could include it as a special case.
str_subset(c("ed", stringr::words), "(^|[^e])ed$")
[1] "ed" "bed" "hundred" "red"
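As an alternative sketch, a negative lookbehind expresses "ed at the end, not preceded by e" directly and handles the bare word “ed” without a special case. This assumes the regex engine behind stringr (ICU) supports lookbehind, which it does for fixed-length patterns like this one; the sample vector is hypothetical.

```r
library(stringr)

# Hypothetical sample that includes the edge case "ed"
x <- c("ed", "bed", "feed", "need", "red")

# Special-case version from the text: start of string or a non-e before "ed"
special_case <- str_subset(x, "(^|[^e])ed$")

# Lookbehind version: "ed" at the end, as long as the previous character is not "e".
# At the start of the string there is no previous character, so "ed" itself passes.
lookbehind <- str_subset(x, "(?<!e)ed$")

# Both keep "ed", "bed" and "red" and drop "feed" and "need"
lookbehind
```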
4. Words ending in “ing” or “ise”:
str_subset(stringr::words, "i(ng|se)$")
[1] "advertise" "bring" "during" "evening" "exercise" "king" "meaning"
[8] "morning" "otherwise" "practise" "raise" "realise" "ring" "rise"
[15] "sing" "surprise" "thing"
length(str_subset(stringr::words, "(cei|[^c]ie)"))
[1] 14
length(str_subset(stringr::words, "(cie|[^c]ei)"))
[1] 3
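The two counts above compare words that fit the “i before e except after c” rule (cei, or ie after a letter other than c) with words that break it (cie, or ei after a letter other than c). A small sketch on a hypothetical six-word vector shows which words fall on each side; the vector and variable names are assumptions for illustration.

```r
library(stringr)

# Hypothetical sample; the counts in the text come from stringr::words
x <- c("receive", "believe", "science", "weigh", "field", "ceiling")

# Words consistent with the rule: "cei", or "ie" not preceded by "c"
follows_rule <- str_subset(x, "(cei|[^c]ie)")

# Words that contradict the rule: "cie", or "ei" not preceded by "c"
breaks_rule <- str_subset(x, "(cie|[^c]ei)")

follows_rule  # "receive" "believe" "field" "ceiling"
breaks_rule   # "science" "weigh"
```

Note that [^c] must match a character, so a word beginning with “ie” or “ei” would be counted by neither pattern.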
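As for whether “q” is always followed by “u”: in the stringr::words dataset, yes. The pattern below looks for a “q” followed by any character other than “u”.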
str_view(stringr::words, "q[^u]", match = TRUE)
In the English language in general, no. However, the exceptions are few and mostly loanwords, such as “burqa” and “cinq”, plus “qwerty”. That I had to add all of these examples to the list of words that spellchecking should ignore indicates how rare they are.
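One subtlety is worth a sketch: q[^u] requires some character after the “q”, so it cannot match a word that ends in “q”, such as “cinq”. A negative lookahead, q(?!u), covers that case as well. The sample vector below is hypothetical and simply reuses the loanwords mentioned above.

```r
library(stringr)

# Hypothetical sample: two ordinary words plus the exceptions from the text
x <- c("queen", "quick", "burqa", "cinq", "qwerty")

# "q" followed by some character other than "u"; misses a word-final "q"
needs_next_char <- str_subset(x, "q[^u]")

# "q" not followed by "u"; also matches when "q" is the last character
lookahead <- str_subset(x, "q(?!u)")

needs_next_char  # "burqa" "qwerty"
lookahead        # "burqa" "cinq" "qwerty"
```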
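In the general case, deciding from spelling alone whether a word is written in British rather than American English is hard, and could require a dictionary. But there are a few heuristics that account for some common cases. British English tends to use “ou” where American English has “o”, “ae” and “oe” where American English usually has “e”, and the endings “-ise” and “-yse” where American English has “-ize” and “-yze”.
The regex ou|ise$|ae|oe|yse$ would match these.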
There are other spelling differences between American and British English, but they are not patterns amenable to regular expressions. Handling them would require a dictionary that lists the differing spellings word by word.
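A short sketch shows both what this heuristic catches and where it overreaches. The word list is hypothetical; the pattern is the one given above.

```r
library(stringr)

# Hypothetical sample of British spellings, American spellings, and a neutral word
x <- c("colour", "color", "realise", "realize",
       "analyse", "about", "encyclopaedia")

british_pattern <- "ou|ise$|ae|oe|yse$"

flagged <- str_subset(x, british_pattern)

# "about" is flagged because it contains "ou", although it is spelled
# the same way in both varieties: a false positive of the heuristic.
flagged  # "colour" "realise" "analyse" "about" "encyclopaedia"
```

The false positive on “about” illustrates why a dictionary would be needed for anything beyond a rough guess.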
For the United States, phone numbers have a format like 123-456-7890.
x <- c("123-456-7890", "1235-2351")
str_view(x, "\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d")
or
str_view(x, "[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]")
This regular expression can be simplified with the {m,n} regular expression modifier introduced in the next section (used here in its exact-count form, {n}):
str_view(x, "\\d{3}-\\d{3}-\\d{4}")
This answer can be improved and expanded. Note that this pattern does not reject numbers that are invalid because of an unassigned area code, and it does not handle special numbers like 911 or extensions. See the Wikipedia page for the North American Numbering Plan for more information on the complexities of US phone numbers, and this Stack Overflow question for a discussion of using a regex for phone number validation.
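As one possible expansion, the sketch below loosens the pattern to accept a few common ways of writing the same ten digits: optional parentheses around the area code and a hyphen, dot, or space as separator. These accepted formats are assumptions chosen for illustration, not a validation rule; the pattern still knows nothing about unassigned area codes.

```r
library(stringr)

# Hypothetical inputs: three ways of writing a ten-digit number, plus a non-match
x <- c("123-456-7890", "(123) 456-7890", "123.456.7890", "1235-2351")

# ^ and $ anchor the whole string.
# \\(? and \\)? allow optional parentheses around the area code.
# [-. ]? allows an optional hyphen, dot, or space between the groups.
# Limitation: the parentheses are not checked for balance.
phone_pattern <- "^\\(?\\d{3}\\)?[-. ]?\\d{3}[-. ]?\\d{4}$"

is_phone <- str_detect(x, phone_pattern)

is_phone  # TRUE TRUE TRUE FALSE
```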