function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
fixed = FALSE, useBytes = FALSE, invert = FALSE)
NULL
Check the following collection of words:
begin beige beijing beging bring boing banger
Easy way to use regex
a matches ‘apple’, ‘bag’, ‘hat’, ‘dam’
pp also matches ‘apple’
cat matches ‘catch’, ‘locate’, ‘ducat’, and of course, ‘cat’
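As a small illustration of literal matching, the sketch below runs the three patterns above through Base R's grepl() and grep(). The vector name words and the extra non-matching entry "dog" are made up for the example.

```r
# Illustrative word list; "dog" is added only to show a non-match
words <- c("apple", "bag", "hat", "dam", "catch", "locate", "ducat", "cat", "dog")

# A literal pattern matches anywhere inside the string
grepl("a", words)
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

grep("pp", words, value = TRUE)
# [1] "apple"

grep("cat", words, value = TRUE)
# [1] "catch"  "locate" "ducat"  "cat"
```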
Metacharacters are where the real power of regex resides. There are various types:
. (dot) matches ANY character
\w matches any word character (letters, digits, and the underscore)
\d matches any digit (0 through 9)
\s matches whitespace (including tabs, newlines, etc.)
The negations of these are \W, \D, \S
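A quick way to see these metacharacters at work is to test them against a few short strings. The vector x below is invented for the example; note that in an R string each backslash has to be doubled.

```r
# Hypothetical test strings
x <- c("a1", "b b", "c_d", "42", "e-f")

grepl("\\d", x)   # contains a digit
# [1]  TRUE FALSE FALSE  TRUE FALSE

grepl("\\s", x)   # contains whitespace
# [1] FALSE  TRUE FALSE FALSE FALSE

grepl("\\W", x)   # contains a non-word character (underscore counts as a word character)
# [1] FALSE  TRUE FALSE FALSE  TRUE

grepl("b.g", c("bag", "big", "b g", "bg"))   # dot needs exactly one character
# [1]  TRUE  TRUE  TRUE FALSE
```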
* means zero or more
? means zero or one time
+ means one or more times
{n} means “exactly n times”
{n,} means “n or more times”
{n,m} means “between n and m times”
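The quantifiers can be compared side by side on strings that differ only in how many times "b" is repeated. This is a minimal sketch; the test vector x is made up for the purpose.

```r
# Strings with 0, 1, 2 and 3 repetitions of "b"
x <- c("ac", "abc", "abbc", "abbbc")

grepl("ab*c", x)       # zero or more b
# [1] TRUE TRUE TRUE TRUE
grepl("ab?c", x)       # zero or one b
# [1]  TRUE  TRUE FALSE FALSE
grepl("ab+c", x)       # one or more b
# [1] FALSE  TRUE  TRUE  TRUE
grepl("ab{2}c", x)     # exactly two b
# [1] FALSE FALSE  TRUE FALSE
grepl("ab{2,}c", x)    # two or more b
# [1] FALSE FALSE  TRUE  TRUE
grepl("ab{1,2}c", x)   # between one and two b
# [1] FALSE  TRUE  TRUE FALSE
```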
These characters are used to define the bounds of strings
^ indicates the start of a line
$ indicates the end of a line
\b indicates the bounds of a “word”
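The effect of the anchors is easiest to see when the same literal, here "cat", sits at different positions. The strings in x are invented for this sketch.

```r
# "cat" at the start, inside, at the end, and as a separate word
x <- c("cat", "catalog", "bobcat", "a cat sat")

grepl("^cat", x)        # starts with "cat"
# [1]  TRUE  TRUE FALSE FALSE
grepl("cat$", x)        # ends with "cat"
# [1]  TRUE FALSE  TRUE FALSE
grepl("\\bcat\\b", x)   # "cat" as a whole word
# [1]  TRUE FALSE FALSE  TRUE
grepl("^cat$", x)       # the entire string is "cat"
# [1]  TRUE FALSE FALSE FALSE
```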
These characters are used for grouping regex characters:
(...) - any set of characters bounded by parentheses is taken as a unit
This is a useful construct for
[xyz] is taken as a match on "any one of x, y, or z"
This would match “zoo”, “xenon”, “eyes”
Character classes also allow us to use ranges:
[1-3] will match “156” and “562” but will not match “094”
To match all the English lower case letters we can use [a-z]
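The character-class examples above can be checked directly in R. The extra strings ("apple", "Regex", "reg3x") are added here only to show non-matches, and the result for [a-z] is left uncommented because R's documentation warns that range interpretation can depend on the locale.

```r
# Any one of x, y or z, anywhere in the string
grepl("[xyz]", c("zoo", "xenon", "eyes", "apple"))
# [1]  TRUE  TRUE  TRUE FALSE

# A range of digits: at least one of 1, 2 or 3 must appear
grepl("[1-3]", c("156", "562", "094"))
# [1]  TRUE  TRUE FALSE

# Only lower case English letters from start to end
# (ranges such as a-z may behave differently in some locales)
grepl("^[a-z]+$", c("regex", "Regex", "reg3x"))
```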
We also have named character classes:
[:alnum:] - any alphanumeric character (same as [A-Za-z0-9])
[:alpha:] - any English letter (same as [A-Za-z])
[:upper:] - any upper case character (same as [A-Z])
[:lower:] - any lower case character (same as [a-z])
[:digit:] - 0 through 9 (same as [0-9])
… and more
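One detail worth showing in code: in an R pattern a named class goes inside a bracket expression, so it is written with double brackets, e.g. [[:alpha:]]. The sketch below uses a made-up vector x.

```r
# Hypothetical inputs: letters only, upper case only, digits only, mixed with punctuation
x <- c("abc", "ABC", "123", "a1!")

grepl("^[[:alpha:]]+$", x)   # letters only
# [1]  TRUE  TRUE FALSE FALSE
grepl("^[[:digit:]]+$", x)   # digits only
# [1] FALSE FALSE  TRUE FALSE
grepl("^[[:alnum:]]+$", x)   # letters and digits only
# [1]  TRUE  TRUE  TRUE FALSE
grepl("[[:upper:]]", x)      # contains an upper case letter
# [1] FALSE  TRUE FALSE FALSE
```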
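To match a metacharacter literally, escape it with a backslash (\).
For example, the pattern M.Sc. might not yield the desired result, because . will match any character.
Use M\.Sc\. instead. Inside an R string the backslash itself must be doubled (\\.).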
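A short sketch of the escaping problem with M.Sc. follows. The test strings are invented; "MaScy" is there to show what the unescaped dot lets through. Setting fixed = TRUE is an alternative that treats the whole pattern as literal text.

```r
x <- c("M.Sc.", "MaScy", "MSc")

# Unescaped: each dot matches any single character
grepl("M.Sc.", x)
# [1]  TRUE  TRUE FALSE

# Escaped: each dot must be a literal full stop (backslashes doubled in the R string)
grepl("M\\.Sc\\.", x)
# [1]  TRUE FALSE FALSE

# Alternative: switch off regex interpretation entirely
grepl("M.Sc.", x, fixed = TRUE)
# [1]  TRUE FALSE FALSE
```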
Lookaround patterns examine the characters neighbouring a potential match to decide whether it counts, without making those characters part of the match.
Positive Lookahead
X(?=Y) - is a match if X is followed by Y (where X or Y can be one or more characters)
Negative Lookahead
X(?!Y) - is a match if X is not followed by Y
Positive Lookbehind
(?<=Y)X - is a match if X is preceded by Y
Negative Lookbehind
(?<!Y)X - is a match only if X is not preceded by Y
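The four lookaround forms can be tried in Base R by passing perl = TRUE, which switches to the PCRE engine. The example strings below are made up for illustration.

```r
x <- c("price: 100USD", "price: 100EUR", "USD100")

# Positive lookahead: digits followed by "USD"; "USD" is not part of the match
grepl("\\d+(?=USD)", x, perl = TRUE)
# [1]  TRUE FALSE FALSE
regmatches(x, regexpr("\\d+(?=USD)", x, perl = TRUE))
# [1] "100"

# Negative lookahead: a "q" that is not followed by "u"
grepl("q(?!u)", c("quit", "Iraq", "qatar"), perl = TRUE)
# [1] FALSE  TRUE  TRUE

# Positive lookbehind: digits preceded by a dollar sign
grepl("(?<=\\$)\\d+", c("$50", "50"), perl = TRUE)
# [1]  TRUE FALSE

# Negative lookbehind: "happy" not preceded by "un"
grepl("(?<!un)happy", c("happy", "unhappy"), perl = TRUE)
# [1]  TRUE FALSE
```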
^abc$ - exactly "abc"
[0-9]{4} - contains four digits
\bword\b - whole word "word"
\w{5} and [[:alnum:]]{5} - five word characters, or five alphanumeric characters
Quick Syntax Cheat Sheet
Symbol | Meaning
. | any character
* | 0 or more
+ | 1 or more
? | 0 or 1
[] | character class
() | capture group
{n} | exactly n times
{n,} | n or more times
{n,m} | between n and m times
| | OR
\b | word boundary
\d | digit (shortcut)
\s | whitespace (space, tab)
detection functions: grep() and grepl()
substitution functions: sub() and gsub()
location functions: regexpr() and regexec()
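To make the three families concrete, here is a minimal sketch that applies one function from each to the same vector. The vector x is invented for the example; as.integer() is used only to drop the attributes that regexpr() attaches to its result.

```r
x <- c("apple pie", "banana", "cherry pie")

# Detection: which elements match?
grep("pie", x)     # indices of the matching elements
# [1] 1 3
grepl("pie", x)    # one logical value per element
# [1]  TRUE FALSE  TRUE

# Substitution: replace the first match (sub) or every match (gsub)
sub("p", "P", x)
# [1] "aPple pie"  "banana"     "cherry Pie"
gsub("p", "P", x)
# [1] "aPPle Pie"  "banana"     "cherry Pie"

# Location: start position of the first match, -1 where there is none
as.integer(regexpr("pie", x))
# [1]  7 -1  8
```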
grep, grepl
function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
    fixed = FALSE, useBytes = FALSE, invert = FALSE)
NULL
Main arguments:
Argument | Description
pattern | the regular expression
x | the vector against which the regex is matched
ignore.case | control case sensitivity
value | whether to return the index or the actual string
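The arguments listed above can be seen in action with a short vector. The names in states are chosen for illustration only; invert, which appears in the function signature, is included to show how non-matching elements are returned.

```r
states <- c("Lagos", "Kano", "lagos island", "Kaduna")

# Default: case-sensitive, returns indices
grep("lagos", states)
# [1] 3

# ignore.case = TRUE matches regardless of case
grep("lagos", states, ignore.case = TRUE)
# [1] 1 3

# value = TRUE returns the matching strings instead of their indices
grep("lagos", states, ignore.case = TRUE, value = TRUE)
# [1] "Lagos"        "lagos island"

# invert = TRUE returns the elements that do NOT match
grep("^K", states, value = TRUE, invert = TRUE)
# [1] "Lagos"        "lagos island"
```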
sub(), gsub()
Backreferences
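A backreference such as \1 or \2 refers to the text captured by the first or second parenthesised group. The sketch below, with made-up input strings, shows one backreference in the replacement and one inside the pattern itself.

```r
# Swap "Surname, Firstname" into "Firstname Surname"
x <- c("Doe, Jane", "Smith, John")
sub("(\\w+), (\\w+)", "\\2 \\1", x)
# [1] "Jane Doe"   "John Smith"

# \1 inside the pattern: a character immediately followed by itself
gsub("(\\w)\\1", "<\\1\\1>", "balloon")
# [1] "ba<ll><oo>n"
```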
regexpr(), regexec()
Unlike the grep- and sub-like functions, these give the actual positions within a string where the matches are made.
stringr package
The {stringr} package comes with several functions for regular expressions similar to those from Base R.
If not installed, do so with install.packages("stringr").
To load the package into your R session, run library(stringr).
{stringr} | Base R
str_which | grep
str_detect | grepl
str_subset | grep, where value is set to TRUE
str_replace | sub
str_replace_all | gsub
str_locate | regexpr
str_locate_all | gregexpr
For more, see vignette("from-base", "stringr").
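The correspondence can be checked by running the {stringr} functions on a small vector; note that the string comes first and the pattern second. This sketch assumes {stringr} is installed, and the vector x is invented for the example.

```r
library(stringr)

x <- c("apple pie", "banana", "cherry pie")

str_which(x, "pie")            # like grep()
# [1] 1 3
str_detect(x, "pie")           # like grepl()
# [1]  TRUE FALSE  TRUE
str_subset(x, "pie")           # like grep(value = TRUE)
# [1] "apple pie"  "cherry pie"
str_replace(x, "p", "P")       # like sub()
# [1] "aPple pie"  "banana"     "cherry Pie"
str_replace_all(x, "p", "P")   # like gsub()
# [1] "aPPle Pie"  "banana"     "cherry Pie"
str_locate(x, "pie")           # start/end matrix; NA where there is no match
```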
{stringr}
Unlike Base R's default regex engine, {stringr} supports lookaround regex out of the box (in Base R, lookaround requires perl = TRUE)
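As a brief illustration of the lookaround point, the sketch below extracts digits that are followed by "USD" with {stringr}, then does the same check in Base R, where the PCRE engine has to be requested with perl = TRUE. The input strings are made up.

```r
library(stringr)

amounts <- c("100USD", "100EUR")

# {stringr}: the lookahead works without extra arguments
str_extract(amounts, "\\d+(?=USD)")
# [1] "100" NA

# Base R: the same pattern needs perl = TRUE
grepl("\\d+(?=USD)", amounts, perl = TRUE)
# [1]  TRUE FALSE
```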
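{stringr} functions are particularly well designed for piping, because the input vector is always the first argument. With grep and sub, the input vector is the 2nd or 3rd argument, so piping is more complicated.
library(stringr)
s <- c("ebony", "Keduna", "Lagoss", "bauchi", "Rivers")
s |>
  str_replace("^Ke", "Ka") |>           # fix "Keduna" -> "Kaduna"
  str_replace("(Lagos)(s)", "\\1") |>   # drop the extra "s" in "Lagoss"
  str_replace("(y)$", "\\1i") |>        # append "i" after a final "y"
  str_to_title()                        # capitalise each name
[1] "Ebonyi" "Kaduna" "Lagos" "Bauchi" "Rivers"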
The {stringr} variants also come with helper functions regex(), fixed(), coll(), and boundary() for finer control e.g.
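A minimal sketch of three of these helpers follows, assuming {stringr} is installed; the input strings are invented. regex() exposes options such as ignore_case, fixed() matches the pattern as literal text, and boundary() matches boundaries such as words rather than a pattern.

```r
library(stringr)

x <- c("M.Sc.", "m.sc.", "MaScy")

# Plain string pattern: treated as a regex, so "." matches any character
str_detect(x, "M.Sc.")
# [1]  TRUE FALSE  TRUE

# fixed(): match the text literally, dots included
str_detect(x, fixed("M.Sc."))
# [1]  TRUE FALSE FALSE

# regex(): a regex with options, here case-insensitive matching
str_detect(x, regex("m\\.sc\\.", ignore_case = TRUE))
# [1]  TRUE  TRUE FALSE

# boundary(): count words instead of matching a pattern
str_count("Ask me anything", boundary("word"))
# [1] 3
```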
\ in strings
Ask me anything!