RegEx: Edge of a word

\b - Word boundary

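As I understand it, a word character is a letter, a digit, or an underscore. \b does not match a character at all. It matches the zero-width position between a word character and a non-word character, or between a word character and the start or end of the string.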
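The definition can be checked by hand. Below is a minimal sketch in base R that lists every position where \b would match. The helper name boundary_positions() is hypothetical and not part of stringr. The sketch assumes ASCII text and the letter/digit/underscore definition above. Positions are counted from 0, so position i sits between the i-th and (i+1)-th characters.

```r
# Hypothetical helper: list the positions (0 .. nchar(s)) where \b would match.
# Assumes ASCII text; a word character is a letter, digit or underscore.
boundary_positions <- function(s) {
  chars <- strsplit(s, "")[[1]]
  is_word <- grepl("^[A-Za-z0-9_]$", chars, perl = TRUE)
  # The start and end of the string behave like non-word characters.
  before <- c(FALSE, is_word)  # is the character before position i a word character?
  after  <- c(is_word, FALSE)  # is the character after position i a word character?
  # A boundary is where exactly one side is a word character.
  which(before != after) - 1L
}

boundary_positions("Hello, how r you ?")  # 0 5 7 10 11 12 13 16
boundary_positions("_hello")              # 0 6: no boundary between "_" and "h"
```

The last boundary in "Hello, how r you ?" is position 16, right after "you". That is exactly where the first str_extract() call stops.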

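In the first example, I match one or more characters of any kind (.+) followed by a word boundary. Because .+ is greedy, the match runs as far as it can and then backs off to the last word boundary in the string.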

library(stringr)
example.one <- "Hello, how r you ?"
str_extract(example.one, ".+\\b")

## [1] "Hello, how r you"

example.one <- "Hello, how r you ?_"
str_extract(example.one, ".+\\b")

## [1] "Hello, how r you ?_"
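The backtracking becomes visible when the greedy .+ is contrasted with its lazy form .+?, which stops at the first boundary instead. This sketch uses base R's regmatches() and regexpr() with perl = TRUE in place of stringr, so it runs without extra packages. PCRE, like the ICU engine behind stringr, treats the underscore as a word character. The helper name extract_first() is hypothetical.

```r
# Hypothetical helper: first match of `pattern` in `x`, using base R's PCRE engine.
extract_first <- function(x, pattern) regmatches(x, regexpr(pattern, x, perl = TRUE))

s <- "Hello, how r you ?"
greedy <- extract_first(s, ".+\\b")   # .+ runs to the end, then backtracks to the last boundary
lazy   <- extract_first(s, ".+?\\b")  # .+? grows one character at a time, stopping at the first boundary
print(greedy)  # "Hello, how r you"
print(lazy)    # "Hello"

# With a trailing "_", the end of the string is itself a boundary.
print(extract_first("Hello, how r you ?_", ".+\\b"))  # "Hello, how r you ?_"
```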

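Because “_” is a word character, the end of the second string is itself a word boundary, so the whole string matches. Let me tweak the string again, adding an underscore at the start and lowercasing “hello”: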

example.one <- "_hello, how r you ?_"

To extract the part of the string that begins at a word starting with “h”:

unlist(str_extract_all(example.one, "\\bh.+"))

## [1] "how r you ?_"

It doesn’t match “hello”: to the left of that “h” is an underscore, which is also a word character, so there is no word boundary between “_” and “h”.
It matches “how”: to the left of that “h” is a space, which is not a word character, so the position just before this “h” is a word boundary. The greedy .+ then takes the rest of the string.
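This can be checked by comparing where “h” occurs with where \bh matches. A minimal sketch in base R, using gregexpr() with perl = TRUE as a stand-in for stringr (positions are 1-based). The hyphenated string is an illustrative input of my own, chosen because “-” is a non-word character:

```r
x <- "_hello, how r you ?_"

# Every "h" in the string, regardless of what precedes it
all_h <- as.integer(gregexpr("h", x)[[1]])                          # 2 9
# Only an "h" with a word boundary immediately before it
word_start_h <- as.integer(gregexpr("\\bh", x, perl = TRUE)[[1]])   # 9

# Swapping the leading underscore for a hyphen (a non-word character)
# restores the boundary before "hello"
hyphen_h <- as.integer(gregexpr("\\bh", "-hello, how", perl = TRUE)[[1]])  # 2 9
```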

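The “h” in “how” has a word boundary on its left (a space precedes it) but not on its right, because the following “o” is also a word character. So a pattern that demands a boundary on both sides of a lone “h” finds nothing: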

unlist(str_extract_all(example.one, "\\bh\\b"))

## character(0)
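To see which side of each “h” fails, one option is \B, the negation of \b, which matches any position that is not a word boundary. A sketch in base R with perl = TRUE, again substituting for stringr:

```r
x <- "_hello, how r you ?_"

# Boundary on the left, no boundary on the right: the "h" in "how"
left_only  <- as.integer(gregexpr("\\bh\\B", x, perl = TRUE)[[1]])  # 9
# No boundary on the left: the "h" in "hello"
no_left    <- as.integer(gregexpr("\\Bh", x, perl = TRUE)[[1]])     # 2
# Boundary on both sides: no such "h", so gregexpr() reports -1
both_sides <- as.integer(gregexpr("\\bh\\b", x, perl = TRUE)[[1]])  # -1
```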

If I change the string and put a non-word character (a period) right after the “h” in “how”, the pattern now matches that “h”:

example.one <- "_hello, h.ow r you ?_"
unlist(str_extract_all(example.one, "\\bh\\b"))

## [1] "h"
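Any non-word character after the “h” has the same effect, while an underscore does not, since it is a word character. A short base R sketch (grepl() with perl = TRUE in place of stringr) over a few illustrative strings of my own:

```r
# Which characters after "h" create a word boundary on its right?
candidates <- c("h.ow", "h-ow", "h ow", "h_ow", "how")
matched <- grepl("\\bh\\b", candidates, perl = TRUE)
names(matched) <- candidates
print(matched)  # TRUE for ".", "-" and " "; FALSE for "_" and for plain "how"
```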