library(stringr);
From what I have understood, a word character can be a letter, a digit, or an underscore.
*Reference: http://www.rexegg.com/regex-boundaries.html*
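As a quick check of that definition, here is a minimal sketch that tests a handful of single characters against “\\w”. The names chars and is.word.char are my own, chosen only for illustration.

```r
library(stringr)

# Hypothetical sample: three word characters followed by three non-word characters.
chars <- c("a", "7", "_", " ", "?", ",")

# "^\\w$" matches a string that consists of exactly one word character.
is.word.char <- str_detect(chars, "^\\w$")
is.word.char
## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
```

The underscore counts as a word character, which matters for every example that follows.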
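In the first example, I am trying to match everything up to the last word boundary in the string. A word boundary is a position rather than a character: it sits between a word character and a non-word character, or at the start or end of the string when a word character is next to it.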
example.one <- "Hello, how r you ?";
str_extract(example.one, ".+\\b");
## [1] "Hello, how r you"
example.one <- "Hello, how r you ?_";
str_extract(example.one, ".+\\b");
## [1] "Hello, how r you ?_"
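To see why the two results differ, it can help to make the boundaries visible. The sketch below is my own addition (the names marked.one and marked.two are made up): it replaces every zero-width “\\b” position with a “|”.

```r
library(stringr)

# Insert a "|" at every word boundary so the zero-width positions become visible.
marked.one <- str_replace_all("Hello, how r you ?", "\\b", "|")
marked.two <- str_replace_all("Hello, how r you ?_", "\\b", "|")

marked.one
## [1] "|Hello|, |how| |r| |you| ?"
marked.two
## [1] "|Hello|, |how| |r| |you| ?|_|"
```

In the first string the last boundary comes right after “you”, so the greedy “.+” has to back off to that point. In the second string the trailing “_” is a word character, so the end of the string is itself a boundary and the whole string matches.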
Let me tweak my string a bit.
example.one <- "_hello, how r you ?_";
If I try to find the part of the string that begins at a word starting with “h” –
unlist(str_extract_all(example.one, "\\bh.+"));
## [1] "how r you ?_"
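Notice that the match starts at “how” and not at “hello”. A small sketch of why, assuming the same string as above (the names without.boundary and with.boundary are mine): the “h” in “_hello” is preceded by “_”, which is a word character, so there is no boundary in front of it.

```r
library(stringr)

example.one <- "_hello, how r you ?_"

# Without the boundary, both words that start with "h" are found.
without.boundary <- unlist(str_extract_all(example.one, "h\\w*"))
without.boundary
## [1] "hello" "how"

# With "\\b", the "h" in "_hello" is rejected: "_" and "h" are both word characters.
with.boundary <- unlist(str_extract_all(example.one, "\\bh\\w*"))
with.boundary
## [1] "how"
```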
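The “h” in “how” has a word boundary on its left (it is preceded by a space) but an “o” on its right, which is a word character. There is no boundary after the “h”, so a pattern that requires one on both sides finds nothing: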
unlist(str_extract_all(example.one, "\\bh\\b"));
## character(0)
If I tweak my string a bit and introduce a non-word character after the “h” in “how”, you can see that the pattern will match the “h”:
example.one <- "_hello, h.ow r you ?_";
unlist(str_extract_all(example.one, "\\bh\\b"));
## [1] "h"
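The same check can be run over several variations at once. This is only a sketch with made-up inputs (candidates and lone.h are hypothetical names): the character right after the “h” changes each time, and the pattern matches only when that character is not a word character.

```r
library(stringr)

# Hypothetical variations on "how": only the character after "h" changes.
candidates <- c("h.ow", "h ow", "h-ow", "h_ow", "h1ow")

# TRUE where "h" has a word boundary on both sides.
lone.h <- str_detect(candidates, "\\bh\\b")
lone.h
## [1]  TRUE  TRUE  TRUE FALSE FALSE
```

The underscore and the digit are word characters, so “h_ow” and “h1ow” leave no boundary after the “h”.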