Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible.
Package stringr is built on top of stringi, which uses the ICU C library to provide fast, correct implementations of common string manipulations. stringr focuses on the most important and commonly used string manipulation functions, whereas stringi provides a comprehensive set covering almost anything you can imagine. If you find that stringr is missing a function that you need, try looking in stringi.
For a detailed overview of stringr, visit https://stringr.tidyverse.org/.
To get started, load packages tidyverse, tidytext, and wordcloud. Package stringr will automatically be loaded when you load package tidyverse. Install any missing packages with install.packages("package_to_install").
In the following tasks, use functions available in package stringr. Refer to the stringr RStudio cheat sheet available at https://www.rstudio.com/resources/cheatsheets/.
Functions in stringr
- are structured as str_*(), where * gives a hint as to the function’s purpose,
- mostly take string as their first argument: either a character vector or something coercible to one,
- take further arguments that depend on the function; a common one is pattern, the pattern to look for, which is interpreted as a regular expression by default,
- are vectorized, so they operate on every element of a character vector at once.
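A small illustration of these conventions, assuming only that stringr is installed (the vector `fruits` is made up for the example): str_detect() takes string first and pattern second, returns one result per element, and treats the pattern as a regular expression unless it is wrapped in fixed().

```r
library(stringr)

# Hypothetical example vector, not part of the exercises
fruits <- c("apple", "banana", "cherry")

# Vectorized: one logical result per element of `string`
str_detect(fruits, "an")
#> [1] FALSE  TRUE FALSE

# `pattern` is a regular expression by default, so "." matches any character...
str_detect(c("a.c", "abc"), ".")
#> [1] TRUE TRUE

# ...while fixed() matches only a literal period
str_detect(c("a.c", "abc"), fixed("."))
#> [1]  TRUE FALSE
```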
Determine the length of each string.
c("coffee", "tea", "whiskey", "water")
c("a", "ab", "abc", "abcd")
c("789", "pi", "e", "0")
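One possible approach, sketched with str_length(); the variable names `drinks`, `letter_runs`, and `mixed` are illustrative choices, since the task leaves the vectors unnamed.

```r
library(stringr)

# Illustrative names for the three vectors in the task
drinks <- c("coffee", "tea", "whiskey", "water")
letter_runs <- c("a", "ab", "abc", "abcd")
mixed <- c("789", "pi", "e", "0")

# str_length() counts the characters in each element; digits stored as
# text (e.g. "789") are counted character by character too
str_length(drinks)
#> [1] 6 3 7 5
str_length(letter_runs)
#> [1] 1 2 3 4
str_length(mixed)
#> [1] 3 2 1 1
```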
Extract a substring from phrase.
phrase <- "extract a substring from this phrase"
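A sketch using str_sub(), which takes inclusive start and end character positions. The positions 11 to 19 are just one choice; any substring satisfies the task.

```r
library(stringr)

phrase <- "extract a substring from this phrase"

# Characters 11 through 19 (inclusive) spell "substring"
str_sub(phrase, start = 11, end = 19)
#> [1] "substring"

# Negative positions count back from the end of the string
str_sub(phrase, start = -6)
#> [1] "phrase"
```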
Extract the first two letters from each word in presidents.
presidents <- c("Clinton", "Bush", "Reagan", "Carter")
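Because str_sub() is vectorized, a single call with fixed positions handles every name; a minimal sketch:

```r
library(stringr)

presidents <- c("Clinton", "Bush", "Reagan", "Carter")

# The same start/end positions are applied to each element
str_sub(presidents, start = 1, end = 2)
#> [1] "Cl" "Bu" "Re" "Ca"
```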
Extract the last two letters from each word in presidents.
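For the last two letters the names have different lengths, so counting from the end with a negative start position avoids computing each length first. A sketch:

```r
library(stringr)

presidents <- c("Clinton", "Bush", "Reagan", "Carter")

# start = -2 is the second-to-last character; end defaults to -1 (the last)
str_sub(presidents, start = -2)
#> [1] "on" "sh" "an" "er"
```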
Split big.cats at each comma.
big.cats <- "lion, tiger, jaguar, cougar, leopard, snow leopard, cheetah"
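A sketch with str_split(). Splitting on ", " (comma followed by a space) is a design choice that keeps leading spaces out of the pieces; splitting on "," alone would give " tiger", " jaguar", and so on. The name `cats_split` is illustrative.

```r
library(stringr)

big.cats <- "lion, tiger, jaguar, cougar, leopard, snow leopard, cheetah"

# One input string in, so the result holds one character vector of 7 names
cats_split <- str_split(big.cats, ", ")
cats_split
```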
What structure was returned to you when you split big.cats? Unlist it.
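str_split() returns a list with one element per input string, which unlist() flattens into a character vector. The simplify = TRUE argument is shown as an alternative that returns a character matrix instead; the variable names are illustrative.

```r
library(stringr)

big.cats <- "lion, tiger, jaguar, cougar, leopard, snow leopard, cheetah"

cats_list <- str_split(big.cats, ", ")
class(cats_list)
#> [1] "list"

# unlist() flattens the list into a plain character vector
cats <- unlist(cats_list)
length(cats)
#> [1] 7

# Alternative: a character matrix with one row per input string
cats_matrix <- str_split(big.cats, ", ", simplify = TRUE)
```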
Replace each “a” in big.cats with “A”.
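A minimal sketch using str_replace_all(), which replaces every match of the pattern, here applied to the original single string:

```r
library(stringr)

big.cats <- "lion, tiger, jaguar, cougar, leopard, snow leopard, cheetah"

# Every "a" becomes "A"
str_replace_all(big.cats, "a", "A")
#> [1] "lion, tiger, jAguAr, cougAr, leopArd, snow leopArd, cheetAh"
```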
Replace the first “a” in big.cats with “A”.
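str_replace(), without the _all suffix, stops after the first match in each string. Since big.cats is a single string, only the “a” in “jaguar” changes:

```r
library(stringr)

big.cats <- "lion, tiger, jaguar, cougar, leopard, snow leopard, cheetah"

# Only the first match per string is replaced
str_replace(big.cats, "a", "A")
#> [1] "lion, tiger, jAguar, cougar, leopard, snow leopard, cheetah"
```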
Replace every vowel in big.cats with an “@” symbol. Hint: use a regexp.
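One way to follow the hint is a character class, which matches any one of the characters listed inside the brackets. This sketch treats only a, e, i, o, u as vowels:

```r
library(stringr)

big.cats <- "lion, tiger, jaguar, cougar, leopard, snow leopard, cheetah"

# [aeiou] matches any single lower-case vowel; use [aeiouAEIOU]
# if upper-case vowels should be replaced as well
str_replace_all(big.cats, "[aeiou]", "@")
#> [1] "l@@n, t@g@r, j@g@@r, c@@g@r, l@@p@rd, sn@w l@@p@rd, ch@@t@h"
```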
Extract every word “fruit” or “flies” from phrases.
phrases <- c("time flies when you're having fun in 191",
"fruit flies when you throw it",
"a fruit fly is a beautiful creature",
"how do you spell fruitfly?")
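A sketch with str_extract_all() and the alternation operator |. Whether the “fruit” inside “fruitfly” should count is open to interpretation, so both readings are shown; the names `any_match` and `whole_words` are illustrative.

```r
library(stringr)

phrases <- c("time flies when you're having fun in 191",
             "fruit flies when you throw it",
             "a fruit fly is a beautiful creature",
             "how do you spell fruitfly?")

# "fruit|flies" matches either string anywhere; the result is a list with
# one character vector of matches per phrase
any_match <- str_extract_all(phrases, "fruit|flies")
any_match

# Word boundaries (\\b) keep whole words only, so "fruitfly" yields no match
whole_words <- str_extract_all(phrases, "\\b(fruit|flies)\\b")
whole_words
```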
Tongue twister: Something in a 30 acre thermal thicket of thorns and thistles thumped and thundered threatening the 3-D thoughts of Matthew the thug - although, theatrically, it was only the 13000 thistles and thorns through the underneath of his thigh that the 30 year old thug thought of that morning.
Extract the numeric values from the tongue twister above. Unlist the resulting object.
twister <- paste("Something in a 30 acre thermal thicket of thorns and",
"thistles thumped and thundered threatening the 3-D",
"thoughts of Matthew the thug - although, theatrically,",
"it was only the 13000 thistles and thorns through the",
"underneath of his thigh that the 30 year old thug",
"thought of that morning.", sep = " ")
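A sketch assuming “numeric values” means runs of digits: the pattern [0-9]+ matches one or more consecutive digits, so the “3” in “3-D” is picked up as well. The name `numbers` is illustrative.

```r
library(stringr)

twister <- paste("Something in a 30 acre thermal thicket of thorns and",
                 "thistles thumped and thundered threatening the 3-D",
                 "thoughts of Matthew the thug - although, theatrically,",
                 "it was only the 13000 thistles and thorns through the",
                 "underneath of his thigh that the 30 year old thug",
                 "thought of that morning.", sep = " ")

# str_extract_all() returns a list; unlist() turns it into a character vector
numbers <- unlist(str_extract_all(twister, "[0-9]+"))
numbers
#> [1] "30"    "3"     "13000" "30"

# The matches are still text; convert them to do arithmetic
as.numeric(numbers)
#> [1]    30     3 13000    30
```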
Create a word cloud or perform a sentiment analysis on one of the three documents below. Make use of the functions in package tidytext.
Abraham Lincoln
Gettysburg Address
November 19, 1863, Address Delivered at the Dedication of the Cemetery at Gettysburg
https://www.d.umn.edu/~rmaclin/gettysburg-address.html
Dr. Martin Luther King Jr.
“I Have a Dream” speech
August 28, 1963, Lincoln Memorial in Washington D.C.
http://www.analytictech.com/mb021/mlk.htm
Theodore J. Kaczynski, a.k.a. the Unabomber
Manifesto
September 19, 1995, The Washington Post
https://www.josharcher.uk/static/files/2018/01/Industrial_Society_and_Its_Future-Ted_Kaczynski.txt
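A minimal sketch of the tidytext workflow, assuming dplyr and tidytext are installed (wordcloud is used only if available). To stay self-contained it uses just the opening sentence of the Gettysburg Address, typed in by hand; for the actual task you would read the full text from one of the URLs above, for example with readLines(). The Bing lexicon is assumed here because it ships with tidytext; the object names are illustrative.

```r
library(dplyr)
library(tidytext)

# Opening sentence of the Gettysburg Address only (illustrative excerpt)
gettysburg <- tibble(
  text = c("Four score and seven years ago our fathers brought forth on this",
           "continent, a new nation, conceived in Liberty, and dedicated to",
           "the proposition that all men are created equal.")
)

# One row per word: unnest_tokens() lower-cases and strips punctuation,
# anti_join() drops common stop words such as "and" and "the"
word_counts <- gettysburg %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

# Sentiment: label the remaining words with the Bing lexicon
# (positive/negative) and tally the labels
sentiment_summary <- word_counts %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment, wt = n)
sentiment_summary

# Word cloud: min.freq = 1 because each word in this short excerpt
# occurs only once (wordcloud's default minimum frequency is higher)
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = word_counts$word, freq = word_counts$n,
                       min.freq = 1)
}
```

With the full speech, most words would repeat, so the default min.freq becomes reasonable again.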