suppressPackageStartupMessages(library("tidyverse"))
package 'tidyverse' was built under R version 3.6.3
suppressPackageStartupMessages(library("stringi"))
1. Count the number of words.
2. Find duplicated strings.
3. Generate random text.
The answer to each part follows.
1. To count the number of words, use stringi::stri_count_words(). This code counts the words in the first six elements of sentences, since head() returns six elements by default.
stri_count_words(head(sentences))
[1] 8 8 9 9 7 7
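As a small check of what counts as a word, the sketch below applies stri_count_words() to a made-up vector. The example strings and variable names are illustrative assumptions, not part of the original exercise.

```r
library(stringi)

# Hypothetical example strings, chosen only for illustration
example_text <- c("The quick brown fox", "Hello, world!")

# One count per element; punctuation and spaces are not counted as words
word_counts <- stri_count_words(example_text)
print(word_counts)
```

The result has one integer per input string, so it can be added directly as a column alongside the text it describes.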
2. The stringi::stri_duplicated() function finds duplicate strings.
stri_duplicated(c(
"the", "brown", "cow", "jumped", "over",
"the", "lazy", "fox"
))
[1] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
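The logical vector returned by stri_duplicated() is usually a means to an end. The following sketch, which reuses the same words under illustrative variable names, shows one way to pull out the repeated values and to find where the first repeat occurs with stri_duplicated_any().

```r
library(stringi)

words <- c("the", "brown", "cow", "jumped", "over", "the", "lazy", "fox")

# Logical mask that is TRUE for every element repeating an earlier one
is_repeat <- stri_duplicated(words)

# The repeated values themselves
repeated_words <- words[is_repeat]

# Index of the first repeated element, or 0 if all elements are unique
first_repeat <- stri_duplicated_any(words)

print(repeated_words)
print(first_repeat)
```

Subsetting with !is_repeat instead would keep only the first occurrence of each string.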
3. The stringi package contains several functions beginning with stri_rand_* that generate random text. The function stringi::stri_rand_strings() generates random strings. The following code generates four random strings, each of length five.
stri_rand_strings(4, 5)
[1] "6tBOg" "TPKm1" "SSHKL" "2vg3J"
The function stringi::stri_rand_shuffle() randomly shuffles the characters in the text.
stri_rand_shuffle("The brown fox jumped over the lazy cow.")
[1] "emTe wh. ofcborou tej zrlhvexypaodwn"
The function stringi::stri_rand_lipsum() generates lorem ipsum text. Lorem ipsum text is nonsense text often used as placeholder text in publishing. The following code generates one paragraph of placeholder text.
stri_rand_lipsum(1)
[1] "Lorem ipsum dolor sit amet, turpis dictumst proin porta sed ut congue diam sed a. Sed fermentum erat facilisi nec amet lectus odio. Imperdiet nulla neque non. Risus convallis, purus congue efficitur. Nullam felis felis cras pretium justo est. Sit sagittis, efficitur sem eros natoque praesent sit semper. Senectus in vel non ut class consectetur iaculis ligula tristique. Et nam pharetra. Lorem est ac velit, gravida tristique ullamcorper at. Sit, fames neque vitae leo facilisis habitant. Aliquam fames mauris laoreet erat, sed mollis, in sit pellentesque, pharetra maecenas. Quis facilisi a risus mi gravida, ligula senectus. In, in ac ac, tincidunt aliquam."
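Because these functions are random, the output above will differ between runs. The sketch below assumes that stringi draws from R's own random number generator, so that set.seed() makes the results repeatable. It also restricts stri_rand_strings() to lowercase letters through its pattern argument and checks that stri_rand_shuffle() only reorders characters. The seed value and the sorted_chars() helper are illustrative choices.

```r
library(stringi)

# Fixing the seed before each call should give the same draw twice
set.seed(123)
first_draw <- stri_rand_strings(3, 5, pattern = "[a-z]")
set.seed(123)
second_draw <- stri_rand_strings(3, 5, pattern = "[a-z]")
print(identical(first_draw, second_draw))

# Shuffling reorders the characters but keeps the same set of them
original <- "The brown fox jumped over the lazy cow."
shuffled <- stri_rand_shuffle(original)

# Hypothetical helper: split a string into characters and sort them
sorted_chars <- function(s) sort(strsplit(s, "")[[1]])
print(identical(sorted_chars(original), sorted_chars(shuffled)))
```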
How do you control the language that stri_sort() uses for sorting?
You can set the locale used for sorting with either stri_sort(..., opts_collator = stri_opts_collator(locale = ...)) or stri_sort(..., locale = ...). In this example from the stri_sort() documentation, the sorted order of the character vector depends on the locale.
string1 <- c("hladny", "chladny")
stri_sort(string1, locale = "pl_PL")
[1] "chladny" "hladny"
stri_sort(string1, locale = "sk_SK")
[1] "hladny" "chladny"
The output of stri_opts_collator() can also be passed to the opts_collator argument of stri_sort().
stri_sort(string1, opts_collator = stri_opts_collator(locale = "pl_PL"))
[1] "chladny" "hladny"
stri_sort(string1, opts_collator = stri_opts_collator(locale = "sk_SK"))
[1] "hladny" "chladny"
The stri_opts_collator() function provides finer-grained control over how strings are sorted. In addition to setting the locale, it has options to customize how case, Unicode, accents, and numeric values are handled when comparing strings.
string2 <- c("number100", "number2")
stri_sort(string2)
[1] "number100" "number2"
stri_sort(string2, opts_collator = stri_opts_collator(numeric = TRUE))
[1] "number2" "number100"
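To illustrate two of these collator options on other inputs, the sketch below sorts made-up file names with numeric = TRUE and uses strength = 2 to compare strings while ignoring case. The input vectors, the variable names, and the choice of the "en_US" locale are assumptions made for the example.

```r
library(stringi)

# Hypothetical file names, chosen only for illustration
files <- c("file10", "file2", "file1")

# Default collation compares digit characters one at a time
print(stri_sort(files, locale = "en_US"))

# numeric = TRUE compares runs of digits by their numeric value
natural_order <- stri_sort(
  files,
  opts_collator = stri_opts_collator(locale = "en_US", numeric = TRUE)
)
print(natural_order)

# strength = 2 ignores case differences when testing for equivalence
same_ignoring_case <- stri_cmp_equiv(
  "hello", "HELLO",
  opts_collator = stri_opts_collator(locale = "en_US", strength = 2)
)
print(same_ignoring_case)
```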