Text mining
Introduction
Understanding the problem
NLP is a hot topic and is widely used in industry to understand users better: learning their vocabulary to suggest auto-completions, analyzing behaviour, and digging deeper into user analytics.
Here we apply that knowledge to build a text/sentence auto-completion product, which requires us to:
- Analyze a large corpus of text documents to discover its structure and how words are put together.
- Clean and analyze the text data, then build and sample from a predictive text model.
- Finally, build a predictive text product.
Data gathering and info
Data exploration is a key aspect of data science: before modelling, we need to understand the kind of data we are working with.
Getting the data
The data is obtained as part of the Coursera Data Science Specialization capstone project. It consists of four folders, one per language (English, Finnish, German and Russian), each containing text collected from Twitter, news sites and blogs.
fileUrl =
'https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip'
dir.create(file.path('../data'), recursive = TRUE,showWarnings = F)
if(!dir.exists('../data/en_US')){
download.file(fileUrl,'../data/dataset.zip', mode = 'wb')
unzip("../data/dataset.zip", exdir = '../data')
file.copy("../data/final/en_US", "../data", recursive = T)
file.copy("../data/final/de_DE", "../data", recursive = T)
file.copy("../data/final/fi_FI", "../data", recursive = T)
file.copy("../data/final/ru_RU", "../data", recursive = T)
}
## [1] TRUE
Loading the en_US blogs, news and twitter data
The extracted data consists of lines of text/sentences that are to be mined for actionable insights. The files are read one by one using the readLines() function, with the encoding set to UTF-8 and null lines skipped.
en_blogs = readLines(
"../data/en_US/en_US.blogs.txt", encoding="UTF-8",
skipNul = TRUE, warn = TRUE)
en_news = readLines(
"../data/en_US/en_US.news.txt", encoding="UTF-8",
skipNul = TRUE, warn = TRUE)
en_twitter = readLines(
"../data/en_US/en_US.twitter.txt", encoding="UTF-8",
skipNul = TRUE, warn = TRUE)
Summary statistics on the data
blogs_info = c(stri_stats_general(en_blogs)[1], stri_stats_latex(en_blogs)[4],
max(summary(nchar(en_blogs))),
sum(grepl("love", en_blogs))/sum(grepl("hate", en_blogs)),
file.info("../data/en_US/en_US.blogs.txt")$size/(2^20))
news_info = c(stri_stats_general(en_news)[1], stri_stats_latex(en_news)[4],
max(summary(nchar(en_news))),
sum(grepl("love", en_news))/sum(grepl("hate", en_news)),
file.info("../data/en_US/en_US.news.txt")$size/(2^20))
twitter_info = c(stri_stats_general(en_twitter)[1], stri_stats_latex(en_twitter)[4],
max(summary(nchar(en_twitter))),
sum(grepl("love", en_twitter))/sum(grepl("hate", en_twitter)),
file.info("../data/en_US/en_US.twitter.txt")$size/(2^20))
total_info = blogs_info+news_info+twitter_info
table_info <- as.data.frame(rbind(blogs_info, news_info, twitter_info, total_info))
rm(blogs_info, news_info, twitter_info, total_info)
colnames(table_info) = c("Lines", "Words", "Longest line", 'love/hate ratio', "Size in Mb")
table_info
Data processing
Sampling and combining the data
set.seed(193)
sample_size = 2500
blogs = en_blogs[sample(1:length(en_blogs),sample_size)]
blogs = removeWords(blogs, stopwords("en"))
blogs = gsub("\\s+", " ", blogs)
blogs = str_trim(blogs, side = c("both"))
news = en_news[sample(1:length(en_news),sample_size)]
news = removeWords(news, stopwords("en"))
news = gsub("\\s+", " ", news)
news = str_trim(news, side = c("both"))
twitter = en_twitter[sample(1:length(en_twitter),sample_size)]
twitter = removeWords(twitter, stopwords("en"))
twitter = gsub("\\s+", " ", twitter)
twitter = str_trim(twitter, side = c("both"))
rm(sample_size, en_blogs, en_news, en_twitter)
Creating a volatile corpus document
Building a corpus document
A corpus is a collection of documents. We create a volatile corpus from the vector of texts obtained.
We make a vector source using the tm package, then convert that source into a VCorpus object.
create_corpus = function(text_data){
text_corpus = VCorpus(
VectorSource(text_data),
readerControl=list(readPlain, language="en", load=TRUE)
)
text_corpus
}
blogs_corpus = create_corpus(blogs)
news_corpus = create_corpus(news)
twitter_corpus = create_corpus(twitter)
The VCorpus object uses a nested list-of-lists structure to hold the data. At each index of the VCorpus object there is a PlainTextDocument object, which is a list containing the actual text (content) and the corresponding metadata (meta). It helps to inspect a VCorpus object to conceptualize this structure.
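For example, one can peek at the first element of a corpus to see this structure (a sketch; the exact content depends on the sampled lines):
# The PlainTextDocument at index 1: its raw text and its metadata
blogs_corpus[[1]]
content(blogs_corpus[[1]])
meta(blogs_corpus[[1]])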
Cleaning and preprocessing text
We use tm’s built-in text-processing methods to clean the corpus before mining it.
clean_corpus <- function (corpus) {
  corpus <- tm_map(corpus, tolower) # all lowercase
  corpus <- tm_map(corpus, removePunctuation) # Eliminate punctuation
  corpus <- tm_map(corpus, removeNumbers) # Eliminate numbers
  corpus <- tm_map(corpus, replace_abbreviation) # Replace abbreviations with their full forms
  corpus <- tm_map(corpus, replace_contraction) # Expand contractions
  corpus <- tm_map(corpus, replace_symbol) # Replace symbols with words
  corpus <- tm_map(corpus, stripWhitespace) # Strip extra whitespace
  corpus <- tm_map(corpus, removeWords, stopwords("english")) # Eliminate English stop words
  # corpus <- tm_map(corpus, stemDocument) # Stem the document (disabled)
  corpus <- tm_map(corpus, PlainTextDocument) # Convert back to plain text documents
  corpus
}
text_corpus = c(news_corpus,blogs_corpus,twitter_corpus, recursive = FALSE)
text_corpus_cleaned = clean_corpus(text_corpus)
# Comparing with original data
cat("Original document: ",content(text_corpus[[1]]))## Original document: Hey, Hoynsie: What think Mark Shapiro Chris Antonetti's reasoning building team heavy left-handed hitting weak left-handed pitching. Is new baseball fad? -- Tim Phelps, Cleveland
##
## Cleaned document: Hey hoynsie think mark shapiro chris antonettis reasoning building team heavy lefthanded hitting weak lefthanded pitching new baseball fad tim phelps cleveland
## [1] "C:/Users/Aquaregis32/Documents/Github/Data_Science_R/Projects/Swift key predictive text analytics/Rmd"
Creating a document-term matrix for analysis
When we wish to represent the data with documents as rows and words as columns, we use a document-term matrix; its transpose is a term-document matrix.
The cells then hold the frequency of each word in each document, although other weighting schemes exist.
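The chunk that builds the matrix summarized below is not shown; a minimal sketch with tm, using the text_tdm and text_m names that appear in the later analysis, would be:
# Term-document matrix over the cleaned, combined corpus; text_m is the
# dense matrix used for the frequency analysis below
text_tdm = TermDocumentMatrix(text_corpus_cleaned)
text_m = as.matrix(text_tdm)
text_tdm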
## <<TermDocumentMatrix (terms: 24116, documents: 7500)>>
## Non-/sparse entries: 111028/180758972
## Sparsity : 100%
## Maximal term length: 51
## Weighting : term frequency (tf)
Exploratory analysis
Using qdap’s freq_terms() function to count the frequency of words in the datasets and presenting the information in tables and plots.
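For illustration, freq_terms() can be applied directly to the sampled character vectors (a sketch, not one of the original chunks; the at.least cut-off is an assumed choice):
# Hypothetical example: the ten most frequent terms in the sampled blog
# lines, ignoring words shorter than three characters
freq_terms(blogs, top = 10, at.least = 3)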
Most frequently used words
term_frequency = rowSums(text_m)
term_frequency <- sort(term_frequency, decreasing = T)
rm(text_m)
barplot(term_frequency[1:10], col = "hotpink4", las = 2, main = "Most frequently used words")
Sentiment analysis
Creating a tibble
Converting the combined term-document matrix (text_tdm) to a tibble for sentiment analysis with the tidytext package.
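The conversion chunk is not shown; a plausible construction with tidytext, consistent with the dimensions printed below (assuming only the term and count columns are kept), would be:
# Sketch: tidy() turns the sparse matrix into one row per non-zero
# term/document cell; keeping term and count gives two columns
text_tbl = tidy(text_tdm) %>% select(term, count)
dim(text_tbl)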
## [1] 111028 2
Performing the analysis
text_sentiments <- text_tbl %>%
inner_join(get_sentiments("bing"), by = c(term = "word"))
print(dim(text_sentiments))
## [1] 12436 3
Visualizing
text_sentiments %>%
count(sentiment, term, wt = count) %>%
filter(n >= 35) %>%
mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
mutate(term = reorder(term, n)) %>%
ggplot(aes(term, n, fill = sentiment)) +
theme_minimal() +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
ylab("Contribution to sentiment")
Word cloud
word_freqs = data.frame(
term = names(term_frequency),
num = term_frequency
)
wordcloud::wordcloud(
word_freqs$term, word_freqs$num,
max.words = 100, colors = cividis(n = 3)
)
Comparisons using word cloud
Preparing the data
all_blogs = paste(blogs, collapse = " ")
all_news = paste(news, collapse = " ")
all_twitter = paste(twitter, collapse = " ")
all_texts = c(all_blogs, all_news, all_twitter)
rm(blogs, news, twitter)
all_texts = VCorpus(VectorSource(all_texts))
# all_texts = clean_corpus(all_texts)
all_dm = TermDocumentMatrix(all_texts)
all_tdm = TermDocumentMatrix(all_texts)
colnames(all_tdm) = c("Blogs", "News", "Twitter")
all_m = as.matrix(all_dm)
all_mc = as.matrix(all_tdm)
Commonality
Using a commonality word cloud to show how similar the three corpora are.
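The plotting chunk itself is not shown; a minimal sketch with the wordcloud package, using the all_m matrix prepared above, would be:
# Plot only the terms shared by all three documents; term size reflects
# the smallest frequency across blogs, news and twitter
wordcloud::commonality.cloud(all_m, max.words = 100, colors = "steelblue4")
# comparison.cloud(all_mc, max.words = 100) would instead highlight the
# terms that differ most between the three sources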
Polarized tag plot
Creating data for plot
Using pyramid.plot() from the plotrix package to compare the corpora pairwise (blogs vs. news, blogs vs. twitter, news vs. twitter):
# subset the common words in the documents
common_words = as.data.frame(subset(all_mc, all_mc[,1]>0 & all_mc[,2]>0 & all_mc[,3]>0))
# Finding the difference and then ordering by it
common_words = mutate(common_words,
diff_bn = abs(common_words[,1] - common_words[,2]),
diff_bt = abs(common_words[,1] - common_words[,3]),
diff_nt = abs(common_words[,2] - common_words[,3]),
labels = rownames(common_words)
)
common_words = common_words[order(-common_words[,4]),]
top_25_bn = common_words[1:25,]
common_words = common_words[order(-common_words[,5]),]
top_25_bt = common_words[1:25,]
common_words = common_words[order(-common_words[,6]),]
top_25_nt = common_words[1:25,]
Difference between common terms in blogs and news
pyramid.plot(
top_25_bn$Blogs, top_25_bn$News, labels = top_25_bn$labels,
main = "Words in common between blogs and news", gap = 150,
unit = NULL, raxlab = NULL, laxlab = NULL, top.labels =
c("blogs", "words", "news")
)
## 352 352
## [1] 5.1 4.1 4.1 2.1
Difference between common terms in blogs and twitter
pyramid.plot(
top_25_bt$Blogs, top_25_bt$Twitter, labels = top_25_bt$labels,
main = "Words in common between blogs and twitter", gap = 100,
unit = NULL, raxlab = NULL, laxlab = NULL, top.labels =
c("blogs", "words", "twitter")
)
## 352 352
## [1] 5.1 4.1 4.1 2.1
Difference between common terms in news and twitter
pyramid.plot(
top_25_nt$News, top_25_nt$Twitter, labels = top_25_nt$labels,
main = "Words in common between news and twitter", gap = 200,
unit = NULL, raxlab = NULL, laxlab = NULL, top.labels =
c("news", "words", "twitter")
)
## 352 352
## [1] 5.1 4.1 4.1 2.1
Analyzing words using dendrogram plot
Dendrograms reduce complicated multi-dimensional datasets to simple clustering information, which makes them a valuable tool for reducing complexity. Using the distance matrix and the hclust() function, we create a hierarchical clustering object, which is then passed to as.dendrogram(). We first reduce the number of terms with removeSparseTerms(); its 'sparse' parameter specifies the maximum proportion of zero entries allowed for each term.
# Removing sparse terms
text_reduced = removeSparseTerms(text_tdm, sparse = 0.97)
text_reduced_m = as.matrix(text_reduced)
text_reduced_dist = dist(text_reduced_m)
# Creating hclust object
text_hc = hclust(text_reduced_dist)
# Converting hclust object to dendrogram
text_dend = as.dendrogram(text_hc)
# Labels of dendrogram object
labels(text_dend)
## [1] "will" "said" "one" "can" "time" "get" "people" "first"
## [9] "also" "last" "two" "see" "now" "way" "well" "much"
## [17] "make" "think" "day" "back" "going" "good" "know" "love"
## [25] "new" "the" "just" "like"
# Changing the color of the branches for 'two', 'will' and 'said' to red
text_dend_colored = branches_attr_by_labels(
text_dend,
c("two","will","said"),
"red"
)
# Plot
plot(text_dend_colored, main = "Dendrogram analysis")
# Adding rectangles
rect.dendrogram(tree = text_dend_colored, k = 3, border = "grey50")
Word associations
Using the findAssocs() function from the tm package to calculate the correlation, on a [0, 1] scale, of a given word with every other term in the document.
text_associations = findAssocs(text_tdm, "year", 0.125)
# Converting to dataframe for plot
text_associations_df = list_vect2df(
text_associations, col2 = "word", col3 = "score"
)
# plot
ggplot(text_associations_df, aes(score, word)) +
geom_point(size = 3) +
theme_minimal()
N-gram tokenization
So far we have studied the association of single-word tokens with other words; this part of the analysis concentrates on tokens containing more than one word.
Defining functions
# Creating a tokenizer functions using the RWeka package
bigram_tokenizer = function(x){
NGramTokenizer(x, Weka_control(min = 2, max = 2))
}
trigram_tokenizer = function(x){
NGramTokenizer(x, Weka_control(min = 3, max = 3))
}
tetragram_tokenizer = function(x){
NGramTokenizer(x, Weka_control(min = 4, max = 4))
}
pentagram_tokenizer = function(x){
NGramTokenizer(x, Weka_control(min = 5, max = 5))
}
Bigrams
Creating a bigram tdm
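The chunk that builds the matrix is not shown; a sketch of the bigram term-document matrix, using the tokenizer defined above (bigram_tdm and bigram_m are assumed names), would be:
# Pass the RWeka tokenizer through the control list so that terms are
# bigrams instead of single words
bigram_tdm = TermDocumentMatrix(
  text_corpus_cleaned,
  control = list(tokenize = bigram_tokenizer)
)
bigram_m = as.matrix(bigram_tdm)
The trigram matrix below follows the same pattern with trigram_tokenizer.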
Trigrams
Creating a trigram tdm
Tetragrams
Creating a tetragram tdm
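For the tetragrams, the same pattern yields the tetragram_m matrix used in the word cloud analysis below (a sketch; the original chunk is not shown):
# Tetragram term-document matrix and its dense form
tetragram_tdm = TermDocumentMatrix(
  text_corpus_cleaned,
  control = list(tokenize = tetragram_tokenizer)
)
tetragram_m = as.matrix(tetragram_tdm)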
Word cloud analysis
freq = rowSums(tetragram_m)
tetra_tokens = names(freq)
tetra_token_df = data.frame(token = tetra_tokens, freq = freq)
row.names(tetra_token_df) = NULL
head(tetra_token_df[order(-tetra_token_df$freq),])
wordcloud::wordcloud(
words = tetra_token_df$token, freq = tetra_token_df$freq,
max.words = 150, colors = cividis(n = 5)
)
Creating prediction model
Building N-Grams
N-Grams
Loading/Defining data and functions
## Create corpus
create_corpus = function(text_data){
text_corpus = VCorpus(
VectorSource(text_data),
readerControl=list(readPlain, language="en", load=TRUE)
)
text_corpus
}
## Corpus clean function
clean_corpus_mdl <- function (corpus) {
  corpus <- tm_map(corpus, tolower) # all lowercase
  corpus <- tm_map(corpus, removePunctuation) # Eliminate punctuation
  corpus <- tm_map(corpus, removeNumbers) # Eliminate numbers
  corpus <- tm_map(corpus, replace_abbreviation) # Replace abbreviations with their full forms
  corpus <- tm_map(corpus, replace_contraction) # Expand contractions
  corpus <- tm_map(corpus, replace_symbol) # Replace symbols with words
  corpus <- tm_map(corpus, stripWhitespace) # Strip extra whitespace
  corpus <- tm_map(corpus, PlainTextDocument) # Convert back to plain text documents
  corpus
}
set.seed(193)
sample_size = 5000
Ngram_sythesize = function(corpus,n){
tdm = TermDocumentMatrix(
corpus,
control = list(tokenize = function(x){
NGramTokenizer(x, Weka_control(min = n, max = n))
})
)
m = as.matrix(tdm)
rm(tdm)
freq = rowSums(m)
tokens = names(freq)
rm(m)
token_df = data.frame(token = tokens, freq = freq)
rm(tokens,freq)
row.names(token_df) = NULL
token_df = filter(token_df, freq>1)
}
Creating models for Blogs
en_blogs = readLines(
"../data/en_US/en_US.blogs.txt", encoding="UTF-8",
skipNul = TRUE, warn = TRUE)
blogs = en_blogs[sample(1:length(en_blogs),sample_size)]
rm(en_blogs)
blogs = gsub("\\s+", " ", blogs)
blogs = str_trim(blogs, side = c("both"))
blogs_corpus = create_corpus(blogs)
blogs_corpus_cleaned = clean_corpus_mdl(blogs_corpus)
rm(blogs,blogs_corpus)
### Bigrams
blogs_bi_token_df = Ngram_sythesize(blogs_corpus_cleaned,2)
### Trigrams
blogs_tri_token_df = Ngram_sythesize(blogs_corpus_cleaned,3)
### Tetragrams
blogs_tetra_token_df = Ngram_sythesize(blogs_corpus_cleaned,4)
### Pentagrams
blogs_penta_token_df = Ngram_sythesize(blogs_corpus_cleaned,5)
rm(blogs_corpus_cleaned)
Creating models for News
en_news = readLines(
"../data/en_US/en_US.news.txt", encoding="UTF-8",
skipNul = TRUE, warn = TRUE)
news = en_news[sample(1:length(en_news),sample_size)]
rm(en_news)
news = gsub("\\s+", " ", news)
news = str_trim(news, side = c("both"))
news_corpus = create_corpus(news)
news_corpus_cleaned = clean_corpus_mdl(news_corpus)
rm(news,news_corpus)
### Bigrams
news_bi_token_df = Ngram_sythesize(news_corpus_cleaned,2)
### Trigrams
news_tri_token_df = Ngram_sythesize(news_corpus_cleaned,3)
### Tetragrams
news_tetra_token_df = Ngram_sythesize(news_corpus_cleaned,4)
### Pentagrams
news_penta_token_df = Ngram_sythesize(news_corpus_cleaned,5)
rm(news_corpus_cleaned)
Creating models for Twitter
en_twitter = readLines(
"../data/en_US/en_US.twitter.txt", encoding="UTF-8",
skipNul = TRUE, warn = TRUE)
twitter = en_twitter[sample(1:length(en_twitter),sample_size)]
rm(en_twitter)
twitter = gsub("\\s+", " ", twitter)
twitter = str_trim(twitter, side = c("both"))
twitter_corpus = create_corpus(twitter)
twitter_corpus_cleaned = clean_corpus_mdl(twitter_corpus)
rm(twitter, twitter_corpus)
### Bigrams
twitter_bi_token_df = Ngram_sythesize(twitter_corpus_cleaned,2)
### Trigrams
twitter_tri_token_df = Ngram_sythesize(twitter_corpus_cleaned,3)
### Tetragrams
twitter_tetra_token_df = Ngram_sythesize(twitter_corpus_cleaned,4)
### Pentagrams
twitter_penta_token_df = Ngram_sythesize(twitter_corpus_cleaned,5)
rm(twitter_corpus_cleaned)
Combining the data
bigrams = rbind(blogs_bi_token_df,news_bi_token_df,twitter_bi_token_df)
trigrams = rbind(blogs_tri_token_df,news_tri_token_df,twitter_tri_token_df)
tetragrams = rbind(blogs_tetra_token_df,news_tetra_token_df,twitter_tetra_token_df)
pentagrams = rbind(blogs_penta_token_df,news_penta_token_df,twitter_penta_token_df)
rm(blogs_bi_token_df,news_bi_token_df,twitter_bi_token_df,
blogs_tri_token_df,news_tri_token_df,twitter_tri_token_df,
blogs_tetra_token_df,news_tetra_token_df,twitter_tetra_token_df,
blogs_penta_token_df,news_penta_token_df,twitter_penta_token_df
)
bigrams$which = 2
trigrams$which = 3
tetragrams$which = 4
pentagrams$which = 5
NGrams = rbind(bigrams,trigrams,tetragrams,pentagrams)
rm(bigrams,trigrams,tetragrams,pentagrams)
### Saving
save(NGrams, file = "../NGrams/NGrams.rda")
write.csv(NGrams, "../NGrams/NGrams.csv", fileEncoding = 'UTF-8', row.names = F)
rm(NGrams)
Defining prediction function
Creating a function that predicts the next word for a given input list of words
## Loading the n-grams
NGrams = get(load("../NGrams/NGrams.rda"))
## Function
predictWord = function(str){
  preds_ = ""
  pred = ""
  # Normalize the input the same way the corpus was cleaned
  str = gsub("\\s+", " ", str)
  str = tolower(str)
  str = removePunctuation(str)
  str = str_trim(str, side = c("both"))
  n_words = str_count(str, " ") + 1
if(n_words > 4){
  # Keep only the last four words of the input
  str = paste(str_split_fixed(str, pattern = " ", n_words)[(n_words - 3):n_words],
              collapse = " ")
  n_words = 4
}
if(n_words == 4){
  # Look for a matching pentagram (5-gram) first
  matching_pentagrams = filter(NGrams, which == 5)[grepl(str,
    filter(NGrams, which == 5)[,1], ignore.case = TRUE),]
  preds_ = data.frame(
    prediction = str_extract_all(matching_pentagrams[,1],
                                 paste0(str, "\\s([:alpha:]+)"),
                                 simplify = T),
    freq = matching_pentagrams[,2])
  best_pred = preds_[order(-preds_$freq),][1]
  if(nrow(preds_) == 0){
    # Back off: drop the first word and try tetragrams
    str = paste(str_split_fixed(str, pattern = " ", 4)[2:4], collapse = " ")
    n_words = 3
  }
}
if(n_words == 3){
  # Look for a matching tetragram (4-gram)
  matching_tetragrams = filter(NGrams, which == 4)[grepl(str,
    filter(NGrams, which == 4)[,1], ignore.case = TRUE),]
  preds_ = data.frame(
    prediction = str_extract_all(matching_tetragrams[,1],
                                 paste0(str, "\\s([:alpha:]+)"),
                                 simplify = T),
    freq = matching_tetragrams[,2])
  best_pred = preds_[order(-preds_$freq),][1]
  if(nrow(preds_) == 0){
    # Back off: drop the first word and try trigrams
    str = paste(str_split_fixed(str, pattern = " ", 3)[2:3], collapse = " ")
    n_words = 2
  }
}
if(n_words == 2){
  # Look for a matching trigram
  matching_trigrams = filter(NGrams, which == 3)[grep(str,
    filter(NGrams, which == 3)[,1], ignore.case = TRUE),]
  preds_ = data.frame(
    prediction = str_extract_all(matching_trigrams[,1],
                                 paste0(str, "\\s([:alpha:]+)"),
                                 simplify = T),
    freq = matching_trigrams[,2])
  best_pred = preds_[order(-preds_$freq),][1]
  if(nrow(preds_) == 0){
    # Back off: keep only the last word and try bigrams
    str = str_split_fixed(str, pattern = " ", 2)[2]
    n_words = 1
  }
}
if(n_words == 1){
matching_bigrams = filter(NGrams, which == 2)[grep(str,
filter(NGrams, which == 2)[,1], ignore.case=TRUE),]
preds_ = data.frame(
prediction = str_extract_all(matching_bigrams[,1],
paste0(str,"\\s([:alpha:]+)"),
simplify = T),
freq = matching_bigrams[,2])
best_pred = preds_[order(-preds_$freq),1][1]
}
pred = word(preds_[order(-preds_[,2]), 1][1], -1) # most frequent continuation; keep only its last word
pred
}
predictWord("achieving")## [1] "greater"
## R Session Info:
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dendextend_1.14.0 plotrix_3.7-8 viridisLite_0.3.0 dplyr_1.0.2
## [5] stringr_1.4.0 stringi_1.5.3 tidytext_0.2.6 ggplot2_3.3.2
## [9] qdap_2.4.3 RColorBrewer_1.1-2 qdapTools_1.3.5 qdapRegex_0.7.2
## [13] qdapDictionaries_1.0.7 corpus_0.10.1 data.table_1.13.2 RWeka_0.4-43
## [17] tm_0.7-8 NLP_0.2-1 ngram_3.0.4
##
## loaded via a namespace (and not attached):
## [1] viridis_0.5.1 jsonlite_1.7.1 gender_0.5.4 yaml_2.2.1
## [5] slam_0.1-48 pillar_1.4.7 lattice_0.20-41 glue_1.4.2
## [9] chron_2.3-56 digest_0.6.27 colorspace_2.0-0 htmltools_0.5.0
## [13] Matrix_1.2-18 plyr_1.8.6 XML_3.99-0.5 pkgconfig_2.0.3
## [17] bookdown_0.21 purrr_0.3.4 scales_1.1.1 openxlsx_4.2.3
## [21] tibble_3.0.4 openNLP_0.2-7 farver_2.0.3 generics_0.1.0
## [25] ellipsis_0.3.1 withr_2.3.0 magrittr_2.0.1 crayon_1.3.4
## [29] evaluate_0.14 tokenizers_0.2.1 janeaustenr_0.1.5 SnowballC_0.7.0
## [33] xml2_1.3.2 tools_4.0.3 RWekajars_3.9.3-2 lifecycle_0.2.0
## [37] munsell_0.5.0 zip_2.1.1 compiler_4.0.3 rlang_0.4.8
## [41] grid_4.0.3 RCurl_1.98-1.2 igraph_1.2.6 bitops_1.0-6
## [45] labeling_0.4.2 rmarkdown_2.5 venneuler_1.1-0 gtable_0.3.0
## [49] reshape2_1.4.4 R6_2.5.0 gridExtra_2.3 knitr_1.30
## [53] utf8_1.1.4 openNLPdata_1.5.3-4 rJava_0.9-13 parallel_4.0.3
## [57] rmdformats_1.0.0 Rcpp_1.0.5 vctrs_0.3.4 wordcloud_2.6
## [61] tidyselect_1.1.0 xfun_0.19