Exploratory analysis - perform a thorough exploratory analysis of the data, understanding the distribution of words and relationship between the words in the corpora.
Understand frequencies of words and word pairs - build figures and tables to understand variation in the frequencies of words and word pairs in the data.
As a first approximation, we observe that the documents are quite large (roughly 150 - 200 MB), with around 30 - 37 million words to process per document.
         size (MB)    lines    words     chars
blogs     200.4242   899288 37334131 206824505
news      196.2775  1010242 34372530 203223159
twitter   159.3641  2360148 30373583 162096241
According to the boxplot, the three documents have a similar distribution of characters per word. There is also a large number of words beyond the whiskers, which we explore next.
To see what these are, we consider the first 10 words longer than 20 characters in each document (blogs, news and twitter, respectively).
[1] "democratically-minded"
[2] "memorycomparatively."
[3] "HAHAHAHAHAHAHAHAHAHAHAHAHAHHAHAHAHAHAHAHAHAHAHAHHAHAHAHAHAHAHAHAHAHAAHHAHA"
[4] "something-for-nothing"
[5] "DunLaoghaire-Rathdown"
[6] "location--Minneapolis,"
[7] "http://www.yogajournal.com/for_teachers/697?utm_source=DailyInsight&utm_medium=newsletter&utm_campaign=DailyInsight)."
[8] "-http://deckboss.blogspot.ca/2012/05/legislature-lavishes-aquaculture.html"
[9] "hacker/cyberterrorist"
[10] "(Esshaych@hotmail.co.uk),"
[1] "theCareerBuilder.comad" "tetrahydrocannabinol,"
[3] "therndon@stonepointcc.org." "National-Bedminsters"
[5] "chandleraz.gov/cinco." "(healthoregon.org/radon)."
[7] "portlandbicycletours.com." "greyhoundwelfare.org."
[9] "http://www.nikkics.com." "http://www.mattdennys.com."
[1] "djsosnekspqnslanskam." "foundations/charities,"
[3] "evening/afternoon/whatever" "#OneThingYouShouldntDo"
[5] "Sark-oh-no-he-didn'tzy," "Liberals/Progressives/DemoRates"
[7] "after-work.Introducing" "#problemchildontheloose"
[9] "#WordsYouWillNeverHearMeSay" "www.historyglobe.com/jamestown/"
As this listing shows, most of these "words" are URLs, email addresses, strings that are not real words, and Twitter hashtags. (More of these may surface during cleaning.)
There are also valid words that are separated not by spaces but by slashes or hyphens; these should be treated as valid words during cleaning.
Due to the size of the documents, the corpus is built from a 5% sample of each document. The data are then cleaned and structured with a series of transformations: removing punctuation and numbers, normalizing apostrophes, collapsing consecutive repeated words and word pairs, converting to lowercase, removing profane words, and stripping extra whitespace (a condensed sketch follows the next paragraph).
To remove profane words, the word list from Robert J Gabriel's GitHub account was used.
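A condensed sketch of these cleaning steps, adapted from the full code in the appendix (it assumes the sampled corpus and the profaneWords vector have already been created as shown there; the apostrophe and repeated-word steps are omitted for brevity):
library(tm)
# Helper to replace a text pattern in each document
f <- content_transformer(function(x, patt, repl) gsub(patt, repl, x))
corpus <- tm_map(corpus, f, "[[:punct:]]", "")          # strip punctuation and junk
corpus <- tm_map(corpus, removeNumbers)                 # drop numbers
corpus <- tm_map(corpus, content_transformer(tolower))  # convert to lowercase
corpus <- tm_map(corpus, removeWords, profaneWords)     # remove profane words
corpus <- tm_map(corpus, stripWhitespace)               # collapse extra whitespace
tdmCorpus <- TermDocumentMatrix(corpus)                 # build the term-document matrix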
Next, the resulting term-document matrix is displayed.
<<TermDocumentMatrix (terms: 124333, documents: 3)>>
Non-/sparse entries: 168191/204808
Sparsity : 55%
Maximal term length: 261
Weighting : term frequency (tf)
Sample :
Docs
Terms en_US.blogs.txt en_US.news.txt en_US.twitter.txt
and 54596 3436 21801
are 9676 559 7974
for 18139 1434 19345
have 11046 564 8287
that 23000 1303 11609
the 91604 7718 46658
this 13076 465 8061
was 13750 862 5882
with 14192 1073 8819
you 14852 348 27092
To examine how term frequencies behave, four n-gram term-document matrices (1-gram through 4-gram) are created. The tokenizers are sketched below, followed by the term-document matrix of each n-gram and a summary of the number of terms per n-gram.
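Each tokenizer splits the text into words and pastes n consecutive words into a single term with NLP::ngrams(); a condensed sketch for the 2-gram case, taken from the appendix (one such tokenizer is defined per n):
library(tm)
library(NLP)
# 2-gram tokenizer: paste every pair of consecutive words into one term
token2gram <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "), use.names = FALSE)
}
gram2 <- TermDocumentMatrix(corpus, control = list(tokenize = token2gram))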
<<TermDocumentMatrix (terms: 124333, documents: 3)>>
Non-/sparse entries: 168191/204808
Sparsity : 55%
Maximal term length: 261
Weighting : term frequency (tf)
Sample :
Docs
Terms en_US.blogs.txt en_US.news.txt en_US.twitter.txt
and 54596 3436 21801
are 9676 559 7974
for 18139 1434 19345
have 11046 564 8287
that 23000 1303 11609
the 91604 7718 46658
this 13076 465 8061
was 13750 862 5882
with 14192 1073 8819
you 14852 348 27092
<<TermDocumentMatrix (terms: 1233860, documents: 3)>>
Non-/sparse entries: 1412130/2289450
Sparsity : 62%
Maximal term length: 356
Weighting : term frequency (tf)
Sample :
Docs
Terms en_US.blogs.txt en_US.news.txt en_US.twitter.txt
and the 3013 236 732
at the 2340 211 1845
for the 2900 269 3701
i have 2386 23 1514
in a 2320 230 1144
in the 7630 718 3908
of the 9283 739 2966
on the 3541 298 2482
to be 3448 160 2352
to the 4283 309 2102
<<TermDocumentMatrix (terms: 2611491, documents: 3)>>
Non-/sparse entries: 2742032/5092441
Sparsity : 65%
Maximal term length: 373
Weighting : term frequency (tf)
Sample :
Docs
Terms en_US.blogs.txt en_US.news.txt en_US.twitter.txt
a lot of 661 50 325
going to be 266 19 374
i have a 283 2 271
i have to 284 7 220
i want to 274 12 370
it was a 346 16 173
looking forward to 74 1 430
one of the 715 55 291
thanks for the 11 0 1157
to be a 346 19 327
<<TermDocumentMatrix (terms: 3233189, documents: 3)>>
Non-/sparse entries: 3273677/6425890
Sparsity : 66%
Maximal term length: 458
Weighting : term frequency (tf)
Sample :
Docs
Terms en_US.blogs.txt en_US.news.txt en_US.twitter.txt
at the end of 158 11 49
cant wait to see 16 0 148
for the first time 84 6 77
going to be a 42 8 111
is going to be 70 9 117
thank you for the 8 0 151
thanks for the follow 0 0 295
thanks for the rt 0 0 175
the end of the 162 15 73
the rest of the 135 2 64
       1-gram  2-gram  3-gram  4-gram
Terms  124333 1233860 2611491 3233189
Each n-gram contains a very large number of terms, which may slow down the execution of the model.
Below are the 20 most frequent terms for each n-gram.
The following table shows how many unique terms, ordered by decreasing frequency, are needed to cover 50% and 90% of all term instances in the sample (how these counts are computed is sketched after the table).
        1-gram 2-gram  3-gram  4-gram
Cov 50%    262  36390  896745 1518444
Cov 90%  10057 890911 2268542 2890240
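Each coverage count is obtained by sorting terms by decreasing frequency and taking the smallest number of terms whose cumulative frequency reaches the target. A minimal sketch of the idea (the appendix uses an equivalent wordsum() helper; coverage_count is an illustrative name):
# Number of top terms needed to reach a coverage target, given a vector of term frequencies
coverage_count <- function(freqs, coverage) {
  freqs <- sort(freqs, decreasing = TRUE)            # most frequent terms first
  which(cumsum(freqs) >= coverage * sum(freqs))[1]   # first position reaching the target
}
# Example: coverage_count(rowSums(as.matrix(gram1)), 0.5)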
Using the hunspell package, we can estimate how many terms are not English words. Below is a summary of the term counts by language and a plot of the 20 most frequent non-English terms.
             terms
total      2676426
english    2415673
no english  260753
As can be seen, many of the flagged terms are actually English; this is due to the dictionary used (the default one shipped with the package). Given its poor performance in this analysis, foreign-language words will not be filtered out. With a better dictionary, foreign-language words could be filtered properly.
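The check itself uses hunspell_check() against the package's default en_US dictionary, roughly as in the appendix (it assumes gram1 from above and the tm package loaded):
library(hunspell)
en <- dictionary("en_US")                       # default dictionary shipped with hunspell
terms <- findFreqTerms(gram1)                   # all 1-gram terms
freqs <- rowSums(as.matrix(gram1[terms, ]))     # total frequency of each term
isEnglish <- hunspell_check(terms, dict = en)   # TRUE if the term is in the dictionary
c(total = sum(freqs), english = sum(freqs[isEnglish]), other = sum(freqs[!isEnglish]))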
To make the number of terms more manageable, sparse terms are removed using a maximum sparsity of 20%. Below are the results of removing sparse terms from each n-gram (the call is sketched just below), along with a summary of the number of remaining terms per n-gram.
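The reduction uses tm::removeSparseTerms(), as in the appendix:
# Keep only terms whose sparsity across the three documents is at most 20%
gram1 <- removeSparseTerms(gram1, 0.2)
gram2 <- removeSparseTerms(gram2, 0.2)
gram3 <- removeSparseTerms(gram3, 0.2)
gram4 <- removeSparseTerms(gram4, 0.2)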
<<TermDocumentMatrix (terms: 11835, documents: 3)>>
Non-/sparse entries: 35505/0
Sparsity : 0%
Maximal term length: 18
Weighting : term frequency (tf)
Sample :
Docs
Terms en_US.blogs.txt en_US.news.txt en_US.twitter.txt
and 54596 3436 21801
are 9676 559 7974
for 18139 1434 19345
have 11046 564 8287
that 23000 1303 11609
the 91604 7718 46658
this 13076 465 8061
was 13750 862 5882
with 14192 1073 8819
you 14852 348 27092
<<TermDocumentMatrix (terms: 25342, documents: 3)>>
Non-/sparse entries: 76026/0
Sparsity : 0%
Maximal term length: 23
Weighting : term frequency (tf)
Sample :
Docs
Terms en_US.blogs.txt en_US.news.txt en_US.twitter.txt
and the 3013 236 732
at the 2340 211 1845
for the 2900 269 3701
i have 2386 23 1514
in a 2320 230 1144
in the 7630 718 3908
of the 9283 739 2966
on the 3541 298 2482
to be 3448 160 2352
to the 4283 309 2102
<<TermDocumentMatrix (terms: 10244, documents: 3)>>
Non-/sparse entries: 30732/0
Sparsity : 0%
Maximal term length: 33
Weighting : term frequency (tf)
Sample :
Docs
Terms en_US.blogs.txt en_US.news.txt en_US.twitter.txt
a lot of 661 50 325
be able to 325 15 150
going to be 266 19 374
i have a 283 2 271
i have to 284 7 220
i want to 274 12 370
it was a 346 16 173
looking forward to 74 1 430
one of the 715 55 291
to be a 346 19 327
<<TermDocumentMatrix (terms: 1679, documents: 3)>>
Non-/sparse entries: 5037/0
Sparsity : 0%
Maximal term length: 29
Weighting : term frequency (tf)
Sample :
Docs
Terms en_US.blogs.txt en_US.news.txt en_US.twitter.txt
at the end of 158 11 49
at the same time 93 9 53
for the first time 84 6 77
going to be a 42 8 111
if you want to 73 5 70
is going to be 70 9 117
is one of the 87 6 54
the end of the 162 15 73
the rest of the 135 2 64
when it comes to 122 4 29
       1-gram 2-gram 3-gram 4-gram
Terms   11835  25342  10244   1679
As shown, the number of terms in each n-gram has decreased considerably.
        1-gram 2-gram 3-gram 4-gram
Cov 50%    157   1095   1078    231
Cov 90%   3033  10269   5845   1106
Repeating the coverage analysis on the reduced n-grams also shows a large decrease in the number of terms needed.
Load the libraries
suppressMessages(library(knitr))
suppressMessages(library(tm))
suppressMessages(library(ggplot2))
suppressMessages(library(NLP))
suppressMessages(library(tidyr))
suppressMessages(library(hunspell))
Load the data
dir <- "./SwiftKey/"
# Load the data
con <- file(paste0(dir,"en_US.blogs.txt"), "rb")
blogs <- readLines(con, encoding = "UTF-8", skipNul = T, warn = F)
close(con)
con <- file(paste0(dir,"en_US.news.txt"), "rb")
news <- readLines(con, encoding = "UTF-8", skipNul = T, warn = F)
close(con)
con <- file(paste0(dir,"en_US.twitter.txt"), "rb")
twitter <- readLines(con, encoding = "UTF-8", skipNul = T, warn = F)
close(con)
Calculations on the data for the summary
# Read the size of the data
sizeData <- c(file.info(paste0(dir,"en_US.blogs.txt"))$size,
file.info(paste0(dir,"en_US.news.txt"))$size,
file.info(paste0(dir,"en_US.twitter.txt"))$size)
# Count the number of words per data set
wordsBlogs <- words(blogs)
wordsNews <- words(news)
wordsTwitter <- words(twitter)
wordsData <- c(length(wordsBlogs), length(wordsNews),
length(wordsTwitter))
# Count the number of chars per data set
charsData <- c(sum(nchar(blogs)), sum(nchar(news)), sum(nchar(twitter)))
# Count the number of lines per data set
linesData <- c(length(blogs), length(news), length(twitter))
Calculate number of characters per word
sumData <- rbind(sizeData/1048576, linesData, wordsData, charsData)
colnames(sumData) <- c("blogs", "news", "twitter")
rownames(sumData) <- c("size (MB)", "lines", "words", "chars")
# Summary of the data
t(sumData)
# Number of chars per word
charBlogs <- nchar(wordsBlogs)
charNews <- nchar(wordsNews)
charTwitter <- nchar(wordsTwitter)
Boxplot of characters per word for each document
boxplot(charBlogs, charNews, charTwitter,
log = "y", names = c("blogs", "news", "twitter"),
ylab = "log(Number of Characters)", xlab = "File Name")
title("Comparing Distributions of Characters per Word")
Display the first 10 words with more than 20 characters
wordsBlogs[charBlogs>20][1:10]
wordsNews[charNews>20][1:10]
wordsTwitter[charTwitter>20][1:10]
Create a corpus with the three documents
remove(blogs, news, twitter, wordsBlogs, wordsNews, wordsTwitter)
corpus <- VCorpus(DirSource(dir),
readerControl = list(language = "en"))
n=.05 # 5% of the size of each set
set.seed(50)
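# Randomly keep a 5% sample of the lines from each of the three documents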
corpus[[1]]$content <- sample(corpus[[1]]$content,
length(corpus[[1]]$content)*n)
corpus[[2]]$content <- sample(corpus[[2]]$content,
length(corpus[[2]]$content)*n)
corpus[[3]]$content <- sample(corpus[[3]]$content,
length(corpus[[3]]$content)*n)
Make transformations and create a term-document matrix
# Create function to modify a text pattern
f <- content_transformer(function(x, patt1, patt2) gsub(patt1, patt2, x))
# Download profane words from Robert J Gabriel's github
# "https://github.com/RobertJGabriel/Google-profanity-words/blob/master/list.txt"
con <- file("profaneList.txt", "rb")
profaneWords <- readLines(con, encoding = "UTF-8", skipNul = T, warn = F)
close(con)
# Remove punctuation and junk
corpus <- tm_map(corpus, f, "[[:punct:]]", "")
# Replace Unicode apostrophe with ASCII apostrophe
corpus <- tm_map(corpus, f, "[‘’]", "'")
# Remove multiple repeating consecutive words
corpus <- tm_map(corpus, f, "\\b(\\w+)(?:\\s+\\1\\b)+", "\\1")
# Remove multiple repeating consecutive pairs of words
corpus <- tm_map(corpus, f, "\\b(\\w+\\s\\w+)(\\s\\1)+", "\\1")
# Remove numbers
corpus <- tm_map(corpus, removeNumbers)
# Transform to tolower
corpus <- tm_map(corpus, content_transformer(tolower))
# Remove punctuation
corpus <- tm_map(corpus, removePunctuation)
# Remove profane words
corpus <- tm_map(corpus, removeWords, profaneWords)
# Remove strip extra whitespace
corpus <- tm_map(corpus, stripWhitespace)
# Create a document-term matrix from single words found in all documents
tdmCorpus <- TermDocumentMatrix(corpus)
Show the term-document matrix
inspect(tdmCorpus)
Create the n-grams
token1gram <- function(x) {
unlist(lapply(ngrams(words(x), 1), paste, collapse = " "),
use.names = FALSE)}
token2gram <- function(x) {
unlist(lapply(ngrams(words(x), 2), paste, collapse = " "),
use.names = FALSE)}
token3gram <- function(x) {
unlist(lapply(ngrams(words(x), 3), paste, collapse = " "),
use.names = FALSE)}
token4gram <- function(x) {
unlist(lapply(ngrams(words(x), 4), paste, collapse = " "),
use.names = FALSE)}
gram1 <- TermDocumentMatrix(corpus, control = list(tokenize = token1gram))
gram2 <- TermDocumentMatrix(corpus, control = list(tokenize = token2gram))
gram3 <- TermDocumentMatrix(corpus, control = list(tokenize = token3gram))
gram4 <- TermDocumentMatrix(corpus, control = list(tokenize = token4gram))
Show the four n-grams
inspect(gram1)
inspect(gram2)
inspect(gram3)
inspect(gram4)
ngramData <- cbind(dim(gram1)[1], dim(gram2)[1],
dim(gram3)[1], dim(gram4)[1])
colnames(ngramData) <- c("1-gram", "2-gram", "3-gram", "4-gram")
rownames(ngramData) <- "Terms"
ngramData
Save the n-grams to disk
saveRDS(gram1, "one_words_0.rds")
saveRDS(gram2, "two_words_0.rds")
saveRDS(gram3, "three_words_0.rds")
saveRDS(gram4, "four_words_0.rds")
Show the 20 most frequent terms of each n-gram
nn=20 # Number of bar
# 1-gram
freqTerms <- findFreqTerms(gram1)
termFreq <- rowSums(as.matrix(gram1[freqTerms,]))
termFreq <- termFreq[order(termFreq, decreasing = TRUE)]
termFreq <- data.frame(unigram=names(head(termFreq,nn)),
frequency=head(termFreq,nn))
g1 <- ggplot(termFreq, aes(x=reorder(unigram, frequency), y=frequency)) +
geom_bar(stat = "identity", fill="red") + coord_flip() +
theme(legend.title=element_blank()) +
xlab("1-gram") + ylab("Frequency") +
labs(title = "Top 1-grams by frequency")
print(g1)
# 2-gram
freqTerms <- findFreqTerms(gram2)
termFreq <- rowSums(as.matrix(gram2[freqTerms,]))
termFreq <- termFreq[order(termFreq, decreasing = TRUE)]
termFreq <- data.frame(unigram=names(head(termFreq,nn)),
frequency=head(termFreq,nn))
g2 <- ggplot(termFreq, aes(x=reorder(unigram, frequency), y=frequency)) +
geom_bar(stat = "identity", fill="red") + coord_flip() +
theme(legend.title=element_blank()) +
xlab("2-gram") + ylab("Frequency") +
labs(title = "Top 2-grams by frequency")
print(g2)
# 3-gram
freqTerms <- findFreqTerms(gram3)
termFreq <- rowSums(as.matrix(gram3[freqTerms,]))
termFreq <- termFreq[order(termFreq, decreasing = TRUE)]
termFreq <- data.frame(unigram=names(head(termFreq,nn)),
frequency=head(termFreq,nn))
g3 <- ggplot(termFreq, aes(x=reorder(unigram, frequency), y=frequency)) +
geom_bar(stat = "identity", fill="red") + coord_flip() +
theme(legend.title=element_blank()) +
xlab("3-gram") + ylab("Frequency") +
labs(title = "Top 3-grams by frequency")
print(g3)
# 4-gram
freqTerms <- findFreqTerms(gram4)
termFreq <- rowSums(as.matrix(gram4[freqTerms,]))
termFreq <- termFreq[order(termFreq, decreasing = TRUE)]
termFreq <- data.frame(unigram=names(head(termFreq,nn)),
frequency=head(termFreq,nn))
g4 <- ggplot(termFreq, aes(x=reorder(unigram, frequency), y=frequency)) +
geom_bar(stat = "identity", fill="red") + coord_flip() +
theme(legend.title=element_blank()) +
xlab("4-gram") + ylab("Frequency") +
labs(title = "Top 4-grams by frequency")
print(g4)
Calculate and show n-gram coverage for 50% and 90%
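# wordsum: given a data frame x with a freq column sorted in decreasing order,
# return how many top terms are needed to reach the given coverage of all term instances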
wordsum <- function(x, coverage){
totalfreq <- sum(x$freq)
wordfreq <- 0
for (i in 1:length(x$freq))
{
wordfreq <- wordfreq + as.numeric(x$freq[i])
if (wordfreq >= coverage * totalfreq)
{
return (i)
}
}
return (nrow(x))
}
freqTerms <- findFreqTerms(gram1)
termFreq <- rowSums(as.matrix(gram1[freqTerms,]))
G1 <- data.frame(word = freqTerms, freq = termFreq)
row.names(G1) <- 1:dim(G1)[1]
G1 <- G1[with(G1, order(G1$freq, decreasing = TRUE)), ]
freqTerms <- findFreqTerms(gram2)
termFreq <- rowSums(as.matrix(gram2[freqTerms,]))
G2 <- data.frame(word = freqTerms, freq = termFreq)
row.names(G2) <- 1:dim(G2)[1]
G2 <- G2[with(G2, order(G2$freq, decreasing = TRUE)), ]
freqTerms <- findFreqTerms(gram3)
termFreq <- rowSums(as.matrix(gram3[freqTerms,]))
G3 <- data.frame(word = freqTerms, freq = termFreq)
row.names(G3) <- 1:dim(G3)[1]
G3 <- G3[with(G3, order(G3$freq, decreasing = TRUE)), ]
freqTerms <- findFreqTerms(gram4)
termFreq <- rowSums(as.matrix(gram4[freqTerms,]))
G4 <- data.frame(word = freqTerms, freq = termFreq)
row.names(G4) <- 1:dim(G4)[1]
G4 <- G4[with(G4, order(G4$freq, decreasing = TRUE)), ]
ngramData <- rbind(cbind(wordsum(G1,0.5), wordsum(G2,0.5),
wordsum(G3,0.5),
wordsum(G4,0.5)),
cbind(wordsum(G1,0.9),
wordsum(G2,0.9),
wordsum(G3,0.9),
wordsum(G4,0.9)))
colnames(ngramData) <- c("1-gram", "2-gram", "3-gram", "4-gram")
rownames(ngramData) <- c("Cov 50%", "Cov 90%")
ngramData
Evaluate how many terms are in English
en <- dictionary("en_US")
freqTerms <- findFreqTerms(gram1)
termFreq <- rowSums(as.matrix(gram1[freqTerms,]))
termEng <- hunspell_check(freqTerms, dict = en)
language <- rbind(sum(termFreq),
sum(termFreq[termEng]),
sum(termFreq[!termEng]))
colnames(language) <- "terms"
rownames(language) <- c("total", "english", "no english")
language
freqTerms <- freqTerms[!termEng]
termFreq <- termFreq[!termEng]
termFreq <- termFreq[order(termFreq, decreasing = TRUE)]
termFreq <- data.frame(unigram=names(head(termFreq,20)),
frequency=head(termFreq,20))
g1 <- ggplot(termFreq, aes(x=reorder(unigram, frequency), y=frequency)) +
geom_bar(stat = "identity", fill="red") + coord_flip() +
theme(legend.title=element_blank()) +
xlab("1-gram") + ylab("Frequency") +
labs(title = "Top 1-grams by frequency")
print(g1)
Decrease the number of terms of each n-gram
# remove sparse words, leaving only 20% sparsity
gram1 <- removeSparseTerms(gram1, .2)
gram2 <- removeSparseTerms(gram2, .2)
gram3 <- removeSparseTerms(gram3, .2)
gram4 <- removeSparseTerms(gram4, .2)
inspect(gram1)
inspect(gram2)
inspect(gram3)
inspect(gram4)
ngramData <- cbind(dim(gram1)[1], dim(gram2)[1],
dim(gram3)[1], dim(gram4)[1])
colnames(ngramData) <- c("1-gram", "2-gram", "3-gram", "4-gram")
rownames(ngramData) <- "Terms"
ngramData
Calculate and show the coverage of the new n-grams for 50% and 90%
wordsum <- function(x, coverage){
totalfreq <- sum(x$freq)
wordfreq <- 0
for (i in 1:length(x$freq))
{
wordfreq <- wordfreq + as.numeric(x$freq[i])
if (wordfreq >= coverage * totalfreq)
{
return (i)
}
}
return (nrow(x))
}
freqTerms <- findFreqTerms(gram1)
termFreq <- rowSums(as.matrix(gram1[freqTerms,]))
G1 <- data.frame(word = freqTerms, freq = termFreq)
row.names(G1) <- 1:dim(G1)[1]
G1 <- G1[with(G1, order(G1$freq, decreasing = TRUE)), ]
freqTerms <- findFreqTerms(gram2)
termFreq <- rowSums(as.matrix(gram2[freqTerms,]))
G2 <- data.frame(word = freqTerms, freq = termFreq)
row.names(G2) <- 1:dim(G2)[1]
G2 <- G2[with(G2, order(G2$freq, decreasing = TRUE)), ]
freqTerms <- findFreqTerms(gram3)
termFreq <- rowSums(as.matrix(gram3[freqTerms,]))
G3 <- data.frame(word = freqTerms, freq = termFreq)
row.names(G3) <- 1:dim(G3)[1]
G3 <- G3[with(G3, order(G3$freq, decreasing = TRUE)), ]
freqTerms <- findFreqTerms(gram4)
termFreq <- rowSums(as.matrix(gram4[freqTerms,]))
G4 <- data.frame(word = freqTerms, freq = termFreq)
row.names(G4) <- 1:dim(G4)[1]
G4 <- G4[with(G4, order(G4$freq, decreasing = TRUE)), ]
ngramData <- rbind(cbind(wordsum(G1,0.5), wordsum(G2,0.5),
wordsum(G3,0.5),
wordsum(G4,0.5)),
cbind(wordsum(G1,0.9),
wordsum(G2,0.9),
wordsum(G3,0.9),
wordsum(G4,0.9)))
colnames(ngramData) <- c("1-gram", "2-gram", "3-gram", "4-gram")
rownames(ngramData) <- c("Cov 50%", "Cov 90%")
ngramData
Save new n-grams to disk
saveRDS(gram1, "one_words_1.rds")
saveRDS(gram2, "two_words_1.rds")
saveRDS(gram3, "three_words_1.rds")
saveRDS(gram4, "four_words_1.rds")
sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Chile.1252 LC_CTYPE=Spanish_Chile.1252
[3] LC_MONETARY=Spanish_Chile.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Chile.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] hunspell_3.0 tidyr_0.8.2 ggplot2_3.1.0 tm_0.7-6 NLP_0.2-0
[6] knitr_1.21
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 compiler_3.5.2 pillar_1.3.1 plyr_1.8.4
[5] bindr_0.1.1 tools_3.5.2 digest_0.6.18 evaluate_0.12
[9] tibble_2.0.0 gtable_0.2.0 pkgconfig_2.0.2 rlang_0.3.1
[13] yaml_2.2.0 parallel_3.5.2 xfun_0.4 bindrcpp_0.2.2
[17] withr_2.1.2 stringr_1.3.1 dplyr_0.7.8 xml2_1.2.0
[21] grid_3.5.2 tidyselect_0.2.5 glue_1.3.0 R6_2.3.0
[25] rmarkdown_1.11 purrr_0.2.5 magrittr_1.5 scales_1.0.0
[29] htmltools_0.3.6 assertthat_0.2.0 colorspace_1.3-2 stringi_1.2.4
[33] lazyeval_0.2.1 munsell_0.5.0 slam_0.1-44 crayon_1.3.4