This is an R Markdown Notebook.
Assignment:
- Unnest the words of the tweets, remove stop words and weird web “words”, and create a table and a word cloud of the top words.
- Conduct a sentiment analysis using bing, remove any errors like trump = positive, and create a graph of the words that contribute most to each sentiment.
- Do the same as above but with the nrc sentiment lexicon.
- Unnest the tweets as bigrams, remove stop words and errors, and create a table and word cloud of the most common bigrams.
- Using the bigrams, look for the most common words that follow two different words. You may choose trump and pelosi, or choose your own.
usatoday_tweets <- get_timeline("usatoday", n =5000)
usatoday_words <- usatoday_tweets %>%
unnest_tokens(word, text) %>%
select(screen_name, word)
usatoday_words %>%
anti_join(get_stopwords()) %>%
filter(!word == "https") %>%
filter(!word == "t.co") %>%
count(word, sort = T)
Joining, by = "word"
God, this is going to be sad. Okay. Top words are corona-related. No surprise.
usatoday_words %>%
anti_join(get_stopwords()) %>%
filter(!word == "https") %>%
filter(!word == "t.co") %>%
count(word, sort = T)%>%
top_n(200) %>%
wordcloud2(size = .6)
Joining, by = "word"
Selecting by n
Cardi B Voice: CAH-ROHNA VI-RUSS shit is real shit is getting real.
bing_words <- get_sentiments("bing")
usatoday_words %>%
inner_join(bing_words) %>%
filter(!word == "trump") %>%
count(word, sentiment, sort = TRUE) %>%
group_by(sentiment) %>%
top_n(10) %>%
ungroup() %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(word, n, fill = sentiment)) +
geom_col(show.legend = FALSE) +
facet_wrap(vars(sentiment), scales = "free") +
labs(y = "Bing Sentiment Contribution by Word (USA Today Tweets)",
x = NULL) +
coord_flip() +
theme_minimal()
Joining, by = "word"
Selecting by n

I was right: this is depressing. All the positive words are like an overly cheerful person telling you to just “stop being sad,” and the negative words are all about dying. Great.
nrc_words <- get_sentiments("nrc")
usatoday_words %>%
inner_join(nrc_words) %>%
filter(!word == "trump") %>%
count(word, sentiment, sort = TRUE) %>%
group_by(sentiment) %>%
top_n(5) %>%
ungroup() %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(word, n, fill = sentiment)) +
geom_col(show.legend = FALSE) +
facet_wrap(vars(sentiment), scales = "free") +
labs(y = "NRC Sentiment Contribution by Word (USA Today Tweets)",
x = NULL) +
coord_flip() +
theme_minimal()
Joining, by = "word"
Selecting by n

We’re all so scared. “Pandemic” shows up at the top for negative, sadness, and fear.
remove_words = c("https", "t.co")
usatoday_tweets %>%
select(text) %>%
unnest_tokens(words, text, token = "ngrams", n = 2) %>%
separate(words, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) %>%
filter(!word1 %in% remove_words) %>%
filter(!word2 %in% remove_words) %>%
unite(words, word1, word2, sep = " ") -> usatoday_bigrams
usatoday_bigrams %>%
count(words, sort = T)
This would be the ideal project to see the spread of tweets/interaction regarding the presidential candidates right before election, but now the world is ending and all we want to talk about is COVID.
usatoday_bigrams %>%
count(words, sort = T) %>%
top_n(100) %>%
wordcloud2(size = .5)
Selecting by n
See? Even with the limited coverage surrounding the 2020 presidential bid, it’s still cool to see how it could have been.
first_words <- c("biden", "sanders")
usatoday_bigrams %>%
count(words, sort = TRUE) %>%
separate(words, c("word1", "word2"), sep = " ") %>%
filter(word1 %in% first_words) %>%
count(word1, word2, wt = n, sort = TRUE) %>%
rename(total = nn)
Let’s do this one time with DNC candidates. Fairly positive.
first_words <- c("corona", "coronavirus")
usatoday_bigrams %>%
count(words, sort = TRUE) %>%
separate(words, c("word1", "word2"), sep = " ") %>%
filter(word1 %in% first_words) %>%
count(word1, word2, wt = n, sort = TRUE) %>%
rename(total = nn)
Oh, look. We’re sad again.
LS0tCnRpdGxlOiAiQVByYXR0IHR3ZWV0cyBBc3NpZ25tZW50IgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpUaGlzIGlzIGFuIFtSIE1hcmtkb3duXShodHRwOi8vcm1hcmtkb3duLnJzdHVkaW8uY29tKSBOb3RlYm9vay4gCgpBc3NpZ25tZW50OgoKCjEuIFVubmVzdCB0aGUgd29yZHMgb2YgdGhlIHR3ZWV0cywgcmVtb3ZlIHN0b3Agd29yZHMgYW5kIHdlaXJkIHdlYiAid29yZHMiLCBhbmQgY3JlYXRlIGEgdGFibGUgYW5kIGEgd29yZCBjbG91ZCBvZiB0aGUgdG9wIHdvcmRzLiAgCjIuIENvbmR1Y3QgYSBzZW50aW1lbnQgYW5hbHlzaXMgdXNpbmcgYmluZywgcmVtb3ZlIGFueSBlcnJvcnMgbGlrZSB0cnVtcCA9IHBvc2l0aXZlLCBhbmQgY3JlYXRlIGEgZ3JhcGggb2YgdGhlIHdvcmRzIHRoYXQgY29udHJpYnV0ZSBtb3N0IHRvIGVhY2ggc2VudGltZW50LiAgCjMuIERvIHRoZSBzYW1lIGFzIGFib3ZlIGJ1dCB3aXRoIHRoZSBucmMgc2VudGltZW50IGxleGljb24uICAKNC4gVW5uZXN0IHRoZSB0d2VldHMgYXMgYmlncmFtcywgcmVtb3ZlIHN0b3Agd29yZHMgYW5kIGVycm9ycywgYW5kIGNyZWF0ZSBhIHRhYmxlIGFuZCB3b3JkIGNsb3VkIG9mIHRoZSBtb3N0IGNvbW1vbiBiaWdyYW1zLiAgCjUuIFVzaW5nIHRoZSBiaWdyYW1zLCBsb29rIGZvciB0aGUgbW9zdCBjb21tb24gd29yZHMgdGhhdCBmb2xsb3cgdHdvIGRpZmZlcmVudCB3b3Jkcy4gWW91IG1heSBjaG9vc2UgdHJ1bXAgYW5kIHBlbG9zaSwgb3IgY2hvb3NlIHlvdXIgb3duLiAgCgpgYGB7cn0KdXNhdG9kYXlfdHdlZXRzIDwtIGdldF90aW1lbGluZSgidXNhdG9kYXkiLCBuID01MDAwKQpgYGAKCmBgYHtyfQp1c2F0b2RheV93b3JkcyA8LSB1c2F0b2RheV90d2VldHMgJT4lIAogIHVubmVzdF90b2tlbnMod29yZCwgdGV4dCkgJT4lIAogIHNlbGVjdChzY3JlZW5fbmFtZSwgd29yZCkKYGBgCgpgYGB7cn0KdXNhdG9kYXlfd29yZHMgJT4lIAogIGFudGlfam9pbihnZXRfc3RvcHdvcmRzKCkpICU+JSAKICBmaWx0ZXIoIXdvcmQgPT0gImh0dHBzIikgJT4lIAogIGZpbHRlcighd29yZCA9PSAidC5jbyIpICU+JSAKICBjb3VudCh3b3JkLCBzb3J0ID0gVCkKYGBgCkdvZCwgdGhpcyBpcyBnb2luZyB0byBiZSBzYWQuIE9rYXkuIFRvcCB3b3JkcyBhcmUgY29yb25hLXJlbGF0ZWQuIE5vIHN1cnByaXNlLiAKCmBgYHtyfQp1c2F0b2RheV93b3JkcyAlPiUgCiAgYW50aV9qb2luKGdldF9zdG9wd29yZHMoKSkgJT4lIAogIGZpbHRlcighd29yZCA9PSAiaHR0cHMiKSAlPiUgCiAgZmlsdGVyKCF3b3JkID09ICJ0LmNvIikgJT4lIAogIGNvdW50KHdvcmQsIHNvcnQgPSBUKSU+JSAKICB0b3BfbigyMDApICU+JSAKICB3b3JkY2xvdWQyKHNpemUgPSAuNikKYGBgCgpDYXJkaSBCIFZvaWNlOiBDQUgtUk9ITkEgVkktUlVTUyBzaGl0IGlzIHJlYWwgc2hpdCBpcyBnZXR0aW5nIHJlYWwuIAoKYGBge3J9CmJpbmdfd29yZHMgPC0gZ2V0X3NlbnRpbWVudHMoImJpbmciKQoKYGBgCgpgYGB7cn0KdXNhdG9kYXlfd29yZHMgJT4lIAogIGlubmVyX2pvaW4oYmluZ193b3JkcykgJT4lIAogIGZpbHRlcighd29yZCA9PSAidHJ1bXAiKSAlPiUgCiAgY291bnQod29yZCwgc2VudGltZW50LCBzb3J0ID0gVFJVRSkgJT4lIAogIGdyb3VwX2J5KHNlbnRpbWVudCkgJT4lIAogIHRvcF9uKDEwKSAlPiUgCiAgdW5ncm91cCgpICU+JSAKICBtdXRhdGUod29yZCA9IHJlb3JkZXIod29yZCwgbikpICU+JSAKICBnZ3Bsb3QoYWVzKHdvcmQsIG4sIGZpbGwgPSBzZW50aW1lbnQpKSArCiAgZ2VvbV9jb2woc2hvdy5sZWdlbmQgPSBGQUxTRSkgKyAKICBmYWNldF93cmFwKHZhcnMoc2VudGltZW50KSwgc2NhbGVzID0gImZyZWUiKSArCiAgbGFicyh5ID0gIkJpbmcgU2VudGltZW50IENvbnRyaWJ1dGlvbiBieSBXb3JkIChVU0EgVG9kYXkgVHdlZXRzKSIsCiAgICAgICB4ID0gTlVMTCkgKwogIGNvb3JkX2ZsaXAoKSArCiAgdGhlbWVfbWluaW1hbCgpCmBgYApJIHdhcyByaWdodDogdGhpcyBpcyBkZXByZXNzaW5nLiBBbGwgdGhlIHBvc2l0aXZlIHdvcmRzIGFyZSBsaWtlIGFuIG92ZXJseSBjaGVlcmZ1bCBwZXJzb24gdGVsbGluZyB5b3UgdG8ganVzdCAic3RvcCBiZWluZyBzYWQsIiBhbmQgdGhlIG5lZ2F0aXZlIHdvcmRzIGFyZSBhbGwgYWJvdXQgZHlpbmcuIEdyZWF0LiAKCmBgYHtyfQpucmNfd29yZHMgPC0gZ2V0X3NlbnRpbWVudHMoIm5yYyIpCgpgYGAKYGBge3J9CnVzYXRvZGF5X3dvcmRzICU+JSAKICBpbm5lcl9qb2luKG5yY193b3JkcykgJT4lIAogIGZpbHRlcighd29yZCA9PSAidHJ1bXAiKSAlPiUgCiAgY291bnQod29yZCwgc2VudGltZW50LCBzb3J0ID0gVFJVRSkgJT4lIAogIGdyb3VwX2J5KHNlbnRpbWVudCkgJT4lIAogIHRvcF9uKDUpICU+JSAKICB1bmdyb3VwKCkgJT4lIAogIG11dGF0ZSh3b3JkID0gcmVvcmRlcih3b3JkLCBuKSkgJT4lIAogIGdncGxvdChhZXMod29yZCwgbiwgZmlsbCA9IHNlbnRpbWVudCkpICsKICBnZW9tX2NvbChzaG93LmxlZ2VuZCA9IEZBTFNFKSArIAogIGZhY2V0X3dyYXAodmFycyhzZW50aW1lbnQpLCBzY2FsZXMgPSAiZnJlZSIpICsKICBsYWJzKHkgPSAiTlJDIFNlbnRpbWVudCBDb250cmlidXRpb24gYnkgV29yZCAoVVNBIFRvZGF5IFR3ZWV0cykiLAogICAgICAgeCA9IE5VTEwpICsKICBjb29yZF9mbGlwKCkgKwogIHRoZW1lX21pbmltYWwoKQpgYGAKCldlJ3JlIGFsbCBzbyBzY2FyZWQuICJQYW5kZW1pYyIgc2hvd3MgdXAgYXQgdGhlIHRvcCBmb3IgbmVnYXRpdmUsIHNhZG5lc3MsIGFuZCBmZWFyLiAKCmBgYHtyfQpyZW1vdmVfd29yZHMgPSBjKCJodHRwcyIsICJ0LmNvIikKCnVzYXRvZGF5X3R3ZWV0cyAlPiUgCiAgc2VsZWN0KHRleHQpICU+JSAKICB1bm5lc3RfdG9rZW5zKHdvcmRzLCB0ZXh0LCB0b2tlbiA9ICJuZ3JhbXMiLCBuID0gMikgJT4lIAogIHNlcGFyYXRlKHdvcmRzLCBjKCJ3b3JkMSIsICJ3b3JkMiIpLCBzZXAgPSAiICIpICU+JSAKICBmaWx0ZXIoIXdvcmQxICVpbiUgc3RvcF93b3JkcyR3b3JkKSAlPiUgCiAgZmlsdGVyKCF3b3JkMiAlaW4lIHN0b3Bfd29yZHMkd29yZCkgJT4lIAogIGZpbHRlcighd29yZDEgJWluJSByZW1vdmVfd29yZHMpICU+JSAKICBmaWx0ZXIoIXdvcmQyICVpbiUgcmVtb3ZlX3dvcmRzKSAlPiUgCiAgdW5pdGUod29yZHMsIHdvcmQxLCB3b3JkMiwgc2VwID0gIiAiKSAtPiB1c2F0b2RheV9iaWdyYW1zCmBgYAoKYGBge3J9CnVzYXRvZGF5X2JpZ3JhbXMgJT4lIAogIGNvdW50KHdvcmRzLCBzb3J0ID0gVCkKYGBgClRoaXMgd291bGQgYmUgdGhlIGlkZWFsIHByb2plY3QgdG8gc2VlIHRoZSBzcHJlYWQgb2YgdHdlZXRzL2ludGVyYWN0aW9uIHJlZ2FyZGluZyB0aGUgcHJlc2lkZW50aWFsIGNhbmRpZGF0ZXMgcmlnaHQgYmVmb3JlIGVsZWN0aW9uLCBidXQgbm93IHRoZSB3b3JsZCBpcyBlbmRpbmcgYW5kIGFsbCB3ZSB3YW50IHRvIHRhbGsgYWJvdXQgaXMgQ09WSUQuIAoKYGBge3J9CnVzYXRvZGF5X2JpZ3JhbXMgJT4lIAogIGNvdW50KHdvcmRzLCBzb3J0ID0gVCkgJT4lIAogIHRvcF9uKDEwMCkgJT4lIAogIHdvcmRjbG91ZDIoc2l6ZSA9IC41KQpgYGAKU2VlPyBFdmVuIHdpdGggdGhlIGxpbWl0ZWQgY292ZXJhZ2Ugc3Vycm91bmRpbmcgdGhlIDIwMjAgcHJlc2lkZW50aWFsIGJpZCwgaXQncyBzdGlsbCBjb29sIHRvIHNlZSBob3cgaXQgY291bGQgaGF2ZSBiZWVuLiAKCmBgYHtyfQpmaXJzdF93b3JkcyA8LSBjKCJiaWRlbiIsICJzYW5kZXJzIikKCnVzYXRvZGF5X2JpZ3JhbXMgJT4lIAogIGNvdW50KHdvcmRzLCBzb3J0ID0gVFJVRSkgJT4lIAogIHNlcGFyYXRlKHdvcmRzLCBjKCJ3b3JkMSIsICJ3b3JkMiIpLCBzZXAgPSAiICIpICU+JSAKICBmaWx0ZXIod29yZDEgJWluJSBmaXJzdF93b3JkcykgJT4lIAogIGNvdW50KHdvcmQxLCB3b3JkMiwgd3QgPSBuLCBzb3J0ID0gVFJVRSkgJT4lIAogIHJlbmFtZSh0b3RhbCA9IG5uKQpgYGAKCkxldCdzIGRvIHRoaXMgb25lIHRpbWUgd2l0aCBETkMgY2FuZGlkYXRlcy4gRmFpcmx5IHBvc2l0aXZlLiAKCmBgYHtyfQpmaXJzdF93b3JkcyA8LSBjKCJjb3JvbmEiLCAiY29yb25hdmlydXMiKQoKdXNhdG9kYXlfYmlncmFtcyAlPiUgCiAgY291bnQod29yZHMsIHNvcnQgPSBUUlVFKSAlPiUgCiAgc2VwYXJhdGUod29yZHMsIGMoIndvcmQxIiwgIndvcmQyIiksIHNlcCA9ICIgIikgJT4lIAogIGZpbHRlcih3b3JkMSAlaW4lIGZpcnN0X3dvcmRzKSAlPiUgCiAgY291bnQod29yZDEsIHdvcmQyLCB3dCA9IG4sIHNvcnQgPSBUUlVFKSAlPiUgCiAgcmVuYW1lKHRvdGFsID0gbm4pCmBgYAoKT2gsIGxvb2suIFdlJ3JlIHNhZCBhZ2Fpbi4gCg==