Kaitlin Kavlie PSYC-541
Lab #9: Text Analysis of Tweets
1. I extracted 5,000 tweets from CNN’s Twitter account, unnested the tweets into individual words, removed stop words and web-related terms, and created a table and a word cloud of the top words.
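I loaded the packages used throughout this lab and extracted the tweets using the code below.
library(rtweet); library(tidyverse); library(tidytext); library(wordcloud2)
info_tweets <- get_timeline("cnn", n = 5000)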
Then I unnested the tweets into individual words with the code below.
info_words <- info_tweets %>%
  unnest_tokens(word, text) %>%
  select(screen_name, word)
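To see what this unnesting step produces, here is a minimal sketch on two made-up tweets (not real CNN data; `toy_tweets` and `toy_words` are hypothetical names). It assumes the `dplyr` and `tidytext` packages are installed. It illustrates that `unnest_tokens()` lowercases the text and splits a shortened link into pieces such as "https" and "t.co", which is why those terms show up among the words.

```r
library(dplyr)
library(tidytext)

# Two made-up tweets (not real CNN data), one containing a shortened link
toy_tweets <- tibble(
  screen_name = "example",
  text = c("Breaking: Markets rally https://t.co/abc123",
           "Markets close higher")
)

toy_words <- toy_tweets %>%
  unnest_tokens(word, text) %>%   # one lowercase token per row
  select(screen_name, word)

toy_words
```

Each row of `toy_words` holds one word, so counting rows per word gives word frequencies directly.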
Using the code chunk below, I removed stop words and web-related terms ("https" and "t.co") and created a table of the top words.
info_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% c("https", "t.co"))
Joining, by = "word"
With this last code chunk I created a word cloud of the top words.
info_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% c("https", "t.co")) %>%
  top_n(100) %>%
  wordcloud2(size = .5)
Joining, by = "word"
Selecting by n
2. I conducted a sentiment analysis using the bing lexicon, removed several words that were misclassified, and created a graph of the words that contribute the most to each sentiment.
I loaded the bing sentiment lexicon with the code below.
bing <- get_sentiments("bing")
bing
Then I joined the words to the lexicon and removed several words that do not carry sentiment in these tweets ("trump", "like", and "top") with the following code.
info_words %>%
  inner_join(bing) %>%
  count(word, sentiment, sort = TRUE) %>%
  filter(!word %in% c("trump", "like", "top"))
Joining, by = "word"
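The join-then-filter idea can be sketched on toy data. Everything here is made up for illustration: `toy_lexicon` is a hypothetical three-word lexicon (the real bing lexicon is far larger), and `toy_words` stands in for `info_words`. The sketch assumes a lexicon that lists "trump" as positive, as this toy one does, so that a proper noun is counted as a sentiment word until it is filtered out.

```r
library(dplyr)

# Hypothetical mini-lexicon for illustration only
toy_lexicon <- tibble(
  word = c("trump", "win", "crisis"),
  sentiment = c("positive", "positive", "negative")
)

# Made-up tokens standing in for info_words
toy_words <- tibble(word = c("trump", "trump", "crisis", "win", "senate"))

toy_counts <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%   # keeps only words found in the lexicon
  count(word, sentiment, sort = TRUE)

toy_counts

# Dropping the proper noun leaves only words used for their sentiment
toy_clean <- toy_counts %>%
  filter(word != "trump")

toy_clean
```

Words that are absent from the lexicon, such as "senate" here, drop out at the `inner_join()` step, so only misclassified words need to be filtered by hand.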
Using this last code I created a graph of the words that contribute the most to each sentiment.
info_words %>%
  inner_join(bing) %>%
  count(word, sentiment, sort = TRUE) %>%
  filter(!word %in% c("trump", "like", "top")) %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(vars(sentiment), scales = "free") +
  labs(y = "News headlines: Words that contribute the most to each sentiment",
       x = NULL) +
  coord_flip() +
  theme_minimal()
Joining, by = "word"
Selecting by n

3. I unnested the tweets as bigrams, removed stop words and web-related terms, and created a table and word cloud of the most common bigrams.
This first code chunk was used to unnest the tweets as bigrams.
info_tweets %>%
  select(text) %>%
  unnest_tokens(words, text, token = "ngrams", n = 2) %>%
  count(words, sort = TRUE)
This next code filtered out bigrams in which either word is a stop word.
info_tweets %>%
  select(text) %>%
  unnest_tokens(words, text, token = "ngrams", n = 2) %>%
  separate(words, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>%
  unite(words, word1, word2, sep = " ")
Then this code also filtered out the web terms and saved the cleaned bigrams as info_bigrams.
remove_words <- c("https", "t.co")
info_bigrams <- info_tweets %>%
  select(text) %>%
  unnest_tokens(words, text, token = "ngrams", n = 2) %>%
  separate(words, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(!word1 %in% remove_words) %>%
  filter(!word2 %in% remove_words) %>%
  unite(words, word1, word2, sep = " ")
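The separate, filter, and unite steps above can be followed on a single made-up sentence. This is only a sketch: `toy_tweets` is invented, and `toy_stop` is a hypothetical two-word stop list used so the result is easy to check by hand (the lab itself uses `stop_words$word`). It assumes `dplyr`, `tidyr`, and `tidytext` are installed.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# One made-up tweet (not real CNN data)
toy_tweets <- tibble(text = "Russia invades Ukraine as the war continues")

# Hypothetical short stop list; the lab uses tidytext's stop_words$word
toy_stop <- c("as", "the")

toy_bigrams <- toy_tweets %>%
  unnest_tokens(words, text, token = "ngrams", n = 2) %>%   # overlapping word pairs
  separate(words, c("word1", "word2"), sep = " ") %>%       # split each pair
  filter(!word1 %in% toy_stop, !word2 %in% toy_stop) %>%    # drop pairs with a stop word
  unite(words, word1, word2, sep = " ")                     # glue the pair back together

toy_bigrams$words
```

Of the six bigrams in the sentence, only "russia invades", "invades ukraine", and "war continues" survive, because a bigram is dropped when either of its words is on the stop list.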
This code created a table of the most common bigrams.
info_bigrams %>%
  count(words, sort = TRUE)
Then this code was used to create a word cloud of the most common bigrams.
info_bigrams %>%
  count(words, sort = TRUE) %>%
  top_n(100) %>%
  wordcloud2(size = .5)
Selecting by n
4. Question 4 appears to repeat question 3: the bigrams, the stop word removal, and the table and word cloud of the most common bigrams are all shown under question 3 above.
5. I used the bigram method and found the most common words that come after ‘ukraine’ and ‘russia’.
firstinfo_word <- c("ukraine", "russia")
info_bigrams %>%
  count(words, sort = TRUE) %>%
  separate(words, c("word1", "word2"), sep = " ") %>%
  filter(word1 %in% firstinfo_word) %>%
  count(word1, word2, wt = n, sort = TRUE)
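The idea of keeping only bigrams that start with a chosen word can be sketched on toy data. The bigrams below are made up, and `toy_bigrams`, `first_words`, and `followers` are hypothetical names. As a simplification, this sketch sorts with `arrange()` instead of a second `count()`, on the assumption that each bigram is already unique after the first `count()`.

```r
library(dplyr)
library(tidyr)

# Made-up cleaned bigrams standing in for info_bigrams
toy_bigrams <- tibble(words = c("ukraine war", "ukraine war", "russia sanctions",
                                "ukraine president", "white house"))

first_words <- c("ukraine", "russia")

followers <- toy_bigrams %>%
  count(words) %>%                                       # frequency of each bigram
  separate(words, c("word1", "word2"), sep = " ") %>%    # split into first and second word
  filter(word1 %in% first_words) %>%                     # keep bigrams led by a target word
  arrange(desc(n))                                       # most frequent followers first

followers
```

In the result, `word2` holds the word that follows each target word and `n` holds how often that pairing occurred.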
After finding the most common words that come after ‘ukraine’ and ‘russia’, I created a bar graph displaying the results for each word.
firstinfo_word <- c("ukraine", "russia")
info_bigrams %>%
  count(words, sort = TRUE) %>%
  separate(words, c("word1", "word2"), sep = " ") %>%
  filter(word1 %in% firstinfo_word) %>%
  count(word1, word2, wt = n, sort = TRUE) %>%
  mutate(word2 = factor(word2, levels = rev(unique(word2)))) %>%
  group_by(word1) %>%
  top_n(5) %>%
  ggplot(aes(word2, n, fill = word1)) +
  scale_fill_viridis_d() +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = NULL, title = "Word following:") +
  facet_wrap(~word1, scales = "free") +
  coord_flip()
Selecting by n
