Sentiment Analysis

In this assignment, use the primary example code in Text Mining with R, Chapter 2: Sentiment Analysis (Silge & Robinson, 2020), and extend the code in the following manner:
- Work with a different corpus of your choosing.
- Incorporate at least one additional sentiment lexicon.

The NRC Word-Emotion Association Lexicon dataset is used in this code (Mohammad & Turney, 2013).

References:

Silge, J., & Robinson, D. (2020). Text mining with R: A tidy approach. O’Reilly Media, Inc. Retrieved from https://www.tidytextmining.com/

Mohammad, S. M., & Turney, P. D. (2013). Crowdsourcing a word–emotion association lexicon. Computational Intelligence, 29(3), 436-465. Retrieved from https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-8640.2012.00460.x

A Journal of the Plague Year

The corpus is A Journal of the Plague Year by Daniel Defoe, downloaded from Project Gutenberg. Sentiment is assigned with the NRC Word-Emotion Association Lexicon (Mohammad & Turney, 2013).

The corpus will be tokenized and stop words will be removed.

The NRC lexicon is then used to identify tokens in the corpus that are associated with “fear” and “joy”, two sentiments of interest in an account of a plague.

The fear-related and joy-related words are counted and ranked by frequency.
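The three steps above (tokenize, drop stop words, join to a lexicon and count) can be sketched on a toy example before running them on the full text. This is only an illustration: `toy_text` and `toy_lexicon` are hypothetical stand-ins made up for this sketch, and the hand-written lexicon replaces the NRC lexicon so that the block runs without any download.

```r
library(dplyr)     # %>%, tibble(), anti_join(), inner_join(), count()
library(tidytext)  # unnest_tokens(), stop_words

# Hypothetical two-line "corpus" (not taken from the book)
toy_text <- tibble(
  line = 1:2,
  text = c("The plague was a dreadful danger to the city.",
           "Yet charity and the church gave the people hope.")
)

# Hypothetical miniature lexicon standing in for the NRC lexicon
toy_lexicon <- tibble(
  word      = c("plague", "dreadful", "danger", "charity", "church"),
  sentiment = c("fear",   "fear",     "fear",   "joy",     "joy")
)

toy_counts <- toy_text %>%
  unnest_tokens(word, text) %>%              # one lowercase token per row
  anti_join(stop_words, by = "word") %>%     # drop common stop words
  inner_join(toy_lexicon, by = "word") %>%   # keep only words in the lexicon
  count(sentiment, word, sort = TRUE)        # frequency per sentiment and word

toy_counts
```

Passing `by = "word"` explicitly avoids the `Joining, by = "word"` message; the join behaves the same either way.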

# Load packages: dplyr supplies %>%, anti_join(), inner_join(), filter(), and count()
library(dplyr)
library(gutenbergr)
## Warning: package 'gutenbergr' was built under R version 4.0.3
library(tidytext)
## Warning: package 'tidytext' was built under R version 4.0.3
defoe <- gutenberg_download(c(376))  # A Journal of the Plague Year by Daniel Defoe
## Determining mirror for Project Gutenberg from http://www.gutenberg.org/robot/harvest
## Using mirror http://aleph.gutenberg.org
tidy_defoe <- defoe %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)
## Joining, by = "word"
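# Load the NRC lexicon once and reuse it for both filters
nrc_sentiments <- get_sentiments("nrc")

# Words the NRC lexicon associates with joy
nrc_joy <- nrc_sentiments %>%
  filter(sentiment == "joy")

# Words the NRC lexicon associates with fear
nrc_fear <- nrc_sentiments %>%
  filter(sentiment == "fear")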


(defoe_joy <- tidy_defoe %>%
  inner_join(nrc_joy) %>%
  count(word, sort = TRUE))
## Joining, by = "word"
## # A tibble: 174 x 2
##    word          n
##    <chr>     <int>
##  1 found       112
##  2 god          85
##  3 true         71
##  4 money        64
##  5 church       37
##  6 child        33
##  7 charity      32
##  8 abundance    27
##  9 pleased      25
## 10 mother       20
## # ... with 164 more rows
(defoe_fear <- tidy_defoe %>%
  inner_join(nrc_fear) %>%
  count(word, sort = TRUE))
## Joining, by = "word"
## # A tibble: 331 x 2
##    word          n
##    <chr>     <int>
##  1 plague      260
##  2 infection   162
##  3 god          85
##  4 dreadful     57
##  5 danger       54
##  6 death        52
##  7 die          42
##  8 fled         39
##  9 terrible     38
## 10 distress     35
## # ... with 321 more rows

Summary

Sentiment analysis provides a way to understand the attitudes and opinions expressed in texts. For example, A Journal of the Plague Year was searched for words associated with fear and joy. Clearly, “plague” and “infection” are fear-related words. However, some words carry more than one association: “God” appears in both lists with the same count (85), because the lexicon links it to both joy and fear.
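One way to surface such dual-association words is to join the joy and fear counts on `word`. The sketch below is a minimal illustration that re-enters only the top ten rows printed above, so it runs without downloading the book or the lexicon; the names `defoe_joy_top`, `defoe_fear_top`, and `shared_words` are introduced here for illustration only. With the full `defoe_joy` and `defoe_fear` tables the same join would apply, and it may return more shared words than this excerpt shows.

```r
library(dplyr)

# Top ten joy words, copied from the printed output above
defoe_joy_top <- tibble(
  word = c("found", "god", "true", "money", "church",
           "child", "charity", "abundance", "pleased", "mother"),
  n    = c(112L, 85L, 71L, 64L, 37L, 33L, 32L, 27L, 25L, 20L)
)

# Top ten fear words, copied from the printed output above
defoe_fear_top <- tibble(
  word = c("plague", "infection", "god", "dreadful", "danger",
           "death", "die", "fled", "terrible", "distress"),
  n    = c(260L, 162L, 85L, 57L, 54L, 52L, 42L, 39L, 38L, 35L)
)

# Words that appear in both lists, with one count column per sentiment
shared_words <- inner_join(
  defoe_joy_top, defoe_fear_top,
  by = "word", suffix = c("_joy", "_fear")
)

shared_words
```

Within these top-ten excerpts the only shared word is “god”.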
