Sentiment Analysis
In this assignment, use the primary example code in Text Mining with R, Chapter 2: Sentiment Analysis (Silge & Robinson, 2020) and extend the code in the following manner:
-Work with a different corpus of your choosing
-Incorporate at least 1 additional sentiment lexicon.
The NRC Word-Emotion Association Lexicon dataset is used in this code (Mohammad & Turney, 2013).
References:
Silge, J., & Robinson, D. (2020). Text mining with R: A tidy approach. O’Reilly Media, Inc. Retrieved from https://www.tidytextmining.com/
Mohammad, S. M., & Turney, P. D. (2013). Crowdsourcing a word–emotion association lexicon. Computational Intelligence, 29(3), 436–465. Retrieved from https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-8640.2012.00460.x
A Journal of the Plague Year
The corpus is A Journal of the Plague Year by Daniel Defoe, downloaded from Project Gutenberg. The NRC Word-Emotion Association Lexicon (Mohammad & Turney, 2013) supplies the sentiment labels.
The corpus is tokenized into single words, and stop words are removed.
The NRC lexicon is then used to identify tokens in this plague narrative that are associated with “fear” and “joy”.
The words associated with fear and joy are counted and ranked by frequency.
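# Load packages: gutenbergr downloads the text, tidytext tokenizes it and
# supplies the lexicons, dplyr provides %>% and the join/filter/count verbs
library(gutenbergr)
library(tidytext)
library(dplyr)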
## Warning: package 'gutenbergr' was built under R version 4.0.3
## Warning: package 'tidytext' was built under R version 4.0.3
defoe <- gutenberg_download(c(376)) # A Journal of the Plague Year by Daniel Defoe
## Determining mirror for Project Gutenberg from http://www.gutenberg.org/robot/harvest
## Using mirror http://aleph.gutenberg.org
tidy_defoe <- defoe %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)
## Joining, by = "word"
# Load the NRC lexicon once, then subset it by emotion
nrc_sentiments <- get_sentiments("nrc")
nrc_joy <- nrc_sentiments %>%
  filter(sentiment == "joy")
nrc_fear <- nrc_sentiments %>%
  filter(sentiment == "fear")
(defoe_joy <- tidy_defoe %>%
  inner_join(nrc_joy) %>%
  count(word, sort = TRUE))
## Joining, by = "word"
## # A tibble: 174 x 2
##    word          n
##    <chr>     <int>
##  1 found       112
##  2 god          85
##  3 true         71
##  4 money        64
##  5 church       37
##  6 child        33
##  7 charity      32
##  8 abundance    27
##  9 pleased      25
## 10 mother       20
## # ... with 164 more rows
(defoe_fear <- tidy_defoe %>%
  inner_join(nrc_fear) %>%
  count(word, sort = TRUE))
## Joining, by = "word"
## # A tibble: 331 x 2
##    word          n
##    <chr>     <int>
##  1 plague      260
##  2 infection   162
##  3 god          85
##  4 dreadful     57
##  5 danger       54
##  6 death        52
##  7 die          42
##  8 fled         39
##  9 terrible     38
## 10 distress     35
## # ... with 321 more rows
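The assignment also asks for a second lexicon. One possible cross-check, sketched here under stated assumptions, is to score some of the same words with the Bing lexicon, which ships with tidytext and labels words as positive or negative. The `defoe_counts` and `bing_totals` names are hypothetical, and the small table is a stand-in built from counts shown in the output above rather than from the full corpus, so the totals are illustrative only.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the full word counts: words and counts are
# copied from the top rows of the joy and fear output above
defoe_counts <- tibble(
  word = c("plague", "infection", "dreadful", "danger", "death",
           "charity", "abundance", "pleased"),
  n    = c(260L, 162L, 57L, 54L, 52L, 32L, 27L, 25L)
)

# Keep only words that the Bing lexicon labels, then sum the
# occurrence counts within each label (positive / negative)
bing_totals <- defoe_counts %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment, wt = n, name = "total")

bing_totals
```

In a full analysis, `tidy_defoe %>% count(word)` would take the place of the stand-in table, so every token in the corpus would be scored.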
Summary
Sentiment analysis provides a way to understand the attitudes and opinions expressed in texts. For example, A Journal of the Plague Year was searched for words associated with fear and joy during a plague. Clearly, “plague” and “infection” are fear-related words. However, some words carry more than one association: “God” is tagged with both joy and fear, and its 85 occurrences are counted in both lists.
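The overlap between the two lists can be checked directly. The sketch below is a minimal illustration, not part of the original analysis: it rebuilds only the first few rows of the joy and fear tables from the output above (so it runs without downloading the corpus or the lexicon) and joins them on `word`. The `shared_words` name is hypothetical.

```r
library(dplyr)

# First rows of the joy and fear tables, copied from the output above
defoe_joy <- tibble(
  word = c("found", "god", "true", "money"),
  n    = c(112L, 85L, 71L, 64L)
)
defoe_fear <- tibble(
  word = c("plague", "infection", "god", "dreadful"),
  n    = c(260L, 162L, 85L, 57L)
)

# Words that appear under both emotions; the suffixes keep the two
# count columns distinguishable after the join
shared_words <- defoe_joy %>%
  inner_join(defoe_fear, by = "word", suffix = c("_joy", "_fear"))

shared_words
```

With these truncated tables the join returns a single row for "god". Run against the full `defoe_joy` and `defoe_fear` tables, the same join would list every word that NRC associates with both emotions.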