Introduction
In this assignment we look at different books and use sentiment lexicons to analyze the writing in them. Each sentiment lexicon labels or scores words in its own way, which is why the lexicons differ from one another. We start from the example in Text Mining with R (Julia Silge and David Robinson, n.d.) and expand on it with different texts and sentiment lexicons.
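To make the difference between lexicons concrete, here is a minimal sketch in base R. The two lexicons and the sample sentence are made up for illustration and are not entries from any real lexicon; one assigns a category label to each word, the other a numeric score.

```r
# Two tiny, hypothetical lexicons (illustrative only, not real lexicon data)
label_lexicon <- data.frame(
  word = c("good", "happy", "bad", "terrible"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)
score_lexicon <- data.frame(
  word = c("good", "happy", "bad", "terrible"),
  value = c(2, 3, -2, -4),
  stringsAsFactors = FALSE
)

# A hypothetical, already tokenized sentence
tokens <- data.frame(
  word = c("a", "good", "day", "after", "a", "terrible", "week"),
  stringsAsFactors = FALSE
)

# Label view: count how many matched words fall under each label
label_hits <- merge(tokens, label_lexicon, by = "word")
label_counts <- table(label_hits$sentiment)

# Score view: add up the scores of the matched words
score_hits <- merge(tokens, score_lexicon, by = "word")
total_score <- sum(score_hits$value)

print(label_counts)
print(total_score)
```

With the label lexicon the sentence looks balanced (one positive and one negative word), while the score lexicon rates it as negative overall because "terrible" carries more weight than "good".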
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Using our own book and sentiment lexicon
In our own example we will use a sentiment lexicon called Loughran-McDonald (EmilHvitfeldt, n.d.). The Loughran-McDonald lexicon is designed for financial documents and assigns words to one of six sentiments: negative, positive, litigious, uncertainty, constraining, or superfluous. I will apply this lexicon to the lead paragraphs of financial articles from the New York Times, retrieved through its REST API.
library(httr)
library(dotenv)
library(jsonlite)
api = Sys.getenv('API_KEY')
base = 'https://api.nytimes.com'
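# Article Search filters go inside `fq` using Lucene-style field:("value") clauses
fq = 'news_desk:("Financial") AND type_of_material:("Article") AND section_name:("Business")'
response = GET(base, path = '/svc/search/v2/articlesearch.json', query = list('api-key' = api, 'fq' = fq))
# Stop early on HTTP errors such as a missing API key or rate limiting
stop_for_status(response)
# Supplying the encoding avoids the "No encoding supplied" message
response_text = content(response, 'text', encoding = 'UTF-8')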
json_file = fromJSON(response_text)
df = json_file$response$docs$lead_paragraph
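The filter string passed as `fq` can also be assembled from a named list rather than typed by hand. The sketch below assumes the Article Search API accepts Lucene-style clauses of the form `field:("value")` joined with `AND`; `build_fq` is a hypothetical helper written for this example and is not part of httr or the NYT API.

```r
# Hypothetical helper: turn a named list of filters into one fq string.
# Assumes Lucene-style syntax, field:("value"), with clauses joined by AND.
build_fq <- function(filters) {
  clauses <- sprintf('%s:("%s")', names(filters), unname(unlist(filters)))
  paste(clauses, collapse = " AND ")
}

fq <- build_fq(list(
  news_desk = "Financial",
  type_of_material = "Article",
  section_name = "Business"
))

print(fq)
```

The resulting string can be passed as the `fq` entry of the `query` list in `GET()`, which takes care of URL encoding.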
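library(textdata)
# Loughran-McDonald lexicon with the columns `word` and `sentiment`
sentiment = lexicon_loughran()
hold = data.frame()
for (x in df)
{
  # Lower-case first so capitalized words can match the lexicon,
  # then split on whitespace and before punctuation (apostrophes excepted)
  text = strsplit(tolower(x), "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE)[[1]]
  hold2 = data.frame(word = text, stringsAsFactors = FALSE)
  hold = rbind(hold, hold2)
}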
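The loop above grows the data frame one paragraph at a time. As an alternative, the sketch below tokenizes all paragraphs in one vectorized call using base R only. The sample paragraphs are invented, `tokenize` is a hypothetical helper, and the splitting rule (any run of characters other than letters and apostrophes) is a simplification of the regular expression used above, so punctuation is dropped rather than kept as separate tokens.

```r
# Hypothetical sample paragraphs standing in for the lead paragraphs in `df`
paragraphs <- c(
  "Stocks fell sharply on Monday.",
  "Investors' doubts about the merger persist."
)

# Hypothetical helper: lower-case, split on anything that is not a letter
# or an apostrophe, and drop empty strings left over from the split
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z']+"))
  words[nzchar(words)]
}

tokens <- data.frame(word = tokenize(paragraphs), stringsAsFactors = FALSE)
print(tokens)
```

Because `strsplit` is vectorized over its input, no `rbind` inside a loop is needed, which matters once the number of paragraphs grows.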
hold %>%
  inner_join(sentiment) %>%
  count(word, sentiment, sort = TRUE)
Joining, by = "word"
Conclusion
Sentiment lexicons are a very helpful tool for letting computers make sense of the information in a document. Their drawback is that they require a lot of manual effort, since every word needs a label assigned to it. The more words a lexicon covers, the better the results one will get. This is why I would recommend the NRC lexicon, as it has 13,901 entries in its list.
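The effect of lexicon size can be illustrated with a small coverage calculation: the share of tokens in a text that a lexicon can label at all. This is a sketch in base R; the tokens and both word lists are invented for the example, and `coverage` is a hypothetical helper.

```r
# Hypothetical tokens from a financial sentence
tokens <- c("profits", "rose", "despite", "litigation", "risk", "and", "uncertainty")

# Two made-up lexicon word lists of different sizes
small_lexicon <- c("litigation")
large_lexicon <- c("litigation", "risk", "uncertainty", "profits")

# Hypothetical helper: fraction of tokens that appear in the lexicon
coverage <- function(tokens, lexicon_words) {
  mean(tokens %in% lexicon_words)
}

small_cov <- coverage(tokens, small_lexicon)
large_cov <- coverage(tokens, large_lexicon)

print(c(small = small_cov, large = large_cov))
```

Here the smaller list labels one of seven tokens and the larger list four of seven. Coverage only measures how many words get a label, not whether the labels fit the domain, so a larger general-purpose lexicon is not automatically better for financial text.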
EmilHvitfeldt. n.d. “Textdata.” GitHub. https://github.com/EmilHvitfeldt/textdata.
Julia Silge and David Robinson. n.d. “Sentiment Analysis with Tidy Data.” GitHub. https://www.tidytextmining.com/sentiment.html#sentiment.