Hello! It is now week 10, and our task is sentiment analysis, which comes up in Chapter 2 of our book, “Text Mining with R”. The assignment is to reproduce and extend the example provided in that chapter.
Planned Workflow
I plan to take the primary example code from the chapter, cite it, and create a Quarto document that runs it. I will then extend the example in two ways: by choosing a different text corpus, downloaded with the gutenbergr package, and by adding one more sentiment lexicon, the one provided by the syuzhet package.
Anticipated Challenges
The main challenge I foresee is extending the example with my two choices, gutenbergr and syuzhet. Loading a text through gutenbergr and scoring it with the syuzhet lexicon will be the core of this assignment.
Source: Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach. O’Reilly Media. Chapter 2.
Warning in inner_join(., get_sentiments("bing")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 435434 of `x` matches multiple rows in `y`.
ℹ Row 5051 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
In conclusion, scoring Frankenstein with the syuzhet lexicon made the emotional weight of negative versus positive words visible in the plot. Summed over 100-word chunks, the total sentiment score drops sharply as the story progresses, which suggests that many passages read as dark and frightening. Taken as a whole, the plot shows a story that is predominantly dark, with a few lighter moments throughout.
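The chunked scoring described above can be sketched in a few lines of base R. This is a minimal illustration, not the exact code behind the plot: the `toy_lexicon` scores and the `words` vector are made up for the example, and the chunk size is reduced from 100 to 5 so the output stays readable. With the syuzhet package installed, `syuzhet::get_sentiment(words, method = "syuzhet")` could presumably take the place of the toy lookup.

```r
# Hypothetical weighted lexicon (illustrative values only, not syuzhet's).
toy_lexicon <- c(happy = 0.75, kind = 0.5, fear = -0.75, wretched = -1)

# Hypothetical word vector standing in for the tokenized novel.
words <- c("the", "kind", "stranger", "felt", "happy",
           "then", "fear", "made", "him", "wretched")

# The analysis above used 100-word chunks; 5 keeps this toy example small.
chunk_size <- 5

# Look up each word's score; words missing from the lexicon score 0.
scores <- unname(toy_lexicon[words])
scores[is.na(scores)] <- 0

# Assign every word to a chunk, then sum the weighted scores per chunk.
chunk_id <- (seq_along(words) - 1) %/% chunk_size + 1
chunk_sentiment <- tapply(scores, chunk_id, sum)

print(chunk_sentiment)
```

Because the scores are weighted, a single strongly negative word can pull a chunk down further than a mildly positive word lifts it.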
This approach differs from the Chapter 2 example I reproduced. The NRC and Bing lexicons used there are not weighted: each word is only labeled with a category, so a negative word counts as -1 and a positive word as +1. The chapter splits the text into sections of 80 lines, sums the positive and negative counts within each section, and plots the result.
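For comparison, here is a minimal sketch of the unweighted, count-based approach, assuming dplyr 1.1.0 or later. The `tokens` and `lexicon` tibbles are hypothetical stand-ins for the tidy book text and `get_sentiments("bing")`, and the section length is shortened from 80 lines to 3 for the toy data. The `relationship` argument is the one suggested by the join warning printed earlier; it is harmless here and may be needed with the real lexicon.

```r
library(dplyr)

# Hypothetical toy data: one row per word, with the line it came from.
tokens <- tibble(
  linenumber = c(1, 1, 2, 3, 4, 4, 5),
  word       = c("happy", "sad", "good", "bad", "awful", "gloomy", "good")
)

# Hypothetical categorical lexicon shaped like get_sentiments("bing").
lexicon <- tibble(
  word      = c("happy", "good", "sad", "bad", "awful", "gloomy"),
  sentiment = c("positive", "positive", "negative",
                "negative", "negative", "negative")
)

# The chapter uses sections of 80 lines; 3 suits this tiny example.
lines_per_section <- 3

section_sentiment <- tokens %>%
  # Declaring the relationship marks duplicate matches as expected.
  inner_join(lexicon, by = "word", relationship = "many-to-many") %>%
  mutate(index = linenumber %/% lines_per_section) %>%
  group_by(index) %>%
  summarise(
    positive      = sum(sentiment == "positive"),
    negative      = sum(sentiment == "negative"),
    net_sentiment = positive - negative
  )

print(section_sentiment)
```

Every matched word contributes exactly one count, so the net score reflects only how many positive and negative words a section contains, not how strong they are.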