The initial goal is to recreate the original code from the analysis in Chapter 2 of Text Mining with R. I will outline the example using tidytext and its associated sentiment lexicons.
The primary example uses sentiment analysis to judge whether passages are mostly negative or positive, based on counts of the sentiment-bearing words they contain. The data is taken from chapters of classic novels.
To extend this example, I perform a similar analysis on Frederick Douglass’ “What to the Slave is the Fourth of July?”. It is a complex speech on a nuanced topic, which makes it a good challenge for this type of model. Finally, I will compare the results with the original example.
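The recreated Chapter 2 code is not shown here, so below is a sketch of its main steps. It follows the book's Jane Austen example (the janeaustenr package) with 80-line sections. One deliberate change: `by = "word"` and `relationship = "many-to-many"` are passed to `inner_join()`. A few words appear more than once in the Bing lexicon, so recent versions of dplyr (1.1.0 or later) otherwise warn about a many-to-many join.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(ggplot2)
library(janeaustenr)
library(tidytext)

# One word per row, keeping each word's line number and chapter within its book
tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                            ignore_case = TRUE)))
  ) %>%
  ungroup() %>%
  unnest_tokens(word, text)

# Net sentiment (positive minus negative Bing words) per 80-line section;
# relationship = "many-to-many" is stated explicitly because some Bing
# words are listed more than once
jane_austen_sentiment <- tidy_books %>%
  inner_join(get_sentiments("bing"), by = "word",
             relationship = "many-to-many") %>%
  count(book, index = linenumber %/% 80, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)

# One panel per novel, showing how sentiment shifts through each book
ggplot(jane_austen_sentiment, aes(x = index, y = sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, ncol = 2, scales = "free_x")
```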
Following the approach outlined above, I will perform a similar sentiment analysis based on the example from Text Mining with R, this time applied to an important speech, “What to the Slave is the Fourth of July?”. I will use an excerpt of the speech rather than the full text. The original runs several pages, and even though its sentiment is clear, the sheer volume of such a long text could cancel out the sentiment so that the net result comes out close to neutral.
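The code that builds `tidy_speech` (one word per row, with a `line` column) is not shown. Here is a minimal sketch under stated assumptions. It assumes the excerpt is held as a character vector with one element per line, which could, for example, be read from a hypothetical local file `douglass_excerpt.txt`. The helper name `tidy_lines` is also hypothetical. The three sample lines are a short passage from the speech, included only so that the sketch runs.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: number each line of the excerpt, then split it into
# one lowercase word per row while keeping the line number
tidy_lines <- function(lines) {
  tibble(line = seq_along(lines), text = lines) %>%
    unnest_tokens(word, text)
}

# In practice the full excerpt might be read from a file, e.g.
# speech_lines <- readLines("douglass_excerpt.txt")  # hypothetical path
speech_lines <- c(
  "What, to the American slave, is your 4th of July?",
  "I answer: a day that reveals to him, more than all other days in the year,",
  "the gross injustice and cruelty to which he is the constant victim."
)

tidy_speech <- tidy_lines(speech_lines)
tidy_speech
```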
Douglass delivered the speech for the Fourth of July in 1852. Given the political climate, he was very frustrated (to say the least), so I will look at the words in the speech associated with anger.
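Chapter 2 filters the NRC lexicon down to a single emotion ("joy") and counts the matching words. The sketch below applies the same idea to anger. The function name `count_emotion_words` is hypothetical. The lexicon is passed in as an argument because NRC is fetched through the textdata package, which asks for download confirmation the first time `get_sentiments("nrc")` is called.

```r
library(dplyr)

# Hypothetical helper: keep only the lexicon words tagged with one emotion,
# then count how often each of them occurs in the tokenized text
count_emotion_words <- function(tidy_words, lexicon, emotion = "anger") {
  emotion_words <- lexicon %>%
    filter(sentiment == .env$emotion)

  tidy_words %>%
    inner_join(emotion_words, by = "word") %>%
    count(word, sort = TRUE)
}

# Intended usage (NRC requires the textdata package and a one-time download):
# library(tidytext)
# count_emotion_words(tidy_speech, get_sentiments("nrc"), emotion = "anger")
```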
Many words in the speech are associated with anger. The most common are "slave" and "slavery", which are each mentioned multiple times. The words "abuse", "fury", and "horrible" also appear.
Sentiment using the Bing lexicon
The simplest approach is to follow the outline established in Chapter 2 and apply it to our own data. Some significant changes are needed, however, because this text is much smaller than a book. Chapter 2 groups every 80 lines into one section, but the speech excerpt has only 63 lines in total. Dividing the lines into sections of 5 should produce enough columns to show how the sentiment changes.
```r
library(dplyr)
library(tidyr)
library(tidytext)

# Net Bing sentiment (positive minus negative words) for each 5-line section
douglass_sentiment <- tidy_speech %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  group_by(index = line %/% 5) %>%
  count(sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative) %>%
  ungroup()
```
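Here is a quick check of the binning arithmetic, assuming the `line` column runs from 1 to 63. Integer division by 5 (`%/%`) produces 13 index values, 0 through 12.

```r
# Line numbers of the 63-line excerpt
line_numbers <- 1:63

# Integer division maps lines 1-4 to bin 0, 5-9 to bin 1, ..., 60-63 to bin 12
bins <- line_numbers %/% 5

# 13 bins: the first and last hold 4 lines each, the other 11 hold 5
table(bins)
length(unique(bins))
```

`inner_join()` drops words that are not in the lexicon. A section that contains no Bing words therefore produces no row at all, so the plotted index can have gaps.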
Next, we recreate the plot from the example.
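```r
library(ggplot2)

# Column chart of net Bing sentiment per 5-line section of the speech
ggplot(douglass_sentiment, aes(x = index, y = sentiment, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Sentiment - Bing", y = "Sentiment Value", x = "Index")
```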
The plot shows that the sentiment of Frederick Douglass’ speech starts out fairly light and becomes markedly negative as the speech goes on. The latter half is entirely negative, as Douglass calls out the injustice he witnessed and sets out to leave a mark on his American audience.
With that accomplished, we can look at the other lexicons. The original example displayed multiple books. Because we are working with only one source, we can instead use it to compare different sentiment lexicons.
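Chapter 2 compares the AFINN, Bing, and NRC lexicons on one text. It sums AFINN's integer scores and, for Bing and NRC, takes positive minus negative word counts (with NRC restricted to its "positive" and "negative" labels). Below is a sketch of that comparison using the same 5-line sections as above. The function names `compare_lexicons` and `plot_lexicon_comparison` are hypothetical. The lexicons are passed in as arguments because AFINN and NRC are downloaded through the textdata package.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: net sentiment per bin of `lines_per_bin` lines,
# computed with three lexicons so they can be compared side by side
compare_lexicons <- function(tidy_words, afinn, bing, nrc, lines_per_bin = 5) {
  # AFINN gives each word an integer score, so sum the scores in each bin
  afinn_scores <- tidy_words %>%
    inner_join(afinn, by = "word") %>%
    group_by(index = line %/% lines_per_bin) %>%
    summarise(sentiment = sum(value), .groups = "drop") %>%
    mutate(method = "AFINN")

  # Bing and NRC label words positive/negative, so count each label per bin
  bing_and_nrc <- bind_rows(
    tidy_words %>%
      inner_join(bing, by = "word", relationship = "many-to-many") %>%
      mutate(method = "Bing et al."),
    tidy_words %>%
      inner_join(nrc %>% filter(sentiment %in% c("positive", "negative")),
                 by = "word", relationship = "many-to-many") %>%
      mutate(method = "NRC")
  ) %>%
    count(method, index = line %/% lines_per_bin, sentiment) %>%
    pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
    mutate(sentiment = positive - negative)

  bind_rows(afinn_scores, bing_and_nrc)
}

# One panel per lexicon, each with its own y scale
plot_lexicon_comparison <- function(comparison) {
  ggplot(comparison, aes(x = index, y = sentiment, fill = method)) +
    geom_col(show.legend = FALSE) +
    facet_wrap(~method, ncol = 1, scales = "free_y")
}
```

With the real lexicons, this would be called as `compare_lexicons(tidy_speech, get_sentiments("afinn"), get_sentiments("bing"), get_sentiments("nrc"))`, and the result passed to `plot_lexicon_comparison()`.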
Comparing the lexicons side by side shows large differences in how each one scores the passage. Having read the passage, I believe the Bing lexicon gives the most accurate picture of its sentiment. The passage opens with praise for the hardworking nature of American citizens. It then takes a drastic, dark turn to describe the cruelty of slavery and the American dream as seen through the eyes of a slave, and to criticize the Fugitive Slave Law.
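Chapter 2 of Text Mining with R traces part of the gap between lexicons to the balance of positive and negative words each one contains. Below is a minimal sketch of that check. It runs offline for Bing, which ships with tidytext. The NRC version is left commented out because it needs the textdata download.

```r
library(dplyr)
library(tidytext)

# Number of positive and negative words in the Bing lexicon
bing_counts <- get_sentiments("bing") %>%
  count(sentiment)
bing_counts

# Same check for NRC (requires the textdata package and a one-time download):
# get_sentiments("nrc") %>%
#   filter(sentiment %in% c("positive", "negative")) %>%
#   count(sentiment)
```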