1. Sentence-level Sentiment Analysis

The sentimentr package provides the tools we need for sentence-level sentiment analysis. We also need the lexicon package, which supplies several of the dictionaries that sentimentr uses.

library(sentimentr)
library(lexicon)
library(tidyverse)
library(tidytext)
library(textdata)

To illustrate how to do sentence-level sentiment analysis, let’s revisit one of our previous examples.

statement <- c("I have a few thoughts about the MSBA Chicago program.", 
               "I dislike programming in general.",
               "However, I really do like R.",
               "I do kind of love machine learning.")

corpus <- tibble(document = 1:4, text = statement)
corpus

Before we get the sentiment for each sentence, we first use the get_sentences() function to do sentence boundary disambiguation. We then pass each sentence to the sentiment() function.

corpus %>%
  get_sentences() %>%
  sentiment()

Note that the element_id refers to each document and sentence_id refers to each sentence in each document. In this example, each document has only one sentence.
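
To make the difference between element_id and sentence_id easier to see, here is a minimal sketch that uses a hypothetical two-document input (the docs vector is made up for illustration and is not part of the corpus above). The second document contains two sentences, so it should produce two rows that share one element_id.

```r
library(sentimentr)

# Hypothetical input: the second document holds two sentences.
docs <- c("I like R.",
          "I dislike bugs. However, I love fixing them.")

# Split each document into sentences, then score every sentence.
scores <- sentiment(get_sentences(docs))

# One row per sentence: element_id identifies the document,
# sentence_id identifies the sentence within that document.
scores
```

In our four-statement corpus, by contrast, every document holds a single sentence, so sentence_id is always 1.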

Let’s compare the sentence-level scores for our four statements with those from our previous approach (word-level sentiment analysis).

corpus %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(document, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative) %>%
  right_join(corpus, by = "document") %>%
  select(document, text, sentiment) %>%
  arrange(document)

Question: What do the results tell you about the two approaches?

2. Impact of Valence Shifters

To illustrate the impact of valence shifters, let’s take a look at another simple example.

statement <- c("I love apple pie.", 
               "I don't love apple pie.",
               "I really really love apple pie!!",
               "I hate apple pie.",
               "I don't hate apple pie",
               "I would love to eat some apple pie but I dislike it.")

corpus <- tibble(document = 1:6, text = statement)
corpus

Let’s see how sentence-level sentiment analysis handles these sentences.

corpus %>%
  get_sentences() %>%
  sentiment()

Question: Are the sentiment scores what you expect?
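
One way to reason about these scores is to check which words are listed as valence shifters. The sketch below looks up three words from the apple pie sentences in the hash_valence_shifters table from the lexicon package. It assumes that this is the table sentimentr consults by default and that the y codes follow the lexicon documentation (1 = negator, 2 = amplifier, 3 = de-amplifier, 4 = adversative conjunction).

```r
library(lexicon)

# Valence shifter table shipped with lexicon: x is the word, y is its type code.
# Assumed coding: 1 = negator, 2 = amplifier, 3 = de-amplifier,
# 4 = adversative conjunction.
shifters <- as.data.frame(hash_valence_shifters)

# Look up candidate shifters that appear in the apple pie sentences.
shifters[shifters$x %in% c("don't", "really", "but"), ]
```

Words that show up here can flip, strengthen, or weaken the polarity of nearby sentiment words, which is something a plain word count cannot capture.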

3. Sentiment Lexicons

Similar to word-level sentiment analysis with the tidytext package, the sentimentr package supports the use of different sentiment lexicons. These lexicons are provided by the aptly named lexicon package. The default lexicon used by sentimentr is the hash_sentiment_jockers_rinker dictionary.

hash_sentiment_jockers_rinker

To use a different dictionary for our analysis, we set the polarity_dt argument in the sentiment() function to the dictionary we want to use.

corpus %>%
  get_sentences() %>%
  sentiment(polarity_dt = lexicon::hash_sentiment_huliu)

Notice that we get different sentiment scores with this dictionary. This is expected, because each dictionary covers a different set of words and assigns its own polarity values. To get a comprehensive list of available dictionaries, see the vignette for the lexicon package.
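
To see where such differences come from, it can help to look up the same words in both dictionaries. The following is a small sketch; the choice of words is arbitrary, and a word that is missing from one dictionary will appear as NA in that column.

```r
library(lexicon)

# Arbitrary words to compare across the two dictionaries.
words <- c("love", "hate", "dislike")

jockers_rinker <- as.data.frame(hash_sentiment_jockers_rinker)
huliu <- as.data.frame(hash_sentiment_huliu)

# Join the two lookups on the token column x, keeping words found in either one.
comparison <- merge(
  jockers_rinker[jockers_rinker$x %in% words, ],
  huliu[huliu$x %in% words, ],
  by = "x", all = TRUE,
  suffixes = c("_jockers_rinker", "_huliu")
)
comparison
```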

4. Custom Lexicons

The sentiment dictionaries in the lexicon package conform to the same general format: x for tokens and y for sentiment values. So, to create a custom dictionary, we need to adhere to this same format. For example, let’s assume that we want to add a term to the hash_sentiment_jockers_rinker dictionary. Specifically, we want to add the phrase “apple pie” to the dictionary, with a polarity of +2.

To do this, we first need to make a copy of the dictionary.

my_lexicon <- hash_sentiment_jockers_rinker

Then we use the update_key() function to add our new entry.

my_lexicon <- update_key(my_lexicon, x = data.frame(x = "apple pie", y = 2))

To verify that the update worked, let’s look up the new entry in the dictionary.

my_lexicon %>%
  filter(x == "apple pie")

Now, we can use our custom dictionary.

corpus %>%
  get_sentences() %>%
  sentiment(polarity_dt = my_lexicon)

To remove a token from a dictionary, we set the drop argument of the update_key() function to the token we want to remove.

my_lexicon <- update_key(my_lexicon, drop = "apple pie")

To verify that the removal worked, let’s look up the entry again. This time the result should be empty.

my_lexicon %>%
  filter(x == "apple pie")
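
Because every dictionary follows the same x/y format, it should also be possible to build a small dictionary from scratch rather than editing an existing one. The sketch below uses the as_key() function from sentimentr for this. The terms and polarity values are hypothetical and chosen only for illustration, as is the example sentence.

```r
library(sentimentr)

# Hypothetical domain-specific terms and polarities (illustration only).
custom_terms <- data.frame(
  words = c("apple pie", "soggy", "flaky"),
  polarity = c(2, -1, 1),
  stringsAsFactors = FALSE
)

# as_key() converts a two-column data frame into a keyed table
# with the columns x (token) and y (polarity).
pie_lexicon <- as_key(custom_terms)

# Score a made-up sentence using only the custom dictionary.
pie_score <- sentiment(
  get_sentences("The apple pie had a flaky crust."),
  polarity_dt = pie_lexicon
)
pie_score
```

A dictionary this small ignores every word it does not list, so in practice it is usually safer to extend an existing dictionary with update_key(), as we did above.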

5. Aggregate Sentiment

To illustrate how aggregate sentence-level sentiment analysis works, let’s once again get the complete works of Jane Austen.

library(janeaustenr)
austen_sentence_corpus <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(), chapter = cumsum(str_detect(
    text, regex("^chapter [\\divxlc]", ignore_case = TRUE)
  ))) %>%
  ungroup() 
austen_sentence_corpus

Question: Note that we are treating each line as a sentence. What is the limitation of this approach?

To get the sentence-level sentiment by book and chapter, we use the sentiment_by() function and set the aggregation levels to book and chapter.

austen_book_chapter_sentiment <- austen_sentence_corpus %>%
  get_sentences() %>%
  sentiment_by(by = c("book", "chapter"))
austen_book_chapter_sentiment
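
To see what sentiment_by() does on a scale that is easy to inspect, here is a minimal sketch with a hypothetical three-row data frame (the reviews data and the author column are made up for illustration). Each group should yield one row with a word count, a standard deviation, and an average sentiment. Note that, by default, the average is not a plain mean: sentimentr applies an averaging function that down-weights sentences scored as zero.

```r
library(sentimentr)

# Hypothetical reviews: author "a" wrote two lines, author "b" wrote one.
reviews <- data.frame(
  author = c("a", "a", "b"),
  text = c("I love this book.",
           "The ending was awful.",
           "A wonderful, charming story."),
  stringsAsFactors = FALSE
)

# Split the text column into sentences, then aggregate sentiment per author.
by_author <- sentiment_by(get_sentences(reviews), by = "author")
by_author
```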

Now we can create a chart of sentence-level sentiment across the chapters of the books.

austen_book_chapter_sentiment %>%
  ggplot(mapping = aes(x = chapter, y = ave_sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ book, ncol = 2, scales = "free_x") + 
  labs(title = "Sentence-level Sentiment Analysis") +
  theme_minimal()

Let’s recreate the word-level sentiment charts for comparison.

austen_word_corpus <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(), chapter = cumsum(str_detect(
    text, regex("^chapter [\\divxlc]", ignore_case = TRUE)
  ))) %>%
  ungroup() %>%
  unnest_tokens(word, text)
austen_word_corpus %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(book, chapter, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative) %>%
  ggplot(mapping = aes(x = chapter, y = sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ book, ncol = 2, scales = "free_x") + 
  labs(title = "Word-level Sentiment Analysis") +
  theme_minimal()

Question: What can you infer from the similarities and/or differences between these two results?