Sentiment Analysis

Authors

Desiree Thomas, Denise Atherley, Kiera Griffiths

Approach

Methodology:

For this assignment, we will reproduce the base sentiment analysis example from Chapter 2 of Text Mining with R, using the tidytext, dplyr, and stringr packages. The goal is to follow tidy text principles by applying functions such as unnest_tokens() and inner_join(). We will reproduce the sentiment trajectory of Jane Austen’s novels with the janeaustenr package, and we will use the gutenbergr package to obtain another work with a markedly different tone.

To measure how closely the lexicons (AFINN, Bing, and NRC, as in Chapter 2) agree, we will perform a comparative validation: we will calculate the correlation between the sentiment scores that different lexicons assign to the same text segments and identify where those scores diverge.

One data challenge we anticipate is lexicon sparsity. Each lexicon is finite, so many words in the corpus may not appear in it. We will therefore calculate the coverage rate (the share of tokens that match a lexicon entry) to determine whether the resulting sentiment scores are actually representative of the text.
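The planned steps (tokenize, join to a lexicon, score fixed-size segments, compare lexicons, and measure coverage) can be sketched in R as below. This is a minimal, self-contained sketch under stated assumptions, not the final analysis: the six-line `corpus` stands in for `austen_books()` or `gutenberg_download()` output, `lexicon_a` and `lexicon_b` are small hypothetical lexicons with numeric scores standing in for those returned by `get_sentiments()`, and the helpers `score_segments()` and `coverage()` are hypothetical names. The segment size is 2 lines to keep the toy example readable, whereas Chapter 2 uses 80-line chunks.

```r
library(dplyr)
library(tidytext)

# Toy corpus standing in for austen_books() / gutenberg_download() output:
# one row per line of text, with a line number.
corpus <- tibble(
  linenumber = 1:6,
  text = c("The happy bride smiled in the lovely garden",
           "A terrible storm ruined the happy day",
           "She felt sad and alone in the dark house",
           "Nobody came to visit her",
           "Good news brought joy and hope",
           "The garden was lovely again")
)

# Two HYPOTHETICAL lexicons with numeric scores. A Bing-style lexicon from
# get_sentiments("bing") would first need its positive/negative labels mapped to +1/-1.
lexicon_a <- tibble(
  word  = c("happy", "lovely", "terrible", "ruined", "sad", "alone", "good", "joy", "hope"),
  value = c(1, 1, -1, -1, -1, -1, 1, 1, 1)
)
lexicon_b <- tibble(
  word  = c("happy", "smiled", "terrible", "storm", "sad", "dark", "good", "joy"),
  value = c(3, 2, -3, -1, -2, -1, 3, 3)
)

chunk_size <- 2  # assumption for the toy data; Chapter 2 uses 80-line chunks

# Tokenize: one row per word (lowercased, punctuation stripped), tagged with its segment.
tokens <- corpus %>%
  unnest_tokens(word, text) %>%
  mutate(index = (linenumber - 1) %/% chunk_size)

# Hypothetical helper: net sentiment per segment, using only words found in the lexicon.
score_segments <- function(tokens, lexicon) {
  tokens %>%
    inner_join(lexicon, by = "word") %>%
    group_by(index) %>%
    summarise(score = sum(value), .groups = "drop")
}

# Hypothetical helper: coverage rate = share of tokens that have a lexicon entry.
coverage <- function(tokens, lexicon) {
  mean(tokens$word %in% lexicon$word)
}

# Align both lexicons' scores by segment; a segment with no match in one lexicon scores 0.
scores <- full_join(score_segments(tokens, lexicon_a),
                    score_segments(tokens, lexicon_b),
                    by = "index", suffix = c("_a", "_b")) %>%
  mutate(across(c(score_a, score_b), ~ coalesce(.x, 0))) %>%
  arrange(index)

# Pearson correlation between the two lexicons' segment scores.
r_ab <- cor(scores$score_a, scores$score_b)

coverage(tokens, lexicon_a)  # 0.275 (11 of 40 tokens matched)
coverage(tokens, lexicon_b)  # 0.225 (9 of 40 tokens matched)
scores
r_ab
```

The `full_join()` plus `coalesce()` step matters because `inner_join()` silently drops segments with no matching words, so without alignment the correlation could end up comparing different segments. Low coverage values like those above are the sparsity problem in miniature: each segment score rests on only a handful of matched words.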