Sentiment Analysis

Packages needed for this blog

require(dplyr)
require(tidytext)
require(tidyverse)
require(ggplot2)
require(widyr)
require(tidyr)
require(kableExtra)

With the csv that was created in Blog 1, we can now do a sentiment analysis. But what is sentiment analysis exactly? It is a computational task that automatically determines what feelings a writer is expressing in their text. Sentiment can be framed in several ways: as a binary distinction (positive or negative) or as a specific emotion. Three commonly used sentiment lexicons are NRC, BING, and AFINN. The NRC Word-Emotion Association Lexicon (also known as EmoLex) is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (positive and negative). BING labels each word as either positive or negative. AFINN gives each word a numerical score from -5 (most negative) to 5 (most positive).
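
To get a feel for what a lexicon looks like, here is a minimal sketch that inspects BING, which ships with tidytext. In recent tidytext versions, AFINN and NRC are instead downloaded through the textdata package the first time get_sentiments() asks for them, so they are left out of this sketch.

```r
library(dplyr)
library(tidytext)

# BING is bundled with tidytext: one row per word, labeled positive or negative
bing <- get_sentiments("bing")
head(bing)

# How many words carry each label
bing %>% count(sentiment)
```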

Just a reminder of which articles were used to create the csv. All of the chosen articles involve music. The first article, “FEEL THE RHYTHM: HOW MUSIC AFFECTS MOVEMENT”, was written by Chrissy Watson and published on June 21, 2018, as a part of the Star Center website. The second article, “Understanding Why Music Moves Us”, was written by Maia Szalavitz and published on Dec. 24, 2012, in Time Magazine. The third article, “The music moves us — but how?”, was written by Dan Falk and published on 08/03/2018.

Cleaning up the data

Before any of this can be done, we first have to clean the data and get it ready for text analysis. The data gets cleaned by tokenizing the text so that each word has its own row, removing words that are under 3 letters, and removing the stop words that are provided by tidytext.

## This method was found on DataCamp in their example of sentiment analysis

music_tidy <- music %>%
  unnest_tokens(word, Text) %>%   # one row per word
  filter(nchar(word) >= 3) %>%    # keep words with three or more letters
  anti_join(stop_words)           # stop_words is provided by the tidytext package
## Joining, by = "word"
glimpse(music_tidy)
## Rows: 1,592
## Columns: 3
## $ X     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ Title <chr> "Feel the Rhythm: How Music Affects Movement - STAR Center", ...
## $ word  <chr> "music", "moves", "hear", "time", "introduce", "music", "ther...

By doing this you can see that the tidy data has 1,592 rows (one per remaining word) and 3 columns.
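
To see what each cleaning step does, here is a minimal sketch on a single made-up sentence. The toy data frame below is an assumption for illustration only; it copies the column names X, Title, and Text from the real data but none of its content.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for the `music` data frame (hypothetical content)
toy <- tibble(
  X = 1L,
  Title = "Toy article",
  Text = "Music moves us, and it is a joy to dance to the beat."
)

toy_tidy <- toy %>%
  unnest_tokens(word, Text) %>%        # lowercases, strips punctuation, one row per word
  filter(nchar(word) >= 3) %>%         # drops short words such as "us", "it", "to"
  anti_join(stop_words, by = "word")   # drops stop words such as "and", "the"

toy_tidy$word
```

The X and Title columns are carried along for every word, which is what later lets the sentiment scores be grouped by article.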

Here you can see the top six words.

word n
music 83
beat 17
dance 17
movement 15
people 14
brain 13
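
The code behind the table above is not shown in this post. A table like it could be produced with count() and head(), as in this minimal sketch. The toy_words data is made up so that the example runs on its own; with the real data you would start from music_tidy instead.

```r
library(dplyr)

# Toy stand-in for music_tidy (hypothetical words)
toy_words <- tibble(word = c("music", "beat", "music", "dance", "music", "beat"))

top_words <- toy_words %>%
  count(word, sort = TRUE) %>%  # one row per distinct word, most frequent first
  head(6)                       # keep at most the top six

top_words
```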

NRC

The first lexicon that we are going to use is NRC. Again, NRC tags words with eight emotions along with positive or negative sentiment.

Negative NRC

  • Using the negative NRC emotion fear
  • Notice that get_sentiments() from tidytext supplies the lexicon, and an inner_join keeps only the words in the articles that appear in it.
nrc_fear <- get_sentiments("nrc") %>% 
  filter(sentiment == "fear")

music_nrc <- music_tidy %>%
  inner_join(nrc_fear) %>%
  count(word, sort = TRUE)
## Joining, by = "word"
kable(head(music_nrc)) %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(1, bold = TRUE, border_right = FALSE, width = "15em", background = "pink")
word n
feeling 5
blues 3
disease 3
infant 2
pain 2
abuse 1

Positive NRC

  • Using the positive NRC emotion joy
nrc_joy <- get_sentiments("nrc") %>% 
  filter(sentiment == "joy")

music_nrcpos <- music_tidy %>%
  inner_join(nrc_joy) %>%
  count(word, sort = TRUE)
## Joining, by = "word"
kable(head(music_nrcpos)) %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(1, bold = TRUE, border_right = FALSE, width = "15em", background = "pink")
word n
music 83
dance 17
musical 6
feeling 5
share 5
favorite 4
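
Notice that "feeling" shows up in both the fear table and the joy table with the same count. That happens because a lexicon like NRC can list one word under several emotions, and each filter plus inner_join keeps the word whenever it carries the chosen tag. Here is a minimal sketch of that behavior. The toy_lexicon below is a hand-made stand-in, not the real NRC data, and toy_tidy is made up as well.

```r
library(dplyr)

# Hand-made stand-in for an NRC-style lexicon (hypothetical rows)
toy_lexicon <- tibble(
  word      = c("feeling", "pain", "feeling", "music", "dance"),
  sentiment = c("fear",    "fear", "joy",     "joy",   "joy")
)

# Toy stand-in for music_tidy
toy_tidy <- tibble(word = c("music", "feeling", "pain", "beat", "feeling", "music"))

# Count the words tagged with one emotion
count_emotion <- function(emotion) {
  toy_tidy %>%
    inner_join(filter(toy_lexicon, sentiment == emotion), by = "word") %>%
    count(word, sort = TRUE)
}

toy_fear <- count_emotion("fear")
toy_joy  <- count_emotion("joy")
toy_fear
toy_joy
```

Words that are missing from the lexicon, such as "beat" here, simply drop out of the join.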

BING

music_counts <- music_tidy %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"
kable(head(music_counts)) %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(1, bold = TRUE, border_right = FALSE, width = "15em", background = "pink")
word sentiment n
favorite positive 4
advantage positive 3
angry negative 3
famously positive 3
pretty positive 3
strong positive 3

Visualization of BING

A closer look at the articles to see if each is mostly positive

  • The writing in Article 1 is completely positive.

  • The writing in Article 2 is mostly positive but it has some negative.

  • The writing in Article 3 is completely positive.
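
The code for this comparison is not shown in this post. A per-article breakdown like the one described above could be built along these lines. This is only a sketch: the words come from the BING table earlier in this post, but their assignment to articles in toy_tidy is made up.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Toy stand-in for music_tidy (hypothetical article assignment)
toy_tidy <- tibble(
  X    = c(1L, 1L, 2L, 2L, 2L, 3L),
  word = c("favorite", "strong", "pretty", "angry", "advantage", "famously")
)

# Count positive and negative words within each article
bing_by_article <- toy_tidy %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(X, sentiment)

# Side-by-side bars: one group per article, one bar per sentiment
p <- ggplot(bing_by_article, aes(factor(X), n, fill = sentiment)) +
  geom_col(position = "dodge") +
  labs(x = "Article", y = "Word count")
p
```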

AFINN

music_afinn <- music_tidy %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(X) %>%
  # Total AFINN score per article (the sum of its word scores)
  summarize(value = sum(value))
## `summarise()` ungrouping output (override with `.groups` argument)
music_afinn %>%
  mutate(X = reorder(X, value)) %>%
  ggplot(aes(X, value, fill = value > 0)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  ylab("Total sentiment value")
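
One thing to keep in mind when reading this chart: inside a single group, n() / sum(n()) is always 1, so the expression sum(value * n() / sum(n())) works out to the total score per article, not the average. A longer article can therefore score higher simply because it contains more words. Here is a minimal sketch comparing the two measures. The toy_afinn scores and the toy_tidy rows are hand-made stand-ins for illustration, not the real AFINN data.

```r
library(dplyr)

# Hand-made stand-in for an AFINN-style lexicon (hypothetical scores)
toy_afinn <- tibble(word = c("joy", "love", "pain"), value = c(3, 3, -2))

# Toy stand-in for music_tidy: article 1 has two scored words, article 2 has three
toy_tidy <- tibble(
  X    = c(1L, 1L, 2L, 2L, 2L),
  word = c("joy", "pain", "joy", "love", "love")
)

toy_scores <- toy_tidy %>%
  inner_join(toy_afinn, by = "word") %>%
  group_by(X) %>%
  summarize(
    weighted = sum(value * n() / sum(n())),  # same expression as in the post
    total    = sum(value),                   # identical to `weighted`
    average  = mean(value)                   # score per matched word
  )

toy_scores
```

If the goal is to compare articles of different lengths, the average column is usually the fairer measure.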

What does this tell us about the articles? Since the majority of the articles score so positively, it suggests that the writers wanted their articles to convey a sense of joy. That fits the subject: the articles are about how music moves us, which is generally seen as something joyous. Article 3 seems to convey the most joy in music.