Blog 2

Sentiment Analysis

Packages needed for this blog

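library(dplyr)      # inner_join(), anti_join(), count(), summarize()
library(tidytext)   # unnest_tokens(), stop_words, get_sentiments()
library(tidyverse)
library(ggplot2)    # plotting
library(widyr)
library(tidyr)
library(knitr)      # kable()
library(kableExtra) # kable_styling(), column_spec()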

With the csv that was created in Blog 1, we can now do a sentiment analysis. But what is sentiment analysis exactly? Sentiment analysis is the computational task of automatically determining what feelings a writer is expressing in a text. Sentiment can be framed in many ways: as a binary distinction between positive and negative, or as a set of specific emotions.

Three of the most widely used sentiment lexicons (lists of words paired with sentiment labels or scores) are NRC, BING, and AFINN. The NRC Word-Emotion Association Lexicon (aka EmoLex) is a list of English words associated with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (positive and negative). BING labels each word as either positive or negative. AFINN gives each word a numerical score from -5 (most negative) to +5 (most positive).
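To make the difference between the three lexicons concrete, here is a small sketch of the table layout each one has in tidytext: NRC and BING have `word` and `sentiment` columns, while AFINN has `word` and `value`. The words and scores below are hypothetical toy entries, not taken from the real lexicons, and plain data frames are used so the sketch runs without downloading anything.

```r
# Hypothetical toy entries laid out like the tidytext lexicons;
# these words and scores are NOT taken from the real NRC, BING or AFINN lists.

# NRC layout: one row per (word, category) pair, so a word can have several rows
nrc_toy <- data.frame(
  word      = c("joyful", "joyful", "scream", "scream"),
  sentiment = c("joy", "positive", "fear", "negative"),
  stringsAsFactors = FALSE
)

# BING layout: a single "positive" or "negative" label per word
bing_toy <- data.frame(
  word      = c("joyful", "scream"),
  sentiment = c("positive", "negative"),
  stringsAsFactors = FALSE
)

# AFINN layout: an integer score from -5 to +5 per word
afinn_toy <- data.frame(
  word  = c("joyful", "scream"),
  value = c(3L, -2L),
  stringsAsFactors = FALSE
)

# Look up the same word in each lexicon
nrc_toy$sentiment[nrc_toy$word == "scream"]    # "fear" "negative"
bing_toy$sentiment[bing_toy$word == "scream"]  # "negative"
afinn_toy$value[afinn_toy$word == "scream"]    # -2
```

The same word can therefore contribute an emotion (NRC), a polarity (BING), or a graded score (AFINN), depending on which lexicon it is joined against.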

As a reminder, here are the articles that were used to create the csv; all of them are about music. The first article, “FEEL THE RHYTHM: HOW MUSIC AFFECTS MOVEMENT”, was written by Chrissy Watson, published on June 21, 2018, and is part of the STAR Center website. The second article, “Understanding Why Music Moves Us”, was written by Maia Szalavitz and published in Time Magazine on Dec. 24, 2012. The third article, “The music moves us — but how?”, was written by Dan Falk and published on 08/03/2018.

Cleaning up the data

Before any of this can be done, we first have to clean the data and get it ready for text analysis. The data is cleaned by tokenizing the text so that each word gets its own row, removing words shorter than three letters, and removing the stop words provided by tidytext.
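The three cleaning steps can be sketched in base R on a made-up sentence. This only approximates what unnest_tokens() does (it also lowercases and strips punctuation, but its tokenizer handles many more cases), and `tiny_stop_words` is a hypothetical stand-in for tidytext's much larger stop_words table.

```r
text <- "The beat of the music moves us, and we all start to dance!"

# 1. Tokenize: lowercase, then split on anything that is not a letter or apostrophe
tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
tokens <- tokens[tokens != ""]

# 2. Drop words shorter than three letters
tokens <- tokens[nchar(tokens) >= 3]

# 3. Remove stop words (hypothetical three-word list)
tiny_stop_words <- c("the", "and", "all")
tokens <- tokens[!tokens %in% tiny_stop_words]

tokens  # "beat" "music" "moves" "start" "dance"
```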

## This method was found in DataCamp's example of sentiment analysis
## `music` is the data frame read from the Blog 1 csv (with columns including X, Title and Text)

music_tidy <- music %>%
  unnest_tokens(word, Text) %>%        # one row per word, lowercased, punctuation removed
  filter(nchar(word) >= 3) %>%         # drop words shorter than three letters
  anti_join(stop_words, by = "word")   # drop the stop words provided by tidytext

glimpse(music_tidy)

## Rows: 1,592
## Columns: 3
## $ X     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ Title <chr> "Feel the Rhythm: How Music Affects Movement - STAR Center", ...
## $ word  <chr> "music", "moves", "hear", "time", "introduce", "music", "ther...

After cleaning, music_tidy has 1,592 rows (one per remaining word) and 3 columns.

Here are the six most frequent words after cleaning.

word	n
music	83
beat	17
dance	17
movement	15
people	14
brain	13
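The code behind this table is not shown; with dplyr it would most likely be `music_tidy %>% count(word, sort = TRUE) %>% head()`. The same counting idea in base R, on a made-up token vector standing in for `music_tidy$word`, looks like this:

```r
# `words` is a made-up token vector standing in for music_tidy$word
words <- c("music", "beat", "music", "dance", "music", "beat")

# table() counts each distinct word; sort() puts the most frequent first
top_words <- sort(table(words), decreasing = TRUE)
head(top_words)
```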

NRC

The first lexicon we are going to use is NRC, which, as mentioned above, tags words with eight emotions along with positive and negative.

Negative NRC

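Here we use fear, one of NRC's negative emotions. Notice that the lexicon comes from tidytext's get_sentiments(), and that inner_join() keeps only the article words that appear in the fear list. In recent versions of tidytext, the NRC and AFINN lexicons are downloaded through the textdata package the first time they are requested.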

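# The NRC lexicon, filtered to the words tagged "fear"
nrc_fear <- get_sentiments("nrc") %>%
  filter(sentiment == "fear")

# Keep only article words that are in the fear list, then count them
music_nrc <- music_tidy %>%
  inner_join(nrc_fear, by = "word") %>%
  count(word, sort = TRUE)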

kable(head(music_nrc)) %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(1, bold = TRUE, border_right = FALSE,
              width = "15em", background = "pink")

word	n
feeling	5
blues	3
disease	3
infant	2
pain	2
abuse	1
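The inner join is what does the filtering here: only words that appear in both the article tokens and the fear list survive. A minimal base-R sketch with hypothetical toy tables, using merge(), whose default `all = FALSE` behaves as an inner join:

```r
# Hypothetical toy tables: article tokens and a small "fear" word list
tokens <- data.frame(word = c("music", "pain", "beat", "pain", "fear"),
                     stringsAsFactors = FALSE)
fear_words <- data.frame(word = c("pain", "fear", "abuse"),
                         stringsAsFactors = FALSE)

# merge() with its default all = FALSE keeps only rows whose word is in both tables
matched <- merge(tokens, fear_words, by = "word")

# Count the surviving words, most frequent first (like count(word, sort = TRUE))
fear_counts <- sort(table(matched$word), decreasing = TRUE)
fear_counts
```

Words that are only in the articles ("music", "beat") and words that are only in the lexicon ("abuse") both drop out.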

Positive NRC

Here we use joy, one of NRC's positive emotions.

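# The NRC lexicon, filtered to the words tagged "joy"
nrc_joy <- get_sentiments("nrc") %>%
  filter(sentiment == "joy")

# Keep only article words that are in the joy list, then count them
music_nrcpos <- music_tidy %>%
  inner_join(nrc_joy, by = "word") %>%
  count(word, sort = TRUE)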

kable(head(music_nrcpos)) %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(1, bold = TRUE, border_right = FALSE,
              width = "15em", background = "pink")

word	n
music	83
dance	17
musical	6
feeling	5
share	5
favorite	4
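Notice that feeling appears in both the fear table and the joy table: NRC can tag one word with several emotions, so the same occurrence is counted once under each. Likewise, music itself is tagged joy, so the topic word of all three articles dominates the joy counts. The sketch below joins made-up tokens against a hypothetical NRC-style excerpt and counts matches per emotion.

```r
# Hypothetical NRC-style excerpt (word, sentiment); not the real lexicon
nrc_toy <- data.frame(
  word      = c("feeling", "feeling", "pain", "favorite"),
  sentiment = c("fear", "joy", "fear", "joy"),
  stringsAsFactors = FALSE
)
# Made-up article tokens; "feeling" occurs twice
tokens <- data.frame(word = c("feeling", "music", "pain", "favorite", "feeling"),
                     stringsAsFactors = FALSE)

# Joining against the full lexicon gives one row per (occurrence, emotion) pair,
# so each "feeling" occurrence shows up once under fear and once under joy
matched <- merge(tokens, nrc_toy, by = "word")
table(matched$sentiment)
```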

BING

Next we look at BING, which, as mentioned above, labels each word as either positive or negative.

# Attach a BING label to every matching article word, then count word/label pairs
music_counts <- music_tidy %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE)

kable(head(music_counts)) %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(1, bold = TRUE, border_right = FALSE,
              width = "15em", background = "pink")

word	sentiment	n
favorite	positive	4
advantage	positive	3
angry	negative	3
famously	positive	3
pretty	positive	3
strong	positive	3

Five of the six most frequent BING words are positive; the only negative one is angry.

Visualization of BING

A closer look at each article shows whether its writing is mostly positive:

The BING words in Article 1 are all positive.
The BING words in Article 2 are mostly positive, but some are negative.
The BING words in Article 3 are all positive.
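As a sketch of the per-article summary behind these statements, the code below cross-tabulates BING labels by article and computes the share of positive words. `bing_matched` is hypothetical toy data standing in for `music_tidy` inner-joined with `get_sentiments("bing")`; in the tidyverse, `count(X, sentiment)` followed by geom_col() would be the usual route to a chart.

```r
# Hypothetical toy data: one row per matched word, with its article id (X)
bing_matched <- data.frame(
  X         = c(1, 1, 2, 2, 2, 3),
  sentiment = c("positive", "positive", "positive", "negative", "positive", "positive"),
  stringsAsFactors = FALSE
)

# Cross-tabulate article (rows) by sentiment label (columns)
by_article <- table(bing_matched$X, bing_matched$sentiment)

# Share of each article's BING words that are positive
positive_share <- by_article[, "positive"] / rowSums(by_article)
round(positive_share, 2)
```

A share of 1 corresponds to an article whose BING words are all positive.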

AFINN

Here we can see the total AFINN sentiment score of each article, that is, the sum of the AFINN values of all its words that appear in the lexicon.

music_afinn <- music_tidy %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(X) %>%
  # Total AFINN score per article; mean(value) would give a per-word average instead
  summarize(value = sum(value), .groups = "drop")

music_afinn %>%
  mutate(X = reorder(X, value)) %>%   # order the articles by score
  ggplot(aes(X, value, fill = value > 0)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  labs(x = "Article", y = "Total sentiment value")

Article 3 has the highest total sentiment score.
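A weighted average is often written as `sum(value * n) / sum(n)`, where `n` is a column of word counts. Inside summarize(), `n()` is different: it returns the number of rows in the group, and `sum(n())` is that same number, so `sum(value * n() / sum(n()))` collapses to `sum(value)`, a total rather than an average. The check below uses made-up scores for one group, with `k` playing the role of `n()`. Because longer articles match more words, `mean(value)` is the better choice when a per-word average is wanted.

```r
value <- c(3, -2, 2, 1)  # hypothetical AFINN scores of one article's matched words
k <- length(value)       # what n() returns for this group

original_expr <- sum(value * k / sum(k))  # k / sum(k) is 1, so this is sum(value)
total         <- sum(value)
average       <- mean(value)

c(original = original_expr, total = total, average = average)  # 4, 4, 1
```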

What does this tell us about the articles? Because most of the articles score strongly positive, the writers appear to have wanted their articles to convey a sense of joy. Since the articles are about how music moves us, music comes across as something joyous, and Article 3 seems to convey the most joy in music.