library(dplyr)
library(tidytext)
library(tidyverse)
library(ggplot2)
library(widyr)
library(tidyr)
library(knitr)      # kable()
library(kableExtra) # kable_styling(), column_spec()
With the csv that was created in Blog 1, we can now do a sentiment analysis. But what is sentiment analysis, exactly? Sentiment analysis is the computational task of automatically determining what feelings a writer is expressing in a text. Sentiment can be framed in several ways: as a binary distinction (positive or negative), as a specific emotion, or as a numeric score. Three of the most commonly used sentiment lexicons are NRC, Bing, and AFINN. The NRC Word-Emotion Association Lexicon (also known as EmoLex) is a list of English words associated with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (positive and negative). Bing labels each word as positive or negative. AFINN gives each word an integer score from -5 (most negative) to 5 (most positive).
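To make the difference between the three lexicons concrete, here is a minimal sketch that joins a few words against toy stand-ins for each lexicon. The tibbles `toy_nrc`, `toy_bing`, and `toy_afinn` are assumptions made up for illustration (the AFINN-style scores in particular are invented); the real tables come from `get_sentiments()`. Only the column layout (`word` plus `sentiment`, or `word` plus `value`) is meant to mirror the real lexicons.

```r
library(dplyr)

# Toy stand-ins for the real lexicons (illustrative rows only)
toy_nrc <- tibble(
  word      = c("feeling", "feeling", "pain"),
  sentiment = c("fear", "joy", "fear")      # one word can carry several emotions
)
toy_bing <- tibble(
  word      = c("favorite", "angry"),
  sentiment = c("positive", "negative")     # strictly positive or negative
)
toy_afinn <- tibble(
  word  = c("favorite", "angry"),
  value = c(2, -3)                          # made-up numeric scores
)

# A handful of tokenized words, one per row
words <- tibble(word = c("favorite", "feeling", "angry", "music"))

# inner_join() keeps only the words that appear in the lexicon
nrc_hits   <- inner_join(words, toy_nrc,   by = "word")
bing_hits  <- inner_join(words, toy_bing,  by = "word")
afinn_hits <- inner_join(words, toy_afinn, by = "word")

print(nrc_hits)
print(bing_hits)
print(afinn_hits)
```

Words that are missing from a lexicon ("music" in this toy example) simply drop out of the join, so each lexicon can end up scoring a different subset of the text.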
As a reminder, the csv was built from three articles, all about music. The first, “Feel the Rhythm: How Music Affects Movement,” was written by Chrissy Watson and published on June 21, 2018, on the STAR Center website. The second, “Understanding Why Music Moves Us,” was written by Maia Szalavitz and published on Dec. 24, 2012, in Time Magazine. The third, “The music moves us — but how?”, was written by Dan Falk and published on 08/03/2018.
Before any of this can be done, we first have to clean the data and get it ready for text analysis. The data is cleaned by tokenizing the text so that each word has its own row, removing words that are under 3 letters, and removing the stop words that are provided by tidytext.
## This method comes from DataCamp's sentiment analysis example
music_tidy <- music %>%
  unnest_tokens(word, Text) %>%   # one lowercase word per row
  filter(nchar(word) >= 3) %>%    # drop words shorter than three letters
  anti_join(stop_words)           # drop stop words provided by tidytext
## Joining, by = "word"
glimpse(music_tidy)
## Rows: 1,592
## Columns: 3
## $ X <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ Title <chr> "Feel the Rhythm: How Music Affects Movement - STAR Center", ...
## $ word <chr> "music", "moves", "hear", "time", "introduce", "music", "ther...
The result has 1,592 rows, one per remaining word, and 3 columns.
Here are the six most frequent words.
| word | n |
|---|---|
| music | 83 |
| beat | 17 |
| dance | 17 |
| movement | 15 |
| people | 14 |
| brain | 13 |
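A table like the one above can be produced by counting rows per word. The sketch below shows the idea on a toy stand-in for `music_tidy`; the tibble `toy_tidy` and its contents are assumptions for illustration, not the real data.

```r
library(dplyr)

# Toy stand-in for music_tidy: one row per word, X identifies the article
toy_tidy <- tibble(
  X    = c(1, 1, 1, 2, 2, 3),
  word = c("music", "beat", "music", "music", "dance", "beat")
)

top_words <- toy_tidy %>%
  count(word, sort = TRUE) %>%   # one row per distinct word, most frequent first
  head()                         # keep at most the first six rows

print(top_words)
```

On the real data, the same two steps applied to `music_tidy` would give the word frequencies across all three articles.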
The first lexicon we are going to use is NRC. Again, NRC tags words with eight emotions along with positive or negative. We start with the words tagged as fear.
# Keep only the NRC words tagged as fear
nrc_fear <- get_sentiments("nrc") %>%
  filter(sentiment == "fear")
# Count the fear words that appear in the articles
music_nrc <- music_tidy %>%
  inner_join(nrc_fear) %>%
  count(word, sort = TRUE)
## Joining, by = "word"
kable(head(music_nrc)) %>%
  kable_styling(full_width = FALSE) %>%
  # bold first column, 15em wide, pink background
  column_spec(1, bold = TRUE, border_right = FALSE, width = "15em", background = "pink")
| word | n |
|---|---|
| feeling | 5 |
| blues | 3 |
| disease | 3 |
| infant | 2 |
| pain | 2 |
| abuse | 1 |
# Same approach for the NRC words tagged as joy
nrc_joy <- get_sentiments("nrc") %>%
  filter(sentiment == "joy")
music_nrcpos <- music_tidy %>%
  inner_join(nrc_joy) %>%
  count(word, sort = TRUE)
## Joining, by = "word"
kable(head(music_nrcpos)) %>%
  kable_styling(full_width = FALSE) %>%
  # bold first column, 15em wide, pink background
  column_spec(1, bold = TRUE, border_right = FALSE, width = "15em", background = "pink")
| word | n |
|---|---|
| music | 83 |
| dance | 17 |
| musical | 6 |
| feeling | 5 |
| share | 5 |
| favorite | 4 |
# Bing: label each matching word as positive or negative and count it
music_counts <- music_tidy %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"
kable(head(music_counts)) %>%
  kable_styling(full_width = FALSE) %>%
  # bold first column, 15em wide, pink background
  column_spec(1, bold = TRUE, border_right = FALSE, width = "15em", background = "pink")
| word | sentiment | n |
|---|---|---|
| favorite | positive | 4 |
| advantage | positive | 3 |
| angry | negative | 3 |
| famously | positive | 3 |
| pretty | positive | 3 |
| strong | positive | 3 |
The writing in Article 1 is completely positive.
The writing in Article 2 is mostly positive, with some negative words.
The writing in Article 3 is completely positive.
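One way to check statements like these is to count positive and negative Bing words per article. The following is a minimal sketch under stated assumptions: `toy_tidy` and `toy_bing` are small made-up tibbles (the word labels match the Bing output shown earlier, but the per-article assignment is invented), and the `net` column is a hypothetical summary added for illustration.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for music_tidy: X identifies the article
toy_tidy <- tibble(
  X    = c(1, 1, 2, 2, 2, 3),
  word = c("favorite", "strong", "pretty", "angry", "favorite", "strong")
)

# Toy stand-in for get_sentiments("bing")
toy_bing <- tibble(
  word      = c("favorite", "strong", "pretty", "angry"),
  sentiment = c("positive", "positive", "positive", "negative")
)

by_article <- toy_tidy %>%
  inner_join(toy_bing, by = "word") %>%
  count(X, sentiment) %>%                        # words per article and label
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>%               # 0 where an article has no such words
  mutate(net = positive - negative)              # hypothetical net score

print(by_article)
```

An article with a `negative` count of 0 would be "completely positive" in this sense, while one with both columns above 0 would be mixed.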
# Total AFINN score per article. Note that sum(value * n() / sum(n()))
# would give the same number, because n() / sum(n()) is 1 within each group.
music_afinn <- music_tidy %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(X) %>%
  summarize(value = sum(value))
## `summarise()` ungrouping output (override with `.groups` argument)
music_afinn %>%
  mutate(X = reorder(X, value)) %>%   # order articles by score
  ggplot(aes(X, value, fill = value > 0)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  labs(x = "Article", y = "Total AFINN sentiment value")
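Whether a per-article score is a total or an average matters when articles differ in length. The sketch below compares the two on made-up data; `toy_scored` and its values are assumptions for illustration. It also checks that, inside a grouped `summarize()`, an expression of the form `sum(value * n() / sum(n()))` is the same as `sum(value)`, since `n()` is a single number per group.

```r
library(dplyr)

# Toy stand-in for words already joined to AFINN-style scores (made-up values)
toy_scored <- tibble(
  X     = c(1, 1, 1, 2, 2),
  value = c(2, 3, -1, 1, 1)
)

scores <- toy_scored %>%
  group_by(X) %>%
  summarize(
    weighted = sum(value * n() / sum(n())),  # n() / sum(n()) is 1 per group
    total    = sum(value),                   # grows with the number of scored words
    average  = mean(value),                  # score per scored word
    .groups  = "drop"
  )

print(scores)
```

A longer article can have a higher total simply because it contains more scored words, so the average is the fairer comparison when article lengths differ a lot.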
What does this tell us about the articles? Most of the sentiment-bearing words in the articles are positive, which suggests that the writers wanted their articles to convey a sense of joy. That fits the subject: the articles are about how music moves us, which is usually seen as something joyous. Article 3 seems to convey the most joy in music.