INTRO

In this R Markdown file, I will explore the tidytext package and demonstrate how multiple sentiment lexicons can be used to infer whether a section of text conveys a positive or negative sentiment.

I’ll begin by configuring the R Markdown settings and loading the packages needed for the analysis. The example code from Chapter 2 of Text Mining with R: A Tidy Approach by Julia Silge and David Robinson is my foundation, and I then extend it. The book provides a practical framework for applying tidy principles to text analysis and is available online at https://www.tidytextmining.com (Silge & Robinson, 2017).

Configuring R Markdown settings

The document starts with a setup chunk that configures global options for all subsequent code chunks. Because the chunk uses “include=FALSE”, it does not appear in the rendered document (neither its code nor its output).

Inside that chunk, “knitr::opts_chunk$set(echo = TRUE)” makes every later chunk show its code (echo it) unless a chunk overrides the setting. This is useful for transparency and teaching.

The chunk below runs with “echo=TRUE”, so its code is displayed. It sets the CRAN mirror to https://cran.rstudio.com so that any package installations during the session use this repository.

options(repos = c(CRAN = "https://cran.rstudio.com"))

Install and load necessary packages

The following code installs any required package that is missing and then loads each one.

# Packages used throughout this analysis
req_packages <- c("DBI","RMySQL","dplyr","dbplyr","knitr","tidyr", "readr", "stringr","tibble", "rmarkdown", "purrr", "lubridate", "here", "httr2", "httr", "janitor", "RCurl","rvest","xml2","jsonlite","kableExtra", "tidytext","janeaustenr", "geniusr","sentimentr","syuzhet","ggplot2","ggwordcloud","wordcloud2","wordcloud","reshape2")
for (pkg in req_packages) {
  # require() attaches the package and returns FALSE (with a warning) if it is not installed
  if (!require(pkg, character.only = TRUE)) {
    message(paste("Installing package:", pkg))
    install.packages(pkg, dependencies = TRUE)
  } else {
    message(paste(pkg, " already installed."))
  }
  library(pkg, character.only = TRUE)  # attach; a no-op if require() already attached it
}
## Loading required package: DBI
## DBI  already installed.
## Loading required package: RMySQL
## RMySQL  already installed.
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## dplyr  already installed.
## Loading required package: dbplyr
## 
## Attaching package: 'dbplyr'
## The following objects are masked from 'package:dplyr':
## 
##     ident, sql
## dbplyr  already installed.
## Loading required package: knitr
## knitr  already installed.
## Loading required package: tidyr
## tidyr  already installed.
## Loading required package: readr
## readr  already installed.
## Loading required package: stringr
## stringr  already installed.
## Loading required package: tibble
## tibble  already installed.
## Loading required package: rmarkdown
## rmarkdown  already installed.
## Loading required package: purrr
## purrr  already installed.
## Loading required package: lubridate
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
## lubridate  already installed.
## Loading required package: here
## here() starts at /Users/paulabrown/Documents/CUNY SPS- Data 607/Week 10 Assignments
## here  already installed.
## Loading required package: httr2
## httr2  already installed.
## Loading required package: httr
## httr  already installed.
## Loading required package: janitor
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
## janitor  already installed.
## Loading required package: RCurl
## 
## Attaching package: 'RCurl'
## The following object is masked from 'package:tidyr':
## 
##     complete
## RCurl  already installed.
## Loading required package: rvest
## 
## Attaching package: 'rvest'
## The following object is masked from 'package:readr':
## 
##     guess_encoding
## rvest  already installed.
## Loading required package: xml2
## 
## Attaching package: 'xml2'
## The following object is masked from 'package:httr2':
## 
##     url_parse
## xml2  already installed.
## Loading required package: jsonlite
## 
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
## 
##     flatten
## jsonlite  already installed.
## Loading required package: kableExtra
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
## kableExtra  already installed.
## Loading required package: tidytext
## tidytext  already installed.
## Loading required package: janeaustenr
## janeaustenr  already installed.
## Loading required package: geniusr
## geniusr  already installed.
## Loading required package: sentimentr
## sentimentr  already installed.
## Loading required package: syuzhet
## 
## Attaching package: 'syuzhet'
## The following object is masked from 'package:sentimentr':
## 
##     get_sentences
## syuzhet  already installed.
## Loading required package: ggplot2
## ggplot2  already installed.
## Loading required package: ggwordcloud
## ggwordcloud  already installed.
## Loading required package: wordcloud2
## wordcloud2  already installed.
## Loading required package: wordcloud
## Loading required package: RColorBrewer
## wordcloud  already installed.
## Loading required package: reshape2
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
## reshape2  already installed.
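In the loop above, “require()” both checks for a package and attaches it, so the “already installed.” message really means the package was found and loaded. The sketch below only reports which packages are missing and attaches nothing. It uses “requireNamespace()”, which returns FALSE instead of raising an error. The helper name “find_missing_packages()” is hypothetical and does not come from any package.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
# requireNamespace(quietly = TRUE) returns FALSE instead of raising an error
# when a package is not installed, and it does not attach the package.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Example: base packages are always available; the third name is made up.
find_missing_packages(c("stats", "utils", "notARealPackage123"))
```

With this helper, you could run `missing <- find_missing_packages(req_packages)` and call `install.packages(missing)` only when `length(missing) > 0`, then load everything with “library()”.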

Example: Jane Austen Corpus

Following the approach from Chapter 2, we tokenize Jane Austen’s works:

tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(text, 
                                regex("^chapter [\\divxlc]", 
                                      ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)

kable(head(tidy_books,5), align = rep('l', ncol(tidy_books)))%>% #Preview first 5 rows of tidied book data
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
book linenumber chapter word
Sense & Sensibility 1 0 sense
Sense & Sensibility 1 0 and
Sense & Sensibility 1 0 sensibility
Sense & Sensibility 3 0 by
Sense & Sensibility 3 0 jane
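The “chapter” column works in two steps. Every line that looks like a chapter heading is flagged, and “cumsum()” then takes a running total, so each line inherits the number of headings seen so far. Lines before the first heading get chapter 0, which is why the title lines above show chapter 0. Below is a minimal base-R sketch of the same idea. The toy lines are hypothetical, and “perl = TRUE” is used so that “\\d” is understood inside the character class.

```r
# Toy lines standing in for a book's text (hypothetical data)
text <- c("SENSE AND SENSIBILITY", "", "CHAPTER I", "The family of Dashwood...",
          "Chapter II", "Mrs. John Dashwood now installed herself...")

# TRUE for lines starting with "chapter" followed by a digit or roman numeral,
# ignoring case (same pattern as the str_detect() call above)
is_heading <- grepl("^chapter [\\divxlc]", text, ignore.case = TRUE, perl = TRUE)

# Running count of headings seen so far: 0 0 1 1 2 2
chapter <- cumsum(is_heading)
data.frame(linenumber = seq_along(text), chapter, text)
```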

Analysis of Song Lyrics

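I now explore text mining using a different corpus: the lyrics of a song from Genius.com. Artist and song metadata come from the Genius API, accessed with an API token through the geniusr package and a direct API request. The lyrics themselves are scraped from the song’s web page.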

Retrieve Artist Information

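Use geniusr’s “search_artist()” function to look up the artist ID, which the next step uses to filter song search results. The search can return more than one artist (here, Ariana Grande also appears), so note Cynthia Erivo’s ID: 263409.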

# geniusr reads the Genius client access token from the GENIUS_API_TOKEN
# environment variable (for example, set in ~/.Renviron), so the token does not
# need to be passed to genius_token(). The original attempt is kept for reference:
#genius_api_tk <- Sys.getenv("GENIUS_API_TOKEN")
#genius_token(genius_api_tk)

# Fail early with a clear error if the token is not available
stopifnot(nzchar(Sys.getenv("GENIUS_API_TOKEN")))

# Search artist
artist_raw <- search_artist("Cynthia Erivo")

# Convert to data frame without altering column names
artist_df <- as.data.frame(artist_raw)

# Display with styling
kable(artist_df, align = rep("l", ncol(artist_df))) %>%
  kable_styling(position = "left", bootstrap_options = c("basic", "bordered", "condensed")) %>%
  scroll_box(height = "150px")
artist_id artist_name artist_url
263409 Cynthia Erivo https://genius.com/artists/Cynthia-erivo
26507 Ariana Grande https://genius.com/artists/Ariana-grande

Find Target Song

Search for the song “Stand Up” and keep only the results by Cynthia Erivo (artist ID 263409).

#Search songs
songs <- search_song("Stand Up")

#Filter songs for artist ID = 263409 (Cynthia Erivo)
artist_songs <- songs %>%
  filter(artist_id == 263409)

#Display filtered songs
kable(artist_songs, align = rep("l", ncol(artist_songs))) %>% # left-align every column
  kable_styling(position = "left", bootstrap_options = c("basic","condensed","bordered"))%>%
  scroll_box(height = "100px")
song_id song_name song_lyrics_url artist_id artist_name
4971319 Stand Up https://genius.com/Cynthia-erivo-stand-up-lyrics 263409 Cynthia Erivo

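Retrieve Song Metadata from the Genius API

token <- Sys.getenv("GENIUS_API_TOKEN")  # the same client access token geniusr uses

# Request this song's metadata (song ID 4971319) from the Genius API
res <- GET(
  url = "https://api.genius.com/songs/4971319",
  add_headers(Authorization = paste("Bearer", token))
)
stop_for_status(res)  # stop with an informative error if the request failed

# Parse the JSON body; named song_content so it is not confused with httr::content()
song_content <- content(res, as = "parsed", simplifyVector = TRUE)
#str(song_content)

song_meta <- song_content$response$song

# Note: tibble() drops any column whose value is NULL. pageviews is missing from
# the rendered table below, so song_meta$pageviews was NULL in this response.
artist_song_info <- tibble(
  title = song_meta$title,
  artist = song_meta$artist_names,
  url = song_meta$url,
  release_date = song_meta$release_date,
  pageviews = song_meta$pageviews,
  annotation_count = song_meta$annotation_count
)
kable(artist_song_info, align = rep("l", ncol(artist_song_info))) %>%
  kable_styling(bootstrap_options = c("basic","condensed","bordered")) %>%
  scroll_box(height = "100px")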
title artist url release_date annotation_count
Stand Up Cynthia Erivo https://genius.com/Cynthia-erivo-stand-up-lyrics 2019-10-25 21
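Because “tibble()” ignores arguments that evaluate to NULL, a field missing from the API response (such as pageviews) disappears from the table without any warning. A minimal sketch of keeping such a column as NA instead is shown below. The helper “null_to_na()” is hypothetical, and the list is a made-up stand-in for the parsed response. Base “data.frame()” is used so the example needs no packages.

```r
# Hypothetical stand-in for the parsed song metadata: no "pageviews" field
song_meta <- list(title = "Example Song", release_date = "2019-10-25",
                  annotation_count = 21)

# Replace a missing (NULL) field with NA so the column is kept
null_to_na <- function(x) if (is.null(x)) NA else x

song_info <- data.frame(
  title            = null_to_na(song_meta$title),
  release_date     = null_to_na(song_meta$release_date),
  pageviews        = null_to_na(song_meta$pageviews),  # NULL becomes NA
  annotation_count = null_to_na(song_meta$annotation_count)
)
song_info
```

If page views matter, it is worth inspecting the real response with the commented-out “str()” call. The count may be nested elsewhere in the song object, or present only for some songs.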

Scrape Lyrics from the Web Page

song_url <- artist_song_info$url  # song page URL from the API metadata
song_page <- read_html(song_url)   # download and parse the song's web page

# Select the lyric container divs (a song's lyrics can span several of them)
song_lyrics <- song_page %>%
  html_nodes(xpath = "//div[contains(@class, 'Lyrics__Container')]")

# Add a sibling node containing "\n" after each <br> tag so that xml_text()
# preserves the original line structure, then join everything into one string
lyrics <- song_lyrics %>%
  html_children() %>%
  map_chr(~ {
    xml_find_all(.x, ".//br") %>% xml_add_sibling("text", "\n")
    xml_text(.x)
  }) %>%
  paste(collapse = "\n")

# Split into lines once, number them, then keep only lines with actual content
# (line numbers are assigned before filtering, so they keep their original positions)
lyric_lines <- strsplit(lyrics, "\n")[[1]]
lyrics_df <- tibble(
  line = seq_along(lyric_lines),
  text = lyric_lines
) %>%
  filter(nchar(text) > 0)

kable(lyrics_df, align = rep("l", ncol(lyrics_df)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
line text
1 25 ContributorsStand Up Lyrics
3 I been walkin’ with my face turned to the sun
7 Weight on my shoulders, a bullet in my gun
11 Oh, I got eyes in the back of my head just in case I have to run
15 I do what I can when I can while I can for my people
19 While the clouds roll back and the stars fill the night
25 That’s when I’m gonna stand up
26 Take my people with me
30 Together we are going to a brand new home
34 Far across the river
38 Can you hear freedom calling?
39 Calling me to answer
43 Gonna keep on keepin’ on
47 I can feel it in my bones
53 Early in the mornin’ before the sun begins to shine
57 Gonna start movin’ towards that separating line
61 I’m wading through muddy waters, you know I got a made-up mind
65 And I don’t mind if I lose any blood on the way to salvation
69 And I’ll fight with the strength that I got until I die
75 So I’m gonna stand up
76 Take my people with me
80 Together we are going to a brand new home
84 Far across the river
88 Can you hear freedom calling?
89 Calling me to answer
93 Gonna keep on keepin’ on
98 And I know what’s around the bend
102 Might be hard to face ’cause I’m alone
106 And I just might fail, but Lord knows I tried
107 Sure as stars fill up the sky
113 Stand up
114 Take my people with me
118 Together we are going to a brand new home
122 Far across the river
126 Can you hear freedom calling?
127 Calling me to answer
131 Gonna keep on keepin’ on
135 I’m gonna stand up
136 Take my people with me
140 Together we are going to a brand new home
144 Far across the river
148 Do you hear freedom calling?
149 Calling me to answer
153 Gonna keep on keepin’ on
157 I’m gonna stand up
158 Take my people with me
162 Together we are going to a brand new home
166 Far across the river
170 I hear freedom calling
171 Calling me to answer
175 Gonna keep on keepin’ on
179 I can feel it in my bones
184 I go to prepare a place for you
185 I go to prepare a place for you
186 I go to prepare a place for you
187 I go to prepare a place for you
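The gaps in the “line” column (1, 3, 7, ...) are expected. Lines are numbered right after splitting on “\n”, and only then are empty lines removed, so each row keeps its original position. A minimal base-R sketch of the same split, number, and filter sequence on a made-up string:

```r
# Hypothetical scraped text with blank lines between sections
lyrics <- "Verse one line\n\nVerse two line\n\n\nChorus line"

lyric_lines <- strsplit(lyrics, "\n", fixed = TRUE)[[1]]  # split once
lyrics_df <- data.frame(line = seq_along(lyric_lines), text = lyric_lines,
                        stringsAsFactors = FALSE)

# Drop empty lines; surviving rows keep positions 1, 3, and 6
lyrics_df <- lyrics_df[nchar(lyrics_df$text) > 0, ]
lyrics_df
```

The first row of the real table (“25 ContributorsStand Up Lyrics”) is page-header text captured by the scraper rather than a lyric. A filter such as `!grepl("Contributors", text)` could drop it before tokenizing.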

Tokenize Lyrics

unnested_lyrics <- lyrics_df %>%
  unnest_tokens(word, text)

kable(head(unnested_lyrics,20), align = rep("l", ncol(unnested_lyrics))) %>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))%>%
  scroll_box(height = "300px")
line word
1 25
1 contributorsstand
1 up
1 lyrics
3 i
3 been
3 walkin
3 with
3 my
3 face
3 turned
3 to
3 the
3 sun
7 weight
7 on
7 my
7 shoulders
7 a
7 bullet
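“unnest_tokens()” lowercases the text, strips punctuation, and returns one word per row while keeping the other columns, such as “line”. That is why “walkin’” appears above as “walkin”. The sketch below is a rough, ASCII-only base-R approximation of that behavior. The helper name “tokenize_lines()” is hypothetical. It is not tidytext’s actual tokenizer, which handles Unicode and contractions more carefully.

```r
# Hypothetical helper approximating word tokenization for ASCII text:
# lowercase, replace anything other than letters, digits, and apostrophes with
# spaces, split on whitespace, then trim apostrophes at either end of a token.
tokenize_lines <- function(line, text) {
  cleaned <- trimws(gsub("[^a-z0-9']+", " ", tolower(text)))
  tokens  <- strsplit(cleaned, "\\s+")
  tokens  <- lapply(tokens, function(tk) gsub("^'+|'+$", "", tk))
  data.frame(line = rep(line, lengths(tokens)), word = unlist(tokens),
             stringsAsFactors = FALSE)
}

tokenize_lines(c(3, 7), c("I been walkin' with my face turned to the sun",
                          "Weight on my shoulders, a bullet in my gun"))
```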

Sentiment Analysis

NRC Lexicon Analysis

NRC Lexicon

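The NRC lexicon assigns words to eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) and two general sentiments (positive, negative), and a single word can carry several labels. Because of this, and because words repeat across lyric lines, the join below is many-to-many. The warning dplyr prints is therefore expected and can be silenced with “relationship = "many-to-many"”. Note that “get_sentiments("nrc")”, like “get_sentiments("afinn")”, depends on the textdata package, which downloads the lexicon on first use and asks you to accept its license.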

# Count sentiment-tagged words
nrc_lex <- get_sentiments("nrc")
unnested_lyrics %>%
  inner_join(nrc_lex, by = "word") %>%
  count(word, sentiment, sort = TRUE)
## Warning in inner_join(., nrc_lex, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 14 of `x` matches multiple rows in `y`.
## ℹ Row 12060 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
## # A tibble: 53 × 3
##    word    sentiment        n
##    <chr>   <chr>        <int>
##  1 freedom joy              5
##  2 freedom positive         5
##  3 freedom trust            5
##  4 prepare anticipation     4
##  5 prepare positive         4
##  6 fill    trust            2
##  7 sun     anticipation     2
##  8 sun     joy              2
##  9 sun     positive         2
## 10 sun     surprise         2
## # ℹ 43 more rows
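The row counts above multiply because of the many-to-many join. For example, “freedom” occurs on five lyric lines and carries three NRC labels, so it contributes 15 rows. A minimal base-R sketch with “merge()” on a tiny hypothetical lexicon, using the three labels the output above shows for “freedom”:

```r
# Three lyric tokens, two of them "freedom" (hypothetical sample)
tokens <- data.frame(line = c(38, 88, 88), word = c("freedom", "freedom", "calling"))

# Tiny lexicon: "freedom" carries three labels, "calling" has none
lexicon <- data.frame(word = "freedom", sentiment = c("joy", "positive", "trust"))

# Inner join: each of the 2 "freedom" rows matches 3 lexicon rows
joined <- merge(tokens, lexicon, by = "word")
nrow(joined)             # 6
table(joined$sentiment)  # joy 2, positive 2, trust 2
```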

Next, count the total number of matches for each NRC label across the whole song.

unnested_lyrics %>%
  inner_join(nrc_lex, by = "word") %>%
  count(sentiment, sort = TRUE)
## Warning in inner_join(., nrc_lex, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 14 of `x` matches multiple rows in `y`.
## ℹ Row 12060 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
## # A tibble: 10 × 2
##    sentiment        n
##    <chr>        <int>
##  1 positive        17
##  2 trust           13
##  3 anticipation     9
##  4 joy              9
##  5 negative         8
##  6 fear             6
##  7 disgust          4
##  8 sadness          4
##  9 surprise         4
## 10 anger            3

Sentiment Bucketing

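Group the NRC labels into three buckets:

- positive, trust, and joy become Positive;
- negative, fear, disgust, and anger become Negative;
- the remaining labels (anticipation, surprise, and sadness) become Neutral.

Note that this grouping counts sadness as Neutral rather than Negative.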

nrc_sentiment <- unnested_lyrics %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  mutate(sentiment_bucket = case_when(
    sentiment %in% c("positive", "trust", "joy") ~ "Positive",
    sentiment %in% c("negative", "fear", "disgust", "anger") ~ "Negative",
    TRUE ~ "Neutral"
  ))
## Warning in inner_join(., get_sentiments("nrc"), by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 14 of `x` matches multiple rows in `y`.
## ℹ Row 12060 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
kable(nrc_sentiment, align = rep('l', ncol(nrc_sentiment)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
line word sentiment sentiment_bucket
3 sun anticipation Neutral
3 sun joy Positive
3 sun positive Positive
3 sun surprise Neutral
3 sun trust Positive
7 weight anticipation Neutral
7 weight disgust Negative
7 weight fear Negative
7 weight joy Positive
7 weight negative Negative
7 weight positive Positive
7 weight sadness Neutral
7 weight surprise Neutral
7 weight trust Positive
7 gun anger Negative
7 gun fear Negative
7 gun negative Negative
11 case fear Negative
11 case negative Negative
11 case sadness Neutral
19 fill trust Positive
38 freedom joy Positive
38 freedom positive Positive
38 freedom trust Positive
53 sun anticipation Neutral
53 sun joy Positive
53 sun positive Positive
53 sun surprise Neutral
53 sun trust Positive
53 shine positive Positive
57 start anticipation Neutral
61 muddy disgust Negative
61 muddy negative Negative
65 lose anger Negative
65 lose disgust Negative
65 lose fear Negative
65 lose negative Negative
65 lose sadness Neutral
65 lose surprise Neutral
65 salvation anticipation Neutral
65 salvation joy Positive
65 salvation positive Positive
65 salvation trust Positive
69 fight anger Negative
69 fight fear Negative
69 fight negative Negative
69 strength positive Positive
69 strength trust Positive
69 die fear Negative
69 die negative Negative
69 die sadness Neutral
88 freedom joy Positive
88 freedom positive Positive
88 freedom trust Positive
106 lord disgust Negative
106 lord negative Negative
106 lord positive Positive
106 lord trust Positive
107 fill trust Positive
107 sky positive Positive
126 freedom joy Positive
126 freedom positive Positive
126 freedom trust Positive
148 freedom joy Positive
148 freedom positive Positive
148 freedom trust Positive
170 freedom joy Positive
170 freedom positive Positive
170 freedom trust Positive
184 prepare anticipation Neutral
184 prepare positive Positive
185 prepare anticipation Neutral
185 prepare positive Positive
186 prepare anticipation Neutral
186 prepare positive Positive
187 prepare anticipation Neutral
187 prepare positive Positive
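Because the bucketing is applied to word-label pairs, one word can land in several buckets at once. For example, “weight” carries nine NRC labels in the table above and contributes three rows to each bucket. A base-R sketch of the same mapping as the “case_when()” call is shown below. The function name “bucket_nrc()” is hypothetical.

```r
# Hypothetical base-R version of the case_when() bucketing rule above
bucket_nrc <- function(sentiment) {
  ifelse(sentiment %in% c("positive", "trust", "joy"), "Positive",
         ifelse(sentiment %in% c("negative", "fear", "disgust", "anger"),
                "Negative", "Neutral"))
}

# The nine NRC labels attached to "weight" in the table above
weight_labels <- c("anticipation", "disgust", "fear", "joy", "negative",
                   "positive", "sadness", "surprise", "trust")
table(bucket_nrc(weight_labels))  # Negative 3, Neutral 3, Positive 3
```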

NRC Sentiment Distribution

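Plot the percentage of NRC matches that falls into each bucket. Because each word is counted once per NRC label it carries (for example, “weight” contributes nine rows), these percentages describe word-label pairs rather than distinct words.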

nrc_sentiment %>%
  count(sentiment_bucket) %>%
  mutate(percent = n / sum(n)*100) %>%
  ggplot(aes(x = sentiment_bucket, y = percent, fill = sentiment_bucket)) +
  geom_col(width = 0.75, show.legend = FALSE) +
  scale_fill_manual(values = c("Positive" = "turquoise","Negative" = "maroon", "Neutral" = "grey"))+
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(title = "Sentiment Distribution in Lyrics (NRC Lexicon)",
       subtitle = "Based on word-level sentiment analysis",
       x = "Sentiment Category", y = "Percentage of NRC Label Matches")+
  geom_text(aes(label = paste0(round(percent,2), "%")), vjust = -0.5)
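The bar heights can be checked by hand from the “Total NRC sentiment counts” output above. The 77 label matches split into 39 Positive, 21 Negative, and 17 Neutral. A worked check in base R, using the counts copied from that output and the same bucketing rule:

```r
# NRC label totals copied from the "Total NRC sentiment counts" output above
nrc_totals <- c(positive = 17, trust = 13, anticipation = 9, joy = 9, negative = 8,
                fear = 6, disgust = 4, sadness = 4, surprise = 4, anger = 3)

# Same bucketing rule as the case_when() call above
bucket <- ifelse(names(nrc_totals) %in% c("positive", "trust", "joy"), "Positive",
          ifelse(names(nrc_totals) %in% c("negative", "fear", "disgust", "anger"),
                 "Negative", "Neutral"))

bucket_n <- tapply(nrc_totals, bucket, sum)         # label matches per bucket
bucket_pct <- round(bucket_n / sum(bucket_n) * 100, 2)
bucket_pct  # Negative 27.27, Neutral 22.08, Positive 50.65
```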

Word Cloud Visualizations

This word cloud visualizes the most frequently used words in the lyrics after removing stop words. Words that appear more often are displayed in larger font sizes, making it easy to identify dominant themes or repeated expressions.

unnested_lyrics %>%
  anti_join(stop_words, by = "word") %>%  # remove common words such as "the" and "my"
  count(word) %>%
  with(wordcloud(word, n))  # optionally add max.words = 100 to cap the number of words

This comparison word cloud shows the sentiment-tagged words grouped by bucket. Each word is drawn in the color and sector of the bucket where it is most over-represented: turquoise for Positive, maroon for Negative, grey for Neutral. Its size reflects how strongly it is over-represented there relative to the other buckets, rather than its raw count. Because the counts come from “nrc_sentiment”, a word is counted once per NRC label it carries.

# Fix the factor level order so the acast() columns, and therefore the colors below, follow Positive, Negative, Neutral
nrc_sentiment$sentiment_bucket <- factor(nrc_sentiment$sentiment_bucket,
                                         levels = c("Positive", "Negative", "Neutral"))
word_freq2 <- nrc_sentiment %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sentiment_bucket, sort = TRUE)

word_matrix <- acast(word_freq2, word ~ sentiment_bucket, value.var = "n", fill = 0)

# Plot comparison word cloud with custom colors
comparison.cloud(word_matrix,
                 colors = c("turquoise", "maroon", "grey"),  # Positive, Negative, Neutral
                # max.words = 100,
                 title.size = 1.5,
                 main = "Word Cloud by Sentiment Bucket")
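“acast()” reshapes the long word-bucket counts into a word-by-bucket matrix. Its column order follows the factor levels, which is why fixing the levels keeps the colors aligned with the buckets. A base-R sketch of building the same kind of matrix with “xtabs()” is shown below. The few rows mimic the shape of “word_freq2” and are not a full reproduction of it.

```r
# A few word-bucket counts in the shape of word_freq2 (illustrative subset)
word_freq <- data.frame(
  word   = c("freedom", "sun", "sun", "gun"),
  bucket = factor(c("Positive", "Positive", "Neutral", "Negative"),
                  levels = c("Positive", "Negative", "Neutral")),
  n      = c(15, 6, 4, 3)
)

# Word-by-bucket matrix: columns follow the factor levels,
# and combinations that never occur are filled with 0
word_matrix <- xtabs(n ~ word + bucket, data = word_freq)
word_matrix
```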

Detailed Sentiment Breakdown

Here’s a detailed breakdown of each sentiment bucket, showing how many word matches fall under each NRC label. Bar lengths are match counts, and colors indicate which bucket (Positive, Negative, or Neutral) each label belongs to.

nrc_sentiment %>%
  count(sentiment_bucket, sentiment) %>%
  ggplot(aes(x = reorder(sentiment, n), y = n, fill = sentiment_bucket)) +
  geom_col(width = 0.75, show.legend = TRUE) +
  coord_flip() +
  facet_wrap(~ sentiment_bucket, scales = "free_y") +
  scale_fill_manual(values = c("Positive" = "turquoise","Negative" = "maroon", "Neutral" = "grey"),
                    name = "Sentiment Category")+
  labs(title = "Detailed NRC Sentiment Breakdown",
       x = "Sentiment", y = "Word Count")

Bing Lexicon Analysis

Bing Lexicon

The Bing lexicon provides a simpler binary classification: positive or negative.

This code analyzes the lyrics using the Bing sentiment lexicon. It joins each word with its sentiment label (positive or negative), counts how often each sentiment-tagged word appears per line, and then calculates a net sentiment score by subtracting the number of negative words from positive ones.

bing_sentiment <- unnested_lyrics %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, index = line,sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>% 
  mutate(sentiment = positive - negative)
## Joining with `by = join_by(word)`
head(bing_sentiment) #let's preview this data
## # A tibble: 6 × 5
##   word    index negative positive sentiment
##   <chr>   <int>    <int>    <int>     <int>
## 1 die        69        1        0        -1
## 2 fail      106        1        0        -1
## 3 freedom    38        0        1         1
## 4 freedom    88        0        1         1
## 5 freedom   126        0        1         1
## 6 freedom   148        0        1         1
#Calculate the percentage of negative and positive sentiments for the bing lexicon
bing_percent <- unnested_lyrics %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment) %>%
  mutate(percent = round(n / sum(n) *100,2))
kable(bing_percent, align = rep('l', ncol(bing_percent))) %>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
sentiment n percent
negative 5 45.45
positive 6 54.55
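The “pivot_wider()” and “mutate()” steps compute net sentiment as positive minus negative. Because the code above counts by word and line, each row covers a single word. Chapter 2 instead counts by a block index (“index = linenumber %/% 80”), so that net sentiment summarizes a stretch of text. A base-R sketch of net sentiment per lyric line is shown below. The matched sentiments and line numbers are hypothetical.

```r
# Hypothetical Bing matches: one row per sentiment-tagged word, with its line
matches <- data.frame(
  line      = c(10, 20, 20, 20, 30),
  sentiment = c("positive", "negative", "positive", "negative", "negative")
)

# Count positive and negative words per line, then net = positive - negative
counts <- table(matches$line, matches$sentiment)
net <- counts[, "positive"] - counts[, "negative"]
net  # line 10: 1, line 20: -1, line 30: -1
```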

Bing Sentiment Distribution

We now plot the percentages of negative and positive words.

bing_percent <- bing_percent %>%
  mutate(sentiment = recode(sentiment,
                            "positive" = "Positive",
                            "negative" = "Negative"))

ggplot(bing_percent, aes(x = sentiment, y = percent, fill = sentiment)) +
  geom_col(width = 0.75, show.legend = FALSE) +
  scale_fill_manual(values = c("Positive" = "turquoise","Negative" = "maroon"))+
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(title = "Sentiment Distribution (Bing Lexicon)",
       x = "Sentiment", y = "Percentage of Words")+
  geom_text(aes(label = paste0(round(percent,2), "%")), vjust = -0.5)

AFINN Lexicon Analysis

AFINN Lexicon

The AFINN lexicon assigns numeric scores from -5 (very negative) to +5 (very positive) to each word.

This code analyzes the lyrics using the AFINN sentiment lexicon. It joins each word with its AFINN score and sums the scores for each word within each lyric line (the “index”). It then buckets each result as “Positive” (score above 0), “Negative” (below 0), or “Neutral” (exactly 0). Because every matched word has a nonzero score, the Neutral bucket stays empty here.

afinn_sentiment <- unnested_lyrics %>%
  inner_join(get_sentiments("afinn")) %>%
  group_by(word, index = line) %>%
  summarise(sentiment = sum(value),.groups = "drop") %>% 
mutate(sentiment_bucket = case_when(
    sentiment > 0  ~ "Positive",
    sentiment < 0 ~ "Negative",
    TRUE ~ "Neutral"))
## Joining with `by = join_by(word)`
kable(afinn_sentiment, align = rep('l', ncol(afinn_sentiment)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
word index sentiment sentiment_bucket
alone 102 -2 Negative
die 69 -3 Negative
fail 106 -2 Negative
fight 69 -1 Negative
freedom 38 2 Positive
freedom 88 2 Positive
freedom 126 2 Positive
freedom 148 2 Positive
freedom 170 2 Positive
gun 7 -1 Negative
hard 102 -1 Negative
strength 69 2 Positive

AFINN Sentiment Distribution

We now plot the percentages of negative and positive AFINN sentiments.

afinn_sentiment %>%
  count(sentiment_bucket) %>%
  mutate(percent = n / sum(n)*100) %>%
  ggplot(aes(x = sentiment_bucket, y = percent, fill = sentiment_bucket)) +
  geom_col(width = 0.75, show.legend = FALSE) +
  scale_fill_manual(values = c("Positive" = "turquoise","Negative" = "maroon", "Neutral" = "grey"))+
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(title = "Sentiment Distribution in Lyrics (AFINN Lexicon)",
       x = "Sentiment Category", y = "Percentage of Words")+
  geom_text(aes(label = paste0(round(percent,2), "%")), vjust = -0.5)

AFINN Score Summary

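The AFINN plot shows an even split: six negative and six positive word-line entries. The plot therefore does not settle the overall tone on its own. To break the tie, I sum the AFINN scores themselves, which also accounts for how strongly positive or negative each word is.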

afinn_summary <- unnested_lyrics %>%
  inner_join(get_sentiments("afinn")) %>%
  summarise(
    total_score = sum(value),
    total_positive = sum(value[value > 0]),
    total_negative = sum(value[value < 0])
  )
## Joining with `by = join_by(word)`
kable(afinn_summary, align = rep('l', ncol(afinn_summary)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
total_score total_positive total_negative
2 12 -10
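These totals are simply sums of the per-word scores in the “afinn_sentiment” table above. The word counts are tied at six each, but the positive scores outweigh the negative ones in magnitude. A worked check in base R, with the scores copied from that table:

```r
# AFINN score for each word-line entry, copied from the afinn_sentiment table above
afinn_scores <- c(alone = -2, die = -3, fail = -2, fight = -1,
                  freedom = 2, freedom = 2, freedom = 2, freedom = 2, freedom = 2,
                  gun = -1, hard = -1, strength = 2)

total_score    <- sum(afinn_scores)                    # 2
total_positive <- sum(afinn_scores[afinn_scores > 0])  # 12
total_negative <- sum(afinn_scores[afinn_scores < 0])  # -10
```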

Interpretation

The total AFINN score is +2: positive words contribute +12 and negative words contribute -10. The AFINN lexicon therefore rates the overall emotional tone of the song as positive, though only narrowly.

Line-Level Sentiment Analysis

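Next, I score each lyric line as a whole, rather than each word. I use the syuzhet package’s “get_sentiment()” with its “syuzhet”, “afinn”, and “bing” methods. These methods rely on syuzhet’s own copies of the lexicons, so their word matches may differ slightly from the tidytext versions used above.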

lyrics_df2 <- lyrics_df %>%
  mutate(
    syuzhet_score = get_sentiment(text, method = "syuzhet"),
    afinn_score = get_sentiment(text, method = "afinn"),
    bing_score = get_sentiment(text, method = "bing")
  )
kable(lyrics_df2, align = rep('l', ncol(lyrics_df2)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
line text syuzhet_score afinn_score bing_score
1 25 ContributorsStand Up Lyrics 0.00 0 0
3 I been walkin’ with my face turned to the sun 0.60 0 0
7 Weight on my shoulders, a bullet in my gun -0.50 -1 0
11 Oh, I got eyes in the back of my head just in case I have to run 0.00 0 0
15 I do what I can when I can while I can for my people 0.00 0 0
19 While the clouds roll back and the stars fill the night 0.00 0 0
25 That’s when I’m gonna stand up 0.00 0 0
26 Take my people with me 0.00 0 0
30 Together we are going to a brand new home 0.80 0 0
34 Far across the river 0.00 0 0
38 Can you hear freedom calling? 0.75 2 1
39 Calling me to answer 0.00 0 0
43 Gonna keep on keepin’ on 0.00 0 0
47 I can feel it in my bones 0.00 0 0
53 Early in the mornin’ before the sun begins to shine 1.10 0 1
57 Gonna start movin’ towards that separating line 0.00 0 0
61 I’m wading through muddy waters, you know I got a made-up mind -0.25 -2 -1
65 And I don’t mind if I lose any blood on the way to salvation 0.75 2 -1
69 And I’ll fight with the strength that I got until I die -0.75 -2 -1
75 So I’m gonna stand up 0.00 0 0
76 Take my people with me 0.00 0 0
80 Together we are going to a brand new home 0.80 0 0
84 Far across the river 0.00 0 0
88 Can you hear freedom calling? 0.75 2 1
89 Calling me to answer 0.00 0 0
93 Gonna keep on keepin’ on 0.00 0 0
98 And I know what’s around the bend 0.00 0 0
102 Might be hard to face ’cause I’m alone -0.85 -3 -1
106 And I just might fail, but Lord knows I tried -0.75 -2 -1
107 Sure as stars fill up the sky 0.00 0 0
113 Stand up 0.00 0 0
114 Take my people with me 0.00 0 0
118 Together we are going to a brand new home 0.80 0 0
122 Far across the river 0.00 0 0
126 Can you hear freedom calling? 0.75 2 1
127 Calling me to answer 0.00 0 0
131 Gonna keep on keepin’ on 0.00 0 0
135 I’m gonna stand up 0.00 0 0
136 Take my people with me 0.00 0 0
140 Together we are going to a brand new home 0.80 0 0
144 Far across the river 0.00 0 0
148 Do you hear freedom calling? 0.75 2 1
149 Calling me to answer 0.00 0 0
153 Gonna keep on keepin’ on 0.00 0 0
157 I’m gonna stand up 0.00 0 0
158 Take my people with me 0.00 0 0
162 Together we are going to a brand new home 0.80 0 0
166 Far across the river 0.00 0 0
170 I hear freedom calling 0.75 2 1
171 Calling me to answer 0.00 0 0
175 Gonna keep on keepin’ on 0.00 0 0
179 I can feel it in my bones 0.00 0 0
184 I go to prepare a place for you 0.10 0 0
185 I go to prepare a place for you 0.10 0 0
186 I go to prepare a place for you 0.10 0 0
187 I go to prepare a place for you 0.10 0 0
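For the “afinn” method, a line’s score behaves like the sum of the AFINN values of the words it contains, with unmatched words adding nothing. The sketch below illustrates this with a hypothetical mini-lexicon built from the words and scores in the “afinn_sentiment” table. The helper “score_line()” is also hypothetical and is not syuzhet’s implementation. It reproduces the afinn_score values shown above for lines 69, 102, and 38.

```r
# Hypothetical mini-lexicon: AFINN values of the words matched earlier
afinn_mini <- c(alone = -2, die = -3, fail = -2, fight = -1, freedom = 2,
                gun = -1, hard = -1, strength = 2)

# Score a line by summing the lexicon values of its words (unmatched words add 0)
score_line <- function(text, lexicon) {
  words <- strsplit(gsub("[^a-z' ]", " ", tolower(text)), "\\s+")[[1]]
  sum(lexicon[words], na.rm = TRUE)  # lookups of unknown words give NA, dropped here
}

lines <- c("And I'll fight with the strength that I got until I die",
           "Might be hard to face 'cause I'm alone",
           "Can you hear freedom calling?")
line_scores <- vapply(lines, score_line, numeric(1), lexicon = afinn_mini,
                      USE.NAMES = FALSE)
line_scores  # -2 -3  2
```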

Comparison of Methods

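Plot each method’s line-level scores on a shared axis. Keep in mind that the methods use different scales, so the height of each line is not directly comparable across methods:

- AFINN sums integer word scores from -5 to +5;
- Bing adds +1 or -1 per matched word;
- syuzhet uses fractional word values.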

#Reshape data to long format to plot comparison
lyrics_long <- lyrics_df2 %>%
  select(line, afinn_score, bing_score, syuzhet_score) %>%
  pivot_longer(cols = c(afinn_score, bing_score, syuzhet_score),
               names_to = "method", values_to = "score") %>%
  mutate(method = str_remove(method, "_score") %>% str_to_title())

#Plot the data
ggplot(lyrics_long, aes(x = line, y = score, color = method)) +
  geom_line(linewidth = 0.5) +
  geom_point(size = 1.5, alpha = 0.7) +
  labs(
    title = "Sentiment Scores Across Lyrics",
    subtitle = "Comparing three sentiment analysis methods",
    x = "Lyric Line Number",
    y = "Sentiment Score",
    color = "Method"
  ) +
  theme_minimal()
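To compare the shape of the three curves rather than their raw magnitudes, one option is to divide each method’s scores by its largest absolute value. This maps every method onto the range -1 to 1. The rescaling is an assumption on my part, not part of the analysis above. A minimal base-R sketch using three rows copied from the “lyrics_df2” table (lines 38, 61, and 102):

```r
# Line-level scores copied from the lyrics_df2 table above
scores <- data.frame(
  line    = c(38, 61, 102),
  afinn   = c(2, -2, -3),
  bing    = c(1, -1, -1),
  syuzhet = c(0.75, -0.25, -0.85)
)

# Divide by the largest absolute value (leave all-zero columns unchanged)
scale_max_abs <- function(x) if (all(x == 0)) x else x / max(abs(x))

methods <- c("afinn", "bing", "syuzhet")
scores[methods] <- lapply(scores[methods], scale_max_abs)
scores
```

On the full data, the same idea could be applied per method before plotting, for example by grouping “lyrics_long” by “method” and dividing “score” by “max(abs(score))”.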

Sentiment Summary

Sum the line-level scores for each sentiment method.

sentiment_score_summary <- lyrics_df2 %>%
  summarise(
    total_afinn_score = sum(afinn_score),
    total_bing_score = sum(bing_score),
    total_syuzhet_score = sum(syuzhet_score)
  )
kable(sentiment_score_summary, align = rep('l', ncol(sentiment_score_summary)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
total_afinn_score total_bing_score total_syuzhet_score
2 1 7.5

Key Finding:

All three methods produce positive total scores (AFINN 2, Bing 1, syuzhet 7.5), so each classifies “Stand Up” by Cynthia Erivo as a positive song.

CONCLUSION:

This analysis examined the song “Stand Up” by Cynthia Erivo using multiple sentiment analysis approaches:

  1. Word-level analysis using NRC, Bing, and AFINN lexicons
  2. Line-level analysis using the syuzhet package with three different methods

Key Findings:

  • The NRC lexicon is dominated by positive-leaning labels: positive (17), trust (13), and joy (9) are among its most frequent matches
  • The Bing lexicon classifies 6 of the 11 matched words (54.55%) as positive
  • The AFINN lexicon produces a positive total score (Total Score = 2: +12 from positive words, -10 from negative words)
  • All three line-level sentiment methods (afinn, bing, syuzhet) yield positive total scores (2, 1, and 7.5)

Conclusion:

Based on the evidence across multiple sentiment analysis methods, we can conclude that the song “Stand Up” by artist “Cynthia Erivo” is a positive song. The Bing and AFINN margins are narrow, however: 6 vs. 5 matched words and +12 vs. -10 points.

References

Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach. O’Reilly Media. https://www.tidytextmining.com