INTRO

In this R Markdown file, I will explore the tidytext package and demonstrate how multiple sentiment lexicons can be used to infer whether a section of text conveys a positive or negative sentiment.

I’ll begin by configuring the R Markdown settings and loading the necessary packages for analysis. The primary example code from Chapter 2 of Text Mining with R: A Tidy Approach by Julia Silge and David Robinson will serve as my foundation, which I will then extend. This book provides a practical framework for applying tidy principles to text analysis and is available online at https://www.tidytextmining.com (Silge & Robinson, 2017).

Configuring R Markdown settings

This is a setup chunk that configures global options for all subsequent code chunks in the document. “include=FALSE” means the chunk itself won’t appear in the final rendered document (neither code nor output).

“knitr::opts_chunk$set(echo = TRUE)” sets the default behavior so that code will be shown (echoed) in the output for all other chunks unless overridden. It’s useful for transparency and teaching.

The next chunk sets the CRAN mirror to a specific URL (https://cran.rstudio.com) so that any package installations during the session use this repository.

Its “echo=TRUE” option means the code is displayed in the rendered document, which helps with reproducibility and documentation.

options(repos = c(CRAN = "https://cran.rstudio.com"))

Install and load necessary packages

The following code automatically installs and loads required packages.

req_packages <- c("DBI","RMySQL","dplyr","dbplyr","knitr","tidyr", "readr", "stringr","tibble", "rmarkdown", "purrr", "lubridate", "here", "httr2", "httr", "janitor", "RCurl","rvest","xml2","jsonlite","kableExtra", "tidytext","janeaustenr", "geniusr","sentimentr","syuzhet","ggplot2","ggwordcloud","wordcloud2","wordcloud","reshape2")
for (pkg in req_packages) {
  if (!require(pkg, character.only = TRUE)) {
    message(paste("Installing package:", pkg))
    install.packages(pkg, dependencies = TRUE)
  } else {
    message(paste(pkg, " already installed."))
  }
  library(pkg, character.only = TRUE)
}

## Loading required package: DBI

## DBI  already installed.

## Loading required package: RMySQL

## RMySQL  already installed.

## Loading required package: dplyr

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## dplyr  already installed.

## Loading required package: dbplyr

## 
## Attaching package: 'dbplyr'

## The following objects are masked from 'package:dplyr':
## 
##     ident, sql

## dbplyr  already installed.

## Loading required package: knitr

## knitr  already installed.

## Loading required package: tidyr

## tidyr  already installed.

## Loading required package: readr

## readr  already installed.

## Loading required package: stringr

## stringr  already installed.

## Loading required package: tibble

## tibble  already installed.

## Loading required package: rmarkdown

## rmarkdown  already installed.

## Loading required package: purrr

## purrr  already installed.

## Loading required package: lubridate

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

## lubridate  already installed.

## Loading required package: here

## here() starts at /Users/paulabrown/Documents/CUNY SPS- Data 607/Week 10 Assignments

## here  already installed.

## Loading required package: httr2

## httr2  already installed.

## Loading required package: httr

## httr  already installed.

## Loading required package: janitor

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

## janitor  already installed.

## Loading required package: RCurl

## 
## Attaching package: 'RCurl'

## The following object is masked from 'package:tidyr':
## 
##     complete

## RCurl  already installed.

## Loading required package: rvest

## 
## Attaching package: 'rvest'

## The following object is masked from 'package:readr':
## 
##     guess_encoding

## rvest  already installed.

## Loading required package: xml2

## 
## Attaching package: 'xml2'

## The following object is masked from 'package:httr2':
## 
##     url_parse

## xml2  already installed.

## Loading required package: jsonlite

## 
## Attaching package: 'jsonlite'

## The following object is masked from 'package:purrr':
## 
##     flatten

## jsonlite  already installed.

## Loading required package: kableExtra

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

## kableExtra  already installed.

## Loading required package: tidytext

## tidytext  already installed.

## Loading required package: janeaustenr

## janeaustenr  already installed.

## Loading required package: geniusr

## geniusr  already installed.

## Loading required package: sentimentr

## sentimentr  already installed.

## Loading required package: syuzhet

## 
## Attaching package: 'syuzhet'

## The following object is masked from 'package:sentimentr':
## 
##     get_sentences

## syuzhet  already installed.

## Loading required package: ggplot2

## ggplot2  already installed.

## Loading required package: ggwordcloud

## ggwordcloud  already installed.

## Loading required package: wordcloud2

## wordcloud2  already installed.

## Loading required package: wordcloud

## Loading required package: RColorBrewer

## wordcloud  already installed.

## Loading required package: reshape2

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

## reshape2  already installed.

Example: Jane Austen Corpus

Following the approach from Chapter 2, we tokenize Jane Austen’s works:

tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(text, 
                                regex("^chapter [\\divxlc]", 
                                      ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)

kable(head(tidy_books,5), align = rep('l', ncol(tidy_books)))%>% #Preview first 5 rows of tidied book data
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))

book	linenumber	word
Sense & Sensibility	1	sense
Sense & Sensibility	1	and
Sense & Sensibility	1	sensibility
Sense & Sensibility	3	by
Sense & Sensibility	3	jane
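The chapter counter in the pipeline above relies on a running sum over a logical vector: every line that matches the chapter-heading pattern adds one. A minimal base R sketch of that idea follows. It uses placeholder lines rather than Austen's text (the vector `demo_lines` is hypothetical), and `grepl()` with `perl = TRUE` stands in for `str_detect()` with `regex()`.

```r
# Placeholder lines standing in for the `text` column of one book (not real data).
demo_lines <- c(
  "BOOK TITLE",
  "",
  "CHAPTER 1",
  "first line of the first chapter",
  "Chapter II",
  "first line of the second chapter"
)

# TRUE where a line starts with "chapter" followed by a digit or a roman-numeral letter.
is_heading <- grepl("^chapter [\\divxlc]", demo_lines, ignore.case = TRUE, perl = TRUE)

# cumsum() treats TRUE as 1, so the counter increases at each heading
# and stays constant on the lines in between.
chapter <- cumsum(is_heading)

data.frame(linenumber = seq_along(demo_lines), chapter = chapter, text = demo_lines)
```

Lines before the first heading keep the value 0, which is why front matter can be separated from chapter text later on.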

Analysis of Song Lyrics

I now explore text mining using a different corpus: song lyrics from Genius.com. Song metadata is retrieved through the Genius API (via the geniusr package and an API token), and the lyrics themselves are scraped from the song's web page.

Retrieve Artist Information

Retrieve the artist ID via the “search_artist()” function to locate song lyrics.

#genius_api_tk <- Sys.getenv("GENIUS_API_TOKEN")
#genius_token(genius_api_tk) #Since there are issues with the genius_token() function in the package geniusr, we will do a workaround that may be redundant but gets the job done.

Sys.setenv("GENIUS_API_TOKEN"=Sys.getenv("GENIUS_API_TOKEN"))

# Search artist
artist_raw <- search_artist("Cynthia Erivo")

# Convert to data frame without altering column names
artist_df <- as.data.frame(artist_raw)

# Display with styling
kable(artist_df, align = rep("l", ncol(artist_df))) %>%
  kable_styling(position = "left", bootstrap_options = c("basic", "bordered", "condensed")) %>%
  scroll_box(height = "150px")

artist_id	artist_name	artist_url
263409	Cynthia Erivo	https://genius.com/artists/Cynthia-erivo
26507	Ariana Grande	https://genius.com/artists/Ariana-grande
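The calls above depend on a token stored in an environment variable, and an empty value would typically only surface later as a failed request. A small guard can make that failure explicit. The helper `require_env()` and the variable names `DEMO_TOKEN` and `DEMO_TOKEN_THAT_IS_NOT_SET` below are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: stop with a clear message when a variable is unset or empty.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable '", name, "' is not set.", call. = FALSE)
  }
  value
}

# Demonstration with a throwaway variable (not a real token).
Sys.setenv(DEMO_TOKEN = "abc123")
demo_value <- require_env("DEMO_TOKEN")

# Make sure the second demo variable really is unset, then capture the error
# message so the script keeps running.
Sys.unsetenv("DEMO_TOKEN_THAT_IS_NOT_SET")
missing_msg <- tryCatch(
  require_env("DEMO_TOKEN_THAT_IS_NOT_SET"),
  error = function(e) conditionMessage(e)
)
```

In this document the same check could be applied to whichever variable holds the Genius token before the first API call.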

Find Target Song

Search for the song whose lyrics will be analyzed.

#Search songs
songs <- search_song("Stand Up")

#Filter songs for artist ID = 263409 (Cynthia Erivo)
artist_songs <- songs %>%
  filter(artist_id == 263409)

#Display filtered songs
kable(artist_songs, align = rep("l", ncol(artist_songs))) %>% # "align = rep("l", ncol(your_table)))" aligns text to the left
  kable_styling(position = "left", bootstrap_options = c("basic","condensed","bordered"))%>%
  scroll_box(height = "100px")

song_id	song_name	song_lyrics_url	artist_id	artist_name
4971319	Stand Up	https://genius.com/Cynthia-erivo-stand-up-lyrics	263409	Cynthia Erivo

Extract Lyrics from Genius API

token <- Sys.getenv("GENIUS_ACCESS_TOKEN")  # Make sure this returns your token; note that the name differs from GENIUS_API_TOKEN used earlier

res <- GET(
  url = "https://api.genius.com/songs/4971319",
  add_headers(Authorization = paste("Bearer", token))
)

parsed <- content(res, as = "parsed", simplifyVector = TRUE) # named "parsed" so the object does not share a name with httr's content()
#str(parsed)

song_meta <- parsed$response$song

artist_song_info <- tibble(
  title = song_meta$title,
  artist = song_meta$artist_names,
  url = song_meta$url,
  release_date = song_meta$release_date,
  pageviews = song_meta$pageviews,
  annotation_count = song_meta$annotation_count
)
kable(artist_song_info, align = rep("l", ncol(artist_song_info))) %>%
  kable_styling(bootstrap_options = c("basic","condensed","bordered")) %>%
  scroll_box(height = "100px")

title	artist	url	release_date	annotation_count
Stand Up	Cynthia Erivo	https://genius.com/Cynthia-erivo-stand-up-lyrics	2019-10-25	21
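The code above requests a `pageviews` column, yet the printed table has none. One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the field is absent at the top level of the response: `$` on a list returns `NULL` for a missing name, and `tibble()` drops `NULL` arguments silently. The sketch below reproduces the lookup with a hand-built list (not a real API response) and shows a hypothetical helper, `field_or_na()`, that keeps the column as `NA` instead.

```r
# Hand-built stand-in for `song_meta`; values copied from the table above.
song_meta_demo <- list(
  title = "Stand Up",
  release_date = "2019-10-25",
  annotation_count = 21
)

# A missing name yields NULL rather than an error.
pageviews_missing <- is.null(song_meta_demo$pageviews)

# Hypothetical helper: return NA when a field is missing so the column survives.
field_or_na <- function(x, name) {
  value <- x[[name]]
  if (is.null(value)) NA else value
}

song_info_demo <- data.frame(
  title = field_or_na(song_meta_demo, "title"),
  release_date = field_or_na(song_meta_demo, "release_date"),
  pageviews = field_or_na(song_meta_demo, "pageviews"),
  annotation_count = field_or_na(song_meta_demo, "annotation_count")
)
```

Keeping the column as `NA` makes the gap visible in the rendered table instead of hiding it.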

Scrape Lyrics from the Web Page

song_url <- artist_song_info$url # assign URL in the dataset to "song_url"
song_page <- read_html(song_url) # read the song lyrics from the URL provided

#Extract the lyrics text
song_lyrics <- song_page %>%
  html_nodes(xpath = "//div[contains(@class, 'Lyrics__Container')]") 

# Detect <br> tags and insert \n line breaks to preserve the original line structure shown on the website.
lyrics <- song_lyrics %>%
  html_children() %>%
  map_chr(~ {
    xml_find_all(.x, ".//br") %>% xml_add_sibling("text", "\n")
    xml_text(.x)
  }) %>%
  paste(collapse = "\n")

# Build a tibble that keeps only lines with actual content; blank or empty rows are dropped
lyric_lines <- strsplit(lyrics, "\n")[[1]]
lyrics_df <- tibble(
  line = seq_along(lyric_lines),
  text = lyric_lines
) %>%
  filter(nchar(text) > 0)

kable(lyrics_df, align = rep("l", ncol(lyrics_df)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))

line	text
1	25 ContributorsStand Up Lyrics
3	I been walkin’ with my face turned to the sun
7	Weight on my shoulders, a bullet in my gun
11	Oh, I got eyes in the back of my head just in case I have to run
15	I do what I can when I can while I can for my people
19	While the clouds roll back and the stars fill the night
25	That’s when I’m gonna stand up
26	Take my people with me
30	Together we are going to a brand new home
34	Far across the river
38	Can you hear freedom calling?
39	Calling me to answer
43	Gonna keep on keepin’ on
47	I can feel it in my bones
53	Early in the mornin’ before the sun begins to shine
57	Gonna start movin’ towards that separating line
61	I’m wading through muddy waters, you know I got a made-up mind
65	And I don’t mind if I lose any blood on the way to salvation
69	And I’ll fight with the strength that I got until I die
75	So I’m gonna stand up
76	Take my people with me
80	Together we are going to a brand new home
84	Far across the river
88	Can you hear freedom calling?
89	Calling me to answer
93	Gonna keep on keepin’ on
98	And I know what’s around the bend
102	Might be hard to face ’cause I’m alone
106	And I just might fail, but Lord knows I tried
107	Sure as stars fill up the sky
113	Stand up
114	Take my people with me
118	Together we are going to a brand new home
122	Far across the river
126	Can you hear freedom calling?
127	Calling me to answer
131	Gonna keep on keepin’ on
135	I’m gonna stand up
136	Take my people with me
140	Together we are going to a brand new home
144	Far across the river
148	Do you hear freedom calling?
149	Calling me to answer
153	Gonna keep on keepin’ on
157	I’m gonna stand up
158	Take my people with me
162	Together we are going to a brand new home
166	Far across the river
170	I hear freedom calling
171	Calling me to answer
175	Gonna keep on keepin’ on
179	I can feel it in my bones
184	I go to prepare a place for you
185	I go to prepare a place for you
186	I go to prepare a place for you
187	I go to prepare a place for you
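The first row of the table, “25 ContributorsStand Up Lyrics”, is page header text rather than a lyric, and it flows into every later step as tokens. Below is a minimal sketch of one way to drop it, assuming the header always consists of a contributor count followed by text ending in “Lyrics”. The pattern is a guess based on this single page, and `lyrics_demo` is a three-row excerpt rather than the full data.

```r
# Three-row excerpt of the scraped table above (header plus two lyric lines).
lyrics_demo <- data.frame(
  line = c(1L, 7L, 26L),
  text = c(
    "25 ContributorsStand Up Lyrics",
    "Weight on my shoulders, a bullet in my gun",
    "Take my people with me"
  ),
  stringsAsFactors = FALSE
)

# Assumed header shape: digits, a space, "Contributors", anything, then "Lyrics".
is_header <- grepl("^[0-9]+ Contributors.*Lyrics$", lyrics_demo$text)

# Keep only the rows that are not header text.
lyrics_clean <- lyrics_demo[!is_header, ]
```

In the full pipeline the same filter could be applied to `lyrics_df` before tokenizing, which would remove the tokens “25”, “contributorsstand”, “up”, and “lyrics” from line 1.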

Tokenize Lyrics

unnested_lyrics <- lyrics_df %>%
  unnest_tokens(word, text)

kable(head(unnested_lyrics,20), align = rep("l", ncol(unnested_lyrics))) %>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))%>%
  scroll_box(height = "300px")

line	word
1	25
1	contributorsstand
1	up
1	lyrics
3	i
3	been
3	walkin
3	with
3	my
3	face
3	turned
3	to
3	the
3	sun
7	weight
7	on
7	my
7	shoulders
7	a
7	bullet

Sentiment Analysis

NRC Lexicon Analysis

NRC Lexicon

The NRC lexicon categorizes words into emotions (anger, fear, joy, etc.) and general sentiments (positive/negative). Some words may have multiple labels.

#Count sentiment-tagged words
nrc_lex <- get_sentiments("nrc")
unnested_lyrics %>%
  inner_join(nrc_lex, by = "word") %>%
  count(word, sentiment, sort = TRUE)

## Warning in inner_join(., nrc_lex, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 14 of `x` matches multiple rows in `y`.
## ℹ Row 12060 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

## # A tibble: 53 × 3
##    word    sentiment        n
##    <chr>   <chr>        <int>
##  1 freedom joy              5
##  2 freedom positive         5
##  3 freedom trust            5
##  4 prepare anticipation     4
##  5 prepare positive         4
##  6 fill    trust            2
##  7 sun     anticipation     2
##  8 sun     joy              2
##  9 sun     positive         2
## 10 sun     surprise         2
## # ℹ 43 more rows
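The warning appears because a word can occur on several lyric lines and also carry several NRC labels, so both sides of the join have repeated keys. That is expected here, and the warning text itself names the remedy. The sketch below shows it on a hand-made stand-in for the lexicon (`toy_nrc` is not the real NRC data, although its labels for “freedom” and “gun” are copied from the output in this document). It assumes dplyr 1.1.0 or later, where the `relationship` argument is available.

```r
library(dplyr)
library(tibble)

# A few tokens with their line numbers; "freedom" occurs twice.
tokens_demo <- tibble(
  line = c(7L, 38L, 88L),
  word = c("gun", "freedom", "freedom")
)

# Hand-made stand-in for the NRC lexicon: three labels per word.
toy_nrc <- tibble(
  word = c("freedom", "freedom", "freedom", "gun", "gun", "gun"),
  sentiment = c("joy", "positive", "trust", "anger", "fear", "negative")
)

# Declaring the relationship documents the intent and silences the warning.
joined <- tokens_demo %>%
  inner_join(toy_nrc, by = "word", relationship = "many-to-many")

joined %>%
  count(word, sentiment, sort = TRUE)
```

Each of the three tokens matches three lexicon rows, so the join returns nine rows. This multiplication is worth keeping in mind when counts are later read as “words”.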

Total NRC sentiment counts.

unnested_lyrics %>%
  inner_join(nrc_lex, by = "word") %>%
  count(sentiment, sort = TRUE)

## Warning in inner_join(., nrc_lex, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 14 of `x` matches multiple rows in `y`.
## ℹ Row 12060 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

## # A tibble: 10 × 2
##    sentiment        n
##    <chr>        <int>
##  1 positive        17
##  2 trust           13
##  3 anticipation     9
##  4 joy              9
##  5 negative         8
##  6 fear             6
##  7 disgust          4
##  8 sadness          4
##  9 surprise         4
## 10 anger            3

Sentiment Bucketing

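Group the ten NRC categories into three buckets: Positive (positive, trust, joy), Negative (negative, fear, disgust, anger), and Neutral (everything else, which here means anticipation, surprise, and sadness).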

nrc_sentiment <- unnested_lyrics %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  mutate(sentiment_bucket = case_when(
    sentiment %in% c("positive", "trust", "joy") ~ "Positive",
    sentiment %in% c("negative", "fear", "disgust", "anger") ~ "Negative",
    TRUE ~ "Neutral"
  ))

## Warning in inner_join(., get_sentiments("nrc"), by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 14 of `x` matches multiple rows in `y`.
## ℹ Row 12060 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

kable(nrc_sentiment, align = rep('l', ncol(nrc_sentiment)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))

line	word	sentiment	sentiment_bucket
3	sun	anticipation	Neutral
3	sun	joy	Positive
3	sun	positive	Positive
3	sun	surprise	Neutral
3	sun	trust	Positive
7	weight	anticipation	Neutral
7	weight	disgust	Negative
7	weight	fear	Negative
7	weight	joy	Positive
7	weight	negative	Negative
7	weight	positive	Positive
7	weight	sadness	Neutral
7	weight	surprise	Neutral
7	weight	trust	Positive
7	gun	anger	Negative
7	gun	fear	Negative
7	gun	negative	Negative
11	case	fear	Negative
11	case	negative	Negative
11	case	sadness	Neutral
19	fill	trust	Positive
38	freedom	joy	Positive
38	freedom	positive	Positive
38	freedom	trust	Positive
53	sun	anticipation	Neutral
53	sun	joy	Positive
53	sun	positive	Positive
53	sun	surprise	Neutral
53	sun	trust	Positive
53	shine	positive	Positive
57	start	anticipation	Neutral
61	muddy	disgust	Negative
61	muddy	negative	Negative
65	lose	anger	Negative
65	lose	disgust	Negative
65	lose	fear	Negative
65	lose	negative	Negative
65	lose	sadness	Neutral
65	lose	surprise	Neutral
65	salvation	anticipation	Neutral
65	salvation	joy	Positive
65	salvation	positive	Positive
65	salvation	trust	Positive
69	fight	anger	Negative
69	fight	fear	Negative
69	fight	negative	Negative
69	strength	positive	Positive
69	strength	trust	Positive
69	die	fear	Negative
69	die	negative	Negative
69	die	sadness	Neutral
88	freedom	joy	Positive
88	freedom	positive	Positive
88	freedom	trust	Positive
106	lord	disgust	Negative
106	lord	negative	Negative
106	lord	positive	Positive
106	lord	trust	Positive
107	fill	trust	Positive
107	sky	positive	Positive
126	freedom	joy	Positive
126	freedom	positive	Positive
126	freedom	trust	Positive
148	freedom	joy	Positive
148	freedom	positive	Positive
148	freedom	trust	Positive
170	freedom	joy	Positive
170	freedom	positive	Positive
170	freedom	trust	Positive
184	prepare	anticipation	Neutral
184	prepare	positive	Positive
185	prepare	anticipation	Neutral
185	prepare	positive	Positive
186	prepare	anticipation	Neutral
186	prepare	positive	Positive
187	prepare	anticipation	Neutral
187	prepare	positive	Positive

NRC Sentiment Distribution

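Plot each bucket's share of the rows in nrc_sentiment as a percentage. Each row is one word-sentiment pair, so a word with several NRC labels contributes more than once.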

nrc_sentiment %>%
  count(sentiment_bucket) %>%
  mutate(percent = n / sum(n)*100) %>%
  ggplot(aes(x = sentiment_bucket, y = percent, fill = sentiment_bucket)) +
  geom_col(width = 0.75, show.legend = FALSE) +
  scale_fill_manual(values = c("Positive" = "turquoise","Negative" = "maroon", "Neutral" = "grey"))+
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(title = "Sentiment Distribution in Lyrics (NRC Lexicon)",
       subtitle = "Based on word-level sentiment analysis",
       x = "Sentiment Category", y = "Percentage of Words")+
  geom_text(aes(label = paste0(round(percent,2), "%")), vjust = -0.5)
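The percentages on the plot can be checked by hand from the NRC category totals printed earlier. The sketch below is a base R rework of the bucketing step, not the chunk used in this document. It copies the ten counts from that output and applies the same bucket assignment; `bucket_of()` is a hypothetical helper written for this check.

```r
# Category totals copied from the NRC count output earlier in this document.
nrc_counts <- c(
  positive = 17, trust = 13, anticipation = 9, joy = 9, negative = 8,
  fear = 6, disgust = 4, sadness = 4, surprise = 4, anger = 3
)

# Hypothetical helper mirroring the case_when() bucket assignment.
bucket_of <- function(sentiment) {
  ifelse(sentiment %in% c("positive", "trust", "joy"), "Positive",
    ifelse(sentiment %in% c("negative", "fear", "disgust", "anger"), "Negative", "Neutral")
  )
}

# Sum the counts within each bucket, then convert to percentages.
bucket_totals <- vapply(split(nrc_counts, bucket_of(names(nrc_counts))), sum, numeric(1))
bucket_percent <- round(100 * bucket_totals / sum(bucket_totals), 2)
bucket_percent
```

With these inputs the buckets hold 39 Positive, 21 Negative, and 17 Neutral rows out of 77, or roughly 50.65%, 27.27%, and 22.08%.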

Word Cloud Visualizations

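This word cloud visualizes the most frequent NRC-tagged words in the lyrics after removing stop words. Words with higher counts are displayed in larger font sizes, making it easy to identify dominant themes or repeated expressions. Because “nrc_sentiment” holds one row per word-sentiment pair, a word’s count also grows with the number of NRC categories attached to it, not only with how often it appears in the lyrics.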

nrc_sentiment %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n))#, max.words = 100))

## Joining with `by = join_by(word)`

This word cloud shows how frequently sentiment-tagged words appear in the lyrics. Larger words occur more often, and their color indicates whether they’re associated with positive, negative, or neutral emotion. While the layout loosely groups words near sentiment labels, it’s the color that truly defines their emotional tone.

#Forcing a level so that the assigned colors align with each bucket listed.
nrc_sentiment$sentiment_bucket <- factor(nrc_sentiment$sentiment_bucket,
                                         levels = c("Positive", "Negative", "Neutral"))
word_freq2 <- nrc_sentiment %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sentiment_bucket, sort = TRUE)

word_matrix <- acast(word_freq2, word ~ sentiment_bucket, value.var = "n", fill = 0)

# Plot comparison word cloud with custom colors
comparison.cloud(word_matrix,
                 colors = c("turquoise", "maroon", "grey"),  # Positive, Negative, Neutral
                # max.words = 100,
                 title.size = 1.5,
                 main = "Word Cloud by Sentiment Bucket")
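The comparison cloud is drawn from a word-by-bucket matrix in which each cell holds a count and missing combinations are filled with 0. The sketch below shows that shape on three words, using base R `xtabs()` as a stand-in for `acast()`. The counts were tallied by hand from the bucketing table above, and `word_freq_demo` is an illustrative excerpt, not the full `word_freq2`.

```r
# Counts tallied from the bucketing table: "sun" occurs on two lines with
# three Positive and two Neutral labels each; "lord" and "gun" occur once.
word_freq_demo <- data.frame(
  word = c("sun", "sun", "lord", "lord", "gun"),
  sentiment_bucket = factor(
    c("Positive", "Neutral", "Positive", "Negative", "Negative"),
    levels = c("Positive", "Negative", "Neutral")
  ),
  n = c(6, 4, 2, 2, 3)
)

# Rows are words, columns follow the factor levels, absent pairs become 0.
word_matrix_demo <- xtabs(n ~ word + sentiment_bucket, data = word_freq_demo)
word_matrix_demo
```

The column order follows the factor levels, which is why fixing the levels first keeps the color vector (Positive, Negative, Neutral) aligned with the buckets.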

Detailed Sentiment Breakdown

Here’s a detailed breakdown of each sentiment bucket, showing how many words are associated with each NRC sentiment category. The bar lengths represent word counts, and the colors indicate which bucket (Positive, Negative, or Neutral) each sentiment belongs to.

nrc_sentiment %>%
  count(sentiment_bucket, sentiment) %>%
  ggplot(aes(x = reorder(sentiment, n), y = n, fill = sentiment_bucket)) +
  geom_col(width = 0.75, show.legend = TRUE) +
  coord_flip() +
  facet_wrap(~ sentiment_bucket, scales = "free_y") +
  scale_fill_manual(values = c("Positive" = "turquoise","Negative" = "maroon", "Neutral" = "grey"),
                    name = "Sentiment Category")+
  labs(title = "Detailed NRC Sentiment Breakdown",
       x = "Sentiment", y = "Word Count")

Bing Lexicon Analysis

Bing Lexicon

The Bing lexicon provides a simpler binary classification: positive or negative.

This code analyzes the lyrics using the Bing sentiment lexicon. It joins each word with its sentiment label (positive or negative), counts how often each sentiment-tagged word appears per line, and then calculates a net sentiment score by subtracting the number of negative words from positive ones.

bing_sentiment <- unnested_lyrics %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, index = line,sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>% 
  mutate(sentiment = positive - negative)

## Joining with `by = join_by(word)`

head(bing_sentiment) #let's preview this data

## # A tibble: 6 × 5
##   word    index negative positive sentiment
##   <chr>   <int>    <int>    <int>     <int>
## 1 die        69        1        0        -1
## 2 fail      106        1        0        -1
## 3 freedom    38        0        1         1
## 4 freedom    88        0        1         1
## 5 freedom   126        0        1         1
## 6 freedom   148        0        1         1
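Because the chunk above counts by word and line, each row describes a single word and the score is simply +1 or -1. A per-line net score, closer to the pattern used in Chapter 2 of the book, drops `word` from the `count()` call. The sketch below uses a hand-built table of already-matched words (`matched_demo`, with rows copied from the preview above) instead of the real Bing join.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Words that matched the lexicon, with labels as shown in the preview above.
matched_demo <- tibble(
  line = c(38L, 69L, 88L, 106L),
  word = c("freedom", "die", "freedom", "fail"),
  sentiment = c("positive", "negative", "positive", "negative")
)

# Count labels per line, spread them into columns, and take the difference.
net_demo <- matched_demo %>%
  count(index = line, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)

net_demo
```

One caveat: `positive - negative` only works when both labels occur somewhere in the data, since `pivot_wider()` creates a column only for labels it has seen.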

#Calculate the percentage of negative and positive sentiments for the bing lexicon
bing_percent <- unnested_lyrics %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment) %>%
  mutate(percent = round(n / sum(n) *100,2))
kable(bing_percent, align = rep('l', ncol(bing_percent))) %>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))

sentiment	n	percent
negative	5	45.45
positive	6	54.55

Bing Sentiment Distribution

We now plot the percentages of negative and positive words.

bing_percent <- bing_percent %>%
  mutate(sentiment = recode(sentiment,
                            "positive" = "Positive",
                            "negative" = "Negative"))

ggplot(bing_percent, aes(x = sentiment, y = percent, fill = sentiment)) +
  geom_col(width = 0.75, show.legend = FALSE) +
  scale_fill_manual(values = c("Positive" = "turquoise", "Negative" = "maroon")) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(title = "Sentiment Distribution (Bing Lexicon)",
       x = "Sentiment", y = "Percentage of Words") +
  geom_text(aes(label = paste0(round(percent, 2), "%")), vjust = -0.5)

AFINN Lexicon Analysis

AFINN Lexicon

The AFINN lexicon assigns numeric scores from -5 (very negative) to +5 (very positive) to each word.

This code analyzes the lyrics using the AFINN sentiment lexicon. It joins each word in the lyrics with its AFINN score, then groups by word and line index to calculate a cumulative sentiment value per word occurrence. Each word is then bucketed into “Positive”, “Negative”, or “Neutral” based on its score.

afinn_sentiment <- unnested_lyrics %>%
  inner_join(get_sentiments("afinn")) %>%
  group_by(word, index = line) %>%
  summarise(sentiment = sum(value),.groups = "drop") %>% 
  mutate(sentiment_bucket = case_when(
    sentiment > 0 ~ "Positive",
    sentiment < 0 ~ "Negative",
    TRUE ~ "Neutral"))

## Joining with `by = join_by(word)`

kable(afinn_sentiment, align = rep('l', ncol(afinn_sentiment)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))

word	index	sentiment	sentiment_bucket
alone	102	-2	Negative
die	69	-3	Negative
fail	106	-2	Negative
fight	69	-1	Negative
freedom	38	2	Positive
freedom	88	2	Positive
freedom	126	2	Positive
freedom	148	2	Positive
freedom	170	2	Positive
gun	7	-1	Negative
hard	102	-1	Negative
strength	69	2	Positive

AFINN Sentiment Distribution

We now plot the percentages of negative and positive AFINN sentiments.

  afinn_sentiment %>%
  count(sentiment_bucket) %>%
  mutate(percent = n / sum(n)*100) %>%
  ggplot(aes(x = sentiment_bucket, y = percent, fill = sentiment_bucket)) +
  geom_col(width = 0.75, show.legend = FALSE) +
  scale_fill_manual(values = c("Positive" = "turquoise","Negative" = "maroon", "Neutral" = "grey"))+
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(title = "Sentiment Distribution in Lyrics (AFINN Lexicon)",
       x = "Sentiment Category", y = "Percentage of Words")+
  geom_text(aes(label = paste0(round(percent,2), "%")), vjust = -0.5)

AFINN Score Summary

Because the plot shows an even split (6 negative and 6 positive word occurrences, 50% each), I sum the AFINN scores to determine whether the overall sentiment is positive or negative.

afinn_summary <- unnested_lyrics %>%
  inner_join(get_sentiments("afinn")) %>%
  summarise(
    total_score = sum(value),
    total_positive = sum(value[value > 0]),
    total_negative = sum(value[value < 0])
  )

## Joining with `by = join_by(word)`

kable(afinn_summary, align = rep('l', ncol(afinn_summary)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))

total_score	total_positive	total_negative
2	12	-10
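These totals can be reproduced directly from the twelve rows of the AFINN table above. The sketch below is a base R check on values copied from that table, not a call to the lexicon itself; `afinn_demo` is a hand-entered copy.

```r
# Word scores copied from the AFINN table above ("freedom" occurs five times).
afinn_demo <- data.frame(
  word = c("alone", "die", "fail", "fight", rep("freedom", 5), "gun", "hard", "strength"),
  value = c(-2, -3, -2, -1, rep(2, 5), -1, -1, 2)
)

# Sum the positive and negative scores separately, then overall.
total_positive <- sum(afinn_demo$value[afinn_demo$value > 0])
total_negative <- sum(afinn_demo$value[afinn_demo$value < 0])
total_score <- sum(afinn_demo$value)

# Number of negative and positive occurrences, ignoring magnitude.
sign_counts <- table(sign(afinn_demo$value))
```

The occurrence counts are tied at six each, so the net score of 2 comes entirely from the magnitudes: five occurrences of “freedom” at +2 account for 10 of the 12 positive points.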

Interpretation

The AFINN summary shows a total sentiment score of +2: the positive word scores sum to 12 and the negative ones to -10, so positive strength narrowly outweighs negative. The AFINN lexicon therefore interprets the overall emotional tone of the song as slightly positive.

Line-Level Sentiment Analysis

We analyze sentiment at the sentence/line level using the syuzhet package’s “afinn”, “bing”, and “syuzhet” methods.

lyrics_df2 <- lyrics_df %>%
  mutate(
    syuzhet_score = get_sentiment(text, method = "syuzhet"),
    afinn_score = get_sentiment(text, method = "afinn"),
    bing_score = get_sentiment(text, method = "bing")
  )
kable(lyrics_df2, align = rep('l', ncol(lyrics_df2)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))

line	text	syuzhet_score	afinn_score	bing_score
1	25 ContributorsStand Up Lyrics	0.00	0	0
3	I been walkin’ with my face turned to the sun	0.60	0	0
7	Weight on my shoulders, a bullet in my gun	-0.50	-1	0
11	Oh, I got eyes in the back of my head just in case I have to run	0.00	0	0
15	I do what I can when I can while I can for my people	0.00	0	0
19	While the clouds roll back and the stars fill the night	0.00	0	0
25	That’s when I’m gonna stand up	0.00	0	0
26	Take my people with me	0.00	0	0
30	Together we are going to a brand new home	0.80	0	0
34	Far across the river	0.00	0	0
38	Can you hear freedom calling?	0.75	2	1
39	Calling me to answer	0.00	0	0
43	Gonna keep on keepin’ on	0.00	0	0
47	I can feel it in my bones	0.00	0	0
53	Early in the mornin’ before the sun begins to shine	1.10	0	1
57	Gonna start movin’ towards that separating line	0.00	0	0
61	I’m wading through muddy waters, you know I got a made-up mind	-0.25	-2	-1
65	And I don’t mind if I lose any blood on the way to salvation	0.75	2	-1
69	And I’ll fight with the strength that I got until I die	-0.75	-2	-1
75	So I’m gonna stand up	0.00	0	0
76	Take my people with me	0.00	0	0
80	Together we are going to a brand new home	0.80	0	0
84	Far across the river	0.00	0	0
88	Can you hear freedom calling?	0.75	2	1
89	Calling me to answer	0.00	0	0
93	Gonna keep on keepin’ on	0.00	0	0
98	And I know what’s around the bend	0.00	0	0
102	Might be hard to face ’cause I’m alone	-0.85	-3	-1
106	And I just might fail, but Lord knows I tried	-0.75	-2	-1
107	Sure as stars fill up the sky	0.00	0	0
113	Stand up	0.00	0	0
114	Take my people with me	0.00	0	0
118	Together we are going to a brand new home	0.80	0	0
122	Far across the river	0.00	0	0
126	Can you hear freedom calling?	0.75	2	1
127	Calling me to answer	0.00	0	0
131	Gonna keep on keepin’ on	0.00	0	0
135	I’m gonna stand up	0.00	0	0
136	Take my people with me	0.00	0	0
140	Together we are going to a brand new home	0.80	0	0
144	Far across the river	0.00	0	0
148	Do you hear freedom calling?	0.75	2	1
149	Calling me to answer	0.00	0	0
153	Gonna keep on keepin’ on	0.00	0	0
157	I’m gonna stand up	0.00	0	0
158	Take my people with me	0.00	0	0
162	Together we are going to a brand new home	0.80	0	0
166	Far across the river	0.00	0	0
170	I hear freedom calling	0.75	2	1
171	Calling me to answer	0.00	0	0
175	Gonna keep on keepin’ on	0.00	0	0
179	I can feel it in my bones	0.00	0	0
184	I go to prepare a place for you	0.10	0	0
185	I go to prepare a place for you	0.10	0	0
186	I go to prepare a place for you	0.10	0	0
187	I go to prepare a place for you	0.10	0	0

Comparison of Methods

Graph each method’s score by line to compare the three methods.

#Reshape data to long format to plot comparison
lyrics_long <- lyrics_df2 %>%
  select(line, afinn_score, bing_score, syuzhet_score) %>%
  pivot_longer(cols = c(afinn_score, bing_score, syuzhet_score), names_to = "method", values_to = "score") %>%
 mutate(method = str_remove(method, "_score") %>% str_to_title())

#Plot the data
ggplot(lyrics_long, aes(x = line, y = score, color = method)) +
  geom_line(linewidth = 0.5) +
  geom_point(size = 1.5, alpha = 0.7) +
  labs(
    title = "Sentiment Scores Across Lyrics",
    subtitle = "Comparing three sentiment analysis methods",
    x = "Lyric Line Number",
    y = "Sentiment Score",
    color = "Method"
  ) +
  theme_minimal()
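The reshape step is what lets a single `color = method` aesthetic draw three series: the three score columns are stacked into one `score` column with a `method` label beside it. A small sketch on two rows copied from the line-level table (lines 38 and 69) shows the resulting shape. `scores_demo` is an excerpt, and `ends_with()` is used here as a shorthand for listing the three columns.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Two rows copied from the line-level score table above.
scores_demo <- tibble(
  line = c(38L, 69L),
  syuzhet_score = c(0.75, -0.75),
  afinn_score = c(2, -2),
  bing_score = c(1, -1)
)

# Stack the score columns, then tidy the labels ("afinn_score" becomes "Afinn").
scores_long <- scores_demo %>%
  pivot_longer(cols = ends_with("_score"), names_to = "method", values_to = "score") %>%
  mutate(method = str_to_title(str_remove(method, "_score")))

scores_long
```

Two input rows with three score columns become six rows, one per line and method.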

Sentiment Summary

The total score is calculated for each sentiment method.

sentiment_score_summary <- lyrics_df2 %>%
  summarise(
    total_afinn_score = sum(afinn_score),
    total_bing_score = sum(bing_score),
    total_syuzhet_score = sum(syuzhet_score)
  )
kable(sentiment_score_summary, align = rep('l', ncol(sentiment_score_summary)))%>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))

total_afinn_score	total_bing_score	total_syuzhet_score
2	1	7.5
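Totals depend on how many lines are summed, so dividing by the number of lines gives a rough sense of how strong the signal is per line. The sketch below uses the three totals above and the 56 rows of the line-level table. The per-line average is an added convenience, not part of the original analysis, and the three methods use different scales, so the averages are not directly comparable with one another.

```r
# Totals copied from the summary table above; 56 is the row count of the line-level table.
totals <- c(afinn = 2, bing = 1, syuzhet = 7.5)
n_lines <- 56

# Average score per lyric line for each method.
per_line <- round(totals / n_lines, 3)
per_line
```

All three averages are positive but close to zero (about 0.036, 0.018, and 0.134), which suggests a mildly rather than strongly positive reading.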

Key Finding:

All three methods produce positive total scores (AFINN 2, Bing 1, syuzhet 7.5), so “Stand Up” by Cynthia Erivo is classified as a positive song by every line-level approach, although the AFINN and Bing margins are small.

CONCLUSION:

This analysis examined the song “Stand Up” by Cynthia Erivo using multiple sentiment analysis approaches:

Word-level analysis using NRC, Bing, and AFINN lexicons
Line-level analysis using the syuzhet package with three different methods

Key Findings:

The NRC lexicon shows a predominance of positive categories: positive (17), trust (13), and joy (9) outnumber negative (8), fear (6), disgust (4), and anger (3)
The Bing lexicon labels 54.55% of matched words as positive (6 of 11)
The AFINN lexicon produces a positive total score (Total Score = 2)
All line-level sentiment methods (afinn, bing, syuzhet) yield positive total scores

Conclusion:

Based on the evidence across multiple sentiment analysis methods, we can conclude that “Stand Up” by Cynthia Erivo carries a net positive sentiment, although the word-level margins under the Bing and AFINN lexicons are narrow.

References

Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach. O’Reilly Media. https://www.tidytextmining.com

Assignment 10A - Sentiment Analysis

Paula Brown

2025-11-01