In this R Markdown file, I will explore the tidytext package and demonstrate how multiple sentiment lexicons can be used to infer whether a section of text conveys a positive or negative sentiment.
I’ll begin by configuring the R Markdown settings and loading the necessary packages for analysis. The primary example code from Chapter 2 of Text Mining with R: A Tidy Approach by Julia Silge and David Robinson will serve as my foundation, which I will then extend. This book provides a practical framework for applying tidy principles to text analysis and is available online at https://www.tidytextmining.com (Silge & Robinson, 2017).
This is a setup chunk that configures global options for all subsequent code chunks in the document. “include=FALSE” means the chunk itself won’t appear in the final rendered document (neither code nor output).
“knitr::opts_chunk$set(echo = TRUE)” sets the default behavior so that code will be shown (echoed) in the output for all other chunks unless overridden. It’s useful for transparency and teaching.
This sets the CRAN mirror to a specific URL (https://cran.rstudio.com) so that any package installations during the session use this repository.
“echo=TRUE” means the code will be displayed in the rendered document, which can be helpful for reproducibility or documentation.
options(repos = c(CRAN = "https://cran.rstudio.com"))
The following code automatically installs and loads required packages.
req_packages <- c("DBI","RMySQL","dplyr","dbplyr","knitr","tidyr", "readr", "stringr","tibble", "rmarkdown", "purrr", "lubridate", "here", "httr2", "httr", "janitor", "RCurl","rvest","xml2","jsonlite","kableExtra", "tidytext","janeaustenr", "geniusr","sentimentr","syuzhet","ggplot2","ggwordcloud","wordcloud2","wordcloud","reshape2")
for (pkg in req_packages) {
  # require() returns FALSE when the package cannot be loaded
  if (!require(pkg, character.only = TRUE)) {
    message("Installing package: ", pkg)
    install.packages(pkg, dependencies = TRUE)
  } else {
    message(pkg, " already installed.")
  }
  library(pkg, character.only = TRUE)
}
## Loading required package: DBI
## DBI already installed.
## Loading required package: RMySQL
## RMySQL already installed.
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## dplyr already installed.
## Loading required package: dbplyr
##
## Attaching package: 'dbplyr'
## The following objects are masked from 'package:dplyr':
##
## ident, sql
## dbplyr already installed.
## Loading required package: knitr
## knitr already installed.
## Loading required package: tidyr
## tidyr already installed.
## Loading required package: readr
## readr already installed.
## Loading required package: stringr
## stringr already installed.
## Loading required package: tibble
## tibble already installed.
## Loading required package: rmarkdown
## rmarkdown already installed.
## Loading required package: purrr
## purrr already installed.
## Loading required package: lubridate
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
## lubridate already installed.
## Loading required package: here
## here() starts at /Users/paulabrown/Documents/CUNY SPS- Data 607/Week 10 Assignments
## here already installed.
## Loading required package: httr2
## httr2 already installed.
## Loading required package: httr
## httr already installed.
## Loading required package: janitor
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
## janitor already installed.
## Loading required package: RCurl
##
## Attaching package: 'RCurl'
## The following object is masked from 'package:tidyr':
##
## complete
## RCurl already installed.
## Loading required package: rvest
##
## Attaching package: 'rvest'
## The following object is masked from 'package:readr':
##
## guess_encoding
## rvest already installed.
## Loading required package: xml2
##
## Attaching package: 'xml2'
## The following object is masked from 'package:httr2':
##
## url_parse
## xml2 already installed.
## Loading required package: jsonlite
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
##
## flatten
## jsonlite already installed.
## Loading required package: kableExtra
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
## kableExtra already installed.
## Loading required package: tidytext
## tidytext already installed.
## Loading required package: janeaustenr
## janeaustenr already installed.
## Loading required package: geniusr
## geniusr already installed.
## Loading required package: sentimentr
## sentimentr already installed.
## Loading required package: syuzhet
##
## Attaching package: 'syuzhet'
## The following object is masked from 'package:sentimentr':
##
## get_sentences
## syuzhet already installed.
## Loading required package: ggplot2
## ggplot2 already installed.
## Loading required package: ggwordcloud
## ggwordcloud already installed.
## Loading required package: wordcloud2
## wordcloud2 already installed.
## Loading required package: wordcloud
## Loading required package: RColorBrewer
## wordcloud already installed.
## Loading required package: reshape2
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
## reshape2 already installed.
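The install-and-load loop above attaches each package through `require()` and then again through `library()`. One possible alternative is to check for missing packages first without attaching anything. This is only a sketch: `missing_packages()` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper (not from the original analysis): return the entries of
# `pkgs` that are not installed, without attaching any of them.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Example with two packages that ship with R, so nothing is reported missing.
to_install <- missing_packages(c("stats", "utils"))
print(to_install)

# In this document's setting, one could then install and attach in two steps:
# to_install <- missing_packages(req_packages)
# if (length(to_install) > 0) install.packages(to_install, dependencies = TRUE)
# invisible(lapply(req_packages, library, character.only = TRUE))
```

Separating the check from the attach step keeps the console output shorter, at the cost of one extra helper.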
Following the approach from Chapter 2, we tokenize Jane Austen’s works:
tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(text,
                                regex("^chapter [\\divxlc]",
                                      ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)
# Preview the first 5 rows of the tidied book data
kable(head(tidy_books, 5), align = rep('l', ncol(tidy_books))) %>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
| book | linenumber | chapter | word |
|---|---|---|---|
| Sense & Sensibility | 1 | 0 | sense |
| Sense & Sensibility | 1 | 0 | and |
| Sense & Sensibility | 1 | 0 | sensibility |
| Sense & Sensibility | 3 | 0 | by |
| Sense & Sensibility | 3 | 0 | jane |
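To make the tokenization step easier to follow, here is a small sketch of the same pattern on four made-up lines. The text and the names `toy_lines` and `toy_tokens` are assumptions for illustration only. The functions are the same ones used in the chunk above.

```r
library(dplyr)
library(stringr)
library(tibble)
library(tidytext)

# Made-up lines that mimic the layout of a novel's text column
toy_lines <- tibble(
  text = c("CHAPTER I", "The family of Dashwood", "CHAPTER II", "Elinor spoke first")
) %>%
  mutate(
    linenumber = row_number(),
    # A line starting with "chapter" plus a digit or roman numeral bumps the counter
    chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]", ignore_case = TRUE)))
  )

# One row per word; unnest_tokens() lowercases the tokens by default
toy_tokens <- toy_lines %>%
  unnest_tokens(word, text)
print(toy_tokens)
```

Each token keeps the `linenumber` and `chapter` of the line it came from, which is what allows sentiment to be tracked by position later on.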
I now explore text mining using a different corpus. The corpus will
consist of song lyrics scraped from the internet using an API token from
Genius.com via the geniusr package.
Retrieve the artist ID via the “search_artist()” function to locate song lyrics.
#genius_api_tk <- Sys.getenv("GENIUS_API_TOKEN")
#genius_token(genius_api_tk) #Since there are issues with the genius_token() function in the package geniusr, we will do a workaround that may be redundant but gets the job done.
Sys.setenv("GENIUS_API_TOKEN"=Sys.getenv("GENIUS_API_TOKEN"))
# Search artist
artist_raw <- search_artist("Cynthia Erivo")
# Convert to data frame without altering column names
artist_df <- as.data.frame(artist_raw)
# Display with styling
kable(artist_df, align = rep("l", ncol(artist_df))) %>%
kable_styling(position = "left", bootstrap_options = c("basic", "bordered", "condensed")) %>%
scroll_box(height = "150px")
| artist_id | artist_name | artist_url |
|---|---|---|
| 263409 | Cynthia Erivo | https://genius.com/artists/Cynthia-erivo |
| 26507 | Ariana Grande | https://genius.com/artists/Ariana-grande |
Find a song to analyze the lyrics.
#Search songs
songs <- search_song("Stand Up")
#Filter songs for artist ID = 263409 (Cynthia Erivo)
artist_songs <- songs %>%
filter(artist_id == 263409)
#Display filtered songs
kable(artist_songs, align = rep("l", ncol(artist_songs))) %>% # "align = rep("l", ncol(your_table)))" aligns text to the left
kable_styling(position = "left", bootstrap_options = c("basic","condensed","bordered"))%>%
scroll_box(height = "100px")
| song_id | song_name | song_lyrics_url | artist_id | artist_name |
|---|---|---|---|---|
| 4971319 | Stand Up | https://genius.com/Cynthia-erivo-stand-up-lyrics | 263409 | Cynthia Erivo |
# Read the access token. Note that this is GENIUS_ACCESS_TOKEN, a different
# variable from the GENIUS_API_TOKEN used above, so both must be set.
token <- Sys.getenv("GENIUS_ACCESS_TOKEN")
# Request the song metadata directly from the Genius API
res <- GET(
  url = "https://api.genius.com/songs/4971319",
  add_headers(Authorization = paste("Bearer", token))
)
# Parse the response body; named song_json to avoid masking httr::content()
song_json <- content(res, as = "parsed", simplifyVector = TRUE)
#str(song_json)
song_meta <- song_json$response$song
artist_song_info <- tibble(
  title = song_meta$title,
  artist = song_meta$artist_names,
  url = song_meta$url,
  release_date = song_meta$release_date,
  pageviews = song_meta$pageviews,
  annotation_count = song_meta$annotation_count
)
# Align using the column count of the table actually being displayed
kable(artist_song_info, align = rep("l", ncol(artist_song_info))) %>%
  kable_styling(bootstrap_options = c("basic","condensed","bordered")) %>%
  scroll_box(height = "100px")
| title | artist | url | release_date | annotation_count |
|---|---|---|---|---|
| Stand Up | Cynthia Erivo | https://genius.com/Cynthia-erivo-stand-up-lyrics | 2019-10-25 | 21 |
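The request above depends on a token read from an environment variable. If that variable is empty, the request is sent with an empty bearer token. A minimal sketch of an early check follows. `read_token()` and `DEMO_GENIUS_TOKEN` are hypothetical names, and the value is a dummy, not a real token.

```r
# Hypothetical helper: read a token from an environment variable and stop
# early with a clear message if it is missing or empty.
read_token <- function(var_name) {
  token <- Sys.getenv(var_name, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var_name, " is not set.", call. = FALSE)
  }
  token
}

# Demonstration with a dummy variable and value (not a real token)
Sys.setenv(DEMO_GENIUS_TOKEN = "dummy-value")
demo_token <- read_token("DEMO_GENIUS_TOKEN")
auth_header <- paste("Bearer", demo_token)
print(auth_header)
```

In this document the same check could be applied to `GENIUS_ACCESS_TOKEN` before calling `GET()`.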
song_url <- artist_song_info$url # assign URL in the dataset to "song_url"
song_page <- read_html(song_url) # read the song lyrics from the URL provided
#Extract the lyrics text
song_lyrics <- song_page %>%
  html_nodes(xpath = "//div[contains(@class, 'Lyrics__Container')]")
# Detect <br> tags and insert \n line breaks to preserve the original line structure shown on the website.
lyrics <- song_lyrics %>%
  html_children() %>%
  map_chr(~ {
    xml_find_all(.x, ".//br") %>% xml_add_sibling("text", "\n")
    xml_text(.x)
  }) %>%
  paste(collapse = "\n")
# Split the lyrics once, number every line, then keep only lines with actual
# content (blank rows are dropped, so the line numbers have gaps)
lyric_lines <- strsplit(lyrics, "\n")[[1]]
lyrics_df <- tibble(
  line = seq_along(lyric_lines),
  text = lyric_lines
) %>%
  filter(nchar(text) > 0)
kable(lyrics_df, align = rep("l", ncol(lyrics_df))) %>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
| line | text |
|---|---|
| 1 | 25 ContributorsStand Up Lyrics |
| 3 | I been walkin’ with my face turned to the sun |
| 7 | Weight on my shoulders, a bullet in my gun |
| 11 | Oh, I got eyes in the back of my head just in case I have to run |
| 15 | I do what I can when I can while I can for my people |
| 19 | While the clouds roll back and the stars fill the night |
| 25 | That’s when I’m gonna stand up |
| 26 | Take my people with me |
| 30 | Together we are going to a brand new home |
| 34 | Far across the river |
| 38 | Can you hear freedom calling? |
| 39 | Calling me to answer |
| 43 | Gonna keep on keepin’ on |
| 47 | I can feel it in my bones |
| 53 | Early in the mornin’ before the sun begins to shine |
| 57 | Gonna start movin’ towards that separating line |
| 61 | I’m wading through muddy waters, you know I got a made-up mind |
| 65 | And I don’t mind if I lose any blood on the way to salvation |
| 69 | And I’ll fight with the strength that I got until I die |
| 75 | So I’m gonna stand up |
| 76 | Take my people with me |
| 80 | Together we are going to a brand new home |
| 84 | Far across the river |
| 88 | Can you hear freedom calling? |
| 89 | Calling me to answer |
| 93 | Gonna keep on keepin’ on |
| 98 | And I know what’s around the bend |
| 102 | Might be hard to face ’cause I’m alone |
| 106 | And I just might fail, but Lord knows I tried |
| 107 | Sure as stars fill up the sky |
| 113 | Stand up |
| 114 | Take my people with me |
| 118 | Together we are going to a brand new home |
| 122 | Far across the river |
| 126 | Can you hear freedom calling? |
| 127 | Calling me to answer |
| 131 | Gonna keep on keepin’ on |
| 135 | I’m gonna stand up |
| 136 | Take my people with me |
| 140 | Together we are going to a brand new home |
| 144 | Far across the river |
| 148 | Do you hear freedom calling? |
| 149 | Calling me to answer |
| 153 | Gonna keep on keepin’ on |
| 157 | I’m gonna stand up |
| 158 | Take my people with me |
| 162 | Together we are going to a brand new home |
| 166 | Far across the river |
| 170 | I hear freedom calling |
| 171 | Calling me to answer |
| 175 | Gonna keep on keepin’ on |
| 179 | I can feel it in my bones |
| 184 | I go to prepare a place for you |
| 185 | I go to prepare a place for you |
| 186 | I go to prepare a place for you |
| 187 | I go to prepare a place for you |
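The scraping step above turns `<br>` tags into line breaks by adding sibling nodes. As a possible alternative, `html_text2()` in rvest converts `<br>` to `\n` on its own. The sketch below uses a made-up HTML fragment rather than the real Genius page, and `toy_page` and `toy_lines` are illustrative names.

```r
library(rvest)

# Made-up HTML fragment standing in for a lyrics container
toy_page <- read_html(
  "<div class='Lyrics__Container'>First line<br>Second line</div>"
)

toy_lines <- toy_page %>%
  html_element(xpath = "//div[contains(@class, 'Lyrics__Container')]") %>%
  html_text2() %>%   # <br> tags become "\n"
  strsplit("\n") %>%
  unlist()
print(toy_lines)
```

Whether this produces the same line structure on the live page is not verified here. The page markup may contain nested elements that behave differently.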
unnested_lyrics <- lyrics_df %>%
unnest_tokens(word, text)
kable(head(unnested_lyrics,20), align = rep("l", ncol(unnested_lyrics))) %>%
kable_styling(bootstrap_options = c("basic","bordered","condensed"))%>%
scroll_box(height = "300px")
| line | word |
|---|---|
| 1 | 25 |
| 1 | contributorsstand |
| 1 | up |
| 1 | lyrics |
| 3 | i |
| 3 | been |
| 3 | walkin |
| 3 | with |
| 3 | my |
| 3 | face |
| 3 | turned |
| 3 | to |
| 3 | the |
| 3 | sun |
| 7 | weight |
| 7 | on |
| 7 | my |
| 7 | shoulders |
| 7 | a |
| 7 | bullet |
The NRC lexicon categorizes words into emotions (anger, fear, joy, etc.) and general sentiments (positive/negative). Some words may have multiple labels.
#Count sentiment-tagged words
nrc_lex <- get_sentiments("nrc")
unnested_lyrics %>%
inner_join(nrc_lex, by = "word")%>%
count(word,sentiment, sort=TRUE)
## Warning in inner_join(., nrc_lex, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 14 of `x` matches multiple rows in `y`.
## ℹ Row 12060 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
## # A tibble: 53 × 3
## word sentiment n
## <chr> <chr> <int>
## 1 freedom joy 5
## 2 freedom positive 5
## 3 freedom trust 5
## 4 prepare anticipation 4
## 5 prepare positive 4
## 6 fill trust 2
## 7 sun anticipation 2
## 8 sun joy 2
## 9 sun positive 2
## 10 sun surprise 2
## # ℹ 43 more rows
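The many-to-many warning above arises because a word can occur on several lyric lines and also carry several NRC labels. A small sketch with a made-up mini lexicon shows the effect. The data are invented, and `relationship = "many-to-many"` is the argument named in the warning itself.

```r
library(dplyr)
library(tibble)

# Made-up tokens: "sun" occurs on two different lines
toy_words <- tibble(line = c(1L, 1L, 2L), word = c("sun", "gun", "sun"))

# Made-up mini lexicon in the same word/sentiment layout as NRC
toy_lexicon <- tibble(
  word = c("sun", "sun", "gun"),
  sentiment = c("joy", "positive", "fear")
)

# Declaring the relationship documents the intent and silences the warning
toy_joined <- inner_join(toy_words, toy_lexicon, by = "word",
                         relationship = "many-to-many")
print(count(toy_joined, word, sentiment, sort = TRUE))
```

Each occurrence of a word produces one row per label, so the counts after the join are counts of word-sentiment pairs rather than of words.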
Total NRC sentiment counts.
unnested_lyrics %>%
inner_join(nrc_lex, by = "word")%>%
count(sentiment, sort=TRUE)
## Warning in inner_join(., nrc_lex, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 14 of `x` matches multiple rows in `y`.
## ℹ Row 12060 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
## # A tibble: 10 × 2
## sentiment n
## <chr> <int>
## 1 positive 17
## 2 trust 13
## 3 anticipation 9
## 4 joy 9
## 5 negative 8
## 6 fear 6
## 7 disgust 4
## 8 sadness 4
## 9 surprise 4
## 10 anger 3
Group the NRC sentiments into three buckets: Positive (positive, trust, joy), Negative (negative, fear, disgust, anger), and Neutral (everything else, which here means anticipation, surprise, and sadness).
nrc_sentiment <- unnested_lyrics %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  mutate(sentiment_bucket = case_when(
    sentiment %in% c("positive", "trust", "joy") ~ "Positive",
    sentiment %in% c("negative", "fear", "disgust", "anger") ~ "Negative",
    TRUE ~ "Neutral"
  ))
## Warning in inner_join(., get_sentiments("nrc"), by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 14 of `x` matches multiple rows in `y`.
## ℹ Row 12060 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
kable(nrc_sentiment, align = rep('l', ncol(nrc_sentiment)))%>%
kable_styling(bootstrap_options = c("basic","bordered","condensed"))
| line | word | sentiment | sentiment_bucket |
|---|---|---|---|
| 3 | sun | anticipation | Neutral |
| 3 | sun | joy | Positive |
| 3 | sun | positive | Positive |
| 3 | sun | surprise | Neutral |
| 3 | sun | trust | Positive |
| 7 | weight | anticipation | Neutral |
| 7 | weight | disgust | Negative |
| 7 | weight | fear | Negative |
| 7 | weight | joy | Positive |
| 7 | weight | negative | Negative |
| 7 | weight | positive | Positive |
| 7 | weight | sadness | Neutral |
| 7 | weight | surprise | Neutral |
| 7 | weight | trust | Positive |
| 7 | gun | anger | Negative |
| 7 | gun | fear | Negative |
| 7 | gun | negative | Negative |
| 11 | case | fear | Negative |
| 11 | case | negative | Negative |
| 11 | case | sadness | Neutral |
| 19 | fill | trust | Positive |
| 38 | freedom | joy | Positive |
| 38 | freedom | positive | Positive |
| 38 | freedom | trust | Positive |
| 53 | sun | anticipation | Neutral |
| 53 | sun | joy | Positive |
| 53 | sun | positive | Positive |
| 53 | sun | surprise | Neutral |
| 53 | sun | trust | Positive |
| 53 | shine | positive | Positive |
| 57 | start | anticipation | Neutral |
| 61 | muddy | disgust | Negative |
| 61 | muddy | negative | Negative |
| 65 | lose | anger | Negative |
| 65 | lose | disgust | Negative |
| 65 | lose | fear | Negative |
| 65 | lose | negative | Negative |
| 65 | lose | sadness | Neutral |
| 65 | lose | surprise | Neutral |
| 65 | salvation | anticipation | Neutral |
| 65 | salvation | joy | Positive |
| 65 | salvation | positive | Positive |
| 65 | salvation | trust | Positive |
| 69 | fight | anger | Negative |
| 69 | fight | fear | Negative |
| 69 | fight | negative | Negative |
| 69 | strength | positive | Positive |
| 69 | strength | trust | Positive |
| 69 | die | fear | Negative |
| 69 | die | negative | Negative |
| 69 | die | sadness | Neutral |
| 88 | freedom | joy | Positive |
| 88 | freedom | positive | Positive |
| 88 | freedom | trust | Positive |
| 106 | lord | disgust | Negative |
| 106 | lord | negative | Negative |
| 106 | lord | positive | Positive |
| 106 | lord | trust | Positive |
| 107 | fill | trust | Positive |
| 107 | sky | positive | Positive |
| 126 | freedom | joy | Positive |
| 126 | freedom | positive | Positive |
| 126 | freedom | trust | Positive |
| 148 | freedom | joy | Positive |
| 148 | freedom | positive | Positive |
| 148 | freedom | trust | Positive |
| 170 | freedom | joy | Positive |
| 170 | freedom | positive | Positive |
| 170 | freedom | trust | Positive |
| 184 | prepare | anticipation | Neutral |
| 184 | prepare | positive | Positive |
| 185 | prepare | anticipation | Neutral |
| 185 | prepare | positive | Positive |
| 186 | prepare | anticipation | Neutral |
| 186 | prepare | positive | Positive |
| 187 | prepare | anticipation | Neutral |
| 187 | prepare | positive | Positive |
Graph the nrc_sentiment percentages.
nrc_sentiment %>%
  count(sentiment_bucket) %>%
  mutate(percent = n / sum(n) * 100) %>%
  ggplot(aes(x = sentiment_bucket, y = percent, fill = sentiment_bucket)) +
  geom_col(width = 0.75, show.legend = FALSE) +
  scale_fill_manual(values = c("Positive" = "turquoise", "Negative" = "maroon", "Neutral" = "grey")) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(title = "Sentiment Distribution in Lyrics (NRC Lexicon)",
       subtitle = "Based on word-level sentiment analysis",
       x = "Sentiment Category", y = "Percentage of Words") +
  geom_text(aes(label = paste0(round(percent, 2), "%")), vjust = -0.5)
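The percentages plotted here can be checked by hand from the NRC sentiment totals printed earlier (positive 17, trust 13, joy 9, negative 8, fear 6, disgust 4, anger 3, anticipation 9, sadness 4, surprise 4). A short arithmetic sketch in base R follows. The vector names are introduced only for this illustration.

```r
# Bucket totals rebuilt from the NRC sentiment counts shown earlier
bucket_counts <- c(
  Positive = 17 + 13 + 9,     # positive + trust + joy
  Negative = 8 + 6 + 4 + 3,   # negative + fear + disgust + anger
  Neutral  = 9 + 4 + 4        # anticipation + sadness + surprise
)

# Share of each bucket among all word-sentiment matches, in percent
bucket_percent <- round(100 * bucket_counts / sum(bucket_counts), 2)
print(bucket_percent)
```

The denominator is the number of word-sentiment matches (77), not the number of distinct words, because one word can carry several NRC labels.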
This word cloud visualizes the most frequent NRC-tagged words in the lyrics after removing stop words. Because it is built from nrc_sentiment, which has one row per word-sentiment pair, a word’s size reflects both how often it appears and how many NRC labels it carries. Words that are absent from the NRC lexicon are not shown.
nrc_sentiment %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n)) # optionally add max.words = 100
## Joining with `by = join_by(word)`
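Since nrc_sentiment has one row per word-sentiment pair, `count(word)` counts NRC labels as well as occurrences. The sketch below uses a made-up example to contrast that with counting each word once per line. Object names are hypothetical. Using `distinct(line, word)` is an assumption about what one occurrence should mean, since it also collapses a word repeated within a single line.

```r
library(dplyr)
library(tibble)

# Made-up join result: "sun" appears once but carries three labels
toy_nrc <- tibble(
  line = c(3L, 3L, 3L, 19L),
  word = c("sun", "sun", "sun", "fill"),
  sentiment = c("joy", "positive", "trust", "trust")
)

# Counts word-sentiment rows, so "sun" is weighted by its three labels
rows_per_word <- count(toy_nrc, word)

# Counts each word once per line, regardless of how many labels it has
occurrences <- toy_nrc %>%
  distinct(line, word) %>%
  count(word)

print(rows_per_word)
print(occurrences)
```

Which of the two counts is appropriate depends on whether the cloud is meant to show word frequency or emotional weight.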
This word cloud shows how frequently sentiment-tagged words appear in
the lyrics. Larger words occur more often, and their color indicates
whether they’re associated with positive, negative, or neutral emotion.
While the layout loosely groups words near sentiment labels, it’s the
color that truly defines their emotional tone.
#Forcing a level so that the assigned colors align with each bucket listed.
nrc_sentiment$sentiment_bucket <- factor(nrc_sentiment$sentiment_bucket,
                                         levels = c("Positive", "Negative", "Neutral"))
word_freq2 <- nrc_sentiment %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sentiment_bucket, sort = TRUE)
word_matrix <- acast(word_freq2, word ~ sentiment_bucket, value.var = "n", fill = 0)
# Plot comparison word cloud with custom colors
comparison.cloud(word_matrix,
                 colors = c("turquoise", "maroon", "grey"), # Positive, Negative, Neutral
                 # max.words = 100,
                 title.size = 1.5,
                 main = "Word Cloud by Sentiment Bucket")
Here’s a detailed breakdown of each sentiment bucket, showing how many words are associated with each NRC sentiment category. The bar lengths represent word counts, and the colors indicate which bucket (Positive, Negative, or Neutral) each sentiment belongs to.
nrc_sentiment %>%
  count(sentiment_bucket, sentiment) %>%
  ggplot(aes(x = reorder(sentiment, n), y = n, fill = sentiment_bucket)) +
  geom_col(width = 0.75, show.legend = TRUE) +
  coord_flip() +
  facet_wrap(~ sentiment_bucket, scales = "free_y") +
  scale_fill_manual(values = c("Positive" = "turquoise", "Negative" = "maroon", "Neutral" = "grey"),
                    name = "Sentiment Category") +
  labs(title = "Detailed NRC Sentiment Breakdown",
       x = "Sentiment", y = "Word Count")
The Bing lexicon provides a simpler binary classification: positive or negative.
This code analyzes the lyrics using the Bing sentiment lexicon. It joins each word with its sentiment label (positive or negative), counts how often each sentiment-tagged word appears per line, and then calculates a net sentiment score by subtracting the number of negative words from positive ones.
bing_sentiment <- unnested_lyrics %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, index = line, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining with `by = join_by(word)`
head(bing_sentiment) #let's preview this data
## # A tibble: 6 × 5
## word index negative positive sentiment
## <chr> <int> <int> <int> <int>
## 1 die 69 1 0 -1
## 2 fail 106 1 0 -1
## 3 freedom 38 0 1 1
## 4 freedom 88 0 1 1
## 5 freedom 126 0 1 1
## 6 freedom 148 0 1 1
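The net score above is computed per word and line. A per-line variant of the same positive-minus-negative calculation might look like the following sketch. The sentiment labels are made up for illustration rather than taken from the Bing lexicon, and `toy_bing` and `toy_net` are hypothetical names.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up word-level labels in the same layout as a Bing join result
toy_bing <- tibble(
  line = c(38L, 69L, 69L, 69L),
  word = c("freedom", "strength", "fight", "die"),
  sentiment = c("positive", "positive", "negative", "negative")
)

toy_net <- toy_bing %>%
  count(line, sentiment) %>%                       # words per line and label
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net = positive - negative)                # net sentiment per line
print(toy_net)
```

Grouping by line rather than by word gives one score per lyric line, which is the layout needed for plotting sentiment across the song.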
#Calculate the percentage of negative and positive sentiments for the bing lexicon
bing_percent <- unnested_lyrics %>%
inner_join(get_sentiments("bing"), by = "word") %>%
count(sentiment) %>%
mutate(percent = round(n / sum(n) *100,2))
kable(bing_percent, align = rep('l', ncol(bing_percent))) %>%
kable_styling(bootstrap_options = c("basic","bordered","condensed"))
| sentiment | n | percent |
|---|---|---|
| negative | 5 | 45.45 |
| positive | 6 | 54.55 |
We now plot the percentages of negative and positive words.
bing_percent <- bing_percent %>%
  mutate(sentiment = recode(sentiment,
                            "positive" = "Positive",
                            "negative" = "Negative"))
ggplot(bing_percent, aes(x = sentiment, y = percent, fill = sentiment)) +
  geom_col(width = 0.75, show.legend = FALSE) +
  scale_fill_manual(values = c("Positive" = "turquoise", "Negative" = "maroon")) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(title = "Sentiment Distribution (Bing Lexicon)",
       x = "Sentiment", y = "Percentage of Words") +
  geom_text(aes(label = paste0(round(percent, 2), "%")), vjust = -0.5)
The AFINN lexicon assigns numeric scores from -5 (very negative) to +5 (very positive) to each word.
This code analyzes the lyrics using the AFINN sentiment lexicon. It joins each word in the lyrics with its AFINN score, then groups by word and line index to calculate a cumulative sentiment value per word occurrence. Each word is then bucketed into “Positive”, “Negative”, or “Neutral” based on its score.
afinn_sentiment <- unnested_lyrics %>%
  inner_join(get_sentiments("afinn")) %>%
  group_by(word, index = line) %>%
  summarise(sentiment = sum(value), .groups = "drop") %>%
  mutate(sentiment_bucket = case_when(
    sentiment > 0 ~ "Positive",
    sentiment < 0 ~ "Negative",
    TRUE ~ "Neutral"))
## Joining with `by = join_by(word)`
kable(afinn_sentiment, align = rep('l', ncol(afinn_sentiment)))%>%
kable_styling(bootstrap_options = c("basic","bordered","condensed"))
| word | index | sentiment | sentiment_bucket |
|---|---|---|---|
| alone | 102 | -2 | Negative |
| die | 69 | -3 | Negative |
| fail | 106 | -2 | Negative |
| fight | 69 | -1 | Negative |
| freedom | 38 | 2 | Positive |
| freedom | 88 | 2 | Positive |
| freedom | 126 | 2 | Positive |
| freedom | 148 | 2 | Positive |
| freedom | 170 | 2 | Positive |
| gun | 7 | -1 | Negative |
| hard | 102 | -1 | Negative |
| strength | 69 | 2 | Positive |
We now plot the percentages of negative and positive AFINN sentiments.
afinn_sentiment %>%
  count(sentiment_bucket) %>%
  mutate(percent = n / sum(n) * 100) %>%
  ggplot(aes(x = sentiment_bucket, y = percent, fill = sentiment_bucket)) +
  geom_col(width = 0.75, show.legend = FALSE) +
  scale_fill_manual(values = c("Positive" = "turquoise", "Negative" = "maroon", "Neutral" = "grey")) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(title = "Sentiment Distribution in Lyrics (AFINN Lexicon)",
       x = "Sentiment Category", y = "Percentage of Words") +
  geom_text(aes(label = paste0(round(percent, 2), "%")), vjust = -0.5)
Because the graph shows an even 50/50 split between negative and positive words, I sum the AFINN scores to determine whether the overall sentiment is positive or negative.
afinn_summary <- unnested_lyrics %>%
  inner_join(get_sentiments("afinn")) %>%
  summarise(
    total_score = sum(value),
    total_positive = sum(value[value > 0]),
    total_negative = sum(value[value < 0])
  )
## Joining with `by = join_by(word)`
kable(afinn_summary, align = rep('l', ncol(afinn_summary)))%>%
kable_styling(bootstrap_options = c("basic","bordered","condensed"))
| total_score | total_positive | total_negative |
|---|---|---|
| 2 | 12 | -10 |
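These totals can be reproduced from the twelve AFINN rows in the table above. The sketch below copies the word scores from that table into a small data frame. The object names are introduced only for this check.

```r
# AFINN scores copied from the word-level table above
afinn_rows <- data.frame(
  word  = c("alone", "die", "fail", "fight", rep("freedom", 5),
            "gun", "hard", "strength"),
  value = c(-2, -3, -2, -1, rep(2, 5), -1, -1, 2)
)

total_positive <- sum(afinn_rows$value[afinn_rows$value > 0])  # freedom x5 + strength
total_negative <- sum(afinn_rows$value[afinn_rows$value < 0])  # six negative words
total_score    <- total_positive + total_negative

print(c(total_score = total_score,
        total_positive = total_positive,
        total_negative = total_negative))
```

Five of the six positive rows come from the repeated word "freedom", so the positive total depends heavily on the chorus.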
The AFINN summary shows a total sentiment score of +2: the positive word scores sum to 12 and the negative ones to -10. The positive scores therefore narrowly outweigh the negative ones, and the AFINN lexicon rates the overall emotional tone of the song as slightly positive.
We analyze sentiment at the sentence/line level using the syuzhet package’s “afinn”, “bing”, and “syuzhet” methods.
lyrics_df2 <- lyrics_df %>%
  mutate(
    syuzhet_score = get_sentiment(text, method = "syuzhet"),
    afinn_score = get_sentiment(text, method = "afinn"),
    bing_score = get_sentiment(text, method = "bing")
  )
kable(lyrics_df2, align = rep('l', ncol(lyrics_df2))) %>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
| line | text | syuzhet_score | afinn_score | bing_score |
|---|---|---|---|---|
| 1 | 25 ContributorsStand Up Lyrics | 0.00 | 0 | 0 |
| 3 | I been walkin’ with my face turned to the sun | 0.60 | 0 | 0 |
| 7 | Weight on my shoulders, a bullet in my gun | -0.50 | -1 | 0 |
| 11 | Oh, I got eyes in the back of my head just in case I have to run | 0.00 | 0 | 0 |
| 15 | I do what I can when I can while I can for my people | 0.00 | 0 | 0 |
| 19 | While the clouds roll back and the stars fill the night | 0.00 | 0 | 0 |
| 25 | That’s when I’m gonna stand up | 0.00 | 0 | 0 |
| 26 | Take my people with me | 0.00 | 0 | 0 |
| 30 | Together we are going to a brand new home | 0.80 | 0 | 0 |
| 34 | Far across the river | 0.00 | 0 | 0 |
| 38 | Can you hear freedom calling? | 0.75 | 2 | 1 |
| 39 | Calling me to answer | 0.00 | 0 | 0 |
| 43 | Gonna keep on keepin’ on | 0.00 | 0 | 0 |
| 47 | I can feel it in my bones | 0.00 | 0 | 0 |
| 53 | Early in the mornin’ before the sun begins to shine | 1.10 | 0 | 1 |
| 57 | Gonna start movin’ towards that separating line | 0.00 | 0 | 0 |
| 61 | I’m wading through muddy waters, you know I got a made-up mind | -0.25 | -2 | -1 |
| 65 | And I don’t mind if I lose any blood on the way to salvation | 0.75 | 2 | -1 |
| 69 | And I’ll fight with the strength that I got until I die | -0.75 | -2 | -1 |
| 75 | So I’m gonna stand up | 0.00 | 0 | 0 |
| 76 | Take my people with me | 0.00 | 0 | 0 |
| 80 | Together we are going to a brand new home | 0.80 | 0 | 0 |
| 84 | Far across the river | 0.00 | 0 | 0 |
| 88 | Can you hear freedom calling? | 0.75 | 2 | 1 |
| 89 | Calling me to answer | 0.00 | 0 | 0 |
| 93 | Gonna keep on keepin’ on | 0.00 | 0 | 0 |
| 98 | And I know what’s around the bend | 0.00 | 0 | 0 |
| 102 | Might be hard to face ’cause I’m alone | -0.85 | -3 | -1 |
| 106 | And I just might fail, but Lord knows I tried | -0.75 | -2 | -1 |
| 107 | Sure as stars fill up the sky | 0.00 | 0 | 0 |
| 113 | Stand up | 0.00 | 0 | 0 |
| 114 | Take my people with me | 0.00 | 0 | 0 |
| 118 | Together we are going to a brand new home | 0.80 | 0 | 0 |
| 122 | Far across the river | 0.00 | 0 | 0 |
| 126 | Can you hear freedom calling? | 0.75 | 2 | 1 |
| 127 | Calling me to answer | 0.00 | 0 | 0 |
| 131 | Gonna keep on keepin’ on | 0.00 | 0 | 0 |
| 135 | I’m gonna stand up | 0.00 | 0 | 0 |
| 136 | Take my people with me | 0.00 | 0 | 0 |
| 140 | Together we are going to a brand new home | 0.80 | 0 | 0 |
| 144 | Far across the river | 0.00 | 0 | 0 |
| 148 | Do you hear freedom calling? | 0.75 | 2 | 1 |
| 149 | Calling me to answer | 0.00 | 0 | 0 |
| 153 | Gonna keep on keepin’ on | 0.00 | 0 | 0 |
| 157 | I’m gonna stand up | 0.00 | 0 | 0 |
| 158 | Take my people with me | 0.00 | 0 | 0 |
| 162 | Together we are going to a brand new home | 0.80 | 0 | 0 |
| 166 | Far across the river | 0.00 | 0 | 0 |
| 170 | I hear freedom calling | 0.75 | 2 | 1 |
| 171 | Calling me to answer | 0.00 | 0 | 0 |
| 175 | Gonna keep on keepin’ on | 0.00 | 0 | 0 |
| 179 | I can feel it in my bones | 0.00 | 0 | 0 |
| 184 | I go to prepare a place for you | 0.10 | 0 | 0 |
| 185 | I go to prepare a place for you | 0.10 | 0 | 0 |
| 186 | I go to prepare a place for you | 0.10 | 0 | 0 |
| 187 | I go to prepare a place for you | 0.10 | 0 | 0 |
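For readers who want to try the line-level scoring in isolation, here is a small sketch that scores two of the lyric lines from the table above with the same three methods. `toy_lines` and `toy_scores` are illustrative names. The values returned may depend on the installed syuzhet version, so none are asserted here.

```r
library(syuzhet)

# Two lines taken from the lyrics table above
toy_lines <- c("Can you hear freedom calling?", "Far across the river")

# get_sentiment() returns one numeric score per element of the input vector
toy_scores <- data.frame(
  text    = toy_lines,
  syuzhet = get_sentiment(toy_lines, method = "syuzhet"),
  afinn   = get_sentiment(toy_lines, method = "afinn"),
  bing    = get_sentiment(toy_lines, method = "bing")
)
print(toy_scores)
```

Because each method uses its own lexicon and scale, the columns are comparable in sign but not in magnitude.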
Graph each method’s score by line to compare them.
# Reshape data to long format to plot the comparison
lyrics_long <- lyrics_df2 %>%
  select(line, afinn_score, bing_score, syuzhet_score) %>%
  pivot_longer(cols = c(afinn_score, bing_score, syuzhet_score),
               names_to = "method", values_to = "score") %>%
  mutate(method = str_remove(method, "_score") %>% str_to_title())
# Plot the data
ggplot(lyrics_long, aes(x = line, y = score, color = method)) +
  geom_line(linewidth = 0.5) +
  geom_point(size = 1.5, alpha = 0.7) +
  labs(
    title = "Sentiment Scores Across Lyrics",
    subtitle = "Comparing three sentiment analysis methods",
    x = "Lyric Line Number",
    y = "Sentiment Score",
    color = "Method"
  ) +
  theme_minimal()
Next, I calculate the total score for each sentiment method.
sentiment_score_summary <- lyrics_df2 %>%
  summarise(
    total_afinn_score = sum(afinn_score),
    total_bing_score = sum(bing_score),
    total_syuzhet_score = sum(syuzhet_score)
  )
kable(sentiment_score_summary, align = rep('l', ncol(sentiment_score_summary))) %>%
  kable_styling(bootstrap_options = c("basic","bordered","condensed"))
| total_afinn_score | total_bing_score | total_syuzhet_score |
|---|---|---|
| 2 | 1 | 7.5 |
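The three totals can also be computed in one step with `across()`. The sketch below uses three rows copied from the line-level table (lines 38, 61, and 102), so its totals differ from the full-song totals above. `toy_scores` and `toy_totals` are illustrative names.

```r
library(dplyr)
library(tibble)

# Three rows copied from the line-level score table
toy_scores <- tibble(
  line          = c(38L, 61L, 102L),
  syuzhet_score = c(0.75, -0.25, -0.85),
  afinn_score   = c(2, -2, -3),
  bing_score    = c(1, -1, -1)
)

# Sum every column ending in "_score" and prefix the result with "total_"
toy_totals <- toy_scores %>%
  summarise(across(ends_with("_score"), sum, .names = "total_{.col}"))
print(toy_totals)
```

This form picks up a new score column automatically if another method is added later.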
All three methods produce positive total scores, indicating that “Stand Up” by Cynthia Erivo is classified as a positive song across all sentiment analysis approaches.
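This analysis examined the song “Stand Up” by Cynthia Erivo using multiple sentiment analysis approaches: the NRC, Bing, and AFINN lexicons at the word level, and the syuzhet package with three different methods at the line level.
Key Findings:
With NRC, 39 of the 77 word-sentiment matches (50.65%) fall in the Positive bucket, against 21 Negative and 17 Neutral. With Bing, 6 of the 11 matched words (54.55%) are positive. With AFINN, the matched words split evenly (6 positive, 6 negative), but the total score is +2. At the line level, the syuzhet package gives total scores of 7.5 (syuzhet), 2 (afinn), and 1 (bing).
Conclusion: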
Based on the evidence across multiple sentiment analysis methods, we can conclude that the song “Stand Up” by artist “Cynthia Erivo” is a positive song.
Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach. O’Reilly Media. https://www.tidytextmining.com