Social Media and Sentiment Analysis with Reddit Threads on Metal
Music
This is Major Assignment 4 in CP8883: Introduction to Urban
Analytics. The objective of this assignment is to download, analyze, and
visualize Reddit threads based on the keyword “metal music”.
I have prioritized readability and insightful analysis in
this assignment.
Step 3: Clean and tokenize text data
# Tokenization (word tokens)
words <- threads_1 %>%
unnest_tokens(output = word, input = text, token = "words")
words %>%
count(word, sort = TRUE) %>%
top_n(20) %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(x = word, y = n)) +
geom_col() +
xlab(NULL) +
coord_flip() +
labs(x = "words",
y = "counts",
title = "Unique wordcounts")
## Selecting by n

# load list of stop words - from the tidytext package
data("stop_words")
# view random 50 words
print(stop_words$word[sample(1:nrow(stop_words), 50)])
## [1] "where" "thanks" "become" "that" "too"
## [6] "needs" "with" "ain't" "numbers" "welcome"
## [11] "just" "group" "particular" "my" "that's"
## [16] "with" "zero" "over" "a" "rd"
## [21] "until" "wish" "do" "very" "furthers"
## [26] "only" "didn't" "who" "inc" "been"
## [31] "showed" "himself" "ever" "both" "myself"
## [36] "did" "who's" "before" "further" "end"
## [41] "how's" "state" "after" "nine" "your"
## [46] "namely" "self" "into" "or" "four"
# Regex that matches URL-type strings and HTML entities
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"
words_clean <- threads_1 %>%
# drop URLs
mutate(text = str_replace_all(text, replace_reg, "")) %>%
# Tokenization (word tokens)
unnest_tokens(word, text, token = "words") %>%
# drop stop words
anti_join(stop_words, by = "word") %>%
# keep only tokens that contain at least one letter (drops purely numeric tokens)
filter(str_detect(word, "[a-z]"))
# Check the number of rows after removal of the stop words. There should be fewer words now
print(
glue::glue("Before: {nrow(words)}, After: {nrow(words_clean)}")
)
## Before: 9117, After: 3376
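The URL pattern used above only matches letters, digits, `/` and `.`, so a match stops at the first `?`, `=`, `_` or `-`, and the rest of a query string survives as ordinary tokens. This is probably why `utm_source` and `shareutm_medium` show up in the tri-gram table in Step 5. The sketch below illustrates the effect on a made-up post; `url_reg_broad` is a hypothetical alternative, not the pattern used in this assignment.

```r
library(stringr)

# Toy post; the URL and its tracking parameters are made up for illustration
post <- "Great album https://example.com/r/Metal/abc?utm_source=share&utm_medium=web check it out"

# URL part of the pattern used above: the match stops at the first "?"
url_reg_narrow <- "http[s]?://[A-Za-z\\d/\\.]+"
# Hypothetical broader alternative: consume everything up to the next whitespace
url_reg_broad <- "https?://\\S+"

narrow <- str_replace_all(post, url_reg_narrow, "")
broad  <- str_squish(str_replace_all(post, url_reg_broad, ""))

print(narrow) # the query string is left behind
print(broad)  # only the surrounding words remain
```

Whether the broader pattern is appropriate depends on the data; it would also swallow any text glued directly to a URL without a space.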
plot_words_clean <-
words_clean %>%
count(word, sort = TRUE) %>%
top_n(20, n) %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(x = word, y = n)) +
geom_col() +
xlab(NULL) +
coord_flip() +
labs(x = "words",
y = "counts",
title = "Unique wordcounts")
plot_words_clean

Step 4: Word cloud
Instruction: Generate a word cloud that illustrates the frequency of
words except your keyword.
n <- 20
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.6, 1) # vivid
v <- runif(n, 0.3, 0.7) # neither too dark nor too bright
df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))
plot_word_cloud <-
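words_clean %>%
# exclude the search keyword, as the instruction requires
filter(!word %in% c("metal", "music")) %>%
count(word, sort = TRUE) %>%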
wordcloud2(color = pal,
minRotation = 0,
maxRotation = 0,
ellipticity = 0.8)
plot_word_cloud
Step 5: Tri-gram analysis
Instruction: 1. Extract tri-grams from text data. 2. Remove tri-grams
containing stop words or non-alphabetic terms. 3. Present the frequency
of tri-grams in a table. 4. Discuss any noteworthy tri-grams you come
across.
words_ngram <- threads_1 %>%
mutate(text = str_replace_all(text, replace_reg, "")) %>%
select(text) %>%
unnest_tokens(output = paired_words,
input = text,
token = "ngrams",
n = 3)
words_ngram %>%
count(paired_words, sort = TRUE) %>%
head(10) %>%
knitr::kable()
| paired_words | n |
|---|---|
| NA | 103 |
| 50 albums of | 12 |
| a lot of | 12 |
| top 50 albums | 12 |
| i listen to | 9 |
| to listen to | 6 |
| albums of the | 5 |
| cookie monster vocals | 5 |
| i get wet | 5 |
| the top 50 | 5 |
#get ngrams. You may try playing around with the value of n, n=3 , n=4
words_ngram <- threads_1 %>%
mutate(text = str_replace_all(text, replace_reg, "")) %>%
select(text) %>%
unnest_tokens(output = paired_words,
input = text,
token = "ngrams",
n = 3)
words_ngram_pair <- words_ngram %>%
separate(paired_words, c("word1", "word2", "word3"), sep = " ")
# drop tri-grams in which any of the three words is a stop word
words_ngram_pair_filtered <- words_ngram_pair %>%
# drop stop words
filter(!word1 %in% stop_words$word & !word2 %in% stop_words$word & !word3 %in% stop_words$word) %>%
# keep tri-grams whose first two words contain at least one letter
# (word3 is not checked, so numeric third words such as "1" remain in the table below)
filter(str_detect(word1, "[a-z]") & str_detect(word2, "[a-z]"))
# Filter out words that are not encoded in ASCII
# To see what's ASCII, google 'ASCII table'
library(stringi)
words_ngram_pair_filtered %<>%
filter(stri_enc_isascii(word1) & stri_enc_isascii(word2) & stri_enc_isascii(word3))
words_counts <- words_ngram_pair_filtered %>%
count(word1, word2, word3) %>%
arrange(desc(n))
head(words_counts, 15) %>%
knitr::kable()
| word1 | word2 | word3 | n |
|---|---|---|---|
| cookie | monster | vocals | 5 |
| assassins | black | meddle | 4 |
| black | meddle | pt | 4 |
| heavy | metal | music | 4 |
| meddle | pt | 1 | 4 |
| shareutm_medium | web2xcontext | 3 | 4 |
| utm_source | shareutm_medium | web2xcontext | 4 |
| metal | death | metal | 3 |
| august | burns | red | 2 |
| baroness | blue | record | 2 |
| black | metal | death | 2 |
| blue | record | blue | 2 |
| brandon | stosuy | stosuy | 2 |
| breakdowns | fast | tempos | 2 |
| death | metal | thrash | 2 |
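The table above still contains tri-grams with numeric or URL-derived terms (for example `meddle pt 1` and `shareutm_medium web2xcontext 3`). A stricter filter could test all three words against a letters-only pattern. The following is a minimal sketch on toy rows that mirror a few entries from the table; the helper `is_alpha` is a hypothetical name, and this is not the filter used in the assignment.

```r
library(dplyr)
library(stringr)

# Toy tri-grams; rows mirror a few entries from the table above
trigrams <- tibble(
  word1 = c("cookie", "meddle", "shareutm_medium", "heavy"),
  word2 = c("monster", "pt", "web2xcontext", "metal"),
  word3 = c("vocals", "1", "3", "music")
)

# Hypothetical helper: TRUE only if the word consists solely of lowercase ASCII letters
is_alpha <- function(x) str_detect(x, "^[a-z]+$")

# Keep a row only when all three words pass the check
trigrams_strict <- trigrams %>%
  filter(is_alpha(word1), is_alpha(word2), is_alpha(word3))

print(trigrams_strict)
```

A letters-only rule is deliberately blunt: it would also drop legitimate terms that contain digits or apostrophes, so it may need loosening for other corpora.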
library(igraph)
library(ggraph)
plot_tri_gram_network <-
words_counts %>%
filter(n >= 2) %>%
graph_from_data_frame() %>% # convert to graph; word1 -> word2 form the edges
ggraph(layout = "fr") +
geom_edge_link(aes(edge_width = n), edge_alpha = 0.3) + # constant transparency set outside aes()
geom_node_point(color = "darkred", size = 3) + # points take `size`, not `linewidth`
geom_node_text(aes(label = name), vjust = 1.8) +
labs(title = "Metal Music Thread Word Networks",
x = "", y = "")
plot_tri_gram_network
Discussion of noteworthy tri-grams
There is one main cluster of discussion: metal music sub-genres
(e.g., heavy metal, black metal, death metal).
It is not surprising that heavy metal, perhaps the best-known
sub-genre, sits at the center of the cluster. The words connected to
heavy metal focus on musical techniques and elements such as
tempo, blast, breakdown, and screaming. These words describe typical
features of metal, which is fast and loud.
The term “cookie monster” comes from the nickname “Cookie Monster
vocals” for brutal growls, because the deep, guttural vocals in metal
music sound similar to the voice of the blue Cookie Monster character
(https://www.youtube.com/embed/JWac5UT80no?si=UnkUHMKiOZtCRhd9).
Among the smaller connected groups of nodes, Leviathan-Mastodon and
Sunbather-Deafheaven represent the relationship between albums and
bands. Additionally, funeral doom is another sub-genre of metal.
The tri-gram analysis shows that threads about metal music are centered
on the music itself, with a focus on taxonomy (sub-genres), albums, and
bands.
Step 6: Sentiment analysis with dictionary method and BERT
reddit_sentiment <- read_csv('metal_reddit_bert.csv') %>%
drop_na('bert_label')
## New names:
## • `` -> `...1`
## Rows: 228 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): Unnamed: 0, title, text, subreddit, url, bert_label
## dbl  (4): ...1, timestamp, comments, bert_score
## date (1): date_utc
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Get sentiment scores using the dictionary method for comparison.
reddit_sentiment %<>%
mutate(title = replace_na(title, ""),
text = replace_na(text, ""),
title_text = str_c(title, text, sep = ". "))
library(sentimentr)
reddit_sentiment_dictionary <- sentiment_by(reddit_sentiment$title_text)
reddit_sentiment$sentiment_dict <- reddit_sentiment_dictionary %>% pull(ave_sentiment)
reddit_sentiment$word_count <- reddit_sentiment_dictionary %>% pull(word_count)
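To make the dictionary method concrete, the sketch below scores text with a tiny hand-made lexicon. It is only an illustration of the general idea: the lexicon, the function `score_text`, and the per-word averaging are all assumptions for this example, and sentimentr itself is more elaborate (for instance, it accounts for negators and amplifiers).

```r
# Hypothetical mini-lexicon: word -> polarity. Real lexicons are far larger.
lexicon <- c(love = 1, cool = 1, good = 1, suicidal = -1, evil = -1, awful = -1)

# Average polarity per word: lowercase, split on non-letters, look up, sum, divide
score_text <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[words != ""]
  hits <- lexicon[words] # NA for words that are not in the lexicon
  sum(hits, na.rm = TRUE) / length(words)
}

band_title  <- score_text("Suicidal Tendencies - Institutionalized")
guide_title <- score_text("A cool guide to metal music")

print(band_title)  # negative: driven entirely by the word "suicidal"
print(guide_title) # positive: driven entirely by the word "cool"
```

Because the score depends only on which words appear, a band name that contains a negative word pulls the score down regardless of what the post is actually saying.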
Check the correlation between the sentiment values from two different
methods.
reddit_sentiment %<>% mutate(bert_label_numeric = str_sub(bert_label, 1, 1) %>% as.numeric())
cor(reddit_sentiment$bert_label_numeric, reddit_sentiment$sentiment_dict)
## [1] 0.3124655
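A correlation of about 0.31 indicates a weak-to-moderate positive relationship: the two methods tend to agree in direction but often disagree on individual threads.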
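Since the BERT labels are ordinal star ratings rather than a continuous measure, a rank-based (Spearman) correlation may be a reasonable companion to the Pearson value. The sketch below compares the two on made-up numbers; the vectors `stars` and `dict` are illustrative and are not taken from the Reddit data.

```r
# Toy data: star ratings (1-5) and made-up dictionary scores, for illustration only
stars <- c(1, 2, 2, 3, 4, 5, 5)
dict  <- c(-0.4, -0.1, 0.2, 0.0, 0.3, 0.2, 0.9)

# Pearson measures linear association; Spearman measures monotonic association on ranks
pearson  <- cor(stars, dict, method = "pearson")
spearman <- cor(stars, dict, method = "spearman")

print(round(c(pearson = pearson, spearman = spearman), 2))
```

On the real data, the same comparison would only require passing `method = "spearman"` to the `cor()` call used above.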
ggplot(data = reddit_sentiment, aes(x = bert_label_numeric, y = sentiment_dict)) +
geom_jitter(width = 0.1, height = 0) +
geom_hline(yintercept = 0, color = 'darkolivegreen', linewidth = 1, linetype = 'dashed') # zero-sentiment reference line

sentimentr_example <- reddit_sentiment %>%
mutate(sentimentr_abs = abs(sentiment_dict),
sentimentr_binary = case_when(sentiment_dict > 0 ~ 'positive',
TRUE ~ 'negative')) %>%
group_by(sentimentr_binary) %>%
arrange(desc(sentimentr_abs)) %>%
slice_head(n = 10) %>%
ungroup() %>%
arrange(sentiment_dict)
# positive
sentimentr_example %>% filter(sentimentr_binary == 'positive') %>% pull(title_text, sentiment_dict) %>% print()
## 0.54
## "*Heavy metal music intensifies*. "
## 0.54
## "*Heavy Metal Music Intensifies*. "
## 0.54
## "*heavy metal music intensifies. "
## 0.615777318283033
## "Rock/heavy metal music with positive lyrics?. I love music and I’m trying to transition into more uplifting songs. I want to combine my love of rock with positive lyrics. I’m looking for more stuff like TOOL’s Parabola. Anything helps. Thanks :)"
## 0.620651539378848
## "If it helps, my favorite genres are rock/metal and electronic music. "
## 0.623538290724796
## "Heavy metal music. "
## 0.796084166404533
## "A cool guide to metal music.. "
## 0.841872912024137
## "A cool guide I made to introduce classical music to rock and metal fans. "
## 0.992043345827187
## "Heavy metal music isn’t good. "
## 1.01823376490863
## "Liking heavy metal music linked to high intelligence. "
# negative
sentimentr_example %>% filter(sentimentr_binary == 'negative') %>% pull(title_text, sentiment_dict) %>% print()
## -0.521775813927782
## "Suicidal Tendencies - Institutionalized - Official Music Video [Hardcore Punk/Thrash Metal]. "
## -0.521159421729228
## "We’re now at the “metal music is evil” stage of the Satanic panic. "
## -0.41576092031015
## "People who don't like metal music, why?. "
## -0.347011046894284
## "Communism is when no metal music!. "
## -0.305340941294474
## "Most high-profile metal music is just as overproduced as pop music.. Triggered, quantized drums that sound fake as hell, sanitized guitar tone, and not-so-transparent compression makes a lot of modern metal sound less heavy.\n\nNuclear Blast is notorious for this. Suffocation's latest album sounds horrible. Somehow Behemoth has fantastic organic production, but Cradle of Filth is plasticky as hell.\n\nUnderground death and black metal acts are keeping real performances and organic recordings alive, but Amon Amarth and the likes sound incredibly fake and predictable."
## -0.290625
## "Not directly related to metal music, but what is this sub’s opinion on Weird Al?. "
## -0.28939387817473
## "First time hitchhiking with a dead person (hearse). And of course the mortician was into metal music!. "
## -0.2
## "“God awful metal music”. "
## -0.14142135623731
## "On the hypocrisy of the metal music industry.. "
## -0.14142135623731
## "On the hypocrisy of the metal music industry.. "
data("stop_words")
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"
reddit_sentiment_clean <- reddit_sentiment %>%
mutate(title_text = str_replace_all(title_text, replace_reg, "")) %>%
unnest_tokens(word, title_text, token = "words") %>%
anti_join(stop_words, by = "word") %>%
filter(str_detect(word, "[a-z]")) %>%
filter(!word %in% c('metal','music'))
We are not interested in words that are commonly seen in both
positive and negative threads. We can identify words that are uniquely
seen in either positive or negative threads using
anti_join.
reddit_sentiment_clean_negative <- reddit_sentiment_clean %>%
filter(bert_label_numeric %in% c(1,2))
reddit_sentiment_clean_positive <- reddit_sentiment_clean %>%
filter(bert_label_numeric %in% c(4,5))
reddit_sentiment_clean_negative_unique <- reddit_sentiment_clean_negative %>%
anti_join(reddit_sentiment_clean_positive, by = 'word')
reddit_sentiment_clean_positive_unique <- reddit_sentiment_clean_positive %>%
anti_join(reddit_sentiment_clean_negative, by = 'word')
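The two `anti_join` calls above can be read as set differences on the `word` column. A minimal sketch with made-up token tables (the words and the table names `positive_tokens` and `negative_tokens` are illustrative only):

```r
library(dplyr)

# Toy token tables; the words are made up for illustration
positive_tokens <- tibble(word = c("guide", "love", "album", "riff"))
negative_tokens <- tibble(word = c("fake", "album", "overproduced", "riff"))

# anti_join(x, y) keeps the rows of x that have no match in y
positive_only <- anti_join(positive_tokens, negative_tokens, by = "word")
negative_only <- anti_join(negative_tokens, positive_tokens, by = "word")

print(positive_only$word)
print(negative_only$word)
```

Words shared by both groups ("album" and "riff" here) disappear from both results, which is what keeps the two word clouds from overlapping.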
- Words appearing in positive threads
plot_positive_word_cloud <-
reddit_sentiment_clean_positive_unique %>%
count(word, sort = TRUE) %>%
wordcloud2(color = pal,
minRotation = 0,
maxRotation = 0,
ellipticity = 0.8)
plot_positive_word_cloud
- Words appearing in negative threads
n <- 20
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.6, 1) # vivid
v <- runif(n, 0.3, 0.7) # neither too dark nor too bright
df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))
plot_negative_word_cloud <-
reddit_sentiment_clean_negative_unique %>%
count(word, sort = TRUE) %>%
wordcloud2(color = pal,
minRotation = 0,
maxRotation = 0,
ellipticity = 0.8)
plot_negative_word_cloud
Step 7: Credibility evaluation
Instruction: Display 10 sample texts alongside their sentiment scores
and evaluate the credibility of the sentiment analysis outcomes.
It is important to note that the names of metal bands or albums can
influence the results of sentiment analysis. Even neutral news releases
that mention these names are classified as “negative”.
For example, the first negative example, “Suicidal Tendencies -
Institutionalized - Official Music Video [Hardcore Punk/Thrash Metal]”,
is positive and exciting for metalheads, but it receives a score of
-0.5.
The first three positive samples are not duplicates: their titles differ
in capitalization and punctuation, so it seems that someone posted three
similar threads.
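One way to put a number on credibility is to binarize both methods and cross-tabulate them. The sketch below does this on made-up values, using the same cut-offs as elsewhere in this document (BERT labels 1-2 as negative and 4-5 as positive, dictionary score above zero as positive). The data frame `toy` and its values are hypothetical.

```r
# Toy data: BERT star labels and dictionary scores (made-up values)
toy <- data.frame(
  bert_star  = c(1, 1, 2, 4, 5, 5, 3, 2),
  dict_score = c(-0.5, 0.9, -0.2, 0.6, 0.8, -0.1, 0.0, 0.3)
)

# Drop neutral 3-star rows, then binarize both methods
toy <- toy[toy$bert_star != 3, ]
bert_sign <- ifelse(toy$bert_star >= 4, "positive", "negative")
dict_sign <- ifelse(toy$dict_score > 0, "positive", "negative")

# Cross-tabulate the two labelings and compute the share of matching rows
agreement <- table(bert = bert_sign, dictionary = dict_sign)
agree_rate <- mean(bert_sign == dict_sign)

print(agreement)
print(agree_rate)
```

A low agreement rate would not say which method is wrong, only that at least one of them is unreliable for this kind of text.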
# positive
sentimentr_example %>% filter(sentimentr_binary == 'positive') %>% pull(title_text, sentiment_dict) %>% print()
## 0.54
## "*Heavy metal music intensifies*. "
## 0.54
## "*Heavy Metal Music Intensifies*. "
## 0.54
## "*heavy metal music intensifies. "
## 0.615777318283033
## "Rock/heavy metal music with positive lyrics?. I love music and I’m trying to transition into more uplifting songs. I want to combine my love of rock with positive lyrics. I’m looking for more stuff like TOOL’s Parabola. Anything helps. Thanks :)"
## 0.620651539378848
## "If it helps, my favorite genres are rock/metal and electronic music. "
## 0.623538290724796
## "Heavy metal music. "
## 0.796084166404533
## "A cool guide to metal music.. "
## 0.841872912024137
## "A cool guide I made to introduce classical music to rock and metal fans. "
## 0.992043345827187
## "Heavy metal music isn’t good. "
## 1.01823376490863
## "Liking heavy metal music linked to high intelligence. "
# negative
sentimentr_example %>% filter(sentimentr_binary == 'negative') %>% pull(title_text, sentiment_dict) %>% print()
## -0.521775813927782
## "Suicidal Tendencies - Institutionalized - Official Music Video [Hardcore Punk/Thrash Metal]. "
## -0.521159421729228
## "We’re now at the “metal music is evil” stage of the Satanic panic. "
## -0.41576092031015
## "People who don't like metal music, why?. "
## -0.347011046894284
## "Communism is when no metal music!. "
## -0.305340941294474
## "Most high-profile metal music is just as overproduced as pop music.. Triggered, quantized drums that sound fake as hell, sanitized guitar tone, and not-so-transparent compression makes a lot of modern metal sound less heavy.\n\nNuclear Blast is notorious for this. Suffocation's latest album sounds horrible. Somehow Behemoth has fantastic organic production, but Cradle of Filth is plasticky as hell.\n\nUnderground death and black metal acts are keeping real performances and organic recordings alive, but Amon Amarth and the likes sound incredibly fake and predictable."
## -0.290625
## "Not directly related to metal music, but what is this sub’s opinion on Weird Al?. "
## -0.28939387817473
## "First time hitchhiking with a dead person (hearse). And of course the mortician was into metal music!. "
## -0.2
## "“God awful metal music”. "
## -0.14142135623731
## "On the hypocrisy of the metal music industry.. "
## -0.14142135623731
## "On the hypocrisy of the metal music industry.. "
# save.image('1130.RData')
Step 8: Insights and visualization
Instruction: Discuss intriguing insights derived from the sentiment
analysis, supporting your observations with at least two plots.
Based on the sentiment analysis, metalheads appear to be positive and
straightforward on social media. They focus on the music, and
even the negative threads are about specific albums or sub-genres
rather than directed at other users.
# Drop comment-count outliers within each BERT label (1.5 * IQR rule)
reddit_sentiment_rm_outlier <- reddit_sentiment %>%
group_by(bert_label) %>%
filter(
between(
comments,
quantile(comments, 0.25) - 1.5 * IQR(comments),
quantile(comments, 0.75) + 1.5 * IQR(comments))) %>%
ungroup()
cor.test(reddit_sentiment_rm_outlier$comments, reddit_sentiment_rm_outlier$bert_label_numeric)
##
## Pearson's product-moment correlation
##
## data: reddit_sentiment_rm_outlier$comments and reddit_sentiment_rm_outlier$bert_label_numeric
## t = -0.86207, df = 197, p-value = 0.3897
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.19870036 0.07845417
## sample estimates:
## cor
## -0.06130473
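The outlier filter used before this test applies the 1.5 * IQR rule within each BERT label. The sketch below applies the same rule to a single made-up vector so the fences are easy to inspect; the values in `comments` are illustrative only.

```r
# Made-up comment counts with one extreme value
comments <- c(3, 5, 8, 10, 12, 15, 18, 400)

q1  <- quantile(comments, 0.25)
q3  <- quantile(comments, 0.75)
iqr <- IQR(comments)

# Fences: anything outside [lower, upper] is treated as an outlier
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

kept <- comments[comments >= lower & comments <= upper]
print(kept)
```

Only the extreme value falls outside the fences here. In the grouped version, each label gets its own fences, so a thread can be an outlier for one label but not for another.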
reddit_sentiment_rm_outlier %>%
ggplot(aes(x = bert_label_numeric, y = comments)) +
geom_jitter(height = 0, width = 0.05) +
geom_smooth(method = 'loess', span = 0.75)
## `geom_smooth()` using formula = 'y ~ x'

reddit_sentiment %>%
ggplot(aes(x = bert_label, y = word_count)) +
geom_jitter(height = 0, width = 0.05) +
stat_summary(fun = mean, geom = "crossbar", width = 0.4, color = "red")

Step 1: Topic - Changes in sentiment towards metal music over the past 5 years.
Research Objective
Analyze metalheads’ attitudes and sentiment on social media platforms like Reddit to understand if they align with the aggressive nature of the music or if the extreme music actually makes them more mild-mannered and polite.
Topic and Introduction
Heavy metal, also known simply as metal, is a genre of music characterized by its thick, monumental sound, featuring distorted guitars, extended guitar solos, emphatic beats, and loudness.
Metalheads, passionate fans of metal music, often express their rebellion against the system in peaceful terms. They value the present over the future and tend to be loyal, opinionated, and respectful individuals.
It is important to note that being a fan of violent and aggressive music does not necessarily make metalheads violent people. Studies suggest that exposure to violent music may desensitize individuals to some extent.
To better understand metalheads’ attitudes and sentiment expressed on social media, conducting a sentiment analysis on a metal music thread on platforms like Reddit can provide insights.
This analysis can help determine if metalheads’ attitudes align with the aggressive nature of the music or if the extreme music actually makes them more mild-mannered and polite individuals.
Step 2: Search Reddit threads using keyword “metal music”
References
[1] Adrian C. North and David J. Hargreaves, “Lifestyle correlates of musical preference: 1. Relationships, living arrangements, beliefs, and crime,” Psychology of Music, 2007, 35(1), 58-87.
[2] https://micaelawillers.blogspot.com/2013/06/10-characteristics-of-real-metalhead.html