Social Media and Sentiment Analysis with Reddit Threads on Metal Music
This is Major Assignment 4 in CP8883: Introduction to Urban
Analytics. The objective of this assignment is to download, analyze, and
visualize Reddit threads based on the keyword “metal music”.
I have prioritized readability and insightful analysis in
this assignment.
Step 2: Search Reddit threads using keyword “metal music”
Tips: Specifying a subreddit for search is optional. It is okay to
combine data obtained by searching the keyword across multiple
subreddits. You can choose any period, but ensure you gather a
sufficient amount of data.
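If you combine several searches, the same thread can come back more than once. Here is a minimal sketch of the combining step. It assumes each search returns a data frame with a url column, as find_thread_urls() output does. The two small data frames below are hypothetical stand-ins for real results; in practice each would come from its own find_thread_urls() call, for example with a different subreddit value.

```r
# Hypothetical stand-ins for two find_thread_urls() results;
# only the columns used here are included
threads_metal <- data.frame(
  url = c("https://www.reddit.com/r/Music/comments/a1/",
          "https://www.reddit.com/r/Metal/comments/b2/"),
  title = c("Heavy metal music", "A cool guide to metal music"),
  stringsAsFactors = FALSE
)
threads_other <- data.frame(
  url = c("https://www.reddit.com/r/Metal/comments/b2/",
          "https://www.reddit.com/r/Metalcore/comments/c3/"),
  title = c("A cool guide to metal music", "Metal music recommendations"),
  stringsAsFactors = FALSE
)

# Stack the searches, then keep only the first occurrence of each thread URL
threads_combined <- rbind(threads_metal, threads_other)
threads_combined <- threads_combined[!duplicated(threads_combined$url), ]
rownames(threads_combined) <- NULL
nrow(threads_combined)
```

Deduplicating on url rather than title keeps distinct threads that happen to share a title.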
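# wordcloud2 is installed from GitHub, only when it is not already installed
if (!requireNamespace("wordcloud2", quietly = TRUE)) {
  devtools::install_github("lchiffon/wordcloud2")
}
# Package names (sentimentr provides sentiment_by(), used in Step 6)
packages <- c("RedditExtractoR", "anytime", "magrittr", "ggplot2", "dplyr", "stringi", "tidytext", "tidyverse", "igraph", "ggraph", "tidyr", "wordcloud2", "textdata", "sentimentr", "sf", "tmap")
# Load packages
invisible(lapply(packages, library, character.only = TRUE))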
# using keyword
threads_1 <- find_thread_urls(
keywords = "metal music",
sort_by = 'relevance',
period = 'all')
colnames(threads_1)
head(threads_1)
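# Cache the search results once; the saved object must be named threads_1
# so that load() restores the name used below
# save(threads_1, file = 'metal.RData')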
load('metal.RData')
# create new column: date
threads_1 %<>%
mutate(date = as.POSIXct(date_utc)) %>%
filter(!is.na(date))
# number of threads by week
plot_threads_by_year <-
threads_1 %>%
ggplot(aes(x = date)) +
geom_histogram(color="black", position = 'stack', binwidth = 60*60*24*7) +
stat_density(geom = "line", aes(y = after_stat(scaled)), color = "red") +
scale_x_datetime(date_labels = "%Y",
breaks = seq(min(threads_1$date, na.rm = TRUE),
max(threads_1$date, na.rm = TRUE),
by = "1 year")) +
theme_minimal()
plot_threads_by_year
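ggplot2 measures a POSIXct axis in seconds, so a binwidth of 60*60*24*7 is one week. The base-R sketch below produces the same kind of weekly count without plotting. The dates are hypothetical placeholders for threads_1$date.

```r
# Hypothetical post dates standing in for threads_1$date
dates <- as.POSIXct(c("2023-01-02 10:00", "2023-01-04 18:30",
                      "2023-01-09 08:15", "2023-01-20 21:00"),
                    tz = "UTC")

# One week expressed in seconds, the binwidth used in geom_histogram()
week_seconds <- 60 * 60 * 24 * 7

# cut() with breaks = "week" starts each bin on a Monday by default
weekly_counts <- table(cut(dates, breaks = "week"))
weekly_counts
```

geom_histogram() chooses its own bin boundaries, and these need not line up with Monday-to-Sunday weeks. Individual bin counts can therefore differ slightly between the two approaches.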
Step 3: Clean and tokenize text data
# Tokenization (word tokens)
words <- threads_1 %>%
unnest_tokens(output = word, input = text, token = "words")
words %>%
  count(word, sort = TRUE) %>%
  slice_max(n, n = 20) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "words",
       y = "counts",
       title = "Top 20 words before removing stop words")
# load list of stop words - from the tidytext package
data("stop_words")
# view random 50 words
print(stop_words$word[sample(1:nrow(stop_words), 50)])
## [1] "you" "whom" "saw" "cant"
## [5] "my" "members" "unfortunately" "the"
## [9] "many" "currently" "whether" "our"
## [13] "value" "grouped" "saw" "should"
## [17] "was" "long" "take" "high"
## [21] "were" "likely" "indeed" "nobody"
## [25] "best" "it's" "same" "rather"
## [29] "of" "sometimes" "uucp" "consider"
## [33] "either" "across" "down" "for"
## [37] "been" "thereafter" "does" "happens"
## [41] "clearly" "be" "came" "another"
## [45] "hither" "way" "comes" "anybody"
## [49] "there's" "older"
# Regex that matches URLs and the escaped HTML entities for &, < and >
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"
words_clean <- threads_1 %>%
  # drop URLs and HTML entities
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  # Tokenization (word tokens)
  unnest_tokens(word, text, token = "words") %>%
  # drop stop words
  anti_join(stop_words, by = "word") %>%
  # keep only tokens that contain at least one letter (drops pure numbers)
  filter(str_detect(word, "[a-z]"))
# Check the number of rows after removal of the stop words. There should be fewer words now
print(
glue::glue("Before: {nrow(words)}, After: {nrow(words_clean)}")
)
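The base-R approximation below applies each cleaning step to one hypothetical post, so you can see what every step does. A tiny made-up stop-word list stands in for tidytext's stop_words. strsplit() is a simpler tokenizer than the one unnest_tokens() uses, so results on real text can differ.

```r
# One hypothetical Reddit post and a tiny stand-in stop-word list
post <- "Check https://example.com/list &amp; tell me: is Mastodon heavy metal? 10/10"
stop_words_toy <- c("check", "tell", "me", "is")

# URLs plus escaped HTML entities; perl = TRUE so \\d works inside [...]
url_entity_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"
cleaned <- gsub(url_entity_reg, "", post, perl = TRUE)

# Lower-case, then split on anything that is not a letter, digit, or apostrophe
tokens <- unlist(strsplit(tolower(cleaned), "[^a-z0-9']+"))
tokens <- tokens[tokens != ""]

# Drop stop words, then keep only tokens containing at least one letter
tokens_clean <- tokens[!tokens %in% stop_words_toy]
tokens_clean <- tokens_clean[grepl("[a-z]", tokens_clean)]
tokens_clean
```

The "10/10" rating disappears in the last step because neither "10" token contains a letter.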
plot_words_clean <-
  words_clean %>%
  count(word, sort = TRUE) %>%
  top_n(20, n) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip() +
  labs(x = "words",
       y = "counts",
       title = "Top 20 words after removing stop words")
plot_words_clean
Step 4: Word cloud
Instruction: Generate a word cloud that illustrates the frequency of
words other than your keyword.
n <- 20
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.6, 1) # vivid
v <- runif(n, 0.3, 0.7) # neither too dark nor too bright
df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))
plot_word_cloud <-
  words_clean %>%
  # exclude the words of the search keyword itself
  filter(!word %in% c("metal", "music")) %>%
  count(word, sort = TRUE) %>%
  wordcloud2(color = pal,
             minRotation = 0,
             maxRotation = 0,
             ellipticity = 0.8)
plot_word_cloud
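The instruction asks for frequencies that exclude the keyword. Here is a minimal base-R sketch of that counting step on hypothetical tokens. wordcloud2() expects a data frame with the words in its first column and their frequencies in its second, and that is the shape built here.

```r
# Hypothetical cleaned tokens standing in for words_clean$word
tokens <- c("metal", "music", "black", "death", "metal", "doom",
            "death", "music", "band", "death")
keyword_words <- c("metal", "music")

# Drop the keyword's own words, then count what remains, most frequent first
freq <- sort(table(tokens[!tokens %in% keyword_words]), decreasing = TRUE)
word_freq <- data.frame(word = names(freq), freq = as.integer(freq),
                        stringsAsFactors = FALSE)
word_freq
```

If the keyword is not removed, "metal" and "music" would dominate the cloud, because every thread was retrieved by searching for them.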
Step 5: Tri-gram analysis
Instruction: 1. Extract tri-grams from text data. 2. Remove tri-grams
containing stop words or non-alphabetic terms. 3. Present the frequency
of tri-grams in a table. 4. Discuss any noteworthy tri-grams you come
across.
words_ngram <- threads_1 %>%
mutate(text = str_replace_all(text, replace_reg, "")) %>%
select(text) %>%
unnest_tokens(output = paired_words,
input = text,
token = "ngrams",
n = 3)
words_ngram %>%
count(paired_words, sort = TRUE) %>%
head(10) %>%
knitr::kable()
words_ngram_pair <- words_ngram %>%
separate(paired_words, c("word1", "word2", "word3"), sep = " ")
# drop tri-grams that have a stop word in any of the three positions
words_ngram_pair_filtered <- words_ngram_pair %>%
  # drop stop words
  filter(!word1 %in% stop_words$word & !word2 %in% stop_words$word & !word3 %in% stop_words$word) %>%
  # keep only tri-grams whose three words each contain at least one letter
  filter(str_detect(word1, "[a-z]") & str_detect(word2, "[a-z]") & str_detect(word3, "[a-z]"))
# Filter out words that are not plain ASCII (e.g., emoji or accented characters);
# ASCII covers unaccented English letters, digits, and basic punctuation
words_ngram_pair_filtered %<>%
  filter(stri_enc_isascii(word1) & stri_enc_isascii(word2) & stri_enc_isascii(word3))
words_counts <- words_ngram_pair_filtered %>%
count(word1, word2, word3) %>%
arrange(desc(n))
head(words_counts, 15) %>%
knitr::kable()
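The same extract-and-filter logic can be sketched in base R on one hypothetical sentence, so that each step is visible. Consecutive word triples come from offsetting the token vector. A tri-gram survives only if none of its three words is a stop word and all three contain a letter. The stop-word list is a made-up stand-in for tidytext's stop_words.

```r
# Tokens from one hypothetical post and a tiny stand-in stop-word list
tokens <- c("i", "love", "funeral", "doom", "metal", "2023", "list")
stop_words_toy <- c("i")

# Every run of three consecutive tokens (requires at least 3 tokens)
n_tok <- length(tokens)
trigrams <- data.frame(
  word1 = tokens[1:(n_tok - 2)],
  word2 = tokens[2:(n_tok - 1)],
  word3 = tokens[3:n_tok],
  stringsAsFactors = FALSE
)

# TRUE when none of the three words is a stop word
no_stop <- with(trigrams, !word1 %in% stop_words_toy &
                          !word2 %in% stop_words_toy &
                          !word3 %in% stop_words_toy)
# TRUE when all three words contain at least one letter
has_letters <- with(trigrams, grepl("[a-z]", word1) &
                              grepl("[a-z]", word2) &
                              grepl("[a-z]", word3))

trigrams_kept <- trigrams[no_stop & has_letters, ]
trigrams_kept
```

Only "love funeral doom" and "funeral doom metal" survive. The first tri-gram starts with a stop word, and the last two contain "2023".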
plot_tri_gram_network <-
  words_counts %>%
  filter(n >= 2) %>%
  # graph_from_data_frame() treats the first two columns (word1, word2) as edge
  # endpoints; word3 and n are kept as edge attributes
  graph_from_data_frame() %>%
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_width = n), edge_alpha = 0.3) +
  geom_node_point(color = "darkred", size = 3) +
  geom_node_text(aes(label = name), vjust = 1.8) +
  labs(title = "Metal Music Thread Word Networks",
       x = "", y = "")
plot_tri_gram_network
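graph_from_data_frame() uses only the first two columns as edge endpoints. The network above therefore draws word1 -> word2 links, and word3 is stored only as an edge attribute. One possible alternative splits each tri-gram into two edges (word1 -> word2 and word2 -> word3) and sums the counts of repeated pairs. It is sketched below on hypothetical counts shaped like words_counts. This is a suggested variant, not the method used for the figure above.

```r
# Hypothetical tri-gram counts in the same shape as words_counts
words_counts_toy <- data.frame(
  word1 = c("cookie", "funeral", "doom"),
  word2 = c("monster", "doom", "metal"),
  word3 = c("vocals", "metal", "band"),
  n = c(4L, 3L, 2L),
  stringsAsFactors = FALSE
)

# Split each tri-gram into two edges: word1 -> word2 and word2 -> word3
edges <- rbind(
  data.frame(from = words_counts_toy$word1, to = words_counts_toy$word2,
             n = words_counts_toy$n, stringsAsFactors = FALSE),
  data.frame(from = words_counts_toy$word2, to = words_counts_toy$word3,
             n = words_counts_toy$n, stringsAsFactors = FALSE)
)

# The same pair can come from several tri-grams, so sum their counts
edges <- aggregate(n ~ from + to, data = edges, FUN = sum)
edges
```

Here "doom -> metal" comes from two different tri-grams, so its counts are summed to 5. The resulting edges data frame could replace words_counts as the input to graph_from_data_frame() in the plotting code.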
Discussion: noteworthy tri-grams
There is one main cluster of discussion: metal music sub-genres
(e.g., heavy metal, black metal, death metal).
It is not surprising that heavy metal sits at the center of the
cluster, since it is perhaps the best-known sub-genre of metal. The
words linked to heavy metal focus on musical techniques and elements
such as tempo, blast, breakdown, and screaming, which reflect the fast
and loud character typical of metal.
The term “cookie monster” comes from the nickname “Cookie Monster
vocals” for brutal growls: the deep, guttural vocals in metal music
sound similar to the voice of the blue Cookie Monster character.
Among the smaller connected nodes, Leviathan-Mastodon and
Sunbather-Deafheaven link albums to the bands that released them.
Funeral doom also appears; it is another sub-genre of metal.
The tri-gram analysis suggests that metal music threads center on the
music itself, with a focus on taxonomy (sub-genres), albums, and
bands.
Step 6: Sentiment analysis with dictionary method and BERT
reddit_sentiment <- read_csv('metal_reddit_bert.csv') %>%
drop_na('bert_label')
## New names:
## • `` -> `...1`
## Rows: 228 Columns: 11
## ── Column specification ────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): Unnamed: 0, title, text, subreddit, url, bert_label
## dbl  (4): ...1, timestamp, comments, bert_score
## date (1): date_utc
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
For comparison, compute dictionary-based sentiment scores with sentiment_by() from the sentimentr package.
reddit_sentiment %<>%
mutate(title = replace_na(title, ""),
text = replace_na(text, ""),
title_text = str_c(title, text, sep = ". "))
reddit_sentiment_dictionary <- sentiment_by(reddit_sentiment$title_text)
reddit_sentiment$sentiment_dict <- reddit_sentiment_dictionary %>% pull(ave_sentiment)
reddit_sentiment$word_count <- reddit_sentiment_dictionary %>% pull(word_count)
Check the correlation between the sentiment values produced by the two
methods.
reddit_sentiment %<>% mutate(bert_label_numeric = str_sub(bert_label, 1, 1) %>% as.numeric())
cor(reddit_sentiment$bert_label_numeric, reddit_sentiment$sentiment_dict)
## [1] 0.3124655
A Pearson correlation of 0.31 indicates a weak-to-moderate positive
relationship, so the two methods only partly agree.
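The BERT labels are ordinal star ratings, so a rank-based Spearman correlation is a natural companion to the Pearson value. The minimal sketch below uses hypothetical labels and scores. It assumes labels of the form "1 star" to "5 stars", which is consistent with the str_sub(bert_label, 1, 1) parsing used above.

```r
# Hypothetical BERT labels and dictionary scores
bert_label <- c("1 star", "2 stars", "3 stars", "4 stars", "5 stars", "5 stars")
sentiment_dict <- c(-0.40, -0.10, 0.05, 0.20, 0.15, 0.60)

# Same parsing as above: the first character of the label, as a number
bert_numeric <- as.numeric(substr(bert_label, 1, 1))

# Pearson measures linear association; Spearman compares ranks only
pearson <- cor(bert_numeric, sentiment_dict)
spearman <- cor(bert_numeric, sentiment_dict, method = "spearman")
c(pearson = pearson, spearman = spearman)
```

On the real data, the equivalent call is cor(reddit_sentiment$bert_label_numeric, reddit_sentiment$sentiment_dict, method = "spearman"). cor.test() with method = "spearman" also reports a p-value, and it warns that an exact p-value cannot be computed when ties are present.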
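ggplot(data = reddit_sentiment, aes(x = bert_label_numeric, y = sentiment_dict)) +
  geom_jitter(width = 0.1, height = 0) +
  # dashed reference line at a dictionary score of zero
  geom_hline(yintercept = 0, color = 'darkolivegreen', linewidth = 1, linetype = 'dashed') +
  labs(x = "BERT label (1 = most negative, 5 = most positive)",
       y = "Dictionary sentiment (sentimentr)")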

sentimentr_example <- reddit_sentiment %>%
mutate(sentimentr_abs = abs(sentiment_dict),
sentimentr_binary = case_when(sentiment_dict > 0 ~ 'positive',
TRUE ~ 'negative')) %>%
group_by(sentimentr_binary) %>%
arrange(desc(sentimentr_abs)) %>%
slice_head(n = 10) %>%
ungroup() %>%
arrange(sentiment_dict)
# positive
sentimentr_example %>% filter(sentimentr_binary == 'positive') %>% pull(title_text, sentiment_dict) %>% print()
## 0.54
## "*Heavy metal music intensifies*. "
## 0.54
## "*Heavy Metal Music Intensifies*. "
## 0.54
## "*heavy metal music intensifies. "
## 0.615777318283033
## "Rock/heavy metal music with positive lyrics?. I love music and I\031m trying to transition into more uplifting songs. I want to combine my love of rock with positive lyrics. I\031m looking for more stuff like TOOL\031s Parabola. Anything helps. Thanks :)"
## 0.620651539378848
## "If it helps, my favorite genres are rock/metal and electronic music. "
## 0.623538290724796
## "Heavy metal music. "
## 0.796084166404533
## "A cool guide to metal music.. "
## 0.841872912024137
## "A cool guide I made to introduce classical music to rock and metal fans. "
## 0.992043345827187
## "Heavy metal music isn\031t good. "
## 1.01823376490863
## "Liking heavy metal music linked to high intelligence. "
# negative
sentimentr_example %>% filter(sentimentr_binary == 'negative') %>% pull(title_text, sentiment_dict) %>% print()
## -0.521775813927782
## "Suicidal Tendencies - Institutionalized - Official Music Video [Hardcore Punk/Thrash Metal]. "
## -0.521159421729228
## "We\031re now at the \034metal music is evil\035 stage of the Satanic panic. "
## -0.41576092031015
## "People who don't like metal music, why?. "
## -0.347011046894284
## "Communism is when no metal music!. "
## -0.305340941294474
## "Most high-profile metal music is just as overproduced as pop music.. Triggered, quantized drums that sound fake as hell, sanitized guitar tone, and not-so-transparent compression makes a lot of modern metal sound less heavy.\n\nNuclear Blast is notorious for this. Suffocation's latest album sounds horrible. Somehow Behemoth has fantastic organic production, but Cradle of Filth is plasticky as hell.\n\nUnderground death and black metal acts are keeping real performances and organic recordings alive, but Amon Amarth and the likes sound incredibly fake and predictable."
## -0.290625
## "Not directly related to metal music, but what is this sub\031s opinion on Weird Al?. "
## -0.28939387817473
## "First time hitchhiking with a dead person (hearse). And of course the mortician was into metal music!. "
## -0.2
## "\034God awful metal music\035. "
## -0.14142135623731
## "On the hypocrisy of the metal music industry.. "
## -0.14142135623731
## "On the hypocrisy of the metal music industry.. "
data("stop_words")
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"
reddit_sentiment_clean <- reddit_sentiment %>%
mutate(title_text = str_replace_all(title_text, replace_reg, "")) %>%
unnest_tokens(word, title_text, token = "words") %>%
anti_join(stop_words, by = "word") %>%
filter(str_detect(word, "[a-z]")) %>%
filter(!word %in% c('metal','music'))
Words that appear in both positive and negative threads say little
about either group, so we keep only the words unique to one group,
using anti_join(). Threads that BERT rates 1-2 count as negative and
those rated 4-5 as positive; neutral threads (3) are excluded.
reddit_sentiment_clean_negative <- reddit_sentiment_clean %>%
filter(bert_label_numeric %in% c(1,2))
reddit_sentiment_clean_positive <- reddit_sentiment_clean %>%
filter(bert_label_numeric %in% c(4,5))
reddit_sentiment_clean_negative_unique <- reddit_sentiment_clean_negative %>%
anti_join(reddit_sentiment_clean_positive, by = 'word')
reddit_sentiment_clean_positive_unique <- reddit_sentiment_clean_positive %>%
anti_join(reddit_sentiment_clean_negative, by = 'word')
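anti_join() keeps every row of its first argument whose key has no match in the second. Repeated words therefore survive, and their counts still drive the word cloud. The base-R sketch below applies the same logic to hypothetical word vectors. setdiff(), by contrast, would collapse the repeats.

```r
# Hypothetical cleaned words from negative (1-2) and positive (4-5) threads
negative_words <- c("evil", "fake", "album", "production", "fake")
positive_words <- c("cool", "guide", "album", "love")

# Keep each occurrence whose word never appears in the other group,
# mirroring anti_join(x, y, by = "word")
negative_unique <- negative_words[!negative_words %in% positive_words]
positive_unique <- positive_words[!positive_words %in% negative_words]

table(negative_unique)
```

"album" drops out of both groups because it appears on both sides. "fake" keeps both of its occurrences.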
- Words appearing only in positive threads
plot_positive_word_cloud <-
reddit_sentiment_clean_positive_unique %>%
count(word, sort = TRUE) %>%
wordcloud2(color = pal,
minRotation = 0,
maxRotation = 0,
ellipticity = 0.8)
plot_positive_word_cloud
- Words appearing only in negative threads
n <- 20
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.6, 1) # vivid
v <- runif(n, 0.3, 0.7) # neither too dark nor too bright
df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))
plot_negative_word_cloud <-
reddit_sentiment_clean_negative_unique %>%
count(word, sort = TRUE) %>%
wordcloud2(color = pal,
minRotation = 0,
maxRotation = 0,
ellipticity = 0.8)
plot_negative_word_cloud
Step 7: Credibility evaluation
Instruction: Display 10 sample texts alongside their sentiment scores
and evaluate the credibility of the sentiment analysis outcomes.
Band and album names can skew the results of dictionary-based
sentiment analysis: even neutral release announcements that mention
such names may be scored as negative.
For example, the most negative sample, “Suicidal Tendencies -
Institutionalized - Official Music Video [Hardcore Punk/Thrash Metal]”,
announces a music video that metalheads would likely find exciting, yet
it receives a dictionary score of about -0.52.
The first three positive samples are not exact duplicates: they differ
in capitalization and punctuation, which suggests that similar threads
were posted three times.
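One way to probe, and partly correct, this effect is to mask known band and album names before scoring. The minimal sketch below assumes a hand-maintained list of names. The four-word lexicon and the averaging scorer are hypothetical toys, not sentimentr's lexicon or its sentence-level algorithm.

```r
# A tiny hypothetical polarity lexicon (NOT sentimentr's lexicon)
lexicon <- c(suicidal = -1, evil = -1, cool = 1, love = 1)

# Average polarity of the words found in the lexicon; 0 if none are found
score_text <- function(txt) {
  tokens <- unlist(strsplit(tolower(txt), "[^a-z']+"))
  hits <- lexicon[tokens[tokens %in% names(lexicon)]]
  if (length(hits) == 0) 0 else mean(hits)
}

# Replace each listed band/album name (case-sensitive, literal match)
mask_entities <- function(txt, entities) {
  for (entity in entities) {
    txt <- gsub(entity, "BANDNAME", txt, fixed = TRUE)
  }
  txt
}

# Hypothetical hand-maintained list of names to mask
entity_names <- c("Suicidal Tendencies", "Institutionalized")

title <- "Suicidal Tendencies - Institutionalized - Official Music Video"
score_raw <- score_text(title)
score_masked <- score_text(mask_entities(title, entity_names))
c(raw = score_raw, masked = score_masked)
```

With sentimentr, the masked titles would be passed to sentiment_by() in place of title_text. Whether masking improves agreement with BERT would need to be checked on the actual data.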
Step 8: Insights and visualization
Instruction: Discuss intriguing insights derived from the sentiment
analysis, supporting your observations with at least two plots.
Step 1: Topic - Changes in sentiment towards metal music over the past 5 years.
Research Objective
Analyze metalheads’ attitudes and sentiment on social media platforms like Reddit to understand whether they align with the aggressive nature of the music or whether the extreme music actually makes them more mild-mannered and polite.
Topic and Introduction
Heavy metal, also known simply as metal, is a genre of music characterized by its thick, monumental sound, featuring distorted guitars, extended guitar solos, emphatic beats, and loudness.
Metalheads, the passionate fans of metal music, are often described as expressing their rebellion against the system in peaceful terms, valuing the present over the future, and being loyal, opinionated, and respectful individuals.
It is important to note that being a fan of violent and aggressive music does not necessarily make metalheads violent people. Studies suggest that exposure to violent music may desensitize individuals to some extent.
To better understand the attitudes and sentiment that metalheads express on social media, a sentiment analysis of metal music threads on Reddit can provide insight.
In particular, it can help determine whether metalheads’ attitudes align with the aggressive nature of the music or whether the extreme music actually makes them more mild-mannered and polite.
References
[1] A. C. North and D. J. Hargreaves, “Lifestyle correlates of musical preference: 1. Relationships, living arrangements, beliefs, and crime,” Psychology of Music, vol. 35, no. 1, pp. 58-87, 2007.
[2] https://micaelawillers.blogspot.com/2013/06/10-characteristics-of-real-metalhead.html