# Instructions

In this assignment, you will download, analyze, and visualize Reddit threads based on a keyword of your choice. Specifically, you will perform the following steps:

1. Describe in one sentence what you aim to examine using user-generated text data and sentiment analysis, e.g., changes in sentiment towards Elon Musk over the past 12 months.
2. Search Reddit threads using a keyword of your choice.
3. Clean your text data and then tokenize it.
4. Generate a word cloud that illustrates the frequency of words excluding your keyword.
5. Conduct a tri-gram analysis.
6. Perform a sentiment analysis on your text data using a dictionary method that accommodates negations.
7. Display 10 sample texts alongside their sentiment scores and evaluate the credibility of the sentiment analysis outcomes.
8. Discuss intriguing insights derived from the sentiment analysis, supporting your observations with at least THREE plots.
# Package names
# knitr, glue, and stringi are included because later chunks call them
packages <- c("RedditExtractoR", "anytime", "magrittr", "httr", "tidytext", "tidyverse", "igraph", "ggraph", "wordcloud2", "textdata", "here", "jsonlite", "syuzhet", "dplyr", "sentimentr", "ggplot2", "stringr", "devtools", "htmltools", "knitr", "glue", "stringi")
# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(!installed_packages)) {
  install.packages(packages[!installed_packages])
}
# Load packages
invisible(lapply(packages, library, character.only = TRUE))
I aim to examine public reactions to the latest Marvel movie on Reddit by analyzing user-generated text data through sentiment analysis.
# using keyword
threads_1_f <- find_thread_urls(keywords = 'Guardians of the Galaxy Vol. 3',
sort_by = 'relevance',
period = 'all') %>%
drop_na()
rownames(threads_1_f) <- NULL
# Sanitize text
threads_1_f %<>%
mutate(across(
where(is.character),
~ .x %>%
str_replace_all("\\|", "/") %>% # replace vertical bars
str_replace_all("\\n", " ") %>% # replace newlines
str_squish() # clean up extra spaces
))
colnames(threads_1_f)
head(threads_1_f, 3) %>% knitr::kable()
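The same sanitizing step is applied to several data frames in this report, so it could be wrapped in a small helper. The sketch below is only an illustration under that assumption: `sanitize_text()` and the toy data frame are hypothetical names and are not part of RedditExtractoR.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: applies the same clean-up to every character column.
sanitize_text <- function(df) {
  df %>%
    mutate(across(
      where(is.character),
      ~ .x %>%
        str_replace_all("\\|", "/") %>%  # replace vertical bars
        str_replace_all("\\n", " ") %>%  # replace newlines with spaces
        str_squish()                     # trim and collapse repeated whitespace
    ))
}

# Toy input standing in for a data frame returned by find_thread_urls()
toy <- data.frame(
  title = c("A | B", "  spaced   out  "),
  text = c("line one\nline two", "ok"),
  comments = c(3L, 5L),
  stringsAsFactors = FALSE
)

clean <- sanitize_text(toy)
print(clean)
```

Non-character columns such as `comments` pass through unchanged, because `where(is.character)` selects only the text columns.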
Next, I use find_subreddits() to list the subreddits related to the keyword.
# search for subreddits
subreddit_list <- RedditExtractoR::find_subreddits('Guardians of the Galaxy Vol. 3')
subreddit_list %>%
arrange(desc(subscribers)) %>%
.[1:25,c('subreddit','title','subscribers')] %>%
knitr::kable()
threads_1_f$subreddit %>% table() %>% sort(decreasing = TRUE) %>% head(20)
Next, I retrieve the past year's top threads from the subreddits related to Guardians of the Galaxy Vol. 3.
# using subreddit
threads_2_f <- find_thread_urls(subreddit = c('marvelstudios', 'boxoffice', 'MarvelStudiosSpoilers', 'movies', 'shittymoviedetails', 'Marvel', 'comicbooks', 'DC_Cinematic', 'marvelmemes', 'MarvelStudios_Rumours'),
sort_by = 'top',
period = 'year') %>%
drop_na()
rownames(threads_2_f) <- NULL
# Sanitize text
threads_2_f %<>%
mutate(across(
where(is.character),
~ .x %>%
str_replace_all("\\|", "/") %>%
str_replace_all("\\n", " ") %>%
str_squish()
))
head(threads_2_f, 3) %>% knitr::kable()
Searching by both the keyword and the subreddits.
# using both subreddit and keyword
threads_3_f <- find_thread_urls(keywords= 'Guardians of the Galaxy Vol. 3',
subreddit = c('marvelstudios', 'boxoffice', 'MarvelStudiosSpoilers', 'movies', 'shittymoviedetails', 'Marvel', 'comicbooks', 'DC_Cinematic', 'marvelmemes', 'MarvelStudios_Rumours'),
sort_by = 'relevance',
period = 'all') %>%
drop_na()
rownames(threads_3_f) <- NULL
# Sanitize text
threads_3_f %<>%
mutate(across(
where(is.character),
~ .x %>%
str_replace_all("\\|", "/") %>%
str_replace_all("\\n", " ") %>%
str_squish()
))
head(threads_3_f, 3) %>% knitr::kable()
# get individual comments
threads_1_content <- get_thread_content(threads_1_f$url[1:4])
threads_2_content <- get_thread_content(threads_2_f$url[1:4])
threads_3_content <- get_thread_content(threads_3_f$url[1:4])
names(threads_2_content)
# check upvotes and downvotes
print(threads_2_content$threads[,c('upvotes','downvotes','up_ratio')])
load("threads_2_f.RData")
load("threads_2_content.RData")
# Sanitize text
threads_2_content$comments %<>%
mutate(across(
where(is.character),
~ .x %>%
str_replace_all("\\|", "/") %>%
str_replace_all("\\n", " ") %>%
str_squish()
))
head(threads_2_content$comments, 3) %>% knitr::kable()
# Save each data frame to a RData
# save() needs the path passed as `file =`; an unnamed string is read as an object name
save(threads_1_f, file = "threads_1_f.RData")
save(threads_1_content, file = "threads_1_content.RData")
save(threads_2_f, file = "threads_2_f.RData")
save(threads_2_content, file = "threads_2_content.RData")
save(threads_3_f, file = "threads_3_f.RData")
save(threads_3_content, file = "threads_3_content.RData")
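Caching the scraped data avoids calling Reddit again on every knit. A minimal round-trip sketch, assuming a toy data frame and temporary files (all names below are illustrative): `save()` takes the output path in its `file` argument, and `load()` restores the object under its original name. `saveRDS()`/`readRDS()` is an alternative when only one object is stored.

```r
# Toy object standing in for one of the scraped data frames
toy_threads <- data.frame(title = c("a", "b"), comments = c(1L, 2L))
original <- toy_threads

# save() / load(): the object comes back under its original name
tmp_file <- tempfile(fileext = ".RData")
save(toy_threads, file = tmp_file)
rm(toy_threads)
loaded_names <- load(tmp_file)  # returns the names of the restored objects
print(loaded_names)

# saveRDS() / readRDS(): one object per file, assigned to any name on reading
rds_file <- tempfile(fileext = ".rds")
saveRDS(original, rds_file)
restored <- readRDS(rds_file)
```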
# Word tokenization
words <- threads_2_f %>%
unnest_tokens(output = word, input = text, token = "words") # run `?tidytext::unnest_tokens` on the console
words %>%
count(word, sort = TRUE) %>%
top_n(20) %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(x = word, y = n)) +
geom_col() +
xlab(NULL) +
coord_flip() +
labs(x = "words",
y = "counts",
title = "Unique wordcounts")
## Selecting by n
Next, I remove stop words using the stop_words dataset built into the tidytext package.
# load list of stop words - from the tidytext package
data("stop_words")
# view 100 random stop words
print(stop_words$word[sample(1:nrow(stop_words), 100)])
## [1] "generally" "see" "etc" "and"
## [5] "we" "yours" "gave" "can"
## [9] "face" "make" "weren't" "which"
## [13] "like" "except" "those" "become"
## [17] "new" "yourself" "formerly" "he'd"
## [21] "sensible" "latter" "consequently" "more"
## [25] "made" "looking" "young" "uses"
## [29] "want" "go" "you'll" "away"
## [33] "used" "would" "did" "inner"
## [37] "we've" "goods" "keeps" "don't"
## [41] "point" "him" "whom" "yourself"
## [45] "even" "we've" "everywhere" "themselves"
## [49] "others" "various" "does" "everyone"
## [53] "certain" "almost" "man" "corresponding"
## [57] "through" "same" "as" "theirs"
## [61] "co" "used" "high" "seeming"
## [65] "doesn't" "perhaps" "until" "six"
## [69] "seeing" "every" "currently" "well"
## [73] "few" "thanks" "really" "been"
## [77] "little" "shouldn't" "turned" "allows"
## [81] "both" "anyone" "wherein" "present"
## [85] "already" "sub" "a" "neither"
## [89] "theres" "p" "per" "nd"
## [93] "needing" "later" "being" "apart"
## [97] "how" "more" "back" "right"
The anti_join() function was used to remove the stop words from the text, which left us with a cleaned set of words.
# Regex that matches URLs and HTML-escaped characters (&amp;, &lt;, &gt;)
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"
words_clean <- threads_2_f %>%
# drop URLs
mutate(text = str_replace_all(text, replace_reg, "")) %>%
# Tokenization (word tokens)
unnest_tokens(word, text, token = "words") %>%
# drop stop words
anti_join(stop_words, by = "word") %>%
# keep only tokens that contain at least one letter (drops pure numbers)
filter(str_detect(word, "[a-z]"))
# Check the number of rows after removal of the stop words. There should be fewer words now
print(
glue::glue("Before: {nrow(words)}, After: {nrow(words_clean)}")
)
## Before: 7240, After: 2585
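To put the two counts in proportion, the share of tokens removed can be computed directly. This is a small arithmetic sketch using the numbers printed above. Note that the reduction reflects the URL and non-letter filters as well as the stop words.

```r
n_before <- 7240  # rows in `words`, taken from the output above
n_after  <- 2585  # rows in `words_clean`, taken from the output above

removed <- n_before - n_after
share_removed <- removed / n_before

cat(sprintf("Removed %d tokens (%.1f%% of the original)\n",
            removed, 100 * share_removed))
# prints: Removed 4655 tokens (64.3% of the original)
```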
With the stop words removed, the same plot is redrawn. The most frequent words are now the meaningful ones rather than common function words.
words_clean %>%
count(word, sort = TRUE) %>%
top_n(20, n) %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(x = word, y = n)) +
geom_col() +
xlab(NULL) +
coord_flip() +
labs(x = "words",
y = "counts",
title = "Unique wordcounts")
The word clouds below compare word frequencies before and after removing stop words.
knitr::opts_chunk$set(widgetframe=FALSE)
# words %>%
# count(word, sort = TRUE) %>%
# wordcloud2()
wc1 <- words %>%
count(word, sort = TRUE) %>%
wordcloud2()
htmltools::tagList(wc1)
words_clean %>%
count(word, sort = TRUE) %>%
wordcloud2()
knitr::include_graphics("C:/Users/akaamah3/Documents/SCaRP Course Materials_Fall2025/Into_to_Urban_Analytics/CP8883_working_with_R/1.png")
The word clouds generated above look nice, but their color schemes can be a bit overwhelming. The following block of code therefore creates a custom color palette that highlights a selected number of words and grays out the rest. The random colors are generated using the HSV (Hue, Saturation, Value) color model.
n <- 20 # number of words with color
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.6, 1) # vivid
v <- runif(n, 0.3, 0.7) # neither too dark nor too bright
df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))
words_clean %>%
count(word, sort = TRUE) %>%
wordcloud2(color = pal,
minRotation = 0,
maxRotation = 0,
ellipticity = 0.8)
knitr::include_graphics("C:/Users/akaamah3/Documents/SCaRP Course Materials_Fall2025/Into_to_Urban_Analytics/CP8883_working_with_R/2.png")
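Because the palette is drawn with `runif()`, the highlighted colors change on every knit. A minimal sketch of a reproducible variant, assuming a fixed seed is acceptable: `make_palette()` is a hypothetical helper and the seed value 42 is arbitrary. It also passes the vectors to `hsv()` directly, since `hsv()` is vectorised.

```r
# Hypothetical helper: same HSV ranges as above, but reproducible via a seed.
# Note that set.seed() also affects any later random draws in the session.
make_palette <- function(n_colored = 20, n_grey = 10000, seed = 42) {
  set.seed(seed)
  h <- runif(n_colored, 0, 1)      # any hue
  s <- runif(n_colored, 0.6, 1)    # vivid
  v <- runif(n_colored, 0.3, 0.7)  # neither too dark nor too bright
  c(hsv(h, s, v), rep("grey", n_grey))
}

pal <- make_palette()
head(pal, 3)
```

The resulting `pal` can be passed to `wordcloud2(color = pal, ...)` in the same way as the palette built above.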
# Get trigrams.
words_trigram <- threads_2_f %>%
mutate(text = str_replace_all(text, replace_reg, "")) %>%
select(text) %>%
unnest_tokens(output = trigram,
input = text,
token = "ngrams",
n = 3)
# Show trigram with sorted values
words_trigram %>%
count(trigram, sort = TRUE) %>%
head(20) %>%
knitr::kable()
| trigram | n |
|---|---|
| NA | 177 |
| did not like | 11 |
| a lot of | 9 |
| one of the | 8 |
| 10 liked it | 6 |
| brave new world | 6 |
| america brave new | 5 |
| was able to | 5 |
| 10 loved it | 4 |
| 7 10 liked | 4 |
| captain america brave | 4 |
| do you think | 4 |
| he didn t | 4 |
| in this movie | 4 |
| loved it liked | 4 |
| of the best | 4 |
| pov shown in | 4 |
| 10 did not | 3 |
| 9 10 loved | 3 |
| able to build | 3 |
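The top row of the table is `NA`. This most likely comes from posts whose text is empty or has fewer than three words, for which no tri-gram can be formed. A minimal sketch of dropping those rows before counting, using a toy tibble as a stand-in for `threads_2_f` (the toy data and object names are illustrative):

```r
library(dplyr)
library(tidytext)

# Toy posts: one long enough for tri-grams, one too short, one empty
toy <- tibble(text = c("did not like it", "nice", ""))

trigrams <- toy %>%
  unnest_tokens(output = trigram, input = text, token = "ngrams", n = 3)

# Drop missing tri-grams before counting
trigrams_non_missing <- trigrams %>%
  filter(!is.na(trigram))

trigrams_non_missing %>%
  count(trigram, sort = TRUE)
```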
# Separate each trigram into three columns
words_trigram_sep <- words_trigram %>%
separate(trigram, into = c("word1", "word2", "word3"), sep = " ")
library(stringi)
words_trigram_filtered <- words_trigram_sep %>%
filter(!word1 %in% stop_words$word &
!word2 %in% stop_words$word &
!word3 %in% stop_words$word) %>%
filter(str_detect(word1, "[a-z]") &
str_detect(word2, "[a-z]") &
str_detect(word3, "[a-z]")) %>%
filter(stri_enc_isascii(word1) &
stri_enc_isascii(word2) &
stri_enc_isascii(word3))
# Sort the new trigram (n=3) counts:
trigram_counts <- words_trigram_filtered %>%
count(word1, word2, word3) %>%
arrange(desc(n))
head(trigram_counts, 20) %>%
knitr::kable()
| word1 | word2 | word3 | n |
|---|---|---|---|
| captain | america | brave | 4 |
| america | civil | war | 2 |
| avengers | infinity | war | 2 |
| captain | america | civil | 2 |
| disney | marvel | captain | 2 |
| marvel | captain | america | 2 |
| news | disney | marvel | 2 |
| random | unknown | actor | 2 |
| rob | zombie | movie | 2 |
| absurd | energy | source | 1 |
| abusive | family | basically | 1 |
| acknowledge | killing | battlestar | 1 |
| actual | people | speak | 1 |
| ad | revenue | fake | 1 |
| aged | white | dude | 1 |
| aging | scientist | father | 1 |
| ago | spoiling | plot | 1 |
| ahem | red | hulk | 1 |
| america | anthony | mackie | 1 |
| anthony | mackie | captain | 1 |
The most frequent tri-grams (captain america brave, captain america civil, marvel captain america) point to a strong thematic focus on Captain America as a character and hero figure. Tri-grams such as america civil war and avengers infinity war suggest that users are mentioning movie titles or reacting to events in those movies. Tri-grams such as disney marvel captain and news disney marvel point to discussions involving Marvel Studios and Disney, possibly referencing announcements, news updates, or movie promotions. The counts are small (at most 4), so these patterns should be read as tentative.
# Get ngrams. You may try playing around with the value of n, n=3, n=4
words_ngram <- threads_2_f %>%
mutate(text = str_replace_all(text, replace_reg, "")) %>%
select(text) %>%
unnest_tokens(output = paired_words,
input = text,
token = "ngrams",
n = 2)
# Showing bi-grams with sorted values
words_ngram %>%
count(paired_words, sort = TRUE) %>%
head(20) %>%
knitr::kable()
| paired_words | n |
|---|---|
| NA | 175 |
| of the | 32 |
| did not | 20 |
| and i | 19 |
| it s | 19 |
| in the | 18 |
| to the | 14 |
| captain america | 13 |
| didn t | 13 |
| it was | 13 |
| not like | 12 |
| the best | 12 |
| the mcu | 12 |
| a lot | 11 |
| i m | 11 |
| one of | 11 |
| this movie | 11 |
| and the | 10 |
| i think | 10 |
| really liked | 10 |
# Separate the paired words into two columns
words_ngram_pair <- words_ngram %>%
separate(paired_words, c("word1", "word2"), sep = " ")
# filter rows where there are stop words under word 1 column and word 2 column
words_ngram_pair_filtered <- words_ngram_pair %>%
# drop stop words
filter(!word1 %in% stop_words$word & !word2 %in% stop_words$word) %>%
# keep only pairs in which both words contain at least one letter
filter(str_detect(word1, "[a-z]") & str_detect(word2, "[a-z]"))
# Filter out words that are not encoded in ASCII
# To see what's ASCII, google 'ASCII table'
library(stringi)
words_ngram_pair_filtered %<>%
filter(stri_enc_isascii(word1) & stri_enc_isascii(word2))
# Sort the new bi-gram (n=2) counts:
words_counts <- words_ngram_pair_filtered %>%
count(word1, word2) %>%
arrange(desc(n))
head(words_counts, 20) %>%
knitr::kable()
| word1 | word2 | n |
|---|---|---|
| captain | america | 13 |
| tony | stark | 8 |
| infinity | war | 6 |
| america | brave | 5 |
| civil | war | 5 |
| avengers | endgame | 4 |
| pov | shown | 4 |
| action | scenes | 3 |
| black | panther | 3 |
| comic | book | 3 |
| gonna | ruin | 3 |
| marvel | movies | 3 |
| america | civil | 2 |
| anthony | mackie | 2 |
| avengers | infinity | 2 |
| blade | movie | 2 |
| captain | marvel | 2 |
| chris | evans | 2 |
| christian | bale | 2 |
| comic | accurate | 2 |
# plot word network
words_counts %>%
filter(n >= 3) %>%
graph_from_data_frame() %>% # convert to graph
ggraph(layout = "fr") +
geom_edge_link(aes(edge_width = n), edge_alpha = 0.6) + # constant alpha belongs outside aes()
geom_node_point(color = "darkslategray4", size = 3) +
geom_node_text(aes(label = name), vjust = 1.8) +
labs(title = "Word Networks",
x = "", y = "")
## Warning: The `trans` argument of `continuous_scale()` is deprecated as of ggplot2 3.5.0.
## ℹ Please use the `transform` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# syuzhet package
get_sentiment(threads_2_f$text, method='syuzhet')
## [1] 0.00 3.85 2.95 -8.00 0.00 0.00 0.00 8.15 0.00 0.00 0.00 0.00
## [13] 0.00 0.00 0.00 0.00 0.00 2.40 5.20 1.75 0.00 0.00 0.00 1.00
## [25] 0.00 0.00 0.00 3.40 0.00 2.05 0.00 0.00 2.40 0.00 0.00 0.15
## [37] 0.00 -0.75 0.00 0.00 0.00 -0.60 7.40 0.00 0.00 0.00 0.00 0.50
## [49] 0.00 0.00 2.30 0.00 0.00 0.00 0.00 0.00 0.00 7.80 0.00 0.00
## [61] 0.00 0.00 0.00 0.00 0.00 0.00 2.90 2.05 0.00 1.30 0.00 -1.65
## [73] 0.00 0.00 0.00 4.00 0.05 0.00 0.00 0.00 1.05 0.00 0.00 0.00
## [85] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [97] 0.00 0.00 2.70 1.25 0.00 0.00 3.35 2.85 0.00 0.00 0.00 0.00
## [109] 0.00 2.35 0.00 0.00 1.30 0.00 0.00 2.30 0.00 4.00 0.00 0.00
## [121] 2.75 0.00 0.00 0.00 2.55 -2.70 1.15 0.00 -0.25 0.00 0.00 0.00
## [133] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.75 0.00 6.20 0.00
## [145] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.35 0.00 0.00 0.00 0.00
## [157] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [169] 0.00 0.00 -0.50 1.55 0.00 0.60 0.00 0.00 0.00 0.00 0.00 0.00
## [181] 0.00 1.30 12.10 0.00 -5.30 0.00 -1.10 0.00 0.00 0.00 -1.40 0.00
## [193] 0.00 0.10 0.00 -1.25 -0.35 0.00 0.00 0.00 0.00 -0.55 0.00 0.00
## [205] 0.00 0.00 1.65 0.00 0.00 0.00 0.80 0.00 0.00 0.80 0.00 0.00
## [217] 0.00 0.00 0.00 -0.45 0.00 0.00 0.00 0.25 4.00 0.00 0.00 0.00
## [229] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## [241] 0.00 1.55 0.00 0.00 0.90 0.00 0.00 -3.35 0.00
get_sentiment(threads_2_f$text, method='bing')
## [1] 0 1 3 -7 0 0 0 8 0 0 0 0 0 0 0 0 0 0 1 4 0 0 0 2 0
## [26] 0 0 5 0 1 0 0 0 0 0 0 0 -1 0 0 0 -1 2 0 0 0 0 1 0 0
## [51] 5 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 3 -1 0 1 0 0 0 0 0
## [76] 3 2 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0
## [101] 0 0 -2 1 0 0 0 0 0 3 0 0 1 0 0 4 0 4 0 0 2 0 0 0 3
## [126] -6 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 2 0 2 0 0 0 0 0 0 0
## [151] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0
## [176] 0 0 0 0 0 0 1 12 0 -5 0 -1 0 0 0 -1 0 0 1 0 4 -1 0 0 0
## [201] 0 -1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 -6 0 0 0 0 3
## [226] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 -2 0
get_sentiment(threads_2_f$text, method='afinn')
## [1] 0 5 8 -20 0 0 0 20 0 0 0 0 0 0 -2 0 0 -4
## [19] 5 8 0 0 0 12 0 0 0 14 0 7 0 0 6 0 0 5
## [37] 0 0 0 0 0 0 13 0 0 0 0 2 0 0 7 0 0 0
## [55] 0 0 0 21 0 0 0 0 0 0 0 0 5 -4 0 2 0 1
## [73] 0 0 0 13 -1 0 0 0 2 0 0 0 0 0 0 0 0 0
## [91] 0 0 0 3 0 0 0 0 4 2 0 0 10 3 0 0 0 0
## [109] 1 12 0 0 1 0 0 13 0 6 0 0 10 0 0 0 5 -3
## [127] 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 3 0 15 0
## [145] 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0
## [163] 0 0 0 0 0 0 0 0 -1 7 0 2 0 0 0 0 0 0
## [181] 0 4 14 0 -18 0 0 0 0 0 -1 0 0 0 0 -14 0 0
## [199] 0 0 0 -3 0 0 0 0 -2 0 0 0 0 0 0 1 0 0
## [217] 0 0 0 5 0 0 0 2 14 0 0 0 0 0 0 0 0 0
## [235] 0 0 0 0 0 0 0 3 0 0 4 0 0 -3 0
get_sentiment(threads_2_f$text, method='nrc')
## [1] 0 3 2 -8 0 0 0 8 0 0 0 0 0 0 0 0 0 3 8 2 0 0 0 -3 0
## [26] 0 0 8 0 2 0 0 2 0 1 1 0 -1 0 0 0 -1 13 0 0 0 0 -1 0 0
## [51] 1 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 2 2 0 0 0 1 0 0 0
## [76] 9 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3
## [101] 0 0 8 -1 0 0 0 0 0 7 0 0 0 0 0 3 0 3 0 0 3 0 0 0 3
## [126] 1 4 0 -1 0 0 0 0 0 0 0 0 0 0 1 0 0 5 0 0 0 0 0 0 0
## [151] 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 1 0 1 0
## [176] 0 0 0 0 0 0 1 8 0 -6 0 -1 0 0 0 0 0 0 1 0 -4 -1 0 0 0
## [201] 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 4
## [226] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 -5 0
get_nrc_sentiment(threads_2_f$text)
## anger anticipation disgust fear joy sadness surprise trust negative
## 1 0 0 0 0 0 0 0 0 0
## 2 7 11 3 9 6 5 4 14 16
## 3 0 1 0 0 1 0 0 2 1
## 4 11 3 5 7 1 5 2 6 14
## 5 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0
## 7 0 0 0 0 0 0 0 0 0
## 8 1 6 2 3 8 4 4 8 4
## 9 0 0 0 0 0 0 0 0 0
## 10 0 0 0 0 0 0 0 1 0
## 11 0 0 0 0 0 0 0 0 0
## 12 0 0 0 0 0 0 0 0 0
## 13 0 0 0 0 0 0 0 0 0
## 14 0 0 0 0 0 0 0 0 0
## 15 0 0 0 0 0 0 0 0 0
## 16 0 0 0 0 0 0 0 0 0
## 17 0 0 0 0 0 0 0 0 0
## 18 3 2 2 4 3 4 0 4 3
## 19 3 5 1 0 3 2 3 7 5
## 20 1 2 0 1 2 0 3 2 1
## 21 0 0 0 0 0 0 0 0 0
## 22 0 0 0 0 0 0 0 0 0
## 23 0 0 0 0 0 0 0 0 0
## 24 1 5 1 2 1 4 2 2 5
## 25 0 0 0 0 0 0 0 0 0
## 26 0 0 0 0 0 0 0 0 0
## 27 0 0 0 0 0 0 0 0 0
## 28 2 11 2 4 7 1 4 10 3
## 29 0 0 0 0 0 0 0 0 0
## 30 1 3 1 1 2 1 1 3 2
## 31 0 0 0 0 0 0 0 0 0
## 32 0 0 0 0 0 0 0 0 0
## 33 0 0 0 0 2 0 1 1 0
## 34 0 0 0 0 0 0 0 0 0
## 35 0 0 0 0 0 0 0 0 0
## 36 0 1 1 0 0 0 0 0 1
## 37 0 0 0 0 0 0 0 0 0
## 38 1 1 1 1 0 1 1 0 1
## 39 0 0 0 0 0 0 0 0 0
## 40 0 0 0 0 0 0 0 0 0
## 41 0 0 0 0 0 0 0 0 0
## 42 0 0 0 0 0 0 0 0 1
## 43 1 7 1 4 6 1 3 9 6
## 44 0 0 0 0 0 0 0 0 0
## 45 0 0 0 0 0 0 0 0 0
## 46 0 0 0 0 0 0 0 0 0
## 47 0 0 0 0 0 0 0 0 0
## 48 1 1 1 1 0 1 0 0 1
## 49 0 0 0 0 0 0 0 0 0
## 50 0 0 0 0 0 0 0 0 0
## 51 1 2 1 1 1 1 0 1 1
## 52 0 0 0 0 0 0 0 0 0
## 53 0 0 0 0 0 0 0 0 0
## 54 0 0 0 0 0 0 0 0 0
## 55 0 0 0 0 0 0 0 0 0
## 56 0 0 0 0 0 0 0 0 0
## 57 0 0 0 0 0 0 0 0 0
## 58 4 7 2 4 2 3 4 2 5
## 59 0 0 0 0 0 0 0 0 0
## 60 0 0 0 0 0 0 0 0 0
## 61 0 0 0 0 0 0 0 0 0
## 62 0 0 0 0 0 0 0 0 0
## 63 0 0 0 0 0 0 0 0 0
## 64 0 0 0 0 0 0 0 0 0
## 65 0 0 0 0 0 0 0 0 0
## 66 0 0 0 0 0 0 0 0 0
## 67 0 0 1 0 0 0 0 0 0
## 68 0 1 1 2 1 0 1 1 1
## 69 0 0 0 0 0 0 0 0 0
## 70 0 0 0 0 0 0 0 1 0
## 71 0 0 0 0 0 0 0 0 0
## 72 2 0 1 1 0 1 1 1 4
## 73 0 0 0 0 0 0 0 0 0
## 74 0 0 0 0 0 0 0 0 0
## 75 0 0 0 0 0 0 0 0 0
## 76 1 5 3 2 4 1 6 5 2
## 77 2 2 1 3 3 1 1 3 3
## 78 0 0 0 0 0 0 0 0 0
## 79 0 0 0 0 0 0 0 0 0
## 80 0 0 0 0 0 0 0 0 0
## 81 0 0 0 0 0 1 0 0 0
## 82 0 0 0 0 0 0 0 0 0
## 83 0 0 0 0 0 0 0 0 0
## 84 0 0 0 0 0 0 0 0 0
## 85 0 0 0 0 0 0 0 0 0
## 86 0 0 0 0 0 0 0 0 0
## 87 0 0 0 0 0 0 0 0 0
## 88 0 0 0 0 0 0 0 0 0
## 89 0 0 0 0 0 0 0 0 0
## 90 0 0 0 0 0 0 0 0 0
## 91 0 0 0 0 0 0 0 0 0
## 92 0 0 0 0 0 0 0 0 0
## 93 0 0 0 0 0 0 0 0 0
## 94 0 0 0 0 0 0 0 0 0
## 95 0 0 0 0 0 0 0 0 0
## 96 0 0 0 0 0 0 0 0 0
## 97 0 0 0 0 0 0 0 0 0
## 98 0 0 0 0 0 0 0 0 0
## 99 0 1 0 1 1 0 2 2 1
## 100 0 1 1 2 1 1 1 2 0
## 101 0 0 0 0 0 0 0 0 0
## 102 0 0 0 0 0 0 0 0 0
## 103 12 11 9 13 10 14 8 12 20
## 104 2 0 2 2 1 1 1 2 4
## 105 0 0 0 0 0 0 0 0 0
## 106 0 0 0 0 0 0 0 0 0
## 107 0 0 0 0 0 0 0 0 0
## 108 0 0 0 0 0 0 0 0 0
## 109 0 0 0 0 0 0 0 0 0
## 110 3 4 5 5 6 5 3 5 6
## 111 0 0 0 0 0 0 0 0 0
## 112 0 0 0 0 0 0 0 0 0
## 113 0 0 1 0 0 0 0 0 0
## 114 0 0 0 0 0 0 0 0 0
## 115 0 0 0 0 0 0 0 0 0
## 116 2 5 0 2 3 2 2 3 6
## 117 0 0 0 0 0 0 0 0 0
## 118 0 0 0 0 0 0 1 0 0
## 119 0 0 0 0 0 0 0 0 0
## 120 0 0 0 0 0 0 0 0 0
## 121 1 0 0 0 0 0 0 1 1
## 122 0 0 0 0 0 0 0 0 0
## 123 0 0 0 0 0 0 0 0 0
## 124 0 0 0 0 0 0 0 0 0
## 125 0 0 0 0 0 0 0 2 0
## 126 2 2 1 1 0 2 0 1 4
## 127 1 1 0 0 2 0 1 2 0
## 128 0 0 0 0 0 0 0 0 0
## 129 0 0 0 0 0 0 1 0 1
## 130 0 0 0 0 0 0 0 0 0
## 131 0 0 0 0 0 0 0 0 0
## 132 0 0 0 0 0 0 0 0 0
## 133 0 0 0 0 0 0 0 0 0
## 134 0 0 0 0 0 0 0 0 0
## 135 0 0 0 0 0 0 0 0 0
## 136 0 0 0 0 0 0 0 0 0
## 137 0 0 0 0 0 0 0 0 0
## 138 0 0 0 0 0 0 0 0 0
## 139 0 0 0 0 0 0 0 0 0
## 140 0 0 0 0 0 0 0 0 0
## 141 4 3 0 3 2 3 2 3 7
## 142 0 0 0 0 0 0 0 0 0
## 143 0 8 0 2 6 0 3 6 6
## 144 0 0 0 0 0 0 0 0 0
## 145 0 0 0 0 0 0 0 0 0
## 146 0 0 0 0 0 0 0 0 0
## 147 0 0 0 0 0 0 0 0 0
## 148 0 0 0 0 0 0 0 0 0
## 149 0 0 0 0 0 0 0 0 0
## 150 0 0 0 0 0 0 0 0 0
## 151 0 0 0 0 0 0 0 0 0
## 152 0 0 0 0 0 0 1 0 0
## 153 0 0 0 0 0 0 0 0 0
## 154 0 0 0 0 0 0 0 0 0
## 155 0 0 0 0 0 0 0 0 0
## 156 0 0 0 0 0 0 0 0 0
## 157 0 0 0 0 0 0 0 0 0
## 158 0 1 0 0 0 0 0 0 0
## 159 0 0 0 0 0 0 0 0 0
## 160 0 0 0 0 0 0 0 0 0
## 161 0 0 0 0 0 0 0 0 0
## 162 0 0 0 0 0 0 0 0 0
## 163 0 0 0 0 0 0 0 0 0
## 164 0 0 0 0 0 0 0 0 0
## 165 0 0 0 0 0 0 0 0 0
## 166 0 0 0 0 0 0 0 0 0
## 167 0 0 0 0 0 0 0 0 0
## 168 0 0 0 0 0 0 0 0 0
## 169 0 0 0 0 0 0 0 0 0
## 170 0 0 0 0 0 0 0 0 0
## 171 0 0 0 0 0 0 0 0 1
## 172 0 1 2 0 2 0 2 3 3
## 173 0 0 0 0 0 0 0 0 0
## 174 0 0 0 0 0 0 0 0 0
## 175 0 0 0 0 0 0 0 0 0
## 176 0 0 0 0 0 0 0 0 0
## 177 0 0 0 0 0 0 0 0 0
## 178 0 0 0 0 0 0 0 0 0
## 179 0 0 0 0 0 0 0 0 0
## 180 0 0 0 0 0 0 0 0 0
## 181 0 0 0 0 0 0 0 0 0
## 182 0 0 0 0 1 0 0 1 0
## 183 4 7 4 6 7 6 4 14 9
## 184 0 0 0 0 0 0 0 0 0
## 185 6 0 3 7 0 4 0 3 8
## 186 0 0 0 0 0 0 0 0 0
## 187 1 1 1 1 0 1 0 0 1
## 188 0 0 0 0 0 0 0 0 0
## 189 0 0 0 0 0 0 0 0 0
## 190 0 0 0 0 0 0 0 0 0
## 191 0 0 1 1 1 1 0 1 2
## 192 0 0 0 0 0 0 0 0 0
## 193 0 0 0 0 0 0 0 0 0
## 194 0 0 0 0 0 0 0 1 0
## 195 0 0 0 0 0 0 0 0 0
## 196 6 4 3 4 3 3 2 3 9
## 197 1 1 1 2 0 2 0 2 4
## 198 0 0 0 0 0 0 0 0 0
## 199 0 0 0 0 0 0 0 0 0
## 200 0 0 0 0 0 0 0 0 0
## 201 0 0 0 0 0 0 0 0 0
## 202 1 1 2 4 1 2 1 2 3
## 203 0 0 0 0 0 0 0 0 0
## 204 0 0 0 0 0 0 0 0 0
## 205 0 0 0 0 0 0 0 0 0
## 206 0 0 0 0 0 0 0 0 0
## 207 4 7 4 6 3 5 6 5 8
## 208 0 0 0 0 0 0 0 0 0
## 209 0 0 0 0 0 0 0 0 0
## 210 0 0 0 0 0 0 0 0 0
## 211 0 1 0 1 0 0 0 0 0
## 212 0 0 0 0 0 0 0 0 0
## 213 0 0 0 0 0 0 0 0 0
## 214 1 1 0 1 0 2 0 1 2
## 215 0 0 0 0 0 0 0 0 0
## 216 0 0 0 0 0 0 0 0 0
## 217 0 1 0 1 0 0 0 0 0
## 218 0 0 0 0 0 0 0 0 0
## 219 0 0 0 0 0 0 0 0 0
## 220 2 2 1 4 3 3 1 2 4
## 221 0 0 0 0 0 0 0 0 0
## 222 0 0 0 0 0 0 0 0 0
## 223 0 0 0 0 0 0 0 0 0
## 224 1 0 1 0 1 0 1 0 1
## 225 0 2 1 0 2 0 2 3 0
## 226 0 0 0 0 0 0 0 0 0
## 227 0 0 0 0 0 0 0 0 0
## 228 0 0 0 0 0 0 0 0 0
## 229 0 0 0 0 0 0 0 0 0
## 230 0 0 0 0 0 0 0 0 0
## 231 0 0 0 0 0 0 0 0 0
## 232 0 0 0 0 0 0 0 0 0
## 233 0 0 0 0 0 0 0 0 0
## 234 0 0 0 0 0 0 0 0 0
## 235 0 0 0 0 0 0 0 0 0
## 236 0 0 0 0 0 0 0 0 0
## 237 0 0 0 0 0 0 0 0 0
## 238 0 0 0 0 0 0 0 0 0
## 239 0 0 0 0 0 0 0 0 0
## 240 0 0 0 0 0 0 0 0 0
## 241 0 0 0 0 0 0 0 0 0
## 242 0 0 0 0 0 0 1 0 0
## 243 0 0 0 0 0 0 0 0 0
## 244 0 0 0 0 0 0 0 0 0
## 245 0 0 0 0 0 0 0 0 0
## 246 0 0 0 0 0 0 0 0 0
## 247 0 0 0 0 0 0 0 0 0
## 248 3 1 3 2 0 2 1 3 6
## 249 0 0 0 0 0 0 0 0 0
## positive
## 1 0
## 2 18
## 3 3
## 4 6
## 5 0
## 6 0
## 7 0
## 8 12
## 9 0
## 10 0
## 11 0
## 12 0
## 13 0
## 14 0
## 15 0
## 16 0
## 17 0
## 18 6
## 19 12
## 20 3
## 21 0
## 22 0
## 23 0
## 24 2
## 25 0
## 26 0
## 27 0
## 28 11
## 29 0
## 30 4
## 31 0
## 32 0
## 33 2
## 34 0
## 35 1
## 36 2
## 37 0
## 38 0
## 39 0
## 40 0
## 41 0
## 42 0
## 43 19
## 44 0
## 45 0
## 46 0
## 47 0
## 48 0
## 49 0
## 50 0
## 51 2
## 52 0
## 53 0
## 54 0
## 55 0
## 56 0
## 57 0
## 58 7
## 59 0
## 60 0
## 61 0
## 62 0
## 63 0
## 64 0
## 65 0
## 66 0
## 67 2
## 68 3
## 69 0
## 70 0
## 71 0
## 72 5
## 73 0
## 74 0
## 75 0
## 76 11
## 77 4
## 78 0
## 79 0
## 80 0
## 81 0
## 82 0
## 83 0
## 84 0
## 85 0
## 86 0
## 87 0
## 88 0
## 89 0
## 90 0
## 91 0
## 92 0
## 93 0
## 94 0
## 95 0
## 96 0
## 97 0
## 98 0
## 99 4
## 100 3
## 101 0
## 102 0
## 103 28
## 104 3
## 105 0
## 106 0
## 107 0
## 108 0
## 109 0
## 110 13
## 111 0
## 112 0
## 113 0
## 114 0
## 115 0
## 116 9
## 117 0
## 118 3
## 119 0
## 120 0
## 121 4
## 122 0
## 123 0
## 124 0
## 125 3
## 126 5
## 127 4
## 128 0
## 129 0
## 130 0
## 131 0
## 132 0
## 133 0
## 134 0
## 135 0
## 136 0
## 137 0
## 138 0
## 139 0
## 140 1
## 141 7
## 142 0
## 143 11
## 144 0
## 145 0
## 146 0
## 147 0
## 148 0
## 149 0
## 150 0
## 151 0
## 152 2
## 153 0
## 154 0
## 155 0
## 156 0
## 157 0
## 158 0
## 159 0
## 160 0
## 161 0
## 162 0
## 163 0
## 164 0
## 165 0
## 166 0
## 167 0
## 168 0
## 169 0
## 170 0
## 171 0
## 172 4
## 173 0
## 174 1
## 175 0
## 176 0
## 177 0
## 178 0
## 179 0
## 180 0
## 181 0
## 182 1
## 183 17
## 184 0
## 185 2
## 186 0
## 187 0
## 188 0
## 189 0
## 190 0
## 191 2
## 192 0
## 193 0
## 194 1
## 195 0
## 196 5
## 197 3
## 198 0
## 199 0
## 200 0
## 201 0
## 202 2
## 203 0
## 204 0
## 205 0
## 206 0
## 207 12
## 208 0
## 209 0
## 210 0
## 211 0
## 212 0
## 213 0
## 214 2
## 215 0
## 216 0
## 217 0
## 218 0
## 219 0
## 220 4
## 221 0
## 222 0
## 223 0
## 224 2
## 225 4
## 226 0
## 227 0
## 228 0
## 229 0
## 230 0
## 231 0
## 232 0
## 233 0
## 234 0
## 235 0
## 236 0
## 237 0
## 238 0
## 239 0
## 240 0
## 241 0
## 242 1
## 243 0
## 244 0
## 245 0
## 246 0
## 247 0
## 248 1
## 249 0
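The four lexicons score on different scales (AFINN reaches ±20 here while the others stay within about ±8), so raw values are not directly comparable. A minimal sketch of two simple comparisons, sign agreement and rescaling by each method's largest absolute score, using the first eight scores copied from the outputs above. The rescaling choice is an assumption made for illustration, not part of syuzhet.

```r
# First eight scores per method, copied from the outputs above
scores <- data.frame(
  syuzhet = c(0.00, 3.85, 2.95, -8.00, 0, 0, 0, 8.15),
  bing    = c(0, 1, 3, -7, 0, 0, 0, 8),
  afinn   = c(0, 5, 8, -20, 0, 0, 0, 20),
  nrc     = c(0, 3, 2, -8, 0, 0, 0, 8)
)
m <- as.matrix(scores)

# Do all four methods agree on the direction (negative, neutral, positive)?
signs <- sign(m)
agree <- apply(signs, 1, function(r) length(unique(r)) == 1)
print(agree)

# Rescale each method by its largest absolute score so values fall in [-1, 1]
scaled <- sweep(m, 2, apply(abs(m), 2, max), "/")
print(round(scaled, 2))
```

For these eight posts all four methods agree on direction. The same check on the full vectors would show where the lexicons diverge.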
# by string
library(sentimentr)
sentiment_by(threads_2_f$text)
## Key: <element_id>
## element_id word_count sd ave_sentiment
## <int> <int> <num> <num>
## 1: 1 0 NA 0.00000000
## 2: 2 616 0.4149229 0.08268525
## 3: 3 84 0.2554850 0.19380210
## 4: 4 166 0.9055534 -0.89195202
## 5: 5 0 NA 0.00000000
## ---
## 245: 245 16 NA 0.22500000
## 246: 246 0 NA 0.00000000
## 247: 247 0 NA 0.00000000
## 248: 248 78 0.5713983 -0.34350844
## 249: 249 0 NA 0.00000000
# by sentence
sentiment(threads_2_f$text)
## Key: <element_id, sentence_id>
## element_id sentence_id word_count sentiment
## <int> <int> <int> <num>
## 1: 1 1 NA 0.0000000
## 2: 2 1 15 0.0000000
## 3: 2 2 14 -0.5478855
## 4: 2 3 16 -0.3500000
## 5: 2 4 17 -0.1819017
## ---
## 664: 248 1 28 -0.4889915
## 665: 248 2 9 0.0000000
## 666: 248 3 29 -1.0584634
## 667: 248 4 12 0.2309401
## 668: 249 1 NA 0.0000000
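The main reason to use sentimentr alongside the word-counting lexicons is that it accounts for valence shifters such as "not". A small sketch on two toy sentences (the sentences are illustrative). Functions are called with explicit namespaces because both syuzhet and sentimentr export a function named `get_sentences()`.

```r
toy <- c("This movie is good.", "This movie is not good.")

# Word-counting lexicon: scores each word on its own
bing_scores <- syuzhet::get_sentiment(toy, method = "bing")

# sentimentr: takes negators near the polarized word into account
sentences <- sentimentr::get_sentences(toy)
sentimentr_scores <- sentimentr::sentiment_by(sentences)$ave_sentiment

print(data.frame(text = toy, bing = bing_scores, sentimentr = sentimentr_scores))
```

With sentimentr, the negated sentence should receive a lower score than the plain one, whereas a purely word-based lexicon has no way to register the "not".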
# --- Preview a slice of your dataset ---
threads_2_f[20:30, ]
## date_utc timestamp
## 20 2025-06-24 1750783563
## 21 2024-12-19 1734569871
## 22 2025-09-16 1758045179
## 23 2025-04-03 1743710395
## 24 2025-02-13 1739462512
## 25 2025-06-29 1751218833
## 26 2025-08-27 1756299442
## 27 2025-01-18 1737160215
## 28 2025-06-29 1751219647
## 29 2024-12-12 1734031921
## 30 2025-06-14 1749912568
## title
## 20 Don't Care What Nobody Says, This Hyped Me Up Back in 2023.
## 21 Charlie Cox says the upcoming Disney+ Daredevil series will go darker than the Netflix series: "We really pushed for the show to remain geared towards an older audience and not dumbed down to kind of capture a wider net of people"
## 22 What do you think of the mcu version of lady death/ Rio Vidal played by audrey plaza
## 23 Chris Pratt Confirms Star-Lord Will Return, Jokes About Being Absent from 'Doomsday' Reveal: "They must have cut away from it. I don't know what happened. My chair was there. I'm sure it was there.”
## 24 Michael B. Jordan Says Marvel Will Get Its Success Back, but He Tells the Studio: ‘I Want to See a Blade Movie’
## 25 Scarlett Johansson: ‘I was cast for my desirability - that’s shifted’
## 26 Jake Schreier shares new BTS pics to celebrate Thunderbolts* streaming on Disney+
## 27 People think Daredevil isn't funny, but Matt is hilarious
## 28 I am clearly not Ironheart’s target demographic.
## 29 Denzel Washington Called Ryan Coogler to Apologize for Spilling ‘Black Panther 3’ News
## 30 Is someone erased from shot?
## text
## 20 Kang was literally the best part about Quantumania. Just thinking about The Kang Dynasty and The Avengers and Co having to fight many many different versions of Kang was enough for me to get excited for Loki Season 2, Kang Dynsasty, everything else involving Kang. I truly hope that Marvel comes to their senses and bring Kang back for Phase 7.
## 21
## 22
## 23
## 24 > “[Marvel’s] doing great,” said Jordan, who is one of the MCU’s all-time great villains after playing Erik Killmonger in “Black Panther” and its sequel. “They’ll get it back.” > One comic book tentpole Jordan hopes Marvel gets off the ground is its long-in-the-works “Blade” movie. First announced in 2019 with Mahershala Ali tapped to play the eponymous vampire hunter, “Blade” has been through various writers and directors. Marvel officially took the movie off its release calendar last fall. > “Launching any franchise, it’s tough,” Jordan said. “I hope it gets together. I want to see a ‘Blade’ movie, you know what I’m saying? The ‘Blade’ franchise was everything.”
## 25
## 26
## 27
## 28 Nearly middle-aged white dude. Have had some qualms about some projects since Endgame. And here is this show about a teenage girl that seems like it is trying to fill the Iron Man void. But damn if this show isn’t actually good. I am really enjoying the acting, the storytelling, and the way the show is going. It’s really fun to watch and I am really getting in to the characters- especially NATALIE. And Joe. Riri is having a pretty great arc here, and I get the feeling I am going to be way more invested in her as a character as more episodes come out. I wasn’t planning on watching this. It just so happened that my wife had a girl’s night and I put my kid to bed and had nothing else to do after finishing Andor. So I said “fuck it, let’s see.” And I’m glad I did. I highly suggest checking it out. There are some great action sequences, some mysterious intrigue, and ya know, it’s just cool.
## 29
## 30 Might be a very rogue theory but i think in typical MCU fashion, someone important to the story is erased from this shot. Maybe Victor? Steve (Another rogue theory that Cap is somehow in F4 because his time travel was the reason for this F4 timeline), BABY FRANKLIN???. There's too much unused space there that it gives me NWH trailer vibes. Like how Toby and Andrew were erased from the swinging shot.
## subreddit comments
## 20 marvelstudios 457
## 21 marvelstudios 202
## 22 marvelstudios 412
## 23 marvelstudios 276
## 24 marvelstudios 259
## 25 marvelstudios 390
## 26 marvelstudios 140
## 27 marvelstudios 152
## 28 marvelstudios 793
## 29 marvelstudios 146
## 30 marvelstudios 580
## url
## 20 https://www.reddit.com/r/marvelstudios/comments/1ljg6p7/dont_care_what_nobody_says_this_hyped_me_up_back/
## 21 https://www.reddit.com/r/marvelstudios/comments/1hhgtx9/charlie_cox_says_the_upcoming_disney_daredevil/
## 22 https://www.reddit.com/r/marvelstudios/comments/1niokze/what_do_you_think_of_the_mcu_version_of_lady/
## 23 https://www.reddit.com/r/marvelstudios/comments/1jqsj5z/chris_pratt_confirms_starlord_will_return_jokes/
## 24 https://www.reddit.com/r/marvelstudios/comments/1iomc2i/michael_b_jordan_says_marvel_will_get_its_success/
## 25 https://www.reddit.com/r/marvelstudios/comments/1lnkmkg/scarlett_johansson_i_was_cast_for_my_desirability/
## 26 https://www.reddit.com/r/marvelstudios/comments/1n1gb1n/jake_schreier_shares_new_bts_pics_to_celebrate/
## 27 https://www.reddit.com/r/marvelstudios/comments/1i3v74j/people_think_daredevil_isnt_funny_but_matt_is/
## 28 https://www.reddit.com/r/marvelstudios/comments/1lnkyne/i_am_clearly_not_ironhearts_target_demographic/
## 29 https://www.reddit.com/r/marvelstudios/comments/1hct77d/denzel_washington_called_ryan_coogler_to/
## 30 https://www.reddit.com/r/marvelstudios/comments/1lbagts/is_someone_erased_from_shot/
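# --- 1. Clean text and handle negations ---
# Join a lowercase negator ("not", "never", "no") to the lowercase word that
# follows it, e.g. "not good" -> "not_good". The patterns are case-sensitive,
# so a sentence-initial "Not good" is left unchanged.
handle_negations <- function(text) {
  text %>%
    str_replace_all("\\bnot ([a-z]+)", "not_\\1") %>%
    str_replace_all("\\bnever ([a-z]+)", "never_\\1") %>%
    str_replace_all("\\bno ([a-z]+)", "no_\\1") %>%
    str_squish()  # collapse repeated whitespace and trim the ends
}
threads_2_f <- threads_2_f %>%
  mutate(text_clean = handle_negations(text))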
# --- 2. Dictionary-based sentiment (syuzhet) ---
threads_2_f <- threads_2_f %>%
mutate(
syuzhet_score = get_sentiment(text_clean, method = "syuzhet"),
bing_score = get_sentiment(text_clean, method = "bing"),
afinn_score = get_sentiment(text_clean, method = "afinn"),
nrc_score = get_sentiment(text_clean, method = "nrc")
)
# Optional: NRC emotions
nrc_emotions <- get_nrc_sentiment(threads_2_f$text_clean)
# --- 3. Negation-aware sentiment using sentimentr ---
threads_2_f <- threads_2_f %>%
mutate(
text_split = get_sentences(text_clean)
)
reddit_sentiment <- sentiment_by(threads_2_f$text_split)
# --- 4. Merge sentiment results for analysis ---
threads_2_f <- threads_2_f %>%
bind_cols(reddit_sentiment %>% select(ave_sentiment, sd, word_count))
# --- 5. Summarize dictionary sentiment ---
dict_summary <- threads_2_f %>%
summarize(
syuzhet_avg = mean(syuzhet_score),
bing_avg = mean(bing_score),
afinn_avg = mean(afinn_score),
nrc_avg = mean(nrc_score)
)
dict_summary
## syuzhet_avg bing_avg afinn_avg nrc_avg
## 1 0.3891566 0.2851406 0.9076305 0.4216867
# --- 6. Plot negation-aware sentiment (sentimentr) ---
ggplot(threads_2_f, aes(x = ave_sentiment)) +
geom_histogram(binwidth = 0.1, fill = "steelblue", color = "black") +
labs(
title = "Distribution of Reddit Comment Sentiment (Negation-Aware)",
x = "Average Sentiment per Comment",
y = "Frequency"
)
# --- 7. Optional: test sarcasm (demonstration) ---
text_test <- c(
"I loved this movie! Not good at all.",
"Marvel really nailed it. Never boring!",
"Worst movie ever, bless your heart."
)
sentiment_by(text_test)
## Key: <element_id>
## element_id word_count sd ave_sentiment
## <int> <int> <num> <num>
## 1: 1 8 0.44194174 -0.0625000
## 2: 2 6 0.02270292 0.6910534
## 3: 3 6 NA 0.2041241
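To see what the `handle_negations()` helper from step 1 does to a string, it can help to run it on a few short examples. The sketch below copies that helper (written without pipes so it only needs `stringr`) and adds a hypothetical case-insensitive variant, `handle_negations_ci()`, which is not part of the original analysis. The example strings are made up for illustration.

```r
library(stringr)

# Copy of the helper from step 1, written without pipes
handle_negations <- function(text) {
  text <- str_replace_all(text, "\\bnot ([a-z]+)", "not_\\1")
  text <- str_replace_all(text, "\\bnever ([a-z]+)", "never_\\1")
  text <- str_replace_all(text, "\\bno ([a-z]+)", "no_\\1")
  str_squish(text)
}

# Hypothetical variant (an assumption, not the author's code):
# one case-insensitive pattern covering all three negators
handle_negations_ci <- function(text) {
  pattern <- regex("\\b(not|never|no) ([a-z]+)", ignore_case = TRUE)
  str_squish(str_replace_all(text, pattern, "\\1_\\2"))
}

# Made-up examples
examples <- c(
  "this is not good",
  "Not good at all",
  "never boring",
  "nothing else to do"
)

original_out <- handle_negations(examples)
ci_out <- handle_negations_ci(examples)

print(data.frame(input = examples, original = original_out, case_insensitive = ci_out))
```

With the original lowercase-only patterns, a sentence-initial "Not" is left as is, while words such as "nothing" are untouched by both versions because the negator must be followed by a space.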
# Select 10 random samples
set.seed(123) # for reproducibility
sample_comments <- threads_2_f %>%
sample_n(10) %>%
select(text_clean, syuzhet_score, bing_score, afinn_score, nrc_score, ave_sentiment)
# Display as a table
library(knitr)
kable(sample_comments, caption = "Sample Reddit Comments with Sentiment Scores")
| text_clean | syuzhet_score | bing_score | afinn_score | nrc_score | ave_sentiment |
|---|---|---|---|---|---|
| | 0.00 | 0 | 0 | 0 | 0.0000000 |
| I think we’re all aware that it’s nearly impossible for Marvel to ever beat Infinity War or Endgame, but the way the events of those movies (especially the Blip) are still playing part in most of the Marvel movies and TV shows to this day is so cool. We got four different point of views of the blip: - Normal POV, shown in almost all movies and TV series since Infinity War. People slowly turning into dust one by one - Third person POV, shown in Far From Home as the students suddenly disappeared without anyone having idea of what’s happening - Monica POV, shown in Wandavision: everyone returning from the dust, as the world slowly becomes pure chaos because of the amount of people coming back - Yelena POV, shown in Hawkeye: the point of view of someone who was dusted. They were deleted from the existence for about less than 10 seconds until Hulk snapped his fingers and everyone went back. The whole background around them changes as well since they were out for 5 years Even if it’s small mentions, they keep finding a way to bring the blip consequences back, after all it was the biggest disaster in the whole universe, so it’ll obviously play a part on the plot forever Not only the blip, but basically everything that happened in those two movies. Thanos killing Vision resulted on the legendary Wandavision series and improved Wanda’s character so much (we don’t talk about MoM tho) Some characters’ deaths had huges consequences for other characters too, like how Iron Man’s death impacted Spider Man’s story I really hope Marvel finds a way to do a movie as good as Endgame and Infinity War, those two movies affected how their whole cinematographic universe worked and even though some movies like Quantummania or Far From Home were hated by a big part of the public it’s still cool to see how they are also affected by the snap events some way. | 1.65 | 0 | -2 | 4 | 0.1543963 |
| | 0.00 | 0 | 0 | 0 | 0.0000000 |
| | 0.00 | 0 | 0 | 0 | 0.0000000 |
| | 0.00 | 0 | 0 | 0 | 0.0000000 |
| | 0.00 | 0 | 0 | 0 | 0.0000000 |
| | 0.00 | 0 | 0 | 0 | 0.0000000 |
| Post-Credits Scene: A new character shows up. Actual Payoff: …never_addressed again, but hey, cool orb, right? At this point, Marvel post-credits scenes are like checking your phone for a message that never_comes. | 4.00 | 4 | 6 | 3 | 0.6666667 |
| I keep seeing people on other social platforms talk about the decline of Marvel or DC movies to superhero fatigue, and honestly, I think that explanation misses a lot. Its become a catch-all phrase that ignores other issues with how these movies are being made and released. First, Disney put a lot of pressure on Marvel. Disney pushed for more and more content, especially on Disney+, which led to a bunch of shows and movies coming out back to back. This is the explanation weve heard from studio execs like Kevin Feige. What I feel largely does not_get discussed is the ramifications from the pandemic. It changed how people go to the movies. Some people still havent gone back to theaters regularly, and streaming is now a bigger part of how we watch things. Plus, Disney+ drops Marvel movies just a few months after their theatrical release so for a lot of people, why rush to see it in theaters when you can wait and watch at home? For example, even if its anecdotal, when I asked my brother what movie he would see in July, he said Superman because Fantastic Four will drop in a few months. I also think going to the movies has become expensive, especially for families which is part of the core general audience of these films. Imagine you have a family, you probably already have Disney+, why go to a theatre, spend about $60 on tickets, pay for higher marked food items, etc. Also, international audiences have shifted. Marvel used to crush globally, but those numbers have softened a lot. Not every market is still hyped on the superhero genre the way they used to be. This can be due to a variety of things. There are still places around the globe that havent recovered economically or some other places that have implemented policies to promote their country movies as opposed to American movies. I agree with James Gunns sentiments that the U.S is not_on good terms with other countries. So yeah, superhero fatigue might sound like an easy answer, but it lets studios off the hook. I also think general audiences just love nostalgia. Its human nature too. People gravitate towards what they know after becoming familiar with something for so many years. Its the reason why I think Spider-Man NWH and Deadpool & Wolverine did well. Its the same reason why I think Nintendo can remake the same game with updated graphics and sell it for a higher price. As much as people say they want new stories, people in overwhelming numbers flock to what they already know. | 7.40 | 2 | 13 | 13 | 0.0850477 |
| | 0.00 | 0 | 0 | 0 | 0.0000000 |
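Most of the sampled rows above have an empty `text_clean`, which leaves little to evaluate. One possible way to get more informative samples is to drop empty texts before sampling. The sketch below uses a made-up data frame, `toy_threads`, as a stand-in for `threads_2_f`; the object names and values are illustrative assumptions, not results from the actual data.

```r
library(dplyr)

# Made-up stand-in for threads_2_f with only the columns needed here
toy_threads <- data.frame(
  text_clean    = c("", "Loved it", "   ", "Not great", NA, "Fine, I guess"),
  ave_sentiment = c(0, 0.5, 0, -0.3, 0, 0.1),
  stringsAsFactors = FALSE
)

# Keep only rows whose text is neither missing nor blank
non_empty <- toy_threads %>%
  filter(!is.na(text_clean), trimws(text_clean) != "")

# Sample from the non-empty rows only
set.seed(123)  # for reproducibility
sample_non_empty <- non_empty %>%
  slice_sample(n = 2)

print(sample_non_empty)
```

Reporting how many rows were dropped (here `nrow(toy_threads) - nrow(non_empty)`) alongside the sample keeps the share of empty threads visible.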
library(ggplot2)
ggplot(threads_2_f, aes(x = ave_sentiment)) +
geom_histogram(binwidth = 0.1, fill = "steelblue", color = "black") +
labs(
title = "Distribution of Reddit Comment Sentiment (Negation-Aware)",
x = "Average Sentiment per Comment",
y = "Frequency"
)
The distribution in Plot 1 shows the average sentiment per comment from the negation-aware method (sentimentr). Average sentiment clusters near zero, but the scores are more spread out than those from the dictionary methods. The small negative tail indicates some meaningful negativity that plain lexicons may undercount. On the other hand, the small but visible positive tail suggests some genuinely enthusiastic comments, though not as many as one might expect for a fan-driven Reddit topic.
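The reading of the histogram can be backed with numbers by computing the share of comments in the negative tail, near zero, and in the positive tail. The following is a minimal sketch on made-up scores; `ave_sentiment` here is a toy vector standing in for `threads_2_f$ave_sentiment`, and the cut-off `near_zero_cut` is a hypothetical choice that happens to match the histogram's bin width.

```r
# Made-up scores standing in for threads_2_f$ave_sentiment
ave_sentiment <- c(0, 0, 0, 0.05, -0.02, 0.27, 0.61, -0.23, -0.89, 0.12)

# Hypothetical threshold for "near zero" (an assumption)
near_zero_cut <- 0.1

# Share of comments in each region of the distribution
shares <- c(
  negative  = mean(ave_sentiment < -near_zero_cut),
  near_zero = mean(abs(ave_sentiment) <= near_zero_cut),
  positive  = mean(ave_sentiment > near_zero_cut)
)

print(round(shares, 2))
```

Running the same three lines on the real column would show how much of the mass sits near zero and whether the positive tail is in fact larger than the negative one.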
threads_2_f[20:30, ]
## date_utc timestamp
## 20 2025-06-24 1750783563
## 21 2024-12-19 1734569871
## 22 2025-09-16 1758045179
## 23 2025-04-03 1743710395
## 24 2025-02-13 1739462512
## 25 2025-06-29 1751218833
## 26 2025-08-27 1756299442
## 27 2025-01-18 1737160215
## 28 2025-06-29 1751219647
## 29 2024-12-12 1734031921
## 30 2025-06-14 1749912568
## title
## 20 Don't Care What Nobody Says, This Hyped Me Up Back in 2023.
## 21 Charlie Cox says the upcoming Disney+ Daredevil series will go darker than the Netflix series: "We really pushed for the show to remain geared towards an older audience and not dumbed down to kind of capture a wider net of people"
## 22 What do you think of the mcu version of lady death/ Rio Vidal played by audrey plaza
## 23 Chris Pratt Confirms Star-Lord Will Return, Jokes About Being Absent from 'Doomsday' Reveal: "They must have cut away from it. I don't know what happened. My chair was there. I'm sure it was there.”
## 24 Michael B. Jordan Says Marvel Will Get Its Success Back, but He Tells the Studio: ‘I Want to See a Blade Movie’
## 25 Scarlett Johansson: \030I was cast for my desirability \024 that\031s shifted\031
## 26 Jake Schreier shares new BTS pics to celebrate Thunderbolts* streaming on Disney+
## 27 People think Daredevil isn't funny, but Matt is hilarious
## 28 I am clearly not Ironheart’s target demographic.
## 29 Denzel Washington Called Ryan Coogler to Apologize for Spilling ‘Black Panther 3’ News
## 30 Is someone erased from shot?
## text
## 20 Kang was literally the best part about Quantumania. Just thinking about The Kang Dynasty and The Avengers and Co having to fight many many different versions of Kang was enough for me to get excited for Loki Season 2, Kang Dynsasty, everything else involving Kang. I truly hope that Marvel comes to their senses and bring Kang back for Phase 7.
## 21
## 22
## 23
## 24 > \034[Marvel\031s] doing great,\035 said Jordan, who is one of the MCU\031s all-time great villains after playing Erik Killmonger in \034Black Panther\035 and its sequel. \034They\031ll get it back.\035 > One comic book tentpole Jordan hopes Marvel gets off the ground is its long-in-the-works \034Blade\035 movie. First announced in 2019 with Mahershala Ali tapped to play the eponymous vampire hunter, \034Blade\035 has been through various writers and directors. Marvel officially took the movie off its release calendar last fall. > \034Launching any franchise, it\031s tough,\035 Jordan said. \034I hope it gets together. I want to see a \030Blade\031 movie, you know what I\031m saying? The \030Blade\031 franchise was everything.\035
## 25
## 26
## 27
## 28 Nearly middle-aged white dude. Have had some qualms about some projects since Endgame. And here is this show about a teenage girl that seems like it is trying to fill the Iron Man void. But damn if this show isn\031t actually good. I am really enjoying the acting, the storytelling, and the way the show is going. It\031s really fun to watch and I am really getting in to the characters- especially NATALIE. And Joe. Riri is having a pretty great arc here, and I get the feeling I am going to be way more invested in her as a character as more episodes come out. I wasn\031t planning on watching this. It just so happened that my wife had a girl\031s night and I put my kid to bed and had nothing else to do after finishing Andor. So I said \034fuck it, let\031s see.\035 And I\031m glad I did. I highly suggest checking it out. There are some great action sequences, some mysterious intrigue, and ya know, it\031s just cool.
## 29
## 30 Might be a very rogue theory but i think in typical MCU fashion, someone important to the story is erased from this shot. Maybe Victor? Steve (Another rogue theory that Cap is somehow in F4 because his time travel was the reason for this F4 timeline), BABY FRANKLIN???. There's too much unused space there that it gives me NWH trailer vibes. Like how Toby and Andrew were erased from the swinging shot.
## subreddit comments
## 20 marvelstudios 457
## 21 marvelstudios 202
## 22 marvelstudios 412
## 23 marvelstudios 276
## 24 marvelstudios 259
## 25 marvelstudios 390
## 26 marvelstudios 140
## 27 marvelstudios 152
## 28 marvelstudios 793
## 29 marvelstudios 146
## 30 marvelstudios 580
## url
## 20 https://www.reddit.com/r/marvelstudios/comments/1ljg6p7/dont_care_what_nobody_says_this_hyped_me_up_back/
## 21 https://www.reddit.com/r/marvelstudios/comments/1hhgtx9/charlie_cox_says_the_upcoming_disney_daredevil/
## 22 https://www.reddit.com/r/marvelstudios/comments/1niokze/what_do_you_think_of_the_mcu_version_of_lady/
## 23 https://www.reddit.com/r/marvelstudios/comments/1jqsj5z/chris_pratt_confirms_starlord_will_return_jokes/
## 24 https://www.reddit.com/r/marvelstudios/comments/1iomc2i/michael_b_jordan_says_marvel_will_get_its_success/
## 25 https://www.reddit.com/r/marvelstudios/comments/1lnkmkg/scarlett_johansson_i_was_cast_for_my_desirability/
## 26 https://www.reddit.com/r/marvelstudios/comments/1n1gb1n/jake_schreier_shares_new_bts_pics_to_celebrate/
## 27 https://www.reddit.com/r/marvelstudios/comments/1i3v74j/people_think_daredevil_isnt_funny_but_matt_is/
## 28 https://www.reddit.com/r/marvelstudios/comments/1lnkyne/i_am_clearly_not_ironhearts_target_demographic/
## 29 https://www.reddit.com/r/marvelstudios/comments/1hct77d/denzel_washington_called_ryan_coogler_to/
## 30 https://www.reddit.com/r/marvelstudios/comments/1lbagts/is_someone_erased_from_shot/
## text_clean
## 20 Kang was literally the best part about Quantumania. Just thinking about The Kang Dynasty and The Avengers and Co having to fight many many different versions of Kang was enough for me to get excited for Loki Season 2, Kang Dynsasty, everything else involving Kang. I truly hope that Marvel comes to their senses and bring Kang back for Phase 7.
## 21
## 22
## 23
## 24 > \034[Marvel\031s] doing great,\035 said Jordan, who is one of the MCU\031s all-time great villains after playing Erik Killmonger in \034Black Panther\035 and its sequel. \034They\031ll get it back.\035 > One comic book tentpole Jordan hopes Marvel gets off the ground is its long-in-the-works \034Blade\035 movie. First announced in 2019 with Mahershala Ali tapped to play the eponymous vampire hunter, \034Blade\035 has been through various writers and directors. Marvel officially took the movie off its release calendar last fall. > \034Launching any franchise, it\031s tough,\035 Jordan said. \034I hope it gets together. I want to see a \030Blade\031 movie, you know what I\031m saying? The \030Blade\031 franchise was everything.\035
## 25
## 26
## 27
## 28 Nearly middle-aged white dude. Have had some qualms about some projects since Endgame. And here is this show about a teenage girl that seems like it is trying to fill the Iron Man void. But damn if this show isn\031t actually good. I am really enjoying the acting, the storytelling, and the way the show is going. It\031s really fun to watch and I am really getting in to the characters- especially NATALIE. And Joe. Riri is having a pretty great arc here, and I get the feeling I am going to be way more invested in her as a character as more episodes come out. I wasn\031t planning on watching this. It just so happened that my wife had a girl\031s night and I put my kid to bed and had nothing else to do after finishing Andor. So I said \034fuck it, let\031s see.\035 And I\031m glad I did. I highly suggest checking it out. There are some great action sequences, some mysterious intrigue, and ya know, it\031s just cool.
## 29
## 30 Might be a very rogue theory but i think in typical MCU fashion, someone important to the story is erased from this shot. Maybe Victor? Steve (Another rogue theory that Cap is somehow in F4 because his time travel was the reason for this F4 timeline), BABY FRANKLIN???. There's too much unused space there that it gives me NWH trailer vibes. Like how Toby and Andrew were erased from the swinging shot.
## syuzhet_score bing_score afinn_score nrc_score
## 20 1.75 4 8 2
## 21 0.00 0 0 0
## 22 0.00 0 0 0
## 23 0.00 0 0 0
## 24 1.00 2 12 -3
## 25 0.00 0 0 0
## 26 0.00 0 0 0
## 27 0.00 0 0 0
## 28 3.40 5 14 8
## 29 0.00 0 0 0
## 30 2.05 1 7 2
## text_split
## 20 Kang was literally the best part about Quantumania., Just thinking about The Kang Dynasty and The Avengers and Co having to fight many many different versions of Kang was enough for me to get excited for Loki Season 2, Kang Dynsasty, everything else involving Kang., I truly hope that Marvel comes to their senses and bring Kang back for Phase 7.
## 21
## 22
## 23
## 24 > \034[Marvel\031s] doing great,\035 said Jordan, who is one of the MCU\031s all-time great villains after playing Erik Killmonger in \034Black Panther\035 and its sequel., \034They\031ll get it back.\035 > One comic book tentpole Jordan hopes Marvel gets off the ground is its long-in-the-works \034Blade\035 movie., First announced in 2019 with Mahershala Ali tapped to play the eponymous vampire hunter, \034Blade\035 has been through various writers and directors., Marvel officially took the movie off its release calendar last fall., > \034Launching any franchise, it\031s tough,\035 Jordan said., \034I hope it gets together., I want to see a \030Blade\031 movie, you know what I\031m saying?, The \030Blade\031 franchise was everything.\035
## 25
## 26
## 27
## 28 Nearly middle-aged white dude., Have had some qualms about some projects since Endgame., And here is this show about a teenage girl that seems like it is trying to fill the Iron Man void., But damn if this show isn\031t actually good., I am really enjoying the acting, the storytelling, and the way the show is going., It\031s really fun to watch and I am really getting in to the characters- especially NATALIE., And Joe., Riri is having a pretty great arc here, and I get the feeling I am going to be way more invested in her as a character as more episodes come out., I wasn\031t planning on watching this., It just so happened that my wife had a girl\031s night and I put my kid to bed and had nothing else to do after finishing Andor., So I said \034fuck it, let\031s see.\035 And I\031m glad I did., I highly suggest checking it out., There are some great action sequences, some mysterious intrigue, and ya know, it\031s just cool.
## 29
## 30 Might be a very rogue theory but i think in typical MCU fashion, someone important to the story is erased from this shot., Maybe Victor?, Steve (Another rogue theory that Cap is somehow in F4 because his time travel was the reason for this F4 timeline), BABY FRANKLIN???., There's too much unused space there that it gives me NWH trailer vibes., Like how Toby and Andrew were erased from the swinging shot.
## ave_sentiment sd word_count
## 20 0.267505592 0.2977986 59
## 21 0.000000000 NA 0
## 22 0.000000000 NA 0
## 23 0.000000000 NA 0
## 24 0.120650878 0.1201703 117
## 25 0.000000000 NA 0
## 26 0.000000000 NA 0
## 27 0.000000000 NA 0
## 28 0.073634652 0.2187022 180
## 29 0.000000000 NA 0
## 30 -0.007113112 0.3371808 72
library(sentimentr)
reddit_sentiment <- threads_2_f %>%
mutate(text_split = get_sentences(text)) %$%
sentiment_by(text_split)
reddit_sentiment %>% arrange(desc(ave_sentiment))
## Key: <element_id>
## element_id word_count sd ave_sentiment
## <int> <int> <num> <num>
## 1: 225 37 0.34127094 0.6078778
## 2: 33 18 0.27639320 0.5116673
## 3: 152 7 NA 0.5102520
## 4: 118 36 0.11996198 0.4642992
## 5: 125 42 NA 0.4073608
## ---
## 245: 126 146 0.33275290 -0.2289097
## 246: 187 19 0.02335499 -0.2432932
## 247: 248 78 0.57139830 -0.3435084
## 248: 185 94 0.35494365 -0.4299785
## 249: 4 166 0.90555344 -0.8919520
plot(reddit_sentiment)
Plot 2 shows how emotional valence changes across the full corpus, treating all of the comments combined as a single narrative. The trajectory begins stable and roughly neutral, with early comments at moderate emotional valence. There is then a strong rise in the middle, suggesting that users express the most positivity and excitement mid-discussion, possibly in response to specific trailers, casting news, or fan theories. Sentiment declines toward the end, which might result from arguments or disagreements surfacing.
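A rough numerical check of the "neutral start, positive middle, lower end" pattern is to average the scores within the first, middle, and last third of the corpus. The sketch below does this on a made-up vector; `sentence_scores` and the three-way split are assumptions for illustration. Position here means order in the data, which is not necessarily chronological order.

```r
# Made-up sentence-level scores, in corpus order
sentence_scores <- c(0.0, 0.1, 0.0, 0.4, 0.5, 0.3, -0.1, -0.2, 0.0)

# Assign each position to one of three equal-width segments
segment <- cut(
  seq_along(sentence_scores),
  breaks = 3,
  labels = c("start", "middle", "end")
)

# Mean score per segment, as a named numeric vector
segment_means <- vapply(split(sentence_scores, segment), mean, numeric(1))

print(round(segment_means, 3))
```

If the middle segment has the highest mean on the real scores, that supports the description of the plot; sorting the data by `date_utc` first would make the segments correspond to time.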
library(reshape2)
# Reshape for plotting
dict_scores <- threads_2_f %>%
select(syuzhet_score, bing_score, afinn_score) %>%
melt(variable.name = "method", value.name = "score")
## No id variables; using all as measure variables
ggplot(dict_scores, aes(x = score, fill = method)) +
geom_histogram(alpha = 0.5, position = "identity", bins = 30) +
labs(
title = "Comparison of Dictionary-Based Sentiment Scores",
x = "Sentiment Score",
y = "Frequency"
) +
theme_minimal()
The histogram in Plot 3 overlays the score distributions from the three
lexicon-based sentiment methods. All three show a very heavy
concentration around zero, meaning the majority of Reddit comments are
either neutral or only mildly emotional. Some of that spike comes from
threads with an empty text body, which every method scores as exactly 0.
The narrow clustering around 0 suggests that conversations about the
Marvel topic are more discussion-oriented than emotional: Reddit users
are not extremely positive or negative most of the time. The few extreme
values on either tail are rare, highly positive or negative comments,
which likely represent excited fans reacting strongly, criticism or
frustration about specific movie decisions, or sarcastic remarks that
lexicons incorrectly mark as extremely negative.
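The three lexicons score on different scales (AFINN weights words from -5 to 5, while the bing method counts each matched word as +1 or -1), so overlaid raw histograms are not directly comparable in width. One option, sketched below as an assumption rather than the method used in this report, is to standardize each column before plotting. `toy_scores` reuses a few values from the output above plus a zero row.

```r
# A few rows of scores (taken from the printed output above, plus a zero row)
toy_scores <- data.frame(
  syuzhet_score = c(0, 1.75, 1.00, 3.40, 2.05),
  bing_score    = c(0, 4, 2, 5, 1),
  afinn_score   = c(0, 8, 12, 14, 7)
)

# Standardize each column to mean 0 and standard deviation 1
scaled_scores <- as.data.frame(scale(toy_scores))

# Check the result: column means near 0, standard deviations equal to 1
col_means <- colMeans(scaled_scores)
col_sds <- vapply(scaled_scores, sd, numeric(1))

print(round(scaled_scores, 2))
```

After standardizing, differences between the overlaid histograms reflect the shape of each distribution rather than the scale of the lexicon.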
The NRC lexicon tags words with eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) in addition to positive and negative polarity.
nrc_emotions <- get_nrc_sentiment(threads_2_f$text_clean)
# Sum each emotion
nrc_summary <- colSums(nrc_emotions)
# Convert to a dataframe for plotting
nrc_df <- data.frame(
emotion = names(nrc_summary),
count = as.numeric(nrc_summary)
)
ggplot(nrc_df, aes(x = reorder(emotion, -count), y = count, fill = emotion)) +
geom_bar(stat = "identity") +
labs(
title = "NRC Emotion Distribution Across Reddit Comments",
x = "Emotion",
y = "Total Count"
) +
theme_minimal() +
theme(legend.position = "none")
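`get_nrc_sentiment()` returns ten columns: the eight emotions plus `negative` and `positive`, so a bar chart of all column sums mixes emotions with polarity. A possible refinement, shown here on made-up counts, is to drop the two polarity columns and express each emotion as a share of all emotion hits. `toy_nrc` and its values are illustrative assumptions.

```r
# Made-up counts using the ten column names returned by get_nrc_sentiment()
toy_nrc <- data.frame(
  anger = c(1, 0), anticipation = c(2, 1), disgust = c(0, 0), fear = c(1, 0),
  joy = c(3, 1), sadness = c(0, 1), surprise = c(1, 0), trust = c(2, 2),
  negative = c(2, 1), positive = c(5, 3)
)

# Separate the polarity columns from the eight emotion columns
polarity_cols <- c("negative", "positive")
emotion_cols <- setdiff(names(toy_nrc), polarity_cols)

# Total hits per emotion, then each emotion's share of all emotion hits
emotion_totals <- colSums(toy_nrc[, emotion_cols])
emotion_share <- emotion_totals / sum(emotion_totals)

print(round(sort(emotion_share, decreasing = TRUE), 2))
```

Shares make the chart comparable across corpora of different sizes, and the polarity totals can be reported separately.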
Across all sentiment approaches, the discussion surrounding the Marvel topic on Reddit appears overwhelmingly neutral, with relatively few extreme emotional reactions. Dictionary-based methods (syuzhet, bing, afinn) all show heavy clustering around a sentiment score of zero, while the negation-aware method (sentimentr) reveals slightly more emotional variation but the same overall pattern. The sentiment trajectory plot indicates that the conversation becomes increasingly positive halfway through before turning more negative toward the end, suggesting moments of enthusiasm followed by criticism or debate. Taken together, the four plots demonstrate that Reddit discourse is nuanced and context-dependent: while fans express excitement, the overall tone remains cautious, mixed, or analytical rather than strongly emotional.