Examining user-generated text data and sentiment analysis from Reddit threads
This project analyzes user-generated text data from Reddit to explore
how readers discuss and respond to the works of Haruki Murakami, one of
my favorite authors, whose writing has inspired me to undertake this
study.
Murakami has been writing and publishing since the late 1970s, achieving
international fame with Norwegian Wood (1987), whose English
translations introduced his distinctive blend of magical realism,
surrealism, and themes of love, loneliness, and music to a worldwide
audience. His fiction has historically received mixed reviews: it is
praised for its imaginative storytelling, dreamlike narrative style, and
evocative atmospheres, and occasionally criticized for the perceived
emotional detachment of his characters and the repetition of certain
themes.
By combining n-gram analysis, sentiment scoring, and NRC-based emotional
categorization, this study aims to identify frequently mentioned book
titles, reveal patterns in genre recognition, and highlight both
positive reception and critical engagement among readers.
Loading all the necessary R packages that will be used throughout the workflow
# Package names
packages <- c("RedditExtractoR", "anytime", "magrittr", "httr", "tidytext", "tidyverse", "igraph", "ggraph", "wordcloud2", "textdata", "here", "stringi","sentimentr","syuzhet", "wordcloud")
# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
install.packages(packages[!installed_packages])
}
# Load packages
invisible(lapply(packages, library, character.only = TRUE))
### Search subreddits
Identifying and selecting Haruki Murakami-related subreddits to build a focused and reliable text dataset for subsequent n-gram and sentiment analysis
# using keyword
threads <- find_thread_urls(keywords = 'Haruki Murakami',
sort_by = 'relevance',
period = 'all') %>%
drop_na()
rownames(threads) <- NULL
colnames(threads)
head(threads, 3) %>% knitr::kable()
threads$subreddit %>% table() %>% sort(decreasing = T) %>% head(20)
# search for subreddits
subreddit_list <- RedditExtractoR::find_subreddits('Haruki Murakami')
subreddit_list %>%
arrange(desc(subscribers)) %>%
# head() avoids NA-padded rows when fewer than 25 subreddits are returned
head(25) %>%
select(subreddit, title, subscribers) %>%
knitr::kable()
# using both subreddit and keyword
t_1 <- find_thread_urls(keywords= 'Haruki Murakami',
subreddit = 'murakami',
sort_by = 'relevance',
period = 'all') %>%
drop_na()
rownames(t_1) <- NULL
t_2 <- find_thread_urls(keywords= 'Haruki Murakami',
subreddit = 'HarukiMurakami',
sort_by = 'relevance',
period = 'all') %>%
drop_na()
rownames(t_2) <- NULL
t_3 <- find_thread_urls(keywords= 'Haruki Murakami',
subreddit = 'MurakamiBookClub',
sort_by = 'relevance',
period = 'all') %>%
drop_na()
rownames(t_3) <- NULL
threads <- rbind(t_1, t_2, t_3)
save(threads, file = "Murakami.RData")
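The three near-identical calls above could be folded into one helper. The following is a minimal sketch, not the workflow actually used here: `collect_threads()` and `fake_fetch()` are hypothetical names, the de-duplication by `url` is an added assumption, and the stub fetcher returns invented rows so that the example runs without network access.

```r
# Hypothetical helper: query several subreddits with one fetcher and stack the results.
# `fetch` defaults to RedditExtractoR::find_thread_urls; the default is only
# evaluated when no other fetcher is supplied.
collect_threads <- function(subreddits, keywords,
                            fetch = RedditExtractoR::find_thread_urls) {
  pieces <- lapply(subreddits, function(sr) {
    fetch(keywords = keywords, subreddit = sr,
          sort_by = "relevance", period = "all")
  })
  out <- do.call(rbind, pieces)
  out <- out[stats::complete.cases(out), , drop = FALSE]  # mirrors drop_na()
  out <- out[!duplicated(out$url), , drop = FALSE]        # assumption: one row per thread URL
  rownames(out) <- NULL
  out
}

# Offline stand-in for the real fetcher (invented rows, for illustration only)
fake_fetch <- function(keywords, subreddit, sort_by, period) {
  data.frame(title = paste(subreddit, c("post A", "post B")),
             subreddit = subreddit,
             url = paste0("https://example.org/", subreddit, "/", 1:2),
             stringsAsFactors = FALSE)
}

demo_threads <- collect_threads(c("murakami", "HarukiMurakami", "MurakamiBookClub"),
                                keywords = "Haruki Murakami", fetch = fake_fetch)
nrow(demo_threads)
```

Dropping the `fetch = fake_fetch` argument would send the same three queries to Reddit through RedditExtractoR.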
Cleaning and standardizing thread text
load("Murakami.RData")
# Sanitize text
threads %<>%
mutate(across(
where(is.character),
~ .x %>%
str_replace_all("\\|", "/") %>%
str_replace_all("\\n", " ") %>%
str_squish()
))
head(threads, 2) %>% knitr::kable()
| date_utc | timestamp | title | text | subreddit | comments | url |
|---|---|---|---|---|---|---|
| 2021-11-12 | 1636692365 | The Wind-Up Bird Chronicle, Haruki Murakami - Book Review | It starts off slightly odd, with each subsequent chapter it grows stranger, more complex and more compelling. https://youtu.be/eqhvaFboII8 | murakami | 0 | https://www.reddit.com/r/murakami/comments/qs3mdq/the_windup_bird_chronicle_haruki_murakami_book/ |
| 2018-10-11 | 1539273196 | Haruki Murakami interview: ‘When I write fiction I go to weird, secret places in myself’ | | murakami | 7 | https://www.reddit.com/r/murakami/comments/9nb68g/haruki_murakami_interview_when_i_write_fiction_i/ |
Exploring the temporal distribution of threads by converting timestamps and visualizing year-to-year activity trends
# create new column: date
threads%<>%
mutate(date = as.POSIXct(date_utc)) %>%
filter(!is.na(date))
# number of threads by year
threads %>%
ggplot(aes(x = date)) +
# binwidth is in seconds on a date-time axis: 31536000 s = one 365-day year
geom_histogram(color = "black", position = "stack", binwidth = 31536000) +
scale_x_datetime(date_labels = "%y",
breaks = seq(min(threads$date, na.rm = T),
max(threads$date, na.rm = T),
by = "1 year")) +
theme_minimal()
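On a date-time axis, ggplot2 reads `binwidth` in seconds, so yearly bins need roughly 31.5 million seconds. The sketch below works through that arithmetic and cross-checks yearly counts without plotting. The dates are invented stand-ins for `threads$date_utc`.

```r
# Seconds in a 365-day year: the bin width a POSIXct axis needs for yearly bins
seconds_per_year <- 365 * 24 * 60 * 60
seconds_per_year

# Invented dates standing in for threads$date_utc
date_utc <- c("2018-10-11", "2021-01-02", "2021-11-12", "2024-02-23", "2024-10-07")

# Count threads per calendar year as a plain table
threads_per_year <- table(format(as.Date(date_utc), "%Y"))
threads_per_year
```

A table like this is a quick sanity check on the histogram, since fixed-width bins do not line up exactly with calendar years.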
Cleaning thread text, tokenizing words, and visualizing word counts to reveal the dominant vocabulary in Murakami discussions
# By removing stop words, we focus on meaningful words that reveal actual themes, topics, or patterns in the text, producing more insightful and interpretable analysis.
# load list of stop words - from the tidytext package
data("stop_words")
# view 20 random stop words
print(stop_words$word[sample(1:nrow(stop_words), 20)])
## [1] "they'd" "find" "w" "aren't" "certain"
## [6] "she" "yourselves" "no" "indicate" "that"
## [11] "out" "going" "too" "without" "there's"
## [16] "she'd" "isn't" "despite" "instead" "showed"
# Regex that matches URL-type strings and escaped HTML entities
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"
# Tokenize first, so the token count before filtering is available for comparison
words_all <- threads %>%
# drop URLs
mutate(text = str_replace_all(text, replace_reg, "")) %>%
# Tokenization (word tokens)
unnest_tokens(word, text, token = 'words')
words_clean <- words_all %>%
# drop stop words
anti_join(stop_words, by = "word") %>%
# keep only tokens that contain at least one letter (drops purely numeric tokens)
filter(str_detect(word, "[a-z]"))
# Check the number of rows after removal of the stop words. There should be fewer words now
print(
glue::glue("Before: {nrow(words_all)}, After: {nrow(words_clean)}")
)
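The URL part of the cleaning pattern accepts only letters, digits, slashes, and dots after the scheme, so it stops at the first hyphen or underscore and leaves the rest of the link in the text. The sketch below shows this on two invented posts and compares it with a broader pattern. `url_reg_wide` is an assumed alternative, not the pattern used in this analysis.

```r
# URL part of the pattern used above (perl = TRUE so that \d works inside [...])
url_reg <- "http[s]?://[A-Za-z\\d/\\.]+"
# Assumed alternative: treat everything up to the next whitespace as part of the URL
url_reg_wide <- "https?://\\S+"

# Invented example posts
posts <- c("Review here https://youtu.be/eqhvaFboII8 enjoy",
           "Interview https://www.nts.live/shows/guests/episodes/ryuchi-sakamoto today")

narrow <- gsub(url_reg, "", posts, perl = TRUE)
wide   <- gsub(url_reg_wide, "", posts, perl = TRUE)
narrow  # the second post keeps the fragment "-sakamoto"
wide    # both URLs are removed completely
```

Leftover fragments of this kind are tokenized like ordinary words, so they can inflate the counts of terms that appear in link slugs.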
words_clean %>%
count(word, sort = TRUE) %>%
top_n(20, n) %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(x = word, y = n)) +
geom_col() +
xlab(NULL) +
coord_flip() +
labs(x = "words",
y = "counts",
title = "Unique wordcounts")
The dominant terms reveal the community’s focus: “murakami” and “haruki” naturally lead as the author’s name, followed by “book,” “read,” and “story”. The prominence of “norwegian” and “wood” likely refers to the novel ‘Norwegian Wood’. The character name “toru” also appears frequently. It most likely refers to Toru Watanabe of ‘Norwegian Wood’, although Toru Okada of ‘The Wind-Up Bird Chronicle’ shares the name, and it suggests that this novel features heavily in the conversation.
Visualizing an illustrative word cloud
n <- 20 # number of words with color
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.6, 1) # vivid
v <- runif(n, 0.3, 0.7) # neither too dark nor too bright
df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))
words_clean %>%
count(word, sort = TRUE) %>%
wordcloud2(color = pal,
minRotation = 0,
maxRotation = 0,
ellipticity = 0.8)
Extracting, cleaning, and visualizing trigrams to uncover
common three-word phrases and their co-occurrence
patterns
(Without removing stop words) - While this approach highlights
several book titles as frequent trigrams, it also captures many trivial
or uninformative combinations that can be filtered out for clearer
insights.
words_3gram <- threads %>%
mutate(text = str_replace_all(text, replace_reg, "")) %>%
select(text) %>%
unnest_tokens(output = paired_words,
input = text,
token = "ngrams",
n = 3)
# Show ngrams with sorted values
words_3gram %>%
count(paired_words, sort = TRUE) %>%
head(20) %>%
knitr::kable()
| paired_words | n |
|---|---|
| NA | 186 |
| on the shore | 25 |
| kafka on the | 24 |
| by haruki murakami | 19 |
| wind up bird | 18 |
| dance dance dance | 17 |
| of norwegian wood | 16 |
| up bird chronicle | 16 |
| if there are | 15 |
| there are any | 15 |
| the wind up | 14 |
| a lot of | 13 |
| let me know | 13 |
| i want to | 12 |
| of his books | 12 |
| of the book | 12 |
| the end of | 12 |
| one of my | 10 |
| please let me | 10 |
| a is for | 9 |
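The `NA` row at the top of the table most plausibly comes from posts whose text is empty or shorter than three words, which cannot form a trigram. The sketch below is a simplified base R stand-in for `token = "ngrams"`, not the tidytext implementation, and `make_ngrams()` is a hypothetical helper name.

```r
# Simplified stand-in for token = "ngrams": lower-case, split on non-letters, slide a window
make_ngrams <- function(text, n = 3) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[tokens != ""]
  if (length(tokens) < n) return(NA_character_)  # too short to form an n-gram
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts, function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

make_ngrams("Kafka on the Shore")  # two overlapping trigrams
make_ngrams("Norwegian Wood")      # NA: only two words
make_ngrams("")                    # NA: empty text
```

Filtering with `!is.na(paired_words)` before counting would keep such rows out of the ranking.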
Extracting, cleaning, and visualizing trigrams to uncover
common three-word phrases and their co-occurrence
patterns
(Removing stop words)
# separate the paired words into three columns
words_3gram_pair <- words_3gram %>%
separate(paired_words, c("word1", "word2", "word3"), sep = " ")
# filter rows where there are stop words under word 1 column, word 2 column and word 3 column
words_3gram_pair_filtered <- words_3gram_pair %>%
# drop stop words
filter(!word1 %in% stop_words$word & !word2 %in% stop_words$word & !word3 %in% stop_words$word) %>%
# keep only trigrams in which every word contains at least one letter
filter(str_detect(word1, "[a-z]") & str_detect(word2, "[a-z]") & str_detect(word3, "[a-z]"))
# Filter out words that are not encoded in ASCII
words_3gram_pair_filtered %<>%
filter(stri_enc_isascii(word1) & stri_enc_isascii(word2) & stri_enc_isascii(word3))
# Sort the new trigram (n=3) counts:
words_counts_3 <- words_3gram_pair_filtered %>%
count(word1, word2, word3) %>%
arrange(desc(n))
head(words_counts_3, 20) %>%
knitr::kable()
| word1 | word2 | word3 | n |
|---|---|---|---|
| dance | dance | dance | 17 |
| wild | sheep | chase | 9 |
| phrases | quotes | win | 7 |
| voted | words | phrases | 7 |
| words | phrases | quotes | 7 |
| blind | willow | sleeping | 6 |
| willow | sleeping | woman | 6 |
| hard | boiled | wonderland | 5 |
| norwegian | wood | kafka | 5 |
| audience | member’s | question | 4 |
| colorless | tsukuru | tazaki | 4 |
| correct | meta | data | 4 |
| frog | saves | tokyo | 4 |
| super | frog | saves | 4 |
| cheesecake | shaped | poverty | 3 |
| fiction | haruki | murakami | 3 |
| haruki | murakami | book | 3 |
| life | changing | lesson | 3 |
| poverty | haruki | murakami | 3 |
| proper | meta | data | 3 |
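# plot word network
# note: graph_from_data_frame() uses the first two columns (word1, word2) as edge endpoints;
# word3 and n are carried along as edge attributes
words_counts_3 %>%
filter(n >= 3) %>%
graph_from_data_frame() %>%
ggraph(layout = "fr") +
# constant alpha belongs outside aes() so it is not mapped to a legend
geom_edge_link(aes(edge_width = n), edge_alpha = 0.6) +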
geom_node_point(color = "darkslategray4", size = 3) +
geom_node_text(aes(label = name), vjust = 1.8) +
labs(title = "Word Networks",
x = "", y = "")
The word network reveals several more titles beyond the famous ‘Norwegian Wood’ by visualizing frequent word pairings. They include ‘super-frog-saves’, ‘cheesecake-shaped-poverty’, and ‘blind-willow-sleeping’, which likely refer to the stories ‘Super-Frog Saves Tokyo’, ‘My Cheesecake-Shaped Poverty’, and ‘Blind Willow, Sleeping Woman’, respectively.
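`graph_from_data_frame()` builds edges from the first two columns only, so in the trigram network the third word becomes a node only when it also appears as `word1` or `word2` in another row. One way to make every trigram contribute its full chain is to expand it into two edges first. This is a minimal sketch on four rows copied from the trigram table. The expansion step is an assumption about how one might proceed, not the method used for the plot.

```r
# A few rows copied from the trigram count table above
trigram_counts <- data.frame(
  word1 = c("super", "frog", "blind", "willow"),
  word2 = c("frog", "saves", "willow", "sleeping"),
  word3 = c("saves", "tokyo", "sleeping", "woman"),
  n     = c(4, 4, 6, 6),
  stringsAsFactors = FALSE
)

# Each trigram contributes two edges: word1 -> word2 and word2 -> word3
edges <- rbind(
  data.frame(from = trigram_counts$word1, to = trigram_counts$word2, n = trigram_counts$n),
  data.frame(from = trigram_counts$word2, to = trigram_counts$word3, n = trigram_counts$n)
)

# Merge duplicate edges by summing their counts
edges <- aggregate(n ~ from + to, data = edges, FUN = sum)
edges[order(-edges$n), ]
```

The resulting `edges` data frame can be passed to `graph_from_data_frame()` in place of the raw counts. Overlapping trigrams count a shared pair twice, so the summed `n` is better read as a relative edge weight than as a frequency.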
Similarly extracting, cleaning, and visualizing bigrams to uncover common word pairs and their co-occurrence patterns
words_2gram <- threads %>%
mutate(text = str_replace_all(text, replace_reg, "")) %>%
select(text) %>%
unnest_tokens(output = paired_words,
input = text,
token = "ngrams",
n = 2)
words_2gram_pair <- words_2gram %>%
separate(paired_words, c("word1", "word2"), sep = " ")
words_2gram_pair_filtered <- words_2gram_pair %>%
filter(!word1 %in% stop_words$word & !word2 %in% stop_words$word) %>%
filter(str_detect(word1, "[a-z]") & str_detect(word2, "[a-z]"))
words_2gram_pair_filtered %<>%
filter(stri_enc_isascii(word1) & stri_enc_isascii(word2))
words_counts_2 <- words_2gram_pair_filtered %>%
count(word1, word2) %>%
arrange(desc(n))
head(words_counts_2, 20) %>%
knitr::kable()
| word1 | word2 | n |
|---|---|---|
| haruki | murakami | 82 |
| norwegian | wood | 58 |
| dance | dance | 34 |
| short | story | 29 |
| short | stories | 25 |
| bird | chronicle | 16 |
| haruki | murakami’s | 16 |
| sputnik | sweetheart | 14 |
| hard | boiled | 10 |
| sheep | chase | 10 |
| cutty | sark | 9 |
| dolphin | hotel | 9 |
| meta | data | 9 |
| murakami | books | 9 |
| voted | words | 9 |
| wild | sheep | 9 |
| border | west | 8 |
| fuka | eri | 7 |
| killing | commendatore | 7 |
| magical | realism | 7 |
words_counts_2 %>%
filter(n >= 10) %>%
graph_from_data_frame() %>%
ggraph(layout = "fr") +
geom_edge_link(aes(edge_width = n), edge_alpha = 0.6) +
geom_node_point(color = "darkslategray4", size = 3) +
geom_node_text(aes(label = name), vjust = 1.8) +
labs(title = "Word Networks",
x = "", y = "")
The bigram word network highlights further Murakami titles. They include ‘sheep-chase’, ‘sputnik-sweetheart’, ‘bird-chronicle’, and ‘hard-boiled’, which likely refer to ‘A Wild Sheep Chase’, ‘Sputnik Sweetheart’, ‘The Wind-Up Bird Chronicle’, and ‘Hard-Boiled Wonderland and the End of the World’, respectively. The word ‘dance’ appears on its own in both the trigram and bigram networks because it most likely refers to the novel ‘Dance Dance Dance’: the title repeats a single word, so every pairing links ‘dance’ back to itself.
Sentiment analysis of threads using sentimentr package with negation handling to track yearly trends across chosen subreddits
threads_sentiment <- threads %>%
filter(comments > 0) %>%
mutate(year = year(date_utc)) %>%
group_by(subreddit, year) %>%
summarise(ave_sentiment = mean(sentiment_by(text)$ave_sentiment, na.rm = TRUE)) %>%
ungroup()
# Overall sentiment per year: unweighted mean of the subreddit-level yearly averages
total_sentiment <- threads_sentiment %>%
group_by(year) %>%
summarise(ave_sentiment = mean(ave_sentiment, na.rm = TRUE)) %>%
mutate(subreddit = "All")
# Combine subreddit-level and total sentiment
combined_sentiment <- bind_rows(threads_sentiment, total_sentiment)
# Plot
ggplot(combined_sentiment,
aes(x = factor(year, levels = sort(unique(year))),
y = ave_sentiment,
color = subreddit,
group = subreddit)) +
geom_point() +
geom_line() +
labs(x = "Year", y = "Average Sentiment", color = "Subreddit") +
theme_minimal()
Yearly average sentiment differs by subreddit: the early years are more volatile and closer to neutral, and the series converge on moderately positive sentiment by 2025.
These patterns may reflect natural fluctuations in readership and engagement over time, as interest in Murakami’s work rises and falls with broader trends, discussions of specific books, and evolving community dynamics.
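The "All" line averages the subreddit-level yearly means, so a small subreddit weighs as much as a large one. The sketch below contrasts that with a pooled mean over posts. The scores are invented purely to make the difference visible.

```r
# Invented post-level scores for one year in two subreddits of very different size
scores <- data.frame(
  subreddit = c(rep("murakami", 8), rep("MurakamiBookClub", 2)),
  sentiment = c(rep(0.10, 8), rep(0.50, 2))
)

# Mean of subreddit means: each subreddit counts equally
subreddit_means <- tapply(scores$sentiment, scores$subreddit, mean)
mean_of_means <- mean(subreddit_means)

# Pooled mean: each post counts equally
pooled_mean <- mean(scores$sentiment)

c(mean_of_means = mean_of_means, pooled_mean = pooled_mean)
```

Which of the two is appropriate depends on whether the question is about communities or about posts. The choice matters most in years where one subreddit contributes only a handful of threads.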
Displaying 10 sample texts alongside their sentiment scores to evaluate the credibility of the sentiment analysis outcomes
# make the random sample reproducible
set.seed(123)
sample_texts <- threads %>%
# keep only posts with comments
filter(comments > 0) %>%
# randomly select 10 posts
slice_sample(n = 10) %>%
mutate(sentiment = sentiment_by(text)$ave_sentiment)
sample_texts %>%
select(date_utc, title, text, sentiment) %>%
knitr::kable(caption = "Random Sample of 10 Posts with Sentiment Scores")
| date_utc | title | text | sentiment |
|---|---|---|---|
| 2023-09-19 | ABCs of Haruki Murakami - J | Post your suggestions. The top two voted words/phrases/quotes win. A is for Affairs and Aomame B is for Breasts and Blues C is for Cats and Cutty Sark D is for Dreams and Dolphin Hotel E is for Ears and Ennui F is for Fuka-Eri and Food G is for Gatsby and Gin & Tonic H is for Hokkaido and Hard-boiled I is for Isolation and Id, the primitive and instinctual part of the mind that contains sexual and aggressive drives, which coincidentally sounds a lot like “ido”in Japanese, the word for a water well. J is for ? | 0.0148722 |
| 2020-04-18 | What is characteristic of Kitaru that you like in Yesterday by Haruki Murakami? | Is there any characteristic of Kitaru that makes him interesting and favorable? | 0.3608439 |
| 2024-02-23 | Haruki Murakamis influence in the band Amazarashi | Hey everyone, Im a new poster who found Haruki Murakami just recently Ive been listening to Amazarashi for about a few years now. One song of theirs I liked the most is Getsuyoubi [ Monday ], a song created as a collaboration with a mangaka and his work named Getsuyoubi no Tomodachi. In the song, I found parts of the lyrics I believe are inspired by Murakamis book What I Talk About when I talk about Running. It seems fitting, considering that the protagonist of Getsuyoubi no Tomodachi is a tomboyish writer who enjoys being physically active. The song contains the lyrics: From the gyms storage room comes the smell of moldy mats. The lines on the court make it clear where each of us stands. A dove fell to its death in the corridor. Akutagawa looks better than he ever did inside the textbook. This reminds me of many portions of the book where he finds dead animals on his runs. Waiting here in vain inside the large train station. My ice cream melted at the same time the horns whole note played. The closer we get, the more we come to know. And when theres so much we dont know, we gaze at the school district across the river. I find the last few lines were direct references to a iconic line in the book: The most important thing we ever learn at school is the fact that the most important things cant be learned at school. Lastly, the chorus contains lyrics that reference a different book by Harukami that I have not read thoroughly, so its best for others to verify this: I dont remember it being so difficult admitting to all of the things that I like. I guess Ill take a deep breath and dive down below the waves. Ill dive into the very depths of your heart, deeper than anyone would have believed possible. I sometimes think that peoples hearts are like deep wells. Nobody knows whats at the bottom. All you can do is imagine by what comes floating to the surface every once in a while. This line is from Blind Willow, Sleeping Woman. English translations for the lyrics of Getsuyoubi by Amazarashi are sourced here. | 0.0515164 |
| 2025-03-22 | Start rereading Hear the Wind Sing (Goossen trans.), but I favour the cover of Birnbaum trans. | | 0.0000000 |
| 2025-10-11 | Haruki Murakami (songs from his books) playlist | Hi everyone, here’s the playlist I made from your suggestions from my recent post here! Hope someone get something new from it! I’ll keep adding if there’s more suggestions! Have a great weekend (reading Murakami’s books) = Youtube Music Spotify | 0.2610141 |
| 2025-07-08 | Estonian hardbacks | While flying home from Estonia I spotted these beautiful covers for Haruki Murakami in the airport shop. Feeling very jealous as someone that only speaks English cause every other countries editions of Murakami look so much cooler than the UK versions available now. | 0.0932718 |
| 2021-01-02 | NTS Interview with Ryuchi Sakamoto? | Did anyone get to catch any of the NTS interview between Haruki Murakami and Ryuchi Sakamoto from last week, or the Sakamoto and Bowie that aired on New Year’s Day? ​ https://crackmagazine.net/2020/12/archive-ryuichi-sakamoto-radio-shows-with-david-bowie-and-haruki-murakami-to-air-on-nts/ https://www.nts.live/shows/guests/episodes/ryuchi-sakamoto-david-bowie-1983-1st-december-2020 | 0.0797026 |
| 2024-10-07 | Haruki Murakami 1Q84 | Currently reading 1Q84 S3 I WANNA END IT !!!! | 0.0353553 |
| 2017-10-05 | Five Must-Read Books of Haruki Murakami | | 0.0000000 |
| 2024-11-09 | The City and Its Uncertain Walls by Haruki Murakami | | 0.0000000 |
The random sample of 10 posts with their sentiment scores reveals both strengths and limitations of the dictionary-based sentiment analysis approach when applied to literary discussions.
The sentiment analysis has limited credibility for literary discussion forums because of a fundamental methodological flaw: it cannot distinguish between negative words used to describe thematic content and negative sentiment toward that content. Readers discussing Murakami’s characteristically melancholic, lonely, and surreal narratives will naturally use vocabulary that triggers negative sentiment scores, even when expressing deep appreciation for these exact qualities. This creates a systematic bias in which thoughtful thematic discussions are misclassified as negative sentiment. While the dictionary method can identify straightforward positive expressions and neutral informational posts, it struggles with the nature of literary discourse, where discussing dark themes is often a marker of engagement and appreciation rather than dissatisfaction. This limitation must be acknowledged when interpreting all sentiment trends in this analysis.
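The bias can be reproduced with a toy scorer. The sketch below uses a hypothetical three-word polarity lexicon and a plain sum of word scores. It is far cruder than sentimentr, which also handles negators and other valence shifters, but it shows why an appreciative sentence about melancholy can still score below zero.

```r
# Hypothetical three-word polarity lexicon (not the sentimentr dictionary)
toy_lexicon <- c(love = 1, lonely = -1, melancholic = -1)

# Sum the polarity of every token found in the lexicon
score_text <- function(text, lexicon) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  sum(lexicon[tokens], na.rm = TRUE)  # words missing from the lexicon contribute nothing
}

# Invented sentence: clearly appreciative, but built from "negative" theme words
praise <- "I love how lonely and melancholic this novel feels"
score_text(praise, toy_lexicon)
```

The sentence is praise, yet the two theme words outweigh the single positive word.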
Visualizing the frequency of words falling under different NRC sentiment categories, using the NRC lexicon loaded through tidytext
threads_sentiments <- words_clean %>%
inner_join(get_sentiments("nrc"), by = "word") %>%
count(sentiment, sort = TRUE)
threads_sentiments %>%
ggplot(aes(x = sentiment, y = n, fill = sentiment)) +
geom_col(show.legend = FALSE) +
labs(title = "Sentiment Analysis of Posts by NRC Categories", x = "Sentiment", y = "Frequency")
The emotional landscape of Murakami discussions skews heavily positive, with “positive” dominating all other categories. “Trust” ranks second, followed by “negative” and “joy”, suggesting that while discussions are generally favorable, they also reflect the love–hate nature of readers’ responses to Murakami’s literature. However, as noted above as a drawback of this method, “negative”, “sadness”, and “fear” may also reflect Murakami’s often melancholic and surreal narrative themes rather than readers’ disapproval. The notable presence of “anticipation” may reflect excitement about new releases.
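When reading these frequencies, note that the NRC lexicon is stored in long format, with one row per word and category, so a single token can be counted under several categories at once. The sketch below uses an invented mini-lexicon in the same shape. The word-to-category assignments are illustrative assumptions, not entries quoted from NRC.

```r
# Hypothetical mini-lexicon in the same long format as get_sentiments("nrc"):
# one row per (word, sentiment) pair, so a word may appear several times
toy_nrc <- data.frame(
  word      = c("love", "love", "lost", "lost", "music"),
  sentiment = c("positive", "joy", "negative", "sadness", "positive"),
  stringsAsFactors = FALSE
)

# Invented tokens standing in for words_clean
tokens <- data.frame(word = c("love", "lost", "music", "book", "love"),
                     stringsAsFactors = FALSE)

# Inner join: tokens absent from the lexicon ("book") are dropped
tagged <- merge(tokens, toy_nrc, by = "word")
category_counts <- sort(table(tagged$sentiment), decreasing = TRUE)
category_counts
```

Five tokens produce seven tagged rows here, so category totals should be compared with each other rather than with the number of words in the corpus.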
Visualizing the distribution of overall sentiment scores of posts in the different subreddits
# Calculate sentiment scores for each post
threads_sentiment_scores <- threads %>%
filter(comments > 0) %>%
select(subreddit, text, timestamp) %>%
# sentiment_by() returns one row per text element, so rowwise() is not needed
mutate(sentiment_score = sentiment_by(text)$ave_sentiment)
# Plot distribution curves for each subreddit
ggplot(threads_sentiment_scores, aes(x = sentiment_score, color = subreddit, fill = subreddit)) +
geom_density(alpha = 0.3) + # semi-transparent fill
labs(
title = "Distribution of Sentiment Scores Across Subreddits",
x = "Sentiment Score",
y = "Density",
color = "Subreddit",
fill = "Subreddit"
) +
theme_minimal()
The sentiment distributions differ by community: MurakamiBookClub scores sit closer to neutral, whereas HarukiMurakami posts show stronger emotional intensity.
Visualizing word clouds to compare frequent positive and negative words in Murakami discussions
# Prepare data again
nrc_posneg <- get_sentiments("nrc") %>%
filter(sentiment %in% c("positive", "negative"))
words_sentiment <- words_clean %>%
inner_join(nrc_posneg, by = "word")
positive_words <- words_sentiment %>%
filter(sentiment == "positive") %>%
count(word, sort = TRUE)
negative_words <- words_sentiment %>%
filter(sentiment == "negative") %>%
count(word, sort = TRUE)
# Create a side-by-side plotting window
par(mfrow = c(1, 2))
# Positive word cloud
wordcloud(
words = positive_words$word,
freq = positive_words$n,
max.words = 150,
scale = c(3, 0.5),
colors = brewer.pal(8, "Greens")
)
title("Positive Words")
# Negative word cloud
wordcloud(
words = negative_words$word,
freq = negative_words$n,
max.words = 150,
scale = c(3, 0.5),
colors = brewer.pal(8, "Reds")
)
title("Negative Words")
The positive sentiment cloud is dominated by “love,” “dance,” and “favorite,” with supporting terms like “enjoy,” “real,” “found,” and “music” reflecting readers’ emotional connections and aesthetic appreciation. The prominence of “music” also resonates with Murakami’s well-known personal passion for music, which often permeates his narratives and shapes the reading experience. Conversely, the negative cloud prominently features “weird,” “lost,” “words,” “wild,” “feeling,” “loneliness,” and “blues”, terms that may actually reflect Murakami’s characteristic themes and narrative atmosphere rather than purely negative reader reactions. This suggests the sentiment analysis captures both reader evaluations and thematic content, with terms like “lonely,” “strange,” and “depressed” possibly describing plot elements rather than expressing dissatisfaction.
Temporal dynamics: Yearly sentiment trends reveal community-specific dynamics, from volatile, near-neutral sentiment in the early years to more stable, convergent, moderately positive sentiment by 2025. This pattern is consistent with both sustained reader interest and evolving engagement with Murakami’s literature over the years.
Community engagement: Across three Murakami-focused subreddits (murakami, HarukiMurakami, and MurakamiBookClub), users engage deeply with his work. While Norwegian Wood dominates discussions, readers also actively reference other novels and short stories, including Dance Dance Dance, The Wind-Up Bird Chronicle, and Super-Frog Saves Tokyo.
Reader appreciation and thematic recognition: Overall, the communities demonstrate a generally positive sentiment toward Murakami’s work, albeit with varying emotional intensities. MurakamiBookClub reflects a more neutral, analytical engagement typical of structured reading groups, whereas HarukiMurakami displays highly passionate discussions encompassing both the author’s works and broader aspects of his life. Positive word clouds are dominated by love, dance, favorite, and music, reflecting both readers’ emotional connections and recurring positive plot elements in Murakami’s literature. The prominence of music also resonates with Murakami’s personal passion for jazz and classical music, which frequently informs his narratives. Conversely, words such as loneliness, strange, and surreal appear in negative sentiment analyses, yet often represent thematic elements in his melancholic and magical realist style rather than criticism, underscoring the nuanced interpretation required in literary sentiment analysis.
Methodological limitation: Sentiment analysis of literary discussions is inherently constrained. Negative words in discussions of Murakami’s narratives often describe thematic content rather than express disapproval, leading to systematic bias. While dictionary-based methods capture clear positive expressions and neutral informational posts, they struggle to distinguish appreciation of dark or surreal themes from genuine negative sentiment. This limitation should be considered when interpreting sentiment trends in similar studies.