CP8883: Major Assignment 3

1 Haruki Murakami

Examining user-generated text data and sentiment analysis from Reddit threads

This project analyzes user-generated text data from Reddit to explore how readers discuss and respond to the works of Haruki Murakami, one of my favorite authors, whose writing has inspired me to undertake this study.
Murakami has been writing and publishing since the late 1970s, achieving international fame with Norwegian Wood (1987), whose English translations introduced his distinctive blend of magical realism, surrealism, and themes of love, loneliness, and music to a worldwide audience. His literature has historically received mixed reviews: it is praised for its imaginative storytelling, dreamlike narrative style, and evocative atmospheres, and occasionally criticized for the perceived emotional detachment of its characters and the repetition of certain themes.
By combining n-gram analysis, sentiment scoring, and NRC-based emotional categorization, this study aims to identify frequently mentioned book titles, reveal patterns in genre recognition, and highlight both positive reception and critical engagement among readers.

2 R packages

Loading all the necessary R packages that will be used throughout the workflow

# Package names
packages <- c("RedditExtractoR", "anytime", "magrittr", "httr", "tidytext", "tidyverse", "igraph", "ggraph", "wordcloud2", "textdata", "here", "stringi","sentimentr","syuzhet", "wordcloud")
# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

# Load packages
invisible(lapply(packages, library, character.only = TRUE))

3 Prepare data

Search subreddits

Identifying and selecting Haruki Murakami-related subreddits to build a focused and reliable text dataset for the subsequent n-gram and sentiment analysis

# using keyword
threads <- find_thread_urls(keywords = 'Haruki Murakami', 
                              sort_by = 'relevance', 
                              period = 'all') %>% 
  drop_na()

rownames(threads) <- NULL
colnames(threads)
head(threads, 3) %>% knitr::kable()
threads$subreddit %>% table() %>% sort(decreasing = T) %>% head(20)

# search for subreddits
subreddit_list <- RedditExtractoR::find_subreddits('Haruki Murakami')
subreddit_list %>% 
  arrange(desc(subscribers)) %>% 
  .[1:25,c('subreddit','title','subscribers')] %>% 
  knitr::kable()

# using both subreddit and keyword
t_1 <- find_thread_urls(keywords= 'Haruki Murakami', 
                              subreddit = 'murakami', 
                              sort_by = 'relevance', 
                              period = 'all') %>% 
  drop_na()
rownames(t_1) <- NULL

t_2 <- find_thread_urls(keywords= 'Haruki Murakami', 
                        subreddit = 'HarukiMurakami', 
                        sort_by = 'relevance', 
                        period = 'all') %>% 
  drop_na()
rownames(t_2) <- NULL

t_3 <- find_thread_urls(keywords= 'Haruki Murakami', 
                        subreddit = 'MurakamiBookClub', 
                        sort_by = 'relevance', 
                        period = 'all') %>% 
  drop_na()
rownames(t_3) <- NULL

threads <- rbind(t_1, t_2, t_3)
save(threads, file = "Murakami.RData")
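
Because find_thread_urls() queries Reddit on every run, it may be worth skipping the download when a saved copy already exists. The following is a minimal sketch of that idea, not part of the original workflow: load_or_build() is a hypothetical helper (it does not come from RedditExtractoR), and it uses saveRDS()/readRDS() instead of the save()/load() pair used here, purely as an illustrative choice.

```r
# Hypothetical helper: reuse a cached object if the file exists,
# otherwise build it once and save it for later runs.
load_or_build <- function(path, build) {
  if (file.exists(path)) {
    return(readRDS(path))   # reuse the cached copy
  }
  obj <- build()            # e.g. a function wrapping the find_thread_urls() calls
  saveRDS(obj, path)
  obj
}

# Toy usage with a stand-in builder instead of a live Reddit query
cache_file <- tempfile(fileext = ".rds")
calls <- 0
toy_builder <- function() {
  calls <<- calls + 1       # count how often the "download" actually runs
  data.frame(title = c("a", "b"), subreddit = "murakami")
}

first  <- load_or_build(cache_file, toy_builder)  # builds and saves
second <- load_or_build(cache_file, toy_builder)  # read from disk, builder not called
```

In the real workflow, the builder would wrap the three find_thread_urls() calls and the rbind() step.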

3.1 Cleaning text

Cleaning and standardizing thread text

load("Murakami.RData")
# Sanitize text
threads %<>% 
  mutate(across(
    where(is.character),
    ~ .x %>%
      str_replace_all("\\|", "/") %>% 
      str_replace_all("\\n", " ") %>% 
      str_squish() 
  ))

head(threads, 2) %>% knitr::kable()

date_utc	timestamp	title	text	subreddit	comments	url
2021-11-12	1636692365	The Wind-Up Bird Chronicle, Haruki Murakami - Book Review	It starts off slightly odd, with each subsequent chapter it grows stranger, more complex and more compelling. https://youtu.be/eqhvaFboII8	murakami	0	https://www.reddit.com/r/murakami/comments/qs3mdq/the_windup_bird_chronicle_haruki_murakami_book/
2018-10-11	1539273196	Haruki Murakami interview: ‘When I write fiction I go to weird, secret places in myself’		murakami	7	https://www.reddit.com/r/murakami/comments/9nb68g/haruki_murakami_interview_when_i_write_fiction_i/
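
To make the effect of the sanitizing step concrete, here is a small sketch that applies equivalent transformations to a toy string. It uses base R rather than stringr so that it runs on its own; sanitize() is a hypothetical name, and the stated reason for replacing pipes (keeping knitr::kable tables intact) is an assumption about the intent.

```r
# Base R equivalents of the three stringr steps used above
sanitize <- function(x) {
  x <- gsub("|", "/", x, fixed = TRUE)   # pipes presumably clash with kable's table syntax
  x <- gsub("\n", " ", x, fixed = TRUE)  # flatten line breaks
  x <- gsub("\\s+", " ", x)              # collapse runs of whitespace
  trimws(x)                              # strip leading and trailing spaces
}

raw   <- "Line one\nLine | two   with   spaces  "
clean <- sanitize(raw)
print(clean)
```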

3.2 Data overview

Exploring the temporal distribution of threads by converting timestamps and visualizing year-to-year activity trends

# create new column: date
threads%<>% 
  mutate(date = as.POSIXct(date_utc)) %>%
  filter(!is.na(date))

# number of threads by year
threads %>% 
  ggplot(aes(x = date)) +
  # binwidth is in seconds on a POSIXct axis: 60 * 60 * 24 * 365 = 31536000 (one year)
  geom_histogram(color="black", position = 'stack', binwidth = 31536000) +
  scale_x_datetime(date_labels = "%y",
                   breaks = seq(min(threads$date, na.rm = T), 
                                max(threads$date, na.rm = T), 
                                by = "1 year")) +
  theme_minimal()
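
As a cross-check on the histogram, the same yearly counts can be tabulated directly. The sketch below uses a few hypothetical dates standing in for threads$date_utc and also shows the arithmetic behind a one-year bin width on a date-time axis, which is measured in seconds.

```r
# Toy dates standing in for threads$date_utc (hypothetical values)
date_utc <- c("2018-10-11", "2021-11-12", "2021-03-02", "2024-02-23")

# One 365-day year expressed in seconds, the unit used by POSIXct axes
one_year_secs <- 60 * 60 * 24 * 365

# Threads per calendar year
year_counts <- table(format(as.Date(date_utc), "%Y"))
print(year_counts)
```

Tabulating by calendar year avoids the small drift that fixed-width bins accumulate across leap years.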

4 N-gram Analysis

4.1 Tokenizing words

Cleaning thread text, tokenizing it into words, and visualizing word counts to reveal the dominant vocabulary in Murakami discussions

# By removing stop words, we focus on meaningful words that reveal actual themes, topics, or patterns in the text, producing more insightful and interpretable analysis.
# load list of stop words - from the tidytext package
data("stop_words")
# view 20 random words
print(stop_words$word[sample(1:nrow(stop_words), 20)])

##  [1] "they'd"     "find"       "w"          "aren't"     "certain"   
##  [6] "she"        "yourselves" "no"         "indicate"   "that"      
## [11] "out"        "going"      "too"        "without"    "there's"   
## [16] "she'd"      "isn't"      "despite"    "instead"    "showed"

# Regex that matches URL-type string
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"

words_clean <- threads %>% 
  # drop URLs
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  # Tokenization (word tokens)
  unnest_tokens(word, text, token = 'words') %>% 
  # drop stop words
  anti_join(stop_words, by = "word") %>% 
  # keep only tokens that contain at least one letter (drops purely numeric tokens)
  filter(str_detect(word, "[a-z]"))

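# Check the number of rows before and after removing stop words. There should be fewer words afterwards
# Unfiltered token table, used as the "before" baseline
words_all <- threads %>% 
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  unnest_tokens(word, text, token = 'words')

print(
  glue::glue("Before: {nrow(words_all)}, After: {nrow(words_clean)}")
)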

words_clean %>%
  count(word, sort = TRUE) %>%
  top_n(20, n) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip() +
  labs(x = "words",
       y = "counts",
       title = "Unique wordcounts")

The dominant terms reveal the community’s focus: “murakami” and “haruki” naturally lead as the author’s name, followed by “book,” “read,” and “story”. The prominence of “norwegian” and “wood” likely refers to the novel ‘Norwegian Wood’. The character name “toru” also appears frequently. It most likely points to Toru Watanabe, the narrator of ‘Norwegian Wood’, although Toru Okada, the protagonist of ‘The Wind-Up Bird Chronicle’, shares the name. Together these terms indicate that ‘Norwegian Wood’ dominates the conversation.
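
One caveat about the cleaning step above: the character class in replace_reg does not include hyphens or underscores, so URLs containing them are only partly removed and their remaining fragments can end up as tokens. The sketch below compares the original pattern with a broader one on a toy string. The broader pattern is an assumption (it treats everything up to the next whitespace as part of the URL), and base R gsub() with perl = TRUE stands in for str_replace_all() so the example runs on its own.

```r
# Pattern used in the cleaning step
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"
# Hypothetical broader alternative: a URL runs until the next whitespace
broader_reg <- "https?://\\S+|&amp;|&lt;|&gt;"

txt <- "Interview here https://www.nts.live/shows/guests/episodes/ryuchi-sakamoto-david-bowie &amp; more"

original <- gsub(replace_reg, "", txt, perl = TRUE)  # stops at the first hyphen
broader  <- gsub(broader_reg, "", txt, perl = TRUE)  # removes the whole URL

print(original)
print(broader)
```

The broader pattern has its own trade-off: it also swallows punctuation attached to the end of a URL, such as a closing parenthesis.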

Visualizing an illustrative word cloud

n <- 20 # number of words with color
h <- runif(n, 0, 1) # any color
s <- runif(n, 0.6, 1) # vivid
v <- runif(n, 0.3, 0.7) # neither too dark nor too bright

df_hsv <- data.frame(h = h, s = s, v = v)
pal <- apply(df_hsv, 1, function(x) hsv(x['h'], x['s'], x['v']))
pal <- c(pal, rep("grey", 10000))

words_clean %>% 
  count(word, sort = TRUE) %>% 
  wordcloud2(color = pal, 
             minRotation = 0, 
             maxRotation = 0, 
             ellipticity = 0.8)

4.2 Tri-gram Analysis

Extracting, cleaning, and visualizing trigrams to uncover common three-word phrases and their co-occurrence patterns
(Without removing stop words) - While this approach highlights several book titles as frequent trigrams, it also captures many trivial or uninformative combinations that can be filtered out for clearer insights.

words_3gram <- threads %>%
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  select(text) %>%
  unnest_tokens(output = paired_words,
                input = text,
                token = "ngrams",
                n = 3)
# Show ngrams with sorted values
words_3gram %>%
  count(paired_words, sort = TRUE) %>% 
  head(20) %>% 
  knitr::kable()

paired_words	n
NA	186
on the shore	25
kafka on the	24
by haruki murakami	19
wind up bird	18
dance dance dance	17
of norwegian wood	16
up bird chronicle	16
if there are	15
there are any	15
the wind up	14
a lot of	13
let me know	13
i want to	12
of his books	12
of the book	12
the end of	12
one of my	10
please let me	10
a is for	9

Extracting, cleaning, and visualizing trigrams to uncover common three-word phrases and their co-occurrence patterns
(Removing stop words)

# separate the paired words into three columns
words_3gram_pair <- words_3gram %>%
  separate(paired_words, c("word1", "word2", "word3"), sep = " ")

# filter rows where there are stop words under word 1 column, word 2 column and word 3 column
words_3gram_pair_filtered <- words_3gram_pair %>%
  # drop stop words
  filter(!word1 %in% stop_words$word & !word2 %in% stop_words$word & !word3 %in% stop_words$word) %>% 
  # keep only trigrams in which every word contains at least one letter
  filter(str_detect(word1, "[a-z]") & str_detect(word2, "[a-z]") & str_detect(word3, "[a-z]"))

# Filter out words that are not encoded in ASCII
words_3gram_pair_filtered %<>% 
  filter(stri_enc_isascii(word1) & stri_enc_isascii(word2) & stri_enc_isascii(word3))

# Sort the new trigram (n=3) counts:
words_counts_3 <- words_3gram_pair_filtered %>%
  count(word1, word2, word3) %>%
  arrange(desc(n))

head(words_counts_3, 20) %>% 
  knitr::kable()

word1	word2	word3	n
dance	dance	dance	17
wild	sheep	chase	9
phrases	quotes	win	7
voted	words	phrases	7
words	phrases	quotes	7
blind	willow	sleeping	6
willow	sleeping	woman	6
hard	boiled	wonderland	5
norwegian	wood	kafka	5
audience	member’s	question	4
colorless	tsukuru	tazaki	4
correct	meta	data	4
frog	saves	tokyo	4
super	frog	saves	4
cheesecake	shaped	poverty	3
fiction	haruki	murakami	3
haruki	murakami	book	3
life	changing	lesson	3
poverty	haruki	murakami	3
proper	meta	data	3

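# plot word network
# graph_from_data_frame() uses the first two columns (word1, word2) as edge endpoints;
# word3 and n are carried along as edge attributes
words_counts_3 %>%
  filter(n >= 3) %>%
  graph_from_data_frame() %>% 
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_width = n), edge_alpha = .6) +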
  geom_node_point(color = "darkslategray4", size = 3) +
  geom_node_text(aes(label = name), vjust = 1.8) +
  labs(title = "Word Networks",
       x = "", y = "")

Beyond the well-known ‘Norwegian Wood’, the word network surfaces several other titles through frequent word pairings. Clusters such as ‘super-frog-saves’, ‘cheesecake-shaped-poverty’, and ‘blind-willow-sleeping’ likely refer to ‘Super-Frog Saves Tokyo’, ‘My Cheesecake-Shaped Poverty’, and ‘Blind Willow, Sleeping Woman’, respectively.
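
A note on how the trigram network is built: igraph's graph_from_data_frame() reads the first two columns of its input as edge endpoints and treats the remaining columns as edge attributes, so a table with word1, word2, word3, and n yields only word1-to-word2 links. The sketch below shows one way to also link word2 to word3 by expanding each trigram into two edges. The toy rows are modeled on the table above, and keeping the larger count when the same edge arises from two trigrams is an assumption made for this sketch.

```r
# Toy trigram counts (modeled on the table above)
tri <- data.frame(
  word1 = c("super", "frog"),
  word2 = c("frog", "saves"),
  word3 = c("saves", "tokyo"),
  n     = c(4, 4),
  stringsAsFactors = FALSE
)

# Expand each trigram into two consecutive edges
edges <- rbind(
  data.frame(from = tri$word1, to = tri$word2, n = tri$n),
  data.frame(from = tri$word2, to = tri$word3, n = tri$n)
)

# The same edge can come from overlapping trigrams; keep the larger count
edges_agg <- aggregate(n ~ from + to, data = edges, FUN = max)
print(edges_agg)
```

The resulting edges_agg table could then be passed to graph_from_data_frame() in place of the four-column trigram table.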

4.3 Bi-gram Analysis

Similarly extracting, cleaning, and visualizing bigrams to uncover common word pairs and their co-occurrence patterns

words_2gram <- threads %>%
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  select(text) %>%
  unnest_tokens(output = paired_words,
                input = text,
                token = "ngrams",
                n = 2)

words_2gram_pair <- words_2gram %>%
  separate(paired_words, c("word1", "word2"), sep = " ")
words_2gram_pair_filtered <- words_2gram_pair %>%
  filter(!word1 %in% stop_words$word & !word2 %in% stop_words$word) %>% 
  filter(str_detect(word1, "[a-z]") & str_detect(word2, "[a-z]"))

words_2gram_pair_filtered %<>% 
  filter(stri_enc_isascii(word1) & stri_enc_isascii(word2))

words_counts_2 <- words_2gram_pair_filtered %>%
  count(word1, word2) %>%
  arrange(desc(n))

head(words_counts_2, 20) %>% 
  knitr::kable()

word1	word2	n
haruki	murakami	82
norwegian	wood	58
dance	dance	34
short	story	29
short	stories	25
bird	chronicle	16
haruki	murakami’s	16
sputnik	sweetheart	14
hard	boiled	10
sheep	chase	10
cutty	sark	9
dolphin	hotel	9
meta	data	9
murakami	books	9
voted	words	9
wild	sheep	9
border	west	8
fuka	eri	7
killing	commendatore	7
magical	realism	7

words_counts_2 %>%
  filter(n >= 10) %>%
  graph_from_data_frame() %>% 
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_width = n), edge_alpha = .6) +
  geom_node_point(color = "darkslategray4", size = 3) +
  geom_node_text(aes(label = name), vjust = 1.8) +
  labs(title = "Word Networks",
       x = "", y = "")

The bigram word network highlights further Murakami titles. Pairs such as ‘sheep-chase’, ‘sputnik-sweetheart’, ‘bird-chronicle’, and ‘hard-boiled’ likely refer to ‘A Wild Sheep Chase’, ‘Sputnik Sweetheart’, ‘The Wind-Up Bird Chronicle’, and ‘Hard-Boiled Wonderland and the End of the World’, respectively. The word ‘dance’ appears on its own in both the trigram and bigram analyses because it pairs only with itself: it likely refers to the novel ‘Dance Dance Dance’, whose title repeats the same word.

5 Sentiment Analysis

5.1 Using dictionary method

Sentiment analysis of threads using sentimentr package with negation handling to track yearly trends across chosen subreddits

threads_sentiment <- threads %>%
  filter(comments > 0) %>%
  mutate(year = year(date_utc)) %>%
  group_by(subreddit, year) %>%
  summarise(ave_sentiment = mean(sentiment_by(text)$ave_sentiment, na.rm = TRUE)) %>%
  ungroup()

# Calculate overall sentiment per year across all subreddits
total_sentiment <- threads_sentiment %>%
  group_by(year) %>%
  summarise(ave_sentiment = mean(ave_sentiment, na.rm = TRUE)) %>%
  mutate(subreddit = "All")

# Combine subreddit-level and total sentiment
combined_sentiment <- bind_rows(threads_sentiment, total_sentiment)

# Plot
ggplot(combined_sentiment, 
       aes(x = factor(year, levels = sort(unique(year))),
           y = ave_sentiment,
           color = subreddit,
           group = subreddit)) +
  geom_point() +
  geom_line() +
  labs(x = "Year", y = "Average Sentiment", color = "Subreddit") +
  theme_minimal()

Sentiment patterns reveal interesting subreddit-specific dynamics.

The ‘murakami’ subreddit, which focuses on discussions of Haruki Murakami’s works, shows dramatic sentiment volatility in 2016, followed by relative stability near neutral.
‘HarukiMurakami’, which invites discussion of “Any and all things related to Haruki Murakami”, shows gradual sentiment growth from 2016 to 2020, peaking around 2020.
‘MurakamiBookClub’, a subreddit for book-club-style discussions, enters later (2017) with moderately positive sentiment that declines toward 2024 after a sharp uptick in 2021.
All three communities converge toward similar, moderately positive sentiment (~0.1) by 2025.

These patterns reflect natural fluctuations in readership and engagement over time, capturing how interest in Murakami’s work rises and falls with broader trends, discussions of specific books, and evolving community dynamics.
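
One detail worth keeping in mind when reading the “All” line: it is computed as the mean of the subreddit-level yearly means, so each subreddit counts equally regardless of how many threads it contributed. The toy example below, with made-up scores, shows how this can differ from a pooled mean over all threads.

```r
# Toy per-thread scores (hypothetical values)
toy <- data.frame(
  subreddit = c("murakami", "murakami", "murakami", "HarukiMurakami"),
  year      = 2021,
  score     = c(0.1, 0.1, 0.1, 0.5)
)

# Mean of subreddit means (the approach used for the "All" line)
sub_means     <- aggregate(score ~ subreddit + year, data = toy, FUN = mean)
mean_of_means <- mean(sub_means$score)

# Pooled mean over every thread, weighting subreddits by thread count
pooled_mean <- mean(toy$score)

print(c(mean_of_means = mean_of_means, pooled_mean = pooled_mean))
```

Neither choice is wrong, but the two answer different questions: the first describes a typical community, the second a typical thread.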

5.2 10 Sample Test

Displaying 10 sample texts alongside their sentiment scores to evaluate the credibility of the sentiment analysis outcomes

# make the random sample reproducible
set.seed(123)   

sample_texts <- threads %>%
  # keep only posts with comments
  filter(comments > 0) %>%                   
  # randomly select 10 posts
  slice_sample(n = 10) %>%                   
  mutate(sentiment = sentiment_by(text)$ave_sentiment)

sample_texts %>%
  select(date_utc, title, text, sentiment) %>% 
  knitr::kable(caption = "Random Sample of 10 Posts with Sentiment Scores")

Random Sample of 10 Posts with Sentiment Scores
date_utc	title	text	sentiment
2023-09-19	ABCs of Haruki Murakami - J	Post your suggestions. The top two voted words/phrases/quotes win. A is for Affairs and Aomame B is for Breasts and Blues C is for Cats and Cutty Sark D is for Dreams and Dolphin Hotel E is for Ears and Ennui F is for Fuka-Eri and Food G is for Gatsby and Gin & Tonic H is for Hokkaido and Hard-boiled I is for Isolation and Id, the primitive and instinctual part of the mind that contains sexual and aggressive drives, which coincidentally sounds a lot like “ido”in Japanese, the word for a water well. J is for ?	0.0148722
2020-04-18	What is characteristic of Kitaru that you like in Yesterday by Haruki Murakami?	Is there any characteristic of Kitaru that makes him interesting and favorable?	0.3608439
2024-02-23	Haruki Murakamis influence in the band Amazarashi	Hey everyone, Im a new poster who found Haruki Murakami just recently Ive been listening to Amazarashi for about a few years now. One song of theirs I liked the most is Getsuyoubi [ Monday ], a song created as a collaboration with a mangaka and his work named Getsuyoubi no Tomodachi. In the song, I found parts of the lyrics I believe are inspired by Murakamis book What I Talk About when I talk about Running. It seems fitting, considering that the protagonist of Getsuyoubi no Tomodachi is a tomboyish writer who enjoys being physically active. The song contains the lyrics: From the gyms storage room comes the smell of moldy mats. The lines on the court make it clear where each of us stands. A dove fell to its death in the corridor. Akutagawa looks better than he ever did inside the textbook. This reminds me of many portions of the book where he finds dead animals on his runs. Waiting here in vain inside the large train station. My ice cream melted at the same time the horns whole note played. The closer we get, the more we come to know. And when theres so much we dont know, we gaze at the school district across the river. I find the last few lines were direct references to a iconic line in the book: The most important thing we ever learn at school is the fact that the most important things cant be learned at school. Lastly, the chorus contains lyrics that reference a different book by Harukami that I have not read thoroughly, so its best for others to verify this: I dont remember it being so difficult admitting to all of the things that I like. I guess Ill take a deep breath and dive down below the waves. Ill dive into the very depths of your heart, deeper than anyone would have believed possible. I sometimes think that peoples hearts are like deep wells. Nobody knows whats at the bottom. All you can do is imagine by what comes floating to the surface every once in a while. This line is from Blind Willow, Sleeping Woman. English translations for the lyrics of Getsuyoubi by Amazarashi are sourced here.	0.0515164
2025-03-22	Start rereading Hear the Wind Sing (Goossen trans.), but I favour the cover of Birnbaum trans.		0.0000000
2025-10-11	Haruki Murakami (songs from his books) playlist	Hi everyone, here’s the playlist I made from your suggestions from my recent post here! Hope someone get something new from it! I’ll keep adding if there’s more suggestions! Have a great weekend (reading Murakami’s books) = Youtube Music Spotify	0.2610141
2025-07-08	Estonian hardbacks	While flying home from Estonia I spotted these beautiful covers for Haruki Murakami in the airport shop. Feeling very jealous as someone that only speaks English cause every other countries editions of Murakami look so much cooler than the UK versions available now.	0.0932718
2021-01-02	NTS Interview with Ryuchi Sakamoto?	Did anyone get to catch any of the NTS interview between Haruki Murakami and Ryuchi Sakamoto from last week, or the Sakamoto and Bowie that aired on New Year’s Day? https://crackmagazine.net/2020/12/archive-ryuichi-sakamoto-radio-shows-with-david-bowie-and-haruki-murakami-to-air-on-nts/ https://www.nts.live/shows/guests/episodes/ryuchi-sakamoto-david-bowie-1983-1st-december-2020	0.0797026
2024-10-07	Haruki Murakami 1Q84	Currently reading 1Q84 S3 I WANNA END IT !!!!	0.0353553
2017-10-05	Five Must-Read Books of Haruki Murakami		0.0000000
2024-11-09	The City and Its Uncertain Walls by Haruki Murakami		0.0000000

The random sample of 10 posts with their sentiment scores reveals both strengths and limitations of the dictionary-based sentiment analysis approach when applied to literary discussions.

Posts with clearly positive sentiment are scored as positive (0.26 for the playlist post, 0.09 for the enthusiasm about the Estonian hardbacks).
Neutral informational posts receive near-zero scores (0.015 for the ABCs word list), and title-only posts with no body text score exactly 0.00 because there is nothing to score.
The lengthy Amazarashi post (0.05) about musical connections to Murakami’s work contains deeply emotional and positive content about literary-musical synergy, yet receives a relatively modest score, likely because the sentiment analysis struggles with complex, metaphorical language (“dead animals,” “melted ice cream,” “deep wells”) that serves descriptive rather than evaluative purposes.

The sentiment analysis has limited credibility for literary discussion forums because of a fundamental methodological flaw: it cannot distinguish between negative words used to describe thematic content and negative sentiment toward that content. Readers discussing Murakami’s characteristically melancholic, lonely, and surreal narratives will naturally use vocabulary that lowers sentiment scores, even when expressing deep appreciation for these exact qualities. This creates a systematic bias in which thoughtful thematic discussions are scored as more negative than the reader’s actual attitude. While the dictionary method can identify straightforward positive expressions and neutral informational posts, it struggles with literary discourse, where discussing dark themes is often a marker of engagement and appreciation rather than dissatisfaction. This limitation must be acknowledged when interpreting all sentiment trends in this analysis.
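
The limitation described above can be illustrated with a deliberately simplified dictionary scorer. This is a toy sketch, not the sentimentr algorithm: the lexicon values are hypothetical, there is no handling of negation or other valence shifters, and dividing by the square root of the token count is a simplification chosen for the example.

```r
# Toy lexicon (hypothetical values, not the sentimentr dictionary)
toy_lexicon <- c(love = 1, beautiful = 1, favorite = 1,
                 lonely = -1, dead = -1, lost = -1, melancholy = -1)

# Hypothetical scorer: sum of word polarities, scaled by sqrt(token count)
score_text <- function(text, lexicon = toy_lexicon) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) == 0) return(0)   # empty text scores exactly zero
  hits <- lexicon[tokens[tokens %in% names(lexicon)]]
  sum(hits) / sqrt(length(tokens))
}

# An appreciative sentence that uses dark thematic vocabulary
praise <- "I love how lonely and lost his characters feel, the melancholy is beautiful"
praise_score <- score_text(praise)
empty_score  <- score_text("")
print(praise_score)
```

Although the sentence expresses admiration, the three thematic words outweigh the two evaluative ones, so the toy score comes out negative.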

6 Visualization and Analysis

6.1 Plot 1: Sentiment Frequency Bar Chart

Visualizing the frequency of words falling under the different NRC sentiment categories, using the NRC lexicon loaded through tidytext’s get_sentiments()

threads_sentiments <- words_clean %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  count(sentiment, sort = TRUE)

threads_sentiments %>%
  ggplot(aes(x = sentiment, y = n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Sentiment Analysis of Comments by NRC Categories", x = "Sentiment", y = "Frequency")

The emotional landscape of Murakami discussions skews positive, with “positive” outnumbering every other category. “Trust” ranks second, followed by “negative” and “joy”, suggesting that while discussions are generally favorable, they also reflect the love–hate nature of readers’ responses to Murakami’s literature. As noted earlier as a drawback of this method, however, “negative”, “sadness”, and “fear” may instead reflect Murakami’s often melancholic and surreal narrative themes. The sizeable presence of “anticipation” may reflect excitement about new releases.
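
When reading these counts, it helps to remember that the NRC lexicon can assign one word to several categories, so the join counts a single token once per category it belongs to, and words absent from the lexicon drop out entirely. The sketch below reproduces that behavior with a toy lexicon. The category assignments are illustrative rather than taken from the actual NRC data, and base R merge() stands in for inner_join().

```r
# Toy tokens and a toy lexicon (illustrative assignments, not real NRC entries)
tokens <- data.frame(word = c("love", "love", "lonely", "music"),
                     stringsAsFactors = FALSE)
toy_nrc <- data.frame(
  word      = c("love", "love", "lonely", "lonely", "lonely"),
  sentiment = c("positive", "joy", "negative", "sadness", "fear"),
  stringsAsFactors = FALSE
)

# Inner join: each token is repeated once per matching category
joined <- merge(tokens, toy_nrc, by = "word")
category_counts <- table(joined$sentiment)
print(category_counts)
```

Four tokens produce seven category hits here, so the bar heights are best read as category mentions rather than as counts of distinct words.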

6.2 Plot 2: Sentiment Score Density Curves

Visualizing the distribution of per-post sentiment scores across the different subreddits

# Calculate sentiment scores for each post
threads_sentiment_scores <- threads %>%
  filter(comments > 0) %>%
  select(subreddit, text, timestamp) %>%
  rowwise() %>%
  mutate(sentiment_score = sentiment_by(text)$ave_sentiment) %>%
  ungroup()

# Plot distribution curves for each subreddit
ggplot(threads_sentiment_scores, aes(x = sentiment_score, color = subreddit, fill = subreddit)) +
  geom_density(alpha = 0.3) +   # semi-transparent fill
  labs(
    title = "Distribution of Sentiment Scores Across Subreddits",
    x = "Sentiment Score",
    y = "Density",
    color = "Subreddit",
    fill = "Subreddit"
  ) +
  theme_minimal()

The sentiment distribution reveals distinct community characteristics.

‘MurakamiBookClub’ shows a sharp, narrow peak centered near 0, indicating predominantly neutral discussions typical of structured reading groups.
‘murakami’ displays a broader distribution with a secondary positive peak around 0.15-0.25, suggesting more varied emotional engagement. This likely reflects the subreddit’s focus on in-depth discussions of Murakami’s novels.
‘HarukiMurakami’ exhibits the most diverse sentiment profile with significant density extending into positive territory. This indicates that the general fan community engages in more passionate discussions, expressing strong emotional reactions not only about the author’s works but also about his life and broader influence, which are central topics in this subreddit.
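
Because the scoring step filters on comments > 0 but not on whether a post has any body text, title-only posts remain in the data with a score of exactly 0 (as in the sample in Section 5.2) and may add to the density near 0. A minimal sketch of excluding them before scoring, using a toy data frame with hypothetical rows:

```r
# Toy stand-in for threads (hypothetical rows)
toy_threads <- data.frame(
  subreddit = c("murakami", "murakami", "MurakamiBookClub"),
  text      = c("Loved this book", "", "   "),
  comments  = c(3, 7, 2),
  stringsAsFactors = FALSE
)

# TRUE only for posts whose text is non-empty after trimming whitespace
has_text <- nzchar(trimws(toy_threads$text))

# Keep commented posts that actually have text to score
scorable <- toy_threads[toy_threads$comments > 0 & has_text, ]
print(scorable)
```

Whether to exclude such posts is a judgment call; comparing the density curves with and without them would show how much of the neutral peak they explain.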

6.3 Plot 3: Word Clouds - Positive vs. Negative

Visualizing word clouds to compare frequent positive and negative words in Murakami discussions

# Prepare data again
nrc_posneg <- get_sentiments("nrc") %>%
  filter(sentiment %in% c("positive", "negative"))

words_sentiment <- words_clean %>%
  inner_join(nrc_posneg, by = "word")

positive_words <- words_sentiment %>%
  filter(sentiment == "positive") %>%
  count(word, sort = TRUE)

negative_words <- words_sentiment %>%
  filter(sentiment == "negative") %>%
  count(word, sort = TRUE)

# Create a side-by-side plotting window
par(mfrow = c(1, 2))

# Positive word cloud
wordcloud(
  words = positive_words$word,
  freq = positive_words$n,
  max.words = 150,
  scale = c(3, 0.5),
  colors = brewer.pal(8, "Greens")
)

title("Positive Words")

# Negative word cloud
wordcloud(
  words = negative_words$word,
  freq = negative_words$n,
  max.words = 150,
  scale = c(3, 0.5),
  colors = brewer.pal(8, "Reds")
)

title("Negative Words")

The positive sentiment cloud is dominated by “love,” “dance,” and “favorite,” with supporting terms like “enjoy,” “real,” “found,” and “music” reflecting readers’ emotional connections and aesthetic appreciation. The prominence of “music” also resonates with Murakami’s well-known personal passion for music, which often permeates his narratives and shapes the reading experience. Conversely, the negative cloud prominently features “weird,” “lost,” “words,” “wild,” “feeling,” “loneliness,” and “blues”, terms that may actually reflect Murakami’s characteristic themes and narrative atmosphere rather than purely negative reader reactions. This suggests the sentiment analysis captures both reader evaluations and thematic content, with terms like “lonely,” “strange,” and “depressed” possibly describing plot elements rather than expressing dissatisfaction.

7 Inferences

Temporal dynamics: Yearly sentiment trends reveal community-specific dynamics, from volatility and neutrality in early discussion forums to more stable, convergent moderate positive sentiment by 2025. This pattern reflects both sustained reader interest and evolving engagement with Murakami’s literature over the years.
Community engagement: Across the three Murakami-focused subreddits (murakami, HarukiMurakami, and MurakamiBookClub), users engage deeply with his work. While Norwegian Wood dominates discussions, readers also actively reference other novels and short stories, including Dance Dance Dance, The Wind-Up Bird Chronicle, and Super-Frog Saves Tokyo.
Reader appreciation and thematic recognition: Overall, the communities demonstrate a generally positive sentiment toward Murakami’s work, albeit with varying emotional intensities. MurakamiBookClub reflects a more neutral, analytical engagement typical of structured reading groups, whereas HarukiMurakami displays highly passionate discussions encompassing both the author’s works and broader aspects of his life. Positive word clouds are dominated by love, dance, favorite, and music, reflecting both readers’ emotional connections and recurring positive plot elements in Murakami’s literature. The prominence of music also resonates with Murakami’s personal passion for jazz and classical music, which frequently informs his narratives. Conversely, words such as weird, lost, and loneliness appear in the negative word cloud, yet often represent thematic elements of his melancholic and magical realist style rather than criticism, underscoring the nuanced interpretation required in literary sentiment analysis.
Methodological limitation: Sentiment analysis of literary discussions is inherently constrained. Negative words in discussions of Murakami’s narratives often describe thematic content rather than express disapproval, leading to systematic bias. While dictionary-based methods capture clear positive expressions and neutral informational posts, they struggle to distinguish appreciation of dark or surreal themes from genuine negative sentiment. This limitation should be considered when interpreting sentiment trends in similar studies.