# Load packages
library(tidyverse) # All the tidy things
library(lubridate) # Easily fixing pesky dates
library(tidytext) # Tidy text mining
library(textdata) # Lexicons of sentiment data
library(widyr) # Easily calculating pairwise counts
library(igraph) # Special graphs for network analysis
library(ggraph) # An extension of ggplot for relational data used in networks
Studio Ghibli Sentiment Analysis
For this assignment, I wanted to explore something fun and meaningful to me, so I chose Studio Ghibli films. Their stories are emotional, whimsical, and beautifully written, making them perfect for text-based sentiment analysis.
To compare patterns, I grouped the films into two categories: Miyazaki-directed films and all other Ghibli directors.
From here, three questions were explored:
1. Which emotions appear most frequently in film descriptions for Miyazaki films compared to other Ghibli directors?
2. How does positivity/negativity change over time for Miyazaki vs other directors?
3. Are more positive film descriptions associated with higher Rotten Tomatoes scores?
#Load data
ghibli_clean <-
read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/whitek25_xavier_edu/IQCKqbbc-UsMSp1TNG6dPM0PAWfJ5RybHuGuRoRC_DzI9Ww?download=1")
#Wrangling - director groups
ghibli_base <-
ghibli_clean %>%
mutate(director_group = if_else(director == "Hayao Miyazaki",
"Miyazaki", "Other Ghibli"))
Q1: Which emotions appear most frequently in film descriptions for Miyazaki films compared to other Ghibli directors?
Studio Ghibli films each carry their own emotional identity, but Miyazaki’s work is often described as warm, imaginative, and full of subtle emotional depth. I wanted to see whether those impressions show up in the text descriptions. Comparing the emotional language in the plot descriptions of Miyazaki films with that of other Ghibli films gives a clearer picture of whether their emotional tones differ in a measurable way. The alternative is that the distinction is something viewers (including me) mainly feel, rather than something reflected in the language used to describe the films.
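Each film’s description was tokenized into individual words and stop words were removed. The remaining words were then joined to the NRC lexicon. NRC tags a word with one or more of ten categories: eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) plus positive and negative sentiment. Words not in the lexicon are dropped by the join. Because one word can carry several labels, each group’s percentages are shares of matched word-category pairs. These percentages were then calculated for both director groups.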
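The sketch below makes the percentage calculation concrete using toy data. The lexicon entries and tokens are hypothetical stand-ins, not real NRC rows or Ghibli descriptions. They only mimic the `word`/`sentiment` shape that `get_sentiments("nrc")` returns.

The sketch also shows why the join needs `relationship = "many-to-many"`: a word like the made-up "friend" entry counts under three categories at once. It uses `count()` in place of `group_by()` + `summarize(n())`, which gives the same counts.

```r
library(dplyr)
library(tibble)

# Hypothetical toy lexicon shaped like get_sentiments("nrc"):
# one row per (word, sentiment) pair, so a word can carry several labels
toy_nrc <- tribble(
  ~word,    ~sentiment,
  "witch",  "fear",
  "witch",  "negative",
  "friend", "joy",
  "friend", "positive",
  "friend", "trust",
  "war",    "fear",
  "war",    "negative"
)

# Hypothetical tokenized descriptions (stop words already removed)
toy_tokens <- tribble(
  ~director_group, ~word,
  "Miyazaki",      "witch",
  "Miyazaki",      "friend",
  "Other Ghibli",  "war",
  "Other Ghibli",  "war",
  "Other Ghibli",  "friend",
  "Other Ghibli",  "castle"  # not in the toy lexicon, so inner_join drops it
)

toy_summary <- toy_tokens %>%
  # many-to-many: words repeat in both tables
  inner_join(toy_nrc, by = "word", relationship = "many-to-many") %>%
  count(director_group, sentiment, name = "Count") %>%
  group_by(director_group) %>%
  # Share of each group's matched word-category pairs
  mutate(pct = Count / sum(Count)) %>%
  ungroup()

toy_summary
```

In this toy case, "Other Ghibli" has 7 matched pairs, so its two "war" tokens give fear a share of 2/7. Every Miyazaki category gets 1/5.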
#tokenize descriptions
tidy_ghibli <-
ghibli_base %>%
unnest_tokens(word,description) %>%
anti_join(stop_words, by = "word")
#NRC emotive lexicon
nrc <- get_sentiments("nrc")
ghibli_nrc <-
tidy_ghibli %>%
inner_join(nrc, by = "word", relationship = "many-to-many")
ghibli_nrc_summary <-
ghibli_nrc %>%
group_by(director_group, sentiment) %>%
summarize(Count = n()) %>%
group_by(director_group) %>%
mutate(`Percent of Scorable Words` = Count / sum(Count)) %>%
arrange(director_group, -`Percent of Scorable Words`)
#visualization
ghibli_nrc_summary %>%
ggplot(aes(x = sentiment,
y = `Percent of Scorable Words`,
fill = director_group)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::percent) +
labs(title = "Emotive Sentiment in Ghibli Film Descriptions",
x = "Emotion",
y = "Percent of Scorable Words",
fill = "Director group") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Overall, Miyazaki films and other Ghibli films share a similar emotional profile. The most common categories in the film descriptions are positive, negative, and fear, followed by anticipation and trust. These core emotions appear frequently for both groups, which makes sense given Ghibli’s overall storytelling style.
There are, however, some noticeable differences. Other Ghibli films show higher percentages of positive, sadness, surprise, and trust words. This suggests their descriptions may lean more dramatic or emotionally charged, sometimes hinting at unexpected or heavier plot turns. Miyazaki’s descriptions score a bit higher in joy and anticipation. This is consistent with the idea that, even alongside tension or conflict, his stories balance it with moments of warmth, wonder, and forward-looking energy. It’s a soft tonal shift rather than a dramatic one.
The differences aren’t extreme, but they do reflect small stylistic nuances that show up in the numbers. The emotional tones people feel when watching these films seem to line up with subtle differences in how their stories are described.
Q2: How has positivity/negativity changed over time for Miyazaki vs other directors?
Ghibli films span multiple decades, and storytelling styles evolve. I was curious whether film descriptions have shifted in tone, especially across different creative eras.
Using the Bing lexicon, I counted positive vs negative words for each film and created a net sentiment score (positive - negative). Then I plotted this against release year for both director groups.
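The sketch below computes the net sentiment score on hypothetical data. The two made-up films and the five-word lexicon are for illustration only. They mirror the `word`/`sentiment` columns of `get_sentiments("bing")`, which gives each word a single positive or negative label.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for get_sentiments("bing"): one label per word
toy_bing <- tribble(
  ~word,     ~sentiment,
  "magic",   "positive",
  "kind",    "positive",
  "curse",   "negative",
  "war",     "negative",
  "destroy", "negative"
)

# Hypothetical tokens for two made-up films
toy_bing_tokens <- tribble(
  ~title,   ~word,
  "Film A", "magic",
  "Film A", "kind",
  "Film A", "curse",
  "Film B", "war",
  "Film B", "destroy",
  "Film B", "kind",
  "Film B", "forest"  # unmatched, so it contributes nothing
)

toy_net <- toy_bing_tokens %>%
  inner_join(toy_bing, by = "word") %>%
  group_by(title) %>%
  summarize(
    positive = sum(sentiment == "positive"),
    negative = sum(sentiment == "negative"),
    .groups = "drop"
  ) %>%
  # Net score: positive count minus negative count
  mutate(net_sentiment = positive - negative)

toy_net
```

Film A ends at +1 and Film B at -1. Because the score is a raw count, a longer description can reach a larger magnitude. Dividing by `positive + negative` would be one way to compare films per matched word, although the analysis here uses the raw difference.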
bing <- get_sentiments("bing")
#per film positivity
ghibli_bing <-
tidy_ghibli %>%
inner_join(bing, by = "word")
film_sentiment <-
ghibli_bing %>%
group_by(title, director_group, release_date, rt_score) %>%
summarize(
positive = sum(sentiment == "positive"),
negative = sum(sentiment == "negative")) %>%
mutate(net_sentiment = positive - negative)
#visualization, sentiment over time
film_sentiment %>%
ggplot(aes(x = release_date, y = net_sentiment, color = director_group)) +
geom_line() +
geom_point() +
labs(title = "Studio Ghibli Positivity Score Over Time",
x = "Release year",
y = "Net sentiment (positive - negative)",
color = "Director Group")
When looking at net positivity over time, neither group shows a smooth upward or downward trend. Instead, both Miyazaki and other Ghibli films fluctuate noticeably across decades. This isn’t surprising, since emotional tone often depends on each film’s themes rather than the era in which it was released.
Miyazaki’s line, though uneven, tends to hover closer to neutral. Most of his films fall between -3 and +2, with dips around the early 2000s and higher points in the early 2010s. This suggests a balance between emotional tension and warmth, with no drastic movement in one direction over time. Other Ghibli directors show much sharper swings. Their sentiment ranges from slightly positive for some films to strongly negative for others, including a dramatic drop around 2006. These swings suggest that their film descriptions emphasize more intense or varied emotional tension depending on the project.
Even with these differences, neither group shows a clear long-term trend toward becoming more positive or negative. The emotional tone of their descriptions seems to shift film-by-film rather than following a consistent chronological pattern.
Q3: Are more positive film descriptions associated with higher Rotten Tomatoes scores?
I wondered whether the positivity of a film’s written description lines up with how critics received it. The AFINN lexicon assigns each word an integer score from -5 (most negative) to +5 (most positive). This lets me calculate an average sentiment score for each film description. Films were then grouped into three categories based on that average: positive if above 0, negative if below 0, and neutral if exactly 0. I then compared the average Rotten Tomatoes score within each sentiment category and director group.
I joined the AFINN lexicon to the tokenized descriptions and summarized each film’s average sentiment score.
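The sketch below shows the averaging and bucketing step, again on hypothetical data. The scores only mimic the `word`/`value` shape of `get_sentiments("afinn")` and are not copied from the lexicon. The sketch swaps the nested `if_else()` for `case_when()`, which applies the same thresholds.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for get_sentiments("afinn"): integer scores in [-5, 5]
toy_afinn <- tribble(
  ~word,   ~value,
  "love",   3,
  "happy",  3,
  "kill",  -3,
  "fear",  -2,
  "lost",  -1
)

# Hypothetical tokens for three made-up films
toy_afinn_tokens <- tribble(
  ~title,   ~word,
  "Film A", "love",
  "Film A", "lost",
  "Film B", "kill",
  "Film B", "fear",
  "Film B", "happy",
  "Film C", "happy",
  "Film C", "kill"
)

toy_afinn_cats <- toy_afinn_tokens %>%
  inner_join(toy_afinn, by = "word") %>%
  group_by(title) %>%
  summarize(avg_afinn = mean(value), word_count = n(), .groups = "drop") %>%
  # Same thresholds as the nested if_else(): > 0, < 0, otherwise exactly 0
  mutate(sentiment_group = case_when(
    avg_afinn > 0 ~ "Positive",
    avg_afinn < 0 ~ "Negative",
    TRUE          ~ "Neutral"
  ))

toy_afinn_cats
```

Film C lands in "Neutral" only because its scores cancel to exactly 0. This is one reason a neutral bucket can end up small or even empty for a group. A film with no AFINN matches at all is dropped by `inner_join()` rather than counted as neutral.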
afinn <- get_sentiments("afinn")
ghibli_afinn <-
tidy_ghibli %>%
inner_join(afinn, by = "word")
film_afinn <-
ghibli_afinn %>%
group_by(title, director_group, release_date, rt_score) %>%
summarize(
avg_afinn = mean(value),
total_afinn = sum(value),
word_count = n())
film_afinn_cats <-
film_afinn %>%
mutate(sentiment_group = if_else(
avg_afinn > 0, "Positive",
if_else(avg_afinn <0, "Negative", "Neutral")
)) %>%
group_by(sentiment_group, director_group) %>%
summarise(avg_rt = mean(rt_score))
film_afinn_cats %>%
ggplot(aes(x = sentiment_group,
y = avg_rt,
fill = director_group)) +
geom_col(position = "dodge") +
labs(title = "Avg Rotten Tomatoes score by Description Sentiment Category",
x = "Sentiment Category",
y = "Avg Rotten Tomatoes score",
fill = "Director Group")
Looking at results by sentiment category, there isn’t a clear pattern of “more positive” descriptions going with higher Rotten Tomatoes scores. Across the negative, neutral, and positive description categories, average scores stay high for both director groups.
For Miyazaki films, the neutral description category has the highest average Rotten Tomatoes score. Negative and positive descriptions are slightly lower, in the low-to-mid 90s. For other Ghibli directors, the pattern is also inconsistent. Films with negative descriptions have slightly higher average ratings than those with positive descriptions, and this group has no neutral descriptions at all.
Overall, critics seem to rate Ghibli films highly regardless of whether the written description leans negative, neutral, or positive. This suggests that the emotional tone of the plot description by itself doesn’t do a good job of predicting critic scores.
Conclusion
Across all three questions, the sentiment patterns in Studio Ghibli film descriptions showed some interesting differences, but none were dramatic enough to suggest that the emotional language fully defines how these films are received. Taken together, these results suggest that sentiment analysis reveals subtle stylistic fingerprints across the studio’s films. The deeper emotional impact of Ghibli storytelling, however, extends beyond what can be measured in a few lines of descriptive text.