options(repos = c(CRAN = "https://cran.rstudio.com/"))
if (!requireNamespace("gutenbergr", quietly = TRUE)) install.packages("gutenbergr")
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("tidytext", quietly = TRUE)) install.packages("tidytext")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
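if (!requireNamespace("textdata", quietly = TRUE)) install.packages("textdata")
# stringr and tidyr supply str_detect()/regex() and spread(), which the analysis calls later
if (!requireNamespace("stringr", quietly = TRUE)) install.packages("stringr")
if (!requireNamespace("tidyr", quietly = TRUE)) install.packages("tidyr")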
library(gutenbergr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidytext)
library(ggplot2)
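library(textdata)
library(stringr)  # str_detect() and regex() for chapter detection
library(tidyr)    # spread() for reshaping sentiment counts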
Anna Karenina is a novel driven by the emotional relationships between its characters. I therefore examine the overall emotional trajectory of the novel first, and then relate each emotion to the words that express it.
Explain where the data came from, what agency or company made it, how it is structured, what it shows, etc.
anna_karenina <- gutenberg_download(1399)
## Determining mirror for Project Gutenberg from https://www.gutenberg.org/robot/harvest
## Using mirror http://aleph.gutenberg.org
Describe and show how you cleaned and reshaped the data
# Split the text into one lowercased word per row, then drop common stop words.
tidy_anna_karenina <- anna_karenina %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")
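To make the cleaning step concrete, here is a minimal sketch of what tokenization and stop-word removal do to two lines of text. The `toy_lines` and `toy_tokens` names are hypothetical; the tibble only imitates the one-row-per-line shape that `gutenberg_download()` returns.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy input shaped like gutenberg_download() output:
# one row per line of text.
toy_lines <- tibble(
  gutenberg_id = 1399L,
  text = c("Happy families are all alike;",
           "every unhappy family is unhappy in its own way.")
)

toy_tokens <- toy_lines %>%
  unnest_tokens(word, text) %>%        # one lowercased word per row, punctuation dropped
  anti_join(stop_words, by = "word")   # remove common function words such as "are" and "in"

print(toy_tokens)
```

Columns other than `text` (here `gutenberg_id`) are carried along to every word row, which is why a `chapter` column added before tokenizing remains available afterwards.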
Describe and show how you created the first figure. Why did you choose this figure type?
bing_sentiments <- get_sentiments("bing")
# Number the chapters: every "Chapter N" heading increments the running count.
anna_karenina <- anna_karenina %>%
  mutate(chapter = cumsum(str_detect(text, regex("^Chapter [0-9]+", ignore_case = TRUE)))) %>%
  filter(chapter > 0)  # drop the front matter before the first chapter heading
# Net sentiment per chapter = positive word count - negative word count.
chapter_sentiments <- anna_karenina %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  # Some words match more than one lexicon row, so the join is declared
  # many-to-many (the `relationship` argument needs dplyr >= 1.1.0).
  inner_join(bing_sentiments, by = "word", relationship = "many-to-many") %>%
  count(chapter, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
chapter_sentiment_plot <- ggplot(chapter_sentiments, aes(x = chapter, y = sentiment)) +
  geom_line() +
  geom_point() +
  labs(title = "Sentiment Analysis of Anna Karenina by Chapter",
       x = "Chapter",
       y = "Sentiment Score") +
  theme_minimal()
print(chapter_sentiment_plot)
ggsave(filename = "chapter_sentiment_analysis_anna_karenina.png", plot = chapter_sentiment_plot, width = 10, height = 8)
In the early part (Chapters 1-30), the meeting between Anna and Vronsky and the relationship between Kitty and Levin raise positive sentiment, while conflicts between Kitty and Levin introduce negative sentiment. In the early-middle part (Chapters 31-60), positive sentiment continues as Anna and Vronsky’s relationship deepens, but Kitty’s disappointment and recovery add negative sentiment. The middle part (Chapters 61-120) focuses on Levin’s rural life and philosophical reflections, which bring positive sentiment, while conflicts between Anna and Vronsky significantly increase negative sentiment. In the late-middle part (Chapters 121-180), Anna’s pain and conflicts intensify and negative sentiment runs high, while Levin and Kitty’s marriage and happy moments contribute positive sentiment. In the final part (Chapters 181-240), Anna’s tragic end pushes negative sentiment to its peak, but Levin and Kitty’s domestic life brings positive sentiment back up. Overall, positive sentiment appears mainly as the relationships between Anna and Vronsky and between Levin and Kitty develop, while negative sentiment arises from the conflicts between Anna and Vronsky and from Anna’s internal pain and despair.
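The chapter ranges discussed above can also be summarised numerically instead of being read off the line chart. The sketch below is one possible way to do that, not part of the original analysis: `toy_scores` is a made-up stand-in for `chapter_sentiments`, and the `part`, `mean_net`, and `share_negative` columns are hypothetical names.

```r
library(dplyr)

# Made-up stand-in for chapter_sentiments: one row per chapter with a net score.
# The values are illustrative only and do not come from the novel.
toy_scores <- tibble(
  chapter   = 1:240,
  sentiment = rep(c(5, -3, 2, -8), times = 60)
)

range_summary <- toy_scores %>%
  # cut() assigns each chapter to a right-closed interval: (0, 30], (30, 60], ...
  mutate(part = cut(chapter,
                    breaks = c(0, 30, 60, 120, 180, 240),
                    labels = c("1-30", "31-60", "61-120", "121-180", "181-240"))) %>%
  group_by(part) %>%
  summarise(chapters       = n(),                  # number of chapters in the range
            mean_net       = mean(sentiment),      # average positive-minus-negative score
            share_negative = mean(sentiment < 0))  # fraction of chapters with a negative net score

print(range_summary)
```

Running the same pipeline on the real `chapter_sentiments` table would give one row per range, which makes it easier to check whether a range really is more negative than its neighbours.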
nrc_sentiments <- get_sentiments("nrc")
# NRC can tag one word with several categories, so the join is declared
# many-to-many (the `relationship` argument needs dplyr >= 1.1.0).
nrc_sentiments_anna <- tidy_anna_karenina %>%
  inner_join(nrc_sentiments, by = "word", relationship = "many-to-many")
sentiment_counts <- nrc_sentiments_anna %>%
  count(sentiment, sort = TRUE)
sentiment_plot <- ggplot(sentiment_counts, aes(x = reorder(sentiment, n), y = n, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  labs(title = "NRC Sentiment Analysis of Anna Karenina",
       x = "Sentiment",
       y = "Count") +
  theme_minimal()
print(sentiment_plot)
ggsave(filename = "nrc_sentiment_analysis_anna_karenina.png", plot = sentiment_plot, width = 10, height = 8)
In “Anna Karenina,” positive sentiment is the most frequent, reflecting moments of love and happiness, such as Levin and Kitty’s relationship and early stages of Anna and Vronsky’s love. Negative sentiment is also prominent due to conflicts and tragic events, like Anna’s affair, her tragic death, and Levin’s struggles. Trust is crucial, highlighted in Levin and Kitty’s relationship and Levin’s interactions with farm workers. Joy appears in events like weddings, recoveries, and childbirth. Anticipation is seen in wedding preparations, future plans, and Levin’s agricultural innovations. Sadness pervades the story with Anna’s suicide, illnesses, and deaths. Fear arises from social and personal uncertainties, such as the fear of affairs being discovered or farm failures. Anger is present in conflicts, like Karenin’s reaction to Anna’s affair and societal rejection. Disgust occurs in social condemnation and perceived immoral relationships. Surprise, while least frequent, is significant in moments like Anna’s suicide and Levin’s revelations. This analysis connects the novel’s key plot points with the emotions they evoke, providing insight into the emotional depth of Tolstoy’s work.
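One caveat when reading these counts: a lexicon that assigns several categories to the same word makes the join produce one row per word-category pair, so the category totals overlap. The sketch below illustrates this with a tiny made-up lexicon. The `toy_words` and `toy_lexicon` tables are hypothetical and are not real NRC entries, and the `relationship` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical three-word text and a made-up lexicon (not actual NRC entries).
toy_words <- tibble(word = c("love", "love", "death"))
toy_lexicon <- tibble(
  word      = c("love", "love", "death", "death", "death"),
  sentiment = c("joy", "positive", "sadness", "fear", "negative")
)

# Each occurrence of a word is repeated once per category it belongs to.
toy_tagged <- toy_words %>%
  inner_join(toy_lexicon, by = "word", relationship = "many-to-many")

# Shares make categories comparable, but they are shares of word-category
# pairs, not shares of the words in the text.
toy_counts <- toy_tagged %>%
  count(sentiment, sort = TRUE) %>%
  mutate(share = n / sum(n))

print(toy_counts)
```

Three input words become seven tagged rows here, which is why the bar heights in a chart like the one above should be read as how often each category is triggered rather than as a partition of the text.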
nrc_sentiments <- get_sentiments("nrc")
anna_karenina <- gutenberg_download(1399)
# Number the chapters again on the freshly downloaded text.
anna_karenina <- anna_karenina %>%
  mutate(chapter = cumsum(str_detect(text, regex("^Chapter [0-9]+", ignore_case = TRUE)))) %>%
  filter(chapter > 0)
# unnest_tokens() lowercases words by default, so no separate tolower() step is needed.
tidy_anna_karenina <- anna_karenina %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")
word_counts <- tidy_anna_karenina %>%
  count(word, sort = TRUE)
tf_idf <- tidy_anna_karenina %>%
  count(chapter, word, sort = TRUE) %>%
  bind_tf_idf(word, chapter, n)
top_tf_idf <- tf_idf %>%
  arrange(desc(tf_idf)) %>%
  top_n(20, tf_idf)
# Treat each NRC category as a "document" and rank its words by TF-IDF.
emotion_words <- tidy_anna_karenina %>%
  # NRC can tag one word with several categories, so the join is declared
  # many-to-many (the `relationship` argument needs dplyr >= 1.1.0).
  inner_join(nrc_sentiments, by = "word", relationship = "many-to-many") %>%
  count(word, sentiment, sort = TRUE) %>%
  bind_tf_idf(word, sentiment, n) %>%
  group_by(sentiment) %>%
  top_n(10, tf_idf) %>%
  ungroup() %>%
  arrange(sentiment, desc(tf_idf))
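# reorder_within() and scale_x_reordered() (both from tidytext) sort the bars
# separately inside each facet; a plain reorder() uses one global order, which
# misplaces words that appear under several sentiments.
emotion_plot <- ggplot(emotion_words,
                       aes(x = reorder_within(word, tf_idf, sentiment), y = tf_idf, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ sentiment, scales = "free") +
  scale_x_reordered() +
  coord_flip() +
  labs(title = "Top TF-IDF Words by Sentiment in Anna Karenina",
       x = "Words",
       y = "TF-IDF")
print(emotion_plot)
ggsave(filename = "top_tf_idf_words_by_sentiment_anna_karenina.png", plot = emotion_plot, width = 10, height = 8)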
The sentiment analysis of “Anna Karenina” reveals how various emotions are intertwined with the novel’s plot and characters. Words associated with anger, such as “words,” “spite,” “blame,” and “angry,” are linked to moments of conflict and betrayal, like Karenin discovering Anna’s affair and Anna’s internal strife with Vronsky. Anticipation is highlighted through terms like “time,” “coming,” “ready,” and “letter,” reflecting moments of expectation and preparation, including Levin’s proposal to Kitty and Anna’s visions of her future with Vronsky.
Disgust is conveyed through words like “gray,” “blame,” “ashamed,” and “sick,” related to societal condemnation and Anna’s feelings of shame due to her ostracism. Fear is depicted with keywords such as “change,” “afraid,” “difficult,” and “force,” showing characters’ anxieties, including Anna’s fear of exposure, Vronsky’s fear in battle, and Levin’s fear of failure on his farm.
Joy is represented by words like “love,” “smile,” “happy,” and “child,” illustrating the joyous moments in Levin and Kitty’s marriage and family life. Negative sentiment is reflected in terms such as “words,” “spite,” “afraid,” and “impossible,” which mark the novel’s darker threads, including Anna’s tragic end and Levin’s existential struggles.
Positive emotions are conveyed by words like “love,” “princess,” “smile,” and “talk,” found mainly in Levin and Kitty’s relationship, their love, marriage, and family life. Sadness is captured through keywords such as “love,” “impossible,” “mother,” and “doubt,” tied to the tragic elements of the story, like Anna’s despair and eventual suicide.
Surprise is depicted with words like “suddenly,” “smile,” “chance,” and “rapid,” capturing unexpected events such as Anna’s suicide and Levin’s revelations. Trust is emphasized through terms like “smile,” “brother,” “father,” and “doctor,” highlighting the importance of trust in Levin and Kitty’s relationship and their stable family life.
This sentiment analysis provides a deeper understanding of the emotional landscape in “Anna Karenina,” showing how Tolstoy intertwines emotions with the narrative to enrich the reader’s experience.
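Because several words above (“smile,” “love,” “blame”) recur under more than one emotion, it helps to see how TF-IDF behaves when each sentiment category is treated as a document. The sketch below uses made-up counts, and `toy_counts` and `toy_tf_idf` are hypothetical names. It shows that a word present in every category receives a score of zero, while a word confined to one category is weighted up.

```r
library(dplyr)
library(tidytext)

# Made-up counts: "smile" occurs under all three categories, "spite" under one.
toy_counts <- tibble(
  sentiment = c("joy", "trust", "anger", "anger"),
  word      = c("smile", "smile", "smile", "spite"),
  n         = c(10, 10, 10, 4)
)

# bind_tf_idf(term, document, n): each sentiment category plays the role of a document.
# idf = ln(number of categories / number of categories containing the word)
toy_tf_idf <- toy_counts %>%
  bind_tf_idf(word, sentiment, n)

print(toy_tf_idf)
```

In the real analysis no top-ranked word can belong to all ten NRC categories, since its score would be zero, but a word shared by a few categories can still rank highly in each of them, which is why the same word can represent several emotions.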
The sentiment analysis charts for “Anna Karenina” were designed for clarity, readability, and effective communication. Colors were chosen to quickly distinguish different emotions and maintain consistency across the charts. Simple sans-serif fonts were used to enhance readability, with appropriate font sizes to ensure all text is legible. Bar charts and line graphs were used to clearly compare word frequencies and emotional trajectories, while faceting allowed for individual analysis of each sentiment. Legends and labels clarified the meaning of visual elements, ensuring that the data is accurately represented without distortion.