Loads the initial libraries
#Install & load the necessary libraries.
#install.packages("textdata")
#install.packages("tidytext")
library(tidytext)
library(ggplot2)
# Load the sentiment lexicons (afinn and nrc are downloaded through textdata on first use)
get_sentiments("afinn")
## # A tibble: 2,477 × 2
## word value
## <chr> <dbl>
## 1 abandon -2
## 2 abandoned -2
## 3 abandons -2
## 4 abducted -2
## 5 abduction -2
## 6 abductions -2
## 7 abhor -3
## 8 abhorred -3
## 9 abhorrent -3
## 10 abhors -3
## # … with 2,467 more rows
get_sentiments("bing")
## # A tibble: 6,786 × 2
## word sentiment
## <chr> <chr>
## 1 2-faces negative
## 2 abnormal negative
## 3 abolish negative
## 4 abominable negative
## 5 abominably negative
## 6 abominate negative
## 7 abomination negative
## 8 abort negative
## 9 aborted negative
## 10 aborts negative
## # … with 6,776 more rows
get_sentiments("nrc")
## # A tibble: 13,872 × 2
## word sentiment
## <chr> <chr>
## 1 abacus trust
## 2 abandon fear
## 3 abandon negative
## 4 abandon sadness
## 5 abandoned anger
## 6 abandoned fear
## 7 abandoned negative
## 8 abandoned sadness
## 9 abandonment anger
## 10 abandonment fear
## # … with 13,862 more rows
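The three lexicons are shaped differently: afinn gives each word a single numeric value, bing labels words positive or negative, and nrc lists a word once for every category it belongs to (abandon appears under fear, negative and sadness above). Here is a minimal sketch of why that matters when joining. It assumes dplyr and uses only the nrc rows printed above plus two hypothetical tokens. Joining against the full lexicon repeats a token once per category, while filtering to one category first (as the joy analysis below does) keeps at most one row per token.

```r
library(dplyr)

# The first ten rows of get_sentiments("nrc"), as printed above
nrc_excerpt <- tibble(
  word = c("abacus", "abandon", "abandon", "abandon", "abandoned",
           "abandoned", "abandoned", "abandoned", "abandonment", "abandonment"),
  sentiment = c("trust", "fear", "negative", "sadness", "anger",
                "fear", "negative", "sadness", "anger", "fear")
)

# Hypothetical tokens: each word appears only once
tokens <- tibble(word = c("abacus", "abandon"))

# Full lexicon: abacus matches 1 row and abandon matches 3, so 4 rows in total
all_categories <- inner_join(tokens, nrc_excerpt, by = "word")

# One category only: each token matches at most one row
nrc_fear <- nrc_excerpt %>% filter(sentiment == "fear")
fear_only <- inner_join(tokens, nrc_fear, by = "word")

all_categories
fear_only
```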
library(janeaustenr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr)
library(tidyr)
# Tokenize book text into word tokens
tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]", ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)
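The chapter counter works because str_detect() returns TRUE only on heading lines, and cumsum() treats TRUE as 1. The running total therefore increases by one at each "Chapter ..." line. A small sketch on a hypothetical five-line toy book (not Austen's text) shows the numbering and the lowercasing that unnest_tokens() applies by default. It assumes dplyr, stringr and tidytext are installed.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Hypothetical stand-in for austen_books(): five lines of one toy book
toy_book <- tibble(
  book = "Toy Novel",
  text = c("Title page", "CHAPTER I", "It is a truth", "Chapter II", "However little")
)

toy_tidy <- toy_book %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    # cumsum() over TRUE/FALSE heading matches yields 0, 1, 1, 2, 2
    chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]", ignore_case = TRUE)))
  ) %>%
  ungroup() %>%
  # One row per word; unnest_tokens() lowercases and strips punctuation by default
  unnest_tokens(word, text)

toy_tidy
```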
# Joy words in the NRC lexicon
nrcjoy <- get_sentiments("nrc") %>%
  filter(sentiment == "joy")
# Find joy words in Emma
tidy_books %>%
  filter(book == "Emma") %>%
  inner_join(nrcjoy) %>%
  count(word, sort = TRUE)
## Joining, by = "word"
## # A tibble: 301 × 2
## word n
## <chr> <int>
## 1 good 359
## 2 friend 166
## 3 hope 143
## 4 happy 125
## 5 love 117
## 6 deal 92
## 7 found 92
## 8 present 89
## 9 kind 82
## 10 happiness 76
## # … with 291 more rows
# Calculate net sentiment (positive minus negative word counts) per 80-line block
janeaustensentiment <- tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(book, index = linenumber %/% 80, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining, by = "word"
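The `index = linenumber %/% 80` step uses integer division to bucket lines into 80-line blocks. Each block's net score is its positive count minus its negative count. The sketch below repeats that pipeline on hypothetical tokens and a hypothetical bing-style lexicon. It swaps spread() for pivot_wider(), which tidyr now recommends in place of spread(); `values_fill = 0` plays the role of `fill = 0`.

```r
library(dplyr)
library(tidyr)

# Hypothetical tokens with line numbers, standing in for tidy_books
toy_tokens <- tibble(
  book = "Toy Novel",
  linenumber = c(1, 5, 79, 80, 120, 161),
  word = c("happy", "sad", "good", "bad", "awful", "love")
)

# Hypothetical bing-style lexicon: one positive/negative label per word
toy_bing <- tibble(
  word = c("happy", "good", "love", "sad", "bad", "awful"),
  sentiment = c("positive", "positive", "positive", "negative", "negative", "negative")
)

toy_sentiment <- toy_tokens %>%
  inner_join(toy_bing, by = "word") %>%
  # Integer division: lines 1-79 -> block 0, 80-159 -> block 1, 160-239 -> block 2
  count(book, index = linenumber %/% 80, sentiment) %>%
  # One column per sentiment label; blocks with no words of a label get 0
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)

toy_sentiment
```

With these inputs the three blocks score 1, -2 and 1. In the original pipeline, the equivalent substitution would be `pivot_wider(names_from = sentiment, values_from = n, values_fill = 0)`.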
# Plot the sentiment score for each book
ggplot(janeaustensentiment, aes(index, sentiment, fill = book)) +
geom_col(show.legend = FALSE) +
facet_wrap(~book, ncol = 2, scales = "free_x")
I wanted to see whether sentiment analysis could capture the sentiment of two poems that sit at opposite ends of the sentiment spectrum.
Elegy Written in a Country Churchyard by Thomas Gray “embodies a meditation on death, and remembrance after death. The poem argues that the remembrance can be good and bad, and the narrator finds comfort in pondering the lives of the obscure rustics buried in the churchyard.” (Wikipedia). The tone and themes of this poem are somber and perhaps a bit unsatisfying, as it reflects on death, remembrance and the obscurity of the graveyard's residents.
somewhere i have never travelled by ee cummings is, as my English teacher once put it, one of those love poems that high school sweethearts have recited to each other ever since it was popularized. It is an ode to the narrator’s beloved, who holds a deep and intense power over him.
Using the Afinn lexicon, which assigns individual words an integer score from -5 (most negative) to 5 (most positive), the plot captures the somber complexity of this poem as it progresses. Interestingly, most of the poem’s line scores fluctuate between 3 and -3, indicating an overall balance in tone. However, its most negative point (-6, which must be the stacked total of several words on one line, since no single Afinn word scores below -5) appears around the phrase “The threats of pain and ruin to despise”.
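geom_col() stacks every scored word that shares a line number. Positive values stack upward and negative values stack downward, so a line containing both kinds is drawn as two segments rather than one net total. A minimal sketch, assuming dplyr, collapses the scores to one net value per line before plotting. The words and values are hypothetical placeholders, not real Afinn entries.

```r
library(dplyr)

# Hypothetical scored words on three poem lines (placeholder words and values)
toy_scored <- tibble(
  lines = c(1, 1, 2, 3),
  word  = c("alpha", "beta", "gamma", "delta"),
  value = c(-3, -3, 2, -2)
)

# One net score per line, plus how many scored words contributed to it
line_scores <- toy_scored %>%
  group_by(lines) %>%
  summarise(score = sum(value), n_words = n(), .groups = "drop")

line_scores
```

Plotting `line_scores` with `ggplot(line_scores, aes(lines, score)) + geom_col()` would then draw one bar per line. The `n_words` column shows whether a large bar comes from one strong word or several milder ones.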
library(ggplot2)
library(readr)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
#Reads the text of the poem "Elegy Written In a Country Churchyard" by Thomas Gray
churchyard <- read.table("https://raw.githubusercontent.com/johnnydrodriguez/data607_week11/main/churchyard.txt", sep = "\n", quote="", fill=FALSE)
# Converts the text into a dataframe
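# One row per line of the poem (130 lines); trimws() strips leading indentation
churchyard_df <- tibble(lines = seq_len(nrow(churchyard)), text = churchyard$V1)
churchyard_df$text <- trimws(churchyard_df$text)
churchyard_df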
#Tokenizes the poem
churchyard_df <- churchyard_df %>%
  unnest_tokens(word, text)
#Using the Afinn lexicon, generates sentiment value
elegysentiment <- churchyard_df %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
#Plots the sentiment for this poem
e <- ggplot(elegysentiment, aes(lines, value)) +
  geom_col(show.legend = FALSE) +
  ggtitle("Elegy Written In a Country Churchyard")
e
Surprisingly, and unlike the previous poem, the Afinn lexicon scored only a handful (5) of words. Although this is likely too small a sample to capture the poem’s sentiment reliably, it does reflect a generally positive sentiment.
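One way to judge whether a handful of scored words is "too small" is to measure lexicon coverage: the share of a poem's tokens that the lexicon scores at all. The helper below is a hypothetical sketch in base R. It assumes both inputs are data frames with a `word` column, as the tokenized poem and the lexicon are here. The toy tokens and toy lexicon are made up for illustration.

```r
# Hypothetical helper: fraction of tokens that appear in the lexicon
lexicon_coverage <- function(tokens, lexicon) {
  mean(tokens$word %in% lexicon$word)
}

# Toy example: 10 tokens, 2 of which ("love" and "dark") are in the toy lexicon
toy_tokens <- data.frame(word = c("my", "love", "is", "a", "rose",
                                  "and", "the", "night", "is", "dark"))
toy_lexicon <- data.frame(word = c("love", "dark"))

lexicon_coverage(toy_tokens, toy_lexicon)
```

Applied to this analysis, `lexicon_coverage(somewhere_df, get_sentiments("afinn"))` would report what fraction of the poem's words Afinn recognizes.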
#Reads the text of the poem "somewhere i have never travelled" by ee cummings
somewhere <- read.table("https://raw.githubusercontent.com/johnnydrodriguez/data607_week11/main/somewhere.txt", sep = "\n", quote="", fill=FALSE)
# Converts the text into a dataframe
# One row per line of the poem (21 lines)
somewhere_df <- tibble(lines = seq_len(nrow(somewhere)), text = somewhere$V1)
somewhere_df
#Tokenizes the poem
somewhere_df <- somewhere_df %>%
  unnest_tokens(word, text)
#Using the Afinn lexicon, generates sentiment value
somewheresentiment <- somewhere_df %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
#Plots the sentiment for this poem
s <- ggplot(somewheresentiment, aes(lines, value)) +
  geom_col(show.legend = FALSE) +
  ggtitle("somewhere i have never travelled")
s
Side by side, the two plots show how distinct the sentiments of these poems are.
grid.arrange(e, s, ncol =2)
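The side-by-side plots can also be backed by a few summary numbers per poem: how many words were scored, their total, and their mean. A minimal sketch, assuming dplyr, stacks the two scored tables with `bind_rows(.id = )` and summarises each poem. The two tibbles are hypothetical stand-ins for `elegysentiment` and `somewheresentiment` with made-up values.

```r
library(dplyr)

# Hypothetical stand-ins: one row per scored word, with its line and score
toy_elegy <- tibble(lines = c(1, 2, 2, 5), value = c(2, -2, -3, 1))
toy_somewhere <- tibble(lines = c(3, 7), value = c(3, 2))

# The argument names become the values of the new "poem" column
poem_summary <- bind_rows(Elegy = toy_elegy, somewhere = toy_somewhere, .id = "poem") %>%
  group_by(poem) %>%
  summarise(
    scored_words = n(),
    total = sum(value),
    mean_score = mean(value),
    .groups = "drop"
  )

poem_summary
```

The `scored_words` column makes the sample-size caveat explicit next to the averages, which the bar charts alone do not show.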
Although the sentiment scores captured the general sentiment of these poems (perhaps fairly accurately), it’s unclear whether sentiment scoring can capture something as complex and often personal as poetry, which is open to very broad interpretation. In this specific case, it was surprising that “Elegy” appears more balanced in sentiment than its reputation, setting and themes would lead us to believe.