Summary
This short analysis was done as a self-training exercise. It is based on the Introduction to the Syuzhet Package by Matthew Jockers.
The text used is the English version released on May 13, 2011.
A first look at Flowers of Evil
The basic idea is to extract the text, split it into sentences, and assign a value to the words that make up the book. Negative values indicate negative emotions, positive values indicate positive emotions, and 0 is neutral.
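To make the scoring idea concrete, here is a minimal base-R sketch. It assumes, as I understand the syuzhet method, that a sentence score is the sum of the values of its words. The lexicon `toy_lexicon` and the helper `score_sentence` are invented for illustration and are not the package's real dictionary or code.

```r
# Toy lexicon: these words and values are invented for illustration;
# they are NOT the real syuzhet dictionary.
toy_lexicon <- c(happy = 0.75, love = 1.0, glorious = 0.8,
                 death = -1.0, graves = -0.75, slimy = -0.5)

# Score one sentence: lower-case it, split it into words, and sum the
# values of the words found in the lexicon (unknown words count as 0).
score_sentence <- function(sentence, lexicon) {
  words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  values <- lexicon[words]          # NA for words not in the lexicon
  sum(values, na.rm = TRUE)
}

sentences <- c("Oh, happy is he that can hail with love",
               "An odour of graves through the darkness spreads",
               "The sun rises")

toy_scores <- vapply(sentences, score_sentence, numeric(1),
                     lexicon = toy_lexicon, USE.NAMES = FALSE)
print(toy_scores)
```

The first sentence comes out positive, the second negative, and the third scores 0 because none of its words are in the toy lexicon.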
library(syuzhet)
library(ggplot2)
library(tm)
library(SnowballC)
library(dplyr)
library(tidyr)
#load txt
my.text <- get_text_as_string("C:/Users/marc/Desktop/Data/160914_fleurs du mal/fleurs du mal.txt")
#detect sentence
text.sent <- get_sentences(my.text)
sentiment.vector <- get_sentiment(text.sent, method="syuzhet")
#first look and first surprise, median and mean are positive values
summary(sentiment.vector)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -4.9500 -0.7000  0.1500  0.3236  1.3120  5.5500
#first plot
sentiment.dtf <- as.data.frame(sentiment.vector)
sentiment.dtf$sentence <- rownames(sentiment.dtf)
colnames(sentiment.dtf) <- c("sentiment","sentence")
sentiment.dtf$sentence <- as.numeric(sentiment.dtf$sentence)
ggplot(sentiment.dtf, aes(x=sentence, y= sentiment))+
geom_line(size=1.0, colour="grey")+
geom_smooth(method = "loess", se=FALSE, colour="red", size=1.2)+
theme_bw()+
xlab("Narrative Time") +
ylab("Emotional Valence")+
ggtitle("Emotional valence fluctuation among Flowers of Evil")
There is a lot of noise here. There are sometimes strong fluctuations from one sentence to the next. For example, between sentences 249 and 252 we move from a value of 5.1 (highly positive) to -4.9 (highly negative).
Both belong to the poem “The Set of the Romantic Sun”.
The two sentences are:
- The Set of the Romantic Sun How beauteous the sun as it rises supreme, Like an explosion that greets us from above, Oh, happy is he that can hail with love, Its decline, more glorious far, than a dream.
- But the god, who eludes me, I chase all in vain, The night, irresistible, plants its domain, Black mists and vague shivers of death it forbodes; While an odour of graves through the darkness spreads, And on the swamp’s margin, my timid foot treads Upon slimy snails, and on unseen toads.
It is worth noting that we extracted sentences rather than verses, which is probably not the best way to analyze poetry. Flowers of Evil is (I think) sometimes labelled as gloomy or sad. I was expecting to see something quite negative, but on average it turns out to be fairly neutral, even slightly positive at first sight. Obviously the meaning of some words varies with context, and a word may be counted as positive where in context it is negative; I suspect this is not something this method can detect.
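As a rough illustration of the verse-based alternative, the sketch below splits a stanza on line breaks instead of sentence punctuation. The line breaks in `stanza` are my assumption (one before each capitalised verse opening in the excerpt quoted above); they are not taken from the source file. On the real text this would probably mean reading the file with `readLines()`, since I believe `get_text_as_string()` collapses line breaks.

```r
# Excerpt quoted earlier in this post; the line breaks are an assumption
# (placed before each capitalised verse opening), not taken from the file.
stanza <- paste(
  "How beauteous the sun as it rises supreme,",
  "Like an explosion that greets us from above,",
  "Oh, happy is he that can hail with love,",
  "Its decline, more glorious far, than a dream.",
  sep = "\n")

# Split on line breaks instead of sentence punctuation.
verses <- trimws(strsplit(stanza, "\n", fixed = TRUE)[[1]])
verses <- verses[nzchar(verses)]   # drop blank lines between stanzas

print(verses)
```

The resulting character vector has one element per verse and could then be scored in the same way as text.sent, for example with get_sentiment(verses, method = "syuzhet").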
Percentage Values
To reduce the noise of the previous chart, we are going to look at the get_percentage_values function, which “divides a text into an equal number of ‘chunks’ and then calculates the mean sentiment valence for each chunk”.
In this case, we are going to cut the text into 20 chunks and calculate the mean of each chunk.
percent_vals <- get_percentage_values(sentiment.vector, bins = 20)
plot(
percent_vals,
type="l",
main="Flowers of Evil using Percentage-Based Means",
xlab = "Narrative Time",
ylab= "Emotional Valence",
col="red"
)
It definitely helps to see the trend, and it clearly highlights the negativity around the three-quarter mark of the book.
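The chunk-and-average idea can be sketched in base R as follows. This is only my reading of what the function does, not necessarily how the package implements it; `scores` is simulated data standing in for sentiment.vector, and `chunk_id` and `chunk_means` are names I made up.

```r
# Simulated sentence scores, standing in for sentiment.vector.
set.seed(1)
scores <- rnorm(292)

# Assign each sentence to one of 20 consecutive, near-equal chunks,
# then average the scores inside each chunk.
n_bins <- 20
chunk_id <- cut(seq_along(scores), breaks = n_bins, labels = FALSE)
chunk_means <- tapply(scores, chunk_id, mean)

length(chunk_means)   # one mean per chunk
```

Because each chunk averages roughly fifteen sentences, isolated extreme sentences are damped, which is why the percentage-based plot looks so much smoother than the sentence-level one.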
To complete the picture, the Syuzhet package also comes with a handy function named simple_plot. It applies three smoothing methods to a vector (moving average, loess, and discrete cosine transformation).
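To get a feel for what two of these smoothers do, here is a small base-R sketch on simulated data. The window size of 21 and the loess span of 0.3 are arbitrary choices of mine, not the settings simple_plot uses, and the discrete cosine transformation is left out because base R has no ready-made function for it.

```r
# Simulated noisy signal: a slow wave plus random noise.
set.seed(42)
x <- seq_len(200)
noisy <- sin(x / 30) + rnorm(200, sd = 0.8)

# Centred moving average over a window of 21 points; the first and
# last 10 positions are NA because the window is incomplete there.
window <- 21
moving_avg <- stats::filter(noisy, rep(1 / window, window), sides = 2)

# Loess fit; span = 0.3 is an arbitrary choice for this sketch.
loess_fit <- predict(loess(noisy ~ x, span = 0.3))

plot(x, noisy, type = "l", col = "grey",
     xlab = "Position", ylab = "Value")
lines(x, moving_avg, col = "blue")
lines(x, loess_fit, col = "red")
```

Both smoothed lines follow the slow wave and ignore most of the point-to-point noise; a larger window or span would smooth more aggressively.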
Simple plot
simple_plot(sentiment.vector, title = "Syuzhet Plot", legend_pos = "right")
Emotion lexicon overview
Let’s now look at what types of emotions we are dealing with. We are going to use the NRC method from Saif Mohammad: “the NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive)”.
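The sketch below shows the kind of table this produces: one row per sentence and one count column per category. The mini-lexicon `toy_nrc` and the helper `count_emotions` are invented for illustration; the real NRC lexicon is far larger and its word associations may differ from the ones I made up here.

```r
# Hypothetical mini-lexicon in the spirit of NRC: each word is tagged
# with one or more categories. The entries are invented for illustration.
toy_nrc <- list(
  death  = c("fear", "sadness", "negative"),
  graves = c("fear", "sadness", "negative"),
  love   = c("joy", "trust", "positive"),
  happy  = c("joy", "anticipation", "positive")
)
categories <- c("anger", "anticipation", "disgust", "fear", "joy",
                "sadness", "surprise", "trust", "negative", "positive")

# Count, for one sentence, how many word tags fall into each category.
count_emotions <- function(sentence) {
  words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  tags <- as.character(unlist(toy_nrc[words], use.names = FALSE))
  vapply(categories, function(cat) sum(tags == cat), integer(1))
}

sentences <- c("Oh, happy is he that can hail with love",
               "Black mists and vague shivers of death, an odour of graves")

# One row per sentence, one column per category.
emotion_counts <- as.data.frame(do.call(rbind, lapply(sentences, count_emotions)))
print(emotion_counts)
```

A single word can contribute to several columns at once, which is why the emotion counts and the positive/negative counts are analysed separately.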
So, let’s start with the positive and negative sentiments.
nrc.sentiment <- get_nrc_sentiment(text.sent)
#use the number of rows instead of a hard-coded sentence count
ggplot(nrc.sentiment, aes(x=seq_len(nrow(nrc.sentiment))))+
geom_smooth(aes(y=positive, colour="Positive"), size=1.2, se=FALSE, method = "loess")+
geom_smooth(aes(y=negative, colour= "Negative"), size=1.2, se=FALSE, method= "loess")+
scale_colour_manual("",
breaks = c("Positive", "Negative"),
values = c("Positive"="green", "Negative"="red"))+
xlab("Narrative Time") +
ylab("Emotional Valence") +
theme_bw()+
ggtitle("Positive and Negative valence")
sum.nrc <- as.data.frame(apply(nrc.sentiment[,1:8], 2, sum))
sum.nrc$emotion <- rownames(sum.nrc)
colnames(sum.nrc) <- c("freq", "emotion")
sum.nrc$type[c(2,5,7,8)] <- "positive"
sum.nrc$type[c(1,3,4,6)] <- "negative"
ggplot(sum.nrc, aes(x=emotion, y= freq, fill= type)) +
geom_bar(stat="identity")+
ggtitle("Count of Emotion types in the Flowers of Evil")+
theme_bw()+
scale_fill_manual(values = c("#fc4e03","#62ca35"))+
theme(legend.position="none")
nrc.sentiment$sentence <- rownames(nrc.sentiment)
nrc.sentiment <- gather(nrc.sentiment, emotion, value, -sentence)
nrc.sentiment <- filter(nrc.sentiment, emotion != "negative", emotion != "positive")
nrc.sentiment$sentence <- as.numeric(nrc.sentiment$sentence)
nrc.sentiment$emotion <- as.factor(nrc.sentiment$emotion)
my_colour <- c("#fd7e00","#0a5407", "#ea0000", "#9a0101", "#98f8c0", "#ffcc00", "#96f00f", "#62ca35")
ggplot(nrc.sentiment, aes(x=sentence, y=value, color=emotion)) +
geom_smooth(method = "loess", se=FALSE, size=2) +
theme_bw()+
ggtitle("Emotion types along Flowers of Evil")+
xlab("Narrative Time")+
scale_color_manual(values = my_colour)