The Divine Comedy Sentiment Analysis

Introduction

The Divine Comedy is a 14,233-line narrative poem by the Italian poet and philosopher Dante Alighieri. Written over roughly 12 years and completed around 1320, it is considered one of the greatest and most important works of Western literature. The Divine Comedy is divided into three cantiche (Inferno, Purgatory and Paradise) and follows Alighieri’s imagined soul as it traverses the three realms of the dead. The religious poem draws heavily on Roman, Catholic and Islamic traditions and is meticulously structured: each cantica contains 33 cantos (Inferno has one additional introductory canto, for 100 in total), and the poem is written in terza rima, whose three-line stanzas of eleven-syllable lines contain 33 syllables each. Clearly, every word of The Divine Comedy was carefully considered, so I am interested in performing a sentiment analysis on each cantica to test my hypothesis that the tone of the poem rises as the narrative progresses, alongside the soul’s ascension to heaven.

Required Libraries

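library(tidyverse)   # also attaches readr and ggplot2, so the next two library() calls are optional
library(tidytext)
library(readr)
library(ggplot2)
library(textdata)    # used by get_sentiments("afinn") to download the AFINN lexicon
library(wordcloud2)
# ggthemes must also be installed; it is called below as ggthemes::theme_clean()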

Inferno Analysis

Formatting the data

First, I downloaded the entire text of Inferno from Project Gutenberg as a .txt file and read it into R with read_csv(), as if it were a .csv file. I named the result “hell.”

hell <- read_csv("~/Desktop/hell.txt")
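Note that read_csv() treats the first line of the file as column names and splits every line at commas, so the file's first line becomes a header and verse lines containing commas may be split or truncated. A minimal sketch of an alternative, assuming the .txt file holds plain text with one verse line per row, reads each line whole with readr's read_lines(). The temporary file and sample lines below are hypothetical stand-ins for ~/Desktop/hell.txt, and word counts obtained this way may differ from those reported in this analysis.

```r
library(readr)
library(tibble)

# Hypothetical sample lines written to a temporary file; this stands in for
# ~/Desktop/hell.txt and is not the actual Project Gutenberg text
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Midway upon the journey of our life",
  "I found myself within a forest dark,",
  "For the straightforward pathway had been lost."
), path)

# read_lines() returns one string per line, commas included, and does not
# treat the first line as a header; tibble() names the column "text" directly
hell_lines <- tibble(text = read_lines(path))
hell_lines
```

With this approach the column is already named text, so the colnames() renaming step is not needed.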

I renamed the first column of this data frame “text,” then used unnest_tokens() to make each word of the poem a distinct value, saving the result as a new data frame called “hell_words.”

colnames(hell)[1] <- "text"

hell %>% 
  unnest_tokens(word, text) -> hell_words
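To illustrate what unnest_tokens() does, here is a minimal sketch on a single hypothetical line of verse: with the default settings each word becomes its own row, lowercased and with punctuation removed.

```r
library(dplyr)
library(tibble)
library(tidytext)

# A single hypothetical line of verse in a one-column data frame
sample_line <- tibble(text = "Abandon all hope, ye who enter here.")

# unnest_tokens(output, input): one row per word; by default words are
# lowercased and punctuation is stripped
sample_words <- sample_line %>% unnest_tokens(word, text)
sample_words$word
```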

Visualizing Inferno’s sentiment

To analyze the poem’s sentiment, I used the AFINN lexicon, which scores words on a scale from -5 (most negative) to +5 (most positive). I removed the stop words from hell_words, counted the remaining words, kept only those that appear in the AFINN lexicon and sorted them in descending order of frequency, saving the result as a new data frame called “hell_sentiment.” Because the AFINN lexicon does not include character names, frequently occurring names such as Beatrice and Virgil are excluded along with the stop words, so I do not need to remove them separately.

hell_sentiment <- hell_words %>% 
  anti_join(stop_words) %>% 
  count(word) %>% 
  inner_join(get_sentiments("afinn")) %>% 
  arrange(desc(n))
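The same pipeline can be traced on a tiny example. This sketch uses a hypothetical token list, a one-word stand-in for stop_words and a toy lexicon whose scores are illustrative only (not taken from AFINN), so it runs without downloading anything. It shows how the name virgil, absent from the lexicon, drops out at the inner_join() step.

```r
library(dplyr)
library(tibble)

# Hypothetical tokens standing in for hell_words
words <- tibble(word = c("the", "love", "fear", "the", "love", "virgil", "weep"))

# Hypothetical stop-word list and toy lexicon; scores are illustrative only
toy_stop_words <- tibble(word = "the")
toy_lexicon <- tibble(word = c("love", "fear", "weep"),
                      value = c(3, -2, -2))

toy_sentiment <- words %>%
  anti_join(toy_stop_words, by = "word") %>%  # drop stop words
  count(word) %>%                              # one row per word, frequency in n
  inner_join(toy_lexicon, by = "word") %>%     # keep scored words only; "virgil" drops out
  arrange(desc(n))                             # most frequent first

toy_sentiment
```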

Using my list of all of Inferno’s sentiment words, I made a word cloud. The more frequently a sentiment word is used, the larger it is in the cloud, and it is clear that the largest words in the Inferno word cloud are negative.

hell_sentiment %>% wordcloud2()

I also wanted a graph of the numeric score the AFINN lexicon assigns to each word, so I kept only the ten most frequent sentiment words in a new data frame called “hell_sentiment_ten” to use in a chart.

hell_sentiment_ten <- hell_sentiment %>% head(10)
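One caveat with head(10): if several words tie in frequency at the tenth position, it keeps only whichever comes first in the sorted table. A sketch using dplyr's slice_max() on hypothetical counts shows the difference; slice_max() keeps all tied rows by default, so the result can contain more than the requested number of words.

```r
library(dplyr)
library(tibble)

# Hypothetical word counts with a tie at the cut-off (two words with n = 2)
toy_counts <- tibble(word = c("love", "fear", "weep", "joy"),
                     n    = c(5, 2, 2, 1))

# head() after sorting returns exactly two rows, leaving out one tied word
top_head <- toy_counts %>% arrange(desc(n)) %>% head(2)

# slice_max() keeps every row tied at the cut-off (with_ties = TRUE by default)
top_ties <- toy_counts %>% slice_max(n, n = 2)
top_ties
```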

Then, I created the chart and filled in the columns with a dark red color to symbolize inferno.

ggplot(hell_sentiment_ten, aes(word, value, fill = word)) +
  geom_col(show.legend = FALSE, fill="dark red") + ggthemes::theme_clean() + 
  labs(title = "Inferno")
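Because the conclusion counts positive and negative words from these charts, it may help to order the bars by score and fill them by sign. This is a sketch on hypothetical data (the words and scores are illustrative, not the Inferno results); darkred and steelblue are arbitrary color choices.

```r
library(ggplot2)
library(tibble)

# Hypothetical top-words table; scores are illustrative only
toy_top <- tibble(word  = c("love", "fear", "weep", "joy"),
                  n     = c(5, 2, 2, 1),
                  value = c(3, -2, -2, 3))

# Bars ordered by score and filled by sign, so positive and negative
# words can be counted at a glance
sign_plot <- ggplot(toy_top, aes(x = reorder(word, value), y = value, fill = value > 0)) +
  geom_col(show.legend = FALSE) +
  geom_hline(yintercept = 0) +
  scale_fill_manual(values = c(`FALSE` = "darkred", `TRUE` = "steelblue")) +
  coord_flip() +
  labs(x = NULL, y = "AFINN score", title = "Inferno (illustrative data)")

print(sign_plot)
```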

Purgatory Analysis

Formatting the data

The process used to visualize Inferno’s sentiment is applied in the same way to the poem’s two remaining cantiche. First, the text is loaded into R and “unnested” so that each word becomes a distinct value.

purgatory <- read_csv("~/Desktop/purgatory.txt")

colnames(purgatory)[1] <- "text"

purgatory %>% 
  unnest_tokens(word, text) -> purgatory_words

Visualizing Purgatory’s sentiment

Again, I removed the stop words from “purgatory_words,” counted the remaining words, applied the AFINN lexicon and sorted the results in descending order of frequency, saving them as a new data frame called “purgatory_sentiment.”

purgatory_sentiment <- purgatory_words %>% 
  anti_join(stop_words) %>% 
  count(word) %>% 
  inner_join(get_sentiments("afinn")) %>% 
  arrange(desc(n))

Next, I used this list of sentiment words to create a word cloud for the Purgatory cantica.

purgatory_sentiment %>% wordcloud2()

To graph the AFINN scores, I kept only the ten most frequent sentiment words in a new data frame called “purgatory_sentiment_ten.”

purgatory_sentiment_ten <- purgatory_sentiment %>% head(10)

I used “purgatory_sentiment_ten” to create a chart and colored it grey to represent purgatory.

ggplot(purgatory_sentiment_ten, aes(word, value, fill = word)) +
  geom_col(show.legend = FALSE, fill="grey") + ggthemes::theme_clean() + 
  labs(title = "Purgatory")

Paradise Analysis

Because this will be the third application of the same procedure, I do not feel it is necessary to explain the steps I took to visualize the sentiment of the Paradise cantica.
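Since the same steps are repeated for each cantica, they could be wrapped in a single function. The helper below, cantica_sentiment(), is a hypothetical sketch rather than part of the original analysis; it assumes a data frame with a text column and a lexicon with word and value columns, such as get_sentiments("afinn"). The toy lexicon in the example is made up so the code runs offline.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical helper: tokenize -> remove stop words -> count ->
# join a lexicon with `word` and `value` columns -> sort by frequency
cantica_sentiment <- function(lines_df, lexicon, stops = tidytext::stop_words) {
  lines_df %>%
    unnest_tokens(word, text) %>%
    anti_join(stops, by = "word") %>%
    count(word) %>%
    inner_join(lexicon, by = "word") %>%
    arrange(desc(n))
}

# Toy usage: a made-up lexicon standing in for get_sentiments("afinn")
toy_lexicon  <- tibble(word = c("love", "joy", "weep"), value = c(3, 3, -2))
toy_paradise <- tibble(text = c("Love and joy, love again.", "We weep no more."))
toy_result   <- cantica_sentiment(toy_paradise, toy_lexicon)
toy_result
```

Applied to the real data, a call such as cantica_sentiment(paradise, get_sentiments("afinn")) would be expected to reproduce paradise_sentiment, assuming the text column is named text.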

Formatting the data

paradise <- read_csv("~/Desktop/paradise.txt")

colnames(paradise)[1] <- "text"

paradise %>% 
  unnest_tokens(word, text) -> paradise_words

Visualizing Paradise’s sentiment

paradise_sentiment <- paradise_words %>% 
  anti_join(stop_words) %>% 
  count(word) %>% 
  inner_join(get_sentiments("afinn")) %>% 
  arrange(desc(n))

paradise_sentiment %>% 
  wordcloud2()

paradise_sentiment_ten <- paradise_sentiment %>% head(10)

ggplot(paradise_sentiment_ten, aes(word, value, fill = word)) +
  geom_col(show.legend = FALSE, fill="light blue") + ggthemes::theme_clean() + 
  labs(title = "Paradise")

Word Frequency

I am also interested in the most frequently used words throughout The Divine Comedy. Based on the charts, I expected the three most frequent sentiment words to be love, god and spirit; to confirm this, I used the count() function to calculate the total number of times each of these words appears in the poem.

(paradise_words %>% filter(word %in% "love") %>% count()) + 
(purgatory_words %>% filter(word %in% "love") %>% count()) + 
(hell_words %>% filter(word %in% "love") %>% count())

##     n
## 1 162

(paradise_words %>% filter(word %in% "spirit") %>% count()) + 
(purgatory_words %>% filter(word %in% "spirit") %>% count()) + 
(hell_words %>% filter(word %in% "spirit") %>% count())

##     n
## 1 112

(paradise_words %>% filter(word %in% "god") %>% count()) + 
(purgatory_words %>% filter(word %in% "god") %>% count()) + 
(hell_words %>% filter(word %in% "god") %>% count())

##     n
## 1 109

Despite my expectation that Inferno’s negativity would dominate, love is by far the most frequently used sentiment word in The Divine Comedy, appearing 162 times. It is followed by spirit (112 uses) and god (109 uses), which is less surprising given the theological themes of the text.
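The three sums above can also be computed in one pass by stacking the word tables. This sketch uses hypothetical stand-ins for hell_words, purgatory_words and paradise_words; bind_rows() with named arguments and .id records which cantica each word came from.

```r
library(dplyr)
library(tibble)

# Hypothetical token tables standing in for the three *_words data frames
toy_hell      <- tibble(word = c("love", "god", "fear"))
toy_purgatory <- tibble(word = c("love", "spirit"))
toy_paradise  <- tibble(word = c("love", "god", "spirit", "love"))

target_counts <- bind_rows(hell = toy_hell, purgatory = toy_purgatory,
                           paradise = toy_paradise, .id = "cantica") %>%
  filter(word %in% c("love", "spirit", "god")) %>%
  count(word, sort = TRUE)   # total uses of each target word across all cantiche

target_counts
```

Replacing count(word, sort = TRUE) with count(cantica, word) would give the per-cantica breakdown instead of the totals.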

Conclusion

A quick glance at the three word clouds seems to support my initial hypothesis, but the numbers behind the three graphs are also worth examining. Among Inferno’s ten most frequent sentiment words, negative words outnumber positive words by 20 percentage points, and the combined magnitude of the negative scores (14) is twice the positive total (7). Inferno is therefore decidedly negative. Among Purgatory’s ten most frequent sentiment words, positive words outnumber negative words by 40 percentage points, but those positive words are milder in sentiment: the positive total (10) is only three points larger than the magnitude of the negative total (7). So while Purgatory does skew positive, it functions essentially as a sentimental middle ground between the two extremes of Hell and Paradise. I expected Hell to contain the most extreme language, but it is actually Paradise: 90% of its ten most frequent sentiment words are positive, and their total score (15) is 7.5 times the magnitude of the negative total (2). Paradise is therefore overwhelmingly positive in sentiment.
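The totals above are unweighted sums over the ten most frequent words, so a word used 100 times counts the same as one used 10 times. A frequency-weighted mean score, in which every occurrence of a scored word counts once, is one alternative measure not used in this analysis. The sketch below computes both kinds of summary per cantica on hypothetical data; the numbers are illustrative, not the actual results.

```r
library(dplyr)
library(tibble)

# Hypothetical per-cantica sentiment tables (word, n, value); illustrative only
toy_inferno  <- tibble(word = c("fear", "weep", "love"), n = c(5, 3, 2), value = c(-2, -2, 3))
toy_paradise <- tibble(word = c("love", "joy", "weep"),  n = c(6, 2, 1), value = c(3, 3, -2))

summary_by_cantica <- bind_rows(Inferno = toy_inferno, Paradise = toy_paradise,
                                .id = "cantica") %>%
  group_by(cantica) %>%
  summarise(
    n_positive     = sum(value > 0),          # number of positive words
    n_negative     = sum(value < 0),          # number of negative words
    positive_total = sum(value[value > 0]),   # unweighted sums, as read off the charts
    negative_total = sum(value[value < 0]),
    weighted_mean  = sum(n * value) / sum(n)  # every occurrence counts once
  )

summary_by_cantica
```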

These findings support my initial hypothesis that, as The Divine Comedy’s narrative progresses and Dante Alighieri’s soul ascends through the three realms of the dead, the tone of the poem shifts from net negative to net positive. However, the fact that the three most frequently used sentiment words in The Divine Comedy are love, god and spirit prompts me to critique my indiscriminate application of the AFINN lexicon. In the context of the poem, I can accept AFINN’s score of +3 for love, but I don’t think its score of +1 for god and spirit is entirely accurate, because the intended tone of these words varies with the context in which they appear. For example, the intended tone of spirit is very different in this line from Inferno: “The spirit writhed with both his feet; sighing, and with weeping voice,” compared with this line from Paradise: “With pleasure, from the Holy Spirit conceived.” The same can be said of god when comparing this line from Inferno: “Ye have made yourselves a god of gold and silver,” with this line from Paradise: “God and our nature join’d!” Overall, I believe the findings of this project are accurate; however, I also think that lexicon-based sentiment analysis with AFINN, Bing or NRC lacks the context and nuance critical to understanding most literature.
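One way to start adding the missing context is to look at the words around each occurrence. As a minimal sketch, the two spirit lines quoted above can be split into bigrams with unnest_tokens(token = "ngrams", n = 2), showing the neighboring words that distinguish the two uses. This illustrates the idea only and is not a replacement for lexicon scoring.

```r
library(dplyr)
library(tibble)
library(tidytext)

# The two lines quoted above, treated as a tiny sample corpus
quoted <- tibble(text = c(
  "The spirit writhed with both his feet; sighing, and with weeping voice",
  "With pleasure, from the Holy Spirit conceived."
))

# Split each line into overlapping word pairs and keep those containing "spirit"
spirit_bigrams <- quoted %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(grepl("\\bspirit\\b", bigram))

spirit_bigrams
```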


Madalyn Howard

October 18, 2020
