Edgar Allan Poe’s short stories captivated readers in newspapers and beyond, and many of them dwell on gloom and death. This project analyzes the happiness of his work across three classic texts.
I argued that, throughout three of Poe’s stories, the words he used may have skewed the mean sentiment score higher or lower than it appears. So I made three column charts for each story: a positive sentiment chart, a negative sentiment chart, and an n-grams sentiment chart using the words ‘not’ or ‘no’.
The short stories included are The Fall of the House of Usher (1839), The Masque of the Red Death (1842), and The Cask of Amontillado (1846). These stories were processed as follows:
library(devtools)
## Loading required package: usethis
library(tm)
## Loading required package: NLP
library(rmarkdown)
library(readr)
library(tidytext)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ dplyr 1.0.7
## ✓ tibble 3.1.6 ✓ stringr 1.4.0
## ✓ tidyr 1.2.0 ✓ forcats 0.5.1
## ✓ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x ggplot2::annotate() masks NLP::annotate()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyr)
library(ggplot2)
library(wordcloud)
## Loading required package: RColorBrewer
library(textdata)
library(dplyr)
library(stringr)
library(ggthemes)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
To start this project I collected and downloaded each story from the Gutenberg Project. I chose these stories as some of the dreariest according to https://www.sparknotes.com/blog/edgar-allan-poe-stories-ranked-by-how-creepy-they-are/. I then downloaded each story from the following Gutenberg Project links (a programmatic alternative is sketched after the links):
The Fall of the House of Usher: https://www.gutenberg.org/ebooks/932
The Masque of the Red Death: https://www.gutenberg.org/ebooks/1064
The Cask of Amontillado: https://www.gutenberg.org/ebooks/1063
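As a sketch of an alternative to downloading the files by hand (not what was done in this project), the gutenbergr package can fetch the same texts using the IDs from the URLs above:
library(gutenbergr)
# IDs 932, 1064, and 1063 correspond to the three Gutenberg links above
poe_texts <- gutenberg_download(c(932, 1064, 1063))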
Usher <- read_csv("Usher.txt",
col_names = FALSE)
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 470 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): X1
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Usher %>%
unnest_tokens(word, X1) -> usher_words
count(usher_words)
## # A tibble: 1 × 1
## n
## <int>
## 1 7198
usher_words %>%
inner_join(get_sentiments('afinn')) %>%
arrange(desc(value)) -> usher_sentiment
## Joining, by = "word"
MasqueRedDeath <- read_csv("MasqueRedDeath.txt",
col_names = FALSE)
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 191 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): X1
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#View(MasqueRedDeath)
MasqueRedDeath %>%
unnest_tokens(word, X1) -> RedDeath_words
count(RedDeath_words)
## # A tibble: 1 × 1
## n
## <int>
## 1 2439
RedDeath_words %>%
inner_join(get_sentiments('afinn')) %>%
arrange(desc(value)) -> MasqueRedDeath_sentiment
## Joining, by = "word"
Amontillado <- read_csv("Amontillado.txt",
col_names = FALSE)
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 212 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): X1
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Amontillado %>%
unnest_tokens(word, X1) -> amontillado_words
count(amontillado_words)
## # A tibble: 1 × 1
## n
## <int>
## 1 2349
amontillado_words %>%
inner_join(get_sentiments('afinn')) %>%
arrange(desc(value)) -> amontillado_sentiment
## Joining, by = "word"
Word counts:
Usher: 7198
Red Death: 2439
Amontillado: 2349
Mean AFINN sentiment scores:
Usher: -0.387012987012987
Red Death: -0.490909090909091
Amontillado: 0.258741258741259
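The mean scores listed above can be reproduced from the sentiment data frames created earlier; a minimal sketch (the exact figures depend on whether stop words were removed before the join):
usher_sentiment %>% summarize(mean_value = mean(value))
MasqueRedDeath_sentiment %>% summarize(mean_value = mean(value))
amontillado_sentiment %>% summarize(mean_value = mean(value))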
As we can see here, Usher and Red Death are negative while Amontillado is positive. Note that Usher has over 7,000 words, compared to roughly 2,400 each for the other two stories.
Next, I wanted to see the frequency of each word while also measuring its AFINN value.
usher_words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) %>%
filter(value <= 0) %>%
count(word, value, sort = TRUE)%>%
head(20) %>%
ggplot(aes(reorder(word, n), n, fill = value)) + geom_col() + coord_flip() + ylab("Number of Occurrences") +
xlab("Words") + ggtitle("The Fall of the House of Usher Negative Sentiment")
## Joining, by = "word"
## Joining, by = "word"
usher_words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments('afinn')) %>%
filter(value > 0) %>%
count(word, value,sort = TRUE) -> usher_positive
## Joining, by = "word"
## Joining, by = "word"
usher_positive %>%
head(20) %>%
ggplot(aes(reorder(word, n), n, fill = value)) + geom_col() + coord_flip() + ylab("Number of Occurrences") +
xlab("Words") + ggtitle("The Fall of the House of Usher Positive Sentiment")
The most negative and most frequent words in The Fall of the House of Usher are “terror” and “terrible,” each occurring 6 times and each scoring -3 on the AFINN scale. On the positive graph, the most common words are spirit, intense, and earnest, which all fall between 1 and 2 on the AFINN value scale. This suggests that the story uses strongly negative words more frequently, earning it a slightly negative mean value score (about -0.39, as reported above).
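To make this claim concrete, each word’s total pull on the score can be measured as its count times its AFINN value; a small sketch reusing usher_words from above:
usher_words %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("afinn")) %>%
  count(word, value, sort = TRUE) %>%
  mutate(contribution = n * value) %>%   # total pull of each word on the score
  arrange(desc(abs(contribution))) %>%
  head(10)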
Usher %>%
unnest_tokens(bigram, X1, token="ngrams", n=2) -> Usher_bigrams
Usher_bigrams %>%
count(bigram, sort = TRUE)
## # A tibble: 5,301 × 2
## bigram n
## <chr> <int>
## 1 of the 164
## 2 in the 41
## 3 upon the 30
## 4 of his 27
## 5 of a 24
## 6 and the 23
## 7 to the 22
## 8 it was 17
## 9 of my 16
## 10 had been 15
## # … with 5,291 more rows
Usher_filtered <- Usher_bigrams %>%
separate(bigram, c("word1", "word2"), sep=" ")
usher_words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) -> UsherAFINN
## Joining, by = "word"
## Joining, by = "word"
no_words <- Usher_filtered %>%
filter(word1 == "no") %>%
inner_join(UsherAFINN, by = c(word2 = "word")) %>%
count(word2, value, sort = TRUE)
Usher_filtered %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) -> Usher_filtered2
Usher_bigram_counts <- Usher_filtered2 %>%
count(word1, word2, sort = TRUE )
no_words %>%
mutate(contribution = n * value) %>%
arrange(desc(abs(contribution))) %>%
head(20) %>%
mutate(word2 = reorder(word2, contribution)) %>%
ggplot(aes(n * value, word2, fill = n * value > 0)) +
geom_col(show.legend = FALSE) +
labs(x = "Sentiment value * number of occurrences",
y = "Words preceded by \"no\"")
The n-grams chart suggests that ‘no’ precedes both negative and positive words, thus ‘evening out’ their effect on the sentiment score.
## Masque of the Red Death
RedDeath_words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) %>%
filter(value <= 0) %>%
count(word, value, sort = TRUE)%>%
head(20) %>%
ggplot(aes(reorder(word, n), n, fill = value)) + geom_col() + coord_flip() + ylab("Number of Occurrences") +
xlab("Words") + ggtitle("Masque of the Red Death Negative Sentiment")
## Joining, by = "word"
## Joining, by = "word"
RedDeath_words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments('afinn')) %>%
count(word, value,sort = TRUE) %>%
filter(n >0) %>%
filter(value > 0) %>%
head(20) %>%
ggplot(aes(reorder(word, n), n, fill = value)) + geom_col() + coord_flip() + ylab("Number of Occurrences") +
xlab("Words") + ggtitle("The Masque of the Red Death Positive Sentiment")
## Joining, by = "word"
## Joining, by = "word"
## Explanation
The most common negative word in The Masque of the Red Death is death, which scores roughly -2 on the AFINN scale. It is followed by terror and mad, which both carry a strongly negative score of -3 and occur 3 times each.
The story also shares the words terror and death with the other texts.
On the positive side, the word dreams occurs the most but has the lowest positive value. It is followed by bold and strong, which share a similar value. Two words that stand out are excited and fantastic, with values between 3 and 4.
MasqueRedDeath %>%
unnest_tokens(bigram, X1, token="ngrams", n=2) -> MasqueRedDeath_bigrams
MasqueRedDeath_bigrams %>%
count(bigram, sort = TRUE)
## # A tibble: 1,842 × 2
## bigram n
## <chr> <int>
## 1 of the 60
## 2 in the 20
## 3 to the 15
## 4 from the 13
## 5 it was 12
## 6 and the 11
## 7 there were 10
## 8 the prince 9
## 9 there was 8
## 10 of his 7
## # … with 1,832 more rows
MasqueRedDeath_filtered <- MasqueRedDeath_bigrams %>%
separate(bigram, c("word1", "word2"), sep=" ")
RedDeath_words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) -> MasqueRedDeathAFINN
## Joining, by = "word"
## Joining, by = "word"
not_words1 <- MasqueRedDeath_filtered %>%
filter(word1 == "not") %>%
inner_join(MasqueRedDeathAFINN, by = c(word2 = "word")) %>%
count(word2, value, sort = TRUE)
MasqueRedDeath_filtered %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) -> MasqueRedDeath_filtered2
MasqueRedDeath_bigram_counts <- MasqueRedDeath_filtered2 %>%
count(word1, word2, sort = TRUE )
not_words1 %>%
mutate(contribution = n * value) %>%
arrange(desc(abs(contribution))) %>%
head(20) %>%
mutate(word2 = reorder(word2, contribution)) %>%
ggplot(aes(n * value, word2, fill = n * value > 0)) +
geom_col(show.legend = FALSE) +
labs(x = "Sentiment value * number of occurrences",
y = "Words preceded by \"not\"")
The word approved is slightly positive, but a single negated occurrence is not substantial enough to change the overall score.
amontillado_words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) %>%
filter(value <= 0) %>%
count(word, value, sort = TRUE)%>%
head(20) %>%
ggplot(aes(reorder(word, n), n, fill = value)) + geom_col() + coord_flip() + ylab("Number of Occurrences") +
xlab("Words") + ggtitle("Amontillado Negative Sentiment")
## Joining, by = "word"
## Joining, by = "word"
amontillado_words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments('afinn')) %>%
count(word, value,sort = TRUE) %>%
filter(n >0) %>%
filter(value > 0) %>%
head(20) %>%
ggplot(aes(reorder(word, n), n, fill = value)) + geom_col() + coord_flip() + ylab("Number of Occurrences") +
xlab("Words") + ggtitle("Amontillado Positive Sentiment")
## Joining, by = "word"
## Joining, by = "word"
This story’s most common negative word, doubts, is relatively low on the value scale, as is cry, which is tied for second most common. Die and arrested are the most negative words in this chart, but each occurs only once.
The most common positive word in Amontillado is true, which has an AFINN score of 2.0 and occurs four times. Matter comes next, but is relatively weak at a value of 1.0. The most frequent high-value word is love, followed by happy, excited, beloved, astounded, and admired.
Amontillado %>%
unnest_tokens(bigram, X1, token="ngrams", n=2) -> Amontillado_bigrams
Amontillado_bigrams %>%
count(bigram, sort = TRUE)
## # A tibble: 1,779 × 2
## bigram n
## <chr> <int>
## 1 of the 30
## 2 ugh ugh 14
## 3 he he 12
## 4 i said 12
## 5 it is 10
## 6 <NA> 10
## 7 and the 9
## 8 you are 9
## 9 he said 8
## 10 i had 8
## # … with 1,769 more rows
Amontillado_filtered <- Amontillado_bigrams %>%
separate(bigram, c("word1", "word2"), sep=" ")
amontillado_words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) -> AmontilladoAFINN
## Joining, by = "word"
## Joining, by = "word"
no_words2 <- Amontillado_filtered %>%
filter(word1 == "no") %>%
inner_join(AmontilladoAFINN, by = c(word2 = "word")) %>%
count(word2, value, sort = TRUE)
Amontillado_filtered %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) -> Amontillado_filtered2
Amontillado_bigram_counts <- Amontillado_filtered2 %>%
count(word1, word2, sort = TRUE )
no_words2 %>%
mutate(contribution = n * value) %>%
arrange(desc(abs(contribution))) %>%
head(20) %>%
mutate(word2 = reorder(word2, contribution)) %>%
ggplot(aes(n * value, word2, fill = n * value > 0)) +
geom_col(show.legend = FALSE) +
labs(x = "Sentiment value * number of occurrences",
y = "Words preceded by \"no\"")
‘Matter’ is a slightly positive word, so preceding it with ‘no’ turns that contribution negative.
I expected the skew in some of these stories to be substantial, given Poe’s creative use of language.
I saw three different scenarios across the stories. In the first bi-gram chart, for The Fall of the House of Usher, the word ‘no’ preceded both negative and positive words. However, the word count shows this story had over 7,000 words, increasing the number of sentiment-bearing words and diluting the effect of any single negated term.
In the second bi-gram chart, using ‘not’ instead of ‘no’, I found that ‘not’ occurred only once before ‘approved’, a positive word that negation turns negative. The original mean score was already quite negative, so this did not adjust the score.
The third bi-gram chart used ‘no’ again, and ‘matter’ was the only word that appeared, making it read more negative than it really is. Since this story’s mean sentiment score is positive, perhaps this (and other negated words) points to a slight decrease; however, more research would have to be done.
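To put a rough number on how much negation could move the score, one could flip the AFINN value of every word that follows ‘no’ or ‘not’ and recompute the mean; a hedged sketch for The Cask of Amontillado, reusing Amontillado_filtered and amontillado_sentiment from above (the object names negated, adjustment, and adjusted_mean are illustrative):
negated <- Amontillado_filtered %>%
  filter(word1 %in% c("no", "not")) %>%
  inner_join(get_sentiments("afinn"), by = c(word2 = "word"))

adjustment <- sum(-2 * negated$value)   # flipping a value changes the total by -2 * value

adjusted_mean <- (sum(amontillado_sentiment$value) + adjustment) /
  nrow(amontillado_sentiment)

mean(amontillado_sentiment$value)       # original mean
adjusted_mean                           # mean after roughly accounting for negation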
Poe used more harsh words than positive words in The Masque of the Red Death and The Fall of the House of Usher, while in The Cask of Amontillado he used more positive (although not highly valued) words than negative ones. The most positive word Poe used was valued at 4 and the most negative at -3; he never reached the extreme values (5 or -5) in these texts. This could explain why the mean sentiment scores range only between about 0.5 and -2.
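This range can be checked directly from the matched AFINN values; a quick sketch using the sentiment data frames built earlier:
bind_rows(usher_sentiment, MasqueRedDeath_sentiment, amontillado_sentiment) %>%
  summarize(most_positive = max(value), most_negative = min(value))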
Across my column charts, no clear effect could be inferred, since the bi-grams would only slightly have adjusted the mean sentiment score. More research into n-grams would be needed to fully analyze the sentiment of these texts.
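As a possible next step, the same bi-gram approach extends to a broader set of negation words; a sketch for Usher, with an illustrative (not exhaustive) word list:
negation_words <- c("not", "no", "never", "without")   # illustrative list

Usher_filtered %>%
  filter(word1 %in% negation_words) %>%
  inner_join(get_sentiments("afinn"), by = c(word2 = "word")) %>%
  count(word1, word2, value, sort = TRUE)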