This code begins by loading the book, Alice in Wonderland, from Project Gutenberg, a site that has released over 60,000 free books online for the public to read.
From there we will analyze the sentiment of each sentence in the book using the syuzhet package. The package assigns each sentence a score: a negative score indicates a negative sentence, and a positive score indicates a happy or positive sentence.
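To give a feel for what a lexicon-based sentence score is, here is a minimal sketch in base R. It assumes a tiny made-up lexicon (`toy_lexicon`) and a hypothetical helper (`score_sentence`); the words and weights are invented for illustration and are not the values syuzhet uses, and the real package may handle tokenization differently.

```r
# Toy valence lexicon; these words and weights are invented for illustration
# and are NOT the values used by the syuzhet lexicon.
toy_lexicon <- c(happy = 0.75, pleasure = 0.5, tired = -0.5, stupid = -0.75)

# Hypothetical helper: score one sentence as the sum of the valences of its words.
score_sentence <- function(sentence, lexicon = toy_lexicon) {
  words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  matched <- lexicon[words]          # NA for words missing from the lexicon
  sum(matched, na.rm = TRUE)         # unmatched words contribute nothing
}

toy_sentences <- c(
  "Alice was very tired and felt stupid.",
  "The pleasure of making a daisy-chain made her happy.",
  "A White Rabbit ran close by her."
)
toy_scores <- vapply(toy_sentences, score_sentence, numeric(1), USE.NAMES = FALSE)
print(toy_scores)
```

In this toy version the first sentence comes out negative, the second positive, and the third scores zero because none of its words are in the lexicon.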
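#Load the packages used below
library(syuzhet)    #get_text_as_string, get_sentences, get_sentiment, get_tokens
library(stringr)    #str_split, str_detect, regex
library(dplyr)      #%>%, mutate, anti_join, count, arrange, filter, bind_rows, tibble
library(knitr)      #kable
library(kableExtra) #kable_styling, scroll_box
library(ggplot2)    #plots
alice_text_raw <- get_text_as_string('http://www.gutenberg.org/files/11/11-0.txt')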
substring(alice_text_raw,1,2000)
## The Project Gutenberg EBook of Alice’s Adventures in Wonderland, by Lewis Carroll This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org Title: Alice’s Adventures in Wonderland Author: Lewis Carroll Release Date: June 25, 2008 [EBook #11] Last Updated: February 22, 2020 Language: English Character set encoding: UTF-8 *** START OF THIS PROJECT GUTENBERG EBOOK ALICE’S ADVENTURES IN WONDERLAND *** Produced by Arthur DiBianca and David Widger [Illustration] Alice’s Adventures in Wonderland by Lewis Carroll THE MILLENNIUM FULCRUM EDITION 3.0 Contents CHAPTER I. Down the Rabbit-Hole CHAPTER II. The Pool of Tears CHAPTER III. A Caucus-Race and a Long Tale CHAPTER IV. The Rabbit Sends in a Little Bill CHAPTER V. Advice from a Caterpillar CHAPTER VI. Pig and Pepper CHAPTER VII. A Mad Tea-Party CHAPTER VIII. The Queen’s Croquet-Ground CHAPTER IX. The Mock Turtle’s Story CHAPTER X. The Lobster Quadrille CHAPTER XI. Who Stole the Tarts? CHAPTER XII. Alice’s Evidence CHAPTER I. Down the Rabbit-Hole Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “and what is the use of a book,” thought Alice “without pictures or conversations?” So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. There was nothing so _very_ remarkable in that; nor did Alice think it so _very_ much out of the way to hear the Rabbit say to itself, “Oh
The text from the e-book contains extra material: a table of contents and publishing information before the story, and a closing section about Project Gutenberg after it. We will remove both sections before running the sentiment analysis.
#Delete end text talking about Project Gutenberg
alice_split <- str_split(alice_text_raw," THE END ")
alice_text <- alice_split[[1]][1]
alice_sentences_raw <- get_sentences(alice_text)
#Delete first few sentences containing table of contents & publishing info
alice_sentences <- alice_sentences_raw[14:length(alice_sentences_raw)]
#Add additional columns
alice_df <- as.data.frame(alice_sentences)
alice_df$sentence_number <- seq.int(nrow(alice_df)) #Sentence number
alice_df$sentiment_score <- get_sentiment(alice_sentences, method="syuzhet") #Sentence sentiment value
#Label each sentence; ifelse() is vectorized, whereas a plain if() only tests the first element
alice_df$sentiment_word <- ifelse(alice_df$sentiment_score > 0, 'positive',
  ifelse(alice_df$sentiment_score == 0, 'neutral', 'negative'))
#Get the chapter using regular expressions
alice_df <- alice_df %>%
  mutate(chapter = as.double(cumsum(str_detect(alice_sentences, regex("CHAPTER [\\divxlc.$]",
    ignore_case = TRUE)))))
kable(head(alice_df)) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% scroll_box(width = "100%")

| alice_sentences | sentence_number | sentiment_score | sentiment_word | chapter |
|---|---|---|---|---|
| Alice’s Evidence CHAPTER I. | 1 | 0.00 | neutral | 1 |
| Down the Rabbit-Hole Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “and what is the use of a book,” thought Alice “without pictures or conversations?” | 2 | -0.65 | negative | 1 |
| So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. | 3 | 0.70 | positive | 1 |
| There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, "Oh dear! | 4 | 1.25 | positive | 1 |
| Oh dear! | 5 | 0.50 | positive | 1 |
| I shall be late!" | 6 | -0.25 | negative | 1 |
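To make the two derived columns concrete, here is a minimal sketch on a toy data frame. The sentences and scores are invented for illustration, and base R `grepl()` stands in for `str_detect()`. It labels every score element-wise and numbers the chapters by keeping a running count of heading matches.

```r
# Toy stand-in for alice_df; sentences and scores are invented for illustration.
toy <- data.frame(
  sentence = c("CHAPTER I.", "Alice was tired.", "Oh dear!",
               "CHAPTER II.", "Curiouser and curiouser!"),
  score = c(0, -0.5, 0.5, 0, 0.25),
  stringsAsFactors = FALSE
)

# sign() returns -1, 0 or 1 for every element, so adding 2 gives an index
# into the label vector (1 = negative, 2 = neutral, 3 = positive).
toy$label <- c("negative", "neutral", "positive")[sign(toy$score) + 2]

# grepl() flags the sentences that contain a chapter heading; the running
# total of those flags goes up by one at each heading.
is_heading <- grepl("CHAPTER [IVXLC]+", toy$sentence)
toy$chapter <- cumsum(is_heading)

print(toy)
```

Because a heading sentence is itself counted, every sentence inherits the number of the most recent heading that precedes it.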
alice_tokens <- get_tokens(paste(alice_df$alice_sentences,collapse = " "))
alice_words <- as.data.frame(alice_tokens)
colnames(alice_words) <- c('word')
#Remove stop words
stopwords <- tidytext::get_stopwords(language = "en",source = "smart")
stopwords <- bind_rows(tibble(word = c("t","s","i","like","tm","ll"),
lexicon = c("custom","custom","custom","custom","custom","custom")),
stopwords)
alice_words <- alice_words %>%
  anti_join(stopwords, by= c("word" = "word"))
## Warning: Column `word` joining factor and character vector, coercing into
## character vector
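The stop word step is a filter: keep every token that does not appear in the stop word list, then count what is left. A minimal base R sketch of that idea follows, using invented tokens and a hypothetical `toy_stopwords` vector rather than the SMART list.

```r
# Toy tokens and stop words, invented for illustration.
tokens <- c("alice", "said", "the", "queen", "and", "the",
            "king", "said", "alice", "alice")
toy_stopwords <- c("the", "and", "a", "of")

# stringsAsFactors = FALSE keeps the column as character, which avoids the
# factor/character coercion that a join would otherwise warn about.
words <- data.frame(word = tokens, stringsAsFactors = FALSE)

# Keep only rows whose word is not a stop word (the effect of the anti-join).
kept <- words[!(words$word %in% toy_stopwords), , drop = FALSE]

# Count the remaining words and sort by descending frequency.
word_count <- sort(table(kept$word), decreasing = TRUE)
print(word_count)
```

Building the word column as character rather than factor is one likely way to avoid the coercion warning shown above, though that depends on the R version in use.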
word_count <- alice_words %>%
count(word,sort = TRUE) %>%
arrange(-n)
head(word_count,15) %>%
ggplot(aes(word, n)) +
geom_col() +
coord_flip() +
ggtitle("Most Frequently Used Words") +
theme(plot.title = element_text(hjust = 0.5)) +
ylab("Count")
Here we see the sentiment analysis over the entire book. The score swings frequently between negative and positive throughout the book. Something really terrible appears to happen twice near the end of the book.
ggplot(alice_df,aes(x=sentence_number,y=sentiment_score)) +
geom_line() +
ggtitle("Sentence Sentiment Over Entire Book - Alice in Wonderland") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Sentence #") +
ylab("Sentence Sentiment")
Since there are many chapters, we will look at the sentiment over each chapter individually. Chapter 12, the final chapter, has the most drastic back-and-forth movement between negative and positive. Chapter 1 appears to have some positive spikes. There also appear to be some negative spans at the end of Chapter 2.
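Visual impressions like these can also be checked numerically. As a rough sketch, assuming a data frame shaped like `alice_df` (the `toy_df` below uses invented scores, not the book's), the spread and mean of the sentence scores can be computed per chapter:

```r
# Toy stand-in for alice_df; the scores are invented for illustration.
toy_df <- data.frame(
  chapter = c(1, 1, 1, 2, 2, 2, 12, 12, 12),
  sentiment_score = c(0.5, 1.25, -0.25, -0.75, -0.5, 0.1, -2.0, 1.5, 0.4)
)

# Spread (max minus min) of the sentence scores within each chapter.
spread <- tapply(toy_df$sentiment_score, toy_df$chapter,
                 function(s) diff(range(s)))

# Mean sentence score within each chapter.
mean_score <- tapply(toy_df$sentiment_score, toy_df$chapter, mean)

print(spread)
print(mean_score)

# Chapter with the widest swing between its lowest and highest sentence.
widest <- names(which.max(spread))
print(widest)
```

On the real data, replacing `toy_df` with `alice_df` would show whether Chapter 12 really has the widest swing.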
ggplot(alice_df,aes(x=sentence_number,y=sentiment_score,color=chapter)) +
geom_line() +
ggtitle("Sentence Sentiment Per Chapter - Alice in Wonderland") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Sentence #") +
ylab("Sentence Sentiment") +
facet_wrap(~chapter,scales = "free_x")
Although I haven’t seen Alice in Wonderland in a long time, I believe the most negative sentence from Chapter 12 describes a scene where Alice is fleeing from the Queen, who wants to kill her. The scene is chaotic as she runs for her life while being chased. See the YouTube video for the movie scene.
ending_lows <- alice_df %>% filter(chapter == 12) %>% arrange(sentiment_score)
kable(head(ending_lows$alice_sentences,1)) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% scroll_box(width = "100%")

| x |
|---|
| The long grass rustled at her feet as the White Rabbit hurried by—the frightened Mouse splashed his way through the neighbouring pool—she could hear the rattle of the teacups as the March Hare and his friends shared their never-ending meal, and the shrill voice of the Queen ordering off her unfortunate guests to execution—once more the pig-baby was sneezing on the Duchess’s knee, while plates and dishes crashed around it—once more the shriek of the Gryphon, the squeaking of the Lizard’s slate-pencil, and the choking of the suppressed guinea-pigs, filled the air, mixed up with the distant sobs of the miserable Mock Turtle. |
The sentiment analysis shows that the story, Alice in Wonderland, has a lot of ups and downs. With more experience, I would like to use a more advanced stop word list and a more complex lexicon that accounts for negated word combinations (e.g. "not happy" is a negative phrase).
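As a rough illustration of the negation idea, the sketch below flips the valence of a word when the word right before it is a negator. Everything here is an assumption made for the example: the lexicon weights, the `negators` list, and the `score_with_negation` helper are invented, and real negation handling is considerably more involved.

```r
# Toy lexicon; weights are invented and not taken from any published lexicon.
toy_lexicon <- c(happy = 0.75, good = 0.5, sad = -0.75)
negators <- c("not", "never", "no")

# Hypothetical helper: sum word valences, flipping the sign after a negator.
score_with_negation <- function(sentence) {
  words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  if (length(words) == 0) return(0)
  valence <- unname(toy_lexicon[words])
  valence[is.na(valence)] <- 0       # words outside the lexicon score zero
  # TRUE where the previous word is a negator (the first word has no predecessor).
  preceded_by_negator <- c(FALSE, head(words, -1) %in% negators)
  sum(ifelse(preceded_by_negator, -valence, valence))
}

plain_score   <- score_with_negation("Alice was happy")
negated_score <- score_with_negation("Alice was not happy")
print(c(plain = plain_score, negated = negated_score))
```

A plain word-level lexicon would give both sentences the same positive score, which is the limitation described above.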