F. Scott Fitzgerald was an American modernist writer. This research analyzes the most frequent words, word counts and sentiment of the novels that he published during his lifetime, which is why The Last Tycoon is not analyzed.
I hypothesize that his novels will mention time and love, two themes common in modernist literature. I also hypothesize that the overall sentiment will be negative, as modernist literature often deals with sad themes.
This research also aims to determine whether the sentiment of Fitzgerald’s novels became more negative as he became a more prominent author, struggled in the spotlight and lived through the raucous, roaring 1920s.
After downloading each novel as a text file from Project Gutenberg and Project Gutenberg Australia, I renamed each data set for ease of use throughout the project: “This Side of Paradise” became “tsop”, “The Beautiful and Damned” became “bad”, and so on. I unnested the text into individual words so I could get a word count for each novel, then filtered out the stop words and character names. Next, I created a data table that displays the frequency of the most used words in each novel. I also included a quote from each novel that contextualizes one of the most used words - who doesn’t love a good Fitzgerald quote?
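As a rough sketch of the steps just described, the same tokenize, filter and count sequence could be run on all novels at once by stacking them with a `novel` label. The two-line "novels" below are toy stand-ins, and the `novel` column and the name list are assumptions for illustration, not the code used in this report.

```r
library(dplyr)
library(tidytext)

# Toy stand-ins for the novels; X1 mirrors the single text column used in this report.
novels <- bind_rows(
  tibble(novel = "tsop", X1 = c("Amory looked at the night", "The night was long")),
  tibble(novel = "bad",  X1 = c("Gloria had time", "Time and time again"))
)

word_counts <- novels %>%
  unnest_tokens(word, X1) %>%                  # one lowercase word per row
  anti_join(stop_words, by = "word") %>%       # drop common stop words
  filter(!word %in% c("amory", "gloria")) %>%  # drop character names (illustrative list)
  count(novel, word, sort = TRUE)              # frequency of each word within each novel

word_counts
```

Keeping the novels in one table means each later step (sentiment joins, means, plots) can be grouped by `novel` instead of being repeated per book.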
I looked at each novel through three sentiment lexicons: bing, afinn and nrc. I wanted to see if there was a common sentiment throughout his novels and if that sentiment changed at any point. Interestingly enough, the bing and nrc lexicons show contrasting results - this could be because nrc sorts words into eight emotion categories in addition to positive and negative, while bing uses only the two overarching categories, positive and negative. I also looked at the mean afinn value of each novel to gauge its overall tone.
The graphs depict the bing and nrc results for each novel. I also created word clouds of the most used positive and most used negative words as defined by the afinn lexicon.
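One way to put the two lexicons on the same footing is to keep only nrc’s positive and negative rows before counting, so its result lines up with bing’s two categories. The sketch below uses a small made-up table in place of a real nrc join; the words and labels are illustrative assumptions, not output from this analysis.

```r
library(dplyr)

# Hypothetical slice of an nrc-joined table (word, sentiment); not real output.
toy_nrc <- tibble(
  word      = c("love", "love", "afraid", "afraid", "war", "war", "kiss", "dead"),
  sentiment = c("joy", "positive", "fear", "negative",
                "fear", "negative", "positive", "negative")
)

# Keep only the polarity rows so the counts are comparable with bing's two bars.
nrc_polarity <- toy_nrc %>%
  filter(sentiment %in% c("positive", "negative")) %>%
  count(sentiment)

nrc_polarity
```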
Published in 1920, F. Scott Fitzgerald’s first novel tells the early life of Amory Blaine. This text was pulled from Project Gutenberg’s American website. The novel is organized into sections represented by stages in Amory’s life.
tsopwords <- tsop %>%
  unnest_tokens(word, X1)
Word Count:
count(tsopwords)
## # A tibble: 1 x 1
## n
## <int>
## 1 59902
Setting aside contractions such as “don’t” and “i’m”, the top words are night, people, life, and eyes.
tsopwords %>%
  count(word, sort = TRUE) %>%
  anti_join(stop_words) %>%
  filter(!word %in% c("amory", "rosalind", "dick", "maury", "gloria", "anthony")) %>%
  arrange(desc(n)) %>%
  head(10) %>%
  knitr::kable()
## Joining, by = "word"
| word | n |
|---|---|
| don’t | 137 |
| night | 118 |
| i’m | 114 |
| people | 106 |
| life | 99 |
| it’s | 90 |
| eyes | 87 |
| day | 84 |
| you’re | 79 |
| love | 76 |
“The unwelcome November rain had perversely stolen the day’s last hour and pawned it with that ancient fence, the night.”
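The contractions in the table above (“don’t”, “i’m”, “it’s”, “you’re”) are ordinary stop words, so they probably survive the `anti_join` because the source text uses typographic apostrophes while the stop-word list uses straight ones. That cause is an assumption on my part. A minimal sketch of one possible fix, with toy data and a hypothetical mini stop-word list, is to normalize the apostrophes before joining:

```r
library(dplyr)

# Toy tokens standing in for tsopwords; \u2019 is the typographic (curly) apostrophe.
toy_words <- tibble(
  word = c("don\u2019t", "night", "i\u2019m", "people", "it\u2019s", "life")
)

# Hypothetical mini stop-word list written with straight apostrophes.
toy_stop_words <- tibble(word = c("don't", "i'm", "it's"))

cleaned <- toy_words %>%
  mutate(word = gsub("\u2019", "'", word, fixed = TRUE)) %>%  # curly -> straight
  anti_join(toy_stop_words, by = "word")                       # contractions now match

cleaned$word
```

In the real pipeline the same `mutate()` step would sit between `unnest_tokens()` and `anti_join(stop_words)`.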
tsop_bing <- tsopwords %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("bing"))
## Joining, by = "word"
## Joining, by = "word"
ggplot(tsop_bing) + geom_bar(aes(sentiment))
tsop_nrc <- tsopwords %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("nrc"))
## Joining, by = "word"
## Joining, by = "word"
ggplot(tsop_nrc) + geom_bar(aes(sentiment))
tsop_afinn <- tsopwords %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
## Joining, by = "word"
mean(tsop_afinn$value)
## [1] -0.09489832
This mean is close to 0, meaning that the afinn sentiment is slightly negative, almost neutral. The overall sentiment for “This Side of Paradise” in the bing lexicon is negative.
Among the words with a value above 0, “love” is clearly the most used, with 76 mentions, followed by “god” with 34.
tsop_afinn %>%
  filter(value > 0) %>%
  count(word, sort = TRUE) %>%
  head(10) %>%
  knitr::kable()
| word | n |
|---|---|
| love | 76 |
| god | 34 |
| kiss | 31 |
| beautiful | 22 |
| care | 22 |
| laughed | 22 |
| pretty | 20 |
| reached | 17 |
| strong | 17 |
| matter | 16 |
The 10 most common words with a value below 0 range from “poor” to “war” to “gray”. “Afraid” is the most used, with 30 mentions, and “poor” is mentioned 29 times.
tsop_afinn %>%
  filter(value < 0) %>%
  count(word, sort = TRUE) %>%
  head(10) %>%
  knitr::kable()
| word | n |
|---|---|
| afraid | 30 |
| poor | 29 |
| gray | 25 |
| cried | 23 |
| damn | 21 |
| dead | 20 |
| war | 20 |
| bad | 17 |
| lost | 17 |
| tired | 17 |
tsop_afinn %>%
  filter(value > 0) %>%
  count(word, sort = TRUE) %>%
  wordcloud2()
tsop_afinn %>%
  filter(value < 0) %>%
  count(word, sort = TRUE) %>%
  wordcloud2()
Published in 1922, Fitzgerald’s second novel “concerns a handsome young married couple who choose to wait for an expected inheritance rather than involve themselves in productive, meaningful lives.” This title was shortened to “bad”. Source for book information: https://www.britannica.com/topic/The-Beautiful-and-Damned
Word Count:
## # A tibble: 1 x 1
## n
## <int>
## 1 93859
The top words are time, eyes, day, and night.
| word | n |
|---|---|
| time | 166 |
| eyes | 137 |
| day | 130 |
| night | 114 |
| life | 111 |
| sort | 96 |
| voice | 94 |
| people | 85 |
| found | 82 |
| half | 82 |
“Rather nice night, after all. Stars are out and everything. Exceptionally tasty assortment of them.”
## Joining, by = "word"
## Joining, by = "word"
## [1] -0.2205915
“Matter” is mentioned 65 times and “love” is mentioned 63 times.
| word | n |
|---|---|
| matter | 65 |
| love | 63 |
| beautiful | 43 |
| laughed | 42 |
| god | 40 |
| pretty | 35 |
| reached | 32 |
| care | 30 |
| kiss | 27 |
| cool | 24 |
“Cried” is mentioned 58 times and “gray” is mentioned 48 times.
| word | n |
|---|---|
| cried | 58 |
| gray | 48 |
| broken | 29 |
| demanded | 28 |
| tired | 26 |
| broke | 25 |
| fire | 24 |
| hate | 24 |
| war | 24 |
| bad | 22 |
“The Great Gatsby”, Fitzgerald’s third novel, was published in 1925. The text was pulled from Project Gutenberg Australia. “Set in Jazz Age New York, the novel tells the tragic story of Jay Gatsby, a self-made millionaire, and his pursuit of Daisy Buchanan, a wealthy young woman whom he loved in his youth. Unsuccessful upon publication, the book is now considered a classic of American fiction and has often been called the Great American Novel.” Source: https://www.britannica.com/topic/The-Great-Gatsby
gatsbywords <- gatsby %>%
  unnest_tokens(word, X1)
Word Count:
## # A tibble: 1 x 1
## n
## <int>
## 1 43549
The top words are house, eyes, looked and time.
“The eyes of Doctor T. J. Eckleburg are blue and gigantic— their retinas are one yard high. They look out of no face, but, instead, from a pair of enormous yellow spectacles which pass over a nonexistent nose”
| word | n |
|---|---|
| house | 91 |
| eyes | 80 |
| looked | 79 |
| time | 73 |
| car | 69 |
| door | 67 |
| night | 67 |
| moment | 63 |
| hand | 56 |
| people | 56 |
## [1] -0.2263697
The words “love” and “matter” each appear 21 times, while the word “god” is mentioned 20 times.
| word | n |
|---|---|
| love | 21 |
| matter | 21 |
| god | 20 |
| loved | 19 |
| laughed | 14 |
| pretty | 14 |
| reached | 14 |
| care | 13 |
| cool | 13 |
| nice | 13 |
The word “miss” appears 32 times and “cried” appears 30 times.
| word | n |
|---|---|
| miss | 32 |
| cried | 30 |
| demanded | 24 |
| broke | 22 |
| stopped | 19 |
| hard | 18 |
| crazy | 14 |
| war | 13 |
| dead | 12 |
| stop | 12 |
“Tender is the Night” is the final novel Fitzgerald published during his lifetime; it came out in 1934, six years before his death in 1940. This text was pulled from Project Gutenberg Australia. Arguably his most autobiographical novel, “Tender Is the Night tells the story of Dick and Nicole Diver’s crumbling marriage. Though not well received at the time of its 1934 serial publication, both readers and critics have since recognized the novel as one of the twentieth century’s best. More than a simple story of estrangement and infidelity, Tender Is the Night grapples with the complexity of human relationships and the manipulations and ministrations of those closest to us.” Source: https://study.com/academy/lesson/tender-is-the-night-summary-characters-themes-analysis.html#:~:text=The%20darkest%20and%20most%20autobiographical,of%20the%20twentieth%20century’s%20best.
## # A tibble: 1 x 1
## n
## <int>
## 1 61923
The top words are time and doctor.
| word | n |
|---|---|
| time | 113 |
| doctor | 92 |
| people | 91 |
| looked | 81 |
| love | 76 |
| girl | 70 |
| hotel | 67 |
| night | 66 |
| day | 62 |
| mother | 62 |
“When you’re older you’ll know what people who love suffer. The agony. It’s better to be cold and young than to love. It’s happened to me before but never like this - so accidental - just when everything was going well.”
## [1] -0.7811258
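“Tender is the Night” has the mean afinn value farthest from zero, meaning that the afinn lexicon scores this book as the most negative of the four novels, although -0.78 is still mild on afinn’s scale of -5 to 5.
Among the words with a value above 0, “love” is clearly the most used, with 76 mentions, followed by “nice” with 36.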
| word | n |
|---|---|
| love | 76 |
| nice | 36 |
| laughed | 33 |
| matter | 32 |
| fine | 27 |
| agreed | 25 |
| care | 21 |
| fun | 21 |
| god | 21 |
| glad | 19 |
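The table is topped by “dick” with 509 mentions, but this is the name of the protagonist, Dick Diver, rather than a sentiment word; the character names were evidently not filtered out before the afinn join. Because the name was scored as a negative word, it also pulls the novel’s mean afinn value down. Setting it aside, “demanded” is the most used negative word, with 29 mentions, followed by “cried” and “hard” with 28 each.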
| word | n |
|---|---|
| dick | 509 |
| demanded | 29 |
| cried | 28 |
| hard | 28 |
| leave | 26 |
| dead | 24 |
| war | 23 |
| bad | 21 |
| afraid | 19 |
| miss | 19 |
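Since “dick” in the table above is a character name, it helps to see how much a single name can move the mean. The sketch below uses a made-up table of afinn-style matches; the words, scores and name list are illustrative assumptions, not real afinn output or the filter used in this report.

```r
library(dplyr)

# Toy afinn-style matches; the words and scores are illustrative only.
toy_afinn <- tibble(
  word  = c("dick", "dick", "dick", "love", "cried", "nice"),
  value = c(-4, -4, -4, 3, -2, 3)
)

# Hypothetical list of character names to exclude; extend per novel.
character_names <- c("dick", "nicole", "rosemary")

# Mean with the name counted as a sentiment word.
mean_with_names <- mean(toy_afinn$value)

# Mean after removing character names before averaging.
mean_without_names <- toy_afinn %>%
  filter(!word %in% character_names) %>%
  summarise(mean_value = mean(value)) %>%
  pull(mean_value)

c(with_names = mean_with_names, without_names = mean_without_names)
```

In the real pipeline, applying the same `filter()` before `inner_join(get_sentiments("afinn"))` would keep the name out of both the negative-word table and the mean.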
My first hypothesis is supported: Fitzgerald uses “time” and “love” multiple times in each novel. Another interesting trend is the use of the word “war” throughout the novels - this makes sense, as modernist literature flourished after World War I, and it suggests that the war affected his writing. Fitzgerald also wrote “cried” and “god” frequently. My second hypothesis is harder to assess because the lexicons sort words into different categories.
Based on the mean afinn value alone, the novels grow more negative in order of publication (about -0.09, -0.22, -0.23 and -0.78), so Fitzgerald’s third and fourth novels are the most negative. His most negative novel was his last, “Tender is the Night” - this could be because it is his most “autobiographical” novel and was published during an incredibly hard time in the author’s life. Regardless, Fitzgerald left an incredible impact on American literature.
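To check the trend across novels, the four mean afinn values reported in the sections above can be lined up by publication year. This is a small sketch in base R; the data frame and column names are my own, and the numbers are copied from the output shown earlier.

```r
# Mean afinn values as reported in the sections above.
afinn_means <- data.frame(
  novel = c("This Side of Paradise", "The Beautiful and Damned",
            "The Great Gatsby", "Tender is the Night"),
  year = c(1920, 1922, 1925, 1934),
  mean_afinn = c(-0.09489832, -0.2205915, -0.2263697, -0.7811258)
)

# Change from each novel to the next; a negative change means more negative sentiment.
afinn_means$change <- c(NA, diff(afinn_means$mean_afinn))

# TRUE when every novel scores lower than its predecessor.
steadily_more_negative <- all(diff(afinn_means$mean_afinn) < 0)

afinn_means
steadily_more_negative
```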