Jane Austen published Pride and Prejudice in 1813. The novel primarily follows the development of Elizabeth Bennet, but it also follows the other four Bennet sisters on their journeys to marriage. The girls must marry, or the inheritance of the house and belongings will be taken out of the family. The novel’s main theme is the importance of marrying for love rather than for money or status. Sexism and patriarchy are also clear themes.
Pride and Prejudice is regularly considered an early feminist novel; however, its main focus is marriage because of the way inheritance worked at the time. Many of the female characters are portrayed as girly, loud gossips who can’t do much for themselves. I’m focusing my main analysis on the characters Elizabeth and Fitzwilliam Darcy. Elizabeth best represents a modern view of feminism because of her natural wit and intelligence, while Darcy’s pride often causes him to look down upon other people.
Since Elizabeth is considered the closest female character in the novel to a modern feminist, text analysis should support that idea.
I started my project by cleaning the text. When you first download literature into R, all of the contents are included: page numbers, copyright date, etc. None of that information helps me for this project, so I used some basic code to clean it up.
# Packages needed for the pipe, joins, counts, and tokenizing
library(dplyr)
library(tidytext)
pride %>% unnest_tokens(word, text) -> prideWords
View(prideWords)
prideWords %>% anti_join(stop_words) -> prideClean
## Joining, by = "word"
prideClean %>% count(word, sort = TRUE)
## # A tibble: 6,009 x 2
##    word          n
##    <chr>     <int>
##  1 elizabeth   597
##  2 darcy       373
##  3 bennet      294
##  4 miss        283
##  5 jane        264
##  6 bingley     257
##  7 time        203
##  8 lady        183
##  9 sister      180
## 10 wickham     162
## # … with 5,999 more rows
I studied the most common words used in the novel by creating a wordcloud.
library(wordcloud)
## Loading required package: RColorBrewer
prideClean %>%
  count(word, sort = TRUE) %>%
  with(wordcloud(word, n, max.words = 50))
Next, I looked at the overall sentiment of the novel by charting word counts from the “bing” and “afinn” sentiment lexicons. I used the top 20 words for both charts but filtered out the words “miss”, “like”, and “well”.
pride_word_counts_bing <- prideWords %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"
pride_word_counts_afinn <- prideWords %>%
  inner_join(get_sentiments("afinn")) %>%
  count(word, value, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"
The two charts show slightly different results because the lexicons score words differently (bing labels each word as positive or negative, while afinn assigns a score from -5 to 5), but overall the sentiment in the novel appears to be positive.
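The code for the charts themselves is not shown, so here is a minimal sketch of how the top-20 “bing” chart could be built. The names `toy`, `drop_words`, and `top_bing` are my own placeholders, and the three sentences are made-up stand-ins for the novel's `text` column, not lines from the book.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Stand-in for `pride`: a tiny tibble with the same `text` column (invented sentences).
toy <- tibble(text = c("She was happy and proud, but miss Bingley was cruel.",
                       "I like the good and agreeable man well enough.",
                       "A horrible, angry, happy evening."))

# Words the write-up says were filtered out before charting.
drop_words <- c("miss", "like", "well")

top_bing <- toy %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep only lexicon words
  filter(!word %in% drop_words) %>%
  count(word, sentiment, sort = TRUE) %>%
  slice_head(n = 20)                                    # top 20 by count

# Horizontal bar chart, coloured by sentiment label.
p <- ggplot(top_bing, aes(x = reorder(word, n), y = n, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Count")
print(p)
```

Swapping `toy` for `pride` should give a chart for the full novel. The same pattern would apply to “afinn” by plotting `value` instead of `sentiment`, though that lexicon has to be downloaded first.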
Then, I began comparing the words used around the two characters, Elizabeth and Fitzwilliam Darcy, expecting to see more positive or feminist words associated with Elizabeth. This proved more difficult than expected because Fitzwilliam is commonly referred to as Darcy, and his sister, Miss Darcy, is fairly present in the novel. This led to a bit of overlap, so I had to use my knowledge of the book to make assumptions about which character each word referred to. (The only bigram for “Fitzwilliam” on its own was “Colonel Fitzwilliam”, who is a different character.)
pride %>% unnest_tokens(bigram, text, token="ngrams", n=2) -> pride_bigrams
pride_bigrams %>%
  count(bigram, sort = TRUE)
## # A tibble: 54,998 x 2
##    bigram       n
##    <chr>    <int>
##  1 of the     464
##  2 to be      443
##  3 in the     382
##  4 i am       302
##  5 of her     260
##  6 to the     252
##  7 it was     251
##  8 mr darcy   243
##  9 of his     234
## 10 she was    209
## # … with 54,988 more rows
library(tidyr)  # provides separate() and unite()
# Split each bigram into its two words
bigrams_separated <- pride_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")
# Drop bigrams where either word is a stop word
bigrams_filtered <- bigrams_separated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)
bigram_counts <- bigrams_filtered %>%
  count(word1, word2, sort = TRUE)
bigram_counts
## # A tibble: 5,922 x 3
##    word1   word2           n
##    <chr>   <chr>       <int>
##  1 lady    catherine     100
##  2 miss    bingley        72
##  3 miss    bennet         60
##  4 sir     william        38
##  5 de      bourgh         35
##  6 miss    darcy          34
##  7 colonel forster        26
##  8 colonel fitzwilliam    25
##  9 cried   elizabeth      24
## 10 miss    lucas          23
## # … with 5,912 more rows
bigrams_united <- bigrams_filtered %>%
  filter(word2 == "elizabeth") %>%
  unite(bigram, word1, word2, sep = " ")
bigrams_united %>%
  count(bigram, sort = TRUE)
## # A tibble: 170 x 2
##    bigram                        n
##    <chr>                     <int>
##  1 cried elizabeth              24
##  2 replied elizabeth            18
##  3 miss elizabeth               12
##  4 cousin elizabeth              3
##  5 daughter elizabeth            3
##  6 evening elizabeth             3
##  7 longbourn elizabeth           3
##  8 sister elizabeth              3
##  9 collins elizabeth             2
## 10 congratulations elizabeth     2
## # … with 160 more rows
bigrams_darcy <- bigrams_filtered %>%
  filter(word2 == "darcy") %>%
  unite(bigram, word1, word2, sep = " ")
bigrams_darcy %>%
  count(bigram, sort = TRUE)
## # A tibble: 29 x 2
##    bigram                n
##    <chr>             <int>
##  1 miss darcy           34
##  2 replied darcy         6
##  3 cried darcy           2
##  4 fitzwilliam darcy     2
##  5 added darcy           1
##  6 affairs darcy         1
##  7 anne darcy            1
##  8 appeared darcy        1
##  9 applied darcy         1
## 10 beautiful darcy       1
## # … with 19 more rows
View(bigrams_darcy)
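One way to reduce the overlap with Darcy's sister would be to drop the “miss darcy” rows before counting. The sketch below assumes every “miss darcy” refers to his sister, which is my own assumption and may not hold for every line; `toy` and `darcy_only` are placeholder names, and the sentences are invented, not taken from the novel. It is also worth noting that “mr darcy” appears 243 times in the raw bigram counts but not in the filtered ones, which suggests “mr” is being removed as a stop word, so most direct mentions of Darcy may never reach this step.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Stand-in for `pride`: one `text` column with invented sentences.
toy <- tibble(text = c("miss darcy smiled",
                       "replied darcy coldly",
                       "miss darcy played",
                       "cried darcy"))

darcy_only <- toy %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  # Assumption: "miss darcy" always means the sister, so exclude it.
  filter(word2 == "darcy", word1 != "miss") %>%
  count(word1, word2, sort = TRUE)

darcy_only
```

On the real data, the same two `filter()` lines could be applied to `bigrams_filtered` in place of the toy pipeline.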
Overall, the sentiment towards the two characters was fairly similar. The novel seemed to take a fairly positive view of Darcy, with only a few negative adjectives like “horrible” attached to his name. Elizabeth, however, was paired with words like “cried” and “dear”, which seem to demean her a bit.
It appears that my hypothesis of Elizabeth being portrayed as a “modern feminist” was somewhat disproven. Although there were not too many negative phrases associated with her name, she was still given the typical “soft adjectives”.
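The comparison above was made by reading through the bigram lists. A rough way to put numbers on it would be to join the word before each name against the “bing” lexicon and count positive and negative labels per character. This is only a sketch: `toy_bigrams` is an invented stand-in for `bigrams_filtered` (same `word1` and `word2` columns), and `neighbour_sentiment` is a placeholder name.

```r
library(dplyr)
library(tidytext)

# Invented stand-in for `bigrams_filtered`; these rows are not counts from the novel.
toy_bigrams <- tibble(
  word1 = c("lovely", "cried", "replied", "horrible", "proud", "replied"),
  word2 = c("elizabeth", "elizabeth", "elizabeth", "darcy", "darcy", "darcy")
)

neighbour_sentiment <- toy_bigrams %>%
  filter(word2 %in% c("elizabeth", "darcy")) %>%
  # Label the preceding word using the bing lexicon; unlabelled words drop out.
  inner_join(get_sentiments("bing"), by = c("word1" = "word")) %>%
  count(word2, sentiment)

neighbour_sentiment
```

Words that are missing from the lexicon are dropped by the join, so speech verbs that carry no sentiment label would not be counted. On the real data this may leave only a small number of labelled words per character, and the result should be read with that in mind.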