R Markdown

Jane Austen published Pride and Prejudice in 1813. The novel primarily follows the development of Elizabeth Bennet, but it also follows the other four Bennet sisters on their journeys toward marriage. The girls must marry, or the house and its belongings will pass out of the family through inheritance. The novel’s main theme is the importance of marrying for love rather than for money or social standing. Sexism and patriarchy are also clear themes.

Pride and Prejudice is often considered an early feminist novel; however, its main focus is marriage, because of the way inheritance worked at the time. Many of the female characters are portrayed as girly, loud gossips who can’t do much for themselves. I’m focusing my main analysis on the characters Elizabeth and Fitzwilliam Darcy. Elizabeth best represents a modern view of feminism because of her natural wit and intelligence, while Darcy’s pride often causes him to look down on other people.

Since Elizabeth is considered the closest female character in the novel to a modern feminist, text analysis should support that idea.

I started my project by cleaning the text. When you first download literature into R, all of the contents are included: page numbers, copyright date, etc. None of that information helps me for this project, so I used some basic code to clean it up.

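library(dplyr)      # %>%, count(), anti_join(), inner_join()
library(tidytext)   # unnest_tokens(), stop_words, get_sentiments()
# Split the text into one word per row, then drop common stop words
prideWords <- pride %>% unnest_tokens(word, text)
View(prideWords)    # interactive preview of the tokenized words
prideClean <- prideWords %>% anti_join(stop_words)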
## Joining, by = "word"
prideClean %>% count(word, sort = TRUE)
## # A tibble: 6,009 x 2
##    word          n
##    <chr>     <int>
##  1 elizabeth   597
##  2 darcy       373
##  3 bennet      294
##  4 miss        283
##  5 jane        264
##  6 bingley     257
##  7 time        203
##  8 lady        183
##  9 sister      180
## 10 wickham     162
## # … with 5,999 more rows
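
To make the cleaning step concrete, here is a minimal sketch of the same tokenize-then-filter pattern on three lines of the novel's opening. The `toy` data frame is a hypothetical stand-in for `pride`, which I am assuming has one row per line of text in a `text` column.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for `pride`: one row per line of text
toy <- tibble(
  text = c("It is a truth universally acknowledged,",
           "that a single man in possession of a good fortune,",
           "must be in want of a wife.")
)

# unnest_tokens() lowercases the text, strips punctuation,
# and returns one word per row
toy_words <- toy %>% unnest_tokens(word, text)

# anti_join() keeps only the rows whose word is NOT in the stop-word list
toy_clean <- toy_words %>% anti_join(stop_words, by = "word")

toy_clean %>% count(word, sort = TRUE)
```

Passing `by = "word"` explicitly is optional; it only silences the "Joining, by" message that appears in the output above.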

I studied the most common words used in the novel by creating a wordcloud.

library(wordcloud)
## Loading required package: RColorBrewer
prideClean %>% 
  count(word, sort = TRUE) %>% 
  with(wordcloud(word, n, max.words=50))

Then, I looked at the overall sentiment of the novel. To do this, I charted word counts against the “bing” and “afinn” sentiment lexicons. I used the top 20 words for each chart but filtered out the words “miss”, “like”, and “well”.

pride_word_counts_bing <- prideWords %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"

pride_word_counts_afinn <- prideWords %>%
  inner_join(get_sentiments("afinn")) %>%
  count(word, value, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"
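
The plotting code is not shown here, so the following is only a sketch of how a top-20 chart with the three filtered words might be built. The `toy_counts` table and its numbers are made up for illustration; it just mimics the shape of `pride_word_counts_bing` (columns `word`, `sentiment`, `n`), and the names `skip_words` and `top_words` are my own.

```r
library(dplyr)
library(ggplot2)

# Made-up counts in the same shape as pride_word_counts_bing
toy_counts <- tibble(
  word      = c("miss", "well", "like", "good", "happy",
                "pleasure", "poor", "afraid"),
  sentiment = c("negative", "positive", "positive", "positive", "positive",
                "positive", "negative", "negative"),
  n         = c(60, 50, 40, 30, 25, 20, 15, 10)
)

# Words to leave out of the chart
skip_words <- c("miss", "like", "well")

top_words <- toy_counts %>%
  filter(!word %in% skip_words) %>%            # drop the three filtered words
  slice_max(n, n = 20, with_ties = FALSE) %>%  # keep at most the top 20 by count
  mutate(word = reorder(word, n))              # order the bars by count

# Horizontal bar chart, colored by sentiment
p <- ggplot(top_words, aes(x = n, y = word, fill = sentiment)) +
  geom_col() +
  labs(x = "Count", y = NULL)
p
```

For the afinn version, the same pattern should work with `value` in place of `sentiment`.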

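The two charts show slightly different results because the lexicons work differently: bing labels each word as positive or negative, while afinn gives each word a numeric score from -5 to 5. Overall, though, the sentiment in the novel appears to be positive.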
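
A toy example shows how the two approaches can disagree on the same words. The mini lexicons below are made up for illustration (they only copy the column layout of `get_sentiments("bing")` and `get_sentiments("afinn")`), and the scores are not the real lexicon values.

```r
library(dplyr)

# Made-up mini lexicons in the same shape as the real ones
toy_bing <- tibble(
  word      = c("happy", "pleasure", "afraid"),
  sentiment = c("positive", "positive", "negative")
)
toy_afinn <- tibble(
  word  = c("happy", "pleasure", "afraid"),
  value = c(3, 3, -1)
)

# Toy word list; "elizabeth" is in neither lexicon, so inner_join() drops it
toy_words <- tibble(
  word = c("happy", "pleasure", "afraid", "afraid", "afraid", "elizabeth")
)

# bing style: every matched word is one vote, positive or negative
bing_net <- toy_words %>%
  inner_join(toy_bing, by = "word") %>%
  summarise(net = sum(sentiment == "positive") - sum(sentiment == "negative")) %>%
  pull(net)

# afinn style: every matched word adds its numeric score
afinn_net <- toy_words %>%
  inner_join(toy_afinn, by = "word") %>%
  summarise(net = sum(value)) %>%
  pull(net)

bing_net   # two positive votes minus three negative votes
afinn_net  # 3 + 3 - 1 - 1 - 1
```

Here the vote count leans negative while the summed scores lean positive, because a few strongly scored words outweigh several mild ones.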

Then, I began comparing the words used with the two characters, Elizabeth and Fitzwilliam Darcy, expecting to see more positive or feminist words used with Elizabeth. This proved more difficult than I expected because Fitzwilliam is commonly referred to as Darcy, and his sister, Miss Darcy, is fairly present in the novel. This led to a bit of overlap, so I had to use my knowledge of the book to make assumptions about which character each word referred to. (The only bigram for “fitzwilliam” was “colonel fitzwilliam”.)

# Split the text into overlapping two-word sequences (bigrams)
pride_bigrams <- pride %>% unnest_tokens(bigram, text, token = "ngrams", n = 2)
pride_bigrams %>% 
  count(bigram, sort = TRUE)
## # A tibble: 54,998 x 2
##    bigram       n
##    <chr>    <int>
##  1 of the     464
##  2 to be      443
##  3 in the     382
##  4 i am       302
##  5 of her     260
##  6 to the     252
##  7 it was     251
##  8 mr darcy   243
##  9 of his     234
## 10 she was    209
## # … with 54,988 more rows
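library(tidyr)   # separate(), unite()
# Split each bigram into two columns so each word can be filtered on its own
bigrams_separated <- pride_bigrams %>% 
  separate(bigram, c("word1", "word2"), sep = " ")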

bigrams_filtered <- bigrams_separated %>% 
  filter(!word1 %in% stop_words$word) %>% 
  filter(!word2 %in% stop_words$word)

bigram_counts <- bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)

bigram_counts
## # A tibble: 5,922 x 3
##    word1   word2           n
##    <chr>   <chr>       <int>
##  1 lady    catherine     100
##  2 miss    bingley        72
##  3 miss    bennet         60
##  4 sir     william        38
##  5 de      bourgh         35
##  6 miss    darcy          34
##  7 colonel forster        26
##  8 colonel fitzwilliam    25
##  9 cried   elizabeth      24
## 10 miss    lucas          23
## # … with 5,912 more rows
# Keep only the bigrams whose second word is "elizabeth"
bigrams_elizabeth <- bigrams_filtered %>% 
  filter(word2 == "elizabeth") %>% 
  unite(bigram, word1, word2, sep = " ")

bigrams_elizabeth %>% 
    count(bigram, sort = TRUE)
## # A tibble: 170 x 2
##    bigram                        n
##    <chr>                     <int>
##  1 cried elizabeth              24
##  2 replied elizabeth            18
##  3 miss elizabeth               12
##  4 cousin elizabeth              3
##  5 daughter elizabeth            3
##  6 evening elizabeth             3
##  7 longbourn elizabeth           3
##  8 sister elizabeth              3
##  9 collins elizabeth             2
## 10 congratulations elizabeth     2
## # … with 160 more rows
bigrams_darcy <- bigrams_filtered %>% 
  filter(word2 == "darcy") %>% 
  unite(bigram, word1, word2, sep = " ")

bigrams_darcy %>% 
  count(bigram, sort = TRUE)
## # A tibble: 29 x 2
##    bigram                n
##    <chr>             <int>
##  1 miss darcy           34
##  2 replied darcy         6
##  3 cried darcy           2
##  4 fitzwilliam darcy     2
##  5 added darcy           1
##  6 affairs darcy         1
##  7 anne darcy            1
##  8 appeared darcy        1
##  9 applied darcy         1
## 10 beautiful darcy       1
## # … with 19 more rows
View(bigrams_darcy)
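
One possible way to reduce the overlap between Darcy and Miss Darcy is to drop titles and name parts from `word1` before comparing the two characters. This is only a sketch: the rows are copied from the bigram output above, while the `name_words` list and the `share` column are my own assumptions and not part of the original analysis.

```r
library(dplyr)

# A few rows copied from the bigram output above
toy_bigrams <- tibble(
  word1 = c("miss", "replied", "cried", "fitzwilliam", "beautiful",
            "cried", "replied", "miss"),
  word2 = c("darcy", "darcy", "darcy", "darcy", "darcy",
            "elizabeth", "elizabeth", "elizabeth"),
  n     = c(34, 6, 2, 2, 1, 24, 18, 12)
)

# Assumed list of titles and name parts: they identify who is meant
# rather than describe the character
name_words <- c("miss", "mr", "mrs", "lady", "colonel", "fitzwilliam")

descriptors <- toy_bigrams %>%
  filter(word2 %in% c("darcy", "elizabeth"),
         !word1 %in% name_words) %>%   # removes "miss darcy", "miss elizabeth", ...
  group_by(word2) %>%
  mutate(share = n / sum(n)) %>%       # share of each word within a character
  ungroup()

descriptors
```

Comparing shares rather than raw counts helps because Elizabeth's name appears far more often than Darcy's in this position.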

Overall, the sentiment towards the two characters was fairly similar. The novel seemed to take a fairly positive view of Darcy, with only a few negative adjectives like “horrible” appearing next to his name. Elizabeth, however, was paired with words like “cried” and “dear”, which seem to demean her a bit.

It appears that my hypothesis that Elizabeth is portrayed as a “modern feminist” was somewhat disproven. Although not many negative phrases were associated with her name, she was still given the typical “soft adjectives”.