Analyzing Jane Austen and Charlotte Bronte

Author

Ella Kucera

Comparing Jane Austen’s and Charlotte Bronte’s earlier novels with their later novels to see whether their perspectives on love change.

For this assignment I chose Sense and Sensibility and Emma by Jane Austen. Sense and Sensibility, one of her first novels, follows three sisters and their widowed mother, and the story revolves around love, marriage, class, society and gender roles. Emma is about a cheeky young matchmaker whose story teaches us about love and about not meddling in other people’s love lives. I wanted to compare Jane Austen to Charlotte Bronte because both were nineteenth-century English novelists (Austen wrote during the Regency era, Bronte in the early Victorian era) and many of their stories share the same themes: love, marriage, growing up as a woman, class and societal issues, and relationships between siblings and friends. Shirley, Charlotte Bronte’s later novel, is about two women who have fallen into difficult circumstances. Jane Eyre is Charlotte Bronte’s most famous novel and one of her first to be published. It is about the protagonist experiencing adulthood and love and growing up as a young woman during a difficult time.

I predict that the earlier novels, Sense and Sensibility by Jane Austen and Jane Eyre by Charlotte Bronte, will have more optimistic views on love and womanhood, and that the later novels, Emma and Shirley, will have more pessimistic views. My first step in this project was to load the packages I use in RStudio.

Jane Austen, Emma; Jane Austen, Sense and Sensibility; Charlotte Bronte, Shirley; Charlotte Bronte, Jane Eyre

library(tidyverse)   # dplyr, tidyr, readr and ggplot2 functions used below
library(textdata)
library(tidytext)
library(gutenbergr)
library(janeaustenr)
library(wordcloud2)

Once that was done, I imported my data and cleaned the text: the Austen novels come from the janeaustenr package, and the Bronte novels come from Project Gutenberg text files.

# Sense and Sensibility comes from the janeaustenr package
as.data.frame(sensesensibility) -> sense_and_sensibility
colnames(sense_and_sensibility)[1] <- "text"

# One row per word, labelled with the book title
sense_and_sensibility |> 
  unnest_tokens(word, text) |> 
  mutate(book = "Sense & Sensibility") -> sense_and_sensibility_words

# Drop common stop words
sense_and_sensibility_words |> 
  anti_join(stop_words) -> sense_and_sensibility_cleaned

# Drop a few leftover words by hand
sense_and_sensibility_cleaned |> 
  filter(!word %in% c("by", "1", "was", "no", "in", "the", "of", "they", "as", 
         "a", "to", "so", "who", "all", "chapter")) -> sense_and_sensibility_cleaned2

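as.data.frame(emma) -> emma_words
colnames(emma_words)[1] <- "text"

# Split into one word per row and label the book, as for the other novels
emma_words |> 
  unnest_tokens(word, text) |> 
  mutate(book = "Emma") -> emma_words

emma_words |> 
  anti_join(stop_words) -> emma_words_cleaned

emma_words_cleaned |> 
  filter(!word %in% c("chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they", 
                      "so", "ago")) -> emma_words_cleaned2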

shirley <- read_csv("shirley.txt", col_names = FALSE)
as.data.frame(shirley) -> shirley_words
colnames(shirley_words)[1] <- "text"

shirley_words |> 
  unnest_tokens(word, text) |> 
  mutate(book = "Shirley") -> shirley_words

shirley_words |> 
  anti_join(stop_words) -> shirley_words_cleaned

shirley_words_cleaned |> 
  filter(!word %in% c("chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they", 
                      "so", "ago", "volume", "copy", "ebook", "gutenberg", "project", "license", "title", "author", 
                      "updated", "2021", "illustration", "language", "english", "28", "terms", "laws", "restriction", "www.gutenberg.org", 
                      "world", "united", "check", "cost")) -> shirley_words_cleaned2


janeeyre <- read_csv("janeeyre.txt", col_names = FALSE)

as.data.frame(janeeyre) -> janeeyre_words
colnames(janeeyre_words)[1] <- "text"


janeeyre_words |> 
  unnest_tokens(word, text) |> 
  mutate(book = "Jane Eyre") -> janeeyre_words

janeeyre_words |> 
  anti_join(stop_words) -> janeeyre_words_cleaned

janeeyre_words_cleaned |> 
  filter(!word %in% c("chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they", 
                      "so","ago", "volume", "copy", "ebook", "gutenberg", "project", "license", "title", "author", 
                      "updated", "2021", "illustration", "language", "english", "28", "terms", "laws", "restriction", "www.gutenberg.org", 
                      "world", "united", "check", "cost")) -> janeeyre_words_cleaned2

After filtering and cleaning up my original data, I wanted to see what the most common words were throughout each book.

# Assumed input: janeeyre_words_cleaned2 (the pipeline's first line is not shown in the source)
janeeyre_words_cleaned2 |> 
  anti_join(stop_words) |> 
  count(word) |> 
  filter(!word %in% c("Charlotte", "Bronte", "Jane", "Austen", "don't", "didn't", "it's", "it", "is", "the", "and", "chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they", 
                      "so", "ago", "volume", "copy", "jane", "emma", "Emma", "caroline", "elinor", "marianne", "moore", "sir")) |> 
  arrange(desc(n)) |> 
  head(50) |> 
  wordcloud2()

# Assumed input: emma_words_cleaned2 (the pipeline's first line is not shown in the source)
emma_words_cleaned2 |> 
  anti_join(stop_words) |> 
  count(word) |> 
  filter(!word %in% c("Charlotte", "Bronte", "Jane", "Austen", "don't", "didn't", "it's", "it", "is", "the", "and", "chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they", 
                      "so", "ago", "volume", "copy", "jane", "emma", "Emma", "caroline", "elinor", "marianne", "moore", "sir")) |> 
  arrange(desc(n)) |> 
  head(20) |> 
  wordcloud2()

# Assumed input: shirley_words_cleaned2 (the pipeline's first line is not shown in the source)
shirley_words_cleaned2 |> 
  anti_join(stop_words) |> 
  count(word) |> 
  filter(!word %in% c("Charlotte", "Bronte", "Jane", "Austen", "don't", "didn't", "it's", "it", "is", "the", "and", "chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they", 
                      "so", "ago", "volume", "copy", "jane", "emma", "Emma", "caroline", "elinor", "marianne", "moore", "sir")) |> 
  arrange(desc(n)) |> 
  head(20) |> 
  wordcloud2()

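# Assumed input: sense_and_sensibility_cleaned2 (the pipeline's first line is not shown in the source)
sense_and_sensibility_cleaned2 |> 
  anti_join(stop_words) |> 
  count(word) |> 
  filter(!word %in% c("Jane", "Austen", "don't", "didn't", "it's", "it", "is", "the", "and", "chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they")) |> 
  arrange(desc(n)) |> 
  head(20) |> 
  wordcloud2()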

This helped me see how positive or negative the most frequent words were in each book on its own. ‘Love’ was among the most common words in every book except Emma.
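The counting behind each word cloud can be sketched on a single sentence. This is a minimal base R illustration, not the code used above: the sentence and the three-word `toy_stop_words` list are made up, standing in for the novels and for tidytext’s `stop_words`.

```r
# Made-up sample sentence, standing in for a line of a novel
text <- "Love is love, and the heart knows the heart."

# Lower-case the text and split it into words on anything that is not a letter or apostrophe
words <- strsplit(tolower(text), "[^a-z']+")[[1]]
words <- words[words != ""]

# Tiny stand-in stop-word list (hypothetical; the analysis uses tidytext's stop_words)
toy_stop_words <- c("is", "and", "the")

# Remove stop words, then count how often each remaining word appears
kept <- words[!words %in% toy_stop_words]
counts <- sort(table(kept), decreasing = TRUE)
print(counts)
```

The word clouds do the same thing at the scale of a whole book and then draw the top 20 or 50 counts.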

Next, I split each book’s text into bigrams (pairs of consecutive words).

# Raw Jane Eyre lines; X1 is the column name read_csv() assigns
janeeyre |> 
  unnest_tokens(bigram, X1, token="ngrams", n=2) -> je_bigrams

je_bigrams %>% 
  separate(bigram, c("word1", "word2"), sep = " ") -> je_bigrams

je_bigrams %>%  
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>% 
  filter(!word1 %in% NA) %>% 
  filter(!word2 %in% NA) -> je_bigrams

je_bigrams %>% 
  count(word1, word2, sort = TRUE)

as.data.frame(emma) -> emma

emma |> 
  unnest_tokens(bigram, emma, token="ngrams", n=2) -> emma_bigrams

emma_bigrams %>% 
  separate(bigram, c("word1", "word2"), sep = " ") -> emma_bigrams

emma_bigrams %>%  
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>% 
  filter(!word1 %in% NA) %>% 
  filter(!word2 %in% NA) -> emma_bigrams

emma_bigrams %>% 
  count(word1, word2, sort = TRUE)


shirley |> 
  unnest_tokens(bigram, X1, token="ngrams", n=2) -> shirley_bigrams

shirley_bigrams %>% 
  separate(bigram, c("word1", "word2"), sep = " ") -> shirley_bigrams

shirley_bigrams %>%  
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>% 
  filter(!word1 %in% NA) %>% 
  filter(!word2 %in% NA) -> shirley_bigrams

shirley_bigrams %>% 
  count(word1, word2, sort = TRUE)

# Sense and Sensibility lines, in the text column created earlier
sense_and_sensibility |> 
  unnest_tokens(bigram, text, token="ngrams", n=2) -> s_bigrams

s_bigrams %>% 
  separate(bigram, c("word1", "word2"), sep = " ") -> s_bigrams

s_bigrams %>%  
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>% 
  filter(!word1 %in% NA) %>% 
  filter(!word2 %in% NA) -> s_bigrams

s_bigrams %>% 
  count(word1, word2, sort = TRUE)
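
What the bigram steps do can be seen on one short sentence. The sketch below uses only base R and is an approximation of the idea, not of tidytext’s exact tokenizer; the sample sentence and the `toy_stop_words` list are stand-ins chosen for illustration.

```r
# Short sample sentence, standing in for a line of a novel
sentence <- "It is a truth universally acknowledged"

# Lower-case and split into words
words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
words <- words[words != ""]

# Pair each word with the word that follows it
bigrams <- data.frame(
  word1 = head(words, -1),
  word2 = tail(words, -1)
)

# Tiny stand-in stop-word list (hypothetical; the analysis uses tidytext's stop_words)
toy_stop_words <- c("it", "is", "a")

# Keep a bigram only when neither of its words is a stop word
kept <- bigrams[!(bigrams$word1 %in% toy_stop_words) &
                !(bigrams$word2 %in% toy_stop_words), ]
print(kept)
```

Six words give five overlapping pairs, and filtering on both `word1` and `word2` removes every pair that contains a stop word.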

After creating bigrams for each book, I was ready to merge the cleaned word tables for all four books.

emma_words_cleaned2 |> 
  full_join(shirley_words_cleaned2) |> 
  full_join(janeeyre_words_cleaned2) |> 
  full_join(sense_and_sensibility_cleaned2) -> merged_1

After merging the word tables, I wanted to see how the books looked together and which words were most common across all of them.

merged_1 |> 
  anti_join(stop_words) |> 
  count(word) |> 
  filter(!word %in% c("Charlotte", "Bronte", "Jane", "Austen")) |> 
  arrange(desc(n)) |> 
  head(20) |> 
  wordcloud2()

The most common words across all four books are ‘love’, ‘miss’, ‘time’, ‘mind’ and ‘heart’. This suggests that love and romance form the most common theme across the stories, despite the other, larger themes and messages the books carry.

# Average AFINN score of the scored words in each book
merged_1 |> 
  inner_join(get_sentiments("afinn")) |> 
  group_by(book) |> 
  summarize(average = mean(value)) |> 
  ggplot(aes(reorder(book, average), average, fill = book)) + 
  geom_col() + 
  coord_flip()

The graph tells me that Jane Eyre and Shirley are the most negative of the four books, and that Sense and Sensibility and Emma are the least negative. This result only partly supports my prediction: I expected Shirley to have more pessimistic views on love, but not Jane Eyre.
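
Because the prediction is about each author’s earlier versus later novel, one way to check it directly is to put the two averages side by side per author. The following is only a sketch in base R: the numbers are made up for illustration and are not the values behind the graph, and the names `averages`, `period` and `change` are hypothetical.

```r
# Made-up average sentiment scores, for illustration only (not results from this analysis)
averages <- data.frame(
  author  = c("Author A", "Author A", "Author B", "Author B"),
  period  = c("early", "late", "early", "late"),
  average = c(0.50, 0.25, 0.00, -0.25)
)

# One row per author, with the early and late averages side by side
early <- averages[averages$period == "early", c("author", "average")]
late  <- averages[averages$period == "late",  c("author", "average")]
comparison <- merge(early, late, by = "author", suffixes = c("_early", "_late"))

# A negative change means the later novel scored lower than the earlier one
comparison$change <- comparison$average_late - comparison$average_early
print(comparison)
```

With the real per-book averages in place of the made-up ones, a negative `change` for an author would match the prediction that the later novel is more pessimistic.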