Analyzing Jane Austen and Charlotte Bronte
Comparing Jane Austen and Charlotte Bronte’s earlier novels to their later novels to see if their perspective on love changes.
For this assignment I chose Sense and Sensibility and Emma by Jane Austen. Sense and Sensibility, one of her first novels, follows three sisters and their widowed mother, and its story revolves around love, marriage, class, society and gender roles. Emma is about a cheeky young matchmaker who teaches us about love and about not meddling in other people’s love lives. I wanted to compare Jane Austen to Charlotte Bronte because both were nineteenth-century English novelists (Austen in the Regency era, Bronte in the early Victorian era) and many of their stories revolve around the same themes: love, marriage, growing up as a woman, class and societal issues, as well as sibling and friendship relationships. Shirley, Charlotte Bronte’s later novel, is about two women who have fallen into difficult circumstances. Jane Eyre is Charlotte Bronte’s most famous novel and one of the first she published. It’s about the protagonist experiencing adulthood, love and growing up as a young woman during a difficult time.
I predict that Sense and Sensibility by Jane Austen and Jane Eyre by Charlotte Bronte will have more optimistic views on love and womanhood, and that their later novels, Emma and Shirley, will have more pessimistic views on love and womanhood. My first step in this project was to load all the packages I use in RStudio.
Images: Jane Austen, Emma; Jane Austen, Sense and Sensibility; Charlotte Bronte, Shirley; Charlotte Bronte, Jane Eyre
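# tidyverse supplies read_csv(), mutate(), separate(), ggplot() and the pipes used below
library(tidyverse)
library(textdata)
library(tidytext)
library(gutenbergr)
library(janeaustenr)
library(wordcloud2)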
Once that was completed I began to import my data and clean the text files from Gutenberg.
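Each book goes through the same steps: split the text into words, label the book, then remove stop words and leftover noise. Here is a minimal sketch of that pattern, assuming a hypothetical helper named `clean_book()` (it is not part of my original code) and two toy lines standing in for a whole novel:

```r
library(dplyr)
library(tidytext)

# Hypothetical helper (an illustration, not the code used in this project):
# tokenize one book, label it, and drop stop words plus extra noise words.
clean_book <- function(lines, title, extra_words = c("chapter")) {
  tibble(text = lines) |>
    unnest_tokens(word, text) |>            # one lowercase word per row
    mutate(book = title) |>                 # remember which book the word came from
    anti_join(stop_words, by = "word") |>   # remove common English stop words
    filter(!word %in% extra_words)          # remove book-specific noise
}

# Toy input standing in for a real novel
sample_lines <- c("CHAPTER 1",
                  "The family of Dashwood had long been settled in Sussex.")
cleaned <- clean_book(sample_lines, "Sense & Sensibility")
cleaned
```

Keeping the `book` column on every table is what lets the books be compared after they are merged.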
as.data.frame(sensesensibility) -> sense_and_sensibility
colnames(sense_and_sensibility)[1] <- "text"
sense_and_sensibility|>
unnest_tokens(word, text) |>
mutate(book = "Sense & Sensibility") -> sense_and_sensibility_words
sense_and_sensibility_words |>
anti_join(stop_words) -> sense_and_sensibility_cleaned
sense_and_sensibility_cleaned |>
filter(!word %in% c("by", "1", "was", "no", "in", "the", "of", "they", "as",
"a", "to", "so", "who", "all", "chapter") ) -> sense_and_sensibility_cleaned2
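as.data.frame(emma) -> emma_words
colnames(emma_words)[1] <- "text"
# split the text into one word per row and label the book, as for the other novels
emma_words |>
unnest_tokens(word, text) |>
mutate(book = "Emma") -> emma_words
emma_words |>
anti_join(stop_words) -> emma_words_cleaned
emma_words_cleaned |>
filter(!word %in% c("chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they",
"so", "ago")) -> emma_words_cleaned2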
shirley <- read_csv("shirley.txt", col_names = FALSE)
as.data.frame(shirley) -> shirley_words
colnames(shirley_words)[1] <- "text"
shirley_words |>
unnest_tokens(word, text) |>
mutate(book = "shirley") -> shirley_words
shirley_words |>
anti_join(stop_words) -> shirley_words_cleaned
shirley_words_cleaned |>
filter(!word %in% c("chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they",
"so","ago", "volume", "copy", "ebook", "gutenberg", "project", "license", "title", "author",
"updated", "2021", "illustration", "language", "english", "28", "terms", "laws", "restriction", "www.gutenberg.org",
"world", "united", "check", "cost")) -> shirley_words_cleaned2
janeeyre <- read_csv("janeeyre.txt", col_names = FALSE)
as.data.frame(janeeyre) -> janeeyre_words
colnames(janeeyre_words)[1] <- "text"
janeeyre_words |>
unnest_tokens(word, text) |>
mutate(book = "Janeeyre") -> janeeyre_words
janeeyre_words |>
anti_join(stop_words) -> janeeyre_words_cleaned
janeeyre_words_cleaned |>
filter(!word %in% c("chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they",
"so","ago", "volume", "copy", "ebook", "gutenberg", "project", "license", "title", "author",
"updated", "2021", "illustration", "language", "english", "28", "terms", "laws", "restriction", "www.gutenberg.org",
"world", "united", "check", "cost")) -> janeeyre_words_cleaned2
After filtering and cleaning up my original data, I wanted to see what the most common words were throughout each book.
anti_join(stop_words) |>
count(word) |>
filter(!word %in% c ("Charlotte", "Bronte", "Jane", "Austen", "don't", "didn't", "it's", "it", "is", "the", "and","chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they",
"so","ago", "volume", "copy", 'jane', 'emma', "Emma", "caroline", "elinor", 'marianne', 'moore', 'sir' )) |>
arrange(desc(n)) |>
head(50) |>
wordcloud2()
anti_join(stop_words) |>
count(word) |>
filter(!word %in% c ("Charlotte", "Bronte", "Jane", "Austen", "don't", "didn't", "it's", "it", "is", "the", "and","chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they",
"so","ago", "volume", "copy", 'jane', 'emma', "Emma", "caroline", "elinor", 'marianne', 'moore', 'sir' )) |>
arrange(desc(n)) |>
head(20) |>
wordcloud2()
anti_join(stop_words) |>
count(word) |>
filter(!word %in% c ("Charlotte", "Bronte", "Jane", "Austen", "don't", "didn't", "it's", "it", "is", "the", "and","chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they",
"so","ago", "volume", "copy", 'jane', 'emma', "Emma", "caroline", "elinor", 'marianne', 'moore', 'sir' )) |>
arrange(desc(n)) |>
head(20) |>
wordcloud2()
anti_join(stop_words) |>
count(word) |>
filter(!word %in% c ( "Jane", "Austen", "don't", "didn't", "it's", "it", "is", "the", "and","chapter", "the", "to", "a", "as", "of", "all", "who", "no", "they")) |>
arrange(desc(n)) |>
head(20) |>
wordcloud2()
Looking at each book on its own helped me see how positive or negative its most common words were. ‘Love’ was a common word in every book except Emma.
Next, I split the text of each book into bigrams (pairs of consecutive words).
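The counting step behind each word cloud can be looked at on its own. The sketch below uses a made-up table of six tokens (not data from the novels) and `count(sort = TRUE)`, which does the same job as `count()` followed by `arrange(desc(n))`:

```r
library(dplyr)

# Toy tokens standing in for one cleaned book (hypothetical data)
words <- tibble(word = c("love", "heart", "love", "time", "love", "heart"))

top_words <- words |>
  count(word, sort = TRUE) |>  # n = number of occurrences, largest first
  slice_head(n = 2)            # keep the two most frequent words

top_words
```

A table in this shape, with a word column followed by a count column, is what gets passed to `wordcloud2()`.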
janeeyre |>
unnest_tokens(bigram, X1, token="ngrams", n=2) -> je_bigrams
je_bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ") -> je_bigrams
je_bigrams %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) %>%
filter(!word1 %in% NA) %>%
filter(!word2 %in% NA) -> je_bigrams
je_bigrams %>%
count(word1, word2, sort = TRUE)
as.data.frame(emma) -> emma
emma |>
unnest_tokens(bigram, emma, token="ngrams", n=2) -> emma_bigrams
emma_bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ") -> emma_bigrams
emma_bigrams %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) %>%
filter(!word1 %in% NA) %>%
filter(!word2 %in% NA) -> emma_bigrams
emma_bigrams %>%
count(word1, word2, sort = TRUE)
shirley |>
unnest_tokens(bigram, X1, token="ngrams", n=2) -> shirley_bigrams
shirley_bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ") -> shirley_bigrams
shirley_bigrams %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) %>%
filter(!word1 %in% NA) %>%
filter(!word2 %in% NA) -> shirley_bigrams
shirley_bigrams %>%
count(word1, word2, sort = TRUE)
sense_and_sensibility |>
unnest_tokens(bigram, text, token="ngrams", n=2) -> s_bigrams
s_bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ") -> s_bigrams
s_bigrams %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) %>%
filter(!word1 %in% NA) %>%
filter(!word2 %in% NA) -> s_bigrams
s_bigrams %>%
count(word1, word2, sort = TRUE)
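The four bigram blocks above all follow the same recipe, so they could be folded into one function. This is only a sketch: `count_bigrams()` is a hypothetical name, and the input is two toy lines rather than a novel.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical helper wrapping the bigram steps used for each book
count_bigrams <- function(lines) {
  tibble(text = lines) |>
    unnest_tokens(bigram, text, token = "ngrams", n = 2) |>  # pairs of consecutive words
    filter(!is.na(bigram)) |>             # lines with fewer than two words may give NA
    separate(bigram, c("word1", "word2"), sep = " ") |>
    filter(!word1 %in% stop_words$word,   # drop pairs where either word is a stop word
           !word2 %in% stop_words$word) |>
    count(word1, word2, sort = TRUE)      # most frequent pairs first
}

bigrams <- count_bigrams(c("Jane Fairfax met Jane Fairfax", "Chapter"))
bigrams
```

Bigrams are built within each line of text, so a pair of words that is split across two lines of the source file is not counted.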
After creating separate bigrams for each book, I was ready to merge all the books together.
emma_words_cleaned2 |>
full_join(shirley_words_cleaned2) |>
full_join(janeeyre_words_cleaned2) |>
full_join(sense_and_sensibility_cleaned2) -> merged_1
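Since every cleaned table has the same `word` and `book` columns, stacking them is another way to combine the books. The sketch below swaps in `bind_rows()` for `full_join()` and uses two tiny made-up tables; it is an alternative for illustration, not the code that produced `merged_1`.

```r
library(dplyr)

# Toy stand-ins for two cleaned books (hypothetical data)
book_a <- tibble(word = c("love", "heart"), book = "Emma")
book_b <- tibble(word = c("love", "storm"), book = "Janeeyre")

# bind_rows() stacks the tables: every row is kept and `book` records its source
merged <- bind_rows(book_a, book_b)

per_book <- count(merged, book)  # number of words contributed by each book
per_book
```

Stacking avoids matching rows on shared columns, so it behaves the same even if a table is missing the `book` label.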
After merging all the books together, I wanted to see how they looked side by side and what the most common words were across all of them.
merged_1 |>
anti_join(stop_words) |>
count(word) |>
filter(!word %in% c ("Charlotte", "Bronte", "Jane", "Austen")) |>
arrange(desc(n)) |>
head(20) |>
wordcloud2()
The most common words throughout all four books are ‘love’, ‘miss’, ‘time’, ‘mind’ and ‘heart’. This suggests that love and romance are the most common theme across all the stories, despite the other larger themes and messages the books follow. Finally, I scored each word with the AFINN sentiment lexicon and averaged the scores for each book.
merged_1 |>
inner_join(get_sentiments('afinn')) |>
group_by(book) |>
summarize(average = mean(value)) |>
ggplot(aes(reorder(book, average), average, fill = book)) +
geom_col() +
coord_flip()
The graph shows that Jane Eyre and Shirley are the most negative of the four books, while Sense and Sensibility and Emma are the least negative. This only partly supports my prediction: I expected Shirley to have more pessimistic views on love, but not Jane Eyre, and I expected Emma to be more pessimistic than it turned out to be.
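To show what the average in the graph measures, here is a small sketch of the same join-and-average steps. The lexicon is a tiny made-up table in the same shape as AFINN (columns `word` and `value`), and the scores and book labels are illustrative assumptions only.

```r
library(dplyr)

# Toy word table for two hypothetical books
words <- tibble(
  word = c("love", "happy", "grief", "love", "misery", "tea"),
  book = c("A", "A", "A", "B", "B", "B")
)

# Made-up lexicon in the same shape as AFINN; scores are illustrative only
toy_lexicon <- tibble(
  word  = c("love", "happy", "grief", "misery"),
  value = c(3, 3, -2, -2)
)

averages <- words |>
  inner_join(toy_lexicon, by = "word") |>  # words missing from the lexicon drop out
  group_by(book) |>
  summarize(average = mean(value))         # mean score of the matched words only

averages
```

Because of the inner join, the average covers only words that appear in the lexicon. A word like "tea" has no score, so it does not pull a book's average toward zero.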