assignment_10A

Approach

Set Up

library(tidytext)

Warning: package 'tidytext' was built under R version 4.5.3

library(janeaustenr)

Warning: package 'janeaustenr' was built under R version 4.5.3

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(stringr)
library(tidyr)
library(ggplot2)

Source Code

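# Tokenise the Austen novels into one word per row, keeping each word's
# line number and chapter within its book.
tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    # Running count of chapter headings, e.g. "Chapter 1" or "CHAPTER IV"
    chapter = cumsum(str_detect(
      text,
      regex("^chapter [\\divxlc]", ignore_case = TRUE)
    ))
  ) %>%
  ungroup() %>%
  unnest_tokens(word, text)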

# Net Bing sentiment (positive minus negative word counts) per 80-line block
jane_austen_sentiment <- tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(book, index = linenumber %/% 80, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)

Joining with `by = join_by(word)`

Warning in inner_join(., get_sentiments("bing")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 435434 of `x` matches multiple rows in `y`.
ℹ Row 5051 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

# One panel per novel: net sentiment across the 80-line blocks
ggplot(jane_austen_sentiment, aes(index, sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, ncol = 2, scales = "free_x")
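
The many-to-many warning from the join suggests that some words appear more than once in the Bing lexicon. The following is a minimal sketch of how that could be checked, and of the `relationship` argument the warning itself recommends. The names `dup_words`, `toy_tokens`, and `toy_joined` are hypothetical and not part of the original analysis; `toy_tokens` is a small stand-in for `tidy_books`.

```r
library(dplyr)
library(tidytext)

bing <- get_sentiments("bing")

# Words with more than one entry in the lexicon; these are what make a
# repeated word in the text match several lexicon rows.
dup_words <- bing %>%
  count(word, name = "n_entries") %>%
  filter(n_entries > 1)

# Hypothetical toy tokens standing in for tidy_books
toy_tokens <- tibble(word = c("good", "bad", "the"))

# Stating the join key and the expected relationship explicitly avoids both
# the "Joining with" message and the many-to-many warning.
toy_joined <- toy_tokens %>%
  inner_join(bing, by = "word", relationship = "many-to-many")
```

Printing `dup_words` shows which entries are involved. Whether to keep both rows for those words or to drop them before joining is a judgment call that this sketch leaves open.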

Citations

(Silge et al. 2022; Yan 2021)

Implementation

I’ll use the AFINN lexicon instead of Bing, and I’ll pull the text from Project Gutenberg with the gutenbergr package.

library(gutenbergr)

Warning: package 'gutenbergr' was built under R version 4.5.3

gutenberg_metadata |> filter(str_detect(author, "Vonnegut"))

# A tibble: 2 × 8
  gutenberg_id title     author gutenberg_author_id language gutenberg_bookshelf
         <int> <chr>     <chr>                <int> <fct>    <chr>              
1        21279 2 B R 0 … Vonne…                9812 en       Science Fiction/Ca…
2        30240 The Big … Vonne…                9812 en       Science Fiction/Ca…
# ℹ 2 more variables: rights <fct>, has_text <lgl>

vonnegut <- gutenberg_download(30240)

Using mirror https://aleph.pglaf.org.

head(vonnegut)

# A tibble: 6 × 2
  gutenberg_id text                   
         <int> <chr>                  
1        30240 " THE BIG TRIP"        
2        30240 "       UP YONDER"     
3        30240 ""                     
4        30240 "By KURT VONNEGUT, JR."
5        30240 ""                     
6        30240 "Illustrated by KOSSIN"

Next I’ll load the AFINN lexicon, which scores each word with an integer from -5 (most negative) to 5 (most positive).

afinn <- get_sentiments("afinn")
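
One way the AFINN scoring could proceed is sketched below: tokenise the text, join it to the lexicon, and sum the word values within blocks of lines, mirroring the 80-line blocks used for the Austen novels. The helper `score_by_chunk()` and the toy inputs are hypothetical and not part of the original analysis. The toy lexicon only imitates the `word` and `value` columns of AFINN with illustrative scores, so the sketch runs without downloading anything.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: net lexicon score per block of `chunk_size` lines.
# `text_df` needs a `text` column; `lexicon` needs `word` and `value`.
score_by_chunk <- function(text_df, lexicon, chunk_size = 80) {
  text_df %>%
    mutate(linenumber = row_number()) %>%
    unnest_tokens(word, text) %>%
    inner_join(lexicon, by = "word") %>%
    group_by(index = linenumber %/% chunk_size) %>%
    summarise(sentiment = sum(value), .groups = "drop")
}

# Toy stand-ins (illustrative values, not the real AFINN lexicon)
toy_text <- tibble(text = c("a happy and wonderful day",
                            "a terrible awful night"))
toy_lexicon <- tibble(word = c("happy", "wonderful", "terrible", "awful"),
                      value = c(3, 4, -3, -3))

toy_scores <- score_by_chunk(toy_text, toy_lexicon, chunk_size = 1)
```

With the objects defined above, the equivalent call would be `score_by_chunk(vonnegut, afinn)`. For a short story a smaller `chunk_size` may be more informative than 80.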

Codebase

Silge, Julia, Michael Chirico, Patrick O. Perry, and Jeroen Ooms. 2022. Juliasilge/Janeaustenr: Janeaustenr 1.0.0. Zenodo. https://doi.org/10.5281/ZENODO.7026678.

Yan, Jianwei. 2021. “Text Mining with R: A Tidy Approach, by Julia Silge and David Robinson. Sebastopol, CA: O’Reilly Media, 2017. ISBN 978-1-491-98165-8. XI + 184 Pages.” Natural Language Engineering 28 (1): 137–39. https://doi.org/10.1017/s1351324920000649.