assignment_10A

Approach

Set Up

library(tidytext)
Warning: package 'tidytext' was built under R version 4.5.3
library(janeaustenr)
Warning: package 'janeaustenr' was built under R version 4.5.3
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(stringr)
library(tidyr)
library(ggplot2)

Source Code

tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(
      text,
      regex("^chapter [\\divxlc]", ignore_case = TRUE)
    ))
  ) %>%
  ungroup() %>%
  unnest_tokens(word, text)

jane_austen_sentiment <- tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(book, index = linenumber %/% 80, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>% 
  mutate(sentiment = positive - negative)
Joining with `by = join_by(word)`
Warning in inner_join(., get_sentiments("bing")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 435434 of `x` matches multiple rows in `y`.
ℹ Row 5051 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
ggplot(jane_austen_sentiment, aes(index, sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, ncol = 2, scales = "free_x")

Citations

(Silge et al. 2022; Yan 2021)

Implementation

I’ll use the AFINN lexicon instead of Bing, and I’ll pull the text from Project Gutenberg with the gutenbergr package.

library(gutenbergr)
Warning: package 'gutenbergr' was built under R version 4.5.3
gutenberg_metadata |> filter(str_detect(author, "Vonnegut"))
# A tibble: 2 × 8
  gutenberg_id title     author gutenberg_author_id language gutenberg_bookshelf
         <int> <chr>     <chr>                <int> <fct>    <chr>              
1        21279 2 B R 0 … Vonne…                9812 en       Science Fiction/Ca…
2        30240 The Big … Vonne…                9812 en       Science Fiction/Ca…
# ℹ 2 more variables: rights <fct>, has_text <lgl>
vonnegut <- gutenberg_download(30240)
Using mirror https://aleph.pglaf.org.
head(vonnegut)
# A tibble: 6 × 2
  gutenberg_id text                   
         <int> <chr>                  
1        30240 " THE BIG TRIP"        
2        30240 "       UP YONDER"     
3        30240 ""                     
4        30240 "By KURT VONNEGUT, JR."
5        30240 ""                     
6        30240 "Illustrated by KOSSIN"

Next, I’ll load the AFINN lexicon, which scores each word as an integer from -5 (most negative) to 5 (most positive).

afinn <- get_sentiments("afinn")
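
To make the difference between the two lexicons concrete, here is a minimal sketch on a few hand-made rows. The tables `toy_afinn` and `toy_bing`, and the scores and labels in them, are illustrative assumptions rather than values copied from the real lexicons. The point is only that an AFINN-style score sums signed intensities, while a Bing-style score counts positive words minus negative words.

```r
library(dplyr)
library(tibble)

# Hypothetical mini-lexicons: scores and labels are made up for illustration.
toy_afinn <- tibble(
  word  = c("good", "terrible", "happy"),
  value = c(3, -3, 3)
)
toy_bing <- tibble(
  word      = c("good", "terrible", "happy"),
  sentiment = c("positive", "negative", "positive")
)

# A tiny tokenized "text", one word per row.
toy_words <- tibble(word = c("a", "good", "but", "terrible", "terrible", "day"))

# AFINN-style net score: sum of the numeric values of matched words.
afinn_net <- toy_words |>
  inner_join(toy_afinn, by = "word") |>
  summarise(net = sum(value)) |>
  pull(net)

# Bing-style net score: count of positive words minus count of negative words.
bing_net <- toy_words |>
  inner_join(toy_bing, by = "word") |>
  summarise(net = sum(sentiment == "positive") - sum(sentiment == "negative")) |>
  pull(net)
```

Words missing from a lexicon ("a", "but", "day") drop out of the inner join in both cases, so only scored words contribute to either total.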

Codebase

# Transform the Vonnegut text to one word per row, keeping each word's line number and chapter.
df_vonnegut <- vonnegut |>
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(
      text,
      regex("^chapter [\\divxlc]", ignore_case = TRUE)
    ))
  ) |>
  ungroup() |>
  unnest_tokens(word, text)

# inner join the afinn data values per word
vonnegut_sentiment <- df_vonnegut |>
  inner_join(afinn, by = "word") |>
  mutate(index = linenumber %/% 80) |>
  group_by(index) |>
  summarise(
    sentiment = sum(value, na.rm = TRUE),       # net AFINN score
    n_words   = n()                              # words with sentiment scores
  ) |>
  ungroup()

vonnegut_sentiment |>
  ggplot(aes(x = index, y = sentiment)) +
  geom_col(fill = "darkred", alpha = 0.8) +
  geom_smooth(se = FALSE, color = "black") +
  labs(title = "Sentiment Progression in the Vonnegut Book (AFINN)",
       x = "Section (every 80 lines)",
       y = "Net Sentiment Score") +
  theme_minimal()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
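
The `n_words` column is computed in the summary step but not used in the plot. One possible follow-up, sketched here on made-up numbers rather than the actual Vonnegut results, is to divide each section's net score by its count of scored words, so that a section with few scored words (often the last, shorter bucket) is compared on the same footing. The names `toy_lines`, `toy_sections`, and `mean_score` are hypothetical.

```r
library(dplyr)
library(tibble)

# Made-up scored words: the line each appeared on and its AFINN-style value.
toy_lines <- tibble(
  linenumber = c(1, 79, 80, 159, 160),
  value      = c(2, -3, -2, -2, 3)
)

toy_sections <- toy_lines |>
  # Integer division puts lines 1-79 in section 0, 80-159 in section 1, and so on.
  mutate(index = linenumber %/% 80) |>
  group_by(index) |>
  summarise(
    sentiment = sum(value),   # net score for the section
    n_words   = n(),          # number of scored words in the section
    .groups   = "drop"
  ) |>
  # Average score per scored word, so sections of different sizes are comparable.
  mutate(mean_score = sentiment / n_words)
```

Plotting `mean_score` instead of `sentiment` would show the same arc on a per-word scale, though very small sections can then swing widely on a single word.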

Summary

“The Big Trip Up Yonder” is a dark, satirical short story by Vonnegut. With AFINN, the net sentiment dips in the middle of the story, where the family conflict worsens and the tone turns hostile and tense. It rises again near the end, when Lou and Em decide they actually like being in prison because it finally gives them peace and privacy. That shift fits the story well, so AFINN appears to follow the basic emotional arc even though a word-level lexicon cannot recognize satire.

References

Silge, Julia, Michael Chirico, Patrick O. Perry, and Jeroen Ooms. 2022. juliasilge/janeaustenr: janeaustenr 1.0.0. Zenodo. https://doi.org/10.5281/ZENODO.7026678.
Yan, Jianwei. 2021. “Text Mining with R: A Tidy Approach, by Julia Silge and David Robinson. Sebastopol, CA: O’Reilly Media, 2017. ISBN 978-1-491-98165-8. XI + 184 Pages.” Natural Language Engineering 28 (1): 137–39. https://doi.org/10.1017/s1351324920000649.