For this assignment, we are to do a sentiment analysis of a corpus of our choosing.
Packages loaded for this analysis: tidyverse 2.0.0, tidytext, janeaustenr, gutenbergr, and syuzhet.
The example below follows Chapter 2 of Text Mining with R, which covers sentiment analysis. Citation: Silge, Julia, and David Robinson. “Text Mining with R: A Tidy Approach.” O’Reilly Media, Inc., 2017.
| word | value |
|---|---|
| abandon | -2 |
| abandoned | -2 |
| abandons | -2 |
| abducted | -2 |
| abduction | -2 |
| abductions | -2 |
| abhor | -3 |
| abhorred | -3 |
| abhorrent | -3 |
| abhors | -3 |

First 10 of 2,477 rows shown.
I chose a book from Project Gutenberg (URL: https://www.gutenberg.org/). Project Gutenberg is the oldest digital library and hosts much of the world’s literature, focusing on older works whose US copyright has expired. I used the R package “syuzhet” for the analysis. Within the R package, I used the Jockers-Rinker lexicon because it assigns continuous (rather than binary) sentiment scores to words, allowing for more nuanced detection of positive and negative sentiment in text.

I picked Wuthering Heights by Emily Brontë for the sentiment analysis. I’ve read this book before, and it predominantly conveys a tragic theme rather than a happy one. Let’s examine whether the sentiment analysis supports this observation.
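For readers who want to reproduce the lookup and download, a minimal sketch with the gutenbergr package might look like the following. It assumes network access to a Project Gutenberg mirror. The object names `works` and `wuthering` are illustrative, and 768 is the `gutenberg_id` that the catalogue lookup reports for this title.

```r
library(dplyr)
library(gutenbergr)

# Look up the catalogue entry by exact title. This uses the metadata
# bundled with gutenbergr and returns one row per matching work.
works <- gutenberg_works(title == "Wuthering Heights")
works

# Download the full text (needs network access). The result is a tibble
# with one row per line of the book, in a column named `text`.
# 768 is the gutenberg_id reported by the lookup above.
wuthering <- gutenberg_download(768)
```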
| gutenberg_id | title | author | gutenberg_author_id | language | gutenberg_bookshelf |
|---|---|---|---|---|---|
| 768 | Wutherin… | Bront… | 405 | en | Best Books Ever Li… |

Two more variables are not shown: rights and has_text.
| index | sentiment |
|---|---|
| 0 | 1.55 |
| 1 | 0.30 |
| 2 | -0.10 |
| 3 | -4.70 |
| 4 | 6.15 |
| 5 | -0.20 |
The sentiment analysis of “Wuthering Heights” was conducted using the Jockers-Rinker sentiment lexicon. The text was split into segments of 100 lines each, and a sentiment score was calculated for each segment. The overall sentiment across “Wuthering Heights” is consistent with its predominantly tragic theme, as shown by the negative segment scores. Some segments score positive, reflecting the happier moments in the book. A bar graph was used to show how sentiment is distributed across the segments. Several segments in the graph have strongly negative scores, which helps confirm the book’s tragic theme.
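The segment-and-score step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact code behind the table: `score_segments`, `toy_lines`, and `toy_lexicon` are hypothetical names, the toy lexicon values are made up so the example runs without any downloads, and summing word values per segment is an assumed scoring rule.

```r
library(dplyr)
library(tibble)
library(tidytext)
library(ggplot2)

# Hypothetical helper: split lines into fixed-size segments and sum the
# lexicon values of the words in each segment. `lexicon` is assumed to be
# a data frame with columns `word` and `value`.
score_segments <- function(lines, lexicon, segment_size = 100) {
  tibble(text = lines) |>
    # Lines 1..segment_size get index 0, the next block index 1, and so on.
    mutate(index = (row_number() - 1) %/% segment_size) |>
    # One row per lowercased word, keeping the segment index.
    unnest_tokens(word, text) |>
    # Keep only words that appear in the lexicon.
    inner_join(lexicon, by = "word") |>
    group_by(index) |>
    summarise(sentiment = sum(value), .groups = "drop")
}

# Toy inputs (made up for illustration only).
toy_lines <- c(
  "a happy and joyful day",
  "a terrible cruel night",
  "nothing to score here",
  "love and misery"
)
toy_lexicon <- tibble(
  word  = c("happy", "joyful", "terrible", "cruel", "love", "misery"),
  value = c(0.75, 0.5, -1, -0.75, 0.75, -0.25)
)

# Two lines per segment here; the report uses 100 lines per segment.
toy_scores <- score_segments(toy_lines, toy_lexicon, segment_size = 2)

# Bar chart of sentiment by segment; print(sentiment_plot) draws it.
sentiment_plot <- ggplot(toy_scores, aes(x = index, y = sentiment)) +
  geom_col() +
  labs(x = "Segment index", y = "Sentiment score")
```

To apply the same idea to the novel, one would pass the book's lines and a real word/value lexicon in place of the toy inputs. Note that a segment with no lexicon matches is dropped by the join instead of appearing with a score of zero.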