The base code for this assignment is originally from “Text Mining with R: A Tidy Approach” by Julia Silge and David Robinson, Chapter 2: https://www.tidytextmining.com/sentiment.html#sentiment
This assignment focuses on sentiment analysis. To quote the original text, “We can use the tools of text mining to approach the emotional content of text programmatically”.
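The three general-purpose lexicons used in the book, loaded with `get_sentiments("afinn")`, `get_sentiments("bing")`, and `get_sentiments("nrc")`, are printed below. AFINN scores each word from -5 to 5, Bing labels words as positive or negative, and NRC assigns each word one or more categories (positive, negative, and emotions such as anger, fear, joy, sadness, and trust).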
## # A tibble: 2,477 × 2
## word value
## <chr> <dbl>
## 1 abandon -2
## 2 abandoned -2
## 3 abandons -2
## 4 abducted -2
## 5 abduction -2
## 6 abductions -2
## 7 abhor -3
## 8 abhorred -3
## 9 abhorrent -3
## 10 abhors -3
## # ℹ 2,467 more rows
## # A tibble: 6,786 × 2
## word sentiment
## <chr> <chr>
## 1 2-faces negative
## 2 abnormal negative
## 3 abolish negative
## 4 abominable negative
## 5 abominably negative
## 6 abominate negative
## 7 abomination negative
## 8 abort negative
## 9 aborted negative
## 10 aborts negative
## # ℹ 6,776 more rows
## # A tibble: 13,872 × 2
## word sentiment
## <chr> <chr>
## 1 abacus trust
## 2 abandon fear
## 3 abandon negative
## 4 abandon sadness
## 5 abandoned anger
## 6 abandoned fear
## 7 abandoned negative
## 8 abandoned sadness
## 9 abandonment anger
## 10 abandonment fear
## # ℹ 13,862 more rows
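Because NRC can file one word under several categories, joining text against the full NRC table returns one row per word-category pair, which inflates raw counts. A minimal sketch on a hand-copied excerpt of the NRC rows printed above (`nrc_excerpt`, `words`, and the word "tea" are illustrative, not part of tidytext):

```r
library(dplyr)
library(tibble)

# Hand-copied excerpt of the NRC rows printed above (illustrative only)
nrc_excerpt <- tribble(
  ~word,       ~sentiment,
  "abacus",    "trust",
  "abandon",   "fear",
  "abandon",   "negative",
  "abandon",   "sadness",
  "abandoned", "anger",
  "abandoned", "fear",
  "abandoned", "negative",
  "abandoned", "sadness"
)

# A toy one-word-per-row text; "tea" has no entry in this excerpt
words <- tibble(word = c("abandon", "abandoned", "abacus", "tea"))

# Each matched word is repeated once per category; unmatched words drop out
joined <- words %>%
  inner_join(nrc_excerpt, by = "word")

# Rows contributed by each word: abacus 1, abandon 3, abandoned 4
per_word <- joined %>% count(word)
```

Here the text side has no repeated words, so dplyr treats the join as one-to-many and stays quiet. Real text repeats words, and when the lexicon also lists a word more than once, dplyr 1.1.0 and later report an unexpected many-to-many relationship, as in the warnings further down.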
library(janeaustenr)
library(dplyr)
library(stringr)
library(tidytext)

# One row per word, tagged with its original line number and chapter
tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(text,
                                regex("^chapter [\\divxlc]",
                                      ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)

# NRC words tagged with the "joy" emotion
nrc_joy <- get_sentiments("nrc") %>%
  filter(sentiment == "joy")

# Most frequent joy words in Emma
tidy_books %>%
  filter(book == "Emma") %>%
  inner_join(nrc_joy) %>%
  count(word, sort = TRUE)
## Joining with `by = join_by(word)`
## # A tibble: 301 × 2
## word n
## <chr> <int>
## 1 good 359
## 2 friend 166
## 3 hope 143
## 4 happy 125
## 5 love 117
## 6 deal 92
## 7 found 92
## 8 present 89
## 9 kind 82
## 10 happiness 76
## # ℹ 291 more rows
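The chapter counter in `tidy_books` relies on `cumsum()` over a logical vector: every line matching the heading regex adds 1, and every later line inherits the running total. A small sketch on four invented lines, reusing the book's regex and then counting joy words the same way as the Emma example (`toy_book` and `joy_words` are hypothetical stand-ins for `austen_books()` and `nrc_joy`):

```r
library(dplyr)
library(stringr)
library(tibble)
library(tidytext)

# Four invented lines standing in for austen_books() output (hypothetical)
toy_book <- tibble(
  book = "Toy",
  text = c("CHAPTER I",
           "She was happy and full of hope.",
           "Chapter II",
           "A happy ending.")
)

toy_words <- toy_book %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    # Each heading line adds 1 to the running total, so every later line
    # carries the number of the most recent chapter heading
    chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                            ignore_case = TRUE)))
  ) %>%
  ungroup() %>%
  unnest_tokens(word, text)  # lowercases and splits into one word per row

# Hypothetical stand-in for the NRC "joy" words
joy_words <- tibble(word = c("happy", "hope"), sentiment = "joy")

joy_counts <- toy_words %>%
  inner_join(joy_words, by = "word") %>%
  count(word, sort = TRUE)
```

"happy" is counted twice because it appears once under each chapter, which is also why the heading regex has to be case-insensitive: the two headings differ in case.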
library(tidyr)

# Net Bing sentiment (positive minus negative) per 80-line chunk of each book
jane_austen_sentiment <- tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(book, index = linenumber %/% 80, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining with `by = join_by(word)`
## Warning in inner_join(., get_sentiments("bing")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 435434 of `x` matches multiple rows in `y`.
## ℹ Row 5051 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
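`linenumber %/% 80` is integer division, so line numbers below 80 share index 0, lines 80-159 share index 1, and so on; `pivot_wider()` then turns the positive and negative counts into columns so their difference can be taken per chunk. A sketch on six hypothetical word rows (`scored` is made up for illustration):

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical word-level rows: the original line each word came from,
# plus its Bing-style label
scored <- tibble(
  linenumber = c(3, 10, 79, 80, 85, 170),
  sentiment  = c("positive", "negative", "positive",
                 "negative", "negative", "positive")
)

net <- scored %>%
  # Integer division: 3, 10, 79 -> 0; 80, 85 -> 1; 170 -> 2
  count(index = linenumber %/% 80, sentiment) %>%
  # One column per label; chunks missing a label get 0 instead of NA
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  # Net score per chunk: 2 - 1, 0 - 2, 1 - 0
  mutate(sentiment = positive - negative)
```

Without `values_fill = 0`, index 1 would have `NA` positives and its net score would be `NA` rather than -2. If the many-to-many warning above is expected, dplyr 1.1.0 and later accept `inner_join(..., relationship = "many-to-many")` to silence it, as the warning message itself suggests.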
library(ggplot2)

# One panel per book; "free_x" lets each panel's x-axis fit that book's length
ggplot(jane_austen_sentiment, aes(index, sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, ncol = 2, scales = "free_x")

As a lover of Tolkien, I’m curious about the sentiment behind his most popular work, “The Lord of the Rings”. Let’s bring in the Loughran sentiment lexicon and use it for this analysis. I’ve pulled the text from an existing source I found on GitHub; the first few tokenized words are printed below.
## Joining with `by = join_by(word)`
## # A tibble: 6 × 1
## word
## <chr>
## 1 special
## 2 note
## 3 reprint
## 4 minor
## 5 inaccuracies
## 6 noted
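The code that builds `lotrWords` is not part of this write-up. Assuming it follows the book's recipe, tokenizing with `unnest_tokens()` and then dropping common words with `anti_join(stop_words)` (the `Joining with by = join_by(word)` message above is consistent with that, but this is a guess), a minimal sketch on one made-up sentence (`raw` and `words` are hypothetical names) looks like this:

```r
library(dplyr)
library(tibble)
library(tidytext)

# A made-up sentence; the actual LOTR source text is not reproduced here
raw <- tibble(text = "A special note on this reprint: minor inaccuracies have been noted.")

words <- raw %>%
  unnest_tokens(word, text) %>%       # lowercase, strip punctuation, one word per row
  anti_join(stop_words, by = "word")  # drop common words such as "a", "on", "this"
```

`stop_words` ships with tidytext and combines several stop-word lists, so words like "a", "this", and "been" disappear while content words such as "reprint" and "inaccuracies" remain.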
loughran_pos <- get_sentiments("loughran") %>%
  filter(sentiment == "positive")

lotrWords %>%
  inner_join(loughran_pos) %>%
  count(word, sort = TRUE)
## Joining with `by = join_by(word)`
## # A tibble: 121 × 2
## word n
## <chr> <int>
## 1 strong 167
## 2 strength 161
## 3 dream 84
## 4 beautiful 77
## 5 easy 73
## 6 leading 69
## 7 smooth 47
## 8 pleased 45
## 9 stronger 41
## 10 pleasant 38
## # ℹ 111 more rows
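# lotrWords already holds one word per row, so this "linenumber" is a word
# position: each index below covers 500 words, not 500 lines of text
lotrWords$linenumber <- seq_len(nrow(lotrWords))

# Net Loughran sentiment (positive minus negative) per 500-word chunk
lotrSentiment <- lotrWords %>%
  inner_join(get_sentiments("loughran")) %>%
  count(index = linenumber %/% 500, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining with `by = join_by(word)`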
## Warning in inner_join(., get_sentiments("loughran")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 89 of `x` matches multiple rows in `y`.
## ℹ Row 2173 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
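Because `lotrWords` already has one word per row, `linenumber %/% 500` groups words rather than lines. The Loughran lexicon also carries categories beyond positive and negative (uncertainty, litigious, constraining, superfluous), which `pivot_wider()` turns into extra columns that `positive - negative` ignores. A sketch on ten hypothetical words with a hypothetical mini-lexicon and 5-word chunks (`toy_words` and `toy_lexicon` are illustrative; their category assignments are not taken from Loughran):

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for lotrWords: one row per word, in reading order
toy_words <- tibble(word = c("strong", "dark", "may", "fear", "pleased",
                             "shadow", "easy", "lost", "strong", "doubt"))

# Hypothetical lexicon rows, including a category other than positive/negative
toy_lexicon <- tribble(
  ~word,     ~sentiment,
  "strong",  "positive",
  "pleased", "positive",
  "easy",    "positive",
  "lost",    "negative",
  "doubt",   "uncertainty"
)

toy_sentiment <- toy_words %>%
  mutate(linenumber = row_number()) %>%           # word position, not a text line
  inner_join(toy_lexicon, by = "word") %>%
  count(index = linenumber %/% 5, sentiment) %>%  # 5-word chunks here, 500 above
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)         # "uncertainty" is left out
```

Index 2 scores 0 even though it holds a matched word, because that word is only "uncertainty". Index 0 covers just four words since numbering starts at 1; by the same arithmetic the first chunk in the real analysis holds 499 words.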
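I honestly would have expected more positive sentiment toward the beginning, though I suppose less negative sentiment will have to do. Compared to Jane Austen, Tolkien comes out much more negative, although the comparison is rough: the Austen scores use the Bing lexicon over 80-line chunks, while the Tolkien scores use the Loughran lexicon, which was built for financial documents, over 500-word chunks.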