In this assignment, I’m tasked with taking the sentiment analysis base code from Chapter 2 of Text Mining with R and applying it to a corpus of my choosing.
First let’s take a look at the code from chapter 2. The code features the following package dependencies:
library(tidytext)
library(tidyverse)
library(janeaustenr)
About the packages:
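tidytext provides functions for tidying text.
tidyverse loads dplyr, tidyr, and stringr, which supply the data-manipulation and string functions used below.
janeaustenr is a sample dataset of Jane Austen’s 6 novels for text analysis.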
First we’ll load joy words from the NRC lexicon and the Jane Austen dataset, which we’ll also tidy.
# load the NRC lexicon, filtering for joy
nrcjoy = tidytext::get_sentiments("nrc") |>
  filter(sentiment == "joy")
# load and tidy the Jane Austen data
tidy_books = austen_books() |>
  group_by(book) |>
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]", ignore_case = TRUE)))) |>
  ungroup() |>
  tidytext::unnest_tokens(word, text)
Let’s take a look at the resulting data frames:
head(nrcjoy)
## # A tibble: 6 × 2
##   word          sentiment
##   <chr>         <chr>
## 1 absolution    joy
## 2 abundance     joy
## 3 abundant      joy
## 4 accolade      joy
## 5 accompaniment joy
## 6 accomplish    joy
head(tidy_books)
## # A tibble: 6 × 4
##   book                linenumber chapter word
##   <fct>                    <int>   <int> <chr>
## 1 Sense & Sensibility          1       0 sense
## 2 Sense & Sensibility          1       0 and
## 3 Sense & Sensibility          1       0 sensibility
## 4 Sense & Sensibility          3       0 by
## 5 Sense & Sensibility          3       0 jane
## 6 Sense & Sensibility          3       0 austen
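To make the tidying step easier to follow, here is a minimal sketch that runs the same pipeline on a made-up four-line "book". The `toy` data and its text are assumptions for illustration only; the regex and the function calls are the ones used above. The running chapter number comes from `cumsum()` over the TRUE/FALSE result of `str_detect()`, so it increases by one at every line that looks like a chapter heading.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Hypothetical four-line "book", used only for illustration
toy = tibble(
  book = "Toy Book",
  text = c("TOY BOOK", "CHAPTER 1", "It was a happy day.", "Chapter II")
)

toy_tidy = toy |>
  group_by(book) |>
  mutate(linenumber = row_number(),
         # TRUE counts as 1, so cumsum() yields a running chapter number
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]", ignore_case = TRUE)))) |>
  ungroup() |>
  # one row per lowercased word; linenumber and chapter are carried along
  unnest_tokens(word, text)

toy_tidy
```

Lines before the first heading get chapter 0, which is why the title lines of Sense & Sensibility show chapter 0 in the output above.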
Now we’ll join the NRC joy words with the Jane Austen data to count the joy words that appear in the novel “Emma”.
tidy_books |>
  filter(book == "Emma") |>
  inner_join(nrcjoy) |>
  count(word, sort = TRUE)
## Joining, by = "word"
## # A tibble: 301 × 2
##    word          n
##    <chr>     <int>
##  1 good        359
##  2 friend      166
##  3 hope        143
##  4 happy       125
##  5 love        117
##  6 deal         92
##  7 found        92
##  8 present      89
##  9 kind         82
## 10 happiness    76
## # … with 291 more rows
We can also use the Bing lexicon, which labels each word as either positive or negative, to calculate a net sentiment for each 80-line section of each book.
# calculate a net sentiment
janeaustensentiment = tidy_books |>
  inner_join(get_sentiments("bing")) |>
  count(book, index = linenumber %/% 80, sentiment) |>
  spread(sentiment, n, fill = 0) |>
  mutate(sentiment = positive - negative)
## Joining, by = "word"
head(janeaustensentiment)
## # A tibble: 6 × 5
##   book                index negative positive sentiment
##   <fct>               <dbl>    <dbl>    <dbl>     <dbl>
## 1 Sense & Sensibility     0       16       32        16
## 2 Sense & Sensibility     1       19       53        34
## 3 Sense & Sensibility     2       12       31        19
## 4 Sense & Sensibility     3       15       31        16
## 5 Sense & Sensibility     4       16       34        18
## 6 Sense & Sensibility     5       16       51        35
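The `index = linenumber %/% 80` step may be the least obvious part of that pipeline, so here is a small sketch of it on invented data. The `matched` table is hypothetical and stands in for the result of the inner join; integer division groups line numbers into 80-line buckets, and the net score is positive minus negative within each bucket. The sketch uses `pivot_wider()`, the current tidyr replacement for `spread()`, which is a substitution on my part rather than what the chapter code uses.

```r
library(dplyr)
library(tidyr)

# Hypothetical matched words: a line number plus a Bing-style label
matched = tibble(
  linenumber = c(3, 10, 79, 80, 95, 159, 160),
  sentiment  = c("positive", "negative", "positive", "negative",
                 "negative", "positive", "positive")
)

net = matched |>
  # integer division: lines 0-79 -> index 0, 80-159 -> index 1, 160-239 -> index 2
  count(index = linenumber %/% 80, sentiment) |>
  # one column per label; buckets missing a label get 0
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) |>
  mutate(sentiment = positive - negative)

net
```

A smaller divisor gives a noisier, more detailed trajectory, while a larger one smooths it.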
I’m going to take a look at some unlabeled book reviews.
library(rvest)
file = "https://raw.githubusercontent.com/josh1den/DATA-607/main/HW/HW11/book_unlabeled.html"
#using rvest to read in the file
data = rvest::read_html(file)
Parse the data into a dataframe:
books = data.frame(
  id = data |> html_elements("asin") |> html_text2(),
  product_name = data |> html_elements("product_name") |> html_text2(),
  helpful = data |> html_elements("helpful") |> html_text2(),
  rating = data |> html_elements("rating") |> html_text2(),
  review = data |> html_elements("review_text") |> html_text2()
)
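Building the data frame column by column works only if every tag appears exactly once per review. As a hedged sketch, the snippet below parses a made-up two-review string (the `snippet` text and its values are assumptions that imitate the file's tag names) and checks that all columns have the same length before combining them.

```r
library(rvest)

# Hypothetical two-review snippet imitating the tag names in the file
snippet = paste0(
  "<review><asin>A1</asin><rating>5.0</rating>",
  "<review_text>Great</review_text></review>",
  "<review><asin>A2</asin><rating>2.0</rating>",
  "<review_text>Dull</review_text></review>"
)

doc = read_html(snippet)

# extract each field as a character vector
fields = c("asin", "rating", "review_text")
cols = lapply(fields, function(tag) doc |> html_elements(tag) |> html_text2())
names(cols) = fields

# guard: a missing tag in any review would misalign the rows
stopifnot(length(unique(lengths(cols))) == 1)

toy_reviews = as.data.frame(cols)
toy_reviews
```

If the check fails on real data, extracting the fields per `review` node would be the safer route.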
View the parsed data:
head(books)
## id
## 1 1884956068
## 2 0679728740
## 3 0679728740
## 4 0679728740
## 5 0679728740
## 6 0679728740
## product_name
## 1 Manual pediátrico para los dueños del nuevo bebé: Books: Graciela Esquivel-Aguilar,Horst D. Weinburg
## 2 Child of God: Books: Cormac Mccarthy
## 3 Child of God: Books: Cormac Mccarthy
## 4 Child of God: Books: Cormac Mccarthy
## 5 Child of God: Books: Cormac Mccarthy
## 6 Child of God: Books: Cormac Mccarthy
## helpful rating
## 1 7 of 8 5.0
## 2 0 of 1 5.0
## 3 2 of 5 2.0
## 4 4 of 5 5.0
## 5 11 of 14 4.0
## 6 7 of 9 4.0
## review
## 1 This all-Spanish handbook for parents with new babies will prove essential for any concerned about a child's health. Manual Pediatrico appeared first in English; the Spanish edition provides essential health information at a glance for parents unversed in medical care; and should be a mainstay of any Spanish-speaking home with children
## 2 McCarthy's writing and portrayal of Lester Ballard, a necrophiliac, is so well done that when the townfolk are after him you want him to escape. And then you have to wonder...why am I siding with a necrophiliac of all people? The writing is up to the high standards expected of McCarthy, and as usual he plumbs darker side of the human psyche. The book has an interesting twist to the plot - Ballard is falsely accused of rape, his house is auctioned off and he's left as a social outcast, an animal. He is removed of all his ties to humanity and so becomes the animal. If you like books that deal with the darker side of life then give this a read
## 3 Do you giggle uncontrollably when poking corpses with a stick? If so, look no further, this book is for you. I understand a book like this will appeal to a certain demographic. I guess I shouldn't have expected much, and I certainly didn't expect a literary masterpeice, but this was the first book in awhile I just felt like giving up on. I didn't, since it's so short, but I may just as well have. It is not that the book is so "grotesque" or "disturbing" as seen described elsewhere. The author either left out or was incapable of the proper narrative to make the potentially disturbing scenes at all vivid. Unfortunately, that applies to all aspects of this book. The entire book is in rural vernacular, including ignoring proper punctuation. But the end result is that nothing is described in any detail. It's like reading a poorly worded list of stuff that happened. It's almost as if he wasn't really trying very hard, or as if the story really was told by a simpleminded country person - an omniscient one that can read people's minds. I suppose the idea could have worked, but doesn't. Not a terrible read, just annoying and vague. With so many other good books out there, why waste your time
## 4 I was initiated into the world of Cormac McCarthy with this novel in Southern Lit class. My professor was the vice president of the Cormac McCarthy Appreciation Society and considers McCarthy one the most talented novelists of the twentieth century, as do I. This work is very much a product of an evolved understanding of Faulkner. It incorporates all of the typical faulknarian literary elements and subject matter, but stretches and evolves them to an unusually intense point. There is a message about decay, especially of the south in the diction, especially where the flood and the degeneration of Lester Ballard are concerned. There is Old South v. New South and the post reconstuction circumstances of the south with the disposession of Ballard. There is also lust here, something that Faulkner tackeled in a more subtle manner than McCarthy in the Sound and the Fury and As I Lay Dying. However, McCarthy's story of lust is intense and grotesque and is described without sentiment in an amazing display of the gift of total candor. McCarthy is nothing short of stoic in his descriptions and must posess an amazing constitution, as he has the ability to write what would make most of us vomit just thinking about. The ability to reduce a human character to the lowest common denomimnator, performing unspeakable acts of depravity and at the same time remaining a valid character whose presence still carries a literary message and a human one as well, is the most unique of gifts. This novel may be hard to take for the faint of heart, but it is well worth the read. It is haunting to the reader, not for its perverse subject matter, but for its understated messages, masterfully placed in the character of Lester Ballard, a disposessed and depraved madman, holding the dark secrets of what humanity can be driven to
## 5 I cannot speak to the literary points in the novel though I can say I enjoyed it. In fact, I couldn't put it down until I finished it. However, I think it an interesting setting considering it is set in my hometown of Sevierville, Tennessee! Strangely, the author refers accurately to several persons and events that I've known forever. Mr. Wade's children still live in Sevierville, so do the Whaley and Ogle families. The 1964 flood was over the parking meters and the White Caps were stopped by a real life Clint Eastwood of a Sevier County Sheriff! The opening scene with the auctioner can be based on no other than C.B. McCarter whose trademark saying was "WE SELL THE EARTH." C.B., my grandfather, has never told me of any run-ins with Lester Ballard, the novels main character! THANK GOODNESS! As it turns out Ballard is a murdering necrophiliac. This is where McCarthy takes over and writes his story instead of mirroring persons of the actual community
## 6 There is no denying the strain of Faulkner that runs through McCarthy's early works; like his predecessor, McCarthy is concerned less with plot than with character and the many and sundry ways in which character and place (here, the hills of Eastern Tennessee) interact. But McCarthy is more fun to read; his prose is lean and lyric and leaves lasting images in the mind's eye. He does not shrink from displaying humanity in all its ugly (often ungodly) forms. "Child of God" is best-known for its haunting portrayal of necrophilia--few writers could address so ghastly an act in such beautiful, elegant prose. But that is one of the great joys of Cormac McCarthy's early novels--they are not so much tours de force as they are exhibitions of beautifully painted landscape and haunting, nightmarish imagery
Split product_name into title, category, and author, then drop the columns we no longer need:
books[c("title", "category", "author")] = str_split_fixed(books$product_name, ": ", 3)
books = books |>
  select(c("id", "title", "author", "rating", "helpful", "review"))
head(books)
## id title
## 1 1884956068 Manual pediátrico para los dueños del nuevo bebé
## 2 0679728740 Child of God
## 3 0679728740 Child of God
## 4 0679728740 Child of God
## 5 0679728740 Child of God
## 6 0679728740 Child of God
## author rating helpful
## 1 Graciela Esquivel-Aguilar,Horst D. Weinburg 5.0 7 of 8
## 2 Cormac Mccarthy 5.0 0 of 1
## 3 Cormac Mccarthy 2.0 2 of 5
## 4 Cormac Mccarthy 5.0 4 of 5
## 5 Cormac Mccarthy 4.0 11 of 14
## 6 Cormac Mccarthy 4.0 7 of 9
## review
## 1 This all-Spanish handbook for parents with new babies will prove essential for any concerned about a child's health. Manual Pediatrico appeared first in English; the Spanish edition provides essential health information at a glance for parents unversed in medical care; and should be a mainstay of any Spanish-speaking home with children
## 2 McCarthy's writing and portrayal of Lester Ballard, a necrophiliac, is so well done that when the townfolk are after him you want him to escape. And then you have to wonder...why am I siding with a necrophiliac of all people? The writing is up to the high standards expected of McCarthy, and as usual he plumbs darker side of the human psyche. The book has an interesting twist to the plot - Ballard is falsely accused of rape, his house is auctioned off and he's left as a social outcast, an animal. He is removed of all his ties to humanity and so becomes the animal. If you like books that deal with the darker side of life then give this a read
## 3 Do you giggle uncontrollably when poking corpses with a stick? If so, look no further, this book is for you. I understand a book like this will appeal to a certain demographic. I guess I shouldn't have expected much, and I certainly didn't expect a literary masterpeice, but this was the first book in awhile I just felt like giving up on. I didn't, since it's so short, but I may just as well have. It is not that the book is so "grotesque" or "disturbing" as seen described elsewhere. The author either left out or was incapable of the proper narrative to make the potentially disturbing scenes at all vivid. Unfortunately, that applies to all aspects of this book. The entire book is in rural vernacular, including ignoring proper punctuation. But the end result is that nothing is described in any detail. It's like reading a poorly worded list of stuff that happened. It's almost as if he wasn't really trying very hard, or as if the story really was told by a simpleminded country person - an omniscient one that can read people's minds. I suppose the idea could have worked, but doesn't. Not a terrible read, just annoying and vague. With so many other good books out there, why waste your time
## 4 I was initiated into the world of Cormac McCarthy with this novel in Southern Lit class. My professor was the vice president of the Cormac McCarthy Appreciation Society and considers McCarthy one the most talented novelists of the twentieth century, as do I. This work is very much a product of an evolved understanding of Faulkner. It incorporates all of the typical faulknarian literary elements and subject matter, but stretches and evolves them to an unusually intense point. There is a message about decay, especially of the south in the diction, especially where the flood and the degeneration of Lester Ballard are concerned. There is Old South v. New South and the post reconstuction circumstances of the south with the disposession of Ballard. There is also lust here, something that Faulkner tackeled in a more subtle manner than McCarthy in the Sound and the Fury and As I Lay Dying. However, McCarthy's story of lust is intense and grotesque and is described without sentiment in an amazing display of the gift of total candor. McCarthy is nothing short of stoic in his descriptions and must posess an amazing constitution, as he has the ability to write what would make most of us vomit just thinking about. The ability to reduce a human character to the lowest common denomimnator, performing unspeakable acts of depravity and at the same time remaining a valid character whose presence still carries a literary message and a human one as well, is the most unique of gifts. This novel may be hard to take for the faint of heart, but it is well worth the read. It is haunting to the reader, not for its perverse subject matter, but for its understated messages, masterfully placed in the character of Lester Ballard, a disposessed and depraved madman, holding the dark secrets of what humanity can be driven to
## 5 I cannot speak to the literary points in the novel though I can say I enjoyed it. In fact, I couldn't put it down until I finished it. However, I think it an interesting setting considering it is set in my hometown of Sevierville, Tennessee! Strangely, the author refers accurately to several persons and events that I've known forever. Mr. Wade's children still live in Sevierville, so do the Whaley and Ogle families. The 1964 flood was over the parking meters and the White Caps were stopped by a real life Clint Eastwood of a Sevier County Sheriff! The opening scene with the auctioner can be based on no other than C.B. McCarter whose trademark saying was "WE SELL THE EARTH." C.B., my grandfather, has never told me of any run-ins with Lester Ballard, the novels main character! THANK GOODNESS! As it turns out Ballard is a murdering necrophiliac. This is where McCarthy takes over and writes his story instead of mirroring persons of the actual community
## 6 There is no denying the strain of Faulkner that runs through McCarthy's early works; like his predecessor, McCarthy is concerned less with plot than with character and the many and sundry ways in which character and place (here, the hills of Eastern Tennessee) interact. But McCarthy is more fun to read; his prose is lean and lyric and leaves lasting images in the mind's eye. He does not shrink from displaying humanity in all its ugly (often ungodly) forms. "Child of God" is best-known for its haunting portrayal of necrophilia--few writers could address so ghastly an act in such beautiful, elegant prose. But that is one of the great joys of Cormac McCarthy's early novels--they are not so much tours de force as they are exhibitions of beautifully painted landscape and haunting, nightmarish imagery
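For reference, this is what the `str_split_fixed()` call does to a single product name. The first string is copied from the output above; the second is a hypothetical title that itself contains ": ", included to show a case where the three-way split would put the pieces in the wrong columns.

```r
library(stringr)

# product_name value taken from the output above
parts = str_split_fixed("Child of God: Books: Cormac Mccarthy", ": ", 3)
parts[1, ]  # title, category, author

# hypothetical title containing ": "; only the first two separators split,
# so "Part Two" lands in the category slot
shifted = str_split_fixed("A Tale: Part Two: Books: Some Author", ": ", 3)
shifted[1, ]
```

Checking that the category column always equals "Books" would be one quick way to spot rows affected by this.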
Now let’s count the positive and negative words that appear in the reviews for Child of God:
cog = books |>
  filter(title == "Child of God") |>
  select("review") |>
  unnest_tokens(word, review) |>
  inner_join(get_sentiments("bing")) |>
  count(word, sentiment) |>
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) |>
  mutate(sentiment = positive - negative)
## Joining, by = "word"
head(cog)
## # A tibble: 6 × 4
##   word        positive negative sentiment
##   <chr>          <int>    <int>     <int>
## 1 accurately         1        0         1
## 2 amazing            2        0         2
## 3 annoying           0        1        -1
## 4 appeal             1        0         1
## 5 beautiful          1        0         1
## 6 beautifully        1        0         1
Now let’s look at the total sentiment score for Child of God and compare it to its mean rating:
sum(cog$sentiment)
## [1] -16
The net sentiment for Child of God based on its user reviews is -16, meaning negative words outnumber positive words by 16 across the reviews, which is slightly negative. Let’s take a look at its average rating:
books |>
  filter(title == "Child of God") |>
  mutate(rating = as.numeric(rating)) |>
  group_by(title) |>
  summarize(mean_rating = mean(rating))
## # A tibble: 1 × 2
##   title        mean_rating
##   <chr>              <dbl>
## 1 Child of God        3.88
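A mean rating of 3.88 on a scale of 1 to 5 is positive, so the star ratings and the net Bing score point in opposite directions. One plausible reason is that reviewers describing a dark novel use words like “grotesque” and “disturbing”, which a lexicon is likely to score as negative, even in favorable reviews.
A single lexicon is not enough to establish a reliable sentiment analysis. With more time in future projects, using multiple lexicons and comparing how sentiment scores across products relate to user ratings would provide valuable context for understanding consumer sentiment.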
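As a rough sketch of what that follow-up could look like, the code below scores each review separately under two lexicons and keeps the rating alongside the score. Everything here is invented for illustration: the three `reviews`, and the tiny `lex_a` and `lex_b` tables, which are hypothetical stand-ins for real lexicons and not excerpts from Bing or NRC.

```r
library(dplyr)
library(tidytext)

# Hypothetical reviews with ratings
reviews = tibble(
  review_id = 1:3,
  rating = c(5, 2, 4),
  review = c("a haunting and beautiful novel",
             "annoying and vague",
             "dark but amazing prose")
)

# Hypothetical lexicons: lex_a treats dark-themed words as negative, lex_b omits them
lex_a = tibble(
  word = c("beautiful", "amazing", "annoying", "vague", "haunting", "dark"),
  sentiment = c("positive", "positive", "negative", "negative", "negative", "negative")
)
lex_b = tibble(
  word = c("beautiful", "amazing", "annoying", "vague"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# net sentiment per review: positive matches minus negative matches
score = function(lexicon) {
  reviews |>
    unnest_tokens(word, review) |>
    inner_join(lexicon, by = "word") |>
    group_by(review_id, rating) |>
    summarize(net = sum(sentiment == "positive") - sum(sentiment == "negative"),
              .groups = "drop")
}

comparison = bind_rows(a = score(lex_a), b = score(lex_b), .id = "lexicon")
comparison
```

In this toy case the two favorable reviews score 0 under `lex_a` and 1 under `lex_b`, which illustrates how lexicon choice alone can shift the apparent agreement between sentiment and rating.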