Sentiment Analysis:
For this project I have chosen the literary work of Charles Dickens. I analyzed “A Tale of Two Cities”, “Great Expectations”, “A Christmas Carol”, “Oliver Twist”, and “Hard Times”, using the gutenbergr library to download the books.
This assignment focuses on opinion mining. When human readers approach a text, we use our understanding of the emotional intent of words to infer whether a section of text is positive or negative, or perhaps characterized by some other more nuanced emotion like surprise or disgust. We can use the tools of text mining to approach the emotional content of text programmatically.
The code from the “Sentiment analysis with tidy data” chapter of Text Mining with R has been applied to my chosen corpus.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: NLP
library(purrr)
library(tidytext)
library(gutenbergr)
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:NLP':
##
## annotate
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
dickens <- gutenberg_download(c(98, 1400, 46, 730, 786))
## Determining mirror for Project Gutenberg from http://www.gutenberg.org/robot/harvest
## Using mirror http://aleph.gutenberg.org
First, the data is loaded and tokenized, and stop words are removed.
tidy_dickens <- dickens %>%
  mutate(
    linenumber = row_number()) %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)
## Joining, by = "word"
tidy_dickens %>%
  count(word, sort = TRUE)
## # A tibble: 19,587 x 2
## word n
## <chr> <int>
## 1 time 1218
## 2 hand 918
## 3 don’t 863
## 4 night 835
## 5 looked 814
## 6 head 813
## 7 oliver 766
## 8 dear 751
## 9 joe 719
## 10 miss 702
## # ... with 19,577 more rows
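The tokenize-then-filter step above can be tried on a tiny input. The sketch below is only an illustration: `toy_lines` and `toy_stop` are hypothetical stand-ins for the downloaded novels and for tidytext's `stop_words`, chosen so the result is easy to check by eye.

```r
library(dplyr)
library(tidytext)

# Two toy lines standing in for the downloaded novels (not real corpus rows).
toy_lines <- tibble(
  linenumber = 1:2,
  text = c("It was the best of times,", "it was the worst of times.")
)

# Hypothetical miniature stop list; the analysis itself uses tidytext's stop_words.
toy_stop <- tibble(word = c("it", "was", "the", "of"))

toy_counts <- toy_lines %>%
  unnest_tokens(word, text) %>%          # one lowercase word per row, punctuation stripped
  anti_join(toy_stop, by = "word") %>%   # drop every row whose word is in the stop list
  count(word, sort = TRUE)               # word frequencies, most frequent first

toy_counts
```

Passing `by = "word"` explicitly avoids the "Joining, by" message that appears in the output above.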
The Bing lexicon is used to classify the words in these five novels as positive or negative.
bing_word_counts <- tidy_dickens %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
bing_word_counts
## Joining, by = "word"
## # A tibble: 3,133 x 3
## word sentiment n
## <chr> <chr> <int>
## 1 miss negative 702
## 2 poor negative 350
## 3 dark negative 299
## 4 hard negative 223
## 5 dead negative 218
## 6 strong positive 203
## 7 love positive 202
## 8 fell negative 198
## 9 death negative 194
## 10 cold negative 192
## # ... with 3,123 more rows
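The top negative word above is "miss", which in these novels may often be the title "Miss" rather than a negative term. One possible way to handle this, sketched below under that assumption, is to remove such words with a custom stop list before joining with the lexicon. The names `toy_tokens` and `custom_stop` are hypothetical and the tokens are made up for illustration.

```r
library(dplyr)
library(tidytext)

# Toy tokens; in the analysis these would come from tidy_dickens.
toy_tokens <- tibble(word = c("miss", "love", "miss", "poor", "oliver"))

# Hypothetical custom list of words to exclude before scoring.
custom_stop <- tibble(word = "miss")

toy_bing <- toy_tokens %>%
  anti_join(custom_stop, by = "word") %>%            # drop the ambiguous word
  inner_join(get_sentiments("bing"), by = "word") %>% # keep only words found in the lexicon
  count(word, sentiment, sort = TRUE)

toy_bing
```

Words that are absent from the lexicon, such as character names, are dropped silently by `inner_join()`, so only lexicon words contribute to the counts.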
The NRC lexicon is used to assign words to sentiment and emotion categories.
nrc_word_counts <- tidy_dickens %>%
  inner_join(get_sentiments("nrc")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
nrc_word_counts
## Joining, by = "word"
## # A tibble: 8,683 x 3
## word sentiment n
## <chr> <chr> <int>
## 1 time anticipation 1218
## 2 dear positive 751
## 3 sir positive 697
## 4 sir trust 697
## 5 boy disgust 608
## 6 boy negative 608
## 7 gentleman positive 561
## 8 gentleman trust 561
## 9 father trust 446
## 10 fire fear 353
## # ... with 8,673 more rows
Comparing the Bing, NRC, and AFINN lexicons across Dickens’s work
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v tibble 3.1.0 v stringr 1.4.0
## v tidyr 1.1.3 v forcats 0.5.1
## v readr 1.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x ggplot2::annotate() masks NLP::annotate()
## x dplyr::filter() masks stats::filter()
## x kableExtra::group_rows() masks dplyr::group_rows()
## x dplyr::lag() masks stats::lag()
new_afinn <- tidy_dickens %>%
  inner_join(get_sentiments("afinn")) %>%
  group_by(index = linenumber %/% 5) %>%
  summarise(sentiment = sum(value)) %>%
  mutate(method = "AFINN")
## Joining, by = "word"
new_bing_and_nrc1 <- bind_rows(
  tidy_dickens %>%
    inner_join(get_sentiments("bing")) %>%
    mutate(method = "Bing et al."),
  tidy_dickens %>%
    inner_join(get_sentiments("nrc") %>%
                 filter(sentiment %in% c("positive",
                                         "negative"))
    ) %>%
    mutate(method = "NRC")) %>%
  count(method, index = linenumber %/% 5, sentiment) %>%
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining, by = "word"
## Joining, by = "word"
bind_rows(new_afinn,
          new_bing_and_nrc1) %>%
  ggplot(aes(index, sentiment, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~method, ncol = 1, scales = "free_y")
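The chunking in the code above relies on integer division: `linenumber %/% 5` gives the same index to every five consecutive line numbers. The sketch below shows that step and the positive-minus-negative calculation on made-up data; `toy_tagged` and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy sentiment-tagged words with made-up line numbers.
toy_tagged <- tibble(
  linenumber = c(1, 2, 6, 7, 7),
  sentiment  = c("positive", "negative", "positive", "positive", "negative")
)

toy_net <- toy_tagged %>%
  count(index = linenumber %/% 5, sentiment) %>%   # lines 0-4 fall in chunk 0, lines 5-9 in chunk 1
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>%                 # one column per sentiment, 0 where a chunk has none
  mutate(net = positive - negative)                # net sentiment per chunk

toy_net
```

Here chunk 0 has one positive and one negative word (net 0), and chunk 1 has two positive and one negative (net 1).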

Sentiment analysis is now applied to A Tale of Two Cities alone, comparing the three lexicons. The plot shows the net sentiment in each five-line chunk of the novel for each lexicon: positive minus negative word counts for Bing and NRC, and the sum of word scores for AFINN. The three results are bound together for visualization.
A_tale_of_two_cities <- tidy_dickens %>%
  filter(gutenberg_id == 98)
A_tale_of_two_cities
## # A tibble: 46,636 x 3
## gutenberg_id linenumber word
## <int> <int> <chr>
## 1 98 3843 tale
## 2 98 3843 cities
## 3 98 3845 story
## 4 98 3845 french
## 5 98 3845 revolution
## 6 98 3847 charles
## 7 98 3847 dickens
## 8 98 3850 contents
## 9 98 3853 book
## 10 98 3853 recalled
## # ... with 46,626 more rows
afinn1 <- A_tale_of_two_cities %>%
  inner_join(get_sentiments("afinn")) %>%
  group_by(index = linenumber %/% 5) %>%
  summarise(sentiment = sum(value)) %>%
  mutate(method = "AFINN")
## Joining, by = "word"
bing_and_nrc1 <- bind_rows(
  A_tale_of_two_cities %>%
    inner_join(get_sentiments("bing")) %>%
    mutate(method = "Bing et al."),
  A_tale_of_two_cities %>%
    inner_join(get_sentiments("nrc") %>%
                 filter(sentiment %in% c("positive",
                                         "negative"))
    ) %>%
    mutate(method = "NRC")) %>%
  count(method, index = linenumber %/% 5, sentiment) %>%
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining, by = "word"
## Joining, by = "word"
bind_rows(afinn1,
          bing_and_nrc1) %>%
  ggplot(aes(index, sentiment, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~method, ncol = 1, scales = "free_y")
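The panels are drawn with free y scales partly because the lexicons measure different things: AFINN assigns each word a numeric score (from -5 to 5) that is summed, while Bing and NRC label words as positive or negative and the labels are counted. The sketch below scores the same chunk both ways. The lexicons `toy_afinn` and `toy_binary` are hypothetical miniatures with illustrative values, not the real AFINN or Bing entries.

```r
library(dplyr)

# Hypothetical miniature lexicons; values are illustrative only.
toy_afinn  <- tibble(word = c("love", "dead", "poor"),
                     value = c(3, -3, -2))
toy_binary <- tibble(word = c("love", "dead", "poor"),
                     sentiment = c("positive", "negative", "negative"))

# One chunk of text, already tokenized.
chunk <- tibble(word = c("love", "dead", "poor", "love"))

# AFINN-style: sum the numeric scores of the matched words.
afinn_score <- chunk %>%
  inner_join(toy_afinn, by = "word") %>%
  summarise(sentiment = sum(value)) %>%
  pull(sentiment)

# Bing/NRC-style: count of positive words minus count of negative words.
binary_score <- chunk %>%
  inner_join(toy_binary, by = "word") %>%
  summarise(sentiment = sum(sentiment == "positive") - sum(sentiment == "negative")) %>%
  pull(sentiment)

c(afinn = afinn_score, binary = binary_score)
```

With these toy values the same chunk scores 1 under the summed approach and 0 under the counting approach, so the absolute values of the panels are not directly comparable.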

Most common positive and negative words:
bing_word_counts2 <- tidy_dickens %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
bing_word_counts2
## Joining, by = "word"
## # A tibble: 3,133 x 3
## word sentiment n
## <chr> <chr> <int>
## 1 miss negative 702
## 2 poor negative 350
## 3 dark negative 299
## 4 hard negative 223
## 5 dead negative 218
## 6 strong positive 203
## 7 love positive 202
## 8 fell negative 198
## 9 death negative 194
## 10 cold negative 192
## # ... with 3,123 more rows
bing_word_counts2 %>%
  group_by(sentiment) %>%
  slice_max(n, n = 10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(x = "Contribution to sentiment",
       y = NULL)
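One detail of the selection step above is that `slice_max()` keeps ties by default, so a group can return more than the requested number of rows. The sketch below shows this on made-up counts; `toy_counts` and its values are hypothetical.

```r
library(dplyr)

# Made-up word counts; "dark" and "hard" are tied on purpose.
toy_counts <- tibble(
  word      = c("poor", "dark", "hard", "love", "strong"),
  sentiment = c("negative", "negative", "negative", "positive", "positive"),
  n         = c(5, 3, 3, 4, 2)
)

# Default behaviour: ties at the cutoff are all kept.
top_with_ties <- toy_counts %>%
  group_by(sentiment) %>%
  slice_max(n, n = 2) %>%
  ungroup()

# with_ties = FALSE returns exactly two rows per group.
top_no_ties <- toy_counts %>%
  group_by(sentiment) %>%
  slice_max(n, n = 2, with_ties = FALSE) %>%
  ungroup()

nrow(top_with_ties)
nrow(top_no_ties)
```

If a facet in the bar chart shows more than ten words, tied counts at the cutoff are a likely reason.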

Word clouds
library(wordcloud)
## Loading required package: RColorBrewer
set.seed(123) # for reproducibility
tidy_dickens %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100,
                 rot.per = 0.35,
                 colors = brewer.pal(7, "Accent")))
## Joining, by = "word"
## Warning in wordcloud(word, n, max.words = 100, rot.per = 0.35, colors =
## brewer.pal(7, : time could not be fit on page. It will not be plotted.
## Warning in wordcloud(word, n, max.words = 100, rot.per = 0.35, colors =
## brewer.pal(7, : miss could not be fit on page. It will not be plotted.
## Warning in wordcloud(word, n, max.words = 100, rot.per = 0.35, colors =
## brewer.pal(7, : defarge could not be fit on page. It will not be plotted.
## Warning in wordcloud(word, n, max.words = 100, rot.per = 0.35, colors =
## brewer.pal(7, : word could not be fit on page. It will not be plotted.
## Warning in wordcloud(word, n, max.words = 100, rot.per = 0.35, colors =
## brewer.pal(7, : house could not be fit on page. It will not be plotted.

library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
tidy_dickens %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("red", "blue"),
                   max.words = 100)
## Joining, by = "word"
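The `acast()` call above reshapes the long table of counts into a matrix with one row per word and one column per sentiment, which is the shape `comparison.cloud()` expects. The sketch below shows only that reshaping step on made-up counts; `toy_counts` is hypothetical.

```r
library(dplyr)
library(reshape2)

# Made-up counts in the same long format as the Bing word counts.
toy_counts <- tibble(
  word      = c("love", "poor", "dark"),
  sentiment = c("positive", "negative", "negative"),
  n         = c(3, 5, 2)
)

# Rows become words, columns become sentiments; missing combinations are filled with 0.
toy_matrix <- toy_counts %>%
  acast(word ~ sentiment, value.var = "n", fill = 0)

toy_matrix
```

Each word ends up with a non-zero count in only one column here, which is what lets the comparison cloud place it on the positive or the negative side.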

Positive and negative words for A Tale of Two Cities as a word cloud:
A_tale_of_two_cities %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("red", "blue"),
                   max.words = 100)
## Joining, by = "word"

library(sentimentr)
library(tidyverse)
I have incorporated an additional sentiment analysis package, sentimentr. It was designed by Tyler Rinker to quickly calculate text polarity sentiment at the sentence level and optionally aggregate by rows or grouping variable(s).
A Tale of Two Cities has one of the most heart-wrenching endings I have ever read. Although Sydney Carton does not speak before being executed at the guillotine, Dickens ends the novel by imagining what he might have said. The hypothetical farewell speech is an amalgamation of many human emotions: a sense of optimism for a better future along with pain, disappointment, anger, and regret. The whole speech seems euphoric to me. I wanted to see whether sentimentr's polarity analysis can identify these embedded emotions.
text<-("I see a beautiful city and a brilliant people rising from this abyss, and, in their struggles to be truly free, in their triumphs and defeats, through long, long years to come, I see the evil of this time and of the previous time of which this is the natural birth, gradually making expiation for itself and wearing out.I see the lives for which I lay down my life, peaceful, useful, prosperous and happy, in that England which I shall see no more. I see Her with a child upon her bosom, who bears my name. I see her father, aged and bent, but otherwise restored, and faithful to all men in his healing office, and at peace; I see the good old man, so long their friend, in ten years' time enriching them with all he has, and passing tranquilly to his reward.I see that I hold a sanctuary in their hearts, and in the hearts of their descendants, generations hence. I see her, an old woman, weeping for me on the anniversary of this day. I see her and her husband, their course done, lying side by side in their last earthly bed, and I know that each was not more honoured and held sacred in the other's soul, than I was in the souls of both.I see that child who lay upon her bosom and who bore my name, a man winning his way up in that path of life which once was mine. I see him winning it so well, that my name is made illustrious there by the light of his. I see the blots I threw upon it, faded away. I see him, foremost of just judges and honoured men, bringing a boy of my name, with a forehead that I know and golden hair, to this place - then fair to look upon, with not a trace of this day's disfigurement - and I hear him tell the child my story, with a tender and a faltering voice.
It is a far, far better thing that I do, than I have ever done; it is a far, far better rest that I go to, than I have ever known." )
sentiment(text)
## element_id sentence_id word_count sentiment
## 1: 1 1 59 0.21481170
## 2: 1 2 25 0.55000000
## 3: 1 3 13 0.16641006
## 4: 1 4 48 0.72529628
## 5: 1 5 19 0.32118203
## 6: 1 6 15 -0.12909944
## 7: 1 7 42 -0.07715167
## 8: 1 8 29 0.11141720
## 9: 1 9 19 0.47030225
## 10: 1 10 10 0.00000000
## 11: 1 11 57 0.28477446
## 12: 1 12 31 0.28736848
sentiment_by(text, by = NULL)
## element_id word_count sd ave_sentiment
## 1: 1 367 0.2544732 0.2472257
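To make the two calls above easier to read, here is a minimal sketch on a two-sentence input. The text in `toy_text` is made up; the point is only to show that `sentiment()` returns one row per sentence, while `sentiment_by()` averages those rows into a single score per element.

```r
library(sentimentr)

# Made-up two-sentence input, one clearly positive and one clearly negative.
toy_text <- "I love this book. I hate this weather."

# Split into sentences first, then score each sentence.
toy_sentences <- get_sentences(toy_text)
toy_scores    <- sentiment(toy_sentences)

# Aggregate the sentence scores into one average for the whole text.
toy_summary <- sentiment_by(toy_sentences)

toy_scores
toy_summary
```

In the speech above, the same logic applies: the twelve sentence-level scores are mostly positive, and `ave_sentiment` summarizes them as a single value.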
emotional_analysis<-emotion(text)
emotional_analysis
## element_id sentence_id word_count emotion_type emotion_count
## 1: 1 1 59 anger 1
## 2: 1 1 59 anticipation 7
## 3: 1 1 59 disgust 1
## 4: 1 1 59 fear 3
## 5: 1 1 59 joy 4
## ---
## 188: 1 12 31 fear_negated 0
## 189: 1 12 31 joy_negated 0
## 190: 1 12 31 sadness_negated 0
## 191: 1 12 31 surprise_negated 0
## 192: 1 12 31 trust_negated 0
## emotion
## 1: 0.01694915
## 2: 0.11864407
## 3: 0.01694915
## 4: 0.05084746
## 5: 0.06779661
## ---
## 188: 0.00000000
## 189: 0.00000000
## 190: 0.00000000
## 191: 0.00000000
## 192: 0.00000000
Using the sentimentr package, I have also analyzed the 2012 presidential debates, identifying positive and negative sentiment in each sentence. The dataset is accessed from the GitHub page of the sentimentr package.
debates <- presidential_debates_2012
debate_sentiments <- debates %>%
  get_sentences() %>%
  sentiment()
debate_sentiments %>%
  ggplot() +
  geom_density(aes(sentiment))

debate_sentiments %>%
  mutate(polarity_level = ifelse(sentiment > 0, "Positive", "Negative")) %>%
  count(person, polarity_level) %>%
  ggplot() +
  geom_col(aes(x = person, y = n, fill = polarity_level))
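Note that the `ifelse()` above labels every sentence with a score of exactly zero as "Negative". If that is not intended, one possible alternative is a three-way split with `case_when()`, sketched below on made-up scores; `toy_sentiments`, the speaker labels, and the "Neutral" category are assumptions for illustration and are not part of the original analysis.

```r
library(dplyr)

# Made-up sentence scores for two hypothetical speakers.
toy_sentiments <- tibble(
  person    = c("A", "A", "B", "B"),
  sentiment = c(0.4, 0, -0.2, 0)
)

toy_levels <- toy_sentiments %>%
  mutate(polarity_level = case_when(
    sentiment > 0 ~ "Positive",
    sentiment < 0 ~ "Negative",
    TRUE          ~ "Neutral"     # scores of exactly zero get their own label
  )) %>%
  count(person, polarity_level)

toy_levels
```

With this split, sentences that contain no polarized words no longer inflate the negative bar for each speaker.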
