R Markdown

Sentiment Analysis

For this project I have chosen the literary work of Charles Dickens. I have analyzed “A Tale of Two Cities”, “Great Expectations”, “A Christmas Carol”, “Oliver Twist” and “Hard Times”, and I used the gutenbergr library to download these books.

This assignment focuses on opinion mining. When human readers approach a text, we use our understanding of the emotional intent of words to infer whether a section of text is positive or negative, or perhaps characterized by some other more nuanced emotion like surprise or disgust. We can use the tools of text mining to approach the emotional content of text programmatically.

The code from the chapter “Sentiment analysis with tidy data” of Text Mining with R has been applied to my chosen corpus.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tm)
## Loading required package: NLP
library(purrr)
library(tidytext)
library(gutenbergr)
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:NLP':
## 
##     annotate
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
dickens <- gutenberg_download(c(98, 1400, 46, 730, 786))
## Determining mirror for Project Gutenberg from http://www.gutenberg.org/robot/harvest
## Using mirror http://aleph.gutenberg.org
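`gutenberg_download()` labels each line only with its numeric `gutenberg_id`, so a small lookup table can make the books easier to identify in later tables. This is an optional sketch, not part of the original analysis: the names `dickens_titles` and `demo` are hypothetical, and the two-line `demo` tibble only stands in for the downloaded `dickens` data, which has the same `gutenberg_id` and `text` columns.

```r
library(dplyr)
library(tibble)

# Hypothetical lookup table: the Project Gutenberg IDs passed to
# gutenberg_download() above, paired with the titles from the introduction.
dickens_titles <- tibble(
  gutenberg_id = c(98L, 1400L, 46L, 730L, 786L),
  title = c("A Tale of Two Cities", "Great Expectations",
            "A Christmas Carol", "Oliver Twist", "Hard Times")
)

# Two-line stand-in for `dickens` (illustrative only).
demo <- tibble(
  gutenberg_id = c(98L, 46L),
  text = c("It was the best of times,", "Marley was dead: to begin with.")
)

# Attach a readable title to every line of text.
demo_titled <- demo %>%
  left_join(dickens_titles, by = "gutenberg_id")
demo_titled
```

With the real data, the same join would be `dickens %>% left_join(dickens_titles, by = "gutenberg_id")`.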

First, each line of the downloaded text is numbered, the text is tokenized into one word per row, and stop words are removed.

tidy_dickens <- dickens %>%
  mutate(
    linenumber = row_number()) %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)
## Joining, by = "word"
tidy_dickens %>%
  count(word, sort = TRUE)
## # A tibble: 19,587 x 2
##    word       n
##    <chr>  <int>
##  1 time    1218
##  2 hand     918
##  3 don’t    863
##  4 night    835
##  5 looked   814
##  6 head     813
##  7 oliver   766
##  8 dear     751
##  9 joe      719
## 10 miss     702
## # ... with 19,577 more rows
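To make the tokenization step concrete, here is a minimal sketch of the same pipeline on two lines of toy text (the `toy` tibble is illustrative, not part of the corpus). Passing `by = "word"` explicitly to `anti_join()` also silences the `Joining, by = "word"` message seen above.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Two lines of toy text standing in for the downloaded books.
toy <- tibble(text = c("Marley was dead: to begin with.",
                       "There is no doubt whatever about that."))

toy_tokens <- toy %>%
  mutate(linenumber = row_number()) %>%  # keep track of the source line
  unnest_tokens(word, text) %>%          # one lowercase word per row, punctuation removed
  anti_join(stop_words, by = "word")     # drop common words such as "was", "is" and "that"
toy_tokens
```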

The Bing lexicon, which labels each word as either positive or negative, is used to count the sentiment-bearing words in these five novels.

bing_word_counts <- tidy_dickens %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"
bing_word_counts 
## # A tibble: 3,133 x 3
##    word   sentiment     n
##    <chr>  <chr>     <int>
##  1 miss   negative    702
##  2 poor   negative    350
##  3 dark   negative    299
##  4 hard   negative    223
##  5 dead   negative    218
##  6 strong positive    203
##  7 love   positive    202
##  8 fell   negative    198
##  9 death  negative    194
## 10 cold   negative    192
## # ... with 3,123 more rows
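The most frequent “negative” word is miss, but in these novels it is mostly a form of address (Miss Havisham, Miss Pross) rather than the verb, so the Bing label is probably misleading here. One option, also suggested in Text Mining with R, is to add such words to a custom stop-word list before scoring. A minimal sketch, using a toy `toy_words` tibble in place of `tidy_dickens`:

```r
library(dplyr)
library(tibble)
library(tidytext)

# Treat "miss" as a stop word, since it is mostly a title in Dickens.
custom_stop_words <- bind_rows(
  tibble(word = "miss", lexicon = "custom"),
  stop_words
)

# Toy tokens standing in for tidy_dickens (illustrative only).
toy_words <- tibble(word = c("miss", "poor", "love", "miss", "dark"))

toy_sentiment <- toy_words %>%
  anti_join(custom_stop_words, by = "word") %>%        # "miss" is dropped here
  inner_join(get_sentiments("bing"), by = "word") %>%  # the Bing lexicon ships with tidytext
  count(word, sentiment, sort = TRUE)
toy_sentiment
```

On the full corpus, the equivalent change would be `anti_join(custom_stop_words)` in place of `anti_join(stop_words)` when building `tidy_dickens`.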

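The NRC lexicon is used to tag words with eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise and trust) as well as positive and negative sentiment.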

nrc_word_counts <- tidy_dickens %>%
  inner_join(get_sentiments("nrc")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"
nrc_word_counts
## # A tibble: 8,683 x 3
##    word      sentiment        n
##    <chr>     <chr>        <int>
##  1 time      anticipation  1218
##  2 dear      positive       751
##  3 sir       positive       697
##  4 sir       trust          697
##  5 boy       disgust        608
##  6 boy       negative       608
##  7 gentleman positive       561
##  8 gentleman trust          561
##  9 father    trust          446
## 10 fire      fear           353
## # ... with 8,673 more rows
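Words such as sir and boy appear twice with identical counts because NRC lists a word once per category, so `inner_join()` produces one row per (token, category) match. The sketch below illustrates this with a hypothetical five-row `mini_nrc` tibble in NRC's long format (not the real lexicon, which `get_sentiments("nrc")` downloads via the textdata package). The `relationship = "many-to-many"` argument assumes dplyr 1.1.0 or later; with dplyr 1.1.0 or later, the original NRC join may also emit a many-to-many warning that this argument silences.

```r
library(dplyr)
library(tibble)

# Hypothetical mini-lexicon in NRC's long format:
# one row per (word, sentiment) pair, so a word can appear several times.
mini_nrc <- tibble(
  word      = c("sir", "sir", "boy", "boy", "fire"),
  sentiment = c("positive", "trust", "disgust", "negative", "fear")
)

# Toy tokens; "lamp" is not in the lexicon and will be dropped.
toy_words <- tibble(word = c("sir", "sir", "boy", "fire", "lamp"))

toy_counts <- toy_words %>%
  inner_join(mini_nrc, by = "word", relationship = "many-to-many") %>%
  count(word, sentiment, sort = TRUE)
toy_counts
```

Each occurrence of sir is counted once under positive and once under trust, which is why the two rows share the same `n`.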

Comparing the AFINN, Bing and NRC lexicons across Dickens’s work

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v tibble  3.1.0     v stringr 1.4.0
## v tidyr   1.1.3     v forcats 0.5.1
## v readr   1.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x ggplot2::annotate()      masks NLP::annotate()
## x dplyr::filter()          masks stats::filter()
## x kableExtra::group_rows() masks dplyr::group_rows()
## x dplyr::lag()             masks stats::lag()
new_afinn <- tidy_dickens %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index = linenumber %/% 5) %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(method = "AFINN")
## Joining, by = "word"
new_bing_and_nrc1 <- bind_rows(
  tidy_dickens %>% 
    inner_join(get_sentiments("bing")) %>%
    mutate(method = "Bing et al."),
  tidy_dickens %>% 
    inner_join(get_sentiments("nrc") %>% 
                 filter(sentiment %in% c("positive", 
                                         "negative"))
    ) %>%
    mutate(method = "NRC")) %>%
  count(method, index = linenumber %/% 5, sentiment) %>%
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>% 
  mutate(sentiment = positive - negative)
## Joining, by = "word"
## Joining, by = "word"
bind_rows(new_afinn, 
          new_bing_and_nrc1) %>%
  ggplot(aes(index, sentiment, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~method, ncol = 1, scales = "free_y")

Next, sentiment analysis is applied to A Tale of Two Cities on its own, again comparing the three lexicons. Each bar shows the net sentiment of one five-line chunk of the novel: positive minus negative word counts for Bing and NRC, and the sum of word scores (from -5 to +5) for AFINN. The three results are bound together into one faceted plot for comparison.
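The chunking and net-sentiment arithmetic can be seen on a handful of toy scored tokens (the `scored` tibble below is made up for illustration). Integer division `%/% 5` assigns lines 0 to 4 to chunk 0, lines 5 to 9 to chunk 1, and so on; `pivot_wider()` then puts positive and negative counts side by side, filling missing counts with 0.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy scored tokens: the line each word came from and its Bing-style label.
scored <- tibble(
  linenumber = c(1, 2, 4, 6, 7, 9, 12),
  sentiment  = c("positive", "negative", "negative",
                 "positive", "positive", "negative", "negative")
)

net <- scored %>%
  # Lines 0-4 -> chunk 0, lines 5-9 -> chunk 1, lines 10-14 -> chunk 2.
  count(index = linenumber %/% 5, sentiment) %>%
  # One row per chunk; chunk 2 has no positive words, so positive = 0.
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)
net
```

Chunk 0 has one positive and two negative words (net -1), chunk 1 has two positive and one negative (net +1), and chunk 2 has a single negative word (net -1).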

A_tale_of_two_cities <- tidy_dickens %>% 
  filter(gutenberg_id == 98)  # Project Gutenberg ID 98 is A Tale of Two Cities
A_tale_of_two_cities
## # A tibble: 46,636 x 3
##    gutenberg_id linenumber word      
##           <int>      <int> <chr>     
##  1           98       3843 tale      
##  2           98       3843 cities    
##  3           98       3845 story     
##  4           98       3845 french    
##  5           98       3845 revolution
##  6           98       3847 charles   
##  7           98       3847 dickens   
##  8           98       3850 contents  
##  9           98       3853 book      
## 10           98       3853 recalled  
## # ... with 46,626 more rows
afinn1 <- A_tale_of_two_cities %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index = linenumber %/% 5) %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(method = "AFINN")
## Joining, by = "word"
bing_and_nrc1 <- bind_rows(
  A_tale_of_two_cities %>% 
    inner_join(get_sentiments("bing")) %>%
    mutate(method = "Bing et al."),
  A_tale_of_two_cities %>% 
    inner_join(get_sentiments("nrc") %>% 
                 filter(sentiment %in% c("positive", 
                                         "negative"))
    ) %>%
    mutate(method = "NRC")) %>%
  count(method, index = linenumber %/% 5, sentiment) %>%
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>% 
  mutate(sentiment = positive - negative)
## Joining, by = "word"
## Joining, by = "word"
bind_rows(afinn1, 
          bing_and_nrc1) %>%
  ggplot(aes(index, sentiment, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~method, ncol = 1, scales = "free_y")

Most common positive and negative words:

bing_word_counts2 <- tidy_dickens %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"
bing_word_counts2
## # A tibble: 3,133 x 3
##    word   sentiment     n
##    <chr>  <chr>     <int>
##  1 miss   negative    702
##  2 poor   negative    350
##  3 dark   negative    299
##  4 hard   negative    223
##  5 dead   negative    218
##  6 strong positive    203
##  7 love   positive    202
##  8 fell   negative    198
##  9 death  negative    194
## 10 cold   negative    192
## # ... with 3,123 more rows
bing_word_counts2 %>%
  group_by(sentiment) %>%
  slice_max(n, n = 10) %>% 
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(x = "Contribution to sentiment",
       y = NULL)

Word clouds

library(wordcloud)
## Loading required package: RColorBrewer
set.seed(123) # for reproducibility 
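# tidy_dickens already has stop words removed, so no second anti_join() is needed.
# The warnings below mean the largest words did not fit on the plotting device;
# a smaller `scale` (the default is c(4, 0.5)) or a larger device may help.
tidy_dickens %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100,
          rot.per = 0.35,
          colors = brewer.pal(7, "Accent")))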
## Warning in wordcloud(word, n, max.words = 100, rot.per = 0.35, colors =
## brewer.pal(7, : time could not be fit on page. It will not be plotted.
## Warning in wordcloud(word, n, max.words = 100, rot.per = 0.35, colors =
## brewer.pal(7, : miss could not be fit on page. It will not be plotted.
## Warning in wordcloud(word, n, max.words = 100, rot.per = 0.35, colors =
## brewer.pal(7, : defarge could not be fit on page. It will not be plotted.
## Warning in wordcloud(word, n, max.words = 100, rot.per = 0.35, colors =
## brewer.pal(7, : word could not be fit on page. It will not be plotted.
## Warning in wordcloud(word, n, max.words = 100, rot.per = 0.35, colors =
## brewer.pal(7, : house could not be fit on page. It will not be plotted.

library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
tidy_dickens %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("red", "blue"),
                   max.words = 100)
## Joining, by = "word"
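`comparison.cloud()` takes a matrix with one row per word and one column per category, which is what `acast()` builds from the long counts. A minimal sketch using three rows copied from the Bing counts shown earlier: columns come out in alphabetical order (negative, then positive), and since, as I understand the wordcloud documentation, `colors` is applied per column, red marks negative words and blue positive ones.

```r
library(reshape2)

# Three (word, sentiment, n) rows taken from the Bing counts above.
counts <- data.frame(
  word      = c("poor", "dark", "love"),
  sentiment = c("negative", "negative", "positive"),
  n         = c(350, 299, 202)
)

# Rows become words, columns become sentiments; missing combinations become 0.
m <- acast(counts, word ~ sentiment, value.var = "n", fill = 0)
m
```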

Positive and negative words in A Tale of Two Cities, shown as a comparison word cloud:

A_tale_of_two_cities %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("red", "blue"),
                   max.words = 100)
## Joining, by = "word"

library(sentimentr)
library(tidyverse)

I have also incorporated an additional sentiment analysis package, sentimentr. It was designed by Tyler Rinker to quickly calculate text polarity sentiment at the sentence level and optionally aggregate by rows or grouping variable(s).

A Tale of Two Cities has one of the most heart-wrenching endings I have ever read. Although Sydney Carton does not speak before he is executed at the guillotine, Dickens ends the novel by imagining what he might have said. This hypothetical farewell speech is an amalgamation of almost every human emotion: a sense of optimism about a better future alongside pain, disappointment, anger and regret. The whole speech feels almost euphoric to me. I wanted to see how well sentimentr could identify these beautifully embedded emotions through its analysis of sentence-level polarity.

text <- "I see a beautiful city and a brilliant people rising from this abyss, and, in their struggles to be truly free, in their triumphs and defeats, through long, long years to come, I see the evil of this time and of the previous time of which this is the natural birth, gradually making expiation for itself and wearing out. I see the lives for which I lay down my life, peaceful, useful, prosperous and happy, in that England which I shall see no more. I see Her with a child upon her bosom, who bears my name. I see her father, aged and bent, but otherwise restored, and faithful to all men in his healing office, and at peace; I see the good old man, so long their friend, in ten years' time enriching them with all he has, and passing tranquilly to his reward. I see that I hold a sanctuary in their hearts, and in the hearts of their descendants, generations hence. I see her, an old woman, weeping for me on the anniversary of this day. I see her and her husband, their course done, lying side by side in their last earthly bed, and I know that each was not more honoured and held sacred in the other's soul, than I was in the souls of both. I see that child who lay upon her bosom and who bore my name, a man winning his way up in that path of life which once was mine. I see him winning it so well, that my name is made illustrious there by the light of his. I see the blots I threw upon it, faded away. I see him, foremost of just judges and honoured men, bringing a boy of my name, with a forehead that I know and golden hair, to this place - then fair to look upon, with not a trace of this day's disfigurement - and I hear him tell the child my story, with a tender and a faltering voice.
It is a far, far better thing that I do, than I have ever done; it is a far, far better rest that I go to, than I have ever known."

sentiment(text)
##     element_id sentence_id word_count   sentiment
##  1:          1           1         59  0.21481170
##  2:          1           2         25  0.55000000
##  3:          1           3         13  0.16641006
##  4:          1           4         48  0.72529628
##  5:          1           5         19  0.32118203
##  6:          1           6         15 -0.12909944
##  7:          1           7         42 -0.07715167
##  8:          1           8         29  0.11141720
##  9:          1           9         19  0.47030225
## 10:          1          10         10  0.00000000
## 11:          1          11         57  0.28477446
## 12:          1          12         31  0.28736848
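Unlike a plain word lookup, sentimentr adjusts each sentence's score for valence shifters such as negators. A minimal sketch on two made-up sentences: Bing would count “happy” as positive in both, while sentimentr should score the negated sentence below zero.

```r
library(sentimentr)

# get_sentences() splits the string into two sentences;
# sentiment() then returns one polarity score per sentence.
scores <- sentiment(get_sentences("I am happy. I am not happy."))
scores
```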
sentiment_by(text, by = NULL)
##    element_id word_count        sd ave_sentiment
## 1:          1        367 0.2544732     0.2472257
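`ave_sentiment` is not the plain mean of the 12 sentence scores above. The scores sum to about 2.925 with one zero (sentence 10), so the plain mean is 2.925 / 12 ≈ 0.2438, whereas the reported 0.2472 is consistent with sentimentr's default `average_downweighted_zero()`, which divides by the number of non-zero sentences plus sqrt(log(1 + number of zeros)), i.e. 2.925 / (11 + sqrt(log(2))). A sketch that checks this arithmetic on the scores printed above; the `manual` formula is my reading of that averaging function, not its source code.

```r
library(sentimentr)

# Sentence scores copied from the sentiment(text) output above.
x <- c(0.21481170, 0.55000000, 0.16641006, 0.72529628, 0.32118203,
       -0.12909944, -0.07715167, 0.11141720, 0.47030225, 0.00000000,
       0.28477446, 0.28736848)

plain   <- average_mean(x)                # ordinary mean, about 0.2438
default <- average_downweighted_zero(x)   # sentiment_by() default, about 0.2472

# Zeros are down-weighted: they add sqrt(log(1 + n_zero)) to the
# denominator instead of counting as full sentences.
manual <- sum(x) / (sum(x != 0) + sqrt(log(1 + sum(x == 0))))
c(plain = plain, default = default, manual = manual)
```

To report the plain mean instead, `sentiment_by(text, averaging.function = average_mean)` can be used.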
emotional_analysis <- emotion(text)
emotional_analysis
##      element_id sentence_id word_count     emotion_type emotion_count
##   1:          1           1         59            anger             1
##   2:          1           1         59     anticipation             7
##   3:          1           1         59          disgust             1
##   4:          1           1         59             fear             3
##   5:          1           1         59              joy             4
##  ---                                                                 
## 188:          1          12         31     fear_negated             0
## 189:          1          12         31      joy_negated             0
## 190:          1          12         31  sadness_negated             0
## 191:          1          12         31 surprise_negated             0
## 192:          1          12         31    trust_negated             0
##         emotion
##   1: 0.01694915
##   2: 0.11864407
##   3: 0.01694915
##   4: 0.05084746
##   5: 0.06779661
##  ---           
## 188: 0.00000000
## 189: 0.00000000
## 190: 0.00000000
## 191: 0.00000000
## 192: 0.00000000
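The `emotion` column above appears to be `emotion_count` divided by `word_count`: sentence 1 has 59 words, and 1/59 ≈ 0.01695 (anger) while 7/59 ≈ 0.11864 (anticipation). A small sketch that checks this relationship on two made-up sentences:

```r
library(sentimentr)

# Two toy sentences, one with joy-type words and one with fear-type words.
e <- emotion(get_sentences("I love this happy day. I fear the dark night."))

# Share of each sentence's words tagged with each emotion type.
shares <- e$emotion_count / e$word_count
head(data.frame(type = e$emotion_type, count = e$emotion_count,
                words = e$word_count, reported = e$emotion, computed = shares))
```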

Using the sentimentr package, I have also analyzed the 2012 US presidential debates, comparing the polarity of each speaker’s sentences. The dataset, presidential_debates_2012, ships with the sentimentr package (see its GitHub page).

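debates <- presidential_debates_2012  # dataset included in sentimentr

debates %>%
  get_sentences() %>%
  sentiment() -> debate_sentiments

debate_sentiments %>%
  ggplot() + geom_density(aes(sentiment))

# Sentences scoring exactly 0 are neutral; label them separately
# instead of counting them as negative.
debate_sentiments %>%
  mutate(polarity_level = case_when(sentiment > 0 ~ "Positive",
                                    sentiment < 0 ~ "Negative",
                                    TRUE ~ "Neutral")) %>%
  count(person, polarity_level) %>%
  ggplot() + geom_col(aes(x = person, y = n, fill = polarity_level))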
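Raw sentence counts partly reflect how much each person spoke, so proportions may compare speakers more fairly. A minimal sketch that summarises the sentence-level scores per speaker; the column names `sentences`, `mean_sentiment`, `share_positive` and `share_negative` are my own choices, not part of sentimentr.

```r
library(sentimentr)
library(dplyr)

speaker_summary <- presidential_debates_2012 %>%
  get_sentences() %>%
  sentiment() %>%
  as_tibble() %>%
  group_by(person) %>%
  summarise(
    sentences      = n(),                 # how many sentences each person spoke
    mean_sentiment = mean(sentiment),     # average sentence polarity
    share_positive = mean(sentiment > 0), # proportion of positive sentences
    share_negative = mean(sentiment < 0), # proportion of negative sentences
    .groups = "drop"
  )
speaker_summary
```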

Sources

1. Silge, J., & Robinson, D. (n.d.). Text mining with R: A tidy approach. https://www.tidytextmining.com/sentiment.html

2. Rinker, T. (n.d.). trinker/sentimentr. Retrieved April 19, 2021, from https://github.com/trinker/sentimentr