ASSIGNMENT

In Text Mining with R, Chapter 2 looks at Sentiment Analysis. In this assignment, you should start by getting the primary example code from chapter 2 working in an R Markdown document. You should provide a citation to this base code. You’re then asked to extend the code in two ways:

Work with a different corpus of your choosing, and incorporate at least one additional sentiment lexicon (possibly from another R package that you’ve found through research).

As usual, please submit links to both an .Rmd file posted in your GitHub repository and to your code on rpubs.com.

Sentiment analysis with tidy data

The code chunks and text below are from Chapter 2 of Text Mining with R (Silge and Robinson, 2020).

First, we will load the required libraries and take a look at the different sentiment lexicons.

library(janeaustenr)
library(tidyverse)

## -- Attaching packages ---------------- tidyverse 1.3.0 --

## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0

## -- Conflicts ------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(stringr)
library(tidytext)
library(jsonlite)

## 
## Attaching package: 'jsonlite'

## The following object is masked from 'package:purrr':
## 
##     flatten

library(dplyr)
library(ggplot2)

get_sentiments("afinn")

## # A tibble: 2,477 x 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # ... with 2,467 more rows

get_sentiments("bing")

## # A tibble: 6,786 x 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # ... with 6,776 more rows

get_sentiments("nrc")

## # A tibble: 13,901 x 2
##    word        sentiment
##    <chr>       <chr>    
##  1 abacus      trust    
##  2 abandon     fear     
##  3 abandon     negative 
##  4 abandon     sadness  
##  5 abandoned   anger    
##  6 abandoned   fear     
##  7 abandoned   negative 
##  8 abandoned   sadness  
##  9 abandonment anger    
## 10 abandonment fear     
## # ... with 13,891 more rows

Sentiment analysis with inner join

Let’s look at the words with a joy score from the NRC lexicon. What are the most common joy words in Emma?

tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
      ignore_case = TRUE
    )))
  ) %>%
  ungroup() %>%
  unnest_tokens(word, text)

Now that the text is in a tidy format with one word per row, we are ready to do the sentiment analysis. First, let’s use the NRC lexicon and filter() for the joy words. Next, let’s filter() the data frame with the text from the books for the words from Emma and then use inner_join() to perform the sentiment analysis. What are the most common joy words in Emma?

nrc_joy <- get_sentiments("nrc") %>%
  filter(sentiment == "joy")

tidy_books %>%
  filter(book == "Emma") %>%
  inner_join(nrc_joy) %>%
  count(word, sort = TRUE)

## Joining, by = "word"

## # A tibble: 303 x 2
##    word        n
##    <chr>   <int>
##  1 good      359
##  2 young     192
##  3 friend    166
##  4 hope      143
##  5 happy     125
##  6 love      117
##  7 deal       92
##  8 found      92
##  9 present    89
## 10 kind       82
## # ... with 293 more rows
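To see what inner_join() does here, a small made-up example may help. This is only a sketch: the `words` and `lexicon` tibbles below are hypothetical toy data, not the real Emma text or the NRC lexicon. Only words that appear in both tables survive the join, so everything absent from the lexicon is silently dropped.

```r
library(dplyr)

# Hypothetical tokens, one word per row (as unnest_tokens() would produce)
words <- tibble(word = c("good", "friend", "the", "good", "walk"))

# Hypothetical mini-lexicon standing in for the NRC joy words
lexicon <- tibble(word = c("good", "friend", "hope"), sentiment = "joy")

# Keep only the words found in the lexicon, then count them
joy_counts <- words %>%
  inner_join(lexicon, by = "word") %>%
  count(word, sort = TRUE)

joy_counts
```

Here "the" and "walk" disappear because the toy lexicon does not contain them, and "hope" never appears because the text does not contain it.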

Next, we count up how many positive and negative words there are in defined sections of each book. We define an index here to keep track of where we are in the narrative; this index (using integer division) counts up sections of 80 lines of text. Small sections of text may not have enough words in them to get a good estimate of sentiment while really large sections can wash out narrative structure. For these books, using 80 lines works well, but this can vary depending on individual texts, how long the lines were to start with, etc. We then use spread() so that we have negative and positive sentiment in separate columns, and lastly calculate a net sentiment (positive - negative).

jane_austen_sentiment <- tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(book, index = linenumber %/% 80, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
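The indexing and net-sentiment steps can be checked on a few rows of made-up data. The sketch below is an illustration under stated assumptions: `toy` is hypothetical data, and it uses pivot_wider(), which tidyr now recommends in place of the superseded spread(); the original code above still works as written.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: one row per sentiment-bearing word, with its line number
toy <- tibble(
  linenumber = c(1, 15, 79, 80, 95, 160, 161, 170),
  sentiment  = c("positive", "negative", "positive", "negative",
                 "negative", "positive", "positive", "positive")
)

toy_sentiment <- toy %>%
  # Integer division puts lines 0-79 in section 0, 80-159 in section 1, and so on
  count(index = linenumber %/% 80, sentiment) %>%
  # One column per sentiment; sections with no words of a class get 0
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  # Net sentiment per section
  mutate(sentiment = positive - negative)

toy_sentiment
```

Section 1 contains only negative words, so without the fill value its positive count would be missing rather than zero and the subtraction would return NA.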

Now we can plot these sentiment scores across the plot trajectory of each novel.

ggplot(jane_austen_sentiment, aes(index, sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, ncol = 2, scales = "free_x")

Comparing the three sentiment dictionaries

With several options for sentiment lexicons, you might want some more information on which one is appropriate for your purposes. Let’s use all three sentiment lexicons and examine how the sentiment changes across the narrative arc of Pride and Prejudice.

pride_prejudice <- tidy_books %>%
  filter(book == "Pride & Prejudice")

pride_prejudice

## # A tibble: 122,204 x 4
##    book              linenumber chapter word     
##    <fct>                  <int>   <int> <chr>    
##  1 Pride & Prejudice          1       0 pride    
##  2 Pride & Prejudice          1       0 and      
##  3 Pride & Prejudice          1       0 prejudice
##  4 Pride & Prejudice          3       0 by       
##  5 Pride & Prejudice          3       0 jane     
##  6 Pride & Prejudice          3       0 austen   
##  7 Pride & Prejudice          7       1 chapter  
##  8 Pride & Prejudice          7       1 1        
##  9 Pride & Prejudice         10       1 it       
## 10 Pride & Prejudice         10       1 is       
## # ... with 122,194 more rows

afinn <- pride_prejudice %>%
  inner_join(get_sentiments("afinn")) %>%
  group_by(index = linenumber %/% 80) %>%
  summarise(sentiment = sum(value)) %>%
  mutate(method = "AFINN")

bing_and_nrc <- bind_rows(
  pride_prejudice %>%
    inner_join(get_sentiments("bing")) %>%
    mutate(method = "Bing et al."),
  pride_prejudice %>%
    inner_join(get_sentiments("nrc") %>%
      filter(sentiment %in% c(
        "positive",
        "negative"
      ))) %>%
    mutate(method = "NRC")
) %>%
  count(method, index = linenumber %/% 80, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)

We now have an estimate of the net sentiment (positive - negative) in each chunk of the novel text for each sentiment lexicon. Let’s bind them together and visualize them.

bind_rows(
  afinn,
  bing_and_nrc
) %>%
  ggplot(aes(index, sentiment, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~method, ncol = 1, scales = "free_y")

Why is the result for the NRC lexicon biased so high in sentiment compared to the Bing et al. result? Let’s look briefly at how many positive and negative words are in these lexicons.

get_sentiments("nrc") %>%
  filter(sentiment %in% c(
    "positive",
    "negative"
  )) %>%
  count(sentiment)

## # A tibble: 2 x 2
##   sentiment     n
##   <chr>     <int>
## 1 negative   3324
## 2 positive   2312

get_sentiments("bing") %>%
  count(sentiment)

## # A tibble: 2 x 2
##   sentiment     n
##   <chr>     <int>
## 1 negative   4781
## 2 positive   2005
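Both lexicons contain more negative than positive words, but the share of negative words is noticeably higher in Bing, which is consistent with Bing producing lower net sentiment than NRC. The short calculation below makes the comparison explicit; the counts are copied from the two outputs above, and `neg_share` is a helper name introduced here for illustration.

```r
# Counts copied from the count(sentiment) outputs above
nrc  <- c(negative = 3324, positive = 2312)
bing <- c(negative = 4781, positive = 2005)

# Share of negative words in a lexicon (hypothetical helper)
neg_share <- function(counts) unname(counts["negative"] / sum(counts))

nrc_share  <- neg_share(nrc)   # roughly 0.59
bing_share <- neg_share(bing)  # roughly 0.70

round(c(NRC = nrc_share, Bing = bing_share), 3)
```

This is only one contributing factor; how well each lexicon's vocabulary matches the words Austen actually uses also matters.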

Most common positive and negative words

bing_word_counts <- tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()

## Joining, by = "word"

bing_word_counts

## # A tibble: 2,585 x 3
##    word     sentiment     n
##    <chr>    <chr>     <int>
##  1 miss     negative   1855
##  2 well     positive   1523
##  3 good     positive   1380
##  4 great    positive    981
##  5 like     positive    725
##  6 better   positive    639
##  7 enough   positive    613
##  8 happy    positive    534
##  9 love     positive    495
## 10 pleasure positive    462
## # ... with 2,575 more rows

bing_word_counts %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(
    y = "Contribution to sentiment",
    x = NULL
  ) +
  coord_flip()

## Selecting by n

custom_stop_words <- bind_rows(
  tibble(
    word = c("miss"),
    lexicon = c("custom")
  ),
  stop_words
)

custom_stop_words

## # A tibble: 1,150 x 2
##    word        lexicon
##    <chr>       <chr>  
##  1 miss        custom 
##  2 a           SMART  
##  3 a's         SMART  
##  4 able        SMART  
##  5 about       SMART  
##  6 above       SMART  
##  7 according   SMART  
##  8 accordingly SMART  
##  9 across      SMART  
## 10 actually    SMART  
## # ... with 1,140 more rows

Wordclouds

Let’s look at the most common words in Jane Austen’s works as a whole.

library(wordcloud)

## Warning: package 'wordcloud' was built under R version 4.0.3

## Loading required package: RColorBrewer

tidy_books %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))

## Joining, by = "word"

Let’s do the sentiment analysis to tag positive and negative words using an inner join, then find the most common positive and negative words. Until the step where we need to send the data to comparison.cloud(), this can all be done with joins, piping, and dplyr because our data is in tidy format.

library(reshape2)

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(
    colors = c("gray20", "gray80"),
    max.words = 100
  )

## Joining, by = "word"

Looking at units beyond just words

We may want to tokenize text into sentences, and it makes sense to use a new name for the output column in such a case.

PandP_sentences <- tibble(text = prideprejudice) %>%
  unnest_tokens(sentence, text, token = "sentences")

PandP_sentences$sentence[2]

## [1] "however little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters."

austen_chapters <- austen_books() %>%
  group_by(book) %>%
  unnest_tokens(chapter, text,
    token = "regex",
    pattern = "Chapter|CHAPTER [\\dIVXLC]"
  ) %>%
  ungroup()
# unnest splits into tokens using a regex pattern
austen_chapters %>%
  group_by(book) %>%
  summarise(chapters = n())

## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 6 x 2
##   book                chapters
##   <fct>                  <int>
## 1 Sense & Sensibility       51
## 2 Pride & Prejudice         62
## 3 Mansfield Park            49
## 4 Emma                      56
## 5 Northanger Abbey          32
## 6 Persuasion                25

Let’s find the number of negative words in each chapter and divide by the total words in each chapter. For each book, which chapter has the highest proportion of negative words?

bingnegative <- get_sentiments("bing") %>%
  filter(sentiment == "negative")

wordcounts <- tidy_books %>%
  group_by(book, chapter) %>%
  summarize(words = n())

## `summarise()` regrouping output by 'book' (override with `.groups` argument)

tidy_books %>%
  semi_join(bingnegative) %>%
  group_by(book, chapter) %>%
  summarize(negativewords = n()) %>%
  left_join(wordcounts, by = c("book", "chapter")) %>%
  mutate(ratio = negativewords / words) %>%
  filter(chapter != 0) %>%
  top_n(1) %>%
  ungroup()

## Joining, by = "word"
## `summarise()` regrouping output by 'book' (override with `.groups` argument)

## Selecting by ratio

## # A tibble: 6 x 5
##   book                chapter negativewords words  ratio
##   <fct>                 <int>         <int> <int>  <dbl>
## 1 Sense & Sensibility      43           161  3405 0.0473
## 2 Pride & Prejudice        34           111  2104 0.0528
## 3 Mansfield Park           46           173  3685 0.0469
## 4 Emma                     15           151  3340 0.0452
## 5 Northanger Abbey         21           149  2982 0.0500
## 6 Persuasion                4            62  1807 0.0343

Work with a different corpus of your choosing: NYT movie reviews

This section extends my Week 9 assignment, in which I looked at New York Times (NYT) reviews of movies released in 2019. Here I perform sentiment analysis on the short summary (summary_short) of each NYT review.

Fetching data from API

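url <- "https://api.nytimes.com/svc/movies/v2/reviews/search.json?opening-date=2019-01-01;2020-01-01"
# Read the API key from an environment variable instead of hard-coding it
# (NYT_API_KEY is an assumed variable name; set it in .Renviron)
key <- Sys.getenv("NYT_API_KEY")
addurl <- paste0(url, "&api-key=")
# Fetch the JSON response and keep the results data frame
data <- fromJSON(paste0(addurl, key))
df <- data$results
knitr::kable(df)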

## Warning in `[<-.data.frame`(`*tmp*`, , j, value = structure(list(type =
## structure(c("article", : provided 3 variables to replace 1 variables

## Warning in `[<-.data.frame`(`*tmp*`, , j, value = structure(list(type =
## structure(c("mediumThreeByTwo210", : provided 4 variables to replace 1 variables

display_title	mpaa_rating	critics_pick	byline	headline	summary_short	publication_date	opening_date	date_updated	link	multimedia
The Devil Has a Name	R	0	Ben Kenigsberg	‘The Devil Has a Name’ Review: A Little Guy Takes On Big Oil	A farmer sues an oil company in this well-meaning but muddled drama directed by Edward James Olmos.	2020-10-15	2019-08-04	2020-10-15 11:04:07	article	mediumThreeByTwo210
The Cuban		0	Glenn Kenny	‘The Cuban’ Review: Memories Lost and Reignited	Louis Gossett Jr. plays a musician with Alzheimer’s disease whose new nurse helps him reach back into his past.	2020-07-30	2019-12-07	2020-07-30 11:04:04	article	mediumThreeByTwo210
Spark		0	Ben Kenigsberg	‘Spark’ and ‘The Observer’ Review: A Filmmaker’s Past, and China’s	A pair of documentaries serve as an introduction to Hu Jie, a documentarian whose films memorialize the horrors of the Mao era.	2020-07-02	2019-12-31	2020-07-02 15:16:02	article	mediumThreeByTwo210
Unsettled: Seeking Refuge in America		0	Ben Kenigsberg	‘Unsettled: Seeking Refuge in America’ Review: Embracing a New Home	A documentary on L.G.B.T.Q. refugees becomes progressively engaging as its subjects’ paths diverge.	2020-06-29	2019-04-01	2020-06-29 20:02:02	article	mediumThreeByTwo210
Parkland Rising		0	Teo Bugbee	‘Parkland Rising’ Review: A Close-Up on Activism After a Tragedy	A documentary profiles students and parents who became organizers after the school shooting, but doesn’t provide a lot of fresh insight.	2020-06-04	2019-10-04	2020-06-04 11:04:03	article	mediumThreeByTwo210
Citizen K		0	Ben Kenigsberg	‘Citizen K’ Review: Trying to Pin Down a Russian Oligarch	A detailed documentary on Mikhail Khodorkovsky proves slightly unsatisfying.	2020-01-14	2019-11-22	2020-02-12 17:44:01	article	mediumThreeByTwo210
Ghost Stories		0	Bilal Qureshi	‘Ghost Stories’ Review: Bollywood Aims for Frights	With this Netflix anthology, four directors from Indian cinema draw horror from a country’s lived reality.	2020-01-02	2019-12-31	2020-01-02 12:04:02	article	mediumThreeByTwo210
One Cut of the Dead	Not Rated	1	Elisabeth Vincentelli	‘One Cut of the Dead’ Review: A Fresh Take on the Zombie Flick	A one-take movie stunt is justified in the Japanese director Shinichiro Ueda’s fast and furious backstage comedy.	2019-12-25	2019-09-24	2019-12-25 14:04:02	article	mediumThreeByTwo210
Clemency	R	0	Manohla Dargis	‘Clemency’ Review: No Place for Mercy	A tremendous Alfre Woodard plays a warden at a prison whose world is upended by the fate of death-row inmates.	2019-12-25	2019-12-27	2020-01-17 17:44:02	article	mediumThreeByTwo210
The 21st Annual Animation Show of Shows		0	Glenn Kenny	Review: Animated Shorts of Every Stripe and Feather	Find a pen-and-ink dog, stop-motion girl and a C.G.I. fox in “The 21st Annual Animation Show of Shows.”	2019-12-24	2019-12-25	2020-01-13 17:44:01	article	mediumThreeByTwo210
What She Said: The Art of Pauline Kael		0	Jeannette Catsoulis	‘What She Said’ Review: Pauline Kael, Screen Queen	Kael’s distinctively passionate voice, competing with movie fragments, is disastrously muffled, as are those of her admirers and detractors.	2019-12-24	2019-12-25	2020-01-15 17:44:02	article	mediumThreeByTwo210
1917	R	0	Manohla Dargis	‘1917’ Review: Paths of Technical Glory	Sam Mendes directs this visually extravagant drama about young British soldiers on a perilous mission in World War I.	2019-12-24	2019-12-25	2020-01-24 17:44:02	article	mediumThreeByTwo210
Spies in Disguise	PG	0	Glenn Kenny	‘Spies in Disguise’ Review: Smug Agent Meets Gadget Geek	Will Smith and Tom Holland are an action odd couple in this animated comedy.	2019-12-24	2019-12-25	2020-01-24 17:44:02	article	mediumThreeByTwo210
The Song of Names	PG-13	0	Ben Kenigsberg	‘The Song of Names’ Review: A Prodigy, a War and a Mystery	A young violinist goes missing in London in 1951. The eventual answer as to why is powerful.	2019-12-24	2019-12-25	2020-01-17 17:44:02	article	mediumThreeByTwo210
Little Women	PG	1	A.O. Scott	‘Little Women’ Review: This Movie Is Big	Greta Gerwig refreshes a literary classic with the help of a dazzling cast that includes Saoirse Ronan, Florence Pugh, Laura Dern and Meryl Streep.	2019-12-23	2019-12-25	2020-01-23 17:44:01	article	mediumThreeByTwo210
Dabangg 3		0	Rachel Saltz	‘Dabangg 3’ Review: A Hero From the School of Knock ’em Hard	In this Bollywood action flick, Salman Khan is a one-man wrecking crew. When not knocking heads, he dances.	2019-12-22	2019-12-20	2020-01-09 17:44:01	article	mediumThreeByTwo210
Invisible Life	R	1	Glenn Kenny	‘Invisible Life’ Review: Sisterhood Is Stronger Than Patriarchy	Two sisters living in 1950s Brazil are kept apart by their father but can’t be spiritually separated.	2019-12-19	2019-12-20	2020-01-19 17:44:02	article	mediumThreeByTwo210
Togo	PG	0	Jason Bailey	‘Togo’ Review: A Man, His Dogs and a Very Bad Storm	Willem Dafoe stars in the latest addition to Disney’s sled dog canon.	2019-12-19	2019-12-20	2019-12-19 12:04:02	article	mediumThreeByTwo210
She’s Missing		0	Jeannette Catsoulis	‘She’s Missing’ Review: Gone Girl	An ominous atmosphere of impermanence marks this story of a New Mexico waitress who embarks on a perilous search for her vanished friend.	2019-12-19	2019-12-20	2019-12-19 12:04:03	article	mediumThreeByTwo210
Cats	PG	0	Manohla Dargis	‘Cats’ Review: They Dance, They Sing, They Lick Their Digital Fur	Tom Hooper’s movie is not a catastrophe. It’s not even an epic hairball.	2019-12-19	2019-12-20	2020-01-19 17:44:02	article	mediumThreeByTwo210

Sentimentr

We will use the sentimentr package to gauge the sentiment conveyed by each review summary as a whole. sentimentr quickly scores text at the sentence level and adjusts for valence shifters such as negation (for example, “not good”). The sign of each score indicates whether the sentiment is negative, neutral (zero) or positive; scores usually fall between -1 and 1, although they are not strictly bounded.

library(sentimentr)

## Warning: package 'sentimentr' was built under R version 4.0.3

library(data.table)

## Warning: package 'data.table' was built under R version 4.0.3

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:reshape2':
## 
##     dcast, melt

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

## The following object is masked from 'package:purrr':
## 
##     transpose

sentiment <- sentiment_by(df$summary_short)
View(sentiment)

The first column (element_id) identifies the movies in the order they appear in the table above, and word_count is the number of words in each review summary. The sentimentr package scores each sentence in a summary separately, then reports the average score (ave_sentiment) and the standard deviation (sd) per summary. Most of the summaries are only one sentence long, which is why the sd column is mostly NA.

Summarizing the sentiments

I want to convert the average sentiment scores into the following categories: positive, neutral and negative.

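# Convert the data.table returned by sentiment_by() to a plain data frame
sentiment_df <- setDF(sentiment)

# Map an average sentiment score to a class:
# below -0.3 is Negative, from -0.3 up to (not including) 0.3 is Neutral,
# and 0.3 or above is Positive
get_sentiment_class <- function(ave_sentiment) {
  sentiment_class <- "Positive"
  if (ave_sentiment < -0.3) {
    sentiment_class <- "Negative"
  } else if (ave_sentiment < 0.3) {
    sentiment_class <- "Neutral"
  }
  sentiment_class
}

# Replace each numeric score with its class label
sentiment_df$ave_sentiment <-
  sapply(sentiment_df$ave_sentiment, get_sentiment_class)
sentiment_df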

##    element_id word_count         sd ave_sentiment
## 1           1         18         NA      Negative
## 2           2         20         NA       Neutral
## 3           3         22         NA       Neutral
## 4           4         17         NA       Neutral
## 5           5         22         NA      Positive
## 6           6          9         NA       Neutral
## 7           7         17         NA       Neutral
## 8           8         19         NA       Neutral
## 9           9         21         NA      Negative
## 10         10         23         NA       Neutral
## 11         11         20         NA       Neutral
## 12         12         19         NA       Neutral
## 13         13         14         NA       Neutral
## 14         14         16 0.21250000       Neutral
## 15         15         24         NA       Neutral
## 16         16         19 0.04902903       Neutral
## 17         17         18         NA       Neutral
## 18         18         13         NA       Neutral
## 19         19         23         NA       Neutral
## 20         20         15 0.40130899       Neutral

ggplot(data = sentiment_df, aes(x = ave_sentiment, fill = ave_sentiment)) +
  geom_bar()

Most reviews (17 of 20) were classed as neutral. Among the rest, two were negative and only one was positive.
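The thresholding step could also be written as a single vectorized function, which avoids the sapply() call. This is a minimal sketch rather than the code used for the results above: `classify_sentiment` and its `cutoff` argument are names introduced here, and the cutoff of 0.3 simply mirrors the one chosen earlier.

```r
# Vectorized classifier (hypothetical helper): scores below -cutoff are Negative,
# scores from -cutoff up to (not including) cutoff are Neutral, the rest Positive
classify_sentiment <- function(score, cutoff = 0.3) {
  ifelse(score < -cutoff, "Negative",
         ifelse(score < cutoff, "Neutral", "Positive"))
}

# Made-up scores that exercise both boundaries
example_scores <- c(-0.5, -0.3, 0, 0.29, 0.3, 0.8)
example_classes <- classify_sentiment(example_scores)
example_classes
```

Because the comparisons are strict, a score of exactly -0.3 is Neutral and a score of exactly 0.3 is Positive, the same as in the if/else version.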

Afinn

Let’s check whether the AFINN lexicon gives similar results:

x <- tibble(txt = df$summary_short)
x <- x %>% unnest_tokens(word, txt)

library(plyr)

## ------------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## ------------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

## The following object is masked from 'package:purrr':
## 
##     compact

y <- join(x, get_sentiments("afinn"), type = "inner")

## Joining by: word

##           word value
## 1        helps     2
## 2        reach     1
## 3        fresh     1
## 4    justified     2
## 5      furious    -3
## 6       comedy     1
## 7       prison    -2
## 8        death    -2
## 9         stop    -1
## 10  passionate     2
## 11         war    -2
## 12         odd    -2
## 13      comedy     1
## 14     missing    -2
## 15    powerful     2
## 16        help     2
## 17     ominous     3
## 18 catastrophe    -3

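y_df <- setDF(y)

# Map an AFINN word score (an integer from -5 to 5) to a class:
# below -3 is Negative, from -3 to 2 is Neutral, and 3 or above is Positive.
# Note that this redefines the get_sentiment_class() used for sentimentr above.
get_sentiment_class <- function(value) {
  sentiment_class <- "Positive"
  if (value < -3) {
    sentiment_class <- "Negative"
  } else if (value < 3) {
    sentiment_class <- "Neutral"
  }
  sentiment_class
}

# Replace each numeric word score with its class label
y_df$value <-
  sapply(y_df$value, get_sentiment_class)
y_df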

##           word    value
## 1        helps  Neutral
## 2        reach  Neutral
## 3        fresh  Neutral
## 4    justified  Neutral
## 5      furious  Neutral
## 6       comedy  Neutral
## 7       prison  Neutral
## 8        death  Neutral
## 9         stop  Neutral
## 10  passionate  Neutral
## 11         war  Neutral
## 12         odd  Neutral
## 13      comedy  Neutral
## 14     missing  Neutral
## 15    powerful  Neutral
## 16        help  Neutral
## 17     ominous Positive
## 18 catastrophe  Neutral

ggplot(data = y_df, aes(x = value, fill = value)) +
  geom_bar()

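With AFINN, too, the neutral class dominates. Note that this step classifies individual matched words rather than whole reviews: 17 of the 18 matched words are labelled neutral, one (‘ominous’) is positive, and none is negative. Part of this comes from the cutoffs: only scores below -3 count as negative, so words scoring -3, such as ‘furious’ and ‘catastrophe’, fall in the neutral band.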
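To compare AFINN with sentimentr on equal terms, the word scores could be summed per review instead of classified word by word. The sketch below is an assumption-laden illustration, not part of the original analysis: it uses base R, two review summaries from the table above, and a hand-copied subset of AFINN scores taken from the join output (`afinn_subset` and `score_review` are names introduced here), so it runs without downloading the lexicon.

```r
# Subset of AFINN scores copied from the join output above (not the full lexicon)
afinn_subset <- c(fresh = 1, justified = 2, furious = -3,
                  comedy = 1, catastrophe = -3)

# Two review summaries from the table above
reviews <- c(
  "A one-take movie stunt is justified in the Japanese director Shinichiro Ueda's fast and furious backstage comedy.",
  "Tom Hooper's movie is not a catastrophe. It's not even an epic hairball."
)

# Sum the scores of every word in the text that appears in the lexicon
score_review <- function(text, lexicon) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  sum(lexicon[words[words %in% names(lexicon)]])
}

review_scores <- vapply(reviews, score_review, numeric(1),
                        lexicon = afinn_subset, USE.NAMES = FALSE)
review_scores
```

With this subset, the first summary nets out to zero and the second scores -3 even though it says the film is "not a catastrophe", which shows how a purely word-level lexicon misses negation.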

Which lexicon did you find most useful for your corpus, and why?

I found the sentimentr package more useful for my corpus because it scores whole sentences and can therefore account for the context in which a word is used.

Moreover, AFINN is limited to the words the lexicon contains: only 18 word occurrences across the 20 summaries matched. It is also interesting that AFINN gives the word ‘ominous’ a positive value (3).

References

Silge, Julia, and David Robinson. “Text Mining with R.” 2 Sentiment Analysis with Tidy Data, 29 Oct. 2020, www.tidytextmining.com/sentiment.html.

Week 10 Assignment

Atina Karim

10/29/2020