Week10Assignment

#The function get_sentiments() allows us to get specific sentiment lexicons with the appropriate measures for each one.

#install.packages("afinn")
#install.packages("bing")
#install.packages("nrc")

#I had to install the textdata package because I could not install afinn, bing, or nrc separately

#install.packages("textdata")
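As a hedged aside: in the tidytext versions I am aware of, the Bing lexicon ships inside tidytext itself, while AFINN, NRC and Loughran are fetched through textdata the first time they are requested (usually after an interactive licence prompt). A minimal sketch that needs no download:

```r
library(tidytext)

# "bing" is bundled with tidytext, so this call should work offline.
bing <- get_sentiments("bing")

# One row per word/label pair, in two columns.
names(bing)
unique(bing$sentiment)
```

If this runs but get_sentiments("afinn") fails, the missing piece is most likely the textdata package or the one-time download it triggers.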

library(tidytext)

get_sentiments("afinn")

## # A tibble: 2,477 × 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # ℹ 2,467 more rows

get_sentiments("bing")

## # A tibble: 6,786 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # ℹ 6,776 more rows

get_sentiments("nrc")

## # A tibble: 13,872 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 abacus      trust    
##  2 abandon     fear     
##  3 abandon     negative 
##  4 abandon     sadness  
##  5 abandoned   anger    
##  6 abandoned   fear     
##  7 abandoned   negative 
##  8 abandoned   sadness  
##  9 abandonment anger    
## 10 abandonment fear     
## # ℹ 13,862 more rows

Title: Sentiment analysis with inner join

#THE ACTION (what we are doing): Let’s look at the words with a joy score from the NRC lexicon.
#THE QUESTION: What are the most common joy words in Emma?
#The steps break down as follows: first, we take the text of the novels and convert it to the tidy format using unnest_tokens(), just as we did in Section 1.3.
#Let’s also set up some other columns to keep track of which line and chapter of the book each word comes from; we use group_by() and mutate() to construct those columns.

library(janeaustenr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(stringr)

tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(text, 
                                regex("^chapter [\\divxlc]", 
                                      ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)

WHAT IS HAPPENING: We chose the name word for the output column from unnest_tokens(). This is a convenient choice because the sentiment lexicons and stop word datasets have columns named word; performing inner joins and anti-joins is thus easier.

#Now that the text is in a tidy format with one word per row, we are ready to do the sentiment analysis.
#First, let’s use the NRC lexicon and filter() for the joy words.
#Next, let’s filter() the data frame with the text from the books for the words from Emma and then use inner_join() to perform the sentiment analysis.
#What are the most common joy words in Emma? Let’s use count() from dplyr.
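To make the point about the shared word column concrete, here is a small sketch on invented data (the tokens, lexicon and stop-word list below are toy examples, not the real datasets):

```r
library(dplyr)
library(tibble)

# Toy tokens, a toy lexicon and a toy stop-word list (all invented).
tokens  <- tibble(word = c("the", "happy", "friend", "is", "sad"))
lexicon <- tibble(word = c("happy", "sad"),
                  sentiment = c("positive", "negative"))
stops   <- tibble(word = c("the", "is"))

# inner_join() keeps only the tokens that appear in the lexicon.
scored <- inner_join(tokens, lexicon, by = "word")

# anti_join() drops the tokens that appear in the stop-word list.
content <- anti_join(tokens, stops, by = "word")

scored
content
```

Because every table uses the same column name, neither join needs a renaming step.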

nrc_joy <- get_sentiments("nrc") %>% 
  filter(sentiment == "joy")

tidy_books %>%
  filter(book == "Emma") %>%
  inner_join(nrc_joy) %>%
  count(word, sort = TRUE)

## Joining with `by = join_by(word)`

## # A tibble: 301 × 2
##    word          n
##    <chr>     <int>
##  1 good        359
##  2 friend      166
##  3 hope        143
##  4 happy       125
##  5 love        117
##  6 deal         92
##  7 found        92
##  8 present      89
##  9 kind         82
## 10 happiness    76
## # ℹ 291 more rows

#We can also examine how sentiment changes throughout each novel. We can do this with just a handful of lines that are mostly dplyr functions.
#First, we find a sentiment score for each word using the Bing lexicon and inner_join().
#Next, we count up how many positive and negative words there are in defined sections of each book.
#We define an index here to keep track of where we are in the narrative; this index (using integer division) counts up sections of 80 lines of text.
#We then use pivot_wider() so that we have negative and positive sentiment in separate columns, and lastly calculate a net sentiment (positive - negative).

library(tidyr)

jane_austen_sentiment <- tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(book, index = linenumber %/% 80, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>% 
  mutate(sentiment = positive - negative)

## Joining with `by = join_by(word)`

## Warning in inner_join(., get_sentiments("bing")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 435434 of `x` matches multiple rows in `y`.
## ℹ Row 5051 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
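The index and net-sentiment steps used above can be checked on a toy example. The line numbers and labels here are invented; only the pattern (integer division, count(), pivot_wider(), mutate()) mirrors the code in this section:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy data: one row per matched word, with its line number (invented).
toy <- tibble(
  linenumber = c(1, 40, 79, 80, 81, 159, 160),
  sentiment  = c("positive", "negative", "positive",
                 "negative", "negative", "positive", "positive")
)

net <- toy %>%
  # Lines 0-79 fall in section 0, lines 80-159 in section 1, and so on.
  count(index = linenumber %/% 80, sentiment) %>%
  # One column per label; sections with no words for a label get 0.
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)

net
```

Section 0 has two positive words and one negative, so its net score is 1; section 1 comes out at -1 and section 2 at 1.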

#Now we can plot these sentiment scores across the plot trajectory of each novel.
#Notice that we are plotting against the index on the x-axis that keeps track of narrative time in sections of text.

library(ggplot2)

ggplot(jane_austen_sentiment, aes(index, sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, ncol = 2, scales = "free_x")

Title: Comparing the three sentiment dictionaries

First, let’s use filter() to choose only the words from the one novel we are interested in.

pride_prejudice <- tidy_books %>% 
  filter(book == "Pride & Prejudice")

pride_prejudice

## # A tibble: 122,204 × 4
##    book              linenumber chapter word     
##    <fct>                  <int>   <int> <chr>    
##  1 Pride & Prejudice          1       0 pride    
##  2 Pride & Prejudice          1       0 and      
##  3 Pride & Prejudice          1       0 prejudice
##  4 Pride & Prejudice          3       0 by       
##  5 Pride & Prejudice          3       0 jane     
##  6 Pride & Prejudice          3       0 austen   
##  7 Pride & Prejudice          7       1 chapter  
##  8 Pride & Prejudice          7       1 1        
##  9 Pride & Prejudice         10       1 it       
## 10 Pride & Prejudice         10       1 is       
## # ℹ 122,194 more rows

#Now, we can use inner_join() to calculate the sentiment in different ways.
#Let’s again use integer division (%/%) to define larger sections of text that span multiple lines,
#and we can use the same pattern with count(), pivot_wider(), and mutate() to find the net sentiment in each of these sections of text.

afinn <- pride_prejudice %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index = linenumber %/% 80) %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(method = "AFINN")

## Joining with `by = join_by(word)`

bing_and_nrc <- bind_rows(
  pride_prejudice %>% 
    inner_join(get_sentiments("bing")) %>%
    mutate(method = "Bing et al."),
  pride_prejudice %>% 
    inner_join(get_sentiments("nrc") %>% 
                 filter(sentiment %in% c("positive", 
                                         "negative"))
    ) %>%
    mutate(method = "NRC")) %>%
  count(method, index = linenumber %/% 80, sentiment) %>%
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>% 
  mutate(sentiment = positive - negative)

## Joining with `by = join_by(word)`
## Joining with `by = join_by(word)`

## Warning in inner_join(., get_sentiments("nrc") %>% filter(sentiment %in% : Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 215 of `x` matches multiple rows in `y`.
## ℹ Row 5178 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
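The many-to-many warning above appears when a word occurs more than once on both sides of the join. A hedged sketch on invented data shows one way to find the repeated lexicon entries and to declare the relationship explicitly; the relationship argument assumes dplyr 1.1.0 or later:

```r
library(dplyr)
library(tibble)

tokens <- tibble(word = c("envy", "joy", "envy"))

# Toy lexicon in which one word carries two labels (invented).
lexicon <- tibble(word = c("envy", "envy", "joy"),
                  sentiment = c("negative", "positive", "positive"))

# Which lexicon words are listed more than once?
dupes <- lexicon %>%
  count(word) %>%
  filter(n > 1)

# Declaring the relationship tells dplyr the duplication is expected.
joined <- inner_join(tokens, lexicon, by = "word",
                     relationship = "many-to-many")

dupes
joined
```

A word labelled both positive and negative adds one to each column, so it cancels out in positive - negative. Whether that is acceptable depends on the analysis.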

#We now have an estimate of the net sentiment (positive - negative) in each chunk of the novel text for each sentiment lexicon. Let’s bind them together and visualize them.

bind_rows(afinn, 
          bing_and_nrc) %>%
  ggplot(aes(index, sentiment, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~method, ncol = 1, scales = "free_y")

Why is, for example, the result for the NRC lexicon biased so high in sentiment compared to the Bing et al. result? Let’s look briefly at how many positive and negative words are in these lexicons.

get_sentiments("nrc") %>% 
  filter(sentiment %in% c("positive", "negative")) %>% 
  count(sentiment)

## # A tibble: 2 × 2
##   sentiment     n
##   <chr>     <int>
## 1 negative   3316
## 2 positive   2308

get_sentiments("bing") %>% 
  count(sentiment)

## # A tibble: 2 × 2
##   sentiment     n
##   <chr>     <int>
## 1 negative   4781
## 2 positive   2005
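To compare the two lexicons on the same scale, the counts printed above can be turned into shares. This is only arithmetic on the numbers already shown; the column name share_negative is my own:

```r
library(dplyr)
library(tibble)

# Counts copied from the two outputs above.
lexicon_sizes <- tibble(
  lexicon  = c("NRC", "Bing"),
  negative = c(3316, 4781),
  positive = c(2308, 2005)
)

# Share of negative entries among the positive/negative entries.
lexicon_sizes <- lexicon_sizes %>%
  mutate(share_negative = negative / (negative + positive))

lexicon_sizes
```

Roughly 59% of the NRC positive/negative entries are negative, against roughly 70% for Bing. That difference may be part of the reason the NRC scores sit higher, although which words actually match the text matters too.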

Most common positive and negative words

THE ACTION: Analyze word counts that contribute to each sentiment. By implementing count() here with arguments of both word and sentiment, we find out how much each word contributed to each sentiment.

bing_word_counts <- tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()

## Joining with `by = join_by(word)`

## Warning in inner_join(., get_sentiments("bing")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 435434 of `x` matches multiple rows in `y`.
## ℹ Row 5051 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

bing_word_counts

## # A tibble: 2,585 × 3
##    word     sentiment     n
##    <chr>    <chr>     <int>
##  1 miss     negative   1855
##  2 well     positive   1523
##  3 good     positive   1380
##  4 great    positive    981
##  5 like     positive    725
##  6 better   positive    639
##  7 enough   positive    613
##  8 happy    positive    534
##  9 love     positive    495
## 10 pleasure positive    462
## # ℹ 2,575 more rows

#This can be shown visually by piping the counts straight into ggplot2.

bing_word_counts %>%
  group_by(sentiment) %>%
  slice_max(n, n = 10) %>% 
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(x = "Contribution to sentiment",
       y = NULL)

WordCloud

library(wordcloud)

## Loading required package: RColorBrewer

tidy_books %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))

## Joining with `by = join_by(word)`

library(reshape2)

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("gray20", "gray80"),
                   max.words = 100)

## Joining with `by = join_by(word)`

## Warning in inner_join(., get_sentiments("bing")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 435434 of `x` matches multiple rows in `y`.
## ℹ Row 5051 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

#Some sentiment analysis algorithms look beyond unigrams (i.e. single words) to try to understand the sentiment of a sentence as a whole. These algorithms try to understand that

#I am not having a good day.

#is a sad sentence, not a happy one.

For these, we may want to tokenize text into sentences, and it makes sense to use a new name for the output column in such a case.

p_and_p_sentences <- tibble(text = prideprejudice) %>% 
  unnest_tokens(sentence, text, token = "sentences")

p_and_p_sentences$sentence[2]

## [1] "by jane austen"
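Sentence tokenization can also be tried on a toy string. The text below is invented; the call is the same unnest_tokens() pattern used above:

```r
library(tidytext)
library(tibble)

# Two invented sentences in a single row of text.
toy <- tibble(text = "I am not having a good day. Tomorrow may be better.")

# token = "sentences" gives one row per sentence instead of one per word.
sentences <- toy %>%
  unnest_tokens(sentence, text, token = "sentences")

sentences
```

As with words, unnest_tokens() lowercases the output by default, which is why the sentence printed above reads "by jane austen".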

#Another option in unnest_tokens() is to split into tokens using a regex pattern. We could use this, for example, to split the text of Jane Austen’s novels into a data frame by chapter.

austen_chapters <- austen_books() %>%
  group_by(book) %>%
  unnest_tokens(chapter, text, token = "regex", 
                pattern = "Chapter|CHAPTER [\\dIVXLC]") %>%
  ungroup()

austen_chapters %>% 
  group_by(book) %>% 
  summarise(chapters = n())

## # A tibble: 6 × 2
##   book                chapters
##   <fct>                  <int>
## 1 Sense & Sensibility       51
## 2 Pride & Prejudice         62
## 3 Mansfield Park            49
## 4 Emma                      56
## 5 Northanger Abbey          32
## 6 Persuasion                25

WHAT HAPPENED: We have recovered the correct number of chapters in each novel (plus an “extra” row for each novel title). In the austen_chapters data frame, each row corresponds to one chapter.

WHAT ARE WE LOOKING FOR: We can use tidy text analysis to ask questions such as: what are the most negative chapters in each of Jane Austen’s novels? First, let’s get the list of negative words from the Bing lexicon. Second, let’s make a data frame of how many words are in each chapter so we can normalize for the length of chapters.

bingnegative <- get_sentiments("bing") %>% 
  filter(sentiment == "negative")

wordcounts <- tidy_books %>%
  group_by(book, chapter) %>%
  summarize(words = n())

## `summarise()` has grouped output by 'book'. You can override using the
## `.groups` argument.

Then, let’s find the number of negative words in each chapter and divide by the total words in each chapter. For each book, which chapter has the highest proportion of negative words?

tidy_books %>%
  semi_join(bingnegative) %>%
  group_by(book, chapter) %>%
  summarize(negativewords = n()) %>%
  left_join(wordcounts, by = c("book", "chapter")) %>%
  mutate(ratio = negativewords/words) %>%
  filter(chapter != 0) %>%
  slice_max(ratio, n = 1) %>% 
  ungroup()

## Joining with `by = join_by(word)`
## `summarise()` has grouped output by 'book'. You can override using the
## `.groups` argument.

## # A tibble: 6 × 5
##   book                chapter negativewords words  ratio
##   <fct>                 <int>         <int> <int>  <dbl>
## 1 Sense & Sensibility      43           161  3405 0.0473
## 2 Pride & Prejudice        34           111  2104 0.0528
## 3 Mansfield Park           46           173  3685 0.0469
## 4 Emma                     15           151  3340 0.0452
## 5 Northanger Abbey         21           149  2982 0.0500
## 6 Persuasion                4            62  1807 0.0343
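The normalization step above can be sketched on invented data. The chapters and words are toy values; .groups = "drop" is one way to avoid the summarise() grouping message shown in the output:

```r
library(dplyr)
library(tibble)

# Toy tokens: four words in each of two chapters (invented).
tokens <- tibble(
  chapter = c(1, 1, 1, 1, 2, 2, 2, 2),
  word    = c("sad", "day", "bad", "news", "bad", "luck", "comes", "late")
)
negative <- tibble(word = c("sad", "bad"))

# Total words per chapter, used as the denominator.
totals <- tokens %>%
  group_by(chapter) %>%
  summarise(words = n(), .groups = "drop")

# Negative words per chapter, divided by the chapter length.
ratios <- tokens %>%
  semi_join(negative, by = "word") %>%
  group_by(chapter) %>%
  summarise(negativewords = n(), .groups = "drop") %>%
  left_join(totals, by = "chapter") %>%
  mutate(ratio = negativewords / words)

ratios
```

Chapter 1 has two negative words out of four (0.5) and chapter 2 has one out of four (0.25), so chapter 1 would be flagged as the more negative of the two.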

I want to check something out

austen_books()

## # A tibble: 73,422 × 2
##    text                    book               
##  * <chr>                   <fct>              
##  1 "SENSE AND SENSIBILITY" Sense & Sensibility
##  2 ""                      Sense & Sensibility
##  3 "by Jane Austen"        Sense & Sensibility
##  4 ""                      Sense & Sensibility
##  5 "(1811)"                Sense & Sensibility
##  6 ""                      Sense & Sensibility
##  7 ""                      Sense & Sensibility
##  8 ""                      Sense & Sensibility
##  9 ""                      Sense & Sensibility
## 10 "CHAPTER 1"             Sense & Sensibility
## # ℹ 73,412 more rows

MY EXTENSION: I found an R package for H.C. Andersen's fairy tales, and I am going to use the Loughran sentiments. Of the additional lexicons, I could only find Loughran.

THE NEW CORPUS

#install.packages("hcandersenr")
library(hcandersenr)
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ tibble  3.2.1     ✔ purrr   1.0.1
## ✔ readr   2.1.4     ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

hca_fairytales() %>% 
  select(book, language) %>% 
  unique() %>% 
  mutate(language = fct_relevel(language, c("English", "Spanish", "German", "Danish", "French"))) %>%
  ggplot(aes(language, book)) + 
  geom_raster(alpha = 0.3) +
  scale_x_discrete(position = "top")

READ IN THE DATA

The book I need: English

hcandersen_en

## # A tibble: 31,380 × 2
##    text                                                                    book 
##    <chr>                                                                   <chr>
##  1 "A soldier came marching along the high road: \"Left, right - left, ri… The …
##  2 "had his knapsack on his back, and a sword at his side; he had been to… The …
##  3 "and was now returning home. As he walked on, he met a very frightful-… The …
##  4 "witch in the road. Her under-lip hung quite down on her breast, and s… The …
##  5 "and said, \"Good evening, soldier; you have a very fine sword, and a … The …
##  6 "knapsack, and you are a real soldier; so you shall have as much money… The …
##  7 "you like.\""                                                           The …
##  8 "\"Thank you, old witch,\" said the soldier."                           The …
##  9 "\"Do you see that large tree,\" said the witch, pointing to a tree wh… The …
## 10 "beside them. \"Well, it is quite hollow inside, and you must climb to… The …
## # ℹ 31,370 more rows

loughran_sentiments = get_sentiments("loughran")
loughran_sentiments

## # A tibble: 4,150 × 2
##    word         sentiment
##    <chr>        <chr>    
##  1 abandon      negative 
##  2 abandoned    negative 
##  3 abandoning   negative 
##  4 abandonment  negative 
##  5 abandonments negative 
##  6 abandons     negative 
##  7 abdicated    negative 
##  8 abdicates    negative 
##  9 abdicating   negative 
## 10 abdication   negative 
## # ℹ 4,140 more rows

tidy_books_ft = hcandersen_en %>%
  group_by(book) %>%
  mutate(linenumber = row_number()) %>%
  ungroup() %>%
  unnest_tokens(word, text)

I picked the book Little Claus and Big Claus because 1) we are close to the holidays, 2) it is one of the longer books, and 3) The Snow Queen is not part of the English package.

Claus = tidy_books_ft %>%
  filter(book == "Little Claus and Big Claus")
Claus

## # A tibble: 0 × 3
## # ℹ 3 variables: book <chr>, linenumber <int>, word <chr>

tidy_books_ft

## # A tibble: 416,311 × 3
##    book           linenumber word    
##    <chr>               <int> <chr>   
##  1 The tinder-box          1 a       
##  2 The tinder-box          1 soldier 
##  3 The tinder-box          1 came    
##  4 The tinder-box          1 marching
##  5 The tinder-box          1 along   
##  6 The tinder-box          1 the     
##  7 The tinder-box          1 high    
##  8 The tinder-box          1 road    
##  9 The tinder-box          1 left    
## 10 The tinder-box          1 right   
## # ℹ 416,301 more rows

THE PRINTOUT ONLY SHOWS THE TINDER-BOX, but that is just the first 10 of 416,311 rows. Taking out the ungroup() function made no difference, and neither did adding or removing the line number. Since the en version is a data frame, let me try two different things.

Claus = hcandersen_en %>%
  filter(book == "Little Claus and Big Claus")
Claus

## # A tibble: 0 × 2
## # ℹ 2 variables: text <chr>, book <chr>

I have to read this in using the hcandersenr package and the tidy data frame hca_fairytales(), which already has three columns (text, book and language), and filter from there.

# group_by(book) followed directly by ungroup() had no effect, so both are dropped.
tidy_books_tales = hcandersenr::hca_fairytales() %>%
  unnest_tokens(word, text)

tidy_books_tales

## # A tibble: 1,567,749 × 3
##    book           language word       
##    <chr>          <chr>    <chr>      
##  1 The tinder-box Danish   der        
##  2 The tinder-box Danish   kom        
##  3 The tinder-box Danish   en         
##  4 The tinder-box Danish   soldat     
##  5 The tinder-box Danish   marcherende
##  6 The tinder-box Danish   hen        
##  7 The tinder-box Danish   ad         
##  8 The tinder-box Danish   landevejen 
##  9 The tinder-box Danish   én         
## 10 The tinder-box Danish   to         
## # ℹ 1,567,739 more rows

Clausv2 = tidy_books_tales %>%
  filter(book == "Litte Claus and big claus" ) %>%
  filter(language == "English")
Clausv2

## # A tibble: 0 × 3
## # ℹ 3 variables: book <chr>, language <chr>, word <chr>
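The empty results above are what filter() returns when no title matches the string exactly; == is sensitive to case and spelling. A hedged sketch of how to look up the stored title first, using a toy stand-in for the book column (the capitalisation of the second title is an assumption made up for this example, not checked against the package):

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy stand-in for the book column (second title's casing is assumed).
tales <- tibble(book = c("The tinder-box",
                         "Little Claus and big Claus",
                         "The tinder-box"))

# An exact match fails if a single letter differs in case.
exact <- tales %>%
  filter(book == "Little Claus and Big Claus")

# A case-insensitive search reveals how the title is actually stored.
matches <- tales %>%
  distinct(book) %>%
  filter(str_detect(book, regex("claus", ignore_case = TRUE)))

exact
matches
```

On the real data, the same distinct() and str_detect() pattern applied to tidy_books_tales or hcandersen_en should list the exact spelling to copy into filter().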

New Story: The Tinder-box

Tinder = tidy_books_tales %>% 
  filter(book == "The tinder-box") %>% 
  filter(language == "English")

Tinder

## # A tibble: 2,908 × 3
##    book           language word    
##    <chr>          <chr>    <chr>   
##  1 The tinder-box English  a       
##  2 The tinder-box English  soldier 
##  3 The tinder-box English  came    
##  4 The tinder-box English  marching
##  5 The tinder-box English  along   
##  6 The tinder-box English  the     
##  7 The tinder-box English  high    
##  8 The tinder-box English  road    
##  9 The tinder-box English  left    
## 10 The tinder-box English  right   
## # ℹ 2,898 more rows

ANALYSIS

afinn_tales = Tinder %>% 
  inner_join(get_sentiments("afinn")) %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(method = "AFINN")

## Joining with `by = join_by(word)`

loughran_tales = Tinder %>%
  inner_join(get_sentiments("loughran")) %>%
  filter(!is.na(sentiment)) %>%
  count(sentiment, sort = TRUE)

## Joining with `by = join_by(word)`

## Warning in inner_join(., get_sentiments("loughran")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2656 of `x` matches multiple rows in `y`.
## ℹ Row 2526 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

loughran_tales

## # A tibble: 5 × 2
##   sentiment        n
##   <chr>        <int>
## 1 negative        24
## 2 positive        24
## 3 uncertainty     15
## 4 litigious        7
## 5 constraining     2

loughran_counts = Tinder %>%
  inner_join(get_sentiments("loughran")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()

## Joining with `by = join_by(word)`

## Warning in inner_join(., get_sentiments("loughran")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2656 of `x` matches multiple rows in `y`.
## ℹ Row 2526 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

loughran_counts

## # A tibble: 35 × 3
##    word      sentiment       n
##    <chr>     <chr>       <int>
##  1 could     uncertainty     9
##  2 good      positive        6
##  3 shall     litigious       5
##  4 cut       negative        4
##  5 great     positive        4
##  6 might     uncertainty     4
##  7 beautiful positive        3
##  8 best      positive        3
##  9 pleasant  positive        3
## 10 closed    negative        2
## # ℹ 25 more rows

loughran_counts %>%
  group_by(sentiment) %>%
  slice_max(n, n = 10) %>% 
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(x = "Contribution to sentiment",
       y = NULL)

tidy_books_tales %>%
  filter(book == "The tinder-box") %>% 
  filter(language == "English") %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))

## Joining with `by = join_by(word)`

tidy_books_tales %>%
  filter(book == "The tinder-box") %>% 
  inner_join(get_sentiments("bing")) %>%
  filter(language == "English") %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("gray20", "gray80"),
                   max.words = 100)

## Joining with `by = join_by(word)`

afinn_txt = Tinder %>% 
  inner_join(get_sentiments("afinn")) %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(method = "AFINN")

## Joining with `by = join_by(word)`

bing_and_nrc_txt = bind_rows(
  Tinder %>% 
    inner_join(get_sentiments("bing")) %>%
    mutate(method = "Bing et al."),
  Tinder %>% 
    inner_join(get_sentiments("nrc") %>% 
                 filter(sentiment %in% c("positive", 
                                         "negative"))
    ) %>%
    mutate(method = "NRC")) %>%
  count(method, sentiment) %>%
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>% 
  mutate(sentiment = positive - negative)

## Joining with `by = join_by(word)`
## Joining with `by = join_by(word)`

## Warning in inner_join(., get_sentiments("nrc") %>% filter(sentiment %in% : Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1050 of `x` matches multiple rows in `y`.
## ℹ Row 4705 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

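# Use the Tinder-box objects built above; afinn and bing_and_nrc hold the Pride & Prejudice results.
# These objects have one net score per lexicon and no index column, so plot by method.
bind_rows(afinn_txt, 
          bing_and_nrc_txt) %>%
  ggplot(aes(method, sentiment, fill = method)) +
  geom_col(show.legend = FALSE)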

CONCLUSION

When I first counted the sentiments under the Loughran lexicon, positive and negative were tied (24 words each), with uncertainty next (15). I therefore expected negative and positive words to appear at about the same frequency in the visuals. As a side note, when I tallied the frequency of each word, the most frequent was "could", which is categorized as uncertainty, and the largest share of the top 10 most frequent words fell under positive. I therefore changed my expectation: the visuals would lean towards positive sentiment.

When I created the visuals for sentiment in all three lexicons, the story The tinder-box was predominantly assessed as positive.

SOURCE

Silge, J., & Robinson, D. (n.d.). 2 Sentiment analysis with tidy data. Text Mining with R. Retrieved 11/03/23, from https://www.tidytextmining.com/sentiment.html

Hvitfeldt, E. (n.d.). EmilHvitfeldt/hcandersenr: An R package for H.C. Andersens Fairy tales. GitHub. Retrieved 11/04/2023, from https://github.com/EmilHvitfeldt/hcandersenr (see also https://cran.r-project.org/web/packages/hcandersenr/hcandersenr.pdf and https://rdrr.io/cran/hcandersenr/f/README.md)


Week10Assignment_ZMO

Zainab.O

2023-11-04
