ASSIGNMENT 10

In Text Mining with R, Chapter 2 looks at Sentiment Analysis. In this assignment, you should start by getting the primary example code from chapter 2 working in an R Markdown document. You should provide a citation to this base code. You’re then asked to extend the code in two ways:

Work with a different corpus of your choosing, and incorporate at least one additional sentiment lexicon (possibly from another R package that you've found through research). As usual, please submit links to both an .Rmd file posted in your GitHub repository and to your code on rpubs.com. You may work on a small team on this assignment.

Approach

After some research, I found there are many ways to approach this. I first tried using the gutenbergr library to get a different corpus, and there were many novels and texts available. However, I wanted to try something different: sentiment analysis on financial data, tweets, or emails. The easiest to get access to was financial data. I tried finding finance material through the gutenbergr library, but it only offered novels and general texts on those subjects, so instead I downloaded Amazon's 10K, uploaded it to GitHub, and used it as my corpus.
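As a quick illustration (a sketch added here, not part of the original workflow), one way to check what finance-related material Project Gutenberg actually offers is to filter the gutenbergr metadata by title; the keyword pattern below is just an example.

library(gutenbergr)
library(dplyr)
library(stringr)

# search the Project Gutenberg metadata for titles that mention finance topics
finance_works <- gutenberg_works() %>%
  filter(str_detect(title, regex("finance|stock exchange", ignore_case = TRUE)))
finance_works %>% select(gutenberg_id, title, author)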

Libraries

library(tidytext)
library(textdata)
library(janeaustenr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringr)
library(ggplot2)
library(tidyr)
library(wordcloud)
## Loading required package: RColorBrewer

Base Code from Text Mining with R

Here we recreate the primary example from Chapter 2 of Text Mining with R.

Citation

Silge, Julia, and David Robinson. "2 Sentiment Analysis with Tidy Data." Text Mining with R: A Tidy Approach, https://www.tidytextmining.com/sentiment.html.

get_sentiments("afinn")
## # A tibble: 2,477 x 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # ... with 2,467 more rows
get_sentiments("bing")
## # A tibble: 6,786 x 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # ... with 6,776 more rows
get_sentiments("nrc")
## # A tibble: 13,875 x 2
##    word        sentiment
##    <chr>       <chr>    
##  1 abacus      trust    
##  2 abandon     fear     
##  3 abandon     negative 
##  4 abandon     sadness  
##  5 abandoned   anger    
##  6 abandoned   fear     
##  7 abandoned   negative 
##  8 abandoned   sadness  
##  9 abandonment anger    
## 10 abandonment fear     
## # ... with 13,865 more rows
tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(text, 
                                regex("^chapter [\\divxlc]", 
                                      ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)
nrc_joy <- get_sentiments("nrc") %>% 
  filter(sentiment == "joy")

tidy_books %>%
  filter(book == "Emma") %>%
  inner_join(nrc_joy) %>%
  count(word, sort = TRUE)
## Joining, by = "word"
## # A tibble: 301 x 2
##    word          n
##    <chr>     <int>
##  1 good        359
##  2 friend      166
##  3 hope        143
##  4 happy       125
##  5 love        117
##  6 deal         92
##  7 found        92
##  8 present      89
##  9 kind         82
## 10 happiness    76
## # ... with 291 more rows
jane_austen_sentiment <- tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(book, index = linenumber %/% 80, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>% 
  mutate(sentiment = positive - negative)
## Joining, by = "word"
ggplot(jane_austen_sentiment, aes(index, sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, ncol = 2, scales = "free_x")

pride_prejudice <- tidy_books %>% 
  filter(book == "Pride & Prejudice")

pride_prejudice
## # A tibble: 122,204 x 4
##    book              linenumber chapter word     
##    <fct>                  <int>   <int> <chr>    
##  1 Pride & Prejudice          1       0 pride    
##  2 Pride & Prejudice          1       0 and      
##  3 Pride & Prejudice          1       0 prejudice
##  4 Pride & Prejudice          3       0 by       
##  5 Pride & Prejudice          3       0 jane     
##  6 Pride & Prejudice          3       0 austen   
##  7 Pride & Prejudice          7       1 chapter  
##  8 Pride & Prejudice          7       1 1        
##  9 Pride & Prejudice         10       1 it       
## 10 Pride & Prejudice         10       1 is       
## # ... with 122,194 more rows
afinn <- pride_prejudice %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index = linenumber %/% 80) %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(method = "AFINN")
## Joining, by = "word"
bing_and_nrc <- bind_rows(
  pride_prejudice %>% 
    inner_join(get_sentiments("bing")) %>%
    mutate(method = "Bing et al."),
  pride_prejudice %>% 
    inner_join(get_sentiments("nrc") %>% 
                 filter(sentiment %in% c("positive", 
                                         "negative"))
    ) %>%
    mutate(method = "NRC")) %>%
  count(method, index = linenumber %/% 80, sentiment) %>%
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = 0) %>% 
  mutate(sentiment = positive - negative)
## Joining, by = "word"
## Joining, by = "word"
bind_rows(afinn, 
          bing_and_nrc) %>%
  ggplot(aes(index, sentiment, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~method, ncol = 1, scales = "free_y")

bing_word_counts <- tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"
bing_word_counts %>%
  group_by(sentiment) %>%
  slice_max(n, n = 10) %>% 
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(x = "Contribution to sentiment",
       y = NULL)

tidy_books %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))
## Joining, by = "word"

library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
tidy_books %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("gray20", "gray80"),
                   max.words = 100)
## Joining, by = "word"

p_and_p_sentences <- tibble(text = prideprejudice) %>% 
  unnest_tokens(sentence, text, token = "sentences")
p_and_p_sentences$sentence[2]
## [1] "by jane austen"
austen_chapters <- austen_books() %>%
  group_by(book) %>%
  unnest_tokens(chapter, text, token = "regex", 
                pattern = "Chapter|CHAPTER [\\dIVXLC]") %>%
  ungroup()

austen_chapters %>% 
  group_by(book) %>% 
  summarise(chapters = n())
## # A tibble: 6 x 2
##   book                chapters
##   <fct>                  <int>
## 1 Sense & Sensibility       51
## 2 Pride & Prejudice         62
## 3 Mansfield Park            49
## 4 Emma                      56
## 5 Northanger Abbey          32
## 6 Persuasion                25
bingnegative <- get_sentiments("bing") %>% 
  filter(sentiment == "negative")

wordcounts <- tidy_books %>%
  group_by(book, chapter) %>%
  summarize(words = n())
## `summarise()` has grouped output by 'book'. You can override using the `.groups` argument.
tidy_books %>%
  semi_join(bingnegative) %>%
  group_by(book, chapter) %>%
  summarize(negativewords = n()) %>%
  left_join(wordcounts, by = c("book", "chapter")) %>%
  mutate(ratio = negativewords/words) %>%
  filter(chapter != 0) %>%
  slice_max(ratio, n = 1) %>% 
  ungroup()
## Joining, by = "word"
## `summarise()` has grouped output by 'book'. You can override using the `.groups` argument.
## # A tibble: 6 x 5
##   book                chapter negativewords words  ratio
##   <fct>                 <int>         <int> <int>  <dbl>
## 1 Sense & Sensibility      43           161  3405 0.0473
## 2 Pride & Prejudice        34           111  2104 0.0528
## 3 Mansfield Park           46           173  3685 0.0469
## 4 Emma                     15           151  3340 0.0452
## 5 Northanger Abbey         21           149  2982 0.0500
## 6 Persuasion                4            62  1807 0.0343

Amazon 10K Corpus

I took a different approach for the new corpus. I couldn't find a financial statement in the gutenbergr library, so instead I downloaded Amazon's 10K, which was released on October 29, 2021. Amazon had performed below analysts' expectations, and I wanted to apply the Loughran lexicon to the filing.

https://ir.aboutamazon.com/sec-filings/sec-filings-details/default.aspx?FilingId=15311356

library(gutenbergr)

# read the Amazon 10K text from GitHub, one line per row of the tibble
Amazon10K <- "https://github.com/mianshariq/SPS/raw/10668f542cef868339300d9278f69bf6cd12dcf2/Data%20607/Assignments/Amazon10k.txt"
Amazon10K <- readLines(Amazon10K)
Amazon10K <- tibble(text = Amazon10K)
Amazon10K
## # A tibble: 6,193 x 1
##    text                                   
##    <chr>                                  
##  1 ""                                     
##  2 "Table of Contents"                    
##  3 ""                                     
##  4 ""                                     
##  5 ""                                     
##  6 ""                                     
##  7 "UNITED STATES"                        
##  8 "SECURITIES AND EXCHANGE COMMISSION"   
##  9 "Washington, D.C. 20549"               
## 10 " ____________________________________"
## # ... with 6,183 more rows
Count_Amazon10K <- Amazon10K[c(1:nrow(Amazon10K)),]

Amazon10K_Chapters <- Count_Amazon10K %>% 
  filter(text != "") %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("CHAPTER [\\dIVXLC]", ignore_case =  TRUE)))) 
Amazon10K_Chapters
## # A tibble: 2,686 x 3
##    text                                                       linenumber chapter
##    <chr>                                                           <int>   <int>
##  1 Table of Contents                                                   1       0
##  2 UNITED STATES                                                       2       0
##  3 SECURITIES AND EXCHANGE COMMISSION                                  3       0
##  4 Washington, D.C. 20549                                              4       0
##  5  ____________________________________                               5       0
##  6 FORM 10-Q                                                           6       0
##  7 ____________________________________                                7       0
##  8 (Mark One)                                                          8       0
##  9 ?                                                                   9       0
## 10 QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE S~         10       0
## # ... with 2,676 more rows

Loughran Lexicon

According to https://sraf.nd.edu/textual-analysis/resources/, the Loughran lexicon is intended for accounting and finance. Interestingly, the site states that "A growing literature finds significant relations between financial phenomena (e.g., stock returns, commodity prices, bankruptcies, governance) and the sentiment of financial disclosures as measured by word classifications such as those provided below." The sentiment categories are "negative", "positive", "litigious", "uncertainty", "constraining", and "superfluous".
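As a quick sketch (an addition for context, not part of the original analysis), we can count how many words fall into each of those categories in the copy of the lexicon that tidytext/textdata provides:

# number of lexicon entries per Loughran-McDonald category
get_sentiments("loughran") %>%
  count(sentiment, sort = TRUE)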

get_sentiments("loughran")
## # A tibble: 4,150 x 2
##    word         sentiment
##    <chr>        <chr>    
##  1 abandon      negative 
##  2 abandoned    negative 
##  3 abandoning   negative 
##  4 abandonment  negative 
##  5 abandonments negative 
##  6 abandons     negative 
##  7 abdicated    negative 
##  8 abdicates    negative 
##  9 abdicating   negative 
## 10 abdication   negative 
## # ... with 4,140 more rows
Amazon10K_tidy <- Amazon10K_Chapters %>% 
  unnest_tokens(word, text) %>% 
  inner_join(get_sentiments("loughran")) %>% 
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(10) %>% ungroup() %>% mutate(word = reorder(word, n)) %>%
  anti_join(stop_words)

names(Amazon10K_tidy)<-c("word", "sentiment", "Freq")
Amazon10K_tidy
## # A tibble: 55 x 3
##    word          sentiment     Freq
##    <fct>         <chr>        <int>
##  1 obligations   constraining    44
##  2 risks         uncertainty     36
##  3 losses        negative        31
##  4 loss          negative        30
##  5 jurisdictions litigious       28
##  6 laws          litigious       28
##  7 regulations   litigious       28
##  8 risk          uncertainty     23
##  9 commitments   constraining    21
## 10 requirements  constraining    20
## # ... with 45 more rows
ggplot(data = Amazon10K_tidy, aes(x = word, y = Freq, fill = sentiment)) + 
  geom_bar(stat = "identity") + coord_flip() + facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to sentiment",x = NULL)

Amazon10K_tidy %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))
## Warning in wordcloud(word, n, max.words = 100): several words (e.g. "required",
## "improvements", "jurisdictions", "contractual") could not be fit on page and were
## not plotted.

# build the word-by-sentiment matrix directly from the saved frequencies
# so that word size in the comparison cloud reflects the counts
Amazon10K_tidy %>%
  acast(word ~ sentiment, value.var = "Freq", fill = 0) %>%
  comparison.cloud(colors = c("gray20", "gray80"),
                   max.words = 100)

Comparing to a Different Quarter's 10K

I want to compare the sentiment to a different quarter's filing and see whether there is a difference, since the most recent quarter's results were lower than expected. You can see only a small difference in the sentiments, which is predictable: Amazon is a big company and does not want to spook its investors because it missed expectations in the last quarter. A short sketch after the two printed results below puts the quarters side by side.

Amazon10KQ2= "https://github.com/mianshariq/SPS/raw/3790a3bf2750dd6cb34548d50cc6b3507eb0e904/Data%20607/Assignments/Amazon10KQ3.txt"
Amazon10KQ2 =readLines(Amazon10KQ2)
Amazon10KQ2 <- tibble(text = Amazon10KQ2)
Amazon10KQ2
## # A tibble: 8,861 x 1
##    text                                
##    <chr>                               
##  1 ""                                  
##  2 "Table of Contents"                 
##  3 "  "                                
##  4 ""                                  
##  5 ""                                  
##  6 ""                                  
##  7 ""                                  
##  8 "UNITED STATES"                     
##  9 "SECURITIES AND EXCHANGE COMMISSION"
## 10 "Washington, D.C. 20549"            
## # ... with 8,851 more rows
Count_Amazon10KQ2 <- Amazon10KQ2[c(1:nrow(Amazon10KQ2)),]

Amazon10KQ2_Chapters <- Count_Amazon10KQ2 %>% 
  filter(text != "") %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("CHAPTER [\\dIVXLC]", ignore_case =  TRUE)))) 
Amazon10KQ2_tidy <- Amazon10KQ2_Chapters %>% 
  unnest_tokens(word, text) %>% 
  inner_join(get_sentiments("loughran")) %>% 
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(10) %>% ungroup() %>% mutate(word = reorder(word, n)) %>%
  anti_join(stop_words)

names(Amazon10KQ2_tidy)<-c("word", "sentiment", "Freq")
Amazon10K_tidy
## # A tibble: 55 x 3
##    word          sentiment     Freq
##    <fct>         <chr>        <int>
##  1 obligations   constraining    44
##  2 risks         uncertainty     36
##  3 losses        negative        31
##  4 loss          negative        30
##  5 jurisdictions litigious       28
##  6 laws          litigious       28
##  7 regulations   litigious       28
##  8 risk          uncertainty     23
##  9 commitments   constraining    21
## 10 requirements  constraining    20
## # ... with 45 more rows
Amazon10KQ2_tidy
## # A tibble: 47 x 3
##    word          sentiment     Freq
##    <fct>         <chr>        <int>
##  1 obligations   constraining    51
##  2 losses        negative        46
##  3 loss          negative        45
##  4 risks         uncertainty     38
##  5 jurisdictions litigious       35
##  6 laws          litigious       32
##  7 required      constraining    30
##  8 regulations   litigious       28
##  9 restricted    constraining    28
## 10 risk          uncertainty     28
## # ... with 37 more rows
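As mentioned above, here is a small sketch (an addition, not part of the original output) that puts the two quarters side by side. It assumes, as in the text above, that Amazon10K_tidy holds the most recent quarter (Q3) and Amazon10KQ2_tidy the earlier one (Q2); words that only make one quarter's top list show up with an NA in the other column.

# combine the two quarters' top word frequencies for a direct comparison
quarter_compare <- full_join(
  Amazon10K_tidy   %>% mutate(word = as.character(word)) %>% rename(Freq_Q3 = Freq),
  Amazon10KQ2_tidy %>% mutate(word = as.character(word)) %>% rename(Freq_Q2 = Freq),
  by = c("word", "sentiment")
)
quarter_compare %>% arrange(desc(Freq_Q3))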
ggplot(data = Amazon10KQ2_tidy, aes(x = word, y = Freq, fill = sentiment)) + 
  geom_bar(stat = "identity") + coord_flip() + facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to sentiment",x = NULL)

Amazon10KQ2_tidy %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))
## Warning in wordcloud(word, n, max.words = 100): several words (e.g. "impairment",
## "adversely", "complaint", "prevent") could not be fit on page and were not plotted.

# comparison cloud for the earlier quarter's filing, again using the saved frequencies
Amazon10KQ2_tidy %>%
  acast(word ~ sentiment, value.var = "Freq", fill = 0) %>%
  comparison.cloud(colors = c("gray20", "gray80"),
                   max.words = 100)

Applying bing Sentiment to Corpus

I want to apply the bing sentiment lexicon to our corpus. This classifies words as either positive or negative. It should be interesting to see how some of the words are classified, since this is a more technical document.
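Before doing so, here is a small sketch (an illustrative addition) of how the two lexicons relate: joining bing and loughran on word shows how often each bing label lines up with each Loughran category.

# compare how the bing and loughran lexicons classify the words they share
inner_join(
  get_sentiments("bing"),
  get_sentiments("loughran"),
  by = "word",
  suffix = c("_bing", "_loughran")
) %>%
  count(sentiment_bing, sentiment_loughran, sort = TRUE)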

Amazon10K_tidy <- Amazon10K_Chapters %>% 
  unnest_tokens(word, text) %>% 
  inner_join(get_sentiments("bing")) %>% 
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(10) %>% ungroup() %>% mutate(word = reorder(word, n)) %>%
  anti_join(stop_words)

names(Amazon10K_tidy)<-c("word", "sentiment", "Freq")

Results for bing sentiment

It's interesting to see "fulfillment" come out as positive here. However, we know that "fulfillment" in this filing refers to the fulfillment centers Amazon uses as warehouses. A sketch after the plot below shows one way to exclude such domain-specific terms.

ggplot(data = Amazon10K_tidy, aes(x = word, y = Freq, fill = sentiment)) + 
  geom_bar(stat = "identity") + coord_flip() + facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to sentiment",x = NULL)
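If we wanted to keep domain-specific terms like "fulfillment" from skewing the scores, one option (a sketch only; domain_terms is a made-up custom stop list, not part of the original analysis) is to remove them before joining with the lexicon.

# hypothetical custom stop list for domain-specific terms
domain_terms <- tibble(word = c("fulfillment"))

Amazon10K_Chapters %>%
  unnest_tokens(word, text) %>%
  anti_join(domain_terms, by = "word") %>%
  inner_join(get_sentiments("bing")) %>%
  count(sentiment)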

Amazon10K_tidy %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))
## Warning in wordcloud(word, n, max.words = 100): "effective", "gross", "losses",
## "adverse", "fulfillment", and "outstanding" could not be fit on page and were not
## plotted.

# positive vs. negative comparison cloud built from the saved bing frequencies
Amazon10K_tidy %>%
  acast(word ~ sentiment, value.var = "Freq", fill = 0) %>%
  comparison.cloud(colors = c("gray20", "gray80"),
                   max.words = 100)
## Warning in comparison.cloud(., colors = c("gray20", "gray80"), max.words = 100):
## several words (e.g. "outstanding", "significant", "restricted") could not be fit on
## page and were not plotted.

Conclusion

Sentiment analysis provides a way to show how opinions are expressed in a text, or whether a text leans toward a particular attitude. In our Amazon 10K example, sentiment analysis can help relate the language of the filing to stock prices, business effectiveness, and restructuring. In this assignment, we added a new corpus built from Amazon's 10K and applied sentiment analysis to it. We then applied the Loughran lexicon to the Amazon 10K corpus and found that words such as "obligations" appeared with high frequency under the constraining sentiment, while "losses" and "loss" had high frequencies under the negative sentiment. Comparing the two Amazon filings using the Loughran lexicon, one thing I found interesting was that in Q2, when Amazon had higher gains than in Q3, "gains" was the most frequent positive word, whereas it ranked second in Q3; the frequencies were 20 to 12 between Q2 and Q3. For financial documents the Loughran lexicon seems more useful because it gives a more detailed sentiment breakdown of the 10K than the bing lexicon, where the most positive word was "fulfillment", which in this case is not really a positive word but a reference to the fulfillment centers Amazon operates.