Data 607 Week 10 assignment

The analysis uses the following packages:

library(tidytext)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(syuzhet)
library(textdata)
library(janeaustenr)

Sentiment lexicons

The get_sentiments() function from tidytext returns a sentiment lexicon as a tibble. The AFINN lexicon:

get_sentiments("afinn")

The bing and nrc lexicons are loaded the same way:

get_sentiments("bing")

get_sentiments("nrc")

tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(text, 
                                regex("^chapter [\\divxlc]", 
                                      ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)
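
The chapter column above relies on a running count of heading matches. A minimal sketch of that idea follows, using a hypothetical character vector (`lines`, invented for illustration and not taken from the novels): each line that starts with a chapter heading is flagged, and `cumsum()` turns the flags into a chapter number that carries forward until the next heading.

```r
library(stringr)

# Hypothetical lines of text, invented for illustration
lines <- c("CHAPTER I", "It was a fine day.", "Chapter 2",
           "She smiled.", "The chapter ended.")

# TRUE only where a line begins with "chapter" followed by a digit
# or a Roman-numeral letter (case-insensitive)
is_heading <- str_detect(lines, regex("^chapter [\\divxlc]", ignore_case = TRUE))

# Running total of headings seen so far gives the chapter number
chapter <- cumsum(is_heading)
print(chapter)
```

The last line contains the word "chapter" but does not start with it, so it stays in chapter 2.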

nrc_joy <- get_sentiments("nrc") %>% 
  filter(sentiment == "joy")

tidy_books %>%
  filter(book == "Emma") %>%
  inner_join(nrc_joy) %>%
  count(word, sort = TRUE)

## Joining with `by = join_by(word)`
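
To see what the join and count are doing, here is a small sketch with made-up data. The `tokens` and `joy_lexicon` tibbles are hypothetical stand-ins for `tidy_books` and `nrc_joy`. Passing `by = "word"` explicitly is optional, but it states the join column and suppresses the "Joining with" message.

```r
library(dplyr)

# Hypothetical tokens and a hypothetical two-word "joy" lexicon
tokens <- tibble(word = c("happy", "walk", "love", "happy", "rain"))
joy_lexicon <- tibble(word = c("happy", "love"), sentiment = "joy")

# inner_join() keeps only tokens that appear in the lexicon;
# count() then tallies each remaining word, most frequent first
joy_counts <- tokens %>%
  inner_join(joy_lexicon, by = "word") %>%
  count(word, sort = TRUE)

print(joy_counts)
```

Words missing from the lexicon ("walk", "rain") drop out of the result, which is why the choice of lexicon strongly affects the counts.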

Using a different sentiment lexicon

get_sentiments("loughran")

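Applying the sentiment analysis to a different book, Persuasion

# Keep only the NRC words tagged with the "joy" sentiment
nrc_joy <- get_sentiments("nrc") %>% 
  filter(sentiment == "joy")

# Count joy words in Persuasion (the filter previously selected "Emma")
tidy_books %>%
  filter(book == "Persuasion") %>%
  inner_join(nrc_joy) %>%
  count(word, sort = TRUE)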

## Joining with `by = join_by(word)`

Counting the positive-sentiment words in Persuasion

loughran_positive <- get_sentiments("loughran") %>% 
  filter(sentiment == "positive")

Persuasion <- tidy_books %>%
  filter(book == "Persuasion") %>%
  inner_join(loughran_positive) %>%
  count(word, sort = TRUE)

## Joining with `by = join_by(word)`

sum_positive <- sum(Persuasion$n)

print(sum_positive)

## [1] 1225

Counting the negative-sentiment words in Persuasion

loughran_negative <- get_sentiments("loughran") %>% 
  filter(sentiment == "negative")

Persuasion2 <- tidy_books %>%
  filter(book == "Persuasion") %>%
  inner_join(loughran_negative) %>%
  count(word, sort = TRUE)

## Joining with `by = join_by(word)`

sum_negative <- sum(Persuasion2$n)

print(sum_negative)

## [1] 1227
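
The positive and negative totals can also be obtained in a single pass by counting on the sentiment column instead of running two separate joins. The sketch below is an alternative, not the approach used above, and it runs on hypothetical data: `tokens` and `lexicon` are invented stand-ins for `tidy_books` and the Loughran lexicon.

```r
library(dplyr)

# Hypothetical tokens and a hypothetical lexicon with three categories
tokens <- tibble(word = c("good", "loss", "good", "doubt", "tea", "good", "maybe"))
lexicon <- tibble(
  word      = c("good", "loss", "doubt", "maybe"),
  sentiment = c("positive", "negative", "negative", "uncertainty")
)

# Keep only the two categories of interest, join once, then tally per category
totals <- tokens %>%
  inner_join(filter(lexicon, sentiment %in% c("positive", "negative")),
             by = "word") %>%
  count(sentiment)

print(totals)
```

This keeps both totals in one tibble, which makes them easier to compare or plot than two separate variables.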

Conclusion

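The overall sentiment of Persuasion under the Loughran lexicon is close to neutral: there are 1227 negative word occurrences against 1225 positive ones, so negative sentiment edges out positive by only two words.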
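
As a rough check on how close the two totals are, the sketch below computes the net sentiment and the negative share from the values printed above (1225 and 1227). The names `net_sentiment` and `share_negative` are introduced here for illustration only.

```r
# Totals reported earlier in this document
sum_positive <- 1225
sum_negative <- 1227

# Net sentiment: positive minus negative occurrences
net_sentiment <- sum_positive - sum_negative

# Share of matched sentiment words that are negative
share_negative <- sum_negative / (sum_positive + sum_negative)

print(net_sentiment)
print(round(share_negative, 4))
```

A net of -2 out of 2452 matched words puts the negative share at roughly 50%, which supports reading the book as near neutral under this lexicon.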