Introduction
Chapter 2 of Text Mining with R (https://www.tidytextmining.com/sentiment.html) covers sentiment analysis. In this assignment, you should start by getting the primary example code from Chapter 2 working in an R Markdown document, and provide a citation to this base code. You are then asked to extend the code in two ways:
- Work with a different corpus of your choosing, and
- Incorporate at least one additional sentiment lexicon (possibly from another R package that you have found through research).
As usual, please submit links to both an .Rmd file posted in your GitHub repository and to your code on rpubs.com. You may work as a small team on this assignment.
library(tidytext)
library(NLP)
library(tm)
library(SnowballC)
library(fastDummies)
library(plyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr)
library(janeaustenr)
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:NLP':
##
## annotate
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble 3.1.5 ✓ purrr 0.3.4
## ✓ tidyr 1.1.4 ✓ forcats 0.5.1
## ✓ readr 2.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x ggplot2::annotate() masks NLP::annotate()
## x dplyr::arrange() masks plyr::arrange()
## x purrr::compact() masks plyr::compact()
## x dplyr::count() masks plyr::count()
## x dplyr::failwith() masks plyr::failwith()
## x dplyr::filter() masks stats::filter()
## x dplyr::id() masks plyr::id()
## x dplyr::lag() masks stats::lag()
## x dplyr::mutate() masks plyr::mutate()
## x dplyr::rename() masks plyr::rename()
## x dplyr::summarise() masks plyr::summarise()
## x dplyr::summarize() masks plyr::summarize()
library(textdata)
get_sentiments(lexicon = c("afinn"))
## # A tibble: 2,477 × 2
## word value
## <chr> <dbl>
## 1 abandon -2
## 2 abandoned -2
## 3 abandons -2
## 4 abducted -2
## 5 abduction -2
## 6 abductions -2
## 7 abhor -3
## 8 abhorred -3
## 9 abhorrent -3
## 10 abhors -3
## # … with 2,467 more rows
get_sentiments("bing")
## # A tibble: 6,786 × 2
## word sentiment
## <chr> <chr>
## 1 2-faces negative
## 2 abnormal negative
## 3 abolish negative
## 4 abominable negative
## 5 abominably negative
## 6 abominate negative
## 7 abomination negative
## 8 abort negative
## 9 aborted negative
## 10 aborts negative
## # … with 6,776 more rows
get_sentiments("nrc")
## # A tibble: 13,875 × 2
## word sentiment
## <chr> <chr>
## 1 abacus trust
## 2 abandon fear
## 3 abandon negative
## 4 abandon sadness
## 5 abandoned anger
## 6 abandoned fear
## 7 abandoned negative
## 8 abandoned sadness
## 9 abandonment anger
## 10 abandonment fear
## # … with 13,865 more rows
clothing <- read.csv("Womens Clothing E-Commerce Reviews.csv")
head(clothing)
## X Clothing.ID Age Title
## 1 0 767 33
## 2 1 1080 34
## 3 2 1077 60 Some major design flaws
## 4 3 1049 50 My favorite buy!
## 5 4 847 47 Flattering shirt
## 6 5 1080 49 Not for the very petite
## Review.Text
## 1 Absolutely wonderful - silky and sexy and comfortable
## 2 Love this dress! it's sooo pretty. i happened to find it in a store, and i'm glad i did bc i never would have ordered it online bc it's petite. i bought a petite and am 5'8". i love the length on me- hits just a little below the knee. would definitely be a true midi on someone who is truly petite.
## 3 I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small (my usual size) but i found this to be outrageously small. so small in fact that i could not zip it up! i reordered it in petite medium, which was just ok. overall, the top half was comfortable and fit nicely, but the bottom half had a very tight under layer and several somewhat cheap (net) over layers. imo, a major design flaw was the net over layer sewn directly into the zipper - it c
## 4 I love, love, love this jumpsuit. it's fun, flirty, and fabulous! every time i wear it, i get nothing but great compliments!
## 5 This shirt is very flattering to all due to the adjustable front tie. it is the perfect length to wear with leggings and it is sleeveless so it pairs well with any cardigan. love this shirt!!!
## 6 I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress.
## Rating Recommended.IND Positive.Feedback.Count Division.Name Department.Name
## 1 4 1 0 Initmates Intimate
## 2 5 1 4 General Dresses
## 3 3 0 0 General Dresses
## 4 5 1 0 General Petite Bottoms
## 5 5 1 6 General Tops
## 6 2 0 4 General Dresses
## Class.Name
## 1 Intimates
## 2 Dresses
## 3 Dresses
## 4 Pants
## 5 Blouses
## 6 Dresses
afinnSL <- get_sentiments("afinn")
table(afinnSL$value)
##
## -5 -4 -3 -2 -1 0 1 2 3 4 5
## 16 43 264 966 309 1 208 448 172 45 5
bingSL <- get_sentiments("bing")
table(bingSL$sentiment)
##
## negative positive
## 4781 2005
nrcSL <- get_sentiments("nrc")
table(nrcSL$sentiment)
##
## anger anticipation disgust fear joy negative
## 1246 837 1056 1474 687 3318
## positive sadness surprise trust
## 2308 1187 532 1230
clothing <- read.csv("Womens Clothing E-Commerce Reviews.csv")
summary(clothing)
## X Clothing.ID Age Title
## Min. : 0 Min. : 0.0 Min. :18.0 : 3810
## 1st Qu.: 5871 1st Qu.: 861.0 1st Qu.:34.0 Love it! : 136
## Median :11742 Median : 936.0 Median :41.0 Beautiful : 95
## Mean :11742 Mean : 918.1 Mean :43.2 Love : 88
## 3rd Qu.:17614 3rd Qu.:1078.0 3rd Qu.:52.0 Love! : 84
## Max. :23485 Max. :1205.0 Max. :99.0 Beautiful!: 72
## (Other) :19201
## Review.Text
## : 845
## Perfect fit and i've gotten so many compliments. i buy all my suits from here now! : 3
## I bought this shirt at the store and after going home and trying it on, i promptly went online and ordered two more! i've gotten multiple compliments anytime i wear any of them. great for looking put together with no fuss. \npeople that have commented there's were destroyed in the wash didn't read the care label which says dry clean. : 2
## I purchased this and another eva franco dress during retailer's recent 20% off sale. i was looking for dresses that were work appropriate, but that would also transition well to happy hour or date night. they both seemed to be just what i was looking for. i ordered a 4 regular and a 6 regular, as i am usually in between sizes. the 4 was definitely too small. the 6 fit, technically, but was very ill fitting. not only is the dress itself short, but it is very short-waisted. i am only 5'3", but it fe: 2
## Lightweight, soft cotton top and shorts. i think it's meant to be a beach cover-up but i'm wearing it as a thin, light-weight summer outfit on these hot hot days. the top has a loose elastic around the bottom which i didn't realize when i ordered it, but i like it and it matches the look in the photos. and the shorts are very low-cut - don't expect them up around your waist. again, i like that. some might want to wear a cami underneath because it's a thin cotton but i'm fine as-is. i bought it i : 2
## Love, love these jeans. being short they come right to my ankle. super soft and don?t require any hemming. i ordered my typical jean size of 26 and they fit like a glove. would love to have these in black and grey. : 2
## (Other) :22630
## Rating Recommended.IND Positive.Feedback.Count
## Min. :1.000 Min. :0.0000 Min. : 0.000
## 1st Qu.:4.000 1st Qu.:1.0000 1st Qu.: 0.000
## Median :5.000 Median :1.0000 Median : 1.000
## Mean :4.196 Mean :0.8224 Mean : 2.536
## 3rd Qu.:5.000 3rd Qu.:1.0000 3rd Qu.: 3.000
## Max. :5.000 Max. :1.0000 Max. :122.000
##
## Division.Name Department.Name Class.Name
## : 14 : 14 Dresses :6319
## General :13850 Bottoms : 3799 Knits :4843
## General Petite: 8120 Dresses : 6319 Blouses :3097
## Initmates : 1502 Intimate: 1735 Sweaters:1428
## Jackets : 1032 Pants :1388
## Tops :10468 Jeans :1147
## Trend : 119 (Other) :5264
ggplot(data = clothing, aes(x = Age)) + geom_histogram( fill = "green")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
=> The mean reviewer age is about 43 (median 41), and the distribution is positively skewed, with a long tail toward older ages.
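The skew claim can be checked numerically rather than by eye. The sketch below is only an illustration: the `ages` vector is made-up toy data standing in for `clothing$Age`, and `skewness()` is a small helper written here (not a function from any loaded package). A mean above the median and a positive skewness value both point to a right-skewed distribution.

```r
# Toy ages standing in for clothing$Age (hypothetical values, not the real data).
ages <- c(23, 28, 31, 34, 36, 39, 41, 44, 52, 60, 75, 99)

# Moment-based skewness: mean cubed deviation divided by the cubed
# (population) standard deviation. Positive values mean a longer right tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

age_mean   <- mean(ages)
age_median <- median(ages)
age_skew   <- skewness(ages)

# A right-skewed sample has mean > median and skewness > 0.
c(mean = age_mean, median = age_median, skewness = age_skew)
```

On the real data, the same helper could be called as `skewness(clothing$Age)`.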
clothing %>%
  ggplot(aes(x = factor(Recommended.IND), fill = Recommended.IND)) +
  geom_bar(alpha = 0.9) +
  guides(fill = "none")
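=> The graph shows how many reviews recommend the product (1) versus how many do not (0). Recommendations dominate: the mean of Recommended.IND is 0.82, so about 82% of reviews are positive.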
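The same split can be expressed as proportions instead of bar heights. This is a minimal sketch, assuming a short vector that copies the Recommended.IND values of the six rows printed by `head(clothing)`; the variable names `recommended` and `shares` are introduced here for illustration.

```r
# Recommended.IND for the six rows shown by head(clothing) earlier.
recommended <- c(1, 1, 0, 1, 1, 0)

# table() counts each value; prop.table() turns the counts into shares.
shares <- prop.table(table(recommended))
shares
```

Replacing `recommended` with `clothing$Recommended.IND` would give the shares for the full data set.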
clothing %>%
  ggplot(aes(x = factor(Department.Name), fill = Department.Name)) +
  geom_bar(alpha = 0.9) +
  guides(fill = "none")
=> The graph above displays the number of reviews per department. Tops has the most (10,468), followed by Dresses (6,319) and Bottoms (3,799).
hist(clothing$Age)
=> The graph above shows how frequently each age range occurs among the reviewers.
# Sum the numeric columns within each recommendation group.
# summarise_each() and funs() are deprecated, and sum() called with no
# argument always returns 0, so across() is used with sum passed as a function.
word_sentiment <- clothing %>%
  group_by(Recommended.IND) %>%
  summarise(across(where(is.numeric), sum))
word_sentiment
# Transpose so that each variable is a row and each group is a column.
word_sentiment <- t(word_sentiment)
word_sentiment
CORPUS
Base code: Silge, J. and Robinson, D., Text Mining with R, Chapter 2, https://www.tidytextmining.com/sentiment.html
tidy_books <- austen_books() %>%
group_by(book) %>%
mutate(linenumber = row_number(),
chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
ignore_case = TRUE)))) %>%
ungroup() %>%
unnest_tokens(word, text)
pride_prejudice <- tidy_books %>%
filter(book == "Pride & Prejudice")
pride_prejudice
## # A tibble: 122,204 × 4
## book linenumber chapter word
## <fct> <int> <int> <chr>
## 1 Pride & Prejudice 1 0 pride
## 2 Pride & Prejudice 1 0 and
## 3 Pride & Prejudice 1 0 prejudice
## 4 Pride & Prejudice 3 0 by
## 5 Pride & Prejudice 3 0 jane
## 6 Pride & Prejudice 3 0 austen
## 7 Pride & Prejudice 7 1 chapter
## 8 Pride & Prejudice 7 1 1
## 9 Pride & Prejudice 10 1 it
## 10 Pride & Prejudice 10 1 is
## # … with 122,194 more rows
afinn <- pride_prejudice %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = linenumber %/% 80) %>%
summarise(sentiment = sum(value)) %>%
mutate(method = "AFINN")
## Joining, by = "word"
afinn %>%
ggplot(aes(index, sentiment, fill = method)) +
geom_col(show.legend = FALSE) +
facet_wrap(~method, ncol = 1, scales = "free_y")
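The same join-and-sum pattern can be pointed at the clothing reviews instead of the novel. The sketch below is an illustration under stated assumptions: `reviews` holds three short texts taken from the review excerpts shown earlier, and `toy_lexicon` is a hypothetical hand-made word list whose scores are invented for this example (they are not the real AFINN values, which require a download through textdata).

```r
library(dplyr)
library(tidytext)

# Three short review texts (excerpts from the rows printed earlier).
reviews <- tibble::tibble(
  review_id = 1:3,
  text = c(
    "Absolutely wonderful - silky and sexy and comfortable",
    "I love, love, love this jumpsuit. it's fun, flirty, and fabulous!",
    "a major design flaw was the net over layer"
  )
)

# Hypothetical numeric lexicon with the same columns as AFINN (word, value).
# The scores are made up for illustration only.
toy_lexicon <- tibble::tibble(
  word  = c("wonderful", "comfortable", "love", "fun", "fabulous", "flaw"),
  value = c(4, 2, 3, 4, 4, -2)
)

# One row per word, keep only scored words, then sum scores per review.
review_scores <- reviews %>%
  unnest_tokens(word, text) %>%
  inner_join(toy_lexicon, by = "word") %>%
  group_by(review_id) %>%
  summarise(sentiment = sum(value), .groups = "drop")

review_scores
```

With the full data, `reviews` would be built from `clothing$Review.Text` and `toy_lexicon` replaced by `get_sentiments("afinn")`.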
# get_sentiments() takes one lexicon name at a time; passing the full vector of
# choices silently falls back to the first one, "bing".
get_sentiments("bing")
## # A tibble: 6,786 × 2
## word sentiment
## <chr> <chr>
## 1 2-faces negative
## 2 abnormal negative
## 3 abolish negative
## 4 abominable negative
## 5 abominably negative
## 6 abominate negative
## 7 abomination negative
## 8 abort negative
## 9 aborted negative
## 10 aborts negative
## # … with 6,776 more rows
get_sentiments("loughran")
## # A tibble: 4,150 × 2
## word sentiment
## <chr> <chr>
## 1 abandon negative
## 2 abandoned negative
## 3 abandoning negative
## 4 abandonment negative
## 5 abandonments negative
## 6 abandons negative
## 7 abdicated negative
## 8 abdicates negative
## 9 abdicating negative
## 10 abdication negative
## # … with 4,140 more rows
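Lexicons with `word` and `sentiment` columns can be used to count positive and negative words per review. The sketch below uses the Bing lexicon, which ships with tidytext, on two toy review texts; the names `reviews`, `categorical_lexicon`, and `review_sentiment` are introduced here for illustration. Because the Loughran table printed above has the same two columns, swapping in `get_sentiments("loughran")` should work the same way, although it also contains categories other than positive and negative.

```r
library(dplyr)
library(tidytext)

# Two short review texts (excerpts from the rows printed earlier).
reviews <- tibble::tibble(
  review_id = 1:2,
  text = c(
    "Absolutely wonderful - silky and sexy and comfortable",
    "the bottom half had a very tight under layer and several somewhat cheap over layers"
  )
)

# Any lexicon with columns `word` and `sentiment` can be used here.
categorical_lexicon <- get_sentiments("bing")

# Count positive and negative words per review and take the difference.
review_sentiment <- reviews %>%
  unnest_tokens(word, text) %>%
  inner_join(categorical_lexicon, by = "word") %>%
  group_by(review_id) %>%
  summarise(
    positive = sum(sentiment == "positive"),
    negative = sum(sentiment == "negative"),
    net      = positive - negative,
    .groups  = "drop"
  )

review_sentiment
```

Reviews with no lexicon words drop out of the result because of the inner join, so a left join back to `reviews` is needed if every review must appear.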
# Use a VCorpus and a variable name that does not shadow tm's Corpus() function.
# PlainTextDocument is not a valid transformation for the SimpleCorpus that
# Corpus() returns; applying it there collapses the corpus to 2 documents.
review_corpus <- VCorpus(VectorSource(clothing$Review.Text))
review_corpus[[1]][1]
## $content
## [1] "Absolutely wonderful - silky and sexy and comfortable"
clothing$Recommended.IND[1]
## [1] 1
review_corpus <- tm_map(review_corpus, PlainTextDocument)
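Once the reviews are in a tm corpus, a common next step is to clean the text and build a document-term matrix. This is a minimal sketch, assuming two toy documents taken from the review excerpts above; `docs`, `toy_corpus`, and `dtm` are names introduced here. It uses `VCorpus()` so that every document is kept through each `tm_map()` step.

```r
library(tm)

# Two toy documents standing in for clothing$Review.Text.
docs <- c(
  "Absolutely wonderful - silky and sexy and comfortable",
  "I love, love, love this jumpsuit."
)

toy_corpus <- VCorpus(VectorSource(docs))

# Standard cleaning steps: lowercase, strip punctuation, drop stop words.
toy_corpus <- tm_map(toy_corpus, content_transformer(tolower))
toy_corpus <- tm_map(toy_corpus, removePunctuation)
toy_corpus <- tm_map(toy_corpus, removeWords, stopwords("english"))

# Rows are documents, columns are terms, cells are term counts.
dtm <- DocumentTermMatrix(toy_corpus)
inspect(dtm)
```

The resulting matrix can be converted with `tidytext::tidy(dtm)` to get back to a one-row-per-term table for joining with a sentiment lexicon.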
Github => https://github.com/Gunduzhazal/week10
Rpubs => https://rpubs.com/gunduzhazal/829091