DATA 607 week 10 Assignment

Hazal Gunduz

Introduction

In Text Mining with R, Chapter 2 “https://www.tidytextmining.com/sentiment.html” looks at Sentiment Analysis. In this assignment, you should start by getting the primary example code from chapter 2 working in an R Markdown document. You should provide a citation to this base code. You’re then asked to extend the code in two ways:

-Work with a different corpus of your choosing, and

-Incorporate at least one additional sentiment lexicon (possibly from another R package that you’ve found through research).

As usual, please submit links to both an .Rmd file posted in your GitHub repository and to your code on rpubs.com. You may work as a small team on this assignment.

library(tidytext)
library(NLP)
library(tm)
library(SnowballC)
library(fastDummies)
library(plyr) 
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringr)
library(janeaustenr)
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:NLP':
## 
##     annotate
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble  3.1.5     ✓ purrr   0.3.4
## ✓ tidyr   1.1.4     ✓ forcats 0.5.1
## ✓ readr   2.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x ggplot2::annotate() masks NLP::annotate()
## x dplyr::arrange()    masks plyr::arrange()
## x purrr::compact()    masks plyr::compact()
## x dplyr::count()      masks plyr::count()
## x dplyr::failwith()   masks plyr::failwith()
## x dplyr::filter()     masks stats::filter()
## x dplyr::id()         masks plyr::id()
## x dplyr::lag()        masks stats::lag()
## x dplyr::mutate()     masks plyr::mutate()
## x dplyr::rename()     masks plyr::rename()
## x dplyr::summarise()  masks plyr::summarise()
## x dplyr::summarize()  masks plyr::summarize()
library(textdata)
get_sentiments(lexicon = c("afinn"))
## # A tibble: 2,477 × 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # … with 2,467 more rows
get_sentiments("bing")
## # A tibble: 6,786 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # … with 6,776 more rows
get_sentiments("nrc")
## # A tibble: 13,875 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 abacus      trust    
##  2 abandon     fear     
##  3 abandon     negative 
##  4 abandon     sadness  
##  5 abandoned   anger    
##  6 abandoned   fear     
##  7 abandoned   negative 
##  8 abandoned   sadness  
##  9 abandonment anger    
## 10 abandonment fear     
## # … with 13,865 more rows
clothing <- read.csv("Womens Clothing E-Commerce Reviews.csv")
head(clothing)
##   X Clothing.ID Age                   Title
## 1 0         767  33                        
## 2 1        1080  34                        
## 3 2        1077  60 Some major design flaws
## 4 3        1049  50        My favorite buy!
## 5 4         847  47        Flattering shirt
## 6 5        1080  49 Not for the very petite
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Review.Text
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                Absolutely wonderful - silky and sexy and comfortable
## 2                                                                                                                                                                                                      Love this dress!  it's sooo pretty.  i happened to find it in a store, and i'm glad i did bc i never would have ordered it online bc it's petite.  i bought a petite and am 5'8".  i love the length on me- hits just a little below the knee.  would definitely be a true midi on someone who is truly petite.
## 3 I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small (my usual size) but i found this to be outrageously small. so small in fact that i could not zip it up! i reordered it in petite medium, which was just ok. overall, the top half was comfortable and fit nicely, but the bottom half had a very tight under layer and several somewhat cheap (net) over layers. imo, a major design flaw was the net over layer sewn directly into the zipper - it c
## 4                                                                                                                                                                                                                                                                                                                                                                                         I love, love, love this jumpsuit. it's fun, flirty, and fabulous! every time i wear it, i get nothing but great compliments!
## 5                                                                                                                                                                                                                                                                                                                     This shirt is very flattering to all due to the adjustable front tie. it is the perfect length to wear with leggings and it is sleeveless so it pairs well with any cardigan. love this shirt!!!
## 6             I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress.
##   Rating Recommended.IND Positive.Feedback.Count  Division.Name Department.Name
## 1      4               1                       0      Initmates        Intimate
## 2      5               1                       4        General         Dresses
## 3      3               0                       0        General         Dresses
## 4      5               1                       0 General Petite         Bottoms
## 5      5               1                       6        General            Tops
## 6      2               0                       4        General         Dresses
##   Class.Name
## 1  Intimates
## 2    Dresses
## 3    Dresses
## 4      Pants
## 5    Blouses
## 6    Dresses
afinnSL <- get_sentiments("afinn")
table(afinnSL$value)
## 
##  -5  -4  -3  -2  -1   0   1   2   3   4   5 
##  16  43 264 966 309   1 208 448 172  45   5
bingSL <- get_sentiments("bing")
table(bingSL$sentiment)
## 
## negative positive 
##     4781     2005
nrcSL <- get_sentiments("nrc")
table(nrcSL$sentiment)
## 
##        anger anticipation      disgust         fear          joy     negative 
##         1246          837         1056         1474          687         3318 
##     positive      sadness     surprise        trust 
##         2308         1187          532         1230
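
=> The lexicons differ in structure: AFINN scores each word from -5 to 5, bing labels each word as positive or negative, and nrc assigns words to ten categories. As a rough sketch of how the numeric AFINN scale relates to bing-style labels, the example below collapses scores into categories. The `toy_afinn` table is a made-up stand-in for illustration, not real AFINN data.

```r
library(dplyr)

# Hypothetical stand-in for get_sentiments("afinn"); values are illustrative only
toy_afinn <- tibble::tibble(
  word  = c("abandon", "love", "great", "cheap"),
  value = c(-2, 3, 3, -1)
)

# Collapse numeric scores into bing-style labels; a score of 0 is kept as neutral
toy_labels <- toy_afinn %>%
  mutate(sentiment = case_when(
    value > 0 ~ "positive",
    value < 0 ~ "negative",
    TRUE      ~ "neutral"
  ))

table(toy_labels$sentiment)
```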
clothing <- read.csv("Womens Clothing E-Commerce Reviews.csv")
summary(clothing)
##        X          Clothing.ID          Age              Title      
##  Min.   :    0   Min.   :   0.0   Min.   :18.0             : 3810  
##  1st Qu.: 5871   1st Qu.: 861.0   1st Qu.:34.0   Love it!  :  136  
##  Median :11742   Median : 936.0   Median :41.0   Beautiful :   95  
##  Mean   :11742   Mean   : 918.1   Mean   :43.2   Love      :   88  
##  3rd Qu.:17614   3rd Qu.:1078.0   3rd Qu.:52.0   Love!     :   84  
##  Max.   :23485   Max.   :1205.0   Max.   :99.0   Beautiful!:   72  
##                                                  (Other)   :19201  
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Review.Text   
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        :  845  
##  Perfect fit and i've gotten so many compliments. i buy all my suits from here now!                                                                                                                                                                                                                                                                                                                                                                                                                                    :    3  
##  I bought this shirt at the store and after going home and trying it on, i promptly went online and ordered two more! i've gotten multiple compliments anytime i wear any of them. great for looking put together with no fuss. \npeople that have commented there's were destroyed in the wash didn't read the care label which says dry clean.                                                                                                                                                                       :    2  
##  I purchased this and another eva franco dress during retailer's recent 20% off sale. i was looking for dresses that were work appropriate, but that would also transition well to happy hour or date night. they both seemed to be just what i was looking for. i ordered a 4 regular and a 6 regular, as i am usually in between sizes. the 4 was definitely too small. the 6 fit, technically, but was very ill fitting. not only is the dress itself short, but it is very short-waisted. i am only 5'3", but it fe:    2  
##  Lightweight, soft cotton top and shorts. i think it's meant to be a beach cover-up but i'm wearing it as a thin, light-weight summer outfit on these hot hot days. the top has a loose elastic around the bottom which i didn't realize when i ordered it, but i like it and it matches the look in the photos. and the shorts are very low-cut - don't expect them up around your waist. again, i like that. some might want to wear a cami underneath because it's a thin cotton but i'm fine as-is. i bought it i  :    2  
##  Love, love these jeans. being short they come right to my ankle. super soft and don?t require any hemming. i ordered my typical jean size of 26 and they fit like a glove. would love to have these in black and grey.                                                                                                                                                                                                                                                                                                :    2  
##  (Other)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               :22630  
##      Rating      Recommended.IND  Positive.Feedback.Count
##  Min.   :1.000   Min.   :0.0000   Min.   :  0.000        
##  1st Qu.:4.000   1st Qu.:1.0000   1st Qu.:  0.000        
##  Median :5.000   Median :1.0000   Median :  1.000        
##  Mean   :4.196   Mean   :0.8224   Mean   :  2.536        
##  3rd Qu.:5.000   3rd Qu.:1.0000   3rd Qu.:  3.000        
##  Max.   :5.000   Max.   :1.0000   Max.   :122.000        
##                                                          
##         Division.Name   Department.Name     Class.Name  
##                :   14           :   14   Dresses :6319  
##  General       :13850   Bottoms : 3799   Knits   :4843  
##  General Petite: 8120   Dresses : 6319   Blouses :3097  
##  Initmates     : 1502   Intimate: 1735   Sweaters:1428  
##                         Jackets : 1032   Pants   :1388  
##                         Tops    :10468   Jeans   :1147  
##                         Trend   :  119   (Other) :5264
ggplot(data = clothing, aes(x = Age)) + geom_histogram(binwidth = 5, fill = "green")

=> The mean age is about 43, and the distribution appears right-skewed (positively skewed).
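
=> One quick numeric check for right skew is whether the mean exceeds the median; `summary(clothing)` reports a mean age of 43.2 against a median of 41. A small sketch of the same check, using a made-up `ages` vector rather than the real data:

```r
# Hypothetical ages with a long right tail (illustrative values only)
ages <- c(25, 30, 34, 38, 41, 45, 52, 60, 83)

# A mean above the median suggests a right-skewed distribution
right_skewed <- mean(ages) > median(ages)
right_skewed
```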

clothing %>%
  ggplot(aes(x = factor(Recommended.IND), fill = Recommended.IND)) +
    geom_bar(alpha = 0.9) +
    guides(fill = "none")

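=> The graph shows the number of reviews by recommendation status. Most reviewers recommended the product (Recommended.IND = 1), about 82% according to the summary above (mean 0.8224).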

clothing %>%
  ggplot(aes(x = factor(Department.Name), fill = Department.Name)) +
    geom_bar(alpha = 0.9) +
    guides(fill = "none")

=> The graph above displays the number of reviews in each department. Tops has the most, followed by Dresses and Bottoms.

hist(clothing$Age)

=> The graph above shows the frequency distribution of reviewer ages.

# Sum every numeric column within each recommendation group
word_sentiment <- clothing %>%
  group_by(Recommended.IND) %>%
  summarise(across(where(is.numeric), sum))
word_sentiment
# Transpose so that each variable becomes a row
word_sentiment <- t(word_sentiment)
word_sentiment

CORPUS

=> The base code in this section is adapted from Chapter 2 of Text Mining with R: A Tidy Approach by Julia Silge and David Robinson (https://www.tidytextmining.com/sentiment.html).

tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]", 
                                                 ignore_case = TRUE)))) %>%
  ungroup() %>%
  unnest_tokens(word, text)
pride_prejudice <- tidy_books %>%
  filter(book == "Pride & Prejudice")

  pride_prejudice
## # A tibble: 122,204 × 4
##    book              linenumber chapter word     
##    <fct>                  <int>   <int> <chr>    
##  1 Pride & Prejudice          1       0 pride    
##  2 Pride & Prejudice          1       0 and      
##  3 Pride & Prejudice          1       0 prejudice
##  4 Pride & Prejudice          3       0 by       
##  5 Pride & Prejudice          3       0 jane     
##  6 Pride & Prejudice          3       0 austen   
##  7 Pride & Prejudice          7       1 chapter  
##  8 Pride & Prejudice          7       1 1        
##  9 Pride & Prejudice         10       1 it       
## 10 Pride & Prejudice         10       1 is       
## # … with 122,194 more rows
afinn <- pride_prejudice %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index = linenumber %/% 80) %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(method = "AFINN")
## Joining, by = "word"
bind_rows(afinn) %>%
  ggplot(aes(index, sentiment, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~method, ncol = 1, scales = "free_y")
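
=> AFINN sums numeric scores within each 80-line chunk. With a categorical lexicon such as bing, the comparable quantity is the number of positive words minus the number of negative words in each chunk, which is how Chapter 2 of Text Mining with R computes it. A minimal sketch of that calculation, assuming made-up tokens and a made-up lexicon (`toy_tokens` and `toy_bing` are hypothetical):

```r
library(dplyr)
library(tidyr)

# Hypothetical tokens: one row per word, tagged with its source line number
toy_tokens <- tibble::tibble(
  linenumber = c(1, 1, 2, 85, 85, 90),
  word       = c("happy", "miserable", "love", "fear", "pleasure", "angry")
)

# Hypothetical bing-style lexicon (illustrative entries only)
toy_bing <- tibble::tibble(
  word      = c("happy", "miserable", "love", "fear", "pleasure", "angry"),
  sentiment = c("positive", "negative", "positive",
                "negative", "positive", "negative")
)

# Count positive and negative words per 80-line chunk, then take the difference
toy_net <- toy_tokens %>%
  inner_join(toy_bing, by = "word") %>%
  count(index = linenumber %/% 80, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative,
         method = "bing (toy)")

toy_net
```

The resulting `index`, `sentiment`, and `method` columns match the AFINN table, so the two could be stacked with `bind_rows()` and plotted with the same `facet_wrap(~method)` call.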

# get_sentiments() returns one lexicon per call, so each is requested by name
get_sentiments("bing")
## # A tibble: 6,786 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # … with 6,776 more rows
get_sentiments("loughran")
## # A tibble: 4,150 × 2
##    word         sentiment
##    <chr>        <chr>    
##  1 abandon      negative 
##  2 abandoned    negative 
##  3 abandoning   negative 
##  4 abandonment  negative 
##  5 abandonments negative 
##  6 abandons     negative 
##  7 abdicated    negative 
##  8 abdicates    negative 
##  9 abdicating   negative 
## 10 abdication   negative 
## # … with 4,140 more rows
# VCorpus applies transformations per document, so every review is kept
review_corpus <- VCorpus(VectorSource(clothing$Review.Text))
review_corpus[[1]][1]
## $content
## [1] "Absolutely wonderful - silky and sexy and comfortable"
clothing$Recommended.IND[1]
## [1] 1
review_corpus <- tm_map(review_corpus, PlainTextDocument)
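
=> The tidy workflow used for Pride & Prejudice can also be applied to the review text: tokenize each review, join the words to a lexicon, and sum the scores per review so they can be compared with Recommended.IND. The sketch below assumes a tiny made-up data frame and lexicon (`toy_reviews`, `toy_lexicon`, and the `review_id` column are hypothetical); with the real data, `clothing` and `get_sentiments("afinn")` would take their place.

```r
library(dplyr)
library(tidytext)

# Hypothetical reviews mimicking the columns of the clothing data
toy_reviews <- tibble::tibble(
  review_id       = 1:3,
  Recommended.IND = c(1, 0, 1),
  Review.Text     = c("Absolutely wonderful and comfortable",
                      "Cheap fabric and a major design flaw",
                      "Love this shirt")
)

# Hypothetical AFINN-style lexicon (scores are illustrative, not real AFINN values)
toy_lexicon <- tibble::tibble(
  word  = c("wonderful", "comfortable", "cheap", "flaw", "love"),
  value = c(4, 2, -2, -2, 3)
)

# One row per word, keep only words found in the lexicon, then score each review
review_scores <- toy_reviews %>%
  unnest_tokens(word, Review.Text) %>%
  inner_join(toy_lexicon, by = "word") %>%
  group_by(review_id, Recommended.IND) %>%
  summarise(sentiment = sum(value), .groups = "drop")

review_scores
```

Reviews with no lexicon words drop out of the inner join, so a real analysis may want a `left_join()` from the full review list to keep them with a score of 0.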

GitHub => https://github.com/Gunduzhazal/week10

RPubs => https://rpubs.com/gunduzhazal/829091