Objective: Performing Sentiment Analysis on the Amazon Alexa Reviews Dataset
Source: Kaggle
About the Data:
This dataset consists of about 3,150 Amazon customer reviews (input text), star ratings, review dates, variants, and feedback for various Amazon Alexa products such as the Echo, Echo Dot, and Fire TV Stick.
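The chunks below assume the following packages are loaded; the original setup chunk is not shown, so this is a minimal sketch of it:
library(data.table) # fread
library(DT)         # datatable
library(dplyr)
library(tidyr)      # spread, separate, unite
library(ggplot2)
library(tidytext)   # unnest_tokens, get_sentiments, stop_words
library(wordcloud)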
Let us take a look at the data:
# read the TSV and convert to a tibble (as.tibble is deprecated; use as_tibble)
alexa <- as_tibble(fread("amazon_alexa.tsv"))
x <- head(alexa, n = 10)
datatable(x, caption = "Table: Data")
The dataset has reviews from 3,150 Alexa users.
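A quick check of the total (the variant counts below also sum to 3,150):
nrow(alexa)
## [1] 3150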
The different Alexa product variants reviewed:
alexa %>%
group_by(variation) %>%
count() %>%
arrange(desc(n))
## # A tibble: 16 x 2
## # Groups: variation [16]
## variation n
## <chr> <int>
## 1 Black Dot 516
## 2 Charcoal Fabric 430
## 3 Configuration: Fire TV Stick 350
## 4 Black Plus 270
## 5 Black Show 265
## 6 Black 261
## 7 Black Spot 241
## 8 White Dot 184
## 9 Heather Gray Fabric 157
## 10 White Spot 109
## 11 White 91
## 12 Sandstone Fabric 90
## 13 White Show 85
## 14 White Plus 78
## 15 Oak Finish 14
## 16 Walnut Finish 9
Let us take a look at the ratings:
ggplot(alexa , aes(rating)) +
geom_bar()
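The exact counts per rating can be checked directly:
alexa %>% count(rating)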
Clearly, most of the reviews are positive; very few are negative (rating <= 2). Let us take a look at some of the negative reviews:
alexa_reviews <- alexa %>%
filter(rating == 2 | rating == 1) %>%
select(verified_reviews)
alexa_head <- head(alexa_reviews, n = 10)
datatable(alexa_head, caption = "Reviews with rating <=2")
Let us take a look at the word cloud:
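The chunk that produced this first cloud is not shown above; a minimal sketch that reproduces it from the raw tokens, with common stop words removed:
words_all <- alexa %>%
unnest_tokens(word, verified_reviews) %>%
anti_join(stop_words, by = "word") %>%
count(word, sort = TRUE)
wordcloud(words_all$word, words_all$n, max.words = 100, random.order = FALSE)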
We observe words like echo, amazon, product, and music more frequently. A few not-so-good words are also observed. Let's remove the obvious words and look deeper, coloring positive words in green and negative words in red:
# tokenize reviews into words and compute each word's overall frequency
check <- alexa %>% unnest_tokens(word, verified_reviews)
check1 <- check %>%
group_by(word) %>%
mutate(freq = n()) %>%
select(rating, variation, feedback, word, freq)
# keep only words that carry a Bing sentiment, then color by polarity
checked <- check1 %>% inner_join(get_sentiments("bing"), by = "word")
what <- checked %>%
count(word, sentiment) %>%
mutate(color = ifelse(sentiment == "positive", "darkgreen", "red"))
wordcloud(what$word, what$n, random.order = FALSE, colors = what$color, ordered.colors = TRUE)
We observe that positive words like love and great dominate. However, there are also negative ones like disappointing, frustrating, and disabled. Let us look closely at only the negative words:
below_rated <- checked %>%
filter(rating <= 2) %>%
count(word, rating, sentiment) %>%
filter(sentiment == 'negative')
wordcloud(below_rated$word, below_rated$n, max.words = 100, random.order = FALSE)
Let us check the ratio of positive to negative words for the different Alexa products.
checked %>%
group_by(variation, sentiment) %>%
summarize(freq = mean(freq)) %>%
spread(sentiment, freq) %>%
ungroup() %>%
mutate(ratio = positive/negative,
variation = reorder(variation, ratio)) %>%
ggplot(aes(variation, ratio)) +
geom_point() +
coord_flip()
Conclusion: White Show is appreciated the most, whereas Black Spot is not received well.
Move on to the next tab for sentiment analysis using different lexicons: Bing, AFINN, and NRC.
Sentiment Analysis Using BING:
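The chunks below reference alexa1, which is not defined in the excerpts above. Presumably it is alexa with a review identifier added; a sketch of that assumption:
# assumed construction: the original data plus a per-review number,
# used below to track sentiment review by review
alexa1 <- alexa %>% mutate(review.no = row_number())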
alexa2 <- alexa1 %>%
unnest_tokens(word, verified_reviews)
alexa2$variation <- as.factor(alexa2$variation)
alexa2 %>%
group_by(variation) %>%
inner_join(get_sentiments("bing")) %>%
count(variation, review.no = review.no , sentiment) %>%
ungroup() %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative,
variation = factor(variation)) %>%
ggplot(aes(review.no, sentiment, fill = variation)) +
geom_bar(alpha = 0.5, stat = "identity", show.legend = FALSE) +
facet_wrap(~ variation, ncol = 2, scales = "free_x")
Comparing Sentiments Using BING, AFINN, and NRC
AFINN assigns each word a score from -5 (strongly negative) to +5 (strongly positive). Bing and NRC are more similar to each other, as both classify words as simply positive or negative; for these two lexicons we score each positive word as +1 and each negative word as -1. Hence, AFINN will have higher peaks than the other two lexicons.
The overall trend for the three lexicons looks similar; due to the difference in scale, AFINN produces sentiment scores of different magnitude.
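A quick way to see this difference is to look up the same words in each lexicon (a sketch; note that recent tidytext/textdata releases name the AFINN column value rather than score, so the chunks below may need that rename):
get_sentiments("afinn") %>% filter(word %in% c("love", "bad"))
get_sentiments("bing") %>% filter(word %in% c("love", "bad"))
get_sentiments("nrc") %>% filter(word %in% c("love", "bad"))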
afinn <- alexa2 %>%
group_by(variation) %>%
inner_join(get_sentiments("afinn")) %>%
group_by(variation, review.no) %>%
summarise(sentiment = sum(score)) %>%
mutate(method = "AFINN")
bing_and_nrc <- bind_rows(alexa2 %>%
group_by(variation) %>%
inner_join(get_sentiments("bing")) %>%
mutate(method = "Bing"),
alexa2 %>%
group_by(variation) %>%
inner_join(get_sentiments("nrc") %>%
filter(sentiment %in% c("positive", "negative"))) %>%
mutate(method = "NRC")) %>%
count(variation, method, review.no = review.no , sentiment) %>%
ungroup() %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative) %>%
dplyr::select(variation, review.no, method, sentiment)
bind_rows(afinn,
bing_and_nrc) %>%
ungroup() %>%
mutate(variation = factor(variation)) %>%
ggplot(aes(review.no, sentiment, fill = method)) +
geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
facet_grid(variation ~ method)
Comparing Sentiments - Negative Reviews
I have visually compared the three lexicons' performance in classifying a review as negative. Only negative reviews (rating < 3) are used, so we should expect scores to be negative and below the axis. The following figure clearly shows that some reviews are wrongly classified as positive.
afinn.neg <- alexa2 %>% filter(rating <3) %>%
group_by(variation) %>%
inner_join(get_sentiments("afinn")) %>%
group_by(variation, review.no) %>%
summarise(sentiment = sum(score)) %>%
mutate(method = "AFINN")
bing_and_nrc.neg <- bind_rows(alexa2 %>%
filter(rating <3) %>%
group_by(variation) %>%
inner_join(get_sentiments("bing")) %>%
mutate(method = "Bing"),
alexa2 %>%
filter(rating <3) %>%
group_by(variation) %>%
inner_join(get_sentiments("nrc") %>%
filter(sentiment %in% c("positive", "negative"))) %>%
mutate(method = "NRC")) %>%
count(variation, method, review.no = review.no , sentiment) %>%
ungroup() %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative) %>%
dplyr::select(variation, review.no, method, sentiment)
bind_rows(afinn.neg, bing_and_nrc.neg) %>%
ungroup() %>%
mutate(variation = factor(variation)) %>%
ggplot(aes(review.no, sentiment, fill = method)) +
geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
facet_grid(variation ~ method)
Comparing Sentiment Analysis Results with Customer Reviews
Customer review scores range from 1 to 5, with 1 being bad and 5 being good.
If the actual customer rating is <= 2, the review is considered negative; otherwise non-negative.
If the computed sentiment score is <= 0, the review is classified as negative; otherwise non-negative.
# get the sentiment of each review by summing word-level AFINN scores
# (reviews are first split into sentences, then into words)
alexa_sentences <- alexa1 %>% unnest_tokens(sentence, verified_reviews, token = "sentences")
abc <- alexa_sentences %>%
group_by(review.no) %>%
mutate(sentence_num = 1:n()) %>% # sentence index within each review
unnest_tokens(word, sentence) %>%
inner_join(get_sentiments("afinn")) %>%
group_by(variation, review.no) %>%
summarise(sentiment = sum(score, na.rm = TRUE)) %>%
mutate(calcul.sentiment = ifelse(sentiment <= 0, "negative", "non-negative"))
table(abc$calcul.sentiment)
##
## negative non-negative
## 81 974
abc1 <- abc %>% left_join(alexa1, c("variation", "review.no")) %>%
mutate(actual.sent = ifelse(rating<=2,"negative", "non-negative")) %>%
mutate(match= ifelse(actual.sent==calcul.sentiment,1,0))
table(abc1$actual.sent, abc1$calcul.sentiment, dnn = c("Actual", "Measured Sentiment"))
## Measured Sentiment
## Actual negative non-negative
## negative 29 30
## non-negative 52 944
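The misclassification rate follows directly from the match flag computed above:
# share of reviews where the computed label disagrees with the actual one
1 - mean(abc1$match, na.rm = TRUE)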
Bigram Analysis
alexa3 <- alexa1 %>%
unnest_tokens(bigram, verified_reviews, token = "ngrams", n = 2)
# Common bigrams
alexa3 %>%
count(bigram, sort = TRUE)
## # A tibble: 11,428 x 2
## bigram n
## <chr> <int>
## 1 love it 155
## 2 i love 117
## 3 the echo 110
## 4 set up 100
## 5 easy to 98
## 6 i have 87
## 7 to set 77
## 8 it is 76
## 9 echo dot 75
## 10 i am 73
## # ... with 11,418 more rows
Removing stop words - only from word 1 - and looking at different variations
# removing stop words - only from word 1 and not word 2. Information lost - example: "love it"
# if "it" is removed, "love" will automatically be removed
x <- alexa3 %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word,
!word1 %in% c("echo", "prime", "black", "dot", "white", "alexa") ) %>%
count(word1, word2, sort = TRUE) %>% filter(!is.na(word1))
alexa3 %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word,
!word1 %in% c("echo", "prime", "black", "dot", "white", "alexa")) %>%
count(variation, word1, word2, sort = TRUE) %>% filter(!is.na(word1)) %>%
unite("bigram", c(word1, word2), sep = " ") %>%
group_by(variation) %>%
top_n(10) %>%
ungroup() %>%
ggplot(aes(reorder(bigram, n), n, fill = variation)) +
geom_bar(stat = "identity", alpha = .8, show.legend = FALSE) +
facet_wrap(~ variation, ncol = 2, scales = "free") +
coord_flip()
Negating Word Combinations
alexa3 %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(word1 == "not") %>%
count(variation, word1, word2, sort = TRUE)
## # A tibble: 124 x 4
## variation word1 word2 n
## <chr> <chr> <chr> <int>
## 1 Black Dot not sure 6
## 2 Black Dot not the 6
## 3 Black Dot not a 4
## 4 Black Dot not as 4
## 5 Black Dot not have 4
## 6 Black Dot not impressed 4
## 7 Charcoal Fabric not have 4
## 8 Charcoal Fabric not like 4
## 9 Charcoal Fabric not to 4
## 10 Charcoal Fabric not very 4
## # ... with 114 more rows
Ignoring negation produces wrong sentiment scores:
AFINN <- get_sentiments("afinn")
(nots <- alexa3 %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(word1 == "not") %>%
inner_join(AFINN, by = c(word2 = "word")) %>%
count(word2, score, sort = TRUE)
)
## # A tibble: 18 x 3
## word2 score n
## <chr> <int> <int>
## 1 like 2 9
## 2 allow 1 4
## 3 impressed 3 4
## 4 awkward -2 2
## 5 bad -3 2
## 6 easy 1 2
## 7 miss -2 2
## 8 perfect 3 2
## 9 satisfied 2 2
## 10 supporting 1 2
## 11 true 2 2
## 12 worth 2 2
## 13 disappointed -2 1
## 14 good 3 1
## 15 happy 3 1
## 16 recommend 2 1
## 17 regret -2 1
## 18 worry -3 1
nots %>%
mutate(contribution = n * score) %>%
arrange(desc(abs(contribution))) %>%
head(20) %>%
ggplot(aes(reorder(word2, contribution), n * score, fill = n * score > 0)) +
geom_bar(stat = "identity", show.legend = FALSE) +
xlab("Words preceded by 'not'") +
ylab("Sentiment score multiplied by no. of occurrances") +
coord_flip()
Words preceded by 'not', 'no', and 'without'
# looking at a number of negation words
negation_words <- c("not", "no", "without")
negated <- alexa3 %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(word1 %in% negation_words) %>%
inner_join(AFINN, by = c(word2 = "word")) %>%
count(word1, word2, score, sort = TRUE) %>%
ungroup()
negated %>%
mutate(contribution = n * score) %>%
arrange(desc(abs(contribution))) %>%
group_by(word1) %>%
top_n(10, abs(contribution)) %>%
ggplot(aes(reorder(word2, contribution), contribution, fill = contribution > 0)) +
geom_bar(stat = "identity", show.legend = FALSE) +
xlab("Words preceded by 'not'") +
ylab("Sentiment score multiplied by no. of occurrances") +
facet_wrap(~ word1, scales = "free") +
coord_flip()
We found (by visual inspection) that the three lexicons gave similar trends in sentiment scores across reviews.
We compared the negative reviews (customer rating <= 2) with the computed sentiment score; only about 7.8% of reviews (82 of the 1,055 in the table above) were misclassified, so the sentiment analysis gave decent results.
The primary reason for misclassification is ignoring the effect of negating words, as in "not good" or "not recommend".
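One simple correction (a sketch, not part of the original analysis) is to flip the AFINN score of the second word of a bigram whenever the first word is a negator, before aggregating:
# reverse the polarity of AFINN scores for negated words
negation_words <- c("not", "no", "without")
corrected <- alexa3 %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
inner_join(AFINN, by = c(word2 = "word")) %>%
mutate(score = ifelse(word1 %in% negation_words, -score, score))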