This project uses Amazon product reviews spanning May 1996 to July 2014 to determine whether review text is positive or negative. We look specifically at video game reviews. Since the JSON file for video game reviews is not in strict JSON, we use Python to convert it to a pandas DataFrame and then export that DataFrame to CSV.
library(DT)
library(tidytext)
library(dplyr)
library(stringr)
library(sentimentr)
library(ggplot2)
library(RColorBrewer)
library(readr)
library(SnowballC)
library(tm)
library(wordcloud)
library(reticulate)
library(crfsuite)
import gzip
import pandas as pd

def parse(path):
    # Each line is a Python dict literal rather than strict JSON, so we
    # eval() it. This is acceptable only because we trust the source file;
    # ast.literal_eval would be a safer drop-in replacement.
    g = gzip.open(path, 'rb')
    for l in g:
        yield eval(l)

def getDF(path):
    # Accumulate one dict per review, keyed by row index.
    i = 0
    reviews = {}
    for d in parse(path):
        reviews[i] = d
        i += 1
    return pd.DataFrame.from_dict(reviews, orient='index')

df = getDF('reviews_Video_Games_5.json.gz')
df.to_csv(r'reviews.csv')
Next, we read that CSV file into an R data frame.
reviews <- readr::read_csv(file = 'reviews.csv')
Now that we have the dataset imported, we can take a peek at the data. The column that contains the review text is titled ‘reviewText’, and the column that indicates the star rating associated with each review is ‘overall’.
summary(reviews)
## X1 reviewerID asin reviewerName
## Min. : 0 Length:231780 Length:231780 Length:231780
## 1st Qu.: 57945 Class :character Class :character Class :character
## Median :115890 Mode :character Mode :character Mode :character
## Mean :115890
## 3rd Qu.:173834
## Max. :231779
## helpful reviewText overall summary
## Length:231780 Length:231780 Min. :1.000 Length:231780
## Class :character Class :character 1st Qu.:4.000 Class :character
## Mode :character Mode :character Median :5.000 Mode :character
## Mean :4.086
## 3rd Qu.:5.000
## Max. :5.000
## unixReviewTime reviewTime
## Min. :9.399e+08 Length:231780
## 1st Qu.:1.213e+09 Class :character
## Median :1.318e+09 Mode :character
## Mean :1.277e+09
## 3rd Qu.:1.368e+09
## Max. :1.406e+09
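As a quick optional check (a sketch added here, not part of the original pipeline), we can convert the Unix timestamps to dates to see the span actually covered by this 5-core subset:

range(as.POSIXct(reviews$unixReviewTime, origin = "1970-01-01", tz = "UTC"))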
To begin analyzing the sentiment of each review, we look at the individual sentiments of each word. More specifically, we tokenize the review text, remove punctuation and stop words, and create an individual row for each word.
words <- reviews %>%
  select(c("reviewerID", "asin", "overall", "reviewText")) %>%
  unnest_tokens(word, reviewText) %>%
  filter(!word %in% stop_words$word, str_detect(word, "^[a-z']+$"))
datatable(head(words))
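As a quick sanity check (a sketch, not part of the original analysis), we can confirm that common stop words are gone by inspecting the most frequent remaining tokens:

words %>%
  count(word, sort = TRUE) %>%  # token frequencies, highest first
  head(10)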
To predict the sentiment of words in this dataset, we use the AFINN list of English words and their associated ratings. Each word is scored on an integer scale from -5 to 5, where 5 is the most positive and -5 the most negative. By joining the AFINN sentiment scores with our reviews data frame, we can compare the two ways of rating words: Amazon star ratings versus AFINN sentiment.
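Before joining, we can peek at the extremes of that scale (an illustrative check; the score column name matches the tidytext version used here):

afinn_raw <- get_sentiments("afinn")
afinn_raw %>% arrange(score) %>% head(3)        # most negative entries
afinn_raw %>% arrange(desc(score)) %>% head(3)  # most positive entries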
afinn <- get_sentiments("afinn") %>% mutate(word = wordStem(word))
reviews.afinn <- words %>%
  inner_join(afinn, by = "word")
head(reviews.afinn)
## # A tibble: 6 x 5
## reviewerID asin overall word score
## <chr> <chr> <dbl> <chr> <int>
## 1 A2HD75EMZR8QLN 0700099867 1 live 2
## 2 A2HD75EMZR8QLN 0700099867 1 dirt -2
## 3 A3UR8NLLY1ZHCX 0700099867 4 huge 1
## 4 A3UR8NLLY1ZHCX 0700099867 4 fan 3
## 5 A1INA0F5CWW3J4 0700099867 1 fake -3
## 6 A1INA0F5CWW3J4 0700099867 1 fake -3
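As a rough coverage check (a sketch added here, not in the original write-up), we can compute the fraction of tokens that received an AFINN score at all:

nrow(reviews.afinn) / nrow(words)  # share of tokens matched by the lexicon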
Here, we see the most common words and the average ratings and sentiment scores associated with each word.
word_summary <- reviews.afinn %>%
  group_by(word) %>%
  summarise(mean_rating = mean(overall), score = max(score), count_word = n()) %>%
  arrange(desc(count_word))
datatable(head(word_summary))
We can visualize the words associated with each Amazon review rating and sentiment score. Most video game ratings in this Amazon dataset fall between 3.5 and 4.5, so we set this range as the x-axis limit. The plot below shows that many of these words divide into two clusters: one with a positive sentiment score and one with a negative sentiment score. The quantity of words with positive Amazon ratings but negative sentiment scores is concerning, so we will examine the effect this has on product-level sentiment later on.
ggplot(filter(word_summary, count_word < 50000), aes(mean_rating, score)) +
  geom_text(aes(label = word, color = count_word, size = count_word), position = position_jitter()) +
  scale_color_gradient(low = "lightblue", high = "darkblue") +
  coord_cartesian(xlim = c(3.5, 4.5)) +
  guides(size = FALSE, color = FALSE)
We can look at high-frequency words in the word cloud below.
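Word cloud layouts are randomized, so fixing the seed first (an optional step; any seed works) makes the figure reproducible:

set.seed(1234)  # a fixed seed gives a reproducible word cloud layout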
wordcloud(words = word_summary$word, freq = word_summary$count_word, scale=c(5,.5), max.words=300, colors=brewer.pal(8, "Set2"))
Let’s also visualize only the positive words, as determined by the mean ratings of video games in this dataset: if a word’s mean rating is above the overall mean rating, we classify it as a positive word.
good <- reviews.afinn %>%
  group_by(word) %>%
  summarise(mean_rating = mean(overall), score = max(score), count_word = n()) %>%
  filter(mean_rating > mean(mean_rating)) %>%
  arrange(desc(mean_rating))
wordcloud(words = good$word, freq = good$count_word, scale=c(5,.5), max.words=100, colors=brewer.pal(8, "Set2"))
How do negative words differ from the positive ones? Words are considered negative if their mean rating falls below the overall mean Amazon rating for all words in this dataset; we also drop words appearing fewer than 1,000 times to reduce noise from rare words.
bad <- reviews.afinn %>%
  group_by(word) %>%
  summarise(mean_rating = mean(overall), score = max(score), count_word = n()) %>%
  filter(count_word > 1000) %>%
  filter(mean_rating < mean(mean_rating)) %>%
  arrange(mean_rating)
wordcloud(words = bad$word, freq = bad$count_word, scale=c(5,.5), max.words=100, colors=brewer.pal(8, "Set2"))
As mentioned earlier, we should investigate how the sentiment of individual words affects the overall sentiment of a product. To do this, we group by ASIN (a unique identifier for each video game). Then we compute the mean rating and the mean sentiment of all words associated with reviews for that product.
review_summary <- reviews.afinn %>%
  group_by(asin) %>%
  summarise(mean_rating = mean(overall), sentiment = mean(score))
datatable(head(review_summary))
We can now plot the relationship between mean review ratings and mean sentiment scores for products. To determine how successfully the AFINN dictionary determined sentiment for this dataset, we divide the plot points into four quadrants: positive review/positive sentiment, negative review/positive sentiment, negative review/negative sentiment, and positive review/negative sentiment.
Successful quadrants are those in which the review rating and the sentiment agree in sign. We see that there are more successful data points than unsuccessful ones, and that there is a weak positive relationship between the review rating and the sentiment. Despite this, there are significant inaccuracies in how words and product reviews are classified in this dataset.
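To put a number on that weak positive relationship, a one-line check (a sketch, not part of the original analysis) is the Pearson correlation between the two product-level means:

cor(review_summary$mean_rating, review_summary$sentiment)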
y_mid = 0
x_mid = 3.5
review_summary %>%
  mutate(quadrant = case_when(mean_rating > x_mid & sentiment > y_mid ~ "Positive Review/Positive Sentiment",
                              mean_rating <= x_mid & sentiment > y_mid ~ "Negative Review/Positive Sentiment",
                              mean_rating <= x_mid & sentiment <= y_mid ~ "Negative Review/Negative Sentiment",
                              TRUE ~ "Positive Review/Negative Sentiment")) %>%
  ggplot(aes(x = mean_rating, y = sentiment, color = quadrant)) +
  geom_hline(yintercept = y_mid, color = "black", size = .5) +
  geom_vline(xintercept = x_mid, color = "black", size = .5) +
  guides(color = FALSE) +
  scale_color_manual(values = c("lightgreen", "pink", "pink", "lightgreen")) +
  ggtitle("Amazon Product Rating vs Sentiment Rating of Review") +
  ggplot2::annotate("text", x = 4.33, y = 3.5, label = "Positive Review/Positive Sentiment") +
  ggplot2::annotate("text", x = 2, y = 3.5, label = "Negative Review/Positive Sentiment") +
  ggplot2::annotate("text", x = 4.33, y = -2.5, label = "Positive Review/Negative Sentiment") +
  ggplot2::annotate("text", x = 2, y = -2.5, label = "Negative Review/Negative Sentiment") +
  geom_point()
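A simple way to quantify how often the quadrants match (a sketch using the same x_mid and y_mid cutoffs defined above) is to compute the share of products whose rating and sentiment agree in sign:

review_summary %>%
  mutate(match = (mean_rating > x_mid) == (sentiment > y_mid)) %>%  # TRUE when signs agree
  summarise(success_rate = mean(match))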
Dataset citation: R. He and J. McAuley. "Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering." WWW, 2016.