A Tool for Amazon Product Rating and Review Analysis

WQD7001 Principle of Data Science

Group Assignment: Develop ShinyApps

| Member's Name | Student ID |
|---|---|
| Liow Wei Jie | S2016012 |
| Muhammad Umair | S2001767 |
| Teo Boon Long | 17198093 |
| Kong Mun Yeen | 17055182 |

Introduction

Most common rating systems show only an average rating on a scale from 1 to 5. This summary is too general: a buyer who wants to know what reviewers actually said about a product has to browse through the comment section one review at a time.

The purpose of this app is therefore to let buyers quickly browse a summary of the reviews and gain a more detailed understanding of the product through word clouds. Two word clouds are generated from the sentiment analysis, one for positive reviews and one for negative reviews.

Data Source for Word Cloud

Positive and negative reviews are differentiated by the product rating. A rating of 3 or more is treated as a good review, while a rating below 3 is treated as a bad review. Each review is then reduced to the keywords that best represent it.
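
The rating rule can be illustrated with a small base R sketch. The column names `overall` and `summary` follow the app's code; the rows and the helper `split_by_rating` are made up for illustration and are not part of the app.

```r
# Toy data: column names match the app, the rows are hypothetical.
reviews <- data.frame(
  overall = c(5, 2, 3, 1, 4),
  summary = c("works great", "stopped working", "does the job",
              "poor quality", "good value"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: ratings at or above the threshold count as positive,
# everything below it counts as negative.
split_by_rating <- function(df, threshold = 3) {
  list(
    positive = df$summary[df$overall >= threshold],
    negative = df$summary[df$overall < threshold]
  )
}

groups <- split_by_rating(reviews)
```

Note that a rating of exactly 3 lands in the positive group under this rule.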

Below is the sample code that retrieves the keywords from the reviews as the data source for the word cloud.

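    library(dplyr)
    library(tm)
    library(RWeka)
    # Keep the summaries of positive reviews (rating >= 3) for the selected product
    good_t_w <- t %>%
      filter(title == input$ProductSearch, overall >= 3) %>%
      select(summary)

    # Build a corpus from the summaries and normalise the text
    mycorpus <- Corpus(VectorSource(good_t_w$summary))
    mycorpus <- tm_map(mycorpus, content_transformer(tolower))
    mycorpus <- tm_map(mycorpus, removeNumbers)
    mycorpus <- tm_map(mycorpus, removeWords, stopwords("english"))
    mycorpus <- tm_map(mycorpus, removePunctuation)
    mycorpus <- tm_map(mycorpus, stripWhitespace)
    # Drop rating words such as "five stars" that say nothing about the product
    mycorpus <- tm_map(mycorpus, removeWords, c("one", "two", "three", "four", "five", "star", "stars"))

    # Extract the cleaned text, split it into bigrams and count each bigram
    review_text <- sapply(mycorpus, as.character)
    token_delim <- " \\t\\r\\n.!?,;\"()"
    bitoken <- NGramTokenizer(review_text, Weka_control(min = 2, max = 2, delimiters = token_delim))
    two_word <- data.frame(table(bitoken))
    sort_two <- two_word[order(two_word$Freq, decreasing = TRUE), ]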
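
To make the bigram step easier to follow, here is a minimal base R sketch of the same idea without the RWeka dependency. It is not the tokenizer the app uses: `make_bigrams` is a hypothetical helper that splits on whitespace only, and the cleaned summaries are invented.

```r
# Hypothetical helper: build all adjacent word pairs from one cleaned string.
make_bigrams <- function(text) {
  words <- strsplit(text, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# Invented, already-cleaned review summaries.
cleaned <- c("works great", "great value", "works great every time")

# Same shape as the app's tables: one row per bigram with its frequency.
bitoken <- unlist(lapply(cleaned, make_bigrams))
two_word <- data.frame(table(bitoken))
sort_two <- two_word[order(two_word$Freq, decreasing = TRUE), ]
```

Here `sort_two` has the columns `bitoken` and `Freq`, which are the two columns the plotting code reads.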

Code for Generating the Word Cloud

The filtered keywords are then fed into a render plot, which generates the word cloud.

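    library(wordcloud)
    library(RColorBrewer)

    output$good_review <- renderPlot({
      # datasource() is assumed to be the reactive holding the sorted bigram table
      x <- datasource()$bitoken
      y <- datasource()$Freq
      minfreq_bigram <- 2  # skip bigrams that appear fewer than two times
      wordcloud(x, y, random.order = FALSE, scale = c(3, 0.5),
                min.freq = minfreq_bigram, max.words = 50,
                colors = brewer.pal(8, "Dark2"))
    })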
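
The effect of `min.freq` and `max.words` can be approximated in base R. The sketch below only mimics the selection step under those two parameters; `select_for_cloud` is a hypothetical helper, the frequency table is invented, and the real `wordcloud` call may additionally drop words that do not fit on the plot.

```r
# Hypothetical helper: keep bigrams seen at least `min_freq` times,
# most frequent first, capped at `max_words` rows.
select_for_cloud <- function(freq_table, min_freq = 2, max_words = 50) {
  kept <- freq_table[freq_table$Freq >= min_freq, ]
  kept <- kept[order(kept$Freq, decreasing = TRUE), ]
  head(kept, max_words)
}

# Invented frequency table with the same columns as the app's bigram table.
sort_two <- data.frame(
  bitoken = c("works great", "great value", "easy install", "fast shipping"),
  Freq = c(5, 3, 2, 1),
  stringsAsFactors = FALSE
)

shown <- select_for_cloud(sort_two)
```

With the defaults above, the bigram that appears only once is left out, so rare phrases do not clutter the cloud.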

Sample Output and Future Work

[Figures: Positive Word Cloud, Negative Word Cloud]

Because of limited processing power, only product data from the Appliances category was selected for this app. Future work could extend it to all product categories in the Amazon dataset.

Link to shinyApps: https://group-a-pds.shinyapps.io/shiny-ui/#!/

Link to GitHub: link