Review Text Analysis

Introduction

  • Yelp's value proposition is word of mouth, conveyed through consumer reviews and star ratings.

  • The objective is to understand consumers' word choice by mining the Review Dataset, and to analyse sentiment based on the difference between positive and negative terms at each star rating.

  • Questions:

    (a) Which words do consumers choose in their review text?

    (b) How does the difference between positive and negative terms in the review text correlate with star rating?

Methods and Data

  • The hypothesis is that, intuitively, consumers who use more positive terms in their review text give higher star ratings.

  • The methods and data used are:

    (1) Examine the Review Dataset, which contains the review text and star rating, from the Yelp Academic JSON data. Use tm_map for preprocessing and NGramTokenizer to tokenize the text into bags of words.

    (2) Use TermDocumentMatrix to compute the frequency of the terms used in the review text per star rating, and the Opinion Lexicon by Hu and Liu for the lists of positive and negative terms.
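
The two steps above can be sketched roughly as follows. This is an illustrative Python sketch using only the standard library, not the R pipeline named above. The sample reviews, the tiny stopword set, and the helper names (`preprocess`, `ngrams`, `term_frequencies_by_star`) are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Tiny stand-in stopword list (assumption; a real run would use a fuller list).
STOPWORDS = {"the", "a", "an", "and", "is", "was", "it", "to", "of"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def ngrams(words, n=1):
    """Return the n-grams of a word list as space-joined strings."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def term_frequencies_by_star(reviews, n=1):
    """Map each star rating to a Counter of n-gram frequencies."""
    freqs = defaultdict(Counter)
    for review in reviews:
        freqs[review["stars"]].update(ngrams(preprocess(review["text"]), n))
    return freqs

# Hypothetical records using the "text" and "stars" fields of the review JSON.
reviews = [
    {"stars": 5, "text": "Great food and great service!"},
    {"stars": 1, "text": "The food was terrible."},
]
freqs = term_frequencies_by_star(reviews)
print(freqs[5].most_common(2))
```

Passing `n=2` to `term_frequencies_by_star` would count bigrams instead of single words, which is one way to capture short phrases rather than isolated terms.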

Results

[Figure: plot of chunk unnamed-chunk-1]

[Figure: plot of chunk unnamed-chunk-2]

Discussion

  • The WordCloud helps businesses understand their customers' word choice. Positive terms tend to dominate the review text, which means the data is highly skewed.

  • The review text is analysed separately for each star rating to refine the results. Lower star ratings tend to have fewer positive terms, and higher star ratings tend to have more. The sentiment distribution across star ratings is centered between 3 and 4 stars.
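
A per-rating comparison of this kind could be sketched as below. This is a hedged Python illustration, not the original analysis: the two small word sets stand in for the Opinion Lexicon, and the sample reviews and function names (`sentiment_score`, `mean_score_by_star`) are hypothetical.

```python
from collections import defaultdict

# Tiny stand-in lexicons (assumption); the real Opinion Lexicon is far larger.
POSITIVE = {"great", "good", "friendly", "delicious"}
NEGATIVE = {"terrible", "bad", "rude", "slow"}

def sentiment_score(words):
    """Positive-term count minus negative-term count for one review."""
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return pos - neg

def mean_score_by_star(reviews):
    """Average sentiment score of the reviews at each star rating."""
    scores = defaultdict(list)
    for stars, text in reviews:
        scores[stars].append(sentiment_score(text.lower().split()))
    return {stars: sum(v) / len(v) for stars, v in sorted(scores.items())}

# Hypothetical (stars, text) pairs, already stripped of punctuation.
reviews = [
    (5, "great food friendly staff"),
    (5, "delicious and good"),
    (3, "good food but slow service"),
    (1, "terrible food rude staff"),
]
means = mean_score_by_star(reviews)
print(means)
```

If the hypothesis holds, the mean score should rise with the star rating, as it does for this toy input.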

  • Given the correlation between star rating and review text, a similar algorithm could be used to build a model that predicts star rating from review text.
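
The source does not specify a prediction model, so the following is only one possible minimal sketch in Python: it assigns a review the star rating whose average sentiment score is closest to the review's own score. The training pairs and the function names (`fit_star_means`, `predict_stars`) are assumptions for illustration.

```python
def fit_star_means(scored_reviews):
    """Mean sentiment score per star rating from (stars, score) pairs."""
    totals = {}
    for stars, score in scored_reviews:
        total, count = totals.get(stars, (0, 0))
        totals[stars] = (total + score, count + 1)
    return {stars: total / count for stars, (total, count) in totals.items()}

def predict_stars(score, star_means):
    """Pick the star rating whose mean score is closest to the given score."""
    return min(star_means, key=lambda stars: abs(star_means[stars] - score))

# Hypothetical training data: (star rating, positive-minus-negative score).
training = [(1, -2), (1, -1), (3, 0), (3, 1), (5, 2), (5, 3)]
star_means = fit_star_means(training)
print(predict_stars(2, star_means))
```

A single score is a very coarse feature; a practical model would likely use the term frequencies themselves, but that goes beyond what the analysis above establishes.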