Do review ratings differ between local residents and tourists in Nevada?

smjb

Objective of the study
We investigate whether the ratings and reviews given by local residents differ significantly from those given by tourists visiting the town. As the dataset does not provide user residency, we devised a simple method to identify each user's residency at the state level. To answer the question, we selected reviews of Nevada businesses in 4 categories (Restaurants, Hotels & Travel, Arts & Entertainment and Nightlife) written by residents of Nevada, Arizona and North Carolina.

Methodology

  • A user's residency is taken to be the state in which the user has reviewed the most businesses. We join the review table to the business table, group the reviews by user and business state, and count the reviews in each group. We only consider users with more than 5 reviews.
  • The null hypothesis is that the rating given by a user does not depend on the user's residency. We tested this null hypothesis across the three user groups for each business category.
  • We also use sentiment analysis to gauge the sentiment of each review and run the same significance test on the computed sentiment scores. The hypothesis here is that differences in sentiment should follow the differences in rating.
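
The residency rule in the first bullet can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `classify_residency`, the `(user_id, business_state)` input format, and the reading of "more than 5 reviews" as a threshold on a user's total review count are all assumptions.

```python
from collections import Counter, defaultdict

MIN_REVIEWS = 5  # assumption: threshold applies to a user's total review count


def classify_residency(reviews, min_reviews=MIN_REVIEWS):
    """Map each user to the state in which they reviewed the most businesses.

    `reviews` is an iterable of (user_id, business_state) pairs, i.e. the
    result of joining the review table to the business table.
    """
    # Count reviews per (user, state).
    counts = defaultdict(Counter)
    for user_id, state in reviews:
        counts[user_id][state] += 1

    residency = {}
    for user_id, per_state in counts.items():
        if sum(per_state.values()) <= min_reviews:
            continue  # not more than `min_reviews` reviews: leave unclassified
        # Most-reviewed state; a tie goes to whichever state was seen first.
        state, _ = per_state.most_common(1)[0]
        residency[user_id] = state
    return residency
```

Users at or below the threshold are simply left out of the result, so downstream comparisons only see users for whom a residency could be assigned.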

Results for Rating

[Figure: p-values of the rating significance tests between user states]

The figure on the left shows the p-values of the significance tests on the ratings given by users from different states. A p-value less than or equal to 0.05 is considered significant at the 95% confidence level.
  • At the 95% confidence level, Arizonans' ratings are significantly different from Nevadans' ratings in every category except Nightlife.
  • North Carolinians' ratings are significantly different from Nevadans' ratings only for Restaurants.
  • No significant difference is detected between the reviews from Arizonans and North Carolinians.
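
The slides do not name the significance test that was used. As one possible way to compare the ratings of two user groups within a business category, the sketch below runs a two-sided permutation test on the difference in mean rating. The choice of test, the function name `permutation_p_value` and its parameters are assumptions made for illustration only.

```python
import random
from statistics import mean


def permutation_p_value(ratings_a, ratings_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean rating.

    Under the null hypothesis the group labels are exchangeable, so the
    pooled ratings are reshuffled and the mean difference is recomputed.
    """
    rng = random.Random(seed)
    observed = abs(mean(ratings_a) - mean(ratings_b))
    pooled = list(ratings_a) + list(ratings_b)
    n_a = len(ratings_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)
```

Calling this once per state pair and business category, and comparing each result with 0.05, gives a table with the same layout as the one summarised in the bullets above.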

Comparison with Sentiment Analysis

[Figure: rating and sentiment polarity significance tests compared]

The figure on the left compares the p-values of the rating significance tests with the p-values of the sentiment polarity significance tests. Please refer to the paper for a more detailed summary.

  • Every pair that is significant in the rating test is also significant in the sentiment polarity test.
  • The two plots show that Arizonans rate Nevada businesses lower than Nevadans do.
  • The sentiment polarity plot also confirms that North Carolinians review differently from Nevadans only in the Restaurants category.

Conclusion

  • Based on the analysis of the available dataset, it is highly probable that Arizonans rate Nevada businesses significantly lower than Nevadans do.
  • North Carolinians' ratings are not significantly different from those of Arizonans or Nevadans, except for Restaurants, where North Carolinians significantly disagree with Nevadans' ratings.
  • Because of the North Carolinian ratings, we cannot conclude definitively that Nevadans are less critical of Nevada businesses than tourists are.
  • The results also suggest that the current user residency classification is adequate for the analysis presented in the paper.
  • To study the possible causes of the bias in more detail, we may need to compare smaller areas such as cities, which requires more precise user residency information. The current residency classification method is not robust against bursts of reviewing activity within a short period: it may assign the wrong residency to a user who reviews heavily while on vacation but rarely at home.
  • Without true user residency data, a more detailed analysis may carry larger, hidden error margins, making it potentially misleading and inconclusive.