Marty Gaupp
Nov 2015
Problem Statement: Are bad ratings (those receiving 1 star) more useful than good ratings (those receiving 5 stars), or vice versa?
I will analyze the Yelp reviews dataset to answer this question, so that I can help Yelp users decide which type of review to trust more: good ratings or bad ratings.
Results of exploratory analysis of the `votes.useful` variable in the Reviews dataset
The exploratory results alone do not make it clear which rating is more useful, so I turn to statistical tests.
\[ \text{Relative Usefulness} = \frac{\text{Useful Vote Count}}{\text{Rating Count}} \]
| Stars | Rating Count | Useful Vote Count | Relative Usefulness |
|---|---|---|---|
| 1 | 159,811 | 210,546 | 1.32 |
| 5 | 579,527 | 564,130 | 0.97 |
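As a quick arithmetic check, the sketch below recomputes the Relative Usefulness column from the totals in the table. The dictionary layout and the helper name `relative_usefulness` are illustrative choices, not taken from the original analysis.

```python
# Relative usefulness = useful votes / number of ratings, per star level.
# Totals are copied from the table above (1-star and 5-star reviews only).
totals = {
    1: {"rating_count": 159_811, "useful_vote_count": 210_546},
    5: {"rating_count": 579_527, "useful_vote_count": 564_130},
}


def relative_usefulness(useful_vote_count: int, rating_count: int) -> float:
    """Average number of 'useful' votes received per rating."""
    return useful_vote_count / rating_count


# Round to two decimals to match the table.
results = {
    stars: round(relative_usefulness(t["useful_vote_count"], t["rating_count"]), 2)
    for stars, t in totals.items()
}
print(results)  # {1: 1.32, 5: 0.97}
```

Because relative usefulness is total useful votes divided by the number of ratings, it is simply the mean number of useful votes per review at each star level.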
\[ \begin{array}{l} \mbox{H}_0: \mbox{Relative Usefulness}_1 \leq \mbox{Relative Usefulness}_5 \\ \mbox{H}_1: \mbox{Relative Usefulness}_1 > \mbox{Relative Usefulness}_5 \\ \mbox{test statistic: } 55.647 \\ \mbox{p-value} \approx 0 \mbox{, therefore reject H}_0 \mbox{ in favor of H}_1 \end{array} \]
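The slide does not name the test behind this statistic. One common way to test a one-sided difference in mean useful votes per review (which is what relative usefulness measures) is a large-sample, Welch-type z-test, sketched below under that assumption. The function `one_sided_welch_z` is hypothetical, and the two toy vote lists are made-up placeholders, not Yelp data; run on the real per-review `votes.useful` values, this sketch need not reproduce 55.647.

```python
import math
import statistics


def one_sided_welch_z(sample_1, sample_5):
    """Large-sample one-sided test of H0: mean_1 <= mean_5 vs H1: mean_1 > mean_5.

    sample_1 / sample_5: useful-vote counts for each 1-star / 5-star review.
    Uses unpooled (Welch-type) variances and a normal approximation.
    Returns (z statistic, one-sided upper-tail p-value).
    """
    m1, m5 = statistics.fmean(sample_1), statistics.fmean(sample_5)
    v1, v5 = statistics.variance(sample_1), statistics.variance(sample_5)
    se = math.sqrt(v1 / len(sample_1) + v5 / len(sample_5))
    z = (m1 - m5) / se
    # P(Z > z) for a standard normal Z.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# Hypothetical toy data: useful votes on a handful of reviews.
toy_1 = [3, 0, 2, 5, 1, 4, 2, 3]
toy_5 = [0, 1, 1, 0, 2, 1, 0, 1]
z, p = one_sided_welch_z(toy_1, toy_5)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

With hundreds of thousands of reviews per group the normal approximation is reasonable, and a statistic as large as the one reported drives the p-value to effectively zero.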
| # 1 Star More Useful (Count) | # 1 Star More Useful (Percent) | # 5 Star More Useful (Count) | # 5 Star More Useful (Percent) | Number of Businesses |
|---|---|---|---|---|
| 13,530 | 18,901 | 30,724 | 26,067 | 60,785 |
\[ \begin{array}{l} \mbox{H}_0: \#\mbox{ 1 Star More Useful (Count / Percent)} \geq \#\mbox{ 5 Star More Useful (Count / Percent)} \\ \mbox{H}_1: \#\mbox{ 1 Star More Useful (Count / Percent)} < \#\mbox{ 5 Star More Useful (Count / Percent)} \\ \mbox{test statistics: } -125.443 \mbox{ (count) and } -48.408 \mbox{ (percent)} \\ \mbox{p-values} \approx 0 \mbox{ for both, therefore reject H}_0 \mbox{ in favor of H}_1 \end{array} \]
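Because every business is compared against itself (its 1-star reviews versus its 5-star reviews), one option for this comparison is a sign test: among businesses where one rating type wins, is the share won by 1-star reviews below one half? The sketch below uses a normal approximation and assumes that businesses not counted in either column were ties or lacked one of the two ratings. This is a swapped-in technique with a hypothetical helper `sign_test_less`, not the test used above, so its statistics differ from the reported ones.

```python
import math


def sign_test_less(wins_a, wins_b):
    """One-sided sign test (normal approximation) of H1: P(A wins) < 0.5.

    Only decisive comparisons (A won or B won) are used; ties are dropped.
    Returns (z statistic, one-sided lower-tail p-value).
    """
    n = wins_a + wins_b
    expected = n / 2            # expected A wins under H0: P(A wins) = 0.5
    sd = math.sqrt(n * 0.25)    # binomial standard deviation under H0
    z = (wins_a - expected) / sd
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    return z, p


# Business counts from the table above: (1 star more useful, 5 star more useful).
for label, one_better, five_better in [
    ("count", 13_530, 30_724),
    ("percent", 18_901, 26_067),
]:
    z, p = sign_test_less(one_better, five_better)
    print(f"{label}: z = {z:.2f}, one-sided p = {p:.3g}")
```

For z values this extreme the lower-tail probability underflows to 0.0 in double precision, which is one reason such p-values are often printed as 0.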