Project Description

“Crowdsourcing is a specific sourcing model in which individuals or organizations use contributions from Internet users to obtain needed services or ideas.” -https://en.wikipedia.org/wiki/Crowdsourcing

In this project, reviews from Amazon will be used to crowdsource the overall opinion of a fine food. The data spans a period of more than 10 years, including more than 500,000 reviews up to October 2012.


Data Explanation

The data contains ten columns of information:

  1. Id - Id of the review
  2. ProductId - Id of the product
  3. UserId - Id of the user
  4. ProfileName - Name of the user
  5. HelpfulnessNumerator - Number of users who found the review helpful
  6. HelpfulnessDenominator - Number of users who indicated whether they found the review helpful or not
  7. Score - Rating of the product
  8. Time - Time of the review (UNIX time)
  9. Summary - Review summary
  10. Text - Text of the review
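Because the Time column stores UNIX time (seconds since 1970-01-01 UTC), it can be converted to a readable date when needed. The following is a minimal sketch; the timestamp value is made up for illustration and is not taken from Reviews.csv.

```r
# Illustrative timestamp (hypothetical value, not a row from the data).
exampleTime = 1342000000

# Convert seconds since the UNIX epoch to a date-time in UTC.
reviewTime = as.POSIXct(exampleTime, origin = "1970-01-01", tz = "UTC")

# Keep only the calendar date as text.
format(reviewTime, "%Y-%m-%d")
```

The same call should work on the whole column, for example `as.POSIXct(allReviews$Time, origin = "1970-01-01", tz = "UTC")`, assuming the column was read in as numeric.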

The product for which the crowdsourced rating will be computed is ProductId = B007JFMH8M, because it has the most reviews. This product id corresponds to Quaker Soft Baked Oatmeal Raisin Cookies. Only the reviews for this product are kept below.

# Load every review, then keep only the rows for the chosen product.
allReviews = read.csv("Reviews.csv")
cookieReviews = allReviews[allReviews$ProductId == "B007JFMH8M", ]
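The statement that this product has the most reviews can be checked by counting rows per ProductId. The following is a minimal sketch on a small made-up data frame; `toyReviews` and its values are hypothetical stand-ins for `allReviews`, and the same calls should apply to the real data.

```r
# Hypothetical toy data standing in for allReviews; the values are made up.
toyReviews = data.frame(
  ProductId = c("A1", "B2", "B2", "C3", "B2", "A1"),
  Score     = c(5, 4, 5, 3, 4, 2),
  stringsAsFactors = FALSE
)

# Count reviews per product and sort from most to least reviewed.
reviewCounts = sort(table(toyReviews$ProductId), decreasing = TRUE)

# The name of the first entry is the most-reviewed product id.
mostReviewed = names(reviewCounts)[1]
mostReviewed
```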

The Text column is dropped because it takes up memory and is not used in any computation. A summary of the remaining data is then displayed.

cookieReviews = subset(cookieReviews, select = -Text)
summary(cookieReviews)
##        Id              ProductId              UserId     ProfileName 
##  Min.   :562971   B007JFMH8M:913   A100CFHPG2AIP :  1   Becky  :  3  
##  1st Qu.:563199   0006641040:  0   A100WO06OQR8BQ:  1   Jen    :  3  
##  Median :563427   141278509X:  0   A1011PVIALNY6J:  1   Jessica:  3  
##  Mean   :563427   2734888454:  0   A102EES14OUWJ6:  1   Amanda :  2  
##  3rd Qu.:563655   2841233731:  0   A103USS4JZN50K:  1   Amber  :  2  
##  Max.   :563883   7310172001:  0   A104FLKIMW5MRP:  1   Amy    :  2  
##                   (Other)   :  0   (Other)       :907   (Other):898  
##  HelpfulnessNumerator HelpfulnessDenominator     Score      
##  Min.   :0.00000      Min.   :0.00000        Min.   :1.000  
##  1st Qu.:0.00000      1st Qu.:0.00000        1st Qu.:4.000  
##  Median :0.00000      Median :0.00000        Median :5.000  
##  Mean   :0.04491      Mean   :0.05038        Mean   :4.583  
##  3rd Qu.:0.00000      3rd Qu.:0.00000        3rd Qu.:5.000  
##  Max.   :5.00000      Max.   :5.00000        Max.   :5.000  
##                                                             
##       Time                                        Summary   
##  Min.   :1.342e+09   Yummy                            : 22  
##  1st Qu.:1.342e+09   Yummy!                           : 21  
##  Median :1.342e+09   Quaker Soft Baked Oatmeal Cookies: 18  
##  Mean   :1.342e+09   Delicious                        : 16  
##  3rd Qu.:1.343e+09   Delicious!                       : 16  
##  Max.   :1.351e+09   Quaker Soft Baked Oatmeal Cookie :  9  
##                      (Other)                          :811

Computations

For this crowdsourcing analysis, the review scores (1-5) are averaged within two groups of reviews, split by how helpful other users found each review. The average score of reviews that more than half of voters found helpful is given a weight of 75%. The average score of all other reviews (those that half or fewer of voters found helpful, including reviews with no votes) is given a weight of 25%.
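Written as a formula, the weighting scheme amounts to the following. The symbols are introduced here for illustration only: \bar{s}_{H} is the mean score of reviews whose helpfulness fraction exceeds 0.5, and \bar{s}_{L} is the mean score of all other reviews, following the `> .50` and `<= .50` comparisons used in the code.

```latex
\text{finalScore} = 0.75\,\bar{s}_{H} + 0.25\,\bar{s}_{L},
\qquad
\text{helpfulness}_i = \frac{\text{HelpfulnessNumerator}_i}{\text{HelpfulnessDenominator}_i}
```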

First, the helpfulness fraction is computed for each review. Reviews with no helpfulness votes produce 0/0, which R returns as NaN, so those entries are set to 0.

cookieReviews$Helpfulness = cookieReviews$HelpfulnessNumerator / cookieReviews$HelpfulnessDenominator
cookieReviews$Helpfulness[is.na(cookieReviews$Helpfulness)] = 0
summary(cookieReviews$Helpfulness)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.00000 0.00000 0.03158 0.00000 1.00000
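As an alternative sketch of the same step, the zero-denominator case can be handled in a single expression with ifelse() instead of being patched afterwards. The helper name `helpfulnessFraction` is hypothetical and the input values are made up.

```r
# Hypothetical helper: fraction of helpful votes, or 0 when nobody voted.
helpfulnessFraction = function(numerator, denominator) {
  # Positions where the denominator is 0 take the value 0 instead of NaN.
  ifelse(denominator > 0, numerator / denominator, 0)
}

# Made-up votes: no votes, 3 of 4 helpful, 1 of 5 helpful.
helpfulnessFraction(c(0, 3, 1), c(0, 4, 5))
```

For the made-up votes above, the fractions are 0, 0.75 and 0.2.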


Now the averages of the scores for both categories of helpfulness can be determined.

aboveAve = mean(cookieReviews$Score[cookieReviews$Helpfulness > .50])
aboveAve
## [1] 4.142857
belowAve = mean(cookieReviews$Score[cookieReviews$Helpfulness <= .50])
belowAve
## [1] 4.59661


The weights are applied.

aboveAve = aboveAve*0.75
aboveAve
## [1] 3.107143
belowAve = belowAve*0.25
belowAve
## [1] 1.149153


The final score is computed.

finalScore = aboveAve + belowAve
finalScore
## [1] 4.256295


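The crowdsourced score for Quaker Soft Baked Oatmeal Raisin Cookies, weighted by the helpfulness of each review, is about 4.26. This is lower than the unweighted mean score of 4.583 shown in the summary, because the reviews rated helpful have a lower average (about 4.14) and carry the larger weight.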
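The computation steps can also be collected into one function. This is only a sketch: the name `weightedGroupScore` and its parameters are hypothetical, the example data is made up, and the defaults mirror the 0.50 cutoff and the 75%/25% weights used in this analysis.

```r
# Hypothetical helper combining the steps; not part of the original analysis.
weightedGroupScore = function(score, helpfulness,
                              cutoff = 0.50, wHelpful = 0.75, wOther = 0.25) {
  helpful = helpfulness > cutoff
  # Mean score within each helpfulness group.
  helpfulMean = mean(score[helpful])
  otherMean   = mean(score[!helpful])
  # Weighted sum of the two group means (weights are assumed to sum to 1).
  wHelpful * helpfulMean + wOther * otherMean
}

# Made-up example: two helpful reviews (scores 4 and 2), two others (5 and 5).
toyScore = weightedGroupScore(score = c(4, 2, 5, 5),
                              helpfulness = c(1, 0.8, 0, 0.5))
toyScore  # 0.75 * 3 + 0.25 * 5 = 3.5
```

If either group is empty, mean() returns NaN and so does the function, so a real implementation would need to decide how to handle that case.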


References

Crowdsourcing. (2017, December 05). Retrieved December 17, 2017, from https://en.wikipedia.org/wiki/Crowdsourcing

wikiHow. (2017, November 27). How to Calculate Weighted Average. Retrieved December 17, 2017, from https://www.wikihow.com/Calculate-Weighted-Average

Stanford Network Analysis Project. (2017, May 01). Amazon Fine Food Reviews. Retrieved December 17, 2017, from https://www.kaggle.com/snap/amazon-fine-food-reviews/data

Web data: Amazon Fine Foods reviews. (n.d.). Retrieved December 17, 2017, from http://snap.stanford.edu/data/web-FineFoods.html