“Crowdsourcing is a specific sourcing model in which individuals or organizations use contributions from Internet users to obtain needed services or ideas.” -https://en.wikipedia.org/wiki/Crowdsourcing
In this project, reviews from Amazon will be used to crowdsource the overall opinion of a fine food. The data spans a period of more than 10 years, including more than 500,000 reviews up to October 2012.
The data contains ten columns of information: Id, ProductId, UserId, ProfileName, HelpfulnessNumerator, HelpfulnessDenominator, Score, Time, Summary, and Text.
The crowdsourced rating will be computed for ProductId = B007JFMH8M, because this product has the most reviews. This product ID is for Quaker Soft Baked Oatmeal Raisin Cookies. Only the reviews for this product are collected below.
allReviews = read.csv("Reviews.csv")
cookieReviews = allReviews[allReviews$ProductId=="B007JFMH8M",]
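The claim that B007JFMH8M has the most reviews can be checked by tabulating ProductId. The sketch below uses a small made-up data frame (`toyReviews`, a hypothetical stand-in for `allReviews`) so that it runs without Reviews.csv; only the ProductId and Score column names are taken from the data, and the counts are invented.

```r
# Toy stand-in for allReviews; the rows below are made up for illustration.
toyReviews = data.frame(
  ProductId = c("B007JFMH8M", "B007JFMH8M", "B007JFMH8M",
                "0006641040", "141278509X", "141278509X"),
  Score     = c(5, 4, 5, 3, 4, 2)
)

# Count reviews per product and sort from most to least reviewed.
reviewCounts = sort(table(toyReviews$ProductId), decreasing = TRUE)

# The first name in the sorted table is the most-reviewed product.
mostReviewed = names(reviewCounts)[1]
mostReviewed
```

On the full data, the same two steps would be applied to allReviews$ProductId.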
The Text column is also dropped because it takes up memory and is not used in any computation. A summary of the remaining data is then displayed.
cookieReviews = subset(cookieReviews, select = -Text)
summary(cookieReviews)
##        Id              ProductId              UserId     ProfileName
##  Min.   :562971   B007JFMH8M:913   A100CFHPG2AIP :  1   Becky  :  3
##  1st Qu.:563199   0006641040:  0   A100WO06OQR8BQ:  1   Jen    :  3
##  Median :563427   141278509X:  0   A1011PVIALNY6J:  1   Jessica:  3
##  Mean   :563427   2734888454:  0   A102EES14OUWJ6:  1   Amanda :  2
##  3rd Qu.:563655   2841233731:  0   A103USS4JZN50K:  1   Amber  :  2
##  Max.   :563883   7310172001:  0   A104FLKIMW5MRP:  1   Amy    :  2
##                   (Other)   :  0   (Other)       :907   (Other):898
##  HelpfulnessNumerator HelpfulnessDenominator     Score
##  Min.   :0.00000      Min.   :0.00000        Min.   :1.000
##  1st Qu.:0.00000      1st Qu.:0.00000        1st Qu.:4.000
##  Median :0.00000      Median :0.00000        Median :5.000
##  Mean   :0.04491      Mean   :0.05038        Mean   :4.583
##  3rd Qu.:0.00000      3rd Qu.:0.00000        3rd Qu.:5.000
##  Max.   :5.00000      Max.   :5.00000        Max.   :5.000
##
##       Time                                        Summary
##  Min.   :1.342e+09   Yummy                            : 22
##  1st Qu.:1.342e+09   Yummy!                           : 21
##  Median :1.342e+09   Quaker Soft Baked Oatmeal Cookies: 18
##  Mean   :1.342e+09   Delicious                        : 16
##  3rd Qu.:1.343e+09   Delicious!                       : 16
##  Max.   :1.351e+09   Quaker Soft Baked Oatmeal Cookie :  9
##                      (Other)                          :811
For this crowdsourcing analysis, the review scores (1-5) are averaged within two groups of reviews, and each group average is weighted by how helpful its reviews were to other customers. Reviews that more than half of voters found helpful are averaged and given a weight of 75%. Reviews that half or fewer of voters found helpful, including reviews that received no votes, are averaged and given a weight of 25%.
First, the helpfulness fraction (helpful votes divided by total votes) is computed for each review; reviews with no votes are assigned a fraction of 0.
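The weighting scheme can be written as a single formula. The symbols are introduced here for illustration and do not appear in the original analysis: $\bar{s}_{\text{high}}$ is the mean score of reviews whose helpfulness fraction is above 0.5, and $\bar{s}_{\text{low}}$ is the mean score of all remaining reviews.

```latex
\[
\text{finalScore} = 0.75\,\bar{s}_{\text{high}} + 0.25\,\bar{s}_{\text{low}}
\]
```

Because the two weights sum to 1, the result stays on the same 1-5 scale as the individual scores.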
cookieReviews$Helpfulness = cookieReviews$HelpfulnessNumerator / cookieReviews$HelpfulnessDenominator
cookieReviews$Helpfulness[is.na(cookieReviews$Helpfulness)] = 0
summary(cookieReviews$Helpfulness)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.00000 0.03158 0.00000 1.00000
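Reviews that nobody voted on have a denominator of 0, and in R 0/0 evaluates to NaN, which is.na() also reports as TRUE. That is why the replacement step sets those rows to 0. A minimal sketch with made-up vote counts (the vectors `num` and `den` are hypothetical):

```r
# Made-up vote counts: helpful votes (num) out of total votes (den).
num = c(0, 1, 3, 0)
den = c(0, 2, 3, 4)

# Division leaves NaN wherever a review received no votes (0/0).
helpfulness = num / den
is.nan(helpfulness[1])

# is.na() is TRUE for NaN as well as NA, so this zeroes out unvoted reviews.
helpfulness[is.na(helpfulness)] = 0
helpfulness
```

One consequence of this choice is that unvoted reviews fall into the 25% group together with reviews that voters found unhelpful.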
Now the averages of the scores for both categories of helpfulness can be determined.
aboveAve = mean(cookieReviews$Score[cookieReviews$Helpfulness > .50])
aboveAve
## [1] 4.142857
belowAve = mean(cookieReviews$Score[cookieReviews$Helpfulness <= .50])
belowAve
## [1] 4.59661
The weights are applied.
aboveAve = aboveAve*0.75
aboveAve
## [1] 3.107143
belowAve = belowAve*0.25
belowAve
## [1] 1.149153
The final score is computed.
finalScore = aboveAve + belowAve
finalScore
## [1] 4.256295
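The steps above can be collected into one function. This is a sketch rather than the code used for the reported result: the function name `crowdScore`, its parameter names, and the toy vectors are all hypothetical, and it assumes both helpfulness groups contain at least one review (otherwise mean() returns NaN).

```r
# Hypothetical helper: weighted combination of the two group means.
crowdScore = function(score, helpfulness, cutoff = 0.50, wHigh = 0.75, wLow = 0.25) {
  highMean = mean(score[helpfulness > cutoff])    # reviews most voters found helpful
  lowMean  = mean(score[helpfulness <= cutoff])   # all other reviews, including unvoted ones
  wHigh * highMean + wLow * lowMean
}

# Toy data (made up): two helpful reviews and three others.
toyScore       = c(4, 5, 5, 3, 4)
toyHelpfulness = c(1.0, 0.8, 0.0, 0.5, 0.2)
toyResult = crowdScore(toyScore, toyHelpfulness)
toyResult

# Re-checking the arithmetic with the two group means reported above.
checkScore = 0.75 * 4.142857 + 0.25 * 4.59661
checkScore
```

On the real data the call would be crowdScore(cookieReviews$Score, cookieReviews$Helpfulness).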
The crowdsourced score for Quaker Soft Baked Oatmeal Raisin Cookies, weighted by the helpfulness of each review, is about 4.26.
Crowdsourcing. (2017, December 05). Retrieved December 17, 2017, from https://en.wikipedia.org/wiki/Crowdsourcing
wikiHow. (2017, November 27). How to Calculate Weighted Average. Retrieved December 17, 2017, from https://www.wikihow.com/Calculate-Weighted-Average
Stanford Network Analysis Project. (2017, May 01). Amazon Fine Food Reviews. Retrieved December 17, 2017, from https://www.kaggle.com/snap/amazon-fine-food-reviews/data
Web data: Amazon Fine Foods reviews. (n.d.). Retrieved December 17, 2017, from http://snap.stanford.edu/data/web-FineFoods.html