21/11/2020

Introduction

Choosing is hard

  • Many webstores use a 5-star rating system for products
  • Choosing between a 5-star product with a single review and a 5-star product with thousands of reviews is easy
  • But what about a recent product with only ten 5-star reviews vs. an old product with thousands of 4-star reviews? Which one should you pick?

Let’s take a look at one method: Bayesian Ranking

Bayesian Ranking

  • Bayesian Ranking uses Bayes' theorem to rank products based on their reviews
  • It gives not only a point estimate of the ranking but also a measure of confidence, based on the probability that the product deserves a particular number of stars
  • Furthermore, you can influence the result by giving the algorithm prior information about how good you think the product is

But how is it done? Don’t worry, we won’t dive deep into the maths

Algorithm (I)

  • Reviews can be modeled by a beta distribution. If you are really not sure whether the ratings can be trusted, the uniform beta(1,1) distribution is a good choice.
  • In Bayesian statistics this is called a prior, and you can pick it according to your expectations
    • Say you are pretty confident that all involved products are good? Start off with a beta(20, 1) distribution
    • You think all products are bad? Try beta(1,20)
    • All products seem to be average? beta(20,20) could be a good starter

What would these distributions look like?

The following plot shows the individual priors: the probability density over the star ratings 1 to 5 on a continuous scale.
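A plot like this could be reproduced roughly as follows. This is a minimal sketch and not necessarily the code behind the original figure: the grid resolution, the colours and the variable names are assumptions, and the density is evaluated on the unit interval and only relabelled as stars on the x-axis.

```r
# Grid over the unit interval, on which the beta distribution is defined
p <- seq(0, 1, length.out = 201)
# Rescale to star ratings: 0 maps to 1 star, 1 maps to 5 stars
stars <- p * 4 + 1

# Densities of the priors mentioned on the previous slide
priors <- cbind(
  "beta(1,1)"   = dbeta(p, 1, 1),
  "beta(20,1)"  = dbeta(p, 20, 1),
  "beta(1,20)"  = dbeta(p, 1, 20),
  "beta(20,20)" = dbeta(p, 20, 20)
)

# Draw one line per prior (only when run in an interactive session)
if (interactive()) {
  matplot(stars, priors, type = "l", lty = 1, col = 1:4,
          xlab = "Star rating", ylab = "Density")
  legend("top", legend = colnames(priors), lty = 1, col = 1:4)
}
```

The `priors` matrix holds one column per prior, so further candidates can be added by appending another `dbeta` column.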

Algorithm (II)

We see that beta(1,1) has no particular maximum. beta(20,20) peaks right in the middle, and beta(20,1) has its maximum right on the 5-star rating. That peak is far higher than those of the other distributions because the spread of probabilities is much smaller.
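The difference in spread can also be checked numerically. The sketch below uses the standard closed-form variance of a beta distribution; `beta_sd` is a hypothetical helper name, and rescaling by 4 to express the spread in stars is an assumption based on the 1 to 5 star range.

```r
# Standard deviation of a beta(a, b) distribution (closed form)
beta_sd <- function(a, b) sqrt(a * b / ((a + b)^2 * (a + b + 1)))

# Spread of each prior, rescaled from the unit interval to the 1 to 5 star range
prior_sd_stars <- 4 * c(
  "beta(1,1)"   = beta_sd(1, 1),
  "beta(20,20)" = beta_sd(20, 20),
  "beta(20,1)"  = beta_sd(20, 1)
)
print(round(prior_sd_stars, 2))
```

A smaller standard deviation means the same total probability is squeezed into a narrower range, which is why the peak gets taller.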

Hands on

Average Prior

  • You now take your chosen prior and add 1 to the left parameter of the beta distribution for each 5-star review, and 1 to the right parameter for each 1-star review
  • All other star ratings add a fraction to both parameters of the beta distribution
    • e.g.: A 4-star rating adds \(\frac{3}{4}\) on the left side and \(\frac{1}{4}\) on the right side
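The update rule above can be sketched as a small function. `update_beta` is a hypothetical helper, not part of the original slides. It assumes that a rating of s stars adds (s - 1) / 4 to the left parameter and (5 - s) / 4 to the right one. This reproduces the 5-star, 1-star and 4-star cases stated above, while the linear split for 2 and 3 stars is an assumption.

```r
# Hypothetical helper: update beta parameters with a vector of star ratings.
# Assumption: s stars add (s - 1) / 4 on the left and (5 - s) / 4 on the right.
update_beta <- function(prior, stars) {
  stopifnot(length(prior) == 2, all(stars >= 1 & stars <= 5))
  c(left  = prior[[1]] + sum((stars - 1) / 4),
    right = prior[[2]] + sum((5 - stars) / 4))
}

print(update_beta(c(1, 1), 5))  # one 5-star review adds 1 on the left
print(update_beta(c(1, 1), 1))  # one 1-star review adds 1 on the right
print(update_beta(c(1, 1), 4))  # one 4-star review adds 3/4 left, 1/4 right
```

Because the contributions are simply summed, the order in which reviews arrive does not matter.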

Let’s compare a product with five 5-star reviews against a product with twenty 4-star reviews and five 5-star reviews, using an “average” beta(20,20) prior: \[ \text{Product 1} = beta(20 + 5, 20) = beta(25, 20) \\ \text{Product 2} = beta(20 + 20 \times \frac{3}{4} + 5, 20 + 20 \times \frac{1}{4}) = beta(40, 25) \]
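The arithmetic in the equation can be replayed in R. This is only a sketch: the variable names are made up, and reporting the posterior mean rescaled to stars is an added illustration, not something the slides compute.

```r
# "Average" prior from the slide
prior <- c(20, 20)

# Product 1: five 5-star reviews, each adding 1 to the left parameter
product1 <- c(prior[1] + 5, prior[2])
# Product 2: twenty 4-star reviews (3/4 left, 1/4 right each) plus five 5-star reviews
product2 <- c(prior[1] + 20 * 3 / 4 + 5, prior[2] + 20 * 1 / 4)

# Posterior mean of a beta(a, b) is a / (a + b); rescale it to the 1 to 5 star range
post_mean_stars <- function(ab) ab[1] / sum(ab) * 4 + 1

print(product1)
print(product2)
print(c(product1 = post_mean_stars(product1), product2 = post_mean_stars(product2)))
```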

Probabilities (average prior)

Given the prior of average reviews it would seem reasonable to choose the second product over the first one. The many 4-star reviews managed to shift the distribution more to the right than the five 5-star reviews. But what if you really don’t know if these products are good or bad and choose a non-informative beta(1,1)-prior?

Probabilities (non-informative prior)

Wow! Product 1 could now be a really awesome one, but it also has a longer tail of being really awful. Which one should you choose in this scenario?
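Both tails can be put into numbers. The sketch below assumes the posteriors that result from the beta(1,1) prior, beta(6, 1) for product 1 and beta(21, 6) for product 2. The thresholds of 3 and 4.5 stars are arbitrary illustrative choices, and `to_unit` is a hypothetical helper.

```r
# Map a star rating (1 to 5) back to the unit interval of the beta distribution
to_unit <- function(stars) (stars - 1) / 4

# Probability that the true quality lies below 3 stars
low_tail <- c(product1 = pbeta(to_unit(3), 6, 1),
              product2 = pbeta(to_unit(3), 21, 6))

# Probability that the true quality lies above 4.5 stars
high_tail <- c(product1 = pbeta(to_unit(4.5), 6, 1, lower.tail = FALSE),
               product2 = pbeta(to_unit(4.5), 21, 6, lower.tail = FALSE))

print(low_tail)
print(high_tail)
```

Product 1 carries more probability in both extremes, which is exactly the trade-off between a possible great find and a possible disappointment.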

Ranking

The star rating that each product exceeds with 95% probability, i.e. the 5% quantile of its posterior rescaled to the 1 to 5 star range, is:

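# 5% quantile of each posterior, rescaled from [0, 1] to 1-5 stars
c(qbeta(.05, 6, 1) * 4 + 1,   # product 1: beta(1 + 5, 1)
  qbeta(.05, 21, 6) * 4 + 1)  # product 2: beta(1 + 15 + 5, 1 + 5)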
## [1] 3.427849 3.549621

So even with the non-informative prior, product 2 wins. But remember: choose your prior according to your knowledge! It can affect the outcome.

And furthermore: by always playing it safe, you could miss out on some really great products whose only disadvantage is being new and not yet properly reviewed. Choose the risk you are willing to take. Maybe it is okay for you to be wrong 20% of the time? Plug in the numbers and find out which product wins in this scenario.
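Changing the risk level only changes the first argument of `qbeta`. A minimal sketch, assuming the posteriors under the beta(1,1) prior from the Ranking slide; `lower_bound_stars` is a hypothetical helper name, not part of the original slides.

```r
# Star rating that a beta(left, right) posterior exceeds with probability 1 - risk
lower_bound_stars <- function(risk, left, right) {
  qbeta(risk, left, right) * 4 + 1
}

# Accept being wrong 20% of the time instead of 5%
risk <- 0.20
bounds <- c(product1 = lower_bound_stars(risk, 6, 1),
            product2 = lower_bound_stars(risk, 21, 6))
print(bounds)

# Name of the product with the higher lower bound at this risk level
print(names(which.max(bounds)))
```

Setting `risk <- 0.05` reproduces the two numbers from the Ranking slide.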

Thank you

I hope you have learned something new. For further details you can visit this Link, where you can find more information about this method.

Farewell!