"Understanding Biases in Search & Recommender Systems"

This article by Greg Jarboe appeared on Search Engine Journal (SEJ) in Dec 2019.

References:

[1] https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/2016_0504_data_discrimination.pdf

[2] https://cacm.acm.org/magazines/2018/6/228035-bias-on-the-web/fulltext

[3] https://www.searchenginejournal.com/biases-search-recommender-systems/339319/

Introduction

Bias, whatever form it takes, produces sub-optimal outcomes for the general public. Bias has been intrinsically embedded in culture and history since the beginning of time. However, with the rise of digital data, it can now spread faster than ever and reach many more people. This has detrimental effects on the way we digest, spread, and are influenced by big data. For instance, minorities in particular have felt the harmful effects of data bias when pursuing life goals whose outcomes are governed primarily by algorithms, from mortgage loans to advertising personalization [1].


Historical perspective

Our inherent human tendency to favor one thing or opinion over another is reflected in every aspect of our lives, creating both latent and overt biases toward everything we see, hear, and do. Biases on the Web therefore reflect both societal biases and our own internal biases, often emerging in subtler ways. This article aims to increase awareness of the potential side effects imposed on us all by the bias present in Web use and content and, by extension, in the recommender algorithms that arise from our interactions on the Web [2].


Discussion I: What are Biases?

The article focused on a talk given by Dr. Baeza-Yates, CTO of NTENT, about biases in search and recommender systems. According to his research, there are three main types of bias: statistical, cultural, and cognitive.

Most people, myself included, think of biases in recommender systems on cultural grounds: gender, racial, sexual, age, religious, social, linguistic, geographic, political, educational, economic, and technological. But we fail to realize that we have been practicing bias every day since we took our first statistics class in middle school. We infer or extrapolate results from a very small sample to the entire population. This applies to the gathering of the raw data, the sampling process, and the validation of the samples. A case in point is election polling: we extrapolate the outcome of an entire election from a small sample of would-be voters. That is how we have gotten it so wrong on many occasions; the last major snafu was the 2016 US general election between Trump and Clinton.
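To make the sampling point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the population size, the 48% support rate, and especially the idea that supporters of one candidate are only 60% as likely to answer a pollster. It is not a model of any real election or polling method.

```python
import random

def simulate_poll(population_size=20_000, support_rate=0.48,
                  sample_size=1_000, response_weight=0.6, seed=0):
    """Compare a uniform random sample with a sample skewed by non-response."""
    rng = random.Random(seed)
    # 1 = supports candidate A, 0 = does not (hypothetical population).
    population = [1 if rng.random() < support_rate else 0
                  for _ in range(population_size)]
    true_share = sum(population) / population_size

    # Unbiased poll: every voter is equally likely to be sampled.
    random_sample = rng.sample(population, sample_size)
    random_estimate = sum(random_sample) / sample_size

    # Biased poll (assumption): supporters of A are less likely to respond,
    # so they are drawn with a lower weight.
    weights = [response_weight if v == 1 else 1.0 for v in population]
    biased_sample = rng.choices(population, weights=weights, k=sample_size)
    biased_estimate = sum(biased_sample) / sample_size

    return {"true": true_share,
            "random": random_estimate,
            "biased": biased_estimate}

result = simulate_poll()
print(result)
```

Under these assumptions the uniform sample should land close to the true share, while the skewed sample systematically underestimates it, no matter how carefully the votes are counted afterwards.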

Another important example is one of cognitive bias: confirmation bias. This is the tendency to search for, interpret, favor, and recall information in a way that affirms one’s prior beliefs or hypotheses; we do that all the time subconsciously!


Discussion II: How does this impact search and recommender systems?

Most web systems are optimized using implicit user feedback. However, user data is partly biased by the choices that these systems make. For instance, we can only click on things that are shown to us. Because these systems are usually based on Machine Learning, they learn to reinforce their own biases, yielding self-fulfilling prophecies and/or sub-optimal solutions. Furthermore, personalization and filter bubbles can themselves create echo chambers for recommender systems.
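The feedback loop described above can be sketched with a toy simulation. The setup is hypothetical: every item has exactly the same chance of being clicked when shown, and the "system" simply shows the items with the most clicks so far, breaking ties by item index. Real systems are far more elaborate, but the mechanism is the same.

```python
import random

def simulate_feedback_loop(n_items=20, slots=5, rounds=2_000,
                           click_prob=0.3, seed=0):
    """Rank items by past clicks and show only the top `slots` each round."""
    rng = random.Random(seed)
    clicks = [0] * n_items
    impressions = [0] * n_items
    for _ in range(rounds):
        # The ranker only knows past clicks; ties go to the lower index.
        ranked = sorted(range(n_items), key=lambda i: (-clicks[i], i))
        for item in ranked[:slots]:
            impressions[item] += 1
            # Every item is equally appealing (same click probability).
            if rng.random() < click_prob:
                clicks[item] += 1
    return clicks, impressions

clicks, impressions = simulate_feedback_loop()
print("never shown:", [i for i, n in enumerate(impressions) if n == 0])
```

Although all 20 items are equally appealing by construction, the 15 items that did not make the first screen are never shown and therefore never clicked, so the click data "confirms" the initial arbitrary ranking.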


Discussion III: Biases Are Everywhere!

Because biases are everywhere, it is imperative that we, as consumers of recommender systems, are first and foremost aware of them. The users of search and recommender systems need to realize that removing bias involves more than just making engineers tune their algorithms. It also requires us to be aware of our own cultural and cognitive biases. It also means that search and recommender systems don’t need to be perfect; they just need to be better than humans who aren’t aware of their biases.


Discussion IV: How Does Bias in Search & Recommender Systems Impact us?

A common problem with recommender systems in the past was “popularity bias”. That is, if you are only ever recommended a few of the most popular items from sites X, Y, or Z, you will never buy anything other than those items, and sellers are likely to undercut their own sales of new items that haven’t had time to become popular yet, which is the ecommerce equivalent of eating your seed corn. This seems to have been more or less addressed in recent years, as the tech giants have worked to resolve these issues.
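One simple way to picture a mitigation for popularity bias is to reserve a few recommendation slots for items outside the popular head. The sketch below is an assumption-laden illustration, not a description of what any particular site does; the function name `recommend` and its parameters are hypothetical.

```python
import random

def recommend(popularity, k=5, explore_slots=1, rng=None):
    """Return k items: the most popular ones plus a few random others.

    popularity: dict mapping item id -> popularity score (e.g. past sales).
    explore_slots: how many of the k slots go to less popular items.
    """
    rng = rng or random.Random()
    explore_slots = max(0, min(explore_slots, k))
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    head = ranked[:k - explore_slots]   # best sellers
    tail = ranked[k - explore_slots:]   # everything else, including new items
    explored = rng.sample(tail, min(explore_slots, len(tail)))
    return head + explored

# Hypothetical catalog: item0 is the best seller, item9 the newest.
catalog = {f"item{i}": 100 - i for i in range(10)}
print(recommend(catalog, k=5, explore_slots=2, rng=random.Random(0)))
```

The trade-off is deliberate: a couple of slots are given up on proven sellers so that new items get a chance to collect the impressions they need to become popular at all.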

Other biases in web interaction, particularly in ecommerce, that involve data and algorithmic bias are:


Summary of key takeaways from this article


For the designers of recommendation systems [3], the takeaways fall into three categories: Data, Interaction, and Design and Implementation. From the data perspective, designers should analyze their data for known and unknown biases, and debias or mitigate whenever possible. They should collect more data for difficult or sparse regions of the problem if necessary and, lastly, delete attributes associated directly or indirectly with harmful bias. On the interaction side, make sure that users are aware of the biases at all times, and give them more control. Last but not least, when designing and implementing, let experts, colleagues, and users contest every step of the process. Allow for open and free discussion of the subject matter, especially where it is controversial. By bringing diversity into the discussion during the design phase, we are in effect vetting and cleansing the system before its release to the public.
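As a rough illustration of the data-side advice, the sketch below flags under-represented groups and drops named attributes from a dataset. The record layout, the field names, and the 20% threshold are all made up for the example. Note that dropping a column only removes a direct attribute; attributes that are indirectly associated with it (proxies) need separate analysis.

```python
from collections import Counter

def underrepresented_groups(records, group_key, min_share=0.2):
    """Return {group: share} for groups below min_share of the data."""
    counts = Counter(r[group_key] for r in records)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items() if c / total < min_share}

def drop_attributes(records, sensitive_keys):
    """Return copies of the records without the listed attributes."""
    return [{k: v for k, v in r.items() if k not in sensitive_keys}
            for r in records]

# Hypothetical interaction log: 18 users from one region, 2 from another.
records = ([{"region": "north", "gender": "f", "clicks": 3}] * 18
           + [{"region": "south", "gender": "m", "clicks": 1}] * 2)

print(underrepresented_groups(records, "region"))  # candidates for more data
cleaned = drop_attributes(records, {"gender"})
print(cleaned[0])
```

A check like this only surfaces the known dimensions that someone thought to audit; unknown biases still call for the open review by experts, colleagues, and users described above.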

And for the consumers of these recommender systems: we must keep in mind at all times that these systems are only a reflection of ourselves; through them we will witness the good, the bad, and the ugly of human nature. The problem in today’s digital age is that the Web amplifies everything, so we must be aware not only of our own biases but also of the biases of the engineers who created these systems, and counteract them to stop the vicious bias cycle.