class: title-slide
background-image: url("assets/uvalogo_g_CommunicationScience.png"), url("assets/candle.gif")
background-position: 9% 91%, 100% 50%
background-size: 300px, 50% 100%
background-color: #000000

<br/>

# .Large[(In)visible Information]

## .white[.font100[Theory, Measurement, and Boosting]]

### .white[.font80[Saurabh Khanna]]

### .white[.font80[21.05.2024]]

---

# About me

.pull-left[
<br/>

🇬🇧 Research Associate, University of Oxford [Pembroke College]

<br/>

]

.pull-right[
<div class="leaflet html-widget html-fill-item-overflow-hidden html-fill-item" id="htmlwidget-64e2288858dca2ccf622" style="width:504px;height:504px;"></div>
<script type="application/json" data-for="htmlwidget-64e2288858dca2ccf622">{"x":{"options":{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}}},"calls":[{"method":"addProviderTiles","args":["CartoDB.Positron",null,null,{"errorTileUrl":"","noWrap":false,"detectRetina":false}]},{"method":"addAwesomeMarkers","args":[[52.3556,51.7548],[4.9558,-1.2544],{"icon":"university","markerColor":"blue","iconColor":"white","spin":false,"squareMarker":false,"iconRotate":0,"font":"monospace","prefix":"fa"},null,null,{"interactive":true,"draggable":false,"keyboard":true,"title":"","alt":"","zIndexOffset":0,"opacity":1,"riseOnHover":false,"riseOffset":250},null,null,null,null,["University of Amsterdam","University of Oxford"],{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]}],"limits":{"lat":[51.7548,52.3556],"lng":[-1.2544,4.9558]}},"evals":[],"jsHooks":[]}</script>
]

---

# About me

.pull-left[
<br/>

🇬🇧 Research Associate, University of Oxford [Pembroke College]

<br/>

Previously:

🇬🇧 Postdoctoral Research Fellow, University of Oxford [Pembroke College]

🇺🇸 PhD, Education Policy [Computer Science minor], Stanford University

🇺🇸 MA, Economics, Stanford University

🇮🇳 MA, Education, TISS Mumbai

🇮🇳 BS, Computer Science, BITS Pilani
]

.pull-right[
<div class="leaflet html-widget html-fill-item-overflow-hidden html-fill-item" id="htmlwidget-7c50a5964fc6d53e164b" style="width:504px;height:504px;"></div>
<script type="application/json" data-for="htmlwidget-7c50a5964fc6d53e164b">{"x":{"options":{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}}},"calls":[{"method":"addProviderTiles","args":["CartoDB.Positron",null,null,{"errorTileUrl":"","noWrap":false,"detectRetina":false}]},{"method":"addAwesomeMarkers","args":[[52.3556,51.7548],[4.9558,-1.2544],{"icon":"university","markerColor":"blue","iconColor":"white","spin":false,"squareMarker":false,"iconRotate":0,"font":"monospace","prefix":"fa"},null,null,{"interactive":true,"draggable":false,"keyboard":true,"title":"","alt":"","zIndexOffset":0,"opacity":1,"riseOnHover":false,"riseOffset":250},null,null,null,null,["University of Amsterdam","University of Oxford"],{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]}],"limits":{"lat":[51.7548,52.3556],"lng":[-1.2544,4.9558]}},"evals":[],"jsHooks":[]}</script>
]

---

# Digitizing Human Lives

- Scale and Speed: 8.5 billion Google searches per day

- COVID-19: 47% rise in broadband usage across the US

- Enabled democratic discourse

- But also some concerns

---

# Concerns

- Misinformation
- Bias

---

# Concerns

- Misinformation: Information consumed `$\ne$` Truth
- Bias: [Information consumed `$\ne$` Truth] `$+$` Discriminates against a social group

- .red[Truth?]
  - ~~A norm~~ An exception on the Internet (Flanagin & Metzger, 2000)

<br/>
--

- Consider two statements:
  - `$S_1$`: The election was rigged
  - `$S_2$`: <span style="color:blue">People think</span> the election was rigged

---

# Concerns

- Misinformation: Information consumed `$\ne$` Truth
- Bias: [Information consumed `$\ne$` Truth] `$+$` Discriminates against a social group
- .red[Truth?]
  - ~~A norm~~ An exception on the Internet (Flanagin & Metzger, 2000)

<br/>

- Consider two statements:
  - `$S_1$`: The election was rigged ❌
  - `$S_2$`: <span style="color:blue">People think</span> the election was rigged ✅

--
  - `$S_1$` and `$S_2$` have a similar effect on the reader (Tucker & Persily, 2020)
  
--
  - Telling the reader that `$S_1$` is false doesn't help either (Bail, 2021)

---

class: middle, bottom
background-image: url("assets/venn1.png")
background-size: contain
background-position: 50% 50%

---

class: middle, bottom
background-image: url("assets/venn2.png")
background-size: contain
background-position: 50% 50%
background-color: black

---

## .white[Towards a Theory of Invisible Information]

---

# Three aspects

### 1. Not all knowledge is digitized

- minority languages
- rare diseases
- certain types of crime
- ...

---

# Three aspects

### 2. Digitized information is dynamically 'ranked' ~~by~~ for us

---

# Three aspects

### 3. We consume the tip of the information iceberg

.pull-left-1[
`$$P_i = \frac{\frac{1}{i^s}}{\sum_{j=1}^{N} \frac{1}{j^s}},$$`
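where `$P_i$` is the probability of clicking the search result at rank `$i$`, `$N$` is the number of results, and `$s$` is the Zipf exponent that controls how steeply clicks fall off with rank.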
]
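
---

# Sketch: click probability by rank

A minimal Python sketch of the rank-based click model above, assuming a Zipf exponent of `$s = 1$` (an illustrative value, not one estimated from click data). `click_probabilities` is a hypothetical helper name.

```python
def click_probabilities(n_results: int, s: float = 1.0) -> list[float]:
    """Return P_i for ranks i = 1..N under a Zipf-style click model."""
    # Unnormalised weight 1 / i^s for each rank i
    weights = [1.0 / (i ** s) for i in range(1, n_results + 1)]
    # Normalising constant: sum over j = 1..N of 1 / j^s
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    probs = click_probabilities(10, s=1.0)
    for rank, p in enumerate(probs, start=1):
        print(f"rank {rank:2d}: P = {p:.3f}")
    # Share of clicks captured by the top three results
    print(f"top-3 share: {sum(probs[:3]):.2f}")
```

With `$N = 10$` and `$s = 1$`, the top three results take about 63% of the click probability: the tip of the iceberg in numbers.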

---

## .white[Measuring Invisible Information]

---

### Leveraging text embeddings

- Transformer-based embeddings (2017), sBERT (2019)
- Given a search query, a document's relevance can be estimated from how close the two sit in the embedding space, i.e. their semantic similarity (2020)

.center[
.blockquote[
<img src="assets/vectors.png" alt="rda_logo" width="500" style="border:0;"/>
]
]
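
---

# Sketch: relevance as embedding similarity

One common way to turn closeness in the embedding space into a relevance score is cosine similarity. The sketch below assumes that choice and uses made-up 3-dimensional vectors; in practice the vectors would come from a sentence-embedding model such as sBERT. All function and document names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_relevance(query: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score every document against the query and sort, most relevant first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for sentence embeddings (made-up values)
    query = [0.9, 0.1, 0.0]
    docs = {
        "doc_a": [0.8, 0.2, 0.1],
        "doc_b": [0.1, 0.9, 0.3],
        "doc_c": [0.0, 0.2, 0.9],
    }
    for name, score in rank_by_relevance(query, docs):
        print(f"{name}: {score:.3f}")
```

Higher scores mean the document points in nearly the same direction as the query, so `doc_a` ranks first here.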

---

`$\vec{q}$`: query

`$\vec{C} = \sum_{i=1}^{N} w_i \vec{r_i}$`: corpus vector, constructed as a weighted aggregate of the `$\vec{r_i}$` vectors

`$\vec{r_i}$`: one of the `$N$` search results

`$w_i$`: weight assigned to search result `$i$`

.content-box-purple[
`$$I_{visibility} = \sum_{i=1}^N P_i (\vec{C} \cdot \vec{r_i}) = \sum_{i=1}^N \left(\frac{\frac{1}{i^s}}{\sum_{j=1}^{N} \frac{1}{j^s}}\right) (\vec{C} \cdot \vec{r_i})$$`
]
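
---

# Sketch: computing information visibility

A minimal sketch of the visibility formula above. It assumes (our assumption, not stated on the slide) that each `$\vec{r_i}$` and the aggregate `$\vec{C}$` are scaled to unit length, so every `$\vec{C} \cdot \vec{r_i}$` is a cosine similarity. Results must be passed in rank order; helper names and vectors are hypothetical.

```python
import math


def unit(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def information_visibility(results: list[list[float]], weights: list[float], s: float = 1.0) -> float:
    """I_visibility = sum_i P_i (C . r_i), with results given in rank order."""
    n = len(results)
    # Zipf click probabilities P_i for ranks 1..N
    raw = [1.0 / (i ** s) for i in range(1, n + 1)]
    total = sum(raw)
    probs = [x / total for x in raw]
    # Assumption: unit-normalise each result embedding r_i
    r = [unit(vec) for vec in results]
    # Corpus vector C = sum_i w_i r_i, also unit-normalised (assumption)
    dim = len(r[0])
    c = unit([sum(w * vec[d] for w, vec in zip(weights, r)) for d in range(dim)])
    # Click-probability-weighted similarity between C and each result
    return sum(p * dot(c, r_i) for p, r_i in zip(probs, r))


if __name__ == "__main__":
    weights = [1.0, 1.0, 1.0]
    ranked = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
    reranked = [[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]]  # same results, new order
    print(f"original order: {information_visibility(ranked, weights):.3f}")
    print(f"re-ranked:      {information_visibility(reranked, weights):.3f}")
```

Moving the result closest to `$\vec{C}$` from rank 2 to rank 1 raises the score from about 0.77 to about 0.86, because the click model weights the top rank most heavily.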

---

# Information Visibility

- Single metric `$\in [0, 1]$`
- Intuitive
- Applicable to any ranked information source
  - search results
  - social media feeds
  - image galleries

.content-box-purple[
`$$I_{visibility} = \sum_{i=1}^N P_i (\vec{C} \cdot \vec{r_i}) = \sum_{i=1}^N \left(\frac{\frac{1}{i^s}}{\sum_{j=1}^{N} \frac{1}{j^s}}\right) (\vec{C} \cdot \vec{r_i})$$`
]

---

# Another (more granular) metric

### Information Visibility Curve

.pull-left-1[
<br/>
.content-box-purple[
`$$f(k) = \vec{C} \cdot \left(\sum_{i=1}^{k} w_i \vec{r_i}\right), \quad k = 1, \dots, N$$`
]

The area under this curve indicates how quickly information visibility accumulates down the ranking for a given dataset.

]

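
---

# Sketch: information visibility curve

A minimal sketch of the visibility curve, reading `$f$` as a function of a rank cutoff `$k$`: the dot product of `$\vec{C}$` with the weighted aggregate of the top-`$k$` results (our reading of the formula). To compare areas across datasets, the sketch divides each point by the final value `$f(N)$`; that normalisation is an assumption, not part of the slide. Helper names and vectors are hypothetical.

```python
def visibility_curve(results: list[list[float]], weights: list[float]) -> list[float]:
    """f(k) = C . (sum of w_i r_i over the top k results), for k = 1..N."""
    dim = len(results[0])
    # Corpus vector C = sum_i w_i r_i over all N results
    corpus = [sum(w * r[d] for w, r in zip(weights, results)) for d in range(dim)]
    curve, partial = [], [0.0] * dim
    for w, r in zip(weights, results):
        # Add the next-ranked result to the running aggregate
        partial = [p + w * x for p, x in zip(partial, r)]
        curve.append(sum(c * p for c, p in zip(corpus, partial)))
    return curve


def normalised_area(curve: list[float]) -> float:
    """Mean of f(k) / f(N) over all ranks: closer to 1 means visibility is gained earlier."""
    final = curve[-1]  # assumes f(N) > 0
    return sum(v / final for v in curve) / len(curve)


if __name__ == "__main__":
    weights = [1.0, 1.0, 1.0]
    ranked = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
    reranked = [[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]]  # same results, new order
    print(f"original order: {normalised_area(visibility_curve(ranked, weights)):.2f}")
    print(f"re-ranked:      {normalised_area(visibility_curve(reranked, weights)):.2f}")
```

Putting the result most aligned with the corpus first raises the normalised area from about 0.66 to about 0.70: the same information, gained earlier in the ranking.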

---

## .white[Boosting Invisible Information]

---

# Boosting

### Boost **all invisible** information? ❌

---

# Boosting

### Boost **relevant invisible** information ✅

.pull-left[
- We need
  - .blue[Data]
  - .red[A measure of relevance]
  - .green[A measure of invisibility]
]

.pull-right[
- .blue[IMDb corpus of 697,872 films since 1874]
- .blue[45% have user ratings]
- .blue[100s of features per film]
- .red[Predict ratings for the remaining 55% (a measure of relevance)]
- .green[Compare plot embeddings against the whole corpus (a measure of invisibility)]
]

<br/>

.content-box-gray[
.center[
Boost **hidden gems** based on a harmonic mean of .red[both] .green[measures] above.
]
]
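
---

# Sketch: scoring hidden gems

A minimal sketch of the hidden-gem score. Two stand-ins are assumptions, not the author's pipeline: relevance is a predicted rating on IMDb's 1-10 scale rescaled to [0, 1] (the rating model itself is not shown), and invisibility is one minus the cosine similarity between a film's plot embedding and the centroid of all plot embeddings. Titles, ratings, vectors, and helper names are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two scores; 0 when either score is 0."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def hidden_gem_scores(films: dict[str, tuple[float, list[float]]]) -> list[tuple[str, float]]:
    """films maps title -> (predicted 1-10 rating, plot embedding); best gems first."""
    dim = len(next(iter(films.values()))[1])
    # Assumption: "the whole corpus" is summarised by the centroid of all plot embeddings
    centroid = [sum(emb[d] for _, emb in films.values()) / len(films) for d in range(dim)]
    scored = []
    for title, (rating, emb) in films.items():
        # Relevance: rescale a 1-10 rating to [0, 1]
        relevance = (rating - 1) / 9
        # Invisibility: distance from the corpus centroid, clamped to [0, 1]
        invisibility = max(0.0, min(1.0, 1 - cosine(emb, centroid)))
        scored.append((title, harmonic_mean(relevance, invisibility)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical films: (predicted rating, 2-d plot embedding)
    films = {
        "A": (7.5, [1.0, 0.0]),
        "B": (7.0, [0.9, 0.1]),
        "C": (7.2, [0.95, 0.05]),
        "gem": (8.5, [0.1, 1.0]),
        "dud": (2.0, [0.0, 1.0]),
    }
    for title, score in hidden_gem_scores(films):
        print(f"{title}: {score:.3f}")
```

In this toy corpus the well-rated film whose plot sits far from the centroid (`gem`) ranks first, while the poorly rated outlier (`dud`) ranks last: the harmonic mean rewards films that score well on both measures, not just one.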

---

# [🕯️](https://theinvisiblelab.org/) The (In)visible Lab

<br/>

🌱 Social and Behavioural Data Science Centre, University of Amsterdam

🇺🇸 Stanford University

🇬🇧 University of Oxford

🇪🇸 University of Deusto

🇨🇦 Public Knowledge Project

🇮🇸 Citizens Foundation

---

<br/>

# .Large[Thank You!]

### .white[Feedback/Questions]

### .white[saurabh.khanna@uva.nl]