2020-02-26 22:36:49

Discussion Paper(s): Chap 1

Overview

Take Away:

  1. Privacy Preserving Data
  2. k-anonymity
  3. Differential Privacy

Timeline

How It Began?

k-anonymity

What is it?

Limitations

  • k-anonymity can prevent record reidentification in the strictest sense, but reidentification is not the only (or even the main) privacy risk.
  • It does not compose: when multiple datasets are released, their combination can still expose individuals, even if each one was released with a k-anonymity guarantee.
  • Releasing only “aggregate statistics” is not a solution either: privacy risks arise there, too.
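
The composition problem above can be sketched with a toy example. Everything here is hypothetical: the two releases, the field names (`age`, `zip`, `diagnosis`), and the assumption that an attacker knows the target appears in both releases. It is an illustration under those assumptions, not a description of any real dataset or attack.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("age", "zip")  # hypothetical quasi-identifier columns


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in groups.values())


def candidate_values(records, quasi_identifiers, target, sensitive):
    """Sensitive values that are consistent with the target's quasi-identifiers."""
    return {
        r[sensitive]
        for r in records
        if all(r[q] == target[q] for q in quasi_identifiers)
    }


# Two made-up releases, each 2-anonymous on (age, zip).
release_a = [
    {"age": "30-40", "zip": "191**", "diagnosis": "flu"},
    {"age": "30-40", "zip": "191**", "diagnosis": "asthma"},
    {"age": "50-60", "zip": "191**", "diagnosis": "flu"},
    {"age": "50-60", "zip": "191**", "diagnosis": "diabetes"},
]
release_b = [
    {"age": "30-40", "zip": "191**", "diagnosis": "asthma"},
    {"age": "30-40", "zip": "191**", "diagnosis": "migraine"},
    {"age": "50-60", "zip": "191**", "diagnosis": "diabetes"},
    {"age": "50-60", "zip": "191**", "diagnosis": "migraine"},
]

# The attacker knows only the target's generalized age and zip,
# and (by assumption) that the target is present in both releases.
target = {"age": "30-40", "zip": "191**"}
candidates_a = candidate_values(release_a, QUASI_IDENTIFIERS, target, "diagnosis")
candidates_b = candidate_values(release_b, QUASI_IDENTIFIERS, target, "diagnosis")

# Each release alone leaves two possibilities; together they leave one.
combined = candidates_a & candidates_b
```

Each release on its own leaves the attacker with two candidate diagnoses, but the intersection narrows them to one, which is the sense in which k-anonymity guarantees do not survive multiple releases.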

A DIFFERENT(IAL) NOTION OF PRIVACY

What is it?

  • A stringent notion of privacy that still allows useful insights from the data.
  • “We should be comparing what someone might learn from an analysis if any particular person’s data was included in the dataset with what someone might learn if it was not.”
    • Gödel Prize: Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith.
  • Requires a randomized algorithm: one that maps each input to a probability distribution over possible outputs.
  • The use of randomness: noise is deliberately added to computations, in a way that promises that any one person’s data cannot be reverse-engineered from the results.
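
One common way to "deliberately add noise" is the Laplace mechanism. The sketch below applies it to a counting query. The function names, the `epsilon` parameter name, and the sample data are assumptions made for illustration; the slides do not name a specific mechanism.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Noisy count of the records that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1 / epsilon is used.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of seven survey participants.
ages = [23, 35, 41, 52, 67, 29, 44]
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0)
```

A smaller `epsilon` means a larger noise scale: each individual is better hidden, and each released number is less accurate.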

Theoretical Definition:
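
For reference, the standard statement of ε-differential privacy is given below. The symbols are notation introduced here rather than taken from the slides: M is the randomized algorithm, D and D′ are any two datasets that differ in one person's data, S is any set of possible outputs, and ε is the privacy parameter.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S .
```

The smaller ε is, the closer the two output distributions must be, and so the less any observer can infer about whether a particular person's data was included.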

Simple But Remarkable:

  • We are able to learn what we wanted without incidentally collecting any strongly incriminating information about any single individual in the population: we collect only the aggregate information that we wanted.

  • An important trade-off: the more private we make the protocol, the more plausible deniability each individual we poll gets, and the less anyone can learn from an individual polling response.

  • So to get the same accuracy, the smaller the privacy parameter, the more people we need to poll.
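
The polling protocol described above matches randomized response. A minimal sketch follows, assuming each respondent answers truthfully with probability `p_truth` and otherwise answers with a fair coin flip. The parameter name, the function names, and the simulated population are assumptions for illustration.

```python
import random


def randomized_response(truth, p_truth, rng):
    """Report the true answer with probability p_truth, else a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_true_fraction(responses, p_truth):
    """Recover the population-level 'yes' fraction from the noisy responses.

    If f is the true fraction, the expected observed fraction is
    p_truth * f + (1 - p_truth) * 0.5, which is inverted here.
    """
    observed = sum(responses) / len(responses)
    return (observed - (1 - p_truth) * 0.5) / p_truth


# Simulated population: 30% would truthfully answer "yes" (made-up figure).
rng = random.Random(42)
population = [rng.random() < 0.30 for _ in range(100_000)]
responses = [randomized_response(t, 0.5, rng) for t in population]
estimate = estimate_true_fraction(responses, 0.5)
```

Lowering `p_truth` gives every respondent more deniability, but the division by `p_truth` amplifies sampling error, so more respondents are needed for the same accuracy.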

Who Is Using Differential Privacy?

First Two Large Scale Deployments

Third Deployment: US Census Bureau

  • The 2020 Census will be protected with differential privacy.
  • Centralized model of differential privacy: privacy protections are added to the aggregate statistics the Bureau publicly releases.
  • Questions:
    • So why did the Census Bureau decide to adopt differential privacy in the first place?
    • Why a weaker centralized trust model, which does not protect against subpoenas, hacks, and the like?

What DP doesn’t do?

Beyond DP’s control:

  • Unintended Correlations:

  • Catching The Golden State Killer:

Discussion Thought:

Let’s Discuss:

  • Q: With DP’s aggregate statistics, are the applications in government use “equitable and fair”?
  • Q: Can we protect privacy by not recording things that should not be recorded (race, etc.)?