Machine learning competitions have become an extremely popular format for
solving prediction and classification problems of all kinds.
The central component of any competition is the leaderboard which ranks all
teams in the competition by the score of their best submission.
Often, participants incorporate the feedback from the leaderboard into the
design of their classifier thus creating a dependence between the classifier
and the data on which it is evaluated.
Our Shiny application is designed
to demonstrate how this dependence leads to a biased estimate of the
classifier’s true performance.
Objective
Typically, the competition is designed such that the data is partitioned into
two sets: a training set (instances with labels) and a test set
(instances without labels).
To avoid overfitting to the test set, the competition organizers further
partition the test set into two parts:
One part of the test set is used for computing scores on the public
leaderboard.
The other is used to rank all submissions after the competition has ended.
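To make this setup concrete, here is a rough R sketch of the split and of the public score \( s_H \) (the sizes and names below, e.g. n_public, true_labels, s_final, are illustrative choices, not the app's actual code):

```r
# Toy test set: N hidden true labels, half scored publicly, half held out.
set.seed(1)

N        <- 4000                                   # total number of test instances
n_public <- 2000                                   # instances used for the public leaderboard
true_labels <- sample(c(0L, 1L), N, replace = TRUE)

public_idx  <- sample(N, n_public)                 # rows behind the public leaderboard
private_idx <- setdiff(seq_len(N), public_idx)     # rows used for the final ranking

# Public score of a submission y: classification error on the public part only.
s_H <- function(y) mean(y[public_idx] != true_labels[public_idx])

# Final (private) score, revealed only after the competition has ended.
s_final <- function(y) mean(y[private_idx] != true_labels[private_idx])
```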
Our goal is to climb the public leaderboard without even looking at the
data.
Algorithm (Wacky Boosting)
Notation
\( y \in \{0,1\}^N \): a submission, i.e., a vector of \( N \) predicted labels
\( s_H(y) \): public score of a submission \( y \)
Algorithm
Choose \( y_1,...,y_k \in \{0,1\}^N \) uniformly at random.
Let \( I= \{ i \in [k]:s_H(y_i)<0.5 \} \).
Output \( \hat{y} = \text{majority} \{ y_i:i \in I\} \), where the majority is
component-wise.
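Here is a minimal R sketch of these three steps, reusing the illustrative s_H(), s_final(), and N from the sketch above (k, Y, and y_hat are again made-up names):

```r
# Wacky boosting; assumes s_H(), s_final(), and N from the previous sketch.
k <- 200

# Step 1: k submissions chosen uniformly at random (one per row).
Y <- matrix(sample(c(0L, 1L), k * N, replace = TRUE), nrow = k)

# Step 2: keep only the submissions whose public score happens to beat chance.
I <- which(apply(Y, 1, s_H) < 0.5)

# Step 3: component-wise majority vote over the selected submissions.
y_hat <- as.integer(colMeans(Y[I, , drop = FALSE]) > 0.5)

s_H(y_hat)       # well below 0.5: the public leaderboard rewards y_hat
s_final(y_hat)   # around 0.5: pure chance on the held-out part
```

Each selected \( y_i \) is, by construction, slightly biased toward the correct label on every public instance, so the component-wise majority amplifies that bias on the public part; on the private part the selection carries no information, so \( \hat{y} \) stays at chance.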
Results
Lo and behold, this is what happens:
We keep climbing the public leaderboard! :)
But wacky boosting does nothing whatsoever on the final test set :(
Further Reading
To see what just happened, see
the excellent blog post on Moody Rd.
To play around with a dynamic simulation of this phenomenon,
click here.