Hill-Climbing using the Leaderboard Score

Harel Lustiger
2015-07-23

Problem

  • Machine learning competitions have become an extremely popular format for solving prediction and classification problems of all kinds.
  • The central component of any competition is the leaderboard which ranks all teams in the competition by the score of their best submission.
  • Often, participants incorporate feedback from the leaderboard into the design of their classifier, thus creating a dependence between the classifier and the data on which it is evaluated.
  • Our Shiny application is designed to demonstrate how this dependence leads to a biased estimate of the classifier’s true performance.

Objective

  • Typically, the competition is designed such that the data is partitioned into two sets: a training set (instances with labels) and a test set (instances without labels).
  • To avoid overfitting to the test set, the competition organizers further partition the test set into two parts (a minimal sketch of such a split follows this list):
    • One part of the test set is used for computing scores on the public leaderboard.
    • The other part is used to rank all submissions after the competition has ended.
  • Our goal is to climb the public leaderboard without even looking at the data.
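
As a concrete illustration, such a three-way split might look as follows in R. This is a minimal sketch under assumed proportions; the 60/20/20 sizes and the variable names are hypothetical, not taken from any particular competition.

    # Hypothetical partition of 10,000 instances into the three sets
    # described above; the 60/20/20 proportions are an assumption.
    set.seed(1)
    idx <- sample(10000)
    train_idx   <- idx[1:6000]       # labels released to participants
    public_idx  <- idx[6001:8000]    # scored on the public leaderboard
    private_idx <- idx[8001:10000]   # used only for the final ranking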

Algorithm (Wacky Boosting)

Notation

  • \( y \in \{0,1\}^N \): a submission, i.e. a vector of \( N \) binary label predictions, one per test instance.
  • \( s_H(y) \): the public leaderboard score of submission \( y \), i.e. its error rate on the public part of the test set (so lower is better, and 0.5 is chance level).

Algorithm

  1. Choose \( y_1, \dots, y_k \in \{0,1\}^N \) uniformly at random.
  2. Let \( I = \{ i \in [k] : s_H(y_i) < 0.5 \} \), the indices of the random submissions that score better than chance on the public leaderboard.
  3. Output \( \hat{y} = \text{majority} \{ y_i : i \in I \} \), where the majority vote is taken component-wise.
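
The procedure is easy to simulate. Below is a minimal sketch in R; the scoring oracle s_H and the hidden ground-truth labels are simulated stand-ins for a real leaderboard, and all sizes and names are assumptions made for illustration.

    # Simulated leaderboard: the first half of the test set is "public",
    # the second half is "private" (final); truth is hidden from the player.
    set.seed(1)
    N      <- 4000
    public <- 1:(N / 2)
    truth  <- sample(0:1, N, replace = TRUE)

    # Stand-in for the leaderboard oracle: error on the public instances only.
    s_H <- function(y) mean(y[public] != truth[public])

    wacky_boosting <- function(k) {
      # 1. Draw k submissions uniformly at random (an N-by-k matrix).
      ys <- replicate(k, sample(0:1, N, replace = TRUE))
      # 2. Keep the submissions that score better than chance on the public set.
      I <- which(apply(ys, 2, s_H) < 0.5)
      # 3. Component-wise majority vote over the kept submissions.
      as.integer(rowMeans(ys[, I, drop = FALSE]) > 0.5)
    }

    y_hat <- wacky_boosting(k = 200)
    s_H(y_hat)                              # public error: noticeably below 0.5
    mean(y_hat[-public] != truth[-public])  # private error: still about 0.5

Note that the algorithm never queries the private half of the test set, which is why the last line hovers around chance no matter how large k gets.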

Results

Lo and behold, this is what happens:

[Figure (chunk WackyBoosting): public leaderboard score compared with the final test score]

  • We keep climbing the public leaderboard! :)
  • Wacky boosting did nothing whatsoever on the final test set :(
  • The reason: each kept \( y_i \) scored below 0.5 precisely because it happens to be slightly correlated with the public labels, and the component-wise majority vote amplifies these small correlations. The private labels are never queried, so on the final test set \( \hat{y} \) remains a fair coin flip (the sketch below reproduces this behaviour).
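
To recreate a curve like the one above, one can track both scores as the number of random submissions \( k \) grows. As before, this is a self-contained simulation in which the scoring functions and all sizes are assumptions, not the application's actual code.

    # Trace public vs. final error as the number of random submissions grows.
    set.seed(2)
    N     <- 4000
    pub   <- 1:(N / 2)
    prv   <- (N / 2 + 1):N
    truth <- sample(0:1, N, replace = TRUE)
    s_pub <- function(y) mean(y[pub] != truth[pub])   # public leaderboard score
    s_prv <- function(y) mean(y[prv] != truth[prv])   # final (private) score

    ks <- seq(20, 500, by = 20)
    scores <- sapply(ks, function(k) {
      ys    <- replicate(k, sample(0:1, N, replace = TRUE))
      keep  <- apply(ys, 2, s_pub) < 0.5
      y_hat <- as.integer(rowMeans(ys[, keep, drop = FALSE]) > 0.5)
      c(public = s_pub(y_hat), final = s_prv(y_hat))
    })

    # The public error drifts down as k grows; the final error stays near 0.5.
    matplot(ks, t(scores), type = "l", lty = 1, col = 1:2,
            xlab = "number of random submissions k", ylab = "error")
    legend("bottomleft", legend = rownames(scores), lty = 1, col = 1:2)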

Further Reading