Update!

The 2024 Competition is now open! You can see this year’s rules and the timeline here.

The rules have changed rather drastically from previous years. Instead of submitting probabilities for every possible game, you are submitting a bracket (or a “portfolio of brackets”) with projected winners.

I believe that this set of rules will:

Another Twist

A second new twist this year is that instead of just uploading a .csv file with our predictions, we need to submit a notebook (with code) that creates the .csv file. Kaggle lets you make notebooks using either R or Python code. If you have used R Markdown files at some point (we used them in Stats last year), these notebooks will look a bit familiar. We will practice writing notebooks together in class, and I will make an example notebook and share it with you.
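To give a feel for what "code that creates the .csv file" means, here is a minimal Python sketch of a notebook cell that writes predictions to a file. Everything specific in it is an assumption for illustration: the column names `game` and `projected_winner`, the team names, and the file name `submission.csv` are placeholders, and the real layout has to come from the competition rules.

```python
import csv
import os
import tempfile

# Placeholder predictions. The real column names and layout are defined by
# the competition rules; these are made up for illustration only.
predictions = [
    {"game": "Game 1", "projected_winner": "Team A"},
    {"game": "Game 2", "projected_winner": "Team D"},
    {"game": "Game 3", "projected_winner": "Team A"},
]


def write_predictions(rows, path):
    """Write one row per projected winner to a CSV file at `path`."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["game", "projected_winner"])
        writer.writeheader()
        writer.writerows(rows)


# Written to a temporary folder here so the sketch runs anywhere; in a real
# notebook the output location would follow the competition instructions.
output_path = os.path.join(tempfile.gettempdir(), "submission.csv")
write_predictions(predictions, output_path)
print("Wrote", len(predictions), "predictions to", output_path)
```

In a real notebook, the `predictions` list would be produced by your method rather than typed in by hand.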

The Big Picture

The data files contain lots of historical data: regular season and tournament data for teams from past seasons. The idea is that you could devise a method of winning this contest and see how well that method would have performed in past years. This year’s data won’t be complete until (probably) the Monday before the tournaments begin, since it requires all previous games to have been played and the brackets to be selected. Your predictions are due on March 21st, just before the tournaments start. That doesn’t leave much time between when all of this year’s data is available and when your predictions are due. Therefore, we should probably write code that works for a previous year (possibly last year) and check that it works. Then we will need to make only minor changes to adapt it for this year’s competition.
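The idea of trying a method on past years can be sketched in a few lines of Python. This is only a toy: the games in `GAMES` are made up, the tuple layout is an assumption rather than the format of the actual data files, and "always pick the better seed" is a placeholder method, not a recommendation.

```python
from collections import defaultdict

# Made-up past tournament games: (season, seed_a, seed_b, winning_seed).
# The real data files have their own layout; this is only for illustration.
GAMES = [
    (2022, 1, 16, 1),
    (2022, 8, 9, 9),
    (2022, 5, 12, 12),
    (2023, 1, 16, 16),
    (2023, 4, 13, 4),
    (2023, 2, 15, 2),
]


def pick_better_seed(seed_a, seed_b):
    """Placeholder method: predict that the lower-numbered seed wins."""
    return min(seed_a, seed_b)


def backtest(games, method):
    """Return {season: fraction of games the method predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for season, seed_a, seed_b, winner in games:
        total[season] += 1
        if method(seed_a, seed_b) == winner:
            correct[season] += 1
    return {season: correct[season] / total[season] for season in total}


accuracy_by_season = backtest(GAMES, pick_better_seed)
for season, accuracy in sorted(accuracy_by_season.items()):
    print(season, round(accuracy, 2))
```

On this toy data the placeholder method gets 1 of 3 games right in 2022 and 2 of 3 in 2023. Because `backtest` takes the method as an argument, a different method can be tried on the same past seasons without changing the scoring code, which is the kind of "minor changes" setup we want before this year’s data arrives.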

What can we do right now?

Read through the descriptions of the data and try to get a handle on the types of data that are available. Here are some important files to read about:

If you scroll to the bottom of this page, you will see “Data Explorer” on the right and can click on data files and see samples of the data.