Note: This document may be updated as the event approaches; any major updates will be clearly marked.
Presentation upload link for Sunday: bit.ly/df22-upload (See presentations for details.)
DataFest 2022 @ Duke will start with registration and end with the awards ceremony, both at Penn Pavilion. See the schedule for details on the important events in between and their locations.
You can see a map of all DataFest locations and suggested parking spots here.
We ask that you carpool with teammates as parking on campus can be tricky.
Non-Duke students who will be traveling to Duke’s campus for DataFest should plan on driving with their teammates. The closest parking lot is the Bryan Center Parking Garage.
More information on parking at the Bryan Center and associated costs can be found at http://parking.duke.edu/parking/visitor_parking/.
The schedule is at https://www2.stat.duke.edu/datafest/#schedule.
Registration opens at 5:00pm on Friday at Penn Pavilion.
You are of course free to come and go as you please throughout the event, but here are the times all team members should plan to be on premises:
Consultants will be available for help until midnight.
You will have access to Penn Pavilion until midnight on Friday and Saturday. You should have your Duke ID and name badge with you at all times.
We recommend that every member of the team bring a laptop, if possible. You might find it helpful to have a mix of PCs and Macs, since they have different strengths.
Before the event, make sure that the software you will be using throughout the weekend is properly installed and running on your computer. You will be working with a large dataset, so make sure that you have enough space for it on your hard drive.
You might want to bring some favorite statistical or computational reference books, if you have them, or bookmark some pages that you routinely refer to.
We will provide meals, snacks, and munchies. Feel free to bring anything additional you might want.
Cloud computing resources for R, Python, and Julia can be accessed at https://cmgr.oit.duke.edu/containers/datafest2022. Duke students can log in with their Duke NetID and password. When you click on “reserve a container”, you get both an RStudio container and a JupyterLab container. The two share the same home directory, so team members who want to use both R and Python can work on the same files. The JupyterLab container also has Julia installed. Because of this two-container setup, when you log in through the container manager reservation system you will see a screen with two buttons: one to start RStudio and the other to start JupyterLab.
Data will be available on the containers at 6pm on Friday, in the data-readonly folder.
At the end of the kickoff presentation you will be given three options to obtain the data:
You will also be given a link to a Google Doc where you can ask questions about the data and a representative from the provider will answer them periodically throughout the event. Link to be emailed.
The dataset you will be working with is quite large. If you type the name of a variable to view it, it will take a long time to display. Instead, remember these R commands for inspecting data: head(), tail(), and str().
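If your team works in Python on the JupyterLab container instead of R, the sketch below is one rough, standard-library-only way to get similar quick looks at a large file. It assumes the data arrive as a CSV file; the function names and the example file name are hypothetical. It streams the file row by row instead of loading it all, keeping only the first and last n rows in memory.

```python
import csv
from collections import deque
from itertools import islice


def peek_rows(lines, n=5):
    """Return (header, first n rows, last n rows) from an iterable of CSV lines.

    Rows are consumed one at a time, so only about 2 * n rows are held in memory.
    """
    reader = csv.reader(lines)
    header = next(reader)
    first = list(islice(reader, n))   # analogous to R's head()
    last = deque(first, maxlen=n)     # seeded with the first rows so short files still work
    for row in reader:                # the deque keeps only the final n rows, like tail()
        last.append(row)
    return header, first, list(last)


def peek_csv(path, n=5):
    """Open a CSV file on disk and peek at it without loading the whole file."""
    with open(path, newline="") as f:
        return peek_rows(f, n)


def show_structure(header, rows):
    """Rough stand-in for R's str(): print each column with a few sample values."""
    for i, name in enumerate(header):
        samples = [row[i] for row in rows if i < len(row)]
        print(f"{name}: {samples}")


# Example (the file name is hypothetical; use whatever file is in data-readonly):
# header, first, last = peek_csv("data-readonly/example.csv", n=5)
# show_structure(header, first)
```

Unlike str(), show_structure does not infer column types; every CSV value is read as a string, so converting types is left to you.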
We strongly recommend that you create a small data set to test things on. Then, if your procedure works, you can apply it to the large dataset. Some procedures can take a frustratingly long time to run on large data sets, so it is reassuring to know that your procedure works (because you tested it on a smaller data set) while you wait. We recommend taking a random sample of rows from the original data set, but you might find other approaches useful.
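One way to take a random sample of rows without loading the full file into memory is reservoir sampling (Algorithm R), which picks k rows uniformly at random in a single pass. The Python sketch below assumes a CSV file with a header row. The function names and paths are hypothetical. Since the data folder is named data-readonly, you will presumably need to write the sample somewhere else, such as your home directory.

```python
import csv
import random


def reservoir_sample(rows, k, seed=None):
    """Uniformly sample k items from an iterable of unknown length (Algorithm R).

    Only k items are ever held in memory, so this works on files too large to load.
    """
    rng = random.Random(seed)         # pass a seed to make the sample reproducible
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)        # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)     # inclusive on both ends: i + 1 possible values
            if j < k:
                sample[j] = row       # keep row i with probability k / (i + 1)
    return sample


def sample_csv(in_path, out_path, k, seed=None):
    """Write the header plus a random sample of k data rows to a new CSV file."""
    with open(in_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        sample = reservoir_sample(reader, k, seed)
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(sample)
    return len(sample)


# Example (hypothetical paths):
# sample_csv("data-readonly/example.csv", "example_sample.csv", k=10000, seed=2022)
```

Fixing the seed means everyone on the team can regenerate the same test sample. If the file has fewer than k data rows, all of them are returned.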
Each team will have 4 minutes plus 1 minute of Q&A to present their findings to the judges. That’s exactly 4 minutes, not 4 minutes and a few additional seconds. Each team will be allowed at most three slides. Three! So at some point Saturday night or Sunday morning, you might want to set aside time to think about what you want the judges to know. The 4-minute presentation and 1-minute Q&A time limits will be strictly enforced. All team members must be present for the presentation, but given the time limit, not all team members need to speak.
Along with your presentation you will also turn in a one-page write-up of your project. You can think about this as the text of your presentation. The judges will refer to these during deliberation.
At noon on Sunday all work must stop, and you must upload your presentation and write-up using the upload link at the top of this document. If you are having technical difficulties, come to the info desk and ask a consultant for help. Consultants will be around to help until midnight.
Teams who fail to upload their presentations and write-ups by 12:30pm will not be eligible to have their presentations judged.
The files you’re submitting must be named in the following manner:
Note that you will not have time to log in to or out of your account before your presentation. We don’t want to restrict your creativity, but it is your responsibility to make sure that your presentation works seamlessly before the judging session begins.
Judging will happen in two rounds.
Teams will be randomly assigned to 3 judging sessions (~10 teams per session), where they will present their findings to a panel of two judges. Each judging panel will evaluate and score all teams in its session in three categories (insight, use of outside data/statistical analysis, visualization) and nominate 2-3 teams to continue on to the next round of judging. Panels may also nominate additional teams for honorable mentions or in other categories. These teams will be acknowledged at the awards ceremony but will not compete for awards in the final round.
While participants transition from their first-round judging rooms to Penn Pavilion for the final round, organizers will process the judges’ scoring data. Scores will be normalized to adjust for differences in how individual judges score, and the top 10 teams will be selected to present their work in the final round of judging.
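This page does not say which normalization the organizers use. One common choice, shown here only as a hypothetical illustration, is to convert each judge’s scores to z-scores, z = (x - mean_j) / sd_j, where mean_j and sd_j are that judge’s mean and standard deviation. That way a harsh judge and a lenient judge end up on a comparable scale. The sketch simplifies by treating each score as a single number (for example, a sum over the three categories), and it ignores the nomination step.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_judge(scores):
    """Convert raw scores to per-judge z-scores (hypothetical scheme).

    `scores` is a list of (judge, team, raw_score) tuples. Each judge's scores
    are centered on that judge's mean and divided by that judge's population
    standard deviation.
    """
    by_judge = defaultdict(list)
    for judge, _team, raw in scores:
        by_judge[judge].append(raw)

    stats = {}
    for judge, values in by_judge.items():
        sd = pstdev(values)
        stats[judge] = (mean(values), sd if sd > 0 else 1.0)  # avoid dividing by zero

    return [(judge, team, (raw - stats[judge][0]) / stats[judge][1])
            for judge, team, raw in scores]


def rank_teams(scores, top_n):
    """Average each team's normalized scores and return the top_n team names."""
    per_team = defaultdict(list)
    for _judge, team, z in normalize_by_judge(scores):
        per_team[team].append(z)
    averages = {team: mean(zs) for team, zs in per_team.items()}
    return sorted(averages, key=averages.get, reverse=True)[:top_n]
```

Under this scheme, a 9 from a judge whose scores average 8 counts the same as a 5 from a judge whose scores average 4, provided both judges spread their scores equally.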
First-round presentations and judging will be held at the following eight venues. On Saturday you will receive an email letting you know your room assignment.
If you are unfamiliar with Duke’s campus, you might want to find your assigned room before 1pm on Sunday so that you are not late to your presentation. Helpers will also be on hand to guide students from Penn Pavilion to these locations.
The seven highest-ranked teams among those nominated to proceed to the next round will present their work again to a new panel of judges, giving the same presentation as before. Three of these will be selected as winners in the award categories listed below. The judges also have the option to name a fourth winner as Judges’ Pick.
The final round of judging will take place in Penn Pavilion (the same room as the kickoff).
All are welcome at the presentations and the awards ceremony.
Awards will be given in three categories:
These are listed in no particular order.
The awards ceremony will take place in Penn Pavilion (the same room as the kickoff).
Winners will receive medals and books as well as one-year student memberships to the American Statistical Association. See amstat.org/membership for membership benefits.
Throughout the event we will be giving mini prizes for various challenges. We’ll announce these on the event schedule. You need to be in Penn Pavilion to participate in the challenges.
You can come and go as you please, but all work must be completed on premises.
Do not share the name of the data source publicly or on social media before May 1st. There are many other upcoming DataFests around the country and we want to make sure the dataset remains a surprise for them.
Clicking on the download link for the dataset means that you agree to the following Non-Disclosure Agreement from the data provider. You can freely share your results, presentations, findings, etc. as part of your digital portfolio; however, you are not allowed to share the raw data with anyone outside of DataFest. At the end of DataFest, you must delete all data from thumb drives, hard drives, etc. The data are sensitive.
At all times between 9am and midnight, there will be friendly consultants present. These are faculty, grad students, members of the Research Triangle Analysts Group, or other professionals with field-specific knowledge related to the dataset. They all have different areas of expertise, so if you get stuck on something and one consultant isn’t able to help, ask someone else later. Feel free to ask anything. This is not an exam, but a collaborative competition. Do not expect the consultants to write code or do data management for you. They are there to point you in the right direction, but you’re responsible for getting there on your own.
Social media
Follow us on social media and don’t hesitate to share the fun and thank our sponsors (except for the data provider, which we need to keep a secret for the time being).
We will use these channels for announcements throughout the event as well, so make sure that you’re checking regularly.