Note: This document may be updated as the event approaches; any major updates will be clearly marked.
Presentation upload link for Sunday: bit.ly/df19-upload (See presentations for details.)
DataFest 2019 @ Duke will start with registration and end with the awards ceremony at Penn Pavilion. See the schedule for details on the important events in between and their locations.
You can see a map of all DataFest locations and suggested parking spots here.
We ask that you carpool with teammates, as parking on campus can be tricky.
Non-Duke students who will be traveling to Duke’s campus for DataFest should plan on driving with their teammates. The closest parking lot is the Bryan Center Parking Garage.
More information on parking at the Bryan Center and associated costs can be found at http://parking.duke.edu/parking/visitor_parking/.
There is a bus that runs directly from UNC to Duke Chapel every half-hour with no stops. Its schedule is here: http://admin.gotransitnc.org/sites/default/files/maps-and-schedules/gotriangle/RoutesAndSchedules-1759.pdf
The last bus back to UNC leaves at 10:30 p.m. on Friday and 11:30 p.m. on Saturday. The first buses on Saturday and Sunday leave UNC at 12:00 p.m. The bus is free for UNC GoPass holders and costs $3.00 otherwise.
If you will be arriving earlier than the earliest bus, or leaving later than the latest bus, and do not have access to a car, see below.
If you do not have access to a car or a carpool, we recommend that you use a service like Uber or Lyft.
We can reimburse you for your ride as long as there are 3+ students in the car with you (we do not have the funds to reimburse individual rides).
UNC students will be reimbursed up to $20/day/team and NC State students will be reimbursed up to $40/day/team.
For reimbursement of parking and transportation, you must give or send your receipt to
by Monday, April 8. The names of the riders must be written on the back of the receipt. No reimbursement will be provided without printed receipts.
The schedule is at https://stat.duke.edu/datafest/#schedule.
Registration opens at 4:00pm on Friday at Penn Pavilion.
You are of course free to come and go as you please throughout the event, but here are the times all team members should plan to be on premises:
Consultants will be available for help until midnight; you can work as late/early as you like. We recommend checking out the Consultant schedule to plan out your weekend.
You will have 24 hour access to Penn Pavilion. Duke students should have their Duke IDs with them. All students will be given DataFest badges that you should wear at all times.
We recommend that every member of the team bring a laptop, if possible. You might find it helpful to have a mix of PCs and Macs, since they have different strengths.
We recommend that you make sure beforehand that the software you will be using throughout the weekend is properly installed and running on your computer. You will be working with a large dataset, so make sure you have enough space for it on your hard drive.
You might want to bring some favorite statistical or computational reference books, if you have them, or bookmark some pages that you routinely refer to.
We will provide meals, snacks, and munchies. Feel free to bring anything additional you might want.
Cloud computing resources for R and Python will be provided. Details to be emailed.
At the end of the kickoff presentation you will be given three options to obtain the data:
You will also be given a link to a Google Doc where you can ask questions about the data and a representative from the provider will answer them periodically throughout the event. Link to be emailed.
The dataset you will be working with is quite large. Typing the name of a variable to print it will take a long time to display, so remember these R commands instead: head() (the first few rows), tail() (the last few rows), and str() (a compact summary of the structure).
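For teams working in Python instead of R, the same "peek, don't print" idea can be sketched with the standard library alone. This is a minimal sketch, assuming the data arrive as a CSV file; the helper names `head_rows` and `tail_rows` are hypothetical. Both stream the input, so at most n rows are ever held in memory. If you use pandas, `DataFrame.head()`, `DataFrame.tail()`, and `DataFrame.info()` play similar roles to R's head(), tail(), and str().

```python
import csv
from collections import deque
from itertools import islice
from typing import Iterable, List


def head_rows(lines: Iterable[str], n: int = 6) -> List[List[str]]:
    """Header plus the first n data rows, like R's head() (default n = 6)."""
    reader = csv.reader(lines)
    # islice stops after n + 1 rows, so the rest of the file is never read
    return list(islice(reader, n + 1))


def tail_rows(lines: Iterable[str], n: int = 6) -> List[List[str]]:
    """Header plus the last n data rows, like R's tail()."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:  # empty input
        return []
    # deque(maxlen=n) keeps only the n most recent rows while streaming,
    # so memory stays small even though every line is read once
    return [header] + list(deque(reader, maxlen=n))
```

Usage: `with open("data.csv", newline="") as f: print(head_rows(f))`, where `data.csv` is a placeholder for the actual file name. Note that `tail_rows` still has to read the whole file once; it only avoids keeping it all in memory.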
We strongly recommend you create a small data set that you can use to test things on. Then, if it works out, you can apply your procedure to the large dataset. Some procedures can take a frustratingly long time to run on large data sets, and so it will be comforting to know that your procedure works (because you tested it on a smaller data set) while you wait. We recommend taking a random sample of rows from the original data set, but there might be other approaches you find useful.
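One way to draw that random sample without first loading the whole file into memory is reservoir sampling (Algorithm R), which keeps a uniformly random sample of k rows while reading the data once. The sketch below uses only the Python standard library; the function name `sample_rows`, the CSV input, and the optional seed are assumptions for illustration, not part of the provided materials. In R, once the data are loaded, `df[sample(nrow(df), k), ]` does the same job.

```python
import csv
import random
from typing import Iterable, List, Optional


def sample_rows(lines: Iterable[str], k: int,
                seed: Optional[int] = None) -> List[List[str]]:
    """Header plus k data rows chosen uniformly at random (Algorithm R).

    Reads the input once and holds at most k data rows in memory.
    If there are fewer than k data rows, all of them are returned.
    """
    rng = random.Random(seed)  # seeded generator so the sample is reproducible
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:  # empty input
        return []
    reservoir: List[List[str]] = []
    for i, row in enumerate(reader):
        if i < k:
            # fill the reservoir with the first k rows
            reservoir.append(row)
        else:
            # row i (0-based) enters with probability k / (i + 1),
            # replacing a uniformly chosen slot
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return [header] + reservoir
```

Usage: `with open("data.csv", newline="") as f: small = sample_rows(f, 10000, seed=1)`, with the file name and sample size as placeholders. Fixing the seed means everyone on the team tests against the same small sample; the sampled rows are not kept in their original order.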
Each team will have 4 minutes to present their findings to the judges, plus 1 minute of Q&A. That’s exactly 4 minutes, not 4 minutes and a few additional seconds. Each team will be allowed at most three slides. Three! So at some point Saturday night or Sunday morning, you might want to set aside time to think about what you want the judges to know. The 4-minute presentation and 1-minute Q&A time limits will be strictly enforced. All team members must be present for the presentation, but not all of them need to speak (given the time limitation).
Along with your presentation you will also turn in a one-page write-up of your project. You can think about this as the text of your presentation. The judges will refer to these during deliberation.
At noon on Sunday all work must stop, and you must upload your presentation and your write-up at the upload link given at the top of this document. If you are having technical difficulties, you can come to the info desk and ask a consultant for help. Consultants will be around to help until midnight.
Teams who fail to upload their presentations and write-ups by 12:30pm will not be eligible to have their presentations judged.
The files you’re submitting must be named in the following manner:
Note that you will not have time to log in to or out of your account before your presentation. We don’t want to restrict your creativity, but it is your responsibility to make sure that your presentation works seamlessly before the judging session begins.
Judging will happen in two rounds:
Teams will be randomly assigned to 7 judging sessions (~10 teams per session), where they will present their findings to a panel of two judges. Each judging panel will evaluate and score all teams in their session on three categories (insight, use of outside data, visualization) and nominate 1-2 teams to continue on to the next round of judging. They will also have the opportunity to nominate additional teams for honorable mentions or in other categories. These teams will be acknowledged at the awards ceremony, but will not be competing for awards in the final round.
While participants transition from their first-round judging rooms to Penn Pavilion for the final round, organizers will process the judges’ scores. Scores will be normalized to adjust for differences in how individual judges score, and the top 10 teams will be selected to present their work in the final round of judging.
First round presentations and judging will be held at the following eight venues. On Saturday you will receive an email letting you know your room assignment.
If you are unfamiliar with Duke’s campus, you might want to check out your room assignment before 1pm on Sunday so that you are not late to your presentation. Helpers will be on hand to guide students from Penn Pavilion to these locations as well.
The 7 highest-ranked teams among those nominated to proceed to the next round will present their work again to a new panel of judges. Teams will give the same presentation as they gave before. Three of these will be selected for the award categories listed below.
The final round of judging will take place in Penn Pavilion (the same room as the kickoff).
All are welcome to attend the presentations and the awards ceremony.
Awards will be given in three categories:
These are listed in no particular order.
The judges also have the option to name a fourth winner as Judges’ Pick.
The awards ceremony will take place in Penn Pavilion (the same room as the kickoff).
Winners will receive medals and books as well as one-year student memberships to the American Statistical Association. See amstat.org/membership for membership benefits.
Throughout the event we will be giving out raffle prizes. Announcements for these will be shared on social media. Follow these channels for a chance to win one of these sweet prizes! To win, you must also be on the premises when the prize is announced.
DataFest is a great recruiting opportunity for many employers, and surely they won’t miss it!
Many of our sponsors will be involved in the event, so take the opportunity to find out more about them.
Most of our consultants come from companies that are recruiting or, at a minimum, want to meet you, so chat with them, find out what they do, and network.
We will collect resumes and share them with some of our sponsors. Participation in the resume book is optional, but highly recommended. You will receive information about this during the event.
You can come and go as you please, but all work must be completed on premises.
Do not share the name of the data source publicly or on social media before May 1st. There are many other upcoming DataFests around the country and we want to make sure the dataset remains a surprise for them.
Clicking on the download link for the dataset means that you agree to the following Non-Disclosure Agreement from the data provider. You can freely share your results, presentations, findings, etc. as part of your digital portfolio; however, you are not allowed to share the raw data with anyone outside of DataFest. At the end of DataFest, you must delete all data from thumb drives, hard drives, etc. The data are sensitive.
At all times between 9 a.m. and midnight, there will be friendly consultants present. These are faculty, grad students, members of the Research Triangle Analysts Group, or other professionals with field-specific knowledge relevant to the dataset. They all have different areas of expertise, so if you get stuck on something and one consultant isn’t able to help, ask someone else later. Feel free to ask anything. This is not an exam but a collaborative competition. Do not expect the consultants to write code for you, do data management, etc. They are there to point you in the right direction, but you’re responsible for getting there on your own.
View the participant summary document for more on the participants.
Social media
Follow us on social media and don’t hesitate to share the fun and thank our sponsors (except for the data provider, which we need to keep a secret for the time being).
We will use these channels for announcements throughout the event as well, so make sure that you’re checking regularly.