Data 624/MDCH 700 L02 – Advanced Exploration and Visualization in Health

Instructor: Zahra Shakeri – Winter 2021

Datathon Description and Instructions

Datathon’s Goals

Food and waterborne infections are caused by eating or drinking contaminated food or beverages, swimming in contaminated water, or close contact with a patient or infected animal. Symptoms are typically gastrointestinal, such as diarrhea and stomach cramps. Although hospitalizations and deaths are rare, certain groups (such as pregnant females, young children, and the elderly) are at increased risk of developing serious complications. Public health surveillance for food and waterborne disease outbreaks provides insight into the effectiveness of regulations and control measures, helps identify new and emerging pathogens, and informs regulations and public awareness activities to promote safe food handling, healthy swimming, and safe drinking water.

The main goal of this Datathon is to analyze and visualize the provided outbreaks dataset in a creative and insightful way. Assume that you are a team of data scientists who have been tasked with exploring a dataset over a week to create as much value as possible for national public health. You are free to formulate and pursue any questions or visualizations you think might be interesting! Feel free to ask Dina or me for help, or book an appointment with either of us during the week for help with any data analysis or visualization questions.

Dataset Information

The dataset for this Datathon was collected by the CDC through the National Outbreak Reporting System (NORS; link). It includes data on foodborne and waterborne disease outbreaks reported by all U.S. states and territories from 1971 to 2018. Data fields include date, U.S. state, primary mode of transmission, causative organism, the numbers of illnesses, hospitalizations, and deaths in each outbreak, and the specific food, water type, or animal involved. Please refer to the “data dictionary” tab of the provided file for full details.
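As a starting point for exploring these fields, the sketch below totals outbreaks, illnesses, hospitalizations, and deaths per year and mode of transmission. It is only an illustration in Python with pandas: the column names (Year, Primary Mode, Illnesses, Hospitalizations, Deaths) are assumptions that may not match the data dictionary, and the four rows are made-up placeholders, not values from the NORS dataset. Loading the provided file (for example with pandas.read_excel) is left out because its name and layout are not stated here.

```python
import pandas as pd

# Hypothetical column names; check the "data dictionary" tab for the real ones.
COUNT_COLUMNS = ["Illnesses", "Hospitalizations", "Deaths"]


def summarize_outbreaks(df: pd.DataFrame) -> pd.DataFrame:
    """Total outbreaks and case counts per year and primary mode of transmission."""
    cleaned = df.copy()
    # Coerce count columns to numbers; entries that cannot be parsed become NaN
    # and are skipped by sum() below.
    cleaned[COUNT_COLUMNS] = cleaned[COUNT_COLUMNS].apply(
        pd.to_numeric, errors="coerce"
    )
    return cleaned.groupby(["Year", "Primary Mode"], as_index=False).agg(
        Outbreaks=("Illnesses", "size"),  # number of rows, including missing counts
        Illnesses=("Illnesses", "sum"),
        Hospitalizations=("Hospitalizations", "sum"),
        Deaths=("Deaths", "sum"),
    )


# Made-up placeholder rows, used only to show the shape of the input.
toy = pd.DataFrame(
    {
        "Year": [2017, 2017, 2018, 2018],
        "Primary Mode": ["Food", "Water", "Food", "Food"],
        "Illnesses": [10, 4, "7", None],
        "Hospitalizations": [1, 0, 2, None],
        "Deaths": [0, 0, 0, None],
    }
)

summary = summarize_outbreaks(toy)
print(summary)
```

A table of this shape can be exported and used as a Tableau data source. Note the design choice that missing counts are skipped rather than treated as zero; whether that is appropriate depends on how the data dictionary defines blank entries.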

The dataset can be found at Datathons/Datathon #4/Datasets, where it will be available from 10:00 am on Wednesday, February 24, 2021.

Instructions for Submission

You are encouraged to discuss your work with other teams and can use online and offline resources. However, in fairness to all teams participating in this Datathon, every member of your team should make a substantial, meaningful contribution to your submission. Teams must submit the low-fidelity prototype by 12:00 pm on February 25 and the remaining materials by the final deadline of 4:00 pm on March 3. It is recommended that teams work on the deliverables continuously from the beginning rather than finishing them all within the last hour. You should begin working on the deliverables at least three days before the deadline.

Components of Submission

1. A Low-fidelity Prototype (In-class Submission)

The first session (phase) of this Datathon is a working session in which students collaborate to turn the provided datasets into insight. Teams need to develop research questions and [preliminary] findings and submit a low-fidelity prototype of their solution to Dropbox/Datathons/Datathon #4/Low-fidelity Prototype.

All teams are required to submit a low-fidelity prototype of their solution via D2L by 12:00 pm, February 25, 2021. A successful submission will include neat and readable hand-drawn or digital (e.g., tablet) sketches that are appropriately labeled and explained. Make sure to state the question you are trying to answer for each sketch.

2. A High-fidelity Prototype in the form of Narrative Visualization

All teams are required to submit and present a narrative Tableau presentation (3 minutes for the presentation, followed by 2 minutes for questions; minimum of 3 and maximum of 5 visual components). The goal of the presentation is to show your TA how you used the available data to answer the question you came up with. The presentation should include meaningful visualizations and text. The material you include is at your discretion, but the analytical process, visual findings, and conclusion should be clear. In general, the content of the presentation should be a condensed version of the written report.

It is required that teams submit both the written report and the Tableau dashboard used for the presentation by 4:00 pm, March 3, 2021.

3. A Written Report

Teams must write a report that describes the steps taken to answer their proposed question or prompt. There is no set format for how the report should be written, but example sections of the report can include, but are not limited to, the following:

  • Introduction: What question are you answering with the data, and why is it important?
  • Data Engineering Process: How did you clean and prepare the data, and what data did you use?
  • Analysis: What analytical techniques did you use, and why?
  • Findings: What did you discover (refer to visualizations)?
  • Conclusion: What can a decision-maker at National Public Health conclude from your team’s work?

At minimum, the report must include the question being answered, the findings with a description of the visualizations, and a conclusion. Your report may not be longer than four pages. Please do not include visualizations in the report; instead, refer to each figure by its title in the Narrative Visualization. It should be clear from the report how you approached your analysis. Please submit your written report as a .pdf document. In addition to this PDF file, all code written for this Datathon needs to be submitted as a .zip file. The code can be messy, uncommented, in multiple files, etc., and will not be evaluated on its quality. Make sure to put the names of all group members on the first page of your report!

Ensure that all materials are submitted by 4:00 pm, March 3rd. Unfortunately, no late submissions will be accepted.

This Datathon is pretty free-form! This is intentional; projects you work on in industry will rarely be very specific. Please feel free to show early results to me or the TA to get some feedback you can use to ensure a successful submission!

Important Dates

Component                 Due Time                 Where to Submit?
Data Availability         February 24, 10:00 am    Datathons/Datathon #4/Datasets
Low-fidelity Prototype    February 25, 12:00 pm    Dropbox/Datathons/Datathon #4/Low-fidelity Prototype
Narrative Visualization   March 3, 4:00 pm         Dropbox/Datathons/Datathon #4/Presentation
Written Report            March 3, 4:00 pm         Dropbox/Datathons/Datathon #4/Written Report