Data 624 - Advanced Exploration and Visualization in Health

Instructor: Zahra Shakeri - Winter 2021

Datathon Description and Instructions

Datathon’s Goals

For this datathon, you will continue working on your course project and submit a low-fidelity prototype of your analysis results, including early sketches of your visualizations. For the analysis, you will need to add two new questions to your project's objectives, subject to the following constraints:

  • New Question #1: Define a sub-question for one of the main objectives of your project such that answering this question requires applying Unsupervised Information Extraction (e.g., language modeling, topic modeling, TF-IDF).

  • New Question #2: Define a sub-question for one of the main objectives of your project (this can be the same objective you used for New Question #1) such that answering this question requires applying Supervised Information Extraction (e.g., text or quantitative classification).
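As an illustration of the kind of unsupervised extraction New Question #1 refers to, here is a minimal TF-IDF sketch in Python. It assumes plain-text documents (for example, tweet text) and uses only the standard library; the whitespace tokenizer, the example texts, and the function names `tf_idf` and `top_terms` are hypothetical and not part of the course materials. Libraries such as scikit-learn's `TfidfVectorizer` offer a more complete implementation with smoothing and normalization options.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace only. A real pipeline for
    # tweets would also strip punctuation, URLs, mentions, and stop words.
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency (relative count) weighted by inverse document frequency.
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(doc_scores, k=3):
    # Highest-scoring terms first; ties broken alphabetically for stable output.
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical example texts, NOT drawn from the course dataset.
docs = [
    "vaccine appointment booked today",
    "long wait for vaccine appointment",
    "feeling anxious about lockdown today",
]

for i, doc_scores in enumerate(tf_idf(docs)):
    print(i, top_terms(doc_scores))
# 0 ['booked', 'appointment', 'today']
# 1 ['for', 'long', 'wait']
# 2 ['about', 'anxious', 'feeling']
```

A term that appears in every document gets an IDF of log(1) = 0, so TF-IDF favors words that distinguish one document from the rest; this is what makes it useful for surfacing characteristic terms without any labels.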

Please feel free to brainstorm your new questions with your client, or with Dina and me.

Dataset Information

The dataset for this datathon is the same as the one for your course project. The results of your data analysis for this datathon will be submitted as the second deliverable of your course project. Please note that this datathon counts for 30% of the course project grade plus 13% of your final grade (21% of your final grade in total).

The dataset can be found at Course Project/Dataset. Also, for the purpose of supervised analysis of textual data, you may need to use the labelled dataset from the labelling sheet here.

You may find this link to the Twitter API Data Dictionary useful.

Instructions for Submission

You are encouraged to discuss your work with other teams and may use online and offline resources. However, in fairness to all teams participating in this datathon, both members of your team should make significant, meaningful contributions to your submission. Teams must submit the following materials by the 8:00 pm in-class deadline and the 11:59 pm final deadline, respectively. Teams are encouraged to work on the deliverables steadily from the start rather than finishing everything in the last hour; ideally, begin working on them at least seven days before the deadline.

Components of Submission

1. A Low-fidelity Prototype (In-class Submission)

The first session (phase) of this datathon is a working session in which students collaborate to develop ideas for the information extraction task. Teams need to develop their two new questions and a preliminary analysis plan, and submit a low-fidelity prototype of their solution to Dropbox/Datathons/Datathon #3/Low-fidelity Prototype.

All teams must submit a low-fidelity prototype of their new questions and their tentative solutions through D2L by 8:00 pm, February 10, 2020. A successful submission will include neat, readable text as well as hand-drawn or digital sketches that are appropriately labelled and explained. For each sketch, state the question it is meant to answer.

2. A High-fidelity Prototype in the form of Narrative Visualization

All teams must submit a narrative Tableau presentation (minimum of 3 and maximum of 5 visual components) showing the preliminary results of their course project. The goal of the presentation is to update your TA on your progress on the course project. The presentation should include meaningful visualizations and text. What material you include is up to you, but the analytical process, visual findings, and conclusions should be clear. In general, the presentation should be a condensed version of the written report.

This presentation is due on February 24 at 11:59 pm.

3. A Written Report

Teams must write a report describing the steps taken to answer their proposed questions. Document your data analysis process, preliminary results, and visualizations by updating the shared Overleaf file, then submit the PDF output of that file as your Datathon #3 report.

At minimum, the report must include the new questions being answered, your analysis process, findings (including evaluation results) and visualizations, and a brief conclusion (to be updated in later stages of the project). The report must not exceed seven pages. Please do not include visualizations in the report; instead, refer to each figure by its title in the Narrative Visualization. The report should make clear how you approached your analysis. Please submit your written report as a .pdf document. In addition to this PDF and your visualization file, all code written for this datathon must be submitted as a .zip file. The code can be messy, uncommented, or split across multiple files, and will not be evaluated on its quality. Make sure to put the names of all group members on the first page of your report!

Ensure that all materials are submitted by 11:59 pm, February 24. Late submissions will not be accepted.

Please feel free to show early results to me or the TA to get some feedback you can use to ensure a successful submission!

Important Dates

Component                Due Time                Where to Submit
Data Availability        Project’s Dataset       Project/Datasets
Low-fidelity Prototype   February 10, 8:00 pm    Dropbox/Datathons/Datathon #3/Low-fidelity Prototype
Narrative Visualization  February 24, 11:59 pm   Dropbox/Datathons/Datathon #3/Presentation
Written Report           February 24, 11:59 pm   Dropbox/Datathons/Datathon #3/Written Report