Data 624/MDCH 700 L02 - Advanced Exploration and Visualization in Health

Instructor: Zahra Shakeri - Winter 2021

Datathon Description and Instructions

Datathon’s Goals

The main goal of this datathon is to analyze and visualize any (or all) of the provided datasets in a creative and insightful way. Pretend you are a team of data scientists at Health Canada who have been tasked with exploring a few datasets over one week to create as much value as possible for national public health. You are free to formulate and pursue any questions or visualizations you think might be interesting! Feel free to ask Dina for help, or book an appointment with her or me during the week to discuss any data analysis or visualization questions.

Dataset Information

The datasets for this datathon were collected by Statistics Canada and present the mental health status of Canadians on both the illness and positive mental health continuums, by age group and sex. Four of these datasets (health characteristics and the three mental health characteristics datasets) were generated from the Canadian Community Health Survey (CCHS); they break the data down by province and also give total values for Canada. The fifth one (deaths in Canada by cause ….) was collected from the Vital Statistics – Death Database and presents deaths in Canada from mental and behavioral health disorders by ICD-10 code (WHO International Statistical Classification of Diseases and Related Health Problems). The years covered are not necessarily the same for each dataset and are included in the dataset name.

The datasets can be found at Datathons/Datathon #1/Dataset and will be made available at 10:00 am on Wednesday, January 20, 2021.

Instructions for Submission

You are encouraged to discuss your work with other teams and may use online and offline resources. However, in fairness to all teams participating in this datathon, every member of your team should make a substantial, meaningful contribution to your submission. Teams must submit the following materials by the 8:00 pm in-class deadline and the 4:00 pm final deadline. We recommend working on the deliverables continuously from the beginning rather than finishing them all within the last hour; you should begin working on them at least three days before the deadline.

Components of Submission

1. A Low-fidelity Prototype (In-class Submission)

The first session (phase) of this datathon is a working session in which students collaborate to turn the provided datasets into insight. Teams need to develop research questions and preliminary findings, and submit a low-fidelity prototype of their solution to Dropbox/Datathons/Datathon #1/Low-fidelity Prototype.

What is a low-fidelity prototype?

In software engineering, the term prototype refers to a broad range of techniques and tools, from paper to programming. The fidelity of these prototypes, as well as the time and energy required to create them, lives on a spectrum:

One end of the spectrum is characterized by low-fidelity (or lo-fi) prototypes. These include mock-ups quickly sketched on paper or a whiteboard with impressions of what the data might look like, and fast, digital mock-ups that may include some controls for explaining interaction ideas, such as slide jumps in PowerPoint or Keynote.

Lo-fi digital mock-ups can also incorporate charts generated in a tool like Excel or Tableau with fake or sampled data to explore possible visualization representation ideas. These lo-fi prototypes are great for communicating the gist of an idea in an interview, or for recording high-level ideas when planning out how to explore the data. Lo-fi prototypes are, by nature, fast and easy to produce.

Communicating ideas with lo-fi prototypes can rapidly help establish whether the visualization designer is on the same page as the stakeholders.

Reference: Making Data Visual: A Practical Guide to Using Visualization for Insight
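
As a rough illustration of the "fake or sampled data" idea above, the following Python sketch draws a small random sample of rows from CSV text so that a chart can be mocked up quickly. It is only a sketch under stated assumptions: the function name `sample_rows`, the column names, and the values are hypothetical placeholders and do not reflect the actual datathon files.

```python
import csv
import io
import random


def sample_rows(csv_text, k, seed=0):
    """Return the header and up to k randomly chosen data rows from CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    rng = random.Random(seed)  # fixed seed so the mock-up is reproducible
    k = min(k, len(rows))      # never ask for more rows than exist
    return header, rng.sample(rows, k)


# Placeholder data invented purely for illustration (not real survey values).
FAKE_CSV = """province,age_group,sex,value
A,12-17,F,1
A,18-34,M,2
B,12-17,F,3
B,18-34,M,4
C,35-49,F,5
"""

header, sample = sample_rows(FAKE_CSV, k=3)
print(header)
for row in sample:
    print(row)
```

The sampled rows could then be pasted into Excel or Tableau to sketch a chart before committing to the full dataset.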

All teams are required to submit a low-fidelity prototype of their solution via D2L by 8:00 pm, January 20, 2021. A successful submission will include neat, readable hand-drawn or digital (e.g., tablet) sketches that are appropriately labeled and explained. Make sure to state the question you are trying to answer for each sketch.

2. A High-fidelity Prototype in the form of Narrative Visualization

All teams are required to submit and present a narrative Tableau presentation (3 minutes and up to 5 visual components). The goal of the presentation is to walk your TA through how you used the available data to answer the question you came up with. The presentation should include meaningful visualizations and text. What material you include is at your discretion, but the analytical process, visual findings, and conclusion should be clear. In general, the content of the presentation should be a condensed version of the written report.

So that the TA can properly prepare for the teams’ presentations, teams must submit both the written report and the Tableau dashboard used for the presentation by 4:00 pm, January 27, 2021.

3. A Written Report

Teams must write a report that describes the steps taken to answer their proposed question or prompt. There is no set format for the report, but example sections include, but are not limited to, the following:

  • Introduction: What question are you answering with the data, and why is it important?
  • Data Engineering Process: How did you clean and prepare the data, and what data did you use?
  • Analysis: What analytical techniques did you use, and why?
  • Findings: What did you discover (refer to your visualizations)?
  • Conclusion: What can a decision-maker at Health Canada conclude from your team’s work?

At minimum, the report must include the question being answered, the findings (with references to the relevant visualizations), and a conclusion. Your report must not exceed four pages. Please do not include the visualizations themselves in the report; instead, refer to each figure by its title in the Narrative Visualization. It should be clear from the report how you approached your analysis. Please submit your written report as a PDF document. In addition to this PDF file, all code written for this datathon must be submitted (as a .zip file). The programs can be messy, uncommented, in multiple files, etc., and will not be evaluated on their quality. Make sure to put the names of all group members on the first page of your report!

Ensure that all materials are submitted by 4:00 pm, January 27, 2021. Unfortunately, in fairness to all teams, we cannot offer any extensions to the deadline. No late submissions will be accepted.

This datathon is pretty free-form! This is intentional; projects you work on in industry will rarely be very specific. Please feel free to show early results to me or the TA to get some feedback you can use to ensure a successful submission!

Important Dates

Component                Due Time               Where to Submit?
Data Availability        January 20, 10:00 am   Datathons/Datathon #1/Dataset
Low-fidelity Prototype   January 20, 8:00 pm    Dropbox/Datathons/Datathon #1/Low-fidelity Prototype
Narrative Visualization  January 27, 4:00 pm    Dropbox/Datathons/Datathon #1/Presentation
Written Report           January 27, 4:00 pm    Dropbox/Datathons/Datathon #1/Written Report