According to McKinsey (via TechEmergence), “Big data and machine learning in pharma and medicine could generate a value of up to $100 billion annually, based on better decision-making, optimized innovation, improved efficiency of research/clinical trials, and new tool creation for physicians, consumers, insurers and regulators.” The main goal of this Datathon is to analyze and visualize the provided drug_review dataset in a creative and insightful way. Drug review helps ensure that scarce health care resources are used to fund the most effective drugs. Clinicians, researchers, payers, and patients all have important but potentially different ideas about what should be considered when determining a drug’s value. To help with this process, assume that you are a team of data scientists tasked with exploring a dataset over one week to create as much value as possible for national public health by analyzing patients’ perspectives on specific drugs.
You are free to formulate and pursue any questions or visualizations you think might be interesting! Feel free to ask Dina for help, or book an appointment with her or me during the week for help with any data analysis or visualization questions.
The dataset for this datathon was collected by Gräßer, Felix et al. [1], and it provides patient reviews on specific drugs, along with related conditions and a 10-star patient rating system reflecting overall patient satisfaction. The data were obtained by crawling online pharmaceutical review sites. This dataset was published in a study on sentiment analysis of drug experience over multiple facets. This data can be housed in various databases, including pharmacy management systems, financial systems, category product systems, and supply chain systems.
[1] Gräßer, Felix, et al. “Aspect-based sentiment analysis of drug reviews applying cross-domain and cross-data learning.” Proceedings of the 2018 International Conference on Digital Health. 2018.
The dataset can be found at Datahons/Datathon #2/Datasets, and it will be provided at 10:00 am on Wednesday, January 27, 2021.
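As a starting point for exploration, the sketch below shows one possible first pass: averaging the 10-star rating per drug while skipping reviews with a missing rating. It is only an illustration under stated assumptions. The sample rows are made up, and the column names (`drug`, `condition`, `rating`) and the helper `mean_rating_by_drug` are hypothetical, so check the actual column names in the provided file before adapting it.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows standing in for the real file. The column names
# (drug, condition, rating) are assumptions, not the dataset's actual schema.
SAMPLE_CSV = """drug,condition,rating
DrugA,Condition X,9
DrugA,Condition X,7
DrugB,Condition X,4
DrugB,Condition Y,6
DrugA,Condition Y,
"""


def mean_rating_by_drug(rows, min_reviews=1):
    """Return {drug: (mean rating, review count)} for drugs with enough rated reviews."""
    ratings = defaultdict(list)
    for row in rows:
        raw = (row.get("rating") or "").strip()
        if not raw:
            continue  # skip reviews with a missing rating
        ratings[row["drug"]].append(float(raw))
    return {
        drug: (mean(values), len(values))
        for drug, values in ratings.items()
        if len(values) >= min_reviews
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = mean_rating_by_drug(rows, min_reviews=2)
for drug, (avg, n) in sorted(summary.items()):
    print(f"{drug}: mean rating {avg:.1f} over {n} reviews")
```

To run this against the real data, you could replace `io.StringIO(SAMPLE_CSV)` with an opened file handle. The `min_reviews` threshold is a design choice that keeps drugs with only a handful of reviews from dominating a ranking.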
You are encouraged to discuss your work with other teams and can use online and offline resources. However, in fairness to all teams participating in this datathon, every member of your team should make a large, meaningful contribution to your submission. Teams must submit the following materials by the 8:00 pm in-class deadline and the 4:00 pm final deadline. It is recommended that teams work on the deliverables continuously from the beginning rather than finishing everything within the last hour. You should begin working on the deliverables at least three days before the deadline.
The first session (phase) of this Datathon is a working session in which students collaborate to turn the provided datasets into insight. Teams need to develop research questions and [preliminary] findings and submit a low-fidelity prototype of their solution to Dropbox/Datathons/Datathon #2/Low-fidelity Prototype.
All teams are required to submit a low-fidelity prototype of their solution through D2L by 8:00 pm, January 27, 2021. A successful submission will include neat and readable hand-drawn or digital (e.g., tablet) sketches that are appropriately labeled and explained. Make sure to state the question you are trying to answer for each sketch.
All of the teams will be required to submit and present a narrative Tableau presentation (3 minutes for the presentation, followed by 2 minutes for questions; minimum of 3 and maximum of 5 visual components). The goal of the presentation is to guide your TA on how you utilized the available data to answer the question you came up with. The presentation should include meaningful visualizations and text. It is up to your discretion as to what kind of material you would like to put in the presentation, but the analytical process, visual findings, and conclusion should be clear. In general, the content in the presentation should be a condensed version of the written report.
In order for the TA to properly prepare teams’ presentations, it is required that teams submit both the written report and the Tableau dashboard used for the presentation by 4:00 pm, February 3, 2021.
Teams must write a report that describes the steps taken to answer their proposed question or prompt. There is no set format for how the report should be written, but example sections of the report can include, but are not limited to, the following:
At minimum, the report must include the question being answered, the findings and a description of the visualizations, and a conclusion. Your report may not be longer than four pages. Please do not include visualizations in the report; instead, you can refer to each figure by its title in the Narrative Visualization. From the report, it should be clear how you approached your analysis. Please submit your written report as a .pdf document. In addition to this PDF file, all code written for this Datathon needs to be submitted (as a .zip file). The code can be messy, uncommented, in multiple files, etc., and will not be evaluated on its quality. Make sure to put the names of all group members on the first page of your report!
Ensure that all materials are submitted by 4:00 pm, February 3, 2021. Unfortunately, no late submissions will be accepted.
This Datathon is pretty free-form! This is intentional; projects you work on in industry will rarely be very specific. Please feel free to show early results to me or the TA to get some feedback you can use to ensure a successful submission!
| Component | Due Time | Where to Submit? |
|---|---|---|
| Data Availability | January 27, 10:00 am | Datahons/Datathon #2/Datasets |
| Low-fidelity Prototype | January 27, 8:00 pm | Dropbox/Datathons/Datathon #2/Low-fidelity Prototype |
| Narrative Visualization | February 3, 4:00 pm | Dropbox/Datathons/Datathon #2/Presentation |
| Written Report | February 3, 4:00 pm | Dropbox/Datathons/Datathon #2/Written Report |