This page contains information about the episode "Keeping ourselves honest when we work with observational health care data" of the podcast Linear Digressions, produced April 19, 2020.

Article Summary

This episode of Linear Digressions covers the challenges of drawing causal inferences from observational data in health care. The hosts, Ben Jaffe and Katie Malone, first go over the basics of causal inference in medicine: did a medicine lead to a better outcome than the patient would have had without it, or with a different medicine? This inference is difficult with observational data because treatment is not randomly assigned, as it would be in a typical experiment. They then turn to a study in the Harvard Data Science Review titled How Confident are we About Observational Studies in Healthcare. The study compares strategies for analyzing observational data for causal inference, since different choices about how to preprocess and analyze the data can lead to different answers. The study's authors analyzed health care data from 4 different databases and tried each of these analysis choices. To tell whether an analysis method was returning the “right answer”, they looked at medicines that almost certainly do not cause a given outcome and checked that the method did not find a signal where there wasn’t one. In addition, they inserted a synthetic signal into the dataset to see whether the method could find it and estimate its size correctly.
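The two checks described above can be illustrated with a toy simulation. This is a minimal sketch, not the study's actual method: the cohort generator, the probabilities, the single "severity" confounder, the use of a risk difference as the effect measure, and the function names are all assumptions made for illustration. It builds one cohort where the medicine has no effect (a negative control) and one where a known effect has been injected (a synthetic signal), then compares a naive analysis with one that stratifies on the confounder.

```python
import random

def simulate_cohort(n, true_effect, seed):
    """Simulate (severe, treated, outcome) rows with confounding by severity.

    All probabilities here are made-up values for illustration only.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5
        # No randomization: sicker patients are more likely to be treated.
        treated = rng.random() < (0.7 if severe else 0.2)
        # Severity raises outcome risk; the medicine adds `true_effect` on top.
        risk = 0.1 + (0.2 if severe else 0.0) + (true_effect if treated else 0.0)
        outcome = rng.random() < risk
        rows.append((severe, treated, outcome))
    return rows

def risk_difference(rows):
    """Outcome rate among the treated minus outcome rate among the untreated."""
    treated = [outcome for _, t, outcome in rows if t]
    untreated = [outcome for _, t, outcome in rows if not t]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

def stratified_risk_difference(rows):
    """Average the within-stratum risk differences, weighted by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [row for row in rows if row[0] == level]
        total += len(stratum) / len(rows) * risk_difference(stratum)
    return total

# Negative control: the medicine truly has no effect on the outcome.
negative_control = simulate_cohort(100_000, true_effect=0.0, seed=1)
# Synthetic signal: an effect of known size (0.05) is injected.
positive_control = simulate_cohort(100_000, true_effect=0.05, seed=2)

for name, rows in [("negative control", negative_control),
                   ("synthetic signal", positive_control)]:
    print(name,
          "naive:", round(risk_difference(rows), 3),
          "stratified:", round(stratified_risk_difference(rows), 3))
```

In this sketch the naive comparison reports an effect for the negative control even though none exists, because severity drives both treatment and outcome. The stratified estimate should land near zero for the negative control and near the injected 0.05 for the synthetic signal, which is the kind of behavior the study checks for in a real analysis method.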

Keywords

Keyword: Definition
Causal Inference: Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect.
Randomization: A method based on chance alone by which study participants are assigned to a treatment group.
Observational Data: Observational data refers to information gathered without the subject of the research (for example an individual customer, patient, employee, etc.) having to be explicitly involved in recording what they are doing.
Bias: Bias is disproportionate weight in favor of or against an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair.
Signal: Any value obtained by a measurement contains two components: one carries the information of interest, the signal; the other consists of random errors, or noise, superimposed on the first component.

Harvard Study Data

The data below show the mean precision after empirical calibration, across strata, for each of the four databases used in the study.

Database: CCAE

Mean precision after empirical calibration across strata for CCAE database (https://data.ohdsi.org/MethodEvalViewer/)


Database: JMDC

Mean precision after empirical calibration across strata for JMDC database (https://data.ohdsi.org/MethodEvalViewer/)


Database: MDCR

Mean precision after empirical calibration across strata for MDCR database (https://data.ohdsi.org/MethodEvalViewer/)


Database: PanTher

Mean precision after empirical calibration across strata for PanTher database (https://data.ohdsi.org/MethodEvalViewer/)


What do I think?

I think the podcast did a very good job of explaining the purpose of the Harvard Data Science Review study as well as how the authors conducted their research. If I had tried to read the actual study, I think I would have had some difficulty interpreting it. The hosts, Katie and Ben, had a good back and forth and seemed knowledgeable on the topic. In addition, they structured the episode so that they first introduced the listener to the ideas behind causal inference and then dove into the study, which made it an easier listen. I plan to tune into more of their episodes!

Podcast Creators

Katie Malone

  • Education
    • Undergraduate Engineering Physics: Ohio State University
    • PhD Physics: Stanford University
  • Career
    • Director of Data Science: Tempus Labs
    • Data Scientist: Civis Analytics

Ben Jaffe

  • Education
    • Theater Arts and Computer Science: UCSC
  • Career
    • Senior Software Engineer: Netflix
    • UI Engineer: Facebook
    • Sound Technician: Theatreworks

Additional References