class: center, middle, inverse, title-slide

.title[
# Lecture 1. Introduction
]
.subtitle[
## Causal Inference
]
.author[
### Jonathan Platt
]
.institute[
### University of Iowa
]
.date[
### 1/16/24
]

---

# Learning objectives

--

+ Define and describe the importance of causal inference

--

+ Understand the Potential Outcomes framework

--

+ Understand the fundamental problem of CI, and the proxy for the counterfactual

--

+ Learn common notation

---

# What is causal inference?

--

+ Inferring the effects of any treatment, policy, intervention, etc. (effect of x on y)

--

  - Effect of HAART on HIV incidence?

--

  - Effect of coffee on BP?

--

  - Effect of climate change policy on emissions?

--

  - Effect of social media use on adolescent mental health?

---

# Example for intuition

Question: Do seat belts prevent fatalities in car crashes?

_Approach 1_: Based on all car crashes from police reports that recorded seat belt use, compare fatality proportions between seat-belt and non-seat-belt users. Use a `\(\chi^{2}\)` test.

--

_Concerns?_

--

_Approach 2_: Matched case-control study: pick 5000 car crashes involving fatalities. For each fatal crash `\(i\)` (`\(i = 1, \ldots, 5000\)`), search the data set for a non-fatal car crash with the same or similar values of the measured covariates.

--

_Concerns?_

---

# Potential source of unmeasured confounding: Risk Tolerance

--

_Cautious Driver_: More likely to wear a seat belt, drive at lower speeds, allow greater distance from the car ahead, pay attention to road and weather conditions, and not drive if impaired.

--

vs.

_Risky Driver_: More often does not wear a seat belt, drives faster and closer, ignores road conditions, and is willing to drive when impaired (drunk, sleepy, tired, texting, etc.)

--

Some of these variables are unknown: we may know the road conditions, but not whether the driver paid them any attention. We don't know if the driver was sleepy or tired.

Some of these variables may be poorly estimated: e.g.
speed at time of crash, exact weather conditions.

---

# Approach 3: Matching #2

Evans (1986) identified all crashes in the FARS* data set with 2 people in the front seat (one belted, one unbelted) in which exactly one of them died.

*US Fatal Accident Reporting System; `\(n=816\)` crashes

| Driver | Passenger | Driver Belt- /<br>Pass Belt+ | Driver Belt+ /<br>Pass Belt- |
| ------ | ----------- | ---- | ---- |
| Died | Survived | 189 | 153 |
| Survived | Died | 111 | 363 |
| | _Total_ | 300 | 516 |

--

`\(\Pr(\text{Death} \mid \text{No Seatbelt}) = (189 + 363) / 816 = 552 / 816 = 67.6\%\)`

That is, in 67.6% of these crashes the occupant who died was the unbelted one. Belted occupants are more likely to survive.

Advantages: Controls for all factors (measured, measured with uncertainty, unmeasured) that are the same for all car occupants: speed at crash, road traction, condition of driver, driver's reaction time, distance to car ahead, etc.

---

# Approach 3: Stratified risk

Risk of fatality may differ between driver and passenger sides of car. However, look at the strata:

+ Driver not belted, Passenger belted: the unbelted occupant died in `\(189 / 300\ (63.0\%)\)` of crashes.
+ Driver belted, Passenger not belted: the unbelted occupant died in `\(363 / 516\ (70.3\%)\)` of crashes.

Concerns?

---

# Takeaway message

--

Approach 1, as given, will be biased.

--

Approach 2, even with matching or regression adjustment for measured covariates, does not account for unmeasured or mismeasured confounders.

--

Approach 3 controls for everything common to occupants of the same car. (Not common: age of occupants?)

--

Each of these approaches represents a different design for this observational study: each design corresponds to a different analysis and often a different subsample of all available data. Study question and design are crucial.

---

# Experimental vs. Observational Studies

--

In both study designs, the ideal goal is often to estimate a causal effect

--

The main difference is the treatment/exposure assignment mechanism

--

+ Experimental Study: treatment is assigned by the experimenter, with a known probability

--

+ Observational Study: treatment is nonrandom; assigned by _nature_

--

People who:

+ smoke,
+ take BP pills,
+ work at a chemical plant,

are not randomly assigned

---

# Likewise:

+ doctors may advise against coronary artery bypass surgery because of a patient's frailty
+ a factory supervisor assigns heavy lifting tasks to larger and younger employees

---

# RCT and Observational study strengths and weaknesses

RCT Strengths

+ Randomization attempts to balance the distribution of all confounders (both observed and unobserved) across treatment groups
+ It's the _gold standard_ for establishing cause and effect and estimating the average causal effect.

---

# RCT and Observational study strengths and weaknesses

RCT Weaknesses

+ Strict inclusion/exclusion criteria affect the generalizability of RCT results: e.g., subjects with certain comorbidities are excluded
+ Often infeasible
  - unethical (exposure to diesel fumes, radiation)
  - very expensive
  - not high enough priority to fund
  - not possible or practical (e.g. global warming)
+ Takes a long time to answer important questions
+ Many methodological challenges
  - Statistical power; (differential) attrition, treatment non-compliance, unblinding; and measurement error (Concato et al, 2000; Little & Rubin, 2000; Rubin, 2007).
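
---

# Randomization vs. self-selection: a simulation sketch

The claim that randomization balances even unobserved confounders can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the sample size, the 50/50 split of cautious and risky drivers, and the 80% / 30% seat-belt probabilities are hypothetical numbers chosen for illustration, not estimates from any data set.

```python
import random

random.seed(2024)  # fixed seed so the sketch is reproducible

N = 10_000  # hypothetical number of drivers

# U: unmeasured confounder, True = cautious driver, False = risky driver
# (assumed 50/50 split).
cautious = [random.random() < 0.5 for _ in range(N)]

# Experimental study: treatment assigned by a fair coin, ignoring U.
randomized = [random.random() < 0.5 for _ in range(N)]

# Observational study: drivers choose for themselves; cautious drivers
# are assumed to buckle up more often (80% vs. 30%, both made up).
self_selected = [random.random() < (0.8 if u else 0.3) for u in cautious]


def imbalance(treatment, confounder):
    """Difference in confounder prevalence: treated minus untreated."""
    treated = [u for a, u in zip(treatment, confounder) if a]
    untreated = [u for a, u in zip(treatment, confounder) if not a]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


rand_gap = imbalance(randomized, cautious)
obs_gap = imbalance(self_selected, cautious)

print(f"Randomized assignment:    {rand_gap:+.3f}")
print(f"Self-selected assignment: {obs_gap:+.3f}")
```

Under these assumptions the share of cautious drivers is nearly identical in the two randomized arms, while the self-selected "treated" group contains far more cautious drivers than the "untreated" group. Any outcome difference in the second setting mixes the effect of the seat belt with the effect of driving style.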
---

# RCT and Observational study strengths and weaknesses

Observational Study Strengths:

+ Can consider a larger/more natural source population

--

+ Can study the effect of toxic exposures

--

+ Often faster and cheaper than an RCT (when both are an option)

---

# RCT and Observational study strengths and weaknesses

Observational Study Weaknesses

--

Internal validity

--

+ Treatment and control groups differ on factors that can confound the treatment-outcome relationship (lack of randomization); unmeasured confounding

--

+ Selection bias
  - Unclear source population
  - Attrition

--

+ Measurement error (quality often not as good as RCT data); even simple things like BP

---

# RCT and Observational study strengths and weaknesses

+ Observational study results are used as justification for an RCT.

--

+ Post RCT, a long-term observational study can look for unexpected side effects. (Phase IV study)

--

Success stories: much of what we know about risk factors for certain diseases derives from observational studies (e.g., Framingham Heart Study, Women's Health Initiative)

--

We often use RCT thinking to guide observational study design and analysis

+ Target trial emulation

---

We all intuitively understand causation and have been introduced to causal thinking in previous courses.

The purpose of this course is to build on that foundation, to learn the **theory**, **study designs**, and **analyses** to make causal claims and estimate causal effects.

---

# What is a causal effect?

Rubin, 1980

+ _“Intuitively, the causal effect of one treatment E over another C for a particular unit and an interval of time from t1 to t2 is the difference between what would have happened at time t2 if the unit had been exposed to E initiated at t1 and what would have happened at t2 if the unit had been exposed to C initiated at t1.”_

--

Hernán, 2004

+ _"We compare … the outcome when an action A is taken with the outcome when the action A is withheld.
If the two outcomes differ, we say that the action A has a causal effect."_

---

# Some notation

--

+ Capital letters represent random variables
  - Variables that may have different values for different individuals

--

+ Lower case letters denote particular values of a random variable

--

+ Our notation for this course (following Hernán and Robins)
  + _Consider a dichotomous treatment variable `\(A\)` (1: treated, 0: untreated) and a dichotomous outcome variable `\(Y\)` (1: death, 0: survival)_

--

  - `\(Y^a\)` denotes the outcome under the hypothetical treatment value `\(a\)`

--

  - `\(Y^{a=1}\)` the outcome variable that would have been observed under the treatment value `\(a = 1\)`

--

  - `\(Y^{a=0}\)` the outcome variable that would have been observed under the treatment value `\(a = 0\)`

--

Notation varies widely, even within the same discipline

+ `\(Y^{a,t,x...=1} = Y_{a=1} = Y_1 = Y^1\)`

---

# Individual Causal Effects

+ The subscript `\(i\)` denotes an individual in our sample of size `\(n\)`, for `\(i = 1, 2, 3, \ldots, n\)`

--

Each individual has a set of two potential outcomes, one for each treatment condition: `\(Y_i^{a=1}\)` and `\(Y_i^{a=0}\)`

--

The treatment `\(A\)` has a causal effect on `\(Y\)` for individual `\(i\)` when `\(Y_i^{a=1} \neq Y_i^{a=0}\)`

--

i.e., `\(Pr[Y_i^{a=1}]\)` and `\(Pr[Y_i^{a=0}]\)`

---

# Individual Causal Effects (view of the Gods)

| | `\(Y^{a=0}\)` | `\(Y^{a=1}\)` | Causal Effect? | | `\(Y^{a=0}\)` | `\(Y^{a=1}\)` | Causal Effect? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Rheia | 0 | 1 | | Leto | 0 | 1 | |
| Kronos | 1 | 0 | | Ares | 1 | 1 | |
| Demeter | 0 | 0 | | Athena | 1 | 1 | |
| Hades | 0 | 0 | | Hephaestus | 0 | 1 | |
| Hestia | 0 | 0 | | Aphrodite | 0 | 1 | |
| Poseidon | 1 | 0 | | Cyclope | 0 | 1 | |
| Hera | 0 | 0 | | Persephone | 1 | 1 | |
| Zeus | 0 | 1 | | Hermes | 1 | 0 | |
| Artemis | 1 | 1 | | Hebe | 1 | 0 | |
| Apollo | 1 | 0 | | Dionysus | 1 | 0 | |

---

# Average Causal Effect

--

`\(Pr[Y^{a=1}=1]= 10/20 = 0.5\)`

`\(Pr[Y^{a=0}=1]= 10/20 = 0.5\)`

`\(E[Y^{a=1}]=E[Y^{a=0}]\)`

+ `\(E[Y^{a}] = Pr[Y^{a}=1]\)` for dichotomous outcomes
+ For continuous outcomes, the expectation is defined through the CDF of `\(Y^{a}\)`

Note: the absence of an average causal effect does not imply the absence of individual causal effects ( _see the "sharp causal null hypothesis"_ )

---

# The **Fundamental problem**

To get an individual causal effect, we need three pieces of info

--

1. an outcome of interest

--

2. the actions `\(a = 1\)` and `\(a = 0\)` to be compared

--

3. the individual counterfactual outcomes `\(Y_i^{a=0}\)` and `\(Y_i^{a=1}\)` to be compared

--

But we are always missing data for #3: each individual's outcome is observed under only one of the two actions.

---

# View of the researcher

| | `\(Y^{a=0}\)` | `\(Y^{a=1}\)` | Causal Effect? | | `\(Y^{a=0}\)` | `\(Y^{a=1}\)` | Causal Effect? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Rheia | 0 | ? | | Leto | 0 | ? | |
| Kronos | 1 | ? | | Ares | ? | 1 | |
| Demeter | 0 | ? | | Athena | ? | 1 | |
| Hades | 0 | ? | | Hephaestus | ? | 1 | |
| Hestia | ? | 0 | | Aphrodite | ? | 1 | |
| Poseidon | ? | 0 | | Cyclope | ? | 1 | |
| Hera | ? | 0 | | Persephone | ? | 1 | |
| Zeus | ? | 1 | | Hermes | ? | 0 | |
| Artemis | 1 | ? | | Hebe | ? | 0 | |
| Apollo | 1 | ? | | Dionysus | ? | 0 | |

---

# Causal Effects vs. Associations

1. We can never estimate individual causal effects

--

2. In this scenario, we can estimate an _association_ between `\(A\)` and `\(Y\)`

--

`\(Pr[Y=1|A=1]= 7/13 = 0.54\)`

`\(Pr[Y=1|A=0]= 3/7 = 0.43\)`

`\(RD = 0.54 - 0.43 = 0.11\)`

`\(RR = 0.54 / 0.43 = 1.26\)`

---

# Causation vs. Association

+ "What would be the risk if everybody had been treated?"

vs.

+ "What is the risk in the treated?" and "What is the risk in the untreated?"

---

# What's the difference?

--

Comparing treated vs. untreated, any differences in `\(Y\)` that we see cannot necessarily be attributed to `\(A\)`

--

+ e.g., Rheia or Hades

| | `\(Y^{a=0}\)` | `\(Y^{a=1}\)` | Causal Effect? |
| --- | --- | --- | --- |
| **Rheia** | 0 | 1 | |
| **Hades** | 0 | 0 | |

_In whom did A cause Y?_

--

| | `\(Y^{a=0}\)` | `\(Y^{a=1}\)` | Causal Effect? |
| --- | --- | --- | --- |
| **Rheia** | 0 | ? | |
| **Hades** | 0 | ? | |

---

We can estimate average causal effects (ACE), provided we meet the assumptions needed to identify and estimate the missing counterfactual risk.

+ Identifying the most valid proxy for the counterfactual is the main purpose of causal inference

---

# Effect Measures

+ Can define RD, RR, OR analogously
+ Interpretation will vary widely
  - `\(Pr[Y^{a=1}=1]= 3/1000\)`
  - `\(Pr[Y^{a=0}=1]= 1/1000\)`
  - RR = 3; RD = 0.002

_What's the goal of the inference?_

---

# Random variability

We are always concerned with sampling variability; however, for the majority of this course we will ignore random error and assume:

--

+ We have large samples, such that `\(\hat{Pr}[Y^{a=0}=1] = Pr[Y^{a=0}=1]\)`

--

+ _Consistent estimators_: the larger the sample, the smaller the difference between `\(\hat{Pr}[Y^{a=0}=1]\)` and `\(Pr[Y^{a=0}=1]\)`

---

# Learning objectives

+ Define and describe the importance of causal inference
+ Understand the Potential Outcomes framework
+ Understand the fundamental problem of CI, and the proxy for the counterfactual
+ Learn common notation
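
---

# Worked example: association vs. causation for the 20 gods

The contrast between the "view of the Gods" and the "view of the researcher" can be checked numerically. The sketch below copies the potential outcomes from the first table and reads treatment status off the second table (an individual is treated when the observed entry is in the `\(Y^{a=1}\)` column). The variable and function names are illustrative choices, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# (Y^{a=0}, Y^{a=1}) for each individual, copied from the "view of the Gods" table.
potential_outcomes = {
    "Rheia": (0, 1), "Kronos": (1, 0), "Demeter": (0, 0), "Hades": (0, 0),
    "Hestia": (0, 0), "Poseidon": (1, 0), "Hera": (0, 0), "Zeus": (0, 1),
    "Artemis": (1, 1), "Apollo": (1, 0), "Leto": (0, 1), "Ares": (1, 1),
    "Athena": (1, 1), "Hephaestus": (0, 1), "Aphrodite": (0, 1),
    "Cyclope": (0, 1), "Persephone": (1, 1), "Hermes": (1, 0),
    "Hebe": (1, 0), "Dionysus": (1, 0),
}

# Individuals who actually received treatment (A = 1), read off the
# "view of the researcher" table.
treated = {
    "Hestia", "Poseidon", "Hera", "Zeus", "Ares", "Athena", "Hephaestus",
    "Aphrodite", "Cyclope", "Persephone", "Hermes", "Hebe", "Dionysus",
}


def risk(values):
    """Exact proportion of 1s among 0/1 outcomes."""
    values = list(values)
    return Fraction(sum(values), len(values))


# Counterfactual risks: everyone treated vs. everyone untreated.
risk_a1 = risk(y1 for _, y1 in potential_outcomes.values())
risk_a0 = risk(y0 for y0, _ in potential_outcomes.values())
causal_rd = risk_a1 - risk_a0
causal_rr = risk_a1 / risk_a0

# Number of individuals whose two potential outcomes differ.
n_with_effect = sum(y0 != y1 for y0, y1 in potential_outcomes.values())

# Observed outcome: Y^{a=1} if treated, otherwise Y^{a=0}.
observed = {
    name: (y1 if name in treated else y0)
    for name, (y0, y1) in potential_outcomes.items()
}

# Associational risks: risk in the treated vs. risk in the untreated.
risk_treated = risk(y for name, y in observed.items() if name in treated)
risk_untreated = risk(y for name, y in observed.items() if name not in treated)
assoc_rd = risk_treated - risk_untreated
assoc_rr = risk_treated / risk_untreated

print(f"Causal:      RD = {float(causal_rd):.2f}, RR = {float(causal_rr):.2f}")
print(f"Association: RD = {float(assoc_rd):.2f}, RR = {float(assoc_rr):.2f}")
print(f"Individuals with a causal effect: {n_with_effect} of {len(potential_outcomes)}")
```

The causal risk difference is exactly 0 (both counterfactual risks are 10/20), yet 12 of the 20 individuals have an individual causal effect, and the observed association is RD = 10/91 ≈ 0.11 and RR = 49/39 ≈ 1.26. In this table, association is not causation.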