January 20, 2026
| Unit | Control | Treatm. | Effect |
|---|---|---|---|
| \(i\) | \(Y_i^C\) | \(Y_i^T\) | \(\delta_i\) |
| 1 | 8 | 9 | 1 |
| 2 | 5 | 3 | -2 |
| 3 | 6 | 4 | -2 |
| 4 | 6 | 2 | -4 |
| 5 | 15 | 18 | 3 |
| 6 | 13 | 16 | 3 |
| 7 | 8 | 9 | 1 |
| 8 | 2 | 0 | -2 |
| 9 | 4 | 3 | -1 |
| 10 | 2 | 0 | -2 |
| Mean | 6.9 | 6.4 | -0.5 |
“The science table”
\(D = T\): Treatment
\(D = C\): Control
\(Y = Y^T\) if \(D = T\)
\(Y = Y^C\) if \(D = C\)
Average causal effect:
\(\bar{\delta} = 6.4 - 6.9 = -0.5\)
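The averages in the science table can be checked with a few lines of Python. This is a minimal sketch that simply re-enters the ten rows above; the variable names (`y_c`, `y_t`, `delta`) are illustrative choices, not part of the slides.

```python
# Potential outcomes copied from the science table (units 1 to 10).
y_c = [8, 5, 6, 6, 15, 13, 8, 2, 4, 2]   # Y_i^C: outcome under control
y_t = [9, 3, 4, 2, 18, 16, 9, 0, 3, 0]   # Y_i^T: outcome under treatment


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Individual causal effects: delta_i = Y_i^T - Y_i^C
delta = [t - c for t, c in zip(y_t, y_c)]

# Average causal effect, computed in two equivalent ways.
ate_from_effects = mean(delta)
ate_from_means = mean(y_t) - mean(y_c)

print(delta)
print(round(ate_from_effects, 2), round(ate_from_means, 2))
```

Both routes give the same number because the mean of differences equals the difference of means.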
| Unit | Status | PO Cont | PO Treat | Observed |
|---|---|---|---|---|
| \(i\) | \(D\) | \(Y_i^C\) | \(Y_i^T\) | \(Y_{obs}\) |
| 1 | T | ? | 9 | 9 |
| 2 | C | 5 | ? | 5 |
| 3 | C | 6 | ? | 6 |
| 4 | C | 6 | ? | 6 |
| 5 | T | ? | 18 | 18 |
| 6 | T | ? | 16 | 16 |
| 7 | T | ? | 9 | 9 |
| 8 | C | 2 | ? | 2 |
| 9 | C | 4 | ? | 4 |
| 10 | C | 2 | ? | 2 |
| Mean | | 4.2 | 13 | |
Observed “effect”:
\(13-4.2=8.8^{**}\)!!
Wrong conclusions:
New treatment adds 9 years of life
If everyone were treated, average life would be 13 years
Random sampling doesn’t help.
For subject \(i\), the causal effect of the treatment is the difference between two outcomes:
\(\delta_{i} = Y_i^T-Y_i^C\) (\(Y_i^T\): \(i\)’s PO in treatment, \(Y_i^C\): \(i\)’s PO in control)
But only one of the two potential outcomes is realised/observed
(Unless Christmas spirits help…)
| Group | \(D\) | \(Y_i^T\) | \(Y_i^C\) |
|---|---|---|---|
| Treatment | T | Observable | Counterfactual |
| Control | C | Counterfactual | Observable |
\(\hat{\delta}_{\text{naive}} = \text{Avrg}(Y_i^{obs} \mid D = T) - \text{Avrg}(Y_i^{obs} \mid D = C)\)
(Observed difference between treatment and control)
| Unit | Status | PO Cont | PO Treat | Observed |
|---|---|---|---|---|
| \(i\) | \(D\) | \(Y_i^C\) | \(Y_i^T\) | \(Y_{obs}\) |
| 1 | T | ? | 9 | 9 |
| 2 | C | 5 | ? | 5 |
| 3 | C | 6 | ? | 6 |
| 4 | C | 6 | ? | 6 |
| 5 | T | ? | 18 | 18 |
| 6 | T | ? | 16 | 16 |
| 7 | T | ? | 9 | 9 |
| 8 | C | 2 | ? | 2 |
| 9 | C | 4 | ? | 4 |
| 10 | C | 2 | ? | 2 |
| Mean | | 4.2 | 13 | |
\(\hat{\delta}_{naive} = \frac{9+18+16+9}{4} - \frac{5+6+6+2+4+2}{6} = 8.8\)
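The naive estimator only needs the observed column and the treatment status. A minimal sketch, re-entering the observed data from the table above (the names `status` and `y_obs` are illustrative):

```python
# Observed data only: treatment status D and the realised outcome Y_obs.
status = ["T", "C", "C", "C", "T", "T", "T", "C", "C", "C"]
y_obs = [9, 5, 6, 6, 18, 16, 9, 2, 4, 2]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Split the observed outcomes by treatment arm.
treated = [y for y, d in zip(y_obs, status) if d == "T"]
control = [y for y, d in zip(y_obs, status) if d == "C"]

# Naive estimate: observed difference between treatment and control.
naive = mean(treated) - mean(control)
print(round(naive, 1))
```

Nothing in this calculation uses the counterfactual cells marked "?", which is why it can be computed from observational data and why it can be badly off.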
Individual Causal / Treatment Effect:
\(\delta_i = Y_i^T - Y_i^C\)
Average Treatment Effect (ATE) for the entire population:
\[ \text{ATE} = \text{Average}(\delta) = E\!\left[Y_i^T - Y_i^C\right] = \color{red}{E[Y_i^T] - E[Y_i^C]} \]
Average Treatment Effect for the Treated (ATT):
\[ \text{ATT} = \text{Average}(\delta \mid D = T) = E(Y_i^T - Y_i^C \mid D = T) = \color{red}{E(Y_i^T \mid D = T) - E(Y_i^C \mid D = T)} \]
Average Treatment Effect for the Controls (ATC) or sometimes ATUT:
\[ \text{ATC} = \text{Average}(\delta \mid D = C) = E(Y_i^T - Y_i^C \mid D = C) = \color{red}{E(Y_i^T \mid D = C) - E(Y_i^C \mid D = C)} \]
| Unit | Status | Control | Treatm. | Effect |
|---|---|---|---|---|
| \(i\) | \(D\) | \(Y_i^C\) | \(Y_i^T\) | \(\delta_i\) |
| 1 | T | 8 | 9 | 1 |
| 2 | C | 5 | 3 | -2 |
| 3 | C | 6 | 4 | -2 |
| 4 | C | 6 | 2 | -4 |
| 5 | T | 15 | 18 | 3 |
| 6 | T | 13 | 16 | 3 |
| 7 | T | 8 | 9 | 1 |
| 8 | C | 2 | 0 | -2 |
| 9 | C | 4 | 3 | -1 |
| 10 | C | 2 | 0 | -2 |
| Mean | | 6.9 | 6.4 | -0.5 |
ATE \(= E[Y_i^T] - E[Y_i^C] = 6.4 - 6.9 = -0.5\)
ATT \(= \text{Avrg}(\delta \mid D = T) = \frac{1 + 3 + 3 + 1}{4} = 2\)
ATC \(= \text{Avrg}(\delta \mid D = C) = \frac{-2 - 2 - 4 - 2 - 1 - 2}{6} = -2.17\)
Beware: these are not readily obtainable in observational data
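With the full science table in hand (which we never have in practice), the three estimands are just subgroup averages of \(\delta_i\). The sketch below re-enters the table and also checks one identity that follows from the definitions: the ATE is the share-weighted average of ATT and ATC. Variable names are illustrative.

```python
# Full science table: status plus BOTH potential outcomes for every unit.
status = ["T", "C", "C", "C", "T", "T", "T", "C", "C", "C"]
y_c = [8, 5, 6, 6, 15, 13, 8, 2, 4, 2]
y_t = [9, 3, 4, 2, 18, 16, 9, 0, 3, 0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Individual effects, then split by observed treatment status.
delta = [t - c for t, c in zip(y_t, y_c)]
effects_treated = [d for d, s in zip(delta, status) if s == "T"]
effects_control = [d for d, s in zip(delta, status) if s == "C"]

ate = mean(delta)
att = mean(effects_treated)
atc = mean(effects_control)

# Share of units in the treatment group (pi in the slides).
pi = len(effects_treated) / len(delta)

# ATE as the weighted average of ATT and ATC.
weighted = pi * att + (1 - pi) * atc

print(round(ate, 2), round(att, 2), round(atc, 2), round(weighted, 2))
```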
Estimand: Average Treatment Effect (ATE)
\[ \begin{align} \widehat{\delta}_{\text{naive}} &= \underbrace{\text{Avrg}(\delta)}_{\text{ATE}} \\ &\quad + \underbrace{\text{Avrg}(Y_i^C \mid D = T) - \text{Avrg}(Y_i^C \mid D = C)}_{\text{Selection (baseline) bias}} \\ &\quad + (1 - \pi) \times \underbrace{\left[ \text{Avrg}(\delta \mid D = T) - \text{Avrg}(\delta \mid D = C) \right]}_{\text{ATT - ATC (differential treatment effect) bias} } \end{align} \]
(\(\pi =\) proportion of the sample in the treatment group. The larger the share of treated units, the smaller the differential treatment effect bias, because the ‘naive’ estimate is then already based more heavily on those who are treated.)
| Unit | Status | Control | Treatm. | Effect |
|---|---|---|---|---|
| \(i\) | \(D\) | \(Y_i^C\) | \(Y_i^T\) | \(\delta_i\) |
| 1 | T | 8 | 9 | 1 |
| 2 | C | 5 | 3 | -2 |
| 3 | C | 6 | 4 | -2 |
| 4 | C | 6 | 2 | -4 |
| 5 | T | 15 | 18 | 3 |
| 6 | T | 13 | 16 | 3 |
| 7 | T | 8 | 9 | 1 |
| 8 | C | 2 | 0 | -2 |
| 9 | C | 4 | 3 | -1 |
| 10 | C | 2 | 0 | -2 |
| Average | | 6.9 | 6.4 | -0.5 |
\[ \begin{align} 8.8 &= -0.5 && (\text{ATE}) \\ &\quad + \frac{44}{4} - \frac{25}{6} && (= 6.8:\ \text{selection (baseline) bias}) \\ &\quad + 0.6 \cdot (2 + 2.17) && (= 2.5:\ \text{ATT--ATC bias}) \end{align} \]
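The three pieces of the decomposition can be recomputed from the science table to confirm that they add up to the naive estimate. A minimal sketch (variable names are illustrative; the data are the ten rows of the running example):

```python
status = ["T", "C", "C", "C", "T", "T", "T", "C", "C", "C"]
y_c = [8, 5, 6, 6, 15, 13, 8, 2, 4, 2]
y_t = [9, 3, 4, 2, 18, 16, 9, 0, 3, 0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def by_arm(values, arm):
    """Keep the entries of `values` whose unit has treatment status `arm`."""
    return [v for v, s in zip(values, status) if s == arm]


delta = [t - c for t, c in zip(y_t, y_c)]
pi = len(by_arm(delta, "T")) / len(delta)

# Naive estimate: observed Y^T among treated minus observed Y^C among controls.
naive = mean(by_arm(y_t, "T")) - mean(by_arm(y_c, "C"))

# The three terms of the decomposition.
ate = mean(delta)
selection_bias = mean(by_arm(y_c, "T")) - mean(by_arm(y_c, "C"))
differential_bias = (1 - pi) * (mean(by_arm(delta, "T")) - mean(by_arm(delta, "C")))

total = ate + selection_bias + differential_bias
print(round(naive, 2), round(ate, 2), round(selection_bias, 2), round(differential_bias, 2))
```

Note that `selection_bias` uses \(Y_i^C\) for treated units, a counterfactual quantity, so this check is only possible because the example reveals both potential outcomes.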
Random assignment has been called the gold standard for causal inference: it guarantees by design that the assumptions needed for causal inference hold.
What is the difference between random assignment and random sampling?
When relying on the Rubin/Neyman Causal Model with potential outcomes, we rely on SUTVA:
SUTVA: Stable Unit Treatment Value Assumption.
An observation’s outcome is not affected by other observations’ assignments.
Example: A randomised controlled trial of immunization may violate SUTVA because immunization also has a group effect.
The naive estimator is biased for our estimand (the ATE) for two reasons: baseline differences (under the control condition) and differential response to the treatment (under the exposure condition).
When exposure is randomised properly, we know that who ends up in each treatment arm has nothing to do with their potential outcomes!
This is why we generally say that randomised experiments are great for internal validity: we can rule out systematic bias in our study sample!
This does not imply that our results are externally valid, i.e., that they apply to people outside our study! We need further assumptions to move from one to another.
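One way to see what randomisation buys us is to enumerate every possible random assignment in the running example. The sketch below assumes complete randomisation with exactly 4 of the 10 units treated (the group sizes of the example; this design choice is an assumption, not something the slides specify) and averages the naive estimate over all such assignments.

```python
from itertools import combinations

# Potential outcomes from the science table.
y_c = [8, 5, 6, 6, 15, 13, 8, 2, 4, 2]
y_t = [9, 3, 4, 2, 18, 16, 9, 0, 3, 0]
n_units = len(y_c)
n_treated = 4  # assumption: same group sizes as in the running example


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


estimates = []
for treated in combinations(range(n_units), n_treated):
    treated_set = set(treated)
    # Under this assignment we observe Y^T for treated units, Y^C for the rest.
    obs_t = [y_t[i] for i in range(n_units) if i in treated_set]
    obs_c = [y_c[i] for i in range(n_units) if i not in treated_set]
    estimates.append(mean(obs_t) - mean(obs_c))

# Average of the naive estimate across all equally likely assignments.
average_estimate = mean(estimates)
print(len(estimates), round(average_estimate, 2), round(min(estimates), 2), round(max(estimates), 2))
```

Any single assignment can still be far from the ATE; what randomisation removes is the systematic part of the error, so the estimates are centred on the ATE.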
Laboratory experiments: Usually conducted with a small sample (of undergraduate psychology students), often involving games on a computer. Helpful for cognitive/behavioral questions.
Field experiments: In order to obtain more externally valid results, experiments conducted in the field (i.e., under real-world conditions) are the way to go. Definitely more expensive though. Audit studies are a particular type of field experiment.
Survey experiments: One can randomize treatment conditions in a survey to evaluate how participants change their responses to a given stimulus. Vignettes and list experiments are examples of this approach.
(Bonus) Quasi-experiments: Researchers usually use the term quasi-experiment for real-world situations that offer as-if random variation in a treatment of interest, for example earthquakes, changes in laws, dates of birth, etc.
How much dispersion (i.e. uncertainty) is in our distribution will be affected by the level at which randomization (i.e. treatment) happens: at the individual or cluster/group level?
The higher the level of aggregation, the greater the uncertainty. So why would we want to randomize at the cluster level?
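The cost of cluster-level randomisation can be illustrated by enumerating all assignments in a tiny made-up example. Everything here is hypothetical: 12 units in 4 clusters of 3, outcomes that are similar within clusters, and no treatment effect at all, so any spread in the difference in means is pure assignment noise.

```python
from itertools import combinations
from statistics import pvariance

# Hypothetical outcomes (made-up numbers): 4 clusters of 3 similar units.
clusters = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
outcomes = [y for cluster in clusters for y in cluster]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def diff_in_means(treated_idx):
    """Difference in mean outcome between treated and untreated units."""
    treated = set(treated_idx)
    t = [y for i, y in enumerate(outcomes) if i in treated]
    c = [y for i, y in enumerate(outcomes) if i not in treated]
    return mean(t) - mean(c)


# Individual-level randomisation: any 6 of the 12 units may be treated.
individual = [diff_in_means(idx) for idx in combinations(range(12), 6)]

# Cluster-level randomisation: any 2 of the 4 clusters are treated as a whole.
cluster_level = []
for chosen in combinations(range(4), 2):
    idx = [3 * c + k for c in chosen for k in range(3)]
    cluster_level.append(diff_in_means(idx))

# Spread of the estimate across all equally likely assignments.
var_individual = pvariance(individual)
var_cluster = pvariance(cluster_level)
print(round(var_individual, 2), round(var_cluster, 2))
```

Both designs are centred on the true effect of zero, but the cluster-level design has far fewer distinct assignments and a visibly larger variance in this toy setting.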
Conditional randomization (i.e., blocking) increases efficiency, when we have variables that are highly predictive of the outcome of interest
One extreme of this is randomization in matched pairs: for each pair of individuals with similar covariates, we randomly assign one to treatment and one to control
Similar to the intuition for stratified random sampling in the context of surveys, blocking may increase precision in experimental design
Precision gains are similar to increasing the sample size
Collect background information on covariates relevant to the outcome
Pre-stratify your sample, then randomise within blocks
This ensures that, with respect to the blocked factors, both treatment arms are identical
It is essentially the same as running a separate experiment in each stratum
For estimation, obtain block-specific effects, and average according to population shares. With \(J\) strata:
\[ \tau_{\text{block}} = \sum_{j=1}^{J} \frac{N_j}{N} \tau_j \]
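The blocked estimator is a weighted average of block-specific differences in means. A minimal sketch with two hypothetical blocks (all outcome values are made up, and the block sizes in the sample stand in for the population shares \(N_j / N\)):

```python
# Hypothetical blocked experiment (made-up numbers): outcomes by block and arm.
blocks = {
    "A": {"treated": [5, 7], "control": [4, 4]},
    "B": {"treated": [10, 12, 14], "control": [9, 9, 9]},
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Total number of units across all blocks (N).
n_total = sum(len(b["treated"]) + len(b["control"]) for b in blocks.values())

tau_by_block = {}
tau_block = 0.0
for name, b in blocks.items():
    n_j = len(b["treated"]) + len(b["control"])      # block size N_j
    tau_j = mean(b["treated"]) - mean(b["control"])  # block-specific effect
    tau_by_block[name] = tau_j
    tau_block += (n_j / n_total) * tau_j             # weight by N_j / N

print(tau_by_block, round(tau_block, 2))
```

If the population shares of the strata are known and differ from the sample shares, they would replace `n_j / n_total` as the weights.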
It may be hard to imagine an experiment that would be relevant for the type of questions we care about.
Some even say that experiments tend to emphasize “small” versus “big” questions, promoting incremental/testable policies.
However, there are examples of experiments addressing big and difficult questions. Can you think of an example or a proposal? (Hint: see https://graemeblair.com/teaching/UCLA_PS200E_Syllabus.pdf)
Researchers tend to formalize their effect of interest as regression coefficients (i.e., their hypotheses are formulated within a statistical model)
Potential outcomes offer a way to formalize what we mean by a causal effect outside any statistical model. Graphical models provide a way to formalize our assumptions without parametric restrictions.
This allows us to clearly separate what we want (a certain estimand), what needs to be true so we get what we want (identifying assumptions), the statistical machinery to transform data into an answer to our question (an estimator), and the particular answer we get (our empirical estimate).
Statistics/ML
Causal Inference
| Estimand | Activity | Field/Discipline | Questions | Example |
|---|---|---|---|---|
| \(\mathbf{P(Y \vert X)}\) | Seeing, Observing | Stats, Machine Learning | What would I believe about Y if I see X? | What is the expected income of a college graduate in a given field? |
| \(\mathbf{P(Y \vert do(x))}\) | Doing, Intervening | Experiments, Policy evaluation | What would happen with Y if I change X? | How would income levels change in response to college expansion? |
| \(\mathbf{P(Y_x \vert x',y')}\) | Imagining, Retrospecting | Structural Models | What would have happened with Y had I done X instead of X’? Why? | What would my parents’ income have been had they graduated from college, given that they didn’t go? |
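The gap between the first two rows, seeing versus doing, can be illustrated with a small simulation. The structural model below is entirely hypothetical (a binary confounder `z` that raises both the chance of `x` and the level of `y`, with a true effect of `x` on `y` set to 2); it is a sketch of the idea, not a model from the slides.

```python
import random

random.seed(0)
n = 20000


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


rows = []
for _ in range(n):
    z = 1 if random.random() < 0.5 else 0                    # confounder
    x = 1 if random.random() < (0.8 if z else 0.2) else 0    # X depends on Z
    noise = random.gauss(0, 1)
    y_if_x0 = 5 * z + noise        # outcome if we SET x = 0
    y_if_x1 = 2 + 5 * z + noise    # outcome if we SET x = 1 (true effect: 2)
    y = y_if_x1 if x == 1 else y_if_x0                       # what we observe
    rows.append((x, y, y_if_x0, y_if_x1))

# Seeing: compare observed Y between units that happen to have x = 1 and x = 0.
seeing = (mean([y for x, y, _, _ in rows if x == 1])
          - mean([y for x, y, _, _ in rows if x == 0]))

# Doing: compare Y when x is set to 1 for everyone versus set to 0 for everyone.
doing = mean([y1 for _, _, _, y1 in rows]) - mean([y0 for _, _, y0, _ in rows])

print(round(seeing, 2), round(doing, 2))
```

The observational contrast mixes the effect of `x` with the effect of `z`, while the interventional contrast recovers the effect built into the simulation.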
Most social science questions are in fact causal
The social sciences are experiencing what some authors have described as the rise of “causal empiricism” (Samii, 2016), a “credibility revolution” (Angrist and Pischke, 2010), or simply a “causal revolution” (Pearl and MacKenzie, 2018)
In artificial intelligence/ML, causality has been deemed “the next frontier” and “the next most important thing”
The enormous progress in the last decades has been facilitated by the development of mathematical frameworks that provide researchers with tools to handle causal questions: Potential Outcomes and the Structural Causal Model
This class is designed as a first course in causal inference, so we focus on essentials:
Familiarize yourself with the most widely used causal inference frameworks
Understand the role of randomisation to tackle causal questions
Use potential outcomes (and the do-operator) to formalize causal estimands
Use directed acyclic graphs (DAGs) to encode qualitative assumptions and derive testable implications
Selection on observables (regression, imputation, matching, weighting, doubly robust methods and flexible estimation using machine learning)
Difference-in-differences, synthetic controls, and extensions
Instrumental variables and regression discontinuity designs
Sensitivity analysis
Lectures: Wed 2-4pm
Exercises: hands-on practices that accompany the lectures (self-study), released weekly
Bi-weekly drop-ins: optional, lecturer and TA present for any questions, Fri 12-1pm weeks 2/4/6/8
Summative assessments: to be released bi-weekly
We are in it together!
Ask questions, engage with the material, immerse yourself
Get confused and frustrated but carry on
Help each other (except summative assessments)
Good methodological skills will empower you as a sociologist!
Climbing the causal ladder can change you (as did Ebenezer Scrooge)
Why Causal Inference?