2/7/2020

Economics, Causality and Analytics

  • A new econometrics course
  • Designed to go at the start of the sequence
  • Focusing on two things: an introduction to programming, and causal inference/research design

Resources

Focus

Notable exclusions:

  • Standard errors
  • Regression
  • Hypothesis testing
  • Robustness tests

Why?

  • Provide an introduction to programming separately
  • Emphasize the why of econometrics
  • Demystify regression
  • Encourage an understanding of research design, and prepare students to use econometrics

Today

  • Establish the goals of the course
  • Talk about the way I structure the course when I teach it
  • Walk through expectations in the programming part
  • Econometrics without regression
  • Concepts underlying the causal inference part
  • Causal inference without regression

Goals

  • Students should be able to manipulate data in R and do basic summarizing calculations
  • Understand the concept of a DGP
  • Be able to draw and work with causal diagrams
  • Be familiar with standard causal inference research designs

How I Do It

Everyone can do it their own way, of course. What I do:

  • Weekly homework (27%)
  • Two writing assignments: “Causal Inference in the News” and “Research Design” (10% ea.)
  • Programming midterm, causal inference midterm, final (12%/12%/20%)
  • Attendance (9%)

Logistics

  • R done through Rstudio.cloud
  • Causal inference homeworks turned in on Titanium
  • Computer lab
  • Plenty of provided materials
  • Data almost entirely comes from packages; occasionally I provide a file at a URL to load directly

Writing Assignments

“Causal Inference in the News”

  • Find a news article that makes a causal claim
  • Model that claim
  • Evaluate whether, based on the work described in the article, the claim is adequately identified

Writing Assignments

“Research Design”

  • Come up with a research topic of a causal nature
  • Model that topic
  • Describe how it can be identified
  • And what method would be best used to identify the effect (in a particular setting), and why

Questions

Kinds of homework and exam questions?

  • Programming: create these variables, do this data manipulation, get means within bins, make this plot, choose appropriate summary statistics, interpret summary statistics
  • Causal inference: Draw a causal diagram, critique a diagram or application of a method, select controls to identify an effect, implement a causal inference method in R, for a description of a setting select a causal inference method

Programming

Programming

The first ~half of the class (a little less) is spent learning to use R, with a focus on working close to the data. I won’t teach R in this workshop and will assume you attended the first workshop. So class goals:

  • Load data
  • Manipulate data with dplyr and the pipe %>%
  • Generate summary statistics and graphs
  • Use a for loop and create fake data to perform a Monte Carlo (!)

Why These?

  • Can’t do anything without working with data
  • Causal inference part of the course will avoid regression
  • Lowers number of packages to just tidyverse and stargazer, plus wherever you get data
  • The Monte Carlo is instrumental in understanding the DGP later. Plus you should learn how to do a for loop.

Loading Data

  • They get off pretty easy on this one!
  • Most data sets come from within packages. Try data() and see what pops up. Ecdat is great, as are wooldridge and AER
  • Or, upload the data somewhere nice, and give them a URL to use with read.csv
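
For example, loading a provided file from a URL is one line (the address below is purely hypothetical):

df <- read.csv('https://example.com/classdata.csv')
head(df)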

dplyr

  • Much of the programming content of the course revolves around dplyr
  • Key verbs:
  • mutate(), select(), filter(), arrange()
  • And especially: group_by() with mutate() or summarize() ← how we do a lot of our causal inference
  • Other relevant data-work functions: cut(), var(), mean(), quantile(), stargazer(), pull(), sample(), rnorm(), runif()

Me-We-You

Common programming teaching approach is “I do it, we do it, you do it”

Example:

  • In tidyverse, get the midwest data. Let’s look at summary stats.
  • We’ll be using cut(breaks = 5) with group_by() and mutate() to get the proportion of variance of percbelowpoverty explained by popdensity (me), percollege (us), and perchsd (you)

Answers

library(tidyverse)
data(midwest)

midwest <- midwest %>%
  mutate(den_cuts = cut(popdensity, breaks = 5)) %>%
  group_by(den_cuts) %>%
  mutate(pov.r = percbelowpoverty - mean(percbelowpoverty)) %>%
  ungroup()

1 - var(midwest$pov.r)/var(midwest$percbelowpoverty)
## [1] 0.01590158

Using the pipe

The pipe %>% takes the thing on the left and makes it the first argument of the thing on the right. Students have difficulty with nested parentheses and this HELPS.

Me-we-you:

  • In midwest, get the mean of popdensity in inmetro areas
  • In midwest, create a new variable equal to 1 if popdensity is above its median, then get median percollege by above/below median, sorted to be below then above
  • In midwest, keep just the variables for percollege and popdensity. Just among observations with popdensity above its 75th percentile, what is the mean percollege?

Answers

data("midwest")

midwest %>% filter(inmetro == 1) %>% pull(popdensity) %>% mean()

midwest %>% 
  mutate(abovemed = popdensity > median(popdensity)) %>%
  group_by(abovemed) %>%
  summarize(perc = median(percollege)) %>%
  arrange(abovemed)

midwest %>%
  select(percollege, popdensity) %>%
  filter(popdensity > quantile(popdensity, .75)) %>%
  pull(percollege) %>%
  mean()

Explained by and Residuals

  • Much of causal inference is going to be looking at how one variable “explains” another
  • Despite the occasional use of cor(), in general we look at the relationship between two variables by taking means within bins
  • This lets us do the analysis very close to the data
  • Steps: mutate(x.cut = cut(x, breaks = ...)) for any continuous explanatory variable, group_by(x.cut), and either mutate(y = mean(y)) to get a new variable of explained values, or mutate(y = y - mean(y)) for residuals, or summarize(y = mean(y)) to get aggregate means-within-bins (see the sketch below).
  • ungroup()!
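
A minimal sketch of that recipe, reusing the midwest data purely for illustration:

library(tidyverse)
data(midwest)

midwest <- midwest %>%
  # Bin the continuous explanatory variable
  mutate(dens.cut = cut(popdensity, breaks = 5)) %>%
  group_by(dens.cut) %>%
  # Explained values and residuals of the outcome, within bins
  mutate(pov.exp = mean(percbelowpoverty),
         pov.res = percbelowpoverty - mean(percbelowpoverty)) %>%
  ungroup()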

Summary stats and graphing

  • Discussion of different summary statistics and when they are used
  • Summary statistics are ways of summarizing the distribution of a variable
  • Start with a geom_density() graph, overlay summary stats to demonstrate (a rough sketch is below)
  • Focus on mean, percentiles
  • stargazer() in the stargazer package for a table of summary stats
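
A rough sketch of that demo (my own version, not the exact class code):

library(tidyverse)
library(stargazer)
data(midwest)

# Density of the variable, with the mean and a few percentiles overlaid
ggplot(midwest, aes(x = percollege)) +
  geom_density() +
  geom_vline(xintercept = mean(midwest$percollege), color = 'red') +
  geom_vline(xintercept = quantile(midwest$percollege, c(.1, .5, .9)),
             linetype = 'dashed') +
  labs(x = 'Percent with College Degree')

# stargazer wants a data.frame, not a tibble
stargazer(as.data.frame(midwest), type = 'text')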

Graphing

  • ggplot in ggplot2
  • Occasionally add on a second plot to contrast two things
  • Don’t worry about theming. We do scatterplot, density, line graph, maybe add labs(), that’s it.

Graphing - As complex as it gets

Potential homework assignment:

  • Take the midwest data in the tidyverse
  • Take the log() of popdensity and create a new variable: lpop
  • Use cut() with breaks = 5 to cut lpop and make a new variable den_cuts
  • Create pov.exp, which is percbelowpoverty explained by den_cuts
  • Graph percbelowpoverty (y-axis) against lpop (x-axis) on a scatterplot. Add a second scatterplot on top which has pov.exp as the y-axis and is in red. Label axes appropriately.

Answer

data(midwest)
midwest <- midwest %>%
  mutate(lpop = log(popdensity)) %>%
  mutate(den_cuts = cut(lpop, breaks = 5)) %>%
  group_by(den_cuts) %>%
  mutate(pov.exp = mean(percbelowpoverty)) %>%
  ungroup()
ggplot(midwest, aes(x = lpop, y = percbelowpoverty)) + geom_point() + 
  geom_point(aes(x = lpop, y = pov.exp), color = 'red') + 
  labs(x = "Log Population Density", y = "Percent Below Poverty")

Monte Carlo

  • We focus a lot on the data-generating process, and how we can recover that DGP from observed data
  • So it makes a lot of sense to use a known DGP and try to uncover that
  • Allows us to make causal relationships explicit

Example

store_results <- c()
for (i in 1:500) {
  # w -> x and w -> y and x -> y
  df <- tibble(w = rnorm(1000)) %>%
    mutate(x = rnorm(1000) + 2*w > 0) %>%
    mutate(y = x - 3*w + rnorm(1000))
  # Get effect of x on y
  analysis <- df %>%
    group_by(x) %>%
    summarize(y = mean(y))
  effect <- analysis$y[2] - analysis$y[1]
  store_results[i] <- effect
}
# True effect is +1. But what do we estimate?
mean(store_results)
## [1] -3.27781

Causal Inference

Causal Inference

Goals:

  • Understand a DGP and the difference between “true model” and estimate
  • Understand how to write a DGP as a causal diagram
  • Understand how to use a diagram to select a minimum necessary set of controls
  • Understand basic toolbox methods (controls, FE, DID, RDD, IV) - perform them in our way using R, understand their representative DAGs, when they apply and don’t

DGP and Identification

  • Every analysis identifies something
  • But is that something the thing you want?
  • We need to think about the DGP to do this

Example Slides from My Class

  • We say that X causes Y if…
  • were we to intervene and change the value of X without changing anything else…
  • then Y would also change as a result

Some examples (EXAMPLE)

Examples of causal relationships!

Some obvious:

  • A light switch being set to on causes the light to be on
  • Setting off fireworks raises the noise level

Some less obvious:

  • Getting a college degree increases your earnings
  • Tariffs reduce the amount of trade

Some examples (EXAMPLE)

Examples of non-zero correlations that are not causal (or may be causal in the wrong direction!)

Some obvious:

  • People tend to wear shorts on days when ice cream trucks are out
  • Rooster crowing sounds are followed closely by sunrise

Some less obvious:

  • Colds tend to clear up a few days after you take Emergen-C
  • The performance of the economy tends to be lower or higher depending on the president’s political party

Important Note (EXAMPLE)

  • “X causes Y” doesn’t mean that X is necessarily the only thing that causes Y
  • And it doesn’t mean that all Y must be X
  • For example, using a light switch causes the light to go on
  • But not if the bulb is burned out (no Y, despite X), or if the light was already on (Y without X)
  • But still we’d say that using the switch causes the light! The important thing is that X changes the probability that Y happens, not that it necessarily makes it happen for certain

The Problem of Causal Inference (skip forward, EXAMPLE)

  • The main goal we have in doing causal inference is making as good a guess as possible as to what that Y would have been if X had been different
  • That “would have been” is called a counterfactual - counter to the fact of what actually happened
  • In doing so, we want to think about two people/firms/countries that are basically exactly the same except that one has X=0 and one has X=1
  • [continue on to use experiments as intuition]

An Example (EXAMPLE)

  • Let’s cheat again and know how our data is generated! [FOCUS ON DGP]
  • Let’s say that getting X causes Y to increase by 1
  • And let’s run a randomized experiment of who actually gets X
df <- data.frame(Y.without.X = rnorm(1000),X=sample(c(0,1),1000,replace=T)) %>%
  mutate(Y.with.X = Y.without.X + 1) %>%
  #Now assign who actually gets X
  mutate(Observed.Y = ifelse(X==1,Y.with.X,Y.without.X))
df %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
## # A tibble: 2 x 2
##       X      Y
##   <dbl>  <dbl>
## 1     0 0.0450
## 2     1 1.03

Now no randomization (EXAMPLE)

df <- data.frame(Z = runif(10000)) %>% mutate(Y.without.X = rnorm(10000) + Z, Y.with.X = Y.without.X + 1) %>%
  #Now assign who actually gets X
  mutate(X = Z > .7,Observed.Y = ifelse(X==1,Y.with.X,Y.without.X))
df %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
## # A tibble: 2 x 2
##   X         Y
##   <lgl> <dbl>
## 1 FALSE 0.369
## 2 TRUE  1.84
#But if we properly model the process and compare apples to apples...
df2 <- df %>% filter(abs(Z-.7)<.01) %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
c(df2$Y[1], df2$Y[2])
## [1] 0.7084716 1.7060520

So…

  • Focus heavily on the concept of a “true model”, aka the data-generating process
  • Our goal is to uncover that true model
  • Which we can do if we properly design our analysis

Causal Diagrams

  • This leads us to the use of causal diagrams
  • We get a lot of practice drawing them
  • And then solving them by selecting the appropriate set of controls

Important

Some class concepts:

  • “controlling for W” doesn’t mean “add a control W in a regression model.” It means “remove the variation in treatment and outcome explained by W”
  • This could be in a regression, but also could just be subtracting means-within-bins
  • Or matching to construct a sample with no variation in W
  • Or selecting a sample with no variation in W ← this can lead us to control for colliders by accident! (“selection bias”; see the sketch below)
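
A hypothetical simulation of that last point (mine, not from the class materials): x and y are unrelated, both cause a collider c, and selecting the sample on c manufactures a relationship between them.

library(tidyverse)

df <- tibble(x = rnorm(5000), y = rnorm(5000)) %>%
  # x -> c <- y
  mutate(c = x + y + rnorm(5000))

# No relationship in the full sample...
cor(df$x, df$y)

# ...but selecting on the collider opens the path and creates one
df %>% filter(c > 1) %>% summarize(cor = cor(x, y))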

Causal Diagrams Intuition

I won’t go over causal diagrams, and will assume you attended the last workshop. But some helpful pedagogical tips:

  • Someone leaves their house. But how did they get out? Did they come out the front door, as we want, or the back door? Or the chimney? Only way to be sure is to block the back doors so there’s only one way to get out
  • The relationship between X and Y contains the front-door causal associations we want, but also the back-door associations we don’t

Building Diagrams

  • Students should get comfortable building these diagrams
  • This is a basic version of getting them ready to model the real world generally, a skill likely to come up even in non-metrics classes

Building Diagram Steps (EXAMPLE)

  1. Consider all the variables that are likely to be important in the data generating process (this includes variables you can’t observe)
  2. For simplicity, combine them together or prune the ones least likely to be important
  3. Consider which variables are likely to affect which other variables and draw arrows from one to the other
  4. (Bonus: Test some implications of the model to see if you have the right one)

So let’s do it! (EXAMPLE)

  • Let’s start with an econometrics classic: what is the causal effect of an additional year of education on earnings?
  • That is, if we reached in and made someone get one more year of education than they already did, how much more money would they earn?

With a diagram, ask…

  • Is this diagram complete? What important variables or arrows may be left out?
  • Assuming the diagram is true, what needs to be controlled for to identify the effect?
  • Can we actually control for those things? Or are some variables unmeasured, or poorly measured?

Dagitty

  • Students can easily use dagitty.net to draw diagrams
  • Screenshot and crop to turn in with homework
  • Be aware it will give them the necessary-controls “answer”, so it can sometimes make writing homework problems difficult (a rough R equivalent is sketched below)
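
If you’d rather stay in R, the dagitty package can run the same queries. A rough sketch with my own simplified education/earnings diagram (the variable choices are illustrative, not the class’s):

library(dagitty)

g <- dagitty('dag {
  Education -> Earnings
  Ability -> Education
  Ability -> Earnings
  Background -> Education
  Background -> Earnings
}')

# The minimal sets of controls that identify Education -> Earnings
adjustmentSets(g, exposure = 'Education', outcome = 'Earnings')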

The Toolbox

The Toolbox

  • Okay, so we’ve got the concept of identification
  • And we’ve figured out conceptually how controlling works to get us there (and watch out for colliders!)
  • But how do we do it without regression?
  • And how can we do common designs that aren’t just controlling?

Controlling

  • To control for W in Y~X, get means of Y and X within categories/bins of W, subtract them out
  • Remove the part of the X/Y relationship explained by W

In Code

data(mtcars)

# Relationship between hp and wt, controlling for disp
mtcars <- mtcars %>%
  mutate(disp_bins = cut(disp, breaks = 3)) %>%
  group_by(disp_bins) %>%
  mutate(hp.res = hp - mean(hp),
         wt.res = wt - mean(wt)) %>%
  ungroup()

cor(mtcars$hp, mtcars$wt)
## [1] 0.6587479
cor(mtcars$hp.res, mtcars$wt.res)
## [1] 0.1641105

Graphically

Matching

  • I’m considering dropping this since it doesn’t show up much in later metrics classes
  • Although it’s common in data science!
  • The easiest to do by hand is coarsened exact matching, which IMO is a better tool than PSM anyway (despite PSM’s ubiquity)
  • Conceptually, anything that picks controls with similar covariates to treated obs

Matching

library(Ecdat); data(Wages)
Wages <- Wages %>% mutate(ed.coarse = cut(ed,breaks=3),
                        exp.coarse = cut(exp,breaks=3))
#Split up the treated and untreated
union <- Wages %>% filter(union=='yes')
nonunion <- Wages %>% filter(union=='no') %>%
  #For every potential complete-match, let's get the average Y
  group_by(ed.coarse,exp.coarse,bluecol,
           ind,south,smsa,married,sex,black) %>%
  summarize(untreated.lwage = mean(lwage))
results <- union %>% inner_join(nonunion) %>%
  summarize(union.mean = mean(lwage),nonunion.mean=mean(untreated.lwage))
results
##   union.mean nonunion.mean
## 1   6.687606      6.571178

Fixed Effects

  • Fixed effects is super easy
  • We’re just back to regular old controlling
  • Just now we control for identity!
  • Be sure to emphasize what back doors this closes (anything fixed within individual) and what it doesn’t (anything changing over time within individual)

Fixed Effects

  • Effect of GDP per capita on life expectancy - control for country (rough sketch below), but FE won’t get war!
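
A rough sketch of that example in our means-within-bins style, assuming the gapminder data from the gapminder package (my choice of dataset, not necessarily the class’s):

library(tidyverse)
library(gapminder)
data(gapminder)

fe <- gapminder %>%
  group_by(country) %>%
  # Remove everything explained by country identity
  mutate(lifeExp.res = lifeExp - mean(lifeExp),
         gdp.res = gdpPercap - mean(gdpPercap)) %>%
  ungroup()

# Raw relationship vs. the within-country relationship
cor(gapminder$gdpPercap, gapminder$lifeExp)
cor(fe$gdp.res, fe$lifeExp.res)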

Fixed Effects

Difference-in-Difference

  • We have before and after of a treatment for a treated group
  • Time is on a back door!
  • But we can’t control for it because that would eliminate all treatment difference
  • So we need a control group to contrast the before/after difference with

Difference-in-Difference

Difference-in-Difference

#Create our data
diddata <- tibble(year = sample(2002:2010,10000,replace=T),
                  group = sample(c('TreatedGroup','UntreatedGroup'),10000,replace=T)) %>%
  mutate(after = (year >= 2007)) %>%
  #Only let the treatment be applied to the treated group
  mutate(D = after*(group=='TreatedGroup')) %>%
  mutate(Y = 2*D + .5*year + rnorm(10000))
#Now, get before-after differences for both groups
means <- diddata %>% group_by(group,after) %>% summarize(Y=mean(Y))
#Before-after difference for untreated, has time effect only
bef.aft.untreated <- means$Y[4] - means$Y[3]
#Before-after for treated, has time and treatment effect
bef.aft.treated <- means$Y[2] - means$Y[1]
#Difference-in-Difference! Take the Time + Treatment effect, and remove the Time effect
DID <- bef.aft.treated - bef.aft.untreated
DID
## [1] 2.002812

Difference-in-Difference

Regression Discontinuity

  • Without regression?
  • All we lose is the ability to parametrically refine our prediction of near-the-cutoff values using away-from-cutoff values
  • So, limit to area around cutoff, take difference of means. Easy!
  • Conceptually, once we control for the running variable (by limiting to the area around the cutoff), being above the cutoff is an IV for treatment (no actual IV yet, so discuss this conceptually)

Regression Discontinuity

Regression Discontinuity

rdd.data <- tibble(test = runif(1000)*100) %>%
  mutate(GATE = test >= 75) %>% 
  mutate(earn = runif(1000)*40+10*GATE+test/2)
#Choose a "bandwidth" of how wide around the cutoff to look 
#(arbitrary in our example)
#Bandwidth of 2 with a cutoff of 75 means we look from 75-2 to 75+2
bandwidth <- 2
#Just look within the bandwidth
rdd <- rdd.data %>% filter(abs(75-test) < bandwidth) %>%
  #Create a variable indicating we're above the cutoff
  mutate(above = test >= 75) %>%
  #And compare our outcome just below the cutoff to just above
  group_by(above) %>% summarize(earn = mean(earn))
#Our effect looks just about right (10 is the truth)
rdd$earn[2] - rdd$earn[1]
## [1] 14.09107

Regression Discontinuity

Instrumental Variables

  • IV sets up very well as a DAG. You need all paths from Z to Y to go through X, after closing back doors with controls
  • Warn that there actually needs to be a relationship between Z and X (the weak instrument problem)
  • And that finding a realistic IV can be hard
  • Motivate the whole thing with imperfect random assignment ← intuitive
  • The opposite of a control. Instead of explaining X and Y with Z and throwing that part out, ONLY KEEP that part!

Instrumental Variables

Instrumental Variables

Instrumental Variables

  • Either use group_by() and summarize() for a Wald estimator with a binary instrument
  • Or do the same with a binned instrument, but use cor() for results with a continuous instrument
df <- tibble(R = sample(c(0,1),500,replace=T)) %>%
  #We tell them whether or not to get treated
  mutate(X = R) %>%
  #But some of them don't listen! 20% do the OPPOSITE!
  mutate(X = ifelse(runif(500) > .8,1-R,R)) %>%
  mutate(Y = 5*X + rnorm(500))

Instrumental Variables

Actually calculate it now:

iv <- df %>% group_by(R) %>% summarize(Y = mean(Y), X = mean(X))
iv
## # A tibble: 2 x 3
##       R     Y     X
##   <dbl> <dbl> <dbl>
## 1     0 0.828 0.160
## 2     1 3.94  0.779
#Remember, since our instrument is binary, we want the slope
(iv$Y[2] - iv$Y[1])/(iv$X[2]-iv$X[1])
## [1] 5.03231

Instrumental Variables - Continuous IV

library(AER)
data(CigarettesSW)

CigarettesSW <- CigarettesSW %>%
  mutate(cigtax = taxs-tax) %>%
  mutate(price = price/cpi,
         cigtax = cigtax/cpi) %>%
  group_by(cut(cigtax,breaks=7)) %>%
  summarize(priceexp = mean(price),
         packsexp = mean(packs)) %>%
  ungroup()

cor(CigarettesSW$priceexp,CigarettesSW$packsexp)
## [1] -0.9711096

Instrumental Variables

So!

ECON 305

  • This has covered the goals and approaches taken in ECON 305
  • I’m still revising my class, and obviously there’s more than one way to do all of this
  • But this should give you a good start and hopefully make you comfortable with the idea of teaching it
  • Keep in mind all the available resources! All my slides are available, plus workshop materials and much more, on my website nickchk.com

Thanks!