2/7/2020

Economics, Causality and Analytics

  • A new econometrics course
  • Designed to go at the start of the sequence
  • Focusing on two things: an introduction to programming, and causal inference/research design

Resources

Focus

Notable exclusions:

  • Standard errors
  • Regression
  • Hypothesis testing
  • Robustness tests

Why?

  • Provide an introduction to programming separately
  • Emphasize the why of econometrics
  • Demystify regression
  • Encourage an understanding of research design, and prepare students to use econometrics

Today

  • Establish the goals of the course
  • Talk about the way I structure the course when I teach it
  • Walk through expectations in the programming part
  • Econometrics without regression
  • Concepts underlying the causal inference part
  • Causal inference without regression

Goals

  • Students should be able to manipulate data in R and do basic summarizing calculations
  • Understand the concept of a DGP
  • Be able to draw and work with causal diagrams
  • Be familiar with standard causal inference research designs

How I Do It

Everyone can do it their own way, of course. What I do:

  • Weekly homework (27%)
  • Two writing assignments: “Causal Inference in the News” and “Research Design” (10% ea.)
  • Programming midterm, causal inference midterm, final (12%/12%/20%)
  • Attendance (9%)

Logistics

  • R done through Rstudio.cloud
  • Causal inference homeworks turned in on Titanium
  • Computer lab
  • Plenty of provided materials
  • Data almost entirely comes from packages; occasionally I provide a file at a URL to load directly

Writing Assignments

“Causal Inference in the News”

  • Find a news article that makes a causal claim
  • Model that claim
  • Evaluate whether, based on the work described in the article, the claim is adequately identified

Writing Assignments

“Research Design”

  • Come up with a research topic of a causal nature
  • Model that topic
  • Describe how it can be identified
  • And what method would be best used to identify the effect (in a particular setting), and why

Questions

Kinds of homework and exam questions?

  • Programming: create these variables, do this data manipulation, get means within bins, make this plot, choose appropriate summary statistics, interpret summary statistics
  • Causal inference: Draw a causal diagram, critique a diagram or application of a method, select controls to identify an effect, implement a causal inference method in R, for a description of a setting select a causal inference method

Programming

Programming

The first ~half of the class (a little less) is spent learning to use R, with a focus on working close to the data. I won’t teach R in this workshop and will assume you attended the first workshop. So class goals:

  • Load data
  • Manipulate data with dplyr and the pipe %>%
  • Generate summary statistics and graphs
  • Use a for loop and create fake data to perform a Monte Carlo (!)

Why These?

  • Can’t do anything without working with data
  • Causal inference part of the course will avoid regression
  • Lowers number of packages to just tidyverse and stargazer, plus wherever you get data
  • The Monte Carlo is instrumental in understanding the DGP later. Plus you should learn how to do a for loop.

Loading Data

  • They get off pretty easy on this one!
  • Most data sets come from within packages. Try data() and see what pops up. Ecdat is great, as are wooldridge and AER
  • Or, upload the data somewhere nice, and give them a URL to use with read.csv
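
For example, loading a provided file from a URL is one line (the address below is purely hypothetical):

df <- read.csv('https://example.com/classdata.csv')
head(df)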

dplyr

  • Much of the programming content of the course revolves around dplyr
  • Key verbs:
  • mutate(), select(), filter(), arrange()
  • And especially: group_by() with mutate() or summarize() ← how we do a lot of our causal inference
  • Other relevant data-work functions: cut(), var(), mean(), quantile(), stargazer(), pull(), sample(), rnorm(), runif()

Me-We-You

Common programming teaching approach is “I do it, we do it, you do it”

Example:

  • In tidyverse, get the midwest data. Let’s look at summary stats.
  • We’ll be using cut(breaks = 5) with group_by() and mutate() to get the proportion of variance of percbelowpoverty explained by popdensity (me), percollege (us), and perchsd (you)

Answers

library(tidyverse)
data(midwest)

midwest <- midwest %>%
  mutate(den_cuts = cut(popdensity, breaks = 5)) %>%
  group_by(den_cuts) %>%
  mutate(pov.r = percbelowpoverty - mean(percbelowpoverty)) %>%
  ungroup()

1 - var(midwest$pov.r)/var(midwest$percbelowpoverty)
## [1] 0.01590158

Using the pipe

The pipe %>% takes the thing on the left and makes it the first argument of the thing on the right. Students have difficulty with nested parentheses and this HELPS.

Me-we-you:

  • In midwest, get the mean of popdensity in inmetro areas
  • In midwest, create a new variable equal to 1 if popdensity is above its median, then get median percollege by above/below median, sorted to be below then above
  • In midwest, keep just the variables for percollege and popdensity. Just among observations with popdensity above its 75th percentile, what is the mean percollege?

Answers

data("midwest")

midwest %>% filter(inmetro == 1) %>% pull(popdensity) %>% mean()

midwest %>% 
  mutate(abovemed = popdensity > median(popdensity)) %>%
  group_by(abovemed) %>%
  summarize(perc = median(percollege)) %>%
  arrange(abovemed)

midwest %>%
  select(percollege, popdensity) %>%
  filter(popdensity > quantile(popdensity, .75)) %>%
  pull(percollege) %>%
  mean()

Explained by and Residuals

  • Much of causal inference is going to be looking at how one variable “explains” another
  • Despite the occasional use of cor(), in general we look at the relationship between two variables by taking means within bins
  • This lets us do the analysis very close to the data
  • Steps: mutate(x.cut = cut(x, breaks = ...)) for any continuous explanatory variable, group_by(x.cut), and either mutate(y = mean(y)) to get a new variable of explained values, or mutate(y = y - mean(y)) for residuals, or summarize(y = mean(y)) to get aggregate means-within-bins (see the sketch below).
  • ungroup()!
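
A minimal sketch of that recipe, reusing the midwest data purely for illustration:

library(tidyverse)
data(midwest)

midwest <- midwest %>%
  # Bin the continuous explanatory variable
  mutate(dens.cut = cut(popdensity, breaks = 5)) %>%
  group_by(dens.cut) %>%
  # Explained values and residuals of the outcome, within bins
  mutate(pov.exp = mean(percbelowpoverty),
         pov.res = percbelowpoverty - mean(percbelowpoverty)) %>%
  ungroup()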

Summary stats and graphing

  • Discussion of different summary statistics and when they are used
  • Summary statistics are ways of summarizing the distribution of a variable
  • Start with a geom_density() graph, overlay summary stats to demonstrate (a rough sketch is below)
  • Focus on mean, percentiles
  • stargazer() in the stargazer package for a table of summary stats
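
A rough sketch of that demo (my own version, not the exact class code):

library(tidyverse)
library(stargazer)
data(midwest)

# Density of the variable, with the mean and a few percentiles overlaid
ggplot(midwest, aes(x = percollege)) +
  geom_density() +
  geom_vline(xintercept = mean(midwest$percollege), color = 'red') +
  geom_vline(xintercept = quantile(midwest$percollege, c(.1, .5, .9)),
             linetype = 'dashed') +
  labs(x = 'Percent with College Degree')

# stargazer wants a data.frame, not a tibble
stargazer(as.data.frame(midwest), type = 'text')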

Graphing

  • ggplot in ggplot2
  • Occasionally add on a second plot to contrast two things
  • Don’t worry about theming. We do scatterplot, density, line graph, maybe add labs(), that’s it.

Graphing - As complex as it gets

Potential homework assignment:

  • Take the midwest data in the tidyverse
  • Take the log() of popdensity and create a new variable: lpop
  • Use cut() with breaks = 5 to cut lpop and make a new variable den_cuts
  • Create pov.exp, which is percbelowpoverty explained by den_cuts
  • Graph percbelowpoverty (y-axis) against lpop (x-axis) on a scatterplot. Add a second scatterplot on top which has pov.exp as the y-axis and is in red. Label axes appropriately.

Answer

data(midwest)
midwest <- midwest %>%
  mutate(lpop = log(popdensity)) %>%
  mutate(den_cuts = cut(lpop, breaks = 5)) %>%
  group_by(den_cuts) %>%
  mutate(pov.exp = mean(percbelowpoverty)) %>%
  ungroup()
ggplot(midwest, aes(x = lpop, y = percbelowpoverty)) + geom_point() + 
  geom_point(aes(x = lpop, y = pov.exp), color = 'red') + 
  labs(x = "Log Population Density", y = "Percent Below Poverty")

Monte Carlo

  • We focus a lot on the data-generating process, and how we can recover that DGP from observed data
  • So it makes a lot of sense to use a known DGP and try to uncover that
  • Allows us to make causal relationships explicit

Example

store_results <- c()
for (i in 1:500) {
  # w -> x and w -> y and x -> y
  df <- tibble(w = rnorm(1000)) %>%
    mutate(x = rnorm(1000) + 2*w > 0) %>%
    mutate(y = x - 3*w + rnorm(1000))
  # Get effect of x on y
  analysis <- df %>%
    group_by(x) %>%
    summarize(y = mean(y))
  effect <- analysis$y[2] - analysis$y[1]
  store_results[i] <- effect
}
# True effect is +1. But what do we estimate?
mean(store_results)
## [1] -3.27781

Causal Inference

Causal Inference

Goals:

  • Understand a DGP and the difference between “true model” and estimate
  • Understand how to write a DGP as a causal diagram
  • Understand how to use a diagram to select a minimum necessary set of controls
  • Understand basic toolbox methods (controls, FE, DID, RDD, IV) - perform them in our way using R, understand their representative DAGs, when they apply and don’t

DGP and Identification

  • Every analysis identifies something
  • But is that something the thing you want?
  • We need to think about the DGP to do this

Example Slides from My Class

  • We say that X causes Y if…
  • were we to intervene and change the value of X without changing anything else…
  • then Y would also change as a result

Some examples (EXAMPLE)

Examples of causal relationships!

Some obvious:

  • A light switch being set to on causes the light to be on
  • Setting off fireworks raises the noise level

Some less obvious:

  • Getting a college degree increases your earnings
  • Tariffs reduce the amount of trade

Some examples (EXAMPLE)

Examples of non-zero correlations that are not causal (or may be causal in the wrong direction!)

Some obvious:

  • People tend to wear shorts on days when ice cream trucks are out
  • Rooster crowing sounds are followed closely by sunrise

Some less obvious:

  • Colds tend to clear up a few days after you take Emergen-C
  • The performance of the economy tends to be lower or higher depending on the president’s political party

Important Note (EXAMPLE)

  • “X causes Y” doesn’t mean that X is necessarily the only thing that causes Y
  • And it doesn’t mean that all Y must be X
  • For example, using a light switch causes the light to go on
  • But not if the bulb is burned out (no Y, despite X), or if the light was already on (Y without X)
  • But still we’d say that using the switch causes the light! The important thing is that X changes the probability that Y happens, not that it necessarily makes it happen for certain

The Problem of Causal Inference (skip forward, EXAMPLE)

  • The main goal we have in doing causal inference is making as good a guess as possible as to what that Y would have been if X had been different
  • That “would have been” is called a counterfactual - counter to the fact of what actually happened
  • In doing so, we want to think about two people/firms/countries that are basically exactly the same except that one has X=0 and one has X=1
  • [continue on to use experiments as intuition]

An Example (EXAMPLE)

  • Let’s cheat again and know how our data is generated! [FOCUS ON DGP]
  • Let’s say that getting X causes Y to increase by 1
  • And let’s run a randomized experiment of who actually gets X
df <- data.frame(Y.without.X = rnorm(1000),X=sample(c(0,1),1000,replace=T)) %>%
  mutate(Y.with.X = Y.without.X + 1) %>%
  #Now assign who actually gets X
  mutate(Observed.Y = ifelse(X==1,Y.with.X,Y.without.X))
df %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
## # A tibble: 2 x 2
##       X      Y
##   <dbl>  <dbl>
## 1     0 0.0450
## 2     1 1.03

Now no randomization (EXAMPLE)

df <- data.frame(Z = runif(10000)) %>% mutate(Y.without.X = rnorm(10000) + Z, Y.with.X = Y.without.X + 1) %>%
  #Now assign who actually gets X
  mutate(X = Z > .7,Observed.Y = ifelse(X==1,Y.with.X,Y.without.X))
df %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
## # A tibble: 2 x 2
##   X         Y
##   <lgl> <dbl>
## 1 FALSE 0.369
## 2 TRUE  1.84
#But if we properly model the process and compare apples to apples...
df2 <- df %>% filter(abs(Z-.7)<.01) %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
c(df2$Y[1], df2$Y[2])
## [1] 0.7084716 1.7060520

So…

  • Focus heavily on the concept of a “true model”, aka the data-generating process
  • Our goal is to uncover that true model
  • Which we can do if we properly design our analysis

Causal Diagrams

  • This leads us to the use of causal diagrams
  • We get a lot of practice drawing them
  • And then solving them by selecting the appropriate set of controls

Important

Some class concepts:

  • “controlling for W” doesn’t mean “add a control W in a regression model.” It means “remove the variation in treatment and outcome explained by W”
  • This could be in a regression, but also could just be subtracting means-within-bins
  • Or matching to construct a sample with no variation in W
  • Or selecting a sample with no variation in W ← this can lead us to control for colliders by accident! (“selection bias”; see the sketch below)
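
A hypothetical simulation of that last point (mine, not from the class materials): x and y are unrelated, both cause a collider c, and selecting the sample on c manufactures a relationship between them.

library(tidyverse)

df <- tibble(x = rnorm(5000), y = rnorm(5000)) %>%
  # x -> c <- y
  mutate(c = x + y + rnorm(5000))

# No relationship in the full sample...
cor(df$x, df$y)

# ...but selecting on the collider opens the path and creates one
df %>% filter(c > 1) %>% summarize(cor = cor(x, y))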

Causal Diagrams Intuition

I won’t go over causal diagrams, and will assume you attended the last workshop. But some helpful pedagogical tips:

  • Someone leaves their house. But how did they get out? Did they come out the front door, as we want, or the back door? Or the chimney? Only way to be sure is to block the back doors so there’s only one way to get out
  • The relationship between X and Y contains the front-door causal associations we want, but also the back-door associations we don’t

Building Diagrams

  • Students should get comfortable building these diagrams
  • This is a basic version of getting them ready to model the real world generally, a skill likely to come up even in non-metrics classes

Building Diagram Steps (EXAMPLE)

  1. Consider all the variables that are likely to be important in the data generating process (this includes variables you can’t observe)
  2. For simplicity, combine them together or prune the ones least likely to be important
  3. Consider which variables are likely to affect which other variables and draw arrows from one to the other
  4. (Bonus: Test some implications of the model to see if you have the right one)

So let’s do it! (EXAMPLE)

  • Let’s start with an econometrics classic: what is the causal effect of an additional year of education on earnings?
  • That is, if we reached in and made someone get one more year of education than they already did, how much more money would they earn?

With a diagram, ask…

  • Is this diagram complete? What important variables or arrows may be left out?
  • Assuming the diagram is true, what needs to be controlled for to identify the effect?
  • Can we actually control for those things? Or are some variables unmeasured, or poorly measured?

Dagitty

  • Students can easily use dagitty.net to draw diagrams
  • Screenshot and crop to turn in with homework
  • Be aware it will give them the necessary-controls “answer”, so it can sometimes make writing homework problems difficult (a rough R equivalent is sketched below)
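
If you’d rather stay in R, the dagitty package can run the same queries. A rough sketch with my own simplified education/earnings diagram (the variable choices are illustrative, not the class’s):

library(dagitty)

g <- dagitty('dag {
  Education -> Earnings
  Ability -> Education
  Ability -> Earnings
  Background -> Education
  Background -> Earnings
}')

# The minimal sets of controls that identify Education -> Earnings
adjustmentSets(g, exposure = 'Education', outcome = 'Earnings')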

The Toolbox

The Toolbox

  • Okay, so we’ve got the concept of identification
  • And we’ve figured out conceptually how controlling works to get us there (and watch out for colliders!)
  • But how do we do it without regression?
  • And how can we do common designs that aren’t just controlling?

Controlling

  • To control for W in Y~X, get means of Y and X within categories/bins of W, subtract them out
  • Remove the part of the X/Y relationship explained by W

In Code

data(mtcars)

# Relationship between hp and wt, controlling for disp
mtcars <- mtcars %>%
  mutate(disp_bins = cut(disp, breaks = 3)) %>%
  group_by(disp_bins) %>%
  mutate(hp.res = hp - mean(hp),
         wt.res = wt - mean(wt)) %>%
  ungroup()

cor(mtcars$hp, mtcars$wt)
## [1] 0.6587479
cor(mtcars$hp.res, mtcars$wt.res)
## [1] 0.1641105

Graphically

Matching

  • I’m considering dropping this since it doesn’t show up much in later metrics classes
  • Although it’s common in data science!
  • The easiest to do by hand is coarsened exact matching, which IMO is a better tool than PSM anyway (despite PSM’s ubiquity)
  • Conceptually, anything that picks controls with similar covariates to treated obs

Matching

library(Ecdat); data(Wages)
Wages <- Wages %>% mutate(ed.coarse = cut(ed,breaks=3),
                        exp.coarse = cut(exp,breaks=3))
#Split up the treated and untreated
union <- Wages %>% filter(union=='yes')
nonunion <- Wages %>% filter(union=='no') %>%
  #For every potential complete-match, let's get the average Y
  group_by(ed.coarse,exp.coarse,bluecol,
           ind,south,smsa,married,sex,black) %>%
  summarize(untreated.lwage = mean(lwage))
results <- union %>% inner_join(nonunion) %>%
  summarize(union.mean = mean(lwage),nonunion.mean=mean(untreated.lwage))
results
##   union.mean nonunion.mean
## 1   6.687606      6.571178

Fixed Effects

  • Fixed effects is super easy
  • We’re just back to regular old controlling
  • Just now we control for identity!
  • Be sure to emphasize what back doors this closes (anything fixed within individual) and what it doesn’t (anything changing over time within individual)

Fixed Effects

  • Effect of GDP per capita on life expectancy - control for country (rough sketch below), but FE won’t get war!
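
A rough sketch of that example in our means-within-bins style, assuming the gapminder data from the gapminder package (my choice of dataset, not necessarily the class’s):

library(tidyverse)
library(gapminder)
data(gapminder)

fe <- gapminder %>%
  group_by(country) %>%
  # Remove everything explained by country identity
  mutate(lifeExp.res = lifeExp - mean(lifeExp),
         gdp.res = gdpPercap - mean(gdpPercap)) %>%
  ungroup()

# Raw relationship vs. the within-country relationship
cor(gapminder$gdpPercap, gapminder$lifeExp)
cor(fe$gdp.res, fe$lifeExp.res)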

Fixed Effects

Difference-in-Difference

  • We have before and after of a treatment for a treated group
  • Time is on a back door!
  • But we can’t control for it because that would eliminate all treatment difference
  • So we need a control group to contrast the before/after difference with

Difference-in-Difference

Difference-in-Difference

#Create our data
diddata <- tibble(year = sample(2002:2010,10000,replace=T),
                  group = sample(c('TreatedGroup','UntreatedGroup'),10000,replace=T)) %>%
  mutate(after = (year >= 2007)) %>%
  #Only let the treatment be applied to the treated group
  mutate(D = after*(group=='TreatedGroup')) %>%
  mutate(Y = 2*D + .5*year + rnorm(10000))
#Now, get before-after differences for both groups
means <- diddata %>% group_by(group,after) %>% summarize(Y=mean(Y))
#Before-after difference for untreated, has time effect only
bef.aft.untreated <- means$Y[4] - means$Y[3]
#Before-after for treated, has time and treatment effect
bef.aft.treated <- means$Y[2] - means$Y[1]
#Difference-in-Difference! Take the Time + Treatment effect, and remove the Time effect
DID <- bef.aft.treated - bef.aft.untreated
DID
## [1] 2.002812

Difference-in-Difference

Regression Discontinuity

  • Without regression?
  • All we lose is the ability to parametrically refine our prediction of near-the-cutoff values using away-from-cutoff values
  • So, limit to area around cutoff, take difference of means. Easy!
  • Conceptually, once we control for the running variable (by limiting to the area around the cutoff), being above the cutoff is an IV for treatment (no actual IV yet, so discuss this conceptually)

Regression Discontinuity

Regression Discontinuity

rdd.data <- tibble(test = runif(1000)*100) %>%
  mutate(GATE = test >= 75) %>% 
  mutate(earn = runif(1000)*40+10*GATE+test/2)
#Choose a "bandwidth" of how wide around the cutoff to look 
#(arbitrary in our example)
#Bandwidth of 2 with a cutoff of 75 means we look from 75-2 to 75+2
bandwidth <- 2
#Just look within the bandwidth
rdd <- rdd.data %>% filter(abs(75-test) < bandwidth) %>%
  #Create a variable indicating we're above the cutoff
  mutate(above = test >= 75) %>%
  #And compare our outcome just below the cutoff to just above
  group_by(above) %>% summarize(earn = mean(earn))
#Our effect looks just about right (10 is the truth)
rdd$earn[2] - rdd$earn[1]
## [1] 14.09107

Regression Discontinuity

Instrumental Variables

  • IV sets up very well as a DAG. You need all paths from Z to Y to go through X, after closing back doors with controls
  • Warn that there actually needs to be a relationship between Z and X (the weak instrument problem)
  • And that finding a realistic IV can be hard
  • Motivate the whole thing with imperfect random assignment ← intuitive
  • The opposite of a control. Instead of explaining X and Y with Z and throwing that part out, ONLY KEEP that part!

Instrumental Variables

Instrumental Variables

Instrumental Variables

  • Either use group_by() and summarize() for a Wald estimator with a binary instrument
  • Or do the same with a binned instrument, but use cor() for results with a continuous instrument
df <- tibble(R = sample(c(0,1),500,replace=T)) %>%
  #We tell them whether or not to get treated
  mutate(X = R) %>%
  #But some of them don't listen! 20% do the OPPOSITE!
  mutate(X = ifelse(runif(500) > .8,1-R,R)) %>%
  mutate(Y = 5*X + rnorm(500))

Instrumental Variables

Actually calculate it now:

iv <- df %>% group_by(R) %>% summarize(Y = mean(Y), X = mean(X))
iv
## # A tibble: 2 x 3
##       R     Y     X
##   <dbl> <dbl> <dbl>
## 1     0 0.828 0.160
## 2     1 3.94  0.779
#Remember, since our instrument is binary, we want the slope
(iv$Y[2] - iv$Y[1])/(iv$X[2]-iv$X[1])
## [1] 5.03231

Instrumental Variables - Continuous IV

library(AER)
data(CigarettesSW)

CigarettesSW <- CigarettesSW %>%
  mutate(cigtax = taxs-tax) %>%
  mutate(price = price/cpi,
         cigtax = cigtax/cpi) %>%
  group_by(cut(cigtax,breaks=7)) %>%
  summarize(priceexp = mean(price),
         packsexp = mean(packs)) %>%
  ungroup()

cor(CigarettesSW$priceexp,CigarettesSW$packsexp)
## [1] -0.9711096

Instrumental Variables

So!

ECON 305

  • This has covered the goals and approaches taken in ECON 305
  • I’m still revising my class, and obviously there’s more than one way to do all of this
  • But this should give you a good start and hopefully make you comfortable with the idea of teaching it
  • Keep in mind all the available resources! All my slides are available, plus workshop materials and much more, on my website nickchk.com

Thanks!