Using Difference-in-Differences Models for Environmental Social Science

How Interaction Models can help us work with Data over Time

Introduction

Increasingly, political scientists are using data on multiple cities, countries, or individuals, recorded over time. How do we work with this data? In addition to fixed effects or random effects, which account for stable differences between units or time periods, we might also be interested in change over time. This is the main advantage of Difference-in-Differences, which can be achieved using a simple OLS model with an interaction effect, plus several simple steps for scaling up to more powerful models. This tutorial introduces a few short lines of R code for running a Difference-in-Differences model and producing useful visuals; the same logic applies in SPSS or Stata.

This tutorial will briefly introduce six parts:

  1. Running your models in R with the Tokyo Emissions Dataset

  2. Difference-in-Differences with an interaction effect

  3. Using Fixed Effects in Difference-in-Differences Models

  4. Visualizing Difference-in-Differences Models

  5. Considerations for Collinearity

  6. Conclusion

It’s worth saying that a good difference-in-differences model requires satisfying several additional assumptions, like the parallel trends assumption. I recommend checking out Yiqing Xu’s YouTube lecture series on Causal Inference with Panel Data for more information.



1. Running your models in R with the Tokyo Emissions Dataset

1.1 Load Packages

First, we need to load our packages in R. New to R? That’s great - check out the helpful free online textbook Modern Dive: Statistical Inference via Data Science. It will help set you up to work with modeling in R. Below, we load five packages using the library() function. (If you haven’t installed them yet, just remove the hashtag (“un-comment”) below and run the code to install these packages.)

# install.packages(c("tidyverse", "dataverse", "Zelig", "plm", "texreg"))
library(tidyverse) # for data manipulation
library(dataverse) # for accessing data from the Harvard Dataverse
library(Zelig) # for modeling and convenient visualization
library(plm) # for extra modeling capacity for panel data
library(texreg) # for making model tables

1.2 Import Data

Next, we need to load Fraser and colleagues’ (2020) emissions dataset, which records the amount of carbon emissions in kilotons for each municipality in Japan, including towns, cities, and urban wards, for the years 2005 to 2017. We can download it directly from the Harvard Dataverse using the following code:

# First, let's specify we want to download from this server.
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

# Second, import dataset of emissions from Harvard Dataverse
dat <- get_dataframe_by_name(
  filename = "dataset.tab",
  dataset  = "10.7910/DVN/E8PEW7")
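If you want to double-check that the download worked, a quick peek at the data can help. (This is just an optional sanity check using base R.)

# Optional: how many rows and columns did we download?
dim(dat)
# And what do the first few rows look like?
head(dat)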

1.3 Format Data

Finally, we’re going to format that data, zooming into just cities and wards in the Tokyo area. We will create our outcome of interest, the rate of carbon emissions per 1,000 residents.

We’ll also create a few extra covariates (many are saved in this dataset), including population density.

Also, key to this exercise, we’ll make a counter variable that counts how many years have passed since 2005, the start of our dataset.

We will also make our key treatment variable, treat, indicating which cities exhibit strong government capacity in a given year. We measure this using each city’s financial strength index, a measure of the budget capacity of each city, which shapes the scale of staffing, policies, and initiatives it can fund. To make treat, we classify city-years as above or below the median financial strength index; once a city crosses above the median, we count it as treated for the rest of the period.

# Third, we're going to overwrite that dataset,
# keeping just a subset of cities.
dat <- dat %>%
  # Grab just the cities and wards in Tokyo
  filter(pref == "Tokyo") %>%
  # Calculate carbon emissions in kilotons per 1000 residents
  mutate(emissions = emissions / pop * 1000) %>%
  # Calculate population density
  mutate(pop_density = pop / area_inhabitable) %>%
  # Create an annual counter, going from 0, 1, 2, 3, etc.
  mutate(counter = year - min(year)) %>%
  # Create a binary treatment, where if this city *ever* was a high capacity city, 
  # then count them as treated for the rest of the time-period
  mutate(treat = if_else(fin_str_index > median(fin_str_index), 1, NA_real_)) %>%
  group_by(muni_code) %>%
  fill(treat, .direction = "up") %>%
  mutate(treat = if_else(is.na(treat), 0, treat)) %>%
  ungroup() %>%
  # Ensure year and city code are treated as factors for fixed effects
  mutate(year = factor(year),
         muni_code = factor(muni_code)) %>%
  # Finally, let's keep just a few of the many variables:
  select(
    muni, # name of municipality
    muni_code, # each city's five-digit unique identifier code
    year, # year
    counter, # number of years passed since 2005 (0, 1, 2, ...),
    emissions, # emissions in kilotons per 1,000 residents
    treat, # High government capacity (1/0)
    fin_str_index, # government capacity (numeric)
    # and a bunch of covariates that might also shape emissions, including:
    pop_density, # population density 
    income_per_capita, # income per capita 
    pop_age_65_plus, # % Age 65 or older
    pop_college, # % graduated college
    value_agr_mill, # agricultural output in millions of yen per capita 
    value_manuf_mill, # manufacturing output in millions of yen per capita 
    value_commerce_mill # commercial output in millions of yen per capita 
  )
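Before moving on, it can help to confirm that the treatment and counter variables came out as intended. Here is a minimal check using the dat object we just created:

# How many city-year observations fall into each treatment group?
dat %>%
  count(treat)

# Does the counter run from 0 (2005) to 12 (2017)?
dat %>%
  summarize(first_year = min(counter), last_year = max(counter))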

1.4 Explore Data

In the examples below, we will test whether municipalities with stronger government capacity see lower rates of carbon emissions over time.

# Let's calculate the median emissions for each treatment group 
myavg <- dat %>%
  group_by(year, treat) %>%
  summarize(emissions = median(emissions))

Now, let’s visualize it.

# Let's make a line plot
dat %>%
  # Where the x axis has years, and the y axis shows emissions
  ggplot(mapping = aes(x = year, y = emissions, 
                       # and each city gets its own line
                       group = muni_code,
                       # colored based on whether it has high government capacity
                       color = factor(treat) )) +
  # Let's make transparent lines
  geom_line(alpha = 0.2) +
  # Then let's plot average emissions for treated and untreated cities
  geom_point(data = myavg,
             mapping = aes(x = year, y = emissions, 
                           # Both the treated and untreated groups get their own line (2 total)
                           group = factor(treat),
                           # with colors!
                           color = factor(treat)),
             size = 3) +
  # Then let's make a line of best fit among these averages
  geom_smooth(data = myavg,
              mapping = aes(x = year, y = emissions, 
                            # Both the treated and untreated groups get their own line (2 total)
                            group = factor(treat),
                            # with colors!
                            color = factor(treat)),
              size = 1, method = "lm", se = FALSE) +
  # Let's log-transform the y-axis
  scale_y_log10() +
  # And add some helpful design touches
  theme_classic(base_size = 14) +
  labs(x = "Year", y = "Emissions per 1,000 residents\n(y-axis log-scaled)",
       color = "High\nGovernance\nCapacity (1/0)",
       subtitle = "Change in Emissions over Time\namong Tokyo Municipalities (2005-2017)",
       caption = "Transparent lines show individual cities' emissions. Points show average emissions.\n Solid lines show OLS lines of best fit for time trend among average emissions.")

It looks like cities with higher governance capacity (blue) have lower rates of emissions than cities with low governance capacity. These trends also increasingly diverge over time: emissions rise more quickly among cities with weak governance capacity, and only slightly, at a slower rate, among cities with strong governance capacity.

Wouldn’t it be nice if we could measure how much greater this difference grows over time?



2. Difference-in-Differences

Difference-in-differences estimation is a helpful procedure when working with treatment effects over time. In this example, we will be working with a binary treatment - high or low governance capacity - although a continuous treatment is also possible. This technique is only appropriate when working with panel data - data where each unit of observation is recorded multiple times, like cities over years, individuals over days, etc.


2.1 Working with Panel Data

Usually, when we work with panel data, these models have highly correlated residuals (serial correlation), because each unit of observation, like a city in our case, literally appears multiple times in the data, for year 2005, 2006, 2007, etc. See for example below the case of Chiyoda-ku, a ward in the center of Tokyo, which has multiple observations over time.

dat %>%
  select(muni, year, emissions, treat, counter) %>% 
  head()
## # A tibble: 6 × 5
##   muni                year  emissions treat counter
##   <chr>               <fct>     <dbl> <dbl>   <dbl>
## 1 Tokyo-to Chiyoda-ku 2017       86.8     1      12
## 2 Tokyo-to Chiyoda-ku 2016       82.2     1      11
## 3 Tokyo-to Chiyoda-ku 2015       86.3     1      10
## 4 Tokyo-to Chiyoda-ku 2014       83.9     1       9
## 5 Tokyo-to Chiyoda-ku 2013       86.7     1       8
## 6 Tokyo-to Chiyoda-ku 2012       80.3     1       7

As a result, their residuals are closely related, and this breaks one of the rules of OLS (independence of observations), making p-values suspect. To deal with this, scholars actually add extra variables to the model to account for the variation due to each specific time period, or sometimes due to each specific city. This technique is called fixed effects, where you add a dummy variable for each time period. This allows you to estimate what I like to call the direct effect of governance capacity over time.
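For intuition, here is a minimal sketch of what time fixed effects look like in base R's lm(), using the variables created above (the object name fe_sketch is just illustrative; the Zelig and plm versions appear in Section 3):

# A plain OLS model with a dummy variable for each year (time fixed effects).
# Because year was converted to a factor earlier, R creates the dummies automatically.
fe_sketch <- lm(log(emissions) ~ treat + year, data = dat)
summary(fe_sketch)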

However, that is more of a bird’s-eye view of the trend. We would have even more convincing evidence of a causal relationship if we could show that, in each year following a treatment, the treatment and control groups grew further apart in terms of the key outcome. Difference-in-differences helps us identify how much this gap grows over time - the difference in the differences between treatment and control groups over time.



2.2 Interaction Effects for Difference-in-Differences

What’s neat about it is that we can use very straightforward tools to accomplish this.

The basic idea is that we can use an interaction effect between the treatment and time to calculate the difference-in-differences. An interaction effect refers to multiplying two variables together and including their product in the regression as an additional covariate.

If the interaction term is positive and statistically significant, it indicates that treated cities see an increase in the outcome for each passing year, relative to untreated cities. If negative and statistically significant, it indicates that treated cities see a decrease in the outcome for each passing year, relative to untreated cities.
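Concretely, in R's formula syntax, treat * counter expands to treat + counter + treat:counter, and the coefficient on treat:counter is the difference-in-differences estimate. Here is a minimal sketch with base R's lm() (the Zelig models below are the equivalents used in this tutorial; did_sketch is just an illustrative name):

# Difference-in-differences as a simple interaction in OLS
did_sketch <- lm(log(emissions) ~ treat * counter, data = dat)
# The treat:counter row gives the annual change in log(emissions)
# for treated cities relative to untreated cities
coef(summary(did_sketch))["treat:counter", ]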

Sometimes, though, your data has confounding variables; in the case of emissions, we definitely need to adjust for demographic and economic traits before we can get close to the true effect of governance capacity on emissions.

Let’s try a few models, adding additional variables to show the consistency and growth of the effect.

m1 <- dat %>%
  zelig(formula = log(emissions) ~ treat * counter, model = "ls", cite = FALSE)

m2 <- dat %>%
  zelig(formula = log(emissions) ~ treat * counter +
          pop_density, model = "ls", cite = FALSE)

m3 <- dat %>%
  zelig(formula = log(emissions) ~ treat * counter +
          pop_density + income_per_capita + pop_age_65_plus + pop_college +
          value_agr_mill + value_commerce_mill + value_manuf_mill,
        model = "ls", cite = FALSE)

We can bind these together into a nice table using the texreg package.

# Let's make a nice model table
texreg::htmlreg(
  list(m1,m2,m3),
  stars = c(0.001, 0.01, 0.05, 0.10),
  bold = 0.10, # flag variables significant at p < 0.10 as bolded
  caption.above = TRUE, # Add a caption up top
  caption = "OLS Difference-in-Differences Model of Emissions Rates (log-transformed)",
  # Add some helpful model names
  custom.model.names = c("Simple Model", "Simple Controls", "Complete Controls"),
  include.fstat = TRUE, 
  # We're also going to add a custom goodness of fit test
  custom.gof.rows = list(
    # Calculate maximum variance inflation factor per model
    "Max VIF" = list(m1,m2,m3) %>%
      map(~from_zelig_model(.) %>% car::vif() %>% max()) %>%
      unlist(),
    # Calculate mean variance inflation factor per model
    "Mean VIF" = list(m1,m2,m3) %>%
      map(~from_zelig_model(.) %>% car::vif() %>% mean()) %>%
      unlist()
  )
) %>%
  htmltools::HTML()
OLS Difference-in-Differences Model of Emissions Rates (log-transformed)

                      Simple Model      Simple Controls    Complete Controls
(Intercept)            2.09*** (0.08)    2.28*** (0.08)     1.85*** (0.13)
treat                 -0.48*** (0.10)   -0.47*** (0.09)    -0.33*** (0.06)
counter                0.06*** (0.01)    0.03**  (0.01)     0.06*** (0.01)
treat:counter         -0.04**  (0.01)   -0.02    (0.01)    -0.06*** (0.01)
pop_density                             -0.00*** (0.00)    -0.00*** (0.00)
income_per_capita                                           0.00*** (0.00)
pop_age_65_plus                                            -0.01*** (0.00)
pop_college                                                 0.01*   (0.00)
value_agr_mill                                              1.57*** (0.18)
value_commerce_mill                                         0.00*** (0.00)
value_manuf_mill                                            0.09*** (0.01)
Max VIF                6.63              7.53               7.89
Mean VIF               4.45              4.04               3.58
R2                     0.20              0.23               0.72
Adj. R2                0.20              0.23               0.72
Num. obs.              744               744                744
F statistic            62.87             55.40              188.45
***p < 0.001; **p < 0.01; *p < 0.05; ·p < 0.1

This reveals a few things. First, the treatment’s overall effect is negative, while the overall effect of time (counter) is positive. Second, the difference-in-differences is negative; over time, the treatment leads to lower emissions each year. With the full set of controls, this effect becomes larger and more statistically significant.

One other thing to think about before working with interactions is that interaction effects naturally introduce considerable collinearity into models. Multicollinearity is bad for models because, while predictions are not affected, your beta coefficients can be skewed, completely changing how we interpret the effects of different variables. So, it’s always best to start with a simple model and use the variance inflation factor to confirm that it has minimal collinearity before adding interactions.

In this case, we can use our original OLS models to confirm that. In the table above, we calculated the mean and maximum variance inflation factor among coefficients for each model. Fortunately, all VIF scores are well below 10, the usual threshold for problematic multicollinearity, and on average they are quite low, between 3 and 4. It is usually quite difficult to keep collinearity low in higher-level difference-in-differences models. So, it is a good idea to show that any effects we see are consistent across simpler models too, to demonstrate that a positive or negative effect did not occur just due to collinearity.
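For example, here is a rough sketch of that pre-check, running car::vif() on a covariate-only OLS model before any interaction is added (base_model is just an illustrative name):

# Check collinearity among the covariates *before* adding the interaction
base_model <- lm(log(emissions) ~ treat + counter +
                   pop_density + income_per_capita + pop_age_65_plus + pop_college +
                   value_agr_mill + value_commerce_mill + value_manuf_mill,
                 data = dat)
# Variance inflation factors; values near 1 are ideal, above ~10 are worrying
car::vif(base_model)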



3. Using Fixed Effects with Difference-in-Differences

Difference-in-differences with just an interaction effect is perfectly suited to comparing just two cases (a treatment and a control case), or perhaps a matched sample, where covariate balance has already been achieved and we do not need to worry about intervening variables. However, in most cases we are looking at multiple groups (many cities) and multiple time periods (many years), so there is residual variation left in the model due specifically to those many different cities and time periods.

So, should we add fixed effects? Fixed effects refer to dummy variables that we add to models to account for the ‘fixed’ effect of specific aspects of the panel data, like a dummy variable for each year, or a dummy variable for each city.

Let’s make several models below using the Zelig and plm packages. The plm package hides some of the syntax under the hood, so I will try to replicate what it is doing using Zelig to make it clearer.

As far as I can tell, adding fixed effects by time is an excellent idea. It better accounts for variation over time, and it is still quite easy to make inferences.

First, we will run the model with no fixed effects.

m1 <- dat %>%
  zelig(formula = log(emissions) ~ I(treat * counter) + treat + counter + 
          pop_density + income_per_capita + pop_age_65_plus + pop_college +
          value_agr_mill + value_commerce_mill + value_manuf_mill, 
        model = "ls", cite = FALSE)
# We can do that using just a pooled model in plm
m2 <- dat %>%
  plm(formula = log(emissions) ~ I(treat * counter) + treat + counter +
        pop_density + income_per_capita + pop_age_65_plus + pop_college +
        value_agr_mill + value_commerce_mill + value_manuf_mill, 
      model = "pooling", index = c("muni_code", "year"))

Second, we will run the model with annual fixed effects, removing the original counter from the model (since year dummies and the counter are redundant and can’t be included simultaneously).

m3 <- dat %>%
  zelig(formula = log(emissions) ~ I(treat * counter) + treat +
          pop_density + income_per_capita + pop_age_65_plus + pop_college +
          value_agr_mill + value_commerce_mill + value_manuf_mill +
          year,
        model = "ls", cite = FALSE)
# In plm, this is achieved using a "within" model with "time" effects
m4 <- dat %>%
  plm(formula = log(emissions) ~ I(treat * counter) + treat +
        pop_density + income_per_capita + pop_age_65_plus + pop_college +
        value_agr_mill + value_commerce_mill + value_manuf_mill,
      model = "within", effect = "time", index = c("muni_code", "year"))

Third, we will add group fixed effects, for each municipality.

m5 <- dat %>%
  zelig(formula = log(emissions) ~ I(treat * counter) + treat +
          pop_density + income_per_capita + pop_age_65_plus + pop_college +
          value_agr_mill + value_commerce_mill + value_manuf_mill +
          year + muni_code, 
        model = "ls", cite = FALSE)
# In PLM, this is called two-way fixed effects
m6 <- dat %>%
  plm(formula = log(emissions) ~ I(treat * counter) + treat + 
        pop_density + income_per_capita + pop_age_65_plus + pop_college +
        value_agr_mill + value_commerce_mill + value_manuf_mill, 
      model = "within",  effect = "twoway", index = c("muni_code", "year"))

Fourth, two-way fixed effects models tend to have high collinearity between the covariates and the unit fixed effects. So let’s show that our model still gives the same result even without these predictors and the collinearity they introduce.

m7 <- dat %>%
  zelig(formula = log(emissions) ~ I(treat * counter) + treat +
          year + muni_code, 
        model = "ls", cite = FALSE)
# In PLM, this is called two-way fixed effects
m8 <- dat %>%
  plm(formula = log(emissions) ~ I(treat * counter) + treat,
      model = "within",  effect = "twoway", index = c("muni_code", "year"))

Finally, we will display them all in one model table. The fixed effects will not be shown in the table, since there are dozens, but they are each included in their respective models.

# Let's make a nice model table
texreg::htmlreg(
  list(m1,m2,m3,m4,m5,m6,m7,m8),
  stars = c(0.001, 0.01, 0.05, 0.10),
  bold = 0.10, # flag variables significant at p < 0.10 as bolded
  caption.above = TRUE, # Add a caption up top
  caption = "OLS Difference-in-Differences Model of Emissions Rates (log-transformed)<br>with fixed effects",
  # Add some helpful model names
  custom.model.names = rep(c("Zelig", "plm"),4),
  custom.headers = list("Basic Interaction" = 1:2,
                        "Annual Fixed Effects" = 3:4,
                        "Two-way Fixed Effects" = 5:6,
                        "Two-way Fixed Effects but No Covariates" = 7:8),
  include.fstat = TRUE,
  omit.coef = "year|muni_code") %>%
  htmltools::HTML()
OLS Difference-in-Differences Model of Emissions Rates (log-transformed), with fixed effects

                     Basic Interaction       Annual Fixed Effects    Two-way Fixed Effects   Two-way FE, No Covariates
                     Zelig       plm         Zelig       plm         Zelig       plm         Zelig       plm
(Intercept)           1.85***     1.85***     1.88***                 3.41***                 4.37***
                     (0.13)      (0.13)      (0.13)                  (0.17)                  (0.02)
treat * counter      -0.06***    -0.06***    -0.05***    -0.05***    -0.02***    -0.02***    -0.02***    -0.02***
                     (0.01)      (0.01)      (0.01)      (0.01)      (0.00)      (0.00)      (0.00)      (0.00)
treat                -0.33***    -0.33***    -0.44***    -0.44***     0.02·       0.02·       0.04**      0.04**
                     (0.06)      (0.06)      (0.07)      (0.07)      (0.01)      (0.01)      (0.01)      (0.01)
counter               0.06***     0.06***
                     (0.01)      (0.01)
pop_density          -0.00***    -0.00***    -0.00***    -0.00***    -0.00**     -0.00**
                     (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
income_per_capita     0.00***     0.00***     0.00***     0.00***     0.00***     0.00***
                     (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
pop_age_65_plus      -0.01***    -0.01***    -0.02***    -0.02***    -0.00       -0.00
                     (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
pop_college           0.01*       0.01*       0.01·       0.01·      -0.00       -0.00
                     (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
value_agr_mill        1.57***     1.57***     1.38***     1.38***     0.51***     0.51***
                     (0.18)      (0.18)      (0.19)      (0.19)      (0.10)      (0.10)
value_commerce_mill   0.00***     0.00***     0.00***     0.00***     0.00**      0.00**
                     (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
value_manuf_mill      0.09***     0.09***     0.10***     0.10***     0.05***     0.05***
                     (0.01)      (0.01)      (0.01)      (0.01)      (0.00)      (0.00)
R2                    0.72        0.72        0.73        0.73        1.00        0.42        0.99        0.12
Adj. R2               0.72        0.72        0.72        0.72        0.99        0.35        0.99        0.02
Num. obs.             744         744         744         744         744         744         744         744
F statistic           188.45                  97.45                   1700.23                 1237.92
***p < 0.001; **p < 0.01; *p < 0.05; ·p < 0.1



4. Visualizing Difference-in-Differences

Finally, let’s visualize these effects!

It’s important not to rely too heavily on raw beta coefficients, especially if you are using log transformations like this example does. Instead, statistical simulations can be very helpful here. Simulation focuses on expected outcomes rather than individual coefficients and p-values, so the inflated significance that panel data can produce is less of a concern for interpretation. Additionally, it’s a really flexible technique. For more information on statistical simulations, check out my other tutorials on Visualizing Statistical Model Results and Statistical Simulation in Zelig. In short, we can draw 1,000 sets of our model beta coefficients, each varying slightly according to the model’s estimated uncertainty, to visualize 95% confidence intervals of expected outcomes, given certain input values of X.
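If you want to see the mechanics behind this (or prefer not to use Zelig), here is a rough sketch of the same idea using MASS::mvrnorm() to draw coefficient sets from an ordinary lm() fit of the simple interaction model; the object names are just illustrative:

# Fit the simple interaction model with plain lm()
fit <- lm(log(emissions) ~ treat * counter, data = dat)

# Draw 1,000 plausible coefficient vectors from the model's estimated sampling distribution
draws <- MASS::mvrnorm(n = 1000, mu = coef(fit), Sigma = vcov(fit))

# Predictor values for an untreated city in 2005 (intercept, treat, counter, treat:counter)...
x0 <- c(1, 0, 0, 0)
# ...and for a treated city in 2017
x1 <- c(1, 1, 12, 12)

# Simulated difference in expected emissions, back on the original (unlogged) scale
fd <- exp(draws %*% x1) - exp(draws %*% x0)
quantile(fd, probs = c(0.025, 0.5, 0.975))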



4.1 Tile Plot

We’re going to use the Zelig package to simulate effects. We’re going to ask: how much greater are emissions in 2017 than they were in 2005, for treated cities vs. untreated cities?

First, make the model…

# First, we will run the model with no fixed effects
m1 <- dat %>%
  zelig(formula = log(emissions) ~ I(treat * counter) + treat + counter + 
          pop_density + income_per_capita + pop_age_65_plus + pop_college +
          value_agr_mill + value_commerce_mill + value_manuf_mill, 
        model = "ls", cite = FALSE)

Then simulate…

# Let's simulate the expected difference in emissions...
mysim <- m1 %>%
  # Between a municipality that didn't have strong government capacity in 2005
  sim(., setx(., treat = 0, counter = 0),
      # and a municipality that did have strong government capacity by 2017
      setx1(.,treat = 1, counter = 12))

Then write a function…

# Next, we're going to write a quick function
get_fd = function(mysimulation){
  # To extract the expected value of a statistical simulation
  data.frame(
    # For the first set of conditions
    x = mysimulation$sim.out$x$ev %>% unlist(),
    # For the second set of conditions
    x1 = mysimulation$sim.out$x1$ev %>% unlist()) %>%
    # Exponentiate them to get back to the normal scale of the outcome variable
    mutate(x = exp(x),
           x1 = exp(x1)) %>%
    # And subtract them to get the difference in the outcome 
    # given the second vs. first set of conditions
    mutate(fd = x1 - x) %>%
    return()
}

Then investigate simulations…

# Let's check out the expected differences,
# finding the median expected difference and 95% confidence intervals
mysim %>%
  get_fd() %>%
  summarize(median = median(fd),
            lower = quantile(fd, probs = 0.025),
            upper = quantile(fd, probs = 0.975))
##      median     lower      upper
## 1 -1.643846 -2.427091 -0.9967239

Next, let’s make a few more simulations, to show each of the four possible pairings of the treatment variable and counter variable.

# Let's simulate the expected difference in emissions...
foursim <- list(
  # By getting the full universe of all four possible changes in conditions
  m1 %>%
    sim(., setx(., treat = 0, counter = 0),
        setx1(., treat = 0, counter = 0)),
  m1 %>%
    sim(., setx(., treat = 0, counter = 0),
        setx1(., treat = 0, counter = 12)),
  m1 %>%
    sim(., setx(., treat = 0, counter = 0),
        setx1(., treat = 1, counter = 0)),
  m1 %>%
    sim(., setx(., treat = 0, counter = 0),
        setx1(., treat = 1, counter = 12))
) %>%
  # Get first differences for each, in a nice data.frame
  map_dfr(~get_fd(.), .id = "id") %>%
  # Now create useful labels
  mutate(
    treat = id %>% recode_factor(
      "1" = "0",
      "2" = "0",
      "3" = "1",
      "4" = "1"),
    counter = id %>% recode_factor(
      "1" = "2005",
      "2" = "2017",
      "3" = "2005",
      "4" = "2017"))

Now, let’s visualize it.

foursim %>%
  group_by(treat, counter) %>%
  summarize(median = median(fd),
            lower = quantile(fd, probs = 0.025),
            upper = quantile(fd, probs = 0.975)) %>%
  # Round to 3 decimals
  mutate_at(vars(median, lower, upper), funs(round(., 3))) %>%
  ggplot(mapping = aes(x = factor(treat), y = factor(counter),
                       label = median, fill = median)) +
  geom_tile(color = "black", size = 1) +
  geom_text(color = "black") +
  scale_fill_gradient2(low = "#DC267F", mid = "white", high = "#648FFF") +
  theme_classic(base_size = 14) +
  theme(panel.border = element_rect(fill = NA, color = "black")) +
  labs(x = "Treatment (1 = High Governance Capacity)", y = "Time",
       fill = "Expected\nChange in\nEmissions",
       subtitle = "Expected Change in Emissions\ncompared to a City with Low Governance Capacity in 2005")

This visual is very helpful because we can see the full range of outcomes. The lower left quadrant is our baseline for comparison. (There is 0 difference in emissions between a city with low governance capacity in 2005 and a city with low governance capacity in 2005.) Relative to that, cities with low governance capacity tend to have much higher emissions by 2017 (8.7), while cities with high governance capacity in 2005 tended to have much lower emissions (-2.1) than cities with low governance capacity in 2005. The upper right quadrant shows the quantity we really care about: even though emissions tend to rise over time, the treatment leads cities with high governance capacity to see lower emissions (-1.7) by 2017 compared to cities with low governance capacity in 2005.


4.2 Ribbon plots

Let’s try to visualize this another way, showing the diverging trajectories for these cities over time.

First, we will generate the simulations.

timesim <- bind_rows(
  # Grab expected values over time for untreated cities
  m1 %>%
    setx(treat = 0, counter = 0:12) %>%
    sim() %>%
    zelig_qi_to_df() %>%
    qi_slimmer(qi_type = "ev", ci = 0.95) %>%
    select(treat, counter, contains("qi_")),
  # Grab expected values over time for treated cities
  m1 %>%
    setx(treat = 1, counter = 0:12) %>%
    sim() %>%
    zelig_qi_to_df() %>%
    qi_slimmer(qi_type = "ev", ci = 0.95) %>%
    select(treat, counter, contains("qi_"))
)
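If you want to check what these simulation summaries contain before plotting, a quick peek works; each row is one treat-by-counter setting, with the median expected value and its 95% interval bounds (column names follow the zelig_qi_to_df()/qi_slimmer() output used above):

# Inspect the simulated quantities of interest
head(timesim)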

Now, let’s visualize it.

timesim %>%
  ggplot(mapping = aes(x = counter, y = qi_ci_median, 
                       ymin = qi_ci_min, ymax = qi_ci_max, 
                       group = treat, fill = factor(treat), 
                       color = treat)) +
  geom_ribbon(alpha = 0.5) +
  geom_line(linetype ="dashed") +
  labs(
    x = "Time",
    y = "Expected Emissions",
    fill = "Treatment\n(1 = High Capacity)\n(0 = Low Capacity") +
  scale_fill_manual(values = c("#DC267F", "#648FFF")) +
  theme_bw(base_size = 14) +
  theme(panel.border = element_rect(fill = NA, color = "black")) +
  guides(color = "none")



5. Considerations about Collinearity

A final consideration is that when you use interactions or fixed effects, the likelihood of high collinearity increases greatly. How do we think about our models in this context?

There are several different perspectives on collinearity that we should consider here, because difference-in-differences models can produce two types of collinearity problems: (1) structural collinearity (caused by interactions) and (2) data collinearity.

Structural collinearity is basically unavoidable and a product (pun intended!) of the research design, where you include literally collinear terms by multiplying predictors together. Further, the main effect of collinearity is variance inflation, meaning that p-values are less likely to be statistically significant; much like when dealing with a small-N sample, this means that if your variables are statistically significant despite collinearity, you are likely looking at an actual effect. As discussed elsewhere, collinearity is less of a problem in difference-in-differences designs, because the p-values of these interaction terms are generally not affected by collinearity.

On the other hand, data collinearity is avoidable. My argument would be to prove first that your model does not have data collinearity before introducing structural collinearity with interactions.

Given this information, we can approach collinearity a few different ways.

  1. We can say we are solely interested in the interaction effect, which is generally unaffected by collinearity, and simply not interpret the other effects in the model.

  2. We can provide a table of models, progressively increasing the number of covariates, and show that despite increasing covariates, fixed effects, and levels of collinearity, the interaction effect remains the same.

  3. We can build a model with no collinearity before adding structural collinearity with an interaction, and build from there.

  4. We can create a model with all key variables, interactions, and fixed effects, and then compare it to a model with just the interactions and fixed effects. In other words, we can remove the data collinearity and leave behind just the structural collinearity, which we know cannot be avoided.

If the effects are the same, we know that the data collinearity was not a huge problem. This works better for two-way models, where both municipal and time traits are adjusted for; otherwise, removing key municipal covariates might mean important differences are not controlled for. In the latter situation, using just time-based fixed effects and key municipal covariates, or even just the simple interaction with key municipal covariates, may be best. Each of these approaches has trade-offs, but options 3 and 4 are probably the most robust, good-faith efforts.
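As a quick check of the fourth approach, we can pull the difference-in-differences coefficient out of the full two-way fixed effects model (m6 above) and the covariate-free version (m8) and compare them side by side; a minimal sketch, assuming those plm objects are still in memory:

# Compare the interaction estimate with and without covariates
c(full_model = coef(m6)["I(treat * counter)"],
  no_covariates = coef(m8)["I(treat * counter)"])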



6. Conclusion

If you liked this tutorial, take a look at my other tutorials as well! Also, please do check out the Zelig team’s vignettes here, or read the paper that inspired this technique:

This tutorial has aimed to introduce difference-in-differences estimation, showing the basic mechanics of how to apply statistical simulation to interpret your results. We also used a log transformed outcome variable, to demonstrate how to work with rates. Finally, we explored the different applications of fixed effects in difference-in-differences models.

There is a lot more out there on difference-in-differences. You might consider the following sources, which may help you as you build up this toolkit.

Finally, to learn more about the dataset at the core of this tutorial, check out the following paper:

  • Fraser, Timothy, Lily Cunningham, Mary Bancroft, Amy Hunt, Eri Lee, and Amos Nasongo. “Climate Crisis at City Hall: How Japanese communities mobilize to eliminate emissions.” Environmental Innovation and Societal Transitions 37: 361-380.