PAF 573
Elaine MacPherson
Please re-read Section 5.2 of Mastering Metrics.
##I am going to take notes here for myself as a way of articulating the concepts, to hopefully make the content make sense to me. Difference-in-differences (diff-in-diff) is a tool we can use when we CANNOT do random assignment. If the control and treatment groups move in parallel before treatment, then a divergence in the treatment group's path after treatment might signal a treatment effect.
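The diff-in-diff logic in these notes comes down to one subtraction: the treatment group's before-to-after change minus the control group's before-to-after change. A minimal R sketch, where `did_2x2()` is a hypothetical helper and the numbers are made up purely for illustration (they are not from the book):

```r
# Hypothetical helper: 2x2 difference-in-differences estimate
# = (change in the treated group) - (change in the control group)
did_2x2 <- function(treat_pre, treat_post, control_pre, control_post) {
  (treat_post - treat_pre) - (control_post - control_pre)
}

# Made-up illustrative values, not data from Mastering Metrics
did_2x2(treat_pre = 20, treat_post = 21, control_pre = 23, control_post = 21)
# returns 3: the treated group rose by 1 while the control group fell by 2
```

The control group's change stands in for what would have happened to the treated group without treatment, which is exactly where the parallel-trends assumption does its work.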
What policy question are the authors addressing in this section? In regular words (not statistics speak), how are they attempting to answer their question?
ANSWER
##MLDA = minimum legal drinking age. Due to various state-level decisions
after the prohibition of alcohol was repealed, drinking ages varied across
states and over time, which makes comparison possible. For example, Alabama
had an MLDA of 19, while Arkansas had an MLDA of 21. The question
being asked here is: did the lower drinking age result in more deaths of
young people?
The comparisons they want to make here are not as straightforward as they were in the minimum wage example. What are some of the complications?
ANSWER
##The minimum wage example is more straightforward because there was a single natural “treatment” when the minimum wage in New Jersey went up from $4.25 to $5.05. If the question being asked is, “Does increasing the minimum wage reduce employment?” then we have to allow for the possibility that employment levels might have changed even WITHOUT this legal increase. Finding a place with similar economic conditions (Pennsylvania) where the minimum wage had not changed made a good natural control group. Comparing the before-and-after changes in each place makes this a relatively straightforward difference-in-differences model. In the MLDA scenario, there are many states to compare (while Alabama and Arkansas are used in the example, many states changed their MLDA laws). These changes happened in different years, meaning that there isn’t a single shared “treatment period” the way there was for the years right after New Jersey’s minimum wage went up. Finally, since the ages vary, the treatment variable has to capture “legal access to drinking” whether the MLDA is 18, 19, or 20 (for example), so it measures the fraction of 18-20 year olds who can legally drink rather than being a simple yes/no dummy.
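To see why the treatment variable ends up as a fraction rather than a 0/1 dummy, here is a rough approximation. It assumes the single-year cohorts aged 18, 19, and 20 are the same size and that the law is in force for the whole year; the `legal` variable in the dataset presumably handles cohort sizes and mid-year changes more carefully, so treat `approx_legal_share()` as a hypothetical illustration only.

```r
# Hypothetical helper, for illustration only: approximate share of 18-20 year olds
# who can drink legally under a given MLDA, assuming the cohorts aged 18, 19, and 20
# are equal in size and the law is in force for the whole year
approx_legal_share <- function(mlda_age) {
  ages <- 18:20
  # for each MLDA, the fraction of the three single-year ages at or above the cutoff
  sapply(mlda_age, function(m) mean(ages >= m))
}

approx_legal_share(c(18, 19, 20, 21))
# 1.0000000 0.6666667 0.3333333 0.0000000
```

So an MLDA of 19 gives roughly two thirds, not a simple “yes.”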
I have uploaded part of the dataset used to conduct the analysis in that section in a file called mlda.csv. The dataset contains the death rates (of all types) for 18-20 year olds in 50 states plus the District of Columbia for each of 14 years, as well as the fraction of 18-20 year olds who can legally drink in each year. Import and examine this data.
```r
# load libraries
library(tidyverse)
library(jtools)
library(plm)
library(lmtest) # for coeftest function

# read in MLDA data
URL <- "https://raw.githubusercontent.com/spiromar/files/main/paf573/mlda.csv"
mlda <- read.csv( URL )

# examine data
# legal = fraction of 18-20 year olds who can legally drink
# mrate = mortality rate of 18-20 year olds per 100,000
glimpse(mlda)
```

```
## Rows: 714
## Columns: 13
## $ year <int> 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 197…
## $ state <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, …
## $ dtype <chr> "all", "all", "all", "all", "all", "all", "all", "all", "…
## $ count <int> 292, 316, 320, 291, 306, 326, 299, 332, 317, 309, 276, 24…
## $ pop <int> 189770, 195623, 200988, 206550, 213839, 220358, 225486, 2…
## $ age <dbl> 18.96999, 18.97218, 18.97470, 18.97023, 18.98014, 18.9813…
## $ legal <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.…
## $ beertaxa <dbl> 1.3737113, 1.3160493, 1.2751197, 1.2004504, 1.0811359, 0.…
## $ beerpercap <dbl> 0.60, 0.66, 0.74, 0.79, 0.83, 0.88, 0.89, 0.99, 0.98, 0.9…
## $ winepercap <dbl> 0.09, 0.09, 0.09, 0.10, 0.16, 0.16, 0.15, 0.13, 0.12, 0.1…
## $ spiritpercap <dbl> 0.70, 0.76, 0.78, 0.79, 0.81, 0.85, 0.86, 0.84, 0.88, 0.8…
## $ totpercap <dbl> 1.38, 1.52, 1.61, 1.69, 1.80, 1.88, 1.89, 1.96, 1.97, 1.9…
## $ mrate <dbl> 153.8705, 161.5352, 159.2135, 140.8860, 143.0983, 147.941…
```

```r
# number of observations in each year
table(mlda$year)
```

```
##
## 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983
## 51 51 51 51 51 51 51 51 51 51 51 51 51 51
```

```r
# number of observations for each state code
table(mlda$state)
```

```
##
## 1 2 4 5 6 8 9 10 11 12 13 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14
## 30 31 32 33 34 35 36 37 38 39 40 41 42 44 45 46 47 48 49 50 51 53 54 55 56
## 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14
```
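The two tables above suggest a balanced panel: each of the 51 state codes appears 14 times and each year appears 51 times. A minimal sketch of a programmatic check, where `check_panel()` is a hypothetical helper (not part of the course materials) that confirms there is exactly one row for every state-year combination:

```r
# Hypothetical helper: summarize a state-by-year panel and test whether it is balanced
check_panel <- function(df, unit = "state", time = "year") {
  n_units <- length(unique(df[[unit]]))
  n_times <- length(unique(df[[time]]))
  counts <- table(df[[unit]], df[[time]])   # unit-by-time cell counts
  list(n_units = n_units,
       n_times = n_times,
       balanced = all(counts == 1))         # exactly one row per unit-year
}

# Usage, assuming mlda was read in as above:
# check_panel(mlda)
```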
Let’s start the analysis by simply examining the trend of death rates over time in the country. You will notice that there is a downward trend.
```r
# national average death rate in each year, shown as points joined by a line
ggplot(mlda, aes(x=year, y=mrate)) +
  stat_summary(fun = "mean", geom = "point") +
  stat_summary(fun = "mean", geom = "line")
```

Our general approach here is going to be to compare changes over time
within states with each other, so let’s break out these trends by state.
To do this we can use a `facet_wrap` layer. We also do not
need to take the mean within each year anymore, since there is only one
observation per year per state:
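The per-state plot described above could look something like this minimal sketch. The function name `plot_by_state()` is hypothetical, and it assumes the column names shown by `glimpse()` (`state`, `year`, `mrate`):

```r
library(ggplot2)

# Hypothetical helper: one panel per state. No averaging is needed,
# because each state has exactly one observation per year.
plot_by_state <- function(df) {
  ggplot(df, aes(x = year, y = mrate)) +
    geom_point() +
    geom_line() +
    facet_wrap(~ state)
}

# Usage, assuming mlda was read in as above:
# plot_by_state(mlda)
```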
Press the Zoom button in the RStudio Plots pane to make this graph as big as possible on your screen, and examine the trends. You will notice that most states are similar to the national trend, but some states vary more than others.
What we are more interested in, however, is whether changes in the
drinking laws within the states resulted in changes to the death rates.
We can take an imperfect peek at this by coloring the points by the
applicable law in any given year. The `legal` variable gives
us this information. It is a continuous variable, but for the purposes
of the plot, let’s just use one color for when no 18-20 year olds are
allowed to drink (legal = 0) and another color for when at least some are
allowed (legal > 0):
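A minimal sketch of that two-color plot, assuming the same columns as above; `plot_by_law()` and the `any_legal` column it creates are hypothetical names used only for this illustration. The grey line connects the years, and only the points are colored by the law:

```r
library(ggplot2)

# Hypothetical helper: color each state-year by whether any 18-20 year olds
# could drink legally. This collapses the continuous legal share into two
# categories for plotting only; the regression should use legal itself.
plot_by_law <- function(df) {
  df$any_legal <- ifelse(df$legal > 0,
                         "some allowed (legal > 0)",
                         "none allowed (legal = 0)")
  ggplot(df, aes(x = year, y = mrate)) +
    geom_line(color = "grey60") +
    geom_point(aes(color = any_legal)) +
    facet_wrap(~ state) +
    labs(color = "18-20 year olds")
}

# Usage, assuming mlda was read in as above:
# plot_by_law(mlda)
```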
The first thing to notice is that some states don’t change colors. Those states are not going to provide as much usable information for our purposes. Other states change colors once or more. The most interesting thing is to try to see what happens to the death rate when the color does change. If we think the minimum legal drinking age has an effect, as a first pass we would expect the death rate to go up when the law becomes more lax, and down when the law becomes more stringent. Stop for a moment and look closely at the graph with this in mind. Note also that at this point, we haven’t relied on any particular knowledge of statistics or probability. We are simply applying a little logic to see if we can tease out the effect.
But here is where we get a little stuck. As we have said many times in the course, the correlation between a policy change and an outcome is what might be reported in the news, or thrown around as evidence in political discourse, but it is not what we really want to know. What we really want to know when the colors change on a state’s graph is the counterfactual outcome for that state: What would the death rate have been if, instead of changing, the law had remained the same? To answer this question, we have to make some choices. One option is to find a similar state that did not experience the change, and conduct a difference-in-differences analysis. That is what we did last week with the minimum wage. Another option is to create a counterfactual using data from all the states that had a different policy during the time period of interest, and make a difference-in-differences comparison. Of course, we don’t have to stop at just one state. We could make similar difference-in-differences comparisons for all states at all times where their policies changed. It is this generalization of the difference-in-differences logic that a fixed effects model enables us to accomplish. In fact, Equation 5.5 on p. 194 of the Mastering Metrics book gives us the exact model we need in this case.
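In generic notation, a two-way fixed effects regression of the kind described above can be sketched as follows. The labels are my own: $s$ indexes states, $t$ indexes years, $\text{STATE}_{ks}$ equals 1 when $s = k$, $\text{YEAR}_{jt}$ equals 1 when $t = j$, and one state and one year are left out as the reference category. Compare it against Equation 5.5 itself before relying on the notation.

$$
Y_{st} = \alpha + \delta \,\text{LEGAL}_{st} + \sum_{k} \beta_k \,\text{STATE}_{ks} + \sum_{j} \gamma_j \,\text{YEAR}_{jt} + e_{st}
$$

Here $\delta$ is the coefficient of interest: the change in the death rate associated with moving `legal` from 0 to 1, after netting out each state’s average level ($\beta_k$) and each year’s nationwide shock ($\gamma_j$).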
In this problem, we will estimate Equation 5.5 in Mastering Metrics.
More specifically, run a regression that recreates the value in the first
row of Column (1) of Table 5.2. Do not get confused by the summation
notation in that equation. The summation terms essentially correspond to
including state and time fixed effects. (See Footnote 8 in the book for
why that is the case.) Also, note that this specification uses the
continuous version of `legal` and not the some-or-nothing
simplification we used in our graphs.
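**ANSWER**

##gonna take my best shot here from the lecture:

```r
# legal is numeric: the fraction of 18-20 year olds who can legally drink
str(mlda$legal)
# note: state enters as one numeric variable (not state dummies), and there are no year effects
binaryregressor <- lm(mrate ~ legal + state, data = mlda)
summ(binaryregressor)
```

```
## MODEL FIT:
## F(2,711) = 3.03, p = 0.05
## R² = 0.01
## Adj. R² = 0.01
##
## Standard errors: OLS
## -------------------------------------------------
##                  Est.    S.E.   t val.      p
## -------------- ------- ------- -------- ------
## (Intercept)     143.91    3.50    41.09   0.00
## legal            -6.28    3.33    -1.88   0.06
## state            -0.14    0.09    -1.47   0.14
## -------------------------------------------------
```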
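##Seeing the regression outcome above, the coefficient on legal is -6.28. Because legal is a fraction, this says that going from no 18-20 year olds being able to drink legally (legal = 0) to all of them (legal = 1) is associated with about 6.28 fewer deaths per 100,000, and the estimate is not significant at the 5% level (p = 0.06). However, this model treats state as a single number and leaves out year effects, so it is not yet the fixed effects model in Equation 5.5.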
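The regression above lets state enter as a single number, so it estimates one slope across the state codes instead of a separate intercept for each state, and it has no year effects. A sketch of the two-way fixed effects version with `lm()`, where `factor()` turns state and year into sets of dummy variables. The helper name `fit_eq55()` is hypothetical, and I have not checked its output against Table 5.2 here:

```r
# Hypothetical helper: Equation 5.5-style regression with state and year dummies
fit_eq55 <- function(df) {
  lm(mrate ~ legal + factor(state) + factor(year), data = df)
}

# Usage, assuming mlda was read in as above:
# eq55 <- fit_eq55(mlda)
# coef(summary(eq55))["legal", ]   # the row to compare with Table 5.2, Column (1)
```

Because `factor(state)` gives every state its own intercept, the `legal` coefficient is identified only from changes in `legal` within states over time, relative to the year-by-year national pattern.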
#### 3.2
Estimate the regression using `plm`. Use `coeftest` to cluster the standard errors by state.
```r
# put plm regression here
library(plm)
```

##MY CODE WORKED, THE R GODS ARE SMILING UPON ME TODAY
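##Also - why are there 56 states? I’ve been wondering. (It looks like the state codes are FIPS codes: they run from 1 to 56, but codes 3, 7, 14, 43, and 52 are unused, so the table above shows only 51 distinct codes, for 50 states plus DC.) Looking at these results state by state, many of the state coefficients are negative. There are a few exceptions: states 2, 4, 16, 32, and 56 all have relatively large positive coefficients, and others have smaller coefficients that are still above zero. Almost all of the negative coefficients are statistically significant. However, these state coefficients are fixed effects: each one measures how that state’s average death rate differs from the omitted baseline state, not how the MLDA affects deaths in that state. The effect of legal access to alcohol is the single coefficient on legal, so that coefficient (with standard errors clustered by state) is what we should look at before deciding whether we should lower the drinking age.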
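For 3.2, a sketch of the `plm` estimation: `model = "within"` with `effect = "twoways"` sweeps out both state and year fixed effects, and `vcovHC()` with `cluster = "group"` clusters the standard errors by the panel’s unit index (state here). The helper name `fit_plm_clustered()` is hypothetical, and `type = "HC1"` is my assumption rather than something the assignment specifies:

```r
library(plm)
library(lmtest)  # coeftest()

# Hypothetical helper: two-way (state and year) fixed effects via plm,
# reported with standard errors clustered by state
fit_plm_clustered <- function(df) {
  fe <- plm(mrate ~ legal, data = df, index = c("state", "year"),
            model = "within", effect = "twoways")
  coeftest(fe, vcov. = vcovHC(fe, type = "HC1", cluster = "group"))
}

# Usage, assuming mlda was read in as above:
# fit_plm_clustered(mlda)
```

The `legal` estimate from this within model should match the one from the dummy-variable `lm()` version; only the standard errors differ, because these are clustered by state.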