Difference-in-Differences Jose Fernandez

Team Names:

Dzhamilya Chakhalidze

Alec Rabalais

Jennifer Russo

John Schulten

Qais Shaban

Amy Shah

install.packages("gtsummary")

MSBA Data Analytics III: Difference-in-Differences

The data cover the expansion of the Earned Income Tax Credit (EITC), legislation that provides a tax break for low-income individuals. For background on the subject, see Eissa, Nada, and Jeffrey B. Liebman. 1996. "Labor Supply Responses to the Earned Income Tax Credit." Quarterly Journal of Economics 111(2): 605-637.

Big Hint: Most of the code you need is in the notes.

head(eitc)

1. Describe and summarize the data. Format nicely, not just R printout.

The data set contains 13,746 women. The mean of the ‘Employed’ variable shows that 51.3% of them are employed. The average age is 35, and the average number of children is 1.19, or roughly one child per woman; the maximum is 9. The histogram of the number of children is skewed to the right: most women have two or fewer children (the 75th percentile is 2), with a long tail out to 9.

library(dplyr)
library(tidyverse)
library(stargazer)
library(knitr)
library(vtable)
library(foreign)
library(haven)


eitc <- read_dta("eitc.dta")

labs <- c('State Code',
    'Year',
    'State Unemployment Rate',
    '# of Children',
    'Non-White/White',
    'Annual Family Income',
    'Annual Earnings',
    'Age',
    'Years of Education',
    'Employed',
    'Unearned Income')

st(eitc,labels=labs)
Summary Statistics

| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|:--|--:|--:|--:|--:|--:|--:|--:|
| State Code | 13746 | 54.525 | 27.135 | 11 | 31 | 81 | 95 |
| Year | 13746 | 1993.347 | 1.703 | 1991 | 1992 | 1995 | 1996 |
| State Unemployment Rate | 13746 | 6.762 | 1.462 | 2.6 | 5.7 | 7.7 | 11.4 |
| # of Children | 13746 | 1.193 | 1.382 | 0 | 0 | 2 | 9 |
| Non-White/White | 13746 | 0.601 | 0.49 | 0 | 0 | 1 | 1 |
| Annual Family Income | 13746 | 15255.319 | 19444.25 | 0 | 5123.418 | 18659.178 | 575616.821 |
| Annual Earnings | 13746 | 10432.476 | 18200.758 | 0 | 0 | 14321.224 | 537880.612 |
| Age | 13746 | 35.21 | 10.157 | 20 | 26 | 44 | 54 |
| Years of Education | 13746 | 8.806 | 2.636 | 0 | 7 | 11 | 11 |
| Employed | 13746 | 0.513 | 0.5 | 0 | 0 | 1 | 1 |
| Unearned Income | 13746 | 4.823 | 7.123 | 0 | 0 | 6.864 | 134.058 |

# Histogram of the number of children per woman
hist(eitc$children, xlab = "# of Children Per Individual", ylab = "Number of Women", main = "# of Children Per Individual", col = "#69b3a2")

hist(eitc$age, xlab = "Age", ylab = "Number of Women", main = "Age Breakout", col = "#69b3a2")
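
The direction of skew read off the histogram can also be checked numerically. The sketch below is a minimal illustration on a made-up vector of child counts (not the `eitc` data); the `skewness()` helper is a hypothetical name defined here, not a function from any package loaded above.

```r
# Toy counts of children, NOT the eitc data: many zeros, one large value.
kids <- c(0, 0, 0, 0, 1, 1, 2, 2, 3, 9)

# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 1.5.
skewness <- function(x) {
  dev <- x - mean(x)
  mean(dev^3) / mean(dev^2)^1.5
}

kids_skew <- skewness(kids)
kids_skew > 0              # a positive value means the long tail is on the right
mean(kids) > median(kids)  # the tail pulls the mean above the median
```

Applied to `eitc$children`, a positive value would agree with a distribution bunched at 0 to 2 children with a tail out to 9.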

2. Calculate the sample means of all variables for (a) single women with no children, (b) single women with 1 child, and (c) single women with 2+ children.
a = sapply(subset(eitc, children == 0), mean)
b = sapply(subset(eitc, children == 1), mean)
c = sapply(subset(eitc, children >= 2), mean)

child_breakout <-cbind(a,b,c)
colnames(child_breakout) <- c("0 Children","1 Child","2+ Children")

kable(child_breakout, "pipe", align="llccrr")

|  | 0 Children | 1 Child | 2+ Children |
|:--|:--|:-:|:-:|
| state | 5.339666e+01 | 5.559091e+01 | 5.524386e+01 |
| year | 1.993365e+03 | 1.993338e+03 | 1.993330e+03 |
| urate | 6.663067e+00 | 6.802060e+00 | 6.858664e+00 |
| children | 0.000000e+00 | 1.000000e+00 | 2.801092e+00 |
| nonwhite | 5.159440e-01 | 5.964683e-01 | 7.088847e-01 |
| finc | 1.855986e+04 | 1.394157e+04 | 1.198530e+04 |
| earn | 1.376026e+04 | 9.928279e+03 | 6.613547e+03 |
| age | 3.849823e+01 | 3.375899e+01 | 3.204747e+01 |
| ed | 8.548676e+00 | 8.992479e+00 | 9.006721e+00 |
| work | 5.744896e-01 | 5.376063e-01 | 4.207099e-01 |
| unearn | 4.799607e+00 | 4.013291e+00 | 5.371749e+00 |

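
As an alternative to three separate `subset()` calls, the group means can be computed in one step by first collapsing the child count into a three-level grouping variable. This is a minimal sketch on a toy data frame with hypothetical values; only the column names mirror `eitc`, and `kid_group` is a name introduced here for illustration.

```r
# Toy data with hypothetical values; column names mirror eitc.
toy <- data.frame(
  children = c(0, 0, 1, 1, 2, 3),
  work     = c(1, 0, 1, 1, 0, 1),
  earn     = c(12000, 0, 9000, 7000, 0, 5000)
)

# Collapse the child count into the three groups from the question.
toy$kid_group <- cut(toy$children,
                     breaks = c(-Inf, 0, 1, Inf),
                     labels = c("0 Children", "1 Child", "2+ Children"))

# One row per group, one column per variable.
group_means <- aggregate(cbind(work, earn, children) ~ kid_group,
                         data = toy, FUN = mean)
print(group_means)
```

Wrapping the result in `round(..., 2)` before passing it to `kable()` would avoid scientific notation in the printed table.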
3. Create a new variable with earnings conditional on working (missing for non-employed) and calculate the means of this by group as well.
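# Earnings conditional on working: NA (missing) for the non-employed, as the question asks
eitc$empearned = ifelse(eitc$work == 1, eitc$earn, NA_real_)
# Means of every variable among employed women
d = sapply(subset(eitc, work == 1), mean)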

childbreakout2 <-cbind(a,b,c,d)
colnames(childbreakout2) <- c("0 Children","1 Child","2+ Children","Employed")

kable(childbreakout2, "pipe", align="llccrr")

|  | 0 Children | 1 Child | 2+ Children | Employed |
|:--|:--|:-:|:-:|--:|
| state | 5.339666e+01 | 5.559091e+01 | 5.524386e+01 | 5.658296e+01 |
| year | 1.993365e+03 | 1.993338e+03 | 1.993330e+03 | 1.993380e+03 |
| urate | 6.663067e+00 | 6.802060e+00 | 6.858664e+00 | 6.656325e+00 |
| children | 0.000000e+00 | 1.000000e+00 | 2.801092e+00 | 9.784458e-01 |
| nonwhite | 5.159440e-01 | 5.964683e-01 | 7.088847e-01 | 5.646625e-01 |
| finc | 1.855986e+04 | 1.394157e+04 | 1.198530e+04 | 1.951025e+04 |
| earn | 1.376026e+04 | 9.928279e+03 | 6.613547e+03 | 1.646485e+04 |
| age | 3.849823e+01 | 3.375899e+01 | 3.204747e+01 | 3.572632e+01 |
| ed | 8.548676e+00 | 8.992479e+00 | 9.006721e+00 | 9.022689e+00 |
| work | 5.744896e-01 | 5.376063e-01 | 4.207099e-01 | 1.000000e+00 |
| unearn | 4.799607e+00 | 4.013291e+00 | 5.371749e+00 | 3.045400e+00 |
| empearned | 5.339666e+01 | 5.559091e+01 | 5.524386e+01 | 1.646485e+04 |

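
In the table above, the first three `empearned` entries repeat the `state` means. The vectors `a`, `b`, and `c` were computed before `empearned` existed, so they are one element shorter than `d` and `cbind()` recycles their first element into the last row. A minimal sketch of the per-group conditional means the question asks for, on a toy data frame with hypothetical values (`kid_group` is an illustrative name, not a column of `eitc`):

```r
# Toy data with hypothetical values; column names mirror eitc.
toy <- data.frame(
  children = c(0, 0, 1, 1, 2, 3),
  work     = c(1, 0, 1, 1, 0, 1),
  earn     = c(12000, 0, 9000, 7000, 0, 5000)
)

# Earnings conditional on working: missing for the non-employed.
toy$empearned <- ifelse(toy$work == 1, toy$earn, NA_real_)

# Same three groups as in the previous question.
toy$kid_group <- cut(toy$children,
                     breaks = c(-Inf, 0, 1, Inf),
                     labels = c("0 Children", "1 Child", "2+ Children"))

# Mean earnings among workers in each group; na.rm drops the non-employed.
cond_means <- tapply(toy$empearned, toy$kid_group, mean, na.rm = TRUE)
print(cond_means)
```

Without `na.rm = TRUE`, any group containing a non-employed woman would return `NA`.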
4. Construct a variable for the “treatment” called ANYKIDS and a variable for the period after the expansion called POST93 (it should be 1 for 1994 and later).
eitc$post93 = as.numeric(eitc$year >= 1994)
eitc$anykids = as.numeric(eitc$children >= 1)
5. Create a graph which plots mean annual employment rates by year (1991-1996) for single women with children (treatment) and without children (control). Hint: you should have two lines on the same graph.
minfo = aggregate(eitc$work, list(eitc$year,eitc$anykids == 1), mean)

# rename column headings (variables)
names(minfo) = c("YR","Treatment","LFPR")

# Attach a new column with labels
minfo$Group[1:6] = "Single women, no children"
minfo$Group[7:12] = "Single women, children"
#minfo

require(ggplot2) #package for creating nice plots
qplot(YR, LFPR, data=minfo, geom=c("point","line"), colour=Group,
xlab="Year", ylab="Employment Rate")+geom_vline(xintercept = 1994)
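
The plot above relies on `qplot()`, which is deprecated in recent ggplot2 releases, and on labelling groups by row position (`[1:6]`, `[7:12]`), which depends on the row order returned by `aggregate()`. A minimal sketch of the same figure using `ggplot()` and labels derived from the `Treatment` flag follows. The employment rates in it are made-up placeholders for illustration, not the values computed from `eitc`.

```r
# Made-up employment rates for illustration only, NOT the eitc results.
minfo <- data.frame(
  YR        = rep(1991:1996, times = 2),
  Treatment = rep(c(FALSE, TRUE), each = 6),
  LFPR      = c(0.58, 0.57, 0.57, 0.59, 0.57, 0.56,
                0.46, 0.44, 0.44, 0.46, 0.49, 0.49)
)

# Label groups from the Treatment flag rather than from row positions.
minfo$Group <- ifelse(minfo$Treatment,
                      "Single women, children",
                      "Single women, no children")

# Draw the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(minfo, aes(x = YR, y = LFPR, colour = Group)) +
    geom_point() +
    geom_line() +
    geom_vline(xintercept = 1994, linetype = "dashed") +
    labs(x = "Year", y = "Employment Rate")
  print(p)
}
```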

6. Calculate the unconditional difference-in-difference estimates of the effect of the 1993 EITC expansion on employment of single women. Hint: This means calculate the DID treatment effect by just subtracting means (i.e. no regression)
# This is the code from the class notes 
require(foreign)
# Compute the four data points needed in the DID calculation:
a = sapply(subset(eitc, post93 == 0 & anykids == 0, select=work), mean)
b = sapply(subset(eitc, post93 == 0 & anykids == 1, select=work), mean)
c = sapply(subset(eitc, post93 == 1 & anykids == 0, select=work), mean)
d = sapply(subset(eitc, post93 == 1 & anykids == 1, select=work), mean)
# Compute the effect of the EITC on the employment of women with children:
difindif <- (d-c)-(b-a)

difindif

work 0.04687313
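
Written out, the calculation above is the change over time for women with children minus the change over time for women without children. Using the same letters as the code (`a`, `b` are pre-1994 means without and with children; `c`, `d` are the post-1993 means), and with the subscript labels chosen here only for readability:

```latex
\hat{\delta}_{DID}
  = \left(\bar{y}_{\text{kids, post}} - \bar{y}_{\text{no kids, post}}\right)
  - \left(\bar{y}_{\text{kids, pre}} - \bar{y}_{\text{no kids, pre}}\right)
  = (d - c) - (b - a)
  \approx 0.0469
```

Because `work` is a 0/1 indicator, this is read as an increase of about 4.7 percentage points in the employment rate of women with children relative to women without.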

7. Now run a regression to estimate the conditional difference-in-difference estimate of the effect of the EITC. Use all women with children as the treatment group. Hint: your answers for 6 and 7 should match.
reg1 = lm(work ~ post93 + anykids + post93*anykids, data = eitc)

stargazer(reg1,type="html",covariate.labels = c("Post 1993","Children/No Children", "Interaction term"))

|  | Dependent variable: work |
|:--|:-:|
| Post 1993 | -0.002 (0.013) |
| Children/No Children | -0.129*** (0.012) |
| Interaction term | 0.047*** (0.017) |
| Constant | 0.575*** (0.009) |
| Observations | 13,746 |
| R² | 0.013 |
| Adjusted R² | 0.012 |
| Residual Std. Error | 0.497 (df = 13742) |
| F Statistic | 58.451*** (df = 3; 13742) |

Standard errors in parentheses. Note: *p<0.1; **p<0.05; ***p<0.01
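
The interaction coefficient (0.047) matches the 0.0469 obtained by subtracting means because a regression on the two dummies and their interaction reproduces the four cell means exactly. The sketch below checks that identity on simulated data; the data frame `toy` and the helper `cell()` are hypothetical names introduced here, and the simulated values have nothing to do with the `eitc` results.

```r
set.seed(1)
n <- 400

# Simulated 2x2 design: 100 observations in each post93/anykids cell.
toy <- data.frame(
  post93  = rep(c(0, 1), each = n / 2),
  anykids = rep(c(0, 1), times = n / 2)
)
toy$work <- rbinom(n, 1, 0.5 + 0.05 * toy$post93 * toy$anykids)

# DID from the four cell means.
cell <- function(p, k) mean(toy$work[toy$post93 == p & toy$anykids == k])
did_means <- (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# DID from the regression: post93 * anykids expands to both main
# effects plus the post93:anykids interaction.
fit <- lm(work ~ post93 * anykids, data = toy)
did_reg <- unname(coef(fit)["post93:anykids"])

all.equal(did_means, did_reg)
```

The equality holds only for this saturated specification; once covariates are added, the interaction coefficient is no longer a simple difference of raw means.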
8. Re-estimate this model including demographic characteristics. These are characteristics of the person.
reg2 = lm(work ~ post93 + anykids + nonwhite + age + state + earn + post93*anykids, data = eitc)

stargazer(reg2, type = "html", covariate.labels = c("Post 1993", "Kids/No Kids", "Non-White (1)/ White(0)", "Age", "State", "Annual Earnings", "Interaction Term"))

|  | Dependent variable: work |
|:--|:-:|
| Post 1993 | -0.002 (0.012) |
| Kids/No Kids | -0.057*** (0.011) |
| Non-White (1)/ White(0) | -0.061*** (0.008) |
| Age | 0.001*** (0.0004) |
| State | 0.001*** (0.0001) |
| Annual Earnings | 0.00001*** (0.00000) |
| Interaction Term | 0.033** (0.016) |
| Constant | 0.355*** (0.020) |
| Observations | 13,746 |
| R² | 0.128 |
| Adjusted R² | 0.128 |
| Residual Std. Error | 0.467 (df = 13738) |
| F Statistic | 288.714*** (df = 7; 13738) |

Standard errors in parentheses. Note: *p<0.1; **p<0.05; ***p<0.01
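
The single "State" coefficient above arises because `state` enters the model as a number, so the regression fits one slope across the state codes. If state fixed effects were intended, one common approach is to wrap the variable in `factor()`. The sketch below illustrates the difference on simulated data; the four state codes and the outcome are hypothetical.

```r
set.seed(2)

# Simulated data: four hypothetical state codes, 25 observations each.
toy <- data.frame(
  state = rep(c(11, 31, 81, 95), each = 25),
  work  = rbinom(100, 1, 0.5)
)

# Numeric state code: one slope, as if the codes were a measurement.
fit_numeric <- lm(work ~ state, data = toy)

# Categorical state: one dummy per state other than the reference.
fit_factor <- lm(work ~ factor(state), data = toy)

length(coef(fit_numeric))  # intercept + 1 slope
length(coef(fit_factor))   # intercept + 3 state dummies
```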
9. Add the state unemployment rate and allow its effect to vary by the presence of children.
reg3 = lm(work ~ post93 + anykids + nonwhite + age + state + earn + urate + post93*anykids + urate*anykids, data = eitc)

stargazer(reg3, type = "html", covariate.labels = c("Post 1993", "Kids/No Kids", "Non-White (1)/ White(0)", "Age", "State", "Annual Earnings","Unemployment rate","Interaction Term - Post93/Any Kids","Interaction Term - Unemployment Rate/Any Kids"))

|  | Dependent variable: work |
|:--|:-:|
| Post 1993 | -0.031** (0.014) |
| Kids/No Kids | 0.033 (0.046) |
| Non-White (1)/ White(0) | -0.044*** (0.008) |
| Age | 0.001*** (0.0004) |
| State | 0.002*** (0.0002) |
| Annual Earnings | 0.00001*** (0.00000) |
| Unemployment rate | -0.022*** (0.005) |
| Interaction Term - Post93/Any Kids | 0.017 (0.018) |
| Interaction Term - Unemployment Rate/Any Kids | -0.012** (0.006) |
| Constant | 0.491*** (0.038) |
| Observations | 13,746 |
| R² | 0.134 |
| Adjusted R² | 0.133 |
| Residual Std. Error | 0.465 (df = 13736) |
| F Statistic | 235.312*** (df = 9; 13736) |

Standard errors in parentheses. Note: *p<0.1; **p<0.05; ***p<0.01
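
With the unemployment-rate interaction in the model, the effect of a one-point rise in the state unemployment rate differs by group. Using the rounded coefficients reported above (the β subscripts are labels chosen here for readability):

```latex
\frac{\partial\,\text{work}}{\partial\,\text{urate}}
  = \beta_{\text{urate}} + \beta_{\text{urate}\times\text{anykids}} \cdot \text{anykids}
  =
  \begin{cases}
    -0.022 & \text{if anykids} = 0 \\
    -0.022 + (-0.012) = -0.034 & \text{if anykids} = 1
  \end{cases}
```

In this linear probability model, that is a drop in the employment probability of about 2.2 percentage points for women without children and about 3.4 percentage points for women with children.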
10. Allow the treatment effect to vary by those with 1 or 2+ children. You will need to create separate dummy variables. These will become your new treatment variables.
eitc$onekid = as.numeric(eitc$children == 1)
eitc$twopluskid = as.numeric(eitc$children >= 2)

reg2plus = lm(work ~ post93 + onekid +twopluskid + nonwhite + age + state + earn + urate + post93*onekid +post93*twopluskid +urate*onekid +urate*twopluskid, data = eitc)

stargazer(reg2plus, type = "html", covariate.labels = c("Post 1993", "One Child", "2+ Children", "Non-White (1)/ White(0)", "Age", "State", "Annual Earnings","Unemployment Rate", "Interaction Term - Post93/1 Child", "Interaction term Post93/2+ Children", "Interaction term -  Unemployment Rate/1 Child", "Interaction term - Unemployment Rate/2+ Children" ))

|  | Dependent variable: work |
|:--|:-:|
| Post 1993 | -0.032** (0.013) |
| One Child | 0.010 (0.059) |
| 2+ Children | 0.038 (0.053) |
| Non-White (1)/ White(0) | -0.039*** (0.008) |
| Age | 0.001*** (0.0004) |
| State | 0.002*** (0.0001) |
| Annual Earnings | 0.00001*** (0.00000) |
| Unemployment Rate | -0.022*** (0.005) |
| Interaction Term - Post93/1 Child | 0.007 (0.023) |
| Interaction term Post93/2+ Children | 0.024 (0.020) |
| Interaction term - Unemployment Rate/1 Child | -0.001 (0.008) |
| Interaction term - Unemployment Rate/2+ Children | -0.018*** (0.007) |
| Constant | 0.501*** (0.038) |
| Observations | 13,746 |
| R² | 0.137 |
| Adjusted R² | 0.137 |
| Residual Std. Error | 0.464 (df = 13733) |
| F Statistic | 182.192*** (df = 12; 13733) |

Standard errors in parentheses. Note: *p<0.1; **p<0.05; ***p<0.01
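
The two new dummies split the earlier `anykids` treatment group into mutually exclusive parts, which leaves women without children as the reference group for both interaction terms. A minimal check of that partition on a made-up vector of child counts:

```r
# Made-up child counts for illustration.
children <- c(0, 1, 2, 3, 0, 1, 5)

onekid     <- as.numeric(children == 1)
twopluskid <- as.numeric(children >= 2)
anykids    <- as.numeric(children >= 1)

# Every woman with children falls in exactly one of the two groups.
all(onekid + twopluskid == anykids)

# No observation is flagged by both dummies.
table(onekid, twopluskid)
```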
11. Return to your original equation in part 8. Estimate a “placebo” treatment model. Take data from only the pre-reform period. Use the same treatment and control groups. Introduce a placebo policy that begins in 1992 (so 1992 and 1993 both have this fake policy).
eitc$post91 = as.numeric(eitc$year >= 1992)
eitc$anykids = as.numeric(eitc$children >= 1)

regfakepol = lm(work ~ post91 + anykids + nonwhite + age + state + earn + post91*anykids, data = eitc)

stargazer(regfakepol, type = "html", covariate.labels = c("Post 1991", "Kids/No Kids", "Non-White (1)/ White(0)", "Age", "State", "Annual Earnings","Interaction Term - Post 1991/Children"))

|  | Dependent variable: work |
|:--|:-:|
| Post 1991 | 0.001 (0.016) |
| Kids/No Kids | -0.043** (0.019) |
| Non-White (1)/ White(0) | -0.060*** (0.008) |
| Age | 0.001*** (0.0004) |
| State | 0.001*** (0.0001) |
| Annual Earnings | 0.00001*** (0.00000) |
| Interaction Term - Post 1991/Children | 0.002 (0.021) |
| Constant | 0.352*** (0.023) |
| Observations | 13,746 |
| R² | 0.128 |
| Adjusted R² | 0.127 |
| Residual Std. Error | 0.467 (df = 13738) |
| F Statistic | 287.369*** (df = 7; 13738) |

Standard errors in parentheses. Note: *p<0.1; **p<0.05; ***p<0.01
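
The table above reports 13,746 observations, the same as the full sample, so this placebo regression includes the post-reform years 1994-1996, whereas the question asks for pre-reform data only. A minimal sketch of restricting to 1991-1993 before estimating follows. It uses simulated data and omits the demographic covariates for brevity, so the names `toy`, `pre`, and `placebo_fit` are hypothetical and the output says nothing about the `eitc` estimates.

```r
set.seed(3)

# Simulated panel: 50 observations per year, 1991-1996.
toy <- data.frame(
  year     = rep(1991:1996, each = 50),
  children = rep(c(0, 1, 2, 0, 3), times = 60)
)
toy$work <- rbinom(nrow(toy), 1, 0.5)

# Keep only the pre-reform years, then define the fake policy.
pre <- subset(toy, year <= 1993)
pre$post91  <- as.numeric(pre$year >= 1992)
pre$anykids <- as.numeric(pre$children >= 1)

# Placebo DID: the post91:anykids coefficient should be near zero
# if the two groups followed similar trends before the reform.
placebo_fit <- lm(work ~ post91 * anykids, data = pre)

nobs(placebo_fit)  # half of the simulated sample
range(pre$year)    # 1991 to 1993 only
```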