Difference-in-Differences Jose Fernandez
Team Names:
Dzhamilya Chakhalidze
Alec Rabalais
Jennifer Russo
John Schulten
Qais Shaban
Amy Shah
install.packages("gtsummary")
MSBA Data Analytics III
Difference-in-Differences
The data cover the expansion of the Earned Income Tax Credit (EITC), legislation aimed at providing a tax break for low-income individuals. For some background on the subject, see Eissa, Nada, and Jeffrey B. Liebman. 1996. Labor Supply Responses to the Earned Income Tax Credit. Quarterly Journal of Economics. 111(2): 605-637.
Big Hint: Most of the code you need is in the notes.
head(eitc)
1. Describe and summarize the data. Format nicely, not just R printout.
The data set contains 13,746 women. The mean of the ‘Employed’ variable shows that the employment rate of this population is 51.3%. The average age is 35, and the average number of children is 1.19, or roughly one child per woman. The maximum number of children is 9. The histogram of the number of children is skewed to the right: at least three quarters of the women have two or fewer children, and a long tail extends out to 9.
library(dplyr)
library(tidyverse)
library(stargazer)
library(knitr)
library(vtable)
library(foreign)
library(haven)
eitc <- read_dta("eitc.dta")
labs <- c('State Code',
'Year',
'State Unemployment Rate',
'# of Children',
'Non-White/White',
'Annual Family Income',
'Annual Earnings',
'Age',
'Years of Education',
'Employed',
'Unearned Income')
st(eitc,labels=labs)
| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|
| State Code | 13746 | 54.525 | 27.135 | 11 | 31 | 81 | 95 |
| Year | 13746 | 1993.347 | 1.703 | 1991 | 1992 | 1995 | 1996 |
| State Unemployment Rate | 13746 | 6.762 | 1.462 | 2.6 | 5.7 | 7.7 | 11.4 |
| # of Children | 13746 | 1.193 | 1.382 | 0 | 0 | 2 | 9 |
| Non-White/White | 13746 | 0.601 | 0.49 | 0 | 0 | 1 | 1 |
| Annual Family Income | 13746 | 15255.319 | 19444.25 | 0 | 5123.418 | 18659.178 | 575616.821 |
| Annual Earnings | 13746 | 10432.476 | 18200.758 | 0 | 0 | 14321.224 | 537880.612 |
| Age | 13746 | 35.21 | 10.157 | 20 | 26 | 44 | 54 |
| Years of Education | 13746 | 8.806 | 2.636 | 0 | 7 | 11 | 11 |
| Employed | 13746 | 0.513 | 0.5 | 0 | 0 | 1 | 1 |
| Unearned Income | 13746 | 4.823 | 7.123 | 0 | 0 | 6.864 | 134.058 |
# Histograms of the number of children and of age
hist(eitc$children, xlab = "# of Children Per Individual", ylab = "Volume", main = "# of Children Per Individual", col = "#69b3a2")
hist(eitc$age, xlab = "Age", ylab = "Number of Women", main = "Age Breakout", col = "#69b3a2")
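The direction of skew read off the histogram can also be checked numerically. The following is a minimal sketch on made-up counts (the `children` vector below is hypothetical toy data, not the EITC sample, and `skewness` is a helper defined here for illustration): a positive moment-based skewness, or a mean above the median, points to a right-skewed distribution.

```r
# Hypothetical toy counts standing in for eitc$children (not the real sample)
children <- c(rep(0, 50), rep(1, 25), rep(2, 15), rep(3, 6), rep(4, 3), 9)

# Moment-based skewness: third central moment divided by variance^(3/2)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

skew_children <- skewness(children)

# A positive value means the long tail is on the right
skew_children > 0

# Right-skewed data typically have a mean above the median
mean(children) > median(children)
```

With the real data, the same helper could be applied to `eitc$children` once the file has been read in.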
a = sapply(subset(eitc, children == 0), mean)
b = sapply(subset(eitc, children == 1), mean)
c = sapply(subset(eitc, children >= 2), mean)
child_breakout <-cbind(a,b,c)
colnames(child_breakout) <- c("0 Children","1 Child","2+ Children")
kable(child_breakout, "pipe", align="llccrr")
| | 0 Children | 1 Child | 2+ Children |
|---|---|---|---|
| state | 5.339666e+01 | 5.559091e+01 | 5.524386e+01 |
| year | 1.993365e+03 | 1.993338e+03 | 1.993330e+03 |
| urate | 6.663067e+00 | 6.802060e+00 | 6.858664e+00 |
| children | 0.000000e+00 | 1.000000e+00 | 2.801092e+00 |
| nonwhite | 5.159440e-01 | 5.964683e-01 | 7.088847e-01 |
| finc | 1.855986e+04 | 1.394157e+04 | 1.198530e+04 |
| earn | 1.376026e+04 | 9.928279e+03 | 6.613547e+03 |
| age | 3.849823e+01 | 3.375899e+01 | 3.204747e+01 |
| ed | 8.548676e+00 | 8.992479e+00 | 9.006721e+00 |
| work | 5.744896e-01 | 5.376063e-01 | 4.207099e-01 |
| unearn | 4.799607e+00 | 4.013291e+00 | 5.371749e+00 |
eitc$empearned = ifelse(eitc$work == 1, eitc$earn, 0)
d = sapply(subset(eitc, work == 1), mean)
# a, b and c were computed before empearned existed, so they have no entry for it.
# Pad that row with NA explicitly; a bare cbind(a, b, c, d) would recycle the state means into it.
childbreakout2 <- cbind(rbind(child_breakout, empearned = NA), Employed = d)
kable(childbreakout2, "pipe", align="llccrr")
| | 0 Children | 1 Child | 2+ Children | Employed |
|---|---|---|---|---|
| state | 5.339666e+01 | 5.559091e+01 | 5.524386e+01 | 5.658296e+01 |
| year | 1.993365e+03 | 1.993338e+03 | 1.993330e+03 | 1.993380e+03 |
| urate | 6.663067e+00 | 6.802060e+00 | 6.858664e+00 | 6.656325e+00 |
| children | 0.000000e+00 | 1.000000e+00 | 2.801092e+00 | 9.784458e-01 |
| nonwhite | 5.159440e-01 | 5.964683e-01 | 7.088847e-01 | 5.646625e-01 |
| finc | 1.855986e+04 | 1.394157e+04 | 1.198530e+04 | 1.951025e+04 |
| earn | 1.376026e+04 | 9.928279e+03 | 6.613547e+03 | 1.646485e+04 |
| age | 3.849823e+01 | 3.375899e+01 | 3.204747e+01 | 3.572632e+01 |
| ed | 8.548676e+00 | 8.992479e+00 | 9.006721e+00 | 9.022689e+00 |
| work | 5.744896e-01 | 5.376063e-01 | 4.207099e-01 | 1.000000e+00 |
| unearn | 4.799607e+00 | 4.013291e+00 | 5.371749e+00 | 3.045400e+00 |
| empearned | NA | NA | NA | 1.646485e+04 |
eitc$post93 = as.numeric(eitc$year >= 1994)
eitc$anykids = as.numeric(eitc$children >= 1)
minfo = aggregate(eitc$work, list(eitc$year,eitc$anykids == 1), mean)
# rename column headings (variables)
names(minfo) = c("YR","Treatment","LFPR")
# Attach a new column with labels
minfo$Group[1:6] = "Single women, no children"
minfo$Group[7:12] = "Single women, children"
#minfo
require(ggplot2) #package for creating nice plots
qplot(YR, LFPR, data=minfo, geom=c("point","line"), colour=Group,
xlab="Year", ylab="Employment Rate")+geom_vline(xintercept = 1994)
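The group labels above are assigned by row position (`[1:6]` and `[7:12]`), which relies on the order in which `aggregate()` returns its rows. As a hedged alternative, the sketch below derives the label from the treatment flag itself, using a small hypothetical data frame (`toy` and its values are invented for illustration) and the formula interface of `aggregate()`.

```r
# Hypothetical toy data: two years, two groups, two women per cell
toy <- data.frame(
  year    = c(1991, 1991, 1991, 1991, 1994, 1994, 1994, 1994),
  anykids = c(0, 0, 1, 1, 0, 0, 1, 1),
  work    = c(1, 0, 0, 0, 1, 1, 1, 0)
)

# Employment rate for every year-by-group cell
rates <- aggregate(work ~ year + anykids, data = toy, FUN = mean)

# Label each row from its own anykids value, not from its position
rates$Group <- ifelse(rates$anykids == 1,
                      "Single women, children",
                      "Single women, no children")
rates
```

Because the label is computed from `anykids`, it stays correct even if the rows are reordered or a year is missing for one group.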
# This is the code from the class notes
require(foreign)
# Compute the four data points needed in the DID calculation:
a = sapply(subset(eitc, post93 == 0 & anykids == 0, select=work), mean)
b = sapply(subset(eitc, post93 == 0 & anykids == 1, select=work), mean)
c = sapply(subset(eitc, post93 == 1 & anykids == 0, select=work), mean)
d = sapply(subset(eitc, post93 == 1 & anykids == 1, select=work), mean)
# Compute the effect of the EITC on the employment of women with children:
difindif <- (d-c)-(b-a)
difindif
work 0.04687313
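For reference, the calculation above can be written out as a formula. The notation is introduced here for illustration: $\bar{Y}$ is the mean of `work` in a cell, and the subscripts name the group and period. In terms of the code, `a` is the no-kids pre-period mean, `b` the kids pre-period mean, `c` the no-kids post-period mean, and `d` the kids post-period mean.

```latex
\hat{\delta}_{DiD}
  = \left(\bar{Y}_{\text{kids, post}} - \bar{Y}_{\text{kids, pre}}\right)
  - \left(\bar{Y}_{\text{no kids, post}} - \bar{Y}_{\text{no kids, pre}}\right)
  = (d - b) - (c - a)
  = (d - c) - (b - a)
```

The last equality is only a regrouping of the same four terms, so the expression used in the code gives the usual difference-in-differences estimate.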
reg1 = lm(work ~ post93 + anykids + post93*anykids, data = eitc)
stargazer(reg1,type="html",covariate.labels = c("Post 1993","Children/No Children", "Interaction term"))
| | Dependent variable: work |
|---|---|
| Post 1993 | -0.002 (0.013) |
| Children/No Children | -0.129*** (0.012) |
| Interaction term | 0.047*** (0.017) |
| Constant | 0.575*** (0.009) |
| Observations | 13,746 |
| R² | 0.013 |
| Adjusted R² | 0.012 |
| Residual Std. Error | 0.497 (df = 13742) |
| F Statistic | 58.451*** (df = 3; 13742) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 (standard errors in parentheses) |
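The interaction coefficient of 0.047 matches the 0.04687 obtained from the four cell means. That is expected: with only the two dummies and their interaction, the regression is saturated, so the interaction coefficient reproduces the difference-in-differences of means. The sketch below illustrates this on simulated data; the data frame `sim`, its column names, and the effect sizes are all assumptions made up for the example.

```r
set.seed(1)
n <- 2000

# Simulated stand-ins for post93 and anykids (hypothetical data)
sim <- data.frame(
  post  = rbinom(n, 1, 0.5),
  treat = rbinom(n, 1, 0.5)
)

# Binary outcome with an assumed treatment effect of 0.05
p <- 0.55 - 0.10 * sim$treat + 0.05 * sim$post * sim$treat
sim$work <- rbinom(n, 1, p)

# Difference-in-differences from the four cell means
cell_mean <- function(post, treat) {
  mean(sim$work[sim$post == post & sim$treat == treat])
}
did_means <- (cell_mean(1, 1) - cell_mean(0, 1)) -
             (cell_mean(1, 0) - cell_mean(0, 0))

# The same quantity as the interaction coefficient of a saturated regression
fit <- lm(work ~ post * treat, data = sim)
did_reg <- unname(coef(fit)["post:treat"])

all.equal(did_means, did_reg)
```

The regression form has the practical advantage of also reporting a standard error for the estimate.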
reg2 = lm(work ~ post93 + anykids + nonwhite + age + state + earn + post93*anykids, data = eitc)
stargazer(reg2, type = "html", covariate.labels = c("Post 1993", "Kids/No Kids", "Non-White (1)/ White(0)", "Age", "State", "Annual Earnings", "Interaction Term"))
| | Dependent variable: work |
|---|---|
| Post 1993 | -0.002 (0.012) |
| Kids/No Kids | -0.057*** (0.011) |
| Non-White (1)/ White(0) | -0.061*** (0.008) |
| Age | 0.001*** (0.0004) |
| State | 0.001*** (0.0001) |
| Annual Earnings | 0.00001*** (0.00000) |
| Interaction Term | 0.033** (0.016) |
| Constant | 0.355*** (0.020) |
| Observations | 13,746 |
| R² | 0.128 |
| Adjusted R² | 0.128 |
| Residual Std. Error | 0.467 (df = 13738) |
| F Statistic | 288.714*** (df = 7; 13738) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 (standard errors in parentheses) |
reg3 = lm(work ~ post93 + anykids + nonwhite + age + state + earn + urate + post93*anykids + urate*anykids, data = eitc)
stargazer(reg3, type = "html", covariate.labels = c("Post 1993", "Kids/No Kids", "Non-White (1)/ White(0)", "Age", "State", "Annual Earnings","Unemployment rate","Interaction Term - Post93/Any Kids","Interaction Term - Unemployment Rate/Any Kids"))
| | Dependent variable: work |
|---|---|
| Post 1993 | -0.031** (0.014) |
| Kids/No Kids | 0.033 (0.046) |
| Non-White (1)/ White(0) | -0.044*** (0.008) |
| Age | 0.001*** (0.0004) |
| State | 0.002*** (0.0002) |
| Annual Earnings | 0.00001*** (0.00000) |
| Unemployment rate | -0.022*** (0.005) |
| Interaction Term - Post93/Any Kids | 0.017 (0.018) |
| Interaction Term - Unemployment Rate/Any Kids | -0.012** (0.006) |
| Constant | 0.491*** (0.038) |
| Observations | 13,746 |
| R² | 0.134 |
| Adjusted R² | 0.133 |
| Residual Std. Error | 0.465 (df = 13736) |
| F Statistic | 235.312*** (df = 9; 13736) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 (standard errors in parentheses) |
eitc$onekid = as.numeric(eitc$children == 1)
eitc$twopluskid = as.numeric(eitc$children >= 2)
reg2plus = lm(work ~ post93 + onekid +twopluskid + nonwhite + age + state + earn + urate + post93*onekid +post93*twopluskid +urate*onekid +urate*twopluskid, data = eitc)
stargazer(reg2plus, type = "html", covariate.labels = c("Post 1993", "One Child", "2+ Children", "Non-White (1)/ White(0)", "Age", "State", "Annual Earnings","Unemployment Rate", "Interaction Term - Post93/1 Child", "Interaction term Post93/2+ Children", "Interaction term - Unemployment Rate/1 Child", "Interaction term - Unemployment Rate/2+ Children" ))
| | Dependent variable: work |
|---|---|
| Post 1993 | -0.032** (0.013) |
| One Child | 0.010 (0.059) |
| 2+ Children | 0.038 (0.053) |
| Non-White (1)/ White(0) | -0.039*** (0.008) |
| Age | 0.001*** (0.0004) |
| State | 0.002*** (0.0001) |
| Annual Earnings | 0.00001*** (0.00000) |
| Unemployment Rate | -0.022*** (0.005) |
| Interaction Term - Post93/1 Child | 0.007 (0.023) |
| Interaction term Post93/2+ Children | 0.024 (0.020) |
| Interaction term - Unemployment Rate/1 Child | -0.001 (0.008) |
| Interaction term - Unemployment Rate/2+ Children | -0.018*** (0.007) |
| Constant | 0.501*** (0.038) |
| Observations | 13,746 |
| R² | 0.137 |
| Adjusted R² | 0.137 |
| Residual Std. Error | 0.464 (df = 13733) |
| F Statistic | 182.192*** (df = 12; 13733) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 (standard errors in parentheses) |
eitc$post91 = as.numeric(eitc$year >= 1992)
eitc$anykids = as.numeric(eitc$children >= 1)
regfakepol = lm(work ~ post91 + anykids + nonwhite + age + state + earn + post91*anykids, data = eitc)
stargazer(regfakepol, type = "html", covariate.labels = c("Post 1991", "Kids/No Kids", "Non-White (1)/ White(0)", "Age", "State", "Annual Earnings","Interaction Term - Post 1991/Children"))
| | Dependent variable: work |
|---|---|
| Post 1991 | 0.001 (0.016) |
| Kids/No Kids | -0.043** (0.019) |
| Non-White (1)/ White(0) | -0.060*** (0.008) |
| Age | 0.001*** (0.0004) |
| State | 0.001*** (0.0001) |
| Annual Earnings | 0.00001*** (0.00000) |
| Interaction Term - Post 1991/Children | 0.002 (0.021) |
| Constant | 0.352*** (0.023) |
| Observations | 13,746 |
| R² | 0.128 |
| Adjusted R² | 0.127 |
| Residual Std. Error | 0.467 (df = 13738) |
| F Statistic | 287.369*** (df = 7; 13738) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 (standard errors in parentheses) |
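The placebo regression above is estimated on all years, so its "post 1991" period also contains the years after the real 1994 expansion. A common variant, shown here only as a sketch and not as the method used in this report, restricts the sample to the pre-policy years before assigning the fake policy date. The data below are simulated, and every name and effect size in the block (`sim`, `pre`, `fake_post`, the 0.05 effect) is an assumption made for illustration.

```r
set.seed(2)
n <- 3000

# Simulated stand-in for the EITC sample (hypothetical data)
sim <- data.frame(
  year    = sample(1991:1996, n, replace = TRUE),
  anykids = rbinom(n, 1, 0.5)
)
sim$post93 <- as.numeric(sim$year >= 1994)

# Assumed true effect of 0.05, switched on only from 1994 for women with children
sim$work <- rbinom(n, 1, 0.55 - 0.10 * sim$anykids + 0.05 * sim$post93 * sim$anykids)

# Keep only the years before the real policy change
pre <- subset(sim, year < 1994)

# Fake policy date: pretend the expansion started in 1992
pre$fake_post <- as.numeric(pre$year >= 1992)

# Placebo difference-in-differences on the pre-period only
placebo <- lm(work ~ fake_post * anykids, data = pre)
summary(placebo)$coefficients["fake_post:anykids", ]
```

If the two groups follow parallel trends before the policy, the placebo interaction should be statistically indistinguishable from zero, which is the pattern the report's own placebo estimate of 0.002 (0.021) shows.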