## Assignment
The Earned Income Tax Credit (EITC) is a refundable tax credit for low-income wage workers. To qualify for the EITC, a person must have some earnings but an adjusted gross income below a threshold that varies by year and family size. The credit also depends on the number of children (0, 1, 2+). In essence, the EITC can be thought of as a wage subsidy, and in order to receive this subsidy, one has to be employed in the first place. This credit was substantially expanded in 1993. Suppose we would like to estimate the effects of the 1993 expansion on labor supply. Essentially, we will compare labor supply for single women before and after 1993 by whether or not they had children: the EITC largely applied to women with children. In this exercise, we measure labor force participation by employment status (employed or not employed). To simplify the analysis, we use a small sample with selected variables from a large dataset. In Blackboard, you will find a dataset called eitcRR.dta. This dataset contains CPS data for single women aged 20-54 with less than a high school education, as this group is most likely to be affected by the EITC.
The first thing I do is import Earned Income Tax Credit (EITC) data.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(haven)
eitcRR <- read_dta("C:/Users/lower/Downloads/eitcRR.dta")
head(eitcRR)
## # A tibble: 6 x 10
## state year children nonwhite finc earn age ed work unearn
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 11 1991 0 0 9630 0 53 7 0 9.63
## 2 11 1991 0 1 18714. 18714. 26 10 1 0
## 3 11 1991 0 0 31228. 14730. 48 11 1 16.5
## 4 11 1991 0 0 54331. 17676. 44 11 0 36.7
## 5 11 1991 0 0 8249. 8249. 20 10 1 0
## 6 11 1991 1 0 7499. 0 27 10 0 7.50
library(readxl)
UKCPR <- read_excel("C:/Users/lower/Downloads/UKCPR_National_Welfare_Data_Final_Update_20180116_0.xlsx", sheet = "Data")
head(UKCPR)
## # A tibble: 6 x 73
## state_name state year Population Employment Unemployment `Unemployment rate`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AL 1 1980 3893888 1521183 148106 8.9
## 2 AK 2 1980 401851 169397 18008 9.6
## 3 AZ 3 1980 2718215 1146371 81630 6.6
## 4 AR 4 1980 2286435 922894 75386 7.6
## 5 CA 5 1980 23667902 10787673 791379 6.8
## 6 CO 6 1980 2889964 1405381 86267 5.8
## # ... with 66 more variables: Marginally Food Insecure <lgl>,
## # Food Insecure <lgl>, Very Low Food Secure <lgl>, Gross State Product <dbl>,
## # Number of low income uninsured children <dbl>,
## # Percent Low Income Unisured Children <dbl>, Personal income <dbl>,
## # Workers' compensation <dbl>, AFDC/TANF Recipients <dbl>,
## # AFDC/TANF Caseloads <dbl>, Food Stamp/SNAP Recipients <dbl>,
## # Food Stamp/SNAP Caseloads <dbl>,
## # AFDC/TANF Benefit for 2-Person family <dbl>,
## # AFDC/TANF Benefit for 3-person family <dbl>,
## # AFDC/TANF benefit for 4-person family <dbl>,
## # FS/SNAP Benefit for 1-person family <dbl>,
## # FS/SNAP Benefit for 2-person family <dbl>,
## # FS/SNAP Benefit for 3-person family <dbl>,
## # FS/SNAP Benefit for 4-person family <dbl>,
## # AFDC/TANF_FS 2-Person Benefit <dbl>, AFDC/TANF_FS 3-Person Benefit <dbl>,
## # AFDC/TANF_FS 4-Person Benefit <dbl>, Child-only AFDC/TANF cases <dbl>,
## # SSI-Federal <dbl>, SSI-State <dbl>, Total SSI <dbl>, SSI_FS Benefit <dbl>,
## # Number of Poor (thousands) <dbl>, Poverty Rate <dbl>,
## # Governor is Democrat (1=Yes) <dbl>, Number in Lower House Democrat <dbl>,
## # Number in Lower House Republican <dbl>,
## # Fraction of State House that is Democrat <dbl>,
## # Number in Upper House Democrat <dbl>,
## # Number in Upper House Republican <dbl>,
## # Fraction of State Senate that is Democrat <dbl>,
## # EITC Phase-In Rate No Dependents <dbl>,
## # EITC Phase-In Rate 1 Dependent <dbl>,
## # EITC Phase-In Rate 2 Dependents <dbl>,
## # EITC Phase-In Rate 3 Dependents <dbl>,
## # EITC Maximum Credit No Dependents <dbl>,
## # EITC Maximum Credit 1 Dependent <dbl>,
## # EITC Maximum Credit 2 Dependents <dbl>,
## # EITC Maximum Credit 3 Dependents <dbl>,
## # EITC Phase-Out Rate No Dependents <dbl>,
## # EITC Phase-Out Rate 1 Dependent <dbl>,
## # EITC Phase-Out Rate 2 Dependents <dbl>,
## # EITC Phase-Out Rate 3 Dependents <dbl>, State EITC Rate <dbl>,
## # Wisconsin EITC Rate 2 Dependents <dbl>,
## # Wisconsin EITC Rate 3 Dependents <dbl>,
## # Refundable State EITC (1=Yes) <dbl>, Federal Minimum Wage <dbl>,
## # State Minimum Wage <dbl>, SSI recipients <dbl>, SSI recipients--Aged <dbl>,
## # SSI recipients--Blind <dbl>, SSI recipients--Disabled <dbl>,
## # Medicaid beneficiaries <dbl>, WIC participation <dbl>,
## # NSLP Free Participation <dbl>, NSLP Reduced Participation <dbl>,
## # NSLP Total Participation <dbl>, SBP Free Participation <dbl>,
## # SBP Reduced Participation <dbl>, SBP Total Participation <dbl>
Before merging the aforementioned dataframes, I had to correct a couple of problems.
I extracted the state codes from the “eitcRR” dataframe and put them into their own dataframe, called “eitcst”, so that I could merge these state codes with another dataframe (BLS) from the Bureau of Labor Statistics. For context, the BLS dataset contains both FIPS codes and Postal abbreviations. Both of these variables were invaluable because they allowed me to merge the eitcRR and UKCPR dataframes into a single dataframe. For instance, the “eitcst” dataframe has a variable called “Label” that matches a column in the BLS dataframe called “State”; both hold the full name of the state. The UKCPR dataframe, in turn, has a variable called “state_name”, which is the state's abbreviation and matches the “Postal” variable in the BLS dataset. The final dataframe is called masterdf. I print the first 6 rows of the FIPSmerge dataframe to give an idea of how the dataframes were merged.
eitcst <- read_excel("C:/Users/lower/Downloads/eitcst.xlsx")
BLS <- read_excel("C:/Users/lower/Downloads/state-geocodes-v2018.xlsx")
FIPSmerge = merge(eitcst, BLS, by.x="Label",by.y="State")
eitcrr_st = merge(eitcRR, FIPSmerge, by.x="state",by.y="Code")
eitcrr_st$stateco <- as.numeric(eitcrr_st$FIPS)
df1 = merge(UKCPR, BLS, by.x="state_name",by.y="Postal")
df1$stateco <- as.numeric(df1$FIPS)
masterdf = merge(eitcrr_st, df1, by.x=c("stateco","year"), by.y=c("stateco","year"),all.x = TRUE)
head(FIPSmerge)
## Label Code Postal FIPS
## 1 Alabama 63 AL 1
## 2 Alaska 94 AK 2
## 3 Arizona 86 AZ 4
## 4 Arkansas 71 AR 5
## 5 California 93 CA 6
## 6 Colorado 84 CO 8
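Because `merge()` performs an inner join by default, rows whose keys have no partner are dropped without a warning. A minimal sketch of how one might check for that before merging is given here; `left_toy` and `right_toy` are made-up stand-ins, not the assignment files, and only base R is used.

```r
# Toy stand-ins for two tables keyed on a state name (hypothetical data)
left_toy  <- data.frame(Label = c("Alabama", "Alaska", "Arizona"),
                        Code  = c(63, 94, 86))
right_toy <- data.frame(State  = c("Alabama", "Alaska", "Arkansas"),
                        Postal = c("AL", "AK", "AR"),
                        FIPS   = c(1, 2, 5))

# Keys present on the left that have no partner on the right
unmatched_left <- setdiff(left_toy$Label, right_toy$State)

# An inner merge (the default for merge()) silently drops unmatched rows
merged_toy <- merge(left_toy, right_toy, by.x = "Label", by.y = "State")

print(unmatched_left)
print(nrow(merged_toy))
```

If `unmatched_left` is empty and the merged row count equals the expected one, no observations were lost at that step.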
I chose the following state-level variables: Gross State Product, Unemployment Rate, and Food Stamp/SNAP Caseloads. I believe these variables help to contextualize the economic condition of each state by capturing its economic output (GSP), the difficulty of finding work (unemployment rate), and the size of its welfare burden (Food Stamp/SNAP caseloads).
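In order to verify that the merge was done correctly, I inspect the distribution of each variable via the summary function and estimate means for each of the aforementioned state-level variables. For the latter task, I choose one state (in this case, Texas) and compare it to the original UKCPR dataset (before it was merged). Note that the two Texas estimates printed below are not identical (for example, mean Gross State Product is 461,910 in the merged data versus 761,949 in UKCPR): the UKCPR figure averages over every year in that file, which starts in 1980, while masterdf covers only 1991-1996 and weights each year by its number of respondents. A like-for-like comparison requires restricting UKCPR to the sample years. Estimates are computed for every state with no missing values, which indicates that every state found a match in the merge.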
summary(masterdf)
## stateco year state.x children
## Min. : 1.00 Min. :1991 Min. :11.00 Min. :0.000
## 1st Qu.:12.00 1st Qu.:1992 1st Qu.:31.00 1st Qu.:0.000
## Median :28.00 Median :1993 Median :56.00 Median :1.000
## Mean :26.69 Mean :1993 Mean :54.52 Mean :1.193
## 3rd Qu.:37.00 3rd Qu.:1995 3rd Qu.:81.00 3rd Qu.:2.000
## Max. :56.00 Max. :1996 Max. :95.00 Max. :9.000
##
## nonwhite finc earn age
## Min. :0.0000 Min. : 0 Min. : 0 Min. :20.00
## 1st Qu.:0.0000 1st Qu.: 5123 1st Qu.: 0 1st Qu.:26.00
## Median :1.0000 Median : 9637 Median : 3332 Median :34.00
## Mean :0.6007 Mean : 15255 Mean : 10432 Mean :35.21
## 3rd Qu.:1.0000 3rd Qu.: 18659 3rd Qu.: 14321 3rd Qu.:44.00
## Max. :1.0000 Max. :575617 Max. :537881 Max. :54.00
##
## ed work unearn Label
## Min. : 0.000 Min. :0.000 Min. : 0.000 Length:13746
## 1st Qu.: 7.000 1st Qu.:0.000 1st Qu.: 0.000 Class :character
## Median :10.000 Median :1.000 Median : 2.973 Mode :character
## Mean : 8.806 Mean :0.513 Mean : 4.823
## 3rd Qu.:11.000 3rd Qu.:1.000 3rd Qu.: 6.864
## Max. :11.000 Max. :1.000 Max. :134.058
##
## Postal FIPS.x state_name state.y
## Length:13746 Min. : 1.00 Length:13746 Min. : 1.00
## Class :character 1st Qu.:12.00 Class :character 1st Qu.:10.00
## Mode :character Median :28.00 Mode :character Median :25.00
## Mean :26.69 Mean :24.01
## 3rd Qu.:37.00 3rd Qu.:34.00
## Max. :56.00 Max. :51.00
##
## Population Employment Unemployment Unemployment rate
## Min. : 457739 Min. : 223326 Min. : 10125 Min. : 2.600
## 1st Qu.: 4091025 1st Qu.: 1779490 1st Qu.: 123566 1st Qu.: 5.800
## Median : 9659871 Median : 4558922 Median : 318682 Median : 6.900
## Mean :12161467 Mean : 5589981 Mean : 441987 Mean : 6.758
## 3rd Qu.:18140894 3rd Qu.: 8112684 3rd Qu.: 605728 3rd Qu.: 7.700
## Max. :31780829 Max. :14300443 Max. :1444167 Max. :11.300
##
## Marginally Food Insecure Food Insecure Very Low Food Secure
## Mode:logical Mode:logical Mode:logical
## NA's:13746 NA's:13746 NA's:13746
##
##
##
##
##
## Gross State Product Number of low income uninsured children
## Min. : 11691 Min. : NA
## 1st Qu.: 95866 1st Qu.: NA
## Median :251573 Median : NA
## Mean :328892 Mean :NaN
## 3rd Qu.:519704 3rd Qu.: NA
## Max. :964186 Max. : NA
## NA's :13746
## Percent Low Income Unisured Children Personal income Workers' compensation
## Min. : NA Min. : 8564768 Min. : 2497
## 1st Qu.: NA 1st Qu.: 80680540 1st Qu.: 51772
## Median : NA Median :224967367 Median : 171516
## Mean :NaN Mean :284418505 Mean : 526903
## 3rd Qu.: NA 3rd Qu.:432623921 3rd Qu.:1288489
## Max. : NA Max. :828822422 Max. :1917500
## NA's :13746
## AFDC/TANF Recipients AFDC/TANF Caseloads Food Stamp/SNAP Recipients
## Min. : 12839 Min. : 4732 Min. : 30266
## 1st Qu.: 171745 1st Qu.: 60985 1st Qu.: 396863
## Median : 560561 Median :200699 Median :1022140
## Mean : 764624 Mean :268553 Mean :1229376
## 3rd Qu.:1053433 3rd Qu.:371889 3rd Qu.:2153627
## Max. :2679653 Max. :919471 Max. :3174651
##
## Food Stamp/SNAP Caseloads AFDC/TANF Benefit for 2-Person family
## Min. : 10102 Min. : 93
## 1st Qu.: 166259 1st Qu.:236
## Median : 418277 Median :322
## Mean : 492952 Mean :338
## 3rd Qu.: 884777 3rd Qu.:468
## Max. :1179193 Max. :821
##
## AFDC/TANF Benefit for 3-person family AFDC/TANF benefit for 4-person family
## Min. :120.0 Min. : 144.0
## 1st Qu.:294.0 1st Qu.: 346.0
## Median :409.0 Median : 488.0
## Mean :418.5 Mean : 496.3
## 3rd Qu.:577.0 3rd Qu.: 687.0
## Max. :924.0 Max. :1027.0
##
## FS/SNAP Benefit for 1-person family FS/SNAP Benefit for 2-person family
## Min. :105.0 Min. :193.0
## 1st Qu.:111.0 1st Qu.:203.0
## Median :111.0 Median :203.0
## Mean :112.3 Mean :206.1
## 3rd Qu.:115.0 3rd Qu.:212.0
## Max. :198.0 Max. :364.0
##
## FS/SNAP Benefit for 3-person family FS/SNAP Benefit for 4-person family
## Min. :277.0 Min. :352.0
## 1st Qu.:292.0 1st Qu.:370.0
## Median :292.0 Median :370.0
## Mean :295.9 Mean :375.5
## 3rd Qu.:304.0 3rd Qu.:386.0
## Max. :522.0 Max. :663.0
##
## AFDC/TANF_FS 2-Person Benefit AFDC/TANF_FS 3-Person Benefit
## Min. : 286.0 Min. : 397
## 1st Qu.: 444.0 1st Qu.: 592
## Median : 523.0 Median : 677
## Mean : 527.5 Mean : 681
## 3rd Qu.: 630.0 3rd Qu.: 794
## Max. :1052.0 Max. :1245
##
## AFDC/TANF_FS 4-Person Benefit Child-only AFDC/TANF cases SSI-Federal
## Min. : 496 Min. : 370 Min. :407.0
## 1st Qu.: 716 1st Qu.: 10343 1st Qu.:422.0
## Median : 808 Median : 21445 Median :434.0
## Mean : 819 Mean : 49971 Mean :437.6
## 3rd Qu.: 949 3rd Qu.: 52116 3rd Qu.:458.0
## Max. :1427 Max. :223455 Max. :470.0
##
## SSI-State Total SSI SSI_FS Benefit Number of Poor (thousands)
## Min. : 0.00 Min. :407.0 Min. :481.0 Min. : 45
## 1st Qu.: 0.00 1st Qu.:434.0 1st Qu.:513.0 1st Qu.: 595
## Median : 14.00 Median :465.0 Median :545.0 Median :1340
## Mean : 53.01 Mean :490.5 Mean :558.3 Mean :1943
## 3rd Qu.: 86.00 3rd Qu.:532.0 3rd Qu.:592.0 3rd Qu.:3020
## Max. :374.00 Max. :832.0 Max. :933.0 Max. :5803
## NA's :40
## Poverty Rate Governor is Democrat (1=Yes) Number in Lower House Democrat
## Min. : 5.30 Min. :0.0000 Min. : 13.00
## 1st Qu.:12.30 1st Qu.:0.0000 1st Qu.: 47.00
## Median :15.70 Median :0.0000 Median : 69.00
## Mean :15.15 Mean :0.4397 Mean : 68.85
## 3rd Qu.:17.00 3rd Qu.:1.0000 3rd Qu.: 93.00
## Max. :26.40 Max. :1.0000 Max. :145.00
## NA's :266 NA's :343
## Number in Lower House Republican Fraction of State House that is Democrat
## Min. : 6.00 Min. :0.190
## 1st Qu.: 33.00 1st Qu.:0.530
## Median : 43.00 Median :0.600
## Mean : 45.88 Mean :0.592
## 3rd Qu.: 55.00 3rd Qu.:0.640
## Max. :282.00 Max. :0.910
## NA's :343 NA's :343
## Number in Upper House Democrat Number in Upper House Republican
## Min. :10.00 Min. : 1.00
## 1st Qu.:20.00 1st Qu.:11.00
## Median :25.00 Median :16.00
## Mean :24.79 Mean :17.82
## 3rd Qu.:27.00 3rd Qu.:23.00
## Max. :46.00 Max. :35.00
## NA's :343 NA's :343
## Fraction of State Senate that is Democrat EITC Phase-In Rate No Dependents
## Min. :0.3200 Min. :0.00000
## 1st Qu.:0.4700 1st Qu.:0.00000
## Median :0.5800 Median :0.00000
## Mean :0.5904 Mean :0.03531
## 3rd Qu.:0.7000 3rd Qu.:0.07650
## Max. :0.9700 Max. :0.07650
## NA's :343
## EITC Phase-In Rate 1 Dependent EITC Phase-In Rate 2 Dependents
## Min. :0.1670 Min. :0.173
## 1st Qu.:0.1760 1st Qu.:0.184
## Median :0.1850 Median :0.195
## Mean :0.2345 Mean :0.261
## 3rd Qu.:0.3400 3rd Qu.:0.360
## Max. :0.3400 Max. :0.400
##
## EITC Phase-In Rate 3 Dependents EITC Maximum Credit No Dependents
## Min. :0.173 Min. : 0.0
## 1st Qu.:0.184 1st Qu.: 0.0
## Median :0.195 Median : 0.0
## Mean :0.261 Mean :144.9
## 3rd Qu.:0.360 3rd Qu.:314.0
## Max. :0.400 Max. :323.0
##
## EITC Maximum Credit 1 Dependent EITC Maximum Credit 2 Dependents
## Min. :1192 Min. :1235
## 1st Qu.:1324 1st Qu.:1384
## Median :1434 Median :1511
## Mean :1672 Mean :2144
## 3rd Qu.:2094 3rd Qu.:3110
## Max. :2152 Max. :3556
##
## EITC Maximum Credit 3 Dependents EITC Phase-Out Rate No Dependents
## Min. :1235 Min. :0.00000
## 1st Qu.:1384 1st Qu.:0.00000
## Median :1511 Median :0.00000
## Mean :2144 Mean :0.03531
## 3rd Qu.:3110 3rd Qu.:0.07650
## Max. :3556 Max. :0.07650
##
## EITC Phase-Out Rate 1 Dependent EITC Phase-Out Rate 2 Dependents
## Min. :0.1193 Min. :0.1236
## 1st Qu.:0.1257 1st Qu.:0.1314
## Median :0.1321 Median :0.1393
## Mean :0.1413 Mean :0.1610
## 3rd Qu.:0.1598 3rd Qu.:0.2022
## Max. :0.1598 Max. :0.2106
##
## EITC Phase-Out Rate 3 Dependents State EITC Rate
## Min. :0.1236 Min. :0.00000
## 1st Qu.:0.1314 1st Qu.:0.00000
## Median :0.1393 Median :0.00000
## Mean :0.1610 Mean :0.01719
## 3rd Qu.:0.2022 3rd Qu.:0.00000
## Max. :0.2106 Max. :0.50000
##
## Wisconsin EITC Rate 2 Dependents Wisconsin EITC Rate 3 Dependents
## Min. :0.14 Min. :0.430
## 1st Qu.:0.16 1st Qu.:0.500
## Median :0.25 Median :0.750
## Mean :0.21 Mean :0.637
## 3rd Qu.:0.25 3rd Qu.:0.750
## Max. :0.25 Max. :0.750
## NA's :13618 NA's :13618
## Refundable State EITC (1=Yes) Federal Minimum Wage State Minimum Wage
## Min. :0.00000 Min. :4.250 Min. :1.600
## 1st Qu.:0.00000 1st Qu.:4.250 1st Qu.:4.250
## Median :0.00000 Median :4.250 Median :4.250
## Mean :0.07508 Mean :4.312 Mean :4.173
## 3rd Qu.:0.00000 3rd Qu.:4.250 3rd Qu.:4.250
## Max. :1.00000 Max. :4.750 Max. :5.250
##
## SSI recipients SSI recipients--Aged SSI recipients--Blind
## Min. : 3895 Min. : 669 Min. : 53
## 1st Qu.: 98186 1st Qu.: 20440 1st Qu.: 1150
## Median : 194087 Median : 40396 Median : 2526
## Mean : 327061 Mean : 94721 Mean : 5327
## 3rd Qu.: 444546 3rd Qu.:127750 3rd Qu.: 4041
## Max. :1044753 Max. :335845 Max. :22602
##
## SSI recipients--Disabled Medicaid beneficiaries WIC participation
## Min. : 3117 Min. : 36804 Min. : 9272
## 1st Qu.: 74249 1st Qu.: 486110 1st Qu.: 101836
## Median :156696 Median :1171548 Median : 204254
## Mean :227013 Mean :1683387 Mean : 285893
## 3rd Qu.:314428 3rd Qu.:2557701 3rd Qu.: 429818
## Max. :690960 Max. :5106746 Max. :1141598
##
## NSLP Free Participation NSLP Reduced Participation NSLP Total Participation
## Min. : 12897 Min. : 1287 Min. : 39686
## 1st Qu.: 174511 1st Qu.: 30995 1st Qu.: 418577
## Median : 350257 Median : 54701 Median : 915151
## Mean : 582448 Mean : 71593 Mean :1043125
## 3rd Qu.: 982000 3rd Qu.:114318 3rd Qu.:1659921
## Max. :1686979 Max. :176436 Max. :2414950
##
## SBP Free Participation SBP Reduced Participation SBP Total Participation
## Min. : 2339 Min. : 72.91 Min. : 3134
## 1st Qu.: 56525 1st Qu.: 3108.78 1st Qu.: 69656
## Median :126334 Median : 9133.00 Median :155581
## Mean :218987 Mean :11983.09 Mean :255324
## 3rd Qu.:335336 3rd Qu.:16856.62 3rd Qu.:403637
## Max. :695687 Max. :43708.94 Max. :773287
##
## State FIPS.y
## Length:13746 Min. : 1.00
## Class :character 1st Qu.:12.00
## Mode :character Median :28.00
## Mean :26.69
## 3rd Qu.:37.00
## Max. :56.00
##
df2 <- masterdf %>% group_by(State) %>% summarise(gsp = mean(`Gross State Product`),
unemplrate = mean(`Unemployment rate`),
foodsnapca = mean(`Food Stamp/SNAP Caseloads`))%>%arrange(State)
UKCPR %>% filter(state_name == "TX") %>% summarise(gsp = mean(`Gross State Product`),
unemplrate = mean(`Unemployment rate`),
foodsnapca = mean(`Food Stamp/SNAP Caseloads`))
## # A tibble: 1 x 3
## gsp unemplrate foodsnapca
## <dbl> <dbl> <dbl>
## 1 761949. 6.19 851503.
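A like-for-like version of this check compares one row per state-year on both sides, restricted to the years that appear in the merged data. The sketch below illustrates the idea on made-up numbers; `state_year` and `persons` are hypothetical stand-ins for UKCPR and the person-level file, and `gsp` is a shortened column name used only here.

```r
# Hypothetical state-year table (stand-in for UKCPR) and person-level table
state_year <- data.frame(state_name = "TX",
                         year = c(1980, 1991, 1992),
                         gsp  = c(100, 200, 300))
persons <- data.frame(state_name = "TX",
                      year = c(1991, 1991, 1992))

merged <- merge(persons, state_year, by = c("state_name", "year"), all.x = TRUE)

# Mean over merged person rows: each year is weighted by its respondents
mean_merged <- mean(merged$gsp)

# Mean over all state-year rows, including years outside the sample
mean_all_years <- mean(state_year$gsp)

# Like-for-like check: one row per state-year on both sides
merged_unique <- unique(merged[, c("state_name", "year", "gsp")])
source_subset <- state_year[state_year$year %in% merged$year, ]
same_values <- isTRUE(all.equal(merged_unique$gsp[order(merged_unique$year)],
                                source_subset$gsp[order(source_subset$year)]))
print(c(mean_merged = mean_merged, mean_all_years = mean_all_years))
print(same_values)
```

The two means differ even though the merge is correct, while the state-year comparison confirms that the merged values match the source.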
library(kableExtra)#table decor
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
# Make Table
kable(df2, caption = "Table 1. Mean estimates of Gross State Product, unemployment rate and food stamps/SNAP caseloads", digits=2, booktabs = FALSE, format.args = list(big.mark = ",")) %>%
kable_classic_2(full_width = F, html_font = "Times")%>%
kable_styling(font_size = 14)%>%
row_spec(dim(df2)[1], bold = F) %>% # format last row
column_spec(1, italic = F) # format first column
| State | gsp | unemplrate | foodsnapca |
|---|---|---|---|
| Alabama | 89,169.06 | 6.47 | 206,533.34 |
| Alaska | 23,443.08 | 8.17 | 13,135.22 |
| Arizona | 99,018.60 | 6.03 | 168,514.82 |
| Arkansas | 49,318.43 | 6.08 | 103,797.05 |
| California | 855,873.77 | 8.40 | 1,050,085.62 |
| Colorado | 100,359.42 | 4.84 | 101,995.88 |
| Connecticut | 113,906.59 | 6.20 | 90,624.55 |
| Delaware | 24,937.40 | 5.23 | 19,676.83 |
| District of Columbia | 45,352.18 | 8.37 | 39,768.01 |
| Florida | 314,892.67 | 6.70 | 556,805.46 |
| Georgia | 182,906.09 | 5.39 | 305,895.95 |
| Hawaii | 36,555.06 | 4.72 | 45,510.94 |
| Idaho | 24,147.69 | 5.79 | 27,908.67 |
| Illinois | 331,483.19 | 6.60 | 483,010.42 |
| Indiana | 133,785.40 | 5.43 | 160,315.60 |
| Iowa | 66,081.13 | 4.09 | 74,715.56 |
| Kansas | 60,386.73 | 4.54 | 69,133.02 |
| Kentucky | 83,057.24 | 6.22 | 191,444.26 |
| Louisiana | 103,603.15 | 7.38 | 270,868.98 |
| Maine | 25,665.21 | 6.89 | 57,770.99 |
| Maryland | 130,196.23 | 5.64 | 156,356.43 |
| Massachusetts | 177,550.78 | 7.04 | 181,149.15 |
| Michigan | 231,593.37 | 7.18 | 416,329.34 |
| Minnesota | 122,547.20 | 4.56 | 127,937.96 |
| Mississippi | 49,203.00 | 6.99 | 189,674.68 |
| Missouri | 125,616.12 | 5.69 | 224,494.86 |
| Montana | 16,037.04 | 6.05 | 26,402.44 |
| Nebraska | 43,686.56 | 2.70 | 43,438.54 |
| Nevada | 42,243.22 | 6.06 | 39,889.81 |
| New Hampshire | 28,389.83 | 5.89 | 23,725.55 |
| New Jersey | 241,801.25 | 7.21 | 209,958.60 |
| New Mexico | 39,081.02 | 6.91 | 81,575.30 |
| New York | 549,996.13 | 7.20 | 927,157.24 |
| North Carolina | 170,360.37 | 5.14 | 240,494.74 |
| North Dakota | 13,549.39 | 4.01 | 17,122.74 |
| Ohio | 268,945.71 | 6.12 | 514,589.21 |
| Oklahoma | 67,226.84 | 5.43 | 140,837.49 |
| Oregon | 72,634.08 | 6.19 | 120,557.08 |
| Pennsylvania | 285,869.32 | 6.69 | 499,960.64 |
| Rhode Island | 24,210.56 | 7.22 | 39,025.45 |
| South Carolina | 77,570.54 | 6.28 | 134,900.44 |
| South Dakota | 16,010.47 | 3.31 | 18,897.56 |
| Tennessee | 121,560.84 | 5.81 | 282,316.16 |
| Texas | 461,910.03 | 6.69 | 905,784.23 |
| Utah | 40,723.39 | 4.10 | 44,023.96 |
| Vermont | 12,837.62 | 5.62 | 23,840.18 |
| Virginia | 171,340.52 | 5.25 | 213,824.92 |
| Washington | 144,249.79 | 6.68 | 186,592.98 |
| West Virginia | 33,040.72 | 9.54 | 118,846.69 |
| Wisconsin | 121,594.23 | 4.70 | 115,576.08 |
| Wyoming | 14,098.98 | 5.11 | 12,425.16 |
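I compute the share of single women with no children, one child, and two or more children, along with mean earnings (unconditional, and conditional on having positive earnings), for all states combined and for each state: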
masterdf$earn_new <- ifelse(masterdf$earn == 0, NA, masterdf$earn)
a <- masterdf %>% summarise(no_children = mean(children == 0),
one_child = mean(children == 1),
two_more_children = mean(children >= 2),
earn = mean(earn),
earn2 = mean(earn_new, na.rm = TRUE))
b <- masterdf %>% group_by(State) %>% summarise(no_children = mean(children == 0),
one_child = mean(children == 1),
two_more_children = mean(children >= 2),
earn = mean(earn),
earn2 = mean(earn_new, na.rm = TRUE))
kable(a, caption = "Table 2. Mean estimates of children and earnings for single women", digits=2, booktabs = FALSE, format.args = list(big.mark = ",")) %>%
kable_classic_2(full_width = F, html_font = "Times")%>%
kable_styling(font_size = 14)%>%
row_spec(dim(a)[1], bold = F) %>% # format last row
column_spec(1, italic = F) # format first column
| no_children | one_child | two_more_children | earn | earn2 |
|---|---|---|---|---|
| 0.43 | 0.22 | 0.35 | 10,432.48 | 17,072 |
kable(b, caption = "Table 3. Mean estimates of children and earnings for single women by state", digits=2, booktabs = FALSE, format.args = list(big.mark = ",")) %>%
kable_classic_2(full_width = F, html_font = "Times")%>%
kable_styling(font_size = 14)%>%
row_spec(dim(b)[1], bold = F) %>% # format last row
column_spec(1, italic = F) # format first column
| State | no_children | one_child | two_more_children | earn | earn2 |
|---|---|---|---|---|---|
| Alabama | 0.47 | 0.21 | 0.32 | 7,958.87 | 14,604.93 |
| Alaska | 0.45 | 0.31 | 0.24 | 13,442.20 | 21,208.80 |
| Arizona | 0.34 | 0.23 | 0.44 | 10,101.77 | 15,974.89 |
| Arkansas | 0.48 | 0.24 | 0.29 | 7,730.55 | 11,068.75 |
| California | 0.36 | 0.25 | 0.39 | 12,425.22 | 19,371.46 |
| Colorado | 0.50 | 0.19 | 0.30 | 11,740.76 | 18,144.81 |
| Connecticut | 0.39 | 0.21 | 0.39 | 9,751.59 | 15,602.54 |
| Delaware | 0.49 | 0.30 | 0.20 | 10,605.23 | 15,670.41 |
| District of Columbia | 0.44 | 0.23 | 0.33 | 11,510.66 | 18,225.21 |
| Florida | 0.45 | 0.22 | 0.34 | 10,511.17 | 16,057.49 |
| Georgia | 0.55 | 0.18 | 0.27 | 11,430.47 | 16,255.02 |
| Hawaii | 0.48 | 0.17 | 0.35 | 22,252.16 | 34,967.68 |
| Idaho | 0.36 | 0.28 | 0.36 | 10,634.77 | 14,857.40 |
| Illinois | 0.45 | 0.19 | 0.36 | 12,402.17 | 21,970.61 |
| Indiana | 0.49 | 0.21 | 0.30 | 14,130.41 | 20,115.06 |
| Iowa | 0.50 | 0.24 | 0.26 | 11,439.76 | 14,490.36 |
| Kansas | 0.45 | 0.24 | 0.32 | 12,422.72 | 16,498.92 |
| Kentucky | 0.37 | 0.32 | 0.31 | 5,743.84 | 11,629.50 |
| Louisiana | 0.38 | 0.23 | 0.38 | 5,865.05 | 12,791.39 |
| Maine | 0.58 | 0.22 | 0.20 | 5,546.43 | 11,709.13 |
| Maryland | 0.47 | 0.19 | 0.34 | 10,497.08 | 17,858.67 |
| Massachusetts | 0.43 | 0.20 | 0.37 | 10,888.22 | 21,546.00 |
| Michigan | 0.44 | 0.21 | 0.35 | 11,498.88 | 20,166.60 |
| Minnesota | 0.39 | 0.21 | 0.39 | 6,030.81 | 10,194.94 |
| Mississippi | 0.40 | 0.19 | 0.41 | 7,899.70 | 11,804.66 |
| Missouri | 0.50 | 0.29 | 0.21 | 6,795.89 | 10,746.99 |
| Montana | 0.38 | 0.22 | 0.40 | 5,196.28 | 8,700.75 |
| Nebraska | 0.55 | 0.13 | 0.32 | 13,114.14 | 17,410.15 |
| Nevada | 0.47 | 0.18 | 0.35 | 15,925.20 | 20,218.08 |
| New Hampshire | 0.46 | 0.23 | 0.31 | 9,758.81 | 16,661.39 |
| New Jersey | 0.46 | 0.21 | 0.33 | 13,491.53 | 21,724.38 |
| New Mexico | 0.38 | 0.19 | 0.43 | 6,291.58 | 10,166.27 |
| New York | 0.40 | 0.22 | 0.38 | 9,027.18 | 19,524.74 |
| North Carolina | 0.51 | 0.25 | 0.24 | 11,096.30 | 14,943.93 |
| North Dakota | 0.55 | 0.22 | 0.22 | 9,402.82 | 15,365.59 |
| Ohio | 0.41 | 0.19 | 0.40 | 7,962.79 | 13,057.75 |
| Oklahoma | 0.48 | 0.28 | 0.24 | 6,810.53 | 11,322.51 |
| Oregon | 0.49 | 0.25 | 0.26 | 10,239.57 | 15,169.73 |
| Pennsylvania | 0.51 | 0.24 | 0.25 | 9,644.32 | 19,103.17 |
| Rhode Island | 0.41 | 0.23 | 0.36 | 12,565.55 | 18,345.70 |
| South Carolina | 0.44 | 0.21 | 0.35 | 11,512.95 | 17,745.53 |
| South Dakota | 0.52 | 0.23 | 0.24 | 9,270.93 | 13,449.38 |
| Tennessee | 0.49 | 0.18 | 0.33 | 10,031.12 | 16,600.17 |
| Texas | 0.42 | 0.22 | 0.36 | 9,918.50 | 14,040.87 |
| Utah | 0.35 | 0.19 | 0.46 | 14,760.41 | 22,632.63 |
| Vermont | 0.40 | 0.19 | 0.40 | 7,558.87 | 12,283.17 |
| Virginia | 0.55 | 0.21 | 0.24 | 12,812.53 | 17,807.24 |
| Washington | 0.43 | 0.26 | 0.31 | 8,705.69 | 14,565.29 |
| West Virginia | 0.52 | 0.20 | 0.28 | 5,692.65 | 10,849.52 |
| Wisconsin | 0.48 | 0.20 | 0.31 | 8,890.23 | 12,236.02 |
| Wyoming | 0.49 | 0.24 | 0.27 | 5,536.51 | 9,108.45 |
Though these estimates are purely descriptive, it is evident that women without children make up 43 percent of the total sample. The estimates are also quite telling across categories. For instance, Maine has the highest proportion of single women without children, while Kentucky and Utah have the largest shares with one child and with two or more children, respectively. Further, when earnings are conditioned on being positive, Hawaii has the largest earnings per woman.
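The gap between the `earn` and `earn2` columns comes entirely from the zeros. A small sketch on made-up numbers (the vector `earn_toy` is hypothetical) shows how the two means relate. Note that the recoding conditions on positive earnings, which need not coincide with the `work` indicator: the fourth row printed by `head(eitcRR)` has positive earnings but `work` equal to 0.

```r
# Hypothetical earnings vector; zeros stand for women with no earnings
earn_toy <- c(0, 0, 10000, 20000)

# Unconditional mean keeps the zeros
mean_all <- mean(earn_toy)

# Conditional mean drops the zeros, as the earn_new recoding does
earn_pos <- ifelse(earn_toy == 0, NA, earn_toy)
mean_positive <- mean(earn_pos, na.rm = TRUE)

# The share with positive earnings links the two means:
# mean_all equals share_pos * mean_positive
share_pos <- mean(earn_toy > 0)
print(c(mean_all = mean_all, mean_positive = mean_positive, share_pos = share_pos))
```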
masterdf$post <- ifelse(masterdf$year >= 1993,1,0)
masterdf$treatment <- ifelse(masterdf$children >= 1,1,0)
masterdf$did <- masterdf$post * masterdf$treatment
summary(masterdf$treatment)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.5688 1.0000 1.0000
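The three indicators correspond to the standard two-group, two-period DiD setup. Written as a linear probability model (the symbols are notation introduced here for illustration, not taken from the assignment), the coefficient on the interaction is the DiD estimate, where $\bar{Y}_{t,p}$ denotes mean employment for treatment status $t$ and period $p$:

```latex
\text{work}_i = \beta_0 + \beta_1\,\text{treatment}_i + \beta_2\,\text{post}_i
              + \beta_3\,(\text{treatment}_i \times \text{post}_i) + \varepsilon_i,
\qquad
\beta_3 = \left(\bar{Y}_{1,1} - \bar{Y}_{0,1}\right) - \left(\bar{Y}_{1,0} - \bar{Y}_{0,0}\right)
```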
The graph below indicates that the series decline in the years prior to the expansion. So yes, this may pose difficulties for the DiD comparison.
library(ggplot2)
ggplot(masterdf, aes(x = year,
y = `Unemployment rate`,
color = as.factor(treatment))) +
stat_summary(geom = 'line') +
geom_vline(xintercept = 1993) +
theme_minimal()
## No summary function supplied, defaulting to `mean_se()`
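The plot above charts the state unemployment rate by group. A pre-trend check for DiD is usually run on the outcome itself (here `work`), so a reader may also want the yearly group means of the outcome. The sketch below computes them with base R on made-up data; the data frame `toy` is hypothetical.

```r
# Hypothetical person-level data: year, treatment group, and a 0/1 work outcome
toy <- data.frame(
  year      = c(1991, 1991, 1991, 1991, 1992, 1992, 1992, 1992),
  treatment = c(0, 0, 1, 1, 0, 0, 1, 1),
  work      = c(1, 0, 0, 0, 1, 1, 1, 0)
)

# Mean of the outcome for each year-by-group cell
trend <- aggregate(work ~ year + treatment, data = toy, FUN = mean)

# Gap between groups in each year; roughly constant pre-period gaps
# are what the parallel-trends assumption relies on
wide <- reshape(trend, idvar = "year", timevar = "treatment", direction = "wide")
wide$gap <- wide$work.1 - wide$work.0
print(wide)
```

In this toy example the gap is the same in both years, which is the pattern parallel trends would produce.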
The results below indicate that the change is much more pronounced for women with children, whose employment rate rose from about .450 before the expansion to .476 afterward, a difference of almost 3 percentage points. The change is marginal for women without children (.577 to .573).
didunco <- masterdf %>%
group_by(post, treatment) %>%
summarize(women = mean(work))
## `summarise()` has grouped output by 'post'. You can override using the `.groups` argument.
didunco
## # A tibble: 4 x 3
## # Groups: post [2]
## post treatment women
## <dbl> <dbl> <dbl>
## 1 0 0 0.577
## 2 0 1 0.450
## 3 1 0 0.573
## 4 1 1 0.476
# Compute the four data points needed in the DID calculation:
a = sapply(subset(masterdf, post == 0 & treatment == 0, select=work), mean)
b = sapply(subset(masterdf, post == 0 & treatment == 1, select=work), mean)
c = sapply(subset(masterdf, post == 1 & treatment == 0, select=work), mean)
d = sapply(subset(masterdf, post == 1 & treatment == 1, select=work), mean)
# Compute the effect of the EITC on the employment of women with children:
(d-c)-(b-a)
## work
## 0.03112796
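The same number can be recovered as the interaction coefficient of a linear model that includes both indicators and their product, because that model is saturated and reproduces the four cell means exactly. A minimal sketch on made-up data (the data frame `toy` and the helper `cell_mean` are hypothetical):

```r
# Hypothetical 2x2 data: two rows per cell are enough to illustrate
toy <- data.frame(
  post      = c(0, 0, 0, 0, 1, 1, 1, 1),
  treatment = c(0, 0, 1, 1, 0, 0, 1, 1),
  work      = c(1, 0, 0, 0, 1, 0, 1, 0)
)

cell_mean <- function(p, t) mean(toy$work[toy$post == p & toy$treatment == t])

# Manual DiD: (treated - control) after, minus (treated - control) before
did_manual <- (cell_mean(1, 1) - cell_mean(1, 0)) -
              (cell_mean(0, 1) - cell_mean(0, 0))

# Same quantity as the interaction coefficient of a saturated linear model
fit <- lm(work ~ treatment * post, data = toy)
did_lm <- unname(coef(fit)["treatment:post"])

print(c(did_manual = did_manual, did_lm = did_lm))
```

A regression version has the added benefit of producing a standard error for the estimate.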
The DiD interaction term has a positive association with the outcome. In this model it is marginally insignificant at the 5 percent level (p ≈ 0.059), though it is significant at the 10 percent level.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v tibble 3.1.0 v purrr 0.3.4
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x kableExtra::group_rows() masks dplyr::group_rows()
## x dplyr::lag() masks stats::lag()
library(tidyverse) # ggplot(), %>%, mutate(), and friends
library(scales) # Format numbers with functions like comma(), percent(), and dollar()
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(broom) # Convert models to data frames
model1 <- glm(work ~ treatment + post + did + nonwhite + age,
data = masterdf, family = "binomial")
tidy(model1)
## # A tibble: 6 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.240 0.0817 2.94 3.32e- 3
## 2 treatment -0.453 0.0585 -7.74 9.64e-15
## 3 post -0.0147 0.0548 -0.268 7.89e- 1
## 4 did 0.136 0.0723 1.89 5.90e- 2
## 5 nonwhite -0.260 0.0356 -7.30 2.90e-13
## 6 age 0.00533 0.00177 3.01 2.59e- 3
summary(model1)
##
## Call:
## glm(formula = work ~ treatment + post + did + nonwhite + age,
## family = "binomial", data = masterdf)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4081 -1.1633 0.9708 1.1415 1.3364
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.240028 0.081734 2.937 0.00332 **
## treatment -0.453034 0.058502 -7.744 9.64e-15 ***
## post -0.014673 0.054818 -0.268 0.78895
## did 0.136435 0.072250 1.888 0.05898 .
## nonwhite -0.259838 0.035599 -7.299 2.90e-13 ***
## age 0.005327 0.001768 3.012 0.00259 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 19047 on 13745 degrees of freedom
## Residual deviance: 18823 on 13740 degrees of freedom
## AIC: 18835
##
## Number of Fisher Scoring iterations: 4
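The `did` coefficient of 0.136 is on the log-odds scale, so it is not directly comparable with the 0.031 difference in employment rates computed earlier. One way to move to the probability scale is to take differences of fitted probabilities. The sketch below does this for a logit without covariates on made-up cell counts; the data frame `cells` is hypothetical, and with covariates such as `nonwhite` and `age` the fitted probabilities would have to be evaluated at chosen covariate values.

```r
# Hypothetical cell counts: number working and not working in each cell
cells <- data.frame(
  post      = c(0, 0, 1, 1),
  treatment = c(0, 1, 0, 1),
  yes       = c(58, 45, 57, 48),
  no        = c(42, 55, 43, 52)
)

# Logit fit on grouped data; treatment * post makes the model saturated
fit <- glm(cbind(yes, no) ~ treatment * post, data = cells, family = binomial)

# Fitted employment probability in each cell (same row order as `cells`)
p <- unname(predict(fit, newdata = cells, type = "response"))

# DiD on the probability scale, from the fitted probabilities
did_prob <- (p[4] - p[3]) - (p[2] - p[1])

# The interaction coefficient itself is a difference in log-odds
did_logodds <- unname(coef(fit)["treatment:post"])

print(c(did_prob = did_prob, did_logodds = did_logodds))
```

The two numbers answer different questions: the first is a change in employment probability, the second a change in log-odds.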
model2 <- glm(work ~ treatment + post + did + nonwhite + age + `Gross State Product` + `Food Stamp/SNAP Caseloads`, data = masterdf, family = "binomial")
tidy(model2)
## # A tibble: 8 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.282 0.0830 3.40 6.81e- 4
## 2 treatment -0.453 0.0586 -7.73 1.10e-14
## 3 post 0.0178 0.0551 0.322 7.47e- 1
## 4 did 0.137 0.0723 1.90 5.79e- 2
## 5 nonwhite -0.226 0.0375 -6.01 1.83e- 9
## 6 age 0.00570 0.00177 3.21 1.31e- 3
## 7 `Gross State Product` 0.000000930 0.000000194 4.80 1.63e- 6
## 8 `Food Stamp/SNAP Caseloads` -0.000000816 0.000000142 -5.75 8.70e- 9
summary(model2)
##
## Call:
## glm(formula = work ~ treatment + post + did + nonwhite + age +
## `Gross State Product` + `Food Stamp/SNAP Caseloads`, family = "binomial",
## data = masterdf)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.465 -1.165 0.947 1.143 1.450
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.821e-01 8.303e-02 3.397 0.000681 ***
## treatment -4.525e-01 5.856e-02 -7.727 1.10e-14 ***
## post 1.778e-02 5.515e-02 0.322 0.747079
## did 1.372e-01 7.234e-02 1.897 0.057852 .
## nonwhite -2.256e-01 3.752e-02 -6.012 1.83e-09 ***
## age 5.695e-03 1.772e-03 3.213 0.001313 **
## `Gross State Product` 9.297e-07 1.939e-07 4.795 1.63e-06 ***
## `Food Stamp/SNAP Caseloads` -8.163e-07 1.419e-07 -5.754 8.70e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 19047 on 13745 degrees of freedom
## Residual deviance: 18786 on 13738 degrees of freedom
## AIC: 18802
##
## Number of Fisher Scoring iterations: 4
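The coefficients on Gross State Product and Food Stamp/SNAP Caseloads are of order 1e-07 because those variables enter in raw units. Rescaling a regressor changes its coefficient by the same factor and leaves the fit unchanged, which can make the table easier to read. A sketch on made-up data (the data frame `toy` and the column `x_100k` are hypothetical):

```r
# Hypothetical data: x is in raw units (order of 1e5), y is a 0/1 outcome
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8) * 1e5,
  y = c(0, 0, 1, 0, 1, 0, 1, 1)
)
toy$x_100k <- toy$x / 1e5   # same variable, measured in units of 100,000

fit_raw    <- glm(y ~ x,      data = toy, family = binomial)
fit_scaled <- glm(y ~ x_100k, data = toy, family = binomial)

# The slope scales by the unit change; the deviance is unchanged
slope_raw    <- unname(coef(fit_raw)["x"])
slope_scaled <- unname(coef(fit_scaled)["x_100k"])

print(c(slope_raw = slope_raw, slope_scaled = slope_scaled))
```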
The DiD estimate for the placebo is not significant (p ≈ 0.68). Even so, employment declines across the artificial policy date for both groups: from .583 to .571 for women without children and from .460 to .438 for women with children.
mastplac <- masterdf %>% filter(year < 1994)
mastplac$post_placebo <- ifelse(mastplac$year >= 1992,1,0)
mastplac$treatment <- ifelse(mastplac$children >= 1,1,0)
mastplac$did <- mastplac$post_placebo * mastplac$treatment
summary(mastplac$treatment)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.5738 1.0000 1.0000
library(ggplot2)
ggplot(mastplac, aes(x = year,
y = `Unemployment rate`,
color = as.factor(treatment))) +
stat_summary(geom = 'line') +
geom_vline(xintercept = 1992) +
theme_minimal()
## No summary function supplied, defaulting to `mean_se()`
diduncob <- mastplac %>%
group_by(post_placebo, treatment) %>%
summarize(women = mean(work))
## `summarise()` has grouped output by 'post_placebo'. You can override using the `.groups` argument.
diduncob
## # A tibble: 4 x 3
## # Groups: post_placebo [2]
## post_placebo treatment women
## <dbl> <dbl> <dbl>
## 1 0 0 0.583
## 2 0 1 0.460
## 3 1 0 0.571
## 4 1 1 0.438
# Compute the four data points needed in the DID calculation:
a = sapply(subset(mastplac, post_placebo == 0 & treatment == 0, select=work), mean)
b = sapply(subset(mastplac, post_placebo == 0 & treatment == 1, select=work), mean)
c = sapply(subset(mastplac, post_placebo == 1 & treatment == 0, select=work), mean)
d = sapply(subset(mastplac, post_placebo == 1 & treatment == 1, select=work), mean)
# Compute the placebo DiD: the post-period gap between women with and without
# children minus the pre-period gap (no real policy change occurs in 1992):
(d-c)-(b-a)
## work
## -0.01012815
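The same arithmetic can be sketched directly from the four cell means printed in the `diduncob` table. The block below is a minimal illustration that uses the rounded means from that table rather than the underlying data, so it matches the result above only to about three decimals; the data frame `cells` and the helper `cell_mean()` are hypothetical names introduced here.

```r
# Cell means copied from the `diduncob` table above (rounded to three decimals).
cells <- data.frame(
  post_placebo = c(0, 0, 1, 1),
  treatment    = c(0, 1, 0, 1),
  women        = c(0.583, 0.460, 0.571, 0.438)
)

# Hypothetical helper: look up the mean employment rate for one cell.
cell_mean <- function(post, treat) {
  cells$women[cells$post_placebo == post & cells$treatment == treat]
}

# Change over time within each group.
change_treated <- cell_mean(1, 1) - cell_mean(0, 1)  # women with children
change_control <- cell_mean(1, 0) - cell_mean(0, 0)  # women without children

# Placebo DiD: change for the treated minus change for the controls.
did_placebo <- change_treated - change_control
print(did_placebo)
```

Writing the estimate as a difference of changes over time gives the same number as the difference of gaps used in `(d-c)-(b-a)`, because the four terms only get rearranged.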
model11 <- glm(work ~ treatment + post_placebo + did + nonwhite + age,
data = mastplac, family = "binomial")
tidy(model11)
## # A tibble: 6 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.251 0.113 2.23 0.0258
## 2 treatment -0.432 0.0816 -5.30 0.000000118
## 3 post_placebo -0.0442 0.0757 -0.583 0.560
## 4 did -0.0412 0.0996 -0.414 0.679
## 5 nonwhite -0.251 0.0480 -5.22 0.000000176
## 6 age 0.00545 0.00242 2.25 0.0243
summary(model11)
##
## Call:
## glm(formula = work ~ treatment + post_placebo + did + nonwhite +
## age, family = "binomial", data = mastplac)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4162 -1.1364 0.9561 1.1547 1.3549
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.251486 0.112812 2.229 0.0258 *
## treatment -0.432176 0.081592 -5.297 1.18e-07 ***
## post_placebo -0.044167 0.075734 -0.583 0.5598
## did -0.041250 0.099564 -0.414 0.6787
## nonwhite -0.250845 0.048026 -5.223 1.76e-07 ***
## age 0.005448 0.002419 2.252 0.0243 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10260 on 7400 degrees of freedom
## Residual deviance: 10104 on 7395 degrees of freedom
## AIC: 10116
##
## Number of Fisher Scoring iterations: 4
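Because `model11` is a logit, the `did` coefficient is on the log-odds scale and is not directly comparable to the raw difference in employment rates computed earlier. One common way to read it is as an odds ratio with a Wald confidence interval. The sketch below uses only the rounded estimate and standard error printed above; the names `est`, `se`, `odds_ratio`, and `ci` are assumptions of this example.

```r
# `did` estimate and standard error copied from summary(model11) above.
est <- -0.041250
se  <- 0.099564

# Exponentiating a log-odds coefficient gives an odds ratio.
odds_ratio <- exp(est)

# Approximate 95% Wald interval, built on the log-odds scale and then
# exponentiated so that it applies to the odds ratio.
ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)

print(round(c(odds_ratio = odds_ratio, lower = ci[1], upper = ci[2]), 3))
```

An interval that contains 1 is consistent with the insignificant placebo coefficient reported in the table.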
model211 <- glm(work ~ treatment + post_placebo + did + nonwhite + age +
                  `Gross State Product` + `Food Stamp/SNAP Caseloads`,
                data = mastplac, family = "binomial")
tidy(model211)
## # A tibble: 8 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.270 0.115 2.36 0.0184
## 2 treatment -0.428 0.0816 -5.24 0.000000163
## 3 post_placebo -0.00569 0.0767 -0.0742 0.941
## 4 did -0.0466 0.0996 -0.468 0.640
## 5 nonwhite -0.232 0.0510 -4.55 0.00000546
## 6 age 0.00579 0.00242 2.39 0.0169
## 7 `Gross State Product` 0.000000686 0.000000247 2.77 0.00552
## 8 `Food Stamp/SNAP Caseloads` -0.000000606 0.000000187 -3.23 0.00122
summary(model211)
##
## Call:
## glm(formula = work ~ treatment + post_placebo + did + nonwhite +
## age + `Gross State Product` + `Food Stamp/SNAP Caseloads`,
## family = "binomial", data = mastplac)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4516 -1.1391 0.9322 1.1665 1.4454
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.702e-01 1.146e-01 2.358 0.01835 *
## treatment -4.276e-01 8.164e-02 -5.237 1.63e-07 ***
## post_placebo -5.687e-03 7.668e-02 -0.074 0.94088
## did -4.660e-02 9.964e-02 -0.468 0.64002
## nonwhite -2.319e-01 5.101e-02 -4.546 5.46e-06 ***
## age 5.793e-03 2.424e-03 2.390 0.01687 *
## `Gross State Product` 6.864e-07 2.474e-07 2.775 0.00552 **
## `Food Stamp/SNAP Caseloads` -6.056e-07 1.872e-07 -3.235 0.00122 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10260 on 7400 degrees of freedom
## Residual deviance: 10094 on 7393 degrees of freedom
## AIC: 10110
##
## Number of Fisher Scoring iterations: 4
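Since `model211` nests `model11` (it adds the two state-level controls on the same sample), the two placebo models can be compared with a likelihood-ratio test on the drop in residual deviance. The sketch below uses the rounded deviances and degrees of freedom printed in the two summaries, so the p-value is approximate; the variable names are assumptions of this example.

```r
# Residual deviances and degrees of freedom as printed in the summaries above.
dev_small <- 10104; df_small <- 7395   # model11 (no state-level controls)
dev_large <- 10094; df_large <- 7393   # model211 (adds the two controls)

# Likelihood-ratio statistic: the reduction in residual deviance.
lr_stat <- dev_small - dev_large
lr_df   <- df_small - df_large

# Upper-tail chi-squared probability for the test statistic.
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(statistic = lr_stat, df = lr_df, p.value = lr_p))
```

With the fitted objects still in memory, `anova(model11, model211, test = "Chisq")` would run the same comparison on the unrounded deviances.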