Question 1
Using the manual difference-in-differences (DID) calculation, we find that the coefficient of interest is 2.75. This is the estimated effect of the policy on full-time-equivalent employment (fte).
library(dplyr)
# remove observations with a missing fte value
cards_new <- filter(cards, !is.na(fte))
# manual DID: mean fte by state (nj) and period (d)
cards_did <- group_by(cards_new, nj, d)
summarise(cards_did,
          cards_g = mean(fte))
## # A tibble: 4 x 3
## # Groups: nj [2]
## nj d cards_g
## <dbl> <dbl> <dbl>
## 1 0 0 23.3
## 2 0 1 21.2
## 3 1 0 20.4
## 4 1 1 21.0
## [1] 2.75361
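The code that printed `[1] 2.75361` is not shown above. Below is a minimal sketch of the same calculation, (NJ after - NJ before) - (control after - control before). It assumes `nj = 1` marks New Jersey stores and `d = 1` marks the post-policy period, and `did_from_means` and `cards_means` are hypothetical names. It is written in base R so that it runs without the `cards` data. Because it uses the rounded group means from the tibble, it returns about 2.7 rather than the unrounded 2.75361.

```r
# Manual difference-in-differences from the four group means.
# Assumption: nj = 1 is the treated state, d = 1 is the post-policy period.
did_from_means <- function(means, value = "cards_g") {
  # pick the mean for one (nj, d) cell
  cell <- function(nj, d) means[[value]][means$nj == nj & means$d == d]
  (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))
}

# Group means as printed (rounded to one decimal) in the tibble above
cards_means <- data.frame(
  nj      = c(0, 0, 1, 1),
  d       = c(0, 1, 0, 1),
  cards_g = c(23.3, 21.2, 20.4, 21.0)
)

did_from_means(cards_means)
```

With the real data, passing the unrounded output of `summarise()` instead of `cards_means` gives the 2.75361 shown above.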
Question 2
The coefficient of interest is the interaction term nj:d, which is the difference-in-differences estimator. Its estimate is 2.75, the same as the manual DID in Question 1, but it is not statistically significant at the 10% level (p = 0.1033).
##
## Call:
## lm(formula = fte ~ nj * d, data = cards_new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.166 -6.439 -1.027 4.473 64.561
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.331 1.072 21.767 <2e-16 ***
## nj -2.892 1.194 -2.423 0.0156 *
## d -2.166 1.516 -1.429 0.1535
## nj:d 2.754 1.688 1.631 0.1033
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.406 on 790 degrees of freedom
## Multiple R-squared: 0.007401, Adjusted R-squared: 0.003632
## F-statistic: 1.964 on 3 and 790 DF, p-value: 0.118
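As a check that the interaction is the DID estimator, the sketch below rebuilds the four cell means from the printed coefficients using fte = b0 + b1*nj + b2*d + b3*nj*d. The implied means (23.331, 21.165, 20.439, 21.027) round to the group means in Question 1, and their difference-in-differences equals the nj:d estimate of 2.754. The vector `b` and the `m_*` names are illustrative, not from the original code.

```r
# Coefficient estimates copied from the summary(lm(fte ~ nj * d)) output above
b <- c(intercept = 23.331, nj = -2.892, d = -2.166, nj_d = 2.754)

# Cell means implied by fte = b0 + b1*nj + b2*d + b3*nj*d
m_00 <- b[["intercept"]]                                      # nj = 0, d = 0
m_01 <- b[["intercept"]] + b[["d"]]                            # nj = 0, d = 1
m_10 <- b[["intercept"]] + b[["nj"]]                           # nj = 1, d = 0
m_11 <- b[["intercept"]] + b[["nj"]] + b[["d"]] + b[["nj_d"]]  # nj = 1, d = 1

# Difference-in-differences of the implied means recovers b3
did <- (m_11 - m_10) - (m_01 - m_00)
round(c(m_00 = m_00, m_01 = m_01, m_10 = m_10, m_11 = m_11, did = did), 3)
```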
Question 3
The answer does not change qualitatively. The coefficient falls slightly, from 2.75 to 2.55, and it is still not statistically significant (p = 0.1481), so we again cannot conclude that the policy had a significant causal effect. The result changes only quantitatively.
Solution:
First we set the missing (NA) values of fte to 0. Then we re-run the regression on the full cards data, which now has 820 observations instead of 794. The result is below.
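The recoding code itself is not shown. Below is a minimal sketch, assuming `fte` is a numeric column. The helper `zero_fill_fte` is hypothetical, and the small data frame `toy` is made-up illustrative data standing in for `cards`. Note that this codes a missing fte as zero employment rather than dropping that observation.

```r
# Replace every missing fte with 0 (hypothetical helper)
zero_fill_fte <- function(df) {
  df$fte[is.na(df$fte)] <- 0
  df
}

# Hypothetical toy data standing in for `cards`
toy <- data.frame(
  nj  = c(0, 0, 1, 1, 0, 0, 1, 1),
  d   = c(0, 1, 0, 1, 0, 1, 0, 1),
  fte = c(20, NA, 18, 21, 24, 22, NA, 20)
)

toy_filled <- zero_fill_fte(toy)
fit <- lm(fte ~ nj * d, data = toy_filled)
coef(fit)
```

On the real data the equivalent step would be `cards <- zero_fill_fte(cards)` followed by `summary(lm(fte ~ nj * d, data = cards))`.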
##
## Call:
## lm(formula = fte ~ nj * d, data = cards)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.741 -6.322 -0.697 4.885 65.178
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.741 1.121 20.289 <2e-16 ***
## nj -2.919 1.247 -2.340 0.0195 *
## d -2.111 1.585 -1.332 0.1834
## nj:d 2.554 1.764 1.448 0.1481
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.962 on 816 degrees of freedom
## Multiple R-squared: 0.006773, Adjusted R-squared: 0.003121
## F-statistic: 1.855 on 3 and 816 DF, p-value: 0.1358
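The significance conclusions in Questions 2 and 3 can be checked from the printed numbers alone. The t statistic is estimate / SE, and the two-sided p-value is 2 * P(T > |t|) using each model's residual degrees of freedom (790 and 816). The helper `interaction_p` is a hypothetical name. Because the inputs are rounded, the recomputed p-values can differ from the printed 0.1033 and 0.1481 in the fourth decimal.

```r
# Two-sided p-value for a coefficient from its estimate, SE and residual df
interaction_p <- function(estimate, se, df) {
  t_stat <- estimate / se
  2 * pt(-abs(t_stat), df = df)
}

p_dropped <- interaction_p(2.754, 1.688, 790)  # NA rows removed (Question 2)
p_zeroed  <- interaction_p(2.554, 1.764, 816)  # NA set to 0 (Question 3)

# Both exceed 0.10, so neither interaction is significant at the 10% level
c(dropped = p_dropped, zeroed = p_zeroed) > 0.10
```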