Paper: Krueger (1999) Experimental Estimates of Education Production Functions QJE 114(2): 497-532
The paper evaluates the causal impact of class size on student performance. The main hypothesis is that students in smaller classes achieve better learning outcomes than those in larger classes.
In the ideal experiment to test the causal impact of class size on student performance, students with similar cognitive characteristics, enrolled in similar schools with similar teaching conditions, would be randomly assigned to classes of different sizes. The students would be observed over a fixed period during which they receive the same educational inputs in every respect except class size. At the end of the study period, all students would take the same tests, and any difference in test scores across groups could be attributed solely to the difference in class size.
The author uses a randomized controlled trial of an education intervention in Tennessee that initially assigned kindergarten students to three study groups: small classes (treatment 1), regular-size classes with a teacher aide (treatment 2), and regular-size classes without a teacher aide (control). The intervention lasted four years. At the end of each year, students took standardized tests, and the performance of the study groups was compared. The class-size effect is estimated by comparing treatment 1 with the control group, and the teacher-aide effect by comparing treatment 2 with the control group.
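Under random assignment, the comparisons described above are simple differences in mean test scores, and an OLS regression of scores on treatment dummies reproduces them exactly. The sketch below uses simulated, hypothetical scores (not the experimental data); the group labels, sample sizes, and effect sizes are assumptions chosen only for illustration.

```r
# Hypothetical simulated data, NOT the experimental data used in the paper.
set.seed(1)
n_per_group <- 200
# The control group ("regular") is the first factor level, so it is the regression baseline
group <- factor(rep(c("regular", "small", "regular_aide"), each = n_per_group),
                levels = c("regular", "small", "regular_aide"))
# Assumed "true" effects (+5 for small classes, +1 for aides), for illustration only
score <- 50 + 5 * (group == "small") + 1 * (group == "regular_aide") +
  rnorm(3 * n_per_group, sd = 10)

# OLS on treatment dummies: each slope is a treatment-minus-control difference in means
fit <- lm(score ~ group)
print(coef(fit))

# The same comparisons computed directly from group means
diff_small <- mean(score[group == "small"]) - mean(score[group == "regular"])
diff_aide  <- mean(score[group == "regular_aide"]) - mean(score[group == "regular"])
print(c(small = diff_small, aide = diff_aide))
```

The regression form is convenient because it also delivers standard errors for each comparison and extends naturally to adding pre-treatment controls.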
There are several threats to the identification strategy. In particular, the comparison would not reveal a causal effect if either of the following holds:
students switch from one study group to another in a non-random pattern;
students leave the experiment non-randomly from one year to the next.
The author addresses these two issues in the econometric analysis.
Paper: Ashenfelter and Krueger (1994) Estimates of the Economic Return to Schooling from a New Sample of Twins AER 84(5): 1157-1173
The paper evaluates the causal impact of schooling levels on wages. The main hypothesis is that higher schooling levels lead to higher wages.
It would be ethically difficult to randomly assign some subjects to lower schooling levels and others to higher schooling levels, measure their wages after they leave school, and attribute the wage differences to schooling. An alternative ideal experiment would study subjects with similar ability, similar job opportunities, and similar prior educational environments who differ only in the grade or level at which they leave school. Their wages after leaving school would be compared, and any difference in wages would be attributed to the difference in schooling, provided there are no confounding factors.
To identify the causal effect of schooling on wages, the authors use a sample of twins. Identical twins are genetically identical and are therefore assumed to share the same innate ability, so a wage equation estimated within twin pairs suffers little endogeneity from unobserved ability.
Because the sample consists of twins, the error term of the wage equation may contain unobserved characteristics of the families selected into the sample (family effects) as well as unobserved characteristics of the individual twins (individual effects). The econometric specification therefore separates these components from the structural effect of years of schooling on wages: the authors identify the effect of an individual's schooling on wages after controlling for other individual-level education and job variables and for the family selection effect.
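A small simulation can illustrate why within-twin comparisons help. The sketch below assumes, hypothetically, that the only family-level confounder is an ability term shared by both twins that raises both schooling and wages; all parameter values are invented for illustration and are not taken from the paper. Pooled OLS then overstates the return to schooling, while the within-pair first difference removes the shared term.

```r
# Hypothetical simulated twins, NOT the sample used in the paper.
set.seed(2)
n_fam <- 2000
beta_true <- 0.08                      # assumed return to one year of schooling
ability <- rnorm(n_fam)                # family-level ability, shared by both twins
# Each twin's schooling depends on the shared ability plus an idiosyncratic shock
educ1 <- 13 + 1.5 * ability + rnorm(n_fam)
educ2 <- 13 + 1.5 * ability + rnorm(n_fam)
# Ability also raises wages directly, so it is an omitted variable in pooled OLS
lwage1 <- 1 + beta_true * educ1 + 0.2 * ability + rnorm(n_fam, sd = 0.3)
lwage2 <- 1 + beta_true * educ2 + 0.2 * ability + rnorm(n_fam, sd = 0.3)

# Pooled OLS over all individuals ignores the family effect
pooled <- lm(c(lwage1, lwage2) ~ c(educ1, educ2))
b_pooled <- unname(coef(pooled)[2])

# Within-pair first difference: the shared ability term cancels out
fd <- lm(I(lwage1 - lwage2) ~ I(educ1 - educ2))
b_fd <- unname(coef(fd)[2])

print(c(pooled = b_pooled, first_difference = b_fd))
```

With these assumed values, the pooled estimate should be well above 0.08 (roughly 0.17), while the first-difference estimate should be close to 0.08. The sketch ignores measurement error in schooling, which differencing tends to make worse.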
There are several threats to this identification strategy. Most importantly, the estimates would not reveal a causal effect if there are sources of endogeneity in the wage regression. The two relevant sources here are:
omitted ability variables;
measurement error (mostly in the regressors, such as self-reported schooling).
The authors conduct robustness analyses to address these two issues in their econometric estimations, for example by using each twin's report of the co-twin's schooling as an instrument for self-reported schooling.
# Loading packages
knitr::opts_chunk$set(echo = TRUE, eval = TRUE, message = FALSE, warning = FALSE, fig.height = 4)
necessaryPackages <- c("foreign", "reshape", "rvest", "tidyverse", "dplyr", "stringr", "ggplot2", "stargazer", "readr")
# Install any packages that are missing, then load all of them
new.packages <- necessaryPackages[
  !(necessaryPackages %in% installed.packages()[, "Package"])]
if (length(new.packages)) install.packages(new.packages)
# invisible() suppresses the list of TRUE values returned by require()
invisible(lapply(necessaryPackages, require, character.only = TRUE))
# Importing the dataset
paper2data <- read.dta("AshenfelterKrueger1994_twins.dta")
# Describing the dataset
#str(paper2data)
#summary(paper2data)
The equation estimated in column 5 of Table 3 is a first-difference regression of the logarithm of wage on years of schooling (educ), as shown in Equation (1): \[\begin{equation} \tag{1} \Delta log(wage)=\Delta \alpha + \beta \Delta educ + \Delta \varepsilon. \end{equation}\]
# Generating first-differencing variables
paper2data$dlwage <- paper2data$lwage1 - paper2data$lwage2
paper2data$deduc <- paper2data$educ1 - paper2data$educ2
# First-difference regression
fdmodel <- lm(dlwage ~ deduc, paper2data)
summary(fdmodel)
##
## Call:
## lm(formula = dlwage ~ deduc, data = paper2data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.03115 -0.20909 0.00722 0.34395 1.15740
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.07859 0.04547 -1.728 0.086023 .
## deduc 0.09157 0.02371 3.862 0.000168 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5542 on 147 degrees of freedom
## Multiple R-squared: 0.09211, Adjusted R-squared: 0.08593
## F-statistic: 14.91 on 1 and 147 DF, p-value: 0.0001682
The output below reproduces the estimation results from Table 3 column 5 of the paper.
# Formatting the table
stargazer(fdmodel, type="text", no.space=TRUE, keep.stat = c("n","rsq"), column.labels=c("First difference"))
##
## ========================================
## Dependent variable:
## ---------------------------
## dlwage
## First difference
## ----------------------------------------
## deduc 0.092***
## (0.024)
## Constant -0.079*
## (0.045)
## ----------------------------------------
## Observations 149
## R2 0.092
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
\[\beta = \frac{\partial \, \Delta log(wage)}{\partial \, \Delta educ}\] To express \(\beta\) without the first-difference operator, write the original log-level model for each twin \(i\), \(i=1,2\), allowing for different intercepts: \[\begin{equation} \tag{2} log(wage_{1})=\alpha_{1} + \beta educ_{1} + \varepsilon_{1} \end{equation}\]
\[\begin{equation} \tag{3} log(wage_{2})=\alpha_{2} + \beta educ_{2} + \varepsilon_{2}. \end{equation}\]
Subtracting Equation (3) from Equation (2) yields Equation (1), thus showing that \(\beta\) has the same interpretation as the partial effect of years of schooling on the logarithm of earnings for any twin:
\[\beta = \frac{\partial \, log(wage)}{\partial \, educ} \quad \Rightarrow \quad 100 \beta = \frac{100 \, d(wage)/wage}{d(educ)}. \] As the estimation of Equation (1) gives \(\hat{\beta}=0.092\), a one-year increase in education is associated with an increase in wages of approximately 9.2 percent. This marginal effect is significant at the 1% level.
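The 9.2 percent figure uses the log-point approximation \(100\beta\). As a quick check, the exact percent change implied by a one-year difference in schooling is \(100[exp(\beta)-1]\); the sketch below plugs in the unrounded estimate 0.09157 from the regression output above.

```r
# Unrounded coefficient on deduc from the first-difference regression output
beta_hat <- 0.09157

approx_pct <- 100 * beta_hat               # log-point approximation used in the text
exact_pct  <- 100 * (exp(beta_hat) - 1)    # exact percent change for one extra year

print(round(c(approx = approx_pct, exact = exact_pct), 2))
```

The exact effect (about 9.59 percent) is close to the approximation (about 9.16 percent) because \(\hat{\beta}\) is small; the gap widens for larger coefficients.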
# Reshaping the dataset
# Drop the first-difference variables created above so that only twin-specific columns are stacked
paper2data <- paper2data[, !(names(paper2data) %in% c("dlwage", "deduc"))]
paper2dataLong = reshape(data=paper2data, idvar=c("famid","age"), varying = 3:10, sep = "", timevar = "twin", times = c(1, 2), new.row.names= 1:10000, direction = "long")
# Reordering columns
paper2dataLong <- paper2dataLong[, c(1, 3, 2, 4, 6, 7, 5)]
# Optional: sort rows by family and twin identifiers
#paper2dataLong <- paper2dataLong[order(paper2dataLong$famid, paper2dataLong$twin), ]
# Generating a new variable for quadratic relationship between age and wage
paper2dataLong$agesqdiv100 <- paper2dataLong$age*paper2dataLong$age/100
paper2dataLong <- paper2dataLong[, c(1, 2, 3, 8, 4, 5, 6, 7)]
The equation estimated in column 1 of Table 3 is a pooled OLS regression of the logarithm of earnings (wage) on years of schooling (educ) and other control variables, as shown in Equation (4):
\[\begin{equation} \tag{4} log(wage)=\alpha + \beta_{1} educ + \beta_{2} age + \beta_{3} (age^{2}/100) + \beta_{4} male + \beta_{5} white + \varepsilon. \end{equation}\]
# Pooled OLS regression
polsmodel <- lm(lwage ~ educ + age + agesqdiv100 + male + white, paper2dataLong)
summary(polsmodel)
##
## Call:
## lm(formula = lwage ~ educ + age + agesqdiv100 + male + white,
## data = paper2dataLong)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.62602 -0.28748 0.00277 0.28474 2.42317
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.47061 0.42602 -1.105 0.270210
## educ 0.08387 0.01443 5.814 1.60e-08 ***
## age 0.08782 0.01883 4.663 4.75e-06 ***
## agesqdiv100 -0.08686 0.02335 -3.720 0.000239 ***
## male 0.20403 0.06302 3.237 0.001345 **
## white -0.41047 0.12668 -3.240 0.001333 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5324 on 292 degrees of freedom
## Multiple R-squared: 0.2724, Adjusted R-squared: 0.2599
## F-statistic: 21.86 on 5 and 292 DF, p-value: < 2.2e-16
The output below reproduces the estimation results from Table 3 column 1 of the paper.
# Formatting the table
stargazer(polsmodel, type="text", no.space=TRUE, keep.stat = c("n","adj.rsq"), column.labels=c("OLS"))
##
## ========================================
## Dependent variable:
## ---------------------------
## lwage
## OLS
## ----------------------------------------
## educ 0.084***
## (0.014)
## age 0.088***
## (0.019)
## agesqdiv100 -0.087***
## (0.023)
## male 0.204***
## (0.063)
## white -0.410***
## (0.127)
## Constant -0.471
## (0.426)
## ----------------------------------------
## Observations 298
## Adjusted R2 0.260
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Based on Equation (4), the coefficient on education multiplied by 100 is interpreted as the approximate percent change in wages from a one-year increase in education. As the estimation of Equation (4) gives \(\hat{\beta_{1}}=0.084\), one additional year of schooling is associated with an increase in wages of about 8.4 percent, holding the other regressors constant. This marginal effect is significant at the 1% level.
The relationship between age and the logarithm of wage is not linear, but quadratic. The marginal effect of age on the logarithm of wage can be derived as follows:
\[\frac{\partial \, log(wage)}{\partial \, age}=\beta_{2}+\frac{2 \beta_{3} age}{100} \quad \Rightarrow \quad \frac{100 \, d(wage)/wage}{d(age)}=100\beta_{2} + 2 \beta_{3} age.\] As the estimation of Equation (4) gives \(\hat{\beta_{2}}=0.088\) and \(\hat{\beta_{3}}=-0.087\), at age 50 one additional year of age is associated with a 0.1 percent increase in earnings, everything else held constant. At age 30, the ceteris paribus effect of one additional year is an increase in earnings of about 3.58 percent.
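The marginal effects of age can be checked numerically. The sketch below plugs the unrounded estimates from the regression output into the derivative and also computes the age at which log wages peak, \(age^{*}=-100\beta_{2}/(2\beta_{3})\), obtained by setting the marginal effect to zero. The function name `age_effect_pct` is only an illustrative helper.

```r
# Unrounded pooled OLS estimates from Equation (4)
b_age   <- 0.08782    # coefficient on age
b_agesq <- -0.08686   # coefficient on age^2/100

# Approximate percent change in wages from one more year of age, at a given age
age_effect_pct <- function(age) 100 * b_age + 2 * b_agesq * age

print(round(age_effect_pct(c(30, 50)), 2))

# Age at which the log-wage profile peaks (marginal effect equal to zero)
peak_age <- -100 * b_age / (2 * b_agesq)
print(round(peak_age, 1))
```

With unrounded coefficients, the effects are about 3.57 percent at age 30 and 0.10 percent at age 50, and the profile peaks at about age 50.6, which is why an extra year has almost no effect at 50.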
From Equation (4), \(\beta_{4}\) can be expressed as: \[\beta_{4}=\mathop{\mathbb{E}}[log(wage)|male=1,educ,age,white]-\mathop{\mathbb{E}}[log(wage)|male=0,educ,age,white].\] Hence, \(\beta_{4}\) represents the expected difference in the logarithm of earnings between males and females, everything else held equal.
Exponentiating both sides of Equation (4) and rearranging shows that the quantity \(100[exp(\beta_{4})-1]\) is the ceteris paribus percent difference in earnings for males relative to females: \[\frac{100(wage_{male}-wage_{female})}{wage_{female}}=100[exp(\beta_{4})-1].\] As the estimation of Equation (4) gives \(\hat{\beta_{4}}=0.204\), males earn about 22.6% more than females, everything else held constant. This gender gap is significant at the 1% level.
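Written out step by step, holding educ, age, white and the error term fixed, the algebra is:

\[\begin{aligned} log(wage_{male}) - log(wage_{female}) &= \beta_{4} \\ \frac{wage_{male}}{wage_{female}} &= exp(\beta_{4}) \\ \frac{100(wage_{male}-wage_{female})}{wage_{female}} &= 100[exp(\beta_{4})-1]. \end{aligned}\]

For a small coefficient, \(exp(\beta_{4})-1 \approx \beta_{4}\), which is why \(100\beta\) is a reasonable shortcut for small estimates; for \(\hat{\beta_{4}}=0.204\), the shortcut (20.4%) noticeably understates the exact gap.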
Similarly to \(\beta_{4}\), the expression for \(\beta_{5}\) is:
\[\beta_{5}=\mathop{\mathbb{E}}[log(wage)|white=1,educ,age,male]-\mathop{\mathbb{E}}[log(wage)|white=0,educ,age,male].\] The ceteris paribus percent difference in earnings for whites relative to non-whites is \(100[exp(\beta_{5})-1]\), as shown below: \[\frac{100(wage_{white}-wage_{non-white})}{wage_{non-white}}=100[exp(\beta_{5})-1].\] Based on the estimate \(\hat{\beta_{5}}=-0.410\) from Equation (4), whites earn about 33.6% less than non-whites, everything else held constant. This race gap is significant at the 1% level.
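Both percent gaps can be computed in one step from the unrounded coefficients in the regression output. The helper `pct_gap` below is illustrative only, not part of any package.

```r
# Unrounded dummy-variable coefficients from the pooled OLS regression (Equation 4)
b_male  <- 0.20403
b_white <- -0.41047

# Exact ceteris paribus percent gap implied by a dummy coefficient in a log-level model
pct_gap <- function(b) 100 * (exp(b) - 1)

print(round(c(male = pct_gap(b_male), white = pct_gap(b_white)), 1))
```

This gives about 22.6 percent for males and about -33.7 percent for whites; the 33.6 percent in the text comes from using the rounded coefficient -0.410.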
Paper: Card and Krueger (1994) Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania AER 84(4): 772-793.
The paper evaluates the causal impact of the minimum wage on employment in New Jersey. The main hypothesis is that a change in the minimum wage affects employment; the standard competitive model predicts that a higher minimum wage reduces employment. Prior studies find mixed evidence: some show a negative impact, while others fail to detect one.
The ideal experiment could be conducted at the macro or regional level (with cities or other geographic divisions as the unit of analysis) or at the micro level (with firms as the unit of analysis). Since the minimum wage policy applies state-wide, control states that are as similar as possible to New Jersey before the policy change are needed. Employment would be measured in treated and control units before and after the policy change, and the impact would be estimated as the difference between the before-after change in employment in treated units (in New Jersey) and the before-after change in control units (in the matched state or states).
The authors use a difference-in-differences approach as their identification strategy and estimate the impact at the micro level. They compare employment in fast-food restaurants in New Jersey and in Pennsylvania (a neighboring state with no comparable policy change) before and after New Jersey's minimum wage rose from $4.25 to $5.05 per hour.
The central identifying assumption of the difference-in-differences method, and therefore its main threat, is the parallel trends assumption: in the absence of the minimum wage increase, employment in New Jersey and Pennsylvania would have followed the same time trend.
This identification strategy also assumes that fast-food restaurants in Pennsylvania are a valid counterfactual for those in New Jersey. Most importantly, the degree of competition among firms in the fast-food industry is assumed to be the same in the two states.
# Importing the dataset
paper3data <- read.csv("CardKrueger1994_fastfood.csv", header = TRUE)
# Describing the dataset
str(paper3data)
## 'data.frame': 410 obs. of 12 variables:
## $ id : int 46 49 506 56 61 62 445 451 455 458 ...
## $ state : int 0 0 0 0 0 0 0 0 0 0 ...
## $ emptot : num 40.5 13.8 8.5 34 24 ...
## $ emptot2 : num 24 11.5 10.5 20 35.5 NA 29 36.5 11 8.5 ...
## $ demp : num -16.5 -2.25 2 -14 11.5 NA -41.5 13 0 -0.5 ...
## $ chain : int 1 2 2 4 4 4 1 1 2 2 ...
## $ bk : int 1 0 0 0 0 0 1 1 0 0 ...
## $ kfc : int 0 1 1 0 0 0 0 0 1 1 ...
## $ roys : int 0 0 0 0 0 0 0 0 0 0 ...
## $ wendys : int 0 0 0 1 1 1 0 0 0 0 ...
## $ wage_st : num NA NA NA 5 5.5 5 5 5 5.25 5 ...
## $ wage_st2: num 4.3 4.45 5 5.25 4.75 NA 4.75 5 5 5 ...
summary(paper3data)
## id state emptot emptot2
## Min. : 1.0 Min. :0.0000 Min. : 5.00 Min. : 0.00
## 1st Qu.:119.2 1st Qu.:1.0000 1st Qu.:14.56 1st Qu.:14.50
## Median :237.5 Median :1.0000 Median :19.50 Median :20.50
## Mean :246.5 Mean :0.8073 Mean :21.00 Mean :21.05
## 3rd Qu.:371.8 3rd Qu.:1.0000 3rd Qu.:24.50 3rd Qu.:26.50
## Max. :522.0 Max. :1.0000 Max. :85.00 Max. :60.50
## NA's :12 NA's :14
## demp chain bk kfc
## Min. :-41.50000 Min. :1.000 Min. :0.0000 Min. :0.0000
## 1st Qu.: -4.00000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 0.00000 Median :2.000 Median :0.0000 Median :0.0000
## Mean : -0.07044 Mean :2.117 Mean :0.4171 Mean :0.1951
## 3rd Qu.: 4.00000 3rd Qu.:3.000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. : 34.00000 Max. :4.000 Max. :1.0000 Max. :1.0000
## NA's :26
## roys wendys wage_st wage_st2
## Min. :0.0000 Min. :0.0000 Min. :4.250 Min. :4.250
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:4.250 1st Qu.:5.050
## Median :0.0000 Median :0.0000 Median :4.500 Median :5.050
## Mean :0.2415 Mean :0.1463 Mean :4.616 Mean :4.996
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:4.950 3rd Qu.:5.050
## Max. :1.0000 Max. :1.0000 Max. :5.750 Max. :6.250
## NA's :20 NA's :21
Distribution of Store Types (percentages by state)
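# Count stores of each chain by state
panel1Variables <- paper3data %>% group_by(state) %>%
  summarize(a.BurgerKing = sum(bk), b.KFC = sum(kfc), c.RoyRogers = sum(roys),
            d.Wendys = sum(wendys))
table2Panel1 <- as.matrix(panel1Variables)
# Drop the state column, put chains in rows, and convert counts to within-state percentages
table2Panel1 <- prop.table(t(table2Panel1[, -1]), margin = 2) * 100
colnames(table2Panel1) <- c("PA", "NJ")  # state = 0 is PA, state = 1 is NJ
print(table2Panel1)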
## PA NJ
## a.BurgerKing 44.30380 41.08761
## b.KFC 15.18987 20.54381
## c.RoyRogers 21.51899 24.77341
## d.Wendys 18.98734 13.59517
FTE employment (means by wave and state)
panel2Variables <- paper3data %>% group_by(state) %>%
summarize(FTEemploymentWave1= mean(emptot, na.rm = TRUE),
FTEemploymentWave2 = mean(emptot2,na.rm = TRUE))
table2Panel2 <- as.matrix(panel2Variables)
table2Panel2 <- table2Panel2[,-1]
colnames(table2Panel2) <- c("FTE employment (Wave 1)", "FTE employment (Wave 2)")
rownames(table2Panel2) <- c("PA", "NJ")
table2Panel2 <- t(table2Panel2)
print(table2Panel2)
## PA NJ
## FTE employment (Wave 1) 23.33117 20.43941
## FTE employment (Wave 2) 21.16558 21.02743
To compute the difference-in-differences estimate on the same set of stores in both waves, stores with a missing value of the change-in-employment variable “demp” are discarded.
paper3dataNodempNA <- paper3data[complete.cases(paper3data$demp),]
panel2Variables <- paper3dataNodempNA %>% group_by(state) %>%
summarize(FTEemploymentWave1= mean(emptot, na.rm = TRUE),
FTEemploymentWave2 = mean(emptot2,na.rm = TRUE))
table2Panel2 <- as.matrix(panel2Variables)
table2Panel2 <- table2Panel2[,-1]
colnames(table2Panel2) <- c("FTE employment (Wave 1)", "FTE employment (Wave 2)")
rownames(table2Panel2) <- c("PA", "NJ")
table2Panel2 <- t(table2Panel2)
print(table2Panel2)
## PA NJ
## FTE employment (Wave 1) 23.38000 20.43058
## FTE employment (Wave 2) 21.09667 20.89725
Given the table of FTE employment means by wave and state above, the difference-in-differences estimate (DD) can be calculated “by hand” in the following steps:
Wave 2 Difference: D2=(20.89725-21.09667) is the post-policy change difference in FTE employment means between the treated state (NJ) and the control state (PA);
Wave 1 Difference: D1=(20.43058-23.38000) is the pre-policy change difference in FTE employment means between the treated state (NJ) and the control state (PA);
DD = (D2-D1).
DD <- (20.89725-21.09667)-(20.43058-23.38000)
DD
## [1] 2.75
The difference-in-differences estimate of 2.75 means that, between the two waves, mean FTE employment per store changed by 2.75 more in New Jersey fast-food restaurants than in Pennsylvania ones (New Jersey rose by about 0.47, while Pennsylvania fell by about 2.28). Since the estimate is positive, at face value the minimum wage increase did not lead employers to cut employment. However, the standard error of this estimate is needed to judge whether the estimated impact is statistically significant.
The OLS regression that yields the same difference-in-differences estimate DD is shown in Equation (5), where \(\Delta FTEemployment_{i}= FTEemployment_{i,2}-FTEemployment_{i,1}\) is the change in FTE employment of restaurant \(i\) between wave 1 and wave 2 (the variable demp), and \(state_{i}\) is the indicator for \(NJ\). \[\begin{equation} \tag{5} \Delta FTEemployment_{i}=\alpha + \beta state_{i} + \varepsilon_{i}. \end{equation}\]
# OLS regression
olsmodel <- lm(demp ~ state, paper3dataNodempNA)
summary(olsmodel)
##
## Call:
## lm(formula = demp ~ state, data = paper3dataNodempNA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.217 -3.967 0.533 4.533 33.533
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.283 1.036 -2.205 0.0280 *
## state 2.750 1.154 2.382 0.0177 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.968 on 382 degrees of freedom
## Multiple R-squared: 0.01464, Adjusted R-squared: 0.01206
## F-statistic: 5.675 on 1 and 382 DF, p-value: 0.01769
The OLS output reproduces \(\hat{\beta}=DD\) as computed by hand above. The OLS result also shows that the impact estimate is significant at the 5% level.
# Formatting the table
stargazer(olsmodel, type="text", no.space=TRUE, keep.stat = c("n","rsq"), column.labels=c("OLS"))
##
## ========================================
## Dependent variable:
## ---------------------------
## demp
## OLS
## ----------------------------------------
## state 2.750**
## (1.154)
## Constant -2.283**
## (1.036)
## ----------------------------------------
## Observations 384
## R2 0.015
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# Reshaping the dataset
paper3dataNodempNA$emptot0 <- paper3dataNodempNA$emptot
paper3dataNodempNA$emptot1 <- paper3dataNodempNA$emptot2
paper3dataNodempNA <- paper3dataNodempNA[-c(3:12)]
paper3dataLong = reshape(data=paper3dataNodempNA, idvar=c("id", "state"), varying = 3:4, sep = "", timevar = "post", times = c(0, 1), new.row.names= 1:10000, direction = "long")
# Generating the interaction variable (state x post)
paper3dataLong$interaction <- paper3dataLong$state*paper3dataLong$post
The appropriate DiD regression is shown in Equation (6), where the restaurant is indexed by \(i\), the period by \(t\), \(post_{t}\) is the indicator for the post-policy period, and \(interaction_{i,t}=state_{i} \times post_{t}\). The coefficient of interest is \(\delta\). \[\begin{equation} \tag{6} FTEemployment_{i,t}=\alpha + \beta state_{i} + \gamma post_{t} + \delta interaction_{i,t} + \varepsilon_{i,t}. \end{equation}\]
# DiD regression
didmodel <- lm(emptot ~ state + post + interaction , paper3dataLong)
summary(didmodel)
##
## Call:
## lm(formula = emptot ~ state + post + interaction, data = paper3dataLong)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.097 -6.472 -0.931 4.603 64.569
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.380 1.098 21.288 <2e-16 ***
## state -2.949 1.224 -2.409 0.0162 *
## post -2.283 1.553 -1.470 0.1419
## interaction 2.750 1.731 1.588 0.1126
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.511 on 764 degrees of freedom
## Multiple R-squared: 0.007587, Adjusted R-squared: 0.00369
## F-statistic: 1.947 on 3 and 764 DF, p-value: 0.1206
# Formatting the table
stargazer(didmodel, type="text", no.space=TRUE, keep.stat = c("n","adj.rsq"), column.labels=c("DiD"))
##
## ========================================
## Dependent variable:
## ---------------------------
## emptot
## DiD
## ----------------------------------------
## state -2.949**
## (1.224)
## post -2.283
## (1.553)
## interaction 2.750
## (1.731)
## Constant 23.380***
## (1.098)
## ----------------------------------------
## Observations 768
## Adjusted R2 0.004
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
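The DiD regression uses twice as many observations as the first-difference OLS regression (768 store-wave observations versus 384 store-level changes) and therefore has more residual degrees of freedom. It yields the same point estimate, \(\hat{\delta}=2.75\), but with a larger standard error (1.731 versus 1.154), because the pooled regression treats each store's two observations as independent and ignores the positive correlation of a store's employment across waves. Based on these DiD estimation results (p-value of 0.113), one fails to reject the null hypothesis that the minimum wage increase had no effect on FTE employment.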
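The equivalence of the three DD computations, and the reason the pooled regression reports a larger standard error, can be illustrated with a simulated balanced panel. The sketch below uses hypothetical stores, not the fast-food data; the sample sizes, the persistent store effect, and the assumed treatment effect of 2.5 are invented for illustration.

```r
# Hypothetical simulated panel of stores, NOT the fast-food data used above.
set.seed(3)
n_pa <- 80; n_nj <- 300
state <- c(rep(0, n_pa), rep(1, n_nj))       # 1 = treated state, 0 = control state
n <- length(state)
store_effect <- rnorm(n, mean = 20, sd = 8)  # persistent store-level employment
emp0 <- store_effect - 3 * state + rnorm(n, sd = 4)                    # wave 1
emp1 <- store_effect - 3 * state - 2 + 2.5 * state + rnorm(n, sd = 4)  # wave 2

# (a) DD of the four group means, computed by hand
dd_hand <- (mean(emp1[state == 1]) - mean(emp0[state == 1])) -
           (mean(emp1[state == 0]) - mean(emp0[state == 0]))

# (b) Regression of the within-store change on the state dummy, as in Equation (5)
fd <- lm(I(emp1 - emp0) ~ state)

# (c) Pooled DiD regression on the stacked panel, as in Equation (6)
long <- data.frame(emp = c(emp0, emp1),
                   state = rep(state, 2),
                   post = rep(c(0, 1), each = n))
did <- lm(emp ~ state + post + state:post, data = long)

se_fd  <- summary(fd)$coefficients["state", "Std. Error"]
se_did <- summary(did)$coefficients["state:post", "Std. Error"]
print(c(dd_hand = dd_hand, fd = coef(fd)[["state"]], did = coef(did)[["state:post"]],
        se_fd = se_fd, se_did = se_did))
```

In a balanced panel the three point estimates coincide exactly, but the pooled regression treats each store's two observations as independent. Because a store's employment is positively correlated across waves, its conventional standard error is larger than that of the first-difference regression. In practice, pooled DiD estimates on panel data are usually reported with standard errors clustered by store, which account for this correlation.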
The “parallel trends” assumption, which is key for the DiD approach to work, means that, in the absence of the minimum wage increase, FTE employment in NJ and PA would have followed the same time trend.
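This assumption cannot be verified directly because the counterfactual is unobserved, and with a single pre-policy wave the data cannot reveal pre-existing trends. Historical data from several periods before the policy change are needed to check whether employment in the two states followed parallel trends.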
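If several pre-policy waves were available, a common check is to estimate separate linear time trends for the two states over the pre-policy period and test whether they differ. The sketch below is a hypothetical illustration on simulated store-year observations (treated as independent for simplicity) that satisfy parallel trends by construction; the years, sample sizes, and coefficients are all assumptions.

```r
# Hypothetical pre-policy data: store-year observations in both states, simulated.
set.seed(4)
pre <- expand.grid(store = 1:100, year = 1988:1991, state = c(0, 1))
pre$trend <- pre$year - 1988
# Parallel trends by construction: common slope of -0.5 FTE per year, level gap of -2
pre$emp <- 22 - 2 * pre$state - 0.5 * pre$trend + rnorm(nrow(pre), sd = 9)

# Differential pre-trend: the state-by-trend coefficient should be close to zero
trend_fit <- lm(emp ~ state * trend, data = pre)
diff_trend <- summary(trend_fit)$coefficients["state:trend", ]
print(diff_trend)
```

A small and statistically insignificant coefficient on `state:trend` is consistent with parallel pre-trends, although it cannot prove that the trends would have stayed parallel after the policy change.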