1 Part 1: Paper using randomized data: Impact of Class Size on learning

1.1 Briefly answer these questions:

a. What is the causal link the paper is trying to reveal?

The paper studies the effect of educational resources on student achievement. It provides an econometric analysis of Project STAR, a large-scale randomized experiment on class size conducted in the United States.

b. What would be the ideal experiment to test this causal link?

The ideal experiment randomly assigns kindergarten students and their teachers to one of three groups: small classes, regular-size classes, and regular/aide classes with a full-time teacher’s aide. After the initial assignment, students remain in the same class type for four years. Student achievement is measured by standardized tests at the end of each school year.

c. What is the identification strategy?

Class size is determined by random assignment, so it is independent of the omitted variables. With random assignment, a simple comparison of mean achievement between children in small and large classes provides an unbiased estimate of the effect of class size on achievement. However, students could move to other class types, and new students could enter the project each school year. The authors therefore compare OLS estimates, which use the actual class type, with reduced-form estimates, which use the initial assignment.
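The contrast between the two estimators can be illustrated on simulated data. The sketch below does not use the Project STAR data: the sample size, the selection rule for switching classes, and the true effect of 5 points are all assumptions made for illustration.

```r
set.seed(1)
n       <- 5000
ability <- rnorm(n)                 # unobserved by the econometrician
initial <- rbinom(n, 1, 0.5)        # random initial assignment to a small class
# Assumption: high-ability students assigned to regular classes switch to small ones
moved   <- initial == 0 & ability > 1.2
actual  <- ifelse(moved, 1, initial)
score   <- 50 + 5 * actual + 4 * ability + rnorm(n)

ols <- unname(coef(lm(score ~ actual))["actual"])    # uses actual class type
rf  <- unname(coef(lm(score ~ initial))["initial"])  # reduced form on initial assignment
c(OLS = ols, reduced_form = rf)
```

Under these assumptions the OLS coefficient overstates the effect, because switchers are positively selected on ability. The reduced-form coefficient is free of that selection but measures the effect of being assigned to a small class, which is smaller than the effect of attending one when some students switch.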

d. What are the assumptions/ threats to this identification strategy?

*The STAR data set does not contain students’ original class type assignments resulting from the randomization procedure.

*Baseline test score information on the students is not available, so one cannot examine whether the treatment and control groups “looked similar” on the measure before the experiment began.

2 Part 2: Paper using Twins for Identification: Economic Returns to Schooling

2.1 Briefly answer these questions:

a. What is the causal link the paper is trying to reveal?

This paper studies the causal relationship between schooling levels and wage rates. The authors investigate whether the observed relationship between schooling and wages is causal, rather than driven by a worker’s ability or other factors, by controlling for family background and individual characteristics.

b. What would be the ideal experiment to test this causal link?

The ideal experiment to test this causal link would rule out the unobservable component of each family i and would also adjust for measurement error in either the self-reported or the sibling-reported schooling level.

c. What is the identification strategy?

Let \(y_{1i}\) and \(y_{2i}\) be the wage rates of the first and second twins in family i. Let \(X_i\) represent the set of variables that vary across families but not across twins. \(Z_{1i}\) and \(Z_{2i}\) represent the sets of variables that may vary across the twins. \(\mu_i\) is the unobservable factor of family i, and \(\epsilon_{1i}\) and \(\epsilon_{2i}\) are the idiosyncratic unobservable components.

The wage rate for the first and second twins in each family i is:

\[y_{1i}=\alpha X_i +\beta Z_{1i} +\mu_i +\epsilon_{1i} \tag{1}\]

\[y_{2i}=\alpha X_i +\beta Z_{2i} +\mu_i +\epsilon_{2i} \tag{2}\]

Since \(\mu_i\) is correlated with \(X_i\), \(Z_{1i}\) and \(Z_{2i}\), the unobservable family effect can be represented by observable variables:

\[\mu_i= \gamma Z_{1i}+ \gamma Z_{2i} +\delta X_i + w_i \tag{3}\]

Rewrite the wage rates \(y_{1i}\) and \(y_{2i}\) as follows:

\[y_{1i}=(\alpha+\delta)X_i+ (\beta+\gamma)Z_{1i}+\gamma Z_{2i}+w_i+\epsilon_{1i} \tag{4}\]

\[y_{2i}=(\alpha+\delta)X_i+ (\beta+\gamma)Z_{2i}+\gamma Z_{1i}+w_i+\epsilon_{2i} \tag{5}\]

Take the first difference:

\[y_{1i}-y_{2i}= \beta(Z_{1i}-Z_{2i})+\epsilon_{1i}-\epsilon_{2i} \tag{6}\]

The authors use OLS, GLS, IV, first-difference, and first-difference IV estimators to estimate the return to schooling. The first-difference estimators remove the family fixed effect \(\mu_i\).
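A small simulation shows why differencing within twin pairs helps. This is a sketch on made-up data, not the Ashenfelter and Krueger sample: the true return of 0.10, the strength of the family effect, and the noise levels are all assumptions.

```r
set.seed(2)
n_fam <- 2000
beta  <- 0.10                        # assumed true return to schooling
mu    <- rnorm(n_fam)                # unobserved family effect
# Schooling of each twin is correlated with the family effect
S1 <- 12 + 1.5 * mu + rnorm(n_fam)
S2 <- 12 + 1.5 * mu + rnorm(n_fam)
y1 <- beta * S1 + 0.3 * mu + rnorm(n_fam, sd = 0.3)
y2 <- beta * S2 + 0.3 * mu + rnorm(n_fam, sd = 0.3)

pooled <- unname(coef(lm(c(y1, y2) ~ c(S1, S2)))[2])    # ignores mu
fd     <- unname(coef(lm(I(y1 - y2) ~ I(S1 - S2)))[2])  # equation (6): mu drops out
c(pooled = pooled, first_difference = fd)
```

In this setup the pooled regression is biased upward because schooling is correlated with the omitted family effect, while the first-difference estimate is close to the assumed value of 0.10.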

d. What are the assumptions/ threats to this identification strategy?

The authors believe that individuals misreport both their own and their parents’ schooling levels. Therefore, they compare estimates based on sibling-reported and self-reported schooling differences. Their results indicate that measurement error in schooling leads to considerable underestimation of the effect of schooling differences in studies based on siblings.
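The direction of this bias follows from the classical errors-in-variables result. As a sketch, assume the reported schooling difference equals the true difference plus noise, \(\Delta S_i^{rep} = \Delta S_i + \Delta v_i\), where \(\Delta v_i\) is uncorrelated with \(\Delta S_i\) and with the wage error. This notation is introduced here for illustration and is not taken from the paper. Then the first-difference estimator satisfies

\[\operatorname{plim}\hat{\beta}_{FD} = \beta\,\frac{\operatorname{Var}(\Delta S_i)}{\operatorname{Var}(\Delta S_i)+\operatorname{Var}(\Delta v_i)}\]

Differencing between twins removes much of the true variation in schooling but none of the reporting noise, so the ratio can be well below one. This is why the sibling's report is useful as an instrument for the self-report.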

2.2 Replication analysis

a. Load Ashenfelter and Krueger AER 1994 data.

library(readxl)
library(dplyr)
library(tidyr)
library(reshape2)
HW9 <- read_excel("HW9.xls")
white<- HW9 %>% select(famid,age,educ1,educ2,white1,white2)
white<- melt(white, id=c("famid","age","educ1","educ2"))
male<- HW9 %>% select(famid,age,educ1,educ2,male1,male2)
male<- melt(male, id=c("famid","age","educ1","educ2"))
wage<- HW9 %>% select(famid,age,educ1,educ2,lwage1,lwage2)
wage<- melt(wage, id=c("famid","age","educ1","educ2"))
wage$value<-exp(wage$value)
average_age<-c(mean(HW9$age),sd(HW9$age))
average_educ1<-c(mean(HW9$educ1),sd(HW9$educ1))
average_educ2<-c(mean(HW9$educ2),sd(HW9$educ2))
average_wage<-c(mean(wage$value),sd(wage$value))
average_white<-c(mean(white$value),sd(white$value))
average_male<-c(mean(male$value),sd(male$value))
Table<-data.frame(average_educ1,average_educ2,average_wage,average_age,average_white,average_male)
# Round to 2 digits
Table<-round(Table,2)
# Labels follow the column order of Table; the last column is the share of males
names(Table)[1:6]<-c("Self-reported education","Sibling-reported education","Hourly wage","Age","White","Male")
rownames(Table)<-c("Mean","Standard Deviation")
Table<-t(Table)
library(DT)
datatable(Table)

b. Reproduce the result from table 3 column 5.

# Within-pair (first) differences of log wages and schooling
FD <- HW9 %>% select(lwage1, lwage2, educ1, educ2)
FD$FDlwage <- FD$lwage1 - FD$lwage2
FD$FDeduc  <- FD$educ1 - FD$educ2
modelFD <- lm(FDlwage ~ FDeduc, data = FD)
summary(modelFD)

## 
## Call:
## lm(formula = FDlwage ~ FDeduc, data = FD)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.03115 -0.20909  0.00722  0.34395  1.15740 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.07859    0.04547  -1.728 0.086023 .  
## FDeduc       0.09157    0.02371   3.862 0.000168 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5542 on 147 degrees of freedom
## Multiple R-squared:  0.09211,    Adjusted R-squared:  0.08593 
## F-statistic: 14.91 on 1 and 147 DF,  p-value: 0.0001682

c. Explain how this coefficient should be interpreted.

The coefficient implies that a one-year increase in the intrapair difference in schooling is associated with an intrapair wage difference of about 0.092 log points, that is, roughly 9.2 percent.
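Because the dependent variable is a difference in log wages, the coefficient is a log-point change, and reading it as a percentage is an approximation. The exact percentage can be computed from the estimate printed above:

```r
b_fd <- 0.09157                  # FDeduc coefficient from the regression output
pct  <- 100 * (exp(b_fd) - 1)    # exact percent change implied by b_fd log points
round(pct, 2)
```

The exact figure is about 9.6 percent, slightly above the 9.2 percent approximation. The gap grows with the size of the coefficient.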

d. Reproduce the result in table 3 column 1.

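# Select each twin's columns by name so the result does not depend on column order
First<-HW9[,c("famid","age","educ1","lwage1","male1","white1")]
Second<-HW9[,c("famid","age","educ2","lwage2","male2","white2")]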
#Have to make sure the names of columns are same!!
names(First)[1:6]<-c("famid","age","educ","lwage","male","white")
names(Second)[1:6]<-c("famid","age","educ","lwage","male","white")
OLS1<-rbind(First,Second)
OLS1$ageSquare<-OLS1$age^2/100
modelOLS1<-lm(lwage~educ+age+ageSquare+male+white,OLS1)
summary(modelOLS1)

## 
## Call:
## lm(formula = lwage ~ educ + age + ageSquare + male + white, data = OLS1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.62602 -0.28748  0.00277  0.28474  2.42317 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.47061    0.42602  -1.105 0.270210    
## educ         0.08387    0.01443   5.814 1.60e-08 ***
## age          0.08782    0.01883   4.663 4.75e-06 ***
## ageSquare   -0.08686    0.02335  -3.720 0.000239 ***
## male         0.20403    0.06302   3.237 0.001345 ** 
## white       -0.41047    0.12668  -3.240 0.001333 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5324 on 292 degrees of freedom
## Multiple R-squared:  0.2724, Adjusted R-squared:  0.2599 
## F-statistic: 21.86 on 5 and 292 DF,  p-value: < 2.2e-16

e. Explain how the coefficient on education should be interpreted.

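The coefficient implies that one additional year of schooling is associated with a wage that is about 8.4 percent higher, holding age, gender, and race fixed. Unlike the first-difference estimate, it does not control for the unobserved family effect.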

f. Explain how the coefficient on the control variables should be interpreted.

Age: the linear term alone implies that one additional year of age is associated with about 8.8 percent higher wages, but it must be read together with the squared term.

Age square: ageSquare is defined as age^2/100, so its negative coefficient means that the age profile is concave. The marginal effect of one more year of age is 0.0878 - 2(0.0869)(age/100) log points, which declines as age rises.

Male: the wage rate is about 0.204 log points (roughly 23 percent) higher if the respondent is male.

White: the wage rate is about 0.410 log points (roughly 34 percent) lower if the respondent is white.
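The joint reading of the two age terms, and the exact percentages implied by the dummy coefficients, can be worked out from the printed estimates. The age of 30 used below is an arbitrary evaluation point chosen for illustration.

```r
# Coefficients copied from the OLS output above
b_age <- 0.08782; b_age2 <- -0.08686      # age and age^2/100
b_male <- 0.20403; b_white <- -0.41047

# Marginal effect of one more year of age on the log wage at a given age
age_effect <- function(age) b_age + 2 * b_age2 * age / 100
peak_age   <- -100 * b_age / (2 * b_age2)  # age at which the profile turns down

# Exact percent differences implied by the dummy coefficients
pct_male  <- 100 * (exp(b_male) - 1)
pct_white <- 100 * (exp(b_white) - 1)
round(c(effect_at_30 = age_effect(30), peak_age = peak_age,
        pct_male = pct_male, pct_white = pct_white), 2)
```

At age 30 an extra year adds about 0.036 log points, and the fitted profile peaks at roughly age 50.5.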

2.3 Part 3: Paper using Difference-in-Differences: Impact of Minimum Wage

a. What is the causal link the paper is trying to reveal?

The paper presents new evidence on the effect of minimum wages on establishment-level employment outcomes. The authors compare employment, wages, and prices at stores before and after the rise in order to offer a simple method for evaluating the effects of the minimum wage.

b. What would be the ideal experiment to test this causal link?

The ideal experiment to test this causal link would satisfy 4 assumptions for DiD estimation:

1. Allocation of the treatment is not determined by the outcome: this assumption requires that the distribution of store types be similar in New Jersey and Pennsylvania. Employees in fast-food restaurants are more likely to be affected by the minimum wage law. If the two regions have different distributions of fast-food store types, the estimates may suffer from selection bias.

2. Treatment and control groups have parallel trends in the outcome: this assumption implies that the economic environments of the two states evolved in the same way between the two survey waves, from February to November 1992.

3. Composition of intervention and comparison groups is stable: this assumption implies that the attributes of the treatment and control groups should be constant over time. One example of this issue can be found in the paper: the substitution between full-time and part-time workers.

4. No spillover effects: the minimum wage rise affects only workers who receive low wages. The law does not affect workers with high wages, and the paper provides evidence on this point.

c. What is the identification strategy?

First, the authors summarize the levels and changes in average employment per store in the survey and compare the change in FTE employment in stores with higher starting wages with that in stores with lower starting wages. Second, they adjust the regressions for other sources of variation in employment growth, such as chain type. Finally, they check robustness by estimating the models for the proportional change in employment. Besides the difference in FTE employment, the authors also consider other outcomes related to employment, such as price effects, nonwage offsets, and the substitution between full-time and part-time workers.

d. What are the assumptions/ threats to this identification strategy?

I think the most significant threats to this identification strategy are the parallel-trends assumption and the stable composition of the control and treatment groups.

The parallel-trends assumption is hard to maintain in reality, because raising the minimum wage affects the production costs of all industries that rely on cheap labor. Although low-wage employees benefit from a higher minimum wage, employers and consumers have to bear the cost of the increase. As a consequence, either employers reduce employment or consumers pay a higher price for the product. The new minimum wage law therefore affects not only low-wage workers (the treatment group) but also workers with higher wages (the control group). As a result, the difference between New Jersey and Pennsylvania could hardly stay constant over time in the absence of treatment.

Stable composition might be violated if employment is flexible and labor is mobile between the two states. Low-wage workers in Pennsylvania would be more willing to move to New Jersey for a higher wage, which leads to selection bias.

2.4 Replication analysis

a. Load data from Card and Krueger AER 1994 data.

HW92 <- read_excel("HW92.xlsx")
library(DT)
datatable(HW92)

b. Verify that the data is matching that of the paper.

HW92$emptot<-as.numeric(HW92$emptot)
HW92$emptot2<-as.numeric(HW92$emptot2)
HW92$demp<-as.numeric(HW92$demp)
HW92$wage_st<-as.numeric(HW92$wage_st)
HW92$wage_st2<-as.numeric(HW92$wage_st2)
# Share of each chain within a state: the mean of a 0/1 dummy is its proportion
# (assumes every store belongs to exactly one of the four chains)
ST<- HW92 %>% group_by(state) %>%
          summarise(across(c(bk,kfc,roys,wendys), ~ mean(.x, na.rm=TRUE)))
Mean<-HW92 %>% select(state,emptot,emptot2) %>%
               group_by(state) %>%
               summarise(mean_wave1=mean(emptot,na.rm = TRUE),mean_wave2=mean(emptot2,na.rm=TRUE))
round(ST,1)

## # A tibble: 2 x 5
##   state    bk   kfc  roys wendys
##   <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1     0   0.4   0.2   0.2    0.2
## 2     1   0.4   0.2   0.2    0.1

round(Mean,1)

## # A tibble: 2 x 3
##   state mean_wave1 mean_wave2
##   <dbl>      <dbl>      <dbl>
## 1     0       23.3       21.2
## 2     1       20.4       21

library(knitr)
kable(ST,caption= "Distribution of Store Types",digits=2)

Distribution of Store Types
state	bk	kfc	roys	wendys
0	0.44	0.15	0.22	0.19
1	0.41	0.20	0.25	0.14

kable(Mean, caption= "Means of FTE employment in Wave1 and Wave2",digits=2)

Means of FTE employment in Wave1 and Wave2
state	mean_wave1	mean_wave2
0	23.33	21.17
1	20.44	21.03

c. Compute the difference-in-differences estimator “by hand”.

#mean value
FTE<-HW92 %>% select(state,emptot,emptot2,demp) %>%
  group_by(state) %>%
  summarise(mean_wave1=mean(emptot,na.rm=TRUE),mean_wave2=mean(emptot2,na.rm = TRUE),demp=mean(demp,na.rm=TRUE))%>%
  select(mean_wave1,mean_wave2,demp)
attach(FTE)
diff<-c((mean_wave1[2]-mean_wave1[1]),(mean_wave2[2]-mean_wave2[1]),(demp[2]-demp[1]) )
detach(FTE)
FTE<-rbind(FTE,diff)
names(FTE)[1:3]<-c("FTE employment before, all available observation","FTE employment after, all available observation", "Change in mean FTE,balanced sample of stores")
FTE<-t(FTE)
#standard error by hand
se<-HW92 %>% select(state,emptot,emptot2,demp) %>%
  filter(!is.na(emptot),!is.na(emptot2),!is.na(demp)) %>%
  group_by(state)%>%
  summarise(se_wave1=sqrt(var(emptot)/n()),se_wave2=sqrt(var(emptot2)/n()),
            se_demp=sqrt(var(demp)/n())) %>%
  select(se_wave1,se_wave2,se_demp)
attach(se)
diff_se<-c(sqrt(se_wave1[1]^2+se_wave1[2]^2),sqrt(se_wave2[1]^2+se_wave2[2]^2),sqrt(se_demp[1]^2+se_demp[2]^2))
detach(se)
se<-rbind(se,diff_se)
names(se)[1:3]<-c("FTE employment before, all available observation","FTE employment after, all available observation", "Change in mean FTE,balanced sample of stores")
se<-t(se) 

library(knitr)
kable(FTE,caption= "Average Employment Per Store Before And After The Rise in New Jersey Minimum Wage",digits=2,
      col.names =c("PA","NJ","Difference,NJ-PA"),
      align="c")

Average Employment Per Store Before And After The Rise in New Jersey Minimum Wage
	PA	NJ	Difference,NJ-PA
FTE employment before, all available observation	23.33	20.44	-2.89
FTE employment after, all available observation	21.17	21.03	-0.14
Change in mean FTE,balanced sample of stores	-2.28	0.47	2.75

kable(se, caption= "Average Employment Per Store Before And After The Rise in New Jersey Minimum Wage (standard error)",digits=2,
      col.names =c("PA","NJ","Difference,NJ-PA"),align="c")

Average Employment Per Store Before And After The Rise in New Jersey Minimum Wage (standard error)
	PA	NJ	Difference,NJ-PA
FTE employment before, all available observation	1.39	0.52	1.48
FTE employment after, all available observation	0.97	0.53	1.10
Change in mean FTE,balanced sample of stores	1.25	0.48	1.34

d. Interpret the difference-in-differences estimator. Does it (roughly) match paper?

The estimate is 2.75 with a standard error of 1.34, both rounded to 2 digits. The difference-in-differences estimator implies that, relative to Pennsylvania, mean FTE employment per store in New Jersey rose by 2.75 after the minimum wage rise was enacted. The t-ratio is 2.75/1.34, or about 2.05, so the estimate is statistically significant at the 5 percent level.
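The arithmetic behind the estimate can be checked directly from the rounded values in the two tables above. This is a check on the reported numbers, not a re-estimation from the data.

```r
# Means copied from the employment table (all available observations)
pa_before <- 23.33; pa_after <- 21.17
nj_before <- 20.44; nj_after <- 21.03
did_all <- (nj_after - nj_before) - (pa_after - pa_before)

# Balanced-sample changes and their standard errors, NJ minus PA
did_bal <- 0.47 - (-2.28)
se_bal  <- sqrt(1.25^2 + 0.48^2)   # the two state samples are treated as independent
t_bal   <- did_bal / se_bal
round(c(did_all = did_all, did_bal = did_bal, se = se_bal, t = t_bal), 2)
```

Both versions of the estimate come to 2.75, and the t-ratio of about 2.05 exceeds the 1.96 critical value for a two-sided test at the 5 percent level.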

e. Use OLS to obtain the same Diff-in-diff estimator as you just did.

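With heteroskedasticity-robust (HC1) standard errors, the results are closer to those calculated “by hand”. These standard errors are not clustered by chain type: vcovHC applied to an lm object ignores a cluster argument.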

OLS<-HW92 %>% select(chain,state,emptot,emptot2,demp) %>%
     filter(!is.na(emptot),!is.na(emptot2),!is.na(demp))
modelOLS<-lm(demp~state,OLS)
summary(modelOLS)

## 
## Call:
## lm(formula = demp ~ state, data = OLS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.217  -3.967   0.533   4.533  33.533 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   -2.283      1.036  -2.205   0.0280 *
## state          2.750      1.154   2.382   0.0177 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.968 on 382 degrees of freedom
## Multiple R-squared:  0.01464,    Adjusted R-squared:  0.01206 
## F-statistic: 5.675 on 1 and 382 DF,  p-value: 0.01769

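library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()
# HC1 heteroskedasticity-robust standard errors (vcovHC() on an lm object does not cluster)
coeftest(modelOLS,vcov=vcovHC(modelOLS,type="HC1"))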

## 
## t test of coefficients:
## 
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  -2.2833     1.2481 -1.8294  0.06812 .
## state         2.7500     1.3377  2.0557  0.04049 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

f. Reshape your data to long form.

Did<-HW92 %>% select(state,emptot,emptot2)
#this is a lazy way to separate two time period
names(Did)[1:3]<-c("state","emptot0","emptot1")
Did<-melt(Did,id="state")
Did$variable<-as.character(Did$variable)
Did$Time<-substr(Did$variable,7,7)
Did<-Did[,-c(2)]
names(Did)<-c("state","emptot","Time")
head(Did,10)

##    state emptot Time
## 1      0  40.50    0
## 2      0  13.75    0
## 3      0   8.50    0
## 4      0  34.00    0
## 5      0  24.00    0
## 6      0  20.50    0
## 7      0  70.50    0
## 8      0  23.50    0
## 9      0  11.00    0
## 10     0   9.00    0
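The same reshape can be written with tidyr, which the document already loads. This is an alternative sketch on a toy data frame with made-up values, using the renamed columns emptot0 and emptot1 from the code above.

```r
library(tidyr)

# Toy wide data with the same column layout as Did before melting
wide <- data.frame(state   = c(0, 0, 1),
                   emptot0 = c(40.5, 13.75, 20),
                   emptot1 = c(24, 11.5, 22))
long <- pivot_longer(wide, cols = c(emptot0, emptot1),
                     names_to = "Time", names_prefix = "emptot",
                     values_to = "emptot")
long$Time <- as.integer(long$Time)   # 0 = first wave, 1 = second wave
long
```

Here names_prefix strips the common stem, so the wave indicator is recovered without substr(). Unlike melt(), pivot_longer() keeps the two waves of each store on adjacent rows.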

g. Run the appropriate DiD regression and comment on the result.

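The DiD coefficient is similar to the one calculated by hand and to the one from the OLS regression on demp. However, its standard error is much higher than under the other two methods, because the pooled regression treats the two waves as independent samples and does not use the within-store changes.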

Did<-na.omit(Did)
modelDid<-lm(emptot~state+Time+state*Time,Did)
summary(modelDid)

## 
## Call:
## lm(formula = emptot ~ state + Time + state * Time, data = Did)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.166  -6.439  -1.027   4.473  64.561 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   23.331      1.072  21.767   <2e-16 ***
## state         -2.892      1.194  -2.423   0.0156 *  
## Time1         -2.166      1.516  -1.429   0.1535    
## state:Time1    2.754      1.688   1.631   0.1033    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.406 on 790 degrees of freedom
## Multiple R-squared:  0.007401,   Adjusted R-squared:  0.003632 
## F-statistic: 1.964 on 3 and 790 DF,  p-value: 0.118
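The interaction coefficient in this regression is the same quantity as the difference of the four cell means. The sketch below verifies the identity on simulated data. The cell means and the noise level are assumptions loosely modeled on the output above.

```r
set.seed(4)
# Simulated long-form data: 200 observations per state-by-wave cell
toy <- expand.grid(id = 1:200, state = 0:1, Time = 0:1)
toy$emptot <- 23 - 3 * toy$state - 2 * toy$Time +
  2.7 * toy$state * toy$Time + rnorm(nrow(toy), sd = 9)

# Difference-in-differences from the four cell means
cell <- tapply(toy$emptot, list(toy$state, toy$Time), mean)
did_means <- (cell["1", "1"] - cell["1", "0"]) - (cell["0", "1"] - cell["0", "0"])

# The same number as the interaction coefficient of the saturated regression
did_ols <- unname(coef(lm(emptot ~ state * Time, data = toy))["state:Time"])
c(by_means = did_means, by_ols = did_ols)
```

The two numbers agree exactly because the model with state, Time, and their interaction fits each of the four cell means without restriction.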

h. In your own words, explain what is the “linear trends” assumption.

The linear (parallel) trends assumption requires that, in the absence of treatment, the difference between the treatment and control groups would have remained unchanged over time. In this paper, the assumption implies that without the minimum wage rise, FTE employment in New Jersey stores would have followed the same trend as in Pennsylvania stores. The DiD coefficient is statistically insignificant in part g). Since the minimum wage rise might be correlated with chain type, we also control for chain, and the DiD coefficient then becomes statistically significant at the 10 percent level. This result suggests that controlling for variables correlated with the intervention can make the assumption more plausible, although with only two periods the assumption itself cannot be tested directly.

pt<-HW92 %>%select(state,emptot,emptot2,chain)
names(pt)[1:4]<-c("state","emptot0","emptot1","chain")
pt<-melt(pt,id=c("state","chain"))
pt$variable<-as.character(pt$variable)
pt$Time<-substr(pt$variable,7,7)
pt$Time<-as.numeric(pt$Time)
pt<-pt[,-c(3)]
names(pt)<-c("state","chain","emptot","Time")
pt<-na.omit(pt)
pt$did=pt$state*pt$Time
modelpt<-lm(emptot~state+Time+did+chain,pt)
summary(modelpt)

## 
## Call:
## lm(formula = emptot ~ state + Time + did + chain, data = pt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.528  -6.488  -1.030   4.611  64.968 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  24.3263     1.2588  19.325   <2e-16 ***
## state        -2.9264     1.1928  -2.453   0.0144 *  
## Time         -2.2011     1.5148  -1.453   0.1466    
## did           2.7852     1.6872   1.651   0.0992 .  
## chain        -0.4561     0.3032  -1.504   0.1329    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.398 on 789 degrees of freedom
## Multiple R-squared:  0.01024,    Adjusted R-squared:  0.005223 
## F-statistic: 2.041 on 4 and 789 DF,  p-value: 0.08687
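In the regression above chain enters as a single numeric variable, which imposes a linear effect across the chain codes. An alternative is to enter it as a set of dummies with factor(). The sketch below compares the two on simulated data. The chain effects are made up and deliberately non-monotone in the code, so this is an illustration of the specification choice rather than a result for HW92.

```r
set.seed(5)
# Simulated data; chain codes 1 to 4 stand in for the four chains
toy <- data.frame(state = rbinom(400, 1, 0.8),
                  Time  = rep(0:1, 200),
                  chain = sample(1:4, 400, replace = TRUE))
chain_effect <- c(0, -8, 3, -1)[toy$chain]   # assumed, not linear in the code
toy$emptot <- 22 + chain_effect + rnorm(400, sd = 3)
toy$did <- toy$state * toy$Time

m_numeric <- lm(emptot ~ state + Time + did + chain, data = toy)          # one slope
m_factor  <- lm(emptot ~ state + Time + did + factor(chain), data = toy)  # three dummies
c(numeric = summary(m_numeric)$r.squared, factor = summary(m_factor)$r.squared)
```

The factor specification uses three coefficients for chain instead of one and fits chain-specific levels that a single slope cannot capture.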

HW9:

Y.Y.

04/01/2020
