Discussion 6

Author

Nuobing Fan

Part 1: Please skim through the entire article first - A Guide To Using The Difference-In-Differences Regression Model

A. In a simple diff-in-diff paper, we are looking for either the 2 X 2 matrix of regression equations (so that you can construct the diff-in-diff estimator) or the following regression form: Y = α + β1(time) + β2(treatment) + β3(time*treatment), where

-β1 is the expected mean change in outcome from before to after the onset of the intervention era among the control group. It reflects, if you will, the pure effect of the passage of time in the absence of the actual intervention.

-β2 (coefficient of the treatment variable) is the estimated mean difference in Y between the treatment and control groups prior to the intervention: it represents whatever “baseline” differences existed between the groups before the intervention was applied to the treatment group.

-β3 by itself is the difference-in-differences estimator. In most contexts, it is β3 that is the focus of interest. It tells us whether the expected mean change in outcome from before to after was different in the two groups. (That would typically be the hallmark of an effective intervention, assuming adequate power, etc.)

-To get the estimated mean difference in Y between the treatment and control groups after the intervention, you need to look at β2 + β3. It is possible that you will find that β2 + β3 is significantly different from zero, even though neither β2 nor β3 by itself is.
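To see where each interpretation comes from, it may help to write out the four cell means implied by the regression. This is a sketch that assumes time and treatment are 0/1 dummies and that the model contains only the three regressors shown above:

\[
\begin{aligned}
E[Y \mid \text{control, before}] &= \alpha \\
E[Y \mid \text{control, after}] &= \alpha + \beta_1 \\
E[Y \mid \text{treatment, before}] &= \alpha + \beta_2 \\
E[Y \mid \text{treatment, after}] &= \alpha + \beta_1 + \beta_2 + \beta_3
\end{aligned}
\]

Subtracting rows gives the quantities described above:

\[
\begin{aligned}
\text{gap between groups, before} &= (\alpha + \beta_2) - \alpha = \beta_2 \\
\text{gap between groups, after} &= (\alpha + \beta_1 + \beta_2 + \beta_3) - (\alpha + \beta_1) = \beta_2 + \beta_3 \\
\text{difference in differences} &= (\beta_2 + \beta_3) - \beta_2 = \beta_3
\end{aligned}
\]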

C. Let’s try to understand and interpret what you have done.

-What is the control and the control group, and what is the treatment and the treatment group? 4 lines max.

Control: The condition of being unaffected by the disaster.
Control Group: Areas that experienced the control condition, meaning they were not impacted by the disaster.
Treatment: The disaster event itself.
Treatment Group: Areas that were directly affected by the disaster.

-The basic idea is to compare the difference in outcomes between the treatment and control groups before and after the treatment is introduced. By differencing these differences, what do you hope to achieve, and why should it work? Intuitively, why does methodology work/describe it in plain simple English to your grandma.

The basic idea of this approach, called difference-in-differences, is to understand how much impact something specific (like a natural disaster) has on house prices by comparing two sets of areas: one that was affected by the disaster (treatment group) and one that wasn’t (control group). Here’s why it should work and what we hope to achieve: by looking at how house prices change in both groups before and after the disaster, we can separate the changes that would have happened anyway (like market trends) from the changes that are due to the disaster. This way, we don’t mistakenly attribute normal price changes to the disaster, and we don’t mistake a gap that already existed between the two areas for an effect of the disaster.

In simpler terms: imagine you and your neighbor both grow tomatoes in your gardens. This year, you tried a new fertilizer (the “treatment”) but your neighbor didn’t. To see if the fertilizer really works, you don’t just compare this year’s harvests, because your garden may always have been better or worse than your neighbor’s, and the weather may have helped or hurt both of you. Instead, you compare how much your harvest changed from last year to this year with how much your neighbor’s harvest changed over the same time. Whatever extra change shows up in your garden is the fertilizer’s effect. In the same way, comparing the change in house prices across affected and unaffected areas helps us see the disaster’s real impact.
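The intuition can also be checked on simulated data. The sketch below is illustrative only: every number in it is made up (it is not the housing data), and the names `sim`, `cell`, `naive_gap`, and `did` are hypothetical. It builds in a baseline gap between the groups and a common time trend, then compares a naive post-period comparison with the difference-in-differences.

```r
# Simulated illustration (all numbers are made up, not the housing data).
set.seed(42)
n <- 5000            # observations per group-by-period cell
true_effect <- 2     # the treatment effect we hope to recover

sim <- expand.grid(id = 1:n, treated = 0:1, post = 0:1)
sim$y <- 10 +                                 # baseline level
  3 * sim$treated +                           # gap that exists before any treatment
  1.5 * sim$post +                            # time trend shared by both groups
  true_effect * sim$treated * sim$post +      # effect only for treated units after treatment
  rnorm(nrow(sim), sd = 1)                    # noise

# Mean outcome in one cell of the 2x2 table
cell <- function(t, p) mean(sim$y[sim$treated == t & sim$post == p])

# Post-period comparison only: mixes the baseline gap into the "effect"
naive_gap <- cell(1, 1) - cell(0, 1)

# Difference in differences: change for treated minus change for control
did <- (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# The same quantity from the interaction term of a regression
fit <- lm(y ~ treated * post, data = sim)
did_reg <- unname(coef(fit)["treated:post"])

print(c(naive_gap = naive_gap, did = did, did_reg = did_reg))
```

In this setup the naive gap lands near 5 (the baseline gap of 3 plus the effect of 2), while the difference-in-differences lands near the true effect of 2, because differencing removes both the baseline gap and the shared time trend.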

-Create the 2 X 2 matrix of regression equations with actual values from the data. HINT - take the mean of your y variable, not the count in each cell. Does your difference in difference coefficient in the linear regression above match the difference in difference effect of the two groups from the 2*2 table created? It should match by the way.

# Assuming data has been loaded into a dataframe called 'data'
# and that it includes 'Time_Period', 'Disaster_Affected', and 'HPI_CHG'

# Calculate the mean HPI_CHG for each group and time period
library(dplyr)

# Creating a summary table to calculate means for each group
summary_table <- data %>%
  group_by(Time_Period, Disaster_Affected) %>%
  summarise(Mean_HPI_CHG = mean(HPI_CHG, na.rm = TRUE)) %>%
  ungroup()
`summarise()` has grouped output by 'Time_Period'. You can override using the
`.groups` argument.
# Display the summary table to see the means
print(summary_table)
# A tibble: 4 × 3
  Time_Period Disaster_Affected Mean_HPI_CHG
        <dbl>             <dbl>        <dbl>
1           0                 0      0.0371 
2           0                 1      0.0231 
3           1                 0      0.00924
4           1                 1      0.0150 
# Creating the 2x2 matrix
# Assuming Time_Period and Disaster_Affected are both dummy variables 0 or 1
matrix_2x2 <- matrix(nrow = 2, ncol = 2)
colnames(matrix_2x2) <- c("Control", "Treatment")
rownames(matrix_2x2) <- c("Pre-Treatment", "Post-Treatment")

# Fill the matrix with calculated means
for (i in 0:1) {
  for (j in 0:1) {
    matrix_2x2[i + 1, j + 1] <- summary_table$Mean_HPI_CHG[summary_table$Time_Period == i & summary_table$Disaster_Affected == j]
  }
}

# Print the 2x2 matrix
print(matrix_2x2)
                   Control  Treatment
Pre-Treatment  0.037090020 0.02314612
Post-Treatment 0.009242792 0.01503835
# Compute Difference-in-Differences estimate from the 2x2 matrix
DiD_estimate <- (matrix_2x2[2,2] - matrix_2x2[1,2]) - (matrix_2x2[2,1] - matrix_2x2[1,1])
print(paste("Difference-in-Differences Estimate from 2x2 Matrix: ", DiD_estimate))
[1] "Difference-in-Differences Estimate from 2x2 Matrix:  0.0197394560315789"
# Compare with regression coefficient of interaction term
# Assuming you have already run the regression and have the model stored in 'model'
interaction_coefficient <- summary(model)$coefficients["Interaction", "Estimate"]
print(paste("Regression Coefficient for Interaction Term: ", interaction_coefficient))
[1] "Regression Coefficient for Interaction Term:  0.0197394560315789"
# Checking if they match
if (round(DiD_estimate, 5) == round(interaction_coefficient, 5)) {
  print("The DiD estimate matches the regression coefficient.")
} else {
  print("The DiD estimate does not match the regression coefficient.")
}
[1] "The DiD estimate matches the regression coefficient."

The difference-in-differences (DiD) coefficient from my linear regression matches the DiD effect calculated manually from the 2x2 table I created. Both methods give a DiD estimate of approximately 0.0197, confirming that the interaction coefficient in the regression is the same quantity as the manual difference of differences in the cell means.
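The match is not a coincidence: with only the two dummies and their interaction, the regression is saturated, so its coefficients are just differences of the four cell means. A minimal sketch of this, using the four means printed in the 2x2 matrix above (rounded as displayed) rather than the original data; the names `cells`, `fit_cells`, and `did_by_hand` are hypothetical:

```r
# The four cell means printed in the 2x2 matrix (rounded as displayed).
cells <- data.frame(
  Time_Period       = c(0, 0, 1, 1),
  Disaster_Affected = c(0, 1, 0, 1),
  Mean_HPI_CHG      = c(0.037090020, 0.02314612, 0.009242792, 0.01503835)
)

# Saturated model: 4 cells and 4 coefficients, so the fit is exact.
fit_cells <- lm(Mean_HPI_CHG ~ Time_Period * Disaster_Affected, data = cells)
b <- coef(fit_cells)

# Manual difference in differences: (treated after - treated before)
# minus (control after - control before)
did_by_hand <- (cells$Mean_HPI_CHG[4] - cells$Mean_HPI_CHG[2]) -
  (cells$Mean_HPI_CHG[3] - cells$Mean_HPI_CHG[1])

print(round(b, 6))
print(round(did_by_hand, 6))
```

The intercept reproduces the pre-treatment control mean, and the interaction coefficient reproduces the manual estimate. This equivalence would no longer be exact if the regression included additional covariates.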

D. What are the “threats to identification”? In other words, what are the “implicit assumptions” like for simple OLS we have the 5 Gauss Markov Assumptions/conditions? Alternatively, under what conditions can you trust the point estimates, and when would you buy the study results with a grain of salt? Googling may help with weaknesses of the study, or when does Diff-in-Diff methodology fail in general. (HINT: parallel trend assumption) (2-3 sentences at least).

Threats to Identification: The most critical threat in DiD analysis is a violation of the parallel trends assumption: without the treatment, the outcome trajectories of the treated and control groups would have moved in parallel over time. If these trends diverge, or if the groups are hit differently by external shocks (violating the common shocks assumption), the DiD estimator is biased, because it attributes changes driven by other factors to the treatment.

Trust in Point Estimates: Trust in the point estimates from a DiD analysis depends on the validity of its underlying assumptions. The parallel trends assumption must hold for the results to be credible. Additionally, there should be no anticipation effects, meaning participants should not change their behavior before the treatment based on their expectations of the treatment. When these assumptions are met, and the model appropriately accounts for potential confounders and external shocks, the point estimates can be considered reliable. However, if there’s evidence that these conditions are compromised, the study results should be treated with caution, acknowledging the potential for bias.
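One common way to probe the parallel trends assumption is to compare pre-treatment slopes when several pre-periods are available. The sketch below uses a simulated panel in which trends are parallel by construction; the data, the three pre-periods, and the names `pre`, `pretrend_fit`, and `slope_gap` are all assumptions for illustration, not part of the assignment data.

```r
# Simulated panel (hypothetical data): three pre-periods, parallel trends by construction.
set.seed(1)
n <- 2000                                   # observations per group per period
pre <- expand.grid(id = 1:n, treated = 0:1, period = c(-3, -2, -1))
pre$y <- 5 +
  2 * pre$treated +                         # a level gap between groups is allowed
  0.5 * pre$period +                        # the same slope for both groups
  rnorm(nrow(pre), sd = 1)                  # noise

# treated:period measures how much the treated group's pre-period slope
# differs from the control group's slope.
pretrend_fit <- lm(y ~ treated * period, data = pre)
slope_gap <- unname(coef(pretrend_fit)["treated:period"])

print(summary(pretrend_fit)$coefficients["treated:period", ])
```

A slope gap close to zero is consistent with parallel pre-trends. It does not prove that the trends would have stayed parallel after the treatment date, so this check supports the assumption rather than verifying it.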

E. In the paper here that uses triple Diff, what are the three margins? Type out the estimating equation and explain the study design - why does it work (who are we comparing against whom)? 8 lines max.

Age Groups (Treatment vs. Control): Younger girls (14-15) eligible for bicycles versus older girls (16-17) who were not.
Gender: Changes in girls’ enrollment compared to boys’ within the same age groups.
Geographic Comparison: Bihar (where the program was implemented) versus neighboring Jharkhand (without the program).

\[ y = \beta_0 + \beta_1 (\text{Female} \times \text{Treatment} \times \text{Bihar}) + \text{controls} + \epsilon \]

This model works by comparing enrollment changes among younger girls eligible for bicycles in Bihar against older girls in Bihar, younger and older boys in Bihar, and similar groups in Jharkhand. This comprehensive comparison isolates the specific impact of the bicycle program from other demographic and regional influences on school enrollment.
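For reference, a generic triple-difference specification spells out the terms grouped under “controls”: the coefficient on the triple interaction only has the triple-difference interpretation when all two-way interactions and main effects are also included. The form below is the standard textbook layout, written with F = Female, T = Treatment (the 14-15 age cohort), and B = Bihar; the paper’s exact notation and any additional covariates may differ.

\[
y = \beta_0 + \beta_1 (F \times T \times B) + \beta_2 (F \times T) + \beta_3 (F \times B) + \beta_4 (T \times B) + \beta_5 F + \beta_6 T + \beta_7 B + \epsilon
\]

In terms of group means, with \(\bar{y}\) indexed by gender, cohort, and state:

\[
\beta_1 = \Big[ (\bar{y}_{\text{girls, young}} - \bar{y}_{\text{girls, old}}) - (\bar{y}_{\text{boys, young}} - \bar{y}_{\text{boys, old}}) \Big]_{\text{Bihar}} - \Big[ (\bar{y}_{\text{girls, young}} - \bar{y}_{\text{girls, old}}) - (\bar{y}_{\text{boys, young}} - \bar{y}_{\text{boys, old}}) \Big]_{\text{Jharkhand}}
\]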

Part 2: Download and read the following article: Card&Kreuger_Diff-in-Diff_paper_reading.pdf (I would begin with reading the abstract, then the conclusion, and then the first half of the paper/sections such that you understand Table 1, Table 2, the two graphs of before and after wage distribution {pg 777}, and Table 3.). Thus, try reading the original article from page 772 to page 779 (end of section III. Employment Effects of the Minimum-Wage Increase A. Differences in Differences).

After reading the discussion article and resources above, post your comments on what you found to be interesting and what you learned using the following template -

ARTICLE SUMMARY

What are the authors trying to do? In other words, what is the economic hypothesis that the authors are trying to test for? (2-3 sentences in your own words - EG do not simply copy/paste from the article abstract or conclusion) -> DOES MW REDUCE EMPLOYMENT?

The authors, Card and Krueger, are examining the impact of minimum wage increases on employment levels. They aim to test the conventional economic hypothesis that raising the minimum wage leads to lower employment. Their study specifically investigates whether this holds true by analyzing the effects of a minimum wage increase in New Jersey compared to Pennsylvania, where the minimum wage remained unchanged. Their results do not show that the minimum wage increase reduced employment.

What do the authors find? Do the results make sense under a perfectly competitive labor market (where binding minimum wages cause more unemployment)? Explain. (2-3 sentences) -> MW INCREASES EMPLOYMENT! NO, RESULTS DO NOT MAKE SENSE UNDER STANDARD PERFECTLY COMPETITIVE (LABOR) MKT ASSUMPTION.

METHODOLOGY / CRITICAL ANALYSIS

What is the treatment here? What is the treatment group, and what is the control group? -> CHANGE IN MW LAWS, NJ IS THE TREATMENT GROUP THAT SAW AN INCREASE IN MW, WHILE THE CONTROL GROUP IS PA.

The treatment in this study is the increase in New Jersey’s minimum wage. New Jersey therefore serves as the treatment group, and Pennsylvania, where the minimum wage remained unchanged, serves as the control group. Comparing the change in employment in the two states isolates the part of the change that can be attributed to the wage policy.

What are the “threats to identification”? In other words, what are the “implicit assumptions” like for simple OLS we have the 5 Gauss Markov Assumptions/conditions? Alternatively, under what conditions can you trust the point estimates, and when would you buy the study results with a grain of salt? Googling may help with weaknesses of the study, or when does Diff-in-Diff methodology fail in general. (HINT: parallel trend assumption to begin with) (2-3 sentences at least)

One of the main threats to identification in a Difference-in-Differences (DiD) analysis like this is the violation of the parallel trends assumption. This assumption requires that, absent the intervention (the change in minimum wage), the employment trends in both the treatment group (New Jersey) and the control group (Pennsylvania) would have followed similar trajectories. If this assumption doesn’t hold—if, for example, New Jersey was already on a different economic path due to unrelated policy changes or economic shifts—the results might be misleading. Therefore, while the study provides valuable insights, it’s important to consider the possibility of pre-existing differences between the groups, as these could bias the findings and should prompt caution in interpreting the results too definitively without further validation.
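The assumption can be stated more precisely in potential-outcomes notation. As a sketch, let \(Y(0)\) denote employment that would have been observed without the minimum wage increase (this notation is introduced here for illustration and is not taken from the paper). Parallel trends requires:

\[
E[\,Y_{\text{after}}(0) - Y_{\text{before}}(0) \mid \text{NJ}\,] = E[\,Y_{\text{after}}(0) - Y_{\text{before}}(0) \mid \text{PA}\,]
\]

When it fails, the DiD estimate equals the true effect on New Jersey plus the gap in untreated trends:

\[
\text{DiD} = \underbrace{E[\,Y_{\text{after}}(1) - Y_{\text{after}}(0) \mid \text{NJ}\,]}_{\text{effect on the treated}} + \underbrace{\Big( E[\,Y_{\text{after}}(0) - Y_{\text{before}}(0) \mid \text{NJ}\,] - E[\,Y_{\text{after}}(0) - Y_{\text{before}}(0) \mid \text{PA}\,] \Big)}_{\text{bias from non-parallel trends}}
\]

The second term is zero under parallel trends, which is why the estimate can only be read as the effect of the policy when that assumption is credible.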

Part 3:

OPTIONAL: Another Econometric Debate (on impact of low skilled immigration)

Mariel Boatlift

Background in pictures; Paper; Paper Explained (the economic debates that followed are interesting); Summary of debates; Crime and the Mariel Boatlift

The Mariel Boatlift of 1980, where approximately 125,000 Cubans migrated to Miami, provides a crucial case study on the economic and social impacts of large-scale immigration. David Card’s initial 1990 study indicated that this influx had no significant adverse effects on the wages and employment of local low-skilled workers, challenging traditional economic predictions that a labor supply surge would depress wages and raise unemployment. However, subsequent critiques, particularly by George Borjas, suggested that specific subgroups did experience wage declines, though further analyses criticized the methodologies used and suggested demographic shifts within samples might have influenced these findings. Additionally, a study by Alexander Billy and Michael Packard using synthetic control methods showed that the Boatlift led to an increase in property crimes and murders, attributed partly to the demographic characteristics of the immigrants, many of whom were young males with some having criminal backgrounds. This complex scenario highlights the nuanced effects of immigration on local economies and communities, underscoring the need for careful policy planning and integration strategies to manage such dynamics effectively.

A source that may be helpful: the paper by Jessica Lynn Peck titled “Does Uber Reduce Drunk Driving? Evidence from a Natural Experiment in Las Vegas” (2017)