Using Panel Data: Difference-in-Differences Impact of Minimum Wages

Part 1: Reading and questions

Download and go over this seminal paper by David Card and Alan Krueger: Card and Krueger (1994), Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania, AER 84(4): 772-793. Be careful: they released a follow-up in 2000 with the exact same title followed by “: Reply”. We want the original, not the follow-up.

Briefly answer these questions:

a. What is the causal link the paper is trying to reveal?

The paper asks how a minimum wage increase affects employment. The authors compare employment at fast-food restaurants in two states, New Jersey and Pennsylvania, before and after New Jersey raised its minimum wage from $4.25 to $5.05 per hour. Neighboring Pennsylvania, where the minimum wage did not change, serves as the control group. The causal link of interest therefore runs from the minimum wage to employment in low-wage establishments.

b. What would be the ideal experiment to test this causal link?

The ideal experiment would randomly assign different minimum wages to otherwise comparable labor markets and then compare employment across them, so that any difference could be attributed to the wage floor alone. Because such an experiment is not feasible, the authors rely on a natural experiment instead. They surveyed fast-food restaurants in New Jersey, where the minimum wage increased, and in neighboring Pennsylvania, where it did not. They collected data on wages, prices, employment, and other store characteristics from 410 fast-food restaurants across the two states, before and after the New Jersey increase. The first wave was collected between February 15th and March 4th, 1992 (before the increase) and the second wave between November 5th and December 31st, 1992 (after the increase). Observing the same stores in two waves, in two states with different minimum wage policies, lets the authors approximate the ideal experiment.

c. What is the identification strategy?

The authors use a difference-in-differences approach to identify the impact of the policy change. They compare employment at fast-food restaurants in New Jersey and in neighboring Pennsylvania, before and after the New Jersey minimum wage increased from $4.25 to $5.05 per hour. In their regressions, the dependent variable is the change in store employment from the first wave to the second, and the explanatory variables are store characteristics and a dummy variable for stores in New Jersey. A second specification keeps the same dependent variable and store characteristics but replaces the New Jersey dummy with a store-specific wage gap: the proportional wage increase needed to bring the store's starting wage up to the new minimum.
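
For reference, the two regression specifications described above can be written roughly as follows. This is a sketch in notation recalled from the paper, so the symbols should be checked against the original: \(\Delta E_i\) is the change in employment at store \(i\) between waves, \(X_i\) is a vector of store characteristics, \(NJ_i\) is the New Jersey dummy, and \(W_{1i}\) is the store's starting wage in the first wave.

```latex
\[
\begin{aligned}
\Delta E_i &= a + b X_i + c \, NJ_i + \varepsilon_i \\
\Delta E_i &= a' + b' X_i + c' \, GAP_i + \varepsilon'_i
\end{aligned}
\]
\[
GAP_i =
\begin{cases}
0 & \text{store in Pennsylvania} \\
0 & \text{store in New Jersey with } W_{1i} \ge 5.05 \\
(5.05 - W_{1i}) / W_{1i} & \text{other New Jersey stores}
\end{cases}
\]
```

In the first equation, \(c\) is the regression-adjusted difference-in-differences estimate. In the second, \(c'\) uses variation in how binding the new minimum was across stores, including variation within New Jersey.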

d. What are the assumptions / threats to this identification strategy? (answer specifically with reference to the data the authors are using)

The difference-in-differences method used in this paper rests on a few assumptions. The most important is the parallel trends assumption: in the absence of the policy change, employment in New Jersey and Pennsylvania would have followed the same trend. This assumption underlies the validity of using Pennsylvania as a control group.

There are several potential threats to this identification strategy, such as omitted variable bias and other changes that coincide with the policy. Shocks that hit both states equally are differenced out, so the concern is with shocks that affect the two states differently between the two waves. For instance, if the business cycle or consumer preferences evolved differently in New Jersey than in Pennsylvania during 1992, the resulting change in employment would be wrongly attributed to the minimum wage. In addition, the data contain only one wave before the increase, so pre-existing trends cannot be compared, and some stores have missing employment data in one of the waves.

Part 2: Replication Analysis

a. Load data from Card and Krueger AER 1994
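
The chunk that loads the packages is not shown in this report. A minimal sketch of how it could look is given below, assuming only the packages whose functions appear in the code that follows (`read_csv` from readr, the pipe and `group_by`/`summarize`/`filter`/`select`/`mutate` from dplyr, `gather` from tidyr, and `stargazer`). The package list is an assumption, not the original chunk.

```r
# Packages the later code appears to rely on (an assumption based on the
# functions used, not the original list)
pkgs <- c("readr", "dplyr", "tidyr", "stargazer")

# Check which packages are installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Attach the installed ones without printing startup messages
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}

# Report anything that still needs install.packages()
if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}
```

Loading only what is needed also avoids conflicts such as two packages both exporting a function called `rename`.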

setwd("/Users/jamesaugustin/Library/CloudStorage/OneDrive-UniversityofGeorgia/4MyStudies/Ph.D/UGA/Second Year/Spring 2023/AAEC 8610/HW5")
MinWageEmp <- read_csv("CardKrueger1994_fastfood.csv")
## Rows: 410 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (12): id, state, emptot, emptot2, demp, chain, bk, kfc, roys, wendys, wa...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(MinWageEmp, n=10)
## # A tibble: 10 × 12
##       id state emptot emptot2   demp chain    bk   kfc  roys wendys wage_st
##    <dbl> <dbl>  <dbl>   <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl>
##  1    46     0   40.5    24   -16.5      1     1     0     0      0   NA   
##  2    49     0   13.8    11.5  -2.25     2     0     1     0      0   NA   
##  3   506     0    8.5    10.5   2        2     0     1     0      0   NA   
##  4    56     0   34      20   -14        4     0     0     0      1    5   
##  5    61     0   24      35.5  11.5      4     0     0     0      1    5.5 
##  6    62     0   20.5    NA    NA        4     0     0     0      1    5   
##  7   445     0   70.5    29   -41.5      1     1     0     0      0    5   
##  8   451     0   23.5    36.5  13        1     1     0     0      0    5   
##  9   455     0   11      11     0        2     0     1     0      0    5.25
## 10   458     0    9       8.5  -0.5      2     0     1     0      0    5   
## # … with 1 more variable: wage_st2 <dbl>
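
Several rows above contain NA values, which matters later when stores with a missing employment change are dropped. A small sketch of how the missing values could be counted is shown below. The data frame `toy` and its values are made up for illustration; only the column names mirror the data, and `demp` is assumed to equal `emptot2 - emptot`, as the rows printed above suggest.

```r
# Made-up stand-in for MinWageEmp (hypothetical values, same column names)
toy <- data.frame(
  id      = c(1, 2, 3),
  emptot  = c(40.5, 12.0, 20.5),
  emptot2 = c(24.0, 11.5, NA),
  wage_st = c(NA, NA, 5)
)

# Change in FTE between waves; NA whenever either wave is missing
toy$demp <- toy$emptot2 - toy$emptot

# Number of missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Stores observed in both waves (the "balanced" sample)
balanced <- toy[!is.na(toy$demp), ]
print(nrow(balanced))
```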

b. Verify that the data is correct. Reproduce the % of Burger King, KFC, Roys, and Wendys, as well as the FTE means in the 2 waves. (Note: This is just to force you to do a summary stats table with R. I used group_by then %>% then summarize. I’m sure some of you will find better ways to do it.)

#percentage of representation of each fast-food chain
#grouping the data
grouped_data <- MinWageEmp %>%
  group_by(state) %>%
  summarize(Burger_King = sum(bk),
            KFC = sum(kfc),
            Roys = sum(roys),
            Wendys = sum(wendys))
  
#creating summary tables
summary_table <- grouped_data %>%
  select(Burger_King, KFC, Roys, Wendys) %>%
  as.matrix() %>%
  t() 
#converting counts to column percentages (each chain's share within a state), rounded to 1 decimal
percent_table <- round(prop.table(summary_table, margin = 2) *100 ,1)
#renaming the columns
colnames(percent_table) <- c("PA", "NJ")
#printing in different order
print(percent_table[,c("NJ","PA")])
##               NJ   PA
## Burger_King 41.1 44.3
## KFC         20.5 15.2
## Roys        24.8 21.5
## Wendys      13.6 19.0
#FTE means
#grouping the data
grouped_data2 <- MinWageEmp %>%
  group_by(state) %>%
  summarize(Wave1_FTE = mean(emptot, na.rm = TRUE),
            Wave_2FTE = mean(emptot2, na.rm = TRUE))
  
#creating summary tables
summary_table2 <- grouped_data2 %>%
  select(Wave1_FTE, Wave_2FTE) %>%
  as.matrix() %>%
  t() 

#rounding to 1 decimal
summary_table2 <- round(summary_table2,1)
#renaming the columns
colnames(summary_table2) <- c("PA", "NJ")
#printing in different order
print(summary_table2[,c("NJ","PA")])
##             NJ   PA
## Wave1_FTE 20.4 23.3
## Wave_2FTE 21.0 21.2
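
A more compact alternative is sketched below, under the assumption that every store belongs to exactly one of the four chains, so that the mean of each 0/1 chain dummy is that chain's share of stores. The data frame `toy` and its values are made up; only the column names mirror the data. The sketch uses base R's `aggregate` so that it runs without extra packages.

```r
# Made-up stand-in for MinWageEmp (hypothetical values, same column names)
toy <- data.frame(
  state   = c(0, 0, 1, 1, 1),        # 0 = PA, 1 = NJ
  bk      = c(1, 0, 1, 0, 0),
  kfc     = c(0, 1, 0, 1, 0),
  roys    = c(0, 0, 0, 0, 1),
  wendys  = c(0, 0, 0, 0, 0),
  emptot  = c(20, 30, 10, NA, 14),
  emptot2 = c(18, 28, 12, 16, NA)
)

chains <- c("bk", "kfc", "roys", "wendys")
vars   <- c(chains, "emptot", "emptot2")

# One row per state: means of the dummies (shares) and of FTE in each wave;
# na.rm = TRUE is passed on to mean() so missing FTE values are skipped
stats <- aggregate(toy[vars], by = list(state = toy$state),
                   FUN = mean, na.rm = TRUE)

# Express the chain shares in percent
stats[chains] <- 100 * stats[chains]
print(stats)
```

This produces the chain shares and the FTE means in a single table, with one row per state.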

c. Use a “first-differenced” OLS to obtain their Diff-in-diff estimator (almost: you won’t get it exactly). Comment on how your OLS compared to the DiD estimate in Table 3 of the paper.

#keeping only stores with a non-missing change in FTE (demp)
MinWageEmp_nona <- MinWageEmp %>% filter(!is.na(demp))
#regressing the change in FTE on the NJ dummy (first-differenced diff-in-diff)
Model1 <- lm(demp ~ state, data = MinWageEmp_nona)
stargazer(Model1, type = "text")
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                demp            
## -----------------------------------------------
## state                         2.750**          
##                               (1.154)          
##                                                
## Constant                     -2.283**          
##                               (1.036)          
##                                                
## -----------------------------------------------
## Observations                    384            
## R2                             0.015           
## Adjusted R2                    0.012           
## Residual Std. Error      8.968 (df = 382)      
## F Statistic            5.675** (df = 1; 382)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

The coefficient from the paper is 2.76 with a standard error of 1.36. The one I obtained is 2.75 with a standard error of 1.154. In other words, the paper reports a slightly larger coefficient and a larger standard error. Both estimates are positive: FTE employment rose in New Jersey relative to Pennsylvania.
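
One plausible reason for the gap in standard errors is that `lm()` assumes a common error variance in both states, whereas a difference of two group means can also be given a standard error that lets the variances differ. The sketch below illustrates both points on made-up numbers: the OLS coefficient on `state` is exactly the difference in mean `demp` between the two states, and the two standard errors need not agree. The data frame `toy` is hypothetical, and whether this fully explains the gap with the paper is an assumption.

```r
# Made-up first-differenced data (hypothetical values, not the real sample)
toy <- data.frame(
  state = c(0, 0, 0, 1, 1, 1, 1),     # 0 = PA, 1 = NJ
  demp  = c(-3, -1, -2, 1, 0, 2, 1)   # change in FTE between waves
)

# First-differenced OLS: coefficient on state and its pooled-variance SE
fit      <- lm(demp ~ state, data = toy)
ols_coef <- unname(coef(fit)["state"])
ols_se   <- unname(coef(summary(fit))["state", "Std. Error"])

# The same estimate as a plain difference in mean changes
nj <- toy$demp[toy$state == 1]
pa <- toy$demp[toy$state == 0]
diff_means <- mean(nj) - mean(pa)

# Standard error that allows a different variance in each state
unequal_se <- sqrt(var(nj) / length(nj) + var(pa) / length(pa))

print(c(ols = ols_coef, by_hand = diff_means))
print(c(ols_se = ols_se, unequal_var_se = unequal_se))
```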

Part 3: Alternative ways of running DiD

d. What would be the equation of a standard “difference in difference” regression? Just write down the equation and briefly explain each coefficient.

\[FTE_{i,t} = \alpha + \beta \cdot location_{i} + \gamma \cdot post_{t} + \lambda \cdot (location_{i} \times post_{t}) + \epsilon_{i,t}\]

Here \(FTE_{i,t}\) is full-time-equivalent employment at store \(i\) in wave \(t\), \(location_{i}\) equals 1 for stores in New Jersey and 0 for stores in Pennsylvania, and \(post_{t}\) equals 1 for the second wave (after the minimum wage increase) and 0 for the first wave.

\(\alpha\) is the average FTE in Pennsylvania (the control state) in the first wave, before the minimum wage increase.

\(\beta\) is the coefficient on the location variable. It is the average difference in FTE between New Jersey and Pennsylvania before the increase, so it captures pre-existing differences in employment levels between the two states.

\(\gamma\) is the coefficient on the post variable. It is the average change in FTE in Pennsylvania between the two waves, so it captures the time trend that both states are assumed to share in the absence of treatment.

\(\lambda\) is the coefficient on the interaction between location and post. It is the difference-in-differences estimator: the change in FTE in New Jersey minus the change in FTE in Pennsylvania. Under the parallel trends assumption, it is the effect of the minimum wage increase.
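
To see why the interaction coefficient is the one of interest, it may help to write out the four group means implied by a regression of the FTE level on a New Jersey dummy, a post dummy, and their interaction. This is a sketch that assumes the error term has mean zero in each of the four cells; "pre" and "post" denote the first and second wave.

```latex
\[
\begin{aligned}
E[FTE \mid PA, pre]  &= \alpha \\
E[FTE \mid PA, post] &= \alpha + \gamma \\
E[FTE \mid NJ, pre]  &= \alpha + \beta \\
E[FTE \mid NJ, post] &= \alpha + \beta + \gamma + \lambda \\[4pt]
\big(E[FTE \mid NJ, post] - E[FTE \mid NJ, pre]\big)
  - \big(E[FTE \mid PA, post] - E[FTE \mid PA, pre]\big)
  &= (\gamma + \lambda) - \gamma = \lambda
\end{aligned}
\]
```

The change over time in New Jersey is \(\gamma + \lambda\), the change in Pennsylvania is \(\gamma\), and subtracting one from the other leaves \(\lambda\).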

e. Compute the difference-in-differences estimator “by hand”. Don’t use a regression. Get the numbers from columns 1,2,3 rows 1,2,4 (not 3) of the top-left corner of Table 3 in the paper and do the subtractions. You will not obtain the exact same estimate, but pretty close. I got 2.75 instead of 2.74. Interpret the results in a couple of sentences. Optionally: Compute the standard errors by hand too.

mean_before_PA <- 23.33
mean_before_NJ <- 20.44

mean_after_PA <- 21.17
mean_after_NJ <- 21.03

#calculating the diff-in-diff estimator
(diff_in_diff <- mean_after_NJ - mean_after_PA - (mean_before_NJ - mean_before_PA))
## [1] 2.75

The hand computation gives 2.75, the same value as the first-differenced OLS in part c. Between the two waves, average FTE employment per store rose by about 0.59 in New Jersey and fell by about 2.16 in Pennsylvania. The difference between these two changes, 2.75, is the estimated effect of the minimum wage increase on FTE employment per store, under the assumption that New Jersey would have followed Pennsylvania's trend without the increase. No other factors are controlled for in this calculation. The estimate indicates that employment in New Jersey rose relative to Pennsylvania after the minimum wage increase, rather than falling.
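
The subtraction can be grouped in two equivalent ways, by wave (as in the code above) or by state. The worked arithmetic below uses the four means entered above; the notation \(\overline{FTE}\) for a group mean and \(\widehat{DiD}\) for the estimate is introduced here only for illustration.

```latex
\[
\begin{aligned}
\widehat{DiD}
&= \big(\overline{FTE}_{NJ,after} - \overline{FTE}_{PA,after}\big)
 - \big(\overline{FTE}_{NJ,before} - \overline{FTE}_{PA,before}\big) \\
&= (21.03 - 21.17) - (20.44 - 23.33) = -0.14 - (-2.89) = 2.75 \\[4pt]
\widehat{DiD}
&= \big(\overline{FTE}_{NJ,after} - \overline{FTE}_{NJ,before}\big)
 - \big(\overline{FTE}_{PA,after} - \overline{FTE}_{PA,before}\big) \\
&= (21.03 - 20.44) - (21.17 - 23.33) = 0.59 - (-2.16) = 2.75
\end{aligned}
\]
```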

f. Run the regression you wrote up in part d. (Note: You will likely need to reshape your data to long form first.) Comment on the results you obtain.

#reshaping the data to long form
MinWageEmp_nona_long <- MinWageEmp_nona %>% 
  gather(key = "post", value = "wage", wage_st, wage_st2) %>% 
  mutate(post = ifelse(post == "wage_st", 0, 1))

#creating the interaction term between state and post
MinWageEmp_nona_long$state_post <- MinWageEmp_nona_long$state * MinWageEmp_nona_long$post

#running the regression
Model2 <- lm(demp ~ state + post + state_post, data = MinWageEmp_nona_long)

#summarizing the results
stargazer(Model2, type = "text")
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                demp            
## -----------------------------------------------
## state                         2.750**          
##                               (1.154)          
##                                                
## post                          -0.000           
##                               (1.464)          
##                                                
## state_post                     0.000           
##                               (1.633)          
##                                                
## Constant                     -2.283**          
##                               (1.036)          
##                                                
## -----------------------------------------------
## Observations                    768            
## R2                             0.015           
## Adjusted R2                    0.011           
## Residual Std. Error      8.968 (df = 764)      
## F Statistic            3.784** (df = 3; 764)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

These estimates should be read with care, because the reshaped data do not match a standard DiD setup. The reshape stacks the two wage variables (wage_st and wage_st2), so every store appears twice, once with post = 0 and once with post = 1, while the dependent variable demp (the change in FTE between waves) is identical in both copies. This is why the number of observations doubles from 384 to 768.

\(\alpha\) (the constant, -2.283) is the average change in FTE between the two waves in the control state, Pennsylvania: stores there lost about 2.28 FTE on average.

\(\beta\) (the coefficient on state, 2.750) is the difference in the average change in FTE between New Jersey and Pennsylvania. It reproduces the first-differenced estimate from part c and is the difference-in-differences estimate.

\(\gamma\) (the coefficient on post, -0.000) and \(\lambda\) (the coefficient on state_post, 0.000) are zero by construction: demp does not vary between the two copies of each store, so post and its interaction with state have nothing to explain. They should not be read as evidence that the treatment had a negligible effect.

Only the constant and the coefficient on state are statistically significant (at the 5% level). To estimate a standard DiD regression in levels, the long data would need the FTE level in each wave (emptot for post = 0 and emptot2 for post = 1) as the dependent variable; the coefficient on the interaction term would then be the DiD estimate.
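
A standard DiD regression in levels needs one row per store and wave, with the FTE level as the outcome. A minimal sketch on made-up data is given below, using base R only. The data frame `toy`, its values, and the column name `fte` are hypothetical; `emptot` and `emptot2` mirror the column names in the data.

```r
# Made-up balanced panel in wide form (hypothetical values)
toy <- data.frame(
  id      = 1:6,
  state   = c(0, 0, 0, 1, 1, 1),       # 0 = PA, 1 = NJ
  emptot  = c(20, 24, 22, 18, 20, 19), # FTE in wave 1
  emptot2 = c(18, 21, 21, 19, 21, 20)  # FTE in wave 2
)

# Long form: stack the two waves, keeping the FTE level as the outcome
long <- rbind(
  data.frame(id = toy$id, state = toy$state, post = 0, fte = toy$emptot),
  data.frame(id = toy$id, state = toy$state, post = 1, fte = toy$emptot2)
)

# state * post expands to state + post + state:post
did_fit  <- lm(fte ~ state * post, data = long)
did_coef <- unname(coef(did_fit)["state:post"])

# First-differenced regression on the wide data, for comparison
fd_fit  <- lm(I(emptot2 - emptot) ~ state, data = toy)
fd_coef <- unname(coef(fd_fit)["state"])

print(c(levels_did = did_coef, first_difference = fd_coef))
```

With a balanced panel, the interaction coefficient in the levels regression equals the coefficient on state in the first-differenced regression. Because each store contributes two rows to the long data, the default standard errors treat those rows as independent, which is an assumption worth revisiting.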